Post-Training
Post-training refers to the suite of techniques applied to a large language model after its initial pre-training phase, transforming a raw next-token predictor into a capable, aligned assistant. Core methods include Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and newer RL-based approaches such as Group Relative Policy Optimization (GRPO). The goal is to shape model behavior (improving instruction-following, safety, helpfulness, and reasoning) without retraining from scratch.
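To make one of the methods named above concrete, here is a minimal sketch of the per-example DPO loss in plain Python. It assumes you already have the summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under both the model being trained and a frozen reference model. The function and argument names are illustrative, not taken from any particular library, and the default beta of 0.1 is only a commonly seen starting value.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # log(sigmoid(x)) = -log(1 + exp(-x)); branch on sign to avoid overflow.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift
) -> float:
    """DPO loss for a single preference pair (hypothetical helper).

    Each argument is the total log-probability a model assigns to a
    response given the same prompt.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # The loss falls as the policy raises the chosen response relative
    # to the rejected one, measured against the reference.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Policy identical to the reference: the margin is zero.
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Policy has shifted probability toward the chosen response.
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; it drops below that as the policy favors the chosen response more strongly than the reference does. In practice these log-probabilities come from batched forward passes, and a library such as TRL handles the batching and the reference model.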
Post-training is the step that turns a capable base model into a deployable product, making it the most direct lever AI labs and companies have over model behavior, safety, and usefulness. Every major frontier model family (GPT, Claude, Gemini, Llama, DeepSeek) goes through a post-training pipeline before release, and roles focused on alignment, RLHF engineering, and fine-tuning infrastructure are among the most in-demand ML positions in 2026. Understanding post-training is increasingly a prerequisite for working on model development, red-teaming, or evaluation at AI companies.
🎓 Courses
Post-Training of LLMs
by DeepLearning.AI team
A course focused specifically on post-training: covers SFT, DPO, PPO, and GRPO with hands-on labs using Hugging Face models, and directly addresses when and why to use each technique.
LLM Course (2025 Edition) — Chapters 10–12: Fine-Tuning, Datasets, Reasoning Models
by Hugging Face team
Free, regularly updated, and covers fine-tuning LLMs and building reasoning models (including DeepSeek R1 style). Uses the full Hugging Face ecosystem (TRL, Transformers, Datasets).
LLM Course by mlabonne (2025 Edition)
by Maxime Labonne
41k+ star community resource covering the full LLM Scientist roadmap including post-training, RLHF, DPO, quantization, and evaluation. Updated with 2025 trends like test-time compute scaling.
Fine-Tuning LLMs in 2025 (Notebook)
by Philipp Schmid (Hugging Face)
Practical, runnable notebook from a Hugging Face engineer showing current best practices for SFT and alignment fine-tuning with TRL.
📖 Books
The RLHF Book: Reinforcement Learning from Human Feedback, Alignment, and Post-Training LLMs
Nathan Lambert · 2025
The most comprehensive and directly focused book on LLM post-training. Covers SFT, reward modeling, PPO, DPO, RLVR, and the full alignment pipeline. Free to read online at rlhfbook.com.
Build a Large Language Model (From Scratch)
Sebastian Raschka · 2024
Hands-on book that walks through pre-training and fine-tuning an LLM from the ground up using PyTorch, giving the mechanical understanding needed before tackling advanced post-training.
🛠️ Tutorials & Guides
LLM Fine-Tuning Methods: A Complete Guide to Post-Training Optimization Techniques
Practitioner-oriented guide covering SFT, RLHF, DPO, and GRPO with implementation notes on Hugging Face TRL and the VeRL toolkit. Clearly explains the trade-offs between methods.
Direct Preference Optimization (DPO) — Deep (Learning) Focus
Well-regarded deep dive into DPO, one of the most widely adopted post-training alignment methods. Explains the math, intuition, and practical differences from PPO-based RLHF.
Introduction — RLHF and Post-Training Book by Nathan Lambert
Free online chapter from the RLHF Book providing a concise, well-structured introduction to why post-training exists and what problems it solves — ideal first read before the full book.
Learning resources last updated: June 18, 2026