Post-Training
- Definition
- The phase of model development that happens after initial pre-training — including supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and safety tuning. Post-training is what transforms a raw language model into a useful, aligned assistant.
- Why it matters
- Pre-training gets the headlines, but post-training is where models actually become products. The quality gap between frontier models often comes down to post-training recipes, not pre-training scale — a model with superior RLHF data and alignment tuning will outperform a larger model with mediocre post-training on real user tasks. This is also the most secretive and competitive phase of model development: labs guard their post-training pipelines more closely than their architectures. For enterprises evaluating models, post-training quality is what determines whether a model follows instructions reliably, handles edge cases gracefully, and refuses harmful requests appropriately. Ask your vendor about their post-training methodology — if they cannot answer, their model is a black box.
- In practice
- OpenAI's post-training pipeline involves tens of thousands of human raters producing preference data for RLHF, with multiple rounds of SFT and reward modeling. Anthropic's Constitutional AI approach uses AI-generated feedback (RLAIF) alongside human preferences, reducing cost while maintaining alignment quality. Meta open-sourced Llama's post-training recipes, revealing a multi-stage pipeline of SFT, rejection sampling, and DPO that other labs have built upon. DeepSeek's R1 demonstrated that innovative reinforcement learning during post-training could produce frontier reasoning capabilities at a fraction of the typical cost. The post-training phase now accounts for an estimated 20-40% of total model development cost at major labs.
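The rejection-sampling stage mentioned above (part of Llama's published recipe) is simple to sketch: sample several candidate responses per prompt, score them with a reward model, and keep the best one as new SFT data. The sketch below is a toy illustration, not any lab's actual pipeline; `generate` and `reward` are hypothetical stand-ins for a policy model and a trained reward model.

```python
def rejection_sample(prompt, generate, reward, n=8):
    """Generate n candidate responses for a prompt and keep the one the
    reward model scores highest. `generate` and `reward` are stand-ins
    for a policy model and a reward model; here they are toy stubs."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stubs: a "policy" that yields fixed drafts and a "reward model"
# that happens to prefer longer answers.
drafts = iter(["ok", "a longer, more helpful answer", "short reply"])
generate = lambda prompt: next(drafts)
reward = lambda response: len(response)

best = rejection_sample("Explain post-training.", generate, reward, n=3)
# best → "a longer, more helpful answer"
```

In the real multi-stage pipeline, the winning responses are fed back into another round of supervised fine-tuning before preference optimization.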
Related terms
Reinforcement Learning from Human Feedback (RLHF)
A training technique where human raters rank model outputs, and the model learns to prefer higher-ranked responses. RLHF is what makes AI assistants helpful, harmless, and conversational rather than just autocomplete.
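The "rank model outputs" step usually trains a reward model with a Bradley-Terry pairwise objective: the loss is the negative log-sigmoid of the score gap between the chosen and rejected response. A minimal sketch of that objective (pure Python, scalar scores standing in for reward-model outputs):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward_pair_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss for reward-model training:
    -log sigmoid(r_chosen - r_rejected). Minimizing it pushes the
    reward model to score human-preferred responses higher."""
    return -math.log(sigmoid(r_chosen - r_rejected))

# Equal scores: the model is indifferent, loss = log 2.
tied_loss = reward_pair_loss(0.0, 0.0)
# A clear margin in favor of the chosen response lowers the loss.
margin_loss = reward_pair_loss(2.0, 0.0)
```

The trained reward model then scores rollouts during the RL step (e.g. PPO), so the policy is optimized toward responses humans would have ranked higher.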
SFT (Supervised Fine-Tuning)
The process of training a pre-trained model on a curated dataset of input-output examples that demonstrate the desired behavior. SFT is typically the first alignment step after pre-training, teaching the model to follow instructions and produce useful responses.
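A detail worth making concrete: SFT typically computes the loss only on the response tokens, masking out the prompt so the model learns to produce the demonstration rather than to reproduce the instruction. A toy sketch of that masked loss, with hypothetical per-token log-probabilities standing in for a model's outputs:

```python
def sft_loss(token_logprobs, loss_mask):
    """Mean negative log-likelihood over response tokens only.

    token_logprobs: log-probabilities the model assigns to each target
    token; loss_mask: 0 for prompt tokens (ignored), 1 for response
    tokens (trained on)."""
    masked = [-lp for lp, m in zip(token_logprobs, loss_mask) if m]
    return sum(masked) / len(masked)

# 2 prompt tokens (masked out) followed by 3 response tokens.
logprobs = [-0.1, -0.2, -1.0, -2.0, -3.0]
mask     = [0,    0,    1,    1,    1]
loss = sft_loss(logprobs, mask)  # mean of 1.0, 2.0, 3.0 → 2.0
```

In a real training loop the same masking is applied per batch before backpropagation; only the response positions contribute gradients.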
DPO (Direct Preference Optimization)
A training technique that aligns language models with human preferences by directly optimizing on preference data, without needing a separate reward model. DPO simplifies the RLHF pipeline while achieving comparable alignment quality.
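DPO's "directly optimizing on preference data" amounts to a single closed-form loss over the policy's and a frozen reference model's log-probabilities for each chosen/rejected pair. A minimal sketch with scalar summed log-probs standing in for model outputs (the `beta` temperature follows the DPO paper's formulation):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    pi_* : summed log-probs of each response under the policy.
    ref_*: the same under the frozen reference model.
    Minimizing it raises the policy's implicit reward margin
    (log-ratio vs. the reference) for the chosen response."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy identical to the reference: no margin, loss = log 2.
start_loss = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Policy shifted toward the chosen response: loss drops.
improved_loss = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

Because the reward is implicit in the policy/reference log-ratio, no separate reward model or RL rollout loop is needed, which is the simplification over classic RLHF.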
Fine-tuning
The process of continuing to train a pre-trained model on a smaller, task-specific dataset. Fine-tuning customizes model behavior for specific domains or formats and is a key part of most enterprise AI deployments.