Technique · alignment
Reinforcement Learning from Human Feedback (RLHF)
A three-stage recipe (supervised fine-tuning → reward model trained on human comparisons → PPO) that aligns language-model outputs with human preferences. InstructGPT is the canonical reference.
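As a rough illustration of stage two of the recipe (the reward model fit to human comparisons), here is a minimal sketch of the pairwise Bradley–Terry loss used in the InstructGPT paper: the model is pushed to score the human-preferred completion above the rejected one. The `reward_model_loss` helper and the toy scalar rewards are illustrative assumptions, not code from any of the systems listed below.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards: torch.Tensor,
                      rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry) loss: maximize log sigmoid(r_chosen - r_rejected)
    so the reward model ranks the human-preferred completion higher."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage: scalar rewards for a batch of four comparison pairs (hypothetical values).
chosen = torch.tensor([1.2, 0.7, 0.3, 2.1])
rejected = torch.tensor([0.4, 0.9, -0.5, 1.0])
print(reward_model_loss(chosen, rejected))  # loss shrinks as chosen outscores rejected
```

Stage three then optimizes the policy against this learned reward with PPO, typically with a KL penalty keeping the policy close to the SFT model.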
Products deploying: 3
Avg research → prod: 4y
First commercial deploy: 4y
Deployment timeline
- GPT-5.2 Pro · high confidence
Deployed 2026-02-17 · Velocity 4y
“OpenAI's alignment approach for flagship models is built on RLHF, as documented for GPT-4 and previous models.”
- GPT-5.3 · medium confidence
Deployed 2026-02-26 · Velocity 4y
“OpenAI pioneered RLHF with InstructGPT; GPT-5.3 continues this alignment approach.”
- DeepSeek-R1 · high confidence
Deployed 2026-03-17 · Velocity 4y
“Trained via reinforcement learning from human feedback to align with preferences.”