Technique · reasoning
Process Reward Models
Reward models trained to score each intermediate reasoning step rather than only the final answer, providing step-level feedback for training reasoning policies.
Products deploying: 1
Avg research → prod: 3y
First commercial deploy: 3y
Deployment timeline
- DeepSeek-R1 · high
Deployed 2026-03-17 · Velocity 3y
“Uses step-level reward models to evaluate intermediate reasoning steps.”
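
To make the step-level idea concrete, here is a minimal sketch contrasting a per-step (process) reward with a final-answer (outcome) reward. Everything in it is an illustrative assumption: `toy_step_scorer` is a hypothetical rule-based stand-in for a trained reward model, and the helper names and the `"a op b = c"` step format are invented for this example. It does not describe how any listed product implements the technique.

```python
import operator
from typing import Callable, List, Optional

# Supported operators for the toy "a op b = c" step format (an assumption).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def toy_step_scorer(step: str) -> float:
    """Hypothetical stand-in for a trained process reward model.

    Scores one step of the form "a op b = c": 1.0 if the arithmetic
    holds, 0.0 otherwise. A real model would output a learned score.
    """
    left, right = step.split("=")
    a, op, b = left.split()
    return 1.0 if OPS[op](int(a), int(b)) == int(right) else 0.0


def process_rewards(steps: List[str], scorer: Callable[[str], float]) -> List[float]:
    """Process-style reward: one score per intermediate step."""
    return [scorer(step) for step in steps]


def outcome_reward(steps: List[str], expected_answer: int) -> float:
    """Outcome-style reward: looks only at the final step's result."""
    final_value = int(steps[-1].split("=")[1])
    return 1.0 if final_value == expected_answer else 0.0


def first_bad_step(rewards: List[float], threshold: float = 0.5) -> Optional[int]:
    """Index of the first step scoring below the threshold, or None."""
    for index, reward in enumerate(rewards):
        if reward < threshold:
            return index
    return None


# A chain that reaches the expected answer (20) through a faulty middle step.
steps = ["2 + 3 = 5", "5 * 4 = 21", "21 - 1 = 20"]

step_scores = process_rewards(steps, toy_step_scorer)
print(step_scores)                 # [1.0, 0.0, 1.0]
print(outcome_reward(steps, 20))   # 1.0
print(first_bad_step(step_scores)) # 1
print(min(step_scores))            # 0.0 (one possible chain-level aggregate)
```

In this toy case the outcome reward gives full credit because the final number matches, while the per-step scores flag the second step as wrong. Taking the minimum over steps is only one way to collapse step scores into a single chain score; a product or a mean are other options.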