gentic.news — AI News Intelligence Platform

Technique · reasoning

Process Reward Models

Reward models trained to score each intermediate reasoning step rather than only the final answer, which gives a reasoning policy denser, step-level feedback than outcome-only rewards.
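To make the step-level idea concrete, here is a minimal sketch in Python. It assumes a scorer that returns a correctness probability for each step; `toy_scorer` is a hypothetical stand-in for a trained model, and the product and minimum aggregations are illustrative choices rather than a description of any particular system.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: (problem, previous steps, current step) -> P(step is correct).
StepScorer = Callable[[str, Sequence[str], str], float]


def score_steps(problem: str, steps: Sequence[str], scorer: StepScorer) -> List[float]:
    """Score every intermediate step, each one conditioned on the steps before it."""
    return [scorer(problem, steps[:i], step) for i, step in enumerate(steps)]


def aggregate(step_scores: Sequence[float], how: str = "prod") -> float:
    """Collapse per-step scores into a single solution-level score."""
    if not step_scores:
        raise ValueError("a solution needs at least one step")
    if how == "prod":
        return math.prod(step_scores)
    if how == "min":
        return min(step_scores)
    raise ValueError(f"unknown aggregation: {how}")


def best_of_n(problem: str, candidates: Sequence[Sequence[str]],
              scorer: StepScorer, how: str = "prod") -> Sequence[str]:
    """Return the candidate solution with the highest aggregated step score."""
    return max(candidates, key=lambda steps: aggregate(score_steps(problem, steps, scorer), how))


def toy_scorer(problem: str, previous: Sequence[str], step: str) -> float:
    """Assumed stand-in for a trained model: checks claims of the form 'a + b = c'."""
    left, _, right = step.partition("=")
    try:
        a, b = (int(term) for term in left.split("+"))
        return 0.95 if a + b == int(right) else 0.05
    except ValueError:
        return 0.5  # step is not a parseable arithmetic claim


# Usage: the second chain reaches the same final answer through a wrong first step.
sound = ["2 + 3 = 5", "5 + 4 = 9"]
flawed = ["2 + 3 = 6", "5 + 4 = 9"]
chosen = best_of_n("What is 2 + 3 + 4?", [flawed, sound], toy_scorer)
```

Both chains end in the same final answer, so a reward on the final answer alone could not separate them. The per-step scores penalize the flawed first step, and `best_of_n` picks the sound chain.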

Origin: OpenAI, 2023-05
Also known as: PRM, Let's Verify Step by Step
Products deploying: 1
Avg research → prod: 3y
First commercial deploy: 3y

Deployment timeline

  1. DeepSeek-R1

    Deployed 2026-03-17 · Velocity 3y

    Uses step-level reward models to evaluate intermediate reasoning steps.

    high

Techniques built on this
