gentic.news — AI News Intelligence Platform

Technique · alignment

Deep RL from Human Preferences

Learns a reward function from pairwise human comparisons of agent behavior instead of relying on a hand-coded reward, then trains the policy against that learned reward. The direct precursor to RLHF.
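To make the idea concrete, here is a minimal sketch of fitting a reward model to pairwise comparisons. It assumes a linear reward over hand-picked features and a Bradley-Terry style preference probability over summed segment rewards; the function names, the synthetic "human" labels, and the hyperparameters are all hypothetical choices for illustration. The original work uses neural network reward predictors and trains an RL policy against them, which this sketch leaves out.

```python
import math
import random


def segment_return(weights, segment):
    """Sum of predicted per-step rewards over a segment (a list of feature vectors)."""
    return sum(sum(w * x for w, x in zip(weights, step)) for step in segment)


def preference_prob(weights, seg_a, seg_b):
    """Bradley-Terry style probability that seg_a is preferred over seg_b."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    # Numerically stable logistic function of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def train_reward_model(comparisons, n_features, lr=0.1, epochs=200):
    """Fit linear reward weights by gradient descent on the cross-entropy loss.

    comparisons: list of (seg_a, seg_b, label) where label is 1.0 if the
    labeler preferred seg_a and 0.0 if they preferred seg_b.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for seg_a, seg_b, label in comparisons:
            p = preference_prob(weights, seg_a, seg_b)
            for i in range(n_features):
                # Gradient of the loss w.r.t. weight i:
                # (p - label) * (summed feature i in seg_a - summed feature i in seg_b)
                feat_diff = sum(s[i] for s in seg_a) - sum(s[i] for s in seg_b)
                grad[i] += (p - label) * feat_diff
        weights = [w - lr * g / len(comparisons) for w, g in zip(weights, grad)]
    return weights


def make_synthetic_comparisons(true_weights, n_pairs=200, seg_len=5, seed=0):
    """Stand-in for human labels: prefer the segment with the higher true return."""
    rng = random.Random(seed)
    n = len(true_weights)

    def rand_segment():
        return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(seg_len)]

    data = []
    for _ in range(n_pairs):
        a, b = rand_segment(), rand_segment()
        better_a = segment_return(true_weights, a) > segment_return(true_weights, b)
        data.append((a, b, 1.0 if better_a else 0.0))
    return data


if __name__ == "__main__":
    data = make_synthetic_comparisons([1.0, -2.0])
    learned = train_reward_model(data, n_features=2)
    print("learned reward weights:", learned)
```

Only the ordering of segments is observed, so the learned weights recover the direction of the hidden reward rather than its scale. In a full pipeline the learned reward would replace the environment reward when training the policy.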

Origin: OpenAI, 2017-06
Also known as: Preference Learning, Pref-RL
Products deploying: 0
Avg research → prod
First commercial deploy

Deployment timeline

No verified deployments yet in our tracked product set.