gentic.news — AI News Intelligence Platform

Technique · alignment

Direct Preference Optimization (DPO)

Aligns language models to preference data by directly optimizing a closed-form likelihood-ratio objective, eliminating the separate reward model and RL loop of RLHF.
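The objective above can be sketched concretely: for each preference pair, DPO maximizes the sigmoid of a scaled margin between the policy's and a frozen reference model's log-likelihood ratios for the chosen versus rejected response. Below is a minimal per-example sketch of that loss; the function name, argument names, and `beta` default are illustrative assumptions, not an official API, and the log-probabilities are assumed to be summed token log-likelihoods of each full response.

```python
import math

def dpo_loss(pi_logp_chosen: float, pi_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss (illustrative sketch).

    Loss = -log sigmoid(beta * margin), where the margin is the
    difference of policy-vs-reference log-ratios for the chosen
    and rejected responses. beta controls KL-like regularization
    strength toward the reference model.
    """
    margin = ((pi_logp_chosen - ref_logp_chosen)
              - (pi_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably here
    return math.log1p(math.exp(-beta * margin))
```

When the policy matches the reference exactly, the margin is zero and the loss is log 2; the loss falls as the policy raises the chosen response's likelihood relative to the rejected one, which is the behavior the closed-form objective is designed to induce.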

Origin: Stanford, 2023-05 · Read origin paper →
Also known as: DPO
Products deploying: 0
Avg research → prod: n/a
First commercial deploy: n/a

Deployment timeline

No verified deployments yet in our tracked product set.