Subgraph Atlas · centered on entity: Direct Preference Optimization (DPO)
technique · 0 mentions · velocity: stable
Aligns language models to preference data by directly optimizing a closed-form likelihood ratio, eliminating the separate reward model and RL loop of RLHF.
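The summary above can be made concrete with a minimal sketch of the DPO objective for a single preference pair. This is an illustrative pure-Python sketch, not the reference implementation: the function name, argument names, and the example log-probabilities and beta value are all assumptions for demonstration.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed token log-probability of a full response
    under the trainable policy (logp_*) or the frozen reference model
    (ref_logp_*).
    """
    # Implicit rewards are beta-scaled log-likelihood ratios vs. the reference.
    chosen_logratio = logp_chosen - ref_logp_chosen
    rejected_logratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # Negative log-sigmoid of the margin: the Bradley-Terry preference
    # likelihood, so no learned reward model or RL loop is needed.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Example (made-up numbers): the policy favors the chosen response more
# than the reference does, so the margin is positive and the loss drops
# below log(2), the value at a zero margin.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.5, beta=0.5)
```

In a real trainer this scalar would be averaged over a batch and backpropagated through `logp_chosen` and `logp_rejected` only; the reference log-probabilities are constants.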
Two-hop subgraph: this entity, every entity it directly relates to, and every entity those neighbors relate to. Drag a node, scroll to zoom, or click a node to inspect it; click any neighbor to re-center the atlas there.
Node types: company · person · ai_model · product · research_lab · benchmark · framework
Top connections
Reinforcement Learning from Human Feedback (RLHF) · technique · 4 mentions
Direct Preference Optimization: Your Language Model is Secretly a Reward Model · paper · 0 mentions
Stanford · company · 0 mentions
Identity Preference Optimization (IPO) · technique · 0 mentions
KTO (Kahneman-Tversky Optimization) · technique · 0 mentions
Self-Rewarding Language Models · technique · 0 mentions