Subgraph Atlas · centered on entity: Direct Preference Optimization (DPO)
technique · 0 mentions · velocity: stable
Aligns language models to preference data by directly optimizing a closed-form likelihood ratio, eliminating the separate reward model and RL loop of RLHF.
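The summary above can be made concrete with a minimal sketch of the DPO objective for a single preference pair. This is an illustrative pure-Python sketch, not the reference implementation: the function name, argument names, and the example log-probabilities and beta value are all assumptions for demonstration.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed token log-probability of a full response
    under the trainable policy (logp_*) or the frozen reference model
    (ref_logp_*).
    """
    # Implicit rewards are beta-scaled log-likelihood ratios vs. the reference.
    chosen_logratio = logp_chosen - ref_logp_chosen
    rejected_logratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # Negative log-sigmoid of the margin: the Bradley-Terry preference
    # likelihood, so no learned reward model or RL loop is needed.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Example (made-up numbers): the policy favors the chosen response more
# than the reference does, so the margin is positive and the loss drops
# below log(2), the value at a zero margin.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.5, beta=0.5)
```

In a real trainer this scalar would be averaged over a batch and backpropagated through `logp_chosen` and `logp_rejected` only; the reference log-probabilities are constants.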
Two-hop subgraph: this entity, every entity it directly relates to, and every entity those neighbors relate to. Drag a node, scroll to zoom, or click a node to inspect it; click any neighbor to re-center the atlas there.
Node types: company · person · ai_model · product · research_lab · benchmark · framework
Top connections
Reinforcement Learning from Human Feedback (RLHF) · technique · 4 mentions
Direct Preference Optimization: Your Language Model is Secretly a Reward Model · paper · 0 mentions
Stanford · company · 0 mentions
Identity Preference Optimization (IPO) · technique · 0 mentions
KTO (Kahneman-Tversky Optimization) · technique · 0 mentions
Self-Rewarding Language Models · technique · 0 mentions