Subgraph Atlas · centered on entity
Reinforcement Learning from Human Feedback (RLHF)
technique · 4 mentions · velocity: stable
A three-stage recipe (supervised fine-tuning (SFT) → reward model trained on human preference comparisons → reinforcement learning with PPO) that aligns language-model outputs with human preferences. InstructGPT is the canonical reference.
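The two RLHF-specific quantities in the recipe above can be shown with a toy sketch. The first is the reward-model loss on a single human comparison, a Bradley-Terry style pairwise loss. The second is the per-sample reward used in the PPO stage: the reward-model score minus a KL-style penalty that keeps the policy close to the SFT model. This is a hypothetical illustration only. It assumes scalar reward-model scores and per-sequence log-probabilities have already been computed. The function names and the `beta` default are illustrative choices, not taken from InstructGPT or any library. SFT, the PPO update itself, and any pretraining-mix term are omitted.

```python
import math

# Hypothetical toy sketch of two RLHF ingredients; not a real training loop.

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Reward-model loss for one human comparison: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); split by sign so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_shaped_reward(rm_score: float, logp_policy: float, logp_sft: float,
                     beta: float = 0.02) -> float:
    """PPO-stage reward: reward-model score minus beta times a single-sample
    estimate of KL(policy || SFT), i.e. log pi_policy(y|x) - log pi_sft(y|x).
    The beta default here is an illustrative assumption."""
    return rm_score - beta * (logp_policy - logp_sft)
```

When the reward model scores both responses equally, `pairwise_reward_loss(0.0, 0.0)` equals log 2 (about 0.693). The loss shrinks as the preferred response's score pulls ahead and grows when it falls behind. In `kl_shaped_reward`, a larger `beta` penalizes drift from the SFT model more strongly.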
Two-hop subgraph: this entity, every entity it directly relates to, and every entity those neighbors relate to.
Top connections
OpenAI · company · 510 mentions
large language models · technology · 221 mentions
GPT-5.3 · AI model · 39 mentions
GPT-5.2 Pro · AI model · 13 mentions
DeepSeek-R1 · AI model · 8 mentions
Constitutional AI · technique · 3 mentions
AI Development · research topic · 2 mentions
Training language models to follow instructions with human feedback · paper · 0 mentions