Reinforcement Learning from Human Feedback (RLHF)
Type: technique · Status: stable
A three-stage recipe (supervised fine-tuning (SFT) → reward model trained on human comparisons → PPO) that aligns language-model outputs with human preferences. InstructGPT is the canonical reference.
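The two learned objectives in that recipe are compact enough to sketch. Below is a minimal, illustrative PyTorch sketch of the stage-2 pairwise reward-model loss and the stage-3 PPO clipped policy loss; the function names, toy tensors, and hyperparameters are assumptions made for illustration, not the InstructGPT implementation.

```python
# Minimal sketch of the learned RLHF objectives (illustrative, not InstructGPT code).
import torch
import torch.nn.functional as F

# Stage 1 (SFT) is ordinary cross-entropy on demonstration data and is omitted here.

# Stage 2: reward model trained on human comparisons with a pairwise
# (Bradley-Terry) loss: the chosen completion should outscore the rejected one.
def reward_model_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Stage 3: PPO-style policy update against reward-model scores using the
# clipped surrogate objective. (InstructGPT additionally adds a per-token
# KL penalty against the SFT policy to the reward; omitted in this sketch.)
def ppo_policy_loss(logp_new: torch.Tensor,
                    logp_old: torch.Tensor,
                    advantages: torch.Tensor,
                    clip_eps: float = 0.2) -> torch.Tensor:
    ratio = torch.exp(logp_new - logp_old)          # importance ratio pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()    # maximize surrogate -> minimize its negative

if __name__ == "__main__":
    # Toy tensors standing in for per-sequence reward scores and log-probabilities.
    rm_loss = reward_model_loss(torch.tensor([1.2, 0.4]), torch.tensor([0.3, 0.9]))
    pg_loss = ppo_policy_loss(torch.tensor([-1.0, -2.0]),
                              torch.tensor([-1.1, -1.9]),
                              torch.tensor([0.5, -0.2]))
    print(f"reward-model loss: {rm_loss.item():.3f}  ppo loss: {pg_loss.item():.3f}")
```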
Total Mentions: 4
Sentiment: -0.07 (Neutral)
Velocity (7d): 0.0%
Signal Radar
Five-axis snapshot of this entity's footprint
Mentions × Lab Attention
Weekly mentions (solid) and average article relevance (dotted)
Timeline
No timeline events recorded yet.
Relationships
Invented By: 13
Uses
Introduces
Prior Art
Deploys
Recent Articles
No articles found for this entity.
Predictions
No predictions linked to this entity.
AI Discoveries
No AI agent discoveries for this entity.
Sentiment History
Weekly average sentiment (positive vs. negative), 2026-W13 through 2026-W15, on a -1 to +1 scale; values are tabulated below.
| Week | Avg Sentiment | Mentions |
|---|---|---|
| 2026-W13 | -0.10 | 2 |
| 2026-W15 | -0.20 | 1 |