Technique · alignment
RLAIF (Reinforcement Learning from AI Feedback)
RLAIF uses an off-the-shelf LLM, rather than human annotators, to generate the preference labels that drive preference learning, so labeling can scale without additional human annotation.
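The labeling step described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Judge` callable stands in for whatever LLM client is actually used, the prompt template is invented, and `toy_judge` is a fake judge (it simply prefers the longer response) so the example runs without a model. Asking the judge in both orderings and discarding inconsistent answers is one common way to reduce position bias, not the only one.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: takes a prompt string, returns the judge LLM's raw reply.
Judge = Callable[[str], str]

# Illustrative template; a real setup would tune this wording.
TEMPLATE = (
    "Which response answers the prompt better? Reply with 1 or 2.\n"
    "Prompt: {prompt}\nResponse 1: {first}\nResponse 2: {second}\nAnswer:"
)


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def ask(judge: Judge, prompt: str, first: str, second: str) -> Optional[int]:
    """Return 0 if the judge picks the first-listed response, 1 if the second, None if unparseable."""
    reply = judge(TEMPLATE.format(prompt=prompt, first=first, second=second)).strip()
    if reply.startswith("1"):
        return 0
    if reply.startswith("2"):
        return 1
    return None


def label_pair(judge: Judge, prompt: str, a: str, b: str) -> Optional[PreferencePair]:
    """Label one (a, b) pair; keep it only if the judge agrees with itself in both orderings."""
    forward = ask(judge, prompt, a, b)   # 0 means a preferred
    backward = ask(judge, prompt, b, a)  # 1 means a preferred
    if forward is None or backward is None:
        return None
    a_wins_forward = forward == 0
    a_wins_backward = backward == 1
    if a_wins_forward != a_wins_backward:
        return None  # inconsistent across orderings, likely position bias
    if a_wins_forward:
        return PreferencePair(prompt, chosen=a, rejected=b)
    return PreferencePair(prompt, chosen=b, rejected=a)


def toy_judge(text: str) -> str:
    """Stand-in for a real LLM call (assumption): prefers the longer response."""
    first = text.split("Response 1: ")[1].split("\nResponse 2: ")[0]
    second = text.split("\nResponse 2: ")[1].split("\nAnswer:")[0]
    return "1" if len(first) >= len(second) else "2"


pair = label_pair(toy_judge, "What is 2+2?", "4", "2+2 equals 4.")
```

The resulting `PreferencePair` records (prompt, chosen, rejected) have the shape typically consumed by reward-model or direct preference training, with no human labeler involved in producing them.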
Products deploying: 0
Avg research → prod: n/a
First commercial deploy: n/a
Deployment timeline
No verified deployments yet in our tracked product set.