gentic.news — AI News Intelligence Platform

Technique · alignment

Self-Rewarding Language Models

Iterative alignment where the LM judges its own outputs using an LLM-as-a-judge prompt, removing human-labeled preferences from the loop.
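As a rough illustration of the loop described above, the sketch below has a single model both generate candidate responses and score them through a judge prompt, then keeps the highest- and lowest-scored responses as a preference pair. The names `JUDGE_TEMPLATE`, `parse_score`, and `build_preference_pairs` are hypothetical, the judge wording and the 0 to 5 scale are assumptions made for the example, and `lm` stands in for any text-in, text-out model call; none of this is taken from the origin paper's code.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical judge prompt; the wording and the 0-5 scale are assumptions.
JUDGE_TEMPLATE = (
    "Review the response to the user's prompt and rate it from 0 to 5.\n"
    "Prompt: {prompt}\n"
    "Response: {response}\n"
    "End with the line 'Score: <number>'."
)


def parse_score(judgement: str) -> Optional[float]:
    """Extract the numeric score from the judge output, or None if absent."""
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", judgement)
    return float(match.group(1)) if match else None


def build_preference_pairs(
    lm: Callable[[str], str],
    prompts: List[str],
    n_candidates: int = 4,
) -> List[Tuple[str, str, str]]:
    """Return (prompt, chosen, rejected) triples labeled by the model itself."""
    pairs = []
    for prompt in prompts:
        scored = []
        for _ in range(n_candidates):
            response = lm(prompt)
            # The same model acts as the judge of its own response.
            judgement = lm(JUDGE_TEMPLATE.format(prompt=prompt, response=response))
            score = parse_score(judgement)
            if score is not None:
                scored.append((score, response))
        if len(scored) < 2:
            continue  # not enough scored candidates to form a pair
        best = max(scored, key=lambda item: item[0])
        worst = min(scored, key=lambda item: item[0])
        if best[0] > worst[0]:  # skip ties, which carry no preference signal
            pairs.append((prompt, best[1], worst[1]))
    return pairs
```

The resulting triples would then feed a preference-optimization step, and the updated model would repeat the process in the next iteration, so no human-labeled preferences enter the loop.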

Origin: Meta AI, 2024-01
Also known as: Self-Rewarding
Products deploying: 0

Deployment timeline

No verified deployments yet in our tracked product set.