Technique · alignment
Self-Rewarding Language Models
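Iterative alignment in which the language model scores its own candidate responses with an LLM-as-a-judge prompt and is then trained on the resulting preference pairs, removing human-labeled preferences from the loop beyond the initial seed data.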
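As a rough illustration of the data-building step in one iteration, the sketch below samples several candidate responses per prompt, scores each with a judge function, and keeps the highest- and lowest-scored responses as a (chosen, rejected) pair. The names `build_preference_pairs`, `generate`, `judge`, and `toy_judge` are hypothetical and used only for illustration. In the actual technique both `generate` and `judge` would be calls to the same language model, with `judge` wrapping an LLM-as-a-judge prompt; here they are plain callables so the example runs without a model.

```python
import random
from typing import Callable, List, Tuple

# (prompt, chosen response, rejected response)
PreferencePair = Tuple[str, str, str]


def build_preference_pairs(
    prompts: List[str],
    generate: Callable[[str], str],
    judge: Callable[[str, str], float],
    n_candidates: int = 4,
) -> List[PreferencePair]:
    """Build self-scored preference pairs for one training iteration.

    `generate` and `judge` are stand-ins (assumptions) for calls to the
    same model: one samples a response, the other scores a response.
    """
    pairs: List[PreferencePair] = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_candidates)]
        scores = [judge(prompt, c) for c in candidates]
        best = max(range(n_candidates), key=scores.__getitem__)
        worst = min(range(n_candidates), key=scores.__getitem__)
        # Skip prompts where the judge cannot separate the candidates.
        if scores[best] > scores[worst]:
            pairs.append((prompt, candidates[best], candidates[worst]))
    return pairs


def toy_judge(prompt: str, response: str) -> float:
    """Toy scorer: longer responses score higher (illustration only)."""
    return float(len(response))


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_generate(prompt: str) -> str:
        # Toy sampler: a response of random length (illustration only).
        return "word " * rng.randint(1, 5)

    for pair in build_preference_pairs(["Explain DPO."], toy_generate, toy_judge):
        print(pair)
```

The resulting pairs would then feed a preference-optimization step (for example DPO), and the updated model would serve as both generator and judge in the next iteration.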
Products deploying: 0
Avg research → prod: —
First commercial deploy: —
Deployment timeline
No verified deployments yet in our tracked product set.