gentic.news — AI News Intelligence Platform

Technique · training

Rejection Sampling Fine-Tuning

Sampling multiple completions per prompt, scoring them with a reward model, and fine-tuning on the top-scoring samples: a simpler alternative to PPO, used in Llama 2's RLHF pipeline.
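
The sample, score, and select loop described above can be sketched in a few lines of Python. This is a minimal illustration and not Meta's implementation. The names `rejection_sample`, `toy_generate`, and `toy_reward` are hypothetical stand-ins for a real policy model and reward model, and keeping only the single best completion per prompt is an assumption made for brevity.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],       # hypothetical: samples one completion
    reward: Callable[[str, str], float],  # hypothetical: scores (prompt, completion)
    k: int = 4,
) -> List[Tuple[str, str]]:
    """Return one (prompt, best completion) pair per prompt."""
    dataset = []
    for prompt in prompts:
        # Draw k candidate completions from the current policy.
        candidates = [generate(prompt) for _ in range(k)]
        # Keep the candidate the reward model scores highest (assumption: top-1).
        best = max(candidates, key=lambda c: reward(prompt, c))
        dataset.append((prompt, best))
    return dataset


# Toy stand-ins so the sketch runs without any model.
_rng = random.Random(0)


def toy_generate(prompt: str) -> str:
    words = [_rng.choice(["good", "bad", "ok"]) for _ in range(3)]
    return prompt + " " + " ".join(words)


def toy_reward(prompt: str, completion: str) -> float:
    return completion.count("good") - completion.count("bad")


sft_data = rejection_sample(["Q1", "Q2"], toy_generate, toy_reward, k=8)
```

The resulting `sft_data` pairs would then be fed to an ordinary supervised fine-tuning step, which is omitted here. Unlike PPO, the loop needs no value function or policy-gradient update, only generation, scoring, and a standard fine-tuning pass.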

Origin: Meta AI, 2023-07
Also known as: RSFT, RS sampling
Products deploying: 0

Deployment timeline

No verified deployments yet in our tracked product set.
