Technique · training

Rejection Sampling Fine-Tuning

Sampling multiple completions per prompt, scoring them with a reward model, and fine-tuning on the highest-scoring samples: a simpler alternative to PPO, used alongside PPO in Llama 2's RLHF pipeline.
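The loop can be sketched as follows. This is a minimal illustration with several stated assumptions. A toy categorical "policy" over four canned responses stands in for an LLM. A lookup-table reward model stands in for a learned one. A simple mixing update toward the empirical distribution of kept samples stands in for gradient-based supervised fine-tuning. `rejection_sample_dataset`, `fine_tune`, `toy_reward`, `make_sampler` and `rsft` are hypothetical names chosen for this sketch and are not Meta's code.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Toy stand-ins (hypothetical): a categorical "policy" over canned responses
# and a fixed reward table. A real setup would use an LLM and a learned reward model.
RESPONSES = ["bad", "ok", "good", "great"]
TOY_REWARD = {"bad": 0.0, "ok": 0.3, "good": 0.7, "great": 1.0}


def toy_reward(prompt: str, completion: str) -> float:
    """Hypothetical reward model: looks up a fixed score and ignores the prompt."""
    return TOY_REWARD[completion]


def rejection_sample_dataset(
    prompts: List[str],
    sample_fn: Callable[[str, int], List[str]],
    reward_fn: Callable[[str, str], float],
    k: int = 4,
    top_n: int = 1,
) -> List[Tuple[str, str]]:
    """Sample k completions per prompt and keep only the top_n by reward."""
    dataset: List[Tuple[str, str]] = []
    for prompt in prompts:
        candidates = sample_fn(prompt, k)
        # Rank candidates by reward-model score, highest first.
        ranked = sorted(candidates, key=lambda y: reward_fn(prompt, y), reverse=True)
        # Everything below the top_n cut is rejected.
        dataset.extend((prompt, y) for y in ranked[:top_n])
    return dataset


def fine_tune(policy: Dict[str, float], dataset: List[Tuple[str, str]], lr: float) -> Dict[str, float]:
    """Stand-in for SFT: move the policy toward the maximum-likelihood fit of the
    kept completions (their empirical frequencies), mixed in with rate lr."""
    if not dataset:
        return dict(policy)
    counts = Counter(y for _, y in dataset)
    return {r: (1 - lr) * p + lr * counts[r] / len(dataset) for r, p in policy.items()}


def make_sampler(policy: Dict[str, float], rng: random.Random) -> Callable[[str, int], List[str]]:
    """Return a sampler that draws n completions from the given (frozen) policy."""
    responses, weights = list(policy), list(policy.values())

    def sample_fn(prompt: str, n: int) -> List[str]:
        return rng.choices(responses, weights=weights, k=n)

    return sample_fn


def rsft(rounds: int = 5, k: int = 4, lr: float = 0.5, seed: int = 0) -> Dict[str, float]:
    """Run several rounds: sample from the current policy, keep the best, fine-tune."""
    rng = random.Random(seed)
    policy = {r: 1 / len(RESPONSES) for r in RESPONSES}
    prompts = [f"prompt-{i}" for i in range(16)]
    for _ in range(rounds):
        data = rejection_sample_dataset(prompts, make_sampler(policy, rng), toy_reward, k=k)
        policy = fine_tune(policy, data, lr)
    return policy


if __name__ == "__main__":
    print(rsft())  # probability mass should concentrate on "great"
```

Keeping only the best of k samples shifts the training data toward high-reward outputs, and each fine-tuning round raises the policy's probability of producing them. This is why the method can push toward higher reward using only a supervised objective, without PPO's value network or clipped policy updates.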

Origin: Meta AI, 2023-07 · Also known as: RSFT, RS sampling

Products deploying: n/a

Avg research → prod: n/a

First commercial deploy: n/a

Deployment timeline

No verified deployments yet in our tracked product set.