Recipe · DeepSeek-R1
DeepSeek-R1 is a 671-billion-parameter reasoning model developed by DeepSeek, trained via reinforcement learning to achieve state-of-the-art performance on coding and reasoning benchmarks.
Ingredient list
Invented by Google DeepMind · 2024-08 · Velocity 1.6y
“Employs iterative refinement and multiple reasoning samples at inference time.”
reasoning · high
Invented by OpenAI · 2023-05 · Velocity 3y
“Uses step-level reward models to evaluate intermediate reasoning steps.”
reasoning · high
Invented by Google · 2022-03 · Velocity 4y
“Uses majority voting over multiple reasoning paths to improve answer accuracy.”
reasoning · high
Invented by OpenAI · 2022-03 · Velocity 4y
“Trained via reinforcement learning from human feedback to align with preferences.”
alignment · high
Invented by Google · 2022-01 · Velocity 4y
“DeepSeek-R1 is a reasoning model that generates step-by-step reasoning traces.”
reasoning · high
Invented by Google · 2017-06 · Velocity 9y
“Based on transformer architecture with self-attention mechanisms.”
architecture · high
Invented by Google · 2017-01 · Velocity 9y
“671B parameter model uses sparse mixture-of-experts architecture.”
architecture · high
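
To illustrate the majority-voting ingredient quoted in the list above, here is a minimal Python sketch. It is not DeepSeek's implementation. The function name `majority_vote` and the sample answers are hypothetical, and the sketch assumes the final answer has already been extracted as a string from each sampled reasoning path.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among sampled reasoning paths.

    `answers` is a list of final-answer strings, one per sampled path
    (hypothetical input format, assumed for this illustration).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Counter.most_common sorts by count; equal counts keep first-seen order.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical final answers from five sampled reasoning paths.
sampled_answers = ["42", "41", "42", "42", "17"]
print(majority_vote(sampled_answers))  # prints 42
```

Ties resolve to whichever answer was sampled first, which is a simplification chosen for this sketch. A real system might break ties by model confidence or by drawing more samples.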
This recipe is part of the gentic.news Deployment Atlas. Every ingredient has an origin paper + evidence. Methodology is public. Dataset is CC BY 4.0.