Recipe

DeepSeek-R1

DeepSeek-R1 is a 671-billion-parameter sparse mixture-of-experts reasoning model developed by DeepSeek. It was trained with reinforcement learning and targets state-of-the-art performance on coding and reasoning benchmarks.

Techniques inside: 7

Median research → prod: 1.6y

Ingredient list

#1 Test-Time Compute Scaling
Invented by Google DeepMind · 2024-08 · Velocity 1.6y
“Employs iterative refinement and multiple reasoning samples at inference time.”
reasoning · high
#2 Process Reward Models
Invented by OpenAI · 2023-05 · Velocity 3y
“Uses step-level reward models to evaluate intermediate reasoning steps.”
reasoning · high
#3 Self-Consistency
Invented by Google · 2022-03 · Velocity 4y
“Uses majority voting over multiple reasoning paths to improve answer accuracy.”
reasoning · high
#4 Reinforcement Learning from Human Feedback (RLHF)
Invented by OpenAI · 2022-03 · Velocity 4y
“Trained via reinforcement learning from human feedback to align with preferences.”
alignment · high
#5 Chain-of-Thought Prompting
Invented by Google · 2022-01 · Velocity 4y
“DeepSeek-R1 is a reasoning model that generates step-by-step reasoning traces.”
reasoning · high
#6 Transformer Self-Attention
Invented by Google · 2017-06 · Velocity 9y
“Based on transformer architecture with self-attention mechanisms.”
architecture · high
#7 Mixture of Experts (Sparse MoE for LLMs)
Invented by Google · 2017-01 · Velocity 9y
“671B parameter model uses sparse mixture-of-experts architecture.”
architecture · high
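
The Self-Consistency ingredient in the list above is described as majority voting over multiple reasoning paths. A minimal Python sketch of that voting step is shown here for illustration. It assumes the final answers have already been extracted from each sampled reasoning trace; the function name `majority_vote` and the sample answers are hypothetical, and this is not taken from DeepSeek-R1's own code.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common final answer and its share of the votes.

    `answers` is a list of final answers, one per sampled reasoning path.
    Ties are broken in favor of the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


# Hypothetical final answers extracted from five sampled reasoning paths.
sampled_answers = ["42", "42", "41", "42", "40"]
answer, share = majority_vote(sampled_answers)
print(answer, share)
```

The vote share can serve as a rough agreement signal: a low share suggests the sampled paths disagree and the answer deserves less trust.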

This recipe is part of the gentic.news Deployment Atlas. Every ingredient has an origin paper + evidence. Methodology is public. Dataset is CC BY 4.0.