Recipe

DeepSeek-V3

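DeepSeek-V3, developed by DeepSeek, is a mixture-of-experts language model with 671B total parameters, of which 37B are activated per token. Its full training run took about 2.788M H800 GPU hours, a fraction of the compute budget of comparable systems, while still delivering strong performance.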
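A mixture-of-experts model gets its efficiency from sparse activation: each token passes through only a few experts, not all of them. The sketch below shows simplified top-k softmax gating in the spirit of the 2017 sparse-MoE work credited further down (no noise term, no load balancing). It is not DeepSeek-V3's actual router, whose details differ and are not described in this recipe. The names `moe_forward` and `router_weights` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    router_weights: Sequence[Vector],  # hypothetical: one routing vector per expert
    experts: Sequence[Expert],         # each expert maps a vector to a same-sized vector
    k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Route token x to its top-k experts and mix their outputs (simplified sketch)."""
    # Router score for each expert: dot product of the token with its routing vector.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Select the k highest-scoring experts (a stable sort keeps the lower index on ties).
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise the gate weights over the selected experts only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)  # only the k chosen experts are ever evaluated
        out = [o + gate * yj for o, yj in zip(out, y)]
    return out, chosen
```

With four experts and `k=2`, each token pays for two expert evaluations instead of four. At much larger scale, the same idea lets a sparse model keep per-token compute far below what its total parameter count would suggest.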

Ingredient list

#1 YaRN RoPE Context Extension
Invented by Nous Research · 2023-08 · Velocity 3y
“DeepSeek-V3 uses YaRN for extended context length.”
architecture · high
#2 Grouped-Query Attention (GQA)
Invented by Google · 2023-05 · Velocity 3y
“DeepSeek-V3 uses Grouped-Query Attention (GQA).”
architecture · high
#3 FlashAttention
Invented by Stanford · 2022-05 · Velocity 4y
“DeepSeek-V3 uses FlashAttention-2 for efficient training.”
inference · high
#4 Rotary Position Embedding (RoPE)
Invented by Zhuiyi Technology · 2021-04 · Velocity 5y
“DeepSeek-V3 uses Rotary Position Embedding (RoPE).”
architecture · high
#5 Mixture of Experts (Sparse MoE for LLMs)
Invented by Google · 2017-01 · Velocity 9y
“DeepSeek-V3 is a highly efficient mixture-of-experts language model.”
architecture · high
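RoPE, listed above, encodes position by rotating each consecutive pair of query/key dimensions through an angle proportional to the token's position. As a result, the dot product between a rotated query and a rotated key depends only on their relative offset. Below is a minimal sketch of the RoFormer formulation with the usual frequency base of 10000. The function names are illustrative. YaRN, also listed above, extends the usable context by rescaling these per-pair rotation frequencies; that rescaling is not shown here.

```python
import math
from typing import List, Sequence


def rope(x: Sequence[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate pairs (x[2i], x[2i+1]) by angle position * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)  # per-pair rotation frequency
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2-D rotation of the pair (a, b).
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product, standing in for an attention score."""
    return sum(a * b for a, b in zip(u, v))
```

For example, with any even-length `q` and `k`, `dot(rope(q, 5), rope(k, 2))` matches `dot(rope(q, 13), rope(k, 10))` up to floating-point error, because both pairs are three positions apart.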

This recipe is part of the gentic.news Deployment Atlas. Every ingredient has an origin paper and supporting evidence. The methodology is public, and the dataset is released under CC BY 4.0.