Recipe · Mistral Small 4
Mistral Small 4, developed by Mistral AI, is a 119B-parameter Mixture of Experts model that unifies reasoning, multimodal, and agentic capabilities.
Ingredient list
vLLM / PagedAttention · Invented by UC Berkeley · 2023-09 · Velocity 3y
“Mistral recommends vLLM for serving, which uses PagedAttention.”
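The core idea behind PagedAttention is to store the KV cache in fixed-size physical blocks addressed through a per-sequence block table, like virtual memory pages. A toy sketch of that block-table bookkeeping, with illustrative sizes (real vLLM manages GPU memory pools and shares blocks across sequences via copy-on-write):

```python
import numpy as np

BLOCK_SIZE = 16   # tokens per physical KV block (illustrative; vLLM's default is 16)

class PagedKVCache:
    """Toy block-table KV cache in the spirit of PagedAttention (sketch only)."""

    def __init__(self, num_blocks, head_dim):
        # One shared physical pool; blocks are handed out on demand.
        self.pool = np.zeros((num_blocks, BLOCK_SIZE, head_dim), dtype=np.float32)
        self.free = list(range(num_blocks))
        self.block_table = []   # logical block index -> physical block id
        self.length = 0         # tokens written so far

    def append(self, kv_vec):
        slot = self.length % BLOCK_SIZE
        if slot == 0:           # current logical block is full: map a fresh one
            self.block_table.append(self.free.pop(0))
        self.pool[self.block_table[-1], slot] = kv_vec
        self.length += 1

    def gather(self):
        # Re-assemble the logically contiguous cache for attention.
        blocks = self.pool[self.block_table].reshape(-1, self.pool.shape[-1])
        return blocks[: self.length]
```

Because sequences only ever hold whole blocks, memory fragmentation is bounded by one partially filled block per sequence, which is what lets vLLM pack many requests into one GPU.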
inference · high

YaRN · Invented by Nous Research · 2023-08 · Velocity 3y
“Mistral Small 4 uses YaRN for 128K context length.”
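YaRN extends RoPE context by interpolating rotary frequencies unevenly: high-frequency dimensions are left alone, low-frequency ones are fully scaled down, and a ramp blends the middle ("NTK-by-parts"), plus a logit temperature correction. A hedged sketch with made-up parameter values (not Mistral's actual config), assuming numpy:

```python
import numpy as np

def yarn_frequencies(dim=64, base=10000.0, orig_ctx=4096, scale=32.0,
                     alpha=1.0, beta=32.0):
    """Sketch of YaRN's NTK-by-parts frequency interpolation (illustrative values)."""
    freqs = base ** (-np.arange(0, dim, 2) / dim)   # standard RoPE frequencies
    wavelengths = 2 * np.pi / freqs
    ratio = orig_ctx / wavelengths                  # full rotations over the original context
    ramp = np.clip((ratio - alpha) / (beta - alpha), 0.0, 1.0)
    # ramp == 1 -> keep as-is (high freq); ramp == 0 -> divide by scale (low freq)
    return freqs / scale * (1 - ramp) + freqs * ramp

def yarn_mscale(scale):
    # YaRN's attention temperature: logits scaled by 0.1 * ln(s) + 1.
    return 0.1 * np.log(scale) + 1.0
```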
architecture · high

Grouped-Query Attention (GQA) · Invented by Google · 2023-05 · Velocity 3y
“Mistral models use Grouped-Query Attention (GQA).”
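GQA shares one K/V head across a group of query heads, shrinking the KV cache by the group factor while keeping many query heads. A minimal sketch of the score computation, assuming numpy:

```python
import numpy as np

def gqa_scores(q, k, n_kv_heads):
    """Grouped-Query Attention scores (minimal sketch).

    q: (n_q_heads, seq, d), k: (n_kv_heads, seq, d).
    Each group of n_q_heads // n_kv_heads query heads shares one KV head,
    cutting KV-cache size by the same factor.
    """
    n_q_heads = q.shape[0]
    group = n_q_heads // n_kv_heads
    k_rep = np.repeat(k, group, axis=0)   # broadcast KV heads to match Q heads
    d = q.shape[-1]
    return q @ k_rep.transpose(0, 2, 1) / np.sqrt(d)
```

With `n_kv_heads == n_q_heads` this reduces to standard multi-head attention; with `n_kv_heads == 1` it is multi-query attention.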
architecture · high

FlashAttention · Invented by Stanford · 2022-05 · Velocity 4y
“Mistral's inference stack supports FlashAttention.”
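FlashAttention's key trick is the online softmax: K/V are streamed in tiles while a running max and normalizer are maintained, so the full seq × seq score matrix never exists in memory. A single-query-row numpy sketch of that recurrence (the real kernel is a fused CUDA implementation):

```python
import numpy as np

def flash_attention(q, k, v, block=32):
    """Tiled attention with an online softmax (sketch of the FlashAttention idea)."""
    d = q.shape[-1]
    out = np.zeros_like(q, dtype=np.float64)
    for i in range(q.shape[0]):
        m = -np.inf            # running max of scores seen so far
        l = 0.0                # running softmax normalizer
        acc = np.zeros(d)      # running weighted sum of values
        for s in range(0, k.shape[0], block):
            scores = q[i] @ k[s:s + block].T / np.sqrt(d)
            m_new = max(m, scores.max())
            p = np.exp(scores - m_new)
            correction = np.exp(m - m_new)   # rescale old partial results
            l = l * correction + p.sum()
            acc = acc * correction + p @ v[s:s + block]
            m = m_new
        out[i] = acc / l
    return out
```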
inference · high

Rotary Position Embeddings (RoPE) · Invented by Zhuiyi Technology · 2021-04 · Velocity 5y
“Mistral models use Rotary Position Embeddings (RoPE).”
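RoPE rotates each pair of dimensions by an angle proportional to the token position, so relative offsets show up as phase differences in query-key dot products. A minimal single-vector sketch, assuming numpy:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply Rotary Position Embeddings to one vector (minimal sketch).

    Dimension pairs (2i, 2i+1) are rotated by angle pos * theta_i, so the
    dot product of two rotated vectors depends only on their relative offset.
    """
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x, dtype=np.float64)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out
```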
architecture · high

Mixture of Experts (MoE) · Invented by Google · 2017-01 · Velocity 9y
“Mistral Small 4 is a 119B-parameter Mixture of Experts model.”
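In a sparsely-gated MoE layer, a router selects a few experts per token, so only a fraction of the total parameters are active on any forward pass. A minimal per-token top-k routing sketch (expert shapes and gate values are illustrative), assuming numpy:

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Sparsely-gated Mixture-of-Experts layer for one token (sketch).

    The router keeps only the top_k experts by gate logit; their outputs are
    combined with softmax weights renormalized over the selected experts.
    """
    logits = x @ gate_w                        # one logit per expert
    top = np.argsort(logits)[-top_k:]          # indices of the chosen experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                   # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))
```

Because the unselected experts never run, total parameter count (the "119B" figure) can grow far beyond the per-token compute cost.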
architecture · high
This recipe is part of the gentic.news Deployment Atlas. Every ingredient has an origin paper + evidence. Methodology is public. Dataset is CC BY 4.0.