Recipe · Llama 4 Scout
Meta's first natively multimodal open-weight MoE model: 17B active / 109B total parameters, 16 experts, and an industry-leading 10M-token context window. It handles text and images, supports 12 languages, and runs on a single H100 GPU with Int4 quantization.
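A minimal sketch of the single-GPU deployment claim, assuming the Hugging Face transformers library with bitsandbytes 4-bit quantization; the Hub model id below is an assumption based on Meta's published checkpoints and may differ in your environment.

```python
# Sketch: load Llama 4 Scout with Int4 (4-bit) weights so the 109B-total
# checkpoint fits on a single H100 (80 GB). Text-only usage shown; image
# input goes through the model's processor/vision path instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed Hub id

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # Int4 weight quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on available GPU memory
)

prompt = tokenizer("Mixture-of-experts models are", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**prompt, max_new_tokens=20)[0]))
```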
Ingredient list
LLaVA (visual instruction tuning) · Invented by University of Wisconsin · 2023-04 · Velocity 2.0y
“Natively multimodal (text+image) open-weight model, similar to LLaVA's approach of projecting vision features into LLM.”
Tag: multimodal · Confidence: medium
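To make this ingredient concrete, here is a hedged sketch of the LLaVA-style projection idea: a small MLP (all dimensions here are illustrative, not Scout's actual config) maps frozen vision-encoder patch features into the LLM's token-embedding space, where they are consumed as ordinary tokens.

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Hypothetical LLaVA-style projector: maps frozen vision-encoder
    patch features into the LLM token-embedding space so image patches
    can be fed to the LLM alongside text embeddings."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 5120):
        super().__init__()
        # LLaVA-1.5 used a 2-layer MLP; dims here are illustrative.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, vision_dim)
        return self.proj(patch_feats)  # (batch, num_patches, llm_dim)

# 576 patches = a 24x24 grid, as in LLaVA; the projected "vision tokens"
# are concatenated with text embeddings before entering the LLM.
vision_tokens = VisionProjector()(torch.randn(1, 576, 1024))
```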
RoPE (rotary position embedding) · Invented by Zhuiyi Technology · 2021-04 · Velocity 4y
“Based on Llama 3 architecture which uses RoPE; Llama 4 Scout is a direct evolution.”
Tag: architecture · Confidence: high
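A minimal RoPE sketch, using the rotate-halves formulation found in Llama-family code; the dimension and base are illustrative. Each channel pair is rotated by a position-dependent angle, so query/key dot products depend only on relative position.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Minimal rotary position embedding (RoFormer, Su et al. 2021).
    x: (seq_len, dim) with even dim."""
    seq_len, dim = x.shape
    half = dim // 2
    # Per-channel rotation frequencies, geometrically spaced.
    freqs = torch.pow(base, -torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied to each (x1, x2) channel pair.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = rope(torch.randn(8, 64))  # apply to queries (and likewise to keys)
```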
CLIP (contrastive vision-language pre-training) · Invented by OpenAI · 2021-02 · Velocity 4y
“Multimodal (text+image) capability suggests use of vision-language alignment similar to CLIP.”
Tag: multimodal · Confidence: medium
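The CLIP ingredient reduces to a symmetric contrastive loss over a batch of image/text pairs, where matched pairs sit on the diagonal of the similarity matrix. A minimal sketch; embedding size and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive objective in the style of CLIP."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature   # (batch, batch) similarities
    targets = torch.arange(len(logits))          # matched pairs on the diagonal
    # Classify the right text for each image, and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

loss = clip_loss(torch.randn(32, 512), torch.randn(32, 512))
```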
Transformer · Invented by Google · 2017-06 · Velocity 8y
“All Llama models are Transformer-based; Llama 4 Scout is described as a multimodal MoE model.”
Tag: architecture · Confidence: high
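For completeness, the Transformer's core operation is scaled dot-product attention; a minimal causal (decoder-style) sketch with illustrative shapes.

```python
import math
import torch

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
              causal: bool = True) -> torch.Tensor:
    """Scaled dot-product attention (Vaswani et al. 2017). q, k, v: (seq, dim)."""
    scores = q @ k.T / math.sqrt(q.shape[-1])
    if causal:  # decoder-only LLMs mask out future positions
        mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

out = attention(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64))
```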
Mixture-of-Experts (MoE) · Invented by Google · 2017-01 · Velocity 8y
“Meta's first natively multimodal open-weight MoE model with 17B active / 109B total params, 16 experts”
Tag: architecture · Confidence: high
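The MoE ingredient is what produces the 17B-active / 109B-total split: a router activates only a subset of the 16 experts for each token, so most parameters sit idle on any given forward pass. A toy sketch of sparsely-gated top-k routing (sizes and top_k are illustrative, not Scout's actual configuration).

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Toy sparsely-gated MoE layer (Shazeer et al. 2017): a learned
    router sends each token to its top-k experts, so only a fraction of
    the total parameters is active per token."""
    def __init__(self, dim: int = 512, num_experts: int = 16, top_k: int = 1):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim)
        gates = self.router(x).softmax(dim=-1)         # (tokens, num_experts)
        weights, idx = gates.topk(self.top_k, dim=-1)  # chosen experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

y = SparseMoE()(torch.randn(4, 512))
```

With top_k=1 of 16 experts, each token exercises roughly 1/16 of the expert parameters, which is the same mechanism behind Scout running 17B active parameters out of 109B total.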
This recipe is part of the gentic.news Deployment Atlas. Every ingredient has an origin paper + evidence. Methodology is public. Dataset is CC BY 4.0.