Recipe · Llama 4 Scout
Meta's first natively multimodal open-weight MoE model: 17B active / 109B total parameters, 16 experts, and an industry-leading 10M-token context window. It handles text and images, supports 12 languages, and runs on a single H100 GPU with Int4 quantization.
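A minimal sketch of the single-GPU deployment claim, assuming the Hugging Face transformers library with bitsandbytes 4-bit quantization; the Hub model id below is an assumption based on Meta's published checkpoints and may differ in your environment.

```python
# Sketch: load Llama 4 Scout with Int4 (4-bit) weights so the 109B-total
# checkpoint fits on a single H100 (80 GB). Text-only usage shown; image
# input goes through the model's processor/vision path instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed Hub id

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # Int4 weight quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on available GPU memory
)

prompt = tokenizer("Mixture-of-experts models are", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**prompt, max_new_tokens=20)[0]))
```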
Ingredient list
LLaVA (visual instruction tuning) · Invented by University of Wisconsin · 2023-04 · Velocity 2.0y
“Natively multimodal (text+image) open-weight model, similar to LLaVA's approach of projecting vision features into LLM.”
Tag: multimodal · Confidence: medium
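To make this ingredient concrete, here is a hedged sketch of the LLaVA-style projection idea: a small MLP (all dimensions here are illustrative, not Scout's actual config) maps frozen vision-encoder patch features into the LLM's token-embedding space, where they are consumed as ordinary tokens.

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Hypothetical LLaVA-style projector: maps frozen vision-encoder
    patch features into the LLM token-embedding space so image patches
    can be fed to the LLM alongside text embeddings."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 5120):
        super().__init__()
        # LLaVA-1.5 used a 2-layer MLP; dims here are illustrative.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, vision_dim)
        return self.proj(patch_feats)  # (batch, num_patches, llm_dim)

# 576 patches = a 24x24 grid, as in LLaVA; the projected "vision tokens"
# are concatenated with text embeddings before entering the LLM.
vision_tokens = VisionProjector()(torch.randn(1, 576, 1024))
```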
RoPE (rotary position embedding) · Invented by Zhuiyi Technology · 2021-04 · Velocity 4y
“Based on Llama 3 architecture which uses RoPE; Llama 4 Scout is a direct evolution.”
Tag: architecture · Confidence: high
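A minimal RoPE sketch, using the rotate-halves formulation found in Llama-family code; the dimension and base are illustrative. Each channel pair is rotated by a position-dependent angle, so query/key dot products depend only on relative position.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Minimal rotary position embedding (RoFormer, Su et al. 2021).
    x: (seq_len, dim) with even dim."""
    seq_len, dim = x.shape
    half = dim // 2
    # Per-channel rotation frequencies, geometrically spaced.
    freqs = torch.pow(base, -torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied to each (x1, x2) channel pair.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = rope(torch.randn(8, 64))  # apply to queries (and likewise to keys)
```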
CLIP (contrastive vision-language pre-training) · Invented by OpenAI · 2021-02 · Velocity 4y
“Multimodal (text+image) capability suggests use of vision-language alignment similar to CLIP.”
Tag: multimodal · Confidence: medium
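The CLIP ingredient reduces to a symmetric contrastive loss over a batch of image/text pairs, where matched pairs sit on the diagonal of the similarity matrix. A minimal sketch; embedding size and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive objective in the style of CLIP."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature   # (batch, batch) similarities
    targets = torch.arange(len(logits))          # matched pairs on the diagonal
    # Classify the right text for each image, and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

loss = clip_loss(torch.randn(32, 512), torch.randn(32, 512))
```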
Transformer · Invented by Google · 2017-06 · Velocity 8y
“All Llama models are Transformer-based; Llama 4 Scout is described as a multimodal MoE model.”
Tag: architecture · Confidence: high
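For completeness, the Transformer's core operation is scaled dot-product attention; a minimal causal (decoder-style) sketch with illustrative shapes.

```python
import math
import torch

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
              causal: bool = True) -> torch.Tensor:
    """Scaled dot-product attention (Vaswani et al. 2017). q, k, v: (seq, dim)."""
    scores = q @ k.T / math.sqrt(q.shape[-1])
    if causal:  # decoder-only LLMs mask out future positions
        mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

out = attention(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64))
```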
Mixture-of-Experts (MoE) · Invented by Google · 2017-01 · Velocity 8y
“Meta's first natively multimodal open-weight MoE model with 17B active / 109B total params, 16 experts”
Tag: architecture · Confidence: high
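The MoE ingredient is what produces the 17B-active / 109B-total split: a router activates only a subset of the 16 experts for each token, so most parameters sit idle on any given forward pass. A toy sketch of sparsely-gated top-k routing (sizes and top_k are illustrative, not Scout's actual configuration).

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Toy sparsely-gated MoE layer (Shazeer et al. 2017): a learned
    router sends each token to its top-k experts, so only a fraction of
    the total parameters is active per token."""
    def __init__(self, dim: int = 512, num_experts: int = 16, top_k: int = 1):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim)
        gates = self.router(x).softmax(dim=-1)         # (tokens, num_experts)
        weights, idx = gates.topk(self.top_k, dim=-1)  # chosen experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

y = SparseMoE()(torch.randn(4, 512))
```

With top_k=1 of 16 experts, each token exercises roughly 1/16 of the expert parameters, which is the same mechanism behind Scout running 17B active parameters out of 109B total.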
This recipe is part of the gentic.news Deployment Atlas. Every ingredient has an origin paper + evidence. Methodology is public. Dataset is CC BY 4.0.