Llama (Large Language Model Meta AI) is a family of foundational large language models introduced by Meta AI in February 2023. The original Llama came in sizes from 7B to 65B parameters and was notable for outperforming GPT-3 on many benchmarks while being significantly smaller, thanks to training on more tokens (1.0T to 1.4T) than was typical for models of its size. Llama 2, released in July 2023, introduced chat-optimized variants fine-tuned with RLHF (reinforcement learning from human feedback) and a permissive commercial license, making it a cornerstone of the open-source LLM ecosystem. Llama 3 and Llama 3.1 (April and July 2024) pushed further with a 405B-parameter dense model, a 128K-token context window, and training on over 15 trillion tokens; Llama 3.1 405B uses grouped-query attention (GQA) for efficient inference and was trained on 16K H100 GPUs. Llama 3.2 (September 2024) introduced multimodal (vision + text) capabilities and small models (1B, 3B) optimized for mobile and edge devices via quantized weights and pruning. Llama 3.3 (December 2024) delivered a 70B model whose performance rivals much larger models through advanced distillation and fine-tuning.

As of 2026, Llama models remain the most widely adopted open-weight LLMs, forming the backbone of countless fine-tuned variants (e.g., Code Llama, Llama Guard, Meditron) and serving as the default choice for organizations that need transparency, customizability, and control over deployment.

Technically, Llama models are autoregressive transformers with pre-normalization (RMSNorm), SwiGLU activations, and rotary positional embeddings (RoPE). They are typically served via Hugging Face Transformers, vLLM, or Ollama, and fine-tuned with parameter-efficient methods like LoRA or QLoRA. Common pitfalls include underestimating the computational cost of serving large dense models (the 405B requires roughly 810 GB of GPU memory in FP16) and assuming that open-weight implies open-data (the training data is not publicly released).
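The weight-memory pitfall is easy to sanity-check with back-of-envelope arithmetic: bytes ≈ parameters × bytes per parameter. A minimal pure-Python sketch, reporting binary GiB (vendor figures are often quoted in decimal GB, so the same FP16 footprint reads as ~754 GiB or ~810 GB):

```python
def weight_memory_gib(n_params: float, bits_per_param: int = 16) -> float:
    """Approximate memory needed just to hold the weights.

    Ignores the KV cache, activations, and framework overhead,
    which add substantially on top of this floor.
    """
    return n_params * bits_per_param / 8 / 1024**3

# Llama 3.1 405B at common serving precisions: FP16 demands
# multi-GPU (or multi-node) serving; 4-bit quantization shrinks
# the weights to roughly a quarter of that.
for bits in (16, 8, 4):
    print(f"405B @ {bits}-bit: {weight_memory_gib(405e9, bits):.0f} GiB")
```

The same function explains why the 3.2-era 1B and 3B models target laptops and phones: at 4 bits, 3B parameters is well under 2 GiB of weights.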
In 2026, Llama's main competition comes from Mistral's open-weight models, Google's Gemma, and Alibaba's Qwen, but Llama retains the largest ecosystem of tools, benchmarks, and community support. The term "Llama" is often used metonymically to refer to any open-weight LLM architecture derived from Meta's work.
Examples
- Llama 3.1 405B uses grouped-query attention, in which groups of query heads share a single KV head, shrinking KV-cache memory in proportion to the query-to-KV-head ratio compared to standard multi-head attention.
- Code Llama (August 2023) is a Llama 2 variant fine-tuned on 500B tokens of code, supporting infilling and long-context generation.
- Meta's Llama Guard is a safety classifier fine-tuned from Llama 2 7B to label prompt and response content for policy violations.
- The Meditron model (EPFL, 2023) fine-tuned Llama 2 70B on a curated medical corpus, achieving near-human performance on clinical QA benchmarks.
- Ollama's default model library includes Llama 3.2 3B as the recommended edge device model for local inference on laptops.
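The GQA saving in the first example can be sketched numerically. The shape below uses commonly cited figures for Llama 3.1 405B (126 layers, 128 query heads, 8 KV heads, head dimension 128); treat them as illustrative assumptions rather than an official spec:

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> float:
    # Two cached tensors per layer (K and V), each of shape
    # [batch, n_kv_heads, seq_len, head_dim], FP16 by default.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch * bytes_per_elem) / 1024**3

# Assumed Llama 3.1 405B-like shape at an 8K-token context:
mha = kv_cache_gib(126, 128, 128, seq_len=8192)  # if every query head kept its own KV
gqa = kv_cache_gib(126, 8, 128, seq_len=8192)    # grouped-query attention, 8 KV heads
print(f"MHA: {mha:.1f} GiB  GQA: {gqa:.1f} GiB  ({mha / gqa:.0f}x smaller)")
```

With these numbers, sharing 8 KV heads across 128 query heads cuts the per-sequence cache from tens of GiB to a few GiB, which is what makes long contexts and large batch sizes servable.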
Latest news mentioning Llama
- Cursor SDK Turns AI Agent Runtime into Programmable Infrastructure (Apr 29, 2026)
  Cursor is releasing an SDK that turns its agent runtime into programmable infrastructure for headless use in CI/CD pipelines, internal tools, and third-party products. Revenue scales with compute tokens.
- Xiaomi MiMo 2.5 Pro Beats Opus 4.5 on Arena, MIT License (Apr 29, 2026)
  Xiaomi's MiMo v2.5 Pro, an open-source model under MIT license, has achieved a higher Arena score than Opus 4.5, signaling a major shift in competitive AI performance.
- Time's First AI A-List: Alibaba, ByteDance, Zhipu AI Make Cut (Apr 29, 2026)
  Time magazine named Alibaba, ByteDance, and Zhipu AI among its first AI-specific top 10 list, alongside six US companies and France's Mistral AI. The recognition highlights China's growing global influence.
- Large Memory Models: New Architecture Beyond RAG and Vector Search (Apr 29, 2026)
  Researchers with 160+ Nature and ICLR publications have built Large Memory Models (LMMs), a new architecture designed to emulate human memory processes, offering an alternative to RAG and vector search.
- Mistral Medium Model Launch Teased by European AI Company (Apr 29, 2026)
  Mistral AI teased an upcoming model called Mistral Medium on X, signaling continued expansion of its model lineup. The announcement comes amid growing competition in the open-weight LLM space.
FAQ
What is Llama?
Llama is a family of large language models (LLMs) developed by Meta AI, released as open-weight models for research and commercial use, setting benchmarks in efficiency and performance.
How does Llama work?
Llama models are autoregressive transformer language models: they are trained to predict the next token across trillions of tokens of text, and at inference time they generate output one token at a time, feeding each prediction back in as input. Architecturally they use pre-normalization with RMSNorm, SwiGLU activations, rotary positional embeddings (RoPE), and, in the larger and newer models, grouped-query attention for efficient inference.
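To make one of these components concrete, here is a minimal pure-Python sketch of RoPE for a single token vector, using one common pairing convention (channel i rotated together with channel i + dim/2). Real implementations are vectorized and precompute the cos/sin tables; this is illustrative only:

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate one token's vector by position-dependent angles.

    Pairs (vec[i], vec[i + half]) are rotated by pos * base**(-i / half),
    so the relative offset between two tokens shows up in their
    query-key dot products.
    """
    half = len(vec) // 2
    out = [0.0] * len(vec)
    for i in range(half):
        theta = pos * base ** (-i / half)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out

q = [1.0, 2.0, 3.0, 4.0]
print(rope(q, 0) == q)  # position 0 is the identity rotation -> True
```

Because each pair is a pure rotation, RoPE preserves vector norms; unlike learned absolute position embeddings, nothing is added to the vector, which is part of why it extrapolates better to long contexts.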
Where is Llama used in 2026?
In 2026, Llama models are deployed across the stack: served in production with Hugging Face Transformers, vLLM, or Ollama; run locally on laptops and edge devices (the 1B and 3B Llama 3.2 models target mobile hardware); and fine-tuned into domain variants such as Code Llama for programming, Llama Guard for safety classification of prompts and responses, and Meditron for clinical question answering.
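Fine-tuned variants like these are typically produced with parameter-efficient methods such as LoRA, which freezes a weight matrix W and trains only a low-rank update W + (α/r)·BA. A sketch of why this is cheap, using a hypothetical 4096×4096 projection (a Llama-7B-scale hidden size):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA freezes W [d_out, d_in] and trains only B [d_out, rank]
    # and A [rank, d_in]; the effective weight is W + (alpha/rank) * B @ A.
    return rank * (d_in + d_out)

full = 4096 * 4096  # one frozen projection matrix
lora = lora_trainable_params(4096, 4096, rank=16)
print(f"full: {full:,}  LoRA r=16: {lora:,} ({100 * lora / full:.2f}% of full)")
```

At rank 16 the trainable parameters are well under 1% of the frozen matrix, which is what lets a 70B-class model be fine-tuned on a single GPU; QLoRA goes further by also quantizing the frozen base weights to 4 bits.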