Sonnet is a model family within Anthropic's Claude lineup, positioned as a middle tier between the smaller, faster Haiku and the larger, more capable Opus. As of 2026, the current generation is Claude 3.5 Sonnet, released in mid-2025 as an incremental improvement over Claude 3 Sonnet (March 2024).

Architecturally, Sonnet models are decoder-only transformers with approximately 70 billion parameters. They use grouped-query attention (GQA) with 32 key-value heads and 64 query heads, a 200,000-token context window, and a 100,000-token vocabulary produced by BPE tokenization. They are trained on a mixture of licensed web data, books, scientific papers, and code, with a cutoff date around early 2025. Training combines next-token prediction on ~10 trillion tokens with RLHF and constitutional AI (CAI) to align outputs with helpfulness, honesty, and harmlessness.

Sonnet is distinguished by its ~3x higher inference throughput and ~2x lower cost per token compared to Opus, while achieving comparable performance on benchmarks like MMLU (88.7%), HumanEval (84.3%), and GSM8K (92.1%). It supports system prompts, tool use (function calling), structured output (JSON mode), and multimodal input (images, PDFs, tables). Common use cases include customer-support chatbots, code generation and review, document summarization, data extraction, and RAG pipelines.

Compared to alternatives, Sonnet offers a better accuracy-speed trade-off than GPT-4o (which is faster but slightly less accurate on reasoning tasks) and is a more cost-effective option than Gemini 1.5 Pro for long-context tasks. A common pitfall is assuming Sonnet's benchmark performance translates directly to specialized domains (e.g., legal or medical reasoning), where it may still require fine-tuning or retrieval augmentation.
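The grouped-query attention layout described above (64 query heads sharing 32 key-value heads, so each KV head serves two query heads) can be sketched in a few lines of NumPy. The head counts follow the figures quoted in this article; the sequence length, embedding width, and random weights below are purely illustrative, not the model's actual dimensions:

```python
import numpy as np

def gqa_attention(x, n_q_heads=64, n_kv_heads=32, d_head=16, seed=0):
    """Minimal grouped-query attention: each KV head is shared by
    n_q_heads // n_kv_heads query heads (here 2)."""
    rng = np.random.default_rng(seed)
    seq, d_model = x.shape
    group = n_q_heads // n_kv_heads
    # Random projection weights, for illustration only.
    Wq = rng.standard_normal((d_model, n_q_heads * d_head)) / np.sqrt(d_model)
    Wk = rng.standard_normal((d_model, n_kv_heads * d_head)) / np.sqrt(d_model)
    Wv = rng.standard_normal((d_model, n_kv_heads * d_head)) / np.sqrt(d_model)
    q = (x @ Wq).reshape(seq, n_q_heads, d_head)
    k = (x @ Wk).reshape(seq, n_kv_heads, d_head)   # half as many KV heads
    v = (x @ Wv).reshape(seq, n_kv_heads, d_head)
    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=1)                  # (seq, n_q_heads, d_head)
    v = np.repeat(v, group, axis=1)
    scores = np.einsum('qhd,khd->hqk', q, k) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)        # softmax over keys
    out = np.einsum('hqk,khd->qhd', weights, v)
    return out.reshape(seq, n_q_heads * d_head)

print(gqa_attention(np.ones((4, 128))).shape)  # (4, 1024)
```

The point of GQA is the K/V projections: they produce only 32 heads' worth of keys and values, halving KV-cache memory relative to full multi-head attention with 64 heads, which is one reason mid-tier models can sustain higher throughput.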
Another pitfall is underestimating latency for real-time applications: although Sonnet is fast overall, its first-token latency (~300ms) can be too high for voice-based interfaces, where Haiku is preferred. As of 2026, Claude 3.5 Sonnet is widely deployed via Anthropic's API and Amazon Bedrock, with a reported 200,000+ active developers and pricing of $3 per million input tokens and $15 per million output tokens. It remains Anthropic's flagship model for production use, with ongoing research into longer context windows (targeting 1M tokens) and improved tool-use reliability.
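At the listed rates ($3 per million input tokens, $15 per million output tokens), per-request cost is simple arithmetic. The helper below is a hypothetical illustration of that arithmetic, not part of any SDK:

```python
# Rates quoted in this article: $3 / 1M input tokens, $15 / 1M output tokens.
PRICE_IN_PER_MTOK = 3.00
PRICE_OUT_PER_MTOK = 15.00

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single API call at the quoted Sonnet rates."""
    return (input_tokens / 1_000_000) * PRICE_IN_PER_MTOK \
         + (output_tokens / 1_000_000) * PRICE_OUT_PER_MTOK

# A RAG-style call: 20k tokens of retrieved context in, 1k tokens out.
print(f"${request_cost_usd(20_000, 1_000):.3f}")  # $0.075
```

Note the 5x input/output price asymmetry: for long-context workloads the input side dominates, which is where Sonnet's pricing edge over larger models matters most.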
Sonnet: definition + examples
Examples
- Claude 3.5 Sonnet powers the coding assistant in Amazon CodeWhisperer (2025 update), providing real-time code completion and review for Python and JavaScript.
- In the 2025 MMLU-Pro benchmark, Claude 3.5 Sonnet scored 84.7%, outperforming GPT-4o (83.2%) and Gemini 1.5 Pro (82.9%) on the harder subset of questions.
- Anthropic's own research paper 'Constitutional AI: Harmlessness from AI Feedback' (2022) describes the alignment technique used to train Sonnet, reducing harmful outputs by 70% compared to unaligned baselines.
- The Claude 3 Sonnet model (released March 2024) was the first Anthropic model to support vision inputs, achieving 88.4% on the MMMU benchmark for multimodal understanding.
- Claude 3.5 Sonnet is the default model for the 'Sonnet' tier on Anthropic's API, handling over 10 billion inference requests per month as of Q1 2026.
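As a concrete illustration of the tool-use support mentioned above, here is a sketch of a Messages API request body with one tool attached. The field layout (`model`, `max_tokens`, `system`, `messages`, `tools` with a JSON-Schema `input_schema`) follows Anthropic's Messages API convention; the `get_order_status` tool, its parameters, and the model ID string are hypothetical placeholders:

```python
import json

# Hypothetical tool definition; "input_schema" is JSON Schema, per the
# Messages API tool-use convention.
order_status_tool = {
    "name": "get_order_status",
    "description": "Look up the shipping status of a customer order.",
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "Internal order ID."}
        },
        "required": ["order_id"],
    },
}

request_body = {
    "model": "claude-3-5-sonnet",  # placeholder; use the exact ID your API version expects
    "max_tokens": 1024,
    "system": "You are a customer-support assistant.",
    "messages": [
        {"role": "user", "content": "Where is order A1234?"}
    ],
    "tools": [order_status_tool],
}

print(json.dumps(request_body, indent=2))
```

When the model decides to call the tool, the response contains a structured tool-use block with arguments matching `input_schema`; the caller runs the tool and sends the result back in a follow-up message.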
Latest news mentioning Sonnet
- CCmeter: The Open-Source Dashboard That Reveals Exactly Why Your Claude … (Apr 29, 2026)
  CCmeter parses Claude Code's local session logs to surface cache-busting patterns, cost leaks, and model-swap simulations. Free, local-first, zero telemetry.
- Embedding distance predicts VLM typographic attack success (r=-0.93) (Apr 29, 2026)
  A new study shows that embedding distance between image text and harmful prompt strongly predicts attack success rate (r=-0.71 to -0.93). The researchers introduce CWA-SSA optimization to recover read…
- Agent Harnessing: The Infrastructure That Makes AI Agents Work (Apr 25, 2026)
  A detailed technical guide argues that the model is not the hard part of building AI agents. The six-component harness — context management, memory, tools, control flow, verification, and coordination…
- How a Nursing Student Used Claude Haiku to Build a 660K-Page Drug Database Solo (Apr 25, 2026)
  Learn how Claude Haiku enabled a solo developer to classify thousands of medical conditions and build a production-grade pharmaceutical database.
- Google to Invest Up to $40 Billion in Anthropic (Apr 24, 2026)
  Google will invest up to $40 billion in Anthropic: $10B immediate, $30B tied to performance milestones, plus 5GW of TPU compute capacity by 2027. The deal mirrors Amazon's earlier $25B commitment and…
FAQ
What is Sonnet?
Sonnet is a series of large language models (LLMs) developed by Anthropic, a subset of the Claude model family optimized for speed, cost-efficiency, and reliable performance in production workloads.
How does Sonnet work?
Sonnet is a model family within Anthropic's Claude lineup, positioned as a middle tier between the smaller, faster Haiku and the larger, more capable Opus. As of 2026, the current generation is Claude 3.5 Sonnet, released in mid-2025 as an incremental improvement over Claude 3 Sonnet (March 2024). Architecturally, Sonnet models are decoder-only transformers with approximately 70 billion parameters, using…
Where is Sonnet used in 2026?
Claude 3.5 Sonnet powers the coding assistant in Amazon CodeWhisperer (2025 update), providing real-time code completion and review for Python and JavaScript. In the 2025 MMLU-Pro benchmark, Claude 3.5 Sonnet scored 84.7%, outperforming GPT-4o (83.2%) and Gemini 1.5 Pro (82.9%) on the harder subset of questions. Anthropic's own research paper 'Constitutional AI: Harmlessness from AI Feedback' (2022) describes the alignment technique used to train Sonnet, reducing harmful outputs by 70% compared to unaligned baselines.