Sonnet is a model family within Anthropic's Claude lineup, positioned as the middle tier between the smaller, faster Haiku and the larger, more capable Opus. Claude 3 Sonnet launched in March 2024 and Claude 3.5 Sonnet followed in June 2024, with an upgraded version in October 2024. Anthropic has since shipped newer Sonnet generations, including Claude 3.7 Sonnet and the Claude Sonnet 4 series, so Claude 3.5 Sonnet is no longer the current generation as of 2026. The figures below refer to Claude 3.5 Sonnet unless noted.

Anthropic has not published architectural details for Sonnet. The models are generally understood to be decoder-only transformers, but the parameter counts (such as ~70 billion), attention-head configurations, vocabulary sizes, and training-token counts that circulate online are unconfirmed estimates. What is documented is a 200,000-token context window, a proprietary training mix of publicly available web data, third-party licensed data, and internally generated data (with a training cutoff of April 2024 for Claude 3.5 Sonnet), and alignment through reinforcement learning from human feedback (RLHF) combined with Constitutional AI (CAI), which steers outputs toward helpfulness, honesty, and harmlessness.

Anthropic describes Claude 3.5 Sonnet as running at about twice the speed of Claude 3 Opus, and at $3 per million input tokens against Claude 3 Opus's $15 it costs one fifth as much per token. On Anthropic's published benchmarks it matched or exceeded Claude 3 Opus: MMLU (88.7%, 5-shot), HumanEval (92.0%), and GSM8K (96.4%). It supports system prompts, tool use (function calling), structured JSON output, and multimodal input (images and PDFs, including the tables and charts inside them).

Common use cases include customer support chatbots, code generation and review, document summarization, data extraction, and retrieval-augmented generation (RAG) pipelines. Comparisons with alternatives depend on the task and the model version: Sonnet is often cited as competitive with GPT-4o on reasoning and coding benchmarks and with Gemini 1.5 Pro on long-context tasks, but accuracy, latency, and cost should be measured on the target workload rather than assumed.

A common pitfall is assuming that Sonnet's benchmark performance translates directly to specialized domains (e.g., legal or medical reasoning), where it may still require fine-tuning or retrieval augmentation.
Another pitfall is underestimating latency for real-time applications: although Sonnet is fast, its time to first token (roughly 300 ms in some reports, varying with load and prompt length) can be too high for voice-based interfaces, where Haiku is often preferred. Claude 3.5 Sonnet was offered through Anthropic's API, Amazon Bedrock, and Google Cloud Vertex AI, priced at $3 per million input tokens and $15 per million output tokens. Developer counts such as the 200,000+ figure sometimes quoted are not official numbers. Within the Claude family, Opus rather than Sonnet is the top-capability tier, while Sonnet is the tier most often positioned for production workloads. Work on longer context has continued, with a 1M-token context window offered in beta for later Sonnet models, alongside improvements to tool-use reliability.
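The per-token pricing quoted above can be turned into a rough budget estimate. The following is a minimal sketch, not an official calculator: the function name `estimate_cost_usd` and the example workload (1,000 requests per day, each with about 2,000 input and 500 output tokens) are hypothetical, and the rates are simply the $3 and $15 per million tokens cited in the text. Actual bills depend on the model version and on features such as caching or batching.

```python
# Rates taken from the text above, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate: float = INPUT_USD_PER_MTOK,
    output_rate: float = OUTPUT_USD_PER_MTOK,
) -> float:
    """Return the estimated cost in USD for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Rates are per million tokens, so scale the weighted sum down by 1e6.
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1,000 requests/day, 2,000 input + 500 output tokens each.
    requests_per_day = 1_000
    daily = estimate_cost_usd(requests_per_day * 2_000, requests_per_day * 500)
    print(f"Estimated daily cost: ${daily:.2f}")
```

Because output tokens cost five times as much as input tokens at these rates, the 500,000 output tokens in this hypothetical workload account for more of the total than the 2,000,000 input tokens, which is why capping response length is often the first cost lever to try.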
Sonnet: definition + examples
Examples
- Claude 3.5 Sonnet has been reported to power real-time code completion and review for Python and JavaScript in Amazon CodeWhisperer (folded into Amazon Q Developer in 2024), though the specific model behind that product is not confirmed here.
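- On MMLU-Pro, a harder variant of MMLU, Claude 3.5 Sonnet is reported to have outscored GPT-4o and Gemini 1.5 Pro. The scores quoted for this comparison (84.7%, 83.2%, and 82.9% respectively) are higher than publicly reported results, which place Claude 3.5 Sonnet in the mid-to-high 70s, and should be treated as unverified.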
- Anthropic's research paper 'Constitutional AI: Harmlessness from AI Feedback' (2022) describes the alignment technique used in training Claude models, including Sonnet. The 70% reduction in harmful outputs sometimes attributed to it is not a headline result of the paper and should be treated as unverified.
- The Claude 3 family (Haiku, Sonnet, and Opus, announced March 2024) was Anthropic's first to support vision inputs; Claude 3 Sonnet scored 53.1% on the MMMU benchmark for multimodal understanding.
- Sonnet-tier models are served through Anthropic's API under versioned model identifiers. The claim that Claude 3.5 Sonnet handled over 10 billion inference requests per month as of Q1 2026 is not backed by published figures.
Latest news mentioning Sonnet
- Anthropic Deprecates Fixed Thinking Budgets, Forces Adaptive Mode
Anthropic forced adaptive thinking on Claude models, deprecating fixed budgets. Users report quality drops and the change reduces API revenue potential.
May 14, 2026
- Multi-Agent LLM Systems Fail to Outperform Single Models, Study Finds
New paper finds multi-agent LLM systems underperform single models by 2.3% on reasoning benchmarks, challenging a core assumption in AI engineering.
May 13, 2026
- Curl Maintainer Finds 1 CVE, ~20 Bugs via Anthropic's Mythos
Curl maintainer Daniel Stenberg tested Anthropic's Mythos scanner, finding 1 CVE and ~20 bugs. Results validate LLM-based security auditing on real-world code.
May 12, 2026
- Claude Code quota proxy exposes unified Opus/Sonnet pool
A developer's proxy makes Claude Code usage-aware by intercepting hidden rate limit headers. Sonnet and Opus share one quota pool despite separate UI bars.
May 10, 2026
- Claude Code's HTML Output Beats Markdown for LLM-Readable Docs
Claude Code generates HTML docs that LLMs parse more accurately than Markdown, per Thariq's analysis. Trade-off: harder for humans to edit.
May 9, 2026
FAQ
What is Sonnet?
Sonnet is a series of large language models (LLMs) developed by Anthropic, a subset of the Claude model family optimized for speed, cost-efficiency, and reliable performance in production workloads.
How does Sonnet work?
Sonnet models are large transformer-based language models pretrained on next-token prediction and then aligned using RLHF and Constitutional AI. Within the Claude lineup they sit between the smaller, faster Haiku and the larger, more capable Opus. Claude 3 Sonnet was released in March 2024 and Claude 3.5 Sonnet in June 2024, with newer Sonnet generations following. Anthropic has not published parameter counts or other architectural details.
Where is Sonnet used in 2026?
Sonnet is commonly used for customer support chatbots, code generation and review, document summarization, data extraction, and RAG pipelines, and is offered through Anthropic's API and Amazon Bedrock. Reported examples, such as Claude 3.5 Sonnet powering Amazon CodeWhisperer or scoring 84.7% on MMLU-Pro against GPT-4o (83.2%) and Gemini 1.5 Pro (82.9%), are unverified. Anthropic's research paper 'Constitutional AI: Harmlessness from AI Feedback' (2022) describes the alignment technique used in training Claude models; the 70% reduction in harmful outputs sometimes attributed to it is likewise unverified.