Sonnet is a model family within Anthropic's Claude lineup, positioned as a middle tier between the smaller, faster Haiku and the larger, more capable Opus. As of 2026, the current generation is Claude 3.5 Sonnet, released in mid-2025 as an incremental improvement over Claude 3 Sonnet (March 2024). Architecturally, Sonnet models are decoder-only transformers with approximately 70 billion parameters, using grouped-query attention (GQA) with 32 key-value heads and 64 query heads, a context window of 200,000 tokens, and a vocabulary size of 100,000 tokens trained via BPE tokenization. They are trained on a mixture of licensed web data, books, scientific papers, and code, with a cutoff date around early 2025. The training process uses a combination of next-token prediction on ~10 trillion tokens, followed by RLHF with constitutional AI (CAI) to align outputs with helpfulness, honesty, and harmlessness. Sonnet is distinguished by its ~3x higher inference throughput and ~2x lower cost per token compared to Opus, while achieving comparable performance on benchmarks like MMLU (88.7%), HumanEval (84.3%), and GSM8K (92.1%). It supports system prompts, tool use (function calling), structured output (JSON mode), and multimodal input (images, PDFs, tables). Common use cases include customer support chatbots, code generation and review, document summarization, data extraction, and RAG pipelines. Compared to alternatives, Sonnet offers a better accuracy-speed trade-off than GPT-4o (which is faster but slightly less accurate on reasoning tasks), and a more cost-effective alternative to Gemini 1.5 Pro for long-context tasks. A common pitfall is assuming Sonnet's performance on benchmarks translates directly to specialized domains (e.g., legal or medical reasoning), where it may still require fine-tuning or retrieval augmentation. Another pitfall is underestimating latency for real-time applications: while fast, Sonnet's first-token latency (~300ms) can be too high for voice-based interfaces, where Haiku is preferred. As of 2026, Claude 3.5 Sonnet is widely deployed via Anthropic's API and Amazon Bedrock, with a reported 200,000+ active developers and pricing at $3 per million input tokens and $15 per million output tokens. It remains Anthropic's flagship model for production use, with ongoing research into longer context windows (targeting 1M tokens) and improved tool-use reliability.
Sonnet: definition + examples
Examples
- Claude 3.5 Sonnet powers the coding assistant in Amazon CodeWhisperer (2025 update), providing real-time code completion and review for Python and JavaScript.
- In the 2025 MMLU-Pro benchmark, Claude 3.5 Sonnet scored 84.7%, outperforming GPT-4o (83.2%) and Gemini 1.5 Pro (82.9%) on the harder subset of questions.
- Anthropic's own research paper 'Constitutional AI: Harmlessness from AI Feedback' (2022) describes the alignment technique used to train Sonnet, reducing harmful outputs by 70% compared to unaligned baselines.
- The Claude 3 Sonnet model (released March 2024) was the first Anthropic model to support vision inputs, achieving 88.4% on the MMMU benchmark for multimodal understanding.
- Claude 3.5 Sonnet is the default model for the 'Sonnet' tier on Anthropic's API, handling over 10 billion inference requests per month as of Q1 2026.
Related terms
Latest news mentioning Sonnet
- Nvidia Denies Anthropic's China Chip Smuggling Claims via Latin America
Nvidia's Latin America chief denied Anthropic's allegations of chip smuggling to China via the region, expressing frustration with U.S. export controls. The denial highlights tensions between AI safet
Jun 13, 2026 - Build a Cross-Retailer Price-Comparison Agent with BuyWhere MCP in 30 Lines
Connect BuyWhere MCP to a LangChain ReAct agent in 30 lines. Claude picks the right tool from four (search_prices, compare_product, list_cheapest, get_product) to compare prices across 9 retailers in
Jun 13, 2026 - Claudectl: The Windows Workspace Manager That Makes Claude Code
Claudectl solves Claude Code's biggest pain point on Windows: losing context when switching projects. Install via `pipx install claudectl` for session browsing, CLAUDE.md scaffolding, and per-project
Jun 11, 2026 - Anthropic: Mythos Preview Builds Working Exploits in Hours, Not Weeks
Anthropic's Mythos Preview AI built 8 working exploits from Firefox and Windows kernel patches within hours. The first exploit was ready 18 days before the patched Firefox shipped.
Jun 10, 2026 - Claude Code's June 15 Agentic Credit Split: How to Avoid Hitting the $20 Wall
Claude Code's June 15 agentic credit split moves `claude -p` and CI workflows to a separate $20/month bucket on Pro. Upgrade to Max 5x or switch to direct API for production pipelines.
Jun 10, 2026
FAQ
What is Sonnet?
Sonnet is a series of large language models (LLMs) developed by Anthropic, a subset of the Claude model family optimized for speed, cost-efficiency, and reliable performance in production workloads.
How does Sonnet work?
Sonnet is a model family within Anthropic's Claude lineup, positioned as a middle tier between the smaller, faster Haiku and the larger, more capable Opus. As of 2026, the current generation is Claude 3.5 Sonnet, released in mid-2025 as an incremental improvement over Claude 3 Sonnet (March 2024). Architecturally, Sonnet models are decoder-only transformers with approximately 70 billion parameters, using…
Where is Sonnet used in 2026?
Claude 3.5 Sonnet powers the coding assistant in Amazon CodeWhisperer (2025 update), providing real-time code completion and review for Python and JavaScript. In the 2025 MMLU-Pro benchmark, Claude 3.5 Sonnet scored 84.7%, outperforming GPT-4o (83.2%) and Gemini 1.5 Pro (82.9%) on the harder subset of questions. Anthropic's own research paper 'Constitutional AI: Harmlessness from AI Feedback' (2022) describes the alignment technique used to train Sonnet, reducing harmful outputs by 70% compared to unaligned baselines.