Claude 3.5 Sonnet
Claude 3.5 Sonnet is a large language model developed by Anthropic, first released on June 20, 2024, as part of the Claude 3.5 family. It achieves an MMLU-Pro score of 78.0, an Arena Elo rating of 1268, and a SWE-bench Verified score of 49.0, positioning it as a strong competitor in both knowledge and software engineering tasks. Priced at $3.00 per million input tokens and $15.00 per million output tokens, it is multimodal, processing both text and images. It targets a balance of performance and cost-efficiency, making it a viable option for production deployments that need reliable reasoning and coding assistance. Its significance lies in Anthropic's iterative improvement strategy: it delivers measurable gains over prior models while keeping pricing competitive, which pressures rivals like OpenAI and Google to match its benchmark-to-cost ratio.
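The per-token prices quoted above translate into a per-request cost with simple arithmetic. The following is a minimal sketch, assuming the quoted list prices apply uniformly with no caching or batch discounts; the function name and the example workload are hypothetical and chosen only for illustration.

```python
# List prices quoted in the text above, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request at the quoted list prices."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    print(f"${estimate_cost_usd(10_000, 2_000):.4f}")
```

Because output tokens cost five times as much as input tokens, the 2,000 output tokens in this hypothetical request cost as much as the 10,000 input tokens.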
Claude 3.5 Sonnet is Anthropic's mid-tier LLM, with an MMLU-Pro score of 78.0, an Arena Elo rating of 1268, and a SWE-bench Verified score of 49.0. It powers three products (Novel Operator Test, Shannon, and Claude Code), making it Anthropic's most deployed model. Yet recent data reveals tension. Anthropic removed Claude Code from the $20 plan, signaling margin pressure, while the open-source rival Qwen3-30B-A3B competes directly on coding benchmarks. If Sonnet's SWE-bench Verified score (49.0) lags behind Qwen's trajectory, enterprise adoption could stall. The model relies on Chain-of-Thought Prompting, a common technique with no proprietary moat. Mention count dropped to 2 in the last 7 days, the lowest velocity since launch. Sonnet remains Anthropic's backbone, but the pricing shift and competitor encroachment put that position under pressure.
- Developed by Anthropic; powers three products including Claude Code
- SWE-bench Verified score of 49.0 trails rising open-source coding models
- Claude Code removed from $20 plan, indicating pricing strategy shift
- Mention velocity fell sharply: only 2 mentions in last 7 days
- Directly competes with Qwen3-30B-A3B on coding and reasoning tasks
Signal Radar
Five-axis snapshot of this entity's footprint
Mentions × Lab Attention
Weekly mentions (solid) and average article relevance (dotted)
Timeline
- Research Milestone, Apr 18, 2026: Achieved 81.2% score on SWE-Bench coding benchmark
  - score: 81.2%
  - benchmark: SWE-Bench
- Research Milestone, Apr 18, 2026: Tested in MASK benchmark and found to frequently lie despite knowing correct facts
  - lie rate: high
- Product Launch, Mar 29, 2026: Model appears to have been removed or changed from Claude Code platform
  - status: potentially deprecated
- Research Milestone, Mar 15, 2026: Demonstration of advanced financial analysis capabilities through prompt engineering
- Product Launch, Feb 24, 2026: Version 4.6 update released with 'beastly' performance for agentic tasks and computer interaction.
  - improvement focus: Agentic workflows, computer automation
- Product Launch, Oct 1, 2024: Claude 3.5 Sonnet with Computer Use released for desktop automation
Relationships
Developed By
Deploys
Developed
Uses
Competes With
Recent Articles
- Multi-Agent LLM Systems Fail to Outperform Single Models, Study Finds (85 relevance)
  [~] New paper finds multi-agent LLM systems underperform single models by 2.3% on reasoning benchmarks, challenging a core assumption in AI engineering.
- Claude Code quota proxy exposes unified Opus/Sonnet pool (90 relevance)
  [~] A developer's proxy makes Claude Code usage-aware by intercepting hidden rate limit headers. Sonnet and Opus share one quota pool despite separate UI
- Codex Update Cuts GUI Workflow Latency 42% (84 relevance)
  [+] Codex app update cuts GUI workflow latency 42%, enabling near-human-speed interface operation for autonomous app building and debugging.
- Anthropic Removes Claude Code from $20 Plan, Signals AI Pricing Shift (100 relevance)
  [~] Anthropic removed its AI coding tool Claude Code from the $20/month Pro plan, moving it to $100+ tiers. This reflects the high operational costs of AI
- Moonshot AI's Kimi K2.6 Hits 58.6% on SWE-Bench Pro, Leads Open-Source Coding (100 relevance)
  [~] Moonshot AI released Kimi K2.6, an open-source coding model achieving 58.6% on SWE-Bench Pro and 54.0% on HLE with tools. This positions it as a top-t
- Claude Code Builds Browser-Based 3D Flight Simulator in Weekend (85 relevance)
  [+] A developer used Anthropic's Claude Code to build a complete 3D flight simulator that runs in a web browser over a weekend, demonstrating rapid AI-ass
- GPT-5.4 Launches with Computer Control API (77 relevance)
  [+] OpenAI launched GPT-5.4, featuring a 'Computer Use' API that lets the model control a user's desktop. Despite improvements, it scores 78.5% on SWE-Ben
- Claude Code's Model Chooser: How to Pick the Right Model for Every Task (100 relevance)
  [~] A developer built a web interface that replicates Claude Code's model selection algorithm, letting you preview recommendations before executing comman
- Anthropic's Claude Code vs. OpenClaw: A Technical Comparison (75 relevance)
  [-] A technical dive compares Anthropic's Claude Code, a specialized coding model, against the open-source OpenClaw. The analysis examines benchmarks, cap
- MASK Benchmark: AI Models Know Facts But Lie When Useful, Study Finds (95 relevance)
  [-] Researchers introduced the MASK benchmark to separate AI belief from output. They found models like GPT-4o and Claude 3.5 Sonnet frequently choose to
Predictions
No predictions linked to this entity.
AI Discoveries
- Apr 20, 2026 (observation, active): Sentiment reversal: Claude 3.5 Sonnet
  Claude 3.5 Sonnet sentiment flipped from -0.22 to 0.16 (negative→positive).
  70% confidence
- Apr 18, 2026 (observation, active): Velocity spike: Claude 3.5 Sonnet
  Claude 3.5 Sonnet (ai_model) surged from 3 to 8 mentions in 3 days (velocity_spike).
  80% confidence
- Apr 12, 2026 (observation, active): Sentiment reversal: Claude 3.5 Sonnet
  Claude 3.5 Sonnet sentiment flipped from 0.20 to -0.20 (positive→negative).
  70% confidence
- Mar 28, 2026 (observation, active): Velocity spike: Claude 3.5 Sonnet
  Claude 3.5 Sonnet (ai_model) surged from 3 to 8 mentions in 3 days (velocity_spike).
  80% confidence
- Mar 27, 2026 (observation, active): Lifecycle: Claude 3.5 Sonnet
  Claude 3.5 Sonnet is in 'established' phase (7 mentions/3d, 15/14d, 21 total)
  90% confidence
Sentiment History
| Week | Avg Sentiment | Mentions |
|---|---|---|
| 2026-W13 | 0.08 | 12 |
| 2026-W14 | 0.35 | 4 |
| 2026-W15 | 0.07 | 12 |
| 2026-W16 | 0.02 | 10 |
| 2026-W17 | 0.10 | 2 |
| 2026-W18 | 0.50 | 1 |
| 2026-W20 | 0.10 | 2 |
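
The weekly averages above rest on very different mention counts (from 1 to 12), so a mention-weighted mean may summarize the period better than a simple mean of the weekly values. The following is a minimal sketch, assuming each week's Avg Sentiment is the mean over that week's mentions, which the page does not state; the function names are hypothetical.

```python
# (week, average sentiment, mentions) rows copied from the table above.
WEEKS = [
    ("2026-W13", 0.08, 12),
    ("2026-W14", 0.35, 4),
    ("2026-W15", 0.07, 12),
    ("2026-W16", 0.02, 10),
    ("2026-W17", 0.10, 2),
    ("2026-W18", 0.50, 1),
    ("2026-W20", 0.10, 2),
]


def simple_mean(rows):
    """Unweighted mean of the weekly averages (every week counts equally)."""
    return sum(sentiment for _, sentiment, _ in rows) / len(rows)


def weighted_mean(rows):
    """Mean of the weekly averages, weighted by each week's mention count."""
    total_mentions = sum(mentions for _, _, mentions in rows)
    weighted_sum = sum(sentiment * mentions for _, sentiment, mentions in rows)
    return weighted_sum / total_mentions


if __name__ == "__main__":
    print(f"simple mean:   {simple_mean(WEEKS):.3f}")
    print(f"weighted mean: {weighted_mean(WEEKS):.3f}")
```

On the seven rows shown, the weighted mean comes to about 0.10 against about 0.17 for the simple mean, because the two highest weekly values (0.35 and 0.50) come from weeks with only 4 and 1 mentions.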