claude opus
30 articles about claude opus in AI news
WorkBench Revisited: Claude Opus 4.8 Hits 89% Task Completion
Claude Opus 4.8 completes 89% of WorkBench tasks with 2.5% harm rate, up from GPT-4's 43% and 26% in 2024, showing capability and safety align.
Claude Opus 4.7 Matches Dedicated NMR Software on Chemistry Tasks
Claude Opus 4.7 matches NMR software on chemistry tasks per Anthropic blog, but methodology and benchmarks undisclosed.
Claude Opus 4.8 Launches Dynamic Workflows for Agentic Code
Claude Opus 4.8 launched with dynamic workflows for Claude Code, enabling multi-step agentic coding. The release addresses quality issues after a ~25% instruction miss rate post-4.6.
Claude Opus 4.8: 2.5x Faster, 3x Cheaper Fast Mode
Anthropic released Claude Opus 4.8 with 2.5x faster, 3x cheaper fast mode and a new dynamic workflows feature, undercutting GPT-4 Turbo on price.
Anthropic Ships Claude Opus 4.7: 80.1 SWE-Bench, 1M Context
Anthropic released Claude Opus 4.7 on April 16, 2026, scoring 80.1 on SWE-Bench Verified, a slight regression from Opus 4.6's 80.3. The release prioritizes safety tuning over benchmark leadership.
Anthropic Ships Claude Opus 4.7: 2.1% SWE-Bench Gain Over 4.6
Anthropic released Claude Opus 4.7 with a 2.1-point SWE-Bench gain to 82.9, the smallest jump between Opus versions yet, signaling diminishing returns.
Claude Opus 4.7 Builds AlphaZero-Style Self-Play on Consumer Hardware
Claude Opus 4.7 built AlphaZero self-play from scratch on consumer hardware in three hours, showing autonomous algorithmic code generation.
ThermoQA Benchmark Reveals LLM Reasoning Gaps: Claude Opus Leads at 94.1%
Researchers released ThermoQA, a 293-question benchmark testing thermodynamic reasoning. Claude Opus 4.6 scored 94.1% overall, but models showed significant degradation on complex cycle analysis versus simple property lookups.
Moonshot AI Ships Trillion-Parameter Open Model, Matches Claude Opus on Coding
Moonshot AI released a trillion-parameter open-source model that reportedly matches Anthropic's Claude Opus on most coding benchmarks. This follows the same day Anthropic committed $25B to AWS for compute, highlighting divergent AI scaling strategies.
Claude Opus Allegedly Refuses to Answer 'What is 2+2?'
A viral post claims Anthropic's Claude Opus refused to answer 'What is 2+2?', citing potential harm. The incident highlights tensions between AI safety protocols and basic utility.
Claude Opus 4.7 Launches with 3.75MP Vision, Agentic Coding, and New Tokenizer
Anthropic launched Claude Opus 4.7 today with 3x higher vision resolution (3.75MP), self-verifying coding outputs, and stricter instruction following. The update targets enterprise agentic workflows and knowledge work benchmarks.
Anthropic to Launch Claude Opus 4.7 & AI Design Tool This Week
Anthropic is launching Claude Opus 4.7 and a new AI design tool this week, according to a report. The company is also testing a more advanced model, Claude Mythos, for cybersecurity applications.
Claude Opus 4.7 Appears on Anthropic's Internal API, Hinting at Imminent Release
A new model identifier, 'Claude Opus 4.7', has been spotted on Anthropic's internal API. This suggests a forthcoming update to the flagship Opus line, potentially a minor version bump ahead of a larger release.
Claude Opus 4.6 Unlimited Access Deal Sparks Developer Interest
A developer reports finding a deal for unlimited Claude Opus 4.6 usage without rate limits, potentially offering significant cost savings for heavy users compared to Anthropic's official API pricing.
Claude Mythos Priced 5x Higher Than Claude Opus 4.6
Anthropic's newly detailed Claude Mythos model is priced at 5x the cost of Claude Opus 4.6. This premium pricing strategy suggests a focus on high-value enterprise use cases over raw performance-per-dollar.
Alibaba's Qwen3.6-Plus Reportedly Under Half the Size of Kimi K2.5, Nears Claude Opus 4.5 Performance
Alibaba's Tongyi Lab announced Qwen3.6-Plus, a model reportedly under half the size of Moonshot's Kimi K2.5 while approaching Claude Opus 4.5 performance, signaling major efficiency gains in China's LLM race.
Glass AI Coding Editor Expands to Windows, Bundles Claude Opus 4.6, GPT-5.4 & Gemini 3.1 Pro Access
The Glass AI coding editor is now available on Windows, offering developers a single subscription that includes usage of Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro without additional API costs. This expansion significantly broadens its potential user base beyond the Mac ecosystem.
Open-Source Code Editor 'Cline' Integrates Claude Opus, GPT-4, and Gemini Pro via Single API
Developer Hasan Tohar announced 'Cline', an open-source code editor that integrates multiple top-tier AI models through a unified interface. The tool allows switching between Claude Opus, GPT-4, and Gemini Pro without managing separate API keys or subscriptions.
Claude Opus 4.6 Is Live in Claude Code: Here's How to Use It for Maximum Coding Speed
Claude Opus 4.6 is now available in Claude Code. This update brings significant improvements to complex reasoning and autonomous coding tasks—here's how to configure it and what to prompt differently.
Glass AI IDE Emerges, Claims to Offer Free Access to Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro
A new AI-powered coding editor called Glass claims to provide free access to multiple top-tier LLMs, including Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro, without API fees. This positions it as a direct, cost-free competitor to established paid AI IDEs like Cursor and Windsurf.
Claude Opus 4.6's Security Audit Power Is Now in Claude Code
The new Claude Opus 4.6 model, which found 500+ high-severity open-source flaws, is now available in Claude Code for automated security auditing.
Claude Opus 4.6 Is Live: How to Use Its Improved Coding & Agentic Features in Claude Code
Claude Opus 4.6 is now available with better coding accuracy and agentic task handling. Here's how to configure Claude Code to use it and what to expect.
Cursor Announces Composer 2: Smaller, Cheaper Coding-Specific Model Targeting Claude Opus Performance
Cursor is launching Composer 2, a coding-specific AI model trained solely on programming data. The smaller, cheaper model is rumored to approach Claude Opus 4.6 performance, intensifying competition in the coding agent space.
AI Models Investigate Prehistoric Mysteries: How GPT-5.4, Claude Opus, and Gemini DeepThink Tackled the Dinosaur Civilization Question
Leading AI models including GPT-5.4 Pro, Claude Opus, and Gemini DeepThink were challenged to investigate whether advanced dinosaur civilizations existed. The experiment reveals how modern AI systems approach complex historical questions with original analysis and data gathering capabilities.
Beyond the Token Limit: How Claude Opus 4.6's Architectural Breakthrough Enables True Long-Context Reasoning
Anthropic's Claude Opus 4.6 represents a fundamental shift in large language model architecture, moving beyond simple token expansion to create genuinely autonomous reasoning systems. The breakthrough enables practical use of million-token contexts through novel memory management and hierarchical processing.
Claude Opus 4.6's New 'Personality' and How to Code with It Effectively
Opus 4.6 behaves differently than 4.5—more verbose and emotional. Here's how to adjust your Claude Code prompts to get the concise, technical responses you need.
Claude Opus 4.7: 3 Breaking Changes That Will Crash Your Code
Opus 4.7 introduces breaking changes that require immediate migration: extended thinking budgets removed, sampling parameters deleted, and vision coordinates now map 1:1.
Step-3.5-Flash: 196B Open-Source MoE Model Activates Only 11B Parameters, Outperforms Kimi K2.5 and Claude Opus 4.5 on Key Benchmarks
Shanghai-based StepFun's Step-3.5-Flash, a 196B parameter sparse mixture-of-experts model that activates only 11B parameters per token, achieves top scores on AIME 2025 (97.3) and LiveCodeBench-V6 (86.4) while costing 18.9x less to run than Kimi K2.5.
Opus 4.7's Tokenizer Change: How to Measure Your Real Claude Code Costs
Claude Opus 4.7's updated tokenizer means the same input can cost 40%+ more than 4.6. Use the Claude Token Counter to measure real costs before upgrading.
How Claude Code Users Can Apply Opus 4.6's Security Analysis to Their Own Codebases
Claude Opus 4.6's ability to find 500+ high-severity open-source flaws isn't just news—it's a capability you can use in Claude Code today to audit your dependencies and code.