Claude Opus

30 articles about Claude Opus in AI news

WorkBench Revisited: Claude Opus 4.8 Hits 89% Task Completion

Claude Opus 4.8 completes 89% of WorkBench tasks with a 2.5% harm rate, compared with GPT-4's 43% completion and 26% harm rate in 2024, suggesting that capability and safety can improve together.

Jun 15, 2026 | 84% relevant

Claude Opus 4.7 Matches Dedicated NMR Software on Chemistry Tasks

According to an Anthropic blog post, Claude Opus 4.7 matches dedicated NMR software on chemistry tasks, but the methodology and benchmarks are undisclosed.

Jun 5, 2026 | 94% relevant

Claude Opus 4.8 Launches Dynamic Workflows for Agentic Code

Claude Opus 4.8 launched with dynamic workflows for Claude Code, enabling multi-step agentic coding. The release addresses quality issues after a ~25% instruction miss rate post-4.6.

Jun 2, 2026 | 100% relevant

Claude Opus 4.8: 2.5x Faster, 3x Cheaper Fast Mode

Anthropic released Claude Opus 4.8 with 2.5x faster, 3x cheaper fast mode and a new dynamic workflows feature, undercutting GPT-4 Turbo on price.

May 29, 2026 | 100% relevant

Anthropic Ships Claude Opus 4.7: 80.1 SWE-Bench, 1M Context

Anthropic released Claude Opus 4.7 on April 16, 2026, scoring 80.1 on SWE-Bench Verified, a slight regression from Opus 4.6's 80.3. The release prioritizes safety tuning over benchmark leadership.

May 17, 2026 | 100% relevant

Anthropic Ships Claude Opus 4.7: 2.1% SWE-Bench Gain Over 4.6

Anthropic released Claude Opus 4.7 with a 2.1-point SWE-Bench gain to 82.9, the smallest jump between Opus versions yet, signaling diminishing returns.

May 9, 2026 | 90% relevant

Claude Opus 4.7 Builds AlphaZero-Style Self-Play on Consumer Hardware

Claude Opus 4.7 built AlphaZero self-play from scratch on consumer hardware in three hours, showing autonomous algorithmic code generation.

May 3, 2026 | 100% relevant

ThermoQA Benchmark Reveals LLM Reasoning Gaps: Claude Opus Leads at 94.1%

Researchers released ThermoQA, a 293-question benchmark testing thermodynamic reasoning. Claude Opus 4.6 scored 94.1% overall, but models showed significant degradation on complex cycle analysis versus simple property lookups.

Apr 23, 2026 | 78% relevant

Moonshot AI Ships Trillion-Parameter Open Model, Matches Claude Opus on Coding

Moonshot AI released a trillion-parameter open-source model that reportedly matches Anthropic's Claude Opus on most coding benchmarks. The release came on the same day Anthropic committed $25B to AWS for compute, highlighting divergent AI scaling strategies.

Apr 22, 2026 | 100% relevant

Claude Opus Allegedly Refuses to Answer 'What is 2+2?'

A viral post claims Anthropic's Claude Opus refused to answer 'What is 2+2?', citing potential harm. The incident highlights tensions between AI safety protocols and basic utility.

Apr 17, 2026 | 89% relevant

Claude Opus 4.7 Launches with 3.75MP Vision, Agentic Coding, and New Tokenizer

Anthropic launched Claude Opus 4.7 today with 3x higher vision resolution (3.75MP), self-verifying coding outputs, and stricter instruction following. The update targets enterprise agentic workflows and knowledge work benchmarks.

Apr 16, 2026 | 100% relevant

Anthropic to Launch Claude Opus 4.7 & AI Design Tool This Week

Anthropic is launching Claude Opus 4.7 and a new AI design tool this week, according to a report. The company is also testing a more advanced model, Claude Mythos, for cybersecurity applications.

Apr 14, 2026 | 100% relevant

Claude Opus 4.7 Appears on Anthropic's Internal API, Hinting at Imminent Release

A new model identifier, 'Claude Opus 4.7', has been spotted on Anthropic's internal API. This suggests a forthcoming update to the flagship Opus line, potentially a minor version bump ahead of a larger release.

Apr 12, 2026 | 91% relevant

Claude Opus 4.6 Unlimited Access Deal Sparks Developer Interest

A developer reports finding a deal for unlimited Claude Opus 4.6 usage without rate limits, potentially offering significant cost savings for heavy users compared to Anthropic's official API pricing.

Apr 11, 2026 | 93% relevant

Claude Mythos Priced 5x Higher Than Claude Opus 4.6

Anthropic's newly detailed Claude Mythos model is priced at 5x the cost of Claude Opus 4.6. This premium pricing strategy suggests a focus on high-value enterprise use cases over raw performance-per-dollar.

Apr 7, 2026 | 81% relevant

Alibaba's Qwen3.6-Plus Reportedly Under Half the Size of Kimi K2.5, Nears Claude Opus 4.5 Performance

Alibaba's Tongyi Lab announced Qwen3.6-Plus, a model reportedly under half the size of Moonshot's Kimi K2.5 while approaching Claude Opus 4.5 performance, signaling major efficiency gains in China's LLM race.

Apr 4, 2026 | 95% relevant

Glass AI Coding Editor Expands to Windows, Bundles Claude Opus 4.6, GPT-5.4 & Gemini 3.1 Pro Access

The Glass AI coding editor is now available on Windows, offering developers a single subscription that includes usage of Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro without additional API costs. This expansion significantly broadens its potential user base beyond the Mac ecosystem.

Apr 2, 2026 | 87% relevant

Open-Source Code Editor 'Cline' Integrates Claude Opus, GPT-4, and Gemini Pro via Single API

Developer Hasan Tohar announced 'Cline', an open-source code editor that integrates multiple top-tier AI models through a unified interface. The tool allows switching between Claude Opus, GPT-4, and Gemini Pro without managing separate API keys or subscriptions.

Mar 26, 2026 | 85% relevant

Claude Opus 4.6 Is Live in Claude Code: Here's How to Use It for Maximum Coding Speed

Claude Opus 4.6 is now available in Claude Code. This update brings significant improvements to complex reasoning and autonomous coding tasks—here's how to configure it and what to prompt differently.

Mar 25, 2026 | 95% relevant

Glass AI IDE Emerges, Claims to Offer Free Access to Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro

A new AI-powered coding editor called Glass claims to provide free access to multiple top-tier LLMs, including Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro, without API fees. This positions it as a direct, cost-free competitor to established paid AI IDEs like Cursor and Windsurf.

Mar 25, 2026 | 89% relevant

Claude Opus 4.6's Security Audit Power Is Now in Claude Code

The new Claude Opus 4.6 model, which found 500+ high-severity open-source flaws, is now available in Claude Code for automated security auditing.

Mar 21, 2026 | 80% relevant

Claude Opus 4.6 Is Live: How to Use Its Improved Coding & Agentic Features in Claude Code

Claude Opus 4.6 is now available with better coding accuracy and agentic task handling. Here's how to configure Claude Code to use it and what to expect.

Mar 20, 2026 | 95% relevant

Cursor Announces Composer 2: Smaller, Cheaper Coding-Specific Model Targeting Claude Opus Performance

Cursor is launching Composer 2, a coding-specific AI model trained solely on programming data. The smaller, cheaper model is rumored to approach Claude Opus 4.6 performance, intensifying competition in the coding agent space.

Mar 19, 2026 | 85% relevant

AI Models Investigate Prehistoric Mysteries: How GPT-5.4, Claude Opus, and Gemini DeepThink Tackled the Dinosaur Civilization Question

Leading AI models including GPT-5.4 Pro, Claude Opus, and Gemini DeepThink were challenged to investigate whether advanced dinosaur civilizations existed. The experiment reveals how modern AI systems approach complex historical questions with original analysis and data gathering capabilities.

Mar 5, 2026 | 85% relevant

Beyond the Token Limit: How Claude Opus 4.6's Architectural Breakthrough Enables True Long-Context Reasoning

Anthropic's Claude Opus 4.6 represents a fundamental shift in large language model architecture, moving beyond simple token expansion to create genuinely autonomous reasoning systems. The breakthrough enables practical use of million-token contexts through novel memory management and hierarchical processing.

Feb 15, 2026 | 70% relevant

Claude Opus 4.6's New 'Personality' and How to Code with It Effectively

Opus 4.6 behaves differently than 4.5—more verbose and emotional. Here's how to adjust your Claude Code prompts to get the concise, technical responses you need.

Mar 17, 2026 | 95% relevant

Claude Opus 4.7: 3 Breaking Changes That Will Crash Your Code

Opus 4.7 introduces breaking changes that require immediate migration: extended thinking budgets removed, sampling parameters deleted, and vision coordinates now map 1:1.

Apr 17, 2026 | 100% relevant

Step-3.5-Flash: 196B Open-Source MoE Model Activates Only 11B Parameters, Outperforms Kimi K2.5 and Claude Opus 4.5 on Key Benchmarks

Shanghai-based StepFun's Step-3.5-Flash, a 196B parameter sparse mixture-of-experts model that activates only 11B parameters per token, achieves top scores on AIME 2025 (97.3) and LiveCodeBench-V6 (86.4) while costing 18.9x less to run than Kimi K2.5.

Mar 24, 2026 | 95% relevant

Opus 4.7's Tokenizer Change: How to Measure Your Real Claude Code Costs

Claude Opus 4.7's updated tokenizer means the same input can cost 40%+ more than 4.6. Use the Claude Token Counter to measure real costs before upgrading.

Apr 20, 2026 | 100% relevant

How Claude Code Users Can Apply Opus 4.6's Security Analysis to Their Own Codebases

Claude Opus 4.6's ability to find 500+ high-severity open-source flaws isn't just news—it's a capability you can use in Claude Code today to audit your dependencies and code.

Mar 22, 2026 | 95% relevant
