Anthropic: Claude Authors 80%+ of Code, Task Length Doubling Every 4 Months

Anthropic reports Claude authors 80%+ of code; task-length capability doubles every 4 months. Mythos Preview works 16+ hours autonomously.

AAAla SMITH & AI Research Desk·Jun 4, 2026·3 min read··317 views·AI-Generated·Report error

Source: x.comvia @kimmonismusCorroborated

How much code does Claude author at Anthropic and how fast are AI capabilities accelerating?

Anthropic reported Claude authors 80%+ of merged code as of May 2026. Task length AI can reliably complete doubles every 4 months; Mythos Preview works 16+ hours autonomously.

TL;DR

Claude now writes 80%+ of Anthropic's code. · Task length doubling every 4 months. · Mythos Preview works 16+ hours autonomously.

Anthropic revealed Claude authored 80%+ of merged code as of May 2026. The company's internal metrics show task-length capability doubling every 4 months, up from every 7 months.

Key facts

Claude authors 80%+ of Anthropic's merged code as of May 2026.
Task-length capability doubles every 4 months, up from 7 months.
Mythos Preview works 16+ hours autonomously per METR.
Code speedup: 52x for Mythos Preview vs 3x for Opus 4.
800+ fixes shipped, cutting API errors 1,000x in one example.

Anthropic published a sweeping internal assessment of AI progress, claiming the company is 'getting very serious about recursive self-improvement' According to @kimmonismus. The analysis, shared by Anthropic researcher Kimmonismus, paints a picture of accelerating autonomous capabilities that could soon enable AI to design and build its own successor.

The Capability Curve

The headline metric: Claude now authors 80%+ of code merged into Anthropic's codebase, up from low single digits before Claude Code launched in February 2025. Engineers ship on average 8x as much code per quarter as they did in the 2021-2025 period.

Task-length capability is the most striking trend line. Opus 3 (March 2024) handled roughly 4-minute tasks. Sonnet 3.7 (a year later) managed ~90-minute tasks. Opus 4.6 (another year on) reached 12-hour tasks. METR found Claude Mythos Preview could work 'at least' 16 hours, at the top of what they can currently measure.

On SWE-bench, scores went from low single digits to saturation in two years. CORE-bench (research reproduction) went from ~20% to saturated in 15 months.

Code Quality and Speed

Claude-written code quality was worse than human in late 2025, roughly at parity now, and expected to be strictly better within the year. On code-speedup tests, Opus 4 averaged ~3x speedup (May 2025), while Mythos Preview hit ~52x (April 2026). A skilled human typically needs 4-8 hours to achieve 4x speedup.

One April 2026 example: Claude shipped 800+ fixes cutting a class of API errors 1,000x—work an engineer estimated would have taken a human four years.

Research and Safety

In an AI-safety research project, Claude agents recovered 97% of a performance gap (vs ~23% for two human researchers in a week), over 800 compute-hours and ~$18K. On picking the better 'next step' in research sessions, the best model beat the human choice 51% (November 2025, Opus 4.5) rising to 64% (April 2026, Mythos Preview).

Human comparative advantage, for now: research taste and judgment—choosing which problems matter and when an approach is a dead end.

Three Futures

Anthropic outlines three possible scenarios: the trend stalls (S-curve), which they consider least likely; compounding efficiency gains with humans still setting direction, where 100-person firms do the work of 10,000+ (the likely path); or full recursive self-improvement, where AI builds its successors and pace is set by compute—the alignment outcome they're least certain about.

A March 2026 poll of 130 research staff found the median respondent estimated ~4x output with Mythos Preview. On the hardest open-ended tasks, Claude's success rate hit 76% in May 2026, up 50 points in six months.

What to watch

Watch for Anthropic's next model release and whether task-length capability crosses 24+ hours. Also track enterprise adoption rates: if 100-person firms truly do the work of 10,000+, expect rapid shifts in AI-services spending patterns by Q4 2026.

Sources cited in this article

METR.

Source: gentic.news · Jun 4, 2026 · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

Following this story?

Get a weekly digest with AI predictions, trends, and analysis — free.

AI Analysis

The most significant signal here is the task-length doubling rate accelerating from 7 months to 4 months. This is a compound growth curve that, if sustained, crosses 24-hour autonomous operation within two doubling periods—roughly eight months from now. The implications for AI R&D are structural: if an AI can work a full day without human intervention, the bottleneck shifts from model capability to compute allocation and safety verification. Anthropic's framing of 'three futures' is telling. They explicitly call the stall scenario least likely, which means their internal models predict continued acceleration. The 80%+ code authorship number is also a leading indicator: when the AI writes most of the code, the loop tightens—improvements to the model get tested, merged, and deployed faster. The human comparative advantage narrowing to 'research taste and judgment' is the most understated line in the post. That's the last bastion of human edge in AI research, and it's already being eroded (64% model beats human on next-step selection). If that gap closes within 12 months, the recursive loop becomes self-sustaining.

#ai capabilities #code generation #claude #anthropic #autonomous ai

This story is part of

The AI Infrastructure War Shifts from Chips to Developer Tools

Nvidia's enterprise pivot and AWS's OpenAI bet collide with Cursor's quiet ascent

Compare side-by-side

Claude Opus 4.6 vs Claude Mythos Preview

→

Mentioned in this article

Anthropic Claude Opus 4.6 Claude Mythos Preview recursive self-improvement

Enjoyed this article?