Superforecasters predicted 3-4 hour METR 80% task horizons by year-end 2026. Claude Mythos hit that benchmark in late May, compressing the timeline by seven months.
Key facts
- Superforecasters predicted 3-4h METR 80% by year-end 2026
- Claude Mythos achieved it in late May 2026
- Timeline compressed from ~7 months to ~3 weeks
- Best 2025 systems had ~30-60 minute horizons
- 24-hour horizon may come before year-end 2026
In early May 2026, the best superforecasters — a cohort known for outperforming intelligence analysts and prediction markets — forecast that the longest METR 80% task horizon would reach 3-4 hours by the end of the year According to @emollick. METR's 80% task horizon measures the longest duration task an AI can complete with at least 80% success rate; it is a key proxy for autonomous agent capability.
By late May, Anthropic's Claude Mythos had already achieved that number. The gap between expert human judgment and actual AI capability deployment has collapsed to roughly three weeks.
What the METR 80% Horizon Measures
METR (Model Evaluation and Threat Research) defines the 80% task horizon as the maximum task duration an AI system can reliably complete. A 3-4 hour horizon implies an agent can autonomously execute multi-step workflows — software engineering tasks, data analysis pipelines, research subtasks — that previously required sustained human oversight. In 2025, the best systems hovered around 30-60 minutes.
Why This Compression Matters
The superforecasters were not pessimistic — they were optimistic by any historical standard. Their year-end prediction assumed continued but bounded progress. Claude Mythos' achievement suggests the capability curve is steeper than even elite forecasters model. The question is whether this represents a one-time leap (perhaps from a new architecture or training method) or a sustained acceleration.
Anthropic has not disclosed the specific technical details behind Claude Mythos' task horizon performance [Anthropic's blog posts do not address the benchmark]. The company's typical release cadence suggests a technical report may follow within weeks, but no timeline has been given.
What to Watch
The next milestone — a 24-hour METR 80% horizon — may arrive before the superforecasters' original year-end target. If Claude Mythos' successor or a competitor (OpenAI's GPT-5, Google's Gemini 3) reaches that threshold by Q3 2026, it would signal that autonomous agent capabilities are doubling faster than even aggressive models predict. Watch for the next METR public benchmark release, expected late June, and any Anthropic technical report detailing the Mythos training run.








