gentic.news — AI News Intelligence Platform
A line graph showing METR 80% task horizons over time, with a blue curve rising from 2024 to 2026; a red dot marks…
AI Research · Score: 85

Superforecasters Predicted 3-4h AI Task Horizons by Year-End; Claude Hit It in May

Superforecasters predicted 3-4h METR 80% task horizons by year-end 2026. Claude Mythos hit that in late May, compressing the timeline by seven months.

Did Claude Mythos reach the superforecasters' predicted METR 80% task horizon for 2026?

In early May 2026, top superforecasters predicted the longest METR 80% task horizon would reach 3-4 hours by year-end. Claude Mythos hit that benchmark in late May, compressing the timeline by roughly seven months.

TL;DR

Superforecasters predicted 3-4h METR 80% horizons by year-end · Claude Mythos achieved that in late May · Forecast overtaken about 3 weeks after it was made


Key facts

  • Superforecasters predicted 3-4h METR 80% by year-end 2026
  • Claude Mythos achieved it in late May 2026
  • Target reached ~3 weeks after the forecast, ~7 months ahead of year-end
  • Best 2025 systems had ~30-60 minute horizons
  • 24-hour horizon may come before year-end 2026

In early May 2026, the best superforecasters (a cohort known for outperforming intelligence analysts and prediction markets) forecast that the longest METR 80% task horizon would reach 3-4 hours by the end of the year, according to @emollick. METR's 80% task horizon measures the longest task an AI can complete with at least an 80% success rate; it is a key proxy for autonomous agent capability.

By late May, Anthropic's Claude Mythos had already achieved that number. A forecast meant to hold through year-end was overtaken roughly three weeks after it was made.

What the METR 80% Horizon Measures

METR (Model Evaluation and Threat Research) defines the 80% task horizon as the longest task, measured by how long the work takes a human, that an AI system completes with at least an 80% success rate. A 3-4 hour horizon implies an agent can autonomously execute multi-step workflows (software engineering tasks, data analysis pipelines, research subtasks) that previously required sustained human oversight. In 2025, the best systems hovered around 30-60 minutes.
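
To make the definition concrete, here is a minimal sketch of how an 80% horizon could be read off a fitted success curve. It assumes, for illustration only, that success probability falls off logistically in the log of task length. The function names, the 8-hour 50% horizon, and the slope value are hypothetical and are not taken from METR or Anthropic.

```python
import math


def success_probability(task_minutes: float, h50_minutes: float, slope: float) -> float:
    """Assumed model: logistic in log2(task length), equal to 0.5 at h50_minutes."""
    z = slope * (math.log2(h50_minutes) - math.log2(task_minutes))
    return 1.0 / (1.0 + math.exp(-z))


def horizon_at(p: float, h50_minutes: float, slope: float) -> float:
    """Invert the assumed model: task length at which success probability equals p."""
    logit = math.log(p / (1.0 - p))
    return h50_minutes * 2.0 ** (-logit / slope)


if __name__ == "__main__":
    # Hypothetical inputs: a 50% horizon of 8 hours and a slope of 1.0.
    h50, slope = 480.0, 1.0
    h80 = horizon_at(0.8, h50, slope)
    print(f"80% horizon under these assumptions: {h80:.0f} min")
```

Under this assumed curve the 80% horizon is always shorter than the 50% horizon for the same system, which is why a horizon figure only means something when its reliability threshold is stated alongside it.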

Why This Compression Matters

The superforecasters were not pessimistic; by any historical standard they were optimistic. Their year-end prediction assumed continued but bounded progress. Claude Mythos' achievement suggests the capability curve is steeper than even elite forecasters model. The question is whether this represents a one-time leap (perhaps from a new architecture or training method) or a sustained acceleration.

Anthropic has not disclosed the specific technical details behind Claude Mythos' task horizon performance [Anthropic's blog posts do not address the benchmark]. The company's typical release cadence suggests a technical report may follow within weeks, but no timeline has been given.

What to Watch

The next milestone — a 24-hour METR 80% horizon — may arrive before the superforecasters' original year-end target. If Claude Mythos' successor or a competitor (OpenAI's GPT-5, Google's Gemini 3) reaches that threshold by Q3 2026, it would signal that autonomous agent capabilities are doubling faster than even aggressive models predict. Watch for the next METR public benchmark release, expected late June, and any Anthropic technical report detailing the Mythos training run.
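
The arithmetic behind that scenario can be sketched as follows. The calculation assumes steady exponential growth between two horizon measurements, and the four-month window (late May to the end of Q3) is an approximation; the function name is illustrative.

```python
import math


def implied_doubling_time(start_hours: float, end_hours: float, elapsed_months: float) -> float:
    """Doubling time in months, assuming steady exponential growth between two horizons."""
    if start_hours <= 0 or end_hours <= start_hours or elapsed_months <= 0:
        raise ValueError("need 0 < start_hours < end_hours and elapsed_months > 0")
    doublings = math.log2(end_hours / start_hours)
    return elapsed_months / doublings


if __name__ == "__main__":
    # Both ends of the 3-4 hour range, projected to a 24-hour horizon.
    for start in (3.0, 4.0):
        months = implied_doubling_time(start, 24.0, elapsed_months=4.0)
        print(f"{start:.0f}h -> 24h in 4 months: doubling every {months:.2f} months")
```

Starting from 3 hours, reaching 24 hours takes exactly three doublings, so a four-month window implies a doubling roughly every 1.3 months.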

Source: gentic.news

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

The superforecasters' miss is not about pessimism; it is about the shape of the capability curve. Their year-end prediction assumed a roughly linear extrapolation from 2025's 30-60 minute horizons. Claude Mythos' achievement suggests a step-function improvement, possibly from a new training methodology or architecture shift. The key question: is this a one-time leap or a new slope?

Anthropic's silence on technical details is telling. When a company hits a benchmark this far ahead of expert consensus, it typically rushes out a paper to claim credit. The absence may indicate that the improvement came from scale (more compute, more data) rather than a novel insight, or that the company is holding details for a strategic release.

Comparing to prior art: the jump from 30-60 minutes to 3-4 hours in roughly 6 months mirrors the 2022-2023 transition from GPT-3.5 to GPT-4 in coding benchmarks. That leap was attributed to RLHF scaling and chain-of-thought prompting. If Mythos follows the same pattern, the 24-hour horizon may come from further scale rather than breakthrough invention.
