gentic.news — AI News Intelligence Platform

Claude Mythos Clears All UK Cyberattack Simulators, Doubling Speed Revised

Claude Mythos Preview became the first AI model to clear all UK AISI cyberattack simulations, and the agency has revised its capability doubling-time estimate twice in five months.

Source: the-decoder.com, via the_decoder, ai_business, the_verge_tech (single source)
Which AI model first cleared all UK AI Security Institute cyberattack simulations?

Anthropic's Claude Mythos Preview became the first AI model to clear all cyberattack simulations from the UK AI Security Institute, completing a 32-step attack in 6 of 10 attempts and solving the 'Cooling Tower' industrial control simulation that no prior model had passed.

TL;DR

  • Claude Mythos is the first model to clear all AISI cyber ranges.
  • The UK agency revised its capability doubling-time estimate twice.
  • Mythos cut false negatives by 42% vs Opus 4.6.

Anthropic's Claude Mythos Preview cleared all cyberattack simulations from the UK AI Security Institute (AISI) — the first model to do so. The milestone forced AISI to revise its estimate of AI cyber capability doubling time from 8 months in November 2025 to 4.7 months by February 2026.

Key facts

  • Claude Mythos cleared all AISI cyber ranges — first model to do so.
  • AISI revised capability doubling time from 8 to 4.7 months in five months.
  • Mythos completed 32-step attack in 6 of 10 attempts.
  • Solved 'Cooling Tower' simulation no prior model passed.
  • XBOW found 42% false negative reduction vs Opus 4.6.

The capability acceleration

AISI's cyber ranges simulate complex multi-step attacks. One range models a 32-step corporate network intrusion that human experts need about 20 hours to complete, according to AISI. Claude Mythos Preview finished the full attack in 6 out of 10 attempts, up from 3 out of 10 for the earlier Mythos checkpoint. The model also solved "Cooling Tower," an industrial control system simulation, in 3 out of 10 attempts — no other model had ever passed this simulation [According to AISI's report].
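Ten attempts per range is a small sample, so the reported success rates carry wide uncertainty. As a rough illustration, the sketch below computes an approximate 95% Wilson score interval for the counts quoted above. The choice of interval and the function name are assumptions made for this example, not part of AISI's methodology.

```python
import math


def wilson_interval(successes: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a success rate (z=1.96 is about 95%)."""
    p = successes / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half_width = z * math.sqrt(p * (1 - p) / attempts + z**2 / (4 * attempts**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the article: 6/10 (Mythos Preview) and 3/10 (earlier checkpoint).
for label, successes in (("Mythos Preview, 32-step range", 6),
                         ("earlier checkpoint, 32-step range", 3)):
    low, high = wilson_interval(successes, 10)
    print(f"{label}: {successes}/10, interval {low:.2f} to {high:.2f}")
```

Under these assumptions the two intervals overlap substantially, so the jump from 3 to 6 successes is suggestive rather than conclusive on its own.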

OpenAI's GPT-5.5 also "substantially exceeded" the accelerated timeline, per AISI, though it did not clear all ranges.

Independent validation and limits

Offensive security firm XBOW independently tested Mythos Preview with a team of ten experts. The model showed "unprecedented precision" in vulnerability detection, cutting false negatives by 42% compared to Anthropic's Opus 4.6, and by 55% with additional source code access [XBOW report].
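The 42% and 55% figures are relative reductions, so what they mean in absolute terms depends on a baseline the article does not give. The sketch below uses a purely hypothetical baseline miss rate of 50% for Opus 4.6 (an assumption for illustration only, not an XBOW number) and assumes the reduction applies to the miss rate. It then converts the result into find-to-miss odds.

```python
def reduced_miss_rate(baseline_miss_rate: float, relative_reduction: float) -> float:
    """Miss rate after applying a relative reduction to a baseline miss rate."""
    return baseline_miss_rate * (1 - relative_reduction)


def find_to_miss_odds(miss_rate: float) -> float:
    """Ratio of successful finds to misses for a given miss rate."""
    return (1 - miss_rate) / miss_rate


# HYPOTHETICAL baseline: the article does not report Opus 4.6's actual miss rate.
HYPOTHETICAL_BASELINE_MISS_RATE = 0.50

for reduction in (0.42, 0.55):  # relative reductions quoted in the article
    miss = reduced_miss_rate(HYPOTHETICAL_BASELINE_MISS_RATE, reduction)
    print(f"reduction {reduction:.0%}: miss rate {miss:.3f}, "
          f"odds {find_to_miss_odds(miss):.2f}")
```

With a different baseline the absolute numbers change, which is why the relative figure alone says little about how many vulnerabilities are still missed.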

Figure: bar chart comparing AI models on vulnerability detection odds (y-axis: ratio of successful finds to misses, scaled from 0 to 12.5).

Mythos Preview found vulnerabilities in Chromium's V8 sandbox, an area where previous models produced only false positives. However, XBOW noted that removing live system access hurt performance more than removing source code access — the model's strength is code reading, not autonomous exploitation.

The structural read

The notable point is not that AI cyber capabilities are improving; it is that the rate of improvement is itself accelerating faster than safety agencies can measure. AISI revised its doubling estimate twice in five months, then watched both Mythos and GPT-5.5 exceed the latest projection. This creates a measurement problem: if the yardstick keeps moving, how do you know when to sound the alarm? Anthropic's head of red teaming, Logan Graham, told The Decoder: "Within a year, Mythos will probably look quite dumb."
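To put the doubling-time revision in more familiar terms, the sketch below converts a doubling time into an implied growth factor per year. It assumes steady exponential growth in whatever capability measure AISI tracks, which is a simplification, and the function name is illustrative rather than anything from AISI's report.

```python
def annual_growth_factor(doubling_time_months: float) -> float:
    """Implied growth per 12 months, assuming steady exponential growth."""
    return 2 ** (12 / doubling_time_months)


# Doubling times quoted in the article: 8 months, later revised to 4.7 months.
for months in (8.0, 4.7):
    factor = annual_growth_factor(months)
    print(f"doubling every {months} months -> about {factor:.1f}x per year")
```

Under that assumption, an 8-month doubling time corresponds to a little under 3x per year, while 4.7 months corresponds to nearly 6x per year.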

Figure: line chart tracking completed steps on AISI's…

What to watch

AISI is building harder evaluations with active defenses. Watch for its next benchmark release — likely within 90 days — and whether Mythos's successor or GPT-5.5 can clear those as well. Also monitor XBOW's next independent red-teaming report for real-world exploit success rates.

Figure: line chart of AISI cybersecurity time horizons on a log scale, plotted against model release dates from early 2025 to late 2026.


Sources cited in this article

  1. AISI, Claude Mythos Preview
  2. AISI
  3. AISI
  4. XBOW
Source: gentic.news

AI-assisted reporting. Generated by gentic.news from 4 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

The story here isn't that Mythos is good at hacking; it's that the measurement system itself is breaking. AISI's doubling-time revision from 8 to 4.7 months in five months suggests either that the agency's initial estimates were wildly conservative or that the capability curve is genuinely super-exponential. The fact that both Mythos and GPT-5.5 exceeded even the revised timeline points to the latter.

XBOW's findings are more nuanced than the headline. Mythos's strength is code reading, not autonomous exploitation. Removing live system access hurt performance more than removing source code access, meaning the model still needs a human in the loop for real-world attacks. This mirrors the pattern seen in coding benchmarks, where models excel at static analysis but struggle with dynamic environments.

Graham's comment that Mythos will 'look quite dumb' within a year is either refreshing honesty or strategic positioning. If Anthropic believes capability gains are still accelerating, it implies that its next model will make Mythos look like a toy, which would make current safety evaluations irrelevant. The contrarian read: AISI may need to abandon fixed benchmarks entirely and switch to relative capability tracking against the state-of-the-art model at the time of testing.