Anthropic's Claude Mythos Preview has cleared every cyberattack simulation run by the UK AI Security Institute (AISI), the first model to do so. The milestone forced AISI to cut its estimate of the AI cyber capability doubling time from 8 months in November 2025 to 4.7 months by February 2026.
Key facts
- Claude Mythos cleared all AISI cyber ranges — first model to do so.
- AISI cut its capability doubling-time estimate from 8 months to 4.7 months.
- Mythos completed a 32-step network attack in 6 of 10 attempts.
- Solved the 'Cooling Tower' simulation, which no prior model had passed.
- XBOW measured a 42% reduction in false negatives vs. Opus 4.6.
The capability acceleration
AISI's cyber ranges simulate complex multi-step attacks. One range models a 32-step corporate network intrusion that human experts need about 20 hours to complete, according to AISI. Claude Mythos Preview finished the full attack in 6 out of 10 attempts, up from 3 out of 10 for the earlier Mythos checkpoint. The model also solved "Cooling Tower," an industrial control system simulation, in 3 out of 10 attempts; no other model had ever passed this simulation, per AISI's report.
OpenAI's GPT-5.5 also "substantially exceeded" the accelerated timeline, per AISI, though it did not clear all ranges.
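To make the revised doubling time concrete, here is a back-of-envelope sketch. It assumes simple exponential growth, which is the model implied by a "doubling time" but is not spelled out in AISI's report; the 12-month horizon and the function name are illustrative choices, not anything from the source.

```python
def growth_multiplier(horizon_months: float, doubling_months: float) -> float:
    """Capability multiplier over a horizon, assuming exponential growth
    with the given doubling time: 2 ** (horizon / doubling_time)."""
    return 2 ** (horizon_months / doubling_months)

# AISI's November 2025 estimate: doubling every 8 months.
old = growth_multiplier(12, 8.0)   # roughly 2.8x over a year
# Revised February 2026 estimate: doubling every 4.7 months.
new = growth_multiplier(12, 4.7)   # roughly 5.9x over a year

print(f"8-month doubling: {old:.1f}x/year; 4.7-month doubling: {new:.1f}x/year")
```

Under these assumptions, shaving the doubling time from 8 to 4.7 months roughly doubles the expected capability gain over a year, which is why a seemingly small revision matters so much for evaluation timelines.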
Independent validation and limits
Offensive security firm XBOW independently tested Mythos Preview with a team of ten experts. The model showed "unprecedented precision" in vulnerability detection, cutting false negatives by 42% compared to Anthropic's Opus 4.6, and by 55% when given additional source code access, per XBOW's report.

Mythos Preview found vulnerabilities in Chromium's V8 sandbox, an area where previous models produced only false positives. However, XBOW noted that removing live system access hurt performance more than removing source code access — the model's strength is code reading, not autonomous exploitation.
The structural read
The unique take here isn't that AI cyber capabilities are improving; it's that the rate of improvement is accelerating faster than safety agencies can measure it. AISI revised its doubling estimate twice in five months, then watched both Mythos and GPT-5.5 blow past the latest projection. This creates a measurement problem: if the yardstick keeps moving, how do you know when to sound the alarm? Anthropic's head of red teaming, Logan Graham, told The Decoder: "Within a year, Mythos will probably look quite dumb."

What to watch
AISI is building harder evaluations with active defenses. Watch for its next benchmark release — likely within 90 days — and whether Mythos's successor or GPT-5.5 can clear those as well. Also monitor XBOW's next independent red-teaming report for real-world exploit success rates.