Claude Mythos Clears All UK Cyberattack Simulations, Doubling-Time Estimate Revised

Claude Mythos Preview became the first AI model to clear all UK AISI cyberattack simulations, as the agency revised its capability doubling-time estimate twice in five months.

Ala SMITH & AI Research Desk · 9h ago · 3 min read · AI-Generated

Source: the-decoder.com, via the_decoder, ai_business, the_verge_tech

Which AI model first cleared all UK AI Security Institute cyberattack simulations?

Anthropic's Claude Mythos Preview became the first AI model to clear all cyberattack simulations from the UK AI Security Institute, completing a 32-step attack in 6 of 10 attempts and solving the 'Cooling Tower' industrial control simulation that no prior model had passed.

TL;DR

Claude Mythos is the first model to clear all AISI cyber ranges. · The UK agency revised its capability doubling-time estimate twice. · Mythos cut false negatives by 42% vs Opus 4.6.

Anthropic's Claude Mythos Preview cleared all cyberattack simulations from the UK AI Security Institute (AISI) — the first model to do so. The milestone forced AISI to revise its estimate of AI cyber capability doubling time from 8 months in November 2025 to 4.7 months by February 2026.

Key facts

Claude Mythos cleared all AISI cyber ranges — first model to do so.
AISI revised capability doubling time from 8 to 4.7 months in five months.
Mythos completed 32-step attack in 6 of 10 attempts.
Solved 'Cooling Tower' simulation no prior model passed.
XBOW found 42% false negative reduction vs Opus 4.6.
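
To make the doubling-time revision concrete, here is a minimal sketch of what the two figures imply over one year. It assumes clean exponential growth in whatever capability measure AISI tracks, which is a simplification; the function name `growth_factor` and the 12-month horizon are illustrative choices, not AISI's method.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Capability multiplier after `months`, assuming pure exponential growth."""
    return 2.0 ** (months / doubling_time_months)


# Doubling times reported in the article, in months.
OLD_DOUBLING = 8.0
NEW_DOUBLING = 4.7

if __name__ == "__main__":
    for label, doubling in (("8-month doubling", OLD_DOUBLING),
                            ("4.7-month doubling", NEW_DOUBLING)):
        # 12 months is an illustrative horizon, not a figure from AISI.
        print(f"{label}: x{growth_factor(12, doubling):.2f} per year")
```

Under these assumptions, the shorter doubling time roughly doubles the implied yearly multiplier (from a little under 3x to a little under 6x).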

The capability acceleration

AISI's cyber ranges simulate complex multi-step attacks. One range models a 32-step corporate network intrusion that human experts need about 20 hours to complete, according to AISI. Claude Mythos Preview finished the full attack in 6 out of 10 attempts, up from 3 out of 10 for the earlier Mythos checkpoint. The model also solved "Cooling Tower," an industrial control system simulation, in 3 out of 10 attempts — no other model had ever passed this simulation [According to AISI's report].
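
Ten attempts per range is a small sample, so the success rates above carry wide uncertainty. The sketch below computes a 95% Wilson score interval for each reported result. This is an added illustration, not part of AISI's analysis, and it assumes the attempts are independent with a fixed success probability; `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = successes / attempts
    denom = 1.0 + z * z / attempts
    center = (p + z * z / (2 * attempts)) / denom
    half = z * math.sqrt(p * (1 - p) / attempts + z * z / (4 * attempts ** 2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Success counts as reported in the article (out of 10 attempts each).
    results = {
        "Mythos Preview, 32-step range": 6,
        "Earlier checkpoint, 32-step range": 3,
        "Mythos Preview, Cooling Tower": 3,
    }
    for name, wins in results.items():
        low, high = wilson_interval(wins, 10)
        print(f"{name}: {wins}/10, 95% interval {low:.2f} to {high:.2f}")
```

With only ten attempts, the intervals for 6 of 10 and 3 of 10 overlap, so the jump between checkpoints is suggestive rather than statistically decisive on this range alone.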

OpenAI's GPT-5.5 also "substantially exceeded" the accelerated timeline, per AISI, though it did not clear all ranges.

Independent validation and limits

Offensive security firm XBOW independently tested Mythos Preview with a team of ten experts. The model showed "unprecedented precision" in vulnerability detection, cutting false negatives by 42% compared to Anthropic's Opus 4.6, and by 55% with additional source code access [XBOW report].
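
The 42% and 55% figures read most naturally as relative reductions in missed vulnerabilities, though the article does not say so explicitly. The sketch below shows what that would mean for a made-up baseline: the 100 real vulnerabilities and 20 baseline misses are hypothetical numbers chosen for illustration, not XBOW data.

```python
def reduced_misses(baseline_misses: float, relative_reduction: float) -> float:
    """Misses remaining after a relative reduction (0.42 means 42% fewer)."""
    return baseline_misses * (1.0 - relative_reduction)


def find_to_miss_odds(total: float, misses: float) -> float:
    """Ratio of successful finds to misses."""
    return (total - misses) / misses


if __name__ == "__main__":
    TOTAL_VULNS = 100.0      # hypothetical, not from XBOW
    BASELINE_MISSES = 20.0   # hypothetical baseline miss count, not from XBOW

    print(f"baseline: odds {find_to_miss_odds(TOTAL_VULNS, BASELINE_MISSES):.2f}")
    for label, cut in (("42% fewer misses", 0.42), ("55% fewer misses", 0.55)):
        misses = reduced_misses(BASELINE_MISSES, cut)
        odds = find_to_miss_odds(TOTAL_VULNS, misses)
        print(f"{label}: {misses:.1f} misses, odds {odds:.2f}")
```

The point of the sketch is that a relative cut in misses can move the find-to-miss ratio sharply even when the absolute number of extra finds is modest.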

[Figure: bar chart comparing AI models on vulnerability detection odds. The y-axis shows the ratio of successful finds to misses, scaled from 0 to 12.5.]

Mythos Preview found vulnerabilities in Chromium's V8 sandbox, an area where previous models produced only false positives. However, XBOW noted that removing live system access hurt performance more than removing source code access — the model's strength is code reading, not autonomous exploitation.

The structural read

The key point is not that AI cyber capabilities are improving; it is that the rate of improvement is accelerating faster than safety agencies can measure. AISI revised its doubling estimate twice in five months, then watched both Mythos and GPT-5.5 blow past the latest projection. This creates a measurement problem: if the yardstick keeps moving, how do you know when to sound the alarm? Anthropic's head of red teaming, Logan Graham, told The Decoder: "Within a year, Mythos will probably look quite dumb."

[Figure: line chart tracking completed steps on AISI's…]

What to watch

AISI is building harder evaluations with active defenses. Watch for its next benchmark release — likely within 90 days — and whether Mythos's successor or GPT-5.5 can clear those as well. Also monitor XBOW's next independent red-teaming report for real-world exploit success rates.

[Figure: line chart showing AISI cybersecurity time horizons on a log scale. The x-axis shows model release dates from early 2025 to late 2026.]

Sources cited in this article

AISI report on Claude Mythos Preview
XBOW report

AI-assisted reporting. Generated by gentic.news from 4 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

The story here isn't that Mythos is good at hacking; it's that the measurement system itself is breaking. AISI's doubling-time revision from 8 to 4.7 months in five months suggests either the agency's initial estimates were wildly conservative, or the capability curve is genuinely super-exponential. The fact that both Mythos and GPT-5.5 exceeded even the revised timeline points to the latter.

XBOW's findings are more nuanced than the headline. Mythos's strength is code reading, not autonomous exploitation. Removing live system access hurt performance more than removing source code access, meaning the model still needs a human-in-the-loop for real-world attacks. This mirrors the pattern seen in coding benchmarks where models excel at static analysis but struggle with dynamic environments.

Graham's comment that Mythos will 'look quite dumb' within a year is either refreshing honesty or strategic positioning. If Anthropic believes capability gains are still accelerating, it implies their next model will make Mythos look like a toy, which would make current safety evaluations irrelevant. The contrarian read: AISI may need to abandon fixed benchmarks entirely and switch to relative capability tracking against the SOTA model at time of test.
