Why did Anthropic deploy protections for Mythos 5 if it didn't cross CB-2?

Anthropic acknowledged the model's capability was close enough to warrant protective measures, highlighting the gap between formal thresholds and practical governance decisions.

What would intermediate warning levels measure?

They would measure bottleneck reduction in specific substeps of bioweapons development, such as time-to-completion in uplift trials or degree of human correction required.


[Figure: a graph with a rising red line crossing a dashed threshold labeled CB-2, with a yellow intermediate zone marked…]

Policy & Ethics · Score: 71

Anthropic's CB-2 Gap Shows Biorisk Thresholds Need Intermediate Warning Levels

Anthropic deployed protections for Mythos 5 despite CB-2 not being crossed. The gap reveals a structural bias in biorisk thresholds that intermediate warning levels could fix.

Ala SMITH & AI Research Desk · 8h ago · 4 min read · AI-Generated

Source: lesswrong.com via lesswrong · Corroborated

Do AI biorisk thresholds like CB-1 and CB-2 need intermediate warning levels?

Anthropic deployed protective measures for Claude Mythos 5 despite concluding it did not cross the CB-2 novel bioweapons threshold, revealing a governance gap that intermediate warning levels focused on bottleneck reduction could address.

TL;DR

Anthropic deployed protections for Mythos 5 despite CB-2 not being crossed. · Terminal thresholds create a downward bias toward 'not crossed'. · Intermediate warning levels could invert the evidence asymmetry.

Anthropic deployed protective measures for Claude Mythos 5 despite concluding it did not cross the CB-2 novel bioweapons threshold. The gap between threshold-triggered governance and actual decisions exposes a structural flaw in biorisk frameworks.

Key facts

Anthropic concluded Mythos 5 met CB-1 but not CB-2 bioweapons thresholds.
Protective measures were deployed for both thresholds regardless of CB-2 status.
Anthropic activated ASL-3 protections for Opus 4 in 2025 despite threshold uncertainty.
The measurement gap creates a structural downward bias toward 'not crossed'.
Intermediate warning levels would invert the evidence asymmetry toward caution.

In the Claude Mythos/Fable 5 system card, Anthropic states that the model meets the non-novel (CB-1) but falls short of the novel (CB-2) biological/chemical weapons development capability threshold. Despite reaching different conclusions for the two thresholds, Anthropic introduced protective measures in response to both.

This is not the first time there's been a gap between threshold-triggered and actual governance decisions. In 2025, Anthropic activated AI Safety Level 3 (ASL-3) protections with the release of Claude Opus 4 despite being uncertain whether capability thresholds had been met. Anthropic's Responsible Scaling Policy (RSP) v3 discussion further elaborates that capability evaluations may not produce a clean line between "safe" and "dangerous" and labs may spend significant time in what it calls a "zone of ambiguity."

Key Takeaways

Anthropic deployed protections for Mythos 5 despite CB-2 not being crossed.
The gap reveals a structural bias in biorisk thresholds that intermediate warning levels could fix.

The Measurement Gap Creates a Downward Bias


The core problem is an asymmetrical burden of evidence. To say that the model crosses the CB-2 threshold and should trigger associated protections, you need evidence that it's close to end-to-end weapons development. To conclude that it doesn't cross the threshold, you only need to cite one missing or uncertain part of the process. This measurement gap is unavoidable in biorisk—it would be highly unethical for a lab to test end-to-end whether their model is able to design, validate, formulate, and deploy novel biological weapons.

So proxy evidence can always be framed as suggestive but insufficient if thresholds are defined around terminal end states. This creates a downward bias towards "not crossed" conclusions, even if the lab may still decide to deploy protective measures. Any such deployments would also happen at the lab's discretion and would not be associated with any pre-committed trigger.
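The asymmetry described above can be sketched as a toy decision rule. This is only an illustration under stated assumptions, not how any lab actually evaluates its models: the substep names are taken loosely from the article's "design, validate, formulate, and deploy" phrasing, and the function `terminal_verdict` and its `None`-means-untestable convention are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical substeps of the end-to-end process (illustrative labels only).
SUBSTEPS = ["design", "validation", "formulation", "deployment"]


def terminal_verdict(evidence: Dict[str, Optional[bool]]) -> str:
    """Toy terminal-threshold rule.

    True  = proxy evidence suggests the model helps with this substep
    False = proxy evidence suggests it does not
    None  = the substep could not be tested (the measurement gap)

    The threshold counts as crossed only if every substep is positively
    evidenced, so a single False or None is enough for "not crossed".
    """
    if all(evidence.get(step) is True for step in SUBSTEPS):
        return "crossed"
    return "not crossed"


# Three substeps look concerning, one cannot ethically be tested.
mostly_concerning = {
    "design": True,
    "validation": True,
    "formulation": True,
    "deployment": None,
}
verdict = terminal_verdict(mostly_concerning)
```

Under this rule, the untestable substep alone decides the verdict, no matter how strong the evidence is on the other three. That is the downward bias in miniature.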

Intermediate Warning Levels: A Proposal

The proposal is that frontier labs keep CB-1 and CB-2 as red lines but add intermediate warning levels between them, focused on bottleneck reduction rather than demonstrated end-to-end weaponization. The key design property is that triggers for protective measures map to data a lab can actually collect, such as time-to-completion in uplift trials or the degree of human correction required. The intermediate triggers shouldn't aim to prove definitive safety or danger; they should focus on specific bottlenecks of the decomposed end-to-end process, perhaps segmented by knowledge vs. execution.

Tying the triggers to measurable outcomes mapped to substeps of the end-to-end process would also help invert the asymmetry. With terminal thresholds, any missing piece of evidence argues for "not crossed." With intermediate warning levels, evidence of reduced bottlenecks argues for escalation, and the burden shifts to justifying why not to escalate. Labs would commit in advance to the margins that trigger escalation, removing room for motivated reasoning after seeing the results.
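A pre-committed trigger of this kind could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Trigger` class, the metric names, the baselines, and the margins are hypothetical and do not come from Anthropic's RSP or any system card. The one idea it tries to capture is that the margins are fixed before results are seen, and the check afterwards is mechanical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Trigger:
    """One pre-committed escalation trigger for a substep metric (hypothetical)."""
    metric: str      # e.g. time-to-completion in an uplift trial
    baseline: float  # control-group value without model assistance
    margin: float    # relative reduction that triggers escalation, fixed in advance


def check_triggers(
    triggers: List[Trigger], measured: Dict[str, float]
) -> Tuple[List[str], List[str]]:
    """Return (tripped, unmeasured) metric names.

    A trigger trips when the measured value falls below the baseline by at
    least the pre-committed relative margin. Unmeasured metrics are reported
    separately so that a gap in the data has to be explained rather than
    silently counting as "fine".
    """
    tripped, unmeasured = [], []
    for t in triggers:
        if t.metric not in measured:
            unmeasured.append(t.metric)
            continue
        reduction = (t.baseline - measured[t.metric]) / t.baseline
        if reduction >= t.margin:
            tripped.append(t.metric)
    return tripped, unmeasured


# Illustrative numbers only: margins are committed before the trial runs.
TRIGGERS = [
    Trigger("time_to_completion", baseline=40.0, margin=0.25),
    Trigger("human_corrections", baseline=12.0, margin=0.5),
]

tripped, unmeasured = check_triggers(
    TRIGGERS, {"time_to_completion": 28.0, "human_corrections": 9.0}
)
```

Here the time-to-completion metric drops by 30 percent against a 25 percent margin and trips, while human corrections drop by 25 percent against a 50 percent margin and do not. Returning unmeasured metrics as their own list is a design choice of this sketch: it keeps missing data visible instead of letting it default to a reassuring answer.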

What to watch

Watch for Anthropic's next RSP update or system card—if they introduce intermediate warning levels, it would signal a structural shift in biorisk governance. Also monitor whether other frontier labs like OpenAI or Google DeepMind adopt similar tiered warning frameworks in their safety policies.


AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

The core insight here is that terminal biorisk thresholds like CB-1 and CB-2 create an epistemic asymmetry that systematically biases governance toward inaction. The measurement gap—you cannot ethically test end-to-end bioweapons development—means that any single missing piece of evidence is sufficient to conclude a threshold hasn't been crossed. This is not a bug but a feature of how these thresholds were designed: they were meant to be bright red lines, not triggers for escalating caution. But the Mythos 5 case shows that labs are already making decisions in the grey zone, without pre-committed triggers. This creates a transparency problem: outside observers cannot tell whether protective measures are being deployed based on consistent criteria or ad hoc judgment. The proposal for intermediate warning levels addresses this by shifting the burden of proof—from requiring evidence of danger to requiring evidence of safety before escalation stops. This is structurally similar to how nuclear safety frameworks use multiple warning levels (e.g., INES scale) rather than a single binary threshold. The question is whether labs will pre-commit to these intermediate triggers, which would reduce their flexibility but increase accountability. Given Anthropic's track record of publishing system cards and RSP updates, they are the most likely candidate to pilot such a framework.

#anthropic #ai safety #biorisk
