gentic.news — AI News Intelligence Platform
A graph with a rising red line crossing a dashed threshold labeled CB-2, with a yellow intermediate zone marked…

Anthropic's CB-2 Gap Shows Biorisk Thresholds Need Intermediate Warning Levels

Anthropic deployed protections for Mythos 5 despite CB-2 not being crossed. The gap reveals a structural bias in biorisk thresholds that intermediate warning levels could fix.

8h ago · 4 min read · AI-Generated
Source: lesswrong.com (via lesswrong, corroborated)
Do AI biorisk thresholds like CB-1 and CB-2 need intermediate warning levels?

Anthropic deployed protective measures for Claude Mythos 5 despite concluding it did not cross the CB-2 novel bioweapons threshold, revealing a governance gap that intermediate warning levels focused on bottleneck reduction could address.

TL;DR

Anthropic deployed protections for Mythos 5 even though CB-2 was not crossed. · Terminal thresholds create a downward bias toward 'not crossed'. · Intermediate warning levels could invert the evidence asymmetry.

Anthropic deployed protective measures for Claude Mythos 5 despite concluding it did not cross the CB-2 novel bioweapons threshold. The gap between threshold-triggered governance and actual decisions exposes a structural flaw in biorisk frameworks.

Key facts

  • Anthropic concluded Mythos 5 met CB-1 but not CB-2 bioweapons thresholds.
  • Protective measures were deployed for both thresholds regardless of CB-2 status.
  • Anthropic activated ASL-3 protections for Opus 4 in 2025 despite threshold uncertainty.
  • The measurement gap creates a structural downward bias toward 'not crossed'.
  • Intermediate warning levels would invert the evidence asymmetry toward caution.

In the Claude Mythos/Fable 5 system card, Anthropic states that the model meets non-novel (CB-1) but falls short of novel (CB-2) biological/chemical weapons development capability thresholds. Despite this difference in conclusions, they introduce protective measures in response to both.

This is not the first gap between threshold-triggered governance and actual decisions. In 2025, Anthropic activated AI Safety Level 3 (ASL-3) protections with the release of Claude Opus 4 despite being uncertain whether capability thresholds had been met. Anthropic's Responsible Scaling Policy (RSP) v3 discussion further elaborates that capability evaluations may not produce a clean line between "safe" and "dangerous," and that labs may spend significant time in what it calls a "zone of ambiguity."

The Measurement Gap Creates a Downward Bias

The core problem is an asymmetric burden of evidence. To conclude that the model crosses the CB-2 threshold and should trigger the associated protections, you need evidence that it is close to end-to-end weapons development. To conclude that it does not cross the threshold, you only need to cite one missing or uncertain part of the process. This measurement gap is unavoidable in biorisk: it would be highly unethical for a lab to test end-to-end whether its model can design, validate, formulate, and deploy novel biological weapons.

So proxy evidence can always be framed as suggestive but insufficient if thresholds are defined around terminal end states. This creates a downward bias towards "not crossed" conclusions, even if the lab may still decide to deploy protective measures. Any such deployments would also happen at the lab's discretion and would not be associated with any pre-committed trigger.
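The asymmetry can be made concrete with a toy model. The sketch below is illustrative only: the placeholder step labels, the three evidence categories, and the `terminal_threshold_crossed` function are assumptions introduced here, not anything taken from Anthropic's system card or RSP. It treats a terminal threshold as a conjunction over steps, so a single uncertain step is enough to yield "not crossed."

```python
from enum import Enum


class Evidence(Enum):
    DEMONSTRATED = "demonstrated"
    UNCERTAIN = "uncertain"  # proxy evidence: suggestive but insufficient
    ABSENT = "absent"


def terminal_threshold_crossed(steps: dict[str, Evidence]) -> bool:
    """Hypothetical terminal rule: crossed only if every step is demonstrated."""
    return all(e is Evidence.DEMONSTRATED for e in steps.values())


# Placeholder step labels (assumed); only one of four steps is uncertain.
assessment = {
    "step_1": Evidence.DEMONSTRATED,
    "step_2": Evidence.DEMONSTRATED,
    "step_3": Evidence.DEMONSTRATED,
    "step_4": Evidence.UNCERTAIN,
}

crossed = terminal_threshold_crossed(assessment)
print("crossed" if crossed else "not crossed")  # prints "not crossed"
```

Under this assumed rule, strong evidence on three of four steps carries no weight in the final conclusion, which is the downward bias described above.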

Intermediate Warning Levels: A Proposal

The proposal is that frontier labs keep terminal thresholds such as CB-1 and CB-2 as red lines but add intermediate warning levels between them, focused on bottleneck reduction rather than demonstrated end-to-end weaponization. The key design property is that triggers for protective measures map to data a lab can actually collect, such as time-to-completion in uplift trials or the degree of human correction required. The intermediate triggers should not aim to prove definitive safety or danger. Instead, they should focus on specific bottlenecks of the decomposed end-to-end process, perhaps segmented by knowledge vs. execution.

Tying the triggers to measurable outcomes mapped to substeps of the end-to-end process would also help invert the asymmetry. With terminal thresholds, missing-piece evidence argues for "not crossed." With intermediate warning levels, evidence of a reduced bottleneck argues for escalation, and the burden shifts to justifying why not to escalate. Labs would commit in advance to the margins that would trigger escalation, removing room for motivated reasoning after seeing the results.
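One way to picture pre-committed intermediate triggers is as a small table of margins fixed before any evaluation is run. The following is a minimal sketch under stated assumptions: the metric names, the numeric margins, and the `Trigger`, `fired_triggers`, and `warning_level` names are all hypothetical, chosen only to mirror the examples in the text (time-to-completion in uplift trials, degree of human correction, knowledge vs. execution). Treating a missing measurement as a fired trigger is also an assumption of this sketch, used to illustrate the shifted burden of proof. None of this describes any lab's actual policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    """A margin committed to before results are seen (all values hypothetical)."""
    bottleneck: str  # "knowledge" or "execution"
    metric: str      # ratio of AI-assisted to control-group performance
    margin: float    # escalate when the observed ratio is at or below this


# Hypothetical pre-registered triggers; lower ratios mean a weaker bottleneck.
TRIGGERS = (
    Trigger("knowledge", "time_to_completion_ratio", 0.50),
    Trigger("execution", "human_corrections_ratio", 0.40),
)


def fired_triggers(observed: dict[str, float]) -> list[Trigger]:
    """Return triggers whose margin was met.

    Assumption: a missing measurement counts as fired, so the burden
    falls on justifying why not to escalate.
    """
    fired = []
    for trigger in TRIGGERS:
        value = observed.get(trigger.metric)
        if value is None or value <= trigger.margin:
            fired.append(trigger)
    return fired


def warning_level(observed: dict[str, float]) -> int:
    """Hypothetical warning level: the number of triggers that fired."""
    return len(fired_triggers(observed))


observed = {"time_to_completion_ratio": 0.45, "human_corrections_ratio": 0.70}
level = warning_level(observed)
print(level)  # prints 1: only the knowledge trigger fired
```

Because the margins live in a frozen table written before the trial, the only post hoc judgment left is whether to override an escalation, and that override would have to be argued explicitly.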

What to watch

Watch for Anthropic's next RSP update or system card—if they introduce intermediate warning levels, it would signal a structural shift in biorisk governance. Also monitor whether other frontier labs like OpenAI or Google DeepMind adopt similar tiered warning frameworks in their safety policies.


AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

The core insight here is that terminal biorisk thresholds like CB-1 and CB-2 create an epistemic asymmetry that systematically biases governance toward inaction. The measurement gap (you cannot ethically test end-to-end bioweapons development) means that any single missing piece of evidence is sufficient to conclude a threshold hasn't been crossed. This is not a bug but a feature of how these thresholds were designed: they were meant to be bright red lines, not triggers for escalating caution.

But the Mythos 5 case shows that labs are already making decisions in the grey zone, without pre-committed triggers. This creates a transparency problem: outside observers cannot tell whether protective measures are being deployed based on consistent criteria or ad hoc judgment. The proposal for intermediate warning levels addresses this by shifting the burden of proof, from requiring evidence of danger to requiring evidence of safety before escalation stops. This is structurally similar to how nuclear safety frameworks use multiple warning levels (e.g., INES scale) rather than a single binary threshold.

The question is whether labs will pre-commit to these intermediate triggers, which would reduce their flexibility but increase accountability. Given Anthropic's track record of publishing system cards and RSP updates, they are the most likely candidate to pilot such a framework.