Anthropic's 19-page Advanced AI Framework would require developers to report to a government agency, within 15 days, any model that subverts its own controls. Nowhere in the document is there a requirement for runtime safety properties, action gating, or independent certification of autonomous systems.
Key facts
- 19-page document defines Critical Safety Incident as model subverting controls
- 15 days to report such an incident to a government agency
- No runtime safety properties, action gating, or reversibility checks required
- Loss-of-control section admits resilience agenda is 'less mature'
- Internal Recursive Self-Improvement (RSI) memo warns of dangerous capability thresholds in 12-24 months
Anthropic published its Advanced AI Framework this week, a proposal for how governments should regulate frontier AI. The most revealing line sits in the definitions on page 4: a "Critical Safety Incident" includes a model using deceptive techniques against its own developer to subvert controls or monitoring. The required response is a report to a government agency within 15 days.
Paperwork, Not Firewalls

The framework attaches every obligation to developer conduct: documents, safety frameworks, system cards, risk reports, certifications, evaluators reviewing the reports, an agency reviewing the evaluators. According to the source analysis, "Nowhere in 19 pages is there a requirement that the systems themselves have any technical runtime properties, no action gating, no reversibility checks, no independent layer between what a model generates and what it executes." The loss-of-control section admits this gap, calling its resilience agenda "less mature" and pointing at detection and shutdown of systems already out of control. It is a smoke detector for a building with no fire code.
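To make "action gating" and "reversibility checks" concrete, here is a minimal sketch of what an independent layer between model output and execution might look like. Nothing here comes from the framework or from any real library: `ProposedAction`, `ALLOWED_ACTIONS`, `gate`, and `execute` are hypothetical names, and the policy (an allowlist plus human sign-off for irreversible actions) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass(frozen=True)
class ProposedAction:
    """An action the model has generated but not yet executed (hypothetical schema)."""
    name: str
    target: str
    # Assumed to be declared by the tool registry, never by the model itself.
    reversible: bool


# Hypothetical allowlist: the only action names the gate will ever let through.
ALLOWED_ACTIONS = {"read_file", "write_file", "send_email"}


def gate(action: ProposedAction, human_approved: bool = False) -> Verdict:
    """Decide whether a proposed action may run, independent of model intent."""
    # Action gating: anything outside the allowlist is refused outright.
    if action.name not in ALLOWED_ACTIONS:
        return Verdict.DENY
    # Reversibility check: irreversible actions need explicit human sign-off.
    if not action.reversible and not human_approved:
        return Verdict.DENY
    return Verdict.ALLOW


def execute(
    action: ProposedAction,
    runner: Callable[[ProposedAction], str],
    human_approved: bool = False,
) -> Optional[str]:
    """Run the action only if the gate allows it; return None when blocked."""
    if gate(action, human_approved) is Verdict.DENY:
        return None
    return runner(action)
```

The design point is that `gate` never consults the model: the decision depends only on properties fixed outside it, which is the kind of runtime property the source analysis says the framework does not require.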
The Aviation Precedent

Aviation hit this fork decades ago and chose differently. The FAA doesn't govern Boeing by collecting risk reports; it type-certifies the architecture. Envelope protection and fail-safe behavior are requirements the machine demonstrates before it flies, because pilot intent was never trusted to keep the plane in the envelope. The source argues that Anthropic imported aviation's incident-reporting culture and skipped its certification core. The steelman is that you can't certify against standards nobody has written: no airworthiness spec for autonomous systems exists yet. That's the gap. A frontier lab proposing governance frameworks is exactly who could write one. Until someone does, we're regulating the filings while the thing with the goal runs uncertified.
This framework arrives as Anthropic's internal Recursive Self-Improvement memo, as previously reported by gentic.news, outlines concrete timelines for AI reaching dangerous capability thresholds within 12-24 months. If that threat model is correct, the company's own timeline suggests the 15-day reporting window may be too slow.
What to watch
Watch for whether Anthropic or another frontier lab publishes a technical safety specification — a runtime certification standard akin to aviation's type-certification — before the next major model release or IPO. The gap between incident reporting and architectural safety will be tested by the first real Critical Safety Incident.
Source: reddit.com
[Updated 13 Jun via simon_willison]
Separately, Anthropic reversed a controversial policy in its Fable 5 system card that silently limited responses to researchers probing frontier LLM development. The company apologized and will now visibly fall back to Opus 4.8 for flagged requests, with API refusal reasons coming in days, per Wired. The reversal highlights tension between rapid deployment and transparent safeguards, a theme that echoes the framework's own gaps.