A detailed analysis published on LessWrong argues that Anthropic likely failed to comply with a key provision of its own Responsible Scaling Policy (RSP) when it released its advanced AI model, Claude Mythos Preview, to external partners. The core allegation is that Anthropic did not publish a required "discussion" of the new model's effect on its risk analysis, a step mandated by its RSP for certain deployments.
The Alleged Violation
Anthropic's RSP version 3.0, which was in effect at the time, includes a Timing provision (Section 3.1) that states:
"When we publicly deploy a model that we determine is significantly more capable than any of the models covered in the most recent Risk Report, we will publish a discussion (in our System Card or elsewhere) of how that model’s capabilities and propensities affect or change analysis in the Risk Report."
The analysis hinges on the definition of "public deployment." Anthropic publicly announced Project Glasswing and the broader availability of Claude Mythos Preview on April 7, 2026. In its accompanying System Card, Anthropic explicitly states that "Claude Mythos Preview is significantly more capable than Claude Opus 4.6," the model covered in its last Risk Report. This would trigger the RSP's publication requirement.
However, evidence suggests external access began weeks earlier. Anthropic's own Alignment Risk Update for Mythos states: "It was deployed first within Anthropic, then released to a small set of external customers via a limited research access program."
The Project Glasswing announcement clarifies who these early users were: a group of launch partners including Amazon Web Services, Apple, Google, Microsoft, NVIDIA, and JPMorganChase. The announcement notes that "many of our partners have already been using Claude Mythos Preview for several weeks."
The central argument is that if releasing the model to "40 additional organizations" on April 7 constituted a public deployment (as Anthropic's System Card implies), then the earlier, more limited release to major tech and finance corporations must also qualify; the scale differs, but the external availability does not. No "discussion" of Mythos's impact on Anthropic's risk analysis was published at that earlier point, which the analysis contends violates the RSP's requirement to publish at the time of public deployment.
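To make the trigger logic concrete, here is a minimal sketch of the publication rule as the analysis reads it. Everything in it is an assumption for illustration: the function, the field names, and the placeholder date for the partner release (Anthropic has said only that it happened "several weeks" before April 7) are not drawn from Anthropic's actual tooling or records.

```python
from datetime import date

# Hypothetical encoding of RSP 3.0, Section 3.1, as the analysis reads it.
# Names and the partner-release date are illustrative placeholders.

def discussion_due(deployed_on: date, is_public_deployment: bool,
                   significantly_more_capable: bool) -> date | None:
    """Return the date the Risk Report discussion is owed, if any."""
    if is_public_deployment and significantly_more_capable:
        return deployed_on  # owed when the model is publicly deployed
    return None

# The System Card treats the April 7 release to "40 additional
# organizations" as a public deployment of a significantly more
# capable model, so a discussion was owed that day:
print(discussion_due(date(2026, 4, 7), True, True))   # 2026-04-07

# The analysis argues the earlier launch-partner release must qualify
# on the same reasoning, moving the obligation several weeks earlier
# (exact date not public; placeholder used here):
print(discussion_due(date(2026, 3, 16), True, True))  # 2026-03-16
```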
A Second, Ambiguous Requirement
The RSP also has a second, related requirement for internally deployed models that are deemed "in-scope," meaning they pose significant risks beyond those of Anthropic's publicly deployed models. For these, Anthropic must publish the required discussion within 30 days of determining that the model is in-scope.
The analysis finds this point less clear-cut. It's ambiguous whether Anthropic formally determined Mythos to be an "in-scope" internal model during its initial internal deployment phase. The author speculates Anthropic would likely argue it was in the clear on this 30-day rule, but that the case for violating the public deployment rule is stronger.
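The contrast matters: the 30-day rule is a deadline rather than a point-in-time trigger. A short sketch of the arithmetic, with a hypothetical determination date, shows why Anthropic might plausibly claim compliance here even while the public-deployment clause was missed:

```python
from datetime import date, timedelta

# Hypothetical: if Anthropic determined Mythos was an in-scope internal
# model on some date D, the discussion would be due by D + 30 days.
determined_in_scope_on = date(2026, 3, 10)  # placeholder, not a known date
deadline = determined_in_scope_on + timedelta(days=30)
print(deadline)  # 2026-04-09; the April 7 System Card would land inside this window
```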
The Implications: Process Over Catastrophe?
The author concludes that this specific oversight likely has minimal direct impact on existential risk (x-risk). The substantive risk analysis of Mythos was eventually published in its System Card. The greater concern is organizational process failure.
"In some sense this is all nitpicking over details that I don't think matter much for x-risk, except to the extent that it reveals an organizational inability to create and follow a checklist. That is a bad skill to be lacking."
The incident highlights ambiguities in the RSP framework itself, particularly around definitions like "public deployment." The analysis suggests these gaps should be fixed to prevent similar disputes and ensure clearer accountability.
gentic.news Analysis
This incident is a critical stress test for the AI safety governance frameworks that leading labs have adopted. Anthropic's RSP, alongside OpenAI's Preparedness Framework, represents a core part of the industry's move toward formalized self-regulation. A potential violation, even a procedural one, undermines the credibility of these voluntary commitments. It provides concrete evidence for critics who argue self-governance is insufficient and bolsters calls for external, mandatory oversight.
The context is important. This follows Anthropic's high-profile 2024 launches of Claude 3.5 Sonnet and Claude 3.5 Haiku, which solidified its position as a top-tier model provider. The release of Claude Mythos in early 2026 marked another significant capability jump, aimed squarely at competitive benchmarks in reasoning and coding. In this heated competitive landscape, where Anthropic, OpenAI, and Google DeepMind are racing to deploy increasingly capable models, the pressure to ship and secure strategic partnerships (like those with AWS, Google, and Microsoft noted in Project Glasswing) can conflict with deliberate safety pacing. This analysis suggests that pressure may have caused an internal process to slip.
The entities involved are notable. The "launch partners" who received early access are among the most powerful tech and financial institutions globally. If the goal of an RSP is to mitigate risk from powerful AI, an argument can be made that releasing a "significantly more capable" model to these partners first represents a higher-risk deployment than a broader, more public release later, due to the partners' resources and potential for rapid integration. This irony underscores the complexity of defining "risk" in practice.
For practitioners, this is a case study in implementation governance. It's one thing to publish a policy document; it's another to have the internal tracking, legal review, and release gatekeeping to ensure compliance. As models move from research to complex commercial deployment pathways (limited previews, partner APIs, enterprise contracts), labs need robust internal systems to map these deployments onto their safety policies. This analysis indicates a possible shortfall at Anthropic on that front.
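As a thought experiment, a release gate of the kind described above might look something like the sketch below. All names, fields, and the policy encoded here are assumptions for illustration; nothing is drawn from Anthropic's actual internal systems.

```python
from dataclasses import dataclass

# Hypothetical release gate. The Release fields and the policy check are
# illustrative assumptions, not Anthropic's internal tooling.

@dataclass
class Release:
    model: str
    channel: str  # e.g. "internal", "partner-preview", "public-api"
    reaches_external_users: bool
    more_capable_than_risk_report_baseline: bool
    risk_discussion_published: bool

def gate(release: Release) -> list[str]:
    """Return blocking findings; an empty list means the release may proceed."""
    findings = []
    # Conservatively treat ANY external availability as triggering the
    # public-deployment clause, so limited partner previews cannot slip
    # past the check the way the analysis says Mythos did.
    if (release.reaches_external_users
            and release.more_capable_than_risk_report_baseline
            and not release.risk_discussion_published):
        findings.append(
            f"{release.model}: publish the RSP risk discussion before "
            f"release via channel '{release.channel}'"
        )
    return findings

# Example: a limited partner preview of a more capable model with no
# published discussion would be blocked at the gate.
preview = Release(
    model="claude-mythos-preview",
    channel="partner-preview",
    reaches_external_users=True,
    more_capable_than_risk_report_baseline=True,
    risk_discussion_published=False,
)
for finding in gate(preview):
    print(finding)
```

The design choice doing the work is in the conditional: by keying the check to external availability rather than to a named "public launch" event, the gate sidesteps exactly the definitional ambiguity around "public deployment" that this dispute turns on.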
Frequently Asked Questions
What is Anthropic's Responsible Scaling Policy (RSP)?
Anthropic's RSP is a public commitment detailing how the company plans to manage risks as it develops increasingly capable AI models. It outlines specific safety thresholds, evaluation procedures, and deployment restrictions tied to model capabilities. The policy is intended to provide transparency and a structured approach to AI safety.
What was Claude Mythos Preview?
Claude Mythos Preview was an advanced AI model released by Anthropic in early 2026 as part of "Project Glasswing," an initiative focused on cybersecurity. It was described as significantly more capable than its predecessor, Claude Opus 4.6, particularly in complex reasoning and code analysis tasks for security applications.
Did Anthropic publish any safety analysis for Claude Mythos?
Yes, Anthropic published a detailed System Card for Claude Mythos Preview on April 7, 2026, which included extensive safety evaluations, an Alignment Risk Update, and a section on "RSP decision-making." The controversy is about the timing of this publication relative to the model's earliest external releases, not a complete absence of analysis.
Why does this procedural issue matter if the model was eventually analyzed?
While the direct risk may have been mitigated, procedural failures erode trust in a company's safety commitments. If a lab cannot reliably follow its own publicly stated rules for a high-profile release, it raises doubts about its ability to enforce stricter, more critical safeguards in higher-stakes future scenarios. It also highlights ambiguities in safety frameworks that need clarification.