A detailed analysis published on LessWrong argues that Anthropic likely failed to comply with a key provision of its own Responsible Scaling Policy (RSP) when it released its advanced AI model, Claude Mythos Preview, to external partners. The core allegation is that Anthropic did not publish a required "discussion" of the new model's effect on its risk analysis, a step mandated by its RSP for certain deployments.
The Alleged Violation
Anthropic's RSP version 3.0, which was in effect at the time, includes a Timing provision (Section 3.1) that states:
"When we publicly deploy a model that we determine is significantly more capable than any of the models covered in the most recent Risk Report, we will publish a discussion (in our System Card or elsewhere) of how that model’s capabilities and propensities affect or change analysis in the Risk Report."
The analysis hinges on the definition of "public deployment." Anthropic publicly announced Project Glasswing and the broader availability of Claude Mythos Preview on April 7, 2026. In its accompanying System Card, Anthropic explicitly states that "Claude Mythos Preview is significantly more capable than Claude Opus 4.6," the model covered in its last Risk Report. This would trigger the RSP's publication requirement.
However, evidence suggests external access began weeks earlier. Anthropic's own Alignment Risk Update for Mythos states: "It was deployed first within Anthropic, then released to a small set of external customers via a limited research access program."
The Project Glasswing announcement clarifies who these early users were: a group of launch partners including Amazon Web Services, Apple, Google, Microsoft, NVIDIA, and JPMorganChase. The announcement notes that "many of our partners have already been using Claude Mythos Preview for several weeks."
The central argument is that if releasing the model to "40 additional organizations" on April 7 constituted a public deployment (as Anthropic's System Card implies), then the earlier, more limited release to major tech and finance corporations must also qualify; the scale differs, but the external availability does not. No "discussion" of Mythos's impact on Anthropic's risk analysis was published at that earlier point, which the analysis contends violates the RSP's requirement to publish at the time of public deployment.
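To make the trigger logic concrete, here is a minimal sketch of the publication rule as the analysis reads it. Everything in it is an assumption for illustration: the function, the field names, and the placeholder date for the partner release (Anthropic has said only that it happened "several weeks" before April 7) are not drawn from Anthropic's actual tooling or records.

```python
from datetime import date

# Hypothetical encoding of RSP 3.0, Section 3.1, as the analysis reads it.
# Names and the partner-release date are illustrative placeholders.

def discussion_due(deployed_on: date, is_public_deployment: bool,
                   significantly_more_capable: bool) -> date | None:
    """Return the date the Risk Report discussion is owed, if any."""
    if is_public_deployment and significantly_more_capable:
        return deployed_on  # owed when the model is publicly deployed
    return None

# The System Card treats the April 7 release to "40 additional
# organizations" as a public deployment of a significantly more
# capable model, so a discussion was owed that day:
print(discussion_due(date(2026, 4, 7), True, True))   # 2026-04-07

# The analysis argues the earlier launch-partner release must qualify
# on the same reasoning, moving the obligation several weeks earlier
# (exact date not public; placeholder used here):
print(discussion_due(date(2026, 3, 16), True, True))  # 2026-03-16
```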
A Second, Ambiguous Requirement
The RSP also has a second, related requirement for internally deployed models that are deemed "in-scope," meaning they pose significant risks beyond those of Anthropic's publicly deployed models. For these, Anthropic must publish the required discussion within 30 days of determining that the model is in-scope.
The analysis finds this point less clear-cut. It's ambiguous whether Anthropic formally determined Mythos to be an "in-scope" internal model during its initial internal deployment phase. The author speculates Anthropic would likely argue it was in the clear on this 30-day rule, but that the case for violating the public deployment rule is stronger.
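The contrast matters: the 30-day rule is a deadline rather than a point-in-time trigger. A short sketch of the arithmetic, with a hypothetical determination date, shows why Anthropic might plausibly claim compliance here even while the public-deployment clause was missed:

```python
from datetime import date, timedelta

# Hypothetical: if Anthropic determined Mythos was an in-scope internal
# model on some date D, the discussion would be due by D + 30 days.
determined_in_scope_on = date(2026, 3, 10)  # placeholder, not a known date
deadline = determined_in_scope_on + timedelta(days=30)
print(deadline)  # 2026-04-09; the April 7 System Card would land inside this window
```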
The Implications: Process Over Catastrophe?
The author concludes that this specific oversight likely has minimal direct impact on existential risk (x-risk). The substantive risk analysis of Mythos was eventually published in its System Card. The greater concern is organizational process failure.
"In some sense this is all nitpicking over details that I don't think matter much for x-risk, except to the extent that it reveals an organizational inability to create and follow a checklist. That is a bad skill to be lacking."
The incident highlights ambiguities in the RSP framework itself, particularly around definitions like "public deployment." The analysis suggests these gaps should be fixed to prevent similar disputes and ensure clearer accountability.
gentic.news Analysis
This incident is a critical stress test for the AI safety governance frameworks that leading labs have adopted. Anthropic's RSP, alongside OpenAI's Preparedness Framework, represents a core part of the industry's move toward formalized self-regulation. A potential violation, even a procedural one, undermines the credibility of these voluntary commitments. It provides concrete evidence for critics who argue self-governance is insufficient and bolsters calls for external, mandatory oversight.
The context is important. This follows Anthropic's high-profile 2024 launches of Claude 3.5 Sonnet and Claude 3.5 Haiku, which solidified its position as a top-tier model provider. The release of Claude Mythos in early 2026 marked another significant capability jump, aimed squarely at competitive benchmarks in reasoning and coding. In this heated competitive landscape, where Anthropic, OpenAI, and Google DeepMind are racing to deploy increasingly capable models, the pressure to ship and secure strategic partnerships (like those with AWS, Google, and Microsoft noted in Project Glasswing) can conflict with deliberate safety pacing. This analysis suggests that pressure may have caused an internal process to slip.
The entities involved are notable. The "launch partners" who received early access are among the most powerful tech and financial institutions globally. If the goal of an RSP is to mitigate risk from powerful AI, an argument can be made that releasing a "significantly more capable" model to these partners first represents a higher-risk deployment than a broader, more public release later, due to the partners' resources and potential for rapid integration. This irony underscores the complexity of defining "risk" in practice.
For practitioners, this is a case study in implementation governance. It's one thing to publish a policy document; it's another to have the internal tracking, legal review, and release gatekeeping to ensure compliance. As models move from research to complex commercial deployment pathways (limited previews, partner APIs, enterprise contracts), labs need robust internal systems to map these deployments onto their safety policies. This analysis indicates a possible shortfall at Anthropic on that front.
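As a thought experiment, a release gate of the kind described above might look something like the sketch below. All names, fields, and the policy encoded here are assumptions for illustration; nothing is drawn from Anthropic's actual internal systems.

```python
from dataclasses import dataclass

# Hypothetical release gate. The Release fields and the policy check are
# illustrative assumptions, not Anthropic's internal tooling.

@dataclass
class Release:
    model: str
    channel: str  # e.g. "internal", "partner-preview", "public-api"
    reaches_external_users: bool
    more_capable_than_risk_report_baseline: bool
    risk_discussion_published: bool

def gate(release: Release) -> list[str]:
    """Return blocking findings; an empty list means the release may proceed."""
    findings = []
    # Conservatively treat ANY external availability as triggering the
    # public-deployment clause, so limited partner previews cannot slip
    # past the check the way the analysis says Mythos did.
    if (release.reaches_external_users
            and release.more_capable_than_risk_report_baseline
            and not release.risk_discussion_published):
        findings.append(
            f"{release.model}: publish the RSP risk discussion before "
            f"release via channel '{release.channel}'"
        )
    return findings

# Example: a limited partner preview of a more capable model with no
# published discussion would be blocked at the gate.
preview = Release(
    model="claude-mythos-preview",
    channel="partner-preview",
    reaches_external_users=True,
    more_capable_than_risk_report_baseline=True,
    risk_discussion_published=False,
)
for finding in gate(preview):
    print(finding)
```

The design choice doing the work is in the conditional: by keying the check to external availability rather than to a named "public launch" event, the gate sidesteps exactly the definitional ambiguity around "public deployment" that this dispute turns on.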
Frequently Asked Questions
What is Anthropic's Responsible Scaling Policy (RSP)?
Anthropic's RSP is a public commitment detailing how the company plans to manage risks as it develops increasingly capable AI models. It outlines specific safety thresholds, evaluation procedures, and deployment restrictions tied to model capabilities. The policy is intended to provide transparency and a structured approach to AI safety.
What was Claude Mythos Preview?
Claude Mythos Preview was an advanced AI model released by Anthropic in early 2026 as part of "Project Glasswing," an initiative focused on cybersecurity. It was described as significantly more capable than its predecessor, Claude Opus 4.6, particularly in complex reasoning and code analysis tasks for security applications.
Did Anthropic publish any safety analysis for Claude Mythos?
Yes, Anthropic published a detailed System Card for Claude Mythos Preview on April 7, 2026, which included extensive safety evaluations, an Alignment Risk Update, and a section on "RSP decision-making." The controversy is about the timing of this publication relative to the model's earliest external releases, not a complete absence of analysis.
Why does this procedural issue matter if the model was eventually analyzed?
While the direct risk may have been mitigated, procedural failures erode trust in a company's safety commitments. If a lab cannot reliably follow its own publicly stated rules for a high-profile release, it raises doubts about its ability to enforce stricter, more critical safeguards in higher-stakes future scenarios. It also highlights ambiguities in safety frameworks that need clarification.