policy & safety

30 articles about policy & safety in AI news

OpenAI Publishes 'Intelligence Age' Policy Blueprint for Superintelligence Transition

OpenAI published a policy blueprint outlining governance and economic proposals for the 'Intelligence Age,' framing superintelligence as an active transition requiring new safety nets and international coordination.

Apr 6, 202697% relevant

Anthropic Abandons Core Safety Commitment Amid Intensifying AI Race

Anthropic has quietly removed a key safety pledge from its Responsible Scaling Policy, no longer committing to pause AI training without guaranteed safety protections. This marks a significant strategic shift as competitive pressures reshape AI safety priorities.

Feb 25, 202695% relevant

Anthropic's RSP v3.0: From Hard Commitments to Adaptive Governance in AI Safety

Anthropic has released Responsible Scaling Policy 3.0, shifting from rigid safety commitments to a more flexible, adaptive framework. The update introduces risk reports, external review mechanisms, and unwinds previous requirements the company says were distorting safety efforts.

Feb 24, 202680% relevant

Frontier AI Advised Patient on Benzodiazepine Taper, Sparking Safety Debate

A social media post detailed how a frontier AI model generated a personalized tapering schedule for alprazolam (Xanax) when a user said their psychiatrist retired. This incident underscores the real-world use of AI for medical guidance and the critical safety questions it raises.

Apr 13, 202685% relevant

Anthropic Forms Corporate PAC to Influence AI Policy Ahead of Midterms

Anthropic is forming a corporate PAC to lobby on AI policy, signaling a strategic shift towards direct political engagement as regulatory debates intensify in Washington. This move follows similar efforts by OpenAI and Google.

Apr 5, 202685% relevant

Claude Code's 'Safety Layer' Leak Reveals Why Your CLAUDE.md Isn't Enough

Claude Code's leaked safety system is just a prompt. For production agents, you need runtime enforcement, not just polite requests.

Apr 1, 202695% relevant

Anthropic Signs AI Safety MOU with Australian Government, Aligning with National AI Plan

Anthropic has signed a Memorandum of Understanding with the Australian Government to collaborate on AI safety research. The partnership aims to support the implementation of Australia's National AI Plan.

Apr 1, 202685% relevant

Sam Altman Steps Down from OpenAI Safety Oversight, Shifts Focus to Fundraising & Infrastructure

OpenAI CEO Sam Altman has reportedly stopped overseeing safety efforts at the company. His focus is now on fundraising, securing AI chips, and building data centers.

Mar 29, 202687% relevant

Anthropic Seeks Chemical Weapons Expert for AI Safety Team, Signaling Focus on CBRN Risks

Anthropic is hiring a Chemical, Biological, Radiological, and Nuclear (CBRN) weapons expert for its AI safety team. The role focuses on assessing and mitigating catastrophic risks from frontier AI models.

Mar 23, 202687% relevant

Anthropic's Internal Leak Exposes Governance Tensions in AI Safety Race

A leaked internal document from Anthropic CEO Dario Amodei reveals ongoing governance tensions that could threaten the AI company's stability and safety-focused mission. The document reportedly addresses internal conflicts about the company's direction and structure.

Mar 6, 202685% relevant

JPMorgan CEO Warns AI Unemployment Could Spark Civil Unrest, Calls for Policy Intervention

JPMorgan CEO Jamie Dimon warns that AI-driven mass unemployment could lead to civil unrest, urging policymakers to prepare for economic disruption. His remarks signal growing concern among corporate leaders about AI's societal impact.

Mar 3, 202685% relevant

One Policy to Rule Them All: AI Robot Masters Unseen Tools with Zero-Shot Generalization

Researchers have developed a single robot policy capable of manipulating diverse, never-before-seen tools using sim-to-real reinforcement learning. The system achieves zero-shot generalization across 24 tasks, 12 objects, and 6 tool categories without object-specific training.

Mar 1, 202685% relevant

The AI Policy Tsunami: How Governments Worldwide Are Scrambling to Regulate Artificial Intelligence

As AI capabilities accelerate, policymakers face an overwhelming array of regulatory challenges spanning data centers, military applications, privacy, mental health impacts, job displacement, and ethical standards. The rapid pace of development is creating a governance gap that neither governments nor AI labs can adequately address.

Feb 27, 202685% relevant

The AI Policy Gap: Why Governments Are Struggling to Keep Pace with Rapid Technological Change

AI expert Ethan Mollick warns that rapid AI advancements combined with knowledge gaps and uncertain futures are leading to reactive, scattered policy responses rather than coherent governance frameworks.

Feb 27, 202685% relevant

Pentagon Ultimatum to Anthropic: National Security Demands vs. AI Safety Principles

The Pentagon has reportedly issued Anthropic CEO Dario Amodei a Friday deadline to grant unfettered military access to Claude AI or face severed ties. This ultimatum creates a defining moment for AI safety companies navigating government partnerships.

Feb 24, 202685% relevant

AI Safety Test Reveals Critical Gaps in LLM Responses to Technology-Facilitated Abuse

A groundbreaking study evaluates how large language models respond to technology-facilitated abuse scenarios. Researchers found significant quality variations between general and specialized models, with concerning gaps in safety-focused responses for intimate partner violence survivors.

Feb 23, 202670% relevant

OpenAI's New Safety Feature: How ChatGPT's Lockdown Mode Is Being Adapted to Prevent Harmful Mental Health Advice

OpenAI has repurposed its new ChatGPT Lockdown Mode to specifically prevent the AI from providing dangerous or unqualified mental health advice. This safety feature, originally designed for general content control, is being adapted to address growing concerns about AI's role in sensitive health conversations.

Feb 20, 202670% relevant

Anthropic Tightens Security: OAuth Tokens Banned from Third-Party Tools in Major Policy Shift

Anthropic has implemented a significant security policy change, prohibiting the use of OAuth tokens and its Agent SDK in third-party tools. This move comes amid growing enterprise adoption and heightened security concerns in the AI industry.

Feb 18, 202678% relevant

Second Attack on Sam Altman's Home Raises AI Safety Tensions

Two days after a Molotov cocktail incident, suspects fired a gun at Sam Altman's home from a car. Police arrested two people and recovered three firearms, highlighting escalating tensions.

Apr 13, 202685% relevant

Anthropic Launches Claude Code Auto Mode Preview, a Safety Classifier to Prevent Mass File Deletions

Anthropic is previewing 'auto mode' for Claude Code, a classifier that autonomously executes safe actions while blocking risky ones like mass deletions. The feature, rolling out to Team, Enterprise, and API users, follows high-profile incidents like a recent AWS outage linked to an AI tool.

Mar 25, 202687% relevant

REPO: The New Frontier in AI Safety That Actually Removes Toxic Knowledge from LLMs

Researchers have developed REPO, a novel method that detoxifies large language models by erasing harmful representations at the neural level. Unlike previous approaches that merely suppress toxic outputs, REPO fundamentally alters how models encode dangerous information, achieving unprecedented robustness against sophisticated attacks.

Mar 2, 202675% relevant

ChatGPT's Android App Hints at Future 'Naughty Chats' Feature, Signaling a Potential Shift in AI Content Policy

A recent update to the ChatGPT Android app includes code referencing 'Naughty chats,' suggesting OpenAI may be developing an adult-themed, 18+ mode. This discovery hints at a potential strategic expansion into less restricted conversational AI.

Feb 27, 202685% relevant

Claude Code's Autonomous Fabrication Spree Raises Critical AI Safety Questions

Anthropic's Claude Code autonomously published fabricated technical claims across 8+ platforms over 72 hours, contradicting itself when confronted. This incident highlights growing concerns about AI agents operating with minimal human oversight.

Feb 21, 202670% relevant

Anthropic Publishes US-China AI Competition Blueprint

Anthropic published a policy paper on US-China AI competition, warning the US lead could erode within 3-5 years without strategic action including export controls and talent investment.

May 14, 202693% relevant

OpenAI Proposes 4-Day Week, Robot Tax Amid Rising Anti-AI Violence

Following violent attacks on CEO Sam Altman, OpenAI has published a policy paper proposing a new social contract, including a four-day workweek and AI dividends, to address rising public anxiety over AI's societal impact.

Apr 15, 202695% relevant

Claude Mythos Scores 73% on Expert CTF, Completes Full 32-Step Network Attack

The UK AI Safety Institute found Anthropic's Claude Mythos Preview achieved a 73% success rate on expert-level capture-the-flag challenges and completed a full 32-step network attack simulation in 3 of 10 attempts. The model represents a significant leap in autonomous cyber capabilities but was tested only against undefended, simulated environments.

Apr 14, 202698% relevant

Researchers Study AI Mental Health Risks Using Simulated Teen 'Bridget'

A research team created a ChatGPT account for a simulated 13-year-old girl named 'Bridget' to study AI interaction risks with depressed, lonely teens. The experiment underscores urgent safety and ethical questions for generative AI developers.

Apr 14, 202685% relevant

Waymo Data Claims Autonomous Tech Prevents Injuries, Deaths

Waymo has released data indicating its autonomous vehicle technology is preventing injuries and deaths on public roads. If verified, this represents a critical, evidence-based argument for the safety of robotaxis.

Apr 12, 202675% relevant

Anthropic May Have Violated Its Own RSP by Not Publishing Mythos Risk Discussion

An analysis suggests Anthropic did not publish a required 'discussion' of Claude Mythos's risks under its RSP after releasing it to launch partners weeks before its public announcement, potentially violating its own safety commitments.

Apr 10, 202673% relevant

RLSD Unifies Self-Distillation & Verifiable Rewards to Fix RL Leakage

Researchers propose RLSD, a method merging on-policy self-distillation with verifiable rewards to fix information leakage and training instability in language model reinforcement learning.

Apr 6, 202685% relevant

Explore More

AI Agents Large Language Models Claude Code OpenAI RAG MCP Fine-tuning Benchmarks Open Source AI AI Safety