model security

30 articles about model security in AI news

Anthropic Shows Anyone With a Laptop Can Poison Any Major AI Model

Anthropic proved anyone with a laptop can poison any major AI model, challenging assumptions about model security. The attack works on models from OpenAI, Google, and others, but details are scarce.

May 10, 202675% relevant

Anthropic's Claude Code Source Code Leaked and Forked in Major Open-Source AI Incident

Anthropic accidentally leaked the source code for Claude Code, its proprietary AI coding assistant, leading to a public fork that gained significant traction within hours. The incident represents a major unplanned open-sourcing of a commercial AI product and has sparked discussions about AI model security and open-source accessibility.

Apr 3, 2026100% relevant

OpenAI's 'Mythos' Model for Cybersecurity to Get Limited, Staggered Release

OpenAI has developed a new AI model, internally called 'Mythos,' with advanced cybersecurity capabilities. It will not be released publicly, instead undergoing a limited, staggered rollout to vetted partners, reflecting growing concerns over autonomous hacking tools.

Apr 9, 202689% relevant

OpenAI Launches GPT-Rosalind for Drug Discovery, GPT-5.4-Cyber for Security

OpenAI launched GPT-Rosalind, a life sciences model performing above the 95th percentile of human experts on novel biological data, and GPT-5.4-Cyber, a cybersecurity variant. These releases, alongside a major Agents SDK update, signal a pivot from general AI to specialized, high-stakes enterprise domains.

Apr 20, 202690% relevant

US Officials Warn Anthropic's 'Mythos' AI Poses Major Cybersecurity Threat

Senior US officials, including Jerome Powell, warn that Anthropic's highly advanced 'Mythos' AI model presents significant cybersecurity risks. Its powerful ability to find system vulnerabilities requires tight restrictions to prevent misuse.

Apr 10, 202695% relevant

MCP Security Crisis: 43% of Servers Vulnerable, 341 Malicious Skills Found

Security audits of the Model Context Protocol (MCP) ecosystem reveal 43% of servers are vulnerable to command execution, while 341 malicious skills were found on marketplaces, exposing systemic security flaws in agentic AI. The findings highlight a growing attack surface as AI agents become more autonomous.

Apr 9, 202677% relevant

Anthropic's 'Project Glassing' Opus-Beater Restricted to Security Researchers

Anthropic's new model, which outperforms Claude 3 Opus, is being released under 'Project Glassing' exclusively to vetted security researchers. This controlled rollout follows recent warnings from security experts about advanced AI risks.

Apr 7, 202685% relevant

AI Offensive Cybersecurity Capabilities Double Every 5.7 Months, Matching METR's AI Timelines

An independent analysis extends METR's AI capability timeline research to offensive cybersecurity, finding a 5.7-month doubling time. Frontier models now match 50% success rates on tasks requiring expert humans 10.5 hours.

Apr 3, 202685% relevant

Automate Kali Linux Security Tasks with This New MCP Server

Claude Code users can now automate Kali Linux security tools like Nmap and Metasploit via a new Model Context Protocol server, turning the editor into a security operations hub.

Apr 2, 202675% relevant

Claude Code's New Cybersecurity Guardrails: How to Keep Your Security Research Flowing

Claude Opus 4.6 is now aggressively blocking cybersecurity prompts. Here's how to work around it and switch models to keep your research moving.

Mar 28, 2026100% relevant

Anthropic's Opus 5 and OpenAI's 'Spud' Rumored as Major AI Leaps, Prompting Security Concerns

A Fortune report, cited on social media, claims Anthropic's upcoming Opus 5 model is a 'massive leap' from Claude 3.5 Sonnet, posing significant security risks. OpenAI is also rumored to have a similarly advanced model, 'Spud,' in development.

Mar 27, 202695% relevant

Claude 'Mythos' Leak Suggests New Tier Beyond Opus 4.6, Targeting Cybersecurity Partners First

A leak from a reportedly reliable source claims Anthropic is developing 'Claude Mythos,' a new tier beyond Opus 4.6 with major gains in coding, reasoning, and cybersecurity. The model is described as so compute-intensive that initial access will be limited to select cybersecurity partners.

Mar 27, 202699% relevant

Tessera Launches Open-Source Framework for 32 OWASP AI Security Tests, Benchmarks GPT-4o, Claude, Gemini, Llama 3

Tessera introduces the first open-source framework to run all 32 OWASP AI security tests against any model with one CLI command. It provides benchmark results for GPT-4o, Claude, Gemini, Llama 3, and Mistral across 21 model-specific security tests.

Mar 24, 202697% relevant

Claude Opus 4.6's Security Audit Power Is Now in Claude Code

The new Claude Opus 4.6 model, which found 500+ high-severity open-source flaws, is now available in Claude Code for automated security auditing.

Mar 21, 202680% relevant

NVIDIA Open-Sources NeMo Claw: A Local Security Sandbox for AI Agents

NVIDIA has open-sourced NeMo Claw, a security sandbox designed to run AI agents locally. It isolates models from cloud services, blocks unauthorized network calls, and secures model APIs via a single installation script.

Mar 18, 202697% relevant

Edge AI for Loss Prevention: Adaptive Pose-Based Detection for Luxury Retail Security

A new periodic adaptation framework enables edge devices to autonomously detect shoplifting behaviors from pose data, offering a scalable, privacy-preserving solution for luxury retail security with 91.6% outperformance over static models.

Mar 6, 202685% relevant

Skills as Untrusted Code: A Security Precedent for Agent Runtimes

Paper argues agent skills are untrusted code until verified; runtimes must enforce verification gates to prevent supply-chain attacks, echoing decades of software security lessons.

May 5, 2026100% relevant

Claude Security Public Beta Launches in Claude Code on Web

Anthropic launched Claude Security in public beta for Claude Code on web, letting developers validate and fix vulnerabilities without leaving the editor.

Apr 30, 2026100% relevant

Anthropic Ships Claude Security, a Standalone Code Vulnerability Scanner for Enterprise

Anthropic shipped Claude Security, a standalone code vulnerability scanner for Enterprise powered by Opus 4.7, directly targeting Snyk, Semgrep, and SonarQube.

Apr 30, 2026100% relevant

Research Paper Proposes Security Framework for Autonomous AI Agents in Commerce

A Systematization of Knowledge (SoK) paper analyzes the emerging threat landscape for autonomous LLM agents conducting commerce. It identifies 12 attack vectors across five dimensions and proposes a layered defense architecture. This is a foundational security analysis for a nascent but high-stakes technology.

Apr 20, 2026100% relevant

AI Agent Security Startup Emerges Amid Enterprise Rush, Per VC Tweet

A VC's tweet highlights a critical gap in enterprise AI agent adoption: security. This signals a market opportunity, with a new startup reportedly emerging to address it.

Apr 20, 202687% relevant

Claude Code Security Alert: Patch Now, Stop Using Authentication Helpers

A critical security leak reveals three command injection vulnerabilities in Claude Code. Users must update and stop using authentication helpers to prevent credential theft and supply chain attacks.

Apr 20, 2026100% relevant

Claude Code's Security Defaults: What It Ships When You Don't Ask

When building auth, uploads, and admin features, Claude Code defaults to importing bcrypt/JWT libraries while Codex uses standard library functions—neither adds rate limiting or security headers without explicit prompting.

Apr 15, 2026100% relevant

AI-Powered Password Leak Detection: A Critical Security Shift

Security experts are leveraging AI to detect when user passwords appear in data breaches, enabling immediate alerts. This shifts the security paradigm from periodic manual checks to continuous, automated monitoring.

Apr 13, 202685% relevant

Gen Z Workers Sabotage AI Rollouts, Risking Job Security

A new report details Gen Z workers actively undermining corporate AI adoption due to job security fears. This resistance paradoxically increases their replacement risk as AI-proficient 'power users' advance.

Apr 11, 202687% relevant

MLX Enables Local Grounded Reasoning for Satellite, Security, Robotics AI

Apple's MLX framework is enabling 'local grounded reasoning' for AI applications in satellite imagery, security systems, and robotics, moving complex tasks from the cloud to on-device processing.

Apr 11, 202685% relevant

Alpha Vision Unveils AI Security Agent at RILA Asset Protection Conference 2026

Alpha Vision showcased an AI agent for retail security at the RILA Retail Asset Protection Conference 2026. The announcement highlights the growing integration of autonomous AI systems into physical retail loss prevention strategies.

Apr 9, 202674% relevant

Keygraph Launches Shannon AI to Automate Web App Security Testing

Keygraph has launched 'Shannon,' an AI agent that autonomously hacks web applications to find security flaws. This positions AI as an offensive security tool for proactive defense.

Apr 7, 202687% relevant

Vulnetix VDB: Live Package Security Scanning Inside Claude Code

A new MCP server, Vulnetix VDB, provides real-time security scanning for package dependencies within Claude Code, helping developers catch vulnerabilities as they write code.

Apr 7, 202695% relevant

Audit Your MCP Servers in 10 Seconds with This Free Security Score API

A new free API gives Claude Code users a Lighthouse-style security score for any MCP server, revealing that 60% of scanned packages have vulnerabilities.

Mar 31, 202695% relevant

Explore More

AI Agents Large Language Models Claude Code OpenAI RAG MCP Fine-tuning Benchmarks Open Source AI AI Safety