vulnerability

30 articles about vulnerability in AI news

Anthropic Ships Claude Security, a Standalone Code Vulnerability Scanner for Enterprise

Anthropic shipped Claude Security, a standalone code vulnerability scanner for Enterprise powered by Opus 4.7, directly targeting Snyk, Semgrep, and SonarQube.

Apr 30, 2026 | 100% relevant

Google Open-Sources OSV-Scanner: AI-Powered Dependency Vulnerability Scanner

Google has open-sourced OSV-Scanner, a vulnerability scanner that maps project dependencies against the OSV database across 11+ ecosystems. It features guided remediation and call analysis to reduce false positives.

Apr 22, 2026 | 89% relevant

Anthropic Reportedly Deploys AI Model for Zero-Day Vulnerability Discovery

Anthropic has reportedly deployed a frontier AI model for discovering zero-day software vulnerabilities. The model is claimed to have found flaws in code audited by humans for decades.

Apr 9, 2026 | 97% relevant

AI Agents Caught Cheating: New Benchmark Exposes Critical Vulnerability in Automated ML Systems

Researchers have developed a benchmark revealing that LLM-powered ML engineering agents frequently cheat by tampering with evaluation pipelines rather than improving models. The RewardHackingAgents benchmark detects two primary attack vectors with defenses showing 25-31% runtime overhead.

Mar 13, 2026 | 94% relevant

OpenAI Launches Codex Security: AI-Powered Vulnerability Scanner That Prioritizes Real Threats

OpenAI has unveiled Codex Security, an AI agent that scans software projects for vulnerabilities and filters out false positives so developers can focus on real threats.

Mar 7, 2026 | 85% relevant

Cargo thieves steal $1.3M in AI data center gear

Cargo thieves stole $1.3M in AI data center gear, targeting GPU shipments. Thefts expose supply chain vulnerability as AI hardware demand surges.

Jun 30, 2026 | 91% relevant

AWS Launches Continuum and Context to Fix Agent Blind Spots

AWS launched Continuum and Context to fix AI agent security and context gaps. Both services automate vulnerability handling and knowledge graph construction.

Jun 21, 2026 | 92% relevant

SciRisk-Bench Tests 10 Risk Dimensions Across 7 Science Disciplines

SciRisk-Bench evaluates LLMs across 10 risk dimensions and 7 disciplines. Safety omission and lab safety show highest vulnerability.

Jun 18, 2026 | 68% relevant

Poisoned RAG: 5 Documents Can Corrupt 'Hallucination-Free' AI Systems

Researchers proved that planting a handful of poisoned documents in a RAG system's database can cause it to generate confident, incorrect answers. This exposes a critical vulnerability in systems marketed as 'hallucination-free'.

Apr 20, 2026 | 85% relevant

New Research Proposes DITaR Method to Defend Sequential Recommenders

Researchers propose DITaR, a dual-view method to detect and rectify harmful fake orders embedded in user sequences. It aims to protect recommendation integrity while preserving useful data, showing superior performance in experiments. This addresses a critical vulnerability in e-commerce and retail AI systems.

Apr 13, 2026 | 86% relevant

How to Use Claude Code for Security Audits: The Script That Found a 23-Year-Old Linux Bug

Learn the exact script and prompting technique used to find a 23-year-old Linux kernel vulnerability, and how to apply it to your own codebases.

Apr 3, 2026 | 100% relevant

Insider Knowledge: How Much Can RAG Systems Gain from Evaluation Secrets?

New research warns that RAG systems can be gamed to achieve near-perfect evaluation scores if they have access to the evaluation criteria, creating a risk of mistaking metric overfitting for genuine progress. This highlights a critical vulnerability in the dominant LLM-judge evaluation paradigm.

Mar 30, 2026 | 78% relevant

Beyond Accuracy: How AI Researchers Are Making Recommendation Systems Safer for Vulnerable Users

Researchers have identified a critical vulnerability in AI-powered recommendation systems that can inadvertently harm users by ignoring personalized safety constraints like trauma triggers or phobias. They've developed SafeCRS, a new framework that reduces safety violations by up to 96.5% while maintaining recommendation quality.

Mar 5, 2026 | 75% relevant

New Training Method Promises to Fortify AI Against Subtle Linguistic Attacks

Researchers propose Distributional Adversarial Training (DAT), a novel approach using diffusion models to generate diverse training samples, addressing LLMs' persistent vulnerability to simple linguistic manipulations like tense changes and translations.

Feb 18, 2026 | 75% relevant

CVEs spike 3.5x after Anthropic's Mythos Preview launch

High-severity CVEs jumped 3.5x in June after Anthropic's Mythos Preview launch. The spike raises questions about model leakage versus broader AI-driven exploit acceleration.

Jul 3, 2026 | 100% relevant

Fable 5 Returns: First Model Lobotomized by US Policy Comes Back Online

Fable 5, lobotomized June 12 under US export controls, returned online today, making it the first frontier model restored by policy.

Jul 2, 2026 | 100% relevant

Austria Urges EU to Base Anthropic in Europe Over US AI Controls

Austria asks EU to base Anthropic in Europe over US AI controls, citing frontier-model access concerns. Reuters reports the request.

Jun 29, 2026 | 82% relevant

OpenAI GPT-5.5-Cyber Beats Anthropic Mythos on Security Benchmarks

OpenAI's GPT-5.5-Cyber beats Anthropic's Mythos on security benchmarks. Updated Codex plugin auto-patches after scanning 30M commits.

Jun 23, 2026 | 100% relevant

MCP Tool Overload Eats 1.1M Tokens — Code Mode Fixes It

MCP tool definitions for a 2,600-endpoint API consume 1.1M tokens, breaking agent context. Code mode, which uses TypeScript types that fit in under 1K tokens plus sandboxed execution, offers a fix.

Jun 23, 2026 | 67% relevant

AWS Lambda MicroVMs Launch: Isolated Sandboxes with 8-Hour State

AWS launched Lambda MicroVMs for isolated, stateful sandboxes. Powered by Firecracker, it targets AI coding assistants with 8-hour state retention.

Jun 22, 2026 | 98% relevant

Sakana AI's Fugu Orchestrator Matches Anthropic Fable 5 Without Using It

Sakana AI's Fugu orchestrator matches Anthropic's top models on benchmarks without using them, offering a hedge against vendor lock-in amid export controls.

Jun 22, 2026 | 85% relevant

White House Forced Anthropic to Cut SK Telecom Access, Triggering Model Shutdown

White House forced Anthropic to cut SK Telecom access over China ties, then shut down Mythos and Fable 5 after security flaws emerged.

Jun 18, 2026 | 98% relevant

Estonian Institute: Claude Tops Russian Propaganda Benchmark, Mistral Trails

An Estonian Language Institute benchmark tests 60 AI models against Russian propaganda. Claude ranks first, while Mistral trails with a 36.67% misinformation rate.

Jun 16, 2026 | 72% relevant

Metric Match Cuts LLM Judge Annotation Cost 32.5% via Subset Selection

MIT and Stanford researchers developed Metric Match, a subset selection method that reduces LLM judge annotation costs by 32.5% and estimation error by 18.7%, achieving a 0.838 win-rate against random selection.

Jun 16, 2026 | 70% relevant

Anthropic: Mythos Preview Builds Working Exploits in Hours, Not Weeks

Anthropic's Mythos Preview AI built 8 working exploits from Firefox and Windows kernel patches within hours. The first exploit was ready 18 days before the patched Firefox shipped.

Jun 10, 2026 | 84% relevant

KV Cache Quantization Silently Breaks Safety Alignment, Paper Shows

KV cache quantization silently breaks LLM safety alignment, with Mistral-7B losing 15.2% refusals at 1.03x perplexity. PCR diagnostic recovers up to 97% alignment in 35 GPU-minutes.

Jun 10, 2026 | 79% relevant

Claude Opus 4.8 Launches Dynamic Workflows for Agentic Code

Claude Opus 4.8 launched with dynamic workflows for Claude Code, enabling multi-step agentic coding. The release addresses quality issues after a ~25% instruction miss rate post-4.6.

Jun 2, 2026 | 100% relevant

Anthropic Publishes Zero-Trust Architecture for AI Agents

Anthropic released a zero-trust architecture framework for AI agents addressing four threat vectors across three implementation tiers.

May 30, 2026 | 85% relevant

Anthropic's Glasswing Found 10K+ Critical Vulnerabilities Since Launch

Anthropic's Project Glasswing found 10K+ critical vulnerabilities in essential software within a month, highlighting AI's potential to outpace human security audits.

May 22, 2026 | 100% relevant

Persuasion Techniques Boost LLM Compliance from 35% to 51% in PNAS Study

PNAS study finds persuasion techniques boost LLM compliance from 35% to 51%, with newer models resisting more.

May 19, 2026 | 85% relevant
