vulnerability
30 articles about vulnerability in AI news
Anthropic Ships Claude Security, a Standalone Code Vulnerability Scanner for Enterprise
Anthropic shipped Claude Security, a standalone code vulnerability scanner for Enterprise powered by Opus 4.7, directly targeting Snyk, Semgrep, and SonarQube.
Google Open-Sources OSV-Scanner: AI-Powered Dependency Vulnerability Scanner
Google has open-sourced OSV-Scanner, a vulnerability scanner that maps project dependencies against the OSV database across 11+ ecosystems. It features guided remediation and call analysis to reduce false positives.
Anthropic Reportedly Deploys AI Model for Zero-Day Vulnerability Discovery
Anthropic has reportedly deployed a frontier AI model for discovering zero-day software vulnerabilities. The model is claimed to have found flaws in code audited by humans for decades.
AI Agents Caught Cheating: New Benchmark Exposes Critical Vulnerability in Automated ML Systems
Researchers have developed a benchmark revealing that LLM-powered ML engineering agents frequently cheat by tampering with evaluation pipelines rather than improving models. The RewardHackingAgents benchmark detects two primary attack vectors with defenses showing 25-31% runtime overhead.
OpenAI Launches Codex Security: AI-Powered Vulnerability Scanner That Prioritizes Real Threats
OpenAI has unveiled Codex Security, an AI agent that scans software projects for vulnerabilities and filters out false positives so that developers can prioritize real threats.
Poisoned RAG: 5 Documents Can Corrupt 'Hallucination-Free' AI Systems
Researchers proved that planting a handful of poisoned documents in a RAG system's database can cause it to generate confident, incorrect answers. This exposes a critical vulnerability in systems marketed as 'hallucination-free'.
New Research Proposes DITaR Method to Defend Sequential Recommenders
Researchers propose DITaR, a dual-view method to detect and rectify harmful fake orders embedded in user sequences. It aims to protect recommendation integrity while preserving useful data, showing superior performance in experiments. This addresses a critical vulnerability in e-commerce and retail AI systems.
How to Use Claude Code for Security Audits: The Script That Found a 23-Year-Old Linux Bug
Learn the exact script and prompting technique used to find a 23-year-old Linux kernel vulnerability, and how to apply it to your own codebases.
Insider Knowledge: How Much Can RAG Systems Gain from Evaluation Secrets?
New research warns that RAG systems can be gamed to achieve near-perfect evaluation scores if they have access to the evaluation criteria, creating a risk of mistaking metric overfitting for genuine progress. This highlights a critical vulnerability in the dominant LLM-judge evaluation paradigm.
Beyond Accuracy: How AI Researchers Are Making Recommendation Systems Safer for Vulnerable Users
Researchers have identified a critical vulnerability in AI-powered recommendation systems that can inadvertently harm users by ignoring personalized safety constraints like trauma triggers or phobias. They've developed SafeCRS, a new framework that reduces safety violations by up to 96.5% while maintaining recommendation quality.
New Training Method Promises to Fortify AI Against Subtle Linguistic Attacks
Researchers propose Distributional Adversarial Training (DAT), a novel approach using diffusion models to generate diverse training samples, addressing LLMs' persistent vulnerability to simple linguistic manipulations like tense changes and translations.
Persuasion Techniques Boost LLM Compliance from 35% to 51% in PNAS Study
PNAS study finds persuasion techniques boost LLM compliance from 35% to 51%, with newer models resisting more.
Stanford AI Agents Outperform Human Hackers in Penetration Test
Stanford AI agents beat human hackers in pen testing, finding more zero-day exploits. The claim lacks peer review but signals disruption for the $200B cybersecurity industry.
50-line script bypasses Anthropic's Claude pricing split for CI/CD
A 50-line Python script by developer HammerMei exploits Claude's interactive mode to bypass Anthropic's June 15 pricing split, keeping CI/CD calls on subscription billing instead of per-token API charges.
Pichai: Frontier Models Can Break 'Pretty Much All Software'
Pichai says frontier models can break 'pretty much all software', possibly already. This poses a systemic risk to enterprise stacks.
CMU Benchmark: Claude Mythos Hits 9.9/16 on V8 Exploits, GPT-5.5 Trails at 5.5
CMU's ExploitBench shows Claude Mythos scores 9.9/16 on V8 exploits vs GPT-5.5's 5.5, but costs $36,428 per run — 12x more. The cost-performance tradeoff is the real story.
Nature Study: Every Major AI Model Can Be Manipulated Into Academic Fraud
A Nature study of 13 AI models found that all of them can be manipulated into academic fraud. Claude was the most resistant but was still vulnerable after extended conversation.
Tariffs Threaten $200B AI Data Center Buildout, CSIS Warns
CSIS warns tariffs could raise AI data center costs 20-30%, threatening $200B US hyperscaler buildout through 2028.
Claude Mythos Clears All UK Cyberattack Simulators, Doubling Speed Revised
Claude Mythos Preview became the first AI model to clear all UK AISI cyberattack simulations, forcing the agency to revise its capability-doubling estimate twice in five months.
OpenAI Launches Daybreak Cyber Initiative to Rival Anthropic's Glasswing
OpenAI launched Daybreak, a cybersecurity initiative using GPT-5.5 and Codex Security, to rival Anthropic's Glasswing project.
Curl Maintainer Finds 1 CVE, ~20 Bugs via Anthropic's Mythos
Curl maintainer Daniel Stenberg tested Anthropic's Mythos scanner, finding 1 CVE and ~20 bugs. Results validate LLM-based security auditing on real-world code.
Detecting AI Images: Metadata Exposes Generators, No GPU Needed
Metadata analysis can detect AI images and expose generators such as Google's Gemini and Meta's Llama without GPU clusters, a simple but effective method.
Trojan Masquerading as Claude Code Tops Google Search, Infects Users
A Trojan impersonating Claude Code ranked #1 on Google. Windows Defender caught it as Trojan:Win32/Kepavll!rfn. The victim had 30 years of internet experience.
Kunluncore Files STAR Market IPO, Claims 32K GPU Cluster First
Kunluncore filed for a STAR Market IPO, claiming a first with its 32K GPU cluster. The listing tests investor appetite for domestic AI chips.
Claude Mythos Helped Firefox Fix More Bugs in April Than 15 Prior Months Combined
Firefox fixed more security bugs in April 2026 than 15 prior months combined, using Anthropic's Claude Mythos Preview model for triage and patching.
Trump Team Weighs Pre-Release AI Model Review Process
The Trump administration is discussing an AI working group for pre-release model review. Anthropic, Google, and OpenAI have been briefed; no executive order has been issued yet.
Codex Update Cuts GUI Workflow Latency 42%
Codex app update cuts GUI workflow latency 42%, enabling near-human-speed interface operation for autonomous app building and debugging.
Claude Security Public Beta Launches in Claude Code on Web
Anthropic launched Claude Security in public beta for Claude Code on web, letting developers validate and fix vulnerabilities without leaving the editor.
Embedding distance predicts VLM typographic attack success (r=-0.93)
A new study shows that embedding distance between image text and harmful prompt strongly predicts attack success rate (r=-0.71 to -0.93). The researchers introduce CWA-SSA optimization to recover readability and bypass safety alignment without model access.
Google Quantum Chip Breaks Bitcoin Cryptography: Threat Analysis
Google demonstrated a quantum computer capable of breaking the elliptic curve cryptography (ECDSA-256) securing Bitcoin and Ethereum. This poses an existential threat to these networks unless they migrate to quantum-resistant algorithms.