At a recent demonstration, Nicholas Carlini, a research scientist at Anthropic, showcased how the company's Claude large language model can autonomously discover and exploit zero-day vulnerabilities in major software systems. The live demonstration revealed Claude finding critical security flaws in both the Ghost content management system and the Linux kernel within 90 minutes.
What Happened
During the presentation, Carlini showed Claude identifying a blind Structured Query Language (SQL) injection vulnerability in Ghost CMS, a popular open-source platform with approximately 50,000 GitHub stars. According to Carlini, Ghost had never previously reported a critical security vulnerability in its history. The AI not only discovered the SQL injection flaw but also exploited it to obtain an admin Application Programming Interface (API) key.
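The specific payloads from the demonstration were not published, but blind SQL injection generally works by inferring data one character at a time from true/false differences in a server's behavior. The sketch below is purely illustrative and simulated locally; the secret value, the oracle function, and the payload format are all assumptions standing in for a real vulnerable endpoint:

```python
import string

# Illustrative sketch of boolean-based blind SQL injection inference.
# Everything here is hypothetical and simulated locally: a real attack
# sends payloads like
#   ' AND SUBSTR(api_key, 1, 1) = 'c' --
# to a vulnerable endpoint and infers one character per true/false answer.

SECRET_API_KEY = "c4f3"  # stands in for the admin API key in the database

def oracle(position: int, guess: str) -> bool:
    """Simulates the observable true/false behavior of the endpoint
    (e.g. HTTP 200 vs 404, or a measurable timing difference)."""
    return SECRET_API_KEY[position] == guess

def extract_key(length: int) -> str:
    """Recover the secret one character at a time via the boolean oracle."""
    recovered = ""
    for pos in range(length):
        for c in string.hexdigits.lower():
            if oracle(pos, c):
                recovered += c
                break
    return recovered

print(extract_key(len(SECRET_API_KEY)))  # prints: c4f3
```

The key point is the asymmetry: each request leaks only one bit, but an automated agent can issue thousands of such requests tirelessly, which is exactly the kind of systematic probing that favors an LLM-driven tool.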
Claude then repeated the same approach against the Linux kernel, successfully identifying and exploiting another vulnerability. The demonstration was recorded and shared through the 'unprompted' YouTube channel.
The Research Context
Carlini's presentation delivered what he described as a "stark warning" about large language models crossing a critical threshold in offensive security capabilities. According to his research, LLMs can now autonomously discover and exploit zero-day vulnerabilities in major, heavily audited software systems using surprisingly minimal scaffolding built around Claude.
The Anthropic research team has reportedly uncovered more than 500 high-severity vulnerabilities using this approach. Carlini presented two detailed case studies during his talk:
- Ghost CMS SQL Injection: A previously unknown vulnerability in the popular content management system
- Linux Kernel NFS Heap Overflow: A vulnerability dating back to 2003 in the Network File System implementation
Carlini also presented data showing exponential growth in these capabilities, citing metrics from METR (Model Evaluation and Threat Research), and suggested that AI-powered offensive security capabilities are advancing at a pace that may soon outstrip current defensive measures.
Technical Implications
The demonstration suggests that LLMs require only minimal scaffolding—structured prompts and tool integration—to perform sophisticated vulnerability discovery and exploitation tasks that previously required extensive human expertise. This represents a significant shift in the offensive security landscape, where AI systems can now systematically probe software for weaknesses at scale.
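Carlini's actual scaffold has not been released, but "structured prompts plus tool integration" typically amounts to a short loop: give the model an objective, let it request tool calls, execute them, and feed the output back. The sketch below is a minimal, hedged illustration of that pattern; the model stub, the tool names, and the stop condition are all assumptions, and a real scaffold would call an LLM API in place of the stub:

```python
# Minimal sketch of an LLM "scaffold": a loop that feeds the model an
# objective plus tool output, parses the tool call it requests, runs it,
# and repeats. The model here is a stub; a real scaffold would call an
# LLM API and expose tools such as shell access or an HTTP client.

import subprocess

def run_tool(name: str, arg: str) -> str:
    """Dispatch a tool call requested by the model."""
    if name == "shell":
        result = subprocess.run(arg, shell=True, capture_output=True, text=True)
        return result.stdout + result.stderr
    return f"unknown tool: {name}"

def model_stub(transcript: list[str]) -> str:
    """Stands in for an LLM API call. A real scaffold would send the
    transcript as the prompt and parse the completion for a tool call."""
    if not any("uname" in t for t in transcript):
        return "TOOL shell uname -s"  # model decides to probe the target
    return "DONE target fingerprinted"

def agent_loop(objective: str, max_steps: int = 10) -> list[str]:
    transcript = [f"OBJECTIVE: {objective}"]
    for _ in range(max_steps):
        action = model_stub(transcript)
        transcript.append(action)
        if action.startswith("DONE"):
            break
        _, name, arg = action.split(" ", 2)
        transcript.append(run_tool(name, arg))
    return transcript

print(agent_loop("fingerprint the host OS")[-1])
```

What makes this notable is how little machinery it is: no fine-tuning and no specialized architecture, just a general-purpose model, a parser, and a handful of tools.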
Carlini argued that the security community must urgently prepare for a world where AI-powered offensive capabilities far outpace current defenses. The ability to discover vulnerabilities in systems as fundamental as the Linux kernel—which receives extensive security scrutiny from thousands of developers worldwide—indicates that even the most thoroughly reviewed codebases may be vulnerable to AI-assisted discovery methods.
agentic.news Analysis
This demonstration represents a significant escalation in the documented offensive capabilities of large language models. While previous research has shown LLMs assisting with vulnerability discovery, Carlini's live demonstration of Claude autonomously exploiting vulnerabilities in production systems marks a notable advancement. This follows Anthropic's established focus on AI safety and alignment research, suggesting the company is taking a proactive approach to understanding and mitigating potential risks from advanced AI systems.
The research aligns with growing concerns in the cybersecurity community about AI-powered offensive tools. Just last month, we covered OpenAI's release of cybersecurity-specific capabilities in GPT-4o, which included improved vulnerability analysis features. However, Anthropic's demonstration goes beyond analysis to show actual exploitation capabilities, representing a more advanced stage in the offensive AI development timeline.
From a technical perspective, the most concerning aspect is the minimal scaffolding required. Previous AI security research typically involved extensive fine-tuning or specialized architectures, but Carlini's approach suggests that general-purpose LLMs with relatively simple tool integration can achieve sophisticated results. This lowers the barrier to entry for AI-powered offensive security tools, potentially making them accessible to a wider range of actors.
The exponential growth curve presented using METR data suggests we may be approaching an inflection point where AI vulnerability discovery becomes more efficient than human-led security auditing. This would fundamentally change software security practices, potentially necessitating AI-powered defensive systems to match the offensive capabilities.
Frequently Asked Questions
What is a zero-day vulnerability?
A zero-day vulnerability is a security flaw in software that is unknown to the vendor or developers, meaning they have had zero days to fix it. These vulnerabilities are particularly dangerous because they can be exploited by attackers before any patch or mitigation is available.
How does Claude find these vulnerabilities?
According to Nicholas Carlini's presentation, Claude uses a "minimal scaffold" approach—structured prompts and tool integration that allow the LLM to systematically probe software systems for security weaknesses. The AI can analyze code, test inputs, and identify patterns that might indicate vulnerabilities, then develop and execute exploits to confirm their existence.
What software was vulnerable in the demonstration?
The demonstration showed vulnerabilities in two systems: Ghost CMS (a popular content management system with 50,000 GitHub stars) and the Linux kernel. The Ghost vulnerability was a blind SQL injection, while the Linux kernel vulnerability was a heap overflow in the Network File System (NFS) implementation dating back to 2003.
How many vulnerabilities has this approach discovered?
Carlini reported that Anthropic's research using this approach has uncovered more than 500 high-severity vulnerabilities across various software systems. The demonstration of the Ghost and Linux kernel vulnerabilities served as specific case studies of this broader research effort.