gentic.news — AI News Intelligence Platform

Two bar charts comparing OpenAI GPT-5.5-Cyber and Anthropic Mythos scores on CyberGym, ExploitGym, and SEC-bench…

OpenAI GPT-5.5-Cyber Beats Anthropic Mythos on Security Benchmarks

OpenAI's GPT-5.5-Cyber beats Anthropic's Mythos on security benchmarks. Updated Codex plugin auto-patches after scanning 30M commits.

Source: the-decoder.com, via the_decoder, wired_ai, engadget, openai_blog · Widely Reported
Does OpenAI's GPT-5.5-Cyber outperform Anthropic's Mythos on cybersecurity benchmarks?

OpenAI's GPT-5.5-Cyber outperforms Anthropic's Mythos on CyberGym, ExploitGym, and SEC-bench Pro benchmarks. The updated Codex Security plugin auto-patches vulnerabilities after scanning 30M commits across 30K codebases.

TL;DR

GPT-5.5-Cyber leads on CyberGym, ExploitGym, SEC-bench Pro. · Codex Security plugin now auto-generates patches after scanning 30M commits. · OpenAI partners with 25+ security firms and governments for Daybreak.

OpenAI's GPT-5.5-Cyber beats Anthropic's Mythos on CyberGym, ExploitGym, and SEC-bench Pro. The updated model and Codex Security plugin now auto-patch vulnerabilities after scanning 30M commits.

Key facts

  • GPT-5.5-Cyber beats Anthropic's Mythos on CyberGym, ExploitGym, SEC-bench Pro.
  • Codex Security plugin scanned 30M+ commits across 30K+ codebases.
  • 500K+ findings auto-flagged as fixed; 70K manually confirmed.
  • OpenAI partners with 25+ security firms and several governments.
  • Patch the Planet initiative targets open-source software bugs.

OpenAI is expanding its Daybreak cybersecurity initiative with an updated Codex Security plugin, the full GPT-5.5-Cyber model, and a partner network of more than 25 security firms and several governments. The focus shifts from finding vulnerabilities to patching them automatically, according to The Decoder.

Codex Security update closes the loop from discovery to patch

The Codex Security plugin shipped as a research preview in March. Since then, it has scanned over 30 million commits across more than 30,000 codebases, OpenAI says. Over 500,000 findings were automatically flagged as fixed, and human reviewers manually confirmed another 70,000. The updated plugin analyzes code alongside a threat model, spots flaws, checks whether affected code is reachable, builds a targeted patch, and verifies the result. New features include deep scans of entire codebases, attack path analysis, and export to vulnerability management systems via SARIF files or CodeQL queries. Humans still sign off on every change, according to the OpenAI blog.
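For readers unfamiliar with SARIF, the sketch below shows roughly what a minimal SARIF 2.1.0 log looks like when built by hand in Python. It is an illustration of the file format only, not the Codex Security plugin's actual output: the function name `make_sarif`, the tool name `example-scanner`, the rule ID, and the sample finding are all hypothetical.

```python
import json


def make_sarif(tool_name, findings):
    """Build a minimal SARIF 2.1.0 log from a list of finding dicts.

    Each finding is a plain dict with hypothetical keys:
    rule_id, message, path, line, and an optional level.
    """
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding["rule_id"],
            # SARIF levels are "none", "note", "warning", or "error".
            "level": finding.get("level", "warning"),
            "message": {"text": finding["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["path"]},
                    "region": {"startLine": finding["line"]},
                }
            }],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name}},
            "results": results,
        }],
    }


# Hypothetical finding, invented purely for illustration.
example_findings = [{
    "rule_id": "EXAMPLE-SQLI",
    "level": "error",
    "message": "User input reaches a SQL query without sanitization.",
    "path": "src/db/query.py",
    "line": 42,
}]

sarif_log = make_sarif("example-scanner", example_findings)
sarif_text = json.dumps(sarif_log, indent=2)
```

Writing `sarif_text` to a `.sarif` file gives a document that SARIF-aware vulnerability management tools can generally ingest. A real scanner would typically add rule metadata and richer location data.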

GPT-5.5-Cyber stays locked to vetted defenders

The full version of GPT-5.5-Cyber replaces an earlier preview that mostly aimed to cut unnecessary refusals in security workflows. OpenAI calls the updated model the most capable single model for finding and patching software flaws. GPT-5.5-Cyber leads on all key cybersecurity benchmarks, according to OpenAI. CyberGym measures whether an agent can reproduce known flaws in software environments. ExploitGym tests whether agents can turn vulnerabilities into working exploits. SEC-bench Pro evaluates long-term vulnerability discovery. The model is deliberately more permissive than standard models and refuses fewer requests, OpenAI says, as reported by Wired AI.

The "Patch the Planet" initiative, announced alongside the model release, targets open-source software bugs. OpenAI will work with maintainers to find, validate, and fix vulnerabilities using AI and expert review. The partner program includes over 25 security firms and several governments, though OpenAI did not disclose which governments, Engadget reports.

Anthropic recently made a similar point about the bottleneck shifting from finding flaws to patching them. OpenAI agrees, and the updated Codex plugin aims to close that gap. The benchmark comparison to Anthropic's Mythos is notable given Anthropic's own cybersecurity efforts, including Claude Code, with which senior engineers achieve 31% higher success rates than junior engineers, according to an Anthropic study published June 17.

What to watch

Watch for third-party validation of GPT-5.5-Cyber's benchmark claims — independent researchers often replicate such results within 60 days. Also track whether the partner program expands beyond 25 firms and which governments join, as geopolitical tensions around AI cybersecurity tools intensify.



Sources cited in this article

  1. OpenAI. CyberGym
  2. Anthropic
  3. Wired AI
Source: gentic.news

AI-assisted reporting. Generated by gentic.news from 4 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

OpenAI's move to benchmark against Anthropic's Mythos is a deliberate escalation. Anthropic has positioned Mythos as a specialized cybersecurity model, and OpenAI's claim of outperformance on three separate benchmarks signals that the Daybreak initiative is more than a branding exercise.

The real structural shift is the transition from vulnerability discovery to automated patching. The Codex plugin's 500K findings automatically flagged as fixed and 70K human-confirmed findings suggest the model is already production-ready, though the lack of independent verification leaves room for skepticism.

The partner network of 25+ security firms is notable but vague. OpenAI did not name any partners, which makes it hard to assess the depth of integration. The Patch the Planet initiative targeting open-source bugs is a smart PR move that aligns with the broader industry push to secure the software supply chain. However, the restriction of GPT-5.5-Cyber to vetted defenders raises questions about how quickly this capability will reach the broader security community.

Compared to Anthropic's Claude Code, for which a June 17 study showed senior engineers achieving 31% higher success rates than juniors, OpenAI's approach is more automated but less transparent. The coming weeks will reveal whether the benchmark claims hold up to independent scrutiny.