bugs
30 articles about bugs in AI news
Curl Maintainer Finds 1 CVE, ~20 Bugs via Anthropic's Mythos
Curl maintainer Daniel Stenberg tested Anthropic's Mythos scanner, finding 1 CVE and ~20 bugs. Results validate LLM-based security auditing on real-world code.
Claude Mythos Helped Firefox Fix More Bugs in April Than 15 Prior Months Combined
Firefox fixed more security bugs in April 2026 than 15 prior months combined, using Anthropic's Claude Mythos Preview model for triage and patching.
Linux Kernel Maintainer Linus Torvalds Reports AI-Generated Bug Reports Now Contain 'Actual Bugs' and Working Patches
Linus Torvalds, the lead maintainer of the Linux kernel, has stated that AI-generated bug reports are no longer 'slop' and now frequently identify real bugs with working patches. This marks a significant shift in the practical utility of AI for large-scale, complex software maintenance.
Google's 'TestPilot' AI Agent Debugs Integration Tests from Logs
Google introduced TestPilot, an AI agent that diagnoses integration test failures by sifting through logs and suggesting code fixes. It autonomously resolved 15% of real-world Python test failures in an experiment.
Claude Code's /ultrareview Command
Claude Code's new /ultrareview command runs multiple AI reviewers in parallel to find and independently verify real bugs, costing $5-20 per run after three free tries.
Swap Your 100 MB Telegram Plugin for This 3.5 MB Rust MCP Server
A drop-in Rust replacement for Claude Code's Telegram plugin that solves common bugs, reduces memory usage by 95%, and enables reliable multi-agent setups.
Atlassian's Official MCP Server vs. The Community Version: Which Should You Connect to Claude Code?
Atlassian's official MCP server is GA, but critical bugs and a more powerful community alternative mean your choice depends on your stack and tolerance for risk.
SonarQube Cloud's New MCP Server: Add Security Scanning to Claude Code in 5 Minutes
SonarQube Cloud now has a native MCP server, letting Claude Code analyze code for security vulnerabilities, bugs, and code smells directly in your editor.
Anthropic's Claude AI Identifies Security Vulnerabilities, Earns $3.7M in Bug Bounties
Anthropic researcher Nicholas Carlini stated Claude outperforms him as a security researcher, having earned $3.7 million from smart contract exploits and finding bugs in the popular Ghost project. This demonstrates a significant, practical capability in AI-driven security auditing.
/loop in Claude Code: How to Build Multi-Agent Workflows Without Leaving
The /loop command in Claude Code enables autonomous multi-agent workflows, cycling through coding tasks until completion. Developers should use it to automate iterative processes like TDD cycles.
Sequential Thinking MCP: Break Down Hard Problems Into Solvable Steps in
Sequential Thinking MCP forces Claude Code into structured multi-step reasoning. Install via npx to decompose architecture decisions, debug distributed systems, and design schemas with iterative analysis.
Anthropic Opus 4.8 Cuts Bug-Finding Cost by 5x, SemiAnalysis Finds
Anthropic's Opus 4.8 + ultracode mode cuts severe bug-finding cost to ~1/5, per preliminary SemiAnalysis experiments with wide error bars.
AgingBench: AI Agents Lose Reliability Over Time & Memory Fails
UT Austin paper finds AI agents degrade over time via memory errors. Proposes AgingBench to measure reliability decay across sessions.
CMU Benchmark: Claude Mythos Hits 9.9/16 on V8 Exploits, GPT-5.5 Trails at 5.5
CMU's ExploitBench shows Claude Mythos scores 9.9/16 on V8 exploits vs GPT-5.5's 5.5, but costs $36,428 per run — 12x more. The cost-performance tradeoff is the real story.
CLAUDE.md for Mobile: How One File Fixes Claude Code's CSS Blindspot
A specialized CLAUDE.md file fixes Claude Code's generic CSS by injecting mobile-specific rules, preventing iOS zoom, untappable buttons, and dark mode failures before shipping.
Claude Code solo build: 275 tests, 6 vendor adapters, 6-month onboarding
Non-coder founder built MCP server solo with Claude Code over six months, shipping 275 tests (240 Claude-authored) and six vendor adapters, but three vendor partnerships remain stuck in onboarding.
The /goal Pattern Goes Mainstream — Agents Need Acceptance Criteria
The /goal pattern goes mainstream across coding agents. Effective goals need conditions that work like acceptance criteria, so the agent avoids loops or hallucinated success.
Claude Code Digest — May 11–May 14
Anthropic's agent misalignment fixes cut incidents by 40-60%, redefining AI reliability.
OpenAI Launches Daybreak Cyber Initiative to Rival Anthropic's Glasswing
OpenAI launched Daybreak, a cybersecurity initiative using GPT-5.5 and Codex Security, to rival Anthropic's Glasswing project.
Claude Code Digest — Apr 28–May 01
CCmeter's cache-busting insights can cut your Claude Code costs by up to 40% instantly.
Codex Update Cuts GUI Workflow Latency 42%
Codex app update cuts GUI workflow latency 42%, enabling near-human-speed interface operation for autonomous app building and debugging.
Claude Security Public Beta Launches in Claude Code on Web
Anthropic launched Claude Security in public beta for Claude Code on web, letting developers validate and fix vulnerabilities without leaving the editor.
GPT-5.5 Pro Sustains 2-Hour Bug Fixing Sessions
A user reports GPT-5.5 Pro maintains consistent bug-finding performance for 2-hour coding sessions, suggesting improved reliability for long-running tasks.
The Semantic Void: A RAG Detective Story
A first-person technical blog chronicles rebuilding a vector store index on GCP, exposing a 'semantic void' where embeddings fail to capture meaning. This serves as a cautionary tale for any RAG implementation, including retail chatbots and product search.
Turn Claude Code Into an AI SRE
Five proven outer-loop workflows for using Claude Code as an AI SRE: incident triage, runbook execution, postmortem drafting, SLO investigation, and on-call handoffs. The bottleneck isn't the model — it's the MCP runtime.
UC San Diego Study: AI Copilots Slow Down Experienced Developers
A real-world study from UC San Diego shows AI coding assistants like GitHub Copilot can slow down experienced developers, increasing task time by up to 50%. This challenges the assumption that AI tools universally boost productivity for all skill levels.
Google Hits 75% AI-Generated Code, Up From 50% in Fall 2025
Google reports 75% of all new code is now AI-generated and engineer-approved, a sharp increase from 50% last fall. This indicates a massive, accelerating shift in software development practices at the tech giant.
10 Claude Code Skills That Actually Work: A Solo Developer's Vetted List
A curated list of the most effective Claude Code skills for developers, based on hands-on testing, focusing on practical MCP servers and workflow enhancements.
From CI Fire to 9% Interruption
Learn the four guardrail patterns and the three-phase CLAUDE.md strategy that turn auto-approve from a CI-breaking risk into a productivity superpower.
AI-Powered PS4 Emulator 'Spine' Runs Bloodborne Locally on PC
A developer has released Spine, a PS4 emulator that uses AI techniques to run Bloodborne fully on PC. This represents a major step forward in console emulation, a milestone previously considered years away.