gentic.news — AI News Intelligence Platform

bugs

30 articles about bugs in AI news

Curl Maintainer Finds 1 CVE, ~20 Bugs via Anthropic's Mythos

Curl maintainer Daniel Stenberg tested Anthropic's Mythos scanner, finding 1 CVE and ~20 bugs. Results validate LLM-based security auditing on real-world code.

98% relevant

Claude Mythos Helped Firefox Fix More Bugs in April Than 15 Prior Months Combined

Firefox fixed more security bugs in April 2026 than in the 15 prior months combined, using Anthropic's Claude Mythos Preview model for triage and patching.

86% relevant

Linux Kernel Maintainer Linus Torvalds Reports AI-Generated Bug Reports Now Contain 'Actual Bugs' and Working Patches

Linus Torvalds, the lead maintainer of the Linux kernel, has stated that AI-generated bug reports are no longer 'slop' and now frequently identify real bugs with working patches. This marks a significant shift in the practical utility of AI for large-scale, complex software maintenance.

85% relevant

Google's 'TestPilot' AI Agent Debugs Integration Tests from Logs

Google introduced TestPilot, an AI agent that diagnoses integration test failures by sifting through logs and suggesting code fixes. It autonomously resolved 15% of real-world Python test failures in an experiment.

85% relevant

Claude Code's /ultrareview Command

Claude Code's new /ultrareview command runs multiple AI reviewers in parallel to find and independently verify real bugs, costing $5-20 per run after three free tries.

91% relevant

Swap Your 100 MB Telegram Plugin for This 3.5 MB Rust MCP Server

A drop-in Rust replacement for Claude Code's Telegram plugin that fixes common bugs, cuts memory usage by 95%, and enables reliable multi-agent setups.

92% relevant

Atlassian's Official MCP Server vs. The Community Version: Which Should You Connect to Claude Code?

Atlassian's official MCP server is GA, but critical bugs and a more powerful community alternative mean your choice depends on your stack and tolerance for risk.

82% relevant

SonarQube Cloud's New MCP Server: Add Security Scanning to Claude Code in 5 Minutes

SonarQube Cloud now has a native MCP server, letting Claude Code analyze code for security vulnerabilities, bugs, and code smells directly in your editor.

95% relevant

Anthropic's Claude AI Identifies Security Vulnerabilities, Earns $3.7M in Bug Bounties

Anthropic researcher Nicholas Carlini stated that Claude outperforms him as a security researcher: it has earned $3.7 million from smart contract exploits and found bugs in the popular Ghost project. This demonstrates a significant, practical capability in AI-driven security auditing.

87% relevant

/loop in Claude Code: How to Build Multi-Agent Workflows Without Leaving

The /loop command in Claude Code enables autonomous multi-agent workflows, cycling through coding tasks until completion. Developers should use it to automate iterative processes like TDD cycles.

90% relevant

Sequential Thinking MCP: Break Down Hard Problems Into Solvable Steps in

Sequential Thinking MCP forces Claude Code into structured, multi-step reasoning. Installed via npx, it helps decompose architecture decisions, debug distributed systems, and design schemas through iterative analysis.

75% relevant

Anthropic Opus 4.8 Cuts Bug-Finding Cost by 5x, SemiAnalysis Finds

Anthropic's Opus 4.8 with ultracode mode cuts the cost of finding severe bugs to roughly one-fifth, according to preliminary SemiAnalysis experiments with wide error bars.

100% relevant

AgingBench: AI Agents Lose Reliability Over Time & Memory Fails

A UT Austin paper finds that AI agents degrade over time through memory errors and proposes AgingBench to measure reliability decay across sessions.

100% relevant

CMU Benchmark: Claude Mythos Hits 9.9/16 on V8 Exploits, GPT-5.5 Trails at 5.5

CMU's ExploitBench shows Claude Mythos scores 9.9/16 on V8 exploits vs GPT-5.5's 5.5, but costs $36,428 per run — 12x more. The cost-performance tradeoff is the real story.

100% relevant

CLAUDE.md for Mobile: How One File Fixes Claude Code's CSS Blindspot

A specialized CLAUDE.md file fixes Claude Code's generic CSS by injecting mobile-specific rules, preventing iOS zoom, untappable buttons, and dark mode failures before shipping.

95% relevant

Claude Code solo build: 275 tests, 6 vendor adapters, 6-month onboarding

A non-coder founder built an MCP server solo with Claude Code over six months, shipping 275 tests (240 of them Claude-authored) and six vendor adapters, but three vendor partnerships remain stuck in onboarding.

92% relevant

The /goal Pattern Goes Mainstream — Agents Need Acceptance Criteria

The /goal pattern is going mainstream across coding agents. Effective goals need acceptance-criteria-like conditions to avoid endless loops or hallucinated success.

83% relevant

Claude Code Digest — May 11–May 14

Anthropic's agent misalignment fixes cut incidents by 40-60%, redefining AI reliability.

95% relevant

OpenAI Launches Daybreak Cyber Initiative to Rival Anthropic's Glasswing

OpenAI launched Daybreak, a cybersecurity initiative using GPT-5.5 and Codex Security, to rival Anthropic's Glasswing project.

92% relevant

Claude Code Digest — Apr 28–May 01

CCmeter's cache-busting insights can cut your Claude Code costs by up to 40% instantly.

100% relevant

Codex Update Cuts GUI Workflow Latency 42%

A Codex app update cuts GUI workflow latency by 42%, enabling near-human-speed interface operation for autonomous app building and debugging.

84% relevant

Claude Security Public Beta Launches in Claude Code on Web

Anthropic launched Claude Security in public beta for Claude Code on web, letting developers validate and fix vulnerabilities without leaving the editor.

100% relevant

GPT-5.5 Pro Sustains 2-Hour Bug Fixing Sessions

A user reports GPT-5.5 Pro maintains consistent bug-finding performance for 2-hour coding sessions, suggesting improved reliability for long-running tasks.

85% relevant

The Semantic Void: A RAG Detective Story

A first-person technical blog post chronicles rebuilding a vector store index on GCP and exposes a 'semantic void' where embeddings fail to capture meaning. It serves as a cautionary tale for any RAG implementation, including retail chatbots and product search.

74% relevant

Turn Claude Code Into an AI SRE

Five proven outer-loop workflows for using Claude Code as an AI SRE: incident triage, runbook execution, postmortem drafting, SLO investigation, and on-call handoffs. The bottleneck isn't the model — it's the MCP runtime.

100% relevant

UC San Diego Study: AI Copilots Slow Down Experienced Developers

A real-world study from UC San Diego shows AI coding assistants like GitHub Copilot can slow down experienced developers, increasing task time by up to 50%. This challenges the assumption that AI tools universally boost productivity for all skill levels.

87% relevant

Google Hits 75% AI-Generated Code, Up From 50% in Fall 2025

Google reports 75% of all new code is now AI-generated and engineer-approved, a sharp increase from 50% last fall. This indicates a massive, accelerating shift in software development practices at the tech giant.

85% relevant

10 Claude Code Skills That Actually Work: A Solo Developer's Vetted List

A curated list of the most effective Claude Code skills for developers, based on hands-on testing, focusing on practical MCP servers and workflow enhancements.

100% relevant

From CI Fire to 9% Interruption

Learn the four guardrail patterns and the three-phase CLAUDE.md strategy that turn auto-approve from a CI-breaking risk into a productivity superpower.

100% relevant

AI-Powered PS4 Emulator 'Spine' Runs Bloodborne Locally on PC

A developer has released Spine, a PS4 emulator that uses AI techniques to run Bloodborne fully on PC. This represents a major step forward in console emulation; running Bloodborne on PC had previously been considered years away.

87% relevant