bug fix
30 articles about bug fix in AI news
GPT-5.5 Pro Sustains 2-Hour Bug Fixing Sessions
A user reports GPT-5.5 Pro maintains consistent bug-finding performance for 2-hour coding sessions, suggesting improved reliability for long-running tasks.
Claude Code vs. Codex: Real-World Devs Reveal When Each Tool Wins
Claude Code shines at design and greenfield work; pair with Codex for bug fixes. Use CLAUDE.md for guidance.
PaperDebugger Open-Sourced: NUS Tool Auto-Fixes Academic Writing
NUS open-sourced PaperDebugger, an in-editor tool that auto-fixes academic writing clarity and structure. It runs locally via Ollama and catches 40% more issues than Grammarly.
Claude Mythos Helped Firefox Fix More Bugs in April Than 15 Prior Months Combined
Firefox fixed more security bugs in April 2026 than 15 prior months combined, using Anthropic's Claude Mythos Preview model for triage and patching.
Anthropic's Auto-Fix Feature Aims to Revolutionize AI Debugging for Developers
Anthropic has unveiled a research preview feature called Auto-Fix for Claude, designed to automatically correct errors in AI-generated code. This development addresses a persistent pain point for developers working with large language models.
Claude Code v2.1.86 Fixes /compact Failures, Adds Context Usage Tracking
Latest update fixes critical /compact bug, adds getContextUsage() for token monitoring, and improves Edit reliability with seed_read_state.
Claude Code v2.1.90: /powerup Tutorials, Performance Gains, and Critical Auto Mode Fix
Claude Code v2.1.90 adds interactive tutorials, improves performance for MCP and long sessions, and fixes a critical Auto Mode bug that ignored user boundaries.
Compass v1.1.0 Ships Recall Consumption Fix 12 Hours After Launch
Nautilus-Compass v1.1.0 fixes a recall consumption failure where agents saw file titles but didn't read bodies, embedding body text in top-3 hits and adding a drift detector for unconsumed recalls.
CLAUDE.md for Mobile: How One File Fixes Claude Code's CSS Blindspot
A specialized CLAUDE.md file fixes Claude Code's generic CSS by injecting mobile-specific rules, preventing iOS zoom, untappable buttons, and dark mode failures before shipping.
Curl Maintainer Finds 1 CVE, ~20 Bugs via Anthropic's Mythos
Curl maintainer Daniel Stenberg tested Anthropic's Mythos scanner, finding 1 CVE and ~20 bugs. Results validate LLM-based security auditing on real-world code.
Claude Code Regression: How to Diagnose and Fix the Recent Quality Drop
Anthropic's postmortem reveals three regressions in Claude Code: reasoning effort, context retention, and verbosity changes. Here's how to diagnose and fix them.
LLM-as-a-Judge Framework Fixes Math Evaluation Failures
Researchers propose an LLM-as-a-judge framework for evaluating math reasoning that beats rule-based symbolic comparison, fixing failures in Lighteval and SimpleRL. This enables more accurate benchmarking of LLM math abilities.
Alibaba's DCW Fixes SNR-t Bias in Diffusion Models, Boosts FLUX & EDM
Alibaba researchers developed DCW, a wavelet-based method to correct SNR-t misalignment in diffusion models. The fix improves performance for models like FLUX and EDM with minimal computational cost.
Google's 'TestPilot' AI Agent Debugs Integration Tests from Logs
Google introduced TestPilot, an AI agent that diagnoses integration test failures by sifting through logs and suggesting code fixes. It autonomously resolved 15% of real-world Python test failures in an experiment.
How Telemetry Settings Are Silently Costing You Cache Tiers (And How To Fix It)
A confirmed bug links telemetry settings to cache TTL; disabling telemetry defaults you to 5-minute cache, increasing costs. Use environment variables and hooks to mitigate.
Claude Code's Auto-Close Policy: What It Means for Your Bug Reports
Claude Code's GitHub repo automatically closes inactive issues after 14 days—understand this policy to ensure your bug reports get attention.
Anthropic's Claude AI Identifies Security Vulnerabilities, Earns $3.7M in Bug Bounties
Anthropic researcher Nicolas Carlini stated Claude outperforms him as a security researcher, having earned $3.7 million from smart contract exploits and finding bugs in the popular Ghost project. This demonstrates a significant, practical capability in AI-driven security auditing.
Linux Kernel Maintainer Linus Torvalds Reports AI-Generated Bug Reports Now Contain 'Actual Bugs' and Working Patches
Linus Torvalds, the lead maintainer of the Linux kernel, has stated that AI-generated bug reports are no longer 'slop' and now frequently identify real bugs with working patches. This marks a significant shift in the practical utility of AI for large-scale, complex software maintenance.
This Notion MCP Bug Tracker Automates Error Logging—Here's How to Use It
A new MCP server automatically logs and categorizes errors to Notion, turning raw console output into structured bug reports.
Anthropic's Claude Code Now Acts as Autonomous PR Agent, Fixing CI Failures & Review Comments in Background
Anthropic has transformed Claude Code into a persistent pull request agent that monitors GitHub PRs, reacts to CI failures and reviewer comments, and pushes fixes autonomously while developers are offline. The system runs on Anthropic-managed cloud infrastructure, enabling full repo operations without local compute.
Anthropic Launches Claude Code Auto-Fix for Web/Mobile Sessions, Enabling Automatic CI Fixes
Anthropic has launched Claude Code auto-fix for web and mobile development sessions. The feature allows Claude to automatically follow pull requests and fix CI failures in the cloud.
Debug Your Browser with Claude Code: The Chrome DevTools MCP Server is a Frontend Game-Changer
Google's official Chrome DevTools MCP server gives Claude Code deep browser debugging, performance profiling, and Lighthouse audits—connect it to your live browser session today.
Reticle: A Local, Open-Source Tool for Developing and Debugging AI Agents
A developer has released Reticle, a desktop application for building, testing, and debugging AI agents locally. It addresses the fragmented tooling landscape by combining scenario testing, agent tracing, tool mocking, and evaluation suites in one secure, offline environment.
Visual-SDPO: Self-Distillation Fixes Code-Generated Visual Defects by +10 Points
Visual-SDPO uses visual-feedback self-distillation to improve code-generated visual artifacts by >10 points on ChartMimic, Design2Code, and AeSlides, with no added inference cost.
GitHub Spec Kit: Open-Source Tool to Fix Vibe Coding’s Core Flaw
GitHub released Spec Kit, an open-source toolkit that enforces specification-first workflows for AI coding, addressing vibe coding's tendency to generate code before requirements are clear.
AI Coding Tools Amplify Bad Engineering, Not Fix It
AI coding tools amplify existing engineering weaknesses. Teams without discipline produce bad code faster, not good code.
Pylon: Self-Host Your Own AI Agent Pipeline That Fixes Sentry Errors via
Pylon is a self-hosted daemon that triggers sandboxed Claude Code agents from webhooks (Sentry, cron, chat) and reports results with human approval — no data leaves your machine.
How Git Worktrees Fix Multi-Instance Claude Code Chaos
A setup script and workflow for using git worktrees to safely run multiple Claude Code instances in parallel, with conflict recovery patterns.
Claude Code's New Repo-Resolver Fixes Monorepo and Remote URL Headaches
Claude Code's runtime now uses a unified repo-resolver package, providing consistent project identification across all its services and correctly handling monorepos and various git remote URL formats.
Google's Auto-Diagnose AI Hits 90% Accuracy Debugging Test Failures
Google researchers built Auto-Diagnose, an LLM tool that analyzes failure logs to suggest root causes. It achieved 90.14% accuracy in evaluation and was used on over 52,000 distinct failing tests after company-wide deployment.