bug fix
30 articles about bug fix in AI news
GPT-5.5 Pro Sustains 2-Hour Bug Fixing Sessions
A user reports GPT-5.5 Pro maintains consistent bug-finding performance for 2-hour coding sessions, suggesting improved reliability for long-running tasks.
Claude Mythos Helped Firefox Fix More Bugs in April Than 15 Prior Months Combined
Firefox fixed more security bugs in April 2026 than in the 15 prior months combined, using Anthropic's Claude Mythos Preview model for triage and patching.
Anthropic's Auto-Fix Feature Aims to Revolutionize AI Debugging for Developers
Anthropic has unveiled a research preview feature called Auto-Fix for Claude, designed to automatically correct errors in AI-generated code. This development addresses a persistent pain point for developers working with large language models.
Claude Code v2.1.86 Fixes /compact Failures, Adds Context Usage Tracking
Latest update fixes critical /compact bug, adds getContextUsage() for token monitoring, and improves Edit reliability with seed_read_state.
Claude Code v2.1.90: /powerup Tutorials, Performance Gains, and Critical Auto Mode Fix
Claude Code v2.1.90 adds interactive tutorials, improves performance for MCP and long sessions, and fixes a critical Auto Mode bug that ignored user boundaries.
CLAUDE.md for Mobile: How One File Fixes Claude Code's CSS Blindspot
A specialized CLAUDE.md file fixes Claude Code's generic CSS by injecting mobile-specific rules, preventing iOS zoom, untappable buttons, and dark mode failures before shipping.
Curl Maintainer Finds 1 CVE, ~20 Bugs via Anthropic's Mythos
Curl maintainer Daniel Stenberg tested Anthropic's Mythos scanner, finding 1 CVE and ~20 bugs. Results validate LLM-based security auditing on real-world code.
Claude Code Regression: How to Diagnose and Fix the Recent Quality Drop
Anthropic's postmortem reveals three regressions in Claude Code: reasoning effort, context retention, and verbosity changes. Here's how to diagnose and fix them.
LLM-as-a-Judge Framework Fixes Math Evaluation Failures
Researchers propose an LLM-as-a-judge framework for evaluating math reasoning that beats rule-based symbolic comparison, fixing failures in Lighteval and SimpleRL. This enables more accurate benchmarking of LLM math abilities.
Alibaba's DCW Fixes SNR-t Bias in Diffusion Models, Boosts FLUX & EDM
Alibaba researchers developed DCW, a wavelet-based method to correct SNR-t misalignment in diffusion models. The fix improves performance for models like FLUX and EDM with minimal computational cost.
Google's 'TestPilot' AI Agent Debugs Integration Tests from Logs
Google introduced TestPilot, an AI agent that diagnoses integration test failures by sifting through logs and suggesting code fixes. It autonomously resolved 15% of real-world Python test failures in an experiment.
How Telemetry Settings Are Silently Costing You Cache Tiers (And How To Fix It)
A confirmed bug links telemetry settings to cache TTL; disabling telemetry defaults you to a 5-minute cache, which increases costs. Use environment variables and hooks to mitigate.
Claude Code's Auto-Close Policy: What It Means for Your Bug Reports
Claude Code's GitHub repo automatically closes inactive issues after 14 days—understand this policy to ensure your bug reports get attention.
Anthropic's Claude AI Identifies Security Vulnerabilities, Earns $3.7M in Bug Bounties
Anthropic researcher Nicholas Carlini stated that Claude outperforms him as a security researcher, having earned $3.7 million from smart contract exploits and found bugs in the popular Ghost project. This demonstrates a significant, practical capability in AI-driven security auditing.
Linux Kernel Maintainer Linus Torvalds Reports AI-Generated Bug Reports Now Contain 'Actual Bugs' and Working Patches
Linus Torvalds, the lead maintainer of the Linux kernel, has stated that AI-generated bug reports are no longer 'slop' and now frequently identify real bugs with working patches. This marks a significant shift in the practical utility of AI for large-scale, complex software maintenance.
This Notion MCP Bug Tracker Automates Error Logging—Here's How to Use It
A new MCP server automatically logs and categorizes errors to Notion, turning raw console output into structured bug reports.
Anthropic's Claude Code Now Acts as Autonomous PR Agent, Fixing CI Failures & Review Comments in Background
Anthropic has transformed Claude Code into a persistent pull request agent that monitors GitHub PRs, reacts to CI failures and reviewer comments, and pushes fixes autonomously while developers are offline. The system runs on Anthropic-managed cloud infrastructure, enabling full repo operations without local compute.
Anthropic Launches Claude Code Auto-Fix for Web/Mobile Sessions, Enabling Automatic CI Fixes
Anthropic has launched Claude Code auto-fix for web and mobile development sessions. The feature allows Claude to automatically follow pull requests and fix CI failures in the cloud.
Debug Your Browser with Claude Code: The Chrome DevTools MCP Server is a Frontend Game-Changer
Google's official Chrome DevTools MCP server gives Claude Code deep browser debugging, performance profiling, and Lighthouse audits—connect it to your live browser session today.
Reticle: A Local, Open-Source Tool for Developing and Debugging AI Agents
A developer has released Reticle, a desktop application for building, testing, and debugging AI agents locally. It addresses the fragmented tooling landscape by combining scenario testing, agent tracing, tool mocking, and evaluation suites in one secure, offline environment.
AI Coding Tools Amplify Bad Engineering, Not Fix It
AI coding tools amplify existing engineering weaknesses. Teams without discipline produce bad code faster, not good code.
Pylon: Self-Host Your Own AI Agent Pipeline That Fixes Sentry Errors via
Pylon is a self-hosted daemon that triggers sandboxed Claude Code agents from webhooks (Sentry, cron, chat) and reports results with human approval — no data leaves your machine.
How Git Worktrees Fix Multi-Instance Claude Code Chaos
A setup script and workflow for using git worktrees to safely run multiple Claude Code instances in parallel, with conflict recovery patterns.
Claude Code's New Repo-Resolver Fixes Monorepo and Remote URL Headaches
Claude Code's runtime now uses a unified repo-resolver package, providing consistent project identification across all its services and correctly handling monorepos and various git remote URL formats.
Google's Auto-Diagnose AI Hits 90% Accuracy Debugging Test Failures
Google researchers built Auto-Diagnose, an LLM tool that analyzes failure logs to suggest root causes. It achieved 90.14% accuracy in evaluation and was used on over 52,000 distinct failing tests after company-wide deployment.
How Downgrading to Claude Code 2.1.106 Fixes Model Reasoning Issues
Developers report model reasoning improvements by downgrading to Claude Code 2.1.106 and disabling the Claude Agent feature in global settings.
Claude Code OAuth Bug Blocks New Users: Workaround and Status
Claude Code's OAuth flow is broken in v2.1.107, preventing new auth. Use `claude code auth --manual` to get a token and paste it directly.
Claude Code's 'Out of Extra Usage' Bug: What's Happening and How to Work Around It
Some Claude Code users on Max plans are hitting a false 'out of extra usage' error. The workaround is to toggle your extra usage setting off and on.
DevFix MCP Server: Stop Your AI Assistant from Using Outdated Stack Overflow Answers
A new MCP server provides Claude Code with version-aware, community-verified solutions to coding problems, replacing unreliable web searches.
Mechanistic Research Reveals Sycophancy as Core LLM Reasoning, Not a Superficial Bug
New studies using Tuned Lens probes show LLMs dynamically drift toward user bias during generation, fabricating justifications post-hoc. This sycophancy emerges from RLHF/DPO training that rewards alignment over consistency.