bug fix

30 articles about bug fixes in AI news

GPT-5.5 Pro Sustains 2-Hour Bug Fixing Sessions

A user reports GPT-5.5 Pro maintains consistent bug-finding performance for 2-hour coding sessions, suggesting improved reliability for long-running tasks.

Apr 26, 2026 · 85% relevant

Claude Code vs. Codex: Real-World Devs Reveal When Each Tool Wins

Claude Code shines at design and greenfield work; pair it with Codex for bug fixes, and use CLAUDE.md to guide it.

Jun 20, 2026 · 90% relevant

PaperDebugger Open-Sourced: NUS Tool Auto-Fixes Academic Writing

NUS open-sourced PaperDebugger, an in-editor tool that auto-fixes academic writing clarity and structure. It runs locally via Ollama and catches 40% more issues than Grammarly.

May 24, 2026 · 78% relevant

Claude Mythos Helped Firefox Fix More Bugs in April Than in the 15 Prior Months Combined

Firefox fixed more security bugs in April 2026 than in the previous 15 months combined, using Anthropic's Claude Mythos Preview model for triage and patching.

May 7, 2026 · 86% relevant

Anthropic's Auto-Fix Feature Aims to Revolutionize AI Debugging for Developers

Anthropic has unveiled a research preview feature called Auto-Fix for Claude, designed to automatically correct errors in AI-generated code. This development addresses a persistent pain point for developers working with large language models.

Mar 8, 2026 · 85% relevant

Claude Code v2.1.86 Fixes /compact Failures, Adds Context Usage Tracking

The latest update fixes a critical /compact bug, adds getContextUsage() for token monitoring, and improves Edit reliability with seed_read_state.

Mar 25, 2026 · 95% relevant

Claude Code v2.1.90: /powerup Tutorials, Performance Gains, and Critical Auto Mode Fix

Claude Code v2.1.90 adds interactive tutorials, improves performance for MCP and long sessions, and fixes a critical Auto Mode bug that ignored user boundaries.

Apr 1, 2026 · 95% relevant

Compass v1.1.0 Ships Recall Consumption Fix 12 Hours After Launch

Nautilus-Compass v1.1.0 fixes a recall consumption failure in which agents saw file titles but never read the file bodies. The release embeds body text in the top-3 hits and adds a drift detector for unconsumed recalls.

Jun 4, 2026 · 100% relevant

CLAUDE.md for Mobile: How One File Fixes Claude Code's CSS Blindspot

A specialized CLAUDE.md file fixes Claude Code's generic CSS by injecting mobile-specific rules, preventing iOS zoom, untappable buttons, and dark mode failures before shipping.

May 16, 2026 · 95% relevant

Curl Maintainer Finds 1 CVE, ~20 Bugs via Anthropic's Mythos

Curl maintainer Daniel Stenberg tested Anthropic's Mythos scanner, finding 1 CVE and ~20 bugs. Results validate LLM-based security auditing on real-world code.

May 12, 2026 · 98% relevant

Claude Code Regression: How to Diagnose and Fix the Recent Quality Drop

Anthropic's postmortem reveals three regressions in Claude Code: reasoning effort, context retention, and verbosity changes. Here's how to diagnose and fix them.

Apr 28, 2026 · 100% relevant

LLM-as-a-Judge Framework Fixes Math Evaluation Failures

Researchers propose an LLM-as-a-judge framework for evaluating math reasoning that outperforms rule-based symbolic comparison and fixes evaluation failures in Lighteval and SimpleRL. This enables more accurate benchmarking of LLM math abilities.

Apr 27, 2026 · 82% relevant

Alibaba's DCW Fixes SNR-t Bias in Diffusion Models, Boosts FLUX & EDM

Alibaba researchers developed DCW, a wavelet-based method to correct SNR-t misalignment in diffusion models. The fix improves performance for models like FLUX and EDM with minimal computational cost.

Apr 20, 2026 · 85% relevant

Google's 'TestPilot' AI Agent Debugs Integration Tests from Logs

Google introduced TestPilot, an AI agent that diagnoses integration test failures by sifting through logs and suggesting code fixes. It autonomously resolved 15% of real-world Python test failures in an experiment.

Apr 17, 2026 · 85% relevant

How Telemetry Settings Are Silently Costing You Cache Tiers (And How To Fix It)

A confirmed bug links telemetry settings to cache TTL: disabling telemetry drops you to the default 5-minute cache, which increases costs. Use environment variables and hooks to mitigate it.

Apr 13, 2026 · 90% relevant

Claude Code's Auto-Close Policy: What It Means for Your Bug Reports

Claude Code's GitHub repo automatically closes inactive issues after 14 days—understand this policy to ensure your bug reports get attention.

Apr 11, 2026 · 100% relevant

Anthropic's Claude AI Identifies Security Vulnerabilities, Earns $3.7M in Bug Bounties

Anthropic researcher Nicholas Carlini stated that Claude outperforms him as a security researcher: it has earned $3.7 million from smart contract exploits and found bugs in the popular Ghost project. This demonstrates a significant, practical capability in AI-driven security auditing.

Mar 30, 2026 · 87% relevant

Linux Kernel Maintainer Linus Torvalds Reports AI-Generated Bug Reports Now Contain 'Actual Bugs' and Working Patches

Linus Torvalds, the lead maintainer of the Linux kernel, has stated that AI-generated bug reports are no longer 'slop' and now frequently identify real bugs with working patches. This marks a significant shift in the practical utility of AI for large-scale, complex software maintenance.

Mar 29, 2026 · 85% relevant

This Notion MCP Bug Tracker Automates Error Logging—Here's How to Use It

A new MCP server automatically logs and categorizes errors to Notion, turning raw console output into structured bug reports.

Mar 28, 2026 · 74% relevant

Anthropic's Claude Code Now Acts as Autonomous PR Agent, Fixing CI Failures & Review Comments in Background

Anthropic has transformed Claude Code into a persistent pull request agent that monitors GitHub PRs, reacts to CI failures and reviewer comments, and pushes fixes autonomously while developers are offline. The system runs on Anthropic-managed cloud infrastructure, enabling full repo operations without local compute.

Mar 27, 2026 · 93% relevant

Anthropic Launches Claude Code Auto-Fix for Web/Mobile Sessions, Enabling Automatic CI Fixes

Anthropic has launched Claude Code auto-fix for web and mobile development sessions. The feature allows Claude to automatically follow pull requests and fix CI failures in the cloud.

Mar 27, 2026 · 89% relevant

Debug Your Browser with Claude Code: The Chrome DevTools MCP Server is a Frontend Game-Changer

Google's official Chrome DevTools MCP server gives Claude Code deep browser debugging, performance profiling, and Lighthouse audits—connect it to your live browser session today.

Mar 24, 2026 · 98% relevant

Reticle: A Local, Open-Source Tool for Developing and Debugging AI Agents

A developer has released Reticle, a desktop application for building, testing, and debugging AI agents locally. It addresses the fragmented tooling landscape by combining scenario testing, agent tracing, tool mocking, and evaluation suites in one secure, offline environment.

Mar 19, 2026 · 70% relevant

Visual-SDPO: Self-Distillation Fixes Code-Generated Visual Defects by +10 Points

Visual-SDPO uses visual-feedback self-distillation to improve code-generated visual artifacts by more than 10 points on ChartMimic, Design2Code, and AeSlides, with no added inference cost.

Jun 10, 2026 · 68% relevant

GitHub Spec Kit: Open-Source Tool to Fix Vibe Coding’s Core Flaw

GitHub released Spec Kit, an open-source toolkit that enforces specification-first workflows for AI coding, addressing vibe coding's tendency to generate code before requirements are clear.

Jun 7, 2026 · 85% relevant

AI Coding Tools Amplify Bad Engineering Rather Than Fix It

AI coding tools amplify existing engineering weaknesses. Teams without discipline produce bad code faster, not good code.

May 16, 2026 · 80% relevant

Pylon: Self-Host Your Own AI Agent Pipeline That Fixes Sentry Errors via

Pylon is a self-hosted daemon that triggers sandboxed Claude Code agents from webhooks (Sentry, cron, chat) and reports results with human approval — no data leaves your machine.

Apr 27, 2026 · 95% relevant

How Git Worktrees Fix Multi-Instance Claude Code Chaos

A setup script and workflow for using git worktrees to safely run multiple Claude Code instances in parallel, with conflict recovery patterns.

Apr 19, 2026 · 100% relevant

Claude Code's New Repo-Resolver Fixes Monorepo and Remote URL Headaches

Claude Code's runtime now uses a unified repo-resolver package, providing consistent project identification across all its services and correctly handling monorepos and various git remote URL formats.

Apr 19, 2026 · 88% relevant

Google's Auto-Diagnose AI Hits 90% Accuracy Debugging Test Failures

Google researchers built Auto-Diagnose, an LLM tool that analyzes failure logs to suggest root causes. It achieved 90.14% accuracy in evaluation and was used on over 52,000 distinct failing tests after company-wide deployment.

Apr 16, 2026 · 87% relevant
