agentic coding
30 articles about agentic coding in AI news
Qwen 3.7-Max Agentic Coding Demo Shows Frontier-Level UI Replication
Qwen 3.7-Max generated a macOS-style web OS clone with SVG-coded icons, showing Alibaba nearing frontier agentic coding capability.
AI-2027 Authors Accelerate AGI Timelines, Citing Rapid Progress in Agentic Coding
The AI-2027 forecasting group has accelerated its timeline for when AI could replace human software engineers by 1.5 years, from late 2029 to mid-2028. This revision is based on observed rapid progress in agentic coding systems over the last 3-5 months.
Agent Psychometrics: New Framework Predicts Task-Level Success in Agentic Coding Benchmarks with 0.81 AUC
A new research paper introduces a framework using Item Response Theory and task features to predict success on individual agentic coding tasks, achieving 0.81 AUC. This enables benchmark designers to calibrate difficulty without expensive evaluations.
Claude Opus 4.7 Launches with 3.75MP Vision, Agentic Coding, and New Tokenizer
Anthropic launched Claude Opus 4.7 today with 3x higher vision resolution (3.75MP), self-verifying coding outputs, and stricter instruction following. The update targets enterprise agentic workflows and knowledge work benchmarks.
Context Graph for Agentic Coding: A New Abstraction for LLM-Powered Development
A new "context graph" abstraction is emerging for AI coding agents, designed to manage project state and memory across sessions. It aims to solve the persistent context problem in long-running development tasks.
From Agentic Coding to Autonomous Factories: How Cursor Automations Is Redefining Software Engineering
Cursor's new Automations feature transforms AI-assisted coding from a manual, agent-babysitting model to an event-driven system where AI agents trigger automatically based on workflows. This addresses the human attention bottleneck in managing multiple coding agents simultaneously.
Claude Opus 4.8 Launches Dynamic Workflows for Agentic Code
Claude Opus 4.8 launched with dynamic workflows for Claude Code, enabling multi-step agentic coding. The release addresses quality issues after a ~25% instruction miss rate post-4.6.
Claude Code's /btw Command Enables Side Conversations During Agentic Workflows
Claude Code's new /btw command allows developers to ask follow-up questions while Claude is actively working, maintaining context without interrupting the primary task. This addresses a key workflow friction point in agentic coding.
Cursor AI Unveils New Benchmark for Evaluating AI Coding Assistants
Cursor AI has introduced a novel method for scoring AI models on agentic coding tasks, measuring both intelligence and efficiency. The benchmark reveals how different models perform in real-world development scenarios.
DeepSeek-R1 Reportedly Hits 78.9% on OS-World, Outperforming GPT-5.4 at 1/10th Cost
A new benchmark claim suggests DeepSeek-R1 has achieved 78.9% on the OS-World agentic coding benchmark, reportedly outperforming GPT-5.4 while operating at one-tenth the cost. If verified, this would represent a significant leap in cost-performance for AI coding agents.
Boris Cherny's Claude Code Tips Are Now a Skill. Here Is What the Complete Collection Reveals.
A curated collection of expert Claude Code tips is now available as a shareable 'Skill,' revealing proven workflows for faster, more reliable agentic coding.
Industry Leaders Predict 2026 as Breakthrough Year for AI Agents Across Domains
AI industry leaders predict 2026 as the breakthrough year for AI agents across all domains, following initial successes in agentic coding. NVIDIA's Jensen Huang positions current AI development in the 'era of Agents'.
From Code to Discovery: The Next Frontier of AI Agents in Research
AI researcher Omar Saray predicts a shift from 'agentic coding' to 'agentic research'—where AI systems will autonomously conduct scientific discovery. This evolution promises to accelerate innovation across disciplines.
Agentic Harness Engineering Boosts Coding Agents 7% on Terminal-Bench 2
Agentic Harness Engineering introduces a structured approach to evolving coding-agent harnesses, using revertible components, condensed experience, and falsifiable decisions. On Terminal-Bench 2, pass@1 climbs from 69.7% to 77.0% in ten iterations, beating human-designed baselines.
Claude Opus 4.6 Is Live: How to Use Its Improved Coding & Agentic Features in Claude Code
Claude Opus 4.6 is now available with better coding accuracy and agentic task handling. Here's how to configure Claude Code to use it and what to expect.
Google DeepMind Forms 'Strike Team' to Boost AI Coding, Citing Anthropic Pressure
Google has formed a specialized team within DeepMind to rapidly improve its AI coding capabilities. The move is a direct response to internal assessments that Anthropic's tools are more advanced, with leadership pushing for agentic systems.
CMU Research Identifies 'Biggest Unlock' for Coding Agents: Strategic Test Execution
New research from Carnegie Mellon University suggests the key advancement for AI coding agents lies not in raw code generation, but in developing strategies for how to run and interpret tests. This shifts focus from LLM capability to agentic reasoning.
Andrew Ng's Context Hub Solves AI's Documentation Dilemma for Coding Agents
Andrew Ng's team at DeepLearning.AI has launched Context Hub, an open-source tool that provides coding agents with real-time API documentation access. This addresses a critical bottleneck in agentic AI workflows where outdated documentation causes failures.
MiniMax M3 Sparse Attention: 15.6x Decoding Speedup at 1M Tokens
MiniMax M3 sparse attention achieves 9.7x prefilling and 15.6x decoding speedup at 1M tokens, reversing M2's full-attention stance.
Median Coding Agent Hits 96k Input Tokens, Rewriting Inference Economics
SemiAnalysis found median coding agent uses 96k input tokens from 432k requests, shifting inference cost focus from output to context.
Composer 2.5 Scores 62 on Coding Index at $0.07 vs. $4-5 for Rivals
Composer 2.5 scores 62 on coding index at $0.07/task vs $4-5 for rivals scoring 65-66. 60x cost savings with near-parity performance.
Snapdragon X2 Elite Beats Intel Arrow Lake for AI Coding Agents
Snapdragon X2 Elite beat Intel Arrow Lake for Windows AI coding agents. CPU bottleneck, not inference speed, limited performance per @mweinbach.
PayPal Cuts LLM Inference Cost 50% with EAGLE3 Speculative Decoding on H100
PayPal engineers applied EAGLE3 speculative decoding to their fine-tuned 8B-parameter commerce agent, achieving up to 49% higher throughput and 33% lower latency. This allowed a single H100 GPU to match the performance of two H100s running NVIDIA NIM, cutting inference hardware cost by 50%.
Google's Design.md Gives AI Coding Agents a Visual Design Memory
Google introduced Design.md, a file format for storing design tokens and rules that AI coding agents can read to maintain visual consistency, addressing a key failure point in automated UI generation.
Qwen3.6-27B: How to Run a 17GB Local Model That Beats 397B MoE on Coding Tasks
Qwen3.6-27B delivers flagship-level coding performance in a 55.6GB model that can be quantized to 16.8GB, making high-quality local coding assistance accessible.
Chamath: AI Coding Agents Erase the '10x Engineer' Advantage
Chamath Palihapitiya argues AI coding agents are eliminating the '10x engineer' by making the most efficient code paths obvious to all, similar to how AI solved chess. This reduces technical differentiation and shifts the basis of engineering value.
Tiny Fish Improves Live Web Usability for AI Coding Agents
Tiny Fish has released a tool that makes the live web significantly more usable for AI coding agents. This addresses a critical failure point where agent workflows often break down during real-world web interactions.
OpenClaw Creator: Agentic Workflows Fail Without Human Taste in Loop
Peter Steinberger, creator of the OpenClaw AI agent framework, argues that the core failure in agentic workflows is removing human judgment too soon. He asserts that strong output requires continuous human vision, steering, and questioning.
OpenMontage: Open-Source Agentic Video Production System Costs $0.69 Per Ad
OpenMontage, an open-source agentic video production system, has been released. It orchestrates 11 pipelines and 49 tools across multiple AI providers to autonomously script, generate assets, edit, and render videos from a plain language prompt.
Anthropic's Agentic Workflows Launch: A Deep Dive on Cost & Capabilities
Anthropic launched Agentic Workflows, a managed service for running persistent AI agents. While marketed from $0.08/hr, real-world costs are higher due to compute, memory, and network fees.