coding
30 articles about coding in AI news
MiniMax M3 Sparse Attention: 15.6x Decoding Speedup at 1M Tokens
MiniMax M3 sparse attention achieves 9.7x prefilling and 15.6x decoding speedup at 1M tokens, reversing M2's full-attention stance.
No Rigorous Productivity Tests Exist for Post-2025 Autonomous Coding Tools
No productivity studies exist for autonomous coding tools launched December 2025. All research predates the Claude Code/Codex revolution, creating a major knowledge gap.
Jensen Huang Wants Zero Coding at NVIDIA — 'Purpose vs Task'
Jensen Huang wants zero coding by NVIDIA engineers, framing it as a task to minimize. The bet is AI-generated code will match human output for performance-critical software.
Median Coding Agent Hits 96k Input Tokens, Rewriting Inference Economics
SemiAnalysis found median coding agent uses 96k input tokens from 432k requests, shifting inference cost focus from output to context.
Qwen 3.7-Max Agentic Coding Demo Shows Frontier-Level UI Replication
Qwen 3.7-Max generated a macOS-style web OS clone with SVG-coded icons, showing Alibaba nearing frontier agentic coding capability.
Composer 2.5 Scores 62 on Coding Index at $0.07 vs. $4-5 for Rivals
Composer 2.5 scores 62 on coding index at $0.07/task vs $4-5 for rivals scoring 65-66. 60x cost savings with near-parity performance.
Meta Trains Coding AI on Engineers' Work Traces as 8K Jobs Cut
Meta trains coding AI on engineers' work traces while cutting 8,000 jobs, per leaked audio. The behavior cloning strategy uses internal problem-solving steps as training data.
Vibe-Coding Bottleneck: CPU Box Rental Gets Harder
SemiAnalysis flags that vibe-coding wave makes cheap CPU box rentals less routine, bottlenecking developers who need quick cloud compute for AI prototyping.
NanoGPT-Bench: A New Eval for Coding Agents Doing AI Research
IntologyAI released NanoGPT-Bench, an internal eval for coding agents on an AI R&D problem. No results or task specifics have been disclosed.
The Five-Step Loop: Spec-First Coding Agents Cut Drift by 10x
The five-step loop makes every coding agent step a persistent artifact. Skipping the spec causes compounding drift that's invisible until verification passes for the wrong feature.
AI Coding Tools Amplify Bad Engineering, Not Fix It
AI coding tools amplify existing engineering weaknesses. Teams without discipline produce bad code faster, not good code.
Gemini Flash Rumored at 92% of GPT-5.5 Coding, 15-20x Cheaper
Unconfirmed rumor claims Gemini Flash achieves 92% of GPT-5.5 coding performance at 15-20x lower cost. Source is a single X post; no official confirmation.
Opus 4.7 Prompt Surgery: 20K-Char Cut Per Coding Turn
Lobotomized Claude Code cuts 20K characters per coding turn from Opus 4.7's prompt, removing overfitted CAPS directives and anti-laziness scaffolding that harm the newer model.
Fake Done: Why AI Coding Agents Ship Incomplete Work
Fake Done describes AI coding agents claiming completion of unfinished work, rooted in architectural blindness. Deterministic verification outside the agent offers a fix.
Snapdragon X2 Elite Beats Intel Arrow Lake for AI Coding Agents
Snapdragon X2 Elite beat Intel Arrow Lake for Windows AI coding agents. CPU bottleneck, not inference speed, limited performance per @mweinbach.
NVIDIA NeMo RL Speculative Decoding: 1.8× Rollout Speed at 8B
NVIDIA's NeMo RL speculative decoding achieves 1.8× rollout speedup at 8B and projects 2.5× at 235B, cutting RL training time by over half.
Agentic Harness Engineering Boosts Coding Agents 7% on Terminal-Bench 2
Agentic Harness Engineering introduces a structured approach to evolving coding-agent harnesses, using revertible components, condensed experience, and falsifiable decisions. On Terminal-Bench 2, pass@1 climbs from 69.7% to 77.0% in ten iterations, beating human-designed baselines.
Anthropic's One-Sentence Prompt Broke Claude's Coding for Days
Anthropic added 'keep responses under 25 words' to Claude's system instructions, causing a sudden collapse in coding performance that users detected within hours and took 4 days to fix.
PayPal Cuts LLM Inference Cost 50% with EAGLE3 Speculative Decoding on H100
PayPal engineers applied EAGLE3 speculative decoding to their fine-tuned 8B-parameter commerce agent, achieving up to 49% higher throughput and 33% lower latency. This allowed a single H100 GPU to match the performance of two H100s running NVIDIA NIM, cutting inference hardware cost by 50%.
Moonshot AI Ships Trillion-Parameter Open Model, Matches Claude Opus on Coding
Moonshot AI released a trillion-parameter open-source model that reportedly matches Anthropic's Claude Opus on most coding benchmarks. This follows the same day Anthropic committed $25B to AWS for compute, highlighting divergent AI scaling strategies.
Google's Design.md Gives AI Coding Agents a Visual Design Memory
Google introduced Design.md, a file format for storing design tokens and rules that AI coding agents can read to maintain visual consistency, addressing a key failure point in automated UI generation.
Qwen3.6-27B: How to Run a 17GB Local Model That Beats 397B MoE on Coding Tasks
Qwen3.6-27B delivers flagship-level coding performance in a 55.6GB model that can be quantized to 16.8GB, making high-quality local coding assistance accessible.
SpaceXAI Partners with Cursor AI to Build 'World's Best' Coding Assistant
SpaceXAI and Cursor AI announced a partnership to integrate SpaceX's engineering data with Cursor's editor, aiming to create a top-tier AI for coding and knowledge work.
Google DeepMind Forms 'Strike Team' to Boost AI Coding, Citing Anthropic Pressure
Google has formed a specialized team within DeepMind to rapidly improve its AI coding capabilities. The move is a direct response to internal assessments that Anthropic's tools are more advanced, with leadership pushing for agentic systems.
Moonshot AI's Kimi K2.6 Hits 58.6% on SWE-Bench Pro, Leads Open-Source Coding
Moonshot AI released Kimi K2.6, an open-source coding model achieving 58.6% on SWE-Bench Pro and 54.0% on HLE with tools. This positions it as a top-tier open alternative to proprietary models like Claude 3.5 Sonnet.
Chamath: AI Coding Agents Erase the '10x Engineer' Advantage
Chamath Palihapitiya argues AI coding agents are eliminating the '10x engineer' by making the most efficient code paths obvious to all, similar to how AI solved chess. This reduces technical differentiation and shifts the basis of engineering value.
Claude Opus 4.7 Launches with 3.75MP Vision, Agentic Coding, and New Tokenizer
Anthropic launched Claude Opus 4.7 today with 3x higher vision resolution (3.75MP), self-verifying coding outputs, and stricter instruction following. The update targets enterprise agentic workflows and knowledge work benchmarks.
Apple Sends 200 Siri Engineers to AI Coding Bootcamp Ahead of WWDC
Apple is sending ~200 Siri engineers to a multi-week bootcamp to learn AI coding tools like Claude Code and Codex. This retraining precedes the expected June WWDC unveiling of a Gemini-powered Siri overhaul.
Coding Agent UIs Converge on Side-by-Side Sessions, Says Omar Sar
AI researcher Omar Sar observes a UI convergence in coding agents like Cursor and Claude Code, moving towards flexible, multi-session interfaces that boost developer productivity and agent capability.
Tiny Fish Improves Live Web Usability for AI Coding Agents
Tiny Fish has released a tool that makes the live web significantly more usable for AI coding agents. This addresses a critical failure point where agent workflows often break down during real-world web interactions.