coding

30 articles about coding in AI news

PadCaptioner: 3B video caption model beats 7B rivals with parallel decoding

PadCaptioner, a 3B model, beats 7B rivals in dense video captioning via lossless parallel autoregressive decoding, challenging scaling orthodoxy.

Jul 12, 2026 · 85% relevant

Claude Opus 4.8 Now Beats Gemini Pro 5 in Coding Benchmarks — What It

Claude Opus 4.8 beats Gemini Pro 5 by 11 points on Fable 5. Claude Code users should run `claude --model opus-4.8` for complex coding tasks.

Jul 12, 2026 · 82% relevant

DeepSeek DSpark: Speculative Decoding Unifies Parallel Gen, Adaptive Verification

DeepSeek released DSpark, a speculative decoding framework unifying parallel generation with adaptive verification. No benchmarks disclosed yet; the approach targets inference latency and throughput.

Jul 11, 2026 · 90% relevant

Databricks Tests Coding Agents on Its Own Codebase

Databricks benchmarked coding agents on its own polyglot codebase. GLM-5.2 matched top closed models, a minimal harness halved costs, and cheaper-per-token models cost more per task.

Jul 11, 2026 · 75% relevant

Meta Muse Spark 1.1 Debuts in AI Coding Battle; Zuck Post Hits 12M Views

Meta released Muse Spark 1.1 for agentic coding tasks. Zuckerberg's post got 12M views in 12 hours; no benchmarks disclosed.

Jul 10, 2026 · 100% relevant

OpenAI Claims 54% Token Efficiency Gain on Agentic Coding in New Model

OpenAI CEO Sam Altman claims 54% token efficiency gain on agentic coding for a new unnamed model, but no technical details or release date were provided.

Jul 9, 2026 · 90% relevant

SpaceXAI Ships Grok 4.5, Blackwell-Trained Coding Model

SpaceXAI released Grok 4.5, a coding-focused model trained on Blackwell GPUs, now available in Cursor and Vercel. Inference cost claims lack independent benchmarks.

Jul 9, 2026 · 94% relevant

GitHub's Former CEO Launches Distributed Git Network for AI Coding Agents

Claude Code users should monitor Nat Friedman's distributed Git network for faster agentic coding workflows. The new network optimizes Git for AI agents, potentially reducing clone/push latency.

Jul 8, 2026 · 80% relevant

Lovable spent $85K on tokens to learn agentic coding at scale

Lovable spent $85K on tokens for agentic coding. Debugging costs dominate, challenging enterprise adoption.

Jul 3, 2026 · 100% relevant

Vibe Coding Fails: Why AI-Generated Code Breaks at Scale

Vibe coding fails because AI-generated code lacks architectural coherence, test coverage, and security validation, breaking at scale beyond 1,000 lines.

Jun 27, 2026 · 70% relevant

MirrorCode Benchmark Costs $2,600 Per Run, Challenges AI Coding Limits

Epoch AI and METR launched MirrorCode, a $2,600-per-run coding benchmark. Claude Opus 4.7 leads with 56% solve rate.

Jun 26, 2026 · 77% relevant

JetSpec hits 1,000 t/s on Qwen-8B with speculative decoding

JetSpec achieves 1,000 t/s on Qwen-8B with a B200 GPU, claiming superiority over prior speculative decoding methods, but lacks independent verification.

Jun 26, 2026 · 89% relevant

Zhipu GLM-5.2 tops global coding benchmarks, sparks 'DeepSeek moment'

Zhipu AI's GLM-5.2 ranks top-3 globally on a coding benchmark, with US engineers calling it a daily driver superior to GPT-5.5.

Jun 26, 2026 · 100% relevant

GLM-5.2 matches Opus 4.7 at 1/5 the price in Snowflake coding test

Zhipu AI's GLM-5.2 matched Claude Opus 4.7 on a Snowflake coding benchmark at one-fifth the cost, threatening Western AI lab pricing and IPO valuations.

Jun 24, 2026 · 85% relevant

Chinese Lab's Free MoE Model Matches GPT-5.5 on Agentic Coding

A Chinese lab released an Apache-2.0 open-weights MoE model matching GPT-5.5 on agentic coding. This free model challenges proprietary AI's lead with sparse MoE architecture.

Jun 12, 2026 · 100% relevant

OpenAI Buys Ona to Give Codex Multi-Day Autonomous Coding

OpenAI acquired Ona (formerly Gitpod) to give Codex persistent cloud environments for autonomous coding tasks lasting hours or days, targeting Anthropic's Claude Code lead.

Jun 12, 2026 · 82% relevant

GitHub Spec Kit: Open-Source Tool to Fix Vibe Coding’s Core Flaw

GitHub released Spec Kit, an open-source toolkit that enforces specification-first workflows for AI coding, addressing vibe coding's tendency to generate code before requirements are clear.

Jun 7, 2026 · 85% relevant

MiniMax M3 Sparse Attention: 15.6x Decoding Speedup at 1M Tokens

MiniMax M3 sparse attention achieves 9.7x prefilling and 15.6x decoding speedup at 1M tokens, reversing M2's full-attention stance.

May 26, 2026 · 100% relevant

No Rigorous Productivity Tests Exist for Post-2025 Autonomous Coding Tools

No productivity studies exist for autonomous coding tools launched December 2025. All research predates the Claude Code/Codex revolution, creating a major knowledge gap.

May 26, 2026 · 72% relevant

Jensen Huang Wants Zero Coding at NVIDIA — 'Purpose vs Task'

Jensen Huang wants zero coding by NVIDIA engineers, framing it as a task to minimize. The bet is AI-generated code will match human output for performance-critical software.

May 24, 2026 · 77% relevant

Median Coding Agent Hits 96k Input Tokens, Rewriting Inference Economics

SemiAnalysis analyzed 432k requests and found that the median coding agent uses 96k input tokens, shifting the focus of inference cost from output to context.

May 22, 2026 · 95% relevant

Qwen 3.7-Max Agentic Coding Demo Shows Frontier-Level UI Replication

Qwen 3.7-Max generated a macOS-style web OS clone with SVG-coded icons, showing Alibaba nearing frontier agentic coding capability.

May 22, 2026 · 100% relevant

Composer 2.5 Scores 62 on Coding Index at $0.07 vs. $4-5 for Rivals

Composer 2.5 scores 62 on a coding index at $0.07 per task, versus $4-5 per task for rivals scoring 65-66: roughly 60x cheaper at near-parity performance.

May 21, 2026 · 83% relevant

Meta Trains Coding AI on Engineers' Work Traces as 8K Jobs Cut

Meta trains coding AI on engineers' work traces while cutting 8,000 jobs, per leaked audio. The behavior cloning strategy uses internal problem-solving steps as training data.

May 21, 2026 · 100% relevant

Vibe-Coding Bottleneck: CPU Box Rental Gets Harder

SemiAnalysis flags that the vibe-coding wave is making cheap CPU box rentals harder to secure, bottlenecking developers who need quick cloud compute for AI prototyping.

May 20, 2026 · 75% relevant

NanoGPT-Bench: A New Eval for Coding Agents Doing AI Research

IntologyAI released NanoGPT-Bench, an internal eval for coding agents on an AI R&D problem. No results or task specifics have been disclosed.

May 19, 2026 · 85% relevant

The Five-Step Loop: Spec-First Coding Agents Cut Drift by 10x

The five-step loop makes every coding agent step a persistent artifact. Skipping the spec causes compounding drift that's invisible until verification passes for the wrong feature.

May 17, 2026 · 92% relevant

AI Coding Tools Amplify Bad Engineering, Not Fix It

AI coding tools amplify existing engineering weaknesses. Teams without discipline produce bad code faster, not good code.

May 16, 2026 · 80% relevant

Gemini Flash Rumored at 92% of GPT-5.5 Coding, 15-20x Cheaper

Unconfirmed rumor claims Gemini Flash achieves 92% of GPT-5.5 coding performance at 15-20x lower cost. Source is a single X post; no official confirmation.

May 14, 2026 · 89% relevant

Opus 4.7 Prompt Surgery: 20K-Char Cut Per Coding Turn

Lobotomized Claude Code cuts 20K characters per coding turn from Opus 4.7's prompt, removing overfitted CAPS directives and anti-laziness scaffolding that harm the newer model.

May 13, 2026 · 78% relevant
