code llm
30 articles about code llm in AI news
Claude Code Ships /workflows, Replaces LLM Orchestrator with Code
Claude Code /workflows replaces LLM orchestrator with code-based control flow, solving the token tax problem from multi-agent context buildup.
Claude Code's HTML Output Beats Markdown for LLM-Readable Docs
Claude Code generates HTML docs that LLMs parse more accurately than Markdown, per Thariq's analysis. Trade-off: harder for humans to edit.
Unsloth × NVIDIA Cut LLM Fine-Tuning ~25% — Three Glue-Code Wins on Blackwell
Daniel & Michael Han at Unsloth, in collaboration with NVIDIA, published a joint guide quantifying three glue-code optimizations that combine for ~25% faster LLM training on B200 Blackwell hardware. The wins target overhead around the main kernels — caching packed-sequence metadata, double-buffered gradient checkpoint reloads, and a cheaper GPT-OSS MoE router using argsort + bincount. All three are merged via public PRs.
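The summary names argsort + bincount for the router but does not describe how the merged PR uses them. As a rough illustration only, one common way these two operations combine in MoE routing is to group token indices by assigned expert. The sketch below is dependency-free: the helpers `argsort`, `bincount`, and `group_tokens_by_expert` are hypothetical stand-ins for the corresponding tensor operations, not Unsloth's code.

```python
def argsort(values):
    """Indices that would sort `values`; stable, so ties keep input order."""
    return sorted(range(len(values)), key=values.__getitem__)


def bincount(values, minlength):
    """Count occurrences of each non-negative integer below `minlength`."""
    counts = [0] * minlength
    for v in values:
        counts[v] += 1
    return counts


def group_tokens_by_expert(expert_ids, num_experts):
    """Group token indices by expert (illustrative assumption, not the PR).

    Returns the token order sorted by expert, the per-expert token counts,
    and offsets such that order[offsets[e]:offsets[e + 1]] lists the tokens
    routed to expert e.
    """
    order = argsort(expert_ids)
    counts = bincount(expert_ids, num_experts)
    offsets = [0]
    for c in counts:
        offsets.append(offsets[-1] + c)
    return order, counts, offsets


# Five tokens, each assigned to one of three experts.
expert_ids = [2, 0, 2, 1, 0]
order, counts, offsets = group_tokens_by_expert(expert_ids, 3)
tokens_for_expert_2 = order[offsets[2]:offsets[3]]
```

With this layout each expert can process one contiguous slice of tokens, which is the usual motivation for sorting before dispatch.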
HyEvo Framework Automates Hybrid LLM-Code Workflows, Cuts Inference Cost 19x vs. SOTA
Researchers propose HyEvo, an automated framework that generates agentic workflows combining LLM nodes for reasoning with deterministic code nodes for execution. It reduces inference cost by up to 19x and latency by 16x while outperforming existing methods on reasoning benchmarks.
How to Run Claude Code with Local LLMs Using This Open-Source Script
A new open-source script lets you connect Claude Code to local LLMs via llama.cpp, giving you full privacy and offline access.
How to Run Claude Code on Local LLMs with VibePod's New Backend Support
VibePod now lets you route Claude Code to Ollama or vLLM servers, enabling local model usage and cost savings.
How Godogen's Claude Code Skills Solve LLM Game Development
A developer built two Claude Code skills that generate complete Godot games by solving three key LLM bottlenecks: GDScript knowledge, build-time/runtime state, and visual QA.
LLM Architecture Gallery Compiles 38 Model Designs from 2024-2026 with Diagrams and Code
A new open-source repository provides annotated architecture diagrams, key design choices, and code implementations for 38 major LLMs released between 2024 and 2026, including DeepSeek V3, Qwen3 variants, and GLM-5 744B.
Open-Source Hack Enables Free Claude Code Execution with Local LLMs
Developers have discovered a method to run Anthropic's Claude Code using local LLMs without API costs or data leaving their machines. By redirecting API calls through environment variables, users can leverage open-source models like Qwen3.5 for private, cost-free coding assistance.
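The summary does not say which environment variables are involved. A minimal sketch, assuming the redirect relies on `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`, and that the local server exposes an Anthropic-compatible endpoint at the given address: the helper name `local_claude_env`, the URL, and the placeholder token are all hypothetical.

```python
import os


def local_claude_env(base_url, token="local-placeholder", base_env=None):
    """Return a copy of the environment pointing the client at a local server.

    Assumption: the variable names below are not given in the summary above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url   # where API calls are sent
    env["ANTHROPIC_AUTH_TOKEN"] = token    # placeholder; a local server may ignore it
    return env


# Build the environment without modifying the current process.
env = local_claude_env("http://localhost:8080", base_env={"PATH": "/usr/bin"})

# Launching is left commented out, since it needs the CLI and a running server:
# import subprocess
# subprocess.run(["claude"], env=env)
```

Passing a copied environment to the child process keeps the redirect scoped to that one session rather than the whole shell.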
Code-as-Agent Harness Thesis: 88.5% Gains Without Touching the LLM
A paper reports an 88.5% improvement from adapting the runtime interface around a frozen LLM. The harness generalizes across 18 backbones, challenging model-centric approaches to agent improvement.
llm-anthropic 0.25 Adds Opus 4.7 with xhigh Thinking Effort — Here's How
Update to llm-anthropic 0.25 to access Claude Opus 4.7 with xhigh thinking_effort for tackling your most challenging code problems.
Developer Builds LLM Wiki 'Second Brain' for AI Coding Agents
A developer built an 'LLM Wiki' that feeds an AI coding agent's context window with a living knowledge base of a specific codebase. This aims to solve the agent's short-term memory problem, leading to more consistent and informed code generation.
QuatRoPE: New Positional Embedding Enables Linear-Scale 3D Spatial Reasoning in LLMs, Outperforming Quadratic Methods
Researchers propose QuatRoPE, a novel positional embedding method that encodes 3D object relations with linear input scaling. Paired with IGRE, it improves spatial reasoning in LLMs while preserving their original language capabilities.
OpenAI Winds Down Sora App, Reallocates Compute to Next-Gen 'Spud' LLM Development
OpenAI has completed initial development of its next major AI model, codenamed 'Spud,' and is winding down the Sora video app, which was reportedly a compute resource drain. The move reallocates critical infrastructure toward core LLM competition with Anthropic and Google.
LLM-Driven Heuristic Synthesis for Industrial Process Control: Lessons from Hot Steel Rolling
Researchers propose a framework in which an LLM iteratively writes and refines human-readable Python controllers for industrial processes, using feedback from a physics simulator. The method generates auditable, verifiable code and employs a principled budget strategy, eliminating the need for problem-specific tuning.
OpenAI, Broadcom Unveil Jalapeño ASIC for LLM Inference
OpenAI and Broadcom unveiled Jalapeño, a custom ASIC for LLM inference, targeting volume deployment by late 2026. No performance metrics were disclosed.
Miami Startup Claims 12M-Token LLM Inference at $8 vs. $2,600 on Claude
A Miami startup claims 12M-token LLM inference for $8, versus $2,600 on Claude Opus 4.6. No paper or benchmarks have been released yet.
Never Let the LLM Write the Joins
This article details a two-phase text-to-SQL pipeline: Phase A deterministically plans (intent, entity resolution, joins, RBAC) and Phase B executes with bounded LLM calls. The subject graph caches entity mappings lazily, and security is enforced before the model sees any schema.
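To make the Phase A idea concrete, here is a minimal sketch of deterministic join planning with the access check applied before any schema is exposed. Everything in it is a hypothetical illustration rather than the article's code: the tables, the `JOIN_GRAPH` and `ROLE_TABLES` structures, and the functions `plan_joins` and `build_from_clause`.

```python
from collections import deque

# Hypothetical schema graph: table -> {neighbor table: join condition}.
JOIN_GRAPH = {
    "orders": {
        "customers": "orders.customer_id = customers.id",
        "order_items": "order_items.order_id = orders.id",
    },
    "customers": {"orders": "orders.customer_id = customers.id"},
    "order_items": {
        "orders": "order_items.order_id = orders.id",
        "products": "order_items.product_id = products.id",
    },
    "products": {"order_items": "order_items.product_id = products.id"},
}

# Hypothetical RBAC table: role -> tables that role may read.
ROLE_TABLES = {
    "analyst": {"orders", "order_items", "products"},
    "admin": set(JOIN_GRAPH),
}


def plan_joins(start, goal, role):
    """Breadth-first search for the shortest join path the role may use."""
    allowed = ROLE_TABLES.get(role, set())
    if start not in allowed or goal not in allowed:
        raise PermissionError(f"role {role!r} may not read the requested tables")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for neighbor, condition in JOIN_GRAPH[table].items():
            if neighbor in allowed and neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(neighbor, condition)]))
    raise ValueError(f"no permitted join path from {start} to {goal}")


def build_from_clause(start, joins):
    """Render the planned joins as SQL text; no model call is involved."""
    parts = [f"FROM {start}"]
    parts.extend(f"JOIN {table} ON {cond}" for table, cond in joins)
    return " ".join(parts)


joins = plan_joins("orders", "products", "analyst")
from_clause = build_from_clause("orders", joins)
```

In a pipeline of this shape, a Phase B model call would only fill in the remaining pieces, such as selected columns and filters, against the tables the planner has already approved.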
Metric Match Cuts LLM Judge Annotation Cost 32.5% via Subset Selection
MIT and Stanford researchers developed Metric Match, a subset selection method that reduces LLM judge annotation costs by 32.5% and estimation error by 18.7%, achieving a 0.838 win-rate against random selection.
Omaha Steaks Shrinks Average Delivery Time to 1.24 Days via Fulfillment
Omaha Steaks cut delivery from 6.2 to 1.24 days via five new fulfillment centers and a UPS Roadie partnership. CEO Nate Rempe says same-day delivery now covers 40-45% of the U.S.
UniSound U2 Cuts Token Use 25%, Joins Top Chinese LLM Tier
UniSound's U2 foundation model cuts token consumption by 25% while matching top Chinese LLM performance, entering the top tier with an efficiency-first design.
WorldBench: Top MLLM Scores 64% on Visually Diverse Benchmark
WorldBench, a new multimodal benchmark, tests 15 MLLMs on visually diverse images. The top model scores 64.0%, exposing fundamental gaps in visual understanding.
Chinese LLMs Surge on OpenRouter as U.S. AI Traffic Shifts
Chinese LLMs now drive most weekly token growth on OpenRouter, with American startups routing more traffic to them, per @rohanpaul_ai. The shift reflects utility over brand loyalty.
ChatHealthAI: EHR Foundation Model + Frozen LLM Hits 79.8% F1 on Length-of-Stay
ChatHealthAI aligns CLMBR-T-Base with a frozen LLM via a task-aware resampler, achieving 79.8% F1 on EHRSHOT length-of-stay prediction while enabling interpretable reasoning.
New 474-Game Benchmark Reveals LLMs Collapse on Counterfactual Reasoning
A new 474-game benchmark shows LLMs failing on counterfactual reasoning, with larger performance drops than under contextual perturbations. The results highlight metacognitive gaps in agentic AI.
Microsoft Markitdown: One-Command File-to-Markdown for LLMs
Microsoft open-sourced Markitdown, a one-command file-to-Markdown converter for LLMs, which is meant to improve output quality by supplying input in Markdown, a format well represented in model training data.
Claude.md Hits 152K GitHub Stars; Karpathy Notes LLM Failure Patterns
Claude.md hits 152K GitHub stars. Karpathy notes that LLMs fail in consistent patterns, driving demand for standardized prompt templates.
ModelBest Drops BitCPM-CANN: First 1.58-bit LLM on Ascend 910B
ModelBest released BitCPM-CANN, the first 1.58-bit ternary LLM on Ascend 910B NPUs, using 6× less VRAM than BF16 with minimal capability loss.
Apple Paper Argues LLMs Show 'Illusion of Thinking'
Apple paper argues LLMs show no genuine reasoning, only pattern matching. The critique targets vendor claims but lacks new empirical evidence.
train-llm-from-scratch: 1B-Parameter LLM on a Single GPU
train-llm-from-scratch trains billion-parameter LLMs on a single GPU, bringing training that previously cost $10M+ within reach of consumer hardware.