gentic.news — AI News Intelligence Platform

code llm

30 articles about code llm in AI news

Claude Code Ships /workflows, Replaces LLM Orchestrator with Code

Claude Code's /workflows replaces the LLM orchestrator with code-based control flow, solving the "token tax" problem caused by context buildup across multiple agents.

100% relevant
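The source doesn't show /workflows syntax, so the following is only a generic Python sketch of the underlying idea: ordinary code decides the control flow, and each model call receives a small, purpose-built prompt instead of an ever-growing multi-agent transcript. `run_pipeline` and the `llm` callable are hypothetical stand-ins, not Claude Code APIs.

```python
from typing import Callable

# Hypothetical stand-in for a model call; in practice this would hit an LLM API.
LLMCall = Callable[[str], str]


def run_pipeline(task: str, files: list[str], llm: LLMCall) -> dict[str, str]:
    """Deterministic control flow: plain Python decides what runs next,
    and each LLM call sees only the context it needs."""
    results: dict[str, str] = {}
    for path in files:  # fan-out is a for-loop, not an LLM decision
        prompt = f"Task: {task}\nFile: {path}\nSummarize the relevant changes."
        results[path] = llm(prompt)  # fresh, small prompt per step
    # Only the per-file outputs (not every intermediate transcript) feed the final step.
    merged = "\n".join(f"{p}: {r}" for p, r in results.items())
    results["__final__"] = llm(f"Task: {task}\nCombine these findings:\n{merged}")
    return results
```

Because fan-out and merging are plain Python, only the per-file results reach the final call, which is where any token savings would come from.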

Claude Code's HTML Output Beats Markdown for LLM-Readable Docs

Claude Code generates HTML docs that LLMs parse more accurately than Markdown, per Thariq's analysis. Trade-off: harder for humans to edit.

92% relevant

Unsloth × NVIDIA Cut LLM Fine-Tuning ~25% — Three Glue-Code Wins on Blackwell

Daniel & Michael Han at Unsloth, in collaboration with NVIDIA, published a joint guide quantifying three glue-code optimizations that combine for ~25% faster LLM training on B200 Blackwell hardware. The wins target overhead around the main kernels — caching packed-sequence metadata, double-buffered gradient checkpoint reloads, and a cheaper GPT-OSS MoE router using argsort + bincount. All three are merged via public PRs.

87% relevant
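The source names the router technique but not its code. As a generic illustration (not Unsloth's implementation, which targets PyTorch on GPU), the NumPy sketch below shows how argsort + bincount can group routed tokens by expert without a Python loop over experts. The function name and the `(num_tokens, k)` layout of `topk_idx` are assumptions.

```python
import numpy as np


def group_tokens_by_expert(topk_idx: np.ndarray, num_experts: int):
    """topk_idx: (num_tokens, k) expert ids chosen by the router.
    Returns token ids sorted by expert, per-expert counts, and offsets."""
    flat = topk_idx.reshape(-1)                         # one entry per (token, slot)
    order = np.argsort(flat, kind="stable")             # group slots by expert id
    token_ids = order // topk_idx.shape[1]              # map each slot back to its token
    counts = np.bincount(flat, minlength=num_experts)   # tokens routed to each expert
    offsets = np.concatenate(([0], np.cumsum(counts)))  # expert e owns [offsets[e], offsets[e+1])
    return token_ids, counts, offsets
```

For example, with `topk_idx = [[0, 2], [1, 2], [0, 1]]` and four experts, expert 0 receives tokens `[0, 2]`, expert 1 receives `[1, 2]`, expert 2 receives `[0, 1]`, and expert 3 receives none.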

HyEvo Framework Automates Hybrid LLM-Code Workflows, Cuts Inference Cost 19x vs. SOTA

Researchers propose HyEvo, an automated framework that generates agentic workflows combining LLM nodes for reasoning with deterministic code nodes for execution. It reduces inference cost by up to 19x and latency by 16x while outperforming existing methods on reasoning benchmarks.

95% relevant
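HyEvo's contribution is generating such workflows automatically; the sketch below covers only the execution side of a hybrid workflow, assuming a hypothetical `Node` type with `"llm"` and `"code"` kinds. Deterministic steps (here, arithmetic) run as plain code, so they cost no model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    name: str
    kind: str                     # "llm" or "code" (hypothetical labels)
    fn: Callable[[dict], object]  # reads prior outputs, returns this node's output


def run_workflow(nodes: list[Node], inputs: dict) -> tuple[dict, int]:
    """Run nodes in a fixed order and count how many were LLM calls."""
    state = dict(inputs)
    llm_calls = 0
    for node in nodes:  # fixed topological order, for simplicity
        state[node.name] = node.fn(state)
        llm_calls += int(node.kind == "llm")
    return state, llm_calls
```

In use, an LLM node might parse a word problem into numbers (stubbed here as a lambda) and a code node would do the arithmetic exactly.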

How to Run Claude Code with Local LLMs Using This Open-Source Script

A new open-source script lets you connect Claude Code to local LLMs via llama.cpp, giving you full privacy and offline access.

95% relevant

How to Run Claude Code on Local LLMs with VibePod's New Backend Support

VibePod now lets you route Claude Code to Ollama or vLLM servers, enabling local model usage and cost savings.

95% relevant

How Godogen's Claude Code Skills Solve LLM Game Development

A developer built two Claude Code skills that generate complete Godot games by solving three key LLM bottlenecks: GDScript knowledge, build-time/runtime state, and visual QA.

95% relevant

LLM Architecture Gallery Compiles 38 Model Designs from 2024-2026 with Diagrams and Code

A new open-source repository provides annotated architecture diagrams, key design choices, and code implementations for 38 major LLMs released between 2024 and 2026, including DeepSeek V3, Qwen3 variants, and GLM-5 744B.

93% relevant

Open-Source Hack Enables Free Claude Code Execution with Local LLMs

Developers have discovered a method to run Anthropic's Claude Code using local LLMs without API costs or data leaving their machines. By redirecting API calls through environment variables, users can leverage open-source models like Qwen3.5 for private, cost-free coding assistance.

85% relevant
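The source doesn't name the exact variables. As an assumption, the sketch below sets `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`, which Claude Code reads to choose its API endpoint and credentials, plus `ANTHROPIC_MODEL` for the model name. The URL, token, and model name are hypothetical placeholders, and the local server must expose an Anthropic-compatible Messages API (directly or through a proxy).

```python
import os


def local_claude_env(base_url: str, model: str, token: str = "local-placeholder") -> dict[str, str]:
    """Copy the current environment and point Claude Code at a local endpoint."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base_url  # API requests go here instead of Anthropic's servers
    env["ANTHROPIC_AUTH_TOKEN"] = token   # placeholder credential (assumption)
    env["ANTHROPIC_MODEL"] = model        # hypothetical local model name
    return env
```

You would then launch the CLI with this environment, e.g. `subprocess.run(["claude"], env=local_claude_env("http://localhost:8080", "qwen3.5-local"))`; the port and model name are illustrative.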

Code-as-Agent Harness Thesis: 88.5% Gains Without Touching the LLM

A paper reports an 88.5% improvement from adapting only the runtime interface (the harness) around a frozen LLM. The harness generalizes across 18 backbone models, challenging the model-centric approach to improving agents.

84% relevant

llm-anthropic 0.25 Adds Opus 4.7 with xhigh Thinking Effort — Here's How

Update to llm-anthropic 0.25 to access Claude Opus 4.7 with xhigh thinking_effort for tackling your most challenging code problems.

100% relevant

Developer Builds LLM Wiki 'Second Brain' for AI Coding Agents

A developer built an 'LLM Wiki' that feeds an AI coding agent's context window with a living knowledge base of a specific codebase. This aims to solve the agent's short-term memory problem, leading to more consistent and informed code generation.

87% relevant

QuatRoPE: New Positional Embedding Enables Linear-Scale 3D Spatial Reasoning in LLMs, Outperforming Quadratic Methods

Researchers propose QuatRoPE, a novel positional embedding method that encodes 3D object relations with linear input scaling. Paired with IGRE, it improves spatial reasoning in LLMs while preserving their original language capabilities.

79% relevant

OpenAI Winds Down Sora App, Reallocates Compute to Next-Gen 'Spud' LLM Development

OpenAI has completed initial development of its next major AI model, codenamed 'Spud,' and is winding down the Sora video app, which was reportedly a compute resource drain. The move reallocates critical infrastructure toward core LLM competition with Anthropic and Google.

87% relevant

LLM-Driven Heuristic Synthesis for Industrial Process Control: Lessons from Hot Steel Rolling

Researchers propose a framework where an LLM iteratively writes and refines human-readable Python controllers for industrial processes, using feedback from a physics simulator. The method generates auditable, verifiable code and employs a principled budget strategy, eliminating need for problem-specific tuning.

70% relevant

OpenAI, Broadcom Unveil Jalapeño ASIC for LLM Inference

OpenAI and Broadcom unveiled Jalapeño, a custom ASIC for LLM inference, targeting volume deployment by late 2026. No performance metrics were disclosed.

100% relevant

Miami Startup Claims 12M-Token LLM Inference at $8 vs. $2,600 on Claude

A Miami startup claims 12M-token LLM inference for $8, versus $2,600 on Claude Opus 4.6. No paper or benchmarks have been released yet.

90% relevant

Never Let the LLM Write the Joins

This article details a two-phase text-to-SQL pipeline: Phase A deterministically plans (intent, entity resolution, joins, RBAC) and Phase B executes with bounded LLM calls. The subject graph caches entity mappings lazily, and security is enforced before the model sees any schema.

82% relevant
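The article's subject graph and Phase A internals aren't shown here. As an illustration only, the sketch below plans joins deterministically with a breadth-first search over a hypothetical foreign-key graph, skipping tables the user may not read before any path is chosen. `SCHEMA` and `plan_joins` are invented for the example.

```python
from collections import deque

# Hypothetical schema graph: table -> {neighbor: join condition}
SCHEMA = {
    "orders": {"customers": "orders.customer_id = customers.id",
               "order_items": "order_items.order_id = orders.id"},
    "customers": {"orders": "orders.customer_id = customers.id"},
    "order_items": {"orders": "order_items.order_id = orders.id",
                    "products": "order_items.product_id = products.id"},
    "products": {"order_items": "order_items.product_id = products.id"},
}


def plan_joins(start: str, target: str, allowed: set[str]) -> list[str]:
    """Shortest join path from start to target using only permitted tables.
    Raises instead of letting a model guess a join."""
    if start not in allowed or target not in allowed:
        raise PermissionError("table not permitted for this user")
    prev = {start: None}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == target:
            break
        for nxt in SCHEMA.get(table, {}):
            if nxt in allowed and nxt not in prev:  # RBAC applied during planning
                prev[nxt] = table
                queue.append(nxt)
    if target not in prev:
        raise ValueError(f"no permitted join path from {start} to {target}")
    conditions = []
    node = target
    while prev[node] is not None:  # walk back from target to start
        conditions.append(SCHEMA[prev[node]][node])
        node = prev[node]
    return list(reversed(conditions))
```

The resulting join conditions could then be handed to a bounded LLM call in Phase B, which never has to invent a join itself.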

Metric Match Cuts LLM Judge Annotation Cost 32.5% via Subset Selection

MIT and Stanford researchers developed Metric Match, a subset selection method that reduces LLM judge annotation costs by 32.5% and estimation error by 18.7%, achieving a 0.838 win-rate against random selection.

70% relevant

Omaha Steaks Shrinks Average Delivery Time to 1.24 Days via New Fulfillment Centers

Omaha Steaks cut average delivery time from 6.2 to 1.24 days via five new fulfillment centers and a UPS Roadie partnership. CEO Nate Rempe says same-day delivery now covers 40-45% of the U.S.

74% relevant

UniSound U2 Cuts Token Use 25%, Joins Top Chinese LLM Tier

UniSound's U2 foundation model cuts token consumption by 25% while matching top Chinese LLM performance, entering the top tier with an efficiency-first design.

71% relevant

WorldBench: Top MLLM Scores 64% on Visually Diverse Benchmark

WorldBench, a new multimodal benchmark, tests 15 MLLMs on visually diverse images. The top model scores 64.0%, exposing fundamental gaps in visual understanding.

92% relevant

Chinese LLMs Surge on OpenRouter as U.S. AI Traffic Shifts

Chinese LLMs now drive most weekly token growth on OpenRouter, with American startups routing more traffic to them, per @rohanpaul_ai. The shift reflects utility over brand loyalty.

100% relevant

ChatHealthAI: EHR Foundation Model + Frozen LLM Hits 79.8% F1 on Length-of-Stay

ChatHealthAI aligns CLMBR-T-Base with a frozen LLM via a task-aware resampler, achieving 79.8% F1 on EHRSHOT length-of-stay prediction while enabling interpretable reasoning.

92% relevant

New 474-Game Benchmark Reveals LLMs Collapse on Counterfactual Reasoning

A new 474-game benchmark reveals that LLMs fail on counterfactual reasoning, with larger performance drops than under contextual perturbations. The results highlight metacognitive gaps in agentic AI.

92% relevant

Microsoft Markitdown: One-Command File-to-Markdown for LLMs

Microsoft open-sourced Markitdown, a one-command converter from files to Markdown for LLM input, which aims to improve output quality by leveraging the large amount of Markdown in LLM training data.

75% relevant

Claude.md Hits 152K GitHub Stars; Karpathy Notes LLM Failure Patterns

Claude.md hits 152K GitHub stars. Karpathy notes that LLMs fail in consistent, recurring patterns, which is driving demand for standardized prompt templates.

77% relevant

ModelBest Drops BitCPM-CANN: First 1.58-bit LLM on Ascend 910B

ModelBest released BitCPM-CANN, the first 1.58-bit ternary LLM on Ascend 910B NPUs, using 6× less VRAM than BF16 with minimal capability loss.

87% relevant
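The source doesn't describe BitCPM-CANN's quantizer. One common way to get 1.58-bit (log2 3) ternary weights is the absmean scheme described for BitNet b1.58; the sketch below shows that scheme plus a simple 2-bit packing, as an assumption rather than ModelBest's method. Packing four ternary values per byte stores the weights in 8x less space than BF16 (16 bits each); the reported 6× figure presumably reflects the whole deployed model rather than this toy packing.

```python
def ternary_quantize(weights: list[float], eps: float = 1e-8) -> tuple[list[int], float]:
    """Absmean ternary quantization: scale by the mean absolute weight,
    round, and clip to {-1, 0, +1}. Dequantize as q * scale."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def pack_2bit(q: list[int]) -> bytes:
    """Store each ternary value in 2 bits (4 per byte): -1->0, 0->1, +1->2."""
    out = bytearray((len(q) + 3) // 4)
    for i, v in enumerate(q):
        out[i // 4] |= (v + 1) << (2 * (i % 4))
    return bytes(out)
```

Eight weights that would take 16 bytes in BF16 pack into 2 bytes here.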

Apple Paper Argues LLMs Show 'Illusion of Thinking'

An Apple paper argues that LLMs show no genuine reasoning, only pattern matching. The critique targets vendor claims but lacks new empirical evidence.

91% relevant

train-llm-from-scratch: 1B-Parameter LLM on a Single GPU

train-llm-from-scratch trains billion-parameter LLMs on a single GPU, bringing the cost down from $10M+ to that of consumer hardware.

85% relevant
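The project's actual memory-saving techniques aren't described in the source. As a rough sanity check of the single-GPU claim, a commonly cited rule of thumb (from the DeepSpeed ZeRO work) puts mixed-precision Adam training at about 16 bytes of model state per parameter; the hypothetical helper below applies it and deliberately ignores activations, which depend on batch size and sequence length.

```python
def training_memory_gb(n_params: float, bytes_per_param: int = 16) -> float:
    """Rough model-state memory for mixed-precision Adam training.
    16 bytes/param = 2 (bf16 weights) + 2 (bf16 grads)
                   + 12 (fp32 master weights and both Adam moments).
    Activations and framework buffers are NOT included."""
    return n_params * bytes_per_param / 1e9
```

For 1B parameters this gives about 16 GB, which is within reach of a 24 GB consumer card if activations are kept small (for example with gradient checkpointing); a 7B model would need roughly 112 GB of model state under the same assumptions.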