code llm
30 articles about code LLMs in AI news
HyEvo Framework Automates Hybrid LLM-Code Workflows, Cuts Inference Cost 19x vs. SOTA
Researchers propose HyEvo, an automated framework that generates agentic workflows combining LLM nodes for reasoning with deterministic code nodes for execution. It reduces inference cost by up to 19x and latency by 16x while outperforming existing methods on reasoning benchmarks.
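HyEvo's actual interfaces are not included in the article; the sketch below only illustrates the hybrid idea of pairing one expensive LLM node with a free, deterministic code node (llm_node and code_node are made-up stand-ins):

```python
# Minimal sketch of the hybrid-node idea (HyEvo's real API is not shown in
# the article; llm_node and code_node are illustrative stand-ins).
def llm_node(prompt: str) -> str:
    """Expensive reasoning step; stubbed instead of a real model call."""
    return "sorted([3, 1, 2])"  # the LLM emits an executable plan

def code_node(snippet: str):
    """Deterministic execution step: no inference cost, fully repeatable."""
    return eval(snippet, {"__builtins__": {}}, {"sorted": sorted})

plan = llm_node("Give a one-line expression that sorts [3, 1, 2].")
print(code_node(plan))  # [1, 2, 3] -- a single LLM call for the whole workflow
```

Pushing execution into code nodes is what drives the cost savings: the model is invoked once to plan, and everything downstream runs as ordinary code.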
How to Run Claude Code with Local LLMs Using This Open-Source Script
A new open-source script lets you connect Claude Code to local LLMs via llama.cpp, giving you full privacy and offline access.
How to Run Claude Code on Local LLMs with VibePod's New Backend Support
VibePod now lets you route Claude Code to Ollama or vLLM servers, enabling local model usage and cost savings.
How Godogen's Claude Code Skills Solve LLM Game Development
A developer built two Claude Code skills that generate complete Godot games by solving three key LLM bottlenecks: GDScript knowledge, build-time/runtime state, and visual QA.
LLM Architecture Gallery Compiles 38 Model Designs from 2024-2026 with Diagrams and Code
A new open-source repository provides annotated architecture diagrams, key design choices, and code implementations for 38 major LLMs released between 2024 and 2026, including DeepSeek V3, Qwen3 variants, and GLM-5 744B.
Open-Source Hack Enables Free Claude Code Execution with Local LLMs
Developers have discovered a method to run Anthropic's Claude Code on local LLMs, with no API costs and no data leaving their machines. By redirecting the tool's API calls through environment variables, users can run open-source models such as Qwen3.5 for private, cost-free coding assistance.
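In practice the redirection looks roughly like the sketch below. ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and ANTHROPIC_MODEL are documented Claude Code overrides; the localhost URL and model id are placeholders for whatever your local Anthropic-compatible shim exposes:

```python
# Sketch of the env-var redirection the article describes. The URL and model
# id below are placeholders for your local server / translation shim.
import os
import subprocess

env = os.environ.copy()
env["ANTHROPIC_BASE_URL"] = "http://localhost:8080"  # local shim, not api.anthropic.com
env["ANTHROPIC_AUTH_TOKEN"] = "not-a-real-key"       # most local servers ignore this
env["ANTHROPIC_MODEL"] = "qwen3.5-coder"             # assumed id registered on the shim

subprocess.run(["claude", "-p", "Explain this repository"], env=env)
```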
llm-anthropic 0.25 Adds Opus 4.7 with xhigh Thinking Effort — Here's How
Update to llm-anthropic 0.25 to access Claude Opus 4.7 with xhigh thinking_effort for tackling your most challenging code problems.
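The llm Python API passes model options as keyword arguments to prompt(); the model id and the thinking_effort value below are taken from the article and may differ from what your install registers (check `llm models`):

```python
# Usage sketch for llm-anthropic 0.25 via the llm Python API. Model id and
# thinking_effort come from the article; verify against `llm models`.
import llm

model = llm.get_model("claude-opus-4.7")  # id as reported; may vary
response = model.prompt(
    "Find the race condition in this queue implementation: ...",
    thinking_effort="xhigh",              # new option per the 0.25 release
)
print(response.text())
```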
Developer Builds LLM Wiki 'Second Brain' for AI Coding Agents
A developer built an 'LLM Wiki' that feeds an AI coding agent's context window with a living knowledge base of a specific codebase. This aims to solve the agent's short-term memory problem, leading to more consistent and informed code generation.
QuatRoPE: New Positional Embedding Enables Linear-Scale 3D Spatial Reasoning in LLMs, Outperforming Quadratic Methods
Researchers propose QuatRoPE, a novel positional embedding method that encodes 3D object relations with linear input scaling. Paired with IGRE, it improves spatial reasoning in LLMs while preserving their original language capabilities.
OpenAI Winds Down Sora App, Reallocates Compute to Next-Gen 'Spud' LLM Development
OpenAI has completed initial development of its next major AI model, codenamed 'Spud,' and is winding down the Sora video app, which was reportedly a compute resource drain. The move reallocates critical infrastructure toward core LLM competition with Anthropic and Google.
LLM-Driven Heuristic Synthesis for Industrial Process Control: Lessons from Hot Steel Rolling
Researchers propose a framework where an LLM iteratively writes and refines human-readable Python controllers for industrial processes, using feedback from a physics simulator. The method generates auditable, verifiable code and employs a principled budget strategy, eliminating the need for problem-specific tuning.
Microsoft: LLMs Corrupt 25% of Docs in Long Edits
A Microsoft paper shows LLMs corrupt roughly 25% of documents across 52 domains during 20-edit sessions, with failures compounding silently.
Vibe Training: SLM Replaces LLM-as-a-Judge, 8x Faster, 50% Fewer Errors
Plurai introduces 'vibe training,' using adversarial agent swarms to distill a small language model (SLM) for evaluating and guarding production AI agents. The SLM outperforms standard LLM-as-a-judge setups with ~8x faster inference and ~50% fewer evaluation errors.
LLM-as-a-Judge Framework Fixes Math Evaluation Failures
Researchers propose an LLM-as-a-judge framework for evaluating math reasoning that beats rule-based symbolic comparison, fixing failures in Lighteval and SimpleRL. This enables more accurate benchmarking of LLM math abilities.
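The paper's actual judge prompt isn't reproduced in the article; the sketch below just shows the failure mode and the general shape of the fix, with a stubbed model call:

```python
# Why rule-based comparison misfires, and the shape of an LLM-judge fix.
# The framework's real prompt isn't in the article; this one is illustrative.
gold, pred = "1/2", "x = 0.5"

# Naive rule-based check: fails even though the answers are equivalent,
# because the prediction wraps the value in extra notation.
print(gold == pred.strip())  # False

JUDGE_PROMPT = """You are grading a math answer.
Reference answer: {gold}
Model answer: {pred}
Reply with exactly EQUIVALENT or DIFFERENT."""

def judge(gold: str, pred: str, call_llm) -> bool:
    """call_llm is any text-completion function; stub it for testing."""
    verdict = call_llm(JUDGE_PROMPT.format(gold=gold, pred=pred))
    return verdict.strip().upper() == "EQUIVALENT"

print(judge(gold, pred, call_llm=lambda p: "EQUIVALENT"))  # True with a stub
```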
Nvidia Trains Billion-Parameter LLM Without Backpropagation
Nvidia demonstrated training a billion-parameter language model without gradients or backpropagation, eliminating FP32 weights entirely. This could dramatically reduce memory and compute costs for LLM training.
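The article doesn't detail Nvidia's exact method. For intuition, a classic gradient-free approach is an SPSA-style zeroth-order update (the idea behind work like MeZO): estimate a descent direction from two forward passes, sketched here on a toy objective:

```python
# Generic SPSA-style zeroth-order update on a toy model -- two forward
# passes per step, no backprop. Not Nvidia's method, just the family of
# techniques that makes gradient-free training plausible.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=4)                 # toy "weights"

def loss(t):                               # toy objective: distance to target
    return float(np.sum((t - 1.0) ** 2))

eps, lr = 1e-3, 0.1
for step in range(200):
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # random perturbation
    g_hat = (loss(theta + eps * delta) - loss(theta - eps * delta)) / (2 * eps)
    theta -= lr * g_hat * delta            # descend along the perturbation
print(loss(theta))                         # approaches 0
```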
TF-LLMER: A New Framework to Fix Optimization Problems in LLM-Enhanced Recommenders
Researchers identify two key causes of poor training in LLM-enhanced recommenders: norm disparity and misaligned angular clustering. Their solution, TF-LLMER, uses embedding normalization and Rec-PCA to significantly outperform existing methods.
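Rec-PCA is the paper's own component and isn't reconstructed here; the sketch below shows only the norm-disparity half, where large-norm LLM text embeddings swamp small-norm learned ID embeddings until both are normalized:

```python
# Sketch of the norm-disparity fix only (Rec-PCA not reproduced). LLM text
# embeddings often have far larger norms than learned ID embeddings, letting
# one side dominate dot products in the fused representation.
import numpy as np

rng = np.random.default_rng(0)
llm_emb = rng.normal(scale=10.0, size=(5, 8))  # large-norm text embeddings
id_emb = rng.normal(scale=0.1, size=(5, 8))    # small-norm ID embeddings

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

print(np.linalg.norm(llm_emb, axis=1).mean())  # ~28 before normalization
print(np.linalg.norm(id_emb, axis=1).mean())   # ~0.28 before normalization
fused = l2_normalize(llm_emb) + l2_normalize(id_emb)  # both sides unit-norm
```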
ItemRAG: A New RAG Approach for LLM-Based Recommendation That Retrieves Items, Not User Histories
ItemRAG shifts RAG for LLM-based recommenders from user-history retrieval to fine-grained item-level retrieval, using co-purchase and semantic data to prioritize informative items. Experiments show consistent outperformance over existing methods, especially for cold-start items.
From DIY to MLflow: A Developer's Journey Building an LLM Tracing System
A technical blog details the experience of creating a custom tracing system for LLM applications using FastAPI and Ollama, then migrating to MLflow Tracing. The author discusses practical challenges with spans, traces, and debugging before concluding that established MLOps tools offer better production readiness.
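The author's original code isn't shown, but the MLflow side of the migration is compact: the real @mlflow.trace decorator (MLflow >= 2.14) records each call as a span, with nesting handled automatically:

```python
# MLflow Tracing equivalent of a hand-rolled span. @mlflow.trace is the real
# MLflow API; the Ollama call is stubbed since the author's code isn't shown.
import mlflow

@mlflow.trace
def generate(prompt: str) -> str:
    # Stand-in for the FastAPI/Ollama call described in the post.
    return f"echo: {prompt}"

@mlflow.trace
def pipeline(question: str) -> str:
    return generate(question)  # nested call -> nested span, for free

print(pipeline("What is tracing?"))  # trace is viewable in the MLflow UI
```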
GPT-5.4 LLM Choice Drastically Impacts GPT-ImageGen-2 Output Quality
The quality of images generated by GPT-ImageGen-2 is heavily dependent on the underlying LLM used for reasoning. GPT-5.4 'Thinking' and 'Pro' models produce superior outputs, especially for complex concepts, a non-intuitive finding not documented by OpenAI.
Columbia Prof: LLMs Can't Generate New Science, Only Map Known Data
Columbia CS Professor Vishal Misra argues LLMs cannot generate new scientific ideas because they learn structured maps of known data and fail outside those boundaries. True discovery requires creating new conceptual maps, a capability current architectures lack.
Prefill-as-a-Service Paper Claims to Decouple LLM Inference Bottleneck
A research paper proposes a 'Prefill-as-a-Service' architecture to separate the heavy prefill computation from the lighter decoding phase in LLM inference. This could enable new deployment models where resource-constrained devices handle only the decoding step.
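The paper's protocol isn't included in the article; every name in this schematic is hypothetical. The point is only the division of labor: a big machine runs the compute-heavy prefill once and ships the KV cache, and the constrained device runs just the cheap per-token decode loop:

```python
# Schematic of the prefill/decode split (all names hypothetical; the
# article does not include the paper's actual protocol).
def prefill_service(prompt_tokens):        # runs on the big machine
    kv_cache = [f"kv({t})" for t in prompt_tokens]  # stand-in for attention KV
    return kv_cache

def decode_step(kv_cache, last_token):     # runs on the constrained device
    return f"next({last_token})"           # stand-in for one decode forward pass

kv = prefill_service(["Hello", ",", "world"])  # one heavy remote call
token = "world"
for _ in range(3):                             # light local loop
    token = decode_step(kv, token)
    kv.append(f"kv({token})")
```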
ByteDance's PersonaVLM Boosts MLLM Personalization by 22.4%, Beats GPT-4o
ByteDance researchers unveiled PersonaVLM, a framework that transforms multimodal LLMs into personalized assistants with memory. It improves baseline performance by 22.4% and surpasses GPT-4o by 5.2% on personalized benchmarks.
PRL-Bench: LLMs Score Below 50% on End-to-End Physics Research Tasks
Researchers introduced PRL-Bench, a benchmark built from 100 recent Physical Review Letters papers, testing LLMs on end-to-end physics research. Top models scored below 50%, exposing a significant capability gap for autonomous scientific discovery.
KWBench: New Benchmark Tests LLMs' Unprompted Problem Recognition
Researchers introduced KWBench, a 223-task benchmark measuring if LLMs can recognize the governing game-theoretic problem in professional scenarios without being told what to look for. The best-performing model passed only 27.9% of tasks, highlighting a critical gap between task execution and situational understanding.
Polarization by Default: New Study Audits Recommendation Bias in LLM-Based Content Selection
A controlled study of 540,000 LLM-based content selections reveals robust biases across providers. All models amplified polarization, preferred negative-sentiment content, and showed distinct trade-offs in toxicity handling and demographic representation, with political-leaning bias proving particularly persistent.
Omar Sarayra Builds LLM Artifact Generator for AI Knowledge Discovery
Omar Sarayra created a system that transforms dense LLM knowledge bases into consumable visual artifacts, such as a live 'pulse' of Hacker News AI discussions. He argues the format could become a new medium for staying current.
Andrej Karpathy's LLM-Wiki Framework Solves AI Amnesia with Persistent Knowledge
Andrej Karpathy published a two-page framework called LLM-Wiki that transforms how AI systems handle accumulated knowledge. Instead of retrieving from raw documents each time, the AI compiles sources into its own structured wiki that persists across sessions.
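Karpathy's framework is a write-up, not code; the sketch below only illustrates the compile-once, reuse-across-sessions idea, with summarize() standing in for an LLM pass:

```python
# Illustrative sketch of the compile-don't-retrieve idea behind LLM-Wiki
# (summarize() is a stand-in for an LLM call; the framework itself is prose).
import json, pathlib

WIKI = pathlib.Path("llm_wiki.json")

def summarize(topic: str, sources: list[str]) -> str:
    return f"{topic}: " + " | ".join(sources)  # stand-in for an LLM pass

def compile_wiki(raw: dict[str, list[str]]) -> dict[str, str]:
    wiki = {t: summarize(t, docs) for t, docs in raw.items()}
    WIKI.write_text(json.dumps(wiki))          # persists across sessions
    return wiki

def load_context(topic: str) -> str:
    # Later sessions read the compiled page, not the raw documents.
    return json.loads(WIKI.read_text()).get(topic, "")

compile_wiki({"tracing": ["doc A", "doc B"]})
print(load_context("tracing"))
```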
BERT-as-a-Judge Matches LLM-as-a-Judge Performance at Fraction of Cost
Researchers propose 'BERT-as-a-Judge,' a lightweight evaluation method that matches the performance of costly LLM-as-a-Judge setups. This could drastically reduce the cost of automated LLM evaluation pipelines.
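The article doesn't name a released checkpoint, so the model id below is a placeholder; the shape of the pipeline is a single encoder classification pass in place of a full LLM generation:

```python
# Shape of a BERT-as-a-Judge call; the checkpoint name is a placeholder,
# since the article doesn't name the paper's released model.
from transformers import pipeline

judge = pipeline(
    "text-classification",
    model="your-org/bert-judge-checkpoint",  # hypothetical fine-tuned judge
)

pair = "Question: capital of France? [SEP] Answer: Paris"
print(judge(pair))  # e.g. [{'label': 'CORRECT', 'score': 0.98}] --
                    # one cheap encoder pass instead of an LLM generation
```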
OpenAI Open-Sources Agents SDK, Supports 100+ LLMs
OpenAI has open-sourced its internal Agents SDK, a lightweight framework for building multi-agent systems. It is built on three core primitives, works with over 100 LLMs, and quickly gained 18.9k GitHub stars.
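Quickstart-style usage of the SDK (the `openai-agents` package) looks like the following; the handoff shown is one of the three core primitives the article mentions:

```python
# Minimal Agents SDK example: a triage agent that hands math questions
# off to a specialist agent.
from agents import Agent, Runner

helper = Agent(name="Math tutor", instructions="Explain math step by step.")
triage = Agent(
    name="Triage",
    instructions="Hand math questions to the math tutor.",
    handoffs=[helper],
)

result = Runner.run_sync(triage, "What is 12 * 13?")
print(result.final_output)
```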
Akshay Pachaar Inverts LLM Agent Architecture with 'Harness' Design
AI engineer Akshay Pachaar outlined a novel 'harness' architecture for LLM agents that externalizes intelligence into memory, skills, and protocols. He is building a minimal, didactic open-source implementation of this design.