gentic.news — AI News Intelligence Platform

model routing

30 articles about model routing in AI news

CostRouter Emerges as Smart AI Gateway, Cutting API Expenses by 60% Through Intelligent Model Routing

A new API gateway called CostRouter analyzes request complexity and automatically routes queries to the cheapest capable AI model, saving developers up to 60% on API costs while maintaining quality thresholds.

79% relevant

Scale Your AI Code Review Fleet

Gito v4.1.0 now runs on Claude Code and Gemini CLI. Use async LLM requests and selective model routing to scale code review fleets efficiently.

87% relevant

Claude Code Reverse-Engineered: 98.4% of Codebase is Operational Harness

A reverse-engineering analysis of Claude Code reveals only 1.6% of its codebase is AI decision logic, with the rest being operational infrastructure. This challenges current agent design paradigms by prioritizing a robust deterministic harness over complex model routing.

100% relevant

oh-my-claudecode: Open-Source Multi-Agent Orchestration Layer for Claude Code Boosts Speed 3-5x

Developer hasantoxr released oh-my-claudecode, an open-source orchestration layer that adds five execution modes and 32 specialized agents to Claude Code, reportedly delivering 3-5x faster output with automated model routing between Haiku and Opus.

95% relevant

OpenDev Paper Formalizes the Architecture for Next-Generation Terminal AI Coding Agents

A comprehensive 81-page research paper introduces OpenDev, a systematic framework for building terminal-based AI coding agents. The work details specialized model routing, dual-agent architectures, and safety controls that address reliability challenges in autonomous coding systems.

95% relevant

R³AG: A New Routing Framework That Matches Queries to Retriever

R³AG is a novel routing framework that dynamically selects the optimal retriever for each query in RAG systems, considering not just relevance but also how well the retrieved document helps the generator produce correct answers. It uses contrastive learning to model query-specific preferences, consistently outperforming existing methods on knowledge-intensive tasks.

78% relevant

Agno v2: An Open-Source Framework for Intelligent Multi-LLM Routing

Agno v2 is an open-source framework that enables developers to build a production-ready chat application with intelligent routing. It automatically selects the cheapest LLM capable of handling each user query, optimizing cost and performance.

85% relevant

98× Faster LLM Routing Without a Dedicated GPU: Technical Breakthrough for vLLM Semantic Router

New research presents a three-stage optimization pipeline for the vLLM Semantic Router, achieving 98× speedup and enabling long-context classification on shared GPUs. This solves critical memory and latency bottlenecks for system-level LLM routing.

80% relevant

Beyond Euclidean Distances: How Asymmetric Routing AI Can Optimize Luxury Logistics and Last-Mile Delivery

RADAR introduces a neural framework that solves real-world asymmetric vehicle routing problems, crucial for optimizing luxury goods delivery, store replenishment, and client appointment scheduling in complex urban environments.

70% relevant

Plano AI Proxy Promises 50% Cost Reduction by Intelligently Routing LLM Queries

Plano, an open-source AI proxy powered by the 1.5B-parameter Arch-Router model, automatically directs prompts to optimal LLMs based on complexity, potentially halving inference costs while adding orchestration and safety layers.

85% relevant

New MoE Framework Tames User Interest Shifts in Long-Sequence Recommendations

Researchers propose MoS, a model-agnostic MoE approach that handles long user sequences by detecting session hopping, where user interests shift across sessions. The theme-aware routing mechanism filters irrelevant sessions, while multi-scale fusion captures global and local patterns. Reported results are state of the art on benchmarks, with fewer FLOPs than alternatives.

94% relevant

Kimi's Selective Layer Communication Improves Training Efficiency by ~25% with Minimal Inference Overhead

Kimi has developed a method that replaces uniform residual connections with selective information routing between layers in deep AI models. This improves training stability and achieves ~25% better compute efficiency with negligible inference slowdown.

87% relevant

New AI Research: Cluster-Aware Attention-Based Deep RL for Pickup and Delivery Problems

Researchers propose CAADRL, a deep reinforcement learning framework that explicitly models clustered spatial layouts to solve complex pickup and delivery routing problems more efficiently. It matches state-of-the-art performance with significantly lower inference latency.

79% relevant

Beyond Homogenization: How Expert Divergence Learning Unlocks MoE's True Potential

Researchers have developed Expert Divergence Learning, a novel pre-training strategy that combats expert homogenization in Mixture-of-Experts language models. By encouraging functional specialization through domain-aware routing, the method improves performance across benchmarks with minimal computational overhead.

75% relevant

How to Govern Claude Code Across Your Team: 4 Gaps to Fix Before the Next CVE

Govern Claude Code by routing through an AI gateway with `ANTHROPIC_BASE_URL`, locking settings via MDM, denying filesystem access to secrets, and auditing MCP servers. Two CVEs in 2026 prove repo-level config is an execution layer.

100% relevant

Chinese LLMs Surge on OpenRouter as U.S. AI Traffic Shifts

Chinese LLMs now drive most weekly token growth on OpenRouter, with American startups routing more traffic to them, per @rohanpaul_ai. The shift reflects utility over brand loyalty.

100% relevant

Prism v1.8 Adds CLI, MCP Server, and SDKs — Here's How to Use Them with

Prism v1.8's MCP server gives Claude Code direct control over caches, budgets, and routing. Install it in 2 minutes and ditch the dashboard for terminal-based AI infrastructure management.

73% relevant

How Claude Code's Upstream Proxy Solves Corporate Network Headaches

Claude Code's CCR feature transparently routes subprocess HTTP traffic through a secure WebSocket tunnel, handling corporate MITM certificates and complex network routing automatically.

100% relevant

The Claude OAuth Workaround Is Dead. Here's How to Cut Your Claude Code API Bill Today

Anthropic killed the OAuth token exploit. Use TeamoRouter's 50% discount and multi-provider routing to slash Claude Code costs without crypto.

95% relevant

How to Configure Claude Code's Sub-Agent Orchestration for Parallel, Sequential, and Background Work

Add routing rules to your CLAUDE.md to make your central AI delegate tasks intelligently: parallel for independent domains, sequential for dependencies, background for research.

95% relevant

vLLM Semantic Router: A New Approach to LLM Orchestration Beyond Simple Benchmarks

The article critiques current LLM routing benchmarks as solving only the easy part and introduces vLLM Semantic Router as a comprehensive solution for production-grade LLM orchestration with semantic understanding.

75% relevant

ByteDance Lance 3B MoE Beats 7B Models on Multimodal Benchmarks

ByteDance released Lance, a 3B multimodal MoE model that beats 7B+ models on benchmarks through multi-task synergy and specialized pathways.

90% relevant

30B-A3B Reasoning Model Hits Gold Medal on Physics, Math Olympiads

A 30B-A3B reasoning model from @stingning, released on Hugging Face, achieves gold-medal-level performance on physics and math Olympiads.

87% relevant

Switchcraft Router Cuts Agentic AI Inference Cost 84%, Matches Top Model

Switchcraft, a DistilBERT-based model router for agentic tool calling, achieves 82.9% accuracy while cutting inference cost by 84%, saving over $3,600 per million queries.

78% relevant

3 Ways to Switch Claude Code Models Instantly: /model, --flag, and ENV Variables

Anthropic's official guide describes three methods to switch Claude Code models: the `/model` command, the `--model` flag, and the `ANTHROPIC_MODEL` environment variable. Choose the right model for each task.

100% relevant

Free-Claude-Code Proxy Routes Anthropic API to Free NVIDIA NIM Models

A developer released free-claude-code, a proxy that intercepts Claude Code's API calls and routes them to free NVIDIA NIM endpoints, giving no-cost access to models like Kimi K2 and GLM 4.7. This bypasses Anthropic's subscription fees and adds remote execution via a Telegram bot.

91% relevant

Alibaba Makes Qwen 3.6 Plus API-Only, Shifts Frontier Model to Paid Access

Alibaba has moved its most capable Qwen 3.6 Plus model to API-only access, while keeping the smaller Qwen 3.6 free. This aligns the company's strategy with OpenAI, Anthropic, and Google's paid frontier model approach.

89% relevant

MiniMax M2.7 Tops Open LLM Leaderboard with 230B Parameter Sparse Model

MiniMax announced that its M2.7 model has taken the top spot on the Hugging Face Open LLM Leaderboard. The model uses a sparse mixture-of-experts architecture with 230B total parameters but activates only 10B per token.

85% relevant

How Downgrading to Claude Code 2.1.106 Fixes Model Reasoning Issues

Developers report model reasoning improvements by downgrading to Claude Code 2.1.106 and disabling the Claude Agent feature in global settings.

96% relevant

AI Models Dumber as Compute Shifts to Enterprise, Users Report

Users report noticeable performance degradation in major AI models this month. Analysts suggest providers are shifting computational resources to prioritize enterprise clients over general subscribers.

85% relevant