Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

model routing

30 articles about model routing in AI news

CostRouter Emerges as Smart AI Gateway, Cutting API Expenses by 60% Through Intelligent Model Routing

A new API gateway called CostRouter analyzes request complexity and automatically routes queries to the cheapest capable AI model, saving developers up to 60% on API costs while maintaining quality thresholds.

79% relevant

Claude Code Reverse-Engineered: 98.4% of Codebase is Operational Harness

A reverse-engineering analysis of Claude Code reveals only 1.6% of its codebase is AI decision logic, with the rest being operational infrastructure. This challenges current agent design paradigms by prioritizing a robust deterministic harness over complex model routing.

100% relevant

oh-my-claudecode: Open-Source Multi-Agent Orchestration Layer for Claude Code Boosts Speed 3-5x

Developer hasantoxr released oh-my-claudecode, an open-source orchestration layer that adds five execution modes and 32 specialized agents to Claude Code, reportedly delivering 3-5x faster output with automated model routing between Haiku and Opus.

95% relevant

OpenDev Paper Formalizes the Architecture for Next-Generation Terminal AI Coding Agents

A comprehensive 81-page research paper introduces OpenDev, a systematic framework for building terminal-based AI coding agents. The work details specialized model routing, dual-agent architectures, and safety controls that address reliability challenges in autonomous coding systems.

95% relevant

R³AG: A New Routing Framework That Matches Queries to Retriever

R³AG is a novel routing framework that dynamically selects the optimal retriever for each query in RAG systems, considering not just relevance but also how well the retrieved document helps the generator produce correct answers. It uses contrastive learning to model query-specific preferences, consistently outperforming existing methods on knowledge-intensive tasks.

78% relevant

Agno v2: An Open-Source Framework for Intelligent Multi-LLM Routing

Agno v2 is an open-source framework that enables developers to build a production-ready chat application with intelligent routing. It automatically selects the cheapest LLM capable of handling each user query, optimizing cost and performance.

85% relevant

98× Faster LLM Routing Without a Dedicated GPU: Technical Breakthrough for vLLM Semantic Router

New research presents a three-stage optimization pipeline for the vLLM Semantic Router, achieving 98× speedup and enabling long-context classification on shared GPUs. This solves critical memory and latency bottlenecks for system-level LLM routing.

80% relevant

Beyond Euclidean Distances: How Asymmetric Routing AI Can Optimize Luxury Logistics and Last-Mile Delivery

RADAR introduces a neural framework that solves real-world asymmetric vehicle routing problems, crucial for optimizing luxury goods delivery, store replenishment, and client appointment scheduling in complex urban environments.

70% relevant

Plano AI Proxy Promises 50% Cost Reduction by Intelligently Routing LLM Queries

Plano, an open-source AI proxy powered by the 1.5B parameter Arch-Router model, automatically directs prompts to optimal LLMs based on complexity, potentially halving inference costs while adding orchestration and safety layers.

85% relevant

New MoE Framework Tames User Interest Shifts in Long-Sequence Recommendations

Researchers propose MoS, a model-agnostic MoE approach that handles long user sequences by detecting session hopping – where user interests shift across sessions. The theme-aware routing mechanism filters irrelevant sessions, while multi-scale fusion captures global and local patterns. Results show SOTA on benchmarks with fewer FLOPs than alternatives.

94% relevant

Kimi's Selective Layer Communication Improves Training Efficiency by ~25% with Minimal Inference Overhead

Kimi has developed a method that replaces uniform residual connections with selective information routing between layers in deep AI models. This improves training stability and achieves ~25% better compute efficiency with negligible inference slowdown.

87% relevant

New AI Research: Cluster-Aware Attention-Based Deep RL for Pickup and Delivery Problems

Researchers propose CAADRL, a deep reinforcement learning framework that explicitly models clustered spatial layouts to solve complex pickup and delivery routing problems more efficiently. It matches state-of-the-art performance with significantly lower inference latency.

79% relevant

Beyond Homogenization: How Expert Divergence Learning Unlocks MoE's True Potential

Researchers have developed Expert Divergence Learning, a novel pre-training strategy that combats expert homogenization in Mixture-of-Experts language models. By encouraging functional specialization through domain-aware routing, the method improves performance across benchmarks with minimal computational overhead.

75% relevant

How Claude Code's Upstream Proxy Solves Corporate Network Headaches

Claude Code's CCR feature transparently routes subprocess HTTP traffic through a secure WebSocket tunnel, handling corporate MITM certificates and complex network routing automatically.

100% relevant

The Claude OAuth Workaround Is Dead. Here's How to Cut Your Claude Code API Bill Today

Anthropic killed the OAuth token exploit. Use TeamoRouter's 50% discount and multi-provider routing to slash Claude Code costs without crypto.

95% relevant

How to Configure Claude Code's Sub-Agent Orchestration for Parallel, Sequential, and Background Work

Add routing rules to your CLAUDE.md to make your central AI delegate tasks intelligently—parallel for independent domains, sequential for dependencies, background for research.

95% relevant

vLLM Semantic Router: A New Approach to LLM Orchestration Beyond Simple Benchmarks

The article critiques current LLM routing benchmarks as solving only the easy part, introducing vLLM Semantic Router as a comprehensive solution for production-grade LLM orchestration with semantic understanding.

75% relevant

Switchcraft Router Cuts Agentic AI Inference Cost 84%, Matches Top Model

Switchcraft, a DistilBERT-based model router for agentic tool calling, achieves 82.9% accuracy while cutting inference cost by 84%, saving over $3,600 per million queries.

78% relevant

3 Ways to Switch Claude Code Models Instantly: /model, --flag, and ENV Variables

Anthropic's official guide reveals three methods to switch Claude Code models: /model command, --model flag, and ANTHROPIC_MODEL env variable. Choose the right model for each task.

100% relevant

Free-Claude-Code Proxy Routes Anthropic API to Free NVIDIA NIM Models

A developer released free-claude-code, a proxy that intercepts Claude Code's API calls and routes them to free NVIDIA NIM endpoints, unlocking free access to models like Kimi K2 and GLM 4.7. This bypasses Anthropic's subscription fees and adds remote execution via a Telegram bot.

91% relevant

Alibaba Makes Qwen 3.6 Plus API-Only, Shifts Frontier Model to Paid Access

Alibaba has moved its most capable Qwen 3.6 Plus model to API-only access, while keeping the smaller Qwen 3.6 free. This aligns the company's strategy with OpenAI, Anthropic, and Google's paid frontier model approach.

89% relevant

MiniMax M2.7 Tops Open LLM Leaderboard with 230B Parameter Sparse Model

MiniMax announced its M2.7 model has taken the top spot on the Hugging Face Open LLM Leaderboard. The model uses a sparse mixture-of-experts architecture with 230B total parameters but only activates 10B per token.

85% relevant

How Downgrading to Claude Code 2.1.106 Fixes Model Reasoning Issues

Developers report model reasoning improvements by downgrading to Claude Code 2.1.106 and disabling the Claude Agent feature in global settings.

96% relevant

AI Models Dumber as Compute Shifts to Enterprise, Users Report

Users report noticeable performance degradation in major AI models this month. Analysts suggest providers are shifting computational resources to prioritize enterprise clients over general subscribers.

85% relevant

Zuckerberg: Most Businesses Will Run Custom AI Layers, Not Frontier Models

Mark Zuckerberg predicts most businesses will not own frontier AI models but will build customized operational layers on top of shared models to handle support, sales, and operations. This vision positions foundation models as infrastructure, with value captured in the business-specific layer.

87% relevant

OpenAI Voice Mode Uses Older, Weaker Model, Not GPT-4o

OpenAI's voice mode, which powers its conversational interface, is not powered by the latest GPT-4o model but by a much older and weaker system, creating a disconnect between user perception and technical reality.

75% relevant

Mistral AI Teases 'New Model Tomorrow' in Cryptic Tweet

Mistral AI co-founder Arthur Mensch tweeted 'new model tomorrow!?!', signaling an imminent release. This follows their pattern of rapid, often surprise, model deployments.

85% relevant

DeepSeek-V4 Rumored as 'Whale' Returns, Signaling Major Model Release

DeepSeek's cryptic 'whale' codename has reappeared, strongly hinting at the impending launch of DeepSeek-V4. This follows the company's pattern of using the whale symbol before major model releases.

89% relevant

DeepSeek's R1 Model Triggers Major AI Market Valuation Shifts

Chinese AI startup DeepSeek has released its new large language model R1, causing significant market disruption. The launch reportedly reduced tech giant valuations by approximately one trillion dollars as the model demonstrates competitive capabilities at lower costs.

87% relevant

Gamma 31B Model Reportedly Outperforms Qwen 3.5 397B, Highlighting Efficiency Leap

A developer's social media post claims the Gamma 31B model outperforms the much larger Qwen 3.5 397B. If verified, this would represent a dramatic efficiency gain in large language model scaling.

85% relevant