Muxer: Open-Source Model Multiplexer Slashes Claude Code Costs by Routing

Muxer reduces Claude Code costs by multiplexing models per subtask via agent frontmatter and session hooks. Keep Fable/Opus for planning; route boilerplate to Haiku.

Ala SMITH & AI Research Desk · 1d ago · 4 min read · AI-Generated

Source: github.com via hn_claude_code

How do I use Muxer to reduce Claude Code costs by routing subtasks to cheaper models?

Muxer is an open-source Claude Code plugin that multiplexes models per subtask: it keeps Fable/Opus for planning and review, and routes grep and boilerplate work to Haiku or Gemini via agent frontmatter and session hooks. The goal is to save credits without degrading output quality.

TL;DR

Reserve expensive models for planning and review, and send boilerplate to cheap ones. Muxer enforces the split through agent frontmatter and session hooks.

What Changed — The Update

Muxer is an open-source Claude Code plugin that lets an expensive model orchestrate a session while cheaper models do the actual work. It hit GitHub recently and solves a pain point anyone on a Max plan knows: the orchestrator model (Fable, Opus) bills for every subtask it spawns, even trivial ones like grepping files or writing boilerplate.

Muxer works through three mechanisms:

Agent frontmatter — Each agent in agents/*.md has a model: line. The scout always runs on Haiku; the builder always runs on Opus. This is a hard guarantee, not a suggestion.
SessionStart hook: scripts/session-policy.sh injects a routing policy at session start. On Fable sessions it says "keep the main loop lean"; on cheaper models it adds an escalation path up to Fable via muxer:oracle.
PreToolUse guard — scripts/guard-model.sh catches subagents (Explore, Plan) that inherit the session model and pins them to Opus unless overridden. Prevents premium billing on file exploration.
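
The guard's decision can be pictured as a small pure function. This is a hypothetical sketch, not the contents of scripts/guard-model.sh: the name `pin_model` and its arguments are made up for illustration, and a real Claude Code hook would have to read the requested model from the hook input rather than from a positional argument.

```shell
#!/bin/bash
# Hypothetical sketch of the guard's decision, not the real scripts/guard-model.sh.
# pin_model REQUESTED [DEFAULT]
#   REQUESTED: the model a subagent call asked for; empty means "inherit the session model".
#   DEFAULT:   the model to pin to when nothing was requested (Opus, per the article).
pin_model() {
  local requested="${1:-}"
  local default="${2:-opus}"
  if [ -n "$requested" ]; then
    # An explicit choice (from frontmatter or the caller) is respected.
    printf '%s\n' "$requested"
  else
    # No explicit model: pin instead of inheriting the premium session model.
    printf '%s\n' "$default"
  fi
}

pin_model ""        # Explore/Plan with no override -> prints "opus"
pin_model "haiku"   # scout with model: haiku       -> prints "haiku"
```

Keeping the decision separate from the hook plumbing makes it easy to test on its own before wiring it into a session.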

What It Means For You — Concrete Impact

If you're on a Max plan, your biggest cost driver isn't the number of prompts—it's the model running each subtask. Claude Code's built-in subagents inherit the session model by default. On a Fable session, every grep, find, and boilerplate write bills at premium rates.

Muxer's approach mirrors the strategy we covered in "Claude Fable 5 in Claude Code: The Routing Strategy That Saves Your Weekly Limit" (2026-07-02). The difference: Muxer gives you hard guarantees via agent frontmatter rather than relying on prompt engineering.

The project also has rules for quality control:

Taste-critical work (UI, CSS, game feel) never goes below Opus regardless of cost hints.
The verifier is never a cheaper model than the builder it's judging.
Work that fails review twice at one tier gets redone a tier up with a fresh brief.
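
A minimal sketch of how these three rules could be written as shell checks. The function names are hypothetical, and the tier order (haiku below opus below fable) is an assumption drawn from the article's description rather than from Muxer's source.

```shell
#!/bin/bash
# Assumed tier order, cheapest to most expensive: haiku < opus < fable.
tier_rank() {
  case "$1" in
    haiku) echo 1 ;;
    opus)  echo 2 ;;
    fable) echo 3 ;;
    *)     echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Rule: the verifier is never cheaper than the builder it judges.
# Usage: verifier_ok BUILDER VERIFIER  (exit 0 if allowed, 1 if not, 2 on unknown model)
verifier_ok() {
  local builder_rank verifier_rank
  builder_rank=$(tier_rank "$1") || return 2
  verifier_rank=$(tier_rank "$2") || return 2
  [ "$verifier_rank" -ge "$builder_rank" ]
}

# Rule: two failed reviews at one tier move the work one tier up.
# Usage: next_model MODEL FAILURES
next_model() {
  local model="$1" failures="$2"
  if [ "$failures" -lt 2 ]; then
    echo "$model"
    return
  fi
  case "$model" in
    haiku) echo opus ;;
    opus)  echo fable ;;
    *)     echo "$model" ;;  # already at the top tier (or unknown): stay put
  esac
}

# Rule: taste-critical work never goes below Opus.
floor_for_taste() {
  local model="$1" rank
  rank=$(tier_rank "$model") || return 2
  if [ "$rank" -lt 2 ]; then echo opus; else echo "$model"; fi
}

verifier_ok opus haiku || echo "reject: haiku may not review opus work"
next_model haiku 2      # prints "opus"
floor_for_taste haiku   # prints "opus"
```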

Try It Now — Commands, Config, and Prompts

1. Clone and Set Up

git clone https://github.com/DangerousYams/muxer.git
cd muxer
# Copy agents and hook scripts into your user-level Claude Code directory
mkdir -p ~/.claude/agents ~/.claude/hooks
cp -r agents/* ~/.claude/agents/
cp scripts/* ~/.claude/hooks/

2. Configure Agent Frontmatter

Create agents/scout.md:

---
name: scout
description: Explores the codebase and gathers information.
model: haiku
---
You are a scout. Your job is to explore the codebase and gather information. Be fast and concise.

Create agents/builder.md:

---
name: builder
description: Implements features from detailed briefs.
model: opus
---
You are a builder. You implement features from detailed briefs. Never accept under-scoped briefs.
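
Because the routing guarantee depends on every agent file carrying a model: line, a quick audit can be useful. The sketch below is an assumption-laden helper, not part of Muxer: `agent_model` and `check_agents` are hypothetical names, and the parser only handles the simple frontmatter layout shown above (a leading `---` line, `key: value` pairs, a closing `---`).

```shell
#!/bin/bash
# agent_model FILE: print the value of the first "model:" line inside the
# leading "---" frontmatter block; print nothing if there is none.
agent_model() {
  awk '
    NR == 1 && $0 != "---" { exit }   # file does not start with frontmatter
    NR > 1  && $0 == "---" { exit }   # reached the end of the frontmatter
    /^model:[[:space:]]*/ {
      sub(/^model:[[:space:]]*/, "")
      print
      exit
    }
  ' "$1"
}

# check_agents DIR: list each agent file with its pinned model and
# return non-zero if any file has no model: line.
check_agents() {
  local dir="$1" file model status=0
  for file in "$dir"/*.md; do
    [ -e "$file" ] || continue        # directory has no .md files
    model=$(agent_model "$file")
    if [ -z "$model" ]; then
      echo "UNPINNED $file"
      status=1
    else
      echo "$model $file"
    fi
  done
  return "$status"
}
```

Running `check_agents ~/.claude/agents` after the copy step would flag any agent that would otherwise fall back to the session model.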

3. Install Session Hook

In ~/.claude/hooks/session-start.sh:

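#!/bin/bash
# If session model is Fable, inject delegation policy.
# CLAUDE_SESSION_MODEL is used as given in the source; check how your
# Claude Code version exposes the session model to hooks before relying on it.
if [ "${CLAUDE_SESSION_MODEL:-}" = "fable" ]; then
  echo "Policy: Delegate all file operations to haiku. Escalate complex decisions to opus."
fi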
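
The hook above covers only the Fable case. The description of the SessionStart hook earlier also mentions an escalation path on cheaper models, so a two-branch version might look like the sketch below. The policy wording and the `session_policy` function name are illustrative assumptions, and `CLAUDE_SESSION_MODEL` is carried over from the snippet above without confirmation that Claude Code sets it.

```shell
#!/bin/bash
# Hypothetical two-branch policy, not the real scripts/session-policy.sh.
# session_policy MODEL: print the routing policy text for the given session model.
session_policy() {
  local model="${1:-}"
  if [ "$model" = "fable" ]; then
    # Premium session: keep the orchestrator out of routine work.
    echo "Policy: keep the main loop lean. Delegate file operations to haiku."
  else
    # Cheaper session: offer a way up for decisions it cannot settle.
    echo "Policy: escalate decisions you cannot settle to fable via muxer:oracle."
  fi
}

# CLAUDE_SESSION_MODEL is assumed, as in the snippet above.
session_policy "${CLAUDE_SESSION_MODEL:-}"
```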

4. Run and Watch Savings

Start a Claude Code session on Fable. Muxer prints $ saved after each task. You'll see the scout run on Haiku, the builder on Opus, and Fable only for planning and review.

Why It Works

Claude Code picks a model for a subtask at the moment the orchestrator spawns it. Muxer leans on that decision from three directions: agent frontmatter fixes the model for each named agent, the session hook sets the routing policy, and the guard catches subagents spawned without an explicit model override. With all three layers in place you don't have to rely on prompt engineering alone; the routing is enforced by configuration.

When To Use It

Max plan users — Your biggest cost is Fable/Opus subtasks. Muxer cuts that.
Multi-model workflows — You want to route specific work to Gemini or OpenAI Codex via their CLIs.
Quality-sensitive projects — The review escalation rules ensure cheap models don't produce garbage.

Caveats

Muxer is new (2 points, 0 comments on HN as of writing). The guard script approach is experimental. Test on a small project before rolling out to production. Also, as we noted in "How Navan's MCP Server Cuts Travel Booking from 8 Steps to 1 Command" (2026-07-02), any tool that modifies session behavior can introduce unexpected interactions—monitor your first few sessions closely.

Source: gentic.news · 1d ago · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

Claude Code users should immediately adopt Muxer's agent frontmatter pattern to enforce model routing per subtask. The key insight is that Claude Code's subagents inherit the session model by default, which is expensive. By adding a `model:` line to every agent file, you get hard guarantees without relying on prompt engineering. Start by defining a scout agent (Haiku), a builder agent (Opus), and a reviewer agent (Opus).

Second, implement the PreToolUse guard to catch subagents that spawn without explicit model overrides. This prevents the Explore and Plan subagents from billing at premium rates during routine file operations. The guard pattern is simple: check if `$MODEL` is set; if not, pin to `opus`.

Third, adopt the review escalation rules: never let a cheaper model judge a more expensive model's work, and escalate after two failures. This prevents the failure mode where a cheap model builds something visually off and an equally cheap reviewer can't see what's wrong. The project prints `$ saved` after each task, so you can quantify the impact immediately.
