gentic.news — AI News Intelligence Platform


[Image: Developer at a computer terminal configures a local proxy tool that monitors Claude Code's API rate limits…]
Open Source · Score: 78

Claude Code quota proxy exposes unified Opus/Sonnet pool

A developer's proxy makes Claude Code usage-aware by intercepting hidden rate limit headers. Sonnet and Opus share one quota pool despite separate UI bars.

18h ago · 3 min read · 8 views · AI-Generated
Source: reddit.com via reddit_claude, hn_claude_code, medium_claude, devto_claudecode · Multi-Source
TL;DR

Proxy intercepts rate limit headers Claude Code hides · Sonnet and Opus drain the same quota bucket · Open-source tool adds usage awareness to Claude Code

A developer known as Inertia-UK built a local HTTP proxy that makes Claude Code aware of its own usage limits. The proxy intercepts Anthropic's rate limit headers, revealing that Sonnet and Opus share a single quota pool despite separate UI bars.

Key facts

  • Proxy intercepts anthropic-ratelimit-unified-5h-utilization and 7d headers
  • No per-model headers; Sonnet and Opus share one pool
  • GitHub issue #57050 confirms Sonnet bucket never shipped
  • Proxy writes status to ~/.claude/usage-status.md
  • Zero npm dependencies, plain Node.js stdlib

Claude Code has no idea how much quota it's burned. You can see usage bars in the UI, but the model itself is completely blind to them. There's no API, no tool, no hook that exposes the current rate limit state during a conversation, according to the Reddit post.

Anthropic returns rate limit headers on every inference response (anthropic-ratelimit-unified-5h-utilization, anthropic-ratelimit-unified-7d-utilization, etc.) — Claude Code receives them internally to render the UI bars, but never passes them anywhere the model can see.
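Reading those headers is straightforward once a proxy can see the raw response. A minimal sketch in plain Node.js, matching the tool's zero-dependency approach — note that treating the header values as numeric utilization percentages is an assumption, since the article doesn't show their exact wire format:

```javascript
// Extract Anthropic's unified rate-limit headers from a response.
// Header names come from the article; parsing the values as numeric
// percentages is an assumption. Node.js lower-cases incoming header
// names, so lookups use lowercase keys.
function readUtilization(headers) {
  const pct = (name) => {
    const raw = headers[name];
    return raw === undefined ? null : Number.parseFloat(raw);
  };
  return {
    fiveHour: pct("anthropic-ratelimit-unified-5h-utilization"),
    sevenDay: pct("anthropic-ratelimit-unified-7d-utilization"),
  };
}
```

In an `http.request` callback, `res.headers` can be passed to this function directly.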

The proxy sits between Claude Code and api.anthropic.com, routing traffic by setting ANTHROPIC_BASE_URL to http://127.0.0.1:4080. It intercepts response headers and writes a one-line status file to ~/.claude/usage-status.md:

5h=9% 7d=99%! overage=0% bottleneck=seven_day (10/05/2026, 16:19:04)
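A status line in that shape can be produced by a small formatter. This is an illustrative sketch rather than the tool's actual code: the `!` marker on a window above 90% and the bottleneck rule (whichever window is more utilized) are assumptions inferred from the example line above.

```javascript
// Format a one-line status string like:
//   5h=9% 7d=99%! overage=0% bottleneck=seven_day (10/05/2026, 16:19:04)
// Thresholds and field meanings are inferred from the article's example.
function formatStatus(fiveHour, sevenDay, overage, now = new Date()) {
  const mark = (v) => `${v}%` + (v >= 90 ? "!" : "");
  const bottleneck = sevenDay >= fiveHour ? "seven_day" : "five_hour";
  const ts = now.toLocaleString("en-GB");
  return `5h=${mark(fiveHour)} 7d=${mark(sevenDay)} overage=${overage}% ` +
         `bottleneck=${bottleneck} (${ts})`;
}
```

The tool then presumably writes this string to `~/.claude/usage-status.md` with something like `fs.writeFileSync` after each intercepted response.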

Claude can read that file on demand or via a UserPromptSubmit hook. With a rule in CLAUDE.md, Claude can warn before large tasks near the limit, switch to lightweight mode above 90%, or refuse new work at 98%.
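The thresholds in that rule amount to a simple policy check. A hypothetical sketch — the function name and return values are illustrative, not taken from the tool:

```javascript
// Map a 7-day utilization percentage to an action, mirroring the
// CLAUDE.md rule described above (90% and 98% thresholds from the article).
function quotaPolicy(sevenDayPct) {
  if (sevenDayPct >= 98) return "refuse";      // refuse new work
  if (sevenDayPct >= 90) return "lightweight"; // switch to lightweight mode
  return "normal";                             // proceed as usual
}
```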

The interesting discovery: while testing, the developer dumped every anthropic-ratelimit-* header from both Opus and Sonnet requests. There are no per-model headers — one unified pool covers everything. The separate Sonnet usage bar in the Claude Code UI doesn't reflect a real separate limit. According to GitHub issue #57050, Anthropic intended to give Sonnet its own bucket (announced Nov 2025) but the backend never shipped it. Using Sonnet drains the same unified pool as Opus.

This only works with Claude Code (the CLI). The web chat and browser extension make requests through Anthropic's own infrastructure, so there's no local proxy to intercept.

Key Takeaways

  • A developer's proxy makes Claude Code usage-aware by intercepting hidden rate limit headers.
  • Sonnet and Opus share one quota pool despite separate UI bars.

What to watch


Watch for Anthropic's response to GitHub issue #57050 — whether the promised separate Sonnet quota bucket ever ships, or if the unified pool becomes an official feature. Also watch for Anthropic adding a native usage-status tool or API endpoint to Claude Code, which would render this proxy obsolete.


Sources cited in this article

  1. GitHub (issue #57050)
  2. Proxy (the developer's open-source tool)

AI-assisted reporting. Generated by gentic.news from 2 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

This is a classic example of the AI industry's UX gap: models are powerful but completely blind to their own operational constraints. Anthropic ships rate limit data in every response but deliberately walls it off from the model — likely to prevent adversarial manipulation or quota gaming. The proxy sidesteps that design choice entirely.

The unified pool discovery is the real story. Anthropic promised Sonnet would have its own quota bucket in November 2025, but the backend never shipped it. This means users who switch to Sonnet to preserve Opus quota are burning the same resource. It's a silent UX failure that undermines trust in the pricing model.

This pattern mirrors the early days of cloud cost monitoring — developers building their own tools because vendors won't expose the data natively. Expect Anthropic to either ship a native usage-awareness tool or acquire this pattern into the product.