gentic.news — AI News Intelligence Platform


[Image: Developer at a computer terminal configures a local proxy tool that monitors Claude Code's API rate limits…]
Open Source · Score: 78

Claude Code quota proxy exposes unified Opus/Sonnet pool

A developer's proxy makes Claude Code usage-aware by intercepting hidden rate limit headers. Sonnet and Opus share one quota pool despite separate UI bars.

18h ago · 3 min read · 8 views · AI-Generated
Source: reddit.com via reddit_claude, hn_claude_code, medium_claude, devto_claudecode · Multi-Source
TL;DR

Proxy intercepts rate limit headers Claude Code hides · Sonnet and Opus drain the same quota bucket · Open-source tool adds usage awareness to Claude Code

A developer known as Inertia-UK built a local HTTP proxy that makes Claude Code aware of its own usage limits. The proxy intercepts Anthropic's rate limit headers, revealing that Sonnet and Opus share a single quota pool despite separate UI bars.

Key facts

  • Proxy intercepts anthropic-ratelimit-unified-5h-utilization and 7d headers
  • No per-model headers; Sonnet and Opus share one pool
  • GitHub issue #57050 confirms Sonnet bucket never shipped
  • Proxy writes status to ~/.claude/usage-status.md
  • Zero npm dependencies, plain Node.js stdlib

Claude Code has no idea how much quota it's burned. You can see usage bars in the UI, but the model itself is completely blind to them. There's no API, no tool, no hook that exposes the current rate limit state during a conversation, according to the Reddit post.

Anthropic returns rate limit headers on every inference response (anthropic-ratelimit-unified-5h-utilization, anthropic-ratelimit-unified-7d-utilization, etc.) — Claude Code receives them internally to render the UI bars, but never passes them anywhere the model can see.
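Reading those headers is straightforward once a proxy can see the raw response. A minimal sketch in plain Node.js, matching the tool's zero-dependency approach — note that treating the header values as numeric utilization percentages is an assumption, since the article doesn't show their exact wire format:

```javascript
// Extract Anthropic's unified rate-limit headers from a response.
// Header names come from the article; parsing the values as numeric
// percentages is an assumption. Node.js lower-cases incoming header
// names, so lookups use lowercase keys.
function readUtilization(headers) {
  const pct = (name) => {
    const raw = headers[name];
    return raw === undefined ? null : Number.parseFloat(raw);
  };
  return {
    fiveHour: pct("anthropic-ratelimit-unified-5h-utilization"),
    sevenDay: pct("anthropic-ratelimit-unified-7d-utilization"),
  };
}
```

In an `http.request` callback, `res.headers` can be passed to this function directly.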

The proxy sits between Claude Code and api.anthropic.com, routing traffic by setting ANTHROPIC_BASE_URL to http://127.0.0.1:4080. It intercepts response headers and writes a one-line status file to ~/.claude/usage-status.md:

5h=9% 7d=99%! overage=0% bottleneck=seven_day (10/05/2026, 16:19:04)
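A status line in that shape can be produced by a small formatter. This is an illustrative sketch rather than the tool's actual code: the `!` marker on a window above 90% and the bottleneck rule (whichever window is more utilized) are assumptions inferred from the example line above.

```javascript
// Format a one-line status string like:
//   5h=9% 7d=99%! overage=0% bottleneck=seven_day (10/05/2026, 16:19:04)
// Thresholds and field meanings are inferred from the article's example.
function formatStatus(fiveHour, sevenDay, overage, now = new Date()) {
  const mark = (v) => `${v}%` + (v >= 90 ? "!" : "");
  const bottleneck = sevenDay >= fiveHour ? "seven_day" : "five_hour";
  const ts = now.toLocaleString("en-GB");
  return `5h=${mark(fiveHour)} 7d=${mark(sevenDay)} overage=${overage}% ` +
         `bottleneck=${bottleneck} (${ts})`;
}
```

The tool then presumably writes this string to `~/.claude/usage-status.md` with something like `fs.writeFileSync` after each intercepted response.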

Claude can read that file on demand or via a UserPromptSubmit hook. With a rule in CLAUDE.md, Claude can warn before large tasks near the limit, switch to lightweight mode above 90%, or refuse new work at 98%.
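The thresholds in that rule amount to a simple policy check. A hypothetical sketch — the function name and return values are illustrative, not taken from the tool:

```javascript
// Map a 7-day utilization percentage to an action, mirroring the
// CLAUDE.md rule described above (90% and 98% thresholds from the article).
function quotaPolicy(sevenDayPct) {
  if (sevenDayPct >= 98) return "refuse";      // refuse new work
  if (sevenDayPct >= 90) return "lightweight"; // switch to lightweight mode
  return "normal";                             // proceed as usual
}
```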

The interesting discovery: while testing, the developer dumped every anthropic-ratelimit-* header from both Opus and Sonnet requests. There are no per-model headers — one unified pool covers everything. The separate Sonnet usage bar in the Claude Code UI doesn't reflect a real separate limit. According to GitHub issue #57050, Anthropic intended to give Sonnet its own bucket (announced Nov 2025) but the backend never shipped it. Using Sonnet drains the same unified pool as Opus.

This only works with Claude Code (the CLI). The web chat and browser extension make requests through Anthropic's own infrastructure, so there's no local proxy to intercept.

Key Takeaways

  • A developer's proxy makes Claude Code usage-aware by intercepting hidden rate limit headers.
  • Sonnet and Opus share one quota pool despite separate UI bars.

What to watch


Watch for Anthropic's response to GitHub issue #57050 — whether the promised separate Sonnet quota bucket ever ships, or if the unified pool becomes an official feature. Also watch for Anthropic adding a native usage-status tool or API endpoint to Claude Code, which would render this proxy obsolete.


Sources cited in this article

  1. GitHub (issue #57050)
  2. Proxy (the developer's open-source tool)

AI-assisted reporting. Generated by gentic.news from 2 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

This is a classic example of the AI industry's UX gap: models are powerful but completely blind to their own operational constraints. Anthropic ships rate limit data in every response but deliberately walls it off from the model — likely to prevent adversarial manipulation or quota gaming. The proxy sidesteps that design choice entirely.

The unified pool discovery is the real story. Anthropic promised Sonnet would have its own quota bucket in November 2025, but the backend never shipped it. This means users who switch to Sonnet to preserve Opus quota are burning the same resource. It's a silent UX failure that undermines trust in the pricing model.

This pattern mirrors the early days of cloud cost monitoring — developers building their own tools because vendors won't expose the data natively. Expect Anthropic to either ship a native usage-awareness tool or acquire this pattern into the product.