The Problem: Your Claude Code Agent Is Reading Everything, Every Time
A developer tracked every token consumed by Claude Code on a real project (FastAPI, ~800 files) across 42 executions. The pattern was consistent: every prompt triggered a file exploration spree.
- Glob pattern `*` — find all files
- Glob pattern `**/*.{py,js,ts,...}` — find code files
- Read file 1, file 2, file 3... 20+ times
- Finally start thinking about the actual question
Average per prompt:
- 23 tool calls (Read/Grep/Glob)
- ~180,000 tokens consumed
- ~50,000 tokens actually relevant
- 70% waste rate
That 70% is why you're hitting Anthropic's tighter usage limits. You're not asking too many questions—your agent is reading too many files on every single prompt.
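The waste rate follows directly from the averages above. A quick sanity check (pure arithmetic on the reported per-prompt figures):

```python
# Waste rate from the per-prompt averages: tokens consumed vs. tokens relevant
consumed = 180_000   # average tokens consumed per prompt
relevant = 50_000    # tokens actually relevant to the question

waste_rate = (consumed - relevant) / consumed
print(f"{waste_rate:.0%}")  # prints "72%", reported as ~70%
```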
Why This Happens: No Codebase Map
AI coding agents lack a pre-built map of your codebase. They don't know which files are relevant before they start reading, so they default to exploration. Unlike a human developer who reads the codebase once, your AI agent reads it on every prompt.
It gets worse with session length. By turn 15, each prompt re-processes your full conversation history plus all those codebase reads, so per-prompt cost keeps climbing with every turn and total session cost grows roughly quadratically.
5 Free Fixes You Can Apply Today
These techniques reduce waste by 20-30% immediately:
1. Scope Your Prompts Precisely
// Instead of:
Fix the auth error
// Use:
Fix the auth error in src/auth/login.ts
The first triggers 20+ file reads. The second triggers 3-5.
2. Use Short Sessions Per Task
Start a new Claude Code session for each discrete task. Don't chain 15 different requests in one conversation.
3. Manually Use /compact Before Context Bloat
Don't wait for auto-compaction at 167K tokens. When you notice the conversation getting long, run:
/compact
4. Audit Your MCP Servers
Every loaded MCP server adds token overhead on every prompt, even when unused. Remove unnecessary servers from your claude_desktop_config.json.
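For illustration, a trimmed config might look like this. This is a hypothetical sketch, not your actual config: keep only the servers you actively use under `mcpServers` and delete the rest.

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```

Every entry you remove here is token overhead that stops being injected into each prompt.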
5. Use /model opusplan for Complex Tasks
Let Opus plan the approach, then Sonnet implements. This often reduces unnecessary exploration.
The Structural Fix: vexp MCP Server
The developer built vexp—a local context engine MCP server that pre-indexes your project and serves only relevant code per query.
What it does:
- Rust binary with tree-sitter AST parsing
- Builds dependency graphs
- Stores index in SQLite locally
- Your code never leaves your machine
Results on the FastAPI benchmark:
- Tool calls/task: -90% (23 → 2.3)
- Cost/task: -58% ($0.78 → $0.33)
- Output tokens: -63% (504 → 189)
- Task duration: -22% (170s → 132s)
Total across 42 runs: $16.29 without vexp, $6.89 with — a saving of $9.40, or roughly 58%.
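The percentage deltas above can be recomputed from the raw before/after values — this is purely arithmetic on the figures the benchmark reports:

```python
# Recompute each reduction from the raw before/after benchmark values
metrics = {
    "tool calls/task": (23, 2.3),
    "cost/task ($)":   (0.78, 0.33),
    "output tokens":   (504, 189),
    "duration (s)":    (170, 132),
}

for name, (before, after) in metrics.items():
    reduction = (before - after) / before
    print(f"{name}: -{reduction:.1%}")

# Total across 42 runs
savings = 16.29 - 6.89
print(f"total saved: ${savings:.2f}")  # prints "total saved: $9.40"
```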
Installation:
# Install the MCP server
npm install -g @vexp/mcp

# Add to your Claude Desktop config
{
  "mcpServers": {
    "vexp": {
      "command": "npx",
      "args": ["@vexp/mcp"],
      "env": {
        "VEXP_PROJECT_ROOT": "/path/to/your/project"
      }
    }
  }
}
When to use it:
- Projects with 50+ files
- When working in the same codebase for multiple sessions
- Before complex refactors or bug fixes
- When hitting usage limits frequently
Quality Actually Improves
On SWE-bench Verified (100 real GitHub bugs):
- 73% pass rate (highest in lineup)
- $0.67/task vs $1.98 average
- 8 bugs solved only by vexp
Same model, same budget—only context quality changed. Focused input leads to focused responses.
What This Means for the Usage Limits Debate
While everyone argues about whether Anthropic should raise limits or lower prices, both miss the architectural issue. AI coding agents compensate for not knowing your codebase by reading everything. You pay for that compensation with tokens.
Cheaper tokens help. Higher limits help. But reducing what goes into the context window is the only fix that works regardless of Anthropic's pricing or limit decisions.