Stop Claude Code's 70% Token Waste: 5 Free Fixes + The MCP That Cuts Costs 58%


Claude Code reads 20+ files per prompt—70% waste. Use specific prompts, short sessions, /compact, and the vexp MCP to slash token usage immediately.

Gala Smith & AI Research Desk · 4 min read · AI-Generated
Source: dev.to via devto_claudecode, devto_mcp (corroborated)

The Problem: Your Claude Code Agent Is Reading Everything, Every Time

A developer tracked every token consumed by Claude Code on a real project (FastAPI, ~800 files) across 42 executions. The pattern was consistent: every prompt triggered a file exploration spree.

  1. Glob pattern * — find all files
  2. Glob pattern **/*.{py,js,ts,...} — find code files
  3. Read file 1, file 2, file 3... 20+ times
  4. Finally start thinking about the actual question

Average per prompt:

  • 23 tool calls (Read/Grep/Glob)
  • ~180,000 tokens consumed
  • ~50,000 tokens actually relevant
  • 70% waste rate
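The headline figure is just the ratio of irrelevant to total tokens; from the raw averages it actually works out to about 72%, rounded down to "70%" in the write-up:

```python
# Waste rate implied by the measured per-prompt averages.
total_tokens = 180_000     # tokens consumed per prompt, on average
relevant_tokens = 50_000   # tokens actually relevant to the question
waste_rate = (total_tokens - relevant_tokens) / total_tokens
print(f"waste: {waste_rate:.1%}")  # 72.2%, quoted as "70%" above
```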

That 70% is why you're hitting Anthropic's tighter usage limits. You're not asking too many questions—your agent is reading too many files on every single prompt.

Why This Happens: No Codebase Map

AI coding agents lack a pre-built map of your codebase. They don't know which files are relevant before they start reading, so they default to exploration. Unlike a human developer who reads the codebase once, your AI agent reads it on every prompt.

It gets worse with session length. By turn 15, each prompt re-processes your full conversation history plus all those codebase reads. Because every turn carries all the earlier turns with it, the cost per prompt keeps climbing, and the total cost of a session grows roughly quadratically with its length.

5 Free Fixes You Can Apply Today

These techniques reduce waste by 20-30% immediately:

1. Scope Your Prompts Precisely

// Instead of:
Fix the auth error

// Use:
Fix the auth error in src/auth/login.ts

The first triggers 20+ file reads. The second triggers 3-5.

2. Use Short Sessions Per Task
Start a new Claude Code session for each discrete task. Don't chain 15 different requests in one conversation.

3. Manually Use /compact Before Context Bloat
Don't wait for auto-compaction at 167K tokens. When you notice the conversation getting long, run:

/compact

4. Audit Your MCP Servers
Every loaded MCP server adds token overhead on every prompt, even when unused. Remove unnecessary servers from your claude_desktop_config.json.
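As a sketch, a config trimmed down to the one server you actually use might look like the following. The filesystem server shown here is just an example (its package name and args are Anthropic's documented defaults, not something this article prescribes); keep whichever servers you genuinely rely on:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```

Every entry you delete removes that server's tool schemas from the overhead added to each prompt.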

5. Use /model opusplan for Complex Tasks
Let Opus plan the approach, then have Sonnet implement it. This often reduces unnecessary exploration.

The Structural Fix: vexp MCP Server

The developer built vexp—a local context engine MCP server that pre-indexes your project and serves only relevant code per query.

What it does:

  • Rust binary with tree-sitter AST parsing
  • Builds dependency graphs
  • Stores index in SQLite locally
  • Your code never leaves your machine

Results on the FastAPI benchmark:

  • Tool calls/task: -90% (23 → 2.3)
  • Cost/task: -58% ($0.78 → $0.33)
  • Output tokens: -63% (504 → 189)
  • Task duration: -22% (170s → 132s)

Total across 42 runs: $16.29 without vexp, $6.89 with.
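Assuming the per-task figures are means over the benchmark runs, the quoted reductions follow directly from before/after pairs, and the 42-run totals land on the same 58%:

```python
# Percent reduction for each metric: (before - after) / before.
def reduction(before: float, after: float) -> float:
    return (before - after) / before

metrics = {
    "tool calls":    (23, 2.3),
    "cost ($)":      (0.78, 0.33),
    "output tokens": (504, 189),
    "duration (s)":  (170, 132),
}
for name, (before, after) in metrics.items():
    print(f"{name}: -{reduction(before, after):.1%}")
print(f"42-run total: -{reduction(16.29, 6.89):.1%}")
```

The raw values are 90.0%, 57.7%, 62.5%, and 22.4%, which round to the -90%, -58%, -63%, and -22% quoted above; the total-cost drop ($16.29 to $6.89) is likewise -57.7%.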

Installation:

# Install the MCP server
npm install -g @vexp/mcp

# Add to your Claude Desktop config
{
  "mcpServers": {
    "vexp": {
      "command": "npx",
      "args": ["@vexp/mcp"],
      "env": {
        "VEXP_PROJECT_ROOT": "/path/to/your/project"
      }
    }
  }
}

When to use it:

  • Projects with 50+ files
  • When working in the same codebase for multiple sessions
  • Before complex refactors or bug fixes
  • When hitting usage limits frequently

Quality Actually Improves

On SWE-bench Verified (100 real GitHub bugs):

  • 73% pass rate (highest in the lineup)
  • $0.67/task vs $1.98 average
  • 8 bugs only vexp solved

Same model, same budget—only context quality changed. Focused input leads to focused responses.

What This Means for the Usage Limits Debate

While everyone argues about whether Anthropic should raise limits or lower prices, both miss the architectural issue. AI coding agents compensate for not knowing your codebase by reading everything. You pay for that compensation with tokens.

Cheaper tokens help. Higher limits help. But reducing what goes into the context window is the only fix that works regardless of Anthropic's pricing or limit decisions.

AI Analysis

**Immediate Actions for Claude Code Users:**

  1. **Change your prompting today.** Start every request with a file path when possible. "Add validation to the User model in `models/user.py`" is 4x more efficient than "Add validation to the User model."
  2. **Reset sessions aggressively.** After completing a feature or bug fix, type `/new` and start fresh. The compounding cost growth means three 5-turn sessions are cheaper than one 15-turn session.
  3. **Install vexp for your main project.** The setup takes 5 minutes. For any codebase larger than a few dozen files, the 58% cost reduction pays for the setup time in your first hour of use. This follows Anthropic's broader push into more efficient agent architectures, as seen with the recent Claude Code Auto Mode release in March 2026.
  4. **Monitor your MCPs like dependencies.** Every unused MCP server is a tax on every prompt. Review your `claude_desktop_config.json` monthly and remove what you don't actively use. This aligns with our March 28 article "72% of MCP Servers Have Critical Input Sanitization Flaws"—both security and efficiency demand MCP hygiene.

**The Bigger Picture:** This token waste issue explains why Anthropic has been investing in Claude Agent frameworks and context optimization. As Claude Code usage has appeared in 167 articles this week alone (total: 400), efficiency becomes critical for scaling. The vexp approach essentially brings Retrieval-Augmented Generation (RAG) principles to local codebases—a natural evolution given RAG's 70+ mentions in our coverage.