Stop Claude Code's 70% Token Waste: 5 Free Fixes + The MCP That Cuts Costs 58%


Claude Code reads 20+ files per prompt—70% waste. Use specific prompts, short sessions, /compact, and the vexp MCP to slash token usage immediately.

Gala Smith & AI Research Desk · 4 min read · AI-Generated
Source: dev.to via devto_claudecode, devto_mcp (corroborated)

The Problem: Your Claude Code Agent Is Reading Everything, Every Time

A developer tracked every token consumed by Claude Code on a real project (FastAPI, ~800 files) across 42 executions. The pattern was consistent: every prompt triggered a file exploration spree.

  1. Glob pattern * — find all files
  2. Glob pattern **/*.{py,js,ts,...} — find code files
  3. Read file 1, file 2, file 3... 20+ times
  4. Finally start thinking about the actual question

Average per prompt:

  • 23 tool calls (Read/Grep/Glob)
  • ~180,000 tokens consumed
  • ~50,000 tokens actually relevant
  • 70% waste rate
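The headline figure is just the ratio of irrelevant to total tokens; from the raw averages it actually works out to about 72%, rounded down to "70%" in the write-up:

```python
# Waste rate implied by the measured per-prompt averages.
total_tokens = 180_000     # tokens consumed per prompt, on average
relevant_tokens = 50_000   # tokens actually relevant to the question
waste_rate = (total_tokens - relevant_tokens) / total_tokens
print(f"waste: {waste_rate:.1%}")  # 72.2%, quoted as "70%" above
```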

That 70% is why you're hitting Anthropic's tighter usage limits. You're not asking too many questions—your agent is reading too many files on every single prompt.

Why This Happens: No Codebase Map

AI coding agents lack a pre-built map of your codebase. They don't know which files are relevant before they start reading, so they default to exploration. Unlike a human developer who reads the codebase once, your AI agent reads it on every prompt.

It gets worse with session length. By turn 15, each prompt re-processes your full conversation history plus all those codebase reads. Because every turn carries all the earlier turns with it, the cost per prompt keeps climbing, and the total cost of a session grows roughly quadratically with its length.

5 Free Fixes You Can Apply Today

These techniques reduce waste by 20-30% immediately:

1. Scope Your Prompts Precisely

// Instead of:
Fix the auth error

// Use:
Fix the auth error in src/auth/login.ts

The first triggers 20+ file reads. The second triggers 3-5.

2. Use Short Sessions Per Task
Start a new Claude Code session for each discrete task. Don't chain 15 different requests in one conversation.

3. Manually Use /compact Before Context Bloat
Don't wait for auto-compaction at 167K tokens. When you notice the conversation getting long, run:

/compact

4. Audit Your MCP Servers
Every loaded MCP server adds token overhead on every prompt, even when unused. Remove unnecessary servers from your claude_desktop_config.json.
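As a sketch, a config trimmed down to the one server you actually use might look like the following. The filesystem server shown here is just an example (its package name and args are Anthropic's documented defaults, not something this article prescribes); keep whichever servers you genuinely rely on:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```

Every entry you delete removes that server's tool schemas from the overhead added to each prompt.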

5. Use /model opusplan for Complex Tasks
Let Opus plan the approach, then have Sonnet implement it. This often reduces unnecessary exploration.

The Structural Fix: vexp MCP Server

The developer built vexp—a local context engine MCP server that pre-indexes your project and serves only relevant code per query.

What it does:

  • Rust binary with tree-sitter AST parsing
  • Builds dependency graphs
  • Stores index in SQLite locally
  • Your code never leaves your machine

Results on the FastAPI benchmark:

  • Tool calls/task: -90% (23 → 2.3)
  • Cost/task: -58% ($0.78 → $0.33)
  • Output tokens: -63% (504 → 189)
  • Task duration: -22% (170s → 132s)

Total across 42 runs: $16.29 without vexp, $6.89 with.
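Assuming the per-task figures are means over the benchmark runs, the quoted reductions follow directly from before/after pairs, and the 42-run totals land on the same 58%:

```python
# Percent reduction for each metric: (before - after) / before.
def reduction(before: float, after: float) -> float:
    return (before - after) / before

metrics = {
    "tool calls":    (23, 2.3),
    "cost ($)":      (0.78, 0.33),
    "output tokens": (504, 189),
    "duration (s)":  (170, 132),
}
for name, (before, after) in metrics.items():
    print(f"{name}: -{reduction(before, after):.1%}")
print(f"42-run total: -{reduction(16.29, 6.89):.1%}")
```

The raw values are 90.0%, 57.7%, 62.5%, and 22.4%, which round to the -90%, -58%, -63%, and -22% quoted above; the total-cost drop ($16.29 to $6.89) is likewise -57.7%.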

Installation:

# Install the MCP server
npm install -g @vexp/mcp

# Add to your Claude Desktop config
{
  "mcpServers": {
    "vexp": {
      "command": "npx",
      "args": ["@vexp/mcp"],
      "env": {
        "VEXP_PROJECT_ROOT": "/path/to/your/project"
      }
    }
  }
}

When to use it:

  • Projects with 50+ files
  • When working in the same codebase for multiple sessions
  • Before complex refactors or bug fixes
  • When hitting usage limits frequently

Quality Actually Improves

On SWE-bench Verified (100 real GitHub bugs):

  • 73% pass rate (highest in the lineup)
  • $0.67/task vs $1.98 average
  • 8 bugs only vexp solved

Same model, same budget—only context quality changed. Focused input leads to focused responses.

What This Means for the Usage Limits Debate

While everyone argues about whether Anthropic should raise limits or lower prices, both miss the architectural issue. AI coding agents compensate for not knowing your codebase by reading everything. You pay for that compensation with tokens.

Cheaper tokens help. Higher limits help. But reducing what goes into the context window is the only fix that works regardless of Anthropic's pricing or limit decisions.

AI Analysis

**Immediate Actions for Claude Code Users:**

  1. **Change your prompting today.** Start every request with a file path when possible. "Add validation to the User model in `models/user.py`" is 4x more efficient than "Add validation to the User model."
  2. **Reset sessions aggressively.** After completing a feature or bug fix, type `/new` and start fresh. The compounding cost growth means three 5-turn sessions are cheaper than one 15-turn session.
  3. **Install vexp for your main project.** The setup takes 5 minutes. For any codebase larger than a few dozen files, the 58% cost reduction pays for the setup time in your first hour of use. This follows Anthropic's broader push into more efficient agent architectures, as seen with the recent Claude Code Auto Mode release in March 2026.
  4. **Monitor your MCPs like dependencies.** Every unused MCP server is a tax on every prompt. Review your `claude_desktop_config.json` monthly and remove what you don't actively use. This aligns with our March 28 article "72% of MCP Servers Have Critical Input Sanitization Flaws"—both security and efficiency demand MCP hygiene.

**The Bigger Picture:** This token waste issue explains why Anthropic has been investing in Claude Agent frameworks and context optimization. As Claude Code usage has appeared in 167 articles this week alone (total: 400), efficiency becomes critical for scaling. The vexp approach essentially brings Retrieval-Augmented Generation (RAG) principles to local codebases—a natural evolution given RAG's 70+ mentions in our coverage.