How Claude Code's Tool Search Saves 90% of Your Context Window

Tool search automatically defers MCP tool definitions, replacing them with a single search tool that loads tools on-demand, preserving your context window for actual work.

GAla Smith & AI Research Desk·2h ago·4 min read·4 views·AI-Generated
Source: dev.to via devto_claudecode, reddit_claude · Corroborated

What Tool Search Actually Does

Every MCP tool you connect to Claude Code comes with a definition — name, description, and JSON schema — that costs 200-800 tokens. With multiple MCP servers (GitHub, Slack, Jira, etc.), you can easily burn 60,000+ tokens on tool definitions alone, every single turn, before Claude even reads your message.

Tool search solves this by deferring most tool definitions. Take a setup with 147 MCP tools: instead of sending all 147 tool schemas, Claude Code sends:

  • ~25 built-in tool definitions
  • A single ToolSearch tool
  • A prompt telling Claude: "147 deferred tools available — use ToolSearch to load them"

In this example, per-turn token overhead drops from ~90,000 to ~15,000 tokens, a reduction of roughly 83%.
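The figures above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, and the only inputs are the article's own numbers (147 tools, 200-800 tokens per definition, ~90,000 tokens before and ~15,000 after).

```python
def definition_overhead(tool_count: int, tokens_per_tool: int) -> int:
    """Tokens spent per turn on tool definitions alone."""
    return tool_count * tokens_per_tool


def reduction_percent(before: int, after: int) -> float:
    """Relative reduction in per-turn overhead, as a percentage."""
    return 100 * (before - after) / before


# 147 tools at the low and high end of the 200-800 token range.
low = definition_overhead(147, 200)
high = definition_overhead(147, 800)
print(low, high)  # 29400 117600

# The article's before/after figures.
print(round(reduction_percent(90_000, 15_000), 1))  # 83.3
```

The ~90,000-token figure sits inside the 29,400 to 117,600 range, which corresponds to an average of a little over 600 tokens per definition.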

How It Works In Practice

When Claude needs a specific tool, it calls:

ToolSearch({"query": "github create issue"})

The system returns a tool_reference for mcp__github__create_issue. On the next turn, that tool's full schema is available, and Claude can call it normally.
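The discovery flow can be illustrated with a toy registry. This is a sketch under stated assumptions, not Claude Code's implementation: the `DeferredRegistry` class, its methods, the naive keyword-overlap scoring, and the Slack tool name are all hypothetical and exist only to show how a deferred schema becomes visible after a search.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredRegistry:
    # Full schemas keyed by tool name; hidden from the model until discovered.
    deferred: dict
    loaded: set = field(default_factory=set)

    def tool_search(self, query: str, limit: int = 5) -> list:
        """Return names of deferred tools matching any word in the query."""
        words = query.lower().split()
        scored = []
        for name, schema in self.deferred.items():
            haystack = (name + " " + schema.get("description", "")).lower()
            score = sum(1 for w in words if w in haystack)
            if score:
                scored.append((score, name))
        # Best score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        matches = [name for _, name in scored[:limit]]
        self.loaded.update(matches)  # schemas become visible on the next turn
        return matches

    def visible_schemas(self) -> dict:
        """Schemas the model can call right now."""
        return {n: self.deferred[n] for n in sorted(self.loaded)}


registry = DeferredRegistry(deferred={
    "mcp__github__create_issue": {"description": "Create an issue in a GitHub repository"},
    "mcp__slack__post_message": {"description": "Post a message to a Slack channel"},  # hypothetical
})

refs = registry.tool_search("github create issue")
print(refs)  # ['mcp__github__create_issue']
print(list(registry.visible_schemas()))  # ['mcp__github__create_issue']
```

Only the matched tool's schema is loaded; the Slack tool stays deferred and costs nothing until a query surfaces it.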

The cost? One extra turn and ~200 tokens for each discovery.
The savings? ~75,000 tokens per turn, which adds up to ~1.5 million tokens over a 20-turn conversation.
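The cost-versus-savings tradeoff can be worked through as follows. The function name and parameters are hypothetical, and the model is deliberately simple: it ignores prompt caching and the tokens added by schemas once they are loaded, both of which would change real-world numbers.

```python
def cumulative_savings(before: int, after: int, turns: int,
                       discoveries: int = 0, discovery_cost: int = 200) -> int:
    """Net tokens saved over a conversation, minus ToolSearch discovery costs."""
    saved = (before - after) * turns
    spent = discoveries * discovery_cost
    return saved - spent


# The article's figures: 90,000 -> 15,000 tokens per turn, 20 turns.
print(cumulative_savings(90_000, 15_000, turns=20))  # 1500000

# Even with five discoveries at ~200 tokens each, the net barely moves.
print(cumulative_savings(90_000, 15_000, turns=20, discoveries=5))  # 1499000
```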

Which Tools Get Deferred?

The system uses a priority-ordered checklist:

  1. Explicit opt-out first: MCP tools can declare _meta['anthropic/alwaysLoad'] to force loading every turn
  2. MCP tools deferred by default: Most MCP tools are workflow-specific and numerous
  3. ToolSearch never deferred: It's the bootstrap mechanism
  4. Core communication tools never deferred: Agent, Brief — Claude needs these immediately
  5. Built-in tools with shouldDefer flag: Rarely used but available

This follows Anthropic's principle of "fail closed, fail toward asking." If anything is uncertain, the system loads all tools rather than hiding them.
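The priority order in the checklist can be read as a single decision function. This is a sketch of the rules as the article lists them; the `Tool` dataclass, its field names, and the example tool names other than ToolSearch, Agent, and Brief are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Tools the article says are never deferred.
NEVER_DEFER = {"ToolSearch", "Agent", "Brief"}


@dataclass
class Tool:
    name: str
    is_mcp: bool = False
    should_defer: bool = False  # stands in for the built-in shouldDefer flag
    meta: dict = field(default_factory=dict)


def is_deferred(tool: Tool) -> bool:
    """Apply the checklist in priority order; the first matching rule wins."""
    if tool.meta.get("anthropic/alwaysLoad"):  # 1. explicit opt-out
        return False
    if tool.is_mcp:                            # 2. MCP tools deferred by default
        return True
    if tool.name in NEVER_DEFER:               # 3-4. bootstrap and core tools
        return False
    return tool.should_defer                   # 5. flagged built-ins


print(is_deferred(Tool("mcp__github__create_issue", is_mcp=True)))  # True
print(is_deferred(Tool("mcp__db__query", is_mcp=True,
                       meta={"anthropic/alwaysLoad": True})))       # False
print(is_deferred(Tool("ToolSearch")))                              # False
```

Checking the opt-out before the MCP rule is what lets a single MCP tool escape deferral without changing the default for the rest of its server.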

Three Modes You Can Configure

Tool search operates in three modes controlled by ENABLE_TOOL_SEARCH:

Mode 1: tst (Default)

Always defer MCP and shouldDefer tools. This is the right default — if you're using MCP tools, you've already accepted the latency tradeoff for a larger effective context window.

Mode 2: tst-auto

Threshold-based deferral. Tools are deferred only when their definitions exceed a token budget. Set ENABLE_TOOL_SEARCH=auto to use the default threshold, or ENABLE_TOOL_SEARCH=auto:50 to set it explicitly; the number after the colon is a percentage from 1 to 99.

Mode 3: standard

Never defer. Use ENABLE_TOOL_SEARCH=false to disable completely.
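The mapping from environment variable to mode can be sketched as a small parser. Only the values named above (unset, auto, auto:N, false) come from the article. The function name is hypothetical, and the handling of out-of-range percentages and unrecognized values is an assumption; the real behavior may differ.

```python
import os
from typing import Optional, Tuple


def parse_tool_search_mode(value: Optional[str]) -> Tuple[str, Optional[int]]:
    """Map an ENABLE_TOOL_SEARCH value to (mode, threshold_percent)."""
    if value is None or value.strip() == "":
        return ("tst", None)          # unset: default mode
    v = value.strip().lower()
    if v == "false":
        return ("standard", None)     # never defer
    if v == "auto":
        return ("tst-auto", None)     # default threshold (not stated in the article)
    if v.startswith("auto:"):
        pct = v[len("auto:"):]
        if pct.isdecimal() and 1 <= int(pct) <= 99:
            return ("tst-auto", int(pct))
        return ("tst-auto", None)     # assumption: bad percentage falls back to default
    return ("tst", None)              # assumption: unrecognized values use the default


print(parse_tool_search_mode("auto:50"))  # ('tst-auto', 50)
print(parse_tool_search_mode("false"))    # ('standard', None)

# Inspect the value in the current environment.
print(parse_tool_search_mode(os.environ.get("ENABLE_TOOL_SEARCH")))
```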

The Snapshot Mechanism

Discovered tools are preserved across context compaction through a snapshot system. When compaction occurs, the system takes a snapshot of:

  • All currently loaded tools
  • The ToolSearch tool
  • The deferral state

This ensures Claude doesn't lose access to tools it's already discovered, even as the conversation context gets trimmed.
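One way to picture the snapshot is as an immutable record that travels alongside the trimmed history. The article does not describe the internals, so everything here (the `ToolSnapshot` type, `take_snapshot`, `compact`, and the message format) is a hypothetical sketch of the idea rather than the actual mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSnapshot:
    loaded_tools: frozenset    # tools already discovered, plus ToolSearch itself
    deferred_tools: frozenset  # deferral state: tools still hidden


def take_snapshot(loaded, deferred) -> ToolSnapshot:
    """Freeze the current tool state so compaction cannot alter it."""
    return ToolSnapshot(frozenset(loaded) | {"ToolSearch"}, frozenset(deferred))


def compact(messages: list, snapshot: ToolSnapshot, keep_last: int = 2):
    """Trim older messages but carry the tool snapshot forward unchanged."""
    kept = messages[-keep_last:] if keep_last > 0 else []
    return kept, snapshot


history = ["turn 1", "turn 2", "turn 3", "turn 4"]
snap = take_snapshot(
    loaded={"mcp__github__create_issue"},
    deferred={"mcp__slack__post_message"},  # hypothetical tool name
)

history, snap = compact(history, snap)
print(history)                    # ['turn 3', 'turn 4']
print(sorted(snap.loaded_tools))  # ['ToolSearch', 'mcp__github__create_issue']
```

The turn in which the GitHub tool was discovered may be trimmed away, but the tool remains loaded because the snapshot, not the message history, is the source of truth.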

What This Means For Your MCP Servers

If you're building MCP servers, consider:

  1. Mark critical tools with alwaysLoad: If a tool is needed on nearly every turn (like a primary database query), set _meta['anthropic/alwaysLoad'] on it to opt it out of deferral
  2. Write clear tool descriptions: Tool search uses semantic matching, so good descriptions improve discovery accuracy
  3. Group related tools: Tools with similar prefixes or descriptions will be discovered together
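A tool definition that follows the first two recommendations might look like the sketch below. The `name`, `description`, and `inputSchema` fields are standard in MCP tool definitions, and the `_meta` key is the one named earlier in the article. The tool itself, its description, and the assumption that the flag takes a boolean `true` are illustrative.

```python
import json

# Hypothetical tool: a primary database query needed on nearly every turn.
query_tool = {
    "name": "query_database",
    # A specific description with the keywords a search would naturally use.
    "description": (
        "Run a read-only SQL query against the primary application "
        "database and return the matching rows as JSON."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "sql": {"type": "string", "description": "The SELECT statement to run."},
        },
        "required": ["sql"],
    },
    # Opt out of deferral (assumed to be a boolean flag).
    "_meta": {"anthropic/alwaysLoad": True},
}

print(json.dumps(query_tool, indent=2))
```

Reserve the flag for tools that really are used on most turns; every always-loaded definition is paid for on every turn, which is the cost deferral exists to avoid.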

Try It Now

Check your current configuration (an unset variable means the default tst mode applies):

echo "${ENABLE_TOOL_SEARCH:-unset}"

If you're not seeing tool search benefits, ensure:

  1. You have MCP tools configured
  2. You're not in standard mode
  3. Your MCP tools aren't all marked alwaysLoad

For maximum savings with minimal latency impact, use the default tst mode. The one-turn discovery overhead is negligible compared to the context window preservation.

When To Use Each Mode

  • Default (tst): Most users — balances savings with discovery latency
  • tst-auto (ENABLE_TOOL_SEARCH=auto:20): If you're sensitive to latency but still want some savings
  • standard: Only if you have very few MCP tools or need every tool available immediately

This system represents a fundamental shift in how Claude Code handles tool ecosystems. Instead of paying the token cost for every possible tool upfront, you now pay only for what you actually use.

AI Analysis

Claude Code users should immediately check their tool search configuration and understand how it impacts their workflow. If you're using multiple MCP servers, ensure you're in `tst` mode (the default) to maximize context window efficiency. When building or using MCP servers, be strategic about which tools should always load versus which can be deferred. Critical communication tools and frequently used utilities should be marked with `alwaysLoad`, while specialized, rarely-used tools should remain deferred. Monitor your token usage before and after enabling tool search. The savings are most dramatic when you have 50+ MCP tools configured. If you notice Claude struggling to find tools, improve your tool descriptions to include clear keywords that match how you'd naturally search for them.
