What Tool Search Actually Does
Every MCP tool you connect to Claude Code comes with a definition — name, description, and JSON schema — that costs 200-800 tokens. With multiple MCP servers (GitHub, Slack, Jira, etc.), you can easily burn 60,000+ tokens on tool definitions alone, every single turn, before Claude even reads your message.
Tool search solves this by deferring most tool definitions. In an example setup with 147 deferrable tools, instead of sending all 147 schemas, Claude Code sends:
- ~25 built-in tool definitions
- A single ToolSearch tool
- A prompt telling Claude: "147 deferred tools available — use ToolSearch to load them"
In that example, this cuts the per-turn tool overhead from ~90,000 to ~15,000 tokens, starting with the first turn.
How It Works In Practice
When Claude needs a specific tool, it calls:
ToolSearch({"query": "github create issue"})
The system returns a tool_reference for mcp__github__create_issue. On the next turn, that tool's full schema is available, and Claude can call it normally.
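Here is a toy model of the round trip in Python. Everything in it is an assumption for illustration and does not reflect Claude Code's internals:

- the `DeferredRegistry` class;
- its keyword-overlap scoring, a crude stand-in for the semantic matching ToolSearch uses;
- the dict shape of the `tool_reference` results.

It does capture the two-step shape described above. A search returns references, and the referenced schemas only become usable on the following turn.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredRegistry:
    """Toy model of deferred tools and a ToolSearch-style lookup (hypothetical)."""
    deferred: dict[str, str]                          # tool name -> description
    loaded: set[str] = field(default_factory=set)     # schemas usable this turn
    pending: set[str] = field(default_factory=set)    # referenced, usable next turn

    def tool_search(self, query: str, limit: int = 3) -> list[dict]:
        """Rank deferred tools by keyword overlap and return tool references."""
        terms = set(query.lower().split())
        scored = []
        for name, description in self.deferred.items():
            words = set(name.lower().replace("_", " ").split())
            words |= set(description.lower().split())
            overlap = len(terms & words)
            if overlap:
                scored.append((overlap, name))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best match first
        refs = [{"type": "tool_reference", "name": name} for _, name in scored[:limit]]
        self.pending.update(ref["name"] for ref in refs)
        return refs

    def next_turn(self) -> None:
        """Schemas referenced on the previous turn become callable now."""
        self.loaded |= self.pending
        self.pending.clear()


registry = DeferredRegistry({
    "mcp__github__create_issue": "create a new issue in a github repository",
    "mcp__github__list_pulls": "list pull requests in a github repository",
    "mcp__slack__post_message": "post a message to a slack channel",
})
refs = registry.tool_search("github create issue")
print(refs[0]["name"])  # mcp__github__create_issue
registry.next_turn()    # on the following turn the schema is available
print("mcp__github__create_issue" in registry.loaded)  # True
```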
The cost? One extra turn and ~200 tokens for discovery.
The savings? ~1.5 million tokens over a 20-turn conversation (roughly 75,000 tokens saved per turn × 20 turns).
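The back-of-the-envelope arithmetic behind that figure uses the example's numbers. Like the estimate itself, it ignores the discovery turns and the per-turn cost of schemas once they are loaded. Treat the result as an upper bound on savings, not a measurement.

```python
# Approximate per-turn numbers from the example above.
FULL_OVERHEAD = 90_000      # every tool schema sent every turn
DEFERRED_OVERHEAD = 15_000  # built-ins + ToolSearch + the deferred-tools prompt
TURNS = 20

per_turn_savings = FULL_OVERHEAD - DEFERRED_OVERHEAD
total_savings = per_turn_savings * TURNS

print(f"{per_turn_savings:,} tokens saved per turn")        # 75,000 tokens saved per turn
print(f"{total_savings:,} tokens saved over {TURNS} turns")  # 1,500,000 tokens saved over 20 turns
```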
Which Tools Get Deferred?
The system uses a priority-ordered checklist:
- Explicit opt-out first: MCP tools can declare _meta['anthropic/alwaysLoad'] to force loading every turn
- MCP tools deferred by default: Most MCP tools are workflow-specific and numerous
- ToolSearch never deferred: It's the bootstrap mechanism
- Core communication tools never deferred: Agent and Brief, which Claude needs immediately
- Built-in tools deferred only with the shouldDefer flag: Rarely used, but still available through search
This follows Anthropic's principle of "fail closed, fail toward asking." If anything is uncertain, the system loads all tools rather than hiding them.
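The checklist reads naturally as a first-match-wins function. The sketch below is hypothetical. `ToolDef`, its fields, and the `True` value for `anthropic/alwaysLoad` are illustrative assumptions, not Claude Code's actual data model. It also models the fail-closed rule loosely: if any tool's metadata can't be read, the sketch loads every tool instead of guessing.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolDef:
    """Hypothetical tool descriptor; field names are illustrative."""
    name: str
    is_mcp: bool = False
    should_defer: bool = False                            # flag a built-in may set
    meta: dict[str, Any] = field(default_factory=dict)    # MCP `_meta`


CORE_TOOLS = {"Agent", "Brief"}  # never deferred, per the checklist


def is_deferred(tool: ToolDef) -> bool:
    """Walk the priority-ordered checklist; the first matching rule wins."""
    if tool.is_mcp and tool.meta.get("anthropic/alwaysLoad") is True:
        return False              # 1. explicit opt-out
    if tool.is_mcp:
        return True               # 2. MCP tools deferred by default
    if tool.name == "ToolSearch":
        return False              # 3. the bootstrap tool
    if tool.name in CORE_TOOLS:
        return False              # 4. core communication tools
    return tool.should_defer      # 5. built-ins only when flagged


def split_tools(tools: list[ToolDef]) -> tuple[list[str], list[str]]:
    """Return (loaded, deferred) tool names, failing closed on bad metadata."""
    try:
        flags = [(t.name, is_deferred(t)) for t in tools]
    except (AttributeError, TypeError):
        return [t.name for t in tools], []  # uncertain: hide nothing
    loaded = [name for name, d in flags if not d]
    deferred = [name for name, d in flags if d]
    return loaded, deferred


tools = [
    ToolDef("CommonBuiltin"),                        # hypothetical built-in
    ToolDef("ToolSearch"),
    ToolDef("Agent"),
    ToolDef("RareBuiltin", should_defer=True),       # hypothetical flagged built-in
    ToolDef("mcp__github__create_issue", is_mcp=True),
    ToolDef("mcp__db__query", is_mcp=True, meta={"anthropic/alwaysLoad": True}),
]
loaded, deferred = split_tools(tools)
print(deferred)  # ['RareBuiltin', 'mcp__github__create_issue']
```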
Three Modes You Can Configure
Tool search operates in three modes controlled by ENABLE_TOOL_SEARCH:
Mode 1: tst (Default)
Always defer MCP and shouldDefer tools. This is the right default for most MCP users. You accept a one-turn discovery delay the first time a tool is needed, and in exchange you get a much larger effective context window.
Mode 2: tst-auto
Threshold-based deferral: tools are deferred only when their definitions exceed a token budget. Use ENABLE_TOOL_SEARCH=auto or ENABLE_TOOL_SEARCH=auto:N, where N (1-99) is the percentage threshold, for example auto:50.
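One way to read the threshold is as a comparison between the tokens deferral would save and a percentage of some budget. This is a sketch only. The text doesn't say what the percentage is taken of, so `budget_tokens` is an assumption; the context window is one plausible choice. The 200,000-token figure in the usage lines is also hypothetical.

```python
def auto_mode_defers(deferrable_tokens: int, budget_tokens: int, threshold_pct: int) -> bool:
    """Defer only when deferrable tool definitions exceed threshold_pct% of a budget.

    Assumption: the percentage applies to some token budget (e.g. the context
    window); the exact denominator is not specified in the text.
    """
    if not 1 <= threshold_pct <= 99:
        raise ValueError("threshold must be between 1 and 99")
    # Integer form of deferrable/budget > pct/100, avoiding float rounding.
    return deferrable_tokens * 100 > budget_tokens * threshold_pct


# ~75,000 deferrable tokens from the earlier example, hypothetical 200,000-token budget
print(auto_mode_defers(75_000, 200_000, 50))  # False: 75,000 is below 100,000
print(auto_mode_defers(75_000, 200_000, 20))  # True: 75,000 exceeds 40,000
```

Under this reading, a lower N defers sooner. tst then behaves like an always-true version of the check, and standard like an always-false one.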
Mode 3: standard
Never defer. Use ENABLE_TOOL_SEARCH=false to disable completely.
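The three modes map onto ENABLE_TOOL_SEARCH values roughly as sketched below. The parser covers only the values mentioned above: unset, `false`, `auto`, and `auto:N`. `ToolSearchConfig` and `parse_enable_tool_search` are hypothetical names. Two behaviors are this sketch's own choices, not documented behavior:

- rejecting any other value;
- treating an empty string as unset.

Plain `auto` leaves the threshold as `None` because its default isn't stated here.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolSearchConfig:
    mode: str                     # "tst", "tst-auto", or "standard"
    threshold_pct: Optional[int]  # only set for "auto:N"


def parse_enable_tool_search(value: Optional[str]) -> ToolSearchConfig:
    """Map an ENABLE_TOOL_SEARCH value onto one of the three modes (sketch)."""
    if value is None or value == "":
        return ToolSearchConfig("tst", None)       # unset (empty assumed same): default
    if value == "false":
        return ToolSearchConfig("standard", None)  # never defer
    if value == "auto":
        return ToolSearchConfig("tst-auto", None)  # threshold left to the default
    if value.startswith("auto:"):
        pct = int(value[len("auto:"):])            # ValueError if not a number
        if not 1 <= pct <= 99:
            raise ValueError(f"auto threshold must be 1-99, got {pct}")
        return ToolSearchConfig("tst-auto", pct)
    raise ValueError(f"value not covered by this sketch: {value!r}")


if __name__ == "__main__":
    print(parse_enable_tool_search(os.environ.get("ENABLE_TOOL_SEARCH")))
```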
The Snapshot Mechanism
Discovered tools are preserved across context compaction through a snapshot system. When compaction occurs, the system takes a snapshot of:
- All currently loaded tools
- The ToolSearch tool
- The deferral state
This ensures Claude doesn't lose access to tools it's already discovered, even as the conversation context gets trimmed.
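A toy model shows why the snapshot matters. It assumes, purely for illustration, that a discovered schema stays available only while its `tool_reference` is still in context. Under that assumption, naive trimming silently "forgets" discovered tools, while prepending a snapshot message keeps them. `Message`, `discovered_tools`, and `compact` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    text: str
    tool_refs: tuple[str, ...] = ()  # tool_reference results carried by this message


def discovered_tools(history: list[Message]) -> set[str]:
    """Assumption: a schema is available while its tool_reference is in context."""
    return {name for msg in history for name in msg.tool_refs}


def compact(history: list[Message], all_deferred: set[str], keep_last: int) -> list[Message]:
    """Trim old messages, prepending a snapshot so discovered tools survive."""
    loaded = discovered_tools(history)
    still_deferred = all_deferred - loaded
    snapshot = Message(
        text=f"[snapshot] ToolSearch available; deferred tools remaining: {len(still_deferred)}",
        tool_refs=tuple(sorted(loaded)),
    )
    return [snapshot] + history[-keep_last:]


history = [
    Message("user: file a bug for the login crash"),
    Message("ToolSearch result", tool_refs=("mcp__github__create_issue",)),
    Message("assistant: issue filed"),
    Message("user: now summarize the thread"),
]
deferred = {"mcp__github__create_issue", "mcp__slack__post_message"}

naive = history[-2:]                                 # plain trimming
compacted = compact(history, deferred, keep_last=2)  # trimming plus snapshot
print(discovered_tools(naive))      # set()
print(discovered_tools(compacted))  # {'mcp__github__create_issue'}
```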
What This Means For Your MCP Servers
If you're building MCP servers, consider:
- Mark critical tools with alwaysLoad: If a tool is needed on nearly every turn (like a primary database query), opt it out of deferral
- Write clear tool descriptions: Tool search uses semantic matching, so good descriptions improve discovery accuracy
- Group related tools: Tools with similar prefixes or descriptions tend to be discovered together
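The two tool definitions below show how a hypothetical database server might return them from MCP's `tools/list`. The `name`/`description`/`inputSchema` fields follow the MCP tool shape. The `_meta['anthropic/alwaysLoad']` key comes from the checklist above; its `True` value and the server and tool names are assumptions. Check that your server framework lets you attach `_meta` to a tool.

```python
import json

# Hypothetical tools from a hypothetical "db" MCP server. The shared "db_"
# prefix and specific descriptions are meant to help search-based discovery.
TOOLS = [
    {
        "name": "db_query",
        "description": "Run a read-only SQL query against the primary database and return the rows.",
        "inputSchema": {
            "type": "object",
            "properties": {"sql": {"type": "string", "description": "SELECT statement to run"}},
            "required": ["sql"],
        },
        # Needed on nearly every turn, so opt out of deferral (True is an assumed value).
        "_meta": {"anthropic/alwaysLoad": True},
    },
    {
        "name": "db_describe_table",
        "description": "List the columns and column types of one table in the primary database.",
        "inputSchema": {
            "type": "object",
            "properties": {"table": {"type": "string", "description": "Table name"}},
            "required": ["table"],
        },
        # No _meta: left deferred and found through ToolSearch when needed.
    },
]


def always_loaded(tools: list[dict]) -> list[str]:
    """Names of tools that opt out of deferral."""
    return [t["name"] for t in tools if t.get("_meta", {}).get("anthropic/alwaysLoad") is True]


payload = json.dumps({"tools": TOOLS})  # shape of a tools/list result body
print(always_loaded(TOOLS))  # ['db_query']
```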
Try It Now
Check your current configuration:
echo $ENABLE_TOOL_SEARCH
If you're not seeing tool search benefits, ensure:
- You have MCP tools configured
- You're not in standard mode
- Your MCP servers haven't marked every tool alwaysLoad
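The three checks can also be scripted. This helper is hypothetical and not part of Claude Code. You supply your MCP tool definitions as dicts, for example as returned by each server's `tools/list`. It assumes `alwaysLoad` is set as `True` under `_meta`, and it treats ENABLE_TOOL_SEARCH=false as standard mode, as described above.

```python
import os
from typing import Optional


def tool_search_problems(mcp_tools: list[dict], env_value: Optional[str]) -> list[str]:
    """Return reasons tool search may not be saving tokens (empty list = none found)."""
    problems = []
    if not mcp_tools:
        problems.append("no MCP tools configured")
    if env_value == "false":
        problems.append("ENABLE_TOOL_SEARCH=false selects standard mode (never defer)")
    if mcp_tools and all(t.get("_meta", {}).get("anthropic/alwaysLoad") is True for t in mcp_tools):
        problems.append("every MCP tool is marked alwaysLoad")
    return problems


if __name__ == "__main__":
    # Hypothetical inventory; in practice, collect your servers' tool definitions.
    tools = [{"name": "db_query", "_meta": {"anthropic/alwaysLoad": True}}]
    for problem in tool_search_problems(tools, os.environ.get("ENABLE_TOOL_SEARCH")):
        print("-", problem)
```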
For maximum savings with minimal latency impact, use the default tst mode. The one-turn discovery overhead is negligible compared to the context window preservation.
When To Use Each Mode
- Default (tst): Most users; balances savings with discovery latency
- tst-auto (for example, ENABLE_TOOL_SEARCH=auto:20): If you're sensitive to latency but still want some savings
- standard: Only if you have very few MCP tools or need every tool available immediately
This system represents a fundamental shift in how Claude Code handles tool ecosystems. Instead of paying the token cost for every possible tool upfront, you pay upfront only for the always-loaded core and load other schemas as you actually use them.