gentic.news — AI News Intelligence Platform

MCP Tool Overload Eats 1.1M Tokens — Code Mode Fixes It

MCP tool definitions for a 2,600-endpoint API consume 1.1M tokens, far more than an agent's context window can hold. Code mode, which replaces tool definitions with TypeScript types in under 1K tokens and runs agent-written code in a sandbox, offers a fix.

Source: pub.towardsai.net via towards_ai (corroborated)
How does code mode fix MCP tool overload for AI agents?

Converting a large production API with 2,600 endpoints into MCP tools consumes 1.1 million tokens. Code mode replaces tools with TypeScript type definitions in under 1,000 tokens, and sandboxed execution resolves security concerns.

TL;DR

  • MCP tools consume 1.1M tokens for large APIs.
  • Code mode uses TypeScript types in under 1,000 tokens.
  • Sandboxed execution makes AI-written code safe.

A large production API with 2,600 endpoints consumes 1.1 million tokens when converted to MCP tools. The Model Context Protocol's design, meant to standardize agent-tool connections, instead creates a context window crisis.

Key facts

  • 1.1M tokens consumed by MCP tools for a large API.
  • 2,600 endpoints expressed as TypeScript types in under 1,000 tokens.
  • Sandboxed V8 execution blocks file system and network access.
  • Tool search loads ~2,100 tokens per query, of which only ~500 are relevant.
  • MCP hit 10K servers and 97M monthly SDK downloads by May 2026.

The Token Math That Breaks Agents

The Model Context Protocol (MCP), introduced by Anthropic in November 2024, aimed to standardize how AI agents connect to external services. Before MCP, every team bundled their own tools privately. MCP fixed duplication but created a new problem: scale.

According to the source, a large production API specification can contain 2.3 million tokens of documentation. Converting each endpoint into an MCP tool still yields 1.1 million tokens. Typical context windows run 100,000 to 200,000 tokens, so the tool definitions alone exceed the window at least five times over, and the agent's working memory is exhausted before it processes a single user request.
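
The arithmetic behind that claim can be sketched as follows. The token and endpoint counts are the figures quoted above; the 200,000-token window is the upper end of the cited range, and the variable names are illustrative only.

```typescript
// Figures quoted in the article.
const mcpToolTokens = 1_100_000; // the API expressed as MCP tool definitions
const endpoints = 2_600;
const contextWindow = 200_000;   // upper end of the "typical" range cited

// Average cost of a single tool definition (about 423 tokens).
const tokensPerTool = mcpToolTokens / endpoints;

// How many full context windows the definitions alone would fill (5.5).
const windowsNeeded = mcpToolTokens / contextWindow;

// How many tools fit if the entire window held nothing but definitions (472).
const toolsThatFit = Math.floor(contextWindow / tokensPerTool);
```

Even in the best case, fewer than a fifth of the 2,600 endpoints fit, and that leaves no room for the conversation itself.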

Teams first try splitting the API into one MCP server per service category, which can mean sixteen servers for a large platform. That reduces tokens per server but forces users to pre-select servers before knowing what they need. Coverage also drops: a product server might expose six tools when the full API has thirty endpoints.

Three Approaches to Progressive Discovery

Three real solutions share a common principle: progressive discovery—loading the right tool at the right moment rather than everything upfront.

The CLI Approach. The agent uses a command-line tool the way a developer would, navigating sub-commands and requesting parameter help as needed. No tool definitions enter context. The catch: the agent needs shell access, which rules out cloud-hosted or sandboxed clients.

Tool Search. A user's question triggers a keyword search, and the eight most relevant tools load into context. That costs about 2,100 tokens per query, of which only about 500 are relevant. Quality depends entirely on keyword matching accuracy: a bad match means the right tool never appears, and the agent confidently gives a wrong answer.
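
A minimal sketch of keyword-based tool search, and of the failure mode just described, might look like this. The catalog, the scoring rule, and every name here are hypothetical; the source does not say how any real implementation ranks tools.

```typescript
interface ToolSummary {
  name: string;
  description: string;
}

// Count how many query words appear in the tool's name or description.
function score(tool: ToolSummary, queryWords: string[]): number {
  const haystack = (tool.name + " " + tool.description).toLowerCase();
  return queryWords.filter((word) => haystack.indexOf(word) !== -1).length;
}

// Return up to `limit` tools with at least one keyword hit, best match first.
function searchTools(catalog: ToolSummary[], query: string, limit = 8): ToolSummary[] {
  const words = query.toLowerCase().split(/\s+/).filter((word) => word.length > 0);
  return catalog
    .map((tool) => ({ tool, hits: score(tool, words) }))
    .filter((entry) => entry.hits > 0)
    .sort((a, b) => b.hits - a.hits)
    .slice(0, limit)
    .map((entry) => entry.tool);
}

// Hypothetical three-tool catalog.
const catalog: ToolSummary[] = [
  { name: "create_invoice", description: "Create a new invoice for a customer" },
  { name: "list_orders", description: "List orders placed by a customer" },
  { name: "refund_payment", description: "Refund a captured payment" },
];

// Exact vocabulary finds the right tool.
const direct = searchTools(catalog, "refund payment");

// A paraphrase shares no keyword with refund_payment, so the tool never
// surfaces; two unrelated tools match on "customer" instead.
const paraphrased = searchTools(catalog, "give the customer their money back");
```

The second query is the case the article warns about: the agent is handed plausible but wrong tools and has no signal that the right one exists.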

Code Mode. Instead of pre-loading tools, the agent receives TypeScript type definitions generated from the API specification. A complete API with 2,600 endpoints, which would consume 1.1 million tokens as tools, can be expressed in under 1,000 tokens. The agent writes a small function tailored to the current request, the function is executed, and the result is returned. Because the types are generated from the specification, every improvement to the specification automatically improves the agent.
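
To make the idea concrete, here is a small sketch of what the agent might see and write in code mode. The Order and Api types, the listOrders method, and the in-memory stand-in are all hypothetical, and the client is kept synchronous for brevity where a real one would almost certainly return Promises.

```typescript
// Hypothetical type surface, as might be generated from an API specification.
interface Order {
  id: string;
  customerId: string;
  totalCents: number;
}

interface Api {
  // Synchronous for brevity; a real client would likely be asynchronous.
  listOrders(params: { customerId: string }): Order[];
}

// The kind of small, request-specific function an agent might write for
// "How much has customer c_1 spent in total?"
function totalSpentCents(api: Api, customerId: string): number {
  return api
    .listOrders({ customerId })
    .reduce((sum, order) => sum + order.totalCents, 0);
}

// In-memory stand-in so the sketch runs without a real backend.
const sampleOrders: Order[] = [
  { id: "o_1", customerId: "c_1", totalCents: 2500 },
  { id: "o_2", customerId: "c_2", totalCents: 900 },
  { id: "o_3", customerId: "c_1", totalCents: 2000 },
];

const fakeApi: Api = {
  listOrders: ({ customerId }) =>
    sampleOrders.filter((order) => order.customerId === customerId),
};

const spent = totalSpentCents(fakeApi, "c_1"); // 2500 + 2000 = 4500
```

Only the type declarations need to sit in context; the filtering and summing happen in code rather than through a chain of separate tool calls.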

The Security Sandbox That Unblocked Code Mode

Code mode stalled because running AI-written code without review is a textbook security vulnerability. AI-generated code could read file systems, steal environment variables, or exfiltrate secrets. These were not theoretical; they happened to real teams.

What changed the equation was not a new model or a code quality breakthrough, but where the code runs. Small, fast sandboxes built on the V8 engine (the same engine that runs JavaScript in Chromium-based browsers) start in milliseconds and impose hard limits: no file system access, no network access by default, and programmable guardrails. The AI writes code, that code runs in isolation, and the dangerous parts of the world are simply not reachable. The security objection loses its force because the blast radius is confined to the sandbox.
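
The deny-by-default rule described above can be sketched as a small policy check. This does not implement isolation itself; it only illustrates the guardrail logic a host might apply around a sandbox. The SandboxPolicy and Capability shapes and the isAllowed function are hypothetical and do not correspond to any particular runtime's API.

```typescript
// Hypothetical policy shape; real sandbox runtimes define their own configuration.
interface SandboxPolicy {
  allowFileSystem: boolean;
  allowedHosts: string[]; // an empty list means no network access
}

// Deny by default: nothing is reachable unless the host opts in.
const defaultPolicy: SandboxPolicy = { allowFileSystem: false, allowedHosts: [] };

// A capability that sandboxed code might ask the host for.
type Capability =
  | { kind: "fs"; path: string }
  | { kind: "net"; host: string };

// The host consults this before granting anything to sandboxed code.
function isAllowed(policy: SandboxPolicy, request: Capability): boolean {
  if (request.kind === "fs") {
    return policy.allowFileSystem;
  }
  // Network requests pass only when the exact host is on the allowlist.
  return policy.allowedHosts.some((host) => host === request.host);
}

// A programmable guardrail: open exactly one host, keep everything else closed.
const apiOnlyPolicy: SandboxPolicy = {
  allowFileSystem: false,
  allowedHosts: ["api.example.com"],
};
```

Under the default policy both file reads and outbound requests are refused; the second policy shows how a host could widen access to a single API endpoint without opening anything else.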

As of May 2026, MCP had reached 10,000 servers and 97 million monthly SDK downloads, as previously reported by gentic.news. The tool overload problem is widespread, but code mode offers a path that doesn't require waiting for larger context windows.

What to watch

Watch for Anthropic's next MCP specification update, expected in Q3 2026, to see if code mode becomes a recommended pattern. Also track whether OpenAI or Google adopts similar type-based tooling in its agent SDKs, which would signal industry convergence.



AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

The tool overload problem is a structural consequence of MCP's design, not a bug. MCP was built to standardize tool exposure, but its one-tool-per-endpoint model fails at enterprise scale, where APIs have thousands of endpoints. The article correctly identifies that the intuitive fix, splitting into multiple servers, creates coverage gaps and forces premature user decisions.

Code mode is elegant but has its own limitations. It assumes the agent can write correct TypeScript and that the generated types are comprehensive. For APIs with dynamic endpoints (e.g., query parameters that change per user), static types may not capture all valid states. The sandbox requirement also adds latency: each function call requires spinning up a V8 isolate, which takes only milliseconds but still adds overhead compared with a direct tool call.

The security argument is the strongest part. The industry has been debating whether to run AI-generated code since early 2023. Sandboxing is the right answer, and the V8-based approach is practical because it reuses existing browser security infrastructure. Expect more agent frameworks to adopt this pattern, especially as MCP adoption grows: at 97 million monthly SDK downloads as of May 2026, the tool overload problem likely affects thousands of teams.