Glean benchmarked MCP servers inside Claude Cowork and found that off-the-shelf MCP fails 2.5x more tasks and burns 30% more tokens than a properly indexed context layer.
Key facts
- Off-the-shelf MCP fails 2.5x more tasks than indexed context in Claude Cowork
- Off-the-shelf MCP burns 30% more tokens per task
- User reported cutting Claude token bill by 30% using Glean's approach
- Glean's benchmark is the first public comparison of MCP servers inside Claude Cowork
- Methodology details (task set, trials) were not disclosed
A new benchmark from Glean, shared by @hasantoxr, provides the first real-world comparison of MCP server performance inside Claude Cowork. The data shows that off-the-shelf MCP servers — the ones most teams are wiring up today — fail 2.5x more often and consume 30% more tokens per task than Glean's indexed context layer [According to @hasantoxr].
Why this matters more than the press release suggests
This is not just a vendor comparison. It reveals a structural inefficiency in the current MCP ecosystem. Most teams wire up MCP servers naively — dumping full tool outputs into the context window without indexing or retrieval. Glean's benchmark suggests that approach wastes tokens and degrades reliability. The 30% token savings translates directly to cost: a user reported cutting their Claude token bill by 30% using Glean's method [Per @hasantoxr].
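The structural difference described above can be sketched in a few lines. Everything below is a hypothetical illustration of the two wiring patterns, not Glean's actual API: the function names, the paragraph-based chunking, and the keyword-overlap scoring are all invented for the example.

```python
# Hypothetical sketch: naive context dumping vs. an indexed retrieval layer.
# All names and the scoring heuristic are illustrative assumptions.

def naive_context(tool_output: str) -> str:
    # Naive MCP wiring: the entire tool output is pasted into the prompt.
    return tool_output

def indexed_context(tool_output: str, query: str, top_k: int = 2) -> str:
    # Indexed layer: split the output into chunks, score each chunk by
    # crude keyword overlap with the task query, keep only the top_k chunks.
    chunks = [c.strip() for c in tool_output.split("\n\n") if c.strip()]
    q_terms = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_terms & set(c.lower().split())),
        reverse=True,
    )
    return "\n\n".join(scored[:top_k])

# Simulate a verbose tool response: ten filler sections plus one relevant line.
output = "\n\n".join(f"section {i}: " + "filler " * 50 for i in range(10))
output += "\n\nbilling config: retries=3 timeout=30s"

full = naive_context(output)
trimmed = indexed_context(output, "billing config timeout", top_k=2)
print(len(full.split()), len(trimmed.split()))  # trimmed is far smaller
```

The point of the sketch is the token asymmetry: the naive path scales with the size of every tool response, while the retrieval path scales with `top_k`, which is why indexing compounds into large savings at scale.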
How the benchmark works
Glean's test measures task completion rate and token consumption across two setups: off-the-shelf MCP servers (the default wiring most developers use) versus Glean's indexed context layer, which pre-processes tool outputs and retrieves only the context relevant to each task. In Glean's numbers, the off-the-shelf setup failed 2.5x more tasks, and the indexed layer used 30% fewer tokens per task [Per the tweet thread].
Who this affects
This matters for any team running Claude Cowork at scale — especially those building custom MCP integrations for enterprise workflows. The token cost differential directly impacts operating margins for heavy Claude users. Teams that invest in proper context indexing (whether via Glean or a custom solution) will see immediate cost and reliability improvements.
Limitations
Glean's benchmark is not independent — it compares its own product against an unspecified baseline of 'off-the-shelf MCP.' The exact task set, number of trials, and token measurement methodology were not disclosed [According to the source]. Without those details, the 2.5x and 30% figures may not generalize to other MCP configurations or task types.
What to watch
Watch for independent replication of this benchmark, ideally from a neutral party like LMSYS or Artifact. If the 30% token savings holds across diverse task sets, expect a wave of teams migrating from naive MCP wiring to indexed context layers — and a potential pricing response from MCP server providers.