Glean benchmarked MCP servers inside Claude Cowork and found that off-the-shelf MCP fails 2.5x more tasks and burns 30% more tokens than a properly indexed context layer.
Key facts
- Off-the-shelf MCP fails 2.5x more tasks than indexed context in Claude Cowork
- Off-the-shelf MCP burns 30% more tokens per task
- User reported cutting Claude token bill by 30% using Glean's approach
- Glean's benchmark is the first public comparison of MCP servers inside Claude Cowork
- Methodology details (task set, trials) were not disclosed
A new benchmark from Glean, shared by @hasantoxr, provides the first real-world comparison of MCP server performance inside Claude Cowork. The data shows that off-the-shelf MCP servers — the ones most teams are wiring up today — fail 2.5x more often and consume 30% more tokens per task than Glean's indexed context layer [According to @hasantoxr].
Why this matters more than the press release suggests
This is not just a vendor comparison. It reveals a structural inefficiency in the current MCP ecosystem. Most teams wire up MCP servers naively — dumping full tool outputs into the context window without indexing or retrieval. Glean's benchmark suggests that approach wastes tokens and degrades reliability. The 30% token savings translates directly to cost: a user reported cutting their Claude token bill by 30% using Glean's method [Per @hasantoxr].
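The structural difference described above can be sketched in a few lines. Everything below is a hypothetical illustration of the two wiring patterns, not Glean's actual API: the function names, the paragraph-based chunking, and the keyword-overlap scoring are all invented for the example.

```python
# Hypothetical sketch: naive context dumping vs. an indexed retrieval layer.
# All names and the scoring heuristic are illustrative assumptions.

def naive_context(tool_output: str) -> str:
    # Naive MCP wiring: the entire tool output is pasted into the prompt.
    return tool_output

def indexed_context(tool_output: str, query: str, top_k: int = 2) -> str:
    # Indexed layer: split the output into chunks, score each chunk by
    # crude keyword overlap with the task query, keep only the top_k chunks.
    chunks = [c.strip() for c in tool_output.split("\n\n") if c.strip()]
    q_terms = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_terms & set(c.lower().split())),
        reverse=True,
    )
    return "\n\n".join(scored[:top_k])

# Simulate a verbose tool response: ten filler sections plus one relevant line.
output = "\n\n".join(f"section {i}: " + "filler " * 50 for i in range(10))
output += "\n\nbilling config: retries=3 timeout=30s"

full = naive_context(output)
trimmed = indexed_context(output, "billing config timeout", top_k=2)
print(len(full.split()), len(trimmed.split()))  # trimmed is far smaller
```

The point of the sketch is the token asymmetry: the naive path scales with the size of every tool response, while the retrieval path scales with `top_k`, which is why indexing compounds into large savings at scale.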
How the benchmark works
Glean's test measures task completion rate and token consumption across two setups: off-the-shelf MCP servers (the default wiring most developers use) versus Glean's indexed context layer, which pre-processes tool outputs and retrieves only the context relevant to each task. In Glean's numbers, the off-the-shelf setup failed 2.5x more tasks, and the indexed layer used 30% fewer tokens per task [Per the tweet thread].
Who this affects
This matters for any team running Claude Cowork at scale — especially those building custom MCP integrations for enterprise workflows. The token cost differential directly impacts operating margins for heavy Claude users. Teams that invest in proper context indexing (whether via Glean or a custom solution) will see immediate cost and reliability improvements.
Limitations
Glean's benchmark is not independent — it compares its own product against an unspecified baseline of 'off-the-shelf MCP.' The exact task set, number of trials, and token measurement methodology were not disclosed [According to the source]. Without those details, the 2.5x and 30% figures may not generalize to other MCP configurations or task types.
What to watch
Watch for independent replication of this benchmark, ideally from a neutral party like LMSYS or Artifact. If the 30% token savings holds across diverse task sets, expect a wave of teams migrating from naive MCP wiring to indexed context layers — and a potential pricing response from MCP server providers.