Listen to today's AI briefing

Daily podcast — 5 min, AI-narrated summary of top stories

A bar chart comparing grep and vector search performance across multiple model-harness pairs, with grep consistently…

Grep Beats Vector Search in Agent Benchmarks, New Paper Finds

Grep beats vector search on LongMemEval across all harness-model pairs, showing agent design matters more than retrieval method for evidence-location tasks.

AAAla SMITH & AI Research Desk·May 17, 2026·3 min read··93 views·AI-Generated·Report error

Source: x.comvia @rohanpaul_aiSingle Source

Does grep beat vector search for AI agents?

A new arXiv paper (2605.15184) finds grep-style search beats vector retrieval across every harness-model pair on LongMemEval, suggesting agent design matters more than retrieval method for evidence-location tasks.

TL;DR

Grep outperforms vector retrieval on LongMemEval. · Agent harness matters more than search method. · Evidence-location tasks favor exact search over embeddings.

A new paper (arXiv 2605.15184) shows grep-style search beats vector retrieval across every harness-model pair on LongMemEval. The finding undermines the default assumption that every serious agent stack needs embeddings.

Key facts

Paper: arXiv 2605.15184
Benchmark: LongMemEval
Inline grep beats inline vector across all harness-model pairs.
Grep wins on evidence-location tasks (names, dates, file paths).
Retrieval method performance depends on agent harness design.

The paper, "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search," compares grep and vector retrieval on LongMemEval, where agents recover facts from long conversation histories full of distractors. [According to @rohanpaul_ai's summary] Inline grep beats inline vector across every harness-model pair in their main experiment, sometimes by wide margins.

The surprising result is not that grep is powerful, but that agent design makes it powerful. The paper says not that grep beats vectors, but that agents fail or win through their harness.

Why Grep Wins for Evidence-Location

Grep Contains Quotes

When the answer is anchored in literal evidence—names, dates, file paths, function names, error strings, user preferences—grep gives the model a clean mechanical advantage. Embeddings are built to tolerate paraphrase, but tolerance has a cost: they can pull in semantically nearby clutter, especially when a short agent query is vague. Grep has the opposite failure mode: dumb, cheap, and narrow, but when the agent knows the right string to hunt for, dumb becomes a feature.

The unique take here is that retrieval is not a component you can benchmark in isolation. The same search method behaves differently depending on whether results are injected inline, written to files, routed through a CLI, or wrapped in a custom agent loop.

What This Means for Agent Architecture

BigQuery Vector Search: A Practition…

For coding agents, a surprising amount of work is evidence-location: find the symbol, trace the call, inspect the diff, read the failing test, recover the exact line. Vectors still matter at scale and for fuzzy conceptual search, but this paper weakens the lazy default that every serious agent stack begins with embeddings. Sometimes the upgrade is not a smarter index—it is giving the model primitive tools, clean files, disciplined context, and a harness that lets exact search do exact work.

What to watch

Watch for follow-up ablation studies on agent harness design versus retrieval method, and whether companies like LangChain or Anthropic incorporate exact-search primitives into their agent frameworks in H2 2026.

Source: gentic.news · May 17, 2026 · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

Following this story?

Get a weekly digest with AI predictions, trends, and analysis — free.

AI Analysis

The paper's core insight—that agent harness design shapes retrieval effectiveness more than the retrieval method itself—echoes a pattern seen in recent work on tool-use agents. In 2025, multiple papers from Google DeepMind and Microsoft showed that how results are presented to the model (inline vs. file vs. CLI) changes accuracy by 10-20 points on code-generation benchmarks. This paper extends that finding to retrieval, suggesting the field has over-rotated on embedding quality while under-investing in agent architecture. The contrarian take: vector databases may be overbuilt for the dominant use case in coding agents—exact symbol and error-string lookup. For these tasks, a simple grep harness with proper context management could match or exceed performance at a fraction of the cost. The paper does not test at scale (millions of documents), but for codebases under 100K files, grep's O(n) scan is often fast enough. Limitation: the paper only tests on LongMemEval, which is designed for evidence-location. Results may not generalize to open-ended semantic search or knowledge retrieval. The authors do not disclose compute costs or latency benchmarks, which would be critical for production deployment.

#agents #research #retrieval

Mentioned in this article

LongMemEval Grep

Enjoyed this article?

Get the weekly AI intelligence briefing

✨AI Toolslive

Five one-click lenses on this article. Cached for 24h.

Pick a tool above to generate an instant lens on this article.

AI Research

MirrorCode Rebuilds Programs from Behavior Alone, Beats GPT-4o by 37%

From the lab

The framework underneath this story

Every article on this site sits on top of one engine and one framework — both built by the lab.

Original research · EUMAS 2026

MNEMA — A Witness Lattice for Multi-Agent AI Memory

Cryptographic memory units · 1−α detection floor · 15 pp PDF

Field framework · v1.0

Epistemic Infrastructure

12 pillars · 11-stage knowledge metabolism · pathology catalog

Grep Beats Vector Search in Agent Benchmarks, New Paper Finds

Why Grep Wins for Evidence-Location

What This Means for Agent Architecture

What to watch

AI Analysis

✨AI Toolslive

Related Articles

Epoch AI's EBR-Bench: Top Models Score 30-50% on Experience-Based Reasoning

Google TPU Humufish Drops TSMC CoWoS for Intel EMIB-T

NVIDIA Blackwell Cuts DeepSeek V4 Token Costs 5x in One Month

Meituan Open-Sources 1.6T-Parameter LongCat-2.0 Trained on Domestic Chips

Instacart Uses PyFixest to Solve High-Cardinality Fixed Effects in

MirrorCode Rebuilds Programs from Behavior Alone, Beats GPT-4o by 37%

The framework underneath this story

More in AI Research

DART: One-Shot Robot Adaptation via Weight Space Arithmetic

ELDR: Expert-Locality Decode Routing Cuts MoE TPOT by 13.9%