A viral technical thread from AI engineer Akshay Pachaar is making a compelling case for a classic piece of search infrastructure: the BM25 ranking algorithm. The argument is a direct counter-narrative to the industry's rush to apply vector embeddings and neural search to every retrieval problem.
Pachaar's core thesis is that BM25—a probabilistic retrieval function developed in the 1990s—remains not just relevant, but essential. It powers the core search functionality in Elasticsearch, OpenSearch, and countless other production systems, requiring zero training data, no embedding models, and no fine-tuning.
How BM25 Works: Three Simple Questions
The thread breaks down BM25's elegance through an intuitive analogy: searching for "transformer attention mechanism" in a library of machine learning papers. The algorithm effectively asks three statistical questions about each document:
"How rare is this word?" This is the Inverse Document Frequency (IDF) component. Common words like "the" or "is" are nearly worthless for ranking, but a specific term like "transformer" is highly informative. BM25 automatically boosts the weight of rare, distinctive terms.
"How many times does it appear?" This is the term frequency component,
f(qᵢ, D), modulated by a saturation parameterk₁. If "attention" appears 10 times in a paper, that's a strong signal. However, BM25 applies diminishing returns; a document with 100 occurrences isn't considered 10x more relevant than one with 10. This prevents spammy keyword stuffing from dominating results."Is this document unusually long?" A 50-page paper will naturally contain more keyword mentions than a 5-page abstract. BM25 uses document length normalization (controlled by parameter
b) to level the playing field, preventing long documents from artificially ranking higher.
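The three questions above can be sketched as a compact scoring function. This is a minimal, illustrative BM25 scorer, not the exact Lucene/Elasticsearch implementation; the toy corpus and the parameter defaults k₁ = 1.5, b = 0.75 are assumptions for demonstration:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            # Question 1: rarity (IDF) -- rare terms get higher weight.
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # Question 2: term frequency, saturated by k1.
            # Question 3: document length normalization, controlled by b.
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores

corpus = [
    "the transformer attention mechanism scales quadratically".split(),
    "a survey of convolutional networks for vision".split(),
    "attention is all you need introduced the transformer".split(),
]
query = "transformer attention mechanism".split()
print(bm25_scores(query, corpus))
```

The paper matching all three query terms scores highest, the partially matching one comes second, and the unrelated survey scores zero, with no model weights involved anywhere.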
The result is a robust, interpretable, and computationally cheap scoring function. As Pachaar notes, "Three questions. No neural networks. No training data. Just elegant math."
The Critical Weakness of Embeddings: Exact Matching
The thread highlights a key, often overlooked weakness of pure vector search: its struggle with exact keyword matching. Dense retrieval models are designed to find semantic similarity. If a user searches for a specific "error code 5012," a vector search might return documents about "HTTP 500 errors" or "troubleshooting steps," based on semantic proximity. BM25, in contrast, will efficiently surface the exact document containing that precise string.
This failure mode is particularly damaging in technical, legal, or diagnostic search contexts where precision is non-negotiable.
The Hybrid Search Imperative
The logical conclusion, and the current state-of-the-art in production Retrieval-Augmented Generation (RAG) systems, is hybrid search. This approach combines the strengths of both worlds:
- BM25 for precise lexical (keyword) recall.
- Vector Search for semantic, conceptual understanding.
The scores from both retrieval methods are combined (often via weighted reciprocal rank fusion) to produce a final ranked list. This gives users the "best of both worlds": the ability to find documents that talk about a concept in different words and documents that contain the exact terminology they're looking for.
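Reciprocal rank fusion itself takes only a few lines. The sketch below uses hypothetical document IDs and the conventional smoothing constant k = 60; the specific lists are assumptions, not output from a real system:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    the constant k damps the influence of any single top-ranked result.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings: BM25 surfaces the exact-match document first,
# while vector search favors the semantically adjacent one.
bm25_ranking = ["doc_error_5012", "doc_faq", "doc_http_500"]
vector_ranking = ["doc_http_500", "doc_error_5012", "doc_intro"]
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
```

Documents ranked well by both retrievers rise to the top of the fused list, which is exactly the behavior hybrid search relies on.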
Pachaar's final recommendation is a call for engineering pragmatism: "So before you throw GPUs at every search problem, consider BM25. It might already solve your problem, or make your semantic search even better when combined."
Agentic.news Analysis
This thread taps into a significant and growing undercurrent in AI engineering: the re-evaluation of classical techniques in the age of deep learning. While the narrative often focuses on the latest 100B-parameter model, practical system architecture frequently involves blending new and old methods. We saw a similar pattern in our coverage of Chroma's hybrid search API launch last year, which formalized this exact BM25+vectors approach for the vector database ecosystem.
The argument for BM25 aligns with a broader trend of cost-aware and deterministic AI. As companies like OpenAI, Anthropic, and Google push the boundaries of semantic understanding with models like GPT-4o and Claude 3.5, the embedding APIs that power neural retrieval still add latency and cost to every query. For many use cases—especially those requiring high recall of exact strings—a deterministic, near-zero-cost algorithm like BM25 is not just "good enough," it's superior. This creates a layered architecture where cheap, rule-based systems handle predictable tasks, reserving expensive neural inference for problems that truly require semantic reasoning.
Furthermore, this discussion directly impacts the RAG optimization pipeline. Many teams struggling with RAG performance immediately look to re-embedding, chunking strategies, or finer-grained vector search. Pachaar's thread is a crucial reminder that the first step should be auditing the retrieval layer itself. Often, simply adding a parallel BM25 retrieval path and fusing the results yields a greater performance lift than weeks of tuning embedding models, at a fraction of the cost and complexity. It's a classic case of an 80/20 solution being overlooked in the pursuit of a "full AI" stack.
Frequently Asked Questions
What is BM25 used for today?
BM25 is the core ranking algorithm for full-text search in widely used search engines like Elasticsearch and OpenSearch. It is responsible for scoring and ranking documents based on a user's query keywords. Its primary use is in keyword-based retrieval systems, and it is increasingly being used as one half of a hybrid search system alongside vector-based semantic search.
Is BM25 better than vector search?
It is not universally better; it solves a different problem. BM25 excels at exact keyword and phrase matching. Vector search excels at finding semantically similar content even when different words are used. For most production search applications today, especially in RAG, the best approach is a hybrid of both, leveraging the strengths of each method.
Why is hybrid search important for RAG?
Hybrid search dramatically improves the reliability of the retrieval step in RAG. Pure vector search can miss critical documents that contain the exact key terms a user is looking for (e.g., a product code, error ID, or legal citation). By combining BM25's lexical recall with a vector model's semantic recall, the system is far more likely to retrieve the most relevant context for the large language model, leading to more accurate and trustworthy answers.
Do I need to train a BM25 model?
No. This is one of its major advantages. BM25 is a statistical ranking function, not a machine learning model. It has tunable parameters (k₁ and b), but it requires no training dataset, gradient descent, or fine-tuning. It operates directly on the term statistics of your document corpus.
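To see why no training is needed, note that the two parameters merely reshape simple term statistics. A quick illustration of k₁'s saturation effect, using just the term-frequency factor from the BM25 formula with length normalization set aside for clarity (k₁ = 1.2 is a common default, but an assumption here):

```python
def tf_saturation(tf, k1=1.2):
    """BM25's term-frequency factor (length normalization omitted).

    Grows with tf but flattens out, so 100 occurrences earn far less
    than 10x the weight of 10 occurrences.
    """
    return tf * (k1 + 1) / (tf + k1)

for tf in (1, 10, 100):
    print(tf, round(tf_saturation(tf), 3))
```

Raising k₁ makes the curve saturate more slowly; lowering it makes repeated mentions count for less. Tuning these two constants against a validation set is the entire "training" story for BM25.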