search & retrieval
30 articles about search & retrieval in AI news
Walmart Research Proposes Unified Training for Sponsored Search Retrieval
A new arXiv preprint details Walmart's novel bi-encoder training framework for sponsored search retrieval. It addresses the limitations of using user engagement as a sole training signal by combining graded relevance labels, retrieval priors, and engagement data. The method outperformed the production system in offline and online tests.
A Reference Architecture for Agentic Hybrid Retrieval in Dataset Search
A new research paper presents a reference architecture for 'agentic hybrid retrieval' that orchestrates BM25, dense embeddings, and LLM agents to handle underspecified queries against sparse metadata. It introduces offline metadata augmentation and analyzes two architectural styles for quality attributes like governance and performance.
ColBERT-Att: New Research Enhances Neural Retrieval by Integrating Attention into Late Interaction
Researchers propose ColBERT-Att, a neural information retrieval model that integrates attention weights into the late-interaction framework. The method shows improved recall on standard benchmarks such as MS MARCO, BEIR, and LoTTE.
ReBOL: A New AI Retrieval Method Combines Bayesian Optimization with LLMs to Improve Search
Researchers propose ReBOL, a retrieval method that combines Bayesian Optimization with LLM relevance scoring. It outperforms standard LLM rerankers on recall, achieving 46.5% vs. 35.0% recall@100 on one dataset, with comparable latency.
New Research Proposes Authority-aware Generative Retrieval (AuthGR) for
A new arXiv paper introduces an Authority-aware Generative Retriever (AuthGR) framework. It uses multimodal signals to score document trustworthiness and trains a model to prioritize authoritative sources. Large-scale online A/B tests on a commercial search platform report significant improvements in user engagement and reliability.
New Research Reveals Fundamental Limitations of Vector Embeddings for Retrieval
A new theoretical paper demonstrates that embedding-based retrieval systems have inherent limitations in representing complex relevance relationships, even with simple queries. This challenges the assumption that better training data alone can solve all retrieval problems.
New Research Validates Retrieval Metrics as Proxies for RAG Information Coverage
A new arXiv study systematically examines the relationship between retrieval quality and RAG generation effectiveness. It finds strong correlations between coverage-based retrieval metrics and the information coverage in final responses, providing empirical support for using retrieval metrics as performance indicators.
New Research Improves Text-to-3D Motion Retrieval with Interpretable Fine-Grained Alignment
Researchers propose a novel method for retrieving 3D human motion sequences from text descriptions using joint-angle motion images and token-patch interaction. It outperforms state-of-the-art methods on standard benchmarks while offering interpretable correspondences.
New Research Shows Pre-Aligned Multi-Modal Models Advance 3D Shape Retrieval from Images
A new arXiv paper demonstrates that pre-aligned image and 3D shape encoders, combined with hard contrastive learning, achieve state-of-the-art performance for image-based shape retrieval. This enables zero-shot retrieval without database-specific training.
Beyond Simple Search: How Advanced Image Retrieval Transforms Luxury Discovery
New research reveals major flaws in current visual search tech. For luxury retail, this means missed sales from poor multi-item inspiration and inconsistent results. A new benchmark and method promise more accurate, nuanced product discovery.
Grep Beats Vector Search in Agent Benchmarks, New Paper Finds
Grep beats vector search on LongMemEval across all harness-model pairs, showing agent design matters more than retrieval method for evidence-location tasks.
ECLASS-Augmented Semantic Product Search
Researchers systematically evaluated LLM-assisted dense retrieval for semantic product search on industrial electronic components. Augmenting embeddings with ECLASS hierarchical metadata created a crucial semantic bridge, achieving 94.3% Hit_Rate@5 versus 31.4% for BM25.
Skill-RAG Uses Hidden-State Probes to Trigger Retrieval Only When Needed
Researchers introduced Skill-RAG, a system that uses hidden-state probing to detect when an LLM is about to fail, triggering targeted retrieval. This improves over uniform RAG baselines on HotpotQA, Natural Questions, and TriviaQA.
Rethinking the Necessity of Adaptive Retrieval-Augmented Generation
Researchers propose AdaRankLLM, a framework that dynamically decides when to retrieve external data for LLMs. It reduces computational overhead while maintaining performance, shifting adaptive retrieval's role based on model strength.
FRAGATA: A Hybrid RAG System for Semantic Search Over 20 Years of HPC
A new paper details FRAGATA, a system enabling semantic search over two decades of technical support tickets at a supercomputing center. It uses hybrid retrieval-augmented generation (RAG) to find relevant past incidents despite typos or differences in language and wording, showing a qualitative improvement over the legacy search.
New Research Proposes Lightweight Method to Fix Stale Semantic IDs in
Researchers propose a method to update 'stale' Semantic IDs in generative retrieval systems without full retraining. Their alignment technique improves key metrics and reduces compute costs by ~8-9x, addressing a core challenge in dynamic recommendation environments.
From BM25 to Corrective RAG: A Benchmark Study Challenges the Dominance of Semantic Search for Tabular Data
A systematic benchmark of 10 RAG retrieval strategies on a financial QA dataset reveals that a two-stage hybrid + reranking pipeline performs best. Crucially, the classic BM25 algorithm outperformed modern dense retrieval models, challenging a core assumption in semantic search. The findings provide actionable, cost-aware guidance for building retrieval systems over heterogeneous documents.
Nemotron ColEmbed V2: NVIDIA's New SOTA Embedding Models for Visual Document Retrieval
NVIDIA researchers have released Nemotron ColEmbed V2, a family of three models (3B, 4B, 8B parameters) that set a new state of the art on the ViDoRe benchmark for visual document retrieval. The models use a 'late interaction' mechanism and are built on top of pre-trained VLMs like Qwen3-VL and NVIDIA's own Eagle 2. This matters because it directly addresses the challenge of retrieving information from visually rich documents like PDFs and slides within RAG systems.
Storing Less, Finding More: Novelty Filtering Architecture for Cross-Modal Retrieval on Edge Cameras
A new streaming retrieval architecture uses an on-device 'epsilon-net' filter to retain only semantically novel video frames, dramatically improving cross-modal search accuracy while reducing power consumption to 2.7 mW. This addresses the fundamental problem of redundant frames crowding out correct results in continuous video streams.
Meta's QTT Method Fixes Long-Context LLM 'Buried Facts' Problem, Boosts Retrieval Accuracy
Meta researchers identified a failure mode where LLMs with 128K+ context windows miss information buried in the middle of documents. Their Query-only Test-Time Training (QTT) method adapts models at inference, significantly improving retrieval accuracy.
Late Interaction Retrieval Models Show Length Bias, MaxSim Operator Efficiency Confirmed in New Study
New arXiv research analyzes two dynamics in Late Interaction retrieval models: a documented length bias in scoring and the efficiency of the MaxSim operator. Findings validate theoretical concerns and confirm the pooling method's effectiveness, with implications for high-precision search systems.
New Benchmark and Methods Target Few-Shot Text-to-Image Retrieval for Complex Queries
Researchers introduce FSIR-BD, a benchmark for few-shot text-to-image retrieval, and two optimization methods to improve performance on compositional and out-of-distribution queries. This addresses a key weakness in pre-trained vision-language models.
Federated RAG: A New Architecture for Secure, Multi-Silo Knowledge Retrieval
Researchers propose a secure Federated Retrieval-Augmented Generation (RAG) system using Flower and confidential compute. It enables LLMs to query knowledge across private data silos without centralizing sensitive documents, addressing a major barrier for enterprise AI.
Sparton: A New GPU Kernel Dramatically Speeds Up Learned Sparse Retrieval
Researchers propose Sparton, a fused Triton GPU kernel for Learned Sparse Retrieval models like SPLADE. It avoids materializing a massive vocabulary-sized matrix, achieving up to 4.8x speedups and 26x larger batch sizes. This is a core infrastructure breakthrough for efficient AI-powered search.
Add Semantic Search to Claude Code with pmem: A Local RAG That Cuts Token Costs 75%
Install pmem, a local RAG MCP server, to give Claude Code instant semantic search over your entire project's history, slashing token usage for file retrieval.
flexvec: A New SQL Kernel for Programmable Vector Retrieval
A new research paper introduces flexvec, a retrieval kernel that exposes the embedding matrix and score array as a programmable surface via SQL, enabling complex, real-time query-time operations called Programmatic Embedding Modulation (PEM). This approach allows AI agents to dynamically manipulate retrieval logic and achieves sub-100 ms latency on million-scale corpora on a CPU.
RAG Fails at Boundaries, Not Search: A Critical Look at Chunking and Context Limits
An analysis argues that RAG system failures are often due to fundamental data boundary issues (chunking, context limits, and source segmentation) rather than search algorithm performance. This reframes the primary challenge for AI practitioners implementing knowledge retrieval.
Retrieval-Augmented LLM Agents: Combined Fine-Tuning and Experience Retrieval Boosts Unseen Task Generalization
Researchers propose a pipeline integrating supervised fine-tuning with in-context experience retrieval for LLM agents. The combined approach significantly improves generalization to unseen tasks compared to using either method alone.
Continual Fine-Tuning with Provably Accurate, Parameter-Free Task Retrieval: A New Paradigm for Sequential Model Adaptation
Researchers propose a novel continual fine-tuning method that combines adaptive module composition with clustering-based retrieval, enabling models to learn new tasks sequentially without forgetting old ones. The approach provides theoretical guarantees linking retrieval accuracy to cluster structure.
AMES: A Scalable, Backend-Agnostic Architecture for Multimodal Enterprise Search
Researchers propose AMES, a unified multimodal retrieval system using late interaction. It enables cross-modal search (text, image, video) within existing enterprise engines like Solr without major redesign, balancing speed and accuracy.