AI retrieval

30 articles about AI retrieval in AI news

ReBOL: A New AI Retrieval Method Combines Bayesian Optimization with LLMs to Improve Search

Researchers propose ReBOL, a retrieval method using Bayesian Optimization and LLM relevance scoring. It outperforms standard LLM rerankers on recall, achieving 46.5% vs. 35.0% recall@100 on one dataset, with comparable latency. This is a technical advance in information retrieval.

Mar 24, 2026 · 76% relevant
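
For readers unfamiliar with the recall@100 figure quoted in the ReBOL item, the following is a minimal sketch of how recall@k is commonly computed. The function name and the document IDs are hypothetical and chosen for illustration; this is not ReBOL's evaluation code.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    # A set intersection avoids double-counting duplicate IDs in the ranking.
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical ranking and relevance judgments for a single query.
ranking = ["d3", "d1", "d7", "d2"]
relevant = ["d1", "d2", "d9"]
print(recall_at_k(ranking, relevant, k=3))
```

A benchmark figure such as recall@100 is typically this value with k=100, averaged over all queries in the dataset.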

Perplexity's pplx-embed: The Bidirectional Breakthrough Transforming Web-Scale AI Retrieval

Perplexity has launched pplx-embed, a new family of multilingual embedding models that set state-of-the-art benchmarks for web-scale retrieval. Built on the Qwen3 architecture with bidirectional attention, these models specifically address the noise and complexity of real-world web data.

Feb 27, 2026 · 75% relevant

Future-Proof Your AI Search: Why Static Knowledge Bases Fail Luxury Retail

New research reveals AI retrieval benchmarks degrade over time as information changes. For luxury brands using AI for product recommendations and clienteling, this means static knowledge bases become stale, hurting customer experience and sales.

Mar 6, 2026 · 60% relevant

WebAI's Open-Source Model Hits #1 on MTEB Retrieval Leaderboard

WebAI has open-sourced a document retrieval model that currently holds the #1 position on the Massive Text Embedding Benchmark (MTEB) leaderboard. This provides a high-performance, free alternative to closed-source embedding APIs used in Retrieval-Augmented Generation (RAG) pipelines.

Apr 17, 2026 · 87% relevant

Walmart Research Proposes Unified Training for Sponsored Search Retrieval

A new arXiv preprint details Walmart's novel bi-encoder training framework for sponsored search retrieval. It addresses the limitations of using user engagement as a sole training signal by combining graded relevance labels, retrieval priors, and engagement data. The method outperformed the production system in offline and online tests.

Apr 10, 2026 · 99% relevant

8 RAG Architectures Explained for AI Engineers: From Naive to Agentic Retrieval

A technical thread explains eight distinct RAG architectures with specific use cases, from basic vector similarity to complex agentic systems. This provides a practical framework for engineers choosing the right approach for different retrieval tasks.

Apr 3, 2026 · 85% relevant

Anthropic: AI agents fail biology retrieval, miss 261 Ebola sequences

Anthropic research shows Claude Sonnet 4 returning 5–106 Ebola sequences instead of 266, shifting the outbreak origin from 2014 to 1922. A repeatable retrieval tool fixes the variance.

Jun 8, 2026 · 87% relevant

Zalando Introduces MLLM-Based Evaluation for Product Retrieval

Zalando presents a multimodal LLM-based evaluation for product retrieval, aiming to enhance search relevance in e-commerce. This matters as it could set a new standard for assessing AI in retail search.

Jun 21, 2026 · 92% relevant

AFMRL: Using MLLMs to Generate Attributes for Better Product Retrieval in

AFMRL uses MLLMs to generate product attributes, then uses those attributes to train better multimodal representations for e-commerce retrieval. It achieves SOTA on large-scale datasets.

Apr 23, 2026 · 84% relevant

A Reference Architecture for Agentic Hybrid Retrieval in Dataset Search

A new research paper presents a reference architecture for 'agentic hybrid retrieval' that orchestrates BM25, dense embeddings, and LLM agents to handle underspecified queries against sparse metadata. It introduces offline metadata augmentation and analyzes two architectural styles for quality attributes like governance and performance.

Apr 21, 2026 · 84% relevant
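
As a rough illustration of what combining BM25 and dense results can look like in a hybrid pipeline, here is a minimal sketch using reciprocal rank fusion (RRF). RRF is a common fusion technique swapped in here for illustration; the summary above does not say which fusion method the paper uses, and the function name, document IDs, and the constant k=60 are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists; each list adds 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a lexical (BM25) retriever and a dense retriever.
bm25_ranking = ["a", "b", "c"]
dense_ranking = ["b", "c", "a"]
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
```

Because RRF uses only rank positions, it sidesteps the problem that BM25 scores and embedding similarities live on different scales.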

Skill-RAG Uses Hidden-State Probes to Trigger Retrieval Only When Needed

Researchers introduced Skill-RAG, a system that uses hidden-state probing to detect when an LLM is about to fail, triggering targeted retrieval. This improves over uniform RAG baselines on HotpotQA, Natural Questions, and TriviaQA.

Apr 20, 2026 · 85% relevant

Rethinking the Necessity of Adaptive Retrieval-Augmented Generation

Researchers propose AdaRankLLM, a framework that dynamically decides when to retrieve external data for LLMs. It reduces computational overhead while maintaining performance, shifting adaptive retrieval's role based on model strength.

Apr 20, 2026 · 74% relevant

Indexing Multimodal LLMs for Large-Scale Image Retrieval

A new arXiv paper proposes using Multimodal LLMs (MLLMs) for instance-level image-to-image retrieval. By prompting models with paired images and converting next-token probabilities into scores, the method enables training-free re-ranking. It shows superior robustness to clutter and occlusion compared to specialized models, though it struggles with severe appearance changes.

Apr 16, 2026 · 72% relevant

Nemotron ColEmbed V2: NVIDIA's New SOTA Embedding Models for Visual Document Retrieval

NVIDIA researchers have released Nemotron ColEmbed V2, a family of three models (3B, 4B, 8B parameters) that set new state-of-the-art performance on the ViDoRe benchmark for visual document retrieval. The models use a 'late interaction' mechanism and are built on top of pre-trained VLMs like Qwen3-VL and NVIDIA's own Eagle 2. This matters because it directly addresses the challenge of retrieving information from visually rich documents like PDFs and slides within RAG systems.

Apr 2, 2026 · 74% relevant
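
The 'late interaction' mechanism mentioned in the Nemotron ColEmbed V2 item is usually understood as ColBERT-style MaxSim scoring: every query token embedding is matched against its best document token embedding, and the per-token maxima are summed. The sketch below assumes that formulation; the function names and toy vectors are hypothetical, and it is not NVIDIA's implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-dimensional token embeddings (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 1.0], [0.0, 1.0]]
print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
```

Keeping one vector per token, rather than one per document, is what lets the score reward documents that match each part of the query somewhere on the page.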

GRank: A New Target-Aware, Index-Free Retrieval Paradigm for Billion-Scale Recommender Systems

A new paper introduces GRank, a structured-index-free retrieval framework that unifies target-aware candidate generation with fine-grained ranking. It significantly outperforms tree- and graph-based methods on recall and latency, and is already deployed at massive scale.

Apr 2, 2026 · 83% relevant

FGR-ColBERT: A New Retrieval Model That Pinpoints Relevant Text Spans Efficiently

A new arXiv paper introduces FGR-ColBERT, a modified ColBERT retrieval model that integrates fine-grained relevance signals distilled from an LLM. It achieves high token-level accuracy while preserving retrieval efficiency, offering a practical alternative to post-retrieval LLM analysis.

Apr 2, 2026 · 72% relevant

Storing Less, Finding More: Novelty Filtering Architecture for Cross-Modal Retrieval on Edge Cameras

A new streaming retrieval architecture uses an on-device 'epsilon-net' filter to retain only semantically novel video frames, dramatically improving cross-modal search accuracy while reducing power consumption to 2.7 mW. This addresses the fundamental problem of redundant frames crowding out correct results in continuous video streams.

Apr 1, 2026 · 82% relevant
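
The idea of retaining only semantically novel frames can be sketched as a greedy epsilon-net over frame embeddings: a frame is stored only if it lies farther than epsilon from everything already stored. This is a generic illustration under that assumption; the function name, the Euclidean distance, and the toy embeddings are hypothetical, and the paper's on-device filter may differ.

```python
import math


def novelty_filter(embeddings, epsilon):
    """Keep an embedding only if it is more than epsilon away from all kept ones."""
    kept = []
    for vec in embeddings:
        if all(math.dist(vec, stored) > epsilon for stored in kept):
            kept.append(vec)
    return kept


# Toy 2-dimensional frame embeddings: two near-duplicates and three distinct scenes.
frames = [(0.0, 0.0), (0.05, 0.0), (1.0, 0.0), (1.02, 0.01), (0.0, 1.0)]
print(novelty_filter(frames, epsilon=0.1))
```

Dropping near-duplicate frames before indexing keeps redundant views of the same scene from filling the top of the result list.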

Meta's QTT Method Fixes Long-Context LLM 'Buried Facts' Problem, Boosts Retrieval Accuracy

Meta researchers identified a failure mode where LLMs with 128K+ context windows miss information buried in the middle of documents. Their Query-only Test-Time Training (QTT) method adapts models at inference, significantly improving retrieval accuracy.

Mar 31, 2026 · 85% relevant

New Benchmark and Methods Target Few-Shot Text-to-Image Retrieval for Complex Queries

Researchers introduce FSIR-BD, a benchmark for few-shot text-to-image retrieval, and two optimization methods to improve performance on compositional and out-of-distribution queries. This addresses a key weakness in pre-trained vision-language models.

Mar 30, 2026 · 86% relevant

Sparton: A New GPU Kernel Dramatically Speeds Up Learned Sparse Retrieval

Researchers propose Sparton, a fused Triton GPU kernel for Learned Sparse Retrieval models like SPLADE. It avoids materializing a massive vocabulary-sized matrix, achieving up to 4.8x speedups and 26x larger batch sizes. This is a core infrastructure breakthrough for efficient AI-powered search.

Mar 27, 2026 · 72% relevant

Federated RAG: A New Architecture for Secure, Multi-Silo Knowledge Retrieval

Researchers propose a secure Federated Retrieval-Augmented Generation (RAG) system using Flower and confidential compute. It enables LLMs to query knowledge across private data silos without centralizing sensitive documents, addressing a major barrier for enterprise AI.

Mar 27, 2026 · 72% relevant

Mistral Forge Targets RAG, Sparking Debate on Custom Models vs. Retrieval

Mistral AI's new 'Forge' platform reportedly focuses on custom model creation, challenging the prevailing RAG paradigm. This reignites the strategic debate between fine-tuning and retrieval-augmented generation for enterprise AI.

Mar 25, 2026 · 95% relevant

flexvec: A New SQL Kernel for Programmable Vector Retrieval

A new research paper introduces flexvec, a retrieval kernel that exposes the embedding matrix and score array as a programmable surface via SQL, enabling complex, real-time query-time operations called Programmatic Embedding Modulation (PEM). This approach allows AI agents to dynamically manipulate retrieval logic and achieves sub-100ms performance on million-scale corpora on a CPU.

Mar 25, 2026 · 76% relevant

New Research Reveals Fundamental Limitations of Vector Embeddings for Retrieval

A new theoretical paper demonstrates that embedding-based retrieval systems have inherent limitations in representing complex relevance relationships, even with simple queries. This challenges the assumption that better training data alone can solve all retrieval problems.

Mar 13, 2026 · 97% relevant

Differentiable Geometric Indexing: A Technical Breakthrough for Generative Retrieval Systems

New research introduces Differentiable Geometric Indexing (DGI), solving core optimization and geometric conflicts in generative retrieval. This enables end-to-end training that better surfaces long-tail items, validated on e-commerce datasets.

Mar 12, 2026 · 79% relevant

Beyond Simple Retrieval: The Rise of Agentic RAG Systems That Think for Themselves

Traditional RAG systems are evolving into 'agentic' architectures where AI agents actively control the retrieval process. A new 5-layer evaluation framework helps developers measure when these intelligent pipelines make better decisions than static systems.

Mar 11, 2026 · 81% relevant

RF-Mem: A Dual-Path Memory Retrieval System for Personalized LLMs

Researchers propose RF-Mem, a memory retrieval system for LLMs that mimics human cognitive processes. It adaptively switches between fast 'familiarity' and deep 'recollection' paths to personalize responses efficiently, outperforming existing methods under constrained budgets.

Mar 11, 2026 · 77% relevant

New Research Shows Pre-Aligned Multi-Modal Models Advance 3D Shape Retrieval from Images

A new arXiv paper demonstrates that pre-aligned image and 3D shape encoders, combined with hard contrastive learning, achieve state-of-the-art performance for image-based shape retrieval. This enables zero-shot retrieval without database-specific training.

Mar 10, 2026 · 75% relevant

Google's STATIC Framework Revolutionizes LLM Retrieval with 948x Speed Boost

Google AI's STATIC framework uses sparse matrix computation to accelerate constrained decoding in generative retrieval systems by up to 948x. This breakthrough enables LLMs to enforce business logic while maintaining real-time performance in recommendation systems.

Mar 1, 2026 · 75% relevant

Beyond Relevance: A New Framework for Utility-Centric Retrieval in the LLM Era

This tutorial paper posits that the rise of Retrieval-Augmented Generation (RAG) changes the fundamental goal of information retrieval. Instead of finding documents relevant to a query, systems must now retrieve information that is most *useful* to an LLM for generating a high-quality answer. This requires new evaluation frameworks and system designs.

Apr 13, 2026 · 92% relevant
