What Happened
A new research paper, "flexvec: SQL Vector Retrieval with Programmatic Embedding Modulation," was posted to arXiv on March 23, 2026. The work proposes a fundamental shift in how retrieval systems are architected for an AI-agent-driven future. The core thesis is that as AI agents become the primary consumers of retrieval APIs, the traditional "black box" retrieval pipeline is insufficient. Instead, systems should expose their internal state, specifically the embedding matrix and the intermediate score array, as a programmable surface.
The authors introduce flexvec, a retrieval kernel that does exactly this. It allows for Programmatic Embedding Modulation (PEM), defined as composing arithmetic operations on embeddings and scores at query time, before the final selection of results. These operations are integrated into a SQL interface via a "query materializer," creating a set of composable query primitives.
The performance claims are significant for a CPU-based, exact-search system: on a production corpus of 240,000 text chunks, three composed modulations execute in 19 ms end-to-end. Scaling to one million chunks, the same operations take 82 ms—all without resorting to approximate nearest neighbor (ANN) indexing, which trades accuracy for speed.
Technical Details
At its heart, flexvec reimagines the retrieval pipeline. In a standard vector search, a query embedding is compared (e.g., via cosine similarity) against a database of stored embeddings, producing a score array. The top-k items from this array are returned. This process is opaque and fixed.
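As a concrete baseline, that fixed pipeline can be sketched in a few lines of NumPy (the corpus, dimensions, and query here are toy values, not from the paper):

```python
import numpy as np

def top_k(query, matrix, k):
    """Opaque baseline: score every stored vector, return the top-k indices."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q                       # intermediate score array (hidden inside the black box)
    return np.argsort(scores)[::-1][:k]  # fixed selection step

# Toy corpus of four 3-d embeddings.
corpus = np.array([[1.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
hits = top_k(np.array([1.0, 0.05, 0.0]), corpus, k=2)
```

In a conventional system, neither `scores` nor `matrix` is reachable from the outside; only the final `hits` are.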
flexvec breaks open this pipeline:
- Programmable Surfaces: It exposes two key data structures:
- The embedding matrix (all stored vectors).
- The score array (the intermediate similarity scores).
- Programmatic Embedding Modulation (PEM): Callers (like AI agents) can apply arithmetic operations to these surfaces. The paper describes a set of such operations, which could include:
- Weighting: Adjusting scores based on metadata (e.g., boost scores for items tagged "new season").
- Blending: Combining multiple query vectors with different weights before scoring.
- Filtering via Arithmetic: Applying conditional logic directly to the score array (e.g., IF(metadata_field == X, score * 1.5, score * 0.8)).
- SQL Integration: These operations are not just API calls; they are integrated into a SQL interface. A "query materializer" translates high-level SQL-like commands that include PEM operations into efficient execution plans. This makes the modulations declarative, composable, and familiar to developers working with data.
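A minimal sketch of how blending and arithmetic filtering might compose over the exposed score array follows; the function name, tag scheme, and 1.5/0.8 multipliers are illustrative, not flexvec's actual API:

```python
import numpy as np

def modulated_search(queries, weights, matrix, tags, boost_tag, k):
    """Illustrative PEM sketch: blend query vectors, score, then modulate."""
    # Blending: combine multiple query vectors with different weights.
    q = sum(w * v for w, v in zip(weights, queries))
    q = q / np.linalg.norm(q)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q  # the exposed intermediate score array
    # Filtering via arithmetic: IF(tag == boost_tag, score * 1.5, score * 0.8)
    scores = np.where(tags == boost_tag, scores * 1.5, scores * 0.8)
    return np.argsort(scores)[::-1][:k]

corpus = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
tags = np.array(["new-season", "archive", "new-season"])
hits = modulated_search(
    queries=[np.array([1.0, 0.0]), np.array([0.0, 1.0])],
    weights=[0.6, 0.8],
    matrix=corpus, tags=tags, boost_tag="new-season", k=2,
)
```

In this toy example the unmodulated top hit would be index 1 (the "archive" item); the tag-conditional arithmetic promotes the two new-season items instead.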
The result is a system where the retrieval logic is not hard-coded but dynamically programmable at runtime, enabling sophisticated, context-aware search behaviors that are difficult or inefficient to achieve with traditional RAG (Retrieval-Augmented Generation) pipelines.
Retail & Luxury Implications
While the paper is not retail-specific, the implications for high-end retail and luxury AI systems are profound. The core value proposition—dynamic, programmable retrieval—aligns perfectly with the complex, multi-faceted search needs of the sector.

1. Hyper-Personalized & Context-Aware Product Discovery: A luxury shopper's query is never just about a product. It's about heritage, craftsmanship, seasonality, exclusivity, and personal taste. A traditional vector search for "black evening bag" might return a list sorted by generic similarity. With PEM, an AI shopping agent could, in a single retrieval call:
- Boost scores for bags from the current haute couture collection.
- Modulate scores based on the customer's known preference for a specific house (e.g., Dior over Chanel).
- Down-weight or filter out items that are out of stock at the customer's preferred boutique.
- Blend the query for "black evening bag" with an embedding for "timeless classic" if the agent infers the customer is not trend-focused.
This moves search from static matching to dynamic, reasoning-enhanced retrieval.
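The four modulations above can be pictured as composable functions over the score array; the boutique fields, multipliers, and base scores below are invented for illustration and are not part of flexvec:

```python
import numpy as np

def boost(mask, factor):
    """One modulation: multiply scores wherever the metadata mask holds."""
    return lambda scores: np.where(mask, scores * factor, scores)

def compose(*mods):
    """Apply modulations left to right over a score array."""
    def run(scores):
        for mod in mods:
            scores = mod(scores)
        return scores
    return run

base = np.array([0.90, 0.85, 0.80])                 # similarity to "black evening bag"
current_collection = np.array([False, True, False])  # current haute couture collection
preferred_house = np.array([False, True, True])      # customer's preferred house
in_stock = np.array([True, True, False])             # stock at preferred boutique

pipeline = compose(
    boost(current_collection, 1.3),  # collection boost
    boost(preferred_house, 1.1),     # house-preference boost
    boost(~in_stock, 0.0),           # zero out out-of-stock items
)
ranking = np.argsort(pipeline(base))[::-1]
```

The item with the highest raw similarity loses the top spot to the in-collection, preferred-house item, all within a single retrieval call.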
2. Agentic Merchandising & Assortment Planning: AI agents tasked with analyzing market trends or planning assortments could use PEM to perform complex, multi-objective retrieval over internal catalogs and competitor analyses. For example, an agent could retrieve products that are "visually similar to competitor product X but have a higher sustainability score and are priced within our aspirational tier." This requires modulating the similarity score with arithmetic based on metadata attributes—a direct use case for PEM.
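That kind of multi-objective query reduces to score arithmetic over metadata columns; the attribute names, blend weight, and tier bounds here are hypothetical:

```python
import numpy as np

sim_to_x = np.array([0.95, 0.90, 0.70])        # visual similarity to competitor product X
sustainability = np.array([0.40, 0.80, 0.90])  # normalized sustainability score
price = np.array([450.0, 980.0, 2400.0])
in_tier = (price >= 800) & (price <= 2000)     # hypothetical "aspirational tier" bounds

# Weighted blend of objectives, with the tier constraint applied arithmetically.
combined = np.where(in_tier, sim_to_x + 0.5 * sustainability, -np.inf)
best = int(np.argmax(combined))
```

The most visually similar item is excluded by the price constraint, so the agent surfaces the in-tier item with the best blended score.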
3. Operational Knowledge Retrieval: For internal systems, such as retrieving past CRM notes, supply chain documents, or artisan techniques, PEM allows agents to dynamically prioritize information by recency, department relevance, or problem type. A store manager's agent asking "how did we resolve this client complaint before?" could boost cases involving similar product categories and resolutions that earned high satisfaction scores.
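Recency prioritization is itself a one-line modulation on the score array; the half-life and case scores below are invented for illustration:

```python
import numpy as np

sim = np.array([0.82, 0.80, 0.78])       # similarity of past cases to the complaint
age_days = np.array([400.0, 30.0, 5.0])  # age of each case
half_life = 90.0                         # hypothetical decay parameter

# Exponential recency decay: a case loses half its weight every `half_life` days.
scores = sim * 0.5 ** (age_days / half_life)
ranking = np.argsort(scores)[::-1]
```

The raw-similarity order is reversed: the nearly identical but year-old case drops below the two recent ones.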
The SQL interface is a particularly elegant fit for retail, where data teams are already fluent in SQL for analytics. Bridging the gap between vector operations and relational logic could significantly lower the barrier to implementing advanced, agent-driven search.
The Critical Caveat: This is a research paper, not a production library. The 82 ms performance on 1M vectors is impressive for exact search on CPU but remains a benchmark on a specific corpus. Real-world luxury catalogs with multi-modal embeddings (high-resolution images, detailed text) would be larger and more complex. The transition from this promising kernel to a robust, enterprise-grade retrieval system is non-trivial.