What Happened
Researchers have developed a novel method called structured distillation that dramatically compresses personalized AI agent conversation histories while preserving their utility for retrieval. The core problem is straightforward: as users engage in long conversations with AI agents (like coding assistants, creative collaborators, or customer service bots), the verbatim history becomes unwieldy and expensive to store and process in context windows. This paper presents a solution that distills each conversational exchange into a compact, structured object optimized for later search.
Technical Details
The method transforms each raw exchange in a conversation history into a compound object with four specific fields:
- exchange_core: The essential question and answer.
- specific_context: Key details unique to that exchange.
- thematic_room_assignments: Categorical tags linking the exchange to broader themes or topics.
- files_touched: Specific entities mentioned in the exchange (like filenames, product SKUs, or project names), extracted via regular expressions.
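Using the paper's field names, a distilled exchange can be sketched as a small record. This is a minimal sketch: the regex pattern and the `extract_files` helper are hypothetical illustrations, since the paper's actual extraction patterns are not reproduced here.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern for file-like entities; the paper's actual regexes are not given here.
FILE_PATTERN = re.compile(r"\b[\w./-]+\.(?:py|js|ts|java|go|rs|md|yaml|json)\b")

@dataclass
class DistilledExchange:
    exchange_core: str                    # essential question and answer
    specific_context: str                 # details unique to this exchange
    thematic_room_assignments: list       # categorical theme tags
    files_touched: list = field(default_factory=list)  # regex-extracted entities

def extract_files(verbatim):
    """Pull file-like entities out of a raw exchange (one of the four distilled fields)."""
    return sorted(set(FILE_PATTERN.findall(verbatim)))
```

Concatenating the four fields into one short text per exchange is what produces the searchable retrieval layer described below.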
This structured distillation reduces the average exchange from 371 tokens to just 38 tokens, achieving an 11x compression ratio. The distilled text becomes a searchable "retrieval layer."
The method was evaluated on a dataset of 4,182 conversations (14,340 exchanges) drawn from six software engineering projects. To test whether "personalized recall" (the ability to find relevant past information) survives compression, the team ran 201 recall-oriented queries across 107 different search configurations. These configurations spanned five "pure" search modes (using only distilled or only verbatim data) and five "cross-layer" modes (combining both). Results were graded by five different LLMs, yielding 214,519 consensus-graded query-result pairs.
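The exact rule for combining the five LLM grades into a consensus is not detailed in this summary; a minimal sketch, assuming a simple majority vote over boolean relevance judgments:

```python
def consensus_grade(grades):
    """Simple majority over per-LLM relevance judgments (booleans).

    The paper uses five graders, so an odd count rules out ties. The
    majority-vote rule itself is an assumption here, not taken from the paper.
    """
    return sum(grades) * 2 > len(grades)
```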
Key Findings:
- Retrieval Performance: The best configuration using only the distilled memory achieved 96% of the performance of the best verbatim baseline (Mean Reciprocal Rank, or MRR: 0.717 vs. 0.745).
- Search Mechanism Dependence: Performance is highly dependent on the retrieval algorithm.
- Vector Search (Embeddings): All 20 tested configurations showed no statistically significant degradation after Bonferroni correction.
- BM25 (Keyword Search): All 20 configurations degraded significantly, with effect sizes ranging from |d|=0.031 to 0.756.
- Cross-Layer Potential: The best setup, which combined the distilled layer with access to the original verbatim data for final verification, slightly exceeded the best pure verbatim baseline (MRR 0.759).
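The headline "96%" figure is just the ratio of the two MRR scores. MRR is a standard metric: the average, over queries, of the reciprocal rank of the first relevant result (counted as 0 when none is found).

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """MRR: mean of 1/rank of the first relevant result per query (0 if none is found)."""
    return sum(0.0 if r is None else 1.0 / r for r in first_relevant_ranks) / len(first_relevant_ranks)

# Ratio behind the "96%" claim, using the scores reported above.
retention = 0.717 / 0.745
```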
The authors conclude that structured distillation allows for compressing single-user agent memory "without uniformly sacrificing retrieval quality." The implementation and analysis pipeline have been released as open-source software.
Retail & Luxury Implications
While the research was conducted in a software engineering context, the underlying technology—efficiently managing long-term, personalized interaction history with an AI agent—has direct parallels in high-touch retail and luxury.

1. The Personal Client Advisor Agent: Imagine an AI agent that acts as a 24/7 digital personal shopper for a VIP client. Over months or years, this agent would learn the client's style evolution, size changes, past purchases, disliked materials, family gift preferences, and event history (e.g., "bought a navy suit for the Monaco Grand Prix in 2024"). Storing every conversation verbatim would be prohibitive. Structured distillation could compress this rich history into a searchable profile a fraction of the size, enabling the agent to instantly recall that the client prefers "Italian silk ties for board meetings" or "avoids wool in humid climates" without re-reading thousands of tokens.
2. Hyper-Personalized Customer Service: For brands offering dedicated concierge services via chat, a compressed memory of each customer's entire support history—past issues, resolutions, product registrations, and preferences—could be made available to every agent (human or AI) in real-time. This creates seamless, context-aware service where the customer never has to repeat themselves.
3. Creative and Product Development Collaboration: Design teams might use AI agents as brainstorming partners. Distilling these creative sessions would allow the AI to maintain a coherent "memory" of the project's thematic evolution ("thematic_room_assignments"), rejected concepts ("specific_context"), and the final moodboard or material selections ("files_touched"), all searchable for future inspiration.
Critical Implementation Note for Retail: The research shows vector search is robust to this compression, while keyword (BM25) search is not. This strongly suggests that luxury brands looking to adopt such a system must have a mature embedding strategy. Success depends on moving beyond simple keyword matching to semantic understanding of client needs and product attributes.
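A toy illustration of why lexical scorers suffer under distillation (the example sentences below are invented, not from the paper): rewriting an exchange into compact structured text shrinks the surface vocabulary that BM25 matches on, even when the meaning survives.

```python
def jaccard(a, b):
    """Token-set overlap: a rough proxy for the lexical signal BM25 relies on."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Invented retail-flavored example: the distilled form keeps the meaning
# but shares few surface tokens with the client's original wording.
verbatim = "i never wear wool when i travel somewhere humid it feels awful"
distilled = "preference avoid wool in humid climates"
overlap = jaccard(verbatim, distilled)  # low overlap means a weak keyword-search signal
```

An embedding model would place these two strings close together despite the low token overlap, which is consistent with the paper's finding that vector search is robust to compression where BM25 is not.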
