Retrieval-Augmented LLM Agents: Learning to Learn from Experience
A new research paper proposes a systematic framework for improving the generalization of large language model (LLM) agents by combining supervised fine-tuning with training-free, memory-augmented generation over retrieved experience. The work, "Retrieval-Augmented LLM Agents: Learning to Learn from Experience," addresses a core limitation in current agent development: the lack of robust performance on tasks not seen during training.
The Core Problem: Generalization in LLM Agents
While LLMs have become the foundation for general-purpose agents, their ability to generalize to novel tasks remains inconsistent. Current methodologies typically fall into two categories:
- Supervised Fine-Tuning (SFT): Trains the model on a specific dataset of task demonstrations. While it can achieve high performance on seen tasks, it often fails to extrapolate effectively to new, unseen task distributions.
- Training-Free Experience Retrieval: Augments the LLM's context window with relevant past successful trajectories (sequences of actions and observations) retrieved from a memory bank. This approach is more flexible but frequently underperforms compared to supervised baselines, as the model is not explicitly trained to utilize this retrieved information effectively.
The paper posits that neither approach alone is sufficient for building agents that can reliably "learn to learn" from past experience.
What the Researchers Built: A Combined Training Pipeline
The core contribution is a pipeline that integrates experience retrieval directly into the fine-tuning process. The methodology is broken down into three systematic components:

- A Robust SFT Recipe: The researchers first established a strong supervised fine-tuning baseline using Low-Rank Adaptation (LoRA). This recipe was designed to outperform several existing state-of-the-art agent training pipelines, providing a solid foundation.
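The low-rank update at the heart of LoRA can be sketched in a few lines. This is a minimal NumPy illustration of the idea, not the paper's actual training code; all names and dimensions here are illustrative:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass through a linear layer with a LoRA update.

    Instead of fine-tuning the full weight matrix W (d_out x d_in),
    LoRA learns two small matrices A (r x d_in) and B (d_out x r)
    and adds the scaled low-rank product B @ A to the frozen W.
    """
    r = A.shape[0]                      # LoRA rank
    delta_W = (alpha / r) * (B @ A)     # low-rank weight update
    return x @ (W + delta_W).T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weights
A = rng.normal(size=(r, d_in))         # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection (init to 0)
x = rng.normal(size=(1, d_in))

# With B initialized to zero, the LoRA model matches the base model exactly,
# so fine-tuning starts from the pretrained behavior.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

Because only A and B are trained, the number of trainable parameters scales with the rank r rather than with the full weight matrix, which is what makes the recipe parameter-efficient.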
- Analysis of Experience Retrieval Design: The paper provides a detailed ablation study of the key design choices for a retrieval system:
  - Storage: What format of successful trajectories (e.g., full interaction history, summarized steps) should be stored in the memory bank?
  - Querying: How should the current task or state be embedded to retrieve the most relevant past experiences?
  - Trajectory Selection: How many retrieved examples are optimal, and how should they be ranked or filtered before being placed in the context window?
The study identifies optimal strategies for each of these components.
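The three design choices above can be made concrete with a toy memory bank. The storage format, embedding function, and k below are exactly the knobs the paper ablates; this sketch is illustrative and assumes embeddings are already computed:

```python
import numpy as np

class ExperienceBank:
    """A toy memory bank: stores (embedding, trajectory) pairs and
    retrieves the top-k most similar past successes by cosine similarity."""

    def __init__(self):
        self.embeddings = []    # one vector per stored trajectory (querying)
        self.trajectories = []  # e.g. summarized action/observation steps (storage)

    def store(self, embedding, trajectory):
        # Normalize once at storage time so retrieval is a dot product.
        self.embeddings.append(embedding / np.linalg.norm(embedding))
        self.trajectories.append(trajectory)

    def retrieve(self, query, k=2):
        # Trajectory selection: rank by cosine similarity, keep the best k.
        q = query / np.linalg.norm(query)
        sims = np.array(self.embeddings) @ q
        top = np.argsort(-sims)[:k]
        return [self.trajectories[i] for i in top]

bank = ExperienceBank()
bank.store(np.array([1.0, 0.0]), "navigate: go north -> pick up key")
bank.store(np.array([0.0, 1.0]), "cook: open fridge -> slice tomato")
similar = bank.retrieve(np.array([0.9, 0.1]), k=1)  # nearest past success
```

A production system would swap the list for an approximate nearest-neighbor index, but the interface — store successes, retrieve by similarity, select the top few — is the same.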
- Integrated Fine-Tuning Pipeline: The final and key proposal is a training pipeline where the LLM agent is fine-tuned not just on task demonstrations, but on demonstrations that are augmented with retrieved relevant experiences. This teaches the model to condition its responses on both the task instruction and helpful in-context examples of similar past successes.
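Constructing one such retrieval-augmented training example amounts to prepending retrieved trajectories to the task instruction. The template below is a hypothetical illustration, not the paper's exact prompt format:

```python
def build_augmented_example(instruction, retrieved, response):
    """Assemble one SFT example whose prompt includes retrieved past
    trajectories as in-context demonstrations of similar successes."""
    blocks = [f"### Past experience {i + 1}:\n{traj}"
              for i, traj in enumerate(retrieved)]
    prompt = "\n\n".join(blocks + [f"### Task:\n{instruction}"])
    return {"prompt": prompt, "response": response}

example = build_augmented_example(
    instruction="Find the blue mug in the kitchen.",
    retrieved=["look around -> open cabinet -> found red mug"],
    response="look around -> open cabinet -> found blue mug",
)
```

Fine-tuning on examples built this way is what teaches the model to condition on retrieved experience rather than merely tolerate it in the context window.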
Key Results and Implications
The results demonstrate that the combined approach leads to significant improvements in generalization to unseen tasks compared to using either fine-tuning or experience retrieval in isolation. By training the model to leverage retrieved trajectories, the agent learns a more robust policy that can adapt to novel situations by analogizing to stored knowledge.

The framework is presented as scalable and effective, moving beyond the trade-off between specialization (via fine-tuning) and flexibility (via retrieval). It provides a concrete path toward agents that can continuously improve their performance by learning from their own expanding history of successful interactions.
Technical Context and Method
The work is situated within the growing field of retrieval-augmented generation (RAG) for agents, not just for question-answering. By using LoRA for efficient fine-tuning, the method remains parameter-efficient. The systematic analysis of retrieval design choices—storage, querying, selection—provides practical engineering guidance that has often been missing from prior work, which frequently treats the retrieval component as a black box.

The proposed pipeline essentially operationalizes meta-learning or "learning to learn" for LLM agents. The model is trained on a distribution of tasks where part of the learning objective is to effectively use provided in-context examples (retrieved experiences). This improves its ability to perform the same skill—leveraging examples—at test time on new tasks.
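In standard SFT implementations of this idea, the retrieved experiences and the instruction form the prompt, and the loss is masked so only response tokens are supervised; the retrieved trajectories then act purely as conditioning context. A minimal sketch of that label masking, assuming token IDs are already produced by some tokenizer (the convention of masking with -100 follows common deep-learning frameworks, not anything specific to this paper):

```python
def make_labels(prompt_ids, response_ids, ignore_index=-100):
    """Concatenate prompt and response token IDs and build labels that
    mask out the prompt, so the loss is computed only on the response."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [ignore_index] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Prompt tokens (retrieved experiences + instruction) contribute no loss;
# only the response tokens are learning targets.
input_ids, labels = make_labels([101, 102, 103], [7, 8])
```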
Paper: Ferraz, T. P. "Retrieval-Augmented LLM Agents: Learning to Learn from Experience." arXiv preprint arXiv:2603.18272 (2026).





