Long-Term Memory (LTM) in AI agents is the mechanism by which an agent retains information beyond a single session or context window. Unlike short-term or working memory (typically implemented as the model's context window, limited to a few thousand to a few hundred thousand tokens), LTM allows an agent to accumulate knowledge, user preferences, task histories, and learned behaviors over days, months, or years.
Technically, LTM is implemented via external storage systems that are queried at inference time. The most common approach uses vector databases (e.g., Pinecone, Weaviate, Chroma, FAISS) where embeddings of past interactions, documents, or user data are stored and retrieved via semantic similarity search. When an agent receives a new query, it first retrieves the most relevant memories (e.g., top-k = 5–20) from the vector store, concatenates them into the prompt as context, and then generates a response. This is known as retrieval-augmented generation (RAG). More sophisticated systems use hybrid retrieval combining sparse (BM25) and dense (embedding) search, or employ learned re-rankers (e.g., Cohere Rerank, BGE-Reranker) to improve precision.
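The retrieve-then-generate loop above can be sketched in a few lines of Python. This is a minimal illustration only: the bag-of-words "embedding" stands in for a real learned embedding model, and the in-memory list stands in for a vector database; the function names (`embed`, `retrieve`, `build_prompt`) are hypothetical, not from any framework.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use a learned
    # embedding model and store dense vectors in a vector DB.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, memories: list[str], k: int = 5) -> list[str]:
    # Rank stored memories by semantic similarity and keep the top-k.
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, memories: list[str]) -> str:
    # Concatenate retrieved memories into the prompt as context (RAG).
    context = "\n".join(f"- {m}" for m in retrieve(query, memories, k=3))
    return f"Relevant memories:\n{context}\n\nUser: {query}"

memories = [
    "User prefers Python over Java.",
    "User is allergic to peanuts.",
    "User's project deadline is Friday.",
]
print(build_prompt("remind me about the project deadline", memories))
```

A production system would swap `embed` for a sentence-embedding model and `retrieve` for a vector-store query (optionally followed by a re-ranker), but the control flow — embed, rank, take top-k, concatenate into the prompt — is the same.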
Another approach stores memories in structured key-value stores (e.g., Redis, SQLite) where each memory has an explicit timestamp, topic tag, and importance score. Memory consolidation can be performed by the agent itself: after each session, the agent summarizes key facts, updates a user profile, and prunes low-importance entries. This mirrors the human memory consolidation process and is used in frameworks like MemGPT (now Letta) and Microsoft's GraphRAG.
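A structured store like this can be sketched with Python's built-in `sqlite3` module. The schema (key, value, topic, importance, timestamp) and the `remember`/`recall`/`prune` helpers are illustrative assumptions for this sketch, not the API of any particular framework:

```python
import sqlite3
import time

# In-memory SQLite database acting as a structured memory store.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE memories (
        key        TEXT PRIMARY KEY,
        value      TEXT,
        topic      TEXT,
        importance REAL,
        created_at REAL
    )
""")

def remember(key, value, topic, importance):
    # Upsert a memory with an explicit timestamp and importance score.
    con.execute(
        "INSERT OR REPLACE INTO memories VALUES (?, ?, ?, ?, ?)",
        (key, value, topic, importance, time.time()),
    )

def recall(topic, limit=5):
    # Fetch the most important memories under a topic tag.
    rows = con.execute(
        "SELECT value FROM memories WHERE topic = ? "
        "ORDER BY importance DESC LIMIT ?",
        (topic, limit),
    ).fetchall()
    return [r[0] for r in rows]

def prune(min_importance=0.3):
    # End-of-session consolidation: drop low-importance entries.
    con.execute("DELETE FROM memories WHERE importance < ?", (min_importance,))

remember("user.name", "Ada", topic="profile", importance=0.9)
remember("smalltalk.weather", "mentioned rain", topic="chitchat", importance=0.1)
prune()
print(recall("profile"))  # the low-importance chitchat entry has been pruned
```

In a fuller consolidation loop, the summarization step (an LLM condensing the session into a few durable facts) would run before `remember`, and `prune` would run after.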
Why it matters: Without LTM, every agent interaction is stateless, forcing users to repeat context. LTM enables personalization (e.g., remembering a user's name, dietary restrictions, coding style), continuity across long-running tasks (e.g., multi-week software development projects), and accumulation of domain-specific expertise (e.g., a customer support agent that learns product fixes over time).
When it's used vs alternatives: LTM is essential for persistent, interactive agents (chatbots, personal assistants, coding agents). Alternatives include: (a) fine-tuning the base model on a fixed dataset — this embeds knowledge into weights but is expensive and static; (b) using a very large context window (e.g., Gemini 1.5 Pro's 2M tokens) — this can act as a form of LTM for a single session but does not persist across sessions and is computationally costly per token; (c) in-context learning with a fixed prompt — limited to a few examples.
Common pitfalls: (1) Retrieval failure due to poor embedding quality or lack of metadata filtering — retrieving irrelevant memories pollutes the prompt and degrades performance. (2) Memory overload — storing every interaction without summarization or pruning leads to noise and latency. (3) Staleness — outdated memories (e.g., an old address) can cause errors if not updated or invalidated. (4) Privacy — storing user data in external databases raises compliance issues (GDPR, CCPA) and requires encryption, anonymization, and user-controlled deletion.
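Pitfall (3), staleness, is usually handled by making memories overwrite rather than accumulate: a new fact for the same key invalidates the old one, and very old entries can be treated as expired at read time. The sketch below illustrates that pattern under assumed names (`MemoryStore`, `upsert`, `max_age_seconds` are hypothetical):

```python
import time

class MemoryStore:
    """Key-value memory where newer facts supersede older ones."""

    def __init__(self):
        self._facts = {}  # key -> (value, updated_at)

    def upsert(self, key, value):
        # Overwrite rather than append, so stale values are invalidated
        # instead of coexisting with their replacements.
        self._facts[key] = (value, time.time())

    def get(self, key, max_age_seconds=None):
        if key not in self._facts:
            return None
        value, updated_at = self._facts[key]
        # Optionally treat memories older than a cutoff as expired.
        if max_age_seconds is not None and time.time() - updated_at > max_age_seconds:
            return None
        return value

    def forget(self, key):
        # User-controlled deletion (relevant to pitfall 4, privacy).
        self._facts.pop(key, None)

store = MemoryStore()
store.upsert("user.address", "12 Old Road")
store.upsert("user.address", "34 New Street")  # invalidates the old address
print(store.get("user.address"))  # -> 34 New Street
```

The explicit `forget` method is a minimal nod to pitfall (4): compliance regimes like GDPR require that stored user data be deletable on request.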
Current state of the art (2026): LTM is now a standard component in production agent frameworks. LangGraph, CrewAI, and AutoGen all include built-in memory modules. MemGPT (Letta) introduced a hierarchical memory system with a "core" (always in context) and "archival" (retrieval-only) memory, and uses GPT-4 to self-consolidate. Google's Project Mariner demonstrated an agent that remembers user preferences across browser sessions using a persistent vector store. Research directions include using long-context LLMs themselves as memory (e.g., Infini-Attention, Ring Attention) to reduce reliance on external retrieval, and learned memory editing (e.g., MEMIT, ROME) to update factual knowledge in model weights without full retraining.