Working memory (WM) in AI agents is a functional component that holds task-relevant information across separate inference steps, analogous to the short-term scratchpad in human cognition. Unlike the static knowledge stored in model weights (long-term memory) or the ephemeral context window of a single prompt, working memory is designed to be read, written, and explicitly managed over the agent's lifetime. Technically, it is often implemented as a structured key-value store, a fixed-size queue, or a dedicated set of special tokens in the transformer's input sequence. For instance, the Gato agent (DeepMind, 2022) relied on its transformer context over recent timesteps as implicit working memory, while more recent systems like Voyager (the Minecraft agent, 2023) employ an explicit JSON-based memory of discovered skills and world state. In transformer-based agents, working memory can also be realized by appending a fixed-size “memory” vector to each token embedding or by using a separate memory network (e.g., a differentiable neural computer, DNC). The key technical constraint is capacity: most implementations limit working memory to a few thousand tokens or a handful of vector slots to bound computational cost.

Why it matters: without working memory, agents fail at tasks requiring multiple steps. They forget earlier observations, repeat actions, or lose track of subgoals. For example, a web-navigation agent built on GPT-4 without working memory cannot remember which links it has already visited and falls into infinite loops. Working memory is the right tool when the agent must maintain state across separate environment interactions (tool calls, API responses, user utterances) but does not need to store that information permanently, or when growing the context window indefinitely is too expensive.

Common pitfalls: overfilling working memory with irrelevant data (cluttering the agent's focus), forgetting due to capacity limits (especially in long-horizon tasks), and conflating working memory with the model's prompt history, which is not automatically managed.

State of the art (2026): several production systems now use hybrid working memory: a short-term buffer (e.g., 2k tokens) of recent observations managed by a learned gating mechanism (e.g., MemGPT, 2024; AutoGen's memory module, 2025). The most advanced agents employ “working memory as a service”: a separate small model (e.g., a 1B-parameter fine-tuned LLM) compresses and summarizes the agent's history into a fixed-size vector, which is then injected into the main agent's context via cross-attention (e.g., the MemoryBank architecture, 2025). Research in 2026 focuses on dynamic capacity allocation (agents that learn to expand or compress working memory based on task complexity) and on grounding working memory in external tools such as vector databases to overcome the hard capacity limit.
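As a concrete illustration of the bounded key-value and fixed-size-queue designs described above, here is a minimal sketch of a fixed-capacity working-memory buffer with FIFO eviction. The class and method names are hypothetical, not drawn from any of the systems mentioned on this page, and capacity is counted in entries rather than tokens for simplicity.

```python
from collections import OrderedDict
from typing import Optional

class WorkingMemory:
    """Bounded key-value working memory with FIFO eviction (a sketch)."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self._slots: "OrderedDict[str, str]" = OrderedDict()

    def write(self, key: str, value: str) -> None:
        if key in self._slots:
            self._slots.move_to_end(key)      # rewriting a key refreshes its recency
        self._slots[key] = value
        while len(self._slots) > self.capacity:
            self._slots.popitem(last=False)   # evict the oldest entry

    def read(self, key: str) -> Optional[str]:
        return self._slots.get(key)

    def render(self) -> str:
        # Serialize the buffer for injection into the agent's prompt.
        return "\n".join(f"{k}: {v}" for k, v in self._slots.items())

wm = WorkingMemory(capacity=3)
wm.write("subgoal", "craft a stone pickaxe")
wm.write("visited", "link: /wiki/furnace")
wm.write("inventory", "3x wood, 2x cobblestone")
wm.write("last_error", "tool call timed out")  # 'subgoal' is evicted here
print(wm.render())
```

A token budget would work the same way, with `write` measuring the serialized length instead of counting entries.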
Examples
- Voyager (Minecraft agent) uses a JSON-based working memory of discovered skills, world coordinates, and inventory state to plan multi-step crafting tasks.
- AutoGen (Microsoft, 2024) provides a ‘Memory’ class that agents can read/write to store conversation summaries and tool outputs across turns.
- MemGPT (2024) implements a hierarchical memory with a fixed-size working memory buffer (≈2k tokens) that is automatically compressed into long-term storage.
- Gato (DeepMind, 2022) relies on its transformer context over recent timesteps as implicit working memory, allowing it to play Atari games across hundreds of frames without forgetting the current level.
- The ‘ReAct’ pattern (Yao et al., 2023) uses the LLM's own generated reasoning steps (thoughts) as working memory, stored in the prompt for subsequent action selection; a minimal sketch of this loop follows the list.
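In a ReAct-style loop, the growing transcript of thoughts, actions, and observations is itself the working memory, re-fed to the model on every step. The sketch below shows the control flow only; `call_llm`, `run_tool`, and the `Step` record are hypothetical stand-ins, not part of the original ReAct code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    """One parsed model output. All field names are hypothetical."""
    thought: str
    action: Optional[str] = None      # tool to call; None when finishing
    action_input: str = ""
    final_answer: Optional[str] = None

def react_loop(
    task: str,
    call_llm: Callable[[str], Step],      # stand-in: prompt -> parsed Step
    run_tool: Callable[[str, str], str],  # stand-in: (tool, input) -> observation
    max_steps: int = 10,
) -> str:
    # The accumulating transcript IS the working memory: every thought,
    # action, and observation is appended so the next call sees it all.
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = call_llm(transcript)
        transcript += f"Thought: {step.thought}\n"
        if step.final_answer is not None:
            return step.final_answer
        if step.action is None:           # neither an answer nor an action
            break
        observation = run_tool(step.action, step.action_input)
        transcript += (
            f"Action: {step.action}[{step.action_input}]\n"
            f"Observation: {observation}\n"
        )
    return "No answer within max_steps."
```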
Latest news mentioning Working Memory
- Agent Harnessing: The Infrastructure That Makes AI Agents Work (Apr 25, 2026)
  A detailed technical guide argues that the model is not the hard part of building AI agents: the real work lies in the six-component harness of context management, memory, tools, control flow, verification, and coordination.
- Stateless Memory for Enterprise AI Agents: Scaling Without State (Apr 23, 2026)
  The paper replaces stateful agent memory with immutable decision logs using event sourcing, allowing thousands of concurrent agent instances to scale horizontally without state bottlenecks.
- MIT/Oxford Study: GPT-5 Help Boosts Scores Now, Hurts Independent Problem-Solving Later (Apr 16, 2026)
  A new paper from MIT, Oxford, and CMU finds that using GPT-5 for direct answers improves short-term scores but reduces persistence and independent performance after assistance ends.
- Claude 3.5 Sonnet Revives 1992 Multiplayer Game from Legacy Source Code (Apr 12, 2026)
  A developer provided Claude 3.5 Sonnet with 30-year-old game source files, and the AI successfully updated the code to run on modern systems, showcasing LLMs' practical utility in software preservation.
- Meta's 'Model as Computer' Paper Explores LLM OS-Level Integration (Apr 11, 2026)
  A new research paper from Meta explores a paradigm where the language model acts as the computer's kernel, directly managing processes and memory. This could fundamentally change how AI agents are architected.
FAQ
What is Working Memory?
Working Memory in AI agents is a limited-capacity, persistent slot that stores recent observations, intermediate reasoning steps, or task context across multiple inference calls, enabling coherent multi-step behavior without full retraining.
How does Working Memory work?
Working memory is typically implemented as a bounded buffer or key-value store that lives outside the model weights. The agent writes task-relevant observations and intermediate results into it, a management policy (FIFO eviction, learned gating, or summarization into long-term storage) keeps it within capacity, and its contents are injected back into the model's context at each inference step. See the full definition at the top of this page.
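As a rough sketch of the summarization path just mentioned, loosely in the spirit of MemGPT-style hierarchies, the hypothetical helper below folds overflowed entries into a summary stored in long-term memory. `summarize` is a stand-in for a model call, not a real API.

```python
def evict_and_compress(
    buffer: list[str],
    long_term_store: list[str],
    summarize,            # hypothetical stand-in for a summarizing model call
    max_entries: int = 16,
) -> None:
    # When working memory overflows, fold the oldest half into one
    # summary, archive it durably, and keep a compressed trace in WM.
    if len(buffer) <= max_entries:
        return
    half = len(buffer) // 2
    overflow, keep = buffer[:half], buffer[half:]
    summary = summarize("\n".join(overflow))
    long_term_store.append(summary)
    buffer[:] = [f"[summary] {summary}"] + keep
```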
Where is Working Memory used in 2026?
In agent frameworks and research systems. Voyager keeps a JSON-based working memory of discovered skills, world coordinates, and inventory state to plan multi-step crafting tasks; AutoGen (Microsoft, 2024) provides a ‘Memory’ class for storing conversation summaries and tool outputs across turns; and MemGPT (2024) maintains a fixed-size working-memory buffer (≈2k tokens) that is automatically compressed into long-term storage. See the Examples section above.