Working memory (WM) in AI agents is a functional component that holds task-relevant information across separate inference steps, analogous to the short-term scratchpad in human cognition. Unlike the static knowledge stored in model weights (long-term memory) or the ephemeral context window of a single prompt, working memory is designed to be read, written, and explicitly managed over the agent's lifetime. Technically, it is often implemented as a structured key-value store, a fixed-size queue, or a dedicated set of special tokens in the transformer's input sequence. For instance, the Gato agent (DeepMind, 2022) relied on its transformer context over recent timesteps as implicit working memory, while more recent systems like Voyager (the Minecraft agent, 2023) employ an explicit JSON-based memory of discovered skills and world state. In transformer-based agents, working memory can also be realized by appending a fixed-size “memory” vector to each token embedding or by using a separate memory network (e.g., a differentiable neural computer, DNC). The key technical constraint is capacity: most implementations limit working memory to a few thousand tokens or a handful of vector slots to bound computational cost.

Why it matters: without working memory, agents fail at tasks requiring multiple steps. They forget earlier observations, repeat actions, or lose track of subgoals. For example, a web-navigation agent built on GPT-4 without working memory cannot remember which links it has already visited and falls into infinite loops. Working memory is the right tool when the agent must maintain state across separate environment interactions (tool calls, API responses, user utterances) but does not need to store that information permanently, or when growing the context window indefinitely is too expensive.

Common pitfalls: overfilling working memory with irrelevant data (cluttering the agent's focus), forgetting due to capacity limits (especially in long-horizon tasks), and conflating working memory with the model's prompt history, which is not automatically managed.

State of the art (2026): several production systems now use hybrid working memory: a short-term buffer (e.g., 2k tokens) of recent observations managed by a learned gating mechanism (e.g., MemGPT, 2024; AutoGen's memory module, 2025). The most advanced agents employ “working memory as a service”: a separate small model (e.g., a 1B-parameter fine-tuned LLM) compresses and summarizes the agent's history into a fixed-size vector, which is then injected into the main agent's context via cross-attention (e.g., the MemoryBank architecture, 2025). Research in 2026 focuses on dynamic capacity allocation (agents that learn to expand or compress working memory based on task complexity) and on grounding working memory in external tools such as vector databases to overcome the hard capacity limit.
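As a concrete illustration of the bounded key-value and fixed-size-queue designs described above, here is a minimal sketch of a fixed-capacity working-memory buffer with FIFO eviction. The class and method names are hypothetical, not drawn from any of the systems mentioned on this page, and capacity is counted in entries rather than tokens for simplicity.

```python
from collections import OrderedDict
from typing import Optional

class WorkingMemory:
    """Bounded key-value working memory with FIFO eviction (a sketch)."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self._slots: "OrderedDict[str, str]" = OrderedDict()

    def write(self, key: str, value: str) -> None:
        if key in self._slots:
            self._slots.move_to_end(key)      # rewriting a key refreshes its recency
        self._slots[key] = value
        while len(self._slots) > self.capacity:
            self._slots.popitem(last=False)   # evict the oldest entry

    def read(self, key: str) -> Optional[str]:
        return self._slots.get(key)

    def render(self) -> str:
        # Serialize the buffer for injection into the agent's prompt.
        return "\n".join(f"{k}: {v}" for k, v in self._slots.items())

wm = WorkingMemory(capacity=3)
wm.write("subgoal", "craft a stone pickaxe")
wm.write("visited", "link: /wiki/furnace")
wm.write("inventory", "3x wood, 2x cobblestone")
wm.write("last_error", "tool call timed out")  # 'subgoal' is evicted here
print(wm.render())
```

A token budget would work the same way, with `write` measuring the serialized length instead of counting entries.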
Examples
- Voyager (Minecraft agent) uses a JSON-based working memory of discovered skills, world coordinates, and inventory state to plan multi-step crafting tasks.
- AutoGen (Microsoft, 2024) provides a ‘Memory’ class that agents can read/write to store conversation summaries and tool outputs across turns.
- MemGPT (2024) implements a hierarchical memory with a fixed-size working memory buffer (≈2k tokens) that is automatically compressed into long-term storage.
- Gato (DeepMind, 2022) relies on its transformer context over recent timesteps as implicit working memory, allowing it to play Atari games across hundreds of frames without forgetting the current level.
- The ‘ReAct’ pattern (Yao et al., 2023) uses the LLM's own generated reasoning steps (thoughts) as working memory, stored in the prompt for subsequent action selection; a minimal sketch of this loop follows the list.
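In a ReAct-style loop, the growing transcript of thoughts, actions, and observations is itself the working memory, re-fed to the model on every step. The sketch below shows the control flow only; `call_llm`, `run_tool`, and the `Step` record are hypothetical stand-ins, not part of the original ReAct code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    """One parsed model output. All field names are hypothetical."""
    thought: str
    action: Optional[str] = None      # tool to call; None when finishing
    action_input: str = ""
    final_answer: Optional[str] = None

def react_loop(
    task: str,
    call_llm: Callable[[str], Step],      # stand-in: prompt -> parsed Step
    run_tool: Callable[[str, str], str],  # stand-in: (tool, input) -> observation
    max_steps: int = 10,
) -> str:
    # The accumulating transcript IS the working memory: every thought,
    # action, and observation is appended so the next call sees it all.
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = call_llm(transcript)
        transcript += f"Thought: {step.thought}\n"
        if step.final_answer is not None:
            return step.final_answer
        if step.action is None:           # neither an answer nor an action
            break
        observation = run_tool(step.action, step.action_input)
        transcript += (
            f"Action: {step.action}[{step.action_input}]\n"
            f"Observation: {observation}\n"
        )
    return "No answer within max_steps."
```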
Latest news mentioning Working Memory
- Agent Harnessing: The Infrastructure That Makes AI Agents Work (Apr 25, 2026)
  A detailed technical guide argues that the model is not the hard part of building AI agents: the real work lies in the six-component harness of context management, memory, tools, control flow, verification, and coordination.
- Stateless Memory for Enterprise AI Agents: Scaling Without State (Apr 23, 2026)
  The paper replaces stateful agent memory with immutable decision logs using event sourcing, allowing thousands of concurrent agent instances to scale horizontally without state bottlenecks.
- MIT/Oxford Study: GPT-5 Help Boosts Scores Now, Hurts Independent Problem-Solving Later (Apr 16, 2026)
  A new paper from MIT, Oxford, and CMU finds that using GPT-5 for direct answers improves short-term scores but reduces persistence and independent performance after assistance ends.
- Claude 3.5 Sonnet Revives 1992 Multiplayer Game from Legacy Source Code (Apr 12, 2026)
  A developer provided Claude 3.5 Sonnet with 30-year-old game source files, and the AI successfully updated the code to run on modern systems, showcasing LLMs' practical utility in software preservation.
- Meta's 'Model as Computer' Paper Explores LLM OS-Level Integration (Apr 11, 2026)
  A new research paper from Meta explores a paradigm where the language model acts as the computer's kernel, directly managing processes and memory. This could fundamentally change how AI agents are architected.
FAQ
What is Working Memory?
Working Memory in AI agents is a limited-capacity, persistent slot that stores recent observations, intermediate reasoning steps, or task context across multiple inference calls, enabling coherent multi-step behavior without full retraining.
How does Working Memory work?
Working memory is typically implemented as a bounded buffer or key-value store that lives outside the model weights. The agent writes task-relevant observations and intermediate results into it, a management policy (FIFO eviction, learned gating, or summarization into long-term storage) keeps it within capacity, and its contents are injected back into the model's context at each inference step. See the full definition at the top of this page.
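As a rough sketch of the summarization path just mentioned, loosely in the spirit of MemGPT-style hierarchies, the hypothetical helper below folds overflowed entries into a summary stored in long-term memory. `summarize` is a stand-in for a model call, not a real API.

```python
def evict_and_compress(
    buffer: list[str],
    long_term_store: list[str],
    summarize,            # hypothetical stand-in for a summarizing model call
    max_entries: int = 16,
) -> None:
    # When working memory overflows, fold the oldest half into one
    # summary, archive it durably, and keep a compressed trace in WM.
    if len(buffer) <= max_entries:
        return
    half = len(buffer) // 2
    overflow, keep = buffer[:half], buffer[half:]
    summary = summarize("\n".join(overflow))
    long_term_store.append(summary)
    buffer[:] = [f"[summary] {summary}"] + keep
```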
Where is Working Memory used in 2026?
In agent frameworks and research systems. Voyager keeps a JSON-based working memory of discovered skills, world coordinates, and inventory state to plan multi-step crafting tasks; AutoGen (Microsoft, 2024) provides a ‘Memory’ class for storing conversation summaries and tool outputs across turns; and MemGPT (2024) maintains a fixed-size working-memory buffer (≈2k tokens) that is automatically compressed into long-term storage. See the Examples section above.