memory
30 articles about memory in AI news
Memory as a Model: Augmenting LLMs with Trained Memory
Paper augments LLMs with trained memory for long-term recall. Model-agnostic approach stores external knowledge without retraining.
Neo4j's agent-memory: Open-source unified memory for AI agents via knowledge graphs
Neo4j releases agent-memory, an open-source unified memory layer for AI agents using knowledge graphs, enabling persistent structured recall.
Hermes Agent's Three-Tier Memory Cuts Context Bloat, Keeps 2,200-Char Core
Hermes agent's three-tier memory uses two tiny markdown files (2,200 chars), SQLite FTS5 search (10ms over 10K docs), and 8 pluggable providers. The composition solves the always-on vs. deep recall trade-off.
GBrain: Garry Tan's Agent Memory Uses Markdown as System of Record
GBrain is Garry Tan's agent memory system using markdown as the system of record, with a self-wiring knowledge graph and overnight dream cycle.
CLAUDE.md Explained: How Anthropic's Agent Memory Works
CLAUDE.md is Anthropic's project config file for Claude Code, now two years old with settled best practices for agent memory and context.
Roundhill Memory ETF (DRAM) Surges 90% in 36 Days, Fastest ETF Ever
Roundhill Memory ETF surged 90% since April 2, hitting $6.5B assets in 36 days—fastest ETF ever—driven by AI demand for DRAM.
MNEMA: A Witness Lattice for Multi-Agent AI Memory
Today's agentic AI fails three ways: agents miscoordinate, memory gets quietly poisoned, and decisions can't be audited. A new EUMAS 2026 submission argues the fix is to stop treating memory as static records. Make it *living* — every memory unit becomes an autonomous cryptographic witness that interacts with other witnesses (agree, disagree, give birth to new witnesses, split, coalesce, retire), and decisions emerge from a fixed signed protocol rather than from a single orchestrator.
Large Memory Models: New Architecture Beyond RAG and Vector Search
Researchers with 160+ Nature and ICLR publications have built Large Memory Models (LMMs), a new architecture designed to emulate human memory processes, offering an alternative to RAG and vector search paradigms.
AI Memory Survey: Three Systems Needed for Human-Like Recall
A new survey paper proposes that modern AI requires three distinct memory systems—parametric, retrieval, and agent memory—to achieve human-like cognition, highlighting control as the key bottleneck.
Stateless Memory for Enterprise AI Agents: Scaling Without State
The paper replaces stateful agent memory with immutable decision logs using event-sourcing, allowing thousands of concurrent agent instances to scale horizontally without state bottlenecks.
Replace Karpathy's Agent Memory Automation with This 30-Line /close-day Hook
Background automation fails on laptops; use a simple /close-day skill and date tags in MEMORY.md instead.
OpenAI Codex Update Adds macOS Agent, Browser, Memory; 3M Weekly Users
OpenAI released a major Codex update featuring background macOS automation, an in-app browser, persistent memory, and 90+ plugins. With 3M weekly users and nearly half of usage now non-coding, Codex is being repositioned as a general work agent.
Microsoft's MEMENTO Method Reduces LLM Reasoning Memory by 3x
Microsoft researchers introduced MEMENTO, a method where LLMs generate structured 'notes' during multi-step reasoning, reducing the memory footprint of the reasoning process by 3x while maintaining performance. This addresses a key bottleneck in deploying complex reasoning models.
Google's Memory Caching Bridges RNN-Transformer Gap with O(NL) Complexity
Google's 'Memory Caching' method saves RNN memory states at segment boundaries, allowing tokens to reference past checkpoints. This O(NL) approach significantly improves RNN performance on recall tasks, narrowing the gap with Transformers.
Cognee Open-Source Framework Unifies Vector, Graph, and Relational Memory for AI Agents
Developer Akshay Pachaar argues AI agent memory requires three data stores—vector, graph, and relational—to handle semantics, relationships, and provenance. His open-source project Cognee unifies them behind a simple API.
Nvidia to Ship 1.19 Exabytes of HBM in 2026, Apple iPhone Memory 2x Larger
An analysis projects Nvidia will ship ~1.19 exabytes of HBM memory in 2026 for AI infrastructure, while Apple will ship ~2.4 exabytes of LPDDR5 for iPhones, putting AI's massive hardware scale in consumer market perspective.
Claude-Mem Plugin Adds Persistent Memory to Claude Code, Cuts Token Use 10x
Developer Akshay Pachaar released Claude-Mem, a free plugin that adds persistent memory across Claude Code sessions. It captures tool usage and implements a 3-layer retrieval system, saving up to 10x tokens.
Karpathy's LLM Wiki Hits 5k Stars, Gains Memory Lifecycle Extension
Andrej Karpathy's LLM Wiki repository gained 5,000 GitHub stars in two days. A developer has now extended it with memory lifecycle features, addressing a noted gap.
Mind: Open-Source Persistent Memory for AI Coding Agents
An open-source tool called Mind creates a shared memory layer for AI coding agents, allowing them to remember project context across sessions and different interfaces like Claude Code, Cursor, and Windsurf.
Build a Self-Improving Memory Layer for Claude Code with Hooks and RAG
Implement automatic hooks to capture Claude Code's work into a ChromaDB vector store and a CLAUDE.md file, creating a persistent, searchable memory for your project.
MemPalace Hits 96.6% on LongMemEval, Beats Paid AI Memory Tools
MemPalace, an open-source AI memory system built by actress Milla Jovovich and developer Ben Sigman, achieved 96.6% on the LongMemEval benchmark—the highest local-only score ever recorded—using a memory palace architecture that stores all conversations verbatim.
Engramme Building 'Large Memory Models' to Surface Personal Context
Engramme, founded by Gabriel Kreiman, is developing 'Large Memory Models' (LMMs) designed to connect to a user's digital life and surface relevant context without explicit prompting. The goal is to augment human memory by making personal data available at the right moment.
MIA Framework Boosts GPT-5.4 by 9% on LiveVQA with Bidirectional Memory
Researchers introduced Memory Intelligence Agent (MIA), a framework combining parametric and non-parametric memory with test-time learning. It boosts GPT-5.4 by up to 9% on LiveVQA and achieves 31% average improvement across 11 benchmarks.
Nous Research's Hermes Agent Features Self-Improving Skills, Persistent Memory
A new evaluation of Nous Research's Hermes Agent highlights its self-improving ability to build reusable tools from experience and a smarter persistent memory system that conserves token usage. The agent reportedly improves with continued use, representing a shift towards more adaptive AI systems.
Memory Systems for AI Agents: Architectures, Frameworks, and Challenges
A technical analysis details the multi-layered memory architectures—short-term, episodic, semantic, procedural—required to transform stateless LLMs into persistent, reliable AI agents. It compares frameworks like MemGPT and LangMem that manage context limits and prevent memory drift.
Building a Memory Layer for a Voice AI Agent: A Developer's Blueprint
A developer shares a technical case study on building a voice-first journal app, focusing on the critical memory layer. The article details using Redis Agent Memory Server for working/long-term memory and key latency optimizations like streaming APIs and parallel fetches to meet voice's strict responsiveness demands.
MemFactory Framework Unifies Agent Memory Training & Inference, Reports 14.8% Gains Over Baselines
Researchers introduced MemFactory, a unified framework treating agent memory as a trainable component. It supports multiple memory paradigms and shows up to 14.8% relative improvement over baseline methods.
MemoryCD: New Benchmark Tests LLM Agents on Real-World, Lifelong User Memory for Personalization
Researchers introduce MemoryCD, the first large-scale benchmark for evaluating LLM agents' long-context memory using real Amazon user data across 12 domains. It reveals current methods are far from satisfactory for lifelong personalization.
Memory Sparse Attention (MSA) Achieves 100M Token Context with Near-Linear Complexity
A new attention architecture, Memory Sparse Attention (MSA), breaks the 100M token context barrier while maintaining 94% accuracy at 1M tokens. It uses document-wise RoPE and end-to-end sparse attention to outperform RAG systems and frontier models.
Google's TurboQuant Compresses LLM KV Cache 6x with Zero Accuracy Loss, Cutting GPU Memory by 80%
Google researchers introduced TurboQuant, a method that compresses LLM KV cache from 32-bit to 3-bit precision without accuracy degradation. This reduces GPU memory consumption by over 80% and speeds up inference 8x on H100 GPUs.