memory & context
30 articles about memory & context in AI news
Engramme Building 'Large Memory Models' to Surface Personal Context
Engramme, founded by Gabriel Kreiman, is developing 'Large Memory Models' (LMMs) designed to connect to a user's digital life and surface relevant context without explicit prompting. The goal is to augment human memory by making personal data available at the right moment.
Memory Sparse Attention (MSA) Achieves 100M Token Context with Near-Linear Complexity
A new attention architecture, Memory Sparse Attention (MSA), breaks the 100M token context barrier while maintaining 94% accuracy at 1M tokens. It uses document-wise RoPE and end-to-end sparse attention to outperform RAG systems and frontier models.
Stop Pasting Context: Add Persistent Memory to Claude Code with Bossa MCP
Bossa MCP gives Claude Code persistent filesystem memory across sessions, eliminating repetitive context pasting and enabling smarter progressive disclosure.
Memory Sparse Attention (MSA) Enables 100M Token Context Windows with Minimal Performance Loss
Memory Sparse Attention (MSA) is a proposed architecture that allows AI models to store and reason over massive long-term memory directly within their attention mechanism, eliminating the need for external retrieval systems. The approach reportedly enables context windows of up to 100 million tokens with minimal performance degradation.
Hermes Agent's Three-Tier Memory Cuts Context Bloat, Keeps 2,200-Char Core
Hermes agent's three-tier memory uses two tiny markdown files (2,200 chars), SQLite FTS5 search (10ms over 10K docs), and 8 pluggable providers. The composition solves the always-on vs. deep recall trade-off.
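As a rough illustration of how a small always-loaded core and a full-text search tier might fit together, here is a minimal sketch using Python's built-in sqlite3 module. It is not Hermes's actual code: the names `trim_core`, `build_index`, `search`, and the `notes` table are hypothetical, only the 2,200-character figure comes from the summary above, and it assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3

CORE_LIMIT = 2200  # character budget for always-loaded core memory (figure from the summary)


def trim_core(text, limit=CORE_LIMIT):
    """Keep core memory within the budget by dropping the oldest lines first (assumed policy)."""
    lines = text.splitlines()
    while lines and len("\n".join(lines)) > limit:
        lines.pop(0)
    return "\n".join(lines)


def build_index(docs):
    """Create an in-memory FTS5 index over (title, body) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE notes USING fts5(title, body)")
    conn.executemany("INSERT INTO notes (title, body) VALUES (?, ?)", docs)
    return conn


def search(conn, query, k=3):
    """Return up to k matching titles, best match first (FTS5's built-in rank)."""
    rows = conn.execute(
        "SELECT title FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    )
    return [row[0] for row in rows]
```

The core file stays small enough to load on every turn, while the search tier is queried only when deeper recall is needed.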
Stop Losing Agent Context: Implement Session Memory Files in Your Claude
A simple pattern using structured markdown files to persist session state across context windows, preventing Claude Code agents from redoing work or making inconsistent decisions.
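One way such a session file could be written and read back is sketched below. The section headings ("Done", "Decisions", "Next steps") and the function names are assumptions made for illustration, not the layout the article prescribes.

```python
from pathlib import Path


def save_session(path, done, decisions, next_steps):
    """Write session state as a small structured markdown file."""
    sections = {"Done": done, "Decisions": decisions, "Next steps": next_steps}
    lines = ["# Session state"]
    for heading, items in sections.items():
        lines.append(f"\n## {heading}")
        lines.extend(f"- {item}" for item in items)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_session(path):
    """Parse the file back into {heading: [items]}; return {} if the file is missing."""
    p = Path(path)
    if not p.exists():
        return {}
    state, current = {}, None
    for line in p.read_text(encoding="utf-8").splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            state[current] = []
        elif line.startswith("- ") and current is not None:
            state[current].append(line[2:])
    return state
```

An agent would call `load_session` at the start of a new context window and `save_session` before the current one ends, so finished work and earlier decisions carry over.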
Codex 'Chronicle' Research Preview Adds Memory for Daily Developer Context
A research preview of 'Chronicle' for Codex has been released. It enables the AI coding assistant to accumulate memories from a developer's daily workflow to improve context.
Anthropic Democratizes AI Memory: Claude's Free Tier Gets Contextual Recall
Anthropic has expanded access to Claude's memory feature, making it available to all free users. This strategic move coincides with new tools to import conversations from rival chatbots, positioning Claude as a more personalized and sticky alternative in the competitive AI assistant market.
Beyond Single Prompts: How 'Codified Context' Solves AI's Memory Problem in Large-Scale Development
A new research paper reveals why single-file AI agent instructions fail for complex projects and introduces a three-tier memory architecture that successfully managed a 108,000-line distributed system. The approach replaces simple prompts with structured, evolving documentation that becomes load-bearing infrastructure for AI development.
Sleep Phase Cuts Transformer Costs by Consolidating Memory
A paper proposes a sleep phase that consolidates context into fixed-size memory, reducing inference cost while improving long-horizon task performance on GSM-Infinite.
CLAUDE.md Explained: How Anthropic's Agent Memory Works
CLAUDE.md is Anthropic's project config file for Claude Code, now two years old with settled best practices for agent memory and context.
Mind: Open-Source Persistent Memory for AI Coding Agents
An open-source tool called Mind creates a shared memory layer for AI coding agents, allowing them to remember project context across sessions and different interfaces like Claude Code, Cursor, and Windsurf.
Memory Systems for AI Agents: Architectures, Frameworks, and Challenges
A technical analysis details the multi-layered memory architectures (short-term, episodic, semantic, procedural) required to transform stateless LLMs into persistent, reliable AI agents. It compares frameworks like MemGPT and LangMem that manage context limits and prevent memory drift.
How to Run Claude Code 24/7 Without Burning Your Context Window
Implement a hard 50K token session cap and a three-tier memory system (daily notes, MEMORY.md, PARA knowledge graph) to prevent context bloat and memory decay in long-running Claude Code agents.
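A hard session cap of this kind could be enforced with a small counter like the sketch below. The `SessionBudget` class is hypothetical, and the four-characters-per-token estimate is a rough assumption standing in for a real tokenizer. Only the 50K figure comes from the summary above.

```python
class SessionBudget:
    """Track an approximate token count and flag when a hard cap is reached."""

    def __init__(self, cap=50_000, chars_per_token=4):
        self.cap = cap
        self.chars_per_token = chars_per_token  # rough heuristic, not a real tokenizer
        self.used = 0

    def add(self, text):
        """Record a message; return True while the session is still under the cap."""
        self.used += -(-len(text) // self.chars_per_token)  # ceiling division
        return self.used < self.cap

    def reset(self):
        """Start a fresh session, e.g. after notes have been written to a memory file."""
        self.used = 0
```

When `add` returns False, the agent would write its summary to the daily notes, call `reset`, and continue in a new session.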
MemoryCD: New Benchmark Tests LLM Agents on Real-World, Lifelong User Memory for Personalization
Researchers introduce MemoryCD, the first large-scale benchmark for evaluating LLM agents' long-context memory using real Amazon user data across 12 domains. It reveals current methods are far from satisfactory for lifelong personalization.
Add Vector Memory to Claude Code: The claude-memory-mcp Server Solves CLAUDE.md's 200-Line Limit
Install this open-source MCP server to give Claude Code persistent, searchable memory across projects. It surfaces only relevant context, solving CLAUDE.md's scaling problems.
Add Persistent Memory to Claude Code in 5 Minutes with memoclaw-mcp
Stop re-explaining your preferences. Install the memoclaw-mcp server to give Claude Code persistent, semantic memory across sessions using the Model Context Protocol.
Google's TurboQuant Cuts LLM KV Cache Memory by 6x, Enables 3-Bit Storage Without Accuracy Loss
Google released TurboQuant, a novel two-stage quantization algorithm that compresses the KV cache in long-context LLMs. It reduces memory by 6x, achieves 3-bit storage with no accuracy drop, and speeds up attention scoring by up to 8x on H100 GPUs.
Context Graph for Agentic Coding: A New Abstraction for LLM-Powered Development
A new "context graph" abstraction is emerging for AI coding agents, designed to manage project state and memory across sessions. It aims to solve the persistent context problem in long-running development tasks.
The File Paradigm: How Simple File Systems Could Revolutionize AI Context Management
New research proposes treating all AI context as files within a unified system, potentially solving memory and organization challenges in complex AI workflows. This approach could dramatically simplify how AI systems access and manage information.
The Unix Philosophy Returns: How File Systems Could Solve AI's Memory Crisis
A new research paper proposes treating AI context management like a Unix file system, with OpenClaw demonstrating that storing memory, tools, and knowledge as files creates traceable, auditable AI systems. This approach could solve fragmentation and transparency issues plaguing current agent frameworks.
Google's 'Always-On Memory Agent' Could Revolutionize How AI Remembers and Learns
Google has unveiled an experimental 'Always-On Memory Agent' system that gives AI persistent, evolving memory capabilities. This breakthrough could transform how AI assistants learn from continuous interactions and maintain context across sessions.
Anthropic's Memory Transfer Feature Escalates AI Personalization Race
Anthropic has launched a memory feature allowing users to transfer context and preferences from other AI tools directly into Claude. This enables seamless continuation of conversations with retained context across platforms, available to all paid subscribers.
Claude Code Gains Auto-Memory: A Game-Changer for AI-Assisted Programming
Anthropic's Claude Code now features auto-memory capabilities, allowing the AI to retain context across coding sessions. This breakthrough addresses a fundamental limitation in AI programming assistants by creating persistent memory of project details, preferences, and patterns.
Claude Code's Auto-Memory: The AI Assistant That Remembers Your Entire Project
Anthropic's Claude Code now features auto-memory capabilities, allowing the AI coding assistant to retain context across sessions and recall project details automatically. This breakthrough addresses a fundamental limitation of current AI tools and could transform developer workflows.
Physics-Inspired AI Memory: How Continuous Fields Could Solve AI's Forgetting Problem
Researchers have developed a revolutionary memory system for AI agents that treats information as continuous fields governed by physics-inspired equations rather than discrete database entries. The approach shows dramatic improvements in long-context reasoning, with +116% performance on multi-session tasks and near-perfect collective intelligence in multi-agent scenarios.
NVIDIA's Memory Compression Breakthrough: How Forgetting Makes LLMs Smarter
NVIDIA researchers have developed Dynamic Memory Sparsification, a technique that compresses LLM working memory by 8× while improving reasoning capabilities. This counterintuitive approach addresses the critical KV cache bottleneck in long-context AI applications.
Beyond the Token Limit: How Claude Opus 4.6's Architectural Breakthrough Enables True Long-Context Reasoning
Anthropic's Claude Opus 4.6 represents a fundamental shift in large language model architecture, moving beyond simple token expansion to create genuinely autonomous reasoning systems. The breakthrough enables practical use of million-token contexts through novel memory management and hierarchical processing.
Memory as a Model: Augmenting LLMs with Trained Memory
A paper augments LLMs with trained memory for long-term recall. The model-agnostic approach stores external knowledge without retraining.
Neo4j's agent-memory: Open-source unified memory for AI agents via knowledge graphs
Neo4j releases agent-memory, an open-source unified memory layer for AI agents using knowledge graphs, enabling persistent structured recall.