memory & context

30 articles about memory & context in AI news

DeepSeek-V4 Hits 500K Context with 90% Less KV Cache via FlashMemory

DeepSeek-V4 achieves 500K context with 90% less KV cache via FlashMemory's lookahead sparse attention, keeping only 13.5% of cache in GPU memory without retraining.

Jun 9, 2026 · 98% relevant

Engramme Building 'Large Memory Models' to Surface Personal Context

Engramme, founded by Gabriel Kreiman, is developing 'Large Memory Models' (LMMs) designed to connect to a user's digital life and surface relevant context without explicit prompting. The goal is to augment human memory by making personal data available at the right moment.

Apr 9, 2026 · 85% relevant

Memory Sparse Attention (MSA) Achieves 100M Token Context with Near-Linear Complexity

A new attention architecture, Memory Sparse Attention (MSA), breaks the 100M token context barrier while maintaining 94% accuracy at 1M tokens. It uses document-wise RoPE and end-to-end sparse attention to outperform RAG systems and frontier models.

Mar 29, 2026 · 95% relevant

Stop Pasting Context: Add Persistent Memory to Claude Code with Bossa MCP

Bossa MCP gives Claude Code persistent filesystem memory across sessions, eliminating repetitive context pasting and enabling smarter progressive disclosure.

Mar 22, 2026 · 97% relevant

Memory Sparse Attention (MSA) Enables 100M Token Context Windows with Minimal Performance Loss

Memory Sparse Attention (MSA) is a proposed architecture that allows AI models to store and reason over massive long-term memory directly within their attention mechanism, eliminating the need for external retrieval systems. The approach reportedly enables context windows of up to 100 million tokens with minimal performance degradation.

Mar 21, 2026 · 85% relevant

Hermes Agent's Three-Tier Memory Cuts Context Bloat, Keeps 2,200-Char Core

Hermes agent's three-tier memory uses two tiny markdown files (2,200 chars), SQLite FTS5 search (10ms over 10K docs), and 8 pluggable providers. The composition solves the always-on vs. deep recall trade-off.

May 14, 2026 · 91% relevant

Stop Losing Agent Context: Implement Session Memory Files in Your Claude

A simple pattern using structured markdown files to persist session state across context windows, preventing Claude Code agents from redoing work or making inconsistent decisions.

Apr 22, 2026 · 100% relevant

Codex 'Chronicle' Research Preview Adds Memory for Daily Developer Context

A research preview of 'Chronicle' for Codex has been released. It enables the AI coding assistant to accumulate memories from a developer's daily workflow to improve context.

Apr 20, 2026 · 93% relevant

Anthropic Democratizes AI Memory: Claude's Free Tier Gets Contextual Recall

Anthropic has expanded access to Claude's memory feature, making it available to all free users. This strategic move coincides with new tools to import conversations from rival chatbots, positioning Claude as a more personalized and sticky alternative in the competitive AI assistant market.

Mar 2, 2026 · 75% relevant

Beyond Single Prompts: How 'Codified Context' Solves AI's Memory Problem in Large-Scale Development

A new research paper reveals why single-file AI agent instructions fail for complex projects and introduces a three-tier memory architecture that successfully managed a 108,000-line distributed system. The approach replaces simple prompts with structured, evolving documentation that becomes load-bearing infrastructure for AI development.

Feb 28, 2026 · 85% relevant

EvoEmbedding Beats Static Embedders 3× Larger via Latent Memory Queue

EvoEmbedding uses a latent memory queue to beat static embedders 3× its size on long-context retrieval, per @HuggingPapers.

Jun 27, 2026 · 85% relevant

Qwen-Image-Agent: Alibaba's Agentic Framework for Context-Aware Image Gen

Alibaba's Qwen-Image-Agent uses planning, reasoning, search, and memory to build context for text-to-image models, bridging the context gap in real-world generation.

Jun 26, 2026 · 87% relevant

HydraDB Raises $6.5M for Persistent Agent Memory, Solving the Session Gap

HydraDB raised $6.5M for persistent agent memory, targeting the session-gap problem that context windows leave unaddressed. The round signals memory as a startup thesis.

Jun 1, 2026 · 78% relevant

Sleep Phase Cuts Transformer Costs by Consolidating Memory

A paper proposes a sleep phase that consolidates context into fixed-size memory, reducing inference cost while improving long-horizon task performance on GSM-Infinite.

May 28, 2026 · 84% relevant

CLAUDE.md Explained: How Anthropic's Agent Memory Works

CLAUDE.md is Anthropic's project config file for Claude Code, now two years old with settled best practices for agent memory and context.

May 12, 2026 · 95% relevant

Mind: Open-Source Persistent Memory for AI Coding Agents

An open-source tool called Mind creates a shared memory layer for AI coding agents, allowing them to remember project context across sessions and different interfaces like Claude Code, Cursor, and Windsurf.

Apr 12, 2026 · 85% relevant

Memory Systems for AI Agents: Architectures, Frameworks, and Challenges

A technical analysis details the multi-layered memory architectures—short-term, episodic, semantic, procedural—required to transform stateless LLMs into persistent, reliable AI agents. It compares frameworks like MemGPT and LangMem that manage context limits and prevent memory drift.

Apr 5, 2026 · 95% relevant

How to Run Claude Code 24/7 Without Burning Your Context Window

Implement a hard 50K token session cap and a three-tier memory system (daily notes, MEMORY.md, PARA knowledge graph) to prevent context bloat and memory decay in long-running Claude Code agents.

Apr 3, 2026 · 95% relevant

MemoryCD: New Benchmark Tests LLM Agents on Real-World, Lifelong User Memory for Personalization

Researchers introduce MemoryCD, the first large-scale benchmark for evaluating LLM agents' long-context memory using real Amazon user data across 12 domains. It reveals current methods are far from satisfactory for lifelong personalization.

Mar 30, 2026 · 74% relevant

Add Vector Memory to Claude Code: The claude-memory-mcp Server Solves CLAUDE.md's 200-Line Limit

Install this open-source MCP server to give Claude Code persistent, searchable memory across projects. It surfaces only relevant context, solving CLAUDE.md's scaling problems.

Mar 26, 2026 · 95% relevant

Add Persistent Memory to Claude Code in 5 Minutes with memoclaw-mcp

Stop re-explaining your preferences. Install the memoclaw-mcp server to give Claude Code persistent, semantic memory across sessions using the Model Context Protocol.

Mar 25, 2026 · 95% relevant

Google's TurboQuant Cuts LLM KV Cache Memory by 6x, Enables 3-Bit Storage Without Accuracy Loss

Google released TurboQuant, a novel two-stage quantization algorithm that compresses the KV cache in long-context LLMs. It reduces memory by 6x, achieves 3-bit storage with no accuracy drop, and speeds up attention scoring by up to 8x on H100 GPUs.

Mar 25, 2026 · 95% relevant

Context Graph for Agentic Coding: A New Abstraction for LLM-Powered Development

A new "context graph" abstraction is emerging for AI coding agents, designed to manage project state and memory across sessions. It aims to solve the persistent context problem in long-running development tasks.

Mar 23, 2026 · 89% relevant

The File Paradigm: How Simple File Systems Could Revolutionize AI Context Management

New research proposes treating all AI context as files within a unified system, potentially solving memory and organization challenges in complex AI workflows. This approach could dramatically simplify how AI systems access and manage information.

Mar 8, 2026 · 85% relevant

The Unix Philosophy Returns: How File Systems Could Solve AI's Memory Crisis

A new research paper proposes treating AI context management like a Unix file system, with OpenClaw demonstrating that storing memory, tools, and knowledge as files creates traceable, auditable AI systems. This approach could solve fragmentation and transparency issues plaguing current agent frameworks.

Mar 7, 2026 · 85% relevant

Google's 'Always-On Memory Agent' Could Revolutionize How AI Remembers and Learns

Google has unveiled an experimental 'Always-On Memory Agent' system that gives AI persistent, evolving memory capabilities. This breakthrough could transform how AI assistants learn from continuous interactions and maintain context across sessions.

Mar 6, 2026 · 85% relevant

Anthropic's Memory Transfer Feature Escalates AI Personalization Race

Anthropic has launched a memory feature allowing users to transfer context and preferences from other AI tools directly into Claude. This enables seamless continuation of conversations with retained context across platforms, available to all paid subscribers.

Mar 1, 2026 · 85% relevant

Claude Code Gains Auto-Memory: A Game-Changer for AI-Assisted Programming

Anthropic's Claude Code now features auto-memory capabilities, allowing the AI to retain context across coding sessions. This breakthrough addresses a fundamental limitation in AI programming assistants by creating persistent memory of project details, preferences, and patterns.

Feb 27, 2026 · 85% relevant

Claude Code's Auto-Memory: The AI Assistant That Remembers Your Entire Project

Anthropic's Claude Code now features auto-memory capabilities, allowing the AI coding assistant to retain context across sessions and recall project details automatically. This breakthrough addresses a fundamental limitation of current AI tools and could transform developer workflows.

Feb 26, 2026 · 85% relevant

Physics-Inspired AI Memory: How Continuous Fields Could Solve AI's Forgetting Problem

Researchers have developed a revolutionary memory system for AI agents that treats information as continuous fields governed by physics-inspired equations rather than discrete database entries. The approach shows dramatic improvements in long-context reasoning, with +116% performance on multi-session tasks and near-perfect collective intelligence in multi-agent scenarios.

Feb 26, 2026 · 75% relevant
