Tencent open-sourced TencentDB Agent Memory, a long-term memory system for AI agents that cuts token usage by 61.38% on the WideSearch benchmark. The system runs fully local with zero external API dependencies, compressing conversation history into a 4-tier semantic pyramid.
Key facts
- Token consumption down 61.38% on WideSearch.
- Task success rate up 51.52% on WideSearch.
- PersonaMem accuracy jumped from 48% to 76%.
- Benchmarks measured over 50 consecutive tasks each.
- Runs fully local with zero external API dependencies.
Tencent has released TencentDB Agent Memory, an open-source long-term memory framework designed to solve the context-window bottleneck that plagues multi-turn AI agents. According to the announcement via @hasantoxr, the system cuts token consumption by 61.38% while improving task success rate by 51.52% on the WideSearch benchmark. On the PersonaMem evaluation, accuracy jumped from 48% to 76%.
Four-tier semantic pyramid

The architecture organizes agent experience into four hierarchical layers. L0 stores raw conversation logs. L1 extracts atomic facts from those logs. L2 groups related facts into scene blocks. L3 distills everything into a full user persona. The agent reads the persona first and drills down to raw logs only when verifying a specific detail. "Upper layers carry judgment. Lower layers carry evidence," the documentation states.
Short-term memory compression
For short-term memory, the system compresses heavy tool logs into Mermaid symbol graphs. Instead of thousands of tokens of verbose output sitting in context, agents receive a lightweight node map. The agent navigates using node IDs, pulling the full raw text only when an error occurs.
Benchmark rigor
The reported gains come from continuous long-horizon sessions running 50 consecutive tasks each, not isolated single-turn evaluations. This matters because real-world agent deployments — coding assistants, customer support bots, research copilots — accumulate context over dozens of interactions. Most benchmarks test only single-turn or short-horizon scenarios, making Tencent's results more representative of production conditions.
Unique take
Tencent's approach inverts the dominant trend in agent memory: instead of paying for larger context windows (Gemini 2M tokens, Claude 200K), it compresses aggressively on-device. This is a contrarian bet that local compression — not infinite context — is the cheaper, more scalable path. If the benchmarks hold, it suggests the industry's race to billion-token context windows may be a detour, not a destination. The system's zero-external-API constraint also makes it viable for privacy-sensitive deployments (healthcare, finance, government) that cannot send conversation logs to third-party model providers.
Limitations
Tencent has not disclosed the underlying model used in the benchmarks, the hardware configuration for local inference, or whether the compression introduces latency tradeoffs. The system's effectiveness may vary with task complexity and conversation length beyond 50 turns.
What to watch
Watch for independent replication of the WideSearch and PersonaMem results by third-party labs, and for integration into popular agent frameworks like LangChain or AutoGPT. Also track whether OpenAI or Anthropic respond with their own local memory compression tools.
[Updated 24 Jun via bloomberg_tech]
Separately, Tencent is testing an AI agent for its enterprise WeChat Work app, integrating DeepSeek's model to compete in China's corporate AI market [per Bloomberg]. The move signals a broader push to embed agent memory into real-world products.









