Hermes agent's three-tier memory system uses two tiny markdown files, MEMORY.md (2,200 chars) and USER.md (1,375 chars), as its always-present tier 1. The architecture addresses the core agent-memory trade-off: shallow always-on context versus deep but passive vector stores.
Key facts
- MEMORY.md is 2,200 chars; USER.md is 1,375 chars.
- Tier 2 FTS5 search takes ~10ms over 10,000+ docs.
- 8 pluggable external providers in tier 3.
- Periodic nudge fires every ~300 seconds.
- MEMORY.md consolidates at ~80% capacity.
Current agent memory systems face a binary trade-off: either pack everything into the prompt (always-on but shallow, limited by context window) or rely on vector stores that rarely fire at the right moment. Hermes agent, described by developer @akshay_pachaar, introduces a three-tier composition that splits the difference.
How the Three Tiers Compose
Tier 1 is two tiny markdown files—MEMORY.md (2,200 chars) and USER.md (1,375 chars)—injected into the system prompt at session start as a frozen snapshot [According to @akshay_pachaar]. MEMORY.md holds project conventions, tool quirks, and lessons learned; USER.md stores user profile data such as name, communication style, and skill level. When MEMORY.md hits ~80% capacity, the agent consolidates: merges related entries, drops redundancy, and keeps only the densest facts. This is natural selection pressure applied to memory—the files stay small, but what's inside gets sharper over time.
Tier 2 is SQLite with FTS5 indexing, storing every conversation for full-text search. When the agent calls session_search, FTS5 ranks matches in ~10ms over 10,000+ docs, an LLM summarizes the top hits, and a concise result returns to context [According to @akshay_pachaar]. Tier 1 is always present but tiny; tier 2 has unlimited capacity but requires an active search.
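The tier-2 search path maps directly onto SQLite's built-in FTS5 extension. The sketch below uses an assumed table name and schema (Hermes's real schema is not public); the bm25-ranked `MATCH` query is standard FTS5 usage.

```python
import sqlite3

def build_index(conn: sqlite3.Connection) -> None:
    # FTS5 virtual table: both columns are full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS sessions USING fts5(session_id, transcript)"
    )

def session_search(conn: sqlite3.Connection, query: str, k: int = 5):
    """Return the top-k matching transcripts, best bm25 score first."""
    rows = conn.execute(
        "SELECT session_id, transcript FROM sessions "
        "WHERE sessions MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    ).fetchall()
    # In the described pipeline, an LLM would now summarize these hits
    # into the concise result that actually returns to context.
    return rows
```

`ORDER BY rank` is FTS5's built-in bm25 ordering, which is what makes the ~10ms ranking over large corpora plausible without any external search service.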
Tier 3 offers 8 pluggable external providers that run alongside tiers 1 and 2, never replacing them. Notable providers include Honcho (dialectic user modeling, 12 identity layers), Holographic (local-first, HRR vectors, no external calls), and Supermemory (context fencing that prevents infinite re-storage of the same fact). When active, Hermes auto-syncs every turn: prefetch before, sync after, extract at session end.
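A pluggable provider tier implies a common lifecycle contract. The interface below is an assumption inferred from the described prefetch/sync/extract rhythm, not Hermes's actual API; each of the eight providers would implement it in its own way.

```python
from abc import ABC, abstractmethod

class MemoryProvider(ABC):
    """Hypothetical tier-3 contract: prefetch before the turn,
    sync after it, extract at session end."""

    @abstractmethod
    def prefetch(self, turn_input: str) -> str:
        """Return context to prepend before the agent responds."""

    @abstractmethod
    def sync(self, turn_input: str, turn_output: str) -> None:
        """Push the finished turn to the external store."""

    @abstractmethod
    def extract(self, transcript: str) -> None:
        """Distill session-level semantics when the session closes."""
```

Keeping the contract this narrow is what lets providers run alongside tiers 1 and 2 without replacing them: each one only sees turn boundaries, never the agent's internals.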
The Five-Step Turn Cycle
The tiers compose on every turn through a five-step cycle:
- Turn opens: tier 1 is already in the prompt, and tier 3 prefetches and prepends external context.
- The agent responds using all three tiers as context.
- A periodic nudge fires every ~300s. The agent reflects: "has anything worth persisting happened?" If yes, it writes; if no, it returns silently.
- Memory is written to MEMORY.md on disk. The write is invisible this session because the prefix cache stays warm.
- The session closes: tier 2 logs the transcript and tier 3 extracts semantics. The next session opens with the new state.
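The steps above can be sketched as a single turn loop. Everything here is illustrative: the hook names are assumptions, and the ~300-second nudge is modeled as an elapsed-time check (with an injectable clock) rather than a real timer so the flow stays testable.

```python
import time

NUDGE_INTERVAL = 300  # seconds between reflection nudges

class TurnLoop:
    def __init__(self, agent, providers, now=time.monotonic):
        self.agent = agent          # callable: prompt -> reply
        self.providers = providers  # tier-3 provider-like objects
        self.now = now
        self.last_nudge = now()

    def run_turn(self, user_msg: str) -> str:
        # 1. Turn opens: tier 1 is already in the prompt; tier 3 prefetches.
        prefix = "".join(p.prefetch(user_msg) for p in self.providers)
        # 2. Agent responds using all tiers as context.
        reply = self.agent(prefix + user_msg)
        # 3. Periodic nudge: roughly every 300 s, the agent reflects on
        #    whether anything is worth persisting (write or return silently).
        if self.now() - self.last_nudge >= NUDGE_INTERVAL:
            self.agent("Has anything worth persisting happened? If so, write it.")
            self.last_nudge = self.now()
        # 4. Any MEMORY.md write happens on disk, invisible this session.
        # 5. Tier-3 sync after the turn; transcript logging and extraction
        #    happen at session close, outside this per-turn loop.
        for p in self.providers:
            p.sync(user_msg, reply)
        return reply
```

The notable design choice is that the nudge is time-based rather than turn-based, so long idle stretches still get exactly one reflection pass instead of one per message.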
Unique Take: Composition Over Single-Store
The structural insight here is that Hermes composes across multiple memory tiers rather than choosing one. Most agent frameworks pick a single memory mechanism (vector store, long-term context, or fine-tuning). Hermes uses tiny always-present files for critical facts, full-text search for deep recall, and external providers for semantic modeling—all orchestrated by a nudge that decides autonomously what's worth saving. The agent doesn't just store memories; it curates them under pressure.
What to watch
Watch for open-source release of Hermes agent's memory orchestration code, which would allow benchmarking against MemGPT and Letta. Also track whether the periodic nudge interval (300s) proves optimal across diverse agent workloads—too short wastes tokens, too long misses ephemeral context.