
MNEMA: A Witness Lattice for Multi-Agent AI Memory

Today's agentic AI fails three ways: agents miscoordinate, memory gets quietly poisoned, and decisions can't be audited. A new EUMAS 2026 submission argues the fix is to stop treating memory as static records. Make it *living* — every memory unit becomes an autonomous cryptographic witness that interacts with other witnesses (agree, disagree, give birth to new witnesses, split, coalesce, retire), and decisions emerge from a fixed signed protocol rather than from a single orchestrator.

Ala SMITH & AI Research Desk · 1h ago · 6 min read · AI-generated

via gentic_lab · Corroborated

TL;DR

Stop treating agent memory as a passive log. Each memory unit becomes a living cryptographic witness that gossips with its peers, spawns new witnesses when fresh evidence converges, splits when it grows too generalist, coalesces when redundant, and retires by emitting signed refusals. The knowledge graph itself is the audit artefact. A closed-form bound shows that redundancy without structural decorrelation can never push detection above 1 − α, where α is the probability of a shared root-cause shock.

Three failure classes in production multi-agent AI are now well documented: coordination breaks down, with failure rates of 41–86.7% across seven state-of-the-art frameworks (MAST, Cemri et al. 2025); memory layers concede 50–90% poisoned recall under published attacks (MemoryGraft, AgentPoison, PoisonedRAG); and decisions cannot be audited to the standard required by the EU AI Act, HIPAA, and SOC 2, because a single agent's chain-of-thought is unstructured prose.

A paper submitted to EUMAS 2026 (currently under single-blind review; acceptance has not been decided) argues that these are not three problems but one. Today's architecture treats memory as a passive substrate that you write to and read from, while decision authority sits inside an opaque learned orchestrator. Both can be tuned; neither can be formally verified. The proposal, MNEMA, replaces both with something different in kind: a living memory of cryptographic witnesses, and a fixed signed protocol that turns the knowledge graph itself into the audit log.

Read the full paper (PDF, 15 pp.)

From static records to living memory

This is the move that sets the paper apart from existing agent-memory proposals.

In conventional systems — MemGPT, Mem0, Zep, A-MEM — memory is something the agent stores: a record, a note, a vector, a graph node. The system writes; the system reads. The unit is inert. MNEMA inverts this. A piece of memory is a witness: an autonomous unit with a cryptographic identity (Ed25519 public key), a hash-chained signed journal of every event in its life, and the structural right to refuse to comment when a question is outside its remit.
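To make the witness anatomy concrete, here is a minimal Python sketch of a single witness: an identity key, a hash-chained signed journal, and a signed refusal for out-of-remit questions. It is an illustration, not the paper's implementation. The names `Witness`, `JournalEntry`, `answer`, and `verify` are hypothetical, the remit is modelled as a simple set of topics (an assumption), and because the Python standard library has no Ed25519, HMAC-SHA256 with a secret key stands in for signing. Unlike Ed25519, an HMAC tag can only be checked by someone holding the same key.

```python
from __future__ import annotations

import hashlib
import hmac
import json
import secrets
from dataclasses import dataclass


@dataclass
class JournalEntry:
    seq: int
    kind: str          # e.g. "BIRTH", "ASSERT", "REFUSE"
    payload: dict
    prev_hash: str     # entry_hash of the previous entry: the chain link
    entry_hash: str
    signature: str     # HMAC tag here; an Ed25519 signature in a real system


class Witness:
    """Toy witness: secret identity key, hash-chained journal, remit check.

    HMAC-SHA256 stands in for Ed25519 so the sketch needs only the standard
    library; unlike Ed25519, an HMAC tag is not publicly verifiable.
    """

    GENESIS = "0" * 64

    def __init__(self, remit: set[str]):
        self._key = secrets.token_bytes(32)  # stand-in for an Ed25519 private key
        self.remit = set(remit)              # topics this witness may speak to
        self.journal: list[JournalEntry] = []
        self._append("BIRTH", {"remit": sorted(self.remit)})

    @staticmethod
    def _digest(seq: int, kind: str, payload: dict, prev_hash: str) -> str:
        body = json.dumps([seq, kind, payload, prev_hash], sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()

    def _sign(self, digest: str) -> str:
        return hmac.new(self._key, digest.encode(), hashlib.sha256).hexdigest()

    def _append(self, kind: str, payload: dict) -> JournalEntry:
        prev = self.journal[-1].entry_hash if self.journal else self.GENESIS
        seq = len(self.journal)
        digest = self._digest(seq, kind, payload, prev)
        entry = JournalEntry(seq, kind, payload, prev, digest, self._sign(digest))
        self.journal.append(entry)
        return entry

    def answer(self, topic: str, claim: str) -> JournalEntry:
        # Structural right to refuse: out-of-remit questions get a signed REFUSE.
        if topic not in self.remit:
            return self._append("REFUSE", {"topic": topic})
        return self._append("ASSERT", {"topic": topic, "claim": claim})

    def verify(self) -> bool:
        """Recompute every link and tag; any edited entry makes this False."""
        prev = self.GENESIS
        for e in self.journal:
            digest = self._digest(e.seq, e.kind, e.payload, prev)
            if (e.prev_hash != prev or e.entry_hash != digest
                    or not hmac.compare_digest(e.signature, self._sign(digest))):
                return False
            prev = digest
        return True
```

Because each entry commits to its predecessor's hash, editing any past entry is caught on verification. A real deployment would swap the HMAC for Ed25519 so that any auditor can check the journal against the witness's public key without holding a secret.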

Figure: MNEMA five-layer architecture stack (Substrate, Witness, Lattice, Protocol, Audit); each layer can be ablated independently for falsifiability.

MNEMA's central move: the knowledge graph is no longer something you write to. It is a living object whose evolution is itself the audit artefact.

What makes the memory alive is that witnesses interact:

They gossip. Idle witnesses run a small grammar — ASSERT, CORROBORATE, CONTRADICT, PROPOSE, ASSENT, DISSENT — propagating signals to their neighbours.
They give birth. When enough peers see fresh evidence and no existing witness already holds the claim, a new witness is instantiated — fresh identity, lineage edges to the witnesses that gossiped it into existence, an empty journal seeded by a signed BIRTH entry. Memory is not written; it is witnessed into existence.
They split. A witness whose action variance grows too high across domains is partitioned into two children, each inheriting the relevant slice of journal and reputation. The parent retires.
They coalesce. Two witnesses with overlapping canonical claims and non-contradictory journals merge into one. Lineage and reputation merge with them.
They probe. A witness with high precision but stale evidence may spend part of its restraint budget querying the substrate for fresh ingestion candidates.
They retire visibly. Witnesses age through five stages — EMBRYONIC → JUVENILE → ADULT → ELDER → PHANTOM — strictly forward, never revisited. A retired (PHANTOM) witness does not vanish; it intercepts retrievals and emits a signed refusal pointing to its successor. There is no silent knowledge loss.
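The lifecycle and birth rules above can be sketched as a small state machine. This is an illustrative Python sketch under stated assumptions: the class and function names, the `quorum` parameter, and the permission to skip intermediate stages are all hypothetical. The article only fixes the five stages, their forward-only order, the refusal a PHANTOM emits in place of its claim, and the rule that a birth needs enough peers plus no existing holder of the claim. Signing is omitted to keep the focus on the transitions.

```python
from __future__ import annotations

from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    EMBRYONIC = 0
    JUVENILE = 1
    ADULT = 2
    ELDER = 3
    PHANTOM = 4


class LifecycleError(Exception):
    """Raised on any attempt to move a witness backwards or sideways."""


class WitnessLife:
    def __init__(self, claim: str):
        self.claim = claim
        self.stage = Stage.EMBRYONIC
        self.successor: Optional[str] = None

    def advance(self, to: Stage) -> None:
        # Strictly forward; skipping stages is allowed here (an assumption).
        if to <= self.stage:
            raise LifecycleError(f"{self.stage.name} -> {to.name} is not forward")
        self.stage = to

    def retire(self, successor: str) -> None:
        self.advance(Stage.PHANTOM)
        self.successor = successor

    def retrieve(self) -> dict:
        # A PHANTOM never disappears: it intercepts the read and refuses,
        # pointing at its successor instead of silently returning stale data.
        if self.stage is Stage.PHANTOM:
            return {"type": "REFUSAL", "successor": self.successor}
        return {"type": "CLAIM", "claim": self.claim}


def should_birth(peers_with_evidence: int, quorum: int,
                 held_claims: set[str], claim: str) -> bool:
    """Birth rule: enough peers saw fresh evidence AND nobody holds the claim.

    `quorum` is a hypothetical threshold; the article does not give its value.
    """
    return peers_with_evidence >= quorum and claim not in held_claims
```

For example, after `w = WitnessLife("c")`, `w.advance(Stage.ADULT)` and `w.retire("w-7")`, a call to `w.retrieve()` returns `{"type": "REFUSAL", "successor": "w-7"}` rather than the old claim, and any further `advance` raises `LifecycleError`.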

Every birth, split, coalescence, retirement is journalled and signed. The lattice's evolution is therefore itself a cryptographically replayable audit trail — without any external logger or post-hoc instrumentation.
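Replaying the lattice's evolution amounts to event sourcing: fold the journalled events, in order, into the current set of live and retired witnesses. The Python sketch below assumes signatures have already been verified and uses a hypothetical event format (plain dicts with `kind`, `id`, `parent`, `children`, `parents`, `merged`, and `successor` keys). It also assumes that a coalescence produces a fresh witness ID, which the article does not specify.

```python
from __future__ import annotations


def replay(events: list[dict]) -> tuple[set[str], set[str], dict[str, list[str]]]:
    """Rebuild (live, phantoms, successors) deterministically from an event log."""
    live: set[str] = set()
    phantoms: set[str] = set()
    successors: dict[str, list[str]] = {}

    def retire(wid: str, succ: list[str]) -> None:
        # Retired witnesses become phantoms; they are never deleted.
        if wid not in live:
            raise ValueError(f"event references non-live witness {wid!r}")
        live.remove(wid)
        phantoms.add(wid)
        successors[wid] = list(succ)

    for ev in events:
        kind = ev["kind"]
        if kind == "BIRTH":
            live.add(ev["id"])
        elif kind == "SPLIT":
            # Parent retires; each child inherits a slice of it.
            retire(ev["parent"], ev["children"])
            live.update(ev["children"])
        elif kind == "COALESCE":
            for p in ev["parents"]:
                retire(p, [ev["merged"]])
            live.add(ev["merged"])
        elif kind == "RETIRE":
            retire(ev["id"], [ev["successor"]])
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return live, phantoms, successors
```

Running `replay` twice on the same log always yields the same state, which is what makes an "exact replay" claim checkable; an event that references an already-retired witness raises instead of silently producing a different lattice.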

Decisions are protocol output, not agent output

The other half of MNEMA is what happens when an action needs to be taken. Instead of an orchestrator agent producing a decision in natural language, a fixed nine-step signed pipeline processes the candidate action. Its stages include activation; speak-or-refuse; a cross-family critic-jury (three different model families); constitutional and veto-council gates; a deterministic commitment function that emits COMMIT, ESCALATE, DEFER, or DIE; an optional doubly-efficient debate on escalation; and saga-style execution with a compensating action ready. Every step appends signed entries, and the decision and all its evidence land in a Provenance DAG. Replay is exact.
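The deterministic commitment step and the saga-style execution can be illustrated with a toy Python sketch. Everything concrete here is hypothetical: the article names the four outcomes and the three-family jury but not the rule that maps inputs to outcomes, so the precedence below (gate failure gives DIE, thin evidence gives DEFER, a unanimous jury gives COMMIT, anything else gives ESCALATE) is an assumption chosen for illustration. Signing and the debate step are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    COMMIT = "COMMIT"
    ESCALATE = "ESCALATE"
    DEFER = "DEFER"
    DIE = "DIE"


@dataclass(frozen=True)
class Verdict:
    family: str      # model family of the critic (hypothetical labels)
    approve: bool


def commit_decision(jury: list[Verdict], constitutional_ok: bool,
                    vetoed: bool, evidence_sufficient: bool) -> Outcome:
    """Hypothetical deterministic rule: identical inputs give identical outputs."""
    if len(jury) != 3 or len({v.family for v in jury}) != 3:
        raise ValueError("jury must contain exactly three distinct model families")
    if vetoed or not constitutional_ok:
        return Outcome.DIE          # a gate failed: the action is dropped
    if not evidence_sufficient:
        return Outcome.DEFER        # wait for more witnesses to speak
    if all(v.approve for v in jury):
        return Outcome.COMMIT
    return Outcome.ESCALATE         # split jury: hand off to debate


def run_saga(action: Callable[[], object], compensate: Callable[[], object]) -> str:
    """Execute a committed action; run its compensating action if it fails."""
    try:
        action()
        return "DONE"
    except Exception:
        compensate()
        return "COMPENSATED"
```

Because `commit_decision` is a pure function of its inputs, replaying the signed inputs reproduces the same outcome, which is the property a learned orchestrator cannot offer.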


The headline result: redundancy alone is a trap

The paper's most striking technical result is a clean bound on what fragment redundancy can actually buy you against memory poisoning.

Figure: Common-shock corruption model. Left: structure with a shared root cause Z. Middle: P_undetected versus shock probability α, plateauing at α. Right: redundancy depth; the correlated case stays at the α floor regardless of redundancy.

The defence everyone reaches for first: store every load-bearing claim across q redundant copies and require them to agree. Under independent corruption, this works — at 10% per-copy corruption and q = 4, undetected poisoning falls to 10⁻⁵.

Independent corruption is the wrong assumption. If the copies share a model family, an ingestion path, or a source-graph root, a single shared shock corrupts all of them simultaneously. The paper proves:

P_undetected = α + (1 − α) · β^(1+q)

where α is the shared-root-cause probability and β is the residual per-copy rate. The detection rate, 1 − P_undetected = (1 − α)(1 − β^(1+q)), rises with q but only approaches 1 − α as q → ∞. No finite redundancy beats it. In the worst case, fragment redundancy delivers no more protection than single storage.
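The bound is easy to check numerically. The Python sketch below simply evaluates the article's formula; the helper names are hypothetical. One intuitive reading (ours, not necessarily the paper's derivation) is that poisoning goes undetected either when the shared shock hits, with probability α, corrupting all 1 + q copies together, or when there is no shock and all 1 + q copies are independently corrupted, with probability (1 − α)β^(1+q). With α = 0 and β = 0.1, q = 4 reproduces the 10⁻⁵ figure quoted above; with any α > 0 the curve flattens at α.

```python
def p_undetected(alpha: float, beta: float, q: int) -> float:
    """P_undetected = alpha + (1 - alpha) * beta**(1 + q) (common-shock model)."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0) or q < 0:
        raise ValueError("alpha and beta must be probabilities and q >= 0")
    return alpha + (1.0 - alpha) * beta ** (1 + q)


def detection_rate(alpha: float, beta: float, q: int) -> float:
    """Equals (1 - alpha) * (1 - beta**(1 + q)); never exceeds 1 - alpha."""
    return 1.0 - p_undetected(alpha, beta, q)


if __name__ == "__main__":
    beta = 0.10  # residual per-copy corruption rate from the article's example
    for alpha in (0.0, 0.05):
        for q in (0, 2, 4, 8, 16):
            print(f"alpha={alpha:.2f} q={q:2d} "
                  f"P_undetected={p_undetected(alpha, beta, q):.2e}")
```

The α = 0 rows fall by a factor of ten per extra copy; the α = 0.05 rows stall near 5.00e-02 after a couple of copies, which is the plateau in the middle panel of the figure.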

The engineering takeaway is concrete: at the moment you select redundant copies, enforce structural decorrelation — different model family, different source-graph root, different ingestion path — and record all three in the witness schema so an auditor can verify it later. Stop adding copies; start decorrelating.
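The selection rule can be sketched in a few lines of Python. The `Copy` record and the helper names are hypothetical; its three fields mirror the three axes the article says should be recorded in the witness schema. The greedy pick is a simplification (it can miss a valid set that a full search would find), and `audit` is the check a later auditor could rerun from the recorded fields alone.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Copy:
    copy_id: str
    model_family: str     # recorded so an auditor can re-check decorrelation
    source_root: str
    ingestion_path: str


AXES = ("model_family", "source_root", "ingestion_path")


def decorrelated(a: Copy, b: Copy) -> bool:
    """Two copies are decorrelated only if they differ on every axis."""
    return all(getattr(a, ax) != getattr(b, ax) for ax in AXES)


def select_decorrelated(candidates: list[Copy], q: int) -> list[Copy]:
    """Greedily pick q copies that are pairwise decorrelated on all three axes."""
    if q <= 0:
        return []
    chosen: list[Copy] = []
    for c in candidates:
        if all(decorrelated(c, x) for x in chosen):
            chosen.append(c)
            if len(chosen) == q:
                return chosen
    raise ValueError(f"only {len(chosen)} decorrelated copies available; need {q}")


def audit(copies: list[Copy]) -> bool:
    """Post-hoc check an auditor can run from the stored schema fields."""
    return all(decorrelated(a, b) for a, b in combinations(copies, 2))
```

Failing loudly when too few decorrelated copies exist is deliberate: silently falling back to a correlated copy would reintroduce exactly the α floor the bound describes.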

Pre-registered, not post-hoc

The paper deliberately does not claim empirical superiority. It pre-registers exactly one demonstration, MemoryGraft survival, with the rigour of a clinical trial: an explicit hypothesis, a power analysis (n = 40, α = 0.05, β = 0.20, i.e. 80% power; here α and β are the usual significance and Type II error levels, not the corruption parameters above), and three written falsification criteria that fix in advance what counts as failure. The result will be published regardless of outcome. That commitment removes the "let me retry with different hyperparameters" exit.

Why this matters

The agentic-AI stack as currently shipped has a credibility ceiling. Regulated deployments need decisions to be replayable, attributable, and poisoning-resistant simultaneously — and today's stack delivers, at best, one of the three. MNEMA's bet is that getting all three requires giving up the learned orchestrator and accepting a fixed protocol over a living memory of cryptographic witnesses. That trade is uncomfortable for ML practitioners — protocols are less expressive than learned policies — but verifiability is what the deployments that matter actually need.

The full paper goes deeper into the witness anatomy, the typed lattice (lineage / authority / resonance edges), the seven-dimensional legitimacy vector, and the two-channel reputation tensor. If the architecture interests you, the 15 pages are below.

Read the full paper (PDF, 15 pp.) · EUMAS 2026 submission, under review · Springer LNCS format · Pre-registered demonstration with falsification criteria

Source: gentic.news · 1h ago · Author: Ala SMITH

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

