The fix · sub-page of When Agents Read

What needs to be built.

Seven defensive layers. Each addresses part of the crisis. Each is real today at some level of deployment. None is sufficient alone. The honest answer is layered defence — every layer partial, every gap covered by another.

For each layer below: what it does, current deployment status, and the honest critique of where it fails or is gameable. The truth is: there is no single fix. But the stack is buildable.

tl;dr · 60 seconds

01 · Seven defensive layers exist today. None complete. None standalone.
02 · The strongest single layer: multi-source witness-lattice retrieval. Anthropic Citations API: 10% → 0% hallucinations on Endex. Connects to MNEMA work.
03 · The weakest single layer: watermarking alone. UnMarker defeats 79% of image watermarks. Text watermarks degrade under paraphrasing.
04 · The forgotten layer: economic repair. Fixing threats 1 and 3 without paying creators (threat 2) leaves the corpus shrinking forever.
05 · The architectural move: apply the MNEMA witness-lattice idea (built for internal multi-agent memory) to external retrieval. Same primitive, different scope.

i · the layered defence

Seven layers. Each has a scorecard.

For each layer: what it does, current deployment status, and where it fails. No sugar-coating. The composite is real but the parts are partial.

Content provenance (C2PA / Content Credentials)

What it does: Cryptographic signing of content at capture or generation. Adobe-led standard. Hardware support in Leica, Sony, Nikon, Galaxy S25, Pixel 10. Browser display via Microsoft Edge and the Digimarc / Truepic extensions. Mandated under EU AI Act Article 50 from August 2026.

Status: Spec v2.2 (May 2025). 6,000+ Content Authenticity Initiative members. Adopted by Meta (360M+ pieces labeled in one 29-day window), OpenAI, Microsoft, Google.

Honest critique: Opt-in. Bad actors don't sign. Midjourney ships zero AI provenance metadata. Nikon Z6 III had a signing vulnerability that invalidated every credential issued before the fix. The standard is real and growing; the coverage is thin and unevenly enforced.

Watermarking (SynthID and successors)

What it does: Statistical signature embedded in generated text, image, audio, video, detectable by a paired classifier without meaningfully changing the content. Google DeepMind reports 10B+ pieces watermarked (Dec 2025). Unified SynthID Detector launched in Gemini Nov 2025.

Status: Production at Google scale. SynthID-Text in Gemini, SynthID-Image in Imagen, audio in Lyria, video in Veo.

Honest critique: Defeated as a standalone defence. UnMarker paper (IEEE S&P 2025): 79% image-watermark removal. ETH Zurich SRI Lab: the presence of a text watermark is trivially detectable with black-box access. Paraphrasing + back-translation degrades text watermarks significantly. Schneier: 'Watermarking alone cannot meet the challenge.'
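To make "statistical signature" concrete, here is a minimal sketch of one published family of text watermarks: the green-list scheme, detected with a z-test. This is a swapped-in illustration, not SynthID's method. The hash-based green list, the `GAMMA` fraction and the toy `embed` generator are all assumptions chosen for brevity.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of candidate tokens marked "green" at each step


def is_green(prev_token, token, gamma=GAMMA):
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256((prev_token + "\x00" + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def embed(vocab, length, start="<s>"):
    """Toy generator: always emit the first green candidate.

    A real model would only bias sampling toward green tokens.
    """
    tokens = [start]
    for _ in range(length):
        prev = tokens[-1]
        tokens.append(next((t for t in vocab if is_green(prev, t)), vocab[0]))
    return tokens


def green_z_score(tokens, gamma=GAMMA):
    """z-score of the green-token count against the no-watermark expectation."""
    n = len(tokens) - 1  # number of (previous, current) pairs scored
    if n < 1:
        return 0.0
    hits = sum(is_green(a, b, gamma) for a, b in zip(tokens, tokens[1:]))
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


if __name__ == "__main__":
    vocab = [f"w{i}" for i in range(32)]
    print(round(green_z_score(embed(vocab, 64)), 2))
```

Paraphrasing swaps tokens, which drags the green count back toward `gamma * n` and the z-score back toward zero. That is the degradation the critique describes.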

Source-reputation systems

What it does: Score every retrievable source for trustworthiness based on: historical accuracy, contradiction graphs, recency, authority, signed credentials. Apply weights at retrieval time. CONFACT (IJCAI 2025), ReliabilityRAG (arXiv 2509.23519).

Status: Research only. No production frontier-lab agent does this rigorously at scale yet.

Honest critique: Adversarial: anyone who knows the scoring rules games them (Goodhart). Centralisation risk: who decides reputation? Multi-source corroboration is more robust but more expensive per query.
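A minimal sketch of what "apply weights at retrieval time" could look like. The signals come from the list above; the field names, the 0.6 / 0.2 / 0.2 weights and the 180-day half-life are hypothetical choices, not values from CONFACT or ReliabilityRAG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    url: str
    relevance: float  # retriever similarity in [0, 1]
    accuracy: float   # historical accuracy in [0, 1]
    signed: bool      # carries a verified content credential
    age_days: float   # time since publication or last update


def reputation(src, half_life_days=180.0):
    """Blend the signals into one score in [0, 1]. The weights are illustrative only."""
    recency = 0.5 ** (src.age_days / half_life_days)
    return 0.6 * src.accuracy + 0.2 * (1.0 if src.signed else 0.0) + 0.2 * recency


def rerank(sources):
    """Order sources by relevance discounted by reputation, best first."""
    return sorted(sources, key=lambda s: s.relevance * reputation(s), reverse=True)


if __name__ == "__main__":
    hits = [
        Source("https://example.com/viral", 0.9, 0.2, False, 365),
        Source("https://example.org/archive", 0.7, 0.9, True, 0),
    ]
    for s in rerank(hits):
        print(s.url, round(s.relevance * reputation(s), 3))
```

Any fixed formula like this is exactly what the Goodhart critique targets: once the weights are known, a source can optimise for them.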

Multi-source corroboration / witness-lattice retrieval

What it does: Don't trust any single retrieved page. Require N independent sources to corroborate before treating a claim as grounded. Maximum Independent Set on the contradiction graph. MADAM-RAG (COLM 2025). The MNEMA witness-lattice architecture applied to external retrieval.

Status: Research, deployed in narrow systems. Anthropic Citations API (Jan 2025): sentence-chunked sources, +15% recall, hallucinations 10% → 0% (Endex measurement). Perplexity inline source tiles. OpenAI Deep Research tool-trace.

Honest critique: Strongest current defence. Requires N parallel retrievals (cost). Vulnerable when adversary controls multiple sources (the cycle attack). Connects directly to gentic.news's MNEMA work.
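The two ideas in this layer can be sketched in a few lines, under stated assumptions. `Witness`, its `origin` field (a stand-in for whoever controls the page) and the brute-force search are hypothetical; a production system would need a real independence signal and a scalable solver, since Maximum Independent Set is NP-hard in general.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Witness:
    url: str
    origin: str  # assumed independence key, e.g. the controlling organisation


def is_grounded(witnesses, min_independent=3):
    """A claim counts as grounded only if enough distinct origins support it."""
    return len({w.origin for w in witnesses}) >= min_independent


def max_consistent_subset(sources, contradicts):
    """Largest set of sources in which no pair contradicts (brute force, small inputs only).

    `contradicts` is a set of frozensets, one per contradicting pair.
    """
    for size in range(len(sources), 0, -1):
        for subset in combinations(sources, size):
            if not any(frozenset(p) in contradicts for p in combinations(subset, 2)):
                return set(subset)
    return set()


if __name__ == "__main__":
    mirrors = [Witness(f"https://mirror{i}.example/p", "one-owner") for i in range(5)]
    print(is_grounded(mirrors))  # five pages, one origin
    clash = {frozenset(("a", "d")), frozenset(("b", "d")), frozenset(("c", "d"))}
    print(sorted(max_consistent_subset(["a", "b", "c", "d"], clash)))
```

Counting origins rather than pages is what blunts the cycle attack in this sketch. It holds only as long as the independence key itself cannot be forged.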

Sandboxed / pre-cleaned retrieval indices

What it does: Don't query the raw web. Use a curated, signed, pre-vetted index. Brave Search API (only Zero-Data-Retention option, SOC 2 Type II, 40B-page independent index). LLM Context API endpoint. Bing/Google effectively closed for agent use.

Status: Brave is the only major independent option after Bing API shutdown (Aug 2024). DuckDuckGo Instant Answer remains free fallback.

Honest critique: Removes the wild-web attack surface but doesn't solve the corpus quality problem. Brave still indexes AI slop. Centralises trust in the curator. Pre-cleaning at scale is genuinely hard.

Economic repair (Pay-Per-Crawl, attribution, licensing)

What it does: Pay creators when agents extract value. Cloudflare Pay-Per-Crawl (HTTP 402, launched July 2025). Perplexity Comet Plus 80/20 split, $42.5M pool. TollBit token-based scrape billing, 5,000+ publishers. OpenAI partnerships ($1-250M).

Status: Operational at infrastructure layer. Limited adoption among AI labs (OpenAI declined Cloudflare's marketplace). EU AI Act may eventually mandate.

Honest critique: Fixes threat 2 economics partially. Does nothing for threats 1 or 3. Stablecoin payment rails being explored. The current AI revenue flowing to publishers is a tiny fraction of the ad revenue Google AI Overviews destroyed (~$2B/year per industry estimate).
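On the agent side, an HTTP 402 flow reduces to a budget check. The sketch below assumes the publisher quotes a price in a response header. The header name `crawler-price` and the `USD 0.01` value format are assumptions for illustration and should be checked against the provider's documentation.

```python
from decimal import Decimal

PRICE_HEADER = "crawler-price"  # assumed header name, not verified against any provider


def crawl_decision(status, headers, max_price):
    """Return "proceed", "pay" or "skip" for one fetch attempt.

    `headers` maps header names to values; `max_price` is the per-page budget in USD.
    """
    if status != 402:
        return "proceed"  # no payment requested
    normalised = {k.lower(): v for k, v in headers.items()}
    quoted = normalised.get(PRICE_HEADER)
    if quoted is None:
        return "skip"  # payment required but no machine-readable quote
    try:
        currency, amount = quoted.split()
        price = Decimal(amount)
    except (ValueError, ArithmeticError):
        return "skip"  # malformed quote
    return "pay" if currency == "USD" and price <= max_price else "skip"


if __name__ == "__main__":
    budget = Decimal("0.05")
    print(crawl_decision(402, {"Crawler-Price": "USD 0.01"}, budget))
    print(crawl_decision(402, {"Crawler-Price": "USD 0.10"}, budget))
```

Using `Decimal` rather than floats keeps the price comparison exact, which matters once micro-payments are summed across millions of fetches.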

Constitutional / character-based defence

What it does: Train the agent itself to be robust to adversarial input via principles + character training. Anthropic Constitutional Classifiers (Feb 2025): jailbreak success 86% → 4.4%. Claude Opus 4.5 prompt-injection: 10.8% → 1.4% on browser-use.

Status: Production at Anthropic, with a cited 23.7% compute overhead. Best 2026 baseline.

Honest critique: Defeated by output-obfuscation and adaptive attacks. Red-teamed by CAISI and AISI, yet Claude Opus 4.5 still leaks at 1.4%. Not zero. Treats the symptom (the agent's vulnerable behaviour), not the cause (the substrate is hostile).

ii · the architectural move

Witness-lattice retrieval applied externally.

The MNEMA paper (gentic.news, 2026) proposed a witness lattice for multi-agent AI memory: every claim is signed, scoped, refutable, with cryptographic identity per writer and a closed-form bound on undetected memory poisoning.

The architectural move proposed here is simple: apply the same primitive to external retrieval. Treat every retrieved page as a claim from a witness. Verify the witness identity (C2PA). Score the witness reliability (reputation layer). Require multiple independent witnesses to corroborate (multi-source). Bound the failure mode mathematically the way MNEMA bounds memory corruption.
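The four steps can be sketched as one filter, under loud assumptions. `identity_verified` stands in for a real C2PA check, `reliability` for a reputation score, and the returned number is a toy bound that holds only if witnesses fail independently. It is not the MNEMA closed-form bound, which is not reproduced here.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Witness:
    origin: str              # who controls the page (assumed independence key)
    identity_verified: bool  # stand-in for a provenance signature check
    reliability: float       # reputation score in [0, 1]
    claim: str               # normalised claim extracted from the page


def ground_claims(witnesses, min_witnesses=2, min_reliability=0.5):
    """Map each sufficiently corroborated claim to a toy failure bound.

    The bound is the product of (1 - reliability) over distinct origins,
    i.e. the chance that every witness is wrong, assuming independence.
    """
    by_claim = {}  # claim -> {origin: best reliability seen}
    for w in witnesses:
        if not w.identity_verified or w.reliability < min_reliability:
            continue  # unverified or low-reputation witnesses do not count
        per_origin = by_claim.setdefault(w.claim, {})
        per_origin[w.origin] = max(per_origin.get(w.origin, 0.0), w.reliability)
    return {
        claim: prod(1.0 - r for r in per_origin.values())
        for claim, per_origin in by_claim.items()
        if len(per_origin) >= min_witnesses
    }


if __name__ == "__main__":
    pages = [
        Witness("o1", True, 0.9, "x"),
        Witness("o2", True, 0.8, "x"),
        Witness("o3", False, 0.99, "y"),
        Witness("o1", True, 0.9, "y"),
    ]
    print(ground_claims(pages))
```

In this sketch the claim "y" is dropped: one of its two witnesses is unverified, so only one origin remains. The independence assumption is the weak point, for the same reason the cycle attack works against corroboration.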

The same protection that closes the silent-corruption gap inside an agent's memory should close the silent-corruption gap at the boundary where an agent reads the world.

This is the move no frontier lab has yet shipped end-to-end. Components exist (Anthropic Citations API is one piece, Brave ZDR is another, C2PA is a third). The integration is the open work.

closing

The fix is real, layered, and not single-shot. The next decade is about who ships it.

Most lab manifestos end with abstractions. This one ends with a buildable list. Seven layers. Status of each. Honest critique of each. None is sufficient. The composite is sufficient if all seven mature together — and that is the work of the next decade for the AI-retrieval industry.

The lab's contribution will be the witness-lattice integration — applying MNEMA to external retrieval — because nobody else is doing it under a single name.

The substrate can be rebuilt. It will not rebuild itself. And it will not rebuild from any single layer.
