For each layer: what it does, current deployment status, and where it fails. No sugar-coating. The composite is real but the parts are partial.
What it does: Cryptographic signing of content at capture or generation. Adobe-led standard. Hardware support in Leica, Sony, Nikon, Galaxy S25, Pixel 10. Browser display via Microsoft Edge and the Digimarc / Truepic extensions. EU AI Act Article 50 requires machine-readable marking of AI-generated content from August 2026, though it does not name a specific standard.
Status: Spec v2.2 (May 2025). 6,000+ Content Authenticity Initiative members. Adopted by Meta (360M+ pieces labeled in one 29-day window), OpenAI, Microsoft, Google.
Honest critique: Opt-in. Bad actors don't sign. Midjourney ships zero AI provenance metadata. Nikon Z6 III had a signing vulnerability that invalidated every credential issued before the fix. The standard is real and growing; the coverage is thin and unevenly enforced.
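To make the opt-in gap concrete, here is a minimal Python sketch of content-bound signing. It is not the standard itself: real content credentials use certificate-based asymmetric signatures and a structured manifest, whereas this stand-in uses an HMAC with a shared key from the standard library. The function names, manifest fields, and key are all hypothetical.

```python
import hashlib
import hmac
import json


def sign_asset(content: bytes, claims: dict, key: bytes) -> dict:
    """Bind provenance claims to the exact bytes of the asset (HMAC stand-in)."""
    payload = {"content_sha256": hashlib.sha256(content).hexdigest(), "claims": claims}
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_asset(content: bytes, manifest, key: bytes) -> str:
    """Return 'valid', 'invalid', or 'unsigned' for an asset and its manifest."""
    if manifest is None:
        # Opt-in gap: no manifest means no evidence in either direction.
        return "unsigned"
    body = json.dumps(manifest["payload"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return "invalid"  # manifest was altered or signed with another key
    if manifest["payload"]["content_sha256"] != hashlib.sha256(content).hexdigest():
        return "invalid"  # content was edited after signing
    return "valid"


key = b"demo-key"                      # hypothetical shared key
photo = b"raw image bytes"
manifest = sign_asset(photo, {"tool": "example-camera"}, key)
print(verify_asset(photo, manifest, key))         # valid
print(verify_asset(photo + b"!", manifest, key))  # invalid
print(verify_asset(photo, None, key))             # unsigned
```

The "unsigned" branch is the critique in miniature: a verifier can show that a signed asset is intact, but a missing signature says nothing about where an asset came from.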
What it does: Statistical signature embedded in generated text, image, audio, and video, detectable by a paired classifier without meaningfully changing the content. Google DeepMind reports 10B+ pieces watermarked (Dec 2025). Unified SynthID Detector launched in Gemini Nov 2025.
Status: Production at Google scale. SynthID-Text in Gemini, SynthID-Image in Imagen, audio in Lyria, video in Veo.
Honest critique: Defeated. UnMarker paper (IEEE S&P 2025): 79% image-watermark removal. ETH Zurich SRI Lab: text watermark detection trivial in black-box setting. Paraphrasing + back-translation degrades text watermarks significantly. Schneier: 'Watermarking alone cannot meet the challenge.'
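The idea of a statistical signature with a paired detector can be illustrated with a toy text watermark. This sketch is not SynthID; it uses a generic keyed "green-list" scheme from the academic watermarking literature, and the key, vocabulary, and `GAMMA` value are assumptions chosen for the demo.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of tokens counted as "green" at each step


def is_green(prev_token: str, token: str, key: str) -> bool:
    """Pseudo-randomly assign (prev_token, token) to the green list via a keyed hash."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA


def embed(vocab: list[str], length: int, key: str, start: str = "<s>") -> list[str]:
    """Toy generator: always emit the first green candidate.

    A real model would only bias its sampling toward green tokens.
    """
    tokens = [start]
    for _ in range(length):
        prev = tokens[-1]
        choice = next((t for t in vocab if is_green(prev, t, key)), vocab[0])
        tokens.append(choice)
    return tokens


def z_score(tokens: list[str], key: str) -> float:
    """Detector: how far the green-token count sits above chance, in standard deviations."""
    n = len(tokens) - 1  # number of (prev, token) pairs
    if n <= 0:
        return 0.0
    hits = sum(is_green(a, b, key) for a, b in zip(tokens, tokens[1:]))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


vocab = [f"w{i}" for i in range(64)]   # hypothetical vocabulary
marked = embed(vocab, 40, key="demo-key")
print(z_score(marked, key="demo-key"))  # large positive score for watermarked text
```

The same arithmetic shows why paraphrasing hurts: every token a rewriter replaces is green only with probability `GAMMA`, so the hit count drifts back toward chance and the score falls toward zero.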
What it does: Score every retrievable source for trustworthiness based on: historical accuracy, contradiction graphs, recency, authority, signed credentials. Apply weights at retrieval time. CONFACT (IJCAI 2025), ReliabilityRAG (arXiv 2509.23519).
Status: Research only. No production frontier-lab agent does this rigorously at scale yet.
Honest critique: Adversarial: anyone who knows the scoring rules games them (Goodhart). Centralisation risk: who decides reputation? Multi-source corroboration is more robust but more expensive per query.
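A minimal sketch of applying trust weights at retrieval time, assuming each signal has already been normalised to [0, 1]. The weights, field names, and the multiply-by-relevance rule are illustrative assumptions, not the method of CONFACT or ReliabilityRAG.

```python
from dataclasses import dataclass

# Hypothetical weights: the signals are the ones listed above, the numbers are not.
WEIGHTS = {
    "historical_accuracy": 0.35,
    "consistency": 0.25,  # 1.0 means no contradictions with other sources
    "recency": 0.15,
    "authority": 0.15,
    "signed": 0.10,       # 1.0 if the page carries a valid signed credential
}


@dataclass
class Source:
    url: str
    relevance: float  # retriever similarity in [0, 1]
    signals: dict     # signal name -> value in [0, 1]; missing signals count as 0


def trust_score(source: Source) -> float:
    """Weighted sum of the trust signals."""
    return sum(w * source.signals.get(name, 0.0) for name, w in WEIGHTS.items())


def rerank(sources: list[Source]) -> list[Source]:
    """Order retrieved sources by relevance discounted by trust."""
    return sorted(sources, key=lambda s: s.relevance * trust_score(s), reverse=True)


slop = Source("https://a.example/post", 0.9, {name: 0.2 for name in WEIGHTS})
vetted = Source("https://b.example/report", 0.7, {name: 0.9 for name in WEIGHTS})
print([s.url for s in rerank([slop, vetted])])  # vetted source ranks first
```

A fixed, published weight table like this one is exactly what the Goodhart critique targets: once the rules are known, a source can optimise for the signals rather than for accuracy.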
What it does: Don't trust any single retrieved page. Require N independent sources to corroborate before treating a claim as grounded. Maximum Independent Set on the contradiction graph. MADAM-RAG (COLM 2025). The MNEMA witness-lattice architecture applied to external retrieval.
Status: Research deployed in narrow systems. Anthropic Citations API (Jan 2025): sentence-chunked sources, +15% recall, hallucinations 10%→0% (Endex measurement). Perplexity inline source tiles. OpenAI Deep Research tool-trace.
Honest critique: Strongest current defence. Requires N parallel retrievals (cost). Vulnerable when adversary controls multiple sources (the cycle attack). Connects directly to gentic.news's MNEMA work.
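The N-source rule can be sketched as a simple gate. This is not MADAM-RAG or a Maximum Independent Set computation; it treats distinct hostnames as a crude proxy for independence, and the function names and the evidence format are assumptions.

```python
from urllib.parse import urlparse


def origin(url: str):
    """Reduce a URL to its hostname, ignoring a leading 'www.' (crude independence proxy)."""
    host = urlparse(url).hostname
    return host.removeprefix("www.") if host else None


def corroborating_origins(claim: str, evidence: list) -> set:
    """evidence is a list of (url, set_of_claims_the_page_supports) pairs."""
    return {origin(url) for url, claims in evidence if claim in claims and origin(url)}


def is_grounded(claim: str, evidence: list, n: int = 3) -> bool:
    """Treat a claim as grounded only if n distinct origins support it."""
    return len(corroborating_origins(claim, evidence)) >= n


evidence = [
    ("https://www.one.example/a", {"claim-x"}),
    ("https://one.example/b", {"claim-x"}),      # same origin as the line above
    ("https://two.example/c", {"claim-x"}),
    ("https://three.example/d", {"claim-y"}),    # supports a different claim
]
print(is_grounded("claim-x", evidence, n=3))  # False: only two distinct origins
print(is_grounded("claim-x", evidence, n=2))  # True
```

The hostname proxy is also where the cycle attack bites: an adversary who controls several hostnames passes this gate, so a stronger notion of independence (ownership, citation ancestry) is needed in practice.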
What it does: Don't query the raw web. Use a curated, signed, pre-vetted index. Brave Search API (only Zero-Data-Retention option, SOC 2 Type II, 40B-page independent index). LLM Context API endpoint. Bing/Google effectively closed for agent use.
Status: Brave is the only major independent option after the Bing Search API shutdown (Aug 2025). DuckDuckGo Instant Answer remains a free fallback.
Honest critique: Removes the wild-web attack surface but doesn't solve the corpus quality problem. Brave still indexes AI slop. Centralises trust in the curator. Pre-cleaning at scale is genuinely hard.
What it does: Pay creators when agents extract value. Cloudflare Pay-Per-Crawl (HTTP 402, launched July 2025). Perplexity Comet Plus 80/20 split, $42.5M pool. TollBit token-based scrape billing, 5,000+ publishers. OpenAI partnerships ($1M to $250M per deal).
Status: Operational at infrastructure layer. Limited adoption among AI labs (OpenAI declined Cloudflare's marketplace). The EU AI Act may eventually mandate it.
Honest critique: Fixes threat 2 economics partially. Does nothing for threats 1 or 3. Stablecoin payment rails being explored. The current AI revenue flowing to publishers is a tiny fraction of the ad revenue Google AI Overviews destroyed (~$2B/year per industry estimate).
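On the crawler side, the HTTP 402 flow reduces to a budget check. The sketch below decides whether to retry a request that was answered with 402. The header names `crawler-price` and `crawler-exact-price` and the `USD <amount>` format follow my reading of Cloudflare's public description of Pay-Per-Crawl and should be treated as assumptions to verify against current documentation; `plan_retry` and the budget parameter are hypothetical.

```python
from decimal import Decimal, InvalidOperation


def plan_retry(status: int, headers: dict, budget_usd: Decimal):
    """Return the headers for a paid retry, or None if the crawler should give up.

    Assumes the publisher quotes a price in a 'crawler-price' header such as
    'USD 0.01' and expects the retry to echo it in 'crawler-exact-price'.
    """
    if status != 402:
        return None  # nothing to pay for
    normalised = {name.lower(): value for name, value in headers.items()}
    quoted = normalised.get("crawler-price")
    if quoted is None:
        return None
    currency, _, amount = quoted.partition(" ")
    if currency != "USD":
        return None  # this sketch only budgets in dollars
    try:
        price = Decimal(amount)
    except InvalidOperation:
        return None  # unparseable quote
    if price > budget_usd:
        return None  # too expensive for this page
    return {"crawler-exact-price": quoted}


print(plan_retry(402, {"Crawler-Price": "USD 0.01"}, Decimal("0.05")))
# {'crawler-exact-price': 'USD 0.01'}
print(plan_retry(402, {"Crawler-Price": "USD 0.50"}, Decimal("0.05")))
# None
```

Using `Decimal` rather than floats keeps per-page prices exact when many small charges are summed against a budget.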
What it does: Train the agent itself to be robust to adversarial input via principles + character training. Anthropic Constitutional Classifiers (Feb 2025): jailbreak success 86% → 4.4%. Claude Opus 4.5 prompt-injection attack success: 10.8% → 1.4% on browser-use.
Status: Production at Anthropic. Reported compute overhead: 23.7%. Best 2026 baseline.
Honest critique: Defeated by output-obfuscation and adaptive attacks. Red-teamed by CAISI/AISI, but Claude Opus 4.5 still leaks at 1.4%, which is not zero. Treats the symptom (the agent's vulnerable behaviour), not the cause (the substrate is hostile).