Threat 1 · sub-page of When Agents Read

Poisoned Pages.

Pages designed to trick the agent. The OWASP #1 LLM vulnerability for two consecutive years. And, per OpenAI's CISO publicly in December 2025: “remains a frontier, unsolved security problem.”

Every agent that reads the web is reading attacker-controlled input. The UK National Cyber Security Centre says it “may never be totally mitigated.” This sub-page collects the evidence — 11 production incidents in 2025, six attack techniques, six defence approaches and their honest scorecard.

tl;dr · 60 seconds

01Indirect prompt injection (IPI) is the attack class where instructions hidden in retrieved content take control of an LLM agent. Foundational paper: Greshake et al. arXiv:2302.12173 (Feb 2023).
02OWASP LLM Top 10: #1 vulnerability, two consecutive editions (2024, 2025). Treated as architectural flaw, not patchable bug.
0311 production incidents in 2025 — Comet, Atlas, Amp Code, Gemini, Claude Chrome, Sourcegraph. Multiple critical.
04OpenAI CISO Dane Stuckey (Dec 2025): “remains a frontier, unsolved security problem.” UK NCSC: “may never be totally mitigated.”
05Best 2026 defence (Claude Opus 4.5 browser-use): 10.8% → 1.4% attack success. SOTA. Still not zero.

i · the 2025 incident timeline

Eleven documented production attacks.

Not theoretical. Not lab demos. Real attacks against shipped agent products, disclosed in 2025 by named security researchers and frontier-lab vendors themselves.

Date	Attack	Target	Vector	Severity	Source
Feb 2025	Zero-Interaction Exfiltration	ChatGPT Operator	Hidden GitHub instructions	high	Johann Rehberger
Aug 2025	Comet Prompt Injection	Perplexity Comet	'Summarize this webpage' feeds raw content to LLM	critical	Brave disclosure
Aug 2025	Scamlexity	Perplexity Comet	Buys from fake stores via injection	high	Guardio Labs
Sep 2025	Gemini Trifecta	Google Gemini	Sensitive data leaked via background API calls	high	Tenable
Oct 2025	CometJacking	Perplexity Comet	URL-param hijack, Base64 exfiltrates Gmail/Calendar	critical	LayerX
Oct 2025	Tainted Memories	ChatGPT Atlas	CSRF poisoning long-term memory	critical	LayerX
Nov 2025	HashJack	Perplexity Comet	Injection via URL fragments after #	medium	Cato Networks
Nov 2025	Sourcegraph Amp Code	Amp Code	Unicode Tags triggered grep → exfil via markdown image URL	high	Embrace The Red
Dec 2025	In-the-Wild IDPI	Ad-review system	Indirect injection bypassing automated review	high	Palo Alto Unit 42
Dec 2025	ShadowPrompt	Claude Chrome extension	DOM-XSS in a-cdn.claude.ai	critical	Arkose Labs
Dec 2025	CISO public admission	Industry-wide	OpenAI's Dane Stuckey: 'remains a frontier, unsolved security problem'	critical	TechCrunch

ii · attack techniques

Six ways a page hides instructions for the agent.

Hidden HTML text

Text in pages styled invisible to humans (white-on-white, opacity:0, position absolute off-screen, display:none) but tokenised normally by LLMs reading raw HTML.

Unicode Tag characters (U+E0000-E007F)

Invisible to humans, tokenised by LLMs. Hugo Batista's unicode-injection PoC on GitHub. Used in Amp Code 2025 incident to exfiltrate env vars.

CSS-based smuggling

data-* attributes, base64 runtime decode into hidden DOM, canvas OCR-readable elements. Unit 42 (Dec 2025) catalogued the full palette.

URL fragment injection (HashJack)

Content after # in URLs not always sent to server but processed by client-side LLM browsers. Cato Networks Nov 2025.

Faint-text image injection

Light-blue text on yellow background in images. Brave's 'Unseeable Screenshot Injections' attack on Comet.

Indirect via summarisation

Page asks LLM to 'summarise' but actual content is instruction to LLM to do something else. The Comet Aug 2025 disclosure.

iii · named RAG poisoning attacks

The academic threat model.

PoisonedRAG

Zou et al., USENIX Security 2025

90% attack success rate with 5 malicious documents injected among millions. The benchmark threat model.

ConfusedPilot

RoyChowdhury et al., UT Austin + Symmetry Systems, Oct 2024

Targets Microsoft 365 Copilot. Persists after malicious content removal. Threatens 65% of Fortune 500.

POISONCRAFT / PoisonArena / ADMIT / Poisoned-MRAG / GragPoison / RIPRAG

Multiple labs

Variants for multimodal, GraphRAG, fact-checking systems. The attack surface generalises beyond text.

RAG Security Bench (RSB)

arXiv 2505.18543, May 2025

13 attacks × 7 defences benchmark. All existing defences fail under at least one attack vector.

iv · the defence scorecard

Six defences. Each partial. None complete.

Every published defence improves the baseline. None reaches zero. Under adaptive attack most fall significantly. This is what the field actually has in 2026.

Spotlighting (Microsoft)

Research deployed

Drops ASR from >50% to <2% per Hines et al. arXiv:2403.14720. Defeated under adaptive attack.

Instruction Hierarchy (OpenAI)

Production

Privileged-instruction training. Partial defence; reconstruction attacks remain.

Constitutional Classifiers (Anthropic)

Production · Claude 4+

86% → 4.4% jailbreak success. +23.7% compute. CAISI/AISI red-teamed. Defeated by output-obfuscation attacks.

PromptArmor (arXiv 2507.15219)

Research

Adversarial training framework. Not yet production-deployed.

LLMail-Inject benchmark

Microsoft challenge, Dec 2024

All entered defences fail under adaptive attack.

Claude Opus 4.5 browser-use

Production · Nov 2025

10.8% → 1.4% prompt-injection success per Anthropic transparency disclosure. SOTA. Still not zero.

closing

This is the threat you cannot patch. It is the threat you can only contain.

Indirect prompt injection is structurally similar to SQL injection circa 1998 — a fundamental flaw in how the system treats trusted vs untrusted input. SQL got parameterised queries. LLMs have no equivalent. The model sees instructions and content as the same token stream.

The honest 2026 position: containment, not elimination. Multi-source corroboration (the witness-lattice move). Sandboxed retrieval. Signed content from trusted sources. And — this is where threat 2 enters — accepting that the high-trust sources may stop being publicly available at all.

continue the section

↑ hub

When Agents Read

The unified frame and the flywheel.

threat 2 →

Withheld Knowledge

Why creators stop sharing. The AI tax.

threat 3 →

The Slop Tide

When AI reads AI. The collapse science.

the fix →

What needs to be built

C2PA, SynthID, witness-lattice retrieval.