Poisoned Pages.
Pages designed to trick the agent. The OWASP #1 LLM vulnerability for two consecutive years. And, per OpenAI's CISO publicly in December 2025: “remains a frontier, unsolved security problem.”
Every agent that reads the web is reading attacker-controlled input. The UK National Cyber Security Centre says it “may never be totally mitigated.” This sub-page collects the evidence — 11 production incidents in 2025, six attack techniques, six defence approaches and their honest scorecard.
- 01Indirect prompt injection (IPI) is the attack class where instructions hidden in retrieved content take control of an LLM agent. Foundational paper: Greshake et al. arXiv:2302.12173 (Feb 2023).
- 02OWASP LLM Top 10: #1 vulnerability, two consecutive editions (2024, 2025). Treated as architectural flaw, not patchable bug.
- 0311 production incidents in 2025 — Comet, Atlas, Amp Code, Gemini, Claude Chrome, Sourcegraph. Multiple critical.
- 04OpenAI CISO Dane Stuckey (Dec 2025): “remains a frontier, unsolved security problem.” UK NCSC: “may never be totally mitigated.”
- 05Best 2026 defence (Claude Opus 4.5 browser-use): 10.8% → 1.4% attack success. SOTA. Still not zero.
Eleven documented production attacks.
Not theoretical. Not lab demos. Real attacks against shipped agent products, disclosed in 2025 by named security researchers and frontier-lab vendors themselves.
| Date | Attack | Target | Vector | Severity | Source |
|---|---|---|---|---|---|
| Feb 2025 | Zero-Interaction Exfiltration | ChatGPT Operator | Hidden GitHub instructions | high | Johann Rehberger |
| Aug 2025 | Comet Prompt Injection | Perplexity Comet | 'Summarize this webpage' feeds raw content to LLM | critical | Brave disclosure |
| Aug 2025 | Scamlexity | Perplexity Comet | Buys from fake stores via injection | high | Guardio Labs |
| Sep 2025 | Gemini Trifecta | Google Gemini | Sensitive data leaked via background API calls | high | Tenable |
| Oct 2025 | CometJacking | Perplexity Comet | URL-param hijack, Base64 exfiltrates Gmail/Calendar | critical | LayerX |
| Oct 2025 | Tainted Memories | ChatGPT Atlas | CSRF poisoning long-term memory | critical | LayerX |
| Nov 2025 | HashJack | Perplexity Comet | Injection via URL fragments after # | medium | Cato Networks |
| Nov 2025 | Sourcegraph Amp Code | Amp Code | Unicode Tags triggered grep → exfil via markdown image URL | high | Embrace The Red |
| Dec 2025 | In-the-Wild IDPI | Ad-review system | Indirect injection bypassing automated review | high | Palo Alto Unit 42 |
| Dec 2025 | ShadowPrompt | Claude Chrome extension | DOM-XSS in a-cdn.claude.ai | critical | Arkose Labs |
| Dec 2025 | CISO public admission | Industry-wide | OpenAI's Dane Stuckey: 'remains a frontier, unsolved security problem' | critical | TechCrunch |
Six ways a page hides instructions for the agent.
Text in pages styled invisible to humans (white-on-white, opacity:0, position absolute off-screen, display:none) but tokenised normally by LLMs reading raw HTML.
Invisible to humans, tokenised by LLMs. Hugo Batista's unicode-injection PoC on GitHub. Used in Amp Code 2025 incident to exfiltrate env vars.
data-* attributes, base64 runtime decode into hidden DOM, canvas OCR-readable elements. Unit 42 (Dec 2025) catalogued the full palette.
Content after # in URLs not always sent to server but processed by client-side LLM browsers. Cato Networks Nov 2025.
Light-blue text on yellow background in images. Brave's 'Unseeable Screenshot Injections' attack on Comet.
Page asks LLM to 'summarise' but actual content is instruction to LLM to do something else. The Comet Aug 2025 disclosure.
The academic threat model.
90% attack success rate with 5 malicious documents injected among millions. The benchmark threat model.
Targets Microsoft 365 Copilot. Persists after malicious content removal. Threatens 65% of Fortune 500.
Variants for multimodal, GraphRAG, fact-checking systems. The attack surface generalises beyond text.
13 attacks × 7 defences benchmark. All existing defences fail under at least one attack vector.
Six defences. Each partial. None complete.
Every published defence improves the baseline. None reaches zero. Under adaptive attack most fall significantly. This is what the field actually has in 2026.
Drops ASR from >50% to <2% per Hines et al. arXiv:2403.14720. Defeated under adaptive attack.
Privileged-instruction training. Partial defence; reconstruction attacks remain.
86% → 4.4% jailbreak success. +23.7% compute. CAISI/AISI red-teamed. Defeated by output-obfuscation attacks.
Adversarial training framework. Not yet production-deployed.
All entered defences fail under adaptive attack.
10.8% → 1.4% prompt-injection success per Anthropic transparency disclosure. SOTA. Still not zero.
This is the threat you cannot patch. It is the threat you can only contain.
Indirect prompt injection is structurally similar to SQL injection circa 1998 — a fundamental flaw in how the system treats trusted vs untrusted input. SQL got parameterised queries. LLMs have no equivalent. The model sees instructions and content as the same token stream.
The honest 2026 position: containment, not elimination. Multi-source corroboration (the witness-lattice move). Sandboxed retrieval. Signed content from trusted sources. And — this is where threat 2 enters — accepting that the high-trust sources may stop being publicly available at all.