Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…
Threat 1 · sub-page of When Agents Read

Poisoned Pages.

Pages designed to trick the agent. The OWASP #1 LLM vulnerability for two consecutive years. And, per OpenAI's CISO publicly in December 2025: “remains a frontier, unsolved security problem.”

Every agent that reads the web is reading attacker-controlled input. The UK National Cyber Security Centre says it “may never be totally mitigated.” This sub-page collects the evidence — 11 production incidents in 2025, six attack techniques, six defence approaches and their honest scorecard.

tl;dr · 60 seconds
  • 01Indirect prompt injection (IPI) is the attack class where instructions hidden in retrieved content take control of an LLM agent. Foundational paper: Greshake et al. arXiv:2302.12173 (Feb 2023).
  • 02OWASP LLM Top 10: #1 vulnerability, two consecutive editions (2024, 2025). Treated as architectural flaw, not patchable bug.
  • 0311 production incidents in 2025 — Comet, Atlas, Amp Code, Gemini, Claude Chrome, Sourcegraph. Multiple critical.
  • 04OpenAI CISO Dane Stuckey (Dec 2025): “remains a frontier, unsolved security problem.” UK NCSC: “may never be totally mitigated.”
  • 05Best 2026 defence (Claude Opus 4.5 browser-use): 10.8% → 1.4% attack success. SOTA. Still not zero.
i · the 2025 incident timeline

Eleven documented production attacks.

Not theoretical. Not lab demos. Real attacks against shipped agent products, disclosed in 2025 by named security researchers and frontier-lab vendors themselves.

DateAttackTargetVectorSeveritySource
Feb 2025Zero-Interaction ExfiltrationChatGPT OperatorHidden GitHub instructionshighJohann Rehberger
Aug 2025Comet Prompt InjectionPerplexity Comet'Summarize this webpage' feeds raw content to LLMcriticalBrave disclosure
Aug 2025ScamlexityPerplexity CometBuys from fake stores via injectionhighGuardio Labs
Sep 2025Gemini TrifectaGoogle GeminiSensitive data leaked via background API callshighTenable
Oct 2025CometJackingPerplexity CometURL-param hijack, Base64 exfiltrates Gmail/CalendarcriticalLayerX
Oct 2025Tainted MemoriesChatGPT AtlasCSRF poisoning long-term memorycriticalLayerX
Nov 2025HashJackPerplexity CometInjection via URL fragments after #mediumCato Networks
Nov 2025Sourcegraph Amp CodeAmp CodeUnicode Tags triggered grep → exfil via markdown image URLhighEmbrace The Red
Dec 2025In-the-Wild IDPIAd-review systemIndirect injection bypassing automated reviewhighPalo Alto Unit 42
Dec 2025ShadowPromptClaude Chrome extensionDOM-XSS in a-cdn.claude.aicriticalArkose Labs
Dec 2025CISO public admissionIndustry-wideOpenAI's Dane Stuckey: 'remains a frontier, unsolved security problem'criticalTechCrunch
ii · attack techniques

Six ways a page hides instructions for the agent.

Hidden HTML text

Text in pages styled invisible to humans (white-on-white, opacity:0, position absolute off-screen, display:none) but tokenised normally by LLMs reading raw HTML.

Unicode Tag characters (U+E0000-E007F)

Invisible to humans, tokenised by LLMs. Hugo Batista's unicode-injection PoC on GitHub. Used in Amp Code 2025 incident to exfiltrate env vars.

CSS-based smuggling

data-* attributes, base64 runtime decode into hidden DOM, canvas OCR-readable elements. Unit 42 (Dec 2025) catalogued the full palette.

URL fragment injection (HashJack)

Content after # in URLs not always sent to server but processed by client-side LLM browsers. Cato Networks Nov 2025.

Faint-text image injection

Light-blue text on yellow background in images. Brave's 'Unseeable Screenshot Injections' attack on Comet.

Indirect via summarisation

Page asks LLM to 'summarise' but actual content is instruction to LLM to do something else. The Comet Aug 2025 disclosure.

iii · named RAG poisoning attacks

The academic threat model.

PoisonedRAG
Zou et al., USENIX Security 2025

90% attack success rate with 5 malicious documents injected among millions. The benchmark threat model.

ConfusedPilot
RoyChowdhury et al., UT Austin + Symmetry Systems, Oct 2024

Targets Microsoft 365 Copilot. Persists after malicious content removal. Threatens 65% of Fortune 500.

POISONCRAFT / PoisonArena / ADMIT / Poisoned-MRAG / GragPoison / RIPRAG
Multiple labs

Variants for multimodal, GraphRAG, fact-checking systems. The attack surface generalises beyond text.

RAG Security Bench (RSB)
arXiv 2505.18543, May 2025

13 attacks × 7 defences benchmark. All existing defences fail under at least one attack vector.

iv · the defence scorecard

Six defences. Each partial. None complete.

Every published defence improves the baseline. None reaches zero. Under adaptive attack most fall significantly. This is what the field actually has in 2026.

Research deployed

Drops ASR from >50% to <2% per Hines et al. arXiv:2403.14720. Defeated under adaptive attack.

Privileged-instruction training. Partial defence; reconstruction attacks remain.

86% → 4.4% jailbreak success. +23.7% compute. CAISI/AISI red-teamed. Defeated by output-obfuscation attacks.

PromptArmor (arXiv 2507.15219)
Research

Adversarial training framework. Not yet production-deployed.

LLMail-Inject benchmark
Microsoft challenge, Dec 2024

All entered defences fail under adaptive attack.

Production · Nov 2025

10.8% → 1.4% prompt-injection success per Anthropic transparency disclosure. SOTA. Still not zero.

closing

This is the threat you cannot patch. It is the threat you can only contain.

Indirect prompt injection is structurally similar to SQL injection circa 1998 — a fundamental flaw in how the system treats trusted vs untrusted input. SQL got parameterised queries. LLMs have no equivalent. The model sees instructions and content as the same token stream.

The honest 2026 position: containment, not elimination. Multi-source corroboration (the witness-lattice move). Sandboxed retrieval. Signed content from trusted sources. And — this is where threat 2 enters — accepting that the high-trust sources may stop being publicly available at all.

continue the section