Threat 3 · sub-page of When Agents Read

The Slop Tide.

When AI reads itself. Three-quarters of newly created webpages contain AI text. Model-collapse research reports that a synthetic-data fraction as small as 1/1000 can trigger irreversible degradation. The corpus is becoming the system.

What fills the gap when creators withhold? AI slop. Nine measurements, four pieces of model-collapse science, six secondary signals. The 2024-2026 evidence base.

tl;dr · 60 seconds

01 · 74.2% of newly created webpages contain AI text (Ahrefs, April 2025). 2.5% pure AI, 71.7% mixed. The substrate is now hybrid by default.
02 · 3,006 AI content farms tracked by NewsGuard across 16 languages (May 2026). Up from 125 in May 2023: 24× in three years.
03 · Model collapse is real and proven. Shumailov Nature 2024 + Strong Model Collapse ICLR 2025: 1/1000 synthetic-data fraction is enough to trigger irreversible degradation.
04 · The dead internet has gone mainstream. Sam Altman: 'really a lot of LLM-run Twitter accounts now.' Alexis Ohanian: 'so much of the internet is now just dead.'
05 · Mitigation requires anchor sets. Gerstgrasser et al.: accumulating real + synthetic prevents collapse. But the real content (threat 2) is moving behind walls.

i · the measurements

Nine numbers tell the story.

74.2%

of newly created webpages contain AI-generated text (2.5% pure AI, 71.7% mixed)

Ahrefs · April 2025

17.31%

of Google top-20 search results AI-generated (peak 19.56% July 2025)

Originality.ai · September 2025

14.7%

of Reddit posts likely AI-generated · up 146% since 2021

Originality.ai · 2025

3,006

AI content farm sites tracked, across 16 languages · up from 125 in May 2023

NewsGuard · May 2026

35%

rate at which chatbots repeat false claims from the corpus (up from ~18% Aug 2024)

NewsGuard · August 2025

of front-page Amazon reviews in beauty category are AI-generated · 80% MoM growth since June 2023

Pangram Labs · 2024

82%

of herbal-remedy books on Amazon are AI-generated

Originality.ai · 2024-2025

~33%

of all web traffic is bots (Imperva's estimate is higher: ~50% non-human, ~20% 'bad bots')

Cloudflare · 2025

Bot > human

predicted traffic crossover

Industry estimate (Matthew Prince) · by 2027
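
The arithmetic behind two of these figures can be checked directly. The sketch below uses only numbers quoted above. The annualised rate assumes smooth compounding between the two NewsGuard snapshots, which is an illustrative assumption and not something NewsGuard reports. The variable and function names are hypothetical.

```python
# Figures as quoted in the measurements above; nothing here is new data.
pure_ai, mixed = 2.5, 71.7            # % of new pages (Ahrefs, April 2025)
farms_2023, farms_2026 = 125, 3006    # NewsGuard, May 2023 and May 2026
years_between = 3


def growth_factor(start, end):
    """Overall multiplier between two counts."""
    return end / start


def annualised_factor(start, end, years):
    """Constant yearly multiplier that would turn start into end.

    Assumes smooth compounding, which real growth rarely follows.
    """
    return (end / start) ** (1 / years)


# Pure-AI and mixed pages together give the headline share.
total_ai_share = round(pure_ai + mixed, 1)

farm_growth = growth_factor(farms_2023, farms_2026)
farm_yearly = annualised_factor(farms_2023, farms_2026, years_between)

print(f"pages containing AI text: {total_ai_share}%")
print(f"content-farm growth: {farm_growth:.1f}x overall, {farm_yearly:.2f}x per year")
```

On these figures the farm count grew about 24× overall, which would be a little under 3× per year if the growth were smooth.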

ii · the science

Model collapse is no longer theoretical.

Four papers establish that recursively trained models degrade irreversibly, and under what conditions the damage can be contained. The 2024-2025 results moved this from speculation to active empirical concern at every frontier lab.

Shumailov et al. — 'AI models collapse when trained on recursively generated data'

Nature 631:755-759, July 2024

Foundational result. Recursive training on model output causes irreversible distribution collapse in LLMs, VAEs, and GMMs. Perplexity rose 20-28 points across generations.

Dohmatob et al. — 'Strong Model Collapse'

ICLR 2025

Even a 1/1000 synthetic-data fraction triggers collapse. Scale does not save you. This is the load-bearing 2025 result.

'Knowledge Collapse'

NeurIPS 2025

Factual accuracy degrades while fluency persists — 'confidently wrong' outputs. Quality of explanation falls before quality of prose.

Gerstgrasser et al. (arXiv 2404.01413)

2024

Accumulating (not replacing) real+synthetic data prevents collapse. The 'anchor set' must be preserved. Implication: every retrain needs 25-30% verified human content.

iii · secondary evidence

Six signals from beyond the labs.

Google AI Overviews 'eat rocks' / 'pizza glue' (May 2024)

The pizza-glue answer traced to an 11-year-old Reddit joke post; the eat-rocks answer to an Onion satire piece. A classic case of ungrounded RAG retrieval. It also triggered a feedback loop: news coverage about the bug became a source, causing AI Overviews to repeat similar errors.

NewsGuard AI false-claim rate

Chatbots repeat false claims 35% of the time (Aug 2025), up from ~18% (Aug 2024). Non-response rate fell from 31% to 0%. Models now answer more — and answer worse.

Sam Altman (Sept 2025, TIME)

'Really a lot of LLM-run Twitter accounts now.' First major frontier-lab CEO to publicly confirm the dead-internet shift.

Alexis Ohanian (Oct 2025, Fortune)

'So much of the internet is now just dead.' Reddit co-founder explicitly using the dead-internet vocabulary.

404 Media tracking

One AI Reel was viewed 362M times, tens of times more than every 404 Media article combined. AI slop mentions: 461K (2024) → 2.4M (Nov 2025), a 9× increase.

Academic publishing

Retraction Watch: 46.3% of 325 AI-related retractions in 2023, 22.7% in 2024. Tell-tale phrases ('As of my last knowledge update', 'regenerate response') caught dozens of papers. Only 24% of top publishers have explicit GenAI guidelines.
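
The tell-tale-phrase screen is simple to sketch. The example below checks a text for the two phrases quoted above and nothing more. The function name and phrase list are hypothetical; a real screen would need a longer, curated list and human review, since a match only flags a passage for inspection.

```python
import re

# Only the two phrases quoted in the text above; an illustrative list,
# not a vetted detector.
TELL_TALE_PHRASES = (
    "as of my last knowledge update",
    "regenerate response",
)


def find_tell_tale_phrases(text: str) -> list[str]:
    """Return the listed phrases that occur in text.

    Matching ignores case and treats any run of whitespace (including
    line breaks from PDF extraction) as a single space.
    """
    normalized = re.sub(r"\s+", " ", text).lower()
    return [phrase for phrase in TELL_TALE_PHRASES if phrase in normalized]


sample = "As of my last\nknowledge update, no large trials had been reported."
print(find_tell_tale_phrases(sample))
```

A plain substring match like this catches only verbatim leftovers; it cannot detect AI-written text that has been edited.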

iv · the dead internet, mainstreamed

From fringe theory to industry consensus.

The Dead Internet theory was a half-joke conspiracy of the early 2020s — “most of the internet is bots talking to bots.” In 2024-2026 it crossed over.

The Atlantic seeded the academic term in 2021 (“wrong but feels true”). The Guardian, Fortune, Forbes, Business Insider and Fast Company all ran 2024-25 explainers. Ed Zitron (50K+ Substack subscribers, “Where's Your Ed At”) helped popularise Habsburg AI: generic slop from models that have inbred on their own outputs.

The CEO of OpenAI publicly confirmed the shift in September 2025. The co-founder of Reddit confirmed it in October. The Internet Archive's storage costs spiked because hyperscalers buying drives for AI training pushed 30TB HDDs to 3× normal price.

The recursive corpus has begun. Agents read pages written by agents that read pages written by agents. The original signal — verified, human, grounded — is the rare resource. The slop is the default.

closing

The corpus is the system. The system is degrading.

The slop tide is the most quantifiable of the three threats. The numbers exist and they are stark. Three-quarters of new pages. Three thousand content farms. A 1/1000 synthetic-data threshold that is now being crossed at every scale.

The defence cannot be technical alone. You cannot scale your way out of training on your own output. The only stable equilibrium requires preserving a verified human anchor set — which threat 2 is actively shrinking.

The three threats are not separate. They are the same crisis. See the flywheel on the hub.
