The Slop Tide.
When AI reads itself. Three-quarters of newly created webpages contain AI text. Model-collapse research indicates that even a 1-in-1,000 fraction of synthetic training data can trigger irreversible degradation. The corpus is becoming the system.
What fills the gap when creators withhold? AI slop. Nine measurements, four pieces of model-collapse science, six secondary signals. The 2024-2026 evidence base.
- 01 74.2% of newly created webpages contain AI text (Ahrefs, April 2025). 2.5% pure AI, 71.7% mixed. The substrate is now hybrid by default.
- 02 3,006 AI content farms tracked by NewsGuard across 16 languages (May 2026). Up from 125 in May 2023: 24× in three years.
- 03 Model collapse is real and proven. Shumailov Nature 2024 + Strong Model Collapse ICLR 2025: a 1/1000 synthetic-data fraction is enough to trigger irreversible degradation.
- 04 The dead internet has gone mainstream. Sam Altman: 'really a lot of LLM-run Twitter accounts now.' Alexis Ohanian: 'so much of the internet is now just dead.'
- 05 Mitigation requires anchor sets. Gerstgrasser et al.: accumulating real + synthetic prevents collapse. But the real content (threat 2) is moving behind walls.
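The headline ratios in these takeaways can be re-derived from the figures quoted. A minimal Python check, using only the numbers above; the variable names are illustrative, and the per-year factor assumes steady growth, which the sources do not claim:

```python
# Figures quoted in the takeaways above.
farms_may_2023 = 125
farms_may_2026 = 3006
pure_ai_pct = 2.5
mixed_pct = 71.7

# Overall growth in tracked AI content farms over the three years.
growth = farms_may_2026 / farms_may_2023

# Equivalent constant year-on-year multiplier (geometric mean over 3 years).
annual_factor = growth ** (1 / 3)

# Share of new pages containing any AI text: pure AI plus mixed.
any_ai_pct = pure_ai_pct + mixed_pct

print(f"growth: {growth:.1f}x, per year: {annual_factor:.2f}x, any AI: {any_ai_pct:.1f}%")
```

The growth ratio lands at roughly 24×, which would correspond to a little under a tripling each year if the increase had been even.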
Nine numbers tell the story.
Model collapse is no longer theoretical.
Four papers establish that recursively trained models degrade irreversibly. The 2024-2025 results moved this from speculation to active empirical concern at every frontier lab.
Shumailov et al., Nature 2024. Foundational result. Recursive training on model output causes irreversible distribution collapse in LLMs, VAEs, and GMMs. Perplexity rose 20-28 points across generations.
Strong Model Collapse, ICLR 2025. Even a 1/1000 synthetic-data fraction triggers collapse. Scale does not save you. This is the load-bearing 2025 result.
Factual accuracy degrades while fluency persists — 'confidently wrong' outputs. Quality of explanation falls before quality of prose.
Gerstgrasser et al. Accumulating real and synthetic data across generations, rather than replacing the real data, prevents collapse. The 'anchor set' must be preserved. Implication: every retrain needs 25-30% verified human content.
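The difference between replacing and accumulating data can be illustrated with a toy model. The sketch below is not a reproduction of any of the papers' experiments: it assumes a one-dimensional Gaussian standing in for a "model", a small batch per generation, and a hypothetical `run_generations` helper with a `mode` switch. Each generation fits the Gaussian to the current pool and then samples new "synthetic" data from that fit.

```python
import math
import random


def fit_gaussian(samples):
    """Maximum-likelihood fit of a 1-D Gaussian: returns (mean, std)."""
    n = len(samples)
    mean = math.fsum(samples) / n
    variance = math.fsum((x - mean) ** 2 for x in samples) / n
    return mean, math.sqrt(variance)


def run_generations(mode, generations=300, batch=20, seed=0):
    """Return the fitted std after repeated fit-then-sample rounds.

    mode="replace":    each generation trains only on the previous
                       generation's synthetic samples.
    mode="accumulate": synthetic samples are added to a growing pool that
                       still contains the original real data (the anchor set).
    """
    if mode not in ("replace", "accumulate"):
        raise ValueError("mode must be 'replace' or 'accumulate'")

    rng = random.Random(seed)
    # "Real" data: draws from a standard normal distribution.
    pool = [rng.gauss(0.0, 1.0) for _ in range(batch)]
    mean, std = fit_gaussian(pool)

    for _ in range(generations):
        synthetic = [rng.gauss(mean, std) for _ in range(batch)]
        if mode == "replace":
            pool = synthetic          # real data is discarded
        else:
            pool.extend(synthetic)    # real data stays in the pool
        mean, std = fit_gaussian(pool)
    return std


if __name__ == "__main__":
    for mode in ("replace", "accumulate"):
        print(mode, run_generations(mode))
```

In this toy setting the fitted spread in the replace regime shrinks toward zero as estimation errors compound, while the accumulate regime stays close to the spread of the original data. Real language models are far more complex, so this shows the mechanism only, not the magnitudes reported in the papers.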
Six signals from beyond the labs.
The pizza-glue answer was traced to an 11-year-old Reddit joke post, and the rocks answer to Onion satire: classic ungrounded RAG retrieval. It also triggered a feedback loop. News coverage of the bug became a source in its own right, causing AI Overviews to repeat similar errors.
Chatbots repeat false claims 35% of the time (Aug 2025), up from ~18% (Aug 2024). Non-response rate fell from 31% to 0%. Models now answer more — and answer worse.
Sam Altman: 'Really a lot of LLM-run Twitter accounts now.' First major frontier-lab CEO to publicly confirm the dead-internet shift.
Alexis Ohanian: 'So much of the internet is now just dead.' Reddit co-founder explicitly using the dead-internet vocabulary.
An AI Reel was viewed 362M times, tens of times more than every 404 Media article combined. AI slop mentions: 461K (2024) → 2.4M (Nov 2025) (9× increase).
Retraction Watch: 46.3% of 325 AI-related retractions in 2023, 22.7% in 2024. Tell-tale phrases ('As of my last knowledge update', 'regenerate response') caught dozens of papers. Only 24% of top publishers have explicit GenAI guidelines.
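The tell-tale phrases above lend themselves to a simple text scan. A minimal sketch, assuming only the two phrases quoted here; the `find_telltales` helper and the phrase list are hypothetical, not a tool used by Retraction Watch or any publisher:

```python
import re

# Only the two phrases quoted in the text; a real screen would need more.
TELLTALE_PHRASES = (
    "as of my last knowledge update",
    "regenerate response",
)


def find_telltales(text, phrases=TELLTALE_PHRASES):
    """Return the tell-tale phrases that occur in text.

    Matching ignores case and collapses runs of whitespace, so a phrase
    broken across a line wrap is still found.
    """
    normalized = re.sub(r"\s+", " ", text).casefold()
    return [phrase for phrase in phrases if phrase in normalized]


if __name__ == "__main__":
    sample = "As of my last\nknowledge update, no such trial had been run."
    print(find_telltales(sample))
```

A match is only a prompt for human review. The phrases can appear legitimately, for example in a paper that quotes chatbot output, and their absence says nothing about whether a text was machine-written.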
From fringe theory to industry consensus.
The Dead Internet theory was a half-joke conspiracy of the early 2020s — “most of the internet is bots talking to bots.” In 2024-2026 it crossed over.
The Atlantic seeded the academic term in 2021 (“wrong but feels true”). The Guardian, Fortune, Forbes, Business Insider, and Fast Company all ran explainers in 2024-25. Ed Zitron (50K+ Substack subs, “Where's Your Ed At”) helped popularize Habsburg AI, a term coined by researcher Jathan Sadowski: generic slop from models that have inbred on their own outputs.
The CEO of OpenAI publicly confirmed the shift in September 2025. The co-founder of Reddit confirmed it in October. The Internet Archive's storage costs spiked because hyperscalers buying drives for AI training pushed 30TB HDDs to 3× normal price.
The recursive corpus has begun. Agents read pages written by agents that read pages written by agents. The original signal — verified, human, grounded — is the rare resource. The slop is the default.
The corpus is the system. The system is degrading.
The slop tide is the most quantifiable of the three threats. The numbers exist and they are stark. Three-quarters of new pages. Three thousand content farms. A 1/1000 synthetic-data threshold that is now being crossed at every scale.
The defence cannot be technical alone. You cannot scale your way out of training on your own output. The only stable equilibrium requires preserving a verified human anchor set — which threat 2 is actively shrinking.
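Taking the 25-30% verified-human share cited earlier at face value, the size of the anchor set a retrain would need follows from simple arithmetic. A minimal sketch; the `min_human_docs` helper is hypothetical, and counting by documents rather than tokens is an assumption made for simplicity:

```python
def min_human_docs(synthetic_docs, human_pct):
    """Smallest number of human documents so that they make up at least
    human_pct percent of the combined (human + synthetic) corpus.

    Solves h / (h + s) >= p / 100 for h, using integer arithmetic so the
    result is rounded up exactly.
    """
    if not 0 < human_pct < 100:
        raise ValueError("human_pct must be between 0 and 100, exclusive")
    if synthetic_docs < 0:
        raise ValueError("synthetic_docs must be non-negative")
    numerator = synthetic_docs * human_pct
    denominator = 100 - human_pct
    return -(-numerator // denominator)  # ceiling division


if __name__ == "__main__":
    for pct in (25, 30):
        print(pct, min_human_docs(1_000_000, pct))
```

The required anchor set grows in proportion to the synthetic volume, which is why a shrinking supply of verified human content matters for this defence.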
The three threats are not separate. They are the same crisis. See the flywheel on the hub.