The LLM Wiki
A pattern for building personal knowledge bases using LLMs.
Instead of RAG — where every query re-derives knowledge from raw chunks — the LLM incrementally builds a persistent wiki. When you add a source, the LLM reads it, extracts information, and integrates it into existing pages. Cross-references are built. Contradictions are flagged. Knowledge compounds over time. You never write the wiki yourself — the LLM writes and maintains all of it.
Read Karpathy's original gist.
The Core Insight
How RAG Works
- Every query starts from scratch — the LLM has no memory of previous questions or synthesis
- Documents are split into chunks and stored as embeddings — no structure, no relationships
- If two documents contradict each other, the system has no way to detect or flag it
- Ask a question requiring synthesis across 5 documents — the LLM must find and piece together fragments every time
- Nothing accumulates. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.
How LLM Wiki Works
- Knowledge is compiled once into structured markdown pages, then kept current — not re-derived
- When you add a new source, the LLM updates entity pages, revises summaries, and notes where new data contradicts old claims
- Cross-references are already built. Contradictions are already flagged. Synthesis reflects everything you’ve read.
- Good answers get filed back into the wiki — your explorations compound just like ingested sources
- The wiki is a persistent, compounding artifact. It keeps getting richer with every source and every question.
Side-by-Side Comparison
| Dimension | RAG | LLM Wiki |
|---|---|---|
| Knowledge state | Re-derived every query from raw chunks | Compiled once into structured pages, kept current |
| Cross-references | None — chunks are isolated vectors | Built automatically when new sources arrive |
| Contradictions | Go undetected until a user notices | Flagged during ingestion by the LLM |
| Maintenance cost | Zero effort, but zero value accumulation | Zero effort, but compounding value over time |
| Best scale | Any number of documents | Sweet spot ~100–500 sources with deep synthesis |
| Query quality | Depends on chunk retrieval quality | Benefits from pre-built synthesis and connections |
| Human role | Upload documents, ask questions | Curate sources, direct analysis, ask good questions |
Architecture: Three Layers
Every LLM Wiki has the same three-layer structure. The LLM bridges them.
The Schema
A document — CLAUDE.md for Claude Code, AGENTS.md for Codex — that tells the LLM how the wiki is structured. It defines the conventions, page formats, and workflows the LLM should follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file: it’s what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.
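A minimal sketch of what such a schema file might contain. The section names, directory layout, and workflow steps below are illustrative assumptions, not a prescribed format; you and the LLM will evolve your own:

```markdown
# Wiki Schema

## Layout
- sources/      : immutable raw documents; read, never edit
- wiki/         : LLM-maintained entity, concept, and comparison pages
- wiki/index.md : catalog of every page with a one-line summary
- wiki/log.md   : chronological record of ingests and edits

## Ingest workflow
1. Read the new source in sources/ end to end.
2. Update or create the relevant entity and concept pages.
3. Flag any claim that contradicts an existing page.
4. Refresh cross-references, then index.md and log.md.
```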
The Wiki
A directory of LLM-generated markdown files: summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely — it creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it. Two special files help navigation: index.md is a content catalog (each page with a link and one-line summary), and log.md is a chronological record of what happened and when.
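The index can even be regenerated mechanically as a safety net. A sketch, assuming the conventions above (flat directory of `.md` pages, `index.md` and `log.md` as navigation files, first non-heading line as the summary):

```python
from pathlib import Path

def rebuild_index(wiki_dir: str) -> str:
    """Regenerate index.md: one bullet per page, linking the page
    and quoting its first non-heading line as a one-line summary."""
    lines = ["# Index", ""]
    for page in sorted(Path(wiki_dir).glob("*.md")):
        if page.name in ("index.md", "log.md"):
            continue  # navigation files catalog everything else
        summary = ""
        for raw in page.read_text(encoding="utf-8").splitlines():
            text = raw.strip()
            if text and not text.startswith("#"):
                summary = text  # first real line of prose
                break
        lines.append(f"- [{page.stem}]({page.name}): {summary}")
    content = "\n".join(lines) + "\n"
    (Path(wiki_dir) / "index.md").write_text(content, encoding="utf-8")
    return content
```

In practice the LLM keeps `index.md` current during ingest; a script like this only verifies nothing drifted.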
Raw Sources
Your curated collection of source documents: articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth. Use Obsidian Web Clipper (a browser extension) to convert web articles to markdown and drop them into your raw collection. Optionally download images locally so the LLM can reference them directly.
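One lightweight way to enforce that immutability at the filesystem level is to strip write permission from everything in the raw collection. A sketch (the directory name is an assumption):

```python
import stat
from pathlib import Path

def freeze_sources(raw_dir: str) -> int:
    """Drop write permission on every raw source file so neither a
    human nor a tool silently edits the source of truth."""
    frozen = 0
    for path in Path(raw_dir).rglob("*"):
        if path.is_file():
            mode = path.stat().st_mode
            # clear user, group, and other write bits
            path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
            frozen += 1
    return frozen
```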
Three Operations
Everything you do with an LLM Wiki falls into one of three categories.
Ingest
Add a source and watch the wiki grow
Query
Ask questions against compiled knowledge
Lint
Health-check and improve over time
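Parts of the lint operation can be ordinary code rather than an LLM call. A minimal sketch that checks cross-references, assuming a flat wiki of `.md` pages with standard markdown links (the report shape and filename conventions are assumptions):

```python
import re
from pathlib import Path

# matches markdown links whose target is a .md page, e.g. [OpenAI](openai.md)
LINK = re.compile(r"\[[^\]]*\]\(([^)#]+\.md)\)")

def lint_wiki(wiki_dir: str) -> dict:
    """One simple lint pass: report links to missing pages, and
    pages nothing links to (orphans)."""
    wiki = Path(wiki_dir)
    pages = {p.name for p in wiki.glob("*.md")}
    linked, broken = set(), []
    for page in wiki.glob("*.md"):
        for target in LINK.findall(page.read_text(encoding="utf-8")):
            name = Path(target).name
            if name in pages:
                linked.add(name)
            else:
                broken.append((page.name, target))
    orphans = pages - linked - {"index.md", "log.md"}
    return {"broken_links": broken, "orphans": sorted(orphans)}
```

Deeper checks, such as whether a summary still matches its sources, are where the LLM takes over.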
What Can You Build?
Why This Actually Works
LLMs don’t get bored, don’t forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.
A single ingested source ripples through the entire knowledge base — updating summaries, revising entity pages, strengthening or challenging the evolving synthesis.
Every connection between pages is checked and updated. No orphans, no stale links, no forgotten relationships.
“The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.”
From Memex to LLM Wiki
Vannevar Bush envisioned a personal, curated knowledge store with associative trails between documents. His vision was closer to this than to what the web became: private, actively curated, with the connections as valuable as the documents.
Retrieval-Augmented Generation. Upload documents, retrieve chunks, generate answers. Works, but no accumulation. Knowledge is re-derived every time.
The LLM Wiki. The LLM incrementally builds and maintains a persistent wiki. The part Bush couldn’t solve — who does the maintenance — the LLM handles.
gentic.news: An LLM Wiki at Scale
We built an LLM Wiki before the term existed. Here is how gentic.news maps to Karpathy's three-layer architecture.
Concrete example
When OpenAI raises $122B at $852B valuation, the Living Agent: updates OpenAI's entity page, creates a timeline event, adjusts competitive relationships with Anthropic and Google, checks 3 active predictions for new evidence, and updates the weekly intelligence briefing. All automatically. One source touches 10+ wiki pages — exactly the pattern Karpathy describes.
The Toolkit
Karpathy's recommended stack. All open-source or freely available.
Obsidian
Local-first markdown vault. Pages are plain .md files — searchable, portable, no lock-in. Graph view maps your knowledge network.
Web Clipper
Browser extension that captures web pages as clean markdown. Strips ads and boilerplate for clean source documents.
qmd
CLI tool converting PDF, DOCX, EPUB, HTML into LLM-ready markdown. Preserves headings, tables, and code blocks.
Marp
Converts wiki markdown directly into presentation slide decks. Wiki pages become polished slides in seconds.
Dataview
Obsidian plugin for querying your wiki like a database. SQL-like power for dynamic tables and cross-page summaries.
Claude + CLAUDE.md
The LLM engine with schema-driven maintenance. Claude reads CLAUDE.md for conventions, then follows them for ingest, query, and lint.
Start Building Your LLM Wiki
Fork the idea. Adapt the pattern. The tools are free, the concept is open, and the LLM does the heavy lifting. Start with one topic you care about, add a few sources, and watch the wiki grow.
Built by gentic.news — the AI intelligence platform that lives this pattern every day.