Personal Knowledge Base With the LLM Wiki Pattern

AI

2026-05-01

I read a lot of articles, papers, and gists. The way I keep them has always been a mess: tabs that pile up until I close them in batches, PDFs in Downloads, the occasional Obsidian note I never revisit. When I want to remember a specific thing weeks later ("wait, which post talked about X?"), I'm searching from scratch every time.

NotebookLM and ChatGPT file uploads sort of solve this, but not really. They retrieve. They don't compound. Ask the same question twice and the LLM rediscovers the same fragments from the same documents. Nothing accumulates.

Andrej Karpathy posted a gist called "LLM Wiki" with a different framing. Instead of querying raw documents on demand, have the LLM incrementally build and maintain a markdown wiki that sits between you and the sources. When a new article goes in, the LLM doesn't just index it for retrieval. It reads it, updates entity pages, revises summaries, flags contradictions with what's already there. The wiki is the persistent artifact. The raw sources are the input.

I set this up as brain/ and have been running it for a couple of weeks. Here's how it works.

The three layers¶

Karpathy describes three layers in the gist, and the directory layout maps directly to them:

brain/
├── CLAUDE.md          ← schema (rules)
├── index.md           ← catalog of all wiki pages
├── log.md             ← append-only operation log
├── raw/               ← source documents (immutable)
└── wiki/
    ├── overview.md
    ├── sources/       ← one summary per raw source
    ├── entities/      ← people, orgs, projects
    ├── concepts/      ← ideas, frameworks, terms
    └── topics/        ← thematic syntheses

raw/ is mine. I drop articles, papers, and fetched URLs there and never edit them. wiki/ is Claude's. I never touch it manually. CLAUDE.md is the contract between us: directory layout, frontmatter format, link conventions, and the operations I expect Claude to perform. We co-evolved that file as I figured out what worked.
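For anyone who wants to recreate the layout, here is a minimal scaffolding sketch. The directory and file names come from the tree above; the `scaffold` helper and the placeholder file contents are my own illustration, not something from Karpathy's gist.

```python
from pathlib import Path

# Directory names are taken from the tree above.
WIKI_DIRS = ["raw", "wiki/sources", "wiki/entities", "wiki/concepts", "wiki/topics"]

# Placeholder contents are an assumption; the real CLAUDE.md holds the full schema.
SEED_FILES = {
    "CLAUDE.md": "# Schema\n",
    "index.md": "# Index\n",
    "log.md": "# Log\n",
    "wiki/overview.md": "# Overview\n",
}


def scaffold(root: Path) -> list[Path]:
    """Create missing directories and seed files; existing files are never overwritten."""
    created = []
    for rel_dir in WIKI_DIRS:
        (root / rel_dir).mkdir(parents=True, exist_ok=True)
    for rel_file, text in SEED_FILES.items():
        path = root / rel_file
        if not path.exists():
            path.write_text(text, encoding="utf-8")
            created.append(path)
    return created
```

Calling `scaffold(Path("brain"))` twice is safe: the second call returns an empty list because every seed file already exists.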

Every wiki page carries YAML frontmatter:

---
title: Page Title
type: entity | concept | topic | source | overview
tags: [tag1, tag2]
sources: [filename-in-raw.md]
updated: 2026-04-30
---

Internal links use Obsidian's [[Page Title]] format so the graph view works without extra configuration. Contradictions are marked inline as blockquotes (> Contradicts: [[Other Page]]). Uncertain claims are marked [?]. None of this is novel. It's convention codified once so Claude follows it consistently across sessions.
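Because the conventions are this regular, a page can be checked mechanically. The sketch below assumes the flat `key: value` frontmatter shown above and is deliberately not a general YAML parser; the function names are hypothetical.

```python
import re

# Captures the page title inside [[...]], stopping before any "|" alias or "#" heading.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def split_frontmatter(text: str) -> tuple[dict[str, object], str]:
    """Return (metadata, body) for a page that starts with a '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text
    meta: dict[str, object] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as "tags: [tag1, tag2]".
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    return meta, "\n".join(lines[end + 1:])


def wikilinks(body: str) -> list[str]:
    """List every [[Page Title]] target in order of appearance."""
    return [title.strip() for title in WIKILINK.findall(body)]
```

A contradiction marker is just a blockquote containing a link, so `wikilinks` picks up its target like any other reference.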

Three operations¶

Everything I do with the brain falls into one of three commands.

`ingest <source>`¶

I drop a file in raw/, paste a URL, or paste raw text, and tell Claude ingest <thing>. It reads the source, has 1-3 clarifying exchanges with me about emphasis, then:

Creates wiki/sources/<slug>.md with a structured summary
Creates or updates entity pages for every person, project, or company mentioned
Creates or updates concept pages for significant ideas
Creates or updates relevant topic pages
Updates wiki/overview.md if the source shifts the bigger picture
Updates index.md
Appends an entry to log.md

A single ingest typically touches 5-15 pages. I read the diff in Obsidian as Claude writes, redirecting if a summary misses the point.
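The bookkeeping at the end of an ingest is the mechanical part. Claude does it by editing files directly, but as a rough sketch it amounts to something like the following. The slug rule and the line formats for log.md and index.md are assumptions for illustration; the post does not specify them.

```python
import re
from datetime import date
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and join runs of letters and digits with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def log_entry(op: str, detail: str, touched: list[str], day: date) -> str:
    """Format one log line (assumed format: date, operation, detail, touched pages)."""
    pages = ", ".join(f"[[{title}]]" for title in touched)
    return f"- {day.isoformat()} {op}: {detail} (touched: {pages})\n"


def index_line(title: str, page_type: str, summary: str) -> str:
    """Format one catalog line for index.md (assumed format)."""
    return f"- [[{title}]] ({page_type}): {summary}\n"


def append_log(log_path: Path, entry: str) -> None:
    """Open in append mode so earlier entries are never rewritten."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
```

Opening log.md in append mode is what keeps it append-only: past entries are never read back and rewritten.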

`query <question>`¶

I ask anything. Claude reads index.md first to find relevant pages (there is no embedding-based retrieval; the index is the retrieval), reads them, and synthesizes an answer with [[wikilink]] citations. If the answer is non-trivial (a comparison, an analysis, a new connection), Claude offers to file it as a new wiki/topics/ page. That part is the most useful, because my own questions compound back into the wiki the same way ingested sources do.
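In practice Claude picks pages by reading the index and using judgment, not by running code. Still, a crude keyword-overlap version shows why a plain-text catalog is enough at this scale. The index line format (`- [[Title]] (type): summary`) and the `candidate_pages` helper are assumptions made for this sketch.

```python
import re

# Assumed index.md line format: "- [[Title]] (type): one-line summary"
INDEX_LINE = re.compile(r"^- \[\[(?P<title>[^\]]+)\]\] \((?P<type>[^)]+)\): (?P<summary>.*)$")


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words, with no stemming or stopword removal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def candidate_pages(index_text: str, question: str, limit: int = 5) -> list[str]:
    """Rank index entries by how many question words appear in title or summary."""
    wanted = tokens(question)
    scored = []
    for line in index_text.splitlines():
        match = INDEX_LINE.match(line)
        if not match:
            continue
        overlap = len(wanted & tokens(match["title"] + " " + match["summary"]))
        if overlap:
            scored.append((overlap, match["title"]))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]
```

An LLM reading the same index does far better than word overlap, since it also matches synonyms and related ideas, but the input is identical: one short file that fits in context.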

`lint`¶

Periodic health check. Claude looks for contradictions across pages, orphans with no inbound links, concepts mentioned but lacking their own page, missing cross-references, and claims superseded by newer sources. It reports a numbered list. I approve, deny, or amend each item. Then it applies the changes.
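Two of these checks, orphans and mentioned-but-missing pages, are purely structural, so they can be sketched without an LLM. Contradictions and superseded claims still need one. This is an illustrative sketch that assumes each page declares `title:` in its frontmatter and falls back to the filename otherwise; `lint_links` is a hypothetical name.

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")
TITLE = re.compile(r"^title:\s*(.+)$", re.MULTILINE)


def lint_links(wiki_dir: Path) -> dict[str, list[str]]:
    """Report pages with no inbound links and link targets that have no page."""
    outbound: dict[str, set[str]] = {}
    for path in sorted(wiki_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        match = TITLE.search(text)
        title = match.group(1).strip() if match else path.stem
        outbound[title] = {target.strip() for target in WIKILINK.findall(text)}
    # A page linking to itself does not count as an inbound link.
    inbound = {
        target
        for source, targets in outbound.items()
        for target in targets
        if target != source
    }
    return {
        "orphans": sorted(title for title in outbound if title not in inbound),
        "missing": sorted(inbound - set(outbound)),
    }
```

A root page such as the overview will show up as an orphan unless something links to it, which is one reason each reported item gets a human approve-or-deny step.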

How I actually use it¶

The loop is small:

Read something good in the morning. ingest <url>. Spend 30 seconds reading what Claude wrote.
Hit a question during the day. query <question>. Get an answer with citations from things I've actually read, not generic LLM output.
Every 10 ingests or so, lint.

Two practical things make this work better than I expected.

The session start protocol. When I open Claude Code in brain/, the first thing it does is read index.md and log.md. I get back a one-liner: "Brain loaded. Last op: ingest adr.github.io sub-pages + Nygard 2011." No context-rebuilding tax. The wiki itself is the context.
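The one-liner is cheap to produce because log.md is append-only: the last entry is always the latest operation. A minimal sketch, assuming each log entry is a single line starting with "- " (the real log format is whatever CLAUDE.md specifies):

```python
def session_banner(log_text: str) -> str:
    """Summarize the most recent log entry; assumes one '- ' prefixed line per operation."""
    entries = [
        line.strip()
        for line in log_text.splitlines()
        if line.strip().startswith("- ")
    ]
    if not entries:
        return "Brain loaded. No operations logged yet."
    # Drop the leading "- " marker from the last entry.
    return f"Brain loaded. Last op: {entries[-1][2:]}"
```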

Obsidian on the side. Claude writes; I read in Obsidian. Graph view shows hubs visually, and clicking through [[links]] is faster than scrolling files. Pages with many inbound links are the knowledge centers. Orphans are usually a sign the wiki needs a topic page that connects them.

Why this works at all¶

Wikis fail when humans run them because the maintenance cost grows faster than the value. Updating cross-references, keeping summaries current, flagging contradictions, normalizing terminology: nobody wants to do this. It's the same reason most personal Obsidian vaults rot.

LLMs don't get bored. Touching 15 files in one pass to keep them consistent is exactly the work they're good at. The cost of maintenance is close to zero, so the wiki stays maintained, so the value compounds.

Karpathy frames it as "the wiki is the codebase, the LLM is the programmer, Obsidian is the IDE." That's accurate. My job is sourcing and asking good questions. Claude does the rest.

What's actually in mine right now¶

A few weeks in: 5 sources ingested, around a dozen wiki pages. Topics are unfocused on purpose: ADRs, the LLM Wiki pattern itself, and a handful of conceptual entries that came from the same articles. I'll probably split it later if it grows in a specific direction. The cost of starting general is low because Claude can refactor the structure on demand, which is another thing a human wouldn't do but an LLM doesn't mind.

The setup is also just a git repo of markdown. Version history, branching, and diffing all work. If I lose the LLM tomorrow, I still have the artifact.

Sources¶

LLM Wiki gist by Andrej Karpathy: the pattern this is based on
Vannevar Bush's Memex (1945): a personal, curated knowledge store with associative trails between documents, the spiritual ancestor of this idea