---
name: llm-wiki
description: "Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base. Ingest sources, query compiled knowledge, and lint for consistency."
version: 2.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [wiki, knowledge-base, research, notes, markdown, rag-alternative]
    category: research
    related_skills: [obsidian, arxiv, agentic-research-ideas]
    config:
      - key: wiki.path
        description: Path to the LLM Wiki knowledge base directory
        default: "~/wiki"
        prompt: Wiki directory path
---

# Karpathy's LLM Wiki

Build and maintain a persistent, compounding knowledge base as interlinked markdown files.
Based on [Andrej Karpathy's LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).

Unlike traditional RAG (which rediscovers knowledge from scratch per query), the wiki
compiles knowledge once and keeps it current. Cross-references are already there.
Contradictions have already been flagged. Synthesis reflects everything ingested.

**Division of labor:** The human curates sources and directs analysis. The agent
summarizes, cross-references, files, and maintains consistency.

## When This Skill Activates

Use this skill when the user:
- Asks to create, build, or start a wiki or knowledge base
- Asks to ingest, add, or process a source into their wiki
- Asks a question and an existing wiki is present at the configured path
- Asks to lint, audit, or health-check their wiki
- References their wiki, knowledge base, or "notes" in a research context

## Wiki Location

Configured via `skills.config.wiki.path` in `~/.hermes/config.yaml` (prompted
during `hermes config migrate` or `hermes setup`):

```yaml
skills:
  config:
    wiki:
      path: ~/wiki
```

Falls back to the default `~/wiki`. The resolved path is injected when this
skill loads — check the `[Skill config: ...]` block above for the active value.

The wiki is just a directory of markdown files — open it in Obsidian, VS Code, or
any editor. No database, no special tooling required.

## Architecture: Three Layers

```
wiki/
├── SCHEMA.md           # Conventions, structure rules, domain config
├── index.md            # Sectioned content catalog with one-line summaries
├── log.md              # Chronological action log (append-only, rotated past 500 entries)
├── raw/                # Layer 1: Immutable source material
│   ├── articles/       # Web articles, clippings
│   ├── papers/         # PDFs, arXiv papers
│   ├── transcripts/    # Meeting notes, interviews
│   └── assets/         # Images, diagrams referenced by sources
├── entities/           # Layer 2: Entity pages (people, orgs, products, models)
├── concepts/           # Layer 2: Concept/topic pages
├── comparisons/        # Layer 2: Side-by-side analyses
└── queries/            # Layer 2: Filed query results worth keeping
```

**Layer 1 — Raw Sources:** Immutable. The agent reads but never modifies these.
**Layer 2 — The Wiki:** Agent-owned markdown files. Created, updated, and
cross-referenced by the agent.
**Layer 3 — The Schema:** `SCHEMA.md` defines structure, conventions, and tag taxonomy.

## Resuming an Existing Wiki (CRITICAL — do this every session)

When the user has an existing wiki, **always orient yourself before doing anything**:

① **Read `SCHEMA.md`** — understand the domain, conventions, and tag taxonomy.
② **Read `index.md`** — learn what pages exist and their summaries.
③ **Scan recent `log.md`** — read the last 20-30 entries to understand recent activity.

```bash
WIKI="${wiki_path:-$HOME/wiki}"
# Orientation reads at session start
read_file "$WIKI/SCHEMA.md"
read_file "$WIKI/index.md"
read_file "$WIKI/log.md" offset=<last 30 lines>
```

Only after orientation should you ingest, query, or lint. This prevents:
- Creating duplicate pages for entities that already exist
- Missing cross-references to existing content
- Contradicting the schema's conventions
- Repeating work already logged

For large wikis (100+ pages), also run a quick `search_files` for the topic
at hand before creating anything new.

## Initializing a New Wiki

When the user asks to create or start a wiki:

1. Determine the wiki path (from config, env var, or ask the user; default `~/wiki`)
2. Create the directory structure above
3. Ask the user what domain the wiki covers — be specific
4. Write `SCHEMA.md` customized to the domain (see template below)
5. Write initial `index.md` with sectioned header
6. Write initial `log.md` with creation entry
7. Confirm the wiki is ready and suggest first sources to ingest
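
Steps 2, 5, and 6 can be sketched in Python as follows. This is a minimal illustration rather than the skill's own implementation: `init_wiki` and `SUBDIRS` are hypothetical names, the seed file contents are abbreviated versions of the templates in this document, and the `SCHEMA.md` written here is only a stub that still needs the domain-specific customization from step 4.

```python
from datetime import date
from pathlib import Path

# Directory layout taken from the Architecture section.
SUBDIRS = [
    "raw/articles", "raw/papers", "raw/transcripts", "raw/assets",
    "entities", "concepts", "comparisons", "queries",
]


def init_wiki(root: Path, domain: str) -> list[Path]:
    """Create the directory tree and seed files; return the files written."""
    root = root.expanduser()
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)

    today = date.today().isoformat()
    seeds = {
        # Stub only: the real SCHEMA.md should follow the full template.
        "SCHEMA.md": f"# Wiki Schema\n\n## Domain\n{domain}\n",
        "index.md": (
            "# Wiki Index\n\n"
            f"> Last updated: {today} | Total pages: 0\n\n"
            "## Entities\n\n## Concepts\n\n## Comparisons\n\n## Queries\n"
        ),
        "log.md": (
            "# Wiki Log\n\n"
            f"## [{today}] create | Wiki initialized\n"
            f"- Domain: {domain}\n"
            "- Structure created with SCHEMA.md, index.md, log.md\n"
        ),
    }
    written = []
    for name, text in seeds.items():
        path = root / name
        if not path.exists():  # never overwrite files of an existing wiki
            path.write_text(text, encoding="utf-8")
            written.append(path)
    return written
```

Calling `init_wiki(Path("~/wiki"), "AI/ML research")` a second time returns an empty list, because existing seed files are left untouched.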

### SCHEMA.md Template

Adapt to the user's domain. The schema constrains agent behavior and ensures consistency:

````markdown
# Wiki Schema

## Domain
[What this wiki covers — e.g., "AI/ML research", "personal health", "startup intelligence"]

## Conventions
- File names: lowercase, hyphens, no spaces (e.g., `transformer-architecture.md`)
- Every wiki page starts with YAML frontmatter (see below)
- Use `[[wikilinks]]` to link between pages (minimum 2 outbound links per page)
- When updating a page, always bump the `updated` date
- Every new page must be added to `index.md` under the correct section
- Every action must be appended to `log.md`

## Frontmatter
```yaml
---
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
type: entity | concept | comparison | query | summary
tags: [from taxonomy below]
sources: [raw/articles/source-name.md]
---
```

## Tag Taxonomy
[Define 10-20 top-level tags for the domain. Add new tags here BEFORE using them.]

Example for AI/ML:
- Models: model, architecture, benchmark, training
- People/Orgs: person, company, lab, open-source
- Techniques: optimization, fine-tuning, inference, alignment, data
- Meta: comparison, timeline, controversy, prediction

Rule: every tag on a page must appear in this taxonomy. If a new tag is needed,
add it here first, then use it. This prevents tag sprawl.

## Page Thresholds
- **Create a page** when an entity/concept appears in 2+ sources OR is central to one source
- **Add to existing page** when a source mentions something already covered
- **DON'T create a page** for passing mentions, minor details, or things outside the domain
- **Split a page** when it exceeds ~200 lines — break into sub-topics with cross-links
- **Archive a page** when its content is fully superseded — move to `_archive/`, remove from index

## Entity Pages
One page per notable entity. Include:
- Overview / what it is
- Key facts and dates
- Relationships to other entities ([[wikilinks]])
- Source references

## Concept Pages
One page per concept or topic. Include:
- Definition / explanation
- Current state of knowledge
- Open questions or debates
- Related concepts ([[wikilinks]])

## Comparison Pages
Side-by-side analyses. Include:
- What is being compared and why
- Dimensions of comparison (table format preferred)
- Verdict or synthesis
- Sources

## Update Policy
When new information conflicts with existing content:
1. Check the dates — newer sources generally supersede older ones
2. If genuinely contradictory, note both positions with dates and sources
3. Mark the contradiction in frontmatter: `contradictions: [page-name]`
4. Flag for user review in the lint report
````

### index.md Template

The index is sectioned by type. Each entry is one line: wikilink + summary.

```markdown
# Wiki Index

> Content catalog. Every wiki page listed under its type with a one-line summary.
> Read this first to find relevant pages for any query.
> Last updated: YYYY-MM-DD | Total pages: N

## Entities
<!-- Alphabetical within section -->

## Concepts

## Comparisons

## Queries
```

**Scaling rule:** When any section exceeds 50 entries, split it into sub-sections
by first letter or sub-domain. When the index exceeds 200 entries total, create
a `_meta/topic-map.md` that groups pages by theme for faster navigation.
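
The scaling rule can be checked mechanically. The sketch below assumes each index entry is a line beginning with `- [[`, which is one plausible reading of "wikilink + summary" and is not specified by the template; `index_section_counts` and `scaling_actions` are hypothetical helper names.

```python
import re


def index_section_counts(index_text: str) -> dict[str, int]:
    """Count `- [[...]]` entries under each `## Section` heading of index.md."""
    counts: dict[str, int] = {}
    section = None
    for line in index_text.splitlines():
        heading = re.match(r"##\s+(.+)", line)  # matches level-2 headings only
        if heading:
            section = heading.group(1).strip()
            counts[section] = 0
        elif section and line.lstrip().startswith("- [["):
            counts[section] += 1
    return counts


def scaling_actions(counts: dict[str, int]) -> list[str]:
    """Apply the scaling rule: over 50 per section, over 200 in total."""
    actions = [
        f"split section '{name}' ({n} entries)"
        for name, n in counts.items()
        if n > 50
    ]
    if sum(counts.values()) > 200:
        actions.append("create _meta/topic-map.md")
    return actions
```

The returned strings are only suggestions for the lint report; deciding how to split a section (by first letter or by sub-domain) remains a judgment call.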

### log.md Template

```markdown
# Wiki Log

> Chronological record of all wiki actions. Append-only.
> Format: `## [YYYY-MM-DD] action | subject`
> Actions: ingest, update, query, lint, create, archive, delete
> When this file exceeds 500 entries, rotate: rename to log-YYYY.md, start fresh.

## [YYYY-MM-DD] create | Wiki initialized
- Domain: [domain]
- Structure created with SCHEMA.md, index.md, log.md
```
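
Because the entry heading has a fixed format, appending entries and checking the rotation threshold are easy to script. The following is a sketch under that assumption; `ENTRY_RE`, `append_log`, and `needs_rotation` are hypothetical names, and the actual rename to `log-YYYY.md` is left out.

```python
import re
from datetime import date
from pathlib import Path

# Matches headings such as "## [2026-01-31] ingest | Source Title".
ENTRY_RE = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$", re.MULTILINE)
ACTIONS = {"ingest", "update", "query", "lint", "create", "archive", "delete"}


def append_log(log_path: Path, action: str, subject: str, details: list[str]) -> None:
    """Append one entry in the documented format; the log is never rewritten."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    lines = [f"\n## [{date.today().isoformat()}] {action} | {subject}"]
    lines += [f"- {detail}" for detail in details]
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")


def needs_rotation(log_path: Path, limit: int = 500) -> bool:
    """True once the log holds more than `limit` entries."""
    text = log_path.read_text(encoding="utf-8")
    return len(ENTRY_RE.findall(text)) > limit
```

The format line in the template header starts with `> `, so it is not counted as an entry.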

## Core Operations

### 1. Ingest

When the user provides a source (URL, file, paste), integrate it into the wiki:

① **Capture the raw source:**
   - URL → use `web_extract` to get markdown, save to `raw/articles/`
   - PDF → use `web_extract` (handles PDFs), save to `raw/papers/`
   - Pasted text → save to appropriate `raw/` subdirectory
   - Name the file descriptively: `raw/articles/karpathy-llm-wiki-2026.md`

② **Discuss takeaways** with the user — what's interesting, what matters for
   the domain. (Skip this in automated/cron contexts — proceed directly.)

③ **Check what already exists** — search index.md and use `search_files` to find
   existing pages for mentioned entities/concepts. This is the difference between
   a growing wiki and a pile of duplicates.

④ **Write or update wiki pages:**
   - **New entities/concepts:** Create pages only if they meet the Page Thresholds
     in SCHEMA.md (2+ source mentions, or central to one source)
   - **Existing pages:** Add new information, update facts, bump `updated` date.
     When new info contradicts existing content, follow the Update Policy.
   - **Cross-reference:** Every new or updated page must link to at least 2 other
     pages via `[[wikilinks]]`. Check that existing pages link back.
   - **Tags:** Only use tags from the taxonomy in SCHEMA.md

⑤ **Update navigation:**
   - Add new pages to `index.md` under the correct section, alphabetically
   - Update the "Total pages" count and "Last updated" date in index header
   - Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Title`
   - List every file created or updated in the log entry

⑥ **Report what changed** — list every file created or updated to the user.

A single source can trigger updates across 5-15 wiki pages. This is normal
and desired — it's the compounding effect.
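
For the file-naming part of step ①, a small helper keeps raw file names consistent with the lowercase-and-hyphens convention. This is an illustrative sketch: `slugify` and `raw_path` are hypothetical names, and appending the year is an assumption inferred from the single example `karpathy-llm-wiki-2026.md`.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Return a lowercase, hyphen-separated, ASCII-only file stem."""
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")  # drop accents and other non-ASCII marks
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def raw_path(kind: str, title: str, year: int) -> str:
    """Build a relative path such as raw/articles/<slug>-<year>.md."""
    if kind not in {"articles", "papers", "transcripts"}:
        raise ValueError(f"unknown raw subdirectory: {kind}")
    return f"raw/{kind}/{slugify(title)}-{year}.md"
```

For example, `raw_path("articles", "Karpathy: LLM Wiki", 2026)` yields `raw/articles/karpathy-llm-wiki-2026.md`.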

### 2. Query

When the user asks a question about the wiki's domain:

① **Read `index.md`** to identify relevant pages.
② **For wikis with 100+ pages**, also `search_files` across all `.md` files
   for key terms — the index alone may miss relevant content.
③ **Read the relevant pages** using `read_file`.
④ **Synthesize an answer** from the compiled knowledge. Cite the wiki pages
   you drew from: "Based on [[page-a]] and [[page-b]]..."
⑤ **File valuable answers back** — if the answer is a substantial comparison,
   deep dive, or novel synthesis, create a page in `queries/` or `comparisons/`.
   Don't file trivial lookups — only answers that would be painful to re-derive.
⑥ **Update log.md** with the query and whether it was filed.

### 3. Lint

When the user asks to lint, health-check, or audit the wiki:

① **Orphan pages:** Find pages with no inbound `[[wikilinks]]` from other pages.
```python
# Use execute_code for this — programmatic scan across all wiki pages
import os, re
from collections import defaultdict
wiki = "<WIKI_PATH>"
# Scan all .md files in entities/, concepts/, comparisons/, queries/
# Extract all [[wikilinks]] — build inbound link map
# Pages with zero inbound links are orphans
```
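
One way to fill in the outline above is sketched here. It assumes wikilinks use bare page names that match file stems (for example `[[attention]]` for `concepts/attention.md`), which this document implies but does not state; `scan_links`, `LAYER2`, and `WIKILINK` are hypothetical names. The same pass also collects links whose target page does not exist.

```python
import re
from collections import defaultdict
from pathlib import Path

LAYER2 = ("entities", "concepts", "comparisons", "queries")
# Capture the link target, stopping before any |alias or #heading.
# The lookbehind skips image embeds such as ![[image.png]].
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)")


def scan_links(wiki: Path) -> tuple[list[str], list[tuple[str, str]]]:
    """Return (orphan page stems, broken links as (source stem, target))."""
    pages = {p.stem: p for layer in LAYER2 for p in (wiki / layer).rglob("*.md")}
    inbound = defaultdict(set)
    broken = []
    for stem, path in pages.items():
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            if target in pages:
                if target != stem:  # self-links do not count as inbound
                    inbound[target].add(stem)
            else:
                broken.append((stem, target))
    orphans = sorted(stem for stem in pages if not inbound[stem])
    return orphans, sorted(set(broken))
```

Links from `index.md` are deliberately ignored, so a page that is only listed in the index still counts as an orphan.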

② **Broken wikilinks:** Find `[[links]]` that point to pages that don't exist.

③ **Index completeness:** Every wiki page should appear in `index.md`. Compare
   the filesystem against index entries.

④ **Frontmatter validation:** Every wiki page must have all required fields
   (title, created, updated, type, tags, sources). Tags must be in the taxonomy.

⑤ **Stale content:** Pages whose `updated` date is >90 days older than the most
   recent source that mentions the same entities.

⑥ **Contradictions:** Pages on the same topic with conflicting claims. Look for
   pages that share tags/entities but state different facts.

⑦ **Page size:** Flag pages over 200 lines — candidates for splitting.

⑧ **Tag audit:** List all tags in use, flag any not in the SCHEMA.md taxonomy.

⑨ **Log rotation:** If log.md exceeds 500 entries, rotate it.

⑩ **Report findings** with specific file paths and suggested actions, grouped by
   severity (broken links > orphans > stale content > style issues).

⑪ **Append to log.md:** `## [YYYY-MM-DD] lint | N issues found`
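
Checks ④ and ⑧ lend themselves to a small script. The sketch below uses a deliberately simple frontmatter parser that only handles flat `key: value` lines and inline lists like `[a, b]`, which is all the frontmatter template requires; a full YAML parser would be more robust. The names `parse_frontmatter`, `inline_list`, and `lint_page` are hypothetical, and the taxonomy is passed in as a plain set rather than read from `SCHEMA.md`.

```python
import re

REQUIRED = ("title", "created", "updated", "type", "tags", "sources")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter between the leading --- fences."""
    match = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def inline_list(value: str) -> list[str]:
    """Split an inline YAML list such as `[model, benchmark]`."""
    return [item.strip() for item in value.strip("[]").split(",") if item.strip()]


def lint_page(text: str, taxonomy: set[str]) -> list[str]:
    """Report missing required fields and tags outside the taxonomy."""
    fields = parse_frontmatter(text)
    issues = [f"missing field: {name}" for name in REQUIRED if name not in fields]
    for tag in inline_list(fields.get("tags", "")):
        if tag not in taxonomy:
            issues.append(f"tag not in taxonomy: {tag}")
    return issues
```

Running `lint_page` over every page and prefixing each issue with the file path gives the per-file findings that step ⑩ asks for.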

## Working with the Wiki

### Searching

```bash
# Find pages by content
search_files "transformer" path="$WIKI" file_glob="*.md"

# Find pages by filename
search_files "*.md" target="files" path="$WIKI"

# Find pages by tag
search_files "tags:.*alignment" path="$WIKI" file_glob="*.md"

# Recent activity
read_file "$WIKI/log.md" offset=<last 20 lines>
```

### Bulk Ingest

When ingesting multiple sources at once, batch the updates:
1. Read all sources first
2. Identify all entities and concepts across all sources
3. Check existing pages for all of them (one search pass, not N)
4. Create/update pages in one pass (avoids redundant updates)
5. Update index.md once at the end
6. Write a single log entry covering the batch

### Archiving

When content is fully superseded or the domain scope changes:
1. Create `_archive/` directory if it doesn't exist
2. Move the page to `_archive/` with its original path (e.g., `_archive/entities/old-page.md`)
3. Remove from `index.md`
4. Update any pages that linked to it — replace wikilink with plain text + "(archived)"
5. Log the archive action
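
Steps 1, 2, and 4 can be sketched as below. `archive_page` is a hypothetical helper that assumes wikilinks reference the page by its file stem, optionally with a `|alias`. It skips `raw/` (immutable) and leaves `index.md` and `log.md` to steps 3 and 5.

```python
import re
import shutil
from pathlib import Path


def archive_page(wiki: Path, rel_path: str) -> list[Path]:
    """Move a page under _archive/ and unlink references; return edited files."""
    src = wiki / rel_path
    dest = wiki / "_archive" / rel_path  # keeps the original sub-path
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))

    stem = src.stem
    # Matches [[stem]] and [[stem|alias]]; group 1 is the alias if present.
    link = re.compile(r"\[\[" + re.escape(stem) + r"(?:\|([^\]]+))?\]\]")
    edited = []
    for page in wiki.rglob("*.md"):
        parts = page.relative_to(wiki).parts
        if parts[0] in {"_archive", "raw"} or page.name in {"index.md", "log.md"}:
            continue
        text = page.read_text(encoding="utf-8")
        new_text = link.sub(lambda m: f"{m.group(1) or stem} (archived)", text)
        if new_text != text:
            page.write_text(new_text, encoding="utf-8")
            edited.append(page)
    return sorted(edited)
```

The returned list is what belongs in the archive log entry, alongside the moved page itself.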

### Obsidian Integration

The wiki directory works as an Obsidian vault out of the box:
- `[[wikilinks]]` render as clickable links
- Graph View visualizes the knowledge network
- YAML frontmatter powers Dataview queries
- The `raw/assets/` folder holds images referenced via `![[image.png]]`

For best results:
- Set Obsidian's attachment folder to `raw/assets/`
- Enable "Wikilinks" in Obsidian settings (usually on by default)
- Install Dataview plugin for queries like `TABLE tags FROM "entities" WHERE contains(tags, "company")`

If using the Obsidian skill alongside this one, set `OBSIDIAN_VAULT_PATH` to the
same directory as the wiki path.

### Obsidian Headless (servers and headless machines)

On machines without a display, use `obsidian-headless` instead of the desktop app.
It syncs vaults via Obsidian Sync without a GUI — perfect for agents running on
servers that write to the wiki while Obsidian desktop reads it on another device.

**Setup:**
```bash
# Requires Node.js 22+
npm install -g obsidian-headless

# Login (requires Obsidian account with Sync subscription)
ob login --email <email> --password '<password>'

# Create a remote vault for the wiki
ob sync-create-remote --name "LLM Wiki"

# Connect the wiki directory to the vault
cd ~/wiki
ob sync-setup --vault "<vault-id>"

# Initial sync
ob sync

# Continuous sync (foreground — use systemd for background)
ob sync --continuous
```

**Continuous background sync via systemd:**
```ini
# ~/.config/systemd/user/obsidian-wiki-sync.service
[Unit]
Description=Obsidian LLM Wiki Sync
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/path/to/ob sync --continuous
WorkingDirectory=/home/user/wiki
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
```

```bash
systemctl --user daemon-reload
systemctl --user enable --now obsidian-wiki-sync
# Enable linger so sync survives logout:
sudo loginctl enable-linger "$USER"
```

This lets the agent write to `~/wiki` on a server while you browse the same
vault in Obsidian on your laptop/phone — changes appear within seconds.

## Pitfalls
|
||
|
|
|
||
|
|
- **Never modify files in `raw/`** — sources are immutable. Corrections go in wiki pages.
|
||
|
|
- **Always orient first** — read SCHEMA + index + recent log before any operation in a new session.
|
||
|
|
Skipping this causes duplicates and missed cross-references.
|
||
|
|
- **Always update index.md and log.md** — skipping this makes the wiki degrade. These are the
|
||
|
|
navigational backbone.
|
||
|
|
- **Don't create pages for passing mentions** — follow the Page Thresholds in SCHEMA.md. A name
|
||
|
|
appearing once in a footnote doesn't warrant an entity page.
|
||
|
|
- **Don't create pages without cross-references** — isolated pages are invisible. Every page must
|
||
|
|
link to at least 2 other pages.
|
||
|
|
- **Frontmatter is required** — it enables search, filtering, and staleness detection.
|
||
|
|
- **Tags must come from the taxonomy** — freeform tags decay into noise. Add new tags to SCHEMA.md
|
||
|
|
first, then use them.
|
||
|
|
- **Keep pages scannable** — a wiki page should be readable in 30 seconds. Split pages over
|
||
|
|
200 lines. Move detailed analysis to dedicated deep-dive pages.
|
||
|
|
- **Ask before mass-updating** — if an ingest would touch 10+ existing pages, confirm
|
||
|
|
the scope with the user first.
|
||
|
|
- **Rotate the log** — when log.md exceeds 500 entries, rename it `log-YYYY.md` and start fresh.
|
||
|
|
The agent should check log size during lint.
|
||
|
|
- **Handle contradictions explicitly** — don't silently overwrite. Note both claims with dates,
|
||
|
|
mark in frontmatter, flag for user review.
|