| name | description | version | author | license | metadata |
| --- | --- | --- | --- | --- | --- |
| llm-wiki | Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base. Ingest sources, query compiled knowledge, and lint for consistency. | 2.0.0 | Hermes Agent | MIT | hermes |

| tags | category | related_skills | config |
| --- | --- | --- | --- |
| wiki, knowledge-base, research, notes, markdown, rag-alternative | research | obsidian, arxiv, agentic-research-ideas | wiki.path |

| key | description | default | prompt |
| --- | --- | --- | --- |
| wiki.path | Path to the LLM Wiki knowledge base directory | ~/wiki | Wiki directory path |

# Karpathy's LLM Wiki

Build and maintain a persistent, compounding knowledge base as interlinked markdown files. Based on Andrej Karpathy's LLM Wiki pattern.

Unlike traditional RAG (which rediscovers knowledge from scratch per query), the wiki compiles knowledge once and keeps it current. Cross-references are already there. Contradictions have already been flagged. Synthesis reflects everything ingested.

Division of labor: The human curates sources and directs analysis. The agent summarizes, cross-references, files, and maintains consistency.

## When This Skill Activates

Use this skill when the user:

- Asks to create, build, or start a wiki or knowledge base
- Asks to ingest, add, or process a source into their wiki
- Asks a question and an existing wiki is present at the configured path
- Asks to lint, audit, or health-check their wiki
- References their wiki, knowledge base, or "notes" in a research context

## Wiki Location

Configured via `skills.config.wiki.path` in `~/.hermes/config.yaml` (prompted during `hermes config migrate` or `hermes setup`):

```yaml
skills:
  config:
    wiki:
      path: ~/wiki
```

Falls back to the `~/wiki` default. The resolved path is injected when this skill loads — check the `[Skill config: ...]` block above for the active value.

The wiki is just a directory of markdown files — open it in Obsidian, VS Code, or any editor. No database, no special tooling required.
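The resolution order (config value first, then the default) can be sketched as follows. `resolve_wiki_path` is a hypothetical helper operating on an already-parsed config mapping, not part of the skill's actual tooling:

```python
import os

def resolve_wiki_path(cfg):
    """Pull skills.config.wiki.path out of an already-parsed config
    mapping, falling back to the documented ~/wiki default."""
    try:
        path = cfg["skills"]["config"]["wiki"]["path"]
    except (KeyError, TypeError):
        path = "~/wiki"  # documented default
    return os.path.expanduser(path)
```

For example, `resolve_wiki_path({})` and `resolve_wiki_path(None)` both expand to the default `~/wiki` under the caller's home directory.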

## Architecture: Three Layers

```
wiki/
├── SCHEMA.md           # Conventions, structure rules, domain config
├── index.md            # Sectioned content catalog with one-line summaries
├── log.md              # Chronological action log (append-only, rotated yearly)
├── raw/                # Layer 1: Immutable source material
│   ├── articles/       # Web articles, clippings
│   ├── papers/         # PDFs, arxiv papers
│   ├── transcripts/    # Meeting notes, interviews
│   └── assets/         # Images, diagrams referenced by sources
├── entities/           # Layer 2: Entity pages (people, orgs, products, models)
├── concepts/           # Layer 2: Concept/topic pages
├── comparisons/        # Layer 2: Side-by-side analyses
└── queries/            # Layer 2: Filed query results worth keeping
```

- **Layer 1 — Raw Sources:** Immutable. The agent reads but never modifies these.
- **Layer 2 — The Wiki:** Agent-owned markdown files. Created, updated, and cross-referenced by the agent.
- **Layer 3 — The Schema:** `SCHEMA.md` defines structure, conventions, and tag taxonomy.

## Resuming an Existing Wiki (CRITICAL — do this every session)

When the user has an existing wiki, always orient yourself before doing anything:

1. **Read `SCHEMA.md`** — understand the domain, conventions, and tag taxonomy.
2. **Read `index.md`** — learn what pages exist and their summaries.
3. **Scan recent `log.md`** — read the last 20-30 entries to understand recent activity.

```
WIKI="${wiki_path:-$HOME/wiki}"
# Orientation reads at session start
read_file "$WIKI/SCHEMA.md"
read_file "$WIKI/index.md"
read_file "$WIKI/log.md" offset=<last 30 lines>
```

Only after orientation should you ingest, query, or lint. This prevents:

- Creating duplicate pages for entities that already exist
- Missing cross-references to existing content
- Contradicting the schema's conventions
- Repeating work already logged

For large wikis (100+ pages), also run a quick `search_files` for the topic at hand before creating anything new.

## Initializing a New Wiki

When the user asks to create or start a wiki:

  1. Determine the wiki path (from config, env var, or ask the user; default ~/wiki)
  2. Create the directory structure above
  3. Ask the user what domain the wiki covers — be specific
  4. Write SCHEMA.md customized to the domain (see template below)
  5. Write initial index.md with sectioned header
  6. Write initial log.md with creation entry
  7. Confirm the wiki is ready and suggest first sources to ingest
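Steps 1, 2, 5, and 6 can be sketched as a minimal bootstrap. `init_wiki` is a hypothetical helper with placeholder file contents; the real `SCHEMA.md`, `index.md`, and `log.md` should be filled in from the templates in this document:

```python
import os
from datetime import date

def init_wiki(root="~/wiki", domain="AI/ML research"):
    """Create the three-layer skeleton. Placeholder contents only;
    the seed files should then be expanded from the templates."""
    root = os.path.expanduser(root)
    for d in ("raw/articles", "raw/papers", "raw/transcripts", "raw/assets",
              "entities", "concepts", "comparisons", "queries"):
        os.makedirs(os.path.join(root, d), exist_ok=True)
    today = date.today().isoformat()
    seeds = {
        "SCHEMA.md": f"# Wiki Schema\n\n## Domain\n{domain}\n",
        "index.md": f"# Wiki Index\n\n> Last updated: {today} | Total pages: 0\n",
        "log.md": f"# Wiki Log\n\n## [{today}] create | Wiki initialized\n",
    }
    for name, body in seeds.items():
        path = os.path.join(root, name)
        if not os.path.exists(path):  # never clobber an existing wiki
            with open(path, "w") as f:
                f.write(body)
    return root
```

Using `exist_ok=True` and the existence check makes the bootstrap idempotent: re-running it against an existing wiki creates nothing and overwrites nothing.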

### SCHEMA.md Template

Adapt to the user's domain. The schema constrains agent behavior and ensures consistency:

# Wiki Schema

## Domain
[What this wiki covers — e.g., "AI/ML research", "personal health", "startup intelligence"]

## Conventions
- File names: lowercase, hyphens, no spaces (e.g., `transformer-architecture.md`)
- Every wiki page starts with YAML frontmatter (see below)
- Use `[[wikilinks]]` to link between pages (minimum 2 outbound links per page)
- When updating a page, always bump the `updated` date
- Every new page must be added to `index.md` under the correct section
- Every action must be appended to `log.md`

## Frontmatter
```yaml
---
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
type: entity | concept | comparison | query | summary
tags: [from taxonomy below]
sources: [raw/articles/source-name.md]
---
```

## Tag Taxonomy

[Define 10-20 top-level tags for the domain. Add new tags here BEFORE using them.]

Example for AI/ML:

- Models: model, architecture, benchmark, training
- People/Orgs: person, company, lab, open-source
- Techniques: optimization, fine-tuning, inference, alignment, data
- Meta: comparison, timeline, controversy, prediction

Rule: every tag on a page must appear in this taxonomy. If a new tag is needed, add it here first, then use it. This prevents tag sprawl.

## Page Thresholds

- Create a page when an entity/concept appears in 2+ sources OR is central to one source
- Add to existing page when a source mentions something already covered
- DON'T create a page for passing mentions, minor details, or things outside the domain
- Split a page when it exceeds ~200 lines — break into sub-topics with cross-links
- Archive a page when its content is fully superseded — move to _archive/, remove from index

## Entity Pages

One page per notable entity. Include:

- Overview / what it is
- Key facts and dates
- Relationships to other entities (wikilinks)
- Source references

## Concept Pages

One page per concept or topic. Include:

- Definition / explanation
- Current state of knowledge
- Open questions or debates
- Related concepts (wikilinks)

## Comparison Pages

Side-by-side analyses. Include:

- What is being compared and why
- Dimensions of comparison (table format preferred)
- Verdict or synthesis
- Sources

## Update Policy

When new information conflicts with existing content:

1. Check the dates — newer sources generally supersede older ones
2. If genuinely contradictory, note both positions with dates and sources
3. Mark the contradiction in frontmatter: `contradictions: [page-name]`
4. Flag for user review in the lint report

### index.md Template

The index is sectioned by type. Each entry is one line: wikilink + summary.

```markdown
# Wiki Index

> Content catalog. Every wiki page listed under its type with a one-line summary.
> Read this first to find relevant pages for any query.
> Last updated: YYYY-MM-DD | Total pages: N

## Entities
<!-- Alphabetical within section -->

## Concepts

## Comparisons

## Queries

```

**Scaling rule:** When any section exceeds 50 entries, split it into sub-sections by first letter or sub-domain. When the index exceeds 200 entries total, create a `_meta/topic-map.md` that groups pages by theme for faster navigation.

### log.md Template

```markdown
# Wiki Log

> Chronological record of all wiki actions. Append-only.
> Format: `## [YYYY-MM-DD] action | subject`
> Actions: ingest, update, query, lint, create, archive, delete
> When this file exceeds 500 entries, rotate: rename to log-YYYY.md, start fresh.

## [YYYY-MM-DD] create | Wiki initialized
- Domain: [domain]
- Structure created with SCHEMA.md, index.md, log.md
```
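The rotation rule in the template header can be sketched as follows. `rotate_log_if_needed` is a hypothetical helper; entry counting assumes the `## [YYYY-MM-DD]` header format above:

```python
import os
from datetime import date

def rotate_log_if_needed(wiki, max_entries=500):
    """Count '## [' entry headers in log.md; past the threshold,
    rename to log-YYYY.md and start fresh. Returns True if rotated."""
    log_path = os.path.join(wiki, "log.md")
    with open(log_path) as f:
        text = f.read()
    if text.count("\n## [") <= max_entries:
        return False
    os.rename(log_path, os.path.join(wiki, f"log-{date.today().year}.md"))
    with open(log_path, "w") as f:
        f.write("# Wiki Log\n\n> Chronological record of all wiki actions. Append-only.\n")
    return True
```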

## Core Operations

### 1. Ingest

When the user provides a source (URL, file, paste), integrate it into the wiki:

**Capture the raw source:**

- URL → use `web_extract` to get markdown, save to `raw/articles/`
- PDF → use `web_extract` (handles PDFs), save to `raw/papers/`
- Pasted text → save to appropriate `raw/` subdirectory
- Name the file descriptively: `raw/articles/karpathy-llm-wiki-2026.md`

**Discuss takeaways with the user** — what's interesting, what matters for the domain. (Skip this in automated/cron contexts — proceed directly.)

**Check what already exists** — search `index.md` and use `search_files` to find existing pages for mentioned entities/concepts. This is the difference between a growing wiki and a pile of duplicates.

**Write or update wiki pages:**

- New entities/concepts: Create pages only if they meet the Page Thresholds in SCHEMA.md (2+ source mentions, or central to one source)
- Existing pages: Add new information, update facts, bump the `updated` date. When new info contradicts existing content, follow the Update Policy.
- Cross-reference: Every new or updated page must link to at least 2 other pages via `[[wikilinks]]`. Check that existing pages link back.
- Tags: Only use tags from the taxonomy in SCHEMA.md

**Update navigation:**

- Add new pages to `index.md` under the correct section, alphabetically
- Update the "Total pages" count and "Last updated" date in the index header
- Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Title`
- List every file created or updated in the log entry
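The index-header update can be sketched as follows. `bump_index_header` is a hypothetical helper that assumes the `> Last updated: ... | Total pages: ...` line format from the index.md template:

```python
import re
from datetime import date

def bump_index_header(index_text, total_pages):
    """Rewrite the 'Last updated' / 'Total pages' header line,
    leaving the rest of index.md untouched."""
    line = f"> Last updated: {date.today().isoformat()} | Total pages: {total_pages}"
    return re.sub(r"^> Last updated: .*$", line, index_text, count=1, flags=re.M)
```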

**Report what changed** — list for the user every file created or updated.

A single source can trigger updates across 5-15 wiki pages. This is normal and desired — it's the compounding effect.

### 2. Query

When the user asks a question about the wiki's domain:

1. Read `index.md` to identify relevant pages.
2. For wikis with 100+ pages, also `search_files` across all `.md` files for key terms — the index alone may miss relevant content.
3. Read the relevant pages using `read_file`.
4. Synthesize an answer from the compiled knowledge. Cite the wiki pages you drew from: "Based on page-a and page-b..."
5. File valuable answers back — if the answer is a substantial comparison, deep dive, or novel synthesis, create a page in `queries/` or `comparisons/`. Don't file trivial lookups — only answers that would be painful to re-derive.
6. Update `log.md` with the query and whether it was filed.

### 3. Lint

When the user asks to lint, health-check, or audit the wiki:

**Orphan pages:** Find pages with no inbound `[[wikilinks]]` from other pages.

```python
# Use execute_code for this — programmatic scan across all wiki pages
import os, re, glob
from collections import defaultdict

wiki = "<WIKI_PATH>"
pages, inbound = set(), defaultdict(set)
# Scan all .md files in entities/, concepts/, comparisons/, queries/
for layer in ("entities", "concepts", "comparisons", "queries"):
    for path in glob.glob(os.path.join(wiki, layer, "**", "*.md"), recursive=True):
        name = os.path.splitext(os.path.basename(path))[0]
        pages.add(name)
        # Extract all [[wikilinks]] — build inbound link map
        for link in re.findall(r"\[\[([^\]|#]+)", open(path).read()):
            inbound[link.strip()].add(name)
# Pages with zero inbound links are orphans
orphans = sorted(p for p in pages if not inbound[p])
```

**Broken wikilinks:** Find `[[links]]` that point to pages that don't exist.

**Index completeness:** Every wiki page should appear in `index.md`. Compare the filesystem against index entries.

**Frontmatter validation:** Every wiki page must have all required fields (title, created, updated, type, tags, sources). Tags must be in the taxonomy.
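A minimal frontmatter check could look like this. `check_frontmatter` is a hypothetical helper, and the parser assumes the flat `key: value` frontmatter shown in the SCHEMA.md template rather than full YAML:

```python
import re

REQUIRED = {"title", "created", "updated", "type", "tags", "sources"}

def check_frontmatter(page_text, taxonomy):
    """Return a list of problems: missing required fields and tags
    outside the taxonomy. Not a full YAML parser."""
    problems = []
    m = re.match(r"---\n(.*?)\n---", page_text, re.S)
    if not m:
        return ["no frontmatter block"]
    fields = {
        k.strip(): v.strip()
        for k, v in (line.split(":", 1) for line in m.group(1).splitlines() if ":" in line)
    }
    for field in sorted(REQUIRED - fields.keys()):
        problems.append(f"missing field: {field}")
    for tag in re.findall(r"[\w-]+", fields.get("tags", "")):
        if tag not in taxonomy:
            problems.append(f"tag not in taxonomy: {tag}")
    return problems
```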

**Stale content:** Pages whose `updated` date is >90 days older than the most recent source that mentions the same entities.

**Contradictions:** Pages on the same topic with conflicting claims. Look for pages that share tags/entities but state different facts.

**Page size:** Flag pages over 200 lines — candidates for splitting.

**Tag audit:** List all tags in use, flag any not in the SCHEMA.md taxonomy.

**Log rotation:** If `log.md` exceeds 500 entries, rotate it.

Report findings with specific file paths and suggested actions, grouped by severity (broken links > orphans > stale content > style issues).

Append to `log.md`: `## [YYYY-MM-DD] lint | N issues found`

## Working with the Wiki

### Searching

```
# Find pages by content
search_files "transformer" path="$WIKI" file_glob="*.md"

# Find pages by filename
search_files "*.md" target="files" path="$WIKI"

# Find pages by tag
search_files "tags:.*alignment" path="$WIKI" file_glob="*.md"

# Recent activity
read_file "$WIKI/log.md" offset=<last 20 lines>
```

### Bulk Ingest

When ingesting multiple sources at once, batch the updates:

  1. Read all sources first
  2. Identify all entities and concepts across all sources
  3. Check existing pages for all of them (one search pass, not N)
  4. Create/update pages in one pass (avoids redundant updates)
  5. Update index.md once at the end
  6. Write a single log entry covering the batch

### Archiving

When content is fully superseded or the domain scope changes:

  1. Create _archive/ directory if it doesn't exist
  2. Move the page to _archive/ with its original path (e.g., _archive/entities/old-page.md)
  3. Remove from index.md
  4. Update any pages that linked to it — replace wikilink with plain text + "(archived)"
  5. Log the archive action
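Steps 1, 2, and 4 can be sketched as follows. `archive_page` is a hypothetical helper; updating `index.md` and `log.md` (steps 3 and 5) is left to the caller:

```python
import os, re, shutil

def archive_page(wiki, rel_path):
    """Move a page to _archive/ preserving its path, then rewrite
    [[wikilinks]] to it across the wiki as plain text + '(archived)'."""
    name = os.path.splitext(os.path.basename(rel_path))[0]
    dest = os.path.join(wiki, "_archive", rel_path)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(os.path.join(wiki, rel_path), dest)
    # Matches [[name]] and [[name|Alias]]; keeps the alias when present
    link = re.compile(r"\[\[" + re.escape(name) + r"(\|([^\]]+))?\]\]")
    for root, dirs, files in os.walk(wiki):
        dirs[:] = [d for d in dirs if d not in ("_archive", "raw")]
        for f in files:
            if not f.endswith(".md"):
                continue
            p = os.path.join(root, f)
            with open(p) as fh:
                text = fh.read()
            new = link.sub(lambda m: (m.group(2) or name) + " (archived)", text)
            if new != text:
                with open(p, "w") as fh:
                    fh.write(new)
```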

## Obsidian Integration

The wiki directory works as an Obsidian vault out of the box:

- `[[wikilinks]]` render as clickable links
- Graph View visualizes the knowledge network
- YAML frontmatter powers Dataview queries
- The `raw/assets/` folder holds images referenced via `![[image.png]]`

For best results:

- Set Obsidian's attachment folder to `raw/assets/`
- Enable "Wikilinks" in Obsidian settings (usually on by default)
- Install the Dataview plugin for queries like `TABLE tags FROM "entities" WHERE contains(tags, "company")`

If using the Obsidian skill alongside this one, set `OBSIDIAN_VAULT_PATH` to the same directory as the wiki path.

### Obsidian Headless (servers and headless machines)

On machines without a display, use `obsidian-headless` instead of the desktop app. It syncs vaults via Obsidian Sync without a GUI — perfect for agents running on servers that write to the wiki while Obsidian desktop reads it on another device.

Setup:

```shell
# Requires Node.js 22+
npm install -g obsidian-headless

# Login (requires Obsidian account with Sync subscription)
ob login --email <email> --password '<password>'

# Create a remote vault for the wiki
ob sync-create-remote --name "LLM Wiki"

# Connect the wiki directory to the vault
cd ~/wiki
ob sync-setup --vault "<vault-id>"

# Initial sync
ob sync

# Continuous sync (foreground — use systemd for background)
ob sync --continuous
```

Continuous background sync via systemd:

```ini
# ~/.config/systemd/user/obsidian-wiki-sync.service
[Unit]
Description=Obsidian LLM Wiki Sync
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/path/to/ob sync --continuous
WorkingDirectory=/home/user/wiki
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
```

```shell
systemctl --user daemon-reload
systemctl --user enable --now obsidian-wiki-sync
# Enable linger so sync survives logout:
sudo loginctl enable-linger $USER
```

This lets the agent write to ~/wiki on a server while you browse the same vault in Obsidian on your laptop/phone — changes appear within seconds.

## Pitfalls

- **Never modify files in `raw/`** — sources are immutable. Corrections go in wiki pages.
- **Always orient first** — read SCHEMA + index + recent log before any operation in a new session. Skipping this causes duplicates and missed cross-references.
- **Always update `index.md` and `log.md`** — skipping this makes the wiki degrade. These are the navigational backbone.
- **Don't create pages for passing mentions** — follow the Page Thresholds in SCHEMA.md. A name appearing once in a footnote doesn't warrant an entity page.
- **Don't create pages without cross-references** — isolated pages are invisible. Every page must link to at least 2 other pages.
- **Frontmatter is required** — it enables search, filtering, and staleness detection.
- **Tags must come from the taxonomy** — freeform tags decay into noise. Add new tags to SCHEMA.md first, then use them.
- **Keep pages scannable** — a wiki page should be readable in 30 seconds. Split pages over 200 lines. Move detailed analysis to dedicated deep-dive pages.
- **Ask before mass-updating** — if an ingest would touch 10+ existing pages, confirm the scope with the user first.
- **Rotate the log** — when `log.md` exceeds 500 entries, rename it `log-YYYY.md` and start fresh. The agent should check log size during lint.
- **Handle contradictions explicitly** — don't silently overwrite. Note both claims with dates, mark in frontmatter, flag for user review.