hermes-sync/skills/creative/manim-video/references/paper-explainer.md

# Paper Explainer Workflow

How to turn a research paper into an animated explainer video.

## Why animate a paper?

A research paper is optimized for precision and completeness. A video is optimized for understanding and retention. The translation is NOT "read the paper aloud with pictures" — it's "extract the core insight and make it feel obvious through visual storytelling."

The paper has one job: prove the claim is true. The video has a different job: make the viewer understand WHY the claim is true, and WHY it matters.

## Who is watching?

Before anything, decide the audience:

| Audience | Prerequisites | Pacing | Depth |
|----------|--------------|--------|-------|
| General public | None | Slow, many analogies | Intuition only, skip proofs |
| Undergrad students | Basic math/CS | Medium, some formalism | Key equations, skip derivations |
| Grad students / researchers | Domain knowledge | Faster, more notation | Full equations, sketch proofs |

This determines everything: vocabulary, pacing, which sections to animate, how much math to show.

## The 5-minute template

Most paper explainers fit this structure (scale times proportionally for longer videos):

| Section | Duration | Purpose |
|---------|----------|---------|
| **Hook** | 0:00-0:30 | Surprising result or provocative question |
| **Problem** | 0:30-1:30 | What was broken/missing before this paper |
| **Key insight** | 1:30-3:00 | The core idea, explained visually |
| **How it works** | 3:00-4:00 | Method/algorithm, simplified |
| **Evidence** | 4:00-4:30 | Key result that proves it works |
| **Implications** | 4:30-5:00 | Why it matters, what it enables |

### What to skip

- Related work survey → one sentence: "Previous approaches did X, which had problem Y"
- Implementation details → skip unless they're the contribution
- Ablation studies → show one chart at most
- Proofs → show the key step, not the full proof
- Hyperparameter tuning → skip entirely

### What to expand

- The core insight → this gets the most screen time
- Geometric/visual intuition → if the paper has math, show what it MEANS
- Before/after comparison → the most compelling evidence

## Pre-code workflow

### Gate 1: Narration script

Write the full narration before any code. Every sentence maps to a visual beat. If you can't write the narration, you don't understand the paper well enough to animate it.

```markdown
## Hook (30s)
"What if I told you that a model with 7 billion parameters can outperform
one with 70 billion — if you train it on the right data?"

## Problem (60s)
"The standard approach is to scale up. More parameters, more compute.
[VISUAL: bar chart showing model sizes growing exponentially]
But Chinchilla showed us that most models are undertrained..."
```

### Gate 2: Scene list

After the narration, break it into scenes. Each scene is one Manim class.

```markdown
Scene 1: Hook — surprising stat with animated counter
Scene 2: Problem — model size bar chart growing
Scene 3: Key insight — training data vs parameters, animated 2D plot
Scene 4: Method — pipeline diagram building left to right
Scene 5: Results — before/after comparison with animated bars
Scene 6: Closing — implications text
```

### Gate 3: Style constants

Before coding scenes, define the visual language:

```python
# style.py — import in every scene file
BG = "#0D1117"
PRIMARY = "#58C4DD"
SECONDARY = "#83C167"
ACCENT = "#FFFF00"
HIGHLIGHT = "#FF6B6B"
MONO = "Menlo"

# Color meanings for THIS paper
MODEL_COLOR = PRIMARY      # "the model"
DATA_COLOR = SECONDARY     # "training data"
BASELINE_COLOR = HIGHLIGHT # "previous approach"
RESULT_COLOR = ACCENT      # "our result"
```

## First-principles equation explanation

When the paper has a key equation, don't just show it — build it from intuition:

### The "what would you do?" pattern

1. Pose the problem in plain language
2. Ask what the simplest solution would be
3. Show why it doesn't work (animate the failure)
4. Introduce the paper's solution as the fix
5. THEN show the equation — it now feels earned

```python
# Scene: Why we need attention (for a Transformer paper)
# Step 1: "How do we let each word look at every other word?"
# Step 2: Show naive approach (fully connected = O(n²) everything)
# Step 3: Show it breaks (information overload, no selectivity)
# Step 4: "What if each word could CHOOSE which words to attend to?"
# Step 5: Show attention equation — Q, K, V now mean something
```

### Equation reveal strategy

```python
# Show equation dimmed first (full destination)
eq = MathTex(r"Attention(Q,K,V) = softmax\left(\frac{QK^T}{\sqrt{d_k}}\right)V")
eq.set_opacity(0.15)
self.play(FadeIn(eq))

# Highlight Q, K, V one at a time with color + label
for part, color, label_text in [
    (r"Q", PRIMARY, "Query: what am I looking for?"),
    (r"K", SECONDARY, "Key: what do I contain?"),
    (r"V", ACCENT, "Value: what do I output?"),
]:
    eq.set_color_by_tex(part, color)
    label = Text(label_text, font_size=18, color=color, font=MONO)
    # position label, animate it, wait, then dim it
```

## Building architecture diagrams

### The progressive build pattern

Don't show the full architecture at once. Build it:

1. First component appears alone → explain
2. Arrow grows → "this feeds into..."
3. Second component appears → explain
4. Repeat until complete

```python
# Component factory
def make_box(label, color, width=2.0, height=0.8):
    box = RoundedRectangle(corner_radius=0.1, width=width, height=height,
                           color=color, fill_opacity=0.1, stroke_width=1.5)
    text = Text(label, font_size=18, font=MONO, color=color).move_to(box)
    return Group(box, text)

encoder = make_box("Encoder", PRIMARY)
decoder = make_box("Decoder", SECONDARY).next_to(encoder, RIGHT, buff=1.5)
arrow = Arrow(encoder.get_right(), decoder.get_left(), color=DIM, stroke_width=1.5)

self.play(FadeIn(encoder))
self.wait(1)  # explain encoder
self.play(GrowArrow(arrow))
self.play(FadeIn(decoder))
self.wait(1)  # explain decoder
```

### Data flow animation

After building the diagram, show data moving through it:

```python
# Dot traveling along the pipeline
data_dot = Dot(color=ACCENT, radius=0.1).move_to(encoder)
self.play(FadeIn(data_dot))
self.play(MoveAlongPath(data_dot, arrow), run_time=1)
self.play(data_dot.animate.move_to(decoder), run_time=0.5)
self.play(Flash(data_dot.get_center(), color=ACCENT), run_time=0.3)
```

## Animating results

### Bar chart comparison (most common)

```python
# Before/after bars
before_data = [45, 52, 38, 61]
after_data = [78, 85, 72, 91]
labels = ["Task A", "Task B", "Task C", "Task D"]

before_chart = BarChart(before_data, bar_names=labels,
    y_range=[0, 100, 20], bar_colors=[HIGHLIGHT]*4).scale(0.6).shift(LEFT*3)
after_chart = BarChart(after_data, bar_names=labels,
    y_range=[0, 100, 20], bar_colors=[SECONDARY]*4).scale(0.6).shift(RIGHT*3)

before_label = Text("Baseline", font_size=20, color=HIGHLIGHT, font=MONO)
after_label = Text("Ours", font_size=20, color=SECONDARY, font=MONO)

# Reveal baseline first, then ours (dramatic comparison)
self.play(Create(before_chart), FadeIn(before_label))
self.wait(1.5)
self.play(Create(after_chart), FadeIn(after_label))
self.wait(0.5)

# Highlight the improvement
improvement = Text("+35% avg", font_size=24, color=ACCENT, font=MONO)
self.play(FadeIn(improvement))
```

### Training curve (for ML papers)

```python
tracker = ValueTracker(0)
curve = always_redraw(lambda: axes.plot(
    lambda x: 1 - 0.8 * np.exp(-x / 3),
    x_range=[0, tracker.get_value()], color=PRIMARY
))
epoch_label = always_redraw(lambda: Text(
    f"Epoch {int(tracker.get_value())}", font_size=18, font=MONO
).to_corner(UR))

self.add(curve, epoch_label)
self.play(tracker.animate.set_value(10), run_time=5, rate_func=linear)
```

## Domain-specific patterns

### ML papers
- Show data flow through the model (animated pipeline)
- Training curves with `ValueTracker`
- Attention heatmaps as colored grids
- Embedding space as 2D scatter (PCA/t-SNE visualization)
- Loss landscape as 3D surface with gradient descent dot

### Physics/math papers
- Use `LinearTransformationScene` for linear algebra
- Vector fields with `ArrowVectorField` / `StreamLines`
- Phase spaces with `NumberPlane` + trajectories
- Wave equations with time-parameterized plots

### Systems/architecture papers
- Pipeline diagrams built progressively
- `ShowPassingFlash` for data flow along arrows
- `ZoomedScene` for zooming into components
- Before/after latency/throughput comparisons

## Common mistakes

1. **Trying to cover the whole paper.** A 5-minute video can explain ONE core insight well. Covering everything means explaining nothing.
2. **Reading the abstract as narration.** Academic writing is designed for readers, not listeners. Rewrite in conversational language.
3. **Showing notation without meaning.** Never show a symbol without first showing what it represents visually.
4. **Skipping the motivation.** Jumping straight to "here's our method" without showing why the problem matters. The Problem section is what makes the viewer care.
5. **Identical pacing throughout.** The hook and key insight need the most visual energy. The method section can be faster. Evidence should land with impact (pause after showing the big number).