# Paper Explainer Workflow

How to turn a research paper into an animated explainer video.

## Why animate a paper?

A research paper is optimized for precision and completeness. A video is optimized for understanding and retention. The translation is NOT "read the paper aloud with pictures" — it's "extract the core insight and make it feel obvious through visual storytelling."

The paper has one job: prove the claim is true. The video has a different job: make the viewer understand WHY the claim is true, and WHY it matters.

## Who is watching?

Before anything, decide the audience:

| Audience | Prerequisites | Pacing | Depth |
|---|---|---|---|
| General public | None | Slow, many analogies | Intuition only, skip proofs |
| Undergrad students | Basic math/CS | Medium, some formalism | Key equations, skip derivations |
| Grad students / researchers | Domain knowledge | Faster, more notation | Full equations, sketch proofs |

This determines everything: vocabulary, pacing, which sections to animate, how much math to show.

## The 5-minute template

Most paper explainers fit this structure (scale times proportionally for longer videos):

| Section | Duration | Purpose |
|---|---|---|
| Hook | 0:00-0:30 | Surprising result or provocative question |
| Problem | 0:30-1:30 | What was broken/missing before this paper |
| Key insight | 1:30-3:00 | The core idea, explained visually |
| How it works | 3:00-4:00 | Method/algorithm, simplified |
| Evidence | 4:00-4:30 | Key result that proves it works |
| Implications | 4:30-5:00 | Why it matters, what it enables |
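"Scale times proportionally" can be made concrete with a small helper. This is a sketch, not part of any Manim API: the section names and `scale_template` function are hypothetical, and the per-section seconds come straight from the table above.

```python
# Hypothetical helper: scale the 5-minute template to another total length.
# Baseline durations (in seconds) are taken from the table above.
TEMPLATE = {
    "hook": 30, "problem": 60, "key_insight": 90,
    "how_it_works": 60, "evidence": 30, "implications": 30,
}

def scale_template(total_seconds):
    """Return per-section durations, scaled proportionally to the new total."""
    base_total = sum(TEMPLATE.values())  # 300 seconds = 5 minutes
    return {name: round(secs * total_seconds / base_total)
            for name, secs in TEMPLATE.items()}

# A 10-minute video simply doubles every section: hook 60s, key_insight 180s, etc.
print(scale_template(600))
```

The proportions matter more than the absolute numbers: the key insight should keep the largest share at any length.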

### What to skip

- Related work survey → one sentence: "Previous approaches did X, which had problem Y"
- Implementation details → skip unless they're the contribution
- Ablation studies → show one chart at most
- Proofs → show the key step, not the full proof
- Hyperparameter tuning → skip entirely

### What to expand

- The core insight → this gets the most screen time
- Geometric/visual intuition → if the paper has math, show what it MEANS
- Before/after comparison → the most compelling evidence

## Pre-code workflow

### Gate 1: Narration script

Write the full narration before any code. Every sentence maps to a visual beat. If you can't write the narration, you don't understand the paper well enough to animate it.

```text
## Hook (30s)
"What if I told you that a model with 7 billion parameters can outperform
one with 70 billion — if you train it on the right data?"

## Problem (60s)
"The standard approach is to scale up. More parameters, more compute.
[VISUAL: bar chart showing model sizes growing exponentially]
But Chinchilla showed us that most models are undertrained..."
```

### Gate 2: Scene list

After the narration, break it into scenes. Each scene is one Manim class.

```text
Scene 1: Hook — surprising stat with animated counter
Scene 2: Problem — model size bar chart growing
Scene 3: Key insight — training data vs parameters, animated 2D plot
Scene 4: Method — pipeline diagram building left to right
Scene 5: Results — before/after comparison with animated bars
Scene 6: Closing — implications text
```
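One class per scene also means each scene renders independently, so you can iterate on one without re-rendering the rest. A small sketch of that workflow (the file name, class names, and helper are hypothetical; `-qh` is Manim's high-quality flag):

```python
# Hypothetical helper: emit one `manim` CLI invocation per scene class,
# so each scene can be rendered and reviewed on its own.
SCENES = ["Hook", "Problem", "KeyInsight", "Method", "Results", "Closing"]

def render_commands(scene_file="paper_explainer.py", quality="-qh"):
    """Build one render command per scene class in the file."""
    return [f"manim {quality} {scene_file} {name}" for name in SCENES]

for cmd in render_commands():
    print(cmd)  # e.g. "manim -qh paper_explainer.py Hook"
```

During drafting, swap `-qh` for `-pql` to preview a single scene quickly at low quality.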

### Gate 3: Style constants

Before coding scenes, define the visual language:

```python
# style.py — import in every scene file
BG = "#0D1117"
PRIMARY = "#58C4DD"
SECONDARY = "#83C167"
ACCENT = "#FFFF00"
HIGHLIGHT = "#FF6B6B"
DIM = "#6E7681"            # muted grey for arrows/connectors
MONO = "Menlo"

# Color meanings for THIS paper
MODEL_COLOR = PRIMARY      # "the model"
DATA_COLOR = SECONDARY     # "training data"
BASELINE_COLOR = HIGHLIGHT # "previous approach"
RESULT_COLOR = ACCENT      # "our result"
```
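A quick sanity check worth running once on any palette: every foreground color should be much brighter than the dark background, or thin strokes and small labels will vanish at export. This is plain Python (no Manim), the `luminance` helper is our own, and the 0.35 threshold is an arbitrary assumption, not a Manim rule:

```python
# Sanity-check the palette: each foreground color should be clearly
# brighter than the dark background.
def luminance(hex_color):
    """Relative luminance in [0, 1] from a #RRGGBB string (ITU-R BT.709 weights)."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

BG = "#0D1117"
PALETTE = {"PRIMARY": "#58C4DD", "SECONDARY": "#83C167",
           "ACCENT": "#FFFF00", "HIGHLIGHT": "#FF6B6B"}
for name, color in PALETTE.items():
    assert luminance(color) - luminance(BG) > 0.35, f"{name} too dim against BG"
```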

## First-principles equation explanation

When the paper has a key equation, don't just show it — build it from intuition:

### The "what would you do?" pattern

1. Pose the problem in plain language
2. Ask what the simplest solution would be
3. Show why it doesn't work (animate the failure)
4. Introduce the paper's solution as the fix
5. THEN show the equation — it now feels earned

```python
# Scene: Why we need attention (for a Transformer paper)
# Step 1: "How do we let each word look at every other word?"
# Step 2: Show naive approach (fully connected = O(n²) everything)
# Step 3: Show it breaks (information overload, no selectivity)
# Step 4: "What if each word could CHOOSE which words to attend to?"
# Step 5: Show attention equation — Q, K, V now mean something
```

### Equation reveal strategy

```python
# Show the equation dimmed first (full destination)
eq = MathTex(
    r"\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V",
    substrings_to_isolate=["Q", "K", "V"],  # required so set_color_by_tex can target parts
)
eq.set_opacity(0.15)
self.play(FadeIn(eq))

# Highlight Q, K, V one at a time with color + label
for part, color, label_text in [
    ("Q", PRIMARY, "Query: what am I looking for?"),
    ("K", SECONDARY, "Key: what do I contain?"),
    ("V", ACCENT, "Value: what do I output?"),
]:
    eq.set_color_by_tex(part, color)
    label = Text(label_text, font_size=18, color=color, font=MONO).next_to(eq, DOWN)
    self.play(FadeIn(label))
    self.wait(1)
    self.play(FadeOut(label))
```

## Building architecture diagrams

### The progressive build pattern

Don't show the full architecture at once. Build it:

1. First component appears alone → explain
2. Arrow grows → "this feeds into..."
3. Second component appears → explain
4. Repeat until complete

```python
# Component factory
def make_box(label, color, width=2.0, height=0.8):
    box = RoundedRectangle(corner_radius=0.1, width=width, height=height,
                           color=color, fill_opacity=0.1, stroke_width=1.5)
    text = Text(label, font_size=18, font=MONO, color=color).move_to(box)
    return VGroup(box, text)

encoder = make_box("Encoder", PRIMARY)
decoder = make_box("Decoder", SECONDARY).next_to(encoder, RIGHT, buff=1.5)
arrow = Arrow(encoder.get_right(), decoder.get_left(),
              color=DIM, stroke_width=1.5)  # DIM: a muted grey, e.g. "#6E7681"

self.play(FadeIn(encoder))
self.wait(1)  # explain encoder
self.play(GrowArrow(arrow))
self.play(FadeIn(decoder))
self.wait(1)  # explain decoder
```

### Data flow animation

After building the diagram, show data moving through it:

```python
# Dot traveling along the pipeline
data_dot = Dot(color=ACCENT, radius=0.1).move_to(encoder)
self.play(FadeIn(data_dot))
self.play(MoveAlongPath(data_dot, arrow), run_time=1)
self.play(data_dot.animate.move_to(decoder), run_time=0.5)
self.play(Flash(data_dot.get_center(), color=ACCENT), run_time=0.3)
```

## Animating results

### Bar chart comparison (most common)

```python
# Before/after bars
before_data = [45, 52, 38, 61]
after_data = [78, 85, 72, 91]
labels = ["Task A", "Task B", "Task C", "Task D"]

before_chart = BarChart(before_data, bar_names=labels,
    y_range=[0, 100, 20], bar_colors=[HIGHLIGHT]*4).scale(0.6).shift(LEFT*3)
after_chart = BarChart(after_data, bar_names=labels,
    y_range=[0, 100, 20], bar_colors=[SECONDARY]*4).scale(0.6).shift(RIGHT*3)

before_label = Text("Baseline", font_size=20, color=HIGHLIGHT,
                    font=MONO).next_to(before_chart, UP)
after_label = Text("Ours", font_size=20, color=SECONDARY,
                   font=MONO).next_to(after_chart, UP)

# Reveal baseline first, then ours (dramatic comparison)
self.play(Create(before_chart), FadeIn(before_label))
self.wait(1.5)
self.play(Create(after_chart), FadeIn(after_label))
self.wait(0.5)

# Highlight the improvement
improvement = Text("+35% avg", font_size=24, color=ACCENT,
                   font=MONO).to_edge(DOWN)
self.play(FadeIn(improvement))
```

### Training curve (for ML papers)

```python
# Loss curve drawn left to right as "training" progresses
axes = Axes(x_range=[0, 10], y_range=[0, 1, 0.25], x_length=8, y_length=4)
self.play(Create(axes))

tracker = ValueTracker(0.01)  # start just above 0 so the plot range is never empty
curve = always_redraw(lambda: axes.plot(
    lambda x: 1 - 0.8 * np.exp(-x / 3),
    x_range=[0, tracker.get_value()], color=PRIMARY
))
epoch_label = always_redraw(lambda: Text(
    f"Epoch {int(tracker.get_value())}", font_size=18, font=MONO
).to_corner(UR))

self.add(curve, epoch_label)
self.play(tracker.animate.set_value(10), run_time=5, rate_func=linear)
```

## Domain-specific patterns

### ML papers

- Show data flow through the model (animated pipeline)
- Training curves with ValueTracker
- Attention heatmaps as colored grids
- Embedding space as 2D scatter (PCA/t-SNE visualization)
- Loss landscape as 3D surface with gradient descent dot
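The heatmap idea above reduces to mapping each attention weight in [0, 1] to a fill color, then painting a grid of Squares with it. A minimal sketch of that mapping in plain Python; the `lerp_hex`/`weight_to_color` helpers and the endpoint colors are our own assumptions, not Manim API:

```python
# Map attention weights to fill colors for a grid of Squares.
def lerp_hex(c0, c1, t):
    """Linearly interpolate two #RRGGBB colors; t=0 gives c0, t=1 gives c1."""
    rgb = [round(int(c0[i:i + 2], 16) * (1 - t) + int(c1[i:i + 2], 16) * t)
           for i in (1, 3, 5)]
    return "#{:02X}{:02X}{:02X}".format(*rgb)

def weight_to_color(w, lo="#0D1117", hi="#58C4DD"):
    """Clamp w to [0, 1] and blend from background color to PRIMARY."""
    return lerp_hex(lo, hi, max(0.0, min(1.0, w)))

# In a Scene, per cell (i, j) of the attention matrix:
#   Square(0.4).set_fill(weight_to_color(attn[i][j]), opacity=1)
```

Blending from the background color (rather than black) makes near-zero weights fade into the frame, which reads as "no attention" without extra legend work.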

### Physics/math papers

- Use LinearTransformationScene for linear algebra
- Vector fields with ArrowVectorField / StreamLines
- Phase spaces with NumberPlane + trajectories
- Wave equations with time-parameterized plots
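Both ArrowVectorField and StreamLines take a plain function from position to vector, so the field itself is testable without Manim. A sketch with a simple rotational field (a generic example, not drawn from any particular paper):

```python
import numpy as np

def swirl(pos):
    """Position -> vector: a counter-clockwise rotational field about the origin."""
    x, y = pos[0], pos[1]
    return np.array([-y, x, 0.0])

# In a Scene:
#   field = ArrowVectorField(swirl, color=PRIMARY)
#   self.play(Create(field))
#   # or: self.add(StreamLines(swirl, stroke_width=2))
```

Keeping the field function separate from the Scene also lets you animate a paper's "before" and "after" dynamics by swapping one function.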

### Systems/architecture papers

- Pipeline diagrams built progressively
- ShowPassingFlash for data flow along arrows
- ZoomedScene for zooming into components
- Before/after latency/throughput comparisons

## Common mistakes

  1. Trying to cover the whole paper. A 5-minute video can explain ONE core insight well. Covering everything means explaining nothing.
  2. Reading the abstract as narration. Academic writing is designed for readers, not listeners. Rewrite in conversational language.
  3. Showing notation without meaning. Never show a symbol without first showing what it represents visually.
  4. Skipping the motivation. Jumping straight to "here's our method" without showing why the problem matters. The Problem section is what makes the viewer care.
  5. Identical pacing throughout. The hook and key insight need the most visual energy. The method section can be faster. Evidence should land with impact (pause after showing the big number).