# Paper Explainer Workflow

How to turn a research paper into an animated explainer video.
## Why animate a paper?
A research paper is optimized for precision and completeness. A video is optimized for understanding and retention. The translation is NOT "read the paper aloud with pictures" — it's "extract the core insight and make it feel obvious through visual storytelling."
The paper has one job: prove the claim is true. The video has a different job: make the viewer understand WHY the claim is true, and WHY it matters.
## Who is watching?
Before anything, decide the audience:
| Audience | Prerequisites | Pacing | Depth |
|---|---|---|---|
| General public | None | Slow, many analogies | Intuition only, skip proofs |
| Undergrad students | Basic math/CS | Medium, some formalism | Key equations, skip derivations |
| Grad students / researchers | Domain knowledge | Faster, more notation | Full equations, sketch proofs |
This determines everything: vocabulary, pacing, which sections to animate, how much math to show.
## The 5-minute template
Most paper explainers fit this structure (scale times proportionally for longer videos):
| Section | Duration | Purpose |
|---|---|---|
| Hook | 0:00-0:30 | Surprising result or provocative question |
| Problem | 0:30-1:30 | What was broken/missing before this paper |
| Key insight | 1:30-3:00 | The core idea, explained visually |
| How it works | 3:00-4:00 | Method/algorithm, simplified |
| Evidence | 4:00-4:30 | Key result that proves it works |
| Implications | 4:30-5:00 | Why it matters, what it enables |
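The "scale times proportionally" rule can be made concrete with a small helper. This is a hypothetical sketch (the `TEMPLATE` list and `scale_template` name are not from any library); it just redistributes the same proportions over a new runtime:

```python
# Hypothetical helper: scale the 5-minute beat sheet to a new runtime.
TEMPLATE = [
    ("Hook", 30),
    ("Problem", 60),
    ("Key insight", 90),
    ("How it works", 60),
    ("Evidence", 30),
    ("Implications", 30),
]  # durations in seconds; total = 300s (5 minutes)

def scale_template(total_seconds):
    """Scale every section proportionally to a new total runtime."""
    base_total = sum(d for _, d in TEMPLATE)
    return [(name, round(d * total_seconds / base_total)) for name, d in TEMPLATE]
```

For a 10-minute video, `scale_template(600)` gives a 60-second hook, a 3-minute key insight, and so on; the proportions stay fixed while the absolute times grow.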
### What to skip
- Related work survey → one sentence: "Previous approaches did X, which had problem Y"
- Implementation details → skip unless they're the contribution
- Ablation studies → show one chart at most
- Proofs → show the key step, not the full proof
- Hyperparameter tuning → skip entirely
### What to expand
- The core insight → this gets the most screen time
- Geometric/visual intuition → if the paper has math, show what it MEANS
- Before/after comparison → the most compelling evidence
## Pre-code workflow

### Gate 1: Narration script
Write the full narration before any code. Every sentence maps to a visual beat. If you can't write the narration, you don't understand the paper well enough to animate it.
```
## Hook (30s)
"What if I told you that a model with 7 billion parameters can outperform
one with 70 billion — if you train it on the right data?"

## Problem (60s)
"The standard approach is to scale up. More parameters, more compute.
[VISUAL: bar chart showing model sizes growing exponentially]
But Chinchilla showed us that most models are undertrained..."
```
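A quick sanity check on the script: spoken narration runs at roughly 150 words per minute (an assumed figure; adjust to your delivery). A sketch that estimates runtime from the script text, skipping `[VISUAL: ...]` cues:

```python
import re

WORDS_PER_MINUTE = 150  # assumed speaking rate; tune to your own pacing

def estimate_seconds(script: str) -> float:
    """Rough spoken duration of a narration script, ignoring pauses."""
    spoken = re.sub(r"\[.*?\]", " ", script)  # drop [VISUAL: ...] cues
    return len(spoken.split()) / WORDS_PER_MINUTE * 60
```

If the estimate for a section is far from its slot in the template, cut words before writing any code.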
### Gate 2: Scene list
After the narration, break it into scenes. Each scene is one Manim class.
```
Scene 1: Hook — surprising stat with animated counter
Scene 2: Problem — model size bar chart growing
Scene 3: Key insight — training data vs parameters, animated 2D plot
Scene 4: Method — pipeline diagram building left to right
Scene 5: Results — before/after comparison with animated bars
Scene 6: Closing — implications text
```
### Gate 3: Style constants
Before coding scenes, define the visual language:
```python
# style.py — import in every scene file
BG = "#0D1117"
PRIMARY = "#58C4DD"
SECONDARY = "#83C167"
ACCENT = "#FFFF00"
HIGHLIGHT = "#FF6B6B"
DIM = "#6E7681"   # muted gray for arrows and de-emphasized elements
MONO = "Menlo"

# Color meanings for THIS paper
MODEL_COLOR = PRIMARY       # "the model"
DATA_COLOR = SECONDARY      # "training data"
BASELINE_COLOR = HIGHLIGHT  # "previous approach"
RESULT_COLOR = ACCENT       # "our result"
```
## First-principles equation explanation
When the paper has a key equation, don't just show it — build it from intuition:
### The "what would you do?" pattern
- Pose the problem in plain language
- Ask what the simplest solution would be
- Show why it doesn't work (animate the failure)
- Introduce the paper's solution as the fix
- THEN show the equation — it now feels earned
```python
# Scene: Why we need attention (for a Transformer paper)
# Step 1: "How do we let each word look at every other word?"
# Step 2: Show naive approach (fully connected = O(n²) everything)
# Step 3: Show it breaks (information overload, no selectivity)
# Step 4: "What if each word could CHOOSE which words to attend to?"
# Step 5: Show attention equation — Q, K, V now mean something
```
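The O(n²) scoring and the "choosing" step can be grounded in a few lines of numpy before you animate them. This is an illustrative sketch of scaled dot-product attention, not the paper's implementation; shapes are tiny on purpose:

```python
import numpy as np

# Tiny scaled dot-product attention: 4 tokens, key dimension 8
rng = np.random.default_rng(0)
n, d_k = 4, 8
Q = rng.standard_normal((n, d_k))
K = rng.standard_normal((n, d_k))
V = rng.standard_normal((n, d_k))

scores = Q @ K.T / np.sqrt(d_k)   # n x n: every token scores every other -> O(n²)
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax: each token "chooses"
out = weights @ V                 # output = weighted mix of values
```

Each row of `weights` sums to 1, which is exactly the selectivity Step 4 asks for, and the n×n `scores` matrix is the quadratic cost Step 2 shows.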
### Equation reveal strategy
```python
# Show equation dimmed first (full destination)
eq = MathTex(
    r"\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V",
    substrings_to_isolate=["Q", "K", "V"],  # required for per-symbol coloring
)
eq.set_opacity(0.15)
self.play(FadeIn(eq))

# Highlight Q, K, V one at a time with color + label
for part, color, label_text in [
    ("Q", PRIMARY, "Query: what am I looking for?"),
    ("K", SECONDARY, "Key: what do I contain?"),
    ("V", ACCENT, "Value: what do I output?"),
]:
    label = Text(label_text, font_size=18, color=color, font=MONO).next_to(eq, DOWN)
    symbols = eq.get_parts_by_tex(part)
    self.play(symbols.animate.set_color(color).set_opacity(1), FadeIn(label))
    self.wait(1)
    self.play(FadeOut(label))
```
## Building architecture diagrams

### The progressive build pattern
Don't show the full architecture at once. Build it:
- First component appears alone → explain
- Arrow grows → "this feeds into..."
- Second component appears → explain
- Repeat until complete
```python
# Component factory
def make_box(label, color, width=2.0, height=0.8):
    box = RoundedRectangle(corner_radius=0.1, width=width, height=height,
                           color=color, fill_opacity=0.1, stroke_width=1.5)
    text = Text(label, font_size=18, font=MONO, color=color).move_to(box)
    return VGroup(box, text)

encoder = make_box("Encoder", PRIMARY)
decoder = make_box("Decoder", SECONDARY).next_to(encoder, RIGHT, buff=1.5)
arrow = Arrow(encoder.get_right(), decoder.get_left(), color=DIM, stroke_width=1.5)

self.play(FadeIn(encoder))
self.wait(1)   # explain encoder
self.play(GrowArrow(arrow))
self.play(FadeIn(decoder))
self.wait(1)   # explain decoder
```
### Data flow animation
After building the diagram, show data moving through it:
```python
# Dot traveling along the pipeline
data_dot = Dot(color=ACCENT, radius=0.1).move_to(encoder)
self.play(FadeIn(data_dot))
self.play(MoveAlongPath(data_dot, arrow), run_time=1)
self.play(data_dot.animate.move_to(decoder), run_time=0.5)
self.play(Flash(data_dot.get_center(), color=ACCENT), run_time=0.3)
```
## Animating results

### Bar chart comparison (most common)
```python
# Before/after bars
before_data = [45, 52, 38, 61]
after_data = [78, 85, 72, 91]
labels = ["Task A", "Task B", "Task C", "Task D"]

before_chart = BarChart(before_data, bar_names=labels, y_range=[0, 100, 20],
                        bar_colors=[HIGHLIGHT] * 4).scale(0.6).shift(LEFT * 3)
after_chart = BarChart(after_data, bar_names=labels, y_range=[0, 100, 20],
                       bar_colors=[SECONDARY] * 4).scale(0.6).shift(RIGHT * 3)
before_label = Text("Baseline", font_size=20, color=HIGHLIGHT, font=MONO).next_to(before_chart, UP)
after_label = Text("Ours", font_size=20, color=SECONDARY, font=MONO).next_to(after_chart, UP)

# Reveal baseline first, then ours (dramatic comparison)
self.play(Create(before_chart), FadeIn(before_label))
self.wait(1.5)
self.play(Create(after_chart), FadeIn(after_label))
self.wait(0.5)

# Highlight the improvement
improvement = Text("+35% avg", font_size=24, color=ACCENT, font=MONO).to_edge(UP)
self.play(FadeIn(improvement))
```
### Training curve (for ML papers)
```python
# Assumes `from manim import *` and `import numpy as np` at the top of the file
axes = Axes(x_range=[0, 10, 2], y_range=[0, 1, 0.25], x_length=8, y_length=4)
self.add(axes)

tracker = ValueTracker(0)
curve = always_redraw(lambda: axes.plot(
    lambda x: 1 - 0.8 * np.exp(-x / 3),
    x_range=[0, max(tracker.get_value(), 0.01)],  # avoid an empty plot range at t=0
    color=PRIMARY,
))
epoch_label = always_redraw(lambda: Text(
    f"Epoch {int(tracker.get_value())}", font_size=18, font=MONO,
).to_corner(UR))

self.add(curve, epoch_label)
self.play(tracker.animate.set_value(10), run_time=5, rate_func=linear)
```
## Domain-specific patterns

### ML papers
- Show data flow through the model (animated pipeline)
- Training curves with `ValueTracker`
- Attention heatmaps as colored grids
- Embedding space as 2D scatter (PCA/t-SNE visualization)
- Loss landscape as 3D surface with gradient descent dot
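For the attention-heatmap grid, the weight-to-color mapping can be precomputed before building any squares. A hypothetical helper (`weight_to_hex` is not a Manim function) that blends from the `BG` background color to the `PRIMARY` blue from the style constants; each cell would then become a `Square` with `fill_color=weight_to_hex(w)`:

```python
# Hypothetical helper: map an attention weight in [0, 1] to a hex fill
# color, blending BG "#0D1117" -> PRIMARY "#58C4DD" from style.py.
def weight_to_hex(w, lo=(13, 17, 23), hi=(88, 196, 221)):
    w = min(max(w, 0.0), 1.0)  # clamp out-of-range weights
    r, g, b = (round(a + (c - a) * w) for a, c in zip(lo, hi))
    return f"#{r:02X}{g:02X}{b:02X}"
```

Keeping the endpoints tied to the style constants means the heatmap reads in the same visual language as the rest of the video.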
### Physics/math papers

- Use `LinearTransformationScene` for linear algebra
- Vector fields with `ArrowVectorField` / `StreamLines`
- Phase spaces with `NumberPlane` + trajectories
- Wave equations with time-parameterized plots
### Systems/architecture papers

- Pipeline diagrams built progressively
- `ShowPassingFlash` for data flow along arrows
- `ZoomedScene` for zooming into components
- Before/after latency/throughput comparisons
## Common mistakes
- **Trying to cover the whole paper.** A 5-minute video can explain ONE core insight well. Covering everything means explaining nothing.
- **Reading the abstract as narration.** Academic writing is designed for readers, not listeners. Rewrite it in conversational language.
- **Showing notation without meaning.** Never show a symbol without first showing what it represents visually.
- **Skipping the motivation.** Jumping straight to "here's our method" without showing why the problem matters. The Problem section is what makes the viewer care.
- **Identical pacing throughout.** The hook and key insight need the most visual energy. The method section can be faster. Evidence should land with impact (pause after showing the big number).