Sync all skills and memories 2026-04-14 07:27
This commit is contained in:
@@ -0,0 +1,474 @@
|
||||
# ML Paper Writing Philosophy & Best Practices
|
||||
|
||||
This reference compiles writing advice from prominent ML researchers including Neel Nanda, Andrej Karpathy, Sebastian Farquhar, Zachary Lipton, and Jacob Steinhardt.
|
||||
|
||||
---
|
||||
|
||||
## Contents
|
||||
|
||||
- [The Narrative Principle](#the-narrative-principle)
|
||||
- [Time Allocation](#time-allocation)
|
||||
- [Abstract Writing Formula](#abstract-writing-formula)
|
||||
- [Introduction Structure](#introduction-structure)
|
||||
- [Sentence-Level Clarity](#sentence-level-clarity)
|
||||
- [Word Choice and Precision](#word-choice-and-precision)
|
||||
- [Mathematical Writing](#mathematical-writing)
|
||||
- [Figure Design](#figure-design)
|
||||
- [Common Mistakes to Avoid](#common-mistakes-to-avoid)
|
||||
|
||||
---
|
||||
|
||||
## The Narrative Principle
|
||||
|
||||
### From Neel Nanda
|
||||
|
||||
"A paper is a short, rigorous, evidence-based technical story with a takeaway readers care about."
|
||||
|
||||
The narrative rests on three pillars that must be crystal clear by the end of your introduction:
|
||||
|
||||
**The "What"**: One to three specific novel claims fitting within a cohesive theme. Vague contributions like "we study X" fail immediately—reviewers need precise, falsifiable claims.
|
||||
|
||||
**The "Why"**: Rigorous empirical evidence that convincingly supports those claims, including strong baselines honestly tuned and experiments that distinguish between competing hypotheses rather than merely showing "decent results."
|
||||
|
||||
**The "So What"**: Why readers should care, connecting your contribution to problems the community recognizes as important.
|
||||
|
||||
### From Andrej Karpathy
|
||||
|
||||
"A paper is not a random collection of experiments you report on. The paper sells a single thing that was not obvious or present before. The entire paper is organized around this core contribution with surgical precision."
|
||||
|
||||
This applies whether you're presenting a new architecture, a theoretical result, or improved understanding of existing methods—NeurIPS explicitly notes that "originality does not necessarily require an entirely new method."
|
||||
|
||||
**Practical Implication**: If you cannot state your contribution in one sentence, you don't yet have a paper. Everything else—experiments, related work, discussion—exists only to support that core claim.
|
||||
|
||||
---
|
||||
|
||||
## Time Allocation
|
||||
|
||||
### From Neel Nanda
|
||||
|
||||
Spend approximately **the same amount of time** on each of:
|
||||
1. The abstract
|
||||
2. The introduction
|
||||
3. The figures
|
||||
4. Everything else combined
|
||||
|
||||
This isn't hyperbole—most reviewers form preliminary judgments before reaching your methods section. Readers encounter your paper in a predictable pattern: **title → abstract → introduction → figures → maybe the rest.**
|
||||
|
||||
### Reviewer Reading Patterns
|
||||
|
||||
Studies of reviewer behavior show:
|
||||
- Abstract is read 100% of the time
|
||||
- Introduction is skimmed by 90%+ of reviewers
|
||||
- Figures are examined before methods by most reviewers
|
||||
- Full methods are read only if interest is established
|
||||
|
||||
**Implication**: Front-load your paper's value. Don't bury the contribution.
|
||||
|
||||
---
|
||||
|
||||
## Abstract Writing Formula
|
||||
|
||||
### Sebastian Farquhar's 5-Sentence Formula
|
||||
|
||||
1. **What you achieved**: "We introduce...", "We prove...", "We demonstrate..."
|
||||
2. **Why this is hard and important**
|
||||
3. **How you do it** (with specialist keywords for discoverability)
|
||||
4. **What evidence you have**
|
||||
5. **Your most remarkable number/result**
|
||||
|
||||
### Example (Good Abstract)
|
||||
|
||||
```
|
||||
We prove that gradient descent on overparameterized neural networks
|
||||
converges to global minima at a linear rate. [What]
|
||||
This resolves a fundamental question about why deep learning works
|
||||
despite non-convex optimization landscapes. [Why hard/important]
|
||||
Our proof relies on showing that the Neural Tangent Kernel remains
|
||||
approximately constant during training, reducing the problem to
|
||||
kernel regression. [How with keywords]
|
||||
We validate our theory on CIFAR-10 and ImageNet, showing that
|
||||
predicted convergence rates match experiments within 5%. [Evidence]
|
||||
This is the first polynomial-time convergence guarantee for
|
||||
networks with practical depth and width. [Remarkable result]
|
||||
```
|
||||
|
||||
### What to Avoid
|
||||
|
||||
From Zachary Lipton: "If the first sentence can be pre-pended to any ML paper, delete it."
|
||||
|
||||
**Delete these openings**:
|
||||
- "Large language models have achieved remarkable success..."
|
||||
- "Deep learning has revolutionized..."
|
||||
- "In recent years, neural networks have..."
|
||||
|
||||
**Start with your specific contribution instead.**
|
||||
|
||||
---
|
||||
|
||||
## Introduction Structure
|
||||
|
||||
### Requirements
|
||||
|
||||
- **1-1.5 pages maximum** (in two-column format)
|
||||
- **Methods should start by page 2-3**
|
||||
- Must include **2-4 bullet contribution list** (max 1-2 lines each)
|
||||
|
||||
### Structure Template
|
||||
|
||||
```markdown
|
||||
1. Opening Hook (2-3 sentences)
|
||||
- State the problem your paper addresses
|
||||
- Why it matters RIGHT NOW
|
||||
|
||||
2. Background/Challenge (1 paragraph)
|
||||
- What makes this problem hard?
|
||||
- What have others tried? Why is it insufficient?
|
||||
|
||||
3. Your Approach (1 paragraph)
|
||||
- What do you do differently?
|
||||
- Key insight that enables your contribution
|
||||
|
||||
4. Contribution Bullets (2-4 items)
|
||||
- Be specific and falsifiable
|
||||
- Each bullet: 1-2 lines maximum
|
||||
|
||||
5. Results Preview (2-3 sentences)
|
||||
- Most impressive numbers
|
||||
- Scope of evaluation
|
||||
|
||||
6. Paper Organization (optional, 1-2 sentences)
|
||||
- "Section 2 presents... Section 3 describes..."
|
||||
```
|
||||
|
||||
### Contribution Bullets: Good vs Bad
|
||||
|
||||
**Good:**
|
||||
- We prove that X converges in O(n log n) time under assumption Y
|
||||
- We introduce Z, a 3-layer architecture that reduces memory by 40%
|
||||
- We demonstrate that A outperforms B by 15% on benchmark C
|
||||
|
||||
**Bad:**
|
||||
- We study the problem of X (not a contribution)
|
||||
- We provide extensive experiments (too vague)
|
||||
- We make several contributions to the field (says nothing)
|
||||
|
||||
---
|
||||
|
||||
## Sentence-Level Clarity
|
||||
|
||||
### From Gopen & Swan: "The Science of Scientific Writing"
|
||||
|
||||
The seminal 1990 paper by George Gopen and Judith Swan establishes that **readers have structural expectations** about where information appears in prose. Violating these expectations forces readers to spend energy on structure rather than content.
|
||||
|
||||
> "If the reader is to grasp what the writer means, the writer must understand what the reader needs."
|
||||
|
||||
#### The 7 Principles of Reader Expectations
|
||||
|
||||
**Principle 1: Subject-Verb Proximity**
|
||||
|
||||
Keep grammatical subject and verb close together. Anything intervening reads as interruption of lesser importance.
|
||||
|
||||
**Weak**: "The model, which was trained on 100M tokens and fine-tuned on domain-specific data using LoRA with rank 16, achieves state-of-the-art results"
|
||||
|
||||
**Strong**: "The model achieves state-of-the-art results after training on 100M tokens and fine-tuning with LoRA (rank 16)"
|
||||
|
||||
**Principle 2: Stress Position (Save the Best for Last)**
|
||||
|
||||
Readers naturally emphasize the **last words of a sentence**. Place your most important information there.
|
||||
|
||||
**Weak**: "Accuracy improves by 15% when using attention"
|
||||
**Strong**: "When using attention, accuracy improves by **15%**"
|
||||
|
||||
**Principle 3: Topic Position (First Things First)**
|
||||
|
||||
The beginning of a sentence establishes perspective. Put the "whose story" element first—readers expect the sentence to be about whoever shows up first.
|
||||
|
||||
**Weak**: "A novel attention mechanism that computes alignment scores is introduced"
|
||||
**Strong**: "To address the alignment problem, we introduce a novel attention mechanism"
|
||||
|
||||
**Principle 4: Old Information Before New**
|
||||
|
||||
Put familiar information (old) in the topic position for backward linkage; put new information in the stress position for emphasis.
|
||||
|
||||
**Weak**: "Sparse attention was introduced by Child et al. The quadratic complexity of standard attention motivates this work."
|
||||
**Strong**: "Standard attention has quadratic complexity. To address this, Child et al. introduced sparse attention."
|
||||
|
||||
**Principle 5: One Unit, One Function**
|
||||
|
||||
Each unit of discourse (sentence, paragraph, section) should serve a single function. If you have two points, use two units.
|
||||
|
||||
**Principle 6: Articulate Action in the Verb**
|
||||
|
||||
Express the action of each sentence in its verb, not in nominalized nouns.
|
||||
|
||||
**Weak**: "We performed an analysis of the results" (nominalization)
|
||||
**Strong**: "We analyzed the results" (action in verb)
|
||||
|
||||
**Principle 7: Context Before New Information**
|
||||
|
||||
Provide context before asking the reader to consider anything new. This applies at all levels—sentence, paragraph, section.
|
||||
|
||||
**Weak**: "Equation 3 shows that convergence is guaranteed when the learning rate satisfies..."
|
||||
**Strong**: "For convergence to be guaranteed, the learning rate must satisfy the condition in Equation 3..."
|
||||
|
||||
#### Summary Table
|
||||
|
||||
| Principle | Rule | Mnemonic |
|
||||
|-----------|------|----------|
|
||||
| Subject-Verb Proximity | Keep subject and verb close | "Don't interrupt yourself" |
|
||||
| Stress Position | Emphasis at sentence end | "Save the best for last" |
|
||||
| Topic Position | Context at sentence start | "First things first" |
|
||||
| Old Before New | Familiar → unfamiliar | "Build on known ground" |
|
||||
| One Unit, One Function | Each paragraph = one point | "One idea per container" |
|
||||
| Action in Verb | Use verbs, not nominalizations | "Verbs do, nouns sit" |
|
||||
| Context Before New | Explain before presenting | "Set the stage first" |
|
||||
|
||||
---
|
||||
|
||||
## Micro-Level Writing Tips
|
||||
|
||||
### From Ethan Perez (Anthropic)
|
||||
|
||||
These practical micro-level tips improve clarity at the sentence and word level.
|
||||
|
||||
#### Pronoun Management
|
||||
|
||||
**Minimize pronouns** ("this," "it," "these," "that"). When pronouns are necessary, use them as adjectives with a noun:
|
||||
|
||||
**Weak**: "This shows that the model converges."
|
||||
**Strong**: "This result shows that the model converges."
|
||||
|
||||
**Weak**: "It improves performance."
|
||||
**Strong**: "This modification improves performance."
|
||||
|
||||
#### Verb Placement
|
||||
|
||||
**Position verbs early** in sentences for better parsing:
|
||||
|
||||
**Weak**: "The gradient, after being computed and normalized, updates the weights."
|
||||
**Strong**: "The gradient updates the weights after being computed and normalized."
|
||||
|
||||
#### Apostrophe Unfolding
|
||||
|
||||
Transform possessive constructions for clarity:
|
||||
|
||||
**Original**: "X's Y" → **Unfolded**: "The Y of X"
|
||||
|
||||
**Before**: "The model's accuracy on the test set"
|
||||
**After**: "The accuracy of the model on the test set"
|
||||
|
||||
This isn't always better, but when sentences feel awkward, try unfolding.
|
||||
|
||||
#### Words to Eliminate
|
||||
|
||||
Delete these filler words in almost all cases:
|
||||
- "actually"
|
||||
- "a bit"
|
||||
- "fortunately" / "unfortunately"
|
||||
- "very" / "really"
|
||||
- "quite"
|
||||
- "basically"
|
||||
- "essentially"
|
||||
- Excessive connectives ("however," "moreover," "furthermore" when not needed)
|
||||
|
||||
#### Sentence Construction Rules
|
||||
|
||||
1. **One idea per sentence** - If struggling to express an idea in one sentence, it needs two
|
||||
2. **No repeated sounds** - Avoid similar-sounding words in the same sentence
|
||||
3. **Every sentence adds information** - Delete sentences that merely restate
|
||||
4. **Active voice always** - Specify the actor ("We find..." not "It is found...")
|
||||
5. **Expand contractions** - "don't" → "do not" for formality
|
||||
|
||||
#### Paragraph Architecture
|
||||
|
||||
- **First sentence**: State the point clearly
|
||||
- **Middle sentences**: Support with evidence
|
||||
- **Last sentence**: Reinforce or transition
|
||||
|
||||
Don't bury key information in the middle of paragraphs.
|
||||
|
||||
---
|
||||
|
||||
## Word Choice and Precision
|
||||
|
||||
### From Zachary Lipton
|
||||
|
||||
**Eliminate hedging** unless genuine uncertainty exists:
|
||||
- Delete "may" and "can" unless necessary
|
||||
- "provides *very* tight approximation" drips with insecurity
|
||||
- "provides tight approximation" is confident
|
||||
|
||||
**Avoid vacuous intensifiers**:
|
||||
- Delete: very, extremely, highly, significantly (unless statistical)
|
||||
- These words signal insecurity, not strength
|
||||
|
||||
### From Jacob Steinhardt
|
||||
|
||||
**Precision over brevity**: Replace vague terms with specific ones.
|
||||
|
||||
| Vague | Specific |
|
||||
|-------|----------|
|
||||
| performance | accuracy, latency, throughput |
|
||||
| improves | increases accuracy by X%, reduces latency by Y |
|
||||
| large | 1B parameters, 100M tokens |
|
||||
| fast | 3x faster, 50ms latency |
|
||||
| good results | 92% accuracy, 0.85 F1 |
|
||||
|
||||
**Consistent terminology**: Referring to the same concept with different terms creates confusion.
|
||||
|
||||
**Choose one and stick with it**:
|
||||
- "model" vs "network" vs "architecture"
|
||||
- "training" vs "learning" vs "optimization"
|
||||
- "sample" vs "example" vs "instance"
|
||||
|
||||
### Vocabulary Signaling
|
||||
|
||||
**Avoid words signaling incremental work**:
|
||||
- Never: "combine," "modify," "expand," "extend"
|
||||
- Instead: "develop," "propose," "introduce"
|
||||
|
||||
**Why**: "We combine X and Y" sounds like you stapled two existing ideas together. "We develop a method that leverages X for Y" sounds like genuine contribution.
|
||||
|
||||
---
|
||||
|
||||
## Mathematical Writing
|
||||
|
||||
### From Ethan Perez
|
||||
|
||||
**Unfold apostrophes** for clarity:
|
||||
- Weak: "X's Y"
|
||||
- Strong: "The Y of X"
|
||||
|
||||
Example: "the model's accuracy" → "the accuracy of the model"
|
||||
|
||||
### General Principles
|
||||
|
||||
1. **State all assumptions formally** before theorems
|
||||
2. **Provide intuitive explanations** alongside proofs
|
||||
3. **Use consistent notation** throughout the paper
|
||||
4. **Define symbols at first use**
|
||||
|
||||
### Notation Conventions
|
||||
|
||||
```latex
|
||||
% Scalars: lowercase italic
|
||||
$x$, $y$, $\alpha$, $\beta$
|
||||
|
||||
% Vectors: lowercase bold
|
||||
$\mathbf{x}$, $\mathbf{v}$
|
||||
|
||||
% Matrices: uppercase bold
|
||||
$\mathbf{W}$, $\mathbf{X}$
|
||||
|
||||
% Sets: uppercase calligraphic
|
||||
$\mathcal{X}$, $\mathcal{D}$
|
||||
|
||||
% Functions: roman for named functions
|
||||
$\mathrm{softmax}$, $\mathrm{ReLU}$
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Figure Design
|
||||
|
||||
### From Neel Nanda
|
||||
|
||||
Figures should tell a coherent story even if the reader skips the text. Many readers DO skip the text initially.
|
||||
|
||||
### Design Principles
|
||||
|
||||
1. **Figure 1 is crucial**: Often the first thing readers examine after abstract
|
||||
2. **Self-contained captions**: Reader should understand figure without main text
|
||||
3. **No title inside figure**: The caption serves this function (ICML/NeurIPS rule)
|
||||
4. **Vector graphics**: PDF/EPS for plots, PNG (600 DPI) only for photographs
|
||||
|
||||
### Accessibility Requirements
|
||||
|
||||
8% of men have color vision deficiency. Your figures must work for them.
|
||||
|
||||
**Solutions**:
|
||||
- Use colorblind-safe palettes: Okabe-Ito or Paul Tol
|
||||
- Avoid red-green combinations
|
||||
- Verify figures work in grayscale
|
||||
- Use different line styles (solid, dashed, dotted) in addition to colors
|
||||
|
||||
### Tools
|
||||
|
||||
```python
|
||||
# SciencePlots: Publication-ready styles
|
||||
import matplotlib.pyplot as plt
|
||||
plt.style.use(['science', 'ieee'])
|
||||
|
||||
# Or for Nature-style
|
||||
plt.style.use(['science', 'nature'])
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Mistakes to Avoid
|
||||
|
||||
### Structure Mistakes
|
||||
|
||||
| Mistake | Solution |
|
||||
|---------|----------|
|
||||
| Introduction too long (>1.5 pages) | Move background to Related Work |
|
||||
| Methods buried (after page 3) | Front-load contribution, cut intro |
|
||||
| Missing contribution bullets | Add 2-4 specific, falsifiable claims |
|
||||
| Experiments without explicit claims | State what each experiment tests |
|
||||
|
||||
### Writing Mistakes
|
||||
|
||||
| Mistake | Solution |
|
||||
|---------|----------|
|
||||
| Generic abstract opening | Start with your specific contribution |
|
||||
| Inconsistent terminology | Choose one term per concept |
|
||||
| Passive voice overuse | Use active voice: "We show" not "It is shown" |
|
||||
| Hedging everywhere | Be confident unless genuinely uncertain |
|
||||
|
||||
### Figure Mistakes
|
||||
|
||||
| Mistake | Solution |
|
||||
|---------|----------|
|
||||
| Raster graphics for plots | Use vector (PDF/EPS) |
|
||||
| Red-green color scheme | Use colorblind-safe palette |
|
||||
| Title inside figure | Put title in caption |
|
||||
| Captions require main text | Make captions self-contained |
|
||||
|
||||
### Citation Mistakes
|
||||
|
||||
| Mistake | Solution |
|
||||
|---------|----------|
|
||||
| Paper-by-paper Related Work | Organize methodologically |
|
||||
| Missing relevant citations | Reviewers authored papers—cite generously |
|
||||
| AI-generated citations | Always verify via APIs |
|
||||
| Inconsistent citation format | Use BibLaTeX with consistent keys |
|
||||
|
||||
---
|
||||
|
||||
## Pre-Submission Checklist
|
||||
|
||||
Before submitting, verify:
|
||||
|
||||
**Narrative**:
|
||||
- [ ] Can state contribution in one sentence
|
||||
- [ ] Three pillars (What/Why/So What) clear in intro
|
||||
- [ ] Every experiment supports a specific claim
|
||||
|
||||
**Structure**:
|
||||
- [ ] Abstract follows 5-sentence formula
|
||||
- [ ] Introduction ≤1.5 pages
|
||||
- [ ] Methods start by page 2-3
|
||||
- [ ] 2-4 contribution bullets included
|
||||
- [ ] Limitations section present
|
||||
|
||||
**Writing**:
|
||||
- [ ] Consistent terminology throughout
|
||||
- [ ] No generic opening sentences
|
||||
- [ ] Hedging removed unless necessary
|
||||
- [ ] All figures have self-contained captions
|
||||
|
||||
**Technical**:
|
||||
- [ ] All citations verified via API
|
||||
- [ ] Error bars included with methodology
|
||||
- [ ] Compute resources documented
|
||||
- [ ] Code/data availability stated
|
||||
Reference in New Issue
Block a user