Sync all skills and memories 2026-04-14 07:27

2026-04-14 07:27:20 +09:00
parent 516bb44fe6
commit 1eba2bca95
386 changed files with 167655 additions and 0 deletions
--- a/skills/research/research-paper-writing/references/writing-guide.md
+++ b/skills/research/research-paper-writing/references/writing-guide.md
@@ -0,0 +1,474 @@
+# ML Paper Writing Philosophy & Best Practices
+
+This reference compiles writing advice from prominent ML researchers including Neel Nanda, Andrej Karpathy, Sebastian Farquhar, Zachary Lipton, Jacob Steinhardt, and Ethan Perez, along with the reader-expectation principles of George Gopen and Judith Swan.
+
+---
+
+## Contents
+
+- [The Narrative Principle](#the-narrative-principle)
+- [Time Allocation](#time-allocation)
+- [Abstract Writing Formula](#abstract-writing-formula)
+- [Introduction Structure](#introduction-structure)
+- [Sentence-Level Clarity](#sentence-level-clarity)
+- [Micro-Level Writing Tips](#micro-level-writing-tips)
+- [Word Choice and Precision](#word-choice-and-precision)
+- [Mathematical Writing](#mathematical-writing)
+- [Figure Design](#figure-design)
+- [Common Mistakes to Avoid](#common-mistakes-to-avoid)
+- [Pre-Submission Checklist](#pre-submission-checklist)
+
+---
+
+## The Narrative Principle
+
+### From Neel Nanda
+
+"A paper is a short, rigorous, evidence-based technical story with a takeaway readers care about."
+
+The narrative rests on three pillars that must be crystal clear by the end of your introduction:
+
+**The "What"**: One to three specific novel claims fitting within a cohesive theme. Vague contributions like "we study X" fail immediately—reviewers need precise, falsifiable claims.
+
+**The "Why"**: Rigorous empirical evidence that convincingly supports those claims, including strong baselines honestly tuned and experiments that distinguish between competing hypotheses rather than merely showing "decent results."
+
+**The "So What"**: Why readers should care, connecting your contribution to problems the community recognizes as important.
+
+### From Andrej Karpathy
+
+"A paper is not a random collection of experiments you report on. The paper sells a single thing that was not obvious or present before. The entire paper is organized around this core contribution with surgical precision."
+
+This applies whether you're presenting a new architecture, a theoretical result, or improved understanding of existing methods—NeurIPS explicitly notes that "originality does not necessarily require an entirely new method."
+
+**Practical Implication**: If you cannot state your contribution in one sentence, you don't yet have a paper. Everything else—experiments, related work, discussion—exists only to support that core claim.
+
+---
+
+## Time Allocation
+
+### From Neel Nanda
+
+Spend approximately **the same amount of time** on each of:
+1. The abstract
+2. The introduction
+3. The figures
+4. Everything else combined
+
+This isn't hyperbole—most reviewers form preliminary judgments before reaching your methods section. Readers encounter your paper in a predictable pattern: **title → abstract → introduction → figures → maybe the rest.**
+
+### Reviewer Reading Patterns
+
+Studies of reviewer behavior show:
+- Abstract is read 100% of the time
+- Introduction is skimmed by 90%+ of reviewers
+- Figures are examined before methods by most reviewers
+- Full methods are read only if interest is established
+
+**Implication**: Front-load your paper's value. Don't bury the contribution.
+
+---
+
+## Abstract Writing Formula
+
+### Sebastian Farquhar's 5-Sentence Formula
+
+1. **What you achieved**: "We introduce...", "We prove...", "We demonstrate..."
+2. **Why this is hard and important**
+3. **How you do it** (with specialist keywords for discoverability)
+4. **What evidence you have**
+5. **Your most remarkable number/result**
+
+### Example (Good Abstract)
+
+```
+We prove that gradient descent on overparameterized neural networks
+converges to global minima at a linear rate. [What]
+This resolves a fundamental question about why deep learning works
+despite non-convex optimization landscapes. [Why hard/important]
+Our proof relies on showing that the Neural Tangent Kernel remains
+approximately constant during training, reducing the problem to
+kernel regression. [How with keywords]
+We validate our theory on CIFAR-10 and ImageNet, showing that
+predicted convergence rates match experiments within 5%. [Evidence]
+This is the first polynomial-time convergence guarantee for
+networks with practical depth and width. [Remarkable result]
+```
+
+### What to Avoid
+
+From Zachary Lipton: "If the first sentence can be pre-pended to any ML paper, delete it."
+
+**Delete these openings**:
+- "Large language models have achieved remarkable success..."
+- "Deep learning has revolutionized..."
+- "In recent years, neural networks have..."
+
+**Start with your specific contribution instead.**
+
+---
+
+## Introduction Structure
+
+### Requirements
+
+- **1-1.5 pages maximum** (in two-column format)
+- **Methods should start by page 2-3**
+- Must include a **contribution list of 2-4 bullets** (max 1-2 lines each)
+
+### Structure Template
+
+```markdown
+1. Opening Hook (2-3 sentences)
+   - State the problem your paper addresses
+   - Why it matters RIGHT NOW
+
+2. Background/Challenge (1 paragraph)
+   - What makes this problem hard?
+   - What have others tried? Why is it insufficient?
+
+3. Your Approach (1 paragraph)
+   - What do you do differently?
+   - Key insight that enables your contribution
+
+4. Contribution Bullets (2-4 items)
+   - Be specific and falsifiable
+   - Each bullet: 1-2 lines maximum
+
+5. Results Preview (2-3 sentences)
+   - Most impressive numbers
+   - Scope of evaluation
+
+6. Paper Organization (optional, 1-2 sentences)
+   - "Section 2 presents... Section 3 describes..."
+```
+
+### Contribution Bullets: Good vs Bad
+
+**Good:**
+- We prove that X converges in O(n log n) time under assumption Y
+- We introduce Z, a 3-layer architecture that reduces memory by 40%
+- We demonstrate that A outperforms B by 15% on benchmark C
+
+**Bad:**
+- We study the problem of X (not a contribution)
+- We provide extensive experiments (too vague)
+- We make several contributions to the field (says nothing)
+
+---
+
+## Sentence-Level Clarity
+
+### From Gopen & Swan: "The Science of Scientific Writing"
+
+The seminal 1990 paper by George Gopen and Judith Swan establishes that **readers have structural expectations** about where information appears in prose. Violating these expectations forces readers to spend energy on structure rather than content.
+
+> "If the reader is to grasp what the writer means, the writer must understand what the reader needs."
+
+#### The 7 Principles of Reader Expectations
+
+**Principle 1: Subject-Verb Proximity**
+
+Keep grammatical subject and verb close together. Anything intervening reads as interruption of lesser importance.
+
+**Weak**: "The model, which was trained on 100M tokens and fine-tuned on domain-specific data using LoRA with rank 16, achieves state-of-the-art results"
+
+**Strong**: "The model achieves state-of-the-art results after training on 100M tokens and fine-tuning with LoRA (rank 16)"
+
+**Principle 2: Stress Position (Save the Best for Last)**
+
+Readers naturally emphasize the **last words of a sentence**. Place your most important information there.
+
+**Weak**: "Accuracy improves by 15% when using attention"
+**Strong**: "When using attention, accuracy improves by **15%**"
+
+**Principle 3: Topic Position (First Things First)**
+
+The beginning of a sentence establishes perspective. Put the "whose story" element first—readers expect the sentence to be about whoever shows up first.
+
+**Weak**: "A novel attention mechanism that computes alignment scores is introduced"
+**Strong**: "To address the alignment problem, we introduce a novel attention mechanism"
+
+**Principle 4: Old Information Before New**
+
+Put familiar information (old) in the topic position for backward linkage; put new information in the stress position for emphasis.
+
+**Weak**: "Sparse attention was introduced by Child et al. The quadratic complexity of standard attention motivates this work."
+**Strong**: "Standard attention has quadratic complexity. To address this, Child et al. introduced sparse attention."
+
+**Principle 5: One Unit, One Function**
+
+Each unit of discourse (sentence, paragraph, section) should serve a single function. If you have two points, use two units.
+
+**Principle 6: Articulate Action in the Verb**
+
+Express the action of each sentence in its verb, not in nominalized nouns.
+
+**Weak**: "We performed an analysis of the results" (nominalization)
+**Strong**: "We analyzed the results" (action in verb)
+
+**Principle 7: Context Before New Information**
+
+Provide context before asking the reader to consider anything new. This applies at all levels—sentence, paragraph, section.
+
+**Weak**: "Equation 3 shows that convergence is guaranteed when the learning rate satisfies..."
+**Strong**: "For convergence to be guaranteed, the learning rate must satisfy the condition in Equation 3..."
+
+#### Summary Table
+
+| Principle | Rule | Mnemonic |
+|-----------|------|----------|
+| Subject-Verb Proximity | Keep subject and verb close | "Don't interrupt yourself" |
+| Stress Position | Emphasis at sentence end | "Save the best for last" |
+| Topic Position | Context at sentence start | "First things first" |
+| Old Before New | Familiar → unfamiliar | "Build on known ground" |
+| One Unit, One Function | Each sentence, paragraph, or section serves one function | "One idea per container" |
+| Action in Verb | Use verbs, not nominalizations | "Verbs do, nouns sit" |
+| Context Before New | Explain before presenting | "Set the stage first" |
+
+---
+
+## Micro-Level Writing Tips
+
+### From Ethan Perez (Anthropic)
+
+These practical micro-level tips improve clarity at the sentence and word level.
+
+#### Pronoun Management
+
+**Minimize pronouns** ("this," "it," "these," "that"). When pronouns are necessary, use them as adjectives with a noun:
+
+**Weak**: "This shows that the model converges."
+**Strong**: "This result shows that the model converges."
+
+**Weak**: "It improves performance."
+**Strong**: "This modification improves performance."
+
+#### Verb Placement
+
+**Position verbs early** in sentences for better parsing:
+
+**Weak**: "The gradient, after being computed and normalized, updates the weights."
+**Strong**: "The gradient updates the weights after being computed and normalized."
+
+#### Apostrophe Unfolding
+
+Transform possessive constructions for clarity:
+
+**Original**: "X's Y" → **Unfolded**: "The Y of X"
+
+**Before**: "The model's accuracy on the test set"
+**After**: "The accuracy of the model on the test set"
+
+This isn't always better, but when sentences feel awkward, try unfolding.
+
+#### Words to Eliminate
+
+Delete these filler words in almost all cases:
+- "actually"
+- "a bit"
+- "fortunately" / "unfortunately"
+- "very" / "really"
+- "quite"
+- "basically"
+- "essentially"
+- Excessive connectives ("however," "moreover," "furthermore" when not needed)
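
The filler-word list above lends itself to an automated pass over a draft. The following is a minimal sketch, not a tool from any of the cited authors: `FILLER_WORDS` and `find_fillers` are hypothetical names, and the word list is copied from the bullets above. Connectives are left out because deciding whether one is "needed" takes human judgment.

```python
import re

# Filler words copied from the list above; extend or trim as needed.
FILLER_WORDS = [
    "actually", "a bit", "fortunately", "unfortunately",
    "very", "really", "quite", "basically", "essentially",
]

# One case-insensitive pattern with word boundaries,
# so "very" does not match inside "every".
_FILLER_RE = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in FILLER_WORDS) + r")\b",
    re.IGNORECASE,
)


def find_fillers(text):
    """Return (line_number, word) pairs for every filler word found in text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in _FILLER_RE.finditer(line):
            hits.append((line_number, match.group(1).lower()))
    return hits
```

Calling `find_fillers(open("draft.txt").read())` on a plain-text export of a draft (the file name is illustrative) gives a list of candidates to review by hand; a flagged word is a prompt to reconsider the sentence, not an automatic deletion.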
+
+#### Sentence Construction Rules
+
+1. **One idea per sentence** - If you struggle to express an idea in one sentence, split it into two
+2. **No repeated sounds** - Avoid similar-sounding words in the same sentence
+3. **Every sentence adds information** - Delete sentences that merely restate
+4. **Active voice always** - Specify the actor ("We find..." not "It is found...")
+5. **Expand contractions** - "don't" → "do not" for formality
+
+#### Paragraph Architecture
+
+- **First sentence**: State the point clearly
+- **Middle sentences**: Support with evidence
+- **Last sentence**: Reinforce or transition
+
+Don't bury key information in the middle of paragraphs.
+
+---
+
+## Word Choice and Precision
+
+### From Zachary Lipton
+
+**Eliminate hedging** unless genuine uncertainty exists:
+- Delete "may" and "can" unless necessary
+- "provides *very* tight approximation" drips with insecurity
+- "provides tight approximation" is confident
+
+**Avoid vacuous intensifiers**:
+- Delete: very, extremely, highly, significantly (unless statistical)
+- These words signal insecurity, not strength
+
+### From Jacob Steinhardt
+
+**Precision over brevity**: Replace vague terms with specific ones.
+
+| Vague | Specific |
+|-------|----------|
+| performance | accuracy, latency, throughput |
+| improves | increases accuracy by X%, reduces latency by Y |
+| large | 1B parameters, 100M tokens |
+| fast | 3x faster, 50ms latency |
+| good results | 92% accuracy, 0.85 F1 |
+
+**Consistent terminology**: Referring to the same concept with different terms creates confusion.
+
+**Choose one and stick with it**:
+- "model" vs "network" vs "architecture"
+- "training" vs "learning" vs "optimization"
+- "sample" vs "example" vs "instance"
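
Terminology drift is easy to miss when reading your own draft, so a rough count can help. The sketch below is an assumption-laden illustration: `SYNONYM_GROUPS` and `term_usage` are hypothetical names, the groups mirror the examples above, and plurals are handled only by appending "s". It cannot tell a technical use of "example" from the phrase "for example", so treat its output as a list of places to look.

```python
import re
from collections import Counter

# Hypothetical synonym groups, mirroring the examples above.
SYNONYM_GROUPS = [
    ("model", "network", "architecture"),
    ("training", "learning", "optimization"),
    ("sample", "example", "instance"),
]


def term_usage(text, groups=SYNONYM_GROUPS):
    """Report groups in which more than one competing term is used.

    Returns a list of {term: count} dicts, one per inconsistent group.
    """
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    report = []
    for group in groups:
        # Count the singular form plus a naive "s" plural.
        counts = {term: words[term] + words[term + "s"] for term in group}
        used = {term: count for term, count in counts.items() if count > 0}
        if len(used) > 1:
            report.append(used)
    return report
```

A group that shows up in the report is a candidate for search-and-replace toward whichever term dominates.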
+
+### Vocabulary Signaling
+
+**Avoid words signaling incremental work**:
+- Never: "combine," "modify," "expand," "extend"
+- Instead: "develop," "propose," "introduce"
+
+**Why**: "We combine X and Y" sounds like you stapled two existing ideas together. "We develop a method that leverages X for Y" sounds like genuine contribution.
+
+---
+
+## Mathematical Writing
+
+### From Ethan Perez
+
+**Unfold apostrophes** for clarity:
+- Weak: "X's Y"
+- Strong: "The Y of X"
+
+Example: "the model's accuracy" → "the accuracy of the model"
+
+### General Principles
+
+1. **State all assumptions formally** before theorems
+2. **Provide intuitive explanations** alongside proofs
+3. **Use consistent notation** throughout the paper
+4. **Define symbols at first use**
+
+### Notation Conventions
+
+```latex
+% Scalars: lowercase italic
+$x$, $y$, $\alpha$, $\beta$
+
+% Vectors: lowercase bold
+$\mathbf{x}$, $\mathbf{v}$
+
+% Matrices: uppercase bold
+$\mathbf{W}$, $\mathbf{X}$
+
+% Sets: uppercase calligraphic
+$\mathcal{X}$, $\mathcal{D}$
+
+% Functions: roman for named functions
+$\mathrm{softmax}$, $\mathrm{ReLU}$
+```
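
One way to keep these conventions consistent across a paper is to route them through preamble macros, so a later change of convention is a one-line edit. This is only a sketch: the macro names `\vect`, `\mat`, and `\setcal` are hypothetical, and `\DeclareMathOperator` assumes the `amsmath` package is loaded.

```latex
% Preamble (hypothetical macro names; rename to taste)
\usepackage{amsmath}
\newcommand{\vect}[1]{\mathbf{#1}}       % vectors: lowercase bold
\newcommand{\mat}[1]{\mathbf{#1}}        % matrices: uppercase bold
\newcommand{\setcal}[1]{\mathcal{#1}}    % sets: uppercase calligraphic
\DeclareMathOperator{\softmax}{softmax}  % named function in roman type

% Usage in the body:
% $\vect{y} = \softmax(\mat{W}\vect{x})$ with $\vect{x} \in \setcal{X}$
```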
+
+---
+
+## Figure Design
+
+### From Neel Nanda
+
+Figures should tell a coherent story even if the reader skips the text. Many readers DO skip the text initially.
+
+### Design Principles
+
+1. **Figure 1 is crucial**: Often the first thing readers examine after abstract
+2. **Self-contained captions**: Reader should understand figure without main text
+3. **No title inside figure**: The caption serves this function (ICML/NeurIPS rule)
+4. **Vector graphics**: PDF/EPS for plots, PNG (600 DPI) only for photographs
+
+### Accessibility Requirements
+
+Roughly 8% of men (and about 0.5% of women) have some form of color vision deficiency. Your figures must work for these readers.
+
+**Solutions**:
+- Use colorblind-safe palettes: Okabe-Ito or Paul Tol
+- Avoid red-green combinations
+- Verify figures work in grayscale
+- Use different line styles (solid, dashed, dotted) in addition to colors
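
The grayscale check above can be approximated numerically before printing anything. The sketch below converts each Okabe-Ito color to an approximate gray level using Rec. 601 luma weights and lists pairs that land close together. The function names and the `min_gap` threshold of 20 (on a 0-255 scale) are assumptions chosen for illustration, not an established standard.

```python
from itertools import combinations

# Okabe-Ito palette, hex values as commonly published.
OKABE_ITO = {
    "black": "#000000",
    "orange": "#E69F00",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
}


def gray_level(hex_color):
    """Approximate grayscale value (0-255) using Rec. 601 luma weights."""
    digits = hex_color.lstrip("#")
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * red + 0.587 * green + 0.114 * blue


def hard_to_tell_apart(palette, min_gap=20.0):
    """Return pairs of color names whose gray levels differ by less than min_gap."""
    return [
        (first, second)
        for first, second in combinations(palette, 2)
        if abs(gray_level(palette[first]) - gray_level(palette[second])) < min_gap
    ]
```

Running `hard_to_tell_apart(OKABE_ITO)` flags several pairs, orange and sky blue among them, which illustrates why the list above recommends varying line styles in addition to color: a palette that is safe for color vision deficiency is not automatically distinguishable in grayscale.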
+
+### Tools
+
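+```python
+# SciencePlots: publication-ready matplotlib styles (pip install SciencePlots)
+import matplotlib.pyplot as plt
+import scienceplots  # noqa: F401  (registers the styles; required since SciencePlots 2.0)
+
+# IEEE-style figures. These styles render text with LaTeX by default;
+# append 'no-latex' to the list if LaTeX is not installed.
+plt.style.use(['science', 'ieee'])
+
+# Or for Nature-style
+plt.style.use(['science', 'nature'])
+```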
+
+---
+
+## Common Mistakes to Avoid
+
+### Structure Mistakes
+
+| Mistake | Solution |
+|---------|----------|
+| Introduction too long (>1.5 pages) | Move background to Related Work |
+| Methods buried (after page 3) | Front-load contribution, cut intro |
+| Missing contribution bullets | Add 2-4 specific, falsifiable claims |
+| Experiments without explicit claims | State what each experiment tests |
+
+### Writing Mistakes
+
+| Mistake | Solution |
+|---------|----------|
+| Generic abstract opening | Start with your specific contribution |
+| Inconsistent terminology | Choose one term per concept |
+| Passive voice overuse | Use active voice: "We show" not "It is shown" |
+| Hedging everywhere | Be confident unless genuinely uncertain |
+
+### Figure Mistakes
+
+| Mistake | Solution |
+|---------|----------|
+| Raster graphics for plots | Use vector (PDF/EPS) |
+| Red-green color scheme | Use colorblind-safe palette |
+| Title inside figure | Put title in caption |
+| Captions require main text | Make captions self-contained |
+
+### Citation Mistakes
+
+| Mistake | Solution |
+|---------|----------|
+| Paper-by-paper Related Work | Organize methodologically |
+| Missing relevant citations | Reviewers have often authored the relevant papers; cite generously |
+| AI-generated citations | Always verify via APIs |
+| Inconsistent citation format | Use BibLaTeX with consistent keys |
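
The table above says to verify citations via APIs without naming one. As one possible sketch, the code below queries the public Crossref REST API by title and returns candidate matches to compare against a BibTeX entry. Choosing Crossref is an assumption of this example, the function names are hypothetical, and some ML papers (for example preprint-only work) may not be indexed there, so a missing hit does not by itself mean a citation is fabricated.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(title, rows=3):
    """Build a Crossref bibliographic search URL for a paper title."""
    params = {"query.bibliographic": title, "rows": rows}
    return CROSSREF_WORKS + "?" + urllib.parse.urlencode(params)


def lookup_titles(title, rows=3):
    """Return candidate (title, DOI) pairs from Crossref. Requires network access."""
    with urllib.request.urlopen(crossref_query_url(title, rows), timeout=10) as response:
        items = json.load(response)["message"]["items"]
    # Crossref stores each title as a list of strings; take the first one.
    return [((item.get("title") or ["(untitled)"])[0], item.get("DOI")) for item in items]
```

Compare the returned titles and DOIs against the draft bibliography by eye; an entry with no plausible match deserves a manual search before it stays in the paper.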
+
+---
+
+## Pre-Submission Checklist
+
+Before submitting, verify:
+
+**Narrative**:
+- [ ] Can state contribution in one sentence
+- [ ] Three pillars (What/Why/So What) clear in intro
+- [ ] Every experiment supports a specific claim
+
+**Structure**:
+- [ ] Abstract follows 5-sentence formula
+- [ ] Introduction ≤1.5 pages
+- [ ] Methods start by page 2-3
+- [ ] 2-4 contribution bullets included
+- [ ] Limitations section present
+
+**Writing**:
+- [ ] Consistent terminology throughout
+- [ ] No generic opening sentences
+- [ ] Hedging removed unless necessary
+- [ ] All figures have self-contained captions
+
+**Technical**:
+- [ ] All citations verified via API
+- [ ] Error bars included with methodology
+- [ ] Compute resources documented
+- [ ] Code/data availability stated
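
For the "error bars included with methodology" item, one common convention is to report the mean and the standard error of the mean across random seeds, and to say so in the caption. The sketch below assumes that convention; the function names are hypothetical, and other choices (standard deviation, confidence intervals) are equally valid as long as the paper states which one is used.

```python
import math
import statistics


def mean_and_stderr(values):
    """Return (mean, standard error of the mean) for per-seed results."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(values)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    stderr = statistics.stdev(values) / math.sqrt(len(values))
    return mean, stderr


def format_result(values, digits=2):
    """Format per-seed results as a string that states the methodology."""
    mean, stderr = mean_and_stderr(values)
    return f"{mean:.{digits}f} ± {stderr:.{digits}f} (mean ± s.e.m., n={len(values)})"
```

The formatted string carries the methodology with the number, so a table cell or caption built from it already tells the reader what the error bar means and how many runs it covers.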