Sync all skills and memories 2026-04-14 07:27
skills/creative/ascii-video/README.md

# ☤ ASCII Video

Renders any content as colored ASCII character video. Audio, video, images, text, or pure math in; MP4/GIF/PNG sequence out. Full RGB color per character cell, 1080p 24fps by default. No GPU required.

Built for [Hermes Agent](https://github.com/NousResearch/hermes-agent). Usable in any coding agent. The canonical source lives here; it is synced to [`NousResearch/hermes-agent/skills/creative/ascii-video`](https://github.com/NousResearch/hermes-agent/tree/main/skills/creative/ascii-video) via PR.

## What this is

A skill that teaches an agent how to build single-file Python renderers for ASCII video from scratch. The agent gets the full pipeline: grid system, font rasterization, effect library, shader chain, audio analysis, parallel encoding. It writes the renderer, runs it, gets video.

The output is actual video, not terminal escape codes. Frames are computed as grids of colored characters, composited onto pixel canvases with pre-rasterized font bitmaps, post-processed through shaders, and piped to ffmpeg.

## Modes

| Mode | Input | Output |
|------|-------|--------|
| Video-to-ASCII | A video file | ASCII recreation of the footage |
| Audio-reactive | An audio file | Visuals driven by frequency bands, beats, energy |
| Generative | Nothing | Procedural animation from math |
| Hybrid | Video + audio | ASCII video with audio-reactive overlays |
| Lyrics/text | Audio + timed text (SRT) | Karaoke-style text with effects |
| TTS narration | Text quotes + API key | Narrated video with typewriter text and generated speech |

## Pipeline

Every mode follows the same 6-stage path:

```
INPUT --> ANALYZE --> SCENE_FN --> TONEMAP --> SHADE --> ENCODE
```

1. **Input** loads source material (or nothing for generative).
2. **Analyze** extracts per-frame features. Audio gets 6-band FFT, RMS, spectral centroid, flatness, flux, and beat detection with exponential decay. Video gets luminance, edges, and motion.
3. **Scene function** returns a pixel canvas directly, composing multiple character grids at different densities with value/hue fields and pixel blend modes. This is where the visuals happen.
4. **Tonemap** applies adaptive percentile-based brightness normalization with per-scene gamma. ASCII on black is inherently dark; linear multipliers don't work, this does.
5. **Shade** runs a `ShaderChain` (38 composable shaders) plus a `FeedbackBuffer` for temporal recursion with spatial transforms.
6. **Encode** pipes raw RGB frames to ffmpeg for H.264 encoding. Segments are concatenated and audio is muxed.
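The six stages compose naturally as one loop. Below is a minimal, self-contained sketch of the chain in generative mode; the function names (`analyze`, `scene_fn`, `shade`) and the tiny draft resolution are illustrative stand-ins, not the skill's actual API:

```python
import numpy as np

W, H, FPS, DUR = 320, 180, 12, 2          # tiny draft settings for the sketch

def analyze(t):                            # ANALYZE: synthetic per-frame features
    return {"energy": 0.5 + 0.5 * np.sin(2 * np.pi * t)}

def scene_fn(t, feat):                     # SCENE_FN: returns a uint8 (H, W, 3) canvas
    yy, xx = np.mgrid[0:H, 0:W].astype(np.float32)
    v = 0.5 + 0.5 * np.sin(xx / 24 + t * 4) * feat["energy"]
    return (np.stack([v, v * 0.6, 1 - v], axis=-1) * 255).astype(np.uint8)

def tonemap(canvas, gamma=0.75):           # TONEMAP: percentile normalization
    f = canvas.astype(np.float32)
    lo, hi = np.percentile(f, [1, 99.5])
    hi = max(hi, lo + 10)
    return (np.clip((f - lo) / (hi - lo), 0, 1) ** gamma * 255).astype(np.uint8)

def shade(canvas):                         # SHADE: stand-in for a ShaderChain (vignette)
    yy, xx = np.mgrid[0:H, 0:W].astype(np.float32)
    d = np.hypot((xx - W / 2) / W, (yy - H / 2) / H)
    return (canvas * np.clip(1.2 - d, 0, 1)[..., None]).astype(np.uint8)

frames = []
for i in range(FPS * DUR):                 # INPUT is "nothing" (generative mode)
    t = i / FPS
    frames.append(shade(tonemap(scene_fn(t, analyze(t)))))
# ENCODE would pipe each frame's .tobytes() into ffmpeg's stdin
```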

## Grid system

Characters render on fixed-size grids. Layer multiple densities for depth.

| Size | Font | Grid at 1080p | Use |
|------|------|---------------|-----|
| xs | 8px | 400x108 | Ultra-dense data fields |
| sm | 10px | 320x83 | Rain, starfields |
| md | 16px | 192x56 | Default balanced |
| lg | 20px | 160x45 | Readable text |
| xl | 24px | 137x37 | Large titles |
| xxl | 40px | 80x22 | Giant minimal |

Rendering the same scene on `sm` and `lg` then screen-blending them creates natural texture interference. Fine detail shows through gaps in coarse characters. Most scenes use two or three grids.
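The layering idea can be sketched in a few lines; `render_grid` here is a hypothetical stand-in for the real character renderer (it upsamples a cell-brightness field instead of stamping font bitmaps):

```python
import numpy as np

def screen(a, b):
    """Screen blend: 1 - (1-a)(1-b), operating on floats in [0, 1]."""
    return 1.0 - (1.0 - a) * (1.0 - b)

def render_grid(cols, rows, out_h=108, out_w=192):
    """Stand-in for a character-grid render: a per-cell brightness field
    upsampled to pixel resolution via nearest-neighbor blocks."""
    yy, xx = np.mgrid[0:rows, 0:cols].astype(np.float32)
    cells = 0.5 + 0.5 * np.sin(xx / 3) * np.cos(yy / 3)
    return np.kron(cells, np.ones((out_h // rows, out_w // cols), np.float32))

fine = render_grid(96, 54)     # "sm"-style dense layer
coarse = render_grid(32, 18)   # "lg"-style sparse layer
combined = screen(fine * 0.6, coarse * 0.8)  # fine detail shows through gaps
```

Because screen blending only ever brightens, the dense layer survives wherever the coarse layer leaves black.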

## Character palettes (24)

Each is sorted dark-to-bright, each a different visual texture. Palettes are validated against the font at init, so broken glyphs get dropped silently.

| Family | Examples | Feel |
|--------|----------|------|
| Density ramps | ` .:-=+#@█` | Classic ASCII art gradient |
| Block elements | ` ░▒▓█▄▀▐▌` | Chunky, digital |
| Braille | ` ⠁⠂⠃...⠿` | Fine-grained pointillism |
| Dots | ` ⋅∘∙●◉◎` | Smooth, organic |
| Stars | ` ·✧✦✩✨★✶` | Sparkle, celestial |
| Half-fills | ` ◔◑◕◐◒◓◖◗◙` | Directional fill progression |
| Crosshatch | ` ▣▤▥▦▧▨▩` | Hatched density ramp |
| Math | ` ·∘∙•°±×÷≈≠≡∞∫∑Ω` | Scientific, abstract |
| Box drawing | ` ─│┌┐└┘├┤┬┴┼` | Structural, circuit-like |
| Katakana | ` ·ヲァィゥェォャュ...` | Matrix rain |
| Greek | ` αβγδεζηθ...ω` | Classical, academic |
| Runes | ` ᚠᚢᚦᚱᚷᛁᛇᛒᛖᛚᛞᛟ` | Mystical, ancient |
| Alchemical | ` ☉☽♀♂♃♄♅♆♇` | Esoteric |
| Arrows | ` ←↑→↓↔↕↖↗↘↙` | Directional, kinetic |
| Music | ` ♪♫♬♩♭♮♯○●` | Musical |
| Project-specific | ` .·~=≈∞⚡☿✦★⊕◊◆▲▼●■` | Themed per project |

Custom palettes are built per project to match the content.

## Color strategies

| Strategy | How it maps hue | Good for |
|----------|----------------|----------|
| Angle-mapped | Position angle from center | Rainbow radial effects |
| Distance-mapped | Distance from center | Depth, tunnels |
| Frequency-mapped | Audio spectral centroid | Timbral shifting |
| Value-mapped | Brightness level | Heat maps, fire |
| Time-cycled | Slow rotation over time | Ambient, chill |
| Source-sampled | Original video pixel colors | Video-to-ASCII |
| Palette-indexed | Discrete lookup table | Retro, flat graphic |
| Temperature | Warm-to-cool blend | Emotional tone |
| Complementary | Hue + opposite | Bold, dramatic |
| Triadic | Three equidistant hues | Psychedelic, vibrant |
| Analogous | Neighboring hues | Harmonious, subtle |
| Monochrome | Fixed hue, vary S/V | Noir, focused |

Plus 10 discrete RGB palettes (neon, pastel, cyberpunk, vaporwave, earth, ice, blood, forest, mono-green, mono-amber).

Full OKLAB/OKLCH color system: sRGB↔linear↔OKLAB conversion pipeline, perceptually uniform gradient interpolation, and color harmony generation (complementary, triadic, analogous, split-complementary, tetradic).
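The OKLAB matrices are omitted here, but the sRGB↔linear leg of that pipeline is worth showing, since any perceptual math must happen in linear light. A sketch using the standard IEC 61966-2-1 piecewise curves:

```python
import numpy as np

def srgb_to_linear(c):
    """Decode sRGB [0, 1] to linear-light (piecewise sRGB transfer curve)."""
    c = np.asarray(c, dtype=np.float64)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    """Encode linear-light [0, 1] back to sRGB."""
    c = np.asarray(c, dtype=np.float64)
    return np.where(c <= 0.0031308, c * 12.92, 1.055 * c ** (1 / 2.4) - 0.055)
```

Gradient interpolation and blending happen between the decode and the encode; blending directly in gamma-encoded sRGB darkens midtones.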

## Value field generators (21)

Value fields are the core visual building blocks. Each produces a 2D float array in [0, 1] mapping every grid cell to a brightness value.
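A minimal sketch of the shape these generators share, using the sine field as the example (parameter names are illustrative, not the skill's actual signature):

```python
import numpy as np

def sine_field(cols, rows, t, scale=0.15, speed=1.0):
    """Layered multi-sine interference; returns floats in [0, 1] per cell."""
    yy, xx = np.mgrid[0:rows, 0:cols].astype(np.float32)
    v = (np.sin(xx * scale + t * speed)
         + np.sin(yy * scale * 1.7 - t * speed * 0.6)
         + np.sin((xx + yy) * scale * 0.5 + t * speed * 0.3))
    return (v / 3.0 + 1.0) / 2.0   # [-3, 3] -> [0, 1]

field = sine_field(192, 56, t=1.25)   # one "md" grid's worth of brightness
```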

### Trigonometric (12)

| Field | Description |
|-------|-------------|
| Sine field | Layered multi-sine interference, general-purpose background |
| Smooth noise | Multi-octave sine approximation of Perlin noise |
| Rings | Concentric rings, bass-driven count and wobble |
| Spiral | Logarithmic spiral arms, configurable arm count/tightness |
| Tunnel | Infinite depth perspective (inverse distance) |
| Vortex | Twisting radial pattern, distance modulates angle |
| Interference | N overlapping sine waves creating moiré |
| Aurora | Horizontal flowing bands |
| Ripple | Concentric waves from configurable source points |
| Plasma | Sum of sines at multiple orientations/speeds |
| Diamond | Diamond/checkerboard pattern |
| Noise/static | Random per-cell per-frame flicker |

### Noise-based (4)

| Field | Description |
|-------|-------------|
| Value noise | Smooth organic noise, no axis-alignment artifacts |
| fBM | Fractal Brownian Motion — octaved noise for clouds, terrain, smoke |
| Domain warp | Inigo Quilez technique — fBM-driven coordinate distortion for flowing organic forms |
| Voronoi | Moving seed points with distance, edge, and cell-ID output modes |

### Simulation-based (4)

| Field | Description |
|-------|-------------|
| Reaction-diffusion | Gray-Scott with 7 presets: coral, spots, worms, labyrinths, mitosis, pulsating, chaos |
| Cellular automata | Game of Life + 4 rule variants with analog fade trails |
| Strange attractors | Clifford, De Jong, Bedhead — iterated point systems binned to density fields |
| Temporal noise | 3D noise that morphs in-place without directional drift |

### SDF-based

7 signed distance field primitives (circle, box, ring, line, triangle, star, heart) with smooth boolean combinators (union, intersection, subtraction, smooth union/subtraction) and infinite tiling. Render as solid fills or glowing outlines.
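A sketch of the SDF idea with one primitive and Inigo Quilez's polynomial smooth-min as the smooth union (the helper names are illustrative):

```python
import numpy as np

def sdf_circle(xx, yy, cx, cy, r):
    """Signed distance to a circle: negative inside, positive outside."""
    return np.hypot(xx - cx, yy - cy) - r

def smooth_union(d1, d2, k=0.2):
    """Polynomial smooth-min: blends two SDFs with a soft, k-wide seam."""
    h = np.clip(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0)
    return d2 + (d1 - d2) * h - k * h * (1.0 - h)

# Evaluate over a grid-cell coordinate field (normalized so cells are square)
yy, xx = np.mgrid[0:56, 0:192].astype(np.float32) / 56.0
d = smooth_union(sdf_circle(xx, yy, 1.2, 0.5, 0.3),
                 sdf_circle(xx, yy, 2.2, 0.5, 0.3))
fill = (d < 0).astype(np.float32)                 # solid fill
glow = np.clip(1.0 - np.abs(d) / 0.1, 0.0, 1.0)   # glowing outline
```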

## Hue field generators (9)

Determine per-cell color independent of brightness: fixed hue, angle-mapped rainbow, distance gradient, time-cycled rotation, audio spectral centroid, horizontal/vertical gradients, plasma variation, perceptually uniform OKLCH rainbow.

## Coordinate transforms (11)

UV-space transforms applied before effect evaluation: rotate, scale, skew, tile (with mirror seaming), polar, inverse-polar, twist (rotation increasing with distance), fisheye, wave displacement, Möbius conformal transformation. `make_tgrid()` wraps transformed coordinates into a grid object.
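Two of these transforms sketched in normalized UV space (function names are illustrative):

```python
import numpy as np

def polar(xx, yy, cx=0.5, cy=0.5):
    """Cartesian UV -> (angle in [0, 1), radius); feeds effects that want
    rings and spokes instead of rows and columns."""
    dx, dy = xx - cx, yy - cy
    ang = (np.arctan2(dy, dx) / (2 * np.pi)) % 1.0
    rad = np.hypot(dx, dy)
    return ang, rad

def twist(xx, yy, strength=3.0, cx=0.5, cy=0.5):
    """Twist: rotation that increases with distance from the center."""
    dx, dy = xx - cx, yy - cy
    r = np.hypot(dx, dy)
    a = np.arctan2(dy, dx) + strength * r
    return cx + r * np.cos(a), cy + r * np.sin(a)
```

An effect evaluated on twisted coordinates inherits the distortion for free; that is the point of transforming before evaluation.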

## Particle systems (9)

| Type | Behavior |
|------|----------|
| Explosion | Beat-triggered radial burst with gravity and life decay |
| Embers | Rising from bottom with horizontal drift |
| Dissolving cloud | Spreading outward with accelerating fade |
| Starfield | 3D projected, Z-depth stars approaching with streak trails |
| Orbit | Circular/elliptical paths around center |
| Gravity well | Attracted toward configurable point sources |
| Boid flocking | Separation/alignment/cohesion with spatial hash for O(n) neighbors |
| Flow-field | Steered by gradient of any value field |
| Trail particles | Fading lines between current and previous positions |

14 themed particle character sets (energy, spark, leaf, snow, rain, bubble, data, hex, binary, rune, zodiac, dot, dash).
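The explosion type can be sketched as a small vectorized system; the class shape and parameter names are illustrative, not the skill's actual API:

```python
import numpy as np

rng = np.random.default_rng(7)

class Explosion:
    """Beat-triggered radial burst: particles fly out, fall, and fade."""
    def __init__(self, n=200, cx=0.5, cy=0.5, speed=0.4):
        ang = rng.uniform(0, 2 * np.pi, n)
        mag = rng.uniform(0.2, 1.0, n) * speed
        self.pos = np.full((n, 2), (cx, cy), dtype=np.float32)
        self.vel = np.stack([np.cos(ang), np.sin(ang)], axis=1) * mag[:, None]
        self.life = np.ones(n, dtype=np.float32)

    def step(self, dt=1 / 24, gravity=0.3, decay=1.5):
        self.vel[:, 1] += gravity * dt     # gravity pulls +y (down-screen)
        self.pos += self.vel * dt
        self.life = np.maximum(self.life - decay * dt, 0.0)

burst = Explosion()
for _ in range(24):                        # one second at 24 fps
    burst.step()
```

Everything is array-at-once, which matters because per-particle Python loops would eat the frame budget.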

## Temporal coherence

10 easing functions (linear, quad, cubic, expo, elastic, bounce — in/out/in-out). Keyframe interpolation with eased transitions. Value field morphing (smooth crossfade between fields). Value field sequencing (cycle through fields with crossfade). Temporal noise (3D noise evolving smoothly in-place).
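Easing plus keyframes can be sketched in a few lines; the `keyframe` helper is illustrative, not the skill's actual interpolator:

```python
import numpy as np

def ease_in_out_cubic(p):
    """Eased progress: p in [0, 1] -> [0, 1], slow at both ends."""
    return 4 * p ** 3 if p < 0.5 else 1 - (-2 * p + 2) ** 3 / 2

def keyframe(t, keys, ease=ease_in_out_cubic):
    """keys: sorted [(time, value), ...]; eased interpolation between them."""
    times = [k[0] for k in keys]
    if t <= times[0]:
        return keys[0][1]
    if t >= times[-1]:
        return keys[-1][1]
    i = int(np.searchsorted(times, t, side="right")) - 1
    (t0, v0), (t1, v1) = keys[i], keys[i + 1]
    p = ease((t - t0) / (t1 - t0))
    return v0 + (v1 - v0) * p

# A zoom parameter that builds, then pulls back
zoom = [(0.0, 1.0), (2.0, 1.5), (5.0, 0.8)]
```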

## Shader pipeline

38 composable shaders, applied to the pixel canvas after character rendering. Configurable per section.

| Category | Shaders |
|----------|---------|
| Geometry | CRT barrel, pixelate, wave distort, displacement map, kaleidoscope, mirror (h/v/quad/diag) |
| Channel | Chromatic aberration (beat-reactive), channel shift, channel swap, RGB split radial |
| Color | Invert, posterize, threshold, solarize, hue rotate, saturation, color grade, color wobble, color ramp |
| Glow/Blur | Bloom, edge glow, soft focus, radial blur |
| Noise | Film grain (beat-reactive), static noise |
| Lines/Patterns | Scanlines, halftone |
| Tone | Vignette, contrast, gamma, levels, brightness |
| Glitch/Data | Glitch bands (beat-reactive), block glitch, pixel sort, data bend |

12 color tint presets: warm, cool, matrix green, amber, sepia, neon pink, ice, blood, forest, void, sunset, neutral.

7 mood presets for common shader combos:

| Mood | Shaders |
|------|---------|
| Retro terminal | CRT + scanlines + grain + amber/green tint |
| Clean modern | Light bloom + subtle vignette |
| Glitch art | Heavy chromatic + glitch bands + color wobble |
| Cinematic | Bloom + vignette + grain + color grade |
| Dreamy | Heavy bloom + soft focus + color wobble |
| Harsh/industrial | High contrast + grain + scanlines, no bloom |
| Psychedelic | Color wobble + chromatic + kaleidoscope mirror |

## Blend modes and composition

20 pixel blend modes for layering canvases: normal, add, subtract, multiply, screen, overlay, softlight, hardlight, difference, exclusion, colordodge, colorburn, linearlight, vividlight, pin_light, hard_mix, lighten, darken, grain_extract, grain_merge. Both sRGB and linear-light blending supported.
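A few of these modes sketched with their standard formulas; the `blend_canvas` wrapper is illustrative, not the skill's actual helper:

```python
import numpy as np

# Standard blend-mode formulas in float [0, 1] (a = base, b = layer)
BLEND = {
    "add":      lambda a, b: np.clip(a + b, 0, 1),
    "multiply": lambda a, b: a * b,
    "screen":   lambda a, b: 1 - (1 - a) * (1 - b),
    "overlay":  lambda a, b: np.where(a < 0.5, 2 * a * b,
                                      1 - 2 * (1 - a) * (1 - b)),
    "lighten":  lambda a, b: np.maximum(a, b),
    "darken":   lambda a, b: np.minimum(a, b),
}

def blend_canvas(base, layer, mode="screen", opacity=1.0):
    """Blend two uint8 canvases. Linear-light blending would wrap this
    step in an sRGB decode/encode pair."""
    a, b = base / 255.0, layer / 255.0
    out = BLEND[mode](a, b)
    return ((a + (out - a) * opacity) * 255).astype(np.uint8)
```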

**Feedback buffer.** Temporal recursion — each frame blends with a transformed version of the previous frame. 7 spatial transforms: zoom, shrink, rotate CW/CCW, shift up/down, mirror. Optional per-frame hue shift for rainbow trails. Configurable decay, blend mode, and opacity per scene.
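A minimal sketch of the mechanism with a single transform (center zoom) and a lighten blend; the class shape is illustrative:

```python
import numpy as np

class FeedbackBuffer:
    """Each frame blends with a zoomed, decayed copy of the previous
    output, producing trails that fall toward the center."""
    def __init__(self, decay=0.85):
        self.prev = None
        self.decay = decay

    def _zoom(self, f, factor=1.02):
        """Cheap center zoom: crop, then nearest-neighbor stretch back."""
        h, w = f.shape[:2]
        ch, cw = int(h / factor), int(w / factor)
        y0, x0 = (h - ch) // 2, (w - cw) // 2
        crop = f[y0:y0 + ch, x0:x0 + cw]
        yi = (np.arange(h) * ch // h).clip(0, ch - 1)
        xi = (np.arange(w) * cw // w).clip(0, cw - 1)
        return crop[yi][:, xi]

    def apply(self, frame):
        if self.prev is not None:
            trail = self._zoom(self.prev).astype(np.float32) * self.decay
            frame = np.maximum(frame.astype(np.float32), trail).astype(np.uint8)
        self.prev = frame
        return frame
```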

**Masking.** 16 mask types for spatial compositing: shape masks (circle, rect, ring, gradients), procedural masks (any value field as a mask, text stencils), animated masks (iris open/close, wipe, dissolve), boolean operations (union, intersection, subtraction, invert).

**Transitions.** Crossfade, directional wipe, radial wipe, dissolve, glitch cut.

## Scene design patterns

Compositional patterns for making scenes that look intentional rather than random.

**Layer hierarchy.** Background (dim atmosphere, dense grid), content (main visual, standard grid), accent (sparse highlights, coarse grid). Three distinct roles, not three competing layers.

**Directional parameter arcs.** The defining parameter of each scene ramps, accelerates, or builds over its duration. Progress-based formulas (linear, ease-out, step reveal) replace aimless `sin(t)` oscillation.

**Scene concepts.** Scenes built around visual metaphors (emergence, descent, collision, entropy) with motivated layer/palette/feedback choices. Not named after their effects.

**Compositional techniques.** Counter-rotating dual systems, wave collision, progressive fragmentation (voronoi cells multiplying over time), entropy (geometry consumed by reaction-diffusion), staggered layer entry (crescendo buildup).

## Hardware adaptation

Auto-detects CPU count, RAM, platform, and ffmpeg. Adapts worker count, resolution, and FPS.

| Profile | Resolution | FPS | When |
|---------|-----------|-----|------|
| `draft` | 960x540 | 12 | Check timing/layout |
| `preview` | 1280x720 | 15 | Review effects |
| `production` | 1920x1080 | 24 | Final output |
| `max` | 3840x2160 | 30 | Ultra-high |
| `auto` | Detected | 24 | Adapts to hardware + duration |

`auto` estimates render time and downgrades if it would take over an hour. Low-memory systems drop to 720p automatically.

### Render times (1080p 24fps, ~180ms/frame/worker)

| Duration | 4 workers | 8 workers | 16 workers |
|----------|-----------|-----------|------------|
| 30s | ~3 min | ~2 min | ~1 min |
| 2 min | ~13 min | ~7 min | ~4 min |
| 5 min | ~33 min | ~17 min | ~9 min |
| 10 min | ~65 min | ~33 min | ~17 min |

720p roughly halves these. 4K roughly quadruples them.

## Known pitfalls

**Brightness.** ASCII characters are small bright dots on black. Most frame pixels are background. Linear `* N` multipliers clip highlights and wash out. Use `tonemap()` with per-scene gamma instead. Default gamma 0.75, solarize scenes 0.55, posterize 0.50.

**Render bottleneck.** The per-cell Python loop compositing font bitmaps runs at ~100-150ms/frame. Unavoidable without Cython/C. Everything else must be vectorized numpy. Python for-loops over rows/cols in effect functions will tank performance.

**ffmpeg deadlock.** Never `stderr=subprocess.PIPE` on long-running encodes. The buffer fills at ~64KB and the process hangs. Redirect stderr to a file.
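A sketch of the safe pattern; the helper names are illustrative, the flags are standard ffmpeg rawvideo options:

```python
import subprocess

def encoder_cmd(out_path, w, h, fps):
    """ffmpeg command reading raw RGB frames from stdin."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{w}x{h}", "-r", str(fps), "-i", "-",
        "-c:v", "libx264", "-pix_fmt", "yuv420p", out_path,
    ]

def start_encoder(out_path, w, h, fps, log_path):
    """stderr goes to a file, never subprocess.PIPE: an unread pipe
    fills at ~64KB and the encode deadlocks."""
    log = open(log_path, "wb")
    proc = subprocess.Popen(encoder_cmd(out_path, w, h, fps),
                            stdin=subprocess.PIPE, stderr=log)
    return proc, log
```

Usage: write `frame.tobytes()` to `proc.stdin` per frame, then close stdin, `proc.wait()`, and close the log file.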

**Font cell height.** Pillow's `textbbox()` returns the wrong height on macOS. Use `font.getmetrics()` and compute `ascent + descent` instead.
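A sketch of the fix, assuming a Pillow `FreeTypeFont`; the `cell_size` helper name is illustrative:

```python
def cell_size(font, ref_char="M"):
    """Cell dimensions for a monospace font. Height comes from
    getmetrics(); textbbox() under-reports it on macOS."""
    ascent, descent = font.getmetrics()
    left, _, right, _ = font.getbbox(ref_char)
    return right - left, ascent + descent   # (cell_w, cell_h)
```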

**Font compatibility.** Not all Unicode characters render in all fonts. Palettes are validated at init and blank glyphs are silently removed.
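A sketch of that validation pass using Pillow; `validate_palette` is an illustrative name, not the skill's actual helper:

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def validate_palette(palette, font, size=32):
    """Keep only characters that actually produce ink in this font;
    blank glyphs (tofu rendered as nothing) are dropped silently."""
    kept = []
    for ch in palette:
        img = Image.new("L", (size, size), 0)
        ImageDraw.Draw(img).text((2, 2), ch, fill=255, font=font)
        if ch == " " or np.asarray(img).max() > 0:   # space is legal ink-free
            kept.append(ch)
    return "".join(kept)

# e.g. palette = validate_palette(" .:-=+#@", ImageFont.load_default())
```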

## Requirements

◆ Python 3.10+
◆ NumPy, Pillow, SciPy (audio modes)
◆ ffmpeg on PATH
◆ A monospace font (Menlo, Courier, Monaco; auto-detected)
◆ Optional: OpenCV, ElevenLabs API key (TTS mode)

## File structure

```
├── SKILL.md               # Modes, workflow, creative direction
├── README.md              # This file
└── references/
    ├── architecture.md    # Grid system, fonts, palettes, color, _render_vf()
    ├── effects.md         # Value fields, hue fields, backgrounds, particles
    ├── shaders.md         # 38 shaders, ShaderChain, tint presets, transitions
    ├── composition.md     # Blend modes, multi-grid, tonemap, FeedbackBuffer
    ├── scenes.md          # Scene protocol, SCENES table, render_clip(), examples
    ├── design-patterns.md # Layer hierarchy, directional arcs, scene concepts
    ├── inputs.md          # Audio analysis, video sampling, text, TTS
    ├── optimization.md    # Hardware detection, vectorized patterns, parallelism
    └── troubleshooting.md # Broadcasting traps, blend pitfalls, diagnostics
```

## Projects built with this

✦ 85-second highlight reel. 15 scenes (14×5s + 15s crescendo finale), randomized order, directional parameter arcs, layer hierarchy composition. Showcases the full effect vocabulary: fBM, voronoi fragmentation, reaction-diffusion, cellular automata, dual counter-rotating spirals, wave collision, domain warping, tunnel descent, kaleidoscope symmetry, boid flocking, fire simulation, glitch corruption, and a 7-layer crescendo buildup.

✦ Audio-reactive music visualizer. 3.5 min, 8 sections with distinct effects, beat-triggered particles and glitch, cycling palettes.

✦ TTS narrated testimonial video. 23 quotes, per-quote ElevenLabs voices, background music at 15% wide stereo, per-clip re-rendering for iterative editing.

skills/creative/ascii-video/SKILL.md

---
name: ascii-video
description: "Production pipeline for ASCII art video — any format. Converts video/audio/images/generative input into colored ASCII character video output (MP4, GIF, image sequence). Covers: video-to-ASCII conversion, audio-reactive music visualizers, generative ASCII art animations, hybrid video+audio reactive, text/lyrics overlays, real-time terminal rendering. Use when users request: ASCII video, text art video, terminal-style video, character art animation, retro text visualization, audio visualizer in ASCII, converting video to ASCII art, matrix-style effects, or any animated ASCII output."
---

# ASCII Video Production Pipeline

## Creative Standard

This is visual art. ASCII characters are the medium; cinema is the standard.

**Before writing a single line of code**, articulate the creative concept. What is the mood? What visual story does this tell? What makes THIS project different from every other ASCII video? The user's prompt is a starting point — interpret it with creative ambition, not literal transcription.

**First-render excellence is non-negotiable.** The output must be visually striking without requiring revision rounds. If something looks generic, flat, or like "AI-generated ASCII art," it is wrong — rethink the creative concept before shipping.

**Go beyond the reference vocabulary.** The effect catalogs, shader presets, and palette libraries in the references are a starting vocabulary. For every project, combine, modify, and invent new patterns. The catalog is a palette of paints — you write the painting.

**Be proactively creative.** Extend the skill's vocabulary when the project calls for it. If the references don't have what the vision demands, build it. Include at least one visual moment the user didn't ask for but will appreciate — a transition, an effect, a color choice that elevates the whole piece.

**Cohesive aesthetic over technical correctness.** All scenes in a video must feel connected by a unifying visual language — shared color temperature, related character palettes, consistent motion vocabulary. A technically correct video where every scene uses a random different effect is an aesthetic failure.

**Dense, layered, considered.** Every frame should reward viewing. Never flat black backgrounds. Always multi-grid composition. Always per-scene variation. Always intentional color.

## Modes

| Mode | Input | Output | Reference |
|------|-------|--------|-----------|
| **Video-to-ASCII** | Video file | ASCII recreation of source footage | `references/inputs.md` § Video Sampling |
| **Audio-reactive** | Audio file | Generative visuals driven by audio features | `references/inputs.md` § Audio Analysis |
| **Generative** | None (or seed params) | Procedural ASCII animation | `references/effects.md` |
| **Hybrid** | Video + audio | ASCII video with audio-reactive overlays | Both input refs |
| **Lyrics/text** | Audio + text/SRT | Timed text with visual effects | `references/inputs.md` § Text/Lyrics |
| **TTS narration** | Text quotes + TTS API | Narrated testimonial/quote video with typed text | `references/inputs.md` § TTS Integration |
## Stack

Single self-contained Python script per project. No GPU required.

| Layer | Tool | Purpose |
|-------|------|---------|
| Core | Python 3.10+, NumPy | Math, array ops, vectorized effects |
| Signal | SciPy | FFT, peak detection (audio modes) |
| Imaging | Pillow (PIL) | Font rasterization, frame decoding, image I/O |
| Video I/O | ffmpeg (CLI) | Decode input, encode output, mux audio |
| Parallel | concurrent.futures | N workers for batch/clip rendering |
| TTS | ElevenLabs API (optional) | Generate narration clips |
| Optional | OpenCV | Video frame sampling, edge detection |
## Pipeline Architecture

Every mode follows the same 6-stage pipeline:

```
INPUT → ANALYZE → SCENE_FN → TONEMAP → SHADE → ENCODE
```

1. **INPUT** — Load/decode source material (video frames, audio samples, images, or nothing)
2. **ANALYZE** — Extract per-frame features (audio bands, video luminance/edges, motion vectors)
3. **SCENE_FN** — Scene function renders to pixel canvas (`uint8 H,W,3`). Composes multiple character grids via `_render_vf()` + pixel blend modes. See `references/composition.md`
4. **TONEMAP** — Percentile-based adaptive brightness normalization. See `references/composition.md` § Adaptive Tonemap
5. **SHADE** — Post-processing via `ShaderChain` + `FeedbackBuffer`. See `references/shaders.md`
6. **ENCODE** — Pipe raw RGB frames to ffmpeg for H.264/GIF encoding
## Creative Direction

### Aesthetic Dimensions

| Dimension | Options | Reference |
|-----------|---------|-----------|
| **Character palette** | Density ramps, block elements, symbols, scripts (katakana, Greek, runes, braille), project-specific | `architecture.md` § Palettes |
| **Color strategy** | HSV, OKLAB/OKLCH, discrete RGB palettes, auto-generated harmony, monochrome, temperature | `architecture.md` § Color System |
| **Background texture** | Sine fields, fBM noise, domain warp, voronoi, reaction-diffusion, cellular automata, video | `effects.md` |
| **Primary effects** | Rings, spirals, tunnel, vortex, waves, interference, aurora, fire, SDFs, strange attractors | `effects.md` |
| **Particles** | Sparks, snow, rain, bubbles, runes, orbits, flocking boids, flow-field followers, trails | `effects.md` § Particles |
| **Shader mood** | Retro CRT, clean modern, glitch art, cinematic, dreamy, industrial, psychedelic | `shaders.md` |
| **Grid density** | xs(8px) through xxl(40px), mixed per layer | `architecture.md` § Grid System |
| **Coordinate space** | Cartesian, polar, tiled, rotated, fisheye, Möbius, domain-warped | `effects.md` § Transforms |
| **Feedback** | Zoom tunnel, rainbow trails, ghostly echo, rotating mandala, color evolution | `composition.md` § Feedback |
| **Masking** | Circle, ring, gradient, text stencil, animated iris/wipe/dissolve | `composition.md` § Masking |
| **Transitions** | Crossfade, wipe, dissolve, glitch cut, iris, mask-based reveal | `shaders.md` § Transitions |
### Per-Section Variation

Never use the same config for the entire video. For each section/scene:

- **Different background effect** (or compose 2-3)
- **Different character palette** (match the mood)
- **Different color strategy** (or at minimum a different hue)
- **Vary shader intensity** (more bloom during peaks, more grain during quiet)
- **Different particle types** if particles are active

### Project-Specific Invention

For every project, invent at least one of:

- A custom character palette matching the theme
- A custom background effect (combine/modify existing building blocks)
- A custom color palette (discrete RGB set matching the brand/mood)
- A custom particle character set
- A novel scene transition or visual moment

Don't just pick from the catalog. The catalog is vocabulary — you write the poem.
## Workflow

### Step 1: Creative Vision

Before any code, articulate the creative concept:

- **Mood/atmosphere**: What should the viewer feel? Energetic, meditative, chaotic, elegant, ominous?
- **Visual story**: What happens over the duration? Build tension? Transform? Dissolve?
- **Color world**: Warm/cool? Monochrome? Neon? Earth tones? What's the dominant hue?
- **Character texture**: Dense data? Sparse stars? Organic dots? Geometric blocks?
- **What makes THIS different**: What's the one thing that makes this project unique?
- **Emotional arc**: How do scenes progress? Open with energy, build to climax, resolve?

Map the user's prompt to aesthetic choices. A "chill lo-fi visualizer" demands different everything from a "glitch cyberpunk data stream."

### Step 2: Technical Design

- **Mode** — which of the 6 modes above
- **Resolution** — landscape 1920x1080 (default), portrait 1080x1920, square 1080x1080 @ 24fps
- **Hardware detection** — auto-detect cores/RAM, set quality profile. See `references/optimization.md`
- **Sections** — map timestamps to scene functions, each with its own effect/palette/color/shader config
- **Output format** — MP4 (default), GIF (640x360 @ 15fps), PNG sequence
### Step 3: Build the Script

Single Python file. Components (with references):

1. **Hardware detection + quality profile** — `references/optimization.md`
2. **Input loader** — mode-dependent; `references/inputs.md`
3. **Feature analyzer** — audio FFT, video luminance, or synthetic
4. **Grid + renderer** — multi-density grids with bitmap cache; `references/architecture.md`
5. **Character palettes** — multiple per project; `references/architecture.md` § Palettes
6. **Color system** — HSV + discrete RGB + harmony generation; `references/architecture.md` § Color
7. **Scene functions** — each returns `canvas (uint8 H,W,3)`; `references/scenes.md`
8. **Tonemap** — adaptive brightness normalization; `references/composition.md`
9. **Shader pipeline** — `ShaderChain` + `FeedbackBuffer`; `references/shaders.md`
10. **Scene table + dispatcher** — time → scene function + config; `references/scenes.md`
11. **Parallel encoder** — N-worker clip rendering with ffmpeg pipes
12. **Main** — orchestrate full pipeline
### Step 4: Quality Verification

- **Test frames first**: render single frames at key timestamps before full render
- **Brightness check**: `canvas.mean() > 8` for all ASCII content. If dark, lower gamma
- **Visual coherence**: do all scenes feel like they belong to the same video?
- **Creative vision check**: does the output match the concept from Step 1? If it looks generic, go back
## Critical Implementation Notes

### Brightness — Use `tonemap()`, Not Linear Multipliers

This is the #1 visual issue. ASCII on black is inherently dark. **Never use `canvas * N` multipliers** — they clip highlights. Use adaptive tonemap:

```python
import numpy as np

def tonemap(canvas, gamma=0.75):
    f = canvas.astype(np.float32)
    # Subsample for speed; percentiles are stable under 4x decimation
    lo, hi = np.percentile(f[::4, ::4], [1, 99.5])
    if hi - lo < 10:
        hi = lo + 10  # guard against near-flat frames
    f = np.clip((f - lo) / (hi - lo), 0, 1) ** gamma
    return (f * 255).astype(np.uint8)
```

Pipeline: `scene_fn() → tonemap() → FeedbackBuffer → ShaderChain → ffmpeg`

Per-scene gamma: default 0.75, solarize 0.55, posterize 0.50, bright scenes 0.85. Use `screen` blend (not `overlay`) for dark layers.

### Font Cell Height

On macOS, Pillow's `textbbox()` returns the wrong height. Use `font.getmetrics()`: `cell_height = ascent + descent`. See `references/troubleshooting.md`.

### ffmpeg Pipe Deadlock

Never use `stderr=subprocess.PIPE` with long-running ffmpeg — the buffer fills at 64KB and deadlocks. Redirect to a file. See `references/troubleshooting.md`.

### Font Compatibility

Not all Unicode characters render in all fonts. Validate palettes at init — render each char and check for blank output. See `references/troubleshooting.md`.

### Per-Clip Architecture

For segmented videos (quotes, scenes, chapters), render each as a separate clip file for parallel rendering and selective re-rendering. See `references/scenes.md`.
## Performance Targets

| Component | Budget |
|-----------|--------|
| Feature extraction | 1-5ms |
| Effect function | 2-15ms |
| Character render | 80-150ms (bottleneck) |
| Shader pipeline | 5-25ms |
| **Total** | ~100-200ms/frame |
## References

| File | Contents |
|------|----------|
| `references/architecture.md` | Grid system, resolution presets, font selection, character palettes (20+), color system (HSV + OKLAB + discrete RGB + harmony generation), `_render_vf()` helper, GridLayer class |
| `references/composition.md` | Pixel blend modes (20 modes), `blend_canvas()`, multi-grid composition, adaptive `tonemap()`, `FeedbackBuffer`, `PixelBlendStack`, masking/stencil system |
| `references/effects.md` | Effect building blocks: value field generators, hue fields, noise/fBM/domain warp, voronoi, reaction-diffusion, cellular automata, SDFs, strange attractors, particle systems, coordinate transforms, temporal coherence |
| `references/shaders.md` | `ShaderChain`, `_apply_shader_step()` dispatch, 38 shader catalog, audio-reactive scaling, transitions, tint presets, output format encoding, terminal rendering |
| `references/scenes.md` | Scene protocol, `Renderer` class, `SCENES` table, `render_clip()`, beat-synced cutting, parallel rendering, design patterns (layer hierarchy, directional arcs, visual metaphors, compositional techniques), complete scene examples at every complexity level, scene design checklist |
| `references/inputs.md` | Audio analysis (FFT, bands, beats), video sampling, image conversion, text/lyrics, TTS integration (ElevenLabs, voice assignment, audio mixing) |
| `references/optimization.md` | Hardware detection, quality profiles, vectorized patterns, parallel rendering, memory management, performance budgets |
| `references/troubleshooting.md` | NumPy broadcasting traps, blend mode pitfalls, multiprocessing/pickling, brightness diagnostics, ffmpeg issues, font problems, common mistakes |

---

## Creative Divergence (use only when user requests experimental/creative/unique output)

If the user asks for creative, experimental, surprising, or unconventional output, select the strategy that best fits and reason through its steps BEFORE generating code.

- **Forced Connections** — when the user wants cross-domain inspiration ("make it look organic," "industrial aesthetic")
- **Conceptual Blending** — when the user names two things to combine ("ocean meets music," "space + calligraphy")
- **Oblique Strategies** — when the user is maximally open ("surprise me," "something I've never seen")

### Forced Connections

1. Pick a domain unrelated to the visual goal (weather systems, microbiology, architecture, fluid dynamics, textile weaving)
2. List its core visual/structural elements (erosion → gradual reveal; mitosis → splitting duplication; weaving → interlocking patterns)
3. Map those elements onto ASCII characters and animation patterns
4. Synthesize — what does "erosion" or "crystallization" look like in a character grid?

### Conceptual Blending

1. Name two distinct visual/conceptual spaces (e.g., ocean waves + sheet music)
2. Map correspondences (crests = high notes, troughs = rests, foam = staccato)
3. Blend selectively — keep the most interesting mappings, discard forced ones
4. Develop emergent properties that exist only in the blend

### Oblique Strategies

1. Draw one: "Honor thy error as a hidden intention" / "Use an old idea" / "What would your closest friend do?" / "Emphasize the flaws" / "Turn it upside down" / "Only a part, not the whole" / "Reverse"
2. Interpret the directive against the current ASCII animation challenge
3. Apply the lateral insight to the visual design before writing code

802
skills/creative/ascii-video/references/architecture.md
Normal file

# Architecture Reference

> **See also:** composition.md · effects.md · scenes.md · shaders.md · inputs.md · optimization.md · troubleshooting.md

## Grid System

### Resolution Presets

```python
RESOLUTION_PRESETS = {
    "landscape": (1920, 1080),    # 16:9 — YouTube, default
    "portrait": (1080, 1920),     # 9:16 — TikTok, Reels, Stories
    "square": (1080, 1080),       # 1:1 — Instagram feed
    "ultrawide": (2560, 1080),    # 21:9 — cinematic
    "landscape4k": (3840, 2160),  # 16:9 — 4K
    "portrait4k": (2160, 3840),   # 9:16 — 4K portrait
}

def get_resolution(preset="landscape", custom=None):
    """Returns (VW, VH) tuple."""
    if custom:
        return custom
    return RESOLUTION_PRESETS.get(preset, RESOLUTION_PRESETS["landscape"])
```

### Multi-Density Grids

Pre-initialize multiple grid sizes. Switch per section for visual variety. Grid dimensions auto-compute from resolution:

**Landscape (1920x1080):**

| Key | Font Size | Grid (cols x rows) | Use |
|-----|-----------|-------------------|-----|
| xs | 8 | 400x108 | Ultra-dense data fields |
| sm | 10 | 320x83 | Dense detail, rain, starfields |
| md | 16 | 192x56 | Default balanced, transitions |
| lg | 20 | 160x45 | Quote/lyric text (readable at 1080p) |
| xl | 24 | 137x37 | Short quotes, large titles |
| xxl | 40 | 80x22 | Giant text, minimal |

**Portrait (1080x1920):**

| Key | Font Size | Grid (cols x rows) | Use |
|-----|-----------|-------------------|-----|
| xs | 8 | 225x192 | Ultra-dense, tall data columns |
| sm | 10 | 180x148 | Dense detail, vertical rain |
| md | 16 | 112x100 | Default balanced |
| lg | 20 | 90x80 | Readable text (~30 chars/line centered) |
| xl | 24 | 75x66 | Short quotes, stacked |
| xxl | 40 | 45x39 | Giant text, minimal |

**Square (1080x1080):**

| Key | Font Size | Grid (cols x rows) | Use |
|-----|-----------|-------------------|-----|
| sm | 10 | 180x83 | Dense detail |
| md | 16 | 112x56 | Default balanced |
| lg | 20 | 90x45 | Readable text |

**Key differences in portrait mode:**
- Fewer columns (90 at `lg` vs 160) — lines must be shorter or wrap
- Many more rows (80 at `lg` vs 45) — vertical stacking is natural
- Aspect ratio correction flips: `asp = cw / ch` still works but the visual emphasis is vertical
- Radial effects appear as tall ellipses unless corrected
- Vertical effects (rain, embers, fire columns) are naturally enhanced
- Horizontal effects (spectrum bars, waveforms) need rotation or compression
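The radial-distortion bullet has a simple fix: compute the distance field in pixel space rather than cell space. This sketch is mine, not part of the GridLayer API -- `circular_dist` and its normalization are assumptions:

```python
import numpy as np

def circular_dist(rows, cols, cw, ch, vw, vh):
    """Distance-from-center field measured in pixels, so radial effects
    stay circular in portrait as well as landscape.
    cw, ch: cell size in px; vw, vh: frame size in px.
    Returns float32 (rows, cols); 1.0 at the nearest frame edge."""
    x = (np.arange(cols, dtype=np.float32)[None, :] + 0.5) * cw - vw / 2.0
    y = (np.arange(rows, dtype=np.float32)[:, None] + 0.5) * ch - vh / 2.0
    return np.sqrt(x**2 + y**2) / (min(vw, vh) / 2.0)
```

Because both axes are in the same unit (pixels), the same function works unchanged for landscape, portrait, and square frames.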

**Grid sizing for text in portrait**: Use `lg` (20px) for 2-3 word lines. Max comfortable line length is ~25-30 chars. For longer quotes, break aggressively into many short lines stacked vertically — portrait has vertical space to spare. `xl` (24px) works for single words or very short phrases.

Grid dimensions: `cols = VW // cell_width`, `rows = VH // cell_height`.
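For example, the `lg` row of the landscape table falls out of this formula directly (the 12x24 px cell is an assumed measurement for a 20 pt monospace font; real values come from the font metrics at init):

```python
# "lg" preset at 1080p landscape, assuming a measured 12x24 px cell
cw, ch = 12, 24
vw, vh = 1920, 1080
cols, rows = vw // cw, vh // ch  # 160, 45 — matches the lg row above
```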

### Font Selection

Don't hardcode a single font. Choose fonts to match the project's mood. Monospace fonts are required for grid alignment but vary widely in personality:

| Font | Personality | Platform |
|------|-------------|----------|
| Menlo | Clean, neutral, Apple-native | macOS |
| Monaco | Retro terminal, compact | macOS |
| Courier New | Classic typewriter, wide | Cross-platform |
| SF Mono | Modern, tight spacing | macOS |
| Consolas | Windows native, clean | Windows |
| JetBrains Mono | Developer, ligature-ready | Install |
| Fira Code | Geometric, modern | Install |
| IBM Plex Mono | Corporate, authoritative | Install |
| Source Code Pro | Adobe, balanced | Install |

**Font detection at init**: probe available fonts and fall back gracefully:

```python
import os
import platform

def find_font(preferences):
    """Try fonts in order, return first that exists."""
    for name, path in preferences:
        if os.path.exists(path):
            return path
    raise FileNotFoundError(f"No monospace font found. Tried: {[p for _, p in preferences]}")

FONT_PREFS_MACOS = [
    ("Menlo", "/System/Library/Fonts/Menlo.ttc"),
    ("Monaco", "/System/Library/Fonts/Monaco.ttf"),
    ("SF Mono", "/System/Library/Fonts/SFNSMono.ttf"),
    ("Courier", "/System/Library/Fonts/Courier.ttc"),
]
FONT_PREFS_LINUX = [
    ("DejaVu Sans Mono", "/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf"),
    ("Liberation Mono", "/usr/share/fonts/truetype/liberation/LiberationMono-Regular.ttf"),
    ("Noto Sans Mono", "/usr/share/fonts/truetype/noto/NotoSansMono-Regular.ttf"),
    ("Ubuntu Mono", "/usr/share/fonts/truetype/ubuntu/UbuntuMono-R.ttf"),
]
FONT_PREFS_WINDOWS = [
    ("Consolas", r"C:\Windows\Fonts\consola.ttf"),
    ("Courier New", r"C:\Windows\Fonts\cour.ttf"),
    ("Lucida Console", r"C:\Windows\Fonts\lucon.ttf"),
    ("Cascadia Code", os.path.expandvars(r"%LOCALAPPDATA%\Microsoft\Windows\Fonts\CascadiaCode.ttf")),
    ("Cascadia Mono", os.path.expandvars(r"%LOCALAPPDATA%\Microsoft\Windows\Fonts\CascadiaMono.ttf")),
]

def _get_font_prefs():
    s = platform.system()
    if s == "Darwin":
        return FONT_PREFS_MACOS
    elif s == "Windows":
        return FONT_PREFS_WINDOWS
    return FONT_PREFS_LINUX

FONT_PREFS = _get_font_prefs()
```

**Multi-font rendering**: use different fonts for different layers (e.g., monospace for background, a bolder variant for overlay text). Each GridLayer owns its own font:

```python
grid_bg = GridLayer(find_font(FONT_PREFS), 16)    # background
grid_text = GridLayer(find_font(BOLD_PREFS), 20)  # readable text
```

### Collecting All Characters

Before initializing grids, gather all characters that need bitmap pre-rasterization:

```python
all_chars = set()
for pal in [PAL_DEFAULT, PAL_DENSE, PAL_BLOCKS, PAL_RUNE, PAL_KATA,
            PAL_GREEK, PAL_MATH, PAL_DOTS, PAL_BRAILLE, PAL_STARS,
            PAL_HALFFILL, PAL_HATCH, PAL_BINARY, PAL_MUSIC, PAL_BOX,
            PAL_CIRCUIT, PAL_ARROWS, PAL_HERMES]:  # ... all palettes used in project
    all_chars.update(pal)
# Add any overlay text characters
all_chars.update("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 .,-:;!?/|")
all_chars.discard(" ")  # space is never rendered
```

### GridLayer Initialization

Each grid pre-computes coordinate arrays for vectorized effect math. The grid automatically adapts to any resolution (landscape, portrait, square):

```python
class GridLayer:
    def __init__(self, font_path, font_size, vw=None, vh=None):
        """Initialize grid for any resolution.
        vw, vh: video width/height in pixels. Defaults to global VW, VH."""
        vw = vw or VW; vh = vh or VH
        self.vw = vw; self.vh = vh

        self.font = ImageFont.truetype(font_path, font_size)
        asc, desc = self.font.getmetrics()
        bbox = self.font.getbbox("M")
        self.cw = bbox[2] - bbox[0]  # character cell width
        self.ch = asc + desc         # CRITICAL: not textbbox height

        self.cols = vw // self.cw
        self.rows = vh // self.ch
        self.ox = (vw - self.cols * self.cw) // 2  # centering
        self.oy = (vh - self.rows * self.ch) // 2

        # Aspect ratio metadata
        self.aspect = vw / vh  # >1 = landscape, <1 = portrait, 1 = square
        self.is_portrait = vw < vh
        self.is_landscape = vw > vh

        # Index arrays
        self.rr = np.arange(self.rows, dtype=np.float32)[:, None]
        self.cc = np.arange(self.cols, dtype=np.float32)[None, :]

        # Polar coordinates (aspect-corrected)
        cx, cy = self.cols / 2.0, self.rows / 2.0
        asp = self.cw / self.ch
        self.dx = self.cc - cx
        self.dy = (self.rr - cy) * asp
        self.dist = np.sqrt(self.dx**2 + self.dy**2)
        self.angle = np.arctan2(self.dy, self.dx)

        # Normalized (0-1 range) -- for distance falloff
        self.dx_n = (self.cc - cx) / max(self.cols, 1)
        self.dy_n = (self.rr - cy) / max(self.rows, 1) * asp
        self.dist_n = np.sqrt(self.dx_n**2 + self.dy_n**2)

        # Pre-rasterize all characters to float32 bitmaps
        self.bm = {}
        for c in all_chars:
            img = Image.new("L", (self.cw, self.ch), 0)
            ImageDraw.Draw(img).text((0, 0), c, fill=255, font=self.font)
            self.bm[c] = np.array(img, dtype=np.float32) / 255.0
```

### Character Render Loop

The bottleneck. Composites pre-rasterized bitmaps onto pixel canvas:

```python
def render(self, chars, colors, canvas=None):
    if canvas is None:
        canvas = np.zeros((self.vh, self.vw, 3), dtype=np.uint8)
    for row in range(self.rows):
        y = self.oy + row * self.ch
        if y + self.ch > self.vh: break
        for col in range(self.cols):
            c = chars[row, col]
            if c == " ": continue
            x = self.ox + col * self.cw
            if x + self.cw > self.vw: break
            a = self.bm[c]  # float32 bitmap
            canvas[y:y+self.ch, x:x+self.cw] = np.maximum(
                canvas[y:y+self.ch, x:x+self.cw],
                (a[:, :, None] * colors[row, col]).astype(np.uint8))
    return canvas
```

Use `np.maximum` for max (lighten) blending: brighter chars win over dimmer ones and never darken the canvas.
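The difference from plain assignment in two pixels -- overwriting would darken the overlap where a dim glyph lands on a bright one, while the per-channel maximum keeps whichever value is brighter:

```python
import numpy as np

base = np.array([[200, 10]], dtype=np.uint8)  # existing canvas pixels
top = np.array([[50, 120]], dtype=np.uint8)   # newly drawn glyph pixels
out = np.maximum(base, top)                   # [[200, 120]]
```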

### Multi-Layer Rendering

Render multiple grids onto the same canvas for depth:

```python
canvas = np.zeros((VH, VW, 3), dtype=np.uint8)
canvas = grid_lg.render(bg_chars, bg_colors, canvas)          # background layer
canvas = grid_md.render(main_chars, main_colors, canvas)      # main layer
canvas = grid_sm.render(detail_chars, detail_colors, canvas)  # detail overlay
```

---

## Character Palettes

### Design Principles

Character palettes are the primary visual texture of ASCII video. They control not just brightness mapping but the entire visual feel. Design palettes intentionally:

- **Visual weight**: characters sorted by the amount of ink/pixels they fill. Space is always index 0.
- **Coherence**: characters within a palette should belong to the same visual family.
- **Density curve**: the brightness-to-character mapping is nonlinear. Dense palettes (many chars) give smoother gradients; sparse palettes (5-8 chars) give posterized/graphic looks.
- **Rendering compatibility**: every character in the palette must exist in the font. Test at init and remove missing glyphs.

### Palette Library

Organized by visual family. Mix and match per project -- don't default to PAL_DEFAULT for everything.

#### Density / Brightness Palettes
```python
PAL_DEFAULT = " .`'-:;!><=+*^~?/|(){}[]#&$@%"  # classic ASCII art
PAL_DENSE = " .:;+=xX$#@\u2588"  # simple 11-level ramp
PAL_MINIMAL = " .:-=+#@"  # 8-level, graphic
PAL_BINARY = " \u2588"  # 2-level, extreme contrast
PAL_GRADIENT = " \u2591\u2592\u2593\u2588"  # 4-level block gradient
```

#### Unicode Block Elements
```python
PAL_BLOCKS = " \u2591\u2592\u2593\u2588\u2584\u2580\u2590\u258c"  # standard blocks
PAL_BLOCKS_EXT = " \u2596\u2597\u2598\u2599\u259a\u259b\u259c\u259d\u259e\u259f\u2591\u2592\u2593\u2588"  # quadrant blocks (more detail)
PAL_SHADE = " \u2591\u2592\u2593\u2588\u2587\u2586\u2585\u2584\u2583\u2582\u2581"  # vertical fill progression
```

#### Symbolic / Thematic
```python
PAL_MATH = " \u00b7\u2218\u2219\u2022\u00b0\u00b1\u2213\u00d7\u00f7\u2248\u2260\u2261\u2264\u2265\u221e\u222b\u2211\u220f\u221a\u2207\u2202\u2206\u03a9"  # math symbols
PAL_BOX = " \u2500\u2502\u250c\u2510\u2514\u2518\u251c\u2524\u252c\u2534\u253c\u2550\u2551\u2554\u2557\u255a\u255d\u2560\u2563\u2566\u2569\u256c"  # box drawing
PAL_CIRCUIT = " .\u00b7\u2500\u2502\u250c\u2510\u2514\u2518\u253c\u25cb\u25cf\u25a1\u25a0\u2206\u2207\u2261"  # circuit board
PAL_RUNE = " .\u16a0\u16a2\u16a6\u16b1\u16b7\u16c1\u16c7\u16d2\u16d6\u16da\u16de\u16df"  # elder futhark runes
PAL_ALCHEMIC = " \u2609\u263d\u2640\u2642\u2643\u2644\u2645\u2646\u2647\u2648\u2649\u264a\u264b"  # planetary/alchemical symbols
PAL_ZODIAC = " \u2648\u2649\u264a\u264b\u264c\u264d\u264e\u264f\u2650\u2651\u2652\u2653"  # zodiac
PAL_ARROWS = " \u2190\u2191\u2192\u2193\u2194\u2195\u2196\u2197\u2198\u2199\u21a9\u21aa\u21bb\u27a1"  # directional arrows
PAL_MUSIC = " \u266a\u266b\u266c\u2669\u266d\u266e\u266f\u25cb\u25cf"  # musical notation
```

#### Script / Writing System
```python
PAL_KATA = " \u00b7\uff66\uff67\uff68\uff69\uff6a\uff6b\uff6c\uff6d\uff6e\uff6f\uff70\uff71\uff72\uff73\uff74\uff75\uff76\uff77"  # katakana halfwidth (matrix rain)
PAL_GREEK = " \u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03c0\u03c1\u03c3\u03c4\u03c6\u03c8\u03c9"  # Greek lowercase
PAL_CYRILLIC = " \u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448"  # Cyrillic lowercase
PAL_ARABIC = " \u0627\u0628\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637"  # Arabic letters (isolated forms)
```

#### Dot / Point Progressions
```python
PAL_DOTS = " ⋅∘∙●◉◎◆✦★"  # dot size progression
PAL_BRAILLE = " ⠁⠂⠃⠄⠅⠆⠇⠈⠉⠊⠋⠌⠍⠎⠏⠐⠑⠒⠓⠔⠕⠖⠗⠘⠙⠚⠛⠜⠝⠞⠟⠿"  # braille patterns
PAL_STARS = " ·✧✦✩✨★✶✳✸"  # star progression
PAL_HALFFILL = " ◔◑◕◐◒◓◖◗◙"  # directional half-fill progression
PAL_HATCH = " ▣▤▥▦▧▨▩"  # crosshatch density ramp
```

#### Project-Specific (examples -- invent new ones per project)
```python
PAL_HERMES = " .\u00b7~=\u2248\u221e\u26a1\u263f\u2726\u2605\u2295\u25ca\u25c6\u25b2\u25bc\u25cf\u25a0"  # mythology/tech blend
PAL_OCEAN = " ~\u2248\u2248\u2248\u223c\u2307\u2248\u224b\u224c\u2248"  # water/wave characters
PAL_ORGANIC = " .\u00b0\u2218\u2022\u25e6\u25c9\u2742\u273f\u2741\u2743"  # growing/botanical
PAL_MACHINE = " _\u2500\u2502\u250c\u2510\u253c\u2261\u25a0\u2588\u2593\u2592\u2591"  # mechanical/industrial
```

### Creating Custom Palettes

When designing for a project, build palettes from the content's theme:

1. **Choose a visual family** (dots, blocks, symbols, script)
2. **Sort by visual weight** -- render each char at target font size, count lit pixels, sort ascending
3. **Test at target grid size** -- some chars collapse to blobs at small sizes
4. **Validate in font** -- remove chars the font can't render:

```python
def validate_palette(pal, font):
    """Remove characters the font can't render."""
    valid = []
    for c in pal:
        if c == " ":
            valid.append(c)
            continue
        img = Image.new("L", (20, 20), 0)
        ImageDraw.Draw(img).text((0, 0), c, fill=255, font=font)
        if np.array(img).max() > 0:  # char actually rendered something
            valid.append(c)
    return "".join(valid)
```
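Step 2 of the checklist can be sketched the same way -- `sort_by_weight` is a hypothetical helper, mirroring the rasterization that `validate_palette` uses:

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def sort_by_weight(chars, font, size=20):
    """Order palette characters by lit-pixel count, ascending.
    Space stays at index 0 regardless of count."""
    def weight(c):
        img = Image.new("L", (size, size), 0)
        ImageDraw.Draw(img).text((0, 0), c, fill=255, font=font)
        return int((np.array(img) > 0).sum())
    body = sorted((c for c in chars if c != " "), key=weight)
    return " " + "".join(body)
```

Running it on an unsorted candidate string yields a palette whose index already encodes brightness, ready for `val2char`-style mapping.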

### Mapping Values to Characters

```python
def val2char(v, mask, pal=PAL_DEFAULT):
    """Map float array (0-1) to character array using palette."""
    n = len(pal)
    idx = np.clip((v * n).astype(int), 0, n - 1)
    out = np.full(v.shape, " ", dtype="U1")
    for i, ch in enumerate(pal):
        out[mask & (idx == i)] = ch
    return out
```

**Nonlinear mapping** for different visual curves:

```python
def val2char_gamma(v, mask, pal, gamma=1.0):
    """Gamma-corrected palette mapping. gamma<1 = brighter, gamma>1 = darker."""
    v_adj = np.power(np.clip(v, 0, 1), gamma)
    return val2char(v_adj, mask, pal)

def val2char_step(v, mask, pal, thresholds):
    """Custom threshold mapping. thresholds = list of float breakpoints."""
    out = np.full(v.shape, pal[0], dtype="U1")
    for i, thr in enumerate(thresholds):
        out[mask & (v > thr)] = pal[min(i + 1, len(pal) - 1)]
    return out
```

---

## Color System

### HSV->RGB (Vectorized)

All color computation in HSV for intuitive control, converted at render time:

```python
def hsv2rgb(h, s, v):
    """Vectorized HSV->RGB. h,s,v are numpy arrays. Returns (R,G,B) uint8 arrays."""
    h = h % 1.0
    c = v * s; x = c * (1 - np.abs((h*6) % 2 - 1)); m = v - c
    # 6-sector assignment
    zero = np.zeros_like(c)
    sector = (h * 6).astype(int) % 6
    conds = [sector == k for k in range(6)]
    r = np.select(conds, [c, x, zero, zero, x, c])
    g = np.select(conds, [x, c, c, x, zero, zero])
    b = np.select(conds, [zero, zero, x, c, c, x])
    return (np.clip((r+m)*255, 0, 255).astype(np.uint8),
            np.clip((g+m)*255, 0, 255).astype(np.uint8),
            np.clip((b+m)*255, 0, 255).astype(np.uint8))
```

### Color Mapping Strategies

Don't default to a single strategy. Choose based on the visual intent:

| Strategy | Hue source | Effect | Good for |
|----------|------------|--------|----------|
| Angle-mapped | `g.angle / (2*pi)` | Rainbow around center | Radial effects, kaleidoscopes |
| Distance-mapped | `g.dist_n * 0.3` | Gradient from center | Tunnels, depth effects |
| Frequency-mapped | `f["cent"] * 0.2` | Timbral color shifting | Audio-reactive |
| Value-mapped | `val * 0.15` | Brightness-dependent hue | Fire, heat maps |
| Time-cycled | `t * rate` | Slow color rotation | Ambient, chill |
| Source-sampled | Video frame pixel colors | Preserve original color | Video-to-ASCII |
| Palette-indexed | Discrete color lookup | Flat graphic style | Retro, pixel art |
| Temperature | Blend between warm/cool | Emotional tone | Mood-driven scenes |
| Complementary | `hue` and `hue + 0.5` | High contrast | Bold, dramatic |
| Triadic | `hue`, `hue + 0.33`, `hue + 0.66` | Vibrant, balanced | Psychedelic |
| Analogous | `hue +/- 0.08` | Harmonious, subtle | Elegant, cohesive |
| Monochrome | Fixed hue, vary S and V | Restrained, focused | Noir, minimal |
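These strategies compose. A minimal sketch combining the angle-mapped and time-cycled rows (grid size and rates are illustrative; the resulting hue field feeds whatever hue-to-RGB conversion the project uses):

```python
import numpy as np

rows, cols, t = 45, 160, 2.0  # grid dims and current time in seconds
rr = np.arange(rows, dtype=np.float32)[:, None]
cc = np.arange(cols, dtype=np.float32)[None, :]
angle = np.arctan2(rr - rows / 2, cc - cols / 2)
# Angle-mapped base hue plus a slow time cycle, wrapped into [0, 1):
hue = (angle / (2 * np.pi) + t * 0.02) % 1.0
```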

### Color Palettes (Discrete RGB)

For non-HSV workflows -- direct RGB color sets for graphic/retro looks:

```python
# Named color palettes -- use for flat/graphic styles or per-character coloring
COLORS_NEON = [(255,0,102), (0,255,153), (102,0,255), (255,255,0), (0,204,255)]
COLORS_PASTEL = [(255,179,186), (255,223,186), (255,255,186), (186,255,201), (186,225,255)]
COLORS_MONO_GREEN = [(0,40,0), (0,80,0), (0,140,0), (0,200,0), (0,255,0)]
COLORS_MONO_AMBER = [(40,20,0), (80,50,0), (140,90,0), (200,140,0), (255,191,0)]
COLORS_CYBERPUNK = [(255,0,60), (0,255,200), (180,0,255), (255,200,0)]
COLORS_VAPORWAVE = [(255,113,206), (1,205,254), (185,103,255), (5,255,161)]
COLORS_EARTH = [(86,58,26), (139,90,43), (189,154,91), (222,193,136), (245,230,193)]
COLORS_ICE = [(200,230,255), (150,200,240), (100,170,230), (60,130,210), (30,80,180)]
COLORS_BLOOD = [(80,0,0), (140,10,10), (200,20,20), (255,50,30), (255,100,80)]
COLORS_FOREST = [(10,30,10), (20,60,15), (30,100,20), (50,150,30), (80,200,50)]

def rgb_palette_map(val, mask, palette):
    """Map float array (0-1) to RGB colors from a discrete palette."""
    n = len(palette)
    idx = np.clip((val * n).astype(int), 0, n - 1)
    R = np.zeros(val.shape, dtype=np.uint8)
    G = np.zeros(val.shape, dtype=np.uint8)
    B = np.zeros(val.shape, dtype=np.uint8)
    for i, (r, g, b) in enumerate(palette):
        m = mask & (idx == i)
        R[m] = r; G[m] = g; B[m] = b
    return R, G, B
```

### OKLAB Color Space (Perceptually Uniform)

HSV hue is perceptually non-uniform: green occupies far more visual range than blue. OKLAB / OKLCH provide perceptually even color steps — hue increments of 0.1 look equally different regardless of starting hue. Use OKLAB for:

- Gradient interpolation (no unwanted intermediate hues)
- Color harmony generation (perceptually balanced palettes)
- Smooth color transitions over time

```python
# --- sRGB <-> Linear sRGB ---

def srgb_to_linear(c):
    """Convert sRGB [0,1] to linear light. c: float32 array."""
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    """Convert linear light to sRGB [0,1]."""
    return np.where(c <= 0.0031308, c * 12.92, 1.055 * np.power(np.maximum(c, 0), 1/2.4) - 0.055)

# --- Linear sRGB <-> OKLAB ---

def linear_rgb_to_oklab(r, g, b):
    """Linear sRGB to OKLAB. r,g,b: float32 arrays [0,1].
    Returns (L, a, b) where L=[0,1], a,b=[-0.4, 0.4] approx."""
    l_ = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m_ = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s_ = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    l_c = np.cbrt(l_); m_c = np.cbrt(m_); s_c = np.cbrt(s_)
    L = 0.2104542553 * l_c + 0.7936177850 * m_c - 0.0040720468 * s_c
    a = 1.9779984951 * l_c - 2.4285922050 * m_c + 0.4505937099 * s_c
    b_ = 0.0259040371 * l_c + 0.7827717662 * m_c - 0.8086757660 * s_c
    return L, a, b_

def oklab_to_linear_rgb(L, a, b):
    """OKLAB to linear sRGB. Returns (r, g, b) float32 arrays [0,1]."""
    l_ = L + 0.3963377774 * a + 0.2158037573 * b
    m_ = L - 0.1055613458 * a - 0.0638541728 * b
    s_ = L - 0.0894841775 * a - 1.2914855480 * b
    l_c = l_ ** 3; m_c = m_ ** 3; s_c = s_ ** 3
    r = +4.0767416621 * l_c - 3.3077115913 * m_c + 0.2309699292 * s_c
    g = -1.2684380046 * l_c + 2.6097574011 * m_c - 0.3413193965 * s_c
    b_ = -0.0041960863 * l_c - 0.7034186147 * m_c + 1.7076147010 * s_c
    return np.clip(r, 0, 1), np.clip(g, 0, 1), np.clip(b_, 0, 1)

# --- Convenience: sRGB uint8 <-> OKLAB ---

def rgb_to_oklab(R, G, B):
    """sRGB uint8 arrays to OKLAB."""
    r = srgb_to_linear(R.astype(np.float32) / 255.0)
    g = srgb_to_linear(G.astype(np.float32) / 255.0)
    b = srgb_to_linear(B.astype(np.float32) / 255.0)
    return linear_rgb_to_oklab(r, g, b)

def oklab_to_rgb(L, a, b):
    """OKLAB to sRGB uint8 arrays."""
    r, g, b_ = oklab_to_linear_rgb(L, a, b)
    R = np.clip(linear_to_srgb(r) * 255, 0, 255).astype(np.uint8)
    G = np.clip(linear_to_srgb(g) * 255, 0, 255).astype(np.uint8)
    B = np.clip(linear_to_srgb(b_) * 255, 0, 255).astype(np.uint8)
    return R, G, B

# --- OKLCH (cylindrical form of OKLAB) ---

def oklab_to_oklch(L, a, b):
    """OKLAB to OKLCH. Returns (L, C, H) where H is in [0, 1] (normalized)."""
    C = np.sqrt(a**2 + b**2)
    H = (np.arctan2(b, a) / (2 * np.pi)) % 1.0
    return L, C, H

def oklch_to_oklab(L, C, H):
    """OKLCH to OKLAB. H in [0, 1]."""
    angle = H * 2 * np.pi
    a = C * np.cos(angle)
    b = C * np.sin(angle)
    return L, a, b
```

### Gradient Interpolation (OKLAB vs HSV)

Interpolating colors through OKLAB avoids the hue detours that HSV produces:

```python
def lerp_oklab(color_a, color_b, t_array):
    """Interpolate between two sRGB colors through OKLAB.
    color_a, color_b: (R, G, B) tuples 0-255
    t_array: float32 array [0,1] — interpolation parameter per pixel.
    Returns (R, G, B) uint8 arrays."""
    La, aa, ba = rgb_to_oklab(
        np.full_like(t_array, color_a[0], dtype=np.uint8),
        np.full_like(t_array, color_a[1], dtype=np.uint8),
        np.full_like(t_array, color_a[2], dtype=np.uint8))
    Lb, ab, bb = rgb_to_oklab(
        np.full_like(t_array, color_b[0], dtype=np.uint8),
        np.full_like(t_array, color_b[1], dtype=np.uint8),
        np.full_like(t_array, color_b[2], dtype=np.uint8))
    L = La + (Lb - La) * t_array
    a = aa + (ab - aa) * t_array
    b = ba + (bb - ba) * t_array
    return oklab_to_rgb(L, a, b)

def lerp_oklch(color_a, color_b, t_array, short_path=True):
    """Interpolate through OKLCH (preserves chroma, smooth hue path).
    short_path: take the shorter arc around the hue wheel."""
    La, aa, ba = rgb_to_oklab(
        np.full_like(t_array, color_a[0], dtype=np.uint8),
        np.full_like(t_array, color_a[1], dtype=np.uint8),
        np.full_like(t_array, color_a[2], dtype=np.uint8))
    Lb, ab, bb = rgb_to_oklab(
        np.full_like(t_array, color_b[0], dtype=np.uint8),
        np.full_like(t_array, color_b[1], dtype=np.uint8),
        np.full_like(t_array, color_b[2], dtype=np.uint8))
    L1, C1, H1 = oklab_to_oklch(La, aa, ba)
    L2, C2, H2 = oklab_to_oklch(Lb, ab, bb)
    # Shortest hue path
    if short_path:
        dh = H2 - H1
        dh = np.where(dh > 0.5, dh - 1.0, np.where(dh < -0.5, dh + 1.0, dh))
        H = (H1 + dh * t_array) % 1.0
    else:
        H = H1 + (H2 - H1) * t_array
    L = L1 + (L2 - L1) * t_array
    C = C1 + (C2 - C1) * t_array
    Lout, aout, bout = oklch_to_oklab(L, C, H)
    return oklab_to_rgb(Lout, aout, bout)
```
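The wrap-around step is the subtle part of hue interpolation, so it is worth isolating. A self-contained sketch of the same shortest-arc logic (`short_hue_delta` is my naming, not part of the library above):

```python
import numpy as np

def short_hue_delta(h1, h2):
    """Shortest signed arc from h1 to h2 around the hue wheel.
    Both hues normalized to [0, 1). Result is in [-0.5, 0.5]."""
    dh = np.asarray(h2, dtype=np.float64) - np.asarray(h1, dtype=np.float64)
    return np.where(dh > 0.5, dh - 1.0, np.where(dh < -0.5, dh + 1.0, dh))
```

Crossing the 1.0/0.0 seam (e.g. red at 0.95 to red-orange at 0.05) yields +0.1 instead of a -0.9 sweep through the entire wheel.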

### Color Harmony Generation

Auto-generate harmonious palettes from a seed color:

```python
def harmony_complementary(seed_rgb):
    """Two colors: seed + opposite hue."""
    L, a, b = rgb_to_oklab(np.array([seed_rgb[0]]), np.array([seed_rgb[1]]), np.array([seed_rgb[2]]))
    _, C, H = oklab_to_oklch(L, a, b)
    return [seed_rgb, _oklch_to_srgb_tuple(L[0], C[0], (H[0] + 0.5) % 1.0)]

def harmony_triadic(seed_rgb):
    """Three colors: seed + two at 120-degree offsets."""
    L, a, b = rgb_to_oklab(np.array([seed_rgb[0]]), np.array([seed_rgb[1]]), np.array([seed_rgb[2]]))
    _, C, H = oklab_to_oklch(L, a, b)
    return [seed_rgb,
            _oklch_to_srgb_tuple(L[0], C[0], (H[0] + 0.333) % 1.0),
            _oklch_to_srgb_tuple(L[0], C[0], (H[0] + 0.667) % 1.0)]

def harmony_analogous(seed_rgb, spread=0.08, n=5):
    """N colors spread evenly around seed hue."""
    L, a, b = rgb_to_oklab(np.array([seed_rgb[0]]), np.array([seed_rgb[1]]), np.array([seed_rgb[2]]))
    _, C, H = oklab_to_oklch(L, a, b)
    offsets = np.linspace(-spread * (n-1)/2, spread * (n-1)/2, n)
    return [_oklch_to_srgb_tuple(L[0], C[0], (H[0] + off) % 1.0) for off in offsets]

def harmony_split_complementary(seed_rgb, split=0.08):
    """Three colors: seed + two flanking the complement."""
    L, a, b = rgb_to_oklab(np.array([seed_rgb[0]]), np.array([seed_rgb[1]]), np.array([seed_rgb[2]]))
    _, C, H = oklab_to_oklch(L, a, b)
    comp = (H[0] + 0.5) % 1.0
    return [seed_rgb,
            _oklch_to_srgb_tuple(L[0], C[0], (comp - split) % 1.0),
            _oklch_to_srgb_tuple(L[0], C[0], (comp + split) % 1.0)]

def harmony_tetradic(seed_rgb):
    """Four colors: two complementary pairs at 90-degree offset."""
    L, a, b = rgb_to_oklab(np.array([seed_rgb[0]]), np.array([seed_rgb[1]]), np.array([seed_rgb[2]]))
    _, C, H = oklab_to_oklch(L, a, b)
    return [seed_rgb,
            _oklch_to_srgb_tuple(L[0], C[0], (H[0] + 0.25) % 1.0),
            _oklch_to_srgb_tuple(L[0], C[0], (H[0] + 0.5) % 1.0),
            _oklch_to_srgb_tuple(L[0], C[0], (H[0] + 0.75) % 1.0)]

def _oklch_to_srgb_tuple(L, C, H):
    """Helper: single OKLCH -> sRGB (R,G,B) int tuple."""
    La = np.array([L]); Ca = np.array([C]); Ha = np.array([H])
    Lo, ao, bo = oklch_to_oklab(La, Ca, Ha)
    R, G, B = oklab_to_rgb(Lo, ao, bo)
    return (int(R[0]), int(G[0]), int(B[0]))
```

### OKLAB Hue Fields

Drop-in replacements for `hf_*` generators that produce perceptually uniform hue variation:

```python
def hf_oklch_angle(offset=0.0, chroma=0.12, lightness=0.7):
    """OKLCH hue mapped to angle from center. Perceptually uniform rainbow.
    Returns (R, G, B) uint8 color array instead of a float hue.
    NOTE: Use with _render_vf_rgb() variant, not standard _render_vf()."""
    def fn(g, f, t, S):
        H = (g.angle / (2 * np.pi) + offset + t * 0.05) % 1.0
        L = np.full_like(H, lightness)
        C = np.full_like(H, chroma)
        Lo, ao, bo = oklch_to_oklab(L, C, H)
        R, G, B = oklab_to_rgb(Lo, ao, bo)
        return mkc(R, G, B, g.rows, g.cols)
    return fn
```

### Compositing Helpers

```python
def mkc(R, G, B, rows, cols):
    """Pack 3 uint8 arrays into (rows, cols, 3) color array."""
    o = np.zeros((rows, cols, 3), dtype=np.uint8)
    o[:,:,0] = R; o[:,:,1] = G; o[:,:,2] = B
    return o

def layer_over(base_ch, base_co, top_ch, top_co):
    """Composite top layer onto base. Non-space chars overwrite."""
    m = top_ch != " "
    base_ch[m] = top_ch[m]; base_co[m] = top_co[m]
    return base_ch, base_co

def layer_blend(base_co, top_co, alpha):
    """Alpha-blend top color layer onto base. alpha is float array (0-1) or scalar."""
    if isinstance(alpha, (int, float)):
        alpha = np.full(base_co.shape[:2], alpha, dtype=np.float32)
    a = alpha[:,:,None]
    return np.clip(base_co * (1 - a) + top_co * a, 0, 255).astype(np.uint8)

def stamp(ch, co, text, row, col, color=(255,255,255)):
    """Write text string at position."""
    for i, c in enumerate(text):
        cc = col + i
        if 0 <= row < ch.shape[0] and 0 <= cc < ch.shape[1]:
            ch[row, cc] = c; co[row, cc] = color
```
|
||||
|
||||
---

## Section System

Map time ranges to effect functions + shader configs + grid sizes:

```python
SECTIONS = [
    (0.0, "void"), (3.94, "starfield"), (21.0, "matrix"),
    (46.0, "drop"), (130.0, "glitch"), (187.0, "outro"),
]

FX_DISPATCH = {"void": fx_void, "starfield": fx_starfield, ...}
SECTION_FX = {"void": {"vignette": 0.3, "bloom": 170}, ...}
SECTION_GRID = {"void": "md", "starfield": "sm", "drop": "lg", ...}
SECTION_MIRROR = {"drop": "h", "bass_rings": "quad"}

def get_section(t):
    sec = SECTIONS[0][1]
    for ts, name in SECTIONS:
        if t >= ts:
            sec = name
    return sec
```
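
Because `SECTIONS` is sorted by start time, the linear scan in `get_section()` can also be done with a binary search. A minimal sketch using the stdlib `bisect` module (the `get_section_fast` name is ours; the section data is repeated here so the snippet runs standalone):

```python
import bisect

SECTIONS = [
    (0.0, "void"), (3.94, "starfield"), (21.0, "matrix"),
    (46.0, "drop"), (130.0, "glitch"), (187.0, "outro"),
]
_STARTS = [ts for ts, _ in SECTIONS]  # precomputed, sorted start times

def get_section_fast(t):
    """O(log n) section lookup; equivalent to the linear scan for t >= 0."""
    i = bisect.bisect_right(_STARTS, t) - 1
    return SECTIONS[max(i, 0)][1]
```

At six sections the difference is negligible; it starts to matter if the lookup runs per grid cell rather than per frame.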

---

## Parallel Encoding

Split frames across N workers. Each pipes raw RGB to its own ffmpeg subprocess:

```python
def render_batch(batch_id, frame_start, frame_end, features, seg_path):
    r = Renderer()
    cmd = ["ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "rgb24",
           "-s", f"{VW}x{VH}", "-r", str(FPS), "-i", "pipe:0",
           "-c:v", "libx264", "-preset", "fast", "-crf", "18",
           "-pix_fmt", "yuv420p", seg_path]

    # CRITICAL: stderr to a file, not a pipe (an unread stderr pipe fills up
    # and deadlocks ffmpeg)
    stderr_fh = open(os.path.join(workdir, f"err_{batch_id:02d}.log"), "w")
    pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.DEVNULL, stderr=stderr_fh)

    for fi in range(frame_start, frame_end):
        t = fi / FPS
        sec = get_section(t)
        f = {k: float(features[k][fi]) for k in features}
        ch, co = FX_DISPATCH[sec](r, f, t)
        canvas = r.render(ch, co)
        canvas = apply_mirror(canvas, sec, f)
        canvas = apply_shaders(canvas, sec, f, t)
        pipe.stdin.write(canvas.tobytes())

    pipe.stdin.close()
    pipe.wait()
    stderr_fh.close()
```

Concatenate segments + mux audio:

```python
# Write concat file
with open(concat_path, "w") as cf:
    for seg in segments:
        cf.write(f"file '{seg}'\n")

subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", concat_path,
                "-i", audio_path, "-c:v", "copy", "-c:a", "aac", "-b:a", "192k",
                "-shortest", output_path])
```
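
Splitting the total frame count into per-worker ranges is a small arithmetic helper; a sketch (the `frame_ranges` name is ours, not part of the renderer):

```python
def frame_ranges(total_frames, n_workers):
    """Split [0, total_frames) into n_workers contiguous (start, end) ranges.
    Early workers absorb the remainder, so sizes differ by at most one frame."""
    base, rem = divmod(total_frames, n_workers)
    ranges, start = [], 0
    for i in range(n_workers):
        end = start + base + (1 if i < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

Each `(start, end)` pair becomes the `frame_start` / `frame_end` arguments of one `render_batch()` call.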

## Effect Function Contract

### v2 Protocol (Current)

Every scene function has the signature `(r, f, t, S) -> canvas_uint8`, where `r` is the Renderer, `f` the features dict, `t` the time in seconds, and `S` a persistent state dict.

```python
def fx_example(r, f, t, S):
    """Scene function returns a full pixel canvas (uint8 H,W,3).
    Scenes have full control over multi-grid rendering and pixel-level composition.
    """
    # Render multiple layers at different grid densities
    canvas_a = _render_vf(r, "md", vf_plasma, hf_angle(0.0), PAL_DENSE, f, t, S)
    canvas_b = _render_vf(r, "sm", vf_vortex, hf_time_cycle(0.1), PAL_RUNE, f, t, S)

    # Pixel-level blend
    result = blend_canvas(canvas_a, canvas_b, "screen", 0.8)
    return result
```

See `references/scenes.md` for the full scene protocol, the Renderer class, `_render_vf()` helper, and complete scene examples.

See `references/composition.md` for blend modes, tone mapping, feedback buffers, and multi-grid composition.

### v1 Protocol (Legacy)

Simple scenes that use a single grid can still return `(chars, colors)` and let the caller handle rendering, but the v2 canvas protocol is preferred for all new code.

```python
def fx_simple(r, f, t, S):
    g = r.get_grid("md")
    val = np.sin(g.dist * 0.1 - t * 3) * f.get("bass", 0.3) * 2
    val = np.clip(val, 0, 1); mask = val > 0.03
    ch = val2char(val, mask, PAL_DEFAULT)
    R, G, B = hsv2rgb(np.full_like(val, 0.6), np.full_like(val, 0.7), val)
    co = mkc(R, G, B, g.rows, g.cols)
    return g.render(ch, co)  # returns a canvas directly
```

### Persistent State

Effects that need state across frames (particles, rain columns) use the `S` dict parameter (which is `r.S`: the same object, passed explicitly for clarity):

```python
def fx_with_state(r, f, t, S):
    if "particles" not in S:
        S["particles"] = initialize_particles()
    update_particles(S["particles"])
    # ...
```

State persists across frames within a single scene/clip. Each worker process (and each scene) gets its own independent state.

### Helper Functions

```python
def hsv2rgb_scalar(h, s, v):
    """Single-value HSV to RGB. Returns an (R, G, B) tuple of ints 0-255."""
    h = h % 1.0
    c = v * s; x = c * (1 - abs((h * 6) % 2 - 1)); m = v - c
    if h * 6 < 1: r, g, b = c, x, 0
    elif h * 6 < 2: r, g, b = x, c, 0
    elif h * 6 < 3: r, g, b = 0, c, x
    elif h * 6 < 4: r, g, b = 0, x, c
    elif h * 6 < 5: r, g, b = x, 0, c
    else: r, g, b = c, 0, x
    return (int((r + m) * 255), int((g + m) * 255), int((b + m) * 255))

def log(msg):
    """Print a timestamped log message."""
    # assumes `import time` at module top
    print(time.strftime("[%H:%M:%S] ") + msg, flush=True)
```
892
skills/creative/ascii-video/references/composition.md
Normal file
@@ -0,0 +1,892 @@
# Composition & Brightness Reference

The composable system is the core of visual complexity. It operates at three levels: pixel-level blend modes, multi-grid composition, and adaptive brightness management. This document covers all three, plus the masking/stencil system for spatial control.

> **See also:** architecture.md · effects.md · scenes.md · shaders.md · troubleshooting.md

## Pixel-Level Blend Modes

### The `blend_canvas()` Function

All blending operates on full pixel canvases (`uint8 H,W,3`). Internally it converts to float32 [0,1] for precision, blends, lerps by opacity, and converts back.

```python
def blend_canvas(base, top, mode="normal", opacity=1.0):
    af = base.astype(np.float32) / 255.0
    bf = top.astype(np.float32) / 255.0
    fn = BLEND_MODES.get(mode, BLEND_MODES["normal"])
    result = fn(af, bf)
    if opacity < 1.0:
        result = af * (1 - opacity) + result * opacity
    return np.clip(result * 255, 0, 255).astype(np.uint8)
```

### 20 Blend Modes

```python
BLEND_MODES = {
    # Basic arithmetic
    "normal": lambda a, b: b,
    "add": lambda a, b: np.clip(a + b, 0, 1),
    "subtract": lambda a, b: np.clip(a - b, 0, 1),
    "multiply": lambda a, b: a * b,
    "screen": lambda a, b: 1 - (1 - a) * (1 - b),

    # Contrast
    "overlay": lambda a, b: np.where(a < 0.5, 2*a*b, 1 - 2*(1-a)*(1-b)),
    "softlight": lambda a, b: (1 - 2*b)*a*a + 2*b*a,
    "hardlight": lambda a, b: np.where(b < 0.5, 2*a*b, 1 - 2*(1-a)*(1-b)),

    # Difference
    "difference": lambda a, b: np.abs(a - b),
    "exclusion": lambda a, b: a + b - 2*a*b,

    # Dodge / burn
    "colordodge": lambda a, b: np.clip(a / (1 - b + 1e-6), 0, 1),
    "colorburn": lambda a, b: np.clip(1 - (1 - a) / (b + 1e-6), 0, 1),

    # Light
    "linearlight": lambda a, b: np.clip(a + 2*b - 1, 0, 1),
    "vividlight": lambda a, b: np.where(b < 0.5,
        np.clip(1 - (1-a)/(2*b + 1e-6), 0, 1),
        np.clip(a / (2*(1-b) + 1e-6), 0, 1)),
    "pin_light": lambda a, b: np.where(b < 0.5,
        np.minimum(a, 2*b), np.maximum(a, 2*b - 1)),
    "hard_mix": lambda a, b: np.where(a + b >= 1.0, 1.0, 0.0),

    # Compare
    "lighten": lambda a, b: np.maximum(a, b),
    "darken": lambda a, b: np.minimum(a, b),

    # Grain
    "grain_extract": lambda a, b: np.clip(a - b + 0.5, 0, 1),
    "grain_merge": lambda a, b: np.clip(a + b - 0.5, 0, 1),
}
```

### Blend Mode Selection Guide

**Modes that brighten** (safe for dark inputs):
- `screen` — always brightens. Two 50% gray layers screen to 75%. The go-to safe blend.
- `add` — simple addition, clips at white. Good for sparkles, glows, particle overlays.
- `colordodge` — extreme brightening at overlap zones. Can blow out. Use low opacity (0.3-0.5).
- `linearlight` — aggressive brightening. Similar to add but with offset.

**Modes that darken** (avoid with dark inputs):
- `multiply` — darkens everything. Only use when both layers are already bright.
- `overlay` — darkens when base < 0.5, brightens when base > 0.5. Crushes dark inputs: `2 * 0.12 * 0.12 = 0.03`. Use `screen` instead for dark material.
- `colorburn` — extreme darkening at overlap zones.

**Modes that create contrast**:
- `softlight` — gentle contrast. Good for subtle texture overlay.
- `hardlight` — strong contrast. Like overlay but keyed on the top layer.
- `vividlight` — very aggressive contrast. Use sparingly.

**Modes that create color effects**:
- `difference` — XOR-like patterns. Two identical layers difference to black; offset layers create wild colors. Great for psychedelic looks.
- `exclusion` — softer version of difference. Creates complementary color patterns.
- `hard_mix` — posterizes to pure black/white/saturated color at intersections.

**Modes for texture blending**:
- `grain_extract` / `grain_merge` — extract a texture from one layer, apply it to another.

### Multi-Layer Chaining

```python
# Pattern: render layers -> blend sequentially
canvas_a = _render_vf(r, "md", vf_plasma, hf_angle(0.0), PAL_DENSE, f, t, S)
canvas_b = _render_vf(r, "sm", vf_vortex, hf_time_cycle(0.1), PAL_RUNE, f, t, S)
canvas_c = _render_vf(r, "lg", vf_rings, hf_distance(), PAL_BLOCKS, f, t, S)

result = blend_canvas(canvas_a, canvas_b, "screen", 0.8)
result = blend_canvas(result, canvas_c, "difference", 0.6)
```

Order matters: `screen(A, B)` is commutative, but `difference(screen(A, B), C)` differs from `difference(A, screen(B, C))`.
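
A quick numeric check of that claim, using scalar stand-ins for the float [0,1] canvases:

```python
screen = lambda a, b: 1 - (1 - a) * (1 - b)
difference = lambda a, b: abs(a - b)

a, b, c = 0.2, 0.5, 0.9
# screen is commutative: the two orderings are identical
print(screen(a, b) == screen(b, a))            # True
# but the chaining order changes the final result
print(round(difference(screen(a, b), c), 2))   # 0.3
print(round(difference(a, screen(b, c)), 2))   # 0.75
```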

### Linear-Light Blend Modes

Standard `blend_canvas()` operates in sRGB space — the raw byte values. This is fine for most uses, but sRGB is perceptually non-linear: blending in sRGB darkens midtones and shifts hues slightly. For physically accurate blending (matching how light actually combines), convert to linear light first.

Uses `srgb_to_linear()` / `linear_to_srgb()` from `architecture.md` § OKLAB Color System.
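
For reference, the standard sRGB transfer functions look like this; a sketch using the usual IEC 61966-2-1 constants (the canonical definitions for this renderer live in `architecture.md`):

```python
import numpy as np

def srgb_to_linear(x):
    """Decode sRGB [0,1] to linear light (piecewise curve: linear toe + 2.4 power)."""
    return np.where(x <= 0.04045, x / 12.92, ((x + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(x):
    """Encode linear light [0,1] back to sRGB (inverse of srgb_to_linear)."""
    return np.where(x <= 0.0031308, x * 12.92, 1.055 * x ** (1 / 2.4) - 0.055)
```

The round trip is lossless to float precision, and midtones move a lot: 0.5 in sRGB is only about 0.21 in linear light, which is why sRGB-space blends darken midtones.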

```python
def blend_canvas_linear(base, top, mode="normal", opacity=1.0):
    """Blend in linear light space for physically accurate results.

    Identical API to blend_canvas(), but converts sRGB → linear before
    blending and linear → sRGB after. More expensive (~2x) due to the
    gamma conversions, but produces correct results for additive blending,
    screen, and any mode where brightness matters.
    """
    af = srgb_to_linear(base.astype(np.float32) / 255.0)
    bf = srgb_to_linear(top.astype(np.float32) / 255.0)
    fn = BLEND_MODES.get(mode, BLEND_MODES["normal"])
    result = fn(af, bf)
    if opacity < 1.0:
        result = af * (1 - opacity) + result * opacity
    result = linear_to_srgb(np.clip(result, 0, 1))
    return np.clip(result * 255, 0, 255).astype(np.uint8)
```

**When to use `blend_canvas_linear()` vs `blend_canvas()`:**

| Scenario | Use | Why |
|----------|-----|-----|
| Screen-blending two bright layers | `linear` | sRGB screen over-brightens highlights |
| Add mode for glow/bloom effects | `linear` | Additive light follows linear physics |
| Blending text overlay at low opacity | `srgb` | Perceptual blending looks more natural for text |
| Multiply for shadow/darkening | `srgb` | Differences are minimal for darken ops |
| Color-critical work (matching reference) | `linear` | Avoids sRGB hue shifts in midtones |
| Performance-critical inner loop | `srgb` | ~2x faster, good enough for most ASCII art |

**Batch version** for compositing many layers (converts once, blends multiple, converts back):

```python
def blend_many_linear(layers, modes, opacities):
    """Blend a stack of layers in linear light space.

    Args:
        layers: list of uint8 (H,W,3) canvases
        modes: list of blend mode strings (len = len(layers) - 1)
        opacities: list of floats (len = len(layers) - 1)
    Returns:
        uint8 (H,W,3) canvas
    """
    # Convert all to linear at once
    linear = [srgb_to_linear(l.astype(np.float32) / 255.0) for l in layers]
    result = linear[0]
    for i in range(1, len(linear)):
        fn = BLEND_MODES.get(modes[i-1], BLEND_MODES["normal"])
        blended = fn(result, linear[i])
        op = opacities[i-1]
        if op < 1.0:
            blended = result * (1 - op) + blended * op
        result = np.clip(blended, 0, 1)
    result = linear_to_srgb(result)
    return np.clip(result * 255, 0, 255).astype(np.uint8)
```

---

## Multi-Grid Composition

This is the core visual technique. Rendering the same conceptual scene at different grid densities (character sizes) creates natural texture interference, because characters at different scales overlap at different spatial frequencies.

### Why It Works

- `sm` grid (10pt font): 320x83 characters. Fine detail, dense texture.
- `md` grid (16pt): 192x56 characters. Medium density.
- `lg` grid (20pt): 160x45 characters. Coarse, chunky characters.

When you render a plasma field on `sm` and a vortex on `lg`, then screen-blend them, the fine plasma texture shows through the gaps in the coarse vortex characters. The result has more visual complexity than either layer alone.

### The `_render_vf()` Helper

This is the workhorse function. It takes a value field + hue field + palette + grid, and renders them to a complete pixel canvas:

```python
def _render_vf(r, grid_key, val_fn, hue_fn, pal, f, t, S, sat=0.8, threshold=0.03):
    """Render a value field + hue field to a pixel canvas via a named grid.

    Args:
        r: Renderer instance (has .get_grid())
        grid_key: "xs", "sm", "md", "lg", "xl", "xxl"
        val_fn: (g, f, t, S) -> float32 [0,1] array (rows, cols)
        hue_fn: callable (g, f, t, S) -> float32 hue array, OR float scalar
        pal: character palette string
        f: feature dict
        t: time in seconds
        S: persistent state dict
        sat: HSV saturation (0-1)
        threshold: minimum value to render (below = space)

    Returns:
        uint8 array (VH, VW, 3) — full pixel canvas
    """
    g = r.get_grid(grid_key)
    val = np.clip(val_fn(g, f, t, S), 0, 1)
    mask = val > threshold
    ch = val2char(val, mask, pal)

    # Hue: either a callable or a fixed float
    if callable(hue_fn):
        h = hue_fn(g, f, t, S) % 1.0
    else:
        h = np.full((g.rows, g.cols), float(hue_fn), dtype=np.float32)

    # CRITICAL: broadcast to full shape and copy (see Troubleshooting)
    h = np.broadcast_to(h, (g.rows, g.cols)).copy()

    R, G, B = hsv2rgb(h, np.full_like(val, sat), val)
    co = mkc(R, G, B, g.rows, g.cols)
    return g.render(ch, co)
```

### Grid Combination Strategies

| Combination | Effect | Good For |
|-------------|--------|----------|
| `sm` + `lg` | Maximum contrast between fine detail and chunky blocks | Bold, graphic looks |
| `sm` + `md` | Subtle texture layering, similar scales | Organic, flowing looks |
| `md` + `lg` + `xs` | Three-scale interference, maximum complexity | Psychedelic, dense |
| `sm` + `sm` (different effects) | Same scale, pattern interference only | Moiré, interference |

### Complete Multi-Grid Scene Example

```python
def fx_psychedelic(r, f, t, S):
    """Three-layer multi-grid scene with beat-reactive kaleidoscope."""
    # Layer A: plasma on medium grid with rainbow hue
    canvas_a = _render_vf(r, "md",
                          lambda g, f, t, S: vf_plasma(g, f, t, S) * 1.3,
                          hf_angle(0.0), PAL_DENSE, f, t, S, sat=0.8)

    # Layer B: vortex on small grid with cycling hue
    canvas_b = _render_vf(r, "sm",
                          lambda g, f, t, S: vf_vortex(g, f, t, S, twist=5.0) * 1.2,
                          hf_time_cycle(0.1), PAL_RUNE, f, t, S, sat=0.7)

    # Layer C: rings on large grid with distance hue
    canvas_c = _render_vf(r, "lg",
                          lambda g, f, t, S: vf_rings(g, f, t, S, n_base=8, spacing_base=3) * 1.4,
                          hf_distance(0.3, 0.02), PAL_BLOCKS, f, t, S, sat=0.9)

    # Blend: A screened with B, then difference with C
    result = blend_canvas(canvas_a, canvas_b, "screen", 0.8)
    result = blend_canvas(result, canvas_c, "difference", 0.6)

    # Beat-triggered kaleidoscope
    if f.get("bdecay", 0) > 0.3:
        result = sh_kaleidoscope(result.copy(), folds=6)

    return result
```

---

## Adaptive Tone Mapping

### The Brightness Problem

ASCII characters are small bright dots on a black background. Most pixels in any frame are background (black). This means:
- Mean frame brightness is inherently low (often 5-30 out of 255)
- Different effect combinations produce wildly different brightness levels
- A spiral scene might average a mean of 50 while a fire scene sits at 9
- Linear multipliers (e.g., `canvas * 2.0`) either leave dark scenes dark or blow out bright scenes
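
The numbers are easy to reproduce: even when every lit pixel is fully white, a canvas stays dark on average if only a few percent of cells are lit. A toy illustration (the 5% coverage figure is an assumption, roughly what sparse glyphs cover):

```python
import numpy as np

canvas = np.zeros((1080, 1920), dtype=np.uint8)
rng = np.random.default_rng(0)
# light up ~5% of pixels at full white
lit = rng.random(canvas.shape) < 0.05
canvas[lit] = 255
print(canvas.mean())  # mean stays near 0.05 * 255 ≈ 12.8, despite max brightness glyphs
```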

### The `tonemap()` Function

Replaces linear brightness multipliers with adaptive per-frame normalization + gamma correction:

```python
def tonemap(canvas, target_mean=90, gamma=0.75, black_point=2, white_point=253):
    """Adaptive tone-mapping: normalizes + gamma-corrects so no frame is
    fully dark or washed out.

    1. Compute 1st and 99.5th percentile on a 4x subsample (16x fewer values,
       negligible accuracy loss, major speedup at 1080p+)
    2. Stretch that range to [0, 1]
    3. Apply a gamma curve (< 1 lifts shadows, > 1 darkens)
    4. Rescale to [black_point, white_point]
    """
    f = canvas.astype(np.float32)
    sub = f[::4, ::4]  # 4x subsample: ~390K values vs ~6.2M at 1080p
    lo = np.percentile(sub, 1)
    hi = np.percentile(sub, 99.5)
    if hi - lo < 10:
        hi = max(hi, lo + 10)  # near-uniform frame fallback
    f = np.clip((f - lo) / (hi - lo), 0.0, 1.0)
    np.power(f, gamma, out=f)  # in-place: avoids allocation
    np.multiply(f, (white_point - black_point), out=f)
    np.add(f, black_point, out=f)
    return np.clip(f, 0, 255).astype(np.uint8)
```

### Why Gamma, Not Linear

Linear multiplier `* 2.0`:
```
input 10  -> output 20  (still dark)
input 100 -> output 200 (ok)
input 200 -> output 255 (clipped, lost detail)
```

Gamma 0.75 after normalization:
```
input 0.04 -> output 0.09 (lifted from invisible to visible)
input 0.39 -> output 0.49 (moderate lift)
input 0.78 -> output 0.83 (gentle lift, no clipping)
```

Gamma < 1 compresses the highlights and expands the shadows. This is exactly what we need: lift dark ASCII content into visibility without blowing out the bright parts.
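
The curve is easy to inspect directly:

```python
gamma = 0.75
for x in (0.04, 0.39, 0.78):
    print(f"{x:.2f} -> {x ** gamma:.2f}")
# every value in (0, 1) is lifted, and the lift shrinks toward the top end
```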

### Pipeline Ordering

The pipeline in `render_clip()` is:

```
scene_fn(r, f, t, S) -> canvas
        |
tonemap(canvas, gamma=scene_gamma)
        |
FeedbackBuffer.apply(canvas, ...)
        |
ShaderChain.apply(canvas, f=f, t=t)
        |
ffmpeg pipe
```

Tonemap runs BEFORE feedback and shaders. This means:
- Feedback operates on normalized data (consistent behavior regardless of scene brightness)
- Shaders like solarize, posterize, contrast operate on properly-ranged data
- The brightness shader in the chain is no longer needed (tonemap handles it)

### Per-Scene Gamma Tuning

Default gamma is 0.75. Scenes that apply destructive post-processing need more aggressive lift because the destruction happens after tonemap:

| Scene Type | Recommended Gamma | Why |
|------------|-------------------|-----|
| Standard effects | 0.75 | Default, works for most scenes |
| Solarize post-process | 0.50-0.60 | Solarize inverts bright pixels, reducing overall brightness |
| Posterize post-process | 0.50-0.55 | Posterize quantizes, often crushing mid-values to black |
| Heavy difference blending | 0.60-0.70 | Difference mode creates many near-zero pixels |
| Already bright scenes | 0.85-1.0 | Don't over-boost scenes that are naturally bright |

Configure via the scene table:

```python
SCENES = [
    {"start": 9.17, "end": 11.25, "name": "fire", "gamma": 0.55,
     "fx": fx_fire, "shaders": [("solarize", {"threshold": 200}), ...]},
    {"start": 25.96, "end": 27.29, "name": "diamond", "gamma": 0.5,
     "fx": fx_diamond, "shaders": [("bloom", {"thr": 90}), ...]},
]
```

### Brightness Verification

After rendering, spot-check frame brightness:

```python
# In test-frame mode
canvas = scene["fx"](r, feat, t, r.S)
canvas = tonemap(canvas, gamma=scene.get("gamma", 0.75))
chain = ShaderChain()
for sn, kw in scene.get("shaders", []):
    chain.add(sn, **kw)
canvas = chain.apply(canvas, f=feat, t=t)
print(f"Mean brightness: {canvas.astype(float).mean():.1f}, max: {canvas.max()}")
```

Target ranges after tonemap + shaders:
- Quiet/ambient scenes: mean 30-60
- Active scenes: mean 40-100
- Climax/peak scenes: mean 60-150
- If mean < 20: gamma is too high or a shader is destroying brightness
- If mean > 180: gamma is too low or add blending is stacking too much

---

## FeedbackBuffer Spatial Transforms

The feedback buffer stores the previous frame and blends it into the current frame with decay. Spatial transforms applied to the buffer before blending create the illusion of motion in the feedback trail.

### Implementation

```python
class FeedbackBuffer:
    def __init__(self):
        self.buf = None

    def apply(self, canvas, decay=0.85, blend="screen", opacity=0.5,
              transform=None, transform_amt=0.02, hue_shift=0.0):
        if self.buf is None:
            self.buf = canvas.astype(np.float32) / 255.0
            return canvas

        # Decay old buffer
        self.buf *= decay

        # Spatial transform
        if transform:
            self.buf = self._transform(self.buf, transform, transform_amt)

        # Hue shift the feedback for rainbow trails
        if hue_shift > 0:
            self.buf = self._hue_shift(self.buf, hue_shift)

        # Blend feedback into current frame
        result = blend_canvas(canvas,
                              np.clip(self.buf * 255, 0, 255).astype(np.uint8),
                              blend, opacity)

        # Update buffer with current frame
        self.buf = result.astype(np.float32) / 255.0
        return result

    def _transform(self, buf, transform, amt):
        h, w = buf.shape[:2]
        if transform == "zoom":
            # Zoom in: sample from slightly inside (creates expanding tunnel)
            m = int(h * amt); n = int(w * amt)
            if m > 0 and n > 0:
                cropped = buf[m:-m or None, n:-n or None]
                # Resize back to full (nearest-neighbor for speed)
                buf = np.array(Image.fromarray(
                    np.clip(cropped * 255, 0, 255).astype(np.uint8)
                ).resize((w, h), Image.NEAREST)).astype(np.float32) / 255.0
        elif transform == "shrink":
            # Zoom out: pad edges, shrink center
            m = int(h * amt); n = int(w * amt)
            small = np.array(Image.fromarray(
                np.clip(buf * 255, 0, 255).astype(np.uint8)
            ).resize((w - 2*n, h - 2*m), Image.NEAREST))
            new = np.zeros((h, w, 3), dtype=np.uint8)
            new[m:m+small.shape[0], n:n+small.shape[1]] = small
            buf = new.astype(np.float32) / 255.0
        elif transform in ("rotate_cw", "rotate_ccw"):
            # Small rotation via inverse mapping. The angle is in radians:
            # amt=0.005 -> 0.05 rad (~2.9 degrees) per frame.
            angle = amt * 10 if transform == "rotate_cw" else -amt * 10
            cy, cx = h / 2, w / 2
            Y = np.arange(h, dtype=np.float32)[:, None]
            X = np.arange(w, dtype=np.float32)[None, :]
            cos_a, sin_a = np.cos(angle), np.sin(angle)
            sx = (X - cx) * cos_a + (Y - cy) * sin_a + cx
            sy = -(X - cx) * sin_a + (Y - cy) * cos_a + cy
            sx = np.clip(sx.astype(int), 0, w - 1)
            sy = np.clip(sy.astype(int), 0, h - 1)
            buf = buf[sy, sx]
        elif transform == "shift_up":
            pixels = max(1, int(h * amt))
            buf = np.roll(buf, -pixels, axis=0)
            buf[-pixels:] = 0  # black fill at bottom
        elif transform == "shift_down":
            pixels = max(1, int(h * amt))
            buf = np.roll(buf, pixels, axis=0)
            buf[:pixels] = 0
        elif transform == "mirror_h":
            buf = buf[:, ::-1]
        return buf

    def _hue_shift(self, buf, amount):
        """Rotate hues of the feedback buffer. Operates on float32 [0,1]."""
        # Approximate RGB -> HSV -> shift -> RGB, all vectorized
        r, g, b = buf[:, :, 0], buf[:, :, 1], buf[:, :, 2]
        mx = np.maximum(np.maximum(r, g), b)
        mn = np.minimum(np.minimum(r, g), b)
        delta = mx - mn + 1e-10
        # Hue
        h = np.where(mx == r, ((g - b) / delta) % 6,
                     np.where(mx == g, (b - r) / delta + 2, (r - g) / delta + 4))
        h = (h / 6 + amount) % 1.0
        # Reconstruct with shifted hue (simplified)
        s = delta / (mx + 1e-10)
        v = mx
        c = v * s; x = c * (1 - np.abs((h * 6) % 2 - 1)); m = v - c
        ro = np.zeros_like(h); go = np.zeros_like(h); bo = np.zeros_like(h)
        for lo, hi, rv, gv, bv in [(0, 1, c, x, 0), (1, 2, x, c, 0), (2, 3, 0, c, x),
                                   (3, 4, 0, x, c), (4, 5, x, 0, c), (5, 6, c, 0, x)]:
            mask = ((h * 6) >= lo) & ((h * 6) < hi)
            ro[mask] = rv[mask] if not isinstance(rv, (int, float)) else rv
            go[mask] = gv[mask] if not isinstance(gv, (int, float)) else gv
            bo[mask] = bv[mask] if not isinstance(bv, (int, float)) else bv
        return np.stack([ro + m, go + m, bo + m], axis=2)
```

### Feedback Presets

| Preset | Config | Visual Effect |
|--------|--------|---------------|
| Infinite zoom tunnel | `decay=0.8, blend="screen", transform="zoom", transform_amt=0.015` | Expanding ring patterns |
| Rainbow trails | `decay=0.7, blend="screen", transform="zoom", transform_amt=0.01, hue_shift=0.02` | Psychedelic color trails |
| Ghostly echo | `decay=0.9, blend="add", opacity=0.15, transform="shift_up", transform_amt=0.01` | Faint upward smearing |
| Kaleidoscopic recursion | `decay=0.75, blend="screen", transform="rotate_cw", transform_amt=0.005, hue_shift=0.01` | Rotating mandala feedback |
| Color evolution | `decay=0.8, blend="difference", opacity=0.4, hue_shift=0.03` | Frame-to-frame color XOR |
| Rising heat haze | `decay=0.5, blend="add", opacity=0.2, transform="shift_up", transform_amt=0.02` | Hot air shimmer |
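
Expressed as keyword dicts, the presets can be passed straight to `FeedbackBuffer.apply()` with `**`; a sketch covering three rows of the table (the `FEEDBACK_PRESETS` name is ours):

```python
FEEDBACK_PRESETS = {
    "zoom_tunnel":    dict(decay=0.8, blend="screen", transform="zoom",
                           transform_amt=0.015),
    "rainbow_trails": dict(decay=0.7, blend="screen", transform="zoom",
                           transform_amt=0.01, hue_shift=0.02),
    "ghost_echo":     dict(decay=0.9, blend="add", opacity=0.15,
                           transform="shift_up", transform_amt=0.01),
}

# usage: fb.apply(canvas, **FEEDBACK_PRESETS["zoom_tunnel"])
```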

---

## Masking / Stencil System

Masks are float32 arrays `(rows, cols)` or `(VH, VW)` in range [0, 1]. They control where effects are visible: 1.0 = fully visible, 0.0 = fully hidden. Use masks to create figure/ground relationships, focal points, and shaped reveals.

### Shape Masks

```python
def mask_circle(g, cx_frac=0.5, cy_frac=0.5, radius=0.3, feather=0.05):
    """Circular mask centered at (cx_frac, cy_frac) in normalized coords.
    feather: width of soft edge (0 = hard cutoff)."""
    asp = g.cw / g.ch if hasattr(g, 'cw') else 1.0
    dx = (g.cc / g.cols - cx_frac)
    dy = (g.rr / g.rows - cy_frac) * asp
    d = np.sqrt(dx**2 + dy**2)
    if feather > 0:
        return np.clip(1.0 - (d - radius) / feather, 0, 1)
    return (d <= radius).astype(np.float32)

def mask_rect(g, x0=0.2, y0=0.2, x1=0.8, y1=0.8, feather=0.03):
    """Rectangular mask. Coordinates in [0,1] normalized."""
    dx = np.maximum(x0 - g.cc / g.cols, g.cc / g.cols - x1)
    dy = np.maximum(y0 - g.rr / g.rows, g.rr / g.rows - y1)
    d = np.maximum(dx, dy)
    if feather > 0:
        return np.clip(1.0 - d / feather, 0, 1)
    return (d <= 0).astype(np.float32)

def mask_ring(g, cx_frac=0.5, cy_frac=0.5, inner_r=0.15, outer_r=0.35,
              feather=0.03):
    """Ring / annulus mask."""
    inner = mask_circle(g, cx_frac, cy_frac, inner_r, feather)
    outer = mask_circle(g, cx_frac, cy_frac, outer_r, feather)
    return outer - inner

def mask_gradient_h(g, start=0.0, end=1.0):
    """Left-to-right gradient mask."""
    return np.clip((g.cc / g.cols - start) / (end - start + 1e-10), 0, 1).astype(np.float32)

def mask_gradient_v(g, start=0.0, end=1.0):
    """Top-to-bottom gradient mask."""
    return np.clip((g.rr / g.rows - start) / (end - start + 1e-10), 0, 1).astype(np.float32)

def mask_gradient_radial(g, cx_frac=0.5, cy_frac=0.5, inner=0.0, outer=0.5):
    """Radial gradient mask — bright at center, dark at edges."""
    d = np.sqrt((g.cc / g.cols - cx_frac)**2 + (g.rr / g.rows - cy_frac)**2)
    return np.clip(1.0 - (d - inner) / (outer - inner + 1e-10), 0, 1)
```
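
Masks compose with plain array arithmetic. A minimal demonstration on raw arrays (no grid object needed):

```python
import numpy as np

m1 = np.array([[1.0, 1.0, 0.0, 0.0]])
m2 = np.array([[0.0, 1.0, 1.0, 0.0]])

union     = np.maximum(m1, m2)      # visible where either mask is
intersect = m1 * m2                 # visible where both masks are
invert    = 1.0 - m1                # flip figure and ground
subtract  = np.clip(m1 - m2, 0, 1)  # m1 minus m2's region
```

The same four operations work on feathered (fractional) masks, where multiply and maximum act as soft AND/OR.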

### Value Field as Mask

Use any `vf_*` function's output as a spatial mask:

```python
|
||||
def mask_from_vf(vf_result, threshold=0.5, feather=0.1):
|
||||
"""Convert a value field to a mask by thresholding.
|
||||
feather: smooth edge width around threshold."""
|
||||
if feather > 0:
|
||||
return np.clip((vf_result - threshold + feather) / (2 * feather), 0, 1)
|
||||
return (vf_result > threshold).astype(np.float32)
|
||||
|
||||
def mask_select(mask, vf_a, vf_b):
|
||||
"""Spatial conditional: show vf_a where mask is 1, vf_b where mask is 0.
|
||||
mask: float32 [0,1] array. Intermediate values blend."""
|
||||
return vf_a * mask + vf_b * (1 - mask)
|
||||
```

### Text Stencil

Render text to a mask. Effects are visible only through the letterforms:

```python
def mask_text(grid, text, row_frac=0.5, font=None, font_size=None):
    """Render text string as a float32 mask [0,1] at grid resolution.
    Characters = 1.0, background = 0.0.

    row_frac: vertical position as fraction of grid height.
    font: PIL ImageFont (defaults to grid's font if None).
    font_size: override font size for the mask text (for larger stencil text).
    """
    from PIL import Image, ImageDraw, ImageFont

    f = font or grid.font
    if font_size:
        # Reload at the requested size (f.path works for both the passed
        # font and the grid default)
        f = ImageFont.truetype(f.path, font_size)

    # Render text to image at pixel resolution, then downsample to grid
    img = Image.new("L", (grid.cols * grid.cw, grid.ch), 0)
    draw = ImageDraw.Draw(img)
    bbox = draw.textbbox((0, 0), text, font=f)
    tw = bbox[2] - bbox[0]
    x = (grid.cols * grid.cw - tw) // 2
    draw.text((x, 0), text, fill=255, font=f)
    row_mask = np.array(img, dtype=np.float32) / 255.0

    # Place in full grid mask
    mask = np.zeros((grid.rows, grid.cols), dtype=np.float32)
    target_row = int(grid.rows * row_frac)
    # Downsample rendered text to grid cells
    for c in range(grid.cols):
        px = c * grid.cw
        if px + grid.cw <= row_mask.shape[1]:
            cell = row_mask[:, px:px + grid.cw]
            if cell.mean() > 0.1:
                mask[target_row, c] = cell.mean()
    return mask


def mask_text_block(grid, lines, start_row_frac=0.3, font=None):
    """Multi-line text stencil. Returns full grid mask."""
    mask = np.zeros((grid.rows, grid.cols), dtype=np.float32)
    for i, line in enumerate(lines):
        row_frac = start_row_frac + i / grid.rows
        line_mask = mask_text(grid, line, row_frac, font)
        mask = np.maximum(mask, line_mask)
    return mask
```

### Animated Masks

Masks that change over time for reveals, wipes, and morphing:

```python
def mask_iris(g, t, t_start, t_end, cx_frac=0.5, cy_frac=0.5,
              max_radius=0.7, ease_fn=None):
    """Iris open/close: circle that grows from 0 to max_radius.
    ease_fn: easing function (default: ease_in_out_cubic from effects.md)."""
    if ease_fn is None:
        ease_fn = lambda x: x * x * (3 - 2 * x)  # smoothstep fallback
    progress = np.clip((t - t_start) / (t_end - t_start), 0, 1)
    radius = ease_fn(progress) * max_radius
    return mask_circle(g, cx_frac, cy_frac, radius, feather=0.03)


def mask_wipe_h(g, t, t_start, t_end, direction="right"):
    """Horizontal wipe reveal."""
    progress = np.clip((t - t_start) / (t_end - t_start), 0, 1)
    if direction == "left":
        progress = 1 - progress
    return mask_gradient_h(g, start=progress - 0.05, end=progress + 0.05)


def mask_wipe_v(g, t, t_start, t_end, direction="down"):
    """Vertical wipe reveal."""
    progress = np.clip((t - t_start) / (t_end - t_start), 0, 1)
    if direction == "up":
        progress = 1 - progress
    return mask_gradient_v(g, start=progress - 0.05, end=progress + 0.05)


def mask_dissolve(g, t, t_start, t_end, seed=42):
    """Random pixel dissolve — noise threshold sweeps from 0 to 1."""
    progress = np.clip((t - t_start) / (t_end - t_start), 0, 1)
    rng = np.random.RandomState(seed)
    noise = rng.random((g.rows, g.cols)).astype(np.float32)
    return (noise < progress).astype(np.float32)
```

### Mask Boolean Operations

```python
def mask_union(a, b):
    """OR — visible where either mask is active."""
    return np.maximum(a, b)


def mask_intersect(a, b):
    """AND — visible only where both masks are active."""
    return np.minimum(a, b)


def mask_subtract(a, b):
    """A minus B — visible where A is active but B is not."""
    return np.clip(a - b, 0, 1)


def mask_invert(m):
    """NOT — flip mask."""
    return 1.0 - m
```

### Applying Masks to Canvases

```python
def apply_mask_canvas(canvas, mask, bg_canvas=None):
    """Apply a grid-resolution mask to a pixel canvas.
    Expands mask from (rows, cols) to (VH, VW) via nearest-neighbor.

    canvas: uint8 (VH, VW, 3)
    mask: float32 (rows, cols) [0,1]
    bg_canvas: what shows through where mask=0. None = black.
    """
    # Expand mask to pixel resolution
    mask_px = np.repeat(np.repeat(mask, canvas.shape[0] // mask.shape[0] + 1, axis=0),
                        canvas.shape[1] // mask.shape[1] + 1, axis=1)
    mask_px = mask_px[:canvas.shape[0], :canvas.shape[1]]

    if bg_canvas is not None:
        return np.clip(canvas * mask_px[:, :, None] +
                       bg_canvas * (1 - mask_px[:, :, None]), 0, 255).astype(np.uint8)
    return np.clip(canvas * mask_px[:, :, None], 0, 255).astype(np.uint8)


def apply_mask_vf(vf_a, vf_b, mask):
    """Apply mask at value-field level — blend two value fields spatially.
    All arrays are (rows, cols) float32."""
    return vf_a * mask + vf_b * (1 - mask)
```

---

## PixelBlendStack

Higher-level wrapper for multi-layer compositing:

```python
class PixelBlendStack:
    def __init__(self):
        self.layers = []

    def add(self, canvas, mode="normal", opacity=1.0):
        self.layers.append((canvas, mode, opacity))
        return self

    def composite(self):
        if not self.layers:
            return np.zeros((VH, VW, 3), dtype=np.uint8)
        result = self.layers[0][0]  # base layer: its mode/opacity are ignored
        for canvas, mode, opacity in self.layers[1:]:
            result = blend_canvas(result, canvas, mode, opacity)
        return result
```

## Text Backdrop (Readability Mask)

When placing readable text over busy multi-grid ASCII backgrounds, the text will blend into the background and become illegible. **Always apply a dark backdrop behind text regions.**

The technique: compute the bounding box of all text glyphs, create a gaussian-blurred dark mask covering that area with padding, and multiply the background by `(1 - mask * darkness)` before rendering text on top.

```python
from scipy.ndimage import gaussian_filter

def apply_text_backdrop(canvas, glyphs, padding=80, darkness=0.75):
    """Darken the background behind text for readability.

    Call AFTER rendering background, BEFORE rendering text.

    Args:
        canvas: (VH, VW, 3) uint8 background
        glyphs: list of {"x": float, "y": float, ...} glyph positions
        padding: pixel padding around text bounding box
        darkness: 0.0 = no darkening, 1.0 = fully black
    Returns:
        darkened canvas (uint8)
    """
    if not glyphs:
        return canvas
    xs = [g['x'] for g in glyphs]
    ys = [g['y'] for g in glyphs]
    x0 = max(0, int(min(xs)) - padding)
    y0 = max(0, int(min(ys)) - padding)
    x1 = min(VW, int(max(xs)) + padding + 50)  # extra for char width
    y1 = min(VH, int(max(ys)) + padding + 60)  # extra for char height

    # Soft dark mask with gaussian blur for feathered edges
    mask = np.zeros((VH, VW), dtype=np.float32)
    mask[y0:y1, x0:x1] = 1.0
    mask = gaussian_filter(mask, sigma=padding * 0.6)

    factor = 1.0 - mask * darkness
    return (canvas.astype(np.float32) * factor[:, :, np.newaxis]).astype(np.uint8)
```

### Usage in render pipeline

Insert between background rendering and text rendering:

```python
# 1. Render background (multi-grid ASCII effects)
bg = render_background(cfg, t)

# 2. Darken behind text region
bg = apply_text_backdrop(bg, frame_glyphs, padding=80, darkness=0.75)

# 3. Render text on top (now readable against dark backdrop)
bg = text_renderer.render(bg, frame_glyphs, color=(255, 255, 255))
```

Combine with **reverse vignette** (see shaders.md) for scenes where text is always centered — the reverse vignette provides a persistent center-dark zone, while the backdrop handles per-frame glyph positions.

## External Layout Oracle Pattern

For text-heavy videos where text needs to dynamically reflow around obstacles (shapes, icons, other text), use an external layout engine to pre-compute glyph positions and feed them into the Python renderer via JSON.

### Architecture

```
Layout Engine (browser/Node.js) → layouts.json → Python ASCII Renderer
        ↑                                               ↑
Computes per-frame                           Reads glyph positions,
glyph (x,y) positions                        renders as ASCII chars
with obstacle-aware reflow                   with full effect pipeline
```

### JSON interchange format

```json
{
  "meta": {
    "canvas_width": 1080, "canvas_height": 1080,
    "fps": 24, "total_frames": 1248,
    "fonts": {
      "body": {"charW": 12.04, "charH": 24, "fontSize": 20},
      "hero": {"charW": 24.08, "charH": 48, "fontSize": 40}
    }
  },
  "scenes": [
    {
      "id": "scene_name",
      "start_frame": 0, "end_frame": 96,
      "frames": {
        "0": {
          "glyphs": [
            {"char": "H", "x": 287.1, "y": 400.0, "alpha": 1.0},
            {"char": "e", "x": 311.2, "y": 400.0, "alpha": 1.0}
          ],
          "obstacles": [
            {"type": "circle", "cx": 540, "cy": 540, "r": 80},
            {"type": "rect", "x": 300, "y": 500, "w": 120, "h": 80}
          ]
        }
      }
    }
  ]
}
```

### When to use

- Text that dynamically reflows around moving objects
- Per-glyph animation (reveal, scatter, physics)
- Variable typography that needs precise measurement
- Any case where Python's Pillow text layout is insufficient

### When NOT to use

- Static centered text (just use PIL `draw.text()` directly)
- Text that only fades in/out without spatial animation
- Simple typewriter effects (handle in Python with a character counter)

### Running the oracle

Use Playwright to run the layout engine in a headless browser:

```javascript
// extract.mjs
import { writeFileSync } from 'node:fs';
import { chromium } from 'playwright';

const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();
await page.goto(`file://${oraclePath}`);
await page.waitForFunction(() => window.__ORACLE_DONE__ === true, null, { timeout: 60000 });
const result = await page.evaluate(() => window.__ORACLE_RESULT__);
writeFileSync('layouts.json', JSON.stringify(result));
await browser.close();
```

### Consuming in Python

```python
# In the renderer, map pixel positions to the canvas:
for glyph in frame_data['glyphs']:
    char, px, py = glyph['char'], glyph['x'], glyph['y']
    alpha = glyph.get('alpha', 1.0)
    # Render using PIL draw.text() at exact pixel position
    draw.text((px, py), char, fill=(int(255*alpha),)*3, font=font)
```

Obstacles from the JSON can also be rendered as glowing ASCII shapes (circles, rectangles) to visualize the reflow zones.
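
A minimal sketch of that obstacle visualization. The `stamp_obstacles` helper, its character choices, and the default cell size are assumptions for illustration — only the obstacle JSON shape (`circle` with `cx`/`cy`/`r`, `rect` with `x`/`y`/`w`/`h`) comes from the interchange format above:

```python
import numpy as np

def stamp_obstacles(chars, colors, obstacles, cw=12.0, ch_px=24.0, color=(120, 255, 180)):
    """Draw JSON obstacles into character/color grids as one-cell-thick outlines.

    chars:  (rows, cols) array of single characters
    colors: (rows, cols, 3) uint8
    cw, ch_px: cell width/height in pixels, to map pixel coords to cells
    """
    rows, cols = chars.shape
    rr, cc = np.mgrid[0:rows, 0:cols]
    px, py = cc * cw, rr * ch_px  # cell origin in pixel space
    for ob in obstacles:
        if ob["type"] == "circle":
            d = np.sqrt((px - ob["cx"])**2 + (py - ob["cy"])**2)
            ring = np.abs(d - ob["r"]) < max(cw, ch_px)  # cells near the radius
            chars[ring] = "o"
            colors[ring] = color
        elif ob["type"] == "rect":
            inside_x = (px >= ob["x"]) & (px <= ob["x"] + ob["w"])
            inside_y = (py >= ob["y"]) & (py <= ob["y"] + ob["h"])
            on_x = (np.abs(px - ob["x"]) < cw) | (np.abs(px - ob["x"] - ob["w"]) < cw)
            on_y = (np.abs(py - ob["y"]) < ch_px) | (np.abs(py - ob["y"] - ob["h"]) < ch_px)
            border = inside_x & inside_y & (on_x | on_y)
            chars[border] = "#"
            colors[border] = color
    return chars, colors
```

A glow pass (see shaders.md) over these cells makes the reflow zones read clearly against the text.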

---

*`skills/creative/ascii-video/references/effects.md` (1865 lines): diff suppressed because it is too large.*

*`skills/creative/ascii-video/references/inputs.md` (685 lines) follows:*

# Input Sources

> **See also:** architecture.md · effects.md · scenes.md · shaders.md · optimization.md · troubleshooting.md

## Audio Analysis

### Loading

```python
import subprocess, tempfile, wave
import numpy as np

tmp = tempfile.mktemp(suffix=".wav")
subprocess.run(["ffmpeg", "-y", "-i", input_path, "-ac", "1", "-ar", "22050",
                "-sample_fmt", "s16", tmp], capture_output=True, check=True)
with wave.open(tmp) as wf:
    sr = wf.getframerate()
    raw = wf.readframes(wf.getnframes())
samples = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0
```

### Per-Frame FFT

```python
from numpy.fft import rfftfreq

hop = sr // fps        # samples per frame
win = hop * 2          # analysis window (2x hop for overlap)
window = np.hanning(win)
freqs = rfftfreq(win, 1.0 / sr)

bands = {
    "sub":   (freqs >= 20)   & (freqs < 80),
    "bass":  (freqs >= 80)   & (freqs < 250),
    "lomid": (freqs >= 250)  & (freqs < 500),
    "mid":   (freqs >= 500)  & (freqs < 2000),
    "himid": (freqs >= 2000) & (freqs < 6000),
    "hi":    (freqs >= 6000),
}
```

For each frame: extract chunk, apply window, FFT, compute band energies.
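
That per-frame loop, written out as a self-contained sketch. `band_energies` is a hypothetical wrapper around the constants above; for convenience it takes band edges in Hz rather than the precomputed boolean selectors:

```python
import numpy as np
from numpy.fft import rfft, rfftfreq

def band_energies(samples, sr, fps, band_edges):
    """Windowed per-frame FFT -> dict of band-energy arrays.

    band_edges: {"bass": (80, 250), ...} in Hz."""
    hop = sr // fps
    win = hop * 2
    window = np.hanning(win)
    freqs = rfftfreq(win, 1.0 / sr)
    n_frames = max(0, (len(samples) - win) // hop)
    out = {name: np.zeros(n_frames, dtype=np.float32) for name in band_edges}
    for fi in range(n_frames):
        chunk = samples[fi * hop : fi * hop + win] * window  # extract + window
        mag = np.abs(rfft(chunk))                             # FFT magnitudes
        for name, (lo, hi) in band_edges.items():
            sel = (freqs >= lo) & (freqs < hi)
            out[name][fi] = np.sqrt(np.mean(mag[sel] ** 2))   # band RMS energy
    return out
```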
### Feature Set

| Feature | Formula | Controls |
|---------|---------|----------|
| `rms` | `sqrt(mean(chunk²))` | Overall loudness/energy |
| `sub`..`hi` | `sqrt(mean(band_magnitudes²))` | Per-band energy |
| `centroid` | `sum(freq*mag) / sum(mag)` | Brightness/timbre |
| `flatness` | `geomean(mag) / mean(mag)` | Noise vs tone |
| `flux` | `sum(max(0, mag - prev_mag))` | Transient strength |
| `sub_r`..`hi_r` | `band / sum(all_bands)` | Spectral shape (volume-independent) |
| `cent_d` | `abs(gradient(centroid))` | Timbral change rate |
| `beat` | Flux peak detection | Binary beat onset |
| `bdecay` | Exponential decay from beats | Smooth beat pulse (0→1→0) |

**Band ratios are critical** — they decouple spectral shape from volume, so a quiet bass section and a loud bass section both read as "bassy" rather than just "loud" vs "quiet".
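
The `sub_r`..`hi_r` row of the table reduces to one line per band; as a sketch (the `band_ratios` helper name is an assumption, operating on the band-energy arrays from the previous step):

```python
import numpy as np

def band_ratios(bands):
    """Volume-independent spectral shape: each band / sum of all bands.

    bands: dict name -> (n_frames,) float32 energy arrays.
    Returns dict name+'_r' -> ratio arrays that sum to ~1 per frame."""
    total = np.sum(list(bands.values()), axis=0) + 1e-10  # avoid divide-by-zero
    return {f"{k}_r": v / total for k, v in bands.items()}
```

Because the total cancels any overall gain, scaling the input louder leaves the ratios unchanged — exactly the decoupling the note above describes.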
### Smoothing

EMA prevents visual jitter:

```python
def ema(arr, alpha):
    out = np.empty_like(arr)
    out[0] = arr[0]
    for i in range(1, len(arr)):
        out[i] = alpha * arr[i] + (1 - alpha) * out[i-1]
    return out

# Slow-moving features (alpha=0.12): centroid, flatness, band ratios, cent_d
# Fast-moving features (alpha=0.3): rms, flux, raw bands
```
### Beat Detection

```python
import math
from scipy import signal

flux_smooth = np.convolve(flux, np.ones(5)/5, mode="same")
peaks, _ = signal.find_peaks(flux_smooth, height=0.15, distance=fps//5, prominence=0.05)

beat = np.zeros(n_frames)
bdecay = np.zeros(n_frames, dtype=np.float32)
for p in peaks:
    beat[p] = 1.0
    for d in range(fps // 2):
        if p + d < n_frames:
            bdecay[p + d] = max(bdecay[p + d], math.exp(-d * 2.5 / (fps // 2)))
```

`bdecay` gives a smooth 0→1→0 pulse per beat, decaying over ~0.5s. Use it for flash/glitch/mirror triggers.
### Normalization

After computing all frames, normalize each feature to 0-1:

```python
for k in features:
    a = features[k]
    lo, hi = a.min(), a.max()
    features[k] = (a - lo) / (hi - lo + 1e-10)
```
## Video Sampling

### Frame Extraction

```python
# Method 1: ffmpeg pipe (memory efficient)
cmd = ["ffmpeg", "-i", input_video, "-f", "rawvideo", "-pix_fmt", "rgb24",
       "-s", f"{target_w}x{target_h}", "-r", str(fps), "-"]
pipe = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
frame_size = target_w * target_h * 3
for fi in range(n_frames):
    raw = pipe.stdout.read(frame_size)
    if len(raw) < frame_size:
        break
    frame = np.frombuffer(raw, dtype=np.uint8).reshape(target_h, target_w, 3)
    # process frame...

# Method 2: OpenCV (if available)
cap = cv2.VideoCapture(input_video)
```
### Luminance-to-Character Mapping

Convert video pixels to ASCII characters based on brightness:

```python
def frame_to_ascii(frame_rgb, grid, pal=PAL_DEFAULT):
    """Convert video frame to character + color arrays."""
    rows, cols = grid.rows, grid.cols
    # Resize frame to grid dimensions
    small = np.array(Image.fromarray(frame_rgb).resize((cols, rows), Image.LANCZOS))
    # Luminance
    lum = (0.299 * small[:,:,0] + 0.587 * small[:,:,1] + 0.114 * small[:,:,2]) / 255.0
    # Map to chars
    chars = val2char(lum, lum > 0.02, pal)
    # Colors: use source pixel colors, scaled by luminance for visibility
    colors = np.clip(small * np.clip(lum[:,:,None] * 1.5 + 0.3, 0.3, 1), 0, 255).astype(np.uint8)
    return chars, colors
```
### Edge-Weighted Character Mapping

Use edge detection for more detail in contour regions:

```python
def frame_to_ascii_edges(frame_rgb, grid, pal=PAL_DEFAULT, edge_pal=PAL_BOX):
    small = np.array(Image.fromarray(frame_rgb).resize((grid.cols, grid.rows), Image.LANCZOS))
    small_gray = np.mean(small, axis=2)
    lum = small_gray / 255.0

    # Gradient-magnitude edge detection (central differences)
    gx = np.abs(small_gray[:, 2:] - small_gray[:, :-2])
    gy = np.abs(small_gray[2:, :] - small_gray[:-2, :])
    edge = np.zeros_like(small_gray)
    edge[:, 1:-1] += gx
    edge[1:-1, :] += gy
    edge = np.clip(edge / (edge.max() + 1e-10), 0, 1)

    # Edge regions get box drawing chars, flat regions get brightness chars
    is_edge = edge > 0.15
    chars = val2char(lum, lum > 0.02, pal)
    edge_chars = val2char(edge, is_edge, edge_pal)
    chars[is_edge] = edge_chars[is_edge]

    # Colors from the resized source, as in frame_to_ascii
    colors = np.clip(small * np.clip(lum[:,:,None] * 1.5 + 0.3, 0.3, 1), 0, 255).astype(np.uint8)
    return chars, colors
```
### Motion Detection

Detect pixel changes between frames for motion-reactive effects:

```python
prev_frame = None

def compute_motion(frame):
    global prev_frame
    if prev_frame is None:
        prev_frame = frame.astype(np.float32)
        return np.zeros(frame.shape[:2])
    diff = np.abs(frame.astype(np.float32) - prev_frame).mean(axis=2)
    prev_frame = frame.astype(np.float32) * 0.7 + prev_frame * 0.3  # smoothed
    return np.clip(diff / 30.0, 0, 1)  # normalized motion map
```

Use the motion map to drive particle emission, glitch intensity, or character density.
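
For instance, motion-driven glitching might look like this (a sketch: the `glitch_from_motion` helper, its thresholds, and the glitch character set are all illustrative choices, operating on a grid-resolution motion map):

```python
import numpy as np

def glitch_from_motion(chars, motion, glitch_chars="▓▒░#%&", threshold=0.4, seed=0):
    """Replace characters in high-motion cells with glitch characters.

    chars:  (rows, cols) single-character array
    motion: (rows, cols) float in [0,1]; cells above `threshold` may glitch,
            with probability proportional to local motion."""
    rng = np.random.default_rng(seed)
    hot = motion > threshold
    roll = rng.random(motion.shape) < motion  # stronger motion -> more replacements
    pick = rng.integers(0, len(glitch_chars), motion.shape)
    out = chars.copy()
    sel = hot & roll
    out[sel] = np.array(list(glitch_chars))[pick[sel]]
    return out
```

The same map can scale particle spawn counts or the `val2char` density threshold instead of swapping characters.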
### Video Feature Extraction

Per-frame features analogous to audio features, for driving effects:

```python
def analyze_video_frame(frame_rgb):
    gray = np.mean(frame_rgb, axis=2)
    return {
        "brightness": gray.mean() / 255.0,
        "contrast": gray.std() / 128.0,
        "edge_density": compute_edge_density(gray),
        "motion": compute_motion(frame_rgb).mean(),
        "dominant_hue": compute_dominant_hue(frame_rgb),
        "color_variance": compute_color_variance(frame_rgb),
    }
```
## Image Sequence

### Static Image to ASCII

Same as single video frame conversion. For animated sequences:

```python
import glob
frames = sorted(glob.glob("frames/*.png"))
for fi, path in enumerate(frames):
    img = np.array(Image.open(path).resize((VW, VH)))
    chars, colors = frame_to_ascii(img, grid, pal)
```
### Image as Texture Source

Use an image as a background texture that effects modulate:

```python
def load_texture(path, grid):
    img = np.array(Image.open(path).resize((grid.cols, grid.rows)))
    lum = np.mean(img, axis=2) / 255.0
    return lum, img  # luminance for char mapping, RGB for colors
```
## Text / Lyrics

### SRT Parsing

```python
import re

def parse_srt(path):
    """Returns [(start_sec, end_sec, text), ...]"""
    entries = []
    with open(path) as f:
        content = f.read()
    blocks = content.strip().split("\n\n")
    for block in blocks:
        lines = block.strip().split("\n")
        if len(lines) >= 3:
            times = lines[1]
            m = re.match(r"(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)", times)
            if m:
                g = [int(x) for x in m.groups()]
                start = g[0]*3600 + g[1]*60 + g[2] + g[3]/1000
                end = g[4]*3600 + g[5]*60 + g[6] + g[7]/1000
                text = " ".join(lines[2:])
                entries.append((start, end, text))
    return entries
```
### Lyrics Display Modes

- **Typewriter**: characters appear left-to-right over the time window
- **Fade-in**: whole line fades from dark to bright
- **Flash**: appear instantly on beat, fade out
- **Scatter**: characters start at random positions, converge to final position
- **Wave**: text follows a sine wave path

```python
def lyrics_typewriter(ch, co, text, row, col, t, t_start, t_end, color):
    """Reveal characters progressively over time window."""
    progress = np.clip((t - t_start) / (t_end - t_start), 0, 1)
    n_visible = int(len(text) * progress)
    stamp(ch, co, text[:n_visible], row, col, color)
```
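
The scatter mode from the list above could be sketched like this (a hypothetical helper that returns `(char, row, col)` placements for the caller to `stamp`, rather than writing into the grids directly):

```python
import random

def lyrics_scatter(text, row, col, t, t_start, t_end, rows, cols, seed=7):
    """Scatter mode: each character starts at a seeded random cell and
    converges onto its final slot in the line by t_end."""
    rng = random.Random(seed)  # deterministic starts, stable across frames
    progress = min(1.0, max(0.0, (t - t_start) / (t_end - t_start)))
    ease = progress * progress * (3 - 2 * progress)  # smoothstep easing
    placed = []
    for i, c in enumerate(text):
        r0, c0 = rng.randrange(rows), rng.randrange(cols)
        placed.append((c, round(r0 + (row - r0) * ease),
                          round(c0 + (col + i - c0) * ease)))
    return placed
```

Seeding per line keeps the start positions identical every frame, so the characters travel smooth paths instead of teleporting.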
## Generative (No Input)

For pure generative ASCII art, the "features" dict is synthesized from time:

```python
def synthetic_features(t, bpm=120):
    """Generate audio-like features from time alone."""
    beat_period = 60.0 / bpm
    beat_phase = (t % beat_period) / beat_period
    return {
        "rms": 0.5 + 0.3 * math.sin(t * 0.5),
        "bass": 0.5 + 0.4 * math.sin(t * 2 * math.pi / beat_period),
        "sub": 0.3 + 0.3 * math.sin(t * 0.8),
        "mid": 0.4 + 0.3 * math.sin(t * 1.3),
        "hi": 0.3 + 0.2 * math.sin(t * 2.1),
        "cent": 0.5 + 0.2 * math.sin(t * 0.3),
        "flat": 0.4,
        "flux": 0.3 + 0.2 * math.sin(t * 3),
        "beat": 1.0 if beat_phase < 0.05 else 0.0,
        "bdecay": max(0, 1.0 - beat_phase * 4),
        # ratios
        "sub_r": 0.2, "bass_r": 0.25, "lomid_r": 0.15,
        "mid_r": 0.2, "himid_r": 0.12, "hi_r": 0.08,
        "cent_d": 0.1,
    }
```
## TTS Integration

For narrated videos (testimonials, quotes, storytelling), generate speech audio per segment and mix with background music.

### ElevenLabs Voice Generation

```python
import requests, time, os

def generate_tts(text, voice_id, api_key, output_path, model="eleven_multilingual_v2"):
    """Generate TTS audio via ElevenLabs API. Streams response to disk."""
    # Skip if already generated (idempotent re-runs)
    if os.path.exists(output_path) and os.path.getsize(output_path) > 1000:
        return

    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    data = {
        "text": text,
        "model_id": model,
        "voice_settings": {
            "stability": 0.65,
            "similarity_boost": 0.80,
            "style": 0.15,
            "use_speaker_boost": True,
        },
    }
    resp = requests.post(url, json=data, headers=headers, stream=True)
    resp.raise_for_status()
    with open(output_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=4096):
            f.write(chunk)
    time.sleep(0.3)  # rate limit: avoid 429s on batch generation
```
Voice settings notes:

- `stability` 0.65 gives natural variation without drift. Lower (0.3-0.5) for more expressive reads, higher (0.7-0.9) for monotone/narration.
- `similarity_boost` 0.80 keeps it close to the voice profile. Lower for a more generic sound.
- `style` 0.15 adds slight stylistic variation. Keep low (0-0.2) for straightforward reads.
- `use_speaker_boost` True improves clarity at the cost of slightly more processing time.
### Voice Pool

ElevenLabs has ~20 built-in voices. Use multiple voices for variety across quotes. Reference pool:

```python
VOICE_POOL = [
    ("JBFqnCBsd6RMkjVDRZzb", "George"),
    ("nPczCjzI2devNBz1zQrb", "Brian"),
    ("pqHfZKP75CvOlQylNhV4", "Bill"),
    ("CwhRBWXzGAHq8TQ4Fs17", "Roger"),
    ("cjVigY5qzO86Huf0OWal", "Eric"),
    ("onwK4e9ZLuTAKqWW03F9", "Daniel"),
    ("IKne3meq5aSn9XLyUdCD", "Charlie"),
    ("iP95p4xoKVk53GoZ742B", "Chris"),
    ("bIHbv24MWmeRgasZH58o", "Will"),
    ("TX3LPaxmHKxFdv7VOQHJ", "Liam"),
    ("SAz9YHcvj6GT2YYXdXww", "River"),
    ("EXAVITQu4vr4xnSDxMaL", "Sarah"),
    ("Xb7hH8MSUJpSbSDYk0k2", "Alice"),
    ("pFZP5JQG7iQjIQuC4Bku", "Lily"),
    ("XrExE9yKIg1WjnnlVkGX", "Matilda"),
    ("FGY2WhTYpPnrIDTdsKH5", "Laura"),
    ("SOYHLrjzK2X1ezoPC6cr", "Harry"),
    ("hpp4J3VqNfWAUOO0d1Us", "Bella"),
    ("N2lVS1w4EtoT3dr4eOWO", "Callum"),
    ("cgSgspJ2msm6clMCkdW9", "Jessica"),
    ("pNInz6obpgDQGcFmaJgB", "Adam"),
]
```
### Voice Assignment

Shuffle deterministically so re-runs produce the same voice mapping:

```python
import random as _rng

def assign_voices(n_quotes, voice_pool, seed=42):
    """Assign a different voice to each quote, cycling if needed."""
    r = _rng.Random(seed)
    ids = [v[0] for v in voice_pool]
    r.shuffle(ids)
    return [ids[i % len(ids)] for i in range(n_quotes)]
```
### Pronunciation Control

TTS text must be kept separate from display text. The display text has line breaks for visual layout; the TTS text is a flat sentence with phonetic fixes.

Common fixes:

- Brand names: spell phonetically ("Nous" -> "Noose", "nginx" -> "engine-x")
- Abbreviations: expand ("API" -> "A P I", "CLI" -> "C L I")
- Technical terms: add phonetic hints
- Punctuation for pacing: periods create pauses, commas create slight pauses

```python
# Display text: line breaks control visual layout
QUOTES = [
    ("It can do far more than the Claws,\nand you don't need to buy a Mac Mini.\nNous Research has a winner here.", "Brian Roemmele"),
]

# TTS text: flat, phonetically corrected for speech
QUOTES_TTS = [
    "It can do far more than the Claws, and you don't need to buy a Mac Mini. Noose Research has a winner here.",
]
# Keep both arrays in sync -- same indices
```
### Audio Pipeline

1. Generate individual TTS clips (MP3 per quote, skipping existing)
2. Convert each to WAV (mono, 22050 Hz) for duration measurement and concatenation
3. Calculate timing: intro pad + speech + gaps + outro pad = target duration
4. Concatenate into single TTS track with silence padding
5. Mix with background music

```python
def build_tts_track(tts_clips, target_duration, intro_pad=5.0, outro_pad=4.0):
    """Concatenate TTS clips with calculated gaps, pad to target duration.

    Returns:
        timing: list of (start_time, end_time, quote_index) tuples
    """
    sr = 22050

    # Convert MP3s to WAV for duration and sample-level concatenation
    durations = []
    for clip in tts_clips:
        wav = clip.replace(".mp3", ".wav")
        subprocess.run(
            ["ffmpeg", "-y", "-i", clip, "-ac", "1", "-ar", str(sr),
             "-sample_fmt", "s16", wav],
            capture_output=True, check=True)
        result = subprocess.run(
            ["ffprobe", "-v", "error", "-show_entries", "format=duration",
             "-of", "csv=p=0", wav],
            capture_output=True, text=True)
        durations.append(float(result.stdout.strip()))

    # Calculate gap to fill target duration
    total_speech = sum(durations)
    n_gaps = len(tts_clips) - 1
    remaining = target_duration - total_speech - intro_pad - outro_pad
    gap = max(1.0, remaining / max(1, n_gaps))

    # Build timing and concatenate samples
    timing = []
    t = intro_pad
    all_audio = [np.zeros(int(sr * intro_pad), dtype=np.int16)]

    for i, dur in enumerate(durations):
        wav = tts_clips[i].replace(".mp3", ".wav")
        with wave.open(wav) as wf:
            samples = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)
        timing.append((t, t + dur, i))
        all_audio.append(samples)
        t += dur
        if i < len(tts_clips) - 1:
            all_audio.append(np.zeros(int(sr * gap), dtype=np.int16))
            t += gap

    all_audio.append(np.zeros(int(sr * outro_pad), dtype=np.int16))

    # Pad or trim to exactly target_duration
    full = np.concatenate(all_audio)
    target_samples = int(sr * target_duration)
    if len(full) < target_samples:
        full = np.pad(full, (0, target_samples - len(full)))
    else:
        full = full[:target_samples]

    # Write concatenated TTS track
    with wave.open("tts_full.wav", "w") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sr)
        wf.writeframes(full.tobytes())

    return timing
```
### Audio Mixing

Mix TTS (center) with background music (wide stereo, low volume). The filter chain:

1. TTS mono duplicated to both channels (centered)
2. BGM loudness-normalized, volume reduced to 15%, stereo widened with `extrastereo`
3. Mixed together with dropout transition for smooth endings

```python
def mix_audio(tts_path, bgm_path, output_path, bgm_volume=0.15):
    """Mix TTS centered with BGM panned wide stereo."""
    filter_complex = (
        # TTS: mono -> stereo center
        "[0:a]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=mono,"
        "pan=stereo|c0=c0|c1=c0[tts];"
        # BGM: normalize loudness, reduce volume, widen stereo
        "[1:a]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,"
        "loudnorm=I=-16:TP=-1.5:LRA=11,"
        f"volume={bgm_volume},"
        "extrastereo=m=2.5[bgm];"
        # Mix with smooth dropout at end
        "[tts][bgm]amix=inputs=2:duration=longest:dropout_transition=3,"
        "aformat=sample_fmts=s16:sample_rates=44100:channel_layouts=stereo[out]"
    )
    cmd = [
        "ffmpeg", "-y",
        "-i", tts_path,
        "-i", bgm_path,
        "-filter_complex", filter_complex,
        "-map", "[out]", output_path,
    ]
    subprocess.run(cmd, capture_output=True, check=True)
```
### Per-Quote Visual Style

Cycle through visual presets per quote for variety. Each preset defines a background effect, color scheme, and text color:

```python
QUOTE_STYLES = [
    {"hue": 0.08, "accent": 0.7, "bg": "spiral",       "text_rgb": (255, 220, 140)},  # warm gold
    {"hue": 0.55, "accent": 0.6, "bg": "rings",        "text_rgb": (180, 220, 255)},  # cool blue
    {"hue": 0.75, "accent": 0.7, "bg": "wave",         "text_rgb": (220, 180, 255)},  # purple
    {"hue": 0.35, "accent": 0.6, "bg": "matrix",       "text_rgb": (140, 255, 180)},  # green
    {"hue": 0.95, "accent": 0.8, "bg": "fire",         "text_rgb": (255, 180, 160)},  # red/coral
    {"hue": 0.12, "accent": 0.5, "bg": "interference", "text_rgb": (255, 240, 200)},  # amber
    {"hue": 0.60, "accent": 0.7, "bg": "tunnel",       "text_rgb": (160, 210, 255)},  # cyan
    {"hue": 0.45, "accent": 0.6, "bg": "aurora",       "text_rgb": (180, 255, 220)},  # teal
]

style = QUOTE_STYLES[quote_index % len(QUOTE_STYLES)]
```

This cycling guarantees that no two adjacent quotes share the same look, even without randomness.
### Typewriter Text Rendering

Display quote text character-by-character synced to speech progress. Recently revealed characters are brighter, creating a "just typed" glow:

```python
def render_typewriter(ch, co, lines, block_start, cols, progress, total_chars, text_rgb, t):
    """Overlay typewriter text onto character/color grids.
    progress: 0.0 (nothing visible) to 1.0 (all text visible)."""
    chars_visible = int(total_chars * min(1.0, progress * 1.2))  # slight overshoot for snappy feel
    tr, tg, tb = text_rgb
    char_count = 0
    for li, line in enumerate(lines):
        row = block_start + li
        col = (cols - len(line)) // 2
        for ci, c in enumerate(line):
            if char_count < chars_visible:
                age = chars_visible - char_count
                bri_factor = min(1.0, 0.5 + 0.5 / (1 + age * 0.015))  # newer = brighter
                hue_shift = math.sin(char_count * 0.3 + t * 2) * 0.05
                stamp(ch, co, c, row, col + ci,
                      (int(min(255, tr * bri_factor * (1.0 + hue_shift))),
                       int(min(255, tg * bri_factor)),
                       int(min(255, tb * bri_factor * (1.0 - hue_shift)))))
            char_count += 1

    # Blinking cursor at insertion point
    if progress < 1.0 and int(t * 3) % 2 == 0:
        # Find cursor position (char_count == chars_visible)
        cc = 0
        for li, line in enumerate(lines):
            for ci, c in enumerate(line):
                if cc == chars_visible:
                    stamp(ch, co, "\u258c", block_start + li,
                          (cols - len(line)) // 2 + ci, (255, 220, 100))
                    return
                cc += 1
```

### Feature Analysis on Mixed Audio

Run the standard audio analysis (FFT, beat detection) on the final mixed track so visual effects react to both TTS and music:

```python
# Analyze mixed_final.wav (not individual tracks)
features = analyze_audio("mixed_final.wav", fps=24)
```

Visuals pulse with both the music beats and the speech energy.

---

## Audio-Video Sync Verification

After rendering, verify that visual beat markers align with actual audio beats. Drift accumulates from frame timing errors, ffmpeg concat boundaries, and rounding in `fi / fps`.

### Beat Timestamp Extraction

```python
def extract_beat_timestamps(features, fps, threshold=0.5):
    """Extract timestamps where beat feature exceeds threshold."""
    beat = features["beat"]
    timestamps = []
    for fi in range(len(beat)):
        if beat[fi] > threshold:
            timestamps.append(fi / fps)
    return timestamps


def extract_visual_beat_timestamps(video_path, fps, brightness_jump=30):
    """Detect visual beats by brightness jumps between consecutive frames.

    Returns timestamps where mean brightness increases by more than threshold."""
    import subprocess
    # Probe frame dimensions from video metadata (more robust than guessing)
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height",
         "-of", "csv=p=0", video_path],
        capture_output=True, text=True)
    w, h = map(int, probe.stdout.strip().split(","))
    ppf = w * h  # pixels per frame

    # Decode to raw grayscale and compute per-frame mean brightness
    cmd = ["ffmpeg", "-i", video_path, "-f", "rawvideo", "-pix_fmt", "gray", "-"]
    proc = subprocess.run(cmd, capture_output=True)
    frames = np.frombuffer(proc.stdout, dtype=np.uint8)
    n_frames = len(frames) // ppf
    frames = frames[:n_frames * ppf].reshape(n_frames, ppf)
    means = frames.mean(axis=1)

    timestamps = []
    for i in range(1, len(means)):
        if means[i] - means[i-1] > brightness_jump:
            timestamps.append(i / fps)
    return timestamps
```

### Sync Report

```python
def sync_report(audio_beats, visual_beats, tolerance_ms=50):
    """Compare audio beat timestamps to visual beat timestamps.

    Args:
        audio_beats: list of timestamps (seconds) from audio analysis
        visual_beats: list of timestamps (seconds) from video brightness analysis
        tolerance_ms: max acceptable drift in milliseconds

    Returns:
        dict with matched/unmatched/drift statistics
    """
    tolerance = tolerance_ms / 1000.0
    matched = []
    unmatched_audio = []
    unmatched_visual = list(visual_beats)

    for at in audio_beats:
        best_match = None
        best_delta = float("inf")
        for vt in unmatched_visual:
            delta = abs(at - vt)
            if delta < best_delta:
                best_delta = delta
                best_match = vt
        if best_match is not None and best_delta < tolerance:
            matched.append({"audio": at, "visual": best_match, "drift_ms": best_delta * 1000})
            unmatched_visual.remove(best_match)
        else:
            unmatched_audio.append(at)

    drifts = [m["drift_ms"] for m in matched]
    return {
        "matched": len(matched),
        "unmatched_audio": len(unmatched_audio),
        "unmatched_visual": len(unmatched_visual),
        "total_audio_beats": len(audio_beats),
        "total_visual_beats": len(visual_beats),
        "mean_drift_ms": np.mean(drifts) if drifts else 0,
        "max_drift_ms": np.max(drifts) if drifts else 0,
        "p95_drift_ms": np.percentile(drifts, 95) if len(drifts) > 1 else 0,
    }


# Usage:
audio_beats = extract_beat_timestamps(features, fps=24)
visual_beats = extract_visual_beat_timestamps("output.mp4", fps=24)
report = sync_report(audio_beats, visual_beats)
print(f"Matched: {report['matched']}/{report['total_audio_beats']} beats")
print(f"Mean drift: {report['mean_drift_ms']:.1f}ms, Max: {report['max_drift_ms']:.1f}ms")
# Target: mean drift < 20ms, max drift < 42ms (1 frame at 24fps)
```

### Common Sync Issues

| Symptom | Cause | Fix |
|---------|-------|-----|
| Consistent late visual beats | ffmpeg concat adds frames at boundaries | Use `-vsync cfr`; pad segments to exact frame count |
| Drift increases over time | Floating-point accumulation in `t += dt` | Use an integer frame counter; compute `t = fi / fps` fresh each frame |
| Random missed beats | Beat threshold too high / feature smoothing too aggressive | Lower threshold; reduce EMA alpha for beat feature |
| Beats land on wrong frame | Off-by-one in frame indexing | Verify: frame 0 = t=0, frame 1 = t=1/fps (not t=0) |
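
The first fix above, padding each segment to its exact frame count, can be sketched as a small helper. This is a hedged sketch: `write_frame` and `last_frame` are hypothetical stand-ins for the renderer's frame pipe and final rendered frame.

```python
def pad_segment_frames(n_written, seg_duration_s, fps, write_frame, last_frame):
    """Pad a rendered segment to its exact frame count so concat
    boundaries stay on the frame grid. Repeats the last frame as padding."""
    target = round(seg_duration_s * fps)
    while n_written < target:
        write_frame(last_frame)
        n_written += 1
    return n_written
```

Calling this just before closing the segment's ffmpeg stdin keeps every concat boundary an exact multiple of `1/fps`.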

688
skills/creative/ascii-video/references/optimization.md
Normal file
@@ -0,0 +1,688 @@
# Optimization Reference

> **See also:** architecture.md · composition.md · scenes.md · shaders.md · inputs.md · troubleshooting.md

## Hardware Detection

Detect the user's hardware at script startup and adapt rendering parameters automatically. Never hardcode worker counts or resolution.

### CPU and Memory Detection

```python
import multiprocessing
import platform
import shutil

def detect_hardware():
    """Detect hardware capabilities and return render config."""
    cpu_count = multiprocessing.cpu_count()

    # Leave 1-2 cores free for OS + ffmpeg encoding
    if cpu_count >= 16:
        workers = cpu_count - 2
    elif cpu_count >= 4:
        workers = cpu_count - 1
    else:
        workers = max(1, cpu_count)

    # Memory detection (platform-specific)
    try:
        if platform.system() == "Darwin":
            import subprocess
            mem_bytes = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"]).strip())
        elif platform.system() == "Linux":
            with open("/proc/meminfo") as f:
                for line in f:
                    if line.startswith("MemTotal"):
                        mem_bytes = int(line.split()[1]) * 1024
                        break
        else:
            mem_bytes = 8 * 1024**3  # assume 8GB on unknown platforms
    except Exception:
        mem_bytes = 8 * 1024**3

    mem_gb = mem_bytes / (1024**3)

    # Each worker uses ~50-150MB depending on grid sizes.
    # Cap workers if memory is tight.
    mem_per_worker_mb = 150
    max_workers_by_mem = int(mem_gb * 1024 * 0.6 / mem_per_worker_mb)  # use 60% of RAM
    workers = max(1, min(workers, max_workers_by_mem))

    # ffmpeg availability
    has_ffmpeg = shutil.which("ffmpeg") is not None

    return {
        "cpu_count": cpu_count,
        "workers": workers,
        "mem_gb": mem_gb,
        "platform": platform.system(),
        "arch": platform.machine(),
        "has_ffmpeg": has_ffmpeg,
    }
```

### Adaptive Quality Profiles

Scale resolution, FPS, CRF, and grid density based on hardware:

```python
def quality_profile(hw, target_duration_s, user_preference="auto"):
    """
    Returns render settings adapted to hardware.
    user_preference: "auto", "draft", "preview", "production", "max"
    """
    if user_preference == "draft":
        return {"vw": 960, "vh": 540, "fps": 12, "crf": 28, "workers": min(4, hw["workers"]),
                "grid_scale": 0.5, "shaders": "minimal", "particles_max": 200}

    if user_preference == "preview":
        return {"vw": 1280, "vh": 720, "fps": 15, "crf": 25, "workers": hw["workers"],
                "grid_scale": 0.75, "shaders": "standard", "particles_max": 500}

    if user_preference == "max":
        return {"vw": 3840, "vh": 2160, "fps": 30, "crf": 15, "workers": hw["workers"],
                "grid_scale": 2.0, "shaders": "full", "particles_max": 3000}

    # "production" or "auto"
    # Auto-detect: estimate render time, downgrade if it would take too long
    n_frames = int(target_duration_s * 24)
    est_seconds_per_frame = 0.18  # ~180ms at 1080p
    est_total_s = n_frames * est_seconds_per_frame / max(1, hw["workers"])

    if hw["mem_gb"] < 4 or hw["cpu_count"] <= 2:
        # Low-end: 720p, 15fps
        return {"vw": 1280, "vh": 720, "fps": 15, "crf": 23, "workers": hw["workers"],
                "grid_scale": 0.75, "shaders": "standard", "particles_max": 500}

    if est_total_s > 3600:  # would take over an hour
        # Downgrade to 720p to speed up
        return {"vw": 1280, "vh": 720, "fps": 24, "crf": 20, "workers": hw["workers"],
                "grid_scale": 0.75, "shaders": "standard", "particles_max": 800}

    # Standard production: 1080p 24fps
    return {"vw": 1920, "vh": 1080, "fps": 24, "crf": 20, "workers": hw["workers"],
            "grid_scale": 1.0, "shaders": "full", "particles_max": 1200}


def apply_quality_profile(profile):
    """Set globals from quality profile."""
    global VW, VH, FPS, N_WORKERS
    VW = profile["vw"]
    VH = profile["vh"]
    FPS = profile["fps"]
    N_WORKERS = profile["workers"]
    # Grid sizes scale with resolution
    # CRF passed to ffmpeg encoder
    # Shader set determines which post-processing is active
```

### CLI Integration

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--quality", choices=["draft", "preview", "production", "max", "auto"],
                    default="auto", help="Render quality preset")
parser.add_argument("--aspect", choices=["landscape", "portrait", "square"],
                    default="landscape", help="Aspect ratio preset")
parser.add_argument("--workers", type=int, default=0, help="Override worker count (0=auto)")
parser.add_argument("--resolution", type=str, default="", help="Override resolution e.g. 1280x720")
args = parser.parse_args()

hw = detect_hardware()
if args.workers > 0:
    hw["workers"] = args.workers
profile = quality_profile(hw, target_duration, args.quality)

# Apply aspect ratio preset (before manual resolution override)
ASPECT_PRESETS = {
    "landscape": (1920, 1080),
    "portrait": (1080, 1920),
    "square": (1080, 1080),
}
if args.aspect != "landscape" and not args.resolution:
    profile["vw"], profile["vh"] = ASPECT_PRESETS[args.aspect]

if args.resolution:
    w, h = args.resolution.split("x")
    profile["vw"], profile["vh"] = int(w), int(h)
apply_quality_profile(profile)

log(f"Hardware: {hw['cpu_count']} cores, {hw['mem_gb']:.1f}GB RAM, {hw['platform']}")
log(f"Render: {profile['vw']}x{profile['vh']} @{profile['fps']}fps, "
    f"CRF {profile['crf']}, {profile['workers']} workers")
```

### Portrait Mode Considerations

Portrait (1080x1920) has the same pixel count as landscape 1080p, so performance is equivalent. But composition patterns differ:

| Concern | Landscape | Portrait |
|---------|-----------|----------|
| Grid cols at `lg` | 160 | 90 |
| Grid rows at `lg` | 45 | 80 |
| Max text line chars | ~50 centered | ~25-30 centered |
| Vertical rain | Short travel | Long, dramatic travel |
| Horizontal spectrum | Full width | Needs rotation or compression |
| Radial effects | Natural circles | Tall ellipses (aspect correction handles this) |
| Particle explosions | Wide spread | Tall spread |
| Text stacking | 3-4 lines comfortable | 8-10 lines comfortable |
| Quote layout | 2-3 wide lines | 5-6 short lines |

**Portrait-optimized patterns:**
- Vertical rain/matrix effects are naturally enhanced — longer column travel
- Fire columns rise through more screen space
- Rising embers/particles have more vertical runway
- Text can be stacked more aggressively with more lines
- Radial effects work if aspect correction is applied (GridLayer handles this automatically)
- Spectrum bars can be rotated 90 degrees (vertical bars from bottom)

**Portrait text layout:**
```python
def layout_text_portrait(text, max_chars_per_line=25):
    """Break text into short lines for portrait display."""
    words = text.split()
    lines = []
    current = ""
    for w in words:
        if current and len(current) + len(w) + 1 > max_chars_per_line:
            lines.append(current.strip())
            current = w + " "
        else:
            current += w + " "
    if current.strip():
        lines.append(current.strip())
    return lines
```

## Performance Budget

Target: 100-200ms per frame (5-10 fps single-threaded, 40-80 fps across 8 workers).

| Component | Time | Notes |
|-----------|------|-------|
| Feature extraction | 1-5ms | Pre-computed for all frames before render |
| Effect function | 2-15ms | Vectorized numpy, avoid Python loops |
| Character render | 80-150ms | **Bottleneck** -- per-cell Python loop |
| Shader pipeline | 5-25ms | Depends on active shaders |
| ffmpeg encode | ~5ms | Amortized by pipe buffering |
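
To check a renderer against this budget, a per-stage timer can be sketched with stdlib tools. The stage names in the usage comment are illustrative, not fixed pipeline identifiers.

```python
import time
from contextlib import contextmanager

@contextmanager
def stage(name, acc):
    """Accumulate wall-clock milliseconds per pipeline stage into dict `acc`."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        acc[name] = acc.get(name, 0.0) + (time.perf_counter() - t0) * 1000

# Per frame, roughly:
#   timings = {}
#   with stage("effect", timings): ...    # scene function
#   with stage("render", timings): ...    # character compositing
#   with stage("shade", timings): ...     # shader chain
```

Printing `timings` every few hundred frames shows which stage blows the budget before committing to a full render.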
## Bitmap Pre-Rasterization

Rasterize every character at init, not per-frame:

```python
from PIL import Image, ImageDraw

# At init time -- done once
for c in all_characters:
    img = Image.new("L", (cell_w, cell_h), 0)
    ImageDraw.Draw(img).text((0, 0), c, fill=255, font=font)
    bitmaps[c] = np.array(img, dtype=np.float32) / 255.0  # float32 for fast multiply

# At render time -- fast lookup
bitmap = bitmaps[char]
canvas[y:y+ch, x:x+cw] = np.maximum(canvas[y:y+ch, x:x+cw],
                                    (bitmap[:, :, None] * color).astype(np.uint8))
```

Collect all characters from all palettes + overlay text into the init set. Lazy-init for any missed characters.
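
That lazy fallback can be sketched generically; `rasterize` here is a hypothetical stand-in for the init-time rasterization loop above.

```python
def get_bitmap(c, bitmaps, rasterize):
    """Return the cached bitmap for character c, rasterizing at most
    once on a cache miss (assumption: same font/cell size as the init pass)."""
    if c not in bitmaps:
        bitmaps[c] = rasterize(c)
    return bitmaps[c]
```

Routing every render-time lookup through this keeps an unexpected character (emoji in a quote, say) from crashing the worker mid-render.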
## Pre-Rendered Background Textures

Alternative to `_render_vf()` for backgrounds where characters don't need to change every frame. Pre-bake a static ASCII texture once at init, then multiply by a per-cell color field each frame. One matrix multiply vs thousands of bitmap blits.

Use when: the background layer uses a fixed character palette and only color/brightness varies per frame. NOT suitable for layers where character selection depends on a changing value field.

### Init: Bake the Texture

```python
# In GridLayer.__init__:
self._bg_row_idx = np.clip(
    (np.arange(VH) - self.oy) // self.ch, 0, self.rows - 1
)
self._bg_col_idx = np.clip(
    (np.arange(VW) - self.ox) // self.cw, 0, self.cols - 1
)
self._bg_textures = {}

def make_bg_texture(self, palette):
    """Pre-render a static ASCII texture (grayscale float32) once."""
    if palette not in self._bg_textures:
        texture = np.zeros((VH, VW), dtype=np.float32)
        rng = random.Random(12345)
        ch_list = [c for c in palette if c != " " and c in self.bm]
        if not ch_list:
            ch_list = list(self.bm.keys())[:5]
        for row in range(self.rows):
            y = self.oy + row * self.ch
            if y + self.ch > VH:
                break
            for col in range(self.cols):
                x = self.ox + col * self.cw
                if x + self.cw > VW:
                    break
                bm = self.bm[rng.choice(ch_list)]
                texture[y:y+self.ch, x:x+self.cw] = bm
        self._bg_textures[palette] = texture
    return self._bg_textures[palette]
```

### Render: Color Field x Cached Texture

```python
def render_bg(self, color_field, palette=PAL_CIRCUIT):
    """Fast background: pre-rendered ASCII texture * per-cell color field.
    color_field: (rows, cols, 3) uint8. Returns (VH, VW, 3) uint8."""
    texture = self.make_bg_texture(palette)
    # Expand cell colors to pixel coords via pre-computed index maps
    color_px = color_field[
        self._bg_row_idx[:, None], self._bg_col_idx[None, :]
    ].astype(np.float32)
    return (texture[:, :, None] * color_px).astype(np.uint8)
```

### Usage in a Scene

```python
# Build per-cell color from effect fields (cheap — rows*cols, not VH*VW)
hue = ((t * 0.05 + val * 0.2) % 1.0).astype(np.float32)
R, G, B = hsv2rgb(hue, np.full_like(val, 0.5), val)
color_field = mkc(R, G, B, g.rows, g.cols)  # (rows, cols, 3) uint8

# Render background — single matrix multiply, no per-cell loop
canvas_bg = g.render_bg(color_field, PAL_DENSE)
```

The texture init loop runs once and is cached per palette. Per-frame cost is one fancy-index lookup + one broadcast multiply — orders of magnitude faster than the per-cell bitmap blit loop in `render()` for dense backgrounds.

## Coordinate Array Caching

Pre-compute all grid-relative coordinate arrays at init, not per-frame:

```python
# These are O(rows*cols) and used in every effect
self.rr = np.arange(rows)[:, None]   # row indices
self.cc = np.arange(cols)[None, :]   # col indices
dy = self.rr - rows / 2              # offsets from grid center
dx = self.cc - cols / 2
self.dist = np.sqrt(dx**2 + dy**2)   # distance from center
self.angle = np.arctan2(dy, dx)      # angle from center
self.dist_n = ...                    # normalized distance
```

## Vectorized Effect Patterns

### Avoid Per-Cell Python Loops in Effects

The render loop (compositing bitmaps) is unavoidably per-cell. But effect functions must be fully vectorized numpy -- never iterate over rows/cols in Python.

Bad (O(rows*cols) Python loop):
```python
for r in range(rows):
    for c in range(cols):
        val[r, c] = math.sin(c * 0.1 + t) * math.cos(r * 0.1 - t)
```

Good (vectorized):
```python
val = np.sin(g.cc * 0.1 + t) * np.cos(g.rr * 0.1 - t)
```

### Vectorized Matrix Rain

The naive per-column per-trail-pixel loop is the second biggest bottleneck after the render loop. Use numpy fancy indexing:

```python
# Instead of nested Python loops over columns and trail pixels:
# build row index arrays for all active trail pixels at once
all_rows = []
all_cols = []
all_fades = []
for c in range(cols):
    head = int(S["ry"][c])
    trail_len = S["rln"][c]
    for i in range(trail_len):
        row = head - i
        if 0 <= row < rows:
            all_rows.append(row)
            all_cols.append(c)
            all_fades.append(1.0 - i / trail_len)

# Vectorized assignment
ar = np.array(all_rows)
ac = np.array(all_cols)
af = np.array(all_fades, dtype=np.float32)
# Assign chars and colors in bulk using fancy indexing
ch[ar, ac] = ...  # vectorized char assignment
co[ar, ac, 1] = (af * bri * 255).astype(np.uint8)  # green channel
```

### Vectorized Fire Columns

Same pattern -- accumulate index arrays, assign in bulk:

```python
fire_val = np.zeros((rows, cols), dtype=np.float32)
for fi in range(n_cols):
    fx_c = int((fi * cols / n_cols + np.sin(t * 2 + fi * 0.7) * 3) % cols)
    height = int(energy * rows * 0.7)
    dy = np.arange(min(height, rows))
    fr = rows - 1 - dy
    frac = dy / max(height, 1)
    # Width spread: base columns wider at bottom
    for dx in range(-1, 2):  # 3-wide columns
        c = fx_c + dx
        if 0 <= c < cols:
            fire_val[fr, c] = np.maximum(fire_val[fr, c],
                                         (1 - frac * 0.6) * (0.5 + rms * 0.5))
# Now map fire_val to chars and colors in one vectorized pass
```

## PIL String Rendering for Text-Heavy Scenes

Alternative to per-cell bitmap blitting when rendering many long text strings (scrolling tickers, typewriter sequences, idea floods). Uses PIL's native `ImageDraw.text()`, which renders an entire string in one C call, vs one Python-loop bitmap blit per character.

Typical win: a scene with 56 ticker rows renders 56 PIL `text()` calls instead of ~10K individual bitmap blits.

Use when: the scene renders many rows of readable text strings. NOT suitable for sparse or spatially-scattered single characters (use normal `render()` for those).

```python
from PIL import Image, ImageDraw

def render_text_layer(grid, rows_data, font):
    """Render dense text rows via PIL instead of per-cell bitmap blitting.

    Args:
        grid: GridLayer instance (for oy, ch, ox, font metrics)
        rows_data: list of (row_index, text_string, rgb_tuple) — one per row
        font: PIL ImageFont instance (grid.font)

    Returns:
        uint8 array (VH, VW, 3) — canvas with rendered text
    """
    img = Image.new("RGB", (VW, VH), (0, 0, 0))
    draw = ImageDraw.Draw(img)
    for row_idx, text, color in rows_data:
        y = grid.oy + row_idx * grid.ch
        if y + grid.ch > VH:
            break
        draw.text((grid.ox, y), text, fill=color, font=font)
    return np.array(img)
```

### Usage in a Ticker Scene

```python
# Build ticker data (text + color per row)
rows_data = []
for row in range(n_tickers):
    text = build_ticker_text(row, t)        # scrolling substring
    color = hsv2rgb_scalar(hue, 0.85, bri)  # (R, G, B) tuple
    rows_data.append((row, text, color))

# One PIL pass instead of thousands of bitmap blits
canvas_tickers = render_text_layer(g_md, rows_data, g_md.font)

# Blend with other layers normally
result = blend_canvas(canvas_bg, canvas_tickers, "screen", 0.9)
```

This is purely a rendering optimization — same visual output, fewer draw calls. The grid's `render()` method is still needed for sparse character fields where characters are placed individually based on value fields.

## Bloom Optimization

**Do NOT use `scipy.ndimage.uniform_filter`** -- measured at 424ms/frame.

Use 4x downsample + manual box blur instead -- 84ms/frame (5x faster):

```python
sm = canvas[::4, ::4].astype(np.float32)  # 4x downsample
br = np.where(sm > threshold, sm, 0)
for _ in range(3):  # 3-pass manual box blur
    p = np.pad(br, ((1, 1), (1, 1), (0, 0)), mode='edge')
    br = (p[:-2, :-2] + p[:-2, 1:-1] + p[:-2, 2:] +
          p[1:-1, :-2] + p[1:-1, 1:-1] + p[1:-1, 2:] +
          p[2:, :-2] + p[2:, 1:-1] + p[2:, 2:]) / 9.0
bl = np.repeat(np.repeat(br, 4, axis=0), 4, axis=1)[:H, :W]
```

## Vignette Caching

The distance field is resolution- and strength-dependent and never changes per frame:

```python
_vig_cache = {}
def sh_vignette(canvas, strength):
    h, w = canvas.shape[:2]
    key = (h, w, round(strength, 2))
    if key not in _vig_cache:
        Y = np.linspace(-1, 1, h)[:, None]
        X = np.linspace(-1, 1, w)[None, :]
        _vig_cache[key] = np.clip(1.0 - np.sqrt(X**2 + Y**2) * strength, 0.15, 1).astype(np.float32)
    return np.clip(canvas * _vig_cache[key][:, :, None], 0, 255).astype(np.uint8)
```

Same pattern for CRT barrel distortion (cache remap coordinates).

## Film Grain Optimization

Generate noise at half resolution, tile up:

```python
noise = np.random.randint(-amt, amt + 1, (H // 2, W // 2, 1), dtype=np.int16)
noise = np.repeat(np.repeat(noise, 2, axis=0), 2, axis=1)[:H, :W]
```

2x blocky grain looks like film grain and costs 1/4 the random generation.

## Parallel Rendering

### Worker Architecture

```python
hw = detect_hardware()
N_WORKERS = hw["workers"]

# Batch splitting (for non-clip architectures)
batch_size = (n_frames + N_WORKERS - 1) // N_WORKERS
batches = [(i, i * batch_size, min((i + 1) * batch_size, n_frames), features, seg_path) ...]

with multiprocessing.Pool(N_WORKERS) as pool:
    segments = pool.starmap(render_batch, batches)
```

### Per-Clip Parallelism (Preferred for Segmented Videos)

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

with ProcessPoolExecutor(max_workers=N_WORKERS) as pool:
    futures = {pool.submit(render_clip, seg, features, path): seg["id"]
               for seg, path in clip_args}
    for fut in as_completed(futures):
        clip_id = futures[fut]
        try:
            fut.result()
            log(f"  {clip_id} done")
        except Exception as e:
            log(f"  {clip_id} FAILED: {e}")
```

### Worker Isolation

Each worker:
- Creates its own `Renderer` instance (with full grid + bitmap init)
- Opens its own ffmpeg subprocess
- Has an independent random seed (`random.seed(batch_id * 10000)`)
- Writes to its own segment file and stderr log

### ffmpeg Pipe Safety

**CRITICAL**: Never use `stderr=subprocess.PIPE` with long-running ffmpeg. The stderr buffer fills at ~64KB and deadlocks:

```python
# WRONG -- will deadlock
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.PIPE)

# RIGHT -- stderr to file
stderr_fh = open(err_path, "w")
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, stderr=stderr_fh)
# ... write all frames ...
pipe.stdin.close()
pipe.wait()
stderr_fh.close()
```

### Concatenation

```python
with open(concat_file, "w") as cf:
    for seg in segments:
        cf.write(f"file '{seg}'\n")

cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", concat_file]
if audio_path:
    cmd += ["-i", audio_path, "-c:v", "copy", "-c:a", "aac", "-b:a", "192k", "-shortest"]
else:
    cmd += ["-c:v", "copy"]
cmd.append(output_path)
subprocess.run(cmd, capture_output=True, check=True)
```

## Particle System Performance

Cap particle counts based on the quality profile:

| System | Low | Standard | High |
|--------|-----|----------|------|
| Explosion | 300 | 1000 | 2500 |
| Embers | 500 | 1500 | 3000 |
| Starfield | 300 | 800 | 1500 |
| Dissolve | 200 | 600 | 1200 |

Cull by truncating lists:
```python
MAX_PARTICLES = profile.get("particles_max", 1200)
if len(S["px"]) > MAX_PARTICLES:
    for k in ("px", "py", "vx", "vy", "life", "char"):
        S[k] = S[k][-MAX_PARTICLES:]  # keep newest
```

## Memory Management

- Feature arrays: pre-computed for all frames, shared across workers via fork semantics (COW)
- Canvas: allocated once per worker, reused (`np.zeros(...)`)
- Character arrays: allocated per frame (cheap -- rows*cols U1 strings)
- Bitmap cache: ~500KB per grid size, initialized once per worker

Total memory per worker: ~50-150MB. Total: ~400-800MB for 8 workers.

For low-memory systems (< 4GB), reduce worker count and use smaller grids.

## Brightness Verification

After render, spot-check brightness at sample timestamps:

```python
for t in [2, 30, 60, 120, 180]:
    cmd = ["ffmpeg", "-ss", str(t), "-i", output_path,
           "-frames:v", "1", "-f", "rawvideo", "-pix_fmt", "rgb24", "-"]
    r = subprocess.run(cmd, capture_output=True)
    arr = np.frombuffer(r.stdout, dtype=np.uint8)
    print(f"t={t}s mean={arr.mean():.1f} max={arr.max()}")
```

Target: mean > 5 for quiet sections, mean > 15 for active sections. If consistently below, increase brightness floor in effects and/or global boost multiplier.
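
The pass/fail logic can be folded into a small helper. This is a hedged sketch: the thresholds mirror the targets above, and the `is_active` flag is an assumption about how sections are labeled in your pipeline.

```python
def brightness_failures(samples, quiet_floor=5.0, active_floor=15.0):
    """samples: list of (t_seconds, mean_brightness, is_active) from the
    spot-check loop. Returns timestamps that fall below their section's floor."""
    return [t for t, mean, is_active in samples
            if mean < (active_floor if is_active else quiet_floor)]
```

An empty return list means the render clears the brightness targets at every sampled timestamp.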
## Render Time Estimates

Scale with hardware. Baseline: 1080p, 24fps, ~180ms/frame/worker.

| Duration | Frames | 4 workers | 8 workers | 16 workers |
|----------|--------|-----------|-----------|------------|
| 30s | 720 | ~3 min | ~2 min | ~1 min |
| 2 min | 2,880 | ~13 min | ~7 min | ~4 min |
| 3.5 min | 5,040 | ~23 min | ~12 min | ~6 min |
| 5 min | 7,200 | ~33 min | ~17 min | ~9 min |
| 10 min | 14,400 | ~65 min | ~33 min | ~17 min |

At 720p: multiply times by ~0.5. At 4K: multiply by ~4.

Heavier effects (many particles, dense grids, extra shader passes) add ~20-50%.
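
These scaling rules can be combined into one rough estimator. This is a sketch: every parameter is an assumption to calibrate against your own timings (measure `sec_per_frame` on a short draft render), not a fixed constant.

```python
def estimate_render_minutes(duration_s, fps=24, sec_per_frame=1.0, workers=8,
                            res_scale=1.0, effect_overhead=0.0):
    """Rough wall-clock estimate in minutes.
    sec_per_frame: measured single-worker seconds per frame.
    res_scale: ~0.5 at 720p, ~4.0 at 4K. effect_overhead: 0.2-0.5 for heavy scenes."""
    frames = duration_s * fps
    return frames * sec_per_frame * res_scale * (1.0 + effect_overhead) / max(1, workers) / 60.0
```

Re-running the estimator after a calibration render tells you up front whether to drop to a draft or preview profile.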
---

## Temp File Cleanup

Rendering generates intermediate files that accumulate across runs. Clean up after the final concat/mux step.

### Files to Clean

| File type | Source | Location |
|-----------|--------|----------|
| WAV extracts | `ffmpeg -i input.mp3 ... tmp.wav` | `tempfile.mktemp()` or project dir |
| Segment clips | `render_clip()` output | `segments/seg_00.mp4` etc. |
| Concat list | ffmpeg concat demuxer input | `segments/concat.txt` |
| ffmpeg stderr logs | piped to file for debugging | `*.log` in project dir |
| Feature cache | pickled numpy arrays | `*.pkl` or `*.npz` |

### Cleanup Function

```python
|
||||
import glob
|
||||
import tempfile
|
||||
import shutil
|
||||
|
||||
def cleanup_render_artifacts(segments_dir="segments", keep_final=True):
|
||||
"""Remove intermediate files after successful render.
|
||||
|
||||
Call this AFTER verifying the final output exists and plays correctly.
|
||||
|
||||
Args:
|
||||
segments_dir: directory containing segment clips and concat list
|
||||
keep_final: if True, only delete intermediates (not the final output)
|
||||
"""
|
||||
removed = []
|
||||
|
||||
# 1. Segment clips
|
||||
if os.path.isdir(segments_dir):
|
||||
shutil.rmtree(segments_dir)
|
||||
removed.append(f"directory: {segments_dir}")
|
||||
|
||||
# 2. Temporary WAV files
|
||||
for wav in glob.glob("*.wav"):
|
||||
if wav.startswith("tmp") or wav.startswith("extracted_"):
|
||||
os.remove(wav)
|
||||
removed.append(wav)
|
||||
|
||||
# 3. ffmpeg stderr logs
|
||||
for log in glob.glob("ffmpeg_*.log"):
|
||||
os.remove(log)
|
||||
removed.append(log)
|
||||
|
||||
# 4. Feature cache (optional — useful to keep for re-renders)
|
||||
# for cache in glob.glob("features_*.npz"):
|
||||
# os.remove(cache)
|
||||
# removed.append(cache)
|
||||
|
||||
print(f"Cleaned {len(removed)} artifacts: {removed}")
|
||||
return removed
|
||||
```
|
||||
|
||||
### Integration with Render Pipeline
|
||||
|
||||
Call cleanup at the end of the main render script, after the final output is verified:
|
||||
|
||||
```python
|
||||
# At end of main()
|
||||
if os.path.exists(output_path) and os.path.getsize(output_path) > 1000:
|
||||
cleanup_render_artifacts(segments_dir="segments")
|
||||
print(f"Done. Output: {output_path}")
|
||||
else:
|
||||
print("WARNING: final output missing or empty — skipping cleanup")
|
||||
```
|
||||
|
||||
### Temp File Best Practices
|
||||
|
||||
- Use `tempfile.mkdtemp()` for segment directories — avoids polluting the project dir
|
||||
- Name WAV extracts with `tempfile.mktemp(suffix=".wav")` so they're in the OS temp dir
|
||||
- For debugging, set `KEEP_INTERMEDIATES=1` env var to skip cleanup
|
||||
- Feature caches (`.npz`) are cheap to store and expensive to recompute — default to keeping them
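A minimal sketch of that debug gate (the `KEEP_INTERMEDIATES` name matches the bullet above; wire it in just before calling the cleanup function):

```python
import os

def should_keep_intermediates():
    """True when the KEEP_INTERMEDIATES debug flag is set."""
    return os.environ.get("KEEP_INTERMEDIATES") == "1"

# At the end of main():
# if not should_keep_intermediates():
#     cleanup_render_artifacts(segments_dir="segments")
```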

---

Also added in this commit (diffs suppressed as too large): `references/scenes.md` (1011 lines), `references/shaders.md` (1385 lines). The new `references/troubleshooting.md` (367 lines) follows:

# Troubleshooting Reference

> **See also:** composition.md · architecture.md · shaders.md · scenes.md · optimization.md

Common bugs, gotchas, and platform-specific issues encountered during ASCII video development.

## Quick Diagnostic

| Symptom | Likely Cause | Fix |
|---------|--------------|-----|
| All black output | tonemap gamma too high, or no effects rendering | Lower gamma to 0.5; check that scene_fn returns a non-zero canvas |
| Washed out / too bright | Linear brightness multiplier instead of tonemap | Replace `canvas * N` with `tonemap(canvas, gamma=0.75)` |
| ffmpeg hangs mid-render | `stderr=subprocess.PIPE` deadlock | Redirect stderr to a file |
| "read-only" array error | `broadcast_to` view without `.copy()` | Add `.copy()` after `broadcast_to` |
| PicklingError | Lambda or closure in the SCENES table | Define all `fx_*` at module level |
| Random dark holes in output | Font missing Unicode glyphs | Validate palettes at init |
| Audio-visual desync | Frame timing accumulation | Use an integer frame counter; compute t fresh each frame |
| Single-color flat output | Hue field shape mismatch | Ensure h, s, v arrays are all (rows, cols) before `hsv2rgb` |
| Text unreadable over busy bg | No contrast between text and background | Use `apply_text_backdrop()` (composition.md) + the `reverse_vignette` shader (shaders.md) |
| Text garbled/mirrored | Kaleidoscope or mirror shader applied to a text scene | **Never apply kaleidoscope or mirror_h/v/quad/diag to scenes with readable text** — radial folding destroys legibility. Apply these only to background layers or text-free scenes |

## NumPy Broadcasting

### The `broadcast_to().copy()` Trap

Hue field generators often return broadcast views — arrays of shape `(1, cols)` or `(rows, 1)` that numpy broadcasts to `(rows, cols)`. These views are **read-only**. If any downstream code tries to modify them in place (e.g., `h %= 1.0`), numpy raises:

```
ValueError: output array is read-only
```

**Fix**: Always `.copy()` after `broadcast_to()`:

```python
h = np.broadcast_to(h, (g.rows, g.cols)).copy()
```

This is especially important in `_render_vf()`, where hue arrays flow through `hsv2rgb()`.

### The `+=` vs `+` Trap

In-place operators also fail when operand shapes don't match exactly:

```python
# FAILS if val is (rows, 1) and the operand is (rows, cols)
val += np.sin(g.cc * 0.02 + t * 0.3) * 0.5

# WORKS — creates a new array
val = val + np.sin(g.cc * 0.02 + t * 0.3) * 0.5
```

The `vf_plasma()` function had this bug. Use `+` instead of `+=` when mixing different-shaped arrays.

### Shape Mismatch in `hsv2rgb()`

`hsv2rgb(h, s, v)` requires all three arrays to have identical shapes. If `h` is `(1, cols)` and `s` is `(rows, cols)`, the function crashes or produces wrong output.

**Fix**: Ensure all inputs are broadcast and copied to `(rows, cols)` before calling.
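One way to enforce that is a small helper (a sketch; `to_grid` is not part of the skill's API, and `g.rows`/`g.cols` follow the grid naming used elsewhere):

```python
import numpy as np

def to_grid(a, rows, cols):
    """Broadcast a scalar, (1, cols), or (rows, 1) field to a writable (rows, cols) array."""
    return np.broadcast_to(np.asarray(a, dtype=np.float32), (rows, cols)).copy()

# h, s, v = (to_grid(x, g.rows, g.cols) for x in (h, s, v))  # before hsv2rgb(h, s, v)
```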

---

## Blend Mode Pitfalls

### Overlay Crushes Dark Inputs

`overlay(a, b) = 2*a*b` when `a < 0.5`. Two values of 0.12 produce `2 * 0.12 * 0.12 = 0.03` — darker than either input.

**Impact**: If both layers are dark (which ASCII art usually is), overlay produces near-black output.

**Fix**: Use `screen` for dark source material. Screen always brightens: `1 - (1-a)*(1-b)`.

### Colordodge Division by Zero

`colordodge(a, b) = a / (1 - b)`. When `b = 1.0` (pure white pixels), this divides by zero.

**Fix**: Add an epsilon: `a / (1 - b + 1e-6)`. The implementation in `BLEND_MODES` should include this.

### Colorburn Division by Zero

`colorburn(a, b) = 1 - (1-a) / b`. When `b = 0` (pure black pixels), this divides by zero.

**Fix**: Add an epsilon: `1 - (1-a) / (b + 1e-6)`.

### Multiply Always Darkens

`multiply(a, b) = a * b`. Since both operands are in [0,1], the result is always <= min(a, b). Never use multiply as a feedback blend mode — the frame goes black within a few frames.

**Fix**: Use `screen` for feedback, or `add` with low opacity.
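Taken together, a guarded blend table might look like this (a sketch; `BLEND_MODES` is the table named above, but these exact entries are illustrative — operands assumed float in [0, 1]):

```python
import numpy as np

EPS = 1e-6  # guards the colordodge/colorburn divisions

BLEND_MODES = {
    "screen":     lambda a, b: 1.0 - (1.0 - a) * (1.0 - b),
    "add":        lambda a, b: np.clip(a + b, 0.0, 1.0),
    "multiply":   lambda a, b: a * b,  # always darkens; never use for feedback
    "overlay":    lambda a, b: np.where(a < 0.5, 2*a*b, 1 - 2*(1-a)*(1-b)),
    "colordodge": lambda a, b: np.clip(a / (1.0 - b + EPS), 0.0, 1.0),
    "colorburn":  lambda a, b: np.clip(1.0 - (1.0 - a) / (b + EPS), 0.0, 1.0),
}
```

Lambdas are fine here despite the pickling rules below: only the mode *name* (a string) crosses process boundaries, and workers look the function up locally.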

---

## Multiprocessing

### Pickling Constraints

`ProcessPoolExecutor` serializes function arguments via pickle, which constrains what you can pass to workers:

| Can Pickle | Cannot Pickle |
|------------|---------------|
| Module-level functions (`def fx_foo():`) | Lambdas (`lambda x: x + 1`) |
| Dicts, lists, numpy arrays | Closures (functions defined inside functions) |
| Class instances (with `__reduce__`) | Instance methods |
| Strings, numbers | File handles, sockets |

**Impact**: All scene functions referenced in the SCENES table must be defined at module level with `def`. If you use a lambda or closure, you get:

```
_pickle.PicklingError: Can't pickle <function <lambda> at 0x...>
```

**Fix**: Define all scene functions at module top level. Lambdas used inside `_render_vf()` as val_fn/hue_fn are fine because they execute within the worker process — they're never pickled across process boundaries.

### macOS spawn vs Linux fork

On macOS, `multiprocessing` defaults to `spawn` (full serialization). On Linux, it defaults to `fork` (copy-on-write). This means:

- **macOS**: Feature arrays are serialized per worker (~57KB for a 30s video, scaling with duration), and each worker re-imports the entire module.
- **Linux**: Feature arrays are shared via COW; workers inherit the parent's memory.

**Impact**: On macOS, module-level code (like `detect_hardware()`) runs in every worker process. If it has side effects (e.g., subprocess calls), those happen N+1 times.
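The spawn-safe pattern: cache the probe and call it from the entry point, not at import time (a sketch — `detect_hardware` here is a stand-in that only reads `os.cpu_count()`):

```python
import functools
import os

@functools.lru_cache(maxsize=1)
def detect_hardware():
    """Probe once per process; repeat calls return the cached result."""
    return {"cpus": os.cpu_count() or 1}

if __name__ == "__main__":
    # Only the parent reaches this under spawn; workers merely import the module
    # and never trigger the probe unless they call it themselves.
    hw = detect_hardware()
```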

### Per-Worker State Isolation

Each worker creates its own:

- `Renderer` instance (with a fresh grid cache)
- `FeedbackBuffer` (feedback doesn't cross scene boundaries)
- Random seed (`random.seed(hash(seg_id) + 42)`)

This means:

- Particle state doesn't carry between scenes (expected)
- Feedback trails reset at scene cuts (expected)
- `np.random` state is NOT seeded by `random.seed()` — they use separate RNGs

**Fix for deterministic noise**: Use `np.random.RandomState(seed)` explicitly, with a stable hash — Python's `hash()` on strings changes between runs unless `PYTHONHASHSEED` is fixed, and can be negative, which `RandomState` rejects:

```python
import zlib

# zlib.crc32 is stable across runs and non-negative, unlike hash()
rng = np.random.RandomState((zlib.crc32(seg_id.encode()) + 42) % (2**32))
noise = rng.random((rows, cols))
```

---

## Brightness Issues

### Dark Scenes After Tonemap

If a scene is still dark after tonemap, check:

1. **Gamma too high**: Lower gamma (0.5-0.6) for scenes with destructive post-processing.
2. **Shader destroying brightness**: Solarize, posterize, or contrast adjustments in the shader chain can undo tonemap's work. Move destructive shaders earlier in the chain, or increase gamma to compensate.
3. **Feedback with multiply**: Multiply feedback darkens every frame. Switch to screen or add.
4. **Overlay blend in scene**: If the scene function uses `blend_canvas(..., "overlay", ...)` with dark layers, switch to screen.

### Diagnostic: Test-Frame Brightness

```bash
python reel.py --test-frame 10.0
# Output: Mean brightness: 44.3, max: 255
```

If mean < 20, the scene needs attention. Common fixes:

- Lower gamma in the SCENES entry
- Change internal blend modes from overlay/multiply to screen/add
- Increase value field multipliers (e.g., `vf_plasma(...) * 1.5`)
- Check that the shader chain doesn't include an aggressive solarize or threshold

### v1 Brightness Pattern (Deprecated)

The old pattern used a linear multiplier:

```python
# OLD — don't use
canvas = np.clip(canvas.astype(np.float32) * 2.0, 0, 255).astype(np.uint8)
```

This fails because:

- Dark scenes (mean 8): `8 * 2.0 = 16` — still dark
- Bright scenes (mean 130): `130 * 2.0 = 255` — clipped, detail lost

Use `tonemap()` instead. See `composition.md` § Adaptive Tone Mapping.
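For orientation, the shape of the curve is roughly this (a minimal sketch, not the actual `tonemap()` from composition.md, whose signature may differ):

```python
import numpy as np

def tonemap(canvas, gamma=0.75):
    """Gamma-lift a uint8 canvas: dark means rise sharply, highlights stay unclipped."""
    x = canvas.astype(np.float32) / 255.0
    return (np.power(x, gamma) * 255.0).astype(np.uint8)
```

On the failure cases above: a mean-8 canvas lifts to ~19 (more than `* 2.0` gives), while a mean-130 canvas maps to ~153 instead of clipping at 255.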

---

## ffmpeg Issues

### Pipe Deadlock

The #1 production bug. If you use `stderr=subprocess.PIPE` without draining it:

```python
# DEADLOCK — the stderr buffer fills at 64KB, blocking ffmpeg, which blocks your writes
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
```

**Fix**: Always redirect stderr to a file:

```python
stderr_fh = open(err_path, "w")
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                        stdout=subprocess.DEVNULL, stderr=stderr_fh)
```

### Frame Count Mismatch

If the number of frames written to the pipe doesn't match what ffmpeg expects (based on `-r` and duration), the output may have:

- Missing frames at the end
- Incorrect duration
- Audio-video desync

**Fix**: Calculate frame count explicitly: `n_frames = int(duration * FPS)`. Don't use `range(int(start*FPS), int(end*FPS))` without verifying the total matches.
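One way to keep per-segment counts drift-free (a sketch; assumes segment boundaries in seconds): round every boundary onto the same frame grid so the counts telescope:

```python
def segment_frame_counts(boundaries, fps=24):
    """Frame counts per segment from boundaries [t0, t1, ..., tN] in seconds.

    Rounding each boundary onto one shared frame grid means the per-segment
    counts always sum to marks[-1] - marks[0] — no accumulated drift from
    rounding each segment's duration independently.
    """
    marks = [round(t * fps) for t in boundaries]
    return [b - a for a, b in zip(marks, marks[1:])]
```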

### Concat Fails with "unsafe file name"

```
[concat @ ...] Unsafe file name
```

**Fix**: Always pass `-safe 0`:

```python
["ffmpeg", "-f", "concat", "-safe", "0", "-i", concat_path, ...]
```

---

## Font Issues

### Cell Height (macOS Pillow)

`textbbox()` and `getbbox()` return incorrect heights on some macOS Pillow versions. Use `getmetrics()` instead:

```python
ascent, descent = font.getmetrics()
cell_height = ascent + descent  # correct
# NOT: font.getbbox("M")[3]    # wrong on some versions
```

### Missing Unicode Glyphs

Not all fonts render all Unicode characters. If a palette character isn't in the font, the glyph renders as a blank or a tofu box, which appears as a dark hole in the output.

**Fix**: Validate at init:

```python
all_chars = set()
for pal in [PAL_DEFAULT, PAL_DENSE, PAL_RUNE, ...]:
    all_chars.update(pal)

valid_chars = set()
for c in all_chars:
    if c == " ":
        valid_chars.add(c)
        continue
    img = Image.new("L", (20, 20), 0)
    ImageDraw.Draw(img).text((0, 0), c, fill=255, font=font)
    if np.array(img).max() > 0:
        valid_chars.add(c)
    else:
        log(f"WARNING: '{c}' (U+{ord(c):04X}) missing from font")
```

### Platform Font Paths

| Platform | Common Paths |
|----------|--------------|
| macOS | `/System/Library/Fonts/Menlo.ttc`, `/System/Library/Fonts/Monaco.ttf` |
| Linux | `/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf` |
| Windows | `C:\Windows\Fonts\consola.ttf` (Consolas) |

Always probe multiple paths and fall back gracefully. See `architecture.md` § Font Selection.
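A probe-and-fall-back loader under those assumptions (paths from the table above; `load_mono_font` is illustrative, not the skill's API):

```python
from PIL import ImageFont

FONT_CANDIDATES = [
    "/System/Library/Fonts/Menlo.ttc",                      # macOS
    "/System/Library/Fonts/Monaco.ttf",                     # macOS
    "/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf",  # Linux
    r"C:\Windows\Fonts\consola.ttf",                        # Windows (Consolas)
]

def load_mono_font(size=20):
    """Return the first candidate that loads; fall back to Pillow's built-in font."""
    for path in FONT_CANDIDATES:
        try:
            return ImageFont.truetype(path, size)
        except OSError:
            continue
    return ImageFont.load_default()
```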

---

## Performance

### Slow Shaders

Some shaders use Python loops and are very slow at 1080p:

| Shader | Issue | Fix |
|--------|-------|-----|
| `wave_distort` | Per-row Python loop | Use vectorized fancy indexing |
| `halftone` | Triple-nested loop | Vectorize with block reduction |
| `matrix rain` | Per-column per-trail loop | Accumulate index arrays, bulk assign |
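For example, `wave_distort`'s per-row loop collapses into a single fancy-indexing gather (a sketch; parameter names are illustrative):

```python
import numpy as np

def wave_distort(img, t, amp=8.0, freq=0.05):
    """Shift each row horizontally by a sine offset, with no Python loop over rows."""
    h, w = img.shape[:2]
    offsets = (amp * np.sin(np.arange(h) * freq + t)).astype(int)  # (h,) per-row shift
    cols = (np.arange(w)[None, :] + offsets[:, None]) % w          # (h, w) gather indices
    return img[np.arange(h)[:, None], cols]
```

The same indexing works unchanged for `(h, w)` grayscale and `(h, w, 3)` RGB canvases.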

### Render Time Scaling

If a render is taking much longer than expected:

1. Check grid count — each extra grid adds ~100-150ms/frame for init
2. Check particle count — cap at quality-appropriate limits
3. Check shader count — each shader adds 2-25ms
4. Check for accidental Python loops in effects (they should be numpy-only)

---

## Common Mistakes

### Using `r.S` vs the `S` Parameter

The v2 scene protocol passes `S` (the state dict) as an explicit parameter. But `S` IS `r.S` — they're the same object. Both work:

```python
def fx_scene(r, f, t, S):
    S["counter"] = S.get("counter", 0) + 1        # via parameter (preferred)
    r.S["counter"] = r.S.get("counter", 0) + 1    # via renderer (also works)
```

Use the `S` parameter for clarity: the explicit parameter makes it obvious that the function keeps persistent state.

### Forgetting to Handle Empty Feature Values

Audio features default to 0.0 if the audio is silent. Use `.get()` with sensible defaults:

```python
energy = f.get("bass", 0.3)  # default to 0.3, not 0
```

If you default to 0, effects go blank during silence.

### Resetting State Instead of Updating It

A common bug in particle systems: creating new arrays every frame instead of updating persistent state.

```python
# WRONG — particles reset every frame
S["px"] = []
for _ in range(100):
    S["px"].append(random.random())

# RIGHT — initialize once, then update each frame
if "px" not in S:
    S["px"] = []
# ... emit new particles based on beats
# ... update existing particles
```

### Not Clipping Value Fields

Value fields must stay in [0, 1]. If they exceed this range, `val2char()` produces index errors:

```python
# WRONG — vf_plasma() * 1.5 can exceed 1.0
val = vf_plasma(g, f, t, S) * 1.5

# RIGHT — clip after scaling
val = np.clip(vf_plasma(g, f, t, S) * 1.5, 0, 1)
```

The `_render_vf()` helper clips automatically, but custom scenes must clip explicitly.

## Brightness Best Practices

- Dense animated backgrounds — never flat black, always fill the grid
- Vignette minimum clamped to 0.15 (not 0.12)
- Bloom threshold 130 (not 170) so more pixels contribute to glow
- Use `screen` blend mode (not `overlay`) for dark ASCII layers — overlay squares dark values: `2 * 0.12 * 0.12 = 0.03`
- FeedbackBuffer decay minimum 0.5 — below that, feedback fades too fast to see
- Value field floor: `vf * 0.8 + 0.05` ensures no cell is truly zero
- Per-scene gamma overrides: default 0.75, solarize 0.55, posterize 0.50, bright scenes 0.85
- Test frames early: render single frames at key timestamps before committing to a full render

**Quick checklist before a full render:**

1. Render 3 test frames (start, middle, end)
2. Check `canvas.mean() > 8` after tonemap
3. Check no scene is visually flat black
4. Verify per-section variation (different bg/palette/color per scene)
5. Confirm the shader chain includes bloom (threshold 130)
6. Confirm vignette strength ≤ 0.25