JustDoIt — Rendering Architecture & Roadmap
JustDoIt — Rendering Architecture & Roadmap
“The best terminal rendering codebase on the planet.”
Not a goal. A standard.
The Premise
ASCII/ANSI is the hardest rendering medium that exists. Fixed-width monospace grid. 95 printable characters. No subpixel positioning. No blend modes. No transparency. Color only where the terminal cooperates.
If you can make text alive in that medium — truly alive, with sharp edges, real physics, generative fills, and 24fps animation — you can make it alive anywhere. The terminal is where ideas get proven. Everything else is just a richer substrate.
This document describes how JustDoIt becomes the definitive implementation of that proof.
The Architecture Stack
┌─────────────────────────────────────────────────────────────────────┐
│ INPUT LAYER │
│ plain text │ PIL image │ AI-generated image │ live capture │
└──────────────┴─────────────┴──────────────────────┴─────────────────┘
↓
┌─────────────────────────────────────────────────────────────────────┐
│ RENDER PIPELINE │
│ │
│ PATH A: Glyph-Dict (fast, fill-composable) │
│ text → font lookup → glyph mask → fill_fn(mask, t) → rows │
│ │
│ PATH B: Image-Sample (quality, color, any source) │
│ image → [composite effects] → image_to_ascii() → (char, rgb) │
│ │
│ PATH C: Hybrid (best of both) │
│ text → PIL render → effect image → composite → sample │
│ │
└─────────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────────┐
│ SPATIAL EFFECTS LAYER │
│ warp │ distort │ perspective │ rotate │ fisheye │ zoom │
└─────────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────────┐
│ ANIMATION LAYER │
│ frame loop │ time parameter t │ simulation state │ easing │
└─────────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────────┐
│ OUTPUT LAYER │
│ terminal/ANSI │ SVG │ PNG │ APNG │ asciinema │ WebGL │
└─────────────────────────────────────────────────────────────────────┘
Every layer is independently swappable. Every transition between layers is a well-defined data contract. Nothing is hardwired.
Data Contracts
These are the types that flow between layers. Get these right and the whole system is composable.
Glyph (Path A output)
Glyph = list[str] # list of rows, same width per font
GlyphMask = list[list[float]] # 0.0=empty, 1.0=ink, float for anti-alias
Cell (Path B output, universal intermediate)
@dataclass
class Cell:
char: str # single ASCII character
fg: tuple[int, int, int] | None # RGB foreground, None = default
bg: tuple[int, int, int] | None # RGB background, None = transparent
bold: bool = False
dim: bool = False
Grid (universal intermediate — what everything converges to)
Grid = list[list[Cell]] # Grid[row][col]
This is the key type. Everything upstream produces a Grid. Everything downstream consumes one. Both paths converge here. The Grid is the universal handoff point.
Frame (animation unit)
@dataclass
class Frame:
grid: Grid
t: float # normalized time [0.0, 1.0]
index: int # absolute frame number
duration_ms: int # how long to display this frame
AnimationSpec
@dataclass
class AnimationSpec:
fps: float = 24.0
frame_count: int = 48
loop: bool = True
bounce: bool = False # forward + reverse = natural loop
easing: str = "linear" # linear | ease-in | ease-out | ease-in-out
Resolution Model
Resolution is a first-class parameter, not an afterthought.
Display Targets
# Named presets (extend freely)
DISPLAYS = {
"terminal": DisplayTarget(cols=80, rows=24, note="classic terminal"),
"wide": DisplayTarget(cols=220, rows=50, note="modern widescreen terminal"),
"fhd": DisplayTarget(px_w=1920, px_h=1080, cell_w=8, cell_h=16),
"qhd": DisplayTarget(px_w=2560, px_h=1440, cell_w=8, cell_h=16),
"4k": DisplayTarget(px_w=3840, px_h=2160, cell_w=8, cell_h=16),
"4k-hidpi": DisplayTarget(px_w=3840, px_h=2160, cell_w=16, cell_h=32), # 2× cell, same grid as FHD
"5k": DisplayTarget(px_w=5120, px_h=2880, cell_w=8, cell_h=16),
}
Resolution-Derived Values
Given a DisplayTarget, everything else is computable:
cols = px_w // cell_w # ASCII grid width
rows = px_h // cell_h # ASCII grid height
canvas = (px_w, px_h) # PIL image size for Path B
font_pt = fit_to_fill(text, cols) # largest font that fills the width
cell_ar = cell_w / cell_h # aspect ratio (~0.5 for monospace)
The cell_w / cell_h problem
Monospace cells are not square (~0.5 aspect ratio). This matters for:
- Image sampling (a “square” region in pixel space is not square in cell space)
- Effect generation (noise, plasma, etc. must compensate or circles become ovals)
- SDF computation (Euclidean distance in cell space ≠ pixel space)
Rule: All effects that care about geometry receive (cell_w, cell_h) and compensate internally. Callers never need to think about this.
The Three Render Paths
Path A — Glyph-Dict (current, extend don’t replace)
When to use: Animated fills (flame, plasma, turing, RD, slime). Per-letter effects. Low-latency animation. Terminal output.
Pipeline:
text → font.lookup(char) → GlyphMask → fill_fn(mask, t, **params) → Glyph → Grid
Strengths:
- O(letters × fill_cells) per frame — scales with text length, not display resolution
- Fill effects have per-letter masks — can treat each letter independently
- Zero PIL dependency for pure fills
- Works in actual terminals (not just SVG/PNG)
Limitations:
- Letter shape quality bounded by
font_sizein the rasterizer - No color sourced from the fill (fills produce char choices, color is separate)
- No arbitrary image input
Extension points:
fill_fnsignature:(mask: GlyphMask, t: float, **params) -> Glyph— addteverywhere- Anti-aliased masks:
floatvalues in GlyphMask (currently binary 0/1) enable sub-cell effects - Per-fill color: fill_fn can return
list[list[Cell]]instead oflist[str]for color-aware fills
Path B — Image-Sampler (new, G09)
When to use: 4K gallery stills. AI image → ASCII. Photo → ASCII. Maximum quality text. Color-accurate output.
Pipeline:
PIL.Image → image_to_ascii(cell_w, cell_h) → Grid
Internally:
for each cell (row, col):
pixel_block = image[row*cell_h:(row+1)*cell_h, col*cell_w:(col+1)*cell_w]
zone_vec = compute_6zone_coverage(pixel_block.to_grayscale())
char = nearest_char(zone_vec, char_db)
fg_rgb = mean_rgb(pixel_block)
grid[row][col] = Cell(char, fg=fg_rgb)
Strengths:
- Harri-quality edge following (chars follow contours, not just brightness)
- True RGB color per cell
- Source-agnostic: text, AI image, photo, video frame — same code
- PIL kerning, anti-aliasing, subpixel rendering included for free (text source)
Limitations:
- DB build cost (one-time, cached)
- Animation: DB lookup × cells × frames (needs numpy at 4K)
- No per-letter fill effects (no mask isolation)
Performance envelope: | Display | Cells | numpy time/frame | pure-python time/frame | |———|——-|—————–|————————| | fhd | ~16k | ~8ms | ~200ms | | 4k | ~65k | ~30ms | ~800ms | | 4k anim | 65k×24fps | ~720ms/s | not viable |
4K animation: pre-render to Grid list, write APNG. Don’t attempt real-time.
Path C — Hybrid (planned, unlocks everything)
When to use: Best quality + generative effects at any resolution. The future default.
Pipeline:
text
→ PIL render at canvas resolution (Path B quality)
→ composite effect image (flame/plasma/turing as PIL layer, aligned to mask)
→ image_to_ascii()
→ Grid (char from luminance, color from effect layer)
Why this is better than either A or B alone:
- Letter shapes are PIL-quality (not density-mapped glyph dict)
- Effect fills are visible as color in the output (flame = red/orange cells inside sharp letter edges)
- Same effect code can run at any resolution — just generate the effect at canvas size
- Effect simulation runs at a lower grid resolution for performance, upsampled to canvas before composite
The compositing model:
def composite_effect(
text_image: PIL.Image, # white text on black
effect_image: PIL.Image, # colorful effect at same size
mode: str = "mask", # "mask" | "multiply" | "screen" | "overlay"
) -> PIL.Image:
"""
mode="mask": effect visible only where text_image is ink (sharp edges)
mode="multiply": effect dims with text coverage (glow bleeds slightly)
mode="screen": effect brightens on ink (neon glow look)
mode="overlay": photoshop-style — high contrast version of multiply
"""
Effect Taxonomy
Effects fall into two families. Both should work on both paths.
Family 1: Field Effects (value at every point in space and time)
These generate a value f(x, y, t) for every position. Works naturally as an image layer.
| Effect | Generator | Notes |
|---|---|---|
| Noise | Perlin/Simplex | scale, octaves, seed |
| Plasma | Sum of sine waves | frequency, phase, palette |
| Flame | Upward noise convection | turbulence, cooling |
| Voronoi | Nearest-cell distance | n_cells, metric, preset |
| Wave | Directional sine | frequency, amplitude, direction |
| Fractal | Mandelbrot/Julia escape | center, zoom, max_iter |
| Gradient | Linear/radial/conic | stops, angle |
Interface:
class FieldEffect:
def sample(self, x: float, y: float, t: float) -> float:
"""Return value in [0.0, 1.0] at normalized position and time."""
def to_image(self, w: int, h: int, t: float, palette: Palette) -> PIL.Image:
"""Rasterize to PIL image for compositing."""
def to_mask_fill(self, mask: GlyphMask, t: float) -> Glyph:
"""Fast path for Path A — sample at cell centers."""
Same effect, three consumers: image composite (Path C), direct fill (Path A), future GPU shader.
Family 2: Simulation Effects (stateful, evolve over time)
These maintain state between frames. Can’t be computed independently per-frame.
| Effect | State | Notes |
|---|---|---|
| Reaction-Diffusion | U, V grids | Gray-Scott equations |
| Turing | Activator/inhibitor grids | FHN model |
| Slime Mold | Agent positions | Physarum simulation |
| Strange Attractor | (x, y, z) trajectories | Lorenz, Rössler |
| L-System | Grammar state | rules, axiom, generations |
| Conway/CA | Cell grid | rule variants |
Interface:
class SimulationEffect:
def __init__(self, w: int, h: int, **params): ...
def step(self, dt: float = 1.0) -> None:
"""Advance simulation by dt. Called once per frame."""
def to_image(self, palette: Palette) -> PIL.Image:
"""Render current state to PIL image."""
def to_mask_fill(self, mask: GlyphMask) -> Glyph:
"""Fast path for Path A."""
def reset(self) -> None: ...
Critical: simulation effects run at a logical grid that is independent of output resolution. A Turing simulation might run on a 120×34 grid regardless of whether the output is FHD or 4K. The to_image() method upsamples to whatever canvas size is needed.
Animation Model
The time parameter t
Every effect and every fill function receives t: float — normalized time in [0.0, 1.0] for a looping animation, or absolute seconds for non-looping. This is the single threading parameter that makes everything animatable.
# Frame loop (simplified)
for frame_idx in range(spec.frame_count):
t = frame_idx / spec.frame_count # [0.0, 1.0)
# Path A
grid = render_glyph(text, font, fill_fn=flame_fill, t=t)
# Path B / C
effect_img = flame_effect.to_image(canvas_w, canvas_h, t, palette)
composited = composite_effect(text_img, effect_img, mode="mask")
grid = image_to_ascii(composited, cell_w, cell_h)
frames.append(Frame(grid, t=t, index=frame_idx, duration_ms=1000//fps))
Bounce / pingpong
# Bounce: t goes 0→1→0, giving a seamless loop with no jump cut
if spec.bounce:
cycle = frame_idx / (spec.frame_count / 2)
t = cycle if cycle <= 1.0 else 2.0 - cycle
Simulation effects and frame count
Simulation effects must run for frame_count steps. They don’t use normalized t — they use step count. The animation system calls effect.step() once per frame, then captures effect.to_image().
CLI Design — Resolution-Aware and Maximally Flexible
Core flags (stable, don’t break)
python justdoit.py TEXT [options]
--font NAME font name (block|slim|figlet/*|ttf:path)
--color NAME ANSI color or rainbow
--fill NAME fill effect name
--gap N char gap between letters
Resolution flags (new)
--display NAME|WxH named preset or explicit e.g. 3840x2160
--cell-w N cell width in px (default: auto from display)
--cell-h N cell height in px (default: auto from display)
--cols N explicit grid width override
--rows N explicit grid height override
Pipeline flags (new)
--pipeline glyph|image|hybrid|auto
glyph: Path A — fast, fill effects, terminal-safe
image: Path B — quality, color, any image source
hybrid: Path C — best quality + generative effects
auto: pick based on other flags (default)
--source text|file:PATH|ai:PROMPT
text: render --text as ASCII art (default)
file: convert image file to ASCII
ai: generate image via AI then convert (future)
Effect flags (new — generalize existing)
--fill NAME fill effect (existing, extended)
--fill-param K=V arbitrary effect parameter (repeatable)
e.g. --fill plasma --fill-param preset=tight --fill-param t_offset=0.5
--composite-mode mask|multiply|screen|overlay
how effect layer composites with text in hybrid mode
--palette NAME|HEX,HEX,...
color palette for effects that use one
Animation flags (new)
--animate enable animation output
--fps N frames per second (default: 24)
--frames N total frame count (default: 48)
--duration MS total duration in ms (overrides --frames if set)
--loop loop the animation (default: true)
--bounce pingpong loop (forward + reverse)
--easing linear|ease-in|ease-out|ease-in-out
--sim-steps N simulation steps per frame for sim effects (default: 1)
--sim-warmup N warmup steps before first frame (default: 0)
Output flags (extend existing)
--output FILE output file (.svg|.png|.apng|.cast|.txt)
--format auto|svg|png|apng|ansi|cast
auto: infer from --output extension
--profile standard|wide|4k gallery profile (existing)
--bg-color HEX background color for image output
--font-size N override computed font size for SVG/PNG output
Auto-resolution logic
--display 4k --pipeline auto --animate
→ pipeline = hybrid (animate + display > fhd → hybrid preferred)
→ canvas = 3840×2160
→ cell_w=8, cell_h=16 → grid = 480×135
→ font_pt = fit_to_fill(text, 480 cols)
→ output format: .apng (animate implies apng unless --format specified)
--display terminal --fill flame
→ pipeline = glyph (terminal + fill → fast path)
→ cols = terminal_size().cols
→ no PIL dependency, runs in actual terminal
--pipeline auto decision table
| Conditions | Chosen path |
|—|—|
| --source file or --source ai | image |
| --display > fhd AND no --fill | image |
| --fill specified AND --animate | glyph (simulation fills) or hybrid (field fills) |
| --display terminal (no px) | glyph |
| --display 4k AND --fill | hybrid |
| everything else | glyph |
Output Formats
| Format | Description | Color | Animation | Terminal-safe |
|---|---|---|---|---|
| ansi | ANSI escape codes to stdout | 24-bit | no | yes |
| txt | plain text, no color | no | no | yes |
| svg | vector, per-char <text> elements |
full RGB | no (multi-file) | no |
| png | rasterized via PIL | full RGB | no | no |
| apng | animated PNG, frame sequence | full RGB | yes | no |
| cast | asciinema v2 format | ANSI | yes | no |
| html | <pre> with <span style=""> |
full RGB | CSS animation | no |
All output formats consume list[Frame] (or Frame for stills). The output layer is a pure function: frames → bytes.
Performance Strategy
Do not over-engineer early. Profile first.
The bottleneck hierarchy (measured, not assumed):
- Simulation step — RD/Turing at large grids. Solution: run at 1/4 res, upsample.
- DB nearest-neighbor search — 65k cells × 95 chars × 6D L2. Solution: numpy vectorized matmul, or precompute a quantized lookup table.
- PIL render — text drawing. Solution: cache the text image; only re-render when text changes.
- File I/O — APNG write. Solution: stream frames, don’t hold all in RAM.
Numpy is optional but practically required above FHD animation
Make it optional with a pure-Python fallback. Raise a warning (not an error) when numpy is missing and the user requests 4K animation. The pure-Python path exists for correctness testing and small renders.
Caching layer
# Cache keys:
# - char_db: (charset, cell_w, cell_h)
# - text_image: (text, font_path, font_pt, canvas_size, fg, bg)
# - effect_image: for field effects — (effect_name, w, h, t, params_hash)
# Cache policy: LRU, max_size configurable, default 64 entries
Extension Points (not optional — design for these from day one)
Custom fills
from justdoit.effects import register_fill
@register_fill("myeffect")
def my_effect(mask: GlyphMask, t: float, **params) -> Glyph:
...
Custom fonts
from justdoit.fonts import register_font
register_font("myfont", my_glyph_dict)
# or
register_font_from_ttf("myfont", "/path/to/font.ttf", size=16)
# or
register_font_from_image("myfont", "/path/to/sprite.png", char_w=8, char_h=16)
Custom output targets
from justdoit.output import register_output
@register_output("myformat", extensions=[".xyz"])
def write_myformat(frames: list[Frame], path: str, **kwargs) -> None:
...
Custom compositors
from justdoit.composite import register_compositor
@register_compositor("dissolve")
def dissolve(base: PIL.Image, effect: PIL.Image, t: float) -> PIL.Image:
...
The Path to “Best on the Planet”
What “best” actually means here
Not most features. Not most effects. Best means:
- Correctness — characters follow glyph contours. Edges are sharp. Colors are accurate.
- Composability — any fill, any font, any source, any output. No hidden incompatibilities.
- Performance — fast enough to be used. Documented limits. Graceful degradation.
- Extensibility — adding a new fill, font, or output is one file and one registration call.
- Portability — zero hard dependencies for core functionality. PIL optional. numpy optional. GPU optional. If you have them, use them. If you don’t, still works.
Near-term (what Claude is building now)
- ✅
core/char_db.py— 6D zone shape DB - ✅
core/image_sampler.py— image → ASCII grid - ✅
core/image_pipeline.py— text/image → Grid entry points - 4K gallery using image pipeline
Medium-term
Celldataclass as universal intermediate typet: floatparameter threaded through all fill functionsFieldEffectbase class +to_image()method on all field effects--display/--pipeline/--animateCLI flags- Hybrid path (Path C) — text image + effect composite
Long-term
SimulationEffectbase class — stateful, resolution-independentFrame+AnimationSpectypes- HTML output format
- WebGL shader export (the pipeline is the product; the substrate is a parameter)
- AI image source (
--source ai:PROMPT)
Invariants — Never Break These
uv run pytestis always green. Every PR. No exceptions.- Core is zero-dependency.
from justdoit import rendermust work with only stdlib. - Both paths co-exist. Never remove the glyph-dict path. It has real advantages.
- All glyphs in a font have the same height. The rasterizer zips rows.
t: floatis normalized [0.0, 1.0] for looping effects. Never assume absolute time.- The Grid is the handoff type. Everything produces Grids. Everything consumes Grids.
- CLI flags are additive. New flags never change the behavior of old flags.
- Patent flag protocol. Anything with no known prior art: stop, flag, don’t push. (See CLAUDE.md)
Last updated: 2026-04-24
Authors: Jonny Galloway, NumberOne