How Orb reuses the LLM KV cache — and where the reasoning toggle forks it

A companion to docs/architecture/kv-cache.md. This walkthrough assumes single-model mode: Director, Writer, and Editor all run on the same endpoint and model. Step through one full turn (Director → Writer → Editor), then across turns, then try the toggles at the end.

Concept

An LLM call

Legend: served from cache (hit); computed now (miss); warm prefix on server.

The inference server

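The server can reuse cached work only for the part of a new prompt that matches an earlier prompt from the very first token onward. Matching stops at the first token that differs; everything after that point is computed fresh, even if the later text is identical.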
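The prefix rule can be illustrated with a minimal sketch. It is not Orb's or any server's actual implementation: `cached_prefix_len` and the segment names are hypothetical, and coarse string segments stand in for tokens (real servers compare token IDs, often in fixed-size blocks).

```python
from typing import Sequence


def cached_prefix_len(cached: Sequence[str], prompt: Sequence[str]) -> int:
    """Count the leading items of `prompt` that match `cached` position by position."""
    n = 0
    for old, new in zip(cached, prompt):
        if old != new:
            break  # first mismatch ends reuse; nothing after it can hit
        n += 1
    return n


# Hypothetical prompt segments, standing in for runs of tokens.
warm = ["system", "lore", "history", "director-instructions"]

# Same opening, different tail: the shared opening is reusable.
same_prefix = ["system", "lore", "history", "writer-instructions"]

# A change at the very top: nothing is reusable, even though the rest matches.
early_change = ["system-v2", "lore", "history", "director-instructions"]

hits = cached_prefix_len(warm, same_prefix)
early_hits = cached_prefix_len(warm, early_change)
print(hits, early_hits)
```

In this sketch the first prompt reuses three segments and recomputes one, while the second reuses none, which is why anything that varies between calls is cheapest when it sits as late in the prompt as possible.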