How Orb reuses the LLM KV cache — and where the reasoning toggle forks it
A companion to docs/architecture/kv-cache.md. This walkthrough covers single-model mode: the Director, Writer, and Editor all run on the same endpoint and model. It steps through one full turn (Director → Writer → Editor), then follows the cache across turns, and ends with the reasoning toggles.
Concept
Reasoning can be switched on or off for each pass independently, and that per-pass setting determines which passes end up sharing a cache.
Figure: an LLM call sent to the inference server. Legend: served from cache (hit); computed now (miss); warm prefix on server.
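The server reuses cached work only for the prefix a new prompt shares with what it has already processed. Matching runs from the top down: it starts at the first token and stops at the first token that differs, so everything after that point is recomputed.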
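The prefix rule can be sketched in a few lines. This is a minimal illustration, not Orb's code or any inference server's API: it assumes prompts are plain token lists, that the server holds a single warm prompt, and the function names and tokens are hypothetical.

```python
from typing import Sequence, Tuple


def cached_prefix_len(warm: Sequence[str], prompt: Sequence[str]) -> int:
    """Count leading tokens of `prompt` that match the warm prefix.

    Matching goes from the first token onward and stops at the first
    mismatch; tokens after a mismatch are never reused, even if they match.
    """
    n = 0
    for cached_tok, new_tok in zip(warm, prompt):
        if cached_tok != new_tok:
            break
        n += 1
    return n


def split_hit_miss(warm: Sequence[str], prompt: Sequence[str]) -> Tuple[int, int]:
    """Return (tokens served from cache, tokens computed now) for `prompt`."""
    hit = cached_prefix_len(warm, prompt)
    return hit, len(prompt) - hit


if __name__ == "__main__":
    # Hypothetical warm prefix left on the server by an earlier pass.
    warm = ["<sys>", "story", "so", "far", "Director:"]

    # Same shared context, different final instruction: only the tail misses.
    writer = ["<sys>", "story", "so", "far", "Writer:"]
    print(split_hit_miss(warm, writer))  # (4, 1)

    # An early change invalidates everything after it, even matching tokens.
    changed = ["<sys>", "STORY", "so", "far", "Director:"]
    print(split_hit_miss(warm, changed))  # (1, 4)
```

Real servers compare token IDs rather than strings, and some match in fixed-size blocks, so actual hit counts can be lower than this token-by-token count.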