Lesson 11 · 11 min
Production patterns: caching, fallback, retries
The infrastructure tricks that turn a prompt demo into a real product.
From demo to product
A prompt that works in a notebook needs four production primitives to survive contact with users:
- Prompt caching — many providers cache the long, stable prefix of your prompt. Free latency win.
- Retries with jitter — model APIs throttle and occasionally 5xx. Exponential backoff with jitter is non-negotiable.
- Fallbacks — if Claude is down, fall back to GPT or a smaller model. If structured output fails to parse, try once more, then surface a clean error.
- Cost & latency budgets — every call has a max-tokens, a timeout, and a token budget. Log all three.