Lesson 11 · 11 min

Production patterns: caching, fallback, retries

The infrastructure tricks that turn a prompt demo into a real product.

From demo to product

A prompt that works in a notebook needs four production primitives to survive contact with users:

  1. Prompt caching: many providers cache the long, stable prefix of your prompt, so put static content (system prompt, instructions, examples) first and variable content last. Cache hits cut latency and often cost.
  2. Retries with jitter: model APIs throttle requests and occasionally return 5xx errors. Exponential backoff with jitter is non-negotiable.
  3. Fallbacks: if Claude is down, fall back to GPT or a smaller model. If structured output fails to parse, try once more, then surface a clean error.
  4. Cost & latency budgets: set a max-tokens limit, a timeout, and a token budget on every call. Log all three.
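
Items 2 and 3 can be sketched together in a few lines of Python. This is a minimal illustration, not any provider's SDK: `TransientError`, `call_with_retries`, and `call_with_fallback` are hypothetical names, and each provider is assumed to be a plain callable that takes a prompt and either returns text or raises `TransientError` on a throttle or 5xx response. The backoff uses "full jitter", one common variant in which each delay is drawn uniformly between zero and an exponentially growing cap.

```python
import random
import time
from typing import Callable, Sequence, Tuple


class TransientError(Exception):
    """Hypothetical stand-in for a throttling (429) or 5xx response."""


def call_with_retries(
    fn: Callable[[], str],
    max_attempts: int = 4,
    base: float = 0.5,
    cap: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fn, retrying transient failures with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            # Delay ceiling doubles each attempt (base, 2*base, 4*base, ...) up to cap;
            # the actual delay is a uniform draw below that ceiling.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise ValueError("max_attempts must be at least 1")


def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
    **retry_kwargs,
) -> Tuple[str, str]:
    """Try each (name, callable) in order; return (provider_name, response)."""
    last_error = None
    for name, call in providers:
        try:
            return name, call_with_retries(lambda call=call: call(prompt), **retry_kwargs)
        except TransientError as exc:
            last_error = exc  # this provider is exhausted; move to the next one
    raise RuntimeError("all providers failed") from last_error
```

In use, `providers` would be an ordered list such as `[("primary", call_primary), ("backup", call_backup)]`, where each callable wraps a real client call with its own timeout and max-tokens settings. Passing `sleep` as a parameter keeps the retry logic testable without real waiting.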