Lesson 6 · 11 min

Cost optimization that actually moves the needle

Roughly 90% of LLM cost wins come from five patterns. Skip the obscure ones until you've done all five.

The five patterns ranked by impact

1. Use a smaller model when you can

Most production requests are easy. Tier your traffic: Haiku/GPT-4o-mini for ~70%, Sonnet/GPT-4o for ~25%, Opus for the hardest ~5%. Routing this way is often 5-10× cheaper with no measurable quality loss.
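A minimal routing sketch. The model IDs, length thresholds, and the "long or tool-using means hard" heuristic are all illustrative assumptions; in production you'd route on a cheap classifier or request metadata, not prompt length alone.

```python
# Hypothetical tier table -- check your provider's current model IDs.
TIERS = {
    "small":  "claude-3-5-haiku-latest",    # ~70% of traffic
    "medium": "claude-sonnet-4-20250514",   # ~25%
    "large":  "claude-opus-4-20250514",     # ~5%
}

def pick_model(prompt: str, needs_tools: bool = False) -> str:
    """Route each request to the cheapest model that can plausibly handle it."""
    # Assumption: tool use or a very long prompt signals a hard request.
    if needs_tools or len(prompt) > 8000:
        return TIERS["large"]
    if len(prompt) > 2000:
        return TIERS["medium"]
    return TIERS["small"]

print(pick_model("Summarize this ticket in one line."))  # -> small tier
```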

2. Prompt caching

Providers cache stable prompt prefixes server-side. Put the system prompt and few-shot examples first and the variable user input last, so the prefix stays byte-identical across requests. Cache reads bill at a fraction of the base input price: up to a 90% cut on the cached prefix (default ~5-min TTL).
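A sketch using Anthropic's `cache_control` marker on the system block. The prompt text here is a placeholder; note that providers enforce a minimum cacheable prefix length, so a real prefix must be much longer than this for the cache to engage.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Stable content -- keep it byte-identical across requests so the cache hits.
FEW_SHOT_EXAMPLES = "Q: example question\nA: example answer\n"  # placeholder
SYSTEM_PREFIX = "You are a support agent for Acme.\n\n" + FEW_SHOT_EXAMPLES

user_question = "How do I reset my password?"  # the variable part

resp = client.messages.create(
    model="claude-3-5-haiku-latest",
    max_tokens=512,
    # cache_control marks this prefix cacheable; reads within the ~5-min TTL
    # are billed at a fraction of the base input price.
    system=[{
        "type": "text",
        "text": SYSTEM_PREFIX,
        "cache_control": {"type": "ephemeral"},
    }],
    # Variable user input goes last so it never invalidates the cached prefix.
    messages=[{"role": "user", "content": user_question}],
)
print(resp.content[0].text)
```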

3. Output token reduction

Output tokens cost 4-5× as much as input tokens on most providers. Tighter prompts that elicit shorter outputs are therefore massive wins: adding "Be concise. No preamble." alone often cuts 15-25% of output tokens.
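Both levers in one call, sketched below: a concise-output instruction plus a hard `max_tokens` cap. The model ID, cap, and prompt are illustrative.

```python
import anthropic

client = anthropic.Anthropic()

resp = client.messages.create(
    model="claude-3-5-haiku-latest",
    # Hard ceiling on the expensive side of the bill: since output tokens
    # cost ~4-5x input tokens, capping max_tokens bounds worst-case spend.
    max_tokens=200,
    system="Be concise. No preamble. Answer in at most three sentences.",
    messages=[{"role": "user", "content": "Why is my Kubernetes pod Pending?"}],
)
print(resp.content[0].text)
```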

4. Batch APIs

Anthropic's and OpenAI's batch APIs complete within 24h at 50% off. Anything that doesn't need a real-time answer → batch: evals, backfills, bulk summarization.
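A sketch against OpenAI's batch API: write one JSON line per request, upload the file, then create the batch. The `documents` list, `custom_id` scheme, and per-request settings are placeholders.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

documents = ["first doc text...", "second doc text..."]  # placeholder corpus

# One JSON line per request; custom_id lets you match results back up later.
requests = [
    {
        "custom_id": f"doc-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
            "max_tokens": 150,
        },
    }
    for i, doc in enumerate(documents)
]
with open("batch.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests))

batch_file = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # the 24h window is what earns the 50% discount
)
print(batch.id, batch.status)  # poll later; results arrive as an output file
```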

5. Self-hosting (above ~$500/day)

Above roughly $500/day in API spend, self-hosting an open-weight model on rented GPUs beats per-token pricing. Below that threshold, the ops overhead wipes out the savings.
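A back-of-envelope break-even sketch. Every number here is an assumption; plug in your own API bill, GPU rental rate, fleet size, and a realistic ops-overhead factor.

```python
# All figures below are assumptions for illustration, not benchmarks.
api_spend_per_day = 500.0  # $/day on per-token API pricing (the threshold above)
gpu_hourly_rate = 2.50     # $/hr for one rented GPU (assumed)
gpus_needed = 4            # assumed fleet size to cover your peak load
ops_overhead = 0.30        # +30% for eng time, monitoring, retries (assumed)

self_host_per_day = gpu_hourly_rate * 24 * gpus_needed * (1 + ops_overhead)
print(f"self-hosting: ${self_host_per_day:.0f}/day vs API: ${api_spend_per_day:.0f}/day")
print("self-host wins" if self_host_per_day < api_spend_per_day else "stay on the API")
```

With these (assumed) numbers self-hosting lands around $312/day against a $500/day API bill, which is why the threshold sits near that mark; halve the utilization and the API wins again.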