Lesson 6 · 11 min
Cost optimization that actually moves the needle
90% of LLM cost wins come from five patterns. Skip the obscure tricks until you've done all five.
The five patterns ranked by impact
1. Use a smaller model when you can
Most production requests are easy. Tier your traffic: Haiku/GPT-4o-mini for ~70%, Sonnet/GPT-4o for ~25%, Opus for the ~5% that genuinely need deep reasoning. Often 5-10× cheaper with no measurable quality loss.
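A minimal routing sketch in Python. The length threshold and the needs_reasoning flag are assumptions you'd replace with rules or a cheap classifier tuned on your own traffic; the tier names are placeholders for whatever models you actually run.

```python
# Illustrative tier labels -- substitute real model IDs.
CHEAP, MID, TOP = "claude-haiku", "claude-sonnet", "claude-opus"

def pick_model(prompt: str, needs_reasoning: bool = False) -> str:
    """Route the ~70% of easy traffic to the cheap tier, escalate the rest."""
    if needs_reasoning:          # ~5%: multi-step analysis, hard synthesis
        return TOP
    if len(prompt) > 4_000:      # ~25%: long context tends to mean harder work
        return MID
    return CHEAP                 # ~70%: extraction, classification, short Q&A
```

A common refinement: retry failed or low-confidence cheap-tier responses on the mid tier, so a routing mistake costs latency rather than quality.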
2. Prompt caching
Provider-side caching of stable prompt prefixes. Put the system prompt and few-shot examples first, the variable user input last. Cache hits cut up to 90% of the cost of the cached prefix (Anthropic's default TTL is 5 minutes, refreshed on each hit).
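A sketch of Anthropic's explicit-marker variant (OpenAI caches long stable prefixes automatically, at a smaller discount). The system prompt text and model ID are placeholders; note that most models only cache prefixes of roughly 1,024 tokens or more.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

STABLE_PREFIX = "...long system prompt + few-shot examples..."  # placeholder

response = client.messages.create(
    model="claude-sonnet-4-0",  # substitute your model ID
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": STABLE_PREFIX,
            # Everything up to this marker gets cached (~5-min TTL,
            # refreshed on each hit).
            "cache_control": {"type": "ephemeral"},
        }
    ],
    # Only this part varies per request, so it sits after the cached prefix.
    messages=[{"role": "user", "content": "the variable user input"}],
)
print(response.usage)  # cache_creation_input_tokens vs cache_read_input_tokens
```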
3. Output token reduction
Output tokens typically cost 4-5× as much as input tokens. Tighter prompts that elicit shorter outputs are massive wins: "Be concise. No preamble." alone cuts 15-25% of output tokens.
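Back-of-envelope math showing why output weighs so heavily. Prices here are illustrative Sonnet-class rates ($3/M input, $15/M output) as of writing; check current pricing.

```python
IN_PRICE, OUT_PRICE = 3.00 / 1e6, 15.00 / 1e6  # $/token, illustrative

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * IN_PRICE + output_tokens * OUT_PRICE

verbose = request_cost(1_500, 800)  # chatty answer with preamble and recap
concise = request_cost(1_510, 400)  # ~10 extra prompt tokens halve the output
print(f"${verbose:.4f} -> ${concise:.4f}  ({1 - concise / verbose:.0%} saved)")
# $0.0165 -> $0.0105  (36% saved)
```

Setting a hard max_tokens cap alongside the instruction also protects you from pathological long outputs.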
4. Batch APIs
Anthropic's and OpenAI's batch APIs complete within 24h at 50% off per-token pricing. Anything that doesn't need a real-time response → batch.
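A sketch of the OpenAI Batch API flow (Anthropic's Message Batches API is analogous): write one request per JSONL line, upload the file, submit with a 24h window. The docs list and summarize prompt are placeholders.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY
docs = ["first document...", "second document..."]  # placeholder corpus

# One request per line; custom_id lets you match results back to inputs.
with open("requests.jsonl", "w") as f:
    for i, doc in enumerate(docs):
        f.write(json.dumps({
            "custom_id": f"doc-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
                "max_tokens": 200,
            },
        }) + "\n")

batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # the 50%-off tier
)
print(batch.id, batch.status)  # poll until "completed", then fetch output_file_id
```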
5. Self-hosting (above ~$500/day)
Above roughly $500/day in API spend, serving an open-weights model on rented GPUs beats per-token API pricing. Below that, the ops overhead wipes out the savings.
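A break-even sketch with made-up but plausible numbers; plug in your own GPU rates, throughput, and blended API price. It also shows why the threshold is fuzzy: the throughput assumption dominates.

```python
# All figures are illustrative assumptions, not quotes.
api_price_per_mtok = 10.00   # blended $/M tokens at your current model mix
gpu_cost_per_hour = 2.50     # one rented H100/A100-class GPU
gpus = 4
tokens_per_sec = 2_000       # sustained throughput of your model + serving stack

daily_gpu_cost = gpu_cost_per_hour * gpus * 24   # $240/day
daily_mtok = tokens_per_sec * 86_400 / 1e6       # ~173 M tokens/day
self_host_price = daily_gpu_cost / daily_mtok    # ~$1.39/M tokens

print(f"self-hosted ${self_host_price:.2f}/Mtok vs API ${api_price_per_mtok:.2f}/Mtok")
print(f"GPUs cost ${daily_gpu_cost:.0f}/day -- below that in API spend, don't bother")
```

Note what this comparison charges nothing for: on-call, upgrades, and capacity planning. That is exactly the ops overhead that eats the savings at low volume.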