Lesson 7 · 10 min
Self-hosting math — when to leave the API
At some point an open-source model on your own GPU beats API pricing. This lesson covers the breakeven calculation that tells you when, and the honest accounting of operational cost that keeps the decision sane.
The economics
A frontier-tier API call with a typical shape (800 input tokens, 200 output tokens) costs ~$0.0054. At 1M calls/day that's $5,400/day, or ~$1.97M/year.
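The same workload on a fine-tuned 8B model on a single H100 ($2/hour spot, so $48/day) at vLLM throughput of ~80-120 RPS gives you 7-10M calls/day of capacity. Per-call cost at full utilization: ~$0.000005-0.000007, about three orders of magnitude cheaper. At the 1M calls/day above, the GPU sits mostly idle and each call costs ~$0.00005, still roughly 100x cheaper. Annualized: ~$17.5k.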
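The arithmetic behind both figures fits in a few lines. This is a minimal sketch, not a pricing tool: the per-token prices ($3 and $15 per million tokens) are assumptions chosen because they reproduce the ~$0.0054 per-call figure, and the function names are hypothetical. Swap in your provider's real rates and your measured throughput.

```python
# Assumed prices in USD per million tokens; they reproduce the ~$0.0054
# per-call figure from the lesson. Replace with your provider's actual rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

SECONDS_PER_DAY = 86_400


def api_cost_per_call(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one API call given its token counts."""
    total = input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000


def gpu_capacity_per_day(requests_per_second: float) -> float:
    """Calls per day one GPU can serve at a sustained request rate."""
    return requests_per_second * SECONDS_PER_DAY


def gpu_cost_per_call(hourly_rate: float, calls_per_day: float) -> float:
    """USD per call when the GPU's daily rental is spread over the calls served."""
    return hourly_rate * 24 / calls_per_day


if __name__ == "__main__":
    per_call_api = api_cost_per_call(800, 200)
    print(f"API per call:           ${per_call_api:.4f}")
    print(f"API per year at 1M/day: ${per_call_api * 1_000_000 * 365:,.0f}")
    for rps in (80, 120):
        capacity = gpu_capacity_per_day(rps)
        print(f"{rps} RPS -> {capacity:,.0f} calls/day, "
              f"${gpu_cost_per_call(2.00, capacity):.7f} per call at full load")
    print(f"GPU per call at 1M/day: ${gpu_cost_per_call(2.00, 1_000_000):.6f}")
    print(f"GPU per year:           ${2.00 * 24 * 365:,.0f}")
```

Note how much the per-call GPU cost depends on utilization: the rental is a fixed cost, so the fewer calls you serve, the more each one costs.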
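The breakeven point depends on your traffic. On GPU rental alone, a $48/day H100 pays for itself at roughly 9k requests/day ($48 / $0.0054). Once redundancy and operational overhead are counted, with Sonnet-tier APIs and typical workloads the API stops winning on cost somewhere around 30k-100k requests/day.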
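One way to sanity-check a breakeven claim is to work it in both directions: from a fixed daily cost to a volume, and from a volume back to the fixed cost it implies. The sketch below assumes self-hosting is a purely fixed daily cost (reasonable while traffic fits on the hardware you already pay for) and reuses the $0.0054 per-call and $2/hour figures from above. The function names are hypothetical, and reading the 30k-100k range as "GPU plus overhead" is an interpretation, not something the figures prove.

```python
API_COST_PER_CALL = 0.0054   # USD, from the 800-input / 200-output example
GPU_DAILY_COST = 2.00 * 24   # USD, one H100 at $2/hour


def breakeven_calls_per_day(fixed_daily_cost: float, api_cost_per_call: float) -> float:
    """Daily volume at which API spend equals a fixed self-hosting cost."""
    return fixed_daily_cost / api_cost_per_call


def implied_fixed_daily_cost(calls_per_day: float, api_cost_per_call: float) -> float:
    """Fixed daily cost that would make a given volume the breakeven point."""
    return calls_per_day * api_cost_per_call


if __name__ == "__main__":
    gpu_only = breakeven_calls_per_day(GPU_DAILY_COST, API_COST_PER_CALL)
    print(f"Breakeven on GPU rental alone: {gpu_only:,.0f} calls/day")
    for volume in (30_000, 100_000):
        fixed = implied_fixed_daily_cost(volume, API_COST_PER_CALL)
        print(f"Breakeven at {volume:,} calls/day implies "
              f"${fixed:,.0f}/day of total fixed cost")
```

Under these assumptions, a breakeven in the 30k-100k range corresponds to a total fixed cost of about $162-$540/day, several times the GPU rental itself. Plug in your own estimate for standby capacity, monitoring, and engineering time to see where your breakeven lands.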