Lesson 3 · 12 min
The gateway layer: routing, rate limiting, and fallbacks
The gateway is the control plane for all LLM traffic: it routes requests to the right model, enforces rate limits, tracks spend, and automatically fails over when a provider is down.
Why a gateway?
Calling an LLM provider directly from application code creates problems that compound as traffic grows:
- No centralized logging — you can't reconstruct what the model actually received when investigating an incident
- No rate limit visibility — you hit the provider's limit and get 429s in production
- Provider lock-in — switching from GPT-4o to Claude requires touching every API call
- No spend control — a bad deployment can run up a $10k bill before anyone notices
A gateway sits between your application and the provider, handling all of these cross-cutting concerns in one place.
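The cross-cutting concerns above can be sketched in a single wrapper. This is a minimal illustration, not a production gateway: the provider tuples, per-call costs, and the one-second rate window are all assumptions made up for this example, and a real deployment would use persistent logging and provider SDKs rather than plain callables.

```python
import time

class ProviderError(Exception):
    """Raised when a provider call fails or the gateway refuses a request."""
    pass

class Gateway:
    """Minimal gateway sketch: routes to providers in priority order,
    fails over on error, caps requests per second, tracks spend,
    and keeps a centralized request log."""

    def __init__(self, providers, max_requests_per_sec=5):
        # providers: ordered list of (name, call_fn, cost_per_call) —
        # hypothetical shape chosen for this sketch
        self.providers = providers
        self.max_rps = max_requests_per_sec
        self.window_start = time.monotonic()
        self.window_count = 0
        self.spend = 0.0   # running total across all providers
        self.log = []      # centralized log of (provider, prompt, reply)

    def _check_rate(self):
        now = time.monotonic()
        if now - self.window_start >= 1.0:      # start a new 1-second window
            self.window_start, self.window_count = now, 0
        if self.window_count >= self.max_rps:
            raise ProviderError("rate limit exceeded at the gateway")
        self.window_count += 1

    def complete(self, prompt):
        self._check_rate()
        for name, call, cost in self.providers:
            try:
                reply = call(prompt)            # provider call may raise
                self.spend += cost
                self.log.append((name, prompt, reply))
                return reply
            except ProviderError:
                continue                        # fall over to the next provider
        raise ProviderError("all providers failed")

def flaky_provider(prompt):
    """Stand-in for a provider that is currently down."""
    raise ProviderError("503 service unavailable")

def stable_provider(prompt):
    """Stand-in for a healthy backup provider."""
    return "ok: " + prompt

gw = Gateway([("primary", flaky_provider, 0.01),
              ("backup", stable_provider, 0.02)])
print(gw.complete("hello"))   # fails over to backup
```

Because every request passes through `complete`, the log, the spend counter, and the rate limiter see all traffic in one place, and swapping providers is a change to the `providers` list rather than to every call site.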