Lesson 5 · 11 min
Model routing — the right model for the right call
Calling a frontier model for 'classify this support ticket into one of 8 categories' is a bug. The router pattern that trades 50-100ms of classifier latency for 5-30× cost reduction.
The pattern
A cheap classifier sees the request and decides difficulty. Easy traffic (~80%) goes to a small fine-tuned model or the cheap tier. Hard traffic (~20%) goes to the frontier model.
Implemented well, this trades 50-100ms of classifier latency for a 5-30× bill reduction. The cost gap is so large that even a noticeable quality drop on your distribution can still favor the routed setup.