Lesson 4 · 10 min
Deterministic evals — structured output and tool use
When your model must produce JSON, call the right tool, or extract a specific field, LLM-as-judge is overkill. Deterministic evals are faster, cheaper, and more reliable. Below are the patterns that cover 80% of use cases.
When deterministic beats probabilistic
If there's a correct answer you can compute, use it. LLM-as-judge costs 5–20× more per eval and introduces noise. Deterministic evals run in milliseconds and never hallucinate a score.
The cases where deterministic wins:
- Structured output — does the JSON parse? Are required fields present? Are values in range?
- Classification — does the label match ground truth? Precision / recall / F1 are deterministic.
- Tool use — did the model call the right tool? With the right arguments?
- Extraction — is the extracted entity in the source text? Exact match or normalized match.
- Format constraints — is the response under N tokens? In the required language?
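The first pattern above can be sketched as a small scoring function. This is a minimal example, not a production harness: the schema (a `name` field plus a `priority` integer constrained to 1–5) is a hypothetical placeholder, and the three boolean checks mirror the bullet list — does it parse, are required fields present, are values in range.

```python
import json

# Hypothetical schema for illustration: required keys and a range constraint.
REQUIRED_FIELDS = {"name", "priority"}

def eval_structured_output(raw: str) -> dict:
    """Run three deterministic checks on a model response:
    parses as JSON, required fields present, values in range."""
    result = {"parses": False, "has_fields": False, "in_range": False}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return result  # fail fast: later checks are meaningless
    result["parses"] = True
    if isinstance(data, dict) and REQUIRED_FIELDS <= data.keys():
        result["has_fields"] = True
        # Hypothetical constraint: priority must be an int from 1 to 5.
        p = data["priority"]
        result["in_range"] = isinstance(p, int) and 1 <= p <= 5
    return result

print(eval_structured_output('{"name": "fix login bug", "priority": 2}'))
```

Each check runs in microseconds and returns the same score every time, which is exactly the reliability argument above. In practice you would aggregate these booleans over a dataset into pass rates per check, so a drop in `parses` versus `in_range` points at different failure modes.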