Lesson 5 · 11 min
Regression testing in CI
An eval suite only prevents regressions if it runs on every PR. This lesson covers the CI integration pattern, threshold-setting discipline, and how to handle the distinction between prompt changes and code changes.
The eval suite without CI is just documentation
If you run evals manually before release, you've built a checklist — not a gate. Engineers skip checklists under deadline pressure. Evals in CI run on every PR, with no exceptions and no skipping.
The goal: a prompt change or model upgrade that drops your eval score below threshold blocks the PR before a human reviews it. Same workflow as a failing unit test.
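A minimal sketch of such a gate, assuming each eval case reduces to a pass/fail result and the suite is scored by pass rate. The names `EvalResult`, `pass_rate`, and `gate` are hypothetical, and the threshold values are placeholders rather than recommendations. The one part that carries over to any CI system is the exit code: a non-zero exit fails the job, which is what blocks the PR.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Outcome of one eval case (hypothetical structure)."""
    case_id: str
    passed: bool


def pass_rate(results: list[EvalResult]) -> float:
    """Fraction of eval cases that passed."""
    if not results:
        # An empty run should fail loudly rather than pass silently.
        raise ValueError("no eval results to score")
    return sum(r.passed for r in results) / len(results)


def gate(results: list[EvalResult], threshold: float) -> int:
    """Return a process exit code: 0 if the score meets the threshold, else 1."""
    score = pass_rate(results)
    ok = score >= threshold
    status = "PASS" if ok else "FAIL"
    print(f"eval pass rate {score:.3f} vs threshold {threshold:.3f}: {status}")
    return 0 if ok else 1
```

In a CI job, the entry point would load the results from the eval run and finish with something like `sys.exit(gate(results, threshold))`, so a score below threshold fails the check the same way a failing unit test does.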