Lesson 5 · 11 min

Regression testing in CI

An eval suite only prevents regressions if it runs on every PR. This lesson covers the CI integration pattern, threshold-setting discipline, and how to handle the distinction between prompt changes and code changes.

The eval suite without CI is just documentation

If you run evals manually before release, you've built a checklist, not a gate. Engineers skip checklists under deadline pressure. Evals in CI run on every PR, with no exceptions and no skipping.

The goal: a prompt change or model upgrade that drops your eval score below threshold blocks the PR before a human reviews it. Same workflow as a failing unit test.
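As a minimal sketch of that gate, assume each eval case reduces to a pass/fail boolean and the suite score is the pass rate. The names `PASS_THRESHOLD`, `pass_rate`, and `gate_exit_code` are hypothetical, and the 0.90 value is a placeholder rather than a recommendation. The idea is only that the script's exit code is what CI reads, exactly as it would for a unit test runner.

```python
from typing import Sequence

# Hypothetical threshold for illustration; derive yours from baseline runs.
PASS_THRESHOLD = 0.90


def pass_rate(results: Sequence[bool]) -> float:
    """Return the fraction of eval cases that passed.

    An empty suite scores 0.0 so that a misconfigured run fails the gate
    instead of passing silently.
    """
    if not results:
        return 0.0
    return sum(results) / len(results)


def gate_exit_code(results: Sequence[bool], threshold: float = PASS_THRESHOLD) -> int:
    """Return 0 if the pass rate meets the threshold, 1 otherwise.

    CI systems generally treat a non-zero process exit code as a failed
    step, which is what blocks the PR.
    """
    return 0 if pass_rate(results) >= threshold else 1
```

In a CI job, the entry point would load the results from the eval run and call `sys.exit(gate_exit_code(results))`, so a score below threshold fails the step the same way a failing test would.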