Lesson 8 · 9 min
Catastrophic forgetting
The fine-tune learns the new task — and forgets things it used to do well. Here's how to avoid it.
What it looks like
You fine-tune on legal contract extraction. The model becomes great at extraction. Then a teammate tries to use it for casual conversation — and it sounds like a robot, refuses to chat, or breaks into legal-ese mid-sentence.
This is catastrophic forgetting: the new task displaces the model's general capabilities. It's especially common when:
- The dataset is narrow (one task, one tone, one domain)
- Training runs too long (more epochs = more displacement)
- The learning rate is too high
- You're doing full fine-tuning rather than LoRA (LoRA updates a small set of adapter weights, so it's much less prone)
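You can see the mechanism in miniature without any ML framework. The sketch below (pure Python, toy data invented for illustration) fits a one-weight linear model to task A, then fine-tunes it on task B alone. Because nothing in the second phase anchors the weight to task A, performance on A collapses: the same displacement that hits a fine-tuned LLM, in one parameter.

```python
# Toy demonstration of catastrophic forgetting: a one-weight linear model
# trained on task A, then fine-tuned on task B only, forgets task A.

def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.01, epochs=200):
    """Plain gradient descent on MSE with respect to w."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

task_a = [(x, 2.0 * x) for x in range(1, 6)]   # task A: y = 2x
task_b = [(x, -1.0 * x) for x in range(1, 6)]  # task B: y = -x

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)  # near zero: the model has learned task A

w = train(w, task_b)            # "fine-tune" on task B, no task A data mixed in
loss_a_after = mse(w, task_a)   # task A error explodes: forgetting

print(f"task A loss before fine-tune: {loss_a_before:.6f}")
print(f"task A loss after  fine-tune: {loss_a_after:.2f}")
```

The fix hinted at by the causes above follows the same logic in reverse: mix general data back into the fine-tuning set, train for fewer epochs at a lower learning rate, or use LoRA so most of the original weights never move.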