Lesson 8 · 9 min
Catastrophic forgetting
The fine-tune learns the new task — and forgets things it used to do well. Here's how to avoid it.
What it looks like
You fine-tune on legal contract extraction. The model becomes great at extraction. Then a teammate tries to use it for casual conversation — and it sounds like a robot, refuses to chat, or breaks into legal-ese mid-sentence.
This is catastrophic forgetting: the new task displaces the model's general capabilities. It's especially common when:
- The dataset is narrow (one task, one tone, one domain)
- Training runs too long (more epochs = more displacement)
- The learning rate is too high
- You're doing full fine-tuning rather than LoRA (LoRA updates a small set of adapter weights, so it's much less prone)
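You can see the mechanism in miniature without any ML framework. The sketch below (pure Python, toy data invented for illustration) fits a one-weight linear model to task A, then fine-tunes it on task B alone. Because nothing in the second phase anchors the weight to task A, performance on A collapses: the same displacement that hits a fine-tuned LLM, in one parameter.

```python
# Toy demonstration of catastrophic forgetting: a one-weight linear model
# trained on task A, then fine-tuned on task B only, forgets task A.

def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.01, epochs=200):
    """Plain gradient descent on MSE with respect to w."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

task_a = [(x, 2.0 * x) for x in range(1, 6)]   # task A: y = 2x
task_b = [(x, -1.0 * x) for x in range(1, 6)]  # task B: y = -x

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)  # near zero: the model has learned task A

w = train(w, task_b)            # "fine-tune" on task B, no task A data mixed in
loss_a_after = mse(w, task_a)   # task A error explodes: forgetting

print(f"task A loss before fine-tune: {loss_a_before:.6f}")
print(f"task A loss after  fine-tune: {loss_a_after:.2f}")
```

The fix hinted at by the causes above follows the same logic in reverse: mix general data back into the fine-tuning set, train for fewer epochs at a lower learning rate, or use LoRA so most of the original weights never move.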