Lesson 2 · 11 min
LoRA, QLoRA, and PEFT
You don't fine-tune the whole model. You train a tiny adapter and freeze everything else.
Full fine-tuning vs PEFT
Full fine-tuning updates every parameter in the model. For a 7B model that's:
- ~14 GB of FP16 weights (2 bytes per parameter)
- ~14 GB of FP16 gradients
- ~84 GB of optimizer state (Adam in mixed precision keeps three FP32 tensors: momentum, variance, and a master copy of the weights)
- That's ~112 GB before activations. It doesn't fit on a single A100 80GB without sharding or offloading, and anything bigger requires multiple GPUs regardless.
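The back-of-envelope arithmetic behind those numbers can be checked in a few lines. This is a sketch of the standard mixed-precision accounting (FP16 weights and gradients, FP32 Adam state); activation memory, which depends on batch size and sequence length, is deliberately left out:

```python
# Memory estimate for full fine-tuning a 7B-parameter model with Adam
# under mixed precision. Activations are excluded.
params = 7e9

fp16_weights = params * 2           # 2 bytes per param
fp16_grads = params * 2             # gradients, same dtype as weights
fp32_optimizer = params * 4 * 3     # momentum + variance + master weights, 4 bytes each

total_gb = (fp16_weights + fp16_grads + fp32_optimizer) / 1e9
print(fp16_weights / 1e9, fp32_optimizer / 1e9, total_gb)
# → 14.0 84.0 112.0  (GB)
```

Even this conservative estimate is well past a single 80 GB card, which is why full fine-tuning of 7B+ models is a multi-GPU job.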
PEFT (Parameter-Efficient Fine-Tuning) instead freezes the base model and trains only a small number of new parameters. The most popular variant is LoRA (Low-Rank Adaptation), which learns a low-rank update to selected weight matrices.
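The core idea can be sketched in a few lines of PyTorch. This is a minimal illustration, not the `peft` library's API: the pretrained weight W is frozen, and only two small matrices A (r × in) and B (out × r) are trained, so the effective weight becomes W + (alpha/r)·BA. The class name and hyperparameter defaults below are illustrative choices:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update.

    forward(x) = base(x) + (alpha / r) * x @ A.T @ B.T
    Only A and B receive gradients: r * (in + out) trainable params.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze pretrained weights
        # A gets a small random init; B starts at zero so the
        # adapter initially contributes nothing (update starts at W).
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)  # → 65536 16846848 (~0.4% of the layer is trainable)
```

At r=8 on a 4096×4096 projection, the adapter is 65,536 parameters against ~16.8M frozen ones, which is where the memory savings come from: gradients and optimizer state exist only for the adapter.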