
Lesson 2 · 11 min

LoRA, QLoRA, and PEFT

You don't fine-tune the whole model. You train a tiny adapter and freeze everything else.

Full fine-tuning vs PEFT

Full fine-tuning updates every parameter in the model. For a 7B model that's:

  • ~14 GB of FP16 weights (2 bytes per parameter)
  • ~84 GB of optimizer state (Adam in mixed precision: FP32 master copy + momentum + variance, 12 bytes per parameter)
  • ~14 GB of FP16 gradients

That's ~112 GB before activations — it doesn't fit on a single A100 80GB without offloading, and anything bigger requires multi-GPU.
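The arithmetic behind those numbers is worth doing once yourself. A back-of-envelope sketch (assuming the standard mixed-precision Adam recipe: FP16 weights and gradients, FP32 master weights, momentum, and variance):

```python
# Back-of-envelope memory for full fine-tuning a 7B model in mixed precision.
params = 7e9

fp16_weights = params * 2    # 2 bytes per FP16 parameter
fp16_grads   = params * 2    # gradients kept in the same dtype as weights
adam_state   = params * 12   # FP32 master copy + momentum + variance (4 B each)

total_gb = (fp16_weights + fp16_grads + adam_state) / 1e9
print(f"{total_gb:.0f} GB")  # ~112 GB, before activations and KV cache
```

Activation memory comes on top of this and depends on batch size and sequence length, so the real footprint is higher still.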

PEFT (Parameter-Efficient Fine-Tuning) freezes the base model and trains a small number of new parameters. The most popular variant: LoRA (Low-Rank Adaptation).
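LoRA's trick: instead of updating a frozen weight matrix W, it learns a low-rank update B·A (rank r, typically 4–64) and adds it to the output, scaled by alpha/r. A minimal from-scratch sketch in NumPy — class and parameter names are illustrative, not a real library API:

```python
import numpy as np

class LoRALinear:
    """A frozen linear layer plus a trainable low-rank LoRA update B @ A."""

    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen base weight
        self.A = rng.standard_normal((r, d_in)) * 0.01      # trainable, small random init
        self.B = np.zeros((d_out, r))                       # trainable, zero init
        self.scale = alpha / r                              # LoRA scaling factor

    def forward(self, x):
        # y = x W^T + scale * x A^T B^T; B starts at zero, so the
        # adapter is a no-op at initialization and training can't diverge early.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size

layer = LoRALinear(d_in=4096, d_out=4096, r=8)
print(layer.W.size)              # 16,777,216 params if fully fine-tuned
print(layer.trainable_params())  # 65,536 params with LoRA — ~0.4% of full
```

For a 4096×4096 projection (typical of a 7B model's attention layers), rank-8 LoRA trains roughly 0.4% of the parameters, which is why the optimizer state shrinks from tens of gigabytes to megabytes.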