
Lesson 2 · 11 min

LoRA, QLoRA, and PEFT

You don't fine-tune the whole model. You train a tiny adapter and freeze everything else.

Full fine-tuning vs PEFT

Full fine-tuning updates every parameter in the model. For a 7B model that's:

  • ~14 GB of FP16 weights (2 bytes per parameter)
  • ~84 GB of optimizer state (Adam in mixed precision: FP32 master copy + momentum + variance, 12 bytes per parameter)
  • ~14 GB of FP16 gradients

That's ~112 GB before activations — it doesn't fit on a single A100 80GB without offloading, and anything bigger requires multi-GPU.
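The arithmetic behind those numbers is worth doing once yourself. A back-of-envelope sketch (assuming the standard mixed-precision Adam recipe: FP16 weights and gradients, FP32 master weights, momentum, and variance):

```python
# Back-of-envelope memory for full fine-tuning a 7B model in mixed precision.
params = 7e9

fp16_weights = params * 2    # 2 bytes per FP16 parameter
fp16_grads   = params * 2    # gradients kept in the same dtype as weights
adam_state   = params * 12   # FP32 master copy + momentum + variance (4 B each)

total_gb = (fp16_weights + fp16_grads + adam_state) / 1e9
print(f"{total_gb:.0f} GB")  # ~112 GB, before activations and KV cache
```

Activation memory comes on top of this and depends on batch size and sequence length, so the real footprint is higher still.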

PEFT (Parameter-Efficient Fine-Tuning) freezes the base model and trains a small number of new parameters. The most popular variant: LoRA (Low-Rank Adaptation).
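LoRA's trick: instead of updating a frozen weight matrix W, it learns a low-rank update B·A (rank r, typically 4–64) and adds it to the output, scaled by alpha/r. A minimal from-scratch sketch in NumPy — class and parameter names are illustrative, not a real library API:

```python
import numpy as np

class LoRALinear:
    """A frozen linear layer plus a trainable low-rank LoRA update B @ A."""

    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen base weight
        self.A = rng.standard_normal((r, d_in)) * 0.01      # trainable, small random init
        self.B = np.zeros((d_out, r))                       # trainable, zero init
        self.scale = alpha / r                              # LoRA scaling factor

    def forward(self, x):
        # y = x W^T + scale * x A^T B^T; B starts at zero, so the
        # adapter is a no-op at initialization and training can't diverge early.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size

layer = LoRALinear(d_in=4096, d_out=4096, r=8)
print(layer.W.size)              # 16,777,216 params if fully fine-tuned
print(layer.trainable_params())  # 65,536 params with LoRA — ~0.4% of full
```

For a 4096×4096 projection (typical of a 7B model's attention layers), rank-8 LoRA trains roughly 0.4% of the parameters, which is why the optimizer state shrinks from tens of gigabytes to megabytes.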