Lesson 5 · 11 min
Gradient descent — how models actually learn
Pick a loss. Compute its gradient. Step downhill. Repeat. That's every neural network ever trained.
The whole picture in one paragraph
A model has parameters θ (often millions of them). For any input it produces an output, and a loss function L(θ) measures how wrong that output is, averaged over the training data. The gradient ∇L(θ) tells us, for each parameter, how much a small increase in that parameter would raise the loss; as a vector, it points in the direction of steepest ascent. We step in the opposite direction, downhill. One update: θ ← θ − η · ∇L(θ). The learning rate η controls the step size.
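To make the update rule concrete, here is a minimal sketch in plain Python. The one-parameter quadratic loss L(θ) = (θ − 3)² is an illustrative toy, not from the lesson; real models have millions of parameters and compute ∇L with backpropagation, but the loop is exactly this.

```python
# Toy loss (assumed for illustration): L(theta) = (theta - 3)^2,
# minimized at theta = 3.
def loss(theta):
    return (theta - 3.0) ** 2

def grad(theta):
    # dL/dtheta, computed analytically for this toy loss.
    return 2.0 * (theta - 3.0)

theta = 0.0  # initial parameter value
eta = 0.1    # learning rate

for step in range(50):
    # One gradient descent step: theta <- theta - eta * grad(L)(theta)
    theta = theta - eta * grad(theta)

print(theta, loss(theta))  # theta is close to 3, loss is close to 0
```

Try changing η: a tiny value (say 0.001) still converges but needs many more steps, while a value above 1.0 makes each step overshoot the minimum and the loss diverges. That tradeoff is why the learning rate matters.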