Lesson 6 · 10 min
Sampling — how the next token gets picked
The model outputs a distribution. Picking from it is a separate (and tunable) step.
The output isn't a token — it's a distribution
After all the attention and FFN layers, the final layer projects the hidden state to vocabulary-size logits — one score per token. Softmax over those logits turns them into a probability distribution over every possible next token.
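A minimal sketch of that last step, using a made-up 4-token vocabulary and arbitrary logits (the max-subtraction is a standard trick for numerical stability):

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating so exp() can't overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(logits)

print(probs)       # four probabilities, highest logit → highest probability
print(sum(probs))  # sums to 1.0 (up to floating-point error)
```

Note that softmax preserves the ordering of the logits: the token with the biggest logit always gets the biggest probability.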
Which one do we actually pick? That's sampling, and it's where you have knobs.
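To make the knobs concrete, here's a sketch of two common choices: greedy decoding (always take the argmax) versus temperature sampling, where the logits are divided by a temperature before softmax — below 1 sharpens the distribution, above 1 flattens it. The vocabulary and logits are toy values, not from a real model:

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Temperature rescales logits before softmax: <1 sharpens, >1 flattens.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    # Draw one token index according to the distribution (inverse CDF).
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1  # guard against floating-point rounding

logits = [2.0, 1.0, 0.1, -1.0]  # toy logits for a 4-token vocabulary
rng = random.Random(0)

# Greedy = argmax; equivalent to the temperature → 0 limit.
greedy = max(range(len(logits)), key=lambda i: logits[i])

# Temperature sampling: sometimes picks a lower-probability token.
sampled = sample(softmax(logits, temperature=0.8), rng)
```

Greedy is deterministic and tends toward repetitive text; raising the temperature trades determinism for diversity, which is why it's usually the first knob people reach for.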