Lesson 8 · 9 min
Temperature, top-p, and sampling
The three knobs that control how "random" the output is.
Sampling, briefly
At each token step, the model produces a probability distribution over the next token. Sampling parameters decide how that distribution becomes the actual chosen token.
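- Temperature (0–2): rescales the distribution before sampling. Low values sharpen it, so output becomes predictable, repetitive, and safe; at 0 most implementations just pick the single most likely token, which is close to deterministic. High values flatten it, so output becomes more creative and weird, but less reliable.
- Top-p / nucleus (0–1): sample only from the smallest set of most likely tokens whose cumulative probability reaches p. 1.0 = consider all; 0.9 = ignore the long tail.
- Top-k (1–N): only consider the K most likely tokens.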
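To make the three knobs concrete, here is a minimal sketch in plain Python of how they could be applied to one token step. The function names (`softmax`, `filter_top_k`, `filter_top_p`, `sample_token`) and the order of the filters are assumptions made for illustration; real inference libraries differ in details such as filter order and how temperature 0 is handled.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, after dividing by temperature."""
    if temperature <= 0:
        # Assumption: callers handle temperature 0 separately as greedy argmax.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k(probs, k):
    """Keep the k most likely tokens, zero out the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


def filter_top_p(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]


def sample_token(logits, temperature=1.0, top_k=None, top_p=1.0, rng=random):
    """Pick one token index from logits using the three sampling knobs."""
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p < 1.0:
        probs = filter_top_p(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Calling `sample_token([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_p=0.9)` returns an index into the logits list. Lowering the temperature concentrates more probability on index 0, and setting `top_k=1` always returns the most likely token.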
In practice you tune temperature and leave the rest at defaults.