Lesson 8 · 9 min

Temperature, top-p, and sampling

The three knobs that control how "random" the output is.

Sampling, briefly

At each token step, the model produces a probability distribution over the next token. Sampling parameters decide how that distribution becomes the actual chosen token.
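To make that step concrete, here is a minimal sketch of turning raw model scores (logits) into probabilities and drawing one token. The four-word vocabulary, the logit values, and the helper names `softmax` and `sample_token` are all made up for illustration; a real model does this over a vocabulary of many thousands of tokens.

```python
import math
import random

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(probs, rng):
    """Pick one token index at random, weighted by its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical four-token vocabulary and made-up logits.
vocab = ["cat", "dog", "car", "the"]
logits = [2.0, 1.0, 0.5, -1.0]

probs = softmax(logits)
rng = random.Random(0)  # fixed seed so the run is repeatable
print([round(p, 3) for p in probs])
print(vocab[sample_token(probs, rng)])
```

The sampling parameters in this lesson all act on `probs` (or on the logits behind it) before the final draw.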

  • Temperature (typically 0–2; the exact range depends on the API): rescales the distribution before sampling. Low → sharper: predictable, repetitive, safe; at 0 the model effectively always picks the most likely token. High → flatter: creative, weird, unreliable.
  • Top-p / nucleus (0–1): sample only from the smallest set of most likely tokens whose cumulative probability reaches p. 1.0 = consider all; 0.9 = ignore the long tail.
  • Top-k (1–N, where N is the vocabulary size): only consider the K most likely tokens.
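The three knobs can be sketched in a few lines each. This is an illustrative version under simple assumptions, not how any particular library implements them: the function names and logits are hypothetical, temperature must be greater than 0 here (implementations commonly treat 0 as "always take the top token"), and both filters renormalize the surviving probabilities so they sum to 1.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept_mass = sum(probs[i] for i in keep)
    return [p / kept_mass if i in keep else 0.0 for i, p in enumerate(probs)]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return [q / cumulative if i in keep else 0.0 for i, q in enumerate(probs)]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four tokens

base = softmax(logits)                    # temperature 1.0
sharp = softmax(logits, temperature=0.5)  # low: top token gains mass
flat = softmax(logits, temperature=2.0)   # high: mass spreads out

print([round(x, 3) for x in sharp])
print([round(x, 3) for x in base])
print([round(x, 3) for x in flat])
print([round(x, 3) for x in top_k_filter(base, 2)])    # only two tokens survive
print([round(x, 3) for x in top_p_filter(base, 0.9)])  # least likely token dropped
```

With these logits the first three tokens are needed to reach 0.9 of the probability mass, so top-p at 0.9 removes only the least likely token, while top-k at 2 removes the last two regardless of how much mass they hold.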

In practice, you usually tune temperature and leave top-p and top-k at their defaults.