Lesson 1 · 9 min
Tokens — what models actually see
Models do not read characters or words. They read tokens. This one reframe explains a lot of weird behavior.
Not characters. Not words. Tokens.
When you send "unbelievable performance" to an LLM, it doesn't see those 24 characters. It sees a sequence of integer token IDs, each looking up a row in the model's embedding table.
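That lookup step can be sketched in a few lines. The table sizes and token IDs below are invented for illustration; real models use vocabularies of tens or hundreds of thousands of tokens and much larger vector dimensions.

```python
import random

# Hypothetical embedding table: one learned vector per token ID.
# Sizes are made up for illustration only.
VOCAB_SIZE, DIM = 4096, 8
random.seed(0)
table = [[random.random() for _ in range(DIM)] for _ in range(VOCAB_SIZE)]

# Hypothetical token IDs; in practice a tokenizer produces these.
token_ids = [301, 302, 303, 2001]

# "Seeing" the input = indexing one row of the table per token ID.
vectors = [table[t] for t in token_ids]
print(len(vectors), len(vectors[0]))  # 4 vectors, each of dimension 8
```

The model never touches the original string after this point; everything downstream operates on those vectors.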
Tokens are usually:
- Whole common words like ` performance` (note the leading space — that's part of the token).
- Subwords for rarer or compound words: `un` + `believ` + `able`.
- Single bytes for emoji, rare characters, or anything outside the vocabulary.
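The three cases above can be sketched with a toy greedy longest-match tokenizer. The vocabulary here is invented purely for illustration; real tokenizers (BPE and its relatives) learn their vocabularies from data, but the word / subword / byte-fallback behavior is the same.

```python
# Invented vocabulary for illustration — not from any real model.
VOCAB = {
    " performance": 2001,                    # whole common word, leading space included
    "un": 301, "believ": 302, "able": 303,   # subword pieces
}

def tokenize(text: str) -> list[int]:
    ids, i = [], 0
    while i < len(text):
        # Greedily take the longest vocabulary entry matching at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            # Byte fallback: emit the raw UTF-8 bytes of one character.
            ids.extend(text[i].encode("utf-8"))
            i += 1
    return ids

print(tokenize("unbelievable performance"))
# -> [301, 302, 303, 2001]
print(tokenize("💡"))
# -> [240, 159, 146, 161]  (the emoji's four UTF-8 bytes)
```

Note how the emoji, absent from the vocabulary, falls through to individual bytes: that is why a single rare character can cost several tokens.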