Lesson 2 · 10 min
Embeddings — words as coordinates
Once a token has been mapped to an integer ID, the model turns that ID into a vector in a high-dimensional space. The geometry of that space is where meaning lives.
From IDs to vectors
The model's first layer is a giant lookup table — the embedding matrix. Token ID 472 → row 472 → a vector of, say, 4096 floating-point numbers.
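The lookup really is just array indexing. Here is a minimal sketch using NumPy with deliberately tiny, made-up dimensions (a real model might pair a vocabulary of tens of thousands of tokens with 4096-dimensional rows):

```python
import numpy as np

# Toy sizes so the example runs instantly; real models are far larger.
vocab_size, dim = 1000, 8

# Stand-in for a trained embedding matrix: one row per token ID.
rng = np.random.default_rng(0)
embedding_matrix = rng.standard_normal((vocab_size, dim))

token_id = 472
vector = embedding_matrix[token_id]  # "row 472" — plain row indexing

print(vector.shape)  # one vector of `dim` floats
```

In a framework like PyTorch this same lookup is what an embedding layer does under the hood, with the matrix updated by gradient descent during training.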
That vector is the model's internal representation of the token. Crucially, the model learns these vectors during training such that semantically similar tokens end up close together in space.
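"Close together" is usually measured with cosine similarity: the cosine of the angle between two vectors, which is near 1.0 when they point the same way. The toy vectors below are hand-made stand-ins for learned embeddings, chosen purely to illustrate the idea:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative 3-D vectors, not real embeddings.
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.85, 0.75, 0.20])
pizza = np.array([-0.30, 0.10, 0.90])

print(cosine_similarity(king, queen))  # high, near 1.0
print(cosine_similarity(king, pizza))  # low: unrelated directions
```

In a trained model the same comparison, run over real embedding rows, is what lets "king" sit nearer to "queen" than to "pizza".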