Lesson 2 · 10 min
Embeddings — words as coordinates
Once a token has been mapped to an integer ID, the model turns that ID into a vector in a high-dimensional space. The geometry of that space is where meaning lives.
From IDs to vectors
The model's first layer is a giant lookup table — the embedding matrix. Token ID 472 → row 472 → a vector of, say, 4096 floating-point numbers.
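The lookup really is just array indexing. Here is a minimal sketch using NumPy with deliberately tiny, made-up dimensions (a real model might pair a vocabulary of tens of thousands of tokens with 4096-dimensional rows):

```python
import numpy as np

# Toy sizes so the example runs instantly; real models are far larger.
vocab_size, dim = 1000, 8

# Stand-in for a trained embedding matrix: one row per token ID.
rng = np.random.default_rng(0)
embedding_matrix = rng.standard_normal((vocab_size, dim))

token_id = 472
vector = embedding_matrix[token_id]  # "row 472" — plain row indexing

print(vector.shape)  # one vector of `dim` floats
```

In a framework like PyTorch this same lookup is what an embedding layer does under the hood, with the matrix updated by gradient descent during training.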
That vector is the model's internal representation of the token. Crucially, the model learns these vectors during training such that semantically similar tokens end up close together in space.
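"Close together" is usually measured with cosine similarity: the cosine of the angle between two vectors, which is near 1.0 when they point the same way. The toy vectors below are hand-made stand-ins for learned embeddings, chosen purely to illustrate the idea:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative 3-D vectors, not real embeddings.
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.85, 0.75, 0.20])
pizza = np.array([-0.30, 0.10, 0.90])

print(cosine_similarity(king, queen))  # high, near 1.0
print(cosine_similarity(king, pizza))  # low: unrelated directions
```

In a trained model the same comparison, run over real embedding rows, is what lets "king" sit nearer to "queen" than to "pizza".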