Lesson 5 · 8 min

Positional encoding — why order matters

Self-attention is order-blind. We have to inject "where am I in the sequence" by hand.

A surprising flaw

Self-attention treats its inputs as a set, not a sequence. "Cat ate fish" and "Fish ate cat" give each token exactly the same output representation, because attention only sees which tokens are present, not where they sit — reordering the inputs just reorders the outputs, as the sketch below demonstrates.
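A minimal sketch of this, assuming a single attention head with random weights (names like `self_attention`, `W_q`, `cat`, `ate`, `fish` are illustrative, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # embedding size (illustrative)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

def self_attention(X):
    """Single-head self-attention with no positional signal."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

# "cat ate fish" vs "fish ate cat": same token embeddings, different order
cat, ate, fish = rng.normal(size=(3, d))
out1 = self_attention(np.stack([cat, ate, fish]))
out2 = self_attention(np.stack([fish, ate, cat]))

# Each token's output vector is identical in both orderings.
assert np.allclose(out1[0], out2[2])   # "cat"
assert np.allclose(out1[2], out2[0])   # "fish"
```

Permuting the rows of the input permutes the rows of the output and nothing else, so the model has no way to tell the two sentences apart.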

We fix this by adding a positional encoding to each token's embedding before the first layer.
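One common choice is the fixed sinusoidal scheme from the original Transformer paper ("Attention Is All You Need"); learned position embeddings are an equally common alternative. A minimal sketch:

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Fixed sinusoidal positional encodings (one row per position)."""
    pos = np.arange(seq_len)[:, None]               # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]            # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)                   # even dimensions
    enc[:, 1::2] = np.cos(angles)                   # odd dimensions
    return enc

# Add position information to the token embeddings before the first layer.
seq_len, d_model = 3, 8
embeddings = np.random.default_rng(1).normal(size=(seq_len, d_model))
x = embeddings + sinusoidal_positions(seq_len, d_model)
```

After this addition, the same token at position 0 and position 2 no longer produces the same input vector, so attention can distinguish "cat ate fish" from "fish ate cat".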