"Attention can compare tokens; positional encoding tells it where those tokens live in the sequence."
Overview
A transformer without positional information is largely order-agnostic: it can compare token content, but it has no built-in reason to know which token came first. Positional encodings inject order through added vectors, relative biases, rotations, or score penalties.
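The order-agnosticism claim can be checked directly: without positional information, permuting the input tokens simply permutes the attention outputs. A minimal NumPy sketch (weights and the `attend` helper are illustrative, not from the notebooks):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                      # 5 tokens, content only
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(X):
    # Single-head, unmasked self-attention with no positional encoding.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    S = Q @ K.T / np.sqrt(d)
    A = np.exp(S - S.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

perm = rng.permutation(5)
# Permutation equivariance: shuffling the inputs shuffles the outputs
# identically, so the model cannot tell which token came first.
assert np.allclose(attend(X)[perm], attend(X[perm]))
```

Any positional scheme breaks this symmetry, which is exactly the point of adding one.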
Modern LLMs use several families of position methods. Sinusoidal and learned absolute encodings add position information to hidden states. Relative position methods modify attention scores. RoPE rotates queries and keys. ALiBi adds linear distance biases. These choices affect extrapolation, long-context behavior, KV-cache decoding, and attention diagnostics.
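As a concrete instance of the first family, the sinusoidal scheme fills even dimensions with sines and odd dimensions with cosines at geometrically spaced frequencies. A minimal sketch (the function name `sinusoidal_pe` is illustrative):

```python
import numpy as np

def sinusoidal_pe(num_pos, d_model):
    # pe[p, 2i]   = sin(p / 10000**(2i / d_model))
    # pe[p, 2i+1] = cos(p / 10000**(2i / d_model))
    pos = np.arange(num_pos)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000.0 ** (2 * i / d_model))
    pe = np.zeros((num_pos, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_pe(4, 8)
# Position 0 gives sin(0) = 0 on even dims and cos(0) = 1 on odd dims.
assert np.allclose(pe[0, 0::2], 0.0) and np.allclose(pe[0, 1::2], 1.0)
```

The resulting rows are added to the token embeddings before the first attention layer.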
This section uses Markdown with inline LaTeX (`$...$`) for mathematics and backticks for code. The notebooks implement each scheme in small NumPy examples and validate the invariants learners should remember.
Prerequisites
Companion Notebooks
| Notebook | Description |
|---|---|
| `theory.ipynb` | Executable sinusoidal, learned, relative-bias, RoPE, ALiBi, scaling, and decode-position demonstrations. |
| `exercises.ipynb` | Ten checked exercises for positional encoding mechanics and long-context diagnostics. |
Learning Objectives
After completing this section, you will be able to:
- Explain why self-attention needs explicit position information.
- Distinguish absolute, relative, additive, rotary, and bias-based position schemes.
- Compute sinusoidal positional encodings for small positions and dimensions.
- Explain the advantages and limits of learned absolute position rows.
- Build relative attention bias matrices from offsets.
- Apply RoPE rotations and verify norm preservation.
- Explain the relative dot-product property of RoPE.
- Build ALiBi distance-bias matrices with head-specific slopes.
- Diagnose long-context and KV-cache position-id bugs.
- Choose a position scheme based on extrapolation, cost, and architecture constraints.
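Two of the invariants listed above can be checked in a few lines: RoPE rotations preserve vector norms and make query-key dot products depend only on the relative offset, and ALiBi biases scale linearly with distance under head-specific slopes. A minimal NumPy sketch (the helpers `rope` and `alibi_bias` are illustrative, not from the notebooks):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    # Rotate consecutive (even, odd) coordinate pairs of x by position-dependent angles.
    half = x.shape[-1] // 2
    ang = pos * base ** (-np.arange(half) / half)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * np.cos(ang) - x2 * np.sin(ang)
    out[1::2] = x1 * np.sin(ang) + x2 * np.cos(ang)
    return out

rng = np.random.default_rng(1)
q, k = rng.normal(size=8), rng.normal(size=8)
# Rotations preserve norms.
assert np.isclose(np.linalg.norm(rope(q, 5)), np.linalg.norm(q))
# The dot product depends only on the relative offset (7-3 == 11-7).
assert np.isclose(rope(q, 7) @ rope(k, 3), rope(q, 11) @ rope(k, 7))

def alibi_bias(seq_len, num_heads):
    # Head h uses slope 2**(-8*(h+1)/num_heads), a common geometric schedule;
    # the bias penalizes attention to distant earlier positions.
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return -slopes[:, None, None] * (i - j)   # add to causal attention scores

B = alibi_bias(4, 2)
# The penalty grows linearly with distance: distance 2 costs twice distance 1.
assert B.shape == (2, 4, 4) and np.isclose(B[0, 2, 0], 2 * B[0, 2, 1])
```

The notebooks verify the same invariants with more positions, dimensions, and heads.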
Study Flow
- Read the pages in order and pause after each page to restate the main definition or theorem.
- Run `theory.ipynb` when you want to check the formulas numerically.
- Use `exercises.ipynb` after the reading path, not before it.
- Return to this overview page when you need the chapter-level navigation.