
"Attention can compare tokens; positional encoding tells it where those tokens live in the sequence."

Overview

A transformer without positional information is largely order-agnostic: self-attention can compare token content, but it has no built-in notion of which token came first. Positional encodings inject order through added vectors, relative biases, rotations, or score penalties.
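You can see this order-agnosticism directly in a few lines of NumPy. The sketch below is purely illustrative (toy dimensions, random weights): permuting the input tokens of a position-free attention layer just permutes its outputs, so nothing in the layer itself distinguishes first from last.

```python
import numpy as np

# Illustrative single-head attention with no positional information.
rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                     # 5 tokens, d-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(X):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

perm = rng.permutation(5)

# Permuting the inputs merely permutes the outputs: attention alone
# cannot tell which token came first, so position must be injected.
assert np.allclose(attend(X)[perm], attend(X[perm]))
```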

Modern LLMs use several families of position methods. Sinusoidal and learned absolute encodings add position information to hidden states. Relative position methods modify attention scores. RoPE rotates queries and keys. ALiBi adds linear distance biases. These choices affect extrapolation, long-context behavior, KV-cache decoding, and attention diagnostics.
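As a first concrete example of the additive family, here is a minimal NumPy sketch of the classic sinusoidal encoding from "Attention Is All You Need" (the helper name and toy sizes are illustrative, and an even model dimension is assumed):

```python
import numpy as np

def sinusoidal_pe(num_positions, d_model, base=10000.0):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    positions = np.arange(num_positions)[:, None]      # (pos, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # (1, d/2)
    angles = positions / base ** (dims / d_model)      # (pos, d/2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_pe(8, 16)
# Each row is a unique "timestamp" added to the embedding at that position.
print(pe[0, :4])   # position 0: sin(0)=0, cos(0)=1 pattern -> [0. 1. 0. 1.]
```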

This section uses LaTeX in Markdown, with `$...$` for inline math and `$$...$$` for display math. The notebooks implement each scheme in small NumPy examples and validate the invariants learners should remember.


Companion Notebooks

  • theory.ipynb: Executable sinusoidal, learned, relative-bias, RoPE, ALiBi, scaling, and decode-position demonstrations.
  • exercises.ipynb: Ten checked exercises for positional encoding mechanics and long-context diagnostics.

Learning Objectives

After completing this section, you will be able to:

  • Explain why self-attention needs explicit position information.
  • Distinguish absolute, relative, additive, rotary, and bias-based position schemes.
  • Compute sinusoidal positional encodings for small positions and dimensions.
  • Explain the advantages and limits of learned absolute position rows.
  • Build relative attention bias matrices from offsets.
  • Apply RoPE rotations and verify norm preservation (see the sketch after this list).
  • Explain the relative dot-product property of RoPE.
  • Build ALiBi distance-bias matrices with head-specific slopes.
  • Diagnose long-context and KV-cache position-id bugs.
  • Choose a position scheme based on extrapolation, cost, and architecture constraints.
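To preview the RoPE and ALiBi objectives numerically, the sketch below is illustrative rather than taken from the companion notebooks: it assumes an even head dimension, rotates consecutive dimension pairs, checks norm preservation and the relative dot-product property, and builds an ALiBi bias tensor using the slope schedule from the ALiBi paper (exact for power-of-two head counts).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # head dimension; RoPE rotates dimension pairs, so d is assumed even

def rope(x, pos, base=10000.0):
    """Rotate each (even, odd) dimension pair of x by a position-dependent angle."""
    half = np.arange(d // 2)
    theta = pos / base ** (2 * half / d)        # one rotation angle per pair
    cos, sin = np.cos(theta), np.sin(theta)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

q, k = rng.normal(size=d), rng.normal(size=d)

# Rotations preserve norms: RoPE never rescales queries or keys.
assert np.isclose(np.linalg.norm(rope(q, 7)), np.linalg.norm(q))

# Relative property: the q-k dot product depends only on the offset m - n.
m, n = 11, 4
assert np.isclose(rope(q, m) @ rope(k, n), rope(q, m - n) @ rope(k, 0))

def alibi_bias(seq_len, num_heads):
    """Per-head linear distance penalties added to attention scores."""
    # Slope schedule 2^(-8h/H), matching the ALiBi paper for power-of-two H.
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]          # key index minus query index
    # Zero on and above the diagonal (future keys are masked anyway),
    # increasingly negative the further back a key sits.
    return slopes[:, None, None] * np.minimum(dist, 0.0)

print(alibi_bias(5, 4)[0])  # head 0: zeros on the diagonal, -slope * lookback below
```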

Study Flow

  1. Read the pages in order and pause after each page to restate the main definition or theorem.
  2. Run theory.ipynb when you want to check the formulas numerically.
  3. Use exercises.ipynb after the reading path, not before it.
  4. Return to this overview page when you need the chapter-level navigation.

