
"Attention can compare tokens; positional encoding tells it where those tokens live in the sequence."

Overview

A transformer without positional information is largely order-agnostic: self-attention can compare token content, but it has no built-in notion of which token came first. Positional encodings inject order through added vectors, relative biases, rotations, or score penalties.
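You can see this order-agnosticism directly in a few lines of NumPy. The sketch below is purely illustrative (toy dimensions, random weights): permuting the input tokens of a position-free attention layer just permutes its outputs, so nothing in the layer itself distinguishes first from last.

```python
import numpy as np

# Illustrative single-head attention with no positional information.
rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                     # 5 tokens, d-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(X):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

perm = rng.permutation(5)

# Permuting the inputs merely permutes the outputs: attention alone
# cannot tell which token came first, so position must be injected.
assert np.allclose(attend(X)[perm], attend(X[perm]))
```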

Modern LLMs use several families of position methods. Sinusoidal and learned absolute encodings add position information to hidden states. Relative position methods modify attention scores. RoPE rotates queries and keys. ALiBi adds linear distance biases. These choices affect extrapolation, long-context behavior, KV-cache decoding, and attention diagnostics.
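As a first concrete example of the additive family, here is a minimal NumPy sketch of the classic sinusoidal encoding from "Attention Is All You Need" (the helper name and toy sizes are illustrative, and an even model dimension is assumed):

```python
import numpy as np

def sinusoidal_pe(num_positions, d_model, base=10000.0):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    positions = np.arange(num_positions)[:, None]      # (pos, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # (1, d/2)
    angles = positions / base ** (dims / d_model)      # (pos, d/2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_pe(8, 16)
# Each row is a unique "timestamp" added to the embedding at that position.
print(pe[0, :4])   # position 0: sin(0)=0, cos(0)=1 pattern -> [0. 1. 0. 1.]
```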

This section uses LaTeX in Markdown, with `$...$` for inline math and `$$...$$` for display math. The notebooks implement each scheme in small NumPy examples and validate the invariants learners should remember.


Companion Notebooks

  • theory.ipynb: Executable sinusoidal, learned, relative-bias, RoPE, ALiBi, scaling, and decode-position demonstrations.
  • exercises.ipynb: Ten checked exercises for positional encoding mechanics and long-context diagnostics.

Learning Objectives

After completing this section, you will be able to:

  • Explain why self-attention needs explicit position information.
  • Distinguish absolute, relative, additive, rotary, and bias-based position schemes.
  • Compute sinusoidal positional encodings for small positions and dimensions.
  • Explain the advantages and limits of learned absolute position rows.
  • Build relative attention bias matrices from offsets.
  • Apply RoPE rotations and verify norm preservation (see the sketch after this list).
  • Explain the relative dot-product property of RoPE.
  • Build ALiBi distance-bias matrices with head-specific slopes.
  • Diagnose long-context and KV-cache position-id bugs.
  • Choose a position scheme based on extrapolation, cost, and architecture constraints.
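To preview the RoPE and ALiBi objectives numerically, the sketch below is illustrative rather than taken from the companion notebooks: it assumes an even head dimension, rotates consecutive dimension pairs, checks norm preservation and the relative dot-product property, and builds an ALiBi bias tensor using the slope schedule from the ALiBi paper (exact for power-of-two head counts).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # head dimension; RoPE rotates dimension pairs, so d is assumed even

def rope(x, pos, base=10000.0):
    """Rotate each (even, odd) dimension pair of x by a position-dependent angle."""
    half = np.arange(d // 2)
    theta = pos / base ** (2 * half / d)        # one rotation angle per pair
    cos, sin = np.cos(theta), np.sin(theta)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

q, k = rng.normal(size=d), rng.normal(size=d)

# Rotations preserve norms: RoPE never rescales queries or keys.
assert np.isclose(np.linalg.norm(rope(q, 7)), np.linalg.norm(q))

# Relative property: the q-k dot product depends only on the offset m - n.
m, n = 11, 4
assert np.isclose(rope(q, m) @ rope(k, n), rope(q, m - n) @ rope(k, 0))

def alibi_bias(seq_len, num_heads):
    """Per-head linear distance penalties added to attention scores."""
    # Slope schedule 2^(-8h/H), matching the ALiBi paper for power-of-two H.
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]          # key index minus query index
    # Zero on and above the diagonal (future keys are masked anyway),
    # increasingly negative the further back a key sits.
    return slopes[:, None, None] * np.minimum(dist, 0.0)

print(alibi_bias(5, 4)[0])  # head 0: zeros on the diagonal, -slope * lookback below
```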

Study Flow

  1. Read the pages in order and pause after each page to restate the main definition or theorem.
  2. Run theory.ipynb when you want to check the formulas numerically.
  3. Use exercises.ipynb after the reading path, not before it.
  4. Return to this overview page when you need the chapter-level navigation.

