"Attention is differentiable retrieval over the current context."
Overview
Attention is the mechanism that lets each token representation read from other token representations. It creates query, key, and value vectors, computes compatibility scores, masks illegal positions, normalizes with softmax, and mixes value vectors into context-aware outputs.
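The steps above can be sketched in a few lines of NumPy. This is a minimal, illustrative implementation (the shapes, seed, and variable names are examples, not taken from the notebooks), using the numerically stable shifted softmax and a causal mask:

```python
import numpy as np

def attention(Q, K, V, mask=None):
    """Scaled dot-product attention with a numerically stable softmax."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # compatibility scores
    if mask is not None:
        scores = np.where(mask, scores, -1e9)      # mask illegal positions before softmax
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # rows are probability distributions
    return weights @ V                             # mix value vectors

rng = np.random.default_rng(0)
T, d = 4, 8
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
causal = np.tril(np.ones((T, T), dtype=bool))      # token t may attend only to positions <= t
out = attention(Q, K, V, causal)
print(out.shape)  # (4, 8)
```

Note that with the causal mask, the first token can attend only to itself, so its output is exactly its own value vector, a quick sanity check for mask leakage.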
For LLMs, attention is the central sequence-mixing operation. It powers in-context learning, copying, retrieval, chat-template boundaries, and long-context behavior. It is also one of the main systems bottlenecks because the score matrix grows quadratically with sequence length.
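The quadratic growth is easy to quantify: the score matrix has one entry per (query, key) pair, so quadrupling the sequence length multiplies its memory by sixteen. A quick back-of-the-envelope sketch (fp32, per head, materialized naively):

```python
# Naive attention materializes a T x T score matrix per head.
# In fp32 (4 bytes per entry), memory grows quadratically with T.
for T in (1_024, 4_096, 16_384):
    mib = T * T * 4 / 2**20
    print(f"T={T:>6}: {mib:>6.0f} MiB per head")  # 4, 64, 1024 MiB
```

This is precisely the memory traffic that FlashAttention avoids by never materializing the full score matrix.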
This section uses Markdown with inline LaTeX (`$...$`) and `code` spans. The companion notebooks implement stable softmax attention, causal masks, multi-head shapes, entropy diagnostics, KV-cache sizing, and efficient-attention intuition in small NumPy examples.
Prerequisites
Companion Notebooks
| Notebook | Description |
|---|---|
| `theory.ipynb` | Executable demonstrations of Q/K/V attention, masks, entropy, heads, KV cache, ALiBi-style bias, and FlashAttention intuition. |
| `exercises.ipynb` | Ten checked exercises covering attention mechanics and LLM serving consequences. |
Learning Objectives
After completing this section, you will be able to:
- Define queries, keys, values, attention scores, masks, weights, and outputs.
- Compute scaled dot-product attention by hand for a small sequence.
- Apply causal and padding masks before softmax.
- Explain why attention rows are probability-like but not full explanations.
- Track multi-head tensor shapes through split, attention, concat, and output projection.
- Compute attention entropy and interpret sharp or diffuse rows.
- Explain KV cache memory and the prefill/decode distinction.
- Describe why FlashAttention is exact attention with a more efficient memory algorithm.
- Diagnose mask leakage and padding bugs.
- Connect attention math to in-context learning, RAG, long context, and safety boundaries.
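The KV-cache objective above reduces to simple arithmetic: per layer, the cache stores one key tensor and one value tensor of shape `[batch, kv_heads, seq_len, head_dim]`. A sketch with hypothetical 7B-class dimensions (the configuration numbers are illustrative, not a specific model's):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Memory for the KV cache: 2 tensors (K and V) per layer,
    each of shape [batch, n_kv_heads, seq_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 7B-class configuration, fp16 cache (2 bytes per element):
gib = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                     seq_len=4096, batch=1) / 2**30
print(f"{gib:.1f} GiB")  # 2.0 GiB
```

Note the linear dependence on sequence length and batch: doubling either doubles cache memory, which is why long-context serving is dominated by KV-cache capacity rather than weights.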
Study Flow
- Read the pages in order and pause after each page to restate the main definition or theorem.
- Run `theory.ipynb` when you want to check the formulas numerically.
- Use `exercises.ipynb` after the reading path, not before it.
- Return to this overview page when you need the chapter-level navigation.