
"Attention is differentiable retrieval over the current context."

Overview

Attention is the mechanism that lets each token representation read from other token representations. It creates query, key, and value vectors, computes compatibility scores, masks illegal positions, normalizes with softmax, and mixes value vectors into context-aware outputs.
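
A minimal NumPy sketch of that pipeline (a toy single head with random inputs, not the notebook's exact code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Toy single-head attention; Q, K, V have shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # compatibility scores, shape (seq, seq)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)      # mask illegal positions before softmax
    scores = scores - scores.max(axis=-1, keepdims=True)      # subtract row max for stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V                             # mix value vectors into outputs

# 3 tokens, d_k = 4, causal mask: each token attends only to itself and earlier tokens
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
causal = np.tril(np.ones((3, 3), dtype=bool))
out = scaled_dot_product_attention(Q, K, V, mask=causal)
print(out.shape)  # (3, 4): one context-aware output vector per token
```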

For LLMs, attention is the central sequence-mixing operation. It powers in-context learning, copying, retrieval-augmented generation, chat-template boundaries, and long-context behavior. It is also one of the main systems bottlenecks because score matrices grow quadratically in sequence length.
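
A quick back-of-the-envelope view of that quadratic growth, assuming fp16 scores and counting a single head of a single layer:

```python
# One (seq_len x seq_len) score matrix in fp16 (2 bytes per entry).
# Doubling the sequence length quadruples the matrix.
for seq_len in (1_024, 4_096, 16_384):
    score_bytes = seq_len * seq_len * 2
    print(f"{seq_len:>6} tokens -> {score_bytes / 2**20:7.0f} MiB per head per layer")
```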

This section uses LaTeX Markdown with $...$ for inline math and $$...$$ for display math. The companion notebooks implement stable softmax attention, causal masks, multi-head shapes, entropy diagnostics, KV-cache sizing, and efficient-attention intuition in small NumPy examples.
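
For the KV-cache sizing part, a rough sketch of the arithmetic; the layer count, head count, head dimension, and fp16 precision below are illustrative assumptions, not any particular model's configuration:

```python
# KV cache per token = 2 (K and V) * n_layers * n_kv_heads * head_dim * bytes_per_value
n_layers, n_kv_heads, head_dim, bytes_per_value = 32, 8, 128, 2   # assumed values, fp16
per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
seq_len = 8_192
total_gib = per_token_bytes * seq_len / 2**30
print(f"{per_token_bytes} bytes per token, {total_gib:.2f} GiB of KV cache for {seq_len} tokens")
```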

Prerequisites

Companion Notebooks

  • theory.ipynb: Executable demonstrations of Q/K/V attention, masks, entropy, heads, KV cache, ALiBi-style bias, and FlashAttention intuition.
  • exercises.ipynb: Ten checked exercises covering attention mechanics and LLM serving consequences.

Learning Objectives

After completing this section, you will be able to:

  • Define queries, keys, values, attention scores, masks, weights, and outputs.
  • Compute scaled dot-product attention by hand for a small sequence.
  • Apply causal and padding masks before softmax.
  • Explain why attention rows are probability-like but not full explanations.
  • Track multi-head tensor shapes through split, attention, concat, and output projection.
  • Compute attention entropy and interpret sharp or diffuse rows (a small sketch follows this list).
  • Explain KV cache memory and the prefill/decode distinction.
  • Describe why FlashAttention is exact attention with a more efficient memory algorithm.
  • Diagnose mask leakage and padding bugs.
  • Connect attention math to in-context learning, RAG, long context, and safety boundaries.
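
A small sketch of the entropy diagnostic mentioned above (assuming attention weight rows that already sum to 1):

```python
import numpy as np

def attention_entropy(weights):
    """Entropy in nats of each attention row; weights has shape (..., seq), rows sum to 1."""
    eps = 1e-12  # avoid log(0) for masked-out positions
    return -(weights * np.log(weights + eps)).sum(axis=-1)

sharp   = np.array([0.97, 0.01, 0.01, 0.01])   # one token dominates: near-hard lookup
diffuse = np.full(4, 0.25)                     # mass spread evenly over 4 tokens
print(attention_entropy(sharp))    # small (about 0.17 nats)
print(attention_entropy(diffuse))  # log(4) ~ 1.39 nats, the maximum for 4 positions
```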

Study Flow

  1. Read the pages in order and pause after each page to restate the main definition or theorem.
  2. Run theory.ipynb when you want to check the formulas numerically.
  3. Use exercises.ipynb after the reading path, not before it.
  4. Return to this overview page when you need the chapter-level navigation.
