Notes - Math for LLMs Tutorial

Notes


"An online experiment is a randomized proof attempt about the real world."

Overview

Online experiments connect offline model evidence to causal user and system impact through randomized comparison, statistical inference, and trust checks.
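Two of the ingredients named above, randomized comparison and trust checks, can be sketched in a few lines of Python. This is a minimal sketch, not a production assignment service. It assumes users are randomized by hashing a user ID together with an experiment name, and it uses a chi-square goodness-of-fit test for sample ratio mismatch (SRM) as one common trust check. The function names `assign_variant` and `srm_p_value`, the experiment name `exp-checkout`, and the 50/50 split are hypothetical choices for illustration.

```python
import hashlib
import math

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment' by hashing (hypothetical helper)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # approximately uniform on [0, 1]
    return "treatment" if bucket < treatment_share else "control"

def srm_p_value(n_control: int, n_treatment: int, treatment_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for sample ratio mismatch."""
    total = n_control + n_treatment
    expected = (total * (1 - treatment_share), total * treatment_share)
    observed = (n_control, n_treatment)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(chi-square > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Usage: assign 10,000 hypothetical users, then check the observed split.
counts = {"control": 0, "treatment": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "exp-checkout")] += 1
print(counts, srm_p_value(counts["control"], counts["treatment"]))
```

Hashing, rather than drawing a fresh random number per request, keeps each user's assignment stable across sessions. A very small SRM p-value (a strict threshold such as 0.001 is a common choice) suggests the traffic split itself is broken, so metric comparisons from that experiment should not be trusted until the cause is found.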

The chapter treats evaluation as a mathematical object: a controlled protocol that maps model behavior into evidence with uncertainty. A result is not just a number; it is an estimate produced by a task distribution, a scoring rule, and an aggregation rule.
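The three ingredients in that sentence can be made concrete with a minimal Python sketch. It assumes a toy task distribution (uniformly random two-digit addition prompts), exact match as the scoring rule, and the sample mean as the aggregation rule. `evaluate`, `exact_match`, and `toy_model` are hypothetical names, and the ±1.96·SE interval is a normal approximation rather than an exact guarantee.

```python
import math
import random
from typing import Callable, Sequence, Tuple

def evaluate(
    tasks: Sequence[Tuple[str, str]],
    model: Callable[[str], str],
    score: Callable[[str, str], float],
) -> Tuple[float, float]:
    """Aggregate per-task scores into (mean, standard error of the mean)."""
    scores = [score(model(prompt), reference) for prompt, reference in tasks]
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two tasks to estimate a standard error")
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)  # unbiased sample variance
    return mean, math.sqrt(variance / n)

def exact_match(prediction: str, reference: str) -> float:
    """Scoring rule: 1.0 if the strings match after trimming whitespace, else 0.0."""
    return float(prediction.strip() == reference.strip())

def toy_model(prompt: str) -> str:
    """Hypothetical model: off by one whenever the first operand is a multiple of 10."""
    a, b = map(int, prompt.split("+"))
    return str(a + b) if a % 10 else str(a + b + 1)

# Task distribution (assumed): uniformly random two-digit addition problems.
rng = random.Random(0)
tasks = []
for _ in range(200):
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    tasks.append((f"{a}+{b}", str(a + b)))

mean, se = evaluate(tasks, toy_model, exact_match)
print(f"accuracy = {mean:.3f} +/- {1.96 * se:.3f} (approx. 95% interval)")
```

Swapping any one piece, such as a harder task distribution, a stricter scoring rule, or a per-category average instead of a pooled mean, changes the reported number, which is why all three belong in the description of a result.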

This section is written in LaTeX Markdown. Inline mathematics uses $...$, and display equations use `$$...$$`. The emphasis is practical but rigorous: every metric should be linked to the statistical assumptions that make it meaningful.
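As one example of linking a metric to its assumptions, consider the difference-in-means estimate for a two-arm A/B test. The notation is introduced here for illustration: $\bar{Y}_T$ and $\bar{Y}_C$ are the treatment and control sample means of a per-user metric, $s_T^2$ and $s_C^2$ their sample variances, $n_T$ and $n_C$ the arm sizes, and $z_{1-\alpha/2}$ the standard normal quantile.

$$
\hat{\Delta} = \bar{Y}_T - \bar{Y}_C,
\qquad
\widehat{\mathrm{SE}}(\hat{\Delta}) = \sqrt{\frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}},
\qquad
\mathrm{CI}_{1-\alpha} = \hat{\Delta} \pm z_{1-\alpha/2}\,\widehat{\mathrm{SE}}(\hat{\Delta}).
$$

This interval is meaningful only under stated assumptions: units are assigned to arms at random, one unit's outcome is not affected by another unit's assignment (no interference), the analysis unit matches the randomization unit, and each arm is large enough for the normal approximation to hold. When any of these fails, the formula still produces a number, but not a valid uncertainty statement.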

Prerequisites

Companion Notebooks

| Notebook | Description |
| --- | --- |
| theory.ipynb | Executable demonstrations for online experimentation and A/B testing |
| exercises.ipynb | Graded practice for online experimentation and A/B testing |

Learning Objectives

After completing this section, you will be able to:

Define the core objects used in online experimentation and A/B testing
Write empirical estimators for model scores and risks
Attach uncertainty statements to finite evaluation results
Separate protocol design from metric computation
Identify common leakage, sampling, and aggregation failures
Use synthetic experiments to test statistical intuitions
Connect offline metrics to downstream AI decisions
Explain where this section ends and where adjacent chapters begin
Design exercises and notebooks that make the math executable
Read modern LLM evaluation papers with sharper statistical judgment
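Using synthetic experiments to test statistical intuitions can start with an A/A simulation: both arms are drawn from the same distribution, so a well-calibrated test should flag roughly a fraction $\alpha$ of experiments as significant. The sketch below assumes a two-sided z-test on the difference in means with a Welch-style standard error, and uses exponential outcomes as a hypothetical stand-in for a skewed engagement metric. The function names and default sizes are illustrative, not part of the companion notebooks.

```python
import math
import random

def mean_and_variance(xs: list[float]) -> tuple[float, float]:
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)

def two_sided_p_value(control: list[float], treatment: list[float]) -> float:
    """Normal-approximation p-value for H0: equal means (Welch-style standard error)."""
    m_c, v_c = mean_and_variance(control)
    m_t, v_t = mean_and_variance(treatment)
    se = math.sqrt(v_t / len(treatment) + v_c / len(control))
    z = (m_t - m_c) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def aa_false_positive_rate(n_experiments: int = 1000, n_per_arm: int = 200,
                           alpha: float = 0.05, seed: int = 0) -> float:
    """Share of synthetic A/A experiments (no true effect) with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both arms come from the same skewed distribution, so any "effect" is noise.
        control = [rng.expovariate(1.0) for _ in range(n_per_arm)]
        treatment = [rng.expovariate(1.0) for _ in range(n_per_arm)]
        if two_sided_p_value(control, treatment) < alpha:
            hits += 1
    return hits / n_experiments

rate = aa_false_positive_rate()
print(f"A/A false positive rate: {rate:.3f} (nominal alpha = 0.05)")
```

If the observed rate sits far from $\alpha$ (for instance, because outcomes were correlated within users but analyzed per event), the intuition or the pipeline needs revisiting before any real A/B result is trusted. Adding a small shift to the treatment arm turns the same loop into a rough power estimate.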

Study Flow

Read the pages in order and pause after each page to restate the main definition or theorem.
Run theory.ipynb when you want to check the formulas numerically.
Use exercises.ipynb after the reading path, not before it.
Return to this overview page when you need the chapter-level navigation.

