
Mathematics for AI / ML / LLM

The Complete Math Foundation You Need to Master AI


A structured, open-source curriculum covering 25 domains of mathematics essential for AI, Machine Learning, and Large Language Models — from foundational concepts to cutting-edge research.

Get Started · Roadmap · Chapters · Resources · Contributing




Why This Repo?

Most AI/ML learners hit the math wall — papers full of symbols that feel alien, optimization steps that seem like magic, and model architectures that assume deep mathematical fluency.

This repository bridges that gap with a learn-by-doing approach:

  • Structured path from high school math to research-level topics
  • Notes → Theory → Exercises flow for every topic
  • Interactive Jupyter notebooks with visualizations, not just formulas
  • Real ML connections — every concept links to practical AI applications
  • Self-contained — no prerequisites beyond basic algebra

"The math you need depends on what you're building — this repo helps you find exactly that."


🚀 Quick Start

# Clone the repository
git clone https://github.com/prmtkr/math_for_llms.git
cd math_for_llms

# Set up the environment
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt

# Launch Jupyter
jupyter lab

Each topic follows a 3-step learning flow:

📖 notes.md          → Read the concepts and intuition
🔬 theory.ipynb      → Explore interactive demonstrations
✏️ exercises.ipynb   → Test your understanding

🗺️ Learning Roadmap

The curriculum covers 25 domains organized into 8 phases. Each phase builds on the previous one. Topics marked with ★ are critical for ML/AI.

  START HERE
      │
      ▼
 ┌──────────┐     ┌─────────────┐     ┌──────────┐     ┌──────────────┐
 │ Phase 1  │ ──▶ │   Phase 2   │ ──▶ │ Phase 3  │ ──▶ │   Phase 4    │
 │ Core     │     │ Probability │     │ Learning │     │ Deeper       │
 │ Math     │     │ & Stats     │     │ Engines  │     │ Theory       │
 └──────────┘     └─────────────┘     └──────────┘     └──────────────┘
                                                              │
      ┌───────────────────────────────────────────────────────┘
      ▼
 ┌──────────┐     ┌──────────┐     ┌──────────────┐     ┌──────────────┐
 │ Phase 5  │ ──▶ │ Phase 6  │ ──▶ │   Phase 7    │ ──▶ │   Phase 8    │
 │ ML Math  │     │ LLM Math │     │ Production   │     │ Research     │
 │          │     │          │     │ & Safety     │     │ Frontiers    │
 └──────────┘     └──────────┘     └──────────────┘     └──────────────┘

Phase 1 — Core Math Foundations (Ch. 01 → 02 → 04 → 05)

The mathematical language everything else is built on.

① MATHEMATICAL FOUNDATIONS (Ch.01)
├── Number Systems (ℕ ℤ ℚ ℝ ℂ)
├── Sets & Logic
├── Functions & Mappings
├── Σ Summation & Product Notation
├── Einstein Summation & Index Notation
└── Proof Techniques (induction, contradiction, direct)
        │
        ├──────────────────────────────────────────┐
        ▼                                          ▼
② LINEAR ALGEBRA (Ch.02 + Ch.03)            ③ CALCULUS (Ch.04 + Ch.05)
├── Vectors & Spaces                        ├── Limits & Continuity
├── Matrix Operations                       ├── Derivatives & Differentiation
├── Systems of Equations                    ├── Integration & Series
├── Determinants & Rank                     ├── Partial Derivatives & Gradients
├── Eigenvalues & Eigenvectors ★            ├── Jacobians & Hessians ★
├── SVD ★                                   ├── Chain Rule → Backpropagation ★
├── PCA                                     ├── Optimality Conditions
├── Orthogonality & Norms                   └── Automatic Differentiation ★
├── Positive Definite Matrices
└── Matrix Decompositions (LU/QR/Cholesky)

Phase 2 — Probabilistic Thinking (Ch. 06 → 07)

How to reason about uncertainty — the foundation of all ML inference.

④ PROBABILITY & STATISTICS (Ch.06 + Ch.07)
├── Random Variables & Distributions
├── Joint Distributions
├── Expectation & Moments
├── Concentration Inequalities
├── Stochastic Processes
├── Markov Chains ★
├── Descriptive Statistics
├── Estimation Theory & MLE
├── Bayesian Inference ★
├── Hypothesis Testing
├── Time Series
└── Regression Analysis

Phase 3 — Making Models Learn (Ch. 08 → 09)

The algorithms that train every model and the theory behind loss functions.

⑤ OPTIMIZATION (Ch.08)                     ⑥ INFORMATION THEORY (Ch.09)
├── Convex Optimization ★                  ├── Entropy (Shannon) ★
├── Gradient Descent (SGD/Mini-batch) ★    ├── KL Divergence ★
├── Second-Order Methods (Newton/BFGS)     ├── Mutual Information ★
├── Constrained Optimization (KKT)         ├── Cross-Entropy ★
├── Stochastic Optimization ★              └── Fisher Information
├── Optimization Landscape
├── Adaptive LR (Adam / RMSProp) ★
├── Regularization (L1/L2/Dropout)
├── Hyperparameter Optimization
└── Learning Rate Schedules

Phase 4 — Deeper Theory (Ch. 03 → 10 → 11 → 12)

Specialized math that powers specific ML architectures.

⑦ NUMERICAL METHODS (Ch.10)                ⑧ GRAPH THEORY (Ch.11)
├── Floating-Point Arithmetic              ├── Graph Basics & Representations
├── Numerical Linear Algebra               ├── Graph Algorithms
├── Numerical Optimization                 ├── Spectral Graph Theory ★
├── Interpolation & Approximation          ├── Graph Neural Networks ★
└── Numerical Integration                  └── Random Graphs

⑨ FUNCTIONAL ANALYSIS (Ch.12)
├── Normed Spaces
├── Hilbert Spaces ★
└── Kernel Methods (SVM / GP) ★

Phase 5 — ML Math in Practice (Ch. 13 → 14)

The math that directly appears inside ML models.

                        ╔════════════════════╗
                        ║  ML-SPECIFIC MATH  ║
                        ╚════════════════════╝
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
⑩ ML MATH CORE (Ch.13)   ⑪ DEEP LEARNING (Ch.14)   ⑫ RL (Ch.14)
├── Loss Functions ★      ├── Neural Net Math ★      ├── MDP (State/Action)
├── Activation Fns ★      ├── CNN & Convolution ★    ├── Bellman Equations ★
├── Normalization ★       ├── RNN & LSTM Math ★      ├── Policy Gradient ★
└── Sampling Methods      ├── Transformer ★          ├── Value Functions ★
                          ├── Generative (VAE/GAN)   └── Actor-Critic
                          └── Probabilistic Models

Phase 6 — LLM Math (Ch. 15 → 16)

Everything that makes Large Language Models work under the hood.

                         ╔══════════════════╗
                         ║  MATH FOR LLMs   ║
                         ╚══════════════════╝
                                  │
       ┌──────────────────────────┼──────────────────────────┐
       ▼                          ▼                          ▼
⑬ ATTENTION & ARCH         ⑭ TRAINING AT SCALE       ⑮ DATA PIPELINE
   (Ch.15)                    (Ch.15)                    (Ch.16)
├── Tokenization            ├── Scaling Laws ★         ├── Data Format Standards
├── Embedding Space         ├── Training at Scale      ├── JSONL Generation
├── Attention Mech ★        ├── Efficient Attention    ├── Quality Checks
├── Positional Enc ★        ├── MoE & Routing          ├── Dataset Assembly
└── LM Probability ★        ├── Quantization           ├── Contamination & Dedup
                            ├── Distillation           ├── Documentation
                            └── RAG & Retrieval        └── Data Mixture Optimization

Phase 7 — Production & Safety (Ch. 17 → 18 → 19)

Ship models responsibly — evaluate, align, and monitor.

⑯ EVALUATION (Ch.17)           ⑰ ALIGNMENT & SAFETY (Ch.18)     ⑱ PRODUCTION (Ch.19)
├── Capability Benchmarks      ├── SFT Math ★                    ├── Data Versioning & Lineage
├── Calibration & Uncertainty  ├── RLHF Math ★                   ├── Experiment Tracking
├── Robustness &               ├── DPO / Preference Opt ★        ├── Feature Stores &
│   Distribution Shift         ├── Policy & Guardrails           │   Data Contracts
├── Error Analysis &           └── Human-in-the-Loop             ├── Model Serving &
│   Ablations                      & Monitoring                  │   Inference Optimization
└── A/B Testing &                                                ├── Monitoring, Drift
    Experimentation                                              │   & Retraining
                                                                 └── LLM Observability
                                                                     & Guardrails

Phase 8 — Research Frontiers (Ch. 20 → 21 → 22 → 23 → 24 → 25)

Advanced theory for research-level understanding.

⑲ FOURIER & SIGNALS (Ch.20)    ⑳ STATISTICAL LEARNING (Ch.21)   ㉑ CAUSAL INFERENCE (Ch.22)
├── Fourier Series              ├── PAC Learning                  ├── Structural Causal Models
├── Fourier Transform           ├── VC Dimension                  ├── Do-Calculus
├── DFT & FFT                   ├── Bias-Variance Tradeoff        ├── Counterfactuals
├── Convolution Theorem         ├── Generalization Bounds         └── Causal Discovery
└── Wavelets                    └── Rademacher Complexity

㉒ GAME THEORY (Ch.23)          ㉓ MEASURE THEORY (Ch.24)         ㉔ DIFF. GEOMETRY (Ch.25)
├── Nash Equilibria             ├── Sigma-Algebras                ├── Manifolds
├── Minimax Theorem             ├── Lebesgue Integration          ├── Riemannian Geometry
├── Multi-Agent Systems         ├── Probability Measure Spaces    ├── Geodesics
└── Adversarial Game Theory     └── Radon-Nikodym Theorem         └── Optimization on Manifolds
╔═══════════════════════════════════════════════════════════════╗
║                                                               ║
║        ★  YOU ARE NOW A MATH-FOR-AI WIZARD  ★                 ║
║                                                               ║
║        ★ = Critical for ML/AI — prioritize these topics       ║
║                                                               ║
╚═══════════════════════════════════════════════════════════════╝

Quick Reference — Learning Order

Phase · Chapters · Focus
  • Phase 1 — Core Foundations · 01 → 02 → 04 → 05 · Numbers, vectors, matrices, derivatives, gradients
  • Phase 2 — Probabilistic Thinking · 06 → 07 · Random variables, distributions, estimation, inference
  • Phase 3 — Making Models Learn · 08 → 09 · Optimization algorithms, information-theoretic losses
  • Phase 4 — Deeper Theory · 03 → 10 → 11 → 12 · Advanced linear algebra, numerical methods, graphs, kernels
  • Phase 5 — ML Math in Practice · 13 → 14 · Loss functions, activations, architecture-specific math
  • Phase 6 — LLM Math · 15 → 16 · Attention, embeddings, scaling laws, training pipelines
  • Phase 7 — Production & Safety · 17 → 18 → 19 · Evaluation, alignment (RLHF/DPO), MLOps
  • Phase 8 — Research Frontiers · 20 → 21 → 22 → 23 → 24 → 25 · Fourier analysis, learning theory, causality, geometry

📚 Chapters

Core Mathematics

01 · Mathematical Foundations — Number systems, sets, logic, proofs
Topic · Description
  • Number Systems · Natural, integer, rational, real, and complex numbers (ℕ ℤ ℚ ℝ ℂ)
  • Sets & Logic · Set operations, propositional logic, quantifiers
  • Functions & Mappings · Domain, range, injectivity, surjectivity, composition
  • Summation & Product Notation · Sigma/Pi notation, index manipulation
  • Einstein Summation · Index notation used in tensor operations
  • Proof Techniques · Induction, contradiction, direct proof, contrapositive
02 · Linear Algebra Basics — Vectors, matrices, systems of equations
Topic · ML Connection
  • Vectors & Spaces · Feature representations, embeddings
  • Matrix Operations · Forward propagation, transformations
  • Systems of Equations · Linear regression (normal equations)
  • Determinants · Change of variables in normalizing flows
  • Matrix Rank · Model capacity, low-rank approximations
  • Vector Spaces & Subspaces · Dimensionality, feature spaces
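
To make the "ML connection" concrete, here is a minimal NumPy sketch (illustrative only, not from the chapter notebooks) that fits linear regression by solving the normal equations X^T X w = X^T y:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 samples, 3 features
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)   # noisy linear targets

# Normal equations: solve X^T X w = X^T y (solve is more stable than inverting)
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)                                  # close to [2.0, -1.0, 0.5]
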
03 · Advanced Linear Algebra — Eigen decomposition, SVD, PCA
Topic · ML Connection
  • Eigenvalues & Eigenvectors · PCA, spectral clustering, stability analysis
  • Singular Value Decomposition · Recommender systems, dimensionality reduction
  • Principal Component Analysis · Feature extraction, data compression
  • Linear Transformations · Neural network layers as transforms
  • Orthogonality & Orthonormality · Gram-Schmidt, decorrelated features
  • Matrix Norms · Regularization, operator bounds
  • Positive Definite Matrices · Covariance matrices, kernel validity
  • Matrix Decompositions · LU, QR, Cholesky — efficient solvers
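
As a quick illustration of how SVD and PCA connect, a small NumPy sketch (illustrative, not repo code) that extracts principal components from centered data:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)                       # PCA requires centered data

# Principal axes are the rows of Vt; singular values give explained variance
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_var = S**2 / (len(Xc) - 1)          # eigenvalues of the covariance matrix
X_2d = Xc @ Vt[:2].T                          # project onto the top-2 components
print(X_2d.shape)                             # (200, 2)
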
04 · Calculus Fundamentals — Limits, derivatives, integrals, series
Topic · ML Connection
  • Limits & Continuity · Convergence guarantees, activation smoothness
  • Derivatives & Differentiation · Gradient computation for all parameters
  • Integration · Probability densities, normalization constants
  • Series & Sequences · Taylor approximations, convergence analysis
05 · Multivariate Calculus — Gradients, Jacobians, backpropagation
Topic · ML Connection
  • Partial Derivatives & Gradients · Direction of steepest descent
  • Jacobians & Hessians · Multi-output functions, second-order methods
  • Chain Rule & Backpropagation · Training every neural network
  • Optimality Conditions · Convergence criteria, saddle points
  • Automatic Differentiation · PyTorch autograd, JAX
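
Gradient checking is the workhorse behind trusting backpropagation. A minimal sketch (with a made-up function, not from the notebooks) comparing an analytic gradient against central finite differences:

import numpy as np

def f(v):
    return np.sin(v[0]) * v[1]**2              # f(x, y) = sin(x) * y^2

def grad_f(v):
    return np.array([np.cos(v[0]) * v[1]**2,   # df/dx
                     2 * np.sin(v[0]) * v[1]]) # df/dy

v, eps = np.array([0.7, 1.3]), 1e-6
num = np.array([(f(v + eps * e) - f(v - eps * e)) / (2 * eps)
                for e in np.eye(2)])           # central differences
print(np.allclose(num, grad_f(v)))             # True if the analytic gradient is right
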

Probability, Statistics & Optimization

06 · Probability Theory — Distributions, expectations, stochastic processes
Topic · ML Connection
  • Random Variables · Output uncertainty, stochastic models
  • Common Distributions · Gaussian, Bernoulli, Poisson — model assumptions
  • Joint Distributions · Multivariate modeling, copulas
  • Expectation & Moments · Loss functions, feature statistics
  • Concentration Inequalities · Generalization bounds, sample complexity
  • Stochastic Processes · Time series, diffusion models
  • Markov Chains · MCMC sampling, language modeling
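
For instance, the stationary distribution of a Markov chain (the quantity behind MCMC and PageRank-style methods) is just an eigenvector problem. A minimal sketch with a made-up 3-state chain:

import numpy as np

# Transition matrix of a 3-state chain (rows sum to 1)
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

# Stationary distribution: left eigenvector of P with eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()                                # normalize to a probability vector
print(np.allclose(pi @ P, pi))                # True: pi is stationary
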
07 · Statistics — Estimation, testing, Bayesian inference, regression
Topic · ML Connection
  • Descriptive Statistics · EDA, feature engineering
  • Estimation Theory · MLE, MAP — training as estimation
  • Hypothesis Testing · A/B testing, model comparison
  • Bayesian Inference · Posterior updates, uncertainty quantification
  • Time Series · Sequence forecasting, temporal patterns
  • Regression Analysis · Baseline models, diagnostics
08 · Optimization — SGD, Adam, constrained optimization, regularization
Topic · ML Connection
  • Convex Optimization · Global guarantees, convergence proofs
  • Gradient Descent · The engine behind all training
  • Second-Order Methods · Newton, BFGS — faster convergence
  • Constrained Optimization · Lagrange multipliers, KKT conditions
  • Stochastic Optimization · SGD, mini-batch — scaling to big data
  • Optimization Landscape · Local minima, saddle points, loss surfaces
  • Adaptive Learning Rate · Adam, RMSProp, AdaGrad
  • Regularization Methods · L1/L2, Dropout, weight decay
  • Hyperparameter Optimization · Grid search, Bayesian optimization
  • Learning Rate Schedules · Warmup, cosine annealing, step decay
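
The core loop is simpler than it sounds. A minimal gradient-descent sketch on least squares (illustrative; the chapter notebooks go much further):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
w_true = np.array([1.0, -2.0])
y = X @ w_true

w, lr = np.zeros(2), 1e-3
for _ in range(2000):
    grad = 2 * X.T @ (X @ w - y)              # gradient of the squared loss ||Xw - y||^2
    w -= lr * grad                            # step against the gradient
print(w)                                      # approaches [1.0, -2.0]
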

Information Theory & Numerical Methods

09 · Information Theory — Entropy, KL divergence, cross-entropy
Topic · ML Connection
  • Entropy · Decision tree splits, uncertainty measurement
  • KL Divergence · VAE loss, knowledge distillation
  • Mutual Information · Feature selection, InfoGAN
  • Cross-Entropy · The most common classification loss
  • Fisher Information · Efficient estimation, natural gradient
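
These quantities are one line of NumPy each, and they satisfy the identity H(p, q) = H(p) + KL(p || q), which is why minimizing cross-entropy also minimizes KL. A sketch with made-up distributions:

import numpy as np

p = np.array([0.7, 0.2, 0.1])                 # "true" distribution
q = np.array([0.5, 0.3, 0.2])                 # model distribution

entropy = -np.sum(p * np.log(p))              # H(p)
cross_entropy = -np.sum(p * np.log(q))        # H(p, q)
kl = np.sum(p * np.log(p / q))                # KL(p || q)
print(np.isclose(cross_entropy, entropy + kl))  # True
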
10 · Numerical Methods — Floating-point, stability, interpolation, integration

📖 Chapter README

Topic · ML Connection
  • Floating-Point Arithmetic · Mixed-precision training (FP16/BF16/FP8), loss scaling, Flash Attention numerics
  • Numerical Linear Algebra · Stable solvers, iterative methods (CG/Lanczos), condition numbers in training
  • Numerical Optimization · L-BFGS two-loop recursion, Armijo line search, gradient checking, trust-region methods
  • Interpolation & Approximation · RoPE/sinusoidal PE, KAN B-splines, Runge's phenomenon, FFT, random Fourier features
  • Numerical Integration · Gaussian quadrature, Monte Carlo variance reduction, reparameterization trick (VAE ELBO)
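
A classic example of why this chapter matters for LLMs: naive log-sum-exp (the core of softmax and perplexity) overflows, and the standard max-shift trick fixes it. A minimal sketch:

import numpy as np

def logsumexp(z):
    m = z.max()                               # shift by the max before exponentiating
    return m + np.log(np.exp(z - m).sum())

z = np.array([1000.0, 1000.5])                # values this large overflow exp() in float64
print(np.log(np.exp(z).sum()))                # inf (overflow)
print(logsumexp(z))                           # ~1000.97 (stable)
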

Specialized Mathematics

11 · Graph Theory — Graph algorithms, spectral methods, GNNs
Topic · ML Connection
  • Graph Basics · Social networks, molecular graphs
  • Graph Representations · Adjacency/Laplacian matrices
  • Graph Algorithms · Shortest path, centrality, traversal
  • Spectral Graph Theory · Community detection, graph wavelets
  • Graph Neural Networks · Message passing, GCN, GAT
  • Random Graphs · Erdős-Rényi, network analysis
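
One satisfying spectral fact: the multiplicity of the zero eigenvalue of the graph Laplacian counts connected components. A minimal sketch with a made-up 6-node graph:

import numpy as np

# Two disjoint triangles, so two connected components
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1

L = np.diag(A.sum(axis=1)) - A                # graph Laplacian L = D - A
eigvals = np.linalg.eigvalsh(L)
print(np.sum(np.isclose(eigvals, 0)))         # 2: one zero eigenvalue per component
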
12 · Functional Analysis — Hilbert spaces, kernel methods
Topic · ML Connection
  • Normed Spaces · Regularization theory
  • Hilbert Spaces · RKHS, function space learning
  • Kernel Methods · SVM, Gaussian processes, kernel trick

ML-Specific Mathematics

13 · ML-Specific Math — Loss functions, activations, normalization, sampling
Topic · ML Connection
  • Loss Functions · MSE, cross-entropy, hinge, contrastive
  • Activation Functions · ReLU, GELU, sigmoid, softmax — and their gradients
  • Normalization Techniques · BatchNorm, LayerNorm, RMSNorm
  • Sampling Methods · MCMC, rejection sampling, importance sampling
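
The softmax/cross-entropy pair has a famously clean gradient: probabilities minus the one-hot target. A minimal sketch (illustrative logits, not repo code):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                   # max-shift for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
loss = -np.log(probs[0])                      # cross-entropy with true class 0
grad = probs - np.eye(3)[0]                   # d(loss)/d(logits) = p - one_hot
print(probs.round(3), round(loss, 3), grad.round(3))
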
14 · Math for Specific Models — NNs, CNNs, RNNs, Transformers, GANs, RL
Topic · ML Connection
  • Linear Models · Regression, classification foundations
  • Neural Networks · Universal approximation, backprop math
  • Probabilistic Models · GMMs, HMMs, variational inference
  • RNN & LSTM Math · Vanishing gradients, gating mechanisms
  • Transformer Architecture · Attention is all you need — the math
  • Reinforcement Learning · Bellman equations, policy gradients
  • Generative Models · VAEs, GANs, diffusion models
  • CNN & Convolution Math · Convolution theorem, pooling, receptive fields

LLM Mathematics

15 · Math for LLMs — Attention, embeddings, scaling laws, inference
Topic · ML Connection
  • Tokenization Math · BPE, WordPiece — information-theoretic foundations
  • Embedding Space Math · Geometric properties of learned representations
  • Attention Mechanism Math · Scaled dot-product, multi-head, causal masking
  • Positional Encodings · Sinusoidal, RoPE, ALiBi
  • Language Model Probability · Next-token prediction, perplexity
  • Training at Scale · Distributed training, gradient accumulation
  • Fine-Tuning Math · LoRA, adapters, parameter-efficient methods
  • Scaling Laws · Chinchilla, compute-optimal training
  • Efficient Attention & Inference · FlashAttention, KV-cache, speculative decoding
  • Mixture of Experts & Routing · Sparse gating, load balancing
  • Quantization & Distillation · INT8/INT4, knowledge distillation
  • RAG Math & Retrieval · Retrieval-augmented generation
  • Serving & Systems Tradeoffs · Latency, throughput, batching strategies
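
Scaled dot-product attention fits in a dozen lines of NumPy. A single-head causal sketch (illustrative shapes; the notebooks cover multi-head attention and masking in detail):

import numpy as np

def attention(Q, K, V, causal=True):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)             # scaled dot-product
    if causal:
        scores += np.triu(np.full(scores.shape, -np.inf), k=1)  # hide future tokens
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)        # row-wise softmax
    return w @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))           # 4 tokens, head dim 8 (self-attention)
print(attention(Q, K, V).shape)               # (4, 8)
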
16 · LLM Training Data Pipeline — Data quality, deduplication, mixture optimization
Topic · Description
  • Data Format Standards · JSONL, tokenized formats, schema validation
  • JSONL Generation · Efficient serialization for training
  • Quality Checks · Filtering, decontamination, toxicity screening
  • Full Dataset Assembly · Combining and balancing data sources
  • Contamination & Dedup Audits · Preventing benchmark leakage
  • Documentation & Governance · Data cards, provenance tracking
  • Data Mixture Optimization · Optimal domain ratios for training
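
The mechanics are plain Python. A minimal sketch (hypothetical records and file name) that exact-dedups by content hash and writes JSONL, one object per line:

import hashlib, json

records = [{"text": "hello world"}, {"text": "hello world"}, {"text": "goodbye"}]

seen, kept = set(), []
for r in records:
    h = hashlib.sha256(r["text"].encode()).hexdigest()  # exact-duplicate fingerprint
    if h not in seen:
        seen.add(h)
        kept.append(r)

with open("train.jsonl", "w") as f:
    for r in kept:
        f.write(json.dumps(r) + "\n")         # JSONL: one JSON object per line
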

Evaluation, Safety & Production

17 · Evaluation & Reliability — Benchmarks, calibration, A/B testing
Topic · Description
  • Capability Benchmarks · MMLU, HumanEval, evaluation methodology
  • Calibration & Uncertainty · Confidence vs. accuracy alignment
  • Robustness & Distribution Shift · Out-of-distribution detection
  • Error Analysis & Ablations · Systematic debugging
  • Online Experimentation & A/B Testing · Statistical rigor in deployment
18 · Alignment & Safety — SFT, RLHF, DPO, red-teaming
Topic · Description
  • Instruction Tuning & SFT · Supervised fine-tuning mathematics
  • Preference Optimization (RLHF & DPO) · Reward modeling, Bradley-Terry, DPO objective
  • Red-Teaming & Safety Evaluations · Adversarial robustness testing
  • Policy & Guardrails · Constitutional AI, rule-based filtering
  • Human-in-the-Loop & Monitoring · Active learning, feedback loops
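
As a flavor of the math, the DPO objective is a logistic loss on the reward margin implied by log-probabilities. A minimal numeric sketch (made-up log-probs, following the standard DPO formulation):

import numpy as np

def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    # -log sigmoid(beta * [(logp_w - ref_w) - (logp_l - ref_l)])
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Chosen response gained log-prob vs. the reference; rejected response lost it
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))   # ~0.60, below the log 2 ~ 0.69 indifference point
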
19 · Production ML & MLOps — Serving, monitoring, drift detection
Topic · Description
  • Data Versioning & Lineage · Reproducibility at scale
  • Experiment Tracking · MLflow, W&B — systematic experimentation
  • Feature Stores & Data Contracts · Consistent feature engineering
  • Model Serving & Inference Optimization · Latency, batching, hardware
  • Monitoring, Drift & Retraining · Detecting degradation
  • LLM Evaluation, Observability & Guardrails · LLM-specific ops

Advanced Theory

20 · Fourier Analysis & Signal Processing — FFT, wavelets, convolution theorem
Topic · ML Connection
  • Fourier Series · Periodic signal decomposition
  • Fourier Transform · Frequency domain analysis
  • DFT & FFT · Efficient spectral computation
  • Convolution Theorem · CNNs in frequency domain
  • Wavelets · Multi-resolution analysis, time-frequency
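
The convolution theorem is easy to verify numerically: linear convolution equals pointwise multiplication in the frequency domain. A minimal NumPy sketch:

import numpy as np

rng = np.random.default_rng(0)
x, k = rng.normal(size=64), rng.normal(size=64)

direct = np.convolve(x, k)                    # linear convolution, length 64 + 64 - 1
n = len(direct)
via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)
print(np.allclose(direct, via_fft))           # True
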
21 · Statistical Learning Theory — PAC learning, VC dimension, generalization
Topic · ML Connection
  • PAC Learning · Learnability guarantees
  • VC Dimension · Model complexity measurement
  • Bias-Variance Tradeoff · The fundamental modeling tension
  • Generalization Bounds · Why models work on unseen data
  • Rademacher Complexity · Data-dependent complexity measures
22 · Causal Inference — SCMs, do-calculus, counterfactuals
Topic · ML Connection
  • Structural Causal Models · Beyond correlation
  • Do-Calculus · Interventional reasoning
  • Counterfactuals · "What if" reasoning
  • Causal Discovery · Learning causal structure from data
23 · Game Theory — Nash equilibria, minimax, adversarial methods
Topic · ML Connection
  • Nash Equilibria · GAN training dynamics
  • Minimax Theorem · Adversarial robustness
  • Multi-Agent Systems · Cooperative/competitive learning
  • Adversarial Game Theory · Security and robustness
24 · Measure Theory — Sigma-algebras, Lebesgue integration, probability spaces
Topic · ML Connection
  • Sigma-Algebras · Rigorous probability foundations
  • Lebesgue Integration · Expectation in continuous spaces
  • Probability Measure Spaces · Formal probability theory
  • Radon-Nikodym Theorem · Density ratios, importance sampling
25 · Differential Geometry — Manifolds, Riemannian geometry, geodesics
Topic · ML Connection
  • Manifolds · Data lies on low-dimensional manifolds
  • Riemannian Geometry · Natural gradient, information geometry
  • Geodesics · Shortest paths in curved spaces
  • Optimization on Manifolds · Constrained optimization on curved surfaces

📊 What's Inside Each Topic

Every topic folder follows a consistent structure:

📂 02-Linear-Algebra-Basics/
├── 📂 01-Vectors-and-Spaces/
│   ├── 📖 notes.md           ← Concepts, intuition, key formulas
│   ├── 🔬 theory.ipynb       ← Interactive demos with visualizations
│   └── ✏️ exercises.ipynb    ← Practice problems with solutions
├── 📂 02-Matrix-Operations/
│   ├── 📖 notes.md
│   ├── 🔬 theory.ipynb
│   └── ✏️ exercises.ipynb
└── ...

📖 Resources

The docs/ folder contains supplementary references:

Document · Description
  • ML Math Map · Visual guide — which math is used where in ML
  • Notation Guide · Consistent notation conventions across the repo
  • Cheatsheet · Quick-reference formula sheet
  • Interview Prep · Common ML math interview questions with solutions
  • Visualization Guide · Tips for building mathematical intuition visually

🛠️ Tech Stack

Tool · Purpose
  • Python 3.8+ · Primary language
  • NumPy / SciPy · Numerical computing
  • Matplotlib / Seaborn / Plotly · Visualizations
  • SymPy · Symbolic mathematics
  • JupyterLab · Interactive notebooks
  • scikit-learn · ML examples and demos

🤝 Contributing

37 sections still need implementation — this is the primary way to contribute.

Implement a Missing Section (highest impact)

  1. Browse open issues labelled section: missing
  2. Comment on the issue to claim the section
  3. Fork the repo and create a branch: git checkout -b section/22-causal-inference/02-do-calculus
  4. Implement all three files following CONTRIBUTING.md:
    • notes.md (2000+ lines)
    • theory.ipynb (50+ cells, built with a Python builder script)
    • exercises.ipynb (8+ graded exercises)
  5. Open a Pull Request — link it to the issue

Chapters open for contribution

Chapter · Sections needed · Issues
  • 20 Fourier Analysis · 5 · Browse
  • 21 Statistical Learning Theory · 5 · Browse
  • 22 Causal Inference · 4 · Browse
  • 23 Game Theory · 4 · Browse
  • 24 Measure Theory · 4 · Browse
  • 25 Differential Geometry · 4 · Browse

Other ways to help

  • Fix errors — typo or incorrect formula? Open a PR directly
  • Improve exercises — add harder problems or better test cases
  • Add visualizations — interactive plots for existing sections

⭐ Star History

If this repo helped you, consider giving it a star — it helps others find it too.


📄 License

This project is open source and available under the MIT License.


Built for learners, researchers, and engineers who believe understanding the math makes you a better AI practitioner.


"In God we trust. All others must bring data." — W. Edwards Deming


Back to Top