A precise mapping from mathematical concepts to their concrete roles in modern
ML systems. Each entry cites the specific model, paper, or algorithm where the
mathematics is load-bearing — not merely present.
How to Read This Document
Each section identifies a mathematical domain, lists its core concepts, and maps
each concept to (a) the ML context, (b) the specific operation or formula, and
(c) a concrete example from the literature. This is a research-grade reference, not a survey.
1. Linear Algebra → ML
1.1 Matrix Multiplication
| Formula | ML operation | Role / cost | Example |
|---|---|---|---|
| y = Wx + b | Fully-connected layer | Forward pass | Every dense layer in every neural network |
| Attention(Q, K, V) = softmax(QK⊤/√d_k)V | Self-attention | O(n²d) in sequence length n | Transformers: GPT-4, LLaMA-3, Gemini |
| H^(l+1) = σ(Â H^(l) W^(l)) | Graph convolution | Message passing | GCN (Kipf & Welling, 2017) |
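A minimal NumPy sketch of the first two rows: the dense forward pass and single-head scaled dot-product attention. Shapes, seeds, and names are illustrative assumptions, not taken from any cited model:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dense(x, W, b):
    return W @ x + b                          # y = Wx + b: the fully-connected forward pass

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (n, n) pairwise similarities: the O(n²d) term
    return softmax(scores, axis=-1) @ V       # row-wise convex combination of the values

rng = np.random.default_rng(0)
n, d = 4, 8                                   # toy sequence length and model width
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(attention(Q, K, V).shape)               # (4, 8)
print(dense(rng.standard_normal(d), rng.standard_normal((3, d)), np.zeros(3)).shape)  # (3,)
```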
1.2 Eigendecomposition
| Concept | ML role | Where |
|---|---|---|
| Av = λv | Stability of RNN hidden state | ρ(W_h) < 1 prevents exploding gradients |
| Top eigenvector of X⊤X | First principal component | PCA for dimensionality reduction |
| Eigenvalues of graph Laplacian L | Spectral graph convolution | ChebNet, spectral GNNs |
| Hessian spectrum ∇²L | Loss landscape sharpness | Sharpness-aware minimisation (Foret et al., 2021) |
| Neural Tangent Kernel eigenvalues | Training speed of wide networks | NTK theory (Jacot et al., 2018) |
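Two rows of this table in code, as a sketch under toy assumptions (random data, NumPy only): the top eigenvector of X⊤X as the first principal component, and the spectral radius ρ(W_h) of a recurrent weight matrix as a stability check:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X = X - X.mean(axis=0)                        # centre so X⊤X is proportional to the covariance

# First principal component = top eigenvector of X⊤X
eigvals, eigvecs = np.linalg.eigh(X.T @ X)    # eigh: symmetric input, ascending eigenvalues
pc1 = eigvecs[:, -1]                          # eigenvector of the largest eigenvalue
print(eigvals[-1] / eigvals.sum())            # fraction of variance captured by pc1

# RNN stability heuristic: spectral radius of the recurrent weight matrix
W_h = 0.3 * rng.standard_normal((5, 5))
rho = np.abs(np.linalg.eigvals(W_h)).max()    # ρ(W_h)
print(rho < 1)                                # ρ(W_h) < 1 keeps h_t = W_h h_{t-1} from exploding
```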
1.3 SVD
| Concept | ML role | Where |
|---|---|---|
| A = UΣV⊤ | Low-rank weight decomposition | LoRA (Hu et al., 2022): ΔW = BA |
| Eckart–Young theorem | Optimal rank-k approximation | Matrix factorisation for recommenders |
| Pseudoinverse A† | Least-squares solution | Normal equations; ridge regression |
| Singular value spectrum | Weight matrix health | WeightWatcher (Martin & Mahoney, 2021) |
| Randomised SVD | Scalable approximation | Halko, Martinsson & Tropp (2011) |
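A sketch of the rank-k truncation whose optimality the Eckart–Young theorem guarantees, alongside a LoRA-style low-rank update ΔW = BA; the dimensions and scaling are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
k = 8

# Optimal rank-k approximation (Eckart–Young): keep the k largest singular triplets
U, S, Vt = np.linalg.svd(W, full_matrices=False)
W_k = (U[:, :k] * S[:k]) @ Vt[:k, :]          # columns of U scaled by the singular values

# LoRA-style update: ΔW = BA, trained in place of a full-rank correction to W
B = 0.01 * rng.standard_normal((64, k))
A = 0.01 * rng.standard_normal((k, 64))
delta_W = B @ A                               # rank ≤ k by construction

print(np.linalg.matrix_rank(W_k), np.linalg.matrix_rank(delta_W))   # 8 8
print(np.linalg.norm(W - W_k))                # no rank-8 matrix gets closer in Frobenius norm
```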
1.4 Norms and Regularisation
| Norm | Regulariser | Effect | Used in |
|---|---|---|---|
| ∥θ∥₂² | L2 / weight decay | Penalises large weights | AdamW, all modern LLMs |
| ∥θ∥₁ | L1 / Lasso | Induces sparsity | Sparse fine-tuning |
| ∥W∥₂ = σ_max(W) | Spectral normalisation | Lipschitz constraint | GANs (Miyato et al., 2018) |
| ∥A∥∗ (nuclear) | Nuclear norm | Low-rank inductive bias | Matrix completion |
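Spectral normalisation is usually implemented with power iteration rather than a full SVD. The one-shot sketch below estimates σ_max(W) and rescales W; Miyato et al. (2018) instead carry the iteration vector across training steps, which this toy version does not:

```python
import numpy as np

def spectral_norm(W, n_iter=200):
    """Estimate σ_max(W) by alternating power iteration on W and W⊤."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return u @ W @ v                          # converges to the top singular value

W = np.random.default_rng(1).standard_normal((32, 16))
sigma = spectral_norm(W)
W_sn = W / sigma                              # ∥W_sn∥₂ ≈ 1: the Lipschitz constraint
print(sigma, np.linalg.svd(W, compute_uv=False)[0])   # the two estimates agree
```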
2. Calculus → ML
2.1 Chain Rule = Backpropagation
The chain rule is not merely used in backpropagation — it is backpropagation.
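Concretely, each backward step is the chain rule ∂L/∂h = (∂a/∂h)⊤ ∂L/∂a applied layer by layer. Below is a hand-rolled sketch for a two-layer ReLU network, checked against a finite difference; all names are illustrative, and this mirrors what autograd systems automate rather than any particular library's API:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((1, 8))

# Forward pass: L = 0.5 * ∥W2 · relu(W1 · x)∥²
h = W1 @ x
a = np.maximum(h, 0.0)                        # ReLU
y = W2 @ a
L = 0.5 * (y ** 2).sum()

# Backward pass: the chain rule applied layer by layer, output to input
dL_dy = y                                     # ∂(0.5 y²)/∂y
dL_da = W2.T @ dL_dy                          # chain through y = W2 a
dL_dh = dL_da * (h > 0)                       # chain through the ReLU
dL_dW2 = np.outer(dL_dy, a)                   # parameter gradients fall out of
dL_dW1 = np.outer(dL_dh, x)                   # the same intermediate products

# Check one entry against a finite difference
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
Lp = 0.5 * ((W2 @ np.maximum(W1p @ x, 0.0)) ** 2).sum()
print(np.isclose(dL_dW1[0, 0], (Lp - L) / eps, atol=1e-4))   # True
```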
Chapter → Model Map
This table maps each chapter of the repository to the specific models and papers
in which its mathematics is load-bearing.
| Chapter | Core math | Primary models / papers |
|---|---|---|
| 02 Linear Algebra Basics | Matrix ops, rank, projections | Every neural network |
| 03 Advanced Linear Algebra | SVD, eigenvalues | LoRA, PCA, WeightWatcher |
| 04 Calculus Fundamentals | Derivatives, chain rule | Backpropagation (Rumelhart et al., 1986) |
| 05 Multivariate Calculus | Jacobian, Hessian | Adam, K-FAC, SAM |
| 06 Probability Theory | Distributions, Bayes | VAE, DDPM, Bayesian deep learning |
| 07 Statistics | MLE, MAP, hypothesis tests | Training objectives, model selection |
| 08 Optimisation | Convexity, GD, constraints | All training algorithms |
| 09 Information Theory | Entropy, KL, MI | Cross-entropy loss, RLHF, contrastive learning |
| 10 Numerical Methods | Condition number, stability | Mixed precision, numerical autograd |
| 11 Graph Theory | Laplacian, random walks | GCN, GAT, Node2Vec |
| 12 Functional Analysis | Hilbert spaces, RKHS | SVMs, kernel methods, NTK theory |
| 13 ML-Specific Math | Attention math, normalisation | Transformers (Vaswani et al., 2017) |
| 14 Math for Specific Models | RNN/LSTM, CNN, GAN | Sequence models, generative models |
This map is updated with each new section added to the curriculum.
For definitive references on mathematics for ML, see
Goodfellow, Bengio & Courville (2016); Bishop (2006); and Shalev-Shwartz & Ben-David (2014).