
Mathematics for AI / ML / LLM

The Complete Math Foundation You Need to Master AI


A structured, open-source curriculum covering 25 domains of mathematics essential for AI, Machine Learning, and Large Language Models — from foundational concepts to cutting-edge research.

Get Started · Roadmap · Chapters · Resources · Contributing




Why This Repo?

Most AI/ML learners hit the math wall — papers full of symbols that feel alien, optimization steps that seem like magic, and model architectures that assume deep mathematical fluency.

This repository bridges that gap with a learn-by-doing approach:

  • Structured path from high school math to research-level topics
  • Notes → Theory → Exercises flow for every topic
  • Interactive Jupyter notebooks with visualizations, not just formulas
  • Real ML connections — every concept links to practical AI applications
  • Self-contained — no prerequisites beyond basic algebra

"The math you need depends on what you're building — this repo helps you find exactly that."


🚀 Quick Start

# Clone the repository
git clone https://github.com/prmtkr/math_for_llms.git
cd math_for_llms

# Set up the environment
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt

# Launch Jupyter
jupyter lab

Each topic follows a 3-step learning flow:

📖 notes.md          → Read the concepts and intuition
🔬 theory.ipynb      → Explore interactive demonstrations
✏️ exercises.ipynb   → Test your understanding

🗺️ Learning Roadmap

The curriculum covers 25 domains organized into 8 phases. Each phase builds on the previous one. Topics marked with ★ are critical for ML/AI.

  START HERE
      │
      ▼
 ┌──────────┐     ┌─────────────┐     ┌──────────┐     ┌──────────────┐
 │ Phase 1  │ ──▶ │   Phase 2   │ ──▶ │ Phase 3  │ ──▶ │   Phase 4    │
 │ Core     │     │ Probability │     │ Learning │     │ Deeper       │
 │ Math     │     │ & Stats     │     │ Engines  │     │ Theory       │
 └──────────┘     └─────────────┘     └──────────┘     └──────────────┘
                                                              │
      ┌───────────────────────────────────────────────────────┘
      ▼
 ┌──────────┐     ┌──────────┐     ┌──────────────┐     ┌──────────────┐
 │ Phase 5  │ ──▶ │ Phase 6  │ ──▶ │   Phase 7    │ ──▶ │   Phase 8    │
 │ ML Math  │     │ LLM Math │     │ Production   │     │ Research     │
 │          │     │          │     │ & Safety     │     │ Frontiers    │
 └──────────┘     └──────────┘     └──────────────┘     └──────────────┘

Phase 1 — Core Math Foundations (Ch. 01 → 02 → 04 → 05)

The mathematical language everything else is built on.

① MATHEMATICAL FOUNDATIONS (Ch.01)
├── Number Systems (ℕ ℤ ℚ ℝ ℂ)
├── Sets & Logic
├── Functions & Mappings
├── Σ Summation & Product Notation
├── Einstein Summation & Index Notation
└── Proof Techniques (induction, contradiction, direct)
        │
        ├──────────────────────────────────────────┐
        ▼                                          ▼
② LINEAR ALGEBRA (Ch.02 + Ch.03)            ③ CALCULUS (Ch.04 + Ch.05)
├── Vectors & Spaces                        ├── Limits & Continuity
├── Matrix Operations                       ├── Derivatives & Differentiation
├── Systems of Equations                    ├── Integration & Series
├── Determinants & Rank                     ├── Partial Derivatives & Gradients
├── Eigenvalues & Eigenvectors ★            ├── Jacobians & Hessians ★
├── SVD ★                                   ├── Chain Rule → Backpropagation ★
├── PCA                                     ├── Optimality Conditions
├── Orthogonality & Norms                   └── Automatic Differentiation ★
├── Positive Definite Matrices
└── Matrix Decompositions (LU/QR/Cholesky)

Phase 2 — Probabilistic Thinking (Ch. 06 → 07)

How to reason about uncertainty — the foundation of all ML inference.

④ PROBABILITY & STATISTICS (Ch.06 + Ch.07)
├── Random Variables & Distributions
├── Joint Distributions
├── Expectation & Moments
├── Concentration Inequalities
├── Stochastic Processes
├── Markov Chains ★
├── Descriptive Statistics
├── Estimation Theory & MLE
├── Bayesian Inference ★
├── Hypothesis Testing
├── Time Series
└── Regression Analysis

Phase 3 — Making Models Learn (Ch. 08 → 09)

The algorithms that train every model and the theory behind loss functions.

⑤ OPTIMIZATION (Ch.08)                     ⑥ INFORMATION THEORY (Ch.09)
├── Convex Optimization ★                  ├── Entropy (Shannon) ★
├── Gradient Descent (SGD/Mini-batch) ★    ├── KL Divergence ★
├── Second-Order Methods (Newton/BFGS)     ├── Mutual Information ★
├── Constrained Optimization (KKT)         ├── Cross-Entropy ★
├── Stochastic Optimization ★              └── Fisher Information
├── Optimization Landscape
├── Adaptive LR (Adam / RMSProp) ★
├── Regularization (L1/L2/Dropout)
├── Hyperparameter Optimization
└── Learning Rate Schedules

Phase 4 — Deeper Theory (Ch. 03 → 10 → 11 → 12)

Specialized math that powers specific ML architectures.

⑦ NUMERICAL METHODS (Ch.10)                ⑧ GRAPH THEORY (Ch.11)
├── Floating-Point Arithmetic              ├── Graph Basics & Representations
├── Numerical Linear Algebra               ├── Graph Algorithms
├── Numerical Optimization                 ├── Spectral Graph Theory ★
├── Interpolation & Approximation          ├── Graph Neural Networks ★
└── Numerical Integration                  └── Random Graphs

⑨ FUNCTIONAL ANALYSIS (Ch.12)
├── Normed Spaces
├── Hilbert Spaces ★
└── Kernel Methods (SVM / GP) ★

Phase 5 — ML Math in Practice (Ch. 13 → 14)

The math that directly appears inside ML models.

                        ╔════════════════════╗
                        ║  ML-SPECIFIC MATH  ║
                        ╚════════════════════╝
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
⑩ ML MATH CORE (Ch.13)   ⑪ DEEP LEARNING (Ch.14)   ⑫ RL (Ch.14)
├── Loss Functions ★      ├── Neural Net Math ★      ├── MDP (State/Action)
├── Activation Fns ★      ├── CNN & Convolution ★    ├── Bellman Equations ★
├── Normalization ★       ├── RNN & LSTM Math ★      ├── Policy Gradient ★
└── Sampling Methods      ├── Transformer ★          ├── Value Functions ★
                          ├── Generative (VAE/GAN)   └── Actor-Critic
                          └── Probabilistic Models

Phase 6 — LLM Math (Ch. 15 → 16)

Everything that makes Large Language Models work under the hood.

                         ╔══════════════════╗
                         ║  MATH FOR LLMs   ║
                         ╚══════════════════╝
                                  │
       ┌──────────────────────────┼──────────────────────────┐
       ▼                          ▼                          ▼
⑬ ATTENTION & ARCH         ⑭ TRAINING AT SCALE       ⑮ DATA PIPELINE
   (Ch.15)                    (Ch.15)                    (Ch.16)
├── Tokenization            ├── Scaling Laws ★         ├── Data Format Standards
├── Embedding Space         ├── Training at Scale      ├── JSONL Generation
├── Attention Mech ★        ├── Efficient Attention    ├── Quality Checks
├── Positional Enc ★        ├── MoE & Routing          ├── Dataset Assembly
└── LM Probability ★        ├── Quantization           ├── Contamination & Dedup
                            ├── Distillation           ├── Documentation
                            └── RAG & Retrieval        └── Data Mixture Optimization

Phase 7 — Production & Safety (Ch. 17 → 18 → 19)

Ship models responsibly — evaluate, align, and monitor.

⑯ EVALUATION (Ch.17)           ⑰ ALIGNMENT & SAFETY (Ch.18)     ⑱ PRODUCTION (Ch.19)
├── Capability Benchmarks      ├── SFT Math ★                    ├── Data Versioning & Lineage
├── Calibration & Uncertainty  ├── RLHF Math ★                   ├── Experiment Tracking
├── Robustness &               ├── DPO / Preference Opt ★        ├── Feature Stores &
│   Distribution Shift         ├── Policy & Guardrails           │   Data Contracts
├── Error Analysis &           └── Human-in-the-Loop             ├── Model Serving &
│   Ablations                      & Monitoring                  │   Inference Optimization
└── A/B Testing &                                                ├── Monitoring, Drift
    Experimentation                                              │   & Retraining
                                                                 └── LLM Observability
                                                                     & Guardrails

Phase 8 — Research Frontiers (Ch. 20 → 21 → 22 → 23 → 24 → 25)

Advanced theory for research-level understanding.

⑲ FOURIER & SIGNALS (Ch.20)    ⑳ STATISTICAL LEARNING (Ch.21)   ㉑ CAUSAL INFERENCE (Ch.22)
├── Fourier Series              ├── PAC Learning                  ├── Structural Causal Models
├── Fourier Transform           ├── VC Dimension                  ├── Do-Calculus
├── DFT & FFT                   ├── Bias-Variance Tradeoff        ├── Counterfactuals
├── Convolution Theorem         ├── Generalization Bounds         └── Causal Discovery
└── Wavelets                    └── Rademacher Complexity

㉒ GAME THEORY (Ch.23)          ㉓ MEASURE THEORY (Ch.24)         ㉔ DIFF. GEOMETRY (Ch.25)
├── Nash Equilibria             ├── Sigma-Algebras                ├── Manifolds
├── Minimax Theorem             ├── Lebesgue Integration          ├── Riemannian Geometry
├── Multi-Agent Systems         ├── Probability Measure Spaces    ├── Geodesics
└── Adversarial Game Theory     └── Radon-Nikodym Theorem         └── Optimization on Manifolds
╔═══════════════════════════════════════════════════════════════╗
║                                                               ║
║        ★  YOU ARE NOW A MATH-FOR-AI WIZARD  ★                 ║
║                                                               ║
║        ★ = Critical for ML/AI — prioritize these topics       ║
║                                                               ║
╚═══════════════════════════════════════════════════════════════╝

Quick Reference — Learning Order

Phase · Chapters · Focus
  • Phase 1 — Core Foundations · 01 → 02 → 04 → 05 · Numbers, vectors, matrices, derivatives, gradients
  • Phase 2 — Probabilistic Thinking · 06 → 07 · Random variables, distributions, estimation, inference
  • Phase 3 — Making Models Learn · 08 → 09 · Optimization algorithms, information-theoretic losses
  • Phase 4 — Deeper Theory · 03 → 10 → 11 → 12 · Advanced linear algebra, numerical methods, graphs, kernels
  • Phase 5 — ML Math in Practice · 13 → 14 · Loss functions, activations, architecture-specific math
  • Phase 6 — LLM Math · 15 → 16 · Attention, embeddings, scaling laws, training pipelines
  • Phase 7 — Production & Safety · 17 → 18 → 19 · Evaluation, alignment (RLHF/DPO), MLOps
  • Phase 8 — Research Frontiers · 20 → 21 → 22 → 23 → 24 → 25 · Fourier analysis, learning theory, causality, geometry

📚 Chapters

Core Mathematics

01 · Mathematical Foundations — Number systems, sets, logic, proofs
Topic · Description
  • Number Systems · Natural, integer, rational, real, and complex numbers (ℕ ℤ ℚ ℝ ℂ)
  • Sets & Logic · Set operations, propositional logic, quantifiers
  • Functions & Mappings · Domain, range, injectivity, surjectivity, composition
  • Summation & Product Notation · Sigma/Pi notation, index manipulation
  • Einstein Summation · Index notation used in tensor operations
  • Proof Techniques · Induction, contradiction, direct proof, contrapositive
02 · Linear Algebra Basics — Vectors, matrices, systems of equations
Topic · ML Connection
  • Vectors & Spaces · Feature representations, embeddings
  • Matrix Operations · Forward propagation, transformations
  • Systems of Equations · Linear regression (normal equations)
  • Determinants · Change of variables in normalizing flows
  • Matrix Rank · Model capacity, low-rank approximations
  • Vector Spaces & Subspaces · Dimensionality, feature spaces
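
To make the "ML connection" concrete, here is a minimal NumPy sketch (illustrative only, not from the chapter notebooks) that fits linear regression by solving the normal equations X^T X w = X^T y:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 samples, 3 features
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)   # noisy linear targets

# Normal equations: solve X^T X w = X^T y (solve is more stable than inverting)
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)                                  # close to [2.0, -1.0, 0.5]
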
03 · Advanced Linear Algebra — Eigen decomposition, SVD, PCA
Topic · ML Connection
  • Eigenvalues & Eigenvectors · PCA, spectral clustering, stability analysis
  • Singular Value Decomposition · Recommender systems, dimensionality reduction
  • Principal Component Analysis · Feature extraction, data compression
  • Linear Transformations · Neural network layers as transforms
  • Orthogonality & Orthonormality · Gram-Schmidt, decorrelated features
  • Matrix Norms · Regularization, operator bounds
  • Positive Definite Matrices · Covariance matrices, kernel validity
  • Matrix Decompositions · LU, QR, Cholesky — efficient solvers
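
As a quick illustration of how SVD and PCA connect, a small NumPy sketch (illustrative, not repo code) that extracts principal components from centered data:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)                       # PCA requires centered data

# Principal axes are the rows of Vt; singular values give explained variance
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_var = S**2 / (len(Xc) - 1)          # eigenvalues of the covariance matrix
X_2d = Xc @ Vt[:2].T                          # project onto the top-2 components
print(X_2d.shape)                             # (200, 2)
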
04 · Calculus Fundamentals — Limits, derivatives, integrals, series
Topic · ML Connection
  • Limits & Continuity · Convergence guarantees, activation smoothness
  • Derivatives & Differentiation · Gradient computation for all parameters
  • Integration · Probability densities, normalization constants
  • Series & Sequences · Taylor approximations, convergence analysis
05 · Multivariate Calculus — Gradients, Jacobians, backpropagation
Topic · ML Connection
  • Partial Derivatives & Gradients · Direction of steepest descent
  • Jacobians & Hessians · Multi-output functions, second-order methods
  • Chain Rule & Backpropagation · Training every neural network
  • Optimality Conditions · Convergence criteria, saddle points
  • Automatic Differentiation · PyTorch autograd, JAX
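
Gradient checking is the workhorse behind trusting backpropagation. A minimal sketch (with a made-up function, not from the notebooks) comparing an analytic gradient against central finite differences:

import numpy as np

def f(v):
    return np.sin(v[0]) * v[1]**2              # f(x, y) = sin(x) * y^2

def grad_f(v):
    return np.array([np.cos(v[0]) * v[1]**2,   # df/dx
                     2 * np.sin(v[0]) * v[1]]) # df/dy

v, eps = np.array([0.7, 1.3]), 1e-6
num = np.array([(f(v + eps * e) - f(v - eps * e)) / (2 * eps)
                for e in np.eye(2)])           # central differences
print(np.allclose(num, grad_f(v)))             # True if the analytic gradient is right
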

Probability, Statistics & Optimization

06 · Probability Theory — Distributions, expectations, stochastic processes
Topic · ML Connection
  • Random Variables · Output uncertainty, stochastic models
  • Common Distributions · Gaussian, Bernoulli, Poisson — model assumptions
  • Joint Distributions · Multivariate modeling, copulas
  • Expectation & Moments · Loss functions, feature statistics
  • Concentration Inequalities · Generalization bounds, sample complexity
  • Stochastic Processes · Time series, diffusion models
  • Markov Chains · MCMC sampling, language modeling
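
For instance, the stationary distribution of a Markov chain (the quantity behind MCMC and PageRank-style methods) is just an eigenvector problem. A minimal sketch with a made-up 3-state chain:

import numpy as np

# Transition matrix of a 3-state chain (rows sum to 1)
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

# Stationary distribution: left eigenvector of P with eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()                                # normalize to a probability vector
print(np.allclose(pi @ P, pi))                # True: pi is stationary
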
07 · Statistics — Estimation, testing, Bayesian inference, regression
Topic · ML Connection
  • Descriptive Statistics · EDA, feature engineering
  • Estimation Theory · MLE, MAP — training as estimation
  • Hypothesis Testing · A/B testing, model comparison
  • Bayesian Inference · Posterior updates, uncertainty quantification
  • Time Series · Sequence forecasting, temporal patterns
  • Regression Analysis · Baseline models, diagnostics
08 · Optimization — SGD, Adam, constrained optimization, regularization
Topic · ML Connection
  • Convex Optimization · Global guarantees, convergence proofs
  • Gradient Descent · The engine behind all training
  • Second-Order Methods · Newton, BFGS — faster convergence
  • Constrained Optimization · Lagrange multipliers, KKT conditions
  • Stochastic Optimization · SGD, mini-batch — scaling to big data
  • Optimization Landscape · Local minima, saddle points, loss surfaces
  • Adaptive Learning Rate · Adam, RMSProp, AdaGrad
  • Regularization Methods · L1/L2, Dropout, weight decay
  • Hyperparameter Optimization · Grid search, Bayesian optimization
  • Learning Rate Schedules · Warmup, cosine annealing, step decay
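
The core loop is simpler than it sounds. A minimal gradient-descent sketch on least squares (illustrative; the chapter notebooks go much further):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
w_true = np.array([1.0, -2.0])
y = X @ w_true

w, lr = np.zeros(2), 1e-3
for _ in range(2000):
    grad = 2 * X.T @ (X @ w - y)              # gradient of the squared loss ||Xw - y||^2
    w -= lr * grad                            # step against the gradient
print(w)                                      # approaches [1.0, -2.0]
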

Information Theory & Numerical Methods

09 · Information Theory — Entropy, KL divergence, cross-entropy
Topic · ML Connection
  • Entropy · Decision tree splits, uncertainty measurement
  • KL Divergence · VAE loss, knowledge distillation
  • Mutual Information · Feature selection, InfoGAN
  • Cross-Entropy · The most common classification loss
  • Fisher Information · Efficient estimation, natural gradient
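
These quantities are one line of NumPy each, and they satisfy the identity H(p, q) = H(p) + KL(p || q), which is why minimizing cross-entropy also minimizes KL. A sketch with made-up distributions:

import numpy as np

p = np.array([0.7, 0.2, 0.1])                 # "true" distribution
q = np.array([0.5, 0.3, 0.2])                 # model distribution

entropy = -np.sum(p * np.log(p))              # H(p)
cross_entropy = -np.sum(p * np.log(q))        # H(p, q)
kl = np.sum(p * np.log(p / q))                # KL(p || q)
print(np.isclose(cross_entropy, entropy + kl))  # True
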
10 · Numerical Methods — Floating-point, stability, interpolation, integration

📖 Chapter README

Topic · ML Connection
  • Floating-Point Arithmetic · Mixed-precision training (FP16/BF16/FP8), loss scaling, Flash Attention numerics
  • Numerical Linear Algebra · Stable solvers, iterative methods (CG/Lanczos), condition numbers in training
  • Numerical Optimization · L-BFGS two-loop recursion, Armijo line search, gradient checking, trust-region methods
  • Interpolation & Approximation · RoPE/sinusoidal PE, KAN B-splines, Runge's phenomenon, FFT, random Fourier features
  • Numerical Integration · Gaussian quadrature, Monte Carlo variance reduction, reparameterization trick (VAE ELBO)
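
A classic example of why this chapter matters for LLMs: naive log-sum-exp (the core of softmax and perplexity) overflows, and the standard max-shift trick fixes it. A minimal sketch:

import numpy as np

def logsumexp(z):
    m = z.max()                               # shift by the max before exponentiating
    return m + np.log(np.exp(z - m).sum())

z = np.array([1000.0, 1000.5])                # values this large overflow exp() in float64
print(np.log(np.exp(z).sum()))                # inf (overflow)
print(logsumexp(z))                           # ~1000.97 (stable)
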

Specialized Mathematics

11 · Graph Theory — Graph algorithms, spectral methods, GNNs
Topic · ML Connection
  • Graph Basics · Social networks, molecular graphs
  • Graph Representations · Adjacency/Laplacian matrices
  • Graph Algorithms · Shortest path, centrality, traversal
  • Spectral Graph Theory · Community detection, graph wavelets
  • Graph Neural Networks · Message passing, GCN, GAT
  • Random Graphs · Erdős-Rényi, network analysis
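
One satisfying spectral fact: the multiplicity of the zero eigenvalue of the graph Laplacian counts connected components. A minimal sketch with a made-up 6-node graph:

import numpy as np

# Two disjoint triangles, so two connected components
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1

L = np.diag(A.sum(axis=1)) - A                # graph Laplacian L = D - A
eigvals = np.linalg.eigvalsh(L)
print(np.sum(np.isclose(eigvals, 0)))         # 2: one zero eigenvalue per component
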
12 · Functional Analysis — Hilbert spaces, kernel methods
Topic · ML Connection
  • Normed Spaces · Regularization theory
  • Hilbert Spaces · RKHS, function space learning
  • Kernel Methods · SVM, Gaussian processes, kernel trick

ML-Specific Mathematics

13 · ML-Specific Math — Loss functions, activations, normalization, sampling
Topic · ML Connection
  • Loss Functions · MSE, cross-entropy, hinge, contrastive
  • Activation Functions · ReLU, GELU, sigmoid, softmax — and their gradients
  • Normalization Techniques · BatchNorm, LayerNorm, RMSNorm
  • Sampling Methods · MCMC, rejection sampling, importance sampling
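
The softmax/cross-entropy pair has a famously clean gradient: probabilities minus the one-hot target. A minimal sketch (illustrative logits, not repo code):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                   # max-shift for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
loss = -np.log(probs[0])                      # cross-entropy with true class 0
grad = probs - np.eye(3)[0]                   # d(loss)/d(logits) = p - one_hot
print(probs.round(3), round(loss, 3), grad.round(3))
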
14 · Math for Specific Models — NNs, CNNs, RNNs, Transformers, GANs, RL
Topic · ML Connection
  • Linear Models · Regression, classification foundations
  • Neural Networks · Universal approximation, backprop math
  • Probabilistic Models · GMMs, HMMs, variational inference
  • RNN & LSTM Math · Vanishing gradients, gating mechanisms
  • Transformer Architecture · Attention is all you need — the math
  • Reinforcement Learning · Bellman equations, policy gradients
  • Generative Models · VAEs, GANs, diffusion models
  • CNN & Convolution Math · Convolution theorem, pooling, receptive fields

LLM Mathematics

15 · Math for LLMs — Attention, embeddings, scaling laws, inference
Topic · ML Connection
  • Tokenization Math · BPE, WordPiece — information-theoretic foundations
  • Embedding Space Math · Geometric properties of learned representations
  • Attention Mechanism Math · Scaled dot-product, multi-head, causal masking
  • Positional Encodings · Sinusoidal, RoPE, ALiBi
  • Language Model Probability · Next-token prediction, perplexity
  • Training at Scale · Distributed training, gradient accumulation
  • Fine-Tuning Math · LoRA, adapters, parameter-efficient methods
  • Scaling Laws · Chinchilla, compute-optimal training
  • Efficient Attention & Inference · FlashAttention, KV-cache, speculative decoding
  • Mixture of Experts & Routing · Sparse gating, load balancing
  • Quantization & Distillation · INT8/INT4, knowledge distillation
  • RAG Math & Retrieval · Retrieval-augmented generation
  • Serving & Systems Tradeoffs · Latency, throughput, batching strategies
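
Scaled dot-product attention fits in a dozen lines of NumPy. A single-head causal sketch (illustrative shapes; the notebooks cover multi-head attention and masking in detail):

import numpy as np

def attention(Q, K, V, causal=True):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)             # scaled dot-product
    if causal:
        scores += np.triu(np.full(scores.shape, -np.inf), k=1)  # hide future tokens
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)        # row-wise softmax
    return w @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))           # 4 tokens, head dim 8 (self-attention)
print(attention(Q, K, V).shape)               # (4, 8)
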
16 · LLM Training Data Pipeline — Data quality, deduplication, mixture optimization
Topic · Description
  • Data Format Standards · JSONL, tokenized formats, schema validation
  • JSONL Generation · Efficient serialization for training
  • Quality Checks · Filtering, decontamination, toxicity screening
  • Full Dataset Assembly · Combining and balancing data sources
  • Contamination & Dedup Audits · Preventing benchmark leakage
  • Documentation & Governance · Data cards, provenance tracking
  • Data Mixture Optimization · Optimal domain ratios for training
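
The mechanics are plain Python. A minimal sketch (hypothetical records and file name) that exact-dedups by content hash and writes JSONL, one object per line:

import hashlib, json

records = [{"text": "hello world"}, {"text": "hello world"}, {"text": "goodbye"}]

seen, kept = set(), []
for r in records:
    h = hashlib.sha256(r["text"].encode()).hexdigest()  # exact-duplicate fingerprint
    if h not in seen:
        seen.add(h)
        kept.append(r)

with open("train.jsonl", "w") as f:
    for r in kept:
        f.write(json.dumps(r) + "\n")         # JSONL: one JSON object per line
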

Evaluation, Safety & Production

17 · Evaluation & Reliability — Benchmarks, calibration, A/B testing
Topic · Description
  • Capability Benchmarks · MMLU, HumanEval, evaluation methodology
  • Calibration & Uncertainty · Confidence vs. accuracy alignment
  • Robustness & Distribution Shift · Out-of-distribution detection
  • Error Analysis & Ablations · Systematic debugging
  • Online Experimentation & A/B Testing · Statistical rigor in deployment
18 · Alignment & Safety — SFT, RLHF, DPO, red-teaming
Topic · Description
  • Instruction Tuning & SFT · Supervised fine-tuning mathematics
  • Preference Optimization (RLHF & DPO) · Reward modeling, Bradley-Terry, DPO objective
  • Red-Teaming & Safety Evaluations · Adversarial robustness testing
  • Policy & Guardrails · Constitutional AI, rule-based filtering
  • Human-in-the-Loop & Monitoring · Active learning, feedback loops
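
As a flavor of the math, the DPO objective is a logistic loss on the reward margin implied by log-probabilities. A minimal numeric sketch (made-up log-probs, following the standard DPO formulation):

import numpy as np

def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    # -log sigmoid(beta * [(logp_w - ref_w) - (logp_l - ref_l)])
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Chosen response gained log-prob vs. the reference; rejected response lost it
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))   # ~0.60, below the log 2 ~ 0.69 indifference point
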
19 · Production ML & MLOps — Serving, monitoring, drift detection
Topic · Description
  • Data Versioning & Lineage · Reproducibility at scale
  • Experiment Tracking · MLflow, W&B — systematic experimentation
  • Feature Stores & Data Contracts · Consistent feature engineering
  • Model Serving & Inference Optimization · Latency, batching, hardware
  • Monitoring, Drift & Retraining · Detecting degradation
  • LLM Evaluation, Observability & Guardrails · LLM-specific ops

Advanced Theory

20 · Fourier Analysis & Signal Processing — FFT, wavelets, convolution theorem
Topic · ML Connection
  • Fourier Series · Periodic signal decomposition
  • Fourier Transform · Frequency domain analysis
  • DFT & FFT · Efficient spectral computation
  • Convolution Theorem · CNNs in frequency domain
  • Wavelets · Multi-resolution analysis, time-frequency
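
The convolution theorem is easy to verify numerically: linear convolution equals pointwise multiplication in the frequency domain. A minimal NumPy sketch:

import numpy as np

rng = np.random.default_rng(0)
x, k = rng.normal(size=64), rng.normal(size=64)

direct = np.convolve(x, k)                    # linear convolution, length 64 + 64 - 1
n = len(direct)
via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)
print(np.allclose(direct, via_fft))           # True
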
21 · Statistical Learning Theory — PAC learning, VC dimension, generalization
Topic · ML Connection
  • PAC Learning · Learnability guarantees
  • VC Dimension · Model complexity measurement
  • Bias-Variance Tradeoff · The fundamental modeling tension
  • Generalization Bounds · Why models work on unseen data
  • Rademacher Complexity · Data-dependent complexity measures
22 · Causal Inference — SCMs, do-calculus, counterfactuals
Topic · ML Connection
  • Structural Causal Models · Beyond correlation
  • Do-Calculus · Interventional reasoning
  • Counterfactuals · "What if" reasoning
  • Causal Discovery · Learning causal structure from data
23 · Game Theory — Nash equilibria, minimax, adversarial methods
Topic · ML Connection
  • Nash Equilibria · GAN training dynamics
  • Minimax Theorem · Adversarial robustness
  • Multi-Agent Systems · Cooperative/competitive learning
  • Adversarial Game Theory · Security and robustness
24 · Measure Theory — Sigma-algebras, Lebesgue integration, probability spaces
Topic · ML Connection
  • Sigma-Algebras · Rigorous probability foundations
  • Lebesgue Integration · Expectation in continuous spaces
  • Probability Measure Spaces · Formal probability theory
  • Radon-Nikodym Theorem · Density ratios, importance sampling
25 · Differential Geometry — Manifolds, Riemannian geometry, geodesics
Topic · ML Connection
  • Manifolds · Data lies on low-dimensional manifolds
  • Riemannian Geometry · Natural gradient, information geometry
  • Geodesics · Shortest paths in curved spaces
  • Optimization on Manifolds · Constrained optimization on curved surfaces

📊 What's Inside Each Topic

Every topic folder follows a consistent structure:

📂 02-Linear-Algebra-Basics/
├── 📂 01-Vectors-and-Spaces/
│   ├── 📖 notes.md           ← Concepts, intuition, key formulas
│   ├── 🔬 theory.ipynb       ← Interactive demos with visualizations
│   └── ✏️ exercises.ipynb    ← Practice problems with solutions
├── 📂 02-Matrix-Operations/
│   ├── 📖 notes.md
│   ├── 🔬 theory.ipynb
│   └── ✏️ exercises.ipynb
└── ...

📖 Resources

The docs/ folder contains supplementary references:

Document · Description
  • ML Math Map · Visual guide — which math is used where in ML
  • Notation Guide · Consistent notation conventions across the repo
  • Cheatsheet · Quick-reference formula sheet
  • Interview Prep · Common ML math interview questions with solutions
  • Visualization Guide · Tips for building mathematical intuition visually

🛠️ Tech Stack

Tool · Purpose
  • Python 3.8+ · Primary language
  • NumPy / SciPy · Numerical computing
  • Matplotlib / Seaborn / Plotly · Visualizations
  • SymPy · Symbolic mathematics
  • JupyterLab · Interactive notebooks
  • scikit-learn · ML examples and demos

🤝 Contributing

37 sections still need implementation — this is the primary way to contribute.

Implement a Missing Section (highest impact)

  1. Browse open issues labelled section: missing
  2. Comment on the issue to claim the section
  3. Fork the repo and create a branch: git checkout -b section/22-causal-inference/02-do-calculus
  4. Implement all three files following CONTRIBUTING.md:
    • notes.md (2000+ lines)
    • theory.ipynb (50+ cells, built with a Python builder script)
    • exercises.ipynb (8+ graded exercises)
  5. Open a Pull Request — link it to the issue

Chapters open for contribution

Chapter · Sections needed · Issues
  • 20 Fourier Analysis · 5 · Browse
  • 21 Statistical Learning Theory · 5 · Browse
  • 22 Causal Inference · 4 · Browse
  • 23 Game Theory · 4 · Browse
  • 24 Measure Theory · 4 · Browse
  • 25 Differential Geometry · 4 · Browse

Other ways to help

  • Fix errors — typo or incorrect formula? Open a PR directly
  • Improve exercises — add harder problems or better test cases
  • Add visualizations — interactive plots for existing sections

⭐ Star History

If this repo helped you, consider giving it a star — it helps others find it too.


📄 License

This project is open source and available under the MIT License.


Built for learners, researchers, and engineers who believe understanding the math makes you a better AI practitioner.


"In God we trust. All others must bring data." — W. Edwards Deming


Back to Top