Exercises Notebook: Math for LLMs

Integration

Calculus Fundamentals / Integration



Converted from exercises.ipynb for web reading.

Integration - Exercises

10 graded exercises covering the full section arc, from core calculus mechanics to ML-facing applications.

Format          Description
Problem         Markdown cell with task description
Your Solution   Code cell for learner work
Solution        Reference solution with checks

Difficulty: straightforward -> moderate -> challenging.

Code cell 2

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False

mpl.rcParams.update({
    "figure.figsize":    (10, 6),
    "figure.dpi":         120,
    "font.size":           13,
    "axes.titlesize":      15,
    "axes.labelsize":      13,
    "xtick.labelsize":     11,
    "ytick.labelsize":     11,
    "legend.fontsize":     11,
    "legend.framealpha":   0.85,
    "lines.linewidth":      2.0,
    "axes.spines.top":     False,
    "axes.spines.right":   False,
    "savefig.bbox":       "tight",
    "savefig.dpi":         150,
})
np.random.seed(42)
print("Plot setup complete.")

Code cell 3

import numpy as np
import numpy.linalg as la
from scipy import integrate, special, stats
from math import factorial
import matplotlib.patches as patches

COLORS = {
    "primary": "#0077BB",
    "secondary": "#EE7733",
    "tertiary": "#009988",
    "error": "#CC3311",
    "neutral": "#555555",
    "highlight": "#EE3377",
}
HAS_MPL = True
np.set_printoptions(precision=8, suppress=True)
np.random.seed(42)

def header(title):
    print("\n" + "=" * len(title))
    print(title)
    print("=" * len(title))

def check_true(name, cond):
    ok = bool(cond)
    print(f"{'PASS' if ok else 'FAIL'} - {name}")
    return ok

def check_close(name, got, expected, tol=1e-8):
    ok = np.allclose(got, expected, atol=tol, rtol=tol)
    print(f"{'PASS' if ok else 'FAIL'} - {name}: got {got}, expected {expected}")
    return ok



def centered_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

def forward_diff(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

def backward_diff(f, x, h=1e-6):
    return (f(x) - f(x - h)) / h



def grad_check(f, x, analytic_grad, h=1e-6):
    x = np.asarray(x, dtype=float)
    analytic_grad = np.asarray(analytic_grad, dtype=float)
    numeric_grad = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(x.shape):
        x_plus = x.copy(); x_minus = x.copy()
        x_plus[idx] += h; x_minus[idx] -= h
        numeric_grad[idx] = (f(x_plus) - f(x_minus)) / (2 * h)
    denom = la.norm(analytic_grad) + la.norm(numeric_grad) + 1e-12
    return la.norm(analytic_grad - numeric_grad) / denom



def check(name, got, expected, tol=1e-8):
    return check_close(name, got, expected, tol=tol)

print("Chapter helper setup complete.")

Exercise 1 (★): Riemann Sums and Convergence

Consider $f(x) = x^2$ on $[0, 1]$. The true integral is $\int_0^1 x^2\,dx = \frac{1}{3}$.

(a) Implement the left Riemann sum $L_n = \frac{1}{n}\sum_{i=0}^{n-1} f(x_i)$ where $x_i = i/n$.

(b) Implement the right Riemann sum $R_n$ and the midpoint rule $M_n$.

(c) Compute $L_n$, $R_n$, $M_n$ for $n \in \{10, 100, 1000\}$ and measure the error $|\text{approx} - 1/3|$.

(d) What is the convergence order of each method? Fit a log-log slope to confirm $O(h)$ for left/right and $O(h^2)$ for midpoint.

Code cell 5

# Your Solution
# Exercise 1 - learner workspace
# Write your solution here, then run the reference solution below to compare.
print("Learner workspace ready for Exercise 1.")

Code cell 6

# Solution
# Exercise 1 - reference solution

import numpy as np

def f(x): return x**2
a, b = 0.0, 1.0
true_val = 1/3

def left_riemann(f, a, b, n):
    xs = np.linspace(a, b, n+1)[:-1]
    h = (b-a)/n
    return h * np.sum(f(xs))

def right_riemann(f, a, b, n):
    xs = np.linspace(a, b, n+1)[1:]
    h = (b-a)/n
    return h * np.sum(f(xs))

def midpoint_rule(f, a, b, n):
    xs = np.linspace(a, b, n+1)
    mids = (xs[:-1] + xs[1:]) / 2
    h = (b-a)/n
    return h * np.sum(f(mids))

header('Exercise 1: Riemann Sums and Convergence')

ns = [10, 100, 1000]
for n in ns:
    L = left_riemann(f, a, b, n)
    R = right_riemann(f, a, b, n)
    M = midpoint_rule(f, a, b, n)
    print(f'n={n:5d}: L_err={abs(L-true_val):.2e}, R_err={abs(R-true_val):.2e}, M_err={abs(M-true_val):.2e}')

# (d) Convergence order
ns_fit = [10, 50, 100, 500, 1000, 5000]
errs_L = [abs(left_riemann(f,a,b,n) - true_val) for n in ns_fit]
errs_M = [abs(midpoint_rule(f,a,b,n) - true_val) for n in ns_fit]
log_ns = np.log(ns_fit)
slope_L = np.polyfit(log_ns, np.log(errs_L), 1)[0]
slope_M = np.polyfit(log_ns, np.log(errs_M), 1)[0]
check_close('Left Riemann convergence order ≈ -1', slope_L, -1.0, tol=0.2)
check_close('Midpoint convergence order ≈ -2', slope_M, -2.0, tol=0.3)

print(f'\nLeft/Right: O(h^{abs(slope_L):.2f}), Midpoint: O(h^{abs(slope_M):.2f})')
print('\nTakeaway: Midpoint rule is O(h²) — same as trapezoid, both better than O(h) left/right sums.')

print("Exercise 1 solution complete.")

Exercise 2 (★): Fundamental Theorem of Calculus

(a) Define the accumulation function $F(x) = \int_0^x \sin(t^2)\,dt$. Implement it numerically using scipy.integrate.quad and verify that $F'(x) = \sin(x^2)$ using centered finite differences.

(b) Use FTC Part 2 to evaluate $\int_1^4 (3x^2 - 2x + 1)\,dx$ analytically, then verify numerically.

(c) Compute $\frac{d}{dx}\int_{x}^{x^2} \cos(t)\,dt$ using the Leibniz rule (moving-limits version). Verify at $x = 0.5$.
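The moving-limits form of the Leibniz rule needed in (c) is the standard identity

```latex
\frac{d}{dx}\int_{a(x)}^{b(x)} f(t)\,dt \;=\; f\big(b(x)\big)\,b'(x) \;-\; f\big(a(x)\big)\,a'(x)
```

With $a(x) = x$, $b(x) = x^2$, and $f = \cos$, this gives $2x\cos(x^2) - \cos(x)$.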

Code cell 8

# Your Solution
# Exercise 2 - learner workspace
# Write your solution here, then run the reference solution below to compare.
print("Learner workspace ready for Exercise 2.")

Code cell 9

# Solution
# Exercise 2 - reference solution

import numpy as np
from scipy import integrate

# (a)
def F(x):
    val, _ = integrate.quad(lambda t: np.sin(t**2), 0, x)
    return val

def F_prime_numerical(x, h=1e-7):
    return (F(x+h) - F(x-h)) / (2*h)

header('Exercise 2: Fundamental Theorem of Calculus')

x_vals = [0.5, 1.0, 1.5, 2.0]
print('(a) FTC Part 1: d/dx[∫₀ˣ sin(t²)dt] = sin(x²)')
for x in x_vals:
    Fp_num = F_prime_numerical(x)
    Fp_exact = np.sin(x**2)
    ok = abs(Fp_num - Fp_exact) < 1e-4
    print(f"  {'PASS' if ok else 'FAIL'} x={x}: numerical={Fp_num:.8f}, sin(x²)={Fp_exact:.8f}")

# (b) FTC Part 2
def antideriv_b(x): return x**3 - x**2 + x
analytic_b = antideriv_b(4) - antideriv_b(1)
numerical_b, _ = integrate.quad(lambda x: 3*x**2 - 2*x + 1, 1, 4)
print(f'\n(b) ∫₁⁴ (3x²-2x+1) dx = [x³-x²+x]₁⁴ = {analytic_b}')
check_close('FTC Part 2', analytic_b, numerical_b)

# (c) Leibniz rule: d/dx ∫ₓ^(x²) cos(t)dt
# = cos(x²)·(2x) - cos(x)·1 = 2x·cos(x²) - cos(x)
def leibniz_deriv(x):
    return 2*x*np.cos(x**2) - np.cos(x)

def G(x):
    if x > x**2:
        val, _ = integrate.quad(np.cos, x**2, x)
        return -val
    val, _ = integrate.quad(np.cos, x, x**2)
    return val

x0 = 0.5
G_deriv_num = (G(x0+1e-6) - G(x0-1e-6)) / 2e-6
G_deriv_analytic = leibniz_deriv(x0)
check_close(f'Leibniz at x={x0}', G_deriv_analytic, G_deriv_num, tol=1e-4)

print('\nTakeaway: FTC Part 1 = accumulation functions; Part 2 = evaluation by antiderivative.')

print("Exercise 2 solution complete.")

Exercise 3 (★): u-Substitution

(a) Compute $\int_0^1 \frac{x}{1+x^2}\,dx$ analytically (use $u = 1+x^2$). Verify numerically.

(b) Compute $\int_0^\pi \sin^3(x)\,dx$ using $u = \cos(x)$. Verify numerically.

(c) Compute $\int_0^{\infty} x\,e^{-x^2}\,dx$ analytically. Verify numerically with an improper integral.

(d) The softmax normalizing constant is $Z(x) = \sum_i e^{x_i}$. For a continuous analogue, suppose scores are $s(t) = -t^2$ on $(-\infty,\infty)$. Compute $Z = \int_{-\infty}^\infty e^{-t^2}\,dt$ and explain its connection to the Gaussian.

Code cell 11

# Your Solution
# Exercise 3 - learner workspace
# Write your solution here, then run the reference solution below to compare.
print("Learner workspace ready for Exercise 3.")

Code cell 12

# Solution
# Exercise 3 - reference solution

import numpy as np
from scipy import integrate

header('Exercise 3: u-Substitution')

# (a) u = 1+x², du = 2x dx
# ∫₀¹ x/(1+x²) dx = (1/2)[ln|u|]₁² = (1/2)ln(2)
analytic_a = np.log(2)/2
numerical_a, _ = integrate.quad(lambda x: x/(1+x**2), 0, 1)
check_close('(a) ∫₀¹ x/(1+x²) dx = ln(2)/2', analytic_a, numerical_a)

# (b) ∫₀^π sin³(x) dx = 4/3
analytic_b = 4/3
numerical_b, _ = integrate.quad(lambda x: np.sin(x)**3, 0, np.pi)
check_close('(b) ∫₀^π sin³(x) dx = 4/3', analytic_b, numerical_b)

# (c) ∫₀^∞ x·e^(-x²) dx = 1/2
analytic_c = 1/2
numerical_c, _ = integrate.quad(lambda x: x*np.exp(-x**2), 0, np.inf)
check_close('(c) ∫₀^∞ x·e^(-x²) dx = 1/2', analytic_c, numerical_c)

# (d) Z = ∫_{-∞}^∞ e^(-t²) dt = √π
Z = np.sqrt(np.pi)
Z_numerical, _ = integrate.quad(lambda t: np.exp(-t**2), -50, 50)
check_close('(d) ∫ e^(-t²) dt = √π', Z, Z_numerical)
print(f'Z = √π = {Z:.8f}')
print('Connection: N(0, 1/2) has density e^(-x²)/√π — normalizing const = √π')

print('\nTakeaway: u-substitution = reverse chain rule; transforms hard integrals into standard forms.')

print("Exercise 3 solution complete.")
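As a quick illustration of the substitution in part (a), the change of variables can be carried out explicitly in the numerics: integrating in $u$ over the transformed limits gives the same number as integrating in $x$. A small sketch, separate from the reference solution:

```python
import numpy as np
from scipy import integrate

# u = 1 + x², du = 2x dx; limits x ∈ [0, 1] map to u ∈ [1, 2]
lhs, _ = integrate.quad(lambda x: x / (1 + x**2), 0, 1)   # original integral in x
rhs, _ = integrate.quad(lambda u: 0.5 / u, 1, 2)          # (1/2) ∫₁² du/u after substitution
print(f"x-form: {lhs:.10f}, u-form: {rhs:.10f}, ln(2)/2: {np.log(2)/2:.10f}")
```

Transforming the limits along with the integrand is the whole trick: no back-substitution to $x$ is ever needed.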

Exercise 4 (★★): Integration by Parts

(a) Compute $\int x^2 \ln(x)\,dx$ analytically (use $u=\ln x$, $dv=x^2\,dx$). Verify at $x=2$.

(b) Compute $\int_0^1 x\,e^{-x}\,dx$ analytically. What is the limit $\int_0^\infty x\,e^{-x}\,dx$?

(c) Use the tabular method to compute $\int x^3 e^x\,dx$. Verify your antiderivative by differentiation.

(d) The entropy of an exponential distribution $p(x) = \lambda e^{-\lambda x}$ ($x \ge 0$) is $H = -\int_0^\infty p(x)\ln p(x)\,dx$. Compute it analytically (use integration by parts). Verify numerically for $\lambda = 2$.
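For part (d), the closed form drops out once $\ln p(x) = \ln\lambda - \lambda x$ is substituted; the remaining piece $\int_0^\infty \lambda x\,p(x)\,dx = \lambda\,\mathbb{E}[X]$ is where integration by parts enters (it is the Exercise 4(b) integral in disguise):

```latex
H = -\int_0^\infty p(x)\,(\ln\lambda - \lambda x)\,dx
  = -\ln\lambda + \lambda\,\mathbb{E}[X]
  = -\ln\lambda + \lambda\cdot\tfrac{1}{\lambda}
  = 1 - \ln\lambda
```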

Code cell 14

# Your Solution
# Exercise 4 - learner workspace
# Write your solution here, then run the reference solution below to compare.
print("Learner workspace ready for Exercise 4.")

Code cell 15

# Solution
# Exercise 4 - reference solution

import numpy as np
from scipy import integrate

header('Exercise 4: Integration by Parts')

# (a) u = ln(x), dv = x²dx => du = 1/x dx, v = x³/3
# ∫ x²ln(x)dx = (x³/3)ln(x) - ∫(x³/3)(1/x)dx = (x³/3)ln(x) - x³/9 + C
def antideriv_a(x): return (x**3/3)*np.log(x) - x**3/9

x0 = 2.0
d_analytic = x0**2 * np.log(x0)  # what the derivative should be
d_numerical = (antideriv_a(x0+1e-7) - antideriv_a(x0-1e-7)) / 2e-7
check_close('(a) d/dx[antideriv] = x²ln(x) at x=2', d_numerical, d_analytic)

# (b) u=x, dv=e^(-x)dx => du=dx, v=-e^(-x)
# ∫₀¹ xe^(-x)dx = [-xe^(-x)]₀¹ + ∫₀¹ e^(-x)dx = -e^(-1) + [-e^(-x)]₀¹
#               = -1/e + (-1/e + 1) = 1 - 2/e
analytic_b = 1 - 2/np.e
numerical_b, _ = integrate.quad(lambda x: x*np.exp(-x), 0, 1)
check_close('(b) ∫₀¹ x·e^(-x) dx = 1-2/e', analytic_b, numerical_b)

# ∫₀^∞ x·e^(-x) dx = Γ(2) = 1! = 1
improper_b, _ = integrate.quad(lambda x: x*np.exp(-x), 0, np.inf)
check_close('∫₀^∞ x·e^(-x) dx = Γ(2) = 1', improper_b, 1.0)

# (c) Tabular: ∫ x³eˣ dx
# derivatives of x³: 3x², 6x, 6, 0
# integrals of eˣ: eˣ, eˣ, eˣ, eˣ
# = x³eˣ - 3x²eˣ + 6xeˣ - 6eˣ = eˣ(x³-3x²+6x-6)
def antideriv_c(x): return np.exp(x)*(x**3 - 3*x**2 + 6*x - 6)
def f_c(x): return x**3 * np.exp(x)
x_test = 1.5
d_c = (antideriv_c(x_test+1e-7) - antideriv_c(x_test-1e-7)) / 2e-7
check_close('(c) tabular: d/dx[eˣ(x³-3x²+6x-6)] = x³eˣ', d_c, f_c(x_test))

# (d) Entropy of Exp(λ): H = 1 - ln(λ)
lam = 2.0
H_analytic = 1 - np.log(lam)
def H_integrand(x):
    p = lam * np.exp(-lam*x)
    return -p * np.log(p) if p > 1e-300 else 0.0
H_numerical, _ = integrate.quad(H_integrand, 0, 200)
check_close(f'(d) H[Exp({lam})] = 1-ln(λ) = {H_analytic:.6f}', H_analytic, H_numerical, tol=1e-4)

print('\nTakeaway: IBP reduces integrals of products; tabular method handles repeated applications.')

print("Exercise 4 solution complete.")

Exercise 5 (★★): Numerical Integration Comparison

The integral $\int_0^2 \sqrt{x}\,e^{-x}\,dx$ has no elementary closed form.

(a) Implement the trapezoid rule and Simpson's rule. Compare their accuracy for $n \in \{4, 8, 16, 32, 64\}$.

(b) Implement Monte Carlo integration with $n$ uniform samples. Compare the error scaling to trapezoid.

(c) Show that the trapezoid error scales as $O(h^2)$ and Simpson's as $O(h^4)$ by fitting log-log slopes.

(d) Use scipy.integrate.quad to get the reference value and compute a relative error table.

Code cell 17

# Your Solution
# Exercise 5 - learner workspace
# Write your solution here, then run the reference solution below to compare.
print("Learner workspace ready for Exercise 5.")

Code cell 18

# Solution
# Exercise 5 - reference solution

import numpy as np
from scipy import integrate

def f(x): return np.sqrt(x) * np.exp(-x)
a, b = 0.0, 2.0
true_val, _ = integrate.quad(f, a, b)

def my_trapezoid(f, a, b, n):
    xs = np.linspace(a, b, n+1)
    h = (b-a)/n
    return h * (f(xs[0])/2 + np.sum(f(xs[1:-1])) + f(xs[-1])/2)

def my_simpsons(f, a, b, n):
    if n % 2 != 0: n += 1
    xs = np.linspace(a, b, n+1)
    h = (b-a)/n
    c = np.ones(n+1); c[1:-1:2]=4; c[2:-2:2]=2
    return (h/3) * np.dot(c, f(xs))

def monte_carlo_int(f, a, b, n, seed=42):
    np.random.seed(seed)
    xs = np.random.uniform(a, b, n)
    return (b-a) * np.mean(f(xs))

header('Exercise 5: Numerical Integration Comparison')
print(f'Reference: {true_val:.10f}\n')

ns = [4, 8, 16, 32, 64, 128]
t_errs, s_errs, mc_errs = [], [], []
print(f"{'n':>6} | {'Trap err':>12} | {'Simp err':>12} | {'MC err (n²)':>14}")
for n in ns:
    t = my_trapezoid(f, a, b, n)
    s = my_simpsons(f, a, b, n)
    mc = monte_carlo_int(f, a, b, n*n)
    t_errs.append(abs(t - true_val))
    s_errs.append(abs(s - true_val))
    mc_errs.append(abs(mc - true_val))
    print(f'{n:6d} | {t_errs[-1]:12.2e} | {s_errs[-1]:12.2e} | {mc_errs[-1]:14.2e}')

# Convergence order
log_ns = np.log(ns)
slope_t = np.polyfit(log_ns, np.log(np.array(t_errs)+1e-15), 1)[0]
slope_s = np.polyfit(log_ns, np.log(np.array(s_errs)+1e-15), 1)[0]
check_close('Trapezoid O(h²): slope ≈ -2', slope_t, -2.0, tol=0.5)
check_close('Simpsons O(h⁴): slope ≈ -4', slope_s, -4.0, tol=1.0)

print('\nTakeaway: Simpson\'s (O(h⁴)) much more efficient than trapezoid (O(h²)) for smooth integrands.')

print("Exercise 5 solution complete.")
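SciPy ships composite versions of both rules; a minimal cross-check of the hand-rolled implementations above against the library ones (function names as in current SciPy releases):

```python
import numpy as np
from scipy import integrate

f = lambda x: np.sqrt(x) * np.exp(-x)
xs = np.linspace(0.0, 2.0, 65)               # 64 panels, same grid as n=64 above
trap = integrate.trapezoid(f(xs), x=xs)      # composite trapezoid rule
simp = integrate.simpson(f(xs), x=xs)        # composite Simpson's rule
ref, _ = integrate.quad(f, 0.0, 2.0)         # adaptive quadrature reference
print(f"trapezoid={trap:.8f}, simpson={simp:.8f}, quad={ref:.8f}")
```

Note that because $\sqrt{x}$ has an unbounded derivative at 0, neither rule reaches its nominal order on this integrand; the slopes fitted above are degraded by the endpoint singularity.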

Exercise 6 (★★): Improper Integrals

(a) Determine convergence of $\int_1^\infty \frac{1}{x^p}\,dx$ for $p = 0.5, 1.0, 1.5, 2.0$. For convergent cases, compute the value analytically.

(b) Show that $\int_{-\infty}^\infty \frac{1}{1+x^2}\,dx = \pi$ analytically (the antiderivative is $\arctan$). Verify numerically.

(c) The Gamma function $\Gamma(n) = \int_0^\infty x^{n-1}e^{-x}\,dx$ satisfies $\Gamma(n) = (n-1)!$ for positive integers. Verify $\Gamma(1) = 1$, $\Gamma(2) = 1$, $\Gamma(3) = 2$, $\Gamma(4) = 6$ numerically.

(d) The Gaussian integral $\int_{-\infty}^\infty e^{-x^2}\,dx = \sqrt{\pi}$ is proved using polar coordinates. Verify the result numerically and show that $\int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx = 1$ (the normal PDF normalizes).

Code cell 20

# Your Solution
# Exercise 6 - learner workspace
# Write your solution here, then run the reference solution below to compare.
print("Learner workspace ready for Exercise 6.")

Code cell 21

# Solution
# Exercise 6 - reference solution

import numpy as np
from scipy import integrate, special

header('Exercise 6: Improper Integrals')

# (a) p-integrals
print('(a) p-integral: ∫₁^∞ x^(-p) dx converges iff p > 1')
ps = [0.5, 1.0, 1.5, 2.0]
for p in ps:
    converges = p > 1
    if converges:
        # ∫₁^∞ x^(-p) = [x^(1-p)/(1-p)]₁^∞ = 1/(p-1)
        analytic_val = 1/(p-1)
        try:
            numerical_val, _ = integrate.quad(lambda x: x**(-p), 1, np.inf)
        except:
            numerical_val = float('inf')
        ok = abs(analytic_val - numerical_val) < 1e-6
        print(f"  {'PASS' if ok else 'FAIL'} p={p}: converges, value=1/(p-1)={analytic_val:.4f}")
    else:
        print(f'  PASS p={p}: diverges (p ≤ 1)')

# (b)
analytic_b = np.pi
numerical_b, _ = integrate.quad(lambda x: 1/(1+x**2), -np.inf, np.inf)
check_close('(b) ∫ 1/(1+x²) dx = π', analytic_b, numerical_b)

# (c)
print('\n(c) Gamma function:')
for n in [1, 2, 3, 4, 5]:
    G, _ = integrate.quad(lambda x: x**(n-1) * np.exp(-x), 0, np.inf)
    expected = float(factorial(n-1))
    ok = abs(G - expected) < 1e-5
    print(f"  {'PASS' if ok else 'FAIL'} Γ({n}) = {G:.6f} = {int(expected)}! = {int(expected)}")

# (d)
gaussian, _ = integrate.quad(lambda x: np.exp(-x**2), -50, 50)
check_close('(d) ∫ e^(-x²) dx = √π', gaussian, np.sqrt(np.pi))

from scipy.stats import norm
normal_norm, _ = integrate.quad(norm.pdf, -50, 50)
check_close('N(0,1) PDF normalizes to 1', normal_norm, 1.0)

print('\nTakeaway: Improper integrals require limits; Gaussian integral = √π via polar trick.')

print("Exercise 6 solution complete.")
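The "polar trick" behind the takeaway can itself be checked numerically: squaring the integral turns it into a 2D integral of $e^{-(x^2+y^2)}$, which in polar coordinates collapses to a 1D radial integral worth $\pi$. A small sketch (finite bounds $\pm 8$ stand in for $\pm\infty$; the truncated tail is negligible):

```python
import numpy as np
from scipy import integrate

# (∫ e^{-x²} dx)² = ∬ e^{-(x²+y²)} dx dy = ∫₀^{2π} ∫₀^∞ e^{-r²} r dr dθ = π
I_sq, _ = integrate.dblquad(lambda y, x: np.exp(-(x**2 + y**2)),
                            -8, 8, -8, 8)                      # Cartesian 2D integral
radial, _ = integrate.quad(lambda r: np.exp(-r**2) * r, 0, np.inf)  # = 1/2 by u = r²
print(f"2D integral: {I_sq:.8f}, 2π·radial: {2*np.pi*radial:.8f}, π: {np.pi:.8f}")
```

The factor $r$ in the radial integrand is exactly what makes a $u = r^2$ substitution work, which is why the squared integral is easy while the original is not elementary.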

Exercise 7 (★★★): KL Divergence and Entropy

(a) Compute $D_{KL}(p\|q)$ numerically for $p = N(\mu_1, \sigma_1^2)$ and $q = N(\mu_2, \sigma_2^2)$ for several parameter pairs. Verify against the analytic formula:

$$D_{KL}(p\|q) = \ln\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$

(b) Verify the Gibbs inequality ($D_{KL} \ge 0$) by sampling random Gaussian pairs. For which pairs is $D_{KL} = 0$?

(c) Compute the VAE KL term analytically: $D_{KL}(N(\mu, \mathbf{I}) \| N(\mathbf{0}, \mathbf{I})) = \frac{1}{2}\|\mu\|^2$ for $d=4$ dimensions. Verify numerically (integration over the 4D density, approximated by Monte Carlo).

(d) Compute $H[p] = -\int p\ln p\,dx$ for $p = N(0, \sigma^2)$ analytically ($H = \frac{1}{2}\ln(2\pi e \sigma^2)$). Verify for $\sigma \in \{0.5, 1.0, 2.0\}$.

Code cell 23

# Your Solution
# Exercise 7 - learner workspace
# Write your solution here, then run the reference solution below to compare.
print("Learner workspace ready for Exercise 7.")

Code cell 24

# Solution
# Exercise 7 - reference solution

import numpy as np
from scipy import integrate, stats

def kl_analytic(mu1, s1, mu2, s2):
    return np.log(s2/s1) + (s1**2 + (mu1-mu2)**2)/(2*s2**2) - 0.5

def kl_numerical(mu1, s1, mu2, s2):
    p = stats.norm(mu1, s1)
    q = stats.norm(mu2, s2)
    def integrand(x):
        px, qx = p.pdf(x), q.pdf(x)
        return px*np.log(px/qx) if px>1e-300 and qx>1e-300 else 0.0
    val, _ = integrate.quad(integrand, mu1-15*s1, mu1+15*s1)
    return val

header('Exercise 7: KL Divergence and Entropy')

cases = [
    (0.0,1.0,0.0,1.0,  0.0),
    (1.0,1.0,0.0,1.0,  0.5),
    (0.0,2.0,0.0,1.0,  None),  # ln(1/2) + 4/2 - 1/2 ≈ 0.8069
    (2.0,0.5,0.0,1.0,  None),
]
print('(a)/(b) KL Divergence:')
for mu1,s1,mu2,s2,expected in cases:
    ka = kl_analytic(mu1,s1,mu2,s2)
    kn = kl_numerical(mu1,s1,mu2,s2)
    ok = abs(ka-kn)<1e-4 and ka >= -1e-8
    print(f"  {'PASS' if ok else 'FAIL'} KL(N({mu1},{s1}²)||N({mu2},{s2}²)) = {ka:.6f} (numerical={kn:.6f})")

# (c) VAE KL term (d-dimensional, isotropic)
print('\n(c) VAE KL = ||mu||²/2:')
np.random.seed(42)
for mu_vec in [np.array([1.,0.,0.,0.]), np.array([1.,1.,0.,0.]), np.array([1.,1.,1.,1.])]:
    analytic_kl = 0.5 * np.dot(mu_vec, mu_vec)
    # MC estimate: E_p[log p - log q] = E_N(mu,I)[sum_j (mu_j*z_j - mu_j²/2)]
    n = 100000
    z = np.random.randn(n, len(mu_vec)) + mu_vec
    log_ratio = 0.5 * np.sum(mu_vec**2) + np.sum((z - mu_vec) * mu_vec, axis=1)
    mc_kl = np.mean(log_ratio)
    ok = abs(analytic_kl - mc_kl) < 0.05
    print(f"  {'PASS' if ok else 'FAIL'} mu={mu_vec}: ||mu||²/2={analytic_kl:.4f}, MC={mc_kl:.4f}")

# (d) Entropy of Gaussian
print('\n(d) Entropy of N(0,σ²):')
for sigma in [0.5, 1.0, 2.0]:
    H_analytic = 0.5 * np.log(2*np.pi*np.e*sigma**2)
    dist = stats.norm(0, sigma)
    def h_int(x): p=dist.pdf(x); return -p*np.log(p) if p>1e-300 else 0.0
    H_num, _ = integrate.quad(h_int, -20, 20)
    check_close(f'H[N(0,{sigma}²)] = {H_analytic:.6f}', H_analytic, H_num, tol=1e-4)

print('\nTakeaway: KL≥0 (Gibbs); VAE regularizes encoder toward N(0,I); entropy = integration of -p·log p.')

print("Exercise 7 solution complete.")

Exercise 8 (★★★): Monte Carlo Integration and Expected Loss

This exercise connects integration to the core of ML training.

(a) The expected loss is $\mathcal{L}(\theta) = \mathbb{E}_{x\sim p}[(x-\theta)^2]$. For $p = N(\mu, \sigma^2)$, compute $\mathcal{L}(\theta)$ analytically as a function of $\theta$. Find the minimizer $\theta^*$.

(b) Implement a Monte Carlo estimator $\hat{\mathcal{L}}(\theta) = \frac{1}{n}\sum_{i=1}^n (x_i-\theta)^2$. Show the error $|\hat{\mathcal{L}} - \mathcal{L}|$ scales as $O(1/\sqrt{n})$ by testing $n \in \{10, 100, 1000, 10000\}$.

(c) Implement gradient descent on $\hat{\mathcal{L}}(\theta)$ using mini-batches of size $B$. Show that this converges to $\theta^* = \mu$ starting from $\theta_0 = 10$.

(d) Compare convergence speed for batch sizes $B \in \{1, 8, 32, 128, 1000\}$ with the same total budget of $N=10000$ gradient steps. Explain why small batches (Monte Carlo with few samples) can still converge.

Code cell 26

# Your Solution
# Exercise 8 - learner workspace
# Write your solution here, then run the reference solution below to compare.
print("Learner workspace ready for Exercise 8.")

Code cell 27

# Solution
# Exercise 8 - reference solution

import numpy as np

mu_true, sigma_true = 3.0, 1.5

def expected_loss(theta, mu=mu_true, sigma=sigma_true):
    # E[(X-theta)²] = Var[X] + (E[X]-theta)² = sigma² + (mu-theta)²
    return sigma**2 + (mu - theta)**2

def mc_loss(theta, n, seed=42):
    np.random.seed(seed)
    xs = np.random.normal(mu_true, sigma_true, n)
    return np.mean((xs - theta)**2)

header('Exercise 8: Monte Carlo and Expected Loss')

# (a)
theta_star = mu_true  # minimizer: dL/dtheta = -2(mu-theta) = 0 => theta = mu
L_star = expected_loss(theta_star)
check_close('L(theta*) = sigma² = Var[X]', L_star, sigma_true**2)
print(f'theta* = {theta_star}, L(theta*) = {L_star}')

# (b) MC error scaling
print('\n(b) MC estimator error scaling:')
ns = [10, 100, 1000, 10000, 100000]
errs = []
for n in ns:
    mc = mc_loss(theta_star, n)
    err = abs(mc - L_star)
    errs.append(err)
    print(f'  n={n:8d}: MC={mc:.6f}, err={err:.2e}')

# Check O(1/√n) scaling
log_ns = np.log(ns)
slope = np.polyfit(log_ns, np.log(np.array(errs)+1e-15), 1)[0]
check_close('MC error ~ O(1/√n): slope ≈ -0.5', slope, -0.5, tol=0.3)

# (c) Mini-batch GD
print('\n(c) Mini-batch GD convergence:')
for B in [1, 8, 32, 128, 1000]:
    np.random.seed(42)
    theta = 10.0
    lr = 0.1
    N_steps = 10000
    theta_tail = []
    for step in range(N_steps):
        x_batch = np.random.normal(mu_true, sigma_true, B)
        grad = -2 * np.mean(x_batch - theta)  # d/dtheta of mean((x_i - theta)²)
        theta -= lr * grad
        if step >= N_steps - 1000:
            theta_tail.append(theta)
    # With a constant step size the iterates hover around mu with noise ~ lr·sigma/√B,
    # so average the late iterates (Polyak-style) before checking convergence.
    theta_avg = float(np.mean(theta_tail))
    ok = abs(theta_avg - mu_true) < 0.2
    print(f"  {'PASS' if ok else 'FAIL'} B={B:5d}: averaged theta={theta_avg:.4f} (target {mu_true})")

print('\nTakeaway: E[L] is an integral; mini-batch = MC estimate of gradient; converges by CLT even for B=1.')

print("Exercise 8 solution complete.")
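One more angle on part (d): the mini-batch gradient is an unbiased estimator of the true gradient $\nabla_\theta \mathcal{L} = -2(\mu - \theta)$, so averaging many tiny-batch estimates at a fixed $\theta$ recovers it exactly. A quick sketch at the hypothetical starting point $\theta = 10$:

```python
import numpy as np

mu, sigma, theta = 3.0, 1.5, 10.0
rng = np.random.default_rng(0)
# 200,000 independent mini-batches of size B=4, each giving one gradient estimate
batches = rng.normal(mu, sigma, size=(200_000, 4))
grads = -2 * (batches.mean(axis=1) - theta)
print(f"mean of mini-batch grads: {grads.mean():.4f}, true grad: {-2*(mu - theta):.4f}")
```

Unbiasedness is why B=1 still works: each step points in the right direction on average, and the noise averages out across steps.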

Exercise 9 (★★★): Expected Squared Error as an Integral

Let $X \sim \mathcal{N}(\mu,\sigma^2)$ and define

$$R(\theta)=\mathbb{E}[(X-\theta)^2].$$

  1. Derive $R(\theta)=\sigma^2+(\mu-\theta)^2$.
  2. Verify the formula with numerical quadrature.
  3. Show the minimizer is $\theta=\mu$.
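A clean route to step 1 is to add and subtract the mean inside the square; the cross term vanishes because $\mathbb{E}[X-\mu]=0$:

```latex
R(\theta) = \mathbb{E}\big[\big((X-\mu)+(\mu-\theta)\big)^2\big]
          = \underbrace{\mathbb{E}[(X-\mu)^2]}_{\sigma^2}
          + 2(\mu-\theta)\,\underbrace{\mathbb{E}[X-\mu]}_{0}
          + (\mu-\theta)^2
          = \sigma^2 + (\mu-\theta)^2
```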

Code cell 29

# Your Solution
# Exercise 9 - learner workspace
# Use scipy.integrate.quad to compute the expected risk numerically.
print("Learner workspace ready for Exercise 9.")

Code cell 30

# Solution
# Exercise 9 - expected squared error integral
header("Exercise 9: expected squared error")

mu, sigma = 1.3, 0.8

def normal_pdf(x, mu=mu, sigma=sigma):
    return stats.norm.pdf(x, loc=mu, scale=sigma)

def risk_quad(theta):
    val, err = integrate.quad(lambda x: (x - theta)**2 * normal_pdf(x), mu - 8*sigma, mu + 8*sigma)
    return val

def risk_formula(theta):
    return sigma**2 + (mu - theta)**2

for theta in [-0.5, 1.3, 2.2]:
    q = risk_quad(theta)
    f = risk_formula(theta)
    print(f"theta={theta:+.2f}: quadrature={q:.8f}, formula={f:.8f}")
    check_close("risk formula", q, f, tol=1e-6)

thetas = np.linspace(mu - 2, mu + 2, 101)
best = thetas[np.argmin([risk_formula(t) for t in thetas])]
check_close("risk minimized at the mean", best, mu, tol=0.05)
print("Takeaway: supervised regression with squared loss targets conditional means.")

Exercise 10 (★★★): Partition Function of a One-Dimensional Energy Model

An unnormalized density has energy

$$E(x)=\frac{x^4}{4}-\frac{x^2}{2}, \qquad \tilde{p}(x)=e^{-E(x)}.$$

Compute the partition function

$$Z=\int_{-\infty}^{\infty} e^{-E(x)}\,dx$$

and use it to estimate probabilities of intervals.

Code cell 32

# Your Solution
# Exercise 10 - learner workspace
# Compute Z and interval probabilities by numerical integration.
print("Learner workspace ready for Exercise 10.")

Code cell 33

# Solution
# Exercise 10 - partition function by quadrature
header("Exercise 10: energy-model partition function")

def energy(x):
    return 0.25 * x**4 - 0.5 * x**2

def unnormalized(x):
    return np.exp(-energy(x))

Z, Zerr = integrate.quad(unnormalized, -np.inf, np.inf, epsabs=1e-10)
left_mass, _ = integrate.quad(unnormalized, -np.inf, 0.0)
center_mass, _ = integrate.quad(unnormalized, -1.0, 1.0)

p_left = left_mass / Z
p_center = center_mass / Z
print(f"Z={Z:.8f} (error estimate {Zerr:.2e})")
print(f"P(X<0)={p_left:.8f}, P(-1<X<1)={p_center:.8f}")
check_close("symmetry gives P(X<0)=1/2", p_left, 0.5, tol=1e-8)
check_true("center interval has nontrivial mass", 0.35 < p_center < 0.8)
print("Takeaway: normalization constants turn energy scores into probabilities.")
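The quadrature value of $Z$ can be cross-checked without any integration grid: importance sampling with, for instance, a standard normal proposal $\varphi$ rewrites $Z$ as an expectation, $Z = \mathbb{E}_{x\sim N(0,1)}[\,e^{-E(x)}/\varphi(x)\,]$. A sketch under that (arbitrary) proposal choice:

```python
import numpy as np
from scipy import integrate, stats

energy = lambda x: 0.25 * x**4 - 0.5 * x**2
Z_quad, _ = integrate.quad(lambda x: np.exp(-energy(x)), -np.inf, np.inf)

rng = np.random.default_rng(0)
x = rng.normal(size=200_000)
w = np.exp(-energy(x)) / stats.norm.pdf(x)   # importance weights e^{-E(x)} / φ(x)
Z_is = w.mean()
print(f"quadrature Z={Z_quad:.6f}, importance-sampling Z≈{Z_is:.6f}")
```

This is the same estimator family used when a partition function has no tractable quadrature, e.g. in higher dimensions; its quality depends on how well the proposal covers the modes at $x=\pm 1$.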

What to Review After Finishing

  • Exercise 1 — Riemann sums converge at different rates; midpoint = O(h²)
  • Exercise 2 — FTC Part 1: d/dx ∫f = f; Part 2: ∫f = F(b)-F(a)
  • Exercise 3 — u-substitution: u = inner function, transform limits
  • Exercise 4 — Integration by parts: LIATE order; tabular for repeated IBP
  • Exercise 5 — Trapezoid O(h²), Simpson's O(h⁴), MC O(1/√n)
  • Exercise 6 — p-integral convergence; Gamma function = factorial generalization
  • Exercise 7 — KL ≥ 0 (Gibbs); VAE KL = ||μ||²/2; entropy of Gaussian
  • Exercise 8 — Expected loss = integral; mini-batch SGD = Monte Carlo gradient
  • Exercise 9 — Expected squared error: R(θ) = σ² + (μ-θ)², minimized at θ = μ
  • Exercise 10 — Partition function Z turns unnormalized energies into probabilities

References

  1. Stewart, J. Calculus: Early Transcendentals — Chapters 5–8 (techniques)
  2. Apostol, T. Calculus Vol. 1 — rigorous treatment of Riemann integral
  3. Chen, R.T.Q. et al. (2018). Neural Ordinary Differential Equations. NeurIPS.
  4. Williams, R.J. (1992). Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning.
  5. Kingma & Welling (2014). Auto-Encoding Variational Bayes. ICLR.
