Riemannian Geometry: Part 1: Intuition

1. Intuition

This part develops the intuition behind Riemannian geometry, following the approved Chapter 25 table of contents. The treatment is geometry-first and AI-facing.

1.1 Adding inner products to tangent spaces

Adding an inner product to every tangent space is the step that turns a smooth manifold into a Riemannian manifold. The goal is to make curved-space reasoning concrete enough for ML practice without turning the section into a pure topology course.

Working scope for this subsection: Riemannian metrics, curve length, induced distance, Riemannian gradients, metric tensors, connections, curvature previews, and information geometry. The recurring pattern is localize, linearize, measure, move, and return to the manifold.

$$g_p:T_pM\times T_pM\to \mathbb{R},\qquad g_p(\mathbf{u},\mathbf{v})=\langle \mathbf{u},\mathbf{v}\rangle_p.$$

Operational definition.

A tangent space is the vector space of allowable first-order velocities through a point on a manifold.

Worked reading.

For the unit sphere, tangent vectors at $\mathbf{x}$ are exactly vectors $\mathbf{v}$ satisfying $\mathbf{x}^\top\mathbf{v}=0$.
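
The tangent condition above can be checked numerically. The following is a minimal sketch in plain Python, assuming the unit sphere in three dimensions with the inner product inherited from the ambient space; the helper names `dot` and `project_to_tangent` are illustrative, not taken from the companion notebook.

```python
def dot(a, b):
    """Ambient Euclidean inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def project_to_tangent(x, v):
    """Remove the component of v along the unit vector x.

    The result w satisfies x^T w = 0, so it lies in the tangent
    space of the unit sphere at x.
    """
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]

x = [0.0, 0.0, 1.0]             # point on the unit sphere (north pole)
v_ambient = [0.3, -0.2, 0.5]    # arbitrary ambient vector, not tangent
v_tangent = project_to_tangent(x, v_ambient)

print(dot(x, v_ambient))   # 0.5, so v_ambient violates the tangent constraint
print(dot(x, v_tangent))   # 0.0, so v_tangent satisfies it
```

The projection formula is specific to the unit sphere with the inherited metric; other manifolds need their own tangent projection.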

Geometric object	Meaning	AI interpretation
Manifold $M$	Curved space with local coordinates	Data manifold, latent space, constraint set, parameter space
Chart $\varphi$	Local coordinate map	Local representation or embedding coordinates
Tangent space $T_pM$	Linearized directions at $p$	Local perturbations, gradients, velocities
Metric $g_p$	Inner product on $T_pM$	Geometry-aware length, angle, steepest descent
Geodesic	Straightest curved-space path	Latent interpolation, shortest motion, curved optimization path
Retraction	Practical map from tangent step back to $M$	Efficient constrained update in training loops

Three examples of tangent-space objects that an inner product can measure:

Velocity of a curve on a sphere.
Jacobian pushing embedding perturbations forward.
A vector field assigning one tangent direction per point.

Two non-examples clarify the boundary:

An arbitrary ambient vector not tangent to the constraint.
A finite difference step that leaves the manifold without retraction.

Proof or verification habit for adding inner products to tangent spaces:

For embedded manifolds, differentiate the constraint; for abstract manifolds, use curves or derivations.
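
As a worked instance of differentiating the constraint, take a smooth curve $\gamma$ that stays on the unit sphere, with $\gamma(0)=\mathbf{x}$ and $\dot{\gamma}(0)=\mathbf{v}$ (the curve is an assumption introduced here for illustration):

```latex
\|\gamma(t)\|^2 = 1
\;\Longrightarrow\;
\frac{d}{dt}\,\gamma(t)^\top\gamma(t) = 2\,\gamma(t)^\top\dot{\gamma}(t) = 0
\;\Longrightarrow\;
\mathbf{x}^\top\mathbf{v} = 0 \quad\text{at } t = 0.
```

Every velocity of a curve on the sphere is therefore orthogonal to the base point, which recovers the tangent condition $\mathbf{x}^\top\mathbf{v}=0$.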

global object      -> curved manifold or constraint set
local object       -> chart, tangent space, or coordinate patch
linear operation   -> derivative, gradient, velocity, Hessian approximation
geometric measure  -> metric, length, distance, curvature
algorithmic move   -> tangent step followed by geodesic or retraction

In AI systems, adding inner products to tangent spaces matters because learned representations and constrained parameter spaces are rarely globally flat. A local linear approximation may be useful, but it must be attached to the point where it is valid.

Tangent spaces are where local sensitivity, Jacobians, and first-order optimization live.

Mini derivation lens.

Choose a point $p$ on the manifold $M$ and name the local representation used near $p$.
Move the question into a chart, tangent space, or embedded constraint where first-order calculus is available.
Compute the local object: derivative, tangent projection, metric-weighted gradient, path velocity, or retraction step.
Translate the result back into coordinate-free language so the answer is not tied to one chart by accident.
Check the invariant: the point remains on $M$, the direction remains in $T_pM$, or the distance/gradient uses the stated metric.

Implementation lens.

A practical ML implementation should store both the ambient array representation and the geometric contract attached to it. For example, a normalized embedding is not just a vector; it is a point on a sphere. An orthogonal weight matrix is not just a matrix; it is a point on a Stiefel-type constraint. A covariance matrix is not just a symmetric array; it must stay positive definite.

The clean computational pattern is: encode the state, compute an ambient derivative if needed, convert it into a tangent or metric-aware object, take a small local step, and then return to the manifold with a geodesic formula or retraction. This is the same pattern used in the companion notebooks, just scaled down to visible two- and three-dimensional examples.
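
The pattern can be sketched on the unit sphere. This is an illustrative example rather than the companion notebook's code: the objective $f(\mathbf{x}) = -\mathbf{a}^\top\mathbf{x}$, the target vector, the step size, and the function name `sphere_step` are all assumptions chosen to keep the loop small.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [vi / n for vi in v]

def sphere_step(x, ambient_grad, lr):
    """One tangent step followed by a normalization retraction."""
    # Convert the ambient derivative into a tangent vector at x.
    c = dot(x, ambient_grad)
    tangent_grad = [g - c * xi for g, xi in zip(ambient_grad, x)]
    # Small local step in the tangent direction (this leaves the sphere slightly).
    y = [xi - lr * gi for xi, gi in zip(x, tangent_grad)]
    # Retraction: return to the manifold by renormalizing.
    return normalize(y)

# Hypothetical objective: f(x) = -target . x, minimized on the sphere at x = target.
target = normalize([1.0, 2.0, 2.0])
x = [1.0, 0.0, 0.0]                       # encoded state: a point on the sphere
for _ in range(200):
    ambient_grad = [-t for t in target]   # ambient derivative of f
    x = sphere_step(x, ambient_grad, lr=0.1)

print(dot(x, x))        # stays at 1 up to rounding
print(dot(x, target))   # approaches 1 as x aligns with target
```

Normalization is one practical retraction for the sphere; a geodesic formula could replace the last line of `sphere_step` without changing the rest of the loop.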

The important warning is that coordinate code can pass shape checks while still violating geometry. Differential geometry adds checks that are semantic: tangentness, smooth compatibility, metric choice, path validity, and constraint preservation.
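
A few of these semantic checks can be written as small predicates. The sketch below is hypothetical: the function names and tolerances are illustrative choices, and it covers only the sphere and orthonormal-column constraints.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def on_sphere(x, tol=1e-9):
    """Constraint preservation: is x a unit vector?"""
    return abs(math.sqrt(dot(x, x)) - 1.0) <= tol

def is_tangent(x, v, tol=1e-9):
    """Tangentness on the unit sphere: is v orthogonal to x?"""
    return abs(dot(x, v)) <= tol

def has_orthonormal_columns(w, tol=1e-9):
    """Stiefel-type constraint: do the columns of w (a list of rows) satisfy W^T W = I?"""
    cols = list(zip(*w))
    return all(
        abs(dot(ci, cj) - (1.0 if i == j else 0.0)) <= tol
        for i, ci in enumerate(cols)
        for j, cj in enumerate(cols)
    )

x = [0.6, 0.8, 0.0]
step = [0.1, 0.0, 0.0]                    # same shape as x, but not tangent at x
moved = [xi + si for xi, si in zip(x, step)]

c, s = math.cos(0.3), math.sin(0.3)
rotation = [[c, -s], [s, c]]              # orthogonal 2x2 matrix
nudged = [[c + 0.1, -s], [s, c]]          # after a raw additive update

print(on_sphere(x), is_tangent(x, step), on_sphere(moved))
print(has_orthonormal_columns(rotation), has_orthonormal_columns(nudged))
```

Every array here passes a shape check; only the geometric predicates reveal that `step` is not tangent and that `moved` and `nudged` have left their manifolds.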

Practical checklist:

State the manifold and whether it is abstract, embedded, or quotient-like.
State the local coordinates or tangent representation being used.
Separate ambient vectors from tangent vectors.
Name the metric before computing distances, angles, or gradients.
Use geodesics or retractions when moving on the manifold.
For ML claims, identify whether geometry is data geometry, parameter geometry, or statistical geometry.

Local diagnostic: Verify the proposed direction satisfies the tangent constraint.

The companion notebook uses low-dimensional synthetic examples: circles, spheres, tangent projections, spherical interpolation, SPD matrices, and orthogonality constraints. These examples keep geometry visible while preserving the same update logic used in higher-dimensional ML systems.
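
Spherical interpolation is a good first example of why the path matters. The following sketch is written for this text, not copied from the notebook; it assumes two unit vectors that are not antipodal and compares the great-circle midpoint with the straight-line midpoint.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def slerp(p, q, t):
    """Point at fraction t along the great-circle arc from unit vector p to unit vector q.

    Assumes p and q are not antipodal, otherwise the arc is not unique.
    """
    cos_omega = max(-1.0, min(1.0, dot(p, q)))
    omega = math.acos(cos_omega)          # angle between p and q
    if omega < 1e-12:                     # p and q coincide
        return list(p)
    s = math.sin(omega)
    a = math.sin((1.0 - t) * omega) / s
    b = math.sin(t * omega) / s
    return [a * pk + b * qk for pk, qk in zip(p, q)]

p = [1.0, 0.0, 0.0]
q = [0.0, 1.0, 0.0]

mid_sphere = slerp(p, q, 0.5)
mid_linear = [0.5 * (pk + qk) for pk, qk in zip(p, q)]

print(math.sqrt(dot(mid_sphere, mid_sphere)))   # 1 up to rounding: still on the sphere
print(math.sqrt(dot(mid_linear, mid_linear)))   # about 0.707: inside the sphere
```

The linear midpoint is a valid array but not a valid normalized embedding, which is the sense in which latent interpolation may need geodesic structure.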

Compact ML phrase	Differential-geometric reading
local linearization	tangent-space approximation at a point
normalized embedding	point on a sphere with tangent constraints
natural gradient	Riemannian gradient under Fisher metric
orthogonal weights	point on a Stiefel-type manifold
latent interpolation	path that may need geodesic structure
covariance geometry	SPD manifold rather than arbitrary matrices

A useful learning move is to compute everything first on a sphere. The sphere has visible curvature, simple tangent spaces, closed-form geodesics, and practical retractions. Once those are clear, Stiefel, Grassmann, SPD, and information-geometric examples become less mysterious.
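
On the unit sphere the closed-form geodesic step and the normalization retraction can be compared directly. This is a minimal sketch with illustrative names (`exp_map`, `retract`) and an arbitrarily chosen tangent step of length 0.1.

```python
import math

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def exp_map(x, v):
    """Geodesic step on the unit sphere: follow the great circle from x in direction v."""
    n = norm(v)
    if n == 0.0:
        return list(x)
    return [math.cos(n) * xi + math.sin(n) * vi / n for xi, vi in zip(x, v)]

def retract(x, v):
    """Normalization retraction: step in the ambient space, then rescale to unit length."""
    y = [xi + vi for xi, vi in zip(x, v)]
    n = norm(y)
    return [yi / n for yi in y]

x = [0.0, 0.0, 1.0]
v = [0.1, 0.0, 0.0]     # tangent at x, since x . v = 0

geo = exp_map(x, v)
ret = retract(x, v)
gap = norm([a - b for a, b in zip(geo, ret)])

print(norm(geo), norm(ret))   # both 1 up to rounding
print(gap)                    # small but nonzero for this step size
```

Both maps land on the sphere and agree to first order in the step, which is why the cheaper retraction is usually acceptable for small learning rates.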

For implementation, the main discipline is to avoid leaving the manifold silently. If a gradient step violates a constraint, either project the gradient into the tangent space before stepping or use a method whose update is intrinsic by design.

The final question for this subsection is whether a Euclidean formula is being used as an approximation, a coordinate expression, or a mistaken replacement for geometry. Differential geometry is the habit of telling those cases apart.

1.2 Length, angle, and distance on curved spaces

Once every tangent space carries an inner product, the length of a curve, the angle between two directions, and the distance between two points all follow from it.

$$L(\gamma)=\int_a^b \sqrt{g_{\gamma(t)}(\dot{\gamma}(t),\dot{\gamma}(t))}\,dt.$$

Operational definition.

A Riemannian metric assigns an inner product to every tangent space smoothly.

Worked reading.

If a coordinate metric is $G(\mathbf{x})$, then the length of a velocity $\dot{\mathbf{x}}$ is $\sqrt{\dot{\mathbf{x}}^\top G(\mathbf{x})\dot{\mathbf{x}}}$.
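
The length integral can be evaluated numerically once a coordinate metric is fixed. The sketch below assumes the unit sphere in spherical coordinates $(\theta,\varphi)$ with $G=\operatorname{diag}(1,\sin^2\theta)$ and measures a circle of constant polar angle; the function names and the midpoint quadrature are illustrative choices.

```python
import math

def metric_sphere(theta, phi):
    """Coordinate metric of the unit sphere in (polar angle, azimuth) coordinates."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def speed(G, xdot):
    """sqrt(xdot^T G xdot): length of a coordinate velocity under the metric G."""
    q = sum(xdot[i] * G[i][j] * xdot[j] for i in range(2) for j in range(2))
    return math.sqrt(q)

def curve_length(curve, velocity, metric, a, b, n=1000):
    """Midpoint-rule approximation of the Riemannian length integral on [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += speed(metric(*curve(t)), velocity(t)) * h
    return total

theta0 = math.pi / 3

def latitude(t):
    return (theta0, t)        # fixed polar angle, azimuth runs from 0 to 2*pi

def latitude_dot(t):
    return (0.0, 1.0)         # coordinate velocity of that curve

length = curve_length(latitude, latitude_dot, metric_sphere, 0.0, 2.0 * math.pi)

print(length)                              # Riemannian length of the circle
print(2.0 * math.pi * math.sin(theta0))    # closed form 2*pi*sin(theta0) for comparison
```

Treating the coordinates as flat (using the identity in place of $G$) would report $2\pi$ for every such circle, regardless of how close it is to the pole.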

Three examples of length, angle, and distance on curved spaces:

Euclidean metric on a sphere inherited from ambient space.
Fisher metric on statistical models.
Affine-invariant metric on SPD matrices.

Two non-examples clarify the boundary:

A distance formula with no tangent-space inner product.
A fixed Euclidean metric used after nonlinear reparameterization without checking geometry.

Proof or verification habit for length, angle, and distance on curved spaces:

Check symmetry, bilinearity, positive definiteness, and smooth variation with the base point.
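
For a metric given as a matrix in coordinates, bilinearity is automatic and the remaining checks are numerical. The sketch below assumes a two-dimensional chart and uses the spherical-coordinate metric $G=\operatorname{diag}(1,\sin^2\theta)$ as its example; the function name and tolerance are illustrative.

```python
import math

def is_inner_product_2x2(G, tol=1e-12):
    """Check symmetry and positive definiteness of a 2x2 metric matrix.

    Uses the leading-principal-minor test: G[0][0] > 0 and det(G) > 0.
    """
    symmetric = abs(G[0][1] - G[1][0]) <= tol
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    positive_definite = G[0][0] > tol and det > tol
    return symmetric and positive_definite

def metric_sphere(theta):
    """Unit-sphere metric in (polar angle, azimuth) coordinates."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

print(is_inner_product_2x2(metric_sphere(math.pi / 2)))   # True on the equator
print(is_inner_product_2x2(metric_sphere(0.0)))           # False at the pole
```

The failure at $\theta=0$ is a property of the chart, not of the sphere: the coordinate expression degenerates at the pole, so a different chart is needed there.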

In AI systems, length, angle, and distance on curved spaces matter because learned representations and constrained parameter spaces are rarely globally flat. A local linear approximation may be useful, but it must be attached to the point where it is valid.

The metric determines what steepest descent, distance, and regularization mean for a representation or parameter space.

Local diagnostic: State the metric before computing lengths or gradients.

1.3 Why Euclidean gradients are coordinate-dependent

A gradient written as a list of partial derivatives depends on the coordinates used to take them. The Riemannian gradient fixes a metric first, so the resulting direction is a geometric object rather than a coordinate artifact.

$$g_p(\operatorname{grad} f,\mathbf{v})=df_p[\mathbf{v}]\quad \forall \mathbf{v}\in T_pM.$$

Operational definition.

The Riemannian gradient is the tangent vector whose inner product with any direction equals the directional derivative.

Worked reading.

In coordinates with metric matrix $G$, the Riemannian gradient is $G^{-1}\nabla f$, not usually the raw Euclidean gradient.
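
Polar coordinates on the plane give a small test case. The sketch below assumes the function $f(x,y)=x$, written in polar form as $f=r\cos\theta$, and the flat metric in polar coordinates, $G=\operatorname{diag}(1,r^2)$; the sample point and helper names are illustrative.

```python
import math

def riemannian_gradient_polar(partials, r):
    """Apply G^{-1} = diag(1, 1/r^2) to the partial derivatives (df/dr, df/dtheta)."""
    df_dr, df_dtheta = partials
    return (df_dr, df_dtheta / r ** 2)

def push_to_cartesian(components, r, theta):
    """Map polar vector components to Cartesian ones with the Jacobian of (r, theta) -> (x, y)."""
    a, b = components
    return (
        math.cos(theta) * a - r * math.sin(theta) * b,
        math.sin(theta) * a + r * math.cos(theta) * b,
    )

r, theta = 2.0, math.pi / 6
partials = (math.cos(theta), -r * math.sin(theta))   # partials of f = r*cos(theta)

raw = push_to_cartesian(partials, r, theta)
riem = push_to_cartesian(riemannian_gradient_polar(partials, r), r, theta)

print(raw)    # not (1, 0): raw partials used as if they were a vector
print(riem)   # (1, 0) up to rounding: matches the Cartesian gradient of f = x
```

Only the metric-corrected components reproduce the gradient that Cartesian coordinates give, which is the coordinate dependence this subsection is about.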

Three examples where the metric, not the raw partial derivatives, determines the gradient:

Natural gradient using Fisher information.
Projected gradient on the sphere.
Geometry-aware update for SPD covariance matrices.

Two non-examples clarify the boundary:

Raw parameter gradient treated as invariant under reparameterization.
A direction off the tangent space called a manifold gradient.
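
The first non-example can be made concrete in one dimension. The sketch below assumes the objective $f(\theta)=\theta^2$, the reparameterization $\theta=2\varphi$, and a flat metric on $\theta$; all names and constants are illustrative.

```python
def f_grad_theta(theta):
    return 2.0 * theta                   # derivative of f(theta) = theta^2

def euclidean_step_theta(theta, lr):
    return theta - lr * f_grad_theta(theta)

def euclidean_step_phi(phi, lr):
    # Chain rule with theta = 2*phi: df/dphi = df/dtheta * 2.
    grad_phi = f_grad_theta(2.0 * phi) * 2.0
    return phi - lr * grad_phi

def metric_step_phi(phi, lr):
    grad_phi = f_grad_theta(2.0 * phi) * 2.0
    g_phi = 2.0 ** 2                     # metric induced on phi by the flat metric on theta
    return phi - lr * grad_phi / g_phi

theta, lr = 1.0, 0.1
phi = theta / 2.0

step_theta = euclidean_step_theta(theta, lr)      # plain step in theta
step_phi_raw = 2.0 * euclidean_step_phi(phi, lr)  # plain step in phi, mapped back to theta
step_phi_met = 2.0 * metric_step_phi(phi, lr)     # metric-aware step in phi, mapped back

print(step_theta, step_phi_raw, step_phi_met)     # about 0.8, 0.2, 0.8
```

The same model and the same learning rate produce different updates under raw gradient descent, while the metric-aware step lands on the same point in both parameterizations.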

Proof or verification habit for Riemannian gradients:

Use the defining identity $g_p(\operatorname{grad} f,\mathbf{v})=df_p[\mathbf{v}]$ for all tangent directions.
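
In coordinates the identity gives the matrix formula in two lines. Writing the metric as a symmetric positive-definite matrix $G$ and the differential as the row of partial derivatives:

```latex
g_p(\operatorname{grad} f,\mathbf{v}) = (\operatorname{grad} f)^\top G\,\mathbf{v},
\qquad
df_p[\mathbf{v}] = (\nabla f)^\top \mathbf{v}
\quad\Longrightarrow\quad
G\,\operatorname{grad} f = \nabla f
\quad\Longrightarrow\quad
\operatorname{grad} f = G^{-1}\nabla f .
```

The middle step uses that the equality holds for every $\mathbf{v}$ and that $G$ is symmetric; invertibility of $G$ follows from positive definiteness.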

In AI systems, the coordinate dependence of Euclidean gradients matters because learned representations and constrained parameter spaces are rarely globally flat. A local linear approximation may be useful, but it must be attached to the point where it is valid.

Natural gradient and second-order preconditioning are geometry choices, not only optimizer tricks.

Local diagnostic: Ask which metric converts covectors into update vectors.

1.4 Curvature as changing geometry

Curvature describes how the geometry changes from point to point. Tangent spaces at different points are different vector spaces, and comparing them requires a connection.

$$\operatorname{grad} f=G^{-1}\nabla_{\mathbf{x}} f\quad\text{in local coordinates with metric matrix }G.$$

Operational definition.

A connection differentiates vector fields along curves while keeping the result tangent to the manifold; curvature measures how tangent spaces twist around loops.

Worked reading.

On a sphere, a tangent vector transported around a loop can rotate relative to its starting direction. That mismatch is curvature made visible.
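
The rotation can be computed explicitly for a triangular loop on the unit sphere. The sketch below assumes a loop from the north pole down to the equator, along the equator by an angle `alpha`, and back to the pole; it uses the closed-form parallel transport along great circles, and the function name `transport` is illustrative.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def transport(x, u, t, v):
    """Parallel-transport tangent vector v from x along the great circle that
    leaves x in unit tangent direction u, for arc length t.

    The component of v along u rotates with the curve; the component
    orthogonal to both x and u is unchanged.
    """
    a = dot(v, u)
    c, s = math.cos(t), math.sin(t)
    return [vi + a * ((c - 1.0) * ui - s * xi) for vi, ui, xi in zip(v, u, x)]

alpha = math.pi / 3
north = [0.0, 0.0, 1.0]
e1 = [1.0, 0.0, 0.0]
p2 = [math.cos(alpha), math.sin(alpha), 0.0]

v0 = [1.0, 0.0, 0.0]                                      # tangent vector at the pole
v1 = transport(north, [1.0, 0.0, 0.0], math.pi / 2, v0)   # pole to equator
v2 = transport(e1, [0.0, 1.0, 0.0], alpha, v1)            # along the equator
v3 = transport(p2, [0.0, 0.0, 1.0], math.pi / 2, v2)      # equator back to pole

angle = math.acos(max(-1.0, min(1.0, dot(v0, v3))))
print(angle, alpha)   # the vector returns rotated by alpha
```

For this triangle the rotation angle equals the enclosed area on the unit sphere, so a larger loop produces a larger mismatch; on a flat plane the same procedure returns the vector unchanged.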

Three examples of curvature as changing geometry:

Levi-Civita connection.
Covariant derivative of a velocity field.
Curvature affecting geodesic spread.

Two non-examples clarify the boundary:

Ordinary derivative of a tangent vector that leaves the tangent space.
Curvature treated as only a visualization artifact.

Proof or verification habit for curvature as changing geometry:

For a first course, focus on compatibility and projection intuition; full curvature tensors are preview material here.
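
The projection intuition can be worked out by hand for a great circle on the unit sphere. Taking $\gamma(t)=(\cos t,\sin t,0)$ and the tangent projection $P_{\mathbf{x}}\mathbf{w}=\mathbf{w}-(\mathbf{x}^\top\mathbf{w})\,\mathbf{x}$ (both chosen here for illustration, with the metric inherited from the ambient space):

```latex
\dot{\gamma}(t) = (-\sin t,\ \cos t,\ 0),
\qquad
\ddot{\gamma}(t) = -\gamma(t),
\qquad
P_{\gamma(t)}\,\ddot{\gamma}(t)
  = -\gamma(t) + \big(\gamma(t)^\top\gamma(t)\big)\,\gamma(t)
  = \mathbf{0}.
```

The ordinary derivative of the velocity is nonzero and points out of the tangent space, while its tangential part vanishes. In this embedded setting the tangential part is the covariant derivative, so the great circle has zero covariant acceleration and is a geodesic.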

In AI systems, curvature as changing geometry matters because learned representations and constrained parameter spaces are rarely globally flat. A local linear approximation may be useful, but it must be attached to the point where it is valid.

Curvature affects interpolation, optimization stability, and how local neighborhoods scale.

Local diagnostic: Distinguish differentiating coordinates from differentiating geometric vector fields.

1.5 Information geometry and natural gradients

Information geometry applies the same machinery to statistical models: the parameter space is treated as a manifold, the Fisher information supplies the metric, and the natural gradient is the Riemannian gradient under that metric.

$$g_p:T_pM\times T_pM\to \mathbb{R},\qquad g_p(\mathbf{u},\mathbf{v})=\langle \mathbf{u},\mathbf{v}\rangle_p.$$

Operational definition.

The Riemannian gradient is the tangent vector whose inner product with any direction equals the directional derivative.

Worked reading.

In coordinates with metric matrix $G$, the Riemannian gradient is $G^{-1}\nabla f$, not usually the raw Euclidean gradient.
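
For a statistical model the metric matrix is the Fisher information. The sketch below assumes a Bernoulli model fitted by average negative log-likelihood, written once in the mean parameter $p$ and once in the logit $\theta$ with $p=\sigma(\theta)$; the data mean, starting point, and learning rate are arbitrary illustrative values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def steps_in_mean(p, y_bar, lr):
    """Euclidean and natural-gradient updates in the mean parameter p."""
    grad = (p - y_bar) / (p * (1.0 - p))   # d(loss)/dp
    fisher = 1.0 / (p * (1.0 - p))         # Fisher information in p
    return p - lr * grad, p - lr * grad / fisher

def steps_in_logit(theta, y_bar, lr):
    """Euclidean and natural-gradient updates in the logit theta, reported as means."""
    p = sigmoid(theta)
    grad = p - y_bar                       # d(loss)/dtheta
    fisher = p * (1.0 - p)                 # Fisher information in theta
    return sigmoid(theta - lr * grad), sigmoid(theta - lr * grad / fisher)

p0, y_bar, lr = 0.2, 0.7, 0.01
theta0 = math.log(p0 / (1.0 - p0))

euclid_mean, natural_mean = steps_in_mean(p0, y_bar, lr)
euclid_logit, natural_logit = steps_in_logit(theta0, y_bar, lr)

print(abs(euclid_mean - euclid_logit))     # Euclidean updates disagree noticeably
print(abs(natural_mean - natural_logit))   # natural updates agree to first order in lr
```

The two natural-gradient updates differ only at second order in the learning rate, whereas the Euclidean updates depend on which parameterization the optimizer happened to see.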

Three examples of information geometry and natural gradients:

Natural gradient using Fisher information.
Projected gradient on the sphere.
Geometry-aware update for SPD covariance matrices.

Two non-examples clarify the boundary:

Raw parameter gradient treated as invariant under reparameterization.
A direction off the tangent space called a manifold gradient.

Proof or verification habit for information geometry and natural gradients:

Use the defining identity $g_p(\operatorname{grad} f,\mathbf{v})=df_p[\mathbf{v}]$ for all tangent directions.

In AI systems, information geometry and natural gradients matter because learned representations and constrained parameter spaces are rarely globally flat. A local linear approximation may be useful, but it must be attached to the point where it is valid.

Natural gradient and second-order preconditioning are geometry choices, not only optimizer tricks.

Local diagnostic: Ask which metric converts covectors into update vectors.
