Math for LLMs - Notes

Graph Theory / Graph Basics

"A graph is the mathematician's way of saying: I don't care what the objects are - I care how they're connected. This single abstraction unifies social networks, molecular structures, knowledge bases, parse trees, and the computation graphs inside every neural network."

Overview

A graph is a mathematical structure that captures pairwise relationships between objects. Formally, a graph $G = (V, E)$ consists of a set of vertices (or nodes) $V$ and a set of edges $E$ connecting pairs of vertices. This deceptively simple definition gives rise to one of the richest and most applicable branches of mathematics.

For AI and machine learning, graphs are not merely a data structure - they are the native language of relational reasoning. Every knowledge graph (Wikidata, Freebase), every molecular structure (drug discovery, protein folding), every social network (recommendation systems), every dependency parse tree (NLP), and every computation graph (PyTorch, JAX autograd) is a graph. The adjacency matrix of a graph is a matrix, connecting graph theory directly to linear algebra. The eigenvalues of graph matrices reveal community structure, connecting graph theory to spectral methods. And the message-passing paradigm of Graph Neural Networks (Kipf & Welling, 2017; Velickovic et al., 2018) is a direct formalization of how information propagates through graph structure.

This section builds the foundational vocabulary and core theory of graphs from first principles. We define graphs rigorously, develop the theory of degree, paths, connectivity, and special graph families, and introduce graph isomorphism and coloring. Every concept is connected to its role in modern AI systems. Later sections in this chapter build on this foundation: 02 covers how to store graphs computationally, 03 develops classical algorithms, 04 introduces spectral methods, 05 covers Graph Neural Networks, and 06 explores random graph models.

Prerequisites

Companion Notebooks

| Notebook | Description |
| --- | --- |
| theory.ipynb | Interactive demonstrations: graph construction, degree analysis, connectivity, coloring, WL test |
| exercises.ipynb | 8 graded exercises: handshaking lemma through Weisfeiler-Leman and real-world graph modeling |

Learning Objectives

After completing this section, you will:

  • Define graphs formally and distinguish directed, undirected, weighted, and simple variants
  • Compute vertex degrees, verify the handshaking lemma, and analyse degree sequences
  • Distinguish walks, trails, paths, and cycles and compute graph distance and diameter
  • Determine connectivity, identify connected components, and find bridges and articulation points
  • Recognise and verify special graph families: complete, bipartite, trees, DAGs, planar
  • Construct subgraphs, induced subgraphs, complements, and graph products
  • Define graph isomorphism and apply the Weisfeiler-Leman test to distinguish non-isomorphic graphs
  • Apply graph coloring and compute chromatic numbers for small graphs
  • Connect every graph concept to its role in GNNs, knowledge graphs, molecular modeling, and computation graphs

Table of Contents


1. Intuition

1.1 What Is a Graph?

At its most elemental, a graph is a collection of things and connections between things. The "things" are called vertices (or nodes), and the connections are called edges (or links). This definition is deliberately vague about what the things are - and that is exactly the point. The same mathematical framework that describes friendships between people also describes bonds between atoms, links between web pages, and data flow between operations in a neural network.

Consider a simple social network:

SOCIAL NETWORK AS A GRAPH
========================================================================

  Alice ------- Bob              Vertices: {Alice, Bob, Carol, Dave}
    |          / |               Edges:    {Alice-Bob, Alice-Carol,
    |        /   |                          Bob-Carol, Bob-Dave}
    |      /     |
  Carol -/     Dave              4 vertices, 4 edges

========================================================================

The graph abstracts away everything except the relational structure: who knows whom. It doesn't matter whether Alice is 25 or 65, whether she lives in Tokyo or Toronto. The graph captures only the connectivity pattern.

For AI: This abstraction is precisely why graphs are so powerful in machine learning. A graph neural network processes the structure of relationships, not the raw pixel data or text. When DeepMind's AlphaFold predicts protein structure, it represents amino acid residues as vertices and spatial proximity as edges - the graph structure encodes the physics that determines how the protein folds.

1.2 Why Graphs Matter for AI

Graphs appear in AI in three fundamentally different roles:

1. Graphs as input data. Many real-world datasets are naturally graph-structured: social networks, citation networks, molecular structures, protein interaction networks, scene graphs in computer vision, abstract syntax trees in code analysis. Graph Neural Networks (GNNs) process these directly.

2. Graphs as computational structure. Every neural network defines a directed acyclic graph (DAG) - the computation graph. Vertices are operations (matmul, add, activation), edges carry tensors. Backpropagation is reverse topological traversal of this DAG. PyTorch's autograd and JAX's tracing both build and traverse computation graphs.

3. Graphs as knowledge representation. Knowledge graphs (Wikidata: 100B+ triples, Google Knowledge Graph) represent entities as vertices and relations as labeled edges. Retrieval-augmented generation (RAG) systems traverse these graphs to ground LLM responses in factual knowledge.

THREE ROLES OF GRAPHS IN AI
========================================================================

  +------------------+   +-----------------+   +-----------------+
  | GRAPHS AS DATA   |   | GRAPHS AS COMP. |   | GRAPHS AS       |
  |                  |   |   STRUCTURE     |   |   KNOWLEDGE     |
  +------------------+   +-----------------+   +-----------------+
  | Molecules        |   | Autograd DAGs   |   | Wikidata        |
  | Social networks  |   | Model arch.     |   | Google KG       |
  | Citation graphs  |   | Dataflow graphs |   | ConceptNet      |
  | Scene graphs     |   | Compiler IR     |   | RAG retrieval   |
  +------------------+   +-----------------+   +-----------------+
       GNNs process          Backprop             LLMs query
       these directly        traverses            these for
                             these                grounding

========================================================================

1.3 Graphs Are Everywhere

To build intuition for the diversity of graphs, consider these concrete examples:

Chemistry: A molecule is a graph where atoms are vertices and chemical bonds are edges. Benzene ($C_6H_6$) is a cycle graph $C_6$ with alternating single and double bonds. Drug discovery uses GNNs to predict molecular properties from graph structure (Gilmer et al., 2017).

Natural Language Processing: A dependency parse tree is a directed graph where words are vertices and grammatical relationships are edges. The sentence "The cat sat on the mat" produces a tree rooted at "sat" with edges to "cat" (subject) and "on" (preposition). Abstract Meaning Representations (AMRs) are DAGs used in semantic parsing.

Social Networks: Facebook's social graph has ~3 billion vertices (users) and hundreds of billions of edges (friendships). The degree distribution follows a power law - most users have few friends, a few "hub" users have thousands.

The Internet: The World Wide Web is a directed graph where pages are vertices and hyperlinks are edges. Google's PageRank algorithm (Page et al., 1999) - the foundation of web search - computes the dominant eigenvector of a modified adjacency matrix.

Biology: Protein-protein interaction networks, gene regulatory networks, neural connectomes (the wiring diagram of a brain) - all are graphs. The human connectome has ~86 billion vertices (neurons) and ~100 trillion edges (synapses).

1.4 The Language of Connections

Graph theory provides a precise vocabulary for describing structural properties that arise across all these domains:

  • How many connections does each object have? -> Degree
  • Can we reach any object from any other? -> Connectivity
  • What is the shortest route between two objects? -> Distance, diameter
  • Are there natural groupings? -> Components, communities
  • Can we split objects into two non-interacting groups? -> Bipartiteness
  • Is there redundancy in connections? -> Cycles
  • Are two networks "the same" structurally? -> Isomorphism

Each of these questions has a rigorous mathematical formalization that we develop in this section. And each has direct consequences for AI: the degree distribution determines how information spreads through a GNN, connectivity determines whether a computation graph can propagate gradients, and the Weisfeiler-Leman isomorphism test is provably equivalent to the expressive power of message-passing GNNs (Xu et al., 2019).

1.5 Historical Timeline

| Year | Milestone | Significance |
| --- | --- | --- |
| 1736 | Euler solves the Konigsberg bridge problem | Birth of graph theory - first proof that no Eulerian circuit exists |
| 1852 | Guthrie poses the four color conjecture | Sparked 124 years of research; proved computationally in 1976 |
| 1878 | Cayley enumerates trees | First systematic study of tree structures |
| 1936 | Konig publishes the first graph theory textbook | Formalised bipartite matching, Konig's theorem |
| 1959 | Erdos and Renyi introduce random graphs | Foundation of probabilistic graph theory |
| 1959 | Dijkstra publishes his shortest-path algorithm | Still used in network routing today |
| 1970s | NP-completeness results (graph coloring, Hamiltonian cycle) | Showed fundamental limits of graph computation |
| 1976 | Appel and Haken prove the four color theorem | First major theorem proved by computer |
| 1998 | Watts and Strogatz model small-world networks | Explained "six degrees of separation" |
| 1999 | Page et al. publish PageRank | Graph eigenvector as the basis of web search |
| 2005 | Semantic Web and knowledge graphs emerge | Graphs as structured knowledge for AI |
| 2017 | Kipf and Welling publish GCN | Spectral graph convolutions for node classification |
| 2018 | Velickovic et al. publish GAT | Attention mechanisms on graphs |
| 2019 | Xu et al. prove GNN-WL equivalence | 1-WL test bounds message-passing GNN expressiveness |
| 2024 | Graph Transformers (GPS, Graphormer) | Combining graph structure with transformer attention |
| 2026 | Heterogeneous graph models at scale | Multi-relational graphs for knowledge-augmented LLMs |

1.6 What This Section Covers vs. What Comes Later

This section (01) is the vocabulary and core theory of graph theory. It defines what graphs are, introduces the fundamental properties (degree, paths, connectivity, special families, isomorphism, coloring), and provides the mathematical language used throughout the rest of the chapter.

What comes next builds on this vocabulary:

  • 02 Graph Representations - How to store graphs in memory (adjacency matrix, adjacency list, CSR format)
  • 03 Graph Algorithms - Algorithms that compute the properties defined here (BFS, DFS, shortest paths, MST)
  • 04 Spectral Graph Theory - The eigenvalues of graph matrices reveal structural properties
  • 05 Graph Neural Networks - Neural architectures that learn from graph structure
  • 06 Random Graphs - Probabilistic models of graph generation

2. Formal Definitions

2.1 The Graph $G = (V, E)$

Definition (Undirected Graph). An undirected graph is an ordered pair $G = (V, E)$ where:

  • $V$ is a finite, non-empty set called the vertex set (or node set)
  • $E \subseteq \binom{V}{2}$ is a set of unordered pairs of distinct vertices, called the edge set

Here $\binom{V}{2} = \{\{u, v\} : u, v \in V, u \neq v\}$ denotes the set of all 2-element subsets of $V$.

We write $n = |V|$ for the order of $G$ (number of vertices) and $m = |E|$ for the size of $G$ (number of edges).

Notation. An edge between vertices $u$ and $v$ is written as $\{u, v\}$, $uv$, or $(u, v)$ depending on context. In this section we use $\{u, v\}$ for undirected edges and $(u, v)$ for directed edges.

Example 1: The triangle graph $K_3$.

$$V = \{1, 2, 3\}, \quad E = \{\{1,2\}, \{1,3\}, \{2,3\}\}$$

THE TRIANGLE GRAPH K_3
========================================================================

      1                 n = 3 (order)
     / \                m = 3 (size)
    /   \               Every pair connected -> "complete graph"
   2 --- 3

========================================================================

Example 2: The path graph $P_4$.

$$V = \{1, 2, 3, 4\}, \quad E = \{\{1,2\}, \{2,3\}, \{3,4\}\}$$

   1 -- 2 -- 3 -- 4          n = 4, m = 3

Example 3: The cycle graph $C_5$.

$$V = \{1, 2, 3, 4, 5\}, \quad E = \{\{1,2\}, \{2,3\}, \{3,4\}, \{4,5\}, \{5,1\}\}$$

For AI: When a GNN processes a molecular graph, each atom is a vertex in $V$ and each chemical bond is an edge in $E$. The formal definition ensures that the graph is well-defined regardless of how atoms are labeled - only the connection pattern matters.
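
The definition translates directly into code. Here is a minimal sketch in plain Python (names are ours, purely illustrative): vertices as a set, edges as a set of frozensets so that $\{u,v\} = \{v,u\}$ holds automatically.

```python
# Minimal sketch of G = (V, E) for the triangle K_3.
# frozenset({u, v}) encodes an unordered pair, so {u,v} == {v,u}.
V = {1, 2, 3}
E = {frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})}

# Validate the definition: every edge is a 2-element subset of V
# with distinct endpoints (no self-loops in a simple graph).
assert all(len(e) == 2 and e <= V for e in E)

n, m = len(V), len(E)   # order and size
print(n, m)             # 3 3
```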

Key terminology summary:

| Term | Symbol | Meaning |
| --- | --- | --- |
| Vertex (node) | $v \in V$ | An object in the graph |
| Edge (link) | $e \in E$ | A connection between two vertices |
| Order | $n = \lvert V \rvert$ | Number of vertices |
| Size | $m = \lvert E \rvert$ | Number of edges |
| Adjacent | $u \sim v$ | Vertices $u, v$ connected by an edge |
| Incident | $v \in e$ | Vertex $v$ is an endpoint of edge $e$ |
| Neighbour | $u \in \mathcal{N}(v)$ | $u$ is adjacent to $v$ |
| Neighbourhood | $\mathcal{N}(v)$ | Set of all neighbours of $v$ |
| Endpoint | $u, v$ of $\{u,v\}$ | The two vertices of an edge |

Maximum number of edges. A simple undirected graph on $n$ vertices has at most $\binom{n}{2} = \frac{n(n-1)}{2}$ edges (achieved by $K_n$). A simple directed graph has at most $n(n-1)$ arcs. A graph with $m$ close to $\binom{n}{2}$ is called dense; a graph with $m = O(n)$ is called sparse. Most real-world graphs are sparse.
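
As a quick check of these counts, a small hedged sketch (function names are ours, not from any library) computing the maximum edge count and the resulting density:

```python
from itertools import combinations

def max_edges(n: int, directed: bool = False) -> int:
    """Maximum edges in a simple graph on n vertices: C(n,2), or n(n-1) arcs."""
    return n * (n - 1) if directed else n * (n - 1) // 2

def density(n: int, m: int) -> float:
    """m / C(n,2): close to 1 means dense, close to 0 means sparse."""
    return m / max_edges(n)

K5 = list(combinations(range(5), 2))                 # K_5 realises the maximum
print(len(K5), max_edges(5), density(5, len(K5)))    # 10 10 1.0
```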

2.2 Directed vs. Undirected Graphs

Definition (Directed Graph / Digraph). A directed graph (or digraph) is an ordered pair $G = (V, E)$ where $E \subseteq V \times V$ is a set of ordered pairs of vertices. An edge $(u, v)$ is called an arc directed from $u$ (the tail) to $v$ (the head).

The key distinction: in an undirected graph, $\{u, v\} = \{v, u\}$ - the edge has no direction. In a digraph, $(u, v) \neq (v, u)$ - the arc goes from $u$ to $v$, not the reverse.

UNDIRECTED vs. DIRECTED
========================================================================

  Undirected:              Directed:
   A --- B                  A ----> B
   |     |                  ^       |
   |     |                  |       v
   D --- C                  D <---- C

  {A,B} = {B,A}            (A,B) != (B,A)
  "A knows B"              "A follows B"
  Symmetric relation       Asymmetric relation

========================================================================

Example (Undirected): A social network where friendship is mutual - if Alice is friends with Bob, then Bob is friends with Alice. The edge set is symmetric.

Example (Directed): Twitter's follow graph - Alice can follow Bob without Bob following Alice. Citation networks - paper A cites paper B, but B does not cite A.

Example (Directed, AI-critical): A computation graph in PyTorch. Each operation node receives input tensors along incoming arcs and produces output tensors along outgoing arcs. The direction encodes dataflow: gradients propagate in the reverse direction during backpropagation.

Non-example: A "graph" where edges connect a vertex to itself (self-loop) - this violates the $u \neq v$ requirement for simple graphs. We address self-loops in 2.4.

2.3 Weighted Graphs

Definition (Weighted Graph). A weighted graph is a triple $G = (V, E, w)$ where $(V, E)$ is a graph and $w: E \to \mathbb{R}$ is a weight function assigning a real number to each edge.

Common weight interpretations:

  • Distance/cost: In a road network, $w(\{u,v\})$ is the distance between cities $u$ and $v$
  • Similarity/affinity: In a $k$-NN graph, $w(\{u,v\}) = \exp(-\lVert \mathbf{x}_u - \mathbf{x}_v \rVert^2 / 2\sigma^2)$ (Gaussian kernel)
  • Capacity: In a flow network, $w((u,v))$ is the maximum flow along the arc
  • Probability: In a Markov chain, $w((u,v)) = P(\text{transition } u \to v)$

For AI: Attention weights in a transformer define a weighted directed graph over token positions. The attention matrix $\operatorname{softmax}(QK^\top / \sqrt{d_k})$ is precisely the weight matrix of a weighted digraph where $w((i,j))$ is the attention that position $i$ pays to position $j$. Graph Attention Networks (GAT, Velickovic et al., 2018) learn these weights as a function of node features.
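
To make the correspondence concrete, a small NumPy sketch (random matrices, illustrative shapes only) that builds one attention head and reads it as a weighted digraph:

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 8
Q = rng.normal(size=(4, d_k))    # queries for 4 token positions
K = rng.normal(size=(4, d_k))    # keys for 4 token positions

scores = Q @ K.T / np.sqrt(d_k)
W = np.exp(scores)
W /= W.sum(axis=1, keepdims=True)   # row-wise softmax

# W[i, j] is the weight of arc (i, j): the attention position i pays to j.
# Rows sum to 1, so W is also the transition matrix of a random walk.
print(np.allclose(W.sum(axis=1), 1.0))   # True
```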

Non-example (not a weighted graph): A matrix $A \in \mathbb{R}^{n \times n}$ with negative entries - this can be interpreted as a signed weighted graph, but the standard weighted graph definition allows $w: E \to \mathbb{R}$ including negative weights. The distinction matters: negative weights break shortest-path algorithms like Dijkstra but are handled by Bellman-Ford.

2.4 Simple Graphs, Multigraphs, and Pseudographs

The formal definition in 2.1 defines a simple graph: no self-loops and no parallel edges. Real-world graphs often violate these constraints:

Definition (Self-loop). A self-loop is an edge from a vertex to itself: $\{v, v\}$ or $(v, v)$.

Definition (Multigraph). A multigraph allows multiple edges between the same pair of vertices. Formally, $E$ is a multiset rather than a set.

Definition (Pseudograph). A pseudograph allows both self-loops and multiple edges.

| Graph Type | Self-loops | Parallel Edges | Example |
| --- | --- | --- | --- |
| Simple graph | No | No | Social network (at most one friendship) |
| Multigraph | No | Yes | Transportation network (multiple bus routes) |
| Pseudograph | Yes | Yes | State machine (self-transitions) |

For AI: Most GNN frameworks assume simple graphs. Self-loops are added explicitly in GCN: $\tilde{A} = A + I_n$ adds a self-loop to every vertex, ensuring that each node's own features are included in the aggregation step. This is a deliberate design choice, not a property of the input graph.
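
A minimal NumPy sketch of this preprocessing step (the symmetric normalisation shown is the Kipf & Welling variant):

```python
import numpy as np

A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])                # K_3, a simple graph

A_tilde = A + np.eye(3)                     # add a self-loop to every vertex
deg = A_tilde.sum(axis=1)                   # degrees after adding self-loops
D_inv_sqrt = np.diag(deg ** -0.5)
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # \hat{A} used in each GCN layer
print(A_hat.round(3))
```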

Non-example: A relation "is the same as" on a set - every element is related to itself, producing self-loops on every vertex. This is a pseudograph, not a simple graph.

2.5 Standard Examples and Non-Examples

Example 1: The Petersen graph. A famous graph with 10 vertices and 15 edges, 3-regular (every vertex has degree 3). It is a counterexample to many graph theory conjectures - it is not Hamiltonian, not 3-edge-colorable (its edge chromatic number is 4), and not planar.

Example 2: The Karate Club graph. Zachary's karate club (1977) - 34 members, 78 edges representing social interactions. The graph split into two communities when the club divided, making it a standard benchmark for community detection. This is the "MNIST of graph learning."

Example 3: A molecule. Water ($H_2O$) has 3 vertices (O, H, H) and 2 edges (O-H bonds). Caffeine ($C_8H_{10}N_4O_2$) has 24 vertices and 25 edges.

Non-example 1: A set without edges. The pair $(\{1,2,3\}, \emptyset)$ is technically a valid graph (the "empty graph" on 3 vertices), but it has no edges - no relationships are captured.

Non-example 2: A "graph" with edges between non-existent vertices. If $V = \{1,2\}$ and $E = \{\{1,3\}\}$, this is not a valid graph because vertex 3 is not in $V$. Edges must connect vertices in the vertex set.

Non-example 3: An ordered list. The sequence $[1, 2, 3, 4]$ is not a graph - it has no explicit edge set. However, we can construct a path graph from it: $V = \{1,2,3,4\}$, $E = \{\{1,2\}, \{2,3\}, \{3,4\}\}$.

2.6 The Null Graph, Empty Graph, and Trivial Graph

These edge cases clarify the boundaries of the definition:

Definition (Trivial Graph). The graph with exactly one vertex and no edges: $G = (\{v\}, \emptyset)$. This is the simplest possible graph.

Definition (Empty Graph). A graph $G = (V, \emptyset)$ with $n \geq 1$ vertices and no edges. Also called an edgeless graph or independent set. Denoted $\bar{K}_n$ (the complement of the complete graph).

Definition (Null Graph). The graph with no vertices and no edges: $G = (\emptyset, \emptyset)$. Some authors exclude this, requiring $V \neq \emptyset$. In this curriculum, we require $|V| \geq 1$.

For AI: The empty graph $\bar{K}_n$ appears when constructing GNN inputs: before adding edges (via $k$-NN or radius graphs), the initial graph has $n$ nodes with features but no connections. The GCN self-loop trick $\tilde{A} = A + I$ on an empty graph gives $\tilde{A} = I$ - each node only aggregates its own features, equivalent to a standard MLP.


3. Degree and Degree Sequences

3.1 Vertex Degree

Definition (Degree). For an undirected graph $G = (V, E)$, the degree of a vertex $v \in V$ is the number of edges incident to $v$:

$$\deg(v) = |\{e \in E : v \in e\}|$$

Equivalently, $\deg(v)$ is the number of neighbours of $v$: the vertices adjacent to $v$.

Definition (In-degree and Out-degree). For a directed graph $G = (V, E)$:

$$\deg^+(v) = |\{(v, u) \in E\}| \quad \text{(out-degree: edges leaving } v\text{)}$$
$$\deg^-(v) = |\{(u, v) \in E\}| \quad \text{(in-degree: edges entering } v\text{)}$$

The total degree of a vertex in a digraph is $\deg(v) = \deg^+(v) + \deg^-(v)$.

DEGREE EXAMPLES
========================================================================

  Undirected:                    Directed:
      A --- B --- C                 A ---> B ---> C
      |           |                 ^             ^
      D --------- E                 D <---------- E

  deg(A) = 2  (B, D)           deg^+(A) = 1, deg^-(A) = 1
  deg(B) = 2  (A, C)           deg^+(B) = 1, deg^-(B) = 1
  deg(C) = 2  (B, E)           deg^+(C) = 0, deg^-(C) = 2  (sink)
  deg(D) = 2  (A, E)           deg^+(D) = 1, deg^-(D) = 1
  deg(E) = 2  (C, D)           deg^+(E) = 2, deg^-(E) = 0  (source)

========================================================================

For AI: In a GNN, a vertex's degree determines how many messages it receives during message passing. Vertices with very high degree (hubs) aggregate information from many neighbours, while vertices with degree 1 (leaves) receive information from a single source. This is why GCN normalises by $1/\sqrt{\deg(u)\deg(v)}$ - to prevent high-degree nodes from dominating the aggregation.

Notation (Adjacency matrix connection). If $A$ is the adjacency matrix of an undirected graph, then $\deg(v_i) = \sum_{j=1}^n A_{ij}$ - the row sum. The degree matrix $D = \operatorname{diag}(\deg(v_1), \ldots, \deg(v_n))$ is the diagonal matrix of degrees. Both $A$ and $D$ are central to spectral graph theory (04).

3.2 The Handshaking Lemma

Theorem (Handshaking Lemma). For any undirected graph $G = (V, E)$:

$$\sum_{v \in V} \deg(v) = 2|E|$$

Proof. Each edge $\{u, v\} \in E$ contributes exactly 1 to $\deg(u)$ and exactly 1 to $\deg(v)$. Thus each edge contributes exactly 2 to the total degree sum. Summing over all edges gives $\sum_{v \in V} \deg(v) = 2|E|$. $\square$

Corollary. The number of vertices with odd degree is always even. (If there were an odd number of odd-degree vertices, the total degree sum would be odd, contradicting $2|E|$ being even.)

Directed version. For a digraph: $\sum_{v \in V} \deg^+(v) = \sum_{v \in V} \deg^-(v) = |E|$. Every arc has exactly one tail and one head.

Example. The Petersen graph: 10 vertices, each of degree 3. Total degree sum $= 10 \times 3 = 30 = 2 \times 15$ edges. $\checkmark$

Example. The star graph $S_5$ (one center connected to 4 leaves): $\deg(\text{center}) = 4$, $\deg(\text{leaf}_i) = 1$ for $i = 1, \ldots, 4$. Sum $= 4 + 1 + 1 + 1 + 1 = 8 = 2 \times 4$ edges. $\checkmark$

Application: Average degree. The average degree of a graph is $\bar{d} = \frac{1}{n}\sum_{v \in V}\deg(v) = \frac{2m}{n}$. For sparse graphs ($m = O(n)$), the average degree is $O(1)$. For dense graphs ($m = \Theta(n^2)$), the average degree is $\Theta(n)$.

| Graph | $n$ | $m$ | Average degree $2m/n$ |
| --- | --- | --- | --- |
| $K_n$ | $n$ | $n(n-1)/2$ | $n - 1$ |
| $C_n$ | $n$ | $n$ | $2$ |
| Tree on $n$ vertices | $n$ | $n-1$ | $2 - 2/n \approx 2$ |
| $K_{m,n}$ | $m+n$ | $mn$ | $2mn/(m+n)$ |

For AI: The handshaking lemma provides a quick sanity check when constructing graphs for GNN input. If you compute degrees from an edge list and the sum is odd, your graph construction has a bug. In large-scale graph datasets (OGB, ogbg-molhiv), this is a standard validation step. The average degree also determines the computational cost of one GNN layer: each message-passing step processes $O(m) = O(n\bar{d}/2)$ edges.
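
A sketch of that sanity check in plain Python (function name is ours):

```python
from collections import Counter

def degrees_from_edge_list(edges):
    """Degree map for an undirected edge list, with a handshaking check."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Handshaking lemma: the degree sum must be exactly 2m.
    assert sum(deg.values()) == 2 * len(edges), "graph construction bug"
    return deg

edges = [("Alice", "Bob"), ("Alice", "Carol"),
         ("Bob", "Carol"), ("Bob", "Dave")]       # the graph from section 1.1
deg = degrees_from_edge_list(edges)
odd = [v for v, d in deg.items() if d % 2 == 1]
print(dict(deg))   # {'Alice': 2, 'Bob': 3, 'Carol': 2, 'Dave': 1}
print(len(odd))    # 2 odd-degree vertices: an even count, as the corollary requires
```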

3.3 Degree Sequences

Definition (Degree Sequence). The degree sequence of a graph $G$ is the list of vertex degrees arranged in non-increasing order:

$$d_1 \geq d_2 \geq \cdots \geq d_n$$

Example. The complete bipartite graph $K_{2,3}$: vertices on one side have degree 3, vertices on the other have degree 2. Degree sequence: $(3, 3, 2, 2, 2)$.

Definition (Graphic Sequence). A non-increasing sequence of non-negative integers $(d_1, \ldots, d_n)$ is graphic if there exists a simple graph with that degree sequence.

Theorem (Erdos-Gallai, 1960). A non-increasing sequence $(d_1, d_2, \ldots, d_n)$ of non-negative integers with even sum is graphic if and only if for each $k \in \{1, 2, \ldots, n\}$:

$$\sum_{i=1}^k d_i \leq k(k-1) + \sum_{i=k+1}^n \min(d_i, k)$$

Example. Is $(3, 3, 2, 2, 2)$ graphic? Sum $= 12$ (even). Check $k = 1$: $3 \leq 0 + \min(3,1) + \min(2,1) + \min(2,1) + \min(2,1) = 0 + 1+1+1+1 = 4$. Yes. (The full check confirms graphicality - it is the degree sequence of $K_{2,3}$.)

Non-example. Is $(3, 3, 3, 1)$ graphic? Sum $= 10$ (even), $n = 4$. Check $k = 1$: $3 \leq 0 + \min(3,1) + \min(3,1) + \min(1,1) = 3$. Equality holds. Check $k = 3$: $3+3+3 = 9 \leq 3 \cdot 2 + \min(1,3) = 6 + 1 = 7$. Fails! Not graphic - you cannot build a simple graph on 4 vertices where three vertices have degree 3 and one has degree 1.
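
The Erdos-Gallai condition is easy to test directly; a hedged sketch:

```python
def is_graphic(seq):
    """Erdos-Gallai test: can a simple graph realise this degree sequence?"""
    seq = sorted(seq, reverse=True)
    if any(d < 0 for d in seq) or sum(seq) % 2 == 1:
        return False
    n = len(seq)
    for k in range(1, n + 1):
        lhs = sum(seq[:k])
        rhs = k * (k - 1) + sum(min(d, k) for d in seq[k:])
        if lhs > rhs:
            return False
    return True

print(is_graphic([3, 3, 2, 2, 2]))   # True  (realised by K_{2,3})
print(is_graphic([3, 3, 3, 1]))      # False (the inequality fails)
```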

3.4 Regular Graphs

Definition ($k$-Regular Graph). A graph is $k$-regular if every vertex has degree $k$: $\deg(v) = k$ for all $v \in V$.

Examples:

  • 0-regular: The empty graph $\bar{K}_n$ (no edges)
  • 1-regular: A perfect matching (disjoint edges covering all vertices; requires even $n$)
  • 2-regular: A disjoint union of cycles
  • 3-regular (cubic): The Petersen graph, the complete bipartite graph $K_{3,3}$
  • $(n-1)$-regular: The complete graph $K_n$ (every vertex connected to every other)

Proposition. A $k$-regular graph on $n$ vertices has exactly $m = kn/2$ edges (by the handshaking lemma). This requires $kn$ to be even.

For AI: Regular graphs are the "uniform" case in GNN analysis. On a $k$-regular graph, the GCN normalisation factor $1/\sqrt{\deg(u)\deg(v)} = 1/k$ is constant, so GCN reduces to simple averaging of neighbour features. This makes regular graphs the easiest case for theoretical GNN analysis (e.g., proving the WL-equivalence result of Xu et al., 2019).

3.5 Degree Distributions in Real-World Graphs

Real-world graphs rarely have uniform degree distributions. Two patterns dominate:

Poisson degree distribution. In Erdos-Renyi random graphs $G(n, p)$, the degree of each vertex is approximately $\operatorname{Binomial}(n-1, p) \approx \operatorname{Poisson}(\lambda)$ where $\lambda = (n-1)p$. Most vertices have degree close to $\lambda$.

Power-law degree distribution. In many real-world networks (social, citation, web), the fraction of vertices with degree $k$ follows:

$$P(\deg = k) \propto k^{-\gamma}, \quad \gamma \in (2, 3) \text{ typically}$$

This is called a scale-free distribution. It means a few "hub" vertices have enormously high degree while most vertices have low degree.

| Network | Type | Typical $\gamma$ |
| --- | --- | --- |
| WWW (out-degree) | Scale-free | $\gamma \approx 2.1$ |
| Citation networks | Scale-free | $\gamma \approx 3.0$ |
| Protein interactions | Scale-free | $\gamma \approx 2.4$ |
| Erdos-Renyi | Poisson | N/A |
| Road networks | Near-uniform | N/A ($\deg \approx 3{-}4$) |

For AI: Power-law degree distributions create challenges for GNNs. Hub vertices with degree $> 1000$ dominate mini-batch sampling (GraphSAGE, Hamilton et al., 2017) and can cause representation collapse - the "over-squashing" problem. Solutions include degree-based sampling, virtual nodes, and graph transformers that bypass local message passing.

Clustering coefficient. Beyond degree, an important local structural measure is the clustering coefficient, which quantifies how clustered a vertex's neighbourhood is.

Definition (Local Clustering Coefficient). For a vertex $v$ with $\deg(v) \geq 2$:

$$C(v) = \frac{\text{number of edges among neighbours of } v}{\binom{\deg(v)}{2}}$$

This is the fraction of possible edges among $v$'s neighbours that actually exist. $C(v) = 1$ means the neighbourhood is a clique; $C(v) = 0$ means no two neighbours are connected.

Definition (Global Clustering Coefficient). The average over all vertices: $C = \frac{1}{n}\sum_{v} C(v)$.

Alternative: Transitivity. $T(G) = \frac{3 \times \text{number of triangles}}{\text{number of connected triples}} = \frac{\operatorname{tr}(A^3)}{\mathbf{1}^\top A^2 \mathbf{1} - \operatorname{tr}(A^2)}$.
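
Both quantities are a few lines of NumPy; a sketch working from the adjacency matrix (helper names are ours):

```python
import numpy as np

def local_clustering(A, v):
    """C(v): fraction of pairs of v's neighbours that are themselves adjacent."""
    nbrs = np.flatnonzero(A[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = A[np.ix_(nbrs, nbrs)].sum() / 2      # edges among the neighbours
    return links / (k * (k - 1) / 2)

# Triangle 0-1-2 with a pendant vertex 3 attached to 0:
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
print(local_clustering(A, 0))   # 1/3: one edge among {1,2,3}, three possible
print(local_clustering(A, 1))   # 1.0: neighbours {0, 2} are adjacent

# Transitivity via the trace formula above:
A3 = A @ A @ A
one = np.ones(4)
T = np.trace(A3) / (one @ (A @ A) @ one - np.trace(A @ A))
print(T)   # 0.6 = 3 * (1 triangle) / (5 connected triples)
```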

| Network type | Typical $C$ | Interpretation |
| --- | --- | --- |
| Erdos-Renyi $G(n,p)$ | $\approx p$ | Low, unclustered |
| Social networks | $0.1$-$0.5$ | High, clustered (friends of friends are friends) |
| Protein interactions | $0.1$-$0.3$ | Medium clustering |
| Regular lattice | High | Very clustered but large diameter |
| Small-world (Watts-Strogatz) | High | High clustering AND small diameter |

For AI: The clustering coefficient is a standard graph-level feature in GNN benchmarks. High clustering indicates the presence of many triangles - triangle-counting is a canonical task that standard 1-WL GNNs cannot perform exactly (triangles require 3-WL). This motivates higher-order GNN architectures that explicitly count triangular motifs.

Note: The full theory of degree distributions and random graph models is developed in 06 Random Graphs. Here we introduce the concept; there we formalise it.


4. Paths, Walks, and Cycles

4.1 Walks, Trails, and Paths

These three concepts form a hierarchy of increasing restriction on vertex/edge repetition:

Definition (Walk). A walk of length $k$ in $G$ is a sequence of vertices $v_0, v_1, \ldots, v_k$ such that $\{v_{i-1}, v_i\} \in E$ for each $i \in \{1, \ldots, k\}$. Vertices and edges may repeat.

Definition (Trail). A trail is a walk in which no edge is repeated (but vertices may repeat).

Definition (Path). A path is a walk in which no vertex is repeated (and hence no edge is repeated).

WALK vs. TRAIL vs. PATH
========================================================================

  Graph:   A --- B --- C
           |     |     |
           D --- E --- F

  Walk:    A -> B -> E -> B -> C            (B visited twice - OK for walk)
  Trail:   B -> A -> D -> E -> B -> C       (no edge repeated, B visited twice)
  Path:    A -> B -> E -> F -> C            (no vertex repeated)

  Restriction:   Walk \supseteq Trail \supseteq Path
                 (every path is a trail, every trail is a walk)

========================================================================

Notation. The length of a walk/trail/path is the number of edges traversed (not vertices). A path from $u$ to $v$ is called a $u$-$v$ path.

Proposition. If there exists a walk from $u$ to $v$, then there exists a path from $u$ to $v$. (Proof: remove cycles from the walk to obtain a path.)
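
A small classifier makes the hierarchy concrete (a sketch; the edge set below is the 2x3 grid graph from the diagram above):

```python
def classify(seq, edges):
    """Return 'path', 'trail', 'walk', or 'not a walk' for a vertex sequence."""
    steps = [frozenset(p) for p in zip(seq, seq[1:])]
    if any(s not in edges for s in steps):
        return "not a walk"
    if len(set(seq)) == len(seq):
        return "path"        # no vertex repeated
    if len(set(steps)) == len(steps):
        return "trail"       # vertices repeat, but no edge repeats
    return "walk"            # some edge repeats

E = {frozenset(e) for e in [("A","B"), ("B","C"), ("A","D"), ("B","E"),
                            ("C","F"), ("D","E"), ("E","F")]}
print(classify(["A","B","E","F","C"], E))        # path
print(classify(["B","A","D","E","B","C"], E))    # trail
print(classify(["A","B","E","B","C"], E))        # walk
```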

For AI: In a GNN with $k$ message-passing layers, information from vertex $u$ can reach vertex $v$ if and only if there exists a walk of length $\leq k$ from $u$ to $v$. The GNN's receptive field at depth $k$ is exactly the set of vertices reachable by walks of length $\leq k$. Note: it is walks (not paths) because GNNs can re-visit vertices through different neighbours at different layers.

4.2 Cycles

Definition (Cycle). A cycle of length $k$ (also called a $k$-cycle) is a closed walk $v_0, v_1, \ldots, v_k$ where $v_0 = v_k$, all other vertices are distinct, and $k \geq 3$.

Definition (Girth). The girth of a graph is the length of its shortest cycle. If the graph has no cycles (is acyclic), the girth is defined as $\infty$.

Examples:

  • $K_3$ (triangle): girth $= 3$
  • $K_4$: girth $= 3$ (contains many triangles)
  • $C_n$ (cycle graph): girth $= n$
  • Any tree: girth $= \infty$ (acyclic)
  • Petersen graph: girth $= 5$ (smallest cycle has 5 vertices)

Definition (Acyclic Graph). A graph with no cycles. An undirected acyclic graph is a forest; a connected forest is a tree.

For AI: Cycles in computation graphs would create infinite loops during forward pass evaluation. This is why computation graphs are DAGs (directed acyclic graphs) - acyclicity ensures a valid topological ordering for sequential evaluation. Recurrent Neural Networks (RNNs) appear to have cycles, but when "unrolled" over time steps, the computation graph is a DAG.

4.3 Shortest Paths and Distance

Definition (Distance). The distance $d(u, v)$ between vertices $u$ and $v$ is the length of the shortest $u$-$v$ path. If no such path exists (they are in different components), $d(u, v) = \infty$.

Definition (Eccentricity). The eccentricity of a vertex $v$ is $\varepsilon(v) = \max_{u \in V} d(v, u)$ - the distance to the farthest vertex.

Definition (Diameter). The diameter of a graph is $\operatorname{diam}(G) = \max_{v \in V} \varepsilon(v) = \max_{u, v \in V} d(u, v)$ - the maximum distance between any two vertices.

Definition (Radius). The radius of a graph is $\operatorname{rad}(G) = \min_{v \in V} \varepsilon(v)$.

Definition (Center). The center of a graph is the set of vertices with minimum eccentricity: $\{v : \varepsilon(v) = \operatorname{rad}(G)\}$.

Examples:

  • Path graph $P_n$: diameter $= n - 1$, radius $= \lceil (n-1)/2 \rceil$
  • Cycle $C_n$: diameter $= \lfloor n/2 \rfloor$
  • Complete graph $K_n$: diameter $= 1$ (every pair is adjacent)
  • Star graph $S_n$ (one center connected to $n-1$ leaves): diameter $= 2$

Proposition. For any connected graph: $\operatorname{rad}(G) \leq \operatorname{diam}(G) \leq 2 \cdot \operatorname{rad}(G)$.

Proof sketch. The left inequality is immediate: the max over all vertices is $\geq$ the min. For the right inequality: let $u, v$ achieve the diameter and let $c$ be a center vertex (achieving the radius). Then $d(u,v) \leq d(u,c) + d(c,v) \leq \varepsilon(c) + \varepsilon(c) = 2 \cdot \operatorname{rad}(G)$. $\square$

Distance properties (metric space). The graph distance function $d: V \times V \to \mathbb{N} \cup \{0\}$ is a metric:

  1. Non-negativity: $d(u,v) \geq 0$, with $d(u,v) = 0 \iff u = v$
  2. Symmetry: $d(u,v) = d(v,u)$ (for undirected graphs)
  3. Triangle inequality: $d(u,w) \leq d(u,v) + d(v,w)$

This means the vertex set of any connected graph forms a metric space under graph distance. This observation connects graph theory to metric geometry and is used in graph embedding methods (e.g., embedding graph vertices into Euclidean space while approximately preserving distances).

Distance matrix. The distance matrix $D \in \mathbb{N}^{n \times n}$ has entries $D_{ij} = d(v_i, v_j)$. This is distinct from the degree matrix (also sometimes denoted $D$). The distance matrix contains the full pairwise distance information of the graph and is used as a positional encoding in graph transformers (Li et al., 2020).

DISTANCE MATRIX EXAMPLE (P_4: 1-2-3-4)
========================================================================

  Distance matrix:          Eccentricities:
       1  2  3  4           \epsilon(1) = 3  (farthest: vertex 4)
  1 [  0  1  2  3 ]         \epsilon(2) = 2  (farthest: vertex 4)
  2 [  1  0  1  2 ]         \epsilon(3) = 2  (farthest: vertex 1)
  3 [  2  1  0  1 ]         \epsilon(4) = 3  (farthest: vertex 1)
  4 [  3  2  1  0 ]
                            Diameter = 3, Radius = 2
                            Center = {2, 3}

========================================================================

For AI: The diameter of a graph determines the minimum number of GNN layers needed to propagate information between the two most distant vertices. If $\operatorname{diam}(G) = d$, a GNN with fewer than $d$ layers cannot use the full graph structure for prediction. The "small-world" property - most real-world graphs have $\operatorname{diam}(G) = O(\log n)$ - explains why GNNs with 2-3 layers often suffice in practice.
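
Diameter, radius, and center all reduce to a BFS from every vertex; a sketch reproducing the $P_4$ worked example above (adjacency dict, names are ours):

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths from `source` in an unweighted graph."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}                  # P_4
ecc = {v: max(bfs_distances(adj, v).values()) for v in adj}   # eccentricities
print(ecc)                                     # {1: 3, 2: 2, 3: 2, 4: 3}
print(max(ecc.values()), min(ecc.values()))    # diameter 3, radius 2
print([v for v, e in ecc.items() if e == min(ecc.values())])  # center [2, 3]
```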

Note: Algorithms for computing shortest paths (BFS for unweighted, Dijkstra for weighted, Bellman-Ford for negative weights) are developed in 03 Graph Algorithms.

4.4 Eulerian and Hamiltonian Paths

Definition (Eulerian Trail/Circuit). An Eulerian trail is a trail that visits every edge exactly once. An Eulerian circuit is a closed Eulerian trail (starts and ends at the same vertex).

Theorem (Euler, 1736). A connected graph has an Eulerian circuit if and only if every vertex has even degree. It has an Eulerian trail (but not a circuit) if and only if exactly two vertices have odd degree.

This was the first theorem of graph theory, proved by Euler to show that the Konigsberg bridge problem has no solution (the graph of Konigsberg had four vertices, all of odd degree).

THE KONIGSBERG BRIDGE PROBLEM
========================================================================

  The city of Konigsberg (now Kaliningrad) had 7 bridges connecting
  4 landmasses. Can you walk across each bridge exactly once?

  Landmasses as vertices, bridges as edges (a multigraph):

       A              Bridges: A=B (2 bridges), B=C (2 bridges),
      / \\                     A-D, B-D, C-D (1 bridge each)
     D---B
      \ //            deg(A) = 3 (odd)    deg(B) = 5 (odd)
       C              deg(C) = 3 (odd)    deg(D) = 3 (odd)

  All four degrees are odd -> no Eulerian trail exists
  (Euler's theorem: need 0 or 2 odd-degree vertices)

========================================================================

Proof sketch (Euler's theorem - necessity). If a graph has an Eulerian circuit, every time the walk enters a vertex on one edge it must leave on a different edge. So edges at each vertex are paired: enter-leave, enter-leave, ... This means every vertex uses an even number of edges, so $\deg(v)$ is even. For an Eulerian trail (not circuit), the start and end vertices each use one unpaired edge, so exactly two vertices have odd degree. $\square$
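
Euler's theorem makes the existence test a degree count; a sketch applied to the Konigsberg multigraph (adjacency lists with parallel bridges listed twice; assumes the input is connected):

```python
def euler_status(adj):
    """Classify a connected (multi)graph by Euler's theorem."""
    odd = [v for v, nbrs in adj.items() if len(nbrs) % 2 == 1]
    if len(odd) == 0:
        return "Eulerian circuit"
    if len(odd) == 2:
        return f"Eulerian trail between {odd[0]} and {odd[1]}"
    return "no Eulerian trail"

konigsberg = {"A": ["B", "B", "D"],
              "B": ["A", "A", "C", "C", "D"],
              "C": ["B", "B", "D"],
              "D": ["A", "B", "C"]}
print(euler_status(konigsberg))   # no Eulerian trail (four odd degrees)

C4 = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
print(euler_status(C4))           # Eulerian circuit (all degrees even)
```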

Definition (Hamiltonian Path/Cycle). A Hamiltonian path is a path that visits every vertex exactly once. A Hamiltonian cycle is a cycle that visits every vertex exactly once.

Contrast. Euler: every edge once. Hamilton: every vertex once.

| Property | Eulerian | Hamiltonian |
| --- | --- | --- |
| Visits | Every edge once | Every vertex once |
| Existence test | Polynomial (check degrees) | NP-complete |
| $K_n$ | Has Eulerian circuit iff $n$ is odd | Always has Hamiltonian cycle ($n \geq 3$) |
| $K_{m,n}$ | Has Eulerian circuit iff $m, n$ both even | Has Hamiltonian cycle iff $m = n$ |

For AI: The Travelling Salesman Problem (TSP) - finding the shortest Hamiltonian cycle in a weighted complete graph - is a canonical NP-hard combinatorial optimization problem. Neural combinatorial optimization (Vinyals et al., 2015; Kool et al., 2019) uses attention-based models to learn heuristic TSP solutions. The graph structure of TSP is $K_n$ with distance weights.

4.5 Counting Walks with the Adjacency Matrix

Theorem. Let $A$ be the adjacency matrix of a graph $G$. The entry $(A^k)_{ij}$ counts the number of walks of length $k$ from vertex $i$ to vertex $j$.

Proof sketch. By induction on $k$. Base case: $A^1 = A$, and $A_{ij} = 1$ iff there is a walk of length 1 (an edge) from $i$ to $j$. Inductive step: $(A^{k+1})_{ij} = \sum_{\ell} (A^k)_{i\ell} \cdot A_{\ell j}$. A walk of length $k+1$ from $i$ to $j$ consists of a walk of length $k$ from $i$ to some $\ell$, followed by an edge from $\ell$ to $j$. Summing over all possible intermediate vertices $\ell$ gives the matrix product. $\square$

Example. For the triangle $K_3$ with adjacency matrix:

$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad A^2 = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$$

$(A^2)_{11} = 2$: there are 2 walks of length 2 from vertex 1 to itself ($1 \to 2 \to 1$ and $1 \to 3 \to 1$). The diagonal entries of $A^2$ equal the vertex degrees.

Corollary. The number of triangles containing vertex $i$ is $(A^3)_{ii} / 2$ (each triangle is counted twice, once in each direction). The total number of triangles in $G$ is $\operatorname{tr}(A^3) / 6$.

Example: walk counts in $K_3$.

$$A^3 = \begin{pmatrix} 2 & 3 & 3 \\ 3 & 2 & 3 \\ 3 & 3 & 2 \end{pmatrix}$$

$(A^3)_{11} = 2$, so vertex 1 is in $2/2 = 1$ triangle. $\operatorname{tr}(A^3) = 6$, so there is $6/6 = 1$ triangle in total. Correct for $K_3$.
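
These computations are direct matrix powers; a NumPy sketch verifying the $K_3$ numbers:

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])          # K_3

A2 = A @ A
A3 = A2 @ A
print(np.diag(A2))                 # [2 2 2]: degrees on the diagonal of A^2
print(int(np.trace(A3)) // 6)      # 1 triangle
one = np.ones(3)
print(int(one @ A3 @ one))         # 24 walks of length 3 in total
```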

Extended corollary: subgraph counting.

| Subgraph | Formula using $A$ |
| --- | --- |
| Edges | $\operatorname{tr}(A^2)/2 = m$ |
| Triangles | $\operatorname{tr}(A^3)/6$ |
| Walks of length $k$ (total) | $\mathbf{1}^\top A^k \mathbf{1}$ |
| Paths of length $k$ | Harder - requires inclusion-exclusion |

Connection to GNN layers. After $k$ GNN layers, the aggregated representation of vertex $i$ contains information from all vertices reachable by walks of length $\leq k$ from $i$. The contribution of vertex $j$ to vertex $i$'s representation after exactly $k$ layers is proportional to $(A^k)_{ij}$ (in the linear, non-normalised case). The normalised GCN layer uses $\hat{A} = D^{-1/2} A D^{-1/2}$, so the $k$-layer GCN computes $\hat{A}^k X W$ - a matrix whose entries are walks of length $k$ normalised by vertex degrees.

For AI: This theorem connects graph theory directly to linear algebra. The spectral decomposition $A = Q \Lambda Q^\top$ gives $A^k = Q \Lambda^k Q^\top$, so the number of walks of length $k$ is controlled by the eigenvalues of $A$. The dominant eigenvalue determines the exponential growth rate of walk counts - this is the mathematical foundation of PageRank (stationary distribution of a random walk = dominant eigenvector of the transition matrix). Full spectral analysis is in 04 Spectral Graph Theory.


5. Connectivity

5.1 Connected Graphs and Components

Definition (Connected Graph). An undirected graph $G$ is connected if for every pair of vertices $u, v \in V$, there exists a path from $u$ to $v$.

Definition (Connected Component). A connected component of $G$ is a maximal connected subgraph - a connected subgraph that is not a proper subgraph of any larger connected subgraph.

Every graph can be uniquely decomposed into its connected components. If $G$ has $c$ connected components, then $c = 1$ means $G$ is connected.

CONNECTED COMPONENTS
========================================================================

  Component 1:    Component 2:    Component 3:
  A --- B         E --- F         H
  |   /           |
  | /             G
  C --- D

  3 connected components
  Vertices: {A,B,C,D} \cup {E,F,G} \cup {H}

========================================================================

Proposition. Every connected graph on $n$ vertices has at least $n - 1$ edges. The condition is necessary but not sufficient: trees achieve exactly $n - 1$ edges, while a graph with $n - 1$ edges distributed unevenly (e.g., a triangle plus an isolated vertex) can still be disconnected.

Proposition (Component count from Laplacian). The number of connected components equals the multiplicity of eigenvalue 0 in the graph Laplacian $L = D - A$. This is a deep result connecting connectivity (combinatorial) to spectrum (algebraic).

For AI: If a GNN's input graph has multiple connected components, information cannot flow between components through message passing. Each component is processed independently. In molecular graphs, a molecule with disconnected fragments (e.g., a salt like NaCl in solution) has multiple components - the GNN must handle each separately or use a virtual node to connect them.
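
Component decomposition is one BFS per unvisited vertex; a sketch reproducing the three-component example above:

```python
from collections import deque

def components(adj):
    """Connected components of an undirected graph (adjacency dict)."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        seen.add(s)
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    q.append(v)
        comps.append(comp)
    return comps

adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"],
       "E": ["F", "G"], "F": ["E"], "G": ["E"],
       "H": []}
print(components(adj))   # [{'A','B','C','D'}, {'E','F','G'}, {'H'}]
```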

5.2 Strongly and Weakly Connected Digraphs

For directed graphs, connectivity bifurcates into two notions:

Definition (Strongly Connected). A digraph is strongly connected if for every pair $u, v$, there exists a directed path from $u$ to $v$ AND a directed path from $v$ to $u$.

Definition (Weakly Connected). A digraph is weakly connected if the underlying undirected graph (obtained by ignoring edge directions) is connected.

Definition (Strongly Connected Component / SCC). A strongly connected component is a maximal strongly connected subgraph.

STRONG vs. WEAK CONNECTIVITY
========================================================================

  A ---> B ---> C ---> D        Weakly connected: YES (underlying
  ^             |                                  undirected graph
  +-------------+                                  is connected)

  SCCs: {A, B, C} and {D}       Strongly connected: NO (no path D -> A)

  Within {A, B, C}:             A -> B -> C -> A forms a directed cycle
                                (strongly connected)

========================================================================

Example: A citation network is weakly connected (following links in either direction, you can reach most papers) but not strongly connected (you cannot follow citations from a 2023 paper to a 2024 paper - citations only point backward in time).

Example (Condensation DAG). Given the SCC decomposition, we can form the condensation of a digraph: contract each SCC to a single vertex. The resulting graph is always a DAG (if it had a directed cycle, the SCCs involved would merge into one). This DAG reveals the macroscopic flow structure of the digraph.

Algorithms for SCCs. Tarjan's algorithm (1972) and Kosaraju's algorithm both find all SCCs in $O(n + m)$ time using DFS. These are covered in detail in 03 Graph Algorithms.
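
Full SCC algorithms are deferred to 03, but strong connectivity itself has a simple certificate: every vertex must be reachable from some fixed vertex $s$ both in $G$ and in the reversed graph. A sketch:

```python
from collections import deque

def reachable(adj, s):
    """Set of vertices reachable from s along directed edges."""
    seen, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

def is_strongly_connected(adj):
    s = next(iter(adj))
    rev = {v: [] for v in adj}          # reverse every arc
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    return len(reachable(adj, s)) == len(adj) == len(reachable(rev, s))

adj = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}   # example above
print(is_strongly_connected(adj))   # False: no directed path from D back to A
```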

For AI: The SCC decomposition of a computation graph reveals which operations form feedback loops. In standard feedforward networks, each SCC is a single vertex (the graph is a DAG). In architectures with weight sharing (like RNNs unrolled over time), understanding the SCC structure helps with gradient flow analysis. The condensation DAG of a knowledge graph reveals the hierarchical structure of entity relationships.

5.3 Bridges and Articulation Points

Definition (Bridge). An edge $e$ is a bridge (or cut edge) if removing $e$ increases the number of connected components.

Definition (Articulation Point). A vertex $v$ is an articulation point (or cut vertex) if removing $v$ and all its incident edges increases the number of connected components.

Example: In the path graph $P_4 = 1{-}2{-}3{-}4$, every edge is a bridge and vertices 2 and 3 are articulation points. Removing edge $\{2,3\}$ disconnects $\{1,2\}$ from $\{3,4\}$.

Example (non-bridge): In the cycle $C_4 = 1{-}2{-}3{-}4{-}1$, no edge is a bridge - removing any single edge leaves the graph connected (as a path). No vertex is an articulation point.

Proposition. An edge $e = \{u,v\}$ is a bridge if and only if $e$ is not contained in any cycle. Equivalently, $e$ is a bridge iff there is no alternative path from $u$ to $v$.

Definition (Biconnected Graph). A connected graph with no articulation points is biconnected. Equivalently, between any two vertices there are at least two vertex-disjoint paths.
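
A naive but transparent bridge finder follows directly from the proposition above: delete each edge and re-test connectivity ($O(m(n+m))$; the linear-time DFS version appears in 03). A sketch:

```python
from collections import deque

def is_connected_without(adj, skip=None):
    """BFS connectivity test, optionally ignoring the undirected edge `skip`."""
    s = next(iter(adj))
    seen, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if skip == frozenset({u, v}):
                continue
            if v not in seen:
                seen.add(v)
                q.append(v)
    return len(seen) == len(adj)

def bridges(adj):
    edges = {frozenset({u, v}) for u in adj for v in adj[u]}
    return [set(e) for e in edges if not is_connected_without(adj, skip=e)]

P4 = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
C4 = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
print(bridges(P4))   # [{1, 2}, {2, 3}, {3, 4}] in some order
print(bridges(C4))   # []: every edge of a cycle lies on a cycle
```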

For AI: Bridges and articulation points identify structural vulnerabilities in networks. In a knowledge graph, an articulation point is an entity whose removal disconnects parts of the knowledge base. In network analysis for social media, bridges between communities are the edges that enable information flow between groups.

5.4 Vertex and Edge Connectivity

Definition (Vertex Connectivity). The vertex connectivity $\kappa(G)$ is the minimum number of vertices whose removal disconnects $G$ (or reduces it to a single vertex). A graph is $k$-connected if $\kappa(G) \geq k$.

Definition (Edge Connectivity). The edge connectivity $\lambda(G)$ is the minimum number of edges whose removal disconnects $G$.

Theorem (Whitney, 1932). For any graph $G$:

$$\kappa(G) \leq \lambda(G) \leq \delta(G)$$

where $\delta(G) = \min_{v \in V} \deg(v)$ is the minimum degree.

Theorem (Menger, 1927). The maximum number of vertex-disjoint $u$-$v$ paths equals the minimum number of vertices whose removal disconnects $u$ from $v$.

This is a max-flow min-cut result: the maximum "flow" of disjoint paths equals the minimum "cut" of vertices. The edge version: the maximum number of edge-disjoint $u$-$v$ paths equals the minimum edge cut separating $u$ from $v$.

Example of Whitney's inequality. Take two disjoint copies of $K_4$, one on $\{A,B,C,D\}$ and one on $\{E,F,G,H\}$, and join them by the two extra edges $\{C,E\}$ and $\{D,E\}$:

  Edges:  all 6 pairs inside {A,B,C,D}    (first K_4)
          all 6 pairs inside {E,F,G,H}    (second K_4)
          plus C-E and D-E                (joining the two cliques)

  delta(G)  = 3   (A, B, F, G, H each have degree 3)
  lambda(G) = 2   (deleting edges C-E and D-E disconnects the cliques)
  kappa(G)  = 1   (deleting the single vertex E separates {F,G,H}
                   from {A,B,C,D})

So $\kappa = 1 \leq \lambda = 2 \leq \delta = 3$, and both inequalities are strict.

Achieving equality. For cubic graphs the first inequality is always tight: every connected 3-regular graph satisfies $\kappa(G) = \lambda(G)$. Complete graphs $K_n$ achieve $\kappa = \lambda = \delta = n-1$ (the maximum possible).

For AI: Menger's theorem is the combinatorial version of the max-flow min-cut theorem, which is fundamental to network flow algorithms used in image segmentation (graph cuts), assignment problems, and scheduling. The edge connectivity $\lambda(G)$ measures graph robustness: how many links must fail before the network breaks. This is directly applicable to reliability analysis of distributed systems and communication networks.

5.5 Connectivity in AI

Computation graph connectivity. A computation graph must be connected (in the weakly-connected sense for digraphs) for gradients to flow from the loss to all parameters. If a parameter's computation is disconnected from the loss, its gradient is zero - it cannot be trained. PyTorch detects this and raises warnings.

Knowledge graph connectivity. The "reachability" of entities in a knowledge graph determines what a RAG system can retrieve. If the query entity and the answer entity are in different connected components, no amount of graph traversal will find the answer.

GNN receptive field. A GNN with $k$ layers can propagate information across paths of length $\leq k$. If the graph diameter exceeds $k$, some vertex pairs cannot exchange information. This is the "under-reaching" problem, complementary to the "over-smoothing" problem (too many layers cause all representations to converge).

Molecular connectivity. In molecular graphs, the number of connected components indicates whether the input is a single molecule or a mixture/complex. Graph-level prediction tasks (e.g., molecular property prediction) typically assume a single connected component.


6. Special Graph Families

6.1 Complete Graphs

Definition (Complete Graph $K_n$). The complete graph on $n$ vertices is the graph where every pair of distinct vertices is connected by an edge:

$$E = \binom{V}{2}, \quad |E| = \binom{n}{2} = \frac{n(n-1)}{2}$$

Properties of $K_n$:

  • $(n-1)$-regular (every vertex has degree $n-1$)
  • Diameter $= 1$ (every pair is adjacent)
  • Number of edges: $\binom{n}{2}$
  • Number of triangles: $\binom{n}{3}$
  • Number of spanning trees: $n^{n-2}$ (Cayley's formula)
  • Chromatic number: $\chi(K_n) = n$
  • Hamiltonian for $n \geq 3$, Eulerian for odd $n$

COMPLETE GRAPHS
========================================================================

  K_1    K_2      K_3        K_4          K_5
   *     *--*      *         *---*      (all 10 pairs
                  / \        |\ /|       of 5 vertices
                 *---*       | X |       joined)
                             |/ \|
                             *---*
  m=0    m=1     m=3         m=6         m=10

========================================================================

For AI: The fully-connected attention mechanism in transformers computes attention weights over $K_n$ (where $n$ is the sequence length). Every token attends to every other token - this is exactly message passing on the complete graph. The $O(n^2)$ complexity of self-attention is the cost of processing $K_n$.

6.2 Bipartite Graphs

Definition (Bipartite Graph). A graph $G = (V, E)$ is bipartite if $V$ can be partitioned into two disjoint sets $U$ and $W$ such that every edge connects a vertex in $U$ to a vertex in $W$ (no edge joins two vertices of the same part).

Theorem (Odd-Cycle Characterisation). A graph is bipartite if and only if it contains no odd-length cycle.

Proof sketch. ($\Rightarrow$) If $G$ is bipartite with parts $U, W$, any cycle must alternate between $U$ and $W$, requiring even length. ($\Leftarrow$) If $G$ has no odd cycle, pick any vertex $v$ and colour by distance parity: $U = \{u : d(v,u) \text{ even}\}$, $W = \{u : d(v,u) \text{ odd}\}$. No edge can connect two vertices whose distances from $v$ have the same parity (that would close an odd cycle). $\square$

Definition (Complete Bipartite Graph $K_{m,n}$). The bipartite graph with parts of size $m$ and $n$ where every vertex in one part is connected to every vertex in the other. It has $m + n$ vertices and $mn$ edges.

BIPARTITE GRAPH
========================================================================

  General bipartite:            Complete bipartite K_{2,3}:

  U:  *      *      *           U:    *         *
      |     / \     |                (every vertex of U joined
      |    /   \    |                 to every vertex of W:
  W:  o   o     o   o                 2 x 3 = 6 edges)
                                W:    o    o    o

  Partition: U (filled), W (open)

========================================================================

Examples:

  • User-item interaction graphs (recommender systems): users and items are two parts
  • Bipartite matching: job assignment (workers <-> tasks)
  • Dependency parse trees - and trees in general - are bipartite: a tree has no cycles at all, hence no odd cycles; 2-colour the vertices by depth parity

Non-example: $K_3$ (the triangle) is NOT bipartite - it contains an odd cycle of length 3.

Non-example: $C_5$ (the pentagon) is NOT bipartite - it is an odd cycle of length 5.

Testing bipartiteness. BFS provides an efficient $O(n + m)$ bipartiteness test: start BFS from any vertex, alternating colours between levels. If any edge connects two same-colour vertices, the graph is not bipartite (and that edge, combined with the BFS tree paths, gives an odd cycle certificate). This is covered in 03 Graph Algorithms.
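
The BFS 2-colouring test fits in a dozen lines (a sketch; it handles disconnected inputs by restarting in each component):

```python
from collections import deque

def two_colour(adj):
    """Return a 2-colouring dict if the graph is bipartite, else None."""
    colour = {}
    for s in adj:
        if s in colour:
            continue
        colour[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]   # alternate BFS levels
                    q.append(v)
                elif colour[v] == colour[u]:
                    return None                 # same-colour edge: odd cycle
    return colour

C4 = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
C5 = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}
print(two_colour(C4))   # {1: 0, 2: 1, 3: 0, 4: 1}
print(two_colour(C5))   # None: the pentagon is an odd cycle
```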

Theorem (Konig, 1931). In a bipartite graph, the size of the maximum matching equals the size of the minimum vertex cover.

This is a foundational result in combinatorial optimisation. A matching is a set of edges with no shared vertices. A vertex cover is a set of vertices that includes at least one endpoint of every edge. Konig's theorem does NOT hold for non-bipartite graphs (e.g., $K_3$ has matching size 1 but minimum vertex cover size 2).

Bipartite adjacency matrix structure. The adjacency matrix of a bipartite graph with parts $U$ ($m$ vertices) and $W$ ($n$ vertices) has a block structure:

$$A = \begin{pmatrix} 0_m & B \\ B^\top & 0_n \end{pmatrix}$$

where $B \in \{0,1\}^{m \times n}$ is the biadjacency matrix. This block structure has important spectral consequences: the eigenvalues of $A$ are symmetric about 0 (if $\lambda$ is an eigenvalue, so is $-\lambda$).

For AI: Recommender systems model user-item interactions as bipartite graphs. Matrix factorisation for recommendation (the Netflix Prize approach) is equivalent to low-rank approximation of the biadjacency matrix $B$. Knowledge graphs with subject-predicate-object triples can be viewed as bipartite between entities and relations. The block structure of the bipartite adjacency matrix is exploited in bipartite GNN architectures for recommendation (PinSage, Ying et al., 2018).

6.3 Trees and Forests

Definition (Tree). A tree is a connected acyclic graph.

Definition (Forest). A forest is an acyclic graph (not necessarily connected). Each connected component of a forest is a tree.

Theorem (Tree Equivalences). For a graph GG on nn vertices, the following are equivalent:

  1. GG is a tree (connected and acyclic)
  2. GG is connected and has exactly n1n - 1 edges
  3. GG is acyclic and has exactly n1n - 1 edges
  4. There is exactly one path between every pair of vertices
  5. GG is connected, but removing any edge disconnects it (every edge is a bridge)
  6. GG is acyclic, but adding any edge creates exactly one cycle

Proof (1 \Leftrightarrow 2). (\Rightarrow) Every connected graph on n vertices has at least n - 1 edges. If a connected acyclic graph had n or more edges, it would contain a cycle (any edge beyond a spanning tree closes one), contradicting acyclicity; so a tree has exactly n - 1 edges. (\Leftarrow) Suppose G is connected with n - 1 edges but contains a cycle. Deleting one cycle edge leaves G connected with only n - 2 edges - fewer than any connected graph on n vertices can have. Contradiction, so G is acyclic. \square

Proof (1 \Rightarrow 4). Suppose GG is a tree and there exist two distinct paths P1P_1 and P2P_2 from uu to vv. Since P1P2P_1 \neq P_2, they diverge at some vertex ww and rejoin later at some vertex xx. The subpaths wxw \to x via P1P_1 and wxw \to x via P2P_2 form a cycle - contradicting acyclicity. So the path is unique. \square

Proof (4 \Rightarrow 5). If there is exactly one path between every pair, then removing any edge {u,v}\{u,v\} destroys the unique uu-vv path, disconnecting uu from vv. So every edge is a bridge. \square

TREE EQUIVALENCES - VISUAL SUMMARY
========================================================================

  Connected + Acyclic        Exactly one       Connected, all
  = TREE                     path between      edges are bridges
    |                        any pair              |
    |    n-1 edges           |                     |
    v    (connected)         v                     v
  +------------------------------------------------------+
  |              ALL SIX CONDITIONS EQUIVALENT            |
  |       (any one implies all the others)                |
  +------------------------------------------------------+
    ^                        ^                     ^
    |    n-1 edges           |                     |
    |    (acyclic)      Adding any edge       Maximal acyclic
    |                   creates exactly       = minimal connected
                        one cycle

========================================================================

Definition (Spanning Tree). A spanning tree of a connected graph GG is a tree that includes all vertices of GG. Every connected graph has at least one spanning tree.

Theorem (Cayley's Formula, 1889). The number of labeled spanning trees of the complete graph KnK_n is nn2n^{n-2}.

| n | K_n           | Spanning trees n^{n-2} |
|---|---------------|------------------------|
| 1 | single vertex | 1                      |
| 2 | single edge   | 1                      |
| 3 | triangle      | 3                      |
| 4 | K_4           | 16                     |
| 5 | K_5           | 125                    |

Theorem (Kirchhoff's Matrix Tree Theorem). The number of spanning trees of any graph GG equals any cofactor of the Laplacian matrix L=DAL = D - A. Specifically, it equals det(L)\det(L') where LL' is LL with any one row and column deleted.

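Kirchhoff's theorem makes spanning-tree counting a two-line linear-algebra computation. A minimal NumPy sketch, assuming a simple undirected graph given as a dense adjacency matrix:

```python
import numpy as np

def count_spanning_trees(A: np.ndarray) -> int:
    """Kirchhoff: any cofactor of L = D - A counts the spanning trees."""
    L = np.diag(A.sum(axis=1)) - A
    return round(np.linalg.det(L[1:, 1:]))   # delete row 0 and column 0

# K_4: Cayley's formula predicts 4^{4-2} = 16 spanning trees.
A_K4 = np.ones((4, 4)) - np.eye(4)
print(count_spanning_trees(A_K4))   # 16
```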

Definition (Rooted Tree). A tree with a designated root vertex. Edges are implicitly directed away from (or toward) the root. Rooted trees define a parent-child hierarchy.

Examples:

  • Parse trees in NLP: dependency trees, constituency trees
  • File system directory structures
  • Decision trees in machine learning
  • Binary search trees, B-trees, tries

For AI: Trees are fundamental structures in AI:

  • Decision trees (CART, Random Forest, XGBoost) - each internal node is a split, each leaf is a prediction
  • Abstract syntax trees (ASTs) - code represented as trees for program analysis and code generation
  • Spanning trees of graphs - GNN approaches sometimes use tree decomposition for efficient message passing
  • Hierarchical clustering dendrograms - trees representing cluster merging

6.4 DAGs (Directed Acyclic Graphs)

Definition (DAG). A directed acyclic graph is a directed graph with no directed cycles.

Theorem (Topological Ordering). A directed graph has a topological ordering if and only if it is a DAG.

A topological ordering is a linear ordering of vertices such that for every directed edge (u,v)(u, v), vertex uu comes before vv in the ordering. A DAG may have multiple valid topological orderings.

DAG AND TOPOLOGICAL ORDER
========================================================================

  DAG:                        Topological orderings:
  A ---> B ---> D               A, B, C, D, E
  |      |                      A, C, B, D, E
  v      v                      A, C, B, E, D
  C ---> E

  Every directed edge goes from earlier to later in the ordering

========================================================================

For AI: DAGs are arguably the most important graph type in AI:

  1. Computation graphs. Every feedforward neural network is a DAG. Forward evaluation proceeds in topological order; backpropagation proceeds in reverse topological order.

  2. Bayesian networks. A Bayesian network is a DAG where vertices represent random variables and edges represent conditional dependencies. The joint distribution factorises according to the DAG structure: p(x1,,xn)=ip(xiparents(xi))p(x_1, \ldots, x_n) = \prod_i p(x_i \mid \text{parents}(x_i)).

  3. Causal graphs. Structural causal models (Pearl, 2009) represent causal relationships as DAGs. The direction of edges encodes causal direction: ABA \to B means "AA causes BB."

  4. Task scheduling. Build systems (Make, Bazel), data pipelines (Airflow), and training pipelines all model dependencies as DAGs.

Topological sort algorithm (Kahn, 1962):

KAHN'S TOPOLOGICAL SORT
========================================================================

  Input: DAG G = (V, E)
  1. Compute in-degree of every vertex
  2. Enqueue all vertices with in-degree 0
  3. While queue is non-empty:
       a. Dequeue vertex u, add to output
       b. For each successor v of u:
            reduce in-degree(v) by 1
            if in-degree(v) == 0: enqueue v
  4. If output has fewer than n vertices: G has a cycle (not a DAG)

  Time: O(n + m)     Space: O(n)

========================================================================
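
In Python, Kahn's algorithm is a direct transcription of the pseudocode above; a sketch (the graph is assumed to be a dict mapping each vertex to its successor list):

```python
from collections import deque

def topological_sort(graph: dict) -> list:
    """Kahn's algorithm: repeatedly emit a vertex with in-degree 0."""
    in_degree = {v: 0 for v in graph}
    for u in graph:
        for v in graph[u]:
            in_degree[v] += 1
    queue = deque(v for v, d in in_degree.items() if d == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)
    if len(order) < len(graph):
        raise ValueError("graph contains a directed cycle (not a DAG)")
    return order

# The DAG from the diagram above: A->B, A->C, B->D, B->E, C->E.
dag = {"A": ["B", "C"], "B": ["D", "E"], "C": ["E"], "D": [], "E": []}
print(topological_sort(dag))   # ['A', 'B', 'C', 'D', 'E']
```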

Unique topological ordering. A DAG has a unique topological ordering if and only if it has a Hamiltonian path - a path visiting all vertices. Equivalently, at every step of Kahn's algorithm, there is exactly one vertex with in-degree 0.

6.5 Planar Graphs

Definition (Planar Graph). A graph is planar if it can be drawn in the plane without edge crossings.

Theorem (Euler's Formula for Planar Graphs). For a connected planar graph with nn vertices, mm edges, and ff faces (including the outer face):

nm+f=2n - m + f = 2

Proof (by induction on mm). Base case: a tree has n1n-1 edges and f=1f=1 face (the outer face). So n(n1)+1=2n - (n-1) + 1 = 2. \checkmark Inductive step: take a connected planar graph with a cycle. Removing an edge that is on a cycle reduces mm by 1 and merges two faces, reducing ff by 1. The quantity nm+fn - m + f is unchanged. \square

Corollary. For a simple planar graph with n3n \geq 3: m3n6m \leq 3n - 6. This follows from Euler's formula plus the fact that each face is bounded by at least 3 edges, and each edge borders at most 2 faces.

Corollary (bipartite planar). For a simple bipartite planar graph with n3n \geq 3: m2n4m \leq 2n - 4. (Each face is bounded by 4\geq 4 edges, since bipartite graphs have no odd cycles.)

Theorem (Kuratowski, 1930). A graph is planar if and only if it does not contain a subdivision of K5K_5 or K3,3K_{3,3} as a subgraph.

Examples:

  • K4K_4 is planar (can be drawn as a triangle with a vertex inside). Check: n=4n=4, m=6m=6, f=4f=4. 46+4=24 - 6 + 4 = 2. \checkmark
  • K5K_5 is NOT planar. Check: n=5n=5, m=10m=10, but 10>3(5)6=910 > 3(5)-6 = 9. Euler's bound violated.
  • K3,3K_{3,3} is NOT planar. Check: n=6n=6, m=9m=9, bipartite bound 9>2(6)4=89 > 2(6)-4 = 8. Violated.

K_4 PLANAR EMBEDDING
========================================================================

      1                     Euler check:
     /|\                      n = 4 vertices
    / | \                     m = 6 edges
   /  4  \                    f = 4 faces (3 inner triangles + outer)
  / /   \ \                   4 - 6 + 4 = 2  OK
 2 --------- 3
  (vertex 4 sits inside triangle 1-2-3; no edges cross)

========================================================================

For AI: Planar graphs arise in geographic information systems (road networks, administrative boundaries), VLSI circuit design (wires must not cross), and mesh-based physics simulations (finite element methods for weather prediction and fluid dynamics). Graph drawing algorithms that minimise edge crossings are used in knowledge graph visualisation. Planarity is also a useful sparsity guarantee: since m \leq 3n - 6, planar graphs are always sparse, keeping storage and message-passing costs linear in the number of vertices.

6.6 Hypergraphs

Definition (Hypergraph). A hypergraph is a pair H=(V,E)H = (V, \mathcal{E}) where E2V\mathcal{E} \subseteq 2^V is a set of hyperedges - each hyperedge is a subset of VV that may contain more than 2 vertices.

A standard graph is a hypergraph where every hyperedge has exactly 2 vertices.

Definition (kk-uniform hypergraph). A hypergraph where every hyperedge has exactly kk vertices. Standard graphs are 2-uniform hypergraphs.

Example: In a co-authorship network, a paper with 5 authors creates a 5-element hyperedge connecting all 5 authors simultaneously - this captures the "group" relationship that pairwise edges cannot.

Bipartite representation. Any hypergraph H=(V,E)H = (V, \mathcal{E}) can be represented as a bipartite graph B=(VE,{(v,e):ve})B = (V \cup \mathcal{E}, \{(v, e) : v \in e\}) - one part for vertices, one for hyperedges, with edges representing membership. This is the incidence graph of the hypergraph and enables standard graph algorithms to operate on hypergraphs.
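
The incidence-graph conversion is a few lines; a sketch with NetworkX, using frozensets of vertices as the hyperedge-side node names (the side tags are an illustrative convention to keep the two parts disjoint):

```python
import networkx as nx

def incidence_graph(hyperedges) -> nx.Graph:
    """Bipartite incidence graph: one part for vertices, one per hyperedge."""
    B = nx.Graph()
    for e in hyperedges:
        e = frozenset(e)
        for v in e:
            B.add_edge(("v", v), ("e", e))   # membership edge v -in- e
    return B

# A paper with 3 authors = one 3-element hyperedge.
B = incidence_graph([{"alice", "bob", "carol"}, {"bob", "dana"}])
print(B.number_of_nodes(), B.number_of_edges())   # 6 5
```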

For AI: Hypergraphs naturally model higher-order interactions:

  • Attention mechanisms compute interactions over sets of tokens - multi-head attention can be viewed as a learned hypergraph where each attention head defines hyperedges
  • Higher-order message passing in Hypergraph Neural Networks (Feng et al., 2019) propagates messages along hyperedges
  • Set functions like DeepSets (Zaheer et al., 2017) process unordered sets - each set is a hyperedge

Note: Hypergraph theory is a deep and active research area. This section provides only an introduction. The key takeaway is that standard graphs capture pairwise relationships; hypergraphs capture group relationships.


7. Subgraphs and Graph Operations

7.1 Subgraphs and Induced Subgraphs

Definition (Subgraph). A graph H=(V,E)H = (V', E') is a subgraph of G=(V,E)G = (V, E) if VVV' \subseteq V and EEE' \subseteq E. We write HGH \subseteq G.

Definition (Induced Subgraph). Given a vertex subset SVS \subseteq V, the induced subgraph G[S]G[S] is the subgraph with vertex set SS and all edges of GG whose both endpoints are in SS:

G[S]=(S,{eE:eS})G[S] = (S, \{e \in E : e \subseteq S\})

The key difference: a subgraph can choose any subset of edges, but an induced subgraph must include ALL edges between vertices in SS.

Definition (Spanning Subgraph). A subgraph HH is spanning if V=VV' = V - it has the same vertex set as GG but possibly fewer edges.

SUBGRAPH vs. INDUCED SUBGRAPH
========================================================================

  Original G:           Subgraph H:         Induced G[{A,B,C}]:
  A --- B               A --- B             A --- B
  | \   |               |                    \    |
  |  \  |               |                     \   |
  D --- C               D     C                \  C

  All edges present     Any subset of        ALL edges with both
                        G's edges allowed    endpoints in {A,B,C}
                                             kept: A-B, A-C, B-C

========================================================================

For AI: Mini-batch training for GNNs works by sampling induced subgraphs. GraphSAGE (Hamilton et al., 2017) samples kk-hop neighbourhoods as induced subgraphs. The choice between subgraph and induced subgraph affects whether all edges within the sample are preserved - induced subgraphs preserve all local structure.
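
The distinction is a one-liner in NetworkX; a small sketch using the example graph drawn above:

```python
import networkx as nx

# The graph G from the diagram: square A-B-C-D plus the diagonal A-C.
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")])

# Induced subgraph: G.subgraph keeps EVERY edge inside the vertex set.
induced = G.subgraph(["A", "B", "C"])
print(sorted(induced.edges()))    # [('A', 'B'), ('A', 'C'), ('B', 'C')]

# A mere subgraph may drop edges freely - e.g. a spanning subgraph:
H = nx.Graph(G)                   # same vertex set
H.remove_edge("A", "C")           # but one edge fewer
print(H.number_of_nodes(), H.number_of_edges())   # 4 4
```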

7.2 Graph Complement

Definition (Graph Complement). The complement Gˉ\bar{G} of a simple graph G=(V,E)G = (V, E) is the graph on the same vertex set with complementary edge set:

Gˉ=(V,(V2)E)\bar{G} = (V, \binom{V}{2} \setminus E)

Two vertices are adjacent in Gˉ\bar{G} if and only if they are NOT adjacent in GG.

Properties:

  • E(Gˉ)=(n2)E(G)|E(\bar{G})| = \binom{n}{2} - |E(G)|
  • GG and Gˉ\bar{G} together form KnK_n: E(G)E(Gˉ)=E(Kn)E(G) \cup E(\bar{G}) = E(K_n)
  • Gˉ=G\overline{\bar{G}} = G (complement of complement is the original)
  • Kn=Kˉn\overline{K_n} = \bar{K}_n (complement of complete graph is empty graph)
  • A graph is self-complementary if GGˉG \cong \bar{G} (e.g., P4P_4, C5C_5)

For AI: Complement graphs appear in constraint satisfaction. If GG represents conflicts (vertices that cannot be assigned the same resource), then Gˉ\bar{G} represents compatibility. Finding a maximum clique in GG is equivalent to finding a maximum independent set in Gˉ\bar{G}.

7.3 Graph Union, Intersection, and Product

Definition (Graph Union). For graphs G1=(V1,E1)G_1 = (V_1, E_1) and G2=(V2,E2)G_2 = (V_2, E_2):

G1G2=(V1V2,E1E2)G_1 \cup G_2 = (V_1 \cup V_2, E_1 \cup E_2)

Definition (Graph Intersection). G1G2=(V1V2,E1E2)G_1 \cap G_2 = (V_1 \cap V_2, E_1 \cap E_2).

Definition (Cartesian Product G1G2G_1 \square G_2). Vertex set V1×V2V_1 \times V_2. Two vertices (u1,u2)(u_1, u_2) and (v1,v2)(v_1, v_2) are adjacent if:

  • u1=v1u_1 = v_1 and {u2,v2}E2\{u_2, v_2\} \in E_2, or
  • u2=v2u_2 = v_2 and {u1,v1}E1\{u_1, v_1\} \in E_1

Example: P2P3P_2 \square P_3 produces a 2×32 \times 3 grid graph. The hypercube Qn=K2K2K2Q_n = K_2 \square K_2 \square \cdots \square K_2 (nn times) has 2n2^n vertices and nn-bit binary strings as vertex labels.

CARTESIAN PRODUCT P_2 □ P_3  (2x3 grid)
========================================================================

  P_2: A-B      P_3: 1-2-3

  Product vertices: (A,1), (A,2), (A,3), (B,1), (B,2), (B,3)

  (A,1)-(A,2)-(A,3)    <- rows: one copy of P_3 per vertex of P_2
    |     |     |
  (B,1)-(B,2)-(B,3)    <- columns: one copy of P_2 per vertex of P_3

  Result: 6 vertices, 7 edges (a 2x3 grid)

========================================================================

Properties of graph products:

| Product       | Symbol         | Adjacency condition                            | Vertices | Edges                     |
|---------------|----------------|------------------------------------------------|----------|---------------------------|
| Cartesian     | G \square H    | same in one coordinate, adjacent in the other  | nm       | m_H n + m_G m             |
| Tensor        | G \times H     | adjacent in both coordinates                   | nm       | 2 m_G m_H                 |
| Strong        | G \boxtimes H  | Cartesian OR tensor condition                  | nm       | m_H n + m_G m + 2 m_G m_H |
| Lexicographic | G[H]           | adjacent in G, or equal in G and adjacent in H | nm       | m_G m^2 + n m_H           |

where n=VGn = \lvert V_G \rvert, m=VHm = \lvert V_H \rvert, mG=EGm_G = \lvert E_G \rvert, mH=EHm_H = \lvert E_H \rvert.

Definition (Tensor (Categorical) Product G1×G2G_1 \times G_2). Vertex set V1×V2V_1 \times V_2. Two vertices (u1,u2)(u_1, u_2) and (v1,v2)(v_1, v_2) are adjacent if {u1,v1}E1\{u_1, v_1\} \in E_1 AND {u2,v2}E2\{u_2, v_2\} \in E_2.

For AI: Graph products appear in constructing higher-order graph structures. The tensor product of a graph with itself encodes 2-walk patterns: (AA)(i,j),(k,l)=AikAjl(A \otimes A)_{(i,j),(k,l)} = A_{ik} \cdot A_{jl}, counting simultaneous walks. In the theory of GNN expressiveness, the kk-WL test colours kk-tuples of vertices - equivalent to operating on the kk-fold tensor product G×kG^{\times k}. Grid graphs (Cartesian products of paths) arise in image processing (pixels as vertices) and in mesh-based physics simulations.
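
The edge-count formulas in the table are easy to sanity-check with NetworkX's built-in product constructors:

```python
import networkx as nx

P2, P3 = nx.path_graph(2), nx.path_graph(3)     # m_G = 1, m_H = 2

grid = nx.cartesian_product(P2, P3)             # the 2x3 grid from the diagram
print(grid.number_of_nodes(), grid.number_of_edges())   # 6 7  (= 2*2 + 1*3)

tensor = nx.tensor_product(P2, P3)              # adjacent in BOTH coordinates
print(tensor.number_of_edges())                 # 4  (= 2 * m_G * m_H)

strong = nx.strong_product(P2, P3)              # Cartesian OR tensor condition
print(strong.number_of_edges())                 # 11 (= 7 + 4)
```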

7.4 Graph Minor and Subdivision

Definition (Subdivision). A subdivision of a graph GG is obtained by inserting new vertices into edges - replacing edge {u,v}\{u,v\} with a path uw1w2vu{-}w_1{-}w_2{-}\cdots{-}v.

Definition (Graph Minor). A graph HH is a minor of GG if HH can be obtained from GG by a sequence of vertex deletions, edge deletions, and edge contractions. Edge contraction merges two adjacent vertices into one.

Theorem (Robertson-Seymour Graph Minor Theorem, 2004). In any infinite sequence of finite graphs, one is a minor of another. Equivalently, every minor-closed graph property can be characterised by a finite set of forbidden minors.

This is one of the deepest results in graph theory, proved over 20 papers spanning 1983-2004. Kuratowski's planarity theorem is a special case: planarity is characterised by two forbidden minors (K5K_5 and K3,3K_{3,3}).

For AI: Graph minor theory is primarily of theoretical interest in AI, but it provides the foundation for treewidth - a measure of how "tree-like" a graph is. Many NP-hard graph problems become polynomial-time on graphs of bounded treewidth, which is exploited in exact inference for probabilistic graphical models.

7.5 Line Graphs

Definition (Line Graph L(G)L(G)). The line graph L(G)L(G) of a graph GG has one vertex for each edge of GG. Two vertices in L(G)L(G) are adjacent if the corresponding edges in GG share an endpoint.

V(L(G))=E(G),{e1,e2}E(L(G))    e1e2V(L(G)) = E(G), \quad \{e_1, e_2\} \in E(L(G)) \iff e_1 \cap e_2 \neq \emptyset

Example: If GG is the path 12341{-}2{-}3{-}4 with edges a={1,2}a = \{1,2\}, b={2,3}b = \{2,3\}, c={3,4}c = \{3,4\}, then L(G)L(G) has vertices {a,b,c}\{a,b,c\} with edges {a,b}\{a,b\} and {b,c}\{b,c\} - another path!

Properties:

  • V(L(G))=E(G)|V(L(G))| = |E(G)|
  • degL(G)(e)=degG(u)+degG(v)2\deg_{L(G)}(e) = \deg_G(u) + \deg_G(v) - 2 where e={u,v}e = \{u,v\}
  • L(Kn)L(K_n) is the Kneser graph complement / triangular graph with (n2)\binom{n}{2} vertices

For AI: Line graphs convert edge-level problems into vertex-level problems. Since most GNN frameworks operate on vertices, transforming a graph to its line graph enables edge-centric tasks (link prediction, edge classification) to be solved with standard vertex-classification GNNs.
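
NetworkX exposes this transformation directly; a quick sketch reproducing the path example above:

```python
import networkx as nx

# L(G) of the path 1-2-3-4: one vertex per edge, adjacency = shared endpoint.
P = nx.path_graph([1, 2, 3, 4])
L = nx.line_graph(P)

print(L.number_of_nodes(), L.number_of_edges())   # 3 2 - another path
print(sorted(L.nodes()))   # [(1, 2), (2, 3), (3, 4)] - vertices are edges of P
```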


8. Graph Isomorphism and Invariants

8.1 Graph Isomorphism

Definition (Graph Isomorphism). Two graphs G1=(V1,E1)G_1 = (V_1, E_1) and G2=(V2,E2)G_2 = (V_2, E_2) are isomorphic, written G1G2G_1 \cong G_2, if there exists a bijection ϕ:V1V2\phi: V_1 \to V_2 such that:

{u,v}E1    {ϕ(u),ϕ(v)}E2\{u, v\} \in E_1 \iff \{\phi(u), \phi(v)\} \in E_2

The bijection ϕ\phi is called an isomorphism. Intuitively, two graphs are isomorphic if they have the same structure - the same connectivity pattern - just with different vertex labels.

GRAPH ISOMORPHISM
========================================================================

  G_1:             G_2:              Isomorphism \phi:
  A --- B         1 --- 3          A -> 1
  |     |         |     |          B -> 3
  |     |         |     |          C -> 4
  D --- C         2 --- 4          D -> 2

  Same structure (cycle C_4), different labels

========================================================================

Verifying an isomorphism - worked example.

G_1:           G_2:           Claimed isomorphism \phi:
A --- B        1 --- 3        A -> 1,  B -> 3
|     |        |     |        C -> 4,  D -> 2
|     |        |     |
D --- C        2 --- 4

Verify: for each edge in G1G_1, check that its image is an edge in G2G_2:

| Edge in G_1 | Image under \phi             | Edge in G_2? |
|-------------|------------------------------|--------------|
| \{A,B\}     | \{\phi(A),\phi(B)\} = \{1,3\}| yes          |
| \{B,C\}     | \{3,4\}                      | yes          |
| \{C,D\}     | \{4,2\}                      | yes          |
| \{D,A\}     | \{2,1\}                      | yes          |

All 4 edges map correctly and ϕ\phi is a bijection. Therefore G1G2G_1 \cong G_2. Both are C4C_4.

Proving non-isomorphism. To show G1≇G2G_1 \not\cong G_2, it suffices to find any graph invariant that differs. The quickest method: check the degree sequence. If degree sequences differ, the graphs cannot be isomorphic. If they agree, check further invariants (girth, number of triangles, spectrum).

Computational complexity. The graph isomorphism problem is one of the few natural problems in NP that is neither known to be in P nor known to be NP-complete. Babai (2016) showed it can be solved in quasi-polynomial time exp(O(logcn))\exp(O(\log^c n)), but no polynomial-time algorithm is known.

For AI: Graph isomorphism is central to GNN theory. A GNN that is invariant to vertex permutation produces the same output for isomorphic graphs. This is a design requirement: the molecular property of caffeine should not depend on how we number its atoms. Formally, if ff is a GNN and Π\Pi is a permutation matrix, then f(ΠAΠ,ΠX)=f(A,X)f(\Pi A \Pi^\top, \Pi X) = f(A, X). Equivariance is a related concept: f(ΠAΠ,ΠX)=Πf(A,X)f(\Pi A \Pi^\top, \Pi X) = \Pi f(A, X). Invariance is needed for graph-level tasks; equivariance is needed for node-level tasks (the output at each node should transform consistently with input permutations).

8.2 Graph Invariants

A graph invariant is a property or quantity that is the same for all isomorphic graphs. Graph invariants provide necessary (but generally not sufficient) conditions for isomorphism.

| Invariant            | Description                      | Distinguishes                        |
|----------------------|----------------------------------|--------------------------------------|
| Number of vertices n | \lvert V \rvert                  | graphs of different order            |
| Number of edges m    | \lvert E \rvert                  | graphs of different size             |
| Degree sequence      | sorted list of degrees           | many non-isomorphic pairs            |
| Diameter             | max distance                     | some pairs with same degree sequence |
| Girth                | shortest cycle length            | some pairs with same degree/diameter |
| Spectrum             | eigenvalues of A                 | most pairs (but not all!)            |
| Chromatic number     | min colours for proper colouring | some pairs                           |
| Number of triangles  | \operatorname{tr}(A^3)/6         | some pairs                           |

Non-isomorphic graphs with the same spectrum (cospectral graphs): There exist non-isomorphic graphs with identical adjacency spectra. The smallest pair has 5 vertices: the star K_{1,4} and C_4 \cup K_1 (a 4-cycle plus an isolated vertex) both have spectrum \{2, 0, 0, 0, -2\}; the smallest connected pair has 6 vertices. Either way, the spectrum alone does not determine a graph up to isomorphism.

For AI: When a GNN computes a graph-level embedding, it is computing a learned graph invariant. The expressiveness question is: which non-isomorphic graphs can the GNN distinguish? This leads directly to the Weisfeiler-Leman connection.

8.3 The Weisfeiler-Leman Test

The Weisfeiler-Leman (WL) test is a classical algorithm for testing graph isomorphism. It is not always correct (it can fail to distinguish certain non-isomorphic graphs), but it is the theoretical benchmark for GNN expressiveness.

Algorithm (1-WL / Color Refinement):

  1. Initialise: Assign every vertex the same colour c0(v)=1c_0(v) = 1.
  2. Iterate: For each vertex, compute a new colour based on its current colour and the multiset of its neighbours' colours:
ct+1(v)=HASH(ct(v),{ ⁣{ct(u):uN(v)} ⁣})c_{t+1}(v) = \operatorname{HASH}\left(c_t(v), \{\!\{c_t(u) : u \in \mathcal{N}(v)\}\!\}\right)
  3. Terminate: When the colour partition stabilises (no further refinement occurs).
  4. Decide: If the multisets of colours differ between G1G_1 and G2G_2, they are NOT isomorphic. If they are the same, the test is inconclusive.

Here { ⁣{} ⁣}\{\!\{\cdot\}\!\} denotes a multiset (a set with multiplicities).

1-WL COLOR REFINEMENT EXAMPLE
========================================================================

  Graph G:                  Iteration 0:    Iteration 1:
  A --- B                   All vertices    A: (1, {1,1}) -> colour 2
  |     |                   colour 1        B: (1, {1,1}) -> colour 2
  |     |                                   C: (1, {1,1}) -> colour 2
  D --- C                                   D: (1, {1,1}) -> colour 2

  All vertices get the same colour at every iteration -> 1-WL
  cannot distinguish the vertices of any regular graph (here C_4)

========================================================================

Theorem (Xu et al., 2019; Morris et al., 2019). Message-passing GNNs are AT MOST as powerful as the 1-WL test. That is, if 1-WL cannot distinguish two graphs, no message-passing GNN can distinguish them either.

Conversely, there exist GNN architectures (GIN - Graph Isomorphism Network, Xu et al., 2019) that are exactly as powerful as 1-WL.

The kk-WL hierarchy. The kk-dimensional WL test (kk-WL) colours kk-tuples of vertices rather than individual vertices. Higher kk gives strictly more distinguishing power:

1-WL<2-WL<3-WL<1\text{-WL} < 2\text{-WL} < 3\text{-WL} < \cdots

Higher-order GNNs (Morris et al., 2019) achieve kk-WL power but at O(nk)O(n^k) computational cost.

For AI: The WL-GNN equivalence is one of the most important theoretical results in graph learning. It tells us:

  • Message-passing GNNs have a fundamental expressiveness ceiling
  • Some graph structures (e.g., distinguishing regular graphs of the same degree) are provably beyond standard GNNs
  • To exceed 1-WL power, we need architectural innovations: higher-order message passing, random features, subgraph counting, or graph transformers

What 1-WL cannot distinguish - a concrete example.

Graph G_1 (two triangles sharing no vertex):    Graph G_2 (one 6-cycle):

  1 - 2       4 - 5                             1 - 2
  |   |       |   |                             |   |
  3 --+       6 --+                             6   3
                                                |   |
                                                5 - 4

Both graphs: 6 vertices, 6 edges, all degrees = 2 (2-regular). The 1-WL test assigns the same colour to all vertices in both graphs at every iteration - it cannot distinguish them. But they are NOT isomorphic: G1G_1 has two components (C3C3C_3 \cup C_3) while G2G_2 is connected (C6C_6). A standard message-passing GNN also cannot distinguish these two graphs!
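
A compact colour-refinement sketch makes the failure concrete (graphs here are plain adjacency dicts; re-indexing signatures to small integers plays the role of HASH):

```python
from collections import Counter

def wl_refine(graph: dict, rounds: int = 3) -> Counter:
    """1-WL colour refinement; returns the final colour histogram."""
    colors = {v: 0 for v in graph}                       # uniform start
    for _ in range(rounds):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in graph[v])))
            for v in graph
        }
        palette = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in graph}
    return Counter(colors.values())

def cycle(n: int, offset: int = 0) -> dict:
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n]
            for i in range(n)}

two_triangles = {**cycle(3), **cycle(3, offset=3)}       # C_3 + C_3
hexagon = cycle(6)                                       # C_6

# Identical histograms: 1-WL cannot separate these non-isomorphic graphs.
print(wl_refine(two_triangles) == wl_refine(hexagon))    # True
```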

The Graph Isomorphism Network (GIN, Xu et al., 2019). GIN achieves 1-WL expressiveness by using the aggregation:

hv(k)=MLP(k) ⁣((1+ε(k))hv(k1)+uN(v)hu(k1))\mathbf{h}_v^{(k)} = \operatorname{MLP}^{(k)}\!\left((1 + \varepsilon^{(k)}) \cdot \mathbf{h}_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} \mathbf{h}_u^{(k-1)}\right)

The sum aggregator (not mean or max) is critical - it preserves multiset information about neighbour counts, which is equivalent to the multiset hashing in 1-WL.

8.4 Graph Automorphisms

Definition (Automorphism). An automorphism of a graph GG is an isomorphism from GG to itself: a bijection ϕ:VV\phi: V \to V that preserves adjacency.

Definition (Automorphism Group). The set of all automorphisms of GG forms a group under composition, denoted Aut(G)\operatorname{Aut}(G).

Examples:

  • Aut(Kn)=Sn\operatorname{Aut}(K_n) = S_n (the symmetric group - any permutation preserves complete adjacency)
  • Aut(Cn)=Dn\operatorname{Aut}(C_n) = D_n (the dihedral group - rotations and reflections of the cycle)
  • Aut(Pn)=Z2\operatorname{Aut}(P_n) = \mathbb{Z}_2 for n2n \geq 2 (only the identity and reversal)

Definition (Orbit). The orbit of a vertex vv under Aut(G)\operatorname{Aut}(G) is {ϕ(v):ϕAut(G)}\{\phi(v) : \phi \in \operatorname{Aut}(G)\} - the set of vertices that vv can be mapped to. Vertices in the same orbit are "structurally equivalent."

For AI: Automorphisms represent graph symmetries. A GNN that is permutation-equivariant respects all graph automorphisms: if two vertices are in the same orbit (structurally identical), the GNN assigns them the same representation (in the absence of distinguishing node features). This is why 1-WL (and standard GNNs) assign the same colour to all vertices in a regular graph - they are all in the same automorphism orbit.

8.5 Graph Motifs

Definition (Subgraph Motif). A graph motif is a small, recurring subgraph pattern that appears in a network significantly more often than in a random graph with the same degree sequence. Motifs are the structural "building blocks" of complex networks.

The canonical small motifs:

Triangle (K_3)     3-Path (P_3)      4-Cycle (C_4)     Star (S_3)
  1 -- 2             1 - 2 - 3         1 - 2             1
  |   /                                |   |            /|\
  |  /                                 4 - 3           2 3 4
  3

| Motif                 | Vertices | Edges | Significance                                   |
|-----------------------|----------|-------|------------------------------------------------|
| Triangle K_3          | 3        | 3     | social triadic closure; clustering coefficient |
| 3-path P_3            | 3        | 2     | shortest paths; betweenness centrality         |
| 4-cycle C_4           | 4        | 4     | bipartite structure; absent in trees           |
| Star S_k              | k+1      | k     | hub nodes; scale-free networks                 |
| Wedge (open triangle) | 3        | 2     | potential triangle; closure probability        |
| 4-clique K_4          | 4        | 6     | dense community core                           |

Triangle counting. The number of triangles in GG is:

Δ(G)=16tr(A3)\Delta(G) = \frac{1}{6}\operatorname{tr}(A^3)

since (A^3)_{ii} counts closed walks of length 3 starting and ending at vertex i, and each triangle is counted at each of its three vertices in each of two directions (clockwise and counterclockwise) - six times in total, hence the factor 1/6.
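
A two-line check of the trace formula (a sketch, assuming a dense symmetric adjacency matrix with zero diagonal):

```python
import numpy as np

def triangle_count(A: np.ndarray) -> int:
    """Triangles via tr(A^3)/6 for a simple undirected graph."""
    return round(np.trace(A @ A @ A) / 6)

A_K4 = np.ones((4, 4)) - np.eye(4)   # K_4 contains binom(4,3) = 4 triangles
print(triangle_count(A_K4))          # 4
```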

Local clustering coefficient. The clustering coefficient of vertex vv measures the fraction of vv's neighbour pairs that are themselves connected:

C(v)={euw:u,wN(v), euwE}(deg(v)2)C(v) = \frac{|\{e_{uw} : u,w \in N(v),\ e_{uw} \in E\}|}{\binom{\deg(v)}{2}}

When deg(v)1\deg(v) \leq 1, define C(v)=0C(v) = 0. The global clustering coefficient (transitivity) is:

C_{\text{global}} = \frac{3 \times \text{number of triangles}}{\text{number of connected triples}} = \frac{\operatorname{tr}(A^3)}{\sum_i \deg(v_i)\,(\deg(v_i) - 1)}

(A more convenient form: Cglobal=tr(A3)/1(A2diag(A2))1C_{\text{global}} = \operatorname{tr}(A^3) / \mathbf{1}^\top (A^2 - \operatorname{diag}(A^2))\mathbf{1}.)

Motif frequency vector as a graph fingerprint. The counts of each motif type in a graph form a motif frequency vector m(G)Zr\mathbf{m}(G) \in \mathbb{Z}^r. These vectors:

  • Are invariant to vertex relabelling (graph invariants).
  • Capture structural information beyond degree sequences.
  • Can distinguish many non-isomorphic graphs that stumped simpler invariants.

For AI: Motif counting is directly connected to GNN expressiveness. Standard message-passing GNNs (1-WL power) can count stars around each node but cannot count triangles or 4-cycles in the local neighbourhood without additional architecture. This was a key finding in Bouritsas et al. (2022) "Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting", which proposed encoding motif counts as additional node features to boost GNN expressiveness beyond 1-WL. Similarly, the GraphSAGE and GAT architectures implicitly learn to weight neighbour aggregations, but are blind to whether those neighbours are mutually connected (triangle detection requires at least a 2-hop view).

Practical implication. When designing a GNN for tasks where triangles matter (social network communities, protein binding sites, knowledge graph triples), consider (a) adding pre-computed triangle counts as node features, (b) using a higher-order GNN (k-WL, k2k \geq 2), or (c) using a subgraph GNN that routes information through induced subgraphs.


9. Graph Coloring

9.1 Vertex Coloring and Chromatic Number

Definition (Proper Coloring). A proper vertex coloring of a graph GG is a function c:V{1,2,,k}c: V \to \{1, 2, \ldots, k\} such that c(u)c(v)c(u) \neq c(v) whenever {u,v}E\{u, v\} \in E. Adjacent vertices must receive different colours.

Definition (Chromatic Number). The chromatic number χ(G)\chi(G) is the minimum number of colours needed for a proper colouring of GG.

Computing χ(G)\chi(G): Determining the chromatic number is NP-hard in general. For specific graph families:

| Graph             | \chi(G) | Reason                                           |
|-------------------|---------|--------------------------------------------------|
| K_n               | n       | every pair adjacent - all need different colours |
| \bar{K}_n (empty) | 1       | no adjacencies - one colour suffices             |
| C_n (n even)      | 2       | bipartite - alternate two colours                |
| C_n (n odd)       | 3       | not bipartite - one extra colour needed          |
| Tree              | 2       | bipartite (no odd cycles)                        |
| Petersen graph    | 3       | 3-regular, girth 5                               |
| Planar graph      | \leq 4  | four colour theorem                              |

Greedy Coloring Algorithm. Process vertices in some order. Assign each vertex the smallest colour not used by any of its already-coloured neighbours. This uses at most Δ(G)+1\Delta(G) + 1 colours, where Δ(G)=maxvdeg(v)\Delta(G) = \max_v \deg(v).

GREEDY COLORING EXAMPLE (path 1-2-3-4-5)
========================================================================

  Process in order 1,2,3,4,5:
  Vertex 1: neighbours = {}        -> colour 1    (first available)
  Vertex 2: neighbours = {1:red}   -> colour 2    (red used)
  Vertex 3: neighbours = {2:blue}  -> colour 1    (blue used, red free)
  Vertex 4: neighbours = {3:red}   -> colour 2    (red used)
  Vertex 5: neighbours = {4:blue}  -> colour 1    (blue used, red free)

  Result: 2 colours (optimal - path is bipartite)

========================================================================
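
The greedy procedure fits in a few lines of Python (a sketch over adjacency-dict graphs; the vertex order is a free parameter, and bad orders can waste colours):

```python
def greedy_coloring(graph: dict, order=None) -> dict:
    """Assign each vertex the smallest colour unused by coloured neighbours."""
    order = order or list(graph)
    colors = {}
    for v in order:
        used = {colors[u] for u in graph[v] if u in colors}
        colors[v] = next(c for c in range(len(graph)) if c not in used)
    return colors

# The path 1-2-3-4-5 from the box above: two colours suffice.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(greedy_coloring(path))   # {1: 0, 2: 1, 3: 0, 4: 1, 5: 0}
```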

Lower bounds on χ(G)\chi(G):

  • Clique number: χ(G)ω(G)\chi(G) \geq \omega(G) where ω(G)\omega(G) is the size of the largest clique. (A clique requires all-different colours.)
  • Fractional chromatic number: χ(G)χf(G)\chi(G) \geq \chi_f(G) where χf\chi_f is a relaxation that can be computed in polynomial time.
  • Spectrum: χ(G)1λmax/λmin\chi(G) \geq 1 - \lambda_{\max}/\lambda_{\min} where λmax,λmin\lambda_{\max}, \lambda_{\min} are the largest and smallest eigenvalues of AA.

Theorem (Brooks, 1941). For any connected graph GG that is not a complete graph or an odd cycle: χ(G)Δ(G)\chi(G) \leq \Delta(G).

For AI: Graph coloring is one of the foundational constraint satisfaction problems (CSPs). Many scheduling, timetabling, and resource allocation problems reduce to graph coloring: if two tasks conflict (share a resource), they must receive different "colours" (time slots). Solving CSPs with neural approaches (e.g., reinforcement learning for combinatorial optimization) often operates on the conflict graph. The clique-cover lower bound χ(G)ω(G)\chi(G) \geq \omega(G) connects coloring to clique detection, which is used in feature selection (finding maximally correlated feature groups).

9.2 Chromatic Polynomial

Definition (Chromatic Polynomial). The chromatic polynomial P(G,k)P(G, k) counts the number of proper colourings of GG using at most kk colours.

Theorem (Deletion-Contraction). For any edge e={u,v}e = \{u,v\}:

P(G,k)=P(Ge,k)P(G/e,k)P(G, k) = P(G - e, k) - P(G / e, k)

where GeG - e is GG with edge ee deleted and G/eG / e is GG with edge ee contracted.

Examples:

  • P(Kn,k)=k(k1)(k2)(kn+1)=k(n)P(K_n, k) = k(k-1)(k-2)\cdots(k-n+1) = k^{(n)} (falling factorial)
  • P(Kˉn,k)=knP(\bar{K}_n, k) = k^n (no constraints - each vertex independently chooses from kk colours)
  • P(T,k)=k(k1)n1P(T, k) = k(k-1)^{n-1} for any tree TT on nn vertices

Note: χ(G)=min{k:P(G,k)>0}\chi(G) = \min\{k : P(G, k) > 0\} - the chromatic number is the smallest kk for which the chromatic polynomial is positive.

Worked example - computing P(C3,k)P(C_3, k) by deletion-contraction.

C3C_3 is the triangle graph with vertices {1,2,3}\{1,2,3\} and edges e12,e23,e13e_{12}, e_{23}, e_{13}.

Step 1. Delete edge e12e_{12}: the resulting graph C3e12C_3 - e_{12} is the path P3P_3 (a tree on 3 vertices).

P(P3,k)=k(k1)2P(P_3, k) = k(k-1)^2

Step 2. Contract edge e12e_{12}: vertices 1 and 2 merge into a single vertex vv^*, giving K2K_2 (a single edge {v,3}\{v^*, 3\}).

P(K2,k)=k(k1)P(K_2, k) = k(k-1)

Step 3. Apply deletion-contraction:

P(C3,k)=P(P3,k)P(K2,k)=k(k1)2k(k1)=k(k1)(k2)P(C_3, k) = P(P_3, k) - P(K_2, k) = k(k-1)^2 - k(k-1) = k(k-1)(k-2)

Verification. For k=3k = 3: 321=63 \cdot 2 \cdot 1 = 6 proper 3-colourings of a triangle. Each of the 3!=63! = 6 permutations of colours gives a valid colouring, confirming the answer.

Verification. For k=2k = 2: 210=02 \cdot 1 \cdot 0 = 0. A triangle cannot be 2-coloured (it contains an odd cycle), consistent with χ(C3)=3\chi(C_3) = 3.
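
The deletion-contraction recurrence translates directly into a recursive evaluator; a sketch with edges stored as frozensets (exponential time, so toy graphs only):

```python
def chromatic_value(vertices: frozenset, edges: frozenset, k: int) -> int:
    """Evaluate P(G, k) by deletion-contraction; fine for small graphs only."""
    if not edges:
        return k ** len(vertices)            # empty graph: k^n colourings
    e = next(iter(edges))
    u, v = tuple(e)
    deleted = edges - {e}                    # G - e
    # G / e: merge v into u in the remaining edges, dropping any loop.
    contracted = frozenset(
        f2 for f in deleted
        if len(f2 := frozenset(u if x == v else x for x in f)) == 2
    )
    return (chromatic_value(vertices, deleted, k)
            - chromatic_value(vertices - {v}, contracted, k))

triangle = frozenset(frozenset(e) for e in [(1, 2), (2, 3), (1, 3)])
print([chromatic_value(frozenset({1, 2, 3}), triangle, k) for k in (2, 3, 4)])
# [0, 6, 24] - matching k(k-1)(k-2)
```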

Coefficients encode structure. The chromatic polynomial P(G,k) is a monic polynomial in k of degree n = |V| with integer coefficients whose signs alternate. The coefficient of k^{n-1} equals -|E|. Reading the polynomial off immediately reveals: \chi(G) (the smallest positive integer k with P(G,k) > 0), the number of edges (from the k^{n-1} coefficient), and whether G is bipartite (check whether P(G,2) > 0).

| Graph     | P(G, k)          | \chi |
|-----------|------------------|------|
| K_1       | k                | 1    |
| K_2       | k(k-1)           | 2    |
| P_3       | k(k-1)^2         | 2    |
| C_3 = K_3 | k(k-1)(k-2)      | 3    |
| C_4       | (k-1)^4 + (k-1)  | 2    |
| K_4       | k(k-1)(k-2)(k-3) | 4    |

AI connection. Chromatic polynomials appear in probabilistic graphical models: the partition function of the zero-temperature Potts model on a graph equals P(G,q)P(G, q). Computing P(G,k)P(G, k) is #P-hard in general, motivating approximate inference methods (belief propagation, variational methods) that are central to Bayesian deep learning.

9.3 The Four Color Theorem

Theorem (Appel and Haken, 1976). Every planar graph can be properly coloured with at most 4 colours: χ(G)4\chi(G) \leq 4 for all planar GG.

This theorem is historically significant for two reasons:

  1. It was the first major mathematical theorem proved with computer assistance - the proof required checking ~1,500 configurations by computer.
  2. It resolved a conjecture open since 1852 (124 years!).

The proof was controversial when published because mathematicians could not verify it by hand. A simplified computer-assisted proof was given by Robertson, Sanders, Seymour, and Thomas (1997), and the result has been formally verified in the Coq proof assistant (Gonthier, 2008).

Proof idea (Kempe chains, 1879 - flawed but instructive). Kempe attempted to prove the theorem using "Kempe chains" - maximal connected subgraphs in which only two colours appear. His proof had a flaw (found by Heawood in 1890), but Kempe chains remain a key tool in coloring proofs.

Five Color Theorem (Kempe/Heawood, 1890). Every planar graph is 5-colorable. This weaker result is provable by hand:

Proof sketch. By induction on nn. A planar graph on n6n \geq 6 vertices has a vertex vv of degree 5\leq 5 (since m3n6m \leq 3n - 6 implies average degree <6< 6). Remove vv, 5-color the rest by induction, then try to restore vv. If its 5\leq 5 neighbours use 4\leq 4 distinct colours, vv gets the 5th colour. The key case (5 neighbours using all 5 colours) is handled by a Kempe chain argument. \square

Note: 4 colours are sometimes necessary - K4K_4 is planar and needs 4 colours. But three colours suffice for most "practical" planar graphs (outerplanar graphs, for example, are always 3-colorable).

9.4 Coloring in AI

Graph coloring appears in AI and systems in several important contexts:

Register allocation in compilers. Variables that are simultaneously "live" conflict and cannot share a register. The conflict graph has variables as vertices and edges between simultaneously-live variables. Coloring this graph with kk colours assigns kk registers. The LLVM and GCC compilers use graph coloring for register allocation.

Frequency assignment. In wireless networks, adjacent cells cannot use the same frequency (interference). The interference graph's chromatic number determines the minimum number of frequencies needed.

Map colouring / geographic visualisation. The original motivation for the four-colour theorem: colouring regions of a map so adjacent regions have different colours. This is equivalent to colouring the dual graph (faces \to vertices, shared borders \to edges).

Constraint satisfaction. Many AI planning and scheduling problems reduce to graph coloring. Exam scheduling: students taking the same two courses cannot have those exams at the same time. Course conflict graph \to minimum number of time slots =χ(G)= \chi(G).

GNN feature augmentation. Adding graph coloring information (e.g., the colour assigned by a greedy algorithm) as node features can improve GNN expressiveness beyond the 1-WL limit, because coloring breaks vertex symmetry.


10. Preview: Representations, Algorithms, and Spectral Methods

This section provides brief pointers to the remaining subsections in the Graph Theory chapter. Each topic listed below has its own dedicated section with full treatment.

10.1 Graph Representations (Preview)

Preview: Graph Representations

Graphs can be stored in memory using several data structures, each with different space-time trade-offs. The adjacency matrix A{0,1}n×nA \in \{0,1\}^{n \times n} enables matrix operations (spectral methods, GCN) but uses O(n2)O(n^2) space. The adjacency list uses O(n+m)O(n + m) space and enables fast neighbour enumeration. CSR format (Compressed Sparse Row) combines the benefits of both for sparse graphs and is the standard in large-scale GNN frameworks.

-> Full treatment: Graph Representations

10.2 Graph Algorithms (Preview)

Preview: Graph Algorithms

Classical algorithms compute the properties defined in this section. BFS computes shortest paths in unweighted graphs and identifies connected components. DFS detects cycles, finds bridges, and computes strongly connected components. Dijkstra's algorithm finds shortest paths in weighted graphs. Minimum spanning tree algorithms (Prim, Kruskal) find the cheapest connected spanning subgraph.

-> Full treatment: Graph Algorithms

10.3 Spectral Graph Theory (Preview)

Preview: Spectral Graph Theory

The eigenvalues and eigenvectors of graph matrices (adjacency matrix AA, Laplacian L=DAL = D - A, normalised Laplacian Lsym=ID1/2AD1/2L_{\text{sym}} = I - D^{-1/2}AD^{-1/2}) reveal deep structural properties. The number of zero eigenvalues of LL equals the number of connected components. The Fiedler vector (second-smallest eigenvector of LL) provides the optimal graph bipartition, forming the basis of spectral clustering. The Cheeger inequality connects the algebraic spectrum to combinatorial expansion.

-> Full treatment: Spectral Graph Theory

10.4 Graph Neural Networks (Preview)

Preview: Graph Neural Networks

GNNs learn vertex and graph-level representations by iterating a message-passing scheme: hv(k+1)=UPDATE(hv(k),AGG({hu(k):uN(v)}))\mathbf{h}_v^{(k+1)} = \text{UPDATE}(\mathbf{h}_v^{(k)}, \text{AGG}(\{\mathbf{h}_u^{(k)} : u \in \mathcal{N}(v)\})). The Graph Convolutional Network (GCN, Kipf & Welling, 2017) uses the normalised adjacency matrix: H(l+1)=σ(D~1/2A~D~1/2H(l)W(l))H^{(l+1)} = \sigma(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}). Graph Attention Networks (GAT) learn edge weights dynamically. The expressiveness of message-passing GNNs is bounded by the 1-WL test (8.3).

-> Full treatment: Graph Neural Networks

10.5 Random Graphs (Preview)

Preview: Random Graphs

Random graph models generate graphs probabilistically. The Erdos-Renyi model G(n,p)G(n, p) includes each edge independently with probability pp, producing Poisson degree distributions. The Barabasi-Albert model generates scale-free graphs with power-law degree distributions via preferential attachment. The Watts-Strogatz model produces small-world graphs with high clustering and short diameters.

-> Full treatment: Random Graphs


11. Common Mistakes

| #  | Mistake | Why It's Wrong | Fix |
|----|---------|----------------|-----|
| 1  | Confusing "path" and "walk" | A walk may repeat vertices; a path may not. | Use "walk" for general traversals, "path" only when no vertex is repeated. |
| 2  | Assuming all graphs are simple | Real-world graphs often have self-loops (user follows self) or parallel edges (multiple bus routes). GCN explicitly adds self-loops. | State "simple graph" explicitly when assuming no self-loops or parallel edges. |
| 3  | Forgetting that edges in undirected graphs are unordered | Writing (u,v) for an undirected edge suggests direction; \{u,v\} = \{v,u\} but (u,v) \neq (v,u). | Use \{u,v\} for undirected, (u,v) for directed. |
| 4  | Claiming the degree sequence determines the graph | Many non-isomorphic graphs share a degree sequence, e.g. C_6 and C_3 \cup C_3 (both 2-regular on six vertices). | Degree sequence is a necessary but not sufficient invariant for isomorphism. |
| 5  | Treating "bipartite" and "2-colourable" as separate properties to test | They are equivalent, so testing them independently duplicates work. | Bipartite \Leftrightarrow no odd cycle \Leftrightarrow \chi(G) \leq 2. One BFS 2-colouring settles all three. |
| 6  | Confusing "connected" with "strongly connected" for digraphs | A weakly connected digraph may have vertex pairs with no directed path between them; strong connectivity requires directed paths both ways. | Always specify "weakly" or "strongly" for directed graphs. |
| 7  | Misstating the graph complement | The complement toggles every vertex pair: existing edges are removed AND all missing edges are added. | \bar{G} has edge \{u,v\} iff G does NOT have edge \{u,v\}. |
| 8  | Assuming GNNs can distinguish any two non-isomorphic graphs | Message-passing GNNs are bounded by 1-WL; they fail on regular graphs of the same degree and many other cases. | Use 1-WL as the upper bound; for more power, consider higher-order GNNs or graph transformers. |
| 9  | Counting edges in K_n as n^2 | K_n has \binom{n}{2} = n(n-1)/2 edges; the adjacency matrix has n^2 entries but is symmetric with zero diagonal. | Use \binom{n}{2} for undirected, n(n-1) for directed complete graphs. |
| 10 | Skipping the handshaking lemma when debugging graph code | An odd degree sum is impossible, so it signals a bug immediately. | Verify \sum_v \deg(v) = 2\lvert E \rvert after graph construction. |

12. Exercises

Exercise 1 * - Handshaking Lemma Verification

Construct three specific graphs (your choice of vertex/edge sets) and verify the handshaking lemma deg(v)=2E\sum \deg(v) = 2|E| for each. (a) A graph with 5 vertices and 7 edges. (b) A directed graph with 4 vertices - verify deg+=deg=E\sum \deg^+ = \sum \deg^- = |E|. (c) A bipartite graph K3,4K_{3,4} - compute degrees of both sides.

Exercise 2 * - Degree Sequences and Graphicality

(a) Determine whether each sequence is graphic (can be realised as a simple graph): (4,3,3,2,2)(4, 3, 3, 2, 2), (3,3,3,1)(3, 3, 3, 1), (5,3,2,2,2,2)(5, 3, 2, 2, 2, 2). (b) For each graphic sequence, construct a graph that realises it. (c) Implement the Erdos-Gallai test programmatically.

Exercise 3 * - Paths, Cycles, and Distance

Given the Petersen graph (look up its edge list): (a) Find all shortest paths from vertex 0 to vertex 5. (b) Compute the diameter and radius. (c) Verify the girth is 5 by finding a shortest cycle. (d) Show that the Petersen graph has no Hamiltonian cycle (explain why, or try exhaustive search).

Exercise 4 ** - Tree Equivalences

(a) Prove that if GG is connected and has exactly n1n - 1 edges, then GG is acyclic. (b) Prove that in a tree, there is exactly one path between every pair of vertices. (c) Compute the number of spanning trees of K4K_4 using Kirchhoff's matrix tree theorem (det\det of any cofactor of LL) and verify against Cayley's formula 442=164^{4-2} = 16.

Exercise 5 ** - Bipartiteness Testing

(a) Implement a BFS-based algorithm to test whether a graph is bipartite. (b) If bipartite, output the bipartition (U,W)(U, W). (c) If not bipartite, output an odd cycle as a certificate. (d) Test on: C6C_6 (bipartite), C7C_7 (not bipartite), Petersen (not bipartite), K3,3K_{3,3} (bipartite).

Exercise 6 ** - Graph Distance and Eccentricity

(a) Compute the all-pairs shortest path distance matrix for a given graph using BFS. (b) From the distance matrix, compute eccentricity, diameter, radius, and center. (c) Verify: for any connected graph, rad(G)diam(G)2rad(G)\operatorname{rad}(G) \leq \operatorname{diam}(G) \leq 2 \cdot \operatorname{rad}(G).

Exercise 7 *** - Weisfeiler-Leman Test and GNN Expressiveness

(a) Implement the 1-WL color refinement algorithm. (b) Run it on two non-isomorphic 3-regular graphs on 6 vertices. Does 1-WL distinguish them? (c) Construct a pair of non-isomorphic graphs that 1-WL CANNOT distinguish (hint: use two regular graphs with the same degree sequence and spectrum). (d) Explain why this means a standard message-passing GNN cannot distinguish them either.

Exercise 8 *** - Real-World Graph Analysis

Model a real-world system as a graph and analyse it: (a) Choose a domain: social network (use Zachary's karate club), citation network, molecular graph, or small knowledge graph. (b) Compute: number of vertices/edges, degree distribution, diameter, connected components, clustering coefficient. (c) Test whether the graph is bipartite, planar, or has any special structure. (d) Discuss: what would a 2-layer GNN "see" on this graph? Which pairs of vertices can exchange information?


13. Why This Matters for AI (2026 Perspective)

| Concept | AI Impact |
|---------|-----------|
| Graph definition G = (V, E) | The foundational data structure for GNNs, knowledge graphs, and computation graphs in all major frameworks (PyTorch Geometric, DGL, JAX-based graph libraries) |
| Directed graphs | Computation graphs (autograd DAGs), causal graphs (structural causal models), citation networks, dependency parse trees |
| Weighted graphs | Attention weights in transformers define weighted digraphs; similarity graphs (k-NN) use distance weights; Markov chains use transition probabilities |
| Degree and degree distribution | Determines GNN aggregation balance; power-law distributions cause over-squashing; degree-based normalisation (D^{-1/2} A D^{-1/2}) is standard in GCN |
| Paths and distance | GNN receptive field is bounded by path length; graph diameter determines minimum GNN depth; shortest-path features improve GNN expressiveness |
| Connectivity | Disconnected components are processed independently by message-passing GNNs; virtual nodes connect components artificially |
| Trees and DAGs | Decision trees (XGBoost), parse trees (NLP), computation graphs (autograd), Bayesian networks, causal DAGs |
| Bipartite graphs | User-item recommendation (matrix factorization), bipartite matching (assignment), entity-relation graphs |
| Graph isomorphism | GNN permutation invariance; graph-level classification requires isomorphism-invariant readout functions |
| Weisfeiler-Leman test | Provable upper bound on message-passing GNN expressiveness (Xu et al., 2019); drives research into more expressive architectures (Graph Transformers, subgraph GNNs) |
| Graph coloring | Scheduling, resource allocation, register allocation in compilers; CSP solving with neural methods |
| Hypergraphs | Higher-order attention (multi-head attention as learned hyperedges); set function learning (DeepSets); group interaction modeling |
| Planar graphs | Geographic ML, mesh-based physics simulations (weather prediction, fluid dynamics with GNNs) |

14. Conceptual Bridge

Looking Back

This section builds on the mathematical foundations established in earlier chapters:

  • Sets and functions (Ch. 01) provide the language for defining vertex sets, edge sets, and the mappings (isomorphisms, colourings) between them
  • Matrix operations (Ch. 02) connect through the adjacency matrix AA - the handshaking lemma is 1A1=2E\mathbf{1}^\top A \mathbf{1} = 2|E|, and walk counting uses matrix powers AkA^k
  • Eigenvalues (Ch. 03) will become central in the next section - the spectrum of AA and LL encodes deep structural information about the graph

Looking Forward

The vocabulary and theory developed here is the foundation for the rest of the Graph Theory chapter:

  • 02 Graph Representations takes the abstract objects defined here (adjacency, degree, connectivity) and asks: how do we store them in memory?
  • 03 Graph Algorithms provides efficient algorithms for computing the properties defined here: BFS for distance, DFS for connectivity, Dijkstra for weighted shortest paths
  • 04 Spectral Graph Theory analyses the eigenvalues of AA, DD, and LL - connecting the combinatorial properties (connectivity, bipartiteness, clustering) to algebraic properties (spectrum)
  • 05 Graph Neural Networks builds learnable functions on graphs, with the message-passing paradigm directly implementing walk-based aggregation on the structures defined here
  • 06 Random Graphs asks: what happens when edges are drawn randomly? The degree sequences, connectivity thresholds, and component structure we defined here become probabilistic phenomena

The Big Picture

GRAPH BASICS IN THE CURRICULUM
========================================================================

  Ch.01 Sets & Logic -----------> Vertex sets, edge sets, mappings
           |
  Ch.02 Linear Algebra ---------> Adjacency matrix, degree matrix
           |
           v
  +------------------------------------------------------+
  |                   01 GRAPH BASICS                    |
  |  Definitions * Degree * Paths * Connectivity * Trees |
  |  Bipartite * Coloring * Isomorphism * WL Test        |
  +------+-----------+-----------+-----------+-----------+
         |           |           |           |
         v           v           v           v
        02          03          04          05
    Represent.  Algorithms   Spectral      GNNs
    (storage)   (BFS, DFS)  (eigenvals)  (learning)
                                           |
                                           v
                                    06 Random Graphs
                                    (probabilistic)

========================================================================

15. Quick Computational Reference

The table below collects the key formulas and algorithms from this section for fast lookup during implementation.

15.1 Degree and Edge Counting

| Quantity        | Formula                        | Python (NetworkX)                                |
|-----------------|--------------------------------|--------------------------------------------------|
| Total edges     | m = \tfrac{1}{2}\sum_v \deg(v) | G.number_of_edges()                              |
| Max degree      | \Delta(G) = \max_v \deg(v)     | max(d for _, d in G.degree())                    |
| Min degree      | \delta(G) = \min_v \deg(v)     | min(d for _, d in G.degree())                    |
| Average degree  | \bar{d} = 2m/n                 | 2*G.number_of_edges()/G.number_of_nodes()        |
| Degree sequence | sorted list of degrees         | sorted([d for _, d in G.degree()], reverse=True) |

15.2 Adjacency Matrix Operations

| Operation                   | Matrix expression                         | Interpretation                       |
|-----------------------------|-------------------------------------------|--------------------------------------|
| Degree of v_i               | (A\mathbf{1})_i                           | row sum                              |
| Number of walks of length k | (A^k)_{ij}                                | matrix power                         |
| Number of triangles         | \tfrac{1}{6}\operatorname{tr}(A^3)        | trace of cube                        |
| Graph Laplacian             | L = D - A                                 | D = \operatorname{diag}(A\mathbf{1}) |
| Normalised Laplacian        | L_{\text{sym}} = I - D^{-1/2} A D^{-1/2}  | eigenvalues in [0, 2]                |

15.3 Graph Properties - Decision Table

| Property                   | Algorithm                          | Time complexity  |
|----------------------------|------------------------------------|------------------|
| Connectivity               | BFS/DFS from any vertex            | O(n + m)         |
| Strong connectivity        | Kosaraju or Tarjan                 | O(n + m)         |
| Bipartiteness              | BFS 2-colouring                    | O(n + m)         |
| Euler circuit              | check all degrees even + connected | O(n + m)         |
| Spanning tree              | BFS/DFS tree                       | O(n + m)         |
| Minimum spanning tree      | Kruskal or Prim                    | O(m log n)       |
| Shortest path (unweighted) | BFS                                | O(n + m)         |
| Shortest path (weighted)   | Dijkstra (non-negative weights)    | O((n + m) log n) |
| Topological sort (DAG)     | Kahn's algorithm                   | O(n + m)         |
| Planarity test             | Boyer-Myrvold                      | O(n)             |

15.4 Graph Invariant Checklist

When you need to test whether two graphs G1,G2G_1, G_2 might be isomorphic, check these invariants in order from cheapest to most expensive:

  1. V|V| equal? (O(1))
  2. E|E| equal? (O(1))
  3. Degree sequences equal? (O(n log n))
  4. Degree distribution equal? (O(n))
  5. Number of triangles equal? (O(n^3) or O(n^{2.37}) with fast matrix multiply)
  6. Characteristic polynomial of AA equal? (O(n^3))
  7. Chromatic polynomial equal? (#P-hard in general)
  8. Run VF2 or Nauty for exact isomorphism check.

If any invariant differs, the graphs are definitively not isomorphic. If all agree, they are likely isomorphic but not guaranteed (cospectral non-isomorphic graphs exist).
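
A cheap-first prefilter following this checklist (a sketch; agreement on all invariants still does not prove isomorphism):

```python
import networkx as nx

def might_be_isomorphic(G1: nx.Graph, G2: nx.Graph) -> bool:
    """Return False as soon as a cheap invariant differs."""
    if G1.number_of_nodes() != G2.number_of_nodes():
        return False
    if G1.number_of_edges() != G2.number_of_edges():
        return False
    if sorted(d for _, d in G1.degree()) != sorted(d for _, d in G2.degree()):
        return False
    t1 = sum(nx.triangles(G1).values()) // 3
    t2 = sum(nx.triangles(G2).values()) // 3
    return t1 == t2

# C_6 vs. C_3 + C_3: same order, size, and degree sequence, but the
# triangle counts (0 vs. 2) filter the pair out before any VF2 call.
G1 = nx.cycle_graph(6)
G2 = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))
print(might_be_isomorphic(G1, G2))   # False
```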

Implementation note. In practice, NetworkX provides nx.is_isomorphic(G1, G2) using VF2, and nx.graph_atlas_g() for enumeration. For large graphs ($n > 10^4$), use approximate fingerprinting (degree sequence + triangle count + eigenvalue moments) before exact checks.


16. Section Summary

This section covered the full vocabulary of graph theory needed to read modern GNN papers and implement graph algorithms from scratch.

Core objects defined:

  • A graph G=(V,E)G = (V, E) is a pair of a vertex set and an edge set; variants include directed, weighted, simple, multi, and hypergraphs.
  • The degree of a vertex counts its incident edges; the handshaking lemma deg(v)=2E\sum \deg(v) = 2|E| is the universal sanity check.
  • Walks, trails, paths, and cycles form a hierarchy of traversal objects; Eulerian and Hamiltonian structures live at opposite ends of algorithmic tractability.

Core structural theorems:

  • Connectivity is characterised by BFS/DFS reachability; bridges and articulation points are the fragile points; Menger's theorem equates disjoint paths with cuts.
  • Trees admit six equivalent definitions and are the minimum connected structures; spanning trees compress any connected graph.
  • Bipartite graphs are exactly those with no odd cycles, equivalent to 2-colourability, and support Konig's matching theorem.
  • Planar graphs satisfy Euler's formula nm+f=2n - m + f = 2 and are characterised by Kuratowski's forbidden minors.

Core AI connections:

  • The Weisfeiler-Leman test (1-WL) is the provable upper bound on the expressiveness of all message-passing GNNs, showing that structural indistinguishability in WL implies indistinguishability by GNNs.
  • Graph motifs (triangles, stars, paths) are the structural units that standard GNNs cannot count, motivating expressiveness research.
  • Graph isomorphism, automorphisms, and invariants formalise what it means for a graph function to be permutation-invariant - the foundational requirement for graph-level prediction.

The theory notebook works through all matrix computations, proofs, and visualisations in executable Python. The exercises notebook provides graded practice from basic degree calculations to proving WL indistinguishability of specific graph pairs.

The single most important takeaway: graphs are simultaneously combinatorial objects (studied via degree, paths, cycles), algebraic objects (studied via matrices and polynomials), and computational objects (studied via algorithms and complexity). GNNs sit at the intersection of all three perspectives - they are learned functions that respect the algebraic symmetries of graphs while computing efficiently on the combinatorial structure. Every concept in this section reappears in that context.


17. Further Reading

Foundational Texts

  • Diestel, R. Graph Theory (5th ed., 2017) - The standard graduate reference. Free PDF at diestel-graph-theory.com. Chapters 1-2 cover the material in this section with full proofs.
  • West, D. Introduction to Graph Theory (2nd ed., 2001) - Excellent undergraduate text with many exercises. Chapters 1 (Fundamental Concepts) and 2 (Trees and Distance) map directly to 01.
  • Bondy, J. A. and Murty, U. S. R. Graph Theory (2008, Springer GTM) - Comprehensive coverage including extremal graph theory and Ramsey theory.

For the AI/ML Connection

  • Hamilton, W. L. Graph Representation Learning (2020, Synthesis Lectures) - The definitive ML-focused introduction. Chapter 2 covers graph basics from an ML perspective. Free draft at cs.mcgill.ca/~wlh/grl_book/.
  • Xu, K. et al. "How Powerful are Graph Neural Networks?" ICLR 2019 - Establishes the 1-WL upper bound on message-passing GNNs. Essential reading before implementing GNNs.
  • Bronstein, M. et al. "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges" arXiv 2021 - Places graphs in the broader context of geometric priors in ML (symmetry, equivariance).

Historical Papers

  • Euler, L. "Solutio problematis ad geometriam situs pertinentis" (1736) - The Königsberg bridges paper: the first graph theory result.
  • Erdős, P. and Rényi, A. "On Random Graphs I" Publicationes Mathematicae (1959) - Founding paper of random graph theory; introduces the $G(n,p)$ model studied in 06.
  • Appel, K. and Haken, W. "Every planar map is four colorable" Illinois J. Math. (1977) - The first major computer-assisted proof in mathematics.



Appendix A: Graph Theory Notation Reference

A quick-reference table for the notation used throughout this section and the rest of the chapter.

A.1 Graph Notation

| Symbol | Meaning |
| --- | --- |
| $G = (V, E)$ | Graph with vertex set $V$ and edge set $E$ |
| $n = \lvert V \rvert$ | Order (number of vertices) |
| $m = \lvert E \rvert$ | Size (number of edges) |
| $u \sim v$ | Vertices $u$ and $v$ are adjacent |
| $\mathcal{N}(v)$ | Neighbourhood of $v$: all vertices adjacent to $v$ |
| $\mathcal{N}[v]$ | Closed neighbourhood: $\mathcal{N}(v) \cup \{v\}$ |
| $\deg(v)$ | Degree of vertex $v$ in an undirected graph |
| $\deg^+(v)$ | Out-degree of $v$ in a digraph |
| $\deg^-(v)$ | In-degree of $v$ in a digraph |
| $\delta(G)$ | Minimum degree: $\min_{v \in V}\deg(v)$ |
| $\Delta(G)$ | Maximum degree: $\max_{v \in V}\deg(v)$ |
| $d(u, v)$ | Graph distance (shortest-path length) from $u$ to $v$ |
| $\operatorname{diam}(G)$ | Diameter: $\max_{u,v} d(u,v)$ |
| $\operatorname{rad}(G)$ | Radius: $\min_v \max_u d(u,v)$ |
| $\chi(G)$ | Chromatic number |
| $\kappa(G)$ | Vertex connectivity |
| $\lambda(G)$ | Edge connectivity |
| $G[S]$ | Induced subgraph on vertex subset $S$ |
| $G - v$ | Graph with vertex $v$ and its incident edges removed |
| $G - e$ | Graph with edge $e$ removed |
| $G / e$ | Graph with edge $e$ contracted |
| $\bar{G}$ | Complement of $G$ |
| $L(G)$ | Line graph of $G$ |
| $\operatorname{Aut}(G)$ | Automorphism group of $G$ |
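
Most of these symbols map directly to NetworkX one-liners. A small sketch (assuming NetworkX; the Petersen graph is an arbitrary example):

```python
import networkx as nx

G = nx.petersen_graph()
print(min(d for _, d in G.degree()))      # delta(G) = 3
print(max(d for _, d in G.degree()))      # Delta(G) = 3
print(nx.diameter(G))                     # diam(G) = 2
print(nx.radius(G))                       # rad(G) = 2
print(nx.node_connectivity(G))            # kappa(G) = 3
print(nx.edge_connectivity(G))            # lambda(G) = 3
print(sorted(G.neighbors(0)))             # N(0), the open neighbourhood
H = G.subgraph([0, 1, 2, 3, 4])           # induced subgraph G[S]
Gbar = nx.complement(G)                   # complement of G
LG = nx.line_graph(G)                     # line graph L(G)
```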

A.2 Special Graph Families

| Symbol | Name | Definition |
| --- | --- | --- |
| $K_n$ | Complete graph | $n$ vertices, all $\binom{n}{2}$ edges |
| $K_{m,n}$ | Complete bipartite | $m + n$ vertices, all $mn$ edges across the bipartition |
| $C_n$ | Cycle | $n$ vertices in a single cycle |
| $P_n$ | Path | $n$ vertices in a single path |
| $\bar{K}_n$ | Empty graph | $n$ vertices, no edges |
| $S_n$ | Star | One centre vertex connected to $n - 1$ leaves |
| $W_n$ | Wheel | $C_n$ plus one central vertex connected to all |
| $Q_n$ | Hypercube | $2^n$ vertices ($n$-bit strings); edges join strings differing in one bit |
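
Each family in this table has a NetworkX generator. A quick sanity check, assuming NetworkX (note that `nx.star_graph(k)` builds a star with `k` leaves plus a centre, and `nx.wheel_graph(k)` has `k` vertices in total, so the indexing conventions differ slightly from the table's):

```python
import networkx as nx

K5  = nx.complete_graph(5)                # K_5
K33 = nx.complete_bipartite_graph(3, 3)   # K_{3,3}
C6  = nx.cycle_graph(6)                   # C_6
P4  = nx.path_graph(4)                    # P_4
E7  = nx.empty_graph(7)                   # \bar{K}_7
S5  = nx.star_graph(4)                    # star on 5 vertices (4 leaves)
W6  = nx.wheel_graph(6)                   # hub joined to a 5-cycle
Q3  = nx.hypercube_graph(3)               # Q_3, vertices are 3-bit tuples

assert K5.number_of_edges() == 10           # binom(5, 2)
assert K33.number_of_edges() == 9           # 3 * 3
assert all(d == 3 for _, d in Q3.degree())  # Q_n is n-regular
```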

A.3 Graph Matrices

| Matrix | Symbol | Definition | Size |
| --- | --- | --- | --- |
| Adjacency | $A$ | $A_{ij} = 1$ if $\{i,j\} \in E$ | $n \times n$ |
| Degree | $D$ | $D = \operatorname{diag}(\deg(v_1), \ldots, \deg(v_n))$ | $n \times n$ diagonal |
| Laplacian | $L$ | $L = D - A$ | $n \times n$ |
| Normalised Laplacian | $L_{\text{sym}}$ | $I - D^{-1/2}AD^{-1/2}$ | $n \times n$ |
| Incidence | $B$ | $B_{ve} = 1$ if $v \in e$ (undirected) | $n \times m$ |
| Distance | $\mathcal{D}$ | $\mathcal{D}_{ij} = d(v_i, v_j)$ | $n \times n$ |
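
All six matrices can be materialised with NetworkX (the sparse constructors assume SciPy is installed, and NumPy is used for the dense checks). A small sketch on $C_4$:

```python
import networkx as nx
import numpy as np

G = nx.cycle_graph(4)                                # C_4

A = nx.adjacency_matrix(G).toarray()                 # adjacency, n x n
D = np.diag([d for _, d in G.degree()])              # degree matrix
L = nx.laplacian_matrix(G).toarray()                 # L = D - A
assert np.array_equal(L, D - A)

L_sym = nx.normalized_laplacian_matrix(G).toarray()  # I - D^{-1/2} A D^{-1/2}
B = nx.incidence_matrix(G).toarray()                 # n x m incidence
Dist = nx.floyd_warshall_numpy(G)                    # all-pairs distances
print(int(Dist.max()))                               # diameter of C_4 is 2
```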

Appendix B: Key Theorems Summary

| Theorem | Statement | Significance |
| --- | --- | --- |
| Handshaking Lemma | $\sum_{v}\deg(v) = 2\lvert E \rvert$ | Fundamental constraint; useful for validation |
| Euler's Theorem | Eulerian circuit $\iff$ connected and all degrees even | First theorem of graph theory (1736) |
| Odd-Cycle Theorem | Bipartite $\iff$ no odd cycles | Connects structure to coloring |
| Tree Equivalences | Six equivalent definitions of a tree | Foundational for spanning trees |
| Cayley's Formula | $K_n$ has $n^{n-2}$ spanning trees | Counts labeled trees |
| Matrix-Tree Theorem | Number of spanning trees = any cofactor of $L$ | Connects combinatorics to linear algebra |
| Euler's Planar Formula | $n - m + f = 2$ | Topological constraint on planar graphs |
| Kuratowski's Theorem | Planar $\iff$ no $K_5$ or $K_{3,3}$ subdivision | Characterises planarity |
| Four Color Theorem | $\chi(G) \leq 4$ for planar $G$ | Computer-assisted proof (1976/2008) |
| Brooks' Theorem | $\chi(G) \leq \Delta(G)$ (connected, non-complete, not an odd cycle) | Upper bound on chromatic number |
| König's Theorem | Max matching = min vertex cover (bipartite) | Foundation of bipartite combinatorics |
| Menger's Theorem | Max disjoint paths = min cut | Max-flow min-cut for graphs |
| WL-GNN Equivalence | MPNN distinguishing power $\leq$ 1-WL test (Xu et al., 2019) | Expressiveness bound for GNNs |
| Robertson-Seymour | Every minor-closed property has a finite forbidden-minor set | Deepest result in structural graph theory |
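
Two rows of this table invite a quick numerical spot-check. The sketch below (assuming NetworkX and NumPy; graph sizes are arbitrary) verifies Cayley's formula through the Matrix-Tree theorem and uses the handshaking lemma as a validation assertion:

```python
import networkx as nx
import numpy as np

# Matrix-Tree theorem meets Cayley's formula: any cofactor of the
# Laplacian of K_n should equal n^(n-2).
n = 5
L = nx.laplacian_matrix(nx.complete_graph(n)).toarray().astype(float)
cofactor = np.linalg.det(L[1:, 1:])          # delete row 0 and column 0
assert round(cofactor) == n ** (n - 2)       # 125 spanning trees for K_5

# Handshaking lemma as a sanity check on an arbitrary random graph.
G = nx.gnp_random_graph(50, 0.1, seed=1)
assert sum(d for _, d in G.degree()) == 2 * G.number_of_edges()
```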

Appendix C: Common Graph Families - Properties at a Glance

| Graph | $n$ | $m$ | Regular? | Bipartite? | Planar? | $\chi$ | Diameter |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $K_n$ | $n$ | $n(n-1)/2$ | $(n-1)$-regular | $n \leq 2$ | $n \leq 4$ | $n$ | $1$ |
| $K_{n,n}$ | $2n$ | $n^2$ | $n$-regular | Yes | $n \leq 2$ | $2$ | $2$ |
| $C_n$ | $n$ | $n$ | $2$-regular | $n$ even | Yes | $2$ ($n$ even), $3$ ($n$ odd) | $\lfloor n/2 \rfloor$ |
| $P_n$ | $n$ | $n-1$ | No | Yes | Yes | $2$ | $n-1$ |
| Tree (any) | $n$ | $n-1$ | No (in general) | Yes | Yes | $2$ | $\leq n-1$ |
| Petersen | $10$ | $15$ | $3$-regular | No | No | $3$ | $2$ |
| $Q_n$ (hypercube) | $2^n$ | $n2^{n-1}$ | $n$-regular | Yes | $n \leq 3$ | $2$ | $n$ |
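
As a closing sanity check, the Petersen row of this table can be verified directly in NetworkX (a minimal sketch; note that `nx.greedy_color` only upper-bounds the chromatic number):

```python
import networkx as nx

# Verify the Petersen row of the table above.
P = nx.petersen_graph()
assert P.number_of_nodes() == 10 and P.number_of_edges() == 15
assert all(d == 3 for _, d in P.degree())     # 3-regular
assert not nx.is_bipartite(P)                 # it contains 5-cycles
assert not nx.check_planarity(P)[0]           # non-planar
assert nx.diameter(P) == 2
# Greedy colouring uses at least chi colours; for Petersen chi = 3.
n_colours = max(nx.greedy_color(P).values()) + 1
assert n_colours >= 3
```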