Machine Learning with Graphs


1. Introduction and Structure of Graphs


1.1 Components of a Graph

  • Objects: nodes, vertices $N$
  • Interactions: links, edges $E$
  • System: network, graph $G(N,E)$

1.2 Features of a Graph

  • Avg Degree $\bar{k}=\frac{1}{N}{\sum_{i=1}^{N}k_i}=\frac{2E}{N}$
  • Adjacency Matrix: Undirected and Directed Graphs
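
As a quick check of the formula above, a minimal NumPy sketch (the 4-node graph is a made-up example):

```python
import numpy as np

# Made-up undirected graph on 4 nodes: a triangle (0-1-2) plus a pendant node 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

degrees = A.sum(axis=1)           # k_i: row sums of the adjacency matrix
N = A.shape[0]
E = A.sum() / 2                   # each undirected edge is counted twice in A

print(degrees.mean(), 2 * E / N)  # both print 2.0, the average degree
```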

1.3 Edge Attributes

  • Weight (e.g. frequency of communication)
  • Ranking (best friend, second best friend)
  • Type (friend, relative, co-worker)
  • Sign (Friend vs. Foe, Trust vs. Distrust)

1.4 Types of Graphs

  • Unweighted; Weighted
  • Self-edges; Multigraph




2. Properties of Networks and Random Graph Models


2.1 Notations

  • Degree Distribution: $P(k)$
  • Path Length: $h$
  • Clustering Coefficient: $C$
  • Connected Components: $s$

2.2 Paths in a Graph

  • Distance in a Graph
  • Network Diameter: the maximum shortest-path distance between any pair of nodes in the graph
  • Average path length: the mean shortest-path distance over all connected pairs of nodes (BFS sketch below)
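
A minimal BFS sketch in pure Python (the adjacency-list graph is made up for illustration); the diameter is then the maximum distance over all pairs:

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (hop) distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Made-up graph: a path 0-1-2-3 with a chord 0-2
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
all_dists = [d for s in adj for d in bfs_distances(adj, s).values()]
print(max(all_dists))  # diameter = 2 (e.g. from node 0 to node 3 via 2)
```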

2.3 Clustering Coefficient $C_i$

  • Undirected Graph Clustering Coefficient
    • How connected are $i$’s neighbours to each other?
    • Node $i$ with degree $k_i$
    • $C_i = \frac{2e_i}{k_i(k_i-1)}$, where $e_i$ is the number of edges between the neighbours of node $i$; the denominator is the maximum possible number of such edges (sketch below)
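
A minimal sketch of $C_i$ in NumPy (same made-up 4-node graph as above): count the edges among $i$'s neighbours and divide by the maximum possible $k_i(k_i-1)/2$:

```python
import numpy as np

def clustering_coefficient(A, i):
    """Local clustering coefficient C_i = 2 e_i / (k_i (k_i - 1))."""
    neighbours = np.flatnonzero(A[i])
    k = len(neighbours)
    if k < 2:
        return 0.0
    # e_i: edges among the neighbours (each counted twice in the submatrix)
    e = A[np.ix_(neighbours, neighbours)].sum() / 2
    return 2 * e / (k * (k - 1))

# Made-up graph: triangle 0-1-2 plus pendant node 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
print(clustering_coefficient(A, 1))  # 1/3: only 1 of 3 neighbour pairs is linked
```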

2.4 Connectivity

  • Size of largest connected component
  • Largest component = Giant Component

2.5 Properties of $G_{np}$

  • Degree distribution: $P(k)=\binom{n-1}{k}p^k(1-p)^{n-1-k}$ (empirical check below)
  • Clustering Coefficient of $G_{np}$: $C=p=\bar{k}/n$
  • Average Path Length: $O(\log{n})$
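
A quick empirical check of the expected degree $\bar{k}=(n-1)p$, as a NumPy sketch (the values of $n$ and $p$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 0.01                  # arbitrary example values

# Sample G_np: each possible edge appears independently with probability p
upper = rng.random((n, n)) < p
A = np.triu(upper, k=1)            # keep each node pair once
A = (A | A.T).astype(int)          # symmetrize into an undirected graph

degrees = A.sum(axis=1)
print(degrees.mean())              # close to (n - 1) * p = 9.99, as expected
```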

2.6 Small-World Model

  • Can we have high clustering while also having short paths?
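
The Watts–Strogatz model answers yes: start from a ring lattice, which has high clustering, and rewire each edge with a small probability $p$; the few random shortcuts shrink path lengths while barely reducing clustering. A minimal sketch in pure Python (parameters are illustrative):

```python
import random

def watts_strogatz(n, k, p, seed=0):
    """Ring lattice on n nodes, each linked to its k nearest neighbours on one
    side, then each edge's far endpoint is rewired with probability p."""
    random.seed(seed)
    edges = set()
    for i in range(n):
        for j in range(1, k + 1):
            u, v = i, (i + j) % n
            if random.random() < p:              # rewire to a random node
                v = random.randrange(n)
                while v == u or (min(u, v), max(u, v)) in edges:
                    v = random.randrange(n)
            edges.add((min(u, v), max(u, v)))
    return edges

print(len(watts_strogatz(20, 2, 0.1)))  # roughly n*k = 40 edges
```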




3. Motifs and Structural Roles in Networks


3.1 Subnetwork

3.2 Network Motifs

  • Motif: recurring, significant patterns of interconnection
  • A motif occurs in the real network more often than in comparable random networks
  • $Z_i$ captures the significance of motif $i$
  • Network Significance Profile (SP): the vector of motif $Z$-scores normalized to unit length (formulas below)
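
Concretely, with $N_i^{\text{real}}$ the count of motif $i$ in the real network and $N_i^{\text{rand}}$ its count in degree-matched random networks, the standard definitions are

$$Z_i=\frac{N_i^{\text{real}}-\bar{N}_i^{\text{rand}}}{\operatorname{std}\left(N_i^{\text{rand}}\right)}, \qquad SP_i=\frac{Z_i}{\sqrt{\sum_j Z_j^2}}$$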

3.3 Graphlets

  • Graphlet Degree Vector (GDV)
  • Automorphism Orbits

3.4 Graph Isomorphism

  • Example: Are $G$ and $H$ topologically equivalent?




4. Graph Representation Learning


Graph representation learning focuses on the following tasks:

  • Node Classification
  • Link Prediction

4.1 Feature Learning in Graphs

  • Learn a feature representation, i.e. an embedding, for each node
  • Task: Map each node in a network into a low-dimensional space

4.2 Node Embedding

  • CNNs are designed for fixed-size images/grids
  • RNNs and word2vec are designed for text/sequences
  • Graphs have neither a fixed grid nor a sequence structure, so nodes need embedding methods of their own

4.3 Embedding Nodes Task

Hence, we can analyse the similarity of nodes in the embedding space, using measures such as Euclidean distance or cosine similarity (sketch below). In this way, the learned embeddings can serve directly as node features for downstream tasks such as node classification and link prediction.
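
For example, a minimal NumPy sketch with two made-up embedding vectors:

```python
import numpy as np

z_u = np.array([0.2, 0.9, -0.4])   # made-up node embeddings
z_v = np.array([0.1, 1.0, -0.3])

euclidean = np.linalg.norm(z_u - z_v)
cosine = z_u @ z_v / (np.linalg.norm(z_u) * np.linalg.norm(z_v))
print(euclidean, cosine)           # small distance, cosine near 1: similar nodes
```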

4.4 Random Walk Approaches to Node Embeddings

  • $z_u^{\top}z_v \approx$ probability that $u$ and $v$ co-occur on a random walk over the network (walk sketch below)
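
A minimal sketch of the walk-sampling side in pure Python (a DeepWalk-style uniform walk; graph and walk length are made up). Node pairs that co-occur within a window on such walks become the positive pairs whose embeddings are trained towards high $z_u^{\top}z_v$:

```python
import random

def random_walk(adj, start, length, seed=0):
    """Uniform random walk of fixed length, as in DeepWalk."""
    random.seed(seed)
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

# Made-up graph (adjacency lists) and one sampled walk
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = random_walk(adj, start=0, length=6)
print(walk)  # e.g. [0, 2, 3, 2, 1, 0]; nearby pairs become training pairs
```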

4.5 Unsupervised Feature Learning
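
The standard random-walk objective here (as in DeepWalk/node2vec; stated as the textbook formulation rather than taken from these notes) minimizes, over all nodes $u$, the negative log-likelihood of their walk neighbourhoods $N_R(u)$ under a softmax over embedding similarities:

$$\mathcal{L}=\sum_{u\in V}\sum_{v\in N_R(u)}-\log\frac{\exp(z_u^{\top}z_v)}{\sum_{n\in V}\exp(z_u^{\top}z_n)}$$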




5. Graph Neural Network



5.1 Node Embeddings

  • Two Key Components:
    • Encoder
    • Similarity Function

5.2 Basics of Deep Learning for Graphs

  • Idea: Neighbourhood Aggregation
  • The final layer's output $h_{v}^{(K)}$ is the node embedding $\mathbf{z}_{v}$
  • Train the Model by learning the weight matrices:
    • $\mathbf{W}_{k}$: weights for aggregating neighbour messages
    • $\mathbf{B}_{k}$: weights for transforming the node's own previous-layer embedding
  • Inductive Capability
  • So far, the GNN aggregates neighbour messages by taking their (weighted) average; a one-layer sketch follows
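
A minimal NumPy sketch of one such layer (the graph, dimensions, and function name are made up; $W$ and $B$ play the roles of $\mathbf{W}_{k}$ and $\mathbf{B}_{k}$ above):

```python
import numpy as np

def gnn_layer(A, H, W, B):
    """One neighbourhood-aggregation layer:
    h_v^k = ReLU( W * mean of neighbour h_u^{k-1}  +  B * h_v^{k-1} )."""
    deg = A.sum(axis=1, keepdims=True)
    neighbour_mean = (A @ H) / np.maximum(deg, 1)     # average neighbour messages
    return np.maximum(0, neighbour_mean @ W + H @ B)  # ReLU non-linearity

# Made-up 4-node graph, 3-dim input embeddings, 2-dim output embeddings
rng = np.random.default_rng(0)
A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
H = rng.normal(size=(4, 3))
W, B = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
print(gnn_layer(A, H, W, B).shape)  # (4, 2): one new embedding per node
```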

5.3 GraphSAGE Graph Neural Network Architecture

  • Concatenation:
    • Concatenate the aggregated neighbour embedding with the node's own embedding
    • Unlike graph convolution, which adds the node's own features into the sum, GraphSAGE concatenates them and then applies a non-linearity
  • Aggregation:
    • Use a generalized aggregation function
    • Unlike graph convolution, which uses a plain average
  • Aggregation Variants
    • Generally, there are several ways to implement the aggregation:
    • Mean: average the neighbour embeddings
    • Pool: element-wise mean or max over (transformed) neighbour embeddings
    • LSTM: run an LSTM over a shuffled sequence of neighbour embeddings, giving a deeper aggregator

Hint: we can apply different pooling strategies; a mean-aggregator layer sketch follows.
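
A minimal NumPy sketch of a GraphSAGE layer with the mean aggregator (graph and dimensions made up): aggregate the neighbours, concatenate with the node's own embedding, transform, and apply a non-linearity:

```python
import numpy as np

def graphsage_layer(A, H, W):
    """GraphSAGE with a mean aggregator:
    h_v^k = ReLU( W * [ h_v^{k-1} ; mean of neighbour h_u^{k-1} ] )."""
    deg = A.sum(axis=1, keepdims=True)
    agg = (A @ H) / np.maximum(deg, 1)         # AGG: mean over neighbours
    concat = np.concatenate([H, agg], axis=1)  # concatenate self and neighbours
    return np.maximum(0, concat @ W)           # ReLU

rng = np.random.default_rng(0)
A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(6, 2))   # input dim doubles because of the concatenation
print(graphsage_layer(A, H, W).shape)  # (4, 2)
```

Swapping `agg` for, e.g., an element-wise max over transformed neighbour embeddings gives the pool variant.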

5.4 Implementation

  • Notation:
    • $D$ is the degree matrix
    • $A$ is the adjacency matrix
    • $H^{k-1}$ is the matrix of node messages (embeddings) from the previous layer
  • $AH^{k-1}$ sums the previous-layer features of each node's neighbours
  • Multiplying by $D^{-1}$ divides each row by the node's degree, so $D^{-1}AH^{k-1}$ is the mean of the neighbour messages, computed for all nodes at once (numeric check below)
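
A quick numeric check in NumPy (tiny made-up graph) that $D^{-1}AH^{k-1}$ really is the per-node neighbour mean:

```python
import numpy as np

A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
H = np.arange(8, dtype=float).reshape(4, 2)  # previous-layer messages H^{k-1}

D_inv = np.diag(1.0 / A.sum(axis=1))         # inverse degree matrix D^{-1}
mean_messages = D_inv @ A @ H                # row v = mean of v's neighbour rows

# Same value for node 1, computed by hand over its neighbours {0, 2, 3}
print(mean_messages[1], H[[0, 2, 3]].mean(axis=0))  # both [10/3, 13/3]
```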

5.5 Graph Attention Network (GAT)

  • Simple Neighbourhood Aggregation in Graph Convolution
    • Uses the fixed coefficient $\alpha_{vu}=\frac{1}{|N(v)|}$
    • All neighbours $u \in N(v)$ are equally important to node $v$
  • Attention Mechanism
    • Computes learned coefficients $e_{vu}$, which are normalized with a softmax over $N(v)$ to obtain $\alpha_{vu}$
    • The mechanism $a$ can be implemented in different ways, including a simple single-layer neural network (see the sketch below)
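
A minimal NumPy sketch of that single-layer mechanism (dimensions made up; the form $e_{vu}=\text{LeakyReLU}(a^{\top}[\mathbf{W}h_v \,\Vert\, \mathbf{W}h_u])$ follows the GAT paper):

```python
import numpy as np

def gat_scores(A, H, W, a):
    """Attention weights alpha_vu = softmax_u(e_vu) over each node's neighbours,
    with e_vu = LeakyReLU( a . [W h_v ; W h_u] ) as in GAT."""
    Wh = H @ W                                    # transform node embeddings
    n = A.shape[0]
    e = np.full((n, n), -np.inf)                  # -inf masks non-edges in softmax
    for v in range(n):
        for u in np.flatnonzero(A[v]):
            s = a @ np.concatenate([Wh[v], Wh[u]])
            e[v, u] = np.where(s > 0, s, 0.2 * s) # LeakyReLU, slope 0.2
    return np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)  # rows sum to 1

rng = np.random.default_rng(0)
A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
H, W, a = rng.normal(size=(4, 3)), rng.normal(size=(3, 2)), rng.normal(size=4)
print(gat_scores(A, H, W, a).round(2))  # unequal weights, unlike 1/|N(v)|
```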

5.6 Papers on GNN