Machine Learning with Graphs


1. Introduction and Structure of Graphs


1.1 Components of a Graph

  • Objects: nodes, vertices $N$
  • Interactions: links, edges $E$
  • System: network, graph $G(N,E)$

1.2 Features of a Graph

  • Avg Degree $\bar{k}=\frac{1}{N}{\sum_{i=1}^{N}k_i}=\frac{2E}{N}$
  • Adjacency Matrix: Undirected and Directed Graphs
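
As a quick check of the formula above, a minimal NumPy sketch (the 4-node graph is a made-up example):

```python
import numpy as np

# Made-up undirected graph on 4 nodes: a triangle (0-1-2) plus a pendant node 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

degrees = A.sum(axis=1)           # k_i: row sums of the adjacency matrix
N = A.shape[0]
E = A.sum() / 2                   # each undirected edge is counted twice in A

print(degrees.mean(), 2 * E / N)  # both print 2.0, the average degree
```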

1.3 Edge Attributes

  • Weight (e.g. frequency of communication)
  • Ranking (best friend, second best friend)
  • Type (friend, relative, co-worker)
  • Sign (Friend vs. Foe, Trust vs. Distrust)

1.4 Types of Graphs

  • Unweighted; Weighted
  • Self-edges; Multigraph




2. Properties of Networks and Random Graph Models


2.1 Notations

  • Degree Distribution: $P(k)$
  • Path Length: $h$
  • Clustering Coefficient: $C$
  • Connected Components: $s$

2.2 Paths in a Graph

  • Distance in a Graph
  • Network Diameter: the maximum shortest-path distance between any pair of nodes in the graph
  • Average path length: the mean shortest-path distance over all connected pairs of nodes (BFS sketch below)
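
A minimal BFS sketch in pure Python (the adjacency-list graph is made up for illustration); the diameter is then the maximum distance over all pairs:

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (hop) distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Made-up graph: a path 0-1-2-3 with a chord 0-2
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
all_dists = [d for s in adj for d in bfs_distances(adj, s).values()]
print(max(all_dists))  # diameter = 2 (e.g. from node 0 to node 3 via 2)
```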

2.3 Clustering Coefficient $C_i$

  • Undirected Graph Clustering Coefficient
    • How connected are $i$’s neighbours to each other?
    • Node $i$ with degree $k_i$
    • $C_i = \frac{2e_i}{k_i(k_i-1)}$, where $e_i$ is the number of edges between the neighbours of node $i$; the denominator is the maximum possible number of such edges (sketch below)
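
A minimal sketch of $C_i$ in NumPy (same made-up 4-node graph as above): count the edges among $i$'s neighbours and divide by the maximum possible $k_i(k_i-1)/2$:

```python
import numpy as np

def clustering_coefficient(A, i):
    """Local clustering coefficient C_i = 2 e_i / (k_i (k_i - 1))."""
    neighbours = np.flatnonzero(A[i])
    k = len(neighbours)
    if k < 2:
        return 0.0
    # e_i: edges among the neighbours (each counted twice in the submatrix)
    e = A[np.ix_(neighbours, neighbours)].sum() / 2
    return 2 * e / (k * (k - 1))

# Made-up graph: triangle 0-1-2 plus pendant node 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
print(clustering_coefficient(A, 1))  # 1/3: only 1 of 3 neighbour pairs is linked
```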

2.4 Connectivity

  • Size of largest connected component
  • Largest component = Giant Component

2.5 Properties of $G_{np}$

  • Degree distribution: $P(k)=\binom{n-1}{k}p^k(1-p)^{n-1-k}$ (empirical check below)
  • Clustering Coefficient of $G_{np}$: $C=p=\bar{k}/n$
  • Average Path Length: $O(\log{n})$
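
A quick empirical check of the expected degree $\bar{k}=(n-1)p$, as a NumPy sketch (the values of $n$ and $p$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 0.01                  # arbitrary example values

# Sample G_np: each possible edge appears independently with probability p
upper = rng.random((n, n)) < p
A = np.triu(upper, k=1)            # keep each node pair once
A = (A | A.T).astype(int)          # symmetrize into an undirected graph

degrees = A.sum(axis=1)
print(degrees.mean())              # close to (n - 1) * p = 9.99, as expected
```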

2.6 Small-World Model

  • Can we have high clustering while also having short paths?
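
The Watts–Strogatz model answers yes: start from a ring lattice, which has high clustering, and rewire each edge with a small probability $p$; the few random shortcuts shrink path lengths while barely reducing clustering. A minimal sketch in pure Python (parameters are illustrative):

```python
import random

def watts_strogatz(n, k, p, seed=0):
    """Ring lattice on n nodes, each linked to its k nearest neighbours on one
    side, then each edge's far endpoint is rewired with probability p."""
    random.seed(seed)
    edges = set()
    for i in range(n):
        for j in range(1, k + 1):
            u, v = i, (i + j) % n
            if random.random() < p:              # rewire to a random node
                v = random.randrange(n)
                while v == u or (min(u, v), max(u, v)) in edges:
                    v = random.randrange(n)
            edges.add((min(u, v), max(u, v)))
    return edges

print(len(watts_strogatz(20, 2, 0.1)))  # roughly n*k = 40 edges
```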




3. Motifs and Structural Roles in Networks


3.1 Subnetwork

3.2 Network Motifs

  • Motif: recurring, significant patterns of interconnection
  • A motif occurs in the real network more often than in comparable random networks
  • $Z_i$ captures the significance of motif $i$
  • Network Significance Profile (SP): the vector of motif $Z$-scores normalized to unit length (formulas below)
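
Concretely, with $N_i^{\text{real}}$ the count of motif $i$ in the real network and $N_i^{\text{rand}}$ its count in degree-matched random networks, the standard definitions are

$$Z_i=\frac{N_i^{\text{real}}-\bar{N}_i^{\text{rand}}}{\operatorname{std}\left(N_i^{\text{rand}}\right)}, \qquad SP_i=\frac{Z_i}{\sqrt{\sum_j Z_j^2}}$$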

3.3 Graphlets

  • Graphlet Degree Vector (GDV)
  • Automorphism Orbits

3.4 Graph Isomorphism

  • Example: Are $G$ and $H$ topologically equivalent?




4. Graph Representation Learning


Graph representation learning focuses on the following tasks:

  • Node Classification
  • Link Prediction

4.1 Feature Learning in Graphs

  • Learn a feature representation, i.e. an embedding, for each node
  • Task: Map each node in a network into a low-dimensional space

4.2 Node Embedding

  • CNNs are designed for fixed-size images/grids
  • RNNs and word2vec are designed for text/sequences
  • Graphs have neither a fixed grid nor a sequence structure, so nodes need embedding methods of their own

4.3 Embedding Nodes Task

Hence, we can analyse the similarity of nodes in the embedding space, using measures such as Euclidean distance or cosine similarity (sketch below). In this way, the learned embeddings can serve directly as node features for downstream tasks such as node classification and link prediction.
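
For example, a minimal NumPy sketch with two made-up embedding vectors:

```python
import numpy as np

z_u = np.array([0.2, 0.9, -0.4])   # made-up node embeddings
z_v = np.array([0.1, 1.0, -0.3])

euclidean = np.linalg.norm(z_u - z_v)
cosine = z_u @ z_v / (np.linalg.norm(z_u) * np.linalg.norm(z_v))
print(euclidean, cosine)           # small distance, cosine near 1: similar nodes
```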

4.4 Random Walk Approaches to Node Embeddings

  • $z_u^{\top}z_v \approx$ probability that $u$ and $v$ co-occur on a random walk over the network (walk sketch below)
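
A minimal sketch of the walk-sampling side in pure Python (a DeepWalk-style uniform walk; graph and walk length are made up). Node pairs that co-occur within a window on such walks become the positive pairs whose embeddings are trained towards high $z_u^{\top}z_v$:

```python
import random

def random_walk(adj, start, length, seed=0):
    """Uniform random walk of fixed length, as in DeepWalk."""
    random.seed(seed)
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

# Made-up graph (adjacency lists) and one sampled walk
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = random_walk(adj, start=0, length=6)
print(walk)  # e.g. [0, 2, 3, 2, 1, 0]; nearby pairs become training pairs
```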

4.5 Unsupervised Feature Learning
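
The standard random-walk objective here (as in DeepWalk/node2vec; stated as the textbook formulation rather than taken from these notes) minimizes, over all nodes $u$, the negative log-likelihood of their walk neighbourhoods $N_R(u)$ under a softmax over embedding similarities:

$$\mathcal{L}=\sum_{u\in V}\sum_{v\in N_R(u)}-\log\frac{\exp(z_u^{\top}z_v)}{\sum_{n\in V}\exp(z_u^{\top}z_n)}$$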




5. Graph Neural Network



5.1 Node Embeddings

  • Two Key Components:
    • Encoder
    • Similarity Function

5.2 Basics of Deep Learning for Graphs

  • Idea: Neighbourhood Aggregation
  • The final layer's output $h_{v}^{(K)}$ is the node embedding $\mathbf{z}_{v}$
  • Train the Model by learning the weight matrices:
    • $\mathbf{W}_{k}$: weights for aggregating neighbour messages
    • $\mathbf{B}_{k}$: weights for transforming the node's own previous-layer embedding
  • Inductive Capability
  • So far, the GNN aggregates neighbour messages by taking their (weighted) average; a one-layer sketch follows
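
A minimal NumPy sketch of one such layer (the graph, dimensions, and function name are made up; $W$ and $B$ play the roles of $\mathbf{W}_{k}$ and $\mathbf{B}_{k}$ above):

```python
import numpy as np

def gnn_layer(A, H, W, B):
    """One neighbourhood-aggregation layer:
    h_v^k = ReLU( W * mean of neighbour h_u^{k-1}  +  B * h_v^{k-1} )."""
    deg = A.sum(axis=1, keepdims=True)
    neighbour_mean = (A @ H) / np.maximum(deg, 1)     # average neighbour messages
    return np.maximum(0, neighbour_mean @ W + H @ B)  # ReLU non-linearity

# Made-up 4-node graph, 3-dim input embeddings, 2-dim output embeddings
rng = np.random.default_rng(0)
A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
H = rng.normal(size=(4, 3))
W, B = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
print(gnn_layer(A, H, W, B).shape)  # (4, 2): one new embedding per node
```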

5.3 GraphSAGE Graph Neural Network Architecture

  • Concatenation:
    • Concatenate the aggregated neighbour embedding with the node's own embedding
    • Unlike graph convolution, which adds the node's own features into the sum, GraphSAGE concatenates them and then applies a non-linearity
  • Aggregation:
    • Use a generalized aggregation function
    • Unlike graph convolution, which uses a plain average
  • Aggregation Variants
    • Generally, there are several ways to implement the aggregation:
    • Mean: average the neighbour embeddings
    • Pool: element-wise mean or max over (transformed) neighbour embeddings
    • LSTM: run an LSTM over a shuffled sequence of neighbour embeddings, giving a deeper aggregator

Hint: we can apply different pooling strategies; a mean-aggregator layer sketch follows.
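
A minimal NumPy sketch of a GraphSAGE layer with the mean aggregator (graph and dimensions made up): aggregate the neighbours, concatenate with the node's own embedding, transform, and apply a non-linearity:

```python
import numpy as np

def graphsage_layer(A, H, W):
    """GraphSAGE with a mean aggregator:
    h_v^k = ReLU( W * [ h_v^{k-1} ; mean of neighbour h_u^{k-1} ] )."""
    deg = A.sum(axis=1, keepdims=True)
    agg = (A @ H) / np.maximum(deg, 1)         # AGG: mean over neighbours
    concat = np.concatenate([H, agg], axis=1)  # concatenate self and neighbours
    return np.maximum(0, concat @ W)           # ReLU

rng = np.random.default_rng(0)
A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(6, 2))   # input dim doubles because of the concatenation
print(graphsage_layer(A, H, W).shape)  # (4, 2)
```

Swapping `agg` for, e.g., an element-wise max over transformed neighbour embeddings gives the pool variant.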

5.4 Implementation

  • Notation:
    • $D$ is the degree matrix
    • $A$ is the adjacency matrix
    • $H^{k-1}$ is the matrix of node messages (embeddings) from the previous layer
  • $AH^{k-1}$ sums the previous-layer features of each node's neighbours
  • Multiplying by $D^{-1}$ divides each row by the node's degree, so $D^{-1}AH^{k-1}$ is the mean of the neighbour messages, computed for all nodes at once (numeric check below)
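
A quick numeric check in NumPy (tiny made-up graph) that $D^{-1}AH^{k-1}$ really is the per-node neighbour mean:

```python
import numpy as np

A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
H = np.arange(8, dtype=float).reshape(4, 2)  # previous-layer messages H^{k-1}

D_inv = np.diag(1.0 / A.sum(axis=1))         # inverse degree matrix D^{-1}
mean_messages = D_inv @ A @ H                # row v = mean of v's neighbour rows

# Same value for node 1, computed by hand over its neighbours {0, 2, 3}
print(mean_messages[1], H[[0, 2, 3]].mean(axis=0))  # both [10/3, 13/3]
```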

5.5 Graph Attention Network (GAT)

  • Simple Neighbourhood Aggregation in Graph Convolution
    • Uses the fixed coefficient $\alpha_{vu}=\frac{1}{|N(v)|}$
    • All neighbours $u \in N(v)$ are equally important to node $v$
  • Attention Mechanism
    • Computes learned coefficients $e_{vu}$, which are normalized with a softmax over $N(v)$ to obtain $\alpha_{vu}$
    • The mechanism $a$ can be implemented in different ways, including a simple single-layer neural network (see the sketch below)
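
A minimal NumPy sketch of that single-layer mechanism (dimensions made up; the form $e_{vu}=\text{LeakyReLU}(a^{\top}[\mathbf{W}h_v \,\Vert\, \mathbf{W}h_u])$ follows the GAT paper):

```python
import numpy as np

def gat_scores(A, H, W, a):
    """Attention weights alpha_vu = softmax_u(e_vu) over each node's neighbours,
    with e_vu = LeakyReLU( a . [W h_v ; W h_u] ) as in GAT."""
    Wh = H @ W                                    # transform node embeddings
    n = A.shape[0]
    e = np.full((n, n), -np.inf)                  # -inf masks non-edges in softmax
    for v in range(n):
        for u in np.flatnonzero(A[v]):
            s = a @ np.concatenate([Wh[v], Wh[u]])
            e[v, u] = np.where(s > 0, s, 0.2 * s) # LeakyReLU, slope 0.2
    return np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)  # rows sum to 1

rng = np.random.default_rng(0)
A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
H, W, a = rng.normal(size=(4, 3)), rng.normal(size=(3, 2)), rng.normal(size=4)
print(gat_scores(A, H, W, a).round(2))  # unequal weights, unlike 1/|N(v)|
```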

5.6 Papers on GNN