Title: Recent Progress on Learning with Graph Representations
Edwin R. Hancock and Richard Wilson
With help from Bai Xiao, Bin Luo, Antonio Robles-Kelly and Andrea Torsello.
University of York, Computer Science Department, York YO10 5DD, UK. erh_at_cs.york.ac.uk
Outline
- Motivation and background
- Graphs from images
- Spectral invariants
- Lifting cospectrality
- Generative models and description length
- Conclusions
Motivation
Problem
In computer vision, graph structures are used to abstract image structure. However, the algorithms used to segment the image primitives are not reliable. As a result there are both additional and missing nodes (due to segmentation error) and variations in edge structure. Hence image matching and recognition cannot be reduced to a graph isomorphism or even a subgraph isomorphism problem. Instead, inexact graph matching methods are needed.
Measuring similarity of graphs
- Early work on graph matching in vision (Barrow and Popplestone) introduced the association graph and showed how it could be used to locate the maximum common subgraph.
- Work on syntactic and structural pattern recognition in the 1980s unearthed problems with inexact matching (Sanfeliu; Eshera and Fu; Haralick and Shapiro; Wong; etc.) and extended the concept of edit distance from strings to graphs.
- Recent work has aimed to develop probability distributions for graph matching (Christmas, Kittler and Petrou; Wilson and Hancock; Serratosa and Sanfeliu) and to match using advanced optimisation methods (Simic; Gold and Rangarajan).
- Renewed interest in placing classical methods such as edit distance (Bunke) and max-clique (Pelillo) on a more rigorous footing.
Viewed from the perspective of learning
This work has shown how to measure the similarity of graphs. It can be used to locate inexact matches when significant levels of structural error are present. It may also provide a means by which modes of structural variation can be assessed.
Learning with graphs (circa 2000)
- Learn class structure: assign graphs to classes. Needs a distance measure or a vector of graph characteristics. Central clustering is possible with characteristics, but difficult when the number of nodes and edges varies and correspondences are not known. Easier to perform pairwise clustering (Bunke, Buhmann).
- Embed graphs in a low-dimensional space: correspondences are again needed, but spectral methods may offer a solution. Standard statistical and geometric learning methods can then be applied to graph-vectors.
- Learn modes of structural variation: understand how edge (connectivity) structure varies for graphs belonging to the same class (Dickinson, Williams).
- Build a generative model: borrow ideas from graphical models (Langley, Friedman, Koller).
Why is structural learning difficult?
- Graphs are not vectors: there is no natural ordering of nodes and edges. Correspondences must be used to establish order.
- Structural variations: the numbers of nodes and edges are not fixed. They can vary due to segmentation error.
- Not easily summarised: since graphs do not reside in a vector space, the mean and covariance are hard to characterise.
Structural Variations
Contributions
- Permutation-invariant graph characteristics from the Laplacian spectrum (Wilson, Hancock and Luo, PAMI 2005).
- Computation of edit distance between graphs and spectral clustering (Robles-Kelly, Torsello and Hancock, IJCV 2007; Robles-Kelly and Hancock, PAMI 2005).
- Embedding based on properties of the random walk and geometric characterisation of the embedded nodes (Qiu and Hancock, PAMI 2007).
- Spectral embedding of graphs (Luo, Wilson and Hancock, Pattern Recognition 2004).
- Learning a generative model of tree structure using description length (Torsello and Hancock, PAMI 2006).
Spectral Methods
Use eigenvalues and eigenvectors of the adjacency matrix (or Laplacian matrix) - Biggs, Cvetkovic, Fan Chung.
- Singular value methods for exact graph matching and point-set alignment (Umeyama).
- Singular value methods for point-set correspondence (Scott and Longuet-Higgins; Shapiro and Brady).
- Use of eigenvalues for image segmentation (Shi and Malik) and for perceptual grouping (Freeman and Perona; Sarkar and Boyer).
- Graph-spectral methods for indexing shock trees (Dickinson and Shokoufandeh).
Graph (structural) representations of shape
- Region adjacency graphs (Popplestone, Worthington, Pizlo, Rosenfeld).
- View graphs (Freeman, Ponce).
- Aspect graphs (Dickinson).
- Trees (Forsyth, Geiger).
- Shock graphs (Siddiqi, Zucker, Kimia).
The idea is to segment shape primitives from image data and to abstract them using a graph. Shape recognition then becomes a problem of graph matching. However, statistical learning of the modes of shape variation is difficult, since the available methodology is limited.
Delaunay Graph
MOVI Sequence
Shock graphs
Type 1 shock (monotonically increasing radius)
Type 2 shock (minimum radius)
Type 3 shock (constant radius)
Type 4 shock (maximum radius)
Graph characteristics
- The Laplacian spectrum provides a natural permutation invariant for a graph, but discards information in the eigensystem.
- Symmetric polynomials over the spectral matrix give a rich family of invariants.
- Can be extended to attributed graphs using a complex-number encoding and a Hermitian extension of the Laplacian.
- Recent work has shown how the invariants are linked to moments from the Mellin transform of the heat kernel.
Pairwise clustering
- Compute tree/graph similarity using edit distance.
- Simplifying the structure can simplify the process (convert the graph to a string).
- Extract pairwise clusters using the EM algorithm and the eigenvectors of an affinity matrix between graphs (see the sketch below).
- Applied to learn shape classes.
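A minimal sketch of the affinity-eigenvector step, assuming a matrix D of pairwise edit distances between graphs; the kernel width sigma and the median threshold are illustrative choices rather than the exact EM formulation:

```python
import numpy as np

def pairwise_cluster(D, sigma=1.0):
    """Two-way pairwise clustering from a graph-to-graph distance matrix D."""
    W = np.exp(-D / sigma)          # affinity matrix between graphs
    _, vecs = np.linalg.eigh(W)     # W is symmetric, so eigh applies
    lead = vecs[:, -1]              # eigenvector of the largest eigenvalue
    return lead >= np.median(lead)  # cluster-membership indicator
```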
Embeddings
- Embed the nodes of a graph into a vector space so as to preserve the node affinity properties of the graph.
- Examples include the Laplacian eigenmap and the diffusion map.
- We have shown how commute time leads to an embedding that is robust to modifications in edge structure (sketched below).
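A minimal sketch of the commute-time embedding, assuming a connected undirected graph given by a dense adjacency matrix; rows of the result are node coordinates whose squared Euclidean distances equal commute times:

```python
import numpy as np

def commute_time_embedding(A):
    d = A.sum(axis=1)
    L = np.diag(d) - A                  # combinatorial Laplacian
    vals, vecs = np.linalg.eigh(L)
    vals, vecs = vals[1:], vecs[:, 1:]  # discard the zero eigenvalue
    vol = d.sum()                       # graph volume (sum of degrees)
    # ||row_u - row_v||^2 = vol * sum_i (phi_i(u) - phi_i(v))^2 / lambda_i
    return np.sqrt(vol) * vecs / np.sqrt(vals)
```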
Generative model
- In the structural domain, the model can be learned using the EM algorithm to fit a mixture over classes to a sample of trees.
- Each class is characterised by a prototype from which the trees belonging to the class can be obtained through tree edit operations.
- Prototypes are formed by merging trees. The merging criterion is description length.
- The edit distance between trees is linked to the description length advantage and to the entropy associated with the ML node probabilities.
Spectral Generative Model
- Embed the nodes of the graph in a vector space using the heat kernel.
- Align the embedded node positions using Procrustes alignment.
- Compute the covariance matrix for the node positions.
- Deform the node positions in the directions of the eigenvectors of the covariance matrix (see the sketch below).
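A minimal sketch of this pipeline under simplifying assumptions: all graphs have the same number of nodes, the diffusion time t is an illustrative choice, and alignment is plain orthogonal Procrustes:

```python
import numpy as np

def heat_kernel_embedding(A, t=1.0):
    """Node coordinates Y with H(t) = Y Y^T, from the Laplacian eigensystem."""
    L = np.diag(A.sum(axis=1)) - A
    vals, vecs = np.linalg.eigh(L)
    return vecs * np.exp(-vals * t / 2.0)

def procrustes_align(Y, Y_ref):
    """Rotate Y to best fit the reference embedding Y_ref."""
    U, _, Vt = np.linalg.svd(Y_ref.T @ Y)
    return Y @ Vt.T @ U.T               # optimal orthogonal alignment

def deformation_modes(embeddings):
    """Mean and principal modes of the aligned node-position long-vectors."""
    X = np.stack([Y.ravel() for Y in embeddings])
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    return X.mean(axis=0), vals[::-1], vecs[:, ::-1]  # sorted by variance
```

New instances are then generated by perturbing the mean long-vector along the leading eigenvectors of the covariance matrix.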
Algebraic graph theory (PAMI 2005)
- Use symmetric polynomials to construct permutation invariants from the spectral matrix.
…joint work with Richard Wilson
Spectral Representation
- Compute the Laplacian matrix L = D − A, where A is the adjacency matrix and D is the diagonal matrix with the node degrees on the diagonal.
- Perform the spectral decomposition L = ΦΛΦ^T, where Λ = diag(λ_1, …, λ_n) holds the ordered eigenvalues and the columns of Φ are the eigenvectors φ_i.
- Construct the spectral matrix Φ_s = (√λ_1 φ_1 | √λ_2 φ_2 | … | √λ_n φ_n), so that L = Φ_s Φ_s^T (sketched below).
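A minimal sketch of this construction in numpy (eigenvalues come out in ascending order here; any fixed ordering will do):

```python
import numpy as np

def spectral_matrix(A):
    """Phi_s with columns sqrt(lambda_i) * phi_i, so that L = Phi_s Phi_s^T."""
    L = np.diag(A.sum(axis=1)) - A   # Laplacian L = D - A
    vals, vecs = np.linalg.eigh(L)   # spectral decomposition of L
    return vecs * np.sqrt(np.clip(vals, 0.0, None))  # clip tiny negatives
```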
Properties of the Laplacian
- The eigenvalues are non-negative, and the smallest eigenvalue is zero.
- The multiplicity of the zero eigenvalue is the number of connected components of the graph.
- The zero eigenvalue is associated with the all-ones vector.
- The eigenvector associated with the second-smallest eigenvalue is the Fiedler vector.
- The Fiedler vector can be used to cluster the nodes of the graph by recursive bisection (sketched below).
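A minimal sketch of a single bisection step; recursing on each half gives the clustering:

```python
import numpy as np

def fiedler_bisection(A):
    """Split the nodes by the sign of the Fiedler vector."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]   # eigenvector of the second-smallest eigenvalue
    return fiedler >= 0    # boolean two-way partition of the nodes
```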
Eigenvalue spectrum
The vector of ordered eigenvalues is a permutation invariant.
Eigenvalues are invariant to permutations of the Laplacian
- …would like to construct a family of permutation invariants from the full spectral matrix.
Why?
- According to perturbation analysis, the eigenvalues are relatively stable to noise.
- The eigenvectors are not stable to noise, and undergo large rotations for small additions of noise.
Symmetric polynomials
The elementary symmetric polynomial of degree r in the variables x_1, …, x_n is S_r(x_1, …, x_n) = Σ_{i_1 < i_2 < … < i_r} x_{i_1} x_{i_2} ⋯ x_{i_r}.
Power symmetric polynomials
The power-sum symmetric polynomial of degree r is P_r(x_1, …, x_n) = Σ_{i=1}^{n} x_i^r.
Symmetric polynomials on the spectral matrix
- The symmetric polynomials and the power symmetric polynomials are related by the Newton-Girard formula: S_r = (1/r) Σ_{k=1}^{r} (−1)^{k−1} S_{r−k} P_k, with S_0 = 1.
Spectral Feature Vector
- Construct a matrix F of permutation invariants by applying the symmetric polynomials to the elements in the columns of the spectral matrix. Use an entropy measure to flatten the distribution of values.
- Stack the columns of F to form a long-vector B.
- The set of graphs is then represented by a data matrix (one long-vector per graph), as sketched below.
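A minimal sketch of the construction: elementary symmetric polynomials of each spectral-matrix column via the Newton-Girard recursion, with a signed-log remapping standing in for the entropy-based flattening:

```python
import numpy as np

def elementary_symmetric(x):
    """S_1..S_n of the entries of x, from power sums via Newton-Girard."""
    n = len(x)
    p = [np.sum(x ** k) for k in range(1, n + 1)]   # power sums P_1..P_n
    s = [1.0]                                       # S_0 = 1
    for r in range(1, n + 1):
        s.append(sum((-1) ** (k - 1) * s[r - k] * p[k - 1]
                     for k in range(1, r + 1)) / r)
    return np.array(s[1:])

def feature_matrix(Phi_s):
    """Column c of F holds the invariants of column c of the spectral matrix."""
    F = np.column_stack([elementary_symmetric(Phi_s[:, c])
                         for c in range(Phi_s.shape[1])])
    return np.sign(F) * np.log1p(np.abs(F))         # flatten dynamic range
```

Stacking the columns of F (F.ravel(order='F')) gives the long-vector B for one graph; one such row per graph forms the data matrix.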
…extend to weighted attributed graphs.
Complex Representation
- Encode attributes as complex numbers.
- Off-diagonal elements: edge weights (W) as modulus and normalised edge attributes as phase.
- Diagonal elements encode the node attributes (x) and ensure that H is positive semi-definite.
Spectral analysis
- Perform a spectral analysis on H: real eigenvalues and complex eigenvectors.
- Construct the spectral matrix of scaled complex eigenvectors.
- H plays the role of a complex Laplacian (a sketch follows below).
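A minimal sketch of one possible encoding, assuming a symmetric weight matrix W, a symmetric matrix of normalised edge attributes used as phases, and a vector of node attributes; the diagonal shift that guarantees positive semi-definiteness is omitted for brevity:

```python
import numpy as np

def hermitian_property_matrix(W, phase, node_attr):
    """Edge weight as modulus, edge attribute as phase, nodes on the diagonal."""
    H = np.triu(W * np.exp(1j * phase), k=1)  # upper off-diagonal entries
    H = H + H.conj().T                        # enforce H = H^H (Hermitian)
    H = H + np.diag(node_attr)                # node attributes on the diagonal
    return np.linalg.eigh(H)                  # real eigenvalues, complex vectors
```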
Pattern Spaces
- PCA: project the long-vectors onto the leading eigenvectors of the covariance matrix.
- MDS: embed the graphs in a low-dimensional space spanned by the eigenvectors of the distance matrix (sketched below).
- LPP: locality preserving projection (He and Niyogi); perform an eigenvector analysis on a weighted covariance matrix. A PCA/MDS hybrid.
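A minimal sketch of the MDS step, applied to any matrix of pairwise distances (e.g. Mahalanobis distances between the long-vectors):

```python
import numpy as np

def classical_mds(D, dims=2):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dims]   # leading eigenpairs
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
```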
Manifold learning methods
- ISOMAP: construct a neighbourhood graph from the pairwise geodesic distances between data points; obtain a low-distortion embedding by applying MDS to the weighted graph (Tenenbaum).
- Locally linear embedding: apply a variant of PCA to the data (Roweis and Saul).
- Locality preserving projection: use the interpoint distances to compute a weighted covariance matrix, and apply PCA (He and Niyogi).
Separation under structural error
Mahalanobis distance between the feature vectors of the noise-corrupted graph and the remaining graphs.
(Plots: distance between the graph and its edge-edited variants; distance between the graph and random graphs of the same size and edge density.)
Variation under structural error (MDS)
MDS applied to Mahalanobis distances between
feature vectors.
CMU Sequence
MOVI Sequence
YORK Sequence
Visualisation (LPP, Laplacian polynomials)
Cospectrality problem for trees
- Classical random walks are determined by the spectrum of the Laplacian matrix. This gives the path-length distribution, hitting times and commute times.
- Non-isomorphic graphs can have the same spectra (cospectrality). This problem is severe for trees.
- We turn to quantum walks to overcome this problem, and develop new algorithms for graph analysis based on random walks.
Cospectral trees
- Nearly every tree has an (adjacency, Laplacian, …) cospectral partner.
- Such trees can be easily generated.
- The spectrum of S(U^3) distinguished all such trees on which it was tested.
(Figure: pairs of cospectral trees.)
Overcome using the quantum random walk
- The basis states of the walk are the ordered pairs (arcs) (i,j) such that {i,j} is an edge of the graph.
- The unitary operator governing the evolution of the walk has matrix elements U_{(j,l),(i,j)} = 2/d_j − δ_{il} (the Grover coin), giving the amplitude for stepping from arc (i,j) to arc (j,l) through a node of degree d_j.
- Since U is unitary, its eigenvalues lie on the unit circle.
The positive support of a matrix
- For a real-valued matrix M, define its positive support S(M) by S(M)_{ij} = 1 if M_{ij} > 0, and S(M)_{ij} = 0 otherwise.
- S(U^r)_{ij} is non-zero if and only if the summed amplitude of all the paths of length r from state i to state j is positive.
- Interference effects in the quantum walk ensure that S(U^r) gives useful information about the graph when the classical analogues do not (see the sketch below).
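A minimal sketch that builds U on the arcs of a graph from its adjacency matrix and returns the eigenvalues of S(U^r); r = 3 gives the S(U^3) invariant used here:

```python
import numpy as np

def positive_support_spectrum(A, r=3, eps=1e-10):
    n = A.shape[0]
    arcs = [(i, j) for i in range(n) for j in range(n) if A[i, j]]
    index = {a: k for k, a in enumerate(arcs)}
    deg = A.sum(axis=1)
    U = np.zeros((len(arcs), len(arcs)))
    for (i, j) in arcs:                 # step from arc (i,j) to arc (j,l)
        for l in range(n):
            if A[j, l]:
                U[index[(j, l)], index[(i, j)]] = 2.0 / deg[j] - (i == l)
    S = (np.linalg.matrix_power(U, r) > eps).astype(float)  # S(U^r)
    return np.sort(np.linalg.eigvals(S))  # spectrum of the positive support
```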
Cospectral Trees
The spectrum of the positive support of U^3 is not determined by the spectrum of L, and lifts the cospectrality problem.
Strongly regular graphs
- There is no method proven to decide whether two SRGs are isomorphic in polynomial time.
- There are large families of strongly regular graphs on which we can test the method.
(Figure: MDS embeddings of the SRGs with parameters (25,12,5,6) in red, (26,10,3,4) in blue, (29,14,6,7) in black and (40,12,2,4) in green, using the adjacency spectrum (top) and the spectrum of S(U^3) (bottom).)
Generative Tree Union Model
- Probability distribution over the union tree
…work with Andrea Torsello
Ingredients
- A set of tree unions.
- A set of node observation probabilities, one for each node (the probability of observing the i-th node of union c).
- A set of node correspondences.
Illustration
Cluster structure
- Cluster indicator
- Number of trees assigned to cluster c
- Number of nodes in union c
Model
- Describe the data using a mixture of tree unions, where N is the node set, O is the order relation of the tree union, and Θ is the set of node probabilities.
Union as tree distribution
- For each node in the union we know how often it is encountered in the sampled trees.
- We can generate new trees by sampling, with each node probability equal to its normalised sample frequency (a sketch follows below).
- The union therefore represents a generative model for a distribution of trees.
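A minimal sketch of sampling from the union, assuming the nodes are topologically ordered (each parent precedes its children) and theta holds the per-node observation probabilities; a node survives only if its parent does, so the sample remains a tree:

```python
import numpy as np

def sample_tree(parent, theta, rng=None):
    """Return the indices of the union nodes present in one sampled tree."""
    rng = np.random.default_rng() if rng is None else rng
    keep = np.zeros(len(parent), dtype=bool)
    for v in range(len(parent)):
        if parent[v] < 0 or keep[parent[v]]:   # root, or parent survived
            keep[v] = rng.random() < theta[v]
    return np.flatnonzero(keep)
```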
Generative Model
- The aim is to make a maximum likelihood estimate of the model.
- Problem: we do not know how the sample nodes map to the model nodes.
- Let the node observation probabilities depend on a correspondence map M (determined later).
Max-likelihood parameters
- Log-likelihood of the sample given the union structure and the node probabilities.
- Given M, the likelihood is maximized by any union T consistent with the hierarchies, and by setting each node observation probability equal to the normalised sample frequency of the corresponding node.
Description length
- The model coding cost of encoding a k-dimensional parameterisation of an m-dimensional sample vector has two parts: the expected value of the data log-likelihood given the best-fit model parameters, and the cost of coding the model (parameters + structure), as written out below.
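Written out under the usual asymptotic two-part-code approximation (a standard form, stated here as a reconstruction):

```latex
\underbrace{-\log P(D \mid \hat{\theta})}_{\text{data given best-fit parameters}}
\;+\;
\underbrace{\tfrac{k}{2}\log m}_{\text{cost of coding } k \text{ parameters from } m \text{ samples}}
```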
Expectation of the observation density
- Depends on the node entropy.
Tree Union
- The cost of describing a tree union is the sum of:
  the negative log-likelihood of the data given the model,
  the cost of encoding the node probabilities,
  the cost of encoding the mixture, and
  the cost of encoding the tree structure.
Simplified Description Cost
- Cost of describing tree union
Description Length Gain
- Which nodes should be merged?
- The description advantage is the reduction in description length obtained by merging nodes v and v′.
- The set of merges M that minimizes the description length maximizes the total description advantage.
- Edit distance is linked to node entropy.
Unattributed
(Figures: pairwise clustering of tree edit distance; mixture of tree unions.)
Future
- Links between spectral geometry and graph spectra.
- MDL in the spectral domain.