Title: Large Graph Mining: Power Tools and a Practitioner's Guide

Slide 1: Large Graph Mining: Power Tools and a Practitioner's Guide
- Christos Faloutsos
- Gary Miller
- Charalampos (Babis) Tsourakakis
- CMU
Slide 2: Outline
- Reminders
- Adjacency matrix
  - Intuition behind eigenvectors: Bipartite Graphs
  - Walks of length k
- Laplacian
  - Connected Components
  - Intuition: Adjacency vs. Laplacian
- Sparsest Cut and Cheeger Inequality
  - Derivation, intuition
  - Example
- Normalized Laplacian
Slide 4: Matrix Representations of G(V,E)
- Associate a matrix to a graph:
  - Adjacency matrix
  - Laplacian (main focus)
  - Normalized Laplacian
Slide 5: Matrix as an operator
- The image of the unit circle (or sphere) under any m×n matrix is an ellipse (or hyperellipse).
[Figure: example of the unit circle mapped to an ellipse]
Slide 6: More Reminders
- Let M be a symmetric n×n matrix. Then Mx = λx, where λ is an eigenvalue of M and x a corresponding eigenvector.
Slide 7: More Reminders
- 1-dimensional invariant subspaces: for an eigenpair (λ, u), the matrix maps span(u) to itself, scaling u by λ.
- Diagonal matrix: no rotation.
[Figure: generic vectors x, y and their images Ax, Ay, contrasted with the eigenvector u, which is only scaled]
Slide 8: Keep in mind!
- For the rest of the slides we consider square n×n matrices and, unless noted otherwise, symmetric ones, i.e., M = M^T.
Slide 10: Adjacency matrix
- Undirected
[Figure: undirected graph on nodes 1-4 and its adjacency matrix A, with A_ij = 1 if (i,j) ∈ E and 0 otherwise]
Slide 11: Adjacency matrix
- Undirected, weighted
[Figure: weighted graph on nodes 1-4 with edge weights 10, 4, 0.3, 2, and its adjacency matrix A with A_ij = w_ij]
Slide 12: Adjacency matrix
- Directed
[Figure: directed graph on nodes 1-4 and its (asymmetric) adjacency matrix A]
- Observation: if G is undirected, then A = A^T.
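The observation above is easy to check numerically. A minimal Python/NumPy sketch (the deck's examples are in MATLAB, and the slide's exact edge set lives in the lost figure, so the 4-node edge list below is hypothetical):

```python
import numpy as np

# Hypothetical undirected graph on nodes 0..3; the slide's actual edges are in the figure.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected: store both directions

# For an undirected graph the adjacency matrix is symmetric: A = A^T.
assert (A == A.T).all()
```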
Slide 13: Spectral Theorem
- Theorem (Spectral Theorem): if M = M^T, then M = X Λ X^T = Σ_i λ_i x_i x_i^T, where the columns x_i of X are orthonormal eigenvectors and Λ = diag(λ_1, ..., λ_n).
- Reminder 1: x_i, x_j are orthogonal for i ≠ j.
- Reminder 2: x_i is the i-th principal axis; λ_i is the length of the i-th principal axis.
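The decomposition M = Σ_i λ_i x_i x_i^T can be verified on a random symmetric matrix; a Python/NumPy sketch (not from the tutorial):

```python
import numpy as np

# Build a random symmetric matrix M.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
M = B + B.T  # M = M^T by construction

lam, X = np.linalg.eigh(M)  # eigenvalues ascending, eigenvectors as columns

# Reconstruct M from its spectral decomposition: M = sum_i lambda_i x_i x_i^T.
M_rec = sum(lam[i] * np.outer(X[:, i], X[:, i]) for i in range(5))
assert np.allclose(M, M_rec)

# Eigenvectors of a symmetric matrix are orthonormal: X^T X = I.
assert np.allclose(X.T @ X, np.eye(5))
```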
Slide 15: Bipartite Graphs
- Any graph with no cycles of odd length is bipartite; e.g., all trees are bipartite.
[Figure: K3,3 with parts {1, 2, 3} and {4, 5, 6}]
- Can we check if a graph is bipartite via its spectrum?
- Can we get the partition of the vertices into the two sets of nodes?
Slide 16: Bipartite Graphs
- Adjacency matrix of K3,3: A = [[0, B], [B^T, 0]], where B is the 3×3 all-ones matrix.
- Why λ1 = -λ2 = 3? Recall Ax = λx, with (λ, x) an eigenvalue-eigenvector pair.
- Eigenvalues: 3, -3, 0, 0, 0, 0.
Slide 17: Bipartite Graphs
[Figure: K3,3 with value 1 placed on every node. Node 1 has the three neighbors 4, 5, 6, each with value 1, so (Ax)_1 = 1 + 1 + 1 = 3 = 3·x_1.]
- Repeat the same argument for the other nodes: x = (1, 1, 1, 1, 1, 1)^T is an eigenvector with eigenvalue 3.
Slide 18: Bipartite Graphs
[Figure: K3,3 with value 1 on nodes 1, 2, 3 and value -1 on nodes 4, 5, 6. Node 1 has the three neighbors 4, 5, 6, each with value -1, so (Ax)_1 = -3 = (-3)·x_1.]
- Repeat the same argument for the other nodes: x = (1, 1, 1, -1, -1, -1)^T is an eigenvector with eigenvalue -3.
Slide 19: Bipartite Graphs
- Observation: u2 gives the partition of the nodes into the two sets S and V-S!
- Question: Were we just lucky? Answer: No.
- Theorem: λ2 = -λ1 iff G is bipartite, and u2 gives the partition.
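The bipartiteness test above can be run on K3,3 itself; a Python/NumPy sketch (the deck's code is MATLAB):

```python
import numpy as np

# Adjacency matrix of K3,3: block structure [[0, B], [B^T, 0]], B = ones(3,3).
B = np.ones((3, 3))
A = np.block([[np.zeros((3, 3)), B], [B.T, np.zeros((3, 3))]])

lam, X = np.linalg.eigh(A)  # eigenvalues in ascending order

# The extreme eigenvalues are symmetric (lambda_min = -lambda_max = -3):
# the spectral certificate of bipartiteness.
assert np.isclose(lam[0], -3) and np.isclose(lam[-1], 3)

# The eigenvector of lambda = -3 gives the partition via its sign pattern.
u = X[:, 0]
side = set(np.flatnonzero(u > 0).tolist())
# Nodes 0-2 land on one side and nodes 3-5 on the other.
assert side in ({0, 1, 2}, {3, 4, 5})
```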
Slide 21: Walks
- A walk of length r in a directed graph is a sequence of nodes v0, v1, ..., vr with (v_{i-1}, v_i) ∈ E, where a node can be used more than once.
- Closed walk: when v0 = vr.
[Figure: directed graph on nodes 1-4. Closed walk of length 3: 2-1-3-2. Walk of length 2: 2-1-4.]
Slide 22: Walks
- Theorem: let G(V,E) be a directed graph with adjacency matrix A. The number of walks from node u to node v in G with length r is (A^r)_uv.
- Proof: induction on r; see Doyle-Snell, p. 165. A walk of length r from i to j is a first step (i, i1) followed by a walk of length r-1 from i1 to j, so (A^r)_ij = Σ_{i1} A_{i,i1} (A^{r-1})_{i1,j}.
Slide 23: Walks
[Figure: directed graph on nodes 1-4 and entries of A^2 counting walks of length 2, e.g. the entries for i = 3, j = 3 and for i = 2, j = 4]
Slide 24: Walks
[Figure: directed graph on nodes 1-4. The row of A^r for node 4 is always 0: node 4 is a sink, so no walks leave it.]
Slide 25: Walks
- Corollary: let A be the adjacency matrix of an undirected G(V,E) with no self-loops, e edges and t triangles. Then the following hold:
  a) trace(A) = 0
  b) trace(A^2) = 2e
  c) trace(A^3) = 6t
- Computing A^r directly is a bad idea: high memory requirements, expensive!
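The three trace identities are quick to verify on a tiny graph; a Python/NumPy sketch using a made-up graph (one triangle plus a pendant node):

```python
import numpy as np

# Undirected graph: triangle {0,1,2} plus pendant node 3, so e = 4 and t = 1.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1

A2 = A @ A   # (A^2)_uv counts walks of length 2 from u to v
A3 = A2 @ A  # (A^3)_uv counts walks of length 3

assert np.trace(A) == 0        # no self-loops
assert np.trace(A2) == 2 * 4   # 2e, and trace(A^2) = sum of degrees
assert np.trace(A3) == 6 * 1   # 6t: each triangle is counted 6 times
```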
Slide 27: Laplacian
- L = D - A, where D is the diagonal degree matrix, d_ii = d_i.
[Figure: graph on nodes 1-4 and its Laplacian L]
Slide 28: Weighted Laplacian
[Figure: weighted graph on nodes 1-4 with edge weights 10, 4, 0.3, 2, and its weighted Laplacian, where d_ii = Σ_j w_ij]
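A Python/NumPy sketch of the weighted Laplacian; the weights echo the slide's figure (10, 4, 0.3, 2), but the exact edge placement is an assumption since the figure is lost:

```python
import numpy as np

# Hypothetical weighted undirected graph on nodes 0..3.
n = 4
W = np.zeros((n, n))
for i, j, w in [(0, 1, 10.0), (0, 3, 4.0), (1, 2, 2.0), (2, 3, 0.3)]:
    W[i, j] = W[j, i] = w

D = np.diag(W.sum(axis=1))  # d_ii = sum_j w_ij
L = D - W                   # Laplacian L = D - A

# Rows of L sum to zero, so the all-ones vector is always an
# eigenvector of L with eigenvalue 0.
assert np.allclose(L.sum(axis=1), 0)
# L is symmetric and positive semidefinite.
assert np.allclose(L, L.T)
assert np.linalg.eigvalsh(L).min() >= -1e-9
```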
Slide 30: Connected Components
- Lemma: let G be a graph with n vertices and c connected components. If L is the Laplacian of G, then rank(L) = n - c.
- Proof: see p. 279, Godsil-Royle.
Slide 31: Connected Components
[Figure: G(V,E) on nodes 1-7 with two connected components, its Laplacian L, and eig(L): the number of zero eigenvalues equals the number of components]
Slide 32: Connected Components
[Figure: the same graph with the two components joined by a weak edge. eig(L) now has a single zero, and λ2 = 0.01 is tiny: a small λ2 indicates a good cut.]
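The lemma above (number of zero Laplacian eigenvalues = number of components) can be checked directly; a Python/NumPy sketch on made-up graphs:

```python
import numpy as np

def laplacian(edges, n):
    # L = D - A for an unweighted undirected graph.
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1
    return np.diag(A.sum(axis=1)) - A

# Two components: a triangle {0,1,2} and a path 3-4.
L = laplacian([(0, 1), (1, 2), (0, 2), (3, 4)], 5)
lam = np.linalg.eigvalsh(L)
n_zero = int(np.sum(np.abs(lam) < 1e-9))
assert n_zero == 2  # two zero eigenvalues, two components

# Join the components with one edge: a single zero remains,
# and rank(L) = n - c = 5 - 1 = 4.
L2 = laplacian([(0, 1), (1, 2), (0, 2), (3, 4), (2, 3)], 5)
lam2 = np.linalg.eigvalsh(L2)
assert int(np.sum(np.abs(lam2) < 1e-9)) == 1
```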
Slide 34: Adjacency vs. Laplacian: Intuition
- Let x be the indicator vector of a set S: x_i = 1 if i ∈ S, 0 otherwise.
- Consider now y = Lx. The k-th coordinate is y_k = d_k x_k - Σ_{j:(k,j)∈E} x_j.
- So for k ∈ S, y_k counts the edges from k that leave S; for k ∉ S, y_k is minus the number of edges from k into S.
Slide 35-37: Adjacency vs. Laplacian: Intuition
[Figures: an indicator vector x of a set S in the random graph G(30, 0.5), and plots of the coordinates of y = Lx against the node index k]
- Takeaway: Laplacian: connectivity; Adjacency: paths.
Slide 39: Why Sparse Cuts?
- Clustering, Community Detection
- And more: Telephone Network Design, VLSI layout, Sparse Gaussian Elimination, Parallel Computation
[Figure: graph on nodes 1-9 with a sparse cut splitting it into two clusters]
Slide 40-41: Quality of a Cut
- Edge expansion / isoperimetric number φ:
  φ(S) = e(S, V-S) / |S|, where e(S, V-S) is the number of edges crossing the cut,
  and thus φ(G) = min over S with |S| ≤ n/2 of φ(S).
[Figure: example cut on a 4-node graph and the resulting value of φ]
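For a tiny graph, φ(G) can be found by brute force over all small vertex subsets; a Python sketch on a made-up graph (two triangles joined by a bridge, not the slide's figure):

```python
from itertools import combinations

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2,3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6

def cut_size(S):
    # Number of edges with exactly one endpoint in S.
    return sum((i in S) != (j in S) for i, j in edges)

# phi(G) = min over nonempty S with |S| <= n/2 of e(S, V-S)/|S|.
best_phi, best_S = float("inf"), None
for k in range(1, n // 2 + 1):
    for combo in combinations(range(n), k):
        S = set(combo)
        phi = cut_size(S) / k
        if phi < best_phi:
            best_phi, best_S = phi, S

# The sparsest cut severs only the bridge: phi = 1/3.
assert best_phi == 1 / 3
assert best_S in ({0, 1, 2}, {3, 4, 5})
```

Brute force is exponential in n, of course; the point of the next slides is that λ2 gives a cheap, principled surrogate.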
Slide 42: Why λ2?
- Let x be the characteristic vector of S: x_i = 1 if i ∈ S, 0 otherwise.
- Then x^T L x = Σ_{(i,j)∈E} (x_i - x_j)^2 = number of edges across the cut (S, V-S).
Slide 43: Why λ2?
[Figure: graph on nodes 1-9 with a cut separating S = {1, 2, 3, 4} from V-S = {5, ..., 9}]
- x = (1, 1, 1, 1, 0, 0, 0, 0, 0)^T
- x^T L x = 2: two edges cross the cut.
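The identity x^T L x = number of cut edges is easy to verify numerically; a Python/NumPy sketch (the 6-node graph is made up, since the slide's 9-node edge set is in the lost figure):

```python
import numpy as np

# Hypothetical graph: triangle {0,1,2}, path 3-4-5, and two cross edges.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (2, 3), (0, 4)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

# x is the 0/1 indicator of S = {0, 1, 2}.
x = np.array([1, 1, 1, 0, 0, 0], dtype=float)

# x^T L x = sum over edges of (x_i - x_j)^2 = number of edges across the cut.
crossing = sum(1 for i, j in edges if x[i] != x[j])  # edges (2,3) and (0,4)
assert x @ L @ x == crossing == 2
```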
Slide 44-45: Why λ2?
- Sparsest ratio cut: minimize x^T L x / x^T x over indicator vectors x ∈ {0,1}^n, x ≠ 0. NP-hard!
- Relax the constraint: allow x ∈ R^n with x ⊥ 1, and normalize so that x^T x = 1. Then the minimum becomes λ2:
  min over x ⊥ 1, x ≠ 0 of x^T L x / x^T x = λ2,
  because of the Courant-Fischer theorem (applied to L).
Slide 46-47: Why λ2?
[Figure: physical analogy, each node a ball of 1 unit of mass connected by springs; the balls oscillate, and plotting eigenvector value against node id for x_1, ..., x_n shows the fundamental mode of vibration along the separator]
Slide 48: Cheeger Inequality
- Step 1: Sort the vertices in non-decreasing order of the values assigned to them by the second eigenvector.
- Step 2: Decide where to cut. Two common heuristics:
  - Bisection: cut at the median vertex.
  - Best ratio cut: try every prefix of the sorted order and keep the cut with the best ratio.
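The two steps above (sort by the second eigenvector, then sweep for the best prefix cut) can be sketched in Python/NumPy; the ratio used below, e(S, V-S)/min(|S|, |V-S|), is one standard choice:

```python
import numpy as np

def sweep_cut(A):
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    lam, X = np.linalg.eigh(L)
    order = np.argsort(X[:, 1])          # Step 1: sort vertices by u2 values
    best_phi, best_S = np.inf, None
    for k in range(1, n):                # Step 2: try every prefix cut
        S = set(order[:k].tolist())
        cut = sum(A[i, j] for i in S for j in range(n) if j not in S)
        phi = cut / min(k, n - k)        # ratio-cut score of this prefix
        if phi < best_phi:
            best_phi, best_S = phi, S
    return best_phi, best_S

# Two triangles joined by a bridge: the sweep recovers the bridge cut.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
phi, S = sweep_cut(A)
assert S in ({0, 1, 2}, {3, 4, 5}) and phi == 1 / 3
```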
Slide 50: Example: Spectral Partitioning
- The dumbbell graph, in MATLAB:

A = zeros(1000);
A(1:500, 1:500) = ones(500) - eye(500);       % first 500-clique
A(501:1000, 501:1000) = ones(500) - eye(500); % second 500-clique
A(500, 501) = 1; A(501, 500) = 1;             % bridge edge (added so the dumbbell is connected)
myrandperm = randperm(1000);
B = A(myrandperm, myrandperm);                % hide the structure

- In social network analysis, such clusters are called communities.
Slide 51: Example: Spectral Partitioning
- This is how the adjacency matrix of B looks: spy(B)
[Figure: spy plot of B]
Slide 52: Example: Spectral Partitioning
- This is how the 2nd eigenvector of B looks:

L = diag(sum(B)) - B;
[u, v] = eigs(L, 2, 'SM');
plot(u(:,1), 'x')

- Not so much information yet.
Slide 53: Example: Spectral Partitioning
- This is how the 2nd eigenvector looks if we sort it:

[ign, ind] = sort(u(:,1));
plot(u(ind, 1), 'x')

- But now we see the two communities!
Slide 54: Example: Spectral Partitioning
- This is how the adjacency matrix of B looks now: spy(B(ind, ind))
[Figure: spy plot with two diagonal blocks, Community 1 and Community 2, and the "Cut here!" boundary between them]
- Observation: both heuristics are equivalent for the dumbbell.
Slide 56: Where does it go from here?
- Normalized Laplacian
  - Ng, Jordan, Weiss: Spectral Clustering
  - Laplacian Eigenmaps for Manifold Learning
  - Computer Vision, and many more applications
- Standard reference: Spectral Graph Theory, monograph by Fan Chung Graham
Slide 57: Why Normalized Laplacian?
[Figure: a graph with a single weighted edge and two candidate cuts; comparing the edge-expansion values f of the two cuts shows that f favors an undesirable, unbalanced cut. So f is not good here.]

Slide 58: Why Normalized Laplacian?
- Optimize the Cheeger constant h(G) instead, which favors balanced cuts:
  h(G) = min over S of cut(S, V-S) / min(vol(S), vol(V-S)), where vol(S) = Σ_{i∈S} d_i.
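The normalized Laplacian that underlies h(G) is L_norm = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}; a Python/NumPy sketch checking its basic spectral properties on a made-up graph:

```python
import numpy as np

def normalized_laplacian(A):
    # L_norm = I - D^{-1/2} A D^{-1/2}; assumes no isolated nodes (d_i > 0).
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt

# Two triangles joined by a bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

lam = np.linalg.eigvalsh(normalized_laplacian(A))
# 0 is always an eigenvalue (eigenvector D^{1/2} 1), and the whole
# spectrum lies in [0, 2]; the value 2 is attained iff a component is bipartite.
assert np.isclose(lam[0], 0)
assert np.all(lam >= -1e-9) and np.all(lam <= 2 + 1e-9)
```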
Slide 59: Conclusions
- The spectrum tells us a lot about the graph.
- What to remember:
  - What an eigenvector is (f: Nodes → Reals)
  - Adjacency: paths
  - Laplacian: sparsest cut and intuition
  - Normalized Laplacian: normalized cuts, which tend to avoid unbalanced cuts
Slide 60: References
- A list of references is on the web site of the tutorial: www.cs.cmu.edu/ctsourak/kdd09.htm
Slide 61: Thank you!