Title: Large Graph Mining: Power Tools and a Practitioner's Guide

Slide 1: Large Graph Mining: Power Tools and a Practitioner's Guide
- Christos Faloutsos
- Gary Miller
- Charalampos (Babis) Tsourakakis
- CMU
Slide 2: Outline
- Reminders
- Adjacency matrix
  - Intuition behind eigenvectors: Bipartite Graphs
  - Walks of length k
- Laplacian
  - Connected Components
  - Intuition: Adjacency vs. Laplacian
- Sparsest Cut and Cheeger Inequality
  - Derivation, intuition
  - Example
- Normalized Laplacian
Slide 4: Matrix Representations of G(V,E)
- Associate a matrix to a graph:
  - Adjacency matrix
  - Laplacian (main focus)
  - Normalized Laplacian
Slide 5: Matrix as an operator
- The image of the unit circle (or sphere) under any m×n matrix is an ellipse (or hyperellipse).
[Figure: example of the unit circle mapped to an ellipse]
Slide 6: More Reminders
- Let M be a symmetric n×n matrix. Then Mx = λx, where λ is an eigenvalue of M and x a corresponding eigenvector.
Slide 7: More Reminders
- 1-dimensional invariant subspaces: for an eigenpair (λ, u), the matrix maps span(u) to itself, scaling u by λ.
- Diagonal matrix: no rotation.
[Figure: generic vectors x, y and their images Ax, Ay, contrasted with the eigenvector u, which is only scaled]
Slide 8: Keep in mind!
- For the rest of the slides we consider square n×n matrices and, unless noted otherwise, symmetric ones, i.e., M = M^T.
Slide 10: Adjacency matrix
- Undirected
[Figure: undirected graph on nodes 1-4 and its adjacency matrix A, with A_ij = 1 if (i,j) ∈ E and 0 otherwise]
Slide 11: Adjacency matrix
- Undirected, weighted
[Figure: weighted graph on nodes 1-4 with edge weights 10, 4, 0.3, 2, and its adjacency matrix A with A_ij = w_ij]
Slide 12: Adjacency matrix
- Directed
[Figure: directed graph on nodes 1-4 and its (asymmetric) adjacency matrix A]
- Observation: if G is undirected, then A = A^T.
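The observation above is easy to check numerically. A minimal Python/NumPy sketch (the deck's examples are in MATLAB, and the slide's exact edge set lives in the lost figure, so the 4-node edge list below is hypothetical):

```python
import numpy as np

# Hypothetical undirected graph on nodes 0..3; the slide's actual edges are in the figure.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected: store both directions

# For an undirected graph the adjacency matrix is symmetric: A = A^T.
assert (A == A.T).all()
```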
Slide 13: Spectral Theorem
- Theorem (Spectral Theorem): if M = M^T, then M = X Λ X^T = Σ_i λ_i x_i x_i^T, where the columns x_i of X are orthonormal eigenvectors and Λ = diag(λ_1, ..., λ_n).
- Reminder 1: x_i, x_j are orthogonal for i ≠ j.
- Reminder 2: x_i is the i-th principal axis; λ_i is the length of the i-th principal axis.
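The decomposition M = Σ_i λ_i x_i x_i^T can be verified on a random symmetric matrix; a Python/NumPy sketch (not from the tutorial):

```python
import numpy as np

# Build a random symmetric matrix M.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
M = B + B.T  # M = M^T by construction

lam, X = np.linalg.eigh(M)  # eigenvalues ascending, eigenvectors as columns

# Reconstruct M from its spectral decomposition: M = sum_i lambda_i x_i x_i^T.
M_rec = sum(lam[i] * np.outer(X[:, i], X[:, i]) for i in range(5))
assert np.allclose(M, M_rec)

# Eigenvectors of a symmetric matrix are orthonormal: X^T X = I.
assert np.allclose(X.T @ X, np.eye(5))
```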
Slide 15: Bipartite Graphs
- Any graph with no cycles of odd length is bipartite; e.g., all trees are bipartite.
[Figure: K3,3 with parts {1, 2, 3} and {4, 5, 6}]
- Can we check if a graph is bipartite via its spectrum?
- Can we get the partition of the vertices into the two sets of nodes?
Slide 16: Bipartite Graphs
- Adjacency matrix of K3,3: A = [[0, B], [B^T, 0]], where B is the 3×3 all-ones matrix.
- Why λ1 = -λ2 = 3? Recall Ax = λx, with (λ, x) an eigenvalue-eigenvector pair.
- Eigenvalues: 3, -3, 0, 0, 0, 0.
Slide 17: Bipartite Graphs
[Figure: K3,3 with value 1 placed on every node. Node 1 has the three neighbors 4, 5, 6, each with value 1, so (Ax)_1 = 1 + 1 + 1 = 3 = 3·x_1.]
- Repeat the same argument for the other nodes: x = (1, 1, 1, 1, 1, 1)^T is an eigenvector with eigenvalue 3.
Slide 18: Bipartite Graphs
[Figure: K3,3 with value 1 on nodes 1, 2, 3 and value -1 on nodes 4, 5, 6. Node 1 has the three neighbors 4, 5, 6, each with value -1, so (Ax)_1 = -3 = (-3)·x_1.]
- Repeat the same argument for the other nodes: x = (1, 1, 1, -1, -1, -1)^T is an eigenvector with eigenvalue -3.
Slide 19: Bipartite Graphs
- Observation: u2 gives the partition of the nodes into the two sets S and V-S!
- Question: Were we just lucky? Answer: No.
- Theorem: λ2 = -λ1 iff G is bipartite, and u2 gives the partition.
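The bipartiteness test above can be run on K3,3 itself; a Python/NumPy sketch (the deck's code is MATLAB):

```python
import numpy as np

# Adjacency matrix of K3,3: block structure [[0, B], [B^T, 0]], B = ones(3,3).
B = np.ones((3, 3))
A = np.block([[np.zeros((3, 3)), B], [B.T, np.zeros((3, 3))]])

lam, X = np.linalg.eigh(A)  # eigenvalues in ascending order

# The extreme eigenvalues are symmetric (lambda_min = -lambda_max = -3):
# the spectral certificate of bipartiteness.
assert np.isclose(lam[0], -3) and np.isclose(lam[-1], 3)

# The eigenvector of lambda = -3 gives the partition via its sign pattern.
u = X[:, 0]
side = set(np.flatnonzero(u > 0).tolist())
# Nodes 0-2 land on one side and nodes 3-5 on the other.
assert side in ({0, 1, 2}, {3, 4, 5})
```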
Slide 21: Walks
- A walk of length r in a directed graph is a sequence of nodes v0, v1, ..., vr with (v_{i-1}, v_i) ∈ E, where a node can be used more than once.
- Closed walk: when v0 = vr.
[Figure: directed graph on nodes 1-4. Closed walk of length 3: 2-1-3-2. Walk of length 2: 2-1-4.]
Slide 22: Walks
- Theorem: let G(V,E) be a directed graph with adjacency matrix A. The number of walks from node u to node v in G with length r is (A^r)_uv.
- Proof: induction on r; see Doyle-Snell, p. 165. A walk of length r from i to j is a first step (i, i1) followed by a walk of length r-1 from i1 to j, so (A^r)_ij = Σ_{i1} A_{i,i1} (A^{r-1})_{i1,j}.
Slide 23: Walks
[Figure: directed graph on nodes 1-4 and entries of A^2 counting walks of length 2, e.g. the entries for i = 3, j = 3 and for i = 2, j = 4]
Slide 24: Walks
[Figure: directed graph on nodes 1-4. The row of A^r for node 4 is always 0: node 4 is a sink, so no walks leave it.]
Slide 25: Walks
- Corollary: let A be the adjacency matrix of an undirected G(V,E) with no self-loops, e edges and t triangles. Then the following hold:
  a) trace(A) = 0
  b) trace(A^2) = 2e
  c) trace(A^3) = 6t
- Computing A^r directly is a bad idea: high memory requirements, expensive!
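The three trace identities are quick to verify on a tiny graph; a Python/NumPy sketch using a made-up graph (one triangle plus a pendant node):

```python
import numpy as np

# Undirected graph: triangle {0,1,2} plus pendant node 3, so e = 4 and t = 1.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1

A2 = A @ A   # (A^2)_uv counts walks of length 2 from u to v
A3 = A2 @ A  # (A^3)_uv counts walks of length 3

assert np.trace(A) == 0        # no self-loops
assert np.trace(A2) == 2 * 4   # 2e, and trace(A^2) = sum of degrees
assert np.trace(A3) == 6 * 1   # 6t: each triangle is counted 6 times
```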
Slide 27: Laplacian
- L = D - A, where D is the diagonal degree matrix, d_ii = d_i.
[Figure: graph on nodes 1-4 and its Laplacian L]
Slide 28: Weighted Laplacian
[Figure: weighted graph on nodes 1-4 with edge weights 10, 4, 0.3, 2, and its weighted Laplacian, where d_ii = Σ_j w_ij]
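A Python/NumPy sketch of the weighted Laplacian; the weights echo the slide's figure (10, 4, 0.3, 2), but the exact edge placement is an assumption since the figure is lost:

```python
import numpy as np

# Hypothetical weighted undirected graph on nodes 0..3.
n = 4
W = np.zeros((n, n))
for i, j, w in [(0, 1, 10.0), (0, 3, 4.0), (1, 2, 2.0), (2, 3, 0.3)]:
    W[i, j] = W[j, i] = w

D = np.diag(W.sum(axis=1))  # d_ii = sum_j w_ij
L = D - W                   # Laplacian L = D - A

# Rows of L sum to zero, so the all-ones vector is always an
# eigenvector of L with eigenvalue 0.
assert np.allclose(L.sum(axis=1), 0)
# L is symmetric and positive semidefinite.
assert np.allclose(L, L.T)
assert np.linalg.eigvalsh(L).min() >= -1e-9
```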
Slide 30: Connected Components
- Lemma: let G be a graph with n vertices and c connected components. If L is the Laplacian of G, then rank(L) = n - c.
- Proof: see p. 279, Godsil-Royle.
Slide 31: Connected Components
[Figure: G(V,E) on nodes 1-7 with two connected components, its Laplacian L, and eig(L): the number of zero eigenvalues equals the number of components]
Slide 32: Connected Components
[Figure: the same graph with the two components joined by a weak edge. eig(L) now has a single zero, and λ2 = 0.01 is tiny: a small λ2 indicates a good cut.]
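The lemma above (number of zero Laplacian eigenvalues = number of components) can be checked directly; a Python/NumPy sketch on made-up graphs:

```python
import numpy as np

def laplacian(edges, n):
    # L = D - A for an unweighted undirected graph.
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1
    return np.diag(A.sum(axis=1)) - A

# Two components: a triangle {0,1,2} and a path 3-4.
L = laplacian([(0, 1), (1, 2), (0, 2), (3, 4)], 5)
lam = np.linalg.eigvalsh(L)
n_zero = int(np.sum(np.abs(lam) < 1e-9))
assert n_zero == 2  # two zero eigenvalues, two components

# Join the components with one edge: a single zero remains,
# and rank(L) = n - c = 5 - 1 = 4.
L2 = laplacian([(0, 1), (1, 2), (0, 2), (3, 4), (2, 3)], 5)
lam2 = np.linalg.eigvalsh(L2)
assert int(np.sum(np.abs(lam2) < 1e-9)) == 1
```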
Slide 34: Adjacency vs. Laplacian: Intuition
- Let x be the indicator vector of a set S: x_i = 1 if i ∈ S, 0 otherwise.
- Consider now y = Lx. The k-th coordinate is y_k = d_k x_k - Σ_{j:(k,j)∈E} x_j.
- So for k ∈ S, y_k counts the edges from k that leave S; for k ∉ S, y_k is minus the number of edges from k into S.
Slide 35-37: Adjacency vs. Laplacian: Intuition
[Figures: an indicator vector x of a set S in the random graph G(30, 0.5), and plots of the coordinates of y = Lx against the node index k]
- Takeaway: Laplacian: connectivity; Adjacency: paths.
Slide 39: Why Sparse Cuts?
- Clustering, Community Detection
- And more: Telephone Network Design, VLSI layout, Sparse Gaussian Elimination, Parallel Computation
[Figure: graph on nodes 1-9 with a sparse cut splitting it into two clusters]
Slide 40-41: Quality of a Cut
- Edge expansion / isoperimetric number φ:
  φ(S) = e(S, V-S) / |S|, where e(S, V-S) is the number of edges crossing the cut,
  and thus φ(G) = min over S with |S| ≤ n/2 of φ(S).
[Figure: example cut on a 4-node graph and the resulting value of φ]
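For a tiny graph, φ(G) can be found by brute force over all small vertex subsets; a Python sketch on a made-up graph (two triangles joined by a bridge, not the slide's figure):

```python
from itertools import combinations

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2,3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6

def cut_size(S):
    # Number of edges with exactly one endpoint in S.
    return sum((i in S) != (j in S) for i, j in edges)

# phi(G) = min over nonempty S with |S| <= n/2 of e(S, V-S)/|S|.
best_phi, best_S = float("inf"), None
for k in range(1, n // 2 + 1):
    for combo in combinations(range(n), k):
        S = set(combo)
        phi = cut_size(S) / k
        if phi < best_phi:
            best_phi, best_S = phi, S

# The sparsest cut severs only the bridge: phi = 1/3.
assert best_phi == 1 / 3
assert best_S in ({0, 1, 2}, {3, 4, 5})
```

Brute force is exponential in n, of course; the point of the next slides is that λ2 gives a cheap, principled surrogate.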
Slide 42: Why λ2?
- Let x be the characteristic vector of S: x_i = 1 if i ∈ S, 0 otherwise.
- Then x^T L x = Σ_{(i,j)∈E} (x_i - x_j)^2 = number of edges across the cut (S, V-S).
Slide 43: Why λ2?
[Figure: graph on nodes 1-9 with a cut separating S = {1, 2, 3, 4} from V-S = {5, ..., 9}]
- x = (1, 1, 1, 1, 0, 0, 0, 0, 0)^T
- x^T L x = 2: two edges cross the cut.
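The identity x^T L x = number of cut edges is easy to verify numerically; a Python/NumPy sketch (the 6-node graph is made up, since the slide's 9-node edge set is in the lost figure):

```python
import numpy as np

# Hypothetical graph: triangle {0,1,2}, path 3-4-5, and two cross edges.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (2, 3), (0, 4)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

# x is the 0/1 indicator of S = {0, 1, 2}.
x = np.array([1, 1, 1, 0, 0, 0], dtype=float)

# x^T L x = sum over edges of (x_i - x_j)^2 = number of edges across the cut.
crossing = sum(1 for i, j in edges if x[i] != x[j])  # edges (2,3) and (0,4)
assert x @ L @ x == crossing == 2
```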
Slide 44-45: Why λ2?
- Sparsest ratio cut: minimize x^T L x / x^T x over indicator vectors x ∈ {0,1}^n, x ≠ 0. NP-hard!
- Relax the constraint: allow x ∈ R^n with x ⊥ 1, and normalize so that x^T x = 1. Then the minimum becomes λ2:
  min over x ⊥ 1, x ≠ 0 of x^T L x / x^T x = λ2,
  because of the Courant-Fischer theorem (applied to L).
Slide 46-47: Why λ2?
[Figure: physical analogy, each node a ball of 1 unit of mass connected by springs; the balls oscillate, and plotting eigenvector value against node id for x_1, ..., x_n shows the fundamental mode of vibration along the separator]
Slide 48: Cheeger Inequality
- Step 1: Sort the vertices in non-decreasing order of the values assigned to them by the second eigenvector.
- Step 2: Decide where to cut. Two common heuristics:
  - Bisection: cut at the median vertex.
  - Best ratio cut: try every prefix of the sorted order and keep the cut with the best ratio.
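The two steps above (sort by the second eigenvector, then sweep for the best prefix cut) can be sketched in Python/NumPy; the ratio used below, e(S, V-S)/min(|S|, |V-S|), is one standard choice:

```python
import numpy as np

def sweep_cut(A):
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    lam, X = np.linalg.eigh(L)
    order = np.argsort(X[:, 1])          # Step 1: sort vertices by u2 values
    best_phi, best_S = np.inf, None
    for k in range(1, n):                # Step 2: try every prefix cut
        S = set(order[:k].tolist())
        cut = sum(A[i, j] for i in S for j in range(n) if j not in S)
        phi = cut / min(k, n - k)        # ratio-cut score of this prefix
        if phi < best_phi:
            best_phi, best_S = phi, S
    return best_phi, best_S

# Two triangles joined by a bridge: the sweep recovers the bridge cut.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
phi, S = sweep_cut(A)
assert S in ({0, 1, 2}, {3, 4, 5}) and phi == 1 / 3
```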
Slide 50: Example: Spectral Partitioning
- The dumbbell graph, in MATLAB:

A = zeros(1000);
A(1:500, 1:500) = ones(500) - eye(500);       % first 500-clique
A(501:1000, 501:1000) = ones(500) - eye(500); % second 500-clique
A(500, 501) = 1; A(501, 500) = 1;             % bridge edge (added so the dumbbell is connected)
myrandperm = randperm(1000);
B = A(myrandperm, myrandperm);                % hide the structure

- In social network analysis, such clusters are called communities.
Slide 51: Example: Spectral Partitioning
- This is how the adjacency matrix of B looks: spy(B)
[Figure: spy plot of B]
Slide 52: Example: Spectral Partitioning
- This is how the 2nd eigenvector of B looks:

L = diag(sum(B)) - B;
[u, v] = eigs(L, 2, 'SM');
plot(u(:,1), 'x')

- Not so much information yet.
Slide 53: Example: Spectral Partitioning
- This is how the 2nd eigenvector looks if we sort it:

[ign, ind] = sort(u(:,1));
plot(u(ind, 1), 'x')

- But now we see the two communities!
Slide 54: Example: Spectral Partitioning
- This is how the adjacency matrix of B looks now: spy(B(ind, ind))
[Figure: spy plot with two diagonal blocks, Community 1 and Community 2, and the "Cut here!" boundary between them]
- Observation: both heuristics are equivalent for the dumbbell.
Slide 56: Where does it go from here?
- Normalized Laplacian
  - Ng, Jordan, Weiss: Spectral Clustering
  - Laplacian Eigenmaps for Manifold Learning
  - Computer Vision, and many more applications
- Standard reference: Spectral Graph Theory, monograph by Fan Chung Graham
Slide 57: Why Normalized Laplacian?
[Figure: a graph with a single weighted edge and two candidate cuts; comparing the edge-expansion values f of the two cuts shows that f favors an undesirable, unbalanced cut. So f is not good here.]

Slide 58: Why Normalized Laplacian?
- Optimize the Cheeger constant h(G) instead, which favors balanced cuts:
  h(G) = min over S of cut(S, V-S) / min(vol(S), vol(V-S)), where vol(S) = Σ_{i∈S} d_i.
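The normalized Laplacian that underlies h(G) is L_norm = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}; a Python/NumPy sketch checking its basic spectral properties on a made-up graph:

```python
import numpy as np

def normalized_laplacian(A):
    # L_norm = I - D^{-1/2} A D^{-1/2}; assumes no isolated nodes (d_i > 0).
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt

# Two triangles joined by a bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

lam = np.linalg.eigvalsh(normalized_laplacian(A))
# 0 is always an eigenvalue (eigenvector D^{1/2} 1), and the whole
# spectrum lies in [0, 2]; the value 2 is attained iff a component is bipartite.
assert np.isclose(lam[0], 0)
assert np.all(lam >= -1e-9) and np.all(lam <= 2 + 1e-9)
```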
Slide 59: Conclusions
- The spectrum tells us a lot about the graph.
- What to remember:
  - What an eigenvector is (f: Nodes → Reals)
  - Adjacency: paths
  - Laplacian: sparsest cut and intuition
  - Normalized Laplacian: normalized cuts, which tend to avoid unbalanced cuts
Slide 60: References
- A list of references is on the web site of the tutorial: www.cs.cmu.edu/ctsourak/kdd09.htm
Slide 61: Thank you!