Large Graph Mining: Power Tools and a Practitioner's Guide
Christos Faloutsos, Gary Miller, Charalampos (Babis) Tsourakakis (CMU)

Transcript and Presenter's Notes

1
Large Graph Mining: Power Tools and a Practitioner's Guide
  • Christos Faloutsos
  • Gary Miller
  • Charalampos (Babis) Tsourakakis
  • CMU

2
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors: Bipartite Graphs
  • Walks of length k
  • Laplacian
  • Connected Components
  • Intuition: Adjacency vs. Laplacian
  • Sparsest Cut and Cheeger Inequality
  • Derivation, intuition
  • Example
  • Normalized Laplacian

3
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors: Bipartite Graphs
  • Walks of length k
  • Laplacian
  • Connected Components
  • Intuition: Adjacency vs. Laplacian
  • Sparsest Cut and Cheeger Inequality
  • Derivation, intuition
  • Example
  • Normalized Laplacian

4
Matrix Representations of G(V,E)
  • Associate a matrix to a graph
  • Adjacency matrix
  • Laplacian
  • Normalized Laplacian

Main focus
5
Matrix as an operator
The image of the unit circle (sphere) under any
m×n matrix is an ellipse (hyperellipse).
[Figure: e.g., a 2×2 matrix mapping the unit circle to an ellipse]
6
More Reminders
  • Let M be a symmetric n×n matrix.

M x = λ x    (λ: eigenvalue, x: eigenvector)
7
More Reminders
  • 1-Dimensional Invariant Subspaces

Diagonal: no rotation
[Figure: for an eigenpair (λ, u), Au stays on the line spanned by u (pure scaling, no rotation), unlike Ay for a generic vector y]
8
Keep in mind!
  • For the rest of the slides we consider square
    n×n matrices and, unless noted otherwise, symmetric
    ones, i.e., M = M^T.

9
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors: Bipartite Graphs
  • Walks of length k
  • Laplacian
  • Connected Components
  • Intuition: Adjacency vs. Laplacian
  • Sparsest Cut and Cheeger Inequality
  • Derivation, intuition
  • Example
  • Normalized Laplacian

10
Adjacency matrix
Undirected
[Figure: undirected graph on nodes 1-4 and its adjacency matrix A]
11
Adjacency matrix
Undirected, Weighted
[Figure: weighted undirected graph on nodes 1-4 (edge weights 10, 4, 2, 0.3) and its adjacency matrix A]
12
Adjacency matrix
Directed
[Figure: directed graph on nodes 1-4 and its adjacency matrix A]
Observation: if G is undirected, A = A^T
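A minimal sketch of this in MATLAB (the edge set below is a hypothetical example, not the graph drawn on the slide):

% hypothetical directed graph on nodes 1..4: edges 1->2, 2->3, 3->1, 1->4
Adir = zeros(4);
Adir(1,2) = 1; Adir(2,3) = 1; Adir(3,1) = 1; Adir(1,4) = 1;
isequal(Adir, Adir')     % false: directed, A is not symmetric
Aund = Adir + Adir';     % make the graph undirected
isequal(Aund, Aund')     % true: undirected, A = A^T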
13
Spectral Theorem
  • Theorem (Spectral Theorem)
  • If M = M^T, then M = λ1 x1 x1^T + λ2 x2 x2^T + … + λn xn xn^T,
    where (λi, xi) are the eigenvalue-eigenvector pairs of M
    (a numeric check follows below).

Reminder 1: the eigenvectors xi, xj are orthogonal.
Reminder 2: xi is the i-th principal axis, λi is the length of the i-th principal axis.
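A quick numeric check of the decomposition, using the adjacency matrix of a triangle as an arbitrary symmetric example:

M = [0 1 1; 1 0 1; 1 1 0];   % adjacency matrix of the triangle, symmetric
[X, D] = eig(M);             % columns of X: eigenvectors, diag(D): eigenvalues
norm(M - X*D*X')             % ~0: M equals the sum of lambda_i * x_i * x_i'
norm(X'*X - eye(3))          % ~0: the eigenvectors are orthonormal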
14
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors: Bipartite Graphs
  • Walks of length k
  • Laplacian
  • Connected Components
  • Intuition: Adjacency vs. Laplacian
  • Sparsest Cut and Cheeger Inequality
  • Derivation, intuition
  • Example
  • Normalized Laplacian

15
Bipartite Graphs
Any graph with no cycles of odd length is
bipartite, e.g., all trees are bipartite.
[Figure: K3,3 with parts {1,2,3} and {4,5,6}]
Can we check if a graph is bipartite via its
spectrum? Can we get the partition of the
vertices in the two sets of nodes?
16
Bipartite Graphs
Adjacency matrix of K3,3
[Figure: K3,3 with parts {1,2,3} and {4,5,6}]
A = [0 B; B^T 0], where B is the 3×3 all-ones matrix
  • Why λ1 = -λ2 = 3? Recall Ax = λx, (λ, x) an
    eigenvalue-eigenvector pair

Eigenvalues
3, -3, 0, 0, 0, 0
17
Bipartite Graphs
[Figure: computing one coordinate of A·1 for K3,3: it equals 3, i.e. 3 × 1, matching A·1 = 3·1]
Repeat same argument for the other nodes
18
Bipartite Graphs
[Figure: computing one coordinate of A·x for x = (1,1,1,-1,-1,-1)^T: it equals -3, i.e. (-3) × 1, matching A·x = -3·x]
Repeat same argument for the other nodes
19
Bipartite Graphs
  • Observation: u2 gives the partition of the nodes
    in the two sets S, V-S!

[Figure: the two sides S and V-S]
Question: Were we just lucky?
Answer: No
Theorem: λ2 = -λ1 iff G is bipartite; u2 gives the
partition (a numeric check follows below).
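A sketch checking this on K3,3 (nodes 1-3 on one side, 4-6 on the other):

A = [zeros(3) ones(3); ones(3) zeros(3)];   % adjacency matrix of K3,3
[X, D] = eig(A);
diag(D)'                      % contains 3 and -3 (and four zeros)
[lmin, i] = min(diag(D));     % the eigenvalue -3
sign(X(:,i))'                 % +1 on one side, -1 on the other: the partition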
20
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors: Bipartite Graphs
  • Walks of length k
  • Laplacian
  • Connected Components
  • Intuition: Adjacency vs. Laplacian
  • Sparsest Cut and Cheeger Inequality
  • Derivation, intuition
  • Example
  • Normalized Laplacian

21
Walks
  • A walk of length r in a directed graph is a sequence
    of nodes v0, v1, …, vr with each (vi, vi+1) an edge, where a
    node can be used more than once.
  • Closed walk: a walk that starts and ends at the same node.

[Figure: example directed graph on nodes 1-4]
Closed walk of length 3: 2-1-3-2
Walk of length 2: 2-1-4
22
Walks
  • Theorem: G(V,E) directed graph with adjacency matrix
    A. The number of walks from node u to node v in G
    of length r is (A^r)uv (a small numeric check follows below).
  • Proof
  • Induction on r. See Doyle-Snell, p.165.

Idea: a length-r walk from i to j is a length-(r-1) walk
from i to some node i_(r-1), followed by the edge (i_(r-1), j).
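A sketch of the count in MATLAB, assuming the edge directions 2->1, 1->3, 3->2, 1->4 suggested by the walk examples on the previous slide:

A = zeros(4);
A(2,1) = 1; A(1,3) = 1; A(3,2) = 1; A(1,4) = 1;   % assumed edge directions
A2 = A^2;  A3 = A^3;
A2(2,4)     % = 1: the single length-2 walk 2-1-4
A3(2,2)     % = 1: the single closed length-3 walk 2-1-3-2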
23
Walks
[Figure: the example graph on nodes 1-4 and entries of a power of A, e.g. the walk counts for (i=3, j=3) and (i=2, j=4)]
24
Walks
[Figure: the example graph on nodes 1-4]
Always 0: node 4 is a sink, so no walks can leave it
25
Walks
  • Corollary
  • Let A be the adjacency matrix of an undirected G(V,E)
    with no self loops, e edges and t triangles. Then the
    following hold: a) trace(A) = 0  b) trace(A^2) = 2e
    c) trace(A^3) = 6t  (a sketch follows below)

Computing A^r explicitly is a bad idea: high memory
requirements, expensive!
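A sketch of the triangle count: trace(A^3) = Σ λi^3, so the spectrum gives t without ever forming A^3 (and in practice a few top eigenvalues already give a good approximation):

A = [0 1 1 1; 1 0 1 0; 1 1 0 0; 1 0 0 0];   % edges (1,2),(1,3),(2,3),(1,4): one triangle
trace(A^3) / 6                              % = 1 triangle
lambda = eig(A);
sum(lambda.^3) / 6                          % same count, from the eigenvalues alone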
26
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors: Bipartite Graphs
  • Walks of length k
  • Laplacian
  • Connected Components
  • Intuition: Adjacency vs. Laplacian
  • Sparsest Cut and Cheeger Inequality
  • Derivation, intuition
  • Example
  • Normalized Laplacian

27
Laplacian
[Figure: undirected graph on nodes 1-4, its degree matrix D and Laplacian L]
L = D - A
D: diagonal degree matrix, d_ii = d_i (degree of node i)
28
Weighted Laplacian
[Figure: weighted graph on nodes 1-4 with edge weights 10, 4, 2, 0.3, and its weighted Laplacian (weighted degrees on the diagonal minus the weight matrix)]
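A minimal sketch of the construction, with a hypothetical symmetric weight matrix (the slide's exact edges are not reproduced here):

W = [0 2 0.3 10; 2 0 4 0; 0.3 4 0 0; 10 0 0 0];   % hypothetical edge weights
D = diag(sum(W, 2));        % weighted degrees on the diagonal
L = D - W;                  % (weighted) Laplacian
sum(L, 2)'                  % every row sums to 0
min(eig(L))                 % smallest eigenvalue is 0 (up to round-off): L is positive semidefinite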
29
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors: Bipartite Graphs
  • Walks of length k
  • Laplacian
  • Connected Components
  • Intuition: Adjacency vs. Laplacian
  • Sparsest Cut and Cheeger Inequality
  • Derivation, intuition
  • Example
  • Normalized Laplacian

30
Connected Components
  • Lemma: Let G be a graph with n vertices and c
    connected components. If L is the Laplacian of G,
    then rank(L) = n - c (a sketch follows below).
  • Proof: see p.279, Godsil-Royle
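Equivalently, the multiplicity of the eigenvalue 0 of L equals the number of connected components; a small sketch:

% hypothetical graph with two components: triangle {1,2,3} and edge {4,5}
A = zeros(5);
A(1,2) = 1; A(2,3) = 1; A(1,3) = 1; A(4,5) = 1;
A = A + A';
L = diag(sum(A)) - A;
sum(abs(eig(L)) < 1e-8)     % = 2 zero eigenvalues = 2 components
rank(L)                     % = n - c = 3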

31
Connected Components
[Figure: a disconnected graph G(V,E) on nodes 1-7, its Laplacian L, and eig(L): the number of zero eigenvalues equals the number of components]
32
Connected Components
[Figure: a graph G(V,E) on nodes 1-7, its Laplacian L, and eig(L): besides the zeros (one per component), a very small eigenvalue (≈ 0.01) indicates a good cut]
33
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors: Bipartite Graphs
  • Walks of length k
  • Laplacian
  • Connected Components
  • Intuition: Adjacency vs. Laplacian
  • Sparsest Cut and Cheeger Inequality
  • Derivation, intuition
  • Example
  • Normalized Laplacian

34
Adjacency vs. Laplacian Intuition
Let x be the indicator vector of a set S (1 on S, 0 on V-S).
Consider now y = Lx and look at its k-th coordinate
(a numeric sketch follows below).
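A numeric sketch on a hypothetical 4-node graph: for k in S, y(k) counts the edges from k to V-S; for k outside S, it counts (with a minus sign) the edges from k into S:

A = [0 1 1 1; 1 0 1 0; 1 1 0 0; 1 0 0 0];   % edges (1,2),(1,3),(2,3),(1,4)
L = diag(sum(A)) - A;
x = [1; 1; 0; 0];                           % indicator of S = {1, 2}
y = L * x;
y'                                          % = [2 1 -2 -1]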
35
Adjacency vs. Laplacian Intuition
[Figure: a random graph G(30, 0.5), a set S, and the coordinates of y = Lx plotted against the node index k]
36
Adjacency vs. Laplacian Intuition
[Figure: the same random graph G(30, 0.5) and set S; the coordinates of y = Lx against the node index k]
37
Adjacency vs. Laplacian Intuition
[Figure: the same random graph G(30, 0.5) and set S; the coordinates of y = Lx against the node index k]
Laplacian: connectivity, Adjacency: paths
38
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors: Bipartite Graphs
  • Walks of length k
  • Laplacian
  • Connected Components
  • Intuition: Adjacency vs. Laplacian
  • Sparsest Cut and Cheeger Inequality
  • Derivation, intuition
  • Example
  • Normalized Laplacian

39
Why Sparse Cuts?
  • Clustering, Community Detection
  • And more: Telephone Network Design, VLSI layout,
    Sparse Gaussian Elimination, Parallel Computation

[Figure: a graph on nodes 1-9 and a sparse cut separating it into two clusters]
40
Quality of a Cut
  • Edge expansion / Isoperimetric number φ

[Figure: example graph on nodes 1-4 and the definition of φ for a cut (S, V-S)]
41
Quality of a Cut
  • Edge expansion / Isoperimetric number φ

[Figure: the example graph; computing φ for a cut, and thus φ(G) for the graph]
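A sketch of the computation for one cut, assuming the common definition φ(S) = e(S, V-S) / min(|S|, |V-S|):

A = [0 1 1 1; 1 0 1 0; 1 1 0 0; 1 0 0 0];   % 4-node example: edges (1,2),(1,3),(2,3),(1,4)
S = [1 2 3];  T = setdiff(1:4, S);
e_cut = sum(sum(A(S, T)));                  % edges crossing the cut = 1, the edge (1,4)
phi = e_cut / min(numel(S), numel(T))       % edge expansion of this cut = 1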
42
Why λ2?
Let x be the characteristic vector of S (1 on S, 0 on V-S).
Then x^T L x = number of edges across the cut (S, V-S).
43
Why λ2?
[Figure: a graph on nodes 1-9 and a cut separating S = {1,2,3,4} from V-S = {5,...,9}]
x = [1,1,1,1,0,0,0,0,0]^T
x^T L x = 2
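A numeric check of this identity on a hypothetical 9-node graph with exactly two edges across the cut:

A = zeros(9);
A(1,2)=1; A(1,3)=1; A(2,4)=1; A(3,4)=1;             % edges inside S = {1,2,3,4}
A(5,6)=1; A(6,7)=1; A(7,8)=1; A(8,9)=1; A(5,9)=1;   % edges inside V-S
A(4,5)=1; A(3,7)=1;                                 % the two edges across the cut
A = A + A';
L = diag(sum(A)) - A;
x = [1 1 1 1 0 0 0 0 0]';
x' * L * x                                          % = 2 = edges across the cut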
44
Why λ2?
Ratio cut
Sparsest ratio cut: NP-hard
Relax the constraint
λ2
Normalize
45
Why λ2?
Sparsest ratio cut: NP-hard
Relax the constraint
λ2
Normalize
because of the Courant-Fischer theorem (applied to L)
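For reference, the variational form behind this step (Courant-Fischer applied to L, whose smallest eigenvalue is 0 with the all-ones eigenvector):

λ2 = min over x ≠ 0 with x ⊥ 1 of (x^T L x) / (x^T x)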
46
Why λ2?
[Figure: spring-mass analogy: the nodes x1, …, xn are balls of 1 unit of mass that oscillate; the plot shows eigenvector value against node id]
47
Why λ2?
Fundamental mode of vibration along the
separator
48
Cheeger Inequality
  • Step 1: Sort vertices in non-decreasing order of
    the value assigned to them by the second
    eigenvector.
  • Step 2: Decide where to cut. Two common heuristics:
  • Bisection
  • Best ratio cut (a sweep sketch follows below)
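A minimal sketch of the sweep for the second heuristic; the ratio e(S, V-S) / min(|S|, |V-S|) used below is an assumed choice of ratio cut:

% hypothetical graph: two triangles {1,2,3}, {4,5,6} joined by the edge (3,4)
A = zeros(6);
A(1,2)=1; A(1,3)=1; A(2,3)=1; A(4,5)=1; A(4,6)=1; A(5,6)=1; A(3,4)=1;
A = A + A';
n = size(A, 1);
L = diag(sum(A)) - A;
[U, E] = eig(L);
[~, order] = sort(diag(E));
u2 = U(:, order(2));                 % Fiedler vector (second-smallest eigenvector)
[~, idx] = sort(u2);                 % Step 1: sort vertices by their u2 value
best = inf;
for k = 1:n-1                        % Step 2: sweep every prefix cut
    x = zeros(n, 1); x(idx(1:k)) = 1;
    r = (x' * L * x) / min(k, n - k);   % ratio of this cut (assumed definition)
    if r < best, best = r; S = idx(1:k); end
end
S'                                   % best cut found: one of the two triangles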
49
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors: Bipartite Graphs
  • Walks of length k
  • Laplacian
  • Connected Components
  • Intuition: Adjacency vs. Laplacian
  • Sparsest Cut and Cheeger Inequality
  • Derivation, intuition
  • Example
  • Normalized Laplacian

50
Example: Spectral Partitioning
  • K500
  • K500

dumbbell graph
A = zeros(1000);
A(1:500, 1:500) = ones(500) - eye(500);
A(501:1000, 501:1000) = ones(500) - eye(500);
myrandperm = randperm(1000);
B = A(myrandperm, myrandperm);
In social network analysis, such clusters are
called communities
51
Example: Spectral Partitioning
  • This is how the adjacency matrix of B looks

spy(B)
52
Example: Spectral Partitioning
  • This is how the 2nd eigenvector of B looks.

L = diag(sum(B)) - B;
[u, v] = eigs(L, 2, 'SM');
plot(u(:,1), 'x')
Not so much information yet
53
Example: Spectral Partitioning
  • This is how the 2nd eigenvector looks if we sort
    it.

[ign, ind] = sort(u(:,1)); plot(u(ind), 'x')
But now we see the two communities!
54
Example: Spectral Partitioning
  • This is how the adjacency matrix of B looks now

spy(B(ind,ind))
Community 1
Cut here!
Observation: Both heuristics are equivalent for
the dumbbell
Community 2
55
Outline
  • Reminders
  • Adjacency matrix
  • Intuition behind eigenvectors: Bipartite Graphs
  • Walks of length k
  • Laplacian
  • Connected Components
  • Intuition: Adjacency vs. Laplacian
  • Sparsest Cut and Cheeger Inequality
  • Derivation, intuition
  • Example
  • Normalized Laplacian

56
Where does it go from here?
  • Normalized Laplacian
  • Ng, Jordan, Weiss: Spectral Clustering
  • Laplacian Eigenmaps for Manifold Learning
  • Computer Vision and many more applications

Standard reference: Spectral Graph Theory,
monograph by Fan Chung Graham
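A sketch of the construction studied there, for a graph with no isolated nodes:

A = [0 1 1 1; 1 0 1 0; 1 1 0 0; 1 0 0 0];   % small example graph
d = sum(A, 2);
Dhalf = diag(1 ./ sqrt(d));
Lnorm = eye(4) - Dhalf * A * Dhalf;         % normalized Laplacian D^(-1/2) L D^(-1/2)
eig(Lnorm)'                                 % eigenvalues lie in [0, 2]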
57
Why Normalized Laplacian
  • K500
  • K500

The only weighted edge!
[Figure: two K500 cliques joined by a single weighted edge; two candidate cuts are shown, and the φ value of one is greater than the other's]
So, φ is not good here
58
Why Normalized Laplacian
  • K500
  • K500

The only weighted edge!
[Figure: the same two cliques and candidate cuts, with their scores]
Optimize the Cheeger constant h(G): balanced cuts, where
h normalizes the cut weight by the volume of the smaller side (a sketch follows below)
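A sketch of scoring a cut this way, assuming the usual form h(S) = cut(S, V-S) / min(vol(S), vol(V-S)) with vol(S) = total weighted degree of S:

% hypothetical example: two triangles {1,2,3}, {4,5,6} joined by a light edge (3,4)
W = zeros(6);
W(1,2)=1; W(1,3)=1; W(2,3)=1; W(4,5)=1; W(4,6)=1; W(5,6)=1; W(3,4)=0.1;
W = W + W';
d = sum(W, 2);
S = [1 2 3];
x = zeros(6, 1); x(S) = 1;
cutw = x' * (diag(d) - W) * x;                  % weight crossing the cut = 0.1
h = cutw / min(sum(d(S)), sum(d) - sum(d(S)))   % Cheeger-style score of this cut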
59
Conclusions
  • Spectrum tells us a lot about the graph.
  • What to remember:
  • What is an eigenvector (f: Nodes -> Reals)
  • Adjacency: Paths
  • Laplacian: Sparsest Cut and Intuition
  • Normalized Laplacian: Normalized cuts, tend to
    avoid unbalanced cuts

60
References
  • A list of references is on the web site of the
    tutorial
  • www.cs.cmu.edu/ctsourak/kdd09.htm

61
Thank you!