Title: Daniel A. Spielman
1. Fast, Randomized Algorithms for Partitioning, Sparsification, and the Solution of Linear Systems
Joint work with Shang-Hua Teng (Boston University)
2. Papers
Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems. S-Teng 04
The Mixing Rate of Markov Chains, an Isoperimetric Inequality, and Computing the Volume. Lovász-Simonovits 93
The Eigenvalues of Random Symmetric Matrices. Füredi-Komlós 81
3. Overview
Preconditioning to solve linear systems: find a B that approximates A, such that solving By = c is easy.
Three techniques:
- Augmented spanning trees (Vaidya 90)
- Sparsification: approximate a graph by a sparse graph
- Partitioning: runtime proportional to nodes removed
4. Outline
- Linear system solvers
- Sparsification using partitioning and random sampling
- Graph partitioning by truncated random walks
Themes: combinatorial and spectral graph theory; eigenvalues of random graphs
5. Diagonally Dominant Matrices
Solve Ax = b, where each diagonal entry of A is at least the sum of the absolute values of the other entries in its row.
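Diagonal dominance is easy to check directly; a minimal sketch (the helper name is mine, not from the talk):

```python
def is_diagonally_dominant(A):
    """Check that each diagonal entry is at least the sum of the
    absolute values of the off-diagonal entries in its row."""
    n = len(A)
    return all(
        A[i][i] >= sum(abs(A[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )

# Laplacian of a path on 3 vertices: diagonally dominant (with equality).
L_path = [[1, -1, 0],
          [-1, 2, -1],
          [0, -1, 1]]
```

Graph Laplacians, the matrices the talk targets, satisfy this with equality in every row.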
6. Complexity of Solving Ax = b, A positive semi-definite
General direct methods:
Gaussian elimination (Cholesky): O(n^3)
Fast matrix inversion: O(n^2.376)
Conjugate gradient: O(mn)
n is the dimension, m the number of non-zeros
7. Complexity of Solving Ax = b, A positive semi-definite and structured
Express A = L L^T
Path: forward and backward elimination, O(n)
Tree: like a path, work up from the leaves, O(n)
Planar: nested dissection (Lipton-Rose-Tarjan 79), O(n^1.5)
8. Iterative Methods
Preconditioned conjugate gradient: find an easy B that approximates A.
Solves Ax = b in roughly sqrt(kappa(A,B)) iterations, each costing a multiply by A plus a solve of By = c.
Quality of approximation: the relative condition number kappa(A,B).
Cost per iteration: dominated by the time to solve By = c.
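To make the PCG loop concrete, here is a minimal dense-matrix sketch in Python, with the preconditioner supplied as a solve callback for By = c. This illustrates the generic method, not the talk's solver:

```python
def pcg(A, b, solve_B, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for dense SPD A (lists of lists).
    solve_B(r) should return the solution z of B z = r."""
    n = len(b)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = b[:]                    # residual b - A x, with x = 0
    z = solve_B(r)              # preconditioner solve
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = solve_B(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x
```

With the identity preconditioner (solve_B = lambda r: r[:]) this reduces to plain CG; a better B cuts the iteration count to about sqrt(kappa(A,B)).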
9. Main Result
For symmetric, diagonally dominant A, iteratively solve Ax = b in nearly-linear time.
General: m log^{O(1)} m
Planar
10. Vaidya's Subgraph Preconditioners
Precondition A by the Laplacian B of a subgraph of A's graph.
[Figure: a graph A and a spanning subgraph B]
11. History
Vaidya 90: relate kappa(A,B) to graph embeddings (congestion/dilation); use the MST and augmented MST; planar
Gremban, Miller 96: Steiner vertices, many details; planar
Joshi 97, Reif 98: recursive, multi-level approach
Bern, Boman, Chen, Gilbert, Hendrickson, Nguyen, Toledo: all the details, algebraic approach
12. History
Maggs, Miller, Parekh, Ravi, Woo 02: fast solves after preprocessing
Boman, Hendrickson 01: low-stretch spanning trees
S-Teng 03: augmented low-stretch trees; clustering, partitioning, sparsification, recursion; lower-stretch spanning trees
13. The relative condition number
kappa(A,B) = lambda_max(A,B) / lambda_min(A,B), if A is positive semi-definite.
B <= A (as quadratic forms) if A - B is positive semi-definite, that is, x^T B x <= x^T A x for all x.
For A the Laplacian of a graph with edges E: x^T A x = sum over (i,j) in E of w_ij (x_i - x_j)^2.
14. Bounding kappa(A,B)
For B a subgraph of A, B <= A as quadratic forms,
and so lambda_min(A,B) >= 1 and kappa(A,B) <= lambda_max(A,B).
15. Fundamental Inequality
For a path on vertices 1, ..., 8,
(x_1 - x_8)^2 <= 7 [(x_1 - x_2)^2 + (x_2 - x_3)^2 + ... + (x_7 - x_8)^2]
16. Fundamental Inequality
More generally, along a path of k edges, (x_0 - x_k)^2 <= k times the sum of the squared differences along the path (Cauchy-Schwarz).
17. Application to eigenvalues of graphs
Eigenvalues are >= 0 for Laplacian matrices.
Example: for the complete graph on n nodes, all non-zero eigenvalues are n.
For the path, the smallest non-zero eigenvalue is tiny.
[Figure: path with test vector x = (-7, -5, -3, -1, 1, 3, 5, 7)]
18. Lower bound on lambda_min(A,B)
Guattery-Leighton-Miller 97
19. Lower bound on lambda_min(A,B)
So lambda_min(A,B) is bounded below,
and kappa(A,B) is bounded above.
20. Preconditioning with a Spanning Tree B
[Figure: a graph A and a spanning tree B]
Every edge of A not in B has a unique path in B.
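Since every non-tree edge has a unique tree path, its stretch (the length of that path, summed over edges on later slides) can be computed naively; an illustrative sketch for the unweighted case:

```python
def stretch(tree_adj, edges):
    """Total stretch of the given edges with respect to a spanning tree:
    for each edge (u, v), the length of the unique tree path u -> v."""
    def tree_dist(u, v):
        # BFS in the tree; the path it finds is the unique one.
        frontier, dist = [u], {u: 0}
        while frontier:
            nxt = []
            for w in frontier:
                for x in tree_adj[w]:
                    if x not in dist:
                        dist[x] = dist[w] + 1
                        nxt.append(x)
            frontier = nxt
        return dist[v]
    return sum(tree_dist(u, v) for u, v in edges)

# 4-cycle with the path 1-2-3-4 as spanning tree: the non-tree
# edge (1, 4) has stretch 3, each tree edge has stretch 1.
tree = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
```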
21. When B is a Tree
22. Low-Stretch Spanning Trees
Theorem (Boman-Hendrickson 01): kappa(A,B) is at most the total stretch of A's edges over the tree B, where the stretch of an edge is the length of its tree path.
Theorem (Alon-Karp-Peleg-West 91): every graph has a spanning tree of average stretch exp(O(sqrt(log n log log n))).
Theorem (Elkin-Emek-S-Teng 04): every graph has a spanning tree of average stretch O(log^2 n log log n).
23. Vaidya's Augmented Spanning Trees
B: a spanning tree plus s edges, n - 1 + s edges in total
24. Adding Edges to a Tree
25. Adding Edges to a Tree
Partition the tree into t sub-trees, balancing stretch.
26. Adding Edges to a Tree
Partition the tree into t sub-trees, balancing stretch. For sub-trees connected in A, add one such bridge edge, carefully chosen.
27. Adding Edges to a Tree
Theorem: with t sub-trees, the condition number is bounded, in general, and is better if planar.
28. Sparsification
Feder-Motwani 91; Benczur-Karger 96; Eppstein, Galil, Italiano, Spencer 93; Eppstein, Galil, Italiano, Nissenzweig 97
All graphs can be well-approximated by a sparse graph.
29. Sparsification
Benczur-Karger 96: can find a sparse subgraph H s.t. every cut is preserved to within a 1 +/- epsilon factor (edges are a subset, but with different weights); H has O(n log n / epsilon^2) edges.
We need the stronger spectral guarantee: kappa(A, H) small.
30. Example: Complete Graph
If A is the Laplacian of K_n, all non-zero eigenvalues are n.
If B is the Laplacian of a Ramanujan expander, suitably scaled, all non-zero eigenvalues are close to n.
And so kappa(A,B) is close to 1.
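The complete-graph claim is easy to verify without an eigensolver: any vector orthogonal to the all-ones vector is an eigenvector of the K_n Laplacian with eigenvalue n. A small check:

```python
n = 5
# Laplacian of the complete graph K_n: degree n-1 on the diagonal, -1 off it.
L = [[(n - 1) if i == j else -1 for j in range(n)] for i in range(n)]

def matvec(M, v):
    return [sum(Mi[j] * v[j] for j in range(len(v))) for Mi in M]

# Any vector summing to zero is an eigenvector with eigenvalue n.
x = [1, -1, 0, 0, 0]
assert matvec(L, x) == [n * xi for xi in x]
```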
31. Example: Dumbbell
[Figure: two copies of K_n joined by a single middle edge]
If B does not contain the middle edge, B is disconnected and kappa(A,B) is unbounded.
32. Example: Grid plus edge
[Figure: grid with an added edge; edge weights (m-1)^2 and 1; cuts of value (m-1)^2 and k(m-1)]
- Random sampling not sufficient.
- Cut approximation not sufficient.
33. Conductance
Cut: a partition (S, V - S) of the vertices.
Conductance of S: Phi(S) = (weight of edges crossing the cut) / min(vol(S), vol(V - S)), where vol is the sum of degrees.
Conductance of G: Phi_G = min over S of Phi(S).
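The definition translates directly into code; a sketch using one common normalization (volume = sum of degrees), which may differ in constants from the talk's exact convention:

```python
def conductance(adj, S):
    """Conductance of vertex set S in an unweighted graph given as an
    adjacency dict: edges leaving S divided by min(vol(S), vol(V - S))."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    return cut / min(vol_S, vol_rest)

# Path 0-1-2-3: the cut {0, 1} has one crossing edge and volume 3
# on each side, so its conductance is 1/3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```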
34. Conductance
[Figure: example cuts S with differing conductance]
35. Conductance and Sparsification
If conductance is high (an expander): can precondition by random sampling.
If conductance is low: can partition the graph by removing few edges.
Decomposition: a partition of the vertex set that removes few edges, such that the graph on each part has high conductance.
36. Graph Decomposition
Lemma: there exists a partition of the vertices such that each V_i has large conductance and at most half the edges cross the partition.
Alg: sample the within-part edges; recurse on the crossing edges.
37. Graph Decomposition Exists
Each V_i has large conductance; at most half the edges cross.
Proof: let S be the largest set of small conductance; if a subset of V - S also had small conductance, S could be enlarged.
39. Graph Decomposition Exists
Proof (cont.): let S be the largest set of small conductance.
If S is big, V - S is not too big.
If S is small, only recurse in S.
Bounded recursion depth.
40. Sparsification from Graph Decomposition
Alg: sample the within-part edges; recurse on the crossing edges.
Theorem: there exists B with few edges and small kappa(A,B).
Need to find the partition in nearly-linear time.
41. Sparsification
Thm: given G, can find a subgraph H with few edges, in nearly-linear time.
Thm: for all A, can find a sparse preconditioner B.
General solve time: nearly linear.
42. Cheeger's Inequality (Sinclair-Jerrum 89)
For the Laplacian L and the diagonal matrix D with D_ii = degree of node i, the smallest non-zero eigenvalue lambda_2 of D^{-1/2} L D^{-1/2} satisfies
lambda_2 / 2 <= Phi_G <= sqrt(2 lambda_2).
43. Random Sampling
Given a graph, randomly sample to get a sparser graph, so that the spectral difference between the two (normalized by the diagonal matrix of degrees of the original) is small.
Useful if the conductance is big.
44. Useful if the conductance is big
If the sampled graph is spectrally close to the original, and the original has high conductance,
then, for all x orthogonal to the (mutual) nullspace, the two quadratic forms agree up to a small relative error.
45. But, don't want D in there
D has no impact here, so the normalized bound implies the unnormalized one.
46. Work with the adjacency matrix
A_ij = weight of the edge from i to j, and 0 if i = j.
No difference, because L = D - A.
47. Random Sampling Rule
Choose a parameter governing the sparsity of the sampled graph.
Keep edge (i,j) with probability p_ij.
If the edge is kept, raise its weight by a factor 1/p_ij.
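A sketch of the keep-and-reweight pattern, which makes the expected weight of every edge equal its original weight. The degree-based probability below is an illustrative guess; the exact formula on the slide is not recoverable here:

```python
import random

def sample_graph(edges, degree, k, seed=0):
    """Sparsify by random sampling: keep edge (i, j) with probability
    p_ij and, if kept, scale its weight by 1/p_ij so the expectation
    is unchanged.  p_ij = min(1, k / min(d_i, d_j)) is an assumed,
    plausible degree-based rule, with k governing the sparsity."""
    rng = random.Random(seed)
    kept = {}
    for (i, j), w in edges.items():
        p = min(1.0, k / min(degree[i], degree[j]))
        if rng.random() < p:
            kept[(i, j)] = w / p   # reweight to preserve expectation
    return kept
```

When k exceeds every degree, p_ij = 1 and the graph is returned unchanged, which gives a deterministic sanity check.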
48. Random Sampling Rule
The choice of p_ij guarantees the spectral bound,
and guarantees an expected near-linear number of edges in the sample.
49. Random Sampling Theorem
Theorem: the sampled graph is spectrally close with high probability.
Useful for graphs of high conductance.
We will prove the unweighted case; from now on, all edges have weight 1.
50. Analysis by Trace
Trace = sum of diagonal entries = sum of eigenvalues.
For even k, Tr(A^k) = sum_i lambda_i^k >= lambda_max^k.
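The identity Tr(A^k) = sum_i lambda_i^k is easy to check on a small symmetric matrix:

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_power(A, k):
    """Trace of A^k; for symmetric A this equals sum_i lambda_i^k,
    which for even k upper-bounds lambda_max^k."""
    M = A
    for _ in range(k - 1):
        M = matmul(M, A)
    return sum(M[i][i] for i in range(len(A)))

# Adjacency matrix of a single edge: eigenvalues +1 and -1,
# so Tr(A^k) = 2 for every even k.
A = [[0, 1], [1, 0]]
```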
51. Analysis by Trace
Main Lemma: a bound on the expected trace of a high power of the error matrix.
Proof of Theorem: apply Markov's inequality, then take the k-th root.
52. Expected Trace
[Figure: 4-vertex example graph with edges labeled a_{1,2}, a_{2,3}, a_{3,4}, a_{2,1}, a_{4,2}]
53. Expected Trace
Most terms are zero because each edge's sampled deviation has mean zero.
So, the sum is zero unless each edge appears at least twice. We will code such walks to count them.
54. Coding walks
For a sequence of vertices v_0, ..., v_k:
S = { i : the edge (v_{i-1}, v_i) was not used before, in either direction }
For i not in S, sigma(i) = the index of the step at which edge (v_{i-1}, v_i) was used before, in either direction.
55. Coding walks: Example
step: 0, 1, 2, 3, 4, 5, 6, 7, 8
vert: 1, 2, 3, 4, 2, 3, 4, 2, 1
S = {1, 2, 3, 4}, the steps introducing the new edges 1-2, 2-3, 3-4, 4-2
60. Valid sigmas
For each i in S: v_i is a neighbor of v_{i-1} with probability of being chosen < 1; that is, a_{v_{i-1}, v_i} can be non-zero.
For each i not in S: the walk must take the edge indicated by sigma(i).
61. Expected Trace from Code
62-63. Expected Trace from Code
There are a bounded number of ways to choose sigma given S.
64. Random Sampling Theorem
Theorem: the sampled graph is spectrally close with high probability.
Useful for graphs of high conductance.
Random sampling sparsifies graphs of high conductance.
65. Graph Partitioning Algorithms
SDP/LP: too slow.
Spectral: one cut quickly, but it can be unbalanced; may need many runs.
Multilevel (Chaco/Metis): can't analyze; misses small sparse cuts.
New alg: based on truncated random walks. Approximates the optimal balance, can be used to decompose, nearly-linear run time.
66. Lazy Random Walk
At each step, stay put with probability 1/2; otherwise, move to a neighbor chosen according to edge weight.
Diffusing probability mass: keep 1/2 for self, distribute the rest among the neighbors.
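One diffusion step of the lazy walk in the unweighted case (uniform spreading over neighbors); a minimal sketch:

```python
def lazy_walk_step(adj, p):
    """One lazy random walk step: each vertex keeps half its probability
    mass and spreads the other half uniformly over its neighbors."""
    q = {v: 0.5 * p.get(v, 0.0) for v in adj}
    for v in adj:
        share = 0.5 * p.get(v, 0.0) / len(adj[v])
        for u in adj[v]:
            q[u] += share
    return q

# On a single edge, mass 1 at one endpoint splits evenly after one step.
edge = {0: [1], 1: [0]}
```

Total mass is conserved at every step, since each vertex redistributes exactly what it holds.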
67-70. Lazy Random Walk
Diffusing probability mass: keep 1/2 for self, distribute the rest among the neighbors. Or, with self-loops, distribute evenly over the edges.
[Figure: successive diffusion steps on a small example graph, showing the mass at each vertex at each step]
71. Lazy Random Walk
[Figure: limiting masses 3/8, 2/8, 2/8, 1/8 on vertices of degree 3, 2, 2, 1]
In the limit, probability mass is proportional to degree.
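The limiting claim can be checked directly: the degree-proportional distribution is fixed by one lazy-walk step (unweighted case; the step function is repeated here so the sketch is self-contained):

```python
def lazy_walk_step(adj, p):
    """One lazy random walk step (unweighted, uniform spreading)."""
    q = {v: 0.5 * p[v] for v in adj}
    for v in adj:
        share = 0.5 * p[v] / len(adj[v])
        for u in adj[v]:
            q[u] += share
    return q

# A small graph with degrees 2, 3, 2, 1; stationary mass = degree / (2m).
adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2], 4: [2]}
total_degree = sum(len(ns) for ns in adj.values())
pi = {v: len(adj[v]) / total_degree for v in adj}
after = lazy_walk_step(adj, pi)
```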
72. Why lazy?
Otherwise, there might be no limit.
[Figure: on a single edge, the non-lazy walk oscillates between (0,1) and (1,0)]
73. Why so lazy as to keep 1/2 the mass?
Diagonally dominant matrices:
off-diagonal entries = weight of the edge from i to j; diagonal entries = degree of node i (when i = j).
74. Rate of convergence
Cheeger's inequality relates it to conductance: if conductance is low, convergence is slow.
If we start uniform on S, the probability of leaving at each step is at most Phi(S).
After O(1/Phi(S)) steps, at least 3/4 of the mass is still in S.
75. Lovász-Simonovits Theorem
If convergence is slow, then there is a low-conductance cut, and we can find the cut from the highest-probability nodes.
[Figure: vertex probabilities .137, .135, .134, .129, .112, .094]
76. Lovász-Simonovits Theorem
From now on, every node has the same degree, d.
For all vertex sets S, and all t <= T, the theorem bounds the mass on the k vertices with the most probability at step t.
77. Lovász-Simonovits Theorem
If we start from a node in a set S of small conductance, the walk will output a set of small conductance.
Extension: the output is mostly contained in S.
78. Speed of Lovász-Simonovits
If we want a cut of a given conductance, we can bound the number of steps needed.
We want run-time proportional to the vertices removed: local clustering on a massive graph.
Problem: most vertices can have non-zero mass by the time the cut is found.
79. Speeding up Lovász-Simonovits
Round all small entries of the walk vector to zero.
Algorithm Nibble: input a start vertex v and a target set size k; at each step, all values below a threshold map to zero.
Theorem: if v is in a size-k set of small conductance, Nibble outputs a set of small conductance, mostly overlapping it.
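A sketch of the truncation idea: diffuse, then zero out entries below a threshold so the support, and hence the work per step, stays small. The threshold `eps` here is a free parameter; the talk's exact threshold (in terms of the target size k) is not recoverable from the slides:

```python
def nibble_walk(adj, v, steps, eps):
    """Truncated lazy random walk sketch: after each diffusion step,
    drop entries below eps, so work is proportional to the support
    of the vector rather than to the whole graph."""
    p = {v: 1.0}
    for _ in range(steps):
        q = {}
        for u, mass in p.items():
            q[u] = q.get(u, 0.0) + 0.5 * mass
            share = 0.5 * mass / len(adj[u])
            for w in adj[u]:
                q[w] = q.get(w, 0.0) + share
        p = {u: m for u, m in q.items() if m >= eps}  # truncation step
    return p
```

With eps = 0 this is the exact lazy walk; a positive eps trades a little mass for a small support.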
80. The Potential Function
Defined at each integer k; linear in between these points.
81. Concave: slopes decrease
82. Easy Inequality
83. Fancy Inequality
84. Fancy Inequality
85. Chords with little progress
86. Dominating curve makes progress
87. Dominating curve makes progress
88. Proof of Easy Inequality
Order the vertices by probability mass at time t-1 and at time t.
If the top k at time t-1 only connect to the top k at time t, we get equality.
89. Proof of Easy Inequality (cont.)
Otherwise, some mass leaves, and we get a strict inequality.
90. External edges from Self Loops
Lemma: for every set S, and every set R of the same size, at least a Phi(S) fraction of the edges from S don't hit R.
Tight example: [Figure: sets S and R with Phi(S) = 1/2]
91. External edges from Self Loops
Lemma: for every set S, and every set R of the same size, at least a Phi(S) fraction of the edges from S don't hit R.
Proof: when R = S, this holds by definition. Each vertex in R - S can absorb d edges from S, but each vertex of S - R has d self-loops that do not go to R.
92. Proof of Fancy Inequality
At time t-1, split the mass reaching the top k at time t into "top", "in", and "out" portions; then
I_t(k) <= top + in <= top + (in + out)/2 = top/2 + (top + in + out)/2,
using in <= out, which follows from the self-loop lemma.
93. Local Clustering
Theorem: if S is a set of small conductance and v is a random vertex of S, then the algorithm outputs a set of small conductance, mostly in S, in time proportional to the size of the output.
94. Local Clustering
Can it be done for all conductances?
95. Experimental code: ClusTree
1. Cluster, crudely.
2. Make trees in the clusters.
3. Add edges between the trees, optimally.
No recursion on reduced matrices.
96. Implementation
ClusTree in Java (timing not included; could increase total time by 20%).
PCG: Cholesky, drop-tolerance incomplete Cholesky, Vaidya (TAUCS: Chen, Rotkin, Toledo).
Orderings: amd, genmmd, Metis, RCM.
Intel Xeon 3.06 GHz, 512 KB L2 cache, 1 MB L3 cache.
97. 2D grid, Neumann boundary
Run to residual error 10^-8.
98. Impact of boundary conditions
2D grid, Dirichlet
2D grid, Neumann
99. 2D Unstructured Delaunay Mesh
Dirichlet
Neumann
100. Future Work
Practical local clustering; more sparsification; other physical problems; Cheeger's inequality; implications for combinatorics, and spectral hypergraph theory.
101. To learn more
Nearly-Linear Time Algorithms for Graph Partitioning, Sparsification, and Solving Linear Systems (STOC 04, arXiv).
Will be split into two papers: numerical and combinatorial.
My lecture notes for Spectral Graph Theory and its Applications.