Title: Graph Partitioning and its Application to the Node-Rank Mapping Problem
1Graph Partitioning and its Application to the Node-Rank Mapping Problem
- Gaurav Khanna
- Dr. Rahul Garg
- Dr. Nisheeth Vishnoi
2RoadMap
- Node-Rank Mapping Problem
- Graph Partitioning
- KRV: A new algorithm for graph partitioning
- Graph Partitioning Results
- Approaches for the node-rank mapping
- Mapping Results
- Conclusions/Future Work
3Node-Rank Mapping Problem
[Figure: mapping of the tasks p0, p1, p2, p3 of a parallel program onto the processors P0, P1, P2, P3]
4Node-Rank Mapping Goals
- Processor utilization
- Ideally, all processors have equal computation and communication costs
- Minimization of inter-processor communication
- Can be posed as a graph partitioning problem
- Graph G(V, E)
- Each task is represented by a vertex
- Vertex weights represent computational costs
- Each edge denotes communication between a pair of tasks
- Edge weights represent communication costs
- Goal
- Partition the vertices into P parts such that each part has equal total vertex weight
- Minimize the weight of the edges cut
- Problem is NP-hard
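Both goals can be measured for any candidate partition. The following is a minimal Python sketch, assuming the graph is given as dictionaries of vertex weights and edge weights; the function name `evaluate_partition` and the 4-task example are illustrative and do not come from the slides.

```python
from collections import defaultdict

def evaluate_partition(vertex_weight, edge_weight, part):
    """Return (edge_cut, loads) for a partition.

    vertex_weight: {vertex: computational cost}
    edge_weight:   {(u, v): communication cost}, one entry per undirected edge
    part:          {vertex: index of the part the vertex is assigned to}
    """
    loads = defaultdict(float)
    for v, w in vertex_weight.items():
        loads[part[v]] += w          # load balance: total vertex weight per part
    edge_cut = sum(w for (u, v), w in edge_weight.items() if part[u] != part[v])
    return edge_cut, dict(loads)

# Hypothetical example: 4 tasks, 2 processors.
vertex_weight = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
edge_weight = {(0, 1): 5.0, (1, 2): 1.0, (2, 3): 5.0, (0, 3): 1.0}
part = {0: 0, 1: 0, 2: 1, 3: 1}
edge_cut, loads = evaluate_partition(vertex_weight, edge_weight, part)
```

Here the two heavy edges stay inside a part and only the two light edges are cut, while both parts carry the same load.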
5Node-Rank Mapping / Graph Partitioning
[Figure: three example assignments of a task graph to processors P0 and P1]
- Load balance and minimizing communication are often competing forces
6Graph Partitioning - Metis
- A widely used partitioning tool
- Goal
- Minimize edge cut
- Balance the sum of vertex weights in each partition as much as possible
- Uses a multilevel partitioning algorithm
- Coarsening phase
- Initial partitioning phase
- Uncoarsening phase
- KL-type (Kernighan-Lin) refinement algorithm
7Graph Partitioning The Sparsest Cut Problem
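Find a cut that minimizes the ratio of the weight of the edges crossing the cut to the size of the smaller side.
For a cut $(S, T)$ with $T = V \setminus S$, let $W(S,T)$ be the total weight of the edges between $S$ and $T$.
Sparsity: $\phi(S) = \dfrac{W(S,T)}{\min(|S|, |T|)}$
Minimize $\phi(S)$ over all cuts $(S, T)$.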
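The definition can be made concrete with a small Python sketch. It assumes the graph is a list of weighted edges `(u, v, w)` on vertices `0..n-1`; the exhaustive search is only there to illustrate the objective on a tiny graph and is not how KRV or Metis work.

```python
from itertools import combinations

def sparsity(edges, n, S):
    """Weight of the edges crossing (S, V minus S) divided by the size of the smaller side."""
    S = set(S)
    crossing = sum(w for u, v, w in edges if (u in S) != (v in S))
    return crossing / min(len(S), n - len(S))

def brute_force_sparsest_cut(edges, n):
    """Try every side of size 1..n/2; exponential, for illustration only."""
    best = None
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            value = sparsity(edges, n, S)
            if best is None or value < best[0]:
                best = (value, set(S))
    return best

# Hypothetical example: two triangles joined by the single edge (2, 3).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
best_value, best_side = brute_force_sparsest_cut(edges, 6)
```

Separating the two triangles cuts one edge with three vertices on the smaller side, so the sparsest cut has value 1/3.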
8KRV algorithm for the sparsest cut problem
- Khandekar-Rao-Vazirani (KRV), STOC 2006
- Graph partitioning using single commodity flows
- $O(\log^2 n)$ approximation to sparsity using $O(\log^2 n)$ single commodity max-flow computations
- Key idea
- Expanders
- Single commodity flows
- Runtime complexity of $O(n^{3/2})$
- Comparison to previous approaches
- Leighton-Rao '88, based on multi-commodity flows: $O(\phi \log n)$
- Alon-Milman '85, based on spectral methods: $O(\sqrt{\phi})$
- Yield better approximations but take $O(n^2)$ time
- KRV is faster
- Yields a poly-logarithmic approximation guarantee
9Main Theorem
- Given a graph G(V,E) on n vertices and a parameter $\alpha$ (the slide's bound relating $\alpha$ to 1 has lost its comparison symbol), there exists an algorithm that
- either outputs a cut of sparsity at most $\alpha$,
- or proves that every cut has sparsity at least $\alpha / \log^2 n$
- Procedure to output a cut
- Employ a binary search on the sparsity value $\alpha$
- Start with the midpoint of the range of $\alpha$
- If a cut is found with sparsity $\alpha$, lower $\alpha$
- If no cut is found in $O(\log^2 n)$ iterations, increase $\alpha$
- Output the cut with the least value of $\alpha$
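The binary search over the sparsity value can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical `oracle(alpha)` that stands in for the inner rounds of the algorithm: it returns a cut when one of sparsity at most `alpha` is found and `None` otherwise. The stopping tolerance `tol` is also an assumption.

```python
def binary_search_sparsity(oracle, alpha_min, alpha_max, tol=1e-3):
    """Shrink [alpha_min, alpha_max] and remember the cut found at the lowest alpha."""
    best_cut, best_alpha = None, None
    while alpha_max - alpha_min > tol:
        alpha = (alpha_max + alpha_min) / 2
        cut = oracle(alpha)
        if cut is not None:
            best_cut, best_alpha = cut, alpha
            alpha_max = alpha      # cut found: lower alpha
        else:
            alpha_min = alpha      # no cut found: increase alpha
    return best_cut, best_alpha

# Toy oracle (assumption): pretends the graph's sparsest cut has sparsity 1/3.
TRUE_SPARSITY = 1 / 3

def toy_oracle(alpha):
    return {0, 1, 2} if alpha > TRUE_SPARSITY else None

cut, alpha = binary_search_sparsity(toy_oracle, 0.0, 1.0)
```

With the toy oracle the search converges to a value just above 1/3, the smallest sparsity at which a cut is still reported.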
10KRV algorithm Pseudo code
Procedure KRV(G(V,E))
    H = (V, Empty)
    While (alpha_max > alpha_min)
        alpha = (alpha_max + alpha_min) / 2
        num_iterations = O(log^2 n)
        for (i = 0; i < num_iterations; i++)
            Vector = GenRandomOrthogonalVector(V)
            S = FindBisection(H, Vector, V)
            F = CreateFlowNetwork(G, S, alpha)
            flow = MaxFlow(F)
            If (flow == MAXFLOW)
                M = GenerateMatching(F, flow)
                Add Matching to H
                continue
            else
                (branch body not shown on this slide)

Procedure FindBisection(H, Vector, V)
    for each matching M in H
        for each pair (i, j) which belongs to M
            Vector[i] = Vector[j] = (Vector[i] + Vector[j]) / 2
    Output the indexes of the n/2 smallest values of Vector
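The FindBisection step can be written out in Python as a minimal sketch. It assumes that "orthogonal" means orthogonal to the all-ones vector (entries summing to zero) and that H is stored as a list of matchings, each a list of vertex pairs; neither detail is stated on the slide.

```python
import random

def gen_random_orthogonal_vector(n, rng):
    """Random vector whose entries sum to zero (assumption: orthogonal to all-ones)."""
    r = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(r) / n
    return [x - mean for x in r]

def find_bisection(matchings, vector):
    """Average the vector across every matched pair, then take the n/2 smallest entries."""
    u = list(vector)
    for matching in matchings:          # matchings in the order they were added to H
        for i, j in matching:
            u[i] = u[j] = (u[i] + u[j]) / 2
    order = sorted(range(len(u)), key=lambda idx: u[idx])
    return set(order[: len(u) // 2])

# Hypothetical example with a hand-picked vector and one matching.
S = find_bisection([[(0, 1)]], [0, 10, 1, 2, 3, 4, 5, 6])
random_vector = gen_random_orthogonal_vector(8, random.Random(0))
```

In the example, averaging the pair (0, 1) moves both entries to 5, so the four smallest values belong to vertices 2, 3, 4 and 5.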
11KRV
[Figure: G(V,E) and H(V,F,w), with n/2 vertices on each side of the bisection]
- Assign each edge corresponding to the original graph a capacity of $1/\alpha$
- Assign each dotted edge a weight of 1
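The flow network can be sketched in Python. The sketch assumes an unweighted graph, attaches a hypothetical "source" node to every vertex of S and a "sink" node to every vertex of T with unit capacity (the dotted edges), and uses a plain Edmonds-Karp augmenting-path routine to keep the example short; it is not the max-flow routine used in the authors' implementation.

```python
from collections import defaultdict, deque

def create_flow_network(edges, S, T, alpha):
    """Residual capacities: graph edges get 1/alpha, source/sink edges get 1."""
    cap = defaultdict(lambda: defaultdict(float))
    for u, v in edges:
        cap[u][v] += 1.0 / alpha
        cap[v][u] += 1.0 / alpha
    for u in S:
        cap["source"][u] += 1.0
    for v in T:
        cap[v]["sink"] += 1.0
    return cap

def max_flow(cap, s="source", t="sink"):
    """Edmonds-Karp: repeatedly augment along a shortest residual path."""
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical example: two triangles joined by the edge (2, 3); n/2 = 3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
S, T = {0, 1, 2}, {3, 4, 5}
flow_cut = max_flow(create_flow_network(edges, S, T, alpha=0.5))
flow_full = max_flow(create_flow_network(edges, S, T, alpha=0.25))
```

With alpha = 0.5 the bridge edge carries at most 2 units, so the flow falls short of n/2 = 3 and a cut of sparsity below alpha exists (here 1/3). With alpha = 0.25 all 3 units can be routed.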
12KRV algorithm Pseudo code
Procedure KRV(G(V,E))
    H = empty
    While (alpha_max > alpha_min)
        alpha = (alpha_max + alpha_min) / 2
        num_Iterations = O(log^2 n)
        for (i = 0; i < num_Iterations; i++)
            Vector = GenRandomOrthogonalVector(V)
            S = FindBisection(H, Vector, V)
            F = CreateFlowNetwork(G, S, alpha)
            flow = MaxFlow(F)
            If (flow == MAXFLOW)
                M = GenerateMatching(F, flow)
                Add Matching to H
                continue
            else
                (branch body not shown on this slide)
13KRV
[Figure: G(V,E) and H(V,F,w)]
16KRV algorithm Pseudo code
Procedure KRV(G(V,E))
    While (alpha_max > alpha_min)
        alpha = (alpha_max + alpha_min) / 2
        Num_Iterations = O(log^2 n)
        for (i = 0; i < Num_Iterations; i++)
            Vector = Generate_Random_Orthogonal_Vector(V)
            S = FindBisection(Vector, V)
            F = CreateFlowNetwork(G, S, alpha)
            flow = MaxFlow(F)
            If (flow == MAXFLOW)
                M = Generate_Matching(F, flow)
                continue
            else
                alpha_max = alpha
17KRV
[Figure: G(V,E) and H(V,F,w)]
20KRV
[Figure: two panels, each showing G(V,E) and H(V,F,w)]
21KRV
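[Figure: a cut (S, T) of the flow network, with labels S, T, k and l]
Assume $|S| \le |T|$, so that $\min(|S|, |T|) = |S|$.
Cut-size $= \frac{n}{2} - k + l + E(S,T) < \frac{n}{2}$, where $E(S,T) = (\#\text{cut-edges}) / \alpha$
$\Rightarrow E(S,T) < k - l \le |S|$
Sparsity of cut $= (\#\text{cut-edges} \cdot 1) / \min(|S|, |T|)$
Therefore, sparsity of cut $< \alpha$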
22Our Implementation of KRV
- Employs Dinic's max-flow algorithm for computing maximum flows
- $O(n^2 m)$ complexity
- Matching generation algorithm
- Greedy approach
- Iteratively find paths from source to sink with non-zero flow and match the corresponding vertices from the two partitions
- KRV yields cuts while trying to minimize sparsity
- But we need balanced cuts
- More applicability
- e.g. parallel computing, VLSI layouts, sparse linear system solving
- For eventual application to the node-rank mapping problem
- Run-time reduces significantly
- KRV_Balanced
- Yields a 1/3-2/3 balanced cut
- Both partitions have at least n/3 vertices
- Call KRV recursively, each time on the bigger partition
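One way to read the KRV_Balanced bullets is sketched below in Python. The slides do not give the procedure, so this is an assumption: keep cutting the bigger side and accumulate the smaller sides until the accumulated side holds at least n/3 vertices. `find_cut` is a hypothetical stand-in for one KRV call and must return two non-empty, disjoint sets covering its input; the graph is assumed to have at least 2 vertices.

```python
def balanced_cut(vertices, find_cut):
    """Accumulate the smaller sides until they hold at least n/3 vertices."""
    n = len(vertices)
    small_side = set()
    remaining = set(vertices)
    while 3 * len(small_side) < n:
        a, b = find_cut(remaining)
        smaller, bigger = (a, b) if len(a) <= len(b) else (b, a)
        small_side |= smaller
        remaining = bigger          # next call runs on the bigger partition
    return small_side, remaining

# Toy cut routine (assumption): splits off the lowest-numbered vertex.
def toy_cut(remaining):
    lowest = min(remaining)
    return {lowest}, remaining - {lowest}

small, large = balanced_cut(range(9), toy_cut)
```

Each added piece is at most half of what remains, so the accumulated side stops below 2n/3 and both final sides keep at least n/3 vertices.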
23Graph Partitioning Results
- Comparison across three schemes
- KRV_Balanced
- Metis (default balance of 1/2-1/2)
- Metis (input with the balance obtained by KRV_Balanced)
- Classes of graphs
- Benchmark graphs obtained from the graph partitioning archive
- http://staffweb.cms.gre.ac.uk/c.walshaw/partition/
- Graphs based on power-law degree distributions
- R-MAT: A recursive model for graph mining
- Degree distributions of the Internet
- Graphs representing dense components connected sparsely
24Benchmark Graphs
25Benchmark Graphs
26Graphs based on power-law degree distributions
27Graphs based on power-law degree distributions
28Graphs: Dense components connected sparsely
29Graphs: Dense components connected sparsely
30Approaches to solve the Node-Rank Mapping Problem
- Goal
- Obtain a map of the graph vertices onto a torus
- Minimize the cost function $\sum_{k,j} C(k,j)\, H(m(k), m(j))$
- Two-phase approach
- Linear 1-D arrangement of the graph vertices
- Embedding the linear arrangement onto a d-dimensional torus
- Linear arrangement of vertices
- Employ the KRV algorithm
- Apply recursively on each smaller sub-graph
- Eventually, obtain a one-dimensional ordering of vertices
- Vertices closer together in the ordering have high communication
- Minimize a metric which has a similar form to the designated cost function
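The recursive construction of the linear arrangement can be sketched in Python. Two assumptions are made: a brute-force balanced bisection stands in for the KRV cut (usable only on tiny graphs), and the linear metric is taken to be the sum of C(k,j) times the distance between k and j in the ordering, which the slides describe only as having "a similar form".

```python
from itertools import combinations

def min_cut_bisection(vertices, comm):
    """Brute-force balanced bisection; a stand-in for the KRV cut."""
    vertices = list(vertices)
    inside = set(vertices)
    best_cut, best_left = None, None
    for left in combinations(vertices, len(vertices) // 2):
        left_set = set(left)
        cut = sum(c for (k, j), c in comm.items()
                  if k in inside and j in inside and (k in left_set) != (j in left_set))
        if best_cut is None or cut < best_cut:
            best_cut, best_left = cut, left_set
    left = [v for v in vertices if v in best_left]
    right = [v for v in vertices if v not in best_left]
    return left, right

def linear_arrangement(vertices, comm):
    """Bisect recursively and concatenate the orderings of the two halves."""
    vertices = list(vertices)
    if len(vertices) <= 1:
        return vertices
    left, right = min_cut_bisection(vertices, comm)
    return linear_arrangement(left, comm) + linear_arrangement(right, comm)

def arrangement_cost(order, comm):
    """Assumed linear metric: sum of C(k, j) * |position(k) - position(j)|."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(c * abs(pos[k] - pos[j]) for (k, j), c in comm.items())

# Hypothetical communication pattern: heavy pairs (0, 2) and (1, 3), light pair (2, 1).
comm = {(0, 2): 10, (1, 3): 10, (2, 1): 1}
order = linear_arrangement([0, 1, 2, 3], comm)
```

The heavily communicating pairs end up adjacent in the ordering, which lowers the linear metric compared with the original numbering.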
31Approaches to solve the Node-Rank Mapping Problem
- Map the vertices onto the torus using a space-filling curve
- Generate a curve through a d-dimensional mesh
- Any two vertices differing by a distance k along the curve are at a distance $O(d\,k^{1/d})$ in the mesh
- Map the linear ordering of vertices onto this curve
- Each vertex is mapped onto its corresponding point
- Cost of embedding in the mesh is at most O(d) times the cost in the linear arrangement
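The embedding step can be illustrated with a 2-D Hilbert curve. The slides do not say which space-filling curve was used, so the Hilbert curve here is an assumption, chosen because it is a common curve with the locality property described above. The sketch handles a square mesh whose side is a power of two and ignores torus wrap-around links.

```python
def hilbert_point(side, d):
    """Return the (x, y) cell visited at step d of a Hilbert curve on a side x side mesh.

    `side` must be a power of two and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def embed(order, side):
    """Place the i-th vertex of the linear ordering on the i-th point of the curve."""
    return {v: hilbert_point(side, i) for i, v in enumerate(order)}

# Hypothetical ordering of four vertices on a 2 x 2 mesh.
mapping = embed([3, 0, 2, 1], side=2)
```

Consecutive vertices in the ordering land on neighbouring mesh cells, which is what keeps the mesh cost close to the linear-arrangement cost.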
32Illustration
[Figure: recursive bisection of the original graph G (vertices 1 to 7) into sub-graphs G1 and G2, then into G11, G12, G21 and G22, ending in the final map of the vertices]
33Illustration contd..
[Figure: the final map of vertices 1 to 7 and the resulting embedding]
34Results
- Existing work
- Optimizing task layout on the BlueGene/L supercomputer
- Bhanot et al., IBM J. Res. Dev., Vol. 49, No. 2/3, March/May 2005
- SA (simulated annealing) based approach to optimize job layout
- Mapping m(j): torus node location where MPI task j is mapped
- H(m(k), m(j)): number of hops on the torus between the mappings of task k and task j
- F = free energy = $\sum_{k,j} C(k,j)\, H(m(k), m(j))$
- Minimize F by a series of swaps between randomly chosen torus positions
- The KRV + space-filling scheme does not perform as well as SA
- SA directly tries to optimize the cost function and therefore performs better
- KRV + space-filling does it in a two-step fashion
- Moreover, the bounds for space-filling curves have been proved for d-dimensional meshes
- The cost function is based on a torus topology
- Another variant
- Use the map obtained by KRV + space-filling as an initial map
- Apply annealing
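The free energy F and a swap-based annealing loop can be sketched in Python. This is a generic simulated-annealing illustration, not the schedule of Bhanot et al.: the temperature `t0`, the geometric `cooling` factor, the step count and the fixed seed are all assumptions, and the communication matrix C is stored as a dictionary keyed by task pairs.

```python
import math
import random

def torus_hops(a, b, dims):
    """Hops between coordinates a and b on a torus with the given dimensions."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def free_energy(comm, mapping, dims):
    """F = sum over task pairs of C(k, j) * H(m(k), m(j))."""
    return sum(c * torus_hops(mapping[k], mapping[j], dims) for (k, j), c in comm.items())

def anneal(comm, mapping, dims, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Swap the positions of two random tasks; accept by the Metropolis rule."""
    rng = random.Random(seed)
    tasks = list(mapping)
    current = dict(mapping)
    cost = free_energy(comm, current, dims)
    best, best_cost = dict(current), cost
    temp = t0
    for _ in range(steps):
        a, b = rng.sample(tasks, 2)
        current[a], current[b] = current[b], current[a]
        new_cost = free_energy(comm, current, dims)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(current), cost
        else:
            current[a], current[b] = current[b], current[a]   # undo the swap
        temp *= cooling
    return best, best_cost

# Hypothetical example: 8 tasks communicating in a ring, scrambled on an 8-node 1-D torus.
ring = {(k, (k + 1) % 8): 1.0 for k in range(8)}
dims = (8,)
scrambled = {task: (pos,) for task, pos in zip(range(8), [3, 6, 0, 5, 2, 7, 1, 4])}
initial_cost = free_energy(ring, scrambled, dims)
best_map, best_cost = anneal(ring, scrambled, dims)
```

Passing the map produced by another scheme as `mapping` corresponds to the variant that uses it as the initial map before annealing.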
35Alternate Approach
- Another approach employed in Bhanot et al.
- Apply a graph partitioner to divide the original graph into sub-graphs
- Choose a sub-graph and apply SA on it
- Map the sub-graph onto the torus
- Repeat the same procedure with the next sub-graph and the remaining available torus
- Uses Metis to partition
- The idea is to reduce the runtime without significant degradation in performance
36Alternate Approach
- Graph partitioning followed by annealing
- Compare Metis vs. KRV applied to the above approach
- Annealing optimizes each sub-graph separately
- Oblivious to the edges crossing the cut
- Map needs to be optimized further
- We apply a hill-climbing style local search heuristic
- In iteration i
- Swap neighbors separated by distance i
- If the cost improves, commit this move
- Else, reject the move
- Local optimization heuristic
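The swap heuristic can be sketched in Python. The slides do not say over which positions the distance i is measured, so this sketch assumes a list of tasks indexed by position and a caller-supplied cost function; `max_distance` and the toy cost are also assumptions.

```python
def local_search(order, cost, max_distance):
    """Hill climbing: in iteration i, try swapping every pair of positions i apart."""
    order = list(order)
    best = cost(order)
    for i in range(1, max_distance + 1):
        for p in range(len(order) - i):
            order[p], order[p + i] = order[p + i], order[p]
            candidate = cost(order)
            if candidate < best:
                best = candidate                                  # commit the move
            else:
                order[p], order[p + i] = order[p + i], order[p]   # reject the move
    return order, best

# Toy cost (assumption): tasks 0-1-2-3 communicate along a path with unit weights.
path = {(0, 1): 1, (1, 2): 1, (2, 3): 1}

def path_cost(order):
    pos = {v: idx for idx, v in enumerate(order)}
    return sum(c * abs(pos[k] - pos[j]) for (k, j), c in path.items())

improved, improved_cost = local_search([0, 2, 1, 3], path_cost, max_distance=2)
```

Only strictly improving swaps are kept, so the cost never increases and the search ends at a local optimum for the tried distances.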
37Benchmarks
- NAS parallel benchmarks
- SP, BT, LU, CG, MG
- Standard communication benchmarks
- Smg2000
- Parallel semi-coarsening multi-grid solver for linear systems
- Umt2k
- 3D, deterministic, multi-group, photon transport code for unstructured meshes
- Collecting the communication matrices
- All applications were linked to the MPI tracing library
- Run on BlueGene/L bgd machines
- Generates communication matrices in dense format
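A dense communication matrix can be turned into the weighted task graph used for partitioning. The Python sketch below assumes that entry `matrix[k][j]` holds the traffic sent from task k to task j and that traffic in both directions is summed onto one undirected edge; the actual layout of the tracing library's output is not given in the slides.

```python
def dense_to_edges(matrix):
    """Return {(k, j): weight} with k < j for every pair that communicates."""
    n = len(matrix)
    edges = {}
    for k in range(n):
        for j in range(k + 1, n):
            weight = matrix[k][j] + matrix[j][k]   # combine both directions
            if weight:
                edges[(k, j)] = weight
    return edges

# Hypothetical 3-task matrix: task 0 sends 4 units to task 1, task 1 sends 2 back.
matrix = [
    [0, 4, 0],
    [2, 0, 0],
    [0, 0, 0],
]
edges = dense_to_edges(matrix)
```

Pairs with no traffic produce no edge, so a mostly empty dense matrix yields a sparse graph.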
38Results
39Conclusions/Future Work
- Graph partitioning
- Empirical results show that the KRV algorithm outperforms Metis in terms of cut quality
- For certain benchmark graphs and power-law graphs
- Holds good promise
- Explore the utility of KRV in other problem domains
- Node-rank mapping
- For the cases where KRV cuts are better, the resulting maps are also better
- Runtime of KRV is significantly higher
- Choice of the max-flow algorithm
- Avoid running all $O(\log^2 n)$ iterations by checking whether the graph H is already an expander
- Graph sparsification (Benczúr et al.)
- Issues in improving the mapping scheme further
- Choice of the local search heuristic
- Choice of the objective function