Title: Dynamic Load Balancing (Repartitioning) and Matrix Partitioning
Workshop on Combinatorial Scientific Computing and Petascale Simulations 2008
June 10-13, 2008, Santa Fe, NM
Dynamic Load Balancing (Repartitioning) and Matrix Partitioning
Ümit V. Çatalyürek
Associate Professor
Department of Biomedical Informatics
Department of Electrical and Computer Engineering
The Ohio State University
OSU's CSCAPES Contributions
- Load Balancing
- Parallel Static Load Balancing
- Parallel Dynamic Load Balancing
- Parallel Graph Coloring
- Distance-1 coloring
- Distance-2 coloring
- See the talk by Bozdag on Friday morning
- Parallel Matrix Partitioning
- Parallel Matrix Ordering
Umit Catalyurek "Dynamic Load Bal. Matrix
Partitioning"
CSCAPES Workshop, June 10, 2008
Roadmap
- Dynamic Load Balancing
- Motivation
- Background
- Classification of Repartitioning Techniques
- Graph and Hypergraph Approaches
- New Hypergraph Model for Dynamic Load Balancing
- Parallel Multilevel Hypergraph Partitioning with Fixed Vertices
- Experimental Results Summary
- Matrix Partitioning
- 1D Hypergraph-based Methods: Row-wise and Column-wise
- 2D Hypergraph-based Methods: Fine-grain, Jagged-Like, Checkerboard
- Experimental Results Summary
Partitioning and Load Balancing
- Goal: assign data to processors to
- minimize application runtime
- maximize utilization of computing resources
- Metrics:
- minimize processor idle time (balance workloads)
- keep inter-processor communication costs low
- Partitioning impacts the performance of a wide range of simulations
Dynamic Load Balancing/Repartitioning
- Applications whose workload or locality changes during the simulation require dynamic load balancing (a.k.a. repartitioning)
- Adaptive mesh refinement
- Particle methods
- Contact detection
- Repartitioning has an additional cost
- Moving data from the old to the new decomposition
- $T_{\text{execution}} = \text{iter} \times (T_{\text{computation}} + T_{\text{communication}}) + T_{\text{repart}} + T_{\text{migration}}$
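The execution-time model can be evaluated directly to see why the migration term matters. A minimal sketch, assuming all times share one (arbitrary) unit and that `iters` is the number of application iterations between two repartitionings; the cost numbers and the two strategy labels are hypothetical, chosen only to show the trade-off, not measured.

```python
def total_execution_time(iters: int, t_comp: float, t_comm: float,
                         t_repart: float, t_migration: float) -> float:
    """T_execution = iter * (T_computation + T_communication) + T_repart + T_migration."""
    return iters * (t_comp + t_comm) + t_repart + t_migration


# Hypothetical costs (arbitrary time units) for two repartitioning strategies:
# "scratch" gives lower per-iteration communication but moves much more data;
# "incremental" keeps most data in place at the price of slightly higher communication.
scratch = dict(t_comp=100, t_comm=20, t_repart=50, t_migration=300)
incremental = dict(t_comp=100, t_comm=30, t_repart=50, t_migration=50)

for iters in (10, 100):
    print(iters,
          total_execution_time(iters, **scratch),
          total_execution_time(iters, **incremental))
```

With few iterations between repartitionings the migration term dominates and the incremental strategy is cheaper (1400 vs. 1550); with many iterations the per-iteration communication dominates and the scratch strategy wins (12350 vs. 13100).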
Classification of Dynamic Load Balancing Approaches
Graph and Hypergraph Partitioning
Impact of Hypergraph Models (Where a Graph Is Not Sufficient)
- Greater expressiveness → greater applicability
- Structurally non-symmetric systems
- circuits, biology
- Rectangular systems
- linear programming, least-squares methods
- Non-homogeneous, highly connected topologies
- circuits, nanotechnology, databases
- Multiple models for different partitioning granularities
- Owner-computes, fine-grain, checkerboard/Cartesian, Mondriaan
- Accurate communication model → lower application communication costs
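To see why an accurate communication model matters, consider row-wise partitioning of a small symmetric matrix. The sketch below uses a hypothetical 3-vertex example and compares the usual graph-model estimate (twice the edge cut, one word per direction per cut edge) with the exact volume, in which each vertex value is sent once to every distinct remote part that needs it. The exact quantity is what the hypergraph connectivity metric counts; the function names are illustrative only.

```python
# Hypothetical symmetric sparsity pattern as adjacency lists (diagonal omitted):
# vertex 0 is coupled to vertices 1 and 2.
adj = {0: [1, 2], 1: [0], 2: [0]}
part = {0: 0, 1: 1, 2: 1}  # vertex -> processor


def graph_edge_cut(adj, part):
    """Number of undirected edges whose endpoints lie on different processors."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])


def exact_volume(adj, part):
    """Words sent: each vertex value goes once to each distinct remote part needing it."""
    return sum(len({part[v] for v in adj[u]} - {part[u]}) for u in adj)


print(2 * graph_edge_cut(adj, part), exact_volume(adj, part))
```

The graph estimate (4 words) over-counts: vertex 0's value is needed by two vertices that both live on processor 1, yet it is sent there only once, so the true volume is 3 words.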
Hypergraph Model
- $\lambda_i$ = number of parts that hyperedge $e_i$ connects
- $\mathrm{cut}(H) = \sum_i c(e_i)\,(\lambda_i - 1)$
- cut = total communication volume
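The connectivity-minus-one metric is easy to compute for a given partition: every hyperedge (net) $e_i$ with cost $c(e_i)$ that touches $\lambda_i$ distinct parts contributes $c(e_i)(\lambda_i - 1)$. A minimal sketch, assuming nets are given as lists of vertex ids and the partition as a list indexed by vertex; the example hypergraph is hypothetical.

```python
from typing import Sequence


def connectivity_minus_one_cut(nets: Sequence[Sequence[int]],
                               costs: Sequence[int],
                               part: Sequence[int]) -> int:
    """cut(H) = sum_i c(e_i) * (lambda_i - 1), where lambda_i is the number of
    distinct parts touched by hyperedge (net) e_i."""
    total = 0
    for net, cost in zip(nets, costs):
        lam = len({part[v] for v in net})
        total += cost * (lam - 1)
    return total


# Hypothetical hypergraph with 6 vertices and 3 nets, partitioned into 3 parts.
nets = [[0, 1, 2], [2, 3], [1, 3, 5]]
costs = [1, 2, 3]
part = [0, 0, 1, 1, 2, 2]
print(connectivity_minus_one_cut(nets, costs, part))
```

Here the first net spans 2 parts (contributes 1), the second stays inside one part (contributes 0), and the third spans 3 parts (contributes 3 × 2 = 6), for a cut of 7.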
Hypergraph Repartitioning
- Start with the application hypergraph
- Add
- one partition vertex for each partition
- migration edges connecting each application vertex to the partition vertex of its current partition
- Weight the hyperedges
- Migration edge weight = size of the application object (migration size)
- Application edge weight = size of the communicated elements
- Scale application edge weights by α = number of application communications between repartitionings (iter)
- Perform hypergraph partitioning with the partition vertices fixed
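The augmented hypergraph can be sketched in a few lines. This is a minimal illustration under stated assumptions: nets are lists of vertex ids, the partition vertex for part p gets the hypothetical id n + p and is fixed to part p, α is the number of application communications between repartitionings, and the partitioner itself is not shown; we only evaluate the objective for two candidate partitions. With the partition vertices fixed, the connectivity-minus-one cut of the augmented hypergraph equals α times the application communication volume plus the migration volume.

```python
from typing import List, Sequence, Tuple


def cut(nets, costs, part) -> int:
    """Connectivity-1 cut: sum of cost * (number of parts a net touches - 1)."""
    return sum(c * (len({part[v] for v in net}) - 1) for net, c in zip(nets, costs))


def build_repartition_hypergraph(n: int, K: int,
                                 app_nets: Sequence[Sequence[int]],
                                 app_costs: Sequence[int],
                                 sizes: Sequence[int],
                                 old_part: Sequence[int],
                                 alpha: int) -> Tuple[List[List[int]], List[int]]:
    """Augment an application hypergraph with K partition vertices (ids n..n+K-1)
    and one migration net per application vertex."""
    nets, costs = [], []
    for net, c in zip(app_nets, app_costs):
        nets.append(list(net))
        costs.append(alpha * c)              # application net weight scaled by alpha
    for v in range(n):
        nets.append([v, n + old_part[v]])    # migration net: vertex + its old partition vertex
        costs.append(sizes[v])               # weight = migration size of the object
    return nets, costs


def repartition_cost(new_part: Sequence[int], K: int, nets, costs) -> int:
    """Cut of the augmented hypergraph; partition vertex n+p is fixed to part p."""
    part = list(new_part) + list(range(K))
    return cut(nets, costs, part)


# Hypothetical 4-vertex chain with poor locality in the old 2-way partition.
n, K = 4, 2
app_nets, app_costs = [[0, 1], [1, 2], [2, 3]], [1, 1, 1]
sizes, old_part = [5, 5, 5, 5], [0, 1, 0, 1]
for alpha in (1, 10):
    hg_nets, hg_costs = build_repartition_hypergraph(n, K, app_nets, app_costs,
                                                     sizes, old_part, alpha)
    print(alpha, repartition_cost(old_part, K, hg_nets, hg_costs),
          repartition_cost([0, 0, 1, 1], K, hg_nets, hg_costs))
```

With α = 1, keeping the old partition is cheaper (3 vs. 11); with α = 10, migrating two objects to cut communication pays off (20 vs. 30). This is the communication/migration trade-off the model is designed to capture.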
Implementation of Hypergraph Repartitioning
- Implemented in the Zoltan toolkit
- Based on the parallel multilevel hypergraph partitioner with recursive bisection (IPDPS'06)
- with added capability for handling fixed vertices
- Automatically constructs the augmented hypergraph
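One way to honor fixed vertices under recursive bisection is to translate a vertex's fixed part into the side it must take at every bisection level. The sketch below illustrates that idea under the assumption that each bisection splits a part range [lo, hi) at mid = (lo + hi) // 2; it is not a description of Zoltan's internal code, and all function names are hypothetical.

```python
from typing import Dict, List


def sides_for_fixed_part(p: int, K: int) -> List[int]:
    """Sides (0 = lower half of the part range, 1 = upper half) that a vertex fixed
    to part p must take at each level of recursive bisection over parts [0, K)."""
    lo, hi, sides = 0, K, []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if p < mid:
            sides.append(0)
            hi = mid
        else:
            sides.append(1)
            lo = mid
    return sides


def part_from_sides(sides: List[int], K: int) -> int:
    """Follow the bisection choices back down to a final part number."""
    lo, hi = 0, K
    for s in sides:
        mid = (lo + hi) // 2
        lo, hi = (lo, mid) if s == 0 else (mid, hi)
    return lo


def fixed_sides_at_level(fixed: Dict[int, int], lo: int, hi: int) -> Dict[int, int]:
    """For one bisection of the part range [lo, hi), map each vertex fixed inside
    that range to the side it is constrained to."""
    mid = (lo + hi) // 2
    return {v: (0 if p < mid else 1) for v, p in fixed.items() if lo <= p < hi}


print(sides_for_fixed_part(5, 8))  # [1, 0, 1]
print(fixed_sides_at_level({10: 0, 11: 3, 12: 6}, 0, 8))
```

Because the same split rule is used when constraining and when decoding, a vertex fixed to part p always ends in part p, including for non-power-of-two K; free vertices remain unconstrained at every level.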
Experimental Results
- Experiments on
- OSU-RI cluster
- 64 compute nodes connected with InfiniBand
- Dual 2.4 GHz AMD Opteron processors with 8 GB RAM
- Sandia Thunderbird cluster
- 4,480 compute nodes connected with InfiniBand
- Dual 3.6 GHz Intel EM64T processors with 6 GB RAM
- Zoltan v3 (alpha) hypergraph partitioner
- ParMETIS v3.1 graph partitioner
- Test problems
- 2DLipid: density functional theory, 4K x 4K, 5.6M nonzeros
- Xyce ASIC Stripped: 680K x 680K, 2.3M nonzeros
- Cage14: DNA electrophoresis, 1.5M x 1.5M, 27M nonzeros
Communication Volume
[Figure panels: Xyce, 2DLipid, Cage14]
- Hypergraph-based methods yield lower communication volume
- Zoltan-repart trades communication against migration to minimize total cost
- Scratch methods are comparable for large α (iter)
Dynamic Graph Partitioning Time on T-bird (Thunderbird)
[Figure panels: 2DLipid, Xyce, Cage14]
Summary of Dynamic Load Balancing
- A novel hypergraph model for dynamic load balancing
- A single hypergraph that incorporates both the application's communication volume and the data migration cost
- Performs better than or comparably to graph-based dynamic load balancing
- A parallel dynamic load balancing tool
- Essential for peta-scale applications
- Scales similarly to graph-based tools
- Future work
- There is always room for improvement in speed and/or quality
- Direct k-way refinement
Matrix Partitioning
- Hypergraph Models for Sparse-Matrix Partitioning
- 1D
- row-wise
- column-wise
- 2D
- Fine-grain
- Jagged-like
- Checkerboard
- Serial tool: PaToH Matlab interface
- Matrix Partitioning
- Partitioned Matrix Display
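In the fine-grain model, each nonzero is a vertex and each row and each column is a net, so the volume of a nonzero-to-processor assignment for y = Ax is the connectivity-minus-one sum over all row nets (folding partial y sums) and column nets (expanding x entries). A minimal sketch, assuming each x_j and y_i is owned by a processor that holds a nonzero in that column or row; the 2 x 2 example is hypothetical.

```python
from collections import defaultdict
from typing import Sequence, Tuple


def fine_grain_volume(nonzeros: Sequence[Tuple[int, int]],
                      part: Sequence[int]) -> int:
    """Fine-grain model: vertex k = nonzero nonzeros[k]; each row i and each
    column j is a net. Volume = sum over all row and column nets of (lambda - 1)."""
    row_parts, col_parts = defaultdict(set), defaultdict(set)
    for k, (i, j) in enumerate(nonzeros):
        row_parts[i].add(part[k])   # row net: partial sums of y_i to fold
        col_parts[j].add(part[k])   # column net: x_j to expand
    return (sum(len(s) - 1 for s in row_parts.values())
            + sum(len(s) - 1 for s in col_parts.values()))


# Hypothetical 2 x 2 dense matrix, nonzeros listed row by row.
nz = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(fine_grain_volume(nz, [0, 0, 1, 1]))   # row-wise-like assignment
print(fine_grain_volume(nz, [0, 1, 1, 0]))   # "diagonal" assignment
```

The row-wise-like assignment needs only the two expand words (volume 2), while the diagonal assignment cuts every row and column net (volume 4); the fine-grain partitioner searches this larger space directly.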
1D Partitioning
- M x N matrices with K processors
- Worst case
- Total volume: (K-1) x N words (row-wise) or (K-1) x M words (column-wise)
- Total number of messages: K x (K-1)
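For row-wise 1D partitioning, the column-net model makes each column j a net over the rows with a nonzero in column j, so x_j must be sent to λ_j − 1 other parts. A minimal sketch, assuming x_j is owned by one of the parts holding a row in that net; the dense example reproduces the (K-1) x N worst case.

```python
from collections import defaultdict
from typing import Sequence, Tuple


def rowwise_volume(nonzeros: Sequence[Tuple[int, int]],
                   row_part: Sequence[int]) -> int:
    """Column-net model for row-wise 1D partitioning: column j is a net over the
    rows that have a nonzero in column j; x_j is sent to lambda_j - 1 parts."""
    col_parts = defaultdict(set)
    for i, j in nonzeros:
        col_parts[j].add(row_part[i])
    return sum(len(s) - 1 for s in col_parts.values())


# Worst case: a dense M x N matrix with one row block per processor.
M, N, K = 4, 3, 4
dense = [(i, j) for i in range(M) for j in range(N)]
print(rowwise_volume(dense, [0, 1, 2, 3]), (K - 1) * N)
```

Both values are 9: in a dense matrix every column net spans all K parts. A diagonal matrix, by contrast, needs no communication at all. Column-wise partitioning is the transpose case, using row nets and giving the (K-1) x M bound.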
2D Partitioning: Jagged-Like
- M x N matrices with K = P x Q processors
- Worst case
- Total volume: (K-P) x N + (Q-1) x M
- Total number of messages: K x (K-Q) + K x (Q-1) = K x (K-1)
2D Partitioning: Checkerboard
- M x N matrices with K = P x Q processors
- Worst case
- Total volume: (P-1) x N + (Q-1) x M
- Total number of messages: K x (P+Q-2)
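The worst-case volume bounds of the 1D and 2D schemes can be compared numerically. A minimal sketch, assuming the bounds (K-1) x N (row-wise), (K-1) x M (column-wise), (K-P) x N + (Q-1) x M (jagged-like) and (P-1) x N + (Q-1) x M (checkerboard) for K = P x Q processors; the matrix size and processor grid are hypothetical, and message counts are deliberately left out.

```python
def worst_case_volume(M: int, N: int, P: int, Q: int) -> dict:
    """Worst-case total communication volume (words) for an M x N matrix on
    K = P x Q processors, for each partitioning scheme."""
    K = P * Q
    return {
        "1D row-wise": (K - 1) * N,
        "1D column-wise": (K - 1) * M,
        "2D jagged-like": (K - P) * N + (Q - 1) * M,
        "2D checkerboard": (P - 1) * N + (Q - 1) * M,
    }


for scheme, words in worst_case_volume(M=1000, N=2000, P=8, Q=8).items():
    print(f"{scheme:16s} {words}")
```

For a 1000 x 2000 matrix on an 8 x 8 grid, the bounds are 126000, 63000, 119000 and 21000 words respectively: the checkerboard bound grows with P + Q rather than with K, which is the main attraction of 2D schemes at large processor counts. These are worst-case bounds only; actual hypergraph-partitioned volumes are typically far lower.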
cage5
Experimental Results
- Tested 1,413 matrices (out of 1,877) from the UFL Sparse Matrix Collection
- rows > 500 and columns > 500
- nonzeros < 10,000,000
- K-way partitioning for K = 4, 16, 64 and 256
- If 50 x K > max(rows, columns)
- Partitioning instance = (matrix, K)
- For each partitioning instance we run the RW, CW, JL, CH and FG methods
- Linux cluster
- 64 nodes, dual 2.4 GHz Opteron CPUs, 8 GB RAM
Experimental Results: Total Communication Volume Performance Profiles
[Figure panels: All Instances (4040); Square Symmetric (2231)]
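Performance profiles summarize many instances at once. Assuming the standard definition of Dolan and Moré, for each method the profile value at τ is the fraction of instances on which that method's cost is within a factor τ of the best method on that instance. A minimal sketch; the two methods' results below are hypothetical, and treating a zero best cost as "only an exact tie counts" is our own assumption.

```python
import math
from typing import Dict, List, Sequence


def performance_profile(costs: Dict[str, Sequence[float]],
                        taus: Sequence[float]) -> Dict[str, List[float]]:
    """For each method, the fraction of instances on which its cost is within
    a factor tau of the best method on that instance."""
    methods = list(costs)
    n = len(costs[methods[0]])
    best = [min(costs[m][i] for m in methods) for i in range(n)]
    profile = {}
    for m in methods:
        ratios = []
        for i in range(n):
            if best[i] == 0:
                # Zero best cost (e.g. no communication): only an exact tie counts.
                ratios.append(1.0 if costs[m][i] == 0 else math.inf)
            else:
                ratios.append(costs[m][i] / best[i])
        profile[m] = [sum(r <= t for r in ratios) / n for t in taus]
    return profile


# Hypothetical total-volume results for two methods on three instances.
results = {"RW": [10, 20, 30], "FG": [12, 10, 30]}
print(performance_profile(results, taus=[1.0, 1.5, 2.0]))
```

A curve that rises quickly toward 1 indicates a method that is rarely far from the best; the value at τ = 1 is the fraction of instances on which the method is (tied for) best.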
Experimental Results: Total Communication Volume
[Figure panels: Square Non-symmetric (1102); Rectangular (707), split into N > M (662) and M > N (45)]
- For N > M, CW is better than RW
Experimental Results: Total Number of Messages
Experimental Results: Execution Time
Summary of Matrix Partitioning
- Hypergraph models for matrix partitioning
- Well... some are not new, but they have not yet been adopted by applications. Why? (An information dissemination problem? Lack of tools?)
- More hypergraph-based methods are being developed!
- Corner model
- Hybrid of Mondriaan with fine-grain
- Matlab interface to PaToH for matrix partitioning
- Currently supports RW, CW, JL, CH, FG
- Will be available soon
- Work in progress
- Parallel matrix partitioning via Zoltan
Thanks
- Contact Info
- umit_at_bmi.osu.edu
- http://bmi.osu.edu/umit
- Also
- http://www.cs.sandia.gov/Zoltan/
- http://www.cscapes.org/