Title: Adaptive Parallelization Strategies using Data-driven Objects
- Laxmikant Kale
- David Padua
Outline
- Quench and solidification codes
- Coarse grain parallelization of the quench code
- Adaptive parallelization techniques
- Dynamic variations
- Adaptive load balancing
- Finite element framework with adaptivity
- Preliminary results
OpenMP
Coarse grain parallelization
- Structure of the current sequential quench code
- 2-D array of elements (each independently refined)
- Dependence within a row
- Independent rows, but they share global variables
- Parallelization using Charm++
- 3 hours of effort (after a false start)
- About 20 lines of changes to the F90 code
- A 100-line Charm++ wrapper
Performance results
Contributors:
- Engineering: N. Sobh, R. Haber
- Computer Science: M. Bhandarkar, R. Liu, L. Kale
Adaptive Strategies
- Advanced codes model dynamic and irregular behavior
- Solidification: adaptive grid refinement
- Quench: complex dependencies, parallelization within elements
- To parallelize these effectively, adaptive runtime strategies are necessary
Multi-partition decomposition using objects
- Idea: decompose the problem into a number of partitions, independent of the number of processors
- Partitions > Processors
- The system maps partitions to processors
- The system should be able to map and re-map objects as needed
Charm++
- A parallel C++ library
- Supports data-driven objects
- Singleton objects, object arrays, groups, ...
- Many objects per processor, with method execution scheduled by the availability of data
- System supports automatic instrumentation and object migration
- Works with other paradigms: MPI, OpenMP, ...
Data-driven execution in Charm++
[Figure: two processors, each with its own scheduler and message queue]
Load Balancing Framework
- Aimed at handling ...
- Continuous (slow) load variation
- Abrupt load variation (refinement)
- Workstation clusters in multi-user mode
- Measurement-based
- Exploits temporal persistence of computation and communication structures
- Very accurate (compared with estimation)
- Instrumentation possible via Charm++/Converse
Object balancing framework
Utility of the framework: workstation clusters
- Cluster of 8 machines
- One machine gets another job
- Parallel job slows down on all machines
- Using the framework:
- Detection mechanism
- Migrate objects away from overloaded processor
- Restored almost the original throughput!
Higher-level framework
[Figure: architecture diagram with components: automatic conversion from MPI; cross-module interpolation; Structured, FEM, and MPI-on-Charm++ (Irecv) frameworks; framework path; load database; load balancer; migration path; Charm++; Converse]
Example application
- Crack propagation (P. Geubelle et al.)
- Similar in structure to Quench components
- 1900 lines of F90
- Rewritten in C using the FEM framework
- 1000 lines of C code
- Parallelization handled entirely by the framework
Crack propagation code, C version, with 70k elements
Crack propagation: preliminary results
The test artificially emulates the load imbalance created by extra work near the crack (e.g., due to adaptive refinement). Data obtained on 8 processors of an SGI Origin 2000.
Summary and Planned Research
- Use the adaptive FEM framework
- To parallelize the Quench code further
- Quadtree-based solidification code
- First phase: parallelize each phase separately
- Parallelize across refinement phases
- Refine the FEM framework
- Use feedback from applications
- Support for implicit solvers and multigrid