Adaptive Parallelization Strategies using Data-driven Objects

About This Presentation

Slides: 21

Provided by: laxmika

Learn more at: http://charm.cs.uiuc.edu

Transcript and Presenter's Notes

1
Adaptive Parallelization Strategies using
Data-driven Objects

Laxmikant Kale
David Padua

2
Outline

Quench and solidification codes
Coarse grain parallelization of the quench code
Adaptive parallelization techniques
Dynamic variations
Adaptive load balancing
Finite element framework with adaptivity
Preliminary results

3
OpenMP
6
Coarse grain parallelization

Structure of current sequential quench code
2-D array of elements (each independently refined)
Dependence within each row
Rows are independent, but share global variables
Parallelization using Charm++
3 hours of effort (after a false start)
About 20 lines of change to the F90 code
A 100-line Charm++ wrapper
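
The row structure described on this slide can be sketched as follows. This is a minimal illustration in Python, not the quench code or its Charm++ wrapper: `process_row`, the running-sum dependence, and the thread pool are all assumptions made for the example. Each row is swept sequentially because of the within-row dependence, while separate rows run as independent coarse-grain tasks.

```python
from concurrent.futures import ThreadPoolExecutor

def process_row(row):
    """Sweep one row left to right; each element depends on its left neighbor.

    The running sum is a hypothetical stand-in for the real within-row dependence.
    """
    out = []
    carry = 0.0
    for value in row:
        carry = carry + value
        out.append(carry)
    return out

def process_grid(grid, workers=4):
    """Run rows as independent tasks; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_row, grid))

grid = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
result = process_grid(grid)
```

The sketch passes each row its own data and keeps no shared state. In a code where rows share global variables, as noted above, those globals would first have to be made per-row or otherwise protected.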

7
Performance results
Contributors
Engineering: N. Sobh, R. Haber
Computer Science: M. Bhandarkar, R. Liu, L. Kale
8
Adaptive Strategies

Advanced codes model dynamic and irregular behavior
Solidification: adaptive grid refinement
Quench: complex dependencies, parallelization within elements
To parallelize these effectively, adaptive runtime strategies are necessary

9
Multi-partition decomposition using objects

Idea: decompose the problem into a number of partitions, independent of the number of processors
Number of partitions > number of processors
The system maps partitions to processors
The system should be able to map and re-map objects as needed
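
A minimal sketch of the idea, assuming a 1-D range of element indices and a round-robin initial placement. The function names `decompose`, `initial_map`, and `remap` are hypothetical and are not part of Charm++. The point it illustrates is that the partition count is chosen independently of the processor count, and that moving a partition changes only the mapping, not the partition.

```python
def decompose(num_elements, num_partitions):
    """Split element indices 0..num_elements-1 into contiguous partitions."""
    base, extra = divmod(num_elements, num_partitions)
    partitions = []
    start = 0
    for p in range(num_partitions):
        # The first `extra` partitions take one additional element.
        size = base + (1 if p < extra else 0)
        partitions.append(range(start, start + size))
        start += size
    return partitions

def initial_map(num_partitions, num_processors):
    """Round-robin placement: partition p starts on processor p % num_processors."""
    return {p: p % num_processors for p in range(num_partitions)}

def remap(mapping, partition, new_processor):
    """Return a new mapping with one partition moved to another processor."""
    updated = dict(mapping)
    updated[partition] = new_processor
    return updated

partitions = decompose(num_elements=100, num_partitions=16)
mapping = initial_map(num_partitions=16, num_processors=4)
mapping2 = remap(mapping, partition=5, new_processor=0)
```

Here 16 partitions are placed on 4 processors, so each processor holds several partitions and the runtime has room to move individual ones later.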

10
Charm++

A parallel C++ library
Supports data driven objects: singleton objects, object arrays, groups, ..
Many objects per processor, with method execution scheduled with availability of data
System supports automatic instrumentation and object migration
Works with other paradigms: MPI, OpenMP, ..

11
Data driven execution in Charm++
[Figure labels: Scheduler, Message Q (each shown twice)]
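
The scheduler and message queue named on this slide can be illustrated with a small sketch. It is a single-process toy in Python, not the Charm++ runtime: the `Scheduler` and `Counter` classes and their methods are hypothetical. It shows the data-driven pattern in which sending only enqueues a message, and an object's method runs when the scheduler picks that message from the queue.

```python
from collections import deque

class Counter:
    """A hypothetical data-driven object: its methods run only when a message arrives."""
    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount

class Scheduler:
    """Holds a message queue and the objects it serves (one scheduler per processor here)."""
    def __init__(self):
        self.queue = deque()
        self.objects = {}

    def register(self, name, obj):
        self.objects[name] = obj

    def send(self, name, method, *args):
        # Sending only enqueues; nothing executes until run() picks the message up.
        self.queue.append((name, method, args))

    def run(self):
        """Deliver queued messages in arrival order; return how many were executed."""
        executed = 0
        while self.queue:
            name, method, args = self.queue.popleft()
            getattr(self.objects[name], method)(*args)
            executed += 1
        return executed

sched = Scheduler()
sched.register("a", Counter())
sched.register("b", Counter())
sched.send("a", "add", 2)
sched.send("b", "add", 5)
sched.send("a", "add", 3)
executed = sched.run()
```

Because no object blocks waiting for a particular message, many objects can share one processor and whichever has data available runs next.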
12
Load Balancing Framework

Aimed at handling ...
Continuous (slow) load variation
Abrupt load variation (refinement)
Workstation clusters in multi-user mode
Measurement based
Exploits temporal persistence of computation and communication structures
Very accurate (compared with estimation)
instrumentation possible via Charm++/Converse
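
A measurement-based strategy can be sketched as below. The slide does not say which algorithm the framework uses; the greedy heaviest-first placement here is an assumption chosen for illustration, and `greedy_rebalance` is a hypothetical name. The sketch relies on the persistence idea above: loads measured over recent iterations are taken as a prediction for the next ones.

```python
import heapq

def greedy_rebalance(measured_load, num_processors):
    """Place objects heaviest first, each on the currently least loaded processor.

    measured_load: dict of object id -> time measured over recent iterations.
    Returns (mapping of object id -> processor, list of per-processor loads).
    """
    heap = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(heap)
    mapping = {}
    # Sort by descending load; ties are broken by object id for determinism.
    for obj, load in sorted(measured_load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, proc = heapq.heappop(heap)
        mapping[obj] = proc
        heapq.heappush(heap, (total + load, proc))
    loads = [0.0] * num_processors
    for obj, proc in mapping.items():
        loads[proc] += measured_load[obj]
    return mapping, loads

# Hypothetical measurements: two heavy objects (e.g. refined regions) and four light ones.
measured = {0: 4.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 4.0}
mapping, loads = greedy_rebalance(measured, num_processors=2)
```

This sketch ignores communication between objects and the cost of migration, both of which a real balancer would need to weigh.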

13
Object balancing framework
14
Utility of the framework: workstation clusters

Cluster of 8 machines
One machine gets another job
Parallel job slows down on all machines
Using the framework
Detection mechanism
Migrate objects away from overloaded processor
Restored almost original throughput!
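
The effect can be worked through with rough arithmetic. The numbers below are illustrative assumptions, not the measured results from the talk: the extra job is assumed to halve one machine's speed, and each step is assumed to finish only when the slowest machine finishes.

```python
def iteration_time(work_per_machine, speeds):
    """Time per step when every machine must finish before the next step starts."""
    return max(w / s for w, s in zip(work_per_machine, speeds))

total_work = 8.0
normal = [1.0] * 8
loaded = [0.5] + [1.0] * 7  # assumed: the extra job halves one machine's speed

baseline = iteration_time([total_work / 8] * 8, normal)
static = iteration_time([total_work / 8] * 8, loaded)

# After migration, each machine's share is proportional to its available speed.
shares = [total_work * s / sum(loaded) for s in loaded]
rebalanced = iteration_time(shares, loaded)

static_throughput = baseline / static          # fraction of original throughput
rebalanced_throughput = baseline / rebalanced  # fraction of original throughput
```

Under these assumptions the unbalanced job runs at half its original throughput, while the rebalanced job runs at 7.5/8 of it. Real objects are discrete, so the proportional shares can only be approximated.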

15
Higher level framework
[Diagram labels: Automatic conversion from MPI; Cross module interpolation; Structured; FEM; MPI-on-Charm++; Irecv; Framework path; Load database, balancer; Migration path; Charm++; Converse]
16
Example application

Crack propagation (P. Geubelle et al.)
Similar in structure to Quench components
1900 lines of F90
Rewritten using FEM framework in C++
1000 lines of C++ code
Parallelization completely by the framework

17
Crack propagation code, C++ version, with 70k elements
18
Crack propagation: preliminary results
Artificially emulates load imbalance created by extra work near the crack (e.g. due to adaptive refinement)
Data obtained on 8 processors of Origin 2000
19
Summary and Planned Research