Adaptive MPI

Title: Adaptive Load Balancing for MPI Programs
Author: Milind A. Bhandarkar
Last modified by: Milind Bhandarkar
Created Date: 5/21/2001 7:23:36 PM

Slides: 25

1
Adaptive MPI
  • Milind A. Bhandarkar
  • (milind@cs.uiuc.edu)

2
Motivation
  • Many CSE applications exhibit dynamic behavior
  • Adaptive mesh refinement (AMR)
  • Pressure-driven crack propagation in solid
    propellants
  • Also, non-dedicated supercomputing platforms such
    as clusters affect processor availability
  • These factors cause severe load imbalance
  • Can the parallel language / runtime system help?

3
Load Imbalance in Crack-propagation Application
(More and more cohesive elements activated
between iterations 35 and 40.)
4
Load Imbalance in an AMR Application
(Mesh is refined at 25th time step)
5
Multi-partition Decomposition
  • Basic idea
  • Decompose the problem into a number of
    partitions,
  • Independent of the number of processors
  • Partitions > processors
  • The system maps partitions to processors
  • The system maps and re-maps objects as needed
  • Re-mapping strategies help adapt to dynamic
    variations
  • To make this work, we need
  • Load balancing framework and runtime support
  • But isn't there a high overhead of
    multi-partitioning?
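
The idea on this slide can be sketched in a few lines. The slides contain no code, so Python is used here for brevity. The greedy heaviest-first heuristic, the function names map_partitions and processor_loads, and the load numbers are illustrative assumptions, not the strategies of the actual load balancing framework.

```python
import heapq

def map_partitions(loads, num_procs):
    """Greedy sketch: place the heaviest partition on the least-loaded processor."""
    heap = [(0.0, proc) for proc in range(num_procs)]
    heapq.heapify(heap)
    mapping = {}
    for part in sorted(range(len(loads)), key=lambda i: -loads[i]):
        load, proc = heapq.heappop(heap)
        mapping[part] = proc
        heapq.heappush(heap, (load + loads[part], proc))
    return mapping

def processor_loads(loads, mapping, num_procs):
    """Total load that a given mapping puts on each processor."""
    totals = [0.0] * num_procs
    for part, proc in mapping.items():
        totals[proc] += loads[part]
    return totals

# Hypothetical run: 8 equal partitions on 2 processors.
loads = [1.0] * 8
mapping = map_partitions(loads, 2)

# Dynamic behavior (for example, mesh refinement) makes two partitions heavier.
loads[0] = loads[2] = 4.0
old_load = processor_loads(loads, mapping, 2)      # stale mapping
new_mapping = map_partitions(loads, 2)             # re-map using measured loads
new_load = processor_loads(loads, new_mapping, 2)
print(old_load, new_load)
```

With these made-up loads the stale mapping leaves the two processors at 10 and 4 units of work, while re-mapping brings both to 7. Re-mapping only has room to work because there are more partitions than processors.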

6
Overhead of Multi-partition Decomposition
(Crack Propagation code, with 70k elements)
7
Charm++
  • Supports data driven objects.
  • Singleton objects, object arrays, groups, ...
  • Many objects per processor, with method execution
    scheduled with availability of data.
  • Supports object migration, with automatic
    forwarding.
  • Excellent results with highly irregular dynamic
    applications.
  • Molecular dynamics application NAMD: speedup of
    1250 on 2000 processors of ASCI Red.
  • Brunner, Phillips, Kale. Scalable molecular
    dynamics, Gordon Bell finalist, SC2000.

8
Charm++ System-Mapped Objects
9
Load Balancing Framework
10
However
  • Many CSE applications are written in Fortran with MPI
  • Conversion to a parallel object-based language
    such as Charm++ is cumbersome
  • Message-driven style requires split-phase
    transactions
  • Often results in a complete rewrite
  • How to convert existing MPI applications without
    extensive rewrite?

11
Solution
  • Each partition implemented as a user-level thread
    associated with a message-driven object
  • Communication library for these threads same in
    syntax and semantics as MPI
  • But what about the overheads associated with
    threads?
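
A rough sketch of this design, with Python generators standing in for user-level threads. Each rank is written in a blocking send/receive style, while a message-driven scheduler decides which rank runs next on the processor. The names Scheduler, spawn, send, run, and ring are hypothetical and chosen for this illustration. They are not the AMPI or Charm++ interface.

```python
from collections import deque

class Scheduler:
    """Runs many ranks on one processor; a rank resumes when a message arrives."""

    def __init__(self):
        self.threads = {}      # rank -> generator (stand-in for a user-level thread)
        self.mailboxes = {}    # rank -> queue of (source, payload)

    def spawn(self, rank, generator):
        self.threads[rank] = generator
        self.mailboxes[rank] = deque()

    def send(self, dest, source, payload):
        # Buffered send: deposit the message and return immediately.
        self.mailboxes[dest].append((source, payload))

    def run(self):
        runnable = deque((rank, None) for rank in self.threads)
        blocked = set()
        while runnable:
            rank, message = runnable.popleft()
            try:
                self.threads[rank].send(message)   # run until the next receive
                blocked.add(rank)
            except StopIteration:
                pass                               # this rank has finished
            for waiting in sorted(blocked):
                if self.mailboxes[waiting]:
                    blocked.remove(waiting)
                    runnable.append((waiting, self.mailboxes[waiting].popleft()))
        return blocked                             # ranks still waiting (deadlock)

def ring(rank, size, sched, results):
    # Reads like blocking message passing: send to the right, receive from the left.
    sched.send((rank + 1) % size, rank, rank * 10)
    source, payload = yield                        # "blocking" receive
    results[rank] = (source, payload)

size = 4
results = {}
sched = Scheduler()
for rank in range(size):
    sched.spawn(rank, ring(rank, size, sched, results))
deadlocked = sched.run()
print(results, deadlocked)
```

The point of the sketch is that a blocking receive suspends only the calling thread. The processor keeps working on whichever other rank already has data available.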

12
AMPI Threads vs. MD Objects (1D Decomposition)
13
AMPI Threads vs. MD Objects (3D Decomposition)
14
Thread Migration
  • Thread stacks may contain references to local
    variables
  • May not be valid upon migration to a different
    address space
  • Solution: thread stacks should span the same
    virtual address space on any processor where they
    may migrate (Isomalloc)
  • Split the virtual address space into per-processor
    allocation pools
  • Scalability issues
  • Not important on 64-bit processors
  • Constrained load balancing (limit each thread's
    migratability to fewer processors)
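
The address arithmetic behind this scheme can be sketched as follows. The region base, region sizes, stack size, and function names are assumptions picked for illustration, not values taken from the actual Isomalloc implementation.

```python
MIB = 1 << 20
GIB = 1 << 30

def isomalloc_pools(region_start, region_size, num_procs):
    """Split one reserved virtual-address region into equal per-processor pools."""
    pool_size = region_size // num_procs
    return [(region_start + p * pool_size, region_start + (p + 1) * pool_size)
            for p in range(num_procs)]

def owner_of(address, pools):
    """Which processor's pool an address was allocated from."""
    for proc, (low, high) in enumerate(pools):
        if low <= address < high:
            return proc
    raise ValueError("address outside the reserved region")

def stacks_per_processor(region_size, num_procs, stack_size):
    """How many fixed-size thread stacks fit in one processor's pool."""
    return (region_size // num_procs) // stack_size

# Hypothetical layout: 1 GiB reserved at 0x40000000, shared by 4 processors.
pools = isomalloc_pools(0x40000000, 1 * GIB, 4)

# A stack allocated by processor 2 lies in a range no other processor allocates
# from, so the same addresses are free wherever the thread migrates.
stack_address = pools[2][0]
print(hex(stack_address), owner_of(stack_address, pools))

# Scalability: the same split with 1024 processors and 1 MiB stacks.
small = stacks_per_processor(1 * GIB, 1024, 1 * MIB)     # 32-bit sized region
large = stacks_per_processor(1 << 46, 1024, 1 * MIB)     # 64-bit sized region
print(small, large)
```

Under these assumed sizes, the 32-bit sized region leaves room for a single stack per processor at 1024 processors, while the 64-bit sized region leaves room for 65536. This is the scalability issue the slide mentions and the reason it matters less on 64-bit machines.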

15
AMPI Issues: Thread-safety
  • Multiple threads mapped to each processor
  • Process data to be localized
  • Make them instance variables of a class
  • All subroutines become instance methods of this
    class
  • AMPIzer: a source-to-source translator
  • Based on Polaris front-end
  • Recognize all global variables
  • Put them in a thread-private area
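
A small sketch of why process-wide data breaks once several ranks share a process, and of the transformation described above. The slides apply this to Fortran programs. Python is used here only to keep the example short, and the names shared, RankState, and step are hypothetical.

```python
# Before: process-wide data. Every rank running in this process sees one copy.
shared = {"iteration_count": 0}

def step_shared():
    shared["iteration_count"] += 1
    return shared["iteration_count"]

# After: the former globals become instance variables, one copy per rank,
# and the subroutines that used them become methods.
class RankState:
    def __init__(self, rank):
        self.rank = rank
        self.iteration_count = 0

    def step(self):
        self.iteration_count += 1
        return self.iteration_count

# Two ranks mapped to the same process, three steps each.
for rank in range(2):
    for _ in range(3):
        last_shared = step_shared()

ranks = [RankState(rank) for rank in range(2)]
for state in ranks:
    for _ in range(3):
        state.step()
private_counts = [state.iteration_count for state in ranks]

print(last_shared, private_counts)
```

With the shared copy, each rank's three steps are mixed into a single counter that ends at 6. With private state each rank correctly counts 3.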

16
AMPI Issues: Data Migration
  • Thread-private data needs to be migrated with the
    thread
  • Developer has to write subroutines for packing
    and unpacking data
  • Writing separate subroutines is error-prone
  • Puppers (pup = pack-unpack)
  • A subroutine to show the data to the runtime
    system
  • Fortran90 generic procedures make writing the
    pupper easy
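
A minimal sketch of the pupper idea. A single routine lists the data once, and the mode of the helper object decides whether that listing is used for sizing, packing, or unpacking. The Puper class, its method names, and the byte layout are assumptions made for this illustration. The actual AMPI and Charm++ interface is not shown in the slides.

```python
import struct

SIZING, PACKING, UNPACKING = "sizing", "packing", "unpacking"

class Puper:
    """Hypothetical helper: one traversal serves sizing, packing, and unpacking."""

    def __init__(self, mode, data=b""):
        self.mode = mode
        self.data = bytearray(data)
        self.offset = 0
        self.size = 0

    def _raw(self, fmt, values):
        nbytes = struct.calcsize(fmt)
        if self.mode == SIZING:
            self.size += nbytes
        elif self.mode == PACKING:
            self.data += struct.pack(fmt, *values)
        else:
            values = struct.unpack_from(fmt, self.data, self.offset)
            self.offset += nbytes
        return values

    def integer(self, value):
        return self._raw("<q", (value,))[0]

    def doubles(self, values):
        count = self.integer(len(values))   # the length travels with the data
        return list(self._raw("<%dd" % count, values))

class Chunk:
    """Thread-private data of one partition (illustrative fields)."""

    def __init__(self, step=0, temperatures=()):
        self.step = step
        self.temperatures = list(temperatures)

    def pup(self, p):
        # The only routine the developer writes: show each field to the runtime.
        self.step = p.integer(self.step)
        self.temperatures = p.doubles(self.temperatures)

original = Chunk(step=7, temperatures=[1.5, 2.5, 3.5])

sizer = Puper(SIZING)
original.pup(sizer)                       # how many bytes are needed?

packer = Puper(PACKING)
original.pup(packer)                      # serialize for migration

restored = Chunk()
restored.pup(Puper(UNPACKING, packer.data))   # rebuild on the destination
print(sizer.size, restored.step, restored.temperatures)
```

Because the fields are listed in one place, the packing and unpacking orders cannot drift apart, which is the error the slide attributes to writing separate subroutines.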

17
AMPI: Other Features
  • Automatic checkpoint and restart
  • On a different number of processors
  • Number of chunks remains the same, but they can be
    mapped to a different number of processors
  • No additional work is needed
  • Same pupper used for migration is also used for
    checkpointing and restart
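
A sketch of restarting on a different number of processors. The checkpoint is keyed by chunk, not by processor, so the same chunks can be handed out to however many processors are available at restart. Here pickle stands in for the pupper, and the round-robin placement, function names, and chunk contents are assumptions for illustration only.

```python
import pickle

def checkpoint(procs):
    """Save every chunk's state under its chunk id, wherever it currently runs."""
    saved = {}
    for chunks in procs:
        for chunk_id, state in chunks.items():
            saved[chunk_id] = pickle.dumps(state)
    return saved

def restart(saved, num_procs):
    """Restore the same chunks onto a possibly different number of processors."""
    procs = [{} for _ in range(num_procs)]
    for chunk_id in sorted(saved):
        procs[chunk_id % num_procs][chunk_id] = pickle.loads(saved[chunk_id])
    return procs

# Hypothetical run: 8 chunks spread over 4 processors.
running = [{c: {"step": 100, "data": [float(c)] * 3}
            for c in range(8) if c % 4 == p}
           for p in range(4)]

saved = checkpoint(running)
restarted = restart(saved, 2)             # restart on 2 processors instead of 4
print([sorted(chunks) for chunks in restarted])
```

The number of chunks is unchanged after the restart. Only the chunk-to-processor mapping differs, so the application code needs no changes.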

18
Adaptive Multi-MPI
  • Integration of multiple MPI-based modules
  • Example: integrated rocket simulation
  • ROCFLO, ROCSOLID, ROCBURN, ROCFACE
  • Each module gets its own MPI_COMM_WORLD
  • All MPI_COMM_WORLDs form MPI_COMM_UNIVERSE
  • Point-to-point communication between different
    MPI_COMM_WORLDs using the same AMPI functions
  • Communication across modules is also considered
    while balancing load

19
Experimental Results
20
AMR Application With Load Balancing
(Load balancer is activated at time steps 20, 40,
60, and 80.)
21
AMPI Load Balancing on Heterogeneous Clusters
(Experiment carried out on a cluster of Linux
workstations.)
22
AMPI vs. MPI
(This is a scaled problem.)
23
AMPI Overhead
24
AMPI Status
  • Over 70 commonly used functions from MPI 1.1
  • All point-to-point communication functions
  • All collective communication functions
  • User-defined MPI data types
  • C, C++, and Fortran (77/90) bindings
  • Tested on Origin 2000, IBM SP, Linux and Solaris
    clusters
  • Should run on any platform supported by Charm++
    that has mmap