HARNESS and fault tolerant MPI

1
  • HARNESS and fault tolerant MPI
  • Reviewed by Saswati Swami

2
Outline
  • 1. Motivation
  • 2. FT-MPI definition
  • 3. FT-MPI semantics
  • 4. FT-MPI implementation
  • 5. Applications
  • 6. Limitations
  • 7. References

3
Motivation
  • - As application and machine sizes grow, the MTBF
    (mean time between failures) becomes shorter than
    the application run time.
  • - Current MPI implementations either abort
    everything or use checkpointing to roll back,
    which is expensive.
  • - All communication goes through a communicator.
    The MPI standard is based on a static model, so
    any decrease in the number of tasks leads to a
    corrupted communicator.
  • - Goal: develop an MPI plug-in that takes
    advantage of Harness robustness to offer a range
    of recovery alternatives to an MPI application.
    Not just another MPI implementation.

4
Motivation
  • Harness provides the basic functionality:
    starting tasks, basic communication between
    them, some attribute storage, and some
    indication of errors and failures.

[Figure labels: Application; TCP/IP basic link; Harness run-time; Pipes / sockets; TCP/IP; HARNESS Daemon]
5
FT-MPI definition
  • 1. FT-MPI is a fault tolerant MPI system
    developed under the DOE HARNESS project.
  • 2. FT-MPI extends MPI and allows applications to
    decide what to do when an error occurs:
  • - restarting a failed node
  • - continuing with fewer nodes
  • 3. Under FT-MPI, when a member of a communicator
    dies:
  • - the communicator state changes to indicate
    a problem
  • - message transfers can continue if safe,
    or be stopped / ignored
  • - to continue, the user's application can
    fix the communicators, or abort.

6
FT-MPI semantics
  • FT-MPI
  • 1. Communicator states: FT_OK, FT_DETECTED,
    FT_RECOVER, FT_RECOVERED, FT_FAILED
  • 2. Process states: OK, UNAVAILABLE, JOINING,
    FAILED
  • MPI
  • 1. Communicator states: VALID, INVALID
  • 2. Process states: OK, FAILED

7
FT-MPI semantics: communicator failure handling
  • Communicators are invalidated if a failure is
    detected.
  • The underlying system sends a state update to all
    processes for that communicator.
  • For a communication error, not all communicators
    are updated.
  • For a process exit, all communicators that
    included this process are changed.
  • System behavior depends on the communicator
    failure mode chosen.
  • - Modes are set using MPI attribute calls.
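
The process-exit rule above can be sketched as a small model in plain C (no MPI calls). The `model_comm` struct, `MAX_MEMBERS`, and `notify_process_exit` are hypothetical names introduced for illustration, and moving an affected communicator to `FT_DETECTED` is an assumption; the slides name the states but not the transitions.

```c
#include <assert.h>
#include <stddef.h>

/* Communicator states as listed on the FT-MPI semantics slide. */
typedef enum { FT_OK, FT_DETECTED, FT_RECOVER, FT_RECOVERED, FT_FAILED } ft_comm_state;

#define MAX_MEMBERS 8   /* hypothetical fixed capacity for this sketch */

/* Hypothetical model of a communicator: member process ids plus a state. */
typedef struct {
    int members[MAX_MEMBERS];
    int nmembers;
    ft_comm_state state;
} model_comm;

/* Mark every communicator that contains failed_pid as FT_DETECTED
   (assumed transition) and return how many communicators changed. */
int notify_process_exit(model_comm *comms, size_t ncomms, int failed_pid) {
    int changed = 0;
    assert(comms != NULL || ncomms == 0);
    for (size_t c = 0; c < ncomms; c++) {
        for (int i = 0; i < comms[c].nmembers; i++) {
            if (comms[c].members[i] == failed_pid) {
                comms[c].state = FT_DETECTED;
                changed++;
                break;  /* one match is enough for this communicator */
            }
        }
    }
    return changed;
}
```

Communicators that never included the failed process keep their state, which matches the "all communicators that included this process" wording.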

8
FT-MPI semantics: five failure modes
  • 1. SHRINK
  • Re-orders processes to make a contiguous
    communicator.
  • On a rebuild, this forces the missing process to
    disappear from the communicator.
  • The size changes, and process ranks may change too.
  • Useful when users need the communicator's size to
    match its extent, for example when using
    home-grown collectives.
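
As a rough illustration of how SHRINK changes both size and ranks, here is a plain C model (no MPI calls). The function name `shrink_ranks` and the `alive` array are hypothetical; the sketch assumes survivors keep their relative order, which the slides do not state explicitly.

```c
#include <assert.h>

/* alive[i] != 0 means the process at old rank i survived.
   Writes the new rank of each old rank into new_rank (-1 for a
   failed process) and returns the size of the shrunk communicator. */
int shrink_ranks(const int *alive, int old_size, int *new_rank) {
    int next = 0;
    assert(old_size >= 0);
    for (int i = 0; i < old_size; i++) {
        new_rank[i] = alive[i] ? next++ : -1;
    }
    return next;
}
```

With four processes and old rank 2 lost, the model returns size 3 and old rank 3 becomes new rank 2, so any rank-dependent application data has to be remapped.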

9
FT-MPI semantics: five failure modes
10
FT-MPI semantics: five failure modes
11
FT-MPI semantics: five failure modes
12
FT-MPI semantics: five failure modes
  • 2. BLANK
  • Rebuilds the communicator so that gaps are
    allowed.
  • Collectives must still do the right thing
    afterwards.
  • MPI_COMM_SIZE returns the extent of the
    communicator, not the number of valid processes
    in it.
  • P2P operations to a gap fail.
  • Good for parameter sweeps / Monte Carlo
    simulations, where process loss only means
    resending data.
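
The difference between extent and number of valid processes under BLANK can be sketched with a small plain C model (no MPI calls). All names here (`blank_comm`, `blank_comm_size`, `blank_valid_count`, `blank_send`, `MODEL_ERR_GAP`) are hypothetical and only mirror the bullets above.

```c
#include <assert.h>

#define MODEL_SUCCESS 0
#define MODEL_ERR_GAP 1   /* hypothetical error code for "destination is a gap" */

/* Hypothetical model of a BLANK-mode communicator. */
typedef struct {
    const int *alive;   /* alive[r] != 0 if rank r still has a process */
    int extent;         /* number of slots, including gaps */
} blank_comm;

/* Models MPI_COMM_SIZE under BLANK: the extent, gaps included. */
int blank_comm_size(const blank_comm *c) {
    return c->extent;
}

/* Number of ranks that still hold a live process. */
int blank_valid_count(const blank_comm *c) {
    int n = 0;
    for (int r = 0; r < c->extent; r++) {
        if (c->alive[r]) n++;
    }
    return n;
}

/* Models a point-to-point send: it fails when the destination is a gap. */
int blank_send(const blank_comm *c, int dest) {
    assert(dest >= 0 && dest < c->extent);
    return c->alive[dest] ? MODEL_SUCCESS : MODEL_ERR_GAP;
}
```

A master in a parameter sweep could use the failed send as the signal to hand the same work item to another rank.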

13
FT-MPI semantics: five failure modes
  • 3. REBUILD
  • Automatic recovery: a communicator that has died
    is rebuilt.
  • A new process is inserted either in the gap or at
    the end.
  • The new process is notified by the return value
    from MPI_Init.
  • Used for applications that need a constant number
    of processes, as in power-of-two FFT solvers.
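
A minimal plain C model of where a replacement process lands under REBUILD (no MPI calls). The function `rebuild_insert`, the `GAP` marker, and the policy "first gap, otherwise append" are assumptions made for this sketch; the slides only say the process goes in the gap or at the end.

```c
#include <assert.h>

#define GAP (-1)   /* hypothetical marker for a rank whose process died */

/* slots[r] holds the process id at rank r, or GAP.
   Places new_pid in the first gap if there is one, otherwise appends it.
   Returns the rank given to new_pid, or -1 if there is no room. */
int rebuild_insert(int *slots, int *size, int capacity, int new_pid) {
    assert(*size >= 0 && *size <= capacity);
    for (int r = 0; r < *size; r++) {
        if (slots[r] == GAP) {
            slots[r] = new_pid;
            return r;
        }
    }
    if (*size == capacity) {
        return -1;
    }
    slots[*size] = new_pid;
    return (*size)++;   /* rank is the old size; size grows by one */
}
```

Filling the gap keeps the communicator size constant, which is the property the power-of-two FFT example relies on.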

14
FT-MPI semantics: five failure modes
  • 4. REBUILD ALL
  • Same as REBUILD, except that it rebuilds all
    communicators and groups, and resets all key
    values, etc.
  • Does a lot more work than REBUILD behind the
    scenes.
  • Useful for applications that have multiple
    communicators (e.g. one for each dimension) and
    some key values, etc.
  • Slower, with slightly higher overhead, because of
    the extra state it has to distribute.

15
FT-MPI semantics: five failure modes
  • 5. ABORT
  • Default MPI behavior.
  • Forces a graceful exit that the user is unable to
    trap.

16
FT-MPI semantics: message modes
  • 1. NO-OP (NOP)
  • No user-level message operations are allowed on
    error.
  • All operations return an error code.
  • - The user must re-post all operations after
    recovery.
  • 2. CONTINUE (CONT)
  • All messages that can be sent are sent.
  • - You always get to receive a message that is
    waiting for you.
  • All operations that returned MPI_SUCCESS will be
    finished after recovery.
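
The two message modes can be contrasted with a very small plain C model (no MPI calls). The names `msg_mode`, `model_post_send`, `MODEL_SUCCESS` and `MODEL_ERR` are hypothetical, and treating "can be sent" as "the destination is still alive" is an assumption of this sketch.

```c
#include <assert.h>

#define MODEL_SUCCESS 0
#define MODEL_ERR     1   /* hypothetical generic error code */

typedef enum { MSG_NOP, MSG_CONT } msg_mode;

/* Models posting a send while the communicator may be in an error state.
   NOP:  every operation returns an error until recovery is done.
   CONT: the send goes through whenever the destination is reachable. */
int model_post_send(msg_mode mode, int comm_in_error, int dest_alive) {
    assert(mode == MSG_NOP || mode == MSG_CONT);
    if (comm_in_error && mode == MSG_NOP) {
        return MODEL_ERR;   /* caller must re-post after recovery */
    }
    return dest_alive ? MODEL_SUCCESS : MODEL_ERR;
}
```

Under NOP the application keeps its own list of operations to re-post; under CONT it only has to deal with the operations that actually returned an error.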

17
FT-MPI semantics: P2P vs. collective correctness
  • 1. Collective operations are dealt with
    differently from P2P.
  • - A collective returns successfully only if the
    operation gives the surviving members the same
    answer as if no failure had occurred.
  • 2. Two classes of collective operations
  • - broadcast / scatter
  • succeed if a non-root node fails
  • - gather / reduce
  • fail if there is an error
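
The two classes of collectives can be expressed as a simple predicate in plain C (no MPI calls). The names `coll_kind` and `collective_succeeds` are hypothetical, and the rule that a failed root always fails the call is an assumption; the slide only covers non-root failures for broadcast / scatter.

```c
#include <assert.h>

typedef enum { COLL_BCAST, COLL_SCATTER, COLL_GATHER, COLL_REDUCE } coll_kind;

/* Returns 1 if the modelled collective would succeed, 0 otherwise.
   alive[r] != 0 means rank r is still running. */
int collective_succeeds(coll_kind kind, const int *alive, int size, int root) {
    assert(size > 0 && root >= 0 && root < size);
    if (!alive[root]) {
        return 0;   /* assumption: nothing works without the root */
    }
    if (kind == COLL_BCAST || kind == COLL_SCATTER) {
        return 1;   /* survivors still receive the same data from the root */
    }
    /* gather / reduce: a missing contribution would change the result */
    for (int r = 0; r < size; r++) {
        if (!alive[r]) return 0;
    }
    return 1;
}
```

The asymmetry follows from the "same answer for the surviving members" rule: data flowing out of the root is unaffected by a dead receiver, while data flowing into the root is incomplete if any sender is missing.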

18
FT-MPI semantics: FT-MPI basic usage
  • Simple FT-MPI send usage:
  • do {
  •   rc = MPI_Send(..., com);
  •   if (rc == MPI_ERR_OTHER) {
  •     MPI_Comm_dup(com, &newcom);
  •     MPI_Comm_free(&com);
  •     com = newcom;  /* continue */
  •   }
  • } while (rc != MPI_SUCCESS);
  • Checking every call is not always necessary. SPMD
    master-worker codes only need error checking in
    the master code, if the user is willing to accept
    the master as the only point of failure.
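
The retry pattern above can be exercised without an MPI installation using stand-in functions. Everything in this sketch is hypothetical: `model_send` plays the role of `MPI_Send`, `model_comm_dup` plays the role of `MPI_Comm_dup` under a rebuilding failure mode, and the `max_attempts` bound is an addition that the slide's loop does not have.

```c
#include <assert.h>

#define MODEL_SUCCESS   0
#define MODEL_ERR_OTHER 1   /* stands in for MPI_ERR_OTHER */

/* Hypothetical communicator: an id plus a flag set when a member has died. */
typedef struct {
    int id;
    int broken;
} model_comm;

/* Stand-in for MPI_Send: fails while the communicator is broken. */
static int model_send(const model_comm *com) {
    return com->broken ? MODEL_ERR_OTHER : MODEL_SUCCESS;
}

/* Stand-in for MPI_Comm_dup in a rebuilding mode: yields a repaired copy. */
static model_comm model_comm_dup(const model_comm *com) {
    model_comm fresh = { com->id + 1, 0 };
    return fresh;
}

/* Send, repairing the communicator and retrying on failure.
   Returns the number of attempts used, or -1 if max_attempts ran out. */
int send_with_recovery(model_comm *com, int max_attempts) {
    int rc;
    int attempts = 0;
    assert(com != 0 && max_attempts > 0);
    do {
        rc = model_send(com);
        attempts++;
        if (rc == MODEL_ERR_OTHER) {
            /* replace the damaged communicator with the rebuilt one */
            *com = model_comm_dup(com);
        }
    } while (rc != MODEL_SUCCESS && attempts < max_attempts);
    return rc == MODEL_SUCCESS ? attempts : -1;
}
```

Bounding the number of attempts is a design choice for the sketch: it keeps a master process from spinning forever if recovery itself keeps failing.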

19
FT-MPI implementation
  • 1. Built in multiple layers.
  • 2. Has tuned collectives and user-derived data
    type handling.
  • 3. Users need to re-compile against libftmpi and
    start the application with the ftmpirun command.
  • 4. Can be run both with and without a HARNESS
    core.

20
FT-MPI overall implementation structure
Collective Library
21
Derived Data Types

22
Derived Data Types
23
FT-MPI DDT performance
24
FT-MPI DDT performance
25
FT-MPI DDT performance
26
Reordering of a collective topology
27
Applications
  • Two example applications
  • 1. ScaLAPACK
  • Unmodified, just to check that we can handle a
    pre-existing standard MPI application.
  • 2. PSTSWM (Parallel Spectral Transform Shallow
    Water Model)
  • Two versions:
  • - standard, to test performance
  • - user-level checkpoint with rebuild

28
Limitations
  • Applications need to be designed to use FT-MPI
    by including the extended APIs.
  • Changing legacy applications is not feasible.
  • Additional user-directed checkpointing is needed
    for applications that need a higher level of
    fault tolerance.
  • Semantics of existing MPI objects and functions
    are time-tested, yet FT-MPI alters some of them
    (e.g. MPI_COMM_SIZE under BLANK).
  • Also, the paper needs better logical organization.

29
References
  • 1. Building and using a Fault Tolerant MPI
    implementation, Graham E. Fagg and Jack J.
    Dongarra
  • 2. Fault Tolerant MPI in High Performance
    Computing Semantics and Applications, Graham E.
    Fagg, Edgar Gabriel, and Jack J. Dongarra
  • 3. Making of the holy grail or a YAMI that is
    FT, Graham E. Fagg

30
  • Thank You