1
Managing Heterogeneous MPI Application
Interoperation and Execution.
  • From PVMPI to SNIPE-based MPI_Connect()
  • Graham E. Fagg, Kevin S. London, Jack J.
    Dongarra and Shirley V. Browne.

University of Tennessee and Oak Ridge National
Laboratory
contact: fagg@cs.utk.edu
2
Project Targets
  • Allow intercommunication between different MPI
    implementations or instances of the same
    implementation on different machines
  • Provide heterogeneous intercommunicating MPI
    applications with access to some of the MPI-2
    process control and parallel I/O features
  • Allow use of optimized vendor MPI
    implementations while still permitting
    distributed heterogeneous parallel computing in a
    transparent manner.

3
MPI-1 Communicators
  • All processes in an MPI-1 application belong to a
    global communicator called MPI_COMM_WORLD
  • All other communicators are derived from this
    global communicator (see the sketch after this list).
  • Communication can only occur within a
    communicator.
  • Safe communication
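As a reminder of how all communication stays inside a communicator, here is a
minimal stand-alone sketch (standard MPI-1 only, nothing specific to
MPI_Connect) that derives a communicator from MPI_COMM_WORLD with
MPI_Comm_split and communicates only within it.

#include <mpi.h>

int main(int argc, char **argv)
{
    int world_rank, sub_rank, value = 0;
    MPI_Comm derived;                 /* communicator derived from MPI_COMM_WORLD */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* split the global communicator into two halves; all communication
       below is scoped to the derived communicator */
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &derived);
    MPI_Comm_rank(derived, &sub_rank);

    if (sub_rank == 0) value = 42;
    MPI_Bcast(&value, 1, MPI_INT, 0, derived);   /* safe: cannot leak outside 'derived' */

    MPI_Comm_free(&derived);
    MPI_Finalize();
    return 0;
}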

4
MPI Internals: Processes
  • All process groups are derived from the
    membership of the MPI_COMM_WORLD communicator.
  • i.e., no external processes.
  • MPI-1 process membership is static not dynamic
    like PVM or LAM 6.X
  • simplified consistency reasoning
  • fast communication (fixed addressing) even across
    complex topologies.
  • interfaces well to simple run-time systems as
    found on many MPPs.

5
MPI-1 Application
(Diagram: a derived communicator nested inside the application's MPI_COMM_WORLD.)
6
Disadvantages of MPI-1
  • Static process model
  • If a process fails, all communicators it belongs
    to become invalid, i.e., no fault tolerance.
  • Dynamic resources either cause applications to
    fail due to loss of nodes or make applications
    inefficient as they cannot take advantage of new
    nodes by starting/spawning additional processes.
  • When using a dedicated MPP MPI implementation you
    cannot usually use off-machine or even
    off-partition nodes.

7
MPI-2
  • Problem areas and needed additional features
    identified in MPI-1 are being addressed by the
    MPI-2 forum.
  • These include
  • inter-language operation
  • dynamic process control / management
  • parallel IO
  • extended collective operations
  • Support for inter-implementation
    communication/control was considered ??
  • See other projects such as NIST IMPI, PACX and
    PLUS

8
User requirements for Inter-Operation
  • Dynamic connections between tasks on different
    platforms/systems/sites
  • Transparent inter-operation once it has started
  • Single style of API, i.e. MPI only!
  • Access to virtual machine / resource management
    when required, not as mandatory use
  • Support for complex features across
    implementations such as user derived data types
    etc.

9
MPI_Connect/PVMPI 2
  • API in the MPI-1 style
  • Inter-application communication using standard
    point-to-point MPI-1 function calls
  • Supporting all variations of send and receive
  • All MPI data types, including user-derived types
    (see the sketch after this list)
  • Naming Service functions similar to the semantics
    and functionality of current MPI
    inter-communicator functions
  • ease of use for programmers only experienced in
    MPI and not other message passing systems such as
    PVM / Nexus or TCP socket programming!
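A minimal sketch of sending a user-derived datatype across applications.
Assumptions: 'ocean_comm' is an inter-communicator obtained with
MPI_Conn_intercomm_create as on the later slides, and the peer posts a
matching receive; the datatype construction itself is plain MPI-1.

#include <mpi.h>

void send_column(double grid[100][100], MPI_Comm ocean_comm)
{
    MPI_Datatype column;

    /* one column of a 100x100 row-major array: 100 doubles with stride 100 */
    MPI_Type_vector(100, 1, 100, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    /* ordinary MPI-1 point-to-point call, sent to rank 0 of the remote group */
    MPI_Send(&grid[0][0], 1, column, 0, 99, ocean_comm);

    MPI_Type_free(&column);
}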

10
Process Identification
  • Process groups are identified by a character
    name. Individual processes are addressed by a set
    of tuples.
  • In MPI
  • communicator, rank
  • process group, rank
  • In MPI_Connect/PVMPI2
  • name, rank
  • name, instance
  • Instance and rank are identical and range from
    0..N-1 where N is the number of processes.

11
Registration, i.e., naming MPI applications
  • Process groups register their name with a global
    naming service which returns them a system handle
    used for future operations on this name /
    communicator pair.
  • int MPI_Conn_register (char *name, MPI_Comm
    local_comm, int *handle)
  • call MPI_CONN_REGISTER (name, local_comm, handle,
    ierr)
  • Processes can remove their name from the naming
    service with
  • int MPI_Conn_remove (int handle)
  • A process may have multiple names associated with
    it.
  • Names can be registered, removed and re-registered
    multiple times without restriction (see the sketch
    after this list).
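A minimal registration sketch, assuming the MPI_Conn_register /
MPI_Conn_remove prototypes shown above; the second name, "AirModel-Diag",
is illustrative only.

#include <mpi.h>
/* plus the MPI_Connect header; its name is not given on these slides */

int main(int argc, char **argv)
{
    int model_handle, diag_handle;

    MPI_Init(&argc, &argv);

    /* register the same communicator under two different names */
    MPI_Conn_register("AirModel",      MPI_COMM_WORLD, &model_handle);
    MPI_Conn_register("AirModel-Diag", MPI_COMM_WORLD, &diag_handle);

    /* ... form inter-communicators and do work ... */

    /* names can be removed (and later re-registered) independently */
    MPI_Conn_remove(diag_handle);
    MPI_Conn_remove(model_handle);

    MPI_Finalize();
    return 0;
}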

12
MPI-1 Inter-communicators
  • An inter-communicator is used for point to point
    communication between disjoint groups of
    processes.
  • Inter-communicators are formed using
    MPI_Intercomm_create(), which operates upon two
    existing non-overlapping intra-communicators and a
    bridge communicator.
  • MPI_Connect/PVMPI could not use this mechanism, as
    there is no MPI bridge communicator between groups
    formed from separate MPI applications: their
    MPI_COMM_WORLDs do not overlap (the sketch after
    this list shows the standard call for contrast).
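For contrast, a minimal sketch of the standard MPI-1 mechanism, where both
groups live inside one MPI_COMM_WORLD, which can therefore act as the bridge
communicator (assumes at least 4 processes).

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, color;
    MPI_Comm half, inter;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* two non-overlapping intra-communicators inside one MPI_COMM_WORLD */
    color = (rank < 2) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &half);

    /* MPI_COMM_WORLD is the bridge: local leader is rank 0 of each half,
       the remote leader is named by its rank in the bridge communicator */
    MPI_Intercomm_create(half, 0, MPI_COMM_WORLD,
                         (color == 0) ? 2 : 0, 42, &inter);

    MPI_Comm_free(&inter);
    MPI_Comm_free(&half);
    MPI_Finalize();
    return 0;
}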

13
Forming Inter-communicators
  • MPI_Connect/PVMPI2 forms its inter-communicators
    with a modified MPI_Intercomm_create call.
  • The bridging communication is performed
    automatically and the user only has to specify
    the remote group's registered name.
  • int MPI_Conn_intercomm_create (int local_handle,
    char *remote_group_name, MPI_Comm
    *new_inter_comm)
  • Call MPI_CONN_INTERCOMM_CREATE (localhandle,
    remotename, newcomm, ierr)

14
Inter-communicators
  • Once an inter-communicator has been formed it can
    be used almost exactly like any other MPI
    inter-communicator (see the sketch after this list)
  • All point-to-point operations
  • Communicator comparisons and duplication
  • Remote group information
  • Resources released by MPI_Comm_free()
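A minimal sketch of the communicator-management side, assuming 'ocean_comm'
was returned by MPI_Conn_intercomm_create as in the example on the next
slides; every call here is standard MPI-1.

#include <stdio.h>
#include <mpi.h>

void inspect(MPI_Comm ocean_comm)
{
    int is_inter, local_size, remote_size, cmp;
    MPI_Comm dup_comm;

    MPI_Comm_test_inter(ocean_comm, &is_inter);      /* 1 for an inter-communicator */
    MPI_Comm_size(ocean_comm, &local_size);          /* size of the local group */
    MPI_Comm_remote_size(ocean_comm, &remote_size);  /* remote group information */
    printf("inter=%d local=%d remote=%d\n", is_inter, local_size, remote_size);

    MPI_Comm_dup(ocean_comm, &dup_comm);             /* duplication */
    MPI_Comm_compare(ocean_comm, dup_comm, &cmp);    /* comparison: MPI_CONGRUENT */

    MPI_Comm_free(&dup_comm);                        /* resources released here */
}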

15
Simple example: Air Model
/* air model */
MPI_Init (&argc, &argv);
MPI_Conn_register ("AirModel", MPI_COMM_WORLD, &air_handle);
MPI_Conn_intercomm_create (air_handle, "OceanModel", &ocean_comm);
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);
while (!done) {
    /* do work using intra-comms */
    /* swap values with other model */
    MPI_Send (databuf, cnt, MPI_DOUBLE, myrank, tag, ocean_comm);
    MPI_Recv (databuf, cnt, MPI_DOUBLE, myrank, tag, ocean_comm, &status);
} /* end while; done work */
MPI_Conn_remove (air_handle);
MPI_Comm_free (&ocean_comm);
MPI_Finalize ();
16
Ocean model
/* ocean model */
MPI_Init (&argc, &argv);
MPI_Conn_register ("OceanModel", MPI_COMM_WORLD, &ocean_handle);
MPI_Conn_intercomm_create (ocean_handle, "AirModel", &air_comm);
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);
while (!done) {
    /* do work using intra-comms */
    /* swap values with other model */
    MPI_Recv (databuf, cnt, MPI_DOUBLE, myrank, tag, air_comm, &status);
    MPI_Send (databuf, cnt, MPI_DOUBLE, myrank, tag, air_comm);
}
MPI_Conn_remove (ocean_handle);
MPI_Comm_free (&air_comm);
MPI_Finalize ();
17
Coupled model
(Diagram: two MPI applications, the Ocean Model and the Air Model, each with
its own MPI_COMM_WORLD; a global inter-communicator joins them, seen as
air_comm from the Ocean Model side and as ocean_comm from the Air Model side.)
18
MPI_Connect Internals, i.e., SNIPE
  • SNIPE is a meta-computing system from UTK that
    was designed to support long-term distributed
    applications.
  • Uses SNIPE as a communications layer.
  • The naming service is provided by the RCDS
    RC_Server system (which is also used by HARNESS
    for repository information).
  • MPI application startup is via SNIPE daemons that
    interface to standard batch/queuing systems
  • These understand LAM, MPICH, MPIF (POE), SGI MPI,
    and qsub variations
  • soon Condor and LSF

19
PVMPI2 Internals, i.e., PVM
  • Uses PVM as a communications layer
  • The naming service is provided by the PVM Group
    Server in PVM 3.3.x and by the Mailbox system in
    PVM 3.4.
  • PVM 3.4 is simpler as it has user controlled
    message contexts and message handlers.
  • MPI application startup is provided by specialist
    PVM tasker processes.
  • These understand LAM, MPICH, IBM MPI (POE), SGI
    MPI

20
Internals: linking with MPI
  • MPI_Connect and PVMPI are built as an MPI
    profiling interface, and are thus transparent to
    user applications.
  • During building it can be configured to call
    other profiling interfaces and hence allow
    inter-operation with other MPI monitoring/tracing/
    debugging tool sets.

21
MPI_Connect / PVMPI Layering
(Diagram: the user's code calls an MPI_Function, which enters the intercomm
library; the library looks up the communicator. If it is a true MPI
intra-communicator, the profiled PMPI_Function is called; otherwise the call
is translated into SNIPE/PVM addressing and the SNIPE/PVM functions of the
other library are used. The correct return code is then worked out and
returned to the user's code. A minimal wrapper sketch follows.)
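A minimal sketch of that layering for a single call, MPI_Send. The helpers
is_intercomm_to_remote_app() and snipe_pvm_send() are hypothetical stand-ins
for the library's internal lookup and transport, stubbed here so the sketch
compiles; they are not the actual MPI_Connect internals.

#include <mpi.h>

/* hypothetical stand-ins, stubbed for the sketch */
static int is_intercomm_to_remote_app(MPI_Comm comm) { (void)comm; return 0; }
static int snipe_pvm_send(const void *buf, int count, MPI_Datatype type,
                          int dest, int tag, MPI_Comm comm)
{ (void)buf; (void)count; (void)type; (void)dest; (void)tag; (void)comm;
  return MPI_SUCCESS; }

/* The library provides MPI_Send; the real MPI lives behind PMPI_Send. */
int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    if (is_intercomm_to_remote_app(comm))
        /* translate into SNIPE/PVM addressing and send off-application */
        return snipe_pvm_send(buf, count, type, dest, tag, comm);

    /* ordinary intra-application communication: use the profiled MPI call */
    return PMPI_Send(buf, count, type, dest, tag, comm);
}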
22
Process Management
  • PVM and/or SNIPE can handle the startup of MPI
    jobs
  • General Resource Manager / Specialist PVM Tasker
    control / SNIPE daemons
  • Jobs can also be started by MPIRUN
  • useful when testing on interactive nodes
  • Once enrolled, MPI processes are under SNIPE or
    PVM control
  • signals (such as TERM/HUP)
  • notification messages (fault tolerance)

23
Process Management
(Diagram: a user request goes to the GRM (General Resource Manager), which
drives specialist taskers: an MPICH Tasker for an MPICH cluster, a POE Tasker
for an IBM SP2, and a PBS/qsub Tasker with a PVMD for an SGI O2K.)
24
Conclusions
  • MPI_Connect and PVMPI allow different MPI
    implementations to inter-operate.
  • Only 3 additional calls required.
  • Layering requires a full profiled MPI library
    (complex linking).
  • Intra-communication performance may be slightly
    affected.
  • Inter-communication is as fast as the intercomm
    library used (either PVM or SNIPE).

25
MPI_Connect interface is much like names, addresses
and ports in MPI-2
  • Well-known address: host & port
  • MPI_PORT_OPEN makes a port
  • MPI_ACCEPT lets a client connect
  • MPI_CONNECT client-side connection
  • Service naming
  • MPI_NAME_PUBLISH (port, info, service)
  • MPI_NAME_GET client side, to get the port
    (see the sketch after this list)
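These are the draft MPI-2 names; below is a minimal sketch using the names
the calls took in the final MPI-2 standard (MPI_Open_port, MPI_Publish_name
and MPI_Comm_accept on the server, MPI_Lookup_name and MPI_Comm_connect on
the client). The service name "OceanModel" is illustrative; the flow mirrors
the two diagrams that follow.

#include <mpi.h>

void server(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Open_port(MPI_INFO_NULL, port);                   /* make a port (host & port) */
    MPI_Publish_name("OceanModel", MPI_INFO_NULL, port);  /* service name -> port */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
    /* ... communicate over the inter-communicator 'client' ... */
    MPI_Unpublish_name("OceanModel", MPI_INFO_NULL, port);
    MPI_Close_port(port);
}

void client(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm server;

    MPI_Lookup_name("OceanModel", MPI_INFO_NULL, port);   /* get the port by name */
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
    /* ... communicate over the inter-communicator 'server' ... */
}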

26
Server-Client model
(Diagram: the server calls MPI_Accept at its well-known host & port; the
client calls MPI_Connect to that host & port, and an inter-communicator is
formed between them.)
27
Server-Client model
(Diagram: the server calls MPI_Port_open to obtain a host & port, then
publishes it under a NAME with MPI_Name_publish; the client looks the NAME up
with MPI_Name_get to obtain the host & port.)
28
SNIPE
29
SNIPE
  • Single global name space
  • Built using RCDS; supports URLs, URNs and LIFNs
  • testing against LDAP
  • Scalable / Secure
  • Multiple Resource Managers
  • Safe execution environment
  • Java etc..
  • Parallelism is the basic unit of execution

30
Additional Information
  • PVM: http://www.epm.ornl.gov/pvm
  • PVMPI: http://icl.cs.utk.edu/projects/pvmpi/
  • MPI_Connect: http://icl.cs.utk.edu/projects/mpi_connect/
  • SNIPE: http://www.nhse.org/snipe/ and
    http://icl.cs.utk.edu/projects/snipe/
  • RCDS: http://www.netlib.org/utk/projects/rcds/
  • ICL: http://www.netlib.org/icl/
  • CRPC: http://www.crpc.rice.edu/CRPC
  • DOD HPC MSRCs:
  • CEWES: http://www.wes.hpc.mil
  • ARL: http://www.arl.hpc.mil
  • ASC: http://www.asc.hpc.mil