1
Advanced MPI
  • Dan Stanzione

2
Audience Background
  • This class assumes you have some background in
    parallel programming
  • You are familiar with basic MPI
  • You are familiar with basic OpenMP
  • And, of course, you are familiar with a
    programming language and the UNIX/Linux operating
    environment
  • You have run parallel jobs through a queueing
    system in a shared environment such as Ranger
    before.
  • You can use SSH to log into remote systems and
    transfer files
  • If this is not the case, please see the TACC
    introductory courses
  • Slides from Lisboa class this week
    http://taccspringschool.ist.utl.pt/SpringSchool/Agenda.html
  • Ranger Virtual Workshop
    https://www.cac.cornell.edu/ranger/

3
Outline
  • Review of MPI Advanced Topics
  • Derived Datatypes
  • Communicator manipulations
  • One-sided communication
  • I/O
  • What is Parallel I/O? Do I need it?
  • Cluster Filesystem Options
  • MPI I/O and ROMIO
  • Example striping schemes

4
User Defined Datatypes
  • Methods for creating data types
  • MPI_Type_contiguous()
  • MPI_Type_vector()
  • MPI_Type_indexed()
  • MPI_Type_struct()
  • MPI_Pack()
  • MPI_Unpack()
  • MPI allows datatypes to be defined in much the
    same way as in modern programming languages
    (C, C++, F90)
  • This allows your communication and I/O operations
    to operate using the same datatypes as the rest
    of your program
  • Makes expressing the partitioning of datasets
    easier

5
Contiguous Array
  • Creates an array of count elements of oldtype
  • MPI_Type_contiguous(int count,
  • MPI_Datatype oldtype,
  • MPI_Datatype *newtype)
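
  A minimal C sketch of the construct / commit / use cycle (row, dest, and tag
  are illustrative names, not from the slides):

MPI_Datatype rowtype;
double row[100];

MPI_Type_contiguous(100, MPI_DOUBLE, &rowtype);   /* 100 contiguous doubles  */
MPI_Type_commit(&rowtype);                        /* commit before first use */
MPI_Send(row, 1, rowtype, dest, tag, MPI_COMM_WORLD);
MPI_Type_free(&rowtype);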

6
Strided Vector
  • Constructs a cyclic set of elements
  • MPI_Type_vector(int count,
  • int blocklength,
  • int stride,
  • MPI_Datatype oldtype,
  • MPI_Datatype *newtype)
  • Stride specified in number of elements
  • Stride can be specified in bytes
  • MPI_Type_hvector()
  • Stride counts from start of block

7
Subarrays
  • Perhaps the most useful MPI Datatype, the
    Subarray type lets you divide a
    multi-dimensional array into smaller blocks
  • int MPI_Type_create_subarray(ndims,
    array_of_sizes, array_of_subsizes,
    array_of_starts, order, oldtype, newtype)
  • int ndims
  • int array_of_sizes[]
  • int array_of_subsizes[]
  • int array_of_starts[]
  • int order
  • MPI_Datatype oldtype
  • MPI_Datatype *newtype
  • SubArray Example coming after we add some
    communicator and I/O operations

8
SubArray Illustration
  [Figure: a 2D array of size[0] x size[1] elements containing a subarray of
   subsize[0] x subsize[1] elements whose corner is at offset
   (start[0], start[1])]
9
Indexed Vector
  • Allows an irregular pattern of elements
  • MPI_Type_indexed(int count,
  • int array_of_blocklengths[],
  • int array_of_displacements[],
  • MPI_Datatype oldtype,
  • MPI_Datatype *newtype)
  • Displacements specified in number of elements
  • Displacements can be specified in bytes
  • MPI_Type_hindexed()
  • MPI_Type_create_indexed_block() is a shortcut if
    all blocks are the same length
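
  As a sketch (not from the slides), MPI_Type_indexed can describe the upper
  triangle of an N x N matrix, with one block per row:

#define N 4
double A[N][N];
int blocklens[N], displs[N];
MPI_Datatype uppertype;

for (int i = 0; i < N; i++) {
    blocklens[i] = N - i;         /* row i keeps N-i elements              */
    displs[i]    = i * N + i;     /* starting at the diagonal, in elements */
}
MPI_Type_indexed(N, blocklens, displs, MPI_DOUBLE, &uppertype);
MPI_Type_commit(&uppertype);
MPI_Send(A, 1, uppertype, dest, tag, MPI_COMM_WORLD);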

10
Structured Records
  • Allows different types to be combined
  • MPI_Type_struct(int count,
  • int array_of_blocklengths[],
  • MPI_Aint array_of_displacements[],
  • MPI_Datatype array_of_types[],
  • MPI_Datatype *newtype)
  • Blocklengths specified in number of elements
  • Displacements specified in bytes
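
  A sketch of sending a C struct as one message (the particle layout is
  illustrative; MPI_Type_create_struct is the MPI-2 spelling of this
  constructor):

struct particle { int id; double pos[3]; } p;
int          blocklens[2] = { 1, 3 };
MPI_Aint     displs[2], base;
MPI_Datatype types[2]    = { MPI_INT, MPI_DOUBLE };
MPI_Datatype particletype;

MPI_Get_address(&p,     &base);        /* byte displacements measured  */
MPI_Get_address(&p.id,  &displs[0]);   /* from the start of the struct */
MPI_Get_address(&p.pos, &displs[1]);
displs[0] -= base;
displs[1] -= base;

MPI_Type_create_struct(2, blocklens, displs, types, &particletype);
MPI_Type_commit(&particletype);
MPI_Send(&p, 1, particletype, dest, tag, MPI_COMM_WORLD);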

11
Committing types
  • In order for a user-defined derived datatype to
    be used as an argument to other MPI calls, the
    type must be committed.
  • MPI_Type_commit(MPI_Datatype *type)
  • MPI_Type_free(MPI_Datatype *type)
  • Use commit after calling the type constructor,
    but before using the type anywhere else
  • Call free after the type is no longer in use (no
    one actually does this, but it makes computer
    scientists happy...)

12
Vector Example
  • MPI_TYPE_VECTOR function allows creating
    non-contiguous vectors with constant stride

Generic form:

call MPI_Type_vector(count, blocklen, stride, oldtype, vtype, ierr)
call MPI_Type_commit(vtype, ierr)

Example: send the last row of a 5 x 4 Fortran array A (nrows = 5, ncols = 4),
elements numbered in column-major storage order:

     1   6  11  16
     2   7  12  17
     3   8  13  18
     4   9  14  19
     5  10  15  20

call MPI_Type_vector(ncols, 1, nrows, MPI_DOUBLE_PRECISION, vtype, ierr)
call MPI_Type_commit(vtype, ierr)
call MPI_Send(A(nrows,1), 1, vtype, ... )
13
Dealing with Communicators
  • Many MPI operations deal with all the processes
    in a communicator
  • MPI_COMM_WORLD by default contains every task in
    your MPI job
  • Other communicators can be defined for more
    complex operations: to work on different parts of
    the task, to add topology, or to segregate
    different kinds of messaging

14
Communicators and Groups I
  • All MPI communication is relative to a
    communicator which contains a context and a
    group. The group is just a set of processes.

  [Figure: MPI_COMM_WORLD contains ranks 0-4; the same processes are split
   into two smaller communicators, COMM1 and COMM2, in each of which they are
   renumbered starting from rank 0]
15
Communicators and Groups II
  • Approach I: subdivide a communicator into
    multiple non-overlapping communicators
  • e.g. to form groups of rows of PEs

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
myrow = (int)(rank / ncol);
16
MPI_Comm_split
  • Argument 1: communicator to split
  • Argument 2: color; all processes passing the same
    color go into the same new communicator
  • Argument 3: key, which determines rank ordering
    within the result communicator
  • Argument 4: result communicator

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
myrow = (int)(rank / ncol);
MPI_Comm_split(MPI_COMM_WORLD, myrow, rank, &row_comm);
17
Topologies and Communicators
  • MPI allows processes to be grouped in logical
    topologies
  • Topologies can aid the programmer
  • Convenient naming methods for processes in a
    group
  • Naming can match communication patterns
  • a standard mechanism for representing common
    algorithmic concepts (e.g. 2D grids)
  • Topologies can aid the runtime environment
  • Better mappings of MPI tasks to hardware nodes
  • (Not really widely used in most implementations
    yet)

18
Topology Mechanics
  • Topologies have the scope of a single
    (intra)communicator
  • Topologies are an optional attribute given to a
    communicator
  • Two topologies are supported
  • Cartesian coordinates (grid)
  • Graph
  • nodes are tasks
  • edges are named communication pathways

19
Cartesian Topologies
  • int MPI_Cart_create(MPI_Comm comm_old, int ndims,
    int dims[], int periods[], int reorder,
    MPI_Comm *comm_cart)
  • MPI::Cartcomm MPI::Intracomm::Create_cart(...) (C++)
  • MPI_CART_CREATE(...) (Fortran)
  • comm_old - input communicator
  • ndims - number of dimensions in the cartesian grid
  • dims - integer array of size ndims specifying the
    number of processes in each dimension
  • periods - true/false specifying whether each
    dimension is periodic
  • reorder - ranks may be reordered or not
  • comm_cart - new communicator containing new
    topology.
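
  For example, a sketch (with illustrative values) that builds a periodic
  4 x 3 grid of 12 tasks:

MPI_Comm grid_comm;
int dims[2]    = { 4, 3 };
int periods[2] = { 1, 1 };    /* wrap around in both dimensions */

MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods,
                1 /* allow reordering */, &grid_comm);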

20
MPI_DIMS_CREATE
  • A helper function for specifying a likely
    dimension decomposition.
  • int MPI_Dims_create(int nnodes, int ndims, int
    dims[])
  • MPI_DIMS_CREATE(NNODES, NDIMS, DIMS, IERROR) (Fortran)
  • void MPI::Compute_dims(int nnodes, int ndims, int
    dims[]) (C++)
  • nnodes - total nodes in grid
  • ndims - number of dimensions
  • dims - array returned with dimensions
  • Example
  • MPI_Dims_create(6,2,dims) will return (3,2) in
    dims
  • MPI_Dims_create(6,3,dims) will return (3,2,1) in
    dims
  • No rounding or ceiling function provided

21
Cartesian Inquiry Functions
  • MPI_Cartdim_get will return the number of
    dimensions in a cartesian structure
  • int MPI_Cartdim_get(MPI_Comm comm, int *ndims)
  • MPI_Cart_get provides information on an existing
    topology
  • Arguments roughly mirror the create call
  • int MPI_Cart_get(MPI_Comm comm, int maxdims, int
    dims[], int periods[], int coords[])
  • maxdims keeps a larger-than-expected topology from
    overflowing your argument arrays

22
Cartesian Translator Functions
  • Task IDs in a cartesian coordinate system
    correspond to ranks in a "normal" communicator.
  • point-to-point communication routines
    (send/receive) rely on ranks
  • int MPI_Cart_rank(MPI_Comm comm, int coords[],
    int *rank)
  • int MPI_Cart_coords(MPI_Comm comm, int rank, int
    maxdims, int coords[])
  • coords - cartesian coordinates
  • rank - rank in the communicator
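
  A sketch of translating in both directions (grid_comm is the illustrative
  2D communicator from the earlier sketch):

int rank, coords[2];

MPI_Comm_rank(grid_comm, &rank);
MPI_Cart_coords(grid_comm, rank, 2, coords);   /* my (row, col) position */
MPI_Cart_rank(grid_comm, coords, &rank);       /* and back to the rank   */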

23
Cartesian Shift function
  • int MPI_Cart_shift(MPI_Comm comm, int direction,
    int disp, int *rank_source, int *rank_dest)
  • direction - coordinate dimension of shift
  • disp - displacement (can be positive or negative)
  • rank_source and rank_dest are return values
  • use that source and dest to call MPI_Sendrecv, as
    sketched below
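
  A sketch of a nearest-neighbor exchange along dimension 0 (sendbuf, recvbuf,
  and n are illustrative):

int src, dst;

MPI_Cart_shift(grid_comm, 0, 1, &src, &dst);
MPI_Sendrecv(sendbuf, n, MPI_DOUBLE, dst, 0,
             recvbuf, n, MPI_DOUBLE, src, 0,
             grid_comm, MPI_STATUS_IGNORE);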

24
Remote Memory Access Windows and Window Objects
  • MPI-2 provides a facility for remote memory
    access in specified regions
  • "one-sided communication" - no need for send and
    receive
  • From what I hear, this may not work well yet,
    but...
  • MPI_Win_create(base, size, disp_unit, info,
    comm, win)
  • exposes the size bytes of memory starting at base
    to remote memory access; disp_unit sets the unit
    used for displacements in RMA calls

25
One-Sided Communication Calls
  • MPI_Put() - stores into remote memory
  • MPI_Get() - reads from remote memory
  • MPI_Accumulate() - updates remote memory, with an
    op like MPI_Reduce
  • All are non-blocking: the data transfer is
    described, and maybe even initiated, but may
    continue after the call returns
  • Subsequent synchronization, e.g. MPI_Win_fence(),
    is needed to make sure operations on the window
    object are complete
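
  A minimal sketch of the put/fence pattern (winbuf, localbuf, and n are
  illustrative):

MPI_Win win;

MPI_Win_create(winbuf, n * sizeof(double), sizeof(double),
               MPI_INFO_NULL, MPI_COMM_WORLD, &win);

MPI_Win_fence(0, win);                 /* open an access epoch          */
MPI_Put(localbuf, n, MPI_DOUBLE,       /* write n doubles into rank 1's */
        1, 0, n, MPI_DOUBLE, win);     /* window, starting at offset 0  */
MPI_Win_fence(0, win);                 /* complete all RMA in the epoch */

MPI_Win_free(&win);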

26
I/O (Parallel and Otherwise) on Large Scale Systems
Dan Stanzione, Arizona State University
27
Parallel I/O in Data Parallel Programs
  • Each task reads a distinct partition of the input
    data and writes a distinct partition of the
    output data.
  • Each task reads its partition in parallel
  • Data is distributed to the slave nodes
  • Each task computes output data from input data
  • Each task writes its partition in parallel

28
What Are All These Names?
  • MPI - Message Passing Interface Standard
  • Also known as MPI-1
  • MPI-2 - Extensions to MPI standard
  • I/O, RDMA, dynamic processes
  • MPI-IO - I/O part of MPI-2 extensions
  • ROMIO - Implementation of MPI-IO
  • Handles mapping MPI-IO calls into communication
    (MPI) and file I/O

29
Filesystems
  • Since each node in a cluster has its own disk,
    making the same files available on each node can
    be problematic
  • Three filesystem options
  • Local
  • Remote (e.g. NFS)
  • Parallel (e.g. PVFS, LUSTRE)

30
Filesystems (cont.)
  • Local - Use storage on each node's disk
  • Relatively high performance
  • Each node has different filesystem
  • Shared datafiles must be copied to each node
  • No synchronization
  • Most useful for temporary/scratch files accessed
    only by copy of program running on single node
  • RANGER DOESN'T HAVE LOCAL DISKS
  • This trend may continue with other large scale
    systems for reliability reasons
  • Very, very small RAMdisk (300MB)

31
Accessing Local File Systems
  • I/O system calls on compute nodes are executed on
    the compute node
  • File systems on the slave can be made available
    to tasks running there and accessed as on any
    Linux system
  • Recommended programming model does not assume
    that a task will run on a specific node
  • Best used for temporary storage
  • Access permissions may be a problem
  • Very small on newer systems like Ranger

32
Filesystems (cont.)
  • Remote - Share a single disk among all nodes
  • Every node sees same filesystem
  • Synchronization mechanisms manage changes
  • "Traditional" UNIX approach
  • Relatively low performance
  • Doesn't scale well: the server becomes a
    bottleneck in large systems
  • Simplest solution for small clusters,
    reading/writing small files

33
Accessing Network File Systems
  • Network file systems such as NFS and AFS can be
    mounted by slave nodes
  • Provides a shared storage space for home
    directories, parameter files, smaller data files
  • Performance problems can be severe for a very
    large number of nodes (100 or more)
  • Otherwise, works like local file systems

34
Filesystems (cont.)
  • Parallel - Stripe files across multiple disks on
    multiple nodes
  • Relatively high performance
  • Each node sees same filesystem
  • Works best for I/O intensive applications
  • Not a good solution for small files
  • Certain slave nodes are designated I/O nodes,
    local disks used to store pieces of filesystem

35
Accessing Parallel File Systems
  • Distribute file data among many I/O nodes
    (servers), potentially every node in the system
  • Typically not so good for small files, but very
    good for large data files
  • Should provide good performance even for a very
    large degree of sharing
  • Critical for scalability in applications with
    large I/O demands
  • Particularly good for data parallel model

36
Using File Systems
  • Local File Systems
  • EXT3, /tmp
  • Network File Systems
  • NFS, AFS
  • Parallel File Systems
  • PVFS, LUSTRE, IBRIX, Panasas
  • I/O Libraries
  • HDF, NetCDF, Panda

37
Example Application for Parallel I/O
  [Figure: pipeline of Input file -> parallel Read -> Process -> parallel
   Write -> Output file]
38
Issues in Parallel I/O
  • Physical distribution of data to I/O nodes
    interacts with logical distribution of the I/O
    requests to affect performance
  • Logical record sizes should be considered in
    physical distribution
  • I/O buffer sizes depend on physical distribution
    and number of tasks
  • Performance is best with rather large requests
  • Buffering should be used to get requests of 1MB
    or more, depending on the size of the system

39
I/O Libraries
  • May make I/O simpler for certain applications
  • Multidimensional data sets
  • Special data formats
  • Consistent access to shared data
  • "Out-of-core" computation
  • May hide some details of parallel file systems
  • Partitioning
  • May provide access to special features
  • Caching, buffering, asynchronous I/O, performance

40
MPI-IO
  • Common file operations
  • MPI_File_open()
  • MPI_File_close()
  • MPI_File_read()
  • MPI_File_write()
  • MPI_File_read_at()
  • MPI_File_write_at()
  • MPI_File_read_shared()
  • MPI_File_write_shared()
  • Open and close are collective. The rest have
    collective counterparts with an _all suffix

41
MPI_File_open
  • MPI_File_open(
  • MPI_Comm comm,
  • char *filename,
  • int amode,
  • MPI_Info info,
  • MPI_File *fh)
  • Collective operation on comm
  • amode is similar to a UNIX file mode, with a few
    extra MPI possibilities

42
MPI_File_close
  • MPI_File_close(
  • MPI_File *fh
  • )

43
File Views
  • File views supported
  • MPI_File_set_view()
  • Essentially, a file view changes your program's
    treatment of a file from a simple stream of bytes
    to a set of MPI_Datatypes at given displacements
  • Arguments to set view are similar to the
    arguments for creating derived datatypes

44
MPI_File_read
  • MPI_File_read(
  • MPI_File fh,
  • void *buf,
  • int count,
  • MPI_Datatype datatype,
  • MPI_Status *status
  • )

45
MPI_File_read_at
  • MPI_File_read_at(
  • MPI_File fh,
  • MPI_Offset offset,
  • void *buf,
  • int count,
  • MPI_Datatype datatype,
  • MPI_Status *status
  • )
  • MPI_File_read_at_all() is the collective version

46
Non-Blocking I/O
  • MPI_File_iread()
  • MPI_File_iwrite()
  • MPI_File_iread_at()
  • MPI_File_iwrite_at()
  • MPI_File_iread_shared()
  • MPI_File_iwrite_shared()

47
MPI_File_iread
  • MPI_File_iread(
  • MPI_File fh,
  • void *buf,
  • int count,
  • MPI_Datatype datatype,
  • MPI_Request *request
  • )
  • Request structure can be queried to determine if
    the operation is complete
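
  A sketch of overlapping the read with computation (do_other_work is a
  hypothetical placeholder):

MPI_Request req;
MPI_Status  status;

MPI_File_iread(fh, buffer, count, MPI_BYTE, &req);
do_other_work();              /* hypothetical computation to overlap */
MPI_Wait(&req, &status);      /* block until the read has completed  */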

48
Collective access
  • The _shared and _ordered routines use a shared
    file pointer
  • Collective routines are also provided to allow
    each task to read/write a specific chunk of the
    file
  • MPI_File_read_ordered(MPI_File fh, void *buf, int
    count, MPI_Datatype type, MPI_Status *st)
  • MPI_File_write_ordered()
  • MPI_File_seek_shared()
  • MPI_File_read_all()
  • MPI_File_write_all()

49
File Functions
  • MPI_File_delete()
  • MPI_File_set_size()
  • MPI_File_preallocate()
  • MPI_File_get_size()
  • MPI_File_get_group()
  • MPI_File_get_amode()
  • MPI_File_set_info()
  • MPI_File_get_info()

50
ROMIO MPI-IO Implementation
  • Implementation of MPI-2 I/O specification
  • Operates on wide variety of platforms
  • Abstract Device Interface for I/O (ADIO) aids in
    porting to new file systems
  • Fortran and C bindings
  • Successes
  • Adopted by industry (e.g. Compaq, HP, SGI)
  • Used at ASCI sites (e.g. LANL Blue Mountain)

51
Data Staging for Tiled Display
  • Commodity components
  • projectors, PCs
  • Provide very high resolution visualization
  • Staging application splits frames into a tile
    stream for each visualization node
  • Uses MPI-IO to access data from PVFS file system
  • Streams of tiles are merged into movie files on
    visualization node

52
Splitting Movie Frames into Tiles
  • Hundreds of frames make up a single movie
  • Each frame is stored in its own file in PVFS
  • Frame size is 2532x1408 pixels
  • 3x2 display
  • Tile size is 1024x768 pixels (overlapped)

53
Obtaining Highest Performance
  • To make best use of PVFS
  • Use MPI-IO (ROMIO) for data access
  • Use file views and datatypes
  • Take advantage of collectives
  • Use hints to optimize for your platform
  • Simple, right? :-)

54
Trivial MPI-IO Example
  • Reading contiguous pieces with MPI-IO calls
  • Simplest, least powerful way to use MPI-IO
  • Easy to port from POSIX calls
  • Lots of I/O operations to get desired data

MPI_File_open(comm, fname, MPI_MODE_RDONLY,
              MPI_INFO_NULL, &handle);

/* read tile data from one frame */
for (row = 0; row < 768; row++) {
    offset = row * row_size + tile_offset + header_size;
    MPI_File_read_at(handle, offset, buffer,
                     1024 * 3, MPI_BYTE, &status);
}

MPI_File_close(&handle);
55
Avoiding the VFS Layer
  • UNIX calls go through VFS layer
  • MPI-IO calls use Filesystem library directly
  • Significant performance gain

56
Why Use File Views?
  • Concisely describe noncontiguous regions in a
    file
  • Create datatype describing region
  • Assign view to open file handle
  • Separate description of region from I/O operation
  • Datatype can be reused on subsequent calls
  • Access these regions with a single operation
  • Single MPI read call requests all data
  • Provides opportunity for optimization of access
    in MPI-IO implementation

57
Setting a File View
  • Use MPI_Type_create_subarray() to define a
    datatype describing the data in the file
  • Example for tile access (24-bit data)

MPI_Type_contiguous(3, MPI_BYTE, &rgbtype);

frame_size[1] = 2532;    /* frame width  */
frame_size[0] = 1408;    /* frame height */
tile_size[1]  = 1024;    /* tile width   */
tile_size[0]  = 768;     /* tile height  */

/* create datatype describing tile */
MPI_Type_create_subarray(2, frame_size, tile_size,
                         tile_offset, MPI_ORDER_C,
                         rgbtype, &tiletype);
MPI_Type_commit(&tiletype);

MPI_File_set_view(handle, header_size, rgbtype, tiletype,
                  "native", MPI_INFO_NULL);
MPI_File_read(handle, buffer, buffer_size, rgbtype, &status);
58
Noncontiguous Access in ROMIO
  • ROMIO performs data sieving to cut down number
    of I/O operations
  • Uses large reads which grab multiple
    noncontiguous pieces
  • Example, reading tile 1

59
Data Sieving Performance
  • Reduces I/O operations from 4600 to 6
  • 87% effective throughput improvement
  • Reading 3 times as much data as necessary

60
Collective I/O
  • MPI-IO supports collective I/O calls (_all
    suffix)
  • All processes call the same function at once
  • May vary parameters (to access different regions)
  • More fully describe the access pattern as a whole
  • Explicitly define relationship between accesses
  • Allow use of ROMIO aggregation optimizations
  • Flexibility in what processes interact with I/O
    servers
  • Fewer, larger I/O requests

61
Collective I/O Example
  • Single line change

/* create datatype describing tile */
MPI_Type_create_subarray(2, frame_size, tile_size,
                         tile_offset, MPI_ORDER_C,
                         rgbtype, &tiletype);
MPI_Type_commit(&tiletype);

MPI_File_set_view(handle, header_size, rgbtype, tiletype,
                  "native", MPI_INFO_NULL);

#if 0
MPI_File_read(handle, buffer, buffer_size, rgbtype, &status);
#endif

/* collective read */
MPI_File_read_all(handle, buffer, buffer_size, rgbtype, &status);
62
Two-Phase Access
  • ROMIO implements two-phase collective I/O
  • Data is read by clients in contiguous pieces
    (phase 1)
  • Data is redistributed to the correct client
    (phase 2)
  • ROMIO applies two-phase when collective accesses
    overlap between processes
  • More efficient I/O access than data sieving alone

63
Two-Phase Performance
  • Often a big win

64
Hints
  • Controlling PVFS
  • striping_factor - number of I/O servers to stripe
    across
  • striping_unit - size of the stripes (in bytes)
  • start_iodevice - which I/O server to start with
  • Controlling aggregation
  • cb_config_list - list of aggregators
  • cb_nodes - number of aggregators (upper bound)
  • Tuning ROMIO optimizations
  • romio_cb_read, romio_cb_write - aggregation
    on/off
  • romio_ds_read, romio_ds_write - data sieving
    on/off
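
  Hints travel in an MPI_Info object passed to MPI_File_open (or
  MPI_File_set_info); a sketch with illustrative values:

MPI_Info info;
MPI_File fh;

MPI_Info_create(&info);
MPI_Info_set(info, "striping_factor", "8");      /* stripe over 8 servers */
MPI_Info_set(info, "romio_cb_read", "enable");   /* force aggregation     */
MPI_File_open(MPI_COMM_WORLD, "datafile",
              MPI_MODE_RDONLY, info, &fh);
MPI_Info_free(&info);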

65
The Proof is in the Performance
  • Final performance is almost 3 times VFS access!
  • Hints allowed us to turn off two-phase, modify
    striping of data

66
A More Sophisticated I/O Example
  • Dividing a 2D Matrix with Ghost Rows

67
File - on Disk vs. in Memory
Partition on one processor (with ghost rows on
borders)
Full Dataset
68
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int gsizes[2], psizes[2], lsizes[2], memsizes[2];
    int dims[2], periods[2], coords[2], start_indices[2];
    MPI_Comm comm;
    MPI_Datatype filetype, memtype;
    MPI_File fh;
    MPI_Status status;
    int local_array[12][12];
    int rank, m, n;

    m = 20;  n = 30;
    gsizes[0] = m;  gsizes[1] = n;
    psizes[0] = 2;  psizes[1] = 3;
    lsizes[0] = m / psizes[0];    /* Rows in local array */
    lsizes[1] = n / psizes[1];    /* Cols in local array */
    dims[0] = 2;  dims[1] = 3;
    periods[0] = periods[1] = 1;

    MPI_Init(&argc, &argv);
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &comm);
    MPI_Comm_rank(comm, &rank);
69
    MPI_Cart_coords(comm, rank, 2, coords);
    start_indices[0] = coords[0] * lsizes[0];
    start_indices[1] = coords[1] * lsizes[1];

    /* filetype: this process's block of the global array in the file */
    MPI_Type_create_subarray(2, gsizes, lsizes, start_indices,
                             MPI_ORDER_C, MPI_INT, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(MPI_COMM_WORLD, "datafile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_INT, filetype,   /* etype matches the int data */
                      "native", MPI_INFO_NULL);

    /* memtype: the interior of the local array, skipping ghost cells */
    memsizes[0] = lsizes[0] + 2;
    memsizes[1] = lsizes[1] + 2;
    start_indices[0] = start_indices[1] = 1;
    MPI_Type_create_subarray(2, memsizes, lsizes, start_indices,
                             MPI_ORDER_C, MPI_INT, &memtype);
    MPI_Type_commit(&memtype);

    MPI_File_write_all(fh, local_array, 1, memtype, &status);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}
70
Summary Why Use MPI-IO?
  • Better concurrent access model than POSIX one
  • Explicit list of processes accessing concurrently
  • More lax (but still very usable) consistency
    model
  • More descriptive power in interface
  • Derived datatypes for concise, noncontiguous file
    and/or memory regions
  • Collective I/O functions
  • Optimizations built into MPI-IO implementation
  • Noncontiguous access
  • Collective I/O (aggregation)
  • Performance portability

71
Optional, Really Advanced Stuff
  • Dynamic Process Management
  • Intercommunicator communication
  • MPI external connections

72
Creating Communicators
  • int MPI_Comm_dup(MPI_Comm comm, MPI_Comm
    *newcomm)
  • MPI::Intracomm MPI::Intracomm::Dup() const
  • MPI::Intercomm MPI::Intercomm::Dup() const
  • MPI::Cartcomm MPI::Cartcomm::Dup() const
  • MPI::Graphcomm MPI::Graphcomm::Dup() const
  • Creates an exact copy of the communicator

73
Creating Communicators
  • int MPI_Comm_create(MPI_Comm comm, MPI_Group
    group, MPI_Comm *newcomm)
  • MPI::Intracomm MPI::Intracomm::Create(...) const
  • MPI::Intercomm MPI::Intercomm::Create(...) const
  • Creates a new communicator with the contents of
    group
  • group must be a subset of the group of comm
  • int MPI_Comm_split(comm, color, key, newcomm)

74
Destroying Communicators
  • int MPI_Comm_free(MPI_Comm *comm)
  • void MPI::Comm::Free()
  • Destroys the named communicator

75
Dynamic Process Management
  • Create new processes from running programs (as
    opposed to starting them with mpirun)
  • MPI_Comm_spawn
  • (for SPMD style programs)
  • MPI_Comm_spawn_multiple
  • (MPMD style programs)
  • Connecting two (or more) applications together
  • MPI_Comm_accept and MPI_Comm_connect
  • Useful in assembling complex distributed
    applications

76
Dynamic Process Management
  • Issues
  • maintaining simplicity, flexibility, and
    correctness
  • interaction with OS, resource manager, and
    process manager
  • connecting independently started processes
  • Spawning new processes is collective, returning
    an intercommunicator
  • Local group is group of spawning processes.
  • Remote group is group of new processes
  • New processes have own MPI_COMM_WORLD
  • MPI_Comm_get_parent lets new processes find
    parent communicator

77
Intercommunicators
  • Contain a local group and a remote group
  • Point-to-point communication is between a process
    in one group and a process in the other
  • Can be merged into a normal communicator
  • Created by MPI_Intercomm_create()

78
Spawning Processes
  • int MPI_Comm_spawn(command, argv, numprocs, info,
    root, comm, intercomm, errcodes)
  • Tries to start numprocs processes running command,
    passing them command line arguments argv
  • The operation is collective over comm
  • Spawnees are in remote group of intercomm
  • Errors are reported on a per-process basis in
    errcodes
  • Info used to optionally specify hostname,
    archname, etc...
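
  A sketch of spawning four copies of a worker executable ("worker" is an
  illustrative name); the children can find the parent with
  MPI_Comm_get_parent:

MPI_Comm intercomm;
int errcodes[4];

MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
               0 /* root */, MPI_COMM_WORLD, &intercomm, errcodes);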

79
Spawning Multiple Executables
  • MPI_Comm_spawn_multiple(...)
  • Arguments command, argv, numprocs, info all
    become arrays
  • Still Collective

80
Establishing Connections
  • MPI-2 makes it possible for two MPI jobs started
    separately to establish communication
  • e.g. visualizer connecting to simulation
  • Connection results in an intercommunicator
  • Client/server architecture
  • Similar to TCP Sockets programming

81
Establishing Connections
  • Server
  • MPI_Open_port(info, port_name)
  • MPI_Comm_accept(port_name, info,root, comm,
    intercomm)
  • Client
  • MPI_Comm_connect( port_name, info, root, comm,
    intercomm)
  • Optional name service (like normal UNIX)
  • MPI_Publish_name( ... )
  • MPI_Lookup_name( ... )
  • (not sure if name service is implemented)
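
  A sketch of the two sides (the port name string must reach the client out
  of band, or through the name service above):

/* server side */
char port_name[MPI_MAX_PORT_NAME];
MPI_Comm client;
MPI_Open_port(MPI_INFO_NULL, port_name);
MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);

/* client side, given the same port_name string */
MPI_Comm server;
MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);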