Title: Message Passing Models
1. Message Passing Models
2. Overview
- Hardware model
- Programming model
- Message Passing Interface
3. Generic Model of a Message-Passing Multicomputer [5]
[Figure: nodes connected by a message-passing direct interconnection network (Gyula Fehér)]
4. Generic Node Architecture [5]
[Figure: generic node built from a node processor, local memory, router and communication switch unit, with internal channel(s) and external channels (Gyula Fehér)]
- Fat-Node: powerful processor, large memory, many chips; costly per node, moderate parallelism
- Thin-Node: small processor, small memory, one or a few chips; cheap per node, high parallelism
5. Generic Organization Model [5]
[Figure: processor/memory (PM) nodes with communication processors (CP) and switches (S) attached to a switching network; (b) decentralized and (c) centralized organizations (Gyula Fehér)]
6. Message Passing Properties [1]
- Complete computer as building block, including I/O
- Programming model directly accesses only the private address space (local memory)
- Communication via explicit messages (send/receive)
- Communication is integrated at the I/O level, not the memory system, so no special hardware is needed
- Resembles a network of workstations (which can actually be used as multiprocessor systems)
7. Message Passing Program [1]
- Problem: sum all of the elements of an array of size n (the master's branch is shown here; a sketch of the worker branch follows after the next slide).

INITIALIZE;                              // assign proc_num and num_procs

if (proc_num == 0) {                     // processor with a proc_num of 0 is the master,
                                         // which sends out messages and sums the result

    read_array(array_to_sum, size);      // read the array and the array size from file
    size_to_sum = size / num_procs;

    for (current_proc = 1; current_proc < num_procs; current_proc++) {
        lower_ind = size_to_sum * current_proc;
        upper_ind = size_to_sum * (current_proc + 1);
        SEND(current_proc, size_to_sum);
        SEND(current_proc, array_to_sum[lower_ind..upper_ind]);   // send that worker's slice
    }

    // the master node sums its own part of the array
    sum = 0;
    for (k = 0; k < size_to_sum; k++)
        sum += array_to_sum[k];
    global_sum = sum;
    // ... (the else branch for the workers is not shown on this slide)
8. Message Passing Program (cont.) [1]
- Multiprocessor software functions provided:
- INITIALIZE - assigns a number (proc_num) to each processor in the system and the total number of processors (num_procs).
- SEND(receiving_processor_number, data) - sends data to another processor.
- BARRIER(n_procs) - when a BARRIER is encountered, a processor waits at that BARRIER until n_procs processors reach the BARRIER; then execution can proceed.
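The slides show only the master's branch of the summation program. A minimal sketch of the matching worker branch, in the same pseudocode style, using the SEND/BARRIER primitives above plus an assumed RECEIVE(sender, data) primitive, could look as follows; how the master collects the partial sums into global_sum is likewise an assumption, not part of the original slides.

else {                                   // every other processor is a worker
    RECEIVE(0, size_to_sum);             // chunk size from the master
    RECEIVE(0, array_to_sum);            // this worker's slice of the array
    sum = 0;
    for (k = 0; k < size_to_sum; k++)
        sum += array_to_sum[k];
    SEND(0, sum);                        // return the partial sum to the master
}
BARRIER(num_procs);                      // wait until every processor has finished

On the master side, the values returned by the workers would then be received and added into global_sum.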
9. Advantages and Disadvantages [1]
- Advantages
- Easier to build than scalable shared-memory machines
- Easy to scale (but topology is important)
- Programming model is more removed from basic hardware operations
- Coherency and synchronization are the responsibility of the user, so the system designer need not worry about them.
- Disadvantages
- Large overhead: copying of buffers requires large data transfers (this will kill the benefits of multiprocessing if not kept to a minimum).
- Programming is more difficult.
- Blocking nature of SEND/RECEIVE can cause increased latency and deadlock issues (sketched below).
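To make the deadlock risk concrete, the following C/MPI sketch (added here as an illustration, not from the original slides) shows the classic pattern in which two processes both issue a blocking send before their receive; for messages too large to be buffered internally, both MPI_Send calls block and neither process ever reaches its MPI_Recv. The program assumes exactly two ranks.

#include "mpi.h"

#define N (1 << 20)   /* message large enough that MPI_Send is unlikely to buffer it */

int main(int argc, char *argv[])
{
    static int out[N], in[N];   /* static keeps the large buffers off the stack */
    int rank, peer;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;            /* assumes the program is run with exactly 2 ranks */

    /* Both ranks send first and receive second.  For large, unbuffered
       messages both MPI_Send calls block waiting for a matching receive
       that is never posted, so the program can deadlock. */
    MPI_Send(out, N, MPI_INT, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(in, N, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* One fix: reverse the order on one rank, or use the combined call
       MPI_Sendrecv(out, N, MPI_INT, peer, 0,
                    in,  N, MPI_INT, peer, 0,
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE); */

    MPI_Finalize();
    return 0;
}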
10. Message-Passing Interface (MPI) [3]
- Standardization - MPI is the only message-passing library that can be considered a standard. It is supported on virtually all HPC platforms and has practically replaced all previous message-passing libraries.
- Portability - there is no need to modify your source code when you port your application to a different platform that supports the MPI standard.
- Performance opportunities - vendor implementations should be able to exploit native hardware features to optimize performance.
- Functionality - over 115 routines are defined.
- Availability - a variety of implementations are available, both vendor and public domain.
11. MPI basics [3]
- Start Processes
- Send Messages
- Receive Messages
- Synchronize
- With these four capabilities, you can construct any program (a minimal skeleton follows).
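As an orientation, here is a minimal C skeleton that maps these four capabilities onto concrete MPI calls; the payload value and tag are arbitrary illustrations, the program assumes at least two ranks, and the later slides show fuller examples.

#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);                          /* start processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                  /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);      /* send */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                              /* receive */
    }

    MPI_Barrier(MPI_COMM_WORLD);                     /* synchronize */
    MPI_Finalize();
    return 0;
}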
12. Communicators [3]
- Provide a named set of processes for communication
- System-allocated unique tags (ranks) identify the processes
- All processes can be numbered from 0 to n-1
- Allow construction of libraries: the application creates communicators
- MPI_COMM_WORLD
- MPI uses objects called communicators and groups to define which collection of processes may communicate with each other.
- Provide functions (split, duplicate, ...) for creating communicators from other communicators (see the sketch below)
- Functions (size, my_rank, ...) for finding out about all processes within a communicator
- Blocking vs. non-blocking
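A brief C sketch of how an application might derive a new communicator from MPI_COMM_WORLD and query it; the even/odd split criterion is just an illustrative choice.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int world_rank, sub_rank, sub_size, color;
    MPI_Comm sub_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Split MPI_COMM_WORLD into two communicators: even and odd ranks. */
    color = world_rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sub_comm);

    /* Query the new communicator: how many processes, and my rank in it. */
    MPI_Comm_size(sub_comm, &sub_size);
    MPI_Comm_rank(sub_comm, &sub_rank);
    printf("World rank %d is rank %d of %d in communicator %d\n",
           world_rank, sub_rank, sub_size, color);

    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
    return 0;
}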
13. Hello world example [3]

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int my_PE_num;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_PE_num);
    printf("Hello from %d.\n", my_PE_num);
    MPI_Finalize();
    return 0;
}
14. Hello world example [3]
- Hello from 5.
- Hello from 3.
- Hello from 1.
- Hello from 2.
- Hello from 7.
- Hello from 0.
- Hello from 6.
- Hello from 4.
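Assuming an MPI installation that provides the usual mpicc compiler wrapper and mpirun launcher, and assuming the program is saved as hello.c (the file name is arbitrary), the example can be built and launched on 8 processes as follows; as the output above shows, the print order across ranks is nondeterministic.

    mpicc hello.c -o hello
    mpirun -np 8 ./hello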
15. MPMD [3]
- Use MPI_Comm_rank to select a different routine on each process (a complete sketch follows):

if (my_PE_num == 0)
    Routine1();
else if (my_PE_num == 1)
    Routine2();
else if (my_PE_num == 2)
    Routine3();
. . .
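A self-contained C sketch of this MPMD pattern; Routine1/2/3 are illustrative stubs standing in for the real per-rank work, not functions from the original slides.

#include <stdio.h>
#include "mpi.h"

/* Illustrative stubs: in a real application each rank would run its own task. */
static void Routine1(void) { printf("Rank 0: running Routine1\n"); }
static void Routine2(void) { printf("Rank 1: running Routine2\n"); }
static void Routine3(void) { printf("Rank 2: running Routine3\n"); }

int main(int argc, char *argv[])
{
    int my_PE_num;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_PE_num);

    if (my_PE_num == 0)
        Routine1();
    else if (my_PE_num == 1)
        Routine2();
    else if (my_PE_num == 2)
        Routine3();
    /* Ranks above 2 simply fall through and do nothing in this sketch. */

    MPI_Finalize();
    return 0;
}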
16. Blocking Sending and Receiving Messages [3]

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int my_PE_num, numbertoreceive, numbertosend = 42;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_PE_num);

    if (my_PE_num == 0) {
        MPI_Recv(&numbertoreceive, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        printf("Number received is %d\n", numbertoreceive);
    }
    else
        MPI_Send(&numbertosend, 1, MPI_INT, 0, 10, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
17. Non-Blocking Message Passing Routines [4]

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int numtasks, rank, next, prev, buf[2], tag1 = 1, tag2 = 2;
    MPI_Request reqs[4];
    MPI_Status stats[4];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Determine the left and right neighbors in a ring topology. */
    prev = rank - 1;
    next = rank + 1;
    if (rank == 0)               prev = numtasks - 1;
    if (rank == (numtasks - 1))  next = 0;

    /* Post non-blocking receives from both neighbors; the calls return
       immediately and the requests are completed later. */
    MPI_Irecv(&buf[0], 1, MPI_INT, prev, tag1, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&buf[1], 1, MPI_INT, next, tag2, MPI_COMM_WORLD, &reqs[1]);
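The slide ends after the receives are posted. In the usual form of this ring example, each task would next start the matching non-blocking sends of its own rank and then wait on all four requests; a sketch of that continuation, consistent with the declarations above, is:

    /* Start the matching non-blocking sends of this task's rank. */
    MPI_Isend(&rank, 1, MPI_INT, prev, tag2, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&rank, 1, MPI_INT, next, tag1, MPI_COMM_WORLD, &reqs[3]);

    /* Useful computation could overlap with communication here, then block
       until all four non-blocking operations have completed. */
    MPI_Waitall(4, reqs, stats);

    MPI_Finalize();
    return 0;
}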
18. Collective Communications [3]
- The communicator specifies a process group to participate in a collective communication
- MPI implements various optimized functions:
- Barrier synchronization
- Broadcast
- Reduction operations
- with one destination or all processes in the group as destination
- Collective operations are blocking (a broadcast/reduction sketch follows)
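A short C sketch of two of these collectives: a broadcast of a value from rank 0 to every process, followed by a sum reduction back to rank 0. The particular values and the use of MPI_SUM are illustrative choices, not taken from the slides.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, value = 0, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Broadcast: rank 0 supplies the value, every rank receives it. */
    if (rank == 0)
        value = 100;
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Reduction: sum (value + rank) across all ranks, result on rank 0.
       MPI_Allreduce would instead deliver the result to every rank. */
    value += rank;
    MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Global sum is %d\n", sum);

    MPI_Finalize();
    return 0;
}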
19. Comparison: MPI vs. OpenMP

Feature                              | OpenMP              | MPI
-------------------------------------|---------------------|---------------
Apply parallelism in steps           | yes                 | no
Scale to large number of processors  | maybe               | yes
Code complexity                      | small increase      | major increase
Runtime environment                  | expensive compilers | free
Cost of hardware                     | very expensive      | cheap
20. References
[1] J. Kowalczyk, Multiprocessor Systems, Xilinx, 2003.
[2] D. Culler and J. P. Singh, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, 1999.
[3] MPI Basics.
[4] Message Passing Interface (MPI).
[5] D. Sima, T. Fountain and P. Kacsuk, Advanced Computer Architectures: A Design Space Approach, Pearson, 1997.