Title: Collective Communications
1 Collective Communications
- Paul Tymann
- Computer Science Department
- Rochester Institute of Technology
- ptt_at_cs.rit.edu
2 Collective Communications
- There are certain communication patterns that appear in many different types of applications
- MPI provides routines that implement these patterns:
  - Barrier synchronization
  - Broadcast from one member to all other members
  - Gather data from an array spread across processors into one array
  - Scatter data from one member to all members
  - All-to-all exchange of data
  - Global reduction (e.g., sum, min of "common" data elements)
  - Scan across all members of a communicator
3 Characteristics
- MPI collective communication routines differ in many ways from MPI point-to-point communication routines:
  - Involve coordinated communication within a group of processes identified by an MPI communicator
  - Substitute for a more complex sequence of point-to-point calls
  - All routines block until they are locally complete
  - Communications may, or may not, be synchronized (implementation dependent)
  - In some cases, a root process originates or receives all data
  - Amount of data sent must exactly match the amount of data specified by the receiver
  - Many variations on the basic categories
  - No message tags are needed
- MPI collective communication can be divided into three subsets: synchronization, data movement, and global computation.
4 Data Movement
- MPI provides several types of collective data movement routines:
  - broadcast
  - gather
  - scatter
  - allgather
  - alltoall
- Let's take a look at the functionality and syntax of these routines.
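The following slides show MPI_Bcast in detail but not the gather/scatter routines, so here is a minimal hedged sketch of MPI_Scatter and MPI_Gather. The chunk size of 4 and the doubling of the values are invented for illustration and are not part of the course code.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main( int argc, char *argv[] )
{
    int  rank, size, i;
    int *full = NULL;       /* only the root allocates the whole array */
    int  chunk[ 4 ];        /* each rank's block (assumes 4 per rank)  */

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    if ( rank == 0 ) {
        full = malloc( size * 4 * sizeof( int ) );
        for ( i = 0; i < size * 4; i++ )
            full[ i ] = i;
    }

    /* Scatter: rank 0 hands each rank a contiguous block of 4 ints. */
    MPI_Scatter( full, 4, MPI_INT, chunk, 4, MPI_INT, 0, MPI_COMM_WORLD );

    for ( i = 0; i < 4; i++ )   /* do some local work on the block */
        chunk[ i ] *= 2;

    /* Gather: rank 0 collects the processed blocks back, in rank order. */
    MPI_Gather( chunk, 4, MPI_INT, full, 4, MPI_INT, 0, MPI_COMM_WORLD );

    if ( rank == 0 ) {
        printf( "last element after gather: %d\n", full[ size * 4 - 1 ] );
        free( full );
    }

    MPI_Finalize();
    return 0;
}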
5 Broadcast
- Exactly what it says
- The implementation will do whatever is most efficient for the given hardware (it might use a reduction tree)
- MPI_Bcast( buffer, count, datatype, root, communicator )
- What might catch you by surprise is that the receiving processes call MPI_Bcast() as well
- I often use broadcast to distribute parameters
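A minimal sketch of that pattern (not the course code; the parameter values are invented): every rank makes the same MPI_Bcast call, and only the root's buffer contents matter before the call.

#include <mpi.h>
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int    rank;
    double params[ 4 ];           /* hypothetical parameter block   */

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if ( rank == 0 ) {            /* only the root fills the buffer */
        params[ 0 ] = 500.0;      /* e.g., window size              */
        params[ 1 ] = -0.5;       /* e.g., center x                 */
        params[ 2 ] = 0.0;        /* e.g., center y                 */
        params[ 3 ] = 1000.0;     /* e.g., max iterations           */
    }

    /* Same call on every rank: the root sends, everyone else receives. */
    MPI_Bcast( params, 4, MPI_DOUBLE, 0, MPI_COMM_WORLD );

    printf( "rank %d got parameter %f\n", rank, params[ 0 ] );

    MPI_Finalize();
    return 0;
}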
6 Mandelbrot
The Mandelbrot set is a connected set of points in the complex plane. Pick a point z0 in the complex plane and calculate
  z1 = z0^2 + z0
  z2 = z1^2 + z0
  z3 = z2^2 + z0
  ...
If the sequence z0, z1, z2, z3, ... remains within a distance of 2 of the origin forever, then the point z0 is said to be in the Mandelbrot set. If the sequence diverges from the origin, then the point is not in the set.
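As a sketch of how that test is usually coded (this is not the slide program), the escape-time loop below follows the recurrence above, with the distance cutoff of 2 and a caller-supplied iteration limit:

#include <complex.h>

/* Returns the iteration at which |z| first exceeds 2, or maxIters if the
   point never escapes (i.e., it is treated as being in the set).         */
int mandelbrot_iters( double complex z0, int maxIters )
{
    double complex z = z0;
    int k;

    for ( k = 0; k < maxIters; k++ ) {
        if ( cabs( z ) > 2.0 )    /* escaped the radius-2 disk: diverges */
            return k;
        z = z * z + z0;           /* z_{k+1} = z_k^2 + z0                */
    }
    return maxIters;
}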
7 Parallel Mandelbrot
- Can be done using farmer/worker, since the calculation of each pixel in the picture is independent of any other pixel value.
- We need to distribute a number of parameters to each of the processors:
  - The size of the window
  - Location of the center
  - Width
  - Maximum number of iterations
8 The Farmer
void manager( int numProcs, char *host )
{
    double msg[ WORK_SIZE ];
    int maxMessageSize = ( WINDOW_SIZE / ( numProcs - 1 ) +
                           WINDOW_SIZE % ( numProcs - 1 ) ) * WINDOW_SIZE;
    int result[ maxMessageSize ];
    int i;
    MPI_Status status;
    int count;

    msg[ _PIXELS ] = WINDOW_SIZE;
    msg[ _X ]      = X_CENTER - ( WIDTH / 2.0 );
    msg[ _Y ]      = Y_CENTER - ( WIDTH / 2.0 );
    msg[ _WIDTH ]  = WIDTH;
    msg[ _ITERS ]  = ITERATIONS;
9 The Farmer
    MPI_Bcast( msg, WORK_SIZE, MPI_DOUBLE, 0, MPI_COMM_WORLD );

    for ( i = 0; i < numProcs - 1; i = i + 1 ) {
        MPI_Recv( ... );                    /* Parameters omitted */
        MPI_Get_count( &status, MPI_INT, &count );
        drawTile( win, WINDOW_SIZE, numProcs,
                  status.MPI_SOURCE, result );
    }
}
10 A Worker
void worker( int myRank, int numProcs, char *host )
{
    double msg[ WORK_SIZE ];
    int result;
    MPI_Status status;
    double x;
    double pointsPerPixel;
    int colStart;
    int numCols;

    /* Obtain parameters from manager */
    MPI_Bcast( msg, WORK_SIZE, MPI_DOUBLE, 0, MPI_COMM_WORLD );

    /* Rest of the program has been omitted */
}
11 MPE
- MPI Parallel Environment (MPE) is a software package that contains a number of useful tools:
  - Profiling library
  - Viewers for logfiles
  - Parallel X Graphics library
  - Debugger setup routines
- MPE is not part of the Sun HPC package, but it works with it. I have compiled and installed it in my account.
12 X Routines
- MPE_Open_graphics - (Collectively) opens an X Windows display
- MPE_Draw_point - Draws a point on an X Windows display
- MPE_Draw_points - Draws points on an X Windows display
- MPE_Draw_line - Draws a line on an X11 display
- MPE_Fill_rectangle - Draws a filled rectangle on an X11 display
- MPE_Update - Updates an X11 display
- MPE_Close_graphics - Closes an X11 graphics device
- MPE_Xerror( returnVal, functionName )
- MPE_Make_color_array - Makes an array of color indices
- MPE_Num_colors - Gets the number of available colors
- MPE_Draw_circle - Draws a circle
- MPE_Draw_logic - Sets the logical operation for laying down new pixels
- MPE_Line_thickness - Sets the thickness of lines
- MPE_Add_RGB_color( graph, red, green, blue, mapping )
- MPE_Get_mouse_press - Waits for a mouse button press
- MPE_Iget_mouse_press - Checks for a mouse button press
- MPE_Get_drag_region - Gets a "rubber-band" box (or circle) region
13 Using X Routines
#include "mpe.h"
#include "mpe_graphics.h"

MPE_XGraph win;

MPE_Open_graphics( &win,                     // Display handle
                   MPI_COMM_SELF,            // Communicator
                   (char *)0,                // X display
                   -1, -1,                   // Location on screen
                   500, 500,                 // Size
                   MPE_GRAPH_INDEPENDENT );  // Collective?

MPE_Draw_point( win,        // Display handle
                col, row,   // Coordinates of the point
                color );    // Color to use

MPE_Close_graphics( &win ); // Display handle
14 Compiling
- A little more to compiling:
  mpcc -I/home/fac/ptt/pub/mpe/include -L/home/fac/ptt/mpe/lib -o mandel Mandelbrot.c -lmpi -lm -lmpe -lX11
- Run it the same way
15 Reduce
- MPI provides functions that perform standard reductions across processors:
  int MPI_Reduce( void *operand, void *result, int count,
                  MPI_Datatype type, MPI_Op operator,
                  int root, MPI_Comm comm )

  Operation Name   Meaning
  MPI_MAX          Maximum
  MPI_MIN          Minimum
  MPI_SUM          Sum
  MPI_PROD         Product
  MPI_LAND         Logical and
  MPI_BAND         Bitwise and
  MPI_LOR          Logical or
  MPI_BOR          Bitwise or
  MPI_LXOR         Logical xor
  MPI_BXOR         Bitwise xor
  MPI_MAXLOC       Max and location
  MPI_MINLOC       Min and location
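MPI_MAXLOC and MPI_MINLOC reduce value/index pairs rather than plain scalars. A small hedged sketch (not from the slides, with made-up local data) of finding the largest value and the rank that holds it, using the paired type MPI_DOUBLE_INT:

#include <mpi.h>
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank;
    struct { double value; int rank; } local, global;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    local.value = ( rank * 7 ) % 5;   /* made-up local data     */
    local.rank  = rank;               /* "location" of the value */

    /* MPI_DOUBLE_INT matches a struct of a double followed by an int. */
    MPI_Reduce( &local, &global, 1, MPI_DOUBLE_INT, MPI_MAXLOC,
                0, MPI_COMM_WORLD );

    if ( rank == 0 )
        printf( "max %.1f is on rank %d\n", global.value, global.rank );

    MPI_Finalize();
    return 0;
}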
16 Dot Product
- The dot product of two vectors is defined as
  x · y = x0y0 + x1y1 + x2y2 + ... + x(n-1)y(n-1)
- Imagine having two vectors, each containing n elements, stored on p processors
- Each processor will have N = n/p elements
- Let's assume a block distribution of data, meaning
  - P0 has x0, x1, ..., x(N-1) and y0, y1, ..., y(N-1)
  - P1 has xN, x(N+1), ..., x(2N-1) and yN, y(N+1), ..., y(2N-1)
17 Serial_dot
float Serial_dot( float *x, float *y, int n )
{
    int   i;
    float sum = 0.0;
    for ( i = 0; i < n; i++ )
        sum = sum + x[ i ] * y[ i ];
    return sum;
}
18 Parallel_dot
float Parallel_dot( float *local_x, float *local_y, int local_n )
{
    float local_dot;
    float dot = 0.0;

    local_dot = Serial_dot( local_x, local_y, local_n );
    MPI_Reduce( &local_dot, &dot, 1, MPI_FLOAT, MPI_SUM,
                0, MPI_COMM_WORLD );
    return dot;   /* only process 0 will have the result */
}
19 MPI_Allreduce
- Note that MPI_Reduce() leaves the result in the root processor
- What if you wanted the result everywhere?
- You could reduce and then broadcast
- Consider the modified reduction sketched below
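As a hedged sketch of what that modified reduction looks like for the dot product (an assumed Allreduce variant of Parallel_dot, reusing Serial_dot from slide 17; this is not code from the slides):

#include <mpi.h>

float Parallel_dot_all( float *local_x, float *local_y, int local_n )
{
    float local_dot = Serial_dot( local_x, local_y, local_n );
    float dot       = 0.0;

    /* Like MPI_Reduce, but with no root: every process gets the sum. */
    MPI_Allreduce( &local_dot, &dot, 1, MPI_FLOAT, MPI_SUM,
                   MPI_COMM_WORLD );
    return dot;   /* every process has the result */
}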