Title: Collective Communications
1 Collective Communications
- Paul Tymann
- Computer Science Department
- Rochester Institute of Technology
- ptt_at_cs.rit.edu
2 Collective Communications
- There are certain communication patterns that appear in many different types of applications
- MPI provides routines that implement these patterns:
  - Barrier synchronization
  - Broadcast from one member to all other members
  - Gather data from an array spread across processors into one array
  - Scatter data from one member to all members
  - All-to-all exchange of data
  - Global reduction (e.g., sum, min of "common" data elements)
  - Scan across all members of a communicator
3 Characteristics
- MPI collective communication routines differ in many ways from MPI point-to-point communication routines:
  - Involve coordinated communication within a group of processes identified by an MPI communicator
  - Substitute for a more complex sequence of point-to-point calls
  - All routines block until they are locally complete
  - Communications may, or may not, be synchronized (implementation dependent)
  - In some cases, a root process originates or receives all data
  - Amount of data sent must exactly match the amount of data specified by the receiver
  - Many variations on the basic categories
  - No message tags are needed
- MPI collective communication can be divided into three subsets: synchronization, data movement, and global computation.
4 Data Movement
- MPI provides several types of collective data movement routines:
  - broadcast
  - gather
  - scatter
  - allgather
  - alltoall
- Let's take a look at the functionality and syntax of these routines.
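The following slides show MPI_Bcast in detail but not the gather/scatter routines, so here is a minimal hedged sketch of MPI_Scatter and MPI_Gather. The chunk size of 4 and the doubling of the values are invented for illustration and are not part of the course code.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main( int argc, char *argv[] )
{
    int  rank, size, i;
    int *full = NULL;       /* only the root allocates the whole array */
    int  chunk[ 4 ];        /* each rank's block (assumes 4 per rank)  */

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    if ( rank == 0 ) {
        full = malloc( size * 4 * sizeof( int ) );
        for ( i = 0; i < size * 4; i++ )
            full[ i ] = i;
    }

    /* Scatter: rank 0 hands each rank a contiguous block of 4 ints. */
    MPI_Scatter( full, 4, MPI_INT, chunk, 4, MPI_INT, 0, MPI_COMM_WORLD );

    for ( i = 0; i < 4; i++ )   /* do some local work on the block */
        chunk[ i ] *= 2;

    /* Gather: rank 0 collects the processed blocks back, in rank order. */
    MPI_Gather( chunk, 4, MPI_INT, full, 4, MPI_INT, 0, MPI_COMM_WORLD );

    if ( rank == 0 ) {
        printf( "last element after gather: %d\n", full[ size * 4 - 1 ] );
        free( full );
    }

    MPI_Finalize();
    return 0;
}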
5 Broadcast
- Exactly what it says
- The implementation will do whatever is most efficient for the given hardware (it might use a reduction tree)
- MPI_Bcast( buffer, count, datatype, root, communicator )
- What might catch you by surprise is that the receiving processes call MPI_Bcast() as well
- I often use broadcast to distribute parameters
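A minimal sketch of that pattern (not the course code; the parameter values are invented): every rank makes the same MPI_Bcast call, and only the root's buffer contents matter before the call.

#include <mpi.h>
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int    rank;
    double params[ 4 ];           /* hypothetical parameter block   */

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if ( rank == 0 ) {            /* only the root fills the buffer */
        params[ 0 ] = 500.0;      /* e.g., window size              */
        params[ 1 ] = -0.5;       /* e.g., center x                 */
        params[ 2 ] = 0.0;        /* e.g., center y                 */
        params[ 3 ] = 1000.0;     /* e.g., max iterations           */
    }

    /* Same call on every rank: the root sends, everyone else receives. */
    MPI_Bcast( params, 4, MPI_DOUBLE, 0, MPI_COMM_WORLD );

    printf( "rank %d got parameter %f\n", rank, params[ 0 ] );

    MPI_Finalize();
    return 0;
}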
6 Mandelbrot
The Mandelbrot set is a connected set of points in the complex plane. Pick a point z0 in the complex plane and calculate
  z1 = z0^2 + z0
  z2 = z1^2 + z0
  z3 = z2^2 + z0
  ...
If the sequence z0, z1, z2, z3, ... remains within a distance of 2 of the origin forever, then the point z0 is said to be in the Mandelbrot set. If the sequence diverges from the origin, then the point is not in the set.
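As a sketch of how that test is usually coded (this is not the slide program), the escape-time loop below follows the recurrence above, with the distance cutoff of 2 and a caller-supplied iteration limit:

#include <complex.h>

/* Returns the iteration at which |z| first exceeds 2, or maxIters if the
   point never escapes (i.e., it is treated as being in the set).         */
int mandelbrot_iters( double complex z0, int maxIters )
{
    double complex z = z0;
    int k;

    for ( k = 0; k < maxIters; k++ ) {
        if ( cabs( z ) > 2.0 )    /* escaped the radius-2 disk: diverges */
            return k;
        z = z * z + z0;           /* z_{k+1} = z_k^2 + z0                */
    }
    return maxIters;
}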
7 Parallel Mandelbrot
- Can be done using farmer/worker, since the calculation of each pixel in the picture is independent of any other pixel value.
- We need to distribute a number of parameters to each of the processors:
  - The size of the window
  - Location of the center
  - Width
  - Maximum number of iterations
8 The Farmer
void manager( int numProcs, char *host )
{
    double msg[ WORK_SIZE ];
    int maxMessageSize = ( WINDOW_SIZE / ( numProcs - 1 ) +
                           WINDOW_SIZE % ( numProcs - 1 ) ) * WINDOW_SIZE;
    int result[ maxMessageSize ];
    int i;
    MPI_Status status;
    int count;

    msg[ _PIXELS ] = WINDOW_SIZE;
    msg[ _X ]      = X_CENTER - ( WIDTH / 2.0 );
    msg[ _Y ]      = Y_CENTER - ( WIDTH / 2.0 );
    msg[ _WIDTH ]  = WIDTH;
    msg[ _ITERS ]  = ITERATIONS;
9 The Farmer
    MPI_Bcast( msg, WORK_SIZE, MPI_DOUBLE, 0, MPI_COMM_WORLD );

    for ( i = 0; i < numProcs - 1; i = i + 1 ) {
        MPI_Recv( ... );                    /* Parameters omitted */
        MPI_Get_count( &status, MPI_INT, &count );
        drawTile( win, WINDOW_SIZE, numProcs,
                  status.MPI_SOURCE, result );
    }
}
10 A Worker
void worker( int myRank, int numProcs, char *host )
{
    double msg[ WORK_SIZE ];
    int result;
    MPI_Status status;
    double x;
    double pointsPerPixel;
    int colStart;
    int numCols;

    /* Obtain parameters from manager */
    MPI_Bcast( msg, WORK_SIZE, MPI_DOUBLE, 0, MPI_COMM_WORLD );

    /* Rest of the program has been omitted */
}
11 MPE
- MPI Parallel Environment (MPE) is a software package that contains a number of useful tools:
  - Profiling library
  - Viewers for logfiles
  - Parallel X Graphics library
  - Debugger setup routines
- MPE is not part of the Sun HPC package, but it works with it. I have compiled and installed it in my account.
12 X Routines
- MPE_Open_graphics - (Collectively) opens an X Windows display
- MPE_Draw_point - Draws a point on an X Windows display
- MPE_Draw_points - Draws points on an X Windows display
- MPE_Draw_line - Draws a line on an X11 display
- MPE_Fill_rectangle - Draws a filled rectangle on an X11 display
- MPE_Update - Updates an X11 display
- MPE_Close_graphics - Closes an X11 graphics device
- MPE_Xerror( returnVal, functionName )
- MPE_Make_color_array - Makes an array of color indices
- MPE_Num_colors - Gets the number of available colors
- MPE_Draw_circle - Draws a circle
- MPE_Draw_logic - Sets the logical operation for laying down new pixels
- MPE_Line_thickness - Sets the thickness of lines
- MPE_Add_RGB_color( graph, red, green, blue, mapping )
- MPE_Get_mouse_press - Waits for a mouse button press
- MPE_Iget_mouse_press - Checks for a mouse button press
- MPE_Get_drag_region - Gets a "rubber-band" box (or circle) region
13 Using X Routines
#include "mpe.h"
#include "mpe_graphics.h"

MPE_XGraph win;

MPE_Open_graphics( &win,                     // Display handle
                   MPI_COMM_SELF,            // Communicator
                   (char *)0,                // X display
                   -1, -1,                   // Location on screen
                   500, 500,                 // Size
                   MPE_GRAPH_INDEPENDENT );  // Collective?

MPE_Draw_point( win,        // Display handle
                col, row,   // Coordinates of the point
                color );    // Color to use

MPE_Close_graphics( &win ); // Display handle
14 Compiling
- A little more to compiling:
  mpcc -I/home/fac/ptt/pub/mpe/include -L/home/fac/ptt/mpe/lib -o mandel Mandelbrot.c -lmpi -lm -lmpe -lX11
- Run it the same way
15 Reduce
- MPI provides functions that perform standard reductions across processors:
  int MPI_Reduce( void *operand, void *result, int count,
                  MPI_Datatype type, MPI_Op operator,
                  int root, MPI_Comm comm )

  Operation Name   Meaning
  MPI_MAX          Maximum
  MPI_MIN          Minimum
  MPI_SUM          Sum
  MPI_PROD         Product
  MPI_LAND         Logical and
  MPI_BAND         Bitwise and
  MPI_LOR          Logical or
  MPI_BOR          Bitwise or
  MPI_LXOR         Logical xor
  MPI_BXOR         Bitwise xor
  MPI_MAXLOC       Max and location
  MPI_MINLOC       Min and location
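MPI_MAXLOC and MPI_MINLOC reduce value/index pairs rather than plain scalars. A small hedged sketch (not from the slides, with made-up local data) of finding the largest value and the rank that holds it, using the paired type MPI_DOUBLE_INT:

#include <mpi.h>
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank;
    struct { double value; int rank; } local, global;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    local.value = ( rank * 7 ) % 5;   /* made-up local data     */
    local.rank  = rank;               /* "location" of the value */

    /* MPI_DOUBLE_INT matches a struct of a double followed by an int. */
    MPI_Reduce( &local, &global, 1, MPI_DOUBLE_INT, MPI_MAXLOC,
                0, MPI_COMM_WORLD );

    if ( rank == 0 )
        printf( "max %.1f is on rank %d\n", global.value, global.rank );

    MPI_Finalize();
    return 0;
}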
16 Dot Product
- The dot product of two vectors is defined as
  x · y = x0y0 + x1y1 + x2y2 + ... + x(n-1)y(n-1)
- Imagine having two vectors, each containing n elements, stored on p processors
- Each processor will have N = n/p elements
- Let's assume a block distribution of data, meaning
  - P0 has x0, x1, ..., x(N-1) and y0, y1, ..., y(N-1)
  - P1 has xN, x(N+1), ..., x(2N-1) and yN, y(N+1), ..., y(2N-1)
17 Serial_dot
float Serial_dot( float *x, float *y, int n )
{
    int   i;
    float sum = 0.0;
    for ( i = 0; i < n; i++ )
        sum = sum + x[ i ] * y[ i ];
    return sum;
}
18 Parallel_dot
float Parallel_dot( float *local_x, float *local_y, int local_n )
{
    float local_dot;
    float dot = 0.0;

    local_dot = Serial_dot( local_x, local_y, local_n );
    MPI_Reduce( &local_dot, &dot, 1, MPI_FLOAT, MPI_SUM,
                0, MPI_COMM_WORLD );
    return dot;   /* only process 0 will have the result */
}
19 MPI_Allreduce
- Note that MPI_Reduce() leaves the result in the root processor
- What if you wanted the result everywhere?
- You could reduce and then broadcast
- Consider the modified reduction sketched below
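As a hedged sketch of what that modified reduction looks like for the dot product (an assumed Allreduce variant of Parallel_dot, reusing Serial_dot from slide 17; this is not code from the slides):

#include <mpi.h>

float Parallel_dot_all( float *local_x, float *local_y, int local_n )
{
    float local_dot = Serial_dot( local_x, local_y, local_n );
    float dot       = 0.0;

    /* Like MPI_Reduce, but with no root: every process gets the sum. */
    MPI_Allreduce( &local_dot, &dot, 1, MPI_FLOAT, MPI_SUM,
                   MPI_COMM_WORLD );
    return dot;   /* every process has the result */
}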