Title: CS 584
Algorithm Analysis Assumptions
- Consider ring, mesh, and hypercube.
- Each process can either send or receive a single message at a time.
- No special communication hardware.
- When discussing a mesh architecture we will consider a square toroidal mesh.
- Latency (startup time) is t_s and per-word transfer time is t_w (the inverse of bandwidth), so an m-word message costs t_s + t_w m.
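The cost model behind these assumptions can be sketched in a few lines. This is a minimal illustration, assuming the usual linear model in which one m-word message between neighbors costs ts + tw*m (the form the timing formulas in these slides take). The function name and the parameter values are hypothetical.

```python
def message_time(m, ts, tw):
    """Time for one m-word message between neighbors: startup plus per-word cost."""
    return ts + tw * m


# Hypothetical values, chosen only for illustration.
ts, tw = 10.0, 0.5
print(message_time(100, ts, tw))  # 10.0 + 0.5 * 100 = 60.0
```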
Basic Algorithms
- Broadcast Algorithms
  - one to all (scatter)
  - all to one (gather)
  - all to all
- Reduction
  - all to one
  - all to all
Broadcast (ring)
- Distribute a message of size m to all nodes.
- Start the message both ways.
[Figure: ring with the source node; the remaining nodes are labeled 1, 2, 2, 3, 3, 4, 4 going both ways around the ring from the source.]
T = (t_s + t_w m)(p/2)
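The two-way ring broadcast can be simulated to check the p/2 step count. This is a sketch under stated assumptions: the source can send only one message per step, so one neighbor is served at step 1 and the other at step 2, and every other node forwards the message one step after receiving it. The function name is hypothetical.

```python
def ring_broadcast_steps(p, source=0):
    """Map each ring node to the step at which it first holds the message."""
    arrival = {source: 0}
    right_count = p // 2               # nodes reached clockwise (first send)
    left_count = p - 1 - right_count   # nodes reached counterclockwise (second send)
    for k in range(1, right_count + 1):
        arrival[(source + k) % p] = k          # chain started at step 1
    for k in range(1, left_count + 1):
        arrival[(source - k) % p] = k + 1      # chain started at step 2
    return arrival


steps = ring_broadcast_steps(8)
print(sorted(steps.values()))  # [0, 1, 2, 2, 3, 3, 4, 4]
print(max(steps.values()))     # 4, which is p/2 for p = 8
```

Each step costs one message time, so the total matches (t_s + t_w m)(p/2) when p is even.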
Broadcast (mesh)
- Broadcast to the source row using the ring algorithm.
- Broadcast to the rest from the source row using the ring algorithm.
T = 2(t_s + t_w m)(p^{1/2}/2)
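The two-phase mesh broadcast can be sketched by reusing a ring schedule along the source row and then along every column. This is an illustration, assuming a q x q toroidal mesh with q = sqrt(p), a source that sends one message per step, and a second phase that starts only after the first has finished (which is what the factor of 2 in the formula suggests). All names are hypothetical.

```python
def ring_arrival(n, source):
    """Step at which each node of an n-node ring gets the message (two-way ring)."""
    arrival = {source: 0}
    right = n // 2
    for k in range(1, right + 1):
        arrival[(source + k) % n] = k
    for k in range(1, n - right):
        arrival[(source - k) % n] = k + 1
    return arrival


def mesh_broadcast_arrival(q, source=(0, 0)):
    """Arrival step for every (row, col) node of a q x q toroidal mesh."""
    src_row, src_col = source
    row_phase = ring_arrival(q, src_col)     # phase 1: along the source row
    phase1_end = max(row_phase.values())
    col_phase = ring_arrival(q, src_row)     # phase 2: down each column
    arrival = {}
    for col, t_row in row_phase.items():
        for row, t_col in col_phase.items():
            arrival[(row, col)] = t_row if row == src_row else phase1_end + t_col
    return arrival


arr = mesh_broadcast_arrival(4)
print(max(arr.values()))  # 4 steps for p = 16, i.e. 2 * (sqrt(16) / 2)
```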
Broadcast (hypercube)
[Figure: hypercube with labels 1, 2, 2, 3, 3, 3, 3.]
A message is sent along each dimension of the hypercube. Parallelism grows as a binary tree.
T = (t_s + t_w m) log_2 p
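To compare the three broadcast times side by side, the formulas from these slides can be evaluated directly. The parameter values below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def ring_time(p, m, ts, tw):
    return (ts + tw * m) * (p / 2)


def mesh_time(p, m, ts, tw):
    return 2 * (ts + tw * m) * (math.sqrt(p) / 2)


def hypercube_time(p, m, ts, tw):
    return (ts + tw * m) * math.log2(p)


# Hypothetical machine: 64 nodes, 100-word message, ts = 10, tw = 0.5.
p, m, ts, tw = 64, 100, 10.0, 0.5
print(ring_time(p, m, ts, tw))       # 60 * 32 = 1920.0
print(mesh_time(p, m, ts, tw))       # 2 * 60 * 4 = 480.0
print(hypercube_time(p, m, ts, tw))  # 60 * 6 = 360.0
```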
Broadcast
- Mesh algorithm was based on embedding rings in the mesh.
- Can we do better on the mesh?
- Can we embed a tree in a mesh?
- Exercise for the reader. (- hint, hint -)
Other Broadcasts
- Many algorithms for all-to-one and all-to-all communication are simply reversals and duals of the one-to-all broadcast.
- Examples
  - All-to-one
    - Reverse the algorithm and concatenate
  - All-to-all
    - Butterfly and concatenate
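The "butterfly and concatenate" idea for all-to-all can be sketched as a simulation on p = 2^d nodes. This is a minimal illustration, not a message-passing implementation: it treats each pairwise exchange across one dimension as a single step, and the function name is hypothetical.

```python
def all_to_all_broadcast(messages):
    """Simulate butterfly all-to-all broadcast; messages[i] starts on node i."""
    p = len(messages)
    d = p.bit_length() - 1
    assert p == 1 << d, "p must be a power of two"
    buffers = [{i: messages[i]} for i in range(p)]
    for i in range(d):
        # All exchanges across dimension i use the data held before this step.
        snapshot = [dict(b) for b in buffers]
        for node in range(p):
            partner = node ^ (1 << i)
            buffers[node].update(snapshot[partner])  # concatenate partner's data
    return [[b[k] for k in sorted(b)] for b in buffers]


result = all_to_all_broadcast(["a", "b", "c", "d"])
print(result[0])  # ['a', 'b', 'c', 'd']
```

After log2 p steps every node holds all p items, and the amount of data exchanged doubles at each step.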
Reduction Algorithms
- Reduce or combine a set of values on each processor to a single set.
  - Summation
  - Max/Min
- Many reduction algorithms simply use the all-to-one broadcast algorithm.
  - Operation is performed at each node.
Reduction
- If the goal is to have only one processor with the answer, use broadcast algorithms.
- If all must know, use butterfly.
  - Reduces the algorithm from 2 log p steps (reduce, then broadcast) to log p steps.
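The butterfly reduction can be sketched as follows. This is a simulation under assumptions: p is a power of two, each pairwise exchange counts as one step, and the combining operation is associative and commutative. The function name is hypothetical.

```python
def butterfly_allreduce(values, op=lambda a, b: a + b):
    """Simulate an all-to-all reduction; returns (per-node results, step count)."""
    p = len(values)
    d = p.bit_length() - 1
    assert p == 1 << d, "p must be a power of two"
    current = list(values)
    steps = 0
    for i in range(d):
        # Every node combines its value with its partner across dimension i.
        current = [op(current[n], current[n ^ (1 << i)]) for n in range(p)]
        steps += 1
    return current, steps


totals, steps = butterfly_allreduce([1, 2, 3, 4, 5, 6, 7, 8])
print(totals, steps)  # every node holds 36 after 3 = log2(8) steps
```

Reducing to one node and then broadcasting the answer would take log p steps each way, which is where the 2 log p figure comes from.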
How'd they do that?
- Broadcast and Reduction algorithms are based on Gray code numbering of nodes.
- Consider a hypercube.
  - Neighboring nodes differ by only one bit location.
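The one-bit-difference property is easy to check in code. The sketch below uses the binary-reflected Gray code and an XOR-based neighbor list as illustrations. Both function names are hypothetical, and the slides do not say which Gray code variant is intended.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def hypercube_neighbors(node, d):
    """Neighbors of a node in a d-dimensional hypercube: flip one bit at a time."""
    return [node ^ (1 << k) for k in range(d)]


codes = [gray(i) for i in range(8)]
print([format(c, "03b") for c in codes])
# ['000', '001', '011', '010', '110', '111', '101', '100']
print(hypercube_neighbors(5, 3))  # [4, 7, 1]
```

Consecutive Gray codes differ in exactly one bit, so consecutive positions in this numbering are hypercube neighbors.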
How'd they do that? (continued)
- Start with most significant bit.
- Flip the bit and send to that processor
- Proceed with the next most significant bit
- Continue until all bits have been used.
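The bit-flipping steps above can be sketched as a schedule generator for a one-to-all broadcast on a d-dimensional hypercube. This is a simulation that only lists who sends to whom at each step. The function name is hypothetical.

```python
def hypercube_broadcast(d, source=0):
    """Return, for each step, the (sender, receiver) pairs of the broadcast."""
    have = [source]                      # nodes that already hold the message
    schedule = []
    for i in reversed(range(d)):         # most significant bit first
        sends = [(n, n ^ (1 << i)) for n in have]   # flip bit i to find the partner
        schedule.append(sends)
        have += [dst for _, dst in sends]
    return schedule


for step, sends in enumerate(hypercube_broadcast(3), start=1):
    print(step, sends)
# 1 [(0, 4)]
# 2 [(0, 2), (4, 6)]
# 3 [(0, 1), (4, 5), (2, 3), (6, 7)]
```

The number of senders doubles at every step, which is the binary-tree growth in parallelism.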
Procedure SingleNodeAccum(d, my_id, m, X, sum)
  for j = 0 to m-1
    sum[j] = X[j]
  mask = 0
  for i = 0 to d-1
    if ((my_id AND mask) = 0)
      if ((my_id AND 2^i) <> 0)
        msg_dest = my_id XOR 2^i
        send(sum, msg_dest)
      else
        msg_src = my_id XOR 2^i
        recv(X, msg_src)
        for j = 0 to m-1
          sum[j] = sum[j] + X[j]
      endif
    endif
    mask = mask XOR 2^i
  endfor
end
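The procedure above can be exercised with a small sequential simulation that plays the role of all 2^d nodes at once. This is a sketch under an assumption: the receiver adds the incoming partial sums to its own, which is the reading of the garbled receive branch that makes the accumulation correct. The function name is hypothetical, and send/recv are modeled by direct list updates rather than real messages.

```python
def single_node_accum(d, X):
    """Simulate the accumulation on 2**d nodes; X[node] is that node's m-element vector.

    Returns the element-wise sum that ends up on node 0.
    """
    p = 1 << d
    sums = [list(x) for x in X]          # sum[j] = X[j] on every node
    mask = 0
    for i in range(d):
        for my_id in range(p):
            # Only nodes whose lower i bits are all zero take part in step i;
            # of those, the ones with bit i set are the senders.
            if (my_id & mask) == 0 and (my_id & (1 << i)) != 0:
                dest = my_id ^ (1 << i)
                # Models send(sum, dest) plus the receiver's element-wise add.
                sums[dest] = [a + b for a, b in zip(sums[dest], sums[my_id])]
        mask ^= 1 << i
    return sums[0]


print(single_node_accum(3, [[n, 1] for n in range(8)]))  # [28, 8]
```

Note that the accumulation walks the bits from least to most significant, the reverse of the broadcast order.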