Exercise 1 - PowerPoint PPT Presentation

About This Presentation
Title:

Exercise 1

Description:

Describe how to implement the communication procedure Row2All(i) in a c x d torus: ... at the end, each node of the torus should have received all messages from row i ...

Slides: 7
Provided by: Pao3
Category:
Tags: exercise | torus

Transcript and Presenter's Notes

Title: Exercise 1


1
Exercise 1
  • Row2All broadcast
  • Describe how to implement the communication procedure
    Row2All(i) in a c x d torus
  • initially, each node of row i contains a message
  • at the end, each node of the torus should have received
    all messages from row i
  • derive the time complexity of your approach(es)
  • can you prove that your solution is bandwidth
    optimal?
  • what happens if d (or c) is very large relative
    to the other?
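
One possible approach, not necessarily the intended solution, is to run an all-to-all broadcast inside row i and then broadcast the combined messages along every column. The sketch below simulates this sequentially. It assumes the torus has c rows and d columns (the exercise does not say which dimension is which), that each row and column is a ring, and that a node can send to each neighbour once per round. The name `row2all` and the round counting are illustrative.

```python
def row2all(c, d, i):
    """Simulate Row2All(i) on a torus with c rows and d columns (assumed layout).

    Returns (have, steps): have[r][k] is the set of source columns whose
    messages node (r, k) holds; steps counts communication rounds.
    """
    have = [[set() for _ in range(d)] for _ in range(c)]
    for k in range(d):
        have[i][k] = {k}              # node (i, k) starts with its own message

    steps = 0
    # Phase 1: all-to-all broadcast inside row i, treated as a ring.
    # Each round, every node forwards to its right neighbour the message
    # it received in the previous round (initially its own).
    in_flight = list(range(d))
    for _ in range(d - 1):
        in_flight = [in_flight[(k - 1) % d] for k in range(d)]
        for k in range(d):
            have[i][k].add(in_flight[k])
        steps += 1

    # Phase 2: every node of row i now holds all d messages and sends them
    # up and down its column ring; the set advances one hop per round.
    for dist in range(1, c // 2 + 1):
        for k in range(d):
            have[(i + dist) % c][k] |= have[i][k]
            have[(i - dist) % c][k] |= have[i][k]
        steps += 1
    return have, steps
```

In this model the round count is (d - 1) + floor(c / 2). Note that each phase 2 round carries d messages while each phase 1 round carries one, so the cost in terms of ts and tw still has to be derived separately.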

3
Exercise 2
All2All broadcast on a tree
Given a balanced binary tree, describe a procedure to perform
all2all broadcast that takes time (ts + tw m p/2) log p
for m-word messages on p nodes. Assume that
only the leaves of the tree contain nodes, and
that an exchange of two m-word messages between any two
nodes connected by bidirectional channels takes
time ts + tw m k if the channel (or a part of it) is
shared by k simultaneous messages.
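
As a sanity check on the target bound, the sketch below evaluates one candidate schedule: a hypercube-style exchange in which each leaf pairs with its farthest partner first. It assumes p is a power of two, that an exchange of s-word messages over a channel shared by k messages costs ts + tw s k, and that the two directions of a channel do not interfere. The function names are hypothetical.

```python
def tree_all2all(p):
    """Return what each of the p leaves holds after the exchange schedule."""
    have = [{n} for n in range(p)]
    dist = p // 2
    while dist >= 1:
        # Leaf n exchanges everything gathered so far with leaf n XOR dist.
        have = [have[n] | have[n ^ dist] for n in range(p)]
        dist //= 2
    return have


def tree_all2all_time(p, m, ts, tw):
    """Total time of the schedule under the assumed cost model."""
    steps = p.bit_length() - 1        # log2(p) for a power of two
    total = 0
    for j in range(1, steps + 1):
        size = m * 2 ** (j - 1)       # words each leaf has gathered so far
        k = p // 2 ** j               # messages sharing the busiest channel
        total += ts + tw * size * k   # size * k is always m * p / 2
    return total
```

Message size doubles at each step while the congestion halves, so every step costs ts + tw m p/2 in this model, and the log p steps add up to the bound stated in the exercise.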
4
Exercise 3
  • All-reduce operation in ring
  • Consider the all-reduce operation in which each
    processor starts with an array of m words and
    must end with the element-wise sum of all the
    arrays. This operation can be implemented on
    an n x n torus using one of the following
    three alternatives:
  • all2all broadcast of all the arrays followed by
    a local computation of the sum of the respective
    elements of the array
  • single node accumulation of all the arrays,
    followed by one2all broadcast of the result array
  • an algorithm that uses the pattern of all2all
    broadcast, but simply adds numbers rather than
    concatenating messages
  • For each of the above cases, compute the run
    time in terms of m, ts and tw.
  • Assume that ts = 100, tw = 1 and m is very large.
    Which of the three alternatives is better?
  • Assume that ts = 100, tw = 1 and m is very small
    (say 1). Which of the three alternatives is
    better?
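
The third alternative can be illustrated with a small sequential simulation. This is a sketch under stated assumptions, not a reference solution: it assumes each row and column of the torus is a unidirectional ring, that all rings operate in parallel, and that sending one m-word array to a neighbour costs ts + tw m. The name `torus_allreduce` is hypothetical.

```python
def torus_allreduce(arrays, ts, tw):
    """arrays[r][c] is the m-word list held by node (r, c) of an n x n torus.

    Returns (result, cost): result[r][c] is the element-wise sum of all
    arrays, and cost is the modelled time under the assumptions above.
    """
    n = len(arrays)
    m = len(arrays[0][0])

    def ring_allreduce(vectors):
        # Same pattern as all-to-all broadcast on a ring: forward what was
        # received last round, but add it instead of concatenating.
        acc = [list(v) for v in vectors]
        in_flight = [list(v) for v in vectors]
        for _ in range(n - 1):
            in_flight = [in_flight[(k - 1) % n] for k in range(n)]
            for k in range(n):
                acc[k] = [a + b for a, b in zip(acc[k], in_flight[k])]
        return acc

    # Phase 1: reduce along every row; phase 2: reduce along every column.
    rows = [ring_allreduce(arrays[r]) for r in range(n)]
    result = [[None] * n for _ in range(n)]
    for c in range(n):
        column = ring_allreduce([rows[r][c] for r in range(n)])
        for r in range(n):
            result[r][c] = column[r]

    steps = 2 * (n - 1)               # n - 1 rounds per phase
    return result, steps * (ts + tw * m)
```

Because messages never grow beyond m words, the modelled cost is 2(n - 1)(ts + tw m). The costs of the first two alternatives depend on the routing model and are left to the exercise.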

5
Exercise 4
How to do prefix sums with p processors
Describe an algorithm for computing prefix sums in an
n-node array distributed among p
processors. Evaluate speedup and efficiency of
your solution. What is the isoefficiency function
of your solution?
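
A common way to approach this problem is a three-step block algorithm: local prefix sums, a scan over the block totals, and a local offset correction. The sketch below simulates it sequentially. It assumes the array is split into contiguous blocks of about n/p elements, and the name `parallel_prefix_sums` is illustrative.

```python
from itertools import accumulate


def parallel_prefix_sums(a, p):
    """Inclusive prefix sums of a, computed block by block as p workers would."""
    n = len(a)
    size = -(-n // p)                 # ceil(n / p) elements per block
    blocks = [a[k * size:(k + 1) * size] for k in range(p)]

    # Step 1 (parallel): each worker computes prefix sums of its own block.
    local = [list(accumulate(b)) for b in blocks]

    # Step 2 (communication): exclusive prefix sums over the p block totals.
    totals = [b[-1] if b else 0 for b in local]
    offsets = [0] + list(accumulate(totals))[:-1]

    # Step 3 (parallel): each worker adds its offset to its local results.
    return [x + off for b, off in zip(local, offsets) for x in b]
```

For example, `parallel_prefix_sums([3, 1, 4, 1, 5, 9, 2, 6], 3)` returns `[3, 4, 8, 9, 14, 23, 25, 31]`. Steps 1 and 3 each touch about n/p elements per worker, while the cost of step 2 depends on the interconnect, which is what drives the speedup and isoefficiency analysis.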
6
Exercise 5
  • Simplified bucket sort
  • Consider a simplified version of bucket-sort. You
    are given an array A of n random integers in the
    range 1..r as input. The output data consists
    of r buckets, such that at the end of the
    algorithm, bucket i contains indices of all
    elements of A that are equal to i.
  • describe a decomposition based on partitioning
    the input data (array A) and how it would work
  • describe a decomposition based on partitioning
    the output data and how the resulting
    algorithm would work
  • evaluate speedup and efficiency of these
    approaches
  • derive the isoefficiency function for each
    approach
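
The two decompositions can be contrasted with a short sequential simulation. This is one possible reading of the exercise, with hypothetical function names: in the input-partitioned version each worker scans only its chunk of A and the partial buckets are merged afterwards, while in the output-partitioned version each worker owns a range of bucket values and scans all of A. Indices are 0-based and values lie in 1..r.

```python
def buckets_by_input(a, r, p):
    """Partition the input: worker w handles a contiguous chunk of A."""
    n = len(a)
    size = -(-n // p)                 # ceil(n / p) elements per worker
    partial = []
    for w in range(p):
        local = [[] for _ in range(r)]
        for idx in range(w * size, min((w + 1) * size, n)):
            local[a[idx] - 1].append(idx)
        partial.append(local)
    # Merge step: concatenate the workers' partial buckets for each value.
    return [[idx for local in partial for idx in local[v]] for v in range(r)]


def buckets_by_output(a, r, p):
    """Partition the output: worker w owns a contiguous range of bucket values."""
    size = -(-r // p)                 # ceil(r / p) buckets per worker
    buckets = [[] for _ in range(r)]
    for w in range(p):
        lo, hi = w * size + 1, min((w + 1) * size, r)
        for idx, v in enumerate(a):   # every worker scans the whole array
            if lo <= v <= hi:
                buckets[v - 1].append(idx)
    return buckets
```

Both functions return the same buckets; for `a = [2, 1, 3, 2, 1]` and `r = 3` the result is `[[1, 4], [0, 3], [2]]`. The first does about n/p work per worker plus a merge, whereas the second does about n work per worker with no merge, which is the trade-off the speedup and isoefficiency questions probe.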