Sorting on 2D Mesh - PowerPoint PPT Presentation

Title: Sorting on 2D Mesh


1
Sorting on 2D Mesh
by Shietung Peng
2
Mesh-connected Computers
  • The following figure shows the basic 2D mesh
    architecture. Each processor, other than the ones
    located on the boundary, has degree 4. The free
    links of the boundary processors can be used for
    input/output or to establish row and column
    wraparound connections to form a 2D torus.

3
Mesh-connected Computers
  • A 2D torus has long wraparound links that can
    slow down inter-processor communication or
    produce some bad side-effects. However, we can
    use the method of folding to lay out a 2D torus
    in such a way that only short, local links are
    used. The figure below shows a 5-by-5 torus
    folded along its columns.
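The folding idea can be sketched for a single ring of the torus (one row or one column). This is a minimal sketch, not taken from the slides: the names `folded_position` and `max_link_length` are hypothetical, and the interleaving rule used (first half of the ring on even slots, second half on odd slots in reverse order) is assumed as one common way to fold a ring.

```python
def folded_position(i, n):
    """Physical slot of ring node i when an n-node ring is folded in half.

    Assumed layout rule: nodes 0, 1, 2, ... take the even slots, and the
    remaining nodes fill the odd slots in reverse order.
    """
    if i <= (n - 1) // 2:
        return 2 * i
    return 2 * (n - 1 - i) + 1


def max_link_length(n):
    """Longest physical distance between ring neighbors i and (i + 1) mod n."""
    return max(
        abs(folded_position(i, n) - folded_position((i + 1) % n, n))
        for i in range(n)
    )


n = 5
# Node order along the folded row, from slot 0 to slot n - 1.
layout = sorted(range(n), key=lambda i: folded_position(i, n))
print(layout)
print(max_link_length(n))
```

With this layout every ring link, including the wraparound link, spans at most two slots, which is the property the folded torus relies on.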

4
Mesh-connected Computers
  • The four neighbors of a node are referred to as
    north, east, west, and south, leading to the name
    NEWS mesh.
  • Various control schemes (SIMD, SPMD, and MIMD)
    are possible. The SIMD mesh is the default model
    assumed in this lecture.
  • There are three submodels of the SIMD mesh,
    depending on the inter-processor communication
    allowed:
  • All processors must communicate with a neighbor
    in the same direction (there is never contention
    for the use of a link).
  • Each processor can send a message to only one
    neighbor at each step, but the neighbor is
    determined locally based on data-dependent
    conditions.
  • Each processor can transmit to and receive from
    all neighbors at once.

5
Mesh-connected Computers
  • Processors in a 2D mesh can be indexed in a
    variety of ways. Besides the common row and
    column indices, numbering the processors from 0
    to p -1 (linear order) is convenient at times.
    Some possible linear indexing schemes are shown
    below.
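Two common linear indexing schemes can be sketched as follows. The figure with the slide's schemes is not available here, so treating row-major and snakelike row-major order as representative is an assumption; the function names are hypothetical.

```python
def row_major(i, j, cols):
    """Linear index of processor (i, j) when rows are numbered left to right."""
    return i * cols + j


def snakelike_row_major(i, j, cols):
    """Linear index when odd-numbered rows are traversed right to left."""
    offset = j if i % 2 == 0 else cols - 1 - j
    return i * cols + offset


rows, cols = 4, 4
for i in range(rows):
    print([snakelike_row_major(i, j, cols) for j in range(cols)])
```

In snakelike order, consecutive indices always belong to physically adjacent processors, which is why it is a convenient target order for mesh sorting.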

6
The Shear-sort Algorithm
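  • The simplest version of shear-sort is depicted in
    the following figure.
  • The time complexity of the algorithm is
    $\lceil \lg r \rceil (p/r + r) + p/r$ for a mesh
    with r rows: each iteration costs p/r steps for
    the row sort and r steps for the column sort,
    and a final row sort costs p/r more.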
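Since the figure is not reproduced here, the following is a minimal sequential sketch of shear-sort, assuming the usual structure: repeated snakelike row sorts and column sorts, followed by a final row sort. Python's built-in sort stands in for the odd-even transposition sorts a real mesh would run in parallel, and all names (`ceil_lg`, `shearsort`, `shearsort_steps`) are hypothetical.

```python
def ceil_lg(r):
    """ceil(log2(r)) for r >= 1, computed with integer arithmetic."""
    return (r - 1).bit_length()


def shearsort(grid):
    """Return a copy of grid sorted into snakelike row-major order."""
    grid = [list(row) for row in grid]  # work on a copy
    r, c = len(grid), len(grid[0])

    def sort_rows():
        # Even rows ascend left to right, odd rows ascend right to left.
        for i in range(r):
            grid[i].sort(reverse=(i % 2 == 1))

    def sort_columns():
        # Every column is sorted top to bottom.
        for j in range(c):
            column = sorted(grid[i][j] for i in range(r))
            for i in range(r):
                grid[i][j] = column[i]

    for _ in range(ceil_lg(r)):
        sort_rows()
        sort_columns()
    sort_rows()  # final row sort
    return grid


def shearsort_steps(p, r):
    """Parallel step count: ceil(lg r) * (p/r + r) + p/r."""
    return ceil_lg(r) * (p // r + r) + p // r


example = [[1, 12, 21, 4], [15, 20, 13, 2], [5, 9, 18, 7], [22, 3, 14, 17]]
for row in shearsort(example):
    print(row)
print(shearsort_steps(16, 4))
```

The step count function only evaluates the formula; the simulation itself does not model parallel time.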

7
The Shear-sort Algorithm
  • To prove the correctness of the algorithm, it
    suffices to show that the algorithm is correct
    for 0/1 inputs (the 0-1 principle).
  • The proof consists of three parts.
  • Part 1: A pair of dirty rows (containing both 0s
    and 1s) creates at least one clean row
    (containing only 0s or only 1s) in each iteration.

8
The Shear-sort Algorithm
  • Part 2: The number of dirty rows is therefore
    halved (rounded up) with each iteration.
  • Part 3: After $\lceil \lg r \rceil$ iterations, at
    most one dirty row remains. This dirty row is put
    in its proper sorted order by the final row sort.
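The three parts of the proof can be observed on a small 0/1 input. This is an illustrative sketch only (hypothetical names `shear_iteration` and `dirty_rows`); it counts dirty rows after each iteration rather than proving anything.

```python
def shear_iteration(grid):
    """One iteration on a 0/1 grid: snakelike row sort, then column sort."""
    rows = [sorted(row, reverse=(i % 2 == 1)) for i, row in enumerate(grid)]
    cols = [sorted(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def dirty_rows(grid):
    """Number of rows that contain both 0s and 1s."""
    return sum(1 for row in grid if 0 < sum(row) < len(row))


grid = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
counts = [dirty_rows(grid)]
for _ in range(2):  # lg 4 = 2 iterations
    grid = shear_iteration(grid)
    counts.append(dirty_rows(grid))
print(counts)
```

Here all four rows start out dirty, and a single dirty row is left for the final row sort.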

9
The Shear-sort Algorithm
  • An example of shear-sort
  • on a 4-by-4 mesh.

10
The Optimized Shear-sort
  • It is possible to speed up shearsort by a
    constant factor by taking advantage of the
    reduction of the number of dirty rows in each
    iteration. Note that in sorting a sequence of 0s
    and 1s on a linear array, the number of odd-even
    transposition steps can be limited to the number
    of dirty elements that are not already in their
    proper places. For example, sorting the sequence
    000001011111 requires no more than two odd-even
    transposition steps. Therefore, we can replace
    the complete column sorts within the shearsort
    algorithm with successively fewer odd-even
    transposition steps. When r is a power of 2, the
    time complexity of this optimized shearsort is
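The closing formula is missing above. As a hedged sketch, the code below first checks the claim about 000001011111 by simulating odd-even transposition, then evaluates a step count for optimized shearsort under explicit assumptions that are not confirmed by the slides: the column sorts are cut to r, r/2, ..., 2 steps, and each of the lg r + 1 row sorts still takes p/r steps. All function names are hypothetical.

```python
def odd_even_step(a, phase):
    """One transposition step; phase 0 pairs (0,1),(2,3),..., phase 1 pairs (1,2),(3,4),..."""
    for i in range(phase, len(a) - 1, 2):
        if a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]


def steps_to_sort(bits, first_phase):
    """Number of alternating odd-even steps until the sequence is sorted."""
    a = [int(ch) for ch in bits]
    steps = 0
    while a != sorted(a):
        odd_even_step(a, (first_phase + steps) % 2)
        steps += 1
    return steps


def optimized_step_count(p, r):
    """Step count under the ASSUMPTIONS stated in the lead-in (r a power of 2)."""
    lg_r = r.bit_length() - 1
    row_steps = (lg_r + 1) * (p // r)
    col_steps = sum(r >> k for k in range(lg_r))  # r + r/2 + ... + 2 = 2r - 2
    return row_steps + col_steps


print(steps_to_sort("000001011111", first_phase=0))
print(steps_to_sort("000001011111", first_phase=1))
print(optimized_step_count(16, 4))
```

Whichever phase comes first, the example sequence is sorted within two steps. Under the stated assumptions the column sorts contribute 2r - 2 steps in total instead of r lg r.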

11
Extension of the Simple Shear-sort
  • The extended algorithm for multiple keys per
    processor works as follows:
  • Sort the sublists of size n/p within the
    processors.
  • Perform row and column sorts as in the simple
    shear-sort, except that each compare-exchange
    step is replaced by a merge-split step.
  • An example of shear-sort with n = 32 on a 4-by-4
    mesh.
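A merge-split step between two neighboring processors can be sketched as below. This is a minimal illustration with a hypothetical function name `merge_split`; it assumes both sublists are already sorted and have been exchanged in full, and it uses the standard library's `heapq.merge` for the merge.

```python
import heapq


def merge_split(low, high):
    """Merge two sorted sublists; return (smaller half, larger half).

    The processor that should hold the smaller keys keeps the first
    len(low) merged keys, and its neighbor keeps the rest.
    """
    merged = list(heapq.merge(low, high))
    k = len(low)
    return merged[:k], merged[k:]


left, right = merge_split([1, 8], [3, 5])
print(left, right)
```

With one key per processor this reduces to an ordinary compare-exchange.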

12
The First Recursive Algorithm
  • Sorting on a square mesh based on a 4-way
    divide-and-conquer strategy

13
The First Recursive Algorithm
  • The key to the proof of correctness is that the
    snakelike row sort spreads the elements of the
    clean rows roughly evenly between the left and
    right halves of the array. Hence, after Phase 3,
    at most 4 dirty rows remain in total (see the
    following figure). It is not difficult to prove that

14
The Second Recursive Algorithm
  • The running time of the first recursive algorithm
    is as follows
  • where for row sort (each half
    being sorted), for column sort, and
    for the last phase.
  • The second recursive algorithm tries to improve
    the first one by shuffling the columns to reduce
    the number of dirty rows to 2 at the end of Phase
    3.
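The recurrence for the first recursive algorithm can be sketched as a worked equation. The per-phase costs used here are assumptions, not values recovered from the slides: $\sqrt{p}/2$ steps for the row sort (each half-row is already sorted), $\sqrt{p}$ odd-even transposition steps for the column sort, and $4\sqrt{p}$ steps for the last phase, which cleans up at most 4 dirty rows along the snake.

```latex
\begin{align*}
T(\sqrt{p}) &= T(\sqrt{p}/2) + \tfrac{1}{2}\sqrt{p} + \sqrt{p} + 4\sqrt{p}
             = T(\sqrt{p}/2) + 5.5\sqrt{p} \\
            &= 5.5\sqrt{p}\,\bigl(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\bigr)
             \approx 11\sqrt{p}
\end{align*}
```

Under the same assumptions, reducing the number of dirty rows from 4 to 2 would shrink the last-phase term, which is the motivation for the second algorithm.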

15
The Second Recursive Algorithm
  • The second recursive algorithm is depicted as
    follows

16
The Second Recursive Algorithm
  • The proof of correctness of the second recursive
    algorithm is illustrated by the following
    figure. The key is that the numbers of 0s in any
    two double-columns differ by at most 2 (the
    details are left as an easy exercise).

17
The Lower Bound for Sorting on 2D Mesh
  • The running time of the second recursive
    algorithm is as follows
  • A trivial lower bound for sorting on a 2D mesh
    is the diameter of the mesh.
  • A nontrivial lower bound for sorting in snakelike
    row-major order is
  • There is a sorting algorithm, called the
    Schnorr-Shamir algorithm, that matches this lower
    bound asymptotically in that its running time is

18
Exercise 5
  • Show that in the first recursive sorting
    algorithm, replacing Phase 4 by row sorts and
    partial column sorts, as in optimized shearsort,
    would lead to a more efficient algorithm. Provide
    complexity analysis for the improved version of
    the algorithm.
  • Show that on an r-by-2 mesh, shearsort requires
    only $3r/2 + 3$ steps. Then use this result to
    improve the performance of the second recursive
    sorting algorithm, providing the complexity
    analysis for the improved version.