Title: CS 584
1CS 584
2Sorting
- One of the most common operations
- Definition
- Arrange an unordered collection of elements into
a monotonically increasing or decreasing order.
- Two categories of sorting
- internal (fits in memory)
- external (uses auxiliary storage)
3Sorting Algorithms
- Comparison based
- compare-exchange
- O(n log n)
- Noncomparison based
- Uses known properties of the elements
- O(n) (bucket sort, etc.)
4Parallel Sorting Issues
- Input and Output sequence storage
- Where?
- Local to one processor or distributed
- Comparisons
- How to compare elements on different nodes
- Number of elements per processor
- One (compare-exchange --> communication)
- Multiple (compare-split --> communication)
5Compare-Exchange
6Compare-Split
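The two operations named on these slides can be sketched as follows. This is a minimal illustration, assuming the usual convention that the lower-ranked processor keeps the smaller values; the function names are hypothetical, and the "communication" is modeled simply by passing both processors' data to one function.

```python
import heapq


def compare_exchange(a, b):
    """One element per processor: return (value kept by the lower-ranked
    processor, value kept by the higher-ranked processor)."""
    return (a, b) if a <= b else (b, a)


def compare_split(block_lo, block_hi):
    """Several elements per processor; both blocks are already sorted.
    The blocks are merged, the lower-ranked processor keeps the smaller
    part and the higher-ranked processor keeps the larger part."""
    merged = list(heapq.merge(block_lo, block_hi))
    k = len(block_lo)
    return merged[:k], merged[k:]


lo, hi = compare_split([1, 6, 8, 11], [2, 9, 12, 13])
print(lo, hi)
```

In a real message-passing program each processor would send its block to its partner and do the merge locally; the sketch only shows which elements end up where.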
7Sorting Networks
- Specialized hardware for sorting
- based on comparator
- Inputs x, y
- Decreasing comparator: outputs max{x, y}, min{x, y}
- Increasing comparator: outputs min{x, y}, max{x, y}
8Sorting Network
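As a small illustration of a network built from increasing comparators, the sketch below applies a fixed list of wire pairs to four inputs. The particular five-comparator layout is a well-known network for four inputs, chosen here as an example; it is not taken from the slide's figure.

```python
import itertools

# Each pair (i, j) is an increasing comparator: after it fires,
# wire i carries the minimum and wire j carries the maximum.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def run_network(values, network=NETWORK_4):
    wires = list(values)
    for i, j in network:
        if wires[i] > wires[j]:
            wires[i], wires[j] = wires[j], wires[i]
    return wires


# The comparator sequence is fixed in advance and does not depend on the data.
ok = all(run_network(p) == [0, 1, 2, 3]
         for p in itertools.permutations(range(4)))
print(ok)
```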
9Parallel Sorting Algorithms
- Merge Sort
- Quick Sort
- Bitonic Sort
- Others
10Merge Sort
- Simplest parallel sorting algorithm?
- Steps
- Distribute the elements
- Every processor sorts its own sequence
- Merge the lists
- Problem
- How to merge the lists
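The three steps can be simulated sequentially as below. The merge strategy shown (pairwise merging in a tree, halving the number of lists each round) is one possible answer to the question on the slide and is an assumption of this sketch, as are the function name and the round-robin distribution.

```python
import heapq


def parallel_merge_sort(data, p):
    """Simulate p processors sorting data; returns the fully sorted list."""
    # Step 1: distribute the elements (round-robin, an arbitrary choice).
    blocks = [data[r::p] for r in range(p)]
    # Step 2: every (simulated) processor sorts its own sequence.
    blocks = [sorted(b) for b in blocks]
    # Step 3: merge neighbouring lists pairwise until one list remains.
    while len(blocks) > 1:
        merged = []
        for k in range(0, len(blocks), 2):
            merged.append(list(heapq.merge(*blocks[k:k + 2])))
        blocks = merged
    return blocks[0]


print(parallel_merge_sort([5, 2, 9, 1, 7, 3, 8, 6], 4))
```

Note that in each merge round half of the remaining processors become idle, and the last merge is done by a single processor.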
11Quicksort
- Simple, low overhead
- O(n log n)
- Divide and conquer
- Divide recursively into smaller subsequences.
12Quicksort
- n elements stored in A[1..n]
- Divide
- Divide a sequence into two parts
- A[q..r] becomes A[q..s] and A[s+1..r]
- make all elements of A[q..s] smaller than or equal
to all elements of A[s+1..r]
- Conquer
- Recursively apply Quicksort
13Quicksort
- Partition the sequence Aqr by picking a pivot.
- Performance is greatly affected by the choice of
the pivot.
- If we pick a bad pivot, we end up with an O(n^2)
algorithm.
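The effect of the pivot can be seen by counting comparisons. The sketch below is a list-based quicksort (not the in-place A[q..r] formulation) with a pluggable pivot rule; the helper names are hypothetical. On already sorted input, always picking the first element gives n(n-1)/2 comparisons, while picking the middle element stays close to n log n.

```python
def quicksort(a, choose_pivot):
    """Return (sorted copy of a, number of element-vs-pivot comparisons)."""
    if len(a) <= 1:
        return list(a), 0
    p = choose_pivot(a)
    pivot = a[p]
    rest = a[:p] + a[p + 1:]
    left = [x for x in rest if x <= pivot]    # elements <= pivot
    right = [x for x in rest if x > pivot]    # elements > pivot
    sorted_left, c_left = quicksort(left, choose_pivot)
    sorted_right, c_right = quicksort(right, choose_pivot)
    return sorted_left + [pivot] + sorted_right, c_left + c_right + len(rest)


def first_element(a):
    return 0


def middle_element(a):
    return len(a) // 2


data = list(range(64))                        # already sorted: worst case for first_element
_, bad = quicksort(data, first_element)
_, good = quicksort(data, middle_element)
print(bad, good)
```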
14Parallelizing Quicksort
- Task parallelism
- At each step of the algorithm 2 recursive calls
are made.
- Farm out one of the recursive calls to another
processor.
- Problems
- The work of partitioning is done by one
processor.
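A minimal sketch of this task-parallel scheme, using Python threads as stand-ins for processors. The `depth` limit, the Lomuto partition, and the function names are assumptions of the sketch; Python threads will not give a real speedup here, so the point is only the structure: one worker partitions alone, then hands one recursive call to another worker.

```python
import threading


def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around a[hi]; returns the pivot's final index."""
    pivot = a[hi]
    s = lo
    for k in range(lo, hi):
        if a[k] <= pivot:
            a[s], a[k] = a[k], a[s]
            s += 1
    a[s], a[hi] = a[hi], a[s]
    return s


def task_parallel_quicksort(a, lo=0, hi=None, depth=2):
    """Sort a[lo..hi] in place, spawning helper threads for the first `depth` levels."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    s = partition(a, lo, hi)      # done by a single worker: the bottleneck
    if depth > 0:
        # Farm out the left part to another worker, keep the right part.
        helper = threading.Thread(target=task_parallel_quicksort,
                                  args=(a, lo, s - 1, depth - 1))
        helper.start()
        task_parallel_quicksort(a, s + 1, hi, depth - 1)
        helper.join()
    else:
        task_parallel_quicksort(a, lo, s - 1, 0)
        task_parallel_quicksort(a, s + 1, hi, 0)


data = [9, 4, 7, 1, 8, 2, 6, 3, 5, 0]
task_parallel_quicksort(data)
print(data)
```

The first partition touches all n elements while every other worker waits, which is the problem noted on the slide.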
15Parallelizing Quicksort
- Consider domain decomposition.
- Hypercube
- A d-dimensional hypercube can be split into two
(d-1)-dimensional hypercubes such that each
processor in one cube is connected to one in the
other cube.
- If all processors know the pivot, neighbors split
their respective lists: all elements larger
than the pivot are sent to one subcube and
smaller elements are sent to the other
subcube
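The pairing between the two subcubes can be expressed with bit operations on processor ranks. The sketch assumes the standard binary labeling of hypercube nodes, where neighbors along dimension k differ in exactly bit k; the function names are hypothetical.

```python
def neighbor(rank, dim):
    """Rank of the processor directly connected along dimension `dim`."""
    return rank ^ (1 << dim)


def subcube(rank, dim):
    """Which of the two (d-1)-dimensional subcubes `rank` falls in when splitting on `dim`."""
    return (rank >> dim) & 1


d = 3
# Split the 3-cube along its highest dimension: each processor in subcube 0
# is paired with exactly one processor in subcube 1.
pairs = [(r, neighbor(r, d - 1)) for r in range(2 ** d) if subcube(r, d - 1) == 0]
print(pairs)
```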
17Parallelizing Quicksort
- After we go through each dimension, if n > p the
numbers are not totally sorted.
- Why?
- Each processor then sorts its own sublist using
a sequential quicksort.
- Pivot selection is particularly important
- Bad pivots eliminate some processors
18Pivot Selection
- Random selection
- During the ith split one of the processors in
each subcube picks a random element from its list
and broadcasts it to the others.
- Problem
- What if a bad pivot is selected at first?
19Pivot Selection
- Median selection
- If the distribution is uniform then each
processor's list is a representative sample, thus
the median is representative
- Problem
- Is the distribution really uniform?
- Can we assume that a single processor's list has
the same distribution as the full list?
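A small experiment illustrates why the question matters. The sketch takes the median of one processor's list as the pivot and reports what fraction of all elements falls at or below it; 0.5 would be a perfect split. The two distributions of the data across processors, and all names, are made up for the illustration.

```python
import statistics


def fraction_below_pivot(blocks, donor=0):
    """Fraction of all elements <= the median of processor `donor`'s list."""
    pivot = statistics.median_low(blocks[donor])
    total = sum(len(b) for b in blocks)
    below = sum(1 for b in blocks for x in b if x <= pivot)
    return below / total


data = list(range(16))
round_robin = [data[r::4] for r in range(4)]           # each list samples the whole range
contiguous = [data[4 * r:4 * r + 4] for r in range(4)]  # each list covers a narrow range

print(fraction_below_pivot(round_robin), fraction_below_pivot(contiguous))
```

When the donor's list is not representative of the full data, the split is far from even and one subcube receives most of the work.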
20Procedure HypercubeQuickSort(B)
  sort B using sequential quicksort
  for i = 1 to d
    select pivot and broadcast, or receive pivot
    partition B into B1 and B2 such that B1 <= pivot < B2
    if ith bit of iproc is zero then
      send B2 to neighbor along ith dimension
      C = subsequence received along ith dimension
      merge B1 and C into B
    else
      send B1 to neighbor along ith dimension
      C = subsequence received along ith dimension
      merge B2 and C into B
    endif
  endfor
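The procedure can be simulated on a single machine by holding every processor's list in one Python list indexed by rank. This is a sketch under stated assumptions: dimensions are processed from highest to lowest so that rank order matches data order at the end, and the pivot is the median of the first non-empty list in each subcube. Both choices, and the function name, are assumptions rather than part of the procedure above.

```python
import bisect
import heapq


def hypercube_quicksort(blocks):
    """blocks[r] is the list held by processor r; len(blocks) must be 2**d."""
    p = len(blocks)
    d = p.bit_length() - 1
    assert p == 1 << d, "number of processors must be a power of two"
    blocks = [sorted(b) for b in blocks]           # initial sequential sort
    for i in reversed(range(d)):
        size = 1 << (i + 1)                        # processors per current subcube
        parts = []                                 # parts[r] = (B1, B2)
        for r in range(p):
            base = r - r % size
            donor = next((blocks[q] for q in range(base, base + size) if blocks[q]), None)
            if donor is None:                      # whole subcube is empty
                parts.append(([], []))
                continue
            pivot = donor[len(donor) // 2]         # assumed pivot rule
            k = bisect.bisect_right(blocks[r], pivot)   # B1 <= pivot < B2
            parts.append((blocks[r][:k], blocks[r][k:]))
        new_blocks = []
        for r in range(p):
            partner = r ^ (1 << i)
            keep = (r >> i) & 1                    # bit 0: keep B1, bit 1: keep B2
            # The partner sends the matching part of its own list.
            new_blocks.append(list(heapq.merge(parts[r][keep], parts[partner][keep])))
        blocks = new_blocks
    return blocks


result = hypercube_quicksort([[15, 3, 9], [8, 1, 12], [5, 14, 2], [11, 7, 0]])
print(result)
```

With this input the last processor ends up with an empty list, a small instance of a poor pivot removing a processor from the computation.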
21Analysis
- Iterations: log2 p
- Select a pivot: O(n)
- keep sublist sorted
- Broadcast pivot: O(log2 p)
- Split the sequence
- split own sequence: O(log(n/p))
- exchange blocks with neighbor: O(n/p)
- merge blocks: O(n/p)
22Analysis
- Quicksort appears very scalable
- Depends heavily on the pivot
- Easy to parallelize
- Hypercube sorting algorithms depend on the
ability to map a hypercube onto the node
communication architecture.
23Bitonic Sort
- Key operation
- rearrange a bitonic sequence into sorted order
- Bitonic Sequence
- sequence of elements <a_0, a_1, ..., a_{n-1}>
- There exists i such that <a_0, ..., a_i> is
monotonically increasing and <a_{i+1}, ..., a_{n-1}> is
monotonically decreasing, or
- There exists a cyclic shift of indices such that
the above is satisfied.
24Bitonic Sequences
- <1, 2, 4, 7, 6, 0>
- First it increases then decreases
- i = 3
- <8, 9, 2, 1, 0, 4>
- Consider a cyclic shift
- i will equal 2 or 3
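The definition translates directly into a brute-force check: try every cyclic shift and test whether it first increases and then decreases. The function name is hypothetical, and the sketch treats equal neighbors as allowed in either run.

```python
def is_bitonic(seq):
    """True if some cyclic shift of seq increases and then decreases."""
    n = len(seq)
    if n == 0:
        return True
    for shift in range(n):
        s = seq[shift:] + seq[:shift]
        k = 0
        while k + 1 < n and s[k] <= s[k + 1]:   # walk up the increasing run
            k += 1
        while k + 1 < n and s[k] >= s[k + 1]:   # walk down the decreasing run
            k += 1
        if k == n - 1:                          # both runs together cover the sequence
            return True
    return False


print(is_bitonic([1, 2, 4, 7, 6, 0]), is_bitonic([8, 9, 2, 1, 0, 4]), is_bitonic([1, 3, 2, 4]))
```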
25Rearranging a Bitonic Sequence
- Let s = <a_0, a_1, ..., a_{n-1}>
- a_{n/2} is the beginning of the decreasing seq.
- Let s1 = <min{a_0, a_{n/2}}, min{a_1, a_{n/2+1}}, ..., min{a_{n/2-1}, a_{n-1}}>
- Let s2 = <max{a_0, a_{n/2}}, max{a_1, a_{n/2+1}}, ..., max{a_{n/2-1}, a_{n-1}}>
- In sequence s1 there is an element b_i = min{a_i, a_{n/2+i}}
- all elements before b_i are from the increasing part
- all elements after b_i are from the decreasing part
- Sequence s2 has a similar point
- Sequences s1 and s2 are bitonic
26Rearranging a Bitonic Sequence
- Every element of s1 is less than or equal to every element
of s2
- Thus, we have reduced the problem of rearranging
a bitonic sequence of size n to rearranging two
bitonic sequences of size n/2 then concatenating
the sequences.
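The split into s1 and s2 and the resulting recursion can be sketched as follows, assuming the sequence length is a power of two. The function names are hypothetical.

```python
def bitonic_split(s):
    """Split a bitonic sequence into (s1, s2) using element-wise min and max."""
    half = len(s) // 2
    s1 = [min(s[i], s[i + half]) for i in range(half)]
    s2 = [max(s[i], s[i + half]) for i in range(half)]
    return s1, s2


def bitonic_merge(s):
    """Rearrange a bitonic sequence (length a power of two) into increasing order."""
    if len(s) <= 1:
        return list(s)
    s1, s2 = bitonic_split(s)
    # s1 and s2 are bitonic and nothing in s1 exceeds anything in s2,
    # so the two halves can be handled independently and concatenated.
    return bitonic_merge(s1) + bitonic_merge(s2)


s = [1, 2, 4, 7, 6, 5, 3, 0]
print(bitonic_split(s), bitonic_merge(s))
```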
27Rearranging a Bitonic Sequence
28Bitonic Merging Network
29What about unordered lists?
- To use the bitonic merge for n items, we must
first have a bitonic sequence of n items.
- Two elements form a bitonic sequence
- Any unsorted sequence is a concatenation of
bitonic sequences of size 2
- Merge those into larger bitonic sequences until
we end up with a bitonic sequence of size n
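One way to realize this construction is recursively: sort the first half in increasing order and the second half in decreasing order, which makes their concatenation bitonic, then apply a bitonic merge. The sketch below assumes the input length is a power of two; the function names are hypothetical.

```python
def merge_bitonic(s, ascending):
    """Sort a bitonic sequence into increasing or decreasing order."""
    n = len(s)
    if n <= 1:
        return list(s)
    half = n // 2
    lo = [min(s[i], s[i + half]) for i in range(half)]
    hi = [max(s[i], s[i + half]) for i in range(half)]
    if ascending:
        return merge_bitonic(lo, True) + merge_bitonic(hi, True)
    return merge_bitonic(hi, False) + merge_bitonic(lo, False)


def bitonic_sort(a, ascending=True):
    """Sort a list whose length is a power of two."""
    n = len(a)
    if n <= 1:
        return list(a)
    first = bitonic_sort(a[:n // 2], True)     # increasing half
    second = bitonic_sort(a[n // 2:], False)   # decreasing half
    return merge_bitonic(first + second, ascending)


print(bitonic_sort([10, 20, 5, 9, 3, 8, 12, 14]))
```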
30Creating a Bitonic Sequence
31Mapping onto a hypercube
- One element per processor
- Start with the sorting network maps
- Each wire represents a processor
- Map processors to wires to minimize the distance
traveled during exchange
32Bitonic Merge on Hypercube
33Bitonic Sort
Procedure BitonicSort
  for i = 0 to d - 1
    for j = i downto 0
      if (i + 1)st bit of iproc <> jth bit of iproc
        comp_exchange_max(j, item)
      else
        comp_exchange_min(j, item)
      endif
    endfor
  endfor
comp_exchange_max and comp_exchange_min compare and exchange the
item with the neighbor on the jth dimension
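The procedure can be simulated with one item per processor by keeping all items in a list indexed by processor rank. The sketch assumes bits are numbered from 0 at the least significant end and that comp_exchange_max keeps the larger of the two items while comp_exchange_min keeps the smaller; the function name is hypothetical.

```python
def hypercube_bitonic_sort(items):
    """items[r] is the single item held by processor r; len(items) must be 2**d."""
    p = len(items)
    d = p.bit_length() - 1
    assert p == 1 << d, "number of processors must be a power of two"
    items = list(items)
    for i in range(d):
        for j in range(i, -1, -1):
            new_items = list(items)
            for rank in range(p):
                partner = rank ^ (1 << j)            # neighbor on the jth dimension
                bit_i1 = (rank >> (i + 1)) & 1       # (i + 1)st bit of the rank
                bit_j = (rank >> j) & 1              # jth bit of the rank
                if bit_i1 != bit_j:
                    new_items[rank] = max(items[rank], items[partner])   # comp_exchange_max
                else:
                    new_items[rank] = min(items[rank], items[partner])   # comp_exchange_min
            items = new_items
    return items


print(hypercube_bitonic_sort([5, 7, 1, 3, 8, 2, 6, 4]))
```

Partners along dimension j agree on bit i + 1 and differ in bit j, so exactly one of them takes the maximum and the other takes the minimum.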
34Bitonic Sort Stages
35Assignment
- Pick 16 random integers
- Draw the Bitonic Sort network
- Step through the Bitonic sort network to produce
a sorted list of integers.
- Explain how the if statement in the Bitonic sort
algorithm works.