Title: Problem Solving Strategies
1. Problem Solving Strategies
- Partitioning
  - Divide the problem into disjoint parts
  - Compute each part separately
- Divide and Conquer
  - Divide phase: recursively create sub-problems of the same type
  - Base case reached: execute an algorithm
  - Conquer phase: merge the results as the recursion unwinds
  - Traditional example: Merge Sort (sketched after this list)
- Where is the work?
  - Partitioning: creating the disjoint parts of the problem
  - Divide and Conquer: merging the separate results
  - Traditional example: Quick Sort
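The two phases are easiest to see in code. A minimal merge sort sketch in C; the helper names and scratch-array approach are illustrative, not from the slides:

#include <string.h>

/* Conquer phase: merge two sorted halves of a[lo..hi-1] using scratch space. */
static void merge(int a[], int tmp[], int lo, int mid, int hi)
{  int i = lo, j = mid, k = lo;
   while (k < hi)
      tmp[k++] = (j >= hi || (i < mid && a[i] <= a[j])) ? a[i++] : a[j++];
   memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(int));
}

void mergeSort(int a[], int tmp[], int lo, int hi)
{  if (hi - lo < 2) return;          /* base case: one item is sorted  */
   int mid = lo + (hi - lo) / 2;
   mergeSort(a, tmp, lo, mid);       /* divide phase: two sub-problems */
   mergeSort(a, tmp, mid, hi);
   merge(a, tmp, lo, mid, hi);       /* conquer phase: combine results */
}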
2. Parallel Sorting Considerations
- Distributed memory
  - Precision differences across distributed systems can cause unpredictable results
  - Traditional algorithms can require excessive communication
  - Modified algorithms minimize communication requirements
  - Typically, data is scattered to the P processors
- Shared memory
  - Critical sections and mutual-exclusion locks can inhibit performance
  - Modified algorithms eliminate the need for locks
  - Each processor can sort N/P data points, or the processors can work in parallel in a more fine-grained manner (no need for processor communication)
3. Two Related Sorts

#include <string.h>
#define LEN 32                /* assumed maximum string length */

void bubble(char x[][LEN], int N)
{  int sorted = 0, i, size = N-1;
   char temp[LEN];
   while (!sorted)
   {  sorted = 1;
      for (i = 0; i < size; i++)
      {  if (strcmp(x[i], x[i+1]) > 0)
         {  strcpy(temp, x[i]);
            strcpy(x[i], x[i+1]);
            strcpy(x[i+1], temp);
            sorted = 0;
         }
      }
      size--;
   }
}

void oddEven(char x[][LEN], int N)
{  int even = 0, sorted = 0, i, size = N-1;
   char temp[LEN];
   while (!sorted)
   {  sorted = 1;
      for (i = even; i < size; i += 2)
      {  if (strcmp(x[i], x[i+1]) > 0)
         {  strcpy(temp, x[i]);
            strcpy(x[i], x[i+1]);
            strcpy(x[i+1], temp);
            sorted = 0;
         }
      }
      even = 1 - even;
   }
}

- Sequential version: Odd-Even has no advantages
- Parallel version: processors can work independently without data conflicts
4. Bubble, Odd-Even Example
Bubble: smaller values move left one spot per pass; the largest value moves immediately to the end, so the loop size can shrink by one each pass. Odd-Even: large values move only one position per pass, and the loop size cannot shrink; however, all interchanges can occur in parallel.
5. One Parallel Iteration
- With barriers (shared memory)
  - Odd processors: mergeLow(pr data, pr-1 data); Barrier; if (r < P-2) mergeHigh(pr data, pr+1 data); Barrier
  - Even processors: mergeHigh(pr data, pr+1 data); Barrier; if (r > 1) mergeLow(pr data, pr-1 data); Barrier
- With message passing
  - Odd processors: sendRecv(pr data, pr-1 data); mergeHigh(pr data, pr-1 data); if (r < P-2) { sendRecv(pr data, pr+1 data); mergeLow(pr data, pr+1 data); }
  - Even processors: sendRecv(pr data, pr+1 data); mergeLow(pr data, pr+1 data); if (r > 1) { sendRecv(pr data, pr-1 data); mergeHigh(pr data, pr-1 data); }
Notation: r = processor rank, P = number of processors, pr data = the block of data belonging to processor r
Note: P/2 iterations are necessary to complete the sort
6. A Distributed Memory Implementation
- Scatter the data among the available processors
- Locally sort N/P items on each processor
- Even passes
  - Even processors, p < P-1, exchange data with processor p+1
  - Processors p and p+1 perform a partial merge, where p extracts the lower half and p+1 extracts the upper half
- Odd passes
  - Even processors, p >= 2, exchange data with processor p-1
  - Processors p and p-1 perform a partial merge, where p extracts the upper half and p-1 extracts the lower half
- Exchanging data: MPI_Sendrecv (a sketch follows below)
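A minimal sketch of one even-pass/odd-pass iteration with MPI_Sendrecv, assuming each rank holds n sorted ints; the helper names (keepLow, keepHigh, exchange) are illustrative:

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Keep the n smallest of two sorted n-item arrays in 'mine'. */
static void keepLow(int *mine, const int *other, int n)
{  int *tmp = malloc(n * sizeof(int));
   for (int a = 0, b = 0, c = 0; c < n; c++)
      tmp[c] = (b >= n || (a < n && mine[a] <= other[b])) ? mine[a++] : other[b++];
   memcpy(mine, tmp, n * sizeof(int));
   free(tmp);
}

/* Keep the n largest of two sorted n-item arrays in 'mine'. */
static void keepHigh(int *mine, const int *other, int n)
{  int *tmp = malloc(n * sizeof(int));
   for (int a = n-1, b = n-1, c = n-1; c >= 0; c--)
      tmp[c] = (b < 0 || (a >= 0 && mine[a] >= other[b])) ? mine[a--] : other[b--];
   memcpy(mine, tmp, n * sizeof(int));
   free(tmp);
}

/* Both partners swap whole blocks, then each keeps the half it owns. */
static void exchange(int *mine, int *recv, int n, int partner, int low)
{  MPI_Sendrecv(mine, n, MPI_INT, partner, 0,
                recv, n, MPI_INT, partner, 0,
                MPI_COMM_WORLD, MPI_STATUS_IGNORE);
   if (low) keepLow(mine, recv, n); else keepHigh(mine, recv, n);
}

/* P/2 iterations, each an even pass followed by an odd pass. */
void oddEvenTransposition(int *mine, int n, int rank, int P)
{  int *recv = malloc(n * sizeof(int));
   for (int iter = 0; iter < P/2; iter++)
   {  if (rank % 2 == 0) { if (rank < P-1) exchange(mine, recv, n, rank+1, 1); }
      else                                 exchange(mine, recv, n, rank-1, 0);
      if (rank % 2 == 0) { if (rank >= 2)  exchange(mine, recv, n, rank-1, 0); }
      else               { if (rank < P-1) exchange(mine, recv, n, rank+1, 1); }
   }
   free(recv);
}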
7. Partial Merge: Lower Keys
Store the lower n keys from arrays a and b into array c:

void mergeLow(char a[][LEN], char b[][LEN], char c[][LEN], int n)
{  int countA = 0, countB = 0, countC = 0;
   while (countC < n)
   {  if (strcmp(a[countA], b[countB]) < 0)
         strcpy(c[countC++], a[countA++]);
      else
         strcpy(c[countC++], b[countB++]);
   }
}

- To merge the upper keys (see the sketch below)
  - Initialize the counts to n-1
  - Decrement the counts instead of incrementing
  - Change the countC < n test to countC >= 0
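Applying those three changes gives the upper-keys version; a minimal sketch, using the same assumed LEN string arrays as above:

/* Store the upper n keys from arrays a and b into array c. */
void mergeHigh(char a[][LEN], char b[][LEN], char c[][LEN], int n)
{  int countA = n-1, countB = n-1, countC = n-1;
   while (countC >= 0)
   {  if (strcmp(a[countA], b[countB]) > 0)
         strcpy(c[countC--], a[countA--]);
      else
         strcpy(c[countC--], b[countB--]);
   }
}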
8. Bitonic Sequence
- 10,12,14,20,95,90,60,40,35,23,18,0,3,5,8,9
- 3,5,8,9,10,12,14,20,95,90,60,40,35,23,18,0
A bitonic sequence increases and then decreases, where the end can wrap around (the first sequence above is a rotation of the second).
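One way to make the wrap-around definition concrete: read cyclically, a bitonic sequence changes direction at most twice (one rising run, one falling run). A small sketch; the function name is illustrative and distinct values are assumed:

/* Returns 1 if a[0..n-1] is bitonic, allowing wrap-around, by counting
   cyclic direction changes between consecutive steps. */
int isBitonic(const int a[], int n)
{  int changes = 0;
   for (int i = 0; i < n; i++)
   {  int d1 = a[(i+1) % n] - a[i];
      int d2 = a[(i+2) % n] - a[(i+1) % n];
      if ((d1 > 0) != (d2 > 0)) changes++;
   }
   return changes <= 2;
}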
9. BitonicSort
- Unsorted: 10,20,5,9,3,8,12,14,90,0,60,40,23,35,95,18
- Step 1: 10,20  9,5  3,8  14,12  0,90  60,40  23,35  95,18
- Step 2: 9,5 10,20  14,12 3,8  0,40 60,90  95,35 23,18
          5,9 10,20  14,12 8,3  0,40 60,90  95,35 23,18
- Step 3: 5,9,8,3 14,12,10,20  95,40,60,90 0,35,23,18
          5,3 8,9 10,12 14,20  95,90 60,40 23,35 0,18
          3,5 8,9 10,12 14,20  95,90 60,40 35,23 18,0
- Step 4: 3,5,8,9,10,12,14,0 95,90,60,40,35,23,18,20
          3,5,8,0 10,12,14,9  35,23,18,20 95,90,60,40
          3,0 8,5 10,9 14,12  18,20 35,23 60,40 95,90
- Sorted: 0,3,5,8,9,10,12,14,18,20,23,35,40,60,90,95
10. Bitonic Sorting Functions

void bitonicMerge(int lo, int n, int dir)
{  if (n > 1)
   {  int m = n/2;
      for (int i = lo; i < lo+m; i++)
         compareExchange(i, i+m, dir);
      bitonicMerge(lo, m, dir);
      bitonicMerge(lo+m, m, dir);
   }
}

void bitonicSort(int lo, int n, int dir)
{  if (n > 1)
   {  int m = n/2;
      bitonicSort(lo, m, UP);
      bitonicSort(lo+m, m, DOWN);
      bitonicMerge(lo, n, dir);
   }
}

- Notes (see the sketch below)
  - dir: 0 for DOWN, 1 for UP
  - compareExchange moves
    - the low value left if dir == UP
    - the high value left if dir == DOWN
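The slides leave compareExchange and the data array undefined. A minimal sketch filling those gaps (the global array a, N, and the UP/DOWN values are assumptions); combined with the two functions above it reproduces the trace on slide 9:

#include <stdio.h>

#define DOWN 0
#define UP   1
#define N    16

static int a[N] = {10,20,5,9,3,8,12,14,90,0,60,40,23,35,95,18};

/* Moves the low value left when dir == UP, the high value left when dir == DOWN. */
static void compareExchange(int i, int j, int dir)
{  if ((a[i] > a[j]) == dir)
   {  int t = a[i]; a[i] = a[j]; a[j] = t; }
}

/* ... bitonicMerge and bitonicSort from above go here ... */

int main(void)
{  bitonicSort(0, N, UP);
   for (int i = 0; i < N; i++) printf("%d ", a[i]);
   printf("\n");
   return 0;
}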
11. Bitonic Sort Partners/Direction

Algorithm steps:

level      1 |   2    2 |   3    3    3 |   4    4    4    4
j          0 |   0    1 |   0    1    2 |   0    1    2    3

rank  0:  1/L |  2/L  1/L |  4/L  2/L  1/L |  8/L  4/L  2/L  1/L
rank  1:  0/H |  3/L  0/H |  5/L  3/L  0/H |  9/L  5/L  3/L  0/H
rank  2:  3/H |  0/H  3/L |  6/L  0/H  3/L | 10/L  6/L  0/H  3/L
rank  3:  2/L |  1/H  2/H |  7/L  1/H  2/H | 11/L  7/L  1/H  2/H
rank  4:  5/L |  6/H  5/H |  0/H  6/L  5/L | 12/L  0/H  6/L  5/L
rank  5:  4/H |  7/H  4/L |  1/H  7/L  4/H | 13/L  1/H  7/L  4/H
rank  6:  7/H |  4/L  7/H |  2/H  4/H  7/L | 14/L  2/H  4/H  7/L
rank  7:  6/L |  5/L  6/L |  3/H  5/H  6/H | 15/L  3/H  5/H  6/H
rank  8:  9/L | 10/L  9/L | 12/H 10/H  9/H |  0/H 12/L 10/L  9/L
rank  9:  8/H | 11/L  8/H | 13/H 11/H  8/L |  1/H 13/L 11/L  8/H
rank 10: 11/H |  8/H 11/L | 14/H  8/L 11/H |  2/H 14/L  8/H 11/L
rank 11: 10/L |  9/H 10/H | 15/H  9/L 10/L |  3/H 15/L  9/H 10/H
rank 12: 13/L | 14/H 13/H |  8/L 14/H 13/H |  4/H  8/H 14/L 13/L
rank 13: 12/H | 15/H 12/L |  9/L 15/H 12/L |  5/H  9/H 15/L 12/H
rank 14: 15/H | 12/L 15/H | 10/L 12/L 15/H |  6/H 10/H 12/H 15/L
rank 15: 14/L | 13/L 14/L | 11/L 13/L 14/L |  7/H 11/H 13/H 14/H

partner   = rank ^ (1 << (level - j - 1))
direction = ((rank < partner) == ((rank & (1 << level)) == 0)) ? L : H
12. Java Partner/Direction Code

public static void main(String[] args)
{  int nproc = 16, partner, levels = (int)(Math.log(nproc)/Math.log(2));
   for (int rank = 0; rank < nproc; rank++)
   {  System.out.printf("rank %2d partners ", rank);
      for (int level = 1; level <= levels; level++)
      {  for (int j = 0; j < level; j++)
         {  partner = rank ^ (1 << (level-j-1));
            String dir = ((rank < partner) == ((rank & (1 << level)) == 0)) ? "L" : "H";
            System.out.printf("%3d/%s", partner, dir);
         }
         if (level < levels) System.out.print(", ");
      }
      System.out.println();
   }
}
13. Parallel Bitonic Pseudo-code
- IF master processor
  - Create or retrieve the data to sort
  - Scatter it among all processors (including the master)
- ELSE
  - Receive the portion to sort
- Sort local data using an algorithm of preference
- FOR (level = 1; level <= lg(P); level++)
  - FOR (j = 0; j < level; j++)
    - partner = rank ^ (1 << (level-j-1))
    - Exchange data with partner
    - IF ((rank < partner) == ((rank & (1 << level)) == 0))
      - Extract the low values from the local and received data (mergeLow)
    - ELSE extract the high values from the local and received data (mergeHigh)
- Gather the sorted data at the master (an MPI sketch follows below)
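A minimal MPI sketch of the pseudo-code's exchange loop, assuming each rank holds n sorted ints, P is a power of two, and the keepLow/keepHigh helpers from the odd-even sketch earlier are available; the function name is illustrative:

#include <mpi.h>
#include <stdlib.h>

/* Each of the lg(P) levels runs 'level' compare-split steps with a partner
   chosen by XOR; the direction test matches slides 11-12. */
void parallelBitonic(int *local, int n, int rank, int P)
{  int *recv = malloc(n * sizeof(int));
   int levels = 0;
   while ((1 << levels) < P) levels++;                /* lg(P) */
   for (int level = 1; level <= levels; level++)
      for (int j = 0; j < level; j++)
      {  int partner = rank ^ (1 << (level - j - 1));
         MPI_Sendrecv(local, n, MPI_INT, partner, 0,
                      recv,  n, MPI_INT, partner, 0,
                      MPI_COMM_WORLD, MPI_STATUS_IGNORE);
         if ((rank < partner) == ((rank & (1 << level)) == 0))
            keepLow(local, recv, n);   /* keep the n smallest */
         else
            keepHigh(local, recv, n);  /* keep the n largest  */
      }
   free(recv);
}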
14. Bucket Sort Partitioning
- Algorithm
  - Assign a range of values to each processor
  - Each processor sorts the values assigned to it
  - The resulting values are forwarded to the master
- Steps (see the sketch after this list)
  - Scatter N/P numbers to each processor
  - Each processor
    - Creates a smaller bucket of numbers designated for each processor
    - Sends the designated buckets to the various processors and receives the buckets it expects to receive
    - Sorts its section
    - Sends its data back to the processor with rank 0
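A minimal MPI sketch of the per-processor steps, assuming values uniform in [0, MAX) with P dividing MAX; the names and the qsort comparison helper are illustrative:

#include <mpi.h>
#include <stdlib.h>

#define MAX 1000                       /* assumed value range [0, MAX) */

static int cmpInt(const void *a, const void *b)
{  return (*(const int*)a > *(const int*)b) - (*(const int*)a < *(const int*)b); }

/* 'mine' holds this rank's N/P scattered values; returns this rank's sorted
   bucket, with its length in *outLen. */
int *bucketExchange(int *mine, int n, int P, int *outLen)
{  int *sendCnt = calloc(P, sizeof(int)), *recvCnt = malloc(P * sizeof(int));
   int *sendOff = malloc(P * sizeof(int)), *recvOff = malloc(P * sizeof(int));
   /* Count how many local values belong to each processor's range. */
   for (int i = 0; i < n; i++) sendCnt[mine[i] / (MAX / P)]++;
   sendOff[0] = 0;
   for (int p = 1; p < P; p++) sendOff[p] = sendOff[p-1] + sendCnt[p-1];
   /* Sorting by value also groups the values by destination bucket. */
   qsort(mine, n, sizeof(int), cmpInt);
   /* Trade counts, then trade the bucket contents themselves. */
   MPI_Alltoall(sendCnt, 1, MPI_INT, recvCnt, 1, MPI_INT, MPI_COMM_WORLD);
   recvOff[0] = 0;
   for (int p = 1; p < P; p++) recvOff[p] = recvOff[p-1] + recvCnt[p-1];
   *outLen = recvOff[P-1] + recvCnt[P-1];
   int *bucket = malloc(*outLen * sizeof(int));
   MPI_Alltoallv(mine, sendCnt, sendOff, MPI_INT,
                 bucket, recvCnt, recvOff, MPI_INT, MPI_COMM_WORLD);
   qsort(bucket, *outLen, sizeof(int), cmpInt);   /* sort this rank's section */
   free(sendCnt); free(recvCnt); free(sendOff); free(recvOff);
   return bucket;
}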
15. Bucket Sort Partitioning
[Figure: unsorted numbers are dropped into buckets P1..Pp; each bucket is sorted and the results are concatenated into the sorted output. The parallel version assigns one bucket per processor.]
- Sequential Bucket Sort (a sketch follows below)
  - Drop sections of the data to sort into buckets
  - Sort each bucket
  - Copy the sorted bucket data back into the primary array
  - Complexity: O(b * (n/b) lg(n/b)) = O(n lg(n/b)) for b buckets
- Parallel Bucket Sort notes
  - Bucket Sort works well for uniformly distributed data
  - Recursively finding medians from a data sample (Sample Sort) attempts to equalize bucket sizes
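A minimal sequential sketch matching those three steps, assuming n values uniform in [0, MAX) and b buckets with b dividing MAX; all names are illustrative:

#include <stdlib.h>

#define MAX 1000   /* assumed value range [0, MAX) */

static int cmpInt(const void *x, const void *y)
{  return (*(const int*)x > *(const int*)y) - (*(const int*)x < *(const int*)y); }

void bucketSort(int *a, int n, int b)
{  int *bucket = malloc(n * sizeof(int));      /* scratch for all buckets */
   int *count  = calloc(b, sizeof(int));
   int *offset = malloc(b * sizeof(int));
   int *next   = malloc(b * sizeof(int));
   for (int i = 0; i < n; i++) count[a[i] / (MAX / b)]++;
   offset[0] = 0;
   for (int k = 1; k < b; k++) offset[k] = offset[k-1] + count[k-1];
   for (int k = 0; k < b; k++) next[k] = offset[k];
   /* Drop each value into its bucket. */
   for (int i = 0; i < n; i++) bucket[next[a[i] / (MAX / b)]++] = a[i];
   /* Sort each bucket, then copy back into the primary array. */
   for (int k = 0; k < b; k++)
      qsort(bucket + offset[k], count[k], sizeof(int), cmpInt);
   for (int i = 0; i < n; i++) a[i] = bucket[i];
   free(bucket); free(count); free(offset); free(next);
}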
16. Rank (Enumeration) Sort
- For each number src[i], count the numbers smaller than it, plus its duplicates with a smaller index
- The count, x, is the final array position for src[i]

for (i = 0; i < N; i++)
{  x = 0;
   for (j = 0; j < N; j++)
      if (src[i] > src[j] || (src[i] == src[j] && j < i)) x++;
   dest[x] = src[i];
}

- Shared memory parallel implementation (see the sketch below)
  - Assign groups of numbers to each processor
  - Find the positions of N/P numbers in parallel
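A minimal shared-memory sketch; OpenMP is an assumption (the slides do not name a threading mechanism), and the pragma splits the outer loop into per-thread groups as the slide's scheme describes:

#include <omp.h>

/* Each thread computes final positions for its share of the N numbers.
   No locks are needed: ranks are unique, so every i writes a distinct dest[x]. */
void rankSort(const int *src, int *dest, int N)
{
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
   {  int x = 0;
      for (int j = 0; j < N; j++)
         if (src[i] > src[j] || (src[i] == src[j] && j < i)) x++;
      dest[x] = src[i];
   }
}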
17. Counting Sort
Works on primitive fixed-point types: int, char, long, etc.
- The master scatters the data among the processors
- In parallel, each processor counts the total occurrences of each of its N/P data points
- The processors perform a collective sum operation
- The processors perform an all-to-all collective prefix-sum operation
- In parallel, each processor stores its N/P data items in the appropriate positions of the output array
- The sorted data is gathered at the master processor
Note: this logic can be repeated to implement a radix sort (a sequential sketch follows below)
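For reference, a minimal sequential counting sort showing the count, prefix-sum, and store phases that the parallel version distributes; the value range RANGE is an assumption:

#define RANGE 256   /* assumed value range [0, RANGE) */

void countingSort(const int *src, int *dest, int N)
{  int count[RANGE] = {0};
   for (int i = 0; i < N; i++) count[src[i]]++;   /* count occurrences        */
   int pos = 0;                                   /* prefix sum: first output */
   for (int v = 0; v < RANGE; v++)                /* index for each value     */
   {  int c = count[v]; count[v] = pos; pos += c; }
   for (int i = 0; i < N; i++)                    /* stable store phase       */
      dest[count[src[i]]++] = src[i];
}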
18. Merge Sort
- Scatter N/P items to each processor
- Sort phase: each processor sorts its data with a method of choice
- Merge phase: data is routed and a merge is performed at each level (see the sketch below)

for (gap = 1; gap < P; gap *= 2)
{  if ((p/gap) % 2 != 0)
   {  Send data to p-gap; break;  }
   else
   {  Receive data from p+gap; merge with local data;  }
}
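A minimal MPI sketch of that merge loop, assuming each rank starts with n sorted ints in a heap-allocated block and P is a power of two; mergeArrays and mergePhase are illustrative names:

#include <mpi.h>
#include <stdlib.h>

/* Merge two sorted arrays a (na items) and b (nb items) into a new array. */
static int *mergeArrays(const int *a, int na, const int *b, int nb)
{  int *c = malloc((na + nb) * sizeof(int));
   int i = 0, j = 0, k = 0;
   while (k < na + nb)
      c[k++] = (j >= nb || (i < na && a[i] <= b[j])) ? a[i++] : b[j++];
   return c;
}

/* Route and merge up the tree; rank 0 ends up holding all N sorted items.
   At each gap, both partners hold the same count, n*gap. */
int *mergePhase(int *local, int n, int p, int P, int *outLen)
{  int len = n;
   for (int gap = 1; gap < P; gap *= 2)
   {  if ((p / gap) % 2 != 0)
      {  MPI_Send(local, len, MPI_INT, p - gap, 0, MPI_COMM_WORLD);
         break;                       /* this rank's part is passed along */
      }
      else
      {  int *recv = malloc(len * sizeof(int));
         MPI_Recv(recv, len, MPI_INT, p + gap, 0, MPI_COMM_WORLD,
                  MPI_STATUS_IGNORE);
         int *merged = mergeArrays(local, len, recv, len);
         free(local); free(recv);
         local = merged; len *= 2;
      }
   }
   *outLen = len;
   return local;
}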
19. Quick Sort
- Slave computers
  - Perform the quick sort algorithm
    - Base case: if the data length < threshold, send the data to the master (rank 0)
    - Recursive step: quick sort partitions the data
  - Request work from the master processor (see the sketch after this list)
    - If there is none, terminate
    - Otherwise receive data, sort it, and send it back to the master
- Master computer
  - Scatter N/P items to each processor
  - When a work request is received: send data to the slave, or a termination message
  - When sorted data is received: place the data correctly in the final data list
  - When all data is sorted: save the data and terminate
Note: Distributed work pools require load balancing. Processors maintain local work pools; when the local work queue falls below a threshold, processors request work from their neighbors.
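A sketch of the slave side of this protocol; the message tags are assumptions, and each received block is sorted whole with qsort rather than recursively partitioned, so it illustrates only the request/termination handshake:

#include <mpi.h>
#include <stdlib.h>

#define TAG_REQUEST 1   /* slave -> master: ask for work          */
#define TAG_WORK    2   /* master -> slave: block of data to sort */
#define TAG_RESULT  3   /* slave -> master: sorted block          */
#define TAG_DONE    4   /* master -> slave: no work remains       */

static int cmpInt(const void *a, const void *b)
{  return (*(const int*)a > *(const int*)b) - (*(const int*)a < *(const int*)b); }

/* Slave side: repeatedly ask the master for work until told to terminate. */
void slaveLoop(void)
{  for (;;)
   {  MPI_Status st;
      int count;
      MPI_Send(NULL, 0, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
      MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
      if (st.MPI_TAG == TAG_DONE)
      {  MPI_Recv(NULL, 0, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD, &st);
         return;                                /* no work: terminate */
      }
      MPI_Get_count(&st, MPI_INT, &count);
      int *data = malloc(count * sizeof(int));
      MPI_Recv(data, count, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD, &st);
      qsort(data, count, sizeof(int), cmpInt);  /* sort the received block */
      MPI_Send(data, count, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
      free(data);
   }
}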