Title: Problem Solving Strategies
1. Problem Solving Strategies
- Partitioning
  - Divide the problem into disjoint parts
  - Compute each part separately
- Divide and Conquer
  - Divide phase: recursively create sub-problems of the same type
  - Base case reached: execute an algorithm
  - Conquer phase: merge the results as the recursion unwinds
  - Traditional example: Merge Sort (sketched after this list)
- Where is the work?
  - Partitioning: creating the disjoint parts of the problem
  - Divide and Conquer: merging the separate results
  - Traditional example: Quick Sort
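The two phases are easiest to see in code. A minimal merge sort sketch in C; the helper names and scratch-array approach are illustrative, not from the slides:

#include <string.h>

/* Conquer phase: merge two sorted halves of a[lo..hi-1] using scratch space. */
static void merge(int a[], int tmp[], int lo, int mid, int hi)
{  int i = lo, j = mid, k = lo;
   while (k < hi)
      tmp[k++] = (j >= hi || (i < mid && a[i] <= a[j])) ? a[i++] : a[j++];
   memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(int));
}

void mergeSort(int a[], int tmp[], int lo, int hi)
{  if (hi - lo < 2) return;          /* base case: one item is sorted  */
   int mid = lo + (hi - lo) / 2;
   mergeSort(a, tmp, lo, mid);       /* divide phase: two sub-problems */
   mergeSort(a, tmp, mid, hi);
   merge(a, tmp, lo, mid, hi);       /* conquer phase: combine results */
}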
2. Parallel Sorting Considerations
- Distributed memory
  - Precision differences across distributed systems can cause unpredictable results
  - Traditional algorithms can require excessive communication
  - Modified algorithms minimize communication requirements
  - Typically, data is scattered to the P processors
- Shared memory
  - Critical sections and mutual-exclusion locks can inhibit performance
  - Modified algorithms eliminate the need for locks
  - Each processor can sort N/P data points, or the processors can work in parallel in a more fine-grained manner (no need for processor communication)
3. Two Related Sorts

#include <string.h>
#define LEN 32                /* assumed maximum string length */

void bubble(char x[][LEN], int N)
{  int sorted = 0, i, size = N-1;
   char temp[LEN];
   while (!sorted)
   {  sorted = 1;
      for (i = 0; i < size; i++)
      {  if (strcmp(x[i], x[i+1]) > 0)
         {  strcpy(temp, x[i]);
            strcpy(x[i], x[i+1]);
            strcpy(x[i+1], temp);
            sorted = 0;
         }
      }
      size--;
   }
}

void oddEven(char x[][LEN], int N)
{  int even = 0, sorted = 0, i, size = N-1;
   char temp[LEN];
   while (!sorted)
   {  sorted = 1;
      for (i = even; i < size; i += 2)
      {  if (strcmp(x[i], x[i+1]) > 0)
         {  strcpy(temp, x[i]);
            strcpy(x[i], x[i+1]);
            strcpy(x[i+1], temp);
            sorted = 0;
         }
      }
      even = 1 - even;
   }
}

- Sequential version: Odd-Even has no advantages
- Parallel version: processors can work independently without data conflicts
4. Bubble, Odd-Even Example
Bubble: smaller values move left one spot per pass; the largest value moves immediately to the end, so the loop size can shrink by one each pass. Odd-Even: large values move only one position per pass, and the loop size cannot shrink; however, all interchanges can occur in parallel.
5. One Parallel Iteration
- With barriers (shared memory)
  - Odd processors: mergeLow(pr data, pr-1 data); Barrier; if (r < P-2) mergeHigh(pr data, pr+1 data); Barrier
  - Even processors: mergeHigh(pr data, pr+1 data); Barrier; if (r > 1) mergeLow(pr data, pr-1 data); Barrier
- With message passing
  - Odd processors: sendRecv(pr data, pr-1 data); mergeHigh(pr data, pr-1 data); if (r < P-2) { sendRecv(pr data, pr+1 data); mergeLow(pr data, pr+1 data); }
  - Even processors: sendRecv(pr data, pr+1 data); mergeLow(pr data, pr+1 data); if (r > 1) { sendRecv(pr data, pr-1 data); mergeHigh(pr data, pr-1 data); }
Notation: r = processor rank, P = number of processors, pr data = the block of data belonging to processor r
Note: P/2 iterations are necessary to complete the sort
6. A Distributed Memory Implementation
- Scatter the data among the available processors
- Locally sort N/P items on each processor
- Even passes
  - Even processors, p < P-1, exchange data with processor p+1
  - Processors p and p+1 perform a partial merge, where p extracts the lower half and p+1 extracts the upper half
- Odd passes
  - Even processors, p >= 2, exchange data with processor p-1
  - Processors p and p-1 perform a partial merge, where p extracts the upper half and p-1 extracts the lower half
- Exchanging data: MPI_Sendrecv (a sketch follows below)
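A minimal sketch of one even-pass/odd-pass iteration with MPI_Sendrecv, assuming each rank holds n sorted ints; the helper names (keepLow, keepHigh, exchange) are illustrative:

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Keep the n smallest of two sorted n-item arrays in 'mine'. */
static void keepLow(int *mine, const int *other, int n)
{  int *tmp = malloc(n * sizeof(int));
   for (int a = 0, b = 0, c = 0; c < n; c++)
      tmp[c] = (b >= n || (a < n && mine[a] <= other[b])) ? mine[a++] : other[b++];
   memcpy(mine, tmp, n * sizeof(int));
   free(tmp);
}

/* Keep the n largest of two sorted n-item arrays in 'mine'. */
static void keepHigh(int *mine, const int *other, int n)
{  int *tmp = malloc(n * sizeof(int));
   for (int a = n-1, b = n-1, c = n-1; c >= 0; c--)
      tmp[c] = (b < 0 || (a >= 0 && mine[a] >= other[b])) ? mine[a--] : other[b--];
   memcpy(mine, tmp, n * sizeof(int));
   free(tmp);
}

/* Both partners swap whole blocks, then each keeps the half it owns. */
static void exchange(int *mine, int *recv, int n, int partner, int low)
{  MPI_Sendrecv(mine, n, MPI_INT, partner, 0,
                recv, n, MPI_INT, partner, 0,
                MPI_COMM_WORLD, MPI_STATUS_IGNORE);
   if (low) keepLow(mine, recv, n); else keepHigh(mine, recv, n);
}

/* P/2 iterations, each an even pass followed by an odd pass. */
void oddEvenTransposition(int *mine, int n, int rank, int P)
{  int *recv = malloc(n * sizeof(int));
   for (int iter = 0; iter < P/2; iter++)
   {  if (rank % 2 == 0) { if (rank < P-1) exchange(mine, recv, n, rank+1, 1); }
      else                                 exchange(mine, recv, n, rank-1, 0);
      if (rank % 2 == 0) { if (rank >= 2)  exchange(mine, recv, n, rank-1, 0); }
      else               { if (rank < P-1) exchange(mine, recv, n, rank+1, 1); }
   }
   free(recv);
}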
7. Partial Merge: Lower Keys
Store the lower n keys from arrays a and b into array c:

void mergeLow(char a[][LEN], char b[][LEN], char c[][LEN], int n)
{  int countA = 0, countB = 0, countC = 0;
   while (countC < n)
   {  if (strcmp(a[countA], b[countB]) < 0)
         strcpy(c[countC++], a[countA++]);
      else
         strcpy(c[countC++], b[countB++]);
   }
}

- To merge the upper keys (see the sketch below)
  - Initialize the counts to n-1
  - Decrement the counts instead of incrementing
  - Change the countC < n test to countC >= 0
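Applying those three changes gives the upper-keys version; a minimal sketch, using the same assumed LEN string arrays as above:

/* Store the upper n keys from arrays a and b into array c. */
void mergeHigh(char a[][LEN], char b[][LEN], char c[][LEN], int n)
{  int countA = n-1, countB = n-1, countC = n-1;
   while (countC >= 0)
   {  if (strcmp(a[countA], b[countB]) > 0)
         strcpy(c[countC--], a[countA--]);
      else
         strcpy(c[countC--], b[countB--]);
   }
}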
8. Bitonic Sequence
- 10,12,14,20,95,90,60,40,35,23,18,0,3,5,8,9
- 3,5,8,9,10,12,14,20,95,90,60,40,35,23,18,0
A bitonic sequence increases and then decreases, where the end can wrap around (the first sequence above is a rotation of the second).
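One way to make the wrap-around definition concrete: read cyclically, a bitonic sequence changes direction at most twice (one rising run, one falling run). A small sketch; the function name is illustrative and distinct values are assumed:

/* Returns 1 if a[0..n-1] is bitonic, allowing wrap-around, by counting
   cyclic direction changes between consecutive steps. */
int isBitonic(const int a[], int n)
{  int changes = 0;
   for (int i = 0; i < n; i++)
   {  int d1 = a[(i+1) % n] - a[i];
      int d2 = a[(i+2) % n] - a[(i+1) % n];
      if ((d1 > 0) != (d2 > 0)) changes++;
   }
   return changes <= 2;
}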
9. BitonicSort
- Unsorted: 10,20,5,9,3,8,12,14,90,0,60,40,23,35,95,18
- Step 1: 10,20  9,5  3,8  14,12  0,90  60,40  23,35  95,18
- Step 2: 9,5 10,20  14,12 3,8  0,40 60,90  95,35 23,18
          5,9 10,20  14,12 8,3  0,40 60,90  95,35 23,18
- Step 3: 5,9,8,3 14,12,10,20  95,40,60,90 0,35,23,18
          5,3 8,9 10,12 14,20  95,90 60,40 23,35 0,18
          3,5 8,9 10,12 14,20  95,90 60,40 35,23 18,0
- Step 4: 3,5,8,9,10,12,14,0 95,90,60,40,35,23,18,20
          3,5,8,0 10,12,14,9  35,23,18,20 95,90,60,40
          3,0 8,5 10,9 14,12  18,20 35,23 60,40 95,90
- Sorted: 0,3,5,8,9,10,12,14,18,20,23,35,40,60,90,95
10. Bitonic Sorting Functions

void bitonicMerge(int lo, int n, int dir)
{  if (n > 1)
   {  int m = n/2;
      for (int i = lo; i < lo+m; i++)
         compareExchange(i, i+m, dir);
      bitonicMerge(lo, m, dir);
      bitonicMerge(lo+m, m, dir);
   }
}

void bitonicSort(int lo, int n, int dir)
{  if (n > 1)
   {  int m = n/2;
      bitonicSort(lo, m, UP);
      bitonicSort(lo+m, m, DOWN);
      bitonicMerge(lo, n, dir);
   }
}

- Notes (see the sketch below)
  - dir: 0 for DOWN, 1 for UP
  - compareExchange moves
    - the low value left if dir == UP
    - the high value left if dir == DOWN
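The slides leave compareExchange and the data array undefined. A minimal sketch filling those gaps (the global array a, N, and the UP/DOWN values are assumptions); combined with the two functions above it reproduces the trace on slide 9:

#include <stdio.h>

#define DOWN 0
#define UP   1
#define N    16

static int a[N] = {10,20,5,9,3,8,12,14,90,0,60,40,23,35,95,18};

/* Moves the low value left when dir == UP, the high value left when dir == DOWN. */
static void compareExchange(int i, int j, int dir)
{  if ((a[i] > a[j]) == dir)
   {  int t = a[i]; a[i] = a[j]; a[j] = t; }
}

/* ... bitonicMerge and bitonicSort from above go here ... */

int main(void)
{  bitonicSort(0, N, UP);
   for (int i = 0; i < N; i++) printf("%d ", a[i]);
   printf("\n");
   return 0;
}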
11. Bitonic Sort Partners/Direction

Algorithm steps:

level      1 |   2    2 |   3    3    3 |   4    4    4    4
j          0 |   0    1 |   0    1    2 |   0    1    2    3

rank  0:  1/L |  2/L  1/L |  4/L  2/L  1/L |  8/L  4/L  2/L  1/L
rank  1:  0/H |  3/L  0/H |  5/L  3/L  0/H |  9/L  5/L  3/L  0/H
rank  2:  3/H |  0/H  3/L |  6/L  0/H  3/L | 10/L  6/L  0/H  3/L
rank  3:  2/L |  1/H  2/H |  7/L  1/H  2/H | 11/L  7/L  1/H  2/H
rank  4:  5/L |  6/H  5/H |  0/H  6/L  5/L | 12/L  0/H  6/L  5/L
rank  5:  4/H |  7/H  4/L |  1/H  7/L  4/H | 13/L  1/H  7/L  4/H
rank  6:  7/H |  4/L  7/H |  2/H  4/H  7/L | 14/L  2/H  4/H  7/L
rank  7:  6/L |  5/L  6/L |  3/H  5/H  6/H | 15/L  3/H  5/H  6/H
rank  8:  9/L | 10/L  9/L | 12/H 10/H  9/H |  0/H 12/L 10/L  9/L
rank  9:  8/H | 11/L  8/H | 13/H 11/H  8/L |  1/H 13/L 11/L  8/H
rank 10: 11/H |  8/H 11/L | 14/H  8/L 11/H |  2/H 14/L  8/H 11/L
rank 11: 10/L |  9/H 10/H | 15/H  9/L 10/L |  3/H 15/L  9/H 10/H
rank 12: 13/L | 14/H 13/H |  8/L 14/H 13/H |  4/H  8/H 14/L 13/L
rank 13: 12/H | 15/H 12/L |  9/L 15/H 12/L |  5/H  9/H 15/L 12/H
rank 14: 15/H | 12/L 15/H | 10/L 12/L 15/H |  6/H 10/H 12/H 15/L
rank 15: 14/L | 13/L 14/L | 11/L 13/L 14/L |  7/H 11/H 13/H 14/H

partner   = rank ^ (1 << (level - j - 1))
direction = ((rank < partner) == ((rank & (1 << level)) == 0)) ? L : H
12. Java Partner/Direction Code

public static void main(String[] args)
{  int nproc = 16, partner, levels = (int)(Math.log(nproc)/Math.log(2));
   for (int rank = 0; rank < nproc; rank++)
   {  System.out.printf("rank %2d partners ", rank);
      for (int level = 1; level <= levels; level++)
      {  for (int j = 0; j < level; j++)
         {  partner = rank ^ (1 << (level-j-1));
            String dir = ((rank < partner) == ((rank & (1 << level)) == 0)) ? "L" : "H";
            System.out.printf("%3d/%s", partner, dir);
         }
         if (level < levels) System.out.print(", ");
      }
      System.out.println();
   }
}
13. Parallel Bitonic Pseudo-code
- IF master processor
  - Create or retrieve the data to sort
  - Scatter it among all processors (including the master)
- ELSE
  - Receive the portion to sort
- Sort local data using an algorithm of preference
- FOR (level = 1; level <= lg(P); level++)
  - FOR (j = 0; j < level; j++)
    - partner = rank ^ (1 << (level-j-1))
    - Exchange data with partner
    - IF ((rank < partner) == ((rank & (1 << level)) == 0))
      - Extract the low values from the local and received data (mergeLow)
    - ELSE extract the high values from the local and received data (mergeHigh)
- Gather the sorted data at the master (an MPI sketch follows below)
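A minimal MPI sketch of the pseudo-code's exchange loop, assuming each rank holds n sorted ints, P is a power of two, and the keepLow/keepHigh helpers from the odd-even sketch earlier are available; the function name is illustrative:

#include <mpi.h>
#include <stdlib.h>

/* Each of the lg(P) levels runs 'level' compare-split steps with a partner
   chosen by XOR; the direction test matches slides 11-12. */
void parallelBitonic(int *local, int n, int rank, int P)
{  int *recv = malloc(n * sizeof(int));
   int levels = 0;
   while ((1 << levels) < P) levels++;                /* lg(P) */
   for (int level = 1; level <= levels; level++)
      for (int j = 0; j < level; j++)
      {  int partner = rank ^ (1 << (level - j - 1));
         MPI_Sendrecv(local, n, MPI_INT, partner, 0,
                      recv,  n, MPI_INT, partner, 0,
                      MPI_COMM_WORLD, MPI_STATUS_IGNORE);
         if ((rank < partner) == ((rank & (1 << level)) == 0))
            keepLow(local, recv, n);   /* keep the n smallest */
         else
            keepHigh(local, recv, n);  /* keep the n largest  */
      }
   free(recv);
}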
14. Bucket Sort Partitioning
- Algorithm
  - Assign a range of values to each processor
  - Each processor sorts the values assigned to it
  - The resulting values are forwarded to the master
- Steps (see the sketch after this list)
  - Scatter N/P numbers to each processor
  - Each processor
    - Creates a smaller bucket of numbers designated for each processor
    - Sends the designated buckets to the various processors and receives the buckets it expects to receive
    - Sorts its section
    - Sends its data back to the processor with rank 0
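A minimal MPI sketch of the per-processor steps, assuming values uniform in [0, MAX) with P dividing MAX; the names and the qsort comparison helper are illustrative:

#include <mpi.h>
#include <stdlib.h>

#define MAX 1000                       /* assumed value range [0, MAX) */

static int cmpInt(const void *a, const void *b)
{  return (*(const int*)a > *(const int*)b) - (*(const int*)a < *(const int*)b); }

/* 'mine' holds this rank's N/P scattered values; returns this rank's sorted
   bucket, with its length in *outLen. */
int *bucketExchange(int *mine, int n, int P, int *outLen)
{  int *sendCnt = calloc(P, sizeof(int)), *recvCnt = malloc(P * sizeof(int));
   int *sendOff = malloc(P * sizeof(int)), *recvOff = malloc(P * sizeof(int));
   /* Count how many local values belong to each processor's range. */
   for (int i = 0; i < n; i++) sendCnt[mine[i] / (MAX / P)]++;
   sendOff[0] = 0;
   for (int p = 1; p < P; p++) sendOff[p] = sendOff[p-1] + sendCnt[p-1];
   /* Sorting by value also groups the values by destination bucket. */
   qsort(mine, n, sizeof(int), cmpInt);
   /* Trade counts, then trade the bucket contents themselves. */
   MPI_Alltoall(sendCnt, 1, MPI_INT, recvCnt, 1, MPI_INT, MPI_COMM_WORLD);
   recvOff[0] = 0;
   for (int p = 1; p < P; p++) recvOff[p] = recvOff[p-1] + recvCnt[p-1];
   *outLen = recvOff[P-1] + recvCnt[P-1];
   int *bucket = malloc(*outLen * sizeof(int));
   MPI_Alltoallv(mine, sendCnt, sendOff, MPI_INT,
                 bucket, recvCnt, recvOff, MPI_INT, MPI_COMM_WORLD);
   qsort(bucket, *outLen, sizeof(int), cmpInt);   /* sort this rank's section */
   free(sendCnt); free(recvCnt); free(sendOff); free(recvOff);
   return bucket;
}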
15. Bucket Sort Partitioning
[Figure: unsorted numbers are dropped into buckets P1..Pp; each bucket is sorted and the results are concatenated into the sorted output. The parallel version assigns one bucket per processor.]
- Sequential Bucket Sort (a sketch follows below)
  - Drop sections of the data to sort into buckets
  - Sort each bucket
  - Copy the sorted bucket data back into the primary array
  - Complexity: O(b * (n/b) lg(n/b)) = O(n lg(n/b)) for b buckets
- Parallel Bucket Sort notes
  - Bucket Sort works well for uniformly distributed data
  - Recursively finding medians from a data sample (Sample Sort) attempts to equalize bucket sizes
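A minimal sequential sketch matching those three steps, assuming n values uniform in [0, MAX) and b buckets with b dividing MAX; all names are illustrative:

#include <stdlib.h>

#define MAX 1000   /* assumed value range [0, MAX) */

static int cmpInt(const void *x, const void *y)
{  return (*(const int*)x > *(const int*)y) - (*(const int*)x < *(const int*)y); }

void bucketSort(int *a, int n, int b)
{  int *bucket = malloc(n * sizeof(int));      /* scratch for all buckets */
   int *count  = calloc(b, sizeof(int));
   int *offset = malloc(b * sizeof(int));
   int *next   = malloc(b * sizeof(int));
   for (int i = 0; i < n; i++) count[a[i] / (MAX / b)]++;
   offset[0] = 0;
   for (int k = 1; k < b; k++) offset[k] = offset[k-1] + count[k-1];
   for (int k = 0; k < b; k++) next[k] = offset[k];
   /* Drop each value into its bucket. */
   for (int i = 0; i < n; i++) bucket[next[a[i] / (MAX / b)]++] = a[i];
   /* Sort each bucket, then copy back into the primary array. */
   for (int k = 0; k < b; k++)
      qsort(bucket + offset[k], count[k], sizeof(int), cmpInt);
   for (int i = 0; i < n; i++) a[i] = bucket[i];
   free(bucket); free(count); free(offset); free(next);
}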
16. Rank (Enumeration) Sort
- For each number src[i], count the numbers smaller than it, plus its duplicates with a smaller index
- The count, x, is the final array position for src[i]

for (i = 0; i < N; i++)
{  x = 0;
   for (j = 0; j < N; j++)
      if (src[i] > src[j] || (src[i] == src[j] && j < i)) x++;
   dest[x] = src[i];
}

- Shared memory parallel implementation (see the sketch below)
  - Assign groups of numbers to each processor
  - Find the positions of N/P numbers in parallel
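A minimal shared-memory sketch; OpenMP is an assumption (the slides do not name a threading mechanism), and the pragma splits the outer loop into per-thread groups as the slide's scheme describes:

#include <omp.h>

/* Each thread computes final positions for its share of the N numbers.
   No locks are needed: ranks are unique, so every i writes a distinct dest[x]. */
void rankSort(const int *src, int *dest, int N)
{
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
   {  int x = 0;
      for (int j = 0; j < N; j++)
         if (src[i] > src[j] || (src[i] == src[j] && j < i)) x++;
      dest[x] = src[i];
   }
}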
17. Counting Sort
Works on primitive fixed-point types: int, char, long, etc.
- The master scatters the data among the processors
- In parallel, each processor counts the total occurrences of each of its N/P data points
- The processors perform a collective sum operation
- The processors perform an all-to-all collective prefix-sum operation
- In parallel, each processor stores its N/P data items in the appropriate positions of the output array
- The sorted data is gathered at the master processor
Note: this logic can be repeated to implement a radix sort (a sequential sketch follows below)
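For reference, a minimal sequential counting sort showing the count, prefix-sum, and store phases that the parallel version distributes; the value range RANGE is an assumption:

#define RANGE 256   /* assumed value range [0, RANGE) */

void countingSort(const int *src, int *dest, int N)
{  int count[RANGE] = {0};
   for (int i = 0; i < N; i++) count[src[i]]++;   /* count occurrences        */
   int pos = 0;                                   /* prefix sum: first output */
   for (int v = 0; v < RANGE; v++)                /* index for each value     */
   {  int c = count[v]; count[v] = pos; pos += c; }
   for (int i = 0; i < N; i++)                    /* stable store phase       */
      dest[count[src[i]]++] = src[i];
}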
18. Merge Sort
- Scatter N/P items to each processor
- Sort phase: each processor sorts its data with a method of choice
- Merge phase: data is routed and a merge is performed at each level (see the sketch below)

for (gap = 1; gap < P; gap *= 2)
{  if ((p/gap) % 2 != 0)
   {  Send data to p-gap; break;  }
   else
   {  Receive data from p+gap; merge with local data;  }
}
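A minimal MPI sketch of that merge loop, assuming each rank starts with n sorted ints in a heap-allocated block and P is a power of two; mergeArrays and mergePhase are illustrative names:

#include <mpi.h>
#include <stdlib.h>

/* Merge two sorted arrays a (na items) and b (nb items) into a new array. */
static int *mergeArrays(const int *a, int na, const int *b, int nb)
{  int *c = malloc((na + nb) * sizeof(int));
   int i = 0, j = 0, k = 0;
   while (k < na + nb)
      c[k++] = (j >= nb || (i < na && a[i] <= b[j])) ? a[i++] : b[j++];
   return c;
}

/* Route and merge up the tree; rank 0 ends up holding all N sorted items.
   At each gap, both partners hold the same count, n*gap. */
int *mergePhase(int *local, int n, int p, int P, int *outLen)
{  int len = n;
   for (int gap = 1; gap < P; gap *= 2)
   {  if ((p / gap) % 2 != 0)
      {  MPI_Send(local, len, MPI_INT, p - gap, 0, MPI_COMM_WORLD);
         break;                       /* this rank's part is passed along */
      }
      else
      {  int *recv = malloc(len * sizeof(int));
         MPI_Recv(recv, len, MPI_INT, p + gap, 0, MPI_COMM_WORLD,
                  MPI_STATUS_IGNORE);
         int *merged = mergeArrays(local, len, recv, len);
         free(local); free(recv);
         local = merged; len *= 2;
      }
   }
   *outLen = len;
   return local;
}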
19. Quick Sort
- Slave computers
  - Perform the quick sort algorithm
    - Base case: if the data length < threshold, send the data to the master (rank 0)
    - Recursive step: quick sort partitions the data
  - Request work from the master processor (see the sketch after this list)
    - If there is none, terminate
    - Otherwise receive data, sort it, and send it back to the master
- Master computer
  - Scatter N/P items to each processor
  - When a work request is received: send data to the slave, or a termination message
  - When sorted data is received: place the data correctly in the final data list
  - When all data is sorted: save the data and terminate
Note: Distributed work pools require load balancing. Processors maintain local work pools; when the local work queue falls below a threshold, processors request work from their neighbors.
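A sketch of the slave side of this protocol; the message tags are assumptions, and each received block is sorted whole with qsort rather than recursively partitioned, so it illustrates only the request/termination handshake:

#include <mpi.h>
#include <stdlib.h>

#define TAG_REQUEST 1   /* slave -> master: ask for work          */
#define TAG_WORK    2   /* master -> slave: block of data to sort */
#define TAG_RESULT  3   /* slave -> master: sorted block          */
#define TAG_DONE    4   /* master -> slave: no work remains       */

static int cmpInt(const void *a, const void *b)
{  return (*(const int*)a > *(const int*)b) - (*(const int*)a < *(const int*)b); }

/* Slave side: repeatedly ask the master for work until told to terminate. */
void slaveLoop(void)
{  for (;;)
   {  MPI_Status st;
      int count;
      MPI_Send(NULL, 0, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
      MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
      if (st.MPI_TAG == TAG_DONE)
      {  MPI_Recv(NULL, 0, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD, &st);
         return;                                /* no work: terminate */
      }
      MPI_Get_count(&st, MPI_INT, &count);
      int *data = malloc(count * sizeof(int));
      MPI_Recv(data, count, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD, &st);
      qsort(data, count, sizeof(int), cmpInt);  /* sort the received block */
      MPI_Send(data, count, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
      free(data);
   }
}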