Title: Introduction to parallel algorithms
1. Introduction to parallel algorithms
COT 5405 Fall 2006
- Ashok Srinivasan
- www.cs.fsu.edu/asriniva
- Florida State University
2. Outline
- Background
- Primitives
- Algorithms
- Important points
3. Background
- Terminology
- Time complexity
- Speedup
- Efficiency
- Scalability
- Communication cost model
4. Time complexity
- Parallel computation: a group of processors works together to solve a problem
- Time required for the computation is the period from when the first processor starts working until the last processor stops
5. Other terminology
- Speedup: S = T1/TP
- Efficiency: E = S/P
- Work: W = P TP
- Scalability
  - How does TP decrease as we increase P to solve the same problem?
  - How should the problem size increase with P, to keep E constant?
- Notation
  - P: number of processors
  - T1: time on one processor
  - TP: time on P processors
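These definitions can be checked directly; the sketch below uses made-up timings (T1 = 100, TP = 16 on P = 8 processors are illustrative values, not measurements from any machine):

```python
# Hypothetical example timings, used only to illustrate the definitions above.
T1 = 100.0   # time on one processor
P = 8        # number of processors
TP = 16.0    # time on P processors

S = T1 / TP  # speedup
E = S / P    # efficiency
W = P * TP   # work (total processor-time consumed)

print(S, E, W)  # 6.25 0.78125 128.0
```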
6. Communication cost model
- Processes spend some time doing useful work, and some time communicating
- Model communication cost as TC = ts + L tb
  - L = message size
- Independent of location of processes
- Any process can communicate with any other process
- A process can simultaneously send and receive one message
7. I/O model
- We will ignore I/O issues, for the most part
- We will assume that input and output are distributed across the processors in a manner of our choosing
- Example: Sorting
  - Input: x1, x2, ..., xn
    - Initially, xi is on processor i
  - Output: xp1, xp2, ..., xpn
    - xpi is on processor i
    - xpi ≤ xpi+1
8. Primitives
- Reduction
- Broadcast
- Gather/Scatter
- All gather
- Prefix
9. Reduction -- 1
[Figure: one processor receives x1, x2, ..., xn directly from all the others and computes x1 + x2 + ... + xn]
- Tn = (n - 1) + (n - 1)(ts + tb)
- Sn = 1/(1 + ts + tb)
10. Reduction -- 2
[Figure: Reduction-1 is run on x1, ..., xn/2 and, concurrently, on xn/2+1, ..., xn; the two partial results are then combined]
- Tn = (n/2 - 1) + (n/2 - 1)(ts + tb) + (ts + tb) + 1 = n/2 + (n/2)(ts + tb)
- Sn ≈ 2/(1 + ts + tb)
11. Reduction -- 3
[Figure: applying the same split again -- Reduction-1 on each quarter x1..xn/4, xn/4+1..xn/2, xn/2+1..x3n/4, x3n/4+1..xn, with the partial results combined pairwise]
- Apply Reduction-2 recursively
  - Divide and conquer
- Tn = log2 n + (ts + tb) log2 n
- Sn = (n / log2 n) × 1/(1 + ts + tb)
- Note that any associative operator can be used in place of +
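The divide-and-conquer structure of Reduction-3 can be sketched sequentially; on a real machine the two recursive calls run on disjoint halves of the processors. This is an illustrative simulation of the combining pattern, not a parallel implementation:

```python
def tree_reduce(xs, op):
    """Reduction-3 as divide and conquer: reduce each half (concurrently,
    on a real machine), then combine the two partial results once.
    The combine steps form a tree of depth about log2(n)."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    left = tree_reduce(xs[:mid], op)    # one half of the processors
    right = tree_reduce(xs[mid:], op)   # the other half, in parallel
    return op(left, right)

print(tree_reduce(list(range(1, 9)), lambda a, b: a + b))  # 36
print(tree_reduce([5, 3, 9, 1], max))                      # 9
```

Passing `max` in place of addition illustrates the note above: any associative operator works.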
12. Parallel addition features
- If n >> P
  - Each processor adds n/P distinct numbers
  - Perform parallel reduction on the P partial sums
- TP = n/P + (1 + ts + tb) log P
- Optimal P obtained by differentiating with respect to P
  - Popt = n/(1 + ts + tb)
  - If communication cost is high, then fewer processors ought to be used
- E = [1 + (1 + ts + tb) P log P / n]^(-1)
  - As problem size increases, efficiency increases
  - As number of processors increases, efficiency decreases
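A quick numerical check of the model above (the coefficients ts = 10, tb = 2 and size n = 100000 are made up): minimizing TP over integer P lands within a constant factor, the ln 2 that comes from the choice of log base, of the slide's Popt = n/(1 + ts + tb):

```python
import math

def T_par(P, n, ts, tb):
    # The model above: n/P local additions, then a tree reduction on P values
    return n / P + (1 + ts + tb) * math.log2(P)

n, ts, tb = 100_000, 10.0, 2.0
P_best = min(range(1, 20_000), key=lambda P: T_par(P, n, ts, tb))
P_model = n / (1 + ts + tb)   # the slide's P_opt, log-base constant dropped
print(P_best, round(P_model))
```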
13. Some common collective operations
14. Broadcast
[Figure: broadcast among P = 8 processors along a tree -- in each of the log P rounds, every processor that already has the data forwards it to one more processor]
- T = (ts + L tb) log P
- L = length of data
15. Gather/Scatter
[Figure: gather along a binomial tree over P processors -- message sizes double towards the root: L, L, ..., 2L, 2L, 4L]
- Note: Σi=0..(log P)-1 2^i = (2^(log P) - 1)/(2 - 1) = P - 1 ≈ P
- Gather: data move towards the root
- Scatter: review question
- T = ts log P + P L tb
16. All gather
[Figure: P = 8 processors, each holding one value xi; pairs of neighbors exchange their values with messages of size L]
- Equivalent to each processor broadcasting to all the processors
17. All gather
[Figure: each processor now holds a pair of values (x1..x2, x3..x4, x5..x6, x7..x8); processors at distance 2 exchange their pairs with messages of size 2L]
18. All gather
[Figure: each processor now holds half the values (x1..x4 or x5..x8); processors at distance 4 exchange these halves with messages of size 4L]
19. All gather
[Figure: every processor now holds all the values x1..x8]
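The exchange pattern in the four slides above is recursive doubling. A small simulation (pure Python, with processes modeled as list slots; `all_gather` is my name for the sketch, and P is assumed to be a power of two) shows the data doubling each round:

```python
def all_gather(items):
    """Recursive-doubling all-gather on P = 2^k simulated processes:
    in round r, process i swaps everything it holds with partner
    i XOR 2^r, so the data per process doubles (L, 2L, 4L, ...)."""
    P = len(items)
    have = [[x] for x in items]        # have[i] = data held by process i
    step = 1
    while step < P:
        # the lower-numbered partner's block goes first, preserving order
        have = [have[i] + have[i ^ step] if i & step == 0
                else have[i ^ step] + have[i]
                for i in range(P)]
        step *= 2
    return have

result = all_gather(['x1', 'x2', 'x3', 'x4'])
print(result[0])  # ['x1', 'x2', 'x3', 'x4'] -- and the same on every process
```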
20. Review question: Pipelining
- Useful when repeatedly and regularly performing a large number of primitive operations
- Optimal time for a broadcast: log P
  - But doing this n times, one broadcast after another, takes n log P time
- Pipelining the broadcasts takes n + P time
  - Almost constant amortized time per broadcast if n >> P
  - n + P << n log P when n >> P
- Review question: How can you accomplish this time complexity?
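Plugging illustrative numbers into the two cost expressions above (unit per-step cost; n = 10^6 broadcasts on P = 1024 processors are made-up sizes with n >> P) shows the gap:

```python
import math

n, P = 1_000_000, 1024
sequential = n * math.log2(P)   # n broadcasts, log P steps each
pipelined = n + P               # overlapped down the same tree
print(sequential, pipelined)    # 10000000.0 1001024
```

Here pipelining is about 10x cheaper, matching n + P << n log P when n >> P.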
21. Sequential prefix
- Input
  - Values xi, 1 ≤ i ≤ n
- Output
  - Xi = x1 ⊕ x2 ⊕ ... ⊕ xi, 1 ≤ i ≤ n
  - ⊕ is an associative operator
- Algorithm
  - X1 = x1
  - for i = 2 to n
    - Xi = Xi-1 ⊕ xi
22. Parallel prefix
- Input
  - Processor i has xi
- Output
  - Processor i has Xi = x1 ⊕ x2 ⊕ ... ⊕ xi
- Define f(a,b) as follows (each Pi keeps a prefix Xi and a block total Yi)
  - if a == b
    - Xi = xi on Proc Pi
    - Yi = xi on Proc Pi
  - else
    - compute in parallel
      - f(a, (a+b)/2)
      - f((a+b)/2 + 1, b)
    - Pi and Pj send Yi and Yj to each other, where a ≤ i ≤ (a+b)/2 and j = i + (b - a + 1)/2
      - Yi = Yi ⊕ Yj on Pi
      - Xj = Yi ⊕ Xj on Pj
      - Yj = Yi ⊕ Yj on Pj
- Divide and conquer
  - f(a,b) yields the following, for a ≤ i ≤ b
    - Xi = xa ⊕ ... ⊕ xi on Proc Pi
    - Yi = xa ⊕ ... ⊕ xb on Proc Pi
  - f(1,n) solves the problem
- T(n) = T(n/2) + 2(ts + tb) => T(n) = O(log n)
- An iterative implementation improves the constant
23. Iterative parallel prefix example
[Figure: 8 values x0, ..., x7; in round 1 adjacent values combine into spans of 2 (x0..1, x1..2, ..., x6..7); in round 2 spans 2 apart combine (x0..2, x0..3, x1..4, ..., x4..7); round 3 completes the prefixes x0..4, ..., x0..7]
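The rounds of this example can be sketched as an iterative prefix in which each `while` iteration corresponds to one parallel round of the figure (a simulation; on a real machine every element of the comprehension updates concurrently):

```python
def iterative_prefix(xs, op=lambda a, b: a + b):
    """Iterative inclusive prefix, mirroring the example above: in the
    round with distance d, element i (for i >= d) combines with element
    i - d; d doubles each round, so there are log2(n) rounds."""
    X = list(xs)
    d = 1
    while d < len(X):
        X = [X[i] if i < d else op(X[i - d], X[i]) for i in range(len(X))]
        d *= 2
    return X

print(iterative_prefix([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 3, 6, 10, 15, 21, 28, 36]
```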
24. Algorithms
- Linear recurrence
- Matrix-vector multiplication
25. Linear recurrence
- Determine each xi, 2 ≤ i ≤ n
  - xi = ai xi-1 + bi xi-2
  - x0 and x1 are given
- Sequential solution
  - for i = 2 to n
    - xi = ai xi-1 + bi xi-2
  - Follows directly from the recurrence
  - This approach is not easily parallelized
26. Linear recurrence in parallel
- Given xi = ai xi-1 + bi xi-2
  - x2i = a2i x2i-1 + b2i x2i-2
  - x2i+1 = a2i+1 x2i + b2i+1 x2i-1
- Rewrite this in matrix form:

    [ x2i   ]   [ b2i          a2i               ] [ x2i-2 ]
    [ x2i+1 ] = [ a2i+1 b2i    b2i+1 + a2i+1 a2i ] [ x2i-1 ]

        Xi    =               Ai                     Xi-1

- Xi = Ai Ai-1 ... A1 X0
- This is a parallel prefix computation, since matrix multiplication is associative
- Solved in O(log n) time
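As a sanity check of the matrix form, the sketch below plugs made-up coefficients (the values of a2, a3, b2, b3, x0, x1 are arbitrary illustrations) into both the recurrence and A1, and confirms they agree:

```python
# The recurrence x_k = a_k x_{k-1} + b_k x_{k-2}, with illustrative numbers.
a = {2: 3.0, 3: -1.0}
b = {2: 2.0, 3: 4.0}
x0, x1 = 1.0, 2.0

# Direct recurrence
x2 = a[2] * x1 + b[2] * x0          # 3*2 + 2*1 = 8
x3 = a[3] * x2 + b[3] * x1          # -1*8 + 4*2 = 0

# Matrix form for i = 1: [x2, x3]^T = A_1 [x0, x1]^T
A = [[b[2],        a[2]],
     [a[3] * b[2], b[3] + a[3] * a[2]]]
mx2 = A[0][0] * x0 + A[0][1] * x1
mx3 = A[1][0] * x0 + A[1][1] * x1
print(mx2 == x2 and mx3 == x3)      # True
```

Chaining these products, Xi = Ai Ai-1 ... A1 X0, is then an ordinary prefix computation over 2x2 matrices.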
27. Matrix-vector multiplication
- c = A b
- Often performed repeatedly
  - bi = A bi-1
- We need the same data distribution for c and b
- One-dimensional decomposition
  - Example: row-wise block striped for A
    - b and c replicated
  - Each process computes its components of c independently
  - Then all-gather the components of c
28. 1-D matrix-vector multiplication
[Figure: A distributed row-wise across the processes; b and c replicated on every process]
- Each process computes its components of c independently
  - Time: Θ(n²/P)
- Then all-gather the components of c
  - Time: ts log P + tb n
- Note: P ≤ n
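The 1-D scheme can be simulated in a few lines; the loop over p stands in for P independent processes, and the final concatenation plays the role of the all-gather (the helper name `matvec_1d` is mine, not from the slides, and P is assumed to divide n):

```python
def matvec_1d(A, b, P):
    """Row-wise block-striped c = A b: process p owns n/P consecutive
    rows of A and a replicated copy of b, computes its slice of c,
    and the slices are then concatenated (the all-gather step)."""
    n = len(A)
    rows = n // P                       # assume P divides n
    slices = []
    for p in range(P):                  # each iteration = one process's work
        my_rows = A[p * rows:(p + 1) * rows]
        slices.append([sum(r[j] * b[j] for j in range(n)) for r in my_rows])
    return [c for s in slices for c in s]   # concatenation = all-gather

print(matvec_1d([[1, 2], [3, 4]], [1, 1], 2))  # [3, 7]
```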
29. 2-D matrix-vector multiplication
[Figure: 4x4 process grid -- process Pij holds block Aij; vector block Bi starts with Pi0, and result block Ci ends up with Pi0]
- Processes Pi0 send Bi to P0i
  - Time: ts + tb n/√P
- Processes P0j broadcast Bj to all Pij
  - Time: ts log √P + tb n log √P / √P
- Processes Pij compute Cij = Aij Bj
  - Time: Θ(n²/P)
- Processes Pij reduce Cij onto Pi0, 0 ≤ i < √P
  - Time: ts log √P + tb n log √P / √P
- Total time: Θ(n²/P + ts log P + tb n log P / √P)
- P ≤ n²
- More scalable than the 1-dimensional decomposition
30. Important points
- Efficiency
  - Increases with increase in problem size
  - Decreases with increase in number of processors
- Aggregation of tasks to increase granularity
  - Reduces communication overhead
- Data distribution
  - 2-dimensional may be more scalable than 1-dimensional
  - Has an effect on load balance too
- General techniques
  - Divide and conquer
  - Pipelining