Title: ECE 1747H : Parallel Programming
1 ECE 1747H Parallel Programming
2 ECE 1747H
- Meeting time: Mon 4-6 PM
- Instructor: Cristiana Amza
- http://www.eecg.toronto.edu/amza
- amza_at_eecg.toronto.edu, office Pratt 484E
3 Material
- Course notes
- Web material (e.g., published papers)
- No required textbook, some recommended
4 Prerequisites
- Programming in C or C++
- Data structures
- Basics of machine architecture
- Basics of network programming
- Please send e-mail to ecehelp_at_ece.toronto.edu to get an eecg account!! (name, student ID, class, instructor)
- Send e-mail to madalin_at_cs.toronto.edu to get a cluster account (this is on our research cluster, for the purpose of the homework).
5 Other than that
- No written homeworks, no exams
- 10% for each small programming assignment (expect 1)
- 10% class participation
- Rest comes from major course project
6 Programming Project
- Parallelizing a sequential program, or improving the performance or functionality of a parallel program
- Project proposal and final report
- In-class project proposal and final report presentation
- Sample project presentation can be posted
7 Parallelism (1 of 2)
- Ability to execute different parts of a single program concurrently on different machines
- Goal: shorter running time
- Grain of parallelism: how big are the parts?
- Can be an instruction, statement, procedure, ...
- Will mainly focus on relatively coarse grain
8 Parallelism (2 of 2)
- Coarse-grain parallelism is mainly applicable to long-running, scientific programs
- Examples: weather prediction, prime number factorization, simulations, ...
9 Lecture material (1 of 4)
- Parallelism
- What is parallelism?
- What can be parallelized?
- Inhibitors of parallelism: dependences
10 Lecture material (2 of 4)
- Standard models of parallelism
- shared memory (Pthreads)
- message passing (MPI)
- shared memory data parallelism (OpenMP)
- Classes of applications
- scientific
- servers
11 Lecture material (3 of 4)
- Transaction processing
- classic programming model for databases
- now being proposed for scientific programs
12 Lecture material (4 of 4)
- Performance of parallel and distributed programs
- architecture-independent optimization
- architecture-dependent optimization
13 Course Organization
- First 2-3 weeks of the semester:
- lectures on parallelism, patterns, models
- small programming assignment, done individually or in teams of up to 3
- Rest of the semester:
- major programming project, done individually or in a small group
- Research paper discussions
14 Parallel vs. Distributed Programming
- Parallel programming has matured
- Few standard programming models
- Few common machine architectures
- Portability between models and architectures
15 Bottom Line
- Programmer can now focus on the program and use a suitable programming model
- Reasonable hope of portability
- Problem: much performance optimization is still platform-dependent
- Performance portability is a problem
16 ECE 1747H Parallel Programming
- Lecture 1-2: Parallelism, Dependences
17 Parallelism
- Ability to execute different parts of a program concurrently on different machines
- Goal: shorten execution time
18 Measures of Performance
- To computer scientists: speedup, execution time.
- To applications people: size of problem, accuracy of solution, etc.
19 Speedup of Algorithm
- Speedup of an algorithm = sequential execution time / execution time on p processors (with the same data set).
(figure: speedup plotted as a function of the number of processors p)
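- Illustrative example (numbers assumed, not from the slides): if the sequential run takes 100 s and the run on 8 processors takes 25 s on the same data set, the speedup is 100 / 25 = 4.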
20 Speedup on Problem
- Speedup on a problem = sequential execution time of the best known sequential algorithm / execution time on p processors.
- A more honest measure of performance.
- Avoids picking an easily parallelizable algorithm with poor sequential execution time.
21 What Speedups Can You Get?
- Linear speedup
- Confusing term: implicitly means a 1-to-1 speedup per processor.
- (Almost always) as good as you can do.
- Sub-linear speedup: more normal, due to overhead of startup, synchronization, communication, etc.
22 Speedup
(figure: actual vs. linear speedup as a function of the number of processors p)
23 Scalability
- No really precise definition.
- Roughly speaking, a program is said to scale to a certain number of processors p if going from p-1 to p processors results in some acceptable improvement in speedup (for instance, an increase of 0.5).
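- Illustrative example (numbers assumed, not from the slides): if the speedup is 5.0 on 7 processors and 5.6 on 8 processors, the gain of 0.6 exceeds an acceptance threshold of 0.5, so the program still scales at p = 8.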
24 Super-linear Speedup?
- Due to cache/memory effects:
- Subparts fit into the cache/memory of each node.
- Whole problem does not fit in the cache/memory of a single node.
- Nondeterminism in search problems:
- One thread finds a near-optimal solution very quickly => leads to drastic pruning of the search space.
25 Cardinal Performance Rule
- Don't leave (too) much of your code sequential!
26 Amdahl's Law
- If 1/s of the program is sequential, then you can never get a speedup better than s.
- (Normalized) sequential execution time = 1/s + (1 - 1/s) = 1
- Best parallel execution time on p processors = 1/s + (1 - 1/s) / p
- When p goes to infinity, parallel execution time -> 1/s
- Speedup <= s.
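- Illustrative example (numbers assumed, not from the slides): with 1/s = 0.1 (10% of the program sequential, so s = 10), the best parallel time on p processors is 0.1 + 0.9/p; as p grows this approaches 0.1, so the speedup can never exceed 1/0.1 = 10.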
27 Why keep something sequential?
- Some parts of the program are not parallelizable (because of dependences)
- Some parts may be parallelizable, but the overhead dwarfs the increased speedup.
28 When can two statements execute in parallel?
- On one processor:
- statement1
- statement2
- On two processors:
- processor1 executes statement1, while processor2 executes statement2
29 Fundamental Assumption
- Processors execute independently: no control over order of execution between processors
30 When can 2 statements execute in parallel?
- Possibility 1: processor1 executes statement1 before processor2 executes statement2
- Possibility 2: processor2 executes statement2 before processor1 executes statement1
31 When can 2 statements execute in parallel?
- Their order of execution must not matter!
- In other words,
- statement1; statement2;
- must be equivalent to
- statement2; statement1;
32 Example 1
- a = 1;
- b = a;
- Statements cannot be executed in parallel.
- Program modifications may make it possible (sketched below).
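One possible modification, shown here only as a sketch of the kind of transformation the slide alludes to (forward substitution of the known value; not necessarily the transformation intended):

    /* Original: true dependence, b must wait for a. */
    a = 1;
    b = a;

    /* After substituting the known value, the two statements
       no longer share data and could execute in parallel.     */
    a = 1;
    b = 1;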
33 Example 2
- a = f(x);
- b = a;
- May not be wise to change the program (sequential execution would take longer).
34 Example 3
- a = 1;
- a = 2;
- Statements cannot be executed in parallel.
35 True dependence
- Statements S1, S2
- S2 has a true dependence on S1
- iff
- S2 reads a value written by S1
36 Anti-dependence
- Statements S1, S2.
- S2 has an anti-dependence on S1
- iff
- S2 writes a value read by S1.
37 Output Dependence
- Statements S1, S2.
- S2 has an output dependence on S1
- iff
- S2 writes a variable written by S1.
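As a concrete illustration of the three definitions above (a minimal sketch; the variables x and y are made up for this example):

    /* True (flow) dependence: S2 reads a value written by S1. */
    x = 1;          /* S1 writes x */
    y = x + 1;      /* S2 reads x  */

    /* Anti-dependence: S2 writes a value read by S1. */
    y = x + 1;      /* S1 reads x  */
    x = 2;          /* S2 writes x */

    /* Output dependence: S2 writes a variable written by S1. */
    x = 1;          /* S1 writes x */
    x = 2;          /* S2 writes x */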
38 When can 2 statements execute in parallel?
- S1 and S2 can execute in parallel
- iff
- there are no dependences between S1 and S2
- true dependences
- anti-dependences
- output dependences
- Some dependences can be removed.
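The last bullet notes that some dependences can be removed. A minimal sketch of the usual idea, under the assumption that the dependence is only a naming conflict (anti- or output dependence): introduce a fresh variable so the two statements no longer touch the same storage.

    /* Anti-dependence: S2 overwrites x, which S1 still needs to read. */
    y = x + 1;      /* S1 reads x  */
    x = 2;          /* S2 writes x */

    /* After renaming, the statements are independent
       (later uses of x must be updated to read x2).   */
    y = x + 1;      /* S1 reads the old x         */
    x2 = 2;         /* S2 writes a fresh variable */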
39 Example 4
- Most parallelism occurs in loops.
- for (i = 0; i < 100; i++)
-   a[i] = i;
- No dependences.
- Iterations can be executed in parallel.
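A minimal sketch of how such a dependence-free loop could be parallelized with OpenMP (one of the shared-memory models covered later); the function name init and the array declaration are assumptions of this sketch:

    int a[100];

    void init(void)
    {
        int i;
        /* Each iteration writes a distinct a[i] and reads nothing
           written by other iterations, so the iterations may run
           in parallel.                                             */
        #pragma omp parallel for
        for (i = 0; i < 100; i++)
            a[i] = i;
    }

With an OpenMP-capable compiler (e.g., gcc -fopenmp), the iterations are divided among the available threads.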
40 Example 5
- for (i = 0; i < 100; i++) {
-   a[i] = i;
-   b[i] = 2 * i;
- }
- Iterations and statements can be executed in parallel.
41 Example 6
- for (i = 0; i < 100; i++) a[i] = i;
- for (i = 0; i < 100; i++) b[i] = 2 * i;
- Iterations and loops can be executed in parallel.
42 Example 7
- for (i = 0; i < 100; i++)
-   a[i] = a[i] + 100;
- There is a dependence of each a[i] on itself!
- Loop is still parallelizable.
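A minimal sketch of parallelizing this loop by hand with Pthreads (the other shared-memory model in the course); the block partitioning, thread count, and helper names are assumptions of this sketch:

    #include <pthread.h>

    #define N        100
    #define NTHREADS 4

    int a[N];

    /* Each thread updates a disjoint block of a[]; the only dependence
       is of each a[i] on itself, within a single iteration.            */
    void *worker(void *arg)
    {
        long id    = (long)arg;
        int  chunk = N / NTHREADS;
        int  start = id * chunk;
        int  end   = (id == NTHREADS - 1) ? N : start + chunk;

        for (int i = start; i < end; i++)
            a[i] = a[i] + 100;
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (long id = 0; id < NTHREADS; id++)
            pthread_create(&t[id], NULL, worker, (void *)id);
        for (int id = 0; id < NTHREADS; id++)
            pthread_join(t[id], NULL);
        return 0;
    }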
43 Example 8
- for (i = 0; i < 100; i++)
-   a[i] = f(a[i-1]);
- Dependence between a[i] and a[i-1].
- Loop iterations are not parallelizable.
44 Loop-carried dependence
- A loop-carried dependence is a dependence that is present only if the statements are part of the execution of a loop.
- Otherwise, we call it a loop-independent dependence.
- Loop-carried dependences prevent loop iteration parallelization.
45 Example 9
- for (i = 0; i < 100; i++)
-   for (j = 0; j < 100; j++)
-     a[i][j] = f(a[i][j-1]);
- Loop-independent dependence on i.
- Loop-carried dependence on j.
- Outer loop can be parallelized, inner loop cannot.
46 Example 10
- for (j = 0; j < 100; j++)
-   for (i = 0; i < 100; i++)
-     a[i][j] = f(a[i][j-1]);
- Inner loop can be parallelized, outer loop cannot.
- Less desirable situation.
- Loop interchange is sometimes possible (sketched below).
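A sketch of the loop interchange mentioned above, applied to the loop nest of Example 10 (here the j loop starts at 1 so that a[i][j-1] stays in bounds, and f is assumed to have no side effects, which is what makes the interchange legal):

    /* Before: the parallelizable i loop is innermost. */
    for (j = 1; j < 100; j++)
        for (i = 0; i < 100; i++)
            a[i][j] = f(a[i][j-1]);

    /* After interchange: the i loop carries no dependence and is now
       outermost, so each value of i can go to a different processor. */
    for (i = 0; i < 100; i++)
        for (j = 1; j < 100; j++)
            a[i][j] = f(a[i][j-1]);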
47 Level of loop-carried dependence
- Is the nesting depth of the loop that carries the dependence.
- Indicates which loops can be parallelized.
48 Be careful: Example 11
- printf(a);
- printf(b);
- Statements have a hidden output dependence due to the output stream.
49 Be careful: Example 12
- a = f(x);
- b = g(x);
- Statements could have a hidden dependence if f and g update the same variable.
- Also depends on what f and g can do to x.
50 Be careful: Example 13
- for (i = 0; i < 100; i++)
-   a[i+10] = f(a[i]);
- Dependence between a[10], a[20], ...
- Dependence between a[11], a[21], ...
- ...
- Some parallel execution is possible (sketched below).
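One way to see the partial parallelism (a sketch; the chain decomposition is an illustration, not code from the slides): iterations whose indices differ by a multiple of 10 form one dependence chain, and the ten chains are independent of each other.

    /* a[i+10] written at iteration i is read at iteration i+10, so
       iterations r, r+10, r+20, ... form one sequential chain.      */
    for (r = 0; r < 10; r++)            /* the 10 chains are independent
                                           and could run in parallel     */
        for (i = r; i < 100; i += 10)   /* within a chain: sequential    */
            a[i + 10] = f(a[i]);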
51 Be careful: Example 14
- for (i = 1; i < 100; i++) {
-   a[i] = ...;
-   ... = a[i-1];
- }
- Dependence between a[i] and a[i-1]
- Complete parallel execution impossible
- Pipelined parallel execution possible
52 Be careful: Example 15
- for (i = 0; i < 100; i++)
-   a[i] = f(a[indexa[i]]);
- Cannot tell for sure.
- Parallelization depends on user knowledge of the values in indexa[].
- User can tell, compiler cannot.
53 Optimizations: Example 16
- for (i = 0; i < 100000; i++)
-   a[i % 1000] = a[i] + 1;
- Cannot be parallelized as is.
- May be parallelized by applying certain code transformations.
54 An aside
- Parallelizing compilers analyze program dependences to decide on parallelization.
- In parallelization by hand, the user does the same analysis.
- Compiler: more convenient and more correct.
- User: more powerful, can analyze more patterns.
55 To remember
- Statement order must not matter.
- Statements must not have dependences.
- Some dependences can be removed.
- Some dependences may not be obvious.