1
EECE 571L: Parallel Programming
Reconfigurable Computing
  • Lecture 1
  • 2009-01-07

UBC EECE571L Prof. Guy Lemieux
2
Goal
  • Flynn's Taxonomy
  • Types of Parallelism
  • Limits: Dependence
  • Limits: Amdahl

3
Classes of Parallel Architecture
  • Flynn's Taxonomy
    • SISD
    • SIMD
    • MISD
    • MIMD
  • New one?
    • SPMD

4
SISD
  • Single Instruction, Single Data

Source: El-Rewini and Abd-El-Barr, Advanced
Computer Architecture and Parallel Processing
5
SIMD
  • Single Instruction, Multiple Data

Source: El-Rewini and Abd-El-Barr, Advanced
Computer Architecture and Parallel Processing
6
MISD
  • Multiple Instruction, Single Data
  • Exists in concept, not implemented

7
MIMD
  • Multiple Instruction, Multiple Data

Source: El-Rewini and Abd-El-Barr, Advanced
Computer Architecture and Parallel Processing
8
MIMD
  • Can be either tightly or loosely coupled
  • Tightly coupled
    • Shared address space
    • Symmetric Multiprocessors (SMPs)
    • Uniform Memory Access (UMA)
    • Nonuniform Memory Access (NUMA)
    • E.g. Intel Quad Core
  • Loosely coupled
    • Disjoint address space
    • Distributed SISDs
    • Message passing
    • E.g. network clusters

9
SPMD
  • Single Program, Multiple Data
  • A special case of MIMD
  • MIMD: compute nodes can run completely different
    programs
    • 3D physics on node 1
    • graphics rendering on node 2
  • SPMD: compute nodes run identical programs
    • Free-running, out-of-sync programs
    • At any point in time, each node may run a
      different instruction
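
A minimal sketch of the SPMD idea, assuming Python threads stand in for compute nodes. The names `program`, `run_spmd`, `rank`, and `size` are hypothetical and not from the lecture. Every worker runs the same function; only its rank, and therefore its share of the data, differs.

```python
from concurrent.futures import ThreadPoolExecutor


def program(rank, size, data):
    """The single program that every node runs.

    rank selects which elements this node owns: rank, rank+size, ...
    """
    return sum(data[rank::size])


def run_spmd(data, size):
    """Launch `size` copies of the same program, one per (simulated) node."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        futures = [pool.submit(program, rank, size, data) for rank in range(size)]
        # The copies run unsynchronized; we only wait for them at the end.
        return [f.result() for f in futures]
```

Calling `run_spmd(list(range(10)), 2)` gives one partial sum per node; adding the partial sums gives the sum of the whole list.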

10
Parallelism Levels
  • Task
  • Thread
  • Data
  • Loop
  • Instruction
  • Bit

11
Task-level Parallelism
  • Function level of a program
  • Example
    • Given data A and B, find func1(A,B) and
      func2(A,B)
    • Two tasks
    • Assume two processors available
      • func1(A,B) on CPU1
      • func2(A,B) on CPU2
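
A minimal sketch of the two-task example, assuming a two-worker thread pool stands in for CPU1 and CPU2. The bodies of `func1` and `func2` are placeholders chosen for illustration; the lecture does not say what they compute.

```python
from concurrent.futures import ThreadPoolExecutor


def func1(a, b):
    """Placeholder task 1 (assumption): add the inputs."""
    return a + b


def func2(a, b):
    """Placeholder task 2 (assumption): multiply the inputs."""
    return a * b


def run_tasks(a, b):
    """Run the two independent tasks on two workers and collect both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future1 = pool.submit(func1, a, b)  # "CPU1"
        future2 = pool.submit(func2, a, b)  # "CPU2"
        return future1.result(), future2.result()
```

Neither task reads the other's result, which is what allows them to be scheduled at the same time. In standard CPython, threads share one interpreter lock, so this shows the structure of task parallelism rather than a real speedup.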

12
Thread-level Parallelism
  • Similar to task-level, but finer grained
  • Threads can be independent, or cooperate to
    achieve a larger goal

13
Data-level Parallelism
  • Distribution of data among processors
  • Example
    • Given an array of n elements,
      multiply each element by 2
    • Divide n by the number of processors p
    • Each processor performs the multiplication on
      n/p elements
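
A minimal sketch of this example, assuming p worker threads stand in for p processors. The helper names `split`, `double_chunk`, and `parallel_double` are hypothetical. When p does not divide n evenly, the first few chunks get one extra element.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, p):
    """Split data into p contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), p)
    chunks, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def double_chunk(chunk):
    """The work done by one processor: multiply its elements by 2."""
    return [2 * x for x in chunk]


def parallel_double(data, p):
    """Distribute the array over p workers and reassemble the results in order."""
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = list(pool.map(double_chunk, split(data, p)))
    return [y for part in parts for y in part]
```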

14
Loop-level Parallelism
  • Exploit concurrency in loops
  • Possible examples
    • For-loop to calculate the dot product of arrays
      A and B
      • Is this really data parallelism?
    • Overlap loop iteration i with iteration i+1 by
      starting the next iteration as early as possible
      (but no earlier than any loop-carried dependence
      allows)
      • Is this pipeline parallelism?
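
A minimal sketch of the dot-product example. The function names are hypothetical, and the "workers" are simulated sequentially so the code stays simple. The multiplications in different iterations are independent, but the running total is a loop-carried dependence; giving each worker its own partial sum removes it until the final combine.

```python
def dot_sequential(a, b):
    """Plain loop: every iteration reads and writes `total`."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]  # loop-carried dependence on total
    return total


def dot_partial_sums(a, b, p):
    """Worker `rank` handles iterations rank, rank+p, rank+2p, ...

    Each worker accumulates into its own partial sum, so the workers
    do not depend on one another until the final reduction.
    """
    partials = [
        sum(a[i] * b[i] for i in range(rank, len(a), p))
        for rank in range(p)
    ]
    return sum(partials)
```

With integer inputs both versions return the same value. With floating-point inputs the results can differ slightly, because the additions happen in a different order.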

15
Instruction-level Parallelism
  • Machine instruction level
  • Identify independent instructions within an
    instruction window
    • Superscalar: done at run time by the CPU
    • VLIW: done at compile time
    • Dynamic optimizations by the run-time software
      system are also possible (e.g., JIT)
  • Example
    • ADD R1, R2, R3
    • LOAD R4, R2

16
Bit-level Parallelism
  • Example
    • 16-bit addition
      • Two instructions on an 8-bit ALU
      • One instruction on a 16-bit ALU
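
A minimal sketch of why the 8-bit ALU needs two instructions, assuming unsigned 16-bit operands and the usual add followed by add-with-carry sequence. The function name is hypothetical.

```python
def add16_on_8bit_alu(x, y):
    """Add two unsigned 16-bit values using two 8-bit additions.

    The result wraps around at 16 bits, as a 16-bit register would.
    """
    # Instruction 1: add the low bytes; bit 8 of the sum is the carry out.
    low = (x & 0xFF) + (y & 0xFF)
    carry = low >> 8
    # Instruction 2: add the high bytes plus the carry from instruction 1.
    high = ((x >> 8) & 0xFF) + ((y >> 8) & 0xFF) + carry
    return ((high & 0xFF) << 8) | (low & 0xFF)
```

The second addition cannot start until the carry from the first is known, so the two 8-bit steps are serialized. A 16-bit ALU handles all sixteen bit positions in one instruction.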

17
Dependence
  • Does the result of the current instruction depend
    on the previous result?
    • Yes: previous result must be computed first
    • No: instructions can be computed in parallel

18
Types of Dependencies
  • RAR
  • RAW
  • WAR
  • WAW
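
A minimal sketch of how the four cases can be told apart, assuming each instruction is described only by the set of registers it reads and the set it writes. The names `instr` and `classify` are hypothetical. The first letter of each label refers to the later instruction, so RAW means the later instruction reads what the earlier one wrote.

```python
def instr(writes, reads):
    """Describe an instruction by the registers it writes and reads."""
    return {"writes": set(writes), "reads": set(reads)}


def classify(first, second):
    """List the dependences of `second` on `first` (program order)."""
    kinds = []
    if first["writes"] & second["reads"]:
        kinds.append("RAW")  # true dependence
    if first["reads"] & second["writes"]:
        kinds.append("WAR")  # anti-dependence
    if first["writes"] & second["writes"]:
        kinds.append("WAW")  # output dependence
    if not kinds and first["reads"] & second["reads"]:
        kinds.append("RAR")  # shared input only; no ordering needed
    return kinds
```

For example, `classify(instr(["R2"], ["R1"]), instr(["R3"], ["R2"]))` reports a RAW dependence, because the second instruction reads the R2 that the first one writes.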

19
RAR: no dependence
  • Read after Read
  • No dependency
  • Example
    • R2 <- R1 + 1
    • R3 <- R1 + 2

20
RAW: true dependence
  • Read after Write
  • Producer/consumer relationship
  • Example
    • R2 <- R1 + 1
    • R3 <- R2 + 2

21
WAR: false dependence
  • Write after Read
  • Aka anti-dependence
  • Example
    • R2 <- R1 + 1
    • R1 <- R3 + 2
  • Can these be avoided?

22
Avoid WAR: false dependence
  • Avoid by allocating new storage
    • Register renaming
    • Separate memory locations
  • Example
    • R2 <- R1 + 1
    • R1' <- R3 + 2

23
WAW: output dependence
  • Write after Write
  • What happens if you reorder the output going to a
    printer?
  • Example
    • R2 <- R1 + 1
    • R2 <- R3 + 2

24
Avoid WAW: output dependence
  • Avoid by optimizing away earlier computation?
  • Avoid by allocating new storage?
    • Register renaming
    • Separate memory locations
  • Example
    • R2 <- R1 + 1
    • R2' <- R3 + 2
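
A minimal sketch of register renaming, assuming an unlimited supply of fresh registers. The names P0, P1, ... and the `rename` function are hypothetical; real hardware uses a finite physical register file. Every write gets a new destination, so WAR and WAW conflicts on register names disappear, while RAW (true) dependences are preserved through the mapping.

```python
def rename(program):
    """Rename destinations so that no register name is written twice.

    program: list of (dest, sources) pairs in program order.
    Returns the same instructions with fresh destination names.
    """
    mapping = {}  # architectural register -> most recent fresh name
    renamed = []
    for count, (dest, sources) in enumerate(program):
        # Sources read the latest renamed version, keeping RAW dependences.
        new_sources = [mapping.get(src, src) for src in sources]
        new_dest = f"P{count}"
        mapping[dest] = new_dest
        renamed.append((new_dest, new_sources))
    return renamed
```

For the WAW case, `rename([("R2", ["R1"]), ("R2", ["R3"])])` gives the two writes different destinations (P0 and P1), so they can complete in either order. Any later reader of R2 is mapped to P1, the most recent version.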

25
The Ultimate Speed Limit
  • beep, beep!

26
Amdahl's Law
  • Question
    • If you improve part of the system, how much
      faster does the entire system run?

Gene Amdahl: famous computer systems architect
at IBM in the 60s and 70s.
  • Amdahl's Law gives us the speed limit!
  • Given enhancement E, define
    • Speedup(E) = PerformanceAfter(E) /
      PerformanceBefore(E)
    • = ExecutionTimeBefore(E) /
      ExecutionTimeAfter(E)

27
Amdahl's Law
  • More detail
  • Enhancement E
    • results in a speedup of S
    • to only some fraction F of the program
  • ExecutionTimeAfter(E) = ((1-F) + F/S) ×
    ExecutionTimeBefore(E)
    • (derivation on next slide)
  • Usually expressed as a speedup
    • Speedup(E)
      = ExecutionTimeBefore(E) / ExecutionTimeAfter(E)
      = 1 / ((1-F) + F/S)

28
Amdahl's Law: Derivation
  • Before E
    • (1-F) portion untouched
    • ExecutionTimeBefore(E) = (1-F) + F = 1
      (normalized)
  • F portion improved by S times, to F/S
    • ExecutionTimeAfter(E) = (1-F) + F/S
  • Therefore
    • Speedup(E)
      = Before/After = 1 / ((1-F) + F/S)
  • Lesson: when speeding up a computer system,
    work on the part with the biggest F

29
Amdahl's Law: Speed Limits!
F is the portion of the program that can be sped up.
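
A minimal sketch of the speed limit, using the formula Speedup = 1 / ((1-F) + F/S). The function names are hypothetical. As S grows without bound the F/S term vanishes, so the speedup can never exceed 1 / (1-F), no matter how good the enhancement is.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the run time is sped up s times."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


def amdahl_limit(f):
    """Upper bound on the overall speedup as s grows without bound."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)
```

For example, with F = 0.5 and S = 2 the overall speedup is 1 / 0.75, about 1.33. With F = 0.9 the limit is about 10, even for an arbitrarily large S.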
30
Amdahl's Law: Summary
  • Amdahl's Law
    • Designer's Mantra: make the common case fast
    • Applies to all engineering optimizations!!!
  • Corollary
    • Rare cases don't matter
  • Student's Corollary
    • On a test, do the easy stuff for the most marks
      first

31
Amdahl's Law: Rebuttal?
  • Does Amdahl always win?
  • Gustafson's Law
    • As the number of processors increases, you can
      scale the problem size
    • As the problem size grows, ideally the sequential
      fraction will shrink
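
A minimal sketch of Gustafson's scaled speedup. The slide does not give a formula; this assumes the common formulation in which F is the parallel fraction of the run time measured on the N-processor system, so the scaled speedup is (1-F) + F×N. The function name is hypothetical.

```python
def gustafson_speedup(f, n):
    """Scaled speedup on n processors.

    f is the fraction of the n-processor run time spent in parallel work
    (an assumption about how F is measured; see the lead-in).
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1.0 - f) + f * n
```

Unlike the fixed-size case, this grows linearly with n: the problem is assumed to grow with the machine, so the sequential part takes up a smaller share of the total work.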

32
Summary
  • Concurrency: try to identify independent elements
    that can be performed in parallel
  • Only parallelize the common case, and make sure
    it is frequent enough to matter
