1. INTRODUCTION TO PARALLEL COMPUTING

Instructor: Stefan Dobrev
E-mail: sdobrev_at_site.uottawa.ca
Office: SITE 5043
Office hours: Thursday 14:00-16:00
Course page: http://www.site.uottawa.ca/sdobrev/CSI4140
Lectures: Wednesday 17:30-20:30, STE J0106
2. Course objective

- You are expected to:
  - learn basic concepts of parallel computing
  - understand various approaches to parallel hardware architectures and their strong/weak points
  - become familiar with typical software/programming approaches
  - learn basic parallel algorithms and algorithmic techniques
  - learn the jargon, so you understand what people are talking about
  - be able to apply this knowledge
3. Course objective (cont.)

- Familiarity with Parallel Concepts and Techniques
  - drastically flattening the learning curve in a parallel environment
- Broad Understanding of Parallel Architectures and Programming Techniques
  - be able to quickly adapt to any parallel processor/programming environment

Flexibility
4. Topics Covered

- Introduction: What and Why
- Parallel Architectures, Performance Criteria and Models
- Basic Parallelization Concepts and Techniques
- Communication and Synchronization
- Load Balancing, Scheduling and Data Partitioning
- Basic Parallelization Techniques and Paradigms
- Basic Programming Tools: PVM, MPI, OpenMP
- Parallel Algorithms and Applications
  - Searching, Sorting
  - Matrix Algorithms
  - Graph Algorithms
5. Topic objectives

- Parallel Architectures
  - to understand the weak and strong points of each architecture
  - for a given parallel computer, to understand which kinds of problems it is well suited for and which parallelization approach is best for that computer
  - for a given problem, to understand which type of parallel computer is the best/most efficient platform for it
- Parallelization Concepts and Techniques
  - to be able to understand and design parallel programs
- Programming Tools
  - to be able to quickly write real parallel programs
- Parallel Algorithms and Applications
  - to gain deeper knowledge of designing efficient parallel programs
6. Textbook and Reading

Textbook: Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Barry Wilkinson and Michael Allen. Prentice Hall, 1999. ISBN 0-13-671710-1.

Further recommended reading: Introduction to Parallel Computing: Design and Analysis of Algorithms. Vipin Kumar, Ananth Grama, Anshul Gupta, George Karypis. Benjamin/Cummings, 1994. ISBN 0-8053-3170-0.

Other sources are listed on the course web page.
7. Grading (tentative)

- 20%: 3 assignments (A)
- 20%: midterm (M)
- 15%: group project (P)
- 45%: final exam (E)
- You have to get at least 50% on M+E (i.e., 32.5 of the 65 points) for A+P to count.
- So:
  - if M+E >= 32.5, then
    - final mark = A+M+P+E
  - else
    - final mark = 100/65 * (M+E)
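
A worked example with hypothetical marks, added for illustration: a student with A = 15, P = 10 and M+E = 40 clears the 32.5 threshold, so the final mark is 15 + 10 + 40 = 65. A student with M+E = 30 falls below it; A and P are discarded and the final mark is 100/65 * 30, approximately 46.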
8. Assignments (tentative)

- Assignment 1 (6 points)
  - posted January 20, due January 27
  - some questions about architecture/basic concepts
  - a simple programming problem to get familiar with MPI
- Assignment 2 (7 points)
  - posted February 13, due February 26
  - more involved programming, including load balancing
- Assignment 3 (7 points)
  - posted March 13, due March 24
  - some programming; parallel algorithm design and analysis in pseudocode (a variant of some of the algorithms presented in the lecture); possibly discussion/justification of the design choices
9. Midterm and Final Exam (tentative)

- Midterm (20 points)
  - about mid-February
  - all topics covered up to that moment
  - a multiple-choice part; pseudocode design and analysis; design-choice justification
- Final Exam (45 points)
  - all topics, with emphasis on post-midterm material
  - multiple choice; pseudocode design and analysis; design justification
10. Project (tentative)

- Project (15 points)
  - in groups of 3
  - written part plus a 25-minute in-class presentation (end of semester)
  - several possible types:
    - report on interesting architectures/hardware/TOP500 news
    - report on abstract models (PRAM, BSP)
    - report on interesting non-covered algorithms/topics
    - a not-so-trivial programming project
    - report on cluster issues (network technologies, operating system, single system image, system management, parallel I/O)
  - one-page progress notice: beginning of March
  - written report: end of March
11. Introduction

- What is parallel computing?
  - using several processors/execution units in parallel to collectively solve a problem
  - the processors are contributing to the solution of the same problem
- What is not parallel computing?
  - threads time-sharing on one processor
  - a WWW server farm (distributed processing)
12. Why do we need powerful computers?

- To solve much bigger problems much faster!
- Performance, performance, performance
  - there are problems that can use any amount of computing (e.g., simulation)
- Capability
  - to solve previously unsolvable problems
  - too-large data sizes, real-time constraints
- Capacity
  - to handle a lot of processing much faster
13. Simulation: the Third Pillar of Science

Traditional scientific and engineering approach:
1) Do theory or paper design (often too complex)
2) Build systems and perform experiments

Limitations:
- Too difficult: build large wind tunnels
- Too expensive: build a throwaway passenger jet
- Too slow: wait for climate or galactic evolution
- Too dangerous: weapons, drug design, climate experimentation

Computational science paradigm:
3) Use high performance computer systems to simulate the phenomenon
14. Examples of Challenging Computations

Science:
- Global climate modeling
- Astrophysical modeling
- Biology: genome analysis, protein folding

Engineering:
- Earthquake and structural modeling
- Crash simulation
- Semiconductor design

Defense:
- Nuclear weapons testing by simulation
- Cryptography

Business:
- Financial and economic modeling
15. Detailed Example: Climate Modeling

Problem: compute f(latitude, longitude, elevation, time) -> (temperature, pressure, humidity, wind velocity)

Approach:
- Discretize the domain, e.g., a measurement point every kilometer.
- Devise an algorithm to predict the weather at time t+1 given the data for time t.

Basic step:
- modeling fluid flow in the atmosphere (the Navier-Stokes problem)
- approx. 100 flops per grid point
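
To make the cost structure concrete, here is a minimal sketch of one such timestep (not from the lecture: the grid dimensions, the Cell state and step_cell() are hypothetical placeholders, and the neighbour coupling a real fluid solver needs is omitted). The point is that every timestep visits every grid point at a roughly fixed flop cost, and the per-point updates are independent, which is what makes the problem parallelizable by partitioning the grid among processors.

    #include <stddef.h>

    /* Hypothetical grid dimensions and per-point state. */
    #define NX 180
    #define NY 90
    #define NZ 10

    typedef struct { double temp, pressure, humidity, wind; } Cell;

    static Cell now_t[NX][NY][NZ];   /* state at time t   */
    static Cell next_t[NX][NY][NZ];  /* state at time t+1 */

    /* Placeholder for the real physics: the discretized fluid-flow
       equations would spend roughly 100 flops per grid point here. */
    static Cell step_cell(Cell c) {
        c.temp += 0.0;
        return c;
    }

    /* One timestep: each point is computed from the time-t state only,
       so the iterations are independent and the loop nest can be
       split among processors (domain decomposition). */
    void timestep(void) {
        for (size_t i = 0; i < NX; i++)
            for (size_t j = 0; j < NY; j++)
                for (size_t k = 0; k < NZ; k++)
                    next_t[i][j][k] = step_cell(now_t[i][j][k]);
    }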
16. Flop/s: a Measure of Performance...!

- Some views of flop/s as a performance measure.
- The results of an unscientific survey of harassed delegates at a recent computer conference...
  - "Floating point? Yeah, that's in football. A defensive play. A quick guy goes deep on his own, looking for some receiver who thinks he's open, but then the floater spoils his day."
  - "That's easy. We had it in physics. Floating point is a narrow range in liquids, just below the boiling point."
  - "I didn't read the book, and it was late when they showed the movie on TV. I fell asleep. Sharon Stone bores me, anyway."
17. Detailed Example: Climate Modeling (cont.)

Computational requirements (with a 1-minute timestep):
- to match real time, need 5x10^11 flops in 60 seconds: 8 Gflop/s
- weather forecasting (7 days in 24 hours): 56 Gflop/s
- climate modeling (50 years in 30 days): 4.8 Tflop/s
- to use in policy negotiations (50 years in 12 hours): 288 Tflop/s

To double the grid resolution, the computation grows at least 8x; even more, as the time step should be reduced as well.

The fastest current supercomputer: 35 Tflop/s (NEC Earth Simulator)
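
As a sanity check, all four rates follow from one simulated minute costing about 5x10^11 flops. The derivation below is mine, added for clarity (the slide states only the results); counting 360-day years reproduces the slide's figures exactly:

    \begin{aligned}
    \text{real time:}           &\quad 5\times10^{11}\ \text{flops} / 60\ \mathrm{s} \approx 8\ \mathrm{Gflop/s} \\
    \text{7 days in 24 h:}      &\quad 7 \times 8\ \mathrm{Gflop/s} = 56\ \mathrm{Gflop/s} \\
    \text{50 years in 30 days:} &\quad \frac{50 \cdot 360}{30} \times 8\ \mathrm{Gflop/s} = 4.8\ \mathrm{Tflop/s} \\
    \text{50 years in 12 h:}    &\quad \frac{50 \cdot 360 \cdot 24}{12} \times 8\ \mathrm{Gflop/s} = 288\ \mathrm{Tflop/s}
    \end{aligned}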
18. Why are powerful computers parallel?

- Physical limits to the speed of a single processor:
  - speed of light (30 cm/nanosecond)
  - signal speed in copper wire (9 cm/nanosecond)
  - silicon technology (0.3 micron feature size at present)
- Diminishing returns in speeding up sequential processors:
  - it is increasingly more expensive to make a single processor faster.
- A parallel computer can have a lot of memory:
  - e.g., 1000 processors, each processor with 1 GB of memory
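
To see why these limits bite, a rough calculation (mine, added for illustration; the slide gives only the raw speeds): at a 3 GHz clock, one cycle lasts 1/3 ns, in which a signal in copper travels only 9 cm/ns * 1/3 ns = 3 cm. So all parts of a processor that must exchange signals every cycle have to fit within a few centimeters, and each further increase in clock rate shrinks that budget proportionally.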
19. A bit of historical perspective

- Parallel computing has been here since the early days of computing.
- Traditionally: custom HW, custom SW, high cost
- The doom of Moore's law:
  - custom HW has a hard time catching up with commodity processors
- Current trend: use commodity HW components, standardize SW
- Market size of High Performance Computing: about the market size for disposable diapers
- Parallel computing has never been mainstream.
  - Perhaps it never will be.
20. A bit of historical perspective (cont.)

- Parallelism is sneaking into commodity computers:
  - instruction-level parallelism: wide issue, pipelining, out-of-order (OOO) execution
  - data-level parallelism: SSE, 3DNow!, AltiVec
  - thread-level parallelism: Hyper-Threading in the Pentium 4
- Transistor budgets allow for multiple processor cores on a chip.
- Most applications would benefit from being parallelized and executed on a parallel computer.
  - even PC applications, especially the most demanding ones: games, multimedia
- Chicken & Egg Problem:
  - Why build parallel computers when the applications are sequential?
  - Why parallelize applications when there are no parallel commodity computers?
21. The beauty and challenge of parallel algorithms

- Problems that are trivial in a sequential setting can be quite interesting and challenging to parallelize.
- A very simple example: computing the sum of n numbers.
- How would you do it in parallel (see the sketch after this list)?
  - using n processors
  - using p processors
  - when communication is cheap
  - when communication is expensive
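
For the p-processor case, here is a minimal sketch using MPI, one of the course tools (the problem size n and the data are made up for illustration): each process sums a block of n/p numbers locally with no communication, then MPI_Reduce combines the p partial sums, internally a tree reduction of depth O(log p).

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        const long n = 1000000;   /* problem size, made up for illustration */
        int rank, p;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        /* Block partition: process `rank` owns indices [lo, hi). */
        long lo = (long)rank * n / p;
        long hi = (long)(rank + 1) * n / p;

        /* Local phase: O(n/p) additions, no communication. */
        double local = 0.0;
        for (long i = lo; i < hi; i++)
            local += (double)(i + 1);   /* stand-in for real data: sums 1..n */

        /* Global phase: tree reduction, O(log p) communication steps. */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of 1..%ld = %.0f\n", n, total);

        MPI_Finalize();
        return 0;
    }

With n processors (one number each) the local loop disappears and the O(log n) reduction dominates; when communication is expensive, the balance shifts toward fewer processors and larger local blocks.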