Title: Companion Slides for The Art of Multiprocessor Programming
1. Introduction
- Companion slides for
- The Art of Multiprocessor Programming
- by Maurice Herlihy & Nir Shavit
- Modified by Rajeev Alur
- for CIS 640 at Penn, Spring 2009
2. Moore's Law
[Graph: transistor count still rising, while clock speed is flattening sharply]
3. Still on some of your desktops: The Uniprocessor
[Diagram: a single CPU connected to memory]
4. In the Enterprise: The Shared-Memory Multiprocessor (SMP)
5. Your New Desktop: The Multicore Processor (CMP)
[Diagram: Sun T2000 Niagara. Multiple cores, each with its own cache, connected by a bus to shared memory, all on the same chip.]
6. Multicores Are Here
- "Intel ups ante with 4-core chip. New microprocessor, due this year, will be faster, use less electricity..." (San Francisco Chronicle)
- "AMD will launch a dual-core version of its Opteron server processor at an event in New York on April 21." (PC World)
- "Sun's Niagara will have eight cores, each core capable of running 4 threads in parallel, for 32 concurrently running threads." (The Inquirer)
7. Why do we care?
- Time no longer cures software bloat
- The free ride is over
- When you double your program's path length
- You can't just wait 6 months
- Your software must somehow exploit twice as much concurrency
8. Traditional Scaling Process
[Graph: the same user code runs 1.8x, 3.6x, then 7x faster on successive generations of uniprocessors, as clock speed rises over time with Moore's Law]
9. Multicore Scaling Process
[Graph: the same user code spread across progressively more cores, hoping for proportional speedup]
Unfortunately, not so simple
10. Real-World Scaling Process
[Graph: actual speedups of 1.8x, 2x, and 2.9x, well short of the core counts]
Parallelization and synchronization require great care
11. Multicore Programming Course Overview
- Fundamentals
- Models, algorithms, impossibility
- Real-world programming
- Architectures
- Techniques
- Topics not in textbook
- Memory models and system-level concurrency libraries
- High-level programming abstractions
12. A Zoo of Terms
- Concurrent
- Parallel
- Distributed
- Multicore
- What do they all mean? How do they differ?
13. Concurrent Computing
- Programs designed as a collection of interacting threads/processes
- Logical/programming abstraction
- May be implemented on a single processor by interleaving, on multiple processors, or on distributed computers
- Coordination/synchronization mechanisms in a model of concurrency may be realized in many ways in an implementation
14. Parallel Computing
- Computations that execute simultaneously to solve a common problem (more efficiently)
- Parallel algorithms: which problems can be sped up given multiple execution units?
- Parallelism can be at many levels (e.g. bit-level, instruction-level, data path)
- Grid computing: a branch of parallel computing where problems are solved on clusters of computers (interacting by message passing)
- Multicore computing: a branch of parallel computing focusing on multiple execution units on the same chip (interacting through shared memory)
15. Distributed Computing
- Involves multiple agents/programs (possibly with different computational tasks) and multiple computational resources (computers, multiprocessors, network)
- Many examples of contemporary software (e.g. web services) are distributed systems
- Their heterogeneous nature, and the range of time scales (web access vs. local access), make design/programming more challenging
16. Sequential Computation
[Diagram: a single thread operating on objects in memory]
17. Concurrent Computation
[Diagram: multiple threads operating on shared objects in memory]
18. Asynchrony
- Sudden unpredictable delays
- Cache misses (short)
- Page faults (long)
- Scheduling quantum used up (really long)
19. Model Summary
- Multiple threads
- Sometimes called processes
- Single shared memory
- Objects live in memory
- Unpredictable asynchronous delays
20. Road Map
- Textbook focuses on principles first, then practice
- Start with idealized models
- Look at simplistic problems
- Emphasize correctness over pragmatism
- Correctness may be theoretical, but incorrectness has practical impact
- In the course, interleaving of chapters from the two parts
21. Concurrency Jargon
- Hardware
- Processors
- Software
- Threads, processes
- Sometimes OK to confuse them, sometimes not.
22. Parallel Primality Testing
- Challenge
- Print primes from 1 to 10^10
- Given
- Ten-processor multiprocessor
- One thread per processor
- Goal
- Get ten-fold speedup (or close)
23. Load Balancing
[Diagram: the range 1..10^10 split at 10^9, 2·10^9, ..., one block per processor P0, P1, ..., P9]
- Split the work evenly
- Each thread tests a range of 10^9 numbers
24. Procedure for Thread i

  void primePrint() {
    int i = ThreadID.get();  // IDs in 0..9
    for (long j = i * 1000000000L + 1; j <= (i + 1) * 1000000000L; j++) {
      if (isPrime(j)) print(j);
    }
  }
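A minimal runnable sketch of this static split, assuming plain Java threads and a simple trial-division isPrime in place of the slide's ThreadID and print helpers (the class name, RANGE constant, and shrunken range size are illustrative, not from the slides):

  // Illustrative harness for the static range split.
  public class StaticSplit {
    static final long RANGE = 1000L;    // stand-in for the slide's 10^9 per thread
    static final int THREADS = 10;

    // Simple trial-division primality test (assumed, not from the slide)
    static boolean isPrime(long n) {
      if (n < 2) return false;
      for (long d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
      return true;
    }

    public static void main(String[] args) throws InterruptedException {
      Thread[] workers = new Thread[THREADS];
      for (int t = 0; t < THREADS; t++) {
        final int i = t;                // plays the role of ThreadID.get()
        workers[t] = new Thread(() -> {
          for (long j = i * RANGE + 1; j <= (i + 1) * RANGE; j++)
            if (isPrime(j)) System.out.println(j);
        });
        workers[t].start();
      }
      for (Thread w : workers) w.join(); // wait for all ranges to finish
    }
  }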
25-26. Issues
- Higher ranges have fewer primes
- Yet larger numbers are harder to test
- Thread workloads
- Uneven
- Hard to predict
- Need dynamic load balancing (the static split above is rejected)
27. Shared Counter
[Diagram: a shared counter handing out successive values 17, 18, 19; each thread takes a number]
28-29. Procedure for Thread i

  Counter counter = new Counter(1);   // shared counter object

  void primePrint() {
    long j = 0;
    while (j < 10000000000L) {        // 10^10
      j = counter.getAndIncrement();
      if (isPrime(j)) print(j);
    }
  }
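In real Java code the shared counter can be a java.util.concurrent.atomic.AtomicLong, whose getAndIncrement is atomic out of the box. A minimal sketch, with the limit shrunk so the demo finishes quickly and isPrime assumed as before:

  import java.util.concurrent.atomic.AtomicLong;

  // Sketch: dynamic load balancing via an atomic shared counter.
  public class SharedCounterPrimes {
    static final AtomicLong counter = new AtomicLong(1);
    static final long LIMIT = 10000L;  // stand-in for the slide's 10^10

    static boolean isPrime(long n) {
      if (n < 2) return false;
      for (long d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
      return true;
    }

    public static void main(String[] args) throws InterruptedException {
      Thread[] workers = new Thread[10];
      for (int t = 0; t < workers.length; t++) {
        workers[t] = new Thread(() -> {
          long j;
          // Each thread grabs the next untested number; fast threads
          // naturally take more work, balancing the load dynamically.
          while ((j = counter.getAndIncrement()) <= LIMIT)
            if (isPrime(j)) System.out.println(j);
        });
        workers[t].start();
      }
      for (Thread w : workers) w.join();
    }
  }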
30. Where Things Reside

  void primePrint() {
    int i = ThreadID.get();  // IDs in 0..9
    for (long j = i * 1000000000L + 1; j <= (i + 1) * 1000000000L; j++) {
      if (isPrime(j)) print(j);
    }
  }

[Diagram: the code and local variables are per-thread; the shared counter, holding 1, lives in shared memory]
31-32. Procedure for Thread i

  Counter counter = new Counter(1);

  void primePrint() {
    long j = 0;
    while (j < 10000000000L) {         // stop when every value is taken
      j = counter.getAndIncrement();   // increment and return each new value
      if (isPrime(j)) print(j);
    }
  }
33-34. Counter Implementation

  public class Counter {
    private long value;
    public long getAndIncrement() {
      return value++;
    }
  }

OK for a single thread, but not for concurrent threads
35-36. What It Means

  public class Counter {
    private long value;
    public long getAndIncrement() {
      return value++;
    }
  }

The expression value++ is shorthand for three separate steps:

  temp = value;
  value = temp + 1;
  return temp;
37. Not so good
An interleaving that loses an update: the shared value goes 1, 2, 3, then back to 2.

  Thread A: read 1 ...................................... write 2
  Thread B: ......... read 1, write 2, read 2, write 3

Thread A read 1 early, stalled, and its late "write 2" overwrites the 3 that Thread B produced.
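A small demo of this lost-update race, using nothing beyond standard Java (the class and method names are illustrative; the unsynchronized counter is the broken one from the slides):

  // Two threads each perform many unsynchronized increments;
  // lost updates make the final count fall short of the expected total.
  public class LostUpdateDemo {
    static long value = 1;   // the slide's broken counter state

    static long getAndIncrement() {
      long temp = value;     // read
      value = temp + 1;      // write (may clobber a concurrent increment)
      return temp;
    }

    public static void main(String[] args) throws InterruptedException {
      Runnable work = () -> {
        for (int k = 0; k < 1000000; k++) getAndIncrement();
      };
      Thread a = new Thread(work), b = new Thread(work);
      a.start(); b.start();
      a.join(); b.join();
      // 2000001 if no update were lost; typically prints something smaller.
      System.out.println("value = " + value);
    }
  }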
38. Is this problem inherent?
[Diagram: each thread's read glued to its following write, so no other operation can slip between them]
If we could only glue reads and writes together...
39-40. Challenge

  public class Counter {
    private long value;
    public long getAndIncrement() {
      long temp = value;
      value = temp + 1;
      return temp;
    }
  }

Make these steps atomic (indivisible)
41. Hardware Solution

  public class Counter {
    private long value;
    public long getAndIncrement() {
      long temp = value;
      value = temp + 1;
      return temp;
    }
  }

A ReadModifyWrite() instruction
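Modern hardware exposes such read-modify-write operations through primitives like compare-and-swap. A sketch of getAndIncrement built on Java's AtomicLong.compareAndSet, one way to realize the idea (not the slides' code; the class name is illustrative):

  import java.util.concurrent.atomic.AtomicLong;

  // Counter whose getAndIncrement is built from a hardware CAS loop.
  public class CasCounter {
    private final AtomicLong value;

    public CasCounter(long start) { value = new AtomicLong(start); }

    public long getAndIncrement() {
      while (true) {
        long temp = value.get();                  // read
        if (value.compareAndSet(temp, temp + 1))  // modify-write, atomically
          return temp;                            // success: no one interfered
        // otherwise another thread won the race; retry
      }
    }
  }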
42-44. An Aside: Java

  public class Counter {
    private long value;
    public long getAndIncrement() {
      synchronized (this) {   // synchronized block
        long temp = value;
        value = temp + 1;
        return temp;
      }
    }
  }

The synchronized block enforces mutual exclusion.
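A quick check that the synchronized version loses no updates, in the same shape as the earlier demo (illustrative harness, not from the slides; it assumes the synchronized Counter above, with value starting at 0):

  // With the synchronized Counter, the two threads' increments never collide.
  public class SynchronizedCounterDemo {
    public static void main(String[] args) throws InterruptedException {
      Counter counter = new Counter();
      Runnable work = () -> {
        for (int k = 0; k < 1000000; k++) counter.getAndIncrement();
      };
      Thread a = new Thread(work), b = new Thread(work);
      a.start(); b.start();
      a.join(); b.join();
      // All 2000000 increments took effect, so the next value is 2000000.
      System.out.println(counter.getAndIncrement());
    }
  }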
45. Why do we care?
- We want as much of the code as possible to execute concurrently (in parallel)
- A larger sequential part implies reduced performance
- Amdahl's Law: this relation is not linear
46-50. Amdahl's Law
Speedup of a computation given n CPUs instead of 1:

  Speedup = 1 / ((1 - p) + p/n)

where p is the parallel fraction, 1 - p the sequential fraction, and n the number of processors.
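A short derivation, normalizing the single-processor running time to 1 (standard reasoning, not spelled out on the slides):

  % Normalize the 1-processor execution time to 1. The sequential
  % part (1-p) is unchanged; the parallel part p is divided among
  % the n processors. Speedup is the ratio of the two times.
  T_1 = 1, \qquad
  T_n = (1 - p) + \frac{p}{n}, \qquad
  \text{Speedup} = \frac{T_1}{T_n} = \frac{1}{(1 - p) + \dfrac{p}{n}}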
51-52. Example
- Ten processors
- 60% concurrent, 40% sequential
- How close to 10-fold speedup?
- Speedup = 1 / (0.4 + 0.6/10) ≈ 2.17
53-54. Example
- Ten processors
- 80% concurrent, 20% sequential
- How close to 10-fold speedup?
- Speedup = 1 / (0.2 + 0.8/10) ≈ 3.57
55-56. Example
- Ten processors
- 90% concurrent, 10% sequential
- How close to 10-fold speedup?
- Speedup = 1 / (0.1 + 0.9/10) ≈ 5.26
57-58. Example
- Ten processors
- 99% concurrent, 1% sequential
- How close to 10-fold speedup?
- Speedup = 1 / (0.01 + 0.99/10) ≈ 9.17
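The whole table in a few lines of Java (illustrative; class and method names are mine):

  // Prints Amdahl's-law speedups for the four example workloads on 10 CPUs.
  public class Amdahl {
    static double speedup(double p, int n) {
      return 1.0 / ((1.0 - p) + p / n);
    }
    public static void main(String[] args) {
      for (double p : new double[] {0.60, 0.80, 0.90, 0.99})
        System.out.printf("p = %.2f: speedup = %.2f%n", p, speedup(p, 10));
    }
  }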
59. The Moral
- Making good use of our multiple processors
(cores) means - Finding ways to effectively parallelize our code
- Minimize sequential parts
- Reduce idle time in which threads wait
60. Multicore Programming
- This is what this course is about
- The fraction that is not easy to make concurrent may yet have a large impact on overall speedup