1
Introduction
  • Companion slides for
  • The Art of Multiprocessor Programming
  • by Maurice Herlihy & Nir Shavit
  • Modified by Rajeev Alur
  • for CIS 640 at Penn, Spring 2009

2
Moore's Law
Transistor count still rising
Clock speed flattening sharply
3
Still on some of your desktops: The Uniprocessor
[Diagram: a single CPU connected to memory]
4
In the Enterprise: The Shared-Memory Multiprocessor (SMP)
5
Your New Desktop: The Multicore Processor (CMP)
Sun T2000 Niagara
[Diagram: several cores on the same chip, each with its own cache, connected by a bus to shared memory]
6
Multicores Are Here
  • "Intel ups ante with 4-core chip. New
    microprocessor, due this year, will be faster,
    use less electricity..." (San Francisco Chronicle)
  • "AMD will launch a dual-core version of its
    Opteron server processor at an event in New York
    on April 21." (PC World)
  • "Sun's Niagara will have eight cores, each core
    capable of running 4 threads in parallel, for 32
    concurrently running threads." (The Inquirer)

7
Why do we care?
  • Time no longer cures software bloat
  • The free ride is over
  • When you double your program's path length
  • You can't just wait 6 months
  • Your software must somehow exploit twice as much
    concurrency

8
Traditional Scaling Process
[Chart: the same user code speeds up 1.8x, 3.6x, 7x over time on a traditional uniprocessor, courtesy of Moore's law]
9
Multicore Scaling Process
[Chart: the same user code moved onto a multicore]
Unfortunately, not so simple
10
Real-World Scaling Process
[Chart: in practice, user-code speedup on a multicore grows only 1.8x, 2x, 2.9x as cores are added]
Parallelization and synchronization require great care
11
Multicore Programming: Course Overview
  • Fundamentals
    • Models, algorithms, impossibility
  • Real-world programming
    • Architectures
    • Techniques
  • Topics not in textbook
    • Memory models and system-level concurrency
      libraries
    • High-level programming abstractions

12
A Zoo of Terms
  • Concurrent
  • Parallel
  • Distributed
  • Multicore
  • What do they all mean? How do they differ?

13
Concurrent Computing
  • Programs designed as a collection of interacting
    threads/processes
  • Logical/programming abstraction
  • May be implemented on single processor by
    interleaving or on multiple processors or on
    distributed computers
  • Coordination/synchronization mechanism in a model
    of concurrency may be realized in many ways in an
    implementation

14
Parallel Computing
  • Computations that execute simultaneously to solve
    a common problem (more efficiently)
  • Parallel algorithms: Which problems can have
    speed-up given multiple execution units?
  • Parallelism can be at many levels (e.g.
    bit-level, instruction-level, data path)
  • Grid computing: branch of parallel computing
    where problems are solved on clusters of
    computers (interacting by message passing)
  • Multicore computing: branch of parallel computing
    focusing on multiple execution units on the same
    chip (interacting through shared memory)

15
Distributed Computing
  • Involves multiple agents/programs (possibly with
    different computational tasks) with multiple
    computational resources (computers,
    multiprocessors, network)
  • Many examples of contemporary software (e.g. web
    services) are distributed systems
  • Their heterogeneous nature and wide range of time
    scales (web access vs. local access) make
    design and programming more challenging

16
Sequential Computation
[Diagram: a single thread operating on objects in memory]
17
Concurrent Computation
[Diagram: multiple threads operating on shared objects in memory]
18
Asynchrony
  • Sudden unpredictable delays
  • Cache misses (short)
  • Page faults (long)
  • Scheduling quantum used up (really long)

19
Model Summary
  • Multiple threads
  • Sometimes called processes
  • Single shared memory
  • Objects live in memory
  • Unpredictable asynchronous delays

20
Road Map
  • Textbook focuses on principles first, then
    practice
  • Start with idealized models
  • Look at simplistic problems
  • Emphasize correctness over pragmatism
  • Correctness may be theoretical, but
    incorrectness has practical impact
  • In this course, chapters from the two parts are
    interleaved

21
Concurrency Jargon
  • Hardware
  • Processors
  • Software
  • Threads, processes
  • Sometimes OK to confuse them, sometimes not.

22
Parallel Primality Testing
  • Challenge
  • Print primes from 1 to 10^10 (a sketch of the
    assumed isPrime test appears below)
  • Given
  • Ten-processor multiprocessor
  • One thread per processor
  • Goal
  • Get ten-fold speedup (or close)
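The slides never define isPrime; a minimal trial-division version in Java (my own sketch, not from the book) shows why testing a single large candidate gets slower as numbers grow, which matters for load balancing below:

static boolean isPrime(long n) {
  // Trial division: cost grows roughly with sqrt(n),
  // so larger candidates take longer to test.
  if (n < 2) return false;
  if (n % 2 == 0) return n == 2;
  for (long d = 3; d * d <= n; d += 2) {
    if (n % d == 0) return false;
  }
  return true;
}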

23
Load Balancing

[Diagram: the range 1..10^10 is split into ten equal blocks; P0 tests 1..10^9, P1 tests 10^9+1..2*10^9, ..., P9 tests the last block]
  • Split the work evenly
  • Each thread tests a range of 10^9 numbers

24
Procedure for Thread i
void primePrint() {
  int i = ThreadID.get();  // IDs in 0..9
  for (long j = i*10^9 + 1; j < (i+1)*10^9; j++) {
    if (isPrime(j)) print(j);
  }
}
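A self-contained Java version of this static split (my own sketch: the range is shrunk from 10^10 so it finishes quickly, and plain java.lang.Thread replaces the slides' ThreadID and print helpers):

public class StaticSplit {
  static final long RANGE = 1_000_000L;  // stands in for 10^10

  public static void main(String[] args) throws InterruptedException {
    Thread[] workers = new Thread[10];
    for (int id = 0; id < 10; id++) {
      final int i = id;
      workers[i] = new Thread(() -> {
        // Each thread tests a fixed block of RANGE/10 numbers.
        long block = RANGE / 10;
        for (long j = i * block + 1; j <= (i + 1) * block; j++) {
          if (isPrime(j)) System.out.println(j);
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) t.join();
  }

  static boolean isPrime(long n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (long d = 3; d * d <= n; d += 2)
      if (n % d == 0) return false;
    return true;
  }
}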
25
Issues
  • Higher ranges have fewer primes
  • Yet larger numbers harder to test
  • Thread workloads
  • Uneven
  • Hard to predict

26
Issues
  • Higher ranges have fewer primes
  • Yet larger numbers harder to test
  • Thread workloads
  • Uneven
  • Hard to predict
  • Need dynamic load balancing

[Stamp: the static-split approach is rejected]
27
Shared Counter
[Diagram: a shared counter hands out 17, 18, 19, ...; each thread takes the next number]
28
Procedure for Thread i
Counter counter = new Counter(1);

void primePrint() {
  long j = 0;
  while (j < 10^10) {
    j = counter.getAndIncrement();
    if (isPrime(j)) print(j);
  }
}
29
Procedure for Thread i
Counter counter = new Counter(1);

void primePrint() {
  long j = 0;
  while (j < 10^10) {
    j = counter.getAndIncrement();
    if (isPrime(j)) print(j);
  }
}
Shared counter object
30
Where Things Reside
void primePrint() {
  int i = ThreadID.get();  // IDs in 0..9
  for (long j = i*10^9 + 1; j < (i+1)*10^9; j++) {
    if (isPrime(j)) print(j);
  }
}
[Diagram: each thread's local variables and code are private to it; the shared counter (initial value 1) lives in shared memory]
31
Procedure for Thread i
Counter counter = new Counter(1);

void primePrint() {
  long j = 0;
  while (j < 10^10) {
    j = counter.getAndIncrement();
    if (isPrime(j)) print(j);
  }
}
Stop when every value taken
32
Procedure for Thread i
Counter counter = new Counter(1);

void primePrint() {
  long j = 0;
  while (j < 10^10) {
    j = counter.getAndIncrement();
    if (isPrime(j)) print(j);
  }
}
Increment & return each new value
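A runnable sketch of this dynamic scheme (my own illustration, using java.util.concurrent.atomic.AtomicLong as a stand-in for the Counter class the following slides develop):

import java.util.concurrent.atomic.AtomicLong;

public class SharedCounterSplit {
  static final long LIMIT = 1_000_000L;                 // stands in for 10^10
  static final AtomicLong counter = new AtomicLong(1);  // shared counter, starts at 1

  public static void main(String[] args) throws InterruptedException {
    Thread[] workers = new Thread[10];
    for (int i = 0; i < 10; i++) {
      workers[i] = new Thread(() -> {
        // Every thread repeatedly grabs the next untested number,
        // so faster threads automatically do more of the work.
        long j;
        while ((j = counter.getAndIncrement()) < LIMIT) {
          if (isPrime(j)) System.out.println(j);
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) t.join();
  }

  static boolean isPrime(long n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (long d = 3; d * d <= n; d += 2)
      if (n % d == 0) return false;
    return true;
  }
}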
33
Counter Implementation
public class Counter {
  private long value;
  public long getAndIncrement() {
    return value++;
  }
}
34
Counter Implementation
public class Counter {
  private long value;
  public long getAndIncrement() {
    return value++;
  }
}
OK for single thread, not for concurrent threads
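A small demonstration of why it fails (my own sketch, not from the slides): two threads each perform 100,000 getAndIncrement calls on the unsynchronized counter, and the final value typically falls short of 200,000 because updates are lost.

public class RaceDemo {
  static class Counter {
    private long value;
    long getAndIncrement() { return value++; }  // read-modify-write, not atomic
  }

  public static void main(String[] args) throws InterruptedException {
    Counter c = new Counter();
    Runnable work = () -> { for (int i = 0; i < 100_000; i++) c.getAndIncrement(); };
    Thread a = new Thread(work), b = new Thread(work);
    a.start(); b.start();
    a.join(); b.join();
    // Expected 200000; lost updates typically leave the total well short of that.
    System.out.println("final value: " + c.value);
  }
}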
35
What It Means
public class Counter {
  private long value;
  public long getAndIncrement() {
    return value++;
  }
}
36
What It Means
public class Counter {
  private long value;
  public long getAndIncrement() {
    return value++;
  }
}
value++ really means:
temp = value;
value = temp + 1;
return temp;
37
Not so good
[Trace: the shared value goes 1, 2, 3, then back to 2.
One thread reads 1 and writes 2, then reads 2 and writes 3.
Another thread also read 1 early and writes 2 later,
so the value ends at 2 even though three increments ran.]
38
Is this problem inherent?
[Diagram: each increment is a separate read followed by a write, and the two threads' reads and writes can interleave]
If we could only glue reads and writes together...
39
Challenge
public class Counter {
  private long value;
  public long getAndIncrement() {
    long temp = value;
    value = temp + 1;
    return temp;
  }
}

40
Challenge
public class Counter {
  private long value;
  public long getAndIncrement() {
    long temp = value;
    value = temp + 1;
    return temp;
  }
}

Make these steps atomic (indivisible)
41
Hardware Solution
public class Counter {
  private long value;
  public long getAndIncrement() {
    long temp = value;
    value = temp + 1;
    return temp;
  }
}

ReadModifyWrite() instruction
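In Java, such hardware read-modify-write support is exposed through java.util.concurrent.atomic; a minimal aside of my own, not from the slides:

import java.util.concurrent.atomic.AtomicLong;

public class AtomicCounter {
  private final AtomicLong value = new AtomicLong(1);

  // AtomicLong.getAndIncrement() performs the read-modify-write atomically,
  // typically via a hardware instruction such as compare-and-swap.
  public long getAndIncrement() {
    return value.getAndIncrement();
  }
}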
42
An Aside: Java
public class Counter {
  private long value;
  public long getAndIncrement() {
    long temp;
    synchronized (this) {
      temp = value;
      value = temp + 1;
    }
    return temp;
  }
}
43
An Aside: Java
public class Counter {
  private long value;
  public long getAndIncrement() {
    long temp;
    synchronized (this) {
      temp = value;
      value = temp + 1;
    }
    return temp;
  }
}
Synchronized block
44
An Aside: Java
public class Counter {
  private long value;
  public long getAndIncrement() {
    long temp;
    synchronized (this) {
      temp = value;
      value = temp + 1;
    }
    return temp;
  }
}
Mutual Exclusion
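Equivalently (a standard Java idiom, not shown on the slides), the whole method can be declared synchronized, which locks this for the duration of the body:

public class Counter {
  private long value;

  // The synchronized keyword makes the read-increment-write
  // sequence execute under mutual exclusion on 'this'.
  public synchronized long getAndIncrement() {
    long temp = value;
    value = temp + 1;
    return temp;
  }
}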
45
Why do we care?
  • We want as much of the code as possible to
    execute concurrently (in parallel)
  • A larger sequential part implies reduced
    performance
  • Amdahl's law: this relation is not linear

46
Amdahl's Law
Speedup of a computation given n CPUs instead of 1:
speedup = 1 / ((1 - p) + p/n)
47
Amdahl's Law
speedup = 1 / ((1 - p) + p/n)
48
Amdahl's Law
speedup = 1 / ((1 - p) + p/n), where p is the parallel fraction
49
Amdahl's Law
speedup = 1 / ((1 - p) + p/n), where p is the parallel fraction and (1 - p) the sequential fraction
50
Amdahl's Law
speedup = 1 / ((1 - p) + p/n), where p is the parallel fraction, (1 - p) the sequential fraction, and n the number of processors
51
Example
  • Ten processors
  • 60% concurrent, 40% sequential
  • How close to 10-fold speedup?

52
Example
  • Ten processors
  • 60% concurrent, 40% sequential
  • How close to 10-fold speedup?
  • Speedup = 1 / (0.4 + 0.6/10) ≈ 2.17
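Worked out in full for this first case (my own arithmetic, applying the formula from slide 46 with p = 0.6 and n = 10):

\[
\text{speedup} = \frac{1}{(1-p) + p/n}
              = \frac{1}{0.4 + 0.6/10}
              = \frac{1}{0.46}
              \approx 2.17
\]

So even with ten processors, a 40% sequential part caps the speedup at barely more than 2x.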

53
Example
  • Ten processors
  • 80% concurrent, 20% sequential
  • How close to 10-fold speedup?

54
Example
  • Ten processors
  • 80% concurrent, 20% sequential
  • How close to 10-fold speedup?
  • Speedup = 1 / (0.2 + 0.8/10) ≈ 3.57

55
Example
  • Ten processors
  • 90% concurrent, 10% sequential
  • How close to 10-fold speedup?

56
Example
  • Ten processors
  • 90% concurrent, 10% sequential
  • How close to 10-fold speedup?
  • Speedup = 1 / (0.1 + 0.9/10) ≈ 5.26

57
Example
  • Ten processors
  • 99% concurrent, 1% sequential
  • How close to 10-fold speedup?

58
Example
  • Ten processors
  • 99% concurrent, 1% sequential
  • How close to 10-fold speedup?
  • Speedup = 1 / (0.01 + 0.99/10) ≈ 9.17

59
The Moral
  • Making good use of our multiple processors
    (cores) means
  • Finding ways to effectively parallelize our code
  • Minimizing sequential parts
  • Reducing idle time in which threads wait

60
Multicore Programming
  • This is what this course is about
  • The fraction that is not easy to make concurrent
    can still have a large impact on overall speedup