Title: Memory Models: A Case for Rethinking Parallel Languages and Hardware
1Memory Models A Case for Rethinking Parallel
Languages and Hardware
- Sarita Adve
- University of Illinois
- sadve_at_illinois.edu
- Acks Mark Hill, Kourosh Gharachorloo, Jeremy
Manson, Bill Pugh, - Hans Boehm, Doug Lea, Herb Sutter, Vikram Adve,
Rob Bocchino, Marc Snir - PODC, SPAA keynote, August 2009
Also a paper by S. Adve H. Boehm,
http//denovo.cs.illinois.edu/papers/memory-models
.pdf
2Memory Consistency Models
- Parallelism for the masses!
- Shared-memory most common
- Memory model Legal values for reads
3Memory Consistency Models
- Parallelism for the masses!
- Shared-memory most common
- Memory model Legal values for reads
4Memory Consistency Models
- Parallelism for the masses!
- Shared-memory most common
- Memory model Legal values for reads
5Memory Consistency Models
- Parallelism for the masses!
- Shared-memory most common
- Memory model Legal values for reads
6Memory Consistency Models
- Parallelism for the masses!
- Shared-memory most common
- Memory model Legal values for reads
720 Years of Memory Models
- Memory model is at the heart of concurrency
semantics - 20 year journey from confusion to convergence at
last! - Hard lessons learned
- Implications for future
- Current way to specify concurrency semantics is
too hard - Fundamentally broken
- Must rethink parallel languages and hardware
- Implications for broader CS disciplines
8What is a Memory Model?
- Memory model defines what values a read can
return
Initially ABCFlag0
Thread 1 Thread 2
A 26
while (Flag ! 1)
B 90 r1 B
r2 A Flag
1
90
0
26
9Memory Model is Key to Concurrency Semantics
- Interface between program and transformers of
program - Defines what values a read can return
Dynamic optimizer
C program
Compiler
Assembly
Hardware
- Weakest system component exposed to the
programmer - Language level model has implications for
hardware - Interface must last beyond trends
10Desirable Properties of a Memory Model
- 3 Ps
- Programmability
- Performance
- Portability
- Challenge hard to satisfy all 3 Ps
- Late 1980s - 90s Largely driven by hardware
- Lots of models, little consensus
- 2000 onwards Largely driven by
languages/compilers - Consensus model for Java, C (C, others ongoing)
- Had to deal with mismatches in hardware models
- Path to convergence has lessons for future
11Programmability SC Lamport79
- Programmability Sequential consistency (SC) most
intuitive - Operations of a single thread in program order
- All operations in a total order or atomic
- But Performance?
- Recent (complex) hardware techniques boost
performance with SC - But compiler transformations still inhibited
- But Portability?
- Almost all h/w, compilers violate SC today
- ?SC not practical, but
12Next Best Thing SC Almost Always
- Parallel programming too hard even with SC
- Programmers (want to) write well structured code
- Explicit synchronization, no data races
- Thread 1
Thread 2 - Lock(L)
Lock(L) - Read Data1 Read Data2
- Write Data2 Write Data1
-
- Unlock(L)
Unlock(L) - SC for such programs much easier can reorder
data accesses - Data-race-free model AdveHill90
- SC for data-race-free programs
- No guarantees for programs with data races
13Definition of a Data Race
- Distinguish between data and non-data
(synchronization) accesses - Only need to define for SC executions ? total
order - Two memory accesses form a race if
- From different threads, to same location, at
least one is a write - Occur one after another
- Thread 1 Thread 2
- Write, A, 26
- Write, B, 90
-
Read, Flag, 0 - Write, Flag, 1
- Read, Flag, 1
- Read, B, 90
Read, A, 26 - A race with a data access is a data race
- Data-race-free-program No data race in any SC
execution
14Data-Race-Free Model
- Data-race-free model SC for data-race-free
programs - Does not preclude races for wait-free constructs,
etc. - Requires races be explicitly identified as
synchronization - E.g., use volatile variables in Java, atomics in
C - Dekkers algorithm
- Initially
Flag1 Flag2 0 -
volatile Flag1, Flag2 - Thread1
Thread2
- Flag1 1
Flag2 1 - if Flag2 0
if Flag1 0 - //critical
section //critical
section - SC prohibits
both loads returning 0
15Data-Race-Free Approach
- Programmers model SC for data-race-free
programs - Programmability
- Simplicity of SC, for data-race-free programs
- Performance
- Specifies minimal constraints (for SC-centric
view) - Portability
- Language must provide way to identify races
- Hardware must provide way to preserve ordering on
races - Compiler must translate correctly
161990's in Practice (The Memory Models Mess)
- Hardware
- Implementation/performance-centric view
- Different vendors had different models most
non-SC - Alpha, Sun, x86, Itanium, IBM, AMD, HP, Cray,
- Various ordering guarantees fences to impose
other orders - Many ambiguities - due to complexity, by
design(?), - High-level languages
- Most shared-memory programming with Pthreads,
OpenMP - Incomplete, ambiguous model specs
- Memory model property of language, not library
Boehm05 - Java commercially successful language with
threads - Chapter 17 of Java language spec on memory model
- But hard to interpret, badly broken
LD
LD
ST
ST
Fence
LD
ST
ST
LD
172000 2004 Java Memory Model
- 2000 Bill Pugh publicized fatal flaws in Java
model - Lobbied Sun to form expert group to revise Java
model - Open process via mailing list
- Diverse participants
- Took 5 years of intense, spirited debates
- Many competing models
- Final consensus model approved in 2005 for Java
5.0 - MansonPughAdve POPL 2005
18Java Memory Model Highlights
- Quick agreement that SC for data-race-free was
required - Missing piece Semantics for programs with data
races - Java cannot have undefined semantics for ANY
program - Must ensure safety/security guarantees
- Limit damage from data races in untrusted code
- Goal Satisfy security/safety, w/ maximum system
flexibility - Problem safety/security, limited damage w/
threads very vague
19Java Memory Model Highlights
- Initially XY0
-
Thread 1 Thread 2
- r1
X r2 Y - Y
r1 X r2 -
Is r1r242 allowed? - Data races produce causality loop!
- Definition of a causality loop was surprisingly
hard - Common compiler optimizations seem to
violatecausality
20Java Memory Model Highlights
- Final model based on consensus, but complex
- Programmers can (must) use SC for
data-race-free - But system designers must deal with complexity
- Correctness tools, racy programs, debuggers, ??
- Recent discovery of bugs SevcikAspinall08
212005 - C, Microsoft Prism, Multicore
- 2005 Hans Boehm initiated C concurrency
model - Prior status no threads in C, most concurrency
w/ Pthreads - Microsoft concurrently started its own internal
effort - C easier than Java because it is unsafe
- Data-race-free is plausible model
- BUT multicore ? New h/w optimizations, more
scrutiny - Mismatched h/w, programming views became
painfully obvious - Debate that SC for data-race-free inefficient w/
hardware models
22C Challenges
- 2006 Pressure to change Java/C to remove SC
baseline - To accommodate some hardware vendors
- But what is alternative?
- Must allow some hardware optimizations
- But must be teachable to undergrads
- Showed such an alternative (probably) does not
exist
23C Compromise
- Default C model is data-race-free
- AMD, Intel, on board
- But
- Some systems need expensive fence for SC
- Some programmers really want more flexibility
- C specifies low-level atomics only for experts
- Complicates spec, but only for experts
- We are not advertising this part
- BoehmAdve PLDI 2008
24Summary of Current Status
- Convergence to SC for data-race-free as
baseline - For programs with data races
- Minimal but complex semantics for safe languages
- No semantics for unsafe languages
25Lessons Learned
- Specifying semantics for programs with data races
is HARD - But no semantics for data races also has
problems - Not an option for safe languages
- Debugging, correctness checking tools
- Hardware-software mismatch for some code
- Simple optimizations have unintended
consequences - State-of-the-art is fundamentally broken
26Lessons Learned
- Specifying semantics for programs with data races
is HARD - But no semantics for data races also has
problems - Not an option for safe languages
- Debugging, correctness checking tools
- Hardware-software mismatch for some code
- Simple optimizations have unintended
consequences - State-of-the-art is fundamentally broken
Banish shared-memory?
27Lessons Learned
- Specifying semantics for programs with data races
is HARD - But no semantics for data races also has
problems - Not an option for safe languages
- Debugging, correctness checking tools
- Hardware-software mismatch for some code
- Simple optimizations have unintended
consequences - State-of-the-art is fundamentally broken
- We need
- Higher-level disciplined models that enforce
discipline - Hardware co-designed with high-level models
Banish wild shared-memory!
28Research Agenda for Languages
- Disciplined shared-memory models
- Simple
- Enforceable
- Expressive
- Performance
-
- Key What discipline?
- How to enforce it?
29Data-Race-Free
- A near-term discipline Data-race-free
- Enforcement
- Ideally, language prohibits by design
- e.g., ownership types Boyapati02
- Else, runtime catches as exception
- e.g., Goldilocks Elmas07
- But work still needed for expressivity and/or
performance - But data-race-free still not sufficiently high
level
30Deterministic-by-Default Parallel Programming
- Even data-race-free parallel programs are too
hard - Multiple interleavings due to unordered
synchronization (or races) - Makes reasoning and testing hard
- But many algorithms are deterministic
- Fixed input gives fixed output
- Standard model for sequential programs
- Also holds for many transformative parallel
programs - Parallelism not part of problem specification,
only for performance - Why write such an algorithm in non-deterministic
style, then struggle to understand and
control its behavior?
31Deterministic-by-Default Model
- Parallel programs should be deterministic-by-defau
lt - Sequential semantics (easier than SC!)
- If non-determinism is needed
- should be explicitly requested, encapsulated
- should not interfere with guarantees for the rest
of the program - Enforcement
- Ideally, language prohibits by design
- Else, runtime catches violations as exceptions
32State-of-the-art
- Many deterministic languages today
- Functional, pure data parallel, some
domain-specific, - Much recent work on runtime, library-based
approaches - E.g., Allen09, Divietti09, Olszewski09,
- Our work Language approach for modern O-O
methods - Deterministic Parallel Java (DPJ) V. Adve et
al. -
33Deterministic Parallel Java (DPJ)
- Object-oriented type and effect system
- Aliasing information partition the heap into
regions - Effect specifications regions read or written by
each method - Language guarantees determinism through type
checking - Side benefit regions, effects are valuable
documentation - Implemented as extension to base Java type system
- Initial evaluation for expressivity, performance
Bocchino09 - Semi-automatic tool for region annotations
Vakilian09 - Recent work on encapsulating frameworks and
unchecked code - Ongoing work on integrating non-determinism
34Implications for Hardware
- Current hardware not matched even to current
model - Near term ISA changes, speculation
- Long term Co-design hardware with new software
models - Use disciplined software to make more efficient
hardware - Use hardware to support disciplined software
35Illinois DeNovo Project
- How to design hardware from the ground up to
- Exploit disciplined parallelism
- for better performance, power,
- Support disciplined parallelism
- for better dependability
- Working with DPJ to exploit region, effect
information - Software-assisted coherence, communication,
scheduling - New hardware/software interface
- Opportune time as we determine how to scale
multicore
36Conclusions
- Current way to specify concurrency semantics
fundamentally broken - Best we can do is SC for data-race-free
- But cannot hide from programs with data races
- Mismatched hardware-software
- Simple optimizations give unintended consequences
- Need
- High-level disciplined models that enforce
discipline - Hardware co-designed with high-level models
- E.g., DPJ, DeNovo
- Implications for many CS communities