Memory Models: A Case for Rethinking Parallel Languages and Hardware

1
Memory Models: A Case for Rethinking Parallel
Languages and Hardware
  • Sarita Adve
  • University of Illinois
  • sadve_at_illinois.edu
  • Acks: Mark Hill, Kourosh Gharachorloo, Jeremy
    Manson, Bill Pugh, Hans Boehm, Doug Lea, Herb
    Sutter, Vikram Adve, Rob Bocchino, Marc Snir
  • PODC, SPAA keynote, August 2009

Also a paper by S. Adve and H. Boehm:
http://denovo.cs.illinois.edu/papers/memory-models.pdf
2
Memory Consistency Models
  • Parallelism for the masses!
  • Shared-memory most common
  • Memory model: legal values for reads

7
20 Years of Memory Models
  • Memory model is at the heart of concurrency
    semantics
  • 20-year journey from confusion to convergence,
    at last!
  • Hard lessons learned
  • Implications for future
  • Current way to specify concurrency semantics is
    too hard
  • Fundamentally broken
  • Must rethink parallel languages and hardware
  • Implications for broader CS disciplines

8
What is a Memory Model?
  • Memory model defines what values a read can
    return


Initially A = B = C = Flag = 0

Thread 1          Thread 2
A = 26            while (Flag != 1) {}
B = 90            r1 = B
Flag = 1          r2 = A

(Slide annotations on the values read: 90, 0, 26)
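
To make "what values a read can return" concrete for the flag example on this slide, here is a minimal Python sketch that enumerates every sequentially consistent interleaving of the two threads. The function name `sc_outcomes` and the modelling of the spin loop as a guard (Thread 2 cannot advance until Flag is 1) are assumptions made for illustration. It models only sequential consistency, not the behavior of any real compiler or hardware.

```python
def sc_outcomes():
    """Collect every (r1, r2) pair that Thread 2 can observe under SC."""
    t1 = [("write", "A", 26), ("write", "B", 90), ("write", "Flag", 1)]
    t2 = [("spin", "Flag", 1), ("read", "B", "r1"), ("read", "A", "r2")]
    outcomes = set()

    def step(mem, regs, i, j):
        # i and j are the program counters of Thread 1 and Thread 2.
        if i == len(t1) and j == len(t2):
            outcomes.add((regs["r1"], regs["r2"]))
            return
        if i < len(t1):
            _, loc, val = t1[i]
            step({**mem, loc: val}, regs, i + 1, j)
        if j < len(t2):
            kind, loc, arg = t2[j]
            if kind == "spin":
                # Loop iterations that read 0 do not change the state, so the
                # loop is modelled as a guard that passes once Flag == 1.
                if mem[loc] == arg:
                    step(mem, regs, i, j + 1)
            else:
                step(mem, {**regs, arg: mem[loc]}, i, j + 1)

    step({"A": 0, "B": 0, "Flag": 0}, {}, 0, 0)
    return outcomes


print(sc_outcomes())  # {(90, 26)}
```

Under SC the only outcome is r1 = 90 and r2 = 26. A weaker model that lets the writes to A and B be reordered after the write to Flag could also allow a read to return 0, which is why the memory model has to say which results are legal.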
9
Memory Model is Key to Concurrency Semantics
  • Interface between program and transformers of
    program
  • Defines what values a read can return

[Figure: the transformers of a program: C program, compiler, dynamic optimizer, assembly, hardware]
  • Weakest system component exposed to the
    programmer
  • Language level model has implications for
    hardware
  • Interface must last beyond trends

10
Desirable Properties of a Memory Model
  • 3 Ps
  • Programmability
  • Performance
  • Portability
  • Challenge: hard to satisfy all 3 Ps
  • Late 1980s - 90s: largely driven by hardware
  • Lots of models, little consensus
  • 2000 onwards: largely driven by
    languages/compilers
  • Consensus model for Java, C++ (C, others ongoing)
  • Had to deal with mismatches in hardware models
  • Path to convergence has lessons for future

11
Programmability: SC [Lamport79]
  • Programmability: sequential consistency (SC) most
    intuitive
  • Operations of a single thread in program order
  • All operations in a total order or atomic
  • But performance?
  • Recent (complex) hardware techniques boost
    performance with SC
  • But compiler transformations still inhibited
  • But portability?
  • Almost all h/w, compilers violate SC today
  • => SC not practical, but ...

12
Next Best Thing: SC Almost Always
  • Parallel programming too hard even with SC
  • Programmers (want to) write well structured code
  • Explicit synchronization, no data races

    Thread 1            Thread 2
    Lock(L)             Lock(L)
    Read Data1          Read Data2
    Write Data2         Write Data1
    Unlock(L)           Unlock(L)

  • SC for such programs much easier: can reorder
    data accesses
  • Data-race-free model [AdveHill90]
  • SC for data-race-free programs
  • No guarantees for programs with data races
13
Definition of a Data Race
  • Distinguish between data and non-data
    (synchronization) accesses
  • Only need to define for SC executions => total
    order
  • Two memory accesses form a race if
  • From different threads, to same location, at
    least one is a write
  • Occur one after another

    Thread 1              Thread 2
    Write, A, 26
    Write, B, 90
                          Read, Flag, 0
    Write, Flag, 1
                          Read, Flag, 1
                          Read, B, 90
                          Read, A, 26

  • A race with a data access is a data race
  • Data-race-free program: no data race in any SC
    execution
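
The definition above can be checked mechanically on a single SC execution. The following Python sketch scans the execution shown on this slide for consecutive conflicting accesses. The names `Access`, `races`, and `data_races` are hypothetical, and classifying accesses as synchronization by location (rather than per access) is a simplifying assumption.

```python
from typing import NamedTuple


class Access(NamedTuple):
    thread: int
    kind: str    # "read" or "write"
    loc: str
    value: int


def races(trace):
    """Pairs of consecutive accesses from different threads to the same
    location where at least one access is a write."""
    found = []
    for a, b in zip(trace, trace[1:]):
        if (a.thread != b.thread and a.loc == b.loc
                and "write" in (a.kind, b.kind)):
            found.append((a, b))
    return found


def data_races(trace, sync_locs):
    """Races that are not on locations declared as synchronization."""
    return [(a, b) for a, b in races(trace) if a.loc not in sync_locs]


# The SC execution from the slide, in its total order.
TRACE = [
    Access(1, "write", "A", 26),
    Access(1, "write", "B", 90),
    Access(2, "read", "Flag", 0),
    Access(1, "write", "Flag", 1),
    Access(2, "read", "Flag", 1),
    Access(2, "read", "B", 90),
    Access(2, "read", "A", 26),
]

print(len(races(TRACE)))                 # 2: both races are on Flag
print(data_races(TRACE, {"Flag"}))       # []
print(len(data_races(TRACE, set())))     # 2
```

With Flag identified as synchronization, this execution has no data race. Checking one trace is not enough to call the program data-race-free, since the definition quantifies over all SC executions.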

14
Data-Race-Free Model
  • Data-race-free model: SC for data-race-free
    programs
  • Does not preclude races for wait-free constructs,
    etc.
  • Requires races be explicitly identified as
    synchronization
  • E.g., use volatile variables in Java, atomics in
    C++
  • Dekker's algorithm

    Initially Flag1 = Flag2 = 0
    volatile Flag1, Flag2

    Thread 1                Thread 2
    Flag1 = 1               Flag2 = 1
    if (Flag2 == 0)         if (Flag1 == 0)
      // critical section     // critical section

  • SC prohibits both loads returning 0
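
The claim that SC prohibits both loads returning 0 can be checked by brute force. This Python sketch enumerates all interleavings of the two Dekker threads that respect program order. The helper names `interleavings` and `dekker_outcomes` and the register names r1 and r2 are illustrative assumptions.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def dekker_outcomes():
    """Return the set of (r1, r2) load results over all SC executions."""
    t1 = [("store", "Flag1", 1), ("load", "Flag2", "r1")]
    t2 = [("store", "Flag2", 1), ("load", "Flag1", "r2")]
    outcomes = set()
    for order in interleavings(t1, t2):
        mem = {"Flag1": 0, "Flag2": 0}
        regs = {}
        for op, loc, arg in order:
            if op == "store":
                mem[loc] = arg
            else:
                regs[arg] = mem[loc]
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes


print(sorted(dekker_outcomes()))  # [(0, 1), (1, 0), (1, 1)]
```

The outcome (0, 0) never appears, so at most one thread enters the critical section. If a store may be delayed past the following load, (0, 0) becomes possible, which is why the flags must be identified as synchronization.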

15
Data-Race-Free Approach
  • Programmer's model: SC for data-race-free
    programs
  • Programmability
  • Simplicity of SC, for data-race-free programs
  • Performance
  • Specifies minimal constraints (for SC-centric
    view)
  • Portability
  • Language must provide way to identify races
  • Hardware must provide way to preserve ordering on
    races
  • Compiler must translate correctly

16
1990's in Practice (The Memory Models Mess)
  • Hardware
  • Implementation/performance-centric view
  • Different vendors had different models; most
    non-SC
  • Alpha, Sun, x86, Itanium, IBM, AMD, HP, Cray, ...
  • Various ordering guarantees + fences to impose
    other orders
  • Many ambiguities - due to complexity, by
    design(?), ...
  • High-level languages
  • Most shared-memory programming with Pthreads,
    OpenMP
  • Incomplete, ambiguous model specs
  • Memory model property of language, not library
    [Boehm05]
  • Java: commercially successful language with
    threads
  • Chapter 17 of Java language spec on memory model
  • But hard to interpret, badly broken

[Figure: loads (LD) and stores (ST) ordered around a Fence]
17
2000 - 2004: Java Memory Model
  • 2000: Bill Pugh publicized fatal flaws in Java
    model
  • Lobbied Sun to form expert group to revise Java
    model
  • Open process via mailing list
  • Diverse participants
  • Took 5 years of intense, spirited debates
  • Many competing models
  • Final consensus model approved in 2005 for Java
    5.0
  • [MansonPughAdve, POPL 2005]

18
Java Memory Model Highlights
  • Quick agreement that SC for data-race-free was
    required
  • Missing piece: semantics for programs with data
    races
  • Java cannot have undefined semantics for ANY
    program
  • Must ensure safety/security guarantees
  • Limit damage from data races in untrusted code
  • Goal: satisfy security/safety, w/ maximum system
    flexibility
  • Problem: "safety/security, limited damage" w/
    threads is very vague

19
Java Memory Model Highlights
    Initially X = Y = 0

    Thread 1        Thread 2
    r1 = X          r2 = Y
    Y = r1          X = r2

  • Is r1 = r2 = 42 allowed?
  • Data races produce causality loop!
  • Definition of a causality loop was surprisingly
    hard
  • Common compiler optimizations seem to
    violate causality
20
Java Memory Model Highlights
  • Final model based on consensus, but complex
  • Programmers can (must) use SC for
    data-race-free
  • But system designers must deal with complexity
  • Correctness tools, racy programs, debuggers, ...??
  • Recent discovery of bugs [SevcikAspinall08]

21
2005 onward: C++, Microsoft Prism, Multicore
  • 2005: Hans Boehm initiated C++ concurrency
    model
  • Prior status: no threads in C++, most concurrency
    w/ Pthreads
  • Microsoft concurrently started its own internal
    effort
  • C++ easier than Java because it is unsafe
  • Data-race-free is plausible model
  • BUT multicore => new h/w optimizations, more
    scrutiny
  • Mismatched h/w, programming views became
    painfully obvious
  • Debate that SC for data-race-free inefficient w/
    hardware models

22
C++ Challenges
  • 2006: pressure to change Java/C++ to remove SC
    baseline
  • To accommodate some hardware vendors
  • But what is alternative?
  • Must allow some hardware optimizations
  • But must be teachable to undergrads
  • Showed such an alternative (probably) does not
    exist

23
C++ Compromise
  • Default C++ model is data-race-free
  • AMD, Intel, ... on board
  • But
  • Some systems need expensive fence for SC
  • Some programmers really want more flexibility
  • C++ specifies low-level atomics only for experts
  • Complicates spec, but only for experts
  • We are not advertising this part
  • [BoehmAdve, PLDI 2008]

24
Summary of Current Status
  • Convergence to SC for data-race-free as
    baseline
  • For programs with data races
  • Minimal but complex semantics for safe languages
  • No semantics for unsafe languages

25
Lessons Learned
  • Specifying semantics for programs with data races
    is HARD
  • But no semantics for data races also has
    problems
  • Not an option for safe languages
  • Debugging, correctness checking tools
  • Hardware-software mismatch for some code
  • Simple optimizations have unintended
    consequences
  • State-of-the-art is fundamentally broken

26
Lessons Learned

Banish shared-memory?
27
Lessons Learned
  • We need
  • Higher-level disciplined models that enforce
    discipline
  • Hardware co-designed with high-level models

Banish wild shared-memory!
28
Research Agenda for Languages
  • Disciplined shared-memory models
  • Simple
  • Enforceable
  • Expressive
  • Performance
  • Key: what discipline?
  • How to enforce it?

29
Data-Race-Free
  • A near-term discipline: data-race-free
  • Enforcement
  • Ideally, language prohibits by design
  • e.g., ownership types [Boyapati02]
  • Else, runtime catches as exception
  • e.g., Goldilocks [Elmas07]
  • But work still needed for expressivity and/or
    performance
  • But data-race-free still not sufficiently high
    level

30
Deterministic-by-Default Parallel Programming
  • Even data-race-free parallel programs are too
    hard
  • Multiple interleavings due to unordered
    synchronization (or races)
  • Makes reasoning and testing hard
  • But many algorithms are deterministic
  • Fixed input gives fixed output
  • Standard model for sequential programs
  • Also holds for many transformative parallel
    programs
  • Parallelism not part of problem specification,
    only for performance
  • Why write such an algorithm in non-deterministic
    style, then struggle to understand and
    control its behavior?

31
Deterministic-by-Default Model
  • Parallel programs should be
    deterministic-by-default
  • Sequential semantics (easier than SC!)
  • If non-determinism is needed
  • should be explicitly requested, encapsulated
  • should not interfere with guarantees for the rest
    of the program
  • Enforcement
  • Ideally, language prohibits by design
  • Else, runtime catches violations as exceptions

32
State-of-the-art
  • Many deterministic languages today
  • Functional, pure data parallel, some
    domain-specific, ...
  • Much recent work on runtime, library-based
    approaches
  • E.g., [Allen09], [Devietti09], [Olszewski09], ...
  • Our work: language approach for modern O-O
    methods
  • Deterministic Parallel Java (DPJ) [V. Adve et
    al.]

33
Deterministic Parallel Java (DPJ)
  • Object-oriented type and effect system
  • Aliasing information: partition the heap into
    regions
  • Effect specifications: regions read or written by
    each method
  • Language guarantees determinism through type
    checking
  • Side benefit: regions, effects are valuable
    documentation
  • Implemented as extension to base Java type system
  • Initial evaluation for expressivity, performance
    [Bocchino09]
  • Semi-automatic tool for region annotations
    [Vakilian09]
  • Recent work on encapsulating frameworks and
    unchecked code
  • Ongoing work on integrating non-determinism

34
Implications for Hardware
  • Current hardware not matched even to current
    model
  • Near term: ISA changes, speculation
  • Long term: co-design hardware with new software
    models
  • Use disciplined software to make more efficient
    hardware
  • Use hardware to support disciplined software

35
Illinois DeNovo Project
  • How to design hardware from the ground up to
  • Exploit disciplined parallelism
  • for better performance, power, ...
  • Support disciplined parallelism
  • for better dependability
  • Working with DPJ to exploit region, effect
    information
  • Software-assisted coherence, communication,
    scheduling
  • New hardware/software interface
  • Opportune time as we determine how to scale
    multicore

36
Conclusions
  • Current way to specify concurrency semantics
    fundamentally broken
  • Best we can do is SC for data-race-free
  • But cannot hide from programs with data races
  • Mismatched hardware-software
  • Simple optimizations give unintended consequences
  • Need
  • High-level disciplined models that enforce
    discipline
  • Hardware co-designed with high-level models
  • E.g., DPJ, DeNovo
  • Implications for many CS communities