Deterministic Execution of Nondeterministic Shared-Memory Programs - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Deterministic Execution of Nondeterministic
Shared-Memory Programs
  • Dan Grossman
  • University of Washington
  • Dagstuhl Seminar on
  • Design and Validation of Concurrent Systems
  • August 2009

2
What if
  • What if you could run the same multithreaded
    program on the same inputs twice and know you
    would get the same results?
  • What exactly does that mean?
  • Why might you want that?
  • How can we do that (semi-efficiently)?
  • But first
  • Some background on me and the talks I'm not
    giving
  • Key terminology and perspectives
  • More important than technical details at this
    event

3
Biography / group names
  • Me
  • Programming-languages person
  • Type systems, compilers for memory-safe C
    dialect, 2000-2004
  • 30% → 80% focus on multithreading, 2005-
  • Co-advising 3-4 students with computer architect
    Luis Ceze, 2007-
  • Two groups for marketing purposes
  • WASP, wasp.cs.washington.edu
  • SAMPA, sampa.cs.washington.edu

4
The talk you won't see
void transferFrom(int amt, Acct other) {
  atomic {
    other.withdraw(amt);
    this.deposit(amt);
  }
}
  • "Transactions are to shared-memory concurrency as
    garbage collection is to memory management"
    [OOPSLA '07]
  • Semantic problems with nontransactional accesses
    worse than locks!
  • Fix with stronger guarantees and compiler opts
    [PLDI '07]
  • Or static type system, formal semantics, and
    proof [POPL '08]
  • Or more dynamic approach adapting to Haskell
    [submitted]
  • Prototypes for OCaml, Java, Scheme, and Haskell

5
This talk
  • Take an arbitrary C/C++ program with POSIX
    threads
  • Locks, barriers, condition variables, data races,
    whatever
  • Compile it funny
  • Link it against a funny run-time system
  • Get deterministic behavior
  • Well, as deterministic as a sequential C program
  • Joint work Luis Ceze, Tom Bergan, Joe Devietti,
    Owen Anderson

6
Terminology
  • Essential perspectives, not just definitions
  • Parallelism vs. concurrency
  • Or different terms if you prefer
  • Sequential semantics vs. determinism vs.
    nondeterminism
  • What is an input?
  • Level of abstraction
  • Which one do you care about?

7
Concurrency
  • Working definition
  • Software is concurrent if a primary intellectual
    challenge is responding to external events from
    multiple sources in a timely manner.
  • Examples: operating system, shared hashtable,
    version control
  • Key challenge is responsiveness
  • often leads to threads or asynchrony
  • Correctness usually requires synchronization
    (e.g., locks)

8
Parallelism
  • Working definition
  • Software is parallel if a primary intellectual
    challenge is using extra computational resources
    to do more useful work per unit time.
  • Examples: scientific computing, most graphics, a
    lot of servers
  • Key challenge is Amdahl's Law
  • No sequential bottlenecks, no imbalanced load
  • When pure fork-join isn't correct, need
    synchronization

9
The confusion
  • First, this use of terms isn't standard
  • Many systems are both
  • And it's really a matter of degree
  • Similar lower-level mechanisms, such as threads
    and locks
  • And similar errors (race conditions, deadlocks,
    etc.)
  • Our work determinizes these lower-level
    mechanisms, so we determinize concurrent and
    parallel applications
  • But purely parallel ones probably benefit less

10
Terminology
  • Essential perspectives, not just definitions
  • Parallelism vs. concurrency
  • Or different terms if you prefer
  • Sequential semantics vs. determinism vs.
    nondeterminism
  • What is an input?
  • Level of abstraction
  • Which one do you care about?

11
Sequential semantics
  • Some languages can have results defined purely
    sequentially, but are designed to have better
    parallel-performance guarantees (thanks to a cost
    model)
  • Examples: DPJ, Cilk, NESL, …
  • For correctness, reason sequentially
  • For performance, reason in parallel
  • Really designed for parallelism, not concurrency
  • Not our work

12
Sequential isn't always deterministic
  • Surprisingly easy to forget this

int f1() { print("A"); print("B"); return 0; }
int f2() { print("C"); print("D"); return 0; }
int g()  { return f1() + f2(); }
  • Must g() print ABCD?
  • Java: yes
  • C/C++: no; CDAB is allowed (either operand of +
    may be evaluated first), but not ACBD, ACDB, etc.
    (the two calls themselves are not interleaved)

13
Another example
  • Dijkstra's guarded-command conditionals

if x % 2 == 1 -> y := x - 1
[] x < 10     -> y := 7
[] x > 10     -> y := 0
fi
  • We might still expect a particular language
    implementation (compiler) to be deterministic
  • May choose any deterministic result consistent
    with the nondeterministic semantics
  • Presumably doesn't change choice across
    executions, but may across compiles (including
    butterfly effects)
  • Our work does this

14
Why helpful?
  • So the programmer gets a deterministic
    executable, but doesn't know which one
  • Key degree of freedom for automated performance
  • Still helpful for
  • Whole-program testing and debugging
  • Automated replicas
  • In general, repeatability and reducing possible
    executions

15
Define deterministic, part 1
  • Deterministic: outputs depend only on inputs
  • That's right, but it means we must clearly
    specify what is an input (and an output)
  • Can define away anything you want
  • Example: all syscall results are inputs, so
    seeding the pseudorandom number generator with
    time-of-day is "deterministic"
  • We mean what you think we mean
  • Inputs: command-line, I/O, syscalls
  • Not inputs: cache state, hardware timing, thread
    scheduler

16
Terminology
  • Essential perspectives, not just definitions
  • Parallelism vs. concurrency
  • Or different terms if you prefer
  • Sequential semantics vs. determinism vs.
    nondeterminism
  • What is an input?
  • Level of abstraction
  • Which one do you care about?

17
Define deterministic, part 2
  • "Is it deterministic?" depends crucially on your
    abstraction level
  • Another obvious, easy-to-forget thing
  • Examples
  • File systems
  • Memory-allocation (Java vs. C)
  • Set implemented as a list
  • Quantum mechanics
  • Our work
  • The language level: state of logical memory,
    program output
  • Application may care only about a higher level
    (future work)

18
Okay how?
  • Trade-off between complexity and performance

  • Performance
  • Overhead (single-thread slowdown)
  • Scalability (minimize extra synchronization,
    waiting)

19
Starting serial
  • Determinization is easy!
  • Run one thread at a time in round-robin order
  • Context-switch after N basic blocks, for a
    deterministic N
  • Cannot use a timer; use the compiler and run-time
  • Races in the source program are irrelevant; locks
    are still respected
  • Example with 3 threads running (time moves with
    arrows)

T1
T2
T3
1 quantum
1 round
20
Parallel quanta
  • The quanta in a round can start to run in
    parallel provided they stop before any
    communication occurs (see how next)
  • So each round has two stages, parallel then serial

T1
T2
T3
Parallel stage ends with global barrier
load A
load A
Serial stage ends; next round starts
store B
store C


21
Is that legal?
T1
T2
T3
load A
load A
store B
store C
  • Can produce a different result than serial
    execution
  • In fact, the execution is not necessarily
    equivalent to any serialization of quanta
  • But it doesn't matter as long as we are
    deterministic! We just need:
  • Parallel stages do no communication
  • Parallel stages end at deterministic points

22
Performance
T1
T2
T3
load A
load A
store B
store C
  • Keys to scalability:
  • (1) Run almost everything in the parallel stage
  • (2) Keep quanta balanced
  • For (2): assuming (1), use rough instruction costs

23
Memory ownership
  • To avoid communication during the parallel stage:
  • Every memory location is shared or owned by one
    thread T
  • A dynamic table is checked and updated during
    execution
  • Can read only memory that is shared or
    owned-by-you
  • Can write only memory owned-by-you
  • Locks are just like memory locations; blocking
    ends the quantum
  • In our example, perhaps A is shared, B and C are
    owned by T2

T1
T2
T3
load A
load A
store B
store C
24
Changing ownership
  • Policy
  • For each location (any deterministic granularity
    is correct),
  • First owner is first thread to allocate in the
    location
  • On read in serial stage, if owned-by-other, set
    to shared
  • On write in serial stage, set to owned-by-self
  • Correctness
  • Ownership immutable in parallel stages (so no
    communication)
  • Serial-stage changes are deterministic
  • So many, many policies are correct
  • Chose the obvious one for temporal locality and
    read-sharing
  • Must have good locality for scalability!

25
Overhead
  • Significant overhead
  • All reads/writes consult ownership information
  • All basic blocks subtract from a thread-local
    quantum counter
  • Reduce via
  • Lots of run-time engineering and data structures
    (not too much magic, but most important)
  • Obvious compiler optimizations like escape
    analysis and hoisting counter-subtractions
  • Specialized compiler optimizations like
    Subsequent Access Optimization: don't recheck
    the same ownership unless a quantum boundary
    might intervene
  • Correctness of this is a subtle argument and
    slightly affects the ownership-change policy
    (deterministically!)

26
Brittle
  • Change any line of code, command-line argument,
    environment variable, etc. and you can get a
    different deterministic program
  • We are mostly robust to memory-safety errors,
  • except:
  • Bounds errors that corrupt ownership information
  • Bounds errors that write to another thread's
    allegedly-thread-local data

27
Results
  • Overhead: varies a lot, but about 3x at 8 threads
  • Scalability: varies a lot, but on average with
    the PARSEC suite (*):
  • nondet 8 threads vs. nondet 2 threads: 2.4x
    (linear would be 4x)
  • det 8 threads vs. det 2 threads: 2.0x
  • det 8 threads vs. nondet 2 threads: 0.91x
    (range 0.41 - 2.75)
  • How do you want to spend Moore's Dividend?
  • (*) subset runnable: no MPI, no C++ exceptions, no
    32-bit assumptions

28
Buffering
  • Actually, ownership is only one approach
  • Second approach relies on buffering and a commit
    stage
  • Even higher overhead (to consult buffers)
  • Even better scalability (block only for
    synchronization commits)
  • And a third hybrid approach
  • Hopefully more details soon

29
Conclusion
  • The fundamental assumption that nondeterministic
    shared-memory programs must be run
    nondeterministically is false
  • A fun problem to throw principled compiler and
    run-time optimizations at.
  • Could dramatically change how we test and debug
    parallel and concurrent programs
  • Most-related work:
  • Kendo from MIT: done concurrently (in parallel?),
    requires knowing about data races statically,
    different approach
  • Colleagues' ASPLOS '09 work: hardware support for
    ownership
  • Record-and-replay systems: we can replay without
    the record