Memoizing Multi-Threaded Transactions - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Memoizing Multi-Threaded Transactions


1
Memoizing Multi-Threaded Transactions
  • Suresh Jagannathan

joint work with Suresh Jagannathan and Jeremy
Orlow
2
Motivation
  • Advent of multi-core architectures encourages
    development of new applications and abstractions.
  • Concurrent stream-based programs
    • Avoid recomputing outputs for previously seen
      inputs
  • Speculative computation
    • Reuse results computed by completed applications
      within a failed speculation
  • Transactions
    • Substantial amount of wasted work when a
      (long-lived) transaction aborts
    • Reduce this overhead by avoiding re-execution of
      computation not affected by the reasons for
      commit failure
  • Apply a well-known sequential optimization:
    memoization

3
Programming Model
  • Pure CML
    • First-class threads
    • Message-based synchronous communication
    • First-class synchronous events
    • All effects manifest through channel
      communication
    • No polling
  • Augmented with atomic (see the sketch below)
    • No constraints on atomic regions
    • Allowing thread creation yields multi-threaded
      transactions
    • Nesting atomic regions yields nested transactions
    • Strong atomicity
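
As a rough illustration of this model (a sketch only: atomic below is a
placeholder stub for the transactional construct described here, and the
CML operations are the standard channel/spawn/send/recv, run under CML's
scheduler, e.g. RunCML.doit):

(* Sketch of the programming model.  `atomic' is only a stub standing in
   for the transactional construct added by this work. *)
fun atomic (f : unit -> 'a) : 'a = f ()   (* placeholder, no transactional semantics *)

fun example () =
  atomic (fn () =>
    let val c : int CML.chan = CML.channel ()
    in
      (* thread creation inside an atomic region: a multi-threaded transaction *)
      ignore (CML.spawn (fn () => CML.send (c, 42)));
      (* all effects are synchronous channel communications *)
      CML.recv c
    end)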

4
Memoization
  • Consider a pure function
  • When f is applied to a value v, record its result
    in a memo table.
  • If we apply f to v again, we can simply return the
    previous result, without having to re-evaluate
    f's body.

fun f(x) = e
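
As a rough illustration (not from the slides), a memoization wrapper for
a pure integer-valued function might look like the following sketch,
using a simple association list as the memo table:

(* Illustrative sketch: memoize a pure int -> int function. *)
fun memoize (f : int -> int) : int -> int =
  let
    val table : (int * int) list ref = ref []
  in
    fn v =>
      case List.find (fn (x, _) => x = v) (!table) of
        SOME (_, r) => r                          (* seen before: reuse result *)
      | NONE =>
          let val r = f v                         (* first application: evaluate *)
          in table := (v, r) :: !table; r end     (* record argument/result pair *)
  end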
5
Memoization
  • What if f has side-effects?
  • Cannot simply elide the second call

val x = ref 0
fun f(y) = ... x := !x + y ...

f(v) -> v    (* value of x is v *)
f(v) -> v    (* value of x is now 2v *)
6
Observation: a potential solution
  • Can split a function body into two parts
    (illustrated below)
    • Expressions that have no effect and do not depend
      on any effectful expression
    • Effectful expressions
  • Record effectful computations (and any expression
    that depends on them) in the memo table
  • When applying a memoized function
    • Avoid execution of effect-free expressions: they
      have no side-effect and do not depend on
      expressions that do
    • Execute all effectful expressions
    • The return value is
      • the value stored in the memo table if the return
        expression was effect-free
      • the value yielded by evaluating the return
        expression otherwise
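
A hypothetical example of this split (the function and channel below are
invented for illustration):

(* On reuse of a memoized application of f, only the send would be
   re-executed; the arithmetic is effect-free and is skipped. *)
fun f (c : int CML.chan) (y : int) =
  let
    val a = y + 1              (* effect-free *)
    val () = CML.send (c, a)   (* effectful: recorded as a constraint *)
    val b = a * 2              (* effect-free *)
  in
    b   (* return expression is effect-free, so the memoized value is returned *)
  end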

7
Challenges
  • References provide a form of implicit, non-local
    communication
  • Should only re-evaluate effect-dependent
    computation when necessary
  • Does the problem become more tractable when
    communication is explicit?

let val x = ref true
    fun f y = ... if (!x) then e1 else e2 ...
in
  f v;
  ...
  f v
end

Re-evaluation of f is necessary only if x is modified
between the two calls.
8
Memoization and Concurrency
Scheduling decisions introduce non-determinism of
thread interleavings, making it non-trivial to
draw such conclusions
9
Example
(Figure: an example thread interleaving in which thread T3 applies f(v).)
10
Tracking Communication
let val (c1, c2) = (mkCh(), mkCh())
    fun f() = (... send(c1, v1) ...)
    fun g() = (recv(c1); send(c2, v2); ...; g())
in
  spawn(g);
  f(); recv(c2); f()
end
What if there is no waiting receiver for the
send performed by f?
Should enforce a schedule that allows the thread
computing g() to proceed to the recv on c1
11
An Approach
  • Maintain a memo store that records communication
    actions performed within a procedure through
    constraints.
  • Constraints ensure communication and
    synchronization take place in a specific order.
  • At a call, consult the memo store.
  • If all constraints are satisfiable in the current
    global state, elide the call.
  • Otherwise, explore the state space of possible
    interleavings to discover a global state in which
    remaining constraints can be satisfied.
  • Fail if no such state exists.
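
A much-simplified sketch of what such a memo store might record is shown
below; the types are illustrative only, and the precise components appear
on the Program States slide near the end:

(* Illustrative, simplified shapes for the constraints a memo store records;
   channel locations and values are ints here and continuations are omitted. *)
type loc = int                      (* channel location *)
datatype action = Send | Recv

datatype constraint =
    Comm of { chan : loc, act : action, value : int }   (* communication action *)
  | Spawned of (unit -> unit)                           (* thunk spawned *)
  | NewChan of loc                                      (* channel created *)

(* The memo store maps a (procedure id, argument value) pair to the constraint
   list its earlier application generated, together with its return value. *)
type memo_store = ((string * int) * (constraint list * int)) list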

12
Challenges
  • Finding an interleaving that satisfies all
    dependencies may involve unbounded search
  • Even when a schedule is discovered, starvation
    may be introduced
  • Similar situation arises for recv
  • Want to utilize memoization without compromising
    fairness guarantees

let val c = mkCh()
    fun p1() = (send(c, 1); p1())
    fun p2() = (send(c, 2); p2())
    fun f()  = (recv(c); ...)
    fun g()  = (f(); ...; g())
in
  spawn(p1); spawn(p2); spawn(g)
end
13
Partial Memoization
  • Utilizing memoized information requires
    discovering a path in the state space in which
    memoization constraints can be discharged.

let val (c1, c2) = (mkCh(), mkCh())
    fun f() = (send(c1, v1); ...; recv(c2))
    fun g() = (recv(c1); ...; recv(c2); g())
    fun h() = (...; send(c2, v2); send(c2, v3); ...; h())
    fun i() = (recv(c2); i())
in
  spawn(g); spawn(h); spawn(i);
  f(); ...; send(c2, v3); ...;
  f()
end
Instead, match the send constraint, elide the pure
computation up to the recv(c2), and resume execution
from that point.
14
Implementation
  • Incorporated within MLton
    • insertion of barriers to monitor function
      arguments and returns
    • hooks into CML to monitor channel communication
      and to record constraints
  • Constraint matching can fail on a receive
    constraint (see the sketch after this list)
    • Receive constraints are obligated to read a
      specific value
  • Send constraints can only fail if
    • there are no matching receive constraints on the
      sending channel, or
    • there are no receive operations on the same channel
  • A receive operation (not a constraint) is
    indifferent to the value it reads
  • When an application of a memoized function is
    stalled, we can fail and resume execution from
    the stall point
    • Heuristic: record the number of context switches
      to a thread attempting to discharge a constraint.
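
The following is a hypothetical sketch of this matching discipline against
an abstracted global state; the real implementation matches constraints
against the live CML scheduler rather than a list of offers:

(* `pending' stands for the sends and receives other threads currently
   offer (including recorded receive constraints). *)
datatype action = Send | Recv
type offer = { chan : int, act : action, value : int }

(* A receive constraint must read the specific value it recorded, so it is
   discharged only by a send of that value on that channel.  A send
   constraint is discharged by any receive on the channel, since a plain
   receive operation is indifferent to the value it reads. *)
fun discharges (c : offer, pending : offer list) : bool =
  case #act c of
    Recv => List.exists (fn (p : offer) =>
              #act p = Send andalso #chan p = #chan c
              andalso #value p = #value c) pending
  | Send => List.exists (fn (p : offer) =>
              #act p = Recv andalso #chan p = #chan c) pending

(* Eliding a call requires every recorded constraint to be dischargeable;
   otherwise execution fails back to, and resumes from, the first unmatched
   constraint, as in partial memoization. *)
fun allDischarged (cs : offer list, pending) =
  List.all (fn c => discharges (c, pending)) cs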

15
Case Study
  • STM-Bench7
    • A tunable multi-threaded benchmark designed to
      compare different software transactional memory
      (STM) implementations and designs.
    • Simulates data storage and access patterns of a
      CAD/CAM application
  • Benchmark builds a tree of assemblies
    • leaves contain bags of components
    • components form highly-connected graphs of
      atomic parts
  • Roughly 1.5K lines of CML
  • Nodes in the graph are represented as
    message-passing servers (see the sketch below)
    • Receiving channel for input
    • Output channels to connect to adjacent nodes in
      the tree
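
A node server in this style might be sketched as follows (hypothetical
code, not the benchmark's; the message type and propagation policy are
invented for illustration):

(* A graph node as a message-passing server: it reads requests from its
   input channel and forwards work to adjacent nodes over its output
   channels. *)
datatype msg = SetHeight of int | Query of int CML.chan

fun node (height0 : int, input : msg CML.chan, outputs : msg CML.chan list) =
  let
    fun loop h =
      case CML.recv input of
        SetHeight h' =>
          (* update local state and propagate the change to adjacent nodes *)
          (List.app (fn out => CML.send (out, SetHeight h')) outputs;
           loop h')
      | Query reply =>
          (CML.send (reply, h); loop h)
  in
    CML.spawn (fn () => loop height0)
  end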

16
Example
Traverses the graph, and changes a component's
height
Establishes a transaction
Searching the graph for different components can
be performed concurrently
Memoization helps avoid unnecessary re-traversal
of the graph if the transaction fails.
17
Results
  • Consider two configurations of the benchmark
    • Transactional: use STM without memoization
    • Memoized: use STM with memoization of atomic
      sections
  • Goal
    • Measure performance improvement as a function of
      transaction aborts
  • Parameters
    • A graph of 1M nodes
    • 280K complex assemblies
    • 140 assemblies
    • bags reference one of 100 components, each
      containing 100 nodes
    • Execution creates roughly 500K threads and 1M
      channels
    • Each transaction performs 7 channel operations
      on average, and traverses roughly 20 nodes of the
      parts graph

18
Runtime Improvement
19
Runtime Improvement
20
Related Work
  • Self-adjusting computation and change propagation
    • Leverages memoization to automatically adapt a
      program's execution to a change of inputs, given
      an initial execution run.
    • Key distinction: no maintenance of dynamic
      dependence graphs.
    • Effectiveness of memoization depends only on the
      values stored in constraints, not on where those
      values came from.
  • Transactional Events
    • Require arbitrary look-ahead to determine if a
      complex transactional event can commit.
    • A similar property is necessary to determine if a
      call can be elided based on communication actions
      performed by its memoized version.
  • Selective memoization
    • Addresses a complementary problem that can be
      used to improve memoization efficiency.

21
Conclusions
  • Memoizing communication can be an effective
    dynamic optimization to reduce re-execution
    overheads for optimistic or speculative
    concurrency abstractions.
  • Partial memoization allows these techniques to be
    useful in practice.
  • Future Work
    • Opportunities for static analysis
      • Detect communication patterns to aggregate
        (bundle) constraints
      • Identify partial memoization points
    • Runtime profiling

22
Questions?
23
STM
  • We implement an eager-versioning, lazy-conflict-
    detection STM protocol.
  • Isolation and atomicity guarantees within a
    transaction
  • Shared references are implemented in terms of
    channel-based communication (see the sketch below)
    • Track updates to channels in the same way that
      updates to shared memory are tracked by a typical
      STM
    • Build an STM-aware shared-memory server
      abstraction on top of channel communication
  • The STM supports nested, multi-threaded
    transactions
    • Multiple threads within a transaction must join
      at the commit point before the transaction can
      complete
    • Memoization helps reduce abort overheads in the
      presence of communication among threads within
      the transaction
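
A rough sketch of the channel-based shared-reference abstraction, with all
STM versioning and conflict-detection bookkeeping omitted:

(* A shared reference realised as a channel-based server, so that every
   read and write is a channel communication the memoization and STM
   machinery can observe. *)
datatype 'a request = Read of 'a CML.chan | Write of 'a

fun mkCell (init : 'a) : 'a request CML.chan =
  let
    val reqCh = CML.channel ()
    fun loop v =
      case CML.recv reqCh of
        Read reply => (CML.send (reply, v); loop v)
      | Write v'   => loop v'
  in
    ignore (CML.spawn (fn () => loop init));
    reqCh
  end

fun readCell cell =
  let val reply = CML.channel ()
  in CML.send (cell, Read reply); CML.recv reply end

fun writeCell (cell, v) = CML.send (cell, Write v)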

24
Schedulability
  • Reasoning about whether a feasible schedule
    exists is typically more difficult.

let val (c1, c2) = (mkCh(), mkCh())
    fun f() = (...; send(c1, v1); ...; recv(c2))
    fun g() = (recv(c1); recv(c2); ...; g())
    fun h() = (send(c2, v2); send(c2, v3); h())
in
  spawn(g); spawn(h);
  f(); ...; f()
end
25
Utilization
26
Example
let val ch = mkCh()
    fun f() = let val _ = recv(ch)
                  val _ = recv(ch)
              in () end
    fun g() = let val _ = send(ch, 1)
                  val _ = send(ch, 2)
              in () end
    fun f'() = (spawn(f); f'())
    fun g'() = (spawn(g); g'())
in
  (spawn(f'); spawn(g'))
end
(Figure: possible interleavings of the send(ch,1) and send(ch,2)
operations performed by the concurrently spawned instances of g.)
  • Four possible memoized versions of f, one for
    each pair of values it may receive.
  • Force a thread schedule that guarantees calls to
    g supply the values recorded in the memoized
    version of f.

27
Evaluation Rules
28
Evaluation Rules
29
Safety
  • Using memo information to elide calls only yields
    states realizable under non-memoized evaluation.
  • Introduce two auxiliary operators
    • one transforms process states (and terms)
      defined under memo evaluation to process states
      and terms defined under non-memoized evaluation;
    • the other translates constraints in the memo store
      to core-language terms.

30
Program States
  • s: Memo store
    • Given an id (for a procedure) and an argument
      value, returns a set of constraints and a return
      value.
  • T: Memo state
    • Associates a set of constraints with a call.
  • C: Constraint
    • for a communication operation:
      channel location, action (Send/Recv), value sent
      or received, continuation
    • for a spawn operation:
      the thunk spawned
    • for a channel creation operation:
      the channel location

31
Correspondence
  • Partial memoization is sound with respect to full
    memoization.
  • There exists a transition sequence from the
    global state yielded by a Fail transition to a
    global state representing successful discharge of
    all memoization constraints.

32
Utilization
33
Example
Memoized information about sclHgt can help elide
the first call on re-execution if
  -- the arguments remain the same
  -- the object yielded by the traversal has not changed
The second call can be elided if
  -- communication via channel c2 is consistent with the
     previous execution, as determined by the behavior of
     the first call to sclHgt