Title: Inferring Locks for Atomic Sections
1. Inferring Locks for Atomic Sections
Sigmund Cherem, Cornell University (summer intern at Microsoft Research)
Trishul Chilimbi, Microsoft Research
Sumit Gulwani, Microsoft Research
2. What Is This Talk About?
- Multi-cores are widely available
- Developing concurrent software is not trivial
  - Many challenges: parallelization, synchronization, isolation
  - Manual locking is error-prone and non-compositional
- Recent proposal: atomic sections
  - Raise the level of abstraction; compositional
  - Optimistic (transactional) implementations
    [Herlihy, Moss ISCA'93] [Hammond et al. ISCA'04] [Shavit, Touitou PODC'95] [Dice et al. DISC'06] [Fraser, Harris TOPLAS'07]
  - Limitations: non-reversible operations, overhead
- This talk: compiler support for atomic sections via pessimistic concurrency
3. Static Lock Inference Framework
- Compiler support for atomic sections based on pessimistic concurrency
  - Prevent conflicts using locks, no deadlocks
- Goal: reduce contention while avoiding deadlocks
[Diagram: concurrent program with atomic sections (runs on STM) -> Lock Inference Compiler -> same program with locks implementing the atomic sections]
- Specifies where, but not how
- Lightweight runtime support (locking library)
- Automatically supports non-reversible operations
4. Moving List Elements
  move (list* to, list* from) {
    atomic {
      elem* x = to->head;
      elem* y = from->head;
      from->head = null;
      while (x->next != null)
        x = x->next;
      x->next = y;
    }
  }
[Diagram: lists to and from, each with a head pointer, and the cursors x and y]
6. Attempt 1: Global Lock
  move (list* to, list* from) {
    acquire( GLOBAL );
    elem* x = to->head;
    elem* y = from->head;
    from->head = null;
    while (x->next != null)
      x = x->next;
    x->next = y;
    release( GLOBAL );
  }
- The global lock protects the entire memory
- Problem with Attempt 1: no parallelism with any other atomic section
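The global-lock attempt can be sketched in plain C with a single pthread mutex. This is only an illustration under stated assumptions: the struct layouts, the `value` field, and the handling of an empty destination list are not on the slide, and `GLOBAL` is modeled here as one process-wide mutex.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list types; the slides only name list, elem, head and next. */
typedef struct elem { struct elem *next; int value; } elem;
typedef struct list { elem *head; } list;

/* One mutex standing in for GLOBAL: it guards all shared memory. */
static pthread_mutex_t GLOBAL = PTHREAD_MUTEX_INITIALIZER;

/* Append every element of `from` to the end of `to`; `from` ends up empty. */
void move(list *to, list *from) {
    pthread_mutex_lock(&GLOBAL);
    elem *x = to->head;
    elem *y = from->head;
    from->head = NULL;
    if (x == NULL) {
        to->head = y;            /* empty destination: a case the slide omits */
    } else {
        while (x->next != NULL)  /* walk to the last element of `to` */
            x = x->next;
        x->next = y;
    }
    pthread_mutex_unlock(&GLOBAL);
}
```

Every atomic section compiled this way serializes on the same mutex, which is exactly the loss of parallelism the slide points out.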
7. Attempt 2: Fine-Grain Locks
  move (list* to, list* from) {
    acquire( &(to->head) );
    elem* x = to->head;
    acquire( &(from->head) );
    elem* y = from->head;
    from->head = null;
    acquire( &(x->next) );
    while (x->next != null) {
      x = x->next;
      acquire( &(x->next) );
    }
    x->next = y;
    releaseAll();
  }
8. Attempt 2: Fine-Grain Locks
- Problem with Attempt 2: may lead to deadlock
- Two concurrent calls can acquire the same locks in opposite order:
    acq( &(a->head) )        acq( &(b->head) )
    // deadlock here
    acq( &(b->head) )        acq( &(a->head) )
9. Attempt 3: Fine-Grain Locks at Entry
  move (list* to, list* from) {
    acquireAll( );
    acquire( &(to->head) );
    elem* x = to->head;
    acquire( &(from->head) );
    elem* y = from->head;
    from->head = null;
    acquire( &(x->next) );
    while (x->next != null) {
      x = x->next;
      acquire( &(x->next) );
    }
    x->next = y;
    releaseAll();
  }
[Figure: the individual acquire calls are moved up into acquireAll( ) at the entry]
Challenge 1: Protect locations ahead of time (at entry of atomic), i.e., find which addresses will be used inside the atomic section
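One common way to make acquisition at entry deadlock-free is to take all locks in a single global order, for example by address. The sketch below assumes that strategy. The slides do not say how the authors' locking library orders its locks, and `acquire_all`, `release_all`, and `by_address` are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Compare two lock pointers by address so every thread uses the same order. */
static int by_address(const void *a, const void *b) {
    uintptr_t x = (uintptr_t)*(pthread_mutex_t *const *)a;
    uintptr_t y = (uintptr_t)*(pthread_mutex_t *const *)b;
    return (x > y) - (x < y);
}

/* Hypothetical acquireAll: sort the array in place, skip duplicates, and lock
   in ascending address order. Returns the number of distinct locks taken. */
size_t acquire_all(pthread_mutex_t **locks, size_t n) {
    size_t taken = 0;
    qsort(locks, n, sizeof *locks, by_address);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && locks[i] == locks[i - 1])
            continue;            /* e.g. move(a, a) lists the same lock twice */
        pthread_mutex_lock(locks[i]);
        taken++;
    }
    return taken;
}

/* Hypothetical releaseAll: expects the array as left (sorted) by acquire_all. */
void release_all(pthread_mutex_t **locks, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && locks[i] == locks[i - 1])
            continue;
        pthread_mutex_unlock(locks[i]);
    }
}
```

With this ordering, move(a, b) and move(b, a) both lock the lower-addressed head first, so the cyclic wait from Attempt 2 cannot form. It still requires knowing the full lock set at entry, which is the challenge stated above.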
10. Protect when Entering Atomic Block
- Find corresponding expressions
  - Acquire a lock for each shared location accessed within the atomic section, expressed in terms of expressions valid at the entry of the atomic block
  atomic {               // lock needed here: acquire( &(y5->head) )
    list* x = y5;        // lock needed here: acquire( &(x->head) )
    list* d = x;         // lock needed here: acquire( &(d->head) )
    d->head = NULL;
  }
Contribution 1: Identifying appropriate fine-grain locks at entry (via inter-procedural backward data-flow analysis)
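The backward step on this slide can be illustrated for the simplest possible case: straight-line code made only of pointer copies. The sketch below is a toy under that assumption, not the inter-procedural analysis itself, and `copy_stmt` and `base_at_entry` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* A copy statement `lhs = rhs` between pointer variables. */
typedef struct { const char *lhs; const char *rhs; } copy_stmt;

/* Walk the statements that precede an access to `base->field`, from last to
   first. Whenever the current base variable is assigned, substitute the
   right-hand side, so the result names the same location using only
   variables that are valid at the entry of the block. */
const char *base_at_entry(const copy_stmt *stmts, size_t n, const char *base) {
    for (size_t i = n; i-- > 0; ) {
        if (strcmp(stmts[i].lhs, base) == 0)
            base = stmts[i].rhs;
    }
    return base;
}
```

For the block on the slide, the statements are `x = y5` and `d = x`, and the access is `d->head`. Calling `base_at_entry` with base `"d"` yields `"y5"`, matching the lock `&(y5->head)` at the entry.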
11. Attempt 3: Fine-Grain at Entry
  move (list* to, list* from) {
    acquireAll( );
    elem* x = to->head;
    elem* y = from->head;
    from->head = null;
    while (x->next != null)
      x = x->next;
    x->next = y;
    releaseAll();
  }
- Locks needed at entry: &(to->head), &(from->head), &(to->head->next)
- Problem with Attempt 3: can't protect an unbounded number of locations
12. Attempt 4: Multi-Grain Locks at Entry
  move (list* to, list* from) {
    acquireAll( );
    elem* x = to->head;
    elem* y = from->head;
    from->head = null;
    while (x->next != null)
      x = x->next;
    x->next = y;
    releaseAll();
  }
Challenge 2: Mixing locks of multiple granularities while avoiding deadlocks
13. Defining Multi-Grain Locks
- A fine-grain lock protects a single location
- A coarse-grain lock protects a set of locations
- Any traditional heap abstraction can be used to define coarse-grain locks
  - E.g., types, points-to sets, shape abstractions
- Our compiler framework is parameterized
  - Clients can specify the kind of locks they want to use
14. Mixing Locks of Multiple Granularities
- Borrow the database locking protocol based on intention locks [Gray 76]
[Diagram: hierarchy of global lock, coarse-grain locks, and fine-grain locks over the memory locations]
Contribution 2: We allow mixing locks of multiple granularities and avoid deadlocks
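The intention-lock idea from databases can be summarized by the standard mode-compatibility matrix for hierarchical locking. The sketch below encodes that textbook matrix. Which modes the authors' runtime actually implements is not stated on the slide, and `lock_mode`, `compatible`, and `intention_for` are hypothetical names.

```c
#include <stdbool.h>

/* Modes of the classic multi-granularity protocol:
   intention-shared, intention-exclusive, shared,
   shared + intention-exclusive, exclusive. */
typedef enum { MODE_IS, MODE_IX, MODE_S, MODE_SIX, MODE_X } lock_mode;

/* COMPAT[held][requested] is true when both modes may be held on the
   same lock by different threads at the same time. */
static const bool COMPAT[5][5] = {
    /*            IS     IX     S      SIX    X     */
    /* IS  */ { true,  true,  true,  true,  false },
    /* IX  */ { true,  true,  false, false, false },
    /* S   */ { true,  false, true,  false, false },
    /* SIX */ { true,  false, false, false, false },
    /* X   */ { false, false, false, false, false },
};

bool compatible(lock_mode held, lock_mode requested) {
    return COMPAT[held][requested];
}

/* Mode to take on every ancestor before locking a descendant in mode m:
   readers announce themselves with IS, writers with IX. */
lock_mode intention_for(lock_mode m) {
    return (m == MODE_IS || m == MODE_S) ? MODE_IS : MODE_IX;
}
```

Under this scheme, a thread that wants a fine-grain lock exclusively first takes IX on the enclosing coarse-grain lock and on the global lock. Two such threads stay compatible at the coarse levels, while a thread holding the coarse lock in S or X excludes them.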
15. Soundness Results
- Sound locking structure provided
  - Whatever is protected by a child is also protected by its parent
  - Map of expressions to locks
  - Bounded (for termination)
- Soundness Theorem
  - Compiler chooses a set of locks protecting all memory accesses within the atomic block
[Diagram: a lock (->next) above the expression locks &(to->head->next) and &(to->head->next->next)]
Contribution 3: Framework is sound (for any sound lock structure instantiation)
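The "bounded" requirement can be pictured as a size limit on lock expressions: short expressions keep their own fine-grain lock, and longer ones fall back to a coarser lock that covers them. The sketch below assumes the limit is a count of `->` dereferences; the slides only say the structure is bounded, so the bound, `choose_lock`, and `deref_count` are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

typedef enum { FINE_GRAIN, COARSE_GRAIN } lock_kind;

/* Count the "->" dereferences in an expression such as "to->head->next". */
static int deref_count(const char *expr) {
    int n = 0;
    for (const char *p = strstr(expr, "->"); p != NULL; p = strstr(p + 2, "->"))
        n++;
    return n;
}

/* Keep a fine-grain lock for expressions within the bound; anything longer is
   covered by a coarse-grain lock, so the set of lock expressions stays finite. */
lock_kind choose_lock(const char *expr, int max_derefs) {
    return deref_count(expr) <= max_derefs ? FINE_GRAIN : COARSE_GRAIN;
}
```

With a bound of 2, `to->head` and `to->head->next` keep fine-grain locks, while `to->head->next->next` is handed to a coarse-grain lock. Soundness then rests on the parent lock protecting everything its children protect.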
16. Experimental Evaluation
- Lock structure instance: 3-level locks + effects
- Experiments
  - Concurrent data structures: rb-tree, hashtable
  - Concurrent get (read-only), put, and remove operations
  - 1.86 GHz Intel Xeon dual quad-core machine
[Diagram: three lock levels: global lock; points-to set locks [Steensgaard 96]; expression locks (limited in size). Locks carry ro or rw effects.]
17. Scalability Results
[Plot: execution time (sec) vs. number of threads (1 to 8)]
18. TH (rb-tree hash w/rehash), 80% gets
- Global lock (exclusive) doesn't scale
- Scalability comparable to STM
- Compiler didn't use fine-grain locks
[Plot: execution time (sec) vs. number of threads (1 to 8)]
19. TH (rb-tree hash w/rehash), 80% puts
- High contention from re-hashing degrades STM performance
- 2 coarse-grain (exclusive) locks are better than a single global lock
[Plot: execution time (sec) vs. number of threads (1 to 8)]
20. simple-hashtable, 80% gets
- Compiler didn't use fine-grain locks for gets
- STM allows put and get concurrently
[Plot: execution time (sec) vs. number of threads (1 to 8)]
21. simple-hashtable, 80% puts
- Compiler uses fine-grain locks for puts
[Plot: execution time (sec) vs. number of threads (1 to 8)]
22. Differences with Recent Work
- No programmer annotations (other than atomic)
  - Autolocker [McCloskey et al. POPL'06] requires programmer annotations to choose appropriate granularity
- Moving fine-grain lock acquisitions to entry of atomic
  - Acquiring fine-grain locks right before first use [Hindman, Grossman MSPC'06] is not fully pessimistic: may generate deadlocks and need rollbacks
- Multi-grain locks without deadlocks
  - Several pessimistic approaches use coarse-grained locks only [Hicks et al. 06] [Halpert et al. 07] [Emmi et al. 07]
23. Conclusions and Future Work
- Lock inference framework for atomic sections
  - Multi-grain locks to reduce contention and avoid deadlocks
  - Soundness: accesses are protected, atomicity preserved
  - Validation: resulting performance depends on application
    - Locks preferable for non-reversible ops. or high contention
- Future directions
  - Better locking hierarchy instantiations (e.g. ownership)
  - Optimizations (e.g. delay lock acquisitions)
  - Hybrid systems (e.g. compiler support to optimize STMs)
24. ?