Title: A Qualitative Survey of Modern Software Transactional Memory Systems
1. A Qualitative Survey of Modern Software Transactional Memory Systems
- Virendra J. Marathe
- Michael L. Scott
2. Concepts and Background: Non-blocking Synchronization Algorithms
- Wait-freedom: all processes contending for a set of objects make progress in a finite number of steps. This rules out deadlock and starvation.
- Lock-freedom: at least one process makes progress. This rules out deadlock but not starvation.
- Obstruction-freedom: guarantees progress of a process in the absence of contention. This rules out deadlock, but livelock is possible.
3. Concepts and Background: Non-blocking Synchronization Algorithms
- Blocking vs. non-blocking
- The wait state disappears in non-blocking algorithms
- No deadlock, priority inversion, or convoying in non-blocking algorithms
- Livelock can be addressed via contention management
- Tradeoffs between the "-freedom" properties: flexibility, simplicity, and performance vs. desirability (strongest property)
4. Transactions and STM
- Transaction (Tx): a sequence of instructions that atomically modifies a set of concurrent objects
- A transaction satisfies the linearizability and atomicity properties (recall ACID: atomicity, consistency, isolation, durability)
- Software Transactional Memory (STM): a generic non-blocking synchronization construct
5. Original STM
- A transaction updates a concurrent object only after declaring its intention system-wide (the transaction becomes the owner of the object)
- Atomic acquisition/release of ownership: CAS, LL/SC
- At most one transaction at a time can own an object: ownership records
- An ownership record is null or points to its owner's transaction record
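The ownership rule can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `AtomicRef` emulates a hardware CAS with a lock, and the names `OwnershipRecord`, `TxRecord`, `acquire`, and `release` are hypothetical, not taken from the original STM.

```python
import threading

class AtomicRef:
    """Emulated CAS word; the lock stands in for the hardware primitive."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class TxRecord:
    """Hypothetical transaction record."""
    def __init__(self, name):
        self.name = name

class OwnershipRecord:
    """Null (None) when unowned, otherwise points to the owner's record."""
    def __init__(self):
        self.owner = AtomicRef(None)

    def acquire(self, tx):
        # Succeeds only if nobody owns the object right now.
        return self.owner.compare_and_set(None, tx)

    def release(self, tx):
        # Only the current owner can release.
        return self.owner.compare_and_set(tx, None)

orec = OwnershipRecord()
tx1, tx2 = TxRecord("tx1"), TxRecord("tx2")
first = orec.acquire(tx1)    # tx1 becomes the owner
second = orec.acquire(tx2)   # fails: at most one owner at a time
orec.release(tx1)
third = orec.acquire(tx2)    # succeeds after the release
```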
6. Original STM: Shared Data Structures
7. Original STM
- A Tx commits only if it acquires all desired ownerships
- Otherwise, it aborts and releases all its ownerships
- On success: change state to COMMITTED, make updates, and release ownerships (mCAS update)
- Avoiding livelock: non-recursive helping mechanism based on a total global ordering
- Limitations: doubled memory space requirements, and pre-knowledge of all objects accessed is required to ensure ordering
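The commit rule above can be sketched as follows, under simplifying assumptions: the write set is known up front, addresses are acquired in ascending order as the total global ordering, and helping is left out entirely. `AtomicRef`, `Tx`, and `try_commit` are hypothetical names, and the lock-based CAS only emulates the hardware primitive.

```python
import threading

class AtomicRef:
    """Emulated CAS word; the lock stands in for the hardware primitive."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class Tx:
    def __init__(self, writes):
        self.writes = writes       # {address: new value}, known in advance
        self.status = "ACTIVE"

def try_commit(tx, memory, orecs):
    acquired = []
    for addr in sorted(tx.writes):             # total global order on addresses
        if orecs[addr].compare_and_set(None, tx):
            acquired.append(addr)
        else:                                  # ownership unavailable: abort
            tx.status = "ABORTED"
            for a in acquired:
                orecs[a].compare_and_set(tx, None)
            return False
    tx.status = "COMMITTED"
    for addr, value in tx.writes.items():      # make the updates
        memory[addr] = value
    for addr in acquired:                      # release all ownerships
        orecs[addr].compare_and_set(tx, None)
    return True

memory = [0, 0, 0, 0]
orecs = [AtomicRef(None) for _ in memory]
blocker = Tx({2: 99})
orecs[2].compare_and_set(None, blocker)        # some other Tx already owns word 2
t1 = Tx({0: 10, 1: 11})
t2 = Tx({1: 20, 2: 21})
ok1 = try_commit(t1, memory, orecs)            # acquires words 0 and 1, commits
ok2 = try_commit(t2, memory, orecs)            # cannot get word 2, aborts
```

Note how `t2` releases word 1 again when it fails on word 2, so an aborted Tx leaves no ownerships behind.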
8. Hashtable-Based STM
- Hashtable used to store ownership records (orecs)
- Operations: STMStart, STMRead, STMWrite, STMAbort, STMCommit, STMValidate, STMWait
- 3 main data structures:
  - Application heap
  - Hashtable of orecs
  - Transaction descriptors
9. Hash STM: STM Heap Showing an Active Transaction
10. Hash STM
- Acquiring orecs only takes place during STMCommit (commit is a multi-word CAS)
- Commit:
  - Acquire all desired ownerships
  - Set status to COMMITTED
  - Release phase: write back new value/version number to memory/orec
- Conflicts: occur when a Tx finds another Tx's descriptor in one of the orecs it reads (STMRead/STMWrite) or acquires (STMCommit)
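A single-threaded sketch of the three structures and the commit steps may help. It is an illustration under stated assumptions, not the actual Hash STM code: `addr % NUM_ORECS` stands in for the hash function, plain assignments stand in for CAS, conflict resolution (aborting, helping, stealing) is omitted, and the lowercase function names are hypothetical counterparts of STMRead, STMWrite, and STMCommit.

```python
NUM_ORECS = 4

class Descriptor:
    """Transaction descriptor."""
    def __init__(self):
        self.status = "ACTIVE"
        self.reads = {}     # orec index -> version number seen
        self.writes = {}    # address -> new value

heap = {addr: 0 for addr in range(8)}   # application heap
orecs = [0] * NUM_ORECS                 # version number, or owning Descriptor

def orec_of(addr):
    return addr % NUM_ORECS             # stand-in hash function

def stm_read(tx, addr):
    if addr in tx.writes:
        return tx.writes[addr]
    tx.reads.setdefault(orec_of(addr), orecs[orec_of(addr)])
    return heap[addr]

def stm_write(tx, addr, value):
    tx.reads.setdefault(orec_of(addr), orecs[orec_of(addr)])
    tx.writes[addr] = value

def stm_commit(tx):
    acquired = []
    for idx, seen in sorted(tx.reads.items()):
        if orecs[idx] != seen:          # version changed or orec owned: conflict
            for i, v in acquired:
                orecs[i] = v            # give back what we acquired
            tx.status = "ABORTED"
            return False
        orecs[idx] = tx                 # acquire (a CAS in a real system)
        acquired.append((idx, seen))
    tx.status = "COMMITTED"
    for addr, value in tx.writes.items():
        heap[addr] = value              # release phase: write back values
    written = {orec_of(a) for a in tx.writes}
    for idx, seen in acquired:          # release phase: new version numbers
        orecs[idx] = seen + 1 if idx in written else seen
    return True

t1, t2 = Descriptor(), Descriptor()
stm_write(t1, 1, 5)
v = stm_read(t2, 1)            # t2 sees the old value and version
stm_write(t2, 5, v + 1)        # address 5 hashes to the same orec as address 1
ok1 = stm_commit(t1)
ok2 = stm_commit(t2)           # version of the shared orec has changed
```

Because orecs are only acquired inside `stm_commit`, `t2` learns about the conflict at commit time rather than at its read.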
11. Hash STM: Conflict Resolution
- Read conflict: if the conflicting Tx is ACTIVE, we abort it -> hence, an obstruction-free design
- Acquire conflict: if the conflicting Tx is ACTIVE, we abort it; otherwise, we could try to help the other Tx
- But helping causes a lot of contention -> stealing: we copy and merge the conflicting Tx's orecs into our descriptor
12. Hash STM: Conflict Resolution
- Stale updates: during release, a newer value may be replaced with an older one when stealer Tx1 makes its updates before the (older) updates of the victim Tx2
- Solution: redo. The current Tx redoes the updates from the stolen orec iff the stealer is no longer in the ACTIVE state
13. Hash STM: Contention Management
- Tx1 aborts conflicting Tx2: aggressive policy
- But polite contention management is more efficient!
- Do not abort the other Tx immediately: back off (exponentially), and abort the other Tx only after the maximum backoff limit is reached
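The polite policy can be sketched like this. The round limit, base delay, and randomized delay are illustrative assumptions, and `polite_resolve` is a hypothetical name; the slides only state that backoff is exponential and bounded.

```python
import random
import time

class Tx:
    def __init__(self):
        self.status = "ACTIVE"

def polite_resolve(other, max_rounds=5, base_delay=1e-6, sleep=time.sleep):
    """Back off while `other` is ACTIVE; abort it only after the limit.

    Returns the number of backoff rounds that were spent.
    """
    for round_no in range(max_rounds):
        if other.status != "ACTIVE":
            return round_no                    # conflict cleared on its own
        # Randomized delay whose upper bound doubles every round.
        sleep(random.uniform(0, base_delay * (2 ** round_no)))
    if other.status == "ACTIVE":
        other.status = "ABORTED"               # a CAS on the status word in a real system
    return max_rounds

stuck = Tx()
delays = []
rounds_stuck = polite_resolve(stuck, sleep=delays.append)   # record delays instead of sleeping

finished = Tx()
finished.status = "COMMITTED"
rounds_finished = polite_resolve(finished)
```

An aggressive policy corresponds to `max_rounds=0`: the other Tx is aborted immediately.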
14. Hash STM: Memory Blow-up
- During stealing, a Tx merges all the orecs from the other Tx (including orecs that it doesn't need)
- Scalability issue for the merging step!
- Moreover, this false sharing leads to merging long chains in a transaction descriptor
- This may become unacceptable under moderate/high contention
- More side effects:
  - Release phase becomes longer
  - Long chains may thrash the cache
15. Hash STM: LL/SC Approach
- Replace merge-redo (which requires mCAS, usually unavailable) with helping
- Instead of merging, the stealer writes the updates to memory from the conflicting Tx's descriptor
- Writing takes place as follows:
  - LL on the target memory location
  - Double-check the orec (was it stolen in the meantime?)
  - Do an SC to the memory location
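The three write steps can be sketched with an emulated LL/SC cell. This is only an approximation for illustration: real LL/SC is a hardware primitive, and here a stamp counter imitates the reservation. `LLSCCell`, `write_back`, and the dictionary-based orec are hypothetical.

```python
class LLSCCell:
    """Emulated memory word: SC fails if the word was written since the LL."""
    def __init__(self, value):
        self.value = value
        self._stamp = 0

    def load_linked(self):
        return self.value, self._stamp

    def store_conditional(self, stamp, new_value):
        if self._stamp != stamp:       # someone wrote in between: SC fails
            return False
        self.value = new_value
        self._stamp += 1
        return True

def write_back(cell, orec, me, new_value):
    _, stamp = cell.load_linked()                      # 1. LL on the target location
    if orec["owner"] is not me:                        # 2. double-check the orec
        return False                                   #    (stolen in the meantime)
    return cell.store_conditional(stamp, new_value)    # 3. SC to the location

stealer, rival = object(), object()
cell = LLSCCell(0)
orec = {"owner": stealer}

wrote = write_back(cell, orec, stealer, 7)     # orec still ours: the SC goes through
orec["owner"] = rival                          # orec gets stolen
wrote_late = write_back(cell, orec, stealer, 8)

_, stamp = cell.load_linked()
sc_first = cell.store_conditional(stamp, 9)    # first SC on this reservation succeeds
sc_second = cell.store_conditional(stamp, 10)  # stale reservation: fails
```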
16. Hash STM: LL/SC Approach
- Benefits:
  - Reduced and simplified data structures (reference counts are no longer needed)
  - Greatly reduced complexity in the stealing process
  - Significantly diminished space overhead of the hashtable
  - Reduced cache thrashing
  - Eliminates the memory blow-up problem
17. Object-based STMs
- Object-level synchronization
- Better than word-based STMs, especially for dynamic data structures
- Word-based STMs are better for higher levels of granularity
- Conventional approaches use synchronization
- But this is difficult and error-prone for complicated structures like (red-black) trees
18. Dynamic STM (DSTM)
Transactional Memory Object (TM Object) Structure
19. DSTM Design: The Locator
- The most recent valid version of a data object is determined by the state of the most recently modifying Tx
- Locator vs. orec:
  - A Locator is referenced by a TM Object; an orec is found through a hash function
  - A Locator points to the old and new versions of the object; an orec points to a transaction descriptor, which contains them
  - A Locator does not require a version number; it stores a pointer to the most recent valid version of the object
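How the valid version follows from the last writer's state can be shown in a few lines. The field names `tx`, `old`, and `new` and the helper `current_version` are assumptions chosen for this sketch.

```python
class Tx:
    def __init__(self, status="ACTIVE"):
        self.status = status

class Locator:
    """Points to the most recently modifying Tx and to both object versions."""
    def __init__(self, tx, old, new):
        self.tx, self.old, self.new = tx, old, new

def current_version(locator):
    # The new version counts only once its writer has committed;
    # while the writer is ACTIVE or after it ABORTED, the old one is valid.
    return locator.new if locator.tx.status == "COMMITTED" else locator.old

writer = Tx()
loc = Locator(writer, old={"balance": 10}, new={"balance": 25})
while_active = current_version(loc)
writer.status = "COMMITTED"
after_commit = current_version(loc)
```

Changing the single status word switches the valid version of every object the Tx wrote, which is why no per-object version number is needed.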
20. DSTM: Opening a TM Object
Opening of a TM Object recently modified by a
committed transaction
21. DSTM
- Data access is only through TM Objects
- In case of conflict while opening TM Objects, one of the two transactions is aborted (early conflict resolution)
- After updating the new version, a Tx tries to replace the old locator with the new one (CAS)
- Contention management protocol: abort itself or the other Tx; aggressive/polite
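Opening a TM Object for writing might look roughly like the sketch below. It assumes the aggressive policy, emulates CAS with a lock, and uses hypothetical names (`AtomicRef`, `TMObject.start`, `open_for_write`); it is not the DSTM API.

```python
import copy
import threading

class AtomicRef:
    """Emulated CAS word; the lock stands in for the hardware primitive."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class Tx:
    def __init__(self, status="ACTIVE"):
        self.status = status

class Locator:
    def __init__(self, tx, old, new):
        self.tx, self.old, self.new = tx, old, new

class TMObject:
    def __init__(self, data):
        # Initially "written" by an already committed pseudo-transaction.
        self.start = AtomicRef(Locator(Tx("COMMITTED"), None, data))

def open_for_write(tm_obj, tx):
    while True:
        seen = tm_obj.start.get()
        if seen.tx is tx:
            return seen.new                    # already opened by this Tx
        if seen.tx.status == "ACTIVE":
            seen.tx.status = "ABORTED"         # aggressive policy (a CAS in a real system)
        valid = seen.new if seen.tx.status == "COMMITTED" else seen.old
        fresh = Locator(tx, valid, copy.deepcopy(valid))
        if tm_obj.start.compare_and_set(seen, fresh):
            return fresh.new                   # private copy to modify
        # CAS failed: another Tx installed a locator first, so retry.

obj = TMObject([1, 2])
a, b = Tx(), Tx()
open_for_write(obj, a).append(3)      # a acquires the object eagerly
open_for_write(obj, b).append(4)      # b conflicts with a and aborts it
b.status = "COMMITTED"
final = obj.start.get()
```

The conflict is detected at open time, which is what makes the resolution early.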
22. DSTM: Early Release
- Release an open object before committing to reduce contention
- Very helpful for tree-like structures
- Many transactions require only read access
- These would cause unnecessary contention
- So use separate semantics for read-only transactions
23. DSTM: Early Release
- DSTM uses a separate read-list of objects open in read-only mode
- The read-list is not visible/accessible from the TM Object locators
- Other Txs are unaware of this Tx's read-list, so they may change some of the objects
- To avoid inconsistencies, incremental validation is used
- Verify the consistency of all objects in the read-list before opening a TM Object
- Also validate before committing
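A minimal single-threaded sketch of the private read-list, incremental validation, and early release follows. The `version` field stands in for the current valid version reached through the locator, and the function names are hypothetical.

```python
class TMObject:
    def __init__(self, data):
        self.version = data        # stand-in for the current valid version

class Tx:
    def __init__(self):
        self.status = "ACTIVE"
        self.read_list = []        # private (object, version seen) pairs

def validate(tx):
    # Consistent iff every object still has the version this Tx saw.
    return all(obj.version is seen for obj, seen in tx.read_list)

def open_for_read(tx, obj):
    if not validate(tx):           # incremental validation before each open
        tx.status = "ABORTED"
        return None
    seen = obj.version
    tx.read_list.append((obj, seen))
    return seen

def release_early(tx, obj):
    # Drop the object from the read-list; later changes to it no longer matter.
    tx.read_list = [(o, s) for o, s in tx.read_list if o is not obj]

root, leaf = TMObject(["root v1"]), TMObject(["leaf v1"])

t1 = Tx()
open_for_read(t1, root)
root.version = ["root v2"]         # a concurrent writer changes root
r1 = open_for_read(t1, leaf)       # validation fails, t1 aborts

t2 = Tx()
open_for_read(t2, root)
release_early(t2, root)            # e.g. after descending past root in a tree
root.version = ["root v3"]
r2 = open_for_read(t2, leaf)       # unaffected by the change to root
commit_ok = validate(t2)           # validate once more before committing
```

Validation cost grows with the length of the read-list, so early release also shortens each validation pass.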
24. FSTM
The basic Transactional Memory Structure in FSTM
25. FSTM
- Concurrent objects are wrapped in object headers
- Transactions access objects by opening object headers
- Transaction descriptors maintain lists of in-use concurrent objects
- The read-only list and read-write list contain object handles
- An object handle contains a shadow copy (local to each Tx), upon which all updates are made
26. FSTM: Accessing Objects
- Tx states: UNDECIDED, ABORTED, COMMITTED, READ-CHECKING
- Open an object using its object header
- Create an object handle in the Tx's descriptor and place it in the appropriate list (read-only/read-write)
- A Tx becomes visible to others only during commit
- So conflicts appear only with other Txs that are trying to commit
27. FSTM: Commit Operation
- Is a multi-word CAS, with 3 phases:
  - Acquire phase: Tx gains exclusive ownership of opened objects
  - Decision point: Tx commits or aborts
  - Release phase: Tx releases ownership of all acquired objects
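The three phases can be sketched as below, under simplifying assumptions: only the read-write list is handled, a failed acquire simply aborts (no helping), and CAS is emulated with a lock. `Header`, `Descriptor`, `open_rw`, and `commit` are hypothetical names, not the FSTM interface.

```python
import threading

class AtomicRef:
    """Emulated CAS word; the lock stands in for the hardware primitive."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class Header:
    def __init__(self, order, data):
        self.order = order             # position in the total global order
        self.ref = AtomicRef(data)     # current data, or the owning Descriptor

class Descriptor:
    def __init__(self):
        self.status = "UNDECIDED"
        self.read_write = []           # handles: (header, old data, shadow copy)

def open_rw(tx, header):
    old = header.ref.get()             # assumes the header is not owned right now
    shadow = list(old)                 # Tx-local shadow copy
    tx.read_write.append((header, old, shadow))
    return shadow

def commit(tx):
    acquired = []
    # Acquire phase: install the descriptor in each header, in global order.
    for header, old, shadow in sorted(tx.read_write, key=lambda h: h[0].order):
        if not header.ref.compare_and_set(old, tx):
            break
        acquired.append((header, old, shadow))
    # Decision point.
    tx.status = "COMMITTED" if len(acquired) == len(tx.read_write) else "ABORTED"
    # Release phase: publish shadow copies on commit, restore old data on abort.
    for header, old, shadow in acquired:
        header.ref.compare_and_set(tx, shadow if tx.status == "COMMITTED" else old)
    return tx.status == "COMMITTED"

x, y = Header(0, [1]), Header(1, [2])
t1, t2 = Descriptor(), Descriptor()
open_rw(t1, x).append(10)
open_rw(t1, y).append(20)
open_rw(t2, y).append(30)
ok2 = commit(t2)                       # t2 commits first and updates y
ok1 = commit(t1)                       # t1 finds y changed and aborts
```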
28. FSTM: Conflict Resolution
- Conflict resolution: a total global ordering is used to acquire concurrent objects
- If we conflict with a committed Tx, we abort
- If we conflict with an uncommitted Tx, we help it (recursive helping)
- Cyclical recursive helping is prevented by the total global ordering
- But we may still have livelocks!
29. FSTM: The Read Phase
- To avoid contention, we do not acquire and release objects opened in read-only mode
- Instead, we simply check the read-only list for consistency upon committing (have they changed?): this is the read phase
- Upon conflict with an UNDECIDED/ABORTED Tx, we check its read-write list for consistency
- This may lead to non-serializability, which is avoided by the additional READ-CHECKING state
30. FSTM: The Read Phase
- Tx1 is in READ-CHECKING:
  - Tx2 UNDECIDED: in case of inconsistency, Tx1 aborts (doesn't help Tx2)
  - Tx2 READ-CHECKING: Tx1 helps or aborts Tx2, based on the global ordering of transactions
  - Lower global numbers abort higher numbers
  - Higher global numbers help their predecessors
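The rules above can be written as a small decision function. It only mirrors the slide: the action labels, the numeric ids standing in for the global ordering, and the "proceed" outcome for the consistent case are assumptions of this sketch.

```python
def read_phase_action(my_id, other_id, other_status, consistent):
    """Decide what a Tx in READ-CHECKING does on meeting another Tx.

    Lower ids come earlier in the (assumed) global ordering.
    """
    if other_status == "UNDECIDED":
        # No helping here: abort ourselves only if the read is inconsistent.
        return "proceed" if consistent else "abort_self"
    if other_status == "READ-CHECKING":
        # Lower numbers abort higher ones; higher numbers help predecessors.
        return "abort_other" if my_id < other_id else "help_other"
    raise ValueError("status not covered by this sketch: " + other_status)

case_undecided = read_phase_action(1, 2, "UNDECIDED", consistent=False)
case_lower = read_phase_action(1, 2, "READ-CHECKING", consistent=True)
case_higher = read_phase_action(2, 1, "READ-CHECKING", consistent=True)
```

Because the decision depends only on the global ordering, two Txs in READ-CHECKING can never end up helping each other in a cycle.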
31. Qualitative Comparison 1: Object Acquire Semantics
- DSTM: a Tx acquires an object using a CAS, hence the Tx becomes visible early (eager acquire)
- FSTM, HashSTM: acquiring is done at commit time (lazy acquire)
- Eager semantics -> early conflict detection -> early conflict resolution (good)
- Lazy acquire -> long transactions (bad)
- But eager acquire may lead to unnecessary aborts
- Example: B aborts A, then C aborts B; but C had no conflicts with A, so A was aborted unnecessarily
32. Qualitative Comparison 1: Object Acquire Semantics
- HashSTM can be modified to use eager acquire semantics
- DSTM can be modified to use lazy acquire
- FSTM cannot be changed to use eager acquire, though (because lock-freedom is guaranteed by the global ordering)
- To preserve that, we'd need pre-knowledge about all objects a Tx will access
- Evidence that obstruction-freedom is flexible, whereas lock-freedom is not
33. Qualitative Comparison 2: Indirection Overhead
- To update N objects (with no contention):
  - DSTM requires N+1 CAS ops
  - FSTM and HashSTM require 2N+1 CAS ops
- Cause: an additional level of indirection in DSTM
- This results in slower transactions in DSTM, but also in cheaper commit operations
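The totals can be reproduced with a small tally. The breakdown is an assumption of this sketch (one CAS per acquired object, one for the status change, and one per object in a separate release phase for FSTM and HashSTM); the slide itself only gives the totals.

```python
def cas_ops(n_objects, system):
    """Uncontended CAS count for updating n_objects (assumed breakdown)."""
    acquire = n_objects                  # one CAS per object acquired
    status_change = 1                    # one CAS to commit
    # DSTM needs no release phase: the locator already points to the new version.
    release = 0 if system == "DSTM" else n_objects
    return acquire + status_change + release

dstm = cas_ops(4, "DSTM")
fstm = cas_ops(4, "FSTM")
hash_stm = cas_ops(4, "HashSTM")
```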
34. Qualitative Comparison 3: Space Usage
- DSTM requires more than twice the space of FSTM for an object
- This may be bad for very large concurrent objects, because much space would be used up by invalid copies
35. Qualitative Comparison 4: Search Overhead
- FSTM and HashSTM maintain lists of acquired objects that have to be searched at commit time to find the object on which there is a conflict
- This overhead does not exist in DSTM
- HashSTM could be improved by adding an extra pointer in the lists
36. Qualitative Comparison 5: Contention Management
- FSTM uses recursive helping to ensure lock-freedom
- However, obstruction-freedom allows for greater simplicity and flexibility
- Helping also produces high contention for cache blocks among processors
- An empirical comparison between contention management and helping is still needed, though
37. Qualitative Comparison 6: Transaction Validation
- DSTM: incremental validation performs validation for each STM operation
- Great for safety/consistency and for making the programmer's life easier, but has some overhead
- FSTM provides a separate function for Tx validation to the programmer
- Also proposes a scheme to reduce incremental validation cost
38. Conclusions
- Presented/discussed three modern STMs
- Need experiments to quantitatively evaluate the tradeoffs presented in our qualitative comparison
- Need to study which data structures are best suited to which STM system
- Need to compare performance vs. high-performance locking-based algorithms