Title: Hardware Transactional Memory
1. Hardware Transactional Memory
Instructor: Yehuda Afek, Tel Aviv University
10/06/2007
2. References
- Thread-Level Transactional Memory
  - Kevin E. Moore, Mark D. Hill, and David A. Wood, 2005
- LogTM: Log-based Transactional Memory
  - Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, and David A. Wood, 2006
3. Outline
- Locks Vs. Transactional Memory
- Introduction to LogTM
- LogTM Version Management
- LogTM Conflict Detection
- Conclusions
4. The Challenge of Multithreaded SW
- Goal: parallelization
- Problem: unrestricted concurrency -> bugs
- Solution: synchronization
- New problem: synchronization
  - Tension between performance and correctness
5. Current Mechanism: Locks
- Locks: objects only one thread can hold at a time
  - Organization: lock for each shared structure
  - Usage: (block) -> acquire -> access -> release
- Correctness issues
  - Under-locking -> data races
  - Acquires in different orders -> deadlock
- Performance issues
  - Conservative serialization
  - Overhead of acquiring
  - Difficult to find right granularity
6. Transactions vs. Locks
- Lock issues
  - Under-locking
  - Acquires in different orders
  - Blocking
  - Conservative serialization
- How transactions help
  - Simpler interface
  - No ordering
  - Can cancel transactions
  - Serialization only on conflicts
Locks -> simplicity/performance tension
Transactions -> (potentially) simple and efficient
7. Transaction Semantics: ACI Properties
- Atomicity: all or nothing
- Consistency: correct at beginning and end
- Isolation: partially done work not visible to other threads
8. Thread-Level Transactional Memory
- Separate semantics from implementation
- Adapt DBMS (database management system) concepts
  - Concurrency control algorithms
  - Conflict detection
  - Taking the appropriate action (commit/abort/delay)
Main challenge: reduce the overhead of enforcing the ACI properties!
9. Basic Idea
- Model TM like virtual memory
  - A thread-level abstraction
  - Use 3 types of interfaces: User, System/Library, Low-level
  - An interface independent of implementation
  - Combine HW and SW in implementation
10. How Do Transactional Memory Systems Differ?
- (Data) Version Management
  - Keep old values for abort AND new values for commit
  - Eager: record old values elsewhere; update in place -> fast commit
  - Lazy: update elsewhere; keep old values in place
- (Data) Conflict Detection
  - Find read-write, write-read or write-write conflicts among concurrent transactions
  - Eager: detect conflict on every read/write -> less wasted work
  - Lazy: detect conflict at end (commit/abort)
12. Log-Based Transactional Memory: LogTM
- (Hardware) Transactional Memory is promising
  - Most use lazy version management
    - Old values in place
    - New values elsewhere
    - Commits slower than aborts
- New: LogTM, Log-based Transactional Memory
  - Uses eager version management (like most databases)
    - Old values to log in thread-private virtual memory
    - New values in place
    - Makes common commits fast!
  - Hardware traps to software handler
    - Aborts handled in software
14. LogTM's Eager Version Management
- Old values stored in the transaction log
  - A per-thread linear (virtual) address space (like the stack)
  - Filled by hardware (during transactions)
  - Read by software (on abort)
- New values stored in place
15. Transaction Log Example
- Initial state
  - LogBase = LogPointer
  - TM count > 0

VA   Data block         R  W
00   12--------------   0  0
40   --------------23   0  0
C0   34--------------   0  0

Log region (addresses 1000, 1040, 1080): empty
Log Base = 1000, Log Ptr = 1000, TM count = 1
16. Transaction Log Example
- Store r2, (c0)  /* r2 = 56 */
  - Set W bit for block (c0)
  - Store address (c0) and old data on the log
  - Increment Log Ptr to 1048
  - Update memory

VA   Data block         R  W
00   12--------------   0  0
40   --------------23   0  0
C0   56--------------   0  1

Log region: the entry at 1000 holds address c0 and old data 34------------
Log Base = 1000, Log Ptr = 1048, TM count = 1
17. Transaction Log Example
- Commit transaction
  - Clear R and W bits for all blocks
  - Reset Log Ptr to Log Base (1000)
  - Clear TM count

VA   Data block         R  W
00   12--------------   0  0
40   --------------23   0  0
C0   56--------------   0  0

Log region: the stale entry (c0, 34------------) is left in place and will be overwritten
Log Base = 1000, Log Ptr = 1000 (was 1048), TM count = 0 (was 1)
18. Transaction Log Example
- Abort transaction
  - Replay log entries to undo the transaction
  - Reset Log Ptr to Log Base (1000)
  - Clear R and W bits for all blocks
  - Clear TM count

VA   Data block         R  W
00   12--------------   0  0
40   --------------23   0  0
C0   34--------------   0  0   (restored from the log; was 56--------------)

Log Base = 1000, Log Ptr = 1000 (was 1048), TM count = 0 (was 1)
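The four example slides above can be summarized in a small software model. This is only a sketch of the idea: in LogTM the log is filled by hardware and read by a software abort handler, whereas here everything is plain Python, and the class name UndoLogTM and its methods are illustrative, not part of LogTM.

```python
class UndoLogTM:
    """Toy model of eager version management (not the hardware design itself)."""

    def __init__(self, memory):
        self.memory = memory      # block address -> value, always updated in place
        self.log = []             # undo records: (address, old value)
        self.w_bits = set()       # addresses whose W bit is set
        self.tm_count = 0         # nonzero while a transaction is running

    def begin(self):
        self.tm_count = 1

    def store(self, addr, value):
        if self.tm_count > 0 and addr not in self.w_bits:
            self.w_bits.add(addr)                        # set the W bit
            self.log.append((addr, self.memory[addr]))   # log address and old data
        self.memory[addr] = value                        # update memory in place

    def commit(self):
        self._reset()             # no copying: new values are already in place

    def abort(self):
        for addr, old in reversed(self.log):             # undo newest record first
            self.memory[addr] = old
        self._reset()

    def _reset(self):
        self.log.clear()          # Log Ptr back to Log Base
        self.w_bits.clear()       # clear W bits
        self.tm_count = 0


memory = {0x00: 12, 0x40: 23, 0xC0: 34}
tm = UndoLogTM(memory)

tm.begin()
tm.store(0xC0, 56)
tm.abort()
value_after_abort = memory[0xC0]      # old value restored from the log

tm.begin()
tm.store(0xC0, 56)
tm.commit()
value_after_commit = memory[0xC0]     # new value was already in place
```

Commit only resets bookkeeping, while abort has to walk the log. That asymmetry is the trade-off eager version management makes.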
19. Eager Version Management Discussion
- Advantages
- Fast Commits
- No copying
- Common case
- Disadvantages
- Slow/Complex Aborts
- Undo aborting transaction
- Relies on Eager Conflict Detection/Prevention
21. LogTM's Eager Conflict Detection
- Requesting processor sends a coherence request to the directory.
- The directory responds and possibly forwards the request to one or more processors.
- Each responding processor examines some local state to detect a conflict.
- The responding processors each ack or nack the request.
- The requesting processor resolves any conflict.
22. Conflict Detection
- Validation is retained by using the R/W bits and the directory MOESI states.
- A sticky state is used to detect possible conflicts from overflows.
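As a rough illustration of the "examines some local state" step, the check a responding processor makes on a forwarded request can be sketched as follows. The function name responder_reply and the string encoding of requests are assumptions made for this sketch. The rule itself follows the read-write, write-read and write-write conflict cases listed earlier.

```python
def responder_reply(request, r_bit, w_bit):
    """Decide how a processor answers a forwarded coherence request for one block.

    request is "GETS" (remote read) or "GETX" (remote write); r_bit and w_bit are
    the block's transactional read/write bits at the responding processor.
    """
    if request == "GETS":
        conflict = w_bit                 # remote read vs. local transactional write
    elif request == "GETX":
        conflict = r_bit or w_bit        # remote write vs. local read or write
    else:
        raise ValueError(f"unknown request: {request}")
    return "NACK" if conflict else "ACK"


reply_read_of_written = responder_reply("GETS", r_bit=False, w_bit=True)
reply_read_of_read = responder_reply("GETS", r_bit=True, w_bit=False)
reply_write_of_read = responder_reply("GETX", r_bit=True, w_bit=False)
```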
23. Conflict Detection (example)
- P0 store
  - P0 sends get exclusive (GETX) request
  - Directory responds with data (old)
  - P0 executes store
Figure: the directory entry goes from I to M@P0 while memory still holds the old value. P0 is in TM mode with Overflow = 0; its cache line goes from I (--) to M (--) with the old data, then to M (-W) with the new data. P1's line stays I (--).
24. Conflict Detection (example)
- In-cache transaction conflict
  - P1 sends get shared (GETS) request
  - Directory forwards to P0
  - P0 detects conflict and sends NACK
Figure: the directory entry is M@P0 (memory holds the old value). P1 sends GETS, the directory sends Fwd_GETS to P0, and P0, holding the line in M (-W) with the new data, answers with NACK. P1's line stays I (--).
25. Conflict Detection (example)
- Cache overflow
  - P0 sends put exclusive (PUTX) request
  - Directory acknowledges
  - P0 sets overflow bit
  - P0 writes data back to memory
Figure: P0 sends PUTX, the directory replies ACK, and P0 sends DATA. The directory entry goes from M@P0 (old value in memory) to Msticky@P0 (new value in memory). P0's Overflow bit goes from 0 to 1 and its line goes from M (-W) to I (--). P1's line stays I (--).
26. Conflict Detection (example)
- Out-of-cache conflict
  - P1 sends GETS request
  - Directory forwards to P0
  - P0 detects a (possible) conflict
  - P0 sends NACK
Figure: the directory entry is Msticky@P0 (new value in memory). P1 sends GETS and the directory sends Fwd_GETS to P0. P0 no longer caches the line, I (--), but its Overflow bit is 1, so it signals a conflict and answers with NACK.
27. Conflict Detection (example)
- Commit
  - P0 clears TM mode and Overflow bits
Figure: P0's TM mode and Overflow bits both go from 1 to 0. The directory entry remains Msticky@P0 (new value in memory), and both processors hold the line in I (--).
28. Conflict Detection (example)
- Lazy cleanup
  - P1 sends GETS request
  - Directory forwards request to P0
  - P0 detects no conflict, sends CLEAN
  - Directory sends Data to P1
Figure: P1 sends GETS, the directory sends Fwd_GETS to P0, P0 (TM mode 0, Overflow 0) answers CLEAN, and the directory sends DATA to P1. The directory entry goes from Msticky@P0 to S(P1), and P1's line goes from I (--) to S (--) with the new data.
29. LogTM's Conflict Detection w/ Cache Overflow
- At overflow at processor P
  - Set P's overflow bit (1 bit per processor)
  - Allow writeback, but set directory state to Sticky@P
- At transaction end (commit or abort) at processor P
  - Reset P's overflow bit
- At (potential) conflicting request by processor R
  - Directory forwards R's request to P.
  - P tells R "no conflict" if overflow is reset
  - But asserts conflict if set (w/ small chance of false positive)
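The overflow rules above can be exercised with a toy directory model. This is a heavily simplified sketch, not the MOESI protocol. The names Processor, request and the dictionary-based directory are assumptions for illustration, data movement is not modeled, and only the overflow bit, the sticky state and lazy cleanup are shown.

```python
class Processor:
    def __init__(self, name):
        self.name = name
        self.tm_mode = False
        self.overflow = False     # one bit per processor
        self.cache = {}           # block -> (r_bit, w_bit)

    def evict(self, block, directory):
        r_bit, w_bit = self.cache.pop(block)
        if self.tm_mode and (r_bit or w_bit):
            self.overflow = True                   # transactional data left the cache
            directory[block] = ("sticky", self)    # requests keep being forwarded here
        else:
            directory.pop(block, None)

    def end_transaction(self):
        # Commit or abort: clear TM mode, the overflow bit and all R/W bits.
        self.tm_mode = False
        self.overflow = False
        self.cache = {b: (False, False) for b in self.cache}

    def on_forwarded(self, block, is_write):
        if block in self.cache:
            r_bit, w_bit = self.cache[block]
            return "NACK" if (w_bit or (is_write and r_bit)) else "ACK"
        # The block is no longer cached, so its R/W bits are lost: answer conservatively.
        return "NACK" if self.overflow else "CLEAN"


def request(directory, block, requester, is_write):
    _state, owner = directory.get(block, ("idle", None))
    if owner is not None and owner is not requester:
        if owner.on_forwarded(block, is_write) == "NACK":
            return "NACK"                          # requester must retry later
    # Granting the request overwrites a stale sticky entry (lazy cleanup).
    directory[block] = ("M" if is_write else "S", requester)
    return "DATA"


directory = {}
p0, p1 = Processor("P0"), Processor("P1")
p0.tm_mode = True
p0.cache[0xC0] = (False, True)      # P0 wrote the block inside a transaction
directory[0xC0] = ("M", p0)
p0.evict(0xC0, directory)           # cache overflow
during = request(directory, 0xC0, p1, is_write=False)   # P0 still in its transaction
p0.end_transaction()
after = request(directory, 0xC0, p1, is_write=False)    # sticky entry cleaned up
```

Because a single overflow bit stands in for all evicted blocks, a request for a block the transaction never touched can also be nacked. This is the false positive mentioned above.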
30. Conflict Resolution
- Conflict resolution
  - Can wait, risking deadlock
  - Can abort, risking livelock
  - Wait/abort transaction at requesting or responding processor?
- LogTM resolves conflicts at the requesting processor
  - Requesting processor waits (using coherence nacks/retries)
  - But aborts if the other processor is also waiting (deadlock possible) and it is logically younger (using timestamps)
- Future: requesting processor traps to a software contention manager that decides who waits/aborts
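One way to realize the wait-or-abort rule above is a per-transaction flag that records "I am making an older transaction wait". The sketch below assumes that a smaller timestamp means logically older. The names Txn, possible_cycle, on_nack_sent and on_nack_received are illustrative, and the real hardware mechanism may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    timestamp: int                 # smaller value = logically older
    possible_cycle: bool = False   # set once this transaction nacks an older one


def on_nack_sent(me, requester):
    """Called when `me` nacks a request from `requester`, so `requester` now waits on `me`."""
    if requester.timestamp < me.timestamp:
        me.possible_cycle = True


def on_nack_received(me, nacker):
    """Called when `me` is nacked by `nacker`; returns "abort" or "retry"."""
    if me.possible_cycle and nacker.timestamp < me.timestamp:
        return "abort"             # younger, and a wait cycle is possible
    return "retry"                 # otherwise just wait and retry the request


older, younger = Txn(timestamp=1), Txn(timestamp=2)
on_nack_sent(younger, older)                          # older now waits on younger
younger_decision = on_nack_received(younger, older)   # younger also waits on older
older_decision = on_nack_received(older, younger)
```

Only the younger transaction gives up, so the older one eventually makes progress.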
32. Conclusion
- Commits are far more common than aborts
  - Conflicts are rare
  - Most conflicts can be resolved w/o aborts
  - Software aborts do not impact performance
- Overflows are rare (in current benchmarks)
- LogTM
  - Eager version management makes the common case (commit) fast
  - Sticky states/lazy cleanup detect conflicts outside the cache (if overflows are infrequent)
33. QUESTIONS?
34. Break Time!
35. References
- LogTM: Log-based Transactional Memory
  - Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, and David A. Wood, 2006
- Supporting Nested Transactional Memory in LogTM
  - Michelle J. Moravan, Jayaram Bobba, Kevin E. Moore, Luke Yen, Mark D. Hill, Ben Liblit, Michael M. Swift, and David A. Wood, 2006
36. Motivation
- So far, Transactional Memory promises lock-free, atomic, consistent and isolated execution.
- But what should occur when a transaction executes another transaction within it?
37. LogTM enables flattening
- In the last lecture we introduced LogTM, which enables subsuming inner transactions into the top-level transaction.
- A counter is used to count the nesting level: Transaction_begin() increments it and Transaction_end() decrements it.
- A conflict in an inner transaction may cause a complete abort to the beginning of the top-level one.
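Flattening can be sketched with a single nesting counter and a single undo log. The class below is an illustrative software model. The name FlattenedTM and the store and abort methods are assumptions, and only the counter behaviour comes from the slide.

```python
class FlattenedTM:
    """Toy model of flattening: inner transactions are subsumed by the top-level one."""

    def __init__(self, memory):
        self.memory = memory
        self.nesting = 0          # the nesting-level counter
        self.log = []             # one undo log shared by all nesting levels

    def transaction_begin(self):
        self.nesting += 1

    def store(self, var, value):
        self.log.append((var, self.memory[var]))
        self.memory[var] = value

    def transaction_end(self):
        self.nesting -= 1
        if self.nesting == 0:     # only the outermost end really commits
            self.log.clear()

    def abort(self):
        # There are no per-level frames, so any abort rolls back to the top-level begin.
        for var, old in reversed(self.log):
            self.memory[var] = old
        self.log.clear()
        self.nesting = 0


memory = {"x": 1, "y": 2}
tm = FlattenedTM(memory)
tm.transaction_begin()            # level 1
tm.store("x", 10)
tm.transaction_begin()            # level 2, flattened into level 1
tm.store("y", 20)
tm.abort()                        # conflict inside the inner transaction
```

After the abort, the outer transaction's write to x is lost as well. This is the cost that partial rollback in closed nesting tries to avoid.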
38. Challenges in nesting transactions
- Facilitating Software Composition.
- Enhancing Concurrency.
- Escaping to non-transactional systems.
39. Facilitating Software Composition
- Calling modules that use locks internally requires caller knowledge of the module's internal implementation details.
- To aid modular programming, transactional memory should support nesting.
41. Enhancing Concurrency
- Closed nesting does not eliminate all problems posed by modular software.
- Concurrency is limited by maintaining isolation until the top-level transaction commits.
42. Example
Figure: P1 runs transaction L and P2 runs transaction T. Each contains a nested transaction S that accesses pNextFree (held in M@P1), so the two transactions conflict.
- How would you do it differently?
- Ideally, S should release pNextFree so that other transactions can access the allocator without conflicting with transaction L.
44. Escaping to non-transactional systems
- Many TM systems will run on top of non-transactional base systems that may include
  - Runtime libraries
  - Operating systems
  - Language virtual machines (e.g. JVM)
- STMs handle such escapes easily.
- An escape to a non-transactional system must disable HTM mechanisms to allow correct operation.
- Allow inter-transaction / device communication.
45. Outline
- Motivation and challenges.
- Closed vs. Open Nesting.
- Nested LogTM.
- Supporting Closed Nesting.
- Partial aborts.
- Supporting Open Nesting.
- Abort actions / Commit actions.
- Condition O1.
- Escape actions.
- Conclusions.
46. Closed vs. Open nesting
- Closed nested transactions extend the isolation of an inner transaction until the top-level transaction commits.
- Open nested transactions allow a committing inner transaction to release isolation immediately.
47. Closed Nested Transactions
- May flatten transactions into the top-level one (as we've already seen).
- May allow partial roll-back.
48. Open Nested Transactions
- Increase concurrency and expressiveness.
- May increase both SW and HW complexity.
- Higher-level atomicity
  - Child's memory updates not undone if parent aborts
  - Use abort action to undo the child's forward action at a higher level of abstraction
  - E.g., malloc() compensated by free()
- Higher-level isolation
  - Release memory-level isolation
  - Programmer enforces isolation at higher level (e.g., locks)
  - Use commit action to release isolation at parent commit
50. Nested LogTM
- Nested LogTM extends Flat LogTM (last lecture).
- Splits the log into frames.
  - Header contains a frame pointer to the parent's header.
  - Header contains a register checkpoint.
Figure: the log is a sequence of frames, one per nesting level, starting with level 1. Each log frame is a header followed by its undo records, and Log Ptr points past the last undo record.
51. Nested LogTM
- Replicates R/W bits.
- Maintains a separate read set and write set for each nesting level.
- Uses a constant number (k) of R/W sets, and flattens transactions whose nesting level is bigger than k.
52. Closed Nested LogTM
On Commit
- Top-level transactions commit normally.
If (1 < curr_level <= k)
- Merge the current log frame with the parent's.
- Flash-OR the R/W bits of curr_level - 1 with those of curr_level.
- Decrement curr_level.
Otherwise
- Merge the current log frame with the parent's.
- Decrement curr_level.
53. Closed Nested LogTM
Conflict detection
- An incoming read from memory location m conflicts with another thread's level-j transaction if j is the minimal level where block(m)'s Write bit is set.
- An incoming write to memory location m conflicts with another thread's level-j transaction if j is the minimal level where block(m)'s Write or Read bit is set.
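The per-level R/W bits, the merge on a closed commit, and the minimal-level conflict rule can be modeled with one read set and one write set per nesting level. This is a sketch under simplifying assumptions. The class NestedRWSets and its method names are illustrative, sets replace the replicated hardware bits, and the limit of k levels is ignored.

```python
class NestedRWSets:
    """Toy model of replicated R/W bits: one read set and one write set per level."""

    def __init__(self):
        self.levels = []                          # levels[0] is nesting level 1

    def begin(self):
        self.levels.append({"R": set(), "W": set()})

    def read(self, block):
        self.levels[-1]["R"].add(block)

    def write(self, block):
        self.levels[-1]["W"].add(block)

    def commit_closed(self):
        child = self.levels.pop()
        if self.levels:                           # inner commit: OR bits into the parent
            self.levels[-1]["R"] |= child["R"]
            self.levels[-1]["W"] |= child["W"]

    def conflict_level(self, block, is_write):
        """Minimal level j that conflicts with an incoming access, or None."""
        for j, sets in enumerate(self.levels, start=1):
            if block in sets["W"] or (is_write and block in sets["R"]):
                return j
        return None


t = NestedRWSets()
t.begin()                     # level 1
t.read("A")
t.begin()                     # level 2
t.write("B")
read_b_before = t.conflict_level("B", is_write=False)    # conflicts at level 2
write_a_before = t.conflict_level("A", is_write=True)    # conflicts at level 1
t.commit_closed()             # level 2 merges into level 1
read_b_after = t.conflict_level("B", is_write=False)     # now conflicts at level 1
```

Knowing the minimal conflicting level is what makes a partial abort possible, since only that level and the levels nested inside it need to be undone.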
54. Closed Nested LogTM
On Abort
- An abort of the current transaction at curr_level traps to a software handler.
- Suppose the transaction aborts because of a conflict in the abort_level transaction.
- The software handler walks the log backwards and undoes curr_level - abort_level + 1 log frames.
- Finally, it restores the register state saved in the header.
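55. Log example (figure labels: frame pointer, end pointer)
    // thread i at level 0 (non-transactional)
    a = 2; b = 4; c = 6;      // initialize
    transaction_begin();      // top-level (level 1)
    a = b + 1;                // a gets 5.
    transaction_begin();      // level 2
    c = b - 3;                // c gets 1.
    b = a + 2;                // b gets 7.
    a = c + 7;                // a gets 8.
    transaction_commit();     // level 2.
    transaction_commit();     // level 1.
Log contents: the level-1 frame holds the undo record (2, a). The level-2 frame holds (6, c), (4, b) and (5, a); when level 2 commits, its header is left behind as a garbage header inside the merged frame.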
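The example above can be replayed in a small model that keeps one list of undo records per nesting level and supports partial aborts. It is only a sketch. The name NestedUndoLog and its methods are assumptions, register checkpoints in the frame headers are not modeled, and every store is logged (the real design filters on the W bit).

```python
class NestedUndoLog:
    """Toy model of a framed undo log with closed nesting and partial abort."""

    def __init__(self, memory):
        self.memory = memory
        self.frames = []                         # frames[0] is the level-1 frame

    def begin(self):
        self.frames.append([])                   # new frame for the new level

    def store(self, var, value):
        self.frames[-1].append((var, self.memory[var]))   # undo record (var, old value)
        self.memory[var] = value

    def commit(self):
        frame = self.frames.pop()
        if self.frames:
            self.frames[-1].extend(frame)        # merge the frame into the parent's

    def abort_to(self, abort_level):
        # Undo curr_level - abort_level + 1 frames, newest record first.
        while len(self.frames) >= abort_level:
            for var, old in reversed(self.frames.pop()):
                self.memory[var] = old


m = {"a": 2, "b": 4, "c": 6}
log = NestedUndoLog(m)
log.begin()                           # level 1
log.store("a", m["b"] + 1)            # a gets 5
log.begin()                           # level 2
log.store("c", m["b"] - 3)            # c gets 1
log.store("b", m["a"] + 2)            # b gets 7
log.store("a", m["c"] + 7)            # a gets 8
level2_frame = list(log.frames[1])    # undo records of the inner transaction
log.abort_to(2)                       # partial abort: undo level 2 only
after_partial = dict(m)
log.abort_to(1)                       # full abort: undo level 1 as well
after_full = dict(m)
```

The partial abort keeps the outer transaction's update to a, so only the inner transaction has to be re-executed.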
56. Supporting Open Transactions
- When an open nested transaction T_open at level j commits
  - Its frame is discarded from the log.
  - R/W bits for level j are cleared.
  - (Optionally) Append commit and abort action records, C_open and A_open, to the newly exposed end of T_open's parent's frame.
57. Commit and Abort Actions
- To ensure consistency, open nested transactions must raise the abstraction level of both isolation and rollback.
- Commit actions are executed in FIFO order, while abort actions are executed in LIFO order.
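The ordering rule can be illustrated with two lists of callbacks. The sketch below is a software analogy only. The class ParentTransaction and the register method are hypothetical names, and in Nested LogTM the action records live in the parent's log frame rather than in separate lists.

```python
class ParentTransaction:
    """Collects commit/abort actions registered by committed open nested children."""

    def __init__(self):
        self.commit_actions = []
        self.abort_actions = []

    def register(self, commit_action=None, abort_action=None):
        if commit_action is not None:
            self.commit_actions.append(commit_action)
        if abort_action is not None:
            self.abort_actions.append(abort_action)

    def commit(self):
        for action in self.commit_actions:             # FIFO: oldest first
            action()

    def abort(self):
        for action in reversed(self.abort_actions):    # LIFO: newest first
            action()


trace = []
committing = ParentTransaction()
committing.register(commit_action=lambda: trace.append("release lock 1"))
committing.register(commit_action=lambda: trace.append("release lock 2"))
committing.commit()
commit_order = list(trace)

trace.clear()
aborting = ParentTransaction()
aborting.register(abort_action=lambda: trace.append("free block 1"))
aborting.register(abort_action=lambda: trace.append("free block 2"))
aborting.abort()
abort_order = list(trace)
```

Running abort actions newest-first mirrors how an undo log is replayed: later forward actions may depend on earlier ones, so they are compensated first.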
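58. Log example with an abort action (figure labels: frame pointer, end pointer)
    // thread i at level 0 (non-transactional)
    a = 2; b = 4; c = 6;      // initialize
    transaction_begin();      // top-level (level 1)
    a = b + 1;                // a gets 5.
    transaction_begin();      // level 2
    c = b - 3;                // c gets 1.
    b = a + 2;                // b gets 7.
    a = c + 7;                // a gets 8.
    transaction_commit();     // level 2.
    transaction_commit();     // level 1.
Log contents shown in the figure: the undo record (2, a), the abort action record A_open, and the level-2 undo records (6, c), (4, b) and (5, a).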
59. Condition O1
- No writes to data written by ancestors: neither an open transaction T_open nor its commit and abort actions, C_open and A_open, write any data written by T_open's ancestors.
60. Example
    counter = 0;              // initialize
    transaction_begin();      // top-level
    counter++;                // counter gets 1.
    open_begin();             // level 2
    counter++;                // counter gets 2.
    // commit with an abort action.
    open_commit(abort_action(decr(counter)));
    ...
    // Abort and run abort action
    // Expect counter to be 0.
    ...
    transaction_commit();     // not executed.
61. Escape Actions
- The real world is not transactional
- Current OSs are not transactional
- Systems should allow non-transactional escapes from a transaction
- Interact with OS, VM, devices, etc.
62. Escape Actions: First Class
- Keep a per-thread Escape bit.
- Escape actions read the most recent values from memory (even uncommitted ones).
- Escape actions never abort or stall.
- Similar to an open transaction, an escape action may register commit/abort actions.
63. Conclusions
- Closed nesting is easy to implement, and may allow partial rollback to improve efficiency.
- Open nesting improves concurrency at the cost of higher-level atomicity and isolation and more complex software.
- Using open nesting, it is possible to provide non-transactional operations inside transactions.
64. QUESTIONS?
65. The End