1
Hardware Transactional Memory
  • Eyal Widder
  • Nimrod Reiss

Instructor: Yehuda Afek, Tel Aviv University
10/06/2007
2
References
  • Thread-Level Transactional Memory
    Kevin E. Moore, Mark D. Hill, and David A. Wood, 2005
  • LogTM: Log-based Transactional Memory
    Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan,
    Mark D. Hill, and David A. Wood, 2006

3
Outline
  • Locks Vs. Transactional Memory
  • Introduction to LogTM
  • LogTM Version Management
  • LogTM Conflict Detection
  • Conclusions

4
The Challenge of Multithreaded SW
  • Goal: Parallelization
  • Problem: Unrestricted concurrency → bugs
  • Solution: Synchronization
  • New problem: Synchronization
  • Tension between performance and correctness

5
Current Mechanism: Locks
  • Locks: objects only one thread can hold at a time
    • Organization: lock for each shared structure
    • Usage: (block) → acquire → access → release
  • Correctness issues
    • Under-locking → data races
    • Acquires in different orders → deadlock
  • Performance issues
    • Conservative serialization
    • Overhead of acquiring
    • Difficult to find right granularity

6
Transactions vs. Locks
  • Lock issues
    • Under-locking
    • Acquires in different orders
    • Blocking
    • Conservative serialization
  • How transactions help
    • Simpler interface
    • No ordering
    • Can cancel transactions
    • Serialization only on conflicts

Locks → simplicity/performance tension
Transactions → (potentially) simple and efficient
7
Transaction Semantics: ACI Properties
  • Atomicity: all or nothing
  • Consistency: correct at beginning and end
  • Isolation: partially done work is not visible to
    other threads

8
Thread-Level Transactional Memory
  • Separate semantics from implementation
  • Adapt DBMS (database management system) concepts
    • Concurrency control algorithms
    • Conflict detection
    • Taking the appropriate action (commit/abort/delay)

Main challenge: reduce the overhead of enforcing
the ACI properties!
9
Basic Idea
  • Model TM like virtual memory
  • A thread-level abstraction
  • Use three types of interfaces: user, system/library,
    and low-level
  • An interface independent of implementation
  • Combine HW and SW in implementation

10
How Do Transactional Memory Systems Differ?
  • (Data) Version Management
    • Keep old values for abort AND new values for
      commit
    • Eager: record old values elsewhere; update in
      place (→ fast commit)
    • Lazy: update elsewhere; keep old values in
      place
  • (Data) Conflict Detection
    • Find read-write, write-read, or write-write
      conflicts among concurrent transactions
    • Eager: detect conflict on every read/write
      (→ less wasted work)
    • Lazy: detect conflict at end (commit/abort)
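
The three conflict kinds named on this slide can be sketched with read and write sets. This is only an illustration: the `Tx` class, `find_conflicts`, and the sample addresses are hypothetical names chosen here, not part of LogTM.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    """Hypothetical transaction record: the addresses it has read and written."""
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def find_conflicts(a, b):
    """Return (kind, address) pairs on which concurrent transactions a and b conflict."""
    found = set()
    for addr in a.reads & b.writes:
        found.add(("read-write", addr))    # a read what b wrote
    for addr in a.writes & b.reads:
        found.add(("write-read", addr))    # a wrote what b read
    for addr in a.writes & b.writes:
        found.add(("write-write", addr))   # both wrote the same address
    return found
```

An eager scheme would run a check like this on every access; a lazy scheme would run it once at commit time.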
11
Outline
  • Locks Vs. TM
  • Introduction to LogTM
  • LogTM Version Management
  • LogTM Conflict Detection
  • Conclusions

12
Log-Based Transactional Memory: LogTM
  • (Hardware) Transactional Memory is promising
    • Most designs use lazy version management
      • Old values in place
      • New values elsewhere
      • Commits slower than aborts
  • New: LogTM (Log-based Transactional Memory)
    • Uses eager version management (like most
      databases)
      • Old values go to a log in thread-private virtual
        memory
      • New values in place
    • Makes the common case (commits) fast!
    • Hardware traps to a software handler
    • Aborts are handled in software

13
Outline
  • Locks Vs. TM
  • Introduction to LogTM
  • LogTM Version Management
  • LogTM Conflict Detection
  • Conclusions

14
LogTM's Eager Version Management
  • Old values stored in the transaction log
  • A per-thread linear (virtual) address space (like
    the stack)
  • Filled by hardware (during transactions)
  • Read by software (on abort)
  • New values stored in place

15
Transaction Log Example
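  • Initial state
    • LogBase = LogPointer
    • TM count > 0

[Figure: initial state of the data blocks and the log]
  VA 00: data 12--------------, R = 0, W = 0
  VA 40: data --------------23, R = 0, W = 0
  VA C0: data 34--------------, R = 0, W = 0
  Log (rows at 1000, 1040, 1080): empty
  Log Base = 1000, Log Ptr = 1000, TM count = 1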
16
Transaction Log Example
  • Store r2, (c0)   /* r2 = 56 */
    • Set W bit for block (c0)
    • Store address (c0) and old data on the log
    • Increment Log Ptr to 1048
    • Update memory

[Figure: state after the store]
  VA 00: data 12--------------, R = 0, W = 0
  VA 40: data --------------23, R = 0, W = 0
  VA C0: data 56-------------- (was 34--------------), R = 0, W = 1
  Log entry at 1000: address c0, old data 34------------
  Log Base = 1000, Log Ptr = 1048, TM count = 1
17
Transaction Log Example
  • Commit transaction
    • Clear R/W bits for all blocks
    • Reset Log Ptr to Log Base (1000)
    • Clear TM count

[Figure: state after commit]
  VA 00: data 12--------------, R = 0, W = 0
  VA 40: data --------------23, R = 0, W = 0
  VA C0: data 56--------------, R = 0, W = 0
  Log entry at 1000 (now stale): address c0, old data 34------------
  Log Base = 1000, Log Ptr = 1048 → 1000, TM count = 1 → 0
18
Transaction Log Example
  • Abort transaction
    • Replay log entries to undo the transaction
    • Reset Log Ptr to Log Base (1000)
    • Clear R/W bits for all blocks
    • Clear TM count

[Figure: state after abort]
  VA 00: data 12--------------, R = 0, W = 0
  VA 40: data --------------23, R = 0, W = 0
  VA C0: data 56-------------- → 34-------------- (restored from the log), R = 0, W = 0
  Log Base = 1000, Log Ptr = 1048 → 1000, TM count = 1 → 0
19
Eager Version Management Discussion
  • Advantages
    • Fast commits
      • No copying
      • Common case
  • Disadvantages
    • Slow/complex aborts
      • Undo aborting transaction
    • Relies on eager conflict detection/prevention
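
A minimal software sketch of the log walkthrough above, under stated assumptions: the class name `ThreadTx`, the dictionary used as memory, and the entry size of 0x48 (chosen only so that one entry moves the pointer from 1000 to 1048, as on the slides) are illustrative, and logging a block only on its first write is a simplification. Real LogTM does the logging in hardware.

```python
class ThreadTx:
    """Toy model of one thread's eager version management (not real hardware)."""
    LOG_BASE = 0x1000
    ENTRY_SIZE = 0x48  # assumption: picked to match the 1000 -> 1048 step on the slide

    def __init__(self, memory):
        self.memory = memory   # dict: block address -> value (stands in for memory)
        self.log = []          # undo records: (address, old value)
        self.w_bits = set()    # blocks whose W bit is set
        self.tm_count = 0

    @property
    def log_ptr(self):
        return self.LOG_BASE + self.ENTRY_SIZE * len(self.log)

    def begin(self):
        self.tm_count += 1

    def store(self, addr, value):
        # Log the old value the first time a block is written in a transaction.
        if self.tm_count > 0 and addr not in self.w_bits:
            self.w_bits.add(addr)
            self.log.append((addr, self.memory[addr]))
        self.memory[addr] = value  # new value goes in place

    def commit(self):
        # Fast path: no copying, just discard the log and clear the bits.
        self.log.clear()
        self.w_bits.clear()
        self.tm_count = 0

    def abort(self):
        # Slow path: walk the log backwards and restore old values.
        while self.log:
            addr, old = self.log.pop()
            self.memory[addr] = old
        self.w_bits.clear()
        self.tm_count = 0
```

Commit touches no data, while abort does work proportional to the number of logged blocks, which is the trade-off listed on this slide.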

20
Outline
  • Locks Vs. TM
  • Introduction to LogTM
  • LogTM Version Management
  • LogTM Conflict Detection
  • Conclusions

21
LogTM's Eager Conflict Detection
  1. Requesting processor sends a coherence request to
    the directory.
  2. The directory responds and possibly forwards the
    request to one or more processors.
  3. Each responding processor examines some local
    state to detect a conflict.
  4. The responding processors each ack or nack the
    request.
  5. The requesting processor resolves any conflict.
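
Steps 3 to 5 can be sketched as two small functions. This is a simplified model, assuming the responder's local state is just its set of R bits and W bits; the function names and the string messages are hypothetical.

```python
def respond(request, block, r_bits, w_bits):
    """Responder side (steps 3 and 4): check local R/W bits, then ack or nack.

    A shared request (GETS) conflicts with a transactional write;
    an exclusive request (GETX) conflicts with a transactional read or write.
    """
    if request == "GETS":
        conflict = block in w_bits
    elif request == "GETX":
        conflict = block in r_bits or block in w_bits
    else:
        raise ValueError(f"unknown request: {request}")
    return "NACK" if conflict else "ACK"


def resolve(responses):
    """Requester side (step 5): any NACK means the request must be retried."""
    return "retry" if "NACK" in responses else "proceed"
```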

22
Conflict Detection
  • Conflicts are detected using the R/W bits and
    the directory's MOESI states.
  • A sticky state is used to detect possible
    conflicts from overflows.

23
Conflict Detection (example)
  • P0 store
    • P0 sends get exclusive (GETX) request
    • Directory responds with data (old)
    • P0 executes store

[Figure: coherence state during the example]
  Directory: I [old] → M@P0 [old]
  P0: TM mode = 1, Overflow = 0, cache line I (--) none → M (--) old → M (-W) new
  P1: TM mode = 0, Overflow = 0, cache line I (--) none
24
Conflict Detection (example)
  • In-cache transaction conflict
    • P1 sends get shared (GETS) request
    • Directory forwards to P0
    • P0 detects conflict and sends NACK

[Figure: coherence state during the example]
  Directory: M@P0 [old]
  Messages: GETS (P1 → directory), Fwd_GETS (directory → P0), NACK (P0 → P1)
  P0: TM mode = 1, Overflow = 0, cache line M (-W) new
  P1: TM mode = 0, Overflow = 0, cache line I (--) none
25
Conflict Detection (example)
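  • Cache overflow
    • P0 sends put exclusive (PUTX) request
    • Directory acknowledges
    • P0 sets overflow bit
    • P0 writes data back to memory

[Figure: coherence state during the example]
  Directory: M@P0 [old] → Msticky@P0 [new]
  Messages: PUTX (P0 → directory), ACK (directory → P0), DATA (P0 → directory)
  P0: TM mode = 1, Overflow = 0 → 1, cache line M (-W) new → I (--) none
  P1: TM mode = 0, Overflow = 0, cache line I (--) none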
26
Conflict Detection (example)
  • Out-of-cache conflict
    • P1 sends GETS request
    • Directory forwards to P0
    • P0 detects a (possible) conflict
    • P0 sends NACK

[Figure: coherence state during the example]
  Directory: Msticky@P0 [new]
  Messages: GETS (P1 → directory), Fwd_GETS (directory → P0), NACK (P0 → P1)
  P0: TM mode = 1, Overflow = 1, cache line I (--) none → conflict!
  P1: TM mode = 0, Overflow = 0, cache line I (--) none
27
Conflict Detection (example)
  • Commit
    • P0 clears TM mode and Overflow bits

[Figure: coherence state during the example]
  Directory: Msticky@P0 [new] (unchanged)
  P0: TM mode = 1 → 0, Overflow = 1 → 0, cache line I (--) none
  P1: TM mode = 0, Overflow = 0, cache line I (--) none
28
Conflict Detection (example)
  • Lazy cleanup
    • P1 sends GETS request
    • Directory forwards request to P0
    • P0 detects no conflict, sends CLEAN
    • Directory sends data to P1

[Figure: coherence state during the example]
  Directory: Msticky@P0 [new] → S(P1) [new]
  Messages: GETS (P1 → directory), Fwd_GETS (directory → P0), CLEAN (P0 → directory), DATA (directory → P1)
  P0: TM mode = 0, Overflow = 0, cache line I (--) none
  P1: TM mode = 0, Overflow = 0, cache line I (--) none → S (--) new
29
LogTM's Conflict Detection w/ Cache Overflow
  • At overflow at processor P
    • Set P's overflow bit (1 bit per processor)
    • Allow writeback, but set directory state to
      Sticky@P
  • At transaction end (commit or abort) at processor
    P
    • Reset P's overflow bit
  • At (potential) conflicting request by processor R
    • Directory forwards R's request to P.
    • P tells R there is no conflict if its overflow bit
      is reset
    • But P asserts a conflict if the bit is set (w/ small
      chance of false positive)
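
The overflow rules above can be sketched as a toy model. Assumptions: the directory is a plain dictionary from block to `(state, owner)`, only transactional W bits are tracked, and the names `Proc`, `evict`, and `on_forwarded_gets` are invented for illustration.

```python
class Proc:
    """Toy processor tracking only transactional W bits and the overflow bit."""

    def __init__(self, name):
        self.name = name
        self.overflow = False
        self.cached_w = set()   # blocks cached with the W bit set

    def tx_store(self, block, directory):
        self.cached_w.add(block)
        directory[block] = ("M", self.name)

    def evict(self, block, directory):
        # Overflow: writeback is allowed, but the directory keeps a sticky owner.
        self.cached_w.discard(block)
        self.overflow = True
        directory[block] = ("Msticky", self.name)

    def end_transaction(self):
        # Commit or abort: reset local bits only; the directory is cleaned lazily.
        self.cached_w.clear()
        self.overflow = False

    def on_forwarded_gets(self, block, requester, directory):
        if block in self.cached_w or self.overflow:
            return "NACK"       # conflict, or possible conflict (may be a false positive)
        directory[block] = ("S", requester)   # lazy cleanup of the sticky state
        return "CLEAN"
```

The single overflow bit is conservative: while it is set, every forwarded request to a sticky block is nacked, even for blocks the transaction never touched.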

30
Conflict Resolution
  • Conflict resolution options
    • Can wait, risking deadlock
    • Can abort, risking livelock
    • Wait/abort the transaction at the requesting or
      the responding processor?
  • LogTM resolves conflicts at the requesting processor
    • Requesting processor waits (using coherence
      nacks/retries)
    • But it aborts if the other processor is also waiting
      (deadlock possible) and it is logically younger
      (using timestamps)
  • Future: requesting processor traps to a software
    contention manager that decides who waits/aborts
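
One way to read the wait-or-abort rule is sketched below. It is an assumption-laden illustration: the `possible_cycle` flag stands in for "the other processor is waiting", smaller timestamps are taken to mean logically older, and the class and method names are hypothetical.

```python
class TxProc:
    """Toy model of timestamp-based conflict resolution at the requester."""

    def __init__(self, timestamp):
        self.timestamp = timestamp      # assumption: smaller means logically older
        self.possible_cycle = False     # set once we have stalled an older transaction

    def nack(self, requester):
        # Nacking an older transaction means it now waits on us: a cycle is possible.
        if requester.timestamp < self.timestamp:
            self.possible_cycle = True
        return "NACK"

    def on_nack_from(self, responder):
        # Abort only if we may be part of a cycle and we are the younger party.
        if self.possible_cycle and responder.timestamp < self.timestamp:
            return "abort"
        return "retry"
```

Because only the younger transaction ever aborts, the oldest transaction always makes progress, which avoids livelock in this model.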

31
Outline
  • Locks Vs. TM
  • Introduction to LogTM
  • LogTM Version Management
  • LogTM Conflict Detection
  • Conclusions

32
Conclusion
  • Commits are far more common than aborts
    • Conflicts are rare
    • Most conflicts can be resolved w/o aborts
    • Software aborts do not impact performance
  • Overflows are rare (in current benchmarks)
  • LogTM
    • Eager version management makes the common case
      (commit) fast
    • Sticky states/lazy cleanup detect conflicts
      outside the cache (if overflows are infrequent)

33
QUESTIONS?
34
Break Time!
35
References
  • LogTM: Log-based Transactional Memory
    Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan,
    Mark D. Hill, and David A. Wood, 2006
  • Supporting Nested Transactional Memory in LogTM
    Michelle J. Moravan, Jayaram Bobba, Kevin E. Moore,
    Luke Yen, Mark D. Hill, Ben Liblit, Michael M. Swift,
    and David A. Wood, 2006

36
Motivation
  • So far: transactional memory promises lock-free,
    atomic, consistent, and isolated execution.
  • But what should happen when a transaction
    executes another transaction within it?

37
LogTM enables flattening
  • In the last lecture we introduced LogTM, which
    subsumes inner transactions into the top-level
    transaction (flattening).
  • A counter tracks the nesting level:
    Transaction_begin() increments it and
    Transaction_end() decrements it.
  • A conflict in an inner transaction may cause a
    complete abort back to the beginning of the
    top-level one.
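
Flattening with a nesting counter can be sketched as follows. The class `FlatNesting`, its `store` and `abort` helpers, and the dictionary used as memory are illustrative assumptions; only the begin/end counter behavior comes from the slide.

```python
class FlatNesting:
    """Toy model of flattened nesting: one undo log, one nesting counter."""

    def __init__(self, memory):
        self.memory = memory   # dict: variable name -> value
        self.level = 0         # nesting counter
        self.undo = []         # single undo log shared by all levels

    def transaction_begin(self):
        self.level += 1

    def store(self, name, value):
        if self.level > 0:
            self.undo.append((name, self.memory[name]))
        self.memory[name] = value

    def transaction_end(self):
        self.level -= 1
        if self.level == 0:    # only the outermost end really commits
            self.undo.clear()

    def abort(self):
        # No per-level boundary is kept, so everything rolls back to level 0.
        while self.undo:
            name, old = self.undo.pop()
            self.memory[name] = old
        self.level = 0
```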

38
Challenges in nesting transactions
  • Facilitating Software Composition.
  • Enhancing Concurrency.
  • Escaping to non-transactional systems.

39
Facilitating Software Composition
  • Calling modules that use locks internally requires
    the caller to know the module's internal
    implementation details.
  • In order to aid modular programming,
    transactional memory should support nesting.

40
Challenges in nesting transactions
  • Facilitating Software Composition.
  • Enhancing Concurrency.
  • Escaping to non-transactional systems.

41
Enhancing Concurrency
  • Closed nesting does not eliminate all problems
    posed by modular software.
  • Concurrency is limited by maintaining isolation
    until the top-level transaction commits.

42
Example
[Figure: P1 runs transaction L and P2 runs transaction T; both call transaction S, which accesses pNextFree (directory state M@P1), and the two conflict on it.]
  • How would you do it differently?

Ideally, S should release pNextFree so that other
transactions can access the allocator without
conflicting with transaction L.
43
Challenges in nesting transactions
  • Facilitating Software Composition.
  • Enhancing Concurrency.
  • Escaping to non-transactional systems.

44
Escaping to non-transactional systems
  • Many TM systems will run on top of
    non-transactional base systems that may include:
    • Runtime libraries
    • Operating systems
    • Language virtual machines (e.g., JVM)
  • STMs handle such escapes easily.
  • An escape to a non-transactional system must
    disable HTM mechanisms to allow correct
    operation.
  • Allow inter-transaction and device communication.

45
Outline
  • Motivation and challenges.
  • Closed vs. Open Nesting.
  • Nested LogTM.
  • Supporting Closed Nesting.
  • Partial aborts.
  • Supporting Open Nesting.
  • Abort actions / Commit actions.
  • Condition O1.
  • Escape actions.
  • Conclusions.

46
Closed vs. Open nesting
  • Closed nested transactions extend the isolation of
    an inner transaction until the top-level
    transaction commits.
  • Open nested transactions allow a committing inner
    transaction to release isolation immediately.

47
Closed Nested Transactions
  • May flatten transactions into the top-level one
    (as we've already seen).
  • May allow partial roll-back.

48
Open Nested Transactions
  • Increase concurrency and expressiveness.
  • May increase both SW and HW complexity.
  • Higher-level atomicity
    • Child's memory updates are not undone if the parent
      aborts
    • Use an abort action to undo the child's forward
      action at a higher level of abstraction
    • E.g., malloc() compensated by free()
  • Higher-level isolation
    • Release memory-level isolation
    • Programmer enforces isolation at a higher level
      (e.g., locks)
    • Use a commit action to release isolation at parent
      commit

49
Outline
  • Motivation and challenges.
  • Closed vs. Open Nesting.
  • Nested LogTM.
  • Supporting Closed Nesting.
  • Partial aborts.
  • Supporting Open Nesting.
  • Compensating actions / commit actions.
  • Condition O1.
  • Escape actions.
  • Conclusions.

50
Nested LogTM
  • Nested LogTM extends Flat LogTM (last lecture).
  • Splits the log into frames.
  • Header contains a frame pointer to the parent's
    header.
  • Header contains a register checkpoint.

[Figure: the log as a sequence of log frames, each a header followed by undo records; the level-1 frame is labeled, and Log Ptr marks the end of the log.]
51
Nested LogTM
  • Replicates R/W bits.
  • Maintains a separate read set and write set for each
    nesting level.
  • Uses a constant number (k) of R/W sets, and flattens
    transactions whose nesting level exceeds k.

52
Closed Nested LogTM
On Commit
  • Top-level transactions commit normally.

If 1 < curr_level ≤ k:
  • Merge the current log frame with the parent's.
  • Flash-OR the R/W bits of level curr_level − 1 with
    those of curr_level.
  • Decrement curr_level.

Otherwise (curr_level > k):
  • Merge the current log frame with the parent's.
  • Decrement curr_level.
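
The R/W-bit merge on a closed nested commit can be sketched with one set pair per level. Assumptions: Python sets stand in for the replicated R/W bits, the set union stands in for the flash-OR, the limit k on replicated bit sets is ignored, and `commit_closed` is a hypothetical name.

```python
def commit_closed(levels):
    """Commit the innermost level of a closed nested transaction.

    levels is a list of {"R": set, "W": set}, outermost first.
    The child's bits are OR-ed into the parent's so that isolation is
    kept until the top-level transaction commits.
    """
    child = levels.pop()            # decrement curr_level
    if levels:                      # a parent exists: merge, do not release
        parent = levels[-1]
        parent["R"] |= child["R"]
        parent["W"] |= child["W"]
    # With no parent left this was the top level; its bits are simply dropped.
    return levels
```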

53
Closed Nested LogTM
Conflict detection
  • An incoming read from memory location m conflicts
    with another thread's level-j transaction if j is
    the minimal level where block(m)'s Write bit is
    set.
  • An incoming write to memory location m conflicts
    with another thread's level-j transaction if j is
    the minimal level where block(m)'s Write or Read
    bit is set.
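
The two rules can be sketched as a search for the minimal conflicting level. The representation (a list of per-level read and write sets, level 1 first) and the name `conflict_level` are assumptions for illustration.

```python
def conflict_level(request, block, levels):
    """Return the minimal level j (1-based) that conflicts with the request, or None.

    request is "read" or "write"; levels is a list of {"R": set, "W": set},
    with level 1 (the top-level transaction) first.
    """
    if request not in ("read", "write"):
        raise ValueError(f"unknown request: {request}")
    for j, bits in enumerate(levels, start=1):
        if block in bits["W"]:
            return j                     # reads and writes both conflict with W
        if request == "write" and block in bits["R"]:
            return j                     # only writes conflict with R
    return None
```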

54
Closed Nested LogTM
On Abort
  • An abort of the current transaction at curr_level
    traps to a software handler.
  • Suppose the transaction aborts because of a conflict
    with the transaction at abort_level.
  • The software handler walks the log backwards and
    undoes curr_level − abort_level + 1 log frames.
  • Finally, it restores the register state saved in
    the header.
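
A partial abort over log frames can be sketched as below. The `NestedLog` class, the dictionary-based frames, and the `checkpoint` contents are hypothetical; the frame count curr_level − abort_level + 1 is taken from the slide as reconstructed here.

```python
class NestedLog:
    """Toy model of a log split into frames, one per nesting level."""

    def __init__(self, memory):
        self.memory = memory   # dict: variable name -> value
        self.frames = []       # frames[0] is level 1 (top level)

    def begin(self, registers=None):
        # Each frame header holds a register checkpoint (modeled as a dict copy).
        self.frames.append({"checkpoint": dict(registers or {}), "undo": []})

    def store(self, name, value):
        self.frames[-1]["undo"].append((name, self.memory[name]))
        self.memory[name] = value

    def abort_to(self, abort_level):
        """Undo curr_level - abort_level + 1 frames; return the restored checkpoint."""
        curr_level = len(self.frames)
        checkpoint = None
        for _ in range(curr_level - abort_level + 1):
            frame = self.frames.pop()
            for name, old in reversed(frame["undo"]):   # walk the frame backwards
                self.memory[name] = old
            checkpoint = frame["checkpoint"]
        return checkpoint
```

Aborting to level 2 undoes only the inner frame; aborting to level 1 undoes both, which is the flattened behavior.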

55
  // thread i at level 0 (non-transactional)
  a = 2; b = 4; c = 6;    // initialize
  transaction_begin();    // top-level (level 1)
  a = b + 1;              // a gets 5
  transaction_begin();    // level 2
  c = b - 3;              // c gets 1
  b = a + 2;              // b gets 7
  a = c + 7;              // a gets 8
  transaction_commit();   // level 2
  transaction_commit();   // level 1

[Figure: the thread's log with its frame pointer and end pointer. Entries, shown as (old value, variable): (2, a), the level-2 header (labeled "garbage header"), (6, c), (4, b), (5, a).]
56
Supporting Open Transactions
  • When an open nested transaction T_open at level j
    commits:
    • Its frame is discarded from the log.
    • R/W bits for level j are cleared.
    • (Optionally) Commit and abort action records,
      C_open and A_open, are appended to the newly
      exposed end of the frame of T_open's parent.

57
Commit and Abort Actions
  • To ensure consistency, open nested transactions
    must raise the abstraction level of both
    isolation and rollback.
  • Commit actions are executed in FIFO order while
    Abort actions are executed in LIFO order.
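
The ordering rule can be sketched with two lists of callbacks. This simplifies the design on the slides, where action records live in the log alongside undo records; the class `TxActions` and its method names are invented for illustration.

```python
class TxActions:
    """Toy registry of commit and abort actions for one parent transaction."""

    def __init__(self):
        self.commit_actions = []
        self.abort_actions = []

    def register(self, commit=None, abort=None):
        # Called when an open nested child commits.
        if commit is not None:
            self.commit_actions.append(commit)
        if abort is not None:
            self.abort_actions.append(abort)

    def run_commit(self):
        for action in self.commit_actions:            # FIFO: registration order
            action()
        self._clear()

    def run_abort(self):
        for action in reversed(self.abort_actions):   # LIFO: newest first
            action()
        self._clear()

    def _clear(self):
        self.commit_actions.clear()
        self.abort_actions.clear()
```

LIFO order for abort actions mirrors how an undo log is replayed: the most recent forward action is compensated first.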

58
  // thread i at level 0 (non-transactional)
  a = 2; b = 4; c = 6;    // initialize
  transaction_begin();    // top-level (level 1)
  a = b + 1;              // a gets 5
  transaction_begin();    // level 2
  c = b - 3;              // c gets 1
  b = a + 2;              // b gets 7
  a = c + 7;              // a gets 8
  transaction_commit();   // level 2
  transaction_commit();   // level 1

[Figure: the thread's log with its frame pointer and end pointer. Entries, shown as (old value, variable): (2, a), an A_open abort action record, (6, c), (4, b), (5, a).]
59
Condition O1
  • No writes to data written by ancestors

Neither an open transaction T_open nor its commit
and abort actions, C_open and A_open, write any
data written by T_open's ancestors.
60
Example
  counter = 0;            // initialize
  transaction_begin();    // top-level
  counter++;              // counter gets 1
  open_begin();           // level 2
  counter++;              // counter gets 2
  // commit with an abort action
  open_commit(abort_action(decr(counter)));
  ...
  // Abort and run abort action
  // Expect counter to be 0
  ...
  transaction_commit();   // not executed
61
Escape Actions
  • Real world is not transactional
  • Current OSs are not transactional
  • Systems should allow non-transactional escapes
    from a transaction
  • Interact with OS, VM, devices, etc.

62
Escape Actions: First Class
  • Keep a per-thread escape bit.
  • Escape actions read the most recent values from
    memory (even uncommitted ones).
  • Escape actions never abort or stall.
  • Similar to an open transaction, an escape action may
    register commit/abort actions.

63
Conclusions
  • Closed Nesting is easy to implement, and may
    allow partial rollback to improve efficiency.
  • Open nesting improves concurrency at the cost of
    higher-level atomicity and isolation and a more
    complex software implementation.
  • Using open nesting it is possible to provide
    non-transactional operations inside transactions.

64
QUESTIONS?
65
The End
10/06/2007