Title: Transactional Memory: An Overview of Hardware Alternatives
1 Transactional Memory: An Overview of Hardware Alternatives
- David A. Wood
- University of Wisconsin
- Transactional Memory Workshop
- April 8th, 2005
2 What's a database got to do with it?
- Atomicity
- All updates, or none
- Consistency
- Correct at begin and end
- Isolation
- Partial work not visible
- Inputs stay stable
- Durability
- Survive system failures
All (or some) memory ops, not just database objects
Despite increasing awareness of failures
3 801 Database Storage
- Lock bits on virtual memory
- 128 byte granularity
- Added to pagetable and TLB
- Caches user's lock state
- Trap on lock conflict
- No h/w for logging, abort, etc.
- Only uniprocessors
- 801 and RS/6000
[Diagram labels: Memory, TLB, Tid]
Was this transactional memory?
4 SQL/801
- "The development of SQL/801 was greatly simplified because, with minor exceptions, it considers only a single user. It achieves multiuser concurrency on a uniprocessor by running in multiple processes using the shared database storage." (Chang and Mergen, '88)
- Largest transactional memory application
- Only real hardware transactional memory implementation
- No one seems to be looking at what they learned
5 Basic Transactional Mechanisms
- Isolation
- Detect when transactions conflict
- Track read and write sets
- Version management
- Record new and old values
- Atomicity
- Commit new values
- Abort back to old values
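The three mechanisms above can be sketched in a few lines of software. This is only a toy model under stated assumptions, not any of the hardware designs surveyed here: the `Transaction` class, its method names, and the dict standing in for memory are all hypothetical. New values are buffered (version management), read and write sets are tracked (isolation), and commit or abort applies or drops the buffer as a unit (atomicity).

```python
class Transaction:
    """Toy transaction: tracks read/write sets and buffers new values."""

    def __init__(self, memory):
        self.memory = memory      # shared dict: address -> value (hypothetical)
        self.read_set = set()     # addresses read (isolation)
        self.write_buf = {}       # address -> new value (version management)

    def read(self, addr):
        self.read_set.add(addr)
        # A transaction sees its own buffered writes first.
        return self.write_buf.get(addr, self.memory.get(addr, 0))

    def write(self, addr, value):
        self.write_buf[addr] = value

    def conflicts_with(self, other):
        # Conflict: one transaction writes what the other reads or writes.
        mine, theirs = set(self.write_buf), set(other.write_buf)
        return bool(mine & (other.read_set | theirs) or theirs & self.read_set)

    def commit(self):
        # Atomicity: all buffered values become visible together.
        self.memory.update(self.write_buf)
        self._reset()

    def abort(self):
        # Old values were never overwritten, so abort just drops the buffer.
        self._reset()

    def _reset(self):
        self.read_set.clear()
        self.write_buf.clear()


memory = {"x": 1, "y": 2}
t1, t2 = Transaction(memory), Transaction(memory)
t1.write("x", t1.read("x") + 10)   # t1 reads and writes x
t2.read("x")                       # t2 reads x, which t1 is writing
conflict = t1.conflicts_with(t2)
t2.abort()
t1.commit()
```

Buffering new values makes abort trivial and commit the expensive step; a design that updates in place and logs old values makes the opposite trade.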
6 H/W Transactional Memory Systems
- Knight's Lisp Work
- Transactional Memory
- Oklahoma Update
- SLE/TLR
- Transactional Coherence and Consistency
- Unbounded TM
- Virtual TM
- Thread-level TM
7 Knight's Lisp Work ('86)
- Parallel execution of sequential code
- Break program into transaction blocks
- Multiple loads in a transaction
- Exactly one store ends the transaction
- No register state passed between transactions
- Execute transactions in parallel
- Track dependences (i.e., read set)
- Abort and restart on conflicting write
- Transactions commit in sequential order
- Broadcast writes on commit
8 Knight's Hardware
- Two caches
- Dependency cache
- Tracks read set
- Bus monitor detects conflicts
- Confirm cache
- Holds write set
- Supports multiple writes
- Commits
- Check dep. cache
- Broadcast writes
- Fast aborts
- Invalidate Confirm cache
- Use old values in Dep. Cache
- Immediately restart execution
[Diagram labels: Memory, Confirm Cache, Dependency Cache]
Spawned two threads: TLS and TM
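As a rough software analogy for the scheme on the last two slides (a sketch under assumptions, not Knight's actual hardware), each block below performs loads and returns exactly one store. All blocks first run against the same starting memory, then commit in sequential order. A committed write is "broadcast", and any later block that loaded that address aborts and restarts. The function name `run_speculatively` and the block signature are hypothetical.

```python
def run_speculatively(blocks, memory):
    """Run blocks 'in parallel', then commit them in sequential order."""

    def execute(block):
        deps = {}                      # stands in for the dependency cache (read set)

        def load(addr):
            deps[addr] = memory[addr]
            return deps[addr]

        store = block(load)            # stands in for the confirm cache: (addr, value)
        return deps, store

    pending = [execute(b) for b in blocks]   # every block starts from the same state
    restarts = 0
    for i in range(len(blocks)):
        _, (addr, value) = pending[i]
        memory[addr] = value                 # commit: broadcast the write
        for j in range(i + 1, len(blocks)):  # later blocks monitor the broadcast
            if addr in pending[j][0]:        # conflicting write hits a read set
                pending[j] = execute(blocks[j])   # abort and restart immediately
                restarts += 1
    return restarts


memory = {"a": 1, "b": 0, "c": 0}
blocks = [
    lambda load: ("b", load("a") + 1),    # b = a + 1
    lambda load: ("c", load("b") * 10),   # c = b * 10 (depends on block 0)
    lambda load: ("a", load("a") + 5),    # a = a + 5 (independent of earlier stores)
]
restarts = run_speculatively(blocks, memory)
```

Only the second block restarts, because it is the only one that read an address written by an earlier block; the final memory matches what sequential execution would produce.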
9 H&M's Transactional Memory ('93)
- Targets explicitly parallel (non-functional) codes
- Motivated by lock-free data structures
- Transactions
- Read and write multiple locations
- Commit in arbitrary order
- Implicit begin, explicit commit operations
- Abort affects memory, not registers
- Software manages restarting execution
- Validate instruction detects pending abort
- Implementation extends cache coherence
- Read/Write locks correspond to MOESI states
- Add orthogonal transaction states
10 H&M's Transactional Memory
- Adds Transaction Cache
- Stores all data accessed by transactions
- 2 copies of each line
- Before and after image
- Even for read-only data
- Small, fully associative
- Abort on all conflicts
- NACK conflicting requests
- Abort NACKed transaction
- Fast commit and abort
- Change trans. cache state
[Diagram labels: Memory, Cache, Transaction Cache]
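The programming model on these two slides (implicit begin, explicit commit, a validate check, and software-managed retry) might look roughly like the following. This is a simplified sketch with hypothetical names (`TxCache`, `atomic_increment`), not the real instruction set. It assumes a conflicting request is NACKed and that the NACKed requester is the one that aborts; for brevity, shared memory plays the role of the before image and each cached line keeps only the after image.

```python
class TxCache:
    """Toy transaction cache with NACK-based conflict handling."""

    def __init__(self, memory, peers):
        self.memory = memory
        self.peers = peers          # shared list of every processor's cache
        self.lines = {}             # addr -> {"after": value, "dirty": bool}
        self.aborted = False
        peers.append(self)

    def _nacked(self, addr, writing):
        # A peer NACKs if it holds the line dirty, or holds it at all when we write.
        return any(p is not self and addr in p.lines
                   and (writing or p.lines[addr]["dirty"])
                   for p in self.peers)

    def _access(self, addr, writing):
        if self._nacked(addr, writing):
            self.aborted = True     # the NACKed transaction aborts
            self.lines.clear()      # abort discards tentative memory state only
            return None
        if addr not in self.lines:  # first access implicitly begins tracking
            self.lines[addr] = {"after": self.memory[addr], "dirty": False}
        return self.lines[addr]

    def load(self, addr):
        line = self._access(addr, writing=False)
        return line["after"] if line else None

    def store(self, addr, value):
        line = self._access(addr, writing=True)
        if line:
            line["after"], line["dirty"] = value, True

    def validate(self):
        return not self.aborted     # detects a pending abort

    def commit(self):
        ok = not self.aborted
        if ok:
            for addr, line in self.lines.items():
                self.memory[addr] = line["after"]
        self.lines.clear()
        self.aborted = False        # the next access implicitly begins anew
        return ok


def atomic_increment(cache, addr, max_tries=10):
    """Software retry loop: the hardware only reports the abort."""
    for _ in range(max_tries):
        value = cache.load(addr)
        if cache.validate():        # skip useless work if already aborted
            cache.store(addr, value + 1)
        if cache.commit():
            return True
    return False


memory, peers = {"counter": 0}, []
cpu0, cpu1 = TxCache(memory, peers), TxCache(memory, peers)
cpu0.store("counter", 5)            # cpu0 holds the line dirty, uncommitted
first_try = atomic_increment(cpu1, "counter", max_tries=1)   # NACKed, gives up
cpu0.commit()
second_try = atomic_increment(cpu1, "counter")
```

Because registers are not restored on abort, the retry loop re-reads everything it needs from memory on each attempt.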
11 SLE/TLR
- Hardware exploits speculative processors
- Read sets tracked by coherence protocol
- Write set maintained in store queue
- Abort restarts execution, including register state
- Speculative lock elision (SLE)
- Elide locks from the dynamic execution stream
- Convert critical sections to optimistic transactions
- Concurrently execute non-conflicting transactions
- Fall back on explicit locks if conflicts
- Transactional Lock Removal (TLR)
- Resolve conflicts using priority ordering (timestamps)
- Delay lower priority transactions
- Deadlock and starvation free
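The timestamp idea behind TLR's conflict resolution can be illustrated with a very small sketch. The `Tx` class and `resolve` function are hypothetical, and the sketch assumes that a delayed transaction keeps its original timestamp, which is what prevents newcomers from overtaking it indefinitely.

```python
import itertools

_clock = itertools.count()          # stands in for a global timestamp source


class Tx:
    def __init__(self, name):
        self.name = name
        self.timestamp = next(_clock)   # assigned once, kept while delayed


def resolve(a, b):
    """Return (winner, loser): the older timestamp has priority."""
    return (a, b) if a.timestamp < b.timestamp else (b, a)


a, b = Tx("A"), Tx("B")
winner1, loser1 = resolve(a, b)     # A is older, so B is delayed
c = Tx("C")                         # a newcomer conflicts with the delayed B
winner2, loser2 = resolve(loser1, c)
```

B wins the second conflict because it kept its earlier timestamp. Every transaction eventually becomes the oldest, and the total order on timestamps rules out cyclic waiting.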
12 Transactional Coherence and Consistency ('04)
- TCC unifies coherence, memory consistency, and transaction support
- All transactions, all the time
- Transaction ordering
- Ordered, Unordered, Partially Ordered
- Supports thread-level speculation
- Optimistic concurrency model
- Unordered transactions serialize at commit
- Conflicts detected at commit
13 TCC
- On-chip interconnect: broadcast updates at commit
- Write buffer: 4 kB, holds new values until commit
- Shadow register file (SRF): checkpoints architectural registers
- L2 cache: logically shared
- L1 cache: tracks read set, bit per line
[Diagram labels: CPU, L1 D, SRF]
14 TCC
- Commits are sequential
- Broadcasts addresses of all updates
- Supports large transactions
- Serialize all other transactions
- Grabs and holds the commit bus
- Cannot abort large transactions
- Updates affect L2/Mem; no undo
- Extensions forthcoming
- talk to Kunle and Christos
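Commit-time conflict detection in the TCC style can be sketched as follows. This is a toy model with hypothetical names (`Core`, `commit`), not the actual TCC protocol: each core buffers its writes and marks the lines it reads, one committer at a time broadcasts the addresses it updated, and any other core whose read set overlaps those addresses is violated and must re-execute.

```python
class Core:
    def __init__(self, name):
        self.name = name
        self.read_bits = set()      # stands in for per-line read bits in L1
        self.write_buffer = {}      # new values held until commit
        self.violated = False

    def load(self, memory, addr):
        self.read_bits.add(addr)
        return self.write_buffer.get(addr, memory[addr])

    def store(self, addr, value):
        self.write_buffer[addr] = value

    def snoop(self, addrs):
        # Another core committed these addresses; a read-set hit is a violation.
        if self.read_bits & addrs:
            self.violated = True
            self.read_bits.clear()
            self.write_buffer.clear()


def commit(core, cores, memory):
    """Commits are sequential: broadcast updates, then clear speculative state."""
    memory.update(core.write_buffer)
    written = set(core.write_buffer)
    for other in cores:
        if other is not core:
            other.snoop(written)
    core.read_bits.clear()
    core.write_buffer.clear()


memory = {"x": 0}
c0, c1 = Core("c0"), Core("c1")
cores = [c0, c1]
for core in cores:                           # both increment x optimistically
    core.store("x", core.load(memory, "x") + 1)

commit(c0, cores, memory)                    # c0 commits first; c1 is violated
was_violated = c1.violated
if c1.violated:                              # c1 re-executes, then commits
    c1.violated = False
    c1.store("x", c1.load(memory, "x") + 1)
commit(c1, cores, memory)
```

No conflict is noticed while the transactions run; the lost update is prevented only when the first commit is broadcast.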
15 Unbounded Transactional Memory (UTM)
- Unbounded transactions
- Arbitrary size
- Not limited by write buffer, cache, or memory
- Arbitrary duration
- Not limited by interrupts, context switch, etc.
- Complex implementation
- Not justified by performance
- Settle for nearly unbounded transactions
- Much simpler hardware
16 Transactional Linux
[Plot: log-log scale]
- Almost all of the transactions require < 100 cache lines
- 99.9% need fewer than 54 cache lines
- There are, however, some very large transactions!
- >500k-byte fully-associative cache required
17 Large Transaction Memory (LTM)
- Register checkpoints
- Snapshot of rename maps
- Cache tracks read and write sets
- T-bits mark transactional blocks
- Cache holds new data values in place
- O-bit indicates overflow to in-memory hashtable
- Memory holds committed state
- Abort invalidates all modified blocks
- Miss on re-execution
- Transactional writes force memory updates
- Repeated writes (e.g., to local data) are written through
18 Virtual Transactional Memory (VTM)
- Only an overflow mechanism
- No overhead on common in-cache case
- Check shared overflow counter on cache miss
- Low overhead when no conflict
- Shared Bloom Filter rules out conflicts
- Filter resides in virtual memory
- Higher overhead on possible conflict
- Hardware table walk to detect actual conflict
- Table resides in virtual memory
- Only incurred by large transactions with likely conflict
- Supports context switches and paging
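The three-tier check described above (overflow counter, then Bloom filter, then table walk) could be modeled in software roughly as below. The class names, filter size, and hash choice are assumptions made for illustration; they are not the structures or parameters of the actual VTM proposal.

```python
import hashlib


class BloomFilter:
    def __init__(self, bits=256, hashes=3):
        self.bits, self.hashes, self.array = bits, hashes, 0

    def _positions(self, addr):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def add(self, addr):
        for pos in self._positions(addr):
            self.array |= 1 << pos

    def may_contain(self, addr):
        # False means "definitely absent"; True may be a false positive.
        return all((self.array >> pos) & 1 for pos in self._positions(addr))


class OverflowState:
    """Shared state consulted on a cache miss (hypothetical layout)."""

    def __init__(self):
        self.overflow_count = 0      # number of overflowed entries
        self.filter = BloomFilter()
        self.table = {}              # addr -> id of the owning transaction

    def spill(self, addr, tid):
        self.table[addr] = tid
        self.filter.add(addr)
        self.overflow_count += 1

    def check_on_miss(self, addr, tid):
        """Return (path taken, whether a real conflict exists)."""
        if self.overflow_count == 0:
            return "fast", False             # common case: nothing has overflowed
        if not self.filter.may_contain(addr):
            return "filter", False           # filter rules out a conflict
        owner = self.table.get(addr)         # slow path: walk the table
        return "walk", owner is not None and owner != tid


state = OverflowState()
before_overflow = state.check_on_miss(0x40, tid=1)
state.spill(0x80, tid=7)                     # transaction 7 overflows line 0x80
other_tx = state.check_on_miss(0x80, tid=1)
same_tx = state.check_on_miss(0x80, tid=7)
unrelated = state.check_on_miss(0x40, tid=1)
```

An unrelated address normally stops at the filter; a false positive only costs an extra table walk and never reports a conflict that is not there.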
19 801 revisited
- Why didn't 801 database storage succeed?
- Lock bits helped performance and simplified software
- Answer 1
- Changing lock bits requires TLB shootdown
- Too complicated for the benefits?
- Not a current problem: transaction h/w is easy
- Answer 2
- Not universally available
- DB2 was (is) multiplatform
- Can't rely on feature only available in one architecture
- Still a relevant concern
20 Need Standard Transaction Interface
- Abstract away resource requirements
- Support large, long transactions
- Virtualize transactional memory
- Transaction semantics between threads
- NOT a hardware property
- Permit range of implementations
- Hardware, software, and combinations
21 Thread-level Transactional Memory
- Abstract mechanisms
- Version management
- Update memory in place
- Log before-images to thread-level VM
- Isolation
- Logically extend memory words with read and write bits
- Implementations can be conservative (e.g., blocks)
- Atomicity
- Commits easy due to in place updates
- Aborts trap to user-level software
- Hardware can accelerate common case
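The abstract mechanisms on this slide amount to undo logging, which can be sketched as follows. The `ThreadTM` class is a hypothetical software stand-in: memory is updated in place, the first write to each address logs its before-image in a per-thread log, commit simply discards the log, and abort walks the log backwards to restore old values.

```python
class ThreadTM:
    def __init__(self, memory):
        self.memory = memory
        self.log = []               # per-thread log of (addr, before_image)
        self.read_bits = set()      # logical read bits
        self.write_bits = set()     # logical write bits

    def load(self, addr):
        self.read_bits.add(addr)
        return self.memory[addr]

    def store(self, addr, value):
        if addr not in self.write_bits:      # log only the first write per address
            self.log.append((addr, self.memory[addr]))
            self.write_bits.add(addr)
        self.memory[addr] = value            # update memory in place

    def commit(self):
        self._clear()                        # new values are already in memory

    def abort(self):
        # Software handler: restore before-images in reverse order.
        for addr, before in reversed(self.log):
            self.memory[addr] = before
        self._clear()

    def _clear(self):
        self.log.clear()
        self.read_bits.clear()
        self.write_bits.clear()


memory = {"a": 1, "b": 2}
tm = ThreadTM(memory)
tm.store("a", 10)
tm.store("a", 11)                   # second write to "a" is not logged again
tm.store("b", 20)
logged = len(tm.log)
tm.abort()
after_abort = dict(memory)
tm.store("a", 5)
tm.commit()
```

Commit does almost no work here, while abort pays for the log walk. That fits a workload where commits are the common case.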
22 Conclusions
- Make the common case fast
- 99% of transactions fit in hardware
- Lots of alternatives
- Make both commits and aborts fast
- Handle the uncommon case
- Large transactions will occur, deal with 'em
- Shouldn't be limited by hardware
- Agree on a common abstraction
- Success requires multi-platform support
- Let vendors compete on price-performance