Speculative Lock Elision

Transcript and Presenter's Notes

Title: Speculative Lock Elision


1
Speculative Lock Elision
  • Enabling Highly Concurrent Multithreaded
    Execution
  • Ravi Rajwar and James R. Goodman
  • University of Wisconsin-Madison

2
Motivation
  • Multithreaded programming is gaining importance
  • SMPs, Multithreaded Architectures
  • Used in non-traditional fields (Desktop)
  • Synchronization Mechanisms required for exclusive
    access
  • Serialize access to shared data structures
  • Necessary to prevent conflicting updates

3
Dynamically Unnecessary Synchronization
  • LOCK(locks->error_lock)
  • if (local_error > multi->err_multi)
  •     multi->err_multi = local_err
  • UNLOCK(locks->error_lock)
  • No lock is required if
  • no write access
  • different fields accessed
  • Examples from
  • ocean
  • SHORE (similar)
  • Thread 1
  • LOCK(hash_tbl.lock)
  • var = hash_tbl.lookup(X)
  • if(!var)
  • hash_tbl.add(X)
  • UNLOCK(hash_tbl.lock)
  • Thread 2
  • LOCK(hash_tbl.lock)
  • var = hash_tbl.lookup(Y)
  • if(!var)
  • hash_tbl.add(Y)
  • UNLOCK(hash_tbl.lock)
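
The ocean fragment on this slide takes the lock on every pass but writes the shared field only when a new maximum appears. A minimal Python sketch of that behaviour follows. The helper name `count_writing_sections` and the input values are illustrative assumptions, not data from the benchmark.

```python
def count_writing_sections(local_errors, initial_max=0.0):
    """Replay an ocean-style max update and count the critical sections that write."""
    err_multi = initial_max          # stands in for multi->err_multi
    writes = 0
    for local_err in local_errors:
        # LOCK(locks->error_lock) would be acquired here on every iteration.
        if local_err > err_multi:
            err_multi = local_err    # the only write to shared data
            writes += 1
        # UNLOCK(locks->error_lock)
    return writes, err_multi


# Illustrative inputs only (assumed values).
samples = [0.5, 0.2, 0.9, 0.1, 0.3, 0.9]
writes, final_max = count_writing_sections(samples)
print(f"{writes} of {len(samples)} critical sections wrote shared data")
```

Every critical section that does not write could, in principle, run concurrently with the others, which is the opportunity the slide points at.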

4
Multithreaded Programming
  • Conservative Locking
  • Easier to show correctness
  • Lock more often than necessary
  • Locking Granularity
  • Trade-Off between Performance and Complexity
  • Thread-unsafe legacy libraries
  • Require global locking

5
False Dependencies
  • Locks introduce Control and Data Dependence
  • Removal
  • Key is Appearance of Instantaneous Change
  • A lock can be elided if the memory operations remain
    atomic
  • Data read is not modified by other threads
  • Data written is not accessed by other threads
  • Any instruction that violates these conditions
    must not be retired
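
The two conditions above can be written as a check over read and write sets. The sketch below is a software illustration under assumed names (`can_elide`, the `bucket[...]` locations). The slides describe a hardware mechanism, not this function.

```python
def can_elide(my_reads, my_writes, other_reads, other_writes):
    """Return True if this critical section still appears atomic
    while another thread's accesses overlap with it in time."""
    # Condition 1: data read is not modified by other threads.
    if my_reads & other_writes:
        return False
    # Condition 2: data written is not accessed (read or written) by other threads.
    if my_writes & (other_reads | other_writes):
        return False
    return True


# Hypothetical locations modelled on the hash-table example:
# both threads only read the lock word, and they touch different buckets.
t1_reads, t1_writes = {"lock", "bucket[3]"}, {"bucket[3]"}
t2_reads, t2_writes = {"lock", "bucket[7]"}, {"bucket[7]"}
print(can_elide(t1_reads, t1_writes, t2_reads, t2_writes))   # True

# Same bucket: the second condition fails.
print(can_elide(t1_reads, t1_writes, t1_reads, t1_writes))   # False
```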

6
Silent store pairs
  • i16 and i6 form a silent pair
  • i16 undoes i6
  • Not silent individually
  • No write to _lock_
  • Read _lock_ is ok
  • SLE does not depend on semantic information
  • Simply observe silent store pairs
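
As a rough illustration of "observing" silent store pairs, the sketch below scans a store trace for a second store that restores the value a location held before the first. The function name, the trace format, and the default value of 0 for untouched locations are all assumptions made for this example.

```python
def find_silent_pairs(memory, stores):
    """stores: list of (address, value) in program order.
    Returns (i, j) index pairs where store j undoes store i."""
    mem = dict(memory)
    pending = {}   # address -> (index of first store, value before that store)
    pairs = []
    for j, (addr, value) in enumerate(stores):
        old = mem.get(addr, 0)   # assumption: untouched locations hold 0
        if addr in pending and value == pending[addr][1]:
            # This store restores the earlier value: the pair is silent.
            pairs.append((pending[addr][0], j))
            del pending[addr]
        elif value != old:
            # A store that changes the value may be the first half of a pair.
            pending[addr] = (j, old)
        mem[addr] = value
    return pairs


# Lock acquire (0 -> 1), a data store, then lock release (1 -> 0).
trace = [("lock", 1), ("x", 5), ("lock", 0)]
print(find_silent_pairs({"lock": 0}, trace))   # [(0, 2)]
```

Neither lock store is silent on its own, but together they leave the lock word unchanged. The detector uses no knowledge that the address is a lock.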

7
Two SLE Predictions
  • Silent pair prediction
  • On a store
  • Predict another store will undo changes
  • Don't perform the stores
  • Monitor memory location
  • If validated, elide two stores
  • Atomicity prediction
  • Predict that all memory accesses within the silent
    pair occur atomically
  • No partial updates visible to other threads

8
Initiating Speculation
  • Detect candidate pairs
  • Filter indexed by program counter
  • Add confidence metric for each pair
  • Predict whether the lock is held
  • If yes, don't speculate
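
One way to picture a PC-indexed filter with a confidence metric is a small table of saturating counters. This is a sketch under assumptions: the class name, the threshold, and the counter range are invented for illustration, and the default table size simply mirrors the 32-entry predictor mentioned in the evaluation slide.

```python
class ElisionPredictor:
    """Toy PC-indexed filter with a saturating confidence counter per entry."""

    def __init__(self, entries=32, threshold=2, max_conf=3):
        self.entries = entries
        self.threshold = threshold   # assumed value
        self.max_conf = max_conf     # assumed counter ceiling
        self.table = {}              # table index -> confidence

    def _index(self, pc):
        return pc % self.entries

    def should_speculate(self, pc, lock_appears_held):
        # If the lock is predicted to be held, do not speculate.
        if lock_appears_held:
            return False
        return self.table.get(self._index(pc), 0) >= self.threshold

    def update(self, pc, success):
        # Raise confidence after a validated elision, lower it after a misspeculation.
        i = self._index(pc)
        conf = self.table.get(i, 0)
        self.table[i] = min(conf + 1, self.max_conf) if success else max(conf - 1, 0)


predictor = ElisionPredictor()
acquire_pc = 0x400   # hypothetical PC of a lock-acquire store
predictor.update(acquire_pc, success=True)
predictor.update(acquire_pc, success=True)
print(predictor.should_speculate(acquire_pc, lock_appears_held=False))
```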

9
Buffering Speculative State
  • Register state
  • Reorder Buffer
  • Already used for branch prediction
  • Register Checkpoint
  • Memory state
  • Augment write buffers
  • Keep speculative writes in write buffer, until
    validated
  • Allows merging of speculative writes
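
The write-buffer idea can be sketched as a small map that holds speculative stores until validation. The class and method names are hypothetical, merging is modelled per address rather than per cache block, and forwarding buffered values to the thread's own loads is an assumption added so the example is usable.

```python
class SpeculativeWriteBuffer:
    """Toy write buffer that keeps speculative stores out of memory until commit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # address -> latest speculative value

    def store(self, addr, value):
        """Buffer a store. Returns False if the buffer is full."""
        if addr not in self.entries and len(self.entries) >= self.capacity:
            return False
        self.entries[addr] = value   # a repeated store to the same address merges
        return True

    def load(self, addr, memory):
        # Assumed: the speculating thread sees its own buffered writes.
        return self.entries.get(addr, memory.get(addr))

    def commit(self, memory):
        # Speculation validated: make all buffered writes visible.
        memory.update(self.entries)
        self.entries.clear()

    def squash(self):
        # Misspeculation: discard buffered writes; memory was never touched.
        self.entries.clear()


memory = {"a": 1}
buf = SpeculativeWriteBuffer(capacity=2)
buf.store("a", 2)
buf.store("a", 3)          # merges with the earlier store to "a"
print(len(buf.entries), memory["a"], buf.load("a", memory))
buf.commit(memory)
print(memory["a"])
```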

10
Misspeculation conditions
  • Atomicity violation
  • Detected by coherence protocol
  • Sufficient for ROB register state
  • For register checkpoint add speculative access
    bit to L1 cache
  • Resource constraints
  • Cache Size
  • Write Buffer Size
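
Both misspeculation causes can be illustrated with a tracker that marks speculatively accessed blocks and inspects incoming coherence requests. This is a sketch under assumptions: the class name is invented, `GETS` is used here as a conventional name for a remote read request (only `GETX` appears in the slides), and the block limit stands in for cache and write-buffer capacity.

```python
class SpeculationTracker:
    """Toy model of speculative-access bits plus a resource limit."""

    def __init__(self, max_blocks):
        self.max_blocks = max_blocks
        self.read_bits = set()    # blocks read speculatively
        self.write_bits = set()   # blocks written speculatively

    def record(self, block, is_write):
        """Mark a block as speculatively accessed.
        Returns False when the resource limit would be exceeded."""
        touched = self.read_bits | self.write_bits
        if block not in touched and len(touched) >= self.max_blocks:
            return False
        (self.write_bits if is_write else self.read_bits).add(block)
        return True

    def snoop(self, block, request):
        """Return True if an external request violates atomicity."""
        if request == "GETX":     # remote write: conflicts with reads and writes
            return block in self.read_bits or block in self.write_bits
        if request == "GETS":     # remote read: conflicts only with writes
            return block in self.write_bits
        raise ValueError(f"unknown request type: {request}")


tracker = SpeculationTracker(max_blocks=2)
tracker.record(0x40, is_write=False)
tracker.record(0x80, is_write=True)
print(tracker.snoop(0x40, "GETS"), tracker.snoop(0x40, "GETX"))
print(tracker.record(0xC0, is_write=False))   # third block exceeds the limit
```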

11
Committing Speculative Memory State
  • Commits must appear instantaneously
  • Cache state and Cache data
  • Can speculate on state
  • Can't speculate on data
  • Speculative store
  • Send GETX request
  • Block already exclusive when speculative writes
    commit
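
The split between speculating on cache state and not on cache data can be pictured as follows: ownership is requested when the speculative store executes, the data stays buffered, and commit succeeds only if every written block is still held exclusively. The class name and the single-letter state labels are assumptions for illustration, not the protocol used in the evaluation.

```python
class CommitModel:
    """Toy model: speculate on coherence state, never on cached data."""

    def __init__(self):
        self.state = {}     # block -> "M" (exclusive/modified) or "I" (invalid)
        self.pending = {}   # block -> buffered speculative data

    def speculative_store(self, block, data):
        # Issue the exclusive request (GETX) at store time...
        self.state[block] = "M"
        # ...but hold the data back until the speculation is validated.
        self.pending[block] = data

    def external_getx(self, block):
        # Another processor takes ownership of the block.
        self.state[block] = "I"

    def try_commit(self, cache_data):
        """Commit all buffered writes at once, or none of them."""
        if any(self.state.get(block) != "M" for block in self.pending):
            return False
        cache_data.update(self.pending)
        self.pending.clear()
        return True


cache = {}
model = CommitModel()
model.speculative_store(0x40, "new value")
print(cache)                    # still empty: data is not speculated on
print(model.try_commit(cache))  # blocks are exclusive, so commit is local
```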

12
Evaluation
  • 3 Systems
  • CMP, SMP and DSM
  • Total Store Order
  • Single Register Checkpoint
  • 32-entry lock predictor
  • Run on SimpleMP
  • Based on SimpleScalar
  • Benchmarks
  • SPLASH
  • mp3d, barnes, cholesky
  • SPLASH-2
  • radiosity, water-nsq, ocean
  • Microbenchmark

13
Microbenchmark Results

14
Lock acquires/releases elided

15
SPLASH Performance

16
Reasons for Performance Gain
  • Concurrent critical section execution
  • Reduced observed memory latencies
  • Locks can remain shared
  • Reduced memory traffic
  • No transfer of locks over the bus

17
Conclusions
  • Remove unnecessary serialization
  • Locks do not need to be acquired, but only
    observed
  • Control dependence converted to data dependence
  • No Programmer-based static analysis
  • But Hardware-based dynamic analysis
  • No coherence protocol changes
  • Independent of consistency model
  • Accesses are atomic in any model
  • Permit programmers to use conservative
    synchronization

18
Questions
  • Are programmers really off the hook?
  • Are all critical sections short enough for SLE?
  • What about the legacy library example?
  • Will this work for OLTP?
  • Do all MP papers come from Wisconsin?