Title: Reactive Spin-locks: A Self-tuning Approach
1 Reactive Spin-locks: A Self-tuning Approach
- Phuong Hoai Ha
- Marina Papatriantafilou
- Philippas Tsigas
I-SPAN'05, Las Vegas, Dec. 7-9, 2005
2 Outline
- Mutual exclusion
- Overhead
- Available reactive spin-locks
- New reactive spin-lock
- Model
- Algorithm
- Evaluation
- Conclusions
3 Mutual exclusion
[Figure: each process cycles through the Entry section, Critical section, Exit section and Noncritical section; lock-transfer timeline: requests issued, lock released, arbitration, lock sent to winner]
- Performance goals
- Low latency
- Low contention
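To make the Entry/Critical/Exit/Noncritical cycle concrete, here is a minimal C11 sketch built on a plain test-and-set lock. The type `tas_lock` and the functions `tas_init`, `tas_acquire`, `tas_release` and `worker_step` are hypothetical names chosen for illustration; this is not the lock proposed in these slides.

```c
#include <stdatomic.h>

/* Hypothetical minimal test-and-set lock, used only to show where the
   entry and exit sections sit around a critical section. */
typedef struct {
    atomic_flag held;
} tas_lock;

static void tas_init(tas_lock *l) { atomic_flag_clear(&l->held); }

/* Entry section: spin until our test-and-set finds the flag clear. */
static void tas_acquire(tas_lock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* busy-wait: every attempt is a write that causes contention */
}

/* Exit section: clear the flag so another requester can win it. */
static void tas_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* One iteration of a worker: entry, critical section, exit.
   Returns the value of the shared counter written inside the CS. */
static long worker_step(tas_lock *l, long *shared) {
    tas_acquire(l);         /* entry section */
    long v = ++*shared;     /* critical section */
    tas_release(l);         /* exit section */
    /* the noncritical section would follow here */
    return v;
}
```

Both performance goals show up here: latency is the time from `tas_release` by one worker to a successful `tas_acquire` by the next, and contention is the stream of test-and-set writes issued by the spinning waiters.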
4 Spin-lock categories
- Arbitrating locks
- Determine the next lock-holder in advance, e.g. ticket locks, queue locks
- Advantages
- Prevent processors from causing bursts of network traffic and high contention on the lock
- Non-arbitrating locks
- E.g. test-and-set locks
- Advantages
- Exploit locality/cache
- Tolerate failures in the Entry section
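The two categories can be contrasted with a short C11 sketch: a ticket lock (arbitrating, the order of lock-holders is fixed by the tickets drawn) and a test-and-test-and-set lock (non-arbitrating, whichever waiter sees the release first wins). All type and function names are hypothetical, and no backoff is included yet.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Arbitrating: ticket lock. Each requester draws a ticket, so the next
   lock-holder is decided in advance. */
typedef struct {
    atomic_uint next_ticket;  /* next ticket to hand out */
    atomic_uint now_serving;  /* ticket currently allowed to enter */
} ticket_lock;

static void ticket_init(ticket_lock *l) {
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

static unsigned ticket_acquire(ticket_lock *l) {
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                            memory_order_relaxed);
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        ; /* spin until it is our turn */
    return my;
}

static void ticket_release(ticket_lock *l) {
    /* Only the holder writes now_serving, so load + store is enough. */
    unsigned cur = atomic_load_explicit(&l->now_serving, memory_order_relaxed);
    atomic_store_explicit(&l->now_serving, cur + 1, memory_order_release);
}

/* Non-arbitrating: test-and-test-and-set. Waiters spin on a cached read
   and only try the atomic exchange when the lock looks free. */
typedef struct {
    atomic_bool held;
} ttas_lock;

static void ttas_init(ttas_lock *l) { atomic_init(&l->held, false); }

static void ttas_acquire(ttas_lock *l) {
    for (;;) {
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ; /* "test": read-only spinning in the local cache */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return; /* "test-and-set" succeeded */
    }
}

static void ttas_release(ttas_lock *l) {
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The ticket lock hands the lock over in FIFO order, so a waiter that is stuck in its Entry section blocks everyone behind it; the TTAS lock simply lets the next successful exchange win, which is why it tolerates such failures and can favor a processor that already has the lock line in its cache.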
5 Arbitrating vs. non-arbitrating locks
[Figure: processors 1 to 6 competing for the lock through an interconnection network, under an arbitrating and a non-arbitrating lock]
6 Available reactive spin-lock algorithms
- Drawbacks
- Their reactive schemes rely on
- Fixed experimental thresholds
- The thresholds frequently become inappropriate in variable and unpredictable environments such as multiprogramming systems
- E.g. ticket locks with proportional backoff, test-and-test-and-set locks with exponential backoff
- Known probability distributions of some inputs
- This assumption is usually not feasible
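The fixed-threshold schemes named above can be sketched as two small delay functions. The constants `PROP_BASE_CYCLES`, `EXP_BASE_CYCLES` and `EXP_CAP_CYCLES` are hypothetical placeholders for the experimentally tuned values; the point is only that they are fixed ahead of time.

```c
#include <stdint.h>

/* Hypothetical tuned constants: in the schemes criticized here, values
   like these are fixed by experiment for one machine and workload. */
#define PROP_BASE_CYCLES  100u  /* assumed per-waiter delay, ticket lock */
#define EXP_BASE_CYCLES    16u  /* assumed initial TTAS backoff */
#define EXP_CAP_CYCLES   4096u  /* assumed fixed upper threshold */

/* Ticket lock with proportional backoff: wait in proportion to the
   number of tickets ahead of ours (unsigned arithmetic handles wrap). */
static uint32_t proportional_delay(uint32_t my_ticket, uint32_t now_serving) {
    return (my_ticket - now_serving) * PROP_BASE_CYCLES;
}

/* TTAS with exponential backoff: double the delay after each failed
   attempt, but never beyond the fixed cap. */
static uint32_t exponential_delay(uint32_t failures) {
    uint32_t d = EXP_BASE_CYCLES;
    for (uint32_t i = 0; i < failures && d < EXP_CAP_CYCLES; i++)
        d *= 2;
    return d < EXP_CAP_CYCLES ? d : EXP_CAP_CYCLES;
}
```

Neither constant reacts when the number of competing processors or the length of the critical section changes, for example under multiprogramming, which is the drawback the slide points out.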
7 New reactive spin-lock algorithm
- Ideas
- A non-arbitrating lock with an adaptive, sensible backoff delay
- Advantages
- Its reactive scheme is self-tuning
- Neither experimentally tuned thresholds nor probability distributions of inputs are needed
- It combines the advantages of both the arbitrating and the non-arbitrating spin-lock categories
- It can exploit locality as well as reduce contention on the lock
8 Finding a sensible backoff delay
- Need to optimize the trade-off between
- Latency: the interval between a lock release and the following lock acquisition
- Contention on the lock
- This is an online problem
[Figure: backoff delay as a function of the load on the lock]
9 Reactive scheme
- Bounds for loads on the lock: $1 \le l_t \le P$
- During a load-rising phase
- Similarly for a load-dropping phase
- In each load-rising/load-dropping phase, the reactive scheme is competitive with competitive ratio $c \in \Theta(\ln P)$
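The slide states the guarantee without defining it. For reference, the standard definition of competitiveness for a maximization problem, such as the one on the model slide, is given below; we assume the slides use this form, with ALG the online reactive scheme and OPT an offline scheme that knows the whole load sequence in advance.

```latex
% ALG: online reactive scheme; OPT: offline scheme with full knowledge
% of the load sequence. Both are scored by the same objective.
\mathrm{OPT}(\sigma) \;\le\; c \cdot \mathrm{ALG}(\sigma)
\qquad \text{for every load sequence } \sigma = (l_1, l_2, \dots),\ 1 \le l_t \le P
```

A ratio that grows only logarithmically in P means that, whatever the load sequence, the online choice loses at most a factor of order ln P against hindsight.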
10 Algorithm
- The algorithm guarantees mutual exclusion and livelock freedom. Its space complexity is log(P).
[Figure: processors contending for the lock through the interconnection network]
11 Evaluation
- Benchmarks
- Spark98 kernel: lmv
- SPLASH-2 suite: Volrend and Radiosity
- Representative locks for comparison
- Arbitrating: ticket lock with (tuned) proportional backoff
- Non-arbitrating: test-and-test-and-set lock with (tuned) exponential backoff
- System
- A ccNUMA SGI Origin2000 with 28 250 MHz MIPS R10000 processors
12 Experimental results
13 Experimental results (2)
14 Experimental results (3)
15 Conclusions
- We have designed and implemented a new reactive spin-lock
- It is self-tuning
- It combines the advantages of both arbitrating and non-arbitrating locks
- Its reactive scheme is competitive with $c \in \Theta(\ln P)$
- ⇒ The lock automatically adjusts its backoff delay to the load on the lock as well as to the application
16 Thanks for your attention!
17 Estimating the delay bases
- Fairness
- A fair lock helps a parallel application gain performance, since the application's threads can execute their non-critical sections in parallel
- Definition
- Heuristic to estimate base_l
- where n_i is the number of lock acquisitions by processor i during Δt, and N is the number of processors
- where a and b are system-documented constants and DoCS is the delay outside the critical section (CS)
18 NUMA
- NUMA is another factor that makes the problem harder
- Latencies differ greatly
- E.g. ccNUMA SGI Origin2000
19 Model: an online problem
- A sequence of loads on the lock unfolds on the fly
- When observing a load, the algorithm must decide how much its current backoff delay should be lengthened
- If it increases the delay too soon, it wastes time in a long delay when the lock becomes available
- If it does not increase the delay in time, it causes high contention on the lock
- ⇒ It must increase the delay reasonably at high loads
- ⇒ Goal: maximize $\sum_t \Delta\mathit{delay}_t \cdot \mathit{load}_t$, where $\sum_t \Delta\mathit{delay}_t \le P$
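The objective can be made concrete with a tiny evaluator. The function name `model_objective` is hypothetical; it scores a given schedule of delay increments against the observed loads and rejects schedules that exceed the budget P. It assumes increments are non-negative, since the delay is only lengthened.

```c
#include <stddef.h>

/* Hypothetical helper for the model slide: given observed loads load_t
   and the chosen delay increments ddelay_t (assumed non-negative),
   return sum_t ddelay_t * load_t, or -1.0 if sum_t ddelay_t exceeds P. */
static double model_objective(const double *load, const double *ddelay,
                              size_t n, double P) {
    double gain = 0.0, spent = 0.0;
    for (size_t t = 0; t < n; t++) {
        spent += ddelay[t];          /* budget used so far */
        gain  += ddelay[t] * load[t]; /* delay added at a high load counts more */
    }
    return spent <= P ? gain : -1.0;
}
```

With loads (1, 4, 8) and P = 8, spending the whole budget at the first observation scores 8, while the schedule (0, 2, 6) scores 56. The better schedule is only knowable in hindsight, which is what makes the problem online.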
20 Algorithm
- LockType = <lock, counter>
- Initial delay = L.counter × base_l
- The algorithm guarantees mutual exclusion and livelock freedom. Its space complexity is log(P).
- Acquire(Lock pL)
    L ← FAA(pL.L, <1,1>)
    if L.lock then
        delay ← ComputeDelay(L)
        cond ← <1,0>
        do
            sleep(delay)
            L ← pL.L
            if L.lock then
                delay ← ComputeDelay(L)
                continue
            cond ← FAA(pL.L, <1,0>)
        while cond.lock
- Release(Lock pL)
    do L ← pL.L
    while not CAS(pL.L, L, <0, L.counter - 1>)
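A C11 sketch of the Acquire/Release pseudocode above, assuming the pair <lock, counter> is packed into one 32-bit atomic word with a 16/16 bit split. The names `reactive_lock`, `compute_delay`, `backoff` and `base_delay` are hypothetical; `compute_delay` implements only the initial rule (counter × base), not the self-tuning adjustment, and `backoff` stands in for `sleep(delay)` with a busy loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed packing of <lock, counter>: low 16 bits = lock field,
   high 16 bits = number of processors competing for or holding the lock. */
#define LOCK_ONE    UINT32_C(0x00000001)   /* <1,0> */
#define COUNTER_ONE UINT32_C(0x00010000)   /* <0,1> */
#define LOCK_FIELD(w)    ((w) & 0xFFFFu)
#define COUNTER_FIELD(w) ((w) >> 16)

typedef struct { _Atomic uint32_t word; } reactive_lock;

/* Hypothetical constant standing in for base_l. */
static const uint32_t base_delay = 50;

/* Hypothetical ComputeDelay: only the initial rule delay = counter * base. */
static uint32_t compute_delay(uint32_t w) {
    return COUNTER_FIELD(w) * base_delay;
}

/* Stand-in for sleep(delay): a busy loop the compiler may not remove. */
static void backoff(uint32_t iterations) {
    for (volatile uint32_t i = 0; i < iterations; i++) { }
}

static void reactive_init(reactive_lock *pl) { atomic_init(&pl->word, 0); }

static void reactive_acquire(reactive_lock *pl) {
    /* L <- FAA(pL.L, <1,1>): register as a competitor and try the lock. */
    uint32_t w = atomic_fetch_add_explicit(&pl->word, LOCK_ONE | COUNTER_ONE,
                                           memory_order_acquire);
    if (LOCK_FIELD(w) == 0)
        return;                          /* lock was free: we own it */
    uint32_t delay = compute_delay(w);
    uint32_t cond = LOCK_ONE;            /* cond <- <1,0> */
    do {
        backoff(delay);
        w = atomic_load_explicit(&pl->word, memory_order_relaxed);
        if (LOCK_FIELD(w) != 0) {        /* still held: re-estimate the delay */
            delay = compute_delay(w);
            continue;                    /* goes to the while test; cond.lock is set */
        }
        /* Looks free: one attempt with FAA(pL.L, <1,0>). */
        cond = atomic_fetch_add_explicit(&pl->word, LOCK_ONE, memory_order_acquire);
    } while (LOCK_FIELD(cond) != 0);
}

static void reactive_release(reactive_lock *pl) {
    uint32_t w = atomic_load_explicit(&pl->word, memory_order_relaxed);
    uint32_t next;
    do {
        assert(LOCK_FIELD(w) != 0 && COUNTER_FIELD(w) > 0); /* caller holds it */
        next = (COUNTER_FIELD(w) - 1) << 16;                /* <0, counter-1> */
    } while (!atomic_compare_exchange_weak_explicit(&pl->word, &w, next,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

Each thread brackets its critical section with `reactive_acquire(&l)` and `reactive_release(&l)`. Because every competitor registers in the counter with the same fetch-and-add that tries the lock, a waiter learns the current load on the lock without any extra shared variable, and that value is what feeds the delay estimate.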