Title: Spin Locks and Contention
1Spin Locks and Contention
- Companion slides for
- The Art of Multiprocessor Programming
- by Maurice Herlihy and Nir Shavit
2Kinds of Architectures
- SISD (Uniprocessor)
- Single instruction stream
- Single data stream
- SIMD (Vector)
- Single instruction
- Multiple data
- MIMD (Multiprocessors)
- Multiple instruction
- Multiple data.
3Kinds of Architectures
- MIMD (Multiprocessors): our space
4MIMD Architectures
[Figure: two MIMD memory organizations, shared bus and distributed]
- Memory Contention
- Communication Contention
- Communication Latency
5What Should You Do If You Can't Get a Lock?
- Keep trying
- spin or busy-wait
- Good if delays are short
- Give up the processor
- Good if delays are long
- Always good on uniprocessor
6What Should You Do If You Can't Get a Lock?
- Keep trying (spin or busy-wait): our focus
7Basic Spin-Lock
[Figure: threads wait at a spin lock; one at a time enters the critical section (CS), which resets the lock upon exit]
8Basic Spin-Lock
lock introduces sequential bottleneck
10Basic Spin-Lock
Seq Bottleneck → no parallelism
11Basic Spin-Lock
Contention → ???
12Test-and-Set
- Boolean value
- Test-and-set (TAS)
- Swap true with current value
- Return value tells if prior value was true or false
- Can reset just by writing false
- TAS aka getAndSet
13Test-and-Set
public class AtomicBoolean {
  boolean value;

  public synchronized boolean getAndSet(boolean newValue) {
    boolean prior = value;
    value = newValue;
    return prior;
  }
}
14Test-and-Set
Package java.util.concurrent.atomic
15Test-and-Set
Swap old and new values
16Test-and-Set
AtomicBoolean lock = new AtomicBoolean(false);
boolean prior = lock.getAndSet(true);
17Test-and-Set
Swapping in true is called test-and-set or TAS
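A minimal sketch of what two consecutive test-and-set calls return, using the real java.util.concurrent.atomic.AtomicBoolean. The class name TasDemo and the helper twoCalls are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TasDemo {
    // Hypothetical helper: performs two TAS calls on a fresh flag
    // and returns the prior value each call observed.
    static boolean[] twoCalls() {
        AtomicBoolean lock = new AtomicBoolean(false);
        boolean first = lock.getAndSet(true);   // prior value false: the flag was clear
        boolean second = lock.getAndSet(true);  // prior value true: the flag was already set
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        boolean[] r = twoCalls();
        System.out.println(r[0] + " " + r[1]);  // prints "false true"
    }
}
```

Only the first caller sees false, which is what lets TAS decide a single winner.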
18Test-and-Set Locks
- Locking
- Lock is free: value is false
- Lock is taken: value is true
- Acquire lock by calling TAS
- If result is false, you win
- If result is true, you lose
- Release lock by writing false
19Test-and-set Lock
class TASlock {
  AtomicBoolean state = new AtomicBoolean(false);

  void lock() {
    while (state.getAndSet(true)) {}
  }

  void unlock() {
    state.set(false);
  }
}
20Test-and-set Lock
Lock state is AtomicBoolean
21Test-and-set Lock
Keep trying until lock acquired
22Test-and-set Lock
Release lock by resetting state to false
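To see the lock protect a critical section, here is a small runnable sketch. The wrapper class TASLockDemo, the run helper, and the thread and iteration counts are assumptions made for illustration. Thread.onSpinWait() (Java 9+) is an optional hint that the slides do not use.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TASLockDemo {
    static class TASLock {
        private final AtomicBoolean state = new AtomicBoolean(false);

        void lock() {
            // Keep trying until TAS returns false, meaning we flipped it to true.
            while (state.getAndSet(true)) {
                Thread.onSpinWait(); // optional spin hint, not part of the slides
            }
        }

        void unlock() {
            state.set(false); // release by resetting state to false
        }
    }

    static int counter; // plain int, protected only by the lock

    // Hypothetical helper: each of `threads` threads increments the counter `perThread` times.
    static int run(int threads, int perThread) {
        counter = 0;
        TASLock lock = new TASLock();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter++; // critical section
                    } finally {
                        lock.unlock();
                    }
                }
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000)); // 400000 when mutual exclusion holds
    }
}
```

Without the lock, the unsynchronized counter++ would typically lose updates, so the exact total is a quick check that the lock works.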
23Performance
- Experiment
- n threads
- Increment shared counter 1 million times
- How long should it take?
- How long does it take?
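The experiment can be sketched roughly as follows. This assumes the one million increments are a fixed total split evenly across the threads; the slide does not say whether the count is total or per thread. CounterExperiment and timeRun are hypothetical names. Timings depend on the machine, so no expected numbers are shown.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CounterExperiment {
    static final AtomicBoolean state = new AtomicBoolean(false); // TAS lock state
    static long counter;                                         // shared counter

    // Runs `threads` threads that together perform `total` locked increments.
    // Returns elapsed wall-clock time in nanoseconds.
    // Assumes `total` is divisible by `threads`.
    static long timeRun(int threads, int total) {
        counter = 0;
        final int share = total / threads;
        Thread[] ts = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < share; j++) {
                    while (state.getAndSet(true)) {} // acquire: spin on TAS
                    counter++;                       // critical section
                    state.set(false);                // release
                }
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 1, 2, 4, 8 }) {
            long ns = timeRun(n, 1_000_000);
            System.out.println(n + " threads: " + ns / 1_000_000 + " ms");
        }
    }
}
```

Because the total work is fixed and every increment is serialized by the lock, the best possible outcome is a flat line as threads are added.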
24Graph
[Graph: time vs. threads, showing the ideal curve]
no speedup because of sequential bottleneck
25Mystery 1
[Graph: time vs. threads, TAS lock compared with ideal]
What is going on?
26Bus-Based Architectures
[Figure: three processor caches and memory connected by a shared bus]
27Bus-Based Architectures
Random access memory (10s of cycles)
28Bus-Based Architectures
- Shared Bus
- Broadcast medium
- One broadcaster at a time
- Processors and memory all snoop
29Bus-Based Architectures
- Per-Processor Caches
- Small
- Fast: 1 or 2 cycles
- Address and state information
30Jargon Watch
- Cache hit
- I found what I wanted in my cache
- Good Thing
31Jargon Watch
- Cache miss
- I had to shlep all the way to memory for that data
- Bad Thing
32Processor Issues Load Request
[Figure: three caches and memory on a shared bus; the data is in memory]
33Processor Issues Load Request
"Gimme data"
34Memory Responds
"Got your data right here"
35Processor Issues Load Request
[Figure: one cache already holds a copy of the data]
37Processor Issues Load Request
"I got data"
38Other Processor Responds
"I got data"
39Other Processor Responds
[Figure: the requesting cache now holds the data too]
40Modify Cached Data
[Figure: two caches hold copies of the data; one copy is modified]
43Modify Cached Data
What's up with the other copies?
44Cache Coherence
- We have lots of copies of data
- Original copy in memory
- Cached copies at processors
- Some processor modifies its own copy
- What do we do with the others?
- How to avoid confusion?
45Write-Back Caches
- Accumulate changes in cache
- Write back when needed
- Need the cache for something else
- Another processor wants it
- On first modification
- Invalidate other entries
- Requires non-trivial protocol
46Write-Back Caches
- Cache entry has three states
- Invalid: contains raw seething bits
- Valid: I can read but I can't write
- Dirty: data has been modified
- Intercept other load requests
- Write back to memory before using cache
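The three states can be made concrete with a toy model. This is a deliberately simplified sketch, not a real coherence protocol: it tracks one memory word, three caches, and a count of bus messages. All names (WriteBackSim, load, store, busMessages) are invented for illustration.

```java
import java.util.Arrays;

class WriteBackSim {
    enum State { INVALID, VALID, DIRTY }

    static final int N = 3;                        // number of caches (arbitrary)
    static final State[] state = new State[N];
    static final int[] cached = new int[N];
    static int memory;                             // the single shared word
    static int busMessages;                        // bus transactions so far

    static void reset() {
        Arrays.fill(state, State.INVALID);
        Arrays.fill(cached, 0);
        memory = 0;
        busMessages = 0;
    }

    static int load(int p) {
        if (state[p] != State.INVALID) return cached[p]; // cache hit, no bus traffic
        busMessages++;                                   // miss: broadcast a load request
        for (int q = 0; q < N; q++) {
            if (state[q] == State.DIRTY) {               // owner intercepts the request
                memory = cached[q];                      // and writes back first
                state[q] = State.VALID;
            }
        }
        cached[p] = memory;
        state[p] = State.VALID;
        return cached[p];
    }

    static void store(int p, int v) {
        if (state[p] != State.DIRTY) {                   // first modification
            busMessages++;                               // broadcast an invalidation
            for (int q = 0; q < N; q++) {
                if (q == p) continue;
                if (state[q] == State.DIRTY) memory = cached[q];
                state[q] = State.INVALID;                // others lose read permission
            }
            state[p] = State.DIRTY;                      // this cache may now write
        }
        cached[p] = v;                                   // change stays in the cache
    }

    public static void main(String[] args) {
        reset();
        load(0);
        load(1);
        store(0, 42);
        System.out.println(state[1] + " " + memory);                 // INVALID 0
        System.out.println(load(1) + " " + memory + " " + state[0]); // 42 42 VALID
        System.out.println(busMessages);                             // 4
    }
}
```

After the store, memory still holds the stale 0; it is only updated when another cache asks for the line.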
47Invalidate
[Figure: two caches hold copies of the data; memory holds the data]
48Invalidate
"Mine, all mine!"
49Invalidate
"Uh, oh"
50Invalidate
Other caches lose read permission
51Invalidate
Other caches lose read permission
This cache acquires write permission
52Invalidate
Memory provides data only if not present in any cache, so no need to change it now (expensive)
53Another Processor Asks for Data
[Figure: the request goes out on the bus]
54Owner Responds
[Figure: the owning cache supplies the data]
55End of the Day
[Figure: all three caches hold the data]
Reading OK, no writing
56Mutual Exclusion
- What do we want to optimize?
- Bus bandwidth used by spinning threads
- Release/Acquire latency
- Acquire latency for idle lock
57Simple TASLock
- TAS invalidates cache lines
- Spinners
- Miss in cache
- Go to bus
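A back-of-the-envelope model of why this hurts. It assumes the lock stays held for the whole run and that spinners take turns in strict round-robin order. Every TAS is a write, so a spinner needs the line in its own cache with write permission. TasTrafficSim and busTraffic are hypothetical names.

```java
class TasTrafficSim {
    // Counts bus transactions when `spinners` threads each issue `rounds` TAS calls
    // on a lock that is never released (simplifying assumption).
    static long busTraffic(int spinners, int rounds) {
        int owner = -1;   // index of the cache holding the line with write permission
        long bus = 0;
        for (int r = 0; r < rounds; r++) {
            for (int p = 0; p < spinners; p++) {
                if (owner != p) {
                    bus++;        // miss in cache: go to the bus, invalidate the other copy
                    owner = p;
                }
                // The lock is held, so TAS returns true and p keeps spinning.
            }
        }
        return bus;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 1, 2, 4, 8 }) {
            System.out.println(n + " spinners: " + busTraffic(n, 1000) + " bus transactions");
        }
        // prints 1, 2000, 4000 and 8000 transactions respectively
    }
}
```

With a single spinner the line stays in its cache after the first miss. With two or more, every TAS steals the line from the previous spinner, so every call goes to the bus.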
58NUMA Architectures
- Acronym
- Non-Uniform Memory Architecture
- Illusion
- Flat shared memory
- Truth
- No caches (sometimes)
- Some memory regions faster than others
59NUMA Machines
Spinning on local memory is fast
60NUMA Machines
Spinning on remote memory is slow