Title: Using Speculation to Simplify Multiprocessor Design
1. Using Speculation to Simplify Multiprocessor Design
- Daniel J. Sorin (1), Milo M. K. Martin (2), Mark D. Hill (3), David A. Wood (3)
- (1) Dept. of Electrical and Computer Engineering, Duke University
- (2) Dept. of Computer and Information Science, Univ. of Pennsylvania
- (3) Computer Sciences Dept., University of Wisconsin-Madison
2. My Talk in One Slide
- Shared memory multiprocessors are complicated
  - Difficult to design for every possible corner case
- Proposal: Use speculation to target the common case
  - Speculate that corner cases won't happen
  - Detect if they do occur and recover system
  - Ensure forward progress
- Case studies
  - Simplify cache coherence protocols
  - Simplify the interconnection network
3. Speculation for Simplicity
- Why we want to avoid complexity
  - Time and money for design and verification
- Design for the common case
  - But we have to make ALL cases work correctly
- Examples of this philosophy in uniprocessors
  - Trapping to software for infrequent/obsolescent instructions
  - Pentium 4 recovers from edge-case scheduler deadlocks
- But this idea hadn't been used for multiprocessors
  - Key: we now have efficient multiprocessor recovery
4. Framework for Speculation
- Four keys to design simplification with speculation
  - Ensure that mis-speculations are rare
  - Detect all mis-speculations
  - Recover from mis-speculations
  - Ensure forward progress even in the worst case
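
The four keys above can be sketched as a small control loop. This is only an illustrative software analogy, not the hardware mechanism from the talk: `run_with_speculation`, `fast_path`, `safe_path`, and `slow_start_len` are hypothetical names, and "recovery" here is simply re-running an item that had no side effects, whereas the real system rolls back to a checkpoint.

```python
class MisSpeculation(Exception):
    """A detector saw a corner case the simplified design does not handle."""


def run_with_speculation(work_items, fast_path, safe_path, slow_start_len=2):
    """Run items on the simplified (fast) design, recovering when it mis-speculates.

    All names are hypothetical. slow_start_len is how many following items
    run in conservative mode after a recovery, to guarantee forward progress.
    """
    log = []             # (mode, result) for every completed item
    recoveries = 0
    slow_remaining = 0   # items still to be run conservatively
    for item in work_items:
        if slow_remaining > 0:
            # Forward progress: the conservative path cannot mis-speculate.
            log.append(("safe", safe_path(item)))
            slow_remaining -= 1
            continue
        try:
            # Common case: speculate that no corner case occurs.
            log.append(("fast", fast_path(item)))
        except MisSpeculation:
            # Detection fired: discard the attempt and redo it conservatively.
            recoveries += 1
            log.append(("safe", safe_path(item)))
            slow_remaining = slow_start_len
    return log, recoveries


def fast_double(x):
    """Toy fast path that does not handle one rare input."""
    if x == 3:
        raise MisSpeculation(f"corner case hit for item {x}")
    return 2 * x


def safe_double(x):
    """Toy conservative path that handles every input."""
    return 2 * x


log, recoveries = run_with_speculation(range(6), fast_double, safe_double)
```

In this toy run the fast path handles the first three items, the fourth triggers one recovery, and the remaining two run in slow-start mode.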
5. SafetyNet Checkpoint/Recovery
- We use SafetyNet [ISCA 2002] for system recovery
  - All-hardware checkpoint/recovery for shared memory multiprocessors
- Periodically takes logical checkpoints of the system
  - Including caches, coherence state, memory, directory state
- Implements checkpointing with incremental logging
  - Consistent checkpoints using logical time coordination
  - Can recover 100,000 cycles
- Negligible performance impact
  - Incremental logging performed off the critical path
  - Small log buffers (512 KB) at caches and memories
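
As a rough software analogy for checkpointing with incremental logging, the sketch below keeps an undo log of old values, written only on the first update to an address in each checkpoint interval. It illustrates undo logging in general, under the assumption that this is the flavor of logging meant here; it is not SafetyNet's hardware design, and `LoggedMemory` and its methods are hypothetical names.

```python
_ABSENT = object()  # marks "address had no value before this write"


class LoggedMemory:
    """Toy memory with checkpoints implemented by incremental undo logging."""

    def __init__(self):
        self.data = {}
        self.checkpoint_id = 0   # id of the most recent checkpoint
        self.log = []            # (interval id, address, old value)
        self.logged = set()      # addresses already logged in this interval

    def write(self, addr, value):
        # Log the old value only on the first write in the current interval.
        if addr not in self.logged:
            old = self.data.get(addr, _ABSENT)
            self.log.append((self.checkpoint_id, addr, old))
            self.logged.add(addr)
        self.data[addr] = value

    def take_checkpoint(self):
        # A checkpoint is just a new interval id; no state is copied.
        self.checkpoint_id += 1
        self.logged.clear()
        return self.checkpoint_id

    def recover_to(self, checkpoint_id):
        # Undo, newest first, every write made since that checkpoint.
        while self.log and self.log[-1][0] >= checkpoint_id:
            _, addr, old = self.log.pop()
            if old is _ABSENT:
                self.data.pop(addr, None)
            else:
                self.data[addr] = old
        self.checkpoint_id = checkpoint_id
        self.logged.clear()


mem = LoggedMemory()
mem.write("a", 1)
c1 = mem.take_checkpoint()
mem.write("a", 2)
mem.write("b", 7)
mem.write("a", 3)          # not logged again within the same interval
c2 = mem.take_checkpoint()
mem.write("a", 4)
mem.recover_to(c1)         # rolls back both later intervals
```

Taking a checkpoint copies nothing, which is why the logging work can stay small and off the critical path of normal writes.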
6. The Need for Multiprocessor Recovery
- Assumption: multiprocessors will have system-wide recovery mechanisms for purposes of availability
  - As fault rates keep increasing, recovery is crucial
  - Will be all-hardware (like SafetyNet) for performance
  - But many alternative designs are possible
- We leverage this recovery mechanism for recovering from mis-speculations
7. Outline
- A Framework for Speculation
- Simplifying Cache Coherence Protocols
- Simplifying the Interconnection Network
- Evaluation
- Conclusions
8. Directory Protocol Complexity
- We want adaptive routing in the interconnection network
  - Better performance and availability
- But adaptive routing precludes point-to-point ordering
  - So what?
- Point-to-point ordering simplifies protocol design
  - Eliminates several potential corner-case races
9. Race Case in Directory Protocol
- Example race if no point-to-point ordering in network
[Figure: message diagram among Dir, P1, and P2 showing RequestReadWrite, Forwarded RequestReadWrite, and Writeback]
- RequestReadWrite arrives first at Dir, gets forwarded to P1
10. Race Case in Directory Protocol
[Figure: message diagram among Dir, P1, and P2 showing RequestReadWrite, Forwarded RequestReadWrite, Writeback, and Writeback Ack]
- Forwarded RequestReadWrite arrives after Writeback Ack
11. Race Case in Directory Protocol
- Problem: P1 sees Forwarded Request in state Invalid
[Figure: message diagram among Dir, P1, and P2 showing RequestReadWrite, Forwarded RequestReadWrite, Writeback, and Writeback Ack]
- Not possible if point-to-point order in interconnection network
12. Simplifying a Directory Protocol
- Speculate that the adaptive network provides ordering
- Why is mis-speculation rare?
  - Not many re-orderings
  - Most re-orderings don't matter!
- How do we detect all mis-speculations?
  - If we get a Forwarded RequestReadWrite in state Invalid
- How do we recover?
  - SafetyNet
- How do we ensure forward progress?
  - Slow-start operation for a while after recovery
  - Guarantees that this race can't keep recurring
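
The detection rule for this race can be sketched as a tiny cache-controller state machine for P1. This is a heavily simplified assumption, not the protocol from the talk: the `WB_PENDING` state, the `CacheBlock` class, and the `deliver` helper are hypothetical, and only the two messages involved in the race are modeled.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    WB_PENDING = "WB"   # Writeback sent, Writeback Ack not yet received
    INVALID = "I"


class MisSpeculation(Exception):
    """The network delivered messages in an order the protocol does not handle."""


class CacheBlock:
    """One block at P1, starting out as the owner in state Modified."""

    def __init__(self):
        self.state = State.MODIFIED

    def evict(self):
        if self.state is not State.MODIFIED:
            raise ValueError("only a Modified block is written back in this sketch")
        self.state = State.WB_PENDING
        return "Writeback"

    def receive(self, message):
        if message == "Writeback Ack":
            if self.state is State.WB_PENDING:
                self.state = State.INVALID
            return None
        if message == "Forwarded RequestReadWrite":
            if self.state is State.INVALID:
                # The unhandled corner case: report it instead of handling it.
                raise MisSpeculation("Forwarded RequestReadWrite in state Invalid")
            if self.state is State.MODIFIED:
                self.state = State.INVALID
            return "Data"   # while WB_PENDING, keep waiting for the ack
        raise ValueError(f"unknown message: {message}")


def deliver(messages):
    """Evict the block, then deliver messages to P1 in the given order."""
    block = CacheBlock()
    block.evict()
    try:
        for message in messages:
            block.receive(message)
    except MisSpeculation:
        return "mis-speculation"
    return block.state.name


in_order = deliver(["Forwarded RequestReadWrite", "Writeback Ack"])
reordered = deliver(["Writeback Ack", "Forwarded RequestReadWrite"])
```

With in-order delivery the block ends up Invalid as expected; when the Writeback Ack overtakes the Forwarded RequestReadWrite, the controller signals a mis-speculation instead of needing a dedicated transition for the race.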
13. Simplifying a Snooping Coherence Protocol
- During design, we missed a corner case
[Figure: state diagram with states M, trans1, trans2, and an unhandled case (???); messages shown: Request ReadWrite, Request ReadWrite, Writeback]
- Solution: it's rare, treat it as mis-speculation
  - Detect by seeing RequestReadWrite in state trans2
  - Recovery with SafetyNet
  - Forward progress with slow-start after recovery
14. Outline
- A Framework for Speculation
- Simplifying Cache Coherence Protocols
- Simplifying the Interconnection Network
  - Deadlock
  - Avoiding deadlock
- Evaluation
- Conclusions
15. Two Causes of Deadlock
[Figure, two panels. Endpoint Deadlock: P1 and P2, each with a buffer full of requests and a Response. Switch Deadlock: switch1 and switch2, each with a buffer full of messages, and Messages M1 and M2.]
16. Avoiding Deadlock
- Simple but wasteful solution: full buffering
  - But it's rare that we ever need full buffering
- More efficient solution: virtual channels (networks)
  - For endpoint deadlock
    - Need a virtual network per type of message
  - For switch deadlock
    - Need some number of virtual channels per virtual network
    - Depends on network topology and routing scheme
  - A major source of design complexity
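
To illustrate why a virtual network per message type addresses endpoint deadlock, the sketch below contrasts one shared input buffer with per-type buffers. It is a toy model under stated assumptions: `Endpoint`, `try_enqueue`, and the two message types are hypothetical, and real virtual networks involve far more than separate queues.

```python
from collections import deque


class Endpoint:
    """Toy endpoint with finite input buffering."""

    def __init__(self, capacity, message_types, virtual_networks):
        self.capacity = capacity
        if virtual_networks:
            # One independent queue (virtual network) per message type.
            self.queues = {t: deque() for t in message_types}
        else:
            # A single queue shared by every message type.
            shared = deque()
            self.queues = {t: shared for t in message_types}

    def try_enqueue(self, msg_type, payload):
        """Return True if the message was accepted, False if it must wait."""
        queue = self.queues[msg_type]
        if len(queue) >= self.capacity:
            return False
        queue.append((msg_type, payload))
        return True


TYPES = ("request", "response")

shared = Endpoint(capacity=2, message_types=TYPES, virtual_networks=False)
shared.try_enqueue("request", "r1")
shared.try_enqueue("request", "r2")
shared_accepts_response = shared.try_enqueue("response", "d1")

separate = Endpoint(capacity=2, message_types=TYPES, virtual_networks=True)
separate.try_enqueue("request", "r1")
separate.try_enqueue("request", "r2")
separate_accepts_response = separate.try_enqueue("response", "d1")
```

With a shared buffer, the response is stuck behind requests that may themselves be waiting on it. With per-type buffers the response is still accepted, at the cost of extra buffering and design effort.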
17. Simplifying Deadlock Avoidance
- Speculate that deadlock won't occur, despite using less than full buffering and no virtual channels
- Why is mis-speculation rare?
  - Can usually avoid deadlock with reasonable buffering
- How do we detect all mis-speculations?
  - Timeout mechanism for cache coherence transactions
- How do we recover?
  - SafetyNet
- How do we ensure forward progress?
  - Slow-start operation for a while after recovery
  - Guarantees that deadlock can't keep recurring
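
A timeout-based detector for coherence transactions might look like the following sketch. The class name `TransactionWatchdog`, its methods, and the cycle counts are all hypothetical; the talk does not give the actual timeout value or mechanism.

```python
class TransactionWatchdog:
    """Flags coherence transactions that have been outstanding for too long."""

    def __init__(self, timeout_cycles):
        self.timeout_cycles = timeout_cycles
        self.issued_at = {}   # transaction id -> cycle at which it was issued

    def issue(self, txn_id, now):
        self.issued_at[txn_id] = now

    def complete(self, txn_id):
        self.issued_at.pop(txn_id, None)

    def timed_out(self, now):
        """Return ids of transactions outstanding longer than the timeout."""
        return sorted(
            txn_id
            for txn_id, start in self.issued_at.items()
            if now - start > self.timeout_cycles
        )


watchdog = TransactionWatchdog(timeout_cycles=1000)   # hypothetical timeout
watchdog.issue(1, now=0)
watchdog.issue(2, now=100)
watchdog.complete(1)                 # transaction 1 finished normally
early = watchdog.timed_out(now=900)
late = watchdog.timed_out(now=1200)  # transaction 2 is presumed deadlocked
```

A timeout cannot tell a true deadlock from a very slow transaction, so a short timeout trades some needless recoveries for faster detection.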
18. Outline
- A Framework for Speculation
- Simplifying Cache Coherence Protocols
- Simplifying the Interconnection Network
- Evaluation
  - Goals
  - Methodology
  - Results
- Conclusions
19. Goals
- Discover the point at which mis-speculation recoveries impact performance
  - Determines whether our simplified snooping protocol and our simplified interconnection network are viable
- Determine whether our simplified directory protocol can usefully speculate on point-to-point ordering
20. Methodology
- Full-system simulation
  - Simics provides full-system functionality
  - We added a detailed timing model for the memory system
- Workloads
  - Online transaction processing (OLTP) with DB2
  - SPECjbb2000 Java middleware
  - Apache static web serving
  - Slashcode dynamic web serving
  - Barnes-Hut scientific simulation
21. How Rare Must Mis-speculation Be?
We can tolerate high mis-speculation rates; these rates are much higher than what our simplified designs incur.
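
A back-of-the-envelope model gives a feel for why fairly frequent recoveries can be tolerable. Every number below is a hypothetical placeholder, not a result from the talk: the sketch assumes each recovery costs about 100,000 cycles (borrowing the SafetyNet recovery window as a stand-in for lost work), a 1 GHz clock, and 10 recoveries per second.

```python
def recovery_overhead(recoveries_per_second, cycles_lost_per_recovery, clock_hz):
    """Fraction of execution time spent on work lost to recoveries."""
    return recoveries_per_second * cycles_lost_per_recovery / clock_hz


# Hypothetical inputs, chosen only to illustrate the arithmetic.
overhead = recovery_overhead(
    recoveries_per_second=10,
    cycles_lost_per_recovery=100_000,
    clock_hz=1_000_000_000,
)
```

Under these assumed inputs the overhead is 0.001, that is, about 0.1% of execution time; the real threshold depends on the measured recovery cost and workload.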
22. Adaptive Routing with Speculative Ordering
Adaptive routing can provide better performance by routing around congestion, even with mis-speculations.
23. Conclusions
- Simplify multiprocessor design with speculation
  - Treat corner cases as mis-speculations and recover from them
- Must be able to ensure that
  - Mis-speculations are sufficiently rare
  - We can detect all mis-speculations
  - We can recover from mis-speculations
  - We can provide forward progress in all cases
- Showed how to simplify
  - Cache coherence protocols
  - Interconnection network deadlock avoidance
- Applicable to other complicated designs