RTR: 1 Byte/Kilo-Instruction Race Recording - PowerPoint PPT Presentation



1
RTR: 1 Byte/Kilo-Instruction Race Recording
  • Min Xu
  • Rastislav Bodik
  • Mark D. Hill
2
Why Do You Need a Recorder?
  • gcc sim.c
  • a.out
  • Segmentation fault

gdb a.out
gdb> run
Program received SIGSEGV.
In get() at hash.c:45
45  a = bucket->d

gdb a.out
gdb> run
Program exited normally.
gdb>

gcc para-sim.c
a.out
Segmentation fault

gdb a.out log
gdb> run
Program received SIGSEGV.
In get() at para-hash.c:67
67  a = bucket->d

gcc para-sim.c
a.out
Segmentation fault
Race recorded in log
3
Ideally
Long recording, small log
Low runtime overhead
Low cost

gdb a.out log
gdb> run
Program received SIGSEGV.
In get() at para-hash.c:67
67  a = bucket->d

gcc para-sim.c
a.out
Segmentation fault
Race recorded in log
4
Better and Better Recorders
5
A New Recorder
1 Byte/Kilo-Instruction (ASPLOS '06)
  • This talk covers only RTR
  • Regulated Transitive Reduction algorithm

Result: One more step toward practical
6
Outline
Race Recording
RTR Algorithm
Compress log during recording → replay more regularly
Results with Commercial Workloads
Conclusion
7
Technically, what's race recording?
8
Race Recording
Thread I
Thread J
Thread I
Thread J
X=1 X++ print(X)
- - - X=X*5 -
X=1 X++ print(X)
- X=X*5 - -
Original
Replay
X=6
X=10
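
The two outcomes on this slide can be reproduced with a small sketch. It assumes the slide's fragments stand for X = 1, X++, print(X) in Thread I and X = X*5 in Thread J, a reading that is consistent with the printed values 6 and 10. The step names (I1, I2, I3, J1) and the run() helper are made up for illustration.

```python
def run(schedule):
    """Apply the named steps to a shared X in the given order; return what Thread I prints."""
    state = {"X": 0, "out": None}
    steps = {
        "I1": lambda s: s.update(X=1),           # Thread I: X = 1
        "I2": lambda s: s.update(X=s["X"] + 1),  # Thread I: X++
        "I3": lambda s: s.update(out=s["X"]),    # Thread I: print(X)
        "J1": lambda s: s.update(X=s["X"] * 5),  # Thread J: X = X * 5
    }
    for name in schedule:
        steps[name](state)
    return state["out"]

# Thread J's multiply lands before the increment: (1 * 5) + 1
print(run(["I1", "J1", "I2", "I3"]))  # 6
# Thread J's multiply lands after the increment: (1 + 1) * 5
print(run(["I1", "I2", "J1", "I3"]))  # 10
```

Same program, same input, different interleaving, different output. A race recorder logs enough ordering information to force the replay onto the original interleaving.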
9
Terminologies and Assumptions
Dependence (black)
Conflicts (red)
[Diagram: instruction streams of Thread I and Thread J (ld A, add, st B, st C, st C, ld B, ld D, st A, sub, st C, ld B, st D), shown once under "Recording" with a "Log" and once under "Replay"]
Goal: Reproduce the same conflicts with minimum log data
10
Regulated Transitive Reduction (RTR)
11
Log All Conflicts
Thread I
Thread J
ld A
add
st B
st C
st C
ld B
st A
ld D
sub
st C
ld B
st D
Replay
But too many conflicts
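
As a rough sketch of what "log all conflicts" means, the function below scans a globally ordered trace and emits one edge per conflicting pair of accesses (same address, different threads, at least one store). The trace format, the function name, and the example interleaving are assumptions made for illustration; the slide does not say which instruction belongs to which thread.

```python
def find_conflicts(trace):
    """trace: list of (thread, op, addr) in global order, op is "ld" or "st".
    Returns (earlier_index, later_index) pairs that conflict across threads."""
    last_store = {}   # addr -> index of the most recent store
    loads_since = {}  # addr -> indices of loads since that store
    edges = []
    for i, (thread, op, addr) in enumerate(trace):
        prev = last_store.get(addr)
        if prev is not None and trace[prev][0] != thread:
            edges.append((prev, i))  # read-after-write or write-after-write
        if op == "ld":
            loads_since.setdefault(addr, []).append(i)
        else:
            for ld in loads_since.get(addr, []):
                if trace[ld][0] != thread:
                    edges.append((ld, i))  # write-after-read
            last_store[addr] = i
            loads_since[addr] = []
    return edges

# Hypothetical interleaving of a few of the slide's instructions.
trace = [
    ("I", "ld", "A"),
    ("I", "st", "B"),
    ("J", "ld", "B"),
    ("J", "st", "A"),
    ("I", "st", "C"),
    ("J", "st", "C"),
]
print(find_conflicts(trace))  # [(1, 2), (0, 3), (4, 5)]
```

Even this six-access trace produces three log entries, which is the slide's point: logging every conflict directly does not scale.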
12
Netzer's Transitive Reduction (TR)
[Diagram: the Thread I / Thread J instruction streams (ld A, add, st B, st C, st C, ld B, st A, ld D, sub, st C, ld B, st D), with positions numbered 1 to 6 in each thread; labeled "TR reduced" and "Replay"]
How to further reduce log size?
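
The idea behind transitive reduction can be sketched as follows: a dependence need not be logged if an already-logged dependence, combined with program order, implies it. This simplified version only catches implication between the same pair of threads and ignores chains through a third thread; the data layout and names are assumptions, not the recorder's actual design.

```python
def transitive_reduce(deps):
    """deps: list of ((src_thread, src_count), (dst_thread, dst_count)),
    ordered so that each destination thread's counts never decrease.
    Returns the dependences that still have to be logged."""
    newest_src = {}  # (src_thread, dst_thread) -> largest source count already logged
    log = []
    for (src_t, src_c), (dst_t, dst_c) in deps:
        if newest_src.get((src_t, dst_t), 0) >= src_c:
            # A later source instruction is already ordered before an earlier
            # destination instruction, so program order implies this one.
            continue
        log.append(((src_t, src_c), (dst_t, dst_c)))
        newest_src[(src_t, dst_t)] = src_c
    return log

deps = [
    (("I", 2), ("J", 3)),
    (("I", 1), ("J", 4)),  # implied: I1 precedes I2, I2 -> J3, J3 precedes J4
    (("I", 3), ("J", 5)),
    (("J", 1), ("I", 4)),
]
print(transitive_reduce(deps))
# [(('I', 2), ('J', 3)), (('I', 3), ('J', 5)), (('J', 1), ('I', 4))]
```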
13
The Intuition of the RTR Algorithm
After Reduction
14
Stricter Dependences to Aid Vectorization
[Diagram: the Thread I / Thread J instruction streams (ld A, add, st B, st C, st C, ld B, st A, ld D), with positions numbered 1 to 4 in each thread; labeled "Replay"]
Fewer dependencies to log
15
Compress Vectorized Dependencies
[Diagram: the Thread I / Thread J instruction streams (ld A, add, st B, st C, st C, ld B, st A, ld D, sub, st C, ld B, st D), with positions numbered 1 to 6 in each thread; labeled "Replay"]
TR → RTR: fewer dependencies, fewer bytes per dependency
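
To illustrate why regular dependencies cost fewer bytes each, the sketch below folds a run of dependences whose source and destination counts advance by the same fixed step into a single tuple. The slides do not give RTR's actual encoding; the (src, dst, stride, length) format and the function names are assumptions chosen for illustration.

```python
def vectorize(deps):
    """deps: list of (src_count, dst_count) between one pair of threads, in order.
    Returns runs as (first_src, first_dst, stride, length)."""
    runs = []
    for src, dst in deps:
        if runs:
            s0, d0, stride, n = runs[-1]
            step_s = src - (s0 + stride * (n - 1))
            step_d = dst - (d0 + stride * (n - 1))
            # Extend the run if both sides advance by the same positive step
            # and that step matches the run's stride (or the run is still length 1).
            if step_s == step_d and step_s > 0 and (n == 1 or step_s == stride):
                runs[-1] = (s0, d0, step_s, n + 1)
                continue
        runs.append((src, dst, 0, 1))
    return runs

def expand(runs):
    """Inverse of vectorize: recover the individual dependences."""
    return [(s0 + k * stride, d0 + k * stride)
            for s0, d0, stride, n in runs for k in range(n)]

deps = [(1, 2), (3, 4), (5, 6), (9, 7)]
runs = vectorize(deps)
print(runs)  # [(1, 2, 2, 3), (9, 7, 0, 1)]
```

Three regular dependences collapse into one entry, while the irregular one still needs its own. Creating slightly stricter dependencies, as the previous slide describes, makes such regular runs more common.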
16
Deadlock Avoidance of RTR
[Diagram: the Thread I / Thread J instruction streams (ld A, add, st B, st C, st C, ld B, st A, ld D, sub, st C, ld B, st D), with positions numbered 1 to 6 in each thread; labeled "Recording"]
Limit the strict dependencies (see paper)
17
Results with Commercial Workloads
18
Full-system Simulation Method
  • Commercial server hardware
  • GEMS: http://www.cs.wisc.edu/gems
  • Full-system (OS + application) executions
  • 4-core CMP (sequentially consistent)
  • 1-way in-order issue, 2 GHz
  • 64 KB I/D L1, 4 MB L2, 64-byte lines, MOSI directory
  • Commercial server software
  • Apache: static web serving
  • SpecJBB: middleware
  • OLTP: TPC-C like
  • Zeus: static web serving

19
Log Size: 1 byte/KI
Less buffer, longer recording, smaller logs
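
To put 1 byte per kilo-instruction in perspective, here is a back-of-the-envelope calculation. It assumes the rate applies to the total instructions of all cores and that each core retires one instruction per cycle (an upper bound for a 1-way in-order core); neither assumption is stated on the slides, and the function names are made up.

```python
def log_bytes(instructions, bytes_per_ki=1.0):
    """Log size in bytes for a given number of executed instructions."""
    return instructions / 1000 * bytes_per_ki

def log_rate_bytes_per_sec(cores, clock_hz, ipc, bytes_per_ki=1.0):
    """Log growth per second for `cores` cores at `clock_hz`, each retiring `ipc` instructions per cycle."""
    return log_bytes(cores * clock_hz * ipc, bytes_per_ki)

# 4 cores at 2 GHz with an assumed IPC of 1.0
rate = log_rate_bytes_per_sec(cores=4, clock_hz=2e9, ipc=1.0)
print(rate / 1e6, "MB/s")  # 8.0 MB/s
```

Under these assumptions the log grows by at most about 8 MB per second of execution; real commercial workloads would typically run well below an IPC of 1, so the actual rate would be lower.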
20
RTR vs. Netzer's TR
Log Size
  • 28% smaller log
  • TR was optimal

TR
RTR
21
Why Does RTR Work Well?
  • RTR
  • Instructions execute at similar speed
  • Dependencies are often vectorizable

22
A New Recorder
  • Less hardware and TSO: not covered
  • Equally important
  • More details in the paper

Less Hardware (ASPLOS '06)
SC TSO (ASPLOS '06)
Result: One more step toward practical
23
Conclusion
  • Race recording → Counter nondeterminism
  • RTR → 1 byte/kilo-instruction
  • Based on Netzer's transitive reduction
  • Create stricter dependencies
  • Vectorize dependencies to compress log
  • Avoid overly strict dependencies, hence no deadlock
  • Future work
  • Support snooping, SMT, replayer