Title: SigRace: Signature-Based Data Race Detection
1SigRace: Signature-Based Data Race Detection
- Abdullah Muzahid, Dario Suarez, Shanxiang Qi, Josep Torrellas
- Computer Science Department, University of Illinois at Urbana-Champaign, http://iacoma.cs.uiuc.edu
- Universidad de Zaragoza, Spain
2Debugging Multithreaded Programs
- "Debugging a multithreaded program has a lot in common with medieval torture methods"
- -- Random quote found via Google search
3Data Race
- Two threads access the same variable without intervening synchronization, and at least one access is a write
- Hard to detect and reproduce
6Dynamic Data Race Detection
- Mainly two approaches
- Lockset: finds violations of locking discipline
- Happened-Before: finds concurrent conflicting accesses
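23Happened-Before Approach
[Figure: timelines of Thread 0 and Thread 1 annotated with vector timestamps]
- Thread 0: epoch a (0, 0), Lock L, epoch b (1, 0), Unlock L, epoch c (2, 0)
- Thread 1: epoch d (0, 0), Lock L, epoch e (2, 1)
- Epoch: sync to sync
- a, b happened before e
- c, d unordered
- Data Race: conflicting accesses to x in unordered epochs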
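The orderings on this slide can be checked mechanically. The sketch below is a minimal illustration under an assumed comparison rule that reproduces the slide's stated orderings: epoch X of thread i happened before epoch Y if component i of X's timestamp is strictly smaller than component i of Y's timestamp. The rule and the names `EPOCHS`, `happened_before`, and `unordered` are assumptions for illustration, not taken from the deck.

```python
# Epochs from the slide: label -> (owning thread id, vector timestamp).
EPOCHS = {
    "a": (0, (0, 0)),
    "b": (0, (1, 0)),
    "c": (0, (2, 0)),
    "d": (1, (0, 0)),
    "e": (1, (2, 1)),
}


def happened_before(x, y):
    """Assumed rule: x precedes y if y has seen a later value of x's own clock."""
    tid_x, ts_x = EPOCHS[x]
    _, ts_y = EPOCHS[y]
    return ts_x[tid_x] < ts_y[tid_x]


def unordered(x, y):
    """Two epochs are unordered when neither happened before the other."""
    return not happened_before(x, y) and not happened_before(y, x)


print(happened_before("a", "e"), happened_before("b", "e"))  # True True
print(unordered("c", "d"))  # True
```

Only accesses in unordered epochs need to be compared for conflicts, which is what makes the epoch (sync to sync) a convenient unit of checking.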
24Software Implementation
- Need to instrument every memory access
- Not suitable for production runs
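30Hardware Implementation
[Figure: caches C1 and C2 tag their lines with timestamps (TS). One cache holds x with ts1; a write (WR) to x in the other cache tags it with ts2, and ts2 travels with the coherence message and is checked against ts1]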
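34Limitations of HW Approaches
[Figure: processors P1 and P2 with caches C1 and C2; timestamps ts1 and ts2 are checked on a coherence transaction]
- Modify cache and coherence protocol
- Perform checking at least on every coherence transaction
- Lose detection ability when cache line is displaced or invalidated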
35Our Contributions
- SigRace: novel HW mechanism for race detection based on signatures
- Simple HW
- Cache and coherence protocol are unchanged
- Higher coverage than existing HW schemes
- Detects races even if the line is displaced/invalidated
- Usable on-the-fly in production runs
- SigRace finds 150% more injected races than a state-of-the-art HW proposal
36Outline
- Motivation
- Main Idea
- Implementation
- Results
- Conclusions
37Main Idea
- Address Signature + Happened-before
40Hardware Address Signatures
- Logical AND for intersection
- Has false positives but not false negatives
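As a rough illustration of these two properties, the sketch below models a signature as a banked Bloom filter: inserting an address sets one bit per bank, intersection is a bitwise AND per bank, and the result counts as empty if any bank is all zeros. The class name `Signature`, the 4 x 512-bit split, and the multiplicative hash are assumptions made for the example; they are not the hardware hash functions of the design.

```python
class Signature:
    """Hypothetical banked Bloom-filter signature (4 banks x 512 bits here)."""

    def __init__(self, banks=4, bits_per_bank=512):
        self.banks = banks
        self.bits_per_bank = bits_per_bank
        self.bits = [0] * banks  # one Python int used as a bit-vector per bank

    def _index(self, bank, addr):
        # Illustrative hash only; a different odd multiplier per bank.
        return ((addr * (2 * bank + 3) * 2654435761) >> 7) % self.bits_per_bank

    def insert(self, addr):
        for bank in range(self.banks):
            self.bits[bank] |= 1 << self._index(bank, addr)

    def intersect(self, other):
        out = Signature(self.banks, self.bits_per_bank)
        out.bits = [x & y for x, y in zip(self.bits, other.bits)]  # logical AND
        return out

    def is_empty(self):
        # A common address sets the same bit in every bank of both signatures,
        # so one all-zero bank proves that no address is shared.
        return any(bank_bits == 0 for bank_bits in self.bits)


s1, s2 = Signature(), Signature()
s1.insert(0x1000)
s2.insert(0x1000)
print(s1.intersect(s2).is_empty())  # False: a shared address is never missed
```

A non-empty intersection can still come from different addresses that hash to the same bits, which is the source of false positives; a shared address can never produce an empty intersection, so there are no false negatives.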
52Using Signatures for Race Detection
[Figure: one thread's execution between sync points, showing its signature (Sig), its timestamp (TS), and the Race Detection Module]
- Each thread accumulates the addresses it accesses in a signature (Sig), tagged with its current timestamp (TS)
- Block is a fixed number of dynamic instructions (not a cache block or basic block or atomic block)
- At the end of a block, the signature and timestamp (Sig1, TS1) are sent to the Race Detection Module and the signature is cleared (Ø)
- The next block in the same epoch builds Sig2 under the same TS1
- After a sync, the timestamp becomes TS2 and the next block builds Sig3
- The Race Detection Module intersects signatures (Sig ∩ Sig)
66On Chip Race Detection Module (RDM)
[Figure: processors P1 and P2 send (timestamp, read signature, write signature) entries such as (T1, R1, W1) and (T2, R2, W2) to per-processor queues Q1 and Q2 in the RDM on the chip]
- An incoming entry (T2, R2, W2) is compared against an entry (TJ, RJ, WJ) in the other queue
- If T2 and TJ are unordered: check R2 ∩ WJ, W2 ∩ WJ, W2 ∩ RJ
- Else: stop
- Done in Background
- False Positives
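The check performed by the RDM can be sketched in software. The following is a minimal model, not the hardware design: signatures are plain integers used as bit-vectors, the timestamp comparison assumes that epoch X of thread i precedes Y when X's component i is strictly smaller than Y's, and the names `RDM`, `receive`, and `unordered` are hypothetical.

```python
from collections import deque


def happened_before(tid_x, ts_x, ts_y):
    """Assumed rule: x precedes y if y has seen a later value of x's own clock."""
    return ts_x[tid_x] < ts_y[tid_x]


def unordered(tid_x, ts_x, tid_y, ts_y):
    return not happened_before(tid_x, ts_x, ts_y) and not happened_before(tid_y, ts_y, ts_x)


class RDM:
    """Toy Race Detection Module with one bounded queue per thread."""

    def __init__(self, num_threads, queue_size=16):
        self.queues = [deque(maxlen=queue_size) for _ in range(num_threads)]

    def receive(self, tid, ts, r_sig, w_sig):
        """Check an incoming (TS, R, W) entry against the other threads' queues."""
        conflicts = []
        for other, queue in enumerate(self.queues):
            if other == tid:
                continue
            for tj, rj, wj in reversed(queue):  # newest entry first
                if not unordered(tid, ts, other, tj):
                    break  # "Else stop": this entry is ordered, end the scan
                conflict_sig = (r_sig & wj) | (w_sig & wj) | (w_sig & rj)
                if conflict_sig:
                    conflicts.append((other, tj, conflict_sig))
        self.queues[tid].append((ts, r_sig, w_sig))
        return conflicts


rdm = RDM(num_threads=2)
rdm.receive(1, (0, 0), r_sig=0b100, w_sig=0)         # thread 1 reads x
print(rdm.receive(0, (1, 0), r_sig=0, w_sig=0b100))  # [(1, (0, 0), 4)]
```

Any non-zero result is only a possible race, since signature bits can alias; confirming or discarding it is left to re-execution.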
67Re-execution
- Identify the accesses involved
- Discard if a false positive
68Support for Re-execution
- Take periodic checkpoints (ReVive [Prvulovic et al., ISCA'02])
- Log inputs (interrupts, sys calls, etc.)
- Save synchronization history in TS Log
- Timestamp at sync points
69Modes of Operation
- Re-Execution: bring the program to just before the race
- Race Analysis: pinpoint the racy accesses or discard the false positive
75SigRace Re-execution Mode
- Can be done in another machine
- Periodic checkpoint of memory state
- Use the TS Log
[Figure: original run and re-execution of threads T0, T1, T2 from a checkpoint; sync points are replayed up to the epochs s1 and s2 involved in the possible data race, each with its Conflict Sig]
82SigRace Analysis Mode
[Figure: threads T0, T1, T2 re-execute from the checkpoint between sync points; each load (ld) is checked against the Conflict Sig, and matching accesses are logged]
- Pinpoints racy addresses or,
- Identifies and discards false positives
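The two outcomes of analysis mode can be illustrated with a small sketch. It assumes a deliberately tiny 64-bit conflict signature so that aliasing is easy to see; `sig_bit`, `replay_and_log`, and `pinpoint` are hypothetical helper names, and accesses are modelled as `(address, is_write)` pairs.

```python
SIG_BITS = 64  # hypothetical tiny signature, chosen so that addresses alias


def sig_bit(addr):
    """Illustrative hash: one bit per address, modulo the signature size."""
    return 1 << (addr % SIG_BITS)


def replay_and_log(accesses, conflict_sig):
    """Log every re-executed access that hits the conflict signature."""
    return [(addr, is_write) for addr, is_write in accesses
            if sig_bit(addr) & conflict_sig]


def pinpoint(log_a, log_b):
    """Exact comparison of two logs: same address, at least one write."""
    return {a for a, wa in log_a for b, wb in log_b if a == b and (wa or wb)}


conflict_sig = sig_bit(0x40)
log0 = replay_and_log([(0x40, True), (0x44, False)], conflict_sig)

# Thread 1 touched 0x80, which aliases with 0x40 in this signature.
log1 = replay_and_log([(0x80, False)], conflict_sig)
print(pinpoint(log0, log1))  # set(): a false positive, discarded

# Thread 1 really read 0x40: the exact logs agree on a racy address.
log1 = replay_and_log([(0x40, False)], conflict_sig)
print(pinpoint(log0, log1))  # {64}
```

Only accesses that hit the conflict signature are logged, so the exact comparison runs over a short list rather than over every access in the epoch.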
89New Instructions
- collect_on
- Enable R and W address collection in current thread
- collect_off
- Disable R and W address collection in current thread
- sync_reached
- Dump TS, R and W
- Clear signatures
- Update TS
[Figure: processor P sends (TS, R, W) over the network to the RDM; afterwards its signatures are empty (TS, Ø, Ø)]
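The effect of the three instructions on per-thread state can be modelled as follows. This is a sketch under stated assumptions: signatures are integers used as bit-vectors with a trivial modulo hash, "Update TS" is modelled as incrementing the thread's own timestamp component, and the network to the RDM is a plain list. `ThreadContext`, `access`, and `rdm_inbox` are hypothetical names.

```python
class ThreadContext:
    """Toy model of the per-thread state touched by the new instructions."""

    SIG_BITS = 2048  # 2 Kbits, the default signature size in the experimental setup

    def __init__(self, tid, num_threads):
        self.tid = tid
        self.ts = [0] * num_threads  # vector timestamp (TS)
        self.r_sig = 0               # read signature (R)
        self.w_sig = 0               # write signature (W)
        self.collecting = False

    def collect_on(self):
        self.collecting = True

    def collect_off(self):
        self.collecting = False

    def access(self, addr, is_write):
        """Record an address in R or W, but only while collection is enabled."""
        if not self.collecting:
            return
        bit = 1 << (addr % self.SIG_BITS)  # illustrative hash only
        if is_write:
            self.w_sig |= bit
        else:
            self.r_sig |= bit

    def sync_reached(self, rdm_inbox):
        rdm_inbox.append((self.tid, tuple(self.ts), self.r_sig, self.w_sig))  # dump
        self.r_sig = self.w_sig = 0                                           # clear
        self.ts[self.tid] += 1                                                # update TS


inbox = []
t = ThreadContext(tid=0, num_threads=2)
t.access(8, is_write=True)   # ignored: collection is still off
t.collect_on()
t.access(16, is_write=False)
t.access(24, is_write=True)
t.sync_reached(inbox)
print(inbox == [(0, (0, 0), 1 << 16, 1 << 24)], t.ts)  # True [1, 0]
```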
98Modifications in Sync Libraries
- Synchronization object (fields: lock, timestamp)
- Unlock macro
UNLOCK (
    sync_reached
    l.timestamp = TS
    unlock(l.lock)
    AppendtoTSLog(TS)
)
[Figure: on sync_reached, processor P sends (TS, R, W) over the network to the RDM and clears its signatures (TS, Ø, Ø); TS is then stored in the lock's timestamp field and appended to the TS Log]
101Modifications in Sync Libraries
- Synchronization object (fields: lock, timestamp)
- Lock macro
LOCK (
    lock(l.lock)
    TS = GenerateTS(TS, l.timestamp)
)
- Transparent to Application Code
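The lock and unlock macros can be exercised with a short model. It is a sketch, assuming that sync_reached advances the thread's own timestamp component and that GenerateTS takes the element-wise maximum of the two timestamps and then advances the thread's own component; under those assumptions it reproduces the (1, 0), (2, 0) and (2, 1) timestamps of the happened-before example. `SyncObject`, `ThreadState`, and `generate_ts` are hypothetical names, and signature dumping is omitted.

```python
import threading


class SyncObject:
    """A lock extended with a timestamp field, as in the modified sync library."""

    def __init__(self, num_threads):
        self.lock = threading.Lock()
        self.timestamp = [0] * num_threads


class ThreadState:
    def __init__(self, tid, num_threads):
        self.tid = tid
        self.ts = [0] * num_threads
        self.ts_log = []

    def sync_reached(self):
        # Dumping and clearing the signatures is omitted; only the TS update is shown.
        self.ts[self.tid] += 1


def generate_ts(tid, ts, lock_ts):
    """Assumed GenerateTS: element-wise max, then advance the thread's own component."""
    merged = [max(a, b) for a, b in zip(ts, lock_ts)]
    merged[tid] += 1
    return merged


def LOCK(t, l):
    l.lock.acquire()
    t.ts = generate_ts(t.tid, t.ts, l.timestamp)


def UNLOCK(t, l):
    t.sync_reached()
    l.timestamp = list(t.ts)
    l.lock.release()
    t.ts_log.append(tuple(t.ts))  # AppendtoTSLog(TS)


L = SyncObject(num_threads=2)
t0, t1 = ThreadState(0, 2), ThreadState(1, 2)
LOCK(t0, L)
print(t0.ts)  # [1, 0]
UNLOCK(t0, L)
print(t0.ts)  # [2, 0]
LOCK(t1, L)
print(t1.ts)  # [2, 1]
```

Because the timestamp handling lives inside the LOCK and UNLOCK macros, application code that already uses them needs no changes.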
102Other Topics in Paper
- Easy to virtualize
- Queue Overflow
- Detailed HW structures
104Experimental Setup
- PIN binary instrumentation tool
- Default parameters
- Benchmarks: SPLASH2, PARSEC
- # of proc: 8
- Signature size: 2 Kbits
- Block size: 2,000 ins
- Queue size: 16 entries
- Checkpoint interval: 1 million ins
105Race Detection Ability
- SigRace Ideal: stores every signature between 2 checkpoints
- ReEnact [Prvulovic et al., ISCA'03]: cache-based approach with timestamp per word
108Race Detection Ability
- More coverage than ReEnact
- Coverage comparable to ideal configuration
App            Ideal SigRace  Default SigRace  ReEnact
Cholesky       16             16               16
Barnes         11             11               6
Volrend        27             27               18
Ocean          1              1                1
Radiosity      15             15               12
Raytrace       4              4                3
Water-sp       8              4                2
Streamcluster  13             12               13
Total          95             90               70
109Injected Races
- Removed one dynamic sync per run
- Each application is run 25 times, each time with a different sync eliminated
113Injected Races
- More overall coverage than ReEnact
114Conclusions
- Simple HW
- Cache and coherence protocol are unchanged
- Higher coverage than existing HW schemes
- Detects races even if the line is displaced/invalidated
- Usable on-the-fly in production runs
- SigRace finds 150% more injected races than word-based ReEnact
117Execution Overhead
- No overhead in generating signatures (HW)
- Additional instructions are negligible
- Checkpointing (ReVive: 6.3%)
- Network traffic (63 bytes per 1000 ins, compressed)
- Re-execution (depends on false positives and race position)
- Can be done offline
118Network Traffic Overhead
[Figure: network traffic per 1000 instructions; chart annotations: 63 bytes, about 1 cache line]
119Re-execution Overhead
- Instructions re-executed until the first true data race is analyzed are shown as overhead
- In this process, it may also encounter many false positive races
- Instructions re-executed to analyze only the true race are shown as true overhead
- Instructions re-executed to filter out the false positives are shown as false overhead
120Re-execution Overhead
[Figure: re-execution overhead per application; chart annotation: 22]
- Modest overhead
121False Positives
- Parallel bloom filters with H3 hash function
- Low false positive rate
[Figure: false positive rate per application; chart annotation: 1.57]
123Virtualization
- RDM uses as many queues as the number of threads
- Timestamp is accessed by thread id
- Thread id remains the same even after migration
- Timestamps, flags, and conflict signature are saved and restored at context switch
- RDM intersects incoming signatures against the signatures of all other threads (even inactive ones)
- Threads can be re-executed without any scheduling constraints
124Scalability
- For a small # of proc., scalability is not a problem
- The operation of RDM can be pipelined
- Simple repetitive operation
- Network traffic (compressed messages) is around 63 bytes per 1000 ins