Transcript and Presenter's Notes

Title: SigRace: Signature-Based Data Race Detection


1
SigRace: Signature-Based Data Race Detection
  • Abdullah Muzahid, Dario Suarez*, Shanxiang Qi & Josep Torrellas

Computer Science Department, University of Illinois at Urbana-Champaign
http://iacoma.cs.uiuc.edu
*Universidad de Zaragoza, Spain
2
Debugging Multithreaded Programs
  • "Debugging a multithreaded program has a lot in common with medieval torture methods."
  • -- Random quote found via Google search

3
Data Race
  • Two threads access the same variable without
    intervening synchronization and at least one is a
    write
  • Common bug
  • Hard to detect and reproduce
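To make the definition concrete, here is a minimal C/pthreads example of such a bug (an illustration, not from the slides; worker and the loop count are hypothetical):

    /* Two threads increment a shared counter with no intervening
       synchronization; both accesses are writes, so this is a data
       race. Build with: gcc -pthread race.c */
    #include <pthread.h>
    #include <stdio.h>

    int counter = 0;                  /* shared variable */

    void *worker(void *arg) {
        for (int i = 0; i < 100000; i++)
            counter++;                /* unsynchronized read-modify-write */
        return NULL;
    }

    int main(void) {
        pthread_t t0, t1;
        pthread_create(&t0, NULL, worker, NULL);
        pthread_create(&t1, NULL, worker, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        printf("%d\n", counter);      /* often less than 200000: the
                                         outcome varies run to run */
        return 0;
    }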

4
Dynamic Data Race Detection
  • Mainly two approaches:
  • Lockset: Finds violations of the locking discipline
  • Happened-Before: Finds concurrent conflicting accesses

7
Happened-Before Approach
[Figure: Thread 0 and Thread 1 share Lock L, and each keeps a vector-clock timestamp. Thread 0: access a in epoch (0,0); Lock L advances it to (1,0), where access b occurs; Unlock L advances it to (2,0), where access c occurs. Thread 1: access d in its initial epoch; acquiring Lock L after Thread 0's release advances it to (2,1), where access e occurs. Two accesses to the same variable x, one per thread, fall in unordered epochs and are flagged as a Data Race.]
  • Epoch: sync to sync
  • a, b happened before e
  • c, d unordered
  • Conflicting accesses to x from unordered epochs form a Data Race
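To make the timestamp comparison concrete, here is a minimal sketch (an illustration, not the paper's hardware; the Timestamp type and function names are ours) of the happened-before check over two-thread vector clocks:

    #include <stdbool.h>

    #define NTHREADS 2

    typedef struct { int c[NTHREADS]; } Timestamp;

    /* a happened before b iff a's clock is <= b's in every component. */
    static bool happened_before(const Timestamp *a, const Timestamp *b) {
        for (int i = 0; i < NTHREADS; i++)
            if (a->c[i] > b->c[i])
                return false;
        return true;
    }

    /* Unordered epochs: neither happened before the other. Conflicting
       accesses in unordered epochs constitute a data race. */
    static bool unordered(const Timestamp *a, const Timestamp *b) {
        return !happened_before(a, b) && !happened_before(b, a);
    }

With timestamps like b = {1,0} and e = {2,1}, happened_before(b, e) holds; with {2,0} against an epoch such as {0,1}, both directions fail and the epochs are unordered.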

24
Software Implementation
  • Need to instrument every memory access
  • 10x to 50x slowdown
  • Not suitable for production runs

25
Hardware Implementation
[Figure: caches C1 and C2 hold per-line timestamps (TS). C1 holds x with timestamp ts1; when C2 writes x with timestamp ts2, the coherence transaction carries ts2 to C1, which checks it against ts1.]
31
Limitations of HW Approaches
[Figure: processors P1 and P2 with caches C1 and C2; timestamps are exchanged and checked on coherence transactions.]
  • Modify cache and coherence protocol
  • Perform checking at least on every coherence transaction
  • Lose detection ability when a cache line is displaced or invalidated

35
Our Contributions
  • SigRace: Novel HW mechanism for race detection based on signatures
  • Simple HW
  • Cache and coherence protocol are unchanged
  • Higher coverage than existing HW schemes
  • Detects races even if the line is displaced/invalidated
  • Usable on-the-fly in production runs
  • SigRace finds 150% more injected races than a state-of-the-art HW proposal

36
Outline
  • Motivation
  • Main Idea
  • Implementation
  • Results
  • Conclusions

37
Main Idea
Address Signatures + Happened-Before
38
Hardware Address Signatures
[Figure: read and write addresses are hash-encoded into a fixed-size signature.]
  • Logical AND for intersection
  • Has false positives but not false negatives
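A minimal software sketch of such a signature, assuming the 2-Kbit size used in the evaluation and simple multiplicative hashes standing in for the paper's H3 hash functions (sig_insert and sig_intersect are hypothetical names):

    #include <stdbool.h>
    #include <stdint.h>

    #define SIG_BITS  2048                 /* 2-Kbit signature */
    #define SIG_WORDS (SIG_BITS / 64)
    #define NHASH     4                    /* hash functions per address */

    typedef struct { uint64_t bits[SIG_WORDS]; } Signature;

    /* Hash-encode an address into the signature (Bloom-filter insert). */
    static void sig_insert(Signature *s, uint64_t addr) {
        for (int i = 0; i < NHASH; i++) {
            uint64_t h = (addr * (0x9e3779b97f4a7c15ULL + 2 * i)) % SIG_BITS;
            s->bits[h / 64] |= 1ULL << (h % 64);
        }
    }

    /* Logical AND for intersection: a nonzero result means the two
       signatures MAY share an address (false positives are possible,
       false negatives are not). */
    static bool sig_intersect(const Signature *a, const Signature *b) {
        for (int i = 0; i < SIG_WORDS; i++)
            if (a->bits[i] & b->bits[i])
                return true;
        return false;
    }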

41
Using Signatures for Race Detection
[Figure: execution between sync points is divided into blocks. For each block, the hardware collects the accessed addresses into a signature (Sig) tagged with the current timestamp (TS); at each block or sync boundary, the (TS, Sig) pair is sent to the Race Detection Module and the local signature is cleared (Ø).]
  • Block: a fixed number of dynamic instructions (not a cache block, basic block, or atomic block)
  • The Race Detection Module intersects signatures from different threads: Sig ∩ Sig' ≠ Ø flags a potential race
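Building on the Signature and Timestamp sketches above, here is a sketch of the per-thread collection just described; BLOCK_SIZE follows the evaluation's 2,000-instruction default, and send_to_rdm and on_memory_access are hypothetical names:

    #include <string.h>                 /* memset */

    #define BLOCK_SIZE 2000             /* dynamic instructions per block */

    static Signature rsig, wsig;        /* per-thread R and W signatures */
    static Timestamp cur_ts;            /* timestamp of the current epoch */
    static int ins_count = 0;

    /* Hypothetical hook modeling the dump to the Race Detection Module. */
    void send_to_rdm(const Timestamp *t, const Signature *r, const Signature *w);

    static void on_memory_access(uint64_t addr, bool is_write) {
        sig_insert(is_write ? &wsig : &rsig, addr);
        if (++ins_count >= BLOCK_SIZE) {            /* block boundary */
            send_to_rdm(&cur_ts, &rsig, &wsig);     /* dump (TS, R, W) */
            memset(&rsig, 0, sizeof rsig);          /* clear signatures: Ø */
            memset(&wsig, 0, sizeof wsig);
            ins_count = 0;
        }
    }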
53
On-Chip Race Detection Module (RDM)
[Figure: chip with processors P1 and P2 and an RDM holding one queue per processor (Q1, Q2). Each processor dumps its block summaries (timestamp T, read signature R, write signature W) into its queue, e.g., (T1, R1, W1) followed by (T2, R2, W2).]
  • When a new summary (T2, R2, W2) arrives, the RDM compares it against queued summaries (TJ, RJ, WJ) from other processors:
  • If T2, TJ unordered: compute R2 ∩ WJ, W2 ∩ WJ, W2 ∩ RJ; else stop
  • Done in background
  • Signature intersections can report false positives
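Putting the pieces together, a minimal sketch of the check above, reusing the unordered and sig_intersect helpers from the earlier sketches (BlockSummary and rdm_check are hypothetical names):

    typedef struct { Timestamp t; Signature r, w; } BlockSummary;

    /* Compare a newly arrived summary n = (T2, R2, W2) against a queued
       summary q = (TJ, RJ, WJ) from another processor. */
    static bool rdm_check(const BlockSummary *n, const BlockSummary *q) {
        if (!unordered(&n->t, &q->t))
            return false;                       /* ordered: stop */
        return sig_intersect(&n->r, &q->w)      /* R2 ∩ WJ: read-write?  */
            || sig_intersect(&n->w, &q->w)      /* W2 ∩ WJ: write-write? */
            || sig_intersect(&n->w, &q->r);     /* W2 ∩ RJ: write-read?  */
    }

A nonempty intersection only flags a potential race; re-execution (next) pinpoints the accesses and discards false positives.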
67
Re-execution
  • Needed to:
  • Identify the accesses involved
  • Discard false positives

68
Support for Re-execution
  • Take periodic checkpoints: ReVive [Prvulovic et al., ISCA'02]
  • Log inputs (interrupts, sys calls, etc.)
  • Save synchronization history in the TS Log
  • Timestamp at sync points

69
Modes of Operation
  • Normal Execution
  • Re-Execution: Bring the program to just before the race
  • Race Analysis: Pinpoint the racy accesses or discard the false positive

70
SigRace Re-execution Mode
  • Can be done on another machine
  • Periodic checkpoint of memory state
  • Use the TS Log to reproduce the original synchronization order during re-execution
[Figure: threads T0, T1, T2 roll back to the checkpoint preceding the conflict; the intersecting conflict signatures (s1 ∩ s2) that flagged the Data Race delimit the region to re-execute.]
76
SigRace Analysis Mode
[Figure: threads re-execute from the checkpoint; within the epochs flagged by the conflict signatures, every load/store (e.g., each ld) is logged and checked for membership in the conflict signature.]
  • Pinpoints racy addresses, or
  • Identifies and discards false positives

83
Outline
  • Motivation
  • Main Idea
  • Implementation
  • Results
  • Conclusions

84
New Instructions
  • collect_on: Enable R and W address collection in current thread
  • collect_off: Disable R and W address collection in current thread
  • sync_reached:
  • Dump TS, R and W to the RDM over the network
  • Clear signatures (R = Ø, W = Ø)
  • Update TS
90
Modifications in Sync Libraries
  • Synchronization object: contains the lock plus a timestamp field
  • Unlock macro:

UNLOCK(l) (
    sync_reached                       /* dump TS, R, W to RDM; clear signatures; update TS */
    l.timestamp = TS                   /* publish this thread's timestamp in the sync object */
    unlock(l.lock)
    AppendtoTSLog(TS)                  /* record the sync history in the TS Log */
)

  • Lock macro:

LOCK(l) (
    lock(l.lock)
    TS = GenerateTS(TS, l.timestamp)   /* merge the lock's stored timestamp into ours */
)

Transparent to Application Code
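The slides do not define GenerateTS; the following is a minimal sketch under the assumption that it performs a standard vector-clock join: componentwise max of the thread's timestamp and the lock's stored timestamp, then an advance of the thread's own component (generate_ts and self are hypothetical names, reusing the earlier Timestamp sketch):

    static Timestamp generate_ts(Timestamp mine, Timestamp lock_ts, int self) {
        /* Join: take the componentwise maximum. */
        for (int i = 0; i < NTHREADS; i++)
            if (lock_ts.c[i] > mine.c[i])
                mine.c[i] = lock_ts.c[i];
        mine.c[self]++;   /* start a new epoch for this thread */
        return mine;
    }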
102
Other Topics in Paper
  • Easy to virtualize
  • Queue Overflow
  • Detailed HW structures

103
Outline
  • Motivation
  • Main Idea
  • Implementation
  • Results
  • Conclusions

104
Experimental Setup
  • PIN binary instrumentation tool
  • Default parameters:
  • Benchmarks: SPLASH2, PARSEC
  • # of procs: 8
  • Signature size: 2 Kbits
  • Block size: 2,000 ins
  • Queue size: 16 entries
  • Checkpoint interval: 1 million ins

105
Race Detection Ability
  • Three configurations:
  • SigRace Default
  • SigRace Ideal: Stores every signature between 2 checkpoints
  • ReEnact [Prvulovic et al., ISCA'03]: Cache-based approach with a timestamp per word

106
Race Detection Ability

App           | SigRace Ideal | SigRace Default | ReEnact
Cholesky      |      16       |       16        |   16
Barnes        |      11       |       11        |    6
Volrend       |      27       |       27        |   18
Ocean         |       1       |        1        |    1
Radiosity     |      15       |       15        |   12
Raytrace      |       4       |        4        |    3
Water-sp      |       8       |        4        |    2
Streamcluster |      13       |       12        |   13
Total         |      95       |       90        |   70

  • More coverage than ReEnact
  • Coverage comparable to the ideal configuration
109
Injected Races
  • Removed one dynamic sync per run
  • Each application runs 25 times, each with a different sync elimination
[Figure: coverage of injected races for SigRace and ReEnact.]
  • More overall coverage than ReEnact
  • 150% more coverage

114
Conclusions
  • Proposed SigRace:
  • Simple HW
  • Cache and coherence protocol are unchanged
  • Higher coverage than existing HW schemes
  • Detects races even if the line is displaced/invalidated
  • Usable on-the-fly in production runs
  • SigRace finds 150% more injected races than word-based ReEnact

115
SigRace: Signature-Based Data Race Detection
  • Abdullah Muzahid, Dario Suarez*, Shanxiang Qi & Josep Torrellas

Computer Science Department, University of Illinois at Urbana-Champaign
http://iacoma.cs.uiuc.edu
*Universidad de Zaragoza, Spain
116
  • Back Up Slides

117
Execution Overhead
  • No overhead in generating signatures (HW)
  • Additional instructions are negligible
  • Main overheads:
  • Checkpointing (ReVive: 6.3%)
  • Network traffic (63 bytes per 1,000 ins, compressed)
  • Re-execution (depends on false positives and race position)
  • Can be done offline

118
Network Traffic Overhead
[Figure: network traffic per application; on average 63 bytes per 1,000 instructions (compressed), about 1 cache line.]
119
Re-execution Overhead
  • Instructions re-executed until the first true data race is analyzed are shown as overhead
  • In this process, re-execution may also encounter many false positive races
  • Instructions re-executed to analyze only the true race are shown as true overhead
  • Instructions re-executed to filter out the false positives are shown as false overhead

120
Re-execution Overhead
[Figure: re-execution overhead per application.]
  • Modest overhead: 22%
121
False Positives
  • Parallel Bloom filters with H3 hash functions

[Figure: false positive rate per application.]
  • Low false positive rate: 1.57%
122
Virtualization
  • RDM uses as many queues as the number of threads
  • Timestamp is accessed by thread id
  • Thread id remains the same even after migration
  • Timestamps, flags, and conflict signature are saved and restored at context switch
  • RDM intersects incoming signatures against all other threads' (even inactive ones') signatures
  • Threads can be re-executed without any scheduling constraints

124
Scalability
  • For a small # of procs, scalability is not a problem
  • The operation of the RDM can be pipelined
  • Simple repetitive operation
  • Network traffic (compressed messages) is around 63 bytes per 1,000 ins
  • Checkpointing is an issue