Non-intrusive on-the-fly data race detection using execution replay

1
Non-intrusive on-the-fly data race detection
using execution replay
AADEBUG 2000 - MUNCHEN
  • Michiel Ronsse - Koen De Bosschere
  • Ghent University - Belgium

2
Contents
  • Introduction
  • Non-determinism and data races
  • RecPlay
  • Method
  • Implementation
  • Example
  • Experimental Evaluation
  • Conclusions

3
Introduction
  • Developing parallel programs for multiprocessors
    with shared memory is considered difficult:
    • number of threads running simultaneously
    • co-operation and synchronisation through shared
      memory
    • too much synchronisation: deadlock
    • too little synchronisation: race conditions
    • cyclic debugging is impossible due to the
      non-deterministic nature of most parallel
      programs ⇒ program execution is not repeatable

4
Causes of non-determinism
  • Sequential programs: input (keyboard, disk,
    network), signals, interrupts, certain system
    calls (gettimeofday(), ...)
  • Parallel programs: race conditions
    • two threads
    • accessing the same shared variable (memory
      location)
    • in an unsynchronised way
    • and at least one thread modifies the variable
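The four conditions above can be expressed as a predicate on two recorded memory accesses. A minimal sketch in C, assuming a hypothetical `struct access` record and a `synchronised` flag that is true when synchronisation orders the two accesses; neither is part of RecPlay.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of a single memory access (illustration only). */
struct access {
    int thread;       /* id of the thread performing the access */
    uintptr_t addr;   /* memory location that is accessed */
    bool is_store;    /* true for a store, false for a load */
};

/* Returns true when a and b satisfy the four conditions of a race:
 * different threads, the same memory location, no synchronisation
 * ordering them (synchronised == false), and at least one store. */
bool is_race(const struct access *a, const struct access *b,
             bool synchronised)
{
    return a->thread != b->thread
        && a->addr == b->addr
        && !synchronised
        && (a->is_store || b->is_store);
}
```

Two loads of the same variable never race, whatever their timing; a load and a store do, unless synchronisation orders them.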

5
Example code
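#include <pthread.h>
#include <stdio.h>
unsigned global = 5;
/* unsynchronised read-modify-write in both threads: a data race */
void *thread1(void *arg) { global = global + 6; return NULL; }
void *thread2(void *arg) { global = global + 7; return NULL; }
int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL); pthread_join(t2, NULL);
    printf("global=%u\n", global);
    return 0;
}
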
6
Possible executions
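(L = load, A = add, S = store; the value loaded or stored is in parentheses)
  • thread 1: L(5) A S(11), then thread 2: L(11) A S(18) ⇒ global=18
  • both threads: L(5) A; thread 2: S(12), then thread 1: S(11) ⇒ global=11
  • both threads: L(5) A; thread 1: S(11), then thread 2: S(12) ⇒ global=12
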
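The final values 18, 11 and 12 follow from the order in which the load (L), add (A) and store (S) steps of the two threads interleave. The sketch below replays a chosen interleaving deterministically; `run_schedule` and its schedule encoding are hypothetical, written only to illustrate the example.

```c
/* Replays the load/add/store steps of thread 1 (+6, index 0) and
 * thread 2 (+7, index 1) on global, which starts at 5.
 * schedule[k] is the thread (0 or 1) that executes its next step;
 * each thread must appear exactly three times. */
unsigned run_schedule(const int schedule[6])
{
    unsigned global = 5;             /* shared variable */
    const unsigned incr[2] = {6, 7}; /* increment of each thread */
    unsigned reg[2] = {0, 0};        /* private register of each thread */
    int step[2] = {0, 0};            /* 0 = load, 1 = add, 2 = store */

    for (int k = 0; k < 6; k++) {
        int t = schedule[k];
        switch (step[t]++) {
        case 0: reg[t] = global;   break; /* L */
        case 1: reg[t] += incr[t]; break; /* A */
        case 2: global = reg[t];   break; /* S */
        default: break;
        }
    }
    return global;
}
```

For instance, the schedule {0, 0, 1, 1, 1, 0} (both threads load 5, thread 2 stores first) yields 11: thread 1's store overwrites thread 2's update.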
7
Race conditions
  • Two types:
    • synchronisation races
      • don't allow us to use cyclic debugging
      • are not a bug: desired non-determinism
    • data races
      • don't allow us to use cyclic debugging
      • are a bug: undesired non-determinism
  • The distinction is a matter of abstraction
  • Automatic detection of data races is possible:
    • collect all memory references
    • check parallel references

8
Detecting data races
  • Static methods
    • check the source code for all possible
      executions with all possible inputs
    • NP-complete ⇒ not feasible
  • Dynamic methods
    • check an actual execution ⇒ only detect the data
      races that occur during this execution
    • removal requires cyclic debugging

9
Dynamic data race detection
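  • The piece of code between two consecutive
    synchronisation operations is called a segment
  • For every segment i of every thread we collect
    two sets, L(i) and S(i), with the addresses of
    all load and store operations respectively
  • For all parallel segments i and j,
    $\bigl[\bigl(L(i) \cup S(i)\bigr) \cap S(j)\bigr] \cup \bigl[\bigl(L(j) \cup S(j)\bigr) \cap S(i)\bigr]$
    gives the list of conflicting addresses.
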
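When L(i) and S(i) are stored as bitsets, the conflict set of two parallel segments reduces to word-wise AND and OR operations. A minimal sketch, assuming a toy address space of 256 locations and the hypothetical names `struct segment_sets` and `conflicts`; it illustrates the set expression, not RecPlay's actual data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define NWORDS 4  /* toy address space: 4 * 64 = 256 locations (assumption) */

/* L(i) and S(i) of one segment: bit a is set if address a was
 * loaded (load) or stored (store) during the segment. */
struct segment_sets {
    uint64_t load[NWORDS];
    uint64_t store[NWORDS];
};

/* Number of set bits in x (portable, no compiler builtins). */
static size_t popcount64(uint64_t x)
{
    size_t n = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* For two parallel segments i and j, writes
 *   ((L(i) | S(i)) & S(j)) | ((L(j) | S(j)) & S(i))
 * into out and returns the number of conflicting addresses. */
size_t conflicts(const struct segment_sets *i, const struct segment_sets *j,
                 uint64_t out[NWORDS])
{
    size_t n = 0;
    for (int w = 0; w < NWORDS; w++) {
        out[w] = ((i->load[w] | i->store[w]) & j->store[w])
               | ((j->load[w] | j->store[w]) & i->store[w]);
        n += popcount64(out[w]);
    }
    return n;
}
```

Addresses that both segments only load never appear in the result, matching the requirement that at least one access is a store.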
10
Existing race detection methods
  • Huge overhead causing probe effect and Heisenbugs
  • Only detect the existence of a data race (and the
    variable), not the instructions involved.
  • A data race is a bug, so we need cyclic debugging to remove it!

11
RecPlay
  • Synchronisation races: execution replay
  • Data races:
    • detect
    • also enables cyclic debugging
    • allows you to detect/remove the first data race
  • Three phases:
    • record the order of the synchronisation
      operations
    • replay the synchronisation operations and check
      for data races
    • normal replay, without checking for data races

12
Overview
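Figure: overview of the RecPlay workflow with the steps Choose input, Record, Replay detect, Replay ident., Replay debug, Choose new input and The end; the legend distinguishes automatic steps from steps that require user intervention.
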
13
Instrumentation
  • JiTI (Just in Time Instrumentation) was developed
    especially for RecPlay, but it is a generic
    instrumentation tool
  • Instruments memory and synchronisation operations
  • Deals correctly with data in code, code in data,
    self-modifying code
  • Clones processes: the original process is used
    for the data and the instrumented clone is used
    for the code
  • No need for recompilation, relinking or
    instrumentation of files.

14
Execution replay
  • ROLT (Reconstruction of Lamport Timestamps) is
    used for tracing/replaying the synchronisation
    operations
  • Attaches a scalar Lamport timestamp to each
    synchronisation operation
  • During replay, delaying each synchronisation
    operation until all operations with a smaller
    timestamp have executed suffices for a correct replay
  • We only need to log a small subset of all
    operations
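The timestamping can be sketched with the textbook scalar Lamport clock update: a synchronisation operation gets a timestamp one larger than both the thread's previous operation and the last operation on the same synchronisation object. The struct, the function name and the rule for which timestamps are logged (only those that are not the thread's previous timestamp plus one) are assumptions made for illustration, not ROLT's exact definition.

```c
#include <stdbool.h>

/* Hypothetical clock kept per thread and per synchronisation object. */
struct lclock { unsigned long ts; };

/* Timestamp of a synchronisation operation by thread t on object o:
 * one larger than both clocks; both are then advanced to that value.
 * *must_log is set when the timestamp cannot be reconstructed as the
 * thread's previous timestamp plus one (assumed logging rule). */
unsigned long sync_timestamp(struct lclock *t, struct lclock *o,
                             bool *must_log)
{
    unsigned long prev = t->ts;
    unsigned long ts = (t->ts > o->ts ? t->ts : o->ts) + 1;
    t->ts = ts;
    o->ts = ts;
    *must_log = (ts != prev + 1);
    return ts;
}
```

With two threads that each perform two operations on the same object, one thread after the other, only the second thread's first operation is logged: its timestamp jumps from 0 to 3.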

15
Collecting memory operations
  • We need two lists of addresses per segment i:
    L(i) and S(i)
  • A multilevel bitmap is used:
    • low memory consumption
    • comparing two bitmaps is easy
  • We lose information: two accesses to the same
    variable are counted once. This is, however, no
    problem for data race detection

16
Memory bitmap
Figure: multilevel memory bitmap, indexed by address fields of 9, 9 and 14 bits.

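A minimal sketch of such a multilevel bitmap in C, assuming the 9/9/14-bit split applies to 32-bit addresses with one bit per address. Lower levels are allocated only when first touched, which keeps memory consumption low, and `bitmap_intersects` shows why comparing two bitmaps is cheap: a subtree missing from either bitmap is skipped. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Field widths from the figure: 9 + 9 + 14 = 32 address bits (assumed). */
#define L1_BITS 9
#define L2_BITS 9
#define L3_BITS 14
#define LEAF_BYTES (1u << (L3_BITS - 3))  /* 2^14 bits = 2048 bytes */

typedef struct { uint8_t *leaf[1u << L2_BITS]; } level2_t;
typedef struct { level2_t *dir[1u << L1_BITS]; } bitmap_t;

/* Splits a 32-bit address into its three level indices. */
static void split(uint32_t addr, uint32_t *i1, uint32_t *i2, uint32_t *i3)
{
    *i1 = addr >> (L2_BITS + L3_BITS);
    *i2 = (addr >> L3_BITS) & ((1u << L2_BITS) - 1);
    *i3 = addr & ((1u << L3_BITS) - 1);
}

/* Marks addr as accessed, allocating lower levels on first use.
 * Returns false if memory runs out. */
bool bitmap_set(bitmap_t *bm, uint32_t addr)
{
    uint32_t i1, i2, i3;
    split(addr, &i1, &i2, &i3);
    if (!bm->dir[i1] && !(bm->dir[i1] = calloc(1, sizeof(level2_t))))
        return false;
    uint8_t **leaf = &bm->dir[i1]->leaf[i2];
    if (!*leaf && !(*leaf = calloc(LEAF_BYTES, 1)))
        return false;
    (*leaf)[i3 >> 3] |= (uint8_t)(1u << (i3 & 7));
    return true;
}

/* Returns true if addr has been marked in bm. */
bool bitmap_test(const bitmap_t *bm, uint32_t addr)
{
    uint32_t i1, i2, i3;
    split(addr, &i1, &i2, &i3);
    const level2_t *l2 = bm->dir[i1];
    if (!l2 || !l2->leaf[i2])
        return false;
    return (l2->leaf[i2][i3 >> 3] >> (i3 & 7)) & 1;
}

/* Returns true if a and b share at least one address. Subtrees that are
 * missing from either bitmap cannot intersect and are skipped. */
bool bitmap_intersects(const bitmap_t *a, const bitmap_t *b)
{
    for (size_t i = 0; i < (1u << L1_BITS); i++) {
        if (!a->dir[i] || !b->dir[i])
            continue;
        for (size_t j = 0; j < (1u << L2_BITS); j++) {
            const uint8_t *x = a->dir[i]->leaf[j];
            const uint8_t *y = b->dir[i]->leaf[j];
            if (!x || !y)
                continue;
            for (size_t k = 0; k < LEAF_BYTES; k++)
                if (x[k] & y[k])
                    return true;
        }
    }
    return false;
}

/* Frees all levels allocated by bitmap_set. */
void bitmap_free(bitmap_t *bm)
{
    for (size_t i = 0; i < (1u << L1_BITS); i++) {
        if (!bm->dir[i])
            continue;
        for (size_t j = 0; j < (1u << L2_BITS); j++)
            free(bm->dir[i]->leaf[j]);
        free(bm->dir[i]);
        bm->dir[i] = NULL;
    }
}
```

To check two parallel segments, one would intersect the load and store bitmaps of one segment with the store bitmap of the other.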
17
Detecting parallel segments
  • A vector clock is attached to each segment
  • All segment information (two bitmaps + vector
    timestamp) is kept on a list L.
  • Each new segment is compared against the segments
    on list L.
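Whether two segments are parallel follows from comparing their vector clocks: they are parallel when neither clock is less than or equal to the other. A minimal sketch, assuming a fixed number of threads and the hypothetical names `vclock_t` and `segments_parallel`.

```c
#include <stdbool.h>

#define NTHREADS 4  /* number of threads, fixed for this sketch */

/* Vector clock attached to a segment: c[k] is the number of segments
 * of thread k known to this segment (its own index for its own thread). */
typedef struct { unsigned c[NTHREADS]; } vclock_t;

/* True if a <= b in every component. */
static bool vc_leq(const vclock_t *a, const vclock_t *b)
{
    for (int k = 0; k < NTHREADS; k++)
        if (a->c[k] > b->c[k])
            return false;
    return true;
}

/* Parallel: neither segment is ordered before the other. */
bool segments_parallel(const vclock_t *a, const vclock_t *b)
{
    return !vc_leq(a, b) && !vc_leq(b, a);
}
```

Each new segment would be tested with `segments_parallel` against every entry on list L, and only the parallel pairs need their bitmaps compared.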

18
Detecting obsolete segments
  • Obsolete segments should be removed from list L.
  • We use a snooped matrix clock in order to detect
    these segments
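The slides do not spell out the removal rule, so the sketch below rests on an assumption: it uses the standard vector clock argument that a finished segment of thread t with clock V can no longer be parallel to any current or future segment once every thread's current vector clock has a t-component of at least V[t]. The current vector clocks of all threads, read directly by the tool, play the role of the snooped matrix clock. The names are hypothetical.

```c
#include <stdbool.h>

#define NTHREADS 4  /* number of threads, fixed for this sketch */

typedef struct { unsigned c[NTHREADS]; } vclock_t;

/* seg: vector clock of a finished segment of thread `owner` on list L.
 * current[k]: current vector clock of thread k; together the rows form
 * the matrix clock that the tool reads ("snoops").
 * Returns true if every thread has already synchronised past seg, so no
 * current or future segment can be parallel to it (assumed rule). */
bool segment_obsolete(const vclock_t *seg, int owner,
                      const vclock_t current[NTHREADS])
{
    for (int k = 0; k < NTHREADS; k++)
        if (current[k].c[owner] < seg->c[owner])
            return false;
    return true;
}
```

A single thread that has not yet synchronised with the owner of a segment keeps that segment on list L.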

19
Detecting obsolete segments
Figure: obsolete segments, segments on list L and segments in execution, shown relative to the current point of execution and the future.

20
Identification phase
  • If a data race is detected, we know
  • the address involved
  • the type of operations involved (load or store)
  • the threads involved
  • the segments containing the racing instructions
  • We need another replayed execution to find the
    racing instructions themselves (call stack, ...)
  • This replay executes at full speed until the
    racing segments start executing.
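The identification replay can be pictured as a filter on memory accesses: everything outside the racing segment is ignored, and inside it the instructions touching the racing address are recorded. A minimal sketch under that assumption; `struct race_side` and `on_access` are hypothetical, and a real tool would capture the call stack rather than just the instruction address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one side of a detected race. */
struct race_side {
    int thread;         /* thread containing the racing segment */
    unsigned segment;   /* index of that segment within the thread */
    uintptr_t addr;     /* address involved in the race */
};

/* Hypothetical hook run on each memory access during the identification
 * replay. Accesses outside the racing segment, or to other addresses,
 * are ignored; a matching access stores its instruction address in
 * *found_pc and returns true. */
bool on_access(const struct race_side *r, int thread, unsigned segment,
               uintptr_t addr, uintptr_t pc, uintptr_t *found_pc)
{
    if (thread != r->thread || segment != r->segment || addr != r->addr)
        return false;
    *found_pc = pc;
    return true;
}
```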

21–40
An Example
Figure (built up step by step over slides 21–40): an example execution containing the operations A←1, B←2, C←4, C←A+B and A←3, synchronised with P and V operations on the semaphores S1, S2 and S3.

41
Experimental Evaluation
  • RecPlay has been implemented for Solaris running
    on SPARC multiprocessors
  • Tested on a SUN SparcServer 1000 with 4
    processors
  • SPLASH-2 was used as a benchmark
    • a number of multithreaded numeric applications,
      such as a fast Fourier transform, a ray tracer, ...
  • Several data races were found, including some in
    the SPLASH-2 applications

42
Basic performance of RecPlay
43
Segments with memory accesses
44
Efficiency of the ROLT mechanism
45
Conclusions
  • RecPlay is a practical and efficient tool for
    detecting and removing data races
  • RecPlay also makes cyclic debugging possible
  • Three types of clocks (scalar, vector and matrix)
    are used to enable a fast and memory-efficient
    implementation
  • Data races have been found