Title: Anomaly-Based Bug Prediction, Isolation, and Validation: An Automated Approach for Software Debugging
- Martin Dimitrov, Huiyang Zhou
2. Background: Terminology
- Defect: an erroneous piece of code (a bug).
- Infection: the defect is triggered, so the program state differs from what the programmer intended.
- Failure: an observable error (crash, hang, wrong results) in program behavior.
The terminology is based on the book Why Programs Fail by Andreas Zeller.
3. Background: From Defects to Failures
Figure (from the book Why Programs Fail by A. Zeller): as program execution proceeds, erroneous code, together with variable and input values, turns a sane state into an infected state, until an observer sees the failure.
4. Motivation
- The typical process of software debugging involves:
  - Examining the point of program failure and reasoning backwards about the possible causes.
  - Creating a hypothesis of what the root cause could be.
  - Modifying the program to verify the hypothesis.
  - If the failure is still there, the search resumes.
- Software debugging is tedious and time-consuming!
- In this work, we propose an approach to automate the debugging effort and pinpoint the root cause of a failure.
5. Presentation Outline
- Motivation
- Proposed approach
  - Detecting anomalies (step 1)
  - Isolating relevant anomalies (step 2)
  - Validating anomalies (step 3)
- Experimental methodology
- Experimental results
- Conclusions
6. Proposed Approach
Dynamic Instruction Stream (time increases downward)
mov ...
cmp ...
jge ...
mov ...
mov ...
lea ...
movl ...
inc ...
cmp ...
jl ...
movl ...
inc ...
cmp ...
16. Proposed Approach
- A program failure is observed:
  - Crash
  - Hang
  - Incorrect results, etc.
- Start the automated debugging process.
- The output of our approach is a ranked list of instructions (the possible root causes of the failure).
Dynamic Instruction Stream (time increases downward)
movl ...
inc ...
cmp ...
jr ...
movl ...
inc ...
cmp ...
jl ...
test ...
jne ...
mov ...
call ...
mov ...
Failure
17. Proposed Approach, Step 1: Detect anomalies in program execution
Dynamic Instruction Stream (time increases downward)
mov ...
cmp ...
jge ...
mov ...
mov ...
lea ...
movl ...
inc ...
cmp ...
jl ...
movl ...
inc ...
cmp ...
27. Proposed Approach, Step 1: Detect anomalies in program execution
- Each anomaly constitutes a hypothesis for the root cause of the program failure.
Figure: the dynamic instruction stream of slide 16 (movl ... through the final mov ...), ending in Failure.
28. Proposed Approach, Step 2: Isolate the relevant anomalies
- Create dynamic forward slices from the anomalies to the failure point.
- Discard anomalies that do not lead to the failure point.
Figure: the dynamic instruction stream of slide 16 (movl ... through the final mov ...), ending in Failure.
29. Proposed Approach, Step 3: Validate the isolated anomalies
- Automatically fix the anomaly and observe whether the program still fails.
Figure: the dynamic instruction stream of slide 16 (movl ... through the final mov ...), ending in Failure.
30. Proposed Approach, Step 3: Validate the isolated anomalies
- If the failure disappears, we have high confidence that the root cause has been pinpointed.
Figure: the same instruction stream after the fix; execution passes the point where it previously failed (No failure), continues to the final mov ..., and ends in Success.
31. Detecting Program Anomalies (Step 1)
- When infected by a software bug, the program is likely to misbehave:
  - Out-of-bounds addresses and values
  - Unusual control paths
  - Page faults
  - Redundant computations, etc.
- Anomaly detection: infer program specifications from passing runs and turn them into soft assertions.
  - Learn program invariants during passing runs (e.g., variable i is always between 0 and 100).
  - Flag violated invariants during the failing run (e.g., report an anomaly if variable i is 101).
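The train-then-check idea above can be sketched in C as follows. This is a minimal illustration assuming a simple min/max range per program point; the type `range_inv` and the functions `inv_train` and `inv_is_anomaly` are hypothetical names, and real detectors such as DIDUCE encode invariants differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified range invariant for one program point (one static
   instruction or variable). Hypothetical type, not DIDUCE's encoding. */
typedef struct {
    bool    trained;  /* at least one value observed in passing runs */
    int64_t lo, hi;   /* smallest and largest value observed so far  */
} range_inv;

/* Passing runs: widen the learned range so that it covers v. */
void inv_train(range_inv *inv, int64_t v)
{
    if (!inv->trained) {
        inv->lo = inv->hi = v;
        inv->trained = true;
    } else {
        if (v < inv->lo) inv->lo = v;
        if (v > inv->hi) inv->hi = v;
    }
}

/* Failing run: check the soft assertion "lo <= v <= hi"; a violation
   is reported as an anomaly instead of aborting the program. */
bool inv_is_anomaly(const range_inv *inv, int64_t v)
{
    assert(inv->trained);  /* checking requires a trained invariant */
    return v < inv->lo || v > inv->hi;
}
```

Training on the values 0, 42, and 100 and then checking 101 reports an anomaly, mirroring the example on the slide.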
32. Detecting Program Anomalies
- We use several anomaly detectors to monitor a large spectrum of program invariants and catch more bugs.
- DIDUCE [S. Hangal et al., ICSE 2002]
  - Instructions tend to produce values/addresses within a certain range (e.g., 0 < i < 100). Detect violations of these invariants.
- AccMon [P. Zhou et al., MICRO-37 2004]
  - Only a few static instructions access a given memory location (load/store set locality). Signal an anomaly if a memory access does not belong to the load/store set.
- LoopCount
  - Detect an abnormal number of loop iterations.
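The AccMon and LoopCount invariants can likewise be sketched in software. This is a minimal sketch under assumed simplifications: a fixed-capacity store set per memory location (`STORE_SET_MAX` is a made-up capacity) and a LoopCount criterion that flags trip counts above the largest one seen in passing runs. All names are hypothetical, and the actual detectors may use different criteria.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STORE_SET_MAX 4  /* hypothetical per-location capacity */

/* AccMon-style store set for one memory location: the static store
   instructions (by PC) seen writing it during passing runs. */
typedef struct {
    uintptr_t pcs[STORE_SET_MAX];
    size_t    n;
} store_set;

void storeset_train(store_set *s, uintptr_t pc)
{
    assert(s->n <= STORE_SET_MAX);
    for (size_t i = 0; i < s->n; i++)
        if (s->pcs[i] == pc) return;          /* already a member */
    if (s->n < STORE_SET_MAX)
        s->pcs[s->n++] = pc;                  /* when full, the PC is dropped */
}

/* Failing run: a store from an instruction outside the set is flagged. */
bool storeset_is_anomaly(const store_set *s, uintptr_t pc)
{
    for (size_t i = 0; i < s->n; i++)
        if (s->pcs[i] == pc) return false;
    return true;
}

/* LoopCount-style invariant for one loop: the largest trip count seen
   in passing runs (an assumed, simplified criterion). */
typedef struct { unsigned max_trips; } loop_inv;

void loop_train(loop_inv *l, unsigned trips)
{
    if (trips > l->max_trips) l->max_trips = trips;
}

bool loop_is_anomaly(const loop_inv *l, unsigned trips)
{
    return trips > l->max_trips;
}
```

A store set that overflows during training silently drops PCs, which in this sketch turns later legitimate stores into false positives; this is one concrete way the coverage/false-positive tradeoff discussed later shows up.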
33. Detecting Program Anomalies
void more_arrays ()
{
  . . .
  a_count += STORE_INCR;
  /* Copy the old arrays. */
  for (indx = 1; indx < old_count; indx++)
    arrays[indx] = old_ary[indx];

  /* Initialize the new elements. */
  for (; indx < v_count; indx++)   /* defect */
    arrays[indx] = NULL;           /* infection */

  /* Free the old elements. */
  if (old_count != 0)
    free (old_ary);                /* crash */
}
Figure: heap memory, shown as a column of data words and heap size (metadata) words, with markers at the a_count and v_count bounds.
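To see how the defect corrupts the heap, the toy model below replays the two loops of more_arrays() over a small array in which a heap size word sits right after the buffer. It assumes that arrays holds a_count entries (the allocation is elided on the slide) and that the size word follows the buffer immediately; `init_new_elements`, `A_COUNT`, `SIZE_WORD`, and `SIZE_MAGIC` are hypothetical names, and the layout is not that of any real allocator.

```c
#include <assert.h>

/* Toy heap model (an assumption, not a real allocator's layout): the
   new arrays buffer has A_COUNT slots and is followed directly by the
   allocator's size word for the next chunk. */
enum { A_COUNT = 4, SIZE_WORD = A_COUNT, HEAP_WORDS = 8 };
#define SIZE_MAGIC 0x20L  /* hypothetical chunk-size value */

/* Mirrors the two loops of more_arrays(): copy old entries, then set
   the new ones to NULL (0) up to `limit`. limit == A_COUNT is the
   correct bound; a larger limit plays the role of v_count.
   Returns 1 if the size word is intact (a later free() would work). */
int init_new_elements(long heap[HEAP_WORDS], int old_count, int limit)
{
    int indx;
    assert(old_count <= A_COUNT && limit <= HEAP_WORDS);
    heap[SIZE_WORD] = SIZE_MAGIC;
    for (indx = 1; indx < old_count; indx++)
        heap[indx] = 100 + indx;     /* arrays[indx] = old_ary[indx] */
    for (; indx < limit; indx++)
        heap[indx] = 0;              /* arrays[indx] = NULL (infection) */
    return heap[SIZE_WORD] == SIZE_MAGIC;
}
```

With the bound set to `A_COUNT` the size word survives; with a larger bound standing in for `v_count`, the loop overwrites it, which is the kind of corrupted size information that later makes `free (old_ary)` crash.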
34. Detecting Program Anomalies
Anomalies reported for more_arrays() (each detector flags two points, one of which is a false positive):
- LoopCount: the loop iterates more times than usual.
- AccMon: the store instruction is not in the store set of this memory location.
- DIDUCE: the address of the store instruction is out of its normal range.
35. Detecting Program Anomalies: Architectural Support
- DIDUCE and AccMon capture invariants using limited-size cache structures, as proposed in previous work.
- LoopCount uses the existing loop-branch predictor to detect anomalies.
- Advantages and disadvantages of hardware support:
  - Performance efficiency
  - Portability
  - Efficient ways to change or invalidate dynamic instructions
  - Limited hardware resources may become a concern
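One way to picture a limited-size invariant cache is a direct-mapped table indexed by instruction address, sketched below. The table size, the indexing, and the replace-on-conflict policy are assumptions for illustration (not the hardware proposed in the cited work), and all names are hypothetical. The sketch shows the resource concern noted above: when two instructions map to the same slot, one learned invariant is evicted and anomalies on that instruction go unreported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INV_SETS 64  /* hypothetical table size; must be a power of two */

/* One cached range invariant, tagged with the PC that owns it. */
typedef struct {
    bool      valid;
    uintptr_t tag;
    intptr_t  lo, hi;
} inv_entry;

typedef struct { inv_entry e[INV_SETS]; } inv_cache;

/* Direct-mapped lookup: the low PC bits select the slot. */
static inv_entry *inv_slot(inv_cache *c, uintptr_t pc)
{
    assert((INV_SETS & (INV_SETS - 1)) == 0);
    return &c->e[pc & (INV_SETS - 1)];
}

/* Passing run: update the owner's range, or evict a conflicting entry. */
void cache_train(inv_cache *c, uintptr_t pc, intptr_t v)
{
    inv_entry *e = inv_slot(c, pc);
    if (!e->valid || e->tag != pc) {
        *e = (inv_entry){ true, pc, v, v };  /* old invariant is lost */
        return;
    }
    if (v < e->lo) e->lo = v;
    if (v > e->hi) e->hi = v;
}

/* Failing run: only PCs whose invariant is still cached can be checked,
   so an evicted invariant can hide a real anomaly. */
bool cache_is_anomaly(inv_cache *c, uintptr_t pc, intptr_t v)
{
    inv_entry *e = inv_slot(c, pc);
    if (!e->valid || e->tag != pc) return false;
    return v < e->lo || v > e->hi;
}
```

In this sketch, PCs `0x1000` and `0x1040` share slot 0, so training the second evicts the first and a later out-of-range value at `0x1000` is no longer reported.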
36. Isolating Relevant Anomalies (Step 2)
- Anomaly detectors alone are NOT effective for debugging:
  - They may signal too many anomalies (false positives).
  - There is a tradeoff between bug coverage and the number of false positives.
- Our solution:
  - Allow aggressive anomaly detection for maximum bug coverage.
  - Automatically isolate only the relevant anomalies by constructing dynamic forward slices from the anomalies to the failure point.
37. Isolating Relevant Anomalies: Architectural Support
- Add token(s) to each register and memory word.
- Detected anomalies set a token associated with the destination memory word or register.
- Tokens propagate based on data dependencies.
- When the program fails, examine the point of failure for tokens.
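The token mechanism amounts to dynamic taint tracking along data dependencies. The sketch below assumes a toy trace format (`insn`, with one destination and up to two sources) and one token bit per anomaly in a 32-bit mask; these names and limits are hypothetical. Control dependencies are ignored, in line with the data-dependence propagation described on the slide.

```c
#include <assert.h>
#include <stdint.h>

#define NLOCS 16    /* registers and memory words, flattened (toy model) */
#define NONE  (-1)  /* "no operand" or "not an anomaly" */

/* One dynamic instruction: dst <- f(src1, src2). */
typedef struct {
    int dst, src1, src2;   /* location indices, or NONE */
    int anomaly_id;        /* 0..31 if a detector flagged it, else NONE */
} insn;

/* Shadow state: bit k of tok[loc] set = value in loc depends on anomaly k. */
typedef struct { uint32_t tok[NLOCS]; } shadow;

static uint32_t tok_of(const shadow *s, int loc)
{
    return loc == NONE ? 0 : s->tok[loc];
}

/* Propagate tokens along data dependencies for one instruction. */
void propagate(shadow *s, const insn *in)
{
    uint32_t t = tok_of(s, in->src1) | tok_of(s, in->src2);
    if (in->anomaly_id != NONE) {
        assert(in->anomaly_id >= 0 && in->anomaly_id < 32);
        t |= UINT32_C(1) << in->anomaly_id;   /* anomaly sets its own token */
    }
    if (in->dst != NONE)
        s->tok[in->dst] = t;                  /* overwrite: old tokens die */
}

/* Replay the trace and return the tokens held by the location that the
   failing instruction reads (the point of failure). */
uint32_t tokens_at_failure(const insn *trace, int n, int failing_loc)
{
    shadow s = {{0}};
    assert(failing_loc >= 0 && failing_loc < NLOCS);
    for (int i = 0; i < n; i++)
        propagate(&s, &trace[i]);
    return s.tok[failing_loc];
}
```

In a trace where anomaly 0 writes location 1, anomaly 1 writes location 2, and locations 3 and 4 are computed from location 1, only anomaly 0's token reaches a failure that reads location 4, so anomaly 1 would be discarded.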
38. Isolating Relevant Anomalies: Architectural Support
Figure: the more_arrays() code from slide 33 shown with memory words and their tokens; the failure occurs at the instruction mov ebx,0xc(edx).
39. Isolating Relevant Anomalies: Architectural Support
- Problem: many tokens for each memory location/register.
- Solution:
  - We leverage tagged architectures for information flow tracking.
  - Use only one token (1 bit), shared by all anomalies.
  - We leverage delta debugging [A. Zeller, FSE 1999] to isolate the relevant anomalies automatically.
Chart: number of initial anomalies vs. number of isolated anomalies.
40. Delta Debugging
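With a single shared token bit, isolation has to re-run the program with different subsets of anomalies enabled and observe whether the token still reaches the failure. The sketch below applies the ddmin variant of delta debugging to a bitmask of anomalies; the oracle type `token_oracle` and `example_oracle` are hypothetical stand-ins for such a re-run, and the authors' exact search may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Oracle: re-run with only the anomalies in `subset` (bit k = anomaly k)
   allowed to set the shared token; true if the token reaches the
   failure point. Hypothetical interface. */
typedef bool (*token_oracle)(uint32_t subset);

static int count_bits(uint32_t m)
{
    int c = 0;
    for (; m != 0; m &= m - 1) c++;
    return c;
}

/* The i-th of n roughly equal chunks of `set`, taking set bits in order. */
static uint32_t chunk(uint32_t set, int n, int i)
{
    int total = count_bits(set);
    int start = total * i / n, end = total * (i + 1) / n, k = 0;
    uint32_t out = 0;
    for (int b = 0; b < 32; b++) {
        uint32_t bit = UINT32_C(1) << b;
        if (!(set & bit)) continue;
        if (k >= start && k < end) out |= bit;
        k++;
    }
    return out;
}

/* ddmin: shrink `all` to a 1-minimal subset that still taints the
   failure point (removing any single anomaly loses the token). */
uint32_t ddmin(uint32_t all, token_oracle test)
{
    assert(test(all));            /* precondition: full set reaches failure */
    uint32_t cur = all;
    int n = 2;
    while (count_bits(cur) >= 2) {
        int size = count_bits(cur);
        bool reduced = false;
        if (n > size) n = size;
        for (int i = 0; i < n && !reduced; i++) {      /* try each chunk */
            uint32_t sub = chunk(cur, n, i);
            if (test(sub)) { cur = sub; n = 2; reduced = true; }
        }
        for (int i = 0; i < n && !reduced; i++) {      /* try each complement */
            uint32_t comp = cur & ~chunk(cur, n, i);
            if (test(comp)) { cur = comp; n = n > 2 ? n - 1 : 2; reduced = true; }
        }
        if (!reduced) {
            if (n >= size) break;                      /* finest granularity */
            n = 2 * n < size ? 2 * n : size;           /* refine */
        }
    }
    return cur;
}

/* Example oracle (hypothetical): only anomaly 3's forward slice
   reaches the failure point. */
bool example_oracle(uint32_t subset)
{
    return (subset >> 3) & 1u;
}
```

For `example_oracle`, ddmin narrows eight candidate anomalies (mask `0xFF`) to anomaly 3 alone (mask `0x08`) in three halving steps. Because the token is shared, if several anomalies each reach the failure on their own, ddmin returns only one of them.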
41. Validating Isolated Anomalies (Step 3)
- Validate the remaining anomalies by applying a fix and observing whether the program failure disappears.
- Our fix is to nullify the anomalous instruction (turn it into a no-op).
- If the program succeeds, we have high confidence that we have found the root cause (or at least broken the infection chain).
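Validation by nullification can be sketched as replaying a run while turning one dynamic instruction into a no-op. The toy trace below contains only stores into a small memory, and a corrupted size word stands in for the crash in free(); `store_op`, `replay`, and the constants are hypothetical names chosen for illustration, not the tool's Pin-based implementation.

```c
#include <assert.h>

enum { MEM_WORDS = 8, SIZE_WORD = 4 };
#define SIZE_MAGIC 0x20L   /* hypothetical intact size value */

/* One dynamic store in the toy trace: mem[addr] = value. */
typedef struct { int addr; long value; } store_op;

typedef enum { RUN_SUCCESS, RUN_FAILURE } run_result;

/* Replay the trace with the dynamic instruction at index `nullified`
   turned into a no-op (-1 = nullify nothing). A corrupted size word
   stands in for the crash in free(). */
run_result replay(const store_op *trace, int n, int nullified)
{
    long mem[MEM_WORDS] = {0};
    mem[SIZE_WORD] = SIZE_MAGIC;
    for (int i = 0; i < n; i++) {
        if (i == nullified) continue;          /* nullified: no effect */
        assert(trace[i].addr >= 0 && trace[i].addr < MEM_WORDS);
        mem[trace[i].addr] = trace[i].value;
    }
    return mem[SIZE_WORD] == SIZE_MAGIC ? RUN_SUCCESS : RUN_FAILURE;
}
```

In a trace whose fourth store (index 3) writes 0 over the size word, nullifying that store removes the failure, while nullifying an in-bounds store does not; that difference is what lets the validation step accept one candidate and reject another.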
42. Validating Isolated Anomalies
Figure: the initialization loop of more_arrays() shown with memory words and their tokens; with the anomalous instruction nullified, memory holds data words, three 0x0 (NULL) entries, and intact size words, and the run ends in Success.
- The size information is not corrupted, and the program terminates successfully.
43. Validating Isolated Anomalies
- There are four possible outcomes of our validation step.
- Rank isolated anomalies based on the outcome:
  - succeed (highest), no crash, unknown, failure (lowest).
- In our running example, the root cause is ranked first.
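The ranking step is a sort on the validation outcome. A minimal sketch, assuming each isolated anomaly is stored together with its outcome; `candidate`, `outcome`, and `rank_candidates` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Validation outcomes, declared from highest to lowest rank. */
typedef enum { OUT_SUCCEED, OUT_NO_CRASH, OUT_UNKNOWN, OUT_FAILURE } outcome;

/* One isolated anomaly and the outcome of validating it. */
typedef struct {
    unsigned long pc;      /* address of the anomalous instruction */
    outcome       result;
} candidate;

static int by_outcome(const void *a, const void *b)
{
    const candidate *x = a, *y = b;
    return (int)x->result - (int)y->result;   /* lower enum value ranks first */
}

/* Sort in place so the most likely root causes come first. */
void rank_candidates(candidate *c, size_t n)
{
    assert(c != NULL);
    qsort(c, n, sizeof *c, by_outcome);
}
```

`qsort` is not stable, so anomalies with the same outcome may come out in any order; a secondary key would be needed to order ties deterministically.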
44. Experimental Methodology
- Implemented a working debugging tool using binary instrumentation (Pin).
- Evaluated applications from BugBench [S. Lu et al., Bugs 2005] and the gcc compiler.
45. Experimental Results
46. Case Study: GCC
47. GCC Defect
48. GCC Fix
49. Experimental Results: Compared to Failure-Inducing Chops
50. Limitations
- No failure, no bug detection:
  - Un-triggered bugs
  - Bugs that are triggered but whose output is correct
- Targets only bugs in sequential programs
51. Conclusions
- We present a novel automated approach to pinpoint the root causes of software failures:
  - Detect anomalies during program execution.
  - Isolate only the relevant anomalies.
  - Validate isolated anomalies by nullifying execution results.
- Our experimental results demonstrate that we accurately pinpoint the defect even for large programs such as gcc.
52. Questions
- The tool is available for download at:
  - http://csl.cs.ucf.edu/debugging