Title: DynaMine: Finding Common Error Patterns by Mining Software Revision Histories


1
DynaMine: Finding Common Error Patterns by Mining Software Revision Histories
  • Benjamin Livshits, Stanford University
  • Thomas Zimmermann, Saarland University
2
A Box Full of Nails
  • A lot of
    • promise
    • potential
    • excitement
  • Not that many success stories
  • Not sure what to apply it to
  • Let's try this particularly exciting idea
  • Miners looking at their tools
  • Promises, promises
    • Interesting usage patterns found by CVS mining
    • Interesting error patterns found by CVS mining

3
My Background
  • Tools for bug detection
  • Analysis: pointer analysis, etc.
  • Mostly static, some dynamic
  • Applications
  • Security
  • Buffer overruns
  • Format string violations
  • SQL injections
  • Cross-site scripting
  • HTTP response splitting
  • Data lifetimes
  • J2EE patterns
  • Bad session stores
  • Lapsed listeners
  • Eclipse patterns
  • Missing calls to dispose
  • Not calling super
  • Forgetting to deregister listeners

4
Glorified Bug Finding System
  • A language for describing bug patterns
  • Called PQL, see OOPSLA 2005
  • 2 years of work
  • Static and dynamic analysis combined
  • We don't know what to look for
  • Took a long time to find useful error patterns
  • Programmers often don't recognize patterns
  • Have pretty good tools
  • How do we find more patterns to check?
  • Want to find error patterns in unfamiliar code

5
The Usual Suspects
  • Much bug-detection research in recent years
  • Focus: generic patterns, sometimes language-specific
  • NULL dereferences
  • Security
  • Buffer overruns
  • Format string violations
  • Memory
  • Double-deletes
  • Memory leaks
  • Locking errors/threads
  • Deadlock/race detection
  • Atomicity
  • Let's look at the space of error patterns in more detail

6
Classification of Error Patterns
Error Pattern Iceberg
Generic patterns -- the usual suspects
  • NULL dereferences
  • Buffer overruns
  • Double-deletes
  • Locking errors/threads

App-specific patterns -- particular to a system or a set of APIs
  • Bugs in J2EE servlets
  • Device drivers
  • Bugs in Linux code
7
Classification of Error Patterns
There are hundreds of WinAmp plugins out there
Generic patterns -- the usual suspects
  • NULL dereferences
  • Buffer overruns
  • Double-deletes
  • Locking errors/threads

Does anybody know any good error patterns specific to WinAmp plugins?
App-specific patterns -- particular to a system or a set of APIs: ?
  • Intuition
    • Many other application-specific patterns exist
    • Much of the application-specific stuff remains a gray area so far
  • Goal: Let's figure out what the patterns are

8
Motivation: Matching Method Pairs
  • Start small
    • Matching method pairs
    • Only two methods
    • A very simple state machine
    • Calls must match perfectly, order matters
  • Very common; our inspiration is
    • System calls
      • fopen/fclose
      • lock/unlock
    • GUI operations
      • addNotify/removeNotify
      • addListener/removeListener
      • createWidget/destroyWidget
  • Want to find more of the same
  • And, if we are lucky, more interesting patterns

9
DynaMine: Our Insight
  • Our problem
    • Want to find patterns whose violation causes errors
    • Want to find patterns for program understanding
  • Our technique
    • Look at revision histories
    • Use data mining techniques to find methods that are often added at the same time
  • Crucial observation
    • Things that are frequently checked in together often form a pattern
10
DynaMine: Our Insight (continued)
  • Now we know the potential patterns
  • Profile the patterns
    • Run the application
    • For each pattern, count
      • hits: number of times the pattern is followed
      • misses: number of times the pattern is violated
  • Based on these statistics, classify the patterns
    • Usage patterns: almost always hold
    • Error patterns: violated a large number of times, but still hold most of the time
    • Unlikely patterns: not validated enough times
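
The classification by hits and misses could be sketched roughly as below. The class name and all three thresholds (MIN_HITS, USAGE_RATIO, ERROR_RATIO) are assumptions made up for illustration; the slides only say "almost always", "most of the time", and "not validated enough times", so treat the numbers as placeholders rather than the values DynaMine uses.

```java
// Hypothetical sketch: classify a mined pattern from its dynamic hit/miss counts.
// All threshold values below are assumptions chosen for illustration only.
final class PatternClassifier {
    enum Kind { USAGE, ERROR, UNLIKELY, NOT_HIT }

    static final int MIN_HITS = 5;          // assumed: fewer hits is too little evidence
    static final double USAGE_RATIO = 0.9;  // assumed cut-off for "almost always holds"
    static final double ERROR_RATIO = 0.5;  // assumed cut-off for "holds most of the time"

    static Kind classify(int hits, int misses) {
        int total = hits + misses;
        if (total == 0) return Kind.NOT_HIT;        // pattern never exercised at runtime
        if (hits < MIN_HITS) return Kind.UNLIKELY;  // not validated enough times
        double ratio = (double) hits / total;
        if (ratio >= USAGE_RATIO) return Kind.USAGE;
        if (ratio >= ERROR_RATIO) return Kind.ERROR; // often violated, but mostly holds
        return Kind.UNLIKELY;
    }

    public static void main(String[] args) {
        System.out.println(classify(100, 2));
        System.out.println(classify(60, 40));
        System.out.println(classify(1, 0));
        System.out.println(classify(0, 0));
    }
}
```

Keeping a separate NOT_HIT outcome distinguishes patterns the test run never reached from patterns that were reached but rarely held.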

11
Architecture of DynaMine
  • Revision history mining: mine CVS histories; sort and filter; patterns
  • Dynamic analysis: instrument relevant method calls; run the application; post-process; usage patterns, error patterns, unlikely patterns
  • Reporting: report patterns; report bugs
12
Mining approach
13
Mining Basics
  • Rely on co-change
  • Simplification: look at method calls only
  • Look for interesting patterns in the way methods are called
  • Example
    • Sequence of revisions
    • Files: Foo.java, Bar.java, Baz.java, Qux.java

14
Mining Matching Method Calls
  • Use our observation
    • Methods that are frequently added simultaneously often represent a usage pattern
    • For instance, addListener() and removeListener()

15
Data Mining Summary
  • We consider method calls added in each check-in
  • We want to find patterns of method calls
  • Too many potential patterns to consider
  • Want to filter and rank them
  • Use support and confidence for that
  • Support and confidence of each pattern
  • Standard metrics used in data mining
  • Support reflects how many times each pair appears
  • Confidence reflects how strongly a particular
    pair is correlated
  • Refer to the paper for details
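
As a rough illustration of the two metrics, the sketch below uses the standard association-rule definitions: support counts the check-ins that add calls to both methods, and confidence of a => b is that count divided by the number of check-ins that add a. Whether DynaMine computes them in exactly this way is an assumption (the paper has the details), and the class name, method names, and sample history are all made up.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of support/confidence for a pair of method calls.
// Each check-in is modeled as the set of method names whose calls it added.
final class PairMetrics {
    // Support: number of check-ins that added calls to both a and b.
    static int support(List<Set<String>> checkIns, String a, String b) {
        int n = 0;
        for (Set<String> added : checkIns) {
            if (added.contains(a) && added.contains(b)) n++;
        }
        return n;
    }

    // Confidence of a => b: fraction of check-ins adding a that also add b.
    static double confidence(List<Set<String>> checkIns, String a, String b) {
        int withA = 0;
        for (Set<String> added : checkIns) {
            if (added.contains(a)) withA++;
        }
        return withA == 0 ? 0.0 : (double) support(checkIns, a, b) / withA;
    }

    public static void main(String[] args) {
        // Made-up revision history: four check-ins.
        List<Set<String>> history = List.of(
            Set.of("addListener", "removeListener"),
            Set.of("addListener", "removeListener", "println"),
            Set.of("addListener"),
            Set.of("println"));
        System.out.println(support(history, "addListener", "removeListener"));
        System.out.println(confidence(history, "addListener", "removeListener"));
    }
}
```

Note that confidence is not symmetric: in the sample history, every check-in that adds removeListener also adds addListener, but not the other way round.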

16
Improvements Over the Traditional Approach
  • Default data mining approach doesn't quite work
  • Filters based on confidence and support
  • Still too many potential patterns!
  • Filtering
  • Consider only patterns with the same initial
    subsequence as potential patterns
  • Ranking
  • Use one-line fixes to find likely error patterns

17
Matching Initial Call Sequences
[Figure not preserved; labels: 1 Pair; 3 Pairs, 1 Pair; 10 Pairs, 2 Pairs; 1 Pair, 0 Pairs; 0 Pairs]
18
Using Fixes to Rank Patterns
  • Look for one-call additions, which likely indicate fixes
  • Rank patterns with such methods higher

This is a fix! Move patterns containing
removeListener up
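
The fix-based ranking could look something like the sketch below: any check-in that adds exactly one call is treated as a likely fix, and patterns mentioning such a method are moved ahead of the rest. The class, the data model (a check-in as a list of added method names), and the sample data are assumptions for illustration; the real ranking presumably also weighs support and confidence.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: boost patterns that contain a method seen in a one-call check-in.
final class FixRanking {
    // Methods added by check-ins that add exactly one call; treated as likely fixes.
    static Set<String> fixMethods(List<List<String>> checkIns) {
        Set<String> fixes = new HashSet<>();
        for (List<String> added : checkIns) {
            if (added.size() == 1) fixes.add(added.get(0));
        }
        return fixes;
    }

    // Stable sort: patterns mentioning a fix method move ahead of the others,
    // and the original order is kept within each group.
    static List<List<String>> rank(List<List<String>> patterns, Set<String> fixes) {
        List<List<String>> ranked = new ArrayList<>(patterns);
        ranked.sort(Comparator.comparing(
            (List<String> p) -> p.stream().noneMatch(fixes::contains)));
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up history: the second check-in adds only removeListener.
        List<List<String>> history = List.of(
            List.of("addListener", "removeListener"),
            List.of("removeListener"),
            List.of("println", "flush"));
        List<List<String>> patterns = List.of(
            List.of("println", "flush"),
            List.of("addListener", "removeListener"));
        System.out.println(rank(patterns, fixMethods(history)));
    }
}
```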
19
Applications under Study
  • Apply these ideas to the revision history of
    Eclipse and jEdit
  • Very large open-source projects
  • Many people working on both, all over the planet
    • 122 on Eclipse
    • 92 on jEdit
  • Many check-ins
    • Eclipse: 2,837,854
    • jEdit: 144,495
  • Long histories
    • Eclipse: since 2001
    • jEdit: since 2000

20
Some patterns (as promised)
21
Categories of Patterns
  • Method calls during execution
  • Care about the methods
  • Care about the order
  • Care about the parameters/return values
  • Here are some common cases
  • Matching method pairs
  • State machines
  • More complex patterns

22
Some Interesting Method Pairs (1)
kEventControlActivate kEventControlDeactivate
addDebugEventListener removeDebugEventListener
beginRule endRule
suspend resume
NewPtr DisposePtr
addListener removeListener
register deregister
addElementChangedListener removeElementChangedListener
addResourceChangeListener removeResourceChangeListener
addPropertyChangeListener removePropertyChangeListener
createPropertyList reapPropertyList
preReplaceChild postReplaceChild
addWidget removeWidget
stopMeasuring commitMeasurements
blockSignal unblockSignal
HLock HUnlock
OpenEvent fireOpen

23
Some Interesting Method Pairs (2)
Register/unregister the current widget with the
parent display object for subsequent event
forwarding
24
Some Interesting Method Pairs (3)
Add/remove a listener for a particular kind of GUI event
25
Some Interesting Method Pairs (4)
Use OS native locking mechanism for resources
such as icons, etc.
26
State Machines
  • Order captured by a state machine
  • Must be followed precisely: omitting or repeating a method call is a sign of error
  • Simplest formalism for describing the object life-cycle
  • Matching method pairs: a specific case
  • Very common in C
  • Consider OS code
  • Less common in Java, but

27
State Machines (1)
  • o.enterAlignment o.redoAlignment
    o.exitAlignment
  • Part of org.eclipse.jdt.internal.formatter.Scribe, which is responsible for pretty-printing of code
  • enterAlignment/exitAlignment pairs must match
  • redoAlignment is invoked in exception cases

28
State Machines (2)
  • o.beginCompoundEdit()
  • (o.insert(...) o.remove(...))
  • o.endCompoundEdit()
  • Compound edits within jEdit can be undone at
    once
  • beginCompoundEdit/endCompoundEdit act as brackets
  • Other operations in between
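
A minimal sketch of the bracketing pattern above as a two-state machine, checked against the calls made on one object. Reading the middle part as "zero or more insert/remove calls", and treating nested brackets or edits outside the brackets as violations, are simplifying assumptions of this example and not claims about jEdit; the class name is made up.

```java
import java.util.List;

// Hypothetical sketch: check one object's call sequence against
// beginCompoundEdit (insert | remove)* endCompoundEdit.
final class CompoundEditChecker {
    enum State { IDLE, EDITING }

    // Returns true if every call respects the state machine and the
    // sequence ends with no compound edit left open.
    static boolean follows(List<String> calls) {
        State state = State.IDLE;
        for (String call : calls) {
            switch (call) {
                case "beginCompoundEdit":
                    if (state != State.IDLE) return false;    // repeated begin
                    state = State.EDITING;
                    break;
                case "insert":
                case "remove":
                    if (state != State.EDITING) return false; // edit outside the brackets
                    break;
                case "endCompoundEdit":
                    if (state != State.EDITING) return false; // end without begin
                    state = State.IDLE;
                    break;
                default:
                    return false; // call that is not part of this pattern
            }
        }
        return state == State.IDLE; // an omitted end is a violation
    }

    public static void main(String[] args) {
        System.out.println(follows(List.of(
            "beginCompoundEdit", "insert", "remove", "endCompoundEdit")));
        System.out.println(follows(List.of("beginCompoundEdit", "insert")));
    }
}
```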

29
State Machines (3)
  • OS.PmMemCreateMC
  • OS.PmMemStart OS.PmMemFlush OS.PmMemStop
  • OS.PmMemReleaseMC
  • Memory context manipulation (like memory pools)
  • Wrappers around underlying OS functionality
  • The middle part of the pattern is optional

30
More Complex Stuff (1)
    try {
        monitor.beginTask(null, Policy.totalWork);
        int depth = -1;
        try {
            workspace.prepareOperation(null, monitor);
            workspace.beginOperation(true);
            depth = workspace.getWorkManager().beginUnprotected();
            return runInWorkspace(Policy.subMonitorFor(monitor,
                Policy.opWork, SubProgressMonitor.PREPEND_MAIN_LABEL_TO_SUBTASK));
        } catch (OperationCanceledException e) {
            workspace.getWorkManager().operationCanceled();
            return Status.CANCEL_STATUS;
        } finally {
            if (depth >= 0)
                workspace.getWorkManager().endUnprotected(depth);
            workspace.endOperation(null, false,
                Policy.subMonitorFor(monitor, Policy.endOpWork));
        }
    } catch (CoreException e) {
        ...
    }

33
Grammar for Workspace Transactions
  • Requires human intelligence
  • Requires a lot of it
  • Is actually an excellent pattern: haven't seen runtime violations

S → O
O → w.prepareOperation() w.beginOperation() U w.endOperation()
U → w.getWorkManager().beginUnprotected() S w.getWorkManager().operationCanceled() w.getWorkManager().beginUnprotected()
34
Dynamic checking
35
Dynamically Check the Patterns
  • Home-grown bytecode instrumentor
    • Get a list of matching patterns
    • Instrument calls to any of the methods to dump parameters
  • Post-processing of the output
    • Process a stream of events
    • Find and count matches and mismatches
  • Example event stream
    • o.register(d)
    • o.deregister(d): matched
    • o.deregister(d): mismatched (???)
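
The post-processing step for a matching pair could be sketched as below, using the register/deregister example from this slide. The Event class, the string identities for receiver and argument, and the choice to count a register that is never undone as a mismatch are all assumptions of this sketch, not details of the actual instrumentor output.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: count matches and mismatches for register/deregister
// in a stream of logged method-call events.
final class PairMatcher {
    // One logged call: method name plus identities of receiver and argument.
    static final class Event {
        final String method, receiver, arg;
        Event(String method, String receiver, String arg) {
            this.method = method;
            this.receiver = receiver;
            this.arg = arg;
        }
    }

    // Returns {matches, mismatches}.
    static int[] count(List<Event> events) {
        Map<String, Integer> open = new HashMap<>(); // outstanding register calls
        int matches = 0, mismatches = 0;
        for (Event e : events) {
            String key = e.receiver + "#" + e.arg;
            if (e.method.equals("register")) {
                open.merge(key, 1, Integer::sum);
            } else if (e.method.equals("deregister")) {
                int pending = open.getOrDefault(key, 0);
                if (pending > 0) {
                    open.put(key, pending - 1);
                    matches++;
                } else {
                    mismatches++; // deregister with no matching register
                }
            }
        }
        for (int pending : open.values()) mismatches += pending; // register never undone
        return new int[] { matches, mismatches };
    }

    public static void main(String[] args) {
        int[] r = count(List.of(
            new Event("register", "o", "d"),
            new Event("deregister", "o", "d"),
            new Event("deregister", "o", "d")));
        System.out.println(r[0] + " matched, " + r[1] + " mismatched");
    }
}
```

Keying on both receiver and argument is what lets the checker tell o.register(d) apart from a call on a different object or with a different parameter, which is why the instrumentation dumps parameters.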
36
Experiments
37
Experimental Setup
  • Applied to Eclipse and jEdit
  • 3,600,000 lines of Java code combined
  • Included many plugins
  • Times
  • 6 days to fetch and process CVS histories
  • 30 minutes to compute the patterns
  • An hour to instrument
  • 15 minutes to run
  • And we are done!

38
Experimental Summary
  • Pattern classification
  • 56 patterns total
  • 13 are usage patterns
  • 8 are error patterns
  • 11 are unlikely patterns
  • 24 were not hit at runtime
  • Error patterns
  • Resulted in a total of 264 dynamically confirmed
    pattern violations

39
Summary
  • Knowing code patterns is important
  • We explored using software histories
  • Co-change often indicates patterns
  • Use previous fixes (one-line changes) to drive
    error patterns
  • Found interesting patterns
  • Matching method pairs
  • State machines
  • More complex stuff
  • Confirmed valid patterns
  • Found pattern violations at runtime
  • We have a paper in FSE 2005