Mining Version Histories to Guide Software Changes
1
Mining Version Histories to Guide Software Changes
Thomas Zimmermann, Peter Weißgerber, Stephan Diehl, Andreas Zeller
2
In this paper, we apply data mining to version
histories: 'Programmers who changed these
functions also changed...' Just like the
Amazon.com feature that helps the customer browse
related items, our ROSE tool guides the
programmer along related changes...
3
Agenda
  • ROSE Overview
  • CVS to ROSE
  • Data Analysis
  • Evaluation
  • Paper Critique

4
ROSE Overview
  • Aims
  • Suggest and predict likely changes. Suppose a
    programmer has just made a change. What else does
    she have to change?
  • Prevent errors due to incomplete changes. If a
    programmer wants to commit changes, but has
    missed a related change, ROSE issues a warning.
  • Detect coupling undetectable by program analysis.
    As ROSE operates exclusively on the version
    history, it is able to detect coupling between
    items that cannot be detected by program
    analysis.

5
ROSE Overview (2)
6
CVS to ROSE
  • ROSE works in terms of changes to entities
  • e.g., changes to directories, files, classes,
    methods, variables
  • Every entity is a triple (c, i, p), where c is
    the syntactic category, i is the identifier, and
    p is the parent entity
  • e.g., (method, initDefaults(), (class, Comp, ...))
  • Every change is expressed using one of the
    predicates
  • alter(e)
  • add_to(e)
  • del_from(e)
  • Each transaction from CVS is converted to a list
    of those changes (see the sketch after this list)
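A minimal sketch of this data model, assuming only what the slide
states (Python; the class and field names are illustrative, not
ROSE's actual implementation):

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass(frozen=True)
    class Entity:
        """An entity is a triple (category, identifier, parent)."""
        category: str               # syntactic category, e.g. "method"
        identifier: str             # e.g. "initDefaults()"
        parent: Optional["Entity"]  # enclosing entity; None at the root

    @dataclass(frozen=True)
    class Change:
        """One predicate (alter, add_to, del_from) applied to an entity."""
        predicate: str
        entity: Entity

    # One CVS transaction becomes a list of changes:
    comp = Entity("class", "Comp", None)
    transaction: List[Change] = [
        Change("alter", Entity("field", "fKeys", comp)),
        Change("alter", Entity("method", "initDefaults()", comp)),
    ]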

7
Data Analysis
  • ROSE aims to mine rules from those alterations
  • alter(field, fKeys, ...) is possibly followed by
  • alter(method, initDefaults(), ...)
  • alter(file, plug.properties, ...)
  • The strength of such a rule is measured by
  • Support count. The number of transactions the
    rule has been derived from.
  • Confidence. The fraction of transactions
    containing the antecedent that also contain the
    consequent.
  • e.g., suppose fKeys was altered in 11
    transactions, and 10 of those also alter()'ed
    initDefaults() and plug.properties. Then 10 is
    the support count, and 10/11 (about 0.909) is
    the confidence (see the sketch after this list).
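A hedged sketch of how support and confidence fall out of the
transaction data, reproducing the fKeys example above (Python;
transactions are modeled as plain sets of changed items, a
simplification of ROSE's representation):

    # 10 transactions change all three items, 1 changes only fKeys.
    transactions = (
        [{"fKeys", "initDefaults()", "plug.properties"}] * 10
        + [{"fKeys"}]
    )

    antecedent = {"fKeys"}
    consequent = {"initDefaults()", "plug.properties"}

    # Support count: transactions containing antecedent and consequent.
    support = sum(antecedent | consequent <= t for t in transactions)
    # Confidence: support relative to transactions with the antecedent.
    confidence = support / sum(antecedent <= t for t in transactions)

    print(support, round(confidence, 3))  # 10 0.909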

8
Data Analysis (2)
  • Other features
  • add_to() and del_from() allow an abstraction from
    the name of an added entity to the name of the
    surrounding entity.
  • The notion of entities allows mining at varying
    granularities.
  • Fine-granular mining. For source code of C-like
    languages, alter() is used for fields, functions,
    etc.; add_to() is used for file entities.
  • Coarse-granular mining. Regardless of file type,
    only alter() is used for file entities; add_to()
    and del_from() capture when a file is added or
    deleted (the two levels are contrasted in the
    sketch after this list).
  • Coarse-granular rules have a higher support count
    and usually return more results. However, they are
    less precise in location and of limited use for
    guiding programmers.
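To make the two granularities concrete, here is an illustrative sketch
(Python; the entity names are hypothetical) of how one transaction
might be represented at each level:

    # Fine-granular: code files are broken down into their components.
    fine = [
        ("alter",  ("field",  "fKeys")),           # field changed in Comp.java
        ("alter",  ("method", "initDefaults()")),  # method changed in Comp.java
        ("add_to", ("file",   "Comp.java")),       # a new function was added
    ]

    # Coarse-granular: only file-level changes, regardless of file type.
    coarse = [
        ("alter", ("file", "Comp.java")),
        ("alter", ("file", "plug.properties")),
    ]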

9
Evaluation
  • Usage Scenarios
  • Navigation through source code. Given a change,
    can ROSE point to other entities that should
    typically be changed too?
  • Error prevention. If a programmer has changed
    many entities but missed a related one, does ROSE
    find the missing one?
  • Closure. When the transaction is finished, how
    often does ROSE erroneously suggest that a change
    is missing in the error prevention scenario?
  • Evaluation on eight large open-source projects
  • ECLIPSE
  • GCC
  • GIMP
  • JBOSS
  • JEDIT
  • KOFFICE
  • POSTGRES
  • PYTHON

10
Evaluation (2)
  • Summary
  • One can have precise suggestions or many
    suggestions, but not both.
  • When given an initial item, ROSE makes
    predictions in 66 percent of all queries. On
    average, ROSE's predictions contain 33 percent
    of all items changed later in the same
    transaction. For those queries for which ROSE
    makes recommendations, a correct location is
    among ROSE's topmost three suggestions in 70
    percent of the cases.
  • In 3 percent of the queries where one item is
    missing, ROSE issues a correct warning. An issued
    warning predicts on average 75 percent of the
    items that need to be considered.
  • ROSE's warnings about missing items should be
    taken seriously: only 2 percent of all
    transactions cause a false alarm. In other words,
    ROSE does not stand in the way.
  • ROSE has its best predictive power for changes to
    existing entities.
  • ROSE learns quickly: a few weeks after a project
    starts, ROSE already makes useful suggestions.

11
Critique
  • Likes
  • The tool was applied to and evaluated on eight
    projects, and conclusions were drawn that account
    for their varying natures.
  • It's relevant to our assignment, so it was easy
    to follow.
  • Dislikes
  • There is research value, but there is reason to
    be skeptical that the recall of such tools will
    reach practical levels (for navigation purposes).
    Intuitively, recommendations might break things
    if followed blindly, regardless of whether each
    individual recommendation is correct; i.e., there
    is little practical value if the recommendation
    set is incomplete, which is more likely for the
    complex applications where this really matters.
  • I still don't know what ROSE stands for.