Inferring Developer Activities by Analyzing Successive Versions of Source Code - PowerPoint PPT Presentation




1
Inferring Developer Activities by Analyzing
Successive Versions of Source Code
  • Work done for HPCS and CMSC631
  • Jaymie Strecker
  • January, 2005

2
Goal
  • Infer developer activities from source code obtained from
  • a version control system (e.g. CVS)
  • an instrumented compiler

3
Motivation
  • Applications (to study the development process)
  • Analyze source code collected by an instrumented
    compiler; compare results of the analysis to
    self-reported activities
  • Analyze source code collected by a version control
    system (e.g. if no instrumentation is available)

4
Source Code Analysis vs. Instrumented Compilers
  • Benefits of source code analysis
  • Finer granularity (look at individual changes,
    not whole versions)
  • Guarantees consistency across subjects and
    experiments
  • Transparent to subjects
  • Can apply to data that has already been collected

But source code analysis should supplement
instrumented compilers, not replace them.
5
  • Source Code Changes
  • Developer Activities
  • An Inference Algorithm
  • Algorithm Evaluation
  • Conclusions

6
Source Code Changes
"One program change should be concerned with the
contiguous set of concrete statements that
represent a single abstract instruction."
-- Dunsmore and Gannon, 1978
Approximation: a contiguous set of modified lines.
Version A:
  main() {
    int a, b;
    a = 0;
  }
Version B:
  main() {
    int a, b, c;
    a = 0;
    printf("a = %d\n", a);
  }
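The contiguous-modified-lines approximation maps directly onto a diff
between successive versions. A minimal sketch using Python's difflib
(the helper name and the toy versions are illustrative, not the paper's
tool):

```python
import difflib

def changes(old_lines, new_lines):
    # Each non-equal opcode is one contiguous modified region,
    # approximating a single "program change".
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    return [(tag, old_lines[i1:i2], new_lines[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]

old = ["main() {", "  int a, b;", "  a = 0;", "}"]
new = ["main() {", "  int a, b, c;", "  a = 0;",
       '  printf("a = %d\\n", a);', "}"]
for tag, removed, added in changes(old, new):
    print(tag, removed, added)
```

Here the declaration edit and the added printf come out as two separate
contiguous changes (a replace and an insert) rather than one whole-version
difference.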
7
Change Model
8
  • Source Code Changes
  • Developer Activities
  • An Inference Algorithm
  • Algorithm Evaluation
  • Conclusions

9
Developer Activities
  • Developer activities
  • Formulate: Formulate an algorithmic approach
  • Program: Create or incrementally augment the
    program and its testing infrastructure
  • Compile: Compile and link the program
    developed so far
  • Test: Test the program, observing its
    behavior
  • Debug: Diagnose and fix erroneous behavior
  • Run: Run the program on real input data
  • Optimize: Improve program performance

-- Smith, 2004
10
Low-Level Developer Activities
11
  • Source Code Changes
  • Developer Activities
  • An Inference Algorithm
  • Algorithm Evaluation
  • Conclusions

12
Inferring Developer Activities
13
Identifying the Low-Level Activity
  • Heuristics to guess the activity for a change
  • Add Functionality I (Program): First version.
  • Correct Compile-Time Errors (Program): Version
    A does not compile.
  • Comment/Uncomment Executable Statements (Debug):
    A statement appears in both versions, but in
    just one version the statement is inside a
    comment.

14
Identifying the Low-Level Activity
  • Modify Comments (Program): More than half of the
    changed lines involve text within comments.
  • Modify Debugging Code (Debug): The change
    involves a print statement which does not appear
    (uncommented) in the final version.
  • Add Functionality II (Program): The change is an
    addition.
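These text-pattern heuristics can be sketched as simple predicates over a
change's lines; the function names and the marker-based comment check are
assumptions for illustration, not the actual tool:

```python
def modify_comments(changed_lines):
    # Modify Comments (Program): more than half of the changed
    # lines involve text within comments (crude C marker check).
    commenty = [l for l in changed_lines
                if "//" in l or "/*" in l or "*/" in l
                or l.strip().startswith("*")]
    return 2 * len(commenty) > len(changed_lines)

def modify_debugging(changed_lines, final_text):
    # Modify Debugging Code (Debug): the change involves a print
    # statement absent (uncommented) from the final version.
    prints = [l.strip() for l in changed_lines if "printf" in l]
    return any(p not in final_text for p in prints)

def add_functionality_2(removed_lines, added_lines):
    # Add Functionality II (Program): the change is a pure addition.
    return not removed_lines and bool(added_lines)
```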

15
[Flowchart: each change is tested against the heuristics in order --
AddFunc1, CorrectCompile, CommentStmts (yielding Program or Debug),
ModifyDoc, ModifyDebug, AddFunc2 -- taking the first "yes"; a change
that matches none is Unclassified.]
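The flowchart applies the heuristics in a fixed order and stops at the
first match, with Unclassified as the fall-through. A sketch of that
chain (the boolean flags on each change are hypothetical stand-ins for
the real checks):

```python
def classify(change, heuristics):
    # Try each (label, predicate) in order; first match wins.
    for label, predicate in heuristics:
        if predicate(change):
            return label
    return "Unclassified"

# Hypothetical flags standing in for the real heuristics.
heuristics = [
    ("AddFunc1",       lambda c: c.get("first_version", False)),
    ("CorrectCompile", lambda c: c.get("prev_version_broken", False)),
    ("CommentStmts",   lambda c: c.get("toggles_comment", False)),
    ("ModifyDoc",      lambda c: c.get("mostly_comments", False)),
    ("ModifyDebug",    lambda c: c.get("touches_debug_print", False)),
    ("AddFunc2",       lambda c: c.get("pure_addition", False)),
]

print(classify({"pure_addition": True}, heuristics))  # AddFunc2
print(classify({}, heuristics))                       # Unclassified
```

Ordering matters: an addition that also toggles a comment is classified
as CommentStmts and never reaches AddFunc2.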
16
Automatic Inference
  • Implementation of an inference tool is
    straightforward
  • Change definition and model
  • Heuristics for low-level activities
  • Tool performs simple static analysis
  • Pattern matching on source code text
  • Typically takes a few seconds to analyze one
    subject's programming assignment

17
  • Source Code Changes
  • Developer Activities
  • An Inference Algorithm
  • Algorithm Evaluation
  • Conclusions

18
Evaluation of Inference Tool
Source code data from experiments
Subjects' self-reported activities
Inferred developer activities
False positives
False negatives
Unclassified changes
19
Data Analyzed
  • Experiment: Allan Snavely's class at UCSD (Fall 2004)
  • Source code used: Serial C/C++ implementations
  • Assignments used: 3
  • Subjects used: 11
  • Source code collection method: Instrumented compiler
At the beginning of the study, subjects were
shown definitions of the developer activity
options used by the instrumented compiler.
20
Metrics
  • False positives (heuristic): Number of changes
    the heuristic recognizes incorrectly
  • False negatives (self-reported activity): Number
    of changes with the self-reported activity that
    are classified incorrectly
  • Unclassified changes: Number of changes that no
    heuristic recognizes
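Assuming each change carries one inferred label and one self-reported
label, the three metrics can be tallied as follows (a sketch; the label
names are illustrative):

```python
from collections import Counter

def tally(pairs):
    # pairs: (inferred, reported) labels, one per change.
    false_pos = Counter()   # per heuristic: recognized incorrectly
    false_neg = Counter()   # per activity: classified incorrectly
    unclassified = 0        # no heuristic recognized the change
    for inferred, reported in pairs:
        if inferred == "Unclassified":
            unclassified += 1
        elif inferred != reported:
            false_pos[inferred] += 1
            false_neg[reported] += 1
    return false_pos, false_neg, unclassified

fp, fn, un = tally([("Program", "Program"),
                    ("Debug", "Program"),
                    ("Unclassified", "Test")])
print(fp, fn, un)
```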

21
Distribution varies widely across subjects (as
does the distribution of self-reported activities).
22
High false positive rate for many heuristics.
Small sample size for heuristics other than
CorrectCompile.
23
Almost all changes are classified.
24
No heuristics recognize Experimenting, Testing,
or Tuning.
25
  • Source Code Changes
  • Developer Activities
  • An Inference Algorithm
  • Algorithm Evaluation
  • Conclusions

26
Conclusions
  • Source code changes only sometimes reflect
    self-reported activities.
  • Serial Coding and Parallelizing were usually
    recognized correctly.
  • Source code alone doesn't reveal the developer's
    intentions (e.g. Experimenting with Environment).
  • Some activities may not affect the source code
    (e.g. Testing).

27
Conclusions
  • Source code analysis and instrumented compilers
    give different types of information.
  • SCA shows that subjects do multiple activities
    between compiles; IC reports just one activity
    per compile.
  • IC reports the amount and type of effort spent;
    SCA shows what was accomplished by this effort.
  • SCA is consistent across subjects; IC may not be
    (e.g. Debugging vs. Testing).

Good news: Revised SCA could someday supplement
IC. Bad news: Difficult to evaluate SCA using
IC.
28
Possible Future Work
  • Narrow the scope of changes analyzed
  • Defect-related changes
  • Language- or API-specific change patterns
  • Build upon data from instrumented compilers and
    other sources
  • Correlations between change characteristics or
    patterns and self-reported activity
  • Compare each version to a known solution or use
    test case output to understand defects

29
For More Information
  • Paper
  • http://www.cs.umd.edu/strecker/infer_act.pdf
  • Slides
  • http://www.cs.umd.edu/strecker/infer_act.ppt

30
Abstract
In HPCS experiments, instrumented compilers
regularly log the state of the source code being
developed; outside of experiments, a software
project's sequence of source code versions often
resides in a CVS repository. Such source code
data abounds. Since the changes made from version
to version in the source code are the end product
of the developer's effort, information about the
development process is encoded in those changes.
In this study, we attempt to extract one piece of
that information: why the developer made each
change. Currently, instrumented compilers collect
data from developers about the types of
activities they perform; in the future, source
code analysis may be a viable supplement to
instrumented compilers. Unlike activity data
collected by instrumented compilers, analysis of
source code changes produces fine-grained,
repeatable results. In a first attempt at source
code change analysis, we present a technique that
uses heuristics to recognize certain patterns of
source code changes that hint at the developer's
intentions. We compare the results of this
technique to data collected by an instrumented
compiler, and we suggest refinements to make the
technique a useful tool for analysis.