How to do successful research in software evolution

1
How to do successful research in software evolution

Michael W. Godfrey
Software Architecture Group (SWAG)
University of Waterloo

2
A general approach

OK, it's really just our research group's way to do successful research in software evolution
A three-stage, tool-based pipeline
Extract
Abstract
Navigate, query, explore

3
A general approach
[Diagram] Source artifacts -> (automated) Extract raw facts -> Abstract to desired meta-model -> Simplified data -> (semi-automated) Exploration / navigation / visualization
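The three-stage pipeline can be sketched in a few lines. This is a toy illustration under stated assumptions, not SWAG's tooling. The only raw fact it extracts is a local `#include`. The "desired meta-model" is a single hypothetical `depends_on(file, file)` relation that drops system headers. Exploration is reduced to one query. The function names `extract`, `abstract` and `explore` are illustrative only.

```python
import re
from typing import Dict, List, Set, Tuple

# A raw fact: (relation, source entity, target entity). Hypothetical format.
Fact = Tuple[str, str, str]

# Matches local includes such as: #include "util.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def extract(sources: Dict[str, str]) -> List[Fact]:
    """Stage 1 (automated): pull raw facts out of the source artifacts."""
    facts: List[Fact] = []
    for path, text in sources.items():
        for header in INCLUDE_RE.findall(text):
            facts.append(("include", path, header))
    return facts


def abstract(facts: List[Fact], known: Set[str]) -> Set[Tuple[str, str]]:
    """Stage 2 (automated): map raw facts onto a simpler meta-model.

    The assumed meta-model has one relation, depends_on(file, file),
    and keeps only edges whose target is part of the system itself.
    """
    return {(src, dst) for rel, src, dst in facts if rel == "include" and dst in known}


def explore(model: Set[Tuple[str, str]], target: str) -> List[str]:
    """Stage 3 (semi-automated): answer one question a person asked."""
    return sorted(src for src, dst in model if dst == target)


sources = {
    "main.c": '#include "util.h"\n#include "io.h"\n#include <stdio.h>\n',
    "io.c": '#include "io.h"\n',
    "util.h": "",
    "io.h": "",
}
model = abstract(extract(sources), set(sources))
print(explore(model, "io.h"))  # ['io.c', 'main.c']
```

In practice, the extraction and abstraction stages are where heavyweight fact extractors and manipulators sit. The exploration stage is where a person steers.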
4
(No Transcript)
5
(No Transcript)
6
Four interesting ways in which history can teach us about software

Michael W. Godfrey
Xinyi Dong
Cory Kapser
Lijie Zou
Software Architecture Group (SWAG)
University of Waterloo

7
Longitudinal case studies of growth and evolution

Studied several open source systems (OSSs), especially the Linux kernel
Looked for evolutionary narratives to explain observable historical phenomena
Methodology
Analyze individual tarball versions
Build a hierarchical metrics data model
Generate graphs, look for interesting lumps under the carpet, and try to answer why
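A hierarchical metrics data model can be as simple as file-level LOC rolled up into every enclosing directory, computed once per release. The sketch below assumes each tarball has already been unpacked into a path-to-contents map. It uses non-blank lines as a stand-in LOC measure; the studies' real metrics may differ. The file names and contents are made up.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict


def count_loc(text: str) -> int:
    """Count non-blank lines: a crude stand-in for a study's real LOC measure."""
    return sum(1 for line in text.splitlines() if line.strip())


def hierarchical_loc(files: Dict[str, str]) -> Dict[str, int]:
    """Roll file-level LOC up to every enclosing directory of one release.

    files maps POSIX-style paths, relative to the unpacked tarball root,
    to file contents. In the result, "." stands for the whole release.
    """
    totals: Dict[str, int] = defaultdict(int)
    for path, text in files.items():
        loc = count_loc(text)
        totals[path] += loc
        for parent in PurePosixPath(path).parents:
            totals[str(parent)] += loc
    return dict(totals)


# Two hypothetical releases; comparing them shows where growth happened.
v1 = {"kernel/sched.c": "int a;\n\nint b;\n", "drivers/net/e100.c": "x\ny\nz\n"}
v2 = {"kernel/sched.c": "int a;\nint b;\nint c;\n", "drivers/net/e100.c": "x\ny\nz\n",
      "drivers/usb/hub.c": "u\n"}
m1, m2 = hierarchical_loc(v1), hierarchical_loc(v2)
growth = {d: m2.get(d, 0) - m1.get(d, 0) for d in ("kernel", "drivers", ".")}
print(growth)  # {'kernel': 1, 'drivers': 1, '.': 2}
```

Plotting such per-directory totals across many releases is one way to spot the "lumps under the carpet" worth explaining.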

8
Longitudinal case studies of growth and evolution
[Diagram] Source code -> Extraction / analysis (analysis scripts) -> Metrics data -> Exploration (MS Excel)
9
Case studies of origin analysis

Reasoning about structural change (moving, renaming, merging, splitting, etc.)
Try to reconstruct what happened
Formalized several change patterns, e.g., service consolidation
Methodology
Consider consecutive pairs of versions
Entity analysis: metrics-based clone detection
Relationship analysis: compare relational images (calls, called-by, uses, extends, etc.)
Create an evolutionary record of what happened: what evolved from what, and how/why
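The two analyses can be combined roughly as follows. This is a hypothetical heuristic for illustration, not Beagle's actual algorithm. Entity analysis discards candidates whose metric vectors are too far apart. Relationship analysis then ranks the survivors by how much their sets of callees overlap. Several details are assumptions: the metric tuple (LOC, fan-in, fan-out, parameter count), the Euclidean distance, the `max_distance` threshold, and the Jaccard score.

```python
import math
from typing import AbstractSet, Dict, Optional, Sequence, Tuple

# Hypothetical metric vector per function, e.g. (LOC, fan-in, fan-out, #params).
Metrics = Sequence[float]


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    """Similarity of two relational images, e.g. the sets of callees."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def likely_successor(
    old_metrics: Metrics,
    old_callees: AbstractSet[str],
    new_entities: Dict[str, Tuple[Metrics, AbstractSet[str]]],
    max_distance: float = 5.0,
) -> Optional[str]:
    """Guess which new function a vanished function evolved into.

    new_entities maps names that appear only in version n+1 to
    (metrics, callees). Returns None if no candidate survives the metric
    filter with a nonzero callee overlap.
    """
    best: Optional[str] = None
    best_score = 0.0
    for name, (metrics, callees) in new_entities.items():
        if math.dist(old_metrics, metrics) > max_distance:
            continue  # entity analysis: too different to be the same code
        score = jaccard(old_callees, callees)  # relationship analysis
        if score > best_score:
            best, best_score = name, score
    return best


# Hypothetical case: getCfg vanished between two versions; which newcomer is it?
new_entities = {
    "read_config": ((41, 3, 5, 2), {"open_file", "parse_line", "log"}),
    "load_config": ((39, 2, 6, 2), {"open_file", "log"}),
    "write_log": ((12, 1, 1, 1), {"log"}),
}
print(likely_successor((40, 3, 5, 2), {"open_file", "parse_line", "log"}, new_entities))
# read_config
```

A merge or split would show up as several vanished entities pointing at one newcomer, or the reverse. Recording those links is what builds the evolutionary record.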

10
Case studies of origin analysis
ER model
cppx / Understand / Beagle
Source code
Metrics data
Extraction / analysis
Beagle
Exploration
11
Case studies of code cloning

Motivation
Lots of research in clone detection, but more on algorithms and tools than on case studies and comprehension
What kinds of cloning are there? Why does cloning happen? Which kinds are the most/least harmful? Do different clone kinds have different precision / recall numbers? Under different algorithms?
Future work: track clone evolution
Do related bugs get fixed? Does cloned code have more bugs?
Methodology
Use CCFinder on the source to find initial clone pairs
Use ctags to map out source files into entity regions
Consecutive typedefs, function prototypes, variable definitions
Individual macros, structs, unions, enums, function definitions
Map (abstract up) clone pairs to the source code regions
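The region-mapping step might look like the sketch below. It assumes the tags for one file have already been obtained as (kind, start line, end line) triples, for example from a ctags run that reports end lines. It does not parse real ctags or CCFinder output. Another assumption is that any back-to-back mix of typedefs, prototypes and variable definitions forms one "declarations" region, while every other kind stands alone. A clone fragment is then abstracted up to the regions it overlaps.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Tag kinds that are merged into one region when they appear back to back.
GROUPABLE = {"typedef", "prototype", "variable"}


@dataclass
class Region:
    kind: str   # "declarations" for a merged run, else the tag kind
    start: int  # first line, inclusive
    end: int    # last line, inclusive


def build_regions(tags: List[Tuple[str, int, int]]) -> List[Region]:
    """Turn (kind, start, end) tags for one file, sorted by start line, into regions."""
    regions: List[Region] = []
    for kind, start, end in tags:
        if kind in GROUPABLE:
            if regions and regions[-1].kind == "declarations":
                regions[-1].end = max(regions[-1].end, end)  # extend the current run
            else:
                regions.append(Region("declarations", start, end))
        else:
            # Macros, structs, unions, enums and function definitions stand alone.
            regions.append(Region(kind, start, end))
    return regions


def regions_for(fragment: Tuple[int, int], regions: List[Region]) -> List[Region]:
    """All regions overlapping a clone fragment given as (start, end) lines."""
    lo, hi = fragment
    return [r for r in regions if r.start <= hi and lo <= r.end]


tags = [("typedef", 1, 1), ("prototype", 2, 2), ("variable", 3, 3),
        ("macro", 5, 6), ("function", 8, 20), ("struct", 22, 27)]
regions = build_regions(tags)
print([r.kind for r in regions_for((10, 15), regions)])  # ['function']
print([r.kind for r in regions_for((5, 12), regions)])   # ['macro', 'function']
```

A fragment that overlaps regions of different kinds, like the second one, is a candidate for the heterogeneous category.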

12
Case studies of code cloning

Methodology
Filter different region kinds according to observed heuristics
C structs often look alike, so without these filters parameterized string matching returns many more false positives between structs than, say, between functions
Sort clones by location
Same region, same file, same directory, or different directory
and by entity kind
function to function / structures (enum, union, struct) / macro / heterogeneous (different region kinds) / misc. clones
and by even more detailed criteria
Function initialization / finalization clones
Navigate and investigate using the CICS GUI; look for patterns
Cross-subsystem clones seem to vary more over time
Intra-subsystem clones are usually function clones
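Sorting by location and entity kind amounts to a small classifier over clone pairs. Here is a minimal sketch. It assumes each fragment has already been mapped to a file, a region index and a region kind. It also assumes that "subsystem" means the top-level directory, which is a guess made for illustration. The `Fragment` type and the category strings are hypothetical labels that follow the slide's wording. The region-kind filters are left out because the heuristics are not spelled out here.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

STRUCTURES = {"struct", "union", "enum"}


class Fragment(NamedTuple):
    path: str         # file containing the cloned code
    region_id: int    # index of the enclosing entity region within that file
    region_kind: str  # "function", "struct", "macro", ...


def location_class(a: Fragment, b: Fragment) -> str:
    if a.path == b.path:
        return "same region" if a.region_id == b.region_id else "same file"
    if PurePosixPath(a.path).parent == PurePosixPath(b.path).parent:
        return "same directory"
    return "different directory"


def kind_class(a: Fragment, b: Fragment) -> str:
    if a.region_kind in STRUCTURES and b.region_kind in STRUCTURES:
        return "structures"
    if a.region_kind != b.region_kind:
        return "heterogeneous"
    if a.region_kind == "function":
        return "function to function"
    if a.region_kind == "macro":
        return "macro"
    return "misc"


def cross_subsystem(a: Fragment, b: Fragment) -> bool:
    """Assumes a subsystem is the top-level directory of a path."""
    return PurePosixPath(a.path).parts[0] != PurePosixPath(b.path).parts[0]


a = Fragment("fs/ext2/inode.c", 3, "function")
b = Fragment("fs/ext3/inode.c", 5, "function")
print(location_class(a, b), "|", kind_class(a, b), "|", cross_subsystem(a, b))
# different directory | function to function | False
```

Counting pairs per (location, kind) bucket in each release is one way to follow the cross-subsystem versus intra-subsystem trends over time.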

13
Case studies of code cloning
CCFinder
Source code
Custom filters and sorter
Taxonomized clone pairs
ctags
Extraction / analysis
CICS gui
Exploration
14
Longitudinal case studies of software manufacturing-related artifacts

Q: How much maintenance effort is put into software manufacturing (SM) artifacts, relative to the system as a whole?
Studying six OSSs
GCC, PostgreSQL, kepler, ant, mycore, midworld
All used CVS; we examined their logs
We looked for SM artifacts (Makefile, build.xml, SConscript) and compared them to non-SM artifacts
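Once the CVS logs are parsed into per-file change records, the comparison is simple arithmetic. Parsing is omitted here, and the `Change` record is a hypothetical shape. SM files are recognized by their base name. The slide names `Makefile`, `build.xml` and `SConscript`; also treating `Makefile.*` variants as SM artifacts is an extra assumption. The shares are computed over all committers in the input, not only core developers.

```python
from pathlib import PurePosixPath
from typing import Dict, List, NamedTuple

# Build-file names from the slide; "Makefile.*" variants are an added assumption.
SM_NAMES = {"Makefile", "build.xml", "SConscript"}


class Change(NamedTuple):
    author: str
    path: str
    lines_changed: int  # e.g. lines added + lines deleted in this file revision


def is_sm_artifact(path: str) -> bool:
    name = PurePosixPath(path).name
    return name in SM_NAMES or name.startswith("Makefile.")


def sm_shares(changes: List[Change]) -> Dict[str, float]:
    """Fraction of authors, changes, and changed lines that touch SM artifacts.

    Expects a non-empty list with at least one changed line.
    """
    sm = [c for c in changes if is_sm_artifact(c.path)]
    authors = {c.author for c in changes}
    return {
        "authors": len({c.author for c in sm}) / len(authors),
        "changes": len(sm) / len(changes),
        "loc": sum(c.lines_changed for c in sm) / sum(c.lines_changed for c in changes),
    }


# Hypothetical log excerpt.
log = [
    Change("ann", "gcc/Makefile.in", 40),
    Change("ann", "gcc/c-decl.c", 120),
    Change("bob", "gcc/toplev.c", 30),
    Change("cid", "libiberty/Makefile.in", 10),
]
print(sm_shares(log))  # authors about 0.667, changes 0.5, loc 0.25
```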

15
Longitudinal case studies of software manufacturing-related artifacts

Some results
Between 58% and 81% of the core developers contributed changes to SM artifacts
SM artifacts were responsible for
3-10% of the number of changes made
Up to 20% of the total LOC changed (GCC)
Open questions
How difficult is it to maintain these artifacts?
Do different SM tools require different amounts of effort?

16
Longitudinal case studies of software manufacturing-related artifacts
[Diagram] CVS repos -> Extraction / analysis (analysis scripts) -> Metrics data -> Exploration (MS Excel)
17
Dimensions of studies

Single version vs. consecutive version pairs vs. longitudinal study
Coarse-grained vs. fine-grained detail
Intermediate representation of artifacts
Raw code vs. metrics vs. ER-like semantic model
Navigable representation of system architecture
Automatic abstraction of information at arbitrary levels
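One common way to abstract facts automatically to an arbitrary level is to lift a relation along the containment hierarchy. Each file-level edge is replaced by an edge between the files' ancestors at a chosen depth. The sketch below assumes that the directory tree is the containment hierarchy and that internal edges are dropped by default. The file names are made up.

```python
from pathlib import PurePosixPath
from typing import Set, Tuple

Edge = Tuple[str, str]


def container(path: str, depth: int) -> str:
    """Ancestor of `path` at directory depth `depth` (1 = top-level directory).

    Entities that sit shallower than `depth` are kept as themselves.
    """
    parts = PurePosixPath(path).parts
    return "/".join(parts[:depth]) if len(parts) > depth else path


def lift(edges: Set[Edge], depth: int, keep_internal: bool = False) -> Set[Edge]:
    """Lift a file-level relation to the containers at the given depth."""
    lifted: Set[Edge] = set()
    for src, dst in edges:
        a, b = container(src, depth), container(dst, depth)
        if a != b or keep_internal:
            lifted.add((a, b))
    return lifted


calls = {
    ("kernel/sched.c", "kernel/timer.c"),
    ("drivers/net/e100.c", "kernel/sched.c"),
    ("drivers/usb/hub.c", "kernel/timer.c"),
    ("drivers/net/e100.c", "drivers/usb/hub.c"),
}
print(lift(calls, 1))       # {('drivers', 'kernel')}
print(len(lift(calls, 2)))  # 4
```

Changing `depth` lets a viewer move between an architecture-level picture and file-level detail without re-extracting anything.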

18
Challenges in this field

Dealing with scale
Analyzing a big system, times many versions
Research tools often live at the bleeding edge, are slow, and produce voluminous detail
Automation
Research tools are often buggy and require hand-holding
It is often hard to automate multiple analyses

19
Challenges in this field

Artifact linkage and analysis granularity
Repositories (CVS, the Unix file system) often store only source code, with no special understanding of, say, where a particular method resides
(How) should we make them smarter?
e.g., ctags and CCFinder
Your thoughts?

20
Four interesting ways in which history can
teach us about software

Michael W. Godfrey
Xinyi Dong
Cory Kapser
Lijie Zou
Software Architecture Group (SWAG)
University of Waterloo

21
(No Transcript)
22
Tools that SWAG has written

Fact extractors
LDX, for object files compiled for Linux [Wu]
Recommended for C/C++ systems that can be built on Linux
CPPX, for gcc-compliant C/C++ systems [Malton / Dean]
Some features of C++ not yet supported
Much slower and less robust than LDX
These fact extractors use the TA language for output.
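This sketch shows roughly what emitting extracted facts as TA might involve. It assumes the fact-tuple form of TA: a `FACT TUPLE :` section header, `$INSTANCE id type` lines that give entities their types, and one `relation source target` line per binary relation fact. Schema and attribute sections are omitted. The entity types `cFile` and `Function` and the relation names are hypothetical, not LDX's or CPPX's actual schema. Identifiers are assumed to contain no whitespace.

```python
from typing import Iterable, Tuple


def _check(token: str) -> str:
    # Quoting rules are not handled here, so reject ids containing whitespace.
    if any(ch.isspace() for ch in token):
        raise ValueError(f"identifier needs quoting: {token!r}")
    return token


def to_ta(instances: Iterable[Tuple[str, str]],
          relations: Iterable[Tuple[str, str, str]]) -> str:
    """Render entities and binary relations as a TA fact-tuple section.

    instances: (entity id, entity type) pairs, written as $INSTANCE tuples.
    relations: (relation, source id, target id) triples.
    """
    lines = ["FACT TUPLE :"]
    for entity, etype in instances:
        lines.append(f"$INSTANCE {_check(entity)} {_check(etype)}")
    for rel, src, dst in relations:
        lines.append(f"{_check(rel)} {_check(src)} {_check(dst)}")
    return "\n".join(lines) + "\n"


print(to_ta(
    [("main.c", "cFile"), ("main", "Function"), ("helper", "Function")],
    [("contain", "main.c", "main"), ("call", "main", "helper")],
), end="")
```

Because the output is plain text, the same facts can be passed on to other tools that read TA.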

23
Tools that SWAG has written

Fact manipulators
JGrok/QL [Wu]
A reimplementation of grok [Holt] in Java
Basically, JGrok reads in data stored as sets and relations, and allows set/relation operations to be performed on them.
JGrok has no special knowledge of software systems!
Can input / output data in the TA language
Visualization engine
LSedit [Farmaner / Davis / Synytskyy]
A Java application that performs layout and visualization of software system facts encoded in TA.
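To give a flavour of the set/relation operations a tool like JGrok performs, here is the same algebra in plain Python. This is not JGrok's syntax or operator set. Relations are modelled as sets of pairs, and `inverse`, `compose` and `closure` are illustrative names. As in JGrok, the operations know nothing about software. Meaning comes only from which relations (`call`, `contain`) are fed in, and both of those are made up here.

```python
from typing import Dict, Set, Tuple

Rel = Set[Tuple[str, str]]


def inverse(r: Rel) -> Rel:
    return {(b, a) for a, b in r}


def compose(r: Rel, s: Rel) -> Rel:
    """Pairs (a, c) such that (a, b) is in r and (b, c) is in s."""
    succ: Dict[str, Set[str]] = {}
    for b, c in s:
        succ.setdefault(b, set()).add(c)
    return {(a, c) for a, b in r for c in succ.get(b, ())}


def closure(r: Rel) -> Rel:
    """Transitive closure: keep composing with r until no new pairs appear."""
    result = set(r)
    while True:
        new = compose(result, r) - result
        if not new:
            return result
        result |= new


call = {("main", "parse"), ("parse", "lex"), ("lex", "read")}
contain = {("main.c", "main"), ("parser.c", "parse"), ("parser.c", "lex"), ("io.c", "read")}

reach = closure(call)  # who can eventually call whom
file_calls = compose(compose(contain, call), inverse(contain))  # calls lifted to files
print(sorted(reach - call))  # [('main', 'lex'), ('main', 'read'), ('parse', 'read')]
print(sorted(file_calls))    # [('main.c', 'parser.c'), ('parser.c', 'io.c'), ('parser.c', 'parser.c')]
```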

24
More on SWAG tools