Tracking Structural Evolution using Origin Analysis

1
Tracking Structural Evolution using Origin
Analysis
  • Michael Godfrey and Qiang Tu
  • Software Architecture Group (SWAG)
  • University of Waterloo

2
Overview
  • Open questions in software evolution research
  • Motivation
  • Origin analysis and Beagle
  • Efficiency considerations
  • An example
  • Open questions in origin analysis

3
Some open questions
  • Philosophical
  • Does software evolve in the same way as frogs and
    social structures?
  • The Nature of Economies, by Jane Jacobs
  • What are the recurring patterns and compelling
    metaphors of software evolution?
  • Methodological
  • How to measure size?
  • How to correlate size and quality?
  • How to measure change?
  • How to model architectural change?
  • What is the predictive power of such models?
  • Do the other phenomena dominate?

4
Some open questions
  • Practical
  • What information do developers need to know about
    how a software system has evolved?
  • What kinds of tools would be useful
  • to the front-line developer?
  • to the manager?
  • How best to deal with
  • Large data sets (large_system × many_versions)
  • Visualization and navigation

5
Motivation
  • Want to build tools to aid developers in
    understanding how software evolves.
  • Change can be mostly additive or much more
    invasive
  • Building an accurate model of how a system has
    evolved is hard in the presence of refactoring,
    redesign, structural and architectural change.
  • Usual assumption
  • A change in name/location of a software entity
    means the old one died and a new one was born
  • which means that structural discontinuities
    break old models of the system, and cause useful
    knowledge to be lost.

6
Motivation
  • This also begs the question of software artifact
    ontology
  • What are the software entities/artifacts of
    interest in evolutionary studies?
  • All CVS'd things?
  • Hard machine processable things, like source
    code files?
  • User docs, requirements docs, ...?
  • Atomic vs. composite things?
  • (subsystems vs. files vs. classes vs. methods)
  • What does it mean for an artifact/entity to be a
    different version of an older artifact/entity?
  • Same name? file? location? CVS control?
  • Because I say so?

7
Origin analysis
  • Suppose that
  • f is the name of a software entity (e.g.,
    function, type, global variable) of version Vnew
    of a software system.
  • There is no entity of the same name/kind in the
    previous version Vold
  • We define origin analysis as the process of
    deciding
  • if f was newly introduced in Vnew, or
  • if it should be more accurately viewed as a
    changed/moved/renamed version of a differently
    named entity of Vold

8
The Beagle tool [IWPC-02]
  • Design goals
  • Support browsing of evolutionary histories of
    software systems
  • Visual navigation and querying
  • Architectural-level modelling
  • Compare system snapshots
  • Support identification and detection of change
    patterns

9
The Beagle tool [IWPC-02]
  • At system check-in
  • Populate database with facts and metrics info
    from various tools.
  • grok scripts lift facts to the
    file/subsystem/architectural level.
  • At runtime
  • SWAGkit (PBS) engine for visualization/navigation.
  • Java-based infrastructure using DB/2, VA-Java,
    IBM-Websphere.

10
Origin analysis: Two techniques
  • Entity analysis (i.e., metrics-based
    Bertillonage)
  • For each added entity f
  • Calculate combined Euclidean distance from each
    deleted entity for five metrics [Kostas].
  • Select top k matches; compare entity names.
  • Relationship analysis (e.g., calls,
    is-called-by, refs)
  • For each added entity f
  • Find Rf, set of all entities that call f that are
    present in both versions.
  • For each g ∈ Rf, calculate Qg, set of all
    deleted entities that g calls in the old
    version.
  • Look at intersection of the Qg's; these are good
    candidates.
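
The entity analysis step above can be illustrated with a minimal sketch. It assumes each entity is summarized by a tuple of five numeric metrics; the slides do not say which metrics are used or how they are scaled, so the vectors, the entity names, and the helper names `entity_distance` and `top_k_matches` are all hypothetical.

```python
import math


def entity_distance(a, b):
    """Euclidean distance between two equal-length metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_matches(added_metrics, deleted, k=3):
    """Return the names of the k deleted entities closest to the added entity."""
    ranked = sorted(
        deleted.items(),
        key=lambda item: entity_distance(added_metrics, item[1]),
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical five-metric vectors for entities deleted from Vold.
deleted = {
    "parse_decl": (120, 14, 6, 3, 9),
    "emit_insn": (40, 5, 2, 1, 3),
    "fold_const": (118, 15, 6, 3, 8),
}

# Hypothetical metrics for an entity that first appears in Vnew.
added = (119, 14, 6, 3, 9)

candidates = top_k_matches(added, deleted, k=2)
print(candidates)
```

The sketch uses raw metric values; in practice the metrics would probably need normalizing so that one large-valued metric does not dominate the distance. The final name comparison is left to the analyst.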

11
Efficiency considerations
  • When comparing Vnew to Vold, need to find the
    entities that seem to have been added and
    deleted.
  • These sets are fast to determine.
  • Most subsequent calculations involve only these
    small subsets of the entire entity space (plus
    the other entities they have relationships
    with).
  • Computationally expensive approaches for clone
    detection (e.g., graph matching) were not
    considered.
  • Can't pre-compute easily.
  • Precise matching not worth the effort, as it
    doesn't seem to help much for this task.
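
A small sketch of why the added and deleted sets are fast to determine: if each version's entities are held as a set of (kind, name) pairs, both sets fall out of two set differences. The representation, the entity names, and the helper `added_and_deleted` are assumptions for illustration, not Beagle's actual data model.

```python
def added_and_deleted(old_entities, new_entities):
    """Return (added, deleted) entity sets between two versions."""
    old, new = set(old_entities), set(new_entities)
    return new - old, old - new


# Hypothetical entity lists, keyed by (kind, name) as on the origin analysis slide.
v_old = {
    ("function", "parse_decl"),
    ("function", "emit_insn"),
    ("type", "tree_node"),
}
v_new = {
    ("function", "cp_parse_decl"),
    ("function", "emit_insn"),
    ("type", "tree_node"),
}

added, deleted = added_and_deleted(v_old, v_new)
print(added)    # entities that appear only in Vnew
print(deleted)  # entities that appear only in Vold
```

Only these two sets, plus the entities they have relationships with, need to be passed on to the more expensive matching steps.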

12
Efficiency considerations
  • Entity analysis
  • Entity info is generated by fact extractor and
    metrics tool.
  • Info is generated only once per version, when
    system is checked into repository.
  • Performing entity analysis is a matter of a
    simple numerical calculation on a small set of
    likely candidates.
  • Relationship analysis
  • Relationship info (who-calls-whom,
    who-inherits-from-whom, etc.) is generated by
    fact extractor.
  • Info is generated only once per version, when
    system is checked into repository.
  • Computation and comparison of relational images
    is fairly fast.
  • Special-purpose tool (grok) and relatively small
    amount of data.
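
The relationship analysis described on the techniques slide can be sketched with plain set operations. This is an illustration only, not the grok implementation: it assumes call facts are stored as a dictionary from each caller to the set of entities it calls, with an entry for every entity in that version, and all entity names are hypothetical.

```python
def relationship_candidates(f, calls_old, calls_new, deleted):
    """Suggest deleted entities that may be the origin of added entity f."""
    # Rf: callers of f in the new version that also exist in the old version.
    r_f = {
        g for g, callees in calls_new.items()
        if f in callees and g in calls_old
    }
    # Qg: for each such caller, the deleted entities it called in the old version.
    q_sets = [calls_old[g] & deleted for g in r_f]
    if not q_sets:
        return set()
    # Entities common to every Qg are the strongest candidates.
    return set.intersection(*q_sets)


# Hypothetical call facts for the two versions.
calls_old = {
    "main": {"parse_decl", "emit_insn"},
    "compile_file": {"parse_decl", "init_lex"},
    "parse_decl": set(),
    "init_lex": set(),
    "emit_insn": set(),
}
calls_new = {
    "main": {"cp_parse_decl", "emit_insn"},
    "compile_file": {"cp_parse_decl", "cp_init_lex"},
    "cp_parse_decl": set(),
    "cp_init_lex": set(),
    "emit_insn": set(),
}
deleted = {"parse_decl", "init_lex"}

origin = relationship_candidates("cp_parse_decl", calls_old, calls_new, deleted)
print(origin)
```

A strict intersection can come back empty when even one caller has changed its behaviour, so a looser variant might rank candidates by how many of the Qg sets they appear in.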

13
Case study: gcc/g++/egcs
  • Have extracted full info for 29 versions of
    gcc/g++/egcs
  • Want to examine major breaks in development to
    see how well origin analysis works.
  • EGCS v1.0 was forked from the GCC v2.7.2.3
    codebase
  • EGCS project goals
  • C++ compiler more ANSI compliant,
  • new FORTRAN front-end,
  • new optimizations and code-generation algorithms,
  • and EGCS introduced a new directory structure
    and a new file naming scheme, in addition to all
    of the other redesign and restructuring.
  • Naïve analysis indicated "everything old is new
    again".

14
Case study: gcc/g++/egcs
  • Example
  • The EGCS 1.0 Parser subsystem contains 15
    (non-trivial) implementation files, comprising
    848 functions.
  • Using origin analysis and common sense, Qiang
    decided that about half of the new functions
    weren't new.
  • That's still a massive amount of change for a new
    release of a compiler!

15
Origin analysis: Open issues
  • Origin analysis is a semi-automatic technique; it
    requires human intervention to make intelligent
    decisions.
  • In general, there's no ultimate arbiter of
    correctness/appropriateness.
  • Techniques are fast and approximate.
  • Bertillonage, not DNA comparison
  • What are the most effective ways of performing
    entity and relationship analysis?
  • Which metrics? Which relationships? How best to
    combine them all?
  • Requires case studies, validation.
  • What is the best way to consider composite
    software entities?
  • (e.g., files, classes, subsystems)
  • Can evaluate as atoms, or
  • Can simply use hints from contained entities.