Clustering Software Artifacts Based on Frequent Common Changes

Slides: 21
Provided by: Har53
Learn more at: https://www.sosy-lab.org

1
Clustering Software Artifacts Based on Frequent
Common Changes
  • Presented by Ashgan Fararooy
  • Prepared by Haroon Malik
  • (Modified)

2
Abstract
  • Clusters of artifacts that are frequently
    changed together are subsystem candidates.
  • Two-step method for identifying such clusters:
  • Extract the co-change graph from the version
    control repository.
  • Compute a layout of the co-change graph. The
    layout reveals clusters of frequently co-changed
    artifacts.

3
Overview
  • Software artifact
  • Is an entity that belongs to a software system
  • E.g. a package, a file, a line of code, or even
    a piece of documentation
  • Version
  • State of a software artifact at a particular
    point in time.
  • A CVS system stores versions of artifacts in a
    central repository.
  • Users of such systems modify local copies of the
    software artifacts and check in their changes to
    the central repository.

4
Proposed Model
  • Software clustering groups software artifacts
    into subsystems that are as independent as
    possible with respect to comprehension,
    change, reuse, etc.
  • The co-change graph is proposed as a model for
    clustering software systems.

5
Proposed Model
  • Co-change Graph
  • Abstraction of version control repositories.
  • Vertices of this graph are
  • Software artifacts (files or functions)
  • Change transactions (commits in CVS terms).
  • Edges connect each change transaction with its
    participating artifacts.

6
Proposed Model
  • Presentation
  • The result of clustering is not a partition of
    the graph vertices, but a layout of the graph
    vertices.
  • A layout assigns each graph vertex a position in
    two- or three-dimensional space.
  • Heavily co-changed artifacts are placed closer
    together.
  • Rarely co-changed artifacts are placed at larger
    distances.
  • The layout is easy to comprehend and provides
    additional information:
  • How clearly the clusters are separated.
  • Whether an artifact is at the center of a cluster
    or rather between two clusters.

7
Proposed Model
  • Contents
  • Artifacts are not just arranged in some nice way;
    their positions have a well-defined interpretation
    with respect to their common changes.
  • Two artifacts are placed closer together the more
    their common changes exceed what would be expected
    at random.

8
Co-Change Graph.
  • The graph refers to the common changes of
    artifacts in version repositories.
  • It can be easily extracted from version
    repositories.
  • It ensures that the clustering results have a
    clear interpretation in terms of the repository.
  • Biases through arbitrary choices (e.g. of weight
    functions or of values of free parameters) are
    minimized.

9
Co-Change Graph
  • Change Transaction
  • It is a coherent sequence of check-ins of
    several software artifacts.
  • Software artifacts that participate in the same
    change transaction are co-changed (commonly
    changed).
  • The co-change graph of a given version
    repository is an undirected graph (V, E).
  • The set of vertices V of the co-change graph
    contains all software artifacts and all
    change transactions of the version repository.

10
Co-Change Graph
  • The set of edges E contains the undirected edge
    {c, a} if the artifact a was changed by
    transaction c.
  • Bipartite
  • The graph contains no edges that connect two
    change transactions or two software artifacts.
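
The definition above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the presentation: the function names, the tagged-tuple representation of vertices, and the sample file names are all assumptions made for the example.

```python
def build_cochange_graph(transactions):
    """Build the bipartite co-change graph (V, E).

    `transactions` maps a transaction id to the artifacts it changed
    (a hypothetical input format chosen for this sketch).
    """
    vertices = set()
    edges = set()
    for c, artifacts in transactions.items():
        c_vertex = ("transaction", c)
        vertices.add(c_vertex)
        for a in artifacts:
            a_vertex = ("artifact", a)
            vertices.add(a_vertex)
            # Undirected edge {c, a}: artifact a was changed by transaction c.
            edges.add(frozenset({c_vertex, a_vertex}))
    return vertices, edges


def is_bipartite_by_kind(edges):
    """True if every edge joins one transaction and one artifact."""
    return all({kind for kind, _ in edge} == {"transaction", "artifact"}
               for edge in edges)


# Made-up example repository with three change transactions.
transactions = {
    "c1": ["parser.c", "lexer.c"],
    "c2": ["parser.c", "README"],
    "c3": ["lexer.c"],
}
vertices, edges = build_cochange_graph(transactions)
```

Tagging each vertex with its kind keeps transaction ids and artifact names from colliding and makes the bipartite property easy to check.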

11
Co-Change Graph
  • For a vertex v of a co-change graph, the number
    of its adjacent vertices is called the degree of
    v and is denoted by deg(v).
  • For transaction vertices, the degree gives the
    number of artifacts that participate in the
    transaction.
  • For artifacts, the degree gives the number of
    their changes.
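
As a small illustration of the degree definition, the following Python sketch counts adjacent vertices from a list of (transaction, artifact) edges. The edge list and names are made up for the example, and the sketch assumes that no edge is listed twice and that transaction ids and artifact names do not collide.

```python
from collections import Counter


def degrees(edges):
    """Return deg(v) for every vertex that appears in `edges`."""
    deg = Counter()
    for transaction, artifact in edges:
        deg[transaction] += 1  # one more artifact in this transaction
        deg[artifact] += 1     # one more change of this artifact
    return deg


# Hypothetical co-change graph: three transactions, three artifacts.
edges = [
    ("c1", "parser.c"),
    ("c1", "lexer.c"),
    ("c2", "parser.c"),
    ("c2", "README"),
    ("c3", "lexer.c"),
]
deg = degrees(edges)
```

Here `deg["c1"]` is the number of artifacts that took part in transaction c1, and `deg["parser.c"]` is the number of times parser.c was changed.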

12
Weighted Co-Change Graph
  • A weight function w assigns a real number to
    each edge in the set of edges E.
  • The real number assigned to an edge expresses
    the importance of the corresponding change.
  • Each edge is given the same weight.

13
Condensed Co-Change Graph
  • It is a weighted, undirected graph (V, E, w) for a
    given repository.
  • The set of vertices V contains all software
    artifacts in the repository.
  • The set of edges E contains the edge {a, a'} if
    the artifacts a and a' were commonly changed by a
    transaction.
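
A minimal Python sketch of condensing transactions into an artifact-only graph follows. The slides do not say how the weight w is defined, so the sketch assumes, purely for illustration, that the weight of an edge is the number of transactions that changed both artifacts. The function name and sample data are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def condense(transactions):
    """Map each artifact pair (a, a') to an assumed co-change weight.

    Assumption: weight = number of transactions that changed both a and a'.
    """
    weights = Counter()
    for artifacts in transactions.values():
        # Sort so that each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(artifacts)), 2):
            weights[(a, b)] += 1
    return weights


# Made-up example repository with four change transactions.
transactions = {
    "c1": ["parser.c", "lexer.c"],
    "c2": ["parser.c", "README"],
    "c3": ["lexer.c"],
    "c4": ["lexer.c", "parser.c"],
}
weights = condense(transactions)
```

Transaction vertices disappear in the condensed graph; a transaction that touches a single artifact, such as c3, contributes no edge.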

14
Edge-Repulsion LinLog Energy Model
  • This model specifies what a good graph layout is.
  • The basic idea is that in the co-change graph,
    edges cause both repulsion and attraction.
  • Every edge causes the same amount of repulsion
    and attraction.
  • The model helps create readable layouts.
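
The slides do not give the energy formula. As a hedged sketch, the commonly cited form of the edge-repulsion LinLog energy sums the distance over all edges (attraction) and subtracts deg(u)·deg(v)·ln(distance) over all vertex pairs (repulsion); a good layout is one with low energy. The Python function below evaluates that assumed formula for a given layout. It does not minimize it, and the names are illustrative.

```python
import math
from itertools import combinations


def edge_repulsion_linlog_energy(positions, edges):
    """Energy of a layout under the assumed edge-repulsion LinLog formula.

    positions: dict mapping vertex -> coordinate tuple (all distinct).
    edges: iterable of (u, v) pairs, each undirected edge listed once.
    """
    edges = list(edges)
    deg = {v: 0 for v in positions}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Attraction: linear in the length of each edge.
    attraction = sum(math.dist(positions[u], positions[v]) for u, v in edges)
    # Repulsion: logarithmic in distance, weighted by the degrees of the
    # two vertices, so every edge contributes equally to repulsion.
    repulsion = sum(
        deg[u] * deg[v] * math.log(math.dist(positions[u], positions[v]))
        for u, v in combinations(positions, 2)
    )
    return attraction - repulsion


# Two vertices joined by one edge, placed at distance 2.
energy = edge_repulsion_linlog_energy({"c": (0.0, 0.0), "a": (2.0, 0.0)},
                                      [("c", "a")])
```

For a single edge at distance d the energy is d − ln d, which is smallest at d = 1; this is how attraction and repulsion balance in the simplest case.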

15
Evaluation
  • The software systems were chosen based on
  • Size, number of developers, project duration, and
    artifacts in different programming languages
  • Familiarity, because the evaluation requires
    knowledge of authoritative decompositions

16
Evaluation
  • The co-change graphs were extracted at file level.
  • The tool cvs2cl was used to recover change
    transactions from the CVS repository.
  • CrocoPat, a relational calculator, generated the
    co-change graph from the transactions.
  • Duration, total changes, and indeed all numbers
    were obtained with the tool StatCVS.
  • The layout was computed automatically using the
    edge-repulsion LinLog energy model.
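
CVS records check-ins per file, which is why change transactions have to be recovered. The Python sketch below shows one common heuristic for doing so: check-ins by the same author with the same log message are grouped as long as consecutive check-ins are at most `max_gap` seconds apart. This is an assumption for illustration and not a description of how cvs2cl works; the `CheckIn` record, the function name, and the 180-second default are hypothetical.

```python
from collections import namedtuple

# Hypothetical per-file check-in record; timestamp is in seconds.
CheckIn = namedtuple("CheckIn", "file author message timestamp")


def recover_transactions(checkins, max_gap=180):
    """Group check-ins into transactions (lists of file names)."""
    transactions = []
    latest = {}  # (author, message) -> most recent transaction for that key
    for ci in sorted(checkins, key=lambda c: c.timestamp):
        key = (ci.author, ci.message)
        current = latest.get(key)
        if current is not None and ci.timestamp - current["last"] <= max_gap:
            # Same author and message, close in time: same transaction.
            current["files"].append(ci.file)
            current["last"] = ci.timestamp
        else:
            current = {"last": ci.timestamp, "files": [ci.file]}
            latest[key] = current
            transactions.append(current["files"])
    return transactions


checkins = [
    CheckIn("a.c", "ann", "fix bug", 0),
    CheckIn("b.c", "ann", "fix bug", 30),
    CheckIn("c.c", "bob", "add docs", 40),
    CheckIn("a.c", "ann", "fix bug", 1000),
]
transactions = recover_transactions(checkins)
```

The last check-in has the same author and message as the first two but comes much later, so the sketch treats it as a separate transaction.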

17
Artifacts in the CrocoPat repository

18
Artifacts in the Rabbit repository

19
Artifacts in the Blast repository

20
Conclusions
  • Introduced a new method for clustering software
    artifacts.
  • Defined the co-change graph as the underlying
    formal model.
  • Evaluated our method on three example software
    systems with different types of documents and
    source code in several programming languages.