1
ACDC: An Algorithm for Comprehension-Driven Clustering
Vassilios Tzerpos, R. C. Holt
Presenter: Romil Jain
2
Presentation Road Map
  • Introduction
  • Clustering for Comprehension
  • Subsystem Patterns
  • The ACDC Algorithm
  • Orphan Adoption
  • Algorithm Validation
  • Conclusions

3
Introduction
  • Two main approaches to clustering:
  • Knowledge-based: Use domain knowledge to
    understand the functionality of the source code and
    cluster files accordingly.
  • Structure-based: Extract syntactic interactions
    (e.g., function calls, variable references) and
    cluster the software based on module dependency
    graphs (MDGs).
  • Most researchers agree that structure-based
    criteria and naming conventions are promising.
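A minimal sketch of how extracted interactions could be collapsed into a module dependency graph. The fact tuples, file names, and `build_mdg` are hypothetical illustrations, not part of ACDC or of any particular fact extractor.

```python
from collections import defaultdict

# Hypothetical extracted facts: (source_module, kind_of_interaction, target_module).
FACTS = [
    ("main.c", "calls", "http.c"),
    ("main.c", "calls", "https.c"),
    ("http.c", "references", "util.c"),
    ("https.c", "calls", "http.c"),
    ("https.c", "references", "util.c"),
]

def build_mdg(facts):
    """Collapse call/reference facts into a module dependency graph.

    Returns a dict mapping each module to the set of modules it depends on.
    Several interactions between the same pair become a single edge.
    """
    mdg = defaultdict(set)
    for src, _kind, dst in facts:
        if src != dst:               # ignore self-dependencies
            mdg[src].add(dst)
        mdg.setdefault(dst, set())   # every module appears as a node
    return dict(mdg)

mdg = build_mdg(FACTS)
```

The kind of interaction is discarded here; a real tool might keep it as an edge label or weight.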

4
Introduction (contd)
  • The primary goal is software comprehension.
  • Various criteria: low coupling, high cohesion,
    interface minimization, etc.
  • Research focuses mostly on improving the performance
    and accuracy of algorithms based on these
    criteria.
  • Instead, the focus should be on creating
    comprehensible clusters.
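To make "low coupling, high cohesion" concrete, the sketch below counts dependency edges that stay inside a cluster versus edges that cross clusters. These raw counts and the helper `intra_inter_edges` are an illustrative assumption, not a specific published metric.

```python
def intra_inter_edges(mdg, cluster_of):
    """Count MDG edges inside clusters (cohesion) vs. between clusters (coupling).

    mdg:        dict mapping each module to the set of modules it depends on
    cluster_of: dict mapping each module to its cluster name
    """
    intra = inter = 0
    for src, targets in mdg.items():
        for dst in targets:
            if cluster_of[src] == cluster_of[dst]:
                intra += 1   # edge stays within one cluster
            else:
                inter += 1   # edge couples two clusters
    return intra, inter

# Hypothetical example: two clusters, X and Y.
MDG = {"a.c": {"b.c"}, "b.c": {"c.c"}, "c.c": set(), "d.c": {"c.c"}}
CLUSTERS = {"a.c": "X", "b.c": "X", "c.c": "Y", "d.c": "Y"}
```

Under these counts, a decomposition with more intra-cluster and fewer inter-cluster edges scores as more cohesive and less coupled.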

5
Clustering for Comprehension
[Figure: diagram with elements labelled http, main, and https]
6
Clustering for Comprehension
  • A clustering algorithm should have the following
    features for easy comprehension:
  • Effective cluster naming.
  • Bounded cluster cardinality: up to about 20 files
    per cluster.
  • Pattern-driven approach: create a decomposition
    based on patterns in the software system
    that humans can easily identify.

7
Subsystem Patterns
  • Source File: All procedures/variables in the same
    file can be grouped together.
  • Directory Structure: The directory structure can
    give clues for clustering.
  • Body-header: Corresponding .h and .c files can be
    put together.
  • Leaf Collection: Modules serving similar purposes
    (e.g., a set of drivers) can be put together.
  • Support Library: Modules accessed by the majority of
    other modules (e.g., utility/math libraries) can be
    grouped together.
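A minimal sketch of the Body-header pattern, assuming a header and its body live in the same directory and share a file stem (e.g., net/http.h and net/http.c). `body_header_clusters` is a hypothetical helper; unmatched files are simply left for later steps.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def body_header_clusters(files):
    """Group each .c file with the .h file that has the same path stem."""
    groups = defaultdict(list)
    for f in files:
        p = PurePosixPath(f)
        if p.suffix in (".c", ".h"):
            # "net/http.c" and "net/http.h" both map to the key "net/http"
            groups[str(p.with_suffix(""))].append(f)
    # Keep only real body/header pairs; lone files are handled elsewhere.
    return {stem: sorted(members)
            for stem, members in groups.items() if len(members) > 1}

FILES = ["net/http.c", "net/http.h", "net/util.c", "main.c", "README"]
```

Matching by stem is the simplest possible rule; headers in a separate include directory would need a different matching rule.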

8
Subsystem Patterns (contd)
  • Central Dispatcher Pattern (dual of the Support
    Library Pattern): Resources with a large out-degree
    (e.g., main.c, engine.java).
  • Subgraph Dominator Pattern: A set of nodes
    N = {n0, n1, ...} such that any node outside N must
    go through n0 to reach any ni ∈ N.
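One way to compute the set dominated by a candidate node n0, assuming the MDG is a dict from each file to the files it depends on: take everything reachable from n0, then drop any node that the rest of the graph can still reach without passing through n0. This is a sketch of the pattern's definition only; the helper names are hypothetical and no ordering or cluster-size limit is modelled.

```python
from collections import deque

def reachable(graph, starts, blocked=frozenset()):
    """Breadth-first search from `starts`, never entering nodes in `blocked`."""
    seen = set()
    queue = deque(s for s in starts if s not in blocked)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in blocked:
                queue.append(nxt)
    return seen

def dominated_set(graph, n0):
    """Return N: n0 plus every node that outside nodes can reach only via n0."""
    all_nodes = set(graph) | {t for targets in graph.values() for t in targets}
    # Everything n0 can reach (such paths never need to revisit n0).
    below = reachable(graph, graph.get(n0, ()), blocked={n0})
    outside = all_nodes - below - {n0}
    # Nodes that the rest of the graph reaches while bypassing n0.
    leaked = reachable(graph, outside, blocked={n0})
    return {n0} | (below - leaked)

MDG = {
    "main.c": {"http.c", "util.c"},
    "http.c": {"parse.c", "util.c"},
    "parse.c": {"lex.c"},
    "lex.c": set(),
    "util.c": set(),
}
```

Here util.c is not dominated by http.c, because main.c reaches it directly.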

9
The ACDC Algorithm
  • Performs clustering in two stages:
  • Constructs a skeleton decomposition using a
    pattern-driven approach, giving appropriate names
    to clusters based on the pattern used.
  • Completes the decomposition using a technique called
    Orphan Adoption, which assigns the files that
    were not clustered in the first stage.

10
ACDC - Skeleton Construction
  • ACDC performs the following steps in order:
  • Source file clusters: Cluster resources that
    reside in the same file together, and consider only
    files in the following steps.
  • Body-header conglomeration: Consolidate .h and .c
    files into the same cluster.
  • Leaf collection and support library
    identification: Only identify, without clustering,
    the files exhibiting these patterns (threshold for
    support library nodes: in-degree 20).
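The first skeleton steps can be sketched as follows, assuming resource-level edges are (resource, resource) pairs and `file_of` maps each resource to its file. Both helpers are hypothetical; in particular, treating the in-degree threshold of 20 as inclusive (>=) is an assumption.

```python
from collections import defaultdict

def lift_to_files(resource_edges, file_of):
    """Turn procedure/variable dependencies into file-level dependencies."""
    graph = {}
    for src, dst in resource_edges:
        f_src, f_dst = file_of[src], file_of[dst]
        graph.setdefault(f_src, set())
        graph.setdefault(f_dst, set())
        if f_src != f_dst:          # same-file edges vanish at file level
            graph[f_src].add(f_dst)
    return graph

def support_libraries(graph, threshold=20):
    """Identify (but do not cluster) files whose in-degree reaches `threshold`."""
    in_degree = defaultdict(int)
    for targets in graph.values():
        for dst in targets:
            in_degree[dst] += 1
    return {node for node, deg in in_degree.items() if deg >= threshold}

# Hypothetical resources and the files that contain them.
EDGES = [("main", "log_msg"), ("parse", "log_msg"), ("parse", "buf")]
FILE_OF = {"main": "main.c", "parse": "parse.c", "log_msg": "log.c", "buf": "parse.c"}
```

A small threshold is used in the check below only so the toy graph produces a hit.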

11
ACDC - Skeleton Construction
  • Ordered and limited subgraph domination:
  • Disregard any files exhibiting the Central
    Dispatcher Pattern (out-degree threshold: 20),
    together with their edges.
  • Find subgraph dominator patterns and create a
    separate subsystem containing each dominator node
    and its dominated set.
  • Organize the resulting subsystems in a containment
    hierarchy (maximum cardinality 20).
  • Reconsider the previously disregarded files for
    subgraph domination with respect to these subsystems.
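Disregarding central dispatchers before the domination search might look like this. The helper name, the inclusive out-degree threshold, and the requirement that every file appears as a key of `graph` are assumptions of this sketch; the ordering and cardinality limit of the search itself are not modelled.

```python
def split_central_dispatchers(graph, threshold=20):
    """Remove files whose out-degree reaches `threshold`, plus all their edges.

    Returns (reduced_graph, dispatchers) so the dispatchers can be
    reconsidered once the subsystems have been built.
    """
    dispatchers = {n for n, targets in graph.items() if len(targets) >= threshold}
    reduced = {
        n: {t for t in targets if t not in dispatchers}   # drop incoming edges
        for n, targets in graph.items()
        if n not in dispatchers                           # drop the node itself
    }
    return reduced, dispatchers

GRAPH = {
    "main.c": {"a.c", "b.c", "c.c"},
    "a.c": {"c.c"},
    "b.c": set(),
    "c.c": {"main.c"},
}
```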

12
ACDC - Skeleton Construction
  • Creation of the support library subsystem: Put any
    remaining files identified previously as support
    libraries in this cluster.

13
ACDC - Orphan Adoption
  • Many files may still remain unclustered. These
    are called orphans.
  • Attempt to place each orphan in the subsystem that
    has the greatest connectivity to it.
  • If the leaf files have not been assigned yet, put
    them in …
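A minimal sketch of orphan adoption, assuming "connectivity" means the number of dependency edges (in either direction) between the orphan and the files already in a subsystem. `adopt_orphans` and the subsystem names are hypothetical; orphans are processed in sorted order, so earlier adoptions can influence later ones, and orphans with no connections stay unassigned.

```python
from collections import Counter

def adopt_orphans(graph, subsystem_of, orphans):
    """Assign each orphan to the subsystem it shares the most edges with."""
    assignment = dict(subsystem_of)
    for orphan in sorted(orphans):
        votes = Counter()
        # Outgoing edges: files the orphan depends on.
        for dst in graph.get(orphan, ()):
            if dst in assignment:
                votes[assignment[dst]] += 1
        # Incoming edges: files that depend on the orphan.
        for src, targets in graph.items():
            if orphan in targets and src in assignment:
                votes[assignment[src]] += 1
        if votes:
            assignment[orphan] = votes.most_common(1)[0][0]
    return assignment

GRAPH = {"a.c": {"x.c"}, "b.c": {"x.c"}, "x.c": {"c.c"}, "c.c": set(), "lone.c": set()}
SUBSYSTEMS = {"a.c": "net", "b.c": "net", "c.c": "io"}
```

In the example, x.c has two edges to net and one to io, so it joins net; lone.c has no edges and stays an orphan.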

14
Algorithm Validation
  • ACDC was run on the following systems:
  • TOBEY: Used by IBM in its optimizing compiler
    products. Contains 939 source files and 250,000
    lines of code.
  • Linux: 955 source files and 750,000 lines of code.
  • Results

TOBEY is professionally developed, while Linux
is open source.
15
Conclusion
  • Clustering should be comprehension-driven.
  • ACDC attempts to achieve this through:
  • Manageable cluster size (up to about 20 files).
  • Meaningful subsystem naming.
  • Pattern-driven clustering (easier to understand).
  • The good quality of its clustering makes it a good
    candidate for engineering projects.

16
Thank You!
  • References
  • V. Tzerpos and R. C. Holt. ACDC: An algorithm for
    comprehension-driven clustering. In Proceedings
    of the 7th Working Conference on Reverse
    Engineering, pages 258-267, Brisbane, Australia,
    November 2000.