Searching for the Boundaries of Concepts in Code - PowerPoint PPT Presentation

About This Presentation
Title:

Searching for the Boundaries of Concepts in Code

Description:

Concept Assignment to Raise the Abstraction Level of Slicing ... Move to fitter neighbouring solution. Try to escape on reaching a local optimum ...

Slides: 39
Provided by: kiarash

Transcript and Presenter's Notes

1
Searching for the Boundaries of Concepts in Code
  • Part of the CONTRACTS Project

2
Concept Assignment to Raise the Abstraction Level of Slicing
  • Professor Mark Harman, King's College London
  • Dr Nicolas Gold, King's College London
  • Kiarash Mahdavi, King's College London
  • Zheng Li, King's College London
  • Professor Rob Hierons, Brunel University
  • Professor Dave Binkley, Loyola College, USA
  • Professor Jim Cordy, Queen's University, Canada
  • DaimlerChrysler
  • Knowledge Software

3
Concept Assignment
  • First defined in 1993
  • The process of assigning descriptive terms to
    their implementation in source code (and possibly
    to each other), the terms being nominated by a
    maintainer and usually relating to computational
    intent.

4
Concept Assignment
  • Strength
    • A more abstract understanding of software.
    • Useful for comprehension.
  • Weakness
    • Usually not executable.
    • Cannot be directly applied to testing,
      refactoring and other useful software
      maintenance operations.

5
Slicing
  • The process of extracting or isolating the parts of a
    program that a specified program element depends on,
    or that depend on that element.
  • Strength
    • Executable.
    • Can be used for testing, maintenance and software
      reuse.
  • Weakness
    • Effective use of slicing requires good code
      understanding.
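A backward slice can be pictured as reachability over a program's dependence graph: start from the slicing criterion and follow dependence edges. The Python sketch below assumes the data and control dependences have already been computed (the hard part for real COBOL, not shown); the toy graph and the names `backward_slice` and `forward_slice` are hypothetical.

```python
from collections import deque


def backward_slice(depends_on: dict[int, set[int]], criterion: int) -> set[int]:
    """Statements the criterion transitively depends on (criterion included)."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        stmt = queue.popleft()
        for dep in depends_on.get(stmt, set()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def forward_slice(depends_on: dict[int, set[int]], criterion: int) -> set[int]:
    """Statements that transitively depend on the criterion (criterion included)."""
    # Invert the dependence edges, then reuse the backward traversal.
    dependents: dict[int, set[int]] = {}
    for stmt, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(stmt)
    return backward_slice(dependents, criterion)


# Hypothetical toy program:
# 1: read x   2: read y   3: z = x + 1   4: w = y * 2   5: print z
program = {1: set(), 2: set(), 3: {1}, 4: {2}, 5: {3}}
print(sorted(backward_slice(program, 5)))  # [1, 3, 5]
print(sorted(forward_slice(program, 2)))   # [2, 4]
```

The slice of `print z` keeps only statements 1, 3 and 5, which is what makes a slice executable yet still dependent on the user choosing a meaningful criterion.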

6
Concept Assignment and Slicing
  • Concept Assignment provides good code
    understanding, but may not be executable.
  • Slicing is executable, but effective slicing
    requires good code understanding.

7
Concept Assignment and Slicing Potential
  • Potential to create more expressive slices
  • Potential for new types of analysis, e.g.
    concept-level impact analysis
  • Could also facilitate
    • Reuse/reengineering
    • Comprehension/reverse engineering
    • Domain model improvement

8
Now back to the presentation
  • Our current concept assignment algorithm (HB-CA)
    provides good-quality assignments but does not
    consider overlapping concepts.
  • But concepts do overlap.

    MOVE EXAMPLE TO PRINT-LL.
    MOVE POLICY-NUM TO OUT-PNUM.
    MOVE 13 TO PRINT-CC.
    MOVE SCHEME-REF TO OUT-SREF.
    CALL PRINT USING P-PRINTLINE.
    CALL WRITE USING OUT-REC.

9
Hypothesis Based Concept Assignment
  • HB-CA requires
    • Source code
    • A library, provided by the maintainer, containing
      indicators and concept relationships

10
Library
  • Provided by the maintainer
  • Contains indicators and concept relationships
  • Indicators are used to scan the software to
    locate the possible presence of concepts
    (hypotheses)
  • Concept relationships describe how related
    concepts can be combined to identify complex
    concepts
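To make the role of indicators concrete, the sketch below scans the COBOL fragment from slide 8 against a small indicator library and emits hypotheses in source order. The library contents, the action/object split, the `Hypothesis` record and `generate_hypotheses` are illustrative assumptions, not the HB-CA library format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    concept: str  # concept suggested by the matching indicator
    kind: str     # "action" or "object" (assumed classification)
    line: int     # 1-based source line of the match
    token: str    # identifier fragment that matched


# Hypothetical indicator library: concept -> (kind, indicator words).
INDICATORS = {
    "Write": ("action", {"WRITE"}),
    "Output": ("action", {"PRINT", "DISPLAY"}),
    "File": ("object", {"REC", "FILE"}),
    "Policy": ("object", {"POLICY"}),
}


def generate_hypotheses(source_lines):
    """Scan each line's tokens against the indicators.

    Hypotheses are appended in source order, so the result is an
    ordered hypothesis list.
    """
    hypotheses = []
    for line_no, text in enumerate(source_lines, start=1):
        # Split COBOL identifiers such as OUT-REC into OUT and REC.
        for token in re.findall(r"[A-Z0-9]+", text.upper()):
            for concept, (kind, words) in INDICATORS.items():
                if token in words:
                    hypotheses.append(Hypothesis(concept, kind, line_no, token))
    return hypotheses


cobol_fragment = [
    "MOVE EXAMPLE TO PRINT-LL.",
    "MOVE POLICY-NUM TO OUT-PNUM.",
    "MOVE 13 TO PRINT-CC.",
    "MOVE SCHEME-REF TO OUT-SREF.",
    "CALL PRINT USING P-PRINTLINE.",
    "CALL WRITE USING OUT-REC.",
]
print([h.concept for h in generate_hypotheses(cobol_fragment)])
# ['Output', 'Policy', 'Output', 'Output', 'Write', 'File']
```

Even this toy library shows the overlap problem: the print-related and write-related hypotheses interleave in the same few lines.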

11
Concept Relationship Example
[Diagram: concept relationships among the action concept Write, the object concepts Record, Database and File, and specialisations such as Transaction and PaymentFile]

12
HB-CA Process
  • Hypothesis generation: using indicators to locate
    concept hypotheses and create an ordered list of
    them (the hypothesis list).
  • Segmentation: using a Self-Organising Map (SOM) to
    cluster the hypothesis list and identify areas of
    conceptual focus.
  • Concept binding: scoring the clusters and
    allocating (binding) a simple or complex concept
    to each.

13
Hypothesis Generation
[Diagram: the Indicator Library is applied to the Source code to produce the Hypothesis List (e.g. ... Output File Output ...)]

14
Segmentation
  • Currently, action hypotheses are used to train the
    SOM.
  • The SOM is used to identify non-overlapping
    (isolated) clusters within hard segments.

15
Concept Binding
  • Each cluster created by the SOM is examined.
  • All possible simple or composite permutations of
    concepts are scored.
  • The highest-scoring concept is selected and bound
    to the partition.
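One way to picture the binding step: count the hypotheses in a cluster, score each simple concept and each composite from a relationship library, and keep the best. The scoring rule, the tie-break and the `COMPOSITES` entries below are assumptions for illustration; the presentation does not give HB-CA's actual scoring.

```python
from collections import Counter

# Hypothetical concept relationship library: composite -> component concepts.
COMPOSITES = {
    "WriteRecord": ("Write", "Record"),
    "PrintReport": ("Output", "Report"),
}


def score(parts, counts):
    """Assumed rule: total supporting hypotheses, or 0 if any component is absent."""
    if any(counts[p] == 0 for p in parts):
        return 0
    return sum(counts[p] for p in parts)


def bind_concept(cluster):
    """Score every simple and composite candidate for one cluster; return (winner, score)."""
    counts = Counter(cluster)
    candidates = {c: (c,) for c in counts}  # simple concepts seen in the cluster
    candidates.update(COMPOSITES)           # composite concepts from the library
    # Highest score wins; on a tie, prefer the candidate with more components.
    winner = max(
        candidates,
        key=lambda c: (score(candidates[c], counts), len(candidates[c])),
    )
    return winner, score(candidates[winner], counts)


print(bind_concept(["Write", "Record", "Record", "Output"]))  # ('WriteRecord', 3)
```

Favouring composites on ties is one design choice; it pushes binding towards the more specific, more informative concept when the evidence is equal.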

16
Concept Binding Example
[Diagram: Source code → Hypothesis List (... Output File Output ...) → Segmentation Algorithm (SOM) → cluster (Output File) → Concept Relationship Library → bound concept "Write Customer Record"]

17
Overlapping Concept Problem Definition
  • To find a set of concepts that
    • Create the strongest concept binding.
    • Are not restricted by boundaries.
    • Cover as much of the segment as possible.

18
Algorithms
  • Genetic Algorithm
    • Evolve a population of solutions (chromosomes) by
      using operators such as mutation and crossover.
    • Guided by a fitness function.
  • Hill Climbing
    • Search through the local neighbourhood of a
      solution.
    • Guided by a fitness function.
    • Try to escape local optima.
  • Random search.

19
Chromosome
  • Represents a clustering solution for a hard segment.
  • Composed of genes, each representing a cluster.
  • Genes have an on/off switch, which is used by the
    fitness function.
  • Variable number of genes.
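A minimal data-structure sketch of this representation, assuming each cluster is an inclusive index range over the hypothesis list of one hard segment. `Gene`, `Chromosome`, `active_genes` and `random_chromosome` are hypothetical names, not the authors' implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Gene:
    start: int       # index of the cluster's first hypothesis (inclusive)
    end: int         # index of the cluster's last hypothesis (inclusive)
    on: bool = True  # switch consulted by the fitness function


@dataclass
class Chromosome:
    genes: list[Gene] = field(default_factory=list)  # variable number of genes

    def active_genes(self):
        """Genes currently switched on."""
        return [g for g in self.genes if g.on]


def random_chromosome(segment_len, max_genes, rng):
    """Random clustering of one hard segment of `segment_len` hypotheses."""
    genes = []
    for _ in range(rng.randint(1, max_genes)):
        start = rng.randrange(segment_len)
        end = rng.randrange(start, segment_len)  # ensures start <= end
        genes.append(Gene(start, end))
    return Chromosome(genes)


chromosome = random_chromosome(segment_len=9, max_genes=4, rng=random.Random(42))
```

Because genes are free ranges rather than a fixed partition, two genes can overlap, which is exactly what the overlapping-concept problem needs.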

20
Chromosome Structure
[Diagram: a chromosome composed of genes, each gene covering a cluster of the hypothesis list]

21
Fitness Function
  • All algorithms use the same fitness function.
  • Used to evaluate a potential solution (a set of
    clusters) against a set of desirable
    characteristics.
  • First, a winning concept is identified for
    each cluster (concept binding).
  • The set of clusters is then evaluated according
    to the fitness function.

22
Fitness Function
  • Components: concept binding strength and segment
    coverage.
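The transcript does not preserve how concept binding strength and segment coverage are combined, so the sketch below assumes one simple combination: the mean binding strength of the clusters (normalised to 0..1) multiplied by the fraction of the segment they cover. It illustrates the idea only and is not expected to reproduce the presentation's worked example.

```python
def coverage(clusters, segment_len):
    """Fraction of the segment's hypotheses covered by at least one cluster."""
    covered = set()
    for start, end in clusters:  # inclusive hypothesis indices
        covered.update(range(start, end + 1))
    return len(covered) / segment_len


def fitness(clusters, strengths, segment_len):
    """Assumed combination: mean binding strength (each in 0..1) times coverage."""
    if not clusters:
        return 0.0
    mean_strength = sum(strengths) / len(strengths)
    return mean_strength * coverage(clusters, segment_len)


# Two switched-on clusters over a 9-hypothesis segment (hypothetical values).
print(fitness([(0, 2), (5, 6)], [1.0, 0.5], 9))  # 0.75 * (5/9), about 0.417
```

Counting covered hypotheses as a set means overlapping clusters are not rewarded twice for the same stretch of code.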
23
Fitness Function Examples
[Figure: worked example on the hypothesis list A A B Z Z B B Z Z, showing cluster fitness, coverage and cluster length; resulting fitness 0.483]

24
Fitness Function
  • After the fitness of every cluster (each determined
    by its corresponding gene) has been evaluated,
    overlapping clusters that share a winning concept
    but have lower fitness are turned off.
  • Complete overlap is also not allowed, and the
    corresponding gene is turned off.
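The two switch-off rules can be sketched as follows. Rule 1 follows the slide directly; for rule 2 the slide does not say which of two completely overlapping genes is switched off, so switching off the enclosed cluster is an assumption. The `Gene` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    start: int      # inclusive hypothesis index
    end: int        # inclusive hypothesis index
    concept: str    # winning concept from concept binding
    fitness: float  # cluster fitness
    on: bool = True


def overlaps(a, b):
    return a.start <= b.end and b.start <= a.end


def contains(a, b):
    """True if cluster a completely covers cluster b."""
    return a.start <= b.start and b.end <= a.end


def apply_switch_rules(genes):
    # Rule 1: of two overlapping clusters with the same winning concept,
    # the lower-fitness one is switched off.
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if a.on and b.on and a.concept == b.concept and overlaps(a, b):
                (a if a.fitness < b.fitness else b).on = False
    # Rule 2: complete overlap is not allowed. Assumption: the enclosed
    # cluster is switched off (for identical ranges, the later gene).
    for i, a in enumerate(genes):
        for j, b in enumerate(genes):
            if i != j and a.on and b.on and contains(a, b):
                b.on = False
    return genes


genes = apply_switch_rules([
    Gene(0, 4, "Write", 0.9),
    Gene(3, 6, "Write", 0.5),   # same concept, overlaps and weaker: off
    Gene(5, 8, "Output", 0.7),
    Gene(6, 7, "File", 0.4),    # completely inside the Output cluster: off
])
print([g.on for g in genes])  # [True, False, True, False]
```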

25
Genetic Algorithm
  • Tournament selection
    • 0.99 coefficient
  • Flexible stopping condition
    • Fitness stagnation over a period of 50 generations
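A sketch of both settings. Reading the 0.99 coefficient as the probability that the fitter of two randomly drawn contestants wins the tournament is an assumption (it is a common tournament-selection variant), as is treating "stagnation" as no improvement in the best fitness. `tournament_select` and `StagnationStop` are hypothetical names.

```python
import random


def tournament_select(population, fitness, rng, p_fitter=0.99):
    """Binary tournament: the fitter of two random contestants wins with probability p_fitter."""
    a, b = rng.sample(population, 2)
    fitter, weaker = (a, b) if fitness(a) >= fitness(b) else (b, a)
    return fitter if rng.random() < p_fitter else weaker


class StagnationStop:
    """Signal a stop once the best fitness has not improved for `patience` generations."""

    def __init__(self, patience=50):
        self.patience = patience
        self.best = float("-inf")
        self.stale = 0

    def should_stop(self, best_this_generation):
        if best_this_generation > self.best:
            self.best = best_this_generation
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience
```

A probability just below 1 keeps selection pressure high while occasionally letting a weaker chromosome through, which helps preserve diversity; the stagnation rule lets run length adapt to segment difficulty instead of fixing a generation count.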

26
Genetic Algorithm
  • Crossover
    • The cluster boundaries of the genes are used to
      create new genes (clusters).
    • Only genes that are switched on are used.
    • Crossover rate 0.8
  • Mutation
    • Consists of randomly changing the start or end
      location of the cluster within a gene.
    • Mutation rate 0.001
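The operators might look like the sketch below. Mutation follows the slide (move a start or end boundary, applied here with probability 0.001 per gene). The crossover detail, taking a child gene's start from one switched-on parent gene and its end from another, is an assumed reading of "cluster boundaries of the genes are used to create new genes"; all names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Gene:
    start: int  # inclusive hypothesis index
    end: int    # inclusive hypothesis index
    on: bool = True


def crossover(parent_a, parent_b, rng, rate=0.8):
    """Assumed boundary crossover over the parents' switched-on genes."""
    pool = [g for g in parent_a + parent_b if g.on]  # only 'on' genes take part
    if rng.random() >= rate or len(pool) < 2:
        # No crossover this time: return a copy of parent A.
        return [Gene(g.start, g.end, g.on) for g in parent_a]
    child = []
    for _ in range(len(parent_a)):
        g1, g2 = rng.sample(pool, 2)
        lo, hi = sorted((g1.start, g2.end))  # start of one gene, end of another
        child.append(Gene(lo, hi))
    return child


def mutate(genes, segment_len, rng, rate=0.001):
    """With probability `rate` per gene, move its start or end to a random valid position."""
    for g in genes:
        if rng.random() < rate:
            if rng.random() < 0.5:
                g.start = rng.randint(0, g.end)
            else:
                g.end = rng.randint(g.start, segment_len - 1)
    return genes
```

Building children only from existing boundaries keeps new clusters aligned with positions the population has already found useful, while mutation supplies genuinely new boundary positions.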

27
Hill Climbing
  • Start from a random solution.
  • Evaluate the fitness of the solution's neighbourhood.
  • Move to a fitter neighbouring solution.
  • Try to escape on reaching a local optimum.

28
Hill Climbing
  • Two stages
    • Local search (moving through neighbours):
      evaluate the neighbourhood of the current
      chromosome.
    • Escape from a local optimum: cross over the
      genes within the current chromosome.

29
Hill Climbing
  • Local search: operations to examine neighbours
    • Move: increase or decrease both boundaries by
      one hypothesis.
    • Resize: move one of the cluster boundaries by
      one hypothesis.
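The Move and Resize operators define each cluster's neighbourhood; a first-improvement local search over them can be sketched as below. A chromosome is simplified to a list of `(start, end)` tuples and `fitness` is any caller-supplied function; both simplifications, and the function names, are assumptions.

```python
def neighbours(start, end, segment_len):
    """Neighbour clusters of [start, end] under the Move and Resize operators."""
    candidates = [
        (start - 1, end - 1), (start + 1, end + 1),  # Move: shift both boundaries
        (start - 1, end), (start + 1, end),          # Resize: move the start boundary
        (start, end - 1), (start, end + 1),          # Resize: move the end boundary
    ]
    return [(s, e) for s, e in candidates if 0 <= s <= e < segment_len]


def local_search(chromosome, fitness, segment_len):
    """First-improvement hill climb; stops when no single move improves fitness."""
    current = list(chromosome)
    best = fitness(current)
    improved = True
    while improved:
        improved = False
        for i, (s, e) in enumerate(current):
            for cand in neighbours(s, e, segment_len):
                trial = current[:i] + [cand] + current[i + 1:]
                f = fitness(trial)
                if f > best:
                    current, best, improved = trial, f, True
                    break
            if improved:
                break  # restart the scan from the new solution
    return current, best


def toy_fitness(clusters):
    """Toy fitness: prefer a single cluster spanning hypotheses 3..5."""
    s, e = clusters[0]
    return -abs(s - 3) - abs(e - 5)


print(local_search([(0, 1)], toy_fitness, segment_len=9))  # ([(3, 5)], 0)
```

Each step changes a boundary by only one hypothesis, so the neighbourhood is small and cheap to evaluate but easily trapped, which is why the escape stage is needed.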

30
Hill Climbing
  • Escape from a local optimum
    • Cross over genes at random.
    • Add the genes that improve the overall fitness
      to the solution.
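A sketch of the escape stage under stated assumptions: a random pair of genes is crossed by taking the start of one and the end of the other, and the resulting gene is kept only if adding it raises overall fitness. The `escape` function, the attempt count and the tuple gene representation are hypothetical.

```python
import random


def escape(chromosome, fitness, rng, attempts=20):
    """Assumed escape step: add crossed-over genes that raise overall fitness."""
    current = list(chromosome)
    best = fitness(current)
    for _ in range(attempts):
        if len(current) < 2:
            break
        (s1, _), (_, e2) = rng.sample(current, 2)
        lo, hi = sorted((s1, e2))  # start of one gene, end of the other
        trial = current + [(lo, hi)]
        f = fitness(trial)
        if f > best:  # keep the new gene only if it helps
            current, best = trial, f
    return current, best
```

Adding genes rather than replacing them lets the chromosome grow new, possibly overlapping clusters, after which local search can resume from the enlarged solution.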

31
Random Search
  • For comparison purposes
  • Randomly generate and evaluate chromosomes
  • Stop on reaching the largest number of fitness
    calculations used by GA or HC
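Random search is the simplest of the three; the only subtlety is the evaluation budget. In the sketch below, `random_search`, `make_random`, the toy fitness and the two evaluation counts are placeholders rather than the experiment's settings or results.

```python
import random


def random_search(make_random, fitness, budget, rng):
    """Evaluate `budget` random solutions; return the best one found and its fitness."""
    best, best_f = None, float("-inf")
    for _ in range(budget):
        candidate = make_random(rng)
        f = fitness(candidate)
        if f > best_f:
            best, best_f = candidate, f
    return best, best_f


# Budget: the larger number of fitness evaluations GA or HC used on a segment.
# These two counts are placeholders, not measured values.
ga_evaluations, hc_evaluations = 1200, 800
best, best_f = random_search(
    lambda r: r.randint(0, 10),   # toy "solution": an integer
    lambda x: -abs(x - 5),        # toy fitness, peak at 5
    budget=max(ga_evaluations, hc_evaluations),
    rng=random.Random(0),
)
```

Matching the budget to the larger of the two searches gives random search at least as many evaluations as either competitor, so GA or HC only look better if their guidance genuinely helps.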

32
Experimental details
  • 21 COBOL II programs
  • 10 runs per segment for each search type

33
GA and HC Median fitness ordered by Segment Size
34
GA and Random Median Fitness ordered by Segment Size
35
HC and Random Median Fitness ordered by Segment Size
36
GA and HC Cost Comparison ordered by Segment Size
37
Conclusions
  • GA performs best and HC worst.
  • A better definition of HC neighbouring solutions
    may be helpful.

38
Discussion / Future Work
  • An alternative graphical representation to display
    the distribution of results more clearly.
  • Complexity of searching for overlapping clusters.
  • Statistical evaluation of the significance of the
    results.