Title: Dave Binkley and Mark Harman
1Dave Binkley and Mark Harman
- Locating Dependence Clusters and Dependence Pollution
- Preview of ICSM 05 talk
2Overview
- Dependence and slicing
- Monotone Slice Size Graphs (MSGs)
- Approximating similarity
- Verification: Does the approximation work?
- Validation: Do the clusters occur in real code?
- Pollution
- Refactoring
3Dependence
8Program slicing
- The essential idea ...
- Which other lines affect the selected line?
- Debugging: don't waste time on the grey part
- Re-use: reuse only the red part
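As a rough illustration of the question "which other lines affect the selected line?", here is a minimal sketch of a backward slice computed as reachability over a dependence graph. The `deps` mapping, the function name and the five-line program are hypothetical; a real slicer works on a much richer dependence graph.

```python
from collections import deque

def backward_slice(deps, criterion):
    """Return every line that can affect `criterion`, including itself.

    `deps` maps a line number to the lines it directly depends on
    (data or control dependence). This representation is an assumption
    made for the sketch.
    """
    seen = {criterion}
    work = deque([criterion])
    while work:
        line = work.popleft()
        for pred in deps.get(line, ()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen

# Hypothetical five-line program:
#   1: n = read()          2: total = 0        3: count = 0
#   4: total = total + n   5: print(total)
deps = {4: [1, 2], 5: [4]}
slice_of_5 = backward_slice(deps, 5)
print(sorted(slice_of_5))
```

In this toy program line 3 never reaches line 5, so it falls outside the slice: it is the "grey part" when debugging line 5.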
9Manifesto
- Slicing as a means to an end
- To produce metrics
- To produce visualizations
- To understand dependence structure
DASL
10MSGs
- Monotone Slice Size Graph
- Plot all slices of program in one graph
- Order by monotonically increasing slice size
- This gives a landscape
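The construction described on this slide can be sketched as follows, assuming slice sizes have already been computed and are measured in the same unit as the program size. The function name and the sample sizes are illustrative only.

```python
def msg_points(slice_sizes, program_size):
    """Return the (x, y) points of a Monotone Slice Size Graph.

    x is the percentage of slices represented so far,
    y is the slice size normalised by the program size.
    """
    ordered = sorted(slice_sizes)          # monotonically increasing
    n = len(ordered)
    return [((i + 1) * 100.0 / n, size / program_size)
            for i, size in enumerate(ordered)]

sizes = [40, 10, 40, 20, 40]               # hypothetical slice sizes
points = msg_points(sizes, program_size=50)
for x, y in points:
    print(f"{x:5.1f}%  {y:.2f}")
```

Plotting `points` as a line gives the landscape; a flat stretch means many slices share one size.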
11Example MSG
- X axis: percentage of slices represented
- Y axis: normalised slice size
12Example MSG
- X axis: percentage of slices represented
- Y axis: normalised slice size
13Conjecture
- Clusters correspond to sheer cliff drops
- We implicitly assume same size is a close approximation to same slice
- In the previous example there were no clusters
- Let's look at another example
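Under the "same size approximates same slice" assumption, candidate clusters are simply runs of equal values in the sorted slice sizes. The following is a small sketch of that idea, not the tool used in the study; the sample sizes are made up.

```python
from itertools import groupby

def same_size_plateaus(slice_sizes):
    """Return (size, count) for each run of equal sizes in sorted order."""
    return [(size, len(list(run)))
            for size, run in groupby(sorted(slice_sizes))]

sizes = [3, 40, 41, 40, 7, 40, 40]         # hypothetical slice sizes
plateaus = same_size_plateaus(sizes)
widest = max(plateaus, key=lambda p: p[1]) # the largest candidate cluster
print(plateaus, widest)
```

A wide plateau in the sorted sizes shows up in the graph as a sheer jump followed by a flat shelf.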
15Verification
- How good is the approximation?
- What is the chance that two slices could have the same size and yet be different slices?
- Well the same within a tolerance
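One way to make this check concrete is sketched below. It assumes that "the same within a tolerance" means two slices differ in at most a given fraction of their combined elements; that definition, the function names and the data are assumptions of the sketch and may not match the measure used in the study.

```python
from itertools import combinations

def same_within(a, b, tolerance):
    """True if slices a and b differ in at most `tolerance` of their elements.

    `tolerance` is a fraction of the union of the two slices (assumed definition).
    """
    union = a | b
    if not union:
        return True
    return len(a ^ b) / len(union) <= tolerance

def false_positive_rate(slices, tolerance=0.0):
    """Among pairs of equal size, the fraction that are not the same slice."""
    same_size = [(a, b) for a, b in combinations(slices, 2)
                 if len(a) == len(b)]
    if not same_size:
        return 0.0
    misses = sum(1 for a, b in same_size if not same_within(a, b, tolerance))
    return misses / len(same_size)

# Hypothetical slices, each a set of line numbers.
slices = [frozenset({1, 2, 3}), frozenset({1, 2, 3}),
          frozenset({1, 2, 4}), frozenset({5})]
print(false_positive_rate(slices), false_positive_rate(slices, 0.5))
```

A low rate at a small tolerance would support using slice size as a cheap stand-in for slice identity.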
16Tolerance
- Different yet considered the same
- Agreement of slices which are the same
19Verified
- Agreement is close for reasonable tolerance
- 20 programs studied
- 0.4 of slices need > 1 tolerance for total agreement
- But 0.4 is the number of clusters
- Only 0.00533 of pairwise slice comparisons require more than 1 tolerance to agree
- These are the slices which simply happen to have the same size yet are different slices
- The chance of a false positive is therefore very low
20Validation
- Do these clusters occur much in real code?
- We studied 20 programs
- The results were startling
- We expected clusters here and there
- We found them everywhere
- Well not quite
21Validation
- Of course, two slices of the same size is a cluster
- So we search for large clusters
- We chose 10 as our threshold for large
- This is conservative
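The search for large clusters can be sketched as a filter over same-size groups. The sketch assumes the threshold of 10 is a percentage of all slices, which the slide does not state explicitly; the function name and sample data are hypothetical.

```python
from collections import Counter

def large_clusters(slice_sizes, threshold=0.10):
    """Return {size: count} for same-size groups that cover at least
    `threshold` (a fraction, assumed to mean 10 percent) of all slices."""
    counts = Counter(slice_sizes)
    total = len(slice_sizes)
    return {size: n for size, n in counts.items() if n / total >= threshold}

# 20 hypothetical slices: three of size 5, one of size 9, sixteen unique sizes.
sizes = [5] * 3 + [9] + list(range(100, 116))
print(large_clusters(sizes))
```

Raising `threshold` makes the search more conservative, since fewer groups qualify as large.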
23Results
- 6 had no clusters larger than 10
- 14 had at least one cluster larger than 10
- 4 of the 14 were extreme
24No clusters
25With clusters
26Extreme Cases
27Manifesto
- Active, not passive
- Use dependence analysis to change programs
34What is pollution?
- Like noise pollution
- it is in the ear of the beholder
- It could be thought of as a bad thing
- mixed into a good thing
- unnecessarily
- We define dependence pollution to be avoidable dependence clusters
36Why
- Why are dependence clusters bad?
- Impact of change
- Comprehension
- Testing
- Reuse
- How could they be avoidable?
- Capillary data flow (CDF)
- Mutually recursive cluster (MRC)
37Case Study
- copia
- Example of a Mutually Recursive Cluster (MRC)
- We can remove it by refactoring
- This removes the dependence pollution
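A mutually recursive group can be spotted as a strongly connected component of the call graph. The sketch below uses Tarjan's algorithm on a hypothetical call graph; the function names are invented for illustration and are not taken from copia, and this is not necessarily how the cluster was found in the case study.

```python
def mutually_recursive_groups(calls):
    """Return groups of two or more functions that can all reach each other.

    `calls` maps a function name to the functions it calls.
    Uses Tarjan's strongly connected components algorithm (recursive form,
    suitable for small graphs only).
    """
    index, low, on_stack, stack, groups = {}, {}, set(), [], []

    def visit(f):
        index[f] = low[f] = len(index)
        stack.append(f)
        on_stack.add(f)
        for g in calls.get(f, ()):
            if g not in index:
                visit(g)
                low[f] = min(low[f], low[g])
            elif g in on_stack:
                low[f] = min(low[f], index[g])
        if low[f] == index[f]:             # f is the root of a component
            group = []
            while True:
                g = stack.pop()
                on_stack.discard(g)
                group.append(g)
                if g == f:
                    break
            if len(group) > 1:
                groups.append(sorted(group))

    for f in list(calls):
        if f not in index:
            visit(f)
    return groups

# Hypothetical call graph.
calls = {"main": ["parse"], "parse": ["expr"], "expr": ["term"],
         "term": ["factor"], "factor": ["expr"]}
print(mutually_recursive_groups(calls))
```

Refactoring then means breaking one of the calls that closes the cycle, so the functions no longer all depend on each other.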
39Case Study
- bc
- A calculator program
- We looked for Capillary Data Flow (CDF)
- We tried removing the variable which contributed most to dependence
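The experiment of removing the variable that contributes most to dependence can be approximated as follows. This is a sketch under strong assumptions: dependence edges are labelled with the variable that carries them, and "contributes most" is read as "removing its edges shrinks the total slice size most". The edge data and variable names are hypothetical and not taken from bc.

```python
def slice_size(edges, node):
    """Size of the backward slice of `node` over (src, dst, var) edges."""
    preds = {}
    for src, dst, _ in edges:
        preds.setdefault(dst, set()).add(src)
    seen, work = {node}, [node]
    while work:
        for p in preds.get(work.pop(), ()):
            if p not in seen:
                seen.add(p)
                work.append(p)
    return len(seen)

def most_polluting_variable(edges):
    """Return the variable whose removal gives the smallest total slice size."""
    nodes = {n for src, dst, _ in edges for n in (src, dst)}

    def total(remaining):
        return sum(slice_size(remaining, n) for n in nodes)

    variables = sorted({var for _, _, var in edges})
    return min(variables,
               key=lambda v: total([e for e in edges if e[2] != v]))

# Hypothetical edges: dst depends on src through variable var.
edges = [(1, 2, "g"), (2, 3, "g"), (3, 1, "g"), (3, 4, "x"), (4, 5, "y")]
print(most_polluting_variable(edges))
```

Here the variable `g` ties statements 1, 2 and 3 into a cycle, so removing its edges reduces slice sizes far more than removing `x` or `y`.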
41Related work
- Clustering in the large
- This is fine grained clustering based on SDG
- Slicing in general
- But we use slicing as a means to an end
- We are interested in dependence profile
- Slicing as a means to an end?
- Bieman and Ott
- Canfora, De Lucia and Munro
- Korel and Rilling
- Beszédes et al.
- Krinke and Snelting
- Visualising Dependence
- Balmas
- Ball and Eick
42Future work
- Extend empirical results
- Categorise
- Other ways to reduce dependence
- By being more sophisticated about what it is
- Domain specific dependence
- Application specific dependence
43Conclusions
- Dependence clusters are prevalent
- They can be discovered using slicing
- The approximation is very close
- The visualisation can be a help