Title: Source code metrics in the software industry
1. Source code metrics in the software industry
School of Computer and Information Science, Edith Cowan University
PhD research project by Tim Littlefair, supervised by Dr Thomas O'Neill
http://www.fste.ac.cowan.edu.au/tlittlef
2. Source code metrics
- Software metrics are generally divided into two categories: process metrics and product metrics.
- The value of process metrics as an aid to software process management is now widely accepted.
- Source code metrics are a subset of the product metrics category. Many source code metrics have been proposed, but there is no consensus as to which are useful.
3. Project features
- Metrics of interest were selected.
- An automated measurement tool was implemented.
- The tool was deployed and evaluated by developers performing real-world development under commercial conditions.
- The Internet was used to distribute the tool and solicit evaluation feedback.
- The project is concluding with an experiment on the use of metrics data as support for the software inspection process.
4. Metrics Implemented
- Procedural metrics (measured on a per-function basis): lines of code, lines of comment, McCabe's cyclomatic complexity.
- Metrics of object-oriented design (proposed by Chidamber and Kemerer): depth of inheritance tree, number of children, coupling between objects, weighted methods per class.
- Structural metrics (based on work by Henry and Kafura): fan-in, fan-out, information flow (see the sketch below).
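
To make the structural measures concrete, the sketch below computes a per-module information flow figure from fan-in and fan-out counts, using the (fan-in x fan-out) squared formulation often attributed to Henry and Kafura; the module names and counts are hypothetical illustration data, not output of the project's tool.

    # Hedged sketch: Henry-Kafura style information flow from fan-in/fan-out.
    # Module names and counts below are hypothetical illustration data.

    def information_flow(fan_in: int, fan_out: int) -> int:
        """One common formulation: (fan-in * fan-out) squared."""
        return (fan_in * fan_out) ** 2

    modules = {
        "parser": (3, 5),   # (fan_in, fan_out)
        "lexer":  (1, 2),
        "report": (4, 1),
    }

    for name, (fi, fo) in modules.items():
        print(f"{name}: fan-in={fi}, fan-out={fo}, "
              f"information flow={information_flow(fi, fo)}")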
5. Evaluation survey
- The tool was publicised via USENET and deployed (and refined) over a 6 month period.
- USENET and email addresses from FTP logs were used to publicise the evaluation URL.
- 25 respondents over a 3 month period; weak consensus on the value of procedural metrics; opinion neutral or marginally negative on OO design metrics and structural metrics.
6. Review Experiment
- The survey yielded little interesting data; the project needed to be extended to make it worthwhile.
- An experiment was designed to attempt to refute the null hypothesis (i.e. "metrics are of no value") by detecting a positive effect of metrics use.
- Simulated code review: comparing the performance of groups with and without metrics support.
7. The exercise
- 5 Java classes to be reviewed.
- Each respondent to give a yes/no answer on the presence of each of 5 risk factors for each class (e.g. "Excessive length", "Inadequate commenting"), giving 25 questions in total.
- Comparison of performance requires an independently determined set of 'correct' answers.
8. Treatment groups
- Group 0: perform the exercise without metric support, with plenty of time (1 hour).
- Group 1: perform the exercise without metric support, under time constraint (15 mins).
- Group 2: perform the exercise with metric support, under time constraint (15 mins).
- Group 0 responses are used to derive the 'correct' responses; the value of metrics support is assessed by seeing whether group 2 reflects these better than group 1.
9. Experimental Outcomes
- The code review experiment closed after attracting 15 volunteers (6 in group 0, 4 in group 1 and 5 in group 2).
- This was fewer than we had hoped for, but enough to analyse the results.
- 'Correct' results were derived by selecting a threshold on group 0 responses.
- Statistical techniques used in processing the returns: contingency tables, receiver operating characteristic analysis, chi-squared analysis.
10. Cumulative Responses
- The table below shows the number of respondents in each group, together with the number of positive (risk present) responses by each group to each question.
- Group 0 responses are distilled to derive the correct answer to each of the 25 questions; the performance of groups 1 and 2 is then assessed in terms of their level of agreement with these derived responses, as sketched below.
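
A minimal sketch of the derivation just described, assuming a simple majority-vote threshold over group 0's yes/no responses (the exact threshold rule used in the project is not restated here); all response values are hypothetical illustration data.

    # Hedged sketch: derive a 'correct' answer key from group 0 by majority
    # vote, then score a respondent's agreement with that key.
    # All counts below are hypothetical illustration data.

    def derive_correct(group0_yes_counts, n_respondents, threshold=0.5):
        """Mark a question 'risk present' when the fraction of group 0
        respondents answering yes exceeds the threshold (an assumed rule)."""
        return [c / n_respondents > threshold for c in group0_yes_counts]

    def agreement(answers, correct):
        """Fraction of a respondent's 25 answers matching the derived key."""
        return sum(a == c for a, c in zip(answers, correct)) / len(correct)

    # 25 questions; counts of 'yes' votes from the 6 group 0 respondents.
    group0_yes_counts = [6, 1, 5, 0, 2] * 5          # hypothetical
    key = derive_correct(group0_yes_counts, n_respondents=6)

    respondent = [c >= 2 for c in group0_yes_counts]  # hypothetical answers
    print(f"agreement with derived key: {agreement(respondent, key):.2f}")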
11. Receiver Operating Characteristic (ROC)
The ROC graph presents a visual summary of the way a predictive system responds to a sample of real cases, some of which will be on the borderline. A perfect predictive system would follow the left and upper boundaries of the graph; one which is no better than chance would follow the leading diagonal. A sketch of the calculation behind such a graph follows.
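
As an illustration of how points on such a graph are obtained, the sketch below computes the true positive rate and false positive rate for a group's pooled responses; the TP/FN/FP/TN counts are hypothetical illustration data.

    # Hedged sketch: the two coordinates plotted on an ROC graph, computed
    # from true/false positive/negative counts. The counts are hypothetical.

    def roc_point(tp, fn, fp, tn):
        """Return (false positive rate, true positive rate) for one group."""
        tpr = tp / (tp + fn)   # sensitivity: risks correctly flagged
        fpr = fp / (fp + tn)   # 1 - specificity: clean items wrongly flagged
        return fpr, tpr

    # Hypothetical response tallies for the two time-constrained groups.
    print("group 1 (no metrics):", roc_point(tp=30, fn=20, fp=15, tn=35))
    print("group 2 (metrics):   ", roc_point(tp=38, fn=12, fp=18, tn=32))
    # A perfect predictor sits at (0.0, 1.0); chance lies on the line fpr == tpr.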
12. Chi-squared test (1)
- We have gathered data on TP, TN, FP, FN responses of the 2 groups.
- Are differences between the groups due to systematic factors or random variation?
- Standard chi-squared test (see the sketch below):
  - start from a contingency table
  - calculate the chi-squared figure
  - compare it to the characteristic value for the desired degree of certainty and the size of the contingency table
  - this tests the null hypothesis (absence of systematic difference)
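
A minimal sketch of the standard test outlined above, applied to a hypothetical 2x2 contingency table of correct/incorrect responses for groups 1 and 2; the statistic is chi^2 = sum over cells of (observed - expected)^2 / expected, with expected counts derived from the row and column totals.

    # Hedged sketch of the standard chi-squared test named above. The counts
    # are hypothetical; the procedure follows the bullets on slide 12.

    def chi_squared(table):
        """chi^2 = sum over cells of (observed - expected)^2 / expected."""
        row_totals = [sum(row) for row in table]
        col_totals = [sum(col) for col in zip(*table)]
        grand = sum(row_totals)
        chi2 = 0.0
        for i, row in enumerate(table):
            for j, observed in enumerate(row):
                expected = row_totals[i] * col_totals[j] / grand
                chi2 += (observed - expected) ** 2 / expected
        return chi2

    #            correct  incorrect
    table = [[62, 38],    # group 1 (no metrics) - hypothetical counts
             [78, 47]]    # group 2 (metrics)    - hypothetical counts

    chi2 = chi_squared(table)
    # For a 2x2 table (1 degree of freedom) the 5% critical value is 3.841;
    # a chi^2 below this gives no grounds to reject the null hypothesis.
    print(f"chi-squared = {chi2:.3f}, reject null at 5%: {chi2 > 3.841}")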
13. Chi-squared test (2)
14. Experimental Conclusion (1)
- No significant difference in the performance of the code review exercise was found between the control group and the treatment group. We therefore conclude that the current experiment provides no evidence that the metrics information is of benefit in the setting it simulated.
15. Experimental Conclusion (2)
- While the experiment failed to demonstrate a significant difference between the performance of the two groups, a difference in performance was nonetheless observed. The statistical analysis shows that the observed difference was well within the range of outcomes that could arise from random effects, but a similar experiment with a larger number of participants might demonstrate a significant effect.
16. Project Summary
- Nature of source code metrics
- Selection of existing and new metrics based on the GQM paradigm
- Evidence on industry attitudes
- Empirical evaluation of tools
  - human decision support context
  - problems of realistic experimentation
  - burden of proof
  - appropriate statistical techniques