Title: Analysis
Analysis: Evaluation of Data
- The collected data should be:
  - Reliable
    - little or no error is committed in the gathering and tabulation of data
  - Accurate
    - the desired degree of precision is maintained
  - Valid
    - the data is applicable to the issue and attribute of interest
Sample Consideration
- We have collected error data on Requirements Inspection, Design Inspection and Unit Testing and want to analyze them for a quality attribute.
- Potential Reliability problem?
  - Did we collect and count the data correctly in all three cases?
- Potential Accuracy problem?
  - Did we use the same level of precision (e.g. the same level of severity breakdown)?
- Potential Validity problem?
  - Is the number of defects a valid quality attribute?
  - Do these data reflect a measure of the extent of defects committed (extent = number, severity, complexity of fix, etc.)?
Some Common Analysis Methods of Data
- Distribution of Data
- Centrality and Dispersion
- Moving Averages
- Data Correlation
- Normalization of Data
1. Distribution of Data
- We often look at a scatter diagram of the raw data and pick out the outliers.
- We count the frequency of occurrences and get a distribution, to get a view of the shape of the distribution and the range of the distribution.
  - severity 1: 7 defects
  - severity 2: 24 defects
  - severity 3: 26 defects
  - severity 4: 88 defects
  - severity 5: 92 defects
- Range is from 7 defects to 92 defects
- Shape is not that important in this case,
  - the skew is towards the less severe defects
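The counting step above can be sketched in a few lines of Python. This is only an illustration: the raw defect log (`defect_severities`) is hypothetical and is generated here so that its counts match the severity figures on the slide.

```python
from collections import Counter

# Counts per severity level, taken from the slide.
slide_counts = {1: 7, 2: 24, 3: 26, 4: 88, 5: 92}

# Hypothetical raw data: one severity value per recorded defect,
# expanded from the slide counts purely for illustration.
defect_severities = [sev for sev, n in slide_counts.items() for _ in range(n)]

# Count the frequency of occurrences per severity level.
distribution = Counter(defect_severities)

# Range of the distribution: smallest and largest frequency.
lowest = min(distribution.values())
highest = max(distribution.values())

total = sum(distribution.values())
for severity in sorted(distribution):
    count = distribution[severity]
    print(f"severity {severity}: {count:3d} defects ({count / total:.1%})")
print(f"range: {lowest} to {highest} defects")
```

Printing each level's share of the total makes the skew towards severities 4 and 5 easy to see without drawing the histogram.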
Common Distributions of Data
- There are some recognizable distributions:
  - Normal
  - Linear
  - Logarithmic
  - Exponential
  - Negative Exponential
2. Centrality and Dispersion
- Use centrality to compare two sets of data distributions
  - mean
  - median
[Figure: example distributions with their mean and median values marked]
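As a rough illustration of comparing two data sets by their central values, the sketch below uses Python's standard `statistics` module. The two data sets (`project_a`, `project_b`) are made up for this example and do not come from the slides.

```python
from statistics import mean, median

# Hypothetical defect counts per module for two projects (illustrative only).
project_a = [3, 4, 5, 6, 7]    # roughly symmetric
project_b = [1, 2, 3, 4, 40]   # one outlier module skews the data

for name, data in (("A", project_a), ("B", project_b)):
    print(f"project {name}: mean = {mean(data)}, median = {median(data)}")
```

In the symmetric set the mean and median coincide; in the skewed set the single outlier pulls the mean well above the median, which is why it helps to look at both.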
Variance and Standard Deviation
- A measure of dispersion from the central value (see below)
  - we measured the number of defects ($x_i$) from n similar-sized functional areas
  - the mean or central value is calculated: $X_{mean} = \sum x_i / n$
  - the variance $= \sum (x_i - X_{mean})^2 / n$
  - Std Dev. $= \sqrt{\text{variance}}$
- For a Normal Distribution, 1 Std Dev. on either side of the mean captures about 68% of the sample.
- Given a new function of similar size, we can measure the number of defects found and compare it against the mean of the earlier group and the 1 std deviation.
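The calculation above, and the comparison of a new function against the earlier group, might look like the following sketch. The defect counts are hypothetical, and `within_one_std` is a helper name introduced only for this example. The variance divides by n, as on the slide.

```python
import math

def mean_variance_std(xs):
    """Return (mean, variance, std dev), dividing by n as on the slide."""
    n = len(xs)
    x_mean = sum(xs) / n
    variance = sum((x - x_mean) ** 2 for x in xs) / n
    return x_mean, variance, math.sqrt(variance)

# Hypothetical defect counts from 8 similar-sized functional areas.
defects = [4, 6, 5, 7, 3, 5, 6, 4]
x_mean, variance, std_dev = mean_variance_std(defects)

def within_one_std(new_count):
    """True if a new function's defect count is within 1 std dev of the mean."""
    return abs(new_count - x_mean) <= std_dev

print(f"mean = {x_mean}, variance = {variance}, std dev = {std_dev:.2f}")
for new_count in (6, 8):
    status = "within" if within_one_std(new_count) else "outside"
    print(f"{new_count} defects is {status} 1 std dev of the mean")
```

A count that falls outside the band is not necessarily a problem, but it is a reasonable trigger for a closer look at that function.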
Control Chart
[Figure: control chart with a center line at the mean (5.3) and limit lines at 1 Std Dev. on either side of it]
3. Moving Average - a Smoothing Technique
[Figure: moving-average plot; annotations: "Jump smoothed" (two places) and "Special jump"]
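A simple (unweighted) moving average can be sketched as below. The slide does not specify a window size or data, so the window of 3 and the weekly defect counts are assumptions made for illustration.

```python
def moving_average(values, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical weekly defect counts with one jump (12) in week 4.
weekly_defects = [5, 6, 4, 12, 5, 6, 5]
smoothed = moving_average(weekly_defects, window=3)

print("raw:     ", weekly_defects)
print("smoothed:", [round(v, 2) for v in smoothed])
```

The jump is spread across three neighboring averages, so the smoothed series peaks lower than the raw one. A larger window smooths more, at the cost of reacting more slowly to real changes.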
4. Correlation
- Only addresses whether there is a relationship
- Does not address cause and effect
- Example
  - size of the module may correlate to the number of defects
  - but size of the module may or may not be the cause
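The slides do not name a specific measure of correlation. One common choice is the Pearson correlation coefficient, sketched below as an assumption rather than as the deck's method. The data reuses the (size, defects) pairs from the regression example in this deck.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariation = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_spread = sum((x - x_mean) ** 2 for x in xs)
    y_spread = sum((y - y_mean) ** 2 for y in ys)
    return covariation / math.sqrt(x_spread * y_spread)

sizes = [150, 230, 500, 730, 1000]   # module size
defects = [2, 3, 4, 7, 9]            # defects found

r = pearson_r(sizes, defects)
print(f"r = {r:.3f}")
```

A value of r close to 1 indicates a strong positive linear relationship between size and defects. As the bullets above stress, it says nothing about whether size is the cause.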
Linear Relationship
- Linear equation of the form $Y = a + bX$, where
  - b is the slope, and
  - a is the y-intercept
[Figure: straight line plotted with X on the horizontal axis and Y on the vertical axis]
Least Square Linear Regression
- A method of estimating the linear relationship of the Y variables with the X variables, in the form $Y = a + bX$, by minimizing the sum of the squared vertical distances of the Y coordinates from the line.
- We can estimate the parameters a, b as follows:
  - $b = [\sum XY - (1/n)(\sum X)(\sum Y)] / [\sum X^2 - (1/n)(\sum X)^2]$
    - this b estimate gives the same value as the one shown in the book
  - $a = Y_{ave} - b X_{ave}$
  - where X is each of the X observations and $X_{ave}$ is the average of the Xs (and likewise $Y_{ave}$ for the Ys)
Least Square Linear Regression - Example
- (size, defects): (150, 2), (230, 3), (500, 4), (730, 7), (1000, 9)
- Xs: 150, 230, 500, 730, 1000; $\sum X = 2610$
- $X^2$: 22,500; 52,900; 250,000; 532,900; 1,000,000; and $\sum X^2 = 1,858,300$
- Ys: 2, 3, 4, 7, 9; $\sum Y = 25$
- XY: 300, 690, 2000, 5110, 9000; $\sum XY = 17,100$
- $b = [17100 - (1/5)(2610)(25)] / [1858300 - (1/5)(2610)^2] = 4050 / 495880 \approx .0081$
- $a = 25/5 - (.0081)(2610/5) = 5 - 4.23 = .77$
- The Least Square Regression line is $Y = .77 + .0081X$
- Let's plug in X = 150 and see what we get: $Y = .0081(150) + .77 = 1.22 + .77 = 1.99$ (close to the observed 2!). The line is more accurate for interpolation than for extrapolation.
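The worked example can be checked with a short sketch that applies the same formulas for b and a to the five (size, defects) pairs. The function names are chosen for this illustration. Because the code does not round intermediate values, it gives roughly b ≈ 0.00817 and a ≈ 0.737, slightly different from the slide's .0081 and .77, which come from rounding b before computing a.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = [sum(XY) - (1/n) sum(X) sum(Y)] / [sum(X^2) - (1/n) sum(X)^2]
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # a = Y_ave - b * X_ave
    a = sum_y / n - b * (sum_x / n)
    return a, b

sizes = [150, 230, 500, 730, 1000]
defects = [2, 3, 4, 7, 9]

a, b = least_squares_fit(sizes, defects)

def predict(size):
    """Estimated number of defects for a module of the given size."""
    return a + b * size

print(f"a = {a:.4f}, b = {b:.6f}")
print(f"predicted defects at size 150: {predict(150):.2f}")
```

Predictions are most trustworthy for sizes inside the observed range (150 to 1000), in line with the slide's remark about interpolation versus extrapolation.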
5. Normalization
- Pure data gives a 1-dimensional comparison
  - program A: 52 person days to complete
  - program B: 33 person days to complete
  - program C: 64 person days to complete
  - 64 > 52 > 33; what else can we say? (suspect different sizes of programs)
- Normalization gives an equalizing factor in terms of another attribute.
  - 52 person days for 5000 loc, or 96.1 loc / person day
  - 33 person days for 3000 loc, or 90.9 loc / person day
  - 64 person days for 6000 loc, or 93.7 loc / person day
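The normalization step amounts to dividing one attribute by another. The sketch below assumes that the three effort/size pairs correspond to programs A, B and C in the order listed, which the slide implies through matching person-day figures but does not state outright.

```python
# (person days, lines of code) per program; pairing with A/B/C is assumed
# from the matching person-day figures on the slide.
programs = {
    "A": (52, 5000),
    "B": (33, 3000),
    "C": (64, 6000),
}

# Normalize effort by size: lines of code produced per person day.
productivity = {
    name: loc / person_days
    for name, (person_days, loc) in programs.items()
}

# Rank programs from highest to lowest normalized productivity.
ranked = sorted(productivity, key=productivity.get, reverse=True)

for name in ranked:
    print(f"program {name}: {productivity[name]:.1f} loc / person day")
```

Once normalized, the ordering changes: program C took the most person days but is not the least productive, and program B took the fewest but has the lowest loc per person day.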