Analysis - PowerPoint PPT Presentation

Provided by: fts1

Transcript and Presenter's Notes

Title: Analysis


1
Analysis & Evaluation of Data
  • The collected data should be
    • Reliable: little or no error is committed in
      gathering and tabulating the data
    • Accurate: the desired degree of precision is
      maintained
    • Valid: the data is applicable to the issue and
      attribute of interest

2
Sample Consideration
  • We have collected error data on Requirements
    Inspection, Design Inspection and Unit Testing
    and want to analyze them for a quality attribute
  • Potential Reliability problem?
    • Did we collect and count the data correctly in
      all three cases?
  • Potential Accuracy problem?
    • Did we use the same level of precision (e.g. the
      same level of severity breakdown)?
  • Potential Validity problem?
    • Is the number of defects a valid quality
      attribute?
    • Do these data reflect a measure of the extent of
      defects committed (extent = number, severity,
      complexity of fix, etc.)?

3
Some Common Analysis Methods of Data
  1. Distribution of Data
  2. Centrality and Dispersion
  3. Moving Averages
  4. Data Correlation
  5. Normalization of Data

4
1. Distribution of Data
  • We often look at a scatter diagram of the raw
    data and pick out the outliers
  • We count the frequency of occurrences to get a
    distribution, which shows the shape and the range
    of the data
    • severity 1: 7 defects
    • severity 2: 24 defects
    • severity 3: 26 defects
    • severity 4: 88 defects
    • severity 5: 92 defects
  • Range is from 7 defects to 92 defects
  • Shape is not that important in this case; the
    skew is towards the less severe defects
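The frequency count above can be tabulated in a few lines. This is a minimal sketch: the counts are the ones on the slide, while the variable names and the text bar chart are illustrative choices, not part of the original deck.

```python
# Severity counts copied from the slide; names are illustrative.
defects_by_severity = {1: 7, 2: 24, 3: 26, 4: 88, 5: 92}

total = sum(defects_by_severity.values())
lowest = min(defects_by_severity.values())
highest = max(defects_by_severity.values())

print(f"total defects: {total}")
print(f"range: {lowest} to {highest} defects")

# Print each severity with its share of the total and a crude text bar,
# which is enough to see the shape of the distribution.
for severity, count in sorted(defects_by_severity.items()):
    share = count / total
    bar = "#" * round(share * 50)
    print(f"severity {severity}: {count:3d} ({share:5.1%}) {bar}")
```

The bars make the skew visible at a glance: severities 4 and 5 account for most of the defects.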


5
Common Distributions of Data
  • There are some recognizable distributions
[Figure: example distribution shapes labeled Normal, Linear, Logarithmic, Exponential, and Negative Exponential]
6
2. Centrality and Dispersion
  • Use centrality to compare two data
    distributions
  • mean
  • median

[Figure: example distributions with their mean and median values marked]
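To see why the mean and the median can differ, here is a small sketch using Python's standard `statistics` module. The defect counts are hypothetical and chosen only so that one large value pulls the mean away from the median.

```python
from statistics import mean, median

# Hypothetical defect counts for five modules (not from the slides);
# the last module is an outlier.
defects = [2, 3, 3, 4, 18]

central_mean = mean(defects)      # pulled upward by the outlier
central_median = median(defects)  # middle value, unaffected by the outlier

print(f"mean = {central_mean}, median = {central_median}")
```

When a distribution is skewed, as here, the two measures diverge, so it helps to report both when comparing data sets.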
7
Variance & Standard Deviation
  • A measure of dispersion from the central value
    (see below)
  • we measured the number of defects ($x_i$) from n
    similar sized functional areas
  • the mean or central value is calculated as
    $X_{mean} = \sum x_i / n$
  • the variance $= \sum (x_i - X_{mean})^2 / n$
  • Std Dev. $= \sqrt{\text{variance}}$
  • For a Normal Distribution, the mean ± 1 Std Dev.
    captures about 68% of the sample.
  • Given a new function of similar size, we can
    measure the number of defects found and compare
    it against the mean of the earlier group and the
    1 std deviation band.
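The calculation and the comparison described above can be sketched as follows. The defect counts and the new observation are hypothetical; the variance divides by n, as on the slide, rather than by n - 1.

```python
from math import sqrt

# Hypothetical defect counts from n similar-sized functional areas.
defects = [4, 6, 5, 7, 3]
n = len(defects)

x_mean = sum(defects) / n
variance = sum((x - x_mean) ** 2 for x in defects) / n  # divide by n, per the slide
std_dev = sqrt(variance)

# Band of one standard deviation on either side of the mean.
lower, upper = x_mean - std_dev, x_mean + std_dev

# Hypothetical defect count for a new function of similar size.
new_area_defects = 8
within_band = lower <= new_area_defects <= upper

print(f"mean = {x_mean:.2f}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
print(f"1 std dev band: {lower:.2f} to {upper:.2f}")
print(f"new area with {new_area_defects} defects within band: {within_band}")
```

A value outside the band is not automatically a problem, but it is a reasonable trigger for a closer look.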

8
Control Chart

[Figure: control chart with a center line at Mean = 5.3 and limit lines 1 Std Dev. above and 1 Std Dev. below the mean]

9
3. Moving Average - a Smoothing Technique
[Figure: moving-average plot with two points labeled "Jump smoothed" and one labeled "Special jump"]
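A simple (unweighted) moving average is one common way to do this smoothing. The sketch below assumes a trailing window of fixed size; the function name, the window of 3, and the weekly defect counts are all hypothetical.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over `window` points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical weekly defect counts with one jump (14) in week 4.
weekly_defects = [5, 6, 5, 14, 6, 5, 7, 6]
smoothed = moving_average(weekly_defects, window=3)

print([round(v, 2) for v in smoothed])
```

A larger window smooths more aggressively but reacts more slowly to a real shift in the data.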
10
4. Correlation
  • Only addresses whether there is a relationship
  • Does not address cause and effect
  • Example
    • size of the module may correlate to the number
      of defects
    • but size of the module may or may not be the
      cause
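One common way to quantify such a relationship is the Pearson correlation coefficient; the slides do not name a specific measure, so treat this choice as an assumption. The sketch reuses the (size, defects) pairs from the regression example later in this deck.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariation = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_spread = sum((x - x_mean) ** 2 for x in xs)
    y_spread = sum((y - y_mean) ** 2 for y in ys)
    return covariation / sqrt(x_spread * y_spread)


# (size, defects) pairs from the regression example in this deck.
sizes = [150, 230, 500, 730, 1000]
defects = [2, 3, 4, 7, 9]

r = pearson_r(sizes, defects)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 indicates a strong linear relationship; it still says nothing about whether module size causes the defects.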

11
Linear Relationship
Linear equation of the form $Y = a + bX$, where b is the slope and a is the y-intercept
[Figure: straight line plotted with X on the horizontal axis and Y on the vertical axis]
12
Least Square Linear Regression
  • A method of estimating the linear relationship
    between the Y variable and the X variable by
    minimizing the sum of the squared vertical
    distances of the observed Y values from the line
    $Y = a + bX$.
  • We can estimate the parameters a, b as follows
  • $b = [\sum XY - (1/n)(\sum X)(\sum Y)] / [\sum X^2 - (1/n)(\sum X)^2]$
  • this b estimate gives the same value as the one
    shown in the book
  • $a = Y_{ave} - b X_{ave}$
  • where X is each of the X observations, $X_{ave}$ is
    the average of the Xs, and $Y_{ave}$ is the average
    of the Ys
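The slide does not show the book's version of the slope formula. Assuming it is the usual deviation form, the short derivation below shows why the two forms agree; it uses $n X_{ave} = \sum X$ and $n Y_{ave} = \sum Y$.

```latex
\begin{align*}
\sum (X - X_{ave})(Y - Y_{ave})
  &= \sum XY - X_{ave}\sum Y - Y_{ave}\sum X + n X_{ave} Y_{ave} \\
  &= \sum XY - \tfrac{1}{n}\Big(\sum X\Big)\Big(\sum Y\Big), \\
\sum (X - X_{ave})^2
  &= \sum X^2 - 2 X_{ave}\sum X + n X_{ave}^2 \\
  &= \sum X^2 - \tfrac{1}{n}\Big(\sum X\Big)^2, \\
b &= \frac{\sum (X - X_{ave})(Y - Y_{ave})}{\sum (X - X_{ave})^2}
   = \frac{\sum XY - \tfrac{1}{n}(\sum X)(\sum Y)}{\sum X^2 - \tfrac{1}{n}(\sum X)^2}.
\end{align*}
```

The sums-based form is convenient for hand calculation because it needs only four running totals.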

13
Least Square Linear Regression - Example
  • (size, defects): (150,2) (230,3) (500,4) (730,7)
    (1000,9)
  • Xs: 150, 230, 500, 730, 1000; $\sum X = 2610$
  • $X^2$: 22,500; 52,900; 250,000; 532,900; 1,000,000;
    $\sum X^2 = 1{,}858{,}300$
  • Ys: 2, 3, 4, 7, 9; $\sum Y = 25$
  • XY: 300, 690, 2000, 5110, 9000; $\sum XY = 17{,}100$
  • $b = [17100 - (1/5)(2610)(25)] / [1858300 - (1/5)(2610)^2]$
    $= 4050/495880 \approx .0081$
  • $a = 25/5 - (.0081)(2610/5) = 5 - 4.23 = .77$
  • Least Square Regression line is $Y = .77 + .0081X$

Let's plug in x = 150 and see what we get:
.0081(150) + .77 = 1.22 + .77 = 1.99 (close to the observed value of 2!).
More accurate for interpolation than extrapolation.
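The hand calculation above can be checked with a short script. This sketch applies the slide's sums-based formulas to the slide's data; the function name is an illustrative choice. Because it keeps full precision for b instead of the slide's four-decimal value, the intercept comes out near 0.74 rather than .77, and the prediction at x = 150 near 1.96 rather than 1.99.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * (sum_x / n)
    return a, b


# (size, defects) data from the slide.
sizes = [150, 230, 500, 730, 1000]
defects = [2, 3, 4, 7, 9]

a, b = least_squares_line(sizes, defects)
predicted_at_150 = a + b * 150

print(f"a = {a:.4f}, b = {b:.6f}")
print(f"predicted defects at size 150: {predicted_at_150:.2f}")
```

Either way the fitted line lands close to the observed value of 2 defects at size 150.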
14
5. Normalization
  • Pure data gives a 1-dimensional comparison
    • program A: 52 person days to complete
    • program B: 33 person days to complete
    • program C: 64 person days to complete
  • 64 > 52 > 33; what else can we say? (suspect
    different sizes of programs)
  • Normalization gives an equalizing factor in terms
    of another attribute.
    • program A: 52 person days for 5000 loc, or
      96.1 loc / person day
    • program B: 33 person days for 3000 loc, or
      90.9 loc / person day
    • program C: 64 person days for 6000 loc, or
      93.7 loc / person day
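The normalization above is a single division per program. A minimal sketch, using the effort and size figures from the slide (the dictionary layout and names are illustrative):

```python
# (person days, lines of code) per program, from the slide.
programs = {
    "A": (52, 5000),
    "B": (33, 3000),
    "C": (64, 6000),
}

# Normalize effort by size: lines of code produced per person day.
productivity = {
    name: loc / person_days
    for name, (person_days, loc) in programs.items()
}

# Rank programs from most to least productive.
ranked = sorted(productivity, key=productivity.get, reverse=True)

for name in ranked:
    print(f"program {name}: {productivity[name]:.2f} loc / person day")
```

After normalization the ordering changes: program B took the fewest person days but has the lowest loc per person day.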