Summarizing Performance Data - PowerPoint PPT Presentation

Provided by: jeanyves4
Transcript and Presenter's Notes

1
Summarizing Performance Data
  • Important
  • Easy to Difficult
  • Warning: some mathematical content

2
1 Summarizing Performance Data
  • How do you quantify
  • Central value
  • Dispersion (Variability)

[Figure: old and new data sets]
3
Histogram is one answer, but is not summarized enough
[Figure: histograms of old and new]
4
ECDFs allow easy comparison
[Figure: ECDFs of new and old]
5
Summarized Measures
  • Median, Quantiles
  • Median
  • Quartiles
  • P-quantiles
  • Mean and standard deviation
  • Mean
  • Standard deviation
  • What is the interpretation of standard deviation?
  • A: if data is normally distributed, then with 95%
    probability a new data sample lies in the
    interval μ ± 1.96 σ (mean ± 1.96 standard deviations)
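
The summarized measures listed above can be sketched in a few lines. This is only an illustration: the use of Python's standard `statistics` module and the helper name `summarize` are assumptions, since the slides do not name any tool.

```python
import statistics

def summarize(xs):
    """Return central value and dispersion summaries of a data set."""
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)                      # sample standard deviation (n - 1)
    q1, q2, q3 = statistics.quantiles(xs, n=4)    # quartiles; q2 estimates the median
    return {
        "mean": m,
        "std": s,
        "median": statistics.median(xs),
        "quartiles": (q1, q2, q3),
        # Interpretable as a 95% range only if the data is close to normal.
        "normal_95_interval": (m - 1.96 * s, m + 1.96 * s),
    }
```

The last entry is the interval from the slide; for data that is far from normal it should not be read as a 95% range.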

6
Coefficient of Variation
  • Scale free
  • Summarizes variability
  • Second order
  • For a data set with n samples: CoV = s / m
    (standard deviation divided by mean)
  • Exponential distribution: CoV = 1
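
A minimal sketch of the coefficient of variation for a data set, assuming the usual definition (standard deviation divided by mean) and the 1/n form of the standard deviation. The function name is hypothetical.

```python
import statistics

def coefficient_of_variation(xs):
    """CoV = s / m, with s the 1/n standard deviation and m the mean."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)   # population form (divides by n)
    return s / m

print(coefficient_of_variation([1, 3]))     # 0.5
print(coefficient_of_variation([2, 2, 2]))  # 0.0
```

Because numerator and denominator carry the same unit, the result does not change when the data is rescaled, which is what "scale free" means here.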

7
Lorenz Curve Gap
  • Alternative to CoV
  • For a data set with n samples: gap = MAD / (2 m),
    with MAD the mean absolute deviation and m the mean
  • Scale free, index of unfairness
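
A short sketch of the Lorenz curve gap, assuming the common definition gap = MAD / (2 m), where MAD is the mean absolute deviation from the mean m. The formula itself is not legible in this transcript, so treat it as an assumption; the function name is hypothetical.

```python
def lorenz_curve_gap(xs):
    """Lorenz curve gap, assumed here to be MAD / (2 * mean)."""
    n = len(xs)
    m = sum(xs) / n
    mad = sum(abs(x - m) for x in xs) / n   # mean absolute deviation
    return mad / (2 * m)

print(lorenz_curve_gap([1, 1]))        # 0.0  (perfect equality)
print(lorenz_curve_gap([0, 0, 0, 4]))  # 0.75 (one item holds everything)
```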

8
Jain's Fairness Index
  • Quantifies fairness of x
  • Ranges from 1/n to 1
  • 1: all xi equal
  • 1/n: maximum unfairness
  • Fairness and variability are two sides of the
    same coin
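
As an illustration of the "two sides of the same coin" remark, the sketch below computes Jain's fairness index with its standard formula, (Σ xi)² / (n Σ xi²), and checks that it equals 1 / (1 + CoV²) when CoV uses the 1/n standard deviation. Function names are hypothetical.

```python
import statistics

def jain_fairness_index(xs):
    """JFI = (sum x)^2 / (n * sum x^2); between 1/n and 1."""
    n = len(xs)
    return sum(xs) ** 2 / (n * sum(x * x for x in xs))

def jfi_from_cov(xs):
    """Same index, derived from the coefficient of variation."""
    cov = statistics.pstdev(xs) / statistics.fmean(xs)
    return 1 / (1 + cov ** 2)

print(jain_fairness_index([1, 1, 1, 1]))  # 1.0  (all equal)
print(jain_fairness_index([1, 0, 0, 0]))  # 0.25 (= 1/n, maximum unfairness)
```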

9
Lorenz Curve
Perfect equality (fairness)
Lorenz Curve gap
  • Old code vs. new code: is JFI larger? Is the gap larger?
  • Gini's index is also used. Definition: 2 × area
    between the diagonal and the Lorenz curve
  • More or less equivalent to Lorenz curve gap
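
The definition of Gini's index as twice the area between the diagonal and the Lorenz curve can be sketched directly. The piecewise-linear Lorenz curve through the sorted cumulative shares is an assumption of this sketch, and the function name is hypothetical.

```python
def gini_index(xs):
    """2 x area between the diagonal and the empirical Lorenz curve."""
    xs = sorted(xs)
    n = len(xs)
    total = sum(xs)
    cum = 0.0    # cumulative share of the total held by the smallest items
    area = 0.0   # area under the Lorenz curve (trapezoid rule)
    for x in xs:
        prev = cum
        cum += x / total
        area += (prev + cum) / (2 * n)
    return 2 * (0.5 - area)

print(gini_index([0, 1]))  # 0.5
```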

10
Which Summarization Should One Use ?
  • There are (too) many synthetic indices to choose
    from
  • Traditional measures in engineering are standard
    deviation, mean and CoV
  • In Computer Science, JFI and mean
  • JFI is equivalent to CoV
  • In economics, gap and Gini's index (a variant of
    Lorenz curve gap)
  • Statisticians like medians and quantiles (robust
    to statistical assumptions)
  • We will come back to the issue after discussing
    confidence intervals

11
2 Confidence Interval
  • Do not confuse with prediction interval
  • Quantifies uncertainty about an estimation

12
mean and standard deviation
quantiles
  • central value
  • accuracy of central value
  • dispersion

13
Confidence Intervals for Mean of Difference
  • Mean reduction 0 is outside the confidence
    intervals for mean and for median
  • Confidence interval for median

14
Computing Confidence Intervals
  • This is simple if we can assume that the data
    comes from an iid model (Independent, Identically
    Distributed)
  • How do I know if this is true?
  • Controlled experiments: draw factors randomly
    with replacement
  • Simulation: independent replications (with random
    seeds)
  • Else: we do not know; in some cases we will have
    methods for time series

15
What does independence mean ?
16
CI for median
  • Is the simplest of all
  • Robust: always true provided the iid assumption holds

18
Confidence Interval for Median
  • n = 31
  • n = 32
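
A hedged sketch of how a confidence interval for the median can be obtained from the Binomial distribution under the iid assumption. It looks for the narrowest symmetric pair of order statistics (x(j), x(k)), with k = n + 1 - j, whose coverage P(j ≤ B ≤ k - 1) for B ~ Binomial(n, 1/2) reaches the requested level. The symmetric search and the function names are assumptions of this sketch, not taken from the slides.

```python
from math import comb

def median_ci_ranks(n, level=0.95):
    """Return (j, k, coverage) with 1-based ranks, or None if n is too small."""
    total = 2 ** n
    best = None
    for j in range(1, n // 2 + 1):
        k = n + 1 - j
        coverage = sum(comb(n, i) for i in range(j, k)) / total
        if coverage < level:
            break               # coverage only shrinks as j grows
        best = (j, k, coverage)
    return best

def median_ci(data, level=0.95):
    """Confidence interval for the median as a pair of order statistics."""
    ranks = median_ci_ranks(len(data), level)
    if ranks is None:
        return None
    j, k, _ = ranks
    xs = sorted(data)
    return xs[j - 1], xs[k - 1]

print(median_ci_ranks(31)[:2])  # (10, 22)
```

For very small samples (n ≤ 5 at level 95%) no such interval exists, and the sketch returns `None`.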

19
CI for mean and std dev
  • Most commonly used method
  • But requires some assumptions to hold, may be
    misleading if they do not hold
  • There is no exact theorem as for median and
    quantiles, but there are asymptotic results and a
    heuristic.

20
Normal Case
  • Assume data comes from an iid normal
    distribution. Useful for very small data samples
    (n < 30)

21
Tables in Weber-Tables
22
Example
  • n = 100, 95% confidence level: CI for mean, CI
    for standard deviation
  • amplitude of CI decreases in 1/√n; compare to
    prediction interval

23
Standard Deviation n or n-1 ?
24
CI for mean, asymptotic case
  • If data is not normal but the central limit
    theorem holds (in practice: n is large and
    distribution is not wild)

25
Example
  • CI for mean: same as before except s instead
    of σ, and 1.96 for all n instead of 1.98 for n = 100
  • In practice both (normal case and large n
    asymptotic) are the same if n > 30
  • But large n asymptotic does not require normal
    assumption
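
A minimal sketch of the large-n asymptotic confidence interval for the mean, m ± η s / √n, where η is the standard normal quantile (about 1.96 at level 95%). The use of the 1/n standard deviation and the function name are assumptions of this sketch.

```python
import math
import statistics

def mean_ci_asymptotic(xs, level=0.95):
    """Asymptotic CI for the mean; assumes iid data, large n, no heavy tail."""
    n = len(xs)
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)                               # 1/n form
    eta = statistics.NormalDist().inv_cdf((1 + level) / 2)  # ~1.96 for 95%
    half_width = eta * s / math.sqrt(n)
    return m - half_width, m + half_width
```

The half-width shrinks like 1/√n, so quadrupling the number of samples roughly halves the interval.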

26
Bootstrap Percentile Method
  • A heuristic that is robust (requires only iid
    assumption)
  • But be careful with heavy tail, see next
  • but tends to underestimate CI
  • Applies to mean and any other statistic
  • Idea use the empirical distribution in place of
    the theoretical (unknown) distribution
  • For example, with confidence level 95%:
  • the data set is S
  • Do r = 1 to r = 999
  • (replay experiment) Draw n bootstrap replicates
    with replacement from S
  • Compute sample mean Tr
  • Bootstrap percentile estimate is (T(25), T(975)),
    the 25th and 975th smallest of the 999 values Tr
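
The procedure above can be sketched as follows. The function name, the `seed` parameter and the generalization of the ranks 25 and 975 to other levels are assumptions added for illustration.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.fmean, r=999,
                            level=0.95, seed=None):
    """Bootstrap percentile interval for `stat`; assumes iid data."""
    rng = random.Random(seed)
    n = len(data)
    # Replay the experiment r times: draw n values with replacement from
    # the data set and recompute the statistic each time.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(r))
    lo_rank = round((r + 1) * (1 - level) / 2)   # 25 for r = 999, level 95%
    hi_rank = round((r + 1) * (1 + level) / 2)   # 975 for r = 999, level 95%
    return reps[lo_rank - 1], reps[hi_rank - 1]
```

Passing `stat=statistics.median` gives the same kind of interval for another statistic, in line with the remark that the method is not limited to the mean.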

27
Example Compiler Options
  • Does data look normal ?
  • No
  • Methods 2.3.1 and 2.3.2 give same result (n > 30)
  • Method 2.3.3 (Bootstrap) gives same result
  • => Asymptotic assumption valid

28
Other Example File Transfer Times
  • Normal assumption and Bootstrap do not coincide
    for data
  • Symptom that the asymptotic assumption may not
    hold; the normal assumption does not hold
  • This is an example of a wild distribution
  • They coincide for log of data

29
Take Home Message
  • Confidence interval for median (or other
    quantiles) is easy to get from the Binomial
    distribution
  • Requires iid
  • No other assumption
  • Confidence interval for the mean
  • Requires iid
  • And
  • Either the data sample is normal and n is small
  • Or the data sample is not wild and n is large enough
  • The bootstrap is more robust but more complicated
    to use
  • To apply the Student or normal statistic, we need to
    verify the assumptions

30
QQplot is a common tool for verifying assumptions
  • Normal QQplot
  • X-axis: standard normal quantiles
  • Y-axis: order statistics of the sample
  • If data comes from a normal distribution, qqplot
    is close to a straight line (except for end
    points)
  • Visual inspection is often enough
  • If not possible or doubtful, we will use tests
    later
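
A small sketch of how the points of a normal QQplot can be computed. The plotting positions (i - 0.5) / n are one common convention and are an assumption here, as is the function name.

```python
import statistics

def normal_qq_points(data):
    """Return (standard normal quantile, ordered sample value) pairs."""
    n = len(data)
    nd = statistics.NormalDist()   # standard normal: mean 0, std dev 1
    return [(nd.inv_cdf((i + 0.5) / n), x)
            for i, x in enumerate(sorted(data))]
```

Plotting these pairs (for example as a scatter plot) gives the QQplot; points that stay close to a straight line, apart from the end points, are consistent with normal data.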

31
QQPlots of Compiler Options
  • Both data sets do not look normal

32
Verifying Assumption
  • If data set looks normal (by inspection of
    qqplot): OK
  • Else, do the test of the asymptotic regime
  • Compute bootstrap replicates of the estimator of
    the mean
  • If the asymptotic regime holds, they should look
    normal

33
QQplots of Compiler Option Bootstrap Replicates
  • Asymptotic CI is valid

34
QQplots of File Transfer Times Bootstrap Replicates
  • Do not appear to be normal

35
Prediction Interval
  • CI for mean or median summarizes
  • Central value + uncertainty about it
  • Prediction interval summarizes variability of
    data

36
Prediction Interval based on Order Statistic
  • Assume data comes from an iid model
  • Simplest result (not well known, though)

37
Prediction Interval for small n
  • For n = 39, [xmin, xmax] is a prediction interval
    at level 95%
  • For n < 39 there is no prediction interval at
    level 95% with this method
  • But there is one at level 90% for n > 18
  • For n = 10 we have a prediction interval [xmin,
    xmax] at level 81%
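
The numbers above are consistent with the order-statistic result that, for iid data, [xmin, xmax] is a prediction interval at level (n - 1) / (n + 1). A minimal sketch under that assumption (function names are hypothetical):

```python
def minmax_prediction_level(n):
    """Level of the prediction interval [xmin, xmax] built from n iid samples."""
    return (n - 1) / (n + 1)

def smallest_n_for_level(level):
    """Smallest n for which [xmin, xmax] reaches the requested level."""
    n = 2
    while minmax_prediction_level(n) < level:
        n += 1
    return n

print(smallest_n_for_level(0.95))  # 39
print(smallest_n_for_level(0.90))  # 19
```

For n = 10 the level is 9/11, about 0.818, which matches the 81% quoted on the slide.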

38
Prediction Interval for Small n and Normal Data Sample
39
Re-Scaling
  • Many results are simple if the data is normal, or
    close to it (i.e. not wild). An important
    question to ask is can I change the scale of my
    data to have it look more normal.
  • Ex: log of the data instead of the data
  • A generic transformation used in statistics is
    the Box-Cox transformation
  • Continuous in s; s = 0: log; s = -1: 1/x; s = 1:
    identity
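
A sketch of the Box-Cox transformation in its usual form, b_s(x) = (x^s - 1) / s for s ≠ 0 and b_0(x) = ln x. The formula is not legible in this transcript, so the standard definition is assumed; with it, s = 1 and s = -1 give the identity and 1/x only up to a shift and sign (x - 1 and 1 - 1/x).

```python
import math

def box_cox(x, s):
    """Box-Cox transform of a positive value x with parameter s."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if s == 0:
        return math.log(x)
    return (x ** s - 1) / s

print(box_cox(3, 1))    # 2.0  (x - 1)
print(box_cox(4, -1))   # 0.75 (1 - 1/x)
```

The division by s is what makes the family continuous in s: as s tends to 0, (x^s - 1) / s tends to ln x.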

40
Prediction Intervals for File Transfer Times
[Figure: prediction intervals based on mean and standard deviation; on order statistic; on mean and standard deviation of rescaled data]
41
Take Home Message
  • The interpretation of σ as a measure of
    variability is meaningful if the data is normal
    (or close to normal). Else, it is misleading. The
    data is best re-scaled.

42
Non-standard Means
  • Geometric, etc. means were invented for cases
    where the data does not look normal; they
    correspond to re-scaling
  • Compare to prediction interval: the exponential
    of a prediction interval for the log of the data
    is a prediction interval for the data
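
To illustrate re-scaling, the sketch below computes the geometric mean as the exponential of the mean of the logs, and a prediction interval obtained on the log scale and mapped back with the exponential. The normal-quantile interval m ± η s on the logs is a simplifying assumption (no small-sample correction), and the function names are hypothetical.

```python
import math
import statistics

def geometric_mean(xs):
    """exp of the arithmetic mean of log(x); requires positive data."""
    return math.exp(statistics.fmean([math.log(x) for x in xs]))

def log_scale_prediction_interval(xs, level=0.95):
    """Prediction interval computed on log(data), then mapped back by exp."""
    logs = [math.log(x) for x in xs]
    m = statistics.fmean(logs)
    s = statistics.stdev(logs)
    eta = statistics.NormalDist().inv_cdf((1 + level) / 2)
    return math.exp(m - eta * s), math.exp(m + eta * s)
```

The interval is always positive and is asymmetric around the geometric mean, which suits skewed data such as transfer times.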

43
Which Summarization Should I Use ?
  • Two issues
  • Robustness to outliers
  • Compactness

44
Outlier in File Transfer Time
45
Robustness of Conf/Prediction Intervals
[Figure: confidence and prediction intervals, outlier removed vs. outlier present. Panels: based on mean and std dev; order statistic; based on mean and std dev with re-scaling; CI for median; geometric mean]
46
Fairness Indices
  • Confidence Intervals obtained by Bootstrap
  • How ?
  • JFI is very dependent on one outlier
  • As expected, since JFI is essentially CoV, i.e.
    standard deviation
  • Gap is sensitive, but less
  • Does not use squaring; why?

47
Compactness
  • If normal assumption (or, for CI, asymptotic
    regime) holds, μ and σ are more compact
  • two values give both CIs at all levels and
    prediction intervals
  • Derived indices: CoV, JFI
  • In contrast, CI for median does not give
    information on variability
  • Prediction interval based on order statistic is
    robust (and, IMHO, best)

48
Take-Home Message
  • Use methods that you understand
  • Mean and standard deviation make sense when data
    sets are not wild
  • Close to normal, or not heavy tailed and large
    data sample
  • Use quantiles and order statistics if you have
    the choice
  • Rescale

49
2.10 Intersection of Intervals
  • We have several methods to find CIs; it is
    tempting to take intersections.
  • It does not work well

50
A Statistical Curiosity
51
The Meaning of Confidence
52
Exercises