PERFORMANCE ANALYSIS ECE6101 - PowerPoint PPT Presentation


1
PERFORMANCE ANALYSIS ECE-6101
  • Lecturer: Dr. Farhat Anwar
  • Web: http://eng.iiu.edu.my/farhat/download/ECE6101

2
  • Topics include
  • The ratio game
  • Mean/ Median/ Mode
  • Clustering
  • Ch-11 Raj Jain
  • Ch-12 Raj Jain

3
The Ratio Game
  • If you can't convince them, confuse them.
  • Truman's Law
  • Based on Mathematical Fact that...
  • ... is not equal

4
Graphical Version with
  • Percentages are basically ratios. They allow
    playing ratio games in ways that do not look like
    ratios.
  • Example: Two experiments were repeatedly
    conducted on two systems. Each experiment either
    passed or failed (the system either met the
    specified performance goal or did not). The
    results are tabulated in the first two rows of
    Table 11.8.

5
Graphical Version with
  • One alternative to compare the two systems is to
    take each experiment individually, as shown in
    Fig. 11.1a. The conclusion from this figure is
    that System B is better than System A in both
    experiments.
  • Another alternative is to add the results of the
    two experiments, as shown in the last row of
    Table 11.8 and plotted in Figure 11.1b. The
    conclusion in this case is that System A is
    better than System B.

6
Graphical Version with

7
Graphical Version with
  • Actually both alternatives have the problem of
    incomparable bases.
  • In alternative 1, the base is the total number of
    times the experiment is repeated on a system,
    which is different for the two systems.
  • In alternative 2, the base is the sum of
    repetitions of the two experiments together,
    which is also different for the two systems.
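
The effect described on slides 5 to 7 can be reproduced in a few lines of Python. This is only a sketch: Table 11.8 is not part of this transcript, so the pass/total counts below are hypothetical and were chosen so that the two ways of comparing disagree.

```python
# Hypothetical (passes, total runs) per experiment; NOT the values of Table 11.8.
results = {
    "A": {"exp1": (80, 100), "exp2": (2, 10)},
    "B": {"exp1": (9, 10), "exp2": (30, 100)},
}


def pass_rate(passes, total):
    """Percentage of runs that met the performance goal."""
    return 100.0 * passes / total


# Alternative 1: compare each experiment separately (base = runs per experiment).
per_experiment = {
    system: {exp: pass_rate(*counts) for exp, counts in exps.items()}
    for system, exps in results.items()
}

# Alternative 2: add the counts of both experiments, then take the percentage.
combined = {
    system: pass_rate(
        sum(passes for passes, _ in exps.values()),
        sum(total for _, total in exps.values()),
    )
    for system, exps in results.items()
}

print(per_experiment)  # B has the higher pass rate in each experiment
print(combined)        # A has the higher pass rate after adding the counts
```

With these made-up counts the bases differ in both alternatives, which is the problem slide 7 points out.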

8
The Ratio Game
  • Ratios have a numerator and a denominator.
  • The denominator is also called the base.
  • Two ratios with different bases are not
    comparable.
  • Definition:
  • The technique of using ratios with incomparable
    bases and combining them to one's advantage is
    called the ratio game.

9
Consequences/Strategies
  • If one system is better on all benchmarks: no
    contradictions.
  • Even if one system is better on all benchmarks,
    the ratio game can produce a better relative
    number by selecting the appropriate base.
  • If one system is better in some cases and worse
    in other cases, contradictory conclusions can
    sometimes be drawn.
  • If the metric is LB (lower is better), then use
    your favorite system as the base.
  • If the metric is HB (higher is better), then use
    your opponent's system as the base.
  • Benchmarks that perform better should be
    elongated, those that perform worse should be
    shortened.
  • Remember: Taking an average of ratios is not a
    correct way to analyze data.
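
A minimal sketch of the base-selection strategy, assuming a higher-is-better (HB) metric. The throughput numbers are hypothetical and are not taken from the slides or the book.

```python
# Hypothetical throughput (higher is better) of two systems on two benchmarks.
perf = {"A": [20.0, 10.0], "B": [10.0, 20.0]}


def mean(values):
    return sum(values) / len(values)


def normalized_means(perf, base):
    """Average of the per-benchmark ratios, each taken relative to `base`."""
    return {
        system: mean([v / b for v, b in zip(values, perf[base])])
        for system, values in perf.items()
    }


raw = {system: mean(values) for system, values in perf.items()}
base_a = normalized_means(perf, "A")  # System A as the base
base_b = normalized_means(perf, "B")  # System B as the base

print(raw)     # both systems have the same raw average
print(base_a)  # B appears better when A is the base
print(base_b)  # A appears better when B is the base
```

With an HB metric, each system looks better when the opponent's system is the base, even though the raw averages are identical.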

10
A mathematical Analysis.
  • Using raw data, System A is better if and only
    if
  • Using System A as a base, System A is better if
    and only if
  • Using System B as a base, System A is better if
    and only if

11
A mathematical Analysis.
  • Both axes in the figure show the relative
    performance of System B with system A as a base.

12
A mathematical Analysis.
  • Using raw data, System A is better if and only if
  • (a + b)/2 > (ax + by)/2, or y < -(a/b)x + (a + b)/b
  • System A is better than System B below the line
    y = -(a/b)x + (a + b)/b and worse above the line.
  • Using System A as the base, System A will be
    considered better if and only if
  • (x + y)/2 < 1, or y < 2 - x
  • The region below the line y = 2 - x represents the
    subspace in which System A will be considered
    better.
  • If the measurements fall in the additional area
    (shaded area), System A will be considered worse
    than System B using raw data but will be
    considered better using System A as the base.
  • This is one possible region where ratio games
    will lead to contradictory conclusions.

13
A mathematical Analysis.
  • If System B is used as the base, System A will be
    considered better if and only if
  • (1/2)(1/x + 1/y) > 1, or y < x/(2x - 1)
  • This equation represents a hyperbola.
  • In the region below this hyperbola, System A
    will be considered better.
  • This region covers a bigger area for System A.
  • Once again, the region above the lines
    corresponding to
  • y = -(a/b)x + (a + b)/b and y = 2 - x and below
    this hyperbola represents the subspace in which
    contradictory results will be reached by using
    different bases and raw results.
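
The three conditions from slides 12 and 13 can be checked side by side. The sketch below reads the formulas as follows, which is an interpretation rather than something the transcript states outright: a and b are System A's scores on the two benchmarks (higher is better), and x and y are System B's scores relative to System A, so that System B scores ax and by. The sample values are hypothetical.

```python
def a_is_better(a, b, x, y):
    """Return the verdict 'System A is better' under each of the three views.

    a, b: System A on benchmarks 1 and 2 (higher is better, assumed).
    x, y: System B relative to System A, i.e. B scores a*x and b*y.
    """
    return {
        "raw": (a + b) / 2 > (a * x + b * y) / 2,
        "base_A": (x + y) / 2 < 1,
        "base_B": 0.5 * (1 / x + 1 / y) > 1,
    }


# Hypothetical point: A scores 20 and 10, B scores 10 and 20.
verdicts = a_is_better(a=20.0, b=10.0, x=0.5, y=2.0)
contradictory = len(set(verdicts.values())) > 1

print(verdicts)
print(contradictory)
```

A point for which the three verdicts are not all equal lies in one of the regions where the ratio game yields contradictory conclusions.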

14
What is wrong in those games?
  • Can't take the mean value of ratios!
  • How can we do a good analysis?
  • Do your homework in statistics...
  • Rest of this lecture
  • Index - where to look it up
  • Walk through English terminology
  • Recipes
  • No proofs - no derivations

15
Summarizing Measured Data
  • Independent Events
  • Two events are called independent if the
    occurrence of one event does not in any way
    affect the probability of the other event.
  • Random Variable
  • A variable is called a random variable if it
    takes one of a specified set of values with a
    specified probability.
  • Cumulative Distribution Function (CDF)
  • The Cumulative Distribution Function (CDF) of a
    random variable maps a given value a to the
    probability of the variable taking a value less
    than or equal to a: F(a) = P(x <= a).

16
Summarizing Measured Data
  • Probability Density Function (pdf)
  • The derivative f(x) = dF(x)/dx of the CDF F(x) is
    called the probability density function (pdf) of
    x.
  • Given a pdf f(x), the probability of x being in
    the interval (x1, x2) can also be computed by
    integration of f(x) from x1 to x2.

17
Summarizing Measured Data
  • Probability Mass Function (pmf)
  • For a discrete random variable, the CDF is not
    continuous and, therefore, not differentiable. In
    such cases, the probability mass function (pmf)
    is used in place of the pdf.
  • Consider a discrete random variable x that can
    take n distinct values x1, x2, ..., xn with
    probabilities p1, p2, ..., pn such that the
    probability of the ith value xi is pi. The pmf
    maps xi to pi.
  • The probability of x being in the interval
    (x1, x2) can also be computed by summation.
18
Summarizing Measured Data
  • Probability Mass Function (pmf)
  • Mean or Expected Value

19
Summarizing Measured Data
  • Variance
  • Standard Deviation
  • Coefficient of Variation
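
The formulas for these quantities are not in the transcript. The sketch below uses the standard definitions for a discrete random variable (mean as the probability-weighted sum, variance as the expected squared deviation, coefficient of variation as standard deviation divided by mean) on a hypothetical pmf.

```python
import math

# Hypothetical pmf: value -> probability.
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}

# Mean (expected value): probability-weighted sum of the values.
mean = sum(p * x for x, p in pmf.items())

# Variance: expected squared deviation from the mean.
variance = sum(p * (x - mean) ** 2 for x, p in pmf.items())

# Standard deviation and coefficient of variation (C.O.V.).
std_dev = math.sqrt(variance)
coeff_of_variation = std_dev / mean

print(mean, variance, std_dev, coeff_of_variation)
```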

20
Summarizing Measured Data
  • Covariance
  • Covariance Symbols
  • Correlation Coefficient
  • Mean and Variance of Sums
  • see Formula in Book
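
As a rough illustration, the sketch below computes the sample covariance and correlation coefficient of two hypothetical measurement series and checks the standard identity Var(x + y) = Var(x) + Var(y) + 2 Cov(x, y). The data and function names are illustrative only.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    """Sample covariance (divides by n - 1); sample_cov(xs, xs) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Correlation coefficient: covariance normalized by both std. deviations."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


# Hypothetical measurements for five programs.
cpu_time = [2.0, 3.0, 1.0, 4.0, 5.0]
io_count = [4.0, 5.0, 6.0, 3.0, 2.0]
sums = [x + y for x, y in zip(cpu_time, io_count)]

var_of_sum = sample_cov(sums, sums)
var_from_parts = (
    sample_cov(cpu_time, cpu_time)
    + sample_cov(io_count, io_count)
    + 2 * sample_cov(cpu_time, io_count)
)

print(sample_cov(cpu_time, io_count), correlation(cpu_time, io_count))
print(var_of_sum, var_from_parts)
```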

21
  • Quantile/Percentile
  • Median
  • The 50th percentile or 0.5 quantile
  • Mode
  • The most likely value, that is, the xi that has
    the highest probability pi, or the x at which
    the pdf is maximal.

22
  • Normal Distribution N(µ, σ)
  • Standard Normal Distribution N(0,1)
  • α-quantile of the Standard Normal Distribution
  • see tables at the end of the book
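
Instead of a table lookup, the quantiles of the standard normal distribution can be computed with Python's standard library (`statistics.NormalDist`, available since Python 3.8). The helper name `z_quantile` is made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def z_quantile(alpha):
    """alpha-quantile of N(0,1): the value z with P(Z <= z) = alpha."""
    return standard_normal.inv_cdf(alpha)


z_median = z_quantile(0.5)    # the median of N(0,1)
z_975 = z_quantile(0.975)     # commonly used for two-sided 95% intervals

print(z_median, z_975)
```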

23
Central Limit Theorem
  • The sum of a large number of independent
    observations from any distribution tends to have
    a normal distribution.
  • The sum of independent normal variates is also a
    normal variate.
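
A small simulation sketch of the theorem, assuming uniform(0,1) observations: the sum of 12 of them has mean 6 and standard deviation 1, and roughly 68% of the sums should fall within one standard deviation of the mean, as for a normal distribution. The seed and sample size are arbitrary choices.

```python
import random
import statistics

rng = random.Random(6101)  # fixed seed so the run is repeatable


def sum_of_uniforms(n_terms):
    """Sum of n_terms independent uniform(0,1) observations."""
    return sum(rng.random() for _ in range(n_terms))


sums = [sum_of_uniforms(12) for _ in range(20000)]

sample_mean = statistics.mean(sums)
sample_std = statistics.stdev(sums)

# Fraction of sums within one standard deviation (1.0) of the mean (6.0).
within_one_std = sum(abs(s - 6.0) <= 1.0 for s in sums) / len(sums)

print(sample_mean, sample_std, within_one_std)
```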

24
Summarizing Data by a Single Number
  • Averages (or indices of central tendency)
  • Sample mean
  • Sample median
  • Sample mode
  • Selecting among them

25
  • Examples
  • Mode: Most used resource in a system.
  • Mean: Interarrival time of packets.
  • Median: Load on a computer.
  • Median: Average configuration (number of devices,
    memory size...).
  • Abuses of arithmetic mean
  • Significantly different values
  • Skewed distribution
  • Multiplying means of dependent variables
  • and once more...
  • Taking means of ratios with different bases
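
To see how the three indices differ on skewed data, here is a minimal sketch using Python's `statistics` module. The sample values are hypothetical.

```python
import statistics

# Hypothetical response times; the single large value skews the distribution.
response_times = [10, 10, 15, 20, 50]

sample_mean = statistics.mean(response_times)      # pulled up by the outlier
sample_median = statistics.median(response_times)  # middle of the sorted values
sample_mode = statistics.mode(response_times)      # most frequent value

print(sample_mean, sample_median, sample_mode)
```

For a skewed sample like this one the mean is pulled toward the large value, which is why the median is often preferred in that case.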

26
Other means
  • Geometric mean
  • also works fine for ratios
  • equal to exp(arithmetic mean of log(xi)).
  • very commonly used in benchmarks.
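
A sketch of the geometric mean computed as the slide describes, exp of the arithmetic mean of the logs, applied to hypothetical benchmark ratios. Unlike the arithmetic mean of ratios, the result gives the same verdict whichever system is the base.

```python
import math


def geometric_mean(values):
    """exp(arithmetic mean of log(x_i)); requires all values > 0."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical throughput of two systems on two benchmarks.
perf = {"A": [20.0, 10.0], "B": [10.0, 20.0]}

b_over_a = [b / a for a, b in zip(perf["A"], perf["B"])]  # A as the base
a_over_b = [a / b for a, b in zip(perf["A"], perf["B"])]  # B as the base

gm_b_over_a = geometric_mean(b_over_a)
gm_a_over_b = geometric_mean(a_over_b)

print(gm_b_over_a, gm_a_over_b)  # reciprocal of each other
```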

27
  • Harmonic mean
  • works when the 1/x values are additive.
  • works well for rates
  • MIPS
  • MByte/s
  • MFlops
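
A sketch of why the harmonic mean suits rates, assuming each benchmark executes the same number of instructions. The MIPS figures and the instruction count are hypothetical.

```python
def harmonic_mean(rates):
    """n divided by the sum of the reciprocals of the rates."""
    return len(rates) / sum(1.0 / r for r in rates)


# Hypothetical MIPS ratings measured on two benchmarks.
mips = [10.0, 40.0]

# Assumption: each benchmark executes 100 million instructions.
millions_of_instructions = 100.0
total_time = sum(millions_of_instructions / r for r in mips)   # seconds
overall_mips = len(mips) * millions_of_instructions / total_time

print(harmonic_mean(mips))    # matches the overall rate
print(overall_mips)
print(sum(mips) / len(mips))  # the arithmetic mean overstates the rate
```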

28
Summarizing Variability
  • Variability is specified by the following measures
    or indices of dispersion:
  • Range (min...max)
  • Variance or standard deviation
  • 10- and 90-percentiles
  • Semi-interquartile range
  • Mean absolute deviation

29
Discussion of Variability Indices
  • Range
  • extremely unstable: one outlier and you are gone
  • Sample variance and sample std. dev.
  • only n-1 of the differences are independent, so
    the degrees of freedom are n-1.
  • Variance and sample std. dev. are absolute
    measures.
  • Variance is expressed in squared units.
  • C.O.V. is normalized by the mean and is easier to
    interpret.
  • A C.O.V. of 5 is bad,
  • a C.O.V. of 0.2 (or 20%) is good.

30
Specifying quantiles
  • Definition: How much of the distribution density
    is within a certain range.
  • The 5- to 95-percentile range is about equivalent
    to the range.
  • Deciles, quartiles...: quantiles at fixed
    increments of 0.1 or 0.25.
  • Semi-Interquartile Range is half the difference
    between Q3 and Q1: SIQR = (Q3 - Q1)/2.
  • Mean absolute deviation: sort of a std. dev., but
    with the absolute value instead of the square.
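
The indices of dispersion listed on slides 28 to 30 can be computed as follows. This is a sketch on hypothetical data; note that several quantile conventions exist, and the "inclusive" method of `statistics.quantiles` used here is only one of them and may differ from the book's definition.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample

data_range = max(data) - min(data)
mean = statistics.mean(data)
variance = statistics.variance(data)   # sample variance, divides by n - 1
std_dev = statistics.stdev(data)
coeff_of_variation = std_dev / mean

# Quartiles and deciles (cut points), using the "inclusive" convention.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
deciles = statistics.quantiles(data, n=10, method="inclusive")
p10, p90 = deciles[0], deciles[-1]

siqr = (q3 - q1) / 2                   # semi-interquartile range
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)

print(data_range, variance, std_dev, coeff_of_variation)
print(p10, p90, siqr, mean_abs_dev)
```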

31
  • Selecting correct index of dispersion

32
Determining the Distribution
  • Do a quantile-quantile plot.
  • Determine the quantiles of the suspected
    distribution
  • e.g. for N(0,1) approximation as follows
  • Plot them against the quantiles of your
    observed data.
  • See if the points can be fitted by a straight
    line.
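
The procedure above can be sketched in Python. Two things here are assumptions rather than the slide's method: the slide's approximation formula for the N(0,1) quantiles did not survive in the transcript, so the exact inverse CDF from the `statistics` module is used instead, and the plotting position q = (i - 0.5)/n for the i-th ordered value is one common convention. The error values are illustrative only.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pair each ordered observation with the matching N(0,1) quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, observed in enumerate(ordered, start=1):
        q = (i - 0.5) / n  # assumed plotting position of the i-th value
        points.append((standard_normal.inv_cdf(q), observed))
    return points


def correlation(points):
    """Linear correlation of the points; close to 1 means nearly straight."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Illustrative errors only; the slide's eight values are not in the transcript.
errors = [-0.04, -0.19, 0.14, -0.09, -0.14, 0.19, 0.04, 0.09]
points = normal_qq_points(errors)
r = correlation(points)

for normal_q, observed in points:
    print(f"{normal_q:7.3f}  {observed:6.2f}")
print(r)
```

Plotting the pairs (normal quantile, observed value) gives the quantile-quantile plot; the correlation is just a quick numeric stand-in for judging linearity by eye.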

33
  • Example: The modelling errors for 8 predictions
    of a model were found to be yi. Are the errors
    normally distributed?

34
  • Result: Yes, see graphic.
  • Figure: Normal quantile-quantile plot for the
    error data (axes: residual quantile and normal
    quantile)

35
  • some matches on Figure 12.6 in the book
  • Interpretation of normal quantile-quantile plots

36
Clustering
  • Take a sample (subset of workload components)
  • Select workload parameters
  • Transform parameters (e.g. to log)
  • Remove outliers
  • Scale data (e.g. to µ = 0, σ = 1)
  • Select a distance metric (e.g. Euclidean)

37
  • Run the clustering algorithm
  • Interpret results
  • Change parameters (e.g. number of clusters) and
    repeat from the transform step.
  • Select a representative of each cluster
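
Two of the preparation steps, scaling to mean 0 and standard deviation 1 and computing a Euclidean distance, can be sketched as follows. The workload values and the function names are hypothetical, and the sample standard deviation (n - 1) is an assumed choice.

```python
import math
import statistics

# Hypothetical workload: program -> (CPU time, number of I/Os).
workload = {
    "A": (2.0, 4.0),
    "B": (3.0, 5.0),
    "C": (1.0, 6.0),
    "D": (4.0, 3.0),
    "E": (5.0, 2.0),
}


def scale_columns(rows):
    """Scale every parameter to mean 0 and (sample) standard deviation 1."""
    columns = list(zip(*rows.values()))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    return {
        name: tuple((v - m) / s for v, m, s in zip(values, means, stds))
        for name, values in rows.items()
    }


def euclidean(p, q):
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


scaled = scale_columns(workload)
print(scaled["A"])
print(euclidean(scaled["A"], scaled["E"]))
```

Scaling keeps a parameter with large numeric values from dominating the distance.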

38
Example Simple Spanning Tree Method
  • Consider a workload with five components and two
    parameters. The CPU time and number of I/Os were
    measured for five programs.
  • For more sophisticated approaches, read an AI or
    IT paper (keyword: VC dimension).
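
The slides do not spell out the algorithm, so the following is a simplified sketch of one common reading of the method: start with every program as its own cluster, then repeatedly merge the two clusters whose centroids are closest. The five data points are hypothetical, and ties are broken by taking the first closest pair, which is a simplification.

```python
import math

# Hypothetical workload: program -> (CPU time, number of I/Os).
workload = {
    "A": (2.0, 4.0),
    "B": (3.0, 5.0),
    "C": (1.0, 6.0),
    "D": (4.0, 3.0),
    "E": (5.0, 2.0),
}


def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def merge_clusters(data, k):
    """Merge the two nearest clusters (by centroid) until k clusters remain."""
    clusters = [[name] for name in data]
    while len(clusters) > k:
        cents = [centroid([data[m] for m in members]) for members in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


print(merge_clusters(workload, 3))
print(merge_clusters(workload, 2))
```

Changing `k` corresponds to the "change parameters (e.g. number of clusters)" step of the procedure.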

39