STATISTICS for the Utterly Confused, 2nd ed. SLIDES PREPARED PowerPoint PPT Presentation


1
STATISTICS for the Utterly Confused, 2nd ed.
  • Slides prepared by Lloyd R. Jaisingh, Ph.D.
  • Morehead State University
  • Morehead, KY

2
Chapter 3
  • Data Description: Numerical Measures of
    Variability for Ungrouped Univariate Data

3
Outline
  • Do I Need to Read This Chapter?
  • 3-1 The Range
  • 3-2 The Interquartile Range
  • 3-3 The Mean Absolute Deviation
  • 3-4 The Variance and Standard Deviation
  • 3-5 The Coefficient of Variation
  • 3-6 The Empirical Rule
  • It's a Wrap

4
Objectives
  • Introduction of some basic statistical
    measurements of spread or variability.
  • How to compute these measures and investigate
    some of their properties.

5
Introduction
  • A measure of variability for a collection of data
    values is a number that is meant to convey the
    idea of spread for the data set.
  • The most commonly used measures of variability
    for sample data are the following:
    • range
    • interquartile range
    • mean absolute deviation
    • variance or standard deviation
    • coefficient of variation.

6
3-1 The Range
  • Explanation of the term range: The range is the
    difference between the largest and smallest
    values in the data set.
  • NOTE: This definition holds for a sample as
    well as for a finite population of values.

7
3-1 The Range
  • Example: What is the range for the following
    sample values?
  • 3 8 6 14 0 4 0 12 7 0 -10
  • Solution: First, arrange the values from smallest
    to largest.
  • -10 0 0 0 3 4 6 7 8 12 14
  • Range = 14 - (-10) = 24
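The same calculation can be sketched in a few lines of Python. This is an illustration rather than part of the original slides. The helper name `data_range` is hypothetical, and the values are the sample values exactly as listed in the example.

```python
def data_range(values):
    """Range = largest value minus smallest value (hypothetical helper)."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)

# Sample values as listed in the example above.
sample = [3, 8, 6, 14, 0, 4, 0, 12, 7, 0, -10]
print(sorted(sample))       # ordered from smallest to largest
print(data_range(sample))   # 14 - (-10) = 24
```

Only the two extreme values enter the result, so sorting is only needed for the hand calculation. `max` and `min` find the extremes directly.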

8
3-1 The Range
  • Question: Why is subtracting the smallest value
    from the largest value a measure of spread?
  • The next slide shows the plot of the data set.
  • Observe that the range measures the distance
    between the smallest and largest values.
  • This distance gives a measurement of the spread
    of the data.

9
3-1 The Range
Range gives a measurement of spread.
10
Quick Tip
  • The range does not use the concept of
    deviations.
  • It is affected by outliers (values that are very
    large or very small relative to the rest of the
    data set).
  • The range does not use all the information in
    the data set, only the largest and smallest
    values.
  • Thus it is not a very useful measure of spread or
    variation.

11
The Range -- Example
  • Example: What is the range for the following
    sample values?
  • 9 995 1000 1002 1014
  • Solution: Range = 1014 - 9 = 1005
  • Here the range is significantly affected by the
    outlying value of 9.
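To see how much the single value 9 drives the result, the following sketch recomputes the range with that value removed. The helper `data_range` is a hypothetical name used only for this illustration.

```python
def data_range(values):
    """Largest value minus smallest value (hypothetical helper)."""
    return max(values) - min(values)

values = [9, 995, 1000, 1002, 1014]
print(data_range(values))                           # 1014 - 9 = 1005

# Drop the outlying value 9 and recompute.
without_outlier = [v for v in values if v != 9]
print(data_range(without_outlier))                  # 1014 - 995 = 19
```

One extreme value changes the range from 19 to 1005, while the other four values barely differ from each other.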

12
3-2 The Interquartile Range
  • Explanation of the term interquartile range:
    The interquartile range measures the spread of
    the middle 50% of an ordered data set.
  • NOTE: The interquartile range is obtained using
    the following steps.
  • Step 1: Order the data set from smallest to
    largest.
  • Step 2: Find the median of the ordered set.
    Denote it by Q2.

13
3-2 The Interquartile Range
  • Step 3: Find the median of the first 50% of the
    ordered set. The median found in Step 2 is not
    included in this portion of the data. Denote it by
    Q1.
  • Step 4: Find the median of the second 50% of the
    ordered set. The median found in Step 2 is not
    included in this portion of the data. Denote it by
    Q3.

14
3-2 The Interquartile Range
  • Step 5: The interquartile range is computed as
    $$\text{IQR} = Q_3 - Q_1$$
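Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration, assuming the "median of halves" reading of the steps: when the number of values is odd, the middle value is left out of both halves. The function name `quartiles_median_of_halves` is hypothetical. The test values 3 8 6 14 0 11 are the six sample values used in Section 3-4.

```python
from statistics import median

def quartiles_median_of_halves(values):
    """Return (Q1, Q2, Q3, IQR) following Steps 1-5 (hypothetical helper)."""
    data = sorted(values)                 # Step 1: order the data
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    q2 = median(data)                     # Step 2: median of the whole set
    half = n // 2
    lower = data[:half]                   # Step 3: first half, middle value excluded if n is odd
    upper = data[half + n % 2:]           # Step 4: second half, middle value excluded if n is odd
    q1, q3 = median(lower), median(upper)
    return q1, q2, q3, q3 - q1            # Step 5: IQR = Q3 - Q1

print(quartiles_median_of_halves([3, 8, 6, 14, 0, 11]))   # (3, 7.0, 11, 8)
```

For an even number of values, the median is the average of two values and is not itself a data value, so the two halves are simply the first and last n/2 values.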

15
3-2 The Interquartile Range
  • The following depicts the idea of the
    interquartile range.

16
3-2 The Interquartile Range
  • Example: The following scores were reported for a
    10-point statistics quiz. What is the value
    of the interquartile range?
  • 7 8 9 6 8 0 9 9 9
  • 0 0 7 10 9 8 5 7 9

17
3-2 The Interquartile Range
  • Solution: Technology makes it easy to compute
    the interquartile range.
  • We will present output from the MINITAB
    software and the TI-83 calculator to help compute
    the interquartile range.

18
3-2 The Interquartile Range
  • MINITAB Solution: The following shows the
    descriptive statistics output.

Interquartile range = Q3 - Q1 = 9 - 5.75 = 3.25
19
3-2 The Interquartile Range
  • TI-83 Solution: The following shows the
    descriptive statistics output.

Interquartile range = Q3 - Q1 = 9 - 6 = 3
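NOTE: The answers differ slightly because the two tools use
different rules for quartiles. The TI-83 takes Q1 as the median
of the lower half of the ordered data (6), while MINITAB
interpolates between ordered values (5.75).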
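Both results can be reproduced in Python. This sketch rests on an assumption: that MINITAB's quartiles match the interpolation rule implemented by `statistics.quantiles(..., method="exclusive")`, which places the quartiles at positions (n + 1)/4 and 3(n + 1)/4 in the ordered data, and that the TI-83 uses the median-of-halves steps listed earlier in this section. The two rules agree with the outputs shown above for this data set, but this is not a statement about either product's documented algorithm.

```python
from statistics import median, quantiles

scores = [7, 8, 9, 6, 8, 0, 9, 9, 9,
          0, 0, 7, 10, 9, 8, 5, 7, 9]

# Interpolation rule: quartile positions (n + 1)p in the ordered data.
q1_interp, _, q3_interp = quantiles(scores, n=4, method="exclusive")

# Median-of-halves rule (Steps 1-5 of this section).
data = sorted(scores)
half = len(data) // 2
q1_halves = median(data[:half])
q3_halves = median(data[half + len(data) % 2:])

print(q1_interp, q3_interp, q3_interp - q1_interp)   # 5.75 9.0 3.25
print(q1_halves, q3_halves, q3_halves - q1_halves)   # 6 9 3
```

Neither answer is wrong. Different software packages simply choose different conventions for locating quartiles, and the differences shrink as the data set grows.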
20
3-3 The Mean Absolute Deviation
  • The mean absolute deviation uses deviations
    of the data values from the mean.
  • Explanation of the term mean absolute deviation
    (MAD): The mean absolute deviation is the average
    of the absolute deviations from the mean of the
    data set.

21
3-3 The Mean Absolute Deviation
  • The MAD is computed using the following formula:
    $$\text{MAD} = \frac{\sum |x - \bar{x}|}{n}$$
  • The formula says that you
    • subtract the sample mean from each data value
    • take the absolute values of the results
    • add the absolute values together
    • divide by the sample size.

22
3-3 The Mean Absolute Deviation
  • Example: What is the MAD for the following sample
    values?
  • 3 8 6 12 0 -4 10
  • Solution: First, the sample mean is 5
    (verify).
  • The table on the next slide shows the
    computations.

23
3-3 The Mean Absolute Deviation
MAD = 32/7 ≈ 4.57
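The four steps of the MAD formula translate directly into code. Below is a minimal sketch with the hypothetical helper name `mean_absolute_deviation`, applied to the example values.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n                       # sample mean
    abs_devs = [abs(x - mean) for x in values]   # |x - mean| for each value
    return sum(abs_devs) / n                     # divide by the sample size

sample = [3, 8, 6, 12, 0, -4, 10]
mad = mean_absolute_deviation(sample)
print(round(mad, 2))   # 4.57
```

Here the mean is 35/7 = 5, the absolute deviations 2, 3, 1, 7, 5, 9, 5 sum to 32, and 32/7 ≈ 4.57.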
24
3-3 The Mean Absolute Deviation
Question: What does the MAD measure? The MAD
measures the average (absolute) distance of the
sample values from the mean of the data values.
25
3-3 The Mean Absolute Deviation
The deviations contribute to the total in
proportion to the size of the deviation.
The average distance of the sample values from
the mean is 4.57.
26
Quick Tip
  • If data set A has a larger MAD than data set B,
    then it is reasonable to believe that the values
    in data set A are more spread out (variable) than
    the values in data set B.
  • The MAD is sensitive to values that are very
    small or very large relative to the rest of the
    data set.

27
3-4 The Variance and Standard Deviation
  • The variance and standard deviation are the
    most common and useful measures of variability.
  • These two measures provide information about how
    the data vary about the mean.

28
3-4 The Variance and Standard Deviation
When the data are clustered about the mean, the
variance and standard deviation will be somewhat
small.
29
3-4 The Variance and Standard Deviation
When the data are widely scattered about the
mean, the variance and standard deviation will be
somewhat large.
30

3-4 The Variance and Standard Deviation
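  • Explanation of the term sample variance: The
    sample variance is an approximate average of
    the squared deviations of the data values from
    the sample mean (approximate because the sum is
    divided by n - 1 rather than by n).
  • The sample variance is denoted by $s^2$ and is
    computed from the following formula, where n is
    the sample size:
    $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$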

31

3-4 The Variance and Standard Deviation
  • Example: What is the variance for the following
    sample values?
  • 3 8 6 14 0 11
  • NOTE: Do not let the formula intimidate you. We
    will build a table to help with the computations.

32

3-4 The Variance and Standard Deviation
  • The table organizes the computations.
    NOTE: The mean is 7.

$s^2 = \frac{132}{6 - 1} = \frac{132}{5} = 26.4$
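The table-based computation can be sketched in Python. The loop prints one row per value (x, its deviation, and the squared deviation) and then divides the total by n - 1. The standard-library `statistics.variance`, which also divides by n - 1, serves as a cross-check.

```python
from statistics import variance

sample = [3, 8, 6, 14, 0, 11]
n = len(sample)
mean = sum(sample) / n                      # 42 / 6 = 7.0

print(" x   x - mean   (x - mean)^2")
squared = []
for x in sample:
    dev = x - mean                          # deviation from the mean
    squared.append(dev ** 2)                # squared deviation
    print(f"{x:>2} {dev:>9.1f} {dev ** 2:>13.1f}")

s2 = sum(squared) / (n - 1)                 # divide by n - 1 for a sample
print(sum(squared), s2)                     # 132.0 26.4
print(variance(sample))                     # standard-library cross-check
```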
33

3-4 The Variance and Standard Deviation
  • In the previous example, observe that the
    variance is large relative to the size of the
    data values.
  • This can be observed from the plot which shows
    that the data values are very much spread out
    about the mean value of 7.

34

3-4 The Variance and Standard Deviation
  • Explanation of the term sample standard
    deviation: The sample standard deviation is the
    positive square root of the sample variance.
  • NOTE: The standard deviation has the same unit
    as the variable.
  • Example: The sample standard deviation for the
    previous example is $s = \sqrt{26.4} \approx 5.14$.
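As a quick check, Python's `statistics.stdev` returns the positive square root of the sample variance. The last line also illustrates the Quick Tip that identical observations have zero spread. The list `[4, 4, 4, 4]` is a made-up illustration, not data from the slides.

```python
import math
from statistics import stdev

sample = [3, 8, 6, 14, 0, 11]
s = stdev(sample)                            # square root of the sample variance
print(round(s, 2))                           # 5.14
print(math.isclose(s, math.sqrt(26.4)))      # True

# Identical observations: no variability at all.
print(stdev([4, 4, 4, 4]))
```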

35
Quick Tips
  • If all of the observations have the same value,
    the sample variance (standard deviation) will be
    zero. That is, there is no variability in the
    data set.
  • The variance (standard deviation) is influenced
    by outliers in the data set.
  • The unit for the standard deviation is the same
    as that for the raw data.
  • For this reason, the standard deviation is usually
    preferred over the variance as the measure of
    variability.

36

3-4 The Variance and Standard Deviation
  • Explanation of the term population variance:
    The population variance is the average of the
    squared deviations of the data values from the
    population mean.
  • The population variance is denoted by $\sigma^2$
    and is computed from the following formula, where
    $\mu$ is the population mean and N is the
    population size:
    $$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

37

3-4 The Variance and Standard Deviation
  • Explanation of the term population standard
    deviation: The population standard deviation is
    the positive square root of the population
    variance.
  • The population standard deviation is denoted by
    $\sigma$ and is computed from the following formula:
    $$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
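The only computational difference from the sample formulas is the divisor, N instead of n - 1. The sketch below assumes, purely for illustration, that the six values from the variance example form a complete finite population, and it checks the result against the standard-library `pvariance` and `pstdev`.

```python
import math
from statistics import pvariance, pstdev

# Assumption for illustration: these six values are the whole population.
population = [3, 8, 6, 14, 0, 11]
N = len(population)
mu = sum(population) / N                                  # population mean, 7.0
sigma2 = sum((x - mu) ** 2 for x in population) / N       # divide by N, not N - 1
sigma = math.sqrt(sigma2)

print(sigma2, round(sigma, 2))                            # 22.0 4.69
print(math.isclose(pvariance(population), sigma2))        # True
print(math.isclose(pstdev(population), sigma))            # True
```

Treating the same values as a population gives a smaller variance (22.0) than treating them as a sample (26.4), because the same sum of squares, 132, is divided by 6 instead of 5.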

38

3-5 The Coefficient of Variation
  • The coefficient of variation (CV) allows us to
    compare the variation of two (or more) different
    variables.
  • Explanation of the term sample coefficient of
    variation: The sample coefficient of variation is
    defined as the sample standard deviation divided
    by the sample mean of the data set.
  • Usually, the result is expressed as a
    percentage.

39

3-5 The Coefficient of Variation
NOTE: The sample coefficient of variation
standardizes the variation by dividing it by
the sample mean:
$$CV = \frac{s}{\bar{x}} \times 100\%$$
40

3-5 The Coefficient of Variation
  • The coefficient of variation has no units, since
    the standard deviation and the mean have the same
    units and thus cancel each other out.
  • Because of this property, we can use this
    measure to compare the variations for different
    variables with different units.

41

3-5 The Coefficient of Variation
  • Example: The mean number of parking tickets
    issued in a neighborhood over a four-month period
    was 90, and the standard deviation was 5. The
    average revenue generated from the tickets was
    5,400, and the standard deviation was 775.
    Compare the variations of the two variables.
  • Solution is on the next slide.

42

3-5 The Coefficient of Variation
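  • Solution:
    Tickets: $CV = \frac{5}{90} \times 100\% \approx 5.56\%$
    Revenue: $CV = \frac{775}{5400} \times 100\% \approx 14.35\%$
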
Since the CV is larger for the revenues, there is
more variability in the recorded revenues than
in the number of tickets issued.
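The comparison reduces to two divisions. This sketch uses a hypothetical helper, `coefficient_of_variation`, which takes the standard deviation and the mean reported in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: standard deviation / mean * 100 (hypothetical helper)."""
    return std_dev / mean * 100

cv_tickets = coefficient_of_variation(5, 90)
cv_revenue = coefficient_of_variation(775, 5400)
print(f"tickets: {cv_tickets:.2f}%")   # tickets: 5.56%
print(f"revenue: {cv_revenue:.2f}%")   # revenue: 14.35%
```

The units of each standard deviation cancel against the units of its mean, so the two percentages can be compared directly.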
43

3-5 The Coefficient of Variation
  • Explanation of the term population coefficient
    of variation: The population coefficient of
    variation is defined as the population standard
    deviation divided by the population mean of the
    data set: $CV = \frac{\sigma}{\mu} \times 100\%$.
  • NOTE: The population CV has the same properties
    as the sample CV.

44

3-6 The Empirical Rule
  • Knowing the value of the mean and the value of
    the standard deviation for a data set can provide
    a great deal of information about the data set.
  • In particular, if the data set has a single
    mound and is symmetrical (bell-shaped), then
    one can state some general properties of the
    distribution.
  • One such generalization is called the Empirical
    Rule.

45

3-6 The Empirical Rule
  • The Empirical Rule gives some general statements
    relating the mean and the standard deviation of a
    bell-shaped distribution.
  • It relates the mean to one, two, and three
    standard deviations.

46

3-6 Empirical Rule
  • One Sigma Rule: Approximately 68% of the data
    values will lie within one standard deviation
    of the mean.
  • That is, one can expect a deviation of more than
    one sigma from the mean to occur about once in
    every three observations.
  • This is true because approximately 32% (roughly
    1/3) of the values lie more than one standard
    deviation from the mean.

47

3-6 Empirical Rule - One Sigma Rule
Graphical Display of the One Sigma Rule
48

3-6 Empirical Rule
  • Two Sigma Rule: Approximately 95% of the data
    values will lie within two standard deviations
    of the mean.
  • That is, one can expect a deviation of more than
    two sigma from the mean to occur about once in
    every twenty observations.
  • This is true because approximately 5% (1/20) of
    the values lie more than two standard deviations
    from the mean.

49

3-6 Empirical Rule - Two Sigma Rule
Graphical Display of the Two Sigma Rule
50

3-6 Empirical Rule
  • Three Sigma Rule: Approximately 99.7% of the
    data values will lie within three standard
    deviations of the mean.
  • That is, one can expect a deviation of more than
    three sigma from the mean to occur about once in
    every 333 observations.
  • This is true because approximately 0.3% (1/333)
    of the values lie more than three standard
    deviations from the mean.
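The percentages in the Empirical Rule are rounded values of probabilities for an exactly normal (bell-shaped) distribution. This sketch, which assumes exact normality, computes the share within k standard deviations as erf(k/√2) using Python's `math.erf`. The helper name `share_within` is hypothetical.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = share_within(k)
    outside = 1 - inside
    print(f"{k} sigma: {inside:.1%} inside, about 1 in {1 / outside:.0f} outside")
```

The printed shares are about 68.3%, 95.4% and 99.7%. The unrounded tail frequencies work out to roughly 1 in 3, 1 in 22 and 1 in 370. These are slightly less frequent than the 1/20 and 1/333 obtained from the rounded 95% and 99.7%, and real data sets only approximate these figures.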

51

3-6 Empirical Rule - Three Sigma Rule
Graphical Display of the Three Sigma Rule