Title: STATISTICS for the Utterly Confused, 2nd ed.
- Slides prepared by Lloyd R. Jaisingh, Ph.D.
- Morehead State University, Morehead, KY
Chapter 3
- Data Description: Numerical Measures of Variability for Ungrouped Univariate Data
Outline
- Do I Need to Read This Chapter?
- 3-1 The Range
- 3-2 The Interquartile Range
- 3-3 The Mean Absolute Deviation
- 3-4 The Variance and Standard Deviation
- 3-5 The Coefficient of Variation
- 3-6 The Empirical Rule
- It's a Wrap
Objectives
- Introduction of some basic statistical measurements of spread or variability.
- How to compute these measures and investigate some of their properties.
Introduction
- A measure of variability for a collection of data values is a number that is meant to convey the idea of spread for the data set.
- The most commonly used measures of variability for sample data are the:
- range
- interquartile range
- mean absolute deviation
- variance or standard deviation
- coefficient of variation.
3-1 The Range
- Explanation of the term range: The range is the difference between the largest and smallest values in the data set.
- NOTE: The explanation is true for a sample as well as a finite population of values.
3-1 The Range
- Example: What is the range for the following sample values?
- 3 8 6 14 0 4 0 12 7 0 -10
- Solution: First, arrange the values from smallest to largest.
- -10 0 0 0 3 4 6 7 8 12 14
- Range = 14 - (-10) = 24
3-1 The Range
- Question: Why is subtracting the smallest value from the largest value a measure of spread?
- The next slide shows the plot of the data set.
- Observe that the range measures the distance between the smallest and largest values.
- This distance gives a measurement of the spread of the data.
3-1 The Range
Range gives a measurement of spread.
Quick Tip
- The range does not use the concept of deviations.
- It is affected by outliers (values that are very large or very small relative to the rest of the data set).
- The range does not use all the information in the data set, only the largest and smallest values.
- Thus it is not a very useful measure of spread or variation.
The Range -- Example
- Example: What is the range for the following sample values?
- 9 995 1000 1002 1014
- Solution: Range = 1014 - 9 = 1005
- Here the range is strongly affected by the outlying value of 9.
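A minimal Python sketch of the range, reproducing both examples. The function name data_range is hypothetical (chosen so it does not shadow Python's built-in range). The last line shows how much the single outlying value 9 inflates the range.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

example_1 = [3, 8, 6, 14, 0, 4, 0, 12, 7, 0, -10]
example_2 = [9, 995, 1000, 1002, 1014]

print(data_range(example_1))      # 24
print(data_range(example_2))      # 1005
# Dropping the outlying value 9 shrinks the range dramatically.
print(data_range(example_2[1:]))  # 19
```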
3-2 The Interquartile Range
- Explanation of the term interquartile range: The interquartile range measures the spread of the middle 50% of an ordered data set.
- NOTE: The interquartile range is obtained using the following steps.
- Step 1: Order the data set from smallest to largest.
- Step 2: Find the median for the ordered set. Denote it by Q2.
3-2 The Interquartile Range
- Step 3: Find the median for the first 50% of the ordered set. The median found in Step 2 is not included in this portion of the data. Denote it by Q1.
- Step 4: Find the median for the second 50% of the ordered set. The median found in Step 2 is not included in this portion of the data. Denote it by Q3.
3-2 The Interquartile Range
- Step 5: The interquartile range is computed from the following:
$$ \text{IQR} = Q_3 - Q_1 $$
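Steps 1 to 5 can be sketched in Python using only the standard library's statistics.median. The function name iqr_by_halves is hypothetical. It is applied to the 18 quiz scores from the example that follows; because n is even there, Q2 falls between two data values and each half gets 9 values.

```python
from statistics import median

def iqr_by_halves(values):
    """Steps 1-5: Q1 and Q3 are the medians of the lower and upper halves.

    When n is odd, the overall median (Q2) is left out of both halves.
    """
    data = sorted(values)                              # Step 1: order the data
    n = len(data)
    q2 = median(data)                                  # Step 2: overall median
    half = n // 2
    lower = data[:half]                                # Step 3: first 50%
    upper = data[half + 1:] if n % 2 else data[half:]  # Step 4: second 50%
    q1, q3 = median(lower), median(upper)
    return q1, q2, q3, q3 - q1                         # Step 5: IQR = Q3 - Q1

scores = [7, 8, 9, 6, 8, 0, 9, 9, 9, 0, 0, 7, 10, 9, 8, 5, 7, 9]
print(iqr_by_halves(scores))  # (6, 8.0, 9, 3)
```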
3-2 The Interquartile Range
- The following depicts the idea of the
interquartile range.
3-2 The Interquartile Range
- Example: The following scores for a 10-point statistics quiz were reported. What is the value of the interquartile range?
- 7 8 9 6 8 0 9 9 9
- 0 0 7 10 9 8 5 7 9
3-2 The Interquartile Range
- Solution: Technology makes it easy to compute the interquartile range.
- We will present output from the MINITAB software and the TI-83 calculator to help compute the interquartile range.
3-2 The Interquartile Range
- MINITAB Solution: The following shows the descriptive statistics output.
Interquartile range = Q3 - Q1 = 9 - 5.75 = 3.25
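3-2 The Interquartile Range
- TI-83 Solution: The following shows the descriptive statistics output.
Interquartile range = Q3 - Q1 = 9 - 6 = 3
Note: The two answers differ slightly because MINITAB and the TI-83 use different rules for quartiles. The TI-83 takes Q1 as the median of the lower half of the ordered data (as in Steps 1 to 4), which gives 6, while MINITAB interpolates between ordered values and reports 5.75.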
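One way to reproduce the MINITAB quartiles in Python is statistics.quantiles with method="exclusive", which locates Q1 at position (n + 1)/4 and Q3 at position 3(n + 1)/4 of the ordered data and interpolates between neighbouring values. Treat this as a sketch: the match with MINITAB has been checked only for this data set.

```python
from statistics import quantiles

scores = [7, 8, 9, 6, 8, 0, 9, 9, 9, 0, 0, 7, 10, 9, 8, 5, 7, 9]

# method="exclusive": Q1 at position (n + 1)/4 = 4.75, so it lies 3/4 of the
# way from the 4th ordered value (5) to the 5th (6), giving 5.75.
q1, q2, q3 = quantiles(scores, n=4, method="exclusive")
print(q1, q3, q3 - q1)  # 5.75 9.0 3.25
```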
3-3 The Mean Absolute Deviation
- The mean absolute deviation utilizes deviations of the data values from the mean.
- Explanation of the term Mean Absolute Deviation (MAD): The mean absolute deviation is the average of the absolute deviations from the mean of the data set.
3-3 The Mean Absolute Deviation
- The MAD is computed using the following formula:
$$ \text{MAD} = \frac{\sum |x - \bar{x}|}{n} $$
- The formula says that you
- subtract the sample mean from each data value
- take the absolute values of the results
- add the absolute values together
- divide by the sample size.
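The four steps above translate directly into a short Python sketch. The function name mean_absolute_deviation is hypothetical; Python's statistics module has no built-in for this quantity, and it should not be confused with the median absolute deviation. The sample is the one from the example that follows.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    n = len(values)
    mean = sum(values) / n                       # sample mean
    abs_devs = [abs(x - mean) for x in values]   # |x - mean| for each value
    return sum(abs_devs) / n                     # add up, divide by n

sample = [3, 8, 6, 12, 0, -4, 10]
print(round(mean_absolute_deviation(sample), 2))  # 4.57
```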
3-3 The Mean Absolute Deviation
- Example: What is the MAD for the following sample values?
- 3 8 6 12 0 -4 10
- Solution: First of all, the sample mean is 5 (verify).
- The table on the next slide shows the computations.
3-3 The Mean Absolute Deviation
x:           3   8   6  12   0  -4  10
x - x̄:      -2   3   1   7  -5  -9   5
|x - x̄|:     2   3   1   7   5   9   5   (total = 32)
MAD = 32/7 ≈ 4.57
3-3 The Mean Absolute Deviation
Question: What does the MAD measure? The MAD measures the average (absolute) distance of the sample values from the mean of the data values.
3-3 The Mean Absolute Deviation
The deviations contribute to the total in
proportion to the size of the deviation.
The average distance of the sample values from
the mean is 4.57.
Quick Tip
- If data set A has a larger MAD than data set B, then it is reasonable to believe that the values in data set A are more spread out (variable) than the values in data set B.
- The MAD is sensitive to values that are very small or very large relative to the rest of the data set.
3-4 The Variance and Standard Deviation
- The variance and standard deviation are the most common and useful measures of variability.
- These two measures provide information about how the data vary about the mean.
3-4 The Variance and Standard Deviation
When the data are clustered about the mean, the
variance and standard deviation will be somewhat
small.
3-4 The Variance and Standard Deviation
When the data are widely scattered about the
mean, the variance and standard deviation will be
somewhat large.
3-4 The Variance and Standard Deviation
- Explanation of the term sample variance: The sample variance is an approximate average of the squared deviations of the data values from the sample mean.
- The sample variance is computed from the following formula and is denoted by s²:
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$
3-4 The Variance and Standard Deviation
- Example: What is the variance for the following sample values?
- 3 8 6 14 0 11
- NOTE: Do not let the formula intimidate you. We will build a table to help with the computations.
3-4 The Variance and Standard Deviation
- We will build a table to help in the computations. NOTE: The mean is 7.
x:            3   8   6  14   0  11
x - x̄:       -4   1  -1   7  -7   4
(x - x̄)²:    16   1   1  49  49  16   (total = 132)
s² = 132/(6 - 1) = 132/5 = 26.4
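The table computation can be checked with a short Python sketch. It prints each value with its deviation and squared deviation, divides the sum of squared deviations by n - 1, compares the result with the standard library's statistics.variance, and takes the square root to get the sample standard deviation.

```python
import math
import statistics

sample = [3, 8, 6, 14, 0, 11]
n = len(sample)
mean = sum(sample) / n                     # 7.0

# One row per value: x, x - mean, (x - mean)^2
for x in sample:
    d = x - mean
    print(f"{x:>4} {d:>6.1f} {d * d:>6.1f}")

ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations: 132.0
s2 = ss / (n - 1)                          # divide by n - 1, not n
print(s2, statistics.variance(sample))     # 26.4 26.4
print(round(math.sqrt(s2), 2))             # 5.14
```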
3-4 The Variance and Standard Deviation
- In the previous example, observe that the variance is large relative to the size of the data values.
- This can be seen from the plot, which shows that the data values are very much spread out about the mean value of 7.
3-4 The Variance and Standard Deviation
- Explanation of the term sample standard deviation: The sample standard deviation is the positive square root of the variance.
- NOTE: The standard deviation has the same unit as the variable.
- Example: The sample standard deviation for the previous example is
$$ s = \sqrt{26.4} \approx 5.14 $$
Quick Tips
- If all of the observations have the same value, the sample variance (standard deviation) will be zero. That is, there is no variability in the data set.
- The variance (standard deviation) is influenced by outliers in the data set.
- The unit for the standard deviation is the same as that for the raw data.
- Thus it is preferred to use the standard deviation rather than the variance as the measure of variability.
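The first two Quick Tips can be seen numerically with Python's statistics module. The constant data set below is a made-up illustration; the second data set reuses the range example containing the outlying value 9.

```python
import statistics

# All observations equal: there is no variability at all.
print(statistics.variance([5, 5, 5, 5]))   # zero: no variability

# One outlying value inflates the standard deviation.
with_outlier = [9, 995, 1000, 1002, 1014]
without_outlier = [995, 1000, 1002, 1014]
print(round(statistics.stdev(with_outlier), 1))
print(round(statistics.stdev(without_outlier), 1))
```

With the value 9 included the sample standard deviation is in the hundreds; without it, it is under 10.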
3-4 The Variance and Standard Deviation
- Explanation of the term population variance: The population variance is the average of the squared deviations of the data values from the population mean.
- The population variance is computed from the following formula and is denoted by σ², where μ is the population mean and N is the population size:
$$ \sigma^2 = \frac{\sum (x - \mu)^2}{N} $$
3-4 The Variance and Standard Deviation
- Explanation of the term population standard deviation: The population standard deviation is the positive square root of the population variance.
- The population standard deviation is computed from the following formula and is denoted by σ:
$$ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} $$
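To see the difference between the N and n - 1 divisors, this sketch treats the six values from the sample variance example as if they were a complete population (an assumption for illustration only). statistics.pvariance and statistics.pstdev divide by N, while statistics.variance divides by n - 1, so the same sum of squared deviations (132) gives 132/6 = 22 as a population variance but 26.4 as a sample variance.

```python
import statistics

values = [3, 8, 6, 14, 0, 11]  # assumed here to be an entire finite population

sigma2 = statistics.pvariance(values)  # divides by N = 6
sigma = statistics.pstdev(values)      # square root of sigma2
s2 = statistics.variance(values)       # divides by n - 1 = 5, for comparison

print(sigma2, round(sigma, 2), s2)
```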
3-5 The Coefficient of Variation
- The coefficient of variation (CV) allows us to compare the variation of two (or more) different variables.
- Explanation of the term sample coefficient of variation: The sample coefficient of variation is defined as the sample standard deviation divided by the sample mean of the data set.
- Usually, the result is expressed as a percentage.
3-5 The Coefficient of Variation
$$ \text{CV} = \frac{s}{\bar{x}} \times 100\% $$
NOTE: The sample coefficient of variation standardizes the variation by dividing it by the sample mean.
3-5 The Coefficient of Variation
- The coefficient of variation has no units, since the standard deviation and the mean have the same units and thus cancel each other out.
- Because of this property, we can use this measure to compare the variation of variables measured in different units.
3-5 The Coefficient of Variation
- Example: The mean number of parking tickets issued in a neighborhood over a four-month period was 90, and the standard deviation was 5. The average revenue generated from the tickets was 5,400, and the standard deviation was 775. Compare the variations of the two variables.
- Solution is on the next slide.
3-5 The Coefficient of Variation
CV (tickets) = 5/90 × 100% ≈ 5.56%
CV (revenue) = 775/5,400 × 100% ≈ 14.35%
Since the CV is larger for the revenues, there is more variability in the recorded revenues than in the number of tickets issued.
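The CV comparison for the parking-ticket example, as a short Python sketch. The function name coefficient_of_variation is hypothetical; it simply applies CV = (standard deviation / mean) × 100%.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) * 100, expressed as a percentage."""
    return std_dev / mean * 100

cv_tickets = coefficient_of_variation(5, 90)
cv_revenue = coefficient_of_variation(775, 5400)
print(f"tickets: {cv_tickets:.2f}%")  # tickets: 5.56%
print(f"revenue: {cv_revenue:.2f}%")  # revenue: 14.35%
```

Because the units cancel, the two percentages can be compared directly even though one variable counts tickets and the other measures revenue.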
3-5 The Coefficient of Variation
- Explanation of the term population coefficient of variation: The population coefficient of variation is defined as the population standard deviation divided by the population mean of the data set.
$$ \text{CV} = \frac{\sigma}{\mu} \times 100\% $$
- NOTE: The population CV has the same properties as the sample CV.
3-6 The Empirical Rule
- Knowing the value of the mean and the value of the standard deviation for a data set can provide a great deal of information about the data set.
- In particular, if the data set has a single mound and is symmetrical (bell-shaped), then one can generalize some properties of the distribution.
- One such generalization is called the Empirical Rule.
3-6 The Empirical Rule
- The Empirical Rule gives some general statements relating the mean and the standard deviation of a bell-shaped distribution.
- It relates the mean to one, two, and three standard deviations.
3-6 Empirical Rule
- One Sigma Rule: Approximately 68% of the data values will lie within one standard deviation of the mean.
- That is, one can expect a deviation of more than one sigma from the mean to occur about once in every three observations.
- This is true because approximately 32% (roughly 1/3) of the values are outside one standard deviation from the mean.
3-6 Empirical Rule - One Sigma Rule
Graphical Display of the One Sigma Rule
3-6 Empirical Rule
- Two Sigma Rule: Approximately 95% of the data values will lie within two standard deviations of the mean.
- That is, one can expect a deviation of more than two sigma from the mean to occur about once in every twenty observations.
- This is true because approximately 5% (1/20) of the values are outside two standard deviations from the mean.
3-6 Empirical Rule - Two Sigma Rule
Graphical Display of the Two Sigma Rule
3-6 Empirical Rule
- Three Sigma Rule: Approximately 99.7% of the data values will lie within three standard deviations of the mean.
- That is, one can expect a deviation of more than three sigma from the mean to occur about once in every 333 observations.
- This is true because approximately 0.3% (1/333) of the values are outside three standard deviations from the mean.
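The Empirical Rule percentages are rounded areas under a normal (bell-shaped) curve. Assuming a perfectly normal distribution, Python's statistics.NormalDist can compute them directly; the helper name share_within is hypothetical.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    inside = share_within(k)
    print(f"{k} sigma: {inside:.2%} inside, about 1 in {1 / (1 - inside):.0f} outside")
```

The exact values (about 68.27%, 95.45%, and 99.73%) round to the rule's 68%, 95%, and 99.7%. The "once in every 20" and "once in every 333" figures come from the rounded 5% and 0.3%; the exact normal tails are closer to 1 in 22 and 1 in 370.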
3-6 Empirical Rule - Three Sigma Rule
Graphical Display of the Three Sigma Rule