Title: Measures of Variation
1Measures of Variation
In addition to the central tendency of the data in a
population or sample, a second important
characteristic of the data is its variability
about some center.
- Measures of Variation include
- The range
- The Variance
- The Standard Deviation
- The Mean Absolute Deviation
2Measures of Variation
Standard Deviation of a Population
We denote the population variance by $\sigma^2$ and define
$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$
where $\mu$ is the population mean, $N$ is the size
of the population, and $\sum_i (x_i - \mu)^2$ is the sum of the
squares of the differences between each item in
the population and the mean.
3Measures of Variation
Suppose a student receives the following quiz
grades
82, 68, 74, 86, 90, 88, 62, 75, 80, 55
For this student, these grades are the total
population of her scores that are used to
calculate her mean or average grade. We obtain
$\mu = (82 + 68 + 74 + 86 + 90 + 88 + 62 + 75 + 80 + 55)/10 = 760/10 = 76$
The mean of this population is 76
4Measures of Variation
Having obtained the mean, we can now calculate
the variance
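82, 68, 74, 86, 90, 88, 62, 75, 80, 55 and $\mu = 76$
$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$
$= [(82-76)^2 + (68-76)^2 + (74-76)^2 + (86-76)^2 + (90-76)^2 + (88-76)^2 + (62-76)^2 + (75-76)^2 + (80-76)^2 + (55-76)^2]/10$
$= (36 + 64 + 4 + 100 + 196 + 144 + 196 + 1 + 16 + 441)/10 = 1198/10 = 119.8$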
5Measures of Variation
We find the standard deviation in this population
data by taking the square root of the variance.
$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N} = 119.8$
$\sigma = \sqrt{119.8} \approx 10.94$
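The population calculation above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the variable names (`grades`, `mu`, `variance`, `sigma`) are illustrative choices, not part of the slides.

```python
import math
import statistics

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

N = len(grades)
mu = sum(grades) / N                                 # population mean
variance = sum((x - mu) ** 2 for x in grades) / N    # divide by N for a population
sigma = math.sqrt(variance)                          # population standard deviation

print(mu, variance, round(sigma, 3))

# Cross-check against the standard library's population functions
print(statistics.pvariance(grades), statistics.pstdev(grades))
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by N, which matches the population definition used here.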
If we display the data on a dot plot, we can
visualize the use of the standard deviation as a
measure of variation in the data
[Figure: dot plot of the grades 82, 68, 74, 86, 90, 88, 62, 75, 80, 55 on an axis running from 55 to 100, with the mean marked at 76.]
6Measures of Variation
Chebyshev's Theorem
The proportion of any set of data lying within $K$
standard deviations of the mean is always at
least $1 - 1/K^2$, for any $K$ greater than 1.
Chebyshev's inequality tells us that in any
statistical distribution at least 3/4 of the values
will lie within 2 standard deviations of the
mean, and at least 8/9 of all values will lie
within 3 standard deviations of the mean.
In the previous example we found $\mu = 76$ and $\sigma = 10.94$, so
$\mu - 2\sigma = 76 - 2(10.94) = 54.12$
$\mu + 2\sigma = 76 + 2(10.94) = 97.88$
All ten grades (55 through 90) fall between these limits, so 100% of the
values lie within $2\sigma$ of the mean, consistent with the guaranteed minimum of 75%.
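As a quick check of Chebyshev's bound on the grade data, the sketch below counts how many values fall within K standard deviations of the mean and compares that proportion with 1 - 1/K^2. The helper name `proportion_within` is a hypothetical choice for this illustration.

```python
import statistics

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]
mu = statistics.fmean(grades)
sigma = statistics.pstdev(grades)   # population standard deviation

def proportion_within(data, center, spread, k):
    """Fraction of data lying within k * spread of center (inclusive)."""
    lo, hi = center - k * spread, center + k * spread
    return sum(lo <= x <= hi for x in data) / len(data)

for k in (2, 3):
    bound = 1 - 1 / k**2                              # Chebyshev's guaranteed minimum
    observed = proportion_within(grades, mu, sigma, k)
    print(f"K={k}: bound={bound:.3f}, observed={observed:.3f}")
```

The observed proportion can never fall below the bound, but it is often far above it, because the theorem makes no assumption about the shape of the distribution.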
7Measures of Variation
The Sample Standard Deviation
The standard deviation of a sample is denoted by
the letter $s$. The sample standard deviation is
an estimate of the population standard deviation $\sigma$.
The sample variance is
$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$
where $\bar{x}$ denotes the
sample mean. The sample standard deviation is
obtained by taking the square root of the
variance.
Note! To calculate the sample variance we divide
by the number of degrees of freedom, $n - 1$,
instead of the sample size $n$. The sample mean has
already been calculated from the same data, so when
those data are reused to obtain a second statistic only
$n - 1$ of the values are free to vary: the $n$th
value is fixed, since the sum must equal $n$ times
the mean.
8Measures of Variation
The formula for the standard deviation can be
transformed into a form that slightly simplifies
the computation:
$s = \sqrt{\frac{n \sum_i x_i^2 - (\sum_i x_i)^2}{n(n - 1)}}$
At first sight it is not clear that this
simplifies the calculation. To illustrate how the
two formulas are used, assume that
the previous 10 grades were a sample taken from a
larger number of students enrolled in a course,
and calculate the standard deviation both ways.
9Measures of Variation
Using the original formula and treating the
previous data as a sample with a mean of 76, we
get
$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$
82, 68, 74, 86, 90, 88, 62, 75, 80, 55
$s = \sqrt{[(82-76)^2 + (68-76)^2 + (74-76)^2 + (86-76)^2 + (90-76)^2 + (88-76)^2 + (62-76)^2 + (75-76)^2 + (80-76)^2 + (55-76)^2]/(n-1)}$
$= \sqrt{1198/9} = \sqrt{133.11} \approx 11.54$
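The same definitional calculation can be sketched in Python, treating the ten grades as a sample. The variable names are illustrative; `statistics.stdev` is the standard-library function that divides by n - 1.

```python
import math
import statistics

sample = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

n = len(sample)
x_bar = sum(sample) / n                              # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in sample)   # sum of squared deviations
s = math.sqrt(sum_sq_dev / (n - 1))                  # divide by n - 1 for a sample

print(sum_sq_dev, round(s, 2))
print(round(statistics.stdev(sample), 2))            # library cross-check
```

Dividing by n - 1 rather than n is the only difference from the population calculation, which is why the sample value is slightly larger.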
10Measures of Variation
To use the modified formula, we first construct
the following table for the sample
82, 68, 74, 86, 90, 88, 62, 75, 80, 55 with $n = 10$

x        x^2
82       6724
68       4624
74       5476
86       7396
90       8100
88       7744
62       3844
75       5625
80       6400
55       3025
Total
760      58958

$s^2 = \frac{(10)(58958) - 760^2}{(10)(9)} = \frac{589580 - 577600}{(10)(9)} = 133.11$
$s = \sqrt{133.11} \approx 11.54$
In this second method we find the total of the
sample items and the total of the squares
of these items.
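The shortcut formula needs only two running totals, the sum of the items and the sum of their squares. A minimal Python sketch under that reading, with illustrative variable names:

```python
import math

sample = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

n = len(sample)
sum_x = sum(sample)                    # total of the sample items
sum_x2 = sum(x * x for x in sample)    # total of the squared items

# s^2 = (n * sum(x^2) - (sum x)^2) / (n * (n - 1))
s_squared = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
s = math.sqrt(s_squared)

print(sum_x, sum_x2, round(s_squared, 2), round(s, 2))
```

With integer data the result is exact up to the final division. With floating-point data whose mean is large relative to its spread, subtracting two nearly equal totals can lose precision, so the definitional formula is usually the safer choice in software.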
11Measures of Variation
Finding the standard deviation for tabulated or
weighted data
Recall the table we constructed for finding the
mean of a sample of September temperature
readings in the Central Tendency lecture notes.
Class          Midpoint (x)   Frequency (f)   fx      x^2     fx^2
64.5 - 69.5    67             6               402     4489    26934
69.5 - 74.5    72             11              792     5184    57024
74.5 - 79.5    77             20              1540    5929    118580
79.5 - 84.5    82             13              1066    6724    87412
84.5 - 89.5    87             9               783     7569    68121
89.5 - 94.5    92             1               92      8464    8464
Total                         60              4675            366535
We have augmented the previous table by adding
two additional columns that will be used for
calculating the sample standard deviation of
these grouped data.
12Measures of Variation
The formula for obtaining the standard deviation
of weighted or tabulated data is
$s = \sqrt{\frac{n \sum_i f_i x_i^2 - (\sum_i f_i x_i)^2}{n(n - 1)}}$
From the previous table we have
$n \sum_i f_i x_i^2 = (60)(366535) = 21992100$
$(\sum_i f_i x_i)^2 = (4675)^2 = 21855625$
$s = \sqrt{\frac{21992100 - 21855625}{(60)(59)}} = \sqrt{38.55} \approx 6.21$
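The grouped-data formula can be sketched the same way, representing each class by its midpoint and frequency. The list `classes` below simply restates the midpoint and frequency columns of the table; the names are illustrative.

```python
import math

# (class midpoint x, frequency f) pairs taken from the table
classes = [(67, 6), (72, 11), (77, 20), (82, 13), (87, 9), (92, 1)]

n = sum(f for _, f in classes)                 # total number of readings
sum_fx = sum(f * x for x, f in classes)        # sum of f * x
sum_fx2 = sum(f * x * x for x, f in classes)   # sum of f * x^2

s = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(n, sum_fx, sum_fx2, round(s, 2))
```

Because every reading in a class is treated as if it sat at the class midpoint, the result is an approximation to the standard deviation of the raw readings.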
13Measures of Variation
We construct an ogive from the previous table
[Figure: ogive of cumulative frequency (0 to 60, in steps of 5) against temperature (class boundaries 64.5, 69.5, 74.5, 79.5, 84.5, 89.5, 94.5), annotated in the figure with Mean 79.183, s = 6.21, and 2s = 12.42.]
14Measures of Variation
The Normal Distribution
- Continuous
- Symmetric
- Mean = Median = Mode (all the same value)
[Figure: normal curve centered on the mean; about 68% of values lie within one standard deviation of the mean.]
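The 68% figure can be checked numerically with the standard library's `statistics.NormalDist`. This sketch uses a standard normal distribution (mean 0, standard deviation 1); the helper name `within` is a hypothetical choice.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

def within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```

For a normal distribution these proportions are much higher than the minimums guaranteed by Chebyshev's theorem, which applies to any distribution.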
15Measures of Variation
Other measures of variation
Using the range to estimate the standard
deviation: $s \approx \text{range}/4$
On an earlier slide we found for a population of
student grades
82, 68, 74, 86, 90, 88, 62, 75, 80, 55
$\mu = 76$ and $\sigma = 10.94$
The range of this population is $90 - 55 = 35$. This
gives us an estimate of $\sigma \approx 35/4 = 8.75$
In the tabulated data for the temperature readings we
have range $= 92 - 65 = 27$, so $s \approx 27/4 = 6.75$, which
is reasonably close to the calculated value of $s = 6.21$
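The range rule of thumb is a one-line calculation. The sketch below applies it to the grades and to the extremes of the temperature readings (65 and 92); the function name `range_estimate` is illustrative.

```python
def range_estimate(low, high):
    """Rough estimate of the standard deviation as range / 4."""
    return (high - low) / 4

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

grade_estimate = range_estimate(min(grades), max(grades))
temperature_estimate = range_estimate(65, 92)   # lowest and highest readings

print(grade_estimate, temperature_estimate)
```

The estimate depends only on the two extreme values, so it is quick to compute but sensitive to a single unusually high or low observation.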
16Measures of Variation
The Coefficient of Variation (CV)
Definition: for either a population or a sample, the
Coefficient of Variation is the
ratio of the standard deviation to the mean.
$CV = s / \bar{x}$ for a sample, where $\bar{x}$ denotes
the sample mean
$CV = \sigma / \mu$ for a population
The CV for the population of grades from the
previous page is $CV = 10.94/76 \approx 0.144$
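A small sketch of the coefficient of variation for both cases follows. The function name and its `population` flag are assumptions made for this illustration, not notation from the slides.

```python
import statistics

def coefficient_of_variation(data, population=True):
    """Standard deviation divided by the mean.

    population=True uses the population formula (divide by N);
    population=False uses the sample formula (divide by n - 1).
    """
    spread = statistics.pstdev(data) if population else statistics.stdev(data)
    return spread / statistics.fmean(data)

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

cv_population = coefficient_of_variation(grades)
cv_sample = coefficient_of_variation(grades, population=False)
print(round(cv_population, 3), round(cv_sample, 3))
```

Because the CV is a ratio, it has no units, which makes it convenient for comparing variability between data sets measured on different scales.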
17Part 2
Measures of Relative Standing
18Relative Standing
A z score is the number of standard deviations
that a raw score, $x$, is above or below the mean.
A raw score $x$ taken from a population is
converted to a standardized z score by the formula
$z = (x - \mu)/\sigma$
In a sample the z score of a value $x$ is given by
$z = (x - \bar{x})/s$, where $\bar{x}$ denotes the sample mean
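As an illustration, the sketch below computes population z scores for the quiz grades used earlier in these notes. The function name `z_score` is a hypothetical choice.

```python
import statistics

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]
mu = statistics.fmean(grades)       # population mean
sigma = statistics.pstdev(grades)   # population standard deviation

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z_high = z_score(90, mu, sigma)   # highest grade
z_low = z_score(55, mu, sigma)    # lowest grade
print(round(z_high, 2), round(z_low, 2))
```

For a sample, the same function applies with the sample mean and `statistics.stdev` passed in place of the population values.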
19Relative Standing
Percentiles
percentile of value $x$ = ((number of values less than $x$) /
(total number of values)) × 100 (round the result to
the nearest whole number)
Suppose that in a class of 25 people we have the
following averages (ordered in ascending order)
42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74,
77, 78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94,
98
If you received a 77, what percentile are you?
Twelve of the 25 averages are below 77, so
percentile of 77 = (12/25) × 100 = 48
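The percentile rule above translates directly into code. A minimal sketch, with the function name `percentile_of` chosen for this illustration:

```python
averages = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
            78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]

def percentile_of(value, data):
    """Percent of data values strictly less than value, as a whole number."""
    below = sum(1 for x in data if x < value)
    # Note: Python's round() sends exact halves to the nearest even integer.
    return round(100 * below / len(data))

print(percentile_of(77, averages))
```

Only values strictly below the score are counted, so tied scores all receive the same percentile.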
20Relative Standing
Quartiles
Instead of finding the percentile of a single
data value as we did on the previous page, it is
often useful to divide the data into four or more
(nearly) equal groups. When the data are divided
into four equal groups, we call these
groups quartiles.
Let
$n$ = number of items in the data set
$k$ = percent desired (e.g. $k = 25$)
$L$ = locator, which gives the position of the
value separating the first $k$ percent of the
data from the rest
$L = (k/100) \, n$
21Relative Standing
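Let's separate the 25 class grades into four
quartiles.
- Step 1: order the data in ascending order
42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74,
77, 78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94,
98
- Step 2: find the three locators $L_{25}$, $L_{50}$, $L_{75}$,
rounding any fractional part up to the next integer
$L_{25} = (25/100)(25) = 6.25$
$L_{50} = (50/100)(25) = 12.5$
$L_{75} = (75/100)(25) = 18.75$
Rounding up gives positions 7, 13, and 19 in the ordered list, so
$Q_1 = 70$, the median is 77, and $Q_3 = 84$.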
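The locator procedure can be sketched as follows. The function names are illustrative, and the handling of a whole-number locator (averaging the values at positions L and L + 1) is an assumption based on a common textbook convention; this slide only states the round-up rule for fractional locators.

```python
import math

averages = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
            78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]

def locator(k, n):
    """Locator L = (k/100) * n for the k-th percentile of n items."""
    return k * n / 100

def value_at_percent(k, ordered):
    """Value separating the first k percent of ordered data (0 < k < 100)."""
    L = locator(k, len(ordered))
    if L != int(L):
        # Fractional locator: round up to the next 1-based position.
        return ordered[math.ceil(L) - 1]
    # Assumption (not stated on the slide): for a whole-number locator,
    # average the values at positions L and L + 1.
    pos = int(L)
    return (ordered[pos - 1] + ordered[pos]) / 2

data = sorted(averages)   # Step 1: ascending order
q1, median, q3 = (value_at_percent(k, data) for k in (25, 50, 75))
print(q1, median, q3)
```

Statistical software often uses interpolation rules that differ from this one, so quartiles computed elsewhere may not match these values exactly.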
22Relative Standing
- Other measures of relative standing include
- Interquartile range (IQR) = $Q_3 - Q_1$
- Semi-interquartile range = $(Q_3 - Q_1)/2$
- Midquartile = $(Q_3 + Q_1)/2$
- 10-90 percentile range = $P_{90} - P_{10}$
For the data on the previous page, with $Q_1 = 70$ and $Q_3 = 84$, we have
IQR = $84 - 70 = 14$
Semi-IQR = $(84 - 70)/2 = 7$
Midquartile = $(84 + 70)/2 = 77$
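These derived measures follow directly from the quartiles and percentiles. The sketch below recomputes them for the 25 class averages using the round-up locator rule; the helper name `percentile_value` is hypothetical.

```python
import math

averages = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
            78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]

def percentile_value(k, ordered):
    """Value at the k-th percentile locator, rounding a fractional locator up.

    Every locator needed below is fractional for n = 25, so the
    whole-number case is deliberately left out of this sketch.
    """
    return ordered[math.ceil(k * len(ordered) / 100) - 1]

data = sorted(averages)
q1, q3 = percentile_value(25, data), percentile_value(75, data)
p10, p90 = percentile_value(10, data), percentile_value(90, data)

iqr = q3 - q1                      # interquartile range
semi_iqr = iqr / 2                 # semi-interquartile range
midquartile = (q1 + q3) / 2        # midpoint of the two quartiles
percentile_range = p90 - p10       # 10-90 percentile range

print(iqr, semi_iqr, midquartile, percentile_range)
```

All four measures ignore the most extreme values, so they are less affected by outliers than the range.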
23Box Diagram
Recall the ordered high temperature readings from
a previous lecture
65, 67, 68, 68, 69, 69, 71, 71, 71, 72, 72, 72,
73, 73, 73, 74, 74, 75, 75, 75, 75, 76, 76, 77,
77, 77, 77, 77, 77, 78, 78, 78, 78, 79, 79, 79,
79, 80, 81, 81, 81, 81, 81, 81, 81, 81, 82, 82,
83, 84, 85, 85, 85, 86, 86, 87, 87, 88, 89, 92
To construct a box diagram to illustrate the
extent to which the extreme data values lie
beyond the interquartile range, draw a line with
the low and high value highlighted at the two
ends. Mark the gradations between these two
extremes, then locate the quartile boundaries Q1,
Med., and Q3 on this line. Construct a box about
these values.
$Q_1 = (73 + 74)/2 = 73.5$
Q1 M Q3
65