Title: Measures of Spread, Variability or Dispersion
1. Unit 6
- Measures of Spread, Variability or Dispersion
2. Measures of Position
- The median is a measure of position: it marks the midpoint, or 50th percentile, of the data
- Other important benchmarks are the 25th and 75th percentiles, which isolate the middle 50% of the data
- Other measures of position include other percentiles, such as the 10th, 80th, etc.
3. Quartiles
- The 25th percentile is referred to as the first quartile, or symbolically Q1
- The 75th percentile is referred to as the third quartile, or symbolically Q3
- Sometimes the median is referred to as the second quartile, or Q2
4. How to Find the Quartiles
- The weight loss (in pounds) for 17 members of a health club three months after joining is:
- 5 8 10 7 2 6 3 9 4 11 7 5 9 4 6 11 5
- Draw the stem-and-leaf graph for the data
- Find the median as well as Q1 and Q3
5. Stem-and-Leaf Graph for Weight Loss Data
- Prototype: 0 | 5 = 5
- 0 | 2 3 4 4
- 0 | 5 5 5 6 6 7 7 8 9 9
- 1 | 0 1 1
- median = 6
- Q1 = (4 + 5)/2 = 4.5
- Q3 = (9 + 9)/2 = 9
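The quartiles above can be checked with a short Python sketch. The slide does not name its quartile rule; this sketch assumes the (n + 1)-position rule, which Python's `statistics.quantiles` calls `method="exclusive"`, and that choice happens to reproduce the slide's values. The variable names are illustrative.

```python
from statistics import quantiles

# Weight loss (pounds) for the 17 health club members, as listed on the slide.
weight_loss = [5, 8, 10, 7, 2, 6, 3, 9, 4, 11, 7, 5, 9, 4, 6, 11, 5]

# n=4 splits the data into quarters and returns the three cut points.
# method="exclusive" is an assumption about the rule used on the slide.
q1, med, q3 = quantiles(weight_loss, n=4, method="exclusive")

print(f"Q1 = {q1}, median = {med}, Q3 = {q3}")
```

Other quartile conventions (for example `method="inclusive"`) can give slightly different cut points on small data sets.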
6. Boxplot for Weight Loss Data
7. Measures of Spread or Variability
- The interquartile range (IQR) for the weight loss data is 9 - 4.5 = 4.5 pounds
- The spread of the middle 50% of the data is 4.5 pounds
- The range of the weight loss data is 11 - 2 = 9 pounds
8. IQR vs Range
- The IQR is more stable (varies less from sample to sample). Since it focuses only on the middle 50% of the data, it is not affected by outliers
- The range is less stable, since its value can be strongly affected by outliers.
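A minimal Python sketch of this contrast, using the weight loss data. The extra value of 40 pounds is a hypothetical outlier added only for illustration; it is not part of the original data. The helper names `iqr` and `data_range` are also illustrative, and the quartile rule (`method="exclusive"`) is an assumption.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range Q3 - Q1, using the (n + 1)-position quartile rule (an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1

def data_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

weight_loss = [5, 8, 10, 7, 2, 6, 3, 9, 4, 11, 7, 5, 9, 4, 6, 11, 5]

# Hypothetical outlier, not in the original data: one member who lost 40 pounds.
with_outlier = weight_loss + [40]

for label, data in [("original", weight_loss), ("with outlier", with_outlier)]:
    print(f"{label}: IQR = {iqr(data)}, range = {data_range(data)}")
```

In this sketch the added value moves the range from 9 to 38 pounds, while the IQR stays at 4.5 pounds.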
9. Variability about the Mean
- The most important measure of variability is the standard deviation, which measures the spread of the data about the mean.
- The next 2 slides illustrate how to find the standard deviation and the related statistic, the variance.
10. Deviations about the Mean
- To measure how much a data value, x, deviates from the mean, x̄, we compute (x - x̄) for each data value, x
- Next, square each of these deviations to make them positive, so positive and negative deviations will not cancel each other
- Finally, add all such squared deviations to find the total variability about the mean
11. Example
- Average monthly temperatures (degrees Fahrenheit) for San Francisco, CA and Raleigh, NC
- Month  Raleigh  San Francisco
- Jan    39       49
- Feb    42       52
- Mar    50       53
- Apr    59       56
- May    67       58
- Jun    74       62
- Jul    78       63
- Aug    77       64
- Sep    71       65
- Oct    60       61
- Nov    51       55
- Dec    43       49
12. Deviations from the Mean: Raleigh, NC Data
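The deviations for the Raleigh data can be recomputed with a short Python sketch that follows the three steps described earlier: subtract the mean, square, and sum. The variable names are illustrative, not from the slides.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
raleigh = [39, 42, 50, 59, 67, 74, 78, 77, 71, 60, 51, 43]

# Step 0: the sample mean.
mean = sum(raleigh) / len(raleigh)

# Step 1: deviation of each value from the mean.
deviations = [x - mean for x in raleigh]

# Step 2: square each deviation so negatives and positives cannot cancel.
squared = [d ** 2 for d in deviations]

# Step 3: add the squared deviations.
total_squared = sum(squared)

print(f"{'Month':<6}{'x':>4}{'x - mean':>10}{'(x - mean)^2':>14}")
for month, x, d, sq in zip(months, raleigh, deviations, squared):
    print(f"{month:<6}{x:>4}{d:>10.2f}{sq:>14.4f}")

print(f"Mean:                      {mean:.2f}")
print(f"Sum of deviations:         {sum(deviations):.2f}")
print(f"Sum of squared deviations: {total_squared:.2f}")
```

The plain deviations sum to zero, which is why they are squared before being added.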
13. The Sample Variance
- The sample variance is the sum of the squared deviations from the mean divided by (n - 1) and is symbolized s²
- We divide by (n - 1) to get the average (not total) spread of the data about the mean
- The units of the variance are the square of the units of the data itself
- In our example the units of the variance are degrees Fahrenheit squared.
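Written as a formula, with $x$ a data value, $\bar{x}$ the sample mean and $n$ the number of observations, the definition reads as follows. The worked value uses the 12 Raleigh temperatures from the example, whose squared deviations from the mean of 59.25 sum to 2208.25.

```latex
s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}
\qquad \text{Raleigh:} \quad
s^2 = \frac{2208.25}{12 - 1} = 200.75 \ \text{(degrees Fahrenheit)}^2
```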
14. The Sample Standard Deviation
- The sample standard deviation is the square root of the variance and is symbolized s
- The sample standard deviation has the same units as the original data
- The sample standard deviation is the most stable measure of variability (varies least from sample to sample)
15. The Calculations!
16. Calculate and Graph the Following
- Five number summary for the San Francisco data
- Five number summary for the Raleigh data
- Boxplots (on the same graph) of both data sets
- Comment on the median temperature for both cities
- What do you conclude about the variability of the temperatures in both cities?
17. Five Number Summary
- Raleigh
- Min = 39, Q1 = 44.75, Med = 59.5, Q3 = 73.25, Max = 78, IQR = 28.5
- San Francisco
- Min = 49, Q1 = 52.25, Med = 57, Q3 = 62.75, Max = 65, IQR = 10.5
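These summaries can be reproduced with a short Python sketch. It assumes the (n + 1)-position quartile rule (`method="exclusive"` in `statistics.quantiles`), which matches the quartiles shown here; the slides themselves do not say which rule was used. The function name `five_number_summary` is illustrative.

```python
from statistics import quantiles

raleigh = [39, 42, 50, 59, 67, 74, 78, 77, 71, 60, 51, 43]
san_francisco = [49, 52, 53, 56, 58, 62, 63, 64, 65, 61, 55, 49]

def five_number_summary(values):
    """Return min, Q1, median, Q3 and max (quartile rule is an assumption)."""
    q1, med, q3 = quantiles(values, n=4, method="exclusive")
    return {"min": min(values), "q1": q1, "median": med,
            "q3": q3, "max": max(values)}

for city, temps in [("Raleigh", raleigh), ("San Francisco", san_francisco)]:
    summary = five_number_summary(temps)
    iqr = summary["q3"] - summary["q1"]
    print(f"{city}: {summary}, IQR = {iqr}")
```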
18. Boxplot
19. Calculate the Following
- The mean temperature for both cities
- Comment on the mean temperatures
- The standard deviation for the San Francisco data
- What do you conclude about the variability in temperature in these two cities?
- Where would you rather live and why?
20. Means and Standard Deviations
- Raleigh
- Mean = 59.25 degrees
- Standard Deviation = 14.17 degrees
- San Francisco
- Mean = 57.25 degrees
- Standard Deviation = 5.75 degrees
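A quick way to check these values is Python's standard `statistics` module, whose `stdev` function divides by (n - 1), matching the sample standard deviation defined earlier. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import mean, stdev

raleigh = [39, 42, 50, 59, 67, 74, 78, 77, 71, 60, 51, 43]
san_francisco = [49, 52, 53, 56, 58, 62, 63, 64, 65, 61, 55, 49]

for city, temps in [("Raleigh", raleigh), ("San Francisco", san_francisco)]:
    # stdev() is the sample standard deviation (divides by n - 1).
    print(f"{city}: mean = {mean(temps):.2f} degrees, "
          f"standard deviation = {stdev(temps):.2f} degrees")
```

The two means differ by only 2 degrees, while the Raleigh standard deviation is more than twice that of San Francisco.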
21. Visualizing the Standard Deviation
- Since the standard deviation measures how spread out the data is about the sample mean, it is often useful to quote the percent of observations within one, two and three standard deviations of the mean
22. Interpreting the Standard Deviation
- Data represents the number of weeks it took 100 graduating seniors to find their first full-time job
- 2 2 4 4 5 5 5 6 6 6 6 6 7 7
- 7 7 7 8 8 8 8 8 8 8 8 9 9 9
- 9 9 9 9 10 10 10 10 10 10 10 10 10 10
- 10 10 10 11 11 11 11 11 11 11 11 12 12 12
- 12 12 12 12 12 12 12 13 13 13 13 13 13 13
- 13 13 13 13 14 14 14 14 14 14 14 14 14 14
- 15 16 16 16 17 17 17 17 17 18 7 9 10 12
- 13 15
23. Mean = 10.73, St. Dev. = 3.45
- Mean - St. Dev. = 7.28, Mean + St. Dev. = 14.18
- 71/100 = 71% of the data is within one standard deviation of the mean
24. Mean - 2(St. Dev.) = 3.83, Mean + 2(St. Dev.) = 17.63
- 97/100 = 97% of the data is within 2 standard deviations of the mean
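The counts above can be checked with a short Python sketch. The tally below was compiled by hand from the 100 values as they appear in the data listing, so the mean and standard deviation computed from it may differ in the last digit from the 10.73 and 3.45 quoted on the slide. The helper `count_within` is an illustrative name.

```python
from statistics import mean, stdev

# Tally of the 100 listed values: {weeks to first job: number of seniors}.
tally = {2: 2, 4: 2, 5: 3, 6: 5, 7: 6, 8: 8, 9: 8, 10: 14, 11: 8,
         12: 11, 13: 12, 14: 10, 15: 2, 16: 3, 17: 5, 18: 1}
weeks = [value for value, count in tally.items() for _ in range(count)]

m = mean(weeks)
s = stdev(weeks)  # sample standard deviation (divides by n - 1)

def count_within(values, center, spread, k):
    """Count values lying within k * spread of center (inclusive)."""
    low, high = center - k * spread, center + k * spread
    return sum(low <= v <= high for v in values)

print(f"n = {len(weeks)}, mean = {m:.2f}, st. dev. = {s:.2f}")
for k in (1, 2, 3):
    inside = count_within(weeks, m, s, k)
    print(f"within {k} st. dev.: {inside}/{len(weeks)} "
          f"= {100 * inside / len(weeks):.0f}%")
```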
25. Empirical Rule
- Sample data that approximately satisfies the following conditions is said to follow the Empirical Rule:
- Has a symmetrical, bell-shaped distribution
- 68% of the data is within one standard deviation of the mean
- 95% of the data is within two standard deviations of the mean