Title: Summary Statistics: Mean, Median, Standard Deviation, and More
1Summary Statistics: Mean, Median, Standard Deviation, and More
- Seek simplicity and then distrust it.
- (Dr. Monticino)
2Assignment Sheet
- Read Chapter 4
- Homework 3 Due Wednesday Feb. 9th
- Chapter 4
- exercise set A 1-6, 8, 9
- exercise set C 1, 2, 3
- exercise set D 1 - 4, 8,
- exercise set E 4, 5, 7, 8, 11, 12
- Quiz 2 will be over Chapter 2
- Quiz 3 on basic summary statistic calculations: mean, median, standard deviation, IQR, SD units
- If you'd like a copy of notes, email me
3Overview
- Measures of central tendency
- Mean (average)
- Median
- Outliers
- Measures of dispersion
- Standard deviation
- Standard deviation units
- Range
- IQR
- Review and applications
4Central Tendency
- Measures of central tendency (mean and median) are useful in obtaining a single-number summary of a data set
- Mean is the arithmetic average
- Median is a value such that at least 50% of the data is less than or equal to it and at least 50% is greater than or equal to it
5Example
- Calculate mean and median for the following data sets
- Data set 1: 37 44 55 78 100 111 125 151 161
- Data set 2: 37 44 55 69 90 120 125 152 157 161
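As a quick check of this example, the sketch below computes both statistics with Python's standard `statistics` module. It assumes the listed values form two data sets, split where the values restart at 37 (nine values, then ten); that split is a reading of the run-together list, not something the slide states.

```python
import statistics

# Assumed split of the slide's values into two data sets.
data_set_1 = [37, 44, 55, 78, 100, 111, 125, 151, 161]
data_set_2 = [37, 44, 55, 69, 90, 120, 125, 152, 157, 161]

for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    mean = statistics.mean(data)      # arithmetic average
    median = statistics.median(data)  # middle value, or average of the two middle values
    print(f"{name}: mean = {mean:.2f}, median = {median}")
```

Under that split, the first set has an odd number of values, so its median is the fifth value (100). The second has an even number, so its median is the average of the two middle values, 90 and 120.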
6Outliers and Robustness
- Mean can be sensitive to outliers in data set
- Not robust to data collection errors or a single unusual measurement
- Blind calculation can give misleading results
mean = 170.35
median = 151
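The data behind these figures is not reproduced in the slides, so the sketch below uses a small made-up sample (hypothetical values) to show the same effect: one mistyped measurement moves the mean a long way while the median barely changes.

```python
import statistics

# Hypothetical measurements; not the data behind the slide's figures.
clean = [148, 149, 150, 151, 152]
# Same sample with one value mistyped (1500 instead of 150).
with_error = [148, 149, 1500, 151, 152]

mean_clean = statistics.mean(clean)
mean_error = statistics.mean(with_error)
median_clean = statistics.median(clean)
median_error = statistics.median(with_error)

print(f"mean:   {mean_clean} -> {mean_error}")
print(f"median: {median_clean} -> {median_error}")
```

In this sample the mean jumps from 150 to 420, while the median only moves from 150 to 151.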
7Outliers and Robustness
- Always a good idea to plot data in the order that it was collected
- Spot outliers
- Identify possible data collection errors
mean without outliers = 150.14
median without outliers = 149
8Outliers and Robustness
- Median can be a more robust measure of central tendency than mean
- Life expectancy
- U.S. males: mean 80.1, median 83
- U.S. females: mean 84.3, median 87
- Household income
- Mean $51,855, median $38,885
- 0.3% account for 12% of income
- Net worth
- Mean $282,500, median $71,600
9Which Central Tendency Measure?
- Calculate mean, median and mode
- Plot data
- Create histogram to inspect mode(s)
- Do not delete data points
- If you analyze data without outliers, report and explain the outliers
- Many statistical studies involve studying the difference between population means
- Reporting the mean may be dictated by the objective of the study
10Which Central Tendency Measure?
- If data is
- Unimodal
- Fairly symmetric
- Mean is approximately equal to median
- Then mean is a reasonable measure of central
tendency
11Which Central Tendency Measure?
- If data is
- Unimodal
- Asymmetric
- Then report both median and mean
- Difference between mean and median indicates asymmetry
- Median will usually be the more reasonable summary of central tendency
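A minimal sketch of how the gap between mean and median signals asymmetry, using made-up data (the values and the helper name `mean_minus_median` are illustrative assumptions). A long right tail pulls the mean above the median; mirroring the data reverses the sign.

```python
import statistics

# Hypothetical right-skewed data: most values small, a long tail to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]
# Mirror image of the same data: long tail to the left.
left_skewed = [-x for x in right_skewed]

def mean_minus_median(data):
    """Positive when the mean exceeds the median, negative when it is below."""
    return statistics.mean(data) - statistics.median(data)

print("right-skewed:", mean_minus_median(right_skewed))
print("left-skewed: ", mean_minus_median(left_skewed))
```

Here the right-skewed sample has mean 5 and median 3, so the difference is positive. The mirrored sample gives the same size of difference with the opposite sign.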
12Which Central Tendency Measure?
- If data is
- Not unimodal
- Then report the modes and, cautiously, the mean and median
- Analyze data for differences in groups around the modes
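To illustrate why mean and median need caution when data is not unimodal, the sketch below uses a made-up data set with two clusters (hypothetical values). It lists the modes with `statistics.multimode` and prints a rough text histogram built from `collections.Counter`.

```python
import statistics
from collections import Counter

# Hypothetical data with two clusters (illustrative only).
data = [2, 3, 3, 3, 4, 8, 9, 9, 9, 10]

modes = statistics.multimode(data)  # every value tied for most frequent
mean = statistics.mean(data)
median = statistics.median(data)
print(f"modes = {modes}, mean = {mean}, median = {median}")

# Text histogram: one row per integer value, one star per occurrence.
counts = Counter(data)
for value in range(min(data), max(data) + 1):
    print(f"{value:>2} | {'*' * counts[value]}")
```

In this sample the modes are 3 and 9, while the mean and median are both 6, a value that sits in the empty gap between the two clusters.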
13Limitations of Central Tendency
- Any single-number summary may not adequately represent data and may hide differences between data sets
- Example
14Measures of Dispersion
- Including an additional statistic, a measure of dispersion, can help distinguish between data sets which have similar central tendencies
- Range = max - min
- Standard deviation = root mean square difference from the mean
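The two definitions can be written out directly. This is a sketch on made-up data; the function names `data_range` and `rms_sd` are illustrative. Read literally, "root mean square difference from the mean" divides by the number of values n, which matches `statistics.pstdev`, whereas `statistics.stdev` divides by n - 1 and gives a slightly larger number.

```python
import math
import statistics

def data_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)

def rms_sd(data):
    """Standard deviation as the root of the mean of squared differences from the mean."""
    mean = sum(data) / len(data)
    mean_square = sum((x - mean) ** 2 for x in data) / len(data)
    return math.sqrt(mean_square)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5

print("range:", data_range(data))
print("SD (divide by n):", rms_sd(data))
print("statistics.pstdev:", statistics.pstdev(data))
print("statistics.stdev (divide by n - 1):", statistics.stdev(data))
```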
15Measures of Dispersion
16Measures of Dispersion
- Examples
- Standard deviation
17Measures of Dispersion
- Both range and standard deviation can be sensitive to outliers
- However, many data sets can be characterized by mean and SD
- If the values of the data set are distributed in an approximately bell shape, then
- About 68% of the data will be within 1 SD unit of the mean, about 95% will be within 2 SD units, and nearly all will be within 3 SD units
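These percentages can be checked against an exact normal curve using `statistics.NormalDist` from the Python standard library (Python 3.8+). This is a sketch for the idealized bell shape only; real data that is merely "approximately" bell shaped will match less closely. The helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k SD units of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD unit(s): {fraction_within(k):.1%}")
```

For the exact normal curve the fractions come out to roughly 68.3%, 95.4% and 99.7%.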
18Measures of Dispersion
- Example
- Suppose data set has mean 35 and SD 7
- How many SD units away from the mean is 42?
- How many SD units away from the mean is 38?
- How many SD units away from the mean is 30?
- Assuming a bell-shaped distribution, 95% of the data are between what two values?
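The arithmetic for this example can be sketched as follows. The helper name `sd_units` is illustrative, and the 95% interval uses the "within 2 SD units" rule of thumb for bell-shaped data.

```python
mean, sd = 35, 7

def sd_units(value, mean, sd):
    """Signed distance from the mean, measured in SD units."""
    return (value - mean) / sd

for value in (42, 38, 30):
    print(f"{value} is {sd_units(value, mean, sd):+.2f} SD units from the mean")

# Rule of thumb: about 95% of bell-shaped data lies within 2 SD units of the mean.
low, high = mean - 2 * sd, mean + 2 * sd
print(f"about 95% between {low} and {high}")
```

A negative result means the value lies below the mean; 30 is about 0.71 SD units below 35.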
19Measures of Dispersion
- A robust measure of dispersion is the interquartile range
- Q1 = value such that 25% of the data is less than it and 75% is greater
- Q3 = value such that 75% of the data is less than it and 25% is greater
- IQR = Q3 - Q1
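Several conventions exist for locating Q1 and Q3 in a finite data set, and the slides do not say which one is intended. The sketch below assumes one common classroom convention, taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data, and prints the results of `statistics.quantiles` for comparison. The data and the function name `quartiles_by_halves` are hypothetical.

```python
import statistics

def quartiles_by_halves(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.

    With an odd number of values, the middle value is left out of both halves.
    """
    if len(data) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    return q1, q3

data = [4, 7, 8, 10, 12, 15, 21, 30]  # hypothetical values
q1, q3 = quartiles_by_halves(data)
iqr = q3 - q1
print(f"halves convention: Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")

# Library conventions for comparison; each returns [Q1, median, Q3].
print("exclusive:", statistics.quantiles(data, n=4, method="exclusive"))
print("inclusive:", statistics.quantiles(data, n=4, method="inclusive"))
```

The conventions can disagree slightly on small data sets, so it is worth stating which one was used when reporting an IQR.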
20Example
- Calculate range, standard deviation and interquartile range for the following data sets
- Data set 1: 1 98 99 100 100 100 102 102 104 107
- Data set 2: 95 98 99 100 100 100 102 102 104 107
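A sketch of this calculation, under two assumptions: the listed values form two data sets of ten values each (the first starting with 1, the second with 95), and quartiles are taken as medians of the lower and upper halves. The SD divides by n (`statistics.pstdev`), following the root-mean-square definition. The function name `summarize` is illustrative.

```python
import statistics

# Assumed split of the slide's values into two data sets.
data_set_1 = [1, 98, 99, 100, 100, 100, 102, 102, 104, 107]
data_set_2 = [95, 98, 99, 100, 100, 100, 102, 102, 104, 107]

def summarize(data):
    """Range, SD (dividing by n) and IQR (medians-of-halves convention)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    return {
        "range": ordered[-1] - ordered[0],
        "sd": statistics.pstdev(ordered),
        "iqr": q3 - q1,
    }

for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    stats = summarize(data)
    print(f"{name}: range = {stats['range']}, "
          f"SD = {stats['sd']:.2f}, IQR = {stats['iqr']}")
```

Under these assumptions the two sets differ only in their smallest value, which changes the range and SD substantially but leaves the IQR the same.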
21Assignment, Discussion, Evaluation
- Read Chapter 4
- Discussion problems
- Chapter 4
- exercise set A 1-6, 8, 9
- exercise set C 1, 2, 3
- exercise set D 1 - 4, 8,
- exercise set E 4, 5, 7, 8, 11, 12
- Quiz 3 on basic summary statistic calculations: mean, median, standard deviation, IQR, SD units
22Review of Definitions
- Measures of central tendency
- Mean (average)
- Median
- If odd number of data points, middle value
- If even number of data points, average of two
middle values
23Question and Examples
- Can mean be larger than median? Can median be larger than mean?
- Give examples
- Can mean be a negative number? Can the median?
- The average height of three men is 69 inches. Two other men, of heights 73 and 70 inches, enter the room. What is the average height of all five men?
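One way to approach the height question is to recover the total from the known average and then add the new values. A minimal sketch, with the illustrative helper name `combined_mean`:

```python
def combined_mean(mean_so_far, count_so_far, new_values):
    """Mean of a group after new values join, given only the old mean and count."""
    total = mean_so_far * count_so_far + sum(new_values)
    return total / (count_so_far + len(new_values))

# Three men averaging 69 inches, then two more of heights 73 and 70 inches.
print(combined_mean(69, 3, [73, 70]))  # 70.0
```

The individual heights of the first three men are not needed, only their total, 3 x 69 = 207 inches.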
24Questions and Examples
- The average of a data set is 30.
- A value of 8 is added to each element in the data set. What is the new average?
- Each element of the data set is increased by 5. What is the new average?
- Suppose that data consists of only 1s and 0s
- What does the average represent?
- Application: an experiment is performed and only two outcomes can occur
- Label one type of outcome 1 and the other 0
- For the data set 31, 45, 72, 86, 62, 78, 50, find the median, Q1 (25th percentile) and Q3 (75th percentile)
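The first few questions can be explored numerically. The sketch below uses a made-up data set with mean 30 and a made-up list of 0/1 outcomes; both are hypothetical. It shows adding a constant, and, as a separate assumption in case "increased by 5" was meant as a 5% increase, multiplying by a constant.

```python
import statistics

data = [22, 28, 30, 34, 36]  # hypothetical data with mean 30

shifted = [x + 8 for x in data]    # add 8 to every element
scaled = [x * 1.05 for x in data]  # assumed reading: increase every element by 5%

print("original mean:", statistics.mean(data))
print("after adding 8:", statistics.mean(shifted))
print("after a 5% increase:", round(statistics.mean(scaled), 2))

# Hypothetical experiment with two outcomes, labeled 1 and 0.
outcomes = [1, 0, 1, 1, 0, 1, 0, 1]
proportion_of_ones = sum(outcomes) / len(outcomes)
print("mean of 0/1 data:", statistics.mean(outcomes))
print("fraction of 1s:  ", proportion_of_ones)
```

Adding a constant shifts the mean by that constant, multiplying by a constant multiplies the mean by it, and the mean of 0/1 data equals the fraction of outcomes labeled 1.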
25Review of Definitions
- Measures of dispersion
- Standard deviation
- Range = max - min
- IQR = Q3 - Q1
26Questions and Examples
- Can the SD be negative? Can the range? Can the IQR?
- Can the SD equal 0?
- For the data set 3, 1, 5, 2, 1, 6 find the SD, range and IQR
- The average weight for U.S. men is 175 lbs and the standard deviation is 20 lbs
- If a man weighs 190 lbs, how many standard deviation units away from the mean weight is he?
- Assuming a normal (bell-shaped) distribution for weight, ninety-five percent of U.S. men weigh between what two values?
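A sketch for the small data set in these questions, assuming the SD divides by n and quartiles are medians of the lower and upper halves (other quartile conventions may give a different IQR). A constant data set is included as a hypothetical example to show when the SD equals 0.

```python
import math
import statistics

data = [3, 1, 5, 2, 1, 6]
ordered = sorted(data)
half = len(ordered) // 2

value_range = ordered[-1] - ordered[0]
sd = statistics.pstdev(data)  # root mean square difference from the mean
iqr = statistics.median(ordered[-half:]) - statistics.median(ordered[:half])
print(f"range = {value_range}, SD = {sd:.3f}, IQR = {iqr}")

# Hypothetical data set in which every value is the same.
constant = [4, 4, 4, 4]
print("SD of constant data:", statistics.pstdev(constant))
```

Each of the three measures is built from a square root, or from a larger value minus a smaller one, so none can be negative. All are 0 for constant data.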
27Questions and Examples
- The average of a data set is 23 and the standard deviation is 5
- A value of 8 is added to each element in the data set. What is the new standard deviation?
- Each element of the data set is increased by 5. What is the new standard deviation?
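These questions can be checked on a made-up data set with mean 23 and SD 5 (hypothetical values, with the SD dividing by n). The multiplication case is included as an assumption, in case "increased by 5" was meant as a 5% increase.

```python
import math
import statistics

data = [18, 18, 28, 28]  # hypothetical data with mean 23 and SD 5

shifted = [x + 8 for x in data]    # add 8 to every element
scaled = [x * 1.05 for x in data]  # assumed reading: increase every element by 5%

print("original SD:", statistics.pstdev(data))
print("after adding 8:", statistics.pstdev(shifted))
print("after a 5% increase:", round(statistics.pstdev(scaled), 2))
```

Adding a constant moves every value and the mean by the same amount, so the differences from the mean and the SD are unchanged. Multiplying by a constant multiplies every difference from the mean, and hence the SD, by that constant.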