Title: Summarizing Data Numerically
1Chapter 5
- Summarizing Data Numerically
- http://www.fotosearch.com/
2Wendall Zurkowitz, slave to the waffle light.
3Three Things Wendall would like to know
- Will every waffle take the same amount of time to
cook?
- What is the average amount of time to cook a
waffle?
- How much variability is there in the cooking time
of a waffle?
- We cover the average in this section, variability
in the next.
4Will every waffle take the same amount of time to
cook?
Two things Wendall would like to know: What is the
average amount of time to cook, and how much
variability is there in the cooking time? We cover
the average in this section, variability in the next.
5How to Describe Data
- What is the Shape?
- What is the Center?
- What is the Spread in the Data?
- Are there any Outliers?
6Measurement of Center
- If we take a sample of n values and calculate
what we have come to know as the average, we have
calculated the arithmetic mean of the data.
- This measure of center is a statistic since it
comes from a sample.
7The Sample Mean
- The sample mean is a statistic. Its purpose is to
estimate the parameter, the population mean.
- The sample mean is denoted by $\bar{x}$.
8The Population Mean
- The population mean is a parameter. The
population mean is denoted by $\mu$.
9Example
- Let's find the sample mean of the AGE data. We'll
do it two ways, the hard way and the easy way.
10TAI p302 Is the mean always the center?
- Suppose that a sample of 100 is obtained from a
population.
- Can the mean be larger than the maximum value or
smaller than the minimum value?
- Can the mean be the same as the max or min value?
- Can the mean be the exact middle point of the
distribution?
- Can the mean not be equal to any of the data
collected?
11LDI 5.1 p303 A Mean is not Always Representative
- Kim's quiz scores are 7, 98, 25, 19 and 26.
- Calculate Kim's mean quiz score and explain why
it doesn't do a very good job of summarizing the
scores.
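A minimal Python sketch of this exercise, using only the five scores listed on the slide. The variable names are illustrative, not from the text. It computes the mean and then counts how many scores fall below it, which is one way to see why a single large score makes the mean unrepresentative.

```python
from statistics import mean

# Kim's quiz scores, as listed on the slide.
scores = [7, 98, 25, 19, 26]

# Arithmetic mean: the sum of the scores divided by the number of scores.
kim_mean = mean(scores)

# Scores that fall below the mean.
below_mean = [s for s in scores if s < kim_mean]

print(f"mean = {kim_mean}")
print(f"{len(below_mean)} of {len(scores)} scores are below the mean")
```

Four of the five scores sit below the mean because the single score of 98 pulls the mean upward.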
12LDI 5.2 p303 Combining Means
- We have seven students. The mean score for three
of these students is 54 and the mean score for
the four other students is 76.
- What is the mean score for all seven students?
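One way to check the combined mean is to recover each group's total from its size and mean, then divide by the overall count. The helper name `combined_mean` below is hypothetical, introduced only for this sketch.

```python
def combined_mean(groups):
    """Overall mean from a list of (group_size, group_mean) pairs."""
    # size * mean recovers each group's sum of scores.
    total = sum(size * group_mean for size, group_mean in groups)
    count = sum(size for size, _ in groups)
    return total / count

# Three students with mean 54, four students with mean 76.
overall = combined_mean([(3, 54), (4, 76)])

# Simply averaging the two means ignores the unequal group sizes.
naive = (54 + 76) / 2

print(f"combined mean = {overall:.2f}, average of the two means = {naive}")
```

The combined mean is weighted toward the larger group, so it differs from the plain average of 54 and 76.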
13The Median!
- The median of a set of n observations, ordered from
smallest to largest, is a value such that at
least half of the observations are less than or
equal to that value and at least half of the
observations are greater than or equal to that
value.
14Find the Median of the AGE data
- The Hard way
- The Easy way
15- LDI 5.3 Median Number of Children per Household
- Find the median number of children in a
household from this sample of 10 households, that
is, find the median of
- Observation Number: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
- Number of Children: 2, 3, 0, 1, 4, 0, 3, 0, 1, 2
- (a) Order the observations from smallest to
largest
- (b) Calculate (n+1)/2 _________________
- (c) Median ______________
- (d) What happens to the median if the fifth
observation in the first list was incorrectly
recorded as 40 instead of 4?
- (e) What happens to the median if the third
observation in the first list was incorrectly
recorded as -20 instead of 0?
- Note: The median is resistant, that is, it does
not change, or changes very little, in response
to extreme observations.
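The steps of this exercise can be sketched in Python as follows. The function `median_by_position` is a hypothetical helper that applies the (n+1)/2 position rule to the ordered data. The two "incorrectly recorded" lists are built from the slide's data by changing only the fifth or third observation.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2               # 1-based position of the median
    if position.is_integer():            # odd n: a single middle value
        return ordered[int(position) - 1]
    # even n: average the two values on either side of the position
    below = ordered[int(position) - 1]
    above = ordered[int(position)]
    return (below + above) / 2

children = [2, 3, 0, 1, 4, 0, 3, 0, 1, 2]
with_40 = children[:4] + [40] + children[5:]          # fifth observation 4 -> 40
with_minus_20 = children[:2] + [-20] + children[3:]   # third observation 0 -> -20

for label, data in [("original", children),
                    ("fifth recorded as 40", with_40),
                    ("third recorded as -20", with_minus_20)]:
    data_mean = sum(data) / len(data)
    print(f"{label}: median = {median_by_position(data)}, mean = {data_mean}")
```

In all three lists the median stays the same while the mean moves, which illustrates the resistance noted on the slide.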
16The Mode
- To find the middle or measure of center of
categorical (qualitative) data we are forced to
use the Mode. It can also be used with numerical
(quantitative) data, but it is not a good measure
of center.
- The mode of a set of data is the most frequently
occurring value, the value with the highest
frequency.
17Example
- Find the mode for the following data:
- (a) 1, 2, 3, 2, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6, 7
- (b) 1, 4, 3, 4, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6
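A short Python sketch for checking both parts. It counts frequencies with `collections.Counter` and reports every value tied for the highest frequency, since a data set can have more than one mode. The helper name `modes` is an assumption made for this example.

```python
from collections import Counter

def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data_a = [1, 2, 3, 2, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6, 7]
data_b = [1, 4, 3, 4, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6]

print("(a) mode(s):", modes(data_a))
print("(b) mode(s):", modes(data_b))
```

Part (b) has two values tied for the highest frequency, so that data set is bimodal.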
18Consider the following data: 2, 2, 2, 20, 34, 45,
210. What are the mode, median, mean?
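As a quick check, the three measures can be computed with Python's standard `statistics` module. This is only a sketch for comparing the measures on the seven values from the slide.

```python
from statistics import mean, median, mode

data = [2, 2, 2, 20, 34, 45, 210]

data_mode = mode(data)      # most frequent value
data_median = median(data)  # middle value of the ordered data
data_mean = mean(data)      # sum divided by count

print(f"mode = {data_mode}, median = {data_median}, mean = {data_mean}")
```

The three measures land far apart here: the repeated 2s set the mode, while the single value 210 pulls the mean well above the median.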
20- Let's Do It! 5.5 Attend Graduate School? When do
undergraduates make the decision to continue
their education and attend graduate school? An
undergraduate attending a four-year college with
a semester system (versus a quarter system) would
have a total of eight semesters of classes
(excluding any summer sessions). A sample of 18
senior undergraduates who would be graduating and
attending graduate school were asked the
following question: "In which semester (1, 2, 3,
4, 5, 6, 7, or 8) did you decide you would
continue your education and attend graduate
school?" The responses are given below.
- (a) Construct a frequency plot of these data.
- (b) Obtain the following sample statistics for
these data. Minimum ___________ Maximum
______________ Median _____________ Mean
_____________
- (c) How do the two measures of center, the median
and the mean, compare? Select one:
i. Median > Mean ii. Median < Mean iii. Median = Mean
21LDI 5.6
- Is this distribution symmetric?
- What is the median?
- What is the mean?
22LDI 5.7 p310 Good vs. Poor measure of Center
- Draw a distribution for which:
- The mean would be a good measure of the center of
a distribution.
- The mean would be a poor measure of the center of
a distribution.
- The median would be a good measure of the center
of a distribution.
- The median would be a poor measure of the center
of a distribution.
- The mode would be a good measure of the center of
a distribution.
- The mode would be a poor measure of the center of
a distribution.
23Measures of Variation
- Now that we can measure the center of a
distribution, we need to know something about the
spread or variability of the data.
- There are (as with the average) several popular
ways of doing this measurement.
24Why Measure Variation?
- Consider the following plots
- They both have a mean of 60, but are they the same
distribution?
25The Range
- Our first crude estimate of the variation of a
data set is the range, which is simply max - min.
- Again, this measure is very limited in its
ability to describe the spread in a data set.
26Example
- Consider these distributions
- They have the same range of 30 - 20, or 10, yet
they have very different variation.
27Quartiles
- Recall that the median is the middle number of a
distribution. This means that 50% of the data
will fall below this value. We can chop the data
into four equal pieces by finding the median of
the lower 50% and the upper 50%. These values are
called the Quartiles.
28Find the Quartiles for AGE
- Q1 is the first quartile, 25% of the data fall
below this value and 75% above it.
- MED is the second quartile, 50% of the data fall
below this value and 50% above it.
- Q3 is the third quartile, 75% of the data fall
below this value and 25% fall above it.
29InterQuartile Range
- The interquartile range or IQR is simply the
difference between Q3 and Q1: IQR = Q3 - Q1
- Find the IQR for the AGE data.
30 5-Number Summary and Boxplots
- The 5-number summary is simply: Min, Q1, Med, Q3, Max
- A Boxplot is a plot of these points.
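The quartiles, IQR, and 5-number summary described above can be sketched in Python. The AGE data are not reproduced in these slides, so the example reuses the seven values from the earlier mode/median/mean slide. The function name `five_number_summary` is hypothetical. The sketch assumes the median is left out of both halves when n is odd, which is one common convention. Other software may use a different quartile rule and give slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """(Min, Q1, Med, Q3, Max) using the median-of-halves quartile rule.

    Assumption: when n is odd, the median itself is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # values below the median position
    upper = ordered[half + n % 2:]     # values above it (skips the median if n is odd)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

data = [2, 2, 2, 20, 34, 45, 210]
summary = five_number_summary(data)
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1

print("5-number summary:", summary)
print("IQR:", iqr)
```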
31Let's Do It
32 1.5xIQR Rule
- Any value of the data that falls more than 1.5xIQR
above Q3 or more than 1.5xIQR below Q1 is
considered an outlier.
- Do modified boxplot of AGE data by hand
- Do boxplots on TI-83
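A small Python sketch of the 1.5xIQR rule. The function name `iqr_outliers` is hypothetical, and the quartiles are passed in rather than computed, since quartile conventions vary. The data are the seven values from the earlier mode/median/mean slide, with Q1 = 2 and Q3 = 45 taken as the medians of the lower and upper halves.

```python
def iqr_outliers(values, q1, q3):
    """Return (lower fence, upper fence, flagged values) under the 1.5 x IQR rule."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # A value is flagged only if it lies beyond a fence.
    flagged = [v for v in values if v < low_fence or v > high_fence]
    return low_fence, high_fence, flagged

data = [2, 2, 2, 20, 34, 45, 210]
low, high, flagged = iqr_outliers(data, q1=2, q3=45)

print(f"fences: {low} to {high}")
print("outliers:", flagged)
```

In a modified boxplot the flagged values are plotted as individual points, and the whiskers stop at the most extreme values inside the fences.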
33Let's Do It
- Page 320 LDI 5.9
- Page 320 LDI 5.10
- Page 321 LDI 5.11
- Page 325 LDI 5.12
- Careful: Boxplots don't fully show the shape of
the distribution!
34Gismo Products
35I could have sworn you said eleven
steps.
36Standard Deviation
- We want a way to measure spread based upon the
mean. To do this we will find the average
distance of our data from the mean. Well,
actually we find the sum of the squared
deviations, divide by n - 1, and then take
the square root.
37Sample Standard Deviation Formula
- $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
- The TI-83 calculates the sample standard deviation
of data.
38Population Standard Deviation
- $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
- The TI-83 calculates the population standard
deviation of data.
39Find the Stan. Dev.
- Let's do this small data set by hand: 1, 4, 2, 3,
9, 7, 2, 4, 5, 1, 8, 8, 7
- Let's verify our result on the TI-83
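The by-hand calculation can be mirrored step by step in Python. This sketch follows the recipe from the Standard Deviation slide (squared deviations, divide by n - 1, square root). It then cross-checks against the standard library's `statistics.stdev` and `statistics.pstdev` in place of the calculator.

```python
import math
import statistics

data = [1, 4, 2, 3, 9, 7, 2, 4, 5, 1, 8, 8, 7]
n = len(data)

# Step 1: the sample mean.
x_bar = sum(data) / n

# Step 2: squared deviations from the mean.
squared_devs = [(x - x_bar) ** 2 for x in data]

# Step 3: divide the sum by n - 1 (sample) or by n (population), then take the root.
s = math.sqrt(sum(squared_devs) / (n - 1))
sigma = math.sqrt(sum(squared_devs) / n)

print(f"n = {n}, mean = {x_bar:.4f}")
print(f"sample SD = {s:.4f}, population SD = {sigma:.4f}")

# Cross-check against the library implementations.
print(math.isclose(s, statistics.stdev(data)),
      math.isclose(sigma, statistics.pstdev(data)))
```

The sample value (dividing by n - 1) is always a little larger than the population value (dividing by n) for the same data.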
40Interpretation of SD
- The standard deviation is roughly the average
distance of the observations from the mean. The
more spread out the data are from the mean, the
larger the standard deviation will be.
- Since the standard deviation is a distance, it is
never negative (it is zero only when every
observation equals the mean), and it carries the
same units as the mean.
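41Same Means ($\bar{x} = 4$), Different Standard Deviations
[Figure: four frequency plots with the same mean and standard deviations s = 0, s = 0.8, s = 1.0, and s = 3.0]
- Standard deviation increases as the data get more
spread out.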
42LDI 5.13, p329 Increasing Spread
- Consider the following three data sets.
- I: 20, 20, 20
- II: 18, 20, 22
- III: 17, 20, 23
- (a) Which data set will have the smallest
standard deviation?
- (b) Which data set will have the largest standard
deviation?
- (c) Find the standard deviation for each data
set and check your answers to (a) and (b).
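Part (c) can be checked with a few lines of Python. This is a sketch using the standard library's `statistics.stdev`, which computes the sample standard deviation (dividing by n - 1). The dictionary keys simply mirror the labels on the slide.

```python
from statistics import mean, stdev

data_sets = {
    "I": [20, 20, 20],
    "II": [18, 20, 22],
    "III": [17, 20, 23],
}

# Sample standard deviation of each data set.
spreads = {name: stdev(values) for name, values in data_sets.items()}

for name, values in data_sets.items():
    print(f"{name}: mean = {mean(values)}, s = {spreads[name]}")
```

All three sets share the same mean, so the differences in s come entirely from how far the values sit from 20.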
43Which Distribution has a larger standard
deviation?
44LDI 5.14, p331 What Type of Distribution?
45- Let's Do It! 5.15 Standard Deviation for Age
- Use the ages of the subjects from your class.
- (a) Find the standard deviation for these data.
- (b) Complete the sentence:
- On average, the ages of these subjects are about
_______ years from their mean of ____ years.
- (c) How many of the 20 subjects had ages within
one standard deviation of the mean?
- (d) How many of the 20 subjects had ages within
two standard deviations of the mean?
46Linear Transformations
- Linear transformations of data can be used to
change the units of data. For example, you
collect a set of temperature data in Celsius:
40, 41, 39, 41, 41, 40, 38
- Find the mean and standard deviation for these
data.
47What about Fahrenheit?
- Recall how to convert from Celsius to
Fahrenheit: $F = \frac{9}{5}C + 32$
- Convert our data using this formula, then find
the new mean and standard deviation.
48LDI 5.16, p338 A Transformation
- Data on number of children for 10 households in a
neighborhood: 2, 3, 0, 2, 1, 0, 3, 0, 1, 4
49Linear Transformation Rules
- If X represents the original values, $\bar{x}$ is the
average of the original values, and $s_x$ is the
standard deviation of the original values, and if
the new values are a linear transformation of X,
$Y = aX + b$, then the new mean is given by
$\bar{y} = a\bar{x} + b$
and the new standard deviation by
$s_y = |a| \, s_x$
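These rules can be checked numerically on the Celsius data from the earlier slide, taking a = 9/5 and b = 32 for the Celsius-to-Fahrenheit conversion. This Python sketch is an illustration, not a proof. It computes the mean and standard deviation of the converted values directly and compares them with what the rules predict.

```python
from statistics import mean, stdev

celsius = [40, 41, 39, 41, 41, 40, 38]
a, b = 9 / 5, 32                       # Fahrenheit = a * Celsius + b

# Transform every value, then summarize the new data directly.
fahrenheit = [a * c + b for c in celsius]
direct_mean = mean(fahrenheit)
direct_sd = stdev(fahrenheit)

# Apply the rules to the original summaries instead.
rule_mean = a * mean(celsius) + b      # new mean = a * (old mean) + b
rule_sd = abs(a) * stdev(celsius)      # new SD = |a| * (old SD); b has no effect

print(f"mean: direct = {direct_mean:.4f}, rule = {rule_mean:.4f}")
print(f"SD:   direct = {direct_sd:.4f}, rule = {rule_sd:.4f}")
```

Adding b shifts every value by the same amount, so it moves the mean but leaves the spread unchanged. Only the multiplier a rescales the standard deviation.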
50LDI 5.17 Standardization: A Special
Transformation p340
- Let's perform a special transformation of the
original data on the number of children in a
household: 2, 3, 0, 2, 1, 0, 3, 0, 1, 4
51Important Transformation
- We want to be able to standardize data to the
same scale so we can compare data that might be
in differing units. For example, compare SAT and
ACT scores or IQ scores from differing age groups.
52The Z score
- $z = \dfrac{x - \bar{x}}{s}$
53Examples
- Standardize the AGE data
- What are the mean and standard deviation for
these transformed data?
- Will this always happen? Why?
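The AGE data are not reproduced in these slides, so the following Python sketch standardizes the number-of-children data from LDI 5.17 instead. It assumes the usual sample z-score: subtract the sample mean and divide by the sample standard deviation. It then checks the mean and standard deviation of the transformed values.

```python
from statistics import mean, stdev

children = [2, 3, 0, 2, 1, 0, 3, 0, 1, 4]

x_bar = mean(children)
s = stdev(children)                    # sample standard deviation (n - 1)

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - x_bar) / s for x in children]

print([round(z, 2) for z in z_scores])
print(f"mean of z = {mean(z_scores):.6f}, SD of z = {stdev(z_scores):.6f}")
```

Standardizing is a linear transformation with multiplier 1/s and shift -x̄/s, so by the linear transformation rules the z-scores always have mean 0 and standard deviation 1, up to rounding error.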
54Chapter 5 Summary p344
- In this chapter we presented several different
measures of center and variability in order to
summarize data numerically. Standardization is a
useful transformation which will be used on all
data sets.