Summarizing Data Numerically - PowerPoint PPT Presentation

Provided by: mikeh85


1
Chapter 5
  • Summarizing Data Numerically
  • http://www.fotosearch.com/

2
Wendall Zurkowitz, slave to the waffle light.
3
Three Things Wendall would like to know
  • Will every waffle take the same amount of time to
    cook?
  • What is the average amount of time to cook a
    waffle?
  • How much variability is there in the cooking time
    of a waffle?
  • We cover the average in this section, variability
    in the next.

4
Will every waffle take the same amount of time to
cook? Two things Wendall would like to know: what
is the average amount of time to cook a waffle,
and how much variability is there in the cooking
time? We cover the average in this section,
variability in the next.
5
How to Describe Data
  • What is the Shape?
  • What is the Center?
  • What is the Spread in the Data?
  • Are there any Outliers?

6
Measurement of Center
  • If we take a sample of n values and calculate
    what we have come to know as the average, we have
    calculated the arithmetic mean of the data.
  • This measure of center is a statistic since it
    comes from a sample.

7
The Sample Mean
  • The sample mean is a statistic. The purpose for
    its existence is to estimate the parameter, the
    population mean.
  • The sample mean is denoted by $\bar{x}$ and is
    computed as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

8
The Population Mean
  • The population mean is a parameter. The
    population mean is denoted by $\mu$ and is
    computed as $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$,
    where N is the population size.

9
Example
  • Let's find the sample mean of the AGE data. We'll
    do it two ways, the hard way and the easy way.
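The AGE data themselves are not reproduced in this transcript, so the sketch below uses the LDI 5.3 children-per-household values as a stand-in. It assumes the "hard way" means adding the observations and dividing by n, and it uses Python's `statistics.mean` in place of the calculator for the "easy way".

```python
import statistics

# Stand-in data: number of children in 10 households (LDI 5.3), not the AGE data.
children = [2, 3, 0, 1, 4, 0, 3, 0, 1, 2]

def mean_by_hand(values):
    """The 'hard way': add every observation, then divide by n."""
    total = 0
    for x in values:
        total += x
    return total / len(values)

hard = mean_by_hand(children)
easy = statistics.mean(children)  # the 'easy way': let a library do the arithmetic
print(hard, easy)  # 1.6 1.6
```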

10
TAI p302 Is the mean always the center?
  • Suppose that a sample of 100 is obtained from a
    population.
  • Can the mean be larger than the maximum value or
    smaller than the minimum value?
  • Can the mean be the same as the max or min value?
  • Can the mean be the exact middle point of the
    distribution?
  • Can the mean not be equal to any of the data
    collected?

11
LDI 5.1 p303 A Mean is not Always Representative
  • Kim's quiz scores are 7, 98, 25, 19, and 26.
  • Calculate Kim's mean quiz score and explain why
    it doesn't do a very good job of summarizing the
    scores.
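A quick check of LDI 5.1: comparing the mean with the median and listing which scores actually exceed the mean shows how a single large value drags the mean away from the bulk of the data.

```python
import statistics

# Kim's five quiz scores from LDI 5.1
scores = [7, 98, 25, 19, 26]

mean_score = statistics.mean(scores)      # (7 + 98 + 25 + 19 + 26) / 5 = 175 / 5
median_score = statistics.median(scores)  # middle value of 7, 19, 25, 26, 98
above_mean = [s for s in scores if s > mean_score]

print(mean_score, median_score, above_mean)  # 35 25 [98]
```

Only one of the five scores lies above the mean of 35; the other four sit between 7 and 26, so the mean describes none of them well.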

12
LDI 5.2 p303 Combining Means
  • We have seven students. The mean score for three
    of these students is 54 and the mean score for
    the four other students is 76.
  • What is the mean score for all seven students?
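The combined mean is a weighted mean: each group mean is multiplied by its group size to recover that group's total score. The sketch below works LDI 5.2 this way and contrasts it with the tempting but incorrect shortcut of averaging the two group means.

```python
# LDI 5.2: (number of students, mean score) for each group, from the slide
groups = [(3, 54), (4, 76)]

# A mean times its group size recovers that group's total score.
total_score = sum(n * m for n, m in groups)  # 3*54 + 4*76 = 162 + 304 = 466
total_students = sum(n for n, _ in groups)   # 7
combined = total_score / total_students

# Averaging the two means ignores that the groups have different sizes.
naive = sum(m for _, m in groups) / len(groups)
print(round(combined, 2), naive)  # 66.57 65.0
```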

13
The Median!
  • The median of a set of n observations, ordered
    from smallest to largest, is a value such that at
    least half of the observations are less than or
    equal to that value and at least half of the
    observations are greater than or equal to that
    value.
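The definition can be turned into a short procedure using the (n + 1)/2 position rule that appears in LDI 5.3: order the data, and if (n + 1)/2 is a whole number take that observation, otherwise average the two observations on either side. This is a minimal sketch that assumes a non-empty list of numbers.

```python
def median(values):
    """Median via the (n + 1)/2 position rule on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position of the median
    if position.is_integer():           # n odd: a single middle observation
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]  # n even: average the two middle values
    upper = ordered[int(position)]
    return (lower + upper) / 2

print(median([7, 98, 25, 19, 26]))             # 25
print(median([2, 3, 0, 1, 4, 0, 3, 0, 1, 2]))  # 1.5
```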

14
Find the Median of the AGE data
  • The Hard way
  • The Easy way

15
  • LDI 5.3 Median Number of Children per Household
  • Find the median number of children in a
    household from this sample of 10 households, that
    is, find the median of:
  • Observation number:  1  2  3  4  5  6  7  8  9  10
  • Number of children:  2  3  0  1  4  0  3  0  1  2
  • (a) Order the observations from smallest to
    largest.
  • (b) Calculate (n + 1)/2 _________________
  • (c) Median ______________
  • (d) What happens to the median if the fifth
    observation in the first list was incorrectly
    recorded as 40 instead of 4?
  • (e) What happens to the median if the third
    observation in the first list was incorrectly
    recorded as -20 instead of 0?
  • Note: The median is resistant; that is, it does
    not change, or changes very little, in response
    to extreme observations.
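To see the resistance property numerically, the sketch below recomputes both the median and the mean after each of the two recording errors described in the exercise.

```python
import statistics

# LDI 5.3 data in observation order
children = [2, 3, 0, 1, 4, 0, 3, 0, 1, 2]

typo_5th = children.copy()
typo_5th[4] = 40    # fifth observation recorded as 40 instead of 4

typo_3rd = children.copy()
typo_3rd[2] = -20   # third observation recorded as -20 instead of 0

for label, data in [("correct", children), ("40 for 4", typo_5th), ("-20 for 0", typo_3rd)]:
    print(label, statistics.median(data), statistics.mean(data))
```

The median stays at 1.5 in all three cases, while the mean moves from 1.6 to 5.2 and then to -0.4.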

16
The Mode
  • To find the middle or measure of center of
    categorical (qualitative) data, we are forced to
    use the mode. It can also be used with numerical
    (quantitative) data, but it is not a good measure
    of center.
  • The mode of a set of data is the most frequently
    occurring value, the value with the highest
    frequency.

17
Example
  • Find the mode for the following data:
  • (a) 1, 2, 3, 2, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6, 7
  • (b) 1, 4, 3, 4, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6
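Counting frequencies settles both parts. The sketch uses `statistics.multimode` (available since Python 3.8) because it reports every value tied for the highest frequency, which matters for data set (b): both 1 and 4 occur five times.

```python
from statistics import multimode  # Python 3.8+

data_a = [1, 2, 3, 2, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6, 7]
data_b = [1, 4, 3, 4, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6]

# multimode returns every value tied for the highest frequency,
# in the order each value first appears in the data.
print(multimode(data_a))  # [1]
print(multimode(data_b))  # [1, 4]
```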

18
Consider the following data: 2, 2, 2, 20, 34, 45,
210. What are the mode, median, and mean?
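Computing all three measures for this small data set shows how far apart they can land when a long right tail (the 210) is present.

```python
import statistics

data = [2, 2, 2, 20, 34, 45, 210]

print(statistics.mode(data))    # 2   (most frequent value)
print(statistics.median(data))  # 20  (4th of the 7 ordered values)
print(statistics.mean(data))    # 45  (315 / 7)
```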
19
(No Transcript)
20
  • Let's Do It! 5.5 Attend Graduate School? When do
    undergraduates make the decision to continue
    their education and attend graduate school? An
    undergraduate attending a four-year college with
    a semester system (versus a quarter system) would
    have a total of eight semesters of classes
    (excluding any summer sessions). A sample of 18
    senior undergraduates who would be graduating and
    attending graduate school were asked the
    following question: "In which semester (1, 2, 3,
    4, 5, 6, 7, or 8) did you decide you would
    continue your education and attend graduate
    school?" The responses are given below.

(a) Construct a frequency plot of these data.
(b) Obtain the following sample statistics for these data.
Minimum ___________ Maximum ______________
Median _____________ Mean _____________
(c) How do the two measures of center, the median
and the mean, compare? Select one:
i. Median > Mean
ii. Median < Mean
iii. Median = Mean
21
LDI 5.6
  • Is this distribution symmetric?
  • What is the median?
  • What is the mean?

22
LDI 5.7 p310 Good vs. Poor measure of Center
  • Draw a distribution for which:
  • The mean would be a good measure of the center of
    a distribution.
  • The mean would be a poor measure of the center of
    a distribution.
  • The median would be a good measure of the center
    of a distribution.
  • The median would be a poor measure of the center
    of a distribution.
  • The mode would be a good measure of the center of
    a distribution.
  • The mode would be a poor measure of the center of
    a distribution.

23
Measures of Variation
  • Now that we can measure the center of a
    distribution, we need to know something about the
    spread or variability of the data.
  • There are (as with the average) several popular
    ways of doing this measurement.

24
Why Measure Variation?
  • Consider the following plots
  • Both have a mean of 60, but are they the same
    distribution?

25
The Range
  • Our first crude estimate of the variation of a
    data set is the range, which is simply max − min.
  • Again, this measure is very limited in its
    ability to describe the spread in a data set.

26
Example
  • Consider these distributions
  • They have the same range, 30 − 20 = 10, yet
    they have very different variation.

27
Quartiles
  • Recall that the median is the middle number of a
    distribution. This means that 50% of the data
    will fall below this value. We can chop the data
    into four equal pieces by finding the median of
    the lower 50% and the upper 50%. These values are
    called the quartiles.
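The "median of each half" idea can be sketched as below. Conventions for quartiles differ between textbooks and software; this sketch assumes the middle observation is left out of both halves when n is odd, which to our understanding is also how the TI-83 computes Q1 and Q3. Other tools (for example `statistics.quantiles`) may give slightly different values.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: when n is odd, the middle observation is left out of
    both halves. Other textbooks and software use other conventions.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]          # observations below the median position
    upper = ordered[(n + 1) // 2 :]    # observations above the median position
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

print(quartiles([2, 3, 0, 1, 4, 0, 3, 0, 1, 2]))           # (0, 1.5, 3)
print(quartiles([1, 4, 2, 3, 9, 7, 2, 4, 5, 1, 8, 8, 7]))  # (2.0, 4, 7.5)
```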

28
Find the Quartiles for AGE
  • Q1 is the first quartile: 25% of the data fall
    below this value and 75% above it.
  • MED is the second quartile: 50% of the data fall
    below this value and 50% above it.
  • Q3 is the third quartile: 75% of the data fall
    below this value and 25% fall above it.

29
InterQuartile Range
  • The interquartile range or IQR is simply the
    difference between Q3 and Q1
  • IQR = Q3 − Q1
  • Find the IQR for the AGE data.

30
5-Number Summary and Boxplots
  • The 5-number summary is simply Min, Q1, Med, Q3, Max.
  • A Boxplot is a plot of these points.
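A five-number summary also gives the range (Max − Min) and the IQR (Q3 − Q1) directly. This sketch uses the LDI 5.3 children data and assumes the same median-of-halves quartile convention (middle value excluded when n is odd); the AGE data are not reproduced in the transcript.

```python
import statistics

def five_number_summary(values):
    """Min, Q1, Med, Q3, Max, with quartiles as medians of the lower and
    upper halves (middle value excluded when n is odd; an assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = statistics.median(ordered[: n // 2])
    q3 = statistics.median(ordered[(n + 1) // 2 :])
    return ordered[0], q1, statistics.median(ordered), q3, ordered[-1]

children = [2, 3, 0, 1, 4, 0, 3, 0, 1, 2]   # LDI 5.3 data
lo, q1, med, q3, hi = five_number_summary(children)
print((lo, q1, med, q3, hi))  # (0, 0, 1.5, 3, 4)
print("range =", hi - lo)     # range = 4
print("IQR =", q3 - q1)       # IQR = 3
```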

31
Let's Do It
  • Page 317 LDI 5.8

32
1.5 × IQR Rule
  • Any data value that falls more than 1.5 × IQR
    above Q3 or more than 1.5 × IQR below Q1 is
    considered an outlier.
  • Make a modified boxplot of the AGE data by hand.
  • Make boxplots on the TI-83.
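The rule amounts to building two fences, Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, and flagging anything outside them. The sketch below applies it to the LDI 5.3 children data and to the version with the mistyped 40; the quartile convention (median of halves, middle value excluded when n is odd) is an assumption.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values more than k*IQR below Q1 or above Q3.

    Quartiles use the median-of-halves rule (middle value excluded when
    n is odd); this convention is an assumption of the sketch.
    """
    ordered = sorted(values)
    n = len(ordered)
    q1 = statistics.median(ordered[: n // 2])
    q3 = statistics.median(ordered[(n + 1) // 2 :])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]

children = [2, 3, 0, 1, 4, 0, 3, 0, 1, 2]
print(iqr_outliers(children))  # []  (fences at -4.5 and 7.5)

mistyped = [2, 3, 0, 1, 40, 0, 3, 0, 1, 2]  # 4 recorded as 40, as in LDI 5.3
print(iqr_outliers(mistyped))  # [40]
```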

33
Let's Do It
  • Page 320 LDI 5.9
  • Page 320 LDI 5.10
  • Page 321 LDI 5.11
  • Page 325 LDI 5.12
  • Careful: boxplots don't fully show the shape of
    the distribution!

34
Gismo Products
35
I could have sworn you said eleven
steps.
36
Standard Deviation
  • We want a way to measure spread based on the
    mean. To do this, we will find the average
    distance of our data from the mean. Well,
    actually, we find the sum of the squared
    deviations, divide by n − 1, and then take the
    square root.

37
Sample Standard Deviation Formula
  • $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
  • The TI-83 calculates the sample standard
    deviation of data (reported as Sx).

38
Population Standard Deviation
  • $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$
  • The TI-83 calculates the population standard
    deviation of data (reported as σx).

39
Find the Stan. Dev.
  • Let's do this small data set by hand: 1, 4, 2, 3,
    9, 7, 2, 4, 5, 1, 8, 8, 7
  • Let's verify our result on the TI-83.
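The by-hand steps map one-to-one onto code: find the mean, take each deviation, square, sum, divide by n − 1, and take the square root. As a stand-in for the calculator check, the sketch compares the results with Python's `statistics.stdev` (sample, divides by n − 1) and `statistics.pstdev` (population, divides by n).

```python
import math
import statistics

data = [1, 4, 2, 3, 9, 7, 2, 4, 5, 1, 8, 8, 7]

n = len(data)
x_bar = sum(data) / n                    # 61 / 13
deviations = [x - x_bar for x in data]   # distance of each value from the mean
squared = [d ** 2 for d in deviations]   # squaring keeps +/- deviations from cancelling
s = math.sqrt(sum(squared) / (n - 1))    # sample SD: divide by n - 1
sigma = math.sqrt(sum(squared) / n)      # population-style SD: divide by n

print(round(x_bar, 4), round(s, 4), round(sigma, 4))
print(math.isclose(s, statistics.stdev(data)), math.isclose(sigma, statistics.pstdev(data)))
```

The sample standard deviation comes out a little larger than the divide-by-n version, as it always will for the same data.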

40
Interpretation of SD
  • The standard deviation is roughly the average
    distance of the observations from the mean. The
    more spread out the data are from the mean the
    larger the standard deviation will be.
  • Since the standard deviation is a distance, it is
    never negative (it equals zero only when all the
    observations are the same), and it carries the
    same units as the mean.

41
Same Means ($\bar{x} = 4$), Different Standard Deviations
[Figure: frequency plots of four data sets with s = 0, s = 3.0, s = 0.8, and s = 1.0. Caption: Standard deviation increases as the data get more spread out.]
42
LDI 5.13, p329 Increasing Spread
  • Consider the following three data sets.
  • I: 20, 20, 20
  • II: 18, 20, 22
  • III: 17, 20, 23
  • (a) Which data set will have the smallest
    standard deviation?
  • (b) Which data set will have the largest standard
    deviation?
  • (c) Find the standard deviation for each data
    set and check your answers to (a) and (b).
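All three data sets in LDI 5.13 share a mean of 20, so any difference in standard deviation comes purely from spread. This sketch checks part (c) with `statistics.stdev`; it should give 0, 2, and 3 for sets I, II, and III.

```python
import statistics

data_sets = {
    "I": [20, 20, 20],
    "II": [18, 20, 22],
    "III": [17, 20, 23],
}

for name, values in data_sets.items():
    # Same mean of 20 each time; only the spread around 20 differs.
    print(name, statistics.mean(values), statistics.stdev(values))
```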

43
Which Distribution has a larger standard
deviation?
44
LDI 5.14, p331 What Type of Distribution?
45
  • Let's Do It! 5.15 Standard Deviation for Age
  • Use the ages of the subjects from your class.
  • (a) Find the standard deviation for these data.
  • (b) Complete the sentence
  • On average, the ages of these subjects are about
    _______ years from their mean of ____ years.
  • (c) How many of the 20 subjects had ages within
    one standard deviation of the mean?
  • (d) How many of the 20 subjects had ages within
    two standard deviations of the mean?
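Parts (c) and (d) reduce to counting observations inside the interval from mean − k·s to mean + k·s. The class AGE values are not in this transcript, so the sketch runs the hypothetical helper `count_within` on the small data set from the "Find the Stan. Dev." slide; substitute your own ages.

```python
import statistics

def count_within(values, k):
    """Count observations within k sample standard deviations of the mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(1 for x in values if m - k * s <= x <= m + k * s)

# Stand-in data from the "Find the Stan. Dev." slide; substitute the class AGE values.
data = [1, 4, 2, 3, 9, 7, 2, 4, 5, 1, 8, 8, 7]
print(count_within(data, 1), count_within(data, 2), len(data))  # 8 13 13
```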

46
Linear Transformations
  • Linear transformations of data can be used to
    change the units of data. For example, you
    collect a set of temperature data in Celsius
  • 40, 41, 39, 41, 41, 40, 38
  • Find the mean and standard deviation for these
    data.

47
What about Fahrenheit?
  • Recall how to convert from Celsius to
    Fahrenheit: $F = \frac{9}{5}C + 32$. Convert our
    data using this formula, then find the new mean
    and standard deviation.
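Converting the Celsius temperatures and recomputing the statistics shows what the conversion does: the mean goes through the same formula as each data value, while the standard deviation is only multiplied by 9/5 (the +32 shift has no effect on spread).

```python
import statistics

celsius = [40, 41, 39, 41, 41, 40, 38]
fahrenheit = [9 / 5 * c + 32 for c in celsius]   # F = (9/5)C + 32

mean_c, sd_c = statistics.mean(celsius), statistics.stdev(celsius)
mean_f, sd_f = statistics.mean(fahrenheit), statistics.stdev(fahrenheit)

print(mean_c, round(sd_c, 4))            # 40 1.1547
print(round(mean_f, 4), round(sd_f, 4))  # 104.0 2.0785
# The shift of 32 moves the mean but not the spread; only the factor 9/5 scales the SD.
print(abs(mean_f - (9 / 5 * mean_c + 32)) < 1e-9, abs(sd_f - 9 / 5 * sd_c) < 1e-9)  # True True
```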

48
LDI 5.16, p338 A Transformation
  • Data on number of children for 10 households in a
    neighborhood
  • 2, 3, 0, 2, 1, 0, 3, 0, 1, 4

49
Linear Transformation Rules
  • If X represents the original values, $\bar{x}$ is
    the average of the original values, and $s_x$ is
    the standard deviation of the original values,
    and if the new values are a linear transformation
    of X, $Y = aX + b$, then the new mean is given by
    $\bar{y} = a\bar{x} + b$ and the new standard
    deviation by $s_y = |a|\, s_x$.
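One way to see why these rules hold is to substitute $y_i = a x_i + b$ directly into the definitions of the sample mean and the sample standard deviation:

```latex
\begin{aligned}
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} (a x_i + b)
         = a\,\frac{1}{n}\sum_{i=1}^{n} x_i + b = a\bar{x} + b \\
y_i - \bar{y} &= (a x_i + b) - (a\bar{x} + b) = a\,(x_i - \bar{x}) \\
s_y &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2}
     = \sqrt{\frac{a^2}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
     = |a|\, s_x
\end{aligned}
```

The constant b cancels in every deviation, which is why shifting the data (such as the +32 in the Celsius-to-Fahrenheit conversion) changes the mean but not the standard deviation.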

50
LDI 5.17 Standardization: A Special
Transformation, p340
  • Let's perform a special transformation of the
    original data on the number of children in a
    household
  • 2, 3, 0, 2, 1, 0, 3, 0, 1, 4

51
Important Transformation
  • We want to be able to standardize data to the
    same scale so we can compare data that might be
    in differing units. For example, compare SAT and
    ACT scores or IQ scores from differing age groups.

52
The Z score
  • $z = \frac{x - \bar{x}}{s}$
53
Examples
  • Standardize the AGE data
  • What are the mean and standard deviation for
    these transformed data?
  • Will this always happen? Why?
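The AGE data are not in this transcript, so the sketch standardizes the children-per-household data from LDI 5.16/5.17 instead, using $z = (x - \bar{x})/s$ with the sample standard deviation. Standardizing is the linear transformation with $a = 1/s$ and $b = -\bar{x}/s$, so the new mean is $a\bar{x} + b = 0$ and the new standard deviation is $|a|\,s = 1$ whenever $s > 0$; the code checks this up to floating-point rounding.

```python
import statistics

def standardize(values):
    """Return z = (x - mean) / s for each value, using the sample SD s."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - m) / s for x in values]

children = [2, 3, 0, 2, 1, 0, 3, 0, 1, 4]   # LDI 5.16/5.17 data
z = standardize(children)

print([round(v, 2) for v in z])
# Mean 0 and SD 1, up to floating-point rounding
print(abs(statistics.mean(z)) < 1e-12, abs(statistics.stdev(z) - 1) < 1e-12)  # True True
```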

54
Chapter 5 Summary p344
  • This chapter presented several different
    measures of center and variability in order to
    summarize data numerically. Standardization is a
    useful transformation which will be used on all
    data sets.