Describing Data Numerically - PowerPoint PPT Presentation

Slides: 76
Provided by: Mart458

Transcript and Presenter's Notes


1
Chapter 3
  • Describing Data Numerically

2
Basic Summary Measurements
  • Measures of center
  • median
  • mean
  • mode
  • Measures of position
  • quartiles
  • percentiles
  • Measures of spread
  • range
  • interquartile range
  • variance
  • standard deviation

3
Median
  • Definition
  • The median is the value of the middle term in a
    data set that has been ranked in increasing order.

4
Median Continued
  • The calculation of the median consists of the
    following two steps
  • Rank the data set in increasing order
  • Find the middle term in a data set with n values.
    The value of this term is the median.

5
Median cont.
Value of Median for Ungrouped Data
Median = value of the ((n + 1)/2)th term in a ranked data set
6
Example
The following data give the weight lost (in
pounds) by a sample of five members of a health
club at the end of two months of membership:
10 5 19 8 3
Find the median.
7
Solution
First, we rank the given data in increasing
order as follows:
3 5 8 10 19
There are five observations in the data set.
Consequently, n = 5 and
Position of the middle term = (n + 1)/2 = (5 + 1)/2 = 3
8
Solution
  • Therefore, the median is the value of the third
    term in the ranked data.
  • 3 5 8 10 19
  • The median weight loss for this sample of five
    members of this health club is 8 pounds.

9
Median and Histogram
The median gives the center of a histogram,
with half the data values to the left of the
median and half to the right of the median. The
advantage of using the median as a measure of
central tendency is that it is not influenced by
outliers. Consequently, the median is preferred
over the mean as a measure of central tendency
for data sets that contain outliers.
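A minimal Python sketch of this robustness, using the weight-loss sample from the earlier example. The value 190 is a hypothetical outlier introduced only for illustration; it is not part of the original data.

```python
from statistics import mean, median

# Weight lost (pounds) by five health club members, from the earlier example.
weight_loss = [10, 5, 19, 8, 3]

# Hypothetical variant: 19 replaced by an extreme value of 190 (illustration only).
with_outlier = [10, 5, 190, 8, 3]

print(mean(weight_loss), median(weight_loss))    # 9 8
print(mean(with_outlier), median(with_outlier))  # 43.2 8
```

The mean jumps from 9 to 43.2 pounds, while the median stays at 8 pounds.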
10
Median
  • The median of a data set is a number such that
    half the data values are above it and half the
    data values are below it.
  • The median is the 50th percentile of the data

11
Example of Finding Median
  • Distance from the sun (in millions of miles) of
    the nine planets
  • 36, 67, 93, 142, 484, 887, 1765, 2791, 3654
  • What is the median distance from the sun?
  • 484 million miles

12
Another Example
  • Average daily temperature in Chicago for January
    through December (degrees F)
  • 41, 45, 54, 62, 69, 76, 79, 78, 73, 62, 53, 45
  • What is the median monthly temperature?
  • 41, 45, 45, 53, 54, 62, 62, 69, 73, 76, 78, 79
  • (62 + 62) / 2 = 62 degrees

13
How to Find the Median
  • Let n be the sample size
  • Find (n + 1)/2
  • The median is the ((n + 1)/2)th observation in
    the ranked data
  • If n = 19, then (n + 1)/2 = 20/2 = 10 and the median is
    the 10th observation
  • If n = 20, then (n + 1)/2 = 21/2 = 10.5 and the median
    is the average of the 10th and 11th observations
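
The position rule above can be sketched in Python as follows. The function name `median_by_position` is a hypothetical helper chosen for this illustration; the data sets are the ones used in the earlier median examples.

```python
def median_by_position(values):
    """Median via the (n + 1)/2 position rule applied to the ranked data."""
    ranked = sorted(values)              # rank the data in increasing order
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    position = (n + 1) / 2               # 1-based position of the middle term
    lower = ranked[int(position) - 1]    # term at the whole-number part of the position
    if position.is_integer():
        return lower                     # odd n: a single middle term
    upper = ranked[int(position)]        # even n: the next term up
    return (lower + upper) / 2           # average of the two middle terms


print(median_by_position([10, 5, 19, 8, 3]))                                   # 8
print(median_by_position([41, 45, 54, 62, 69, 76, 79, 78, 73, 62, 53, 45]))    # 62.0
```

For n = 5 the position is 3, so the third ranked value is returned. For n = 12 the position is 6.5, so the 6th and 7th ranked values are averaged.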

14
Two Examples
  • Amount spent renting videos during 2004 for 15
    households
  • 396 52 8 120 140 54 360 230 50 150 700
    410 80 200 72
  • What is the median?
  • 140 (the 8th observation)

15
Stem-and-Leaf for Video Data
  • Prototype: 1 | 20 = 120
  • 0 | 08 50 52 54 72 80
  • 1 | 20 40 50   (median: leaf 40, that is, 140)
  • 2 | 00 30
  • 3 | 60 96
  • 4 | 10
  • High: 700

16
Second Example
  • Number of stolen bases for National League in
    2002
  • 103, 86, 92, 74, 96, 71, 118, 177, 76, 104, 87,
    116, 94, 71, 63, 86
  • What is the median?
  • (87 + 92)/2 = 89.5, the average of the 8th and 9th
    observations

17
Stem-and-Leaf for Stolen Bases
  • Prototype: 7 | 1 = 71
  • 6 | 3
  • 7 | 1 1 4 6
  • 8 | 6 6 7   (median = (87 + 92)/2 = 89.5)
  • 9 | 2 4 6
  • 10 | 3 4
  • 11 | 6 8
  • High: 177

18
Warning
  • When using Excel to compute quartiles, you will
    sometimes get different values for Q1 and Q3
    than the method described in your text gives.
  • Excel uses a more complicated interpolation
    algorithm than your textbook for calculating
    quartiles, so be aware that the values you compute by
    hand may differ from those that Excel reports.
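
A rough illustration of how quartile conventions can disagree, using the stolen-base data from the previous example. Excel's own algorithm is not reproduced here; as a stand-in, the sketch uses the two interpolation methods offered by Python's `statistics.quantiles`. The helper `textbook_quartiles` is a hypothetical name for the median-of-halves approach.

```python
from statistics import median, quantiles


def textbook_quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the ranked data."""
    ranked = sorted(values)
    half = len(ranked) // 2
    lower = ranked[:half]                  # observations below the median position
    upper = ranked[len(ranked) - half:]    # observations above the median position
    return median(lower), median(upper)


stolen_bases = [103, 86, 92, 74, 96, 71, 118, 177, 76, 104, 87, 116, 94, 71, 63, 86]

print(textbook_quartiles(stolen_bases))                        # (75.0, 103.5)

q1, _, q3 = quantiles(stolen_bases, n=4, method="inclusive")   # interpolating convention 1
print(q1, q3)                                                  # 75.5 103.25

q1, _, q3 = quantiles(stolen_bases, n=4, method="exclusive")   # interpolating convention 2
print(q1, q3)                                                  # 74.5 103.75
```

All three conventions agree on the median (89.5) but give slightly different values for Q1 and Q3.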

19
Sample Mean
  • The symbol for the sample mean is x̄ (read "x-bar")
  • The sample mean is the average of the data
  • The sample mean is the value where the histogram
    balances

20
Mean
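The mean for ungrouped data is obtained by
dividing the sum of all values by the number of
values in the data set. Thus,
Mean for population data: μ = Σx / N
Mean for sample data: x̄ = Σx / n
where Σx is the sum of all values, N is the
population size, and n is the sample size.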
21
Example
The table on the next slide gives the 2002
total payrolls of five Major League Baseball
(MLB) teams. Find the mean of the 2002 payrolls
of these five MLB teams.
22
[Table: 2002 total payrolls of five MLB teams]
23
Solution
Thus, the mean 2002 payroll of these five MLB
teams was $78 million.
24
Example
The following are the ages of all eight
employees of a small company:
53 32 61 27 39 44 49 57
Find the mean age of these employees.
25
Solution
Thus, the mean age of all eight employees of this
company is 45.25 years, or 45 years and 3 months.
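The arithmetic behind this result can be checked with a short Python sketch. Because the data cover all eight employees, this is a population mean. The variable names are chosen for this illustration only.

```python
ages = [53, 32, 61, 27, 39, 44, 49, 57]

total = sum(ages)          # sum of all values
count = len(ages)          # number of values in the data set
mean_age = total / count   # mean = sum of values / number of values

print(total, count, mean_age)   # 362 8 45.25
```

The fractional part, 0.25 year, is 3 months, which gives 45 years and 3 months.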
26
Mean Continued
  • Definition
  • Values that are very small or very large relative
    to the majority of the values in a data set are
    called outliers or extreme values.

27
Example
The table lists the 2000 populations (in
thousands) of the five Pacific states.
[Table not reproduced; California is marked as an outlier.]
28
Discussion
  • Notice that the population of California is very
    large compared to the populations of the other
    four states. Hence, it is an outlier. How does
    the inclusion of this outlier affect the value
    of the mean?

29
Solution
  • If we do not include the population of California
    (the outlier) the mean population of the
    remaining four states (Washington, Oregon,
    Alaska, and Hawaii) is

30
Solution
  • Now, to see the impact of the outlier on the
    value of the mean, we include the population of
    California and find the mean population of all
    five Pacific states. This mean is

31
Mode
  • Definition
  • The mode is the value that occurs with the
    highest frequency in a data set.

32
Example
  • The following data give the speeds (in miles per
    hour) of eight cars that were stopped on I-95 for
    speeding violations.
  • 77 69 74 81 71 68 74 73
  • Find the mode.

33
Solution
  • In this data set, 74 occurs twice and each of the
    remaining values occurs only once. Because 74
    occurs with the highest frequency, it is the
    mode. Therefore,
  • Mode = 74 miles per hour

34
Mode cont.
  • A data set may have none or many modes,
    whereas it will have only one mean and only one
    median.
  • The data set with only one mode is called
    unimodal.
  • The data set with two modes is called bimodal.
  • The data set with more than two modes is called
    multimodal.

35
Example
  • Last year's incomes of five randomly selected
    families were
  • $36,150, $95,750, $54,985, $77,490, $23,740.
  • Find the mode.

36
Solution
  • Because each value in this data set occurs only
    once, this data set contains no mode.

37
Example
The prices of the same brand of television set
at eight stores are found to be
$495, $486, $503, $495, $470, $505, $470, $499
Find the mode.
38
Solution
  • In this data set, each of the two values $495
    and $470 occurs twice and each of the remaining
    values occurs only once.
  • Therefore, this data set has two modes: $495 and
    $470.

39
Example
The ages of 10 randomly selected students from
a class are
21, 19, 27, 22, 29, 19, 25, 21, 22, and 30
Find the mode.
40
Solution
This data set has three modes: 19, 21, and 22.
Each of these three values occurs with a
(highest) frequency of 2.
41
Mode cont.
One advantage of the mode is that it can be
calculated for both kinds of data, quantitative
and qualitative, whereas the mean and median can
be calculated for only quantitative data.
42
Example
  • The status of five students who are members of
    the student senate at a college are senior,
    sophomore, senior, junior, senior.
  • Find the mode.

43
Solution
  • Because senior occurs more frequently than the
    other categories, it is the mode for this data
    set.
  • We cannot calculate the mean and median for this
    data set.
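
The mode examples above can be reproduced with a small Python sketch. The helper name `modes` is hypothetical. It follows the convention used in these slides, where a data set in which every value occurs once has no mode. (Python's built-in `statistics.multimode` behaves differently in that case and returns every value.)

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []    # convention from these slides: no value repeats, so no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([77, 69, 74, 81, 71, 68, 74, 73]))                        # [74]
print(modes([36150, 95750, 54985, 77490, 23740]))                     # []
print(modes([495, 486, 503, 495, 470, 505, 470, 499]))                # [470, 495]
print(modes([21, 19, 27, 22, 29, 19, 25, 21, 22, 30]))                # [19, 21, 22]
print(modes(["senior", "sophomore", "senior", "junior", "senior"]))   # ['senior']
```

The last call shows that counting works for qualitative data, where a mean or median cannot be computed.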

44
Relationships among the Mean, Median, and Mode
For a symmetric histogram with one peak the
values of the mean, median, and mode are
identical, and they lie at the center of the
distribution.
45
Relationships among the Mean, Median, and Mode
cont.
  • For a histogram skewed to the right, the
    value of the mean is the largest, that of the
    mode is the smallest, and the value of the median
    lies between these two.
  • Notice that the mode always occurs at the peak
    point.
  • The value of the mean is the largest in this case
    because it is sensitive to outliers that occur in
    the right tail.
  • These outliers pull the mean to the right.

46
Mean, median, and mode for a histogram skewed to
the right.
47
Relationships among the Mean, Median, and Mode
cont.
If a histogram is skewed to the left,
the value of the mean is the smallest and that
of the mode is the largest, with the value of the
median lying between these two
(mean < median < mode). In this case, the outliers
in the left tail pull the mean to the left.
48
Mean, median, and mode for a histogram skewed to
the left.
49
When Does the Median = Mean?
  • If the histogram is symmetrical then the mean and
    the median are close in value
  • If the histogram is skewed or there are outliers
    the mean and median will have different values

50
Effects of Skewness
  • A histogram is skewed right if the outliers are
    on the right (or high side)
  • A histogram is skewed left if the outliers are on
    the left (or low side)
  • Skewed right: mean > median
  • Skewed left: mean < median
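
As a rough check of this rule, the sketch below compares the mean and median of the video rental data from the earlier median example. The comparison is only an informal indicator of skew direction, not a formal test, and the variable names are chosen for this illustration.

```python
from statistics import mean, median

# Amount spent renting videos during 2004 for 15 households (earlier example).
video_spending = [396, 52, 8, 120, 140, 54, 360, 230, 50, 150, 700, 410, 80, 200, 72]

m = mean(video_spending)        # pulled upward by large values such as 700
med = median(video_spending)    # middle value of the ranked data

if m > med:
    hint = "mean > median: consistent with a right skew"
elif m < med:
    hint = "mean < median: consistent with a left skew"
else:
    hint = "mean = median: consistent with symmetry"

print(round(m, 2), med, hint)   # 201.47 140 mean > median: consistent with a right skew
```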

51
Measures of Position
  • The median is a measure of position - it marks
    the midpoint or 50th percentile of the data
  • Other important benchmarks are the 25th and 75th
    percentiles, which isolate the middle 50% of the
    data
  • Other measures of position include other
    percentiles such as the 10th, 80th, etc.

52
Quartiles
  • The 25th percentile is referred to as the first
    quartile or symbolically Q1
  • The 75th percentile is referred to as the third
    quartile or symbolically Q3
  • Sometimes the median is referred to as the second
    quartile or Q2

53
How to Find the Quartiles
The weight loss (in pounds) for 17 members of a
health club three months after joining are
5 8 10 7 2 6 3 9 4 11 7 5 9 4 6 11 5
Draw the stem-and-leaf graph for the data. Find
the median as well as Q1 and Q3.
54
Stem-and-Leaf Graph for Weight Loss Data
Prototype: 0 | 5 = 5
0 | 2 3 4 4
0 | 5 5 5 6 6 7 7 8 9 9
1 | 0 1 1
Median = 6
Q1 = (4 + 5)/2 = 4.5
Q3 = (9 + 9)/2 = 9
55
Quartiles and Interquartile Range
  • Definition
  • Quartiles are three summary measures that divide
    a ranked data set into four equal parts. The
    second quartile is the same as the median of a
    data set. The first quartile is the value of the
    middle term among the observations that are less
    than the median, and the third quartile is the
    value of the middle term among the observations
    that are greater than the median.

56
Visually
Each of these portions contains 25% of the
observations of a data set arranged in increasing
order:
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
57
Quartiles and Interquartile Range cont.
  • Calculating Interquartile Range
  • The difference between the third and first
    quartiles gives the interquartile range; that is,
  • IQR = Interquartile range = Q3 - Q1

58
Example
  • The following are the ages of nine employees of
    an insurance company
  • 47 28 39 51 33 37 59 24 33
  • Find the values of the three quartiles. Where
    does the age of 28 fall in relation to the ages
    of the employees?
  • Find the interquartile range.

59
Solution
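a) Ranked data: 24 28 33 33 37 39 47 51 59
Values less than the median: 24 28 33 33, so Q1 = (28 + 33)/2 = 30.5
Median: Q2 = 37
Values greater than the median: 39 47 51 59, so Q3 = (47 + 51)/2 = 49
The age of 28 falls in the lowest 25% of the ages.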
60
Solution
b) IQR = Interquartile range = Q3 - Q1
= 49 - 30.5 = 18.5 years
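The quartile definition used in this example (the middle of the values below and above the median) can be sketched in Python as follows. The function name `quartiles` is a hypothetical helper, and the treatment of the halves is an assumption based on the definition given in these slides.

```python
from statistics import median


def quartiles(values):
    """Q1, Q2, Q3 using the median-of-halves definition from these slides."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2                     # size of each half; the middle term is left out when n is odd
    q1 = median(ranked[:half])        # middle of the lower half
    q2 = median(ranked)               # the median itself
    q3 = median(ranked[n - half:])    # middle of the upper half
    return q1, q2, q3


ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]
q1, q2, q3 = quartiles(ages)
print(q1, q2, q3)    # 30.5 37 49.0
print(q3 - q1)       # 18.5
print(28 < q1)       # True
```

Since 28 is below Q1 = 30.5, it lies in the lowest quarter of the ranked ages.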
61
BOX-AND-WHISKER PLOT
  • Definition
  • A plot that shows the center, spread, and
    skewness of a data set. It is constructed by
    drawing a box and two whiskers that use the
    median, the first quartile, the third quartile,
    and the smallest and the largest values in the
    data set between the lower and the upper inner
    fences.

62
Example
  • The following data are the incomes (in thousands
    of dollars) for a sample of 12 households.
  • 35 29 44 72 34 64 41 50 54 104
    39 58
  • Construct a box-and-whisker plot for these data.

63
Solution
  • Step 1.
  • 29 34 35 39 41 44 50 54 58 64
    72 104
  • Median = (44 + 50) / 2 = 47
  • Q1 = (35 + 39) / 2 = 37
  • Q3 = (58 + 64) / 2 = 61
  • IQR = Q3 - Q1 = 61 - 37 = 24

64
Solution Continued
  • Step 2.
  • 1.5 x IQR = 1.5 x 24 = 36
  • Lower inner fence = Q1 - 36 = 37 - 36 = 1
  • Upper inner fence = Q3 + 36 = 61 + 36 = 97

65
Solution Continued
  • Step 3.
  • Smallest value within the two inner fences = 29
  • Largest value within the two inner fences = 72
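
Steps 1 through 3 can be sketched in Python as follows, using the 12 household incomes from the example. The variable names are chosen for this illustration, and the quartiles are computed as medians of the lower and upper halves, as in Step 1.

```python
from statistics import median

incomes = [35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58]   # thousands of dollars

ranked = sorted(incomes)
half = len(ranked) // 2
q1 = median(ranked[:half])                  # 37.0
q3 = median(ranked[len(ranked) - half:])    # 61.0
iqr = q3 - q1                               # 24.0

lower_fence = q1 - 1.5 * iqr                # 1.0
upper_fence = q3 + 1.5 * iqr                # 97.0

# Whiskers end at the extreme values inside the fences; anything outside is an outlier.
inside = [x for x in ranked if lower_fence <= x <= upper_fence]
outliers = [x for x in ranked if x < lower_fence or x > upper_fence]

print(min(inside), max(inside), outliers)   # 29 72 [104]
```

The income of 104 lies above the upper inner fence of 97, so it is flagged as an outlier.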

66
Solution Continued
  • Step 4.

[Figure: box over the Income axis, labeled with the first quartile, median, and third quartile]
67
Solution Continued
  • Step 5.

[Figure: box-and-whisker plot over the Income axis, labeled with the first quartile, median, third quartile, the smallest and largest values within the two inner fences, and an outlier]
68
Five Number Summary
  • Another very convenient way to graph quantitative
    data is a boxplot which uses 5 numbers to
    summarize the data
  • Minimum value
  • 25th percentile (1st quartile) Q1
  • 50th percentile (2nd quartile or median) Q2
  • 75th percentile (3rd quartile) Q3
  • Maximum value
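
A minimal sketch of the five-number summary, applied to the household income data from the box-and-whisker example. The function name `five_number_summary` is hypothetical, and the quartiles are taken as medians of the lower and upper halves, which is an assumption; other conventions can give slightly different Q1 and Q3.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a data set."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    return (
        ranked[0],                     # minimum value
        median(ranked[:half]),         # Q1: middle of the lower half
        median(ranked),                # Q2: the median
        median(ranked[n - half:]),     # Q3: middle of the upper half
        ranked[-1],                    # maximum value
    )


incomes = [35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58]
print(five_number_summary(incomes))    # (29, 37.0, 47.0, 61.0, 104)
```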

69
Why Boxplots?
  • Present information more compactly than
    histograms
  • Easier to make comparisons among several data
    sets

70
Main Components of a Boxplot
The boxplot represents the data of a random
sample of women who took an exam in elementary
statistics
71
  • lower quartile is 76.61 (left side of the box)
  • upper quartile is 89.59 (right side of the box)
  • median is 84.70 (middle line of the box)
  • fences bound all the data except for outliers
72
The Interquartile Range
The interquartile range is a measure of how
spread out the middle 50% of the data is.
Interquartile range (IQR) = Upper Quartile - Lower Quartile
IQR = 89.59 - 76.61 = 12.98
So, there is a spread of about 13 points in the
middle 50% of the exam scores.
73
Outliers
Compute:
lower quartile - 1.5(IQR) = 76.61 - 1.5(12.98) = 57.14;
any data value below 57.14 is a low outlier
upper quartile + 1.5(IQR) = 89.59 + 1.5(12.98) = 109.06;
any data value above 109.06 is a high outlier
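The cutoffs can be recomputed with a short Python sketch from the quartiles reported for this boxplot (lower quartile 76.61, upper quartile 89.59). The function name `outlier_thresholds` and the parameter `k` are hypothetical names chosen for this illustration.

```python
def outlier_thresholds(q1, q3, k=1.5):
    """Return the (low, high) cutoffs: q1 - k * IQR and q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


low, high = outlier_thresholds(76.61, 89.59)
print(round(low, 2), round(high, 2))   # 57.14 109.06
```

Any exam score below the low cutoff or above the high cutoff would be treated as an outlier under this rule.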
74
Fences
Lower fence is the smallest data value that is
not an outlier.
Upper fence is the largest data value that is
not an outlier.
75
Calorie Content of Major Brands of Hotdogs