Numerical Methods for Describing Data - PowerPoint PPT Presentation


Provided by: shish


Transcript and Presenter's Notes

Title: Numerical Methods for Describing Data

1
Chapter 4

Numerical Methods for Describing Data

2
4.1 Describing the Center of a Data Set

Sample Mean
x = the variable for which we have sample data
n = the number of observations in the sample
(sample size)
x1, x2, . . ., xn = the observations in the
sample
The sample mean, denoted by $\bar{x}$, is
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$
Population Mean
The population mean, denoted by µ, is the
average of all x values in the entire population.

3
4.1 Describing the Center of a Data Set

Example: Traumatic knee dislocation often
requires surgery to repair ruptured ligaments.
One measure of recovery is range of motion
(measured as the angle formed when, starting with
the leg straight, the knee is bent as far as
possible). The following are postsurgical ranges
of motion (in degrees) for a sample of 13
patients:
154, 142, 137, 133, 122, 126, 135, 135, 108,
120, 127, 134, 122
Find the sample mean range of motion.
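
The requested mean can be checked with a few lines of Python. This is only a sketch of the arithmetic in the sample-mean definition; the variable names are illustrative and are not part of the slides.

```python
# Postsurgical range of motion (degrees) for the 13 patients in the example
range_of_motion = [154, 142, 137, 133, 122, 126, 135, 135, 108,
                   120, 127, 134, 122]

n = len(range_of_motion)      # sample size
total = sum(range_of_motion)  # sum of all observations
sample_mean = total / n       # x-bar = (sum of x) / n

print(f"n = {n}, sum = {total}, sample mean = {sample_mean:.2f} degrees")
# prints: n = 13, sum = 1695, sample mean = 130.38 degrees
```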

4
Example: Three Samples from the Population of All
US Counties (x = number of residents)

Mean 41,639.2

Mean 202,769.0

5
Example: Three Samples from the Population of All
US Counties (x = number of residents)

Population
The population of the US is 293,655,404 (2004
census), and there are 3137 counties in the US.
Population Mean
µ = ?

Sample 3 Mean = ?
The mean of Sample 3 is 23,643.6. The population
mean is µ = 293,655,404 / 3137 = 93,610.27.
6
Example Number of Visits to a Class Web Site
The Mean

Forty students were enrolled in a section of
statistics class. One month after the course
began, the instructor requested a report that
indicated how many times each student had
accessed the class web site. The 40 observations
were listed on the right.
Find the mean.
Does the mean well represent the center of the
data set?

Answers: 1. The mean is 23.10.
2. The value 23.10 is not very representative
of the center of this sample, because it
is larger than most of the observations in the
data. The outlying value 331 and, to a lesser
extent, 84 have a substantial impact on the mean.
7
4.1 Describing the Center of a Data Set

The sample median is obtained by
first ordering the n observations from smallest
to largest (with any repeated values included, so
that every sample observation appears in the
ordered list).
Then
sample median = the single middle value if n is
odd, OR
the average of the middle two values if n is
even.
Examples: (a) The median of the data set 5, 3,
7, 9, 11, 4, 16 is 7, which is the single middle
value of the ordered observations 3, 4, 5, 7, 9,
11, 16. (In this sample, n = 7.)
(b) The median of the data set 7, 3, 9, 5, 11,
4, 16, 10 is 8, which is the average of the two
middle values of the ordered observations 3, 4,
5, 7, 9, 10, 11, 16. (½(7 + 9) = 8.)
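
The odd/even rule above can be sketched as a small Python function. The name `sample_median` is a hypothetical helper chosen for illustration; the two calls reproduce examples (a) and (b).

```python
def sample_median(values):
    """Median by the slide's rule: sort, then take the middle value
    (odd n) or the average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of middle two

median_a = sample_median([5, 3, 7, 9, 11, 4, 16])      # n = 7 (odd)
median_b = sample_median([7, 3, 9, 5, 11, 4, 16, 10])  # n = 8 (even)
print(median_a, median_b)  # prints: 7 8.0
```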

8
Example Number of Visits to a Class Web Site
The Median

Forty students were enrolled in a section of
statistics class. One month after the course
began, the instructor requested a report that
indicated how many times each student had
accessed the class web site. The 40 observations
were listed on the right.
Find the median.
Does the median better represent the center of
the data set than the mean?

Answer on the next slide
9
Solution to the Example Number of Visits to a
Class Web Site The Median

First arrange the data from the smallest to the
largest

The median is the average of the middle two
values: (13 + 13)/2 = 13. The median better
represents the center of this data set than the
mean (23.10).
10
Comparing the Mean and the Median

The mean can be sensitive to even a single value
that lies far above or below the rest of the data
(an outlier). Therefore, the mean for the number of
website visits is not a very representative value
of that sample because of the outlying value 331.
The median is quite insensitive to outliers.
If the histogram of the data set is symmetric,
mean = median.
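
A quick way to see this sensitivity is to change one value in a small data set. The sketch below reuses the seven values from the median example and replaces the largest with a made-up outlier (160); that altered data set is hypothetical and does not come from the slides.

```python
from statistics import mean, median

original = [3, 4, 5, 7, 9, 11, 16]
with_outlier = [3, 4, 5, 7, 9, 11, 160]  # hypothetical: 16 replaced by 160

mean_before, mean_after = mean(original), mean(with_outlier)
median_before, median_after = median(original), median(with_outlier)

# The mean is pulled toward the outlier; the median does not move.
print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
```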

11
4.2 Describing Variability in a Data Set

The range of a data set is defined as
range = largest observation - smallest observation
The n deviations from the sample mean are the
differences
$x_1 - \bar{x},\; x_2 - \bar{x},\; \ldots,\; x_n - \bar{x}$
Except for the effects of rounding in computing
the deviations, it is always true that
$\sum (x - \bar{x}) = 0$

12
4.2 Describing Variability in a Data Set

The sample variance, denoted by $s^2$, is defined by
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
The sample standard deviation is the positive
square root of the sample variance and is denoted
by s.

The mean, median, sample variance, and standard
deviation are all available in Excel by
clicking Data → Data Analysis → Descriptive
Statistics. In the dialog box, choose Summary
statistics.
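
For readers working outside Excel, the same quantities can be sketched in Python. The data below are the 13 range-of-motion values from the knee example in Section 4.1; the manual computation follows the n - 1 definition of the sample variance and is cross-checked against the standard library's `statistics.variance` and `statistics.stdev`, which use the same definition.

```python
import math
from statistics import mean, stdev, variance

data = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 134, 122]

x_bar = mean(data)
deviations = [x - x_bar for x in data]  # x - x_bar for each observation

# Sample variance: sum of squared deviations divided by n - 1
s2 = sum(d ** 2 for d in deviations) / (len(data) - 1)
s = math.sqrt(s2)                       # sample standard deviation

print(f"sum of deviations = {sum(deviations):.10f}")  # zero up to rounding
print(f"s^2 = {s2:.2f}, s = {s:.2f}")
print(f"library check: {variance(data):.2f}, {stdev(data):.2f}")
```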
13
4.2 Describing Variability in a Data Set

Example: Research by the FDA shows that acrylamide (a
possible cancer-causing substance) forms in
high-carbohydrate foods cooked at high temperatures and
that acrylamide levels can vary widely even within
the same brand of food. FDA scientists analyzed
McDonald's French fries purchased at seven
different locations and found the acrylamide
levels shown on the right.
1. What is the range?
2. What are the deviations?
3. What is the sample variance?
4. What is the standard deviation?

Answers on the next slide.
14
Answers to the acrylamide example:
1. Range = 497 - 155 = 342
(See the table on the right.)
15
Number of Visits to a Class Web Site

Forty students were enrolled in a section of
statistics class. One month after the course
began, the instructor requested a report that
indicated how many times each student had
accessed the class web site. The 40 observations
were listed on the right.
Find the sample variance and standard
deviation.
We use Excel to solve the problem on the next
three slides.

Input data.
Click Data.
Click Data Analysis.
Choose Descriptive Statistics.
Click OK.

A dialog box will open.
Enter A1:A40 in the Input Range.
Check Summary statistics.
Click OK.

Excel gives the results for the mean, median, sample
variance, standard deviation, and
many others. (You may need to adjust the width of
the columns.)

19
Describing Variability in a Data Set
Interquartile Range

Lower quartile = median of the lower half of the
sample
Upper quartile = median of the upper half of the
sample
(If n is odd, the median of the entire sample is
excluded from both halves.)
The interquartile range (iqr), a measure of
variability that is not as sensitive to the
presence of outliers as the standard deviation,
is given by
iqr = upper quartile - lower quartile

20
Example Hospital Cost-to-Charge Ratios

The cost-to-charge ratio is computed as the ratio
of the actual cost of care to what the hospital
actually bills, and the ratio is usually
expressed as a percentage. A cost-to-charge ratio
of 60% means that the actual cost is 60% of what
was billed. The ratios for 31 hospitals in Oregon
for inpatient services, published by the Oregon
Department of Health in 2002, were
68, 76, 60, 88, 69, 80, 75, 67, 71, 100, 63, 62,
71, 74, 64, 48, 100, 72, 65, 50, 72, 100, 63, 45,
54, 60, 75, 57, 74, 84, 83
Find the lower quartile, upper quartile, and
interquartile range (iqr).

21
Solution to the Example Hospital Cost-to-Charge
Ratios

Ordered data
Lower half: 45 48 50 54 57 60 60 62 63 63 64 65
67 68 69
Median: 71
Upper half: 71 72 72 74 74 75 75 76 80 83 84 88
100 100 100
Lower quartile = 62
Upper quartile = 76
iqr = 76 - 62 = 14
The mean = 70.65
The standard deviation = 14.11
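
The solution above can be reproduced with a short Python sketch. The `quartiles` helper is a hypothetical name; it implements the slide's rule (median of each half, with the overall median excluded when n is odd), which is not the same as the quantile methods some software uses by default. It assumes at least two observations.

```python
from statistics import mean, median, stdev

ratios = [68, 76, 60, 88, 69, 80, 75, 67, 71, 100, 63, 62,
          71, 74, 64, 48, 100, 72, 65, 50, 72, 100, 63, 45,
          54, 60, 75, 57, 74, 84, 83]

def quartiles(values):
    """Lower/upper quartile as the medians of the lower/upper halves.
    For odd n, slicing by n // 2 leaves the overall median out of both."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])

lower_q, upper_q = quartiles(ratios)
iqr = upper_q - lower_q

print(f"lower quartile = {lower_q}, upper quartile = {upper_q}, iqr = {iqr}")
print(f"mean = {mean(ratios):.2f}, standard deviation = {stdev(ratios):.2f}")
# prints: lower quartile = 62, upper quartile = 76, iqr = 14
# prints: mean = 70.65, standard deviation = 14.11
```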

22
4.3 Summarizing a Data Set Boxplots

Construction of a Skeletal Boxplot
Draw a horizontal measurement scale
Construct a rectangular box with a left (or
lower) edge at the lower quartile and a right (or
upper) edge at the upper quartile. The box width
is then equal to the iqr.
Draw a vertical line segment inside the box at
the location of the median.
Extend horizontal line segments, called whiskers,
from each end of the box to the smallest and
largest observations in the data set.

23
Boxplot for Hospital Cost-to-Charge Ratio

Example: Revisiting Hospital Cost-to-Charge
Ratios
Lower half: 45 48 50 54 57 60 60 62 63 63 64 65
67 68 69
Median: 71
Upper half: 71 72 72 74 74 75 75 76 80 83 84 88
100 100 100
Construct a modified boxplot of the data.
smallest observation = 45, largest observation =
100
lower quartile = 62, upper quartile = 76
median = 71

24
Modified Boxplot

An observation is an outlier if it is more than
1.5 × iqr away from the nearest quartile (the
nearest end of the box).
An outlier is extreme if it is more than 3 × iqr
from the nearest end of the box, and mild otherwise.
To construct a modified boxplot:
Extend whiskers from each end of the box to the
most extreme observation that is not an outlier.
Draw a solid circle to mark the location of any
mild outlier.
Draw an open circle to mark the location of any
extreme outlier in the data set.

25
Example Summarizing a Data Set

The 51 observations for gross state product (in
billions of dollars) for the 50 states and
D.C. are as follows:
16, 17, 18, 20, 21, 24, 30, 31, 32, 34, 40, 40,
41, 48, 52, 54, 60, 62, 62, 63, 77, 82, 85, 100,
105, 107, 110, 129, 134, 142, 142, 158, 160, 161,
163, 165, 174, 193, 231, 236, 239, 254, 295, 319,
341, 364, 419, 426, 646, 707, 1119
Construct a modified boxplot.

Lower quartile = 41, upper quartile = 231
iqr = 231 - 41 = 190; 1.5 × iqr = 285, 3 × iqr =
570
upper quartile + 1.5 × iqr = 516, upper quartile
+ 3 × iqr = 801
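
The cutoffs above determine which observations are flagged in the modified boxplot. The sketch below applies them to the 51 gross state product values; the variable names are illustrative, and the quartiles follow the median-of-halves rule from the interquartile range slide.

```python
from statistics import median

gsp = [16, 17, 18, 20, 21, 24, 30, 31, 32, 34, 40, 40, 41, 48, 52,
       54, 60, 62, 62, 63, 77, 82, 85, 100, 105, 107,
       110, 129, 134, 142, 142, 158, 160, 161, 163,
       165, 174, 193, 231, 236, 239, 254, 295, 319, 341,
       364, 419, 426, 646, 707, 1119]

ordered = sorted(gsp)
half = len(ordered) // 2                # n = 51, so the median is excluded
q1 = median(ordered[:half])             # lower quartile
q3 = median(ordered[-half:])            # upper quartile
iqr = q3 - q1

def beyond(x, multiple):
    """True if x is more than multiple * iqr from the nearest quartile."""
    return x < q1 - multiple * iqr or x > q3 + multiple * iqr

extreme = [x for x in ordered if beyond(x, 3)]
mild = [x for x in ordered if beyond(x, 1.5) and not beyond(x, 3)]
inside = [x for x in ordered if not beyond(x, 1.5)]
whiskers = (min(inside), max(inside))   # whisker ends: non-outliers only

print(f"q1 = {q1}, q3 = {q3}, iqr = {iqr}")
print(f"mild outliers: {mild}, extreme outliers: {extreme}")
print(f"whiskers run from {whiskers[0]} to {whiskers[1]}")
```

With these data the two values between 516 and 801 (646 and 707) are mild outliers, 1119 is an extreme outlier, and there are no outliers on the low side because the lower cutoff is negative.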
26
4.4 Interpreting Center and Variability
Chebyshev's Rule, the Empirical Rule, and z Scores

Combine the mean and standard deviation to obtain
informative statements about
how the values in a data set are distributed, and
the relative position of a particular value in a
data set.
It is useful to describe how far away a
particular observation is from the mean in terms
of the standard deviation.
Example: a data set of scores on a standardized test
with a mean of 100 and a standard deviation of 15.

Score      70          85          100   115         130
Position   2 sd below  1 sd below  Mean  1 sd above  2 sd above

27
4.4 Interpreting Center and Variability
Chebyshev's Rule

Consider any number k, where k > 1. Then the
percentage of observations that are within k
standard deviations of the mean is at least
$100\left(1 - \frac{1}{k^2}\right)\%$
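
To get a feel for the rule, the bound can be evaluated for a few values of k. This is a minimal sketch; `chebyshev_lower_bound` is a hypothetical helper name, not something defined in the slides.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations
    of the mean, by Chebyshev's Rule (requires k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is stated here for k > 1")
    return 100 * (1 - 1 / k ** 2)

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1f}%")
# prints: k = 1.5: at least 55.6%
# prints: k = 2: at least 75.0%
# prints: k = 3: at least 88.9%
```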

28
Example Child Care Time for Preschool Kids

An article examined various modes of care for
preschool children. For a sample of families with
one child, the mean and standard deviation of
child care time per week were approximately 36
hours and 12 hours, respectively. Use Chebyshev's
Rule to
1. display values that are 1, 2, and 3 standard
deviations from the mean.
2. determine what percentage of observations are
between 12 and 60 hours.
3. determine what percentage of observations exceed
72.
4. determine what percentage of observations are
less than 18.

29
Answer to the Child Care Example

1. The values 1 standard deviation from the mean are
(24, 48). (36 ± 12)
The values 2 standard deviations from the mean are
(12, 60). (36 ± 12 × 2)
The values 3 standard deviations from the mean are
(0, 72). (36 ± 12 × 3)
2. The observations between 12 and 60 are within 2
standard deviations of the mean, and therefore
k = 2. By Chebyshev's Rule, at least
$100\left(1 - \frac{1}{2^2}\right)\% = 75\%$ of the observations
must be between 12 and 60 hours.
3. By Chebyshev's Rule, at least 89% of the
observations must be between 0 and 72, and
therefore at most 11% are outside this interval.
Time cannot be negative, so at most 11% of the
observations exceed 72.
4. The values 18 and 54 are 1.5 standard
deviations from the mean, so using k = 1.5 in
Chebyshev's Rule implies that at least 55.6% of
the observations must be between these two values.
Thus at most 44.4% of the observations are less
than 18. If the distribution of values is
symmetric, then at most 22.2% are less than 18.

30
Example IQ Score

The following is a stem-and-leaf display of IQ
scores of 112 children. (Summary quantities: mean =
104.5, standard deviation = 16.3.) (All
observations within two standard deviations of
the mean are shown in blue.)

 6 | 1
 7 | 25679
 8 | 0000124555668
 9 | 0000112333446666778889
10 | 0001122222333566677778899999
11 | 00001122333344444477899
12 | 01111123445669
13 | 006
14 | 26
15 | 2
Stem: Tens
Leaf: Ones

Show how Chebyshev's Rule considerably
understates the actual percentages.

Solution on next slide
31
Solution to the Example IQ Score

Chebyshev's Rule considerably underestimates the
actual percentages. The stem-and-leaf display of
IQ scores shows a symmetric shape, and in this
case the Empirical Rule on the next slide gives a
better estimate.
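
The size of the understatement can be checked by counting observations directly. The sketch below assumes the leaves of the stem-and-leaf display are read as listed in the dictionary (ten stems, 112 leaves in total); that reading is a reconstruction from the transcript, so treat the counts as illustrative.

```python
# Stem (tens) -> leaves (ones), as read from the display (assumed reading)
stem_leaf = {
    6: "1",
    7: "25679",
    8: "0000124555668",
    9: "0000112333446666778889",
    10: "0001122222333566677778899999",
    11: "00001122333344444477899",
    12: "01111123445669",
    13: "006",
    14: "26",
    15: "2",
}
scores = [10 * stem + int(leaf)
          for stem, leaves in stem_leaf.items() for leaf in leaves]

mean_iq, sd_iq = 104.5, 16.3  # summary quantities given on the slide

counts = {}
for k in (1, 2, 3):
    low, high = mean_iq - k * sd_iq, mean_iq + k * sd_iq
    counts[k] = sum(low <= x <= high for x in scores)
    actual = 100 * counts[k] / len(scores)
    # Chebyshev's bound is only informative for k > 1
    bound = 100 * (1 - 1 / k ** 2)
    print(f"k = {k}: actual {actual:.1f}%, Chebyshev at least {bound:.1f}%")
```

Under this reading, 108 of the 112 scores (about 96%) lie within 2 standard deviations of the mean, well above the 75% that Chebyshev's Rule guarantees.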

32
The Empirical Rule

If the histogram of values in a data set can be
reasonably well approximated by a normal curve,
then
Approximately 68% of the observations are within
1 standard deviation of the mean.
Approximately 95% of the observations are within
2 standard deviations of the mean.
Approximately 99.7% of the observations are
within 3 standard deviations of the mean.

33
Example Heights of Mothers

A data set consists of 1052 measurements of
heights of mothers. The mean and standard
deviation were
Mean = 62.484 in.
Standard deviation = 2.390 in.
A normal curve did provide a good fit to the
data.
Use Chebyshev's Rule and the Empirical Rule to
summarize the distribution of mothers' heights.

A summary table on next slide
34
Answer to the Example Heights of Mothers

The data distribution was approximately normal.
Therefore, the Empirical Rule is much more
successful and informative in this case than
Chebyshev's Rule.
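
The intervals behind such a comparison can be sketched from the two summary values alone. The code below computes mean ± k standard deviations for k = 1, 2, 3 and pairs each interval with the percentages the two rules give; it does not reproduce the actual percentages from the slide's table, which are not in the transcript.

```python
mean_height, sd_height = 62.484, 2.390     # inches, from the example
empirical_pct = {1: 68.0, 2: 95.0, 3: 99.7}  # Empirical Rule percentages

summary = []
for k in (1, 2, 3):
    low = mean_height - k * sd_height
    high = mean_height + k * sd_height
    # For k = 1 Chebyshev's bound is 0%, i.e. it says nothing useful
    chebyshev_pct = 100 * (1 - 1 / k ** 2)
    summary.append((k, low, high, chebyshev_pct, empirical_pct[k]))

for k, low, high, cheb, emp in summary:
    print(f"within {k} sd: {low:.3f} to {high:.3f} in. | "
          f"Chebyshev: at least {cheb:.1f}% | Empirical: about {emp}%")
```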

35
Measure of Relative Standing The z Score

The z score corresponding to a particular value
is
$z\ \text{score} = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$
The z score tells us how many standard deviations
the value is from the mean. It is positive or
negative according to whether the value lies
above or below the mean.

36
Example Relatively Speaking, Which is the Better
Offer?

Suppose that two graduating seniors, one a
marketing major and one an accounting major, are
comparing job offers. The accounting major has an
offer for $45,000 per year and the marketing
student has an offer for $42,500 per year. Which
is the better offer relative to their own majors?
For accounting majors: mean salary =
$46,000 with standard deviation = $1500.
For marketing majors: mean salary =
$43,000 with standard deviation = $1000.

Answer on the next slide.
37
Solution to the Example Relatively Speaking,
Which is the Better Offer?
The accounting major's offer ($45,000) is 0.67
standard deviations below the mean salary for
accounting majors, while the marketing major's
offer ($42,500) is 0.5 standard deviations below
the mean. Relative to their own majors, the
marketing offer is actually a little more
attractive.
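
The two z scores behind this comparison can be verified with a short sketch. The function name `z_score` is illustrative; the numbers are the salaries, means, and standard deviations given in the example.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

accounting_z = z_score(45_000, 46_000, 1_500)
marketing_z = z_score(42_500, 43_000, 1_000)

print(f"accounting offer: z = {accounting_z:.2f}")
print(f"marketing offer:  z = {marketing_z:.2f}")
# prints: accounting offer: z = -0.67
# prints: marketing offer:  z = -0.50

# The larger (less negative) z score is the better offer in relative terms.
better = "marketing" if marketing_z > accounting_z else "accounting"
print(f"relatively better offer: {better}")
```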
