Title: Estimation Using a Single Sample
Chapter 9 Estimation Using a Single Sample
- The objective of inferential statistics is to use
sample data to estimate the value of some
population characteristic.
- Because the estimate is based only on a sample
rather than on a census of the population, we have
to construct the estimate in a way that conveys
information about its anticipated accuracy.
- Two main estimation techniques: point estimation
and interval estimation.
9.1 Point Estimate
- A point estimate of a population characteristic
is a single number that is based on sample data
and that represents a plausible value of the
characteristic.
- Example: An article reported that 537 of 1013
adults surveyed believe that affirmative action
programs should be continued. Use this information
to estimate the true proportion p of all US adults
who favor continuing affirmative action programs.
Answer: The point estimate of p is 537/1013 ≈ 0.53, or 53%.
Example: Internet Use by College Students
- The following observations represent the number
of Internet hours per week reported by 20 college
students.
Use the data to find a point estimate of µ, the
true mean Internet time per week for college
students. Find the sample mean, the sample median,
and the 10% trimmed mean (the average of the middle
80% of the observations).
- Sample mean = 7.075
- Sample median = ½ (7.00 + 7.25) = 7.125
- After trimming the highest 10% and the lowest 10%
of the data values (two each), the middle 80% of
the observations are the middle 16 observations, so
the 10% trimmed mean = average of the middle 16
observations = 7.031.
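The three point estimates above can be computed along the following lines. This is a minimal sketch: `trimmed_mean` is a helper name introduced here for illustration, and the list `hours` is a hypothetical sample with one extreme value, not the 20 observations from the slide.

```python
from statistics import mean, median

def trimmed_mean(values, trim_fraction):
    """Average of what remains after dropping the lowest and highest
    trim_fraction of the sorted observations."""
    ordered = sorted(values)
    k = round(len(ordered) * trim_fraction)  # observations dropped at each end
    if 2 * k >= len(ordered):
        raise ValueError("trim_fraction removes every observation")
    return mean(ordered[k:len(ordered) - k])

# Hypothetical data (assumption): the values 1..19 plus one extreme value.
hours = list(range(1, 20)) + [100]

print(mean(hours))                # pulled upward by the extreme value
print(median(hours))              # average of the two middle values
print(trimmed_mean(hours, 0.10))  # average of the middle 16 observations
```

With 20 observations and 10% trimming, two values are dropped from each end, which matches the "two each" step in the example.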
Sampling distributions of 3 different statistics
- Left figure: The distribution is centered to the
right of the true value, so estimates tend to be
greater than the true value.
- Middle figure: The sampling distribution is
correctly centered, but it spreads out quite a
bit about the true value.
- Right figure: The mean of the statistic is the
same as the true value of the population
characteristic, and the statistic's standard
deviation is relatively small.
Unbiased and Biased Statistics
- A statistic whose mean value is equal to the
value of the population characteristic being
estimated is said to be an unbiased statistic. A
statistic that is not unbiased is said to be
biased.
- The sample mean $\bar{x}$ is an unbiased statistic
for estimating µ ($\mu_{\bar{x}} = \mu$).
- The sample variance s² is an unbiased statistic for
estimating σ². However, the sample standard deviation
s tends to underestimate slightly the true value of
σ.
- The sample range is a biased statistic for estimating
the population range.
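A small simulation can make the claims about s² and s concrete. The sketch below is illustrative only: the function name `simulate_bias`, the normal population with σ = 2, the sample size n = 5, and the seed are all assumptions chosen for the demonstration.

```python
import math
import random

def simulate_bias(sigma=2.0, n=5, reps=20000, seed=1):
    """Draw many samples of size n from a normal population and return
    the average sample variance and the average sample standard deviation."""
    rng = random.Random(seed)
    total_var = 0.0
    total_sd = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # divisor n - 1
        total_var += s2
        total_sd += math.sqrt(s2)
    return total_var / reps, total_sd / reps

avg_var, avg_sd = simulate_bias()
print(f"average s^2 = {avg_var:.3f} (population variance is 4.0)")
print(f"average s   = {avg_sd:.3f} (population sd is 2.0)")
```

Under these assumptions the average of s² should land close to σ² = 4, while the average of s should fall somewhat below σ = 2, consistent with the bullets above.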
Choosing a Statistic for Computing an Estimate
- Given a choice between several unbiased
statistics that could be used for estimating a
population characteristic, the best statistic to
use is the one with the smallest standard
deviation.
- For example, if the population distribution is
normal, $\bar{x}$ has a smaller standard deviation
than any other unbiased statistic for estimating µ.
- When the population distribution is
symmetric with heavy tails, a trimmed mean is a
better statistic than $\bar{x}$ for estimating µ.
Exercise: Airborne Times for Flight 448
- According to data provided by the Bureau of
Transportation Statistics, the airborne times (in
minutes) for United Airlines flight 448 from
Albuquerque to Denver on 10 randomly selected
days between 1-1-2003 and 3-31-2003 are
57 54 55 51 56 48 52 51 59 59
- Use the sample variance s² to estimate the
population variance σ².
The point estimate of σ² is s² = 13.51.
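The answer to this exercise can be checked with Python's standard library. This is a short sketch using the ten airborne times listed above; `statistics.variance` divides by n - 1, which is the sample variance s² used in this chapter.

```python
from statistics import mean, variance

# Airborne times (minutes) for the 10 sampled days, as listed in the exercise.
times = [57, 54, 55, 51, 56, 48, 52, 51, 59, 59]

x_bar = mean(times)   # sample mean
s2 = variance(times)  # sample variance, divisor n - 1

print(f"sample mean = {x_bar}")
print(f"s^2 = {s2:.2f}")  # 13.51
```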
9.3 Confidence Interval for a Population Mean
- A confidence interval (CI) is constructed so
that, with a chosen degree of confidence, the
value of the population characteristic is
captured between the lower and upper endpoints of
the interval.
- The confidence level associated with a confidence
interval estimate is the success rate of the
method used to construct the interval.
How to find the z critical value based on a
particular confidence level
- Suppose that the selected confidence level is
95%. We need to determine a value z such that a
central area of 0.95 falls between -z and z.
The remaining area of 0.05 is divided equally
between the two tails.
- Noticing that the total area to the left of z is
0.975 (0.95 central area + 0.025 area to the left
of -z), we use the z table to find z = 1.96.
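The table lookup above can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed z table; the function name `z_critical` is introduced here for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z value with the given central area between -z and z."""
    tail = (1 - confidence) / 2                # area in each tail
    return NormalDist().inv_cdf(1 - tail)      # total area to the left of z

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For 95% confidence the area to the left of z is 0.975, and the result rounds to the familiar 1.96.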
Confidence Interval for µ, Large Sample and σ
Known
$$\bar{x} \pm (z \text{ critical value}) \cdot \frac{\sigma}{\sqrt{n}}$$
- Example: Radiation Exposure. 111 US nuclear
reactors were ranked according to employee
exposure to radiation. Diablo Canyon Nuclear
Power Plant's Unit 2 reactor ranked 28th
worst, with a mean annual radiation exposure of
0.481 rem from a sample of 100 workers. Suppose
that σ = 0.35 rem. Construct a 95% confidence
interval for µ.
- Solution: Let µ denote the true mean radiation
exposure for Unit 2 workers at Diablo Canyon. The
z critical value for 95% confidence is 1.96.
$$0.481 \pm 1.96 \cdot \frac{0.35}{\sqrt{100}} = 0.481 \pm 0.069$$
The 95% confidence interval for µ is (0.412, 0.550).
With 95% confidence, we estimate that the true mean
annual radiation exposure for Diablo Canyon's Unit 2
workers is between 0.412 and 0.550 rem.
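The radiation exposure interval can be verified with a few lines of code. This is a minimal sketch; `z_interval` is a helper name introduced here, and the inputs are the summary values given in the example.

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z_crit=1.96):
    """Confidence interval for a mean when sigma is known."""
    margin = z_crit * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Summary values from the example: mean 0.481 rem, sigma 0.35 rem, n = 100.
lower, upper = z_interval(0.481, 0.35, 100)
print(f"({lower:.3f}, {upper:.3f})")  # (0.412, 0.550)
```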
Confidence Interval for µ When σ is Unknown
- σ is rarely known in practice.
- If we use the sample standard deviation s in
place of σ, the result is a different
standardized variable denoted by t.
- The assumption that the population is normal is
not critical if the sample size is large, but it
is important when the sample size is small.
- The following table lists the situations when we
use the z distribution and when we use the t
distribution.
Properties of the t Distribution
- The t curve corresponding to any fixed number of
degrees of freedom is bell shaped and is centered
at 0 (like the z curve).
- Each t distribution is more spread out than the z
curve.
- As the number of degrees of freedom increases,
the spread of the corresponding t curve
decreases.
- As the number of degrees of freedom increases,
the corresponding sequence of t curves approaches
the z curve.
Use a Table to Find t Critical Values
- Appendix Table 3 (page 708, also inside
the back cover) gives selected critical values
for various t distributions. The central areas
and confidence levels (from 0.80 to 0.999) are
listed in the first row.
- To find a particular t critical value:
- First go down the left margin of the table to the
row labeled with the desired number of degrees of
freedom.
- Then move across that row to the column headed by
the desired central area (confidence level).
- Example: Find the t critical value for 95%
confidence and n = 13.
- The value in the 12-df row (df = 13 - 1) under
the column corresponding to the 95% confidence level
is 2.18.
- Similarly, the t critical value for 99%
confidence and n = 13 is 3.06.
- Once the number of degrees of freedom exceeds 30,
the critical values change little as df increases.
For this reason, the table jumps from 30 df to 40
df, then to 60 df, and finally to 120 df.
- For df > 120, we use the z critical values
because the t curve closely resembles the z curve
as n (and hence df) becomes larger.
Exercise: Use Appendix Table 3 to find t critical
values
- If you use a t table found online, make sure you
know how to read it.
- What is the t critical value for 95% confidence
and n = 26?
- What is the t critical value for 99% confidence
and n = 20?
- What is the t critical value for 90% confidence
and n = 100?
- What is the t critical value for 95% confidence
and n = 140?
Answers: 1) 2.06 2) 2.86 3) 1.66 4) 1.96
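For readers without the printed table, t critical values can be approximated numerically. The sketch below is one possible approach, not the method behind Appendix Table 3: it integrates the t density with Simpson's rule and searches for the critical value by bisection. The function names are introduced here for illustration, and the step counts are assumptions suited to the moderate confidence levels and degrees of freedom used in this chapter.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def central_area(t, df, steps=1000):
    """Area under the t curve between -t and t (Simpson's rule on [0, t], doubled)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_critical(confidence, n):
    """t critical value for the given confidence level and sample size n."""
    df = n - 1
    lo, hi = 0.0, 1.0
    while central_area(hi, df) < confidence:  # expand until the area is bracketed
        hi *= 2
    for _ in range(50):                       # bisection on the central area
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for conf, n in [(0.95, 26), (0.99, 20), (0.90, 100), (0.95, 140)]:
    print(f"confidence {conf}, n = {n}: t = {t_critical(conf, n):.3f}")
```

The first three results agree with the table answers after rounding. For n = 140 the numerical value comes out slightly above 1.96, because the table convention substitutes the z critical value once df exceeds 120.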
t Distribution with (n - 1) df (degrees of freedom)
- Let x1, x2, ..., xn constitute a random sample from
a normal population distribution. Then the
probability distribution of the standardized
variable
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
is the t distribution with df = n - 1.
One-Sample t Confidence Interval for µ
- The general formula for a confidence interval
for a population mean µ based on a sample of size
n when
- $\bar{x}$ is the sample mean from a random sample,
- the population distribution is normal, or the
sample size n is large (generally n ≥ 30), and
- σ, the population standard deviation, is unknown
is
$$\bar{x} \pm (t \text{ critical value}) \cdot \frac{s}{\sqrt{n}}$$
where the t critical value is based on (n - 1)
df.
- Example: Executive Salaries. An article presented
data from a random sample of 231 married male
executives with MBA degrees. The information from
this sample is given in the table:

Group                   n     Mean Salary   Standard Deviation
Two-income family       140   95,140        15,000
Sole source of income   91    124,510       18,000

- Do male executives whose wives stay at home
earn more? Construct a 90% confidence interval
for the mean salary of each group.
- Solution: For executives whose wives also work,
the t critical value for 90% confidence and n =
140 is 1.645 (df = 139 exceeds 120, so the z
critical value is used).
$$95140 \pm 1.645 \cdot \frac{15000}{\sqrt{140}}$$
The 90% confidence interval for µ is (93054.58,
97225.42).
For executives whose wives stay at home, the t
critical value for 90% confidence and n = 91 is
1.66.
$$124510 \pm 1.66 \cdot \frac{18000}{\sqrt{91}}$$
The 90% confidence interval for µ is
(121377.72, 127642.28). Based on the two
interval estimates, it appears that the mean
salary for the two-income family group is lower
than the mean for the one-income group.
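Both salary intervals can be reproduced from the summary statistics. This is a minimal sketch: `t_interval` is a helper name introduced here, and the critical values 1.645 and 1.66 are the table values quoted in the solution rather than values computed by the code.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """One-sample t confidence interval from summary statistics."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

two_income = t_interval(95140, 15000, 140, 1.645)
sole_income = t_interval(124510, 18000, 91, 1.66)

print(f"two-income:  ({two_income[0]:.2f}, {two_income[1]:.2f})")
print(f"sole income: ({sole_income[0]:.2f}, {sole_income[1]:.2f})")

# The upper endpoint of the first interval lies below the lower endpoint
# of the second, so the two intervals do not overlap.
print(two_income[1] < sole_income[0])
```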
Example: Walking a Straight Line
- A study of the ability of individuals to walk in
a straight line reported data on cadence
(strides per second) for a sample of n = 20
randomly selected healthy men. Construct a 99%
confidence interval for µ.
- We solve this problem using Excel.
- See the next three slides.
As usual, we go to Data, and then Data Analysis
in the top right corner of the screen. In the
Data Analysis dialog box, select Descriptive
Statistics, and then click OK.

In the Descriptive Statistics dialog box, choose
the input range. Make sure to check the box for
Confidence Level for Mean, and enter the
confidence level (99 in this problem) in the
box.

In addition to the usual descriptive
statistics shown in the output, there is a
value (0.051784) at the bottom for the confidence
level (99.0%). Using this value, we can calculate
the 99% confidence interval: 0.9255 (mean) ±
0.0518 = (0.8737, 0.9773).
Conclusion: With 99% confidence, we estimate the
population mean cadence to be between 0.8737 and
0.9773 strides per second.
Example: Housework
- How many minutes do school-age children spend
helping with housework? An article gave information
on the number of minutes per weekday that school
children spent on housework. The mean and standard
deviation for a random sample of 26 girls in
two-parent families are 14.0 minutes and 8.6
minutes, respectively. Construct a 95% confidence
interval for the mean. What assumption must we make
about the population distribution?
Solution to Example: Housework
- Solution: Because n = 26, df = 26 - 1 = 25, and
the t critical value for the 95% confidence level is
2.06.
$$14.0 \pm 2.06 \cdot \frac{8.6}{\sqrt{26}} = 14.0 \pm 3.5$$
The 95% confidence interval is (10.5, 17.5). With
95% confidence, we estimate that the true mean time
per weekday spent on housework is between 10.5 and
17.5 minutes for girls in two-parent families. The
method used to construct the interval has a 5%
error rate.
- Analysis: We should be somewhat cautious in
interpreting this confidence interval because it
is questionable whether the population
distribution is approximately normal. (Minutes
cannot be negative, so the smallest possible
value, 0, is only 14.0/8.6 ≈ 1.63 standard
deviations below the mean.) However, because the
sample size n = 26 is relatively close to 30, we
can still use the t confidence interval formula.
Exercise: Selfish Chimps?
- In a study, chimpanzees learned to use an
apparatus that dispensed food when either of two
ropes was pulled. When one of the ropes was
pulled, only the chimp controlling the apparatus
received food. When the other rope was pulled,
food was dispensed both to the chimp controlling
the apparatus and also to a chimp in the
adjoining cage. The following data represent the
number of times out of thirty-six trials that
each of seven chimps chose the option that would
provide food to both chimps (the charitable
response). Construct a 99% confidence interval.
Answer: 99% confidence interval: (18.77, 23.81)
Choosing the Sample Size
- The sample size required to estimate a population
mean µ to within an error amount B with 95%
confidence is
$$n = \left( \frac{1.96\,\sigma}{B} \right)^2$$
- If σ is unknown, it can be estimated based on
previous information: σ ≈ range / 4.
- If the desired confidence level is something
other than 95%, 1.96 is replaced by the
appropriate z critical value.
- Example: Choosing the Sample Size
- The financial aid office wishes to estimate the
mean cost of textbooks per quarter for students
at a particular university. For the estimate to
be useful, it should be within 20 of the true
population mean. How large a sample should be
used to be 95% confident of achieving this level
of accuracy? The financial aid office is pretty
sure that the amount spent on books is mostly
between 50 and 450.
- Solution: A reasonable estimate of σ is ¼ of
the range:
σ ≈ ¼ (450 - 50) = 100.
- The required sample size is
$$n = \left( \frac{1.96 \cdot 100}{20} \right)^2 = (9.8)^2 = 96.04$$
Rounding up, a sample size of 97 is required.
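The sample size calculation can be sketched in code as follows. The function name `required_sample_size` is introduced here for illustration, and the z critical value is computed with `statistics.NormalDist` rather than rounded to 1.96, which gives the same answer for this example.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, bound, confidence=0.95):
    """Smallest n that estimates the mean to within `bound` at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / bound) ** 2)  # always round up

sigma_estimate = (450 - 50) / 4  # range / 4, as in the solution
print(required_sample_size(sigma_estimate, 20))  # 97
```

Rounding up rather than to the nearest integer keeps the error bound at or below B.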