Title: Inference on averages
1Inference on averages
- Data are collected to learn about certain numerical characteristics of a process or phenomenon that in most cases are unknown.
- Example: A study was conducted to analyze women's bone health. Data on the daily intakes of calcium (in milligrams) for 36 women between the ages of 18 and 24 were collected. What is the estimated average calcium intake for women in this age range?
- The sample average is an estimate of the average calcium intake for women between the ages of 18 and 24.
- Population: all women aged 18-24 years.
- Sample: 36 women aged 18-24 years, selected at random.
2Estimating the population average
- To estimate the population average:
- Select a simple random sample of size n from the population of interest, so that each unit in the population has the same probability of being selected.
- Collect data from the sample.
- Compute the sample average and the standard deviation.
- The sample average x̄ is an estimate of the population average.
- How accurate is such an estimate?
- A measure of the accuracy is given by the standard error (S.E.) of the sample average: S.E. = s/sqrt(n), where s is the standard deviation of the observations. The larger the sample, the more accurate the average is as an estimate of the population average.
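The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `standard_error` and the list of intakes are made up for the example and are not the study data.

```python
import math
import statistics

def standard_error(observations):
    """Return (sample average, sample standard deviation, standard error)."""
    n = len(observations)
    mean = statistics.mean(observations)
    s = statistics.stdev(observations)  # sample standard deviation (divides by n - 1)
    return mean, s, s / math.sqrt(n)    # S.E. = s / sqrt(n)

# Hypothetical calcium intakes in mg (invented values, not the study data)
intakes = [650, 820, 1100, 940, 760, 1210, 880, 700, 990]
mean, s, se = standard_error(intakes)
print(f"average = {mean:.2f}, s = {s:.2f}, S.E. = {se:.2f}")
```

Because the S.E. divides by sqrt(n), a sample four times as large has roughly half the standard error when s stays about the same.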
3What is the distribution of the sample average?
If the investigators take several samples of size n and compute the average of each sample, then the sample averages will be scattered around the population average.
sample average = population average µ + sampling error
[Figure: distribution of the sample averages, centered at µ, with spread given by the S.E.]
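A small simulation can make this concrete. The sketch below assumes a normal population with mean 900 and standard deviation 420 (illustrative values chosen for the example, not estimates from the study) and draws many samples of size 36.

```python
import math
import random
import statistics

rng = random.Random(0)            # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 900.0, 420.0   # assumed population values (illustrative only)
n, repetitions = 36, 2000

# Draw many samples of size n and record the average of each one
averages = []
for _ in range(repetitions):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    averages.append(statistics.mean(sample))

center = statistics.mean(averages)    # should be close to the population average
spread = statistics.stdev(averages)   # should be close to POP_SD / sqrt(n)
print(f"center of the sample averages: {center:.1f}")
print(f"spread of the sample averages: {spread:.1f}")
print(f"POP_SD / sqrt(n):              {POP_SD / math.sqrt(n):.1f}")
```

The sample averages cluster around the population average, and their spread is close to the population standard deviation divided by sqrt(n), which is what the standard error estimates.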
4What is the shape of the sampling distribution?
If the sample size n is large (n > 50), the sample average is approximately normal, with mean equal to the population mean and standard deviation equal to the standard error of the sample average.
- The larger the sample, the more accurate the normal approximation is.
- If the distribution of the population is not symmetric, the normal approximation is less accurate, and you need a larger sample.
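To see the effect of sample size on a non-symmetric population, the following sketch draws sample averages from an exponential population (an assumption made only for illustration) and measures how skewed they are. A skewness near 0 indicates a symmetric, more normal-looking shape.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed standardized deviation (0 for symmetric data)."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def skew_of_sample_averages(n, repetitions, rng):
    """Skewness of the averages of `repetitions` samples of size n."""
    averages = []
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # right-skewed population
        averages.append(sum(sample) / n)
    return skewness(averages)

rng = random.Random(1)  # fixed seed so the run is repeatable
small = skew_of_sample_averages(5, 4000, rng)
large = skew_of_sample_averages(100, 4000, rng)
print(f"skewness of sample averages, n = 5:   {small:.2f}")
print(f"skewness of sample averages, n = 100: {large:.2f}")
```

With n = 5 the averages are still clearly skewed; with n = 100 the skewness is much closer to 0, so the normal curve fits better.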
5Confidence Intervals for averages
Problem: We want to estimate the unknown population mean µ.
Answer: We compute a confidence interval for µ, that is, the set of plausible values for µ in light of the data.
A 95% confidence interval for µ is defined as
sample average ± margin of error
where the margin of error indicates how accurate our estimate is.
6Confidence Intervals
In samples of size n, a level C confidence interval for the population average is
sample average ± t_{α/2} × S.E.
where t_{α/2} is the critical value such that the area between -t_{α/2} and t_{α/2} under the curve of the t-distribution with n-1 degrees of freedom is C = 1-α.
[Figure: t-distribution curve with central area 0.95 between -t_{α/2} and t_{α/2}]
The value of t_{α/2} is computed using the Excel function TINV(α, df), where df = sample size - 1.
7Example
- Data on the daily intakes of calcium (in milligrams) for 36 women between the ages of 18 and 24 were collected.
- The sample average is 898.44.
- The standard deviation is s = 422.
- The sample size is n = 36.
- The standard error is S.E. = 422/sqrt(36) = 70.33.
- The 95% confidence interval is
- (898.44 - t_{0.025} × 70.33, 898.44 + t_{0.025} × 70.33)
- The value t_{0.025} = 2.03, thus a 95% C.I. for µ is (755.66 mg, 1041.23 mg).
- We are 95% confident that the true average calcium intake is a value between 755.66 mg and 1041.23 mg.
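The calculation in this example can be reproduced with a short Python sketch. The function name `t_confidence_interval` is hypothetical, and the critical value 2.03 is taken from the slide rather than computed, since the Python standard library has no t-distribution quantile function.

```python
import math

def t_confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper) for: sample average +/- t_crit * s / sqrt(n)."""
    se = s / math.sqrt(n)      # standard error of the sample average
    margin = t_crit * se       # margin of error
    return mean - margin, mean + margin

# Values from the calcium example; t_crit = 2.03 is the slide's value for 35 d.f.
low, high = t_confidence_interval(mean=898.44, s=422, n=36, t_crit=2.03)
print(f"95% C.I.: ({low:.2f} mg, {high:.2f} mg)")
```

With the rounded critical value 2.03 the upper limit comes out as 1041.22 rather than 1041.23; the small difference is due to rounding of the critical value.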
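8 [Excel worksheet with the formulas used in the example]
- Sample size n: COUNT(data)
- Standard error: B4/sqrt(B5), that is, stdev/sqrt(n)
- Degrees of freedom: B5-1, that is, n-1
- Critical value: TINV((1-B6), B10), that is, TINV(alpha, df)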
9Understanding a 95% confidence interval
For about 95 out of 100 samples, the population average µ lies in the associated 95% confidence interval. Suppose we take 25 samples of 36 women between 18 and 24 years of age, and for each sample we compute the sample average and the 95% C.I.
[Figure: distribution of sample averages, with the 25 confidence intervals plotted against the true value µ]
Why do the intervals move around? How many intervals contain the true value µ?
In the long run, 95% of all samples will produce an interval that contains the true value µ. Be careful though: it might happen that the C.I. computed with the sample collected in the study DOES NOT contain the true average value!
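The long-run interpretation can be checked by simulation. The sketch below assumes a normal population with mean 900 and standard deviation 420 (illustrative values only), draws many samples of 36, and counts how often the 95% interval contains the true average. The critical value 2.03 for 35 degrees of freedom is the one used in the calcium example.

```python
import math
import random
import statistics

rng = random.Random(42)               # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD = 900.0, 420.0     # assumed population values (illustrative only)
n, t_crit = 36, 2.03                  # t critical value for 35 d.f., as in the example
repetitions = 2000

covered = 0
for _ in range(repetitions):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    mean = statistics.mean(sample)
    margin = t_crit * statistics.stdev(sample) / math.sqrt(n)
    # Does this sample's interval contain the true average?
    if mean - margin <= TRUE_MEAN <= mean + margin:
        covered += 1

coverage = covered / repetitions
print(f"{coverage:.1%} of the intervals contain the true average")
```

Each sample gives a different average and a different standard deviation, which is why the intervals move around; the share that capture the true value should settle near 95%.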
10What is the t-distribution?
The t-distribution with n-1 degrees of freedom is
a symmetric distribution with center at 0. For
large n, the t-distribution is close to the
standard normal distribution.
11Comparing the t-distribution curve and the
standard normal curve
[Figure: t-distribution curves for d.f. = 5, d.f. = 15, and d.f. = 30, each compared with the standard normal curve]
The t-distribution curve has fatter tails than the standard normal curve. For d.f. around 30, the t-distribution curve is very similar to the standard normal curve.
12A different confidence level
- Suppose we want to compute a 90% confidence interval for the average calcium intake.
- We will use the same formula, with a different critical value t.
- The sample average is 898.44.
- The standard deviation is s = 422.
- The sample size is n = 36.
- The standard error is S.E. = 422/sqrt(36) = 70.33.
- The confidence level is C = 0.90, so α = 1-C = 0.10.
- The 90% confidence interval is
- (898.44 - t_{0.05} × 70.33, 898.44 + t_{0.05} × 70.33)
13The critical value is t_{0.05} = 1.688. The C.I. is (898.44 - 1.688 × 70.33, 898.44 + 1.688 × 70.33) = (779.72 mg, 1017.16 mg). With a 90% confidence level, we state that the average calcium intake is between 779.72 mg and 1017.16 mg.
14Approximate Confidence Intervals
The normal approximation can be used to compute approximate confidence intervals if the sample size is large (n > 30).
[Figure: normal curve with 95% of the area between µ - 1.96 S.E. and µ + 1.96 S.E.]
Margin of error:
- 90% confidence interval: 1.64 S.E.
- 95% confidence interval: 1.96 S.E.
- 99% confidence interval: 2.57 S.E.
15Expressions for C.I.s
x̄ is the sample average of n observations in a simple random sample of size n, where n is large (n > 30).
s is the standard deviation of the n observations.
The 90% C.I. for the population mean: x̄ ± 1.64 s/sqrt(n)
The 95% C.I. for the population mean: x̄ ± 1.96 s/sqrt(n)
The 99% C.I. for the population mean: x̄ ± 2.57 s/sqrt(n)
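The multipliers for the normal approximation can be obtained from the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library; the function names `z_critical` and `approximate_ci` are hypothetical, and the calcium figures are reused only as an illustration.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Value z such that the area between -z and z under the normal curve is `confidence`."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def approximate_ci(mean, s, n, confidence):
    """Large-sample C.I.: sample average +/- z * s / sqrt(n)."""
    margin = z_critical(confidence) * s / math.sqrt(n)
    return mean - margin, mean + margin

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")

low, high = approximate_ci(898.44, 422, 36, 0.95)
print(f"approximate 95% C.I.: ({low:.2f}, {high:.2f})")
```

The computed multipliers are about 1.645, 1.960, and 2.576, which the slides shorten to 1.64, 1.96, and 2.57. For the calcium data the normal-based interval is slightly narrower than the t-based one, because 1.96 is smaller than 2.03.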
16General remarks on C.I.s
- The purpose of a C.I. is to estimate an unknown parameter with an indication of how accurate the estimate is and of how confident we are that the result is correct.
- The methods used here rely on the assumption that the sample is randomly selected.
- Any confidence interval has two parts: estimate ± margin of error.
- The confidence level states the probability that the method will give a correct answer, i.e. that the confidence interval contains the true value of the parameter.
- The margin of error of a confidence interval decreases as
- The confidence level decreases
- The sample size n increases
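The last remark can be illustrated numerically. The sketch below uses the normal approximation and the standard deviation from the calcium example (s = 422); the sample sizes are arbitrary choices for the illustration and `margin_of_error` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def margin_of_error(s, n, confidence):
    """Large-sample margin of error: z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / math.sqrt(n)

s = 422  # standard deviation from the calcium example
for n in (36, 144, 576):
    row = ", ".join(
        f"{c:.0%}: {margin_of_error(s, n, c):7.2f}" for c in (0.90, 0.95, 0.99)
    )
    print(f"n = {n:3d} -> {row}")
```

Reading across a row, the margin grows with the confidence level; reading down a column, quadrupling the sample size halves the margin.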
17Remarks
- Notice the trade-off between the margin of error and the confidence level. The greater the confidence you want to place in your estimate, the larger the margin of error is (and hence the less informative the interval is).
- A C.I. gives the range of values for the unknown population average that are plausible in light of the observed sample average. The confidence level says how plausible.
- A C.I. is defined for the population parameter, NOT the sample statistic.
- To make the margin of error smaller, you can take a larger sample!
- Use the t-distribution in small samples (n < 30). For large samples, the t-distribution is practically indistinguishable from the standard normal distribution.
18Testing hypotheses
- The recommended daily allowance (RDA) of calcium for women between 18 and 24 years of age is 1300 milligrams. A health organization claims that, on average, women in this age range take less calcium than the RDA level.
- Using the collected data, what can we conclude regarding the claim of the health organization?
19Testing hypotheses
Confidence intervals can be used to test conjectures or hypotheses about a certain characteristic of interest.
A trucking firm doubts the claim that the average lifetime of certain tires is at least 28,000 miles. To check the claim, the firm puts 80 of these tires on its trucks and gets an average lifetime of 27,563 miles with a standard deviation of 1,348 miles. What can you conclude from the data?
We can construct a confidence interval and check whether it contains the value of 28,000 miles. If it does, we could conclude that 28,000 is plausible in light of the data!
20Testing hypotheses
- A 95% C.I. for the average lifetime is
- (Are we using the t-distribution or the normal curve?)
- 27,563 ± 1.96 × 1,348/sqrt(80) = 27,563 ± 295.39 miles = (27,268, 27,858).
- Based on the data, the entire confidence interval lies below 28,000 miles, so 28,000 is not a plausible value for the average lifetime. It is more likely that the tires last a shorter time.
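The check used in the tire example can be written as a small function. This is a sketch of the confidence-interval approach described on these slides, using the normal multiplier 1.96 since n = 80 is large; the function name `claim_is_plausible` is hypothetical.

```python
import math

def claim_is_plausible(mean, s, n, claimed_value, z=1.96):
    """Return (is the claimed value inside the C.I.?, (lower, upper))."""
    margin = z * s / math.sqrt(n)
    low, high = mean - margin, mean + margin
    return low <= claimed_value <= high, (low, high)

# Values from the tire example
plausible, (low, high) = claim_is_plausible(mean=27563, s=1348, n=80, claimed_value=28000)
print(f"95% C.I.: ({low:.2f}, {high:.2f}) miles")
print("28,000 miles is plausible" if plausible else "28,000 miles is not plausible")
```

Here the claimed value falls above the upper limit of the interval, so the data do not support an average lifetime of 28,000 miles.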