Introduction to Inference for Means - PowerPoint PPT Presentation




Slides: 43

Provided by: puaf4


Transcript and Presenter's Notes


1
Introduction to Inference for Means

Sample means are normally distributed, with
mean = population mean μ
standard deviation = SE
Estimation
What is the population mean?
Confidence interval: there is a certain probability
that the population mean is within the interval
A CI can also be constructed for the difference
between the means of two populations, based on
samples from each population

2
Introduction to Inference for Means

Hypothesis tests
was the sample drawn randomly from the
population, or is there a systematic difference
between the sample and the population?
how likely is it that the difference between the
sample mean and the known population mean is due
to chance?
were two samples drawn randomly from the same
population?
how likely is it that the difference between the
sample means is due to chance?

3
Distribution of Sample Mean

The sample mean is normally distributed, with
a mean of μ and a standard deviation of σ/√n (the SE).
We can express the sample mean in terms of the
standard normal distribution:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
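A quick simulation can make this concrete. The sketch below assumes a normal population with hypothetical values μ = 100 and σ = 15 (chosen only for illustration). It draws many samples of size n = 25 and compares the spread of the sample means with σ/√n. The function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma); return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]

# Hypothetical population parameters, chosen only for illustration
mu, sigma, n = 100.0, 15.0, 25
means = simulate_sample_means(mu, sigma, n, reps=20_000)

print("mean of sample means:", round(statistics.fmean(means), 2))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", sigma / n ** 0.5)   # exactly 3.0

# Standardizing one simulated mean gives a draw from N(0, 1)
z = (means[0] - mu) / (sigma / n ** 0.5)
```

The SD of the simulated means should land close to 3.0, which is σ/√n for these values.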

4
Estimation: Confidence Intervals

The sample mean is a point estimate of the
population mean. How accurate an estimate is it?
There is a 95% chance that any given sample mean
will lie within 2 SE of the population mean.
Thus, there is also a 95% chance that the population
mean is within 2 SE of a given sample mean.
The range of values within which a parameter is
likely to be found is a confidence interval.
The concept of a confidence interval is
illustrated in Confidence Intervals.xls.
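The "95% within 2 SE" statement can be checked by repeated sampling, in the spirit of Confidence Intervals.xls. The sketch assumes a normal population with a known, hypothetical σ. It counts how often the interval x̄ ± 2 SE captures μ. The helper `coverage` is a hypothetical name.

```python
import random
import statistics

def coverage(mu, sigma, n, multiple, reps=10_000, seed=1):
    """Share of intervals xbar +/- multiple * SE that contain mu (sigma known)."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        hits += xbar - multiple * se <= mu <= xbar + multiple * se
    return hits / reps

# Hypothetical population: mu = 50, sigma = 10; samples of size 20
cov_2se = coverage(mu=50.0, sigma=10.0, n=20, multiple=2.0)
print(cov_2se)   # long-run value is P(|Z| < 2), about 0.954
```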

5
Confidence Intervals and Levels

To obtain a confidence interval for a population
mean, we first specify a confidence level,
usually 95% (sometimes 90% or 99%).
We then determine the multiple of the standard
error we need on either side of the sample mean
to achieve the given confidence level that the
interval will contain the population mean.
So far, we have assumed 2 SE for 95% confidence, but
that is only approximately true (the exact multiple
is 1.96 when σ is known).

6
Confidence Intervals

Confidence intervals have the form
(pop. mean) = (sample mean) ± (multiple)(SE)
(multiple) is determined by the desired confidence
level, which you choose
CL is usually 95%, sometimes 90% or 99%
α = 1 − CL = probability outside the interval
If the interval is two-sided, the area under each tail = α/2
If σ is known, (multiple) = $Z_{\alpha/2}$
(multiple) is also called the critical value of Z
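Here is a minimal sketch of this recipe when σ is known. It uses Python's standard-library `statistics.NormalDist` to get $Z_{\alpha/2}$. The function name `z_interval` and the input numbers are hypothetical.

```python
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, cl=0.95):
    """Two-sided CI for the population mean when sigma is known."""
    alpha = 1 - cl
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value Z_{alpha/2}
    se = sigma / n ** 0.5
    return sample_mean - z * se, sample_mean + z * se

# Hypothetical numbers: xbar = 50, sigma = 10, n = 25, so SE = 2
low, high = z_interval(50.0, 10.0, 25)
print(round(low, 2), round(high, 2))   # 46.08 53.92
```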

7
Finding Multiples of SE with Excel

Let CL = confidence level (e.g., 0.9, 0.95,
0.99)
α = 1 − CL (e.g., 0.1, 0.05,
0.01)

One-tailed: $Z_\alpha$ = −NORMSINV(α)
Two-tailed: $Z_{\alpha/2}$ = −NORMSINV(α/2)
8
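Critical values of Z (α for a one-sided, α/2 for a two-sided confidence interval):
CL 90%: one-sided Z = 1.282, two-sided Z = 1.645
CL 95%: one-sided Z = 1.645, two-sided Z = 1.960
CL 99%: one-sided Z = 2.326, two-sided Z = 2.576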
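Outside Excel, the same critical values can be reproduced with Python's `statistics.NormalDist().inv_cdf`, which plays the role of NORMSINV here. The helper names below are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, SD 1

def z_one_sided(alpha):
    # Same as -NORMSINV(alpha), i.e. NORMSINV(1 - alpha), in Excel
    return std_normal.inv_cdf(1 - alpha)

def z_two_sided(alpha):
    # Same as -NORMSINV(alpha / 2): alpha/2 in each tail
    return std_normal.inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    alpha = 1 - cl
    print(f"{cl:.0%}: one-sided {z_one_sided(alpha):.3f}, "
          f"two-sided {z_two_sided(alpha):.3f}")
```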
9
Distribution of Sample Mean (2)

The sample mean is normally distributed, with
a mean of μ and a standard deviation of σ/√n (the SE).
We can express the sample mean in terms of the
standard normal distribution:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

We usually don't know σ. We can use the sample
standard deviation, s, but this can be a poor
approximation to σ, particularly if n is small.

10
Sample Standard Deviation

Standard deviation is not a robust measure of
spread. It is sensitive to outliers and, in this case,
to an unlucky sample in which the spread of the
values is much smaller (or much larger) than the
spread in the population from which it was drawn.
In income sampling.xls, 0.3σ < s < 2.4σ for
n = 10. If SE is calculated using a low value of s,
SE can be seriously underestimated.
Overestimates are also possible, but are not usually
as worrisome.
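The income data in income sampling.xls are not reproduced here. As a stand-in, the sketch below assumes a normal population with σ = 1 and shows how widely s can stray from σ when n = 10. The simulation setup is hypothetical.

```python
import random
import statistics

rng = random.Random(2)
sigma, n = 1.0, 10   # normal population with SD 1, samples of size 10

# s / sigma for 5,000 independent samples; statistics.stdev divides by n - 1
ratios = [statistics.stdev([rng.gauss(0.0, sigma) for _ in range(n)]) / sigma
          for _ in range(5_000)]

print("smallest s/sigma:", round(min(ratios), 2))
print("largest s/sigma: ", round(max(ratios), 2))
print("share with s < 0.7 sigma:", sum(r < 0.7 for r in ratios) / len(ratios))
```

A sample whose s falls well below σ gives an SE that is too small, and that produces an interval that is too narrow.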

11
[Figure: two panels, each labeled n = 10]
12
Distribution of Sample Mean

There is an exact solution for the distribution
of the sample mean when we know s but not σ
It assumes the population is normally distributed (but
works well in most other cases)
The standardized value
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has a t distribution with ν = n − 1 degrees of
freedom
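As a small worked example, the standardized value can be computed from a raw sample. Note that `statistics.stdev` divides by n − 1, like Excel's STDEV. The function name `t_statistic` and the data are hypothetical.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with nu = n - 1 degrees of freedom."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample SD, divides by n - 1
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical sample and hypothesized population mean
t, df = t_statistic([1, 2, 3, 4, 5], mu0=2.0)
print(t, df)   # t = sqrt(2), about 1.414, with 4 degrees of freedom
```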
13
The t Distribution

The t distribution is a close relative of the
normal distribution: as n → ∞, t → normal
The degrees of freedom parameter, (n − 1),
defines the precise shape of the t distribution
The t distribution is a little more spread out
than the normal distribution; this increase in
spread is greater for smaller n
The t distribution is used when we want to make
inferences about a population mean and the
population standard deviation is unknown

14
The t Distribution
15
Excel Commands t Distribution

TDIST(t,deg_freedom,tails)
If tails = 1, gives the area or probability in
the right-hand tail of the distribution, that is,
the probability of finding a value greater than t
Unlike NORMSDIST, it gives the probability to the right
t must not be negative; to calculate the area of the
left-hand tail (the probability of less than −t),
use the probability of greater than t (by symmetry)
If tails = 2, gives the probability of > t or < −t
(both the left-hand and the right-hand tails)
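To see what TDIST computes, here is a standard-library approximation: integrate the t density from 0 to t with Simpson's rule, then use symmetry. This is a hypothetical stand-in (`t_pdf` and `tdist` are illustrative names), not Excel's own algorithm.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def tdist(t, df, tails):
    """Numerical approximation of Excel's TDIST(t, deg_freedom, tails)."""
    if t < 0:
        raise ValueError("t must not be negative, as in Excel's TDIST")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    steps = 2 * max(1000, int(100 * t))      # even number of Simpson intervals
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    right_tail = 0.5 - total * h / 3          # P(T > t) = 0.5 - area(0..t)
    return tails * right_tail

print(tdist(2, 5, 1), tdist(2, 5, 2))   # n = 6, so 5 degrees of freedom
```

With many degrees of freedom, `tdist(2, df, 1)` approaches 1 − NORMSDIST(2), about 0.023.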

16
Using TDIST: n = 6, tails = 1
For comparison: 1 − NORMSDIST(2)
= 0.023, NORMSDIST(−2) = 0.023
17
Using TDIST: n = 6, tails = 2
For comparison: 2(1 − NORMSDIST(2))
= 0.046, 2(NORMSDIST(−2)) = 0.046
18
Excel Commands t Distribution

TINV(probability,deg_freedom)
gives the value of t, given the total probability
in both tails: half of this goes in the
right-hand tail and half goes in the left-hand
tail.
tdist.xls contains sample calculations that
illustrate the TDIST and TINV functions
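TINV can be imitated by searching for the t whose two-tailed probability equals the requested value. The sketch below uses bisection on a Simpson-rule tail area. It is self-contained, and `t_right_tail` and `tinv` are hypothetical names rather than Excel's implementation. A one-sided critical value $t_{\alpha,\nu}$ is TINV(2α, ν).

```python
import math

def t_right_tail(t, df):
    """P(T > t) for t >= 0, via Simpson integration of the t density."""
    c = (math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
         / math.sqrt(df * math.pi))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    steps = 2 * max(1000, int(100 * t))   # even number of intervals
    h = t / steps
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 - (pdf(0.0) + pdf(t) + inner) * h / 3

def tinv(probability, df):
    """Approximate Excel's TINV: the t with P(|T| > t) = probability."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    target = probability / 2               # half of the probability in each tail
    lo, hi = 0.0, 1.0
    while t_right_tail(hi, df) > target:   # widen until hi is past the answer
        lo, hi = hi, 2 * hi
    for _ in range(50):                    # bisection on the decreasing tail area
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(tinv(0.05, 30), 3))   # compare TINV(0.05,30) = 2.042 on slide 25
print(round(tinv(0.10, 5), 3))    # compare t_{0.05,5} = 2.015 on slide 19
```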

19
Using TINV (tails always = 2)
For comparison: NORMSINV(0.05)
= −1.645, NORMSINV(0.95) = 1.645 = $Z_{0.05}$
$Z_{0.05}$ = 1.645, $t_{0.05,5}$ = 2.015
20
Confidence Intervals

Confidence intervals have the form
(pop. mean) = (sample mean) ± (multiple)(SE)
(multiple) is determined by the desired confidence
level, which you choose
CL is usually 95%, sometimes 90% or 99%
α = 1 − CL = probability outside the interval
If the interval is two-sided, the area under each tail = α/2
If σ is known, (multiple) = $Z_{\alpha/2}$
Otherwise, (multiple) = $t_{\alpha/2,\nu}$
(multiple) is also called the critical value of Z
or t

21
Finding Multiples of SE with Excel

Let CL = confidence level (e.g., 0.9, 0.95,
0.99)
α = 1 − CL (e.g., 0.1, 0.05,
0.01)
ν = degrees of freedom (e.g., n − 1)

22
Multiples of the SE required for a given
confidence level and number of degrees of freedom
Note: if you know σ, you can use Z instead of t
23
α for one-sided, α/2 for two-sided confidence
interval
24
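If σ is unknown and n = 31 (ν = 30), with SE = s/√n:

90% confidence interval: $\bar{x} \pm 1.697\,\text{SE}$

95% confidence interval: $\bar{x} \pm 2.042\,\text{SE}$

99% confidence interval: $\bar{x} \pm 2.750\,\text{SE}$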

25
Example

Collect income data for 31 households

TINV(0.05,30) = 2.042

95% confidence interval for the population mean (mean
household income): $\bar{x} \pm 2.042\, s/\sqrt{31}$
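The slide's income data are not shown, so the sketch below uses hypothetical summary statistics (x̄ = 48,000, s = 21,000, n = 31) only to show the arithmetic. The critical value 2.042 is TINV(0.05,30) from this slide, and `t_interval` is a hypothetical helper name. With raw data, `statistics.fmean` and `statistics.stdev` would supply x̄ and s.

```python
import math

def t_interval(sample_mean, s, n, t_crit):
    """Two-sided CI: sample mean +/- t_crit * s / sqrt(n)."""
    se = s / math.sqrt(n)
    return sample_mean - t_crit * se, sample_mean + t_crit * se

T_CRIT = 2.042   # TINV(0.05, 30), the value given on the slide for n = 31

# Hypothetical summary statistics, NOT the slide's data
xbar, s, n = 48_000.0, 21_000.0, 31
low, high = t_interval(xbar, s, n, T_CRIT)
print(f"95% CI: {low:,.0f} to {high:,.0f}")
```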

26
Assumptions

Sample is a simple random sample
Population distribution is normal
If σ is known, this assumption is not needed; use Z
rather than t
Confidence intervals based on the t distribution are
robust to violations of normality, particularly
for n > 30
Intervals could be too narrow for highly
asymmetrical distributions with small n
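The last point can be illustrated by comparing coverage for a symmetric and a highly skewed population with n = 5. The sketch assumes a normal population and an exponential population, both with mean 1. It uses the two-sided 95% critical value for ν = 4 (2.776, from a standard t table). The setup and the name `t_coverage` are hypothetical.

```python
import random
import statistics

N = 5
T_CRIT = 2.776   # two-sided 95% critical value for nu = N - 1 = 4, i.e. TINV(0.05, 4)

def t_coverage(draw, true_mean, reps=10_000, seed=3):
    """Share of 95% t intervals from samples of size N that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [draw(rng) for _ in range(N)]
        xbar = statistics.fmean(x)
        half = T_CRIT * statistics.stdev(x) / N ** 0.5
        hits += xbar - half <= true_mean <= xbar + half
    return hits / reps

# Symmetric case: normal population with mean 1
normal_cov = t_coverage(lambda rng: rng.gauss(1.0, 1.0), true_mean=1.0)
# Highly asymmetric case: exponential population, also with mean 1
skewed_cov = t_coverage(lambda rng: rng.expovariate(1.0), true_mean=1.0)
print(normal_cov, skewed_cov)
```

The skewed case should cover the true mean noticeably less often than the nominal 95%.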

27
Confidence Interval for a Total

If $T = N\bar{x}$, then $\text{SE}(T) = N \cdot \text{SE}(\bar{x})$

So if we want a confidence interval for the total
income of the city, just multiply the average
household income, and its standard error, by the
number of households, N.
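Scaling to a total is just multiplication by N, as this short sketch shows. The numbers are hypothetical, and `total_ci` is an illustrative name.

```python
def total_ci(sample_mean, se, households, multiple):
    """CI for a total T = N * xbar; its standard error is N * SE(xbar)."""
    total = households * sample_mean
    total_se = households * se
    return total - multiple * total_se, total + multiple * total_se

# Hypothetical: mean 50, SE 2, N = 100 households, multiple 2
print(total_ci(50, 2, 100, 2))   # (4600, 5400)
```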
28
One-Sided Confidence Intervals

Previous examples were two-sided confidence
intervals: we were concerned with establishing
both lower and upper limits for the mean
In some cases we are only interested in the upper
limit (e.g., EPA or OSHA regulations limiting
exposure to a chemical)
In other cases we are interested only in the
lower limit (e.g., specifications for the minimum
reliability of a nuclear reactor component)

29
One-sided Confidence Intervals

In these cases, we use one-sided intervals that
establish a given level of confidence that the
value is below or above a certain level
(pop. mean) < (sample mean) + (multiple)(SE)
(pop. mean) > (sample mean) − (multiple)(SE)
In this case α = 1 − CL should be the area under
one tail, and (multiple) = $Z_\alpha$ or $t_{\alpha,\nu}$

30
Example Radon Concentrations

Three measurements: 2.5, 3.0, 3.5 pCi/L
Does the mean concentration exceed the EPA limit of 4.0
pCi/L? Construct a one-sided 95% CI
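Here is the arithmetic for the one-sided upper limit, using the three measurements from the slide. The critical value $t_{0.05,2}$ = 2.920 (TINV(0.10, 2)) is taken from a standard t table and is not stated on the slide.

```python
import math
import statistics

measurements = [2.5, 3.0, 3.5]          # pCi/L, from the slide
n = len(measurements)
xbar = statistics.fmean(measurements)    # 3.0
s = statistics.stdev(measurements)       # 0.5
se = s / math.sqrt(n)
t_crit = 2.920   # one-sided t_{0.05,2}, i.e. TINV(0.10, 2), from a t table

upper = xbar + t_crit * se               # (pop. mean) < upper with 95% confidence
print(f"95% one-sided upper limit: {upper:.2f} pCi/L")   # 3.84
print("below EPA limit of 4.0" if upper < 4.0 else "cannot rule out exceeding 4.0")
```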

31
α for one-sided, α/2 for two-sided confidence
interval
32
Difference of Means

If x1 and x2 are independent random variables,
and if y = x1 − x2, then
$$\mu_y = \mu_1 - \mu_2, \qquad \sigma_y = \sqrt{\sigma_1^2 + \sigma_2^2}$$
Sample means are independent random variables
(assuming samples are drawn randomly), so these
rules apply to the difference of sample means
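A simulation can check both rules. The sketch assumes two independent normal variables with hypothetical means 10 and 6 and SDs 3 and 4. The SD of the difference should come out near √(3² + 4²) = 5, not 3 + 4 = 7.

```python
import random
import statistics

rng = random.Random(4)
sigma1, sigma2 = 3.0, 4.0     # hypothetical SDs of two independent variables
y = [rng.gauss(10.0, sigma1) - rng.gauss(6.0, sigma2) for _ in range(50_000)]

print(statistics.fmean(y))    # near 10 - 6 = 4
print(statistics.stdev(y))    # near sqrt(3**2 + 4**2) = 5
```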

33
Why square root of sum of squares?

Independence can be represented graphically by
perpendicular lines or shapes (knowledge of one
gives no information about the other)

34
CI for Difference Between Means

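Let $\bar{x}_1$ and $\bar{x}_2$ be the means of two samples of
size n1 and n2 (e.g., average household income
in August and September).
The difference between sample means, $\bar{x}_1 - \bar{x}_2$,
is a random variable with a mean of (μ1 − μ2)
and a standard deviation of
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$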

35
CI for Difference Between Means

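What if we don't know σ1 or σ2? Two solutions.
1. Assume σ1 = σ2, estimated by $s_p$ (pooled standard dev.)
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}, \qquad \text{SE} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$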

36
CI for Difference Between Means

2. Assume σ1 ≠ σ2
$$\text{SE} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
In both cases, use t with ν = n1 + n2 − 2. Excel
(Data Analysis) can do either. Which to use? If
s1 and s2 are different, n1 or n2 is small (< 30), and
there is no reason to believe σ1 ≠ σ2, then use method
1. Otherwise use whichever is most convenient.
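Both methods differ only in how the SE is formed. The sketch follows the slides in using ν = n1 + n2 − 2 for both. The scores are hypothetical, and the critical value 2.306 (two-sided 95%, ν = 8) comes from a standard t table. `two_sample_ci` is an illustrative name.

```python
import math
import statistics

def two_sample_ci(x1, x2, t_crit, pooled):
    """CI for mu1 - mu2: method 1 (pooled s_p) or method 2 (unpooled SE).

    t_crit should be the t critical value for nu = n1 + n2 - 2, as on the slide.
    """
    n1, n2 = len(x1), len(x2)
    s1, s2 = statistics.stdev(x1), statistics.stdev(x2)
    diff = statistics.fmean(x1) - statistics.fmean(x2)
    if pooled:   # method 1: assume sigma1 = sigma2
        sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
        se = sp * math.sqrt(1 / n1 + 1 / n2)
    else:        # method 2: sigma1 and sigma2 may differ
        se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical scores for two small groups (equal sizes, nu = 8)
a = [78, 85, 90, 72, 88]
b = [80, 75, 70, 84, 79]
T_CRIT = 2.306   # two-sided 95% critical value for nu = 5 + 5 - 2 = 8
method1 = two_sample_ci(a, b, T_CRIT, pooled=True)
method2 = two_sample_ci(a, b, T_CRIT, pooled=False)
print(method1, method2)   # identical here because n1 == n2
```

When n1 = n2, the pooled and unpooled SEs coincide algebraically. The choice between methods therefore matters most when the sample sizes differ.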
37
Final Exam Scores in PUAF 610, 1994-2000, by
Gender

Why is this considered a sample, rather than a
population?

38
Final Exam Scores Method 2
39
Final Exam Scores Method 1
40
Confidence Intervals for Paired Samples

Sometimes we'd like to compare two samples in
which each member of one sample is naturally
paired with a member of the other sample, e.g.:
employment status or income of individuals in the
CPS in consecutive months
blood pressure or blood count of an individual
before and after a treatment
IQ or health status of identical twins
test score before and after a Kaplan review course

41
Confidence Intervals for Paired Samples

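Let x1 = value for a member of the first sample,
x2 = corresponding value in the second sample
Define a new variable
y = x1 − x2
Calculate the sample mean and sample standard
deviation of y, then build a one-sample confidence
interval for the mean of y with ν = (number of pairs) − 1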
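The paired recipe reduces to a one-sample interval on the differences y. The before/after values below are hypothetical. The critical value 3.182 (two-sided 95%, ν = 3 for four pairs) comes from a standard t table, and `paired_ci` is an illustrative name.

```python
import math
import statistics

def paired_ci(x1, x2, t_crit):
    """CI for the mean of y = x1 - x2 over paired observations.

    t_crit should be the critical value for nu = (number of pairs) - 1.
    """
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    y = [a - b for a, b in zip(x1, x2)]
    ybar = statistics.fmean(y)
    se = statistics.stdev(y) / math.sqrt(len(y))
    return ybar - t_crit * se, ybar + t_crit * se

# Hypothetical blood pressure readings before and after a treatment
before = [140, 152, 138, 147]
after = [135, 149, 136, 140]
low, high = paired_ci(before, after, t_crit=3.182)
print(round(low, 2), round(high, 2))
```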

42
Example Gasoline Substitute

Before a new fuel can be sold, the Clean Air Act
requires that the producer demonstrate that it
will not increase emissions of air pollutants.
Petrocoal.xls contains data for NOx emissions for
16 different cars driven with gasoline and with
Petrocoal (gasoline mixed with methanol derived
from coal).
Because the same car is used with both fuels, a
matched pair analysis is possible. This is more
sensitive, because much of the variability
associated with car type is eliminated.