Introduction to Inference for Means

Provided by: puaf4

1
Introduction to Inference for Means
  • Sample means are (approximately) normally
    distributed, with
  • mean = population mean $\mu$
  • standard deviation = standard error (SE)
  • Estimation
  • what is the population mean?
  • confidence interval: an interval constructed so
    that there is a specified probability that the
    population mean is within it
  • a CI can also be constructed for the difference
    between the means of two populations, based on
    samples from each population

2
Introduction to Inference for Means
  • Hypothesis tests
  • was the sample drawn randomly from the
    population, or is there a systematic difference
    between the sample and the population?
  • how likely is it that the difference between the
    sample mean and the known population mean is due
    to chance?
  • were two samples drawn randomly from the same
    population?
  • how likely is it that the difference between the
    sample means is due to chance?

3
Distribution of Sample Mean
  • The sample mean is normally distributed, with
    a mean of $\mu$ and a standard deviation of
    $\sigma/\sqrt{n}$
  • We can express the sample mean in terms of the
    standard normal distribution:
    $Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$

4
Estimation: Confidence Intervals
  • The sample mean is a point estimate of the
    population mean. How accurate an estimate?
  • There is a 95% chance that any given sample mean
    will lie within 2 SE of the population mean
  • Thus, there is also a 95% chance that the
    population mean is within 2 SE of a sample mean
  • The range of values within which a parameter is
    likely to be found is a confidence interval
  • The concept of a confidence interval is
    illustrated in Confidence Intervals.xls.
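
The coverage idea can be checked with a small simulation. The sketch below is not the Confidence Intervals.xls workbook; it assumes a hypothetical normal population (mean 50, standard deviation 10, both made up for illustration) with a known standard deviation, draws repeated samples, and counts how often the interval "sample mean ± 2 SE" contains the population mean.

```python
import random
import statistics


def coverage(n_samples=2000, n=25, mu=50.0, sigma=10.0, multiple=2.0, seed=0):
    """Fraction of simulated samples whose interval
    (sample mean +/- multiple * SE) contains the population mean mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    se = sigma / n ** 0.5              # standard error, sigma assumed known
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - multiple * se <= mu <= xbar + multiple * se:
            hits += 1
    return hits / n_samples


# Roughly 95% of the intervals should contain mu; the exact
# fraction depends on the seed and the number of samples.
print(coverage())
```

Changing `multiple` to 1.0 or 3.0 shows how the width of the interval trades off against how often it captures the population mean.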

5
Confidence Intervals and Levels
  • To obtain a confidence interval for a population
    mean, we first specify a confidence level,
    usually 95% (sometimes 90% or 99%).
  • We then determine the multiple of the standard
    error we need on either side of the sample mean
    to achieve the given confidence level that the
    interval will contain the population mean
  • So far, we assumed 2 SE for 95% confidence, but
    that is only approximately true

6
Confidence Intervals
  • Confidence intervals have the form
  • (pop. mean) = (sample mean) ± (multiple)(SE)
  • (multiple) is determined by the desired
    confidence level, which you choose
  • CL is usually 95%, sometimes 90% or 99%
  • $\alpha = 1 - \mathrm{CL}$ = probability outside
    the interval
  • If two-sided interval, area under each tail =
    $\alpha/2$
  • If $\sigma$ is known, (multiple) = $Z_{\alpha/2}$
  • (multiple) is also called the critical value of Z

7
Finding Multiples of SE with Excel
  • Let CL = confidence level (e.g., 0.9, 0.95,
    0.99)
  • $\alpha = 1 - \mathrm{CL}$ (e.g., 0.1, 0.05,
    0.01)
  • one-tailed: $Z_{\alpha} = -\mathrm{NORMSINV}(\alpha)$
  • two-tailed: $Z_{\alpha/2} = -\mathrm{NORMSINV}(\alpha/2)$

8
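Critical values of Z by tail area ($\alpha$ for one-sided, $\alpha/2$ for two-sided confidence interval)
  • tail area 0.10: Z = 1.282
  • tail area 0.05: Z = 1.645
  • tail area 0.025: Z = 1.960
  • tail area 0.01: Z = 2.326
  • tail area 0.005: Z = 2.576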
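
For readers without Excel, the same multiples can be reproduced with Python's standard library. This is a minimal sketch, not part of the original slides: `NormalDist().inv_cdf` plays the role of NORMSINV, and the function name `z_multiple` is made up for illustration.

```python
from statistics import NormalDist


def z_multiple(confidence_level, two_sided=True):
    """Multiple of the SE for a given confidence level when sigma is known."""
    alpha = 1.0 - confidence_level
    tail = alpha / 2 if two_sided else alpha   # area left in each tail
    # inv_cdf(1 - tail) is the positive critical value, i.e. -NORMSINV(tail)
    return NormalDist().inv_cdf(1.0 - tail)


for cl in (0.90, 0.95, 0.99):
    print(cl, round(z_multiple(cl), 3), round(z_multiple(cl, two_sided=False), 3))
# 0.9 1.645 1.282
# 0.95 1.96 1.645
# 0.99 2.576 2.326
```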
9
Distribution of Sample Mean (2)
  • The sample mean is normally distributed, with
    a mean of $\mu$ and a standard deviation of
    $\sigma/\sqrt{n}$
  • We can express the sample mean in terms of the
    standard normal distribution:
    $Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$
  • We usually don't know $\sigma$. We can use the
    sample standard deviation, $s$, but this can be a
    poor approximation to $\sigma$, particularly if n
    is small

10
Sample Standard Deviation
  • Standard deviation is not a robust measure of
    spread. It is sensitive to outliers: in this
    case, an unlucky sample in which the spread of
    the values is either much smaller or much larger
    than the spread in the population from which it
    was drawn
  • In income sampling.xls, $0.3\sigma < s < 2.4\sigma$
    for n = 10. If SE is calculated using a low value
    of $s$, SE can be seriously underestimated.
  • overestimates are also possible, but not usually
    as worrisome

11
[Figure: two panels, each labeled n = 10]
12
Distribution of Sample Mean
  • There is an exact solution for the distribution
    of the sample mean when we know $s$ but not
    $\sigma$
  • assumes population is normally distributed (but
    works well in most other cases)
  • The standardized value
    $t = (\bar{x} - \mu)/(s/\sqrt{n})$
    has a t distribution with $\nu = n - 1$ degrees
    of freedom
13
The t Distribution
  • The t distribution is a close relative of the
    normal distribution: as $n \to \infty$, t → normal
  • The degrees of freedom parameter, $\nu = n - 1$,
    defines the precise shape of the t distribution
  • The t distribution is a little more spread out
    than the normal distribution; this increase in
    spread is greater for smaller n
  • The t distribution is used when we want to make
    inferences about a population mean and the
    population standard deviation is unknown

14
The t Distribution
15
Excel Commands: t Distribution
  • TDIST(t,deg_freedom,tails)
  • if tails = 1, gives the area or probability in
    the right-hand tail of the distribution, that is,
    the probability of finding a value greater than t
  • unlike NORMSDIST, gives prob. to the right
  • t must be positive; to calculate the area of the
    left-hand tail (the probability of less than −t),
    use the probability of greater than t.
  • if tails = 2, gives the probability of > t or < −t
    (both the left-hand and the right-hand tails)

16
Using TDIST: n = 6, tails = 1
For comparison: 1 − NORMSDIST(2) = 0.023;
NORMSDIST(−2) = 0.023
17
Using TDIST: n = 6, tails = 2
For comparison: 2(1 − NORMSDIST(2)) = 0.046;
2(NORMSDIST(−2)) = 0.046
18
Excel Commands: t Distribution
  • TINV(probability,deg_freedom)
  • gives the value of t, given the total probability
    in both tails; half of this goes in the
    right-hand tail and half goes in the left-hand
    tail.
  • tdist.xls contains sample calculations that
    illustrate the TDIST and TINV functions
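
To see what TDIST and TINV compute, here is a rough pure-Python sketch. It is not how Excel implements these functions: it assumes simple numerical integration of the t density (Simpson's rule) and a bisection search, and the lowercase names `tdist` and `tinv` only mimic the Excel argument order.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule)."""
    if t < 0:
        raise ValueError("t must be non-negative, as with TDIST")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def tdist(t, df, tails):
    """Area in one tail (tails=1) or both tails (tails=2)."""
    return tails * t_right_tail(t, df)


def tinv(probability, df):
    """Value of t whose two-tailed probability equals `probability`."""
    lo, hi = 0.0, 1.0
    while tdist(hi, df, 2) > probability:   # widen until the root is bracketed
        hi *= 2
    for _ in range(60):                     # bisection
        mid = (lo + hi) / 2
        if tdist(mid, df, 2) > probability:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(tdist(2, 5, 1), 3))    # about 0.051 (vs. 0.023 for the normal)
print(round(tdist(2, 5, 2), 3))    # about 0.102 (vs. 0.046 for the normal)
print(round(tinv(0.05, 30), 3))    # about 2.042
print(round(tinv(0.10, 5), 3))     # about 2.015, the one-tailed 0.05 value for 5 d.f.
```

Because `tinv` always splits the probability across both tails, a one-tailed critical value for tail area 0.05 is obtained by passing 0.10.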

19
Using TINV: tails always = 2
For comparison: NORMSINV(0.05) = −1.645;
NORMSINV(0.95) = 1.645; $Z_{0.05}$ = 1.645;
$t_{0.05,5}$ = 2.015
20
Confidence Intervals
  • Confidence intervals have the form
  • (pop. mean) = (sample mean) ± (multiple)(SE)
  • (multiple) is determined by the desired
    confidence level, which you choose
  • CL is usually 95%, sometimes 90% or 99%
  • $\alpha = 1 - \mathrm{CL}$ = probability outside
    the interval
  • If two-sided interval, area under each tail =
    $\alpha/2$
  • If $\sigma$ is known, (multiple) = $Z_{\alpha/2}$
  • Otherwise, (multiple) = $t_{\alpha/2,\nu}$
  • (multiple) is also called the critical value of Z
    or t

21
Finding Multiples of SE with Excel
  • Let CL = confidence level (e.g., 0.9, 0.95,
    0.99)
  • $\alpha = 1 - \mathrm{CL}$ (e.g., 0.1, 0.05,
    0.01)
  • $\nu$ = degrees of freedom (e.g., n − 1)

22
Multiples of the SE required for a given
confidence level and number of degrees of freedom
Note: if you know $\sigma$, you can use Z instead of t
23
$\alpha$ for one-sided, $\alpha/2$ for two-sided confidence
interval
24
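If $\sigma$ unknown and n = 31 ($\nu = 30$)
  • 90% confidence interval: $\bar{x} \pm 1.697\, s/\sqrt{n}$
  • 95% confidence interval: $\bar{x} \pm 2.042\, s/\sqrt{n}$
  • 99% confidence interval: $\bar{x} \pm 2.750\, s/\sqrt{n}$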

25
Example
  • Collect income data for 31 households

TINV(0.05,30) = 2.042
  • 95% confidence interval for population mean (mean
    household income)
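
The calculation for this kind of example can be sketched in a few lines of Python. The income figures below are hypothetical (31 made-up values in thousands of dollars), not the data from the slides; only the multiple 2.042 = TINV(0.05,30) comes from the presentation.

```python
import math
import statistics


def t_confidence_interval(data, t_multiple):
    """Two-sided interval: sample mean +/- t_multiple * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)          # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)               # estimated standard error of the mean
    return xbar - t_multiple * se, xbar + t_multiple * se


# Hypothetical incomes for 31 households, in thousands of dollars.
incomes = [30 + i for i in range(31)]
T_95_DF30 = 2.042                       # TINV(0.05, 30), from the slides

low, high = t_confidence_interval(incomes, T_95_DF30)
print(round(low, 2), round(high, 2))
```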

26
Assumptions
  • Sample is a simple random sample
  • Population distribution is normal
  • If $\sigma$ is known, this assumption is not
    needed; use Z rather than t
  • Confidence intervals based on t distribution are
    robust to violations of normality, particularly
    for n > 30
  • Intervals could be too narrow for highly
    asymmetrical distributions with small n

27
Confidence Interval for a Total
  • If $T = N\bar{x}$, then $\mathrm{SE}_T = N \cdot \mathrm{SE}_{\bar{x}}$

So if we want a confidence interval for the total
income of the city, just multiply the average
household income (and its standard error) by the
number of households, N.
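
As a small illustration of this scaling, the sketch below multiplies a mean and its standard error by N and builds the interval for the total. All the numbers in it (mean, SE, number of households) are hypothetical; only the multiple 2.042 is taken from the earlier slide.

```python
def total_interval(mean, se, n_households, multiple):
    """Confidence interval for a total T = N * mean, using SE_T = N * SE."""
    total = n_households * mean
    se_total = n_households * se
    return total - multiple * se_total, total + multiple * se_total


# Hypothetical inputs: mean income 45.0 (thousand dollars), SE 1.633,
# 10,000 households in the city, t multiple 2.042 (30 degrees of freedom).
low, high = total_interval(45.0, 1.633, 10_000, 2.042)
print(round(low), round(high))
```

The interval for the total is simply N times the interval for the mean, so its relative width is unchanged.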
28
One-Sided Confidence Intervals
  • Previous examples were two-sided confidence
    intervals: we were concerned with establishing
    both lower and upper limits for the mean
  • In some cases we are only interested in the upper
    limit (e.g., EPA or OSHA regulations limiting
    exposure to a chemical)
  • In other cases we are interested only in the
    lower limit (e.g., specifications for the minimum
    reliability of a nuclear reactor component)

29
One-sided Confidence Intervals
  • In these cases, we use one-sided intervals that
    establish a given level of confidence that the
    value is below or above a certain level
  • (pop. mean) < (sample mean) + (multiple)(SE)
  • (pop. mean) > (sample mean) − (multiple)(SE)
  • In this case $\alpha = 1 - \mathrm{CL}$ should be
    the area under one tail, and (multiple) =
    $Z_{\alpha}$ or $t_{\alpha,\nu}$

30
Example: Radon Concentrations
  • Three measurements: 2.5, 3.0, 3.5 pCi/L
  • Does mean concentration exceed EPA limit of 4.0
    pCi/L? Construct a one-sided 95% CI
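
One way to work this example is sketched below. The three measurements are from the slide; the multiple 2.920 is the standard t-table value for a one-tail area of 0.05 with 2 degrees of freedom (TINV(0.10,2) in Excel), which is an addition here since the transcript does not show it.

```python
import math
import statistics

measurements = [2.5, 3.0, 3.5]      # pCi/L, from the slide
T_ONE_SIDED_95_DF2 = 2.920          # assumed from a standard t table (alpha = 0.05, nu = 2)


def upper_limit(data, t_multiple):
    """One-sided upper confidence limit: sample mean + t_multiple * s / sqrt(n)."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)
    return statistics.fmean(data) + t_multiple * se


limit = upper_limit(measurements, T_ONE_SIDED_95_DF2)
print(round(limit, 2))              # 3.84
```

Under these assumptions (random sample, roughly normal population), the 95% upper limit of about 3.84 pCi/L falls below the 4.0 pCi/L limit, although three measurements is a very small sample.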

31
$\alpha$ for one-sided, $\alpha/2$ for two-sided confidence
interval
32
Difference of Means
  • If $x_1$ and $x_2$ are independent random
    variables, and if $y = x_1 - x_2$, then
    $\mu_y = \mu_1 - \mu_2$ and
    $\sigma_y = \sqrt{\sigma_1^2 + \sigma_2^2}$
  • Sample means are independent random variables
    (assuming samples are drawn randomly), so these
    rules apply to the difference of sample means
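
A quick simulation can make the rule for independent variables concrete. In this sketch the two populations are hypothetical (means 10 and 6, standard deviations 3 and 4), chosen so that the standard deviation of the difference should come out near √(3² + 4²) = 5.

```python
import random
import statistics


def difference_stats(mu1=10.0, sigma1=3.0, mu2=6.0, sigma2=4.0, draws=20000, seed=0):
    """Simulate y = x1 - x2 for independent normal x1, x2; return (mean, sd) of y."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    diffs = [rng.gauss(mu1, sigma1) - rng.gauss(mu2, sigma2) for _ in range(draws)]
    return statistics.fmean(diffs), statistics.pstdev(diffs)


mean_y, sd_y = difference_stats()
# mean_y should be near 10 - 6 = 4 and sd_y near 5, not 3 + 4 = 7 or 4 - 3 = 1.
print(round(mean_y, 2), round(sd_y, 2))
```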

33
Why square root of sum of squares?
  • Independence can be represented graphically by
    perpendicular lines or shapes (knowledge of one
    gives no information about the other)

34
CI for Difference Between Means
  • Let $\bar{x}_1$ and $\bar{x}_2$ be the means of
    two samples of size $n_1$ and $n_2$ (e.g.,
    average household income in August and
    September).
  • The difference between sample means,
    $\bar{x}_1 - \bar{x}_2$, is a random variable
    with a mean of $(\mu_1 - \mu_2)$ and a standard
    deviation of $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$

35
CI for Difference Between Means
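  • What if we don't know $\sigma_1$ or $\sigma_2$?
    Two solutions.
  • 1. Assume $\sigma_1 = \sigma_2$, estimated by
    $s_p$ (pooled standard dev.):
    $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$,
    $\mathrm{SE} = s_p \sqrt{1/n_1 + 1/n_2}$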

36
CI for Difference Between Means
  • 2. Assume $\sigma_1 \neq \sigma_2$:
    $\mathrm{SE} = \sqrt{s_1^2/n_1 + s_2^2/n_2}$

In both cases, use t with $\nu = n_1 + n_2 - 2$. Excel
(Data Analysis) can do either. Which to use? If
$s_1$, $s_2$ are different and $n_1$ or $n_2$ is small
(< 30) and there is no reason to believe
$\sigma_1 \neq \sigma_2$, then use method 1.
Otherwise use whichever is most convenient.
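
The two methods can be compared side by side with a short sketch. The scores below are hypothetical (12 and 20 made-up values, so that ν = n1 + n2 − 2 = 30 and the slide value 2.042 applies); the standard-error formulas are the usual pooled and unpooled ones, assumed here because the transcript does not show the formulas from the slides.

```python
import math
import statistics


def pooled_se(a, b):
    """Method 1: SE of the difference using a pooled standard deviation."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


def unpooled_se(a, b):
    """Method 2: SE of the difference with separate sample variances."""
    return math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))


def difference_ci(a, b, t_multiple, pooled=True):
    """Two-sided CI for (mean of a) - (mean of b)."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = pooled_se(a, b) if pooled else unpooled_se(a, b)
    return diff - t_multiple * se, diff + t_multiple * se


# Hypothetical exam scores for two groups (n1 = 12, n2 = 20, so nu = 30).
group1 = [70 + i for i in range(12)]
group2 = [60 + 2 * i for i in range(20)]
T_95_DF30 = 2.042                       # TINV(0.05, 30), from the slides

print(difference_ci(group1, group2, T_95_DF30, pooled=True))
print(difference_ci(group1, group2, T_95_DF30, pooled=False))
```

With equal sample sizes the two standard errors coincide; they differ here because the smaller group also has the smaller spread.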
37
Final Exam Scores in PUAF 610, 1994-2000, by
Gender
  • Why is this considered a sample, rather than a
    population?

38
Final Exam Scores: Method 2
39
Final Exam Scores: Method 1
40
Confidence Intervals for Paired Samples
  • Sometimes we'd like to compare two samples in
    which each member of one sample is naturally
    paired with a member of the other sample
  • employment status or income of individuals in the
    CPS in consecutive months
  • blood pressure or blood count of an individual
    before and after a treatment
  • IQ or health status of identical twins
  • test score before and after a Kaplan review course

41
Confidence Intervals for Paired Samples
  • Let $x_1$ = value for a member of the first
    sample,
  • $x_2$ = corresponding value in the second sample
  • Define a new variable
  • $y = x_1 - x_2$
  • Calculate the sample mean and sample standard
    deviation of y
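
The paired procedure amounts to a one-sample interval on the differences. The sketch below uses hypothetical before/after values for six pairs (not data from the slides) and a 90% two-sided interval, so that the multiple is the $t_{0.05,5}$ = 2.015 value quoted earlier in the presentation.

```python
import math
import statistics


def paired_ci(first, second, t_multiple):
    """Two-sided CI for the mean of y = x1 - x2 over matched pairs."""
    diffs = [x1 - x2 for x1, x2 in zip(first, second)]
    n = len(diffs)
    ybar = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # SE of the mean difference
    return ybar - t_multiple * se, ybar + t_multiple * se


# Hypothetical measurements on the same six units under two conditions.
before = [1.2, 0.9, 1.5, 1.1, 1.3, 1.0]
after = [1.0, 0.8, 1.6, 0.9, 1.1, 0.9]
T_90_DF5 = 2.015                                  # 0.05 in each tail, nu = n - 1 = 5

low, high = paired_ci(before, after, T_90_DF5)
print(round(low, 3), round(high, 3))
```

Because each difference removes the unit-to-unit variation, the interval for the mean difference is typically narrower than one built from two independent samples.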

42
Example: Gasoline Substitute
  • Before a new fuel can be sold, the Clean Air Act
    requires that the producer demonstrate that it
    will not increase emissions of air pollutants.
  • Petrocoal.xls contains data for NOx emissions for
    16 different cars driven with gasoline and with
    Petrocoal (gasoline mixed with methanol derived
    from coal).
  • Because the same car is used with both fuels, a
    matched pair analysis is possible. This is more
    sensitive, because much of the variability
    associated with car type is eliminated.