Title: Comparing two populations
1 Comparing two populations
- Sometimes we want to compare two populations
rather than make decisions about a single
population.
- For example, we might want to compare two
population means or two population proportions to
see if they are equal.
- Is the expected drying time for one type of paint
lower than that of another type of paint?
- Is the proportion of Republicans who favor
withdrawing from Iraq higher than the proportion
of Democrats who favor withdrawal?
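2 Comparing two population means
- Suppose we have two independent samples, $X_1, \ldots, X_m$
and $Y_1, \ldots, Y_n$, from two separate populations.
- A natural statistic for comparing the two
population means, $\mu_X$ and $\mu_Y$, is $\bar{X} - \bar{Y}$.
- $E(\bar{X} - \bar{Y}) = \mu_X - \mu_Y$
- $\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma_X^2/m + \sigma_Y^2/n$
- The distribution of $\bar{X} - \bar{Y}$ is also approximately
Normal for m and n both large.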
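3 Large samples test for comparing population means
- To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
  $$Z = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{S_X^2/m + S_Y^2/n}}$$
  which is approximately standard Normal under $H_0$.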
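As a rough illustration, the large-sample statistic divides the observed difference in sample means, minus the hypothesized difference, by its estimated standard error and compares the result with the standard Normal distribution. The sketch below assumes only summary statistics are available; the function name `two_sample_z` and the numbers passed to it are hypothetical and are not taken from the home sales data.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, s_x, s_y, m, n, delta0=0.0):
    """Large-sample z statistic for H0: mu_X - mu_Y = delta0."""
    # Estimated standard error of xbar - ybar.
    se = sqrt(s_x**2 / m + s_y**2 / n)
    z = (xbar - ybar - delta0) / se
    std_normal = NormalDist()
    return {
        "z": z,
        # One-sided p-value for Ha: mu_X - mu_Y > delta0.
        "p_upper": 1.0 - std_normal.cdf(z),
        # Two-sided p-value for Ha: mu_X - mu_Y != delta0.
        "p_two_sided": 2.0 * (1.0 - std_normal.cdf(abs(z))),
    }


# Hypothetical summary statistics (illustration only).
z_result = two_sample_z(xbar=105.0, ybar=100.0, s_x=10.0, s_y=10.0, m=50, n=50)
print(z_result)
```

Reject H0 when the p-value for the chosen alternative falls below the significance level.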
4 Home sales data
- A realtor in Albuquerque wants to argue that
houses in the Northeast are more expensive on
average than those in the rest of town. The
data below contain sale prices (in hundreds of
dollars) for homes in the city. NE = 1 indicates
a home was in the Northeast. NE = 0 indicates a
home was not in the Northeast. Test the appropriate
hypotheses with $\alpha = 0.01$.
5 Home sales data
6 Large samples confidence interval for the
difference between two population means
- A large sample $(1-\alpha)100\%$ confidence interval for
$\mu_X - \mu_Y$ is
  $$\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{S_X^2/m + S_Y^2/n}$$
- For the home sales data, what is a 99% confidence
interval for the difference between sale prices
in the Northeast and the rest of town?
- Home sales data
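A minimal sketch of the large-sample interval: the difference in sample means plus or minus a standard Normal critical value times the estimated standard error. The helper name `diff_means_z_interval` and the summary statistics are hypothetical; the home sales data are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_z_interval(xbar, ybar, s_x, s_y, m, n, conf_level=0.99):
    """Large-sample confidence interval for mu_X - mu_Y."""
    alpha = 1.0 - conf_level
    # Upper alpha/2 critical value of the standard Normal distribution.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z_crit * sqrt(s_x**2 / m + s_y**2 / n)
    center = xbar - ybar
    return center - margin, center + margin


# Hypothetical summary statistics (illustration only).
z_lower, z_upper = diff_means_z_interval(105.0, 100.0, 10.0, 10.0, 50, 50)
print(z_lower, z_upper)
```

If the interval contains 0, the data are consistent with equal population means at the matching two-sided significance level.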
7 Large sample confidence intervals
- Interpret the confidence interval.
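8 Equal population variances
- Suppose we assume that the two populations have a
common variance $\sigma^2$.
- Then $\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma^2 (1/m + 1/n)$.
- We can then estimate this common variance using
the pooled sample variance
  $$S_p^2 = \frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}$$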
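9 Small samples test for comparing population means
from Normal distributions with equal variances
- To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
  $$T = \frac{\bar{X} - \bar{Y} - \Delta_0}{S_p\sqrt{1/m + 1/n}}$$
  which has a t distribution with $m + n - 2$ degrees of freedom under $H_0$.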
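The pooled procedure can be sketched as follows: pool the two sample variances, weighting each by its degrees of freedom, then divide the difference in sample means (minus the hypothesized difference) by the pooled standard error. The function name `pooled_t` and the two small samples are hypothetical and are not the THC data. The Python standard library has no t distribution, so the statistic would be compared with a t table or software critical value on m + n - 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y, delta0=0.0):
    """Pooled two-sample t statistic for H0: mu_X - mu_Y = delta0."""
    m, n = len(x), len(y)
    # Pooled sample variance: weighted average of the two sample variances.
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    se = sqrt(sp2 * (1.0 / m + 1.0 / n))
    t = (mean(x) - mean(y) - delta0) / se
    df = m + n - 2
    return t, df, sp2


# Hypothetical samples (illustration only).
sample_x = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_y = [2.0, 4.0, 6.0, 8.0, 10.0]
t_pooled, df_pooled, sp2 = pooled_t(sample_x, sample_y)
print(t_pooled, df_pooled, sp2)
```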
10 THC example with equal variances
- The active component in marijuana is THC. An
experiment was conducted to compare two slightly
different configurations of this substance. The
THC data set contains the time until the effect
was perceived for 6 subjects exposed to each
configuration. Is there any evidence that the
mean time to perception is different between the
two configurations using $\alpha = 0.01$?
11 Small samples confidence interval for the
difference between two population means
- Assuming equal variances, a small sample
$(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
  $$\bar{X} - \bar{Y} \pm t_{\alpha/2,\, m+n-2}\, S_p\sqrt{1/m + 1/n}$$
- For the THC data, what is a 99% confidence
interval for the mean difference between the
detection times for the two configurations?
- THC data set
12 Unequal population variances
- The pooled procedures we have discussed
previously are fairly robust to violations of the
equal variance assumption.
- In other words, if the two population variances
are relatively close, the procedures perform
well:
- The level of significance for the hypothesis test
is close to what it should be.
- The coverage probability for the confidence
interval is close to what it should be.
- If the variances are quite different, then we
need a different procedure.
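13 Small samples test for comparing population means
from Normal distributions with unequal variances
- To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
  $$T = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{S_X^2/m + S_Y^2/n}}$$
- with degrees of freedom
  $$\nu = \frac{\left(S_X^2/m + S_Y^2/n\right)^2}{\dfrac{(S_X^2/m)^2}{m-1} + \dfrac{(S_Y^2/n)^2}{n-1}}$$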
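A sketch of the unequal-variance statistic and its approximate degrees of freedom, using the standard Welch-Satterthwaite approximation. The function name `welch_t` and the samples are hypothetical, not the THC data; a critical value on the computed (usually non-integer) degrees of freedom would come from a t table or software.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y, delta0=0.0):
    """Unequal-variance t statistic and approximate degrees of freedom."""
    m, n = len(x), len(y)
    vx = variance(x) / m  # estimated variance of the X sample mean
    vy = variance(y) / n  # estimated variance of the Y sample mean
    t = (mean(x) - mean(y) - delta0) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx**2 / (m - 1) + vy**2 / (n - 1))
    return t, df


# Hypothetical samples (illustration only).
welch_x = [1.0, 2.0, 3.0, 4.0, 5.0]
welch_y = [2.0, 4.0, 6.0, 8.0, 10.0]
t_welch, df_welch = welch_t(welch_x, welch_y)
print(t_welch, df_welch)
```

With equal sample sizes the statistic matches the pooled one; only the degrees of freedom shrink when the sample variances differ.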
14 THC example with unequal variances
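15 Small samples confidence interval for the
difference between two population means
- Assuming unequal variances, a small sample
$(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
  $$\bar{X} - \bar{Y} \pm t_{\alpha/2,\, \nu}\sqrt{S_X^2/m + S_Y^2/n}$$
  where $\nu$ is the approximate degrees of freedom of the unequal variances test.
- For the THC data, what is a 99% confidence
interval for the mean difference between the
detection times for the two configurations?
- THC data set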
16 Paired data
- Sometimes we have a third variable that connects
elements from the X and Y samples.
- In this case, the assumption of independence
between the two samples may be violated.
- Is there any evidence that the first twin and the
second twin have different average weights among
boy-boy twins?
- In this case, the twins are clearly connected by
the mother.
- It might be better to base our test on the n
pairwise differences, $D_i = X_i - Y_i$.
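17 Paired test for comparing population means
- To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
  $$T = \frac{\bar{D} - \Delta_0}{S_D/\sqrt{n}}$$
  where $\bar{D}$ and $S_D$ are the sample mean and standard deviation of the differences; under $H_0$, T has a t distribution with $n - 1$ degrees of freedom.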
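The paired test reduces to a one-sample t test on the pairwise differences: the mean difference, minus the hypothesized value, divided by the standard deviation of the differences over the square root of n. A minimal sketch, with a hypothetical function name `paired_t` and made-up paired values that are not the Twins data:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y, delta0=0.0):
    """Paired t statistic for H0: mu_X - mu_Y = delta0."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    # Work with the n pairwise differences D_i = X_i - Y_i.
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    t = (mean(d) - delta0) / (stdev(d) / sqrt(n))
    return t, n - 1


# Hypothetical paired measurements (illustration only).
first = [10.0, 12.0, 9.0, 11.0]
second = [9.0, 10.0, 9.0, 8.0]
t_paired, df_paired = paired_t(first, second)
print(t_paired, df_paired)
```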
18 Twins example
- Load the Twins data from StatCrunch sample data
sets. Is there any evidence that Twin A and Twin
B have different average weights among boy-boy
twins with $\alpha = 0.1$?
- StatCrunch
19 Paired confidence interval for the difference
between two population means
- A small sample $(1-\alpha)100\%$ confidence interval for
$\mu_X - \mu_Y$ is
  $$\bar{D} \pm t_{\alpha/2,\, n-1}\, S_D/\sqrt{n}$$
- For the twins data, what is a 90% confidence
interval for the mean difference between the twin
A and twin B weights?
- StatCrunch
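20 Comparing two population proportions
- A natural statistic for comparing the two
population proportions, $p_X$ and $p_Y$, is $\hat{p}_X - \hat{p}_Y$.
- $E(\hat{p}_X - \hat{p}_Y) = p_X - p_Y$
- $\mathrm{Var}(\hat{p}_X - \hat{p}_Y) = p_X(1-p_X)/m + p_Y(1-p_Y)/n$
- The distribution of $\hat{p}_X - \hat{p}_Y$ is also approximately Normal
for m and n both large.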
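21 Large samples test for comparing population
proportions
- To test $H_0: p_X - p_Y = 0$, use the test statistic
  $$Z = \frac{\hat{p}_X - \hat{p}_Y}{\sqrt{\hat{p}(1-\hat{p})(1/m + 1/n)}}$$
- where
  $$\hat{p} = \frac{m\hat{p}_X + n\hat{p}_Y}{m + n}$$
  is the pooled sample proportion.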
22 Polio example
- The following table summarizes a study of the
efficacy of the Salk vaccine.
- Was the vaccine effective? Test at $\alpha = 0.05$.
- StatCrunch
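A test of this kind can be sketched with the large-sample two-proportion z statistic, which pools the two samples to estimate the common proportion under H0. The function name `two_proportion_z` is hypothetical, and the counts below are made up for illustration; they are not the Salk vaccine counts, which are not reproduced in this text.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x_count, m, y_count, n):
    """Large-sample z statistic for H0: p_X - p_Y = 0."""
    p_x = x_count / m
    p_y = y_count / n
    # Pooled estimate of the common proportion under H0.
    p_pool = (x_count + y_count) / (m + n)
    se = sqrt(p_pool * (1.0 - p_pool) * (1.0 / m + 1.0 / n))
    z = (p_x - p_y) / se
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical counts (illustration only): 30 of 100 versus 20 of 100.
z_prop, p_prop = two_proportion_z(30, 100, 20, 100)
print(z_prop, p_prop)
```

For a one-sided alternative, such as a lower disease rate under vaccination, the p-value would be the corresponding single tail area rather than the two-sided value returned here.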
23 Large samples confidence interval for the
difference between two population proportions
- A large sample $(1-\alpha)100\%$ confidence interval for
$p_X - p_Y$ is
  $$\hat{p}_X - \hat{p}_Y \pm z_{\alpha/2}\sqrt{\hat{p}_X(1-\hat{p}_X)/m + \hat{p}_Y(1-\hat{p}_Y)/n}$$
- For the Polio data, what is a 95% confidence
interval for the difference between the
proportion who contract the disease under each
treatment?
- StatCrunch
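A minimal sketch of the large-sample interval for a difference in proportions. Unlike the test of equal proportions, the standard error here uses each sample proportion separately rather than a pooled estimate. The function name `diff_props_interval` and the counts are hypothetical and are not the Polio data.

```python
from math import sqrt
from statistics import NormalDist


def diff_props_interval(x_count, m, y_count, n, conf_level=0.95):
    """Large-sample confidence interval for p_X - p_Y."""
    p_x = x_count / m
    p_y = y_count / n
    alpha = 1.0 - conf_level
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Unpooled standard error: each proportion contributes its own variance.
    se = sqrt(p_x * (1.0 - p_x) / m + p_y * (1.0 - p_y) / n)
    center = p_x - p_y
    return center - z_crit * se, center + z_crit * se


# Hypothetical counts (illustration only): 30 of 100 versus 20 of 100.
prop_lower, prop_upper = diff_props_interval(30, 100, 20, 100)
print(prop_lower, prop_upper)
```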
24 Comparing two population variances
- Suppose two chemical companies can supply a raw
material, but we suspect the variability in
concentration may differ between the two.
- The standard deviation of concentration in a
random sample of 15 batches from company 1 was
found to be 4.7 g/l. A sample of 21 batches from
company 2 yielded a standard deviation of 5.8
g/l.
- Is there sufficient evidence to conclude that the
variability in concentration differs for the two
companies?
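25 Test for comparing population variances from
Normal distributions
- To test $H_0: \sigma_X^2 = \sigma_Y^2$, use the test statistic
  $$F = \frac{S_X^2}{S_Y^2}$$
  which has an F distribution with $m - 1$ and $n - 1$ degrees of freedom under $H_0$.
- F calculator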
26 Chemical example
- Is there sufficient evidence to conclude that the
variability in concentration differs for the two
companies with $\alpha = 0.05$?
- F Calculator
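The F statistic for this example can be computed directly from the summary values stated earlier (standard deviations of 4.7 g/l from 15 batches and 5.8 g/l from 21 batches). The sketch below only forms the ratio of sample variances and the two degrees of freedom; the function name `variance_ratio_f` is hypothetical, and the critical values would still come from an F calculator or table, since the Python standard library has no F distribution.

```python
def variance_ratio_f(s_x, m, s_y, n):
    """F statistic for H0: sigma_X^2 = sigma_Y^2, with its degrees of freedom."""
    f_stat = s_x**2 / s_y**2
    return f_stat, m - 1, n - 1


# Summary values from the chemical example: company 1, then company 2.
f_stat, df_num, df_den = variance_ratio_f(s_x=4.7, m=15, s_y=5.8, n=21)
print(round(f_stat, 4), df_num, df_den)
```

For the two-sided alternative, H0 is rejected at level 0.05 if the statistic falls below the lower 0.025 point or above the upper 0.025 point of the F distribution with 14 and 20 degrees of freedom.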
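27 Confidence interval for the ratio of two Normal
population variances
- A $(1-\alpha)100\%$ confidence interval for
$\sigma_X^2/\sigma_Y^2$ is
  $$\left( \frac{S_X^2/S_Y^2}{F_{\alpha/2,\, m-1,\, n-1}},\ \frac{S_X^2}{S_Y^2}\, F_{\alpha/2,\, n-1,\, m-1} \right)$$
  where $F_{\alpha/2,\, a,\, b}$ is the upper $\alpha/2$ critical value of the F distribution with a and b degrees of freedom.
- For the THC example, what is a 95% confidence
interval for the ratio of concentration
variances?
- THC data set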