Title: Comparing Populations
Comparing proportions
- Situation
  - We have two populations (1 and 2).
  - Let $p_1$ denote the probability (proportion) of success in population 1.
  - Let $p_2$ denote the probability (proportion) of success in population 2.
  - The objective is to compare the two population proportions.
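We want to test either
$H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$
or
$H_0: p_1 = p_2$ vs $H_A: p_1 > p_2$
or
$H_0: p_1 = p_2$ vs $H_A: p_1 < p_2$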
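The test statistic
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
where
$\hat{p}_1 = \dfrac{x_1}{n_1}$, $\hat{p}_2 = \dfrac{x_2}{n_2}$ and $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$.
A sample of $n_1$ is selected from population 1, resulting in $x_1$ successes.
A sample of $n_2$ is selected from population 2, resulting in $x_2$ successes.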
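A minimal Python sketch of this test, assuming the pooled form of the statistic shown above and using only the standard library (`statistics.NormalDist` supplies the standard normal critical points). The function names `two_proportion_z` and `reject_h0`, and the counts in the usage line, are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: p1 = p2, using the pooled proportion p-hat."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p under H0
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


def reject_h0(z: float, alpha: float, alternative: str) -> bool:
    """Compare z with standard normal critical points.

    alternative: "two-sided" (p1 != p2), "greater" (p1 > p2) or "less" (p1 < p2).
    """
    std = NormalDist()
    if alternative == "two-sided":
        return abs(z) > std.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return z > std.inv_cdf(1 - alpha)
    if alternative == "less":
        return z < -std.inv_cdf(1 - alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical counts, for illustration only
z = two_proportion_z(x1=45, n1=200, x2=30, n2=200)
print(round(z, 3), reject_h0(z, alpha=0.05, alternative="greater"))
```

With these made-up counts $z$ is about 1.92, so $H_0$ is rejected against $H_A: p_1 > p_2$ at $\alpha = 0.05$ (critical point about 1.645) but not against the two-sided alternative (critical point about 1.96).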
Estimating a difference in proportions using confidence intervals
- Situation
  - We have two populations (1 and 2).
  - Let $p_1$ denote the probability (proportion) of success in population 1.
  - Let $p_2$ denote the probability (proportion) of success in population 2.
  - The objective is to estimate the difference in the two population proportions, $\delta = p_1 - p_2$.
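Confidence interval for $\delta = p_1 - p_2$, $100P\% = 100(1-\alpha)\%$:
$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$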
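A minimal Python sketch of this interval, assuming the large-sample normal approximation with each group's own $\hat{p}$ in the standard error (no pooling). The function name `diff_proportion_ci` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample 100(1 - alpha)% interval for delta = p1 - p2."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Unpooled standard error: each group keeps its own p-hat
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Hypothetical counts, for illustration only
lower, upper = diff_proportion_ci(45, 200, 30, 200, confidence=0.95)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

For an estimate of $\delta = p_2 - p_1$, as in the pipe-smoker example, pass the population 2 counts first.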
Example
- Estimating the increase in the mortality rate for pipe smokers over that for non-smokers, $\delta = p_2 - p_1$.
Comparing Means
- Situation
  - We have two normal populations (1 and 2).
  - Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
  - Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
  - Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from normal population 1.
  - Let $y_1, y_2, y_3, \ldots, y_m$ denote a sample from normal population 2.
  - The objective is to compare the two population means.
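We want to test either
$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$
or
$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 > \mu_2$
or
$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 < \mu_2$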
Consider the test statistic
$z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}}$
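If $H_0: \mu_1 = \mu_2$ is true, then
- $z$ will have a standard normal distribution.
- This will also be approximately true for
  $z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$
  (obtained by replacing $\sigma_1$ by $s_x$ and $\sigma_2$ by $s_y$) if the sample sizes $n$ and $m$ are large (greater than 30).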
Note
Example
- A study was interested in determining if an exercise program had some effect on the reduction of blood pressure in subjects with abnormally high blood pressure.
- For this purpose, a sample of $n = 500$ patients with abnormally high blood pressure were required to adhere to the exercise regime.
- A second sample of $m = 400$ patients with abnormally high blood pressure were not required to adhere to the exercise regime.
- After a period of one year, the reduction in blood pressure was measured for each patient in the study.
We want to test
$H_0: \mu_1 \le \mu_2$: the exercise group did not have a higher average reduction in blood pressure
vs
$H_A: \mu_1 > \mu_2$: the exercise group did have a higher average reduction in blood pressure
The test statistic
$z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$
Suppose the data have been collected and summarized.
The test statistic
We reject $H_0$ if $z > z_{\alpha} = z_{0.05} = 1.645$.
This is true, hence we reject $H_0$.
Conclusion: there is a significant ($\alpha = 0.05$) effect due to the exercise regime on the reduction in blood pressure.
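A minimal Python sketch of the large-sample test used in this example. The sample sizes $n = 500$ and $m = 400$ come from the example; the means and standard deviations below are hypothetical stand-ins, since the study's summary data are not reproduced here. The function name `two_sample_z` is also an assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, sx, n, ybar, sy, m):
    """Large-sample z statistic for H0: mu1 = mu2, with s_x, s_y replacing sigma_1, sigma_2."""
    return (xbar - ybar) / sqrt(sx**2 / n + sy**2 / m)


alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical point z_0.05 (about 1.645)

# n = 500 and m = 400 as in the example; the means and SDs are made up
z = two_sample_z(xbar=10.0, sx=8.0, n=500, ybar=8.0, sy=9.0, m=400)
print(f"z = {z:.3f}; reject H0: {z > z_crit}")
```

Because $H_A: \mu_1 > \mu_2$ is one-sided, the whole of $\alpha$ sits in the upper tail, which is why the sketch uses `inv_cdf(1 - alpha)` rather than `inv_cdf(1 - alpha / 2)`.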
Estimating a difference in means using confidence intervals
- Situation
  - We have two populations (1 and 2).
  - Let $\mu_1$ denote the mean of population 1.
  - Let $\mu_2$ denote the mean of population 2.
  - The objective is to estimate the difference in the two population means, $\delta = \mu_1 - \mu_2$.
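Confidence interval for $\delta = \mu_1 - \mu_2$, $100P\% = 100(1-\alpha)\%$:
$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}$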
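A minimal Python sketch of this large-sample interval. The function name `diff_means_ci` is hypothetical, and the summary statistics are made up (only $n = 500$ and $m = 400$ echo the blood-pressure example).

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(xbar, sx, n, ybar, sy, m, confidence=0.95):
    """Large-sample 100(1 - alpha)% interval for delta = mu1 - mu2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    half_width = z * sqrt(sx**2 / n + sy**2 / m)
    d = xbar - ybar
    return d - half_width, d + half_width


# Hypothetical summaries; n = 500 and m = 400 as in the blood-pressure example
lower, upper = diff_means_ci(10.0, 8.0, 500, 8.0, 9.0, 400)
print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```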
Example
- Estimating the increase in the average reduction in blood pressure due to the exercise regime, $\delta = \mu_1 - \mu_2$.
Sample size determination
- When comparing two or more populations
Estimating a difference in proportions using confidence intervals
- Situation
  - We have two populations (1 and 2).
  - Let $p_1$ denote the probability (proportion) of success in population 1.
  - Let $p_2$ denote the probability (proportion) of success in population 2.
  - The objective is to estimate the difference in the two population proportions, $\delta = p_1 - p_2$.
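Confidence interval for $\delta = p_1 - p_2$, $100P\% = 100(1-\alpha)\%$:
$\hat{p}_1 - \hat{p}_2 \pm B$
where
$B = z_{\alpha/2}\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$
Note: $B$ is determined by
- the sample sizes $n_1$ and $n_2$,
- the level of confidence $1 - \alpha$,
- the probability of success in both populations, $p_1$ and $p_2$.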
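Note: if $B$, $\alpha$, $p_1$ and $p_2$ are given, then
$B^2 = z_{\alpha/2}^2\left[\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}\right]$
and
$\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2} = \dfrac{B^2}{z_{\alpha/2}^2}$
Note: there are many solutions for $n_1$ and $n_2$.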
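Special solutions - case 1: $n_1 = n_2 = n$.
Then
$B^2 = \dfrac{z_{\alpha/2}^2\left[p_1(1-p_1) + p_2(1-p_2)\right]}{n}$
and
$n = \dfrac{z_{\alpha/2}^2\left[p_1(1-p_1) + p_2(1-p_2)\right]}{B^2}$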
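A minimal Python sketch of the case 1 formula. Sample sizes must be whole numbers, so the sketch rounds up; it also takes $z_{\alpha/2}$ from `statistics.NormalDist` rather than a table. The function name `equal_sample_size` and the planning values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def equal_sample_size(B, p1, p2, confidence=0.95):
    """Common size n = n1 = n2 giving error bound B for p1 - p2 (rounded up)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    n = z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / B**2
    return ceil(n)


# Hypothetical planning values: p1 = p2 = 0.5 is the most conservative guess,
# since p(1 - p) is largest at p = 0.5
print(equal_sample_size(B=0.05, p1=0.5, p2=0.5, confidence=0.95))
```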
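Special solutions - case 2: choose $n_1$ and $n_2$ to minimize $N = n_1 + n_2$, the total sample size.
Note: with $K = B^2/z_{\alpha/2}^2$, the constraint $\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2} = K$ gives
$n_2 = \dfrac{p_2(1-p_2)\,n_1}{K n_1 - p_1(1-p_1)}$
hence
$\dfrac{dN}{dn_1} = 1 - \dfrac{p_1(1-p_1)\,p_2(1-p_2)}{\left[K n_1 - p_1(1-p_1)\right]^2} = 0$
if
$K n_1 - p_1(1-p_1) = \sqrt{p_1(1-p_1)\,p_2(1-p_2)}$
or
$n_1 = \dfrac{z_{\alpha/2}^2}{B^2}\sqrt{p_1(1-p_1)}\left[\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}\right]$
Also
$n_2 = \dfrac{z_{\alpha/2}^2}{B^2}\sqrt{p_2(1-p_2)}\left[\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}\right]$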
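Summary: the sample sizes required, $n_1$ and $n_2$, to estimate $p_1 - p_2$ within an error bound $B$ with level of confidence $1 - \alpha$ are
$n_1 = \dfrac{z_{\alpha/2}^2}{B^2}\sqrt{p_1(1-p_1)}\left[\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}\right]$
$n_2 = \dfrac{z_{\alpha/2}^2}{B^2}\sqrt{p_2(1-p_2)}\left[\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}\right]$
if the objective is to minimize the total sample size $N = n_1 + n_2$.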
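A minimal Python sketch of the minimum-total-sample-size allocation, where each $n_i$ is proportional to $\sqrt{p_i(1-p_i)}$. The function name `min_total_sizes`, the rounding up, and the planning values are assumptions for illustration; `z` is passed in so a tabled value such as 1.96 can be used.

```python
from math import ceil, sqrt


def min_total_sizes(B, p1, p2, z):
    """n1, n2 minimizing N = n1 + n2 subject to the error bound B (rounded up)."""
    k = z**2 / B**2
    r1 = sqrt(p1 * (1 - p1))  # sqrt(p1(1 - p1))
    r2 = sqrt(p2 * (1 - p2))  # sqrt(p2(1 - p2))
    return ceil(k * r1 * (r1 + r2)), ceil(k * r2 * (r1 + r2))


# Hypothetical planning values: B = 0.05, 95% confidence (z = 1.96)
print(min_total_sizes(0.05, 0.1, 0.5, z=1.96))
```

With these values the allocation is 369 and 615 (984 in total), whereas equal sample sizes would need 523 per group (1046 in total). When $p_1 = p_2$ the two allocations coincide.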
Special solutions - case 3: choose $n_1$ and $n_2$ to minimize $C = C_0 + c_1 n_1 + c_2 n_2$, the total cost of the study.
Note:
- $C_0$ = fixed (set-up) costs
- $c_1$ = cost per unit in population 1
- $c_2$ = cost per unit in population 2
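hence, with $K = B^2/z_{\alpha/2}^2$ and $n_2 = \dfrac{p_2(1-p_2)\,n_1}{K n_1 - p_1(1-p_1)}$ from the constraint $\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2} = K$,
$\dfrac{dC}{dn_1} = c_1 - \dfrac{c_2\,p_1(1-p_1)\,p_2(1-p_2)}{\left[K n_1 - p_1(1-p_1)\right]^2} = 0$
if
$K n_1 - p_1(1-p_1) = \sqrt{\dfrac{c_2}{c_1}\,p_1(1-p_1)\,p_2(1-p_2)}$
or
$n_1 = \dfrac{z_{\alpha/2}^2}{B^2}\sqrt{\dfrac{p_1(1-p_1)}{c_1}}\left[\sqrt{c_1 p_1(1-p_1)} + \sqrt{c_2 p_2(1-p_2)}\right]$
Also
$n_2 = \dfrac{z_{\alpha/2}^2}{B^2}\sqrt{\dfrac{p_2(1-p_2)}{c_2}}\left[\sqrt{c_1 p_1(1-p_1)} + \sqrt{c_2 p_2(1-p_2)}\right]$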
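Summary: the sample sizes required, $n_1$ and $n_2$, to estimate $p_1 - p_2$ within an error bound $B$ with level of confidence $1 - \alpha$ are
$n_1 = \dfrac{z_{\alpha/2}^2}{B^2}\sqrt{\dfrac{p_1(1-p_1)}{c_1}}\left[\sqrt{c_1 p_1(1-p_1)} + \sqrt{c_2 p_2(1-p_2)}\right]$
$n_2 = \dfrac{z_{\alpha/2}^2}{B^2}\sqrt{\dfrac{p_2(1-p_2)}{c_2}}\left[\sqrt{c_1 p_1(1-p_1)} + \sqrt{c_2 p_2(1-p_2)}\right]$
if the objective is to minimize the total cost $C = C_0 + c_1 n_1 + c_2 n_2$.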
Example: It is known that approximately 4% of individuals aged 70-80 with high cholesterol suffer a heart attack or stroke within a 10-year period. One is interested in determining if this rate is decreased for individuals who receive a new medication.
A study is proposed in which $n_1$ individuals will receive the new medication while $n_2$ will receive a placebo in a double-blind study. (In a double-blind study, both the patient and the physician administering the treatment are unaware of the treatment, drug or placebo.)
What should the sample sizes be in each group if we want to estimate the difference in the rate of heart attack or stroke within 0.5% with a 99% level of confidence and minimize the total cost $C = C_0 + c_1 n_1 + c_2 n_2$? Assume that the cost for the medication is 100 times the cost of administering a placebo.
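The sample sizes required are
$n_1 = \dfrac{z_{\alpha/2}^2}{B^2}\sqrt{\dfrac{p_1(1-p_1)}{c_1}}\left[\sqrt{c_1 p_1(1-p_1)} + \sqrt{c_2 p_2(1-p_2)}\right]$
$n_2 = \dfrac{z_{\alpha/2}^2}{B^2}\sqrt{\dfrac{p_2(1-p_2)}{c_2}}\left[\sqrt{c_1 p_1(1-p_1)} + \sqrt{c_2 p_2(1-p_2)}\right]$
where $z_{\alpha/2} = z_{0.005} = 2.576$, $B = 0.005$, $p_1 \approx p_2 \approx 0.04$ and $c_1/c_2 = 100$.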
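hence
$n_1 = \dfrac{2.576^2}{0.005^2}(0.04)(0.96)\left(1 + \dfrac{1}{10}\right) \approx 11{,}212$
and
$n_2 = \dfrac{2.576^2}{0.005^2}(0.04)(0.96)(10 + 1) \approx 112{,}119$
(rounding up).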
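A minimal Python sketch of the minimum-cost allocation, applied to the heart attack / stroke example. Only the ratio $c_1/c_2 = 100$ is given, so the sketch sets $c_2 = 1$ and $c_1 = 100$; the fixed cost $C_0$ does not affect the optimum. The function name `min_cost_sizes` and the rounding up are assumptions.

```python
from math import ceil, sqrt


def min_cost_sizes(B, p1, p2, c1, c2, z):
    """n1, n2 minimizing C = C0 + c1*n1 + c2*n2 subject to the error bound B.

    C0 does not affect the optimum, and only the ratio c1/c2 matters.
    """
    k = z**2 / B**2
    v1, v2 = p1 * (1 - p1), p2 * (1 - p2)
    s = sqrt(c1 * v1) + sqrt(c2 * v2)
    return ceil(k * sqrt(v1 / c1) * s), ceil(k * sqrt(v2 / c2) * s)


# Example values: z_{0.005} = 2.576, B = 0.005, p1 = p2 = 0.04 (approximate),
# medication (group 1) costs 100 times as much per subject as placebo (group 2)
n1, n2 = min_cost_sizes(B=0.005, p1=0.04, p2=0.04, c1=100, c2=1, z=2.576)
print(n1, n2)
```

The expensive medication group gets about one tenth as many subjects as the placebo group, since $n_1/n_2 = \sqrt{c_2/c_1} = 1/10$ when $p_1 = p_2$.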
Estimating a difference in means using confidence intervals
- Situation
  - We have two populations (1 and 2).
  - Let $\mu_1$ denote the mean of population 1.
  - Let $\mu_2$ denote the mean of population 2.
  - The objective is to estimate the difference in the two population means, $\delta = \mu_1 - \mu_2$.
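Confidence interval for $\delta = \mu_1 - \mu_2$, $100P\% = 100(1-\alpha)\%$:
$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}$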
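The sample sizes required, $n_1$ and $n_2$, to estimate $\mu_1 - \mu_2$ within an error bound $B$ with level of confidence $1 - \alpha$ are:
Equal sample sizes:
$n_1 = n_2 = \dfrac{z_{\alpha/2}^2\left(\sigma_1^2 + \sigma_2^2\right)}{B^2}$
Minimizing the total sample size $N = n_1 + n_2$:
$n_1 = \dfrac{z_{\alpha/2}^2\,\sigma_1(\sigma_1 + \sigma_2)}{B^2}$, $n_2 = \dfrac{z_{\alpha/2}^2\,\sigma_2(\sigma_1 + \sigma_2)}{B^2}$
Minimizing the total cost $C = C_0 + c_1 n_1 + c_2 n_2$:
$n_1 = \dfrac{z_{\alpha/2}^2\,\sigma_1\left(\sigma_1\sqrt{c_1} + \sigma_2\sqrt{c_2}\right)}{B^2\sqrt{c_1}}$, $n_2 = \dfrac{z_{\alpha/2}^2\,\sigma_2\left(\sigma_1\sqrt{c_1} + \sigma_2\sqrt{c_2}\right)}{B^2\sqrt{c_2}}$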
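A minimal Python sketch of the three allocations for comparing means, with results rounded up to whole subjects. The function names, the planning values for $\sigma_1$ and $\sigma_2$, and the cost ratio are hypothetical; `z` is passed in so a tabled value can be used.

```python
from math import ceil, sqrt


def n_equal(B, s1, s2, z):
    """Equal sample sizes n1 = n2 = n."""
    return ceil(z**2 * (s1**2 + s2**2) / B**2)


def n_min_total(B, s1, s2, z):
    """Minimize N = n1 + n2: each n_i is proportional to sigma_i."""
    k = z**2 / B**2
    return ceil(k * s1 * (s1 + s2)), ceil(k * s2 * (s1 + s2))


def n_min_cost(B, s1, s2, c1, c2, z):
    """Minimize C = C0 + c1*n1 + c2*n2: each n_i is proportional to sigma_i / sqrt(c_i)."""
    k = z**2 / B**2
    t = s1 * sqrt(c1) + s2 * sqrt(c2)
    return ceil(k * s1 / sqrt(c1) * t), ceil(k * s2 / sqrt(c2) * t)


# Hypothetical planning values: B = 1, sigma1 = 8, sigma2 = 9, z = 1.96,
# and unit cost 4 times higher in population 1 for the cost version
print(n_equal(1, 8, 9, 1.96), n_min_total(1, 8, 9, 1.96), n_min_cost(1, 8, 9, 4, 1, 1.96))
```

With equal unit costs ($c_1 = c_2$) the minimum-cost allocation reduces to the minimum-total-size allocation.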
Comparing Means (small samples)
- Situation
  - We have two normal populations (1 and 2).
  - Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
  - Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
  - Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from normal population 1.
  - Let $y_1, y_2, y_3, \ldots, y_m$ denote a sample from normal population 2.
  - The objective is to compare the two population means.
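We want to test either
$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$
or
$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 > \mu_2$
or
$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 < \mu_2$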
Consider the test statistic
$z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$
If the sample sizes ($m$ and $n$) are large, the statistic $z$ will have approximately a standard normal distribution when $\mu_1 = \mu_2$.
This will not be the case if the sample sizes ($m$ and $n$) are small.
The t test for comparing means (small samples)
- Situation
  - We have two normal populations (1 and 2).
  - Let $\mu_1$ and $\sigma$ denote the mean and standard deviation of population 1.
  - Let $\mu_2$ and $\sigma$ denote the mean and standard deviation of population 2.
  - Note: we assume that the standard deviation for each population is the same, $\sigma_1 = \sigma_2 = \sigma$.
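Let
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$, $s_x = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
$\bar{y} = \dfrac{1}{m}\sum_{i=1}^{m} y_i$, $s_y = \sqrt{\dfrac{\sum_{i=1}^{m}(y_i - \bar{y})^2}{m-1}}$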
The pooled estimate of $\sigma$
Note: both $s_x$ and $s_y$ are estimators of $\sigma$. These can be combined to form a single estimator of $\sigma$, $s_{Pooled}$:
$s_{Pooled} = \sqrt{\dfrac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}}$
The test statistic
$t = \dfrac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$
If $\mu_1 = \mu_2$, this statistic has a t distribution with $n + m - 2$ degrees of freedom.
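Critical regions:
- $H_A: \mu_1 \ne \mu_2$: reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
- $H_A: \mu_1 > \mu_2$: reject $H_0$ if $t > t_{\alpha}$
- $H_A: \mu_1 < \mu_2$: reject $H_0$ if $t < -t_{\alpha}$
where $t_{\alpha/2}$ and $t_{\alpha}$ are critical points under the t distribution with $n + m - 2$ degrees of freedom.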
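A minimal Python sketch of the pooled t statistic, applied to hypothetical data shaped like the tumour example ($n = 3$ treated, $m = 6$ untreated). The Python standard library has no t quantile function, so the sketch hard-codes $t_{0.05} = 1.895$ for 7 degrees of freedom from standard t tables; with SciPy installed, `scipy.stats.t.ppf(0.95, 7)` would give the same critical point. The function name `pooled_t` and the data values are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_t(x, y):
    """Pooled two-sample t statistic for H0: mu1 = mu2, assuming sigma1 = sigma2."""
    n, m = len(x), len(y)
    sx, sy = stdev(x), stdev(y)  # sample SDs (divisors n - 1 and m - 1)
    s_pooled = sqrt(((n - 1) * sx**2 + (m - 1) * sy**2) / (n + m - 2))
    t = (mean(x) - mean(y)) / (s_pooled * sqrt(1 / n + 1 / m))
    return t, n + m - 2


# Hypothetical final sizes: x = treated (n = 3), y = untreated (m = 6)
t, df = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
T_CRIT_05_DF7 = 1.895  # t_{0.05} with 7 d.f., from standard t tables
print(f"t = {t:.3f} on {df} d.f.; reject H0 (HA: mu1 < mu2): {t < -T_CRIT_05_DF7}")
```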
Example
- A study was interested in determining if administration of a drug reduces cancerous tumour size.
- For this purpose, $n + m = 9$ test animals are implanted with a cancerous tumour.
- $n = 3$ are selected at random and administered the drug.
- The remaining $m = 6$ are left untreated.
- Final tumour sizes are measured at the end of the test period.
We want to test
$H_0: \mu_1 \ge \mu_2$: the treated group did not have a lower average final tumour size
vs
$H_A: \mu_1 < \mu_2$: the treated group did have a lower average final tumour size
The test statistic
$t = \dfrac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$
Suppose the data have been collected and summarized.
The test statistic
We reject $H_0$ if $t < -t_{0.05} = -1.895$, with d.f. $= n + m - 2 = 7$.
This is not the case, hence we accept $H_0$.
Conclusion: the drug treatment does not result in a significantly ($\alpha = 0.05$) smaller final tumour size.