Title: Chapter 9 Comparing Two Groups
1 Chapter 9: Comparing Two Groups
- Learn how to compare two groups on a categorical or
quantitative outcome using confidence intervals
and significance tests
2 Bivariate Analyses
- The outcome variable is the response variable
- The binary variable that specifies the groups is
the explanatory variable
3 Bivariate Analyses
- Statistical methods analyze how the outcome on
the response variable depends on or is explained
by the value of the explanatory variable
4 Independent Samples
- The observations in one sample are independent of
those in the other sample
- Example: Randomized experiments that randomly
allocate subjects to two treatments
- Example: An observational study that separates
subjects into groups according to their value for
an explanatory variable
5 Dependent Samples
- Data are matched pairs: each subject in one
sample is matched with a subject in the other
sample
- Example: A set of married couples, the men being
in one sample and the women in the other
- Example: Each subject is observed at two times,
so the two samples have the same people
6 Section 9.1
- Categorical Response: How Can We Compare Two
Proportions?
7 Categorical Response Variable
- Inferences compare groups in terms of their
population proportions in a particular category
- We can compare the groups by the difference in
their population proportions: (p1 - p2)
8 Example: Aspirin, the Wonder Drug
- Recent titles of newspaper articles:
- Aspirin cuts deaths after heart attack
- Aspirin could lower risk of ovarian cancer
- New study finds a daily aspirin lowers the risk
of colon cancer
- Aspirin may lower the risk of Hodgkin's
9 Example: Aspirin, the Wonder Drug
- The Physicians' Health Study Research Group at
Harvard Medical School
- Five-year randomized study
- Does regular aspirin intake reduce deaths from
heart disease?
10 Example: Aspirin, the Wonder Drug
- Experiment:
- Subjects were 22,071 male physicians
- Every other day, study participants took either
an aspirin or a placebo
- The physicians were randomly assigned to the
aspirin or to the placebo group
- The study was double-blind: the physicians did
not know which pill they were taking, nor did
those who evaluated the results
11 Example: Aspirin, the Wonder Drug
- Results displayed in a contingency table
12 Example: Aspirin, the Wonder Drug
- What is the response variable?
- What are the groups to compare?
13 Example: Aspirin, the Wonder Drug
- The response variable is whether the subject had
a heart attack, with categories yes or no
- The groups to compare are:
- Group 1: Physicians who took a placebo
- Group 2: Physicians who took aspirin
14 Example: Aspirin, the Wonder Drug
- Estimate the difference between the two
population parameters of interest
15 Example: Aspirin, the Wonder Drug
- p1: the proportion of the population who would
have a heart attack if they participated in this
experiment and took the placebo
- p2: the proportion of the population who would
have a heart attack if they participated in this
experiment and took the aspirin
16 Example: Aspirin, the Wonder Drug
Sample Statistics
17 Example: Aspirin, the Wonder Drug
- To make an inference about the difference of
population proportions, (p1 - p2), we need to
learn about the variability of the sampling
distribution of (p̂1 - p̂2)
18 Standard Error for Comparing Two Proportions
- The difference, (p̂1 - p̂2), is obtained from
sample data
- It will vary from sample to sample
- This variation is described by the standard error
of the sampling distribution of (p̂1 - p̂2)
19 Confidence Interval for the Difference between
Two Population Proportions
- (p̂1 - p̂2) ± z(se), where
se = sqrt[p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2]
- The z-score depends on the confidence level
- This method requires:
- Independent random samples for the two groups
- Large enough sample sizes so that there are at
least 10 successes and at least 10 failures
in each group
20 Confidence Interval Comparing Heart Attack Rates
for Aspirin and Placebo
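The interval for the aspirin study can be reproduced with a short sketch. The contingency table did not survive in this text, so the counts below (189 heart attacks among 11,034 placebo subjects and 104 among 11,037 aspirin subjects) are an assumption based on figures commonly reported for this study. The function name `two_proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample CI for (p1 - p2) from success counts x and sample sizes n."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Standard error of the difference between two independent sample proportions
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z-score with the required two-tail area (about 1.96 for 95% confidence)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# ASSUMED counts (not shown in the text): heart attacks and group sizes
placebo_attacks, placebo_n = 189, 11034   # Group 1
aspirin_attacks, aspirin_n = 104, 11037   # Group 2

lower, upper = two_proportion_ci(placebo_attacks, placebo_n,
                                 aspirin_attacks, aspirin_n)
print(f"95% CI for (p1 - p2): ({lower:.3f}, {upper:.3f})")
```

Under these assumed counts the two group sizes add up to the 22,071 physicians in the study, and every cell has far more than 10 successes and 10 failures, so the large-sample method applies.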
21 Confidence Interval Comparing Heart Attack Rates
for Aspirin and Placebo
- Since both endpoints of the confidence interval
(0.005, 0.011) for (p1 - p2) are positive, we
infer that (p1 - p2) is positive
- Conclusion: The population proportion of heart
attacks is larger when subjects take the placebo
than when they take aspirin
22 Confidence Interval Comparing Heart Attack Rates
for Aspirin and Placebo
- The confidence interval (0.005, 0.011) indicates
that the population difference is small
- Even though it is a small difference, it may be
important in public health terms
- For example, in a population of 200 million
people, a decrease of 0.01 over a 5-year period in
the proportion suffering heart attacks would mean
2 million fewer people having heart attacks
23 Confidence Interval Comparing Heart Attack Rates
for Aspirin and Placebo
- The study used male doctors in the U.S.
- The inference applies to the U.S. population of
male doctors
- Before concluding that aspirin benefits a larger
population, we'd want to see results of studies
with more diverse groups
24 Interpreting a Confidence Interval for a
Difference of Proportions
- Check whether 0 falls in the CI
- If so, it is plausible that the population
proportions are equal
- If all values in the CI for (p1 - p2) are
positive, you can infer that (p1 - p2) > 0
- If all values in the CI for (p1 - p2) are
negative, you can infer that (p1 - p2) < 0
- Which group is labeled 1 and which is labeled
2 is arbitrary
25 Interpreting a Confidence Interval for a
Difference of Proportions
- The magnitude of values in the confidence
interval tells you how large any true difference
is
- If all values in the confidence interval are near
0, the true difference may be relatively small in
practical terms
26 Significance Tests Comparing Population
Proportions
- 1. Assumptions
- Categorical response variable for two groups
- Independent random samples
27 Significance Tests Comparing Population
Proportions
- Assumptions (continued)
- Significance tests comparing proportions use the
sample size guideline from confidence intervals:
each sample should have at least about 10
successes and 10 failures
- Two-sided tests are robust against violations of
this condition
- At least 5 successes and 5 failures is
adequate
28 Significance Tests Comparing Population
Proportions
- 2. Hypotheses
- The null hypothesis is the hypothesis of no
difference or no effect
- H0: (p1 - p2) = 0
- Under the presumption that p1 = p2, we create a
pooled estimate of the common value of p1 and p2
- This pooled estimate, p̂, is the total number of
successes in both samples divided by the total
sample size (n1 + n2)
29 Significance Tests Comparing Population
Proportions
- 2. Hypotheses (continued)
- Ha: (p1 - p2) ≠ 0 (two-sided test)
- Ha: (p1 - p2) < 0 (one-sided test)
- Ha: (p1 - p2) > 0 (one-sided test)
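30 Significance Tests Comparing Population
Proportions
- 3. Test statistic
- z = (p̂1 - p̂2) / se0, where
se0 = sqrt[p̂(1 - p̂)(1/n1 + 1/n2)] and p̂ is the
pooled estimate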
31 Significance Tests Comparing Population
Proportions
- 4. P-value: Probability obtained from the
standard normal table
- 5. Conclusion: Smaller P-values give stronger
evidence against H0 and supporting Ha
32 Example: Is TV Watching Associated with
Aggressive Behavior?
- Various studies have examined a link between TV
violence and aggressive behavior by those who
watch a lot of TV
- A study sampled 707 families in two counties in
New York state and made follow-up observations
over 17 years
- The data show levels of TV watching along with
incidents of aggressive acts
33 Example: Is TV Watching Associated with
Aggressive Behavior?
34 Example: Is TV Watching Associated with
Aggressive Behavior?
- Test the hypotheses:
- H0: (p1 - p2) = 0
- Ha: (p1 - p2) ≠ 0
- Using a significance level of 0.05
- Group 1: less than 1 hr. of TV per day
- Group 2: at least 1 hr. of TV per day
35 Example: Is TV Watching Associated with
Aggressive Behavior?
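The test for the TV example can be sketched as follows. The data table is missing from this text, so the counts used here (5 of 88 subjects with aggressive acts in Group 1, 154 of 619 in Group 2) are an assumption; they are chosen to be consistent with the 707 families mentioned above. The function name `two_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled estimate: total successes divided by total sample size
    pooled = (x1 + x2) / (n1 + n2)
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se0
    # Two-sided P-value: both tails of the standard normal distribution
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# ASSUMED counts (the slide's table is not shown in the text)
aggressive_1, n_1 = 5, 88      # Group 1: less than 1 hr. of TV per day
aggressive_2, n_2 = 154, 619   # Group 2: at least 1 hr. of TV per day

z, p_value = two_proportion_z_test(aggressive_1, n_1, aggressive_2, n_2)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.5f}")
```

With these assumed counts Group 1 has only 5 successes, below the usual guideline of 10 but within the relaxed guideline of 5 that applies to two-sided tests.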
36 Example: Is TV Watching Associated with
Aggressive Behavior?
- Conclusion: Since the P-value is less than 0.05,
we reject H0
- We conclude that the population proportions of
aggressive acts differ for the two groups
- The sample values suggest that the population
proportion is higher for the higher level of TV
watching
37 Section 9.2
- Quantitative Response: How Can We Compare Two
Means?
38 Comparing Means
- We can compare two groups on a quantitative
response variable by comparing their means
39 Example: Teenagers Hooked on Nicotine
- A 30-month study
- Evaluated the degree of addiction that teenagers
form to nicotine
- 332 students who had used nicotine were evaluated
- The response variable was constructed using a
questionnaire called the Hooked on Nicotine
Checklist (HONC)
40 Example: Teenagers Hooked on Nicotine
- The HONC score is the total number of questions
to which a student answered yes during the
study
- The higher the score, the more hooked on nicotine
a student is judged to be
41 Example: Teenagers Hooked on Nicotine
- The study considered explanatory variables, such
as gender, that might be associated with the HONC
score
42 Example: Teenagers Hooked on Nicotine
- How can we compare the sample HONC scores for
females and males?
- We estimate (µ1 - µ2) by (x̄1 - x̄2):
- 2.8 - 1.6 = 1.2
- On average, females answered yes to about one
more question on the HONC scale than males did
43 Example: Teenagers Hooked on Nicotine
- To make an inference about the difference between
population means, (µ1 - µ2), we need to learn
about the variability of the sampling
distribution of (x̄1 - x̄2)
44 Standard Error for Comparing Two Means
- The difference, (x̄1 - x̄2), is obtained from
sample data. It will vary from sample to sample.
- This variation is described by the standard error
of the sampling distribution of (x̄1 - x̄2):
se = sqrt(s1²/n1 + s2²/n2)
45 Confidence Interval for the Difference between
Two Population Means
- A 95% CI: (x̄1 - x̄2) ± t.025(se)
- Software provides the t-score with right-tail
probability of 0.025
46 Confidence Interval for the Difference between
Two Population Means
- This method assumes:
- Independent random samples from the two groups
- An approximately normal population distribution
for each group
- This is mainly important for small sample sizes,
and even then the method is robust to violations
of this assumption
47 Example: Nicotine - How Much More Addicted Are
Smokers than Ex-Smokers?
- Data as summarized by HONC scores for the two
groups:
- Smokers: x̄1 = 5.9, s1 = 3.3, n1 = 75
- Ex-smokers: x̄2 = 1.0, s2 = 2.3, n2 = 257
48 Example: Nicotine - How Much More Addicted Are
Smokers than Ex-Smokers?
- Were the sample data for the two groups
approximately normal?
- Most likely not for Group 2, based on the sample
statistics x̄2 = 1.0, s2 = 2.3: scores cannot be
negative, yet the standard deviation exceeds the
mean, which indicates a right-skewed distribution
- Since the sample sizes are large, this lack of
normality is not a problem
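49 Example: Nicotine - How Much More Addicted Are
Smokers than Ex-Smokers?
- 95% CI for (µ1 - µ2): (5.9 - 1.0) ± t.025(se),
with se = sqrt(3.3²/75 + 2.3²/257) ≈ 0.41,
which gives about 4.9 ± 0.8, or (4.1, 5.7)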
- We can infer that the population mean for the
smokers is between 4.1 higher and 5.7 higher than
for the ex-smokers
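A minimal sketch of this calculation from the summary statistics on the earlier slide. Two things here are assumptions not stated in the text: the degrees of freedom use the Welch-Satterthwaite approximation that software typically applies, and the t-score of 1.985 is an approximate table value for about 96 degrees of freedom rather than a value computed by the code. The function name is hypothetical.

```python
from math import sqrt


def se_and_df_unequal_sd(s1, n1, s2, n2):
    """Standard error of (x1_bar - x2_bar) and approximate df, SDs not assumed equal."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (ASSUMED; the text only says
    # that software provides the t-score)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df


# Summary statistics from the text
x1_bar, s1, n1 = 5.9, 3.3, 75     # smokers
x2_bar, s2, n2 = 1.0, 2.3, 257    # ex-smokers

se, df = se_and_df_unequal_sd(s1, n1, s2, n2)

# ASSUMED t-score: approximate value with right-tail probability 0.025
# for roughly 96 degrees of freedom
t_score = 1.985

diff = x1_bar - x2_bar
lower, upper = diff - t_score * se, diff + t_score * se
print(f"se = {se:.3f}, df = {df:.1f}")
print(f"95% CI for (mu1 - mu2): ({lower:.1f}, {upper:.1f})")
```

Because the degrees of freedom are large, the t-score is close to the z-score of 1.96, so the interval changes little if the normal value is used instead.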
50 How Can We Interpret a Confidence Interval for a
Difference of Means?
- Check whether 0 falls in the interval
- When it does, 0 is a plausible value for (µ1 -
µ2), meaning that it is possible that µ1 = µ2
- A confidence interval for (µ1 - µ2) that contains
only positive numbers suggests that (µ1 - µ2) is
positive
- We then infer that µ1 is larger than µ2
51 How Can We Interpret a Confidence Interval for a
Difference of Means?
- A confidence interval for (µ1 - µ2) that contains
only negative numbers suggests that (µ1 - µ2) is
negative
- We then infer that µ1 is smaller than µ2
- Which group is labeled 1 and which is labeled
2 is arbitrary
52 Significance Tests Comparing Population Means
- 1. Assumptions
- Quantitative response variable for two groups
- Independent random samples
53 Significance Tests Comparing Population Means
- Assumptions (continued)
- Approximately normal population distributions for
each group
- This is mainly important for small sample sizes,
and even then the two-sided test is robust to
violations of this assumption
54 Significance Tests Comparing Population Means
- 2. Hypotheses
- The null hypothesis is the hypothesis of no
difference or no effect
- H0: (µ1 - µ2) = 0
55 Significance Tests Comparing Population Means
- 2. Hypotheses (continued)
- The alternative hypothesis:
- Ha: (µ1 - µ2) ≠ 0 (two-sided test)
- Ha: (µ1 - µ2) < 0 (one-sided test)
- Ha: (µ1 - µ2) > 0 (one-sided test)
56 Significance Tests Comparing Population Means
- 3. Test statistic
- t = (x̄1 - x̄2) / se, where
se = sqrt(s1²/n1 + s2²/n2)
57 Significance Tests Comparing Population Means
- 4. P-value: Probability obtained from the
t distribution (using software or a t table)
- 5. Conclusion: Smaller P-values give stronger
evidence against H0 and supporting Ha
58 Example: Does Cell Phone Use While Driving
Impair Reaction Times?
- Experiment
- 64 college students
- 32 were randomly assigned to the cell phone group
- 32 to the control group
59 Example: Does Cell Phone Use While Driving
Impair Reaction Times?
- Experiment (continued)
- Students used a machine that simulated driving
situations
- At irregular periods a target flashed red or
green
- Participants were instructed to press a brake
button as soon as possible when they detected a
red light
60 Example: Does Cell Phone Use While Driving
Impair Reaction Times?
- For each subject, the experiment analyzed their
mean response time over all the trials
- Averaged over all trials and subjects, the mean
response time for the cell-phone group was 585.2
milliseconds
- The mean response time for the control group was
533.7 milliseconds
61 Example: Does Cell Phone Use While Driving
Impair Reaction Times?
62 Example: Does Cell Phone Use While Driving
Impair Reaction Times?
- Test the hypotheses
- H0: (µ1 - µ2) = 0
- vs.
- Ha: (µ1 - µ2) ≠ 0
- using a significance level of 0.05
63 Example: Does Cell Phone Use While Driving
Impair Reaction Times?
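The test statistic for this example can be sketched from summary statistics. The sample means and sample sizes come from the text; the two standard deviations below (89.6 and 65.3 milliseconds) are illustrative assumptions, because the slide showing them is not reproduced here. The function name and the Welch-Satterthwaite degrees of freedom are likewise assumptions.

```python
from math import sqrt


def two_sample_t(x1_bar, s1, n1, x2_bar, s2, n2):
    """t statistic for H0: mu1 = mu2 without assuming equal standard deviations."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = sqrt(v1 + v2)
    t = (x1_bar - x2_bar) / se
    # Welch-Satterthwaite degrees of freedom (ASSUMED approximation)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Means and sample sizes are from the text; the SDs are ASSUMED for illustration
cell_mean, cell_sd, cell_n = 585.2, 89.6, 32
ctrl_mean, ctrl_sd, ctrl_n = 533.7, 65.3, 32

t_stat, df, se = two_sample_t(cell_mean, cell_sd, cell_n,
                              ctrl_mean, ctrl_sd, ctrl_n)
print(f"difference = {cell_mean - ctrl_mean:.1f} ms, se = {se:.1f} ms")
print(f"t = {t_stat:.2f} with df of about {df:.1f}")
```

Software would convert this t statistic and its degrees of freedom into a two-sided P-value; the sketch stops at the statistic because the Python standard library has no t distribution.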
64 Example: Does Cell Phone Use While Driving
Impair Reaction Times?
- Conclusion
- The P-value is less than 0.05, so we can reject
H0
- There is enough evidence to conclude that the
population mean response times differ between the
cell phone and control groups
- The sample means suggest that the population mean
is higher for the cell phone group
65 Example: Does Cell Phone Use While Driving
Impair Reaction Times?
- What do the box plots tell us?
- There is an extreme outlier for the cell phone
group
- It is a good idea to make sure the results of the
analysis aren't affected too strongly by that
single observation
- Delete the extreme outlier and redo the analysis
- In this example, the t-statistic changes only
slightly
66 Example: Does Cell Phone Use While Driving
Impair Reaction Times?
- Insight
- In practice, you should not delete outliers from
a data set without sufficient cause (i.e., if it
seems the observation was incorrectly recorded)
- It is, however, a good idea to check for
sensitivity of an analysis to an outlier
- If the results change much, it means that the
inference including the outlier is on shaky ground
67 How much more time do women spend on housework than men? Data are in hours per week.
Gender   Sample Size   Mean   St. Dev.
Women    6764          32.6   18.2
Men      4252          18.1   12.9
- What is a point estimate of µ1 - µ2?
- 18.2 - 12.9
- 32.6 - 18.1
- 6764 - 4252
- 32.6/18.2 - 18.1/12.9
68 How much more time do women spend on housework than men? Data are in hours per week.
Gender   Sample Size   Mean   St. Dev.
Women    6764          32.6   18.2
Men      4252          18.1   12.9
- What is the standard error for comparing the
means?
- 5.3
- .076
- .297
- .088
69 How much more time do women spend on housework than men? Data are in hours per week.
Gender   Sample Size   Mean   St. Dev.
Women    6764          32.6   18.2
Men      4252          18.1   12.9
- What factor causes the standard error to be
small compared to the sample standard deviations
for the two groups?
- sample means
- sample standard deviations
- sample sizes
- genders
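The three housework questions can be checked numerically with a short sketch based on the table above. The comparison with samples of 100 per group is hypothetical and is included only to show how the sample sizes drive the standard error; the function name is also an assumption.

```python
from math import sqrt


def se_difference_of_means(s1, n1, s2, n2):
    """Standard error of (x1_bar - x2_bar) for independent samples."""
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


# Values from the table: hours of housework per week
women_n, women_mean, women_sd = 6764, 32.6, 18.2
men_n, men_mean, men_sd = 4252, 18.1, 12.9

point_estimate = women_mean - men_mean
se = se_difference_of_means(women_sd, women_n, men_sd, men_n)
print(f"point estimate of (mu1 - mu2): {point_estimate:.1f} hours")
print(f"standard error: {se:.3f}")

# HYPOTHETICAL comparison: same standard deviations, only 100 people per group
se_small_samples = se_difference_of_means(women_sd, 100, men_sd, 100)
print(f"standard error with n = 100 per group: {se_small_samples:.2f}")
```

The standard deviations are identical in both calculations, so the much smaller standard error for the actual data comes entirely from the large sample sizes.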
70 Section 9.3
- Other Ways of Comparing Means and Comparing
Proportions
71 Alternative Method for Comparing Means
- An alternative t-method can be used when, under
the null hypothesis, it is reasonable to expect
the variability as well as the mean to be the
same
- This method requires the assumption that the
population standard deviations be equal
72 The Pooled Standard Deviation
- This alternative method estimates the common
value σ of σ1 and σ2 by
s = sqrt[((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2)]
73 Comparing Population Means, Assuming Equal
Population Standard Deviations
- Using the pooled standard deviation estimate s, a
95% CI for (µ1 - µ2) is
(x̄1 - x̄2) ± t.025 s sqrt(1/n1 + 1/n2)
- This method has df = n1 + n2 - 2
74 Comparing Population Means, Assuming Equal
Population Standard Deviations
- The test statistic for H0: µ1 = µ2 is
t = (x̄1 - x̄2) / (s sqrt(1/n1 + 1/n2)),
where s is the pooled standard deviation
- This method has df = n1 + n2 - 2
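The pooled method can be sketched as follows. All summary statistics in this example are hypothetical, chosen only to illustrate the arithmetic, and the t-score of 2.060 is the table value with right-tail probability 0.025 for 25 degrees of freedom, supplied by hand rather than computed. The function name is an assumption.

```python
from math import sqrt


def pooled_analysis(x1_bar, s1, n1, x2_bar, s2, n2):
    """Pooled SD, standard error, t statistic and df, assuming sigma1 = sigma2."""
    df = n1 + n2 - 2
    # Pooled estimate of the common standard deviation
    s_pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    se = s_pooled * sqrt(1 / n1 + 1 / n2)
    t = (x1_bar - x2_bar) / se
    return s_pooled, se, t, df


# HYPOTHETICAL summary statistics for two small independent samples
s_pooled, se, t_stat, df = pooled_analysis(20.0, 4.0, 12, 17.0, 5.0, 15)

# Table value of t with right-tail probability 0.025 and df = 25
t_score = 2.060

diff = 20.0 - 17.0
lower, upper = diff - t_score * se, diff + t_score * se
print(f"pooled s = {s_pooled:.3f}, se = {se:.3f}, df = {df}")
print(f"t = {t_stat:.3f}")
print(f"95% CI for (mu1 - mu2): ({lower:.2f}, {upper:.2f})")
```

In this hypothetical case the interval contains 0, so equal population means remain plausible.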
75 Comparing Population Means, Assuming Equal
Population Standard Deviations
- These methods assume:
- Independent random samples from the two groups
- An approximately normal population distribution
for each group
- This is mainly important for small sample sizes,
and even then, the CI and the two-sided test are
usually robust to violations of this assumption
- σ1 = σ2