Title: Comparing Two Population Proportions
1. Comparing Two Population Proportions
- Goal: Compare two populations/treatments with respect to a nominal (binary) outcome
- Sampling Design: Independent vs. Dependent Samples
- Methods based on large vs. small samples
- Contingency tables used to summarize data
- Measures of Association: Absolute Risk, Relative Risk, Odds Ratio
2. Contingency Tables
- Tables representing all combinations of levels of explanatory and response variables
- Numbers in the table represent counts of the number of cases in each cell
- Row and column totals are called marginal counts
3. 2x2 Tables - Notation
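A minimal Python sketch of a 2x2 table with its marginal counts. Rows are the two groups and columns are outcome present/absent. The function name `two_by_two` and the dictionary keys are hypothetical; the counts are the osteoarthritis data given on slide 22 (9 of 68 and 2 of 136).

```python
def two_by_two(x1, n1, x2, n2):
    """Build a 2x2 table of counts with marginal totals.

    x1, x2: number with the characteristic in groups 1 and 2
    n1, n2: group sample sizes
    """
    cells = [[x1, n1 - x1],   # group 1: present, absent
             [x2, n2 - x2]]   # group 2: present, absent
    row_totals = [sum(row) for row in cells]                      # group sizes
    col_totals = [cells[0][j] + cells[1][j] for j in range(2)]    # present, absent
    return {"cells": cells, "row_totals": row_totals,
            "col_totals": col_totals, "total": sum(row_totals)}

# Osteoarthritis counts from slide 22
table = two_by_two(9, 68, 2, 136)
print(table["cells"])
print(table["row_totals"], table["col_totals"], table["total"])
```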
4. Example - Firm Type/Product Quality
- Groups: Not Integrated (Weave only) vs. Vertically Integrated (Spin and Weave) Cotton Textile Producers
- Outcomes: High Quality (High Count) vs. Low Quality (Low Count)
Source: Temin (1988)
5. Notation
- Proportion in Population 1 with the characteristic of interest: p1
- Sample size from Population 1: n1
- Number of individuals in Sample 1 with the characteristic of interest: X1
- Sample proportion from Sample 1 with the characteristic of interest: $\hat{p}_1 = X_1/n_1$
- Similar notation for Population/Sample 2
6. Example - Cotton Textile Producers
- p1: True proportion of all non-integrated firms that would produce high quality
- p2: True proportion of all vertically integrated firms that would produce high quality
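7. Notation (Continued)
- Parameter of Primary Interest: p1-p2, the difference in the 2 population proportions with the characteristic (2 other measures given below)
- Estimator: $\hat{p}_1 - \hat{p}_2 = X_1/n_1 - X_2/n_2$
- Standard Error (and its estimate): $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$, estimated by $\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$
- Pooled Estimated Standard Error when p1 = p2 = p: $\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$, where $\hat{p} = (X_1+X_2)/(n_1+n_2)$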
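The estimator and its two standard errors can be sketched in Python as follows. This assumes the usual large-sample formulas for independent samples. The function name `diff_and_se` and the counts (40 of 100 vs. 30 of 100) are hypothetical and not taken from the cotton textile data.

```python
from math import sqrt

def diff_and_se(x1, n1, x2, n2):
    """Return (estimate of p1-p2, unpooled SE, pooled SE)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    diff = p1_hat - p2_hat
    # Unpooled estimated standard error (used for confidence intervals)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Pooled estimate, appropriate when p1 = p2 = p is assumed (used for tests)
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return diff, se, se_pooled

# Hypothetical counts, for illustration only
diff, se, se_pooled = diff_and_se(40, 100, 30, 100)
print(round(diff, 4), round(se, 4), round(se_pooled, 4))
```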
8. Cotton Textile Producers (Continued)
- Parameter of Primary Interest: p1-p2, the difference in the 2 population proportions that produce high quality output
- Estimator
- Standard Error (and its estimate)
- Pooled Estimated Standard Error when p1 = p2 = p
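9. Confidence Interval for p1-p2 (Wilson's Estimate)
- Method adds a success and a failure to each group to improve the coverage rate under certain conditions
- The confidence interval is of the form: $(\tilde{p}_1 - \tilde{p}_2) \pm z_{\alpha/2}\sqrt{\tilde{p}_1(1-\tilde{p}_1)/(n_1+2) + \tilde{p}_2(1-\tilde{p}_2)/(n_2+2)}$, where $\tilde{p}_i = (X_i+1)/(n_i+2)$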
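A minimal sketch of this interval in Python, assuming the "add one success and one failure to each group" adjustment means replacing each sample proportion by (X+1)/(n+2) and each sample size by n+2. The function name `wilson_diff_ci` and the counts are hypothetical.

```python
from math import sqrt

def wilson_diff_ci(x1, n1, x2, n2, z=1.96):
    """Approximate CI for p1-p2 after adding a success and a failure per group."""
    p1 = (x1 + 1) / (n1 + 2)   # adjusted proportion, group 1
    p2 = (x2 + 1) / (n2 + 2)   # adjusted proportion, group 2
    se = sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical counts, for illustration only (z=1.96 gives a 95% interval)
lower, upper = wilson_diff_ci(40, 100, 30, 100)
print(round(lower, 3), round(upper, 3))
```

An interval that straddles 0, as this hypothetical one does, would not support a conclusion that the two proportions differ.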
10. Example - Cotton Textile Production
95% Confidence Interval for p1-p2
The interval provides evidence that non-integrated producers are more likely to produce high quality output (p1-p2 > 0)
11. Significance Tests for p1-p2
- Deciding whether p1 = p2 can be done by interpreting plausible values of p1-p2 from the confidence interval
- If entire interval is positive, conclude p1 > p2 (p1-p2 > 0)
- If entire interval is negative, conclude p1 < p2 (p1-p2 < 0)
- If interval contains 0, do not conclude that p1 ≠ p2
- Alternatively, we can conduct a significance test
- H0: p1 = p2; Ha: p1 ≠ p2 (2-sided) or Ha: p1 > p2 (1-sided)
- Test Statistic: $z_{obs} = (\hat{p}_1 - \hat{p}_2) / \sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$, where $\hat{p}_i = X_i/n_i$ and $\hat{p} = (X_1+X_2)/(n_1+n_2)$
- P-value: 2P(Z ≥ |zobs|) (2-sided), P(Z ≥ zobs) (1-sided)
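The test can be sketched in Python using only the standard library. This assumes the pooled standard error is used under H0, and obtains normal tail areas from `math.erfc`. The function name `two_proportion_z_test` and the counts are hypothetical.

```python
from math import sqrt, erfc

def two_proportion_z_test(x1, n1, x2, n2):
    """Large-sample z-test of H0: p1 = p2 using the pooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    # Standard normal upper tail: P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    p_one_sided = 0.5 * erfc(z / sqrt(2))      # Ha: p1 > p2
    p_two_sided = erfc(abs(z) / sqrt(2))       # Ha: p1 != p2
    return z, p_one_sided, p_two_sided

# Hypothetical counts, for illustration only
z, p_one, p_two = two_proportion_z_test(40, 100, 30, 100)
print(round(z, 3), round(p_one, 3), round(p_two, 3))
```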
12. Example - Cotton Textile Production
Again, there is strong evidence that non-integrated firms are more likely to produce high quality output than integrated firms
13. Measures of Association
- Absolute Risk (AR): p1-p2
- Relative Risk (RR): p1 / p2
- Odds Ratio (OR): o1 / o2, where o = p/(1-p)
- Note that if p1 = p2 (no association between outcome and grouping variables):
- AR = 0
- RR = 1
- OR = 1
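A short Python sketch of the three measures, computed directly from the definitions above. The function name `association_measures` and the proportions are hypothetical.

```python
def association_measures(p1, p2):
    """Return (absolute risk, relative risk, odds ratio) for proportions p1, p2."""
    ar = p1 - p2
    rr = p1 / p2
    odds1 = p1 / (1 - p1)
    odds2 = p2 / (1 - p2)
    return ar, rr, odds1 / odds2

# No association: equal proportions give AR = 0, RR = 1, OR = 1
ar0, rr0, or0 = association_measures(0.3, 0.3)

# Hypothetical unequal proportions, for illustration only
ar, rr, odds_ratio = association_measures(0.4, 0.2)
print(ar0, rr0, or0)
print(round(ar, 3), round(rr, 3), round(odds_ratio, 3))
```

Note that the odds ratio is further from 1 than the relative risk whenever the two proportions differ.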
14. Relative Risk
- Ratio of the probability that the outcome characteristic is present for one group, relative to the other
- Sample proportions with characteristic from groups 1 and 2: $\hat{p}_1 = X_1/n_1$ and $\hat{p}_2 = X_2/n_2$
15. Relative Risk
95% Confidence Interval for Population Relative Risk
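One common large-sample interval for the relative risk works on the log scale, with estimated standard error sqrt((1-p1hat)/X1 + (1-p2hat)/X2) for ln(RR). The Python sketch below assumes that form, which may differ in detail from the formula on the original slide. The function name `relative_risk_ci` and the counts are hypothetical.

```python
from math import sqrt, log, exp

def relative_risk_ci(x1, n1, x2, n2, z=1.96):
    """Sample relative risk and a log-scale large-sample CI (assumed form)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    rr = p1_hat / p2_hat
    # Estimated standard error of ln(RR)
    se_log = sqrt((1 - p1_hat) / x1 + (1 - p2_hat) / x2)
    return rr, exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)

# Hypothetical counts, for illustration only
rr, rr_lower, rr_upper = relative_risk_ci(40, 100, 30, 100)
print(round(rr, 3), round(rr_lower, 3), round(rr_upper, 3))
```

Because the interval is symmetric on the log scale, the sample relative risk is the geometric mean of the two endpoints rather than their midpoint.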
16. Relative Risk
- Interpretation
- Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
- Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
- Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1
17. Example - Concussions in NCAA Athletes
- Units: Game exposures among college soccer players, 1997-1999
- Outcome: Presence/Absence of a Concussion
- Group Variable: Gender (Female vs. Male)
- Contingency Table of case outcomes
Source: Covassin, et al. (2003)
18. Example - Concussions in NCAA Athletes
There is strong evidence that females have a higher risk of concussion
19. Odds Ratio
- Odds of an event is the probability it occurs divided by the probability it does not occur
- Odds ratio is the odds of the event for group 1 divided by the odds of the event for group 2
- Sample odds of the outcome for each group: $X_1/(n_1-X_1)$ and $X_2/(n_2-X_2)$
20. Odds Ratio
95% Confidence Interval for Population Odds Ratio
21. Odds Ratio
- Interpretation
- Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
- Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
- Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1
22. Osteoarthritis in Former Soccer Players
- Units: 68 former British professional football players and 136 age/sex matched controls
- Outcome: Presence/Absence of Osteoarthritis (OA)
- Data:
- Of n1 = 68 former professionals, X1 = 9 had OA, n1-X1 = 59 did not
- Of n2 = 136 controls, X2 = 2 had OA, n2-X2 = 134 did not
The entire confidence interval for the odds ratio lies above 1
Source: Shepard, et al. (2003)
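The odds ratio for these counts can be reproduced with a short Python sketch. It assumes the usual log-scale large-sample interval, with standard error sqrt(1/X1 + 1/(n1-X1) + 1/X2 + 1/(n2-X2)) for ln(OR); the function name `odds_ratio_ci` is hypothetical.

```python
from math import sqrt, log, exp

def odds_ratio_ci(x1, n1, x2, n2, z=1.96):
    """Sample odds ratio and a log-scale large-sample CI (assumed form)."""
    a, b = x1, n1 - x1     # group 1: with / without the outcome
    c, d = x2, n2 - x2     # group 2: with / without the outcome
    or_hat = (a / b) / (c / d)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_hat, exp(log(or_hat) - z * se_log), exp(log(or_hat) + z * se_log)

# Counts from this slide: 9 of 68 former professionals, 2 of 136 controls
or_hat, or_lower, or_upper = odds_ratio_ci(9, 68, 2, 136)
print(round(or_hat, 2), round(or_lower, 2), round(or_upper, 2))
```

With these counts the estimated odds ratio is about 10.2, and the interval (roughly 2.1 to 49) lies entirely above 1. The interval is wide because only 2 controls had OA.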
23. Fisher's Exact Test
- Method of testing for association for 2x2 tables when one or both of the group sample sizes is small
- Measures (conditional on the group sizes and number of cases with and without the characteristic) the chances we would see differences of this magnitude or larger in the sample proportions, if there were no differences in the populations
24. Example - Echinacea Purpurea for Colds
- Healthy adults randomized to receive EP (n1 = 24) or placebo (n2 = 22, two were dropped)
- Among EP subjects, 14 of 24 developed cold after exposure to RV-39 (58%)
- Among placebo subjects, 18 of 22 developed cold after exposure to RV-39 (82%)
- Out of a total of 46 subjects, 32 developed cold
- Out of a total of 46 subjects, 24 received EP
Source: Sperber, et al. (2004)
25. Example - Echinacea Purpurea for Colds
- Conditional on 32 people developing colds and 24 receiving EP, the following table gives the outcomes that would have been as strong or stronger evidence that EP reduced risk of developing cold (1-sided test). P-value from SPSS is .079.
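The 1-sided p-value can be checked with a short Python sketch based on the hypergeometric distribution: with both margins fixed, it sums the probabilities of every table in which 14 or fewer EP subjects develop a cold. The function name `fisher_lower_tail` is hypothetical; `math.comb` requires Python 3.8 or later.

```python
from math import comb

def fisher_lower_tail(x1, n1, x2, n2):
    """One-sided Fisher exact p-value: P(group-1 count <= x1 | margins fixed)."""
    total = n1 + n2            # all subjects
    cases = x1 + x2            # all subjects with the outcome
    k_min = max(0, cases - n2) # smallest group-1 count the margins allow
    denom = comb(total, n1)    # ways to choose which subjects form group 1
    numer = sum(comb(cases, k) * comb(total - cases, n1 - k)
                for k in range(k_min, x1 + 1))
    return numer / denom

# EP: 14 of 24 developed a cold; placebo: 18 of 22
p_value = fisher_lower_tail(14, 24, 18, 22)
print(round(p_value, 3))
```

The sum runs over EP cold counts 10 through 14, and the result rounds to .079, in agreement with the SPSS value quoted above.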
26. Example - SPSS Output
27. McNemar's Test for Paired Samples
- Common subjects being observed under 2 conditions (2 treatments, before/after, 2 diagnostic tests) in a crossover setting
- Two possible outcomes (Presence/Absence of Characteristic) on each measurement
- Four possibilities for each subject with respect to outcome:
- Present in both conditions
- Absent in both conditions
- Present in Condition 1, Absent in Condition 2
- Absent in Condition 1, Present in Condition 2
28. McNemar's Test for Paired Samples
29. McNemar's Test for Paired Samples
- H0: Probability the outcome is Present is the same for the 2 conditions
- HA: Probabilities differ for the 2 conditions (can also be conducted as a 1-sided test)
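A minimal Python sketch of the test, assuming the large-sample normal form in which only the discordant pairs (present under one condition, absent under the other) contribute: z = (n12 - n21) / sqrt(n12 + n21). The function name `mcnemar_z_test` and the counts are hypothetical, not the juvenile court data.

```python
from math import sqrt, erfc

def mcnemar_z_test(n12, n21):
    """Large-sample McNemar test from the two discordant-pair counts.

    n12: pairs with outcome present in condition 1 only
    n21: pairs with outcome present in condition 2 only
    """
    z = (n12 - n21) / sqrt(n12 + n21)
    p_two_sided = erfc(abs(z) / sqrt(2))   # 2 * P(Z >= |z|)
    return z, p_two_sided

# Hypothetical discordant counts, for illustration only
z_mc, p_mc = mcnemar_z_test(30, 15)
print(round(z_mc, 3), round(p_mc, 4))
```

Concordant pairs (present in both or absent in both) carry no information about a difference between the conditions, which is why they do not appear in the statistic.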
30. Example - Juveniles Tried as Adults
- Subjects: 2097 pairs of juveniles matched on prior criminal record and severity of current crime
- Condition: Adult vs. Juvenile Court (one of each in pair)
- Outcome: Whether juvenile was re-arrested during follow-up
Source: Bishop, et al. (1996)
31. Example - Juveniles Tried as Adults
- H0: Tendency for rearrest is not different between juveniles tried as adults and those tried as juveniles
- HA: Tendencies differ
Evidence that tendencies differ (higher risk of rearrest among juveniles tried in adult court)
32. Data Sources
- Temin, P. (1988). Product Quality and Vertical Integration in the Early Cotton Textile Industry, The Journal of Economic History, 48(4), pp. 891-907.
- Covassin, T., C.B. Swanik, and M.L. Sachs (2003). Sex Differences and the Incidence of Concussions Among Collegiate Athletes, Journal of Athletic Training, 38(3), pp. 238-244.
- Shepard, G.J., A.J. Banks, and W.G. Ryan (2003). Ex-Professional Association Footballers Have an Increased Prevalence of Osteoarthritis of the Hip Compared with Age Matched Controls Despite Not Having Sustained Notable Hip Injuries, British Journal of Sports Medicine, 37, pp. 80-81.
- Sperber, S.J., L.P. Shah, R.D. Gilbert, et al. (2004). Echinacea purpurea for Prevention of Experimental Rhinovirus Colds, Clinical Infectious Diseases, 38, pp. 1367-1371.
- Bishop, D.M., C.E. Frazier, L. Lanza-Kaduce, and L. Winner (1996). The Transfer of Juveniles to Criminal Court: Does it Make a Difference? Crime & Delinquency, 42, pp. 171-191.