Title: Proportions, Contingency Tables, and Measures of Association
Ellen P. Fischer, Ph.D.
Departments of Psychiatry and of Epidemiology
GCRC Clinical Research Course
March 11, 2005
Acknowledgement and Absolution
Much credit for the content and organization of these sessions must be given to Paula K. Roberson, Ph.D., Chair, Division and Department of Biostatistics. None of the errors should be attributed to Dr. Roberson.
Longitudinal (fictitious) data from a representative national sample of patients seen in hospital-affiliated outpatient clinics, regarding medication adherence for incident hypertension over a 12-month period:
  >80% adherence: 1,175
  <80% adherence: 3,825
  Total: 5,000
Proportion adherent: $p = x/n = 1175/5000 = 0.235$
95% Confidence Interval (CI): $p \pm 1.96\,\mathrm{SE}(p)$, where $\mathrm{SE}(p) = \sqrt{p(1-p)/n}$
$\mathrm{SE} = \sqrt{(0.235)(1-0.235)/5000} = 0.006$
95% CI $= 0.235 \pm 1.96(0.006) = (0.223,\ 0.247)$
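The interval above is the normal-approximation (Wald) interval. A minimal Python sketch of the same arithmetic follows; the function name `wald_ci` is our own label, not something from the course materials.

```python
import math

def wald_ci(x, n, z=1.96):
    """Point estimate, SE, and normal-approximation (Wald) CI for a proportion x/n."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)   # SE(p) = sqrt(p(1 - p)/n)
    return p, se, (p - z * se, p + z * se)

p, se, (lo, hi) = wald_ci(1175, 5000)
print(f"p = {p:.3f}, SE = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
# p = 0.235, SE = 0.006, 95% CI = (0.223, 0.247)
```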
Rate: frequency of an event or attribute in a group at risk for the event/attribute over a specified period of time
Adherence rate $= (x/n) \times 10^k$ over a specified time
$= 1175/5000 \times 100 = 23.5$ per 100 (23.5%) per year
$= 1175/5000 \times 10{,}000 = 2{,}350$ per 10,000 per year
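The choice of $k$ only rescales the same proportion. A small sketch, assuming a hypothetical helper `rate_per` that multiplies before dividing so the integer inputs give exact results:

```python
def rate_per(x, n, k):
    """Events per 10**k persons at risk over the study period."""
    return x * 10 ** k / n

print(rate_per(1175, 5000, 2))   # per 100 per year
print(rate_per(1175, 5000, 4))   # per 10,000 per year
```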
Estimate prevalence of malignant melanoma in 45- to 54-year-old women in the U.S.
Random sample of $n = 5000$; $x = 28$ women have the disease
Point estimate of prevalence: $p = 28/5000 = 0.0056$
$\mathrm{SE}(p) = \sqrt{(0.0056)(0.9944)/5000} = 0.0011$
95% Confidence Interval $= p \pm 1.96\,\mathrm{SE}(p) = 0.0056 \pm 1.96(0.0011) = (0.0034,\ 0.0078)$
Exact 95% CI using the binomial distribution is (0.0037, 0.0081)
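With only 28 cases, the symmetric normal approximation and the exact binomial interval differ noticeably. The sketch below computes both; the exact limits use the Clopper-Pearson construction (inverting two one-sided binomial tests), which we assume is the "exact" method meant on the slide. The helpers `binom_cdf`, `solve_cdf`, and `clopper_pearson` are hypothetical names written with the standard library only; in practice a routine such as SciPy's `binomtest(...).proportion_ci(method="exact")` would normally be used instead.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def solve_cdf(k, n, target, iters=100):
    """Bisection for p in (0, 1) with binom_cdf(k, n, p) == target (the CDF falls as p rises)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact CI: lower limit solves P(X >= x) = alpha/2, upper solves P(X <= x) = alpha/2."""
    lower = 0.0 if x == 0 else solve_cdf(x - 1, n, 1 - alpha / 2)
    upper = 1.0 if x == n else solve_cdf(x, n, alpha / 2)
    return lower, upper

x, n = 28, 5000
p = x / n
se = math.sqrt(p * (1 - p) / n)
print(f"Wald:  ({p - 1.96 * se:.4f}, {p + 1.96 * se:.4f})")
lo, hi = clopper_pearson(x, n)
print(f"Exact: ({lo:.4f}, {hi:.4f})")
```

With the unrounded SE (about 0.00106) the Wald limits come out near (0.0035, 0.0077); the slide's (0.0034, 0.0078) reflects rounding the SE to 0.0011 first. The exact interval is asymmetric around 0.0056, as expected for a small count.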
12-month longitudinal study of Neo-med vs. Archaeo-med in the treatment of incident hypertension: data on medication adherence
- Difference in effect (proportions adherent)
- Statistical significance of the difference
- Relative magnitude of the effect
Difference of Proportions
95% Confidence Interval $= (p_1 - p_2) \pm 1.96\,\mathrm{SE}(p_1 - p_2)$, where $\mathrm{SE}(p_1 - p_2) = \sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$
Difference in proportions adherent
Neo-med: 94/200 = 0.47
Archaeo-med: 47/200 = 0.235
Difference = 0.47 - 0.235 = 0.235
Precision of the estimate:
$\mathrm{SE}(\text{difference}) = \sqrt{(0.47)(0.53)/200 + (0.235)(0.765)/200} = 0.0463$
95% CI $= 0.235 \pm 1.96(0.0463) = 0.235 \pm 0.091 = (0.144,\ 0.326)$
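The same calculation as a short sketch, using the unpooled standard error from the Difference of Proportions formula; `diff_ci` is an illustrative name of our own.

```python
import math

def diff_ci(x1, n1, x2, n2, z=1.96):
    """Difference p1 - p2 with an unpooled normal-approximation CI."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d, se, (d - z * se, d + z * se)

d, se, (lo, hi) = diff_ci(94, 200, 47, 200)   # Neo-med vs Archaeo-med
print(f"difference = {d:.3f}, SE = {se:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
# difference = 0.235, SE = 0.0463, 95% CI = (0.144, 0.326)
```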
Significance of the effect
- Chi-square ($\chi^2$) test of association, or
- Normal approximation
Chi-square test of association
$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$, df $= (\text{rows} - 1)(\text{columns} - 1)$
- If there is no association between adherence and type of medication, the probability of being >80% adherent and taking Neo-med should be the product of the individual probabilities.
- Probability of being >80% adherent and taking Neo-med = 94/400 = 0.235
- Probability of being >80% adherent = 141/400 = 0.3525
- Probability of taking Neo-med = 200/400 = 0.5
- Product of probabilities = 0.3525 × 0.5 = 0.17625
- 0.17625 ≠ 0.235
$\chi^2 = \sum_i \sum_j \frac{(n_{i,j} - E_{i,j})^2}{E_{i,j}}$
where i, j index the rows and columns of the table, $n_{i,j}$ are the cell entries, and $E_{i,j}$ are the expected cell entries if the probabilities were independent. The degrees of freedom for the $\chi^2$ are $(r-1)\times(c-1)$, where r and c are the number of rows and columns, respectively.
Expected number >80% adherent and taking Neo-med = (141 × 200)/400 = 70.5
(In general, $E_{i,j}$ = row total × column total / overall total.)
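$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = 24.2$, df $= (\text{rows} - 1)(\text{columns} - 1) = 1$
From tables for 1 df, the $\alpha = 0.001$ critical value is 10.83. Since 24.2 > 10.83, we reject the null hypothesis of no association between adherence and medication.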
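Both routes listed under "Significance of the effect" can be checked side by side. The sketch below builds the expected counts, sums the $\chi^2$ contributions, and also computes the pooled two-sample z statistic; for a 2×2 table without continuity correction, $z^2$ equals $\chi^2$ exactly. The table layout is our own arrangement of the counts given on the slides. If you use SciPy's `chi2_contingency` instead, note that it applies the Yates continuity correction to 2×2 tables by default, so pass `correction=False` to reproduce this value.

```python
import math

table = [[94, 47],    # >80% adherent: Neo-med, Archaeo-med
         [106, 153]]  # <80% adherent: Neo-med, Archaeo-med

row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
N = sum(row_tot)

# Expected count under independence: (row total x column total) / N
expected = [[row_tot[i] * col_tot[j] / N for j in range(2)] for i in range(2)]
chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))
# Upper-tail probability of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

# Normal approximation: z-test for p1 - p2 using the pooled proportion
p1, p2 = 94 / 200, 47 / 200
pooled = 141 / 400
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / 200 + 1 / 200))

print(expected[0][0], round(chi2, 2), round(z ** 2, 2), p_value < 0.001)
# 70.5 24.2 24.2 True
```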
- The $\chi^2$ test can be used for any number of rows and columns.
- The chi-square test is inherently two-sided.
- The distribution of the statistic is only approximately chi-square. This approximation is very good if all the expected cell frequencies are greater than 5; otherwise, use Fisher's Exact Test.
- This test is appropriate if the sampling is based on two independent groups which are then classified according to a second factor (as in the example), or if a total sample of size n is identified and then cross-classified according to two factors.
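For small expected counts, Fisher's Exact Test conditions on both margins, so the top-left cell follows a hypergeometric distribution. A minimal sketch under that setup: the two-sided p-value below sums the probabilities of all tables no more likely than the observed one, which is one common two-sided convention. The function name and the small example table are hypothetical, not from the slides; SciPy's `scipy.stats.fisher_exact` is the usual library route.

```python
from fractions import Fraction
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, c1, n = a + b, a + c, a + b + c + d
    denom = comb(n, c1)

    def prob(k):
        # Hypergeometric P(top-left cell = k | margins), kept as an exact fraction
        return Fraction(comb(r1, k) * comb(n - r1, c1 - k), denom)

    observed = prob(a)
    k_min, k_max = max(0, c1 - (n - r1)), min(r1, c1)
    p = sum(prob(k) for k in range(k_min, k_max + 1) if prob(k) <= observed)
    return float(p)

# Hypothetical small table where every expected count is 2 (< 5)
print(fisher_exact_2x2(3, 1, 1, 3))
```

Exact fractions avoid floating-point ties when deciding which tables are "no more likely" than the observed one.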
Study of wart therapy: $\chi^2$ example with more than 2 proportions
3 randomized treatment groups (fictitious data), 100 patients assigned to each
Placebo (P): 40/100 clear warts in 2 weeks
Therapy A: 53/100 clear warts in 2 weeks
Therapy B: 71/100 clear warts in 2 weeks

               P     A     B    TOTAL
Clear         40    53    71     164
Don't clear   60    47    29     136

Is there evidence for differences in clearance rates?
- The chi-square statistic is 19.557 with 2 df: highly significant (p < 0.0001).
- We would conclude that there is a significant effect of treatment on the rate of wart clearance.
- This is a global test among all three groups.
- To determine which groups are significantly different from each other, multiple comparisons procedures (similar to those discussed in ANOVA) could be used.
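The same Pearson statistic extends directly to an r × c table. A sketch for the wart data follows; `chi_square` is our own helper name, and the p-value uses the fact that for exactly 2 df the chi-square upper tail is $e^{-x/2}$ (this shortcut does not hold for other df).

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            e_ij = row_tot[i] * col_tot[j] / n   # expected count under independence
            stat += (obs - e_ij) ** 2 / e_ij
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

warts = [[40, 53, 71],   # cleared in 2 weeks: Placebo, A, B
         [60, 47, 29]]   # did not clear
stat, df = chi_square(warts)
p_value = math.exp(-stat / 2)   # closed-form upper tail, valid for df = 2 only
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.1e}")
```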
Comparison of medication adherence among study participants with incident hypertension prescribed Neo-med vs. Archaeo-med
- Difference in effect (proportions adherent): 23.5 percentage points
- Statistical significance of the difference: p < .001
- Relative magnitude of the effect??
- Relative Risk/Risk Ratio: the ratio of the risk in the exposed to the risk in the unexposed
- $RR = \frac{\text{Incidence in the exposed}}{\text{Incidence in the unexposed}}$
- Can be measured directly in longitudinal (prospective, cohort, follow-up) studies
Measures of Association: Relative Risk

Alcohol Consumption    Breast Cancer   No Breast Cancer   Total
≥6 drinks/day               14               286            300
None (No Exposure)           7               493            500
Total                       21               779            800

Incidence in the Exposed = 14/300 = 0.047
Incidence in the Non-Exposed = 7/500 = 0.014
Relative Risk = 0.047/0.014 = 3.3
Measures of Association: Relative Risk
Relative Risk values range from 0 to ∞
RR > 1: Variable is associated with increased risk (Risk Factor)
RR = 1: Variable does not affect risk
RR < 1: Variable is associated with lower risk (Protective Factor)
Measures of Association: Relative Risk

Lifetime History of Breast Feeding   Breast Cancer   No Breast Cancer   Total
Ever (Exposed)                             20               280            300
Never (Not Exposed)                       107               393            500
Total                                     127               673            800

Incidence in the Exposed = 20/300 = 0.067
Incidence in the Non-Exposed = 107/500 = 0.214
Relative Risk = 0.067/0.214 = 0.31
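The two relative risk examples and the interpretation rule can be expressed in a few lines. This is an illustrative sketch; `relative_risk` and `describe` are our own names.

```python
def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp):
    """RR = incidence in the exposed / incidence in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

def describe(rr):
    # Interpretation rule from the Relative Risk slide
    if rr > 1:
        return "risk factor"
    if rr < 1:
        return "protective factor"
    return "no effect on risk"

alcohol = relative_risk(14, 300, 7, 500)          # >=6 drinks/day vs none
breastfeeding = relative_risk(20, 300, 107, 500)  # ever vs never
print(f"{alcohol:.2f} {describe(alcohol)}")
print(f"{breastfeeding:.2f} {describe(breastfeeding)}")
```

Computing from the raw counts gives RR = 3.33 for alcohol; the slide's 3.3 comes from dividing the rounded incidences.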
Comparison of Relative and Attributable Risk
Annual Death Rates per 100,000 Persons

Smoking Status       Lung Cancer        Coronary Heart Disease
Heavy Smokers            166                   599
Non-Smokers                7                   422
Relative Risk        166/7 = 23.7         599/422 = 1.4
Attributable Risk    166 - 7 = 159        599 - 422 = 177

Doll and Hill, BMJ 2:1071, 1956
Annual Death Rates per 100,000: Heavy Smokers
- Attributable Risk: among heavy smokers, what is the lung cancer mortality rate attributable to smoking?
166 - 7 per 100,000/year = 159 per 100,000/year
[Figure: bar chart of annual death rates per 100,000 for lung cancer and CHD; CHD bars at 599 (heavy smokers) and 422 (non-smokers)]
Annual Death Rates per 100,000: Heavy Smokers
- Attributable Risk Fraction: among heavy smokers, what proportion of the mortality rate is attributable to smoking?
Lung cancer: 159 / 166 per 100,000/year = 95.8%
CHD: 177 / 599 per 100,000/year = 29.5%
[Figure: bar chart of annual death rates per 100,000 for lung cancer and CHD]
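The three measures used with the Doll and Hill rates can be computed together, which makes the contrast clear: smoking multiplies lung cancer mortality far more, yet accounts for more excess CHD deaths in absolute terms. A minimal sketch; `risk_measures` is our own name.

```python
def risk_measures(rate_exposed, rate_unexposed):
    """Relative risk, attributable risk, and attributable risk fraction."""
    rr = rate_exposed / rate_unexposed    # relative risk
    ar = rate_exposed - rate_unexposed    # attributable risk (same units as the rates)
    arf = ar / rate_exposed               # fraction of the exposed rate due to exposure
    return rr, ar, arf

# Annual deaths per 100,000: (heavy smokers, non-smokers)
rates = {"Lung cancer": (166, 7), "CHD": (599, 422)}
for cause, (heavy, non) in rates.items():
    rr, ar, arf = risk_measures(heavy, non)
    print(f"{cause}: RR = {rr:.1f}, AR = {ar}/100,000/yr, ARF = {arf:.1%}")
```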
Relative Risk cannot be calculated from case-control data.
Odds Ratio: a good estimator of relative risk in a case-control study, if the disease (outcome/event) is rare.
Derivation: suppose in a large cohort study we can accurately classify every individual as +/- for disease status.

                 Disease +    Disease -    Total
Exposure +           A            B         A + B
Exposure -           C            D         C + D
Total              A + C        B + D      N = A + B + C + D
Relative Risk is
$RR = \frac{A/(A+B)}{C/(C+D)}$
If the number of diseased individuals is small relative to the number of non-diseased individuals, $C + D \approx D$ and $A + B \approx B$, so the Relative Risk is approximately
$RR \approx \frac{A/B}{C/D} = \frac{AD}{BC}$ (cross-product ratio)
Odds Ratio (OR)
$OR = \frac{\text{Odds of exposure for those with the disease}}{\text{Odds of exposure for those without the disease}} = \frac{A/C}{B/D} = \frac{AD}{BC}$
so $RR \approx AD/BC = OR$.
In practice, unfortunately, Odds Ratios are often interpreted as though they were Relative Risks rather than estimates of Relative Risks.
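The rare-disease approximation can be checked numerically. The sketch applies both formulas to the alcohol and breast cancer table (21 cases out of 800, so the disease is rare) and to a hypothetical common-outcome table of our own invention, where the OR drifts well away from the RR. `rr_and_or` is an illustrative name.

```python
def rr_and_or(A, B, C, D):
    """Cohort 2x2 table: exposed row (A diseased, B not), unexposed row (C, D)."""
    rr = (A / (A + B)) / (C / (C + D))
    odds_ratio = (A * D) / (B * C)   # cross-product ratio
    return rr, odds_ratio

# Alcohol and breast cancer (rare outcome): OR is close to RR
print(rr_and_or(14, 286, 7, 493))
# Hypothetical table with a common outcome (not from the slides): OR overstates RR
print(rr_and_or(60, 40, 30, 70))
```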
Example: Data on males from a hospital-based, non-matched case-control study of pancreatitis in eastern Massachusetts and Rhode Island between 1975 and 1979.

Cigarette Use   Cases   Controls
Never              2        56
Former            13        80
Current           38        81
Total             53       217

Odds Ratio for ex-smokers relative to never smokers $= \frac{(13)(56)}{(80)(2)} = 4.55$
Interpreting the OR as RR, we conclude that the risk of pancreatitis among ex-smokers is 4.55 times the risk among those who never smoked.
If we had calculated the odds for never smokers relative to ex-smokers, we would have had $OR = \frac{(80)(2)}{(13)(56)} = 0.22$, i.e., those who never smoked had 22% of the risk of pancreatitis of former smokers.
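Swapping the comparison group simply inverts the odds ratio. A short sketch with a hypothetical `odds_ratio` helper:

```python
def odds_ratio(cases_exp, controls_exp, cases_ref, controls_ref):
    """Case-control OR: odds of exposure among cases / odds among controls."""
    return (cases_exp * controls_ref) / (controls_exp * cases_ref)

ex_vs_never = odds_ratio(13, 80, 2, 56)    # (13)(56) / ((80)(2))
never_vs_ex = odds_ratio(2, 56, 13, 80)    # (80)(2) / ((13)(56))
print(round(ex_vs_never, 2), round(never_vs_ex, 2))
# 4.55 0.22
```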
Example: In a seroepidemiology survey of health workers with a spectrum of exposure to blood and patients with hepatitis B virus (HBV), it was found that infection was associated with frequency of contact. Data for workers of uniform socioeconomic status at a teaching hospital in Boston, Massachusetts:

Personnel    Exposure       n    HBV+
Physicians   Frequent      81     17
             Infrequent    89      7
Nurses       Frequent     104     22
             Infrequent   126     11
Proportion HBV-positive
Physicians: frequent exposure 17/81 = 0.210; infrequent exposure 7/89 = 0.079
Nurses: frequent exposure 22/104 = 0.212; infrequent exposure 11/126 = 0.087
Physicians

               HBV Status
Exposure       Positive    Negative            Total
Frequent          17       (81 - 17) = 64        81
Infrequent         7       (89 - 7) = 82         89

$OR = \frac{(17)(89-7)}{(7)(81-17)} = \frac{(17)(82)}{(7)(64)} = \frac{1394}{448} = 3.11$
Physicians who are HBV positive are 3.11 times as likely to have had frequent exposure as those who are HBV negative.
Nurses

               HBV Status
Exposure       Positive    Negative             Total
Frequent          22       (104 - 22) = 82       104
Infrequent        11       (126 - 11) = 115      126

$OR = \frac{(22)(115)}{(11)(82)} = \frac{2530}{902} = 2.80$
Nurses who are HBV+ are 2.8 times as likely to have had frequent exposure as those who are HBV-.
The difference in ORs for nurses (2.8) and physicians (3.11) is likely to be within random error, particularly given the sample size.
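The survey reports only n and the HBV-positive count per stratum, so the negative cells must be derived (n minus positives) before forming the cross-product ratio. A sketch of that step; `stratum_or` and the dictionary layout are our own.

```python
# (n workers, number HBV-positive) by personnel group and exposure frequency
data = {
    "Physicians": {"frequent": (81, 17), "infrequent": (89, 7)},
    "Nurses": {"frequent": (104, 22), "infrequent": (126, 11)},
}

def stratum_or(frequent, infrequent):
    """OR = (pos_freq * neg_infreq) / (pos_infreq * neg_freq), negatives = n - positives."""
    n_f, pos_f = frequent
    n_i, pos_i = infrequent
    neg_f, neg_i = n_f - pos_f, n_i - pos_i
    return (pos_f * neg_i) / (pos_i * neg_f)

for group, cells in data.items():
    print(group, round(stratum_or(cells["frequent"], cells["infrequent"]), 2))
```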
Confidence intervals for Odds Ratios
There are several methods for calculating confidence intervals for odds ratios; this is only one method.
Recall the 2x2 table showing data from a case-control study:

                 Exposed    Unexposed
Diseased            a           b
Disease-Free        c           d

$OR = \frac{ad}{bc}$
Confidence interval derived from a normal approximation to the sampling distribution of ln(OR)
Note: ln is the logarithm to base e, i.e., the natural log
$\mathrm{Var}[\ln(OR)] \approx \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$
95% Confidence Interval for OR is
$\exp\left[\ln OR \pm 1.96\sqrt{\mathrm{Var}[\ln(OR)]}\right]$
reversing the logs to get back to the original units.
Example: Case-control study examining the role of smoking in pancreatitis

Use of Cigarettes   Cases   Controls
Current smokers       38        81
Ex-smokers            13        80
Never                  2        56

Ex-smokers compared to never smokers: $OR = ad/bc = \frac{(13)(56)}{(2)(80)} = 4.55$
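95% CI $= \exp[\ln OR \pm 1.96 \times \mathrm{SE}(\ln OR)]$
$= \exp\left[\ln 4.55 \pm 1.96\sqrt{\tfrac{1}{13} + \tfrac{1}{2} + \tfrac{1}{80} + \tfrac{1}{56}}\right]$
$= \exp(-0.0123,\ 3.0425) = (0.99,\ 20.96)$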
Interpretation
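The log-scale interval can be reproduced step by step. A minimal sketch of this normal-approximation method (often attributed to Woolf), using the slide's a, b, c, d layout; `or_ci_woolf` is our own name.

```python
import math

def or_ci_woolf(a, b, c, d, z=1.96):
    """OR = ad/bc with CI exp[ln OR +/- z*sqrt(1/a + 1/b + 1/c + 1/d)]."""
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo, hi = log_or - z * se, log_or + z * se
    return math.exp(log_or), (lo, hi), (math.exp(lo), math.exp(hi))

# Ex-smokers vs never smokers: a = 13 exposed cases, b = 2 unexposed cases,
# c = 80 exposed controls, d = 56 unexposed controls
or_, log_limits, limits = or_ci_woolf(13, 2, 80, 56)
print(f"OR = {or_:.2f}, ln-scale limits = ({log_limits[0]:.4f}, {log_limits[1]:.4f}), "
      f"95% CI = ({limits[0]:.2f}, {limits[1]:.2f})")
```

The lower limit falls just below 1, and the width of the interval is driven almost entirely by the cell with only 2 never-smoking cases (the 1/2 term dominates the variance).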
Multivariable/multivariate analysis of data with categorical outcome/dependent variables
- Logistic Regression
- Polychotomous/Multinomial Logistic Regression
Model building and testing follow the same processes followed in linear regression.
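To connect logistic regression back to the odds ratio: with a single binary exposure, the fitted slope is the log odds ratio. The sketch below fits that model to the ex-smoker vs never-smoker pancreatitis counts by a hand-written Newton-Raphson loop; it is for illustration only (a statistics package would normally be used), and `score_and_info` and the grouped-data layout are our own. The slope's standard error from the inverse information matrix matches the $\sqrt{1/a + 1/b + 1/c + 1/d}$ formula above.

```python
import math

# Grouped data: (x = 1 ex-smoker / 0 never smoker, cases, controls)
groups = [(1, 13, 80), (0, 2, 56)]

def score_and_info(b0, b1):
    """Gradient of the binomial log-likelihood and the 2x2 information matrix."""
    g = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for x, cases, controls in groups:
        n = cases + controls
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))   # modeled P(case | x)
        resid, w = cases - n * p, n * p * (1.0 - p)
        g[0] += resid
        g[1] += resid * x
        info[0][0] += w
        info[0][1] += w * x
        info[1][1] += w * x * x
    info[1][0] = info[0][1]
    return g, info

b0 = b1 = 0.0
for _ in range(25):   # Newton-Raphson: beta += inverse(info) @ gradient
    g, info = score_and_info(b0, b1)
    det = info[0][0] * info[1][1] - info[0][1] ** 2
    b0 += (info[1][1] * g[0] - info[0][1] * g[1]) / det
    b1 += (info[0][0] * g[1] - info[0][1] * g[0]) / det

_, info = score_and_info(b0, b1)
det = info[0][0] * info[1][1] - info[0][1] ** 2
se_b1 = math.sqrt(info[0][0] / det)   # SE of the slope from the inverse information
print(f"exp(b1) = {math.exp(b1):.2f}, SE(b1) = {se_b1:.4f}")
```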
"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."
- J. Tukey, 1962
"The government is extremely fond of amassing great quantities of statistics. These are raised to the 9th degree, the cube roots are extracted, and the results are arranged into elaborate and impressive displays. What must be kept ever in mind, however, is that in every case, the figures are first put down by a village watchman, and he puts down anything he damn well pleases."
- Sir Josiah Stamp
It is three times as dangerous to be an
intoxicated pedestrian as to be an intoxicated
driver. This is shown by the fact that in one
year 13,943 intoxicated pedestrians were injured
and only 4,399 intoxicated drivers.
Among the approximately 70,000 plant employees of
the American Tobacco Company, the per capita
cigarette consumption was higher and there were
twice as many cigarette smokers and heavy smokers
as in the general population. Despite these
facts, compared with general population, ATC
employees showed (a) increased longevity (b)
lower death rates, at each age, from all forms of
cancer and from cardiovascular disease and (c)
essentially similar death rates, at each age,
from respiratory cancer. Since no sampling error
was involved in methodology, this evidence bears
heavily against the claims that cigarette smoking
is associated with increased mortality from all
causes, from lung cancer, and from cardiovascular
diseases.
The infant mortality rate in North Southbury has declined by one point a year. In 1925 it was 50 per 1000 live births; in 1950 it was 25 per 1000; and in 1957 it was 18 per 1000. It is predicted that the rate will become zero in 1975.
Study of the records of women who had had a
certain operation has revealed that, on average,
the more children a woman had after this surgery,
the longer she survived after the operation. The
conclusion was that pregnancy increased life
expectancy in this condition.
From the New Yorker, September 20, 1951: "Husbands are more lethal to wives than lovers. Out of 324 murdered women, 102 were done in by husbands, 49 by lovers, 37 by relatives, the rest by strangers."
In the North Dakota Study of the Epidemiology of
Coronary Heart Disease, a total of 228 males 35
years of age and over developed coronary heart
disease during the year of study (1957). Of
these cases almost half, namely 101, occurred
among farmers, and yet the incidence rate for
farmers was much lower than the rate for others.
How can this be?
In a recent survey of graduates from Boston
business administration schools, 70% of the
replies came from Harvard graduates. It seems a
shame that graduates of other schools (Boston
University, M.I.T. and Northeastern) would not
cooperate as well as the Harvard graduates did.
A recent study, taking into account the fact that
1 person in 8 eventually develops cancer, showed
that of cancer patients only 1 in 20 developed a
second cancer. This is evidence of immunity
developed as a result of the first cancer.
The risk of death from congenital malformations has almost doubled in this country. In 1920, only 7.3% of infant deaths were due to congenital malformations, but in 1950, 13.7% of infant deaths were due to this cause.
In an article entitled "Myelofibrosis: Clinical, Hematologic and Pathologic Study of 110 Patients" (Amer. J. Med. Sci., June 1962), the authors report: "Of the 85 patients for whom we have a complete follow-up, 55 are dead and 30 are still living. The average survival time of these 55 patients from the diagnosis of myelofibrosis to their death was 2 years and three months." The authors conclude, "The average duration of life of our patients after diagnosis was between 2 and 3 years."
Though heart attacks are often associated by
laymen with over-exertion, they are far more
likely to occur during periods of rest. Over
half of the victims of coronary heart attacks are
stricken while resting or sleeping. Less than 2
per cent are afflicted when engaging in sports,
running, lifting, or moving a load.
After 50% of the population of a village has been attacked by cholera, the remainder is vaccinated. The results of the vaccination are excellent, since only 25% of its vaccinated population develops the disease.
A study of accidents in one of the states showed that 61 percent of those involved in accidents have spent more than 10 years behind the wheel. The study also showed that 21 percent of those involved in accidents had six to ten years of driving experience, and 17 percent, one to five. The conclusion was: "Apparently drivers become more complacent about their driving as the years go by. As a consequence their records become worse."
A new and puzzling disease has become epidemic in the Midwest. It is noted that 80% of people affected live within a mile of a railroad track. It is therefore obvious that railroads somehow enter into the epidemiology of the disease.
One of the leading American magazines recently ran a quiz entitled "Are you a potential alcoholic?" One of the questions was "Did you enjoy playing hopscotch as a child?" The discussion was: "If you answer yes to this question, you are a potential alcoholic because 80% of all alcoholics who took the quiz answered yes."
In the city of Halifax, N.S., several years ago, over 90% of the children were immunized against diphtheria; yet the number of cases in immunized children during an outbreak of this disease was about the same as the number in the unimmunized. Discuss the possible factors underlying such figures and evaluate the apparent effectiveness of the immunization program.
Immediately following the 1918 influenza
pandemic, there was a sharp drop in the
tuberculosis death rate in the U.S. This shows
that an attack of influenza protects against
tuberculosis.
In Abletown, 80% of deaths in the age group 5-9 are due to malaria, while in Bakerville only 50% of deaths in this group are ascribed to malaria. The malaria problem in Abletown is therefore much more severe.
That pneumonia constitutes a more serious problem
for industry than the common cold is indicated by
the fact that the individuals who contract
pneumonia remain away from work approximately 10
days, whereas individuals with colds are confined
at home only 2 days.
A survey of physicians who had treated patients who died from appendicitis indicates that roughly 60% of such patients were known to take laxatives from time to time. This evidence indicates that the public should be warned about the alarming risk involved in taking laxatives.
In England and Wales in 1931, deaths from heart disease accounted for 13.8% of all deaths among males in the age group 45-59. By 1949, this rate had risen to 20.8%. This clearly indicates the effect of stress on the modern male.
News item from the Boston Daily Globe, Tuesday, January 18, 1955: "More Americans attended concerts last year than attended professional baseball games." What factors should be considered before concluding that Americans are more music- than sports-minded?
For white males, 15 years of age or older in the
U.S. in 1950, the death rate among single males
was 7.6 per 1000 and for married males was 12.0
per 1000. This makes it clear that it is
healthier to stay single.
Letter to the Editor, Journal of the American Public Health Association, 1922, p. 857: "Among the professions in the U.S., physicians head the list of suicides for the year 1921. The following figures are interesting:
  Physicians 86
  Judges 57
  Bank Presidents 37
  Clergymen 21
  Editors 10
  Mayors 7
  Members of legislature 7
This record seems to indicate that the occupational strain is greater in medicine than in any of the other professions. Should our scheme of medical practice as it relates to hours and relief be revised and, if so, how should this be accomplished?"
There are interesting minor variations in the frequency of accidents by time of day and day of week: afternoons and weekends are slightly more likely to be the time when home accidents occur, indicating that fatigue or presence of more people in the home may play a minor part in the cause of some accidents.
As dangerous as winter driving is, only 3.6% of fatalities occur on snow and ice.