Title: Linear Models III Thursday May 31, 10:15-12:00
1 Linear Models III, Thursday May 31, 10:15-12:00
- Deborah Rosenberg, PhD
- Research Associate Professor
- Division of Epidemiology and Biostatistics
- University of IL School of Public Health
- Training Course in MCH Epidemiology
2 Ordinal and Nominal Outcomes
- Outcomes with More than 2 Categories
- Examples of outcomes that might be suited for ordinal or nominal regression:
- Ordinal or nominal BMI categories
- Nominal cause of death categories
- Ordinal or nominal severity of illness categories
- Ordinal or nominal categories of program participation
3 Ordinal and Nominal Outcomes
- The Cumulative Logit Model
- The primary motivation for using a logistic model with an ordinal outcome is to accommodate a truly ordinal variable that has "ceiling" and "floor" effects and for which the intervals between response categories can be somewhat arbitrary; that is, it is not a continuous variable.
- Modeling an ordinal outcome as a continuous variable can yield biased results, because the model can produce predicted values outside the range of the ordinal variable.
4 Ordinal and Nominal Outcomes
- The Cumulative Logit Model
- An ordered outcome may reflect an underlying continuous variable for which we have no data or for which we don't know the "real" threshold values.
- For example, a Likert scale for satisfaction (very dissatisfied to very satisfied) or for agreement (strongly disagree to strongly agree) has response categories reflecting a continuous scale for which there are no data.
5 Modeling Ordinal Outcomes
- Some other ordinal variables may reflect an underlying continuous construct that cannot be measured as such. The ordered values are intended to reflect distinct threshold values.
- Examples of ordinal variables of this type:
- access to care index
- reports of experience of life stress
- assessment of overall health status
- satisfaction with care
6 Ordinal and Nominal Outcomes
- The Cumulative Logit Model
- To appropriately model an outcome as ordinal, the proportional odds assumption must hold.
- The proportional odds assumption:
- If an independent variable increases (or decreases) the odds of being in category 1 v. the remaining categories, then it similarly increases (or decreases) the odds of being in categories 2 and 1 combined v. the remaining categories, in categories 3, 2, and 1 combined v. the remaining categories, etc.
7 Ordinal and Nominal Outcomes
- The Cumulative Logit Model
- The null hypothesis for the proportional odds assumption is that the odds ratios for the association between a risk factor and an ordinal outcome are constant regardless of how the category boundaries are drawn.
- If the proportional odds assumption holds, then the association between an independent variable and the outcome can be expressed as a single summary estimate (a common odds ratio) across all categories.
8 Ordinal and Nominal Outcomes
- The Cumulative Logit Model
- The proportional odds assumption can be tested with a chi-square statistic (a score test). A nonsignificant result means that the null hypothesis is not rejected and that the cumulative logit model can be treated as appropriate; a significant result means that the proportional odds assumption may not hold.
9 Ordinal and Nominal Outcomes
- The Cumulative Logit Model
- For an ordered outcome with k categories
- Both the numerator and denominator change
- http://www.indiana.edu/%7Estatmath/stat/all/cat/2b1.html
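The formula on this slide did not survive extraction. As a hedged reconstruction in standard notation (the symbols Y, j and k are assumptions, with the ordered categories numbered 1 to k), the cumulative logit at cutpoint j can be written as:

```latex
\[
\ln\!\left(\frac{P(Y \le j)}{P(Y > j)}\right), \qquad j = 1, \dots, k-1
\]
```

As j moves up, categories shift from the denominator into the numerator, which is the sense in which both change.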
10 Ordinal and Nominal Outcomes
- Odds among the exposed = a / (b + c + d)
- Odds among the exposed = (a + b) / (c + d)
- Odds among the exposed = (a + b + c) / d
11 Ordinal and Nominal Outcomes
- The Cumulative Logit Model
- Given k categories of an ordered outcome variable, a cumulative logit model yields k-1 intercept terms. Each intercept corresponds to a category combined with all adjacent lower-ordered categories.
- Since proportional odds are assumed, and therefore a common odds ratio, the effect of each covariate is reflected in a single beta coefficient.
12 Ordinal and Nominal Outcomes
- The Cumulative Logit Model
- Suppose an outcome variable has 4 categories and we are modeling one independent variable. The cumulative logit model will look as follows:
- $\ln(\text{Odds}_{1}) = b_{0,1} + b_1 x$
- $\ln(\text{Odds}_{12}) = b_{0,12} + b_1 x$
- $\ln(\text{Odds}_{123}) = b_{0,123} + b_1 x$
- The odds ratio is the same regardless of category
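To see why a single slope implies a single odds ratio, here is a short worked step. It assumes a dichotomous independent variable x coded 0/1 and uses j to stand for any of the three cutpoints (1, 12, 123); both are notational assumptions, not taken from the slide.

```latex
\[
\ln(\text{Odds}_j \mid x = 1) - \ln(\text{Odds}_j \mid x = 0)
  = (b_{0,j} + b_1) - b_{0,j} = b_1
\quad\Longrightarrow\quad
\text{OR}_j = e^{b_1} \ \text{for every cutpoint } j
\]
```

The intercept cancels at each cutpoint, so only the shared slope remains.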
13 Ordinal and Nominal Outcomes
- A stratified approach to mimic a cumulative logit model for a 4 category variable would mean creating new dichotomous variables, something like the following:
- if ordvar = 1 then ordvar1 = 1;
- else if ordvar ^= . then ordvar1 = 0;
- if 1 <= ordvar <= 2 then ordvar2 = 1;
- else if ordvar ^= . then ordvar2 = 0;
- if 1 <= ordvar <= 3 then ordvar3 = 1;
- else if ordvar ^= . then ordvar3 = 0;
14 Ordinal and Nominal Outcomes
- Mimicking Cumulative Logit with Binary Logistic Models
- proc logistic;
- model ordvar1 = factors;
- run;
- proc logistic;
- model ordvar2 = factors;
- run;
- proc logistic;
- model ordvar3 = factors;
- run;
- The OR from each model will be approx. the same if the proportional odds assumption holds.
- Note that all observations are used in each model.
15 Ordinal and Nominal Outcomes
- The Cumulative Logit Model
- If the proportional odds assumption does not hold, it might be because the outcome variable is nominal rather than ordinal, or it might be that we have mis-specified the categories, failing to pinpoint important thresholds on the underlying continuum.
- The score test is quite sensitive; it is up to the analyst to examine the pattern of ORs for different dichotomous cutpoints and decide whether it is reasonable to use a cumulative logit model.
16 Ordinal and Nominal Outcomes
- The Generalized Logit Model
- In contrast to the cumulative logit model, in a generalized logit model the outcome categories are like dummy variables: mutually exclusive categories compared to a common reference group.
17 Ordinal and Nominal Outcomes
- The Generalized Logit Model
- For a nominal outcome with k categories
- Fixed denominator (reference category)
- http://www.indiana.edu/%7Estatmath/stat/all/cat/2b1.html
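The formula on this slide is also missing from the extracted text. A hedged reconstruction in standard notation, assuming the last category k serves as the reference (Y, j and k are assumed symbols), is:

```latex
\[
\ln\!\left(\frac{P(Y = j)}{P(Y = k)}\right), \qquad j = 1, \dots, k-1
\]
```

Here only the numerator changes with j; the denominator stays fixed at the reference category.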
18 Ordinal and Nominal Outcomes
- Odds among the exposed = a / d
- Odds among the exposed = b / d
- Odds among the exposed = c / d
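The two ways of forming odds from the same four cell counts can be compared side by side. The sketch below is illustrative only: the function names and the counts are hypothetical, and the last category is assumed to be the reference.

```python
def cumulative_odds(counts):
    """Odds of being in categories 1..j versus j+1..k, for j = 1..k-1."""
    total = sum(counts)
    odds = []
    running = 0
    for c in counts[:-1]:
        running += c                      # numerator grows with each cutpoint
        odds.append(running / (total - running))
    return odds


def reference_odds(counts):
    """Odds of each category versus the last (reference) category."""
    ref = counts[-1]                      # denominator stays fixed
    return [c / ref for c in counts[:-1]]


# Hypothetical counts a, b, c, d among the exposed (not from the slides).
exposed = [10, 20, 30, 40]
print(cumulative_odds(exposed))   # a/(b+c+d), (a+b)/(c+d), (a+b+c)/d
print(reference_odds(exposed))    # a/d, b/d, c/d
```

Repeating the calculation for the unexposed and dividing gives the corresponding odds ratios for each scheme.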
19 Ordinal and Nominal Outcomes
- The Generalized Logit Model
- Given k categories of an outcome variable, a generalized logit model yields k-1 intercept terms. Each intercept corresponds to a single category.
- Since proportional odds are not assumed, odds ratios can vary across categories, and therefore the effect of each covariate is reflected in k-1 slope parameters.
20 Ordinal and Nominal Outcomes
- The Generalized Logit Model
- Suppose an outcome variable has 4 categories and we are modeling one independent variable. The generalized logit model is as follows:
- 1. $\ln(\text{Odds}_{1}) = b_{0,1} + b_{1,1} x$
- 2. $\ln(\text{Odds}_{2}) = b_{0,2} + b_{1,2} x$
- 3. $\ln(\text{Odds}_{3}) = b_{0,3} + b_{1,3} x$
- The odds ratios are distinct for each category
21 Ordinal and Nominal Outcomes
- The Generalized Logit Model
- Each slope parameter reflects the odds of being in one outcome category compared to the odds of being in the reference category
- Compared to those without Factor A, individuals with Factor A have ___ times the odds of having the outcome (category 1)
- Compared to those without Factor A, individuals with Factor A have ___ times the odds of having the outcome (category 2)
- Compared to those without Factor A, individuals with Factor A have ___ times the odds of having the outcome (category 3)
22 Ordinal and Nominal Outcomes
- A stratified approach to mimic a generalized logit model for a 4 category variable would not require creation of new variables, but would mean running models like the following:
23 Ordinal and Nominal Outcomes
- Mimicking Generalized Logit with Binary Logistic Models
- proc logistic;
- where ordvar in(1,4);
- model ordvar = factors;
- run;
- proc logistic;
- where ordvar in(2,4);
- model ordvar = factors;
- run;
- proc logistic;
- where ordvar in(3,4);
- model ordvar = factors;
- run;
- The ORs from the models will differ.
- Note that different subsets of observations are used in each model.
24 Example 1.
- The Association of Smoking and Fetal/Infant Death in Preterm Deliveries
- Crude OR = 1.07
25 Example 1.
- The Association of Smoking and Fetal/Infant Death in Preterm Deliveries
- Crude Logistic Model with Dichotomous Outcome
26 Example 1.
- Cumulative Logit: Odds of type of death among smokers, and the OR for smoker v. nonsmoker
- Odds = 46 / (33 + 1135) = 0.04, OR = 1.04
- Odds = (46 + 33) / 1135 = 0.07, OR = 1.07
27 Example 1.
- Cumulative Logit Model with 3 Categories
  Ordered Value   outcome5                    Frequency
  1               fetal death >20 wks         332
  2               neonatal death 0-28 days    229
  3               survivor >28 days           8520
- Probabilities modeled are cumulated over the lower Ordered Values.
- Score Test for the Proportional Odds Assumption
  Chi-Square   DF   Pr > ChiSq
  0.0400       1    0.8414
- The proportional odds assumption holds
28 Example 1.
- Cumulative Logit: Each intercept corresponds to a category plus all categories with lower ordered values v. the remaining categories.
- The common odds ratio is, in effect, an average across the cumulative logits
- $46 / (33 + 1135) \approx e^{-3.2803 + 0.0635} = 0.04$
- $(46 + 33) / 1135 \approx e^{-2.7291 + 0.0635} = 0.07$
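The arithmetic on this slide can be checked directly. This is a minimal sketch, assuming the flattened figures read as counts of 46, 33 and 1135 among smokers, intercepts of -3.2803 and -2.7291, and a common slope of 0.0635; the variable names are illustrative.

```python
import math

# Counts among smokers as read from the slide: fetal deaths, neonatal deaths, survivors.
fetal, neonatal, survivors = 46, 33, 1135

# Observed cumulative odds at the two cutpoints.
observed = [fetal / (neonatal + survivors), (fetal + neonatal) / survivors]

# Intercepts and common slope as read from the slide (assumed reading of garbled text).
intercepts = [-3.2803, -2.7291]
slope = 0.0635

# Model-based odds for smokers (x = 1) and the common odds ratio.
predicted = [math.exp(b0 + slope) for b0 in intercepts]
common_or = math.exp(slope)

for obs, pred in zip(observed, predicted):
    print(f"observed odds {obs:.3f}   model odds {pred:.3f}")
print(f"common OR {common_or:.2f}")
```

Under this reading, the observed and model-based odds agree to two decimal places at both cutpoints, and the exponentiated slope rounds to the reported common OR.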
29 Example 1.
- Generalized Logit Model with 3 Categories
- In a generalized logit model, each intercept and slope corresponds to a single category.
- Is 1.07 a reasonable summary of 1.047 and 1.096?
30 Example 2.
- The Association of Maternal Risk and Fetal/Infant Death in Preterm Deliveries
31 Example 2.
- The Association of Maternal Risk and Fetal/Infant Death in Preterm Deliveries
- Crude Logistic Model with Dichotomous Outcome
32 Example 2.
- Cumulative Logit Model with 3 Categories
  Ordered Value   outcome5                    Frequency
  1               fetal death >20 wks         418
  2               neonatal death 0-28 days    261
  3               survivor >28 days           9549
- Probabilities modeled are cumulated over the lower Ordered Values.
- Score Test for the Proportional Odds Assumption
  Chi-Square   DF   Pr > ChiSq
  10.7077      1    0.0011
- The proportional odds assumption does not hold.
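The p-values reported for the two score tests can be reproduced from the chi-square statistics alone. The sketch below relies on the standard identity for a chi-square variable with 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)); the function name is hypothetical, and the statistics are the rounded values shown in Examples 1 and 2.

```python
import math


def chi2_pvalue_1df(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # With 1 df the statistic is a squared standard normal, so
    # P(X > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


for label, stat in [("Example 1", 0.0400), ("Example 2", 10.7077)]:
    print(f"{label}: chi-square {stat:.4f}, p = {chi2_pvalue_1df(stat):.4f}")
```

Because the printed statistics are rounded, the recomputed p-values may differ from the slide values in the last decimal place.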
33 Example 2.
- Cumulative Logit Model with 3 Categories
- The common odds ratio is, in effect, an average across the cumulative logits
- $e^{-3.1750 + 0.0473} = 0.04$
- $e^{-2.6629 + 0.0473} = 0.07$
34 Example 2.
- Generalized Logit Model with 3 Categories
- Is 1.048 a reasonable summary of 0.86 and 1.5?
35 Example 3. LBW
- Modeling a 3 category birthweight variable
- /* cumulative logit */
- proc logistic order=formatted;
- model bwcat = smoking late_no_pnc;
- run;
36 Example 3. LBW
37 Example 3. LBW
- /* mimicking cumulative logit with binary models */
- proc logistic order=formatted;
- model vlbw = smoking late_no_pnc;
- run;
- vlbw v. mlbw and normal
- proc logistic order=formatted;
- model lbw = smoking late_no_pnc;
- run;
- vlbw and mlbw v. normal
- Both models include all observations in the sample
38 Example 3. LBW
- /* generalized logit */
- proc logistic order=formatted;
- model bwcat(ref='normal bw') = smoking late_no_pnc / link=glogit;
- run;
39 Example 3. LBW
- vlbw v. normal and mlbw v. normal
40 Example 3. LBW
- /* mimicking generalized logit with binary models */
- proc logistic order=formatted;
- where bwcat = 2 or bwcat = 0;
- model bwcat(ref='normal bw') = smoking late_no_pnc / link=glogit;
- run;
- proc logistic order=formatted;
- where bwcat = 1 or bwcat = 0;
- model bwcat(ref='normal bw') = smoking late_no_pnc / link=glogit;
- run;
41 Example 3. LBW
- Generalized logit approach using binary models with only a subset of observations in each model
- vlbw v. normal
- mlbw v. normal
42 Example 3. LBW
- Generalized logit models can get complicated, but custom estimates can still be obtained in the usual way.
- proc logistic order=formatted;
- where 2 <= momage <= 3;
- class parityrisk(ref='no hx preterm') / param=ref;
- model bwcat = smoking late_no_pnc matrisk momage parityrisk smoking*parityrisk / link=glogit;
- contrast 'sm-risk, hxpreterm' smoking 1 matrisk 1 smoking*parityrisk 1 0 / estimate=exp;
- contrast 'sm-risk, primips' smoking 1 matrisk 1 smoking*parityrisk 0 1 / estimate=exp;
- contrast 'sm-risk, lorisk multips' smoking 1 matrisk 1 smoking*parityrisk 0 0 / estimate=exp;
- run;
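As a hedged reading of what each contrast estimates: the coefficients in a contrast statement pick out a sum of betas, and the exponentiation option returns that sum as an odds ratio. For the first contrast, with j indexing a non-reference birthweight category (the subscripts are illustrative notation, not taken from the slides), the estimate would be:

```latex
\[
\widehat{\text{OR}}_{j}
  = \exp\!\left(\hat b_{\text{smoking},\,j} + \hat b_{\text{matrisk},\,j}
      + \hat b_{\text{smoking}\times\text{parityrisk}_1,\,j}\right)
\]
```

The second contrast swaps in the other interaction coefficient, and the third, with interaction coefficients 0 0, reduces to the smoking and matrisk terms alone, which corresponds to the reference level of parityrisk.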
43 Example 3. LBW
- The tests for the constructs in the model are all statistically significant
44 Example 3. LBW
- Not all beta coefficients are statistically significant.
45 Example 3. LBW
- Parity-specific contrasts of the joint effect of smoking and having some antepartum medical risk, adjusting for entry into prenatal care and maternal age.
- Should we leave the smoking*parityrisk term in the model?
46 Example 4. Prenatal Care
- Should we consider the categories ordinal or
nominal?
47 Example 4. Prenatal Care
- The overlapping dichotomous contrasts
- No PNC v. Any PNC, OR = 3.2
- Inad/No v. Adeq+/Adeq/Inter, OR = 2.7
- Inter/Inad/No v. Adeq+/Adeq, OR = 1.8
- All others v. Adeq+, OR = 0.60
48 Example 4. Prenatal Care
- Non-overlapping dichotomous contrasts
49 Example 4. Prenatal Care
- Cumulative Logit
- The null hypothesis of proportional odds is rejected.
- Any association is obscured by averaging across levels of APNCU.
50 Example 4. Prenatal Care
51 Example 4. Prenatal Care
- Women with a prior lbw delivery had more than 4 times the odds of receiving no or inadequate prenatal care rather than adequate care compared to women with no history of lbw delivery.
- Compared to women without a history of lbw delivery, however, these high risk women also had more than twice the odds of appropriately receiving care beyond what is considered adequate for most women.
52 Example 5.
- Outcome is a 3 level rating of MCH epidemiology functioning:
- above average
- average
- below average
53 Summary: Ordinal and Nominal Outcomes
- Cumulative logit: proportional odds assumption; assess the series of binary comparisons from collapsing categories
- k-1 intercepts
- 1 slope / 1 odds ratio
- Generalized logit: no assumption about the shape of the association
- Categories compared to a reference group
- k-1 intercepts
- k-1 slopes / k-1 odds ratios
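The parameter counts in this summary can be tallied for any number of outcome categories. The sketch below is illustrative; the function name is hypothetical, and p stands for the number of covariate terms, with one slope per term in the cumulative model and k-1 slopes per term in the generalized model.

```python
def parameter_counts(k, p):
    """Return (cumulative, generalized) parameter counts for an outcome
    with k categories and p covariate terms."""
    cumulative = (k - 1) + p          # k-1 intercepts plus one slope per covariate
    generalized = (k - 1) * (1 + p)   # k-1 intercepts plus k-1 slopes per covariate
    return cumulative, generalized


# The 4-category outcome with one independent variable used earlier in the slides.
print(parameter_counts(4, 1))
```

The gap between the two counts grows with every covariate added, which is the degrees-of-freedom cost of dropping the proportional odds assumption.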
54 Summary: Ordinal and Nominal Outcomes
- Issues for categorizing an outcome variable are similar to those for defining categories for independent variables:
- Conceptual meaning of the categories
- Statistical tests v. judgment about differences between categories
- Sample size and power
55 Summary: Ordinal and Nominal Outcomes
- Model Building
- Just as we examine dummy variables for an independent variable before deciding whether to use it in ordinal form, it is sometimes useful to run a generalized logit model first, since it requires no assumption about the ordering of the categories, and then empirically assess whether the variation in category-specific odds ratios is important or negligible.
56 Summary: Ordinal and Nominal Outcomes
- And even if the proportional odds assumption holds, reporting separate odds ratios for each category (using generalized logit) may be important in order to emphasize the similarity of the strength of the association across categories.
- In addition, the cumulative logit model not only forces the strength of association to be uniform; the predicted values are also forced to be linear. Using generalized logit, the predicted odds and odds ratios will both more closely reflect the observed values.
57 Summary: Ordinal and Nominal Outcomes
- Why Not Just Always Run Stratified Models for Generalized Logit?
- For nominal outcomes, using a single model may be more efficient than using separate binary models
- With separate binary models, we need to decide whether each model should include the same independent variables or whether different final, category-specific models make sense, each including only those variables which are risk or protective factors for a particular binary comparison
58 Summary: Ordinal and Nominal Outcomes
- Using a single multinomial model permits a unified profile of risk and protective factors across the categories, both significant and nonsignificant
59 Summary: Ordinal and Nominal Outcomes
- For a variable that is actually continuous, are there reasons to use a cumulative logit model instead of a continuous outcome model?
- For example, when would modeling ordinal categories of birthweight be preferable either to modeling birthweight continuously in grams or categorized into nominal groups?
- Using a variable as ordinal (with fewer categories) as opposed to continuous will yield odds ratios instead of mean differences
- No assumption of normality required
60 Summary: Ordinal and Nominal Outcomes
- For a variable that meets the proportional odds assumption, is it still appropriate to choose to use a generalized logit approach?
- Using ordinal as opposed to nominal categories will be more efficient if there is truly an ordinal effect
- Why "waste" degrees of freedom on multiple odds ratios, if the effect is constant across categories?
61 Which Modeling Approach?
- Choosing the form of the outcome variable
- Stressful Life Events
- Any stressful life event (y/n) = independent vars (dichotomous)
- Fin., Emot., Traum., Partner = independent vars (nominal, with no stressful life events as the reference)
- Sum of stressful life events = independent vars (continuous)
- Scale of stressful life events = independent vars (ordinal)
62 Which Modeling Approach?
- Choosing the form of the outcome variable
- Maternal Depression
- Any depression (y/n) = independent vars
- PrePost, Pre_Only, PP_Only = independent vars (nominal, with no depression as the reference)
- Severe, Moderate, Mild = independent vars (ordinal or nominal)
- Depression Severity Scale = independent vars (ordinal)
63 Which Modeling Approach?
- Choosing the form of the outcome variable
- Breastfeeding
- Ever Breastfed (yes v. no) = independent vars
- Exclusive BF >2 mos. (yes v. no) = independent vars
- Exclusive >2 mo., Exclusive BF <2 mo. = independent vars (Never Breastfed as reference)
- BF <2 mo., BF 2-6 mo., BF >6 mo. = independent vars (Never Breastfed as reference)
- Breastfeeding duration in weeks = independent vars