Linear Models III Thursday May 31, 10:15-12:00

1
Linear Models III, Thursday May 31, 10:15-12:00

Deborah Rosenberg, PhD
Research Associate Professor
Division of Epidemiology and Biostatistics
University of IL School of Public Health
Training Course in MCH Epidemiology

2
Ordinal and Nominal Outcomes

Outcomes with More than 2 Categories
Examples of outcomes that might be suited to
ordinal or nominal regression:
Ordinal or nominal: BMI categories
Nominal: cause-of-death categories
Ordinal or nominal: severity-of-illness categories
Ordinal or nominal: categories of program
participation

3
Ordinal and Nominal Outcomes

The Cumulative Logit Model
The primary motivation for using a logistic model
with an ordinal outcome is to accommodate a truly
ordinal variable that has "ceiling" and "floor"
effects and in which the intervals between
response categories can be somewhat arbitrary;
that is, it is not a continuous variable.
Modeling an ordinal outcome as a continuous
variable can yield biased results: it treats the
arbitrary spacing between categories as equal
intervals, and it can yield predicted values
outside the range of the ordinal variable.

4
Ordinal and Nominal Outcomes

The Cumulative Logit Model
An ordered outcome may reflect an underlying
continuous variable for which we have no data or
for which we don't know the "real" threshold
values.
For example, a Likert scale for satisfaction (very
dissatisfied to very satisfied) or for agreement
(strongly disagree to strongly agree) has
response categories reflecting a continuous scale
for which there are no data.
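The idea of an unobserved continuum can be made concrete with a minimal Python sketch. The latent score scale, the cut points in `THRESHOLDS`, and the function name `likert_category` are all hypothetical and chosen only for illustration. The sketch maps a continuous score to ordered Likert categories. Scores far beyond the last threshold all land in the same top category, which is the "ceiling" effect.

```python
from bisect import bisect_right

# Hypothetical thresholds on an unobserved (latent) satisfaction score.
# Scores below -1.5 fall in the lowest category, scores >= 1.5 in the highest.
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]
LABELS = [
    "very dissatisfied",
    "dissatisfied",
    "neutral",
    "satisfied",
    "very satisfied",
]


def likert_category(latent_score: float) -> str:
    """Return the ordered category whose interval contains latent_score."""
    # bisect_right counts how many thresholds are <= latent_score,
    # which is exactly the index of the matching category.
    return LABELS[bisect_right(THRESHOLDS, latent_score)]


if __name__ == "__main__":
    for score in (-2.0, 0.0, 1.6, 10.0):
        print(score, likert_category(score))
```

Only the category is observed; the analyst never sees the latent score or the cut points. In the latent-variable interpretation of the cumulative logit model, the k-1 intercepts play the role of these unknown cut points.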

5
Modeling Ordinal Outcomes

Some other ordinal variables may also reflect an
underlying continuous construct that cannot be
measured as such. The ordered values are
intended to reflect distinct threshold values.
Examples of ordinal variables of this type:
access to care index
reports of experience of life stress
assessment of overall health status
satisfaction with care

6
Ordinal and Nominal Outcomes

The Cumulative Logit Model
To appropriately model an outcome as ordinal, the
proportional odds assumption must hold.
The proportional odds assumption: if an
independent variable increases (or decreases) the
odds of being in category 1 v. the remaining
categories, then it similarly increases (or
decreases) the odds of being in categories 1 and 2
combined v. the remaining categories, in
categories 1, 2, and 3 combined v. the remaining
categories, and so on.

7
Ordinal and Nominal Outcomes

The Cumulative Logit Model
The null hypothesis for the proportional odds
assumption is that the odds ratios for the
association between a risk factor and an ordinal
outcome are constant regardless of how the
category boundaries are drawn.
If the proportional odds assumption holds, then
the association between an independent variable
and the outcome can be expressed as a single
summary estimate (a common odds ratio) across all
categories.

8
Ordinal and Nominal Outcomes

The Cumulative Logit Model
The proportional odds assumption can be tested
with a chi-square statistic (a score test). A
nonsignificant result means that the null
hypothesis is not rejected and that the
cumulative logit model is appropriate; a
significant result means that the proportional
odds assumption may not hold.

9
Ordinal and Nominal Outcomes

The Cumulative Logit Model
For an ordered outcome with k categories
Both the numerator and denominator change
http://www.indiana.edu/~statmath/stat/all/cat/2b1.html

10
Ordinal and Nominal Outcomes

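Odds among the exposed (a, b, c, d = exposed counts
in outcome categories 1 to 4):

$$
\begin{aligned}
\text{Odds}_{1} &= \frac{a}{b + c + d} \\
\text{Odds}_{12} &= \frac{a + b}{c + d} \\
\text{Odds}_{123} &= \frac{a + b + c}{d}
\end{aligned}
$$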
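The three cumulative odds above can be computed directly from the exposed and unexposed rows of a 2 × 4 table. The following Python sketch does this with exact fractions. The counts are hypothetical and the function names are illustrative only. Each cutpoint yields its own odds ratio, and the proportional odds assumption amounts to these ratios being roughly equal.

```python
from fractions import Fraction


def cumulative_odds(counts):
    """Odds of being at or below each cutpoint of an ordered outcome.

    For counts [a, b, c, d] this returns
    [a/(b+c+d), (a+b)/(c+d), (a+b+c)/d].
    """
    return [
        Fraction(sum(counts[:j]), sum(counts[j:]))
        for j in range(1, len(counts))
    ]


def cumulative_odds_ratios(exposed, unexposed):
    """Exposed v. unexposed odds ratio at each cumulative cutpoint."""
    return [
        e / u
        for e, u in zip(cumulative_odds(exposed), cumulative_odds(unexposed))
    ]


if __name__ == "__main__":
    exposed = [10, 20, 30, 40]    # hypothetical counts, categories 1..4
    unexposed = [5, 15, 30, 50]   # hypothetical counts, categories 1..4
    for j, ratio in enumerate(cumulative_odds_ratios(exposed, unexposed), 1):
        print(f"cutpoint {j}: OR = {float(ratio):.2f}")
```

With these made-up counts the ORs (about 2.11, 1.71, and 1.50) drift downward across cutpoints. Whether such a pattern is close enough to constant is the judgment that the score test and the analyst must make.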

11
Ordinal and Nominal Outcomes

The Cumulative Logit Model
Given k categories of an ordered outcome
variable, a cumulative logit model yields k-1
intercept terms. Each intercept corresponds to a
category combined with all adjacent lower-ordered
categories.
Since proportional odds are assumed, and
therefore a common odds ratio, the effect of each
covariate is reflected in a single beta
coefficient.

12
Ordinal and Nominal Outcomes

The Cumulative Logit Model
Suppose an outcome variable has 4 categories and
we are modeling one independent variable. The
cumulative logit model will look as follows
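$$
\begin{aligned}
\ln(\text{Odds}_{1}) &= b_{0,1} + b_1 x \\
\ln(\text{Odds}_{12}) &= b_{0,12} + b_1 x \\
\ln(\text{Odds}_{123}) &= b_{0,123} + b_1 x
\end{aligned}
$$
where $\text{Odds}_{1}$, $\text{Odds}_{12}$, and $\text{Odds}_{123}$ are the
odds of being in category 1, categories 1 and 2, and
categories 1 to 3, respectively, v. the remaining categories.
The odds ratio, $e^{b_1}$, is the same regardless of
category.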

13
Ordinal and Nominal Outcomes

A stratified approach to mimic a cumulative logit
model for a 4-category variable would mean
creating new dichotomous variables, something like
the following:
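/* inside a DATA step; a missing ordvar stays missing in each indicator */
if ordvar = 1 then ordvar1 = 1;
else if ordvar ne . then ordvar1 = 0;
if 1 <= ordvar <= 2 then ordvar2 = 1;
else if ordvar ne . then ordvar2 = 0;
if 1 <= ordvar <= 3 then ordvar3 = 1;
else if ordvar ne . then ordvar3 = 0;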
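For readers outside SAS, the same recoding logic can be sketched in Python. The function name `cumulative_indicators` is hypothetical, and `None` stands in for a SAS missing value. A missing outcome stays missing in every indicator rather than being coded 0.

```python
from typing import List, Optional


def cumulative_indicators(ordvar: Optional[int], k: int = 4) -> List[Optional[int]]:
    """Return [ordvar1, ..., ordvar{k-1}] for a k-category ordinal value.

    ordvar_j is 1 if ordvar <= j (category j combined with all lower
    categories), 0 if ordvar > j, and None if ordvar is missing.
    """
    if ordvar is None:
        return [None] * (k - 1)
    return [1 if ordvar <= j else 0 for j in range(1, k)]


if __name__ == "__main__":
    for value in (1, 2, 3, 4, None):
        print(value, cumulative_indicators(value))
```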

14
Ordinal and Nominal Outcomes

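Mimicking Cumulative Logit with Binary Logistic Models
/* factors = the list of independent variables            */
/* event='1' models the lower, combined categories (= 1)  */
proc logistic;
  model ordvar1(event='1') = factors;
run;
proc logistic;
  model ordvar2(event='1') = factors;
run;
proc logistic;
  model ordvar3(event='1') = factors;
run;
The OR from each model will be approximately the
same if the proportional odds assumption holds.
Note that all observations are used in each model.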

15
Ordinal and Nominal Outcomes

The Cumulative Logit Model
If the proportional odds assumption does not
hold, it might be because the outcome variable is
nominal rather than ordinal, or it might be that
we have mis-specified the categories, failing to
pinpoint important thresholds on the underlying
continuum.
The score test is quite sensitive, so it is up to
the analyst to examine the pattern of ORs for
different dichotomous cutpoints and decide
whether it is reasonable to use a cumulative
logit model.

16
Ordinal and Nominal Outcomes

The Generalized Logit Model
In contrast to the cumulative logit model, in a
generalized logit model the outcome categories
are like dummy variables: mutually exclusive
categories compared to a common reference group.

17
Ordinal and Nominal Outcomes

The Generalized Logit Model
For a nominal outcome with k categories
Fixed denominator (reference category)
http://www.indiana.edu/~statmath/stat/all/cat/2b1.html

18
Ordinal and Nominal Outcomes

Odds among the exposed (a, b, c = exposed counts in
outcome categories 1 to 3; d = exposed count in the
reference category 4):

$$
\frac{a}{d}, \qquad \frac{b}{d}, \qquad \frac{c}{d}
$$
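The following Python sketch computes these reference-category odds and the corresponding odds ratios. The counts and function names are hypothetical. Unlike cumulative odds, each ratio here compares one category only with the reference category, so the odds ratios are free to differ.

```python
from fractions import Fraction


def baseline_odds(counts):
    """Odds of each category v. the last (reference) category.

    For counts [a, b, c, d] this returns [a/d, b/d, c/d].
    """
    reference = counts[-1]
    return [Fraction(n, reference) for n in counts[:-1]]


def baseline_odds_ratios(exposed, unexposed):
    """Exposed v. unexposed odds ratio for each non-reference category."""
    return [
        e / u
        for e, u in zip(baseline_odds(exposed), baseline_odds(unexposed))
    ]


if __name__ == "__main__":
    exposed = [10, 20, 30, 40]    # hypothetical counts, categories 1..4
    unexposed = [5, 15, 30, 50]   # hypothetical counts, category 4 = reference
    for j, ratio in enumerate(baseline_odds_ratios(exposed, unexposed), 1):
        print(f"category {j} v. category 4: OR = {float(ratio):.2f}")
```

With these counts the ORs are 2.50, 1.67, and 1.25. That gives one OR per non-reference category, mirroring the k-1 slope parameters of the generalized logit model.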

19
Ordinal and Nominal Outcomes

The Generalized Logit Model
Given k categories of an outcome variable, a
generalized logit model yields k-1 intercept
terms. Each intercept corresponds to a single
category.
Since proportional odds are not assumed, odds
ratios can vary across categories, and therefore
the effect of each covariate is reflected in k-1
slope parameters.

20
Ordinal and Nominal Outcomes

The Generalized Logit Model
Suppose an outcome variable has 4 categories and
we are modeling one independent variable. The
generalized logit model is as follows
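$$
\begin{aligned}
\ln(\text{Odds}_{1}) &= b_{0,1} + b_{1,1} x \\
\ln(\text{Odds}_{2}) &= b_{0,2} + b_{1,2} x \\
\ln(\text{Odds}_{3}) &= b_{0,3} + b_{1,3} x
\end{aligned}
$$
where $\text{Odds}_{j}$ is the odds of being in category j
v. the reference category (category 4).
The odds ratios, $e^{b_{1,1}}$, $e^{b_{1,2}}$, and $e^{b_{1,3}}$,
are distinct for each category.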

21
Ordinal and Nominal Outcomes

The Generalized Logit Model
Each slope parameter tests the odds of being in
one outcome category compared to the odds of
being in the reference category:
Compared to those without Factor A, individuals
with Factor A have ___ times the odds of having
the outcome in category 1.
Compared to those without Factor A, individuals
with Factor A have ___ times the odds of having
the outcome in category 2.
Compared to those without Factor A, individuals
with Factor A have ___ times the odds of having
the outcome in category 3.

22
Ordinal and Nominal Outcomes

A stratified approach to mimic a generalized logit
model for a 4-category variable would not
require creation of new variables, but would mean
running models like the following:

23
Ordinal and Nominal Outcomes

Mimicking Generalized Logit with Binary Logistic Models
proc logistic;
  where ordvar in (1,4);
  model ordvar = factors;
run;
proc logistic;
  where ordvar in (2,4);
  model ordvar = factors;
run;
proc logistic;
  where ordvar in (3,4);
  model ordvar = factors;
run;
The ORs from the models will generally differ.
Note that different subsets of observations are
used in each model.

24
Example 1.

The Association of Smoking and Fetal/Infant Death
in Preterm Deliveries
Crude OR = 1.07

25
Example 1.

The Association of Smoking and Fetal/Infant Death
in Preterm Deliveries
Crude Logistic Model with Dichotomous Outcome

26
Example 1.

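Cumulative logit: odds of type of death among
smokers, and the OR for smokers v. nonsmokers
$$
\begin{aligned}
\text{Odds} &= \frac{46}{33 + 1135} = 0.04, & \text{OR} &= 1.04 \\
\text{Odds} &= \frac{46 + 33}{1135} = 0.07, & \text{OR} &= 1.07
\end{aligned}
$$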

27
Example 1.

Cumulative Logit Model with 3 Categories
Ordered Value   outcome5                    Frequency
1               fetal death >20 wks         332
2               neonatal death 0-28 days    229
3               survivor >28 days           8520
Probabilities modeled are cumulated over the
lower Ordered Values.
Score Test for the Proportional Odds Assumption
Chi-Square   DF   Pr > ChiSq
0.0400       1    0.8414
The proportional odds assumption holds.

28
Example 1.

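Cumulative Logit
Each intercept corresponds to a category plus all
categories with lower ordered values v. the
remaining categories.
The odds ratio, $e^{0.0635} = 1.07$, is a single
summary (in effect an average) across the
cumulative logits:
$$
\begin{aligned}
\frac{46}{33 + 1135} &\approx e^{-3.2803 + 0.0635} \approx 0.04 \\
\frac{46 + 33}{1135} &\approx e^{-2.7291 + 0.0635} \approx 0.07
\end{aligned}
$$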
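The arithmetic above can be checked with a few lines of Python. The sketch plugs the reported Example 1 intercepts and smoking coefficient into exp(intercept + slope × x), with x = 1 for smokers. It then compares the result with the observed cumulative odds from the smokers' counts. The variable and function names are illustrative, and coding smoking as 0/1 is an assumption.

```python
import math

# Estimates as reported for Example 1 (cumulative logit, smoking).
INTERCEPTS = {"fetal death": -3.2803, "fetal or neonatal death": -2.7291}
SLOPE_SMOKING = 0.0635

# Observed cumulative odds among smokers: 46 fetal, 33 neonatal, 1135 survivors.
OBSERVED = {
    "fetal death": 46 / (33 + 1135),
    "fetal or neonatal death": (46 + 33) / 1135,
}


def predicted_odds(intercept: float, slope: float, x: int) -> float:
    """Model-based cumulative odds: exp(intercept + slope * x)."""
    return math.exp(intercept + slope * x)


if __name__ == "__main__":
    for label, b0 in INTERCEPTS.items():
        model = predicted_odds(b0, SLOPE_SMOKING, x=1)  # x = 1 for smokers
        print(f"{label}: observed {OBSERVED[label]:.4f}, model {model:.4f}")
    print(f"common OR = {math.exp(SLOPE_SMOKING):.2f}")
```

The model-based and observed odds agree to two decimals. The single slope reproduces the common OR of 1.07.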

29
Example 1.

Generalized Logit Model with 3 Categories
In a generalized logit model, each intercept and
slope corresponds to a single category.
Is 1.07 a reasonable summary of 1.047 and 1.096?

30
Example 2.

The Association of Maternal Risk and Fetal/Infant
Death in Preterm Deliveries

31
Example 2.

The Association of Maternal Risk and Fetal/Infant
Death in Preterm Deliveries
Crude Logistic Model with Dichotomous Outcome

32
Example 2.

Cumulative Logit Model with 3 Categories
Ordered Value   outcome5                    Frequency
1               fetal death >20 wks         418
2               neonatal death 0-28 days    261
3               survivor >28 days           9549
Probabilities modeled are cumulated over the
lower Ordered Values.
Score Test for the Proportional Odds Assumption
Chi-Square   DF   Pr > ChiSq
10.7077      1    0.0011
The proportional odds assumption does not hold.
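Both score tests in Examples 1 and 2 have 1 degree of freedom. With three outcome categories and one independent variable, the test has (k - 2) × 1 = 1 df. A short Python sketch reproduces the reported p-values from the chi-square statistics using only the standard library. The 0.05 cutoff in the sketch is an assumption, not a value taken from the slides.

```python
import math


def chi_square_1df_p_value(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    A chi-square(1) variable is the square of a standard normal Z, so
    P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


if __name__ == "__main__":
    for example, stat in (("Example 1", 0.0400), ("Example 2", 10.7077)):
        p = chi_square_1df_p_value(stat)
        verdict = "not rejected" if p >= 0.05 else "rejected"
        print(f"{example}: chi-square {stat}, p = {p:.4f}, PO {verdict}")
```

Small discrepancies in the fourth decimal relative to the SAS output come from the chi-square values being printed to only four decimals.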

33
Example 2.

Cumulative Logit Model with 3 Categories
The odds ratio, $e^{0.0473} = 1.048$, is a single
summary (in effect an average) across the
cumulative logits:
$$
e^{-3.1750 + 0.0473} \approx 0.04, \qquad e^{-2.6629 + 0.0473} \approx 0.07
$$

34
Example 2.

Generalized Logit Model with 3 Categories
Is 1.048 a reasonable summary of 0.86 and 1.5?
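One way to support the judgment called for in these questions is to look at the category-specific odds ratios side by side. The helper `summarize_or_pattern` is hypothetical and is an illustration rather than a formal test. It reports the ratio of the largest to the smallest OR and whether all ORs fall on the same side of 1. The sketch applies it to the generalized logit ORs quoted for Examples 1 and 2.

```python
import math


def summarize_or_pattern(odds_ratios):
    """Describe how much category-specific odds ratios disagree.

    Returns (ratio of largest to smallest OR, whether all ORs point in the
    same direction relative to the null value of 1). An OR of exactly 1
    has no direction, so it makes the second value False.
    """
    spread = max(odds_ratios) / min(odds_ratios)
    log_ors = [math.log(value) for value in odds_ratios]
    same_direction = all(b > 0 for b in log_ors) or all(b < 0 for b in log_ors)
    return spread, same_direction


if __name__ == "__main__":
    # Generalized logit ORs reported for the two examples.
    for label, ors in (("Example 1", [1.047, 1.096]), ("Example 2", [0.86, 1.5])):
        spread, same = summarize_or_pattern(ors)
        print(f"{label}: max/min = {spread:.2f}, same direction = {same}")
```

In Example 1 the two ORs differ by about 5% and agree in direction. In Example 2 they differ by about 74% and point in opposite directions, which is consistent with the significant score test.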

35
Example 3. LBW

Modeling a 3-category birthweight variable
/* cumulative logit */
proc logistic order=formatted;
  model bwcat = smoking late_no_pnc;
run;

36
Example 3. LBW
37
Example 3. LBW

/* mimicking cumulative logit with binary models */
proc logistic order=formatted;
  model vlbw = smoking late_no_pnc;  /* vlbw v. mlbw and normal */
run;
proc logistic order=formatted;
  model lbw = smoking late_no_pnc;   /* vlbw and mlbw v. normal */
run;
Both models include all observations in the sample.

38
Example 3. LBW

/* generalized logit */
proc logistic order=formatted;
  model bwcat(ref='normal bw') = smoking late_no_pnc
        / link=glogit;
run;

39
Example 3. LBW

vlbw v. normal and mlbw v. normal

40
Example 3. LBW

/* mimicking generalized logit with binary models */
proc logistic order=formatted;
  where bwcat = 2 or bwcat = 0;
  model bwcat(ref='normal bw') = smoking late_no_pnc
        / link=glogit;
run;
proc logistic order=formatted;
  where bwcat = 1 or bwcat = 0;
  model bwcat(ref='normal bw') = smoking late_no_pnc
        / link=glogit;
run;

41
Example 3. LBW

Generalized logit approach using binary models
with only a subset of observations in each model
vlbw v.
normal
mlbw v.
normal

42
Example 3. LBW

Generalized logit models can get complicated,
but custom estimates can still be obtained in the
usual way.
proc logistic order=formatted;
  where 2 <= momage <= 3;
  class parityrisk(ref='no hx preterm') / param=ref;
  model bwcat = smoking late_no_pnc matrisk momage
                parityrisk smoking*parityrisk / link=glogit;
  contrast 'sm-risk, hxpreterm' smoking 1 matrisk 1
           smoking*parityrisk 1 0 / estimate=exp;
  contrast 'sm-risk, primips' smoking 1 matrisk 1
           smoking*parityrisk 0 1 / estimate=exp;
  contrast 'sm-risk, lorisk multips' smoking 1 matrisk 1
           smoking*parityrisk 0 0 / estimate=exp;
run;
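`estimate=exp` asks PROC LOGISTIC to report the exponentiated contrast, exp(L'b), for the contrast coefficients L. The Python sketch below reproduces that arithmetic for a single response function using made-up coefficients. The coefficient names, their values, and the mapping of the two interaction columns to parity groups (read off the contrast labels) are all assumptions for illustration.

```python
import math

# Hypothetical log-odds coefficients for ONE generalized logit response
# function (e.g. one low-birthweight category v. normal); real values
# would come from the fitted model.
COEFS = {
    "smoking": math.log(2.0),
    "matrisk": math.log(1.5),
    "smoking*parityrisk[hx preterm]": math.log(0.5),
    "smoking*parityrisk[primips]": math.log(1.2),
}


def contrast_odds_ratio(coefs, weights):
    """exp(L'b): exponentiated weighted sum of the named coefficients."""
    return math.exp(sum(coefs[name] * w for name, w in weights.items()))


if __name__ == "__main__":
    contrasts = {
        "sm-risk, hxpreterm": {"smoking": 1, "matrisk": 1,
                               "smoking*parityrisk[hx preterm]": 1},
        "sm-risk, primips": {"smoking": 1, "matrisk": 1,
                             "smoking*parityrisk[primips]": 1},
        # Both interaction weights are 0 for the reference parity group.
        "sm-risk, lorisk multips": {"smoking": 1, "matrisk": 1},
    }
    for label, weights in contrasts.items():
        print(f"{label}: OR = {contrast_odds_ratio(COEFS, weights):.2f}")
```

Each contrast multiplies the smoking and medical-risk ORs by the parity-specific interaction OR. This is why the joint effect can differ by parity group.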

43
Example 3. LBW

The tests for the constructs in the model are all
statistically significant

44
Example 3. LBW

Not all beta coefficients are statistically
significant.

45
Example 3. LBW

Parity-specific contrasts of the joint effect of
smoking and having some antepartum medical risk,
adjusting for entry into prenatal care and
maternal age.
Should we leave the smoking*parityrisk term in
the model?

46
Example 4. Prenatal Care

Should we consider the categories ordinal or
nominal?

47
Example 4. Prenatal Care

Overlapping dichotomous contrasts
No PNC v. Any PNC: OR = 3.2
Inad/No v. Adeq+/Adeq/Inter: OR = 2.7
Inter/Inad/No v. Adeq+/Adeq: OR = 1.8
All others v. Adeq+: OR = 0.60

48
Example 4. Prenatal Care

Non-overlapping dichotomous contrasts

49
Example 4. Prenatal Care

Cumulative Logit
The null hypothesis of
proportional odds is rejected.
Any association is
obscured by averaging
across levels of APNCU.

50
Example 4. Prenatal Care

Generalized Logit

51
Example 4. Prenatal Care

Women with a prior lbw delivery had more than 4
times the odds of receiving no or inadequate
prenatal care rather than adequate care compared
to women with no history of lbw delivery.
Compared to women without a history of lbw
delivery, however, these high-risk women also had
more than twice the odds of appropriately
receiving care beyond what is considered adequate
for most women.

52
Example 5.

Outcome is a 3-level rating of MCH epidemiology
functioning: above average, average, below average.

53
Summary Ordinal and Nominal Outcomes

Cumulative logit (ordinal):
Proportional odds assumption: assess the series of
binary comparisons from collapsing categories
k-1 intercepts
1 slope / 1 odds ratio (per covariate)

Generalized logit (nominal):
No assumption about the shape of the association
Categories compared to a reference group
k-1 intercepts
k-1 slopes / k-1 odds ratios (per covariate)

54
Summary Ordinal and Nominal Outcomes

Issues for categorizing an outcome variable are
similar to those for defining categories for
independent variables:
Conceptual meaning of the categories
Statistical tests v. judgment about differences
between categories
Sample size and power

55
Summary Ordinal and Nominal Outcomes

Model Building
Just as one might begin by examining dummy
variables for an independent variable before
deciding whether to use it in an ordinal form,
it is sometimes useful to run a generalized logit
model first, since it requires no assumption
about the ordering of the categories, and then
empirically assess whether the variation in
category-specific odds ratios is important or
negligible.

56
Summary Ordinal and Nominal Outcomes

And even if the proportional odds assumption
holds, reporting separate odds ratios for each
category (using generalized logit) may be important
in order to emphasize the similarity of the
strength of the association across categories.
In addition, the cumulative logit model not only
forces the strength of association to be uniform
but also forces the predicted values to be linear.
Using generalized logit, the predicted odds and
odds ratios will both more closely reflect the
observed values.

57
Summary Ordinal and Nominal Outcomes

Why Not Just Always Run Stratified Models for
Generalized Logit?
For nominal outcomes, using a single model may be
more efficient than using separate binary models
With separate binary models, one needs to decide
whether each model should include the same
independent variables or whether different final,
category-specific models make sense, each
including only those variables that are risk or
protective factors for a particular binary
comparison.

58
Summary Ordinal and Nominal Outcomes

Using a single multinomial model permits a
unified profile of risk and protective factors
across the categories, both significant and
nonsignificant.

59
Summary Ordinal and Nominal Outcomes

For a variable that is actually continuous, are
there reasons to use a cumulative logit model
instead of a continuous outcome model?
For example, when would modeling ordinal
categories of birthweight be preferable either to
modeling birthweight continuously in grams or
categorized into nominal groups?
Using a variable as ordinal (with fewer
categories) rather than continuous will yield
odds ratios instead of mean differences.
No assumption of normality is required.

60
Summary Ordinal and Nominal Outcomes

For a variable that meets the proportional odds
assumption, is it still appropriate to choose to
use a generalized logit approach?
Using ordinal as opposed to nominal categories
will be more efficient if there is truly an
ordinal effect.
Why "waste" degrees of freedom on multiple odds
ratios, if the effect is constant across
categories?
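The "wasted" degrees of freedom can be quantified with a Python sketch. It counts the parameters each model estimates for a k-category outcome and p covariate terms, following the intercept and slope counts in the summary slide. The sketch assumes one slope per term and no other parameters. The difference works out to (k - 2) × p, which is also the degrees of freedom of the score test for proportional odds.

```python
def parameter_counts(k: int, p: int) -> dict:
    """Intercepts + slopes for a k-category outcome and p covariate terms."""
    return {
        "cumulative logit": (k - 1) + p,          # k-1 intercepts, 1 slope per term
        "generalized logit": (k - 1) * (1 + p),   # k-1 intercepts, k-1 slopes per term
    }


if __name__ == "__main__":
    for k, p in ((3, 1), (3, 2), (5, 4)):
        counts = parameter_counts(k, p)
        extra = counts["generalized logit"] - counts["cumulative logit"]
        print(f"k={k}, p={p}: {counts}, extra df = {extra}")
```

For the 3-category examples with one independent variable, the generalized logit model spends just 1 extra degree of freedom. With more categories and covariates the cost grows quickly.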

61
Which Modeling Approach?