Title: Logistic Regression
1Logistic Regression and Clinical Prediction Rules
- DOC Research
- October 21, 2009
- Brian F. Gage, MD
- Thanks to Curtis A. Parvin, Ph.D
2Regression: Relate one or more predictor (independent) variables to an outcome (dependent) variable
- Cox regression
- Binary outcome variable
- Used to quantify the time to event (the hazard)
- Assumptions
- 1. Proportional hazard (multiplicative risk) and
- 2. Non-informative censoring
- Logistic regression
- Binary outcome variable
- Quantify the relationship between the odds of the outcome occurring and the predictor variable(s)
- Ordinary linear regression
- Continuous outcome variable
- Determine the relationship between a continuous outcome variable and the predictor variable(s)
3Other Flavors of Logistic Regression
- Conditional Logistic Regression
- Matched pairs data (1:1, 1:k, k1:k2 matching)
- Ordinal Logistic Regression
- More than two ordered groups for outcome
- Multinomial Logistic Regression
- More than two unordered groups for outcome
4Example 1: Odds of CA After Background Exposure to Radiation
- Study Question: Does the background level of radiation fallout cause CA?
- Predictor Variable: ?
- http://www.elementsdatabase.com/
5- Hypothesis: Higher levels of radioactive strontium (Sr-90) in deciduous teeth lost in the 1960s predict subsequent CA over the next 40 years.
- Subjects: St. Louis children
- Statistical Analysis: Could be Cox or logistic regression
- Study Design: What do you recommend?
- Assume that you are collaborating w/ the tooth fairy
- Finding: http://newsok.com/st.-louis-baby-teeth-yield-new-findings-on-nuclear-fallout/article/feed/95578
6Example 2: Development of Angina. Logistic Regression. Michael P. LaValley. Circulation 2008;117:2395-2399
7Angina: Goodness of Fit (Calibration)
8Angina: Discrimination
- C-statistic measures how well we can differentiate volunteers in the 2 groups
- Generally ranges from 0.5 (no discrimination better than chance) to 1.0 (perfect discrimination)
- 0.8-0.9 is excellent
- C = 0.64 for this model
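As a minimal sketch of what the C-statistic measures, the code below computes it as the fraction of (event, non-event) pairs in which the subject with the event has the higher predicted probability, counting ties as one half. The function name `c_statistic` and the sample values are illustrative assumptions, not data from the angina model.

```python
def c_statistic(probs, outcomes):
    """Share of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Hypothetical predicted probabilities and observed outcomes (1 = event).
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
outcomes = [1, 1, 0, 1, 0, 0]
print(c_statistic(probs, outcomes))  # 8 of 9 pairs are concordant
```

A model that assigns every subject the same probability scores 0.5 under this definition, which is the "no better than chance" end of the range.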
9Example 3: Odds of Major Bleeding Around Time of NSTEMI
- Background: Tx of MI often causes bleeding
- Significance: Being able to predict major bleeding could allow us to minimize that risk
- Hypothesis: We could develop and validate an accurate clinical prediction rule for bleeding
- Study Design: Split-sample, retrospective cohort
- Subjects: 89,134 participants in CRUSADE
10Split Samples: 80% Derivation, 20% Validation
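A split-sample design can be sketched as a random partition of subject identifiers. The slides do not describe how the CRUSADE split was actually drawn, so the helper `split_sample`, the seed, and the 1,000 hypothetical IDs below are assumptions for illustration only.

```python
import random

def split_sample(ids, derivation_fraction=0.8, seed=0):
    """Randomly assign subject IDs to a derivation set and a validation set."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(len(shuffled) * derivation_fraction))
    return shuffled[:cut], shuffled[cut:]

subject_ids = range(1000)  # hypothetical subject IDs
derivation, validation = split_sample(subject_ids)
print(len(derivation), len(validation))  # 800 200
```

The prediction rule would be fitted on the derivation set only and then evaluated, unchanged, on the validation set.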
16http://www.crusadebleedingscore.org
17Example 4: Relationship between gestational age (GA) and whether an infant is breast feeding at time of hospital discharge
18Ordinary Linear Regression
19Logistic Regression
20How do we get an S-shaped curve?
- Rather than using probability as our outcome variable, we use a transformation that is a function of probability
- We choose our transformation so that it ranges between $(-\infty, \infty)$ as probability ranges between $(0, 1)$
- We will use the logit transform
- Fitting a straight line using the logit transform as the outcome variable is called logistic regression
- After we estimate the straight line we can transform back to get our S-shaped curve
21Probability, Odds, and the Logit Transform
- Probability, $P$, ranges between 0 and 1
- Define $\text{Odds} = P/(1-P)$
- Odds range between 0 and $\infty$
- Note: $P = \text{Odds}/(1+\text{Odds})$
- The Logit transform is the logarithm of the Odds
- $\text{Logit} = \log(\text{Odds}) = \log[P/(1-P)]$
- Logit ranges between $-\infty$ and $\infty$
- Note: $\text{Odds} = e^{\text{Logit}}$
- Note: $P = e^{\text{Logit}}/(1+e^{\text{Logit}})$
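The conversions among probability, odds, and logit can be sketched in a few lines. The function names are illustrative assumptions, and the logarithm is the natural log.

```python
import math

def odds_from_prob(p):
    """Odds = P / (1 - P), defined for 0 <= P < 1."""
    return p / (1.0 - p)

def prob_from_odds(odds):
    """P = Odds / (1 + Odds)."""
    return odds / (1.0 + odds)

def logit(p):
    """Logit = log(Odds), defined for 0 < P < 1."""
    return math.log(odds_from_prob(p))

def inv_logit(x):
    """P = e^x / (1 + e^x), written in the equivalent form 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

for p in (0.1, 0.5, 0.8):
    print(p, odds_from_prob(p), logit(p), inv_logit(logit(p)))
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0. Applying `inv_logit` to a logit returns the original probability.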
22$\log(\text{Odds}) = -16.72 + 0.577 \cdot \text{GA}$
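To see the S-shaped curve numerically, a minimal sketch evaluates the fitted line above on the logit scale and transforms back to probability. The intercept and slope are taken from the slide. The function name and the grid of GA values are illustrative assumptions.

```python
import math

A, B = -16.72, 0.577  # intercept and GA slope from the fitted model on the slide

def prob_breast_feeding(ga_weeks):
    """Back-transform the linear log-odds to a probability."""
    log_odds = A + B * ga_weeks
    return 1.0 / (1.0 + math.exp(-log_odds))

for ga in range(24, 41, 4):
    print(ga, round(prob_breast_feeding(ga), 3))
```

The predicted probability rises monotonically with GA and crosses 0.5 where the log-odds are zero, at GA = 16.72/0.577, or about 29 weeks.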
24Logistic Regression
- Model the logarithm of the odds of an outcome as a linear combination of predictor variables
- $\text{Logit} = \log(\text{Odds}) = a + bX + cY + \ldots$
- Estimate the coefficients a, b, c based on a random sample of subjects' data
- Determine which of the predictors are good
- Assess model fit
- Use the model to predict future cases
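Coefficient estimation can be sketched for the single-predictor case. The slides do not say how the coefficients were computed. The code below assumes maximum likelihood fitted by Newton-Raphson, which is a common approach, and runs it on simulated data with known true coefficients. All names and values are hypothetical.

```python
import math
import random

def fit_logistic(xs, ys, iterations=25):
    """Maximum-likelihood fit of log(Odds) = a + b*x by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) and negative Hessian (h) of the log-likelihood.
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system h * step = g.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Simulated data (an assumption for illustration): true a = 0.5, true b = 0.5.
rng = random.Random(1)
TRUE_A, TRUE_B = 0.5, 0.5
xs = [rng.uniform(-4.0, 4.0) for _ in range(2000)]
ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(TRUE_A + TRUE_B * x))) else 0
      for x in xs]

a_hat, b_hat = fit_logistic(xs, ys)
print(round(a_hat, 2), round(b_hat, 2))  # estimates should be near 0.5 and 0.5
```

In practice the statistical packages handle this step and also report standard errors. The sketch only shows that the estimates are the values that best reproduce the observed outcomes.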
25Logistic Regression Coefficients
- For a single predictor variable, logistic regression fits a straight line to the log of the Odds
- $\log(\text{Odds}) = a + bX$
- b is the slope coefficient for X
- Each 1 unit change in X changes the log(Odds) by b units
27Logistic Regression Coefficients
- $b = \log \text{Odds}(X+1) - \log \text{Odds}(X)$
- Note: $\log(A) - \log(B) = \log(A/B)$
- $b = \log[\text{Odds}(X+1)/\text{Odds}(X)]$
- Note: $\text{Odds}(X+1)/\text{Odds}(X)$ is called an Odds ratio
28Odds and Odds Ratios
- Odds is defined as the probability that an event occurs divided by the probability that the event doesn't occur
- An Odds ratio is the ratio of two Odds
- An Odds ratio could represent the ratio of the odds in two different groups
- An Odds ratio could represent the ratio of the odds at two different values for a risk variable
29Breast Feeding Example
The Odds ratio for breast feeding at hospital discharge for GA = 32 compared to GA = 28 is 4.0/0.5 = 8.0
30Logistic Regression Coefficients and Odds Ratios
- $b = \log[\text{Odds}(X+1)/\text{Odds}(X)]$
- b estimates the log of the Odds ratio associated with a 1 unit increase in X
- $e^b$ estimates the odds ratio for a 1 unit increase in X
- For the breast feeding example
- $\log(\text{Odds}) = -16.72 + 0.577 \cdot \text{GA}$
- the odds of breast feeding at hospital discharge increase by a factor of $e^{0.577} = 1.78$ for each additional week of GA
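The step from a slope to an odds ratio is a single exponentiation, and it extends to changes larger than one unit because the log-odds are linear in X. A minimal sketch, using the GA slope from the slide (the function name `odds_ratio` is an illustrative assumption):

```python
import math

B = 0.577  # GA slope from the fitted breast feeding model on the slide

def odds_ratio(b, delta_x=1.0):
    """Odds ratio implied by a logistic slope b for a change of delta_x in X."""
    return math.exp(b * delta_x)

print(round(odds_ratio(B), 2))     # 1.78 for one additional week
print(round(odds_ratio(B, 4), 2))  # four additional weeks
```

Under the model, a change of several units multiplies the per-unit odds ratio by itself that many times, so the four-week figure is the one-week figure raised to the fourth power.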
31Logistic Regression Odds Ratios
32Logistic Regression When There is Only One Binary
Predictor
- This situation can be handled as a classic case-control study

                      Disease
                   Cases   Controls
Risk      Yes        a        b
Factor    No         c        d

Odds Ratio (OR) $= \frac{a/c}{b/d} = \frac{ad}{bc}$
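The 2x2 calculation can be sketched directly, with a, b, c, and d as labelled in the table above. The counts are hypothetical and the function name is an illustrative assumption.

```python
def odds_ratio_2x2(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc for a case-control 2x2 table."""
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)

# Hypothetical counts: a = exposed cases, b = exposed controls,
# c = unexposed cases, d = unexposed controls.
print(odds_ratio_2x2(30, 10, 20, 40))  # 6.0
```

A logistic regression with this single binary predictor would give a slope b whose exponential, $e^b$, equals the same cross-product ratio.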
33The Real Strength of Logistic Regression is When
There are Multiple Predictor Variables
- The independent variables (predictors, risk factors) can be categorical or continuous
- Example: TDx-FLM II and gestational age as predictors of risk for respiratory distress syndrome (RDS)
- TDx-FLM II measures mg surfactant/g of albumin in amniotic fluid
34The Data (some of it)
38Logistic Regression Parameter Estimates
------------------------------------------------------------------------------
         rds |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tdxflm |  -.1121656   .0163848    -7.11   0.000    -.1442792   -.0800520
          ga |  -.3661113   .1192559    -2.58   0.010    -.5998486   -.1323740
       _cons |   15.68597   4.322678     3.63   0.000     7.213680    24.15827
------------------------------------------------------------------------------

$\log(\text{Odds}) = 15.69 - 0.112 \cdot \text{TDxFLM} - 0.366 \cdot \text{GA}$
Odds Ratio for a 1 mg/g increase in TDxFLM: $e^{-0.112} = 0.894$
Odds Ratio for a 1 week increase in GA: $e^{-0.366} = 0.693$
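The two-predictor equation can be evaluated the same way as the single-predictor one. In this minimal sketch the coefficients are copied from the parameter estimates on the slide. The function names and the patient values (TDxFLM 40, GA 34 weeks) are hypothetical.

```python
import math

# Coefficients from the parameter-estimate table on the slide.
CONS, B_TDXFLM, B_GA = 15.68597, -0.1121656, -0.3661113

def rds_log_odds(tdxflm, ga):
    """Linear predictor: log(Odds of RDS)."""
    return CONS + B_TDXFLM * tdxflm + B_GA * ga

def rds_probability(tdxflm, ga):
    """Back-transform the log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-rds_log_odds(tdxflm, ga)))

print(round(math.exp(B_TDXFLM), 3))  # 0.894, odds ratio per 1-unit increase in TDxFLM
print(round(math.exp(B_GA), 3))      # 0.693, odds ratio per 1-week increase in GA
print(rds_probability(40, 34))       # hypothetical patient, not from the data
```

Both odds ratios are below one, so higher TDxFLM values and later gestational ages each lower the modelled odds of RDS when the other predictor is held fixed.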
39Using the Logistic Model to Predict Risk of RDS
- We can use the logistic model equation to
- Identify variables that are significant
predictors - calculate the absolute risk (probability) of RDS
(may give biased estimates) - calculate the relative risk (odds ratio) of RDS
- develop a classifier for diagnosing RDS
40Logistic Regression Parameter Estimates
Significant coefficients mean significantly different from zero

------------------------------------------------------------------------------
         rds |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tdxflm |  -.1121656   .0163848    -7.11   0.000    -.1442792   -.0800520
          ga |  -.3661113   .1192559    -2.58   0.010    -.5998486   -.1323740
       _cons |   15.68597   4.322678     3.63   0.000     7.213680    24.15827
------------------------------------------------------------------------------

Significant Odds ratios mean significantly different from one

------------------------------------------------------------------------------
         rds | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tdxflm |    .893896   .0154269    -7.11   0.000     .8636601    .9241324
          ga |   .6934256   .0871025    -2.58   0.010     .5227078    .8641434
------------------------------------------------------------------------------
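The confidence limits for the coefficients follow from the estimate and its standard error. A minimal sketch of a 95% Wald interval on the log-odds scale, using the Coef. and Std. Err. values from the slide (the function name and the use of 1.959964 as the normal quantile are assumptions):

```python
Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def wald_ci(coef, std_err, z=Z_975):
    """95% Wald confidence interval for a coefficient on the log-odds scale."""
    return coef - z * std_err, coef + z * std_err

# (Coef., Std. Err.) pairs from the slide's coefficient table.
estimates = {"tdxflm": (-0.1121656, 0.0163848), "ga": (-0.3661113, 0.1192559)}
for name, (coef, se) in estimates.items():
    lo, hi = wald_ci(coef, se)
    significant = not (lo <= 0.0 <= hi)  # interval excludes zero
    print(name, round(lo, 4), round(hi, 4), significant)
```

Because the exponential function is increasing, an interval that excludes zero on the log-odds scale corresponds to an odds-ratio interval that excludes one.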
41Odds ratios for RDS relative to a TDX FLM II
ratio of 70 mg/g at 37 weeks gestational age
42Logistic Regression Predicted Probabilities and
Classification with 0.20 cutoff
TDxFLM   GA   RDS   Logistic P   Classify
  75     30    0     .0115517       0       TN
   7     31    1     .9521286       1       TP
  14.8   31    1     .8912354       1       TP
  18.3   31    1     .8462539       1       TP
  27     31    1     .6718219       1       TP
  22     31    0     .7832782       1       FP
  29     31    0     .6198854       1       FP
 135     31    0     .0000095       0       TN
   4     32    1     .9543484       1       TP
  15     32    1     .8568574       1       TP
  16.5   32    1     .8346432       1       TP
  25     32    1     .6575863       1       TP
  44.2   32    1     .1779585       0       FN
  35.5   32    0     .3679177       1       FP
  41     32    0     .2374989       1       FP
  48     32    0     .1232235       0       TN
  49     32    0     .1114575       0       TN
  55.8   32    0     .0547323       0       TN
  59     32    0     .0386864       0       TN
  59     32    0     .0386864       0       TN
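The classification step can be sketched by applying the 0.20 cutoff to the listed probabilities and tallying the four outcomes. The rows are copied from the table above. Treating a probability at or above the cutoff as a positive classification is an assumption, although no listed value equals 0.20 exactly.

```python
# (TDxFLM, GA, RDS, logistic P) rows copied from the slide's table.
ROWS = [
    (75, 30, 0, .0115517), (7, 31, 1, .9521286), (14.8, 31, 1, .8912354),
    (18.3, 31, 1, .8462539), (27, 31, 1, .6718219), (22, 31, 0, .7832782),
    (29, 31, 0, .6198854), (135, 31, 0, .0000095), (4, 32, 1, .9543484),
    (15, 32, 1, .8568574), (16.5, 32, 1, .8346432), (25, 32, 1, .6575863),
    (44.2, 32, 1, .1779585), (35.5, 32, 0, .3679177), (41, 32, 0, .2374989),
    (48, 32, 0, .1232235), (49, 32, 0, .1114575), (55.8, 32, 0, .0547323),
    (59, 32, 0, .0386864), (59, 32, 0, .0386864),
]

def classify(rows, cutoff=0.20):
    """Count true/false positives and negatives at the given probability cutoff."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for _tdxflm, _ga, rds, p in rows:
        if p >= cutoff:  # classified as RDS
            counts["TP" if rds == 1 else "FP"] += 1
        else:            # classified as no RDS
            counts["FN" if rds == 1 else "TN"] += 1
    return counts

counts = classify(ROWS)
sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
specificity = counts["TN"] / (counts["TN"] + counts["FP"])
print(counts)                                       # 8 TP, 4 FP, 7 TN, 1 FN
print(round(sensitivity, 3), round(specificity, 3))  # 0.889 0.636
```

These figures describe only the 20 rows shown on the slide. Lowering the cutoff trades specificity for sensitivity, and raising it does the reverse.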
45Software Packages that perform Logistic Regression
- STATA
- SAS
- SPSS
- R
- JMP
- Others
46References
- Hosmer DW, Lemeshow S. Applied Logistic Regression, 2nd ed. New York, NY: John Wiley & Sons, 2000.
- Kleinbaum DG. Logistic Regression: A Self-Learning Text. New York, NY: Springer-Verlag, 1994.
- Bagley SC, White H, Golomb BA. Logistic regression in the medical literature: standards for use and reporting, with particular attention to one medical domain. J Clin Epidemiol 2001;54:979-85. (http://www.sciencedirect.com/science/publications/journal)
- Ostir GV, Uchida T. Logistic regression: a nontechnical review. Am J Phys Med Rehabil 2000;79:565-72. (pdf file available online through Ovid gateway)
- http://www.ioa.pdx.edu/newsom/pa551/lectur21.htm
- http://personal.ecu.edu/whiteheadj/data/logit/
- Parvin CA, Kaplan LA, Chapman JF, McManamon TG, Gronowski AM. Predicting respiratory distress syndrome using gestational age and fetal lung maturity by fluorescent polarization. Am J Obstet Gynecol 2005;192:199-207.
47Next Week
- Attend 4th Annual Research Symposium and Poster Session, 12:30-4:30
- Farrell Learning and Teaching Center's Connor Auditorium
- Lecture 4:40-5:45
- Read Hulley et al. Chapter 9
- Problem Set 4