Title: Lecture 15: Logistic Regression: Inference and link functions
1. Lecture 15: Logistic Regression: Inference and link functions
- BMTRY 701: Biostatistical Methods II
2. More on our example

> pros5.reg <- glm(cap.inv ~ log(psa) + gleason, family = binomial)
> summary(pros5.reg)

Call:
glm(formula = cap.inv ~ log(psa) + gleason, family = binomial)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -8.1061     0.9916  -8.174 2.97e-16 ***
log(psa)      0.4812     0.1448   3.323 0.000892 ***
gleason       1.0229     0.1595   6.412 1.43e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 512.29  on 379  degrees of freedom
Residual deviance: 403.90  on 377  degrees of freedom
AIC: 409.9
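The summary above reports estimates on the log-odds scale; exponentiating a coefficient gives the odds ratio for a one-unit increase in that covariate, holding the others fixed. A quick cross-check of the output above, sketched in Python rather than the deck's R:

```python
import math

# Coefficients copied from the summary() output above (log-odds scale)
coefs = {"log(psa)": 0.4812, "gleason": 1.0229}

# exp(beta) = odds ratio per one-unit increase, other covariates held fixed
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

# A one-point increase in Gleason score multiplies the odds of capsular
# involvement by roughly 2.8; a one-unit increase in log(psa) by roughly 1.6.
```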
3. Other covariates: simple logistic models

Covariate                   Beta     exp(Beta)   Z
Age                        -0.0082   0.99       -0.51
Race                       -0.054    0.95       -0.15
Vol                        -0.014    0.99       -2.26
Dig Exam (vs. no nodule):
  Unilobar left             0.88     2.41        2.81
  Unilobar right            1.56     4.76        4.78
  Bilobar                   2.10     8.17        5.44
Detection in RE             1.71     5.53        4.48
LogPSA                      0.87     2.39        6.62
Gleason                     1.24     3.46        8.12
4. What is a good multiple regression model?
- Principles of model building are analogous to linear regression
- We use the same approach:
  - look for significant covariates in simple models
  - consider multicollinearity
  - look for confounding (i.e., a change in betas when a covariate is removed)
5. Multiple regression model proposal
- Gleason, logPSA, Volume, Digital Exam result, detection in RE
- But what about collinearity? With 5 covariates there are 5-choose-2 = 10 pairs to check.

Correlations among the continuous covariates:

             gleason  log.psa.    vol
    gleason     1.00      0.46  -0.06
    log.psa.    0.46      1.00   0.05
    vol        -0.06      0.05   1.00
6. Categorical pairs

> dpros.dcaps <- epitab(dpros, dcaps)
> dpros.dcaps$tab
         Outcome
Predictor   1        p0  2         p1 oddsratio     lower     upper
        1  95 0.2802360  4 0.09756098  1.000000        NA        NA
        2 123 0.3628319  9 0.21951220  1.737805 0.5193327  5.815089
        3  84 0.2477876 12 0.29268293  3.392857 1.0540422 10.921270
        4  37 0.1091445 16 0.39024390 10.270270 3.2208157 32.748987
         Outcome
Predictor      p.value
        1           NA
        2 4.050642e-01
        3 3.777900e-02
        4 1.271225e-05

> fisher.test(table(dpros, dcaps))

        Fisher's Exact Test for Count Data

data:  table(dpros, dcaps)
p-value = 2.520e-05
alternative hypothesis: two.sided
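The odds ratios that epitab reports can be recovered directly from the counts in the table above, taking dpros category 1 as the reference row. A small Python sketch of that arithmetic (counts copied from the output above):

```python
# dpros-by-dcaps counts read off the epitab output above:
# row = dpros category, tuple = (count at dcaps level 0, count at dcaps level 1)
counts = {1: (95, 4), 2: (123, 9), 3: (84, 12), 4: (37, 16)}

ref_n0, ref_n1 = counts[1]  # category 1 is the reference row (OR = 1)

# OR for row k vs. row 1: (n1_k / n0_k) / (n1_ref / n0_ref)
odds_ratios = {k: (n1 / n0) / (ref_n1 / ref_n0)
               for k, (n0, n1) in counts.items()}
# e.g. category 4: (16/37) / (4/95) = 10.27, matching the epitab column
```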
7. Categorical vs. continuous
- t-tests and ANOVA: compare means by category

> summary(lm(log(psa) ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2506     0.1877   6.662 9.55e-11 ***
dcaps         0.8647     0.1632   5.300 1.97e-07 ***
---
Residual standard error: 0.9868 on 378 degrees of freedom
Multiple R-squared: 0.06917,	Adjusted R-squared: 0.06671
F-statistic: 28.09 on 1 and 378 DF,  p-value: 1.974e-07

> summary(lm(log(psa) ~ factor(dpros)))

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     2.1418087  0.0992064  21.589  < 2e-16 ***
factor(dpros)2 -0.1060634  0.1312377  -0.808    0.419
factor(dpros)3  0.0001465  0.1413909   0.001    0.999
factor(dpros)4  0.7431101  0.1680055   4.423 1.28e-05 ***
---
Residual standard error: 0.9871 on 376 degrees of freedom
Multiple R-squared: 0.07348,	Adjusted R-squared: 0.06609
F-statistic: 9.94 on 3 and 376 DF,  p-value: 2.547e-06
8. Categorical vs. continuous

> summary(lm(vol ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   22.905      3.477   6.587 1.51e-10 ***
dcaps         -6.362      3.022  -2.106   0.0359 *
---
Residual standard error: 18.27 on 377 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.01162,	Adjusted R-squared: 0.009003
F-statistic: 4.434 on 1 and 377 DF,  p-value: 0.03589

> summary(lm(vol ~ factor(dpros)))

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      17.417      1.858   9.374   <2e-16 ***
factor(dpros)2   -1.638      2.453  -0.668    0.505
factor(dpros)3   -1.976      2.641  -0.748    0.455
factor(dpros)4   -3.513      3.136  -1.120    0.263
---
Residual standard error: 18.39 on 375 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.003598,	Adjusted R-squared: -0.004373
F-statistic: 0.4514 on 3 and 375 DF,  p-value: 0.7164
9. Categorical vs. continuous

> summary(lm(gleason ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.2560     0.1991  26.401  < 2e-16 ***
dcaps         1.0183     0.1730   5.885 8.78e-09 ***
---
Residual standard error: 1.047 on 378 degrees of freedom
Multiple R-squared: 0.08394,	Adjusted R-squared: 0.08151
F-statistic: 34.63 on 1 and 378 DF,  p-value: 8.776e-09

> summary(lm(gleason ~ factor(dpros)))

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.9798     0.1060  56.402  < 2e-16 ***
factor(dpros)2   0.4217     0.1403   3.007  0.00282 **
factor(dpros)3   0.4890     0.1511   3.236  0.00132 **
factor(dpros)4   0.9636     0.1795   5.367 1.40e-07 ***
---
Residual standard error: 1.055 on 376 degrees of freedom
Multiple R-squared: 0.07411,	Adjusted R-squared: 0.06672
F-statistic: 10.03 on 3 and 376 DF,  p-value: 2.251e-06
10. Lots of correlation between covariates
- We should expect some insignificance and confounding in the multiple regression model.
- Still, try the full model and see what happens.
11. Full model results

> mreg <- glm(cap.inv ~ gleason + log(psa) + vol + dcaps + factor(dpros),
+             family = binomial)
> summary(mreg)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)    -8.617036   1.102909  -7.813 5.58e-15 ***
gleason         0.908424   0.166317   5.462 4.71e-08 ***
log(psa)        0.514200   0.156739   3.281  0.00104 **
vol            -0.014171   0.007712  -1.838  0.06612 .
dcaps           0.464952   0.456868   1.018  0.30882
factor(dpros)2  0.753759   0.355762   2.119  0.03411 *
factor(dpros)3  1.517838   0.372366   4.076 4.58e-05 ***
factor(dpros)4  1.384887   0.453127   3.056  0.00224 **
---
    Null deviance: 511.26  on 378  degrees of freedom
Residual deviance: 376.00  on 371  degrees of freedom
  (1 observation deleted due to missingness)
AIC: 392
12. What next?
- Drop or retain?
- How to interpret?
13. Likelihood Ratio Test
- Recall testing multiple coefficients in linear regression
- Approach there: ANOVA
- We don't have ANOVA for logistic regression
- More general approach: the Likelihood Ratio Test
- Based on the likelihood (or log-likelihood) for competing nested models
14. Likelihood Ratio Test
- H0: small model
- Ha: large model
- Example
15. Recall the likelihood function
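The formula on this slide did not survive the conversion; the standard logistic-regression likelihood (a reconstruction, not the slide's original rendering) for binary outcomes y_i with fitted probabilities p_i is:

```latex
L(\beta) = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i},
\qquad
p_i = \frac{e^{x_i^\top \beta}}{1 + e^{x_i^\top \beta}},
\qquad
\log L(\beta) = \sum_{i=1}^{n} \bigl[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \bigr]
```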
16. Estimating the log-likelihood
- Recall that we use the log-likelihood because it is simpler (back to linear regression)
- MLEs:
  - betas are selected to maximize the likelihood
  - betas also maximize the log-likelihood
- If we plug in the estimated betas, we get the maximized log-likelihood for that model
- We compare the log-likelihoods from competing (nested) models
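Evaluating a log-likelihood at a set of fitted probabilities is simple arithmetic, and for binary data the saturated log-likelihood is zero, so R's residual deviance is just -2 times this quantity. A toy sketch in Python (the y and p values below are made up for illustration, not the prostate data):

```python
import math

# Hypothetical binary outcomes and fitted probabilities from some model
y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]  # fitted P(y = 1)

# Bernoulli log-likelihood: sum of y*log(p) + (1-y)*log(1-p)
log_lik = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
              for yi, pi in zip(y, p))

# For 0/1 data the saturated model has log-likelihood 0, so the
# residual deviance glm() would report is simply -2 * logL
deviance = -2 * log_lik
```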
17. Likelihood Ratio Test
- LR statistic: G² = -2(logL(H0) - logL(H1))
- Under the null: G² ~ χ²(p-q)
- If G² < χ²(p-q),1-α (the 1-α quantile), conclude H0
- If G² > χ²(p-q),1-α, conclude H1
18. LRT in R
- -2·logL = Residual Deviance
- So, G² = Dev(0) - Dev(1)
- Fit two models
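The deck carries out this test in R on the following slides; the same arithmetic can be sketched in Python using the deviances from that R output. The chisq_sf_df3 helper is hand-rolled (a closed form that is valid only for 3 degrees of freedom) so the sketch needs no external packages:

```python
import math

# Residual deviances from the R output (small model mreg0, large model mreg1)
dev0, dev1 = 399.02, 377.06
g2 = dev0 - dev1  # LR statistic; df = p - q = 3 (the factor(dpros) terms)

def chisq_sf_df3(x):
    """P(X > x) for X ~ chi-square with 3 df (closed form via erf; 3 df only)."""
    return 1.0 - (math.erf(math.sqrt(x / 2))
                  - math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

p_value = chisq_sf_df3(g2)
# g2 = 21.96 far exceeds the 0.05 critical value qchisq(0.95, 3) = 7.81,
# so we reject H0 and retain factor(dpros)
```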
19.

> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
+              family = binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family = binomial)
> mreg1

Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1	AIC: 391.1

> mreg0

Coefficients:
(Intercept)      gleason     log(psa)          vol
   -7.76759      0.99931      0.50406     -0.01583

Degrees of Freedom: 378 Total (i.e. Null);  375 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 399	AIC: 407
20. Testing DPROS
- Dev(0)? Dev(1)?
- p? q?
- χ²(p-q),1-α?
- Conclusion?
- p-value?
21. More in R

qchisq(0.95, 3)
-2*(logLik(mreg0) - logLik(mreg1))
1 - pchisq(21.96, 3)

> anova(mreg0, mreg1)
Analysis of Deviance Table

Model 1: cap.inv ~ gleason + log(psa) + vol
Model 2: cap.inv ~ gleason + log(psa) + vol + factor(dpros)
  Resid. Df Resid. Dev Df Deviance
1       375     399.02
2       372     377.06  3    21.96
22. Notes on LRT
- Again, models have to be NESTED
- For comparing models that are not nested, you need to use other approaches
- Examples:
  - AIC
  - BIC
  - DIC
- Next time.
23. For next time, read the following article
Low Diagnostic Yield of Elective Coronary Angiography. Patel, Peterson, Dai, et al. NEJM, 362(10), pp. 886-95, March 11, 2010.
http://content.nejm.org/cgi/content/short/362/10/886?ssource=mfv