Title: Lecture 15: Logistic Regression: Inference and link functions
1. Lecture 15: Logistic Regression: Inference and link functions
- BMTRY 701: Biostatistical Methods II
2. More on our example

> pros5.reg <- glm(cap.inv ~ log(psa) + gleason, family = binomial)
> summary(pros5.reg)

Call:
glm(formula = cap.inv ~ log(psa) + gleason, family = binomial)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -8.1061     0.9916  -8.174 2.97e-16 ***
log(psa)      0.4812     0.1448   3.323 0.000892 ***
gleason       1.0229     0.1595   6.412 1.43e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 512.29  on 379  degrees of freedom
Residual deviance: 403.90  on 377  degrees of freedom
AIC: 409.9
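The summary above reports estimates on the log-odds scale; exponentiating a coefficient gives the odds ratio for a one-unit increase in that covariate, holding the others fixed. A quick cross-check of the output above, sketched in Python rather than the deck's R:

```python
import math

# Coefficients copied from the summary() output above (log-odds scale)
coefs = {"log(psa)": 0.4812, "gleason": 1.0229}

# exp(beta) = odds ratio per one-unit increase, other covariates held fixed
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

# A one-point increase in Gleason score multiplies the odds of capsular
# involvement by roughly 2.8; a one-unit increase in log(psa) by roughly 1.6.
```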
3. Other covariates: simple logistic models

Covariate                   Beta     exp(Beta)   Z
Age                        -0.0082   0.99       -0.51
Race                       -0.054    0.95       -0.15
Vol                        -0.014    0.99       -2.26
Dig Exam (vs. no nodule):
  Unilobar left             0.88     2.41        2.81
  Unilobar right            1.56     4.76        4.78
  Bilobar                   2.10     8.17        5.44
Detection in RE             1.71     5.53        4.48
LogPSA                      0.87     2.39        6.62
Gleason                     1.24     3.46        8.12
4. What is a good multiple regression model?
- Principles of model building are analogous to linear regression
- We use the same approach:
  - look for significant covariates in simple models
  - consider multicollinearity
  - look for confounding (i.e., a change in betas when a covariate is removed)
5. Multiple regression model proposal
- Gleason, logPSA, Volume, Digital Exam result, detection in RE
- But what about collinearity? With 5 covariates there are 5-choose-2 = 10 pairs to check.

Correlations among the continuous covariates:

             gleason  log.psa.    vol
    gleason     1.00      0.46  -0.06
    log.psa.    0.46      1.00   0.05
    vol        -0.06      0.05   1.00
6. Categorical pairs

> dpros.dcaps <- epitab(dpros, dcaps)
> dpros.dcaps$tab
         Outcome
Predictor   1        p0  2         p1 oddsratio     lower     upper
        1  95 0.2802360  4 0.09756098  1.000000        NA        NA
        2 123 0.3628319  9 0.21951220  1.737805 0.5193327  5.815089
        3  84 0.2477876 12 0.29268293  3.392857 1.0540422 10.921270
        4  37 0.1091445 16 0.39024390 10.270270 3.2208157 32.748987
         Outcome
Predictor      p.value
        1           NA
        2 4.050642e-01
        3 3.777900e-02
        4 1.271225e-05

> fisher.test(table(dpros, dcaps))

        Fisher's Exact Test for Count Data

data:  table(dpros, dcaps)
p-value = 2.520e-05
alternative hypothesis: two.sided
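The odds ratios that epitab reports can be recovered directly from the counts in the table above, taking dpros category 1 as the reference row. A small Python sketch of that arithmetic (counts copied from the output above):

```python
# dpros-by-dcaps counts read off the epitab output above:
# row = dpros category, tuple = (count at dcaps level 0, count at dcaps level 1)
counts = {1: (95, 4), 2: (123, 9), 3: (84, 12), 4: (37, 16)}

ref_n0, ref_n1 = counts[1]  # category 1 is the reference row (OR = 1)

# OR for row k vs. row 1: (n1_k / n0_k) / (n1_ref / n0_ref)
odds_ratios = {k: (n1 / n0) / (ref_n1 / ref_n0)
               for k, (n0, n1) in counts.items()}
# e.g. category 4: (16/37) / (4/95) = 10.27, matching the epitab column
```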
7. Categorical vs. continuous
- t-tests and ANOVA: compare means by category

> summary(lm(log(psa) ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2506     0.1877   6.662 9.55e-11 ***
dcaps         0.8647     0.1632   5.300 1.97e-07 ***
---
Residual standard error: 0.9868 on 378 degrees of freedom
Multiple R-squared: 0.06917,	Adjusted R-squared: 0.06671
F-statistic: 28.09 on 1 and 378 DF,  p-value: 1.974e-07

> summary(lm(log(psa) ~ factor(dpros)))

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     2.1418087  0.0992064  21.589  < 2e-16 ***
factor(dpros)2 -0.1060634  0.1312377  -0.808    0.419
factor(dpros)3  0.0001465  0.1413909   0.001    0.999
factor(dpros)4  0.7431101  0.1680055   4.423 1.28e-05 ***
---
Residual standard error: 0.9871 on 376 degrees of freedom
Multiple R-squared: 0.07348,	Adjusted R-squared: 0.06609
F-statistic: 9.94 on 3 and 376 DF,  p-value: 2.547e-06
8. Categorical vs. continuous

> summary(lm(vol ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   22.905      3.477   6.587 1.51e-10 ***
dcaps         -6.362      3.022  -2.106   0.0359 *
---
Residual standard error: 18.27 on 377 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.01162,	Adjusted R-squared: 0.009003
F-statistic: 4.434 on 1 and 377 DF,  p-value: 0.03589

> summary(lm(vol ~ factor(dpros)))

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      17.417      1.858   9.374   <2e-16 ***
factor(dpros)2   -1.638      2.453  -0.668    0.505
factor(dpros)3   -1.976      2.641  -0.748    0.455
factor(dpros)4   -3.513      3.136  -1.120    0.263
---
Residual standard error: 18.39 on 375 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.003598,	Adjusted R-squared: -0.004373
F-statistic: 0.4514 on 3 and 375 DF,  p-value: 0.7164
9. Categorical vs. continuous

> summary(lm(gleason ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.2560     0.1991  26.401  < 2e-16 ***
dcaps         1.0183     0.1730   5.885 8.78e-09 ***
---
Residual standard error: 1.047 on 378 degrees of freedom
Multiple R-squared: 0.08394,	Adjusted R-squared: 0.08151
F-statistic: 34.63 on 1 and 378 DF,  p-value: 8.776e-09

> summary(lm(gleason ~ factor(dpros)))

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.9798     0.1060  56.402  < 2e-16 ***
factor(dpros)2   0.4217     0.1403   3.007  0.00282 **
factor(dpros)3   0.4890     0.1511   3.236  0.00132 **
factor(dpros)4   0.9636     0.1795   5.367 1.40e-07 ***
---
Residual standard error: 1.055 on 376 degrees of freedom
Multiple R-squared: 0.07411,	Adjusted R-squared: 0.06672
F-statistic: 10.03 on 3 and 376 DF,  p-value: 2.251e-06
10. Lots of correlation between covariates
- We should expect some insignificance and confounding in the multiple regression model.
- Still, try the full model and see what happens.
11. Full model results

> mreg <- glm(cap.inv ~ gleason + log(psa) + vol + dcaps + factor(dpros),
+             family = binomial)
> summary(mreg)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)    -8.617036   1.102909  -7.813 5.58e-15 ***
gleason         0.908424   0.166317   5.462 4.71e-08 ***
log(psa)        0.514200   0.156739   3.281  0.00104 **
vol            -0.014171   0.007712  -1.838  0.06612 .
dcaps           0.464952   0.456868   1.018  0.30882
factor(dpros)2  0.753759   0.355762   2.119  0.03411 *
factor(dpros)3  1.517838   0.372366   4.076 4.58e-05 ***
factor(dpros)4  1.384887   0.453127   3.056  0.00224 **
---
    Null deviance: 511.26  on 378  degrees of freedom
Residual deviance: 376.00  on 371  degrees of freedom
  (1 observation deleted due to missingness)
AIC: 392
12. What next?
- Drop or retain?
- How to interpret?
13. Likelihood Ratio Test
- Recall testing multiple coefficients in linear regression
- Approach there: ANOVA
- We don't have ANOVA for logistic regression
- More general approach: the Likelihood Ratio Test
- Based on the likelihood (or log-likelihood) for competing nested models
14. Likelihood Ratio Test
- H0: small model
- Ha: large model
- Example
15. Recall the likelihood function
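The formula on this slide did not survive the conversion; the standard logistic-regression likelihood (a reconstruction, not the slide's original rendering) for binary outcomes y_i with fitted probabilities p_i is:

```latex
L(\beta) = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i},
\qquad
p_i = \frac{e^{x_i^\top \beta}}{1 + e^{x_i^\top \beta}},
\qquad
\log L(\beta) = \sum_{i=1}^{n} \bigl[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \bigr]
```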
16. Estimating the log-likelihood
- Recall that we use the log-likelihood because it is simpler (back to linear regression)
- MLEs:
  - betas are selected to maximize the likelihood
  - betas also maximize the log-likelihood
- If we plug in the estimated betas, we get the maximized log-likelihood for that model
- We compare the log-likelihoods from competing (nested) models
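Evaluating a log-likelihood at a set of fitted probabilities is simple arithmetic, and for binary data the saturated log-likelihood is zero, so R's residual deviance is just -2 times this quantity. A toy sketch in Python (the y and p values below are made up for illustration, not the prostate data):

```python
import math

# Hypothetical binary outcomes and fitted probabilities from some model
y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]  # fitted P(y = 1)

# Bernoulli log-likelihood: sum of y*log(p) + (1-y)*log(1-p)
log_lik = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
              for yi, pi in zip(y, p))

# For 0/1 data the saturated model has log-likelihood 0, so the
# residual deviance glm() would report is simply -2 * logL
deviance = -2 * log_lik
```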
17. Likelihood Ratio Test
- LR statistic: G² = -2(logL(H0) - logL(H1))
- Under the null: G² ~ χ²(p-q)
- If G² < χ²(p-q),1-α (the 1-α quantile), conclude H0
- If G² > χ²(p-q),1-α, conclude H1
18. LRT in R
- -2·logL = Residual Deviance
- So, G² = Dev(0) - Dev(1)
- Fit two models
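The deck carries out this test in R on the following slides; the same arithmetic can be sketched in Python using the deviances from that R output. The chisq_sf_df3 helper is hand-rolled (a closed form that is valid only for 3 degrees of freedom) so the sketch needs no external packages:

```python
import math

# Residual deviances from the R output (small model mreg0, large model mreg1)
dev0, dev1 = 399.02, 377.06
g2 = dev0 - dev1  # LR statistic; df = p - q = 3 (the factor(dpros) terms)

def chisq_sf_df3(x):
    """P(X > x) for X ~ chi-square with 3 df (closed form via erf; 3 df only)."""
    return 1.0 - (math.erf(math.sqrt(x / 2))
                  - math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

p_value = chisq_sf_df3(g2)
# g2 = 21.96 far exceeds the 0.05 critical value qchisq(0.95, 3) = 7.81,
# so we reject H0 and retain factor(dpros)
```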
19.

> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
+              family = binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family = binomial)
> mreg1

Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1	AIC: 391.1

> mreg0

Coefficients:
(Intercept)      gleason     log(psa)          vol
   -7.76759      0.99931      0.50406     -0.01583

Degrees of Freedom: 378 Total (i.e. Null);  375 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 399	AIC: 407
20. Testing DPROS
- Dev(0)? Dev(1)?
- p? q?
- χ²(p-q),1-α?
- Conclusion?
- p-value?
21. More in R

qchisq(0.95, 3)
-2*(logLik(mreg0) - logLik(mreg1))
1 - pchisq(21.96, 3)

> anova(mreg0, mreg1)
Analysis of Deviance Table

Model 1: cap.inv ~ gleason + log(psa) + vol
Model 2: cap.inv ~ gleason + log(psa) + vol + factor(dpros)
  Resid. Df Resid. Dev Df Deviance
1       375     399.02
2       372     377.06  3    21.96
22. Notes on LRT
- Again, models have to be NESTED
- For comparing models that are not nested, you need to use other approaches
- Examples:
  - AIC
  - BIC
  - DIC
- Next time.
23. For next time, read the following article
Low Diagnostic Yield of Elective Coronary Angiography. Patel, Peterson, Dai, et al. NEJM, 362(10), pp. 886-95, March 11, 2010.
http://content.nejm.org/cgi/content/short/362/10/886?ssource=mfv