Title: Implementation and application of weighted Cox regression
1. Implementation and application of weighted Cox regression
- Georg Heinze, Daniela Dunkler, Samo Wakounig and Michael Schemper
- Section of Clinical Biometrics
- Core Unit of Medical Statistics and Informatics
- Medical University of Vienna, Austria
- Project sponsored by the Austrian Research Fund
2. Weighted estimation in Cox regression
- The standard Cox model weights each risk set equally
  - corresponds to the log-rank test
- For estimating β_r, weighted Cox regression assigns weights w_r(t_j) to the risk sets j
  - Breslow (1974) test: weight by R_j (number of subjects at risk at t_j)
  - Prentice (1978) test: weight by S_j (estimated survival function at t_j)
- In practice, weighting may be required for some but not for all covariates in a model
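The two weight types can be illustrated on a toy data set. The following is a minimal sketch, not the authors' code: the function name `risk_set_weights` and the data are hypothetical, and S_j is taken here as the left-continuous Kaplan-Meier estimate, which is an assumption (other variants of the survival estimate are in use).

```python
def risk_set_weights(times, events):
    """Return (t_j, R_j, S_j) for each distinct event time t_j.

    R_j is the number of subjects still at risk at t_j.
    S_j is the Kaplan-Meier estimate just before t_j (left-continuous);
    that particular variant is an assumption of this sketch.
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    rows = []
    surv = 1.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        rows.append((t, at_risk, surv))
        surv *= 1.0 - deaths / at_risk  # Kaplan-Meier update after t_j
    return rows


# Hypothetical toy data: the third subject is censored (event = 0).
times = [2, 3, 5, 7]
events = [1, 1, 0, 1]
for t, r_j, s_j in risk_set_weights(times, events):
    print(f"t_j={t}  R_j={r_j}  S_j={s_j:.3f}")
```

Both weight sequences decrease over time, so early risk sets count more than late ones.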
3. Outline
- Investigate applicability of inference methods in weighted estimation
  - Wald
  - Score
  - Likelihood ratio
- Implementation of weighted Cox regression
- Application: lung cancer data
4. Applicability of inference methods
- Are Wald, score and LR methods applicable to weighted Cox regression with
  - equal weighting
  - mixed weighting?
- Do these methods depend on the scaling of the weights?
  - Prentice weights: S_j ∈ [0, 1]
  - Breslow weights: R_j ∈ [1, N]
- Assume a factor c such that
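5. The Wald statistic
- The Wald test statistic
- Divide all weights by c
- The estimate is unchanged, but its model-based variance is multiplied by c; hence the Wald statistic is divided by c
- Thus, the Wald test requires properly normalized weights
- Use of the robust variance circumvents the normalization problem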
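The dependence of the model-based Wald statistic on the scale of the weights can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: one covariate, no tied event times, Breslow-type weights R_j, and hypothetical toy data. The function names and the scaling factor c = 5 are chosen for illustration only. The robust variance is not computed here.

```python
import math


def score_and_information(beta, times, events, x, weight):
    """Weighted score U(beta) and model-based information I(beta), one covariate."""
    score = info = 0.0
    for t, d, xi in zip(times, events, x):
        if d != 1:
            continue
        members = [xl for u, xl in zip(times, x) if u >= t]  # risk set at t
        e = [math.exp(beta * xl) for xl in members]
        denom = sum(e)
        mean = sum(xl * el for xl, el in zip(members, e)) / denom
        var = sum(xl * xl * el for xl, el in zip(members, e)) / denom - mean ** 2
        score += weight[t] * (xi - mean)
        info += weight[t] * var
    return score, info


def fit(times, events, x, weight, tol=1e-10, max_iter=50):
    """Newton-Raphson for the weighted estimating equation U(beta) = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x, weight)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, times, events, x, weight)
    return beta, beta ** 2 * info  # estimate and model-based Wald chi-square


# Hypothetical toy data without tied event times.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]

# Breslow-type weights R_j: number at risk at each event time.
w = {t: sum(1 for u in times if u >= t) for t, d in zip(times, events) if d == 1}
c = 5.0
w_scaled = {t: wt / c for t, wt in w.items()}

beta, wald = fit(times, events, x, w)
beta_c, wald_c = fit(times, events, x, w_scaled)
print(beta, beta_c)      # same estimate under both scalings
print(wald, wald_c * c)  # the Wald statistic shrinks by the factor c
```

Because the Newton step is the ratio of score to information, the factor c cancels in the estimate but not in the information used for the Wald statistic.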
6. The score statistic
- The score statistic
- For inference about β_1
- Dividing weights by a factor c leaves the score statistic unchanged
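The invariance of the score statistic can be seen in the one-covariate case. The derivation below is a sketch rather than the slide's original formulas, which did not survive. It assumes no tied event times and that the variance of the weighted score is estimated with squared weights, as in weighted log-rank tests. Here x_j is the covariate of the subject failing at t_j, and \bar{x}_j(\beta) and v_j(\beta) are the mean and variance of the covariate in risk set j.

```latex
\begin{align*}
U(\beta) &= \sum_j w(t_j)\,\{x_j - \bar{x}_j(\beta)\}, \qquad
V(\beta) = \sum_j w(t_j)^2\, v_j(\beta), \qquad
S(\beta_0) = \frac{U(\beta_0)^2}{V(\beta_0)} \\
w^*(t_j) &= \frac{w(t_j)}{c}
\;\Rightarrow\;
U^*(\beta_0) = \frac{U(\beta_0)}{c}, \quad
V^*(\beta_0) = \frac{V(\beta_0)}{c^2}, \quad
S^*(\beta_0) = \frac{U(\beta_0)^2 / c^2}{V(\beta_0) / c^2} = S(\beta_0)
\end{align*}
```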
7. Score confidence interval
- The score test can be inverted to obtain confidence intervals
  - Choose β_L (β_U) such that the score statistic for testing β = β_L (β = β_U) equals the χ² critical value
- Estimation of β_L (β_U) by separate binary searches
  - Requires a multiple of 2k additional iterative estimations
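The inversion of the score test by binary search can be sketched for a single covariate. This is an illustration under stated assumptions, not the authors' algorithm: no tied event times, Breslow-type weights R_j, a score variance built from squared weights, and hypothetical toy data; all function names are made up for this sketch. With one covariate there are no nuisance parameters, so each trial value needs only one evaluation; with k covariates each trial value would additionally require re-estimating the remaining coefficients.

```python
import math

CRIT = 3.841458820694124  # 0.95 quantile of the chi-square distribution, 1 df


def risk_sets(times, events, x):
    """One entry per event: (weight R_j, covariate of the failure, covariates at risk)."""
    sets = []
    for t, d, xi in zip(times, events, x):
        if d == 1:
            members = [xl for u, xl in zip(times, x) if u >= t]
            sets.append((len(members), xi, members))
    return sets


def score_and_statistic(beta, sets):
    """Weighted score U(beta) and the statistic U^2 / sum(w^2 * v)."""
    u = v = 0.0
    for w, xi, members in sets:
        e = [math.exp(beta * xl) for xl in members]
        denom = sum(e)
        mean = sum(xl * el for xl, el in zip(members, e)) / denom
        var = sum(xl * xl * el for xl, el in zip(members, e)) / denom - mean ** 2
        u += w * (xi - mean)
        v += w * w * var
    return u, u * u / v


def bisect(f, lo, hi, n_iter=100):
    """Plain bisection; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def score_interval(sets, span=10.0):
    """Return (lower limit, point estimate, upper limit)."""
    def score(b):
        return score_and_statistic(b, sets)[0]

    def excess(b):
        return score_and_statistic(b, sets)[1] - CRIT

    beta_hat = bisect(score, -span, span)  # root of the score function
    lo = beta_hat - 1.0
    while excess(lo) < 0:  # step outwards until the test rejects
        lo -= 1.0
    hi = beta_hat + 1.0
    while excess(hi) < 0:
        hi += 1.0
    return bisect(excess, lo, beta_hat), beta_hat, bisect(excess, beta_hat, hi)


# Hypothetical toy data without tied event times.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]

sets = risk_sets(times, events, x)
lower, beta_hat, upper = score_interval(sets)
print(lower, beta_hat, upper)
```

The two limits are found by separate searches, one on each side of the point estimate.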
8. Inference based on likelihood
- r-th score function
- Log likelihood
- Requires w(t_j) = w_1(t_j) = ... = w_k(t_j)
  - (equal weighting for all covariates)
- Dividing weights by c divides the log likelihood, and hence the LR statistic, by c
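The link between score function and log likelihood can be written out as follows. This is a sketch under the assumption of no tied event times, not a reconstruction of the slide's lost formulas. Here \mathcal{R}(t_j) denotes the risk set at t_j (not the Breslow weight R_j), and \bar{x}_{jr}(\beta) is the mean of covariate r in that risk set.

```latex
\begin{align*}
\ell_w(\beta) &= \sum_j w(t_j)\Big\{ x_j^\top \beta
   - \log \sum_{l \in \mathcal{R}(t_j)} \exp(x_l^\top \beta) \Big\} \\
U_r(\beta) &= \frac{\partial \ell_w(\beta)}{\partial \beta_r}
   = \sum_j w(t_j)\,\{ x_{jr} - \bar{x}_{jr}(\beta) \} \\
w^*(t_j) &= \frac{w(t_j)}{c}
\;\Rightarrow\;
\ell_{w^*}(\beta) = \frac{\ell_w(\beta)}{c}, \qquad
LR^* = 2\,\{ \ell_{w^*}(\hat\beta) - \ell_{w^*}(\beta_0) \} = \frac{LR}{c}
\end{align*}
```

With covariate-specific weights w_r(t_j), the functions U_r are in general not the partial derivatives of one common function, which is why no likelihood ratio statistic is available under mixed weighting.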
9. Inference summary
- Wald
  - Requires proper normalization of weights
  - Robust (sandwich) estimate is independent of normalization
- Score
  - Available, but confidence limits numerically intensive
  - Independent of normalization of weights
- Likelihood ratio
  - Unavailable for mixed weighting
  - Requires proper normalization of weights
10. Proper normalization of weights
- Cox regression
- Weighted Cox regression
- Proper normalization
11. Comparison of score, Wald and robust confidence intervals
- Simulation study: compare coverage of confidence intervals by
  - Score
  - Wald (normalized weights)
  - Robust standard error (only for equal weighting)
- All CI methods approximately equivalent
- Suggestion: use the simplest method
  - (Wald with normalized weights)
12. Implementation of weighted Cox regression (WCR)
- No standard software available for WCR
- In the special case of the same type of weighting for all covariates
  - Some data transformation, then use
    - SAS/PROC PHREG
    - R/coxph
13-21. Implementation example
- Wrong: naive weighting of observations
- [Tables: original data set and transformed data set, with normalized weights]
- The same individual is given different weights in different risk sets
  - Patient 4: weights are 4, 3, 2, 1
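One way to read the transformation is as a split of each subject's follow-up at the distinct event times, so that every resulting row belongs to exactly one risk set and can carry that risk set's weight. The sketch below is an assumption about the approach, not the authors' code: the function name `to_counting_process`, the column names and the four-patient data are hypothetical, chosen so that patient 4 receives the weights 4, 3, 2, 1 mentioned above. Weights are the unnormalized R_j.

```python
def to_counting_process(times, events, x):
    """Split follow-up at the distinct event times and attach the weight R_j.

    Each output row covers the interval (start, stop] ending at an event
    time, so one subject contributes one row per risk set it belongs to.
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    n_at_risk = {t: sum(1 for u in times if u >= t) for t in event_times}
    rows = []
    for pid, (t, d, xi) in enumerate(zip(times, events, x), start=1):
        start = 0
        for tj in event_times:
            if tj > t:
                break  # subject is no longer at risk
            rows.append({
                "id": pid,
                "start": start,
                "stop": tj,
                "status": 1 if (d == 1 and tj == t) else 0,
                "x": xi,
                "weight": n_at_risk[tj],  # Breslow-type weight R_j
            })
            start = tj
    return rows


# Hypothetical data: four patients, all with an observed event.
rows = to_counting_process(times=[1, 2, 3, 4], events=[1, 1, 1, 1], x=[0, 1, 0, 1])
for row in rows:
    print(row)
```

Within each risk set all rows share the same weight, so a routine that accepts counting-process input together with case weights, such as the SAS/PROC PHREG and R/coxph routines named on the previous slide, weights risk sets rather than individuals.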
22. Implementation of WCR
- Transformation approach
  - cannot be applied with mixed weighting
- We produced specialized software based on FORTRAN 90
  - SAS macro WCM
  - R package coxphw
- Weights can be set to R_j, S_j or 1 (no weighting) for each covariate separately
- Inference (tests and confidence intervals) based on normalized Wald or on score statistic
- Fixed and time-dependent effects
- Optional counting-process style input
23. Implementation of WCR
- Our programs are available at
  - http://www.muw.ac.at/msi/biometrie
24. Application of WCR
- Now that we have efficient software, we can apply WCR to a large-scale data set
- Lung cancer study of Bhattacharjee et al. (PNAS, 2001)
  - Gene expression of 12 600 genes
  - Clinical data (survival, TNM classification)
  - N = 125
- Gene expressions were standardized using IQR
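The slide does not give the exact standardization formula. A minimal sketch of one plausible reading, assuming each gene is centred at its median and divided by its interquartile range (both the centring and the quartile definition are assumptions, and the function name and data are hypothetical):

```python
import statistics


def standardize_by_iqr(values):
    """Centre at the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


expression = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical expression values of one gene
print(standardize_by_iqr(expression))
```

After such a standardization, a hazard ratio refers to a difference of one IQR in expression, which makes estimates comparable across genes.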
25. Application of WCR: example
- Application of WCR and standard Cox regression (CR) in 12 600 univariate models
  - Compare estimates obtained by WCR and CR
  - Compare DFBETA residuals by WCR and CR
26-27. Compare estimates by WCR and CR
- [Figure: difference β_CR - β_WCR plotted against the mean (β_CR + β_WCR)/2]
- Annotations in the figure: HR increases with time; HR decreases with time
28. abs(DFBETA) residuals vs. time
- [Figure: abs(DFBETA) plotted against survival time (months); panels: CR, WCR]
29. Slopes of abs(DFBETA) vs. time
- Weighted Cox regression
  - Slopes are centered at 0
  - Equal influence of short and long survival times
30. Slopes of abs(DFBETA) vs. time
- Standard Cox regression
  - Slopes tend to be positive
  - Overweights long survival times
  - Problematic if paired with outliers in gene expression
31. Conclusions from example
- WCR and CR estimates differ mainly if non-proportional hazards are present
- WCR provides unbiased estimates also in case of non-proportionality
- WCR provides a good balance of influence of long and short survival times on the estimate
- CR overweights long survival times
32. Thank you!
- Daniela Dunkler
- Samo Wakounig
- Michael Schemper
http://www.muw.ac.at/msi/biometrie