Cox Regression II - PowerPoint PPT Presentation

Title: Statistics 262: Intermediate Biostatistics Author: kristinc Last modified by: Kristin Created Date: 4/15/2004 12:39:14 AM Document presentation format – PowerPoint PPT presentation

Slides: 72
Provided by: kris58
Learn more at: http://web.stanford.edu
1
Cox Regression II
2
Monday Gut Check Problem
  • Write out the likelihood for the following data,
    with weight as a time-dependent variable:

Time-to-event  Survival        Weight at  Weight at  Weight at  Weight at
(months)       (1=died/        baseline   3 months   9 months   12 months
               0=censored)
10             0               140        145        155        .
2              1               240        .          .          .
4              0               130        130        .          .
8              1               200        210        250        .
12             0               150        145        145        140
14             0               180        180        180        175
10             1               180        190        240        .
1              0               230        .          .          .
3              0               110        110        .          .
3
SAS code for a time-dependent variable
  • proc phreg data=example;
  • model time*censor(0) = weight;
  • if time < 3 then weight = w0;
  • if time >= 3 and time < 6 then weight = w3;
  • if time >= 6 and time < 9 then weight = w6;
  • if time >= 9 then weight = w9;
  • run;
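The lookup that these programming statements perform can be sketched in Python. This is only an illustration, not the course's code: the function name `weight_at` and the example weights are hypothetical, and it assumes each measurement takes effect at its scheduled month (boundaries inclusive on the left). In PROC PHREG the statements are re-evaluated at every event time for each subject still in the risk set, so the same subject can contribute different weights to different risk sets.

```python
def weight_at(time, w0, w3, w6, w9):
    """Return the weight measurement in effect at `time` (months).

    Assumes a measurement applies from its scheduled month until the
    next scheduled measurement.
    """
    if time < 3:
        return w0
    if time < 6:
        return w3
    if time < 9:
        return w6
    return w9


# Hypothetical subject measured at 0, 3, 6 and 9 months.
measurements = dict(w0=200, w3=210, w6=230, w9=250)

# Covariate value this subject would contribute to the risk sets
# at event times 2, 8 and 10 months.
values = {t: weight_at(t, **measurements) for t in (2, 8, 10)}
print(values)
```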

4
Model results
  • Using baseline weight: HR = 2.8
  • Using weight as time-changing variable: HR = 9.3

5

1. Stratification
  • Violations of PH assumption can be resolved by:
  • Adding time*covariate interaction
  • Adding other time-dependent version of the
    covariate
  • Stratification

6

Stratification
  • Different strata are allowed to have different
    baseline hazard functions.
  • Hazard functions do not need to be parallel
    between different strata.
  • Essentially, the estimated hazard ratio is a
    weighted average over the different strata.
  • Useful for nuisance confounders (where you do
    not care to estimate the effect).
  • Assumes no interaction between the stratification
    variable and the main predictors.

7

Example: stratify on gender
  • Males: 1, 3, 4, 10, 12, 18 (subjects 1-6)
  • Females: 1, 4, 5, 9 (subjects 7-10)

8
The PL
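The formula on this slide did not survive in the transcript. As a hedged reminder of the general form (standard notation, not copied from the slide), the stratified partial likelihood multiplies a separate Cox partial likelihood for each stratum, with a shared coefficient vector but stratum-specific risk sets. Here D_s denotes the subjects with events in stratum s and R_s(t_i) the subjects in stratum s still at risk at t_i; ties are assumed absent.

```latex
L_p(\beta) \;=\; \prod_{s \in \{\text{male},\,\text{female}\}} \;
\prod_{i \in D_s}
\frac{\exp(\beta^{\top} x_i)}
     {\sum_{j \in R_s(t_i)} \exp(\beta^{\top} x_j)}
```

If every listed time in the gender example were an event, the male who fails at time 1 would be compared only with subjects 1-6, and the female who fails at time 1 only with subjects 7-10; the two never share a denominator.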
9

2. Using age as the time-scale in Cox Regression
  • Age is a common confounder in Cox Regression,
    since age is strongly related to death and
    disease.
  • You may control for age by adding baseline age as
    a covariate to the Cox model.
  • A better strategy for large-scale longitudinal
    surveys, such as NHANES, is to use age as your
    time-scale (rather than time-in-study).
  • You may additionally stratify on birth cohort to
    control for cohort effects.

10
Age as time-scale
  • The risk set becomes everyone who was at risk at
    a certain age rather than at a certain event
    time.
  • The risk set contains everyone who was still
    event-free at the age of the person who had the
    event.
  • Requires enough people at risk at all ages (such
    as in a large-scale, longitudinal survey).

11
The likelihood with age as time
Event times: 3, 5, 7, 12, 13 (years-in-study)
Baseline ages: 28, 25, 40, 29, 30 (years)
Age at event or censoring: 31, 30, 47, 41, 43
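A small sketch of how the risk sets change when age is the time-scale, using the five subjects above. It assumes delayed entry (a subject joins the risk set only after their baseline age), which is a common convention for age-scale analyses but is not stated on the slide. The function name `risk_set` is illustrative.

```python
# Data from the slide, in years: baseline age and age at event or censoring.
entry_age = [28, 25, 40, 29, 30]
exit_age = [31, 30, 47, 41, 43]


def risk_set(age, entry_age, exit_age):
    """Subjects (numbered from 1) under observation and event-free at `age`.

    Assumes delayed entry: a subject is at risk only for ages after
    their baseline age, up to and including their exit age.
    """
    return [
        i + 1
        for i, (a0, a1) in enumerate(zip(entry_age, exit_age))
        if a0 < age <= a1
    ]


# One risk set per distinct exit age, in increasing order of age.
risk_sets = {age: risk_set(age, entry_age, exit_age) for age in sorted(exit_age)}
for age, members in risk_sets.items():
    print(f"age {age}: subjects at risk = {members}")
```

With time-in-study as the time-scale, all five subjects would be in the risk set at the first event time (3 years); on the age scale, the risk set at age 30 contains only subjects 1, 2 and 4.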
12
3. Residuals
  • Residuals are used to investigate the lack of fit
    of a model to a given subject.
  • For Cox regression, there's no easy analog to the
    usual "observed minus predicted" residual of
    linear regression.

13
Martingale residual
  • c_i (1 if event, 0 if censored) minus the
    estimated cumulative hazard to t_i (as a function
    of the fitted model) for individual i:
  • c_i - H(t_i, X_i, β-hat)
  • E.g., for a subject who was censored at 2 months,
    and whose predicted cumulative hazard to 2 months
    was 0.20:
  • Martingale = 0 - .20 = -.20
  • E.g., for a subject who had an event at 13
    months, and whose predicted cumulative hazard to
    13 months was 0.50:
  • Martingale = 1 - .50 = .50
  • Gives "excess" failures (observed minus expected).
  • Martingale residuals are not symmetrically
    distributed, even when the fitted model is
    correct, so transform to deviance residuals...

14
Deviance Residuals
  • The deviance residual is a normalized transform
    of the martingale residual. These residuals are
    much more symmetrically distributed about zero.
  • Observations with large deviance residuals are
    poorly predicted by the model.
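A minimal sketch of both residuals for the two subjects in the martingale examples (censored with cumulative hazard 0.20; event with cumulative hazard 0.50). The deviance transform used here is the usual textbook form, sign(M) * sqrt(-2 [M + c log(c - M)]); the function names are illustrative and the cumulative hazards are taken as given rather than estimated from a fitted model.

```python
import math


def martingale_residual(event, cum_hazard):
    """Observed event indicator (1/0) minus estimated cumulative hazard."""
    return event - cum_hazard


def deviance_residual(event, cum_hazard):
    """Deviance transform of the martingale residual.

    Assumes cum_hazard > 0 when event == 1 (log of zero is undefined).
    """
    m = martingale_residual(event, cum_hazard)
    # The term event * log(event - m) is taken as 0 for censored subjects.
    log_term = math.log(event - m) if event else 0.0
    # max() guards against a tiny negative value from rounding error.
    magnitude = math.sqrt(max(0.0, -2.0 * (m + log_term)))
    return math.copysign(magnitude, m)


for label, event, cum_hazard in [("censored at 2 months", 0, 0.20),
                                 ("event at 13 months", 1, 0.50)]:
    m = martingale_residual(event, cum_hazard)
    d = deviance_residual(event, cum_hazard)
    print(f"{label}: martingale = {m:+.2f}, deviance = {d:+.3f}")
```

The martingale residual can never exceed 1 but is unbounded below, which is the source of the asymmetry; the transform stretches values near 1 and shrinks large negative values.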

15
Deviance Residuals
  • Behave like residuals from ordinary linear
    regression
  • Should be symmetrically distributed around 0 and
    have standard deviation of 1.0.
  • Negative for observations with longer than
    expected observed survival times.
  • Plot deviance residuals against covariates to
    look for unusual patterns.

16
Deviance Residuals
  • In SAS, option on the output statement:
  • output out=outdata resdev=Varname;
  • Cannot get diagnostics in SAS if there is a
    time-dependent covariate in the model

17
Example: uis data
Pattern looks fairly symmetric around 0.
18
Example: uis data
19
Example: censored only
20
Example: had event only
21
Schoenfeld residuals
  • Schoenfeld (1982) proposed the first set of
    residuals for use with Cox regression packages
  • Schoenfeld D. Residuals for the proportional
    hazards regression model. Biometrika, 1982,
    69(1):239-241.
  • Instead of a single residual for each individual,
    there is a separate residual for each individual
    for each covariate
  • Note: Schoenfeld residuals are not defined for
    censored individuals.

22
Schoenfeld residuals
  • The Schoenfeld residual is defined as the
    covariate value for the individual that failed
    minus its expected value. (Yields residuals for
    each individual who failed, for each covariate.)
  • Expected value of the covariate at time t_i = a
    weighted average of the covariate, weighted by
    the likelihood of failure for each individual in
    the risk set at t_i.

23
Example
  • 5 people left in our risk set at event time = 7
    months:
  • Female 55-year-old smoker
  • Male 45-year-old non-smoker
  • Female 67-year-old smoker
  • Male 58-year-old smoker
  • Male 70-year-old non-smoker
  • The 55-year-old female smoker is the one who has
    the event

24
Example
  • Based on our model, we can calculate a predicted
    probability of death by time 7 for each person
    (call it p-hat):
  • Female 55-year-old smoker: p-hat = .10
  • Male 45-year-old non-smoker: p-hat = .05
  • Female 67-year-old smoker: p-hat = .30
  • Male 58-year-old smoker: p-hat = .20
  • Male 70-year-old non-smoker: p-hat = .30
  • Thus, the expected value for the AGE of the
    person who failed is
  • 55(.10) + 45(.05) + 67(.30) + 58(.20) + 70(.30)
    ≈ 60
  • And, the Schoenfeld residual is 55 - 60 = -5

25
Example
  • Based on our model, we can calculate a predicted
    probability of death by time 7 for each person
    (call it p-hat):
  • Female 55-year-old smoker: p-hat = .10
  • Male 45-year-old non-smoker: p-hat = .05
  • Female 67-year-old smoker: p-hat = .30
  • Male 58-year-old smoker: p-hat = .20
  • Male 70-year-old non-smoker: p-hat = .30
  • The expected value for the GENDER (0 = female,
    1 = male) of the person who failed is
  • 0(.10) + 1(.05) + 0(.30) + 1(.20) + 1(.30) = .55
  • And, the Schoenfeld residual is 0 - .55 = -.55
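The arithmetic in the two examples can be checked with a few lines of Python. This sketch simply reproduces the slides' weighted sums using their p-hat values as weights; the helper name `expected_value` is illustrative. Note, as an aside not made on the slides, that these p-hats sum to 0.95 rather than 1; in the standard definition the weights are each subject's exp(βx) divided by the risk-set total, so they sum to exactly 1.

```python
# Risk set at t = 7 months, from the slides: (male, age, p_hat).
# Gender is coded 0 = female, 1 = male, matching the slide's arithmetic.
risk_set = [
    (0, 55, 0.10),  # female smoker; the subject who had the event
    (1, 45, 0.05),  # male non-smoker
    (0, 67, 0.30),  # female smoker
    (1, 58, 0.20),  # male smoker
    (1, 70, 0.30),  # male non-smoker
]
failed_male, failed_age, _ = risk_set[0]


def expected_value(values, weights):
    """Weighted sum of covariate values over the risk set."""
    return sum(v * w for v, w in zip(values, weights))


weights = [p for _, _, p in risk_set]
expected_age = expected_value([age for _, age, _ in risk_set], weights)
expected_male = expected_value([male for male, _, _ in risk_set], weights)

resid_age = failed_age - expected_age      # slide rounds this to 55 - 60 = -5
resid_male = failed_male - expected_male   # 0 - .55 = -.55

print(f"expected age = {expected_age:.2f}, residual = {resid_age:.2f}")
print(f"expected gender = {expected_male:.2f}, residual = {resid_male:.2f}")
```

The unrounded expected age is 60.45, which the slide reports as 60.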

26
Schoenfeld residuals
  • Since the Schoenfeld residuals are, in principle,
    independent of time, a plot that shows a
    non-random pattern against time is evidence of
    violation of the PH assumption.
  • Plot Schoenfeld residuals against time to
    evaluate PH assumption
  • Regress Schoenfeld residuals against time to test
    for independence between residuals and time.
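One simple way to carry out such a regression check is to compute the correlation between residuals and event times and the t statistic for a zero slope. The sketch below uses plain Python and made-up residuals purely to show the mechanics; the numbers are hypothetical, not from the course data, and statistical packages typically use scaled residuals and may transform time first.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope_t_statistic(x, y):
    """t statistic (n - 2 df) for H0: slope of y on x is zero."""
    r = pearson_r(x, y)
    return r * math.sqrt((len(x) - 2) / (1 - r * r))


# Hypothetical event times and Schoenfeld residuals (illustration only).
event_times = [1, 2, 3, 4, 5, 6]
trending = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]   # drifts upward with time
patternless = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # no drift with time

for name, resid in [("trending", trending), ("patternless", patternless)]:
    r = pearson_r(event_times, resid)
    t = slope_t_statistic(event_times, resid)
    print(f"{name}: r = {r:.3f}, t = {t:.2f}")
```

A large |t| for the trending residuals would point toward a PH violation for that covariate; a small |t| for the patternless residuals would not.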

27
Example: no pattern with time
28
Example: violation of PH
29
Schoenfeld residuals
  • In SAS:
  • option on the output statement
  • output out=outdata ressch=Covariate1 Covariate2
    Covariate3;

30
Summary of the many ways to evaluate PH
assumption
  • 1. Examine log(-log(S(t))) plots
  • PH assumption is supported by parallel lines and
    refuted by lines that cross or nearly cross
  • Must use categorical predictors or categories of
    a continuous predictor
  • 2. Include interaction with time in the model
  • PH assumption is supported by non-significant
    interaction coefficient and refuted by
    significant interaction coefficient
  • Retaining the interaction term in the model
    corrects for the violation of PH
  • Don't complicate your model in this way unless
    it's absolutely necessary!
  • 3. Plot Schoenfeld residuals
  • PH assumption is supported by a random pattern
    with time and refuted by a non-random pattern
  • 4. Regress Schoenfeld residuals against time to
    test for independence between residuals and time.
  • PH assumption is supported by a non-significant
    relationship between residuals and time, and
    refuted by a significant relationship

31

4. Repeated events
  • Death (presumably) can only happen once, but many
    outcomes could happen twice:
  • Fractures
  • Heart attacks
  • Pregnancy
  • Etc.

32

Repeated events: Strategy 1
  • Strategy 1: run a second Cox regression (among
    those who had a first event) starting with first
    event time as the origin
  • Repeat for third, fourth, fifth events, etc.
  • Problem: increasingly smaller sample sizes.

33

Repeated events: Strategy 2
  • Treat each interval as a distinct observation,
    such that someone who had 3 events, for example,
    gives 3 observations to the dataset
  • Major problem: dependence between observations
    from the same individual

34

Strategy 3
  • Stratify by individual ("fixed effects partial
    likelihood")
  • In PROC PHREG: strata id;
  • Problems:
  • does not work well with RCT data
  • requires that most individuals have at least 2
    events
  • Can only estimate coefficients for those
    covariates that vary across successive spells for
    each individual; this excludes constant personal
    characteristics such as age, education, gender,
    ethnicity, genotype

35
5. Competing Risks
36
BMT: Related vs. Unrelated Donor
37
SAS Output
  • Patients with related donors survive longer.

38
Related/Unrelated Donor is significant.
  • Can you say definitively to a patient:
  • "If you find a related donor, you will have longer
    survival time"?
  • What variables could be confounders?

39
Survival Analysis categorizes subjects
  1. Event of interest was observed
  2. Censored
  3. Competing risk was observed

40
Competing Risk
  • an event that either precludes the event of
    interest or alters its probability

Event of Interest         Competing Risk
Death from the disease    Death from other causes
Relapse                   Non-relapse mortality
Relapse                   Treatment complications
Local progression         Metastasis
41
BMT Example
  • Interested in time to relapse
  • Competing risks (preclude or alter probability of
    relapse):
  • Non-relapse mortality
  • Graft-vs-host disease (GVHD)

42
Who failed from the event of interest?
  1. Event of interest was observed: Yes
  2. Censored: Maybe
  3. Competing risk was observed: No

  • Common pitfall: treating competing risks as
    censoring
  • Treats "no"s as "maybe"s
  • Puts them partially in the numerator of
    occurrence when they shouldn't be there
  • Thus overestimates risk (underestimates S)

43
What to do instead
  • KM estimate of event free survival (EFS)
  • Cumulative Incidence Analysis

44
Event-Free Survival
  • In cancer, often Progression-Free Survival (PFS)
  • Treats competing risks as events
  • Can use KM
  • For each subject, use the first event to occur
  • "Survival" implies death is considered an event
  • BMT: first of relapse, GVHD, or death
  • Is this of interest?
  • May not be, e.g., local progression and metastasis

45
Cumulative Incidence Analysis
  • Separates competing risks from event of interest
  • If no competing risks, equivalent to KM
  • Estimates occurrence probability F(t) = 1 - S(t)
  • Each event goes into one bin (event type)
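To see why treating competing risks as censoring overestimates risk, the sketch below compares 1 - KM (competing events censored) with a cumulative incidence estimate on a tiny made-up dataset. The data and function names are hypothetical; the cumulative incidence is computed in the usual nonparametric way, adding (event-free survival just before t) × (events of interest at t / number at risk) at each event time.

```python
def one_minus_km(data, cause=1):
    """1 - Kaplan-Meier for `cause`, treating all other outcomes as censored."""
    surv, at_risk, curve = 1.0, len(data), []
    for t in sorted({time for time, _ in data}):
        d = sum(1 for time, status in data if time == t and status == cause)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, 1 - surv))
        at_risk -= sum(1 for time, _ in data if time == t)
    return curve


def cumulative_incidence(data, cause=1):
    """Cumulative incidence of `cause` in the presence of competing events."""
    event_free, at_risk, cif, curve = 1.0, len(data), 0.0, []
    for t in sorted({time for time, _ in data}):
        d_cause = sum(1 for time, status in data if time == t and status == cause)
        d_any = sum(1 for time, status in data if time == t and status != 0)
        if d_cause:
            # Probability of being event-free just before t, then failing
            # from the cause of interest at t.
            cif += event_free * d_cause / at_risk
            curve.append((t, cif))
        event_free *= 1 - d_any / at_risk
        at_risk -= sum(1 for time, _ in data if time == t)
    return curve


# Hypothetical (time, status): 0 = censored, 1 = relapse, 2 = competing event.
toy = [(1, 1), (2, 2), (3, 1), (4, 2), (5, 0), (6, 1)]
km_curve = one_minus_km(toy)
cif_curve = cumulative_incidence(toy)
for (t, km), (_, ci) in zip(km_curve, cif_curve):
    print(f"t = {t}: 1 - KM = {km:.3f}, cumulative incidence = {ci:.3f}")
```

On this toy data the two estimates agree at the first event and then diverge, with 1 - KM reaching 1.0 while the cumulative incidence ends near 0.67. With no competing events in the data, the two functions return the same curve.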

46
BMT Cumulative Incidence Curves



47
6. Considerations when analyzing data from an RCT
48
Intention-to-Treat Analysis
  • Intention-to-treat analysis: compare outcomes
    according to the groups to which subjects were
    initially assigned, regardless of which
    intervention they actually received.
  • Evaluates treatment effectiveness rather than
    treatment efficacy

49
Why intention to treat?
  • Non-intention-to-treat analyses lose the benefits
    of randomization, as the groups may no longer be
    balanced with regards to factors that influence
    the outcome.
  • Intention-to-treat analysis simulates "real
    life," where patients often don't adhere
    perfectly to treatment or may discontinue
    treatment altogether.

50
Drop-ins and drop-outs: example, WHI
51
Effect of Intention to treat on the statistical
analysis
  • Intention-to-treat analyses tend to underestimate
    treatment effects; increased variability due to
    switching "waters down" results.

52
Example
  • Take the following hypothetical RCT:
  • Treated subjects have a 25% chance of dying
    during the 2-year study vs. placebo subjects have
    a 50% chance of dying.
  • TRUE RR = 25/50 = .50 (treated have 50% less
    chance of dying)
  • You do a 2-yr RCT of 100 treated and 100 placebo
    subjects.
  • If nobody switched, you would see about 25 deaths
    in the treated group and about 50 deaths in the
    placebo group (give or take a few due to random
    chance).
  • Observed RR ≈ .50

53
Example, continued
  • BUT, suppose that early in the study, 25 treated
    subjects switch to placebo and 25 placebo subjects
    switch to treatment.
  • You would see about
  • 25(.25) + 75(.50) = 43-44 deaths in the placebo
    group
  • And about
  • 25(.50) + 75(.25) = 31 deaths in the treated group
  • Observed RR = 31/44 ≈ .70
  • Diluted effect!
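The dilution arithmetic can be reproduced directly. This sketch assumes, as the example implies, that a subject who switches early carries the risk of the treatment actually received for the whole study; the function name `expected_deaths` is illustrative.

```python
RISK_TREATED = 0.25   # 2-year risk of death while on treatment
RISK_PLACEBO = 0.50   # 2-year risk of death while on placebo


def expected_deaths(n_on_treatment, n_on_placebo):
    """Expected deaths in an arm, given what its subjects actually received."""
    return n_on_treatment * RISK_TREATED + n_on_placebo * RISK_PLACEBO


# No switching: each arm of 100 receives what it was assigned.
rr_no_switching = expected_deaths(100, 0) / expected_deaths(0, 100)

# 25 subjects in each arm switch early; analysis is still by assigned arm.
deaths_treated_arm = expected_deaths(75, 25)   # assigned to treatment
deaths_placebo_arm = expected_deaths(25, 75)   # assigned to placebo
rr_intention_to_treat = deaths_treated_arm / deaths_placebo_arm

print(f"RR with no switching:   {rr_no_switching:.2f}")
print(f"Deaths, treated arm:    {deaths_treated_arm:.2f}")
print(f"Deaths, placebo arm:    {deaths_placebo_arm:.2f}")
print(f"Intention-to-treat RR:  {rr_intention_to_treat:.2f}")
```

The unrounded expected counts are 31.25 and 43.75 deaths, giving an RR of about 0.71, in line with the slide's rounded 31/44 ≈ .70.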

54
7. Example analysis: stress fracture study
  • Women runners may have reduced levels of
    estrogen, which puts them at risk of bone loss
    and stress fractures
  • This was a randomized trial of hormones (oral
    contraceptives) to prevent stress fractures in
    women runners
  • Two groups: treatment and control (no placebo)

55
Baseline Description and Comparability of Groups
  • Baseline descriptors are summarized as
  • means and standard deviations for continuous
    variables
  • frequencies and percentages for categorical
    variables
  • How good was the randomization? i.e., Are the
    groups indeed balanced with regards to variables
    known to be prognostically related to the
    outcome?
  • For cohort study, what factors are related to
    exposure, and thus might be confounders?
  • Who is in the population?

56
Stress fracture study: Baseline characteristics
by randomization assignment
57
Summary of events
  • Might be presented as overall incidence rates.
  • If events are heterogeneous (as with stress
    fractures), tabulate results.

58
Stress fracture 1        Diagnostic test   Study area
right tibial bone        bone scan         Boston
right tibial bone        x-ray             Boston
right tibial bone        bone scan         Boston
right tibial bone        bone scan         Boston
right tibial bone        bone scan         Stanford
right tibial bone        bone scan         Michigan
left tibial bone         bone scan         Boston
left tibial bone         bone scan         Michigan
left tibial bone         bone scan         Los Angeles
left tibial bone         bone scan         Michigan
right foot               bone scan         Los Angeles
right foot               x-ray             New York
left third metatarsal    x-ray             Boston
right 4th metatarsal     x-ray             Stanford
left cuboid              MRI               Stanford
navicular bone           bone scan         Stanford
upper right femur        MRI               Los Angeles
right femoral neck       MRI               Stanford
Total: 18

Stress fracture 2 (4 total; the transcript does not preserve which
subjects these belong to): right tibial bone, right tibial bone,
right femur, left foot
59
Evaluation of primary hypothesis
  • Intention-to-treat analysis for RCT
  • Primary exposure-event hypothesis for cohort
    study, adjusted for confounding

60
Corresponding Kaplan-Meier curve
61
Corresponding HR
                           Hazard Ratio (95% CI)
Randomized to treatment    0.82 (0.30, 2.27)
62
Secondary analyses
  • For RCT: any non-intention-to-treat analyses
  • For RCT and cohort: evaluate other predictors,
    effect modification, subgroups

63
Hazard ratios for treatment variables
                                                        Hazard Ratio (95% CI)
Randomized to treatment                                 0.82 (0.30, 2.27)
Randomized to treatment, on-protocol only (n=82)        0.63 (0.21, 1.92)
Actually took OCs at least 1 month                      0.41 (0.15, 1.08)
Per month on OCs                                        0.92 (0.85, 0.98)
Time-dependent treatment variable, when on treatment    0.50 (0.18, 1.40)

All analyses are stratified on site and menstrual status at
baseline (amenorrheic, oligomenorrheic, or eumenorrheic), and
adjusted for age and spine Z-score at baseline using Cox Regression.
64
Kaplan-Meier estimates of stress fracture-free
survivorship by BMC at baseline
65
Kaplan-Meier estimates of stress fracture-free
survivorship by levels of daily calcium intake
at baseline
≥1500 mg/day (n=36)
800-1499 mg/day (n=63)
<800 mg/day (n=22)
66
Kaplan-Meier estimates of stress fracture-free
survivorship by previous stress fracture
No previous fracture (n=83)
Previous fracture (n=39)
69
Risk Factors
                                                       Hazard Ratio (95% CI)
History of menstrual irregularity prior to baseline    2.91 (0.81, 10.43)
BMC <1800 g                                            3.70 (1.31, 10.46)
Low calcium (<800 mg/d)                                3.60 (1.12, 11.59)
Stress fracture prior to baseline                      5.45 (1.48, 20.08)
Fat mass (per kg)                                      1.05 (0.91, 1.21)

All analyses are stratified on site and menstrual status at
baseline, and adjusted for age and spine Z-score at baseline
using Cox Regression.
70
Other protective factors
                                                 Hazard Ratio (95% CI)
Spine BMD (per 1-standard deviation increase)    0.54 (0.30, 0.96)
Every 100-mg/d calcium (continuous)              0.90 (0.81, 0.99)
Lean mass (per kg), time-dependent               0.91 (0.81, 1.02)
Change in lean mass (per kg)                     0.83 (0.56, 1.24)
Menarche (per 1-year older)                      0.55 (0.34, 0.90)

All analyses are stratified on site and menstrual status at
baseline, and adjusted for age and spine Z-score at baseline
(except spine Z-score) using Cox Regression.
71
References
  • Paul Allison. Survival Analysis Using SAS. SAS
    Institute Inc., Cary, NC 2003.