Title: Advanced Epidemiology II Survival Analysis: Cox Proportional Hazard Model
1Advanced Epidemiology II Survival Analysis: Cox Proportional Hazard Model
2- John Graunt (1662)
- Natural and Political Observations upon the Bills of Mortality
- Report on registrations of births and deaths in London
- Edmund Halley (1687-1691)
- Developed first life table using data from a population in Poland
- Standard error (s.e.) formula for the survival probability developed in 1926
- Biological, epidemiological, medical, and industrial research have stimulated this field in the last 60 years
- In the last 30 years the application of survival analysis has increased in clinical research
3- Survival analysis in biomedical research deals with
- Estimation of failure time distributions
- Comparison of survival of different groups and estimation of treatment effect
- Prognostic evaluation of different factors/variables
- Univariate
- Multivariate
4- The term "survival analysis" is slightly misleading
- It is used when the objective is to study the time elapsed from a particular starting point to the occurrence of an event
- The event may not be related to survival and death
- Time to death
- Time to relapse
- Time to remission
- Time for disease to develop
- Time for symptom to develop
5- A common problem in this type of data is censoring
- The time-to-event is not observed
- Survival analysis accounts for the fact that we almost never observe the event of interest in all subjects
- We do not know if they will develop the event of interest, only that they have not presented the event of interest at the end of the study
- Censoring / survival time censored / censored data
6- Types of censoring
- (1) Type I censoring
- Experimenter terminates observation at a certain time point of follow-up
- (2) Type II censoring
- Experimenter terminates observation when a number r of failures/events is reached
- In this case the number of failures/events is not a random variable
7- In clinical and epidemiological research censoring is caused by time restriction (Type I censoring)
- Two major sources of censoring
- (1) lost to follow-up/drop-outs
- patient is alive at the last contact but subsequent survival status is unknown
- time to last contact = censoring time
- (2) patients/subjects are alive at the time of study closure
- time to study closure = censoring time
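The two sources of censoring above can be turned into a (time, event indicator) pair per subject. The following is a minimal sketch, not the analysis code used in the study; the function name `follow_up`, the field layout, and all dates are hypothetical.

```python
from datetime import date

def follow_up(entry, event_date, last_contact, study_closure):
    """Return (time_in_days, event) for one subject.

    event = 1 if the event was observed before study closure,
    event = 0 if the survival time is censored.
    """
    if event_date is not None and event_date <= study_closure:
        return (event_date - entry).days, 1
    # Censored: at last contact (lost to follow-up) or at study closure,
    # whichever comes first.
    end = min(last_contact, study_closure)
    return (end - entry).days, 0

closure = date(1994, 12, 31)  # hypothetical study closure date

# Event observed 60 days after entry.
observed = follow_up(date(1993, 1, 1), date(1993, 3, 2), date(1993, 3, 2), closure)
# Lost to follow-up: last seen 30 days after entry, no event recorded.
lost = follow_up(date(1993, 1, 1), None, date(1993, 1, 31), closure)
# Still event-free at study closure.
alive = follow_up(date(1994, 12, 1), None, closure, closure)

print(observed, lost, alive)
```

Each subject contributes a follow-up time whether or not the event occurred; the indicator tells the analysis which times are censored.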
8- If the study period were long enough to observe the survival time of all subjects (e.g., some animal experiments), a more common method of analysis for continuous variables would be appropriate
- t-test
- ordinary least squares regression
9- In studies of human subjects there is often censoring and the outcome cannot be analyzed by the usual methods for continuous data
- Subjects are often censored at different times
- leading to unequal follow-up time
- Analysis of the probability of survival during the study period as a dichotomous variable (alive vs dead), e.g. by a Chi-square test, would fail to account for this non-comparability between subjects
10[Figure: diagram showing patients entering the study at different times and the observation of known and censored survival times. Time axis in months (0, 6, 12, 18), divided into patient accrual and observation period.]
11- Different ways to measure the timing of an event
- calendar time since the baseline survey
- age at death (time since birth)
- first day of treatment
- day of randomization
12- Entry period
- Must be clearly defined based on study objectives
- Date of diagnosis
- Date of treatment/Date of Randomization
- Date of recurrence
- Date of beginning/detection of an exposure
- Date of birth
13- End-point
- Based on study objectives
- Deaths from any causes
- Death from disease
- Recurrence of disease
- Other end-points
- Development of disease
- Detection of intermediate end-point
- Detection of biomarker
- Persistence or regression of a disease process
14[Figure: diagram showing patients entering the study at different times and the observation of known and censored survival times. Time axis in months (0, 6, 12, 18), divided into patient accrual and observation period.]
15[Figure: diagram showing patients reorganized by survival time. Time axis in months (0, 6, 12, 18), divided into patient accrual and observation period.]
16- Survival probability
- Proportion of the population that survives a given length of time in the same circumstances
- Assumptions in survival analysis
- In a sample of N individuals, observations are independent
- The random variables survival time and censoring time are independent
- Special feature of survival analysis
- Covariates may be time dependent (e.g. age, calendar period of observation)
17- Methods of survival analysis
- Accommodate censoring
- Account for different periods of observation on each experimental unit
- Account for the time at which events occur
18Planning, Reporting, and Interpreting Survival
Analysis
- Sample size
- Follow-up
- Date of entry or beginning of F-U
- Date of end of F-U
- Cut-off date for analysis
- Summary of F-U (median F-U)
- Percent of censored data
19Planning, Reporting, and Interpreting Survival
Analysis
- Clear definition
- Entry point
- End-points
- Losses to F-U
- Competing events (e.g. deaths from other causes)
- Relapse-free survival, disease-free survival, remission duration (only patients responding to treatment are analyzed)
- Progression-free survival (all patients are analyzed)
20Planning, Reporting, and Interpreting Survival
Analysis
- Explanatory variables/Prognostic factors
- Number and definition of variables
- Coding of variables
- Presence of missing values
- Cut-off for categorization of continuous
variables
21Planning, Reporting, and Interpreting Survival
Analysis
- Graphical Representations
- Quality of graphs
- Scales
- Marks for censoring times
- Clear identification of groups
- Avoid over-interpretation of the right-hand end of the curve
- Survival curve shows the pattern of mortality over time, not the details
22Planning, Reporting, and Interpreting Survival
Analysis
- Type of analysis
- Univariate analysis
- Only one explanatory variable at a time
- Method of analysis
- Test for comparison of groups
- Median survival time should be reported when possible
- Estimates the time period beyond which 50% of patients are expected to survive in the study population
23Planning, Reporting, and Interpreting Survival
Analysis
- Type of analysis (continued)
- Multivariate analysis
- At least two explanatory variables
- Method of analysis
- Statistical software used
24Methods of Estimation of survival Probability
- Product Limit Method (Kaplan-Meier method)
- Time variable continuous
- Life-Table Method (Actuarial method)
- Time variable grouped
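As a rough illustration of the product-limit idea, the sketch below multiplies the conditional survival fractions at each distinct event time. The function name `kaplan_meier` and the toy data are hypothetical; a real analysis would normally rely on statistical software.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (event_time, S(t)) pairs.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Conditional probability of surviving past t, given at risk at t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve

# Hypothetical follow-up times in months; 0 marks a censored observation.
toy_times = [6, 7, 10, 15, 19, 25]
toy_events = [1, 0, 1, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(t, round(s, 3))
```

Censored subjects never trigger a step in the curve, but they stay in the number at risk until their censoring time.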
25Methods for Comparing Survival Curves
- Kaplan-Meier method (1958)
- non-parametric method
- very often is the primary comparison between treatment and control groups in cancer clinical trials
- Kaplan-Meier method produces survival curves
26Methods of Estimation of survival Probability
- Log-rank test
- most common method to compare independent groups of survival times
- tests the H0 that two survival curves are identical
- based on comparison of observed and expected deaths
- expected number calculated under the assumption of no difference in survival between groups
- appropriate when relative mortality does not change over time
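The observed-versus-expected comparison can be sketched for two groups as follows. This is an illustrative implementation with hypothetical names and data, using the usual hypergeometric variance and a chi-square approximation with 1 degree of freedom.

```python
import math

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    Returns (observed_a, expected_a, chi_square, p_value).
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    obs_a = exp_a = var = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_a += d_a
        exp_a += d * n_a / n          # expected deaths in A under H0
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("variance is zero; the test is not defined")
    chi_square = (obs_a - exp_a) ** 2 / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return obs_a, exp_a, chi_square, p_value

# Hypothetical data: two groups with identical survival experience.
same = logrank([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
# Hypothetical data: group B tends to fail earlier.
diff = logrank([6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1],
               [1, 2, 3, 4, 5, 8], [1, 1, 1, 1, 0, 1])
print(same)
print(diff)
```

When the two groups have the same experience, observed and expected deaths coincide and the statistic is zero.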
29Methods of Estimation of survival Probability
- Hazard Ratio
- Provides information on the magnitude of the difference between groups
- Measures relative survival by comparing the observed number of events with the expected number of events
30Cox Proportional Hazard Model
31Cox Proportional Hazard Model
- Most used regression method for analysis of censored survival data
- Introduced by Cox in 1972
- Used for
- Identification of differences in survival due to treatment effect or prognostic (risk, explanatory, or predictor) factors
- Control of confounding in cohort studies
32Cox Proportional Hazard Model
- Given a set of p covariates (explanatory variables) $x_i = (x_{1i}, x_{2i}, \ldots, x_{pi})$
- The hazard function for a given individual i is modeled by
- $h_i(t, X_i) = h_0(t)\, e^{\sum_{k=1}^{p} \beta_k x_{ki}}$
33Cox Proportional Hazard Model
- The hazard is the product of two quantities
- An arbitrary nonnegative baseline hazard $h_0(t)$
- The exponential of a linear function of the p covariates (explanatory/predictor/prognostic variables)
- The model is semiparametric (often loosely called nonparametric) because $h_0(t)$ is unspecified
- Baseline hazard is a function of t but does not involve the Xs
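A minimal numerical sketch of this product form is given below. The function name `cox_hazard`, the baseline hazard value, and the coefficients are all hypothetical and chosen only for illustration.

```python
import math

def cox_hazard(baseline_hazard, betas, x):
    """Hazard for one individual: baseline hazard at time t multiplied by
    the exponential of the linear function of the covariates."""
    linear_predictor = sum(b * xi for b, xi in zip(betas, x))
    return baseline_hazard * math.exp(linear_predictor)

h0_t = 0.02            # hypothetical baseline hazard at some time t
betas = [0.5, -0.3]    # hypothetical regression coefficients

h_reference = cox_hazard(h0_t, betas, [0, 0])  # all covariates equal to 0
h_exposed = cox_hazard(h0_t, betas, [1, 0])    # first covariate equal to 1

print(h_reference, h_exposed, h_exposed / h_reference)
```

With every covariate set to 0 the hazard equals the baseline hazard, and the ratio of two hazards does not depend on the baseline value.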
34Cox Proportional Hazard Model
- The linear function
- $\sum_{k=1}^{p} \beta_k x_{ki} = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi}$
- also known as the prognostic index for the ith individual
- The linear function is exponentiated to ensure a nonnegative hazard
- Linear function involves Xs but does not involve t
- time-independent variables
- If Xs involve t, Xs are called time-dependent variables
- Extended Cox Model
35Cox Proportional Hazard Model
- Time independent variables
- Values do not change over time
- Gender
- Values are only measured once
- Values are assumed not to change once they are measured
- Age
- Smoking status
- weight
36Cox Proportional Hazard Model
- Properties of Cox PH Model
- If all Xs = 0, the formula reduces to the baseline hazard function $h_0(t)$
- $X_1 = X_2 = \cdots = X_p = 0$
- $h(t, X) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i} = h_0(t)\, e^{0} = h_0(t)$
37Cox Proportional Hazard Model
- Properties of Cox PH Model
- The baseline hazard function h0(t)
- Unspecified function
- semiparametric method (no parametric form is assumed for h0(t))
38Cox Proportional Hazard Model
- Properties of Cox PH Model
- Even when h0(t) is unspecified
- Still possible to estimate the βs to assess the effect of predictor/explanatory variables
- hazard ratios calculated without estimates of h0(t)
39Cox Proportional Hazard Model
- Properties of Cox PH Model
- Form of the model
- exponential part of the model ($e^{\sum_{i=1}^{p} \beta_i X_i}$) ensures hazard estimates that are not negative
- Hazard function ranges between 0 and $+\infty$
40Cox Proportional Hazard Model
- Properties of Cox PH Model
- Hazard function h(t, X) and corresponding survival curve S(t, X) can be estimated using minimal assumptions
- even if the baseline hazard function h0(t) is not specified
41Cox Proportional Hazard Model
- Properties of Cox PH Model
- Preferred over the logistic regression model when survival time is available and there is censoring
- Cox model uses more information (survival time)
- Logistic regression uses a (1,0) outcome and ignores survival time and censoring
42Cox Proportional Hazard Model
- Estimation of Cox PH Model Parameters (β coefficients)
- Parameter estimates are called maximum likelihood (ML) estimates
- $\hat{\beta}_i$
- Derived by maximizing the likelihood function L
- L describes the joint probability of obtaining the data observed in the study subjects as a function of the unknown parameters (βs) in the model
- L = joint probability of observed data
- L sometimes written as L(β)
43Cox Proportional Hazard Model
- Estimation of Cox PH Model Parameters (β coefficients)
- The Cox model's L is a partial likelihood function
- L does not consider probabilities for all study subjects
- only probabilities for those subjects who fail
- but not for those who are censored
44Cox Proportional Hazard Model
- Estimation of Cox PH Model Parameters (β coefficients)
- L is the product of several likelihoods, one for each of the k failure times
- $L = L_1 \times L_2 \times L_3 \times \cdots \times L_k = \prod_{j=1}^{k} L_j$
- $L_j$ denotes the likelihood of failing at the jth failure time, given survival up to this time, within the risk set $R(t_{(j)})$
- $R(t_{(j)})$ gets smaller over time as failure time increases
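The product over failure times can be written out directly for a single covariate. The sketch below works on the log scale and assumes no tied failure times; the function name `cox_partial_loglik` and the toy data are hypothetical.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied failure times.

    Each failure contributes the log of
        exp(beta * x_j) / sum over the risk set of exp(beta * x_i).
    """
    loglik = 0.0
    for j, (t_j, e_j) in enumerate(zip(times, events)):
        if e_j != 1:
            continue  # censored subjects contribute no term of their own
        # Risk set: everyone still under observation at time t_j.
        risk_sum = sum(math.exp(beta * x[i])
                       for i, t in enumerate(times) if t >= t_j)
        loglik += beta * x[j] - math.log(risk_sum)
    return loglik

# Hypothetical data: subject 2 is censored at time 2.
toy_times = [1, 2, 3]
toy_events = [1, 0, 1]
toy_x = [1, 0, 1]

ll_at_zero = cox_partial_loglik(0.0, toy_times, toy_events, toy_x)
print(ll_at_zero)
```

The censored subject has no term of its own but still appears in the risk set of the first failure, which is how censored survival times enter the likelihood.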
45Cox Proportional Hazard Model
- Estimation of Cox PH Model Parameters (β coefficients)
- L focuses on subjects who fail, but the survival time of censored subjects is used
- L is then maximized by maximizing the natural log of L
- Taking partial derivatives of ln L with respect to each parameter in the model
- $\partial \ln L / \partial \beta_i = 0$, $i = 1, \ldots, p$ (p = number of parameters)
46Cox Proportional Hazard Model
- Estimation of Cox PH Model Parameters (β coefficients)
- Solution is obtained by iteration, in a stepwise manner
- Guess a value for the solution
- Modify the guessed value in successive steps
- Stop when the solution is obtained
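One common way to carry out such an iteration is Newton-Raphson. The slides do not say which algorithm is used, so the sketch below is an assumption, restricted to one covariate with no tied failure times; all names and data are hypothetical.

```python
import math

def score_and_information(beta, times, events, x):
    """Score (first derivative of the log partial likelihood) and
    information (minus the second derivative) for one covariate."""
    score = 0.0
    info = 0.0
    for j, t_j in enumerate(times):
        if events[j] != 1:
            continue
        risk = [i for i, t in enumerate(times) if t >= t_j]
        s0 = sum(math.exp(beta * x[i]) for i in risk)
        s1 = sum(x[i] * math.exp(beta * x[i]) for i in risk)
        s2 = sum(x[i] ** 2 * math.exp(beta * x[i]) for i in risk)
        mean_x = s1 / s0                 # weighted mean of x in the risk set
        score += x[j] - mean_x
        info += s2 / s0 - mean_x ** 2    # weighted variance of x in the risk set
    return score, info

def fit_cox_one_covariate(times, events, x, start=0.0, tol=1e-10, max_iter=50):
    """Guess a value, update it in successive steps, stop when the step is tiny."""
    beta = start
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("iteration did not converge")

# Hypothetical data: x = 1 marks the exposed subjects.
toy_times = [2, 3, 5, 7, 8, 11, 12, 14]
toy_events = [1, 1, 1, 0, 1, 1, 0, 1]
toy_x = [1, 1, 0, 1, 0, 1, 0, 0]

beta_hat = fit_cox_one_covariate(toy_times, toy_events, toy_x)
print(beta_hat, math.exp(beta_hat))
```

At the solution the score is zero, which is the single-parameter version of setting the partial derivative of ln L to zero.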
47Cox Proportional Hazard Model
- Estimation of Cox PH Model Parameters (β coefficients)
- Once ML estimates are obtained
- Statistical inference about the hazard ratio
- Wald test
- LR test
- 95% CI
- Estimated HR computed by exponentiating the β coefficient of a (0,1) explanatory variable of interest
- $\widehat{HR} = e^{\hat{\beta}}$
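The inference steps listed above can be sketched from a coefficient and its standard error. The function name `wald_summary` and both input values are hypothetical; the confidence interval uses the usual large-sample normal approximation.

```python
import math

def wald_summary(beta, se, z_crit=1.96):
    """Hazard ratio, Wald z statistic, two-sided p value, and 95% CI
    from an estimated coefficient and its standard error."""
    hr = math.exp(beta)
    z = beta / se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci_low = math.exp(beta - z_crit * se)
    ci_high = math.exp(beta + z_crit * se)
    return hr, z, p_value, ci_low, ci_high

# Hypothetical estimate and standard error, for illustration only.
hr, z, p_value, ci_low, ci_high = wald_summary(beta=0.60, se=0.25)
print(round(hr, 3), round(z, 2), round(p_value, 4),
      round(ci_low, 3), round(ci_high, 3))
```

The interval is computed on the coefficient scale and then exponentiated, so it is not symmetric around the hazard ratio.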
48Cox Proportional Hazard Model
- Computing the Hazard Ratio
- Hazard for one individual divided by the hazard of another individual
- Individuals are distinguished by their values of the predictor variables of interest (Xs)
- $\widehat{HR} = \hat{h}(t, X^*) / \hat{h}(t, X)$
- where $X^* = (X_1^*, X_2^*, \ldots, X_p^*)$
- and $X = (X_1, X_2, \ldots, X_p)$
- denote the sets of predictors for the two individuals
49Cox Proportional Hazard Model
- Computing the Hazard Ratio
- Easier to interpret an HR that exceeds 1
- HR formula in terms of regression coefficients
- $X = (X_1, X_2, \ldots, X_p)$, unexposed group ($X_1 = 0$)
- $X^* = (X_1^*, X_2^*, \ldots, X_p^*)$, exposed group ($X_1^* = 1$)
- $\widehat{HR} = \hat{h}(t, X^*) / \hat{h}(t, X)$
- $= \dfrac{\hat{h}_0(t)\, e^{\sum_{i=1}^{p} \hat{\beta}_i X_i^*}}{\hat{h}_0(t)\, e^{\sum_{i=1}^{p} \hat{\beta}_i X_i}}$
- $= e^{\sum_{i=1}^{p} \hat{\beta}_i (X_i^* - X_i)}$
We obtain the HR formula by substituting the Cox model formula with the regression coefficients in the numerator and denominator.
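The final expression depends only on the coefficients and on the differences between the two covariate profiles, as the short sketch below illustrates. The function name `hazard_ratio` and the coefficient values are hypothetical.

```python
import math

def hazard_ratio(betas, x_star, x):
    """HR comparing profile x_star (e.g. exposed) with profile x (e.g. unexposed).

    The baseline hazard cancels, so only the coefficients and the
    covariate differences are needed.
    """
    return math.exp(sum(b * (xs - xi) for b, xs, xi in zip(betas, x_star, x)))

# One (0,1) exposure variable: the HR is simply exp(beta).
hr_single = hazard_ratio([0.5], [1], [0])

# Two covariates; the second is equal in both profiles, so it cancels.
hr_adjusted = hazard_ratio([0.5, 0.2], [1, 3], [0, 3])

print(hr_single, hr_adjusted)
```

Covariates that take the same value in both profiles contribute a zero difference, which is why the adjusted HR for the exposure is still the exponential of its own coefficient.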
50Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
51Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
Mitchell MF, Tortolero-Luna G, et al, Ob Gyn, 1998
52Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Background
- In patients with biopsy-proven squamous intraepithelial lesions (SIL) of the cervix who have negative findings on endocervical curettage, a satisfactory colposcopy examination, and concordant Papanicolaou smear and biopsy results, ablation of the transformation zone has been the standard of care for several decades. Several techniques for ablation are available: cryotherapy, laser ablation, and loop excision of the transformation zone, also called the loop electrosurgical excision procedure.
53Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Cryotherapy
- Introduced in 1972
- Advantages: reliability, ease of use, and low cost
- Disadvantages: lack of ability to tailor treatment to the size of the lesion and lack of a tissue specimen
54Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Laser vaporization
- Introduced in 1977
- Advantage: easily tailored to lesion size
- Disadvantages: cost and lack of tissue specimen
55Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Loop electrosurgical excision procedure (LEEP)
- Introduced in 1989
- Advantages: reliable, easy to use, can be tailored to lesion size, and provides a tissue specimen
- Disadvantages: increased risk of bleeding and infection, and increased cost
- Bleeding after treatment with LEEP has been reported in 2% to 7% of cases
56Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Rationale for Study
- No previous randomized clinical trial has assessed the effectiveness of all three treatment modalities
- Objectives of the Study
- to compare differences in the rates of persistent and recurrent disease among treatment modalities
- to identify predictive factors for persistent and recurrent disease
- to assess differences in complication rates among treatment groups
57Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Study Population
- Women referred to the University of Texas M. D.
Anderson Cancer Center Colposcopy clinic with a
diagnosis of cervical intraepithelial neoplasia
(CIN), between March 1992 and April 1994
58Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Eligibility Criteria
- non-pregnant
- 18 years of age or older
- using a contraceptive method
- biopsy-proven CIN lesion
- negative endocervical curettage (ECC)
- satisfactory colposcopic examination (visualization of the squamocolumnar junction and the entire lesion)
- consistent Pap smear and biopsy result
59Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Exclusion Criteria
- suspicion or evidence of invasive cervical lesions on Pap smear, biopsy, or colposcopic examination
- suspicion of pregnancy
- current pelvic inflammatory disease, cervicitis, or other gynecological infection
60Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Data Collection
- Complete medical history
- Physical examination
- Pan-colposcopy of the vulva, perineum, vagina, and cervix
- Colposcopically directed biopsies
- Human papillomavirus (HPV) testing by Virapap/Viratype assay (Digene Diagnostics, Inc., Silver Spring, MD)
61Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Randomization
- Eligible patients randomly assigned to one of the treatment groups following a stratified assignment schedule
- grade of CIN (1, 2, or 3)
- endocervical gland involvement
- lesion size (less than one-third, one- to two-thirds, or greater than two-thirds of the surface area of the cervix)
- Patients were scheduled for treatment within 7-15 days from initial evaluation
62Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Follow-up Schedule
- 1 month after treatment
- Every 4 months for 2 years (4, 8, 12, 16, 20, and 24 months post-treatment)
- Follow-up Evaluation
- complete physical and pelvic examination
- colposcopic exam
- Pap smear and colposcopically directed biopsies as needed
- Complications
- bleeding, infection (fever, vaginal discharge), visits to the emergency room, and use of pain medication within 24 hours post-treatment or later
- evaluated for cervical stenosis
63Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Outcomes
- persistent and recurrent disease
- assessed on the basis of cytology and histology
- cytologist and pathologist were blinded to treatment assignment
- Definitions
- Persistent disease: cytological or histological presence of CIN at the time of the second follow-up visit (within 6 months after treatment)
- Recurrent disease: cytological or histological presence of CIN diagnosed at a subsequent follow-up visit (6 months after treatment) in a patient who had at least one negative cytological smear after treatment
64Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- to determine the association between treatment modality and other covariates of interest and disease-free survival
- age
- HPV status
- smoking habits
- grade of disease
- glandular involvement
- size and location of lesion
- history of prior treatment for CIN
65Cox Proportional Hazard Model
- Computing the Hazard Ratio Example
- Differences in disease-free survival
- (date of treatment to date of recurrence)
- K-M method and the log-rank test
- Univariate and multivariate Cox proportional
hazard models
66Mitchell MF, Tortolero-Luna G, et al, Ob Gyn, 1998
69- For the purpose of this presentation we will focus on the role of HPV status on recurrence of CIN
- Analysis was conducted using SPSS
- For the previous model and each of the following
- 1st column: variable(s) included
- 2nd column: regression coefficients (B)
- 3rd column: standard error of the coefficient (SE)
- 4th column: Wald statistic (B/SE)
- 5th column: degrees of freedom (df)
- 6th column: p value for the Wald test (Z statistic)
- 7th column: exp(B) (HR)
- 8th column: 95% CI for exp(B)
71When there is only one variable in the model, the estimated hazard ratio simplifies to
$\widehat{HR} = e^{\hat{\beta}_1 (X_1^* - X_1)} = e^{\hat{\beta}_1 (1 - 0)} = e^{\hat{\beta}_1} = e^{0.544} \approx 1.72$
81Control of Confounding
82Crude Model
83Control of Confounding
- Comparison of the HR of the crude model with the HR of the adjusted model
- If the estimated HR of the crude model meaningfully differs from the estimated HR of the adjusted model, then there is confounding
- If confounding is present, we must control for the confounding variable to obtain a valid estimate of effect
- However, if a variable is not found to be a confounder of the effect, we might decide to include it in the model for other reasons (biological meaning, previous studies, or to increase precision)
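The crude-versus-adjusted comparison can be expressed as a percent change in the HR. The sketch below uses the crude HR of 1.973 for HPV 16/18 reported in this presentation; the adjusted values and the 10% cut-off are hypothetical, since "meaningfully differs" is a judgment call rather than a fixed rule.

```python
def percent_change(crude_hr, adjusted_hr):
    """Percent difference of the crude HR relative to the adjusted HR."""
    return 100.0 * (crude_hr - adjusted_hr) / adjusted_hr

def suggests_confounding(crude_hr, adjusted_hr, threshold_pct=10.0):
    """Flag confounding when the HR changes by more than threshold_pct.

    The default threshold is an assumed rule of thumb, not a value
    taken from the presentation.
    """
    return abs(percent_change(crude_hr, adjusted_hr)) > threshold_pct

crude = 1.973                     # crude HR for HPV 16/18 from the slides
adjusted_far = 1.60               # hypothetical adjusted HR
adjusted_near = 1.95              # hypothetical adjusted HR

flag_far = suggests_confounding(crude, adjusted_far)
flag_near = suggests_confounding(crude, adjusted_near)
print(round(percent_change(crude, adjusted_far), 1), flag_far)
print(round(percent_change(crude, adjusted_near), 1), flag_near)
```

A variable that does not move the HR past the chosen cut-off may still be kept in the model for the other reasons listed above.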
84Control of Confounding
Crude Model: -2 LL = 650.283; HPV 16/18 HR = 1.973 (95% CI 1.111-3.504); HPV Other HR = 1.302 (95% CI 0.623-2.717)
92Control of Confounding: Multivariate Model
94Adjusted Overall Survival Function
95Adjusted Survival Function
96Model Selection in Cox Regression Analysis
- Similar to other modeling procedures
- Initial selection of variables based on
- Biological plausibility
- Literature
- Previous research
- Pilot data
97Model Selection in Cox Regression Analysis
- Step 1
- Assess the influence of each covariate of interest (univariate or bivariate analysis)
- Likelihood ratio test
- Wald test
- Using a significance level of 0.10 to 0.25
98Model Selection in Cox Regression Analysis
- Step 2
- All significant variables fitted in a multivariable model
- If we are interested in a particular variable, the other covariates are regarded as confounders
- If some variables are no longer significant, assess the effect on the model of removing each variable one at a time
- LRT and Wald test
- If a substantial change occurs in the remaining coefficients or in the variable of interest (e.g., 20% or greater), add the variable back to the model
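A likelihood ratio test for removing variables can be sketched from the -2 LL values of two nested models. The reduced-model value below is the -2 LL of 650.283 reported in this presentation; the full-model value and the degrees of freedom are hypothetical. The chi-square tail probability is computed with the standard library only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # closed form for df = 1
    # Step up two degrees of freedom at a time until df is reached.
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(neg2ll_reduced, neg2ll_full, df):
    """LR statistic and p value for nested models; df is the number of
    parameters removed from the full model."""
    statistic = neg2ll_reduced - neg2ll_full
    return statistic, chi2_sf(statistic, df)

# 650.283 is from the slides; 645.000 and df = 1 are hypothetical.
lr_stat, lr_p = likelihood_ratio_test(650.283, 645.000, df=1)
print(round(lr_stat, 3), round(lr_p, 4))
```

A small p value suggests the removed variable improves the fit and should stay in the model; the change in the remaining coefficients should still be checked separately.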
99Model Selection in Cox Regression Analysis
- Step 3
- Add all variables found to be not significant in step 1
- All variables found to be significant in this step would be added to a new multivariable model
- If variables included in step 2 lose their significance, test them for deletion
- Step 4
- Check that no covariate can be added to or removed from the model
- Stepwise procedures
- Backward
- Forward
100Selecting the Best Model
101Selecting the Best Model: Backward Selection
Crude Model: -2 LL = 632.495; HPV 16/18 HR = 2.011 (95% CI 1.103-3.666); HPV Other HR = 1.212 (95% CI 0.573-2.565)
106Selecting the Best Model: Backward Stepwise Procedure
108Selecting the Best Model: Backward Stepwise Procedure
Block 1 Method: Backward Stepwise (Likelihood Ratio)
109Selecting the Best Model: Backward Stepwise Procedure, Step 1
110Selecting the Best Model: Backward Stepwise Procedure, Step 2
111Selecting the Best Model: Backward Stepwise Procedure, Step 3
112Selecting the Best Model: Backward Stepwise Procedure, Step 4
113Selecting the Best Model
114Selecting the Best Model
115Adjusted Survival Curve
116Adjusted Survival Function