Title: Threshold Regression Models
1Threshold Regression Models
Mei-Ling Ting Lee, University of Maryland,
College Park MLTLEE_at_UMD.EDU
2Outline
- An example to demonstrate the usefulness of the
first-hitting-time based threshold regression
(TR) model
- Introduction of the TR model
- Connections with the PH model
- Semi-parametric TR model
- Simulations
3A non-proportional hazard example: Time to
infection of kidney dialysis patients with
different catheterization procedures (Nahman et
al. 1992; Klein and Moeschberger 2003)
- Surgical group
- 43 patients utilized a surgically placed catheter
- Percutaneous group
- 76 patients utilized a percutaneous placement of
their catheter
- The survival time is defined by the time to
cutaneous exit-site infection.
4Kaplan-Meier Estimate versus PH Cox Model
5Weibull versus Lognormal
6Loglogistic versus Gamma
7Kaplan-Meier Estimate versus First-hitting-time
based Threshold Regression Model
8Outline
- Introduction of First Hitting Time
- Threshold Regression
- Parametric and Semi-parametric Models
- Comparison with PH Models
9First-hitting Time Based Threshold Regression
Modeling Event Times by a Stochastic
Process Reaching a Boundary (Lee and Whitmore
2006, Statistical Science)
- Example: Equipment Failure
- Equipment fails when its cumulative wear first
reaches a failure threshold.
- Question: What is the influence of ambient
temperature on failure?
10Occupational and Environmental Health
Occupational risk: A railroad worker is exposed
to diesel exhaust in the workplace. The exposure
and other influences cause the worker's health
status to gradually tend downward toward a
critical threshold that will result in death from
a particular cause (e.g., lung cancer).
Question: Does diesel exhaust exposure increase
the risk of lung cancer and, if so, to what
degree?
11[Figure: a sample path of the process Y(t), starting at y0 at time 0 and plotted against time t, reaching the zero boundary at time S]
First hitting time S of a fixed boundary at
level zero for a stochastic process of interest
Y(t)
12First Hitting Time (FHT) Models
- Y(t): the stochastic process of interest
- B: the boundary set
- First hitting time S, defined by $S = \inf\{t : Y(t) \in B\}$
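The definition can be made concrete with a small simulation. The sketch below is only an illustration: it assumes a Wiener process with unit variance, a boundary set B = (-inf, 0], and an Euler time step `dt`. The function name and all numeric values are hypothetical, not taken from the slides.

```python
import math
import random

def simulate_first_hitting_time(y0, mu, dt=0.01, t_max=100.0, rng=None):
    """Approximate S = inf{t : Y(t) <= 0} on a discrete time grid.

    Assumes Y(0) = y0 > 0, drift mu, unit variance (illustrative choices).
    Returns None if the path has not reached the boundary by t_max,
    which corresponds to a subject still surviving at end of follow-up.
    """
    rng = rng or random.Random()
    y, t = y0, 0.0
    while t < t_max:
        # One Euler step of the Wiener process with drift.
        y += mu * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if y <= 0.0:
            return t
    return None

rng = random.Random(1)
times = [simulate_first_hitting_time(2.0, -1.0, rng=rng) for _ in range(200)]
observed = [s for s in times if s is not None]
```

With a negative drift almost every simulated path reaches the boundary, and the discrete grid slightly overestimates S because crossings between grid points are missed.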
13Examples of first hitting time (FHT) models
- Wiener diffusion to a fixed boundary
- Progress of multiple myeloma until death
- Renewal process to a fixed count of renewal
events
- Time to the nth epileptic seizure
- Semi-Markov process to an absorbing state
- Multi-state model for disease with death as an
absorbing state
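14[Figure: two sample paths of the process Y(t), starting at y0 at time 0 and plotted against time t, with the first hitting time S and the end of follow-up L marked on the time axis]
- Two sample paths of a stochastic process of
interest
- One path experiences failure at first hitting
time S
- One path is surviving at end of follow-up at
time L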
15First Hitting Time (FHT) Models
- The first hitting time (FHT) model describes
many time-to-event applications.
- The stochastic process of interest Y(t) may
represent the latent (unobservable) health status
of a subject.
- The threshold constitutes the critical level of
the process that triggers the failure event
(e.g., symptomatic cancer, death). Death
(endpoint or event) occurs when the health status
Y(t) first decreases to the zero threshold.
16Parameters for the FHT Model
- Model parameters for the latent process Y(t)
- Process parameters $(\mu, \sigma^2)$, where
- $\mu$ is the mean drift and $\sigma^2$ is the variance
- Baseline level of process $Y(0) = y_0$
- Because Y(t) is latent, its scale is arbitrary, so we set $\sigma^2 = 1$.
17Likelihood Inference for the FHT Model
- The likelihood contribution of each sample
subject is as follows.
- If the subject fails at S = s:
- $f(s \mid y_0, \mu)\,ds = \Pr[\text{first hitting time in } (s, s + ds)]$
- If the subject survives beyond time L:
- $1 - F(L \mid y_0, \mu) = \Pr[\text{no first hitting time before } L]$
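As a rough sketch of how these two contributions combine, the code below assumes the Wiener case with unit variance and a zero boundary, for which the first hitting time has the standard inverse Gaussian density and distribution function. The function names and the example records are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def fht_density(s, y0, mu):
    """Density f(s | y0, mu) of the first hitting time of zero (unit variance)."""
    return y0 / math.sqrt(2.0 * math.pi * s ** 3) * math.exp(-(y0 + mu * s) ** 2 / (2.0 * s))

def fht_cdf(t, y0, mu):
    """Distribution function F(t | y0, mu) of the first hitting time."""
    root_t = math.sqrt(t)
    return (norm_cdf(-(y0 + mu * t) / root_t)
            + math.exp(-2.0 * y0 * mu) * norm_cdf((mu * t - y0) / root_t))

def log_likelihood(records, y0, mu):
    """Sum log f(s) over failures and log(1 - F(L)) over survivors.

    records: iterable of (time, failed) pairs, failed being True or False.
    """
    total = 0.0
    for time, failed in records:
        if failed:
            total += math.log(fht_density(time, y0, mu))
        else:
            total += math.log(1.0 - fht_cdf(time, y0, mu))
    return total

# Hypothetical data: one failure at s = 1.5, one subject surviving past L = 4.0.
example = log_likelihood([(1.5, True), (4.0, False)], y0=2.0, mu=-1.0)
```

In a threshold regression, `y0` and `mu` would themselves depend on covariates, and the log-likelihood would be maximized numerically over the regression coefficients.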
19Threshold Regression
- Link functions: parametric or semi-parametric
- Simultaneous regressions
- Possible link functions for the baseline
parameter Y(0) and drift parameter $\mu$ include
- Linear combinations of covariates $X_1, \ldots, X_p$
- Polynomial combinations of $X_1, \ldots, X_p$
- Regression splines
- Penalized regression splines
- Random effects
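A minimal sketch of two simultaneous regressions with linear combinations of covariates follows. It assumes a log link for Y(0), which keeps the starting level positive, and an identity link for the drift. The coefficient names `gamma` and `beta` and all values are hypothetical.

```python
import math

def linear_predictor(coefs, covariates):
    """Intercept plus a linear combination of the covariates X1, ..., Xp."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], covariates))

def tr_parameters(gamma, beta, covariates):
    """Map covariates to (y0, mu) through two simultaneous regressions.

    Assumed links: ln(y0) = gamma' x  and  mu = beta' x.
    """
    y0 = math.exp(linear_predictor(gamma, covariates))
    mu = linear_predictor(beta, covariates)
    return y0, mu

# Hypothetical coefficients and a single covariate value X1 = 2.0.
y0, mu = tr_parameters(gamma=[0.0, 0.5], beta=[-1.0, 0.2], covariates=[2.0])
```

The same covariate can enter both regressions, so its effect on the starting level and its effect on the rate of decline are estimated separately.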
20Threshold regression (TR): More than one
simultaneous regression function, with different
links, may be used to estimate parameters of
1. Process Y(t): Wiener process, gamma process, etc.
2. Boundary threshold: straight lines or curves
3. Time scale: calendar or running time, analytical time
21References
- Aalen, O. O. and Gjessing, H. K. (2001).
Understanding the shape of the hazard rate: a
process point of view. Statistical Science, 16,
1-22.
- Lawless, J. F. (2003). Statistical Models and
Methods for Lifetime Data, Second Edition. Wiley.
- Lee, M.-L. T. and Whitmore, G. A. (2006).
Threshold regression for survival analysis:
modeling event times by a stochastic process
reaching a boundary. Statistical Science.
- Aalen, O. O., Borgan, O., and Gjessing, H. K. (2008).
Survival and Event History Analysis: A Process
Point of View. Springer.
22Threshold Regression Interpretation of PH
functions
- Most survival distributions can be related to
hitting time distributions for some stochastic
processes.
- Families of PH functions can be generated by
varying time scales or boundaries of a TR model.
- The same family of PH functions can be produced
by different TR models.
- Simulation studies: both the TR model and the PH model hold
simultaneously, based on standard Brownian motion
with variation of time scale. (Julia Batishev's
presentation in sec 2 of Track 2 on July 3rd)
23Threshold Regression
- If Y(t) is a Wiener process, then the first
hitting time S has an inverse Gaussian
distribution.
Consider
24Semi-parametric Threshold Regression (Joint work
with Z. Yu and W. Tu)
When Y(t) is a Wiener process, the first
hitting time S has an inverse Gaussian
distribution.
We consider
where the functional form of $\theta(Z)$ is unspecified.
25Semi-parametric TR using Regression Spline
- Use a linear link for covariates $X_1, \ldots, X_p$.
- For covariate Z, consider the nonparametric
function $\theta(z)$ as a linear combination of a set of
basis functions $B_j(z)$:
- $\theta(z) = \sum_j \beta_j B_j(z)$.
- Select the smoothing parameter and the number of
knots.
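To illustrate the linear-combination idea, the sketch below evaluates a spline built from a truncated linear basis. The choice of basis, the knot locations and the coefficients are all assumptions made for illustration; the slides do not say which basis was used.

```python
def truncated_linear_basis(z, knots):
    """Basis functions B_j(z): z itself plus one hinge (z - k)_+ per knot."""
    return [z] + [max(0.0, z - k) for k in knots]

def spline_value(z, coefs, knots):
    """Evaluate sum_j beta_j * B_j(z) for the truncated linear basis."""
    basis = truncated_linear_basis(z, knots)
    return sum(b * basis_j for b, basis_j in zip(coefs, basis))

# Hypothetical coefficients and knots at z = 1 and z = 2.
value = spline_value(2.5, coefs=[1.0, -0.5, 2.0], knots=[1.0, 2.0])
```

Because the function is linear in the coefficients, the basis values can be treated as ordinary covariates in the link function, and the number of knots controls how flexible the fitted curve can be.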
27Semi-parametric TR using Penalized Spline
- In addition to the regression spline, we also
consider a cubic spline approach with a penalty on
the second derivative of the nonparametric
function.
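A penalty of this kind is commonly written as a penalized log-likelihood. The form below is a generic sketch rather than the authors' exact criterion: $\ell$ stands for the threshold regression log-likelihood, $\theta(z)$ for the nonparametric function, $\beta$ for the remaining regression coefficients, and $\lambda \ge 0$ for the smoothing parameter; all of this notation is assumed here.

```latex
\ell_p(\beta, \theta) \;=\; \ell(\beta, \theta) \;-\; \frac{\lambda}{2} \int \bigl\{ \theta''(z) \bigr\}^2 \, dz
```

A larger $\lambda$ forces the fitted function toward a straight line, while $\lambda = 0$ returns the unpenalized spline fit.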
28Cross Validation
29Simulation Procedures
30Simulation Results
Table 1 summarizes simulation results using both
spline approaches. For the penalized spline
approach, over 400 replications, the means of the
two $\gamma$ estimates are 0.491 and 0.504 (very close
to the true value $\gamma = 0.5$). The means of the
estimated standard errors are 0.311 and 0.180, which
are very close to the empirical standard errors 0.307
and 0.177. The empirical coverage probabilities are
0.952 and 0.947. Our simulation results show
satisfactory performance of the penalized spline
approach with respect to the regression coefficients
and the corresponding variance estimation. The mean
of the estimate for $\nu$ over 400 replications is
1.99, which is very close to the true value 2.
36Analyzing Longitudinal Survival Data Using
Threshold Regression: Comparison with Cox
Regression
Mei-Ling Ting Lee, University of Maryland
G. A. Whitmore, McGill University
Bernard Rosner, Harvard Medical School
37Longitudinal Data Structure
38Longitudinal Data Structure
- Health examples
- Annual monitoring of blood pressure
- Current status of disease
- Cohort study of smoking and lung cancer with
bi-annual medical checks
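39[Figure: process X(t) starting at x0, observed at a sequence of time points 0, t1, t2, ..., tm, with an observation (x, z, f, c) at each time point and first hitting time S]
Figure: Longitudinal data structure for threshold
regression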
40Longitudinal Data Structure (cont.)
- Individual observation sequences may include
- Process level x
- Covariate vector z
- Failure indicator f
- Censoring indicator c
41Longitudinal Data Structure (cont.)
42Uncoupling Longitudinal Data
- Definition of uncoupling: Break each longitudinal
record into a series of single records.
- Handling longitudinal data is simple with
uncoupling.
- Under what conditions is uncoupling valid?
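The mechanics of uncoupling can be sketched as follows. The record layout (a list of visits, each with time `t`, covariates `z`, failure indicator `f` and censoring indicator `c`) is a hypothetical structure chosen for illustration, as is the function name.

```python
def uncouple(record):
    """Break one longitudinal record into a series of single-interval records.

    Each output row covers one interval between consecutive visits and stops
    at the first visit where failure or censoring is recorded.
    """
    visits = record["visits"]
    rows = []
    for start, end in zip(visits, visits[1:]):
        rows.append({
            "id": record["id"],
            "t_start": start["t"],
            "t_end": end["t"],
            "z": start["z"],      # covariates measured at the start of the interval
            "failed": end["f"],   # failure indicator observed at the end of the interval
        })
        if end["f"] or end["c"]:
            break
    return rows

# Hypothetical subject with visits every two years, failing by the third visit.
record = {
    "id": 1,
    "visits": [
        {"t": 0, "z": {"age": 50}, "f": 0, "c": 0},
        {"t": 2, "z": {"age": 52}, "f": 0, "c": 0},
        {"t": 4, "z": {"age": 54}, "f": 1, "c": 0},
    ],
}
rows = uncouple(record)
```

Each resulting row can then be analyzed as if it were an independent observation, which is only justified under conditions such as a Markov assumption for the process.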
43Analysis of Longitudinal Threshold Regression
Data with Uncoupling
Define the observation vector for each visit j
The longitudinal observation sequence is stopped
by censoring or failure at some visit m
44Analysis of Longitudinal Threshold Regression
Data with Uncoupling
Probability of the stopped observation sequence
Invoke a Markov assumption
45A Common Analytical Situation for $P(A_j \mid A_{j-1})$
Consider independent censoring c, failure
indicator f (f = 1 if failed, f = 0 if not),
process reading x, and covariate z.
Without readings on the process X(t) (i.e., the
process of interest is latent), the probability
elements for the likelihood then involve simple
conditional expressions
46Case Illustration
Threshold regression output for the illustrative
longitudinal data set
Assume X(t) follows a standardized Brownian
motion from initial status x0 to a fixed barrier
at zero. We can then make inferences about the
influence of covariates on x0 at each visit.
47Case Illustration
48Case Illustration (cont.)
Threshold regression output has the familiar
look of conventional regression output but offers
greater insights and a richer interpretation
49Uncoupling in Cox PH Regression
The main probability element
has the following form when Cox PH regression is
uncoupled
Term $h_{0j}$ denotes a segment of a discrete baseline
hazard function over $[t_{j-1}, t_j)$.
50Converting Time-varying Covariates to Fixed
Covariates in Cox Regression: Stata Programs
51Output 1
52Output
53Case Illustration: The Nurses' Health Study
cohort data set
- Questionnaire completed every two years
- Interest in incidence of lung cancer
- Longitudinal records from 1986 to 2000
- 115,768 subjects
- 748,007 observational intervals
- 1,577,382 person-years at risk
- The health process is latent.
54Case Illustration (cont.)
- Assume independent censoring.
- Assume a Wiener diffusion process with zero mean
and unit variance (Brownian motion).
- A zero mean is consistent with the data.
- A latent health status scale allows the variance
to be fixed arbitrarily.
- The only parameter is x0, the initial health
status at baseline for each observation interval.
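Under these assumptions the probability of failure within one observation interval has a closed form. For Brownian motion with zero drift and unit variance started at x0 > 0, the reflection principle gives P(hit zero within dt) = 2 Phi(-x0 / sqrt(dt)). The sketch below simply evaluates that expression; the function name and the numeric inputs are hypothetical.

```python
import math

def interval_failure_prob(x0, dt):
    """P(process started at x0 > 0 reaches zero within an interval of length dt).

    Assumes zero drift and unit variance, so the probability is
    2 * Phi(-x0 / sqrt(dt)), written here with the complementary error function.
    """
    return math.erfc(x0 / math.sqrt(2.0 * dt))

# Hypothetical values: initial health status 3.0 over a two-year interval.
p_fail = interval_failure_prob(3.0, 2.0)
```

A larger initial health status x0 gives a smaller failure probability over the interval, which is how covariate effects on x0 translate into risk.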
55Case Illustration (cont.)
- For each observation interval, the covariates
are
- Baseline cumulative smoking (pkyrs0, in pack-years)
- Baseline age (age0, in years)
- A log-linear link is used, i.e., ln(x0)
56Case Illustration (cont.)
Threshold regression output based on longitudinal
uncoupling
57Case Illustration (cont.)
Uncoupling breaks each longitudinal record in the
data set into a series of single records. Once
parameters are estimated, predictive inferences
can be made by splicing together forward records
of a case using specified covariate conditions.
In essence, splicing is the reverse of
uncoupling.
58Case Illustration (cont.)
The splicing process involves multiplying
estimated conditional event probabilities as
follows.
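One way to picture splicing, under assumptions that are not taken from the slides: suppose each forward interval has its own estimated initial status x0, the process is Brownian motion with zero drift and unit variance, and all intervals have equal length. The survival probability through k intervals is then the product of the k conditional survival probabilities. All names and values below are hypothetical.

```python
import math

def interval_failure_prob(x0, dt):
    """P(failure within dt | alive at start), zero drift and unit variance assumed."""
    return math.erfc(x0 / math.sqrt(2.0 * dt))

def spliced_survival(x0_by_interval, dt):
    """Multiply conditional survival probabilities across successive intervals.

    Returns the cumulative survival probability at the end of each interval.
    """
    surv = 1.0
    curve = []
    for x0 in x0_by_interval:
        surv *= 1.0 - interval_failure_prob(x0, dt)
        curve.append(surv)
    return curve

# Hypothetical predicted x0 values for three successive two-year intervals.
curve = spliced_survival([3.0, 2.5, 2.0], dt=2.0)
```

In practice the x0 values would come from the fitted link function evaluated at the covariate conditions specified for each forward interval.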
59Partial Likelihood
Explicit representation of the initial probability
and one-step transition probabilities
60Partial Likelihood (cont.)
Factor conditional probability
61Partial Likelihood (cont.)
Build the partial likelihood for parameters of the
parent process, running time and boundary. Set
aside the likelihood contribution of the covariate
process and censoring mechanism.
62Common Analytical Situation (cont.)
Longitudinal records are broken into a series of
single records with the following elements.