Longitudinal Data Analysis and Survival Analysis

1
Longitudinal Data Analysis and Survival Analysis
  • Ming-Yu Fan, PhD
  • June 25, 2008

2
Outline
  • Longitudinal data
  • Methods for LDA
  • Robust standard error
  • Generalized Estimating Equation (GEE)
  • Random effects model
  • Survival data
  • Methods for analyzing survival data

3
Longitudinal Data
  • Each individual has multiple observations
  • The intervals between observations are
    approximately the same for each individual
  • Even intervals are nice but not necessary
  • Ex: depression severity (SCL-20) evaluated at
    baseline and at 3, 6, and 12 months

4
Why are longitudinal data desirable?
  • More information
  • Can control for individual heterogeneity
  • Can better assess causality than cross-sectional
    data

5
Problem with longitudinal data
  • Conventional statistical methods require
    independence between observations
  • Longitudinal data are likely to violate this
    assumption
  • Missing data due to attrition

6
Notation
  • $y_{it}$: $i$th individual, $t$th observation
  • $i = 1, 2, \ldots, n$; $t = 1, 2, \ldots, T$
  • $y_{i1}, y_{i2}, y_{i3}, \ldots, y_{iT}$ are very likely to be
    correlated

7
Example
  • Data from the IMPACT (PI Dr. Unützer) study
  • 8 organizations, total N = 1801
  • Outcome: SCL-20 measured at baseline and 3 months
  • Comparison between the intervention and the usual
    care groups

8
Example cont.
  • General linear model
  • $\beta$ = -0.1323, SE = 0.0222, t = -5.96, p-value < 0.0001
  • Ignores the correlation coefficient ($\rho$ = 0.4)
    between SCL00 and SCL03
  • Robust standard error
  • $\beta$ = -0.1323, SE = 0.0251, z = -5.26, p-value < 0.0001

9
Example 2
  • Suppose we are only interested in two large study
    sites (N = 551)
  • Outcome: MCS12 (mental health component score
    from the SF-12) measured at baseline and 3 months
  • Comparison between the intervention and the usual
    care groups

10
Example 2 cont.
  • General linear model
  • $\beta$ = 0.8088, SE = 0.4117, t = 1.96, p-value =
    0.0497
  • Ignores the correlation coefficient ($\rho$ = 0.23)
    between MCS1200 and MCS1203
  • Robust standard error
  • $\beta$ = 0.8088, SE = 0.4448, z = 1.82, p-value =
    0.0690
  • The robust standard error is greater than the SE
    estimated without accounting for correlation
  • Different methods lead to different conclusions
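
The z statistic and p-value on the robust line follow directly from the estimate and its standard error. A minimal sketch, assuming a two-sided test against a standard normal reference (the function name is hypothetical):

```python
import math

def two_sided_p_from_z(estimate, se):
    """Return (z, p) for a two-sided Wald test under a standard normal reference."""
    z = estimate / se
    # P(|Z| > |z|) for Z ~ N(0, 1), written with the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Estimate and robust SE as reported on the slide.
z, p = two_sided_p_from_z(0.8088, 0.4448)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The same estimate divided by the smaller conventional SE gives a statistic just above the 0.05 cutoff, which is why the two methods disagree here.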

11
Methods for LDA
  • Robust standard error
  • Generalized Estimating Equations (GEE)
  • Random effects models (hierarchical models)

12
Robust standard error
  • Ordinary least squares covariance estimator
  • Robust covariance estimator
  • Residuals
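
The formulas on this slide were not captured in the transcript. As a sketch in standard textbook form (the symbols $X_i$, $e_i$, and $\hat\sigma^2$ are assumptions, not taken from the slide), for individual $i$ with design matrix $X_i$ and outcome vector $y_i$:

```latex
\begin{aligned}
\text{OLS:}\quad & \widehat{\mathrm{Var}}(\hat\beta)
  = \hat\sigma^2 \Big(\sum_{i=1}^{n} X_i^\top X_i\Big)^{-1} \\
\text{Robust:}\quad & \widehat{\mathrm{Var}}_R(\hat\beta)
  = \Big(\sum_{i=1}^{n} X_i^\top X_i\Big)^{-1}
    \Big(\sum_{i=1}^{n} X_i^\top e_i e_i^\top X_i\Big)
    \Big(\sum_{i=1}^{n} X_i^\top X_i\Big)^{-1} \\
\text{Residuals:}\quad & e_i = y_i - X_i \hat\beta
\end{aligned}
```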

13
Robust standard error cont.
  • Robust standard errors are usually larger than
    conventional standard errors, but it is possible
    to see a smaller robust standard error
  • Robust standard errors may be inaccurate if the
    sample sizes are small
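
To make the comparison concrete, here is a minimal sketch for a regression with one covariate, using hypothetical data (4 individuals, 2 observations each) rather than the IMPACT data. It assumes the cluster-robust sandwich form without any small-sample correction, so packaged software may report slightly different values:

```python
import math

def slope_with_naive_and_robust_se(ids, x, y):
    """Simple linear regression of y on x with two standard errors for the slope.

    ids: individual identifier for each observation (defines the clusters)
    Returns (slope, naive_se, robust_se).
    """
    n_obs = len(y)
    x_bar = sum(x) / n_obs
    y_bar = sum(y) / n_obs
    xc = [xi - x_bar for xi in x]                      # centered covariate
    sxx = sum(v * v for v in xc)
    slope = sum(v * (yi - y_bar) for v, yi in zip(xc, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional (OLS) standard error: treats all observations as independent.
    sigma2 = sum(e * e for e in resid) / (n_obs - 2)
    naive_se = math.sqrt(sigma2 / sxx)

    # Cluster-robust (sandwich) standard error: sum the score contributions
    # within each individual before squaring, so within-person correlation counts.
    score_by_id = {}
    for i, v, e in zip(ids, xc, resid):
        score_by_id[i] = score_by_id.get(i, 0.0) + v * e
    meat = sum(s * s for s in score_by_id.values())
    robust_se = math.sqrt(meat / (sxx * sxx))
    return slope, naive_se, robust_se

# Hypothetical data: x is a group indicator, y is repeated within each person.
ids = [1, 1, 2, 2, 3, 3, 4, 4]
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 1, -1, -1, 3, 3, 1, 1]
slope, naive_se, robust_se = slope_with_naive_and_robust_se(ids, x, y)
print(slope, round(naive_se, 4), round(robust_se, 4))
```

In this toy data the two observations per person are identical, so the conventional SE overstates the amount of independent information and comes out smaller than the robust SE.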

14
Generalized Estimating Equation (GEE)
  • Estimate $\beta$ by solving the following equation
    (Wedderburn, 1974)
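
The equation itself is missing from the transcript. In the usual notation (an assumption here, following the standard GEE literature rather than the slide), with $\mu_i(\beta) = E(y_i)$, $D_i = \partial \mu_i / \partial \beta$, and $V_i$ the working covariance matrix of $y_i$, the estimating equation has the form:

```latex
\sum_{i=1}^{n} D_i^\top V_i^{-1} \big( y_i - \mu_i(\beta) \big) = 0
```

$V_i$ is built from the correlation structure assumed among $y_{i1}, \ldots, y_{iT}$.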

15
Variance Structure
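  • Covariance matrix for $y_{i1}, y_{i2}, y_{i3}, \ldots, y_{iT}$
  • $\mathrm{Corr}(y_{ik}, y_{il}) = \rho_{kl} \neq 0$
  • $\mathrm{Corr}(y_{ik}, y_{jl}) = 0$ when $i \neq j$ (for both $k = l$ and
    $k \neq l$)
  • $\rho_{kl} = \rho_{lk}$ (symmetry)
  • Ex: T = 4, need to estimate 6 correlation
    coefficients ($\rho_{12}, \rho_{13}, \rho_{14}, \rho_{23}, \rho_{24}, \rho_{34}$)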

16
Possible Variance Structure
  • Unstructured (UN): no constraints on the $\rho_{kl}$
  • Exchangeable (EXCH): $\rho_{kl} = \rho$ for all $k \neq l$
  • 1st order autoregressive (AR): $\rho_{kl} = \rho^{|k-l|}$
  • Banded structure (MDEP(m)): $\rho_{kl} = \rho_{|k-l|}$ when
    $|k-l| \le m$, otherwise $\rho_{kl} = 0$
  • Independent (IND) → robust standard errors: $\rho_{kl} = 0$
    for all $k \neq l$

17
Example of Variance Structure
  • Outcome measured at 4 time points: $y_{i1}, y_{i2}, y_{i3}, y_{i4}$
  • In total 6 correlation coefficients
    ($\rho_{12}, \rho_{13}, \rho_{14}, \rho_{23}, \rho_{24}, \rho_{34}$)
  • Unstructured (UN) → need to estimate all 6
    correlation coefficients
  • Exchangeable (EXCH) → need to estimate only 1
    correlation coefficient
  • 1st order autoregressive (AR): $\rho_{12} = \rho$, $\rho_{13} = \rho^2$,
    $\rho_{14} = \rho^3$, ... → need to estimate only 1 correlation
    coefficient
  • (e.g. $\rho_{12} = 0.4$, $\rho_{13} = 0.16$, $\rho_{14} = 0.064$, ...)
  • Banded structure (MDEP(2)): $\rho_{12} = \rho_{23} = \rho_{34}$,
    $\rho_{13} = \rho_{24}$, $\rho_{14} = 0$ → need to estimate 2 correlation
    coefficients (distance 1, 2)
  • Independent (IND) → no need to estimate the
    correlation coefficient
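
The structures above can be written out as explicit matrices. This is an illustrative sketch only: the function names and argument conventions are hypothetical and do not correspond to any SAS or Stata option.

```python
def working_correlation(structure, T, rho=None, m=None):
    """Build a T x T working correlation matrix as a list of lists.

    structure: "IND", "EXCH", "AR", or "MDEP"
    rho: a single number for EXCH and AR; for MDEP, a list whose entry
         d - 1 is the correlation at distance d (d = 1, ..., m)
    """
    R = [[1.0 if k == l else 0.0 for l in range(T)] for k in range(T)]
    for k in range(T):
        for l in range(T):
            d = abs(k - l)
            if d == 0 or structure == "IND":
                continue
            if structure == "EXCH":
                R[k][l] = rho
            elif structure == "AR":
                R[k][l] = rho ** d
            elif structure == "MDEP":
                R[k][l] = rho[d - 1] if d <= m else 0.0
            else:
                raise ValueError(f"unknown structure: {structure}")
    return R

def n_correlation_parameters(structure, T, m=None):
    """Number of correlation coefficients that must be estimated."""
    return {"UN": T * (T - 1) // 2, "EXCH": 1, "AR": 1, "MDEP": m, "IND": 0}[structure]

# First row of the AR structure with rho = 0.4 and T = 4.
ar = working_correlation("AR", 4, rho=0.4)
print([round(v, 3) for v in ar[0]])
```

The unstructured case has no formula to fill in, since each of its T(T-1)/2 entries is a free parameter, so only its parameter count is included.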

18
GEE cont.
  • GEE produces efficient estimates of the
    coefficients
  • Can assume different variance structures. The
    results are usually robust to the choice
  • GEE assumes the drop-outs are Missing Completely
    At Random

19
Random effects models
  • General linear model
  • Random effects model
  • Other names: hierarchical models, multilevel
    models, mixed models
  • This model is also called a random intercepts
    model. It implies equal correlations and thus is
    equivalent to the exchangeable model in GEE
    estimation
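
The two model equations are not in the transcript. A common way to write them (notation assumed: $x_{it}$ a covariate, $u_i$ the individual-level random intercept, $\varepsilon_{it}$ the residual, with $u_i$ and $\varepsilon_{it}$ independent) is:

```latex
\begin{aligned}
\text{General linear model:}\quad & y_{it} = \beta_0 + \beta_1 x_{it} + \varepsilon_{it} \\
\text{Random effects model:}\quad & y_{it} = \beta_0 + \beta_1 x_{it} + u_i + \varepsilon_{it},
  \qquad u_i \sim N(0, \sigma_u^2),\ \ \varepsilon_{it} \sim N(0, \sigma_\varepsilon^2) \\
\text{Implied correlation:}\quad & \mathrm{Corr}(y_{ik}, y_{il})
  = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2} \quad (k \neq l)
\end{aligned}
```

Because this correlation does not depend on $k$ or $l$, it is the same for every pair of time points, which is the exchangeable structure.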

20
Random effects models cont.
  • Can have more than 2 levels
  • Allow for random coefficients / slopes
  • The dependent variable can have missing data
    under a weaker assumption: Missing At Random

21
Compared with GEE
  • Takes more computing time than GEE
  • Computationally less stable than GEE
  • More restrictive assumption on correlation
    structure (GEE can assume unstructured
    correlation)

22
LDA Methods for Continuous Dependent Variables
  • Robust standard errors
  • Can be derived using SAS Proc GENMOD (variance
    structure IND) or Proc SURVEYREG
  • GEE
  • Stata: xtgee, xtreg
  • SAS: Proc GENMOD
  • Choices of variance structure
  • Unstructured (UN)
  • Exchangeable (EXCH)
  • 1st order autoregressive (AR)
  • Banded structure (MDEP(m))

23
LDA Methods for Continuous Dependent Variables cont.
  • How to check variance structure?
  • Assign UN (unstructured) first
  • Use CORRW option to print out the estimated
    correlation matrix
  • Random effects models
  • Stata: xtreg
  • SAS: Proc MIXED

24
LDA Methods for Categorical Dependent Variables
  • Robust standard errors
  • SAS Proc GENMOD (with IND variance
    structure)
  • Specify the distribution family (e.g. Binomial for
    binary outcomes, Poisson for count data); the default
    is the Normal distribution
  • Can also use Proc SURVEYLOGISTIC for
    binary outcomes
  • GEE
  • Stata: xtlogit, xtpoisson, etc.
  • SAS: Proc GENMOD
  • Specify the distribution family for categorical
    dependent variables
  • Assume the variance structure to be UN, EXCH, AR, or
    MDEP(m)

25
LDA Methods for Categorical Dependent Variables cont.
  • Random effects models
  • Use SAS Proc NLMIXED (non-linear mixed models)
  • Can only estimate two-level models
  • Syntax is quite complicated
  • Computationally intensive and often unstable, and
    thus not recommended (by Dr. Paul D. Allison)
  • Can also use SAS Proc GLIMMIX (need to
    download it from the SAS web site:
    http://support.sas.com/rnd/app/da/glimmix.html)
  • According to Dr. Allison, Proc GLIMMIX:
  • Can handle more than two levels of data
  • Is much faster than NLMIXED
  • Has simpler syntax
  • Is inaccurate for small numbers of time points (e.g.
    2-3 points per person)

26
IMPACT Example
  • Outcome: SCL-20 measured at baseline, 3 months, 6
    months, 12 months, 18 months, and 24 months (time =
    0, 3, 6, 12, 18, 24)
  • Compare between intervention and usual care
    groups
  • Models are adjusted for age, sex, education,
    ethnicity, number of chronic conditions, and time

27
(No Transcript)
28
UN, AR, EXCH, or MDEP(5)?
  • MDEP(5), perhaps?

29
Survival Analysis
  • Outcome: failure and failure time
  • Unlike repeated measures, survival data have only
    1 outcome measure
  • Methods for recurrent events are available
  • Failure time is (often) the clock time between
    the time origin and failure
  • Time origin should be precisely defined
  • Ex: the date of randomization in a randomized
    clinical trial
  • Time origin doesn't have to be the same calendar
    time for all individuals
  • Censoring: individuals are not observed for the
    full time to failure

30
BMT Example
  • From Klein and Moeschberger (1997)
  • Sample: 137 patients who received a bone marrow
    transplant
  • At the time of transplant, each patient is
    classified into one of three risk categories
  • ALL (Acute Lymphoblastic Leukemia)
  • Low-risk AML (Acute Myeloid Leukemia)
  • High-risk AML
  • End point: disease-free survival in days
  • Time origin: the date of transplant
  • Failure: death or relapse
  • Censored: no death or relapse by the end of the
    study

31
Survival Function
  • Failure time T
  • Survival function: $S(t) = P(T > t)$
  • the probability that the failure time is greater
    than t
  • Hazard function
  • the chance that the failure occurs within the
    time interval $[t, t + \Delta t)$ (let $\Delta t$ be extremely
    small), given that the individual has survived to t
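
The hazard formula is not in the transcript. Its standard definition, expressed as a rate per unit time (notation assumed, using the same $T$ as the survival function), is:

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
```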

32
Estimating Survival Function
  • Life-table
  • Divide the period of observation into a series of
    time intervals (often of equal length)
  • Compute the number of deaths and number of
    censored survival times
  • Estimate the survival probability in each
    interval
  • Take the product of the probabilities
  • Kaplan-Meier estimate
  • Similar to Life-Table method
  • Each interval contains only one death, which occurs
    at the start of the interval
  • Cox proportional hazard model

33
A subsample from the BMT example
34
Life-Table
  • D = deaths; C = censored; N = number of
    individuals who are alive (at risk) at the beginning
    of the interval
  • $N' = N - (C/2)$ = number of individuals who are at
    risk during the interval
  • S(t) = cumulative survival
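
A minimal sketch of the life-table calculation, assuming the actuarial adjustment in which the number at risk during an interval is N minus half the censored count. The counts are hypothetical and are not the BMT subsample; the function name is illustrative.

```python
def life_table(n_start, intervals):
    """Actuarial life-table estimate of cumulative survival.

    n_start: number alive (at risk) at the beginning of the first interval
    intervals: list of (D, C) pairs, deaths and censored per interval
    Returns one row per interval: (N, D, C, N_adjusted, S).
    """
    rows = []
    n = n_start
    surv = 1.0
    for d, c in intervals:
        n_adj = n - c / 2.0          # adjusted number at risk: N - C/2
        p = 1.0 - d / n_adj          # conditional survival in this interval
        surv *= p                    # product of the interval probabilities
        rows.append((n, d, c, n_adj, surv))
        n = n - d - c                # at risk at the start of the next interval
    return rows

# Hypothetical counts: 10 individuals followed over three intervals.
for row in life_table(10, [(2, 0), (1, 2), (1, 0)]):
    print(row)
```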

35
Kaplan-Meier Estimate
  • The beginning of each interval is determined by
    death
  • Each interval contains one death (or more if
    there are ties)
  • N(t) includes individuals with censored data at t
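
The three points above can be sketched in a few lines. The data are hypothetical (not the BMT subsample), the function name is illustrative, and ties are handled by counting all deaths at the same time together.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t).

    times: observed time for each individual
    events: 1 if the time is a failure, 0 if it is censored
    Returns a list of (t, n_at_risk, deaths, S(t)) at each distinct failure time.
    """
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0
    steps = []
    for t in failure_times:
        # At risk at t: everyone whose observed time is at least t,
        # which includes individuals censored exactly at t.
        n_at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / n_at_risk
        steps.append((t, n_at_risk, deaths, surv))
    return steps

# Hypothetical data: six individuals, two of them censored.
times = [1, 2, 3, 3, 4, 5]
events = [1, 0, 1, 1, 0, 1]
for step in kaplan_meier(times, events):
    print(step)
```

Censored individuals never trigger a step in S(t); they only shrink the number at risk for later failure times.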

36
(No Transcript)
37
(No Transcript)
38
Cox Proportional Hazard Model
  • $h_0(t)$: baseline hazard function
  • The interpretation of $b_1$
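
The model equation is missing from the transcript. In its standard form, with covariates $x_1, \ldots, x_p$ (names assumed) and coefficients $b_1, \ldots, b_p$:

```latex
\begin{aligned}
h(t \mid x) &= h_0(t) \exp(b_1 x_1 + b_2 x_2 + \cdots + b_p x_p) \\
\frac{h(t \mid x_1 + 1, x_2, \ldots, x_p)}{h(t \mid x_1, x_2, \ldots, x_p)} &= \exp(b_1)
\end{aligned}
```

So $\exp(b_1)$ is the hazard ratio for a one-unit increase in $x_1$ with the other covariates held fixed, and it does not depend on $t$, which is the proportional hazards assumption.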

39
BMT Example
40
(No Transcript)
41
Reference
  • Wedderburn, R. W. M. (1974). Quasi-likelihood
    functions, generalized linear models, and the
    Gauss-Newton method. Biometrika, 61, 439-447.
  • Dr. Paul Allison's upcoming short courses:
    http://www.statisticalhorizons.com/index.html
  • Klein, J. P. and Moeschberger, M. L. (1997).
    Survival Analysis: Techniques for Censored and
    Truncated Data. New York: Springer-Verlag.