Designing longitudinal studies in epidemiology - PowerPoint PPT Presentation

1
Designing longitudinal studies in epidemiology
  • Donna Spiegelman
  • Professor of Epidemiologic Methods
  • Departments of Epidemiology and Biostatistics
  • stdls_at_channing.harvard.edu
  • Xavier Basagana
  • Doctoral Student, Department of Biostatistics,
    Harvard School of Public Health

2
Background
  • We develop methods for the design of longitudinal
    studies for the most common scenarios in
    epidemiology
  • There already exist some formulas for power and
    sample size calculations in this context.
  • All prior work has been developed for clinical
    trials applications

3
Background
  • Based on clinical trials
  • Some are based on test statistics (e.g. ANCOVA) that
    are not valid, or are less efficient, in an
    observational context.

4
Background
  • Based on clinical trials
  • In clinical trials:
  • The time measure of interest is time from
    randomization, so everyone starts at the same time.
    We consider situations where, for example, age is
    the time variable of interest, and subjects do
    not start at the same age.
  • Exposures are time-invariant
  • Exposure (treatment) prevalence is 50% by design

5
Xavier Basagaña's Thesis
  • Derive study design formulas based on tests that
    are valid and efficient for observational
    studies, for two reasonable alternative
    hypotheses.
  • Comprehensively assess the effect of all
    parameters on power and sample size.
  • Extend the formulas to a context where not all
    subjects enter the study at the same time.
  • Extend the formulas to the case of time-varying
    covariates, and compare them with the
    time-invariant covariates case.

6
Xavier Basagaña's Thesis
  • Derive the optimal combination of number of
    subjects (n) and number of repeated measures
    (r+1) when subject to a cost constraint.
  • Create a computer program to perform design
    computations. Intuitive parameterization and easy
    to use.

7
Notation and Preliminary Results
8
  • We study two alternative hypotheses
  • Constant Mean Difference (CMD).

9
  • Linearly Divergent Differences (LDD)
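
As a sketch of the two alternatives, assume a binary, time-invariant exposure k_i, a time variable t_ij for measurement j on participant i, and coefficient names β0 to β3 chosen here for illustration. The mean response can then be written as:

```latex
\begin{align*}
\text{CMD:}\quad & E(Y_{ij}\mid k_i) = \beta_0 + \beta_1 t_{ij} + \beta_2 k_i \\
\text{LDD:}\quad & E(Y_{ij}\mid k_i) = \beta_0 + \beta_1 t_{ij} + \beta_2 k_i + \beta_3\, t_{ij} k_i
\end{align*}
```

Under this notation the CMD test concerns β2 (a constant gap between exposed and unexposed), and the LDD test concerns β3 (a gap that grows linearly with time).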

10
Intuitive parameterization of the alternative
hypothesis
  • μ00: the mean response at baseline (or at the
    mean initial time) in the unexposed group
  • p1: the percent difference between exposed and
    unexposed groups at baseline (or at the mean
    initial time)

11
Intuitive parameterization of the alternative
hypothesis (2)
  • p2: the percent change from baseline (or from the
    mean initial time) to end of follow-up (or to
    the mean final time) in the unexposed group
  • When τ is not fixed, p2 is defined at time s
    instead of at time τ
  • p3: the percent difference between the change from
    baseline (or from the mean initial time) to end
    of follow-up (or mean final time) in the exposed
    group and the unexposed group
  • When p2 = 0, p3 is defined as the
    percent change from baseline (or from the mean
    initial time) to the end of follow-up (or to the
    mean final time) in the exposed group
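
One plausible way to connect the percent-scale quantities to regression coefficients is sketched below. It assumes an LDD mean model E(Y) = β0 + β1 t + β2 k + β3 t k with all participants starting at t = 0, follow-up length τ, and baseline mean μ00 = β0 in the unexposed; the symbols p1, p2, p3 and the coefficient names are illustrative assumptions, not notation confirmed by the slides.

```latex
\begin{align*}
p_1 &= \frac{\beta_2}{\mu_{00}}, &
p_2 &= \frac{\beta_1 \tau}{\mu_{00}}, &
p_3 &= \frac{\beta_3}{\beta_1} \quad (p_2 \neq 0), \\
& & & &
p_3 &= \frac{\beta_3 \tau}{\mu_{00}(1 + p_1)} \quad (p_2 = 0).
\end{align*}
```

The last line simply restates "percent change from baseline to end of follow-up in the exposed group" when the unexposed group has no trend.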

12
Notation and Preliminary Results
  • We consider studies where the interval between
    visits (s) is fixed but the duration of the study
    is free (e.g. participants may respond to
    questionnaires every two years)
  • Increasing r involves increasing the duration of
    the study
  • We also consider studies where the duration of
    the study, τ, is fixed, but the interval between
    visits is free (e.g. the study is 5 years long)
  • Increasing r involves increasing the frequency of
    the measurements, i.e. decreasing s
  • τ = s × r.

13
Notation and Preliminary Results
  • Model
  • The generalized least squares (GLS) estimator of
    B is
  • Power formula
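
For orientation, the standard textbook forms of these three items are sketched here. They assume a linear model with design matrix X_i, residual covariance Σ_i, a two-sided Wald test at level α, and the usual approximation that ignores the opposite tail; the symbols are illustrative and may differ from those on the original slide.

```latex
\begin{align*}
Y_i &= X_i B + e_i, \qquad \operatorname{Var}(e_i) = \Sigma_i, \\
\hat{B} &= \Bigl(\sum_{i=1}^{N} X_i^{\top} \Sigma_i^{-1} X_i\Bigr)^{-1}
           \sum_{i=1}^{N} X_i^{\top} \Sigma_i^{-1} Y_i, \\
\text{Power} &\approx \Phi\!\left(\frac{|\beta|}{\sqrt{\operatorname{Var}(\hat{\beta})}} - z_{1-\alpha/2}\right),
\end{align*}
```

where β is the coefficient being tested and Var(β̂) is the corresponding diagonal element of the inverse of the summed information matrix.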

14
Notation and Preliminary Results
  • Let σ^lm be the (l,m)th element of Σ⁻¹
  • Assuming that the time distribution is
    independent of exposure group.
  • Then, under CMD
  • Under LDD

15
Correlation structures
  • We consider three common correlation structures
  • Compound symmetry (CS).

16
Correlation structures
  • Damped Exponential (DEX)

  • θ = 0: CS
  • θ = 0.3: between CS and AR(1)
  • θ = 1: AR(1)
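
The role of the damping coefficient can be seen in a short sketch. It assumes the usual damped exponential form corr(j, k) = ρ^(|t_j − t_k|^θ); the function name `dex_corr` and the example visit times are illustrative, not taken from the slides.

```python
import numpy as np

def dex_corr(times, rho, theta):
    """Damped exponential correlation: rho ** (|t_j - t_k| ** theta)."""
    t = np.asarray(times, dtype=float)
    lag = np.abs(t[:, None] - t[None, :])
    corr = rho ** (lag ** theta)
    # A measurement is perfectly correlated with itself, whatever theta is.
    np.fill_diagonal(corr, 1.0)
    return corr

visits = [0, 1, 2, 3]  # hypothetical visit times, one unit apart
for theta in (0.0, 0.3, 1.0):
    print(f"theta = {theta}")
    print(np.round(dex_corr(visits, rho=0.3, theta=theta), 3))
```

With θ = 0 every off-diagonal entry equals ρ (compound symmetry); with θ = 1 the correlation decays as ρ raised to the lag (AR(1)); intermediate values decay more slowly than AR(1).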
17
Correlation structures
  • Random intercepts and slopes (RS).
  • Reparameterizing in terms of two reliabilities:
  • the reliability coefficient at baseline
  • the slope reliability at the end of follow-up
    (0 is CS; 1 means all variation in slopes is
    between subjects).
  • With this correlation structure, the variance of
    the response changes with time, i.e. this
    correlation structure gives a heteroscedastic
    model.
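
A minimal sketch of why random intercepts and slopes give a heteroscedastic model, assuming the standard mixed-model covariance Z D Zᵀ + σ²_w I. The function name `rs_cov` and all variance values are hypothetical choices for illustration.

```python
import numpy as np

def rs_cov(times, var_b0, var_b1, cov_b01, var_w):
    """Marginal covariance of one subject's responses under random
    intercepts and slopes: Z D Z' + var_w * I."""
    t = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(t), t])      # intercept and time
    D = np.array([[var_b0, cov_b01],
                  [cov_b01, var_b1]])              # random-effects covariance
    return Z @ D @ Z.T + var_w * np.eye(len(t))

# Hypothetical variance components, visits at times 0, 2 and 4.
cov = rs_cov([0, 2, 4], var_b0=4.0, var_b1=0.5, cov_b01=0.0, var_w=8.0)
print(np.diag(cov))   # the response variance grows with time

# With no slope variance the structure collapses to compound symmetry.
cs = rs_cov([0, 2, 4], var_b0=4.0, var_b1=0.0, cov_b01=0.0, var_w=8.0)
print(cs)
```

The diagonal of the first matrix changes with time, while the second has a constant diagonal and equal off-diagonal entries.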

18
Example
  • Goal is to investigate the effect of indicators
    of socioeconomic status and post-menopausal
    hormone use on cognitive function (CMD) and
    cognitive decline (LDD)
  • Pilot study by Lee S, Kawachi I, Berkman LF,
    Grodstein F (Education, other socioeconomic
    indicators, and cognitive function. Am J
    Epidemiol 2003;157:712-720). Referred to below
    as Grodstein.
  • Design questions include power of the published
    study to detect effects of specified magnitude,
    the number and timing of additional tests in
    order to obtain a study with the desired power to
    detect effects of specified magnitude, and the
    optimal number of participants and measurements
    needed in a de novo study of these issues

19
Example
  • At baseline and at one time subsequently, six
    cognitive tests were administered to 15,654
    participants in the Nurses' Health Study
  • Outcome: Telephone Interview for Cognitive Status
    (TICS)
  • μ00 = 32.7 (4)
  • Implies model
  • 1 point/10 years of age

20
Example
  • Exposure: Graduate school degree vs. not (GRAD)
  • Corr(GRAD, age) = -0.01
  • points
  • Exposure: Post-menopausal hormone use (CURRHORM)
  • Corr(CURRHORM, age) = -0.06
  • points
  • Time: age (years) is the best choice, not
    questionnaire cycle or calendar year of test
  • The mean age was 74 and V(t0) ≈ 4.

21
Example
  • The estimated covariance parameters were
  • SAS code to fit the LDD model with CS covariance:
      proc mixed;
        class id;
        model tics = grad age grad*age / s;  /* s prints the fixed-effect solution */
        random id;                           /* random intercept per participant gives CS */
      run;
  • SAS code to fit the LDD model with RS covariance

22
Program optitxs.r makes it all possible
28
http://www.hsph.harvard.edu/faculty/spiegelman/software.html
29
http://www.hsph.harvard.edu/faculty/spiegelman/optitxs.html
35
Illustration of use of software: optitxs.r
  • We'll calculate the power of Grodstein's
    published study to detect the observed 70%
    difference in rates of decline between those with
    more than high school vs. others
  • Recall that 6.2% of NHS had more than high
    school; there was a 0.3% decline in cognitive
    function per year

36
> long.power()
Press <Esc> to quit
Constant mean difference (CMD) or Linearly divergent difference (LDD)? ldd
The alternative is LDD.
Enter the total sample size (N): 15000
Enter the number of post-baseline measures (r>0): 1
Enter the time between repeated measures (s): 2
Enter the exposure prevalence (pe) (0<pe<1): 0.062
Enter the variance of the time variable at baseline, V(t0) (enter 0 if all participants begin at the same time): 4
Enter the correlation between the time variable at baseline and exposure, rho_{e,t0} (enter 0 if all participants begin at the same time): -0.01
Will you specify the alternative hypothesis on the absolute (beta coefficient) scale (1) or the relative (percent) scale (2)? 2
The alternative hypothesis will be specified on the relative (percent) change scale.
37
Enter mean response at baseline among unexposed (mu00): 32.7
Enter the percent change from baseline to end of follow-up among unexposed (p2) (e.g. enter 0.10 for a 10% change): -0.006
Enter the percent difference between the change from baseline to end of follow-up in the exposed group and the unexposed group (p3) (e.g. enter 0.10 for a 10% difference): 0.7
Which covariance matrix are you assuming: compound symmetry (1), damped exponential (2) or random slopes (3)? 2
You are assuming DEX covariance
Enter the residual variance of the response given the assumed model covariates (sigma2): 12
Enter the correlation between two measures of the same subject separated by one unit (rho): 0.3
Enter the damping coefficient (theta): 0.10
Power = 0.4206059
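
The arithmetic behind a session like this can be approximated in a few lines. The following is a sketch under stated assumptions, not the optitxs.r source: it assumes a GLS-based Wald test of the exposure-by-time coefficient, DEX correlation, equal within-group variance of baseline time V(t0)(1 − ρ²_{e,t0}), and the conversions β1 = p2·μ00/τ and β3 = p3·β1. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def dex_corr(r, s, rho, theta):
    """DEX correlation for r + 1 visits spaced s time units apart."""
    t = s * np.arange(r + 1)
    lag = np.abs(t[:, None] - t[None, :])
    corr = rho ** (lag ** theta)
    np.fill_diagonal(corr, 1.0)
    return corr

def ldd_power(N, r, s, pe, v_t0, rho_e_t0, beta3, sigma2, rho, theta, alpha=0.05):
    """Approximate power to detect the exposure-by-time coefficient beta3."""
    w = np.linalg.inv(dex_corr(r, s, rho, theta))
    one = np.ones(r + 1)
    d = s * np.arange(r + 1)              # time since baseline at each visit
    a, b, c = one @ w @ one, one @ w @ d, d @ w @ d
    # Assumption: baseline-time variance within each exposure group.
    v_within = v_t0 * (1.0 - rho_e_t0 ** 2)
    info = a * v_within + c - b ** 2 / a  # information about a slope, per subject
    var_b3 = sigma2 / (N * pe * (1.0 - pe) * info)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(float(abs(beta3) / np.sqrt(var_b3)) - z_crit)

# Inputs taken from the session above; the conversions are assumptions.
mu00, p2, p3, s, r = 32.7, -0.006, 0.7, 2, 1
beta1 = p2 * mu00 / (s * r)   # yearly change among the unexposed
beta3 = p3 * beta1            # difference in yearly change, exposed vs unexposed
power = ldd_power(N=15000, r=r, s=s, pe=0.062, v_t0=4, rho_e_t0=-0.01,
                  beta3=beta3, sigma2=12, rho=0.3, theta=0.10)
print(round(power, 4))
```

Under these assumptions the result should land close to the 0.42 reported by the program; any discrepancy would indicate that the program uses a different variance expression or parameter conversion.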
38
Power of current study
  • To detect the observed 70% difference in
    cognitive decline by GRAD
  • CS: 44%
  • RS: 35%
  • DEX: 42%
  • To detect a hypothesized 10% difference in
    cognitive decline by current hormone use
  • CS, DEX: 7%
  • RS: 6%

39
How many additional measurements are needed when
tests are administered every 2 years (i.e. how
many more years of follow-up are needed)...
  • To detect the observed 70% difference in
    cognitive decline by GRAD with 90% power?
  • CS, DEX, RS: 3 post-baseline
    measurements (6 years)
  • one more 5-year grant cycle
  • To detect a hypothesized 20% difference in
    cognitive decline by current hormone use with 90%
    power?
  • CS, DEX: 6 post-baseline
    measurements (12 years)
  • More than two 5-year grant cycles
  • N = 15,000 for these calculations

40
How many more measurements should be taken in
four (1 NIH grant cycle) and eight years of
follow-up (two NIH grant cycles)...
  • To detect the observed 70% difference in cognitive
    decline by GRAD with 90% power?
  • To detect a hypothesized 20% difference in cognitive
    decline by current hormone use with 90% power?

41
Optimize (N,r) in a new study of cognitive decline
  • Assume
  • 4 years of follow-up (1 NIH grant cycle)
  • cost of recruitment and baseline measurements are
    twice that of subsequent measurements
  • GRAD
  • (N, r) = (26,795, 1) CS
  • (26,930, 1) DEX
  • (28,945, 1) RS
  • CURRHORM
  • (N, r) = (97,662, 1) CS
  • (98,155, 1) DEX
  • (105,470, 1) RS
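
The kind of search behind such a table can be sketched as follows. This is an illustration under assumptions, not the program's algorithm: follow-up length τ is fixed, the first measurement costs κ times a later one, the target is 90% power for the LDD coefficient, and the cheapest (N, r) pair is chosen. All function names are hypothetical, and the covariance inputs are borrowed from the software session rather than from this slide, so the numbers need not match the table.

```python
import math
import numpy as np
from statistics import NormalDist

def slope_information(r, tau, rho, theta, v_t0):
    """Per-subject information about a slope with r + 1 equally spaced visits."""
    d = (tau / r) * np.arange(r + 1)
    lag = np.abs(d[:, None] - d[None, :])
    corr = rho ** (lag ** theta)          # DEX; theta = 0 gives CS
    np.fill_diagonal(corr, 1.0)
    w = np.linalg.inv(corr)
    one = np.ones(r + 1)
    a, b, c = one @ w @ one, one @ w @ d, d @ w @ d
    return float(a * v_t0 + c - b ** 2 / a)

def required_n(r, tau, pe, beta3, sigma2, rho, theta, v_t0, power=0.90, alpha=0.05):
    """Smallest N reaching the target power for the exposure-by-time effect."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    info = slope_information(r, tau, rho, theta, v_t0)
    return math.ceil(sigma2 * z ** 2 / (pe * (1 - pe) * info * beta3 ** 2))

def cheapest_design(kappa, r_max, **design):
    """Minimise N * (kappa + r), the cost in units of one follow-up measurement."""
    options = []
    for r in range(1, r_max + 1):
        n = required_n(r, **design)
        options.append((n * (kappa + r), n, r))
    cost, n, r = min(options)
    return n, r, cost

beta3 = 0.7 * (-0.006 * 32.7 / 2)   # assumed yearly difference in decline
n_opt, r_opt, cost = cheapest_design(kappa=2, r_max=8, tau=4, pe=0.062,
                                     beta3=beta3, sigma2=12, rho=0.3,
                                     theta=0.0, v_t0=4)
print(n_opt, r_opt, cost)
```

With these inputs the search settles on a single post-baseline measurement, which agrees with the r = 1 designs listed above.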

42
Conclusions
Re: Constant Mean Difference (CMD)
43
Conclusions
  • CMD
  • If all observations have the same cost, one would
    not take repeated measures.
  • If subsequent measures are cheaper, one would
    take no repeated measures or just a small number
    if the correlation between measures is large.
  • If deviations from CS exist, it is advisable to
    take more repeated measures.
  • Power increases as and as
  • Power increases as Var( ) goes to 0

44
Conclusions
  • LDD
  • If the follow-up period is not fixed, choose the
    maximum length of follow-up possible (except when
    RS is assumed).
  • If the follow-up period is fixed, one would take
    more than one repeated measure only when the
    subsequent measures are more than five times
    cheaper. When there are departures from CS,
    cost ratios around 10 or 20 are needed to justify
    taking 3 or 4 measures.
  • Power increases as , as , as
    slope reliability goes to 0, as Var(t0)
    increases, and as the correlation between
    t0 and exposure goes to 0

45
Conclusions
  • LDD
  • The optimal (N,r) and the resulting power can
    strongly depend on the correlation structure.
    Combinations that are optimal for one correlation
    may be bad for another.
  • All these decisions are based on power
    considerations alone. There might be other
    reasons to take repeated measures.
  • Sensitivity analysis. Our program.

46
Future work
  • Develop formulas for time-varying exposure.
  • Include dropout
  • For sample size calculations, simply inflate the
    sample size by a factor of 1/(1-f).
  • However, dropout can alter the relationship
    between N and r.
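
The inflation rule is a one-line calculation. The sketch below assumes f is the expected fraction of participants lost to dropout; the function name is hypothetical.

```python
import math

def inflate_for_dropout(n, f):
    """Inflate a sample size n by 1 / (1 - f) to allow for a dropout fraction f."""
    if not 0 <= f < 1:
        raise ValueError("dropout fraction f must satisfy 0 <= f < 1")
    return math.ceil(n / (1 - f))

# Example: a design needing 15,000 participants with 20% expected dropout.
print(inflate_for_dropout(15000, 0.20))
```

This adjusts N only; it does not capture how dropout changes the trade-off between N and r.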

47
Thanks!