Designing longitudinal studies in epidemiology


1
Designing longitudinal studies in epidemiology

Donna Spiegelman
Professor of Epidemiologic Methods
Departments of Epidemiology and Biostatistics
stdls@channing.harvard.edu
Xavier Basagana
Doctoral Student
Department of Biostatistics,
Harvard School of Public Health

2
Background

We develop methods for the design of longitudinal studies for the most common scenarios in epidemiology.
Some formulas for power and sample size calculations already exist in this context.
However, all prior work was developed for clinical trial applications.

3
Background

Based on clinical trials:
Some are based on test statistics (e.g. ANCOVA) that are not valid, or are less efficient, in an observational context.

4
Background

Based on clinical trials. In clinical trials:
The time measure of interest is time from randomization, so everyone starts at the same time. We consider situations where, for example, age is the time variable of interest and subjects do not start at the same age.
Exposures (treatments) are time-invariant.
Exposure (treatment) prevalence is 50% by design.

5
Xavier Basagaña's Thesis

Derive study design formulas based on tests that
are valid and efficient for observational
studies, for two reasonable alternative
hypotheses.
Comprehensively assess the effect of all
parameters on power and sample size.
Extend the formulas to a context where not all
subjects enter the study at the same time.
Extend the formulas to the case of time-varying covariates, and compare the results with the time-invariant case.

6
Xavier Basagaña's Thesis

Derive the optimal combination of the number of subjects (n) and the number of repeated measures (r+1) subject to a cost constraint.
Create an easy-to-use computer program, with an intuitive parameterization, to perform the design computations.

7
Notation and Preliminary Results
8

We study two alternative hypotheses

Constant Mean Difference (CMD).

Linearly Divergent Differences (LDD)
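One way to write the two alternatives is sketched below. It assumes a binary exposure e_i and a time variable t_ij for measurement j of subject i. The coefficient labels are ours. The LDD form matches the SAS model fitted later for the example (tics = grad age grad*age), and CMD drops the interaction.

```latex
\begin{aligned}
\text{CMD:}\quad & E[Y_{ij}\mid e_i, t_{ij}] = \beta_0 + \beta_1 e_i + \beta_2 t_{ij}, & H_0\colon \beta_1 = 0\\
\text{LDD:}\quad & E[Y_{ij}\mid e_i, t_{ij}] = \beta_0 + \beta_1 e_i + \beta_2 t_{ij} + \beta_3 e_i t_{ij}, & H_0\colon \beta_3 = 0
\end{aligned}
```

Under CMD the exposed and unexposed means differ by the same amount β1 at every time. Under LDD the difference β1 + β3 t changes linearly with time, so β3 is the difference in rates of change.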

10
Intuitive parameterization of the alternative
hypothesis

μ00: the mean response at baseline (or at the mean initial time) in the unexposed group.
The percent difference between the exposed and unexposed groups at baseline (or at the mean initial time).

11
Intuitive parameterization of the alternative
hypothesis (2)

p2: the percent change from baseline (or from the mean initial time) to the end of follow-up (or to the mean final time) in the unexposed group. When the study duration τ is not fixed, p2 is defined at time s instead of at time τ.
p3: the percent difference between the change from baseline (or from the mean initial time) to the end of follow-up (or the mean final time) in the exposed group and the corresponding change in the unexposed group. When there is no change in the unexposed group, this percent difference is undefined, so p3 is instead defined as the percent change from baseline (or from the mean initial time) to the end of follow-up (or to the mean final time) in the exposed group.
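The sketch below maps the relative-scale inputs to the LDD coefficients. It assumes time is measured from baseline and follow-up lasts τ time units. The mapping (β0 = μ00, β2 = p2·μ00/τ, β3 = p3·β2) is our reading of the definitions above, not the program's source code, and the function name is hypothetical.

```python
def percent_to_ldd_betas(mu00: float, p2: float, p3: float, tau: float) -> tuple[float, float, float]:
    """Map relative-scale LDD inputs to (beta0, beta2, beta3).

    Hypothetical helper. Assumes time is measured from baseline and that
    follow-up lasts tau time units, so the unexposed change over follow-up
    is beta2 * tau and the exposed change is (beta2 + beta3) * tau.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    if p2 == 0:
        # p3 is a ratio of changes, so it is undefined when the unexposed
        # group does not change; that case needs the alternative definition.
        raise ValueError("p2 = 0: use the alternative definition of p3")
    beta0 = mu00              # mean at baseline among the unexposed
    beta2 = p2 * mu00 / tau   # unexposed slope per time unit
    beta3 = p3 * beta2        # additional slope in the exposed group
    return beta0, beta2, beta3
```

For example, percent_to_ldd_betas(32.7, -0.006, 0.7, 2) gives β2 ≈ -0.098 and β3 ≈ -0.069 points per year. In fixed-s designs, where p2 is defined at time s, pass s as tau.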

12
Notation and Preliminary Results

We consider studies where the interval between visits (s) is fixed but the duration of the study is free (e.g. participants respond to questionnaires every two years). Increasing r then increases the duration of the study.
We also consider studies where the duration of the study, τ, is fixed but the interval between visits is free (e.g. the study is 5 years long). Increasing r then increases the frequency of measurements, i.e. decreases s.
In both cases, τ = s·r.
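The two design regimes can be sketched as functions that return the r+1 measurement times, measured from baseline. The helper names are hypothetical, and visits are assumed to be equally spaced and to start at time 0.

```python
def times_fixed_interval(s: float, r: int) -> list[float]:
    """Visits every s units; the duration tau = s * r grows with r."""
    return [j * s for j in range(r + 1)]


def times_fixed_duration(tau: float, r: int) -> list[float]:
    """Study lasts tau units; the interval s = tau / r shrinks as r grows."""
    if r < 1:
        raise ValueError("need at least one post-baseline measure")
    s = tau / r
    return [j * s for j in range(r + 1)]
```

For example, times_fixed_interval(2, 3) gives visits at 0, 2, 4 and 6 (τ = 6), while times_fixed_duration(4, 2) gives 0, 2 and 4 (s = 2).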

13
Notation and Preliminary Results

Model
The generalized least squares (GLS) estimator of β is
Power formula
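For reference, the standard GLS estimator and its model-based variance are shown below. X_i is the design matrix, Y_i the response vector and Σ_i the covariance matrix of subject i. The power line is the usual large-sample approximation for a two-sided Wald test of H0: β_k = 0 at level α, which we assume is what the power formula here expresses. It ignores the negligible probability of rejecting in the opposite tail.

```latex
\begin{gathered}
\hat\beta = \Big(\sum_{i=1}^{N} X_i^\top \Sigma_i^{-1} X_i\Big)^{-1} \sum_{i=1}^{N} X_i^\top \Sigma_i^{-1} Y_i,
\qquad
\operatorname{Var}(\hat\beta) = \Big(\sum_{i=1}^{N} X_i^\top \Sigma_i^{-1} X_i\Big)^{-1} \\
\text{Power} \approx \Phi\!\left(\frac{|\beta_k|}{\sqrt{\operatorname{Var}(\hat\beta_k)}} - z_{1-\alpha/2}\right)
\end{gathered}
```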

14
Notation and Preliminary Results

Let σ^lm be the (l,m)th element of Σ^(-1), and assume that the time distribution is independent of exposure group.
Then, under CMD
Under LDD
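Below is a minimal numerical sketch of these variances under simplifying assumptions of our own:

- Every subject is measured at the same times t, so V(t0) = 0.
- Exposure is binary with prevalence pe.
- All subjects share the covariance matrix Σ.

Write M = Z'Σ^(-1)Z with Z = [1, t]. Block inversion of the information matrix then gives two results, both built from the elements σ^lm:

- Under CMD: Var(β̂1) = 1/(N pe(1-pe) 1'Σ^(-1)1).
- Under LDD: Var(β̂3) = [M^(-1)]_22/(N pe(1-pe)).

Because this sketch ignores staggered entry, it will not reproduce the program's output for the example. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def design_variances(times, sigma, n, pe):
    """Return (Var(beta1_hat) under CMD, Var(beta3_hat) under LDD).

    Sketch assuming all n subjects share the measurement times `times`
    (no staggered entry) and the covariance matrix `sigma`; exposure is
    binary with prevalence pe.
    """
    t = np.asarray(times, dtype=float)
    z = np.column_stack([np.ones_like(t), t])   # columns: intercept, time
    m = z.T @ np.linalg.solve(sigma, z)         # M = Z' Sigma^-1 Z
    scale = n * pe * (1.0 - pe)
    var_cmd = 1.0 / (scale * m[0, 0])           # M[0, 0] = 1' Sigma^-1 1
    var_ldd = np.linalg.inv(m)[1, 1] / scale
    return var_cmd, var_ldd


def wald_power(beta, var, alpha=0.05):
    """Two-sided Wald test power for H0: beta = 0 (normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(beta) / np.sqrt(var)
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)
```

As a check, take CS with σ² = 12, ρ = 0.3 and times (0, 2). Then var_ldd equals the known CS closed form σ²(1-ρ)/(N pe(1-pe) S_tt), where S_tt = 2 is the sum of squared deviations of the times.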

15
Correlation structures

We consider three common correlation structures
Compound symmetry (CS).

16
Correlation structures

Damped Exponential (DEX)

θ = 0 gives CS; θ = 1 gives AR(1); intermediate values, such as θ = 0.3, lie between the two.
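The sketch below builds the DEX correlation matrix. It assumes the usual DEX form corr = ρ^(|t_j - t_k|^θ), where ρ is the correlation at a lag of one time unit. The function name is hypothetical, and the covariance is σ² times this matrix.

```python
def dex_correlation(times, rho, theta):
    """Damped exponential correlation: corr = rho ** (|t_j - t_k| ** theta).

    theta = 0 gives compound symmetry; theta = 1 gives AR(1).
    The diagonal is set to 1 explicitly because 0 ** 0 == 1 in Python,
    which would otherwise put rho on the diagonal when theta = 0.
    """
    n = len(times)
    corr = [[1.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j != k:
                lag = abs(times[j] - times[k])
                corr[j][k] = rho ** (lag ** theta)
    return corr
```

With θ = 0 every off-diagonal entry equals ρ, whatever the lag. With θ = 1 the correlation at lag 3 is ρ³.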
17
Correlation structures

Random intercepts and slopes (RS).
Reparameterizing, two of the parameters are:
the reliability coefficient at baseline;
the slope reliability at the end of follow-up (0 corresponds to CS; 1 means all variation in slopes is between subjects).
With this correlation structure, the variance of the response changes with time, i.e. this correlation structure gives a heteroscedastic model.
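The sketch below shows the covariance implied by random intercepts and slopes. It uses hypothetical variance components: var_b0 (intercept variance), var_b1 (slope variance), cov_b01 (their covariance) and var_e (residual variance). The baseline reliability is computed as var_b0/(var_b0 + var_e). This is our reading of "reliability coefficient at baseline", assuming time 0 is baseline.

```python
def rs_covariance(times, var_b0, var_b1, cov_b01, var_e):
    """Cov(Y_j, Y_k) = var_b0 + cov_b01*(t_j + t_k) + var_b1*t_j*t_k + var_e*[j == k]."""
    n = len(times)
    return [
        [
            var_b0
            + cov_b01 * (times[j] + times[k])
            + var_b1 * times[j] * times[k]
            + (var_e if j == k else 0.0)
            for k in range(n)
        ]
        for j in range(n)
    ]


def baseline_reliability(var_b0, var_e):
    """Share of the baseline variance due to between-subject differences."""
    return var_b0 / (var_b0 + var_e)
```

The diagonal, and hence the variance of the response, grows with time whenever var_b1 > 0. With var_b1 = 0 and cov_b01 = 0 the matrix reduces to CS, matching the "0 is CS" remark above.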

18
Example

The goal is to investigate the effect of indicators of socioeconomic status and post-menopausal hormone use on cognitive function (CMD) and cognitive decline (LDD).
Pilot study: Lee S, Kawachi I, Berkman LF, Grodstein F. Education, other socioeconomic indicators, and cognitive function. Am J Epidemiol 2003;157:712-720. We will refer to it as Grodstein.
Design questions include the power of the published study to detect effects of specified magnitude, the number and timing of additional tests needed to obtain the desired power to detect effects of specified magnitude, and the optimal number of participants and measurements needed in a de novo study of these issues.

19
Example

At baseline and at one subsequent time, six cognitive tests were administered to 15,654 participants in the Nurses' Health Study.
Outcome: Telephone Interview for Cognitive Status (TICS)
μ00 = 32.7 (4)
Implied model: a decline of about 1 point per 10 years of age

20
Example

Exposure: graduate school degree vs. not (GRAD)
Corr(GRAD, age) = -0.01
Exposure: post-menopausal hormone use (CURRHORM)
Corr(CURRHORM, age) = -0.06
Time: age (years) is the best choice, not questionnaire cycle or calendar year of test.
The mean age was 74 and V(t0) ≈ 4.

21
Example

The estimated covariance parameters were
SAS code to fit the LDD model with CS covariance:
proc mixed;
  class id;
  model tics = grad age grad*age / s;
  random id;
run;
SAS code to fit the LDD model with RS covariance

22
Program optitxs.r makes it all possible
28
http://www.hsph.harvard.edu/faculty/spiegelman/software.html
29
http://www.hsph.harvard.edu/faculty/spiegelman/optitxs.html
35
Illustration of use of software optitxs.r

We'll calculate the power of Grodstein's published study to detect the observed 70% difference in rates of decline between those with more than high school vs. others.
Recall that 6.2% of NHS participants had more than high school; there was a 0.3% decline in cognitive function per year.
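The arithmetic linking these figures to the values entered in the optitxs.r session can be checked directly. This sketch assumes the 0.3% annual decline is linear and relative to the unexposed baseline mean μ00 = 32.7. The variable names are ours.

```python
mu00 = 32.7              # mean TICS score at baseline among the unexposed
annual_decline = 0.003   # 0.3% decline per year
s = 2                    # years between tests (one post-baseline test)
pe = 0.062               # prevalence of more than high school (GRAD)
p3 = 0.7                 # observed 70% difference in rates of decline

p2 = -annual_decline * s                          # percent change over follow-up
points_per_decade = annual_decline * mu00 * 10    # decline in TICS points per 10 years
```

This gives p2 = -0.006, the value entered for p2, and a decline of about 0.98 points per 10 years of age, i.e. roughly 1 point per decade.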

36
> long.power()
Press <Esc> to quit
Constant mean difference (CMD) or Linearly divergent difference (LDD)? ldd
The alternative is LDD.
Enter the total sample size (N) 15000
Enter the number of post-baseline measures (r>0) 1
Enter the time between repeated measures (s) 2
Enter the exposure prevalence (pe) (0<pe<1) 0.062
Enter the variance of the time variable at baseline, V(t0) (enter 0 if all participants begin at the same time) 4
Enter the correlation between the time variable at baseline and exposure, rhoe,t0 (enter 0 if all participants begin at the same time) -0.01
Will you specify the alternative hypothesis on the absolute (beta coefficient) scale (1) or the relative (percent) scale (2)? 2
The alternative hypothesis will be specified on the relative (percent) change scale.
37
Enter mean response at baseline among unexposed (mu00) 32.7
Enter the percent change from baseline to end of follow-up among unexposed (p2) (e.g. enter 0.10 for a 10% change) -0.006
Enter the percent difference between the change from baseline to end of follow-up in the exposed group and the unexposed group (p3) (e.g. enter 0.10 for a 10% difference) 0.7
Which covariance matrix are you assuming compound symmetry (1), damped exponential (2) or random slopes (3)? 2
You are assuming DEX covariance
Enter the residual variance of the response given the assumed model covariates (sigma2) 12
Enter the correlation between two measures of the same subject separated by one unit (rho) 0.3
Enter the damping coefficient (theta) 0.10
Power 0.4206059
38
Power of current study

To detect the observed 70% difference in cognitive decline by GRAD:
CS 44%
RS 35%
DEX 42%
To detect a hypothesized 10% difference in cognitive decline by current hormone use:
CS, DEX 7%
RS 6%

39
How many additional measurements are needed when tests are administered every 2 years, i.e. how many more years of follow-up are needed...

To detect the observed 70% difference in cognitive decline by GRAD with 90% power?
CS, DEX, RS: 3 post-baseline measurements, i.e. 6 years of follow-up (one more 5-year grant cycle)
To detect a hypothesized 20% difference in cognitive decline by current hormone use with 90% power?
CS, DEX: 6 post-baseline measurements, i.e. 12 years of follow-up (more than two 5-year grant cycles)
N = 15,000 for these calculations.

40
How many more measurements should be taken in four (1 NIH grant cycle) and eight years of follow-up (two NIH grant cycles)...

To detect the observed 70% difference in cognitive decline by GRAD with 90% power?
To detect a hypothesized 20% difference in cognitive decline by current hormone use with 90% power?

41
Optimize (N,r) in a new study of cognitive decline

Assume:
4 years of follow-up (1 NIH grant cycle)
the cost of recruitment and baseline measurements is twice that of subsequent measurements
GRAD:
(N,r) = (26,795, 1) CS
(N,r) = (26,930, 1) DEX
(N,r) = (28,945, 1) RS
CURRHORM:
(N,r) = (97,662, 1) CS
(N,r) = (98,155, 1) DEX
(N,r) = (105,470, 1) RS
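A simplified version of this optimization can be sketched under assumptions of our own:

- CS covariance.
- All subjects start at the same time (V(t0) = 0).
- A fixed duration τ with r equally spaced post-baseline visits.
- A cost per subject proportional to k + r, where k is the cost of recruitment plus baseline relative to one subsequent measurement (k = 2 here).

Under CS the LDD variance is σ²(1-ρ)/(N pe(1-pe) S_tt), with S_tt = τ²(r+1)(r+2)/(12r). The N needed for a target power, and hence the total cost, then follow in closed form. Because V(t0) = 0 here, the resulting N differ from those above. All helper names are hypothetical.

```python
import math
from statistics import NormalDist


def s_tt(tau: float, r: int) -> float:
    """Sum of squared deviations of r+1 equally spaced times over [0, tau]."""
    return tau ** 2 * (r + 1) * (r + 2) / (12.0 * r)


def n_required(beta3, sigma2, rho, pe, tau, r, power=0.9, alpha=0.05):
    """Sample size for the LDD test under CS with a common start time.

    Uses the one-tail normal approximation N = (z_{1-a/2} + z_power)^2 * V / beta3^2,
    where V = N * Var(beta3_hat) is the variance contributed per subject.
    """
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    v = sigma2 * (1 - rho) / (pe * (1 - pe) * s_tt(tau, r))
    return z ** 2 * v / beta3 ** 2


def optimal_r(k, beta3, sigma2, rho, pe, tau, r_max=20, **kw):
    """Return (r, N) minimizing total cost N * (k + r), in units of one follow-up visit."""
    best = min(
        range(1, r_max + 1),
        key=lambda r: n_required(beta3, sigma2, rho, pe, tau, r, **kw) * (k + r),
    )
    return best, math.ceil(n_required(beta3, sigma2, rho, pe, tau, best, **kw))
```

With k = 2 the cost is proportional to r/(r+1), which increases with r. A single follow-up visit is therefore optimal, in line with the r = 1 results above.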

42
Conclusions
Re Constant Mean Difference (CMD)
43
Conclusions

CMD
If all observations have the same cost, one would
not take repeated measures.
If subsequent measures are cheaper, one would
take no repeated measures or just a small number
if the correlation between measures is large.
If deviations from CS exist, it is advisable to
take more repeated measures.
Power increases as and as
Power increases as Var( ) goes to 0
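The CMD trade-off can be sketched under CS. The sketch assumes a common start time and a cost per subject proportional to k + r, where k is the cost of recruitment and baseline relative to a subsequent measure. Under these assumptions Var(β̂1) = σ²(1 + rρ)/((r+1) N pe(1-pe)), so at fixed power the total cost is proportional to (k + r)(1 + rρ)/(r+1). The helper names are hypothetical.

```python
def cmd_relative_cost(r: int, k: float, rho: float) -> float:
    """Total cost at fixed power for CMD under CS, up to a constant factor."""
    return (k + r) * (1 + r * rho) / (r + 1)


def best_r_cmd(k: float, rho: float, r_max: int = 20) -> int:
    """Number of post-baseline measures (0..r_max) minimizing the cost."""
    return min(range(r_max + 1), key=lambda r: cmd_relative_cost(r, k, rho))
```

With equal costs (k = 1) the cost is 1 + rρ, which grows with r, so no repeated measures are taken. With k = 2, one repeated measure pays off at ρ = 0.3 but not at ρ = 0.8.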

44
Conclusions

LDD
If the follow-up period is not fixed, choose the
maximum length of follow-up possible (except when
RS is assumed).
If the follow-up period is fixed, one would take more than one repeated measure only when the subsequent measures are more than five times cheaper. When there are departures from CS, cost ratios of around 10 or 20 are needed to justify taking 3 or 4 measures.
Power increases as , as , as slope reliability goes to 0, as Var( ) increases, and as the correlation between baseline time and exposure goes to 0.
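The five-fold threshold can be derived under simplifying assumptions of our own:

- CS covariance and a common start time.
- A fixed duration τ with r equally spaced follow-up visits.
- A cost per subject proportional to k + r, with k the relative cost of recruitment plus baseline.

Under these assumptions the cost at fixed power is proportional to (k + r)/S_tt(r):

```latex
\begin{aligned}
S_{tt}(r) &= \frac{\tau^2 (r+1)(r+2)}{12\, r}, \qquad
C(r) \propto \frac{k + r}{S_{tt}(r)} \propto f(r) = \frac{r\,(k+r)}{(r+1)(r+2)}
        = 1 + \frac{(k-3)\,r - 2}{(r+1)(r+2)},\\
f(1) &= \frac{k+1}{6}, \qquad \lim_{r\to\infty} f(r) = 1 .
\end{aligned}
```

A single follow-up visit therefore costs more than many visits once (k+1)/6 > 1, i.e. k > 5, which is consistent with the conclusion above.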

45
Conclusions

LDD
The optimal (N,r) and the resulting power can depend strongly on the correlation structure.
Combinations that are optimal for one correlation structure may perform poorly under another.
All these decisions are based on power considerations alone; there might be other reasons to take repeated measures.
Sensitivity analysis is therefore advisable; our program can be used for it.

46
Future work

Develop formulas for time-varying exposure.
Include dropout:
For sample size calculations, simply inflate the sample size by a factor of 1/(1-f), where f is the expected proportion of dropouts.
However, dropout can alter the relationship between N and r.
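The inflation step reduces to a one-liner. In this sketch, f is the expected proportion of participants lost to follow-up (our reading of f) and the function name is hypothetical. As noted above, this adjustment ignores how dropout changes the balance between N and r.

```python
import math


def inflate_for_dropout(n: float, f: float) -> int:
    """Inflate a sample size by 1/(1 - f) for an expected dropout fraction f."""
    if not 0 <= f < 1:
        raise ValueError("f must be in [0, 1)")
    return math.ceil(n / (1 - f))
```

For example, planning for 15,000 completers with 25% expected dropout gives inflate_for_dropout(15000, 0.25) = 20,000 recruits.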

47
Thanks!
