Repetition: Multiple imputation

1
Repetition: Multiple imputation
  • Ziad Taib
  • Biostatistics, AZ
  • May 20, 2009

2
Data set with missing values
Result
Completed set
4
General principles
5
Informal justification
6
The algorithm (Estimation)
7
Pooling information
8
MI in practice
A simulation-based approach to missing data
1. Generate M > 1 plausible versions of the missing data.
2. Analyze each of the M datasets by standard complete-data methods.
3. Combine the results across the M datasets (M = 3-5 is usually OK).
(Diagram labels: complete cases; imputation for the Mth dataset)
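The slides do not spell out how step 3 combines the results. A common choice is Rubin's rules, and the Python sketch below assumes that is what is meant. The function name `pool_rubin` and the example numbers are illustrative only and do not come from the presentation.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool M complete-data estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor M - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total

# Hypothetical estimates of one parameter from M = 5 imputed datasets,
# each with squared standard error 0.04.
q_bar, total = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(f"pooled estimate = {q_bar:.3f}, pooled SE = {total ** 0.5:.3f}")
```

The factor (1 + 1/M) inflates the between-imputation variance to account for using a finite number of imputations.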
9
Software
2. SAS software (experimental). It is part of SAS/STAT version 8.02.
  • SAS Institute paper on multiple imputation, which gives an
    example and SAS code:
    http://www.sas.com/rnd/app/papers/multipleimputation.pdf
  • SAS documentation on PROC MI:
    http://www.sas.com/rnd/app/papers/miv802.pdf
  • SAS documentation on PROC MIANALYZE:
    http://www.sas.com/rnd/app/papers/mianalyzev802.pdf
10
Software
1. Joe Schafer's software from his web site:
http://www.stat.psu.edu/%7Ejls/misoftwa.html#top
Schafer has written publicly available software, primarily
for S-plus. There is a stand-alone Windows package for data
that is multivariate normal. This web site contains much
useful information regarding multiple imputation.
11
Software
3. SOLAS version 3.0: http://www.statsol.ie/solas/solas.htm
Windows-based software that performs different types of imputation:
  • Hot-deck imputation
  • Predictive OLS/discriminant regression
  • Nonparametric, based on propensity scores
  • Last value carried forward
Will also combine parameter results across the M analyses.
12
MI in SAS
13
MI Analysis of the Orthodontic Growth Data
14
Repetition: Power and sample size estimation
15
Example: Estimating the sample size needed in a
trial for chronic pulmonary diseases
  • Chronic pulmonary diseases (such as Chronic
    Obstructive Pulmonary Disease, COPD) involve the
    development of emphysema.
  • A clinical trial using lung densitometry
    (measuring the lung density through CT scan) as
    an endpoint is typically designed as a
    longitudinal study with repeated measurements at
    fixed time intervals.
  • Since lung density measurements are closely
    correlated with lung volume (inspiration level),
    it is important to include lung volume
    measurements in statistical analyses as a
    longitudinal covariate.
  • Lung volume is normally measured at the same time
    as the lung density is measured.

16
  • The clinical efficacy can be assessed by
    comparing the progression of lung density loss
    between two treatment groups (active vs. placebo)
    using a random coefficient model: a longitudinal
    linear mixed model with a random intercept and
    slope.
  • In planning the clinical trial with such complex
    statistical analyses, the calculation of the
    sample size required to achieve a given power to
    detect a specified treatment difference is an
    important, often complex issue.
  • In this example, an empirical approach is used to
    calculate the sample size by simulating
    trajectories of lung density and lung volume
    using SAS. We present step-by-step details for
    sample size calculation through simulation, and
    discuss the pros and cons of this approach.

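$$Y_{ij} = (\beta_0 + b_{0i}) + \beta_1\,TRT_i + (\beta_2 + b_{2i})\,TIME_j + \beta_3\,COV_{ij} + \beta_4\,TRT_i \times TIME_j + \varepsilon_{ij} \qquad (1)$$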
17
  • Y_ij is the efficacy endpoint (i.e. lung density)
    measurement for subject i = 1, 2, ..., n, at fixed
    time point j = 1, 2, ..., K.
  • TRT is an indicator of subject i's treatment
    group (i.e. TRT = 1 for active drug, TRT = 0 for
    placebo).
  • COV_ij is a longitudinal covariate (i.e. logarithm
    of lung volume) for subject i = 1, 2, ..., n, at
    fixed time point j = 1, 2, ..., K.
  • b0 and b2 are subject-specific random effects for
    the intercept and slope, respectively, which are
    from a normal distribution with mean 0 and
    variance σ0² and σ2², respectively.
  • ε_ij is the random error from a normal
    distribution with mean 0 and variance σ².
  • β0, β1, β2, β3, and β4 are the fixed effects for
    intercept, treatment, time, covariate and
    interaction of treatment and time, respectively.
  • Here we assume that the benefits can be assessed
    quantitatively by comparing the slopes of lung
    density trajectories for the two treatment
    groups. This quantity is captured by β4.
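To see why β4 carries the treatment effect, it helps to write out the expected response in each arm. The display below assumes model (1) has the form implied by the definitions above (fixed effects for intercept, treatment, time, covariate and treatment-by-time interaction, with TIME_j denoting the time of visit j). It conditions on the covariate, and the random effects and error drop out because they have mean 0.

```latex
\begin{aligned}
E[Y_{ij} \mid TRT_i = 0,\ COV_{ij}] &= \beta_0 + \beta_2\,TIME_j + \beta_3\,COV_{ij},\\
E[Y_{ij} \mid TRT_i = 1,\ COV_{ij}] &= (\beta_0 + \beta_1) + (\beta_2 + \beta_4)\,TIME_j + \beta_3\,COV_{ij}.
\end{aligned}
```

The time slope is β2 under placebo and β2 + β4 under active treatment, so the difference in slopes between the arms is exactly β4.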

18
Sample Size Estimation Using Simulations
  • In the model, β4 is typically the quantity of
    interest: it is the difference in the time slope
    between the two treatment groups (active vs. placebo).
  • There is no direct mathematical formula to
    calculate the sample size for a given statistical
    power (e.g. 80%) to test the null hypothesis
    β4 = 0 with a specified type I error (e.g. α = 0.05).
  • One approach to calculating the sample size for a
    given power is simulation.

19
Methods used
  • Assume we know the parameters β0, β1, β2, β3, and
    β4, and σ0² and σ2², from either historical data,
    previous clinical trials or meaningful clinical
    differences.
  • Assume we also know the study design to be tested,
    in terms of the number of time points (K) and fixed
    time intervals (TIME), and the longitudinal covariate
    COV_ij.
  • For a fixed equal sample size n for each
    treatment, the trajectories of the efficacy
    measurement Y_ij (i.e. lung density) for the n
    subjects can be simulated through the model for
    each treatment group.
  • Then, perform a statistical test of β4 = 0
    using SAS PROC MIXED on the simulated data
    set, and record whether the p-value < 0.05.

20
  • The sample code to perform the test is as
    follows:

        /* Random coefficient model: random intercept and slope per subject */
        proc mixed data=data;
          class id trt;
          model y = trt time trt*time cov / solution;
          random intercept time / subject=id type=un;
        run;

  • For the fixed sample size n per treatment group,
    simulate M times (e.g. M = 1000); the proportion
    of significant tests of β4 = 0 among the M
    simulations is the statistical power (*) for the
    sample size n per treatment group.
  • Then, adjust the sample size n to achieve the
    desired statistical power.

(*) In reality β4 = 0.7 > 0 (in our simulations), so
the proportion of times we reject the hypothesis
β4 = 0 is an estimate of the power.
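Because the power is estimated as a proportion over M simulated trials, it carries Monte Carlo error of its own. The Python sketch below is not part of the slides. It computes the rejection proportion and its binomial standard error from a list of p-values. The helper name `power_estimate` and the made-up p-values are assumptions for illustration.

```python
from math import sqrt

def power_estimate(p_values, alpha=0.05):
    """Return (estimated power, Monte Carlo standard error)."""
    m = len(p_values)
    rejections = sum(1 for p in p_values if p < alpha)
    power = rejections / m
    se = sqrt(power * (1 - power) / m)  # binomial standard error
    return power, se

# Hypothetical outcome: 800 of M = 1000 simulated trials reject beta4 = 0.
p_values = [0.01] * 800 + [0.30] * 200
power, se = power_estimate(p_values)
print(f"power = {power:.3f}, Monte Carlo SE = {se:.4f}")
```

With M = 1000 and power near 80%, the standard error is roughly 0.013, so two sample sizes whose estimated powers differ by only a percentage point or two cannot be reliably told apart.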
21
Simulating the response
  • In order to simulate the trajectories of Y_ij, it
    is necessary to simulate the trajectories of the
    longitudinal covariate COV_ij. Similarly, assume
    COV_ij follows a linear model in time with a
    random intercept (model (2)),
  • where γ0 and γ1 are the fixed intercept and slope,
    respectively, and r0 and ε_ij are from normal
    distributions with mean 0 and variances δ1² and
    δ2², respectively. If we know the parameters (γ0,
    γ1, δ1² and δ2²) from historical data or previous
    clinical trials for the study population, it is
    simple to simulate the trajectories of the
    longitudinal covariate COV_ij using SAS random
    number generating functions.

$$COV_{ij} = \gamma_0 + \gamma_1\,TIME_j + r_{0i} + \varepsilon_{ij} \qquad (2)$$
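The slides do this simulation with SAS random number functions. As a rough illustration only, the Python sketch below draws one subject's covariate trajectory from model (2) and then the response trajectory from model (1). It assumes the model forms implied by the parameter descriptions, and it assumes the random intercept and slope are independent, which the slides do not state. Every numeric value is hypothetical except β4 = 0.7, and `simulate_subject` is an invented name.

```python
import random

def simulate_subject(trt, times, beta, gamma, sd, rng):
    """Simulate one subject's (time, cov, y) rows from models (2) and (1)."""
    b0, b1, b2, b3, b4 = beta
    g0, g1 = gamma
    r0 = rng.gauss(0.0, sd["cov_intercept"])  # random intercept in model (2)
    u0 = rng.gauss(0.0, sd["y_intercept"])    # random intercept in model (1)
    u2 = rng.gauss(0.0, sd["y_slope"])        # random slope in model (1)
    rows = []
    for t in times:
        # Model (2): covariate is linear in time plus subject and visit noise.
        cov = g0 + g1 * t + r0 + rng.gauss(0.0, sd["cov_error"])
        # Model (1): response depends on treatment, time, covariate, interaction.
        y = ((b0 + u0) + b1 * trt + (b2 + u2) * t + b3 * cov
             + b4 * trt * t + rng.gauss(0.0, sd["y_error"]))
        rows.append((t, cov, y))
    return rows

# Hypothetical parameter values; only beta4 = 0.7 appears in the slides.
BETA = (100.0, 0.0, -2.0, -30.0, 0.7)
GAMMA = (1.7, 0.01)
SD = {"cov_intercept": 0.10, "cov_error": 0.05,
      "y_intercept": 5.0, "y_slope": 1.0, "y_error": 2.0}

rng = random.Random(2009)
for row in simulate_subject(1, [0, 1, 2, 3], BETA, GAMMA, SD, rng):
    print(row)
```

Calling `simulate_subject` for n subjects with `trt=1` and n with `trt=0` gives one simulated trial data set.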
22
  • Summary
  • 1. Obtain the pre-specified parameters from
    either historical data, previous clinical trials or
    a meaningful clinical difference to be tested,
    as provided by clinicians
  • 2. Specify a desired statistical power (e.g. 80%)
    and a type I error rate (e.g. 5%)
  • 3. Simulate trajectories of efficacy measurement
    (i.e. lung density) and longitudinal covariate
    (i.e. logarithm of lung volume) for a fixed
    sample size (n) of subjects within each treatment
    arm
  • A. Trajectories of longitudinal covariate (i.e.
    logarithm of lung volume) are simulated through
    model (2)
  • B. Trajectories of efficacy measurement (i.e.
    lung density) are simulated through model (1)

23
  • 4. Perform the statistical test of β4 = 0 based on
    the simulated data set. Record whether a p-value
    < 0.05 was obtained
  • 5. Repeat steps 3 and 4 M times (e.g. M = 1000) and
    calculate the statistical power for the fixed
    sample size
  • 6. Repeat steps 3 - 5 for various values of n.
    Stop when the desired statistical power is obtained
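Steps 3 to 6 can be sketched end to end in Python. This is not the SAS workflow from the slides. To stay short and dependency-free, it swaps PROC MIXED for a two-stage analysis (per-subject least-squares slopes compared between arms with a normal-approximation z-test) and omits the longitudinal covariate. The variance parameters, the visit times and all function names are hypothetical. Only β4 = 0.7, M = 1000, α = 0.05 and the 80% target come from the slides.

```python
import random
from statistics import NormalDist, mean, variance

TIMES = [0, 1, 2, 3]  # K = 4 visits: baseline and years 1 to 3
T_BAR = sum(TIMES) / len(TIMES)
SXX = sum((t - T_BAR) ** 2 for t in TIMES)

def ols_slope(ys):
    """Least-squares slope of one subject's responses on TIMES."""
    return sum((t - T_BAR) * y for t, y in zip(TIMES, ys)) / SXX

def simulate_slopes(n, trt, beta4, rng, sd_int=2.0, sd_slope=1.0, sd_err=1.0):
    """Simulate n subjects in one arm and return their fitted slopes."""
    slopes = []
    for _ in range(n):
        b0 = rng.gauss(0.0, sd_int)    # random intercept
        b2 = rng.gauss(0.0, sd_slope)  # random slope
        # Common fixed effects cancel in the slope comparison, so they are omitted.
        ys = [b0 + (b2 + beta4 * trt) * t + rng.gauss(0.0, sd_err) for t in TIMES]
        slopes.append(ols_slope(ys))
    return slopes

def trial_p_value(n, beta4, rng):
    """Two-sided p-value for the slope difference in one simulated trial."""
    active = simulate_slopes(n, 1, beta4, rng)
    placebo = simulate_slopes(n, 0, beta4, rng)
    se = (variance(active) / n + variance(placebo) / n) ** 0.5
    z = (mean(active) - mean(placebo)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def estimate_power(n, beta4=0.7, m=1000, alpha=0.05, seed=2009):
    """Step 5: proportion of M simulated trials that reject beta4 = 0."""
    rng = random.Random(seed)
    hits = sum(trial_p_value(n, beta4, rng) < alpha for _ in range(m))
    return hits / m

def smallest_n(target=0.80, start=10, step=5, max_n=200, **kwargs):
    """Step 6: increase n until the estimated power reaches the target."""
    for n in range(start, max_n + 1, step):
        power = estimate_power(n, **kwargs)
        if power >= target:
            return n, power
    return None

if __name__ == "__main__":
    print(smallest_n())
```

The sample size this sketch returns depends entirely on the made-up variance parameters, so it should not be expected to reproduce the n = 45 reported for the slides' own parameter set.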

24
Results: Example of a Simulation
  • Assume there are two treatment groups (active vs.
    placebo) in a study design. The efficacy endpoint
    along with the longitudinal covariate will be
    measured at K = 4 time points: at baseline, 1 year,
    2 years and 3 years. All corresponding parameters
    specified in models (1) and (2) could be obtained
    from historical data, previous clinical
    trials or a meaningful clinical difference to be
    tested, as provided by clinicians. For the purpose
    of simulation, they are randomly selected and
    specified as below

25
  • The summary of statistical power for a given
    sample size per treatment based on M = 1000
    simulated data sets is listed below
  • Therefore, a sample size of 45 per treatment arm
    has an estimated statistical power of 80% to detect
    a treatment slope difference of 0.7 in a random
    coefficient model for the study design above.

26
Conclusions and Discussion
  • In practice, it is rarely the case that all
    subjects have complete data for all visits in
    the study, because of missed study visits,
    dropout or other reasons. Since our
    simulation framework assumes there are no missing
    observations, we recommend that the implemented
    sample size for the designed trial include more
    subjects than the number estimated from the
    simulation. In most cases an increase of 5% or
    10% should suffice, but the appropriate percentage
    could vary depending on characteristics of the
    designed trial, such as the study population and
    the difficulty of the study procedures and
    measurements, that cause subjects to drop out
    or miss study measurements.
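As a small worked example of that inflation, the Python sketch below applies a 5% and a 10% increase to the n = 45 per arm found in the simulation. It assumes the increase is applied multiplicatively and rounded up to a whole subject. The helper name `inflate_for_dropout` is hypothetical.

```python
def inflate_for_dropout(n, percent):
    """Increase n by a whole-number percentage, rounding up."""
    return -(-n * (100 + percent) // 100)  # integer ceiling division

for percent in (5, 10):
    print(f"{percent}% increase: {inflate_for_dropout(45, percent)} per arm")
```

Integer arithmetic is used so that the rounding does not depend on floating-point representation.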

27
Post-Hoc Power (also known as observed power or
retrospective power)
  • You have collected the data, run an appropriate
    statistical analysis, and did not observe
    statistical significance, as indicated by a
    relatively large p-value. So you decide to
    compute post-hoc power to see how powerful the
    test was, which by itself is essentially an
    empty, meaningless result.
  • Post-hoc power is merely a one-to-one
    transformation of the p-value (based on the
    F-statistic and degrees of freedom as illustrated
    above).
  • In this situation power was computed based only
    on what this particular sample showed: the
    observed difference in means, the computed
    standard error, and the actual sample sizes of
    the groups all contributed to the observed
    power exactly as they did to the p-value.
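The one-to-one relationship is easy to see in a simpler setting than the F-test the slide refers to. The Python sketch below uses a two-sided z-test instead, which is a substitution made here for brevity. It computes "observed power" from nothing but the p-value and α. The function name `observed_power` is illustrative.

```python
from statistics import NormalDist

def observed_power(p_value, alpha=0.05):
    """Post-hoc power of a two-sided z-test, computed from its p-value alone."""
    nd = NormalDist()
    z_obs = nd.inv_cdf(1 - p_value / 2)  # |z| implied by the p-value
    z_crit = nd.inv_cdf(1 - alpha / 2)   # rejection threshold
    # Power if the true standardized effect equalled the observed one.
    return nd.cdf(z_obs - z_crit) + nd.cdf(-z_obs - z_crit)

for p in (0.01, 0.05, 0.20, 0.50):
    print(p, round(observed_power(p), 3))
```

A p-value exactly at α corresponds to an observed power of about 50%, and larger p-values always give lower observed power, so the number adds no information beyond the p-value itself.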

28
  • So, power calculations can only be considered
    a prospective or "a priori" concept. Power
    calculations should be directed towards planning
    a study, not an after-the-experiment review of the
    results.
  • None of the SAS statistical procedures (e.g.,
    PROCs REG, TTEST, GLM, or MIXED and others)
    provides retrospective (post hoc) power
    calculations. (However, by saving results
    from PROC MIXED with ODS and following
    through with a few basic SAS functions, it is
    quite simple to compute them in a DATA step or
    with the inputs to PROC POWER or PROC GLMPOWER.)

30
Sample Questions for the Final Exam
Ziad Taib Biostatistics, AZ MV, CTH
31
Question 1
  • Formulate the general LMM and
  • State its underlying assumptions.
  • Explain/interpret its ingredients
  • Explain what is meant by the marginal model
  • Explain what is meant by the hierarchical model.
  • Explain the difference between the marginal and
    the hierarchical models

32
Question 2
  • Explain how and why the predicted values of the
    random effects in a linear mixed model can be
    used to identify outliers

33
Question 3
  • Formulate the generalized linear mixed model and
    give an example of your choice of its use.

34
Question 4
  • Sketch how we can obtain a useful formula for
    the prediction of the random effects. Only a
    description of the principle is needed.

35
Question 5
  • Define missing at random and missing completely
    at random. Argue why the direct maximum
    likelihood method can be used under a suitable
    ignorability condition.

36
Question 6
  • Explain what is meant by Multiple Imputation
    (MI). Describe the main algorithm used in MI and
    give an informal argument for why it is valid.

37
Question 7
  • Give two different candidates for the definition
    of residual in a general LMM.

38
Answers to sample question 1
40
μ_ij = P(Y_ij = 1) = π = E[Y_ij]
1 - P(Y_ij = 1)
57
Answers to sample question 2
59
(Table residue: observations by site, coded Lund = 0, Göteborg = 1)