1
Repetition: Multiple imputation

Ziad Taib
Biostatistics, AZ
May 20, 2009

2
Data set with missing values
Result
Completed set
4
General principles
5
Informal justification
6
The algorithm (Estimation)
7
Pooling information
8
MI in practice
A simulation-based approach to missing data
1. Generate M > 1 plausible versions of the missing data.
2. Analyze each of the M datasets by standard complete-data methods.
3. Combine the results across the M datasets (M = 3-5 is usually OK).
[Diagram labels: complete cases; imputation for the Mth dataset]
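
The combining step (step 3) is commonly done with Rubin's rules. The slide text does not spell the rules out, so the sketch below assumes that is what is meant. It is written in Python purely for illustration (the slides themselves use SAS), and the function name `pool_rubin` and the input numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine M completed-data results with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need M > 1 estimates with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor M - 1)
    total = within + (1 + 1 / m) * between
    # Rubin's degrees of freedom for the t reference distribution
    if between > 0:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    else:
        df = math.inf
    return q_bar, total, df


# Hypothetical results from M = 5 imputed datasets (made-up numbers)
est = [1.0, 1.2, 0.9, 1.1, 0.8]
var = [0.04, 0.05, 0.04, 0.05, 0.04]
q_bar, total, df = pool_rubin(est, var)
print(f"pooled estimate {q_bar:.3f}, standard error {math.sqrt(total):.3f}, df {df:.1f}")
```

The total variance adds the between-imputation spread, inflated by (1 + 1/M), to the average within-imputation variance, which is how the uncertainty due to the missing values enters the final standard error.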
9
Software
2. SAS software (experimental). It is part of SAS/STAT version 8.02.
SAS Institute paper on multiple imputation, which gives an example and SAS code: http://www.sas.com/rnd/app/papers/multipleimputation.pdf
SAS documentation on PROC MI: http://www.sas.com/rnd/app/papers/miv802.pdf
SAS documentation on PROC MIANALYZE: http://www.sas.com/rnd/app/papers/mianalyzev802.pdf
10
Software
1. Joe Schafer's software from his web site: http://www.stat.psu.edu/%7Ejls/misoftwa.html#top
Schafer has written publicly available software, primarily for S-plus. There is a stand-alone Windows package for data that are multivariate normal. This web site contains much useful information regarding multiple imputation.
11
Software
3. SOLAS version 3.0: http://www.statsol.ie/solas/solas.htm
Windows-based software that performs different types of imputation:
Hot-deck imputation
Predictive OLS/discriminant regression
Nonparametric, based on propensity scores
Last value carried forward
Will also combine parameter results across the M analyses.
12
MI in SAS
13
MI Analysis of the Orthodontic Growth Data
14
Repetition: Power and sample size estimation

Ziad Taib
Biostatistics, AZ
May 20, 2009

15
Example: Estimating the sample size needed in a trial for chronic pulmonary diseases

Chronic pulmonary diseases (such as chronic obstructive pulmonary disease, COPD) involve the development of emphysema.
A clinical trial using lung densitometry (measuring lung density by CT scan) as an endpoint is typically designed as a longitudinal study with repeated measurements at fixed time intervals.
Since lung density measurements are closely correlated with lung volume (inspiration level), it is important to include lung volume measurements in the statistical analyses as a longitudinal covariate.
Lung volume is normally measured at the same time as lung density.

16

Clinical efficacy can be assessed by comparing the progression of lung density loss between two treatment groups (active vs. placebo) using a random coefficient model: a longitudinal linear mixed model with a random intercept and slope.
In planning a clinical trial with such complex statistical analyses, calculating the sample size required to achieve a given power to detect a specified treatment difference is an important and often complex issue.
In this example, an empirical approach is used: the sample size is calculated by simulating trajectories of lung density and lung volume using SAS. We present step-by-step details of the sample size calculation through simulation, and discuss the pros and cons of this approach.

(1)
17

Yij is the efficacy endpoint (i.e. lung density) measurement for subject i = 1, 2, ..., n at fixed time point j = 1, 2, ..., K.
TRT is an indicator of subject i's treatment group (TRT = 1 for active drug, TRT = 0 for placebo).
COVij is a longitudinal covariate (i.e. logarithm of lung volume) for subject i = 1, 2, ..., n at fixed time point j = 1, 2, ..., K.
b0 and b2 are subject-specific random effects for the intercept and slope, respectively, drawn from normal distributions with mean 0 and variances σ0² and σ2², respectively.
eij is the random error, drawn from a normal distribution with mean 0 and variance σ².
β0, β1, β2, β3 and β4 are the fixed effects for intercept, treatment, time, covariate and treatment-by-time interaction, respectively.
Here we assume that the benefit can be assessed quantitatively by comparing the slopes of the lung density trajectories for the two treatment groups. This quantity is captured by β4.
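
Model (1) itself is not reproduced in the text. A reconstruction that is consistent with the definitions above, offered only as an assumption, is given below. The random-intercept and random-slope variances are written as $\sigma_0^2$ and $\sigma_2^2$, the error variance as $\sigma^2$, and a subject index $i$ is added to TRT, $b_0$ and $b_2$.

```latex
\begin{align}
Y_{ij} &= \beta_0 + \beta_1\,\mathrm{TRT}_i + \beta_2\,\mathrm{TIME}_j + \beta_3\,\mathrm{COV}_{ij}
          + \beta_4\,\mathrm{TRT}_i \times \mathrm{TIME}_j
          + b_{0i} + b_{2i}\,\mathrm{TIME}_j + e_{ij}, \tag{1} \\
b_{0i} &\sim N(0, \sigma_0^2), \qquad b_{2i} \sim N(0, \sigma_2^2), \qquad e_{ij} \sim N(0, \sigma^2). \notag
\end{align}
```

Under this form the expected slope in time is $\beta_2$ in the placebo group and $\beta_2 + \beta_4$ in the active group, which is why $\beta_4$ measures the treatment difference in slopes.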

18
Sample Size Estimation Using Simulations

In the model, the parameter of interest is typically β4, the difference in the slope over time between the two treatment groups (active vs. placebo).
There is no direct mathematical formula for the sample size that gives a specified statistical power (e.g. 80%) to test the null hypothesis β4 = 0 at a specified type I error rate (e.g. α = 0.05).
One approach is to calculate the sample size for a given power through simulation.

19
Methods used

Assume we know the parameters β0, β1, β2, β3 and β4, and σ0² and σ2², from historical data, previous clinical trials or the clinically meaningful differences we want to test, as well as the study design in terms of the number of time points (K), the fixed time intervals (TIME) and the longitudinal covariate COVij.
For a fixed, equal sample size n for each treatment, the trajectories of the efficacy measurement Yij (i.e. lung density) for the n subjects can be simulated through the model for each treatment group.
Then perform a statistical test of β4 = 0 using SAS PROC MIXED on the simulated data set, and record whether the p-value < 0.05.

20

The sample code to perform the test is as follows:
proc mixed data=data;
  class id trt;
  /* fixed effects: treatment, time, treatment-by-time interaction (the beta4 term), covariate */
  model y = trt time trt*time cov / solution;
  /* subject-specific random intercept and time slope, unstructured covariance */
  random intercept time / subject=id type=un;
run;
For the fixed sample size n per treatment group, simulate M (e.g. M = 1000) data sets; the proportion of significant tests of β4 = 0 among the M simulations is the statistical power (*) for the sample size n per treatment group.
Then adjust the sample size n to achieve the desired statistical power.

(*) In reality β4 = 0.7 > 0 (in our simulations), so the proportion of times we reject the hypothesis β4 = 0 is an estimate of the power.
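
The power estimate described above is just a proportion of rejections, so its own simulation error can be quantified. The sketch below is in Python for illustration only (the slides use SAS); the function name `estimate_power` and the p-values are hypothetical, and in practice the p-values would come from the M PROC MIXED fits.

```python
import math


def estimate_power(p_values, alpha=0.05):
    """Proportion of simulated trials rejecting H0, with its Monte Carlo standard error."""
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one simulated p-value")
    rejections = sum(1 for p in p_values if p < alpha)
    power = rejections / m
    mc_se = math.sqrt(power * (1 - power) / m)  # binomial standard error
    return power, mc_se


# Hypothetical p-values for the test of beta4 = 0 from five simulated data sets
p_values = [0.012, 0.300, 0.041, 0.004, 0.049]
power, mc_se = estimate_power(p_values)
print(f"estimated power {power:.2f} (Monte Carlo SE {mc_se:.3f})")
```

With M = 1000 and a true power near 0.8, the same binomial formula gives a Monte Carlo standard error of about 0.013, which indicates how finely neighbouring sample sizes can be distinguished.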
21
Simulating the response

In order to simulate the trajectories of Yij, it is first necessary to simulate the trajectories of the longitudinal covariate COVij. Similarly, assume COVij follows a linear model in time with a random intercept (model (2)), where γ0 and γ1 are the fixed intercept and slope, respectively, and r0 and eij are drawn from normal distributions with mean 0 and variances δ1² and δ2², respectively. If we know the parameters (γ0, γ1, δ1² and δ2²) from historical data or previous clinical trials for the study population, it is simple to simulate the trajectories of the longitudinal covariate COVij using the SAS random number functions.
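
Model (2) is not reproduced in the text either. Reading the description above literally, and writing the parameters as $\gamma_0$, $\gamma_1$, $\delta_1^2$ and $\delta_2^2$ with a subject index added to the random intercept, it would take the following form. This is a reconstruction under those assumptions, not a transcription.

```latex
\begin{align}
\mathrm{COV}_{ij} &= \gamma_0 + \gamma_1\,\mathrm{TIME}_j + r_{0i} + e_{ij}, \tag{2} \\
r_{0i} &\sim N(0, \delta_1^2), \qquad e_{ij} \sim N(0, \delta_2^2). \notag
\end{align}
```

The error term here is the residual of the covariate model; it shares the name $e_{ij}$ with the error in the response model but has its own variance $\delta_2^2$.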

(2)
22

Summary
1. Obtain the pre-specified parameters from historical data, previous clinical trials or the clinically meaningful difference to be tested, as provided by clinicians.
2. Specify the desired statistical power (e.g. 80%) and type I error rate (e.g. 5%).
3. Simulate trajectories of the efficacy measurement (i.e. lung density) and the longitudinal covariate (i.e. logarithm of lung volume) for a fixed sample size (n) of subjects within each treatment arm.
A. Trajectories of the longitudinal covariate (i.e. logarithm of lung volume) are simulated through model (2).
B. Trajectories of the efficacy measurement (i.e. lung density) are simulated through model (1).

23

4. Perform the statistical test of β4 = 0 on the simulated data set. Record whether a p-value < 0.05 was obtained.
5. Repeat steps 3 and 4 M (e.g. M = 1000) times and calculate the statistical power for the fixed sample size.
6. Repeat steps 3-5 for various values of n. Stop when the desired statistical power is obtained.
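
Step 3 of the summary can be sketched as follows. The slides do this with SAS random number functions; the version below uses Python's standard library purely for illustration. It assumes model (2) is a linear trend in time plus a random intercept and error, that model (1) adds a random intercept and random time slope to the five fixed effects, and that the two random effects are independent. Every parameter value except β4 = 0.7 and the four yearly time points is made up.

```python
import random


def simulate_trial(n, times, beta, sd_b0, sd_b2, sd_e, gamma, sd_r0, sd_cov, rng):
    """Simulate one data set: n subjects per arm, one row per subject and time point."""
    beta0, beta1, beta2, beta3, beta4 = beta
    gamma0, gamma1 = gamma
    rows = []
    for trt in (0, 1):                      # 0 = placebo, 1 = active
        for k in range(n):
            subject = trt * n + k
            r0 = rng.gauss(0.0, sd_r0)      # random intercept of the covariate
            b0 = rng.gauss(0.0, sd_b0)      # random intercept of the response
            b2 = rng.gauss(0.0, sd_b2)      # random time slope of the response
            for t in times:
                # Step 3A, model (2): covariate trajectory
                cov = gamma0 + gamma1 * t + r0 + rng.gauss(0.0, sd_cov)
                # Step 3B, model (1): response trajectory
                y = (beta0 + beta1 * trt + beta2 * t + beta3 * cov
                     + beta4 * trt * t + b0 + b2 * t + rng.gauss(0.0, sd_e))
                rows.append((subject, trt, t, cov, y))
    return rows


rng = random.Random(2009)                   # fixed seed for reproducibility
rows = simulate_trial(
    n=45, times=(0, 1, 2, 3),
    beta=(100.0, 0.0, -2.0, -50.0, 0.7),    # hypothetical, except beta4 = 0.7
    sd_b0=5.0, sd_b2=1.0, sd_e=2.0,         # hypothetical standard deviations
    gamma=(1.8, 0.01), sd_r0=0.1, sd_cov=0.05,
    rng=rng,
)
print(len(rows), "rows; first row:", rows[0])
```

Each call produces one simulated data set in long format (subject, treatment, time, covariate, response), which is the layout a mixed-model procedure expects for step 4.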

24
Results: Example of a Simulation

Assume there are two treatment groups (active vs. placebo) in a study design. The efficacy endpoint and the longitudinal covariate will be measured at K = 4 time points: baseline, 1 year, 2 years and 3 years. All corresponding parameters specified in models (1) and (2) could be obtained from historical data, previous clinical trials or the clinically meaningful difference to be tested, as provided by clinicians. For the purpose of this simulation, they are randomly selected and specified as below.

25

The statistical power for a given sample size per treatment, based on M = 1000 simulated data sets, is summarized below.
Therefore, a sample size of 45 per treatment arm has an estimated statistical power of 80% to detect a treatment slope difference of 0.7 in a random coefficient model for the study design above.

26
Conclusions and Discussion

In practice, it is rarely the case that all subjects have complete data for all visits in the study, because of missed study visits, dropout or other reasons. Since our simulation framework assumes there are no missing observations, we recommend that the implemented sample size for the designed trial include more subjects than the number estimated from the simulation. In most cases an increase of 5% or 10% should suffice, but the appropriate percentage could vary depending on characteristics of the designed trial that may cause subjects to drop out or miss study measurements, such as the study population, the difficulty of the study procedure and the difficulty of the study measurement.
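
As a worked example of the inflation recommended above, the sketch below applies a 5% and a 10% increase to the simulated sample size of 45 per arm, rounding up to whole subjects. It is in Python for illustration only, and the helper name `inflate_sample_size` is hypothetical.

```python
def inflate_sample_size(n, percent):
    """Increase n by a whole-number percentage, rounding up to a whole subject."""
    if n < 0 or percent < 0:
        raise ValueError("n and percent must be non-negative")
    # Integer arithmetic avoids floating-point surprises when rounding up
    return (n * (100 + percent) + 99) // 100


n_simulated = 45  # per-arm sample size from the simulation example
for pct in (5, 10):
    print(f"{pct}% increase: {inflate_sample_size(n_simulated, pct)} per arm")
```

For n = 45 this gives 48 subjects per arm with a 5% increase and 50 per arm with a 10% increase.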

27
Post-Hoc Power (also known as observed power or
retrospective power)

You have collected the data, run an appropriate statistical analysis, and did not observe statistical significance, as indicated by a relatively large p-value. So you decide to compute post-hoc power to see how powerful the test was, which by itself is essentially an empty, meaningless result.
Post-hoc power is merely a one-to-one transformation of the p-value (based on the F-statistic and degrees of freedom as illustrated above).
In this situation power was computed based only on what this particular sample showed: the observed difference in means, the computed standard error, and the actual sample sizes of the groups all contributed to the observed power exactly as they did to the p-value.
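
The one-to-one relationship can be made concrete in a simpler setting than the F-test referred to above. The sketch below assumes a two-sided z-test, which is a simplification chosen so that only the standard library is needed; the function name `observed_power` is hypothetical and Python is used for illustration only.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def observed_power(p_value, alpha=0.05):
    """Post-hoc power of a two-sided z-test, computed from its p-value alone."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    z_obs = _STD_NORMAL.inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)    # two-sided critical value
    # Probability of rejecting if the true effect equalled the observed one
    return _STD_NORMAL.cdf(z_obs - z_crit) + _STD_NORMAL.cdf(-z_obs - z_crit)


for p in (0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.2f} -> observed power = {observed_power(p):.3f}")
```

No input other than the p-value and α is needed, and the result decreases steadily as the p-value grows. In this setting a non-significant result (p > 0.05) always maps to an observed power of roughly one half or less, so the calculation adds nothing beyond the p-value.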

So, power calculations can only be considered a prospective or "a priori" concept. Power calculations should be directed towards planning a study, not towards an after-the-experiment review of the results.
None of the SAS statistical procedures (e.g., PROCs REG, TTEST, GLM, or MIXED and others) provide retrospective (post hoc) power calculations. (However, by saving results from PROC MIXED with ODS and following through with a few basic SAS functions, it is quite simple to compute them in a DATA step or with the inputs to PROC POWER or PROC GLMPOWER.)

30
Sample Questions for the Final Exam
Ziad Taib Biostatistics, AZ MV, CTH
31
Question 1

Formulate the general linear mixed model (LMM) and
state its underlying assumptions;
explain/interpret its ingredients;
explain what is meant by the marginal model;
explain what is meant by the hierarchical model;
explain the difference between the marginal and the hierarchical model.

32
Question 2

Explain how and why the predicted values of the
random effects in a linear mixed model can be
used to identify outliers

33
Question 3

Formulate the generalized linear mixed model and
give an example of your choice of its use.

34
Question 4

Sketch how we can obtain a useful formula for the prediction of the random effects. Only a description of the principle is needed.

35
Question 5

Define missing at random and missing completely at random. Argue why the direct maximum likelihood method can be used under a suitable ignorability condition.

36
Question 6

Explain what is meant by Multiple Imputation (MI). Describe the main algorithm used in MI and give an informal argument for why it is valid.

37
Question 7

Give two different candidates for the definition
of residual in a general LMM.

38
Answers to sample question 1

Ziad Taib
Biostatistics, AZ
May 20, 2009

40
μij = P(Yij = 1) = p = E[Yij]
1 - P(Yij = 1)
57
Answers to sample question 2

Ziad Taib
Biostatistics, AZ
May 20, 2009

59
[Figure labels: observations (obs) by site, coded Lund = 0, Göteborg = 1]