Handling Missing Data in the Analysis of CTN Trials

Learn more at: https://ctnlibrary.org


1
CTN Design & Analysis Workshop
Handling Missing Data in the Analysis of CTN
Trials: Pitfalls and Possible Solutions
Neal Oden, PhD, DSC2-EMMES
Gaurav Sharma, PhD, DSC2-EMMES
Paul Van Veldhuisen, PhD, DSC2-EMMES
Paul Wakim, PhD, CCTN, NIDA
15 March 2011
2
Today's Workshop

The problem
Prevention
Types of missing data
Analysis methods
Case study
Open discussion

3
Missing Data

Information within a trial that is meaningful for
analysis but not collected
Focus here mostly on primary outcome data, but
relevant to missing secondary outcomes and
covariates too

4
Missing Data

Randomization
Balances treatment groups for known and unknown
factors
Its benefits are lost if there is drop-out, as groups
at outcome may not have been similar at baseline
Intention-to-treat principle
The principle is violated if not all participants
contribute to the primary analysis

5
Missing Data

If missingness is unrelated to assigned treatment:
Reduces statistical power
If missingness is related to assigned treatment or to
outcome:
Biases the estimate of the treatment effect

6
Causes of Missing Data

Due to discontinuation of study treatment
Outcomes undefined for some participants
QOL measures after death
Quantitative drug use hair analysis in
individuals without hair
Test fails/specimen lost
Attrition
Related to health status/drug use
Unrelated to health status/drug use (e.g., moved)

7
Continuing Data Collection for Drop-Outs

Distinction between
Premature end of treatment
AND
End of study
Does collecting data after premature end of
treatment make sense?

8
Rationale

Preserves intention-to-treat approach
Many CTN trials are pragmatic trials
NOT "Does treatment work if perfectly delivered?"
but RATHER
"Is this a good treatment strategy or policy?"
OR
"What happens once treatment starts or is
recommended?"

9
Rationale

Delivery of medicine deals with people in the
real world
A 100% efficacious cure for stimulant use is
useless for public health if nobody can stand it.
Strive to collect complete data for primary
outcome on ALL participants, even in those who do
not complete the intervention
Too much missing data -> no way the result will be
believable, no matter how sophisticated the
statistical method

10
Why Do We Like It?

Weight loss diet
People on the effective arm lose weight and stay
in the study
Some on the ineffective arm get discouraged and
quit
If we analyzed only the people who stayed in the
trial, the ineffective arm would look too good
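
The diet example can be sketched as a small simulation. This is only an illustration under invented assumptions: the arm has no true effect (mean weight change 0 kg, SD 3 kg), and participants who do not lose weight quit with probability 0.7. The function name and all numbers are hypothetical and are not taken from any CTN trial.

```python
import random
import statistics

def simulate_ineffective_arm(n=2000, quit_prob=0.7, seed=0):
    """Return (mean change for everyone, mean change among completers only)."""
    rng = random.Random(seed)
    # True weight change (kg) on an arm with no real effect: centered at 0.
    changes = [rng.gauss(0.0, 3.0) for _ in range(n)]
    # Assumption: participants who did not lose weight get discouraged
    # and quit with probability quit_prob; everyone else stays.
    completers = [c for c in changes if c < 0 or rng.random() > quit_prob]
    return statistics.mean(changes), statistics.mean(completers)

full_mean, completer_mean = simulate_ineffective_arm()
print(f"all randomized: {full_mean:.2f} kg, completers only: {completer_mean:.2f} kg")
```

Among completers the ineffective arm appears to lose weight, even though the average over everyone randomized is close to zero.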

11
Approaches to Missing Data

Design and conduct of clinical trial that
minimizes missing data
May require trade-offs with generalizability
Apply analysis methods that use information in
observed data to help analyze primary outcome
data in the presence of missing data

12
B. Franklin
An ounce of prevention is worth a pound of cure
13
Minimize Missing Data in... Trial Design

Flexible dose
Target population
Allow rescue therapy for poor responders
Define primary outcomes that are highly
ascertainable
Minimize participant burden/reduce follow-up
Number of visits/assessments

14
Minimize Missing Data in... Trial Conduct

Explain importance of trial participation during
consent process
Emphasize to staff importance of maintaining
follow-up even when treatment is refused
Incentives
For participants, need to ensure level is not
viewed as coercive

15
Minimize Missing Data in... Trial Conduct

Expression of thanks
Written/verbal
Assistance with travel
Reminders before visits
Welcoming staff/friendly environment
Keep locator information current
Monitor and report to investigators extent of
missing data

16
Availability of Primary Outcome: Percent of
Measures with Values (N = 29 trials)
17
What's the big deal?
We need N = 400 (based on power analysis)
But we expect 20% missing
So we set the initial N = 500
So that the final (analyzed) N = 400
National Institute on Drug Abuse - National
Institutes of Health - U.S. Department of Health
and Human Services
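
The arithmetic on this slide can be written out as a short sketch. The function name is hypothetical; exact fractions are used only to avoid rounding surprises. Note that inflating N restores statistical power but does nothing about bias when missingness is related to treatment or outcome.

```python
import math
from fractions import Fraction

def inflated_sample_size(n_required, percent_missing):
    """Initial N such that about n_required participants remain
    after percent_missing percent of them are lost."""
    retained = 1 - Fraction(percent_missing, 100)
    return math.ceil(Fraction(n_required) / retained)

initial_n = inflated_sample_size(400, 20)
print(initial_n)  # 500
```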
18
Technical terms that we can't escape
Missing at random (MAR)
Missing completely at random (MCAR)
Missing not at random (MNAR)
Ignorable
Non-ignorable
... but what do they mean?
19
Missing Completely at Random (MCAR)
(Non-technical) Definition: The fact that Y is
missing has nothing to do with the unobserved
value of Y, or with other variables
Therefore: The set of participants with complete
data can be regarded as a simple random (or
representative) sample of all participants
What to do? Ignore the missing data and analyze
the available data
20
Missing at Random (MAR)
(Non-technical) Definition: The fact that Y is
missing can be explained by other observed values
of Y, or by other measured variables
Therefore: The observed data can be used to
account for the missing data
What to do? Use a Maximum Likelihood or Multiple
Imputation approach, and include in the model the
other measured variables that explain missingness
21
Missing Not at Random (MNAR)
(Non-technical) Definition: The fact that Y is
missing cannot be explained by other observed
values of Y, or by other measured variables
Therefore: The observed data cannot be used to
account for the missing data, and outside
information is needed
In simple English: We have a problem
22
In Summary
Missingness (i.e., whether the data are missing or not)

        is related to      is not related to
MCAR    -                  observed or unobserved data
MAR     observed data      unobserved data
MNAR    unobserved data    -

Based on Graham 2009
23
Bottom Line
MCAR: No big deal
MAR: Use available collected data to explain the
missing mechanism, and use existing statistical methods
MNAR: Need outside information to explain the
missing mechanism
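
The three mechanisms can be illustrated with a small simulation. This is a sketch under invented assumptions, not an analysis of CTN data: x is a fully observed binary covariate, y is the outcome with true mean 12.5, and the observation probabilities (0.7, 0.9, 0.4) are hypothetical. Stratifying on x stands in here for "use the observed data to account for the missing data".

```python
import random
import statistics

rng = random.Random(1)
N = 20000
TRUE_MEAN = 12.5  # by construction: 10 + 5 * P(x = 1)

# x is an always-observed binary covariate; y is the outcome that may be missing.
x = [rng.randint(0, 1) for _ in range(N)]
y = [10 + 5 * xi + rng.gauss(0, 2) for xi in x]

def observe(prob_observed):
    """Return y with None wherever the value ends up missing."""
    return [yi if rng.random() < p else None for yi, p in zip(y, prob_observed)]

def complete_case_mean(y_obs):
    """Mean of y using only the observed values."""
    return statistics.mean(v for v in y_obs if v is not None)

def stratified_mean(y_obs):
    """Mean of observed y within each x stratum, weighted by the
    full-sample share of that stratum (uses x for everyone)."""
    total = 0.0
    for level in (0, 1):
        stratum = [v for v, xi in zip(y_obs, x) if xi == level and v is not None]
        total += statistics.mean(stratum) * x.count(level) / N
    return total

mcar = observe([0.7] * N)                                     # unrelated to x or y
mar = observe([0.9 if xi == 0 else 0.4 for xi in x])          # depends on observed x
mnar = observe([0.9 if yi < TRUE_MEAN else 0.4 for yi in y])  # depends on y itself

mcar_cc = complete_case_mean(mcar)
mar_cc, mar_adj = complete_case_mean(mar), stratified_mean(mar)
mnar_cc, mnar_adj = complete_case_mean(mnar), stratified_mean(mnar)
print(f"MCAR complete-case: {mcar_cc:.2f}")
print(f"MAR  complete-case: {mar_cc:.2f}, adjusted for x: {mar_adj:.2f}")
print(f"MNAR complete-case: {mnar_cc:.2f}, adjusted for x: {mnar_adj:.2f}")
```

In this setup the complete-case mean is close to the truth only under MCAR, adjusting for x recovers it under MAR, and under MNAR the estimate stays biased even after adjustment.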
24
Ignorable & Non-Ignorable (roughly speaking)

Ignorable (available data are sufficient)
Missing Completely At Random (MCAR)
Missing At Random (MAR)
Non-Ignorable (need outside information)
Missing Not At Random (MNAR)

25
Missing Data Analysis Methods
26
Complete Case and Pairwise Deletion

          CC                PD
Y1   Y2   Y3      Y1   Y2   Y3
X    X    X       X    X    X
X    X    X       X    X    X
X    X    -       X    X    -
X    X    -       X    X    -
(Correlation Illustration)
Simple; default in statistical software
Potential loss of information and precision
Biased when data are not MCAR
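
A minimal sketch of the correlation illustration, using four made-up rows in which Y3 is missing for the last two. For the correlation of Y1 and Y2, complete-case (CC) analysis keeps only rows with all three variables observed, while pairwise deletion (PD) keeps every row where Y1 and Y2 are both observed. The data and function names are hypothetical.

```python
import math

# Hypothetical rows (Y1, Y2, Y3); None marks a missing value.
rows = [(1.0, 2.0, 3.0), (2.0, 1.0, 4.0), (3.0, 4.0, None), (4.0, 5.0, None)]

def pearson(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    return s_ab / math.sqrt(s_aa * s_bb)

# CC: drop any row with a missing value anywhere.
cc_pairs = [(r[0], r[1]) for r in rows if None not in r]
# PD: for this pair of variables, only Y1 and Y2 need to be observed.
pd_pairs = [(r[0], r[1]) for r in rows if r[0] is not None and r[1] is not None]

r_cc, r_pd = pearson(cc_pairs), pearson(pd_pairs)
print(len(cc_pairs), round(r_cc, 3))  # rows used by CC and the resulting correlation
print(len(pd_pairs), round(r_pd, 3))  # rows used by PD and the resulting correlation
```

With these invented numbers the two approaches use 2 and 4 rows respectively and even disagree on the sign of the correlation, which shows how much information CC can discard.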

27
Single Imputation

Impute a single value, e.g., the mean, baseline
observation carried forward (BOCF), last observation
carried forward (LOCF), or imputing missing as positive
Simple, but artificially increases sample size
Underestimates SE and gives incorrect p-values
Most SI methods require MCAR assumptions to hold,
while some, such as LOCF, even require very
strong and often unrealistic assumptions
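
The standard-error problem can be seen in a tiny sketch with mean imputation. The five observed values and the five missing ones are invented for illustration; the function name is hypothetical.

```python
import math
import statistics

observed = [4.0, 6.0, 8.0, 10.0, 12.0]  # hypothetical observed outcomes
n_missing = 5                            # hypothetical number of missing outcomes

def standard_error(values):
    """Standard error of the mean, treating every value as real data."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Mean imputation: every missing value is replaced by the observed mean.
imputed = observed + [statistics.mean(observed)] * n_missing

se_observed = standard_error(observed)
se_imputed = standard_error(imputed)
print(round(se_observed, 3), round(se_imputed, 3))
```

The imputed data set has the same mean but a much smaller standard error, because the filled-in values add apparent sample size without adding information.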

28
Multiple Imputation (MI)

A simulation-based approach to missing data
[Figure: observed data with missing values (?) and
the corresponding imputations 1, 2, ..., m]
29
The General Idea

Incomplete Data -> (1) IMPUTATION -> Imputed Data
-> (2) ANALYSIS -> Analysis Results
-> (3) POOLING -> Final Results

30
(1) IMPUTATION Models

The imputation model should include primary
predictive variables and other variables
associated with missingness
Multiple Imputation method is robust even with
approximate imputation models

31
(2) ANALYSIS Models

Regression Model
General Linear Model
Generalized Linear Model (Logistic Regression,
Poisson Regression)

32
(3) Rules for POOLING

Confidence Interval for Parameter of Interest is
given by
$\text{Mean of Estimate} \pm t_{df} \sqrt{\text{Total Variance}}$

Estimate 1    Variance 1
Estimate 2    Variance 2
Estimate 3    Variance 3
...
Estimate m    Variance m

The m estimates give the Mean of Estimate
Within Variance and Between Variance combine into
the Total Variance
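
The pooling step can be sketched with the standard form of Rubin's rules. The slide does not spell out the formulas, so the (1 + 1/m) factor and the degrees-of-freedom expression below are the usual textbook versions rather than something taken from the presentation. The function names are hypothetical, and the t critical value is passed in by the caller because the Python standard library has no t distribution.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool m estimates and their variances from m imputed data sets.
    Assumes m >= 2 and that the estimates are not all identical."""
    m = len(estimates)
    pooled = statistics.mean(estimates)        # mean of estimate
    within = statistics.mean(variances)        # within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between     # total variance
    df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return pooled, within, between, total, df

def confidence_interval(pooled, total, t_critical):
    """Mean of estimate +/- t_df * sqrt(total variance)."""
    half_width = t_critical * math.sqrt(total)
    return pooled - half_width, pooled + half_width

# Made-up estimates and variances from m = 3 imputed data sets.
pooled, within, between, total, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, within, between, round(total, 4), round(df, 4))
```

The between-imputation variance is what carries the extra uncertainty due to missing data; single imputation effectively sets it to zero.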
33
Desirable Features

MI gives approximately unbiased estimates of all
parameters
MI provides good estimates of the standard errors
MI can be used with many kinds of data and
analyses without specialized software
Requires MAR assumption

34
Maximum likelihood

Basic idea
Given some data,
Try to guess the parameter(s) of the probability
distribution that generated the data
MLE of a parameter is the value that maximizes
the probability of the data you already have

35
Example

Flip a coin, get 45 heads, 36 tails
We don't know p, but whatever it is,
$\Pr(45\ \text{H in 81 tosses}) = K\,p^{45}(1-p)^{36}$
How to guess p?
Pick the value of p that maximizes the
probability of what already happened
Pick p to maximize $L = p^{45}(1-p)^{36}$
Best guess turns out to be 45/81
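
The coin example can be checked numerically with a simple grid search over p. This is only a sketch: the grid spacing of 1/810 is an arbitrary choice made so that 45/81 falls exactly on the grid, and the log of the likelihood is maximized instead of the likelihood itself (the maximizing p is the same and the numbers are better behaved).

```python
import math

def log_likelihood(p, heads=45, tails=36):
    """log of p^heads * (1 - p)^tails; the constant K is dropped
    because it does not depend on p."""
    return heads * math.log(p) + tails * math.log(1 - p)

# Candidate values of p strictly between 0 and 1.
grid = [k / 810 for k in range(1, 810)]
p_hat = max(grid, key=log_likelihood)
print(p_hat, 45 / 81)
```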

36
Maximum likelihood estimates have nice properties

Consistent
Asymptotically normal
Asymptotically unbiased
Asymptotically minimum variance
etc.

37
New problem

H 45
T 36
? 19
Now how to guess p?
If we knew how many missing were H and how many
T, we would know what to do.
But we don't.
What to do?

38
A solution

If data are MAR,
you can get MLEs by
maximizing the (conditional) likelihood for the
nonmissing data
ignoring the missing data mechanism.
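
Why the missing-data mechanism can be ignored under MAR can be written out as a short derivation. The notation is introduced here for illustration and is not from the slides: R is the set of missingness indicators, Y_obs and Y_mis are the observed and missing outcomes, theta holds the parameters of interest, and phi holds the parameters of the missingness mechanism (assumed distinct from theta).

```latex
\begin{align*}
f(Y_{obs}, R \mid \theta, \phi)
  &= \int f(Y_{obs}, Y_{mis} \mid \theta)\,
          f(R \mid Y_{obs}, Y_{mis}, \phi)\, dY_{mis} \\
  &= f(R \mid Y_{obs}, \phi) \int f(Y_{obs}, Y_{mis} \mid \theta)\, dY_{mis}
     && \text{(MAR: } R \text{ depends only on } Y_{obs}\text{)} \\
  &= f(R \mid Y_{obs}, \phi)\, f(Y_{obs} \mid \theta)
\end{align*}
```

The first factor does not involve theta, so maximizing the likelihood of the nonmissing data alone gives the same estimate of theta. In the coin example with 45 H, 36 T and 19 unknown, this means the guess for p stays 45/81 as long as the 19 tosses are missing for reasons unrelated to how they landed.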

39
Important Application

Longitudinal analysis
Participant 1, visit 1, 2, 3, ...
Participant 2, visit 1, 2, 3, ...
For each visit, y = a + b1 x1 + b2 x2
First approach
Treat all visits as independent
Do the regression on all visits together
Wrong, because visits from a single participant
are related, not independent

40
Important Application (cont'd)

Second approach
The visits from a single participant have
covariance
Use a mixed model
It used to be that you had to have all visits
nonmissing for this analysis
But modern software (SAS MIXED, GLIMMIX) ignores
the missing-data mechanism and gets MLEs from
only the nonmissing data, even if some visits are
missing.
If data are MAR, this is fine!

41
Modern longitudinal ML software uses more data
[Figure: participant-by-visit data grid with the
annotations "Older CC analysis would use only these
cases" and "Neither old nor new method can use this
visit"]
42
Another application

Survival analysis
Example: time to relapse
For some people, you have the time
For others, you don't because
Study ended
People died
People dropped out
etc.
People without relapse times are said to be
CENSORED

43
Another application (cont'd)

For censored people, you don't know the relapse
time, but you know it is after the censor time
Survival analysis handles censored data, but
You have to make the assumption that censoring is
noninformative.
If people drop out because they know they are
going to relapse the next day, the censoring is
informative.
Informative censoring gives biased survival time
estimates
The noninformative censoring assumption is
basically an MAR assumption.
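
How survival analysis uses censored observations can be sketched with a minimal Kaplan-Meier estimator. The slides do not name a specific estimator, so Kaplan-Meier is used here as a common example; the function name and the five follow-up times are hypothetical. The estimate is only trustworthy if censoring is noninformative.

```python
def kaplan_meier(times, relapsed):
    """Return [(time, estimated probability of no relapse by that time)]
    at each distinct relapse time. relapsed[i] is 1 if participant i
    relapsed at times[i], or 0 if they were censored at times[i]."""
    data = sorted(zip(times, relapsed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events > 0:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Censored participants count as at risk up to their censor time,
        # then leave the risk set without counting as a relapse.
        at_risk -= leaving
    return curve

# Hypothetical weeks to relapse; 0 marks a censored participant.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0])
print(curve)
```

Censored participants still contribute to the denominator for as long as they were followed, which is how the method uses partial information instead of discarding those cases.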

44
What if data are not MAR?

When the missing data are nonignorable (i.e.,
MNAR), standard statistical models can yield
badly biased results
Cannot test MAR versus MNAR

45
Sensitivity Analysis

The missing data mechanism is not identifiable
from observed data
We don't know what we don't know
One or more analyses can be performed using
different assumptions
Example: Worst-Case Analysis
(won't work with a lot of missing data)

46
Goals of Sensitivity Analysis

Consider a range of potential associations
between missingness and response
Assess the degree to which conclusion can be
influenced by the missingness mechanism
If the conclusion is largely unchanged the result
may be considered robust
Otherwise, the conclusion should be interpreted
cautiously and may be misleading
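
A simple sensitivity analysis for a binary outcome can be sketched by re-computing the treatment difference under several assumptions about the missing values. All counts, names and scenarios below are invented for illustration; "positive" stands for an unfavorable outcome such as a positive urine test.

```python
def positive_rate(positive, negative, missing, missing_as_positive):
    """Share of positive outcomes when the given fraction (0 to 1)
    of missing outcomes is counted as positive."""
    total = positive + negative + missing
    return (positive + missing_as_positive * missing) / total

def rate_differences(treatment, control):
    """treatment and control are (positive, negative, missing) counts.
    Returns {scenario: treatment rate - control rate}."""
    scenarios = {
        "all missing positive": (1.0, 1.0),
        "all missing negative": (0.0, 0.0),
        "worst case for treatment": (1.0, 0.0),
        "best case for treatment": (0.0, 1.0),
    }
    return {
        name: positive_rate(*treatment, frac_t) - positive_rate(*control, frac_c)
        for name, (frac_t, frac_c) in scenarios.items()
    }

# Hypothetical counts: (positive, negative, missing) per arm.
results = rate_differences(treatment=(30, 50, 20), control=(45, 45, 10))
for name, diff in results.items():
    print(f"{name}: {diff:+.2f}")
```

With these made-up counts the difference favors treatment in three scenarios but changes sign in the worst case, so the conclusion would not be considered robust. The more data are missing, the wider this range becomes, which is why a worst-case analysis stops being informative with a lot of missing data.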

47
MNAR models

Use of non-ignorable models can be helpful in
conducting a sensitivity analysis
Not necessarily a good idea to rely on a single
MNAR model, because the assumptions about the
missing data are impossible to assess with the
observed data
One should use MNAR models sensibly, possibly
examining several types of such models for a
given dataset

48
Two general classes of MNAR models

Selection models: use a model for the full-data
response together with a selection (missingness)
mechanism
Pattern-mixture models: model the response
separately within each missing-data pattern and
mix over the patterns
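
The two classes correspond to the two ways of factorizing the joint distribution of the response Y and the missingness indicators R. The notation is introduced here for illustration and is not from the slides.

```latex
\begin{align*}
\text{Selection model:}       \quad & f(Y, R) = f(Y)\, f(R \mid Y) \\
\text{Pattern-mixture model:} \quad & f(Y, R) = f(Y \mid R)\, f(R)
\end{align*}
```

In both factorizations some part (f(R | Y) for missing Y, or f(Y | R) for the unobserved pattern) cannot be estimated from the observed data alone, which is why these models rest on untestable assumptions.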

49
Case Study: CTN0010 - BUP for Adolescents
Two groups: Bup/Nal detoxification over 2 weeks
vs. Bup/Nal maintenance over 12 weeks
N (analyzed) = 152 at 6 community treatment
programs
Main outcome measure: Opioid-positive urine test
result at weeks 4, 8 and 12
Evaluation: weekly for 12 weeks, comprehensive at
4, 8, 12, 24, 36 and 52 weeks
50
Woody, JAMA 2008
51
Missingness in CTN0010 (from Paul Allison's
analysis)
20 participants had missing outcome for all 12
weeks (effective sample size: N - 20)
Available Data (after removing the 20 cases)
Week      1   2   3   4   5   6   7   8   9  10  11  12
Present  90  74  60  78  48  45  44  69  40  37  37  67
52
Paul Allison's Analysis

Included in the model each of Weeks 1 to 12
Used Maximum Likelihood Estimation (MLE) and
Multiple Imputation (MI) approaches (MLE is
preferred over MI)
Used random effects (mixed) logit model with SAS
PROC GLIMMIX

53
(No Transcript)
54
(No Transcript)
55
(No Transcript)
56
(No Transcript)
57
(No Transcript)
58
Take-Home Messages

Model all the available outcome data at all time
points, including outcome at baseline (t = 0), and
then test the time points (contrasts) of interest
There are good data analytic methods for dealing
with missing data in repeated-measures designs
(under the MAR assumption): use random effects
(mixed) models estimated by maximum likelihood
Allow for a linear and quadratic time trend
(saves degrees of freedom), or a spline model
(broken line)
If there is no time-related pattern, use time as a
class variable, i.e., each time point is a category
(not continuous)

59
Take-Home Messages (cont'd)

Imputing missing outcomes as positive is a crude
approach; one can often do better
Incorporation of covariates and auxiliary
variables
Sensitivity analysis is absolutely vital

60
References
Allison, Missing Data, Sage University Papers
Series on Quantitative Applications in the Social
Sciences, 07-136, Thousand Oaks, CA: Sage, 2001.
Fitzmaurice, Laird & Ware, Applied Longitudinal
Analysis, Wiley, 2004.
Graham, Missing Data Analysis: Making It Work in
the Real World, Annual Review of Psychology, 2009,
60: 549-576.
Liang & Zeger, Longitudinal Data Analysis of
Continuous and Discrete Responses for Pre-Post
Designs, Sankhya, 2000, 62(B): 134-148.
Weiss, An Introduction to Modeling Longitudinal
Data, presentation at UCLA CALDAR Summer
Institute on Longitudinal Research, August 2010.
Woody et al., Extended vs Short-term
Buprenorphine-Naloxone for Treatment of
Opioid-Addicted Youth: A Randomized Trial, JAMA,
2008, 300(17): 2003-2011.
61
Contact Information
Neal Oden: noden@emmes.com
Gaurav Sharma: gsharma@emmes.com
Paul Van Veldhuisen: pvanveldhuisen@emmes.com
Paul Wakim: pwakim@nida.nih.gov
62
Questions & Comments