Handling Missing Data in the Analysis of CTN Trials: Pitfalls and Possible Solutions

1
CTN Design & Analysis Workshop
Handling Missing Data in the Analysis of CTN
Trials: Pitfalls and Possible Solutions
Neal Oden, PhD, DSC2-EMMES
Gaurav Sharma, PhD, DSC2-EMMES
Paul Van Veldhuisen, PhD, DSC2-EMMES
Paul Wakim, PhD, CCTN, NIDA
15 March 2011
2
Today's Workshop
  • The problem
  • Prevention
  • Types of missing data
  • Analysis methods
  • Case study
  • Open discussion

3
Missing Data
  • Information within a trial that is meaningful for
    analysis but not collected
  • Focus here mostly on primary outcome data, but
    relevant to missing secondary outcomes and
    covariates too

4
Missing Data
  • Randomization
  • Balances treatment groups for known and unknown
    factors
  • Benefit is lost if there is drop-out: the groups
    remaining at outcome may not have been similar at
    baseline
  • Intention-to-treat principle
  • Violates principle if not all participants
    contribute to the primary analysis

5
Missing Data
  • If missingness is unrelated to assigned treatment
    and to outcome
  • Reduces statistical power
  • If missingness is related to assigned treatment
    or to outcome
  • Biases the estimate of the treatment effect

6
Causes of Missing Data
  • Due to discontinuation of study treatment
  • Outcomes undefined for some participants
  • QOL measures after death
  • Quantitative hair analysis of drug use in
    individuals without hair
  • Test fails/specimen lost
  • Attrition
  • Related to health status/drug use
  • Unrelated to health status/drug use (e.g., moved)

7
Continuing Data Collection for Drop-Outs
  • Distinction between
  • Premature end of treatment
  • AND
  • End of study
  • Does collecting data after premature end of
    treatment make sense?

8
Rationale
  • Preserves intention-to-treat approach
  • Many CTN trials are pragmatic trials
  • NOT "Does treatment work if perfectly delivered?"
  • but RATHER
  • "Is this a good treatment strategy or policy?"
  • OR
  • "What happens once treatment starts or is
    recommended?"

9
Rationale
  • Delivery of medicine deals with people in the
    real world
  • A 100% efficacious cure for stimulant use is
    useless for public health if nobody can stand it.
  • Strive to collect complete data for the primary
    outcome on ALL participants, including those who
    do not complete the intervention
  • Too much missing data → no way the result will
    be believable, no matter how sophisticated the
    statistical method

10
Why Do We Like It?
  • Weight loss diet
  • People on the effective arm lose weight and stay
    in the study
  • Some on the ineffective arm get discouraged and
    quit
  • If we analyzed only the people who stayed in the
    trial, the ineffective arm would look too good
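A minimal simulation of the weight-loss example. Every number in it is hypothetical: the arm sizes, the true mean changes, the 3 kg spread, and the dropout rule ("participants who gain weight quit with probability 0.6"). It shows the mechanism the slide describes. Discouraged participants leave the ineffective arm, so a completers-only comparison makes that arm look better than it is and shrinks the apparent treatment effect.

```python
import random
import statistics

def simulate_arm(n, true_mean_change, rng):
    """Return (all_changes, completer_changes) for one hypothetical arm.

    Weight change is in kg (negative = weight loss). Assumed dropout rule:
    participants who gain weight quit with probability 0.6; others stay.
    """
    all_changes, completers = [], []
    for _ in range(n):
        change = rng.gauss(true_mean_change, 3.0)
        all_changes.append(change)
        discouraged = change > 0 and rng.random() < 0.6
        if not discouraged:
            completers.append(change)
    return all_changes, completers

rng = random.Random(2011)
eff_all, eff_comp = simulate_arm(5000, -4.0, rng)      # effective diet
ineff_all, ineff_comp = simulate_arm(5000, 0.0, rng)   # ineffective diet

# Effect using everyone randomized vs. effect using completers only
true_diff = statistics.fmean(eff_all) - statistics.fmean(ineff_all)
completer_diff = statistics.fmean(eff_comp) - statistics.fmean(ineff_comp)
print(f"all randomized: {true_diff:.2f} kg, completers only: {completer_diff:.2f} kg")
```

In this setup the ineffective arm's completers show an average weight loss even though the arm's true mean change is zero.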

11
Approaches to Missing Data
  • Design and conduct of a clinical trial that
    minimizes missing data
  • May require trade-offs with generalizability
  • Apply analysis methods that use information in
    observed data to help analyze primary outcome
    data in the presence of missing data

12
B. Franklin
An ounce of prevention is worth a pound of cure
13
Minimize Missing Data in... Trial Design
  • Flexible dose
  • Target population
  • Allow rescue therapy for poor responders
  • Define primary outcomes that are highly
    ascertainable
  • Minimize participant burden/reduce follow-up
  • Number of visits/assessments

14
Minimize Missing Data in... Trial Conduct
  • Explain importance of trial participation during
    consent process
  • Emphasize to staff importance of maintaining
    follow-up even when treatment is refused
  • Incentives
  • For participants, need to ensure level is not
    viewed as coercive

15
Minimize Missing Data in... Trial Conduct
  • Expression of thanks
  • Written/verbal
  • Assistance with travel
  • Reminders before visits
  • Welcoming staff/friendly environment
  • Keep locator information current
  • Monitor and report to investigators extent of
    missing data

16
Availability of Primary Outcome: Percent of
Measures with Values (N = 29 trials)
17
What's the big deal?
We need N = 400 (based on power analysis), but we
expect 20% missing, so we set the initial N = 500
so that the final (analyzed) N = 400.
National Institute on Drug Abuse - National
Institutes of Health - U.S. Department of Health
and Human Services
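The slide's arithmetic can be sketched as a small helper. The name `inflate_sample_size` is hypothetical, and the percentage of missing data is assumed to be a whole number. The helper returns the smallest initial N whose expected completers still reach the N required by the power analysis. Integer ceiling division avoids floating-point rounding surprises.

```python
def inflate_sample_size(n_final: int, pct_missing: int) -> int:
    """Smallest initial N with N * (1 - pct_missing/100) >= n_final."""
    if not 0 <= pct_missing < 100:
        raise ValueError("pct_missing must be in [0, 100)")
    # Ceiling of n_final * 100 / (100 - pct_missing) using only integers
    return -(-n_final * 100 // (100 - pct_missing))

print(inflate_sample_size(400, 20))  # → 500
```

This only restores the number of analyzed participants, which protects power. It does nothing about bias when missingness is related to treatment or to outcome.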
18
Technical terms that we can't escape
  • Missing at random (MAR)
  • Missing completely at random (MCAR)
  • Missing not at random (MNAR)
  • Ignorable
  • Non-ignorable
...but what do they mean?
19
Missing Completely at Random (MCAR)
  • (Non-technical) definition: the fact that Y is
    missing has nothing to do with the unobserved
    value of Y, or with other variables
  • Therefore: the set of participants with complete
    data can be regarded as a simple random (or
    representative) sample of all participants
  • What to do? Ignore the missing data and analyze
    the available data
20
Missing at Random (MAR)
  • (Non-technical) definition: the fact that Y is
    missing can be explained by other observed values
    of Y, or by other measured variables
  • Therefore: the observed data can be used to
    account for the missing data
  • What to do? Use a maximum likelihood or multiple
    imputation approach, and include in the model the
    other measured variables that explain missingness
21
Missing Not at Random (MNAR)
  • (Non-technical) definition: the fact that Y is
    missing cannot be explained by other observed
    values of Y, or by other measured variables
  • Therefore: the observed data cannot be used to
    account for the missing data, and outside
    information is needed
  • In simple English: we have a problem
22
In Summary
Missingness (i.e., whether the data are missing or not):

        is related to            is not related to
MCAR    -                        observed or unobserved data
MAR     observed data            unobserved data
MNAR    unobserved data          -

Based on Graham 2009
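A minimal simulation of the three mechanisms, with hypothetical settings throughout. X is an always-observed covariate correlated with Y, the true mean of Y is 0, and the missingness probabilities are invented for illustration. The complete-case mean of Y stays near 0 under MCAR but is biased under both MAR and MNAR. The difference between the last two is that under MAR the observed X carries the information needed to correct the bias (for example, by ML or MI models that include X). Under MNAR the missingness depends on Y itself.

```python
import random
import statistics

def complete_case_mean(mechanism, n=20000, seed=7):
    """Mean of Y among participants whose Y is observed (hypothetical data)."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        x = rng.gauss(0, 1)                  # always observed
        y = 0.8 * x + rng.gauss(0, 0.6)      # outcome; true mean 0, variance 1
        if mechanism == "MCAR":
            p_miss = 0.3                     # unrelated to X or Y
        elif mechanism == "MAR":
            p_miss = 0.6 if x > 0 else 0.1   # depends only on observed X
        elif mechanism == "MNAR":
            p_miss = 0.6 if y > 0 else 0.1   # depends on the unobserved Y itself
        else:
            raise ValueError(mechanism)
        if rng.random() >= p_miss:
            observed.append(y)
    return statistics.fmean(observed)

for mech in ("MCAR", "MAR", "MNAR"):
    print(mech, round(complete_case_mean(mech), 3))
```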
23
Bottom Line
  • MCAR: No big deal
  • MAR: Use available collected data to explain the
    missingness mechanism, and use existing
    statistical methods
  • MNAR: Need outside information to explain the
    missingness mechanism
24
Ignorable vs. Non-Ignorable (roughly speaking)
  • Ignorable (available data are sufficient)
  • Missing Completely At Random (MCAR)
  • Missing At Random (MAR)
  • Non-Ignorable (need outside information)
  • Missing Not At Random (MNAR)

25
Missing Data Analysis Methods
26
Complete Case and Pairwise Deletion
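  • Correlation illustration (X = observed, - = missing):

        Complete case (CC)      Pairwise deletion (PD)
        Y1  Y2  Y3              Y1  Y2  Y3
        X   X   X               X   X   X
        X   X   X               X   X   X
        X   X   -               X   X   -
        X   X   -               X   X   -

  • CC drops rows 3-4 for every correlation; PD drops
    them only for correlations involving Y3
  • Simple; the default in statistical software
  • Potential loss of information and precision
  • Can be biased when missingness is not MCAR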
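The illustration can be made concrete with a small stdlib sketch. The data values are hypothetical, following the pattern "Y1 and Y2 always observed, Y3 missing in the last rows". The function names are illustrative. For each correlation, the sketch reports how many rows CC and PD actually use.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlations(rows, names, method):
    """Map (var_i, var_j) -> (correlation, number of rows used).

    "CC": keep only rows with no missing values (complete cases).
    "PD": for each pair, keep rows where both variables are observed.
    """
    if method == "CC":
        rows = [r for r in rows if None not in r]
    result = {}
    for i, j in combinations(range(len(names)), 2):
        pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
        xs, ys = zip(*pairs)
        result[(names[i], names[j])] = (round(pearson(xs, ys), 3), len(pairs))
    return result

# Hypothetical data: Y3 missing in the last two of six rows
names = ("Y1", "Y2", "Y3")
data = [
    (1.0, 2.1, 0.5), (2.0, 2.9, 1.1), (3.0, 4.2, 1.4), (4.0, 4.8, 2.2),
    (5.0, 6.1, None), (6.0, 6.8, None),
]
for method in ("CC", "PD"):
    print(method, correlations(data, names, method))
```

PD keeps more data per pair. However, each correlation is then based on a different subset of participants, which is one reason it is not a cure for non-MCAR missingness.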

27
Single Imputation
  • Impute a single value, e.g., mean, baseline
    observation carried forward (BOCF), last
    observation carried forward (LOCF), imputing
    missing as positive
  • Simple, but artificially increases sample size
  • Underestimates SEs and gives incorrect p-values
  • Most SI methods require MCAR assumptions to hold,
    while some, such as LOCF, even require very
    strong and often unrealistic assumptions
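A small sketch of two single-imputation methods named above, on hypothetical values. Mean imputation adds no information, yet it makes the standard error of the mean smaller: n "grows" and the imputed values add no spread. LOCF freezes each participant's outcome at the last observed visit. The helper names are illustrative.

```python
import math
import statistics

def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    m = statistics.fmean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

def locf(visits):
    """Last observation carried forward within one participant's visits."""
    filled, last = [], None
    for v in visits:
        last = v if v is not None else last
        filled.append(last)
    return filled

def se_of_mean(values):
    return statistics.stdev(values) / math.sqrt(len(values))

y = [2.0, 4.0, 6.0, 8.0, None, None]     # hypothetical outcome, 2 missing
observed = [v for v in y if v is not None]
imputed = mean_impute(y)
print(se_of_mean(observed))   # SE based on the 4 real observations
print(se_of_mean(imputed))    # smaller, although nothing new was learned
print(locf([3, 5, None, None, 2, None]))  # → [3, 5, 5, 5, 2, 2]
```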

28
Multiple Imputation (MI)
  • [Figure: observed data with missing values (?)
    filled in to create imputed data sets 1, 2, ..., m]
  • A simulation-based approach to missing data
29
The General Idea
  • Incomplete data → (1) IMPUTATION → Imputed data
    → (2) ANALYSIS → Analysis results → (3) POOLING
    → Final results

30
(1) IMPUTATION Models
  • The imputation model should include primary
    predictive variables and other variables
    associated with missingness
  • Multiple Imputation method is robust even with
    approximate imputation models
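The three MI steps can be sketched end to end on hypothetical data. Y is missing more often when an always-observed X is positive (MAR), and the imputation model regresses Y on X. This is a simplified, "improper" MI: it adds random residual noise but does not redraw the regression coefficients for each imputation as a full Bayesian MI would. That tends to understate the between-imputation variance somewhat. All names and settings are illustrative.

```python
import random
import statistics

def simulate(n=2000, seed=1):
    """Hypothetical data: Y depends on X; Y is missing more often when X > 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 2 + x + rng.gauss(0, 1)          # population mean of Y is 2
        p_miss = 0.7 if x > 0 else 0.1       # depends only on observed X (MAR)
        rows.append((x, None if rng.random() < p_miss else y))
    return rows

def fit_ols(pairs):
    """Least-squares intercept, slope, and residual SD for y ~ x."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in pairs]
    return intercept, slope, statistics.stdev(resid)

def multiple_imputation(rows, m=20, seed=2):
    """Simplified MI for the mean of Y; returns (pooled estimate, total variance)."""
    rng = random.Random(seed)
    a, b, s = fit_ols([(x, y) for x, y in rows if y is not None])
    estimates, variances = [], []
    for _ in range(m):
        # (1) Imputation: prediction from X plus random residual noise
        ys = [y if y is not None else a + b * x + rng.gauss(0, s) for x, y in rows]
        # (2) Analysis: mean of Y and the variance of that mean
        estimates.append(statistics.fmean(ys))
        variances.append(statistics.variance(ys) / len(ys))
    # (3) Pooling with Rubin's rules
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    return statistics.fmean(estimates), within + (1 + 1 / m) * between

rows = simulate()
cc_mean = statistics.fmean(y for _, y in rows if y is not None)
mi_mean, mi_var = multiple_imputation(rows)
print(f"complete case: {cc_mean:.2f}, MI: {mi_mean:.2f} (SE {mi_var ** 0.5:.3f})")
```

Because the imputation model includes X, the variable associated with missingness, the MI estimate lands near the true mean of 2. The complete-case mean is pulled down.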

31
(2) ANALYSIS Models
  • Regression Model
  • General Linear Model
  • Generalized Linear Model (Logistic Regression,
    Poisson Regression)

32
(3) Rules for POOLING

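  • Confidence interval for the parameter of interest
    is given by
    Mean of Estimate ± t_df × √(Total Variance)

Each imputed data set i = 1, ..., m gives an estimate
$\hat{Q}_i$ and its variance $U_i$, combined as:
$$\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i \quad\text{(mean of estimate)}$$
$$W = \frac{1}{m}\sum_{i=1}^{m}U_i \quad\text{(within variance)}$$
$$B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i - \bar{Q}\right)^2 \quad\text{(between variance)}$$
$$T = W + \left(1 + \frac{1}{m}\right)B \quad\text{(total variance)}$$
$$\text{CI: } \bar{Q} \pm t_{df}\sqrt{T}$$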
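The pooling rules can be sketched as a stdlib helper. The function name and the example inputs are hypothetical. The degrees of freedom use Rubin's (1987) formula, df = (m − 1)(1 + W / ((1 + 1/m)B))², which is an addition not shown on the slide.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates[i], variances[i]: point estimate and squared SE from data set i.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)          # mean of estimate
    within = statistics.fmean(variances)         # within variance W
    between = statistics.variance(estimates)     # between variance B (divisor m - 1)
    total = within + (1 + 1 / m) * between       # total variance T
    # Rubin's df for the t reference distribution; infinite if B == 0
    df = ((m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
          if between > 0 else math.inf)
    return {"estimate": q_bar, "within": within, "between": between,
            "total": total, "df": df}

pooled = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(pooled)
```

The CI is then estimate ± t quantile × √total. The t quantile itself is not in the standard library; one option, if SciPy is installed, is `scipy.stats.t.ppf(0.975, df)`.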
33
Desirable Features
  • MI gives approximately unbiased estimates of all
    parameters
  • MI provides good estimates of the standard errors
  • MI can be used with many kinds of data and
    analyses without specialized software
  • Requires MAR assumption

34
Maximum likelihood
  • Basic idea
  • Given some data,
  • Try to guess the parameter(s) of the probability
    distribution that generated the data
  • MLE of a parameter is the value that maximizes
    the probability of the data you already have

35
Example
  • Flip a coin, get 45 heads, 36 tails
  • We don't know p, but whatever it is,
    $\Pr(45\text{ H in }81\text{ tosses}) = K\,p^{45}(1-p)^{36}$
  • How to guess p?
  • Pick the value of p that maximizes the
    probability of what already happened
  • Pick p to maximize $L(p) = p^{45}(1-p)^{36}$
  • Best guess turns out to be $\hat{p} = 45/81$
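A quick numerical check of the coin example. Setting the derivative of the log-likelihood, 45/p − 36/(1 − p), to zero gives p = 45/81. The sketch below finds the same answer by brute force over a fine grid of p values. The grid resolution is an arbitrary choice, and the constant K is dropped because it does not change where the maximum is.

```python
import math

def log_likelihood(p, heads=45, tails=36):
    """log L(p) = heads*log(p) + tails*log(1 - p); K is omitted."""
    return heads * math.log(p) + tails * math.log(1 - p)

# Evaluate on a fine grid of p in (0, 1) and keep the maximizer
grid = [k / 10000 for k in range(1, 10000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat, 45 / 81)   # grid maximizer vs. closed form
```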

36
Maximum likelihood estimates have nice properties
  • Consistent
  • Asymptotically normal, unbiased, and minimum
    variance
  • etc.

37
New problem
  • H: 45
  • T: 36
  • ? (missing): 19
  • Now how to guess p?
  • If we knew how many missing were H and how many
    T, we would know what to do.
  • But we don't.
  • What to do?

38
A solution
  • If data are MAR,
  • you can get MLEs by
  • maximizing the (conditional) likelihood for the
    nonmissing data
  • ignoring the missing data mechanism.
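One way to see the claim is the EM algorithm, which is swapped in here and does not appear on the slides. EM fills in the expected number of heads among the 19 unrecorded tosses and re-estimates p, assuming MAR. Its fixed point solves p = (45 + 19p)/100, i.e. p = 45/81. That is exactly the MLE from the 81 nonmissing tosses alone. The function name and defaults are illustrative.

```python
def em_coin(heads=45, tails=36, missing=19, p=0.5, tol=1e-12, max_iter=1000):
    """EM estimate of P(heads) when some tosses were not recorded (MAR)."""
    n = heads + tails + missing
    for _ in range(max_iter):
        expected_heads = heads + missing * p   # E-step: expected heads overall
        p_new = expected_heads / n             # M-step: complete-data MLE
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

print(em_coin(), 45 / 81)   # both ≈ 0.5556
```

Starting values do not matter here: each iteration shrinks the distance to 45/81 by a factor of 19/100.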

39
Important Application
  • Longitudinal analysis
  • Participant 1: visits 1, 2, 3, ...
  • Participant 2: visits 1, 2, 3, ...
  • For each visit, $y = a + b_1 x_1 + b_2 x_2$
  • First approach
  • Treat all visits as independent
  • Do the regression on all visits together
  • Wrong, because visits from a single participant
    are related, not independent

40
Important Application (contd)
  • Second approach
  • The visits from a single participant have
    covariance
  • Use a mixed model
  • It used to be that you had to have all visits
    nonmissing for this analysis
  • But modern software (SAS PROC MIXED, PROC
    GLIMMIX) ignores the missing-data mechanism and
    gets MLEs from only the nonmissing data, even if
    some visits are missing.
  • If data are MAR, this is fine!

41
Modern longitudinal ML software uses more data
[Figure: participant-by-visit data grid. Annotations
mark the only cases an older complete-case (CC)
analysis would use, and a visit that neither the old
nor the new method can use.]
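The slide's point can be made concrete by counting observations. The sketch uses a hypothetical long-format data set; the participant IDs and values are invented. An older complete-case analysis uses only participants with every visit observed. A likelihood-based mixed model can use every nonmissing value. A participant with no outcome at any visit contributes to neither.

```python
# Hypothetical data: participant -> outcome at visits 1..4 (None = missing)
visits = {
    "P1": [5.0, 4.0, 3.5, 3.0],
    "P2": [6.0, None, 5.0, 4.5],
    "P3": [4.0, 3.5, None, None],
    "P4": [None, None, None, None],   # no outcome at any visit
}

def complete_case_obs(data):
    """Observations used when only fully observed participants are kept."""
    return sum(len(v) for v in data.values() if None not in v)

def available_obs(data):
    """Observations a likelihood-based mixed model can use: all nonmissing values."""
    return sum(1 for v in data.values() for y in v if y is not None)

print(complete_case_obs(visits), available_obs(visits))  # → 4 9
```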
42
Another application
  • Survival analysis
  • Example: time to relapse
  • For some people, you have the time
  • For others, you don't, because
  • Study ended
  • People died
  • People dropped out
  • etc.
  • People without relapse times are said to be
    CENSORED

43
Another application (contd)
  • For censored people, you don't know the relapse
    time, but you know it is after the censoring time
  • Survival analysis handles censored data, but
  • You have to make the assumption that censoring is
    noninformative.
  • If people drop out because they know they are
    going to relapse the next day, the censoring is
    informative.
  • Informative censoring gives biased survival time
    estimates
  • The noninformative censoring assumption is
    basically an MAR assumption.
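A minimal Kaplan-Meier sketch shows how survival analysis uses censored people. The relapse data are hypothetical. A censored person counts in the risk set up to and including the censoring time, then simply leaves it. The estimate is only trustworthy if that departure is noninformative. Ties follow the usual convention that relapses at time t occur before censorings at t.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct relapse time.

    times: follow-up time per person; events: 1 = relapse seen, 0 = censored.
    """
    surv, curve = 1.0, {}
    for t in sorted(set(times)):
        at_risk = sum(1 for s in times if s >= t)   # still followed at time t
        relapses = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        if relapses:
            surv *= 1 - relapses / at_risk
            curve[t] = surv
    return curve

# Hypothetical weeks to relapse; 0 marks people censored (e.g., dropped out)
weeks = [2, 3, 3, 5, 6, 8]
relapsed = [1, 1, 0, 1, 0, 1]
print(kaplan_meier(weeks, relapsed))
```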

44
What if data are not MAR?
  • When the missing data are nonignorable (i.e.,
    MNAR), standard statistical models can yield
    badly biased results
  • MAR cannot be tested against MNAR using the
    observed data

45
Sensitivity Analysis
  • The missing data mechanism is not identifiable
    from observed data
  • We don't know what we don't know
  • One or more analyses can be performed using
    different assumptions
  • Example: worst-case analysis
  • (won't work with a lot of missing data)
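A worst-case/best-case sketch for a binary outcome such as an opioid-positive urine. Arm A is treated as the arm of interest. The counts below are hypothetical, and exact fractions avoid rounding noise. The worst case counts every missing value in arm A as positive and every missing value in arm B as negative. The best case does the reverse.

```python
from fractions import Fraction

def positive_rate(pos, neg, missing, missing_positive):
    """Share positive when `missing_positive` of the missing count as positive."""
    return Fraction(pos + missing_positive, pos + neg + missing)

def worst_best_case(arm_a, arm_b):
    """Observed, worst-case, and best-case (rate B - rate A); > 0 favors A.

    arm_a, arm_b: (positives, negatives, missing).
    """
    (pa, na, ma), (pb, nb, mb) = arm_a, arm_b
    observed = Fraction(pb, pb + nb) - Fraction(pa, pa + na)
    worst = positive_rate(pb, nb, mb, 0) - positive_rate(pa, na, ma, ma)
    best = positive_rate(pb, nb, mb, mb) - positive_rate(pa, na, ma, 0)
    return observed, worst, best

# Hypothetical counts: (opioid-positive, opioid-negative, missing)
observed, worst, best = worst_best_case((30, 50, 20), (45, 45, 10))
print(float(observed), float(worst), float(best))
```

Here the observed difference (0.125) favors arm A, but the bounds run from −0.05 to 0.25 and include zero. With only 20% and 10% missing, the interval is already too wide to settle the question, which is why this approach breaks down with a lot of missing data.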

46
Goals of Sensitivity Analysis
  • Consider a range of potential associations
    between missingness and response
  • Assess the degree to which conclusion can be
    influenced by the missingness mechanism
  • If the conclusion is largely unchanged the result
    may be considered robust
  • Otherwise, the conclusion should be interpreted
    cautiously and may be misleading

47
MNAR models
  • Use of non-ignorable models can be helpful in
    conducting a sensitivity analysis
  • Not necessarily a good idea to rely on a single
    MNAR model, because the assumptions about the
    missing data are impossible to assess with the
    observed data
  • One should use MNAR models sensibly, possibly
    examining several types of such models for a
    given dataset

48
Two general classes of MNAR models
  • Selection models: model the full-data response
    together with a selection (missingness) mechanism
  • Pattern-mixture models: model the response
    within each missing-data pattern and mix across
    patterns
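A sketch of one simple pattern-mixture approach, delta adjustment, used as a tipping-point sensitivity analysis over a range of assumed associations between missingness and response. Within arm A, participants with missing urines are assumed to be positive at the observed rate plus delta. Delta = 0 corresponds to MAR. The two patterns are mixed by their sizes. The counts, the delta grid, and the choice to shift only arm A are all hypothetical.

```python
from fractions import Fraction

def delta_adjusted_rate(pos, neg, missing, delta):
    """Pattern-mixture P(positive) for one arm.

    Observed pattern: observed rate. Missing pattern: observed rate + delta,
    clipped to [0, 1]. Patterns are weighted by their share of the arm.
    """
    obs_rate = Fraction(pos, pos + neg)
    mis_rate = min(max(obs_rate + delta, Fraction(0)), Fraction(1))
    n = pos + neg + missing
    return Fraction(pos + neg, n) * obs_rate + Fraction(missing, n) * mis_rate

def tipping_point(arm_a, arm_b, deltas):
    """Smallest delta (arm A only) at which A's apparent advantage disappears."""
    rate_b = delta_adjusted_rate(*arm_b, Fraction(0))   # arm B kept at MAR
    for d in deltas:
        if rate_b - delta_adjusted_rate(*arm_a, d) <= 0:
            return d
    return None

grid = [Fraction(k, 40) for k in range(41)]   # delta from 0 to 1 in steps of 0.025
print(tipping_point((30, 50, 20), (45, 45, 10), grid))  # → 5/8
```

If the tipping delta is implausibly large, the conclusion may be considered robust. If a modest delta overturns it, the result should be interpreted cautiously.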

49
Case Study CTN0010 - BUP for Adolescents
  • Two groups: Bup/Nal detoxification over 2 weeks
    vs. Bup/Nal maintenance over 12 weeks
  • N (analyzed) = 152 at 6 community treatment
    programs
  • Main outcome measure: opioid-positive urine test
    result at weeks 4, 8, and 12
  • Evaluation: weekly for 12 weeks; comprehensive at
    4, 8, 12, 24, 36, and 52 weeks
50
Woody, JAMA 2008
51
Missingness in CTN0010 (from Paul Allison's
analysis)
20 participants had missing outcome for all 12
weeks (effective sample size N − 20)
Available data (after removing the 20 cases):

Week      1   2   3   4   5   6   7   8   9  10  11  12
Present  90  74  60  78  48  45  44  69  40  37  37  67
52
Paul Allison's Analysis
  • Included in the model each of Weeks 1 to 12
  • Used Maximum Likelihood Estimation (MLE) and
    Multiple Imputation (MI) approaches (MLE is
    preferred over MI)
  • Used random effects (mixed) logit model with SAS
    PROC GLIMMIX

53
(No Transcript)
54
(No Transcript)
55
(No Transcript)
56
(No Transcript)
57
(No Transcript)
58
Take-Home Messages
  1. Model all the available outcome data at all time
    points, including outcome at baseline (t0), and
    then test the time points (contrasts) of interest
  2. There are good data-analytic methods for dealing
    with missing data in repeated-measures designs
    (under the MAR assumption): use random-effects
    (mixed) models estimated by maximum likelihood
  3. Allow for a linear and quadratic time trend
    (saves degrees of freedom), or spline model
    (broken line)
  4. If no time-related pattern, use time as a class
    variable, i.e. each time point is a category (not
    continuous)
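The time codings in items 3 and 4 differ in how many fixed-effect columns they spend. The sketch below builds the time covariates for one visit under each coding. The function name is illustrative, and the spline knot at week 4 is a hypothetical choice. In practice these columns, with treatment and treatment-by-time terms, would go into a mixed model; fitting is not shown.

```python
def time_terms(week, coding, n_weeks=12, knot=4):
    """Fixed-effect time covariates for one visit.

    "quadratic": linear + quadratic trend (2 columns)
    "spline":    broken line with one hypothetical knot (2 columns)
    "class":     one indicator per week, week 1 as reference (n_weeks - 1 columns)
    """
    if coding == "quadratic":
        return [week, week ** 2]
    if coding == "spline":
        return [week, max(0, week - knot)]   # slope may change after the knot
    if coding == "class":
        return [1 if week == w else 0 for w in range(2, n_weeks + 1)]
    raise ValueError(f"unknown coding: {coding}")

for coding in ("quadratic", "spline", "class"):
    print(coding, len(time_terms(8, coding)), time_terms(8, coding))
```

With 12 weekly visits, the class coding uses 11 time parameters where the trend codings use 2. That is the degrees-of-freedom saving item 3 refers to. The class coding makes no assumption about the shape of the time course.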

59
Take-Home Messages (contd)
  1. Imputing missing outcomes as positive is a crude
    approach; one can often do better
  2. Incorporate covariates and auxiliary
    variables
  3. Sensitivity analysis is absolutely vital

60
References
Allison, Missing Data. Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-136. Thousand Oaks, CA: Sage, 2001.
Fitzmaurice, Laird & Ware, Applied Longitudinal Analysis. Wiley, 2004.
Graham, "Missing Data Analysis: Making It Work in the Real World." Annual Review of Psychology, 2009, 60: 549-576.
Liang & Zeger, "Longitudinal Data Analysis of Continuous and Discrete Responses for Pre-Post Designs." Sankhya, 2000, 62(B): 134-148.
Weiss, "An Introduction to Modeling Longitudinal Data." Presentation at UCLA CALDAR Summer Institute on Longitudinal Research, August 2010.
Woody et al., "Extended vs Short-term Buprenorphine-Naloxone for Treatment of Opioid-Addicted Youth: A Randomized Trial." JAMA, 2008, 300(17): 2003-2011.
61
Contact Information
Neal Oden: noden@emmes.com
Gaurav Sharma: gsharma@emmes.com
Paul Van Veldhuisen: pvanveldhuisen@emmes.com
Paul Wakim: pwakim@nida.nih.gov
62
Questions / Comments