Statistical, Practical, and Design Issues in Analysis with Missing Data

1
Statistical, Practical, and Design Issues in
Analysis with Missing Data
  • John Graham
  • The Methodology Center
  • Penn State University
  • American Psychological Association, Toronto,
    August 8, 2003

2
Acknowledgements
  • Joe Schafer
  • Scott Hofer
  • Patricio Cumsille
  • Bonnie Taylor
  • Steve West
  • NIAAA
  • NIDA
  • The Hanley Family Foundation

3
Presentation in Two Parts
  • (1) Introductory Material and Practical Issues
  • (2) Planned Missingness Designs

4
Recent Papers
  • Collins, L. M., Schafer, J. L., & Kam, C. M.
    (2001). A comparison of inclusive and
    restrictive strategies in modern missing data
    procedures. Psychological Methods, 6, 330-351.
  • Schafer, J. L., & Graham, J. W. (2002). Missing
    data: Our view of the state of the art.
    Psychological Methods, 7, 147-177.
  • Graham, J. W., Cumsille, P. E., & Elek-Fisk, E.
    (2003). Methods for handling missing data. In
    J. A. Schinka & W. F. Velicer (Eds.), Research
    Methods in Psychology (pp. 87-114). Volume 2 of
    Handbook of Psychology (I. B. Weiner,
    Editor-in-Chief). New York: John Wiley & Sons.
  • http://mcgee.hhdev.psu.edu/publication_resources

5
Recent Papers
  • Graham, J. W., Taylor, B. J., & Cumsille, P. E.
    (2001). Planned missing data designs in analysis
    of change. In L. Collins & A. Sayer (Eds.), New
    methods for the analysis of change (pp.
    335-353). Washington, DC: American Psychological
    Association.
  • Graham, J. W. (2003). Adding missing-data
    relevant variables to FIML-based structural
    equation models. Structural Equation Modeling,
    10, 80-100.
  • Graham, J. W., & Schafer, J. L. (1999). On the
    performance of multiple imputation for
    multivariate data with small sample size. In R.
    Hoyle (Ed.), Statistical Strategies for Small
    Sample Research (pp. 1-29). Thousand Oaks, CA:
    Sage.

6
Part I: Missing Data: Introductory Material and
Practical Issues
7
Problem with Missing Data
  • Analysis procedures were designed for complete
    data. . .

8
Solution 1
  • Design new procedures
  • Missing Data & Parameter Estimation in One Step
  • Full Information Maximum Likelihood (FIML): SEM
    and Other Latent Variable Programs (Amos, Mx,
    LISREL, Mplus, LTA)
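
The "one step" idea can be sketched with a casewise log-likelihood: each case contributes the normal density of whichever variables it actually has, so nothing is imputed and no case is dropped. This is a minimal illustration for two variables, assuming a bivariate normal model. The function names are hypothetical and are not taken from Amos, Mx, LISREL, Mplus, or LTA.

```python
import math


def casewise_loglik(case, mu, sigma):
    """Log-likelihood of one case, using only its observed values.

    case  : (y1, y2), with None marking a missing value
    mu    : (mu1, mu2) mean vector
    sigma : ((s11, s12), (s12, s22)) covariance matrix
    """
    observed = [i for i, value in enumerate(case) if value is not None]
    if not observed:
        return 0.0  # a fully missing case carries no information
    if len(observed) == 1:
        # Only one variable observed: use its univariate normal density.
        i = observed[0]
        var = sigma[i][i]
        dev = case[i] - mu[i]
        return -0.5 * (math.log(2 * math.pi * var) + dev * dev / var)
    # Both variables observed: use the full bivariate normal density.
    s11, s12, s22 = sigma[0][0], sigma[0][1], sigma[1][1]
    det = s11 * s22 - s12 * s12
    d1, d2 = case[0] - mu[0], case[1] - mu[1]
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def fiml_loglik(data, mu, sigma):
    """Sum the casewise contributions; an optimizer would maximize this."""
    return sum(casewise_loglik(case, mu, sigma) for case in data)
```

A real FIML program maximizes this quantity over the model parameters; the sketch only evaluates it for given values of mu and sigma.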

9
Solution 2
  • Missing data: Multiple Imputation (MI)
  • Two Steps
  • Step 1: Replace Missing Values with Plausible
    Values
  • Step 2: Analyze Data as if there were No Missing
    Data
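
The two steps can be sketched in a few lines. This is an illustrative toy, not the procedure used by any particular MI program: it assumes one fully observed predictor x and one partly missing variable y, imputes by stochastic regression on a bootstrap resample, and estimates the mean of y. The slide does not show how results from several imputed data sets are combined, so the pooling below (Rubin's rules) is an added assumption. All function names are hypothetical.

```python
import random
import statistics


def impute_once(x, y, rng):
    """Step 1: fill each missing y (None) with a plausible value drawn from
    a regression of y on x, re-estimated on a bootstrap resample so that
    parameter uncertainty carries into the imputations."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    boot = [rng.choice(observed) for _ in observed]
    mean_x = statistics.fmean([xi for xi, _ in boot])
    mean_y = statistics.fmean([yi for _, yi in boot])
    sxx = sum((xi - mean_x) ** 2 for xi, _ in boot)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in boot) / sxx
    intercept = mean_y - slope * mean_x
    resid_var = sum((yi - intercept - slope * xi) ** 2 for xi, yi in boot)
    resid_sd = (resid_var / (len(boot) - 2)) ** 0.5
    return [
        yi if yi is not None else intercept + slope * xi + rng.gauss(0.0, resid_sd)
        for xi, yi in zip(x, y)
    ]


def pooled_mean(x, y, m=20, seed=0):
    """Step 2, repeated m times: analyze each completed data set as if it
    had no missing data, then pool the m results with Rubin's rules."""
    rng = random.Random(seed)
    estimates, within = [], []
    for _ in range(m):
        completed = impute_once(x, y, rng)
        estimates.append(statistics.fmean(completed))
        within.append(statistics.variance(completed) / len(completed))
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    u_bar = statistics.fmean(within)         # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b
    return q_bar, total_var ** 0.5
```

Calling `pooled_mean(x, y)` returns the pooled estimate and its standard error. The between-imputation term is what keeps the standard error from being too small.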

10
FAQ
  • Aren't you somehow helping yourself with
    imputation?. . .

11
NO. Missing data imputation . . .
  • does NOT give you something for nothing
  • DOES let you make use of all data you have
  • . . .

12
FAQ
  • Is the imputed value what the person would have
    given?

13
NO. When we impute a value . . .
  • We do not impute for the sake of the value itself
  • We impute to preserve important characteristics
    of the whole data set
  • . . .

14
We want . . .
  • unbiased parameter estimation
  • e.g., b-weights
  • Good estimate of variability
  • e.g., standard errors
  • best statistical power

15
Causes of Missingness
  • Ignorable
  • MCAR: Missing Completely At Random
  • MAR: Missing At Random
  • Non-Ignorable
  • MNAR: Missing Not At Random
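
A small simulation can make the three mechanisms concrete. In this sketch x is always observed and y is sometimes deleted. Under MCAR the chance of deletion is constant, under MAR it depends only on the observed x, and under MNAR it depends on the value of y itself. The function name and the deletion probabilities are hypothetical choices made for illustration.

```python
import random


def make_missing(x, y, mechanism, rng):
    """Return a copy of y with None wherever a value was deleted."""
    x_cut = sorted(x)[len(x) // 2]  # roughly the median of x
    y_cut = sorted(y)[len(y) // 2]  # roughly the median of y
    result = []
    for xi, yi in zip(x, y):
        if mechanism == "MCAR":
            p = 0.3                          # same chance for everyone
        elif mechanism == "MAR":
            p = 0.5 if xi > x_cut else 0.1   # depends on observed x only
        elif mechanism == "MNAR":
            p = 0.5 if yi > y_cut else 0.1   # depends on the unseen y value
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        result.append(None if rng.random() < p else yi)
    return result
```

Under MNAR in this setup, high values of y are removed more often, so the mean of the remaining y values falls below the mean of the full data, and x alone cannot fully correct for it.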

16
Practical Issues
  • How much difference does it make?
  • How easy is the "sell"?
  • Which is better: FIML or MI?
  • "Auxiliary" Variables (Collins, Schafer, & Kam,
    2001; Graham, 2003)
  • Small sample size (Graham & Schafer, 1999)
  • Too many variables
  • Automation

17
Practical Issues: Biggest problems in multiple
imputation
  • How do I write my data out of SPSS?
  • How can I use MI with ANOVA?
  • How do I use MI with SPSS, STATA, SUDAAN, EQS,
    Mplus?
  • Is there a less tedious way?

18
Part II: Planned Missingness Designs
19
Planned Missingness
  • Why would anyone want to plan to have
    missingness?
  • To manage costs, data quality, and statistical
    power
  • In fact, we do it all the time. . .

20
Common Sampling Designs
  • Random sampling of
  • Subjects
  • Items
  • Goal
  • Collect smaller, more manageable amount of data
  • Draw reasonable conclusions

21
Planned Missingness
  • Is sampling items within subjects so hard?
  • Trick is to sample all item combinations

22
Why NOT Use Planned Missingness?
  • Past: Not convenient to do analyses
  • Present: Many statistical solutions
  • Now is time to consider design alternatives

23
Design Examples
24
Lighten Burden I: Longitudinal Measurement
  • Problem: Studying Growth
  • Participants may grow tired of measurement
  • One Solution: sample times for measurement
  • See Graham, Taylor, & Cumsille (2001)

25
Planned Missingness for Growth Modeling
  • Growth modeling increasingly common
  • multiple (e.g., 5) measurement waves
  • identify intercept, slope, etc.
  • predict slope, etc. with other variables

26
Example: Preventing College Alcohol Problems
  • Alcohol use in first year of college
  • Baseline rate: steep onset
  • After program . . . shallower onset rate

27
Could collect data at all five time points
  • Advantages
  • easy to analyze
  • Disadvantages
  • expensive in per-subject costs
  • expensive in data quality (taxes subjects)
  • Explore missing data designs

28
Design 1: all combinations of 1 time missing (17%
missing)
  • 1 1 1 1 1 57 0 0 0 0 0
  • 1 1 1 1 0 57 1 1 1 1 1
  • 1 1 1 0 1 57 1 1 1 1 1
  • 1 1 0 1 1 57 1 1 1 1 1
  • 1 0 1 1 1 57 1 1 1 1 1
  • 0 1 1 1 1 57 1 1 1 1 1
  • N = 342

29
Design 3: all combinations of 2 times missing
(36% missing)
  • 1 1 1 1 1 31 0 0 0 0 0
  • 1 1 1 0 0 31 0 0 0 0 0
  • 1 1 0 1 0 31 0 0 0 0 0
  • 1 0 1 1 0 31 0 0 0 0 0
  • 0 1 1 1 0 31 1 1 1 1 1
  • 1 1 0 0 1 31 1 1 1 1 1
  • 1 0 1 0 1 31 1 1 1 1 1
  • 0 1 1 0 1 31 1 1 1 1 1
  • 1 0 0 1 1 31 1 1 1 1 1
  • 0 1 0 1 1 31 1 1 1 1 1
  • 0 0 1 1 1 31 1 1 1 1 1
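
The pattern listings can be generated mechanically. The sketch below builds the complete-data pattern plus every pattern with a fixed number of waves omitted, then computes the share of data points missing. It assumes five waves and an equal number of subjects per pattern, as in the Design 1 and Design 3 listings, and it reproduces the 17 and 36 in those slide titles as percentages. The function names are hypothetical.

```python
from itertools import combinations


def missingness_patterns(n_waves, n_missing):
    """Complete pattern plus every pattern with exactly n_missing waves
    omitted (1 = measured, 0 = missing by design)."""
    patterns = [tuple([1] * n_waves)]
    for omitted in combinations(range(n_waves), n_missing):
        patterns.append(tuple(0 if t in omitted else 1 for t in range(n_waves)))
    return patterns


def percent_missing(patterns, n_per_pattern):
    """Percentage of all subject-by-wave data points that are missing."""
    total = len(patterns) * len(patterns[0]) * n_per_pattern
    missing = sum(p.count(0) for p in patterns) * n_per_pattern
    return 100.0 * missing / total


design_1 = missingness_patterns(5, 1)   # 6 patterns
design_3 = missingness_patterns(5, 2)   # 11 patterns
print(round(percent_missing(design_1, 57)))  # 17
print(round(percent_missing(design_3, 31)))  # 36
```

With equal subjects per pattern the percentage does not depend on the per-pattern N, so a design can be compared on missingness before sample size is fixed.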

30
Planned Missingness Designs
  • Design   All combinations of ___ missing times   % missing data points
  • ------   ------------------------------------   ---------------------
  • 1        1                                      17
  • 2        1 & 2                                  29
  • 3        2                                      36
  • 4        2 & 3                                  45
  • 5        3                                      54

31
Standard Errors for Various Designs
[Figure: standard errors by number of data points, complete cases designs vs. missing data designs]
32
Missing data designs
  • Often better than complete cases designs
  • Always cheaper
  • Often acceptable drop in power

33
Lighten Burden on Respondents II
  • The problem
  • 7th graders can answer only 100 questions
  • We want to ask 133 questions
  • One Solution: The 3-form design

34
3-Form Design
  • Student Received Item Set?
  • ----------------------------
  •          X     A     B     C
  • Form 1   yes   yes   yes   NO
  • Form 2   yes   yes   NO    yes
  • Form 3   yes   NO    yes   yes
  • Form 4   yes   yes   yes   yes

35
3-Form Design
  • Item Sets   X    A    B    C    total
  •             34   33   33   33   133
  • Form        X    A    B    C    total
  • 1           34   33   33   0    100
  • 2           34   33   0    33   100
  • 3           34   0    33   33   100
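
The item counts can be checked with a short script. It uses the item-set sizes and form layouts from the table, confirms that each form stays at 100 items, and checks that every pair of item sets appears together on at least one form, which is what allows all correlations to be estimated. The variable and function names are hypothetical.

```python
from itertools import combinations

# Item-set sizes and form contents as given in the 3-form design table.
ITEM_SET_SIZES = {"X": 34, "A": 33, "B": 33, "C": 33}
FORMS = {
    1: ["X", "A", "B"],
    2: ["X", "A", "C"],
    3: ["X", "B", "C"],
}


def items_per_form(forms, sizes):
    """Number of questions each form asks."""
    return {form: sum(sizes[s] for s in sets) for form, sets in forms.items()}


def uncovered_pairs(forms, sizes):
    """Pairs of item sets that never appear together on any form."""
    all_pairs = set(combinations(sorted(sizes), 2))
    covered = {
        pair
        for sets in forms.values()
        for pair in combinations(sorted(sets), 2)
    }
    return all_pairs - covered


print(items_per_form(FORMS, ITEM_SET_SIZES))   # {1: 100, 2: 100, 3: 100}
print(uncovered_pairs(FORMS, ITEM_SET_SIZES))  # set()
```

An empty result from `uncovered_pairs` means no pair of item sets is left without jointly observed data.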

36
3-Form Design Item Order
  • Form 1: X A B
  • Form 2: X C A
  • Form 3: X B C

37
3-Form Design Item Order
  • Form 1: X A B C
  • Form 2: X C A B
  • Form 3: X B C A

38
3-Form Design Item Order
  • Form 1: X A B C
  • Form 2: X C A B
  • Form 3: X B C A
  • Could pay some subjects to complete extra
    questions

39
3-Form Design Item Order
  • Form 1: X A B C
  • Form 2: X C A B
  • Form 3: X B C A
  • Give questions as shown, measure reasons for
    non-completion
  • poor reading
  • low motivation
  • "Managed" missingness

40
Planned Missingness: Expensive Measures I: Whole
Constructs Sometimes Missing
41
Research Example
  • Recent Prevention Study
  • Adolescent Alcohol Prevention Trial
  • Two Drug-Abuse Prevention Curricula
  • Resistance Training
  • Normative Education

42
N = 1000
N = 3000
43
Expensive Measures II: Items Sometimes
Missing from Larger Construct
44
Research Examples
  • Smoking Research
  • less expensive: Self-Reports
  • more expensive: CO and Saliva Cotinine
  • Alcohol Research
  • less expensive: Brief Self-reports
  • more expensive: Time Line Follow Back

45
Research Examples
  • Nutrition Research
  • less expensive: Brief Nutrition Survey
  • more expensive: Extensive 24-hr Recall
  • Survey Research
  • less expensive: Brief Mail Survey
  • more expensive: Extensive Face-to-Face
    Interview

46
Expensive Measures II
Larger N, Less Expensive
r = .30
Smaller N, More Expensive
47
Example Study
  • r = -.30 (smoking and health)
  • Self-report Smoking
  • two items
  • Biochemical Smoking Measures
  • Expired Air CO
  • Saliva Cotinine

48
Example Study
  • $15,050 for Measuring Smoking
  • Self-Reports: $7.30 per subject
  • CO / Cotinine: $16.78 per subject
  • self-reports + bio-chem
  • 625 x $7.30 + 625 x $16.78 = $15,050
  • 1200 x $7.30 + 375 x $16.78 ≈ $15,050
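
The trade-off on this slide is simple budget arithmetic. The sketch below takes the budget and per-subject costs from the slide (read here as dollars, stored in cents to avoid rounding error) and computes how many subjects can receive the biochemical measures for a given number of self-report subjects. The function names are hypothetical. Rounding to the nearest whole subject is an assumption made to match the slide's 375, and it can overshoot the budget by a few dollars.

```python
# Figures from the slide, converted to cents so the arithmetic is exact.
BUDGET_CENTS = 1_505_000       # 15,050 total for measuring smoking
SELF_REPORT_CENTS = 730        # 7.30 per subject
BIOCHEM_CENTS = 1_678          # 16.78 per subject (CO / cotinine)


def affordable_biochem_n(n_self_report):
    """Biochemical-measure subjects the remaining budget covers,
    rounded to the nearest whole subject (an assumption)."""
    remaining = BUDGET_CENTS - n_self_report * SELF_REPORT_CENTS
    if remaining < 0:
        raise ValueError("self-reports alone exceed the budget")
    return (remaining + BIOCHEM_CENTS // 2) // BIOCHEM_CENTS


def total_cost_dollars(n_self_report, n_biochem):
    """Total measurement cost of a design, in dollars."""
    cents = n_self_report * SELF_REPORT_CENTS + n_biochem * BIOCHEM_CENTS
    return cents / 100


print(affordable_biochem_n(625))      # 625
print(affordable_biochem_n(1200))     # 375
print(total_cost_dollars(625, 625))   # 15050.0
print(total_cost_dollars(1200, 375))  # 15052.5
```

Moving from 625 to 1200 self-report subjects costs 250 biochemical subjects, which is the exchange rate the design comparison turns on.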

49
Standard Errors
[Figure: standard errors by sample size (self-reports)]
50
But is that all there is to it?
  • What about importance of
  • the main analysis?
  • secondary analyses?
  • Are conclusions the same
  • when the main analysis is
  • a little more important?
  • moderately more important?
  • a lot more important?

51
Standard Errors
[Figure: standard errors by sample size (self-reports)]
52
Which Design is Best? Importance Factor = 20
  • Cheap measure N   Expensive measure N   Design     Overall value
  • ---------------   -------------------   --------   -------------
  • 625               625                   complete   21.42
  • 900               505                   optimal    23.00
  • 1500              217                   extreme    20.86

53
Which Design is Best? Importance Factor = 5
  • Cheap measure N   Expensive measure N   Design     Overall value
  • ---------------   -------------------   --------   -------------
  • 625               625                   complete   6.92
  • 900               505                   optimal    8.00
  • 1500              217                   extreme    8.97

54
Conclusions
  • Optimal (missing data) design better than
    complete cases
  • Extreme missing data design often best

55
Why would anyone ever want to PLAN to have
missing data?
  • Easy to analyze
  • Cheaper
  • Optimal design in most cases
  • So . . .

56
  • the end

57
Recent Papers
  • Collins, L. M., Schafer, J. L., & Kam, C. M.
    (2001). A comparison of inclusive and
    restrictive strategies in modern missing data
    procedures. Psychological Methods, 6, 330-351.
  • Schafer, J. L., & Graham, J. W. (2002). Missing
    data: Our view of the state of the art.
    Psychological Methods, 7, 147-177.
  • Graham, J. W., Cumsille, P. E., & Elek-Fisk, E.
    (2003). Methods for handling missing data. In
    J. A. Schinka & W. F. Velicer (Eds.), Research
    Methods in Psychology (pp. 87-114). Volume 2 of
    Handbook of Psychology (I. B. Weiner,
    Editor-in-Chief). New York: John Wiley & Sons.
  • http://mcgee.hhdev.psu.edu/publication_resources
  • email: jgraham_at_psu.edu