Multiple Imputation - PowerPoint PPT Presentation

Provided by: Eri76
1
Multiple Imputation
  • Stata (ice)
  • How and when to use it.

2
How ice() works
  • Each variable with missing data is the subject of
    a regression.
  • Typically all other variables are used as
    predictors
  • Estimate β, σ via the regression
  • Draw σ from its posterior distribution
    (non-informative prior)
  • Draw β from its posterior distribution
    (non-informative prior)
  • Find predicted values Y = Xβ, then either:
  • Keep Y for the missing values (default option), or
  • Predictive Mean Matching
  • Move on to the next variable, using the
    newly-predicted values
  • Cycle through the variables a number of times (10
    is default)
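
As a hedged sketch, the two posterior draws in the steps above can be written out for a linear regression under a non-informative prior. The notation is an assumption and does not come from the slides: n is the number of complete cases used in the regression, k the number of coefficients, V̂ the estimated covariance matrix of the estimated β, and V̂^{1/2} its Cholesky factor. This is the usual textbook form, not a transcription of the ice source code.

```latex
\begin{aligned}
\sigma^{*} &= \hat{\sigma}\,\sqrt{(n-k)/g}, & g &\sim \chi^{2}_{n-k} \\
\beta^{*}  &= \hat{\beta} + \frac{\sigma^{*}}{\hat{\sigma}}\,\hat{V}^{1/2} z, & z &\sim N(0, I_{k}) \\
\hat{Y}    &= X\beta^{*}
\end{aligned}
```

Because σ* and β* are redrawn for every imputed dataset, the imputations reflect uncertainty in the regression itself and not only the scatter around a single fitted line.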

3
Assumptions
  • Missing at Random
  • No getting around this one. MCAR is fine, of
    course.
  • Distinct Parameters
  • Does the missing data mechanism govern what
    data-generating parameters you can see? Example:
    limits of detection.
  • Adequate Sample Size
  • Hard to quantify. Regression on continuous
    variables doesn't take much, but other methods
    certainly can.
  • Convergence to a Posterior Distribution
  • Standard MI (such as Proc MI) is known to
    converge to a posterior distribution with enough
    iterations. Ice() does not have this guarantee.
    This is typically ignored when ice() is used.

4
Predictive Mean Matching
  • We have a predicted value, ymis, for each missing
    observation of the variable
  • Previously:
  • Find the yobs that is closest to ymis; fill in
    the missing observation's value with the true
    value of that yobs
  • Was the default behavior for previous versions of
    ice()
  • Could be a problem: not enough variability.
  • Currently:
  • Find a set of yobs that are close to ymis, choose
    one randomly, and fill in the missing observation's
    value with the true value of that yobs
  • Invoked by using the match argument
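
The current matching rule can be sketched by hand for a single variable. This is a minimal illustration and not the ice implementation. Everything specific in it is an assumption: it uses Stata's built-in auto dataset, treats rep78 as the variable with missing values, uses k = 3 donors, and skips the posterior draw of β for brevity. It is meant to be run as a do-file.

```stata
* Sketch of predictive mean matching (PMM) for one variable.
* Assumptions (not from the slides): built-in auto data, rep78 as the
* target variable, k = 3 donors, no posterior draw of beta.
sysuse auto, clear
set seed 12345

generate long id = _n
* Regression is fitted on the complete cases only
regress rep78 mpg weight price
* Predicted mean for every car, including those with rep78 missing
predict double yhat, xb

* Copy of the target that will receive the imputed values
generate double rep78_pmm = rep78
generate double dist = .
local k = 3

levelsof id if missing(rep78), local(missing_ids)
foreach i of local missing_ids {
    quietly summarize yhat if id == `i'
    local target = r(mean)
    * Distance from this prediction to each observed case's prediction
    quietly replace dist = abs(yhat - `target') if !missing(rep78)
    * Missing distances sort last, so rows 1..k are the nearest donors
    sort dist
    * Pick one of the k nearest donors at random
    local pick = ceil(`k' * runiform())
    local donor = rep78[`pick']
    quietly replace rep78_pmm = `donor' if id == `i'
}
sort id
list id rep78 yhat rep78_pmm if missing(rep78)
```

Setting k to 1 reproduces the older behaviour described above, where the single closest observed value is always used. A larger k adds variability between imputed datasets.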

5
Other Regression Methods
  • Multinomial Logistic Regression
  • For categorical variables, ordered or unordered
  • Finds a probability for each category value, then
    imputes a value using those probabilities.
  • My advice: try to avoid using it, as I've found
    its results to be incorrect (biased)
  • Ordinal Logistic Regression
  • For ordered categorical variables
  • My advice: it seems to work well, but it needs a
    large (n > 1000) sample size to work
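
The "impute a value using those probabilities" step amounts to one random draw per observation. The sketch below uses made-up constant probabilities for categories 4, 5, and 6, and the variable names are hypothetical. In a real imputation the probabilities would come from the fitted multinomial or ordinal model and would differ from row to row.

```stata
* Sketch: impute a 3-category value from category probabilities.
* The probabilities are hypothetical constants, not model output.
clear
set obs 1000
set seed 2024
generate double p4 = 0.2
generate double p5 = 0.5
* p6 is implied by the other two; it is kept only for readability
generate double p6 = 0.3
generate double u = runiform()
* Walk the cumulative probabilities to turn u into category 4, 5, or 6
generate byte npbrkm_draw = cond(u < p4, 4, cond(u < p4 + p5, 5, 6))
tabulate npbrkm_draw
```

The tabulated shares should sit near the probabilities used. Sparse categories make those probabilities unstable, which is one plausible reason these methods need larger samples.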

6
Useful Material: How to run ice()
  • Getting the program
  • Help -> Search -> Search all: ice imputation
  • Click on st_0067_2 (www.stata-journal.com)
  • Click "click here to install"
  • This gets you ice and micombine, as well as a few
    other commands

7
  • Running ice
  • Have the dataset open
  • insheet using "C:\path\example.csv", clear
  • Four variables with missing information
  • npnitm: binary variable
  • npceradm, npneurm: continuous variables
  • npbrkm: 3-category ordered variable
  • Four variables with complete data
  • We need to make dummy variables for categorical
    variables
  • recode npbrkm (4=0) (5=1) (6=0) (.=.),
    generate(brk5)
  • recode npbrkm (4=0) (5=0) (6=1) (.=.),
    generate(brk6)
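
To see what the two recode commands produce, here is a small self-contained check on a toy dataset. The four rows are invented for illustration and are not the presenter's data. Category 4 is the reference level, so it is 0 on both dummies, and a missing npbrkm stays missing on both.

```stata
* Toy check of the dummy-variable recodes (hypothetical 4-row dataset).
clear
input npbrkm
4
5
6
.
end
* brk5 = 1 only when npbrkm is 5; missing stays missing
recode npbrkm (4=0) (5=1) (6=0) (.=.), generate(brk5)
* brk6 = 1 only when npbrkm is 6; missing stays missing
recode npbrkm (4=0) (5=0) (6=1) (.=.), generate(brk6)
list npbrkm brk5 brk6
```

Keeping the missing values missing matters here: brk5 and brk6 have to be filled in from the imputed npbrkm and not treated as observed zeros.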

8
  • Running ice, continued (1)
  • Call ice()
  • ice educ mmselast npdage npgender npnitm npceradm
    npbrkm brk5 brk6 npneurm using "C:\path\outfile",
    m(5) passive(brk5:npbrkm==5 \ brk6:npbrkm==6)
    substitute(npbrkm:brk5 brk6) cmd(npbrkm:mlogit,
    npnitm:logit)
  • Here's what the code pieces do
  • educ ... npneurm: variables to be used for
    imputation
  • using "C:\path\outfile": the result, outfile.dta
  • m(5): 5 imputed datasets
  • passive(brk5:npbrkm==5 \ brk6:npbrkm==6)
  • Stata will not impute for brk5 and brk6; they
    will be updated from the new values in npbrkm

9
  • Running ice, continued (2)
  • Here's what the code pieces do
  • substitute(npbrkm:brk5 brk6)
  • npbrkm won't be used to impute other variables;
    brk5 and brk6 will be used in its place
  • cmd(npbrkm:mlogit, npnitm:logit)
  • npbrkm will have multinomial logistic regression
  • npnitm will have logistic regression
  • all other variables with missing data use default
    methods
  • continuous: OLS
  • n = 2 categories: Logistic Regression
  • n > 2 categories: Multinomial Logistic Regression

10
Results
  • A dataset, outfile.dta
  • use "C:\path\outfile.dta", clear
  • New variables
  • _i: row number per dataset (not generally used)
  • _j: imputed dataset number (same as _Imputation_
    from Proc MI)
  • Analyzing the results using micombine, an example
  • xi: micombine regress mmselast npgender npnitm
    npceradm i.npbrkm
  • xi: expands interactions. Used to break npbrkm
    into dummy variables for the analysis
  • micombine automatically does the MI analysis,
    using _j to distinguish between the imputed
    datasets
  • See its help file for a list of supported
    regression commands
  • For some methods, SAS's MIANALYZE may be needed
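
For reference, the "MI analysis" that combines the per-dataset results is conventionally done with Rubin's rules. The notation below is an assumption and does not come from the slides: m is the number of imputed datasets (5 in this example), Q̂_j is a coefficient estimated in dataset j, and U_j is its squared standard error.

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{j=1}^{m}\hat{Q}_{j}
  && \text{(combined estimate)} \\
W &= \frac{1}{m}\sum_{j=1}^{m}U_{j}
  && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{Q}_{j}-\bar{Q}\bigr)^{2}
  && \text{(between-imputation variance)} \\
T &= W + \Bigl(1+\frac{1}{m}\Bigr)B
  && \text{(total variance)}
\end{aligned}
```

The combined standard error is the square root of T. Because T includes B, it is larger than the standard error from any single imputed dataset whenever the estimates differ across datasets.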

11
The end.
  • Questions?