Multiple Imputation

Provided by: Eri76

Learn more at: http://faculty.washington.edu

1
Multiple Imputation

2
How ice() works

3
Assumptions

Missing at Random (MAR)
No getting around this one. MCAR (missing completely at random) is fine, of course.
Distinct Parameters
Does the missing data mechanism govern what data-generating parameters you can see? Example: limits of detection.
Adequate Sample Size
Hard to quantify. Regression on continuous variables doesn't take much, but other methods certainly can.
Convergence to a Posterior Distribution
Standard MI (such as Proc MI) is known to converge to a posterior distribution with enough iterations. ice() does not have this guarantee. This is typically ignored when ice() is used.
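
To make the MAR/MCAR distinction above concrete, here is a small simulation sketch in Python. It is not part of ice(); the variable names, the missingness probabilities, and the helper regression_mean are all made up for illustration. It assumes y depends linearly on an always-observed x. Under MCAR the mean of the observed y values is fine; under MAR it is off, but a model that uses x gets back close to the truth.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]   # always observed
y = [xi + rng.gauss(0, 1) for xi in x]    # sometimes missing


def mean(values):
    return sum(values) / len(values)


# MCAR: every y has the same 50% chance of being observed.
mcar = [(xi, yi) for xi, yi in zip(x, y) if rng.random() < 0.5]

# MAR: the chance of observing y depends only on the observed x.
mar = [(xi, yi) for xi, yi in zip(x, y)
       if rng.random() < (0.9 if xi < 0 else 0.1)]


def regression_mean(pairs, all_x):
    """Fit y ~ x by least squares on the observed pairs, then average
    the fitted values over every x (whether or not y was observed)."""
    mx = mean([px for px, _ in pairs])
    my = mean([py for _, py in pairs])
    slope = (sum((px - mx) * (py - my) for px, py in pairs)
             / sum((px - mx) ** 2 for px, _ in pairs))
    intercept = my - slope * mx
    return mean([intercept + slope * xi for xi in all_x])


true_mean = mean(y)
mcar_mean = mean([yi for _, yi in mcar])      # observed cases only, MCAR
mar_naive_mean = mean([yi for _, yi in mar])  # observed cases only, MAR
mar_adjusted_mean = regression_mean(mar, x)   # uses x to correct
```

Comparing mar_naive_mean with mar_adjusted_mean shows why the imputation model has to include the variables that drive the missingness.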

4
Predictive Mean Matching

We have ymis for the variable with missing information.
Previously:
Find the yobs that is closest to ymis, and fill in the missing observation's value with the true value of that yobs.
This was the default behavior in previous versions of ice().
Could be a problem: not enough variability in the imputed values.
Currently:
Find a set of yobs that are close to ymis, choose one at random, and fill in the missing observation's value with the true value of that yobs.
Invoked by using the match argument.
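
The two matching rules above can be sketched as follows. This is a minimal Python illustration, not the code ice() actually runs. The function name pmm_impute, the pool_size parameter, and the toy donor values are hypothetical. It also assumes closeness is measured between predicted values (the prediction for the missing case against the predictions for the complete cases), which is how predictive mean matching is usually described.

```python
import random


def pmm_impute(predicted_missing, donors, pool_size=1, rng=None):
    """Pick an observed value to fill in for one missing case.

    donors is a list of (predicted_value, observed_value) pairs for the
    complete cases; predicted_missing is the prediction for the case
    whose value is missing.
    """
    rng = rng or random.Random()
    # Rank complete cases by how close their prediction is to ours.
    ranked = sorted(donors, key=lambda d: abs(d[0] - predicted_missing))
    # pool_size=1 is the older "single closest match" rule;
    # pool_size>1 draws at random from the closest few.
    pool = ranked[:pool_size]
    return rng.choice(pool)[1]


# Hypothetical toy data: (predicted, observed) for five complete cases.
donors = [(10.2, 11.0), (12.1, 12.5), (12.4, 11.8), (15.0, 16.2), (18.3, 17.9)]

closest_only = pmm_impute(12.0, donors, pool_size=1)
from_pool = pmm_impute(12.0, donors, pool_size=3, rng=random.Random(1))
```

With pool_size=1 the same donor is always chosen for a given prediction, which is the variability problem noted above. A larger pool lets repeated imputations differ.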

5
Other Regression Methods

Multinomial Logistic Regression
For categorical variables, ordered or unordered
Finds a probability for each category value, then
imputes a value using those probabilities.
My advice: try to avoid using it, as I've found its results to be incorrect (biased).
Ordinal Logistic Regression
For ordered categorical variables
My advice: it seems to work well, but it needs a large (n > 1000) sample size to work.
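
The "find a probability for each category, then impute using those probabilities" step can be sketched like this. It is an illustrative Python fragment only. The function name draw_category and the example probabilities are hypothetical, and the fitted model that would produce the probabilities is not shown.

```python
import random


def draw_category(probabilities, rng):
    """Draw one category given predicted probabilities that sum to 1.

    probabilities maps each category value to its predicted probability
    for the case being imputed.
    """
    u = rng.random()
    cumulative = 0.0
    for category, p in probabilities.items():
        cumulative += p
        if u < cumulative:
            return category
    # Guard against floating-point rounding in the running sum.
    return category


# Hypothetical predicted probabilities for one case with a missing value.
probs = {1: 0.2, 2: 0.5, 3: 0.3}
rng = random.Random(0)
draws = [draw_category(probs, rng) for _ in range(10000)]
share_of_2 = draws.count(2) / len(draws)
```

The imputed value is a random draw, not simply the most likely category, so a case with these probabilities is imputed as 2 only about half the time.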

6
Useful Material: How to run ice()

Running ice, continued (1)
Call ice():
ice educ mmselast npdage npgender npnitm npceradm npbrkm brk5 brk6 npneurm using "C:\path\outfile", m(5) passive(brk5:npbrkm==5 \ brk6:npbrkm==6) substitute(npbrkm:brk5 brk6) cmd(npbrkm:mlogit, npnitm:logit)
Here's what the code pieces do:
educ through npneurm: variables to be used for imputation
using "C:\path\outfile": the result is written to outfile.dta
m(5): 5 imputed datasets
passive(brk5:npbrkm==5 \ brk6:npbrkm==6)
Stata will not impute brk5 and brk6 directly; they will be updated from the new values in npbrkm.
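
What the passive() option does can be pictured with a short sketch. It is written in Python purely as an illustration and is not how ice() is implemented. The function name update_passive and the example rows are hypothetical. The idea is that brk5 and brk6 are never modeled themselves; they are recomputed from npbrkm each time npbrkm gets new imputed values.

```python
def update_passive(rows):
    """Recompute the derived indicators from the current npbrkm values."""
    for row in rows:
        row["brk5"] = int(row["npbrkm"] == 5)
        row["brk6"] = int(row["npbrkm"] == 6)
    return rows


# Hypothetical rows just after npbrkm has been imputed.
rows = [{"npbrkm": 4}, {"npbrkm": 5}, {"npbrkm": 6}]
update_passive(rows)
```

Because the indicators are derived rather than imputed, they can never disagree with npbrkm within an imputed dataset.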

Running ice, continued (2)
Here's what the code pieces do:
substitute(npbrkm:brk5 brk6)
npbrkm won't be used to impute other variables; brk5 and brk6 will be used in its place.
cmd(npbrkm:mlogit, npnitm:logit)
npbrkm will have multinomial logistic regression.
npnitm will have logistic regression.
All other variables with missing data use default methods:
continuous: OLS
n = 2 categories: logistic regression
n > 2 categories: multinomial logistic regression

10
Results

A dataset, outfile.dta
use "C:\path\outfile.dta", clear
New variables:
_i: row number per dataset (not generally used)
_j: imputed dataset number (same as _Imputation_ from Proc MI)
Analyzing the results using micombine, an example:
xi: micombine regress mmselast npgender npnitm npceradm i.npbrkm
xi: expands interactions. Used here to break npbrkm into dummy variables for the analysis.
micombine automatically does the MI analysis, using _j to distinguish between the imputed datasets.
See its help file for a list of supported regression commands.
For some methods, SAS's MIANALYZE may be needed.
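
For a sense of what "the MI analysis" involves, here is a sketch of the standard combining step (Rubin's rules) for a single coefficient. It is Python written for illustration and not micombine's code. The function name combine_rubin and the five estimates are made up. It assumes the same model has already been fit once per imputed dataset (once per value of _j), giving one estimate and one squared standard error each time.

```python
import math


def combine_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets.

    estimates: the coefficient from each imputed dataset
    variances: its squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    # Average sampling variance within the imputed datasets.
    within = sum(variances) / m
    # Variance of the estimates between the imputed datasets.
    between = sum((e - pooled) ** 2 for e in estimates) / (m - 1)
    # Total variance inflates the between part by (1 + 1/m).
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical results from m = 5 imputed datasets.
estimates = [1.9, 2.1, 2.0, 2.2, 1.8]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled, pooled_se = combine_rubin(estimates, variances)
```

The pooled standard error is larger than any single-dataset standard error (0.2 here) because it also counts the disagreement between imputations.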