A new architecture for handling multiply imputed data in Stata

About This Presentation


Slides: 21

Provided by: roryw

Learn more at: http://repec.org


Transcript and Presenter's Notes

1
A new architecture for handling multiply imputed
data in Stata

JC Galati [1], JB Carlin [1,2], P Royston [3]
[1] Murdoch Childrens Research Institute (MCRI), Melbourne
[2] The University of Melbourne
[3] MRC Clinical Trials Unit, London

2
Missing data

Why do we need additional tools for analysing datasets with missing values?
Traditional methods are designed for complete datasets
Statistical packages discard incomplete observations when analysing an incomplete dataset,
i.e., a complete-case analysis is performed
This can lead to a loss of power and, depending on why the data went missing, to biased estimates
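To put a rough number on the loss of power: if each variable goes missing independently (an assumption made only for this illustration, and often unrealistic in practice), the expected share of complete cases is the product of the per-variable observed shares, so it falls quickly as variables are added. The hypothetical helpers below compute that share and check it by simulating listwise deletion.

```python
import math
import random

def complete_case_fraction(missing_probs):
    """Expected share of rows with no missing values, assuming each
    variable goes missing independently with the given probability."""
    return math.prod(1.0 - p for p in missing_probs)

def simulate_complete_cases(n_rows, missing_probs, seed=1):
    """Simulate a missingness pattern and count the rows that a
    complete-case (listwise-deletion) analysis would keep."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(n_rows):
        # A row is kept only if every variable is observed
        if all(rng.random() >= p for p in missing_probs):
            kept += 1
    return kept

# Five variables, each 10% missing: only about 59% of rows are complete
probs = [0.10] * 5
print(round(complete_case_fraction(probs), 5))   # 0.59049
print(simulate_complete_cases(10_000, probs))    # roughly 5900 of 10000 rows
```

With modest missingness in each variable, a complete-case analysis can still discard around 40% of the sample.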

3
Multiple imputation (MI)

Introduced by Donald Rubin (1987 book, Wiley)
Based on Bayesian principles
Both the data-generating mechanism and the
missingness mechanism are modelled
Fairly broad assumptions about data-generating
model
Fairly restrictive assumptions about missingness
mechanism
Modelling assumptions apply at the imputation
level
Statistical modelling is general (once the data are imputed)
Post-estimation: some more work needs to be done
Diagnostics: theory and practice not yet worked out
Model-building: in its infancy, but work has started

4
MI data analysis

Start with a dataset with some values missing
Missing values are imputed multiple times
Using a Bayesianly proper imputation method
This creates m sets of completed data
Each completed dataset is analysed separately
Standard complete-data estimation methods are used, e.g., linear regression, logistic regression
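The workflow above can be sketched in Python for the simplest possible case: a single normally distributed variable with values missing completely at random, imputed under a normal model with the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2. Each imputation first draws (sigma^2, mu) from their posterior and only then draws the missing values; drawing the parameters is what makes the method "proper" rather than merely stochastic. The model and the function names are illustrative assumptions, not the procedure used by any of the Stata commands discussed here.

```python
import random
import statistics

def impute_normal_once(values, rng):
    """Return one completed copy of `values` (None = missing), using a
    Bayesianly proper draw under a normal model with prior p(mu, s2) ~ 1/s2."""
    observed = [v for v in values if v is not None]
    n = len(observed)
    ybar = statistics.fmean(observed)
    s2 = statistics.variance(observed)          # sample variance, n - 1 denominator
    # sigma^2 | y ~ scaled inverse chi-square(n - 1, s2)
    chi2 = rng.gammavariate((n - 1) / 2, 2.0)   # chi-square draw with n - 1 d.f.
    sigma2 = (n - 1) * s2 / chi2
    # mu | sigma^2, y ~ N(ybar, sigma^2 / n)
    mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
    # Draw each missing value from N(mu, sigma^2); keep observed values unchanged
    return [v if v is not None else rng.gauss(mu, sigma2 ** 0.5) for v in values]

def analyse(completed):
    """Complete-data analysis: estimate the mean and its sampling variance."""
    n = len(completed)
    return statistics.fmean(completed), statistics.variance(completed) / n

def multiply_impute_and_analyse(values, m=5, seed=2008):
    """Create m completed datasets and analyse each one separately."""
    rng = random.Random(seed)
    completed_sets = [impute_normal_once(values, rng) for _ in range(m)]
    return [analyse(c) for c in completed_sets]

data = [4.1, None, 5.3, 4.8, None, 6.0, 5.1, None, 4.4, 5.7]
for est, var in multiply_impute_and_analyse(data, m=5):
    print(f"estimate={est:.3f}  variance={var:.4f}")
```

The m (estimate, variance) pairs printed at the end are exactly the inputs that the combining step on the next slide works with.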

5
Inference (estimation) using MI

Coefficient estimates and variances (squared SEs) from the complete-data analyses are combined using Rubin's rules
Parameter estimate: average of the complete-data parameter estimates
Variance: the sum of two components
Within-imputation variance (average of the complete-data variances)
Between-imputation variance (variance of the complete-data parameter estimates across imputations), inflated by the factor 1 + 1/m
(Estimate - parameter)/SE has an approximate t distribution
Estimate the d.f. and use t-multipliers to get confidence intervals
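Rubin's rules for a single scalar parameter fit in a few lines. A minimal Python sketch, assuming scalar estimates Q_i with variances U_i from m imputations: Q_bar is their mean, U_bar the mean of the U_i, B the sample variance of the Q_i, the total variance is T = U_bar + (1 + 1/m)B, and the degrees of freedom are (m - 1)(1 + U_bar/((1 + 1/m)B))^2, following Rubin (1987). The Python standard library has no t quantile function, so the hypothetical interval helper takes the t-multiplier as an argument; with SciPy installed, `scipy.stats.t.ppf(0.975, df)` would supply it.

```python
import math
import statistics

def rubin_combine(estimates, variances):
    """Pool m complete-data results for one scalar parameter (Rubin, 1987).

    estimates: per-imputation point estimates Q_1, ..., Q_m
    variances: per-imputation variances U_1, ..., U_m (squared SEs)
    Returns (q_bar, total_var, df).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates, each with a variance")
    q_bar = statistics.fmean(estimates)      # combined estimate: average of the Q_i
    u_bar = statistics.fmean(variances)      # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance, m - 1 denominator
    total_var = u_bar + (1 + 1 / m) * b      # T = U_bar + (1 + 1/m) B
    if b == 0:
        df = math.inf                        # no between-imputation spread at all
    else:
        r = (1 + 1 / m) * b / u_bar          # relative increase in variance due to nonresponse
        df = (m - 1) * (1 + 1 / r) ** 2      # Rubin's (1987) degrees of freedom
    return q_bar, total_var, df

def confidence_interval(q_bar, total_var, t_multiplier):
    """Interval q_bar +/- t * sqrt(T); t_multiplier is the t quantile for df."""
    half_width = t_multiplier * math.sqrt(total_var)
    return q_bar - half_width, q_bar + half_width

q_bar, total_var, df = rubin_combine([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(round(q_bar, 3), round(total_var, 4), round(df, 2))   # 1.0 0.07 21.78
t_stat = q_bar / math.sqrt(total_var)   # t statistic for H0: parameter = 0
```

For a 95% interval in this example, a multiplier of about 2.07 (the 97.5% t quantile for roughly 22 d.f.) would be passed to `confidence_interval`. Note that the between-imputation variance here (0.025) adds 0.03 to the within-imputation variance of 0.04.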

6
Background (MI in Stata)

What is available in Stata currently?
MI Tools, Carlin et al., Stata J. 2003
Imputed datasets stored in separate dta files
myfile1.dta, ... , myfilem.dta
Estimation
mifit with
regress, logit, probit, clogit, glm,
logistic, poisson, svyreg, svylogit,
svyprobit, svypoisson, xtgee, xtreg
Post-estimation
milincom, mitestparm
Data manipulation
miset, miappend, mimerge, mido, misave

7
Background (MI in Stata)

Main drawbacks of MI Tools
Loose association between original and imputed
data
Loose association between individual imputed
datasets
Limited range of estimation commands supported (13)
Some coding choices resulted in slow execution in some cases
No capacity to perform imputation

8
Background (MI in Stata)

What is available in Stata currently? (cont.)
ice, micombine, Royston Stata J. 2004/05
ice stores the imputed datasets in a single .dta file, using impid and obsid variables
Estimation
micombine with
clogit, cnreg, glm, logistic, logit,
poisson, probit, qreg, regress, rreg,
xtgee, streg, stcox, ologit, oprobit, mlogit
Post-estimation
results returned in e(b), e(V), etc.
the onus is on the user to know when a post-estimation command applied directly to the combined estimates is valid
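Linear combinations, as computed by Stata's lincom, are one case where post-estimation on the combined e(b) and e(V) is valid: Rubin's rules average the estimates and within-imputation variances, and the between-imputation variance of c'Q is c'Bc, so applying the combination to the combined vector and covariance matrix gives the same estimate and total variance as combining per-imputation results. The Python sketch below checks this numerically on made-up numbers. All function names are hypothetical, and the argument does not extend to nonlinear post-estimation.

```python
import statistics

def combine_scalar(estimates, variances):
    """Rubin's rules for one scalar: returns (combined estimate, total variance)."""
    m = len(estimates)
    return (statistics.fmean(estimates),
            statistics.fmean(variances) + (1 + 1 / m) * statistics.variance(estimates))

def combine_vector(estimates, covariances):
    """Multivariate Rubin's rules: m coefficient vectors and m covariance matrices
    (lists of lists) -> combined vector Q_bar and total covariance T = U_bar + (1 + 1/m) B."""
    m, k = len(estimates), len(estimates[0])
    q_bar = [statistics.fmean(e[j] for e in estimates) for j in range(k)]
    u_bar = [[statistics.fmean(v[i][j] for v in covariances) for j in range(k)]
             for i in range(k)]
    # Between-imputation covariance matrix, m - 1 denominator
    b = [[sum((e[i] - q_bar[i]) * (e[j] - q_bar[j]) for e in estimates) / (m - 1)
          for j in range(k)] for i in range(k)]
    total = [[u_bar[i][j] + (1 + 1 / m) * b[i][j] for j in range(k)] for i in range(k)]
    return q_bar, total

def lincom(c, coef, cov):
    """Estimate and variance of the linear combination c'coef."""
    k = len(c)
    est = sum(c[i] * coef[i] for i in range(k))
    var = sum(c[i] * cov[i][j] * c[j] for i in range(k) for j in range(k))
    return est, var

# Made-up results from m = 3 imputations of a two-coefficient model
estimates = [[1.0, 0.5], [1.2, 0.4], [0.9, 0.6]]
covariances = [[[0.040, 0.010], [0.010, 0.020]],
               [[0.050, 0.012], [0.012, 0.025]],
               [[0.045, 0.011], [0.011, 0.022]]]
c = [1.0, -1.0]   # difference between the two coefficients

# Route A: linear combination of the combined coefficient vector and covariance matrix
route_a = lincom(c, *combine_vector(estimates, covariances))
# Route B: linear combination within each imputation, then Rubin's rules
per_imp = [lincom(c, e, v) for e, v in zip(estimates, covariances)]
route_b = combine_scalar([p[0] for p in per_imp], [p[1] for p in per_imp])
print(route_a, route_b)   # equal up to floating-point rounding
```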

9
Background (MI in Stata)

ice, micombine, Royston Stata J. 2004/05 (cont.)
Data manipulation
left to the user, but the stacked format facilitates simple transformations of variables, etc.
mijoin, misplit (for conversion between formats)
Main drawbacks
Limited range of estimation commands supported (16)
Manipulations that change the number of observations in each dataset (e.g., reshape) are not easily supported
Not clear when/if post-estimation is valid
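The two storage layouts on these slides, one file per imputation (MI Tools) versus a single stacked file indexed by impid and obsid (ice), can be sketched with plain Python lists of row dictionaries. `stack` and `split` play roles similar to mijoin and misplit, but they are hypothetical helpers, not ports of those commands. The third function shows why a reshape-like step is awkward in stacked form: because it changes the number of rows per imputation, it has to be applied to each imputation separately, and obsid must be renumbered afterwards.

```python
def stack(datasets):
    """Join m completed datasets (lists of row dicts) into one stacked list,
    tagging each row with an imputation id and an observation id (both from 1)."""
    stacked = []
    for imp, rows in enumerate(datasets, start=1):
        for obs, row in enumerate(rows, start=1):
            stacked.append({"impid": imp, "obsid": obs, **row})
    return stacked

def split(stacked):
    """Inverse of stack: regroup rows by impid, order by obsid, drop the ids."""
    groups = {}
    for row in stacked:
        groups.setdefault(row["impid"], []).append(row)
    return [
        [{k: v for k, v in r.items() if k not in ("impid", "obsid")}
         for r in sorted(groups[imp], key=lambda r: r["obsid"])]
        for imp in sorted(groups)
    ]

def apply_per_imputation(stacked, transform):
    """Apply a row-count-changing transform to each imputation separately,
    then re-stack so that obsid is renumbered within each imputation."""
    return stack([transform(rows) for rows in split(stacked)])

def to_long(rows):
    """Reshape-like step: one row per subject and time point."""
    return [{"id": r["id"], "time": t, "y": r[f"y{t}"]} for r in rows for t in (1, 2)]

datasets = [
    [{"id": 1, "y1": 2.0, "y2": 3.0}, {"id": 2, "y1": 1.5, "y2": 2.5}],
    [{"id": 1, "y1": 2.2, "y2": 3.1}, {"id": 2, "y1": 1.4, "y2": 2.7}],
]
long_stacked = apply_per_imputation(stack(datasets), to_long)
print(len(long_stacked))   # 8 rows: 2 imputations x 2 subjects x 2 time points
print(long_stacked[0])     # {'impid': 1, 'obsid': 1, 'id': 1, 'time': 1, 'y': 2.0}
```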

10
mim: A new architecture