Missing data issues and extensions

Slides: 9

Provided by: golds97

Transcript and Presenter's Notes

1
Missing data issues and extensions

For multilevel data we need to impute missing
data for variables defined at higher levels
We need to have a valid procedure for discrete
variables
Useful to include sampling weights
Can we deal with partially missing data?

2
Consider the imputation stage with a set of
multivariate responses

We illustrate first with a simple model where the
response joint distribution is MVN and there are
responses at 2 levels
To illustrate how such a model is specified,
consider repeated measures of children's heights (level 1),
where the level 2 response is the child's adult height.

3
Child heights and adult height

Child height is modelled as a cubic polynomial with intercept
and slope random at level 2, both correlated with the adult
height random effect, giving a 3-variate normal distribution
at level 2.
This allows us to model level 1 and level 2 variables with
missing data jointly (see Goldstein and Kounali, JRSSA, 2009).
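
The model described above might be written as follows. This is a sketch only: the age variable t_ij, the coefficients, the random effects u_0j, u_1j, u_2j and the covariance matrix Omega_u are notation assumed here for illustration, not taken from the slide.

```latex
\begin{aligned}
y^{(1)}_{ij} &= \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3
               + u_{0j} + u_{1j} t_{ij} + e_{ij},
               \qquad e_{ij} \sim N(0, \sigma_e^2) \\
y^{(2)}_{j}  &= \gamma_0 + u_{2j} \\
(u_{0j},\, u_{1j},\, u_{2j})^{\top} &\sim N_3(0, \Omega_u)
\end{aligned}
```

Here y^(1) is the repeated child height at level 1 and y^(2) is adult height at level 2; the off-diagonal terms of Omega_u carry the correlation between the growth curve and adult height.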

4
Results

Thus, if data are missing at either level 1 or
level 2, they are imputed via the MCMC
algorithm.
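
The core of such an imputation step can be sketched as a draw from a conditional normal distribution. The example below is a minimal illustration, assuming the current MCMC values of the mean vector and covariance matrix are already available; the function names `conditional_mvn` and `impute_once` are hypothetical and are not part of MLwiN or REALCOM.

```python
import numpy as np


def conditional_mvn(mu, sigma, y):
    """Conditional distribution of the missing entries of y (coded as NaN).

    Returns the indices of the missing entries together with their
    conditional mean and covariance given the observed entries.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    obs = ~miss
    s_mo = sigma[np.ix_(miss, obs)]
    s_oo = sigma[np.ix_(obs, obs)]
    s_mm = sigma[np.ix_(miss, miss)]
    # Regression coefficients of missing on observed: S_mo S_oo^{-1}
    b = np.linalg.solve(s_oo, s_mo.T).T
    cond_mean = mu[miss] + b @ (y[obs] - mu[obs])
    cond_cov = s_mm - b @ s_mo.T
    return np.flatnonzero(miss), cond_mean, cond_cov


def impute_once(mu, sigma, y, rng):
    """Fill the missing entries of y with one draw from their conditional MVN."""
    idx, cond_mean, cond_cov = conditional_mvn(mu, sigma, y)
    filled = np.array(y, dtype=float)
    if idx.size:
        filled[idx] = rng.multivariate_normal(cond_mean, cond_cov)
    return filled
```

In a full sampler this draw would be repeated at every iteration, alternating with updates of the parameters, so that the imputed values reflect parameter uncertainty as well.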

5
Mixed response types

For ordered or unordered categorical data we can
specify corresponding latent normal
distributions.
For an ordered response we can consider a probit
threshold model, defined through
the cumulative probability of being in one of the
categories 1, ..., s
and an associated latent normal model.

For a p-category unordered response we can
define a latent (p-1)-variate normal.
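
In one standard notation, assumed here rather than taken from the slides, these models can be sketched as follows. The thresholds alpha_s (with alpha_0 = -infinity and alpha_p = +infinity), the linear predictor (X beta)_i, the standard normal density phi, and the use of category p as the reference category are all illustrative choices.

```latex
\begin{aligned}
\gamma_i^{(s)} = \Pr(y_i \le s)
  &= \int_{-\infty}^{\alpha_s - (X\beta)_i} \phi(t)\,dt,
  \qquad s = 1, \dots, p-1 \\
z_i &= (X\beta)_i + e_i, \qquad e_i \sim N(0, 1), \qquad
  y_i = s \iff \alpha_{s-1} < z_i \le \alpha_s \\
\text{unordered:}\quad
(z_{i1}, \dots, z_{i,p-1})^{\top} &\sim N_{p-1}(\mu_i, \Omega), \qquad
  y_i =
  \begin{cases}
    k & \text{if } z_{ik} = \max_h z_{ih} \text{ and } z_{ik} > 0 \\
    p & \text{if } z_{ih} \le 0 \text{ for all } h
  \end{cases}
\end{aligned}
```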

We can define MCMC steps to sample, from the observed
categorical responses, an underlying normal or
MVN variable. Note that these are further conditioned on
the remaining set of (correlated) normal
variables. For details see Goldstein, H., Carpenter, J.,
Kenward, M. and Levin, K. (2009), Multilevel models with
multivariate mixed response types, Statistical Modelling
(to appear).
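
For an ordered response, the step that samples an underlying normal from an observed category can be sketched as a truncated normal draw by the inverse-CDF method. This is an illustration under assumptions, not the REALCOM implementation: `sample_latent_ordered` is a hypothetical name, and `mean` and `sd` stand for the conditional mean and standard deviation of the latent variable given the other (correlated) normal variables, which are taken as already computed.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_latent_ordered(category, thresholds, mean, sd, rng):
    """Draw z ~ N(mean, sd^2) restricted to the interval implied by `category`.

    thresholds: increasing cut points alpha_1 < ... < alpha_{p-1}.
    category is 1-based: category c means alpha_{c-1} < z <= alpha_c.
    """
    cuts = [-math.inf, *thresholds, math.inf]
    lower, upper = cuts[category - 1], cuts[category]
    # Move the interval end points to the probability scale.
    p_lo = STD_NORMAL.cdf((lower - mean) / sd)
    p_hi = STD_NORMAL.cdf((upper - mean) / sd)
    u = rng.uniform(p_lo, p_hi)
    # inv_cdf needs a value strictly inside (0, 1).
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + sd * STD_NORMAL.inv_cdf(u)
```

For example, `sample_latent_ordered(2, [0.0, 1.0], 0.3, 1.0, random.Random(1))` returns a value between the two thresholds, because category 2 corresponds to the interval (0, 1].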

6
Imputation

So now, with any mixture of categorical and normal
variables at any level, we sample, for each MCMC
iteration, an MVN set of variables including
imputed values.
Imputation is thus standard, and the reverse
transformation is used to obtain the imputed
variables on the original categorical scales.
For non-normal continuous data we can use e.g. a
Box-Cox normalising transformation to sample a
latent normal. Further extensions for Poisson and
other discrete distributions are also available.
Release 2.10 of MLwiN has a link to REALCOM that
allows these extensions.
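
The reverse transformations mentioned above can be sketched as follows. This is a minimal illustration that assumes the thresholds and the Box-Cox parameter `lam` have already been estimated; the function names are hypothetical and do not correspond to MLwiN or REALCOM commands.

```python
import bisect
import math


def latent_to_category(z, thresholds):
    """Map a latent normal value back to a 1-based ordered category.

    bisect_left counts the thresholds strictly below z, so a value equal
    to alpha_c stays in category c (alpha_{c-1} < z <= alpha_c).
    """
    return bisect.bisect_left(thresholds, z) + 1


def box_cox(y, lam):
    """Box-Cox transformation of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Return an imputed latent value to the original continuous scale."""
    if lam == 0:
        return math.exp(z)
    base = lam * z + 1.0
    if base <= 0:
        raise ValueError("z lies outside the range of the transformation")
    return base ** (1.0 / lam)
```

An imputed latent value is passed through `latent_to_category` or `inverse_box_cox` once per completed data set, so the analyst only ever sees values on the original scales.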

7
Partially observed (coarsened) data

Where we have a prior (estimated) probability
distribution (PD) for a missing discrete (or
continuous) variable value, we simply insert an
extra MCMC step that accepts the standard MI
value with a probability that is just the
probability given by the PD. A corresponding step
is used for normal data.
This uses all of the data efficiently: no
data are discarded so long as it is possible to
assign a PD.
Applications include record matching and rating scales
with uncertain responses, among others.
Several completed data sets are produced and
combined as in standard MI.
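
The extra acceptance step for a discrete variable can be sketched as below. Two points are assumptions made for this illustration: the standard MI draw is represented by a vector of category probabilities `model_probs`, and a rejected candidate triggers a fresh draw (the slides do not say whether the value is redrawn or the previous one retained). The name `draw_with_prior` is hypothetical.

```python
import random


def draw_with_prior(model_probs, prior_pd, rng, max_tries=10_000):
    """Draw a category from the imputation model, then accept it with
    probability prior_pd[category]; repeat until a draw is accepted.

    model_probs: category probabilities from the standard MI step.
    prior_pd: prior probability assigned to each category for this value.
    """
    categories = list(range(len(model_probs)))
    for _ in range(max_tries):
        candidate = rng.choices(categories, weights=model_probs)[0]
        if rng.random() < prior_pd[candidate]:
            return candidate
    raise RuntimeError("no candidate accepted; check that the PD overlaps the model")
```

Under this redraw scheme the accepted values follow a distribution proportional to the product of the model probability and the PD, so a category that the PD rules out is never imputed.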

8
Sampling weights - briefly

Consider a 2-level model with
level 2 weights,
level 1 weights for the j-th level 2 unit, and
final level 1 weights.
We use a function of the final level 1 weights as the level 1 random part
explanatory variable instead of the constant 1.
This will be used for imputation and for the model of interest (MOI).
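
One common way to build such weights is sketched below. The scaling conventions are assumptions made for illustration: level 2 weights are scaled to sum to the number of level 2 units, level 1 weights are scaled within each level 2 unit to sum to its size, the final weights are their product rescaled to sum to the total sample size, and the random part explanatory variable is the inverse square root of the final weight. The function name `composite_weights` is hypothetical.

```python
import numpy as np


def composite_weights(level2_weights, level1_weights, unit_of):
    """Scaled weights and the level 1 random part explanatory variable.

    level2_weights: raw weight for each level 2 unit (length J).
    level1_weights: raw weight for each level 1 unit (length N).
    unit_of: index (0..J-1) of the level 2 unit of each level 1 unit.
    """
    w2 = np.asarray(level2_weights, dtype=float)
    w1 = np.asarray(level1_weights, dtype=float)
    unit_of = np.asarray(unit_of)
    n_units = w2.size
    # Level 2 weights scaled to sum to J.
    w2 = w2 * n_units / w2.sum()
    # Level 1 weights scaled within each level 2 unit to sum to n_j.
    sums = np.bincount(unit_of, weights=w1, minlength=n_units)
    counts = np.bincount(unit_of, minlength=n_units)
    w1_within = w1 * counts[unit_of] / sums[unit_of]
    # Final level 1 weights, scaled to sum to N.
    raw = w1_within * w2[unit_of]
    w_final = raw * w1.size / raw.sum()
    # Assumed form of the explanatory variable replacing the constant 1.
    z_level1 = w_final ** -0.5
    return w2, w1_within, w_final, z_level1
```

With equal weights throughout, `z_level1` is 1 for every unit, so the sketch reduces to the usual unweighted random part.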

Ongoing work to incorporate this into
MLwiN-REALCOM
