REALCOM - PowerPoint PPT Presentation


About This Presentation

Title:

REALCOM

Description:

Markov Chain Monte Carlo: a quick introduction ... starting values, samples a new set of parameters at each cycle of a Markov chain ... – PowerPoint PPT presentation

Slides: 32

Provided by: golds97

Transcript and Presenter's Notes

Title: REALCOM

1
REALCOM

Multilevel models for realistically complex data

An ESRC research project at Bristol University
Methodology and examples for:
Measurement errors
Multilevel structural equations
Multivariate responses at several levels and of different types
2
(No Transcript)
3
General Format

MATLAB software
Free standing executable programs
ASCII and worksheet input and output
Graphical menu based input specification
Model equation display
Monitoring of MCMC chains
A training manual containing:
  An outline of the methodology
  Worked-through examples

4
Markov Chain Monte Carlo: a quick introduction

A Bayesian simulation-based method that, given starting values, samples a new set of parameters at each cycle of a Markov chain.
This yields a final chain (after discarding a burn-in set) of, say, 5000 sets of values from the (joint) posterior distribution of the parameters.
The posterior is formed by combining the likelihood based on the data with a prior distribution, which is typically diffuse.
These chains are used for inference: e.g. the mean of a parameter's chain is analogous to the point estimate from a likelihood analysis, and interval estimates can be read from the chain's quantiles.
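The steps above can be sketched with a random-walk Metropolis sampler. This is one of several MCMC schemes, and the slides do not say which sampler each REALCOM routine uses. The toy problem is an assumption: it samples the mean of Normal data with known variance 1 under a flat prior, discards a burn-in of 500 iterations and keeps 5000 draws. The step size and seeds are arbitrary choices.

```python
import math
import random
import statistics

def log_posterior(mu, data, sigma=1.0):
    # Flat (diffuse) prior on mu, so the log posterior is the Normal
    # log-likelihood up to an additive constant.
    return -sum((y - mu) ** 2 for y in data) / (2 * sigma ** 2)

def metropolis(data, start=0.0, n_burn=500, n_keep=5000, step=0.2, seed=1):
    rng = random.Random(seed)
    mu, current = start, log_posterior(start, data)
    chain = []
    for it in range(n_burn + n_keep):
        proposal = mu + rng.gauss(0.0, step)             # random-walk proposal
        cand = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < cand - current:
            mu, current = proposal, cand
        if it >= n_burn:                                 # keep only post-burn-in draws
            chain.append(mu)
    return chain

data_rng = random.Random(42)
data = [data_rng.gauss(3.0, 1.0) for _ in range(100)]   # simulated data, true mean 3
chain = metropolis(data)
cuts = statistics.quantiles(chain, n=40)                # cut points at 2.5%, 5%, ..., 97.5%
print("posterior mean:", round(statistics.mean(chain), 3))
print("95% interval:", round(cuts[0], 3), "to", round(cuts[-1], 3))
```

With a flat prior the posterior here is Normal around the sample mean, so the chain mean should sit close to `statistics.mean(data)`.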

5

Consider the simple 2-level model
The parameters in this model are the fixed coefficients, the two variances and the level-2 residuals.
From suitable starting values the chain eventually settles down, so that sampling is from the true posterior distribution; we then need to sample enough iterations to provide stable estimates, judged by suitable convergence criteria.
All the MATLAB routines use MCMC sampling.
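As an illustration only, here is a Gibbs sampler for the simplest version of such a model. The model is assumed to be y_ij = β0 + u_j + e_ij, with u_j ~ N(0, σ_u²), e_ij ~ N(0, σ_e²) and no explanatory variables, because the slide's own equation is not recoverable. The sampler draws, in turn, the fixed coefficient, the level-2 residuals and the two variances from their full conditionals. It assumes a flat prior on β0 and diffuse inverse-gamma(0.001, 0.001) priors on the variances. It is not the REALCOM MATLAB code.

```python
import random
import statistics

def gibbs_two_level(groups, n_burn=300, n_keep=1000, a=0.001, b=0.001, seed=7):
    """groups: list of lists, one list of responses y_ij per level-2 unit j."""
    rng = random.Random(seed)
    J = len(groups)
    N = sum(len(g) for g in groups)
    beta0 = statistics.mean(y for g in groups for y in g)   # starting values
    u = [0.0] * J
    var_u, var_e = 1.0, 1.0
    draws = []
    for it in range(n_burn + n_keep):
        # 1. Fixed coefficient (flat prior): Normal around the mean of y_ij - u_j.
        resid_mean = sum(y - u[j] for j, g in enumerate(groups) for y in g) / N
        beta0 = rng.gauss(resid_mean, (var_e / N) ** 0.5)
        # 2. Level-2 residuals: each u_j has a Normal full conditional.
        for j, g in enumerate(groups):
            prec = len(g) / var_e + 1.0 / var_u
            mean = (sum(y - beta0 for y in g) / var_e) / prec
            u[j] = rng.gauss(mean, prec ** -0.5)
        # 3. Variances (inverse-gamma priors): draw the precision from a Gamma.
        ss_u = sum(uj ** 2 for uj in u)
        var_u = 1.0 / rng.gammavariate(a + J / 2, 1.0 / (b + ss_u / 2))
        ss_e = sum((y - beta0 - u[j]) ** 2 for j, g in enumerate(groups) for y in g)
        var_e = 1.0 / rng.gammavariate(a + N / 2, 1.0 / (b + ss_e / 2))
        if it >= n_burn:                                     # discard the burn-in
            draws.append((beta0, var_u, var_e))
    return draws

data_rng = random.Random(2024)
groups = []
for j in range(50):
    u_j = data_rng.gauss(0.0, 1.0)                           # true level-2 variance 1.0
    groups.append([5.0 + u_j + data_rng.gauss(0.0, 0.5 ** 0.5) for _ in range(10)])  # level-1 variance 0.5
draws = gibbs_two_level(groups)
for name, k in (("beta0", 0), ("var_u", 1), ("var_e", 2)):
    print(name, round(statistics.mean(d[k] for d in draws), 3))
```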
6
Measurement errors

1. Continuous variables: a simple example
The basic model is
with a model of interest, e.g.

7
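Some assumptions we need to make

The measurement error variance is assumed known or, alternatively, the reliability (the ratio of the true-value variance to the observed variance) is assumed known.
We also need a distribution for the true value.
An important issue is the value assumed for these quantities, and a sensitivity analysis is useful; we can also give it a prior.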
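A small simulation shows why the reliability matters. It assumes the classical additive error model (observed value = true value + independent error) and a simple linear model of interest. The naive regression slope on the error-prone predictor is attenuated by roughly the reliability, and dividing by a known reliability undoes this. This moment correction is a swapped-in illustration, not the MCMC approach the slides describe, and all numbers are made up.

```python
import random
import statistics

def ols_slope(x, y):
    # Ordinary least-squares slope of y on x.
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(3)
n = 20000
true_slope = 2.0
var_true, var_err = 0.7, 0.3                  # reliability R = 0.7 / (0.7 + 0.3) = 0.7
x_true = [rng.gauss(0.0, var_true ** 0.5) for _ in range(n)]
x_obs = [xt + rng.gauss(0.0, var_err ** 0.5) for xt in x_true]   # classical measurement error
y = [1.0 + true_slope * xt + rng.gauss(0.0, 1.0) for xt in x_true]

R = var_true / (var_true + var_err)
naive = ols_slope(x_obs, y)        # attenuated towards zero, roughly R * true_slope
corrected = naive / R              # moment correction using the known reliability
print(round(naive, 2), round(corrected, 2))
```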

8
2. Misclassification errors

Assume a binary (0,1) variable, for example whether or not a school pupil is eligible for free school meals (yes = 1).
We need the probability of observing a zero (not eligible) given that the true value is zero, and the probability of observing a one given that the true value is zero; likewise we have the corresponding two probabilities given that the true value is one.
We now assume we know these misclassification probabilities; the target model is similar to the one before, but with a binary predictor.
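The following sketch shows how known misclassification probabilities can be used in the simplest setting: correcting an observed prevalence. The names p01 (observe 1 given true 0) and p10 (observe 0 given true 1) and all the values are hypothetical. The algebraic inversion is an illustrative substitute for the model-based approach in the slides.

```python
import random

def observed_prevalence(p_true, p01, p10):
    # P(observe 1) = P(true 1) * P(obs 1 | true 1) + P(true 0) * P(obs 1 | true 0)
    return p_true * (1 - p10) + (1 - p_true) * p01

def corrected_prevalence(q_obs, p01, p10):
    # Invert the linear relation above (requires p01 + p10 < 1).
    return (q_obs - p01) / (1 - p01 - p10)

rng = random.Random(11)
p_true, p01, p10 = 0.2, 0.05, 0.10     # illustrative values only
n = 50000
true_fsm = [1 if rng.random() < p_true else 0 for _ in range(n)]
observed = []
for t in true_fsm:
    flip = p10 if t == 1 else p01        # chance this record is misrecorded
    observed.append(1 - t if rng.random() < flip else t)

q = sum(observed) / n
corrected = corrected_prevalence(q, p01, p10)
print(round(q, 3), round(corrected, 3))
```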

9
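Modelling considerations

We can model multivariate continuous measurement errors, but only independent binary misclassifications.
We can allow different measurement error variances and covariances for different groups, e.g. by gender.
In the multivariate case we typically need non-zero correlations between the measurement errors.
Thus if, say, the reliability is R = 0.7 for both variables and the observed correlation is 0.8, then we require a measurement error correlation of at least 0.33. With standardised variables the true-score covariance is at most 0.7 (a true correlation cannot exceed 1), so the errors must supply at least 0.1 of the observed 0.8, and 0.1/0.3 ≈ 0.33.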
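The bound above can be checked with a short function that also allows the two reliabilities to differ. It assumes standardised variables and classical errors independent of the true values; the name `min_error_correlation` is ours.

```python
import math

def min_error_correlation(r_obs, rel1, rel2):
    """Smallest measurement-error correlation consistent with an observed
    correlation r_obs between two standardised variables with reliabilities
    rel1 and rel2, assuming classical errors independent of the true values."""
    # Observed covariance = true-score covariance + error covariance.
    # The true-score covariance is at most sqrt(rel1 * rel2) (true correlation <= 1).
    max_true_cov = math.sqrt(rel1 * rel2)
    # Product of the two error standard deviations.
    err_sd_product = math.sqrt((1 - rel1) * (1 - rel2))
    return (r_obs - max_true_cov) / err_sd_product

print(round(min_error_correlation(0.8, 0.7, 0.7), 4))
```

A negative result means the observed correlation can be explained without correlated measurement errors.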

10
An educational example

Maths test score related to prior test scores and FSM (free school meals) eligibility.
We will look at continuous, correlated and binary measurement errors.

Open measurement-error.exe and read the file classsize.
11
Summary table for analyses
12
Factor analysis and structural equation models
Consider a single-level factor model, where we have several responses on each member of a sample,
where r indexes the response variable and i the person. This is a special kind of multivariate model in which we assume the residuals are independent, so the covariance between two responses is given by

A constraint is needed for identifiability, and the default is to choose
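For reference, a standard way of writing a single-factor model, and the covariance it implies, is sketched below. The symbols (β_r, λ_r, ν_i, σ_ν², σ_{e_r}²) are assumed, since the slide's own equations did not survive extraction. The last line shows why a constraint is needed: rescaling the loadings by c and the factor variance by 1/c² leaves every covariance unchanged. A common choice is σ_ν² = 1, though the slide's default is not recoverable here.

```latex
\begin{aligned}
y_{ri} &= \beta_r + \lambda_r \nu_i + e_{ri},\qquad
\nu_i \sim N(0,\sigma_\nu^2),\quad e_{ri} \sim N(0,\sigma_{e_r}^2)\ \text{independent}\\
\operatorname{cov}(y_{ri}, y_{si}) &= \operatorname{cov}(\lambda_r \nu_i + e_{ri},\ \lambda_s \nu_i + e_{si})
= \lambda_r \lambda_s \sigma_\nu^2, \qquad r \neq s\\
\lambda_r \lambda_s \sigma_\nu^2 &= (c\lambda_r)(c\lambda_s)\,\frac{\sigma_\nu^2}{c^2}\quad \text{for any } c \neq 0
\end{aligned}
```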
13
Extensions: further factors

We can add explanatory variables in addition to
the
(see later) or we can add further factors

As the number of factors increases, we require further constraints, typically on the loading values.
A popular choice is simple structure, with each response loading on only one factor and non-zero correlations between the factors.
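Under simple structure the implied covariance matrix is Σ = ΛΦΛᵀ + Ψ. Here Λ holds the loadings, Φ is the factor covariance matrix and Ψ is the diagonal matrix of residual variances (our notation). The sketch below computes Σ for four hypothetical responses and two correlated factors. Responses loading on different factors are correlated only through the factor correlation.

```python
def implied_covariance(loadings, factor_cov, resid_var):
    """Sigma = Lambda * Phi * Lambda^T + diag(resid_var).
    loadings: R x F list of lists, factor_cov: F x F, resid_var: length R."""
    R, F = len(loadings), len(factor_cov)
    sigma = [[0.0] * R for _ in range(R)]
    for r in range(R):
        for s in range(R):
            sigma[r][s] = sum(loadings[r][f] * factor_cov[f][g] * loadings[s][g]
                              for f in range(F) for g in range(F))
        sigma[r][r] += resid_var[r]      # independent residuals add to the diagonal only
    return sigma

# Simple structure: responses 0-1 load only on factor 0, responses 2-3 only on factor 1.
loadings = [[0.8, 0.0], [0.6, 0.0], [0.0, 0.7], [0.0, 0.5]]
factor_cov = [[1.0, 0.4], [0.4, 1.0]]    # unit factor variances, correlation 0.4
resid_var = [0.36, 0.64, 0.51, 0.75]
sigma = implied_covariance(loadings, factor_cov, resid_var)
for row in sigma:
    print([round(v, 3) for v in row])
```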
14
Extensions: structural variables

We can allow the factors themselves to depend on further variables, e.g.

Or, alternatively but less commonly,
15
Two-level factor models
Standard formulation

Alternatively
but we shall not consider this case.
16
Example: PISA data

A survey of the reading performance of 15-year-olds in 32 countries, carried out by the OECD in 2000.
We use one subscale of 35 items, "retrieving information", and look at France and England.
First we shall fit one- and two-level models assuming the responses are Normal; in fact they are binary and ordered, but we come to that later.
Open structural-equation.exe and load pisadata.

17
Binary and ordered responses

Assume a binary response z.
We will use the idea of a latent Normal distribution. Consider the (factor) model for a single response,

where we observe a positive (1) response for our binary variable z if y is positive, that is,
so that we obtain the probit model
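A simulation illustrates the latent-variable argument. It assumes a latent response y = μ + e with e ~ N(0, 1), since fixing the latent variance at 1 is the usual probit scaling. It also uses a hypothetical value μ = 0.5. The proportion of positive latent values should match Φ(μ).

```python
import random
from statistics import NormalDist

def simulate_binary(mu, n, rng):
    # Latent Normal y = mu + e, e ~ N(0, 1); observe z = 1 when y > 0.
    return [1 if mu + rng.gauss(0.0, 1.0) > 0 else 0 for _ in range(n)]

rng = random.Random(5)
mu = 0.5                          # hypothetical linear predictor value
z = simulate_binary(mu, 100000, rng)
empirical = sum(z) / len(z)
probit = NormalDist().cdf(mu)     # P(z = 1) = P(e > -mu) = Phi(mu)
print(round(empirical, 3), round(probit, 3))
```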
18
Ordered data
Consider the cumulative probability of being in one of the lowest s+1 categories of a p-category variable, with categories numbered from 0 upwards, s = 0, ..., p-2. We extend the binary response model as

where a set of thresholds, one for each s, separates the categories. So suppose we have a 3-category variable; then for the observed responses
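For a 3-category variable the thresholds turn the cumulative probabilities into category probabilities. The sketch assumes the convention "category s is observed when the latent value lies between threshold s-1 and threshold s", so that P(z <= s) = Φ(α_s - μ). Sign conventions differ between texts, and the slide's own is not recoverable. The threshold values are hypothetical.

```python
from statistics import NormalDist

def ordered_probit_probs(mu, thresholds):
    """Category probabilities for a p-category ordered response, p = len(thresholds) + 1.
    Convention assumed here: category s is observed when
    thresholds[s-1] < mu + e <= thresholds[s], with e ~ N(0, 1)."""
    phi = NormalDist().cdf
    cumulative = [phi(a - mu) for a in thresholds] + [1.0]   # P(z <= s), s = 0..p-1
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)    # P(z = s) = P(z <= s) - P(z <= s-1)
        previous = c
    return probs

# A 3-category variable needs p - 1 = 2 thresholds (values are hypothetical).
print([round(p, 4) for p in ordered_probit_probs(0.3, [-0.5, 0.8])])
```

With a single threshold at 0 this reduces to the binary probit model: P(z = 1) = Φ(μ).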

19
PISA data with binary/ordered responses

In fact all the responses are binary except for 4, which have 3 ordered categories: C9, C14, C20 and C26.
Change the type of these responses and rerun the models.

Finally, fit the explanatory variables Country and Gender in the structural part of the model.

20
Multivariate models with responses at 2 levels

Consider first 2 Normal responses
The superscript indicates the level.
The models are linked via the level-2 covariance matrix.
The MCMC algorithm handles missing response data, and categorical (binary, ordered and unordered) as well as Normal data.
The first example is a repeated-measures growth curve model.

21
Child heights and adult height

Child height is modelled as a cubic polynomial in age, with the intercept and slope random at level 2.
22

Load growthdata.txt and fit the model
Results

23
Adult height prediction

Suppose we have 2 growth measures and we want a regression prediction of the form
This leads to
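Under multivariate Normality the prediction is the conditional mean of adult height given the two growth measures. Its coefficients are b = S_xx^{-1} s_xy, computed from the estimated level-2 covariances. The sketch assumes this standard regression-from-covariance form. The covariance values and means are hypothetical, not results from growthdata.txt.

```python
def regression_from_covariance(s_xx, s_xy):
    """Coefficients b = S_xx^{-1} s_xy for two predictors (explicit 2 x 2 solve)."""
    (a, b), (c, d) = s_xx
    det = a * d - b * c
    return [( d * s_xy[0] - b * s_xy[1]) / det,
            (-c * s_xy[0] + a * s_xy[1]) / det]

def predict(mu_y, mu_x, coefs, x):
    # Conditional mean of y given x under multivariate Normality.
    return mu_y + sum(bk * (xk - mk) for bk, xk, mk in zip(coefs, x, mu_x))

# Hypothetical covariance of the two growth measures, and their covariances with adult height.
s_xx = [[4.0, 1.0], [1.0, 2.0]]
s_xy = [3.0, 2.0]
coefs = regression_from_covariance(s_xx, s_xy)
pred = predict(170.0, [0.0, 0.0], coefs, [1.0, 0.5])
print(coefs, pred)
```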

24
Mixed response types and missing data

Normal and ordered data have already been considered in the structural equation models.
We now introduce unordered categorical responses.
We can also have general Normalising transformations.
Handling missing data via imputation is an important application for these models.

25
Unordered categorical responses
Assume p categories, where an individual responds to just one.

We have

where h indexes the response. For each category we assume that an underlying latent variable exists and that we have the following model
For identifiability we model p-1 categories, with a constraint on the remaining one.
In the maximum indicant model we observe category h for individual i if and only if that category's latent variable is the largest,
so that
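The maximum indicant rule can be simulated directly. The sketch assumes independent standard Normal latent errors. For identifiability, it fixes the latent variable of a reference category at 0. Both are assumptions, since the slide's constraint did not survive extraction, and the latent means are hypothetical.

```python
import random

def max_indicant_probs(means, n, seed=0):
    """Monte Carlo category probabilities under a maximum indicant model.
    means: latent means for categories 1..p-1; category 0 is the reference,
    with its latent variable fixed at 0 (an assumed identifiability constraint)."""
    rng = random.Random(seed)
    counts = [0] * (len(means) + 1)
    for _ in range(n):
        latent = [0.0] + [m + rng.gauss(0.0, 1.0) for m in means]
        # Observed category = the one whose latent variable is largest.
        h = max(range(len(latent)), key=latent.__getitem__)
        counts[h] += 1
    return [c / n for c in counts]

print([round(p, 3) for p in max_indicant_probs([0.5, -0.2], 20000)])
```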

26
(No Transcript)
27
Multiple imputation: briefly and simply
Consider the model of interest (MOI). We turn this into a multivariate response model and obtain (from an MCMC chain) residual estimates for the values which are missing. Use these to fill in the missing values and produce a complete data set. Do this (independently) n (e.g. 20) times. Fit the MOI to each data set and combine the results according to rules to get estimates and standard errors.
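The slide does not name the combining rules; the standard ones are Rubin's rules, which are assumed here. The pooled estimate is the mean of the m estimates. Its variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The estimates and standard errors below are made up.

```python
import math
import statistics

def rubin_combine(estimates, std_errors):
    """Pool m completed-data analyses with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)                        # pooled point estimate
    within = statistics.mean(se ** 2 for se in std_errors)    # average within-imputation variance
    between = statistics.variance(estimates)                  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

est = [0.52, 0.48, 0.55, 0.50, 0.45]     # hypothetical estimates from 5 imputed data sets
ses = [0.10, 0.11, 0.10, 0.12, 0.10]
print(rubin_combine(est, ses))
```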

28
Class size example

Load classsize_impute.
The MOI has the Normalised exam score as response, regressed on pretest score, gender, FSM and class size. 50 level-1 units have missing data.
Multivariate model

29
MI estimates vs listwise deletion

Fixed effects in the multivariate model; 50 records missing completely at random (MCAR)

30
Further extensions

Box-Cox normalising transformations
Application to survival data, treated as an ordered response when divided into discrete time intervals
Combination of measurement errors, structural models and responses at more than one level into a single program
Incorporation into MLwiN
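The Box-Cox family is y^(λ) = (y^λ - 1)/λ for λ ≠ 0 and log y for λ = 0, defined for y > 0. The sketch below only evaluates the transform; how REALCOM chooses or estimates λ is not described in these slides.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)          # limiting case as lam -> 0
    return (y ** lam - 1) / lam

print(box_cox(4.0, 0.5), box_cox(math.e, 0.0))
```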

31
General remarks