Title: MCMC Estimation for Random Effect Modelling
1. MCMC Estimation for Random Effect Modelling: The MLwiN experience
- Dr William J. Browne
- School of Mathematical Sciences
- University of Nottingham
2. Contents
- Random effect modelling, MCMC and MLwiN.
- Methods comparison: the Guatemalan child health example.
- Extendibility of MCMC algorithms.
- Cross-classified and multiple membership models.
- Artificial insemination and Danish chicken examples.
- Further extensions.
3. Random effect models
- Models that account for the underlying structure in the dataset.
- Originally developed for nested structures (multilevel models), for example in education: pupils nested within schools.
- An extension of linear modelling with the inclusion of random effects.
- A typical 2-level model is shown below.
- Here i indexes pupils and j indexes schools.
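The model equation on the original slide was an image that has not survived extraction; a minimal sketch of the usual random-intercept form, consistent with the description above (pupil i in school j), is:

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \qquad
u_j \sim \mathrm{N}(0, \sigma^2_u), \qquad
e_{ij} \sim \mathrm{N}(0, \sigma^2_e)
```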
4. MLwiN
- Software package designed specifically for fitting multilevel models.
- Developed by a team led by Harvey Goldstein and Jon Rasbash at the Institute of Education in London over the past 15 years or so. Earlier incarnations: ML2, ML3, MLN.
- Originally contained classical IGLS estimation methods for fitting models.
- MLwiN, launched in 1998, also included MCMC estimation.
- My role in the team was as developer of the MCMC functionality in MLwiN during 4.5 years at the IOE.
5. Estimation Methods for Multilevel Models
- Due to the additional random effects, no simple matrix formulae exist for finding estimates in multilevel models.
- Two alternative approaches exist:
- Iterative algorithms, e.g. IGLS, RIGLS, or EM in HLM, that alternate between estimating fixed and random effects until convergence. These can produce ML and REML estimates.
- Simulation-based Bayesian methods, e.g. MCMC, that attempt to draw samples from the posterior distribution of the model.
6. MCMC Algorithm
- Consider the 2-level model above.
- MCMC algorithms work in a Bayesian framework and so we need to add prior distributions for the unknown parameters.
- Here there are 4 sets of unknown parameters: the fixed effects, the random effects u_j, and the two variances σ²_u and σ²_e.
- We will add prior distributions for these (typical defaults are sketched below).
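The slide's prior specification is also lost; the usual defaults in MLwiN (Browne, 2003) are, approximately, flat priors for the fixed effects and diffuse Gamma priors for the precisions (equivalently, inverse Gamma priors for the variances):

```latex
p(\beta_0) \propto 1, \quad p(\beta_1) \propto 1, \qquad
\sigma_u^{-2} \sim \Gamma(\varepsilon, \varepsilon), \qquad
\sigma_e^{-2} \sim \Gamma(\varepsilon, \varepsilon), \qquad \varepsilon = 0.001
```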
7. MCMC Algorithm (2)
- The algorithm for this model then involves simulating in turn from the 4 sets of conditional distributions. Such an algorithm is known as Gibbs sampling. MLwiN uses Gibbs sampling for all normal response models.
- Firstly we set starting values for each group of unknown parameters.
- Then we sample from the conditional distributions in turn, firstly the fixed effects given the current values of all the other parameters, to obtain updated fixed effects.
8. MCMC Algorithm (3)
- We next sample from the conditional distribution of the random effects to get updated values of each u_j, then from the conditional distribution of the level 2 variance to get σ²_u, and finally from the conditional distribution of the level 1 variance to get σ²_e. We have then updated all of the unknowns in the model.
- The process is then simply repeated many times, each time using the previously generated parameter values to generate the next set (see the sketch below).
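This is not MLwiN's internal implementation, but a minimal numpy sketch of the Gibbs cycle just described, for a random-intercept version of the 2-level model with a flat prior on the intercept and Γ(0.001, 0.001) priors on the precisions (the simulated data and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a small 2-level dataset: pupils (i) within schools (j).
J, n_per = 20, 30                          # 20 schools, 30 pupils each
school = np.repeat(np.arange(J), n_per)
u_true = rng.normal(0, 1.0, J)             # true school effects
y = 2.0 + u_true[school] + rng.normal(0, 2.0, J * n_per)
N = len(y)

# Starting values and Gamma(0.001, 0.001) priors on the precisions.
a = b = 0.001
beta0, u = 0.0, np.zeros(J)
sig2_u, sig2_e = 1.0, 1.0

chain = []
for it in range(5000):
    # Step 1: sample beta0 | u, sig2_e (flat prior on beta0).
    resid = y - u[school]
    beta0 = rng.normal(resid.mean(), np.sqrt(sig2_e / N))

    # Step 2: sample each school effect u_j | beta0, sig2_u, sig2_e.
    for j in range(J):
        yj = y[school == j] - beta0
        prec = len(yj) / sig2_e + 1.0 / sig2_u
        u[j] = rng.normal((yj.sum() / sig2_e) / prec, np.sqrt(1.0 / prec))

    # Step 3: sample the level 2 variance via its precision | u.
    sig2_u = 1.0 / rng.gamma(a + J / 2, 1.0 / (b + 0.5 * np.sum(u ** 2)))

    # Step 4: sample the level 1 variance via its precision | beta0, u.
    e = y - beta0 - u[school]
    sig2_e = 1.0 / rng.gamma(a + N / 2, 1.0 / (b + 0.5 * np.sum(e ** 2)))

    chain.append((beta0, sig2_u, sig2_e))
```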
9. Burn-in and estimates
- Burn-in: it is general practice to throw away the first n values to allow the Markov chain to approach its equilibrium distribution, namely the joint posterior distribution of interest. These iterations are known as the burn-in.
- Finding estimates: we continue generating values after the end of the burn-in for another m iterations. These m values are then averaged to give point estimates of the parameters of interest. Posterior standard deviations and other summary measures can also be obtained from the chains.
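Continuing the sketch above (again purely illustrative rather than MLwiN's own code), discarding a burn-in and summarising the remaining draws amounts to:

```python
import numpy as np

draws = np.array(chain)      # 'chain' from the Gibbs sketch above; columns: beta0, sig2_u, sig2_e
post = draws[500:]           # throw away the first 500 iterations as burn-in
print("posterior means:", post.mean(axis=0))   # point estimates
print("posterior sds:", post.std(axis=0))      # posterior standard deviations
```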
10. Methods for non-normal responses
- When the response variable is Binomial or Poisson, different algorithms are required.
- IGLS/RIGLS methods give quasi-likelihood estimates, e.g. MQL and PQL.
- MCMC algorithms, including Metropolis-Hastings sampling and adaptive rejection sampling, are possible (see the sketch below).
- Numerical quadrature can give ML estimates but is not without problems.
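MLwiN's actual samplers for these models are adaptive hybrid Metropolis-Gibbs schemes; the toy random-walk Metropolis update below, for a single logistic regression intercept with illustrative data, only shows why sampling remains possible when the conditional distribution has no standard form:

```python
import numpy as np
from scipy.special import expit              # inverse-logit

rng = np.random.default_rng(2)

# Toy binary data with logit(p) = beta0 and true beta0 = 0.5.
y = rng.binomial(1, expit(0.5), size=500)

def log_post(b0):
    # Bernoulli log-likelihood plus an (improper) flat prior on beta0.
    p = expit(b0)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta0, step, chain = 0.0, 0.2, []
for it in range(5000):
    prop = beta0 + rng.normal(0, step)                       # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(beta0):
        beta0 = prop                                         # accept; otherwise keep the current value
    chain.append(beta0)
```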
11. So why use MCMC?
- Often gives better estimates for non-normal responses.
- Gives the full posterior distribution, so interval estimates for derived quantities are easy to produce.
- Can easily be extended to more complex problems.
- Potential downside 1: prior distributions are required for all unknown parameters.
- Potential downside 2: MCMC estimation is much slower than the IGLS algorithm.
12. The Guatemalan Child Health dataset
- This consists of a subsample of 2,449 respondents from the 1987 National Survey of Maternal and Child Health, with a 3-level structure of births within mothers within communities.
- The subsample consists of all women from the chosen communities who had some form of prenatal care during pregnancy. The response variable is whether this prenatal care was modern (physician or trained nurse) or not.
- Rodriguez and Goldman (1995) use the structure of this dataset to consider how well quasi-likelihood methods compare with ignoring the multilevel structure and fitting a standard logistic regression.
- They do this by constructing simulated datasets based on the original structure but with known true values for the fixed effects and variance parameters.
- They consider the MQL method and show that the estimates of the fixed effects produced by MQL are worse than the estimates produced by a standard logistic regression that disregards the multilevel structure!
13. The Guatemalan Child Health dataset (continued)
- Goldstein and Rasbash (1996) consider the same problem but use the PQL method. They show that the results produced by 2nd order PQL estimation are far better than for MQL but still biased.
- The model in this situation is given after this list.
- In this formulation i, j and k index the level 1, 2 and 3 units respectively.
- The variables x1, x2 and x3 are composite scales at each level, because the original model contained many covariates at each level.
- Browne and Draper (2004) considered the hybrid Metropolis-Gibbs method in MLwiN and two possible variance priors (Γ⁻¹(ε, ε) and Uniform).
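The model equation itself was an image on the slide; from the description above and the parameters reported in the tables that follow, it is (up to notation) the 3-level random-intercept logistic regression:

```latex
y_{ijk} \sim \mathrm{Bernoulli}(\pi_{ijk}), \qquad
\mathrm{logit}(\pi_{ijk}) = \beta_0 + \beta_1 x_{1ijk} + \beta_2 x_{2jk} + \beta_3 x_{3k} + u_{jk} + v_k,
\qquad
u_{jk} \sim \mathrm{N}(0, \sigma^2_u), \quad v_k \sim \mathrm{N}(0, \sigma^2_v)
```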
14. Simulation Results
- The following gives point estimates (with Monte Carlo standard errors in brackets) for the 4 methods over 500 simulated datasets.
Parameter (true)   MQL (1st order)   PQL (2nd order)   MCMC, Γ⁻¹(ε,ε) prior   MCMC, Uniform prior
β0 (0.65)          0.474 (0.01)      0.612 (0.01)      0.638 (0.01)           0.655 (0.01)
β1 (1.00)          0.741 (0.01)      0.945 (0.01)      0.991 (0.01)           1.015 (0.01)
β2 (1.00)          0.753 (0.01)      0.958 (0.01)      1.006 (0.01)           1.031 (0.01)
β3 (1.00)          0.727 (0.01)      0.942 (0.01)      0.982 (0.01)           1.007 (0.01)
σ²v (1.00)         0.550 (0.01)      0.888 (0.01)      1.023 (0.01)           1.108 (0.01)
σ²u (1.00)         0.026 (0.01)      0.568 (0.01)      0.964 (0.02)           1.130 (0.02)
15. Simulation Results (continued)
- The following gives interval coverage probabilities (90%/95% nominal levels) for the 4 methods over 500 simulated datasets.
Parameter (true)   MQL (1st order)   PQL (2nd order)   MCMC, Γ⁻¹(ε,ε) prior   MCMC, Uniform prior
β0 (0.65)          67.6/76.8         86.2/92.0         86.8/93.2              88.6/93.6
β1 (1.00)          56.2/68.6         90.4/96.2         92.8/96.4              92.2/96.4
β2 (1.00)          13.2/17.6         84.6/90.8         88.4/92.6              88.6/92.8
β3 (1.00)          59.0/69.6         85.2/89.8         86.2/92.2              88.6/93.6
σ²v (1.00)         0.6/2.4           70.2/77.6         89.4/94.4              87.8/92.2
σ²u (1.00)         0.0/0.0           21.2/26.8         84.2/88.6              88.0/93.0
16. Summary of simulations
- The Bayesian approach yields excellent bias and coverage results.
- For the fixed effects, MQL performs badly but the other 3 methods all do well.
- For the random effect variances, MQL and PQL both perform badly but MCMC with both priors is much better.
- Note that this is an extreme scenario, with few level 1 units per level 2 unit yet a high level 2 variance; in other examples MQL/PQL will not be so bad.
17. Extension 1: Cross-classified models
For example, schools by neighbourhoods. Schools will draw pupils from many different neighbourhoods and the pupils of a neighbourhood will go to several schools. No pure hierarchy can be found and pupils are said to be contained within a cross-classification of schools by neighbourhoods (each x below represents a pupil):
           nbhd 1   nbhd 2   nbhd 3
School 1   xx   x
School 2   x    x
School 3   xx   x
School 4   x    xxx
18. Notation
With hierarchical models we use a subscript notation that has one subscript per level, and nesting is implied reading from the left. For example, the subscript pattern ijk denotes the ith level 1 unit within the jth level 2 unit within the kth level 3 unit. If models become cross-classified we use the term classification instead of level. Notation with one subscript per classification that also captures the relationships between classifications can become very cumbersome. We therefore propose an alternative notation, introduced in Browne et al. (2001), that has only a single subscript no matter how many classifications are in the model.
19. Single subscript notation
We write the model as shown below (equation (1)), where classification 2 is neighbourhood and classification 3 is school. Classification 1 always corresponds to the classification at which the response measurements are made, in this case pupils. For pupils 1 and 11, equation (1) simply picks out each pupil's own neighbourhood and school effects.
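Equation (1) itself is not reproduced on this page; in the single subscript notation of Browne et al. (2001), a sketch of the variance components version is:

```latex
y_i = \beta_0 + u^{(2)}_{\mathrm{nbhd}(i)} + u^{(3)}_{\mathrm{school}(i)} + e_i, \qquad (1)
```
```latex
u^{(2)}_{\mathrm{nbhd}(i)} \sim \mathrm{N}(0, \sigma^2_{u(2)}), \quad
u^{(3)}_{\mathrm{school}(i)} \sim \mathrm{N}(0, \sigma^2_{u(3)}), \quad
e_i \sim \mathrm{N}(0, \sigma^2_e)
```

Here nbhd(i) and school(i) return the neighbourhood and school that pupil i belongs to, so a single subscript i on the response suffices.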
20. Classification diagrams
In the single subscript notation we lose information about the relationship (crossed or nested) between classifications. A useful way of conveying this information is with a classification diagram, which has one node per classification; nodes linked by arrows have a nested relationship, and unlinked nodes have a crossed relationship.
[Two classification diagrams with nodes School, Neighbourhood and Pupil:]
- Cross-classified structure, where pupils from a school come from many neighbourhoods and pupils from a neighbourhood attend several schools.
- Nested structure, where schools are contained within neighbourhoods.
21. Example: Artificial insemination by donor
1901 women, 279 donors, 1328 donations, 12100 ovulatory cycles; the response is whether conception occurs in a given cycle.
In terms of a unit diagram: [not reproduced]
Or a classification diagram: [not reproduced]
22. Model for artificial insemination data
We can write the model as shown below.
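The model display is not reproduced; a hedged sketch, treating ovulatory cycles as classification 1 and women, donations and donors as crossed higher classifications, is:

```latex
y_i \sim \mathrm{Bernoulli}(\pi_i), \qquad
\mathrm{logit}(\pi_i) = (X\beta)_i
  + u^{(2)}_{\mathrm{woman}(i)}
  + u^{(3)}_{\mathrm{donation}(i)}
  + u^{(4)}_{\mathrm{donor}(i)}
```

with each set of random effects assumed normally distributed with its own variance.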
Results: [table not reproduced]
Note: cross-classified models can be fitted in IGLS, but they are far easier to fit using MCMC estimation.
23. Extension 2: Multiple membership models
- When level 1 units are members of more than one higher level unit, we describe a model for such data as a multiple membership model.
- For example:
- Pupils change schools/classes, and each school/class has an effect on pupil outcomes.
- Patients are seen by more than one nurse during the course of their treatment.
24. Notation
Note that nurse(i) now indexes the set of nurses that treat patient i, and w(2)i,j is a weighting factor relating patient i to nurse j. For example, with four patients and three nurses we may have weights like those sketched below.
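Neither the model display nor the weights table survived extraction; a sketch of the multiple membership model in this notation, together with a purely hypothetical set of weights for four patients and three nurses, might look like:

```latex
y_i = (X\beta)_i + \sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j} u^{(2)}_j + e_i,
\qquad \sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j} = 1
```

            nurse 1   nurse 2   nurse 3
patient 1   1.0       0         0
patient 2   0.5       0.5       0
patient 3   0         0.5       0.5
patient 4   0.5       0         0.5

Each row sums to 1, so a patient treated equally by two nurses splits their random effect contribution equally between those nurses.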
25. Classification diagrams for multiple membership relationships
Double arrows indicate a multiple membership
relationship between classifications.
We can mix multiple membership, crossed and
hierarchical structures in a single model.
26. Example involving nesting, crossing and multiple membership: Danish chickens
Production hierarchy: 10,127 child flocks within 725 houses within 304 farms.
Breeding hierarchy: 10,127 child flocks from 200 parent flocks.
As a unit diagram: [not reproduced]
As a classification diagram: [not reproduced]
27. Model and results
Note: multiple membership models can be fitted in IGLS, and this model/dataset represents roughly the most complex model that the method can handle. Such models are far easier to fit using MCMC estimation.
28. Further Extensions / Work in progress
- Multilevel factor models
- Response variables at different levels
- Missing data and multiple imputation
- ESRC grant: sample size calculations, MCMC efficiency, model identifiability
- Wellcome Fellowship grant for Martin Green
29. Multilevel factor analysis modelling
- In sample surveys there are often many responses for each individual.
- Techniques like factor analysis are often used to identify underlying latent traits amongst these responses.
- Multilevel factor analysis allows factor analysis modelling to identify factors at the various levels/classifications in the dataset, so we can identify shared latent traits as well as individual-level traits (a sketch is given below).
- Due to the nature of MCMC algorithms, by adding a step to allow for multilevel factor models in MLwiN, cross-classified models can also be fitted without any additional programming!
- See Goldstein and Browne (2002, 2005) for more detail.
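As a sketch only (not the exact formulation in the papers cited), a two-level model with one factor at each level, for response r on pupil i in school j, might be written:

```latex
y_{rij} = \beta_r + \lambda^{(2)}_r \eta^{(2)}_j + \lambda^{(1)}_r \eta^{(1)}_{ij} + e_{rij},
\qquad \eta^{(2)}_j \sim \mathrm{N}(0,1), \quad \eta^{(1)}_{ij} \sim \mathrm{N}(0,1), \quad
e_{rij} \sim \mathrm{N}(0, \sigma^2_{er})
```

where the λ are loadings and η(2), η(1) are the shared (school-level) and individual-level latent traits.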
30. Responses at different levels
- In a medical survey some responses may refer to patients in a hospital while others may refer to the hospital itself.
- Models that combine these responses can be fitted using the IGLS algorithm in MLwiN and shouldn't pose any problems for MCMC estimation.
- The Centre for Multilevel Modelling in Bristol is investigating such models as part of its LEMMA node in the ESRC research methods programme. I am a named collaborator on the LEMMA project.
- They are also looking at MCMC algorithms for latent growth models.
31. Missing data and multiple imputation
- Missing data are widespread in survey research.
- One popular approach to dealing with missing data is multiple imputation (Rubin 1987), where several imputed datasets are created, the model of interest is fitted to each dataset, and the estimates are combined (see the sketch below).
- Using a multivariate normal response multilevel model to generate the imputations using MCMC in MLwiN is described in chapter 17 of Browne (2003).
- James Carpenter (LSHTM) has begun work on macros in MLwiN that automate the multiple imputation procedure.
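The combination step follows Rubin's rules; a minimal sketch for a single parameter (this is not James Carpenter's MLwiN macros, and the numbers below are made up) is:

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Combine one parameter's estimate across m imputed datasets (Rubin 1987)."""
    estimates, variances = np.asarray(estimates), np.asarray(variances)
    m = len(estimates)
    q_bar = estimates.mean()            # pooled point estimate
    w = variances.mean()                # average within-imputation variance
    b = estimates.var(ddof=1)           # between-imputation variance
    total_var = w + (1 + 1 / m) * b     # Rubin's total variance
    return q_bar, np.sqrt(total_var)

# Five hypothetical fits of the same coefficient and its squared standard error:
print(rubins_rules([0.42, 0.45, 0.40, 0.47, 0.44],
                   [0.010, 0.011, 0.009, 0.012, 0.010]))
```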
32. Sample size calculations
- Another issue in data collection is how big a sample we need to collect.
- Such sample size calculations have simple formulae if we can assume that an independent sample can be generated.
- If however we wish to account for the data structure in the calculation, then things are more complex.
- One possibility is a simulation-based approach, similar to that used in the model comparisons described earlier, where many datasets are simulated to look at the power for a fixed sample size (see the sketch after this list).
- Mousa Golalizadeh Lehi will be joining me in February on an ESRC grant looking at such an approach. A 4th year MMath student (Lynda Leese) is looking at the approach for nested models.
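A crude sketch of this simulation-based idea, using statsmodels' ML fit of a 2-level random-intercept model in place of MLwiN (the sample sizes, effect size, variances and 5% test are all illustrative assumptions):

```python
import warnings
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

warnings.filterwarnings("ignore")        # silence convergence chatter from repeated fits
rng = np.random.default_rng(3)

def simulated_power(J=30, n=20, beta1=0.3, sd_u=0.5, sd_e=1.0, nsim=100):
    """Estimate power for the slope in a 2-level model by repeated simulation."""
    hits = 0
    for _ in range(nsim):
        school = np.repeat(np.arange(J), n)
        x = rng.normal(size=J * n)
        y = beta1 * x + rng.normal(0, sd_u, J)[school] + rng.normal(0, sd_e, J * n)
        df = pd.DataFrame({"y": y, "x": x, "school": school})
        fit = smf.mixedlm("y ~ x", df, groups=df["school"]).fit()
        hits += fit.pvalues["x"] < 0.05   # did we detect the slope at the 5% level?
    return hits / nsim

print(simulated_power())                  # proportion of simulated datasets detecting beta1
```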
33. Efficient MCMC algorithms
- In MLwiN we have tended to use the simplest, most generally applicable MCMC algorithms for multilevel models.
- For particular models there are many approaches that may improve the performance/mixing of the MCMC algorithm (one example is sketched after this list).
- We will also investigate some of these methods in the ESRC grant.
- Browne (2004) looked at some reparameterisation methods for cross-classified models in a bird nesting dataset.
- A second 4th year MMath student (Francis Bourchier) is looking at some interesting MCMC methods based around the IGLS representation of nested models.
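One well-known reparameterisation of this kind is hierarchical centring, which moves the intercept inside the distribution of the random effects so that the intercept and the random effects are less strongly correlated in the chain (shown here for the simple 2-level model; this is a generic illustration rather than a summary of Browne, 2004):

```latex
\text{original:}\quad y_{ij} = \beta_0 + u_j + e_{ij}, \;\; u_j \sim \mathrm{N}(0, \sigma^2_u)
\qquad\Longrightarrow\qquad
\text{centred:}\quad y_{ij} = u^{*}_j + e_{ij}, \;\; u^{*}_j \sim \mathrm{N}(\beta_0, \sigma^2_u)
```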
34. Model Identifiability
- The final part of the ESRC grant is to look at whether a model is identifiable/estimable given a particular set of data.
- Cross-classified datasets where there are few level 1 units per higher level unit can result in each observation being factored into several random effects, with very few observations being used to estimate each random effect.
- We are interested in establishing whether we can really estimate all parameters in such models.
- An example where we can't would be a dataset of patients who are attended by doctors in wards. If there is only one doctor per ward and likewise one ward per doctor, then we cannot tease out doctor and ward effects. Again this work was motivated by a bird nesting dataset.
35. Wellcome Fellowship
- Martin Green has been successful in obtaining 4 years of funding from Wellcome to come and work with me.
- The project is entitled 'Use of Bayesian statistical methods to investigate farm management strategies, cow traits and decision-making in the prevention of clinical and sub-clinical mastitis in dairy cows'.
- Martin is a qualified veterinarian and a specialist farm animal surgeon.
- He completed a PhD in 2004 at the University of Warwick in veterinary epidemiology and used MCMC to fit binary response multilevel models in both MLwiN and WinBUGS to look at various factors that affect clinical mastitis in dairy cows.
36. Wellcome Fellowship (continued)
- Over the 4 years we are aiming to analyse a huge dataset that Martin has been collecting under a Milk Development Council grant.
- In particular we will look at how farm management practices, environmental conditions and cow characteristics influence the risk of mastitis during the dry period.
- We hope to get interesting applied results and also some novel statistical methodology from the grant; MCMC will again play a big part.
- Martin has been appointed as a professor in the new vet school and will be working there 1 day a fortnight during the grant before moving there full time after the four years.
37. Conclusions
- In this talk we have attempted to show the flexibility of MCMC methods in matching the complexity of data structures found in real problems.
- We have also contrasted the methods with the IGLS algorithm.
- Although we have used MLwiN, all the models considered here could also be fitted in WinBUGS.
- WinBUGS offers even more flexibility in model choice for complex data structures; however, it is typically slower at fitting models that can also be fitted in MLwiN.
- Genstat and the GLLAMM package in Stata also deserve mention as alternatives in the ML/REML world for the models considered.
- Slides available at http://www.maths.nottingham.ac.uk/personal/pmzwjb/billtalk.html
38. References
- Browne, W.J. (2003). MCMC Estimation in MLwiN. London: Institute of Education, University of London.
- Browne, W.J. (2004). An illustration of the use of reparameterisation methods for improving MCMC efficiency in crossed random effect models. Multilevel Modelling Newsletter 16 (1): 13-25.
- Browne, W.J. and Draper, D. (2004). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. University of Nottingham Research Report 04-01.
- Browne, W.J., Goldstein, H. and Rasbash, J. (2001). Multiple membership multiple classification (MMMC) models. Statistical Modelling 1: 103-124.
39. References (continued)
- Goldstein, H. and Browne, W.J. (2002). Multilevel factor analysis modelling using Markov Chain Monte Carlo (MCMC) estimation. In Marcoulides and Moustaki (Eds.), Latent Variable and Latent Structure Models, pp. 225-243. Lawrence Erlbaum, New Jersey.
- Goldstein, H. and Browne, W.J. (2005). Multilevel factor analysis models for continuous and discrete data. In Maydeu-Olivares, A. and McArdle, J.J. (Eds.), Contemporary Psychometrics: A Festschrift for Roderick P. McDonald, pp. 453-475. Lawrence Erlbaum, New Jersey.
- Goldstein, H. and Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical Society, Series A, 159: 505-513.
- Rodriguez, G. and Goldman, N. (1995). An assessment of estimation procedures for multilevel models with binary responses. Journal of the Royal Statistical Society, Series A, 158: 73-89.
- Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: J. Wiley and Sons.