Other MCMC features in MLwiN and the MLwiN->WinBUGS interface presentation

About This Presentation

Transcript and Presenter's Notes

Title: Other MCMC features in MLwiN and the MLwiN->WinBUGS interface

1
Lecture 8

Other MCMC features in MLwiN and the
MLwiN-gtWinBUGS interface

2
Lecture Contents

Convergence diagnostics.
Starting values.
MLwiN demonstration.
MLwiN to WinBUGS interface.
t distributed residuals example.

3
Recap of practical session

The practical covered two features of MCMC
sampling that we will cover in more detail here
Convergence diagnostics and summary statistics.
Choosing starting values for unknown parameters.

4
Reunion Island Model revisited

We firstly fitted the following model to the
dataset

5
Trajectories window

By default we have the last 500 iterations
Clearly some chains mix better than others!

6
MCMC diagnostics for ?0

We will describe each pane separately

7
Trace plot

This graph plots the generated values of the
parameter against the iteration number.
A crude test of mixing is the blue finger test.
This chain doesnt mix that well but could be
worse!

8
Kernel Density plot

This plot is like a smoothed histogram.
Instead of counting the estimates into bins of
particular widths like a histogram, the effect of
each iteration is spread around the estimate via
a Kernel function e.g. a normal distribution.
This means that at each point we get the sum of
the Kernel function parts for each iteration.
The Kernel density plot has a smoothness
parameter that can be modified.

9
Time series diagnostics
Here we have the Auto correlation function (ACF)
and partial autocorrelation function (PACF)
plots. The ACF measures how correlated the values
in the chain are with their close neighbours. The
lag is the distance between the two chains to be
compared. An independent chain will have
approximately zero autocorrelation at each lag. A
Markov chain should have a power relationship in
the lags i.e. if ACF(1) ? then ACF(2) ?2 etc.
This is known as an AR(1) process. The PACF
measures discrepancies from such a process and so
should normally have values 0 after lag 1.
10
Accuracy Diagnostics

MLwiN has 2 accuracy diagnostics
Raftery-Lewis works on quantiles of distribution.
Brooks-Draper works on quoting the mean to n
significant figures. Its formulae uses the
estimate, its s.d. and the ACF and it can often
give very small or very large values!

11
Raftery Lewis Diagnostic

This diagnostic is based on the properties of a 2
state Markov chain.
For a given quantile the chain is dichotomized
into values that are greater (1) or less than (0)
the value of the quantile.
For an independent chain the 0s and 1s should
appear randomly. If there is large clustering (in
time) of 0s or 1s then the chain is mixing
badly.

12
Summary Statistics
Three estimates of location are given Mean
from the chain. Mode from the kernel
plot. Median (50 quantile) by sorting the
chain and finding the middle value. The SD is
calculated from the chain and the other quantiles
are used to give (possibly) non-symmetric
interval estimates. The MCSE and ESS will be
discussed next.
13
Monte Carlo Standard Error
The Monte Carlo Standard Error (MCSE) is an
indication of how much error is in the estimate
due to the fact that MCMC is used. As the number
of iterations increases the MCSE?0. For an
independent sampler it equals the SD/?n. However
it is adjusted due to the autocorrelation in the
chain. The graph above gives estimates for the
MCSE for longer runs.
14
Effective Sample Size

This quantity gives an estimate of the equivalent
number of independent iterations that the chain
represents.
This is related to the ACF and the MCSE.
Its formula is
For this parameter our 5,000 actual iterations
are equivalent to only 344 independent iterations!

15
MCMC diagnostics for ?2u
This is the worst mixing parameter with a
suggestion of a run of over 100k iterations being
required.
16
MCMC diagnostics for ?2uAfter 100,000 iterations

Note that now the diagnostics are all satisfied.

17
Starting Values in MLwiN

By default MLwiN uses the values currently stored
for each parameter.
As we usually run MCMC directly after IGLS this
means we use IGLS estimates as starting values.
These values are a good place to start and hence
burnin will often be short.
This is OK for most multilevel models as we know
they should be unimodal.
However it is better to try several starting
values to ensure we have convergence.
WinBUGS allows multiple chains to be run which
allows multiple chain MCMC diagnostics.

18
Diagnostics in WinBUGS

WinBUGS uses a multiple chain diagnostic by
Brooks, Gelman and Rubin.
This is based on the ratio of betweenwithin
chain variance (ANOVA).
WinBUGS produces plots of
Average 80 interval within-chains (blue) and
pooled 80 interval between chains (green) which
should converge to stable values.
Ratio pooledaverage interval widths (red)
should converge to 1.0.
To print values of the diagnostic double click on
the plot and press CTRL left hand mouse button.

19
MCSE in WinBUGS

Accuracy of the posterior estimates can be
assessed by the Monte Carlo standard error for
each parameter.
If samples were generated independently could
estimate SE of mean as
is the sample variance.
This will underestimate the true MCSE due to
autocorrelation in the chain.
WinBUGS uses a batch means method replace
sample variance S2 by variance of batched means
that are assumed independent.
MLwiN replaces actual sample size with effective
sample size.

20
BUGS history

Bayesian inference Using Gibbs Sampling
Language for specifying complex Bayesian models.
Constructs object-oriented internal
representation of the model.
Builds up an arbitrary complex model through
specification of local structure.
Simulation from full conditionals using Gibbs
sampling
Current version (WinBUGS 1.4) runs in Windows,
and incorporates a script language for running in
batch mode.
Classic BUGS available for UNIX but this is an
old version.
WinBUGS is freely available from
http//www.mrc-bsu.cam.ac.uk

21
WinBUGS common distributions
Expression Distribution Usage
dbin binomial r dbin(p,n)
dnorm normal x dnorm(mu,tau)
dpois Poisson r dpois(lambda)
dunif uniform x dunif(a,b)
dgamma gamma x dgamma(a,b)
NB As noted earlier the Normal is parameterised
in terms of its mean and precision 1/variance
1/sd2. See Model Specification/The BUGS
language stochastic nodes/ Distributions in
manual for full syntax. Note Functions cannot be
used as argument in distributions!! You need to
create additional nodes
22
Some Built in WinBUGS functions

p lt- step(x-.5) 1 if x 0.5, 0 otherwise
hence can be used to monitor when x 0.5
p lt- equals(x,.5) 1 if x0.5, 0 otherwise.
tau lt- 1/pow(s,2) sets
s lt- 1/sqrt(tau) sets
pi,k lt- inprod(pi, Lambdai,,k) sets
See Model Specification/Logical nodes in manual
for full syntax.

23
Graphical Models

Model building
Statistical modelling of complex systems involve
usually many interconnected random variables.
How to we build the connections?
Key idea conditional independence
It is helpful to represent the modelling process
by a graph
Nodes all random quantities
Links (directed or undirected) association
between the nodes
Directed edges natural ordering of association,
causal influence
Undirected edges symmetric association,
correlation
The graph is used to represent a set of
conditional independence statements

24
Graphical Model Doodle in WinBUGS

The following is a doodle for the rats example in
WinBUGS

25
The MLwiN to WinBUGS interface

MLwiN has an interface that will generate WinBUGS
code for an MLwiN model.

The interface will save the model in two possible
versions of WinBUGS and will input back the
WinBUGS output in CODA format and use MLwiN
diagnostics.
26
Model Description

model
Level 1 definition
for(i in 1N)
lncfsi dnorm(mui,tau)
muilt- beta1 consi
u2cowi consi
u3herdi consi
Higher level definitions
for (j in 1n2)
u2j dnorm(0,tau.u2)
for (j in 1n3)
u3j dnorm(0,tau.u3)
Priors for fixed effects
beta1 dflat()
Priors for random terms

WinBUGS describes the model in a mixture of
deterministic and stochastic statements. The
interface takes the names from the MLwiN columns
and uses some other standard names e.g. beta,
sigma2. Note this means there is some redundancy
e.g. cons is included. Caution Care must be
taken with some column names.
27
Data description

Note that the data is stored in the file after
the word list and between parentheses.
For example
list(N 3027, n2 1575, n3 50,
cow c(1,1,2,2,3,4,5,5,5,5,6,6,6,7,7,7,8,8,8,9,
9,9,9,10,10,10,11,11,12,12,13,14,14,14,14,15,15,15
,16,16,
16,17,17,17,17,18,18,18,18,19,20,20,20,21,21,21,22
,22,22,22,
23,23,23,23,24,24,25,26,27,27,28,29,29,29,30,30,31
,31,32,33,
33,33,33,34,34,34,35,35,35,36,36,37,38,39,39,40,40
,40,41,41,
41,41,42,42,42,43,43,43,44,44,44,44,45,45,45,46,46
,47,47,47,
.),.
Note some notation e.g. c for vectors borrowed
from R and S-plus.

28
Initial values in WinBUGS

Note that the initial values are also stored in
the file after the word list and between
parentheses.
For example
list(beta c(4.217674),
u2 c( 0.062732,-0.020885,0.054452,0.125001,-0.06
3034,-0.028254,-0.013445,-0.137141,-0.056420,-0.01
7268,
0.025402,-0.013377,-0.013452,-0.018461,-0.103801,-
0.095923,0.031627,-0.019062,-0.026862,-0.037929,
0.152077,0.008395,-0.015221,0.047518,-0.029718,0.0
41764,-0.040790,-0.055156,0.048479,-0.036886,
The MLwiN-gtWinBUGS interface has given the IGLS
starting values to WinBUGS.

29
Running the reunion island dataset in WinBUGS

The model was run for 3 parallel chains using
MLwiN starting values and 2 other sets.
We run the 3 chains for 5,000 after a 500 burnin
Even the worst mixing parameter shows overlap in
the three chains (see below)

30
Running the reunion island dataset in WinBUGS

Brooks,Gelman Rubin Diagnostic
Summary Statistics for the run
node mean sd (2.5, 97.5)
beta1 4.217 0.0192 (4.18,4.255)
sigma2 0.135 0.0048 (0.126,0.145)
sigma2.u2 0.0193 0.0041 (0.0117,0.0274)
sigma2.u3 0.0152 0.0039 (0.0092,0.0244)

31
Testing the sensitivity to the Normal assumption

When we look at plots of residuals in a
multilevel model we often see outlying residuals.
These may be genuine outlying higher level units
that do not follow the same distribution or it
may be that the Normal distribution is not
appropriate.
The Normal is a member of the t family of
distributions and other members of the family
have longer tails and so may be more appropriate.
We will investigate this in the practical.

32
Information on the practical

In the practical we will look at how to fit the
variance components model for the education
dataset using WinBUGS.
We will also examine sensitivity to the Normal
assumption by considering instead a t
distribution with unknown degrees of freedom and
with fixed degrees of freedom (8).

Write a Comment

User Comments (0)

About PowerShow.com

Other MCMC features in MLwiN and the MLwiN->WinBUGS interface PowerPoint PPT Presentation