Title: Other MCMC features in MLwiN and the MLwiN->WinBUGS interface
1Lecture 8
- Other MCMC features in MLwiN and the
MLwiN-gtWinBUGS interface
2Lecture Contents
- Convergence diagnostics.
- Starting values.
- MLwiN demonstration.
- MLwiN to WinBUGS interface.
- t distributed residuals example.
3Recap of practical session
- The practical covered two features of MCMC
sampling that we will cover in more detail here - Convergence diagnostics and summary statistics.
- Choosing starting values for unknown parameters.
4Reunion Island Model revisited
- We firstly fitted the following model to the
dataset
5Trajectories window
- By default we have the last 500 iterations
- Clearly some chains mix better than others!
6MCMC diagnostics for ?0
- We will describe each pane separately
7Trace plot
- This graph plots the generated values of the
parameter against the iteration number. - A crude test of mixing is the blue finger test.
- This chain doesnt mix that well but could be
worse!
8Kernel Density plot
- This plot is like a smoothed histogram.
- Instead of counting the estimates into bins of
particular widths like a histogram, the effect of
each iteration is spread around the estimate via
a Kernel function e.g. a normal distribution. - This means that at each point we get the sum of
the Kernel function parts for each iteration. - The Kernel density plot has a smoothness
parameter that can be modified.
9Time series diagnostics
Here we have the Auto correlation function (ACF)
and partial autocorrelation function (PACF)
plots. The ACF measures how correlated the values
in the chain are with their close neighbours. The
lag is the distance between the two chains to be
compared. An independent chain will have
approximately zero autocorrelation at each lag. A
Markov chain should have a power relationship in
the lags i.e. if ACF(1) ? then ACF(2) ?2 etc.
This is known as an AR(1) process. The PACF
measures discrepancies from such a process and so
should normally have values 0 after lag 1.
10Accuracy Diagnostics
- MLwiN has 2 accuracy diagnostics
- Raftery-Lewis works on quantiles of distribution.
- Brooks-Draper works on quoting the mean to n
significant figures. Its formulae uses the
estimate, its s.d. and the ACF and it can often
give very small or very large values!
11Raftery Lewis Diagnostic
- This diagnostic is based on the properties of a 2
state Markov chain. - For a given quantile the chain is dichotomized
into values that are greater (1) or less than (0)
the value of the quantile. - For an independent chain the 0s and 1s should
appear randomly. If there is large clustering (in
time) of 0s or 1s then the chain is mixing
badly.
12Summary Statistics
Three estimates of location are given Mean
from the chain. Mode from the kernel
plot. Median (50 quantile) by sorting the
chain and finding the middle value. The SD is
calculated from the chain and the other quantiles
are used to give (possibly) non-symmetric
interval estimates. The MCSE and ESS will be
discussed next.
13Monte Carlo Standard Error
The Monte Carlo Standard Error (MCSE) is an
indication of how much error is in the estimate
due to the fact that MCMC is used. As the number
of iterations increases the MCSE?0. For an
independent sampler it equals the SD/?n. However
it is adjusted due to the autocorrelation in the
chain. The graph above gives estimates for the
MCSE for longer runs.
14Effective Sample Size
- This quantity gives an estimate of the equivalent
number of independent iterations that the chain
represents. - This is related to the ACF and the MCSE.
- Its formula is
- For this parameter our 5,000 actual iterations
are equivalent to only 344 independent iterations!
15MCMC diagnostics for ?2u
This is the worst mixing parameter with a
suggestion of a run of over 100k iterations being
required.
16MCMC diagnostics for ?2uAfter 100,000 iterations
- Note that now the diagnostics are all satisfied.
17Starting Values in MLwiN
- By default MLwiN uses the values currently stored
for each parameter. - As we usually run MCMC directly after IGLS this
means we use IGLS estimates as starting values. - These values are a good place to start and hence
burnin will often be short. - This is OK for most multilevel models as we know
they should be unimodal. - However it is better to try several starting
values to ensure we have convergence. - WinBUGS allows multiple chains to be run which
allows multiple chain MCMC diagnostics.
18Diagnostics in WinBUGS
- WinBUGS uses a multiple chain diagnostic by
Brooks, Gelman and Rubin. - This is based on the ratio of betweenwithin
chain variance (ANOVA). - WinBUGS produces plots of
- Average 80 interval within-chains (blue) and
pooled 80 interval between chains (green) which
should converge to stable values. - Ratio pooledaverage interval widths (red)
should converge to 1.0. - To print values of the diagnostic double click on
the plot and press CTRL left hand mouse button.
19MCSE in WinBUGS
- Accuracy of the posterior estimates can be
assessed by the Monte Carlo standard error for
each parameter. - If samples were generated independently could
estimate SE of mean as - is the sample variance.
- This will underestimate the true MCSE due to
autocorrelation in the chain. - WinBUGS uses a batch means method replace
sample variance S2 by variance of batched means
that are assumed independent. - MLwiN replaces actual sample size with effective
sample size.
20BUGS history
- Bayesian inference Using Gibbs Sampling
- Language for specifying complex Bayesian models.
- Constructs object-oriented internal
representation of the model. - Builds up an arbitrary complex model through
specification of local structure. - Simulation from full conditionals using Gibbs
sampling - Current version (WinBUGS 1.4) runs in Windows,
and incorporates a script language for running in
batch mode. - Classic BUGS available for UNIX but this is an
old version. - WinBUGS is freely available from
http//www.mrc-bsu.cam.ac.uk
21WinBUGS common distributions
Expression Distribution Usage
dbin binomial r dbin(p,n)
dnorm normal x dnorm(mu,tau)
dpois Poisson r dpois(lambda)
dunif uniform x dunif(a,b)
dgamma gamma x dgamma(a,b)
NB As noted earlier the Normal is parameterised
in terms of its mean and precision 1/variance
1/sd2. See Model Specification/The BUGS
language stochastic nodes/ Distributions in
manual for full syntax. Note Functions cannot be
used as argument in distributions!! You need to
create additional nodes
22Some Built in WinBUGS functions
- p lt- step(x-.5) 1 if x 0.5, 0 otherwise
hence can be used to monitor when x 0.5 - p lt- equals(x,.5) 1 if x0.5, 0 otherwise.
- tau lt- 1/pow(s,2) sets
- s lt- 1/sqrt(tau) sets
- pi,k lt- inprod(pi, Lambdai,,k) sets
- See Model Specification/Logical nodes in manual
for full syntax.
23Graphical Models
- Model building
- Statistical modelling of complex systems involve
usually many interconnected random variables. - How to we build the connections?
- Key idea conditional independence
- It is helpful to represent the modelling process
by a graph - Nodes all random quantities
- Links (directed or undirected) association
between the nodes - Directed edges natural ordering of association,
causal influence - Undirected edges symmetric association,
correlation - The graph is used to represent a set of
conditional independence statements
24Graphical Model Doodle in WinBUGS
- The following is a doodle for the rats example in
WinBUGS
25The MLwiN to WinBUGS interface
- MLwiN has an interface that will generate WinBUGS
code for an MLwiN model.
The interface will save the model in two possible
versions of WinBUGS and will input back the
WinBUGS output in CODA format and use MLwiN
diagnostics.
26Model Description
- model
-
- Level 1 definition
- for(i in 1N)
- lncfsi dnorm(mui,tau)
- muilt- beta1 consi
- u2cowi consi
- u3herdi consi
-
- Higher level definitions
- for (j in 1n2)
- u2j dnorm(0,tau.u2)
-
- for (j in 1n3)
- u3j dnorm(0,tau.u3)
-
- Priors for fixed effects
- beta1 dflat()
- Priors for random terms
WinBUGS describes the model in a mixture of
deterministic and stochastic statements. The
interface takes the names from the MLwiN columns
and uses some other standard names e.g. beta,
sigma2. Note this means there is some redundancy
e.g. cons is included. Caution Care must be
taken with some column names.
27Data description
- Note that the data is stored in the file after
the word list and between parentheses. - For example
- list(N 3027, n2 1575, n3 50,
- cow c(1,1,2,2,3,4,5,5,5,5,6,6,6,7,7,7,8,8,8,9,
- 9,9,9,10,10,10,11,11,12,12,13,14,14,14,14,15,15,15
,16,16, - 16,17,17,17,17,18,18,18,18,19,20,20,20,21,21,21,22
,22,22,22, - 23,23,23,23,24,24,25,26,27,27,28,29,29,29,30,30,31
,31,32,33, - 33,33,33,34,34,34,35,35,35,36,36,37,38,39,39,40,40
,40,41,41, - 41,41,42,42,42,43,43,43,44,44,44,44,45,45,45,46,46
,47,47,47, - .),.
- Note some notation e.g. c for vectors borrowed
from R and S-plus.
28Initial values in WinBUGS
- Note that the initial values are also stored in
the file after the word list and between
parentheses. - For example
- list(beta c(4.217674),
- u2 c( 0.062732,-0.020885,0.054452,0.125001,-0.06
3034,-0.028254,-0.013445,-0.137141,-0.056420,-0.01
7268, - 0.025402,-0.013377,-0.013452,-0.018461,-0.103801,-
0.095923,0.031627,-0.019062,-0.026862,-0.037929, - 0.152077,0.008395,-0.015221,0.047518,-0.029718,0.0
41764,-0.040790,-0.055156,0.048479,-0.036886, - The MLwiN-gtWinBUGS interface has given the IGLS
starting values to WinBUGS.
29Running the reunion island dataset in WinBUGS
- The model was run for 3 parallel chains using
MLwiN starting values and 2 other sets. - We run the 3 chains for 5,000 after a 500 burnin
- Even the worst mixing parameter shows overlap in
the three chains (see below)
30Running the reunion island dataset in WinBUGS
- Brooks,Gelman Rubin Diagnostic
- Summary Statistics for the run
- node mean sd (2.5, 97.5)
- beta1 4.217 0.0192 (4.18,4.255)
- sigma2 0.135 0.0048 (0.126,0.145)
- sigma2.u2 0.0193 0.0041 (0.0117,0.0274)
- sigma2.u3 0.0152 0.0039 (0.0092,0.0244)
31Testing the sensitivity to the Normal assumption
- When we look at plots of residuals in a
multilevel model we often see outlying residuals. - These may be genuine outlying higher level units
that do not follow the same distribution or it
may be that the Normal distribution is not
appropriate. - The Normal is a member of the t family of
distributions and other members of the family
have longer tails and so may be more appropriate. - We will investigate this in the practical.
32Information on the practical
- In the practical we will look at how to fit the
variance components model for the education
dataset using WinBUGS. - We will also examine sensitivity to the Normal
assumption by considering instead a t
distribution with unknown degrees of freedom and
with fixed degrees of freedom (8).