Title: Introduction to WinBUGS
1. Introduction to WinBUGS
- Olivier Gimenez
- University of St Andrews, Scotland
2. A brief history
- 1989: project began with a Unix version called BUGS
- 1998: first Windows version, WinBUGS, was born
- Initially developed by the MRC Biostatistics Unit in Cambridge, and now joint work with Imperial College School of Medicine at St Mary's, London
- WinBUGS: Windows Bayesian inference Using Gibbs Sampling
- Software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods
3. Who?
- Nicky Best, Imperial College Faculty of Medicine, London (UK)
- Andrew Thomas, University of Helsinki, Helsinki (Finland)
- Wally Gilks, MRC Biostatistics Unit, Institute of Public Health, Cambridge (UK)
- David Spiegelhalter, MRC Biostatistics Unit, Institute of Public Health, Cambridge (UK)
Freely downloadable from http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml
4. Key principle
- You specify the prior and build up the likelihood
- WinBUGS computes the posterior by running a Gibbs sampling algorithm, based on
  π(θ | D) ∝ L(D | θ) × π(θ)
- WinBUGS computes some convergence diagnostics that you have to check
5. A biological example throughout: White stork (Ciconia ciconia) in Alsace, 1948-1970
- Demographic components (fecundity, breeding success, survival, etc.)
- Climate (temperature, rainfall, etc.)
6. WinBUGS: Linear Regression
Data (n = 23 years):
T, R pairs (T = May temperature in °C, R = May rainfall in mm):
  15.1 67   13.3 52   15.3 88   13.3 61   14.6 32
  15.6 36   13.1 72   13.1 43   15.0 92   11.7 32
  15.3 86   14.4 28   14.4 57   12.7 55   11.7 66
  11.9 26   15.9 28   13.4 96   14.0 48   13.9 90
  12.9 86   15.1 78   13.0 87
Y (number of chicks per pair):
  2.55 1.85 2.05 2.88 3.13 2.21 2.43 2.69 2.55 2.84
  2.47 2.69 2.52 2.31 2.07 2.35 2.98 1.98 2.53 2.21
  2.62 1.78 2.30
7. WinBUGS: Linear Regression
1. Do temperature and rainfall affect the number of chicks?
2. Regression model:
   Yi = α + βr Ri + βt Ti + εi,  i = 1,...,23,  εi i.i.d. N(0, σ²)
   or, equivalently,
   Yi ~ N(μi, σ²) independently, i = 1,...,23, with μi = α + βr Ri + βt Ti
3. Estimation of the parameters α, βr, βt, σ
4. Frequentist inference uses t-tests
8. Linear Regression using the Frequentist approach
Data: T (May temperature, °C), R (May rainfall, mm), Y (number of chicks per pair), as above.
Fitted model: Y = 2.451 + 0.031 T - 0.007 R

              Estimate   Std. Error   t value   Pr(>|t|)
temperature   0.031069    0.054690     0.568    0.57629
rainfall     -0.007316    0.002897    -2.525    0.02011
9. Linear Regression using the Frequentist approach

              Estimate   Std. Error   t value   Pr(>|t|)
temperature   0.031069    0.054690     0.568    0.57629
rainfall     -0.007316    0.002897    -2.525    0.02011

⇒ Influence of rainfall only
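A sketch of how this frequentist fit could be obtained in R with lm(); the object names (chicks, temperature, rainfall) are illustrative, not from the original slides, and the vectors simply reproduce the data slide.

# Data from the data slide: Y = chicks per pair, May temperature (°C), May rainfall (mm)
chicks      <- c(2.55, 1.85, 2.05, 2.88, 3.13, 2.21, 2.43, 2.69, 2.55, 2.84,
                 2.47, 2.69, 2.52, 2.31, 2.07, 2.35, 2.98, 1.98, 2.53, 2.21,
                 2.62, 1.78, 2.30)
temperature <- c(15.1, 13.3, 15.3, 13.3, 14.6, 15.6, 13.1, 13.1, 15.0, 11.7,
                 15.3, 14.4, 14.4, 12.7, 11.7, 11.9, 15.9, 13.4, 14.0, 13.9,
                 12.9, 15.1, 13.0)
rainfall    <- c(67, 52, 88, 61, 32, 36, 72, 43, 92, 32, 86, 28, 57, 55, 66,
                 26, 28, 96, 48, 90, 86, 78, 87)
# Ordinary least-squares fit: estimates, standard errors and t-tests as in the table above
summary(lm(chicks ~ temperature + rainfall))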
10. Running WinBUGS: What do you need?
11. Running WinBUGS: The model
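A minimal sketch of the kind of model text that goes in this window, written here as an R string for compactness (the text between the quotes is what would be pasted into WinBUGS). The node names match those monitored later (intercept, slope_temperature, slope_rainfall), but the priors and the sigma parameterisation are assumptions, not the original code.

model_string <- "
model {
  for (i in 1:N) {
    # likelihood: normal regression of chicks on May temperature and rainfall
    Y[i] ~ dnorm(mu[i], tau)
    mu[i] <- intercept + slope_temperature * T[i] + slope_rainfall * R[i]
  }
  # vague priors on the regression parameters
  intercept         ~ dnorm(0, 0.001)
  slope_temperature ~ dnorm(0, 0.001)
  slope_rainfall    ~ dnorm(0, 0.001)
  # prior on the residual standard deviation; tau is the precision used by dnorm
  sigma ~ dunif(0, 10)
  tau <- 1 / (sigma * sigma)
}"
writeLines(model_string, "linreg.bug")  # file name is illustrative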
12. Running WinBUGS: Data and initial values
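A sketch of what the data and initial-values lists could look like; WinBUGS reads the same list() syntax that R uses, except that in the GUI the numbers are written out literally. The names stork.data and stork.inits are illustrative, the vectors reuse those defined in the lm() sketch above, and the initial values are arbitrary guesses for the unknowns of the sketch model.

# Data in the list() format that WinBUGS loads (also valid R)
stork.data <- list(N = 23, Y = chicks, T = temperature, R = rainfall)
# One arbitrary set of initial values for the stochastic nodes of the sketch model
stork.inits <- list(intercept = 0, slope_temperature = 0, slope_rainfall = 0, sigma = 1)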
13. Running WinBUGS: Overall
14. Running WinBUGS: At last!!
1. Check model
2. Load data
3. Compile model
4. Load initial values
5. Generate burn-in values
6. Select parameters to be monitored
7. Perform the sampling to generate posteriors
8. Check convergence and display results
15. Running WinBUGS: 1. Check model
16. Running WinBUGS: 1. Check model. Highlight 'model'
17. Running WinBUGS: 1. Check model. Open the Model Specification Tool
18. Running WinBUGS: 1. Check model. Now click 'check model'
19. Running WinBUGS: 1. Check model. Watch out for the confirmation at the foot of the screen
20. Running WinBUGS: 2. Load data. Now highlight the 'list' in the data window
21. Running WinBUGS: 2. Load data. Then click 'load data'
22. Running WinBUGS: 2. Load data. Watch out for the confirmation at the foot of the screen
23. Running WinBUGS: 3. Compile model. Next, click 'compile'
24. Running WinBUGS: 3. Compile model. Watch out for the confirmation at the foot of the screen
25. Running WinBUGS: 4. Load initial values. Highlight the 'list' in the data window
26. Running WinBUGS: 4. Load initial values. Click 'load inits'
27. Running WinBUGS: 4. Load initial values. Watch out for the confirmation at the foot of the screen
28. Running WinBUGS: 5. Generate burn-in values. Open the Model Update Tool
29. Running WinBUGS: 5. Generate burn-in values. Give the number of burn-in iterations (1000)
30. Running WinBUGS: 5. Generate burn-in values. Click 'update' to do the sampling
31. Running WinBUGS: 6. Monitor parameters. Open the Inference Samples tool
32. Running WinBUGS: 6. Monitor parameters. Enter 'intercept' in the node box and click 'set'
33. Running WinBUGS: 6. Monitor parameters. Enter 'slope_temperature' in the node box and click 'set'
34. Running WinBUGS: 6. Monitor parameters. Enter 'slope_rainfall' in the node box and click 'set'
35. Running WinBUGS: 7. Generate posterior values. Enter the number of samples you want to take (10000)
36. Running WinBUGS: 7. Generate posterior values. Click 'update' to do the sampling
37. Running WinBUGS: 8. Summarize posteriors. Enter '*' in the node box and click 'stats'
38. Running WinBUGS: 8. Summarize posteriors. Mean, median and credible intervals
39. Running WinBUGS: 8. Summarize posteriors. The 95% credible intervals tell us the same story as the frequentist t-tests:

              Estimate   Std. Error   t value   Pr(>|t|)
temperature   0.031069    0.054690     0.568    0.57629
rainfall     -0.007316    0.002897    -2.525    0.02011
40. Running WinBUGS: 8. Summarize posteriors. 95% credible intervals tell us the same story (same table as the previous slide)
41. Running WinBUGS: 8. Summarize posteriors. Click 'history'
42. Running WinBUGS: 8. Summarize posteriors. Click 'auto cor'
⇒ Problem of autocorrelation
43. Coping with autocorrelation: use standardized covariates
44. Coping with autocorrelation: use standardized covariates
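A sketch (assumed, not the original screenshot) of how the regression line changes when the covariates are centred and scaled inside the BUGS code, mirroring what the logistic model later in the deck does with mean() and sd(); priors and the sigma parameterisation are the same assumptions as in the earlier sketch.

model_string_std <- "
model {
  for (i in 1:N) {
    Y[i] ~ dnorm(mu[i], tau)
    # covariates standardized inside the model, as in the logistic example later
    mu[i] <- intercept
             + slope_temperature * (T[i] - mean(T[])) / sd(T[])
             + slope_rainfall    * (R[i] - mean(R[])) / sd(R[])
  }
  intercept         ~ dnorm(0, 0.001)
  slope_temperature ~ dnorm(0, 0.001)
  slope_rainfall    ~ dnorm(0, 0.001)
  sigma ~ dunif(0, 10)
  tau <- 1 / (sigma * sigma)
}"
writeLines(model_string_std, "linreg_std.bug")  # file name is illustrative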
45. Re-running WinBUGS: 1, 2, ..., 7, and 8. Summarize posteriors. Click 'auto cor'
⇒ autocorrelation OK
46. Re-running WinBUGS: 1, 2, ..., 7, and 8. Summarize posteriors. Click 'density'
47. Re-running WinBUGS: 1, 2, ..., 7, and 8. Summarize posteriors. Click 'quantiles'
48. Running WinBUGS: 8. Checking for convergence using the Brooks-Gelman-Rubin criterion
- A way to identify non-convergence is to simulate multiple sequences from over-dispersed starting points.
- Intuitively, the behaviour of all of the chains should be basically the same.
- In other words, the variance within the chains should be the same as the variance across the chains.
- In WinBUGS, stipulate the number of chains after 'load data' and before 'compile' (obviously, as many sets of initial values as chains have to be loaded, or generated).
49. Running WinBUGS: 8. Checking for convergence using the Brooks-Gelman-Rubin criterion
- The normalized width of the central 80% interval of the pooled runs is shown in green.
- The normalized average width of the 80% intervals within the individual runs is shown in blue.
50. Re-running WinBUGS: 1, 2, ..., 7, and 8. Summarize posteriors. Others...
- Click 'coda' to produce lists of data suitable for external processing, e.g. via the coda R package.
- Click 'trace' to produce a dynamic history plot that changes in real time.
51. Another example: logistic regression
T, R pairs (T = May temperature in °C, R = May rainfall in mm), same 23 years as before:
  15.1 67   13.3 52   15.3 88   13.3 61   14.6 32
  15.6 36   13.1 72   13.1 43   15.0 92   11.7 32
  15.3 86   14.4 28   14.4 57   12.7 55   11.7 66
  11.9 26   15.9 28   13.4 96   14.0 48   13.9 90
  12.9 86   15.1 78   13.0 87
Y: proportion of nests with success (> 0 young), given as successful nests / pairs:
  151/173  105/164   73/103  107/113  113/122  87/112   77/98   108/121
  118/132  122/136  112/133  120/137  122/145  89/117   69/90    71/80
   53/67    41/54    53/58    31/39    35/42   14/23    18/23
52. Performing a logistic regression with WinBUGS
53. Performing a logistic regression with WinBUGS
54. Performing a logistic regression with WinBUGS: noninformative priors
55. Performing a logistic regression with WinBUGS: data, initial values
56. Performing a logistic regression with WinBUGS: the results
[table of posterior summaries with lower and upper credible limits]
- Influence of rainfall, but not temperature (see credible intervals)
57. Performing a logistic regression with WinBUGS: the results
- Additional parameters come as a by-product of the MCMC samples: just add them in the model as parameters to be monitored
  - geometric mean:
    geom <- pow(prod(p[]), 1/N)
  - odds-ratios:
    odds.rainfall <- exp(slope.rainfall)
    odds.temperature <- exp(slope.temperature)
58. Performing a logistic regression with WinBUGS: the results
- Additional parameters as a by-product of the MCMC samples:
  - geometric mean probability of success: around 82% (81%-84%)
  - odds-ratio: about -16% for an increase of rainfall of one unit
59. Running WinBUGS from R: the R2WinBUGS package. The logistic regression example revisited
- It can be awkward to read complex sets of data and initial values into the GUI.
- It is also tedious to specify the parameters to be monitored in each run.
- It may be useful to save the output and read it into R for further analyses.
- Solution 1: WinBUGS can be used in batch mode using scripts.
- Solution 2: R2WinBUGS allows WinBUGS to be run from R.
- We can work with the results after importing them back into R again
  ⇒ create graphical displays of data and posterior simulations, or use WinBUGS in Monte Carlo simulation studies.
60. Running WinBUGS from R: the R2WinBUGS package. The logistic regression example revisited
To call WinBUGS from R:
1. Write a WinBUGS model in an ASCII file.
2. Go into R.
3. Prepare the inputs to the 'bugs' function and run it.
4. A WinBUGS window will pop up and R will freeze up. The model will now run in WinBUGS. You will see things happening in the Log window within WinBUGS. When WinBUGS is done, its window will close and R will work again.
If an error message appears, re-run with 'debug = TRUE'.
61. Running WinBUGS from R: the R2WinBUGS package. 1 - Write the WinBUGS code in an ASCII file

# This covers a logistic regression model using two explanatory variables.
# The White storks in Baden-Wurttemberg (Germany) data set is provided.
model {
  for (i in 1:N) {
    # A binomial distribution as a likelihood
    nbsuccess[i] ~ dbin(p[i], nbpairs[i])
    # The probability of success is a function of both rainfall and temperature
    logit(p[i]) <- intercept
      + slope.temperature * (temperature[i] - mean(temperature[])) / sd(temperature[])
      + slope.rainfall * (rainfall[i] - mean(rainfall[])) / sd(rainfall[])
  }
  # priors for regression parameters
  intercept ~ dnorm(0, 0.001)
  slope.temperature ~ dnorm(0, 0.001)
  slope.rainfall ~ dnorm(0, 0.001)
}
62. Running WinBUGS from R: the R2WinBUGS package. 3 - Prepare the inputs to the 'bugs' function
Contents of the data file (datalogistic.dat):

nbsuccess  nbpairs  temperature  rainfall
   151       173       15.1         67
   105       164       13.3         52
    73       103       15.3         88
   107       113       13.3         61
   113       122       14.6         32
    87       112       15.6         36
   ...       ...        ...        ...
    53        58       14.0         48
    31        39       13.9         90
    35        42       12.9         86
    14        23       15.1         78
    18        23       13.0         87
63. Running WinBUGS from R: the R2WinBUGS package. The logistic regression example revisited

# Load R2WinBUGS package
library(R2WinBUGS)
# Data (R 'list' format)
N <- 23
data <- read.table("datalogistic.dat", header = T)
attach(data)
datax <- list("N", "nbsuccess", "nbpairs", "temperature", "rainfall")
# MCMC details
nb.iterations <- 10000
nb.burnin <- 1000
64. Running WinBUGS from R: the R2WinBUGS package. The logistic regression example revisited

# Initial values
init1 <- list(intercept = -1, slope.temperature = -1, slope.rainfall = -1)
init2 <- list(intercept = 0, slope.temperature = 0, slope.rainfall = 0)
init3 <- list(intercept = 1, slope.temperature = 1, slope.rainfall = 1)
inits <- list(init1, init2, init3)
nb.chains <- length(inits)
# Parameters to be monitored
parameters <- c("intercept", "slope.temperature", "slope.rainfall")
# MCMC simulations
res.sim <- bugs(datax, inits, parameters, "logisticregressionstandardized.bug",
                n.chains = nb.chains, n.iter = nb.iterations, n.burnin = nb.burnin)
# Save results
save(res.sim, file = "logistic.Rdata")
65. Running WinBUGS from R: the R2WinBUGS package. The logistic regression example revisited

# Summarize results
res.sim$summary   # or use print(res.sim) alternatively

                          mean         sd        50%        97.5%      Rhat
intercept           1.55122555 0.05396574    1.55100    1.6589750 1.0020641
slope.temperature   0.03030854 0.06128879    0.03148    0.1510975 0.9997848
slope.rainfall     -0.15302837 0.06206554   -0.15295   -0.0243025 1.0014894
deviance          204.60259481 2.48898337  203.90000  211.2000000 1.0002280
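For the commands on the next slides to work directly on the posterior draws, the simulated values first need to be pulled out of the bugs object; a sketch of one way to do this (attach.bugs() and the sims.list component belong to R2WinBUGS, but this step is not shown in the original slides):

# Make the monitored nodes (e.g. slope.rainfall) available as plain vectors of draws
attach.bugs(res.sim)
# equivalent, more explicit alternative:
slope.rainfall <- res.sim$sims.list$slope.rainfall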
66. Running WinBUGS from R: the R2WinBUGS package. The logistic regression example revisited

# Numerical summaries for slope.rainfall
quantile(slope.rainfall, c(0.025, 0.975))
      2.5%      97.5%
-0.2693375 -0.0243025
# Calculate the odds-ratio
odds.rainfall <- exp(slope.rainfall)
quantile(odds.rainfall, c(0.025, 0.975))
     2.5%     97.5%
0.7638855 0.9759904
67. Running WinBUGS from R: the R2WinBUGS package. The logistic regression example revisited

# Graphical summaries for slope.rainfall
plot(density(slope.rainfall), xlab = "", ylab = "",
     main = "slope.rainfall a posteriori density")
68. Recent developments in WinBUGS: Model selection using RJMCMC
- We consider data relating to a population of White storks breeding in Baden-Württemberg (Germany).
- Interest lies in the impact of climate variation (rainfall) in their wintering area on their population dynamics (adult survival rates).
- Mark-recapture data from 1956-71 are available.
- The covariates relate to the amount of rainfall between June and September each year, from 10 weather stations located around the Sahel region.
- Interest lies in identifying which rainfall locations explain the adult survival rates.
69. Bayesian Model Selection
- Discriminating between different models can often be of particular interest, since they represent competing biological hypotheses.
- How do we decide which covariates to use? Often there may be a large number of possible covariates.
70. Example (cont.)
- We express the survival rate as a logistic function of the covariates:
  logit φt = α + βᵀ xt + εt
- where xt denotes the set of covariate values at time t and the εt are random effects, εt ~ N(0, σ²).
- However, which rainfalls explain the survival rates for the adults?
- Alternatively, which values of β are non-zero?
71. Model Selection
- In the classical framework, likelihood ratio tests or information criteria (e.g. AIC) are often used.
- There is a similar Bayesian statistic: the DIC.
- This is programmed within WinBUGS; however, its implementation is not suitable for hierarchical models (e.g. random effect models).
- In addition, the DIC is known to give misleading results even in simple problems.
- Within the general Bayesian framework, there is a more natural way of dealing with the issue of model discrimination.
72. Bayesian Approach
- We treat the model itself as an unknown parameter to be estimated.
- Then, applying Bayes' Theorem, we obtain the posterior distribution over both parameter and model space:
  π(θm, m | data) ∝ L(data | θm, m) p(θm) p(m)
- Here θm denotes the parameters in model m.
73. Posterior Model Probabilities
- The Bayesian approach then allows us to quantitatively discriminate between competing models via posterior model probabilities:
  π(m | data) = ∫ π(θm, m | data) dθm
              ∝ p(m) ∫ L(data | θm, m) p(θm) dθm
- Note that we need to specify priors on both the parameters and now also on the models themselves.
- Thus we need to specify a prior probability for each model to be considered.
74. MCMC-based estimates
- We have a posterior distribution (over parameter and model space) defined up to proportionality:
  π(θm, m | data) ∝ L(data | θm, m) p(θm | m) p(m)
- If we can sample from this posterior distribution then we are able to obtain posterior estimates of summary statistics of interest.
- In particular, the posterior model probabilities can be estimated as the proportion of time that the chain spends in each model (see the small sketch after this slide).
- So, all we need to do is define how we construct such a Markov chain!!
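A toy sketch (not from the original slides) of that last point: given the model visited at each RJMCMC iteration, the posterior model probabilities are simply the visit frequencies. The model labels below are made up for illustration.

# Hypothetical vector of model indicators, one per post-burn-in RJMCMC iteration;
# here each label records which of 3 covariates are included
visited <- c("100", "100", "110", "100", "000", "100", "110", "100")
# Estimated posterior model probabilities = proportion of iterations spent in each model
prop.table(table(visited))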
75. Reversible Jump MCMC
- The reversible jump MCMC algorithm allows us to construct a Markov chain with stationary distribution equal to the posterior distribution.
- It is simply an extension of the Metropolis-Hastings algorithm that allows moves between different dimensions.
- This algorithm is needed because the number of parameters in θm may differ between models.
- Note that this algorithm needs only one Markov chain, irrespective of the number of models!!
76. Markov chain
- Each iteration of the Markov chain essentially involves two steps:
  - Within-model moves: updating each parameter using standard MCMC moves (Gibbs sampler, Metropolis-Hastings)
  - Between-model moves: updating the model using a reversible jump type move
- Then, standard MCMC-type processes apply, such as using an appropriate burn-in, obtaining summary statistics of interest, etc.
- To illustrate the RJ algorithm, we consider a particular example relating to variable selection (e.g. covariate analysis).
77. WinBUGS
- General RJ updates cannot currently be programmed into WinBUGS.
- Bespoke code needs to be written instead.
- However, the recent add-on called 'jump' allows two particular RJ updates to be performed in WinBUGS:
  - Variable selection
  - Splines
- See http://www.winbugs-development.org.uk/rjmcmc.html
- So, in particular, WinBUGS can be used for model selection in the White storks example.
78. Example: White Storks
- Recall our white storks example: there are 10 possible covariates, hence a total of 2^10 = 1024 possible models.
- We specify a prior probability of 0.5 that each covariate is in the model.
- Then, conditional on the covariate being present, we specify β ~ N(0, 10).
- Finally, for the random effect error variance, σ² ~ Γ⁻¹(0.001, 0.001) (an inverse-gamma prior).
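A rough sketch of this prior structure written as ordinary BUGS-style code with explicit inclusion indicators; this is not the interface of the 'jump' add-on, the survival likelihood is omitted, and reading N(0, 10) as a variance of 10 (precision 0.1) is an assumption.

prior_sketch <- "
model {
  for (j in 1:10) {
    ind[j]  ~ dbern(0.5)        # covariate j included with prior probability 0.5
    b[j]    ~ dnorm(0, 0.1)     # N(0, 10) slope prior given inclusion (variance 10 -> precision 0.1)
    beta[j] <- ind[j] * b[j]    # effective coefficient: zero when the covariate is out
  }
  tau.eps ~ dgamma(0.001, 0.001)   # gamma prior on the random-effect precision,
  sigma2  <- 1 / tau.eps           # i.e. an inverse-gamma(0.001, 0.001) prior on sigma^2
}"
writeLines(prior_sketch, "priors_sketch.bug")  # file name illustrative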
79. Example: White Storks
- WinBUGS demonstration.
- RJMCMC is performed on the betas only, not on the random-effect variance or on the intercept.
80. Results
Models with largest posterior support (each row: inclusion indicators for the 10 rainfall covariates, posterior model probability, cumulative probability):

0001000000   0.5178058297    0.5178058297
0000000000   0.07629791141   0.5941037411
0001010000   0.0474087675    0.6415125086
0001100000   0.03814092265   0.6796534313
0001000001   0.03085379849   0.7105072297
1001000000   0.02549001607   0.7359972458
0001000100   0.02374569658   0.7597429424
0001001000   0.02336699564   0.783109938
0001000010   0.0229240303    0.8060339683
0000000100   0.02123479458   0.8272687629
0101000000   0.01809960982   0.8453683727
0011000000   0.01540739041   0.8607757631
1000000000   0.01186825798   0.8726440211
0000010000   0.01103282075   0.8836768419
81. Results
Additionally, the (marginal) posterior probability that each covariate influences the survival rates:

node        mean    sd    posterior marg. prob.
effect1     0.01   0.04   0.06
effect2     0.00   0.03   0.04
effect3     0.00   0.02   0.03
effect4     0.30   0.17   0.83
effect5    -0.01   0.04   0.07
effect6     0.01   0.05   0.09
effect7    -0.00   0.03   0.04
effect8     0.01   0.06   0.07
effect9     0.01   0.04   0.05
effect10   -0.01   0.04   0.05
p           0.91   0.01
sdeps       0.20   0.14
82. Results: survival rates