Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review

Provided by: Yutin9

Learn more at: https://people.ee.duke.edu

1
Markov Chain Monte Carlo Convergence Diagnostics
A Comparative Review

By Mary Kathryn Cowles and Bradley P. Carlin
Presented by Yuting Qi
12/01/2006

2
OUTLINE

MCMC Convergence Diagnostics
Introduce 4 methods in detail
Focus on
Prescriptive summary
Underlying theoretical basis
Advantages and disadvantages
Comparative results

3
1. Gelman and Rubin (1992) 1/4

What?
Based on normal theory approximation to exact
Bayesian posterior inference
Focus on applied inference for Bayesian posterior
distributions in real problems, which often tend
toward normality after transformations and
marginalization.
Two major steps
Create an overdispersed estimate of the target
distribution and use it to start several
independent sequences.
Analyze the multiple sequences to form a
distributional estimate of what is known about
the target r.v. given the simulations so far.
The distributional estimate is a Student's t
distribution for each scalar quantity of interest.
Convergence
Convergence is monitored by estimating the factor
by which the scale parameter might shrink if
sampling were continued indefinitely.

4
1. Gelman and Rubin (1992) 2/4

How?
Step 1: Creating a starting distribution
Locate the high-density regions of the target
distribution of x and find the K modes.
Approximate the high-density regions by a
Gaussian mixture model (GMM).
Form an overdispersed distribution by first
drawing from the GMM and then dividing each
sample by a positive number, which results in a
mixture of t distributions.
Sharpen the overdispersed approximation by
downweighting regions that have relatively low
density, for example through importance resampling.
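
Step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Gelman and Rubin's code: the function name draw_mixture_t and the two-mode example values are hypothetical, the "positive number" is taken to be sqrt(chi-square/df) applied to the deviation from the chosen mode, and the importance-resampling step is left out.

```python
import math
import random


def draw_mixture_t(means, sds, weights, df, rng):
    """Draw one value from a mixture of scaled t distributions.

    A mixture component is picked with the given weights, a normal
    deviation around its mode is drawn, and that deviation is divided
    by sqrt(chi2 / df), which gives the draw heavier (t) tails.
    """
    k = rng.choices(range(len(means)), weights=weights)[0]
    z = rng.gauss(0.0, sds[k])
    # A chi-square with df degrees of freedom is Gamma(shape=df/2, scale=2).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return means[k] + z / math.sqrt(chi2 / df)


# Hypothetical two-mode approximation to a target distribution.
rng = random.Random(0)
starts = [
    draw_mixture_t([-2.0, 3.0], [0.5, 1.0], [0.4, 0.6], df=4, rng=rng)
    for _ in range(5)
]
print(starts)  # five overdispersed starting points, one per sequence
```

A small df (4 here) is one possible way to make the starting points more spread out than the GMM itself.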

5
1. Gelman and Rubin (1992) 3/4

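Step 2: Re-estimating the target distribution
Independently simulate m sequences of length 2n,
with starting points drawn from the overdispersed
distribution, and discard the first n iterations.
For each scalar parameter of interest, estimate
the following quantities from the last n iterations
of the m sequences:
B/n: the variance between the means of the m
sequences
W: the average of the m within-sequence
variances
Estimate of the target mean: $\hat{\mu}$, the mean of the mn samples
Estimate of the target variance (unbiased): $\hat{\sigma}^2 = \frac{n-1}{n} W + \frac{1}{n} B$
Estimate the posterior of the target distribution as
a t distribution (accounting for the variability of the
estimates $\hat{\mu}$ and $\hat{\sigma}^2$) with center $\hat{\mu}$ and
scale $\sqrt{\hat{V}}$, where $\hat{V} = \hat{\sigma}^2 + B/(mn)$.
Monitor convergence with the shrink factor
$\sqrt{\hat{R}} = \sqrt{(\hat{V}/W) \cdot df/(df-2)}$, where df is the degrees
of freedom of the t distribution. Once it is near 1 for
all scalars, treat burn-in as complete and collect samples.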
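
The shrink factor in Step 2 can be sketched as follows for one scalar parameter. This is a minimal illustration: it computes B/n as the variance of the m sequence means, W as the average within-sequence variance, and reports sqrt(V/W) with V = (n-1)/n * W + B/n + B/(mn), leaving out the degrees-of-freedom correction. The function names and the AR(1) process standing in for an MCMC sampler are assumptions made for the example.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Return (mu_hat, V_hat, shrink) for equal-length chains of one scalar.

    The df/(df-2) correction factor is omitted in this sketch.
    """
    m = len(chains)
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    b_over_n = statistics.variance(means)  # between-sequence variance
    w = statistics.fmean([statistics.variance(c) for c in chains])
    mu_hat = statistics.fmean(means)  # mean of all m*n samples
    sigma2_hat = (n - 1) / n * w + b_over_n
    v_hat = sigma2_hat + b_over_n / m  # B/(mn) written as (B/n)/m
    return mu_hat, v_hat, math.sqrt(v_hat / w)


def ar1_chain(start, length, rng, phi=0.9):
    """Toy AR(1) process used here in place of a real MCMC sampler."""
    x, out = start, []
    for _ in range(length):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(1)
n = 500
# Simulate m = 4 sequences of length 2n and keep only the last n draws.
chains = [ar1_chain(s, 2 * n, rng)[n:] for s in (-20.0, -5.0, 5.0, 20.0)]
mu_hat, v_hat, shrink = gelman_rubin(chains)
print(f"shrink factor: {shrink:.3f}")
```

In practice the same computation is repeated for every scalar quantity of interest, and sampling continues until all shrink factors are close to 1.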

6
1. Gelman and Rubin (1992) 4/4

Comments
The shrink factor approaches 1 when the
within-sequence variance dominates the
between-sequence variance, meaning all sequences
have escaped the influence of their starting
points and traversed the whole target distribution.
Quantitative.
Criticisms
Relies on the user's ability to find a starting
distribution.
Relies on a normal approximation for diagnosing
convergence to the true posterior.
Inefficient: runs multiple sequences and discards a
large number of early iterations.

7
2. Geweke (1992) 1/3

What?
Use methods from spectral analysis to assess
convergence when the intent is to estimate the
mean $E[g(\theta)]$ of some function $g(\theta)$ of interest.
Collect $g(\theta^{(j)})$ after each iteration
Treat $\{g(\theta^{(j)})\}$, j = 1, ..., p, as a time series and
compute its spectral density $S_G(\omega)$.
Use numerical standard error (NSE) and relative
numerical efficiency (RNE) to monitor
convergence.
Assumption
The MCMC process and the function $g(\theta)$
jointly imply the existence of a spectrum, and
the existence of a spectral density with no
discontinuities at the frequency 0.

8
2. Geweke (1992) 2/3

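How?
Estimate $E[g(\theta)]$ from p iterations
Estimator: $\bar{g}_p = \frac{1}{p} \sum_{j=1}^{p} g(\theta^{(j)})$
Asymptotic variance: $S_G(0)/p$
Determine preliminary iterations
Given the sequence $\{G(j)\}$, j = 1, ..., p, with $G(j) = g(\theta^{(j)})$: if G(j) is
stationary, then as $p \to \infty$,
$(\bar{g}_A - \bar{g}_B) / \sqrt{\hat{S}_G^A(0)/p_A + \hat{S}_G^B(0)/p_B} \to N(0, 1)$,
where A and B are an early and a late segment of the chain.
Determine sufficient iterations
Numerical standard error: NSE = $\sqrt{\hat{S}_G(0)/p}$
Relative numerical efficiency: RNE = $\widehat{\mathrm{var}}(g(\theta)) / \hat{S}_G(0)$
RNE indicates, as a fraction of p, the number of
draws that would be required to produce the same
numerical accuracy if the draws had been made from
an iid sample drawn directly from the posterior
distribution.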
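
The NSE, the RNE, and the comparison of an early and a late segment can be sketched as below. The spectral density at frequency 0 is estimated here with a Bartlett (triangular) lag window and a maximum lag of sqrt(p). Both are assumptions of this sketch, not choices prescribed by Geweke, and the density is scaled so that it equals the variance for iid draws. The 10% and 50% segment fractions are the commonly used defaults, and all function names are hypothetical.

```python
import math
import statistics


def spectral_density_at_zero(x, max_lag=None):
    """Bartlett-window estimate of the spectral density at frequency 0."""
    n = len(x)
    if max_lag is None:
        max_lag = int(math.sqrt(n))
    mean = statistics.fmean(x)
    d = [v - mean for v in x]

    def autocov(k):
        return sum(d[t] * d[t + k] for t in range(n - k)) / n

    s0 = autocov(0)
    for k in range(1, max_lag + 1):
        s0 += 2.0 * (1.0 - k / (max_lag + 1)) * autocov(k)
    return s0


def nse(x):
    """Numerical standard error of the sample mean of x."""
    return math.sqrt(spectral_density_at_zero(x) / len(x))


def rne(x):
    """Relative numerical efficiency: iid variance over spectral variance."""
    return statistics.pvariance(x) / spectral_density_at_zero(x)


def geweke_z(x, first=0.1, last=0.5):
    """Z-score comparing the mean of an early and a late segment of x."""
    a = x[: int(first * len(x))]
    b = x[int((1.0 - last) * len(x)):]
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(nse(a) ** 2 + nse(b) ** 2)


# A steadily drifting sequence is clearly not stationary.
trend = [float(t) for t in range(1000)]
print(geweke_z(trend))  # large in absolute value, so convergence is rejected
```

Changing the window or max_lag changes the estimate of the spectral density at 0, and with it the NSE, the RNE, and the Z-score.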

9
2. Geweke (1992) 3/3

Comments
Address the issues of both bias and variance.
Is univariate.
Requires only a single sampler chain.
Disadvantages
Is sensitive to the choice of spectral window.
Does not specify a procedure for applying the
diagnostic, leaving it to the subjective choice of
the user.

10
3. Ritter and Tanner (1992) 1/3

The Gibbs Stopper
Convert the output of the Gibbs sampler to a
sample from the exact distribution.
Assign a weight w to the d-dimensional vector X
drawn from the current iteration: $w = q(X) / g_i(X)$
q is a function proportional to the joint
distribution
$g_i$ is the current Gibbs sampler approximation.
Assess the convergence
If the current approximation to the joint
distribution is close to the true one, then the
distribution of the weights will be degenerate
about a constant.

11
3. Ritter and Tanner (1992) 2/3

Compute $g_i$
Let $K(X', X)$ be the probability of moving from X'
(at iteration i) to X at iteration i+1.
The joint distribution of the samples obtained at
iteration i+1 is
$g_{i+1}(X) = \int K(X', X)\, g_i(X')\, dX'$
The integral can be approximated by the Monte
Carlo method:
$g_{i+1}(X) \approx \frac{1}{m} \sum_{j=1}^{m} K(X_j, X)$
where $X_1, \ldots, X_m$ are samples drawn at
iteration i.
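
The weight computation can be illustrated on a toy problem. The sketch below assumes a standard bivariate normal target with correlation 0.9, m = 200 parallel Gibbs chains, and a deliberately poor starting point. None of these choices come from Ritter and Tanner, and all names are hypothetical. The transition density is the product of the two full conditionals, and the spread of the weights is summarized by their coefficient of variation.

```python
import math
import random
import statistics

RHO = 0.9
S2 = 1.0 - RHO ** 2  # variance of each full conditional


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def q(x):
    """Function proportional to the joint (target) density."""
    return math.exp(-(x[0] ** 2 - 2.0 * RHO * x[0] * x[1] + x[1] ** 2) / (2.0 * S2))


def kernel(x_old, x_new):
    """Density of moving from x_old to x_new in one Gibbs sweep."""
    return (normal_pdf(x_new[0], RHO * x_old[1], S2)
            * normal_pdf(x_new[1], RHO * x_new[0], S2))


def gibbs_step(x, rng):
    x1 = rng.gauss(RHO * x[1], math.sqrt(S2))
    x2 = rng.gauss(RHO * x1, math.sqrt(S2))
    return (x1, x2)


def stopper_weights(prev, curr):
    """w = q(X) / g(X), with g estimated by averaging the kernel over prev."""
    m = len(prev)
    return [q(x) / (sum(kernel(xp, x) for xp in prev) / m) for x in curr]


def coeff_of_variation(w):
    return statistics.pstdev(w) / statistics.fmean(w)


rng = random.Random(2)
m = 200
chains = [(5.0, 5.0)] * m  # all chains start far from the mode
cv_by_iter = {}
for it in range(1, 31):
    new = [gibbs_step(x, rng) for x in chains]
    if it in (1, 30):
        cv_by_iter[it] = coeff_of_variation(stopper_weights(chains, new))
    chains = new
print(cv_by_iter)  # spread of the weights at iterations 1 and 30
```

Each weight evaluation costs m kernel evaluations, which is why the computation becomes time-intensive for larger m or more expensive conditionals.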

12
3. Ritter and Tanner (1992) 3/3

Comments
Assess distributional convergence
Disadvantages
Applicable only with the Gibbs sampler
Coding is problem-specific
Computation of weights can be time-intensive
If full conditionals are not standard
distributions, we must estimate the normalizing
constants.

13
4. Zellner and Min (1995) 1/3

Gibbs Sampler Convergence Criteria (GSC2)
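Aim to determine not only whether the Gibbs
sampler has converged, but also whether it has
converged to a correct result.
Divide the model parameters into two parts, $\alpha$ and $\beta$
Derive analytical forms for $p(\alpha \mid \beta, y)$, $p(\beta \mid \alpha, y)$,
and the joint posterior up to a constant,
$p(\alpha, \beta \mid y) \propto \pi(\alpha, \beta)\, l(\alpha, \beta \mid y)$ (prior times likelihood)
Three convergence criteria
Assume $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ are two points
in the parameter space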

14
4. Zellner and Min (1995) 2/3

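1. The anchored ratio convergence criterion
(ARC2)
Calculate $\theta = p(\alpha_1, \beta_1 \mid y) / p(\alpha_2, \beta_2 \mid y)$ analytically
(the normalizing constants cancel).
If the Gibbs sampler output is satisfactory,
then
$\hat{\theta}_A = \frac{\hat{p}(\alpha_1)\, p(\beta_1 \mid \alpha_1)}{\hat{p}(\alpha_2)\, p(\beta_2 \mid \alpha_2)}$ and $\hat{\theta}_B = \frac{\hat{p}(\beta_1)\, p(\alpha_1 \mid \beta_1)}{\hat{p}(\beta_2)\, p(\alpha_2 \mid \beta_2)}$ will be close to $\theta$,
where $\hat{p}$ denotes a marginal density estimated from the Gibbs output.
2. The difference convergence criterion (DC2)
Since $p(\alpha)\, p(\beta \mid \alpha) = p(\beta)\, p(\alpha \mid \beta)$, compute
$\hat{\eta} = \hat{p}(\alpha)\, p(\beta \mid \alpha) - \hat{p}(\beta)\, p(\alpha \mid \beta)$
If $\hat{\eta} \to 0$, then satisfactory
3. The ratio convergence criterion (RC2)
Compute $\hat{\theta} = \frac{\hat{p}(\alpha)\, p(\beta \mid \alpha)}{\hat{p}(\beta)\, p(\alpha \mid \beta)}$
If $\hat{\theta} \to 1$, then satisfactory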
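
The difference and ratio criteria can be tried out on a toy model where both full conditionals are known. The sketch below assumes a standard bivariate normal for the two parameter blocks with correlation 0.5 and estimates each marginal density by averaging the corresponding full conditional over the Gibbs draws. The model, the evaluation point, and all function names are assumptions made for illustration.

```python
import math
import random
import statistics

RHO = 0.5
COND_VAR = 1.0 - RHO ** 2


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def cond_pdf(a, b):
    """p(a | b); the toy model is symmetric, so it serves both conditionals."""
    return normal_pdf(a, RHO * b, COND_VAR)


def gibbs_draws(n, rng, burn_in=500):
    alpha, beta = 0.0, 0.0
    alphas, betas = [], []
    for it in range(n + burn_in):
        alpha = rng.gauss(RHO * beta, math.sqrt(COND_VAR))
        beta = rng.gauss(RHO * alpha, math.sqrt(COND_VAR))
        if it >= burn_in:
            alphas.append(alpha)
            betas.append(beta)
    return alphas, betas


def dc2_rc2(alpha, beta, alphas, betas):
    """Return (difference, ratio) of the two factorizations at (alpha, beta)."""
    p_alpha_hat = statistics.fmean([cond_pdf(alpha, b) for b in betas])
    p_beta_hat = statistics.fmean([cond_pdf(beta, a) for a in alphas])
    left = p_alpha_hat * cond_pdf(beta, alpha)   # p^(alpha) p(beta | alpha)
    right = p_beta_hat * cond_pdf(alpha, beta)   # p^(beta) p(alpha | beta)
    return left - right, left / right


rng = random.Random(5)
alphas, betas = gibbs_draws(5000, rng)
diff, ratio = dc2_rc2(0.5, 0.2, alphas, betas)
print(f"difference: {diff:.4f}, ratio: {ratio:.4f}")
```

The anchored ratio criterion follows the same pattern, using the same estimated marginals at two points and comparing against the analytically known posterior ratio.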

15
4. Zellner and Min (1995) 3/3

Comments
Quantitative
Requires only a single sampler chain
Coding is problem-specific and analytical work is
needed
Disadvantage
Application is limited when the factorization
cannot be achieved.

16
Comparative results 1/5

Trivariate Normal with high correlations
Run the samplers for relatively few iterations to
test whether these methods detect convergence
failure or ambiguity.
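
A test bed of this kind can be sketched with a Gibbs sampler for a trivariate normal. The exact covariance matrix used in the review is not given in these slides, so the version below assumes zero means, unit variances, and a common correlation of 0.9. In that case each full conditional is normal with mean rho/(1+rho) times the sum of the other two coordinates and variance 1 - 2*rho^2/(1+rho). Starting points and run length are also illustrative.

```python
import math
import random
import statistics

RHO = 0.9  # assumed common correlation, not the value from the review
COND_COEF = RHO / (1.0 + RHO)
COND_SD = math.sqrt(1.0 - 2.0 * RHO ** 2 / (1.0 + RHO))


def gibbs_trivariate(start, n_iter, rng):
    """Gibbs sampler for an equicorrelated standard trivariate normal."""
    x = list(start)
    draws = []
    for _ in range(n_iter):
        for k in range(3):
            others = sum(x) - x[k]
            x[k] = rng.gauss(COND_COEF * others, COND_SD)
        draws.append(tuple(x))
    return draws


rng = random.Random(3)
starts = [(-10.0, -10.0, -10.0), (0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
# Deliberately short runs, as in the comparison described above.
short_runs = [gibbs_trivariate(s, 50, rng) for s in starts]
first_coord_means = [statistics.fmean([d[0] for d in run]) for run in short_runs]
print(first_coord_means)  # per-chain means of the first coordinate
```

With correlations this high the sampler moves slowly, so after only 50 iterations the chains can still reflect their starting points, which is the situation a useful diagnostic should flag.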

17
Comparative results 2/5

1. Gelman and Rubin shrink factors (→ 1)
2. Geweke NSE (→ 0)

18
Comparative results 3/5

Ritter and Tanner Gibbs stopper (weights w →
constant)

19
Comparative results 4/5

Zellner and Min difference convergence criterion
(difference → 0)

20
Comparative results 5/5

Remarks
Geweke's diagnostic appears to be premature in
diagnosing convergence.
Gelman and Rubin's method may be consistent with
the actual behavior of the chains; however,
choosing the starting points is critical.
The results of the other methods are difficult to
interpret.

21
Summary, Discussion, and Recommendation

Be cautious when using these diagnostics
Use a variety of diagnostic tools rather than any
single one
Learn as much as possible about the target
density before applying an MCMC algorithm
