Title: Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review
Markov Chain Monte Carlo Convergence Diagnostics
A Comparative Review
- By Mary Kathryn Cowles and Bradley P. Carlin
- Presented by Yuting Qi
- 12/01/2006
OUTLINE
- MCMC Convergence Diagnostics
- Introduce four methods in detail
- Focus on
- Prescriptive summary
- Underlying theoretical basis
- Advantages and disadvantages
- Comparative results
1. Gelman and Rubin (1992) 1/4
- What?
- Based on a normal-theory approximation to exact Bayesian posterior inference.
- Focuses on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization.
- Two major steps
- Create an overdispersed estimate of the target distribution and use it to start several independent sequences.
- Analyze the multiple sequences to form a distributional estimate of what is known about the target random variable given the simulations so far. The distributional estimate is a Student's t distribution for each scalar quantity of interest.
- Convergence
- Convergence is monitored by estimating the factor by which the scale parameter might shrink if sampling were continued indefinitely.
1. Gelman and Rubin (1992) 2/4
- How?
- Step 1: Creating a starting distribution
- Locate the high-density regions of the target distribution of x and find the K modes.
- Approximate the high-density regions by a Gaussian mixture model (GMM).
- Form an overdispersed distribution by first drawing from the GMM and then dividing each sample by a positive random number, which results in a mixture of t distributions.
- Sharpen the overdispersed approximation by downweighting regions that have relatively low density, for example through importance resampling.
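The drawing step above can be sketched in Python. This is only an illustration under stated assumptions: the function name `draw_overdispersed`, the choice `df=4`, and the use of the square root of a scaled chi-square variate as the positive random divisor are assumptions, not details given on the slide. The divisor is applied to the deviation from the selected mode, and the importance-resampling refinement is omitted.

```python
import numpy as np

def draw_overdispersed(modes, covs, weights, size, df=4, rng=None):
    """Draw starting points from a mixture of multivariate t distributions.

    modes:   (K, d) locations of the K modes
    covs:    (K, d, d) covariance matrices of the Gaussian mixture components
    weights: (K,) mixing weights that sum to 1
    df:      degrees of freedom of the t components (assumed tuning choice)
    """
    rng = np.random.default_rng() if rng is None else rng
    modes = np.asarray(modes, dtype=float)
    covs = np.asarray(covs, dtype=float)
    n_modes, dim = modes.shape
    comp = rng.choice(n_modes, size=size, p=weights)    # component per draw
    z = rng.standard_normal((size, dim))
    chol = np.linalg.cholesky(covs)                     # stacked (K, d, d) factors
    dev = np.einsum("nij,nj->ni", chol[comp], z)        # Gaussian deviation from mode
    scale = np.sqrt(rng.chisquare(df, size=size) / df)  # positive random divisor
    return modes[comp] + dev / scale[:, None]           # heavier-tailed t draw

# Example: ten starting points around two assumed modes in two dimensions.
starts = draw_overdispersed(
    modes=[[-3.0, 0.0], [3.0, 0.0]],
    covs=[np.eye(2), np.eye(2)],
    weights=[0.5, 0.5],
    size=10,
    rng=np.random.default_rng(1),
)
```

Each row of `starts` could then seed one of the independent sequences.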
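1. Gelman and Rubin (1992) 3/4
- Step 2: Re-estimating the target distribution
- Independently simulate $m$ sequences of length $2n$, with starting points drawn from the overdispersed distribution, and discard the first $n$ iterations.
- For each scalar parameter of interest, estimate the following quantities from the last $n$ iterations of the $m$ sequences:
- $B$: the between-sequence variance, where $B/n$ is the variance between the means of the $m$ sequences
- $W$: the average of the $m$ within-sequence variances
- Estimate of the target mean: $\hat{\mu}$, the mean of the $mn$ samples
- Estimate of the target variance: $\hat{\sigma}^2 = \frac{n-1}{n} W + \frac{1}{n} B$ (unbiased)
- Estimate the posterior of the target as a t distribution (accounting for the variability of the estimates $\hat{\mu}$ and $\hat{\sigma}^2$) with center $\hat{\mu}$ and scale $\sqrt{\hat{V}}$, where $\hat{V} = \hat{\sigma}^2 + B/(mn)$.
- Monitor convergence with the shrink factor $\sqrt{\hat{R}} = \sqrt{(\hat{V}/W)\,\mathrm{df}/(\mathrm{df}-2)}$; once it is near 1 for all scalar quantities, collect the post-burn-in samples.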
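A minimal sketch of the shrink-factor computation for one scalar quantity, using NumPy. The function name `shrink_factor` is hypothetical, and as a simplifying assumption the df/(df - 2) correction is dropped, so the value returned is the square root of the pooled variance estimate over the within-sequence variance.

```python
import numpy as np

def shrink_factor(chains):
    """Approximate Gelman-Rubin shrink factor for one scalar quantity.

    chains: array of shape (m, n) holding the last n draws of m sequences.
    Assumption: the df / (df - 2) correction is omitted.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    seq_means = chains.mean(axis=1)
    B = n * seq_means.var(ddof=1)           # B / n is the variance of the m means
    W = chains.var(axis=1, ddof=1).mean()   # average within-sequence variance
    sigma2_hat = (n - 1) / n * W + B / n    # estimate of the target variance
    V_hat = sigma2_hat + B / (m * n)        # squared scale of the t approximation
    return np.sqrt(V_hat / W)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                        # sequences that agree
stuck = mixed + np.array([[0.0], [2.0], [4.0], [6.0]])    # sequences stuck apart
print(shrink_factor(mixed), shrink_factor(stuck))
```

In this toy setup the first value should sit near 1, while the second should be well above 1 because the between-sequence variance dominates.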
1. Gelman and Rubin (1992) 4/4
- Comments
- The shrink factor approaches 1 when the within-sequence variance dominates the between-sequence variance, meaning all sequences have escaped the influence of their starting points and traversed the whole target distribution.
- Quantitative.
- Criticisms
- Relies on the user's ability to find a starting distribution.
- Relies on a normal approximation for diagnosing convergence to the true posterior.
- Inefficient: it runs multiple sequences and discards a large number of early iterations.
2. Geweke (1992) 1/3
- What?
- Uses methods from spectral analysis to assess convergence when the intent is to estimate the mean $E[g(\theta)]$ of some function $g(\theta)$ of interest.
- Collect $g(\theta^{(j)})$ after each iteration.
- Treat $\{g(\theta^{(j)})\}_{j=1,\dots,p}$ as a time series and compute its spectral density $S_G(\omega)$.
- Use the numerical standard error (NSE) and relative numerical efficiency (RNE) to monitor convergence.
- Assumption
- The MCMC process and the function $g(\theta)$ jointly imply the existence of a spectrum, and of a spectral density with no discontinuities at frequency 0.
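2. Geweke (1992) 2/3
- How?
- Estimate $E[g(\theta)]$ from $p$ iterations
- Estimator: $\bar{g}_p = \frac{1}{p} \sum_{j=1}^{p} g(\theta^{(j)})$
- Asymptotic variance: $S_G(0)/p$
- Determine preliminary iterations
- Given the sequence $\{G(j)\}_{j=1,\dots,p}$ with $G(j) = g(\theta^{(j)})$, if $G(j)$ is stationary, then as $p \to \infty$: $(\bar{g}_A - \bar{g}_B) / \sqrt{\hat{S}_A(0)/p_A + \hat{S}_B(0)/p_B} \to N(0, 1)$, where $\bar{g}_A$ and $\bar{g}_B$ are the means of the first $p_A$ and the last $p_B$ iterations.
- Determine sufficient iterations
- Numerical standard error: $\mathrm{NSE} = \sqrt{\hat{S}_G(0)/p}$
- Relative numerical efficiency: $\mathrm{RNE} = \widehat{\mathrm{var}}(g) / \hat{S}_G(0)$
- The RNE indicates the number of draws that would be required to produce the same numerical accuracy if the draws had been made from an iid sample drawn directly from the posterior distribution.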
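The NSE, the RNE, and the comparison of early and late means can be sketched as follows. Several choices here are assumptions rather than part of the slides: the spectral density at frequency 0 is estimated with a Bartlett window truncated at the square root of the series length, the early and late segments are the first 10% and the last 50% of the chain, and all function names are hypothetical.

```python
import numpy as np

def spectral_density_at_zero(g, max_lag=None):
    """Bartlett-window estimate of S_G(0), the sum of autocovariances over all lags."""
    g = np.asarray(g, dtype=float)
    p = g.size
    if max_lag is None:
        max_lag = int(np.sqrt(p))      # assumed truncation rule
    d = g - g.mean()
    s = d @ d / p                      # lag-0 autocovariance
    for k in range(1, max_lag + 1):
        gamma_k = d[:-k] @ d[k:] / p   # lag-k autocovariance
        s += 2.0 * (1.0 - k / (max_lag + 1)) * gamma_k
    return s

def nse_and_rne(g):
    """Numerical standard error and relative numerical efficiency of the mean of g."""
    g = np.asarray(g, dtype=float)
    s0 = spectral_density_at_zero(g)
    return np.sqrt(s0 / g.size), g.var() / s0

def geweke_z(g, first=0.1, last=0.5):
    """Standardized difference between the means of the early and late segments."""
    g = np.asarray(g, dtype=float)
    a = g[: int(first * g.size)]
    b = g[-int(last * g.size):]
    var = (spectral_density_at_zero(a) / a.size
           + spectral_density_at_zero(b) / b.size)
    return (a.mean() - b.mean()) / np.sqrt(var)
```

For independent draws the RNE should be close to 1, while a strongly autocorrelated chain gives an RNE well below 1. As the slides note, the result is sensitive to the spectral window, so the Bartlett choice here is one option among several.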
2. Geweke (1992) 3/3
- Comments
- Addresses the issues of both bias and variance.
- Is univariate.
- Requires a single sampler chain.
- Disadvantages
- Is sensitive to the choice of spectral window.
- Does not specify a procedure for applying the diagnostic, leaving it to the subjective choice of the user.
3. Ritter and Tanner (1992) 1/3
- The Gibbs Stopper
- Convert the output of the Gibbs sampler to a sample from the exact distribution.
- Assign a weight $w$ to the $d$-dimensional vector $X$ drawn at the current iteration: $w = q(X) / g_i(X)$
- $q$ is a function proportional to the joint distribution.
- $g_i$ is the current Gibbs sampler approximation to the joint distribution.
- Assess the convergence
- If the current approximation to the joint distribution is close to the true one, then the distribution of the weights will be degenerate about a constant.
3. Ritter and Tanner (1992) 2/3
- Compute $g_i$
- Let $K(X', X)$ be the probability of moving from $X'$ (at iteration $i$) to $X$ (at iteration $i+1$).
- The joint distribution of the samples obtained at iteration $i+1$ is $g_{i+1}(X) = \int K(X', X)\, g_i(X')\, dX'$
- The integral can be approximated by the Monte Carlo method: $g_{i+1}(X) \approx \frac{1}{m} \sum_{j=1}^{m} K(X_j, X)$
- $X_1, \dots, X_m$ are samples drawn at iteration $i$.
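To make the weight computation concrete, here is a small sketch for a toy target, a bivariate normal with unit variances and an assumed correlation `RHO`. The target, the m parallel chains all started at the same far-off point, and every function name are assumptions chosen for illustration. The transition density is the product of the two full conditionals used in one Gibbs sweep.

```python
import numpy as np

RHO = 0.5              # assumed correlation of the toy bivariate normal target
S2 = 1.0 - RHO ** 2    # variance of each full conditional

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def q(x):
    """Function proportional to the joint target density at points x of shape (m, 2)."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.exp(-0.5 * (x1 ** 2 - 2.0 * RHO * x1 * x2 + x2 ** 2) / S2)

def gibbs_step(x, rng):
    """One Gibbs sweep (x1 given x2, then x2 given x1) for m parallel chains."""
    x1 = rng.normal(RHO * x[:, 1], np.sqrt(S2))
    x2 = rng.normal(RHO * x1, np.sqrt(S2))
    return np.column_stack([x1, x2])

def transition_density(x_old, x_new):
    """K(X', X) for all pairs: rows index new points X, columns old points X'."""
    k1 = normal_pdf(x_new[:, 0][:, None], RHO * x_old[:, 1][None, :], S2)
    k2 = normal_pdf(x_new[:, 1], RHO * x_new[:, 0], S2)
    return k1 * k2[:, None]

def stopper_weights(x_old, x_new):
    """Weights w = q(X) / g_{i+1}(X), with g_{i+1} estimated by Monte Carlo."""
    g_next = transition_density(x_old, x_new).mean(axis=1)
    return q(x_new) / g_next

def log_weight_spread(n_iter, m=500, start=10.0, seed=0):
    """Standard deviation of the log weights at each iteration."""
    rng = np.random.default_rng(seed)
    x = np.full((m, 2), start)
    spreads = []
    for _ in range(n_iter):
        x_new = gibbs_step(x, rng)
        spreads.append(np.log(stopper_weights(x, x_new)).std())
        x = x_new
    return spreads
```

Plotting the output of `log_weight_spread(30)` should show the spread of the log weights falling from a large value at the first iteration to nearly zero, which is the "degenerate about a constant" behavior described on the previous slide.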
3. Ritter and Tanner (1992) 3/3
- Comments
- Assess distributional convergence
- Disadvantages
- Applicable only with the Gibbs sampler
- Coding is problem-specific
- Computation of weights can be time-intensive
- If full conditionals are not standard
distributions, we must estimate the normalizing
constants.
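4. Zellner and Min (1995) 1/3
- Gibbs Sampler Convergence Criteria (GSC²)
- Aim to determine whether the Gibbs sampler has not only converged, but has also converged to a correct result.
- Divide the model parameters into two parts, $\alpha$ and $\beta$.
- Derive analytical forms for
- $p(\alpha \mid \beta, y)$
- $p(\beta \mid \alpha, y)$
- Three convergence criteria
- Assume $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ are two points in the parameter space.
- The joint posterior is proportional to prior times likelihood: $p(\alpha, \beta \mid y) \propto \pi(\alpha, \beta)\, l(\alpha, \beta \mid y)$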
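4. Zellner and Min (1995) 2/3
- 1. The anchored ratio convergence criterion (ARC²)
- Calculate $\theta = \frac{\pi(\alpha_1, \beta_1)\, l(\alpha_1, \beta_1 \mid y)}{\pi(\alpha_2, \beta_2)\, l(\alpha_2, \beta_2 \mid y)}$
- If the Gibbs sampler output is satisfactory, then
- $\hat{\theta}_A = \frac{\hat{p}(\alpha_1)\, p(\beta_1 \mid \alpha_1)}{\hat{p}(\alpha_2)\, p(\beta_2 \mid \alpha_2)}$ and $\hat{\theta}_B = \frac{\hat{p}(\beta_1)\, p(\alpha_1 \mid \beta_1)}{\hat{p}(\beta_2)\, p(\alpha_2 \mid \beta_2)}$ will be close to $\theta$ (conditioning on the data $y$ is suppressed).
- 2. The difference convergence criterion (DC²)
- Since $p(\alpha)\, p(\beta \mid \alpha) = p(\beta)\, p(\alpha \mid \beta)$
- If $\hat{\eta} = \hat{p}(\alpha)\, p(\beta \mid \alpha) - \hat{p}(\beta)\, p(\alpha \mid \beta) \to 0$, then satisfactory
- 3. The ratio convergence criterion (RC²)
- If the ratio $\to 1$, then satisfactory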
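As an illustration of the difference criterion, the sketch below uses a toy bivariate normal posterior for the two parts of the parameter vector, here called `alpha` and `beta`. The toy target, the correlation `RHO`, the burn-in length, and the function names are all assumptions. The marginal densities are estimated by averaging the analytical full conditionals over the Gibbs draws, which is also an assumption about how the estimated marginals are obtained.

```python
import numpy as np

RHO = 0.5              # assumed correlation of the toy bivariate normal posterior
S2 = 1.0 - RHO ** 2    # variance of each full conditional

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def gibbs_draws(n, burn_in=200, seed=0):
    """Single Gibbs chain for (alpha, beta); returns n draws after burn-in."""
    rng = np.random.default_rng(seed)
    alpha, beta = 5.0, 5.0
    out = np.empty((n, 2))
    for i in range(n + burn_in):
        alpha = rng.normal(RHO * beta, np.sqrt(S2))
        beta = rng.normal(RHO * alpha, np.sqrt(S2))
        if i >= burn_in:
            out[i - burn_in] = alpha, beta
    return out

def difference_criterion(draws, alpha, beta):
    """Estimated p(alpha) p(beta | alpha) minus p(beta) p(alpha | beta) at one point."""
    # Marginals estimated by averaging the full conditionals over the draws.
    p_alpha_hat = normal_pdf(alpha, RHO * draws[:, 1], S2).mean()
    p_beta_hat = normal_pdf(beta, RHO * draws[:, 0], S2).mean()
    return (p_alpha_hat * normal_pdf(beta, RHO * alpha, S2)
            - p_beta_hat * normal_pdf(alpha, RHO * beta, S2))

draws = gibbs_draws(5000)
eta_hat = difference_criterion(draws, alpha=0.3, beta=-0.2)
```

A value of `eta_hat` close to 0 is what the criterion treats as satisfactory. The evaluation point (0.3, -0.2) is arbitrary, and in practice the difference would be checked at several points.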
4. Zellner and Min (1995) 3/3
- Comments
- Quantitative
- Requires a single sampler chain
- Coding is problem-specific and analytical work is needed
- Disadvantage
- Application is limited when the factorization cannot be achieved.
Comparative results 1/5
- Trivariate normal with high correlations
- Run the samplers for relatively few iterations to test whether these methods detect convergence failure or ambiguity.
Comparative results 2/5
- 1. Gelman and Rubin shrink factors (→ 1)
- 2. Geweke NSE (→ 0)
Comparative results 3/5
- Ritter and Tanner Gibbs Stopper (weights w → constant)
Comparative results 4/5
- Zellner and Min difference convergence criterion (→ 0)
Comparative results 5/5
- Remarks
- Geweke's diagnostic appears to indicate convergence prematurely.
- Gelman and Rubin's method may be consistent with the true behavior of the chains; however, choosing the starting points is critical.
- The results of the other methods are difficult to interpret.
Summary, Discussion, and Recommendation
- Be cautious when using these diagnostics.
- Use a variety of diagnostic tools rather than any single one.
- Learn as much as possible about the target density before applying an MCMC algorithm.