1
Review of Bayesian Analysis and Modeling

James B. Elsner Department of Geography,
Florida State University
2
  • Bayesian analysis and modeling using MCMC is good
    for you because
  • It has a solid decision-theoretical framework.
    Output from a probability model can be linked
    naturally to a utility model.
  • It is intuitive. Combines the prior
    distribution (prior beliefs and/or experience)
    with the likelihood (experiment) to obtain the
    posterior distribution (accumulated information).
    In this context, knowledge is equivalent to a
    distribution, and knowledge precision can be
    quantified by the precision parameter.
  • It is exact. Asymptotic approximations are not
    used. The plug-in principle is avoided.
  • Computations are tractable for practically all
    models.
  • Focus shifts from model estimation to model
    correctness.

3
http://garnet.fsu.edu/jelsner/www/research.html
Anticipating Florida Hurricanes, November 2004.
4
Frequentist vs. Bayesian

Definition of probability
  Frequentist: Long-run expected frequency in
    repeated (actual or hypothetical) experiments.
  Bayesian: Relative degree of belief in the state
    of the world.

Point estimate
  Frequentist: Maximum likelihood estimate.
  Bayesian: Mean, mode, or median of the posterior
    probability distribution.

Confidence intervals of parameters
  Frequentist: Based on the likelihood ratio test,
    which is based on the expected probability
    distribution of the maximum likelihood over many
    experiments.
  Bayesian: Credible intervals based on the
    posterior probability distribution.

Confidence intervals of non-parameters
  Frequentist: Based on the likelihood profile, or
    by resampling from the sampling distribution of
    the parameter.
  Bayesian: Calculated directly from the
    distribution of parameters.

Model selection
  Frequentist: Discard terms that are not
    significantly different from a null (nested)
    model at a previously set confidence level.
  Bayesian: Retain terms in models on the argument
    that processes are not absent simply because
    they are not statistically significant.

Difficulties
  Frequentist: Confidence intervals are confusing
    (the range that will contain the true value in a
    proportion α of repeated experiments); rejection
    of model terms for non-significance.
  Bayesian: Subjectivity; the need to specify
    priors.
5
Why use Bayesian analysis and models?
  • When you have good (informative) prior
    information that you want to use (from previous
    experiments, or reliable information from the
    literature). Frequentist meta-analysis can also
    be used in this case, but the Bayesian approach
    is natural.
  • When you want to specify the distribution of
    non-parameters (future values), Bayesian models
    work well. In particular, if you are going to
    make a practical decision based on your model
    (using decision theory), for example picking a
    management regime that minimizes expected risk
    or maximizes expected returns, having the
    posterior distribution of all the parameters
    makes the decision calculations easier.
  • Complicated models, such as hierarchical models
    or models with missing or unobserved data, can
    be particularly conveniently fit using Bayesian
    methods. Frequentists call these random-effects
    or mixed models and have their own estimation
    procedures.
6
How Gibbs Sampling Works
The contours in the plot represent the joint
distribution of θ, and the labels (0), (1), etc.,
denote the simulated values. Note that one
iteration of the algorithm is complete after both
components are revised. Also notice that each
component is revised along the direction of the
coordinate axes. This feature can be a source of
problems if the two components are correlated,
because then the contours are compressed and
movements along the coordinate axes tend to
produce small moves.
Figure: Gibbs sampling algorithm in two dimensions,
starting from an initial point (0) and completing
three iterations ((1), (2), (3)); the axes are θ1
and θ2.
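As a minimal sketch (not from the original slides), a two-component Gibbs sampler for a standard bivariate normal target; the target distribution, correlation rho, and starting point are assumptions chosen only to illustrate the coordinate-wise updates described above.

```python
import numpy as np

# Minimal Gibbs sampling sketch, assuming a standard bivariate normal target
# with correlation rho (the target and all settings are illustrative assumptions).
rng = np.random.default_rng(42)
rho = 0.9                      # correlation between theta1 and theta2
n_iter = 1000
theta = np.zeros((n_iter + 1, 2))
theta[0] = [-3.0, 3.0]         # over-dispersed starting point (0)

for t in range(1, n_iter + 1):
    # revise theta1 from its full conditional: theta1 | theta2 ~ N(rho*theta2, 1 - rho^2)
    theta[t, 0] = rng.normal(rho * theta[t - 1, 1], np.sqrt(1 - rho ** 2))
    # revise theta2 from its full conditional: theta2 | theta1 ~ N(rho*theta1, 1 - rho^2)
    theta[t, 1] = rng.normal(rho * theta[t, 0], np.sqrt(1 - rho ** 2))
```

With rho near 0.9, each move along a coordinate axis is short, illustrating the compressed-contour problem noted above.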
7
  • After the model has converged, samples from the
    conditional distributions are used to summarize
    the posterior distribution of parameters of
    interest.
  • Convergence refers to the idea that eventually
    the Gibbs samples will reach a stationary
    distribution.
  • The questions are:
  • 1) How many samples are needed before
    convergence? The samples discarded before
    convergence are called the burn-in (see the
    sketch after this list).
  • 2) After convergence, how many samples are
    needed to summarize the posterior distribution?
  • Answers vary from model to model and diagnostics
    are used to examine the samples for convergence.
  • Possible convergence problems:
  • The assumed model may not be realistic from a
    substantive point of view or may not fit.
  • Errors in calculation or programming. Often,
    simple syntax mistakes may be responsible;
    however, it is also possible that the algorithm
    does not converge to a proper distribution.
  • Slow convergence: this is the problem we are
    most likely to run into. The simulation can
    remain for many iterations in a region heavily
    influenced by the starting distribution. If
    these iterations are used to summarize the
    target distribution, they can yield falsely
    precise estimates.
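A hedged sketch of discarding burn-in and summarizing the retained draws; it reuses the `theta` array from the Gibbs sketch above, and the burn-in length of 200 is an arbitrary assumption.

```python
import numpy as np

# Discard the burn-in portion of the chain and summarize the remaining draws.
# `theta` is the (n_iter + 1) x 2 array from the Gibbs sketch above (assumption).
n_burn = 200
kept = theta[n_burn:, 0]                      # post-burn-in draws of theta1

posterior_mean = kept.mean()                  # posterior mean estimate
ci_95 = np.percentile(kept, [2.5, 97.5])      # equal-tailed 95% credible interval
```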

8
  • Diagnostics for examining convergence.
  • One intuitive and easily implemented diagnostic
    tool is a trace plot (or history plot), which
    plots the parameter value as a function of
    sample number. If the model has converged, the
    trace plot will move up and down around the mode
    of the distribution. A clear sign of
    non-convergence is trending in the trace plot.
    In WinBUGS, you can set up trace plots to
    monitor parameters while the program runs (a
    minimal plotting sketch follows the figure
    below).

Figure: trace plots illustrating convergence and non-convergence.
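A minimal plotting sketch of a trace (history) plot for the theta1 draws from the Gibbs sketch above; the use of matplotlib is an assumption, since WinBUGS produces these plots itself.

```python
import matplotlib.pyplot as plt

# Trace plot: parameter value against sample number; a converged chain wanders
# around the mode of the distribution with no long-run trend.
plt.plot(theta[:, 0])
plt.xlabel("sample number")
plt.ylabel("theta1")
plt.show()
```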
9
An autocorrelation plot will reveal whether or
not there is correlation between successive
samples. The presence of correlation indicates
that the samples are not effective in moving
through the entire posterior distribution.
WinBUGS plots the autocorrelation out to 50 lags.
Figure: autocorrelation plots showing no large autocorrelation versus high autocorrelation.
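As a sketch, the sample autocorrelation of the theta1 chain out to 50 lags, mirroring the WinBUGS plot described above; the NumPy-based calculation is an assumption, not WinBUGS's internal method.

```python
import numpy as np

# Sample autocorrelation of the post-burn-in theta1 draws, out to 50 lags.
x = theta[n_burn:, 0] - theta[n_burn:, 0].mean()
acf = np.correlate(x, x, mode="full")[x.size - 1:]   # autocovariances for lags 0, 1, ...
acf = acf[:51] / acf[0]                              # normalize so lag 0 equals 1
```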
10
  • If the model has converged, additional samples
    from a parameter's posterior distribution should
    not influence the calculation of the mean.
    Running means of the samples will reveal whether
    the posterior mean has settled to a particular
    value.
  • A lumpy posterior may indicate non-convergence.
    It may be necessary to let the sampler generate
    additional samples.

Geweke Time-Series Diagnostic: if a model has
converged, then the series of samples from the
first half of the chain will have the same
time-series properties as the series of samples
from the second half of the chain.
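A sketch of a running mean and a simplified first-half versus second-half comparison in the spirit of the Geweke diagnostic; the actual Geweke statistic uses spectral-density-based standard errors, so the plain standard errors used here are an assumption for illustration.

```python
import numpy as np

# Running mean: should settle to a stable value if the posterior mean has converged.
draws = theta[n_burn:, 0]
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

# Simplified Geweke-style comparison of the first and second halves of the chain.
half = draws.size // 2
first, second = draws[:half], draws[half:]
z = (first.mean() - second.mean()) / np.sqrt(
    first.var(ddof=1) / first.size + second.var(ddof=1) / second.size
)
```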
11
Gelman-Rubin Diagnostic: another way to identify
convergence is to simulate multiple sequences
(chains) from different starting points. The
behavior of the chains should be the same; that
is, the variance within the chains should be the
same as the variance across the chains.
Figure: multiple chains illustrating convergence and non-convergence.
12
  • If more than one chain is initialized, the
    Gelman-Rubin statistic is reported in WinBUGS.
  • The statistic is based on the following
    procedure:
  • 1) Estimate your model with a variety of
    different initial values and iterate for an
    n-iteration burn-in and an n-iteration monitored
    period.
  • 2) Take the n monitored draws of m parameters
    and calculate the following statistics (a sketch
    of the standard calculation is given below).
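The statistics themselves appeared as an equation graphic on the original slide; the sketch below gives the standard within-chain and between-chain calculation for a single parameter. The m-by-n array layout and the function name are assumptions.

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin statistic for one parameter; `chains` is an m x n array
    of m parallel chains, each with n monitored draws (assumption)."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    B = n * chain_means.var(ddof=1)              # between-chain variance
    V_hat = (n - 1) / n * W + B / n              # pooled posterior variance estimate
    R_hat = np.sqrt(V_hat / W)                   # potential scale reduction factor R
    return W, V_hat, R_hat
```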

Before convergence, W underestimates the total
posterior variance in θ because the individual
chains have not fully explored the target
distribution. V̂(θ), on the other hand,
overestimates the variance in θ because the
starting points are over-dispersed relative to
the target.
13
Once convergence is reached, W and V̂(θ) should be
almost equivalent, because the variation within
the chains and the variation between the chains
should coincide, so R should be approximately
equal to one. In the WinBUGS plot, the normalized
width of the central 80% interval of the pooled
runs is shown in green, the normalized average
width of the 80% intervals within the individual
runs is shown in blue, and R is shown in red. R
would generally be expected to be greater than 1
if the starting values are suitably
over-dispersed. Brooks and Gelman (1998)
emphasize that one should be concerned both with
convergence of R to 1 and with convergence of the
pooled and within-chain interval widths to
stability.
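As a hedged sketch of the interval-based comparison described above, the ratio of the central 80% interval of the pooled draws to the average 80% interval within individual chains; the function name and the use of raw, unnormalized widths are assumptions for illustration.

```python
import numpy as np

def interval_ratio(chains):
    """Width of the pooled central 80% interval divided by the mean width of
    the within-chain 80% intervals; `chains` is an m x n array (assumption)."""
    pooled_width = np.diff(np.percentile(chains, [10, 90]))[0]
    within = np.percentile(chains, [10, 90], axis=1)      # shape (2, m)
    within_widths = within[1] - within[0]
    return pooled_width / within_widths.mean()
```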
14
Speeding up the sampling: convergence sometimes
requires generating tens or hundreds of thousands
of samples. If this is the case, it is important
to speed up the sampling.
  • Standardize your input variables. Subtract the
    mean and divide by the standard deviation. This
    decreases the correlation between model
    variables, which can decrease the
    autocorrelation between samples (a small sketch
    follows this list).
  • Use the WinBUGS over-relaxation algorithm. This
    generates multiple samples at each iteration and
    then selects one that is negatively correlated
    with the current value. The time per iteration
    will increase, but the within-chain correlations
    should be reduced, and hence fewer iterations
    may be necessary. However, this method is not
    always effective and should be used with
    caution. The autocorrelation function should be
    used to check whether the mixing of the chain
    improves.
  • Pick good initial values. If your initial values
    are near their posterior modes, then convergence
    should occur relatively quickly, especially if
    there is not a big problem with autocorrelation.
  • Just wait. With models involving many
    parameters, it may take hundreds of thousands of
    samples. Using the thin option, you do not need
    to save all the sample values for all the
    parameters.
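A small sketch of standardizing an input variable and thinning a chain; the example predictor values and thinning interval are arbitrary assumptions, and `theta` and `n_burn` come from the sketches above.

```python
import numpy as np

# Standardize a predictor: subtract the mean and divide by the standard deviation.
x = np.array([2.1, 3.4, 0.7, 5.2, 4.0])          # example predictor values (assumption)
x_std = (x - x.mean()) / x.std(ddof=1)

# Thinning: keep every 10th post-burn-in draw instead of saving every sample.
thin = 10
thinned = theta[n_burn::thin, 0]
```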