Inference V: MCMC Methods - PowerPoint PPT Presentation

Slides: 22
Provided by: NirFri5


Stochastic Sampling
  • In the previous class, we examined methods that use
    independent samples to estimate $P(X = x \mid e)$
  • Problem: it is difficult to sample from $P(X_1, \ldots, X_n \mid e)$
  • We had to use likelihood weighting to reweigh our samples
  • This introduced bias in estimation
  • In some cases, such as when the evidence is on
    leaves, these methods are inefficient

MCMC Methods
  • We are going to discuss sampling methods that are
    based on Markov chains
  • Markov Chain Monte Carlo (MCMC) methods
  • Key ideas:
  • Sampling process as a Markov chain
  • Next sample depends on the previous one
  • These can approximate any posterior distribution
  • We start by reviewing key ideas from the theory
    of Markov chains

Markov Chains
  • Suppose $X_1, X_2, \ldots$ take some set of values
  • W.l.o.g., these values are $1, 2, \ldots$
  • A Markov chain is a process that corresponds to
    the network $X_1 \to X_2 \to X_3 \to \cdots$
  • To quantify the chain, we need to specify
  • Initial probability $P(X_1)$
  • Transition probability $P(X_{t+1} \mid X_t)$
  • A Markov chain has stationary transition probability
  • $P(X_{t+1} \mid X_t)$ is the same for all times $t$
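To make the two ingredients concrete, here is a minimal Python sketch that samples a trajectory from a chain given by $P(X_1)$ and one transition table reused at every step. The two-state numbers in `initial` and `transition`, and the helper name `sample_trajectory`, are illustrative assumptions, not taken from the slides.

```python
import random

# Illustrative two-state chain (states 0 and 1); the numbers are assumptions.
initial = [0.5, 0.5]          # P(X_1)
transition = [[0.9, 0.1],     # row 0: P(X_{t+1} = j | X_t = 0)
              [0.5, 0.5]]     # row 1: P(X_{t+1} = j | X_t = 1)


def sample_trajectory(initial, transition, length, rng):
    """Sample X_1, ..., X_length. The same transition table is used at every t,
    which is the 'stationary transition probability' assumption."""
    states = range(len(initial))
    x = rng.choices(states, weights=initial)[0]
    path = [x]
    for _ in range(length - 1):
        x = rng.choices(states, weights=transition[x])[0]
        path.append(x)
    return path


rng = random.Random(0)
print(sample_trajectory(initial, transition, 20, rng))
```

Over a long trajectory, the fraction of moves out of state 0 that land in state 1 should approach the table entry 0.1.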

Irreducible Chains
  • A state $j$ is accessible from state $i$ if there is
    an $n$ such that $P(X_n = j \mid X_1 = i) > 0$
  • There is a positive probability of reaching $j$
    from $i$ after some number of steps
  • A chain is irreducible if every state is
    accessible from every state
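For a finite chain, accessibility only depends on which transition probabilities are nonzero, so it can be checked as graph reachability. A minimal sketch, assuming the chain is given as a list-of-rows transition matrix; the helper names are hypothetical:

```python
from collections import deque


def accessible_from(transition, i):
    """Set of states j reachable from i in one or more transitions,
    i.e. P(X_n = j | X_1 = i) > 0 for some n > 1."""
    reached = set()
    frontier = deque([i])
    while frontier:
        s = frontier.popleft()
        for j, p in enumerate(transition[s]):
            if p > 0 and j not in reached:
                reached.add(j)
                frontier.append(j)
    return reached


def is_irreducible(transition):
    """True if every state is accessible from every state."""
    all_states = set(range(len(transition)))
    return all(accessible_from(transition, i) == all_states
               for i in range(len(transition)))


connected = [[0.9, 0.1], [0.5, 0.5]]
absorbing = [[1.0, 0.0], [0.5, 0.5]]   # state 0 never leaves, so 1 is not accessible from 0
print(is_irreducible(connected), is_irreducible(absorbing))
```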

Ergodic Chains
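  • A state $i$ is positively recurrent if there is a
    finite expected time to get back to state $i$ after
    being in state $i$
  • If $X$ has a finite number of states, then it suffices that
    $i$ is accessible from every state that is accessible from $i$;
    in particular, every state of a finite irreducible chain is
    positively recurrent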
  • A chain is ergodic if it is irreducible and every
    state is positively recurrent
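The "expected time to get back" can be estimated by simulation. A minimal sketch, assuming the same illustrative two-state chain as before and a hypothetical helper `mean_return_time`. For this chain the exact value is $0.9 \cdot 1 + 0.1 \cdot (1 + 2) = 1.2$: with probability 0.9 the chain returns in one step; otherwise it moves to state 1 and waits a geometric number of steps with mean 2.

```python
import random


def mean_return_time(transition, i, trials, rng):
    """Monte Carlo estimate of the expected number of steps to get back to i,
    starting from i. Assumes returns happen with probability 1 (as in a finite
    irreducible chain); otherwise the inner loop may never terminate."""
    states = range(len(transition))
    total_steps = 0
    for _ in range(trials):
        x, steps = i, 0
        while True:
            x = rng.choices(states, weights=transition[x])[0]
            steps += 1
            if x == i:
                break
        total_steps += steps
    return total_steps / trials


transition = [[0.9, 0.1], [0.5, 0.5]]
estimate = mean_return_time(transition, 0, 20000, random.Random(0))
print(estimate)
```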

(A)periodic Chains
  • A state $i$ is periodic if there is an integer $d > 1$
    such that $P(X_{n+1} = i \mid X_1 = i) = 0$ whenever $n$ is not
    divisible by $d$
  • A chain is aperiodic if it contains no periodic states

Stationary Probabilities
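  • Thm:
  • If a chain is ergodic and aperiodic, then the
    limit $\lim_{n \to \infty} P(X_n = j \mid X_1 = i)$ exists, and does not depend on $i$
  • Moreover, let $P^*(X = j) = \lim_{n \to \infty} P(X_n = j \mid X_1 = i)$; then $P^*(X)$ is the unique
    probability satisfying
    $P^*(X = j) = \sum_i P^*(X = i)\, P(X_{t+1} = j \mid X_t = i)$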
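Both parts of the theorem can be checked numerically on a small chain: propagate the distribution of $X_n$ from different starting states and confirm that the results agree and satisfy the fixed-point equation. A minimal sketch with the illustrative two-state chain; solving the equation by hand gives $0.1\,P^*(0) = 0.5\,P^*(1)$, so $P^* = (5/6, 1/6)$. The helper names and `iters` are assumptions.

```python
def step(dist, transition):
    """One transition: new[j] = sum_i dist[i] * P(X_{t+1} = j | X_t = i)."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def limit_distribution(transition, start, iters=1000):
    """Approximate lim_n P(X_n = j | X_1 = start) by propagating the
    point mass on X_1 = start forward iters times."""
    dist = [1.0 if s == start else 0.0 for s in range(len(transition))]
    for _ in range(iters):
        dist = step(dist, transition)
    return dist


transition = [[0.9, 0.1], [0.5, 0.5]]
from_0 = limit_distribution(transition, 0)
from_1 = limit_distribution(transition, 1)
print(from_0, from_1)
```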

Stationary Probabilities
  • The limiting probability $P^*(X)$ is the stationary
    probability of the process
  • Regardless of the starting point, the process
    will converge to this probability
  • The rate of convergence depends on properties of
    the transition probability
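To see how the transition probability controls the rate, compare two chains with the same stationary probability $(5/6, 1/6)$. For a two-state chain with $P(0 \to 1) = a$ and $P(1 \to 0) = b$, the distance to $P^*$ shrinks by a factor $|1 - a - b|$ per step, so the chain with factor 0.94 converges far more slowly than the one with factor 0.4. A minimal sketch; the chains and helper names are illustrative assumptions:

```python
def step(dist, transition):
    """One transition: new[j] = sum_i dist[i] * P(X_{t+1} = j | X_t = i)."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def tv_distance(p, q):
    """Total variation distance between two distributions over the same states."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def distance_trace(transition, start, target, steps):
    """TV distance between P(X_{n+1} | X_1 = start) and target, for n = 1..steps."""
    dist = [1.0 if s == start else 0.0 for s in range(len(transition))]
    trace = []
    for _ in range(steps):
        dist = step(dist, transition)
        trace.append(tv_distance(dist, target))
    return trace


target = [5 / 6, 1 / 6]                   # stationary for both chains below
fast = [[0.9, 0.1], [0.5, 0.5]]           # 1 - a - b = 0.4
slow = [[0.99, 0.01], [0.05, 0.95]]       # 1 - a - b = 0.94
fast_trace = distance_trace(fast, 0, target, 10)
slow_trace = distance_trace(slow, 0, target, 10)
print(fast_trace[-1], slow_trace[-1])
```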

Sampling from the stationary probability
  • This theory suggests how to sample from the
    stationary probability
  • Set $X_1 = i$, for some random/arbitrary $i$
  • For $t = 1, 2, \ldots, n - 1$
  • Sample a value $x_{t+1}$ for $X_{t+1}$ from $P(X_{t+1} \mid X_t = x_t)$
  • Return $x_n$
  • If $n$ is large enough, then this is a sample from
    the stationary probability
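The procedure translates almost line by line into code. A minimal sketch, assuming the illustrative two-state chain (stationary probability $(5/6, 1/6)$) and $n = 50$, which is ample here because the distance to $P^*$ shrinks by 0.4 per step. Repeating the procedure many times should give state 0 about 5/6 of the time.

```python
import random
from collections import Counter


def sample_stationary(transition, n, rng):
    """Set X_1 to a random state, simulate X_2, ..., X_n, and return x_n."""
    states = range(len(transition))
    x = rng.randrange(len(transition))      # X_1 = i for a random i
    for _ in range(n - 1):                   # t = 1, ..., n - 1
        x = rng.choices(states, weights=transition[x])[0]
    return x


transition = [[0.9, 0.1], [0.5, 0.5]]
rng = random.Random(0)
draws = [sample_stationary(transition, 50, rng) for _ in range(20000)]
print(Counter(draws)[0] / len(draws))        # should be close to 5/6
```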

Designing Markov Chains
  • How do we construct the right chain to sample from?
  • Ensuring aperiodicity and irreducibility is
    usually easy
  • The problem is ensuring the desired stationary
    probability

Designing Markov Chains
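  • Key tool:
  • If the transition probability satisfies, for all states $i \neq j$,
    $\frac{P(X_{t+1} = i \mid X_t = j)}{P(X_{t+1} = j \mid X_t = i)} = \frac{Q(X = i)}{Q(X = j)}$,
    then $P^*(X) = Q(X)$, i.e., $Q$ is the stationary probability
  • This gives a local criterion for checking that the
    chain will have the right stationary distribution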
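Why the criterion works: cross-multiplying gives $Q(X = j)\,P(X_{t+1} = i \mid X_t = j) = Q(X = i)\,P(X_{t+1} = j \mid X_t = i)$, and summing over $i$ yields $\sum_i Q(X = i)\,P(X_{t+1} = j \mid X_t = i) = Q(X = j)$, which is the stationarity equation. A minimal sketch that checks both conditions on small chains; the helper names and tolerance are assumptions. The deterministic 3-cycle shows that the criterion is sufficient but not necessary: the uniform distribution is stationary there, yet fails the check.

```python
def satisfies_ratio_criterion(q, transition, tol=1e-12):
    """Check Q(X=j) P(X_{t+1}=i | X_t=j) == Q(X=i) P(X_{t+1}=j | X_t=i) for all i, j
    (the ratio condition with denominators cleared)."""
    n = len(q)
    return all(abs(q[j] * transition[j][i] - q[i] * transition[i][j]) <= tol
               for i in range(n) for j in range(n))


def is_stationary(q, transition, tol=1e-12):
    """Check Q(X=j) == sum_i Q(X=i) P(X_{t+1}=j | X_t=i) for every j."""
    n = len(q)
    return all(abs(sum(q[i] * transition[i][j] for i in range(n)) - q[j]) <= tol
               for j in range(n))


q = [5 / 6, 1 / 6]
lazy = [[0.9, 0.1], [0.5, 0.5]]
cycle3 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # deterministic 0 -> 1 -> 2 -> 0
uniform = [1 / 3, 1 / 3, 1 / 3]
print(satisfies_ratio_criterion(q, lazy), is_stationary(q, lazy))
print(satisfies_ratio_criterion(uniform, cycle3), is_stationary(uniform, cycle3))
```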

MCMC Methods
  • We can use these results to sample from
    $P(X_1, \ldots, X_n \mid e)$
  • Idea:
  • Construct an ergodic, aperiodic Markov chain
    such that its stationary probability is $P^*(X_1, \ldots, X_n) = P(X_1, \ldots, X_n \mid e)$
  • Simulate the chain $n$ steps to get a sample

MCMC Methods
  • Notes:
  • The Markov chain variable $Y$ takes as values
    assignments to all variables that are consistent with the evidence
  • For simplicity, we will denote such a state using
    the vector of variables

Gibbs Sampler
  • One of the simplest MCMC methods
  • At each transition, change the state of just one $X_i$
  • We can describe the transition probability as a
    stochastic procedure:
  • Input: a state $x_1, \ldots, x_n$
  • Choose $i$ at random (using uniform probability)
  • Sample $x'_i$ from $P(X_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)$
  • Let $x'_j = x_j$ for all $j \neq i$
  • Return $x'_1, \ldots, x'_n$
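The procedure above can be sketched directly when the joint distribution is small enough to store as a table. This is only an illustration: the unnormalized joint `weights` over three binary variables and the evidence $X_3 = 1$ are hypothetical, and the conditional is obtained by normalizing the joint over the two values of $X_i$ (a Bayesian network would compute it locally instead). Visit frequencies are compared with the exact posterior, which here is 0.1, 0.3, 0.2, 0.4.

```python
import random

# Hypothetical unnormalized joint over three binary variables (x1, x2, x3).
weights = {
    (0, 0, 0): 4, (0, 0, 1): 1, (0, 1, 0): 2, (0, 1, 1): 3,
    (1, 0, 0): 1, (1, 0, 1): 2, (1, 1, 0): 3, (1, 1, 1): 4,
}
evidence = {2: 1}                        # observed: X3 = 1 (0-based index 2)
free = [i for i in range(3) if i not in evidence]


def gibbs_transition(state, rng):
    """One Gibbs transition: pick a non-evidence X_i uniformly, resample it
    from P(X_i | all other variables, e), and leave every other x_j unchanged."""
    i = rng.choice(free)
    candidates = [state[:i] + (v,) + state[i + 1:] for v in (0, 1)]
    # P(X_i = v | rest, e) is proportional to the joint with X_i set to v.
    return rng.choices(candidates, weights=[weights[c] for c in candidates])[0]


def estimate_posterior(steps, rng):
    """Visit frequencies of one long run (no burn-in, for simplicity)."""
    state = (0, 0, 1)                    # any state consistent with the evidence
    counts = {}
    for _ in range(steps):
        state = gibbs_transition(state, rng)
        counts[state] = counts.get(state, 0) + 1
    return {s: c / steps for s, c in counts.items()}


estimate = estimate_posterior(100000, random.Random(0))
consistent = {s: w for s, w in weights.items() if s[2] == 1}
exact = {s: w / sum(consistent.values()) for s, w in consistent.items()}
print(estimate)
print(exact)
```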

Correctness of Gibbs Sampler
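  • By the chain rule,
    $P(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n \mid e) = P(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n \mid e)\, P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)$
  • The first factor does not involve $x_i$, so it is the same for any value $x'_i$; thus, we get
    $\frac{P(x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_n \mid e)}{P(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n \mid e)} = \frac{P(x'_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)}{P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)}$
  • Since we choose $i$ from the same distribution at
    each stage, this procedure satisfies the ratio criterion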
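The argument can be checked exactly on a tiny example by building the full Gibbs transition probability and testing the ratio criterion in cross-multiplied form with exact fractions. The uniform factor $1/n$ for choosing $i$ appears in both directions and cancels. A minimal sketch; the posterior weights over two binary variables are hypothetical, with the evidence already fixed and left out of the state.

```python
from fractions import Fraction
from itertools import product

# Hypothetical posterior weights P(x1, x2 | e), up to normalization.
weights = {(0, 0): 1, (0, 1): 3, (1, 0): 2, (1, 1): 4}
states = sorted(weights)
n_vars = 2
total = sum(weights.values())
target = {s: Fraction(w, total) for s, w in weights.items()}


def conditional(i, x):
    """P(X_i = x_i | x_{-i}, e), computed by normalizing over the values of X_i."""
    norm = sum(weights[x[:i] + (v,) + x[i + 1:]] for v in (0, 1))
    return Fraction(weights[x], norm)


def gibbs_kernel(x, y):
    """T(x -> y): choose i uniformly, then resample X_i from its conditional."""
    prob = Fraction(0)
    for i in range(n_vars):
        if all(x[j] == y[j] for j in range(n_vars) if j != i):
            prob += Fraction(1, n_vars) * conditional(i, y)
    return prob


ratio_ok = all(target[x] * gibbs_kernel(x, y) == target[y] * gibbs_kernel(y, x)
               for x, y in product(states, repeat=2))
stationary_ok = all(sum(target[x] * gibbs_kernel(x, y) for x in states) == target[y]
                    for y in states)
print(ratio_ok, stationary_ok)   # True True
```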

Gibbs Sampling for Bayesian Network
  • Why is the Gibbs sampler easy in BNs?
  • Recall that the Markov blanket of a variable
    separates it from the other variables in the network
  • $P(X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) = P(X_i \mid \mathit{Mb}_i)$
  • This property allows us to use local computations
    to perform sampling in each transition

Gibbs Sampling in Bayesian Networks
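  • How do we evaluate $P(X_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$?
  • Let $Y_1, \ldots, Y_k$ be the children of $X_i$
  • By definition of $\mathit{Mb}_i$, the parents of $Y_j$ are in
    $\mathit{Mb}_i \cup \{X_i\}$
  • It is easy to show that
    $P(x_i \mid \mathit{Mb}_i) = \frac{P(x_i \mid \mathrm{pa}_i) \prod_j P(y_j \mid \mathrm{pa}_{Y_j})}{\sum_{x'_i} P(x'_i \mid \mathrm{pa}_i) \prod_j P(y_j \mid \mathrm{pa}_{Y_j})}$,
    where in the denominator $X_i$ takes the value $x'_i$ inside each $\mathrm{pa}_{Y_j}$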
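The formula only touches the CPT of $X_i$ and the CPTs of its children, which is what makes each Gibbs transition local. A minimal sketch, assuming a hypothetical four-node binary network C → S, C → R, S → W, R → W with illustrative CPT numbers; the local computation is compared against normalizing the full joint (the product of all CPTs).

```python
# Hypothetical network and CPTs: cpt[var][parent values] = P(var = 1 | parents).
parents = {"C": (), "S": ("C",), "R": ("C",), "W": ("S", "R")}
cpt = {
    "C": {(): 0.5},
    "S": {(0,): 0.5, (1,): 0.1},
    "R": {(0,): 0.2, (1,): 0.8},
    "W": {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99},
}
children = {v: [c for c, ps in parents.items() if v in ps] for v in parents}


def local_prob(var, assignment):
    """P(var = assignment[var] | parents of var), read from var's CPT."""
    p1 = cpt[var][tuple(assignment[p] for p in parents[var])]
    return p1 if assignment[var] == 1 else 1.0 - p1


def markov_blanket_conditional(var, assignment):
    """[P(var = 0 | rest), P(var = 1 | rest)] using only the CPTs of var and its children."""
    scores = []
    for v in (0, 1):
        a = dict(assignment, **{var: v})
        score = local_prob(var, a)
        for child in children[var]:
            score *= local_prob(child, a)
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]


def brute_force_conditional(var, assignment):
    """Same conditional from the full joint (product of all CPTs), for comparison."""
    scores = []
    for v in (0, 1):
        a = dict(assignment, **{var: v})
        joint = 1.0
        for u in parents:
            joint *= local_prob(u, a)
        scores.append(joint)
    total = sum(scores)
    return [s / total for s in scores]


x = {"C": 1, "S": 0, "R": 1, "W": 1}
print(markov_blanket_conditional("S", x), brute_force_conditional("S", x))
```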

Sampling Strategy
  • How do we collect the samples?
  • Strategy I:
  • Run the chain M times, each run for N steps
  • Each run starts from a different starting point
  • Return the last state in each run

[Figure: M chains]

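Strategy I as a minimal sketch, reusing the illustrative two-state chain (stationary probability $(5/6, 1/6)$). The values M = 5000 and N = 30 are assumptions chosen for this small chain, not recommendations.

```python
import random
from collections import Counter


def strategy_one(transition, num_chains, num_steps, rng):
    """Strategy I: run num_chains independent chains (M), each for num_steps
    transitions (N) from its own random start, and keep only each final state."""
    states = range(len(transition))
    samples = []
    for _ in range(num_chains):
        x = rng.randrange(len(transition))
        for _ in range(num_steps):
            x = rng.choices(states, weights=transition[x])[0]
        samples.append(x)
    return samples


transition = [[0.9, 0.1], [0.5, 0.5]]
samples = strategy_one(transition, num_chains=5000, num_steps=30, rng=random.Random(0))
print(Counter(samples)[0] / len(samples))   # close to 5/6
```

Every one of the M samples costs N transitions, which is the "burn-in for each chain" cost noted when comparing the strategies.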
Sampling Strategy
  • Strategy II:
  • Run one chain for a long time
  • After some burn-in period, sample points every
    fixed number of steps

[Figure: burn in, then M samples from one chain]

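Strategy II as a minimal sketch, with a helper that measures how correlated consecutive kept samples are. For the illustrative two-state chain, states $k$ steps apart have correlation $0.4^k$, so keeping every state gives a lag-one correlation near 0.4 while keeping every 10th state gives one near 0. The burn-in of 100 and the intervals are assumptions.

```python
import random


def strategy_two(transition, burn_in, num_samples, interval, rng):
    """Strategy II: one long chain; discard burn_in transitions, then keep
    one state every `interval` transitions."""
    states = range(len(transition))
    x = rng.randrange(len(transition))
    for _ in range(burn_in):
        x = rng.choices(states, weights=transition[x])[0]
    samples = []
    for _ in range(num_samples):
        for _ in range(interval):
            x = rng.choices(states, weights=transition[x])[0]
        samples.append(x)
    return samples


def lag_one_correlation(xs):
    """Sample correlation between consecutive kept samples."""
    mean = sum(xs) / len(xs)
    var = sum((v - mean) ** 2 for v in xs)
    cov = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    return cov / var


transition = [[0.9, 0.1], [0.5, 0.5]]
every_step = strategy_two(transition, 100, 20000, 1, random.Random(0))
spaced_out = strategy_two(transition, 100, 20000, 10, random.Random(1))
print(lag_one_correlation(every_step), lag_one_correlation(spaced_out))
```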
Comparing Strategies
  • Strategy I:
  • Better chance of covering the space of
    points, especially if the chain is slow to reach stationarity
  • Have to perform burn-in steps for each chain
  • Strategy II:
  • Perform burn-in only once
  • Samples might be correlated (although only weakly)
  • Hybrid strategy:
  • Run several chains, and take a few samples from each
  • Combines benefits of both strategies
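A minimal sketch of the hybrid strategy on the same illustrative chain; all settings are assumptions. With 500 chains, a burn-in of 30, and 10 samples per chain spaced 5 apart, it uses $500 \times (30 + 10 \times 5) = 40000$ transitions for 5000 samples, compared with $5000 \times 30 = 150000$ for the Strategy I settings used earlier.

```python
import random
from collections import Counter


def hybrid(transition, num_chains, burn_in, per_chain, interval, rng):
    """Hybrid: several independent chains, each with its own burn-in, and a
    few spaced-out samples taken from each."""
    states = range(len(transition))
    samples = []
    for _ in range(num_chains):
        x = rng.randrange(len(transition))
        for _ in range(burn_in):
            x = rng.choices(states, weights=transition[x])[0]
        for _ in range(per_chain):
            for _ in range(interval):
                x = rng.choices(states, weights=transition[x])[0]
            samples.append(x)
    return samples


transition = [[0.9, 0.1], [0.5, 0.5]]
samples = hybrid(transition, num_chains=500, burn_in=30, per_chain=10,
                 interval=5, rng=random.Random(0))
print(len(samples), Counter(samples)[0] / len(samples))
```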