Markov Networks - PowerPoint PPT Presentation

Slides: 13
Provided by: MattRic3


1
Markov Networks
2
Markov Networks
  • Undirected graphical models

[Figure: undirected graph with nodes A, B, C, D]
  • Potential functions defined over cliques
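The factorization can be checked by brute force on a tiny network. The sketch below is minimal and makes several assumptions. The variables are binary. The figure's edges are not recoverable, so it uses a hypothetical 4-cycle A-B, B-D, D-C, C-A, whose cliques are its edges. The potential table is illustrative. The code computes $P(x) = \frac{1}{Z}\prod_c \phi_c(x_c)$ with $Z = \sum_x \prod_c \phi_c(x_c)$.

```python
import itertools

# ASSUMPTION: the slide's figure shows nodes A, B, C, D but its edges are not
# recoverable, so this hypothetical example uses the 4-cycle A-B, B-D, D-C, C-A.
# Its maximal cliques are the four edges, each with a pairwise potential.
VARS = ["A", "B", "C", "D"]
CLIQUES = [("A", "B"), ("B", "D"), ("D", "C"), ("C", "A")]


def phi(u, v):
    """Illustrative pairwise potential (any non-negative table is allowed);
    this one favours neighbours that agree."""
    return 3.0 if u == v else 1.0


def unnormalized(x):
    """Product of clique potentials for a full assignment x (a dict)."""
    p = 1.0
    for a, b in CLIQUES:
        p *= phi(x[a], x[b])
    return p


# Partition function Z: the unnormalized product summed over all 2^4 states.
STATES = [dict(zip(VARS, bits)) for bits in itertools.product((0, 1), repeat=len(VARS))]
Z = sum(unnormalized(x) for x in STATES)


def prob(x):
    """P(x) = (1/Z) * product over cliques c of phi_c(x_c)."""
    return unnormalized(x) / Z


print(Z)                                       # 272.0
print(prob({"A": 0, "B": 0, "C": 0, "D": 0}))  # 81/272, about 0.298
```

Enumerating all states is only feasible for toy networks. Computing Z in general is the hard part of Markov network inference.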

3
Markov Networks
  • Undirected graphical models

[Figure: undirected graph with nodes A, B, C, D]
  • Potential functions defined over cliques

  • $P(x) = \frac{1}{Z}\exp\left(\sum_i w_i f_i(x)\right)$, where $w_i$ is the weight of feature $i$ and $f_i(x)$ is feature $i$
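A strictly positive potential can always be written as $\phi_c(x_c) = \exp(w \cdot f(x_c))$, which is how the log-linear form arises. The minimal sketch below assumes hypothetical features, since the slide names none concrete. Each feature is the indicator "these two neighbours agree" on an assumed 4-cycle over binary A, B, C, D. With $w_i = \ln 3$, an agreeing pair contributes $e^{\ln 3} = 3$ and a disagreeing pair contributes $e^0 = 1$. The result is therefore the same distribution as a pairwise table with entries 3 and 1.

```python
import itertools
import math

# ASSUMPTION: hypothetical features over binary A, B, C, D (the slide names no
# concrete features). Feature i is the indicator "these two neighbours agree";
# WEIGHTS[i] is its weight w_i.
VARS = ["A", "B", "C", "D"]
FEATURES = [("A", "B"), ("B", "D"), ("D", "C"), ("C", "A")]
WEIGHTS = [math.log(3.0)] * len(FEATURES)  # illustrative values


def feature_values(x):
    """f_i(x) for every feature i (1.0 if the pair agrees, else 0.0)."""
    return [1.0 if x[a] == x[b] else 0.0 for a, b in FEATURES]


def score(x):
    """The exponent sum_i w_i * f_i(x)."""
    return sum(w * f for w, f in zip(WEIGHTS, feature_values(x)))


STATES = [dict(zip(VARS, bits)) for bits in itertools.product((0, 1), repeat=len(VARS))]
Z = sum(math.exp(score(x)) for x in STATES)


def prob(x):
    """P(x) = exp(sum_i w_i f_i(x)) / Z."""
    return math.exp(score(x)) / Z
```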
4
Hammersley-Clifford Theorem
  • If the distribution is strictly positive ($P(x) > 0$)
  • and the graph encodes its conditional independences,
  • then the distribution is a product of potentials over
    the cliques of the graph.
  • The converse is also true.
  • (Markov network = Gibbs distribution)
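The direction "factorizes over cliques implies obeys graph separation" can be checked numerically. The sketch below assumes a hypothetical 4-cycle A-B, B-D, D-C, C-A, in which {B, C} separates A from D, and uses random strictly positive pairwise potentials. It tests whether $P(a, d \mid b, c) = P(a \mid b, c)\,P(d \mid b, c)$ holds for every value. The opposite direction, from independences to factorization, is where positivity is needed, and it is not shown here.

```python
import itertools
import random

# ASSUMPTION: hypothetical graph, the 4-cycle A-B, B-D, D-C, C-A, in which
# {B, C} separates A from D; potentials are random but strictly positive.
random.seed(0)
EDGES = [("A", "B"), ("B", "D"), ("D", "C"), ("C", "A")]
PHI = {e: {(u, v): random.uniform(0.1, 5.0) for u in (0, 1) for v in (0, 1)}
       for e in EDGES}


def joint():
    """Normalized P(A, B, C, D) from the clique factorization, keyed by (a, b, c, d)."""
    table = {}
    for a, b, c, d in itertools.product((0, 1), repeat=4):
        x = {"A": a, "B": b, "C": c, "D": d}
        p = 1.0
        for u, v in EDGES:
            p *= PHI[(u, v)][(x[u], x[v])]
        table[(a, b, c, d)] = p
    z = sum(table.values())
    return {k: p / z for k, p in table.items()}


def a_indep_d_given_bc(P, tol=1e-12):
    """Check P(a, d | b, c) == P(a | b, c) * P(d | b, c) for every value."""
    for b, c in itertools.product((0, 1), repeat=2):
        p_bc = sum(P[(a, b, c, d)] for a in (0, 1) for d in (0, 1))
        for a, d in itertools.product((0, 1), repeat=2):
            p_ad = P[(a, b, c, d)] / p_bc
            p_a = sum(P[(a, b, c, dd)] for dd in (0, 1)) / p_bc
            p_d = sum(P[(aa, b, c, d)] for aa in (0, 1)) / p_bc
            if abs(p_ad - p_a * p_d) > tol:
                return False
    return True


print(a_indep_d_given_bc(joint()))  # True: separation in the graph implies independence
```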

5
Markov Nets vs. Bayes Nets
Property          Markov Nets         Bayes Nets
Form              Prod. potentials    Prod. potentials
Potentials        Arbitrary           Cond. probabilities
Cycles            Allowed             Forbidden
Partition func.   Z = ?               Z = 1
Indep. check      Graph separation    D-separation
Indep. props.     Some                Some
Inference         MCMC, BP, etc.      Convert to Markov
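The "Partition func." row can be seen on two nodes. In this hypothetical sketch, all numbers are invented for illustration. A Bayes net A -> B multiplies conditional probabilities, which sum to 1 automatically. A Markov net A - B multiplies an arbitrary potential table, whose total must be computed to obtain Z.

```python
import itertools

# ASSUMPTION: all numbers below are invented for illustration.
# Bayes net A -> B: factors are conditional probabilities.
P_A = {0: 0.7, 1: 0.3}
P_B_GIVEN_A = {(0, 0): 0.9, (1, 0): 0.1,   # keys are (b, a)
               (0, 1): 0.4, (1, 1): 0.6}

# Markov net A - B: the single potential is an arbitrary non-negative table.
PHI_AB = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}  # keys are (a, b)


def bayes_z():
    """Sum over all states of the product of CPDs: always 1."""
    return sum(P_A[a] * P_B_GIVEN_A[(b, a)] for a, b in itertools.product((0, 1), repeat=2))


def markov_z():
    """Sum over all states of the product of potentials: must be computed."""
    return sum(PHI_AB[(a, b)] for a, b in itertools.product((0, 1), repeat=2))


print(bayes_z())   # 1.0 up to floating-point rounding
print(markov_z())  # 9.0, so P(a, b) = PHI_AB[(a, b)] / 9.0
```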
6
Inference in Markov Networks
  • Goal: compute marginals and conditionals of the distribution defined by the network
  • Exact inference is #P-complete
  • Conditioning on the Markov blanket (a node's neighbors) is easy
  • Gibbs sampling exploits this
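Conditioning on the Markov blanket is easy because every potential that does not touch $x_i$ cancels. As a result, $P(x_i \mid \text{rest})$ depends only on $x_i$'s neighbors and never needs Z. The sketch below assumes a hypothetical binary pairwise network on a 4-cycle with random potentials. It compares the local computation with the same conditional taken from the full joint.

```python
import itertools
import random

# ASSUMPTION: hypothetical binary pairwise network on the 4-cycle
# 0-1, 1-3, 3-2, 2-0 with random strictly positive potential tables.
random.seed(1)
N = 4
EDGES = [(0, 1), (1, 3), (3, 2), (2, 0)]
PHI = {e: [[random.uniform(0.5, 4.0) for _ in range(2)] for _ in range(2)] for e in EDGES}


def unnormalized(x):
    """Product of all edge potentials (no Z)."""
    p = 1.0
    for i, j in EDGES:
        p *= PHI[(i, j)][x[i]][x[j]]
    return p


def with_value(x, i, v):
    """Copy of assignment x with variable i set to v."""
    y = list(x)
    y[i] = v
    return y


def local_conditional(i, x):
    """P(x_i = 1 | Markov blanket), using only the potentials that touch x_i."""
    w = []
    for v in (0, 1):
        y = with_value(x, i, v)
        p = 1.0
        for a, b in EDGES:
            if i in (a, b):
                p *= PHI[(a, b)][y[a]][y[b]]
        w.append(p)
    return w[1] / (w[0] + w[1])


def brute_force_conditional(i, x):
    """The same conditional from the full (unnormalized) joint, for comparison."""
    p0 = unnormalized(with_value(x, i, 0))
    p1 = unnormalized(with_value(x, i, 1))
    return p1 / (p0 + p1)


for x in itertools.product((0, 1), repeat=N):
    for i in range(N):
        assert abs(local_conditional(i, x) - brute_force_conditional(i, x)) < 1e-12
```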

7
Markov Chain Monte Carlo
  • Gibbs Sampler
  • 1. Start with an initial assignment to nodes
  • 2. One node at a time, sample node given
    others
  • 3. Repeat
  • 4. Use samples to compute P(X)
  • Convergence: burn-in + mixing time
  • Many modes → multiple chains
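The four sampler steps map directly onto code. The minimal sketch below assumes a hypothetical binary pairwise network that is small enough to check against exact enumeration. The sample count, burn-in, and seed are illustrative rather than tuned, and only a single chain is run.

```python
import itertools
import random

# ASSUMPTION: hypothetical binary pairwise network on the 4-cycle
# 0-1, 1-3, 3-2, 2-0, plus a unary potential favouring x0 = 1.
N = 4
EDGES = [(0, 1), (1, 3), (3, 2), (2, 0)]
UNARY0 = {0: 1.0, 1: 3.0}


def phi(u, v):
    return 2.0 if u == v else 1.0


def unnormalized(x):
    p = UNARY0[x[0]]
    for i, j in EDGES:
        p *= phi(x[i], x[j])
    return p


def gibbs_marginals(num_samples=20000, burn_in=1000, seed=0):
    """Estimate P(X_k = 1) for every node k with a Gibbs sampler."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(N)]     # 1. initial assignment
    counts = [0] * N
    for sweep in range(burn_in + num_samples):     # 3. repeat
        for k in range(N):                          # 2. one node at a time
            # Only the ratio of these two values matters, so Z is never needed.
            # (A real sampler would use just the potentials touching node k.)
            w = []
            for v in (0, 1):
                x[k] = v
                w.append(unnormalized(x))
            x[k] = 1 if rng.random() < w[1] / (w[0] + w[1]) else 0
        if sweep >= burn_in:                        # 4. use post-burn-in samples
            for k in range(N):
                counts[k] += x[k]
    return [c / num_samples for c in counts]


def exact_marginals():
    """Brute-force marginals for comparison (feasible only for tiny networks)."""
    states = list(itertools.product((0, 1), repeat=N))
    z = sum(unnormalized(s) for s in states)
    return [sum(unnormalized(s) for s in states if s[k] == 1) / z for k in range(N)]
```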

8
Other Inference Methods
  • Belief propagation (sum-product)
  • Mean field / Variational approximations
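Sum-product belief propagation passes messages along edges, and on a tree it yields exact marginals. The minimal sketch below uses an assumed chain x0 - x1 - x2 - x3 - x4 of binary variables with random potentials; the model is hypothetical. It is checked against brute-force enumeration. On graphs with cycles, the same updates give only approximate ("loopy") results.

```python
import itertools
import random

# ASSUMPTION: hypothetical chain x0 - x1 - x2 - x3 - x4 of binary variables
# with random positive pairwise potentials; PSI[i][a][b] links x_i = a, x_{i+1} = b.
random.seed(2)
N = 5
PSI = [[[random.uniform(0.5, 3.0) for _ in range(2)] for _ in range(2)] for _ in range(N - 1)]


def bp_marginals():
    """Sum-product message passing on the chain, then normalized beliefs."""
    fwd = [[1.0, 1.0] for _ in range(N)]   # fwd[i][v]: message into x_i from the left
    for i in range(1, N):
        fwd[i] = [sum(fwd[i - 1][a] * PSI[i - 1][a][b] for a in (0, 1)) for b in (0, 1)]
    bwd = [[1.0, 1.0] for _ in range(N)]   # bwd[i][v]: message into x_i from the right
    for i in range(N - 2, -1, -1):
        bwd[i] = [sum(PSI[i][a][b] * bwd[i + 1][b] for b in (0, 1)) for a in (0, 1)]
    marginals = []
    for i in range(N):
        belief = [fwd[i][v] * bwd[i][v] for v in (0, 1)]
        total = sum(belief)
        marginals.append([b / total for b in belief])
    return marginals


def brute_force_marginals():
    """Exact marginals by enumerating all 2^N states, for comparison."""
    sums = [[0.0, 0.0] for _ in range(N)]
    for x in itertools.product((0, 1), repeat=N):
        p = 1.0
        for i in range(N - 1):
            p *= PSI[i][x[i]][x[i + 1]]
        for i in range(N):
            sums[i][x[i]] += p
    return [[s / sum(row) for s in row] for row in sums]
```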

9
MAP Inference
  • Iterated conditional modes
  • Simulated annealing
  • Graph cuts
  • Belief propagation (max-product)
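Iterated conditional modes, the first method listed, can be sketched briefly. Each variable is set in turn to its best value given all the others. The sweeps repeat until no single change improves the score. The model below is hypothetical: a 4-cycle with random log-potentials. ICM only guarantees a local optimum, so the sketch also finds the true MAP by brute force for comparison.

```python
import itertools
import random

# ASSUMPTION: hypothetical binary pairwise network on the 4-cycle
# 0-1, 1-3, 3-2, 2-0 with random log-potential tables.
random.seed(3)
N = 4
EDGES = [(0, 1), (1, 3), (3, 2), (2, 0)]
LOG_PHI = {e: [[random.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)] for e in EDGES}


def log_score(x):
    """Log of the unnormalized probability; MAP maximizes this, so Z is irrelevant."""
    return sum(LOG_PHI[(i, j)][x[i]][x[j]] for i, j in EDGES)


def icm(x):
    """Iterated conditional modes: set each node to its best value given the
    others (for binary nodes, flip it if that strictly helps) until no change."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        for k in range(N):
            flipped = x[:k] + [1 - x[k]] + x[k + 1:]
            if log_score(flipped) > log_score(x):
                x = flipped
                improved = True
    return x


start = [0] * N
result = icm(start)
best = max(itertools.product((0, 1), repeat=N), key=log_score)  # brute force, for comparison
print(result, list(best))  # ICM may stop at a local optimum that differs from best
```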

10
Learning Weights
  • Maximize likelihood (or posterior)
  • Convex optimization: gradient ascent,
    quasi-Newton methods, etc.
  • Requires inference at each step (slow!)

  • $\frac{\partial}{\partial w_i}\log P_w(x) = f_i(x) - E_w[f_i(x)]$: feature count according to data minus feature count according to model
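The gradient can be followed directly on a tiny model where exact inference by enumeration is feasible. The sketch below uses hypothetical features f0 = x0, f1 = x1, f2 = x0 AND x1, with invented data. The gradient of the average log-likelihood is the average data count minus the model's expected count. Computing the model's expected count is the inference step that makes this slow on real networks. With these three features the model is saturated over two binary variables, so the learned distribution should match the data's empirical frequencies (0.4 for state (1, 1)).

```python
import itertools
import math

# ASSUMPTION: hypothetical toy problem with two binary variables and features
# f0 = x0, f1 = x1, f2 = x0 AND x1; the data below are invented.
STATES = list(itertools.product((0, 1), repeat=2))
DATA = [(1, 1)] * 4 + [(1, 0)] * 2 + [(0, 1)] * 1 + [(0, 0)] * 3


def features(x):
    return [x[0], x[1], x[0] * x[1]]


def model_distribution(w):
    """Exact P_w(x) by enumeration: the 'inference at each step'."""
    scores = [math.exp(sum(wi * fi for wi, fi in zip(w, features(x)))) for x in STATES]
    z = sum(scores)
    return {x: s / z for x, s in zip(STATES, scores)}


def expected_counts(w):
    """E_w[f_i]: feature counts according to the model."""
    p = model_distribution(w)
    return [sum(p[x] * features(x)[i] for x in STATES) for i in range(3)]


def data_counts():
    """Average feature counts according to the data."""
    return [sum(features(x)[i] for x in DATA) / len(DATA) for i in range(3)]


def learn_weights(steps=2000, lr=1.0):
    """Gradient ascent on the average log-likelihood (a concave objective)."""
    w = [0.0, 0.0, 0.0]
    target = data_counts()
    for _ in range(steps):
        model = expected_counts(w)
        w = [wi + lr * (t - m) for wi, t, m in zip(w, target, model)]
    return w


w = learn_weights()
print(model_distribution(w)[(1, 1)])   # close to the empirical frequency 0.4
```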
11
Pseudo-Likelihood
  • Likelihood of each variable given its Markov
    blanket in the data
  • Does not require inference at each step
  • Very fast numerical optimization
  • Ignores non-local dependences
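Pseudo-likelihood replaces $\log P_w(x)$ with $\sum_i \log P_w(x_i \mid MB(x_i))$. Each conditional compares only the two values of $x_i$, so it needs two score evaluations and no partition function. The minimal sketch below uses hypothetical features and invented data. Its gradient is each observed feature value minus that feature's expectation under the local conditional.

```python
import math

# ASSUMPTION: hypothetical model with three binary variables and features
# f = [x0, x1, x2, x0*x1, x1*x2]; the data below are invented.
DATA = [(1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1), (1, 0, 0)]
NUM_FEATURES = 5


def features(x):
    return [x[0], x[1], x[2], x[0] * x[1], x[1] * x[2]]


def score(w, x):
    return sum(wi * fi for wi, fi in zip(w, features(x)))


def with_value(x, i, v):
    y = list(x)
    y[i] = v
    return tuple(y)


def pseudo_log_likelihood(w):
    """Sum over examples and variables of log P_w(x_i | Markov blanket).
    Each term needs two score evaluations and no partition function Z."""
    total = 0.0
    for x in DATA:
        for i in range(len(x)):
            s0 = score(w, with_value(x, i, 0))
            s1 = score(w, with_value(x, i, 1))
            total += score(w, x) - math.log(math.exp(s0) + math.exp(s1))
    return total


def pll_gradient(w):
    """Observed feature values minus their expectations under each local conditional."""
    g = [0.0] * len(w)
    for x in DATA:
        for i in range(len(x)):
            y0, y1 = with_value(x, i, 0), with_value(x, i, 1)
            s0, s1 = score(w, y0), score(w, y1)
            p1 = math.exp(s1) / (math.exp(s0) + math.exp(s1))
            f, f0, f1 = features(x), features(y0), features(y1)
            for k in range(len(w)):
                g[k] += f[k] - ((1 - p1) * f0[k] + p1 * f1[k])
    return g


def fit(steps=500, lr=0.05):
    """Plain gradient ascent with an illustrative step size."""
    w = [0.0] * NUM_FEATURES
    for _ in range(steps):
        w = [wi + lr * gi for wi, gi in zip(w, pll_gradient(w))]
    return w
```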

12
Learning Structure
  • Feature search
  • 1. Start with atomic features
  • 2. Form conjunctions of pairs of features
  • 3. Select best and add to feature set
  • 4. Repeat until no improvement
  • Evaluation: likelihood, K-L divergence
  • Approximation: previous weights don't change
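The feature-search loop can be sketched on three binary variables. All of the following are hypothetical assumptions. A feature is a conjunction, represented as a frozenset of variable indices. The score is the average log-likelihood, computed by brute force. The stopping threshold is `min_gain`. Following the slide's approximation, only the new feature's weight is fit. With the other weights fixed, matching the feature's data frequency m to its model frequency q gives the closed-form weight $v = \ln\frac{m(1-q)}{q(1-m)}$. The corresponding log-likelihood gain is $mv - \ln(qe^v + 1 - q)$.

```python
import itertools
import math

# ASSUMPTIONS (hypothetical sketch): three binary variables; a feature is a
# frozenset of variable indices and equals 1 when all of them are 1 (a
# conjunction); invented data in which x0 and x1 are correlated and x2 is
# independent of both.
N = 3
STATES = list(itertools.product((0, 1), repeat=N))
PAIR_COUNTS = [((1, 1), 4), ((0, 0), 4), ((1, 0), 1), ((0, 1), 1)]
DATA = [(a, b, c) for (a, b), n in PAIR_COUNTS for c in (0, 1) for _ in range(n)]


def value(feature, x):
    return 1 if all(x[i] for i in feature) else 0


def distribution(weights):
    """Exact P(x) under the current weighted features, by enumeration."""
    scores = [math.exp(sum(w * value(f, x) for f, w in weights.items())) for x in STATES]
    z = sum(scores)
    return dict(zip(STATES, (s / z for s in scores)))


def log_likelihood(weights):
    p = distribution(weights)
    return sum(math.log(p[x]) for x in DATA) / len(DATA)


def fit_new_weight(weights, feature):
    """Fit only the new feature's weight; previous weights stay fixed.
    Returns (weight, log-likelihood gain), or (None, 0.0) if the weight would
    be infinite because the feature is always or never true in the data."""
    m = sum(value(feature, x) for x in DATA) / len(DATA)      # frequency in data
    p = distribution(weights)
    q = sum(p[x] for x in STATES if value(feature, x))         # frequency in model
    if m in (0.0, 1.0):
        return None, 0.0
    v = math.log(m * (1 - q) / (q * (1 - m)))
    return v, m * v - math.log(q * math.exp(v) + 1 - q)


def feature_search(min_gain=1e-3):
    weights = {}
    for i in range(N):                                         # 1. atomic features
        v, _ = fit_new_weight(weights, frozenset([i]))
        if v is not None:
            weights[frozenset([i])] = v
    history = []
    while True:
        # 2. conjunctions of pairs of current features not yet in the set
        candidates = {a | b for a in weights for b in weights} - set(weights)
        scored = [(fit_new_weight(weights, c), c) for c in candidates]
        scored = [((v, g), c) for (v, g), c in scored if v is not None]
        if not scored:
            break
        (v, gain), best = max(scored, key=lambda s: s[0][1])    # 3. select best
        if gain < min_gain:                                    # 4. stop: no real improvement
            break
        weights[best] = v
        history.append((best, gain, log_likelihood(weights)))
    return weights, history


weights, history = feature_search()
for feature, gain, ll in history:
    print(sorted(feature), round(gain, 4), round(ll, 4))
```

Because earlier weights are frozen, each candidate is scored with a one-parameter closed form instead of a full refit. That is the speed-up the slide's approximation buys, at the cost of slightly suboptimal weights.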