Title: Dynamic Programming
1. Dynamic Programming and Hidden Markov Models.
- Alan Yuille
- Dept. Statistics UCLA
2. Goal of this Talk
- This talk introduces one of the major algorithms: dynamic programming (DP).
- It then describes how DP can be used in conjunction with EM for learning.
3. Dynamic Programming
- Dynamic Programming exploits the graphical structure of the probability distribution. It can be applied to any structure without closed loops.
- Consider the two-headed coin example given in Tom Griffiths' talk (Monday).
4. Probabilistic Grammars
- By the Markov Condition, the joint distribution factorizes according to the graph.
- Hence we can exploit the graphical structure to perform the computation efficiently.
- The structure means that the sum over x2 drops out; we need only sum over x1 and x3. Only four operations instead of eight.
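As a concrete sketch (the exact graph and query on the slide may differ; here we assume three binary variables with x2 a leaf node, e.g. x1 -> x2 and x1 -> x3):

\sum_{x_2} P(x_1, x_2, x_3) = P(x_1)\, P(x_3 \mid x_1) \sum_{x_2} P(x_2 \mid x_1) = P(x_1)\, P(x_3 \mid x_1),

since the conditional distribution of x2 sums to one. Any quantity involving only x1 and x3 can then be computed by summing over the four (x1, x3) configurations rather than all eight (x1, x2, x3) configurations.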
5. Dynamic Programming Intuition
- Suppose you wish to travel to Boston from Los Angeles by car.
- To determine the cost of going via Chicago you only need to calculate the shortest cost from Los Angeles to Chicago and then, independently, the shortest cost from Chicago to Boston.
- Decomposing the route in this way gives an efficient algorithm which is polynomial in the number of nodes and feasible for computation.
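In symbols, the underlying recursion is (a sketch; the city names are only illustrative):

C^*(\mathrm{LA} \to \mathrm{Boston}) = \min_{m} \big[ C^*(\mathrm{LA} \to m) + C^*(m \to \mathrm{Boston}) \big],

where C^*(\cdot) is the shortest cost between two cities and m ranges over possible intermediate cities. Going via Chicago is one term of this minimization, and the two sub-costs can be computed independently of each other.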
6. Dynamic Programming Diamond
- Compute the shortest cost from A to B.
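A minimal Python sketch of this computation on a small diamond-shaped directed graph; the node names and edge costs below are made up for illustration and are not taken from the slide.

```python
# Shortest cost from A to B on a small diamond-shaped directed graph.
edges = {
    "A": {"C": 2.0, "D": 5.0},   # A branches out
    "C": {"E": 4.0, "F": 1.0},
    "D": {"E": 1.0, "F": 3.0},
    "E": {"B": 2.0},             # paths merge back at B
    "F": {"B": 6.0},
    "B": {},
}

# Process nodes in topological order; cost[n] = shortest cost from A to n.
order = ["A", "C", "D", "E", "F", "B"]
cost = {n: float("inf") for n in order}
cost["A"] = 0.0
for n in order:
    for child, w in edges[n].items():
        # DP update: the best known cost to child, possibly improved via n.
        cost[child] = min(cost[child], cost[n] + w)

print(cost["B"])  # shortest cost from A to B
```

Each node is visited once and each edge once, so the work is linear in the size of the graph rather than exponential in the number of paths.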
7. Application to a 1-dim chain.
- Consider a distribution defined on a 1-dim chain.
- Important property: directed and undirected graphs are equivalent (for a 1-dim chain).
- P(A,B) = P(A|B) P(B)
- or P(A,B) = P(B|A) P(A)
- For these simple graphs with two nodes -- you cannot distinguish causation from correlation without intervention (Wu's lecture, Friday).
- For this lecture we will treat the simple one-dimensional case and cover directed and undirected models simultaneously. (Translating between directed and undirected is generally possible for graphs without closed loops but has subtleties.)
8. Probability distribution on a 1-D chain
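A standard way to write a distribution on a 1-D chain x_1, ..., x_N (likely close to the form used on this slide) is

directed form: P(x_1, \ldots, x_N) = P(x_1) \prod_{i=1}^{N-1} P(x_{i+1} \mid x_i),

undirected form: P(x_1, \ldots, x_N) = \frac{1}{Z} \prod_{i=1}^{N-1} \psi_i(x_i, x_{i+1}),

where the \psi_i are non-negative pairwise potentials and Z is the normalization constant (partition function).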
9. 1-D Chain.
10. 1-Dim Chain
11. 1-Dim Chain
- We can also use DP to compute other properties, e.g. to convert the distribution from undirected form to directed form.
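A sketch of this conversion, assuming the pairwise undirected form above: DP gives the single-node marginals P(x_i) and the pairwise marginals P(x_i, x_{i+1}), and for a chain the same distribution can then be rewritten as

P(x_1, \ldots, x_N) = P(x_1) \prod_{i=1}^{N-1} \frac{P(x_i, x_{i+1})}{P(x_i)} = P(x_1) \prod_{i=1}^{N-1} P(x_{i+1} \mid x_i),

which is the directed form.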
12. 1-Dim Chain
13. Special Case: 1-D Ising Spin Model
14. Dynamic Programming Summary
- Dynamic Programming can be applied to perform inference on all graphical models defined on trees.
- The key insight is that, for trees, we can define an order on the nodes (not necessarily unique) and process the nodes in sequence (never needing to return to a node that has already been processed).
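A short Python sketch of this sequential processing, for the special case of a chain with pairwise potentials; the potentials and sizes below are made up for illustration.

```python
import numpy as np

# Unnormalized chain distribution: P(x_1..x_N) proportional to
# prod_i psi_i(x_i, x_{i+1}), with each x_i taking one of K states.
rng = np.random.default_rng(0)
K, N = 3, 6
psi = [rng.random((K, K)) + 0.1 for _ in range(N - 1)]  # random potentials

# Forward pass: process the nodes in order, never revisiting a node.
# msg[j] accumulates the sum over all earlier variables with the current
# variable fixed to state j.
msg = np.ones(K)
for p in psi:
    msg = msg @ p              # sum out the current variable, move on

Z = msg.sum()                  # partition function (normalization constant)
marginal_last = msg / Z        # marginal distribution of the final node
print(Z, marginal_last)
```

This costs O(N K^2) operations instead of the K^N terms of the naive sum; the same idea, with messages flowing from the leaves towards a root, works on any tree.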
15. Extensions of Dynamic Programming
- What to do if you have a graph with closed loops?
- There are a variety of advanced ways to exploit the graphical structure and obtain efficient exact algorithms.
- Prof. Adnan Darwiche (CS, UCLA) is an expert on this topic. There will be an introduction to his SamIam code.
- We can also use approximate methods like belief propagation (BP).
16. Junction Trees
- It is also possible to take a probability distribution defined on a graph with closed loops and reformulate it as a distribution on a new graph without closed loops (Lauritzen and Spiegelhalter, 1990).
- This leads to a variety of algorithms generally known as junction trees.
- This is not a universal solution because the resulting new graphs may have too many nodes to make them practical.
- Google 'junction trees' to find nice tutorials on the topic.
17. Graph Conversion
- Convert the graph by a set of transformations.
18. Triangles and Augmented Variables
- From triangles to ordered triangles.
- Original variables: loops. Augmented variables: no loops.
19. Summary of Dynamic Programming
- Dynamic Programming can be used to efficiently compute properties of a distribution for graphs defined on trees.
- Directed graphs on trees can be reformulated as undirected graphs on trees, and vice versa.
- DP can be extended to apply to graphs with closed loops by restructuring the graphs (junction trees).
- It is an active research area to determine efficient inference algorithms which exploit the graphical structure of these models.
- Relationship between DP and reinforcement learning (week 2).
- DP and A*. DP and pruning.
20. HMMs: Learning and Inference
- So far we have considered inference only.
- This assumes that the model is known.
- How can we learn the model?
- For 1D models -- this uses DP and EM.
21. A simple HMM for Coin Tossing
- Two coins, one biased and the other fair, with the coins switched occasionally.
- The observable (0, 1) is whether the coin comes up heads or tails.
- The hidden state (A, B) is which coin is being used.
- There are unknown transition probabilities between the hidden states A and B, and unknown probabilities for the observations conditioned on the hidden states.
- The learning task is to estimate these probabilities from a sequence of measurements (see the sketch below).
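A compact Python sketch of this estimation problem, using the standard forward-backward (DP) recursions inside EM (Baum-Welch). The data, initialization, and number of iterations below are made up for illustration; this is not code from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.integers(0, 2, size=200)       # stand-in for a recorded heads/tails sequence

# Unknown parameters, initialized roughly: two hidden states (coins A and B).
T = np.array([[0.8, 0.2], [0.3, 0.7]])   # transition probabilities between coins
E = np.array([[0.6, 0.4], [0.4, 0.6]])   # emission probabilities P(observation | coin)
pi = np.array([0.5, 0.5])                # initial coin distribution

for _ in range(50):                      # EM iterations
    # E-step: forward-backward recursions (dynamic programming), with scaling.
    n = len(obs)
    alpha = np.zeros((n, 2)); beta = np.zeros((n, 2)); scale = np.zeros(n)
    alpha[0] = pi * E[:, obs[0]]; scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ T) * E[:, obs[t]]
        scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = (T @ (E[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

    gamma = alpha * beta                                 # posterior over the hidden coin
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((2, 2))                                # expected transition counts
    for t in range(n - 1):
        x = (alpha[t][:, None] * T) * (E[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi += x / x.sum()

    # M-step: re-estimate the parameters from the expected counts.
    pi = gamma[0]
    T = xi / xi.sum(axis=1, keepdims=True)
    for k in (0, 1):
        E[:, k] = gamma[obs == k].sum(axis=0)
    E /= E.sum(axis=1, keepdims=True)

print("estimated transition probabilities:\n", T)
print("estimated emission probabilities:\n", E)
```

The forward-backward pass is exactly the chain DP described earlier; EM alternates between that pass (E-step) and re-estimating the unknown probabilities (M-step).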
22. HMM for Speech
23. HMM for Speech
24. HMM Summary
- HMMs define a class of Markov models with hidden variables. They are used for speech recognition and many other applications.
- Tasks involving HMMs include learning, inference, and model selection.
- These can often be performed by algorithms based on EM and DP.