1
PGM 2003/04 Tirgul 2: Hidden Markov Models
2
Introduction
Hidden Markov Models (HMMs) are one of the most
common forms of probabilistic graphical models,
although they were developed long before the
notion of general graphical models existed (1913).
They are used to model time-invariant,
limited-horizon processes that have both an
underlying mechanism (hidden states) and an
observable consequence. They have been extremely
successful in language modeling and speech
recognition systems and are still the most widely
used technique in these domains.
3
Markov Models
A Markov process or model assumes that we can
predict the future based just on the present (or
on a limited horizon into the past). Let
X_1, ..., X_T be a sequence of random variables
taking values in {1, ..., N}; then the Markov
properties are:
Limited horizon: P(X_{t+1} = k | X_1, ..., X_t) = P(X_{t+1} = k | X_t)
Time invariant (stationary): P(X_{t+1} = k | X_t) = P(X_2 = k | X_1)
4
Describing a Markov Chain
A Markov chain can be described by the transition
matrix A and the initial state probabilities Q:
a_ij = P(X_{t+1} = j | X_t = i),   q_i = P(X_1 = i)
(or, alternatively, by folding Q into A as transitions
out of a dedicated start state), and we calculate
P(X_1, ..., X_T) = P(X_1) P(X_2 | X_1) ... P(X_T | X_{T-1}) = q_{x_1} · a_{x_1 x_2} · ... · a_{x_{T-1} x_T}
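A minimal sketch in Python/NumPy of this chain-probability computation, using a hypothetical 2-state transition matrix A and initial distribution q:

```python
import numpy as np

# Hypothetical 2-state chain: A[i, j] = P(X_{t+1} = j | X_t = i), q[i] = P(X_1 = i)
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
q = np.array([0.5, 0.5])

def chain_probability(states, A, q):
    """P(X_1..X_T) = q[x_1] * prod_t A[x_t, x_{t+1}]."""
    p = q[states[0]]
    for prev, cur in zip(states[:-1], states[1:]):
        p *= A[prev, cur]
    return p

print(chain_probability([0, 0, 1, 1, 0], A, q))  # probability of one particular state sequence
```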
5
Hidden Markov Models
In a Hidden Markov Model (HMM) we do not observe
the state sequence that the model passes through,
but only some probabilistic function of it. Thus,
it is a Markov model with the addition of emission
probabilities:
b_ik = P(Y_t = k | X_t = i)
For example:
Observed:  The  House  is    on    fire
States:    det  noun   verb  prep  noun
6
Why use HMMs?
  • A lot of real-life processes are composed of
    underlying events generating surface phenomena.
    Tagging parts of speech is a common example.
  • We can usually think of processes as having a
    limited horizon (and we can easily extend to the
    case of a constant horizon larger than 1).
  • We have an efficient training algorithm (EM).
  • Once the model is set, we can easily run it
    (see the sketch after this list):
  • t ← 1, start in state i with probability q_i
  • forever: emit y_t = k with probability b_ik,
  • move from state i to j with probability a_ij,
  • t ← t + 1
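A minimal sampling sketch in Python/NumPy following this generative procedure (emit from the current state, then move), with hypothetical parameters q, A, B:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: q (initial), A (transitions), B (emissions: B[i, k] = b_ik)
q = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

def sample_hmm(T, q, A, B):
    """Run the generative procedure for T steps: pick X_1 ~ q, then repeatedly
    emit Y_t from the current state and move to the next state."""
    states, emissions = [], []
    x = rng.choice(len(q), p=q)                           # t = 1: start in state i with probability q_i
    for _ in range(T):
        states.append(x)
        emissions.append(rng.choice(B.shape[1], p=B[x]))  # emit y_t = k with probability b_ik
        x = rng.choice(len(q), p=A[x])                    # move from i to j with probability a_ij
    return states, emissions

print(sample_hmm(10, q, A, B))
```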

7
The fundamental questions
Likelihood: Given a model λ = (A, B, Q), how do we
efficiently compute the likelihood of an
observation, P(Y | λ)?
Decoding: Given the observation sequence Y and a
model λ, what state sequence explains it best
(MPE)? This is, for example, the tagging process
of an observed sentence.
Learning: Given an observation sequence Y and a
generic model, how do we estimate the parameters
that define the best model to describe the data?
8
Computing the Likelihood
For any state sequence (X_1, ..., X_T), using
P(Y, X) = P(Y | X) P(X) we get
P(Y) = Σ_X P(Y | X) P(X) = Σ_{x_1, ..., x_T} q_{x_1} b_{x_1 y_1} · a_{x_1 x_2} b_{x_2 y_2} · ... · a_{x_{T-1} x_T} b_{x_T y_T}
But this sums over N^T state sequences, i.e. O(T · N^T) multiplications!
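A brute-force sketch in Python that makes the exponential cost explicit by enumerating all N^T paths (hypothetical small parameters, for illustration only):

```python
import numpy as np
from itertools import product

# Hypothetical small HMM (same parameter convention as above)
q = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

def brute_force_likelihood(y, q, A, B):
    """Sum P(Y, X) over all N^T state sequences -- exponential, for illustration only."""
    N, T = len(q), len(y)
    total = 0.0
    for x in product(range(N), repeat=T):           # N^T paths
        p = q[x[0]] * B[x[0], y[0]]
        for t in range(1, T):
            p *= A[x[t - 1], x[t]] * B[x[t], y[t]]  # O(T) multiplications per path
        total += p
    return total

print(brute_force_likelihood([0, 2, 1, 1], q, A, B))
```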
9
The trellis (lattice) algorithm
To compute the likelihood we need to enumerate
over all paths in the lattice (all possible
instantiations of X_1, ..., X_T). But some starting
sub-path (blue in the slide's figure) is common to
many continuing paths (blue-red). Idea: using
dynamic programming, calculate a path in terms of
shorter sub-paths.
10
The trellis (lattice) algorithm
We build a matrix of the probability of being in
state i at time t:
α_t(i) = P(x_t = i, y_1 y_2 ... y_t)
Each entry is a function of the previous column
(forward procedure):
α_1(i) = q_i b_{i y_1},   α_{t+1}(i) = b_{i y_{t+1}} · Σ_{j=1}^{N} α_t(j) a_{ji}
(The slide's figure shows the column α_t(·) over states 1, ..., N feeding α_{t+1}(i) through the transition probabilities a_{1i}, a_{2i}, ..., a_{Ni}.)
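A minimal forward-procedure sketch in Python/NumPy (hypothetical parameters); α is filled one time step after another, and the entries for the final time step sum to the likelihood:

```python
import numpy as np

def forward(y, q, A, B):
    """alpha[t, i] = P(x_t = i, y_1..y_t), filled row by row (one row per time step)."""
    T, N = len(y), len(q)
    alpha = np.zeros((T, N))
    alpha[0] = q * B[:, y[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, y[t]]   # sum over previous states, then emit
    return alpha

# Hypothetical parameters and observation sequence
q = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
alpha = forward([0, 2, 1, 1], q, A, B)
print(alpha, alpha[-1].sum())   # the last row sums to the likelihood P(Y)
```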
11
The trellis (lattice) algorithm (cont.)
We can similarly define a backward procedure for
filling the matrix:
β_t(i) = P(y_{t+1} ... y_T | x_t = i)
β_T(i) = 1,   β_t(i) = Σ_{j=1}^{N} a_{ij} b_{j y_{t+1}} β_{t+1}(j)
(The slide's figure shows β_t(i) computed from the next column β_{t+1}(·) through a_{i1} b_{1 y_{t+1}}, ..., a_{iN} b_{N y_{t+1}}.)
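A matching backward-procedure sketch (same hypothetical parameters); the likelihood can also be read off the first time step as Σ_i q_i b_{i y_1} β_1(i):

```python
import numpy as np

def backward(y, q, A, B):
    """beta[t, i] = P(y_{t+1}..y_T | x_t = i), filled from the last time step backwards."""
    T, N = len(y), len(q)
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, y[t + 1]] * beta[t + 1])   # beta_t(i) = sum_j a_ij b_{j,y(t+1)} beta_{t+1}(j)
    return beta

# Hypothetical parameters and observation sequence
q = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
y = [0, 2, 1, 1]
beta = backward(y, q, A, B)
print((q * B[:, y[0]] * beta[0]).sum())   # same likelihood as the forward procedure
```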
12
The trellis (lattice) algorithm (cont.)
And we can easily combine the two:
P(Y) = Σ_{i=1}^{N} α_t(i) β_t(i)   (for any t)
And then:
P(x_t = i | Y) = α_t(i) β_t(i) / P(Y)
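A short sketch (hypothetical parameters, with the forward and backward recursions inlined) checking that Σ_i α_t(i) β_t(i) gives the same P(Y) at every t, and computing the posterior P(x_t = i | Y):

```python
import numpy as np

# Hypothetical parameters (same convention as the earlier sketches)
q = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
y = [0, 2, 1, 1]
T, N = len(y), len(q)

alpha = np.zeros((T, N)); beta = np.ones((T, N))
alpha[0] = q * B[:, y[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, y[t]]
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, y[t + 1]] * beta[t + 1])

print((alpha * beta).sum(axis=1))          # P(Y) recovered at every time slice t
print(alpha * beta / alpha[-1].sum())      # posterior P(x_t = i | Y) for each t, i
```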
13
Finding the best state sequence
We would like to find the most likely path (and
not just the most likely state at each time
slice). The Viterbi algorithm is an efficient
trellis method for finding the MPE: replace the
sums of the forward procedure with maximizations,
δ_1(i) = q_i b_{i y_1},   δ_{t+1}(i) = b_{i y_{t+1}} · max_j δ_t(j) a_{ji}
and keep back-pointers so we can reconstruct the path.
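A minimal Viterbi sketch in Python/NumPy (hypothetical parameters): delta carries the maximized scores, psi the back-pointers used to reconstruct the best path:

```python
import numpy as np

def viterbi(y, q, A, B):
    """Most probable state sequence (MPE): delta[t, i] is the best score of any
    path ending in state i at time t; psi keeps back-pointers for reconstruction."""
    T, N = len(y), len(q)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = q * B[:, y[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A          # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, y[t]]
    path = [int(delta[-1].argmax())]                # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Hypothetical parameters and observations
q = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 2, 2, 1], q, A, B))
```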
14
The Casino HMM
A casino switches from a fair die (state F) to a
loaded one (state U) with probability 0.05 and
the other way around with probability 0.1. The
loaded die has 0.5 probability of showing a 6.
The casino, honestly, reads off the number that
was rolled.
15
The Casino HMM (cont.)
What is the likelihood of 3151166661?
Y = 3 1 5 1 1 6 6 6 6 1
α_1(1) = 0.5 · 1/6 = 1/12,   α_1(2) = 0.5 · 0.1 = 0.05
α_2(1) = 1/6 · (0.95 · 1/12 + 0.1 · 0.05) ≈ 0.014
α_2(2) = 0.1 · (0.05 · 1/12 + 0.9 · 0.05) ≈ 0.0049
α_3(1) ≈ 0.0023,   α_3(2) ≈ 0.0005
α_4(1) ≈ 0.0004,   α_4(2) ≈ 0.0001
α_5(1) ≈ 0.0001,   α_5(2) < 0.0001
and all later values are smaller than 0.0001.
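A sketch that reproduces these forward values, assuming (as the slide's numbers imply) a uniform 0.5/0.5 initial distribution and probability 0.1 for each non-6 face of the loaded die:

```python
import numpy as np

# Casino HMM from the slide: state 0 = fair, state 1 = loaded
q = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05],     # fair   -> fair / loaded
              [0.10, 0.90]])    # loaded -> fair / loaded
B = np.vstack([np.full(6, 1 / 6),                  # fair die
               [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])    # loaded die: 6 has probability 0.5

y = [r - 1 for r in [3, 1, 5, 1, 1, 6, 6, 6, 6, 1]]   # observed rolls, 0-indexed

alpha = np.zeros((len(y), 2))
alpha[0] = q * B[:, y[0]]
for t in range(1, len(y)):
    alpha[t] = (alpha[t - 1] @ A) * B[:, y[t]]
print(np.round(alpha, 4))       # the first rows match the alpha values on the slide
print(alpha[-1].sum())          # likelihood of 3151166661
```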
16
The Casino HMM (cont.)
What explains 3151166661 best?
Y = 3 1 5 1 1 6 6 6 6 1
δ_1(1) = 0.5 · 1/6 = 1/12,   δ_1(2) = 0.5 · 0.1 = 0.05
δ_2(1) = 1/6 · max(0.95 · 1/12, 0.1 · 0.05) ≈ 0.0132
δ_2(2) = 1/10 · max(0.05 · 1/12, 0.9 · 0.05) = 0.0045
δ_3(1) ≈ 0.0021,   δ_3(2) ≈ 0.0004
δ_4(1) ≈ 0.0003,   δ_4(2) < 0.0001
δ_5(1) ≈ 0.0001,   δ_5(2) < 0.0001
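A matching Viterbi sketch for the casino (same assumed parameters); the delta matrix reproduces the values above, and backtracking through the psi pointers yields the reconstructed state sequence:

```python
import numpy as np

# Same assumed casino model as above (0 = fair, 1 = loaded)
q = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
B = np.vstack([np.full(6, 1 / 6),
               [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])
y = [r - 1 for r in [3, 1, 5, 1, 1, 6, 6, 6, 6, 1]]

T = len(y)
delta = np.zeros((T, 2))
psi = np.zeros((T, 2), dtype=int)
delta[0] = q * B[:, y[0]]
for t in range(1, T):
    scores = delta[t - 1][:, None] * A
    psi[t] = scores.argmax(axis=0)
    delta[t] = scores.max(axis=0) * B[:, y[t]]
path = [int(delta[-1].argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(psi[t, path[-1]]))
print(np.round(delta, 4))     # matches the delta values on the slide
print(path[::-1])             # reconstructed state sequence (0 = fair, 1 = loaded)
```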
17
The Casino HMM (cont.)
An example of reconstruction using Viterbi
(Durbin), with 0 = fair die and 1 = loaded die:

Rolls   3151162464466442453113216311641521336
Die     0000000000000000000000000000000000000
Viterbi 0000000000000000000000000000000000000

Rolls   2514454363165662656666665116645313265
Die     0000000011111111111111111111100000000
Viterbi 0000000000011111111111111111100000000

Rolls   1245636664631636663162326455236266666
Die     0000111111111111111100011111111111111
Viterbi 0000111111111111111111111111111111111
18
Learning
If we were given both X and Y, we could choose the
parameters that maximize the likelihood of the
complete data. Using the Maximum Likelihood
principle, we simply set each parameter to the
corresponding relative frequency, e.g.
a_ij = count(x_t = i, x_{t+1} = j) / count(x_t = i),   b_ik = count(x_t = i, y_t = k) / count(x_t = i)
What do we do when we have only Y? ML here does
not have a closed-form formula!
19
EM (Baum Welch)
Idea: use the current guess to complete the data
and re-estimate.
E-step: guess X using Y and the current parameters.
M-step: re-estimate the parameters using the
current completion of the data.
Theorem: the likelihood of the observables never
decreases! (to be proved later in the course)
Problem: gets stuck at sub-optimal solutions.
20
Parameter Estimation
We define the expected number of transitions from
state i to j at time t:
ξ_t(i, j) = P(x_t = i, x_{t+1} = j | Y) = α_t(i) a_ij b_{j y_{t+1}} β_{t+1}(j) / P(Y)
The expected number of transitions from i to j in
Y is then Σ_{t=1}^{T-1} ξ_t(i, j).
21
Parameter Estimation (cont.)
We use the EM re-estimation formulas with the
expected counts we already have:
a_ij ← Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i),   b_ik ← Σ_{t: y_t = k} γ_t(i) / Σ_{t=1}^{T} γ_t(i),   q_i ← γ_1(i)
where γ_t(i) = P(x_t = i | Y) = α_t(i) β_t(i) / P(Y).
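A minimal single-iteration Baum-Welch sketch in Python/NumPy that puts the E-step (expected counts from α and β) and M-step (the re-estimation formulas above) together, on hypothetical parameters and data:

```python
import numpy as np

def baum_welch_step(y, q, A, B):
    """One EM (Baum-Welch) iteration: E-step computes expected transition and
    emission counts from alpha/beta, M-step renormalises them into new parameters."""
    T, N = len(y), len(q)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = q * B[:, y[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, y[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, y[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood                      # gamma[t, i] = P(x_t = i | Y)
    xi = np.zeros((N, N))                                  # expected transition counts
    for t in range(T - 1):
        xi += np.outer(alpha[t], B[:, y[t + 1]] * beta[t + 1]) * A / likelihood
    new_q = gamma[0]
    new_A = xi / gamma[:-1].sum(axis=0, keepdims=True).T
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(y) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0, keepdims=True).T
    return new_q, new_A, new_B, likelihood

# one iteration on hypothetical parameters and data
q = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(baum_welch_step([0, 2, 1, 1, 2, 0], q, A, B)[3])
```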
22
Application: Sequence Pair Alignment
DNA sequences are strings over a four-letter
alphabet {C, G, A, T}. Real-life sequences often
differ by way of mutation of a letter or even a
deletion. A fundamental problem in computational
biology is to align such sequences:

Input:               Output:
CTTTACGTTACTTACG     CTTTACGTTAC-TTACG
CTTAGTTACGTTAG       C-TTA-GTAACGTTA-G

How can we use an HMM for such a task?
23
Sequence Pair Alignment (cont.)
We construct an HMM with 3 states that emits two
letters at a time:
(M)atch: emission of an aligned pair of letters
(high probability for matching letters, low for a
mismatch)
(D)elete1: emission of a letter in the first
sequence and a gap in the second sequence
(D)elete2: the converse of D1

Initial probabilities Q:
M    0.9
D1   0.05
D2   0.05

Emission matrix B: if in M, emit a pair of the
same letter with probability 0.24 (for each of the
four letters); if in D1 or D2, emit all letters
uniformly.

Transition matrix A:
      M     D1    D2
M     0.9   0.05  0.05
D1    0.95  0.05  0
D2    0.95  0     0.05
24
Sequence Pair Alignment (cont.)
  • How do we align 2 new sequences?
  • We also need (B)egin and (E)nd states to signify
    the start and end of the sequence.
  • From B we will have the same transition
    probabilities as for M, and we will never return
    to it.
  • We will have a probability of 0.05 to reach E
    from any state, and we will never leave it. This
    probability determines the average length of an
    alignment.
  • We now just run Viterbi, with some extra
    technical details (Programming ex. 1).

25
Extensions
  • HMMs have been used so extensively that it is
    impossible to even begin to cover the many forms
    they take. Several extensions, however, are worth
    mentioning:
  • using different kinds of transition matrices
  • using continuous observations
  • using a larger horizon
  • assigning probabilities to wait times
  • using several training sets