Title: Hidden Markov Model
1. Hidden Markov Model
2. Bayes Rule
- The posterior distribution: p(k | x) ∝ p(x | k) p(k).
- Select the class k with the largest posterior probability.
- This rule minimizes the average misclassification rate.
- The maximum likelihood rule is equivalent to the Bayes rule with a uniform prior.
- The decision boundary is where two classes have equal posterior probability; a numeric sketch follows below.
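To make the rule concrete, here is a minimal Python sketch with made-up likelihood and prior values: the posterior is the normalized product of the two, and the Bayes rule picks its argmax.

```python
# Hypothetical two-class example (all numbers are made up for illustration).
likelihood = {1: 0.20, 2: 0.05}   # p(x | k) evaluated at the observed x
prior      = {1: 0.30, 2: 0.70}   # p(k)

unnorm = {k: likelihood[k] * prior[k] for k in likelihood}
Z = sum(unnorm.values())
posterior = {k: v / Z for k, v in unnorm.items()}   # p(k | x)

print(posterior)                          # {1: 0.63..., 2: 0.36...}
print(max(posterior, key=posterior.get))  # Bayes rule selects class 1
```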
3. Naïve Bayes approximation
- When x is high dimensional, it is difficult to estimate the class-conditional density p(x | k).
4. Naïve Bayes Classifier
- When x is high dimensional, it is difficult to estimate p(x | k).
- But if we assume the dimensions of x are independent given the class, p(x | k) = Π_j p(x_j | k), then each dimension becomes a 1-D estimation problem.
5. Naïve Bayes Classifier
- Usually the independence assumption is not valid.
- But sometimes the NBC can still be a good classifier.
- Quite often, simple models do not perform badly; a sketch follows below.
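As referenced above, here is a minimal naive Bayes sketch on a made-up binary toy dataset; the point is that under the independence assumption each feature dimension is fitted as its own 1-D problem.

```python
import numpy as np

# Toy binary data (made up): rows are samples, columns are features.
X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]])
y = np.array([0, 0, 1, 1])   # class labels

def fit_nb(X, y):
    classes = np.unique(y)
    prior = np.array([(y == k).mean() for k in classes])
    # theta[k, j] = p(x_j = 1 | class k), one 1-D estimate per dimension,
    # with Laplace smoothing to avoid zero probabilities.
    theta = np.array([(X[y == k].sum(0) + 1) / ((y == k).sum() + 2)
                      for k in classes])
    return classes, prior, theta

def predict_nb(x, classes, prior, theta):
    # log p(k | x) up to a constant: log prior + sum of 1-D log likelihoods
    logpost = (np.log(prior)
               + (x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum(1))
    return classes[np.argmax(logpost)]

classes, prior, theta = fit_nb(X, y)
print(predict_nb(np.array([1, 0, 1]), classes, prior, theta))  # -> 0
```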
6. Hidden Markov Model
7. A coin toss example
- Scenario: you are betting with your friend on coin tosses, and you observe (H, T, T, H, ...).
8. A coin toss example
- Scenario: you are betting with your friend on coin tosses, and you observe (H, T, T, H, ...).
- But your friend is cheating: he occasionally switches from a fair coin to a biased coin. Of course, the switch happens under the table!
(State diagram: Fair ↔ Biased.)
9. A coin toss example
- This is what is really happening: (H, T, H, T, H, H, H, H, T, H, H, T, ...), with each toss colored by which coin produced it.
- Of course, you can't see the colors, i.e., which coin was used. So how can you tell that your friend is cheating?
10. Hidden Markov Model
(Graphical model: a chain of hidden states, the coin in use, each emitting an observed variable, H or T.)
11. Markov Property
(Same graphical model: hidden state = the coin; observed variable = H or T.)
- Markov property: p(x_t | x_1, ..., x_{t-1}) = p(x_t | x_{t-1}).
12. Markov Property
- Transition probability: a_ij = p(x_t = j | x_{t-1} = i).
- Prior distribution: p(x_1).
(State diagram: Fair ↔ Biased, with transition probabilities on the arrows.)
13. Observation independence
- Given the hidden state (the coin), the observed variable (H or T) is conditionally independent of all earlier states and observations: p(y_t | x_1, ..., x_t, y_1, ..., y_{t-1}) = p(y_t | x_t).
- p(y_t | x_t) is the emission probability.
14. Model parameters
- A = (a_ij): the transition matrix.
- p(y_t | x_t): the emission probabilities.
- p(x_1): the prior distribution.
A concrete sketch of these parameters follows below.
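As a concrete illustration, the coin-toss HMM can be written out as arrays; the probability values below are assumptions for illustration, not taken from the slides.

```python
import numpy as np

states = ["Fair", "Biased"]
obs_symbols = ["H", "T"]

A = np.array([[0.95, 0.05],    # a_ij = p(x_t = j | x_{t-1} = i)
              [0.10, 0.90]])   # transition matrix (illustrative values)
E = np.array([[0.5, 0.5],      # p(y_t | x_t = Fair): a fair coin
              [0.8, 0.2]])     # p(y_t | x_t = Biased): heads-heavy (assumed)
pi = np.array([0.5, 0.5])      # p(x_1), prior distribution

# Each row of A and E is a probability distribution.
assert np.allclose(A.sum(1), 1) and np.allclose(E.sum(1), 1)
```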
15. Model inference
- Case 1: infer the states when the model parameters are known.
- Case 2: both the states and the model parameters are unknown.
16-19. Viterbi algorithm
(Trellis diagrams over four slides: states 1-4 on the vertical axis, time t-1, t, t+1 on the horizontal axis. The most probable path to a state at time t+1 extends the most probable path to one of the states at time t.)
Therefore, the path can be found iteratively.
20. Viterbi algorithm
(Trellis diagram as above.)
Let v_k(i) be the probability of the most probable path ending in state k at position i. Then
v_k(i) = e_k(y_i) · max_j [ v_j(i-1) a_jk ].
21. Viterbi algorithm
- Initialization (i = 0): v_0(0) = 1, v_k(0) = 0 for k > 0 (state 0 is a begin state).
- Recursion (i = 1, ..., L): v_k(i) = e_k(y_i) max_j [ v_j(i-1) a_jk ]; ptr_i(k) = argmax_j [ v_j(i-1) a_jk ].
- Termination: P(y, x*) = max_k v_k(L); x*_L = argmax_k v_k(L).
- Traceback (i = L, ..., 1): x*_{i-1} = ptr_i(x*_i).
A Python sketch follows below.
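A minimal Viterbi sketch in log space (for numerical stability), assuming the A, E, pi arrays from the earlier sketch and observations encoded as integer symbol indices (0 = H, 1 = T). It uses the prior p(x_1) directly rather than the slides' begin state 0; the recursion is otherwise the same.

```python
import numpy as np

def viterbi(y, A, E, pi):
    """Most probable state path for observations y (computed in log space)."""
    L, K = len(y), len(pi)
    v = np.zeros((L, K))                   # v[i, k]: best log-prob ending in k
    ptr = np.zeros((L, K), dtype=int)      # traceback pointers
    v[0] = np.log(pi) + np.log(E[:, y[0]])              # initialization
    for i in range(1, L):                               # recursion
        scores = v[i - 1][:, None] + np.log(A)          # scores[j, k]
        ptr[i] = scores.argmax(0)                       # best predecessor j
        v[i] = scores.max(0) + np.log(E[:, y[i]])
    path = [int(v[-1].argmax())]                        # termination
    for i in range(L - 1, 0, -1):                       # traceback
        path.append(int(ptr[i][path[-1]]))
    return path[::-1], float(v[-1].max())

y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # H H H H T H T T T T
path, logp = viterbi(y, A, E, pi)
print(path)   # stretches of state 1 (Biased) where heads cluster
```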
22. Advantages of the Viterbi path
- It identifies the most probable path very efficiently.
- The most probable path is legitimate, i.e., it is realizable by the HMM process.
23. Issues with the Viterbi path
- The most probable path does not provide a confidence level for each state estimate.
- The most probable path may not be much more probable than other paths.
24. Posterior distribution
- Estimate p(x_i = k | y_1, ..., y_L).
- Strategy: decompose the joint probability into a forward part and a backward part.
- This is done by the forward-backward algorithm.
25. Forward-backward algorithm
P(x_i = k, y) = P(y_1, ..., y_i, x_i = k) · P(y_{i+1}, ..., y_L | x_i = k) = f_k(i) · b_k(i).
26. Forward algorithm
Estimate f_k(i) = P(y_1, ..., y_i, x_i = k).
- Initialization (i = 0): f_0(0) = 1, f_k(0) = 0 for k > 0.
- Recursion (i = 1, ..., L): f_k(i) = e_k(y_i) Σ_j f_j(i-1) a_jk.
- Termination: P(y) = Σ_k f_k(L).
27-28. Backward algorithm
Estimate b_k(i) = P(y_{i+1}, ..., y_L | x_i = k).
- Initialization (i = L): b_k(L) = 1 for all k.
- Recursion (i = L-1, ..., 1): b_k(i) = Σ_l a_kl e_l(y_{i+1}) b_l(i+1).
- Termination: P(y) = Σ_l p(x_1 = l) e_l(y_1) b_l(1).
A sketch of both passes follows below.
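A sketch of both passes with per-step scaling to avoid underflow (a standard variant; dropping the scaling factors c_i recovers the unscaled recursions on the slides). It assumes the A, E, pi arrays and integer-encoded observations from the earlier sketches; with this scaling, the products f·b give the posteriors directly.

```python
import numpy as np

def forward_backward(y, A, E, pi):
    """Scaled forward/backward passes; p(x_i = k | y) = f[i] * b[i]."""
    L, K = len(y), len(pi)
    f = np.zeros((L, K)); b = np.zeros((L, K)); c = np.zeros(L)
    f[0] = pi * E[:, y[0]]                         # forward initialization
    c[0] = f[0].sum(); f[0] /= c[0]
    for i in range(1, L):                          # forward recursion
        f[i] = (f[i - 1] @ A) * E[:, y[i]]
        c[i] = f[i].sum(); f[i] /= c[i]
    b[-1] = 1.0                                    # backward initialization
    for i in range(L - 2, -1, -1):                 # backward recursion
        b[i] = A @ (E[:, y[i + 1]] * b[i + 1]) / c[i + 1]
    return f, b, c                                 # log P(y) = np.log(c).sum()

f, b, c = forward_backward([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], A, E, pi)
posterior = f * b
print(posterior[:, 0])   # P(fair) at each toss (cf. the plot on slides 29-30)
```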
29-30. Probability of fair coin
(Plots over two slides: the posterior probability P(fair) at each position of the toss sequence, on a scale from 0 to 1.)
31. Posterior distribution
- The posterior distribution predicts the confidence level of a state estimate.
- The posterior distribution combines information from all paths.
- But the path of individually most probable states may not be legitimate, i.e., realizable by the HMM process.
32. Estimating parameters when the state sequence is known
- Given the state sequence (x_1, ..., x_L), define
  - A_jk: the number of transitions from j to k,
  - E_k(b): the number of emissions of b from k.
- The maximum likelihood estimates of the parameters are
  a_jk = A_jk / Σ_l A_jl and e_k(b) = E_k(b) / Σ_b' E_k(b').
A counting sketch follows below.
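A counting sketch of these estimates; the pseudocount argument is an addition not on the slide (in the spirit of the pseudocounts in Durbin et al.) that avoids zero probabilities for unseen transitions or emissions.

```python
import numpy as np

def ml_estimates(x, y, K, M, pseudo=0.0):
    """ML estimates of transition/emission matrices from a known state path x
    and integer-encoded observations y; pseudo > 0 adds pseudocounts."""
    Ajk = np.full((K, K), pseudo)
    Ekb = np.full((K, M), pseudo)
    for i in range(len(x) - 1):
        Ajk[x[i], x[i + 1]] += 1          # count transitions j -> k
    for i in range(len(x)):
        Ekb[x[i], y[i]] += 1              # count emissions of b from k
    a = Ajk / Ajk.sum(1, keepdims=True)   # a_jk = A_jk / sum_l A_jl
    e = Ekb / Ekb.sum(1, keepdims=True)   # e_k(b) = E_k(b) / sum_b' E_k(b')
    return a, e
```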
33. Inferring hidden states together with model parameters
- Viterbi training
- Baum-Welch algorithm
34. Viterbi training
- Main idea: use an iterative procedure (a sketch follows below).
- Estimate the states for fixed parameters using the Viterbi algorithm.
- Estimate the model parameters for the fixed states.
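A sketch of the loop, reusing viterbi() and ml_estimates() from the earlier sketches (with pseudocounts so the re-estimated probabilities stay positive); the prior pi is left fixed here for simplicity.

```python
# Viterbi training sketch: alternate state estimation and parameter estimation.
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
for _ in range(20):                 # or stop when the path no longer changes
    path, _ = viterbi(y, A, E, pi)  # states for fixed parameters
    A, E = ml_estimates(path, y, K=2, M=2, pseudo=1.0)  # parameters for fixed states
```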
35-36. Baum-Welch algorithm
- Instead of using the Viterbi path to estimate the states, consider the expected numbers of transitions A_kl and emissions E_k(b), summed over all paths:
  A_kl = Σ_i P(x_i = k, x_{i+1} = l | y) = Σ_i f_k(i) a_kl e_l(y_{i+1}) b_l(i+1) / P(y),
  E_k(b) = Σ_{i: y_i = b} P(x_i = k | y) = Σ_{i: y_i = b} f_k(i) b_k(i) / P(y).
A sketch of these expected counts follows below.
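A sketch of the expected counts using the scaled f, b, c from the forward_backward sketch above (with scaling, the division by P(y) is absorbed into the c_i factors); the M-step then renormalizes the counts exactly as in the known-state case.

```python
import numpy as np

def expected_counts(y, A, E, pi):
    """E-step of Baum-Welch: expected transition counts A_kl and expected
    emission counts E_k(b), accumulated over all state paths."""
    f, b, c = forward_backward(y, A, E, pi)   # scaled passes (sketch above)
    L, K = len(y), len(pi)
    Akl = np.zeros((K, K))
    for i in range(L - 1):
        # xi[k, l] = P(x_i = k, x_{i+1} = l | y)
        Akl += f[i][:, None] * A * (E[:, y[i + 1]] * b[i + 1])[None, :] / c[i + 1]
    post = f * b                              # P(x_i = k | y)
    Ekb = np.zeros_like(E)
    for i in range(L):
        Ekb[:, y[i]] += post[i]               # accumulate emissions of symbol y_i
    return Akl, Ekb

# M-step: renormalize, e.g. A_new = Akl / Akl.sum(1, keepdims=True),
# exactly as on the "known state sequence" slide.
```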
37. Baum-Welch is a special case of the EM algorithm
- Given an estimate θ^t of the parameters, try to find a better θ.
38. Baum-Welch is a special case of the EM algorithm
- E-step: calculate the Q function, Q(θ | θ^t) = Σ_x P(x | y, θ^t) log P(x, y | θ).
- M-step: maximize Q(θ | θ^t) with respect to θ.
39. Issue with EM
- EM only finds local maxima.
- Solutions:
  - Run EM multiple times, starting from different initial guesses.
  - Use a more sophisticated algorithm such as MCMC.
40. Dynamic Bayesian Network
(Figure: Kevin Murphy.)
41. Software
- Kevin Murphy's Bayes Net Toolbox for Matlab
- http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
42. Applications
Copy number changes (Yi Li)
43. Applications
Protein-binding sites
44. Applications
Sequence alignment (www.biocentral.com)
45. Reading list
- Hastie et al. (2001), The Elements of Statistical Learning, pp. 184-185.
- Durbin et al. (1998), Biological Sequence Analysis, Chapter 3.