Title: Hidden Markov Model
1. Hidden Markov Model
2. Bayes Rule
- The posterior distribution: p(k | x) ∝ p(x | k) p(k).
- Select the class k with the largest posterior probability.
- This rule minimizes the average misclassification rate.
- The maximum likelihood rule is equivalent to the Bayes rule with a uniform prior.
- The decision boundary is where two classes have equal posterior probability; a numeric sketch follows below.
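To make the rule concrete, here is a minimal Python sketch with made-up likelihood and prior values: the posterior is the normalized product of the two, and the Bayes rule picks its argmax.

```python
# Hypothetical two-class example (all numbers are made up for illustration).
likelihood = {1: 0.20, 2: 0.05}   # p(x | k) evaluated at the observed x
prior      = {1: 0.30, 2: 0.70}   # p(k)

unnorm = {k: likelihood[k] * prior[k] for k in likelihood}
Z = sum(unnorm.values())
posterior = {k: v / Z for k, v in unnorm.items()}   # p(k | x)

print(posterior)                          # {1: 0.63..., 2: 0.36...}
print(max(posterior, key=posterior.get))  # Bayes rule selects class 1
```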
3. Naïve Bayes approximation
- When x is high dimensional, it is difficult to estimate the class-conditional density p(x | k).
4. Naïve Bayes Classifier
- When x is high dimensional, it is difficult to estimate p(x | k).
- But if we assume the dimensions of x are independent given the class, p(x | k) = Π_j p(x_j | k), then each dimension becomes a 1-D estimation problem.
5. Naïve Bayes Classifier
- Usually the independence assumption is not valid.
- But sometimes the NBC can still be a good classifier.
- Quite often, simple models do not perform badly; a sketch follows below.
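As referenced above, here is a minimal naive Bayes sketch on a made-up binary toy dataset; the point is that under the independence assumption each feature dimension is fitted as its own 1-D problem.

```python
import numpy as np

# Toy binary data (made up): rows are samples, columns are features.
X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]])
y = np.array([0, 0, 1, 1])   # class labels

def fit_nb(X, y):
    classes = np.unique(y)
    prior = np.array([(y == k).mean() for k in classes])
    # theta[k, j] = p(x_j = 1 | class k), one 1-D estimate per dimension,
    # with Laplace smoothing to avoid zero probabilities.
    theta = np.array([(X[y == k].sum(0) + 1) / ((y == k).sum() + 2)
                      for k in classes])
    return classes, prior, theta

def predict_nb(x, classes, prior, theta):
    # log p(k | x) up to a constant: log prior + sum of 1-D log likelihoods
    logpost = (np.log(prior)
               + (x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum(1))
    return classes[np.argmax(logpost)]

classes, prior, theta = fit_nb(X, y)
print(predict_nb(np.array([1, 0, 1]), classes, prior, theta))  # -> 0
```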
6. Hidden Markov Model
7. A coin toss example
- Scenario: you are betting with your friend on coin tosses, and you observe (H, T, T, H, ...).
8. A coin toss example
- Scenario: you are betting with your friend on coin tosses, and you observe (H, T, T, H, ...).
- But your friend is cheating: he occasionally switches from a fair coin to a biased coin. Of course, the switch happens under the table!
(State diagram: Fair ↔ Biased.)
9. A coin toss example
- This is what is really happening: (H, T, H, T, H, H, H, H, T, H, H, T, ...), with each toss colored by which coin produced it.
- Of course, you can't see the colors, i.e., which coin was used. So how can you tell that your friend is cheating?
10. Hidden Markov Model
(Graphical model: a chain of hidden states, the coin in use, each emitting an observed variable, H or T.)
11. Markov Property
(Same graphical model: hidden state = the coin; observed variable = H or T.)
- Markov property: p(x_t | x_1, ..., x_{t-1}) = p(x_t | x_{t-1}).
12. Markov Property
- Transition probability: a_ij = p(x_t = j | x_{t-1} = i).
- Prior distribution: p(x_1).
(State diagram: Fair ↔ Biased, with transition probabilities on the arrows.)
13. Observation independence
- Given the hidden state (the coin), the observed variable (H or T) is conditionally independent of all earlier states and observations: p(y_t | x_1, ..., x_t, y_1, ..., y_{t-1}) = p(y_t | x_t).
- p(y_t | x_t) is the emission probability.
14. Model parameters
- A = (a_ij): the transition matrix.
- p(y_t | x_t): the emission probabilities.
- p(x_1): the prior distribution.
A concrete sketch of these parameters follows below.
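As a concrete illustration, the coin-toss HMM can be written out as arrays; the probability values below are assumptions for illustration, not taken from the slides.

```python
import numpy as np

states = ["Fair", "Biased"]
obs_symbols = ["H", "T"]

A = np.array([[0.95, 0.05],    # a_ij = p(x_t = j | x_{t-1} = i)
              [0.10, 0.90]])   # transition matrix (illustrative values)
E = np.array([[0.5, 0.5],      # p(y_t | x_t = Fair): a fair coin
              [0.8, 0.2]])     # p(y_t | x_t = Biased): heads-heavy (assumed)
pi = np.array([0.5, 0.5])      # p(x_1), prior distribution

# Each row of A and E is a probability distribution.
assert np.allclose(A.sum(1), 1) and np.allclose(E.sum(1), 1)
```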
15. Model inference
- Case 1: infer the states when the model parameters are known.
- Case 2: both the states and the model parameters are unknown.
16-19. Viterbi algorithm
(Trellis diagrams over four slides: states 1-4 on the vertical axis, time t-1, t, t+1 on the horizontal axis. The most probable path to a state at time t+1 extends the most probable path to one of the states at time t.)
Therefore, the path can be found iteratively.
20. Viterbi algorithm
(Trellis diagram as above.)
Let v_k(i) be the probability of the most probable path ending in state k at position i. Then
v_k(i) = e_k(y_i) · max_j [ v_j(i-1) a_jk ].
21. Viterbi algorithm
- Initialization (i = 0): v_0(0) = 1, v_k(0) = 0 for k > 0 (state 0 is a begin state).
- Recursion (i = 1, ..., L): v_k(i) = e_k(y_i) max_j [ v_j(i-1) a_jk ]; ptr_i(k) = argmax_j [ v_j(i-1) a_jk ].
- Termination: P(y, x*) = max_k v_k(L); x*_L = argmax_k v_k(L).
- Traceback (i = L, ..., 1): x*_{i-1} = ptr_i(x*_i).
A Python sketch follows below.
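A minimal Viterbi sketch in log space (for numerical stability), assuming the A, E, pi arrays from the earlier sketch and observations encoded as integer symbol indices (0 = H, 1 = T). It uses the prior p(x_1) directly rather than the slides' begin state 0; the recursion is otherwise the same.

```python
import numpy as np

def viterbi(y, A, E, pi):
    """Most probable state path for observations y (computed in log space)."""
    L, K = len(y), len(pi)
    v = np.zeros((L, K))                   # v[i, k]: best log-prob ending in k
    ptr = np.zeros((L, K), dtype=int)      # traceback pointers
    v[0] = np.log(pi) + np.log(E[:, y[0]])              # initialization
    for i in range(1, L):                               # recursion
        scores = v[i - 1][:, None] + np.log(A)          # scores[j, k]
        ptr[i] = scores.argmax(0)                       # best predecessor j
        v[i] = scores.max(0) + np.log(E[:, y[i]])
    path = [int(v[-1].argmax())]                        # termination
    for i in range(L - 1, 0, -1):                       # traceback
        path.append(int(ptr[i][path[-1]]))
    return path[::-1], float(v[-1].max())

y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # H H H H T H T T T T
path, logp = viterbi(y, A, E, pi)
print(path)   # stretches of state 1 (Biased) where heads cluster
```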
22. Advantages of the Viterbi path
- It identifies the most probable path very efficiently.
- The most probable path is legitimate, i.e., it is realizable by the HMM process.
23. Issues with the Viterbi path
- The most probable path does not provide a confidence level for each state estimate.
- The most probable path may not be much more probable than other paths.
24. Posterior distribution
- Estimate p(x_i = k | y_1, ..., y_L).
- Strategy: decompose the joint probability into a forward part and a backward part.
- This is done by the forward-backward algorithm.
25. Forward-backward algorithm
P(x_i = k, y) = P(y_1, ..., y_i, x_i = k) · P(y_{i+1}, ..., y_L | x_i = k) = f_k(i) · b_k(i).
26. Forward algorithm
Estimate f_k(i) = P(y_1, ..., y_i, x_i = k).
- Initialization (i = 0): f_0(0) = 1, f_k(0) = 0 for k > 0.
- Recursion (i = 1, ..., L): f_k(i) = e_k(y_i) Σ_j f_j(i-1) a_jk.
- Termination: P(y) = Σ_k f_k(L).
27-28. Backward algorithm
Estimate b_k(i) = P(y_{i+1}, ..., y_L | x_i = k).
- Initialization (i = L): b_k(L) = 1 for all k.
- Recursion (i = L-1, ..., 1): b_k(i) = Σ_l a_kl e_l(y_{i+1}) b_l(i+1).
- Termination: P(y) = Σ_l p(x_1 = l) e_l(y_1) b_l(1).
A sketch of both passes follows below.
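A sketch of both passes with per-step scaling to avoid underflow (a standard variant; dropping the scaling factors c_i recovers the unscaled recursions on the slides). It assumes the A, E, pi arrays and integer-encoded observations from the earlier sketches; with this scaling, the products f·b give the posteriors directly.

```python
import numpy as np

def forward_backward(y, A, E, pi):
    """Scaled forward/backward passes; p(x_i = k | y) = f[i] * b[i]."""
    L, K = len(y), len(pi)
    f = np.zeros((L, K)); b = np.zeros((L, K)); c = np.zeros(L)
    f[0] = pi * E[:, y[0]]                         # forward initialization
    c[0] = f[0].sum(); f[0] /= c[0]
    for i in range(1, L):                          # forward recursion
        f[i] = (f[i - 1] @ A) * E[:, y[i]]
        c[i] = f[i].sum(); f[i] /= c[i]
    b[-1] = 1.0                                    # backward initialization
    for i in range(L - 2, -1, -1):                 # backward recursion
        b[i] = A @ (E[:, y[i + 1]] * b[i + 1]) / c[i + 1]
    return f, b, c                                 # log P(y) = np.log(c).sum()

f, b, c = forward_backward([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], A, E, pi)
posterior = f * b
print(posterior[:, 0])   # P(fair) at each toss (cf. the plot on slides 29-30)
```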
29-30. Probability of fair coin
(Plots over two slides: the posterior probability P(fair) at each position of the toss sequence, on a scale from 0 to 1.)
31. Posterior distribution
- The posterior distribution predicts the confidence level of a state estimate.
- The posterior distribution combines information from all paths.
- But the path of individually most probable states may not be legitimate, i.e., realizable by the HMM process.
32. Estimating parameters when the state sequence is known
- Given the state sequence (x_1, ..., x_L), define
  - A_jk: the number of transitions from j to k,
  - E_k(b): the number of emissions of b from k.
- The maximum likelihood estimates of the parameters are
  a_jk = A_jk / Σ_l A_jl and e_k(b) = E_k(b) / Σ_b' E_k(b').
A counting sketch follows below.
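A counting sketch of these estimates; the pseudocount argument is an addition not on the slide (in the spirit of the pseudocounts in Durbin et al.) that avoids zero probabilities for unseen transitions or emissions.

```python
import numpy as np

def ml_estimates(x, y, K, M, pseudo=0.0):
    """ML estimates of transition/emission matrices from a known state path x
    and integer-encoded observations y; pseudo > 0 adds pseudocounts."""
    Ajk = np.full((K, K), pseudo)
    Ekb = np.full((K, M), pseudo)
    for i in range(len(x) - 1):
        Ajk[x[i], x[i + 1]] += 1          # count transitions j -> k
    for i in range(len(x)):
        Ekb[x[i], y[i]] += 1              # count emissions of b from k
    a = Ajk / Ajk.sum(1, keepdims=True)   # a_jk = A_jk / sum_l A_jl
    e = Ekb / Ekb.sum(1, keepdims=True)   # e_k(b) = E_k(b) / sum_b' E_k(b')
    return a, e
```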
33. Inferring hidden states together with model parameters
- Viterbi training
- Baum-Welch algorithm
34. Viterbi training
- Main idea: use an iterative procedure (a sketch follows below).
- Estimate the states for fixed parameters using the Viterbi algorithm.
- Estimate the model parameters for the fixed states.
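A sketch of the loop, reusing viterbi() and ml_estimates() from the earlier sketches (with pseudocounts so the re-estimated probabilities stay positive); the prior pi is left fixed here for simplicity.

```python
# Viterbi training sketch: alternate state estimation and parameter estimation.
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
for _ in range(20):                 # or stop when the path no longer changes
    path, _ = viterbi(y, A, E, pi)  # states for fixed parameters
    A, E = ml_estimates(path, y, K=2, M=2, pseudo=1.0)  # parameters for fixed states
```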
35-36. Baum-Welch algorithm
- Instead of using the Viterbi path to estimate the states, consider the expected numbers of transitions A_kl and emissions E_k(b), summed over all paths:
  A_kl = Σ_i P(x_i = k, x_{i+1} = l | y) = Σ_i f_k(i) a_kl e_l(y_{i+1}) b_l(i+1) / P(y),
  E_k(b) = Σ_{i: y_i = b} P(x_i = k | y) = Σ_{i: y_i = b} f_k(i) b_k(i) / P(y).
A sketch of these expected counts follows below.
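A sketch of the expected counts using the scaled f, b, c from the forward_backward sketch above (with scaling, the division by P(y) is absorbed into the c_i factors); the M-step then renormalizes the counts exactly as in the known-state case.

```python
import numpy as np

def expected_counts(y, A, E, pi):
    """E-step of Baum-Welch: expected transition counts A_kl and expected
    emission counts E_k(b), accumulated over all state paths."""
    f, b, c = forward_backward(y, A, E, pi)   # scaled passes (sketch above)
    L, K = len(y), len(pi)
    Akl = np.zeros((K, K))
    for i in range(L - 1):
        # xi[k, l] = P(x_i = k, x_{i+1} = l | y)
        Akl += f[i][:, None] * A * (E[:, y[i + 1]] * b[i + 1])[None, :] / c[i + 1]
    post = f * b                              # P(x_i = k | y)
    Ekb = np.zeros_like(E)
    for i in range(L):
        Ekb[:, y[i]] += post[i]               # accumulate emissions of symbol y_i
    return Akl, Ekb

# M-step: renormalize, e.g. A_new = Akl / Akl.sum(1, keepdims=True),
# exactly as on the "known state sequence" slide.
```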
37. Baum-Welch is a special case of the EM algorithm
- Given an estimate θ^t of the parameters, try to find a better θ.
38. Baum-Welch is a special case of the EM algorithm
- E-step: calculate the Q function, Q(θ | θ^t) = Σ_x P(x | y, θ^t) log P(x, y | θ).
- M-step: maximize Q(θ | θ^t) with respect to θ.
39. Issue with EM
- EM only finds local maxima.
- Solutions:
  - Run EM multiple times, starting from different initial guesses.
  - Use a more sophisticated algorithm such as MCMC.
40. Dynamic Bayesian Network
(Figure: Kevin Murphy.)
41. Software
- Kevin Murphy's Bayes Net Toolbox for Matlab
- http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
42. Applications
Copy number changes (Yi Li)
43. Applications
Protein-binding sites
44. Applications
Sequence alignment (www.biocentral.com)
45. Reading list
- Hastie et al. (2001), The Elements of Statistical Learning, pp. 184-185.
- Durbin et al. (1998), Biological Sequence Analysis, Chapter 3.