Title: CS479/679 Pattern Recognition, Spring 2006, Prof. Bebis
1. CS479/679 Pattern Recognition, Spring 2006, Prof. Bebis
- Hidden Markov Models (HMMs)
- Chapter 3 (Duda et al.) Section 3.10
2. Hidden Markov Models (HMMs)
- Sequential patterns
  - The order of the data points is irrelevant.
  - No explicit sequencing ...
- Temporal patterns
  - The result of a time process (e.g., time series).
  - Can be represented by a number of states.
  - States at time t are influenced directly by states in previous time steps (i.e., correlated).
3. Hidden Markov Models (HMMs)
- HMMs are appropriate for problems that have an inherent temporality:
  - Speech recognition
  - Gesture recognition
  - Human activity recognition
4. First-Order Markov Models
- They are represented by a graph where every node corresponds to a state ωi.
- The graph can be fully connected with self-loops.
- Links between nodes ωi and ωj are associated with a transition probability:
  P(ω(t+1) = ωj | ω(t) = ωi) = aij
- which is the probability of going to state ωj at time t+1 given that the state at time t was ωi (first-order model).
5. First-Order Markov Models (cont'd)
- The following constraints should be satisfied:
  aij ≥ 0 and Σj aij = 1 for each state ωi
- Markov models are fully described by their transition probabilities aij.
6. Example: Weather Prediction Model
- Assume three weather states:
  - ω1: Precipitation (rain, snow, hail, etc.)
  - ω2: Cloudy
  - ω3: Sunny
[Figure: state-transition diagram and transition matrix A = [aij] over the states ω1, ω2, ω3]
7. Computing P(ω^T) of a sequence of states ω^T
- Given a sequence of states ω^T = (ω(1), ω(2), ..., ω(T)), the probability that the model generated ω^T is equal to the product of the corresponding transition probabilities:
  P(ω^T) = ∏_{t=1}^{T} P(ω(t) | ω(t-1))
- where P(ω(1) | ω(0)) = P(ω(1)) is the prior probability of the first state.
8. Example: Weather Prediction Model (cont'd)
- What is the probability that the weather for eight consecutive days is sun-sun-sun-rain-rain-sun-cloudy-sun?
- ω^8 = (ω3, ω3, ω3, ω1, ω1, ω3, ω2, ω3)
- P(ω^8) = P(ω3) P(ω3|ω3) P(ω3|ω3) P(ω1|ω3) P(ω1|ω1) P(ω3|ω1) P(ω2|ω3) P(ω3|ω2) = 1.536 × 10^-4
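The transition matrix itself did not survive in this text version of the slides; the minimal check below assumes the classic three-state weather matrix from Rabiner's tutorial, which reproduces the 1.536 × 10^-4 above and is used here for illustration only:

```python
import numpy as np

# Assumed classic weather transition matrix (rows: from-state, cols: to-state),
# with 0 = precipitation (w1), 1 = cloudy (w2), 2 = sunny (w3).
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

# sun-sun-sun-rain-rain-sun-cloudy-sun  ->  w3 w3 w3 w1 w1 w3 w2 w3 (0-based indices)
sequence = [2, 2, 2, 0, 0, 2, 1, 2]

prob = 1.0                          # prior of the first state taken as P(w3) = 1
for prev, curr in zip(sequence, sequence[1:]):
    prob *= A[prev, curr]           # multiply the corresponding transition probabilities

print(prob)                         # prints ~1.536e-04
```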
9. Limitations of Markov models
- In Markov models, each state is uniquely associated with an observable event.
- Once an observation is made, the state of the system is trivially retrieved.
- Such models are of limited use in most practical applications.
10. Hidden States and Observations
- Assume that observations are a probabilistic function of each state.
- Each state can generate a number of outputs (i.e., observations) according to a unique probability distribution.
- Each observation can potentially be generated at any state.
- The state sequence is not directly observable.
- It can be approximated by a sequence of observations.
11. First-order HMMs
- We augment the model such that when it is in state ω(t) it also emits some symbol v(t) (visible state) among a set of possible symbols.
- We have access to the visible states only, while the ω(t) are unobservable.
12. Example: Weather Prediction Model (cont'd)
- Observations: v1 = temperature, v2 = humidity, etc.
13. First-order HMMs
- For every sequence of hidden states, there is an associated sequence of visible states:
  ω^T = (ω(1), ω(2), ..., ω(T)) → V^T = (v(1), v(2), ..., v(T))
- When the model is in state ωj at time t, the probability of emitting a visible state vk at that time is denoted as
  P(v(t) = vk | ω(t) = ωj) = bjk, where Σk bjk = 1 (observation probabilities)
14. Absorbing State
- Given a state sequence and its corresponding observation sequence:
  ω^T = (ω(1), ω(2), ..., ω(T)) → V^T = (v(1), v(2), ..., v(T))
- we assume that ω(T) = ω0 is some absorbing state, which uniquely emits the symbol v(T) = v0.
- Once entering the absorbing state, the system cannot escape from it.
15. HMM Formalism
- An HMM is defined by {Ω, V, Π, A, B}:
  - Ω = {ω1, ..., ωn}: the possible states
  - V = {v1, ..., vm}: the possible observations
  - Π = {πi}: the prior state probabilities
  - A = {aij}: the state transition probabilities
  - B = {bjk}: the observation (emission) probabilities
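As an illustration only (not part of the slides), the quintuple above can be held in a small container of numpy arrays; the class name, field names, and the check() helper below are hypothetical:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMM:
    """Container for the HMM quintuple: states, symbols, priors, A, B."""
    states: list        # hidden states w1..wn
    symbols: list       # visible symbols v1..vm
    pi: np.ndarray      # prior state probabilities, shape (n,)
    A: np.ndarray       # transition probabilities a_ij, shape (n, n)
    B: np.ndarray       # emission probabilities b_jk, shape (n, m)

    def check(self, tol=1e-8):
        # pi and each row of A and B must sum to 1
        assert np.isclose(self.pi.sum(), 1, atol=tol)
        assert np.allclose(self.A.sum(axis=1), 1, atol=tol)
        assert np.allclose(self.B.sum(axis=1), 1, atol=tol)
```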
16. Some Terminology
- Causal: the probabilities depend only upon previous states.
- Ergodic: every one of the states has a non-zero probability of occurring given some starting state.
[Figure: left-right HMM]
17. Coin toss example
- You are in a room with a barrier (e.g., a curtain) through which you cannot see what is happening.
- On the other side of the barrier is another person who is performing a coin (or multiple coin) toss experiment.
- The other person will tell you only the result of the experiment, not how he obtained that result!
- e.g., V^T = HHTHTTHH...T = (v(1), v(2), ..., v(T))
18. Coin toss example (cont'd)
- Problem: derive an HMM model to explain the observed sequence of heads and tails.
- The coins represent the states; these are hidden because we do not know which coin was tossed each time.
- The outcome of each toss represents an observation.
- A likely sequence of coins may be inferred from the observations.
- As we will see, the state sequence will not be unique in general.
19. Coin toss example: 1-fair-coin model
- There are 2 states, each associated with either heads (state 1) or tails (state 2).
- The observation sequence uniquely defines the states (the model is not hidden).
20. Coin toss example: 2-fair-coins model
- There are 2 states, but neither state is uniquely associated with either heads or tails (i.e., each state can be associated with a different fair coin).
- A third coin is used to decide which of the fair coins to flip.
21. Coin toss example: 2-biased-coins model
- There are 2 states, with each state associated with a biased coin.
- A third coin is used to decide which of the biased coins to flip.
22. Coin toss example: 3-biased-coins model
- There are 3 states, with each state associated with a biased coin.
- We decide which coin to flip in some way (e.g., using other coins).
23. Which model is best?
- Since the states are not observable, the best we can do is select the model that best explains the data.
- Long observation sequences are preferable for selecting the best model ...
24. Classification Using HMMs
- Given an observation sequence V^T and a set of possible models, choose the model with the highest probability.
- Bayes formula:
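The Bayes formula itself is not reproduced in this text version of the slides; in standard form it presumably reads as below, where θj denotes the j-th candidate model (notation introduced here for illustration):

```latex
P(\theta_j \mid V^T) = \frac{P(V^T \mid \theta_j)\, P(\theta_j)}{P(V^T)}
\quad\Longrightarrow\quad
\hat{\theta} = \arg\max_j \; P(V^T \mid \theta_j)\, P(\theta_j)
```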
25. Main Problems in HMMs
- Evaluation
  - Determine the probability P(V^T) that a particular sequence of visible states V^T was generated by a given model (based on dynamic programming).
- Decoding
  - Given a sequence of visible states V^T, determine the most likely sequence of hidden states ω^T that led to those observations (based on dynamic programming).
- Learning
  - Given a set of visible observations, determine aij and bjk (based on the EM algorithm).
26. Evaluation
- P(V^T) = Σ_{r=1}^{r_max} P(V^T | ω_r^T) P(ω_r^T), where r_max = c^T is the number of possible state sequences ω_r^T of length T for a model with c states.
27. Evaluation (cont'd)
- P(V^T) = Σ_{r=1}^{r_max} ∏_{t=1}^{T} P(v(t) | ω(t)) P(ω(t) | ω(t-1))
- (Enumerate all possible transitions to determine how good the model is.)
28. Example: Evaluation
- (Enumerate all possible transitions to determine how good the model is.)
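A direct (exponential-cost) sketch of this enumeration, for illustration only; pi, A, B follow the formalism of slide 15 and obs is a list of observation indices:

```python
import itertools
import numpy as np

def evaluate_brute_force(pi, A, B, obs):
    """P(V^T) by summing over every possible hidden-state sequence (O(c^T) work)."""
    n, T = len(pi), len(obs)
    total = 0.0
    for seq in itertools.product(range(n), repeat=T):      # all c^T state sequences
        p = pi[seq[0]] * B[seq[0], obs[0]]                  # prior * first emission
        for t in range(1, T):
            p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]  # transition * emission
        total += p
    return total
```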
29. Computational Complexity
- Naive evaluation of P(V^T) requires on the order of c^T · T operations; the forward algorithm described next reduces this to O(c^2 T).
30. Recursive computation of P(V^T) (HMM Forward)
[Figure: trellis of hidden states ω(1), ..., ω(t) = ωi, ω(t+1) = ωj, ..., ω(T) and the corresponding visible symbols v(1), ..., v(t), v(t+1), ..., v(T)]
31. Recursive computation of P(V^T) (HMM Forward) (cont'd)
- Define αj(t) as the probability that the model is in state ωj at step t and has generated the first t visible symbols of V^T:
  αj(t) = bjk v(t) · Σi αi(t-1) aij
  where bjk v(t) denotes the emission probability bjk for the symbol vk actually observed at time t.
32. Recursive computation of P(V^T) (HMM Forward) (cont'd)
[Equations/figure: initialization of α in terms of the state ω0, illustrated on the trellis]
33. Recursive computation of P(V^T) (HMM Forward) (cont'd)
- Algorithm (HMM Forward):
  initialize t = 0, aij, bjk, visible sequence V^T, α(0)
  for t = 1 to T do
    for j = 0 to c do
      αj(t) = bjk v(t) Σi αi(t-1) aij
  return P(V^T) = α0(T) (i.e., corresponds to state ω0 = ω(T))
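A minimal Python sketch of this forward pass (not from the slides): it initializes from prior state probabilities rather than the slides' fixed initial/absorbing-state convention, and obs is a list of observation indices:

```python
import numpy as np

def forward(pi, A, B, obs):
    """HMM Forward: returns P(V^T) and the full alpha table (shape T x n)."""
    T, n = len(obs), len(pi)
    alpha = np.zeros((T, n))
    alpha[0] = pi * B[:, obs[0]]                 # alpha_j(1) = pi_j * b_j(v(1))
    for t in range(1, T):
        # alpha_j(t) = b_j(v(t)) * sum_i alpha_i(t-1) * a_ij
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)
    return alpha[-1].sum(), alpha                # P(V^T) = sum_j alpha_j(T)
```

On short sequences its result can be checked against the brute-force enumeration sketched earlier.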
34. Example
[Tables: transition probabilities aij and emission probabilities bjk over the states ω0, ω1, ω2, ω3]
35. Example (cont'd)
- Similarly for t = 2, 3, 4.
- Finally, P(V^T) = 0.00108.
[Figure: forward trellis for the example, starting from the initial state]
36. The backward algorithm (HMM Backward)
- Define βi(t) as the probability that the model, in state ωi at step t, will generate the remaining visible symbols v(t+1), ..., v(T):
  βi(t) = Σj βj(t+1) aij bjk v(t+1)
  with βi(T) = 1 if ωi is the final (absorbing) state and 0 otherwise.
[Figure: backward recursion on the trellis over ω(t), ω(t+1), ..., ω(T) and v(t), v(t+1), ..., v(T)]
37. The backward algorithm (HMM Backward) (cont'd)
[Figure: backward recursion illustrated on the trellis]
38. The backward algorithm (HMM Backward) (cont'd)
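A matching sketch of the backward pass, under the same conventions and caveats as the forward sketch above:

```python
import numpy as np

def backward(A, B, obs):
    """HMM Backward: beta_i(t) = P(v(t+1), ..., v(T) | state w_i at time t)."""
    T, n = len(obs), A.shape[0]
    beta = np.zeros((T, n))
    beta[-1] = 1.0                                        # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij * b_j(v(t+1)) * beta_j(t+1)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```

With these conventions, P(V^T) can also be recovered as np.sum(pi * B[:, obs[0]] * beta[0]), which should agree with the forward result.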
39. Decoding
- We need to use an optimality criterion to solve this problem (i.e., there are several possible ways of solving it since there are various optimality criteria we could use).
- Algorithm 1: choose the states ω(t) that are individually most likely (i.e., maximize the expected number of correct individual states).
40. Decoding: Algorithm 1 (cont'd)
- Using the forward and backward variables, γi(t) = αi(t) βi(t) / P(V^T) is the probability of being in state ωi at step t given V^T; Algorithm 1 chooses ω(t) = argmax_i γi(t) at each step.
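A short sketch of this criterion, reusing the forward and backward sketches given earlier (the function names refer to those sketches, not to anything in the slides):

```python
def decode_individually_most_likely(pi, A, B, obs):
    """Algorithm 1: at each step pick the state with the highest posterior gamma_i(t)."""
    p_obs, alpha = forward(pi, A, B, obs)    # forward sketch from the evaluation section
    beta = backward(A, B, obs)               # backward sketch above
    gamma = alpha * beta / p_obs             # gamma_i(t) = alpha_i(t) beta_i(t) / P(V^T)
    return gamma.argmax(axis=1)              # most likely state at each time step
```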
41. Decoding: Algorithm 2
- Algorithm 2: at each time step t, find the state that has the highest probability αi(t).
- Uses the forward algorithm with minor changes.
42. Decoding: Algorithm 2 (cont'd)
43. Decoding: Algorithm 2 (cont'd)
44. Decoding: Algorithm 2 (cont'd)
- There is no guarantee that the path is a valid one.
- The path might imply a transition that is not allowed by the model.
[Figure: example path over time steps 0-4 containing a transition that is not allowed (a32 = 0)]
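A sketch of this greedy per-step choice, again reusing the forward sketch; as the slide warns, the resulting path may contain transitions with aij = 0:

```python
def decode_greedy_forward(pi, A, B, obs):
    """Algorithm 2: choose argmax_i alpha_i(t) at every step (path may be invalid)."""
    _, alpha = forward(pi, A, B, obs)        # forward sketch from the evaluation section
    return alpha.argmax(axis=1)              # may chain transitions the model forbids
```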
45. Decoding: Algorithm 3
46. Decoding: Algorithm 3 (cont'd)
47. Decoding: Algorithm 3 (cont'd)
48. Decoding: Algorithm 3 (cont'd)
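The content of slides 45-48 did not survive in this text version. Algorithm 3 is presumably the standard max-product (Viterbi-style) decoder discussed in Duda et al.; the following is a minimal sketch under that assumption, with the same array conventions as the earlier sketches:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Max-product decoding: the single most likely hidden-state path for obs."""
    T, n = len(obs), len(pi)
    delta = np.zeros((T, n))                 # best path score ending in state j at time t
    psi = np.zeros((T, n), dtype=int)        # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_i(t-1) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):            # follow the back-pointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())
```

Because it maximizes the probability of the whole path, this decoder avoids the invalid transitions that Algorithm 2 can produce.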
49. Learning
- Use EM.
- Update the weights iteratively to better explain the observed training sequences.
50. Learning (cont'd)
51. Learning (cont'd)
- Define the probability of transitioning from ωi to ωj at step t given V^T:
  γij(t) = αi(t-1) aij bjk v(t) βj(t) / P(V^T)
- (expectation step)
52. Learning (cont'd)
53. Learning (cont'd)
- (maximization step)
  âij = Σ_{t=1}^{T} γij(t) / Σ_{t=1}^{T} Σ_k γik(t)
54. Learning (cont'd)
- (maximization step)
  b̂jk = Σ_{t : v(t)=vk} Σ_l γjl(t) / Σ_{t=1}^{T} Σ_l γjl(t)
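A compact sketch of one EM (Baum-Welch) iteration consistent with the expectation and maximization steps above; it reuses the forward and backward sketches, applies no numerical scaling (so it is only suitable for short sequences), and the variable names are illustrative:

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One EM iteration: E-step computes the gammas, M-step re-estimates pi, A, B."""
    T, n, m = len(obs), A.shape[0], B.shape[1]
    p_obs, alpha = forward(pi, A, B, obs)
    beta = backward(A, B, obs)

    # E-step: xi[t, i, j] is the posterior probability of an i -> j transition at step t
    xi = np.zeros((T - 1, n, n))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
    xi /= p_obs
    gamma = alpha * beta / p_obs                     # state-occupation probabilities

    # M-step: ratios of expected counts
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros((n, m))
    for k in range(m):
        B_new[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return gamma[0], A_new, B_new                    # updated pi, A, B
```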
55. Difficulties
- How do we decide on the number of states and the structure of the model?
  - Use domain knowledge; otherwise it is a very hard problem!
- What about the size of the observation sequence?
  - It should be sufficiently long to guarantee that all state transitions appear a sufficient number of times.
  - A large amount of training data is necessary to learn the HMM parameters.