Title: CS479/679 Pattern Recognition, Spring 2006, Prof. Bebis
1. CS479/679 Pattern Recognition, Spring 2006, Prof. Bebis
- Hidden Markov Models (HMMs)
- Chapter 3 (Duda et al.) Section 3.10
2. Hidden Markov Models (HMMs)
- Sequential patterns
  - The order of the data points is irrelevant.
  - No explicit sequencing ...
- Temporal patterns
  - The result of a time process (e.g., time series).
  - Can be represented by a number of states.
  - States at time t are influenced directly by states in previous time steps (i.e., correlated).
3. Hidden Markov Models (HMMs)
- HMMs are appropriate for problems that have an inherent temporality:
  - Speech recognition
  - Gesture recognition
  - Human activity recognition
4. First-Order Markov Models
- They are represented by a graph where every node corresponds to a state ωi.
- The graph can be fully connected with self-loops.
- Links between nodes ωi and ωj are associated with a transition probability:
  P(ω(t+1) = ωj | ω(t) = ωi) = aij
- which is the probability of going to state ωj at time t+1 given that the state at time t was ωi (first-order model).
5. First-Order Markov Models (cont'd)
- The following constraints should be satisfied:
  aij ≥ 0 and Σj aij = 1 for each state ωi
- Markov models are fully described by their transition probabilities aij.
6. Example: Weather Prediction Model
- Assume three weather states:
  - ω1: Precipitation (rain, snow, hail, etc.)
  - ω2: Cloudy
  - ω3: Sunny
[Figure: state-transition diagram and transition matrix A = [aij] over the states ω1, ω2, ω3]
7. Computing P(ω^T) of a sequence of states ω^T
- Given a sequence of states ω^T = (ω(1), ω(2), ..., ω(T)), the probability that the model generated ω^T is equal to the product of the corresponding transition probabilities:
  P(ω^T) = ∏_{t=1}^{T} P(ω(t) | ω(t-1))
- where P(ω(1) | ω(0)) = P(ω(1)) is the prior probability of the first state.
8. Example: Weather Prediction Model (cont'd)
- What is the probability that the weather for eight consecutive days is sun-sun-sun-rain-rain-sun-cloudy-sun?
- ω^8 = (ω3, ω3, ω3, ω1, ω1, ω3, ω2, ω3)
- P(ω^8) = P(ω3) P(ω3|ω3) P(ω3|ω3) P(ω1|ω3) P(ω1|ω1) P(ω3|ω1) P(ω2|ω3) P(ω3|ω2) = 1.536 × 10^-4
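The transition matrix itself did not survive in this text version of the slides; the minimal check below assumes the classic three-state weather matrix from Rabiner's tutorial, which reproduces the 1.536 × 10^-4 above and is used here for illustration only:

```python
import numpy as np

# Assumed classic weather transition matrix (rows: from-state, cols: to-state),
# with 0 = precipitation (w1), 1 = cloudy (w2), 2 = sunny (w3).
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

# sun-sun-sun-rain-rain-sun-cloudy-sun  ->  w3 w3 w3 w1 w1 w3 w2 w3 (0-based indices)
sequence = [2, 2, 2, 0, 0, 2, 1, 2]

prob = 1.0                          # prior of the first state taken as P(w3) = 1
for prev, curr in zip(sequence, sequence[1:]):
    prob *= A[prev, curr]           # multiply the corresponding transition probabilities

print(prob)                         # prints ~1.536e-04
```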
9. Limitations of Markov models
- In Markov models, each state is uniquely associated with an observable event.
- Once an observation is made, the state of the system is trivially retrieved.
- Such models are of limited use in most practical applications.
10. Hidden States and Observations
- Assume that observations are a probabilistic function of each state.
- Each state can generate a number of outputs (i.e., observations) according to a unique probability distribution.
- Each observation can potentially be generated at any state.
- The state sequence is not directly observable.
- It can be approximated by a sequence of observations.
11. First-order HMMs
- We augment the model such that when it is in state ω(t) it also emits some symbol v(t) (visible state) among a set of possible symbols.
- We have access to the visible states only, while the ω(t) are unobservable.
12. Example: Weather Prediction Model (cont'd)
- Observations: v1 = temperature, v2 = humidity, etc.
13. First-order HMMs
- For every sequence of hidden states, there is an associated sequence of visible states:
  ω^T = (ω(1), ω(2), ..., ω(T)) → V^T = (v(1), v(2), ..., v(T))
- When the model is in state ωj at time t, the probability of emitting a visible state vk at that time is denoted as
  P(v(t) = vk | ω(t) = ωj) = bjk, where Σk bjk = 1 (observation probabilities)
14. Absorbing State
- Given a state sequence and its corresponding observation sequence:
  ω^T = (ω(1), ω(2), ..., ω(T)) → V^T = (v(1), v(2), ..., v(T))
- we assume that ω(T) = ω0 is some absorbing state, which uniquely emits the symbol v(T) = v0.
- Once entering the absorbing state, the system cannot escape from it.
15. HMM Formalism
- An HMM is defined by {Ω, V, Π, A, B}:
  - Ω = {ω1, ..., ωn}: the possible states
  - V = {v1, ..., vm}: the possible observations
  - Π = {πi}: the prior state probabilities
  - A = {aij}: the state transition probabilities
  - B = {bjk}: the observation (emission) probabilities
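As an illustration only (not part of the slides), the quintuple above can be held in a small container of numpy arrays; the class name, field names, and the check() helper below are hypothetical:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMM:
    """Container for the HMM quintuple: states, symbols, priors, A, B."""
    states: list        # hidden states w1..wn
    symbols: list       # visible symbols v1..vm
    pi: np.ndarray      # prior state probabilities, shape (n,)
    A: np.ndarray       # transition probabilities a_ij, shape (n, n)
    B: np.ndarray       # emission probabilities b_jk, shape (n, m)

    def check(self, tol=1e-8):
        # pi and each row of A and B must sum to 1
        assert np.isclose(self.pi.sum(), 1, atol=tol)
        assert np.allclose(self.A.sum(axis=1), 1, atol=tol)
        assert np.allclose(self.B.sum(axis=1), 1, atol=tol)
```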
16. Some Terminology
- Causal: the probabilities depend only upon previous states.
- Ergodic: every one of the states has a non-zero probability of occurring given some starting state.
[Figure: left-right HMM]
17. Coin toss example
- You are in a room with a barrier (e.g., a curtain) through which you cannot see what is happening.
- On the other side of the barrier is another person who is performing a coin (or multiple coin) toss experiment.
- The other person will tell you only the result of the experiment, not how he obtained that result!
- e.g., V^T = HHTHTTHH...T = (v(1), v(2), ..., v(T))
18. Coin toss example (cont'd)
- Problem: derive an HMM model to explain the observed sequence of heads and tails.
- The coins represent the states; these are hidden because we do not know which coin was tossed each time.
- The outcome of each toss represents an observation.
- A likely sequence of coins may be inferred from the observations.
- As we will see, the state sequence will not be unique in general.
19. Coin toss example: 1-fair-coin model
- There are 2 states, each associated with either heads (state 1) or tails (state 2).
- The observation sequence uniquely defines the states (the model is not hidden).
20. Coin toss example: 2-fair-coins model
- There are 2 states, but neither state is uniquely associated with either heads or tails (i.e., each state can be associated with a different fair coin).
- A third coin is used to decide which of the fair coins to flip.
21. Coin toss example: 2-biased-coins model
- There are 2 states, with each state associated with a biased coin.
- A third coin is used to decide which of the biased coins to flip.
22. Coin toss example: 3-biased-coins model
- There are 3 states, with each state associated with a biased coin.
- We decide which coin to flip in some way (e.g., using other coins).
23. Which model is best?
- Since the states are not observable, the best we can do is select the model that best explains the data.
- Long observation sequences are preferable for selecting the best model ...
24. Classification Using HMMs
- Given an observation sequence V^T and a set of possible models, choose the model with the highest probability.
- Bayes formula:
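The Bayes formula itself is not reproduced in this text version of the slides; in standard form it presumably reads as below, where θj denotes the j-th candidate model (notation introduced here for illustration):

```latex
P(\theta_j \mid V^T) = \frac{P(V^T \mid \theta_j)\, P(\theta_j)}{P(V^T)}
\quad\Longrightarrow\quad
\hat{\theta} = \arg\max_j \; P(V^T \mid \theta_j)\, P(\theta_j)
```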
25. Main Problems in HMMs
- Evaluation
  - Determine the probability P(V^T) that a particular sequence of visible states V^T was generated by a given model (based on dynamic programming).
- Decoding
  - Given a sequence of visible states V^T, determine the most likely sequence of hidden states ω^T that led to those observations (based on dynamic programming).
- Learning
  - Given a set of visible observations, determine aij and bjk (based on the EM algorithm).
26. Evaluation
- P(V^T) = Σ_{r=1}^{r_max} P(V^T | ω_r^T) P(ω_r^T), where r_max = c^T is the number of possible state sequences ω_r^T of length T for a model with c states.
27. Evaluation (cont'd)
- P(V^T) = Σ_{r=1}^{r_max} ∏_{t=1}^{T} P(v(t) | ω(t)) P(ω(t) | ω(t-1))
- (Enumerate all possible transitions to determine how good the model is.)
28. Example: Evaluation
- (Enumerate all possible transitions to determine how good the model is.)
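A direct (exponential-cost) sketch of this enumeration, for illustration only; pi, A, B follow the formalism of slide 15 and obs is a list of observation indices:

```python
import itertools
import numpy as np

def evaluate_brute_force(pi, A, B, obs):
    """P(V^T) by summing over every possible hidden-state sequence (O(c^T) work)."""
    n, T = len(pi), len(obs)
    total = 0.0
    for seq in itertools.product(range(n), repeat=T):      # all c^T state sequences
        p = pi[seq[0]] * B[seq[0], obs[0]]                  # prior * first emission
        for t in range(1, T):
            p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]  # transition * emission
        total += p
    return total
```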
29. Computational Complexity
- Naive evaluation of P(V^T) requires on the order of c^T · T operations; the forward algorithm described next reduces this to O(c^2 T).
30. Recursive computation of P(V^T) (HMM Forward)
[Figure: trellis of hidden states ω(1), ..., ω(t) = ωi, ω(t+1) = ωj, ..., ω(T) and the corresponding visible symbols v(1), ..., v(t), v(t+1), ..., v(T)]
31. Recursive computation of P(V^T) (HMM Forward) (cont'd)
- Define αj(t) as the probability that the model is in state ωj at step t and has generated the first t visible symbols of V^T:
  αj(t) = bjk v(t) · Σi αi(t-1) aij
  where bjk v(t) denotes the emission probability bjk for the symbol vk actually observed at time t.
32. Recursive computation of P(V^T) (HMM Forward) (cont'd)
[Equations/figure: initialization of α in terms of the state ω0, illustrated on the trellis]
33. Recursive computation of P(V^T) (HMM Forward) (cont'd)
- Algorithm (HMM Forward):
  initialize t = 0, aij, bjk, visible sequence V^T, α(0)
  for t = 1 to T do
    for j = 0 to c do
      αj(t) = bjk v(t) Σi αi(t-1) aij
  return P(V^T) = α0(T) (i.e., corresponds to state ω0 = ω(T))
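A minimal Python sketch of this forward pass (not from the slides): it initializes from prior state probabilities rather than the slides' fixed initial/absorbing-state convention, and obs is a list of observation indices:

```python
import numpy as np

def forward(pi, A, B, obs):
    """HMM Forward: returns P(V^T) and the full alpha table (shape T x n)."""
    T, n = len(obs), len(pi)
    alpha = np.zeros((T, n))
    alpha[0] = pi * B[:, obs[0]]                 # alpha_j(1) = pi_j * b_j(v(1))
    for t in range(1, T):
        # alpha_j(t) = b_j(v(t)) * sum_i alpha_i(t-1) * a_ij
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)
    return alpha[-1].sum(), alpha                # P(V^T) = sum_j alpha_j(T)
```

On short sequences its result can be checked against the brute-force enumeration sketched earlier.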
34. Example
[Tables: transition probabilities aij and emission probabilities bjk over the states ω0, ω1, ω2, ω3]
35. Example (cont'd)
- Similarly for t = 2, 3, 4.
- Finally, P(V^T) = 0.00108.
[Figure: forward trellis for the example, starting from the initial state]
36. The backward algorithm (HMM Backward)
- Define βi(t) as the probability that the model, in state ωi at step t, will generate the remaining visible symbols v(t+1), ..., v(T):
  βi(t) = Σj βj(t+1) aij bjk v(t+1)
  with βi(T) = 1 if ωi is the final (absorbing) state and 0 otherwise.
[Figure: backward recursion on the trellis over ω(t), ω(t+1), ..., ω(T) and v(t), v(t+1), ..., v(T)]
37. The backward algorithm (HMM Backward) (cont'd)
[Figure: backward recursion illustrated on the trellis]
38. The backward algorithm (HMM Backward) (cont'd)
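A matching sketch of the backward pass, under the same conventions and caveats as the forward sketch above:

```python
import numpy as np

def backward(A, B, obs):
    """HMM Backward: beta_i(t) = P(v(t+1), ..., v(T) | state w_i at time t)."""
    T, n = len(obs), A.shape[0]
    beta = np.zeros((T, n))
    beta[-1] = 1.0                                        # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij * b_j(v(t+1)) * beta_j(t+1)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```

With these conventions, P(V^T) can also be recovered as np.sum(pi * B[:, obs[0]] * beta[0]), which should agree with the forward result.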
39. Decoding
- We need to use an optimality criterion to solve this problem (i.e., there are several possible ways of solving it since there are various optimality criteria we could use).
- Algorithm 1: choose the states ω(t) that are individually most likely (i.e., maximize the expected number of correct individual states).
40. Decoding: Algorithm 1 (cont'd)
- Using the forward and backward variables, γi(t) = αi(t) βi(t) / P(V^T) is the probability of being in state ωi at step t given V^T; Algorithm 1 chooses ω(t) = argmax_i γi(t) at each step.
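A short sketch of this criterion, reusing the forward and backward sketches given earlier (the function names refer to those sketches, not to anything in the slides):

```python
def decode_individually_most_likely(pi, A, B, obs):
    """Algorithm 1: at each step pick the state with the highest posterior gamma_i(t)."""
    p_obs, alpha = forward(pi, A, B, obs)    # forward sketch from the evaluation section
    beta = backward(A, B, obs)               # backward sketch above
    gamma = alpha * beta / p_obs             # gamma_i(t) = alpha_i(t) beta_i(t) / P(V^T)
    return gamma.argmax(axis=1)              # most likely state at each time step
```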
41. Decoding: Algorithm 2
- Algorithm 2: at each time step t, find the state that has the highest probability αi(t).
- Uses the forward algorithm with minor changes.
42. Decoding: Algorithm 2 (cont'd)
43. Decoding: Algorithm 2 (cont'd)
44. Decoding: Algorithm 2 (cont'd)
- There is no guarantee that the path is a valid one.
- The path might imply a transition that is not allowed by the model.
[Figure: example path over time steps 0-4 containing a transition that is not allowed (a32 = 0)]
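A sketch of this greedy per-step choice, again reusing the forward sketch; as the slide warns, the resulting path may contain transitions with aij = 0:

```python
def decode_greedy_forward(pi, A, B, obs):
    """Algorithm 2: choose argmax_i alpha_i(t) at every step (path may be invalid)."""
    _, alpha = forward(pi, A, B, obs)        # forward sketch from the evaluation section
    return alpha.argmax(axis=1)              # may chain transitions the model forbids
```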
45. Decoding: Algorithm 3
46. Decoding: Algorithm 3 (cont'd)
47. Decoding: Algorithm 3 (cont'd)
48. Decoding: Algorithm 3 (cont'd)
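The content of slides 45-48 did not survive in this text version. Algorithm 3 is presumably the standard max-product (Viterbi-style) decoder discussed in Duda et al.; the following is a minimal sketch under that assumption, with the same array conventions as the earlier sketches:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Max-product decoding: the single most likely hidden-state path for obs."""
    T, n = len(obs), len(pi)
    delta = np.zeros((T, n))                 # best path score ending in state j at time t
    psi = np.zeros((T, n), dtype=int)        # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_i(t-1) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):            # follow the back-pointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())
```

Because it maximizes the probability of the whole path, this decoder avoids the invalid transitions that Algorithm 2 can produce.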
49. Learning
- Use EM.
- Update the weights iteratively to better explain the observed training sequences.
50. Learning (cont'd)
51. Learning (cont'd)
- Define the probability of transitioning from ωi to ωj at step t given V^T:
  γij(t) = αi(t-1) aij bjk v(t) βj(t) / P(V^T)
- (expectation step)
52. Learning (cont'd)
53. Learning (cont'd)
- (maximization step)
  âij = Σ_{t=1}^{T} γij(t) / Σ_{t=1}^{T} Σ_k γik(t)
54. Learning (cont'd)
- (maximization step)
  b̂jk = Σ_{t : v(t)=vk} Σ_l γjl(t) / Σ_{t=1}^{T} Σ_l γjl(t)
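A compact sketch of one EM (Baum-Welch) iteration consistent with the expectation and maximization steps above; it reuses the forward and backward sketches, applies no numerical scaling (so it is only suitable for short sequences), and the variable names are illustrative:

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One EM iteration: E-step computes the gammas, M-step re-estimates pi, A, B."""
    T, n, m = len(obs), A.shape[0], B.shape[1]
    p_obs, alpha = forward(pi, A, B, obs)
    beta = backward(A, B, obs)

    # E-step: xi[t, i, j] is the posterior probability of an i -> j transition at step t
    xi = np.zeros((T - 1, n, n))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
    xi /= p_obs
    gamma = alpha * beta / p_obs                     # state-occupation probabilities

    # M-step: ratios of expected counts
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros((n, m))
    for k in range(m):
        B_new[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return gamma[0], A_new, B_new                    # updated pi, A, B
```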
55. Difficulties
- How do we decide on the number of states and the structure of the model?
  - Use domain knowledge; otherwise it is a very hard problem!
- What about the size of the observation sequence?
  - It should be sufficiently long to guarantee that all state transitions appear a sufficient number of times.
  - A large amount of training data is necessary to learn the HMM parameters.