Title: HIDDEN MARKOV MODELS
1. HIDDEN MARKOV MODELS
- Prof. Navneet Goyal
- Department of Computer Science
- BITS, Pilani
2. Topics
- Markov Models
- Hidden Markov Models
- HMM Problems
- Application to Sequence Alignment
3. Markov Analysis
- A technique that deals with the probabilities of future occurrences by analyzing presently known probabilities
- Founder of the concept was A.A. Markov, whose 1905 studies of sequences of experiments conducted in a chain were used to describe the principle of Brownian motion
4. Markov Analysis
- Applications
- Market share analysis
- Bad debt prediction
- Speech recognition
- University enrollment prediction
5. Markov Analysis
- Two competing manufacturers might have a 40%/60% market share today. Maybe in two months' time, their market shares will become 45% and 55% respectively
- Predicting these future states involves knowing the system's probabilities of changing from one state to another
- Matrix of transition probabilities
- This is a Markov process
6. Markov Analysis
- 1. A finite number of possible states.
- 2. Probability of change remains the same over time.
- 3. Future state predictable from current state.
- 4. Size of system remains the same.
- 5. States collectively exhaustive.
- 6. States mutually exclusive.
7. The Markov Process
π(t+1) = π(t) P
8. Markov Process Equations
9. Predicting Future States
Market share of grocery stores: American Food Store 40%, Food Mart 30%, Atlas Foods 30%
π(1) = (0.4, 0.3, 0.3)
10. Predicting Future States
11. Predicting Future States
- Will this trend continue in the future?
- Is it an equilibrium state?
- Will Atlas Foods lose all of its market share?
12. Markov Analysis: Machine Operations
- P = | 0.8  0.2 |
      | 0.1  0.9 |
- State 1: machine functioning correctly
- State 2: machine functioning incorrectly
- P11 = 0.8: the probability that the machine will be correctly functioning, given it was correctly functioning last month
- π(2) = π(1)P = (1, 0)P = (0.8, 0.2)
- π(3) = π(2)P = (0.8, 0.2)P = (0.66, 0.34) (see the sketch below)
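A minimal sketch of the arithmetic on this slide, using plain Python lists (the helper name `step` is ours, not from the slides):

```python
# Transition matrix from the slide: rows = current state, columns = next state.
P = [[0.8, 0.2],
     [0.1, 0.9]]

def step(pi, P):
    """One period of the Markov process: pi(t+1) = pi(t) * P."""
    return [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(len(P[0]))]

pi = [1.0, 0.0]   # pi(1): the machine starts out functioning correctly
pi = step(pi, P)  # pi(2) = (0.8, 0.2)
pi = step(pi, P)  # pi(3) = (0.66, 0.34)
print(pi)
```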
13. Machine Example: Periods to Reach Equilibrium

Period   State 1    State 2
1        1.0        0.0
2        .8         .2
3        .66        .34
4        .562       .438
5        .4934      .5066
6        .44538     .55462
7        .411766    .588234
8        .388236    .611763
9        .371765    .628234
10       .360235    .639754
11       .352165    .647834
12       .346515    .653484
13       .342560    .657439
14       .339792    .660207
15       .337854    .662145
14. Equilibrium Equations
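The equations on this slide were a figure and are not reproduced above. For the machine example, the standard equilibrium condition would read roughly as follows (our reconstruction, shown in LaTeX, not the slide itself):

```latex
% At equilibrium the state vector stops changing from period to period:
\pi = \pi P, \qquad \pi_1 + \pi_2 = 1
% For the machine example this gives
\pi_1 = 0.8\,\pi_1 + 0.1\,\pi_2, \qquad \pi_2 = 0.2\,\pi_1 + 0.9\,\pi_2,
% so \pi_1 = 1/3 \approx 0.333 and \pi_2 = 2/3 \approx 0.667,
% which matches the values the table above converges toward.
```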
15. Markov System
16. Markov System
17. Markov System
18. Markov System
19. Markov System
20. Markov System
21. Markov System
- At regularly spaced discrete times, the system undergoes a change of state (possibly back to the same state)
- Discrete first-order Markov chain:
- P(qt = Sj | qt-1 = Si, qt-2 = Sk, ...) = P(qt = Sj | qt-1 = Si)
- Consider only those processes in which the RHS is independent of time
- State transition probabilities are given by
- aij = P(qt = Sj | qt-1 = Si), 1 ≤ i, j ≤ N
22. Markov Models
- A model of sequences of events where the probability of an event occurring depends upon the fact that a preceding event occurred.
- Observable states: 1, 2, ..., N
- Observed sequence: O1, O2, ..., Ol, ..., OT
- P(Ol = j | O1 = a, ..., Ol-1 = b, Ol+1 = c, ...) = P(Ol = j | O1 = a, ..., Ol-1 = b)
- Order-n model
- A Markov process is a process which moves from state to state depending (only) on the previous n states.
23. Markov Models
- First-Order Model (n = 1)
- P(Ol = j | Ol-1 = a, Ol-2 = b, ...) = P(Ol = j | Ol-1 = a)
- The state of the model depends only on its previous state.
- Components: states, initial probabilities, and state transition probabilities
24. Markov Models
- Consider a simple 3-state Markov model of the weather
- Assume that once a day (e.g. at noon), the weather is observed as one of the following:
- State 1: Rain (or snow)
- State 2: Cloudy
- State 3: Sunny
- Transition probabilities:
  A = | 0.4  0.3  0.3 |
      | 0.2  0.6  0.2 |
      | 0.1  0.1  0.8 |
- Given that on day 1 the weather is sunny:
- What is the probability that the weather for the next 7 days will be "S S R R S C S"? (A sketch of the calculation follows.)
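A minimal sketch of this calculation (our own variable names; the transition matrix is the one on the slide):

```python
# States: 0 = Rain/Snow, 1 = Cloudy, 2 = Sunny, as on the slide.
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]

# Day 1 is Sunny (given); the sequence for the next 7 days is S S R R S C S.
days = [2, 2, 2, 0, 0, 2, 1, 2]

p = 1.0  # P(day 1 = Sunny) = 1, since it is given
for prev, cur in zip(days, days[1:]):
    p *= A[prev][cur]

print(p)  # 0.8 * 0.8 * 0.1 * 0.4 * 0.3 * 0.1 * 0.2 ≈ 1.54e-4
```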
25. Hidden Markov Model
- HMMs allow you to estimate probabilities of unobserved events
- E.g., in speech recognition, the observed data is the acoustic signal and the words are the hidden parameters
26. HMMs and their Usage
- HMMs are very common in Computational Linguistics
- Speech recognition (observed: acoustic signal, hidden: words)
- Handwriting recognition (observed: image, hidden: words)
- Machine translation (observed: foreign words, hidden: words in target language)
27. Hidden Markov Models
- A Markov model is used to predict what will come next based on previous observations.
- However, sometimes what we want to predict is not what we observe.
- Example:
- Someone trying to deduce the weather from a piece of seaweed
- For some reason, he cannot access weather information (sun, cloud, rain) directly
- But he can observe the dampness of a piece of seaweed (soggy, damp, dryish, dry)
- And the state of the seaweed is probabilistically related to the state of the weather
28. Hidden Markov Models
- Hidden Markov Models are used to solve this kind of problem.
- A Hidden Markov Model is an extension of a First-Order Markov Model:
- The true states are not observable directly (hidden)
- Observable states are probabilistic functions of the hidden states
- The hidden system is first-order Markov
29. Hidden Markov Models
- A Hidden Markov Model consists of two sets of states and three sets of probabilities:
- Hidden states: the (true) states of a system that may be described by a Markov process (e.g. the weather states in our example)
- Observable states: the states of the process that are visible (e.g. the dampness of the seaweed)
- Initial probabilities for hidden states
- Transition probabilities for hidden states
- Confusion probabilities from hidden states to observable states
30. Hidden Markov Models
31. Hidden Markov Models
- Initial matrix
- Transition matrix
- Confusion matrix (illustrative values are sketched below)
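The actual matrices on this slide are a figure, so they are not reproduced here. A sketch of how the three components fit together for the seaweed example, with purely illustrative numbers, might look like this:

```python
# Hidden and observable states for the weather/seaweed example.
hidden = ["sunny", "cloudy", "rainy"]
observed = ["dry", "dryish", "damp", "soggy"]

# Illustrative numbers only (not the values from the slide); each row sums to 1.
initial = [0.6, 0.2, 0.2]              # P(first hidden state)
transition = [[0.5, 0.3, 0.2],         # P(tomorrow's weather | today's weather)
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]]
confusion = [[0.60, 0.20, 0.15, 0.05], # P(seaweed observation | weather)
             [0.25, 0.25, 0.25, 0.25],
             [0.05, 0.10, 0.35, 0.50]]
```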
32. The Trellis
33. HMM Problems
- HMMs are used to solve three kinds of problems:
- Finding the probability of an observed sequence given an HMM (evaluation)
- Finding the sequence of hidden states that most probably generated an observed sequence (decoding)
- The third problem is generating an HMM given a sequence of observations (learning), i.e. learning the probabilities from training data
34. HMM Problems
- 1. Evaluation
- Problem:
- We have a number of HMMs and a sequence of observations. We may want to know which HMM most probably generated the given sequence.
- Solution:
- Compute the probability of the observed sequence for each HMM.
- Choose the one that produces the highest probability.
- Can use the Forward algorithm to reduce complexity (see the sketch after the next slide).
35. HMM Problems
Pr(dry, damp, soggy | HMM) = Pr(dry, damp, soggy | sunny, sunny, sunny) + Pr(dry, damp, soggy | sunny, sunny, cloudy) + Pr(dry, damp, soggy | sunny, sunny, rainy) + . . . + Pr(dry, damp, soggy | rainy, rainy, rainy)
(strictly, each term is weighted by the probability of that hidden sequence)
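Rather than enumerating every hidden sequence as above, the Forward algorithm accumulates the same sum one observation at a time. A minimal sketch, reusing the illustrative matrices sketched after slide 31 (observations indexed dry=0, dryish=1, damp=2, soggy=3):

```python
def forward(obs, initial, transition, confusion):
    """P(observation sequence | HMM) via the Forward algorithm.

    alpha[i] holds P(o_1 .. o_t, hidden state at time t = i)."""
    n = len(initial)
    alpha = [initial[i] * confusion[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * transition[i][j] for i in range(n)) * confusion[j][o]
                 for j in range(n)]
    return sum(alpha)

# e.g. Pr(dry, damp, soggy | HMM):
print(forward([0, 2, 3], initial, transition, confusion))
```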
36. HMM Problems
- 2. Decoding
- Problem:
- Given a particular HMM and an observation sequence, we want to know the most likely sequence of underlying hidden states that might have generated the observation sequence.
- Solution:
- Compute the probability of the observed sequence for each possible sequence of underlying hidden states.
- Choose the one that produces the highest probability.
- Can use the Viterbi algorithm to reduce the complexity (see the sketch after the next slide).
37. HMM Problems
The most probable sequence of hidden states is the sequence that maximizes: Pr(dry, damp, soggy | sunny, sunny, sunny), Pr(dry, damp, soggy | sunny, sunny, cloudy), Pr(dry, damp, soggy | sunny, sunny, rainy), . . . , Pr(dry, damp, soggy | rainy, rainy, rainy)
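The Viterbi algorithm replaces the sum in the Forward algorithm with a max while keeping track of the best path. A minimal sketch under the same illustrative matrices:

```python
def viterbi(obs, initial, transition, confusion):
    """Most likely hidden state sequence for an observation sequence."""
    n = len(initial)
    # delta[i]: probability of the best path ending in state i; paths[i]: that path.
    delta = [initial[i] * confusion[i][obs[0]] for i in range(n)]
    paths = [[i] for i in range(n)]
    for o in obs[1:]:
        new_delta, new_paths = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * transition[i][j])
            new_delta.append(delta[best] * transition[best][j] * confusion[j][o])
            new_paths.append(paths[best] + [j])
        delta, paths = new_delta, new_paths
    best = max(range(n), key=lambda i: delta[i])
    return paths[best], delta[best]

# e.g. the most likely weather sequence behind (dry, damp, soggy):
print(viterbi([0, 2, 3], initial, transition, confusion))
```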
38. HMM Problems (cont.)
- 3. Learning
- Problem:
- Estimate the probabilities of an HMM from training data
- Solution:
- Training with labeled data:
- Transition probability P(a, b) = (number of transitions from a to b) / (total number of transitions out of a)
- Confusion probability P(a, o) = (number of occurrences of symbol o in state a) / (number of all symbol occurrences in state a) (see the counting sketch below)
- Training with unlabeled data:
- Baum-Welch algorithm
- The basic idea:
- Generate a random HMM at the beginning
- Estimate new probabilities from the previous HMM until P(current HMM) - P(previous HMM) < ε (a small number)
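For the labeled-data case, the two counting formulas above can be implemented directly. A minimal sketch (function and variable names are ours):

```python
from collections import Counter

def estimate(hidden_seqs, obs_seqs):
    """Estimate transition and confusion probabilities by counting labeled data."""
    trans, emit = Counter(), Counter()
    from_state, in_state = Counter(), Counter()
    for hidden, obs in zip(hidden_seqs, obs_seqs):
        for a, b in zip(hidden, hidden[1:]):
            trans[(a, b)] += 1   # transitions observed out of state a
            from_state[a] += 1
        for a, o in zip(hidden, obs):
            emit[(a, o)] += 1    # symbols observed while in state a
            in_state[a] += 1
    P_trans = {(a, b): c / from_state[a] for (a, b), c in trans.items()}
    P_emit = {(a, o): c / in_state[a] for (a, o), c in emit.items()}
    return P_trans, P_emit
```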
39. Experiments
- Problem:
- Parsing a reference string into fields (author, journal, volume, page, year, etc.)
- Model as HMM:
- Hidden states: fields (author, journal, volume, etc.) and some special characters (",", "and", etc.)
- Observable states: words
- Probability matrices: learned from training data
- Reference parsing:
- Use the Viterbi algorithm to find the most probable sequence of hidden states for an observation sequence.
40. Experiment (cont.)
- Select 1000 reference strings (refer to article) from APS.
- Use the first 750 for training and the rest for testing.
- Do feature generalization similar to what we did for SVM last time (a sketch follows this list):
- M. → init
- Phys. → abbrev
- 1994 → Digs4
- Liu → CapNonWord
- physics → LowDictWord
- Should have, but did not, apply special processing to the <etal/> tag
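The exact generalization rules are not given on the slides; a hypothetical sketch that reproduces the examples listed above (the word list `DICTIONARY` is a stand-in) might be:

```python
import re

DICTIONARY = {"physics", "review", "letters"}  # stand-in for a real word list

def generalize(token):
    """Map a raw token to a feature class, mirroring the slide's examples."""
    if re.fullmatch(r"[A-Z]\.", token):
        return "init"         # "M."      -> init
    if re.fullmatch(r"[A-Z][a-z]+\.", token):
        return "abbrev"       # "Phys."   -> abbrev
    if re.fullmatch(r"\d{4}", token):
        return "Digs4"        # "1994"    -> Digs4
    if token.islower() and token in DICTIONARY:
        return "LowDictWord"  # "physics" -> LowDictWord
    if token[0].isupper() and token.lower() not in DICTIONARY:
        return "CapNonWord"   # "Liu"     -> CapNonWord
    return token
```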
41. Experiment (cont.)
- Measurement:
- Let C be the total number of words (tokens) which are predicted correctly.
- Let N be the total number of words.
- Correct rate:
- R = C / N × 100%
- Our result:
- N = 4284, C = 4219, R = 98.48%
42. Conclusion
- HMM is used to model situations where:
- What we want to predict is not what we observe
- The underlying system can be modeled as first-order Markov
- HMM assumptions:
- The next state is independent of all states but its previous state
- The probability matrices learned from samples are the actual probability matrices
- After learning, the probability matrices remain unchanged
43. Sequence Models
- Observed biological sequences (DNA, RNA, protein) can be thought of as the outcomes of random processes.
- So, it makes sense to model sequences using probabilistic models.
- You can think of a sequence model as a little machine that randomly generates sequences.
44. A Simple Sequence Model
- Imagine a tetrahedral (four-sided) die with the letters A, C, G and T on its sides.
- You roll the die 100 times and write down the letters that come up (down, actually).
- This is a simple random sequence model (a sketch follows).
45. Complete 0-order Markov Model
- To model the length of the sequences that the model can generate, we need to add "start" and "end" states.
46. Generating a Sequence
- This Markov model can generate any DNA sequence. Associated with each sequence is a path and a probability.
- 1. Start in state S (P = 1)
- 2. Move to state M (P = 1 × P)
- 3. Print letter x (P = q_x × P)
- 4. Move to state M (P = p × P) or to state E (P = (1 - p) × P)
- 5. If in state M, go to 3. If in state E, stop.
- Sequence: GCAGCT
- Path: S, M, M, M, M, M, M, E
- P = 1 × q_G × p × q_C × p × q_A × p × q_G × p × q_C × p × q_T × (1 - p) (a sketch of this calculation follows)
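A minimal sketch of this calculation, with illustrative values for the emission probabilities q and the stay-in-M probability p (neither is specified on the slide):

```python
# Illustrative parameters: q[x] = probability of printing letter x while in M,
# p = probability of returning to M rather than moving to the end state E.
q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
p = 0.9

def sequence_probability(seq):
    """P = 1 * q_x1 * p * q_x2 * p * ... * q_xn * (1 - p), as on the slide."""
    prob = 1.0
    for i, x in enumerate(seq):
        prob *= q[x]
        prob *= p if i < len(seq) - 1 else (1 - p)  # stay in M, or exit to E after the last letter
    return prob

print(sequence_probability("GCAGCT"))
```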
47. Using a 0-order Markov Model
- This model can generate any DNA sequence, so it can be used to model DNA.
- We used it as the background model when we created scoring matrices for sequence alignment.
- It's a pretty dumb model, though.
- DNA is not very well modeled by a 0-order Markov model, because the probability of seeing, say, a "G" following a "C" is usually different from that of a "G" following an "A".
- So we need better models: higher-order Markov models.
48. Markov Model Order
- This simple sequence model is called a 0-order Markov model because the probability distribution of the next letter to be generated doesn't depend on any (zero) of the letters preceding it.
- The Markov Property
- Let X = X1X2...XL be a sequence.
- In an n-order Markov sequence model, the probability distribution of the next letter depends on the previous n letters generated.
- 0-order: Pr(Xi | X1X2...Xi-1) = Pr(Xi)
- 1-order: Pr(Xi | X1X2...Xi-1) = Pr(Xi | Xi-1)
- n-order: Pr(Xi | X1X2...Xi-1) = Pr(Xi | Xi-1 Xi-2 ... Xi-n)
49. A 1-order Markov Sequence Model
- In a first-order Markov sequence model, the probability of the next letter depends on what the previously generated letter was.
- We can model this by making a state for each letter. Each state always emits the letter it is labeled with. (Not all transitions are shown.) A generation sketch follows.
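A sketch of generating a sequence from such a model, with made-up transition probabilities (each row is P(next letter | current letter) and must sum to 1):

```python
import random

letters = "ACGT"
# Illustrative first-order transition probabilities, indexed by the current letter.
trans = {"A": [0.30, 0.20, 0.30, 0.20],
         "C": [0.20, 0.30, 0.20, 0.30],
         "G": [0.25, 0.25, 0.25, 0.25],
         "T": [0.30, 0.20, 0.20, 0.30]}

def generate(length, start="A"):
    """Each new letter depends only on the previous one."""
    seq = start
    for _ in range(length - 1):
        seq += random.choices(letters, weights=trans[seq[-1]])[0]
    return seq

print(generate(50))
```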
50. A 2-order Markov Model
- To make a second-order Markov sequence model, each state is labelled with two letters. It emits the second letter in its label.
- There would have to be sixteen states: AA, AC, AG, AT, CA, CG, CT, etc., plus four states for the first letter in the sequence: A, C, G, T.
- Each state would have transitions only to states whose first letter matches its second letter.
51. Part of a 2-order Model
- Each state remembers, in its label, what the previously emitted letter was.