Title: HIDDEN MARKOV MODELS
1. HIDDEN MARKOV MODELS
- Prof. Navneet Goyal
- Department of Computer Science
- BITS, Pilani
2. Topics
- Markov Models
- Hidden Markov Models
- HMM Problems
- Application to Sequence Alignment
3. Markov Analysis
- A technique that deals with the probabilities of future occurrences by analyzing presently known probabilities
- Founder of the concept was A.A. Markov, whose 1905 studies of sequences of experiments conducted in a chain were used to describe the principle of Brownian motion
4. Markov Analysis
- Applications
- Market share analysis
- Bad debt prediction
- Speech recognition
- University enrollment prediction
5. Markov Analysis
- Two competing manufacturers might have a 40%/60% market share today. Maybe in two months' time, their market shares will become 45% and 55% respectively
- Predicting these future states involves knowing the system's probabilities of changing from one state to another
- Matrix of transition probabilities
- This is a Markov process
6. Markov Analysis
- 1. A finite number of possible states.
- 2. Probability of change remains the same over time.
- 3. Future state predictable from current state.
- 4. Size of system remains the same.
- 5. States collectively exhaustive.
- 6. States mutually exclusive.
7. The Markov Process
π(t+1) = π(t) P
8. Markov Process Equations
9. Predicting Future States
Market share of grocery stores: American Food Store 40%, Food Mart 30%, Atlas Foods 30%
π(1) = (0.4, 0.3, 0.3)
10. Predicting Future States
11. Predicting Future States
- Will this trend continue in the future?
- Is it an equilibrium state?
- Will Atlas Foods lose all of its market share?
12. Markov Analysis: Machine Operations
- P = | 0.8  0.2 |
      | 0.1  0.9 |
- State 1: machine functioning correctly
- State 2: machine functioning incorrectly
- P11 = 0.8: the probability that the machine will be correctly functioning, given it was correctly functioning last month
- π(2) = π(1)P = (1, 0)P = (0.8, 0.2)
- π(3) = π(2)P = (0.8, 0.2)P = (0.66, 0.34) (see the sketch below)
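A minimal sketch of the arithmetic on this slide, using plain Python lists (the helper name `step` is ours, not from the slides):

```python
# Transition matrix from the slide: rows = current state, columns = next state.
P = [[0.8, 0.2],
     [0.1, 0.9]]

def step(pi, P):
    """One period of the Markov process: pi(t+1) = pi(t) * P."""
    return [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(len(P[0]))]

pi = [1.0, 0.0]   # pi(1): the machine starts out functioning correctly
pi = step(pi, P)  # pi(2) = (0.8, 0.2)
pi = step(pi, P)  # pi(3) = (0.66, 0.34)
print(pi)
```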
13. Machine Example: Periods to Reach Equilibrium

Period   State 1    State 2
1        1.0        0.0
2        .8         .2
3        .66        .34
4        .562       .438
5        .4934      .5066
6        .44538     .55462
7        .411766    .588234
8        .388236    .611763
9        .371765    .628234
10       .360235    .639754
11       .352165    .647834
12       .346515    .653484
13       .342560    .657439
14       .339792    .660207
15       .337854    .662145
14. Equilibrium Equations
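The equations on this slide were a figure and are not reproduced above. For the machine example, the standard equilibrium condition would read roughly as follows (our reconstruction, shown in LaTeX, not the slide itself):

```latex
% At equilibrium the state vector stops changing from period to period:
\pi = \pi P, \qquad \pi_1 + \pi_2 = 1
% For the machine example this gives
\pi_1 = 0.8\,\pi_1 + 0.1\,\pi_2, \qquad \pi_2 = 0.2\,\pi_1 + 0.9\,\pi_2,
% so \pi_1 = 1/3 \approx 0.333 and \pi_2 = 2/3 \approx 0.667,
% which matches the values the table above converges toward.
```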
15. Markov System
16. Markov System
17. Markov System
18. Markov System
19. Markov System
20. Markov System
21. Markov System
- At regularly spaced discrete times, the system undergoes a change of state (possibly back to the same state)
- Discrete first-order Markov chain:
- P(qt = Sj | qt-1 = Si, qt-2 = Sk, ...) = P(qt = Sj | qt-1 = Si)
- Consider only those processes in which the RHS is independent of time
- State transition probabilities are given by
- aij = P(qt = Sj | qt-1 = Si), 1 ≤ i, j ≤ N
22. Markov Models
- A model of sequences of events where the probability of an event occurring depends upon the fact that a preceding event occurred.
- Observable states: 1, 2, ..., N
- Observed sequence: O1, O2, ..., Ol, ..., OT
- P(Ol = j | O1 = a, ..., Ol-1 = b, Ol+1 = c, ...) = P(Ol = j | O1 = a, ..., Ol-1 = b)
- Order-n model
- A Markov process is a process which moves from state to state depending (only) on the previous n states.
23. Markov Models
- First-Order Model (n = 1)
- P(Ol = j | Ol-1 = a, Ol-2 = b, ...) = P(Ol = j | Ol-1 = a)
- The state of the model depends only on its previous state.
- Components: states, initial probabilities, and state transition probabilities
24. Markov Models
- Consider a simple 3-state Markov model of the weather
- Assume that once a day (e.g. at noon), the weather is observed as one of the following:
- State 1: Rain (or snow)
- State 2: Cloudy
- State 3: Sunny
- Transition probabilities:
  A = | 0.4  0.3  0.3 |
      | 0.2  0.6  0.2 |
      | 0.1  0.1  0.8 |
- Given that on day 1 the weather is sunny:
- What is the probability that the weather for the next 7 days will be "S S R R S C S"? (A sketch of the calculation follows.)
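A minimal sketch of this calculation (our own variable names; the transition matrix is the one on the slide):

```python
# States: 0 = Rain/Snow, 1 = Cloudy, 2 = Sunny, as on the slide.
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]

# Day 1 is Sunny (given); the sequence for the next 7 days is S S R R S C S.
days = [2, 2, 2, 0, 0, 2, 1, 2]

p = 1.0  # P(day 1 = Sunny) = 1, since it is given
for prev, cur in zip(days, days[1:]):
    p *= A[prev][cur]

print(p)  # 0.8 * 0.8 * 0.1 * 0.4 * 0.3 * 0.1 * 0.2 ≈ 1.54e-4
```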
25. Hidden Markov Model
- HMMs allow you to estimate probabilities of unobserved events
- E.g., in speech recognition, the observed data is the acoustic signal and the words are the hidden parameters
26. HMMs and their Usage
- HMMs are very common in Computational Linguistics
- Speech recognition (observed: acoustic signal, hidden: words)
- Handwriting recognition (observed: image, hidden: words)
- Machine translation (observed: foreign words, hidden: words in target language)
27. Hidden Markov Models
- A Markov model is used to predict what will come next based on previous observations.
- However, sometimes what we want to predict is not what we observe.
- Example:
- Someone trying to deduce the weather from a piece of seaweed
- For some reason, he cannot access weather information (sun, cloud, rain) directly
- But he can observe the dampness of a piece of seaweed (soggy, damp, dryish, dry)
- And the state of the seaweed is probabilistically related to the state of the weather
28. Hidden Markov Models
- Hidden Markov Models are used to solve this kind of problem.
- A Hidden Markov Model is an extension of a First-Order Markov Model:
- The true states are not observable directly (hidden)
- Observable states are probabilistic functions of the hidden states
- The hidden system is first-order Markov
29. Hidden Markov Models
- A Hidden Markov Model consists of two sets of states and three sets of probabilities:
- Hidden states: the (true) states of a system that may be described by a Markov process (e.g. the weather states in our example)
- Observable states: the states of the process that are visible (e.g. the dampness of the seaweed)
- Initial probabilities for hidden states
- Transition probabilities for hidden states
- Confusion probabilities from hidden states to observable states
30. Hidden Markov Models
31. Hidden Markov Models
- Initial matrix
- Transition matrix
- Confusion matrix (illustrative values are sketched below)
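The actual matrices on this slide are a figure, so they are not reproduced here. A sketch of how the three components fit together for the seaweed example, with purely illustrative numbers, might look like this:

```python
# Hidden and observable states for the weather/seaweed example.
hidden = ["sunny", "cloudy", "rainy"]
observed = ["dry", "dryish", "damp", "soggy"]

# Illustrative numbers only (not the values from the slide); each row sums to 1.
initial = [0.6, 0.2, 0.2]              # P(first hidden state)
transition = [[0.5, 0.3, 0.2],         # P(tomorrow's weather | today's weather)
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]]
confusion = [[0.60, 0.20, 0.15, 0.05], # P(seaweed observation | weather)
             [0.25, 0.25, 0.25, 0.25],
             [0.05, 0.10, 0.35, 0.50]]
```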
32. The Trellis
33. HMM Problems
- HMMs are used to solve three kinds of problems:
- Finding the probability of an observed sequence given an HMM (evaluation)
- Finding the sequence of hidden states that most probably generated an observed sequence (decoding)
- The third problem is generating an HMM given a sequence of observations (learning), i.e. learning the probabilities from training data
34. HMM Problems
- 1. Evaluation
- Problem:
- We have a number of HMMs and a sequence of observations. We may want to know which HMM most probably generated the given sequence.
- Solution:
- Compute the probability of the observed sequence for each HMM.
- Choose the one that produces the highest probability.
- Can use the Forward algorithm to reduce complexity (see the sketch after the next slide).
35. HMM Problems
Pr(dry, damp, soggy | HMM) = Pr(dry, damp, soggy | sunny, sunny, sunny) + Pr(dry, damp, soggy | sunny, sunny, cloudy) + Pr(dry, damp, soggy | sunny, sunny, rainy) + . . . + Pr(dry, damp, soggy | rainy, rainy, rainy)
(strictly, each term is weighted by the probability of that hidden sequence)
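Rather than enumerating every hidden sequence as above, the Forward algorithm accumulates the same sum one observation at a time. A minimal sketch, reusing the illustrative matrices sketched after slide 31 (observations indexed dry=0, dryish=1, damp=2, soggy=3):

```python
def forward(obs, initial, transition, confusion):
    """P(observation sequence | HMM) via the Forward algorithm.

    alpha[i] holds P(o_1 .. o_t, hidden state at time t = i)."""
    n = len(initial)
    alpha = [initial[i] * confusion[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * transition[i][j] for i in range(n)) * confusion[j][o]
                 for j in range(n)]
    return sum(alpha)

# e.g. Pr(dry, damp, soggy | HMM):
print(forward([0, 2, 3], initial, transition, confusion))
```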
36. HMM Problems
- 2. Decoding
- Problem:
- Given a particular HMM and an observation sequence, we want to know the most likely sequence of underlying hidden states that might have generated the observation sequence.
- Solution:
- Compute the probability of the observed sequence for each possible sequence of underlying hidden states.
- Choose the one that produces the highest probability.
- Can use the Viterbi algorithm to reduce the complexity (see the sketch after the next slide).
37. HMM Problems
The most probable sequence of hidden states is the sequence that maximizes: Pr(dry, damp, soggy | sunny, sunny, sunny), Pr(dry, damp, soggy | sunny, sunny, cloudy), Pr(dry, damp, soggy | sunny, sunny, rainy), . . . , Pr(dry, damp, soggy | rainy, rainy, rainy)
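The Viterbi algorithm replaces the sum in the Forward algorithm with a max while keeping track of the best path. A minimal sketch under the same illustrative matrices:

```python
def viterbi(obs, initial, transition, confusion):
    """Most likely hidden state sequence for an observation sequence."""
    n = len(initial)
    # delta[i]: probability of the best path ending in state i; paths[i]: that path.
    delta = [initial[i] * confusion[i][obs[0]] for i in range(n)]
    paths = [[i] for i in range(n)]
    for o in obs[1:]:
        new_delta, new_paths = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * transition[i][j])
            new_delta.append(delta[best] * transition[best][j] * confusion[j][o])
            new_paths.append(paths[best] + [j])
        delta, paths = new_delta, new_paths
    best = max(range(n), key=lambda i: delta[i])
    return paths[best], delta[best]

# e.g. the most likely weather sequence behind (dry, damp, soggy):
print(viterbi([0, 2, 3], initial, transition, confusion))
```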
38. HMM Problems (cont.)
- 3. Learning
- Problem:
- Estimate the probabilities of an HMM from training data
- Solution:
- Training with labeled data:
- Transition probability P(a, b) = (number of transitions from a to b) / (total number of transitions out of a)
- Confusion probability P(a, o) = (number of occurrences of symbol o in state a) / (number of all symbol occurrences in state a) (see the counting sketch below)
- Training with unlabeled data:
- Baum-Welch algorithm
- The basic idea:
- Generate a random HMM at the beginning
- Estimate new probabilities from the previous HMM until P(current HMM) - P(previous HMM) < ε (a small number)
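For the labeled-data case, the two counting formulas above can be implemented directly. A minimal sketch (function and variable names are ours):

```python
from collections import Counter

def estimate(hidden_seqs, obs_seqs):
    """Estimate transition and confusion probabilities by counting labeled data."""
    trans, emit = Counter(), Counter()
    from_state, in_state = Counter(), Counter()
    for hidden, obs in zip(hidden_seqs, obs_seqs):
        for a, b in zip(hidden, hidden[1:]):
            trans[(a, b)] += 1   # transitions observed out of state a
            from_state[a] += 1
        for a, o in zip(hidden, obs):
            emit[(a, o)] += 1    # symbols observed while in state a
            in_state[a] += 1
    P_trans = {(a, b): c / from_state[a] for (a, b), c in trans.items()}
    P_emit = {(a, o): c / in_state[a] for (a, o), c in emit.items()}
    return P_trans, P_emit
```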
39. Experiments
- Problem:
- Parsing a reference string into fields (author, journal, volume, page, year, etc.)
- Model as HMM:
- Hidden states: fields (author, journal, volume, etc.) and some special characters (",", "and", etc.)
- Observable states: words
- Probability matrices: learned from training data
- Reference parsing:
- Use the Viterbi algorithm to find the most probable sequence of hidden states for an observation sequence.
40. Experiment (cont.)
- Select 1000 reference strings (refer to article) from APS.
- Use the first 750 for training and the rest for testing.
- Do feature generalization similar to what we did for SVM last time (a sketch follows this list):
- M. → init
- Phys. → abbrev
- 1994 → Digs4
- Liu → CapNonWord
- physics → LowDictWord
- Should have, but did not, apply special processing to the <etal/> tag
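The exact generalization rules are not given on the slides; a hypothetical sketch that reproduces the examples listed above (the word list `DICTIONARY` is a stand-in) might be:

```python
import re

DICTIONARY = {"physics", "review", "letters"}  # stand-in for a real word list

def generalize(token):
    """Map a raw token to a feature class, mirroring the slide's examples."""
    if re.fullmatch(r"[A-Z]\.", token):
        return "init"         # "M."      -> init
    if re.fullmatch(r"[A-Z][a-z]+\.", token):
        return "abbrev"       # "Phys."   -> abbrev
    if re.fullmatch(r"\d{4}", token):
        return "Digs4"        # "1994"    -> Digs4
    if token.islower() and token in DICTIONARY:
        return "LowDictWord"  # "physics" -> LowDictWord
    if token[0].isupper() and token.lower() not in DICTIONARY:
        return "CapNonWord"   # "Liu"     -> CapNonWord
    return token
```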
41. Experiment (cont.)
- Measurement:
- Let C be the total number of words (tokens) which are predicted correctly.
- Let N be the total number of words.
- Correct rate:
- R = C / N × 100%
- Our result:
- N = 4284, C = 4219, R = 98.48%
42. Conclusion
- HMM is used to model situations where:
- What we want to predict is not what we observe
- The underlying system can be modeled as first-order Markov
- HMM assumptions:
- The next state is independent of all states but its previous state
- The probability matrices learned from samples are the actual probability matrices
- After learning, the probability matrices remain unchanged
43. Sequence Models
- Observed biological sequences (DNA, RNA, protein) can be thought of as the outcomes of random processes.
- So, it makes sense to model sequences using probabilistic models.
- You can think of a sequence model as a little machine that randomly generates sequences.
44. A Simple Sequence Model
- Imagine a tetrahedral (four-sided) die with the letters A, C, G and T on its sides.
- You roll the die 100 times and write down the letters that come up (down, actually).
- This is a simple random sequence model (a sketch follows).
45. Complete 0-order Markov Model
- To model the length of the sequences that the model can generate, we need to add "start" and "end" states.
46. Generating a Sequence
- This Markov model can generate any DNA sequence. Associated with each sequence is a path and a probability.
- 1. Start in state S (P = 1)
- 2. Move to state M (P = 1 × P)
- 3. Print letter x (P = q_x × P)
- 4. Move to state M (P = p × P) or to state E (P = (1 - p) × P)
- 5. If in state M, go to 3. If in state E, stop.
- Sequence: GCAGCT
- Path: S, M, M, M, M, M, M, E
- P = 1 × q_G × p × q_C × p × q_A × p × q_G × p × q_C × p × q_T × (1 - p) (a sketch of this calculation follows)
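A minimal sketch of this calculation, with illustrative values for the emission probabilities q and the stay-in-M probability p (neither is specified on the slide):

```python
# Illustrative parameters: q[x] = probability of printing letter x while in M,
# p = probability of returning to M rather than moving to the end state E.
q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
p = 0.9

def sequence_probability(seq):
    """P = 1 * q_x1 * p * q_x2 * p * ... * q_xn * (1 - p), as on the slide."""
    prob = 1.0
    for i, x in enumerate(seq):
        prob *= q[x]
        prob *= p if i < len(seq) - 1 else (1 - p)  # stay in M, or exit to E after the last letter
    return prob

print(sequence_probability("GCAGCT"))
```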
47. Using a 0-order Markov Model
- This model can generate any DNA sequence, so it can be used to model DNA.
- We used it as the background model when we created scoring matrices for sequence alignment.
- It's a pretty dumb model, though.
- DNA is not very well modeled by a 0-order Markov model, because the probability of seeing, say, a "G" following a "C" is usually different from that of a "G" following an "A".
- So we need better models: higher-order Markov models.
48. Markov Model Order
- This simple sequence model is called a 0-order Markov model because the probability distribution of the next letter to be generated doesn't depend on any (zero) of the letters preceding it.
- The Markov Property
- Let X = X1X2...XL be a sequence.
- In an n-order Markov sequence model, the probability distribution of the next letter depends on the previous n letters generated.
- 0-order: Pr(Xi | X1X2...Xi-1) = Pr(Xi)
- 1-order: Pr(Xi | X1X2...Xi-1) = Pr(Xi | Xi-1)
- n-order: Pr(Xi | X1X2...Xi-1) = Pr(Xi | Xi-1 Xi-2 ... Xi-n)
49. A 1-order Markov Sequence Model
- In a first-order Markov sequence model, the probability of the next letter depends on what the previously generated letter was.
- We can model this by making a state for each letter. Each state always emits the letter it is labeled with. (Not all transitions are shown.) A generation sketch follows.
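A sketch of generating a sequence from such a model, with made-up transition probabilities (each row is P(next letter | current letter) and must sum to 1):

```python
import random

letters = "ACGT"
# Illustrative first-order transition probabilities, indexed by the current letter.
trans = {"A": [0.30, 0.20, 0.30, 0.20],
         "C": [0.20, 0.30, 0.20, 0.30],
         "G": [0.25, 0.25, 0.25, 0.25],
         "T": [0.30, 0.20, 0.20, 0.30]}

def generate(length, start="A"):
    """Each new letter depends only on the previous one."""
    seq = start
    for _ in range(length - 1):
        seq += random.choices(letters, weights=trans[seq[-1]])[0]
    return seq

print(generate(50))
```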
50. A 2-order Markov Model
- To make a second-order Markov sequence model, each state is labelled with two letters. It emits the second letter in its label.
- There would have to be sixteen states: AA, AC, AG, AT, CA, CG, CT, etc., plus four states for the first letter in the sequence: A, C, G, T.
- Each state would have transitions only to states whose first letter matches its second letter.
51. Part of a 2-order Model
- Each state remembers, in its label, what the previously emitted letter was.