1
HIDDEN MARKOV MODELS
  • Prof. Navneet Goyal
  • Department of Computer Science
  • BITS, Pilani

2
Topics
  • Markov Models
  • Hidden Markov Models
  • HMM Problems
  • Application to Sequence Alignment

3
Markov Analysis
  • A technique that deals with the probabilities of
    future occurrences by analyzing presently known
    probabilities
  • The founder of the concept was A.A. Markov, whose 1905
    studies of sequences of experiments conducted in a
    chain were used to describe the principle of
    Brownian motion

4
Markov Analysis
  • Applications
  • Market share analysis
  • Bad debt prediction
  • Speech recognition
  • University enrollment prediction

5
Markov Analysis
  • Two competing manufacturers might have 40% and 60%
    market share today. In two months' time, their
    market shares might become 45% and 55%,
    respectively
  • Predicting these future states involves knowing
    the system's probabilities of changing from one
    state to another
  • Matrix of transition probabilities
  • This is a Markov process

6
Markov Analysis
  • 1. A finite number of possible states.
  • 2. Probability of change remains the same over
    time.
  • 3. Future state predictable from current state.
  • 4. Size of system remains the same.
  • 5. States collectively exhaustive.
  • 6. States mutually exclusive.

7
The Markov Process
π(next period) = π(current period) × P
8
Markov Process Equations
9
Predicting Future States
Market Share of Grocery Stores: AMERICAN FOOD STORE 40%, FOOD MART 30%, ATLAS FOODS 30%
π(1) = (0.4, 0.3, 0.3)
10
Predicting Future States
11
Predicting Future States
  • Will this trend continue in the future?
  • Is it an equilibrium state?
  • Will Atlas Foods lose all of its market share?

12
Markov Analysis Machine Operations
  • P = [0.8  0.2]
        [0.1  0.9]
  • State 1: machine functioning correctly
  • State 2: machine functioning incorrectly
  • P11 = 0.8: probability that the machine will be
    correctly functioning, given it was correctly
    functioning last month
  • π(2) = π(1)P = (1, 0)P = (0.8, 0.2)
  • π(3) = π(2)P = (0.8, 0.2)P = (0.66, 0.34)

13
Machine Example: Periods to Reach Equilibrium

Period   State 1    State 2
1        1.0        0.0
2        .8         .2
3        .66        .34
4        .562       .438
5        .4934      .5066
6        .44538     .55462
7        .411766    .588234
8        .388236    .611763
9        .371765    .628234
10       .360235    .639754
11       .352165    .647834
12       .346515    .653484
13       .342560    .657439
14       .339792    .660207
15       .337854    .662145
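The convergence shown in the table can be reproduced by repeatedly applying π(n+1) = π(n)P; a minimal Python sketch (mine, not from the slides):

    # Iterate pi(n+1) = pi(n) * P for the machine-operations example
    P = [[0.8, 0.2],
         [0.1, 0.9]]
    pi = [1.0, 0.0]  # period 1: machine known to be functioning correctly
    for period in range(1, 16):
        print(period, round(pi[0], 6), round(pi[1], 6))
        # pi[j] for the next period is sum_i pi[i] * P[i][j]
        pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]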
14
Equilibrium Equations
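For the machine example, the equilibrium (steady-state) vector π satisfies π = πP together with π1 + π2 = 1; a sketch of the standard algebra (the slide's own equations are not reproduced in this transcript):

    π1 = 0.8 π1 + 0.1 π2
    π2 = 0.2 π1 + 0.9 π2
    π1 + π2 = 1
    ⇒ π2 = 2 π1, so π1 = 1/3 ≈ 0.333 and π2 = 2/3 ≈ 0.667

which is the limit that the table above converges to.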
15-20
Markov System
(Diagrams only; figures not included in the transcript)
21
Markov System
  • At regularly spaced discrete times, the system
    undergoes a change of state (possibly back to
    same state)
  • Discrete first-order Markov chain:
  • P[qt = Sj | qt-1 = Si, qt-2 = Sk, ...] = P[qt = Sj | qt-1 = Si]
  • Consider only those processes in which the RHS
    is independent of time
  • State transition probabilities are then given by
  • aij = P[qt = Sj | qt-1 = Si],  1 ≤ i, j ≤ N

22
Markov Models
  • A model of sequences of events where the
    probability of an event occurring depends on the
    preceding events.
  • Observable states: 1, 2, ..., N
  • Observed sequence: O1, O2, ..., Ol, ..., OT
  • P(Ol = j | O1 = a, ..., Ol-1 = b, Ol+1 = c, ...) = P(Ol = j | O1 = a, ..., Ol-1 = b)
  • Order n model:
  • A Markov process is a process which moves from
    state to state depending (only) on the previous n
    states.

23
Markov Models
  • First Order Model (n = 1):
  • P(Ol = j | Ol-1 = a, Ol-2 = b, ...) = P(Ol = j | Ol-1 = a)
  • The state of the model depends only on its previous
    state.
  • Components: states, initial probabilities, state
    transition probabilities

24
Markov Models
  • Consider a simple 3-state Markov model of the weather
  • Assume that once a day (e.g., at noon), the weather
    is observed as one of the following:
  • State 1: rain (or snow)
  • State 2: cloudy
  • State 3: sunny
  • Transition probabilities:
  • 0.4 0.3 0.3
  • 0.2 0.6 0.2
  • 0.1 0.1 0.8
  • Given that on day 1 the weather is sunny:
  • What is the probability that the weather for the
    next 7 days will be S S R R S C S?
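The question can be answered by multiplying the corresponding transition probabilities; a minimal Python sketch (mine, with states indexed rain, cloudy, sunny as above):

    # P(S S R R S C S over the next 7 days | model, day 1 = sunny)
    A = [[0.4, 0.3, 0.3],   # from rain
         [0.2, 0.6, 0.2],   # from cloudy
         [0.1, 0.1, 0.8]]   # from sunny
    R, C, S = 0, 1, 2
    days = [S, S, S, R, R, S, C, S]   # day 1 (given) followed by the next 7 days
    prob = 1.0
    for prev, cur in zip(days, days[1:]):
        prob *= A[prev][cur]
    print(prob)   # 0.8 * 0.8 * 0.1 * 0.4 * 0.3 * 0.1 * 0.2 = 1.536e-04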

25
Hidden Markov Model
  • HMMs allow you to estimate probabilities of
    unobserved events
  • E.g., in speech recognition, the observed data is
    the acoustic signal and the words are the hidden
    parameters

26
HMMs and their Usage
  • HMMs are very common in Computational
    Linguistics
  • Speech recognition (observed acoustic signal,
    hidden words)
  • Handwriting recognition (observed image, hidden
    words)
  • Machine translation (observed foreign words,
    hidden words in target language)

27
Hidden Markov Models
  • A Markov model is used to predict what will come
    next based on previous observations.
  • However, sometimes what we want to predict is
    not what we observe.
  • Example:
  • Someone trying to deduce the weather from a piece
    of seaweed
  • For some reason, he cannot access weather
    information (sun, cloud, rain) directly
  • But he can observe the dampness of a piece of
    seaweed (soggy, damp, dryish, dry)
  • And the state of the seaweed is probabilistically
    related to the state of the weather

28
Hidden Markov Models
  • Hidden Markov Models are used to solve this kind
    of problem.
  • A Hidden Markov Model is an extension of a first-order
    Markov model:
  • The true states are not observable directly
    (hidden)
  • Observable states are probabilistic functions of
    the hidden states
  • The hidden system is first-order Markov

29
Hidden Markov Models
  • A Hidden Markov Model consists of two sets of
    states and three sets of probabilities:
  • Hidden states: the (true) states of a system
    that may be described by a Markov process (e.g.,
    weather states in our example).
  • Observable states: the states of the process
    that are visible (e.g., dampness of the
    seaweed).
  • Initial probabilities for hidden states
  • Transition probabilities for hidden states
  • Confusion probabilities from hidden states to
    observable states

30
Hidden Markov Models
31
Hidden Markov Models
Initial matrix
Transition matrix
Confusion matrix
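The three matrices themselves appear only as figures in the slides; for the seaweed/weather example they could look like the following sketch (illustrative values chosen by me, not taken from the slides):

    # Hidden states: sunny, cloudy, rainy; observable states: dry, dryish, damp, soggy
    hidden = ["sunny", "cloudy", "rainy"]
    observable = ["dry", "dryish", "damp", "soggy"]
    initial = [0.6, 0.3, 0.1]               # P(first hidden state)
    transition = [[0.7, 0.2, 0.1],          # P(next hidden state | current hidden state)
                  [0.3, 0.5, 0.2],
                  [0.2, 0.3, 0.5]]
    confusion = [[0.60, 0.20, 0.15, 0.05],  # P(observation | hidden state)
                 [0.25, 0.25, 0.25, 0.25],
                 [0.05, 0.10, 0.35, 0.50]]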
32
The Trellis
33
HMM problems
  • HMMs are used to solve three kinds of problems:
  • Finding the probability of an observed sequence
    given an HMM (evaluation)
  • Finding the sequence of hidden states that most
    probably generated an observed sequence
    (decoding)
  • Generating an HMM given a sequence of
    observations (learning), i.e., learning the
    probabilities from training data

34
HMM Problems
  • 1. Evaluation
  • Problem:
  • We have a number of HMMs and a sequence of
    observations. We may want to know which HMM most
    probably generated the given sequence.
  • Solution:
  • Compute the probability of the observed sequence
    under each HMM.
  • Choose the one that produces the highest probability.
  • The Forward algorithm can be used to reduce the complexity.
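A minimal sketch of the forward algorithm (standard formulation; it reuses the illustrative initial, transition, and confusion matrices defined in the earlier sketch):

    def forward(obs, initial, transition, confusion):
        # alpha[i] = P(o_1 .. o_t, hidden state i at time t | model)
        n = len(initial)
        alpha = [initial[i] * confusion[i][obs[0]] for i in range(n)]
        for t in range(1, len(obs)):
            alpha = [sum(alpha[j] * transition[j][i] for j in range(n)) * confusion[i][obs[t]]
                     for i in range(n)]
        return sum(alpha)  # P(observation sequence | model)

    # e.g. observations dry=0, damp=2, soggy=3 with the illustrative matrices above
    print(forward([0, 2, 3], initial, transition, confusion))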

35
HMM problems
Pr(dry, damp, soggy | HMM) =
  Pr(dry, damp, soggy | sunny, sunny, sunny) +
  Pr(dry, damp, soggy | sunny, sunny, cloudy) +
  Pr(dry, damp, soggy | sunny, sunny, rainy) +
  . . . +
  Pr(dry, damp, soggy | rainy, rainy, rainy)
36
HMM problems
  • 2. Decoding
  • Problem:
  • Given a particular HMM and an observation
    sequence, we want to know the most likely
    sequence of underlying hidden states that might
    have generated the observation sequence.
  • Solution:
  • Compute the probability of the observed sequence
    for each possible sequence of underlying hidden
    states.
  • Choose the one that produces the highest probability.
  • The Viterbi algorithm can be used to reduce the
    complexity.
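A minimal sketch of the Viterbi algorithm (standard dynamic-programming formulation, not taken from the slides):

    def viterbi(obs, initial, transition, confusion):
        # delta[i] = probability of the best hidden-state path ending in state i at time t
        n = len(initial)
        delta = [initial[i] * confusion[i][obs[0]] for i in range(n)]
        backpointers = []
        for t in range(1, len(obs)):
            new_delta, back = [], []
            for i in range(n):
                best_j = max(range(n), key=lambda j: delta[j] * transition[j][i])
                back.append(best_j)
                new_delta.append(delta[best_j] * transition[best_j][i] * confusion[i][obs[t]])
            delta = new_delta
            backpointers.append(back)
        # backtrack from the most probable final state
        path = [max(range(n), key=lambda i: delta[i])]
        for back in reversed(backpointers):
            path.insert(0, back[path[0]])
        return path  # most likely sequence of hidden-state indices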

37
HMM Problems
The most probable sequence of hidden states is the sequence that maximizes:
  Pr(dry, damp, soggy | sunny, sunny, sunny),
  Pr(dry, damp, soggy | sunny, sunny, cloudy),
  Pr(dry, damp, soggy | sunny, sunny, rainy),
  . . . ,
  Pr(dry, damp, soggy | rainy, rainy, rainy)
38
HMM problems (cont.)
  • 3. Learning
  • Problem:
  • Estimate the probabilities of an HMM from training
    data
  • Solution:
  • Training with labeled data (see the counting sketch
    after this list):
  • Transition probability: P(a, b) = (number of
    transitions from a to b) / (total number of
    transitions out of a)
  • Confusion probability: P(a, o) = (number of symbol o
    occurrences in state a) / (number of all symbol
    occurrences in state a)
  • Training with unlabeled data:
  • Baum-Welch algorithm
  • The basic idea:
  • Randomly generate an HMM at the beginning
  • Estimate new probabilities from the previous HMM
    until |P(current HMM) - P(previous HMM)| < e (a
    small number)
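For the labeled-data case, the two counting formulas above translate directly into code; a small sketch (the data layout is my assumption):

    from collections import Counter

    def estimate_from_labeled(state_seqs, obs_seqs):
        # state_seqs: hidden-state sequences; obs_seqs: the matching observation sequences
        trans, out_of, emit, in_state = Counter(), Counter(), Counter(), Counter()
        for states, obs in zip(state_seqs, obs_seqs):
            for a, b in zip(states, states[1:]):
                trans[(a, b)] += 1          # transitions from a to b
                out_of[a] += 1              # total transitions out of a
            for a, o in zip(states, obs):
                emit[(a, o)] += 1           # symbol o emitted in state a
                in_state[a] += 1            # all symbols emitted in state a
        P_trans = {(a, b): c / out_of[a] for (a, b), c in trans.items()}
        P_conf = {(a, o): c / in_state[a] for (a, o), c in emit.items()}
        return P_trans, P_conf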

39
Experiments
  • Problem:
  • Parsing a reference string into fields (author,
    journal, volume, page, year, etc.)
  • Model as an HMM:
  • Hidden states: fields (author, journal, volume,
    etc.) and some special characters (",", "and",
    etc.)
  • Observable states: words
  • Probability matrices: learned from training
    data
  • Reference parsing:
  • Use the Viterbi algorithm to find the most probable
    sequence of hidden states for an observation
    sequence.

40
Experiment (cont.)
  • Select 1000 reference strings (references to articles)
    from APS.
  • Use the first 750 for training and the rest for
    testing.
  • Apply feature generalization similar to what we did
    for SVM last time:
  • M. → init
  • Phys. → abbrev
  • 1994 → Digs4
  • Liu → CapNonWord
  • physics → LowDictWord
  • The <etal/> tag should have received special
    processing, but did not.

41
Experiment (cont.)
  • Measurement:
  • Let C be the total number of words (tokens) that
    are predicted correctly.
  • Let N be the total number of words.
  • Correct rate:
  • R = C / N × 100%
  • Our result:
  • N = 4284, C = 4219, R = 98.48%

42
Conclusion
  • An HMM is used to model situations where:
  • What we want to predict is not what we observe
  • The underlying system can be modeled as first-order
    Markov
  • HMM assumptions:
  • The next state is independent of all states but
    its previous state
  • The probability matrices learned from samples are
    the actual probability matrices
  • After learning, the probability matrices remain
    unchanged

43
Sequence Models
  • Observed biological sequences (DNA, RNA, protein)
    can be thought of as the outcomes of random
    processes.
  • So, it makes sense to model sequences using
    probabilistic models.
  • You can think of a sequence model as a little
    machine that randomly generates sequences.

44
A Simple Sequence Model
  • Imagine a tetrahedral (four-sided) die with the
    letters A, C, G and T on its sides.
  • You roll the die 100 times and write down the
    letters that come up (down, actually).
  • This is a simple random sequence model.

45
Complete 0-order Markov Model
  • To model the length of the sequences that the
    model can generate, we need to add start and
    end states.

46
Generating a Sequence
  • This Markov model can generate any DNA sequence.
    Associated with each sequence is a path and a
    probability.
  • 1. Start in state S: P = 1
  • 2. Move to state M: P = 1 × P
  • 3. Print letter x: P = qx × P
  • 4. Move to state M: P = p × P, or to state E: P = (1 - p) × P
  • 5. If in state M, go to 3. If in state E, stop.
  • Sequence: GCAGCT
  • Path: S, M, M, M, M, M, M, E
  • P = 1 × qG × p × qC × p × qA × p × qG × p × qC × p × qT × (1 - p)
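A sketch of this generative procedure and of the probability computation (qx and p follow the slide's notation; the emission values below are illustrative):

    import random

    def generate(q, p):
        # q: emission probabilities for A, C, G, T; p: probability of returning to state M
        seq = []
        while True:
            seq.append(random.choices(list(q), weights=list(q.values()))[0])  # print letter x
            if random.random() > p:     # move to end state E with probability (1 - p)
                return "".join(seq)

    def sequence_probability(seq, q, p):
        # P = q[x1]*p * ... * q[x(n-1)]*p * q[xn]*(1 - p)
        prob = 1.0
        for x in seq[:-1]:
            prob *= q[x] * p
        return prob * q[seq[-1]] * (1 - p)

    q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    print(generate(q, p=0.9))
    print(sequence_probability("GCAGCT", q, p=0.9))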

47
Using a 0-order Markov Model
  • This model can generate any DNA sequence, so it
    can be used to model DNA.
  • We used it as the background model when we created
    scoring matrices for sequence alignment.
  • It's a pretty dumb model, though.
  • DNA is not very well modeled by a 0-order Markov
    model because the probability of seeing, say, a
    G following a C is usually different from that of a
    G following an A.
  • So we need better models: higher-order Markov
    models.

48
Markov Model Order
  • This simple sequence model is called a 0-order
    Markov model because the probability distribution
    of the next letter to be generated doesn't depend
    on any (zero) of the letters preceding it.
  • The Markov Property:
  • Let X = X1 X2 ... XL be a sequence.
  • In an n-order Markov sequence model, the
    probability distribution of the next letter
    depends on the previous n letters generated.
  • 0-order: Pr(Xi | X1 X2 ... Xi-1) = Pr(Xi)
  • 1-order: Pr(Xi | X1 X2 ... Xi-1) = Pr(Xi | Xi-1)
  • n-order: Pr(Xi | X1 X2 ... Xi-1) = Pr(Xi | Xi-1 Xi-2 ... Xi-n)

49
A 1-order Markov Sequence Model
  • In a first-order Markov sequence model, the
    probability of the next letter depends on the
    previous letter generated.
  • We can model this by making a state for each
    letter. Each state always emits the letter it is
    labeled with. (Not all transitions are shown.)
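A sketch (mine, not from the slides) of how the transition probabilities of such a first-order model could be estimated from an observed DNA sequence:

    from collections import Counter

    def first_order_transitions(dna):
        # P(next letter | current letter), estimated by counting adjacent letter pairs
        pair_counts = Counter(zip(dna, dna[1:]))
        letter_counts = Counter(dna[:-1])
        return {(a, b): count / letter_counts[a] for (a, b), count in pair_counts.items()}

    print(first_order_transitions("GCAGCTACGTGGACT"))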

50
A 2-order Markov Model
  • To make a second-order Markov sequence model,
    each state is labelled with two letters. It
    emits the second letter in its label.
  • There would have to be sixteen states: AA, AC,
    AG, AT, CA, CG, CT, etc., plus four states for the
    first letter in the sequence: A, C, G, T.
  • Each state would have transitions only to states
    whose first letter matches its own second letter.

51
Part of a 2-order Model
  • Each state remembers, in its label, what the
    previous letter emitted was.

(Diagram of part of the model showing, e.g., state AC; figure not included in the transcript)