Title: Hidden Markov Models (HMM) in Sequence Analysis
1Hidden Markov Models (HMM) in Sequence Analysis
2Current Applications
- Multiple Sequence Alignment
  - PFAM: Protein families database of alignments and HMMs
  - HMMpro at www.netid.com (Baldi, Chauvin and Mittal-Henkle)
  - HMMER at hmmer.wustl.edu (S. Eddy)
  - SAM (Karplus et al.) at www.cse.ucsc.edu/research/compbio/sam.html
- Gene finding (GLIMMER)
- Motif/Promoter region finding
3Markov chains
4A Markov Chain of Weather
5A Markov chain
For all L: $P(x_L \mid x_{L-1}, \dots, x_1) = P(x_L \mid x_{L-1})$
The current state of the chain depends only on the previous state, not on the earlier history (the chain has no memory).
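The memoryless property can be illustrated with a minimal sketch of a weather chain. The three states and every transition probability below are hypothetical values chosen for illustration; the slides do not give them.

```python
import random

# Hypothetical weather chain: states and probabilities are illustrative
# assumptions, not values taken from the slides.
TRANSITIONS = {
    "sunny":  {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy":  {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}

def simulate_chain(start, n_steps, rng):
    """Sample a state path; each step looks only at the previous state."""
    path = [start]
    for _ in range(n_steps - 1):
        row = TRANSITIONS[path[-1]]
        path.append(rng.choices(list(row), weights=list(row.values()))[0])
    return path

def path_probability(path):
    """Probability of the path given its first state: a product of
    one-step transition probabilities."""
    prob = 1.0
    for prev, cur in zip(path, path[1:]):
        prob *= TRANSITIONS[prev][cur]
    return prob

weather = simulate_chain("sunny", 7, random.Random(0))
print(weather, path_probability(weather))
```

Because each draw uses only `path[-1]`, the probability of a whole path factorizes into one-step terms, which is what `path_probability` multiplies together.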
7Weather and Seaweed
Hidden states: the (true) states of a system that may be described by a Markov process (e.g., the weather).
Observable states (symbols): the states of the process that are 'visible' (e.g., seaweed dampness).
8Emission Probability
Output matrix (emission probability): contains the probability of observing a particular observable state given that the hidden model is in a particular hidden state.
Initial distribution: contains the probability of the (hidden) model being in a particular hidden state at time t = 1.
State transition matrix: holds the probability of a hidden state given the previous hidden state.
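As a rough sketch, the three parameter sets can be written down for the weather/seaweed example and used to generate observations. The state names, symbol names and all numbers are made-up assumptions for illustration only.

```python
import random

# All values below are hypothetical; the slides give no numbers for this model.
initial = {"sunny": 0.6, "rainy": 0.4}                  # P(hidden state at t = 1)
transition = {                                          # P(next state | previous state)
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
emission = {                                            # P(seaweed symbol | hidden state)
    "sunny": {"dry": 0.7, "damp": 0.2, "soggy": 0.1},
    "rainy": {"dry": 0.1, "damp": 0.3, "soggy": 0.6},
}

def draw(dist, rng):
    """Draw one key of `dist` with probability equal to its value."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]

def is_stochastic(dist):
    """Every distribution in an HMM must sum to one."""
    return abs(sum(dist.values()) - 1.0) < 1e-9

def sample_hmm(length, rng):
    """Return (hidden states, observed symbols) for `length` time steps."""
    states, symbols = [], []
    state = draw(initial, rng)
    for _ in range(length):
        states.append(state)
        symbols.append(draw(emission[state], rng))
        state = draw(transition[state], rng)
    return states, symbols

hidden, observed = sample_hmm(5, random.Random(1))
print(hidden)    # what an observer cannot see
print(observed)  # what an observer can see
```

Only `observed` would be available in practice; the point of the model is to reason about `hidden` from it.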
11Ex2. HMM for Sequence Alignment
HMM with 3 states that emits residue pairs:
- (M)atch: emit an aligned pair
- (D)elete1: emit a residue in seq. 1 and a gap in seq. 2
- (D)elete2: the converse of D1
Emission prob.: if in M, emit the same letter with probability 0.24 (for each letter); if in D1 or D2, emit all letters uniformly.
M 0.9
D1 0.05
D2 0.05
Transition matrix A
      M     D1    D2
M     0.9   0.05  0.05
D1    0.95  0.05  0
D2    0.95  0     0.05
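The three-state model can be sketched as a small generator of aligned pairs. Several details are assumptions rather than statements from the slide: a four-letter DNA alphabet, the unlabeled column M 0.9 / D1 0.05 / D2 0.05 read as the start distribution, and the 0.04 of match mass left over after 4 × 0.24 spread evenly over the 12 mismatched pairs.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA, so 4 x 0.24 = 0.96 for identical pairs

# Assumption: the unlabeled M/D1/D2 column is the start distribution.
START = {"M": 0.9, "D1": 0.05, "D2": 0.05}

# Transition matrix A from the slide (rows: from, columns: to).
A = {
    "M":  {"M": 0.9,  "D1": 0.05, "D2": 0.05},
    "D1": {"M": 0.95, "D1": 0.05, "D2": 0.0},
    "D2": {"M": 0.95, "D1": 0.0,  "D2": 0.05},
}

# Match emissions: 0.24 per identical pair; assumption: the remaining
# 0.04 is shared equally by the 12 mismatched pairs.
MATCH = {(a, b): 0.24 if a == b else 0.04 / 12
         for a in ALPHABET for b in ALPHABET}

def draw(dist, rng):
    """Draw one key of `dist` with probability equal to its value."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]

def sample_alignment(n_columns, rng):
    """Emit n_columns alignment columns and return the two gapped rows."""
    row1, row2 = [], []
    state = draw(START, rng)
    for _ in range(n_columns):
        if state == "M":
            a, b = draw(MATCH, rng)            # aligned pair
        elif state == "D1":
            a, b = rng.choice(ALPHABET), "-"   # residue in seq. 1, gap in seq. 2
        else:
            a, b = "-", rng.choice(ALPHABET)   # the converse of D1
        row1.append(a)
        row2.append(b)
        state = draw(A[state], rng)
    return "".join(row1), "".join(row2)

seq1, seq2 = sample_alignment(30, random.Random(0))
print(seq1)
print(seq2)
```

The zero entries in A mean a gap in one sequence is never followed directly by a gap in the other; the model must pass through M first.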
12Ex3. HMM for gene finding
An HMM for unspliced genes. x: non-coding DNA; c: coding state.
13HMM Illustration
- An Occasionally Dishonest Casino
15A sequence of rolls by the casino player
6 4 6 6 6 1 66 66 6 3 1 6 3 1 6 1 6 6 6 6
18Question 1: Evaluation
GIVEN: A sequence of rolls by the casino player
6 4 6 6 6 1 66 66 6 3 1 6 3 1 6 1 6 6 6 6
QUESTION: How likely is this sequence, given our model of how the casino works?
This is the EVALUATION problem in HMMs
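One standard way to answer the evaluation question is the forward algorithm, which this part of the slides does not show; the following is a minimal sketch under stated assumptions. The casino parameters are read off the arithmetic in the worked table on the slide "A short sequence of rolls": start probability 0.5 per die, 0.95 to keep the current die, 0.05 to switch, 1/6 per face for the fair die, and 1/2 for a six (1/10 otherwise) for the loaded die.

```python
START = {"F": 0.5, "L": 0.5}                      # F = fair die, L = loaded die
TRANS = {"F": {"F": 0.95, "L": 0.05},
         "L": {"F": 0.05, "L": 0.95}}

def emit(state, roll):
    """Emission probability of one roll from the fair or loaded die."""
    if state == "F":
        return 1 / 6
    return 0.5 if roll == 6 else 0.1

def forward_probability(rolls):
    """P(rolls | model): sums over all state paths, one position at a time."""
    # f[s] = probability of the rolls seen so far, ending in state s
    f = {s: START[s] * emit(s, rolls[0]) for s in START}
    for roll in rolls[1:]:
        f = {s: emit(s, roll) * sum(f[r] * TRANS[r][s] for r in f)
             for s in START}
    return sum(f.values())

print(forward_probability([3, 1, 5, 1, 1, 6, 6, 6, 6, 1]))
```

The loop costs a constant amount of work per roll for this two-state model, whereas summing over every possible fair/loaded path explicitly would double the work with each extra roll. For long sequences the products underflow, so a practical version would rescale `f` or work with logarithms.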
20Question 3: Learning
GIVEN: A sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTION: How loaded is the loaded die? How fair is the fair die? How often does the casino player change from fair to loaded, and back?
This is the LEARNING question in HMMs
Lecture 4, Thursday April 10, 2003
24The three main questions on HMMs
1. Evaluation
GIVEN an HMM M and a sequence x,
FIND $P[x \mid M]$
2. Decoding
GIVEN an HMM M and a sequence x,
FIND the sequence $\pi$ of states that maximizes $P[x, \pi \mid M]$
3. Learning
GIVEN an HMM M with unspecified transition/emission probs., and a sequence x,
FIND parameters $\theta = (e_i(\cdot), a_{ij})$ that maximize $P[x \mid \theta]$
25(Figure not transcribed; surviving labels: 0.90, 0.10)
26Transition / Emission Probability
- Hidden state space
  - $S = \{0\ (\text{fair}),\ 1\ (\text{loaded})\}$
- Observable symbols
  - $\Sigma = \{1, 2, 3, 4, 5, 6\}$
- At position i in a sequence of N tosses:
  - transition prob.: $a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k)$
  - emission prob.: $e_k(b) = P(x_i = b \mid \pi_i = k)$
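27- Joint probability of a sequence x of length N and a corresponding state sequence (a parse) $\pi$: $P(x, \pi) = P(\pi_1)\, e_{\pi_1}(x_1) \prod_{i=2}^{N} a_{\pi_{i-1}\pi_i}\, e_{\pi_i}(x_i)$, with transition probabilities a and emission probabilities e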
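The joint probability of rolls and a parse can be computed with a short sketch like the one below. It works in log space, so the product of start, transition and emission terms becomes a sum. The casino parameters are assumptions read off the worked table on the slide "A short sequence of rolls" (start 0.5, stay 0.95, switch 0.05, fair 1/6, loaded 1/2 for a six and 1/10 otherwise), and the F/L labels are a naming choice made here.

```python
import math

START = {"F": 0.5, "L": 0.5}                      # F = fair die, L = loaded die
TRANS = {"F": {"F": 0.95, "L": 0.05},
         "L": {"F": 0.05, "L": 0.95}}

def emit(state, roll):
    """Emission probability of one roll from the fair or loaded die."""
    if state == "F":
        return 1 / 6
    return 0.5 if roll == 6 else 0.1

def joint_log_prob(rolls, path):
    """log P(x, pi): start term, then one transition and one emission per step."""
    if len(rolls) != len(path):
        raise ValueError("rolls and path must have the same length")
    total = math.log(START[path[0]]) + math.log(emit(path[0], rolls[0]))
    for i in range(1, len(rolls)):
        total += math.log(TRANS[path[i - 1]][path[i]])
        total += math.log(emit(path[i], rolls[i]))
    return total

rolls = [3, 1, 5, 1, 1, 6, 6, 6, 6, 1]
for parse in ("FFFFFFFFFF", "FFFFFLLLLL", "LLLLLLLLLL"):
    print(parse, math.exp(joint_log_prob(rolls, parse)))
```

Comparing a few candidate parses this way shows what decoding has to do: pick, among all 2^N parses, the one with the largest joint probability.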
28The Optimal State Path
- The state sequence that maximizes the joint
probability (or the most probable path given the
observed sequence)
29Finding the Optimal Path
- The joint probability is multiplicative
- ⇒ The log joint probability is additive
30Finding the Optimal Path
- The joint probability is multiplicative
- ⇒ The log joint probability is additive
- Use dynamic programming!
- The Viterbi Algorithm
31Deriving the Viterbi Algorithm
- Denote the maximum joint prob. of x and $\pi$ up to position N that ends with hidden state k by $v_k(N)$
- $\pi^*_k(N)$ is the corresponding best parse ending with k
32Deriving the Viterbi Algorithm
- Denote the maximum joint prob. of x and $\pi$ up to position N that ends with k by $v_k(N)$
- Then $v_l(N+1) = e_l(x_{N+1}) \max_k \big( v_k(N)\, a_{kl} \big)$
- Or (3.8) in Durbin's book, p. 55.
33A short sequence of rolls
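Rolls: 3 1 5 1 1 6 6 6 6 1
Each cell lists the candidate value reached from Fair, then the candidate reached from Loaded; the larger of the two is carried forward.
Roll  Fair (from F, from L)                           Loaded (from F, from L)
3     0.5 × 1/6 = 1/12                                0.5 × 1/10 = 1/20
1     1/12 × 0.95/6 = 0.013, 1/20 × 0.05/6 = 0.0004   1/12 × 0.05/10 = 0.0004, 1/20 × 0.95/10 = 0.00475
5     0.0021, 4.0e-5                                  6.6e-5, 0.00045
1     3.3e-4, 3.8e-6                                  0.0021 × 0.05/10 = 1.0e-5, 0.00045 × 0.95/10 = 4.3e-5
1     5.2e-5, 3.6e-7                                  3.3e-4 × 0.05/10 = 1.7e-6, 4.1e-6
6     8.3e-6, 3.4e-8                                  5.2e-5 × 0.05/2 = 1.3e-6, 4.1e-6 × 0.95/2 = 1.9e-6
6     1.3e-6, 1.6e-8                                  2.1e-7, 9.2e-7
6     2.1e-7, 7.7e-9                                  3.3e-8, 4.4e-7
6     3.3e-8, 3.6e-9                                  5.2e-9, 2.1e-7
1     5.2e-9, 1.7e-9                                  1.6e-10, 2.0e-8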
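The dynamic program can be sketched in a few lines, here in log space so that the recursion adds rather than multiplies. This is a minimal illustration, not the lecturer's code. The parameters are assumptions read off the arithmetic in the table (start 0.5, stay 0.95, switch 0.05, fair 1/6, loaded 1/2 for a six and 1/10 otherwise), and the F/L labels are chosen here.

```python
import math

STATES = ("F", "L")                               # F = fair die, L = loaded die
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05},
         "L": {"F": 0.05, "L": 0.95}}

def emit(state, roll):
    """Emission probability of one roll from the fair or loaded die."""
    if state == "F":
        return 1 / 6
    return 0.5 if roll == 6 else 0.1

def viterbi(rolls):
    """Return (best state path as a string, its joint probability)."""
    # v[s]: best log joint probability of any path ending in state s
    v = {s: math.log(START[s]) + math.log(emit(s, rolls[0])) for s in STATES}
    backpointers = []
    for roll in rolls[1:]:
        ptr, nxt = {}, {}
        for s in STATES:
            # best predecessor of s at this position
            best = max(STATES, key=lambda r: v[r] + math.log(TRANS[r][s]))
            ptr[s] = best
            nxt[s] = v[best] + math.log(TRANS[best][s]) + math.log(emit(s, roll))
        backpointers.append(ptr)
        v = nxt
    # trace back from the best final state
    last = max(STATES, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return "".join(path), math.exp(v[last])

best_path, best_prob = viterbi([3, 1, 5, 1, 1, 6, 6, 6, 6, 1])
print(best_path, best_prob)
```

Calling `viterbi` returns the parse as a string of F/L labels together with its joint probability. Keeping one back-pointer per state and position is what allows the best parse to be recovered without enumerating all 2^N candidates.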
34Optimal Path of fair/loaded die