Hidden Markov Models: Applications in Bioinformatics

Provided by: dnaCsByu

Learn more at: http://dna.cs.byu.edu

1
Hidden Markov Models: Applications in Bioinformatics

Gleb Haynatzki, Ph.D.
Creighton University
March 31, 2003

2
Definition

A Hidden Markov Model (HMM) is a discrete-time
finite-state Markov chain coupled with a sequence
of letters emitted when the Markov chain visits
its states.
States  (Q):  q1  q2  q3  ...
Letters (O):  O1  O2  O3  ...

3
Definition (Cont'd)

The sequence O of emitted letters is called the
observed sequence because we often know it while
not knowing the state sequence Q, which in this
case is called hidden.
The triple λ = (P, B, π) represents the full set
of parameters of the HMM, where P is the
transition probability matrix of the Markov chain,
B is the emission probability matrix, and π
denotes the initial distribution vector of the
Markov chain.

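To make the definition concrete, here is a minimal Python sketch of how an HMM with parameters (P, B, π) generates a hidden state sequence Q and an observed sequence O. The two-state model, its probabilities, and the names `sample_hmm`, `STATES`, and `LETTERS` are hypothetical and chosen only for illustration; none of them come from the slides.

```python
import random

# Hypothetical toy parameters (not from the slides).
STATES = ["s1", "s2"]
LETTERS = ["A", "B"]
PI = [0.6, 0.4]            # initial distribution over states
P = [[0.7, 0.3],           # P[i][j] = Pr(next state j | current state i)
     [0.4, 0.6]]
B = [[0.9, 0.1],           # B[i][k] = Pr(emit letter k | state i)
     [0.2, 0.8]]


def sample_hmm(pi, P, B, T, rng):
    """Draw a hidden path Q and an observed sequence O, both of length T."""
    n_states, n_letters = len(pi), len(B[0])
    Q, O = [], []
    state = rng.choices(range(n_states), weights=pi)[0]
    for _ in range(T):
        Q.append(state)
        # The visited state emits one letter according to its row of B.
        O.append(rng.choices(range(n_letters), weights=B[state])[0])
        # The chain then moves according to the row of P for this state.
        state = rng.choices(range(n_states), weights=P[state])[0]
    return Q, O


rng = random.Random(0)
Q, O = sample_hmm(PI, P, B, T=8, rng=rng)
print("hidden  :", " ".join(STATES[q] for q in Q))
print("observed:", " ".join(LETTERS[o] for o in O))
```

In practice only the second printed line would be available to the analyst; the first is the hidden path.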
4
Important Calculations

Given any observed sequence O = (O1, ..., OT):
(1) and the parameters λ = (P, B, π), efficiently
calculate P(O | λ);
(2) and λ, efficiently calculate the hidden
sequence Q = (q1, ..., qT) that is most likely to
have occurred, i.e. find argmax_Q P(Q | O);
(3) and assuming a fixed graph structure of the
underlying Markov chain, find the parameters
λ = (P, B, π) maximizing P(O | λ).

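The first calculation is commonly done with the forward recursion, which costs on the order of T·N² operations for N states instead of summing over all N^T paths. The slides do not name an algorithm, so the sketch below is an assumption about the method; the toy parameters and function names are hypothetical. A brute-force sum is included only to show that the two agree on a short sequence.

```python
from itertools import product

# Hypothetical toy parameters (not from the slides).
PI = [0.6, 0.4]
P = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]


def forward_likelihood(pi, P, B, O):
    """Return P(O | parameters) by the forward recursion. Assumes len(O) >= 1."""
    n = len(pi)
    # alpha[i] = Pr(O_1..O_t, state at time t is i)
    alpha = [pi[i] * B[i][O[0]] for i in range(n)]
    for o in O[1:]:
        alpha = [sum(alpha[i] * P[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def brute_force_likelihood(pi, P, B, O):
    """Sum the joint probability over every state path (tiny inputs only)."""
    n = len(pi)
    total = 0.0
    for Q in product(range(n), repeat=len(O)):
        p = pi[Q[0]] * B[Q[0]][O[0]]
        for t in range(1, len(O)):
            p *= P[Q[t - 1]][Q[t]] * B[Q[t]][O[t]]
        total += p
    return total


O = [0, 1, 1, 0]
print(forward_likelihood(PI, P, B, O))
print(brute_force_likelihood(PI, P, B, O))
```

For long sequences the raw probabilities underflow; a practical version would rescale alpha at each step or work with logarithms.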
5
Applications of HMM

Modeling protein families
(1) construct multiple sequence alignments
(2) determine the family of a query sequence
Gene finding through semi-Hidden Markov Models
(semiHMM)

6
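HMM for Sequence Alignment

Consider the following Markov chain underlying an
HMM, with three types of states: match (m),
insert (i), and delete (d).
[Figure: state diagram of the match, insert, and delete states; not reproduced in this transcript]
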
7
HMM for Sequence Alignment (Cont'd)

The alphabet A consists of the 20 amino acids and
a delete symbol.
Delete states output only the delete symbol, with
probability 1.
Each insert and match state has its own
distribution over the 20 amino acids and does not
output the delete symbol.

8
HMM for Sequence Alignment (Cont'd)

There are two extreme situations depending on the
HMM parameters:
(1) The emission probabilities for the match and
insert states are uniform over the 20 amino acids:
the model produces random sequences.
(2) Each state emits one specific amino acid with
probability 1, and the transition m_i → m_{i+1}
has probability 1: the model always produces the
same sequence.
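The two extremes can be sketched in Python as follows. As a simplifying assumption, the model here is reduced to a straight chain of match states visited in order with probability 1, so only the emission distributions differ between the two regimes. The function names and the consensus string are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 amino acids


def uniform_emissions(n_states):
    """Extreme 1: every state emits each amino acid with probability 1/20."""
    p = 1 / len(AMINO_ACIDS)
    return [{a: p for a in AMINO_ACIDS} for _ in range(n_states)]


def point_mass_emissions(consensus):
    """Extreme 2: state i emits consensus[i] with probability 1."""
    return [{a: (1.0 if a == c else 0.0) for a in AMINO_ACIDS}
            for c in consensus]


def generate(emissions, rng):
    """Visit the match states in order and emit one residue per state."""
    out = []
    for dist in emissions:
        letters = list(dist)
        weights = [dist[a] for a in letters]
        out.append(rng.choices(letters, weights=weights)[0])
    return "".join(out)


rng = random.Random(1)
random_like = [generate(uniform_emissions(7), rng) for _ in range(3)]
identical = [generate(point_mass_emissions("CAEFDDH"), rng) for _ in range(3)]
print(random_like)   # three unrelated-looking sequences
print(identical)     # the consensus repeated three times
```

A model trained on a real protein family would sit between these two cases, with some positions close to a point mass and others close to uniform.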

9
HMM for Sequence Alignment (Cont'd)

Between the two extremes, consider a family of
somewhat similar sequences:
A tight family of very similar sequences
A loose family with little similarity
Similarity may be confined to certain areas of
the sequences if some match states emit only a
few amino acids, while other match states emit
all amino acids uniformly at random.

10
HMM for Sequence Alignment: Procedure

(A) Start with training, or estimating, the
parameters of the model using a set of
training sequences from the protein family
(B) Next, compute the path of states most likely
to have produced each sequence
(C) Amino acids are aligned if both are produced
by the same match state in their paths
(D) Finally, indels are inserted appropriately
for insertions and deletions
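Step (A) is often carried out with an EM procedure such as Baum-Welch. The slides do not say which estimation method is used, so this is an assumption, and the sketch below is deliberately simplified: it uses a generic two-state HMM rather than a match/insert/delete architecture, a single training sequence rather than a set, and no rescaling, so it is only suitable for short sequences of length at least 2. All parameter values and function names are hypothetical.

```python
def forward(pi, P, B, O):
    """alpha[t][i] = Pr(O_1..O_t, state at time t is i)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(n)]]
    for o in O[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * P[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(P, B, O):
    """beta[t][i] = Pr(O_{t+1}..O_T | state at time t is i)."""
    n = len(P)
    beta = [[1.0] * n]
    for o in reversed(O[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(P[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(pi, P, B, O):
    """One EM update. Returns (new_pi, new_P, new_B, likelihood before update)."""
    n, m, T = len(pi), len(B[0]), len(O)
    alpha, beta = forward(pi, P, B, O), backward(P, B, O)
    like = sum(alpha[-1])
    # gamma[t][i]: posterior probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(n)]
             for t in range(T)]
    # xi[i][j]: expected number of i -> j transitions given O.
    xi = [[sum(alpha[t][i] * P[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
               for t in range(T - 1)) / like
           for j in range(n)] for i in range(n)]
    new_pi = gamma[0][:]
    new_P = [[xi[i][j] / sum(xi[i]) for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if O[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_P, new_B, like


# Hypothetical starting guess and training sequence.
pi = [0.5, 0.5]
P = [[0.6, 0.4], [0.3, 0.7]]
B = [[0.7, 0.3], [0.4, 0.6]]
O = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]
history = []
for _ in range(5):
    pi, P, B, like = baum_welch_step(pi, P, B, O)
    history.append(like)
print(history)  # likelihood of O under the parameters before each update
```

Each update cannot decrease the likelihood of the training sequence, but EM may stop at a local maximum, so results depend on the starting guess.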

11
Important Calculations

Given any observed sequence O = (O1, ..., OT):
(1) and the parameters λ = (P, B, π), efficiently
calculate P(O | λ);
(2) and λ, efficiently calculate the hidden
sequence Q = (q1, ..., qT) that is most likely to
have occurred, i.e. find argmax_Q P(Q | O);
(3) and assuming a fixed graph structure of the
underlying Markov chain, find the parameters
λ = (P, B, π) maximizing P(O | λ).

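The second calculation, which step (B) of the procedure relies on, is usually done with the Viterbi algorithm. The slides do not name it, so treat this as an assumption. Because P(O) does not depend on Q, maximizing P(Q | O) is the same as maximizing the joint probability of Q and O, which is what the sketch below does in log space. The toy parameters and function names are hypothetical.

```python
import math

# Hypothetical toy parameters (not from the slides).
PI = [0.6, 0.4]
P = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]


def _log(x):
    """Logarithm that maps probability 0 to -infinity instead of raising."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi(pi, P, B, O):
    """Return (most likely state path, its log joint probability with O)."""
    n = len(pi)
    # delta[i] = best log probability of any path ending in state i.
    delta = [_log(pi[i]) + _log(B[i][O[0]]) for i in range(n)]
    backptrs = []
    for o in O[1:]:
        new_delta, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + _log(P[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + _log(P[best_i][j]) + _log(B[j][o]))
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    best_logp = delta[state]
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_logp


def joint_prob(pi, P, B, Q, O):
    """Joint probability of a given state path Q and observations O."""
    p = pi[Q[0]] * B[Q[0]][O[0]]
    for t in range(1, len(O)):
        p *= P[Q[t - 1]][Q[t]] * B[Q[t]][O[t]]
    return p


O = [0, 0, 1, 1, 0]
path, logp = viterbi(PI, P, B, O)
print(path, math.exp(logp))
```

When several paths tie for the maximum, this version returns one of them; which one depends on the order in which states are scanned.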
12
Example

Consider the two sequences CAEFDDH and CDAEFPDDH.
Suppose the model has length 10, and the most
likely paths for the two sequences are
m0 m1 m2 m3 m4 d5 d6 m7 m8 m9 m10 and
m0 m1 i1 m2 m3 m4 d5 m6 m7 m8 m9 m10

13
Example (Cont'd)

The alignment induced is found by aligning
positions generated by the same match state:
Path 1:  m0  m1      m2  m3  m4  d5  d6  m7  m8  m9  m10
Seq 1:       C       A   E   F           D   D   H
Seq 2:       C   D   A   E   F       P   D   D   H
Path 2:  m0  m1  i1  m2  m3  m4  d5  m6  m7  m8  m9  m10

14
Example (End)

This leads to the following alignment:
C-AEF-DDH
CDAEFPDDH
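The example can be reproduced with a short Python sketch that turns the two state paths into an alignment. It rests on assumptions the slides leave implicit: the first and last states on each path (m0 and m10) emit nothing, "-" is used as the gap character, and columns in which every sequence has a gap are dropped. The function names are hypothetical.

```python
import re

GAP = "-"


def parse_path(path):
    """Split 'm0m1i1m2' into [('m', 0), ('m', 1), ('i', 1), ('m', 2)]."""
    return [(kind, int(idx)) for kind, idx in re.findall(r"([mid])(\d+)", path)]


def letters_by_state(seq, path):
    """Assign each residue of seq to the match or insert state that emitted it.

    Assumption: the first and last states on the path are silent begin/end
    states, and delete states emit no residue.
    """
    matches, inserts = {}, {}
    residues = iter(seq)
    for kind, idx in parse_path(path)[1:-1]:
        if kind == "m":
            matches[idx] = next(residues)
        elif kind == "i":
            inserts.setdefault(idx, []).append(next(residues))
    if next(residues, None) is not None:
        raise ValueError("path emits fewer residues than the sequence has")
    return matches, inserts


def align(seqs, paths, model_length):
    """Build one aligned row per sequence from its most likely path."""
    assigned = [letters_by_state(s, p) for s, p in zip(seqs, paths)]
    rows = ["" for _ in seqs]
    for k in range(model_length + 1):
        # Match column k: residues emitted by the same match state line up.
        col = [m.get(k, GAP) for m, _ in assigned]
        if any(c != GAP for c in col):
            rows = [r + c for r, c in zip(rows, col)]
        # Insert columns after position k, padded to the widest insertion.
        width = max(len(ins.get(k, [])) for _, ins in assigned)
        for r, (_, ins) in enumerate(assigned):
            chunk = "".join(ins.get(k, []))
            rows[r] += chunk + GAP * (width - len(chunk))
    return rows


seqs = ["CAEFDDH", "CDAEFPDDH"]
paths = ["m0m1m2m3m4d5d6m7m8m9m10", "m0m1i1m2m3m4d5m6m7m8m9m10"]
for row in align(seqs, paths, model_length=10):
    print(row)
# prints:
# C-AEF-DDH
# CDAEFPDDH
```

Position 5 is deleted in both paths, so it contributes no column, which is why the alignment has nine columns rather than ten.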

15
HMM Strengths and Weaknesses

HMMs align many sequences with little computing
power.
HMMs allow the sequences themselves to guide the
alignment.
Alignments by HMM are sometimes ambiguous, and
some regions are left unaligned in the end.
HMM weaknesses come from their strengths: the
Markov property and stationarity.

Thank you.
