Hidden Markov Models: Applications in Bioinformatics

Provided by: dnaCsByu
Learn more at: http://dna.cs.byu.edu

1
Hidden Markov Models: Applications in
Bioinformatics
  • Gleb Haynatzki, Ph.D.
  • Creighton University
  • March 31, 2003

2
Definition
  • A Hidden Markov Model (HMM) is a discrete-time
    finite-state Markov chain coupled with a sequence
    of letters emitted when the Markov chain visits
    its states.
  • States (Q): q1, q2, q3, ...
  • Letters (O): O1, O2, O3, ...

3
Definition (Cont'd)
  • The sequence O of emitted letters is called the
    observed sequence because we often know it while
    not knowing the state sequence Q, which is in
    this case called hidden.
  • The triple λ = (P, B, π) represents the full set of
    parameters of the HMM, where P is the transition
    probability matrix of the Markov chain, B is the
    emission probability matrix, and π denotes the
    initial distribution vector of the Markov chain.

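The definition can be made concrete with a minimal Python sketch that stores an HMM as the triple (P, B, π), draws a hidden state sequence together with its emitted letters, and evaluates the joint probability of a given pair (Q, O). The two-state chain, the two-letter alphabet, all numbers, and the names `sample_hmm` and `joint_prob` are illustrative assumptions, not part of the presentation.

```python
import random

# Illustrative two-state HMM over the letters "a" and "b".
# All numbers are made-up values, not taken from the slides.
STATES = [0, 1]
LETTERS = "ab"
P = [[0.9, 0.1],   # P[i][j]: probability of moving from state i to state j
     [0.2, 0.8]]
B = [[0.5, 0.5],   # B[i][k]: probability that state i emits LETTERS[k]
     [0.1, 0.9]]
PI = [0.5, 0.5]    # PI[i]: probability that the chain starts in state i


def sample_hmm(T, rng):
    """Draw a hidden state sequence Q and an observed sequence O of length T."""
    Q, O = [], []
    state = rng.choices(STATES, weights=PI)[0]
    for _ in range(T):
        Q.append(state)
        O.append(rng.choices(LETTERS, weights=B[state])[0])
        state = rng.choices(STATES, weights=P[state])[0]
    return Q, "".join(O)


def joint_prob(Q, O):
    """P(Q, O | lambda): initial term, then one transition and emission per step."""
    prob = PI[Q[0]] * B[Q[0]][LETTERS.index(O[0])]
    for t in range(1, len(Q)):
        prob *= P[Q[t - 1]][Q[t]] * B[Q[t]][LETTERS.index(O[t])]
    return prob


Q, O = sample_hmm(5, random.Random(0))
print(Q, O, joint_prob(Q, O))
```

In practice only O would be visible; Q is printed here purely to show what "hidden" refers to.
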
4
Important Calculations
  • Given any observed sequence O = (O1, ..., OT)
  • and the parameters λ = (P, B, π), efficiently
    calculate P(O | λ)
  • and λ, efficiently calculate the hidden
    sequence Q = (q1, ..., qT) that is most likely to
    have occurred, i.e. find argmax_Q P(Q | O)
  • and assuming a fixed graph structure of the
    underlying Markov chain, find the parameters
    λ = (P, B, π) maximizing P(O | λ)

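For the first calculation, a common choice is the forward recursion, which costs on the order of N²T operations instead of summing over all N^T state paths. The slides do not name the method, so the sketch below is one standard option rather than the presenter's own procedure; the toy parameters and the name `forward` are assumptions, and letters are encoded as integer indices into the columns of B.

```python
def forward(obs, P, B, pi):
    """Return P(O | lambda) for a sequence of letter indices `obs`.

    P[i][j]: transition probability from state i to state j
    B[i][k]: probability that state i emits letter k
    pi[i]:   probability that the chain starts in state i
    """
    n = len(pi)
    # alpha[i] = P(O_1..O_t, q_t = i | lambda), initialised for t = 1
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * P[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


# Toy parameters (illustrative values only, not from the slides)
P = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.5, 0.5], [0.1, 0.9]]
pi = [0.5, 0.5]

likelihood = forward([0, 1, 1], P, B, pi)
print(likelihood)
```

For sequences as long as real proteins the raw products underflow, so practical implementations rescale alpha at each step or work with logarithms.
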
5
Applications of HMM
  • Modeling protein families
  • (1) construct multiple sequence alignments
  • (2) determine the family of a query sequence
  • Gene finding through semi-Hidden Markov Models
    (semiHMM)

6
HMM for Sequence Alignment
Consider the following Markov chain underlying an
HMM, with three types of states: match (m),
insert (i), and delete (d).
[Figure: state diagram of the Markov chain, not reproduced in this transcript]

7
HMM for Sequence Alignment (Cont'd)
  • The alphabet A consists of the 20 amino acids and
    a delete symbol
  • Delete states output only the delete symbol, with
    probability 1
  • Each insert and match state has its own
    distribution over the 20 amino acids and does not
    output the delete symbol
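Since the state diagram did not survive in the transcript, the sketch below lists the transitions of the usual profile-HMM layout: from position k a path may insert after k, move to match state k+1, or delete position k+1. This topology, the treatment of m0 and the last match state as silent begin and end states, and the names `profile_transitions` and `is_valid_path` are assumptions; the two paths checked at the end are the ones quoted in the example later in the presentation.

```python
import re


def profile_transitions(end):
    """Allowed (source, target) pairs; m0 is the begin state, m{end} the end state."""
    allowed = set()
    for k in range(end):                          # k = 0 .. end-1
        sources = [f"m{k}", f"i{k}"]
        if k >= 1:
            sources.append(f"d{k}")               # there is no delete state d0
        for src in sources:
            allowed.add((src, f"i{k}"))           # insert (or keep inserting) after position k
            allowed.add((src, f"m{k + 1}"))       # move on to the next match (or end) state
            if k + 1 < end:
                allowed.add((src, f"d{k + 1}"))   # delete (skip) position k+1
    return allowed


def is_valid_path(path, end):
    """Check that a path such as 'm0m1i1m2...' only uses allowed transitions."""
    states = re.findall(r"[mid]\d+", path)
    allowed = profile_transitions(end)
    return (states[0] == "m0" and states[-1] == f"m{end}"
            and all(pair in allowed for pair in zip(states, states[1:])))


print(is_valid_path("m0m1m2m3m4d5d6m7m8m9m10", 10))
print(is_valid_path("m0m1i1m2m3m4d5m6m7m8m9m10", 10))
```
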

8
HMM for Sequence Alignment (Cont'd)
  • There are two extreme situations, depending on the
    HMM parameters:
  • The emission probabilities for the match and
    insert states are uniform over the 20 amino
    acids: the model produces random sequences
  • Each state emits one specific amino acid with
    probability 1, and m_i → m_{i+1} with probability
    1: the model always produces the same sequence
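One way to put numbers on the two extremes is the Shannon entropy of a state's emission distribution: about 4.32 bits for the uniform case and 0 bits when a single amino acid is emitted with probability 1. Entropy is not mentioned in the slides; it is used here only as an illustrative measure, and the choice of "C" as the fixed amino acid is arbitrary.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 amino acids


def entropy_bits(dist):
    """Shannon entropy (in bits) of an emission distribution {letter: prob}."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)


# Extreme 1: every amino acid is equally likely (random sequences)
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}
# Extreme 2: one amino acid is emitted with probability 1 (fixed sequence)
point_mass = {aa: (1.0 if aa == "C" else 0.0) for aa in AMINO_ACIDS}

print(entropy_bits(uniform))     # log2(20), about 4.32 bits
print(entropy_bits(point_mass))  # 0 bits
```

Match states of a real protein family would typically fall between these two values.
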

9
HMM for Sequence Alignment (Cont'd)
  • Between the two extremes, consider a family of
    somewhat similar sequences:
  • A tight family of very similar sequences
  • A loose family with little similarity
  • Similarity may be confined to certain areas of
    the sequences if some match states emit only a
    few amino acids, while other match states emit
    all amino acids uniformly/randomly

10
HMM for Sequence Alignments: Procedure
  • (A) Start with training, or estimating, the
    parameters of the model using a set of
    training sequences from the protein family
  • (B) Next, compute the path of states most likely
    to have produced each sequence
  • (C) Amino acids are aligned if both are produced
    by the same match state in their paths
  • (D) Finally, indels are inserted appropriately
    for insertions and deletions
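
Step (B) is commonly carried out with the Viterbi algorithm, a dynamic program that keeps, for every state, the best-scoring path ending there. The slides do not name the method, so this is a sketch of one standard choice; the toy two-state parameters are illustrative assumptions, and a profile HMM with silent delete states would need a slightly extended recursion.

```python
import math


def viterbi(obs, P, B, pi):
    """Return the most likely state path for `obs` and its log joint probability."""
    n = len(pi)

    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    # score[i]: best log probability of any path that ends in state i
    score = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log(P[i][j]))
            ptr.append(best)
            score.append(prev[best] + log(P[best][j]) + log(B[j][o]))
        back.append(ptr)

    # Trace the best path backwards from the best final state
    state = max(range(n), key=lambda i: score[i])
    best_score = score[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1], best_score


# Toy parameters (illustrative values only, not from the slides)
P = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.5, 0.5], [0.1, 0.9]]
pi = [0.5, 0.5]

path, log_prob = viterbi([0, 1, 1], P, B, pi)
print(path, math.exp(log_prob))
```

Working in log space avoids the underflow that plain products would cause on long sequences.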

11
Important Calculations
  • Given any observed sequence O = (O1, ..., OT)
  • and the parameters λ = (P, B, π), efficiently
    calculate P(O | λ)
  • and λ, efficiently calculate the hidden
    sequence Q = (q1, ..., qT) that is most likely to
    have occurred, i.e. find argmax_Q P(Q | O)
  • and assuming a fixed graph structure of the
    underlying Markov chain, find the parameters
    λ = (P, B, π) maximizing P(O | λ)

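The third calculation (parameter estimation) is usually approached with the Baum-Welch procedure, an instance of EM that climbs to a local maximum of P(O | λ). The slides do not name it, so the update formulas below are given as the standard textbook form rather than the presenter's derivation. The symbols p_{ij} and b_j(a) for the entries of P and B, and the forward and backward quantities α and β, are notation assumed here.

```latex
% Forward and backward quantities for an observed sequence O = (O_1, ..., O_T)
\alpha_t(i) = P(O_1, \dots, O_t,\; q_t = i \mid \lambda), \qquad
\beta_t(i)  = P(O_{t+1}, \dots, O_T \mid q_t = i,\; \lambda)

% Posterior state and transition probabilities (E-step)
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)}, \qquad
\xi_t(i,j)  = \frac{\alpha_t(i)\, p_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}

% Re-estimated parameters (M-step)
\hat{\pi}_i = \gamma_1(i), \qquad
\hat{p}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\hat{b}_j(a) = \frac{\sum_{t:\, O_t = a} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```

With several training sequences, the numerators and denominators are accumulated over all sequences before dividing, and the two steps are repeated until the likelihood stops improving.
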
12
Example
  • Consider the sequences CAEFDDH and CDAEFPDDH
  • Suppose the model has length 10, and the most
    likely paths for the two sequences are
  • m0m1m2m3m4d5d6m7m8m9m10 and
  • m0m1i1m2m3m4d5m6m7m8m9m10

13
Example (Cont'd)
  • The alignment induced is found by aligning
    positions generated by the same match state
    (columns left empty emit no amino acid):

    m0  m1      m2  m3  m4  d5  d6  m7  m8  m9  m10
        C       A   E   F           D   D   H
        C   D   A   E   F       P   D   D   H
    m0  m1  i1  m2  m3  m4  d5  m6  m7  m8  m9  m10

14
Example (End)
  • This leads to the following alignment
    (gaps shown as "-"):
  • C-AEF-DDH
  • CDAEFPDDH
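
The step from state paths to an alignment can be sketched in a few lines of Python. The code assumes that m0 and m10 are silent begin and end states (which is what the letter counts of the two sequences imply), writes gaps as "-", and drops any position that every sequence deletes; the function name `align_by_paths` is hypothetical.

```python
import re


def align_by_paths(seqs, paths, end):
    """Align sequences column by column from their state paths.

    Assumes m0 and m{end} are silent begin/end states and that every other
    match or insert state emits exactly one letter.
    """
    rows = []
    for seq, path in zip(seqs, paths):
        letters = iter(seq)
        match, inserts = {}, {}
        for kind, idx in re.findall(r"([mid])(\d+)", path):
            k = int(idx)
            if kind == "m" and k not in (0, end):
                match[k] = next(letters)
            elif kind == "i":
                inserts[k] = inserts.get(k, "") + next(letters)
            # delete states and the silent begin/end states consume no letter
        rows.append((match, inserts))

    aligned = [""] * len(rows)
    for k in range(end):
        if k >= 1:
            column = [match.get(k, "-") for match, _ in rows]
            if any(c != "-" for c in column):  # drop positions every sequence deletes
                aligned = [a + c for a, c in zip(aligned, column)]
        # letters inserted after position k, padded with gaps to equal width
        width = max(len(ins.get(k, "")) for _, ins in rows)
        aligned = [a + ins.get(k, "").ljust(width, "-")
                   for a, (_, ins) in zip(aligned, rows)]
    return aligned


aligned = align_by_paths(
    ["CAEFDDH", "CDAEFPDDH"],
    ["m0m1m2m3m4d5d6m7m8m9m10", "m0m1i1m2m3m4d5m6m7m8m9m10"],
    end=10,
)
print("\n".join(aligned))
```

Position 5 is deleted in both paths, so it contributes no column, which is why the alignment has nine columns.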

15
HMM Strengths & Weaknesses
  • HMM aligns many sequences with little computing
    power
  • HMM allows the sequences themselves to guide the
    alignment
  • Alignments by HMM are sometimes ambiguous and
    some regions are left unaligned in the end
  • HMM weaknesses come from their strengths: the
    Markov property and stationarity

16
  • Thank you.