Profile Hidden Markov Models - PowerPoint PPT Presentation

1
Profile Hidden Markov Models
  • Bioinformatics, Fall 2004
  • Dr. Webb Miller and Dr. Claude dePamphilis
  • Dhiraj Joshi
  • Department of Computer Science and Engineering
  • The Pennsylvania State University

2
Outline
  • Introduction to HMMs
  • Profile HMMs
  • Available resources for Profile HMMs
  • Some online demonstrations

3
Introduction to HMMs
  • Hidden Markov Model formalism
  • a statistical technique for modeling patterns in
    sequential data
  • first-order Markov property (memorylessness): the
    next state depends only on the current state
  • each state is generally a hidden entity that
    emits symbols or features
  • the same symbol can be emitted by several
    states
  • an HMM is characterized by its transition
    probabilities and emission distributions
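The formalism above can be illustrated with a tiny HMM. The sketch below assumes a hypothetical two-state model (states "H" and "L" over DNA symbols, all numbers made up) and computes the probability of a sequence with the forward algorithm, which sums over every hidden state path. Both states can emit every symbol, so the path that produced a sequence cannot be read off the sequence itself.

```python
# Hypothetical two-state HMM; every number here is illustrative only.
states = ["H", "L"]
start = {"H": 0.5, "L": 0.5}
trans = {"H": {"H": 0.9, "L": 0.1},   # first-order: depends only on the current state
         "L": {"H": 0.2, "L": 0.8}}
emit = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}


def forward(seq):
    """Return P(seq), summed over all hidden state paths (forward algorithm)."""
    # f[s] = P(x_1..x_i, state at position i = s)
    f = {s: start[s] * emit[s][seq[0]] for s in states}
    for x in seq[1:]:
        f = {s: emit[s][x] * sum(f[r] * trans[r][s] for r in states)
             for s in states}
    return sum(f.values())


print(forward("GCTA"))
```

Dynamic programming keeps the cost at O(N·K²) for N symbols and K states instead of enumerating all Kᴺ paths.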

4
Introduction to HMMs
  • Hidden Markov Model parameter estimation
  • parameters: transition probabilities and emission
    probabilities
  • estimated with iterative algorithms, e.g. the EM
    (Baum-Welch) algorithm and Viterbi training
  • the algorithms rely on dynamic programming to keep
    the computational cost manageable
  • usually each iteration involves a variant of the
    following two steps
  • estimate the state sequence that maximizes the
    likelihood under the current parameter set
  • update the parameter set based on the estimated
    state sequence
  • the algorithms may converge to a local rather
    than a global optimum
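The two-step iteration above can be sketched as Viterbi training, the hard-assignment variant (Baum-Welch would use expected counts over all paths instead of the single best path). This is a minimal sketch assuming a hypothetical two-state DNA model. The names `viterbi`, `reestimate`, and `viterbi_train` are illustrative, not taken from any package. A pseudocount of 1 keeps every probability non-zero so the logarithms stay finite.

```python
import math

STATES = ["H", "L"]   # hypothetical hidden states
ALPHABET = "ACGT"


def viterbi(seq, start, trans, emit):
    """Step 1: most probable hidden state path for seq under the given parameters."""
    v = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for x in seq[1:]:
        ptr, nv = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: v[r] + math.log(trans[r][s]))
            ptr[s] = best
            nv[s] = v[best] + math.log(trans[best][s]) + math.log(emit[s][x])
        back.append(ptr)
        v = nv
    path = [max(STATES, key=lambda s: v[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def normalize(counts):
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def reestimate(seqs, paths, pseudo=1.0):
    """Step 2: count start/transition/emission events along the decoded paths."""
    s_cnt = {s: pseudo for s in STATES}
    t_cnt = {r: {s: pseudo for s in STATES} for r in STATES}
    e_cnt = {s: {a: pseudo for a in ALPHABET} for s in STATES}
    for seq, path in zip(seqs, paths):
        s_cnt[path[0]] += 1
        for i, (x, s) in enumerate(zip(seq, path)):
            e_cnt[s][x] += 1
            if i > 0:
                t_cnt[path[i - 1]][s] += 1
    return (normalize(s_cnt),
            {r: normalize(c) for r, c in t_cnt.items()},
            {s: normalize(c) for s, c in e_cnt.items()})


def viterbi_train(seqs, start, trans, emit, max_iter=50):
    """Alternate steps 1 and 2 until the decoded paths stop changing."""
    prev = None
    for _ in range(max_iter):
        paths = [viterbi(x, start, trans, emit) for x in seqs]
        if paths == prev:
            break  # converged, possibly only to a local optimum
        start, trans, emit = reestimate(seqs, paths)
        prev = paths
    return start, trans, emit, paths
```

The result depends on the initial parameters, which is one way the local-optimum behavior noted above shows up in practice.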

5
Outline
  • Introduction to HMMs
  • Profile HMMs
  • Available resources for Profile HMMs
  • Some online demonstrations

6
Profile Hidden Markov Models
  • Stochastic models of multiple sequence
    alignments of protein and DNA sequences
  • Potential application domains
  • protein families can be modeled as an HMM or a
    group of HMMs
  • constructing a profile HMM
  • new protein sequences can be aligned with
    stored models to detect remote homology
  • aligning a sequence with a stored profile HMM
  • two or more protein-family profile HMMs can be
    aligned with each other to detect homology
  • finding statistical similarities between two
    profile HMMs

7
Profile Hidden Markov Models
  • Constructing a profile HMM
  • a multiple sequence alignment is assumed as input
  • each consensus column is modeled by 3 states
  • match, insert and delete states
  • the number of states depends on the number of
    consensus columns in the alignment
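The construction step above can be sketched as follows. Durbin et al. describe a simple heuristic for choosing consensus columns: a column counts as consensus when fewer than half of its entries are gaps. This sketch assumes that rule and a hypothetical 50% threshold. Each aligned row is then mapped to a state path: a residue in a consensus column is a match, a gap there is a delete, and a residue in any other column is an insert. The function names are illustrative.

```python
def match_columns(msa, max_gap_frac=0.5):
    """Indices of columns treated as consensus (match) columns.

    Assumed rule: a column is a consensus column when fewer than
    max_gap_frac of its entries are gaps ('-')."""
    nseq = len(msa)
    return [c for c in range(len(msa[0]))
            if sum(row[c] == "-" for row in msa) / nseq < max_gap_frac]


def state_paths(msa, cols):
    """Map each aligned row to its match/insert/delete state path."""
    consensus = set(cols)
    paths = []
    for row in msa:
        path, k = [], 0  # k = number of consensus columns passed so far
        for c, ch in enumerate(row):
            if c in consensus:
                k += 1
                path.append(("M" if ch != "-" else "D") + str(k))
            elif ch != "-":
                path.append("I" + str(k))  # residue in a non-consensus column
        paths.append(path)
    return paths


msa = ["ACG-T", "AC--T", "A-GCT", "ACG-T"]
cols = match_columns(msa)        # column 3 is mostly gaps, so it is not consensus
print(cols, state_paths(msa, cols))
```

Here the 5-column alignment yields 4 consensus columns, so the model has 4 match and 4 delete states plus the insert states between and around them.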

8
Profile Hidden Markov Models
  • A typical profile HMM architecture
  • squares represent match states
  • diamonds represent insert states
  • circles represent delete states
  • arrows represent transitions
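The arrows in this architecture can be listed programmatically. The sketch below assumes the wiring drawn in Durbin et al., in which every match, insert, and delete state can move to the next match state, to its own node's insert state, and to the next delete state. Later designs such as HMMER's Plan7 drop the delete-insert transitions, so this is one convention, not the only one. State names (B, E, Mk, Ik, Dk) and the function name are illustrative.

```python
def profile_transitions(L):
    """All allowed (from, to) transitions of a profile HMM with L consensus columns.

    States: B (begin, acts as M0), M1..ML, I0..IL, D1..DL, E (end)."""
    def match(k):
        return "B" if k == 0 else "E" if k == L + 1 else f"M{k}"

    edges = []
    for k in range(L + 1):
        sources = [match(k), f"I{k}"] + ([f"D{k}"] if k > 0 else [])
        for s in sources:
            edges.append((s, match(k + 1)))     # to next match state (or end)
            edges.append((s, f"I{k}"))          # to this node's insert state
            if k < L:
                edges.append((s, f"D{k + 1}"))  # to next delete state
    return edges


for edge in profile_transitions(2):
    print(edge)
```

Under this convention a model with L consensus columns has 9L + 3 transitions, so the number of parameters grows linearly with the model length.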

9
Profile Hidden Markov Models
  • A typical profile HMM architecture
  • transition between match states
  • transition from match state to insert state
  • transition within insert state
  • transition from match state to delete state
  • transition within delete state
  • emission of a symbol at a state
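The quantities listed above can be written in the notation of Durbin et al. This is an assumed convention, with k indexing the consensus column; it is not recovered from the original slide:

```latex
\begin{aligned}
&\text{match} \to \text{match:}   && a_{M_k M_{k+1}} \\
&\text{match} \to \text{insert:}  && a_{M_k I_k} \\
&\text{insert} \to \text{insert:} && a_{I_k I_k} \\
&\text{match} \to \text{delete:}  && a_{M_k D_{k+1}} \\
&\text{delete} \to \text{delete:} && a_{D_k D_{k+1}} \\
&\text{emission of symbol } b \text{ at state } S_k: && e_{S_k}(b), \quad S \in \{M, I\} \\[4pt]
&\text{normalization, e.g.:} && a_{M_k M_{k+1}} + a_{M_k I_k} + a_{M_k D_{k+1}} = 1
\end{aligned}
```

Delete states emit nothing, so only match and insert states carry emission distributions.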

10
Profile Hidden Markov Models
  • Estimation of parameters
  • transition probabilities are estimated as the
    relative frequency of each transition in the
    given alignment
  • emission probabilities are estimated as the
    relative frequency of each emission in the given
    alignment
  • pseudocounts are usually added to account for
    transitions / emissions that were not observed
    in the alignment
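Frequency-based estimation for a single consensus column can be sketched as follows, with an optional pseudocount added to every amino acid. The function name and the flat pseudocount are illustrative assumptions. Gaps are skipped because they correspond to delete-state visits, not emissions. Transition probabilities are estimated the same way, by counting transitions out of each state and normalizing.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def match_emissions(column, pseudocount=0.0):
    """Emission probabilities e_M(a) for one consensus column.

    Gaps ('-') are delete-state visits, not emissions, so they are skipped.
    With pseudocount=0 this is the plain relative frequency."""
    counts = Counter(ch for ch in column if ch != "-")
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


column = "VVVI-"  # one alignment column read top to bottom
print(match_emissions(column)["W"])                   # unseen residue: probability 0
print(match_emissions(column, pseudocount=1.0)["W"])  # pseudocount makes it non-zero
```

Without pseudocounts, any residue absent from a small alignment gets probability zero, so a new family member carrying it at that position could never be aligned there.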

11
Profile Hidden Markov Models
  • Estimation of parameters
  • with pseudocounts
  • a Dirichlet prior distribution is used to determine
    the pseudocounts
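The estimates with pseudocounts can be written as posterior means under a Dirichlet prior. The notation is assumed here, following Durbin et al.: E_{M_k}(a) is the observed count of residue a in column k, A_{kl} is the observed count of transitions from state k to state l, and α and β are the Dirichlet parameters that act as pseudocounts.

```latex
e_{M_k}(a) = \frac{E_{M_k}(a) + \alpha_a}{\sum_{a'} \bigl( E_{M_k}(a') + \alpha_{a'} \bigr)},
\qquad
a_{kl} = \frac{A_{kl} + \beta_l}{\sum_{l'} \bigl( A_{kl'} + \beta_{l'} \bigr)}
```

Setting every α_a = 1 gives Laplace's rule (add one to each count). Larger α values pull the estimates more strongly toward the prior, which matters most when the alignment has few sequences.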

12
Profile Hidden Markov Models
  • Scoring a sequence against a profile HMM
  • the Viterbi algorithm is used to find the best
    state path
  • simulated-annealing-based methods are also used
  • maximization criterion: log likelihood or log
    odds (relative to a background/null model)
  • the raw log-likelihood score depends on the length
    of the sequence and is therefore not preferred
  • if no alignment is given initially, the
    alignment can be learned iteratively using
    Viterbi training
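Viterbi scoring in log-odds form can be sketched with the profile HMM recurrences given in Durbin et al. This is a minimal sketch under several simplifying assumptions: global alignment only, one transition table shared by all nodes, and insert states that emit with the background frequencies, so their log-odds emission term is zero. The function and parameter names are hypothetical. Transition normalization at the last node, where no D_{L+1} exists, is ignored for simplicity.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_log_odds(seq, match_emit, background, trans):
    """Best-path log-odds score of seq against a profile HMM (global alignment).

    match_emit: list of L dicts, match_emit[j - 1][a] = e_{M_j}(a)
    background: null-model residue frequencies q_a
    trans: probabilities shared by all nodes, keyed "MM", "MI", "MD",
           "IM", "II", "ID", "DM", "DI", "DD" (from-state, to-state)."""
    L, n = len(match_emit), len(seq)
    t = {k: log(p) for k, p in trans.items()}
    # V*[j][i]: best score of a path ending in state *_j after emitting seq[:i]
    VM = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VI = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VD = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VM[0][0] = 0.0  # the begin state plays the role of M_0
    for j in range(L + 1):
        for i in range(n + 1):
            if j > 0 and i > 0:
                x = seq[i - 1]
                emit = log(match_emit[j - 1][x]) - log(background[x])
                VM[j][i] = emit + max(VM[j - 1][i - 1] + t["MM"],
                                      VI[j - 1][i - 1] + t["IM"],
                                      VD[j - 1][i - 1] + t["DM"])
            if i > 0:  # insert emission scored as background: log-odds 0
                VI[j][i] = max(VM[j][i - 1] + t["MI"],
                               VI[j][i - 1] + t["II"],
                               VD[j][i - 1] + t["DI"])
            if j > 0:  # delete states emit nothing
                VD[j][i] = max(VM[j - 1][i] + t["MD"],
                               VI[j - 1][i] + t["ID"],
                               VD[j - 1][i] + t["DD"])
    # Final transition from the last node into the end state.
    return max(VM[L][n] + t["MM"], VI[L][n] + t["IM"], VD[L][n] + t["DM"])
```

Because each match emission is divided by the background frequency, a random sequence scores around or below zero whatever its length. That is the practical advantage of log-odds over raw log likelihood.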

13
Profile Hidden Markov Models
  • Comparing two profile HMMs
  • profile-profile comparison tool based on
    information theory (Yona and Levitt, 2002)
  • based on the Kullback-Leibler divergence, a
    measure of the difference between two
    probability distributions
  • dynamic programming is used to compare entire
    profiles
  • can detect weak similarities between models
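The general idea can be sketched in a few lines. Columns of two profiles are compared through the Kullback-Leibler divergence between their emission distributions, and whole profiles are compared with a global dynamic-programming alignment. This is not the Yona-Levitt scoring function, which combines divergence with a significance term. The column score used here (negative symmetrized KL divergence), the linear gap penalty of -1, and all names are hypothetical choices for illustration.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Assumes q[a] > 0 wherever p[a] > 0."""
    return sum(pa * math.log(pa / q[a]) for a, pa in p.items() if pa > 0)


def column_score(p, q):
    # Hypothetical similarity: 0 for identical columns, negative otherwise.
    return -(kl(p, q) + kl(q, p))


def align_profiles(P, Q, gap=-1.0):
    """Global DP alignment of two profiles (lists of emission dicts); best score."""
    n, m = len(P), len(Q)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + column_score(P[i - 1], Q[j - 1]),
                          S[i - 1][j] + gap,   # column of P against a gap
                          S[i][j - 1] + gap)   # column of Q against a gap
    return S[n][m]


P = [{"A": 0.9, "C": 0.1}, {"A": 0.1, "C": 0.9}]
Q = [{"A": 0.1, "C": 0.9}, {"A": 0.9, "C": 0.1}]
print(align_profiles(P, P), align_profiles(P, Q))
```

KL divergence is asymmetric, so the sketch sums both directions to make the column score independent of which profile comes first.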

14
Outline
  • Introduction to HMMs
  • Profile HMMs
  • Available resources for Profile HMMs
  • Some online demonstrations

15
Available resources for Profile HMMs
  • HMMER and SAM were among the first available
    programs for profile HMMs
  • HMMER: S. Eddy, Washington University
  • SAM (Sequence Alignment and Modeling System):
    R. Hughey, University of California, Santa Cruz
  • available free for research
  • SAM has online servers to perform sequence
    comparisons
  • http://www.cse.ucsc.edu/research/compbio/sam.html

16
Available resources for Profile HMMs
  • The InterPro consortium in Europe provides many
    resources for protein data
  • database of protein families and domains
  • brings together several different databases under
    one umbrella
  • Pfam and Superfamily are profile HMM libraries
    associated with InterPro
  • Pfam is based on HMMER search; Superfamily is based
    on SAM search and modeling

17
Available resources for Profile HMMs
  • SAM's iterative approach for building an HMM
  • find a set of close homologs using BLASTP
  • learn the alignment and build a model using the
    close homologs
  • use BLASTP to find more remote homologs starting
    from the first set of sequences (relaxing the
    E-value threshold)
  • iteratively refine the HMM
  • SAM uses Dirichlet priors to supply pseudocounts
    for the parameters
  • unlike HMMER, SAM does not require hand-tuned seed
    alignments, because the alignments are learned by
    the algorithm

18
Available resources for Profile HMMs
  • The SUPERFAMILY database incorporates
  • a library of profile HMMs representing all proteins
    of known structure
  • assignments to predicted proteins from all
    completely sequenced genomes
  • search and alignment services
  • models and domain assignments are freely
    available
  • based on the SCOP classification of protein domains
  • the SAM iterative HMM procedure is used for model
    building and sequence alignment

19
Available resources for Profile HMMs
  • In Superfamily
  • each SCOP superfamily is represented as an HMM
  • models are built with the SAM procedure, starting
    from 4 kinds of seed
  • accurate structure-based alignments
  • hand-labeled alignments
  • automatic alignments using ClustalW
  • individual sequence members used separately as seeds
  • Assignment of superfamilies
  • for a given sequence, every model is scored
    across the whole sequence using Viterbi scoring
  • the superfamily of the highest-scoring model is
    assigned to the region

20
Outline
  • Introduction to HMMs
  • Profile HMMs
  • Available resources for Profile HMMs
  • Some online demonstrations

21
Online Demonstrations
  • http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/temp/624288710157514.html

22
References
  • Durbin, R., Eddy, S., Krogh, A., and Mitchison, G.,
    Biological Sequence Analysis, Cambridge
    University Press, 2002
  • Baldi, P. and Brunak, S., Bioinformatics: The
    Machine Learning Approach, MIT Press,
    Cambridge, 1998
  • Eddy, S., Profile hidden Markov models,
    Bioinformatics (review), vol. 14, no. 9,
    pp. 755-763, 1998
  • Karplus, K., Barrett, C., and Hughey, R., Hidden
    Markov models for detecting remote protein
    homologies, Bioinformatics, vol. 14, no. 10,
    pp. 846-856, 1998
  • Madera, M. and Gough, J., A comparison of profile
    hidden Markov model procedures for remote
    homology detection, Nucleic Acids Research,
    vol. 30, no. 19, pp. 4321-4328, 2002
  • Gough, J., Karplus, K., Hughey, R., and Chothia, C.,
    Assignment of homology to genome sequences
    using a library of hidden Markov models that
    represent all proteins of known structure,
    J. Mol. Biol., vol. 313, pp. 903-919, 2001

23
References
  • Yona, G. and Levitt, M., Within the twilight zone:
    a sensitive profile-profile comparison tool based
    on information theory, J. Mol. Biol., vol. 315,
    pp. 1257-1275, 2002
  • Madera, M., Vogel, C., Kummerfeld, K., Chothia, C.,
    and Gough, J., The SUPERFAMILY database in 2004:
    additions and improvements, Nucleic Acids
    Research, vol. 32, Database issue, pp. D235-D239,
    2004
  • Bateman, A., Birney, E., Durbin, R., Eddy, S.,
    Finn, R., and Sonnhammer, E., Pfam 3.1: 1313
    multiple alignments and profile HMMs match the
    majority of proteins, Nucleic Acids Research,
    vol. 27, no. 1, 1999
  • Andreeva, A., et al., SCOP database in 2004:
    refinements integrate structure and sequence
    family data, Nucleic Acids Research, vol. 32,
    Database issue, pp. D226-D229, 2004
  • Many other online resources and tutorials