Protein Quaternary Fold Recognition Using Conditional Graphical Models - PowerPoint PPT Presentation

Description:

Triple beta-spiral fold in Adenovirus Fiber Shaft. Carnegie Mellon. School of Computer Science ... Virus fibers in adenovirus, reovirus and PRD1. Double barrel ...

Slides: 25
Provided by: erich177



1
Protein Quaternary Fold Recognition Using
Conditional Graphical Models
  • Yan Liu
  • IBM Research
  • Jaime Carbonell (CMU), Vanathi Gopalakrishnan (U
    Pitt), Peter Weigele (MIT)
  • ICML-2007 workshop

2
Snapshot of Cell Biology
3
Example Protein Structures
Triple beta-spiral fold in Adenovirus Fiber Shaft
Adenovirus Fibre Shaft
Virus Capsid
4
Predicting Protein Structures
  • Protein structure is a key determinant of protein
    function
  • Resolving protein structures experimentally in
    vitro by crystallography is very expensive, and
    NMR can only resolve very small proteins
  • There is a large gap between the numbers of known
    protein sequences and resolved structures
  • 3,023,461 sequences vs. 36,247 resolved
    structures (1.2%)
  • Therefore we need to predict structures in silico
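As a quick check of the percentage on this slide, the share of known sequences with a resolved structure can be computed directly from the two counts quoted above (nothing else is assumed):

```python
def resolved_fraction(structures: int, sequences: int) -> float:
    """Percentage of known sequences that have a resolved structure."""
    return 100.0 * structures / sequences

# Counts as quoted on the slide.
num_sequences = 3_023_461
num_structures = 36_247

pct = resolved_fraction(num_structures, num_sequences)
print(f"{pct:.1f}%")                              # prints "1.2%"
print(round(num_sequences / num_structures))      # about 1 resolved structure per 83 sequences
```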

5
Quaternary Folds and Alignments
  • Protein fold
  • Identifiable regular arrangement of secondary
    structural elements
  • Thus far, only a limited number of protein folds
    have been discovered (about 1,000)
  • Very little research on quaternary folds
  • Complex structures and little labeled data
  • Quaternary fold recognition

6
Related Work
  • Previous Work in General Protein Structure
    Prediction
  • Sequence similarity perspective [Altschul et al.,
    1997; Durbin et al., 1998; Karplus et al., 1998;
    Jones, 2001]
  • Physical forces perspective [Jones, 1998]
  • Structural biology perspective [Efimov, 1991;
    Wilmot and Thornton, 1990; Bradley et al., 2001]
  • Previous Work in Quaternary Structure Prediction
  • Mostly on partial tasks, e.g. classification of
    protein sequences, analysis of domain-domain
    docking or interaction types and geometric
    regularities and constraints
  • Computational challenges in viral fold
    recognition
  • Complex structures, insufficient data and low
    sequence similarity among membership proteins

7
Conditional Random Fields
  • Hidden Markov models (HMMs) [Rabiner, 1989]
  • Conditional random fields (CRFs) [Lafferty et al.,
    2001]
  • Model the conditional probability directly
    (discriminative models, directly optimizable)
  • Allow arbitrary dependencies in the observations
  • Adaptive to different loss functions and
    regularizers
  • Promising results in multiple applications
  • But they need to scale up (computationally) and
    extend to long-distance dependencies
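To make "model the conditional probability directly" concrete, here is a minimal sketch of a linear-chain CRF. The node and transition scores (`node`, `trans`) are made-up numbers, not from the talk. It computes P(y | x) = exp(score(x, y)) / Z(x), with log Z(x) obtained by the forward algorithm and cross-checked against brute-force enumeration:

```python
import itertools
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sequence_score(node, trans, labels):
    """Unnormalized log-score: node scores plus transition scores."""
    score = sum(node[t][y] for t, y in enumerate(labels))
    score += sum(trans[a][b] for a, b in zip(labels, labels[1:]))
    return score

def log_partition(node, trans):
    """log Z(x) via the forward algorithm, O(T * S^2)."""
    num_states = len(node[0])
    alpha = list(node[0])  # alpha[s]: log-sum over all prefixes ending in state s
    for t in range(1, len(node)):
        alpha = [
            node[t][s] + log_sum_exp([alpha[r] + trans[r][s] for r in range(num_states)])
            for s in range(num_states)
        ]
    return log_sum_exp(alpha)

def conditional_prob(node, trans, labels):
    """P(y | x) = exp(score(x, y) - log Z(x))."""
    return math.exp(sequence_score(node, trans, labels) - log_partition(node, trans))

# Hypothetical scores: 4 positions, 2 labels. node[t][s] may depend on the whole input x.
node = [[1.0, 0.2], [0.5, 0.9], [0.3, 1.2], [1.1, 0.1]]
trans = [[0.8, -0.4], [-0.2, 0.6]]

brute = log_sum_exp([sequence_score(node, trans, y)
                     for y in itertools.product(range(2), repeat=len(node))])
print(abs(brute - log_partition(node, trans)) < 1e-9)  # True
print(conditional_prob(node, trans, (0, 1, 1, 0)))
```

Because each `node[t][s]` may be any function of the whole input, overlapping observation features fit in without changing the algorithm; the chain structure, however, only captures neighboring dependencies, which is the limitation the last bullet points to.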

8
Our Solution: Conditional Graphical Models
Long-range dependency
Local dependency
  • Segmentation CRF
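  • Outputs $Y = \{M, \{W_i\}\}$, where $W_i = \{p_i, q_i, s_i\}$
    ($M$: number of segments; $p_i$, $q_i$: start and
    end positions of segment $i$; $s_i$: its state)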
  • Feature definition
  • Node feature
  • Local interaction feature
  • Long-range interaction feature
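The segmentation output $Y = \{M, \{W_i\}\}$ can be sketched as a list of $(p_i, q_i, s_i)$ triples. This sketch assumes 0-based inclusive positions and hypothetical state names; `Segment`, `is_valid_segmentation` and `to_residue_labels` are illustrative helpers, not part of the original work. Node features would be evaluated per segment, and local interaction features between neighboring segments:

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Segment:
    """One W_i = (p_i, q_i, s_i): start, end (inclusive, 0-based) and state."""
    p: int
    q: int
    s: str

def is_valid_segmentation(segments: List[Segment], length: int) -> bool:
    """True if the segments are contiguous, non-empty and cover 0..length-1."""
    expected_start = 0
    for seg in segments:
        if seg.p != expected_start or seg.q < seg.p:
            return False
        expected_start = seg.q + 1
    return expected_start == length

def to_residue_labels(segments: List[Segment]) -> List[str]:
    """Expand a segmentation into one state label per residue."""
    labels: List[str] = []
    for seg in segments:
        labels.extend([seg.s] * (seg.q - seg.p + 1))
    return labels

# Hypothetical segmentation of a 10-residue sequence into M = 3 segments.
y = [Segment(0, 2, "coil"), Segment(3, 7, "strand"), Segment(8, 9, "coil")]
M = len(y)
print(M, is_valid_segmentation(y, 10))  # 3 True
print(to_residue_labels(y))
```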

9
Linked Segmentation CRF
  • Nodes: secondary structure elements and/or simple
    folds
  • Edges: local interactions and long-range
    inter-chain and intra-chain interactions
  • L-SCRF conditional probability of y given x is
    defined as
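Assuming the standard CRF log-linear form, with one weight $\lambda_k$ per feature $f_k$ and features defined on the graph's nodes $V_G$ (segments) and edges $E_G$ (local and long-range interactions), such a conditional probability can be sketched as follows; the clique notation $\mathbf{y}_c$ is introduced here for illustration only:

```latex
P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big( \sum_{c \in V_G \cup E_G} \sum_k \lambda_k f_k(\mathbf{x}, \mathbf{y}_c) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \exp\Big( \sum_{c \in V_G \cup E_G} \sum_k \lambda_k f_k(\mathbf{x}, \mathbf{y}'_c) \Big)
```

Here $\mathbf{y}_c$ is the assignment to a single node or to both endpoints of an edge. The sum in $Z(\mathbf{x})$ runs over all admissible outputs $\mathbf{y}'$, each with its own number of segments and hence its own graph, which is what makes exact computation so costly.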

10
Linked Segmentation CRF (II)
  • Objective
  • Training: learn the model parameters λ
  • Minimize the regularized negative log-likelihood
  • Iterative search algorithms that move in the
    direction where the empirical feature values
    agree with their model expectations
  • Complex graphs result in huge computational
    complexity
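The training objective and its gradient can be sketched on a tiny log-linear model where Z(x) is small enough to enumerate. Assumptions: the two features in `features` are hypothetical, there is a single training pair, and the regularizer is Gaussian (L2) with variance `sigma2`. The gradient of the regularized negative log-likelihood is the model expectation of each feature minus its empirical value, plus λ_k/σ², so descent stops where the two agree up to the regularization term:

```python
import itertools
import math

LABELS = (0, 1)

def features(x, y):
    """Hypothetical features: label-matches-observation count, label-repeat count."""
    match = sum(1.0 for xi, yi in zip(x, y) if xi == yi)
    repeat = sum(1.0 for a, b in zip(y, y[1:]) if a == b)
    return [match, repeat]

def expected_features(x, lam):
    """E_{P(y|x)}[f] by enumerating every label sequence (feasible only for tiny x)."""
    ys = list(itertools.product(LABELS, repeat=len(x)))
    scores = [sum(l * f for l, f in zip(lam, features(x, y))) for y in ys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    expectation = [0.0] * len(lam)
    for w, y in zip(weights, ys):
        for k, f in enumerate(features(x, y)):
            expectation[k] += w / z * f
    return expectation

def nll_gradient(x, y, lam, sigma2):
    """Gradient of -log P(y|x) + ||lam||^2 / (2 * sigma2)."""
    empirical = features(x, y)
    expected = expected_features(x, lam)
    return [e - f + l / sigma2 for f, e, l in zip(empirical, expected, lam)]

x = (0, 0, 1, 1, 1, 0)   # hypothetical observations
y = (0, 0, 1, 1, 1, 0)   # hypothetical gold labels
lam = [0.0, 0.0]
for _ in range(200):     # plain gradient descent with a fixed step size (assumed)
    grad = nll_gradient(x, y, lam, sigma2=10.0)
    lam = [l - 0.1 * g for l, g in zip(lam, grad)]
print(lam)
```

The enumeration in `expected_features` is exponential in the sequence length; that cost is exactly what the approximate learning and inference on the next slides are meant to avoid.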

11
Approximate Inference - Learning
  • Most approximation algorithms cannot handle a
    variable number of nodes in the graph, but we
    need variable graph topologies, so we use
  • Contrastive divergence [Hinton and Welling, 2002]
  • $\Delta\lambda_k \propto E_{P^0}[f_k] - E_{P^1}[f_k]$
  • $P^0$: estimated from the empirical samples
  • $P^1$: estimated from a few samples drawn by a
    sampler seeded at the empirical samples
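A minimal contrastive divergence (CD-1) sketch of this update, on a log-linear model over binary label sequences. Assumptions: the `features` function is hypothetical, P^1 comes from one Gibbs sweep per training pair seeded at the gold labels, and the learning rate `eta` is fixed. The Gibbs conditionals only need score differences, so Z(x) is never computed:

```python
import math
import random

def features(x, y):
    """Hypothetical features: label matches observation; neighboring labels repeat."""
    match = sum(1.0 for xi, yi in zip(x, y) if xi == yi)
    repeat = sum(1.0 for a, b in zip(y, y[1:]) if a == b)
    return [match, repeat]

def score(x, y, lam):
    """Unnormalized log-score sum_k lambda_k f_k(x, y)."""
    return sum(l * f for l, f in zip(lam, features(x, y)))

def gibbs_sweep(x, y, lam, rng):
    """Resample each binary label from its conditional given the others."""
    y = list(y)
    for i in range(len(y)):
        logits = []
        for label in (0, 1):
            y[i] = label
            logits.append(score(x, y, lam))
        p1 = 1.0 / (1.0 + math.exp(logits[0] - logits[1]))  # P(y_i = 1 | rest)
        y[i] = 1 if rng.random() < p1 else 0
    return y

def cd1_step(data, lam, eta, rng):
    """delta lambda_k = eta * (E_P0[f_k] - E_P1[f_k]), P1 from one sweep per sample."""
    e0 = [0.0] * len(lam)
    e1 = [0.0] * len(lam)
    for x, y in data:
        y1 = gibbs_sweep(x, y, lam, rng)  # chain seeded at the empirical sample
        for j, (f0, f1) in enumerate(zip(features(x, y), features(x, y1))):
            e0[j] += f0 / len(data)
            e1[j] += f1 / len(data)
    return [l + eta * (a - b) for l, a, b in zip(lam, e0, e1)]

rng = random.Random(0)
data = [((0, 0, 1, 1, 1, 0), (0, 0, 1, 1, 1, 0)),   # hypothetical (x, y) pairs
        ((1, 1, 0, 0, 1, 1), (1, 1, 0, 0, 1, 1))]
lam = [0.0, 0.0]
for _ in range(100):
    lam = cd1_step(data, lam, eta=0.05, rng=rng)
print(lam)
```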

12
Approximate Inference - Inference
  • Reversible jump MCMC sampling [Green, 1995;
    Schmidler et al., 2001] with four types of
    Metropolis operators:
  • State switching
  • Position switching
  • Segment split
  • Segment merge
  • MAP estimate using simulated annealing reversible
    jump MCMC [Andrieu et al., 2000]
  • Replace the sampler in simulated annealing with
    RJ MCMC
  • Theoretically converges to the global optimum
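The four operators and the annealing loop can be sketched on segmentations stored as `(start, end, state)` triples. Everything here is an illustrative assumption: the toy `log_score` stands in for the L-SCRF log-probability, the two states are hypothetical, the four move types are equally likely, and the linear cooling schedule is a heuristic (the convergence guarantee mentioned above needs a much slower schedule). Because the space is discrete, split and merge need no Jacobian; their acceptance ratio only carries the proposal-probability correction:

```python
import math
import random

STATES = ("coil", "strand")  # hypothetical segment states

def log_score(segs, obs):
    """Toy stand-in for log P(y|x): +1 per residue matching obs, -2 per segment."""
    s = -2.0 * len(segs)
    for p, q, st in segs:
        s += sum(1.0 for i in range(p, q + 1) if obs[i] == st)
    return s

def propose(segs, rng):
    """Return (new_segs, log proposal correction), or None for an invalid move."""
    segs = list(segs)
    m = len(segs)
    move = rng.choice(("state", "position", "split", "merge"))
    if move == "state":                      # state switching (symmetric)
        i = rng.randrange(m)
        p, q, st = segs[i]
        segs[i] = (p, q, rng.choice([s for s in STATES if s != st]))
        return segs, 0.0
    if move == "position":                   # shift one boundary by +/-1 (symmetric)
        if m < 2:
            return None
        i = rng.randrange(m - 1)
        d = rng.choice((-1, 1))
        (p1, q1, s1), (_, q2, s2) = segs[i], segs[i + 1]
        if not (p1 <= q1 + d < q2):
            return None
        segs[i], segs[i + 1] = (p1, q1 + d, s1), (q1 + d + 1, q2, s2)
        return segs, 0.0
    if move == "split":                      # M -> M+1 segments
        i = rng.randrange(m)
        p, q, st = segs[i]
        length = q - p + 1
        if length < 2:
            return None
        cut = rng.randrange(p, q)            # left part ends at cut
        segs[i:i + 1] = [(p, cut, st), (cut + 1, q, rng.choice(STATES))]
        # q(merge back) / q(split) = (1/m) / ((1/m) * 1/(length-1) * 1/|S|)
        return segs, math.log((length - 1) * len(STATES))
    if m < 2:                                # merge: M -> M-1, keeps left state
        return None
    i = rng.randrange(m - 1)
    (p1, _, s1), (_, q2, _) = segs[i], segs[i + 1]
    segs[i:i + 2] = [(p1, q2, s1)]
    # q(split back) / q(merge) = 1 / ((merged_length - 1) * |S|)
    return segs, -math.log((q2 - p1) * len(STATES))

def anneal(obs, steps=5000, t0=2.0, t_min=0.05, seed=0):
    """Simulated-annealing RJ-MCMC search for a high-scoring segmentation."""
    rng = random.Random(seed)
    segs = [(0, len(obs) - 1, STATES[0])]     # start from a single segment
    cur = log_score(segs, obs)
    best, best_score = segs, cur
    for k in range(steps):
        temp = max(t_min, t0 * (1 - k / steps))  # linear cooling (assumed)
        proposal = propose(segs, rng)
        if proposal is None:
            continue                          # invalid move counts as a rejection
        new, log_corr = proposal
        new_score = log_score(new, obs)
        # Metropolis-Hastings acceptance for the tempered target P(y|x)^(1/T)
        if math.log(1.0 - rng.random()) < (new_score - cur) / temp + log_corr:
            segs, cur = new, new_score
            if cur > best_score:
                best, best_score = segs, cur
    return best, best_score

obs = ["coil"] * 4 + ["strand"] * 6 + ["coil"] * 3   # hypothetical per-residue evidence
best, best_score = anneal(obs)
print(best, best_score)
```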

13
Experiments Target Quaternary Fold
  • Triple beta-spirals [van Raaij et al., Nature,
    1999]
  • Virus fibers in adenovirus, reovirus and PRD1
  • Double-barrel trimer [Benson et al., 2004]
  • Coat proteins of adenovirus, PRD1, STIV and PBCV

14
Features for Protein Fold Recognition
15
Experiment Results: Fold Recognition
  • Double-barrel trimer

Triple beta-spirals
16
Experiment Results: Alignment Prediction
17
Experiment Results: Discovery of New Membership
Proteins
  • Predicted membership proteins of triple
    beta-spirals can be accessed at
  • http://www.cs.cmu.edu/yanliu/swissprot_list.xls
  • Membership proteins of the double-barrel trimer
    suggested by biologists [Benson, 2005], compared
    with L-SCRF predictions

18
Conclusion
  • Conditional graphical models for protein
    structure prediction
  • Effective representation for protein structural
    properties
  • Flexibility to incorporate different kinds of
    informative features
  • Efficient inference algorithms for large-scale
    applications
  • A major extension compared with previous work
  • Knowledge representation through graphical models
  • Ability to handle long-range interactions within
    one chain and between chains
  • Future work
  • Automatic learning of graph topology
  • Applications to other domains

20
Tertiary Fold Recognition: β-Helix Fold
  • Histograms and ranks for known β-helices against
    the PDB-minus dataset

The chain graph model reduces the real running time
of the SCRF model by a factor of about 50
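Assuming "rank" here means the position of a known β-helix's score among the PDB-minus scores (1 = scores above every negative), the ranks can be computed with a sorted list and binary search. The protein names and scores below are invented, and `rank_against_negatives` is a hypothetical helper, not the authors' code:

```python
from bisect import bisect_right
from typing import Dict, List

def rank_against_negatives(pos_scores: Dict[str, float],
                           neg_scores: List[float]) -> Dict[str, int]:
    """Rank = 1 + number of negative-set scores strictly greater than the positive's."""
    ordered = sorted(neg_scores)
    n = len(ordered)
    return {name: 1 + (n - bisect_right(ordered, s)) for name, s in pos_scores.items()}

# Invented scores, for illustration only.
positives = {"helix_A": 12.5, "helix_B": 3.0}
negatives = [1.0, 2.5, 4.0, 0.3, 3.0]
print(rank_against_negatives(positives, negatives))  # {'helix_A': 1, 'helix_B': 2}
```

Sorting once and using `bisect_right` keeps each lookup logarithmic, which matters when the negative set is a large structure database; ties with a negative are counted in the positive's favor.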
21
Fold Alignment Prediction: β-Helix
  • Predicted alignment for known β-helices in
    cross-family validation

22
Discovery of New Potential β-Helices
  • Ran the structural predictor to search the UniProt
    (structurally unresolved) databases for potential
    β-helices
  • Full list (98 new predictions) can be accessed at
    www.cs.cmu.edu/yanliu/SCRF.html
  • Verified on 3 proteins from different organisms
    whose structures were later resolved
    experimentally
  • 1YP2 Potato Tuber ADP-Glucose Pyrophosphorylase
  • 1PXZ The Major Allergen From Cedar Pollen
  • GP14 of Shigella bacteriophage as a β-helix
    protein
  • Not a single false positive!

23
Previous Work
  • Sequence similarity perspective
  • Sequence similarity searches, e.g. PSI-BLAST
    [Altschul et al., 1997]
  • Profile HMMs, e.g. HMMER [Durbin et al., 1998] and
    SAM [Karplus et al., 1998]
  • Window-based methods, e.g. PSIPRED [Jones, 2001]
  • Physical forces perspective
  • Homology modeling or threading, e.g. Threader
    [Jones, 1998]
  • Structural biology perspective
  • Painstakingly hand-engineered methods for
    specific structures, e.g. αα- and ββ-hairpins,
    β-turns and β-helices [Efimov, 1991; Wilmot and
    Thornton, 1990; Bradley et al., 2001]

Fail to capture structural properties and
long-range dependencies
Generative models based on a rough approximation
of free energy; perform very poorly on complex
structures
Very hard to generalize due to built-in
constants and fixed features
24
Graphical Models
  • A graphical model is a graph representation of
    probabilistic dependencies [Pearl, 1993; Jordan,
    1999]
  • Nodes: random variables
  • Edges: dependency relations
  • Directed graphical model (Bayesian networks)
  • Undirected graphical model (Markov random fields)
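The directed/undirected distinction can be seen on three binary variables A, B, C in a chain. All probability tables and potentials below are made-up numbers. A Bayesian network multiplies locally normalized conditionals, so the joint sums to 1 automatically; a Markov random field multiplies unnormalized clique potentials and must divide by a global constant Z, the same kind of normalizer that CRF training has to compute or approximate:

```python
import itertools

# Directed (Bayesian network) A -> B -> C: P(a, b, c) = P(a) P(b | a) P(c | b).
p_a = {0: 0.6, 1: 0.4}                                  # made-up tables
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def bn_joint(a, b, c):
    """Product of locally normalized conditionals."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Undirected (Markov random field) A - B - C: P(a, b, c) = psi(a, b) psi(b, c) / Z.
def psi(u, v):
    """Hypothetical pairwise potential favoring equal neighboring values."""
    return 2.0 if u == v else 1.0

assignments = list(itertools.product((0, 1), repeat=3))
Z = sum(psi(a, b) * psi(b, c) for a, b, c in assignments)  # global normalizer

def mrf_joint(a, b, c):
    """Product of clique potentials divided by Z."""
    return psi(a, b) * psi(b, c) / Z

print(sum(bn_joint(*v) for v in assignments))   # sums to 1 without any Z
print(Z, sum(mrf_joint(*v) for v in assignments))
```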