Protein Secondary Structure Prediction

Provided by: cs146

1
Protein secondary structure Prediction
  • The problem

Seq:  RPLQGLVLDTQLYGFPGAFDDWERFMRE
Pred: CCCCCHHHHHCCCCEEEECCHHHHHHCC
(H = helix, E = strand, C = coil)
  • Why secondary structure prediction?

2
Some historical landmarks
  • 1st generation, 1970s (50-60% accuracy)
  • single-residue statistics, explicit rules
  • Chou & Fasman 1974, GOR1 1978
  • 2nd generation, 1980s (60-70% accuracy)
  • single-residue statistics, nearest neighbors,
    neural networks (more local interactions taken into account)
  • GOR3 1987, Levin et al. 1986, Qian & Sejnowski
    1988, Holley & Karplus 1989
  • 3rd generation, 1990s (78% accuracy)
  • neural networks with homologous sequence
    information
  • PHD 1993, PSIPRED 1999, SSpro 2000
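The accuracy figures quoted for each generation are usually per-residue three-state accuracy (Q3): the percentage of residues whose H/E/C label is predicted correctly. A minimal sketch follows. The prediction string is the one from slide 1; the observed string is a hypothetical reference made up for illustration.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)


predicted = "CCCCCHHHHHCCCCEEEECCHHHHHHCC"  # prediction string from slide 1
observed = "CCCCHHHHHHCCCCEEEECCCHHHHHCC"   # hypothetical reference, illustration only
print(f"Q3 = {q3_accuracy(predicted, observed):.1f}%")
```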

3
Chou-Fasman method
  • Purely statistical approach
  • Conformational propensity, e.g., helical propensity
  • Categorize each amino acid
  • e.g., helix former, helix breaker, helix
    indifferent
  • Find nucleation sites
  • short segments with a high concentration of one
    category
  • Extend the nucleation sites until a threshold is reached
  • Handle overlaps
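The nucleate-and-extend steps above can be sketched for the helix case only. This is a simplified illustration, not the full published procedure. The propensity values are the commonly quoted Chou-Fasman helix parameters and should be checked against the original table. The window sizes and cutoffs (6-residue nucleus with at least 4 formers, 4-residue extension window, stop below 1.0) are assumptions based on common textbook descriptions. Strand prediction and overlap handling are omitted.

```python
# Helix propensities P(a) as commonly quoted for Chou-Fasman (illustrative;
# verify against the published table before relying on them).
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
    "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
    "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
    "P": 0.57, "G": 0.57,
}


def mean_propensity(segment):
    return sum(P_ALPHA[aa] for aa in segment) / len(segment)


def predict_helix(seq, nuc_len=6, min_formers=4, former_cutoff=1.03,
                  extend_len=4, stop_below=1.0):
    """Return a string of 'H' (helix) and 'C' (everything else)."""
    n = len(seq)
    helix = [False] * n
    for start in range(n - nuc_len + 1):
        window = seq[start:start + nuc_len]
        formers = sum(P_ALPHA[aa] > former_cutoff for aa in window)
        if formers < min_formers:
            continue  # not a nucleation site
        left, right = start, start + nuc_len  # half-open interval [left, right)
        # Extend to the right while the trailing window stays above threshold.
        while right < n and mean_propensity(
                seq[max(left, right + 1 - extend_len):right + 1]) >= stop_below:
            right += 1
        # Extend to the left while the leading window stays above threshold.
        while left > 0 and mean_propensity(
                seq[left - 1:left - 1 + extend_len]) >= stop_below:
            left -= 1
        for i in range(left, right):
            helix[i] = True
    return "".join("H" if h else "C" for h in helix)


print(predict_helix("RPLQGLVLDTQLYGFPGAFDDWERFMRE"))  # sequence from slide 1
```

Because each residue is scored from fixed per-residue statistics, the sketch also shows why the method ignores longer-range context.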

4
Chou-Fasman method
Conformational parameters
(Table from Krane and Raymer's book)
  • What is the drawback of the method?

5
Introduction to neural network
  • A self-learning system trained on a data set
  • A perceptron
  • An analogy: an apple and orange sorter
  • A threshold unit classifies a vector of inputs
  • Weights: how do we obtain them?
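One standard answer to the weights question is the perceptron learning rule: start from zero weights and nudge them whenever the threshold unit misclassifies a training example. The sketch below uses the apple/orange analogy. The two features (say, redness and surface roughness, both scaled to 0-1) and the data points are hypothetical.

```python
def threshold_unit(weights, bias, inputs):
    """Fire (1) when the weighted sum of the inputs reaches the threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation >= 0 else 0


def train_perceptron(samples, labels, epochs=100, learning_rate=0.1):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, target in zip(samples, labels):
            error = target - threshold_unit(weights, bias, inputs)
            if error:
                errors += 1
                # Move the weights toward (or away from) the misclassified input.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if errors == 0:  # every training example is classified correctly
            break
    return weights, bias


# Hypothetical training set: (redness, roughness); 1 = apple, 0 = orange.
fruits = [(0.9, 0.3), (0.8, 0.4), (0.2, 0.6), (0.3, 0.7)]
labels = [1, 1, 0, 0]
weights, bias = train_perceptron(fruits, labels)
predictions = [threshold_unit(weights, bias, f) for f in fruits]
print(weights, bias, predictions)
```

The rule only converges when the two classes can be separated by a straight line in feature space, which is the case for this toy data.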

6
Basic neural network in secondary structure prediction
(Figure from Kneller et al., JMB 1990)
[Figure: network diagram with inputs x1-x4, weights w11-w14, activation a1, outputs y1-y3, and errors E1-E3]

7
Multi-layer neural network
  • Complete neural network
  • a set of continuous threshold units
    interconnected in a topology
  • the output of some units is the input of other units
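A minimal forward pass through such a network, with input units x, hidden units y, and output units z, might look as follows. The sigmoid is used as the continuous threshold function. The layer sizes (4 inputs, 2 hidden units, 3 outputs) and all weight values are arbitrary placeholders rather than trained parameters.

```python
import math


def sigmoid(a):
    """Continuous (smooth) version of a threshold unit."""
    return 1.0 / (1.0 + math.exp(-a))


def layer(inputs, weights, biases):
    """Each row of weights feeds one unit of the layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    y = layer(x, w_hidden, b_hidden)  # hidden units take the inputs
    z = layer(y, w_out, b_out)        # output units take the hidden outputs
    return y, z


x = [1.0, 0.0, 0.0, 1.0]                  # x1..x4
w_hidden = [[0.5, -0.2, 0.1, 0.3],
            [-0.4, 0.6, 0.2, -0.1]]       # placeholder weights
b_hidden = [0.0, 0.1]
w_out = [[1.0, -1.0], [-0.5, 0.5], [0.2, 0.2]]
b_out = [0.0, 0.0, 0.0]
y, z = forward(x, w_hidden, b_hidden, w_out, b_out)
print(y, z)
```

For secondary structure prediction the three outputs would naturally correspond to helix, strand, and coil, with the largest output taken as the predicted state.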

[Figure: layered network with input units (x: x1-x4), hidden units (y), and output units (z)]

8
PHD method (Rost B, Sander C, JMB 1993)
  • Use profile of multiple sequence alignment
  • Multiple layers
  • Accuracy > 70%
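The profile idea can be illustrated by turning a multiple sequence alignment into per-column amino acid frequencies, which then replace the single sequence as network input. This is a sketch only: the alignment is hypothetical, gaps are simply skipped, and the exact input encoding used by PHD is not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def alignment_profile(aligned_seqs):
    """Return one {amino acid: frequency} dict per alignment column."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Ignore gaps and any non-standard symbols in this column.
        residues = [s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: (counts[aa] / total if total else 0.0)
                        for aa in AMINO_ACIDS})
    return profile


# Hypothetical alignment of three homologous fragments ('-' marks a gap).
alignment = ["RPLQG", "RPIQG", "KP-QG"]
profile = alignment_profile(alignment)
print({aa: f for aa, f in profile[0].items() if f > 0})
```

A column where several residues are tolerated (such as the first or third here) carries more evolutionary information than the single residue seen in one sequence.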

9
Protein Folding Problem
  • A protein folds into a unique 3D structure under
    physiological conditions
  • What is the protein folding problem?
  • 3D structure is key to understanding functional
    mechanism
  • Rational drug design
  • 3D structure prediction

10
Protein Folding Problem
  • Hard?
  • Can it be done?
  • Sampling conformational space
  • Secondary structures (SS) offer simplicity
  • Side chains fill the space
  • May not be a random search
  • Free energy (ΔG)
  • Interaction energy and entropic energy
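A back-of-the-envelope calculation (the classic Levinthal-style argument) shows why exhaustive sampling of conformational space is hopeless and why folding is unlikely to be a random search. All three numbers below are assumptions chosen for illustration.

```python
conformations_per_residue = 3   # assumed: a few backbone states per residue
n_residues = 100                # assumed: a small protein
samples_per_second = 1e13       # assumed: a very optimistic sampling rate

total_conformations = conformations_per_residue ** n_residues
seconds = total_conformations / samples_per_second
years = seconds / (3600 * 24 * 365)

print(f"{total_conformations:.2e} conformations")
print(f"about {years:.1e} years to enumerate them all")
```

Real proteins fold in far less time than this estimate, so the search must be guided, for example by early formation of local structure and by the free energy landscape.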

11
Protein Folding Problem
  • Experimental findings
  • Proteins do not start folding from the end
  • Secondary structures seem to form early
  • Hydrophobic amino acids in the core
  • Hydrophilic amino acids on the surface
  • Energy function approximation
  • Physics based (bond length, bond angle, pair
    interactions)
  • Statistics based
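One common way to build a statistics-based energy term is the inverse Boltzmann relation: a contact seen more often in known structures than expected by chance gets a favorable (negative) energy. A minimal sketch follows. The counts are hypothetical and energies are in units of kT.

```python
import math


def contact_energy(observed_count, expected_count, kT=1.0):
    """Inverse Boltzmann estimate: E = -kT * ln(observed / expected)."""
    return -kT * math.log(observed_count / expected_count)


# Hypothetical contact counts from a structure database.
pair_counts = {
    ("L", "V"): (200, 100),  # hydrophobic pair seen twice as often as expected
    ("K", "K"): (50, 100),   # like-charged pair seen half as often as expected
}
for pair, (observed, expected) in pair_counts.items():
    print(pair, round(contact_energy(observed, expected), 3))
```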

12
Scope of the problem
  • The majority of newly solved protein structures
    share a certain level of similarity with a known
    structure
  • Certain families of proteins have few or no
    structures solved
  • Human genes: ~20k
  • Structural genomics initiative

13
Protein structure prediction
  • Comparative modeling
  • > 30% sequence identity
  • Fold recognition, formerly known as threading
  • twilight zone: < 25% sequence identity
  • Ab initio
  • new fold
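The identity thresholds above can be turned into a rough decision helper. This is a sketch under stated assumptions: identity is computed over aligned, non-gap positions (other denominators are also in use), the 25-30% band is treated as borderline, and ab initio prediction is the fallback when no usable template exists at all. The aligned sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Identity over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def suggest_approach(identity):
    if identity > 30:
        return "comparative modeling"
    if identity < 25:
        return "fold recognition (threading)"
    return "borderline: try both"


target = "RPLQGLVLDT-QLYG"    # hypothetical target/template alignment
template = "RPIQGLALDTAQ-YG"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity -> {suggest_approach(identity)}")
```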

14
CASP
Compare and rank: experimentally solved structure vs. predicted
structure
CASP5 (2003) papers, e.g., Skolnick (2003) Proteins 53, p. 469-79;
Ginalski (2003) Proteins 53, p. 410-17
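One simple way to compare a predicted structure with the experimental one is the root-mean-square deviation (RMSD) over matched atoms. The sketch below assumes the two coordinate sets are already optimally superposed and matched atom by atom. The coordinates are hypothetical, and CASP assessment also relies on other scores such as GDT_TS.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha coordinates in angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, -0.5, 0.0), (7.6, 0.5, 0.0)]
print(f"RMSD = {rmsd(experimental, predicted):.2f} A")
```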
15
Comparative Modeling
http://www.salilab.org/andras/watanabe/main.html
  • Sequence identity vs. structure overlap (Fig)

16
Comparative Modeling
  • Search for structures
  • pairwise sequence alignment against a database
  • multiple sequence alignment -> profile
  • fold assignment / threading: use structure
    information in the comparison
  • Select template
  • sequence similarity, evolutionary relationship,
    environment, resolution
  • Sequence alignment (target and template)
  • standard methods with tuning

17
Model Building
  • Assembly of rigid bodies
  • dissecting the structure into core, loops, and
    side chains
  • Satisfy spatial constraints (Fig.)
  • derive spatial constraints, find a structure
    that optimizes all the constraints
  • spatial constraints generated from
  • input alignment
  • general spatial preferences found in known
    structures
  • molecular force field
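The constraint-satisfaction idea can be sketched as a small optimization problem: score a structure by how badly it violates a set of target distances, then move the coordinates downhill. This is a toy illustration with three points in 2D and plain gradient descent, not the optimizer of any actual modeling package. The restraints, step size, and starting coordinates are assumptions.

```python
import math


def restraint_score(points, restraints):
    """Sum of squared violations of (i, j, target_distance) restraints."""
    return sum((math.dist(points[i], points[j]) - target) ** 2
               for i, j, target in restraints)


def minimize(points, restraints, step=0.05, n_steps=200):
    pts = [list(p) for p in points]
    for _ in range(n_steps):
        grads = [[0.0] * len(p) for p in pts]
        for i, j, target in restraints:
            d = math.dist(pts[i], pts[j])
            if d == 0:
                continue  # direction undefined for coincident points
            factor = 2.0 * (d - target) / d
            for k in range(len(pts[i])):
                g = factor * (pts[i][k] - pts[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for p, g in zip(pts, grads):
            for k in range(len(p)):
                p[k] -= step * g[k]  # step against the gradient
    return pts


# Hypothetical restraints: all three pairwise distances should equal 2.0.
restraints = [(0, 1, 2.0), (1, 2, 2.0), (0, 2, 2.0)]
start = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
final = minimize(start, restraints)
print(restraint_score(start, restraints), restraint_score(final, restraints))
```

In a real modeling run the restraints would come from the input alignment, from statistical preferences in known structures, and from a molecular force field, as listed above.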

18
Ab Initio Prediction
  • Challenge
  • Search space
  • Energy function
  • Reduction in search space
  • use lattice
  • use simplified amino acids
  • use building blocks available in nature
  • Energy function
  • physics
  • statistics - empirical
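A well-known way to combine a lattice with simplified amino acids is the HP model, which is not named in the slides and is used here only as an illustration. Each residue is either hydrophobic (H) or polar (P), the chain is placed on a 2D square lattice, and every non-bonded H-H contact contributes -1 to the energy. The sequence and conformations below are hypothetical.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on a 2D square lattice: -1 per non-bonded H-H contact."""
    if len(sequence) != len(coords):
        raise ValueError("one lattice point is needed per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("chain must be self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbors
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    energy -= 1
    return energy


folded = [(0, 0), (1, 0), (1, 1), (0, 1)]    # chain bent into a square
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]  # fully stretched chain
print(hp_energy("HPPH", folded), hp_energy("HPPH", extended))
```

The lattice shrinks the search space to a countable set of conformations, and the two-letter alphabet reduces the energy function to a contact count.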

19
Ab initio 3D structure prediction
An example: ROSETTA
Simons KT, Kooperberg C, Huang E, Baker D. J Mol
Biol. (1997) 268, 209-225
Schonbrun J, Wedemeyer W, Baker D. Current Opinion in
Structural Biology (2002) 12, 348-54
ROSETTA:
  • narrows the search by using available local structure
  • statistics-based energy function
  • one of the top few ab initio methods in CASP4

20
ROSETTA segment matching
Observations: analysis of 9-amino-acid segments in the
structure database; distribution of the
conformations of 9-mers
Main idea of the method:
  • build a segment conformational library (fragment library for
    3-mers and 9-mers)
  • put the pieces together better
    (energy function and search space)
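The fragment-assembly idea can be sketched as a Monte Carlo loop: pick a position, swap in a fragment from the library for that position, and keep the change according to a Metropolis test on the energy. Everything below is a toy stand-in and not the ROSETTA implementation: the conformation is a list of (phi, psi) angles, the 3-mer library is hypothetical, and toy_energy is a placeholder that merely rewards helix-like torsions.

```python
import math
import random


def toy_energy(torsions):
    # Hypothetical placeholder score: penalize deviation from (-60, -45).
    return sum(((phi + 60) / 180) ** 2 + ((psi + 45) / 180) ** 2
               for phi, psi in torsions)


def fragment_assembly(n_residues, library, energy, frag_len=3,
                      n_steps=500, temperature=0.1, seed=0):
    rng = random.Random(seed)
    current = [(-120.0, 120.0)] * n_residues  # start from an extended chain
    current_e = energy(current)
    best, best_e = list(current), current_e
    for _ in range(n_steps):
        start = rng.randrange(n_residues - frag_len + 1)
        fragment = rng.choice(library[start])
        trial = current[:start] + list(fragment) + current[start + frag_len:]
        trial_e = energy(trial)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if trial_e <= current_e or rng.random() < math.exp(
                (current_e - trial_e) / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


n = 9
# Hypothetical library: three candidate 3-mers for every start position.
library = {s: [[(-60.0, -45.0)] * 3, [(-120.0, 120.0)] * 3, [(-65.0, -40.0)] * 3]
           for s in range(n - 2)}
best, best_e = fragment_assembly(n, library, toy_energy)
print(round(best_e, 4))
```

Restricting moves to fragments observed in known structures is what narrows the search space; the energy function then decides which combinations of fragments are kept.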