BCB 444/544 - PowerPoint PPT Presentation

1
BCB 444/544

Lab 7
Protein Structure Prediction
Oct 11, 2007

2
Chp 14 - Secondary Structure Prediction

SECTION V STRUCTURAL BIOINFORMATICS
Xiong Chp 14
Protein Secondary Structure Prediction
Secondary Structure Prediction for Globular
Proteins
Secondary Structure Prediction for Transmembrane
Proteins
Coiled-Coil Prediction

3
Secondary Structure Prediction

Has become highly accurate in recent years (>85%)
Usually 3 (or 4) state predictions
H = α-helix
E = β-strand
C = coil (or loop)
(T = turn)

4
Secondary Structure Prediction Methods

1st Generation methods
Ab initio - used relatively small dataset of
structures available
Chou-Fasman - based on amino acid propensities
(3-state)
GOR - also propensity-based (4-state)
2nd Generation methods
based on much larger datasets of structures now
available
GOR II, III, IV, SOPM, GOR V, FDM
3rd Generation methods
Homology-based & Neural network based
PHD, PSIPRED, SSPRO, PROF, HMMSTR, CDM
Meta-Servers
combine several different methods
Consensus & Ensemble based
JPRED, PredictProtein, Proteus
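
The propensity idea behind the 1st generation methods can be sketched roughly as follows: average per-residue propensities over a sliding window and assign H, E or C. The propensity numbers below are illustrative placeholders for a few residues, NOT the published Chou-Fasman tables, and the window size and threshold are assumptions. The real Chou-Fasman method also uses nucleation and extension rules, which are omitted here.

```python
# Toy propensity-based 3-state predictor (illustration only).
# HELIX / STRAND hold placeholder values, not the published tables.
HELIX = {"A": 1.4, "E": 1.5, "L": 1.2, "M": 1.4,
         "V": 1.1, "I": 1.1, "G": 0.6, "P": 0.6}
STRAND = {"A": 0.8, "E": 0.4, "L": 1.3, "M": 1.0,
          "V": 1.7, "I": 1.6, "G": 0.7, "P": 0.5}


def predict_3state(seq, window=5, threshold=1.05):
    """Return a string of H/E/C, one letter per residue of seq."""
    half = window // 2
    states = []
    for i in range(len(seq)):
        # Window is truncated at the sequence ends.
        win = seq[max(0, i - half): i + half + 1]
        # Residues missing from the toy tables get a neutral value of 1.0.
        h = sum(HELIX.get(r, 1.0) for r in win) / len(win)
        e = sum(STRAND.get(r, 1.0) for r in win) / len(win)
        if h >= threshold and h >= e:
            states.append("H")
        elif e >= threshold:
            states.append("E")
        else:
            states.append("C")
    return "".join(states)


if __name__ == "__main__":
    print(predict_3state("AEAEAEAEGPGPVIVIVIVI"))
```

With real tables the same windowed averaging applies; only the dictionaries change.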

5
Secondary Structure Prediction Servers

Prediction Evaluation?
Q3 score - % of residues correctly predicted
(3-state)
in cross-validation experiments
Best results? Meta-servers
http://expasy.org/tools/ (scroll for 2°
structure prediction)
http://www.russell.embl-heidelberg.de/gtsp/secstrucpred.html
JPred www.compbio.dundee.ac.uk/www-jpred
PredictProtein http://www.predictprotein.org/
Rost, Columbia
Best "individual" programs?
CDM http://gor.bb.iastate.edu/cdm/
Sen & Jernigan, ISU
FDM (not available separately as server)
Cheng & Jernigan, ISU
GOR V http://gor.bb.iastate.edu/
Kloczkowski & Jernigan, ISU
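
The Q3 score used for evaluation above is simple to compute once a predicted and an observed 3-state string are available. A minimal sketch, assuming both strings use the same H/E/C alphabet and are the same length (the function name is made up for illustration):

```python
def q3_score(predicted, observed):
    """Percent of residues whose 3-state label (H/E/C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty sequences have no Q3 score")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # 7 of 8 positions agree.
    print(q3_score("HHHEECCC", "HHHEECCH"))
```

In a cross-validation experiment this would be averaged over proteins that were held out of the training data.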

6
Consensus Data Mining (CDM)

Developed by Jernigan Group at ISU
Basic premise: combination of 2 complementary
methods can enhance performance by harnessing
distinct advantages of both methods; combines
FDM & GOR V
FDM - Fragment Data Mining - exploits
availability of sequence-similar fragments in the
PDB, which can lead to highly accurate prediction
- much better than GOR V - for such fragments,
but such fragments are not available for many
cases
GOR V - Garnier, Osguthorpe, Robson V - predicts
secondary structure of less similar fragments
with good performance; these are protein
fragments for which FDM method cannot find
suitable structures
For references & additional details:
http://gor.bb.iastate.edu/cdm/
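
The combination idea can be sketched per residue: trust the FDM call where a sufficiently similar PDB fragment was found, and fall back to GOR V elsewhere. This is only an illustration of the premise, not the Jernigan group's implementation. The per-residue similarity list, the 0.5 threshold and the function name are all assumptions.

```python
def consensus_prediction(fdm_pred, gor_pred, similarity, threshold=0.5):
    """Combine two 3-state predictions residue by residue.

    fdm_pred, gor_pred: H/E/C strings of equal length.
    similarity: hypothetical per-residue score (0..1) of the best
        sequence-similar fragment FDM found for that position.
    """
    if not (len(fdm_pred) == len(gor_pred) == len(similarity)):
        raise ValueError("all inputs must have the same length")
    return "".join(
        f if s >= threshold else g
        for f, g, s in zip(fdm_pred, gor_pred, similarity)
    )


if __name__ == "__main__":
    print(consensus_prediction("HHHH", "EECC", [0.9, 0.2, 0.6, 0.1]))
```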

7
Secondary Structure Prediction for Different
Types of Proteins/Domains

For Complete proteins:
Globular Proteins - use methods previously
described
Transmembrane (TM) Proteins - use special
methods
(next slides)
For Structural Domains: many under development
Coiled-Coil Domains (Protein interaction
domains)
Zinc Finger Domains (DNA binding domains),
others

8
SS Prediction for Transmembrane Proteins

Transmembrane (TM) Proteins
Only a few in the PDB - but ~30% of cellular
proteins are membrane-associated!
Hard to determine experimentally, so prediction
important
TM domains are relatively 'easy' to predict!
Why? constraints due to hydrophobic environment
2 main classes of TM proteins
α-helical
β-barrel

9
SS Prediction for TM α-Helices

α-Helical TM domains
Helices are 17-25 amino acids long (span the
membrane)
Predominantly hydrophobic residues
Helices oriented perpendicular to membrane
Orientation can be predicted using "positive
inside" rule
Residues at cytosolic (inside or cytoplasmic)
side of TM helix, near hydrophobic anchor are
more positively charged than those on lumenal
(inside an organelle in eukaryotes) or
periplasmic side (space between inner & outer
membrane in gram-negative bacteria)
Alternating polar & hydrophobic residues provide
clues to interactions among helices within
membrane
Servers?
TMHMM or HMMTOP - ~70% accuracy - confused by
hydrophobic signal peptides (short hydrophobic
sequences that target proteins to the
endoplasmic reticulum, ER)
Phobius - 94% accuracy - uses distinct HMM
models for TM helices &
signal peptide sequences
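
The two cues above (a hydrophobic stretch, plus the "positive inside" rule) can be illustrated with a simple hydropathy scan. This is a much cruder technique than the HMMs used by TMHMM, HMMTOP or Phobius; it is swapped in here only because it fits in a few lines. It uses the Kyte-Doolittle hydropathy scale. The 19-residue window and 1.6 cutoff are commonly quoted choices treated here as assumptions, and the 15-residue flank and function names are made up for the sketch.

```python
# Kyte-Doolittle hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
      "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
      "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
      "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydrophobic_cores(seq, window=19, cutoff=1.6):
    """Return (start, end) runs (end exclusive) of window centers whose
    mean hydropathy is >= cutoff. Runs mark the core of a candidate helix."""
    half = window // 2
    cores, start = [], None
    for c in range(half, len(seq) - half):
        win = seq[c - half: c + half + 1]
        mean = sum(KD.get(r, 0.0) for r in win) / window
        if mean >= cutoff:
            if start is None:
                start = c
        elif start is not None:
            cores.append((start, c))
            start = None
    if start is not None:
        cores.append((start, len(seq) - half))
    return cores


def positive_inside(seq, start, end, flank=15):
    """Guess which flank of a core is cytosolic by counting K and R."""
    n_pos = sum(r in "KR" for r in seq[max(0, start - flank): start])
    c_pos = sum(r in "KR" for r in seq[end: end + flank])
    if n_pos == c_pos:
        return "undetermined"
    return "N-terminal side" if n_pos > c_pos else "C-terminal side"


if __name__ == "__main__":
    # Synthetic test sequence, not a real protein.
    seq = "KRKRKSGSGS" + "L" * 20 + "SGSGSDEDED"
    for start, end in hydrophobic_cores(seq):
        print(start, end, positive_inside(seq, start, end))
```

A scan like this would also flag hydrophobic signal peptides as TM helices, which is the confusion noted for TMHMM and HMMTOP above.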

10
SS Prediction for TM β-Barrels

β-Barrel TM domains
β-strands are amphipathic (partly hydrophobic,
partly hydrophilic)
Strands are 10 - 22 amino acids long
Every 2nd residue is hydrophobic, facing lipid
bilayer
Other residues are hydrophilic, facing "pore" or
opening
Servers? Harder problem, fewer servers
TBBPred - uses NN or SVM (more on these ML
methods later)
Accuracy ?
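
The "every 2nd residue is hydrophobic" pattern can be checked with a simple alternation score for a candidate strand. This is a toy sketch, not what TBBPred does. The hydrophobic residue set and the function name are assumptions made for illustration.

```python
# Assumed set of residues treated as hydrophobic in this sketch.
HYDROPHOBIC = set("AVLIMFW")


def alternation_score(segment):
    """Return |fraction hydrophobic at even positions
    minus fraction hydrophobic at odd positions| (0.0 to 1.0).
    Values near 1.0 suggest one face is lipid-facing and the other is not."""
    even = segment[0::2]
    odd = segment[1::2]
    if not even or not odd:
        return 0.0
    frac_even = sum(r in HYDROPHOBIC for r in even) / len(even)
    frac_odd = sum(r in HYDROPHOBIC for r in odd) / len(odd)
    return abs(frac_even - frac_odd)


if __name__ == "__main__":
    print(alternation_score("VSLTVNIQ"))  # alternating pattern
    print(alternation_score("VVLLIIAA"))  # uniformly hydrophobic
```

Sliding this over windows of 10 - 22 residues would give candidate strands, though a score this simple cannot separate barrel strands from other amphipathic β-strands.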

11
Chp 15 - Tertiary Structure Prediction

SECTION V STRUCTURAL BIOINFORMATICS
Xiong Chp 15
Protein Tertiary Structure Prediction
Methods
Homology Modeling
Threading and Fold Recognition
Ab Initio Protein Structural Prediction
CASP

12
Protein Tertiary Structure Prediction

3 Major Methods
Homology Modeling (easiest!)
Threading and Fold Recognition (harder)
Ab Initio Protein Structural Prediction (really
hard)

13
Comparative Modeling?

Comparative modeling - term is sometimes used
interchangeably with homology modeling, but also
sometimes used to mean both homology modeling
and/or threading/fold recognition

14
Ab Initio Prediction

Develop energy function
bond energy
bond angle energy
dihedral angle energy
van der Waals energy
electrostatic energy
Calculate structure by minimizing energy function
(usually Molecular Dynamics or Monte Carlo
methods)
Ab initio prediction - impractical for most real
(long) proteins
Computationally? very expensive
Accuracy? Usually poor for all except short
peptides
(but much improvement recently!)
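
The energy terms listed above are often combined in a molecular-mechanics form like the one below. The slide does not give functional forms, so this is one common choice shown as an assumption: harmonic bonds and angles, a periodic dihedral term, Lennard-Jones for van der Waals, and Coulomb for electrostatics. All symbols are introduced here for illustration.

```latex
E_{\text{total}} =
    \sum_{\text{bonds}} k_b \,(r - r_0)^2
  + \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2
  + \sum_{\text{dihedrals}} k_\phi \,\bigl[1 + \cos(n\phi - \delta)\bigr]
  + \sum_{i<j} 4\,\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
      - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]
  + \sum_{i<j} \frac{q_i\, q_j}{4\pi\epsilon_0\, r_{ij}}
```

Here r, θ and φ are bond lengths, bond angles and dihedral angles, r_ij is the distance between non-bonded atoms i and j, and q_i are partial charges. The two pairwise sums run over all non-bonded atom pairs, which is a large part of why minimization is so expensive for long proteins.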

Provides both folding pathway & folded structure

15
Comparative Modeling

Two types
1) Homology modeling
2) Threading (fold recognition)
Both rely on availability of experimentally
determined structures that are "homologous" or
at least structurally very similar to target

Provide folded structure only

16
Homology Modeling

Identify homologous protein sequences (PSI-BLAST)
Among available structures (in PDB), choose one
with closest sequence to target as template
(can combine steps 1 & 2 by using PDB-BLAST)
Build model by placing target sequence residues
in corresponding positions of homologous
structure & refine by "tweaking" modeled
structure (energy minimization)
Homology modeling - works "well"
Computationally? "relatively" inexpensive
Accuracy? higher sequence identity → better
model
Requires ~30% sequence identity with sequence for
which structure is known
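
The template-selection step can be sketched as: compute percent identity from each pairwise alignment and keep the best template that clears the ~30% rule of thumb. The alignments are assumed to come from an external tool. The template names, function names, and the choice to count identity over non-gap columns are assumptions (other denominators are also in use).

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def choose_template(alignments, min_identity=30.0):
    """alignments: {template_id: (aligned_target, aligned_template)}.
    Return (template_id, identity) for the best template, or None if
    no template reaches min_identity."""
    best = None
    for template_id, (target, template) in alignments.items():
        identity = percent_identity(target, template)
        if identity >= min_identity and (best is None or identity > best[1]):
            best = (template_id, identity)
    return best


if __name__ == "__main__":
    # Hypothetical alignments for illustration.
    alignments = {
        "tmplA": ("ACDEFG", "ACDEFG"),
        "tmplB": ("ACDEFG", "AXXXXX"),
    }
    print(choose_template(alignments))
```

When `choose_template` returns None, no template is close enough for homology modeling and threading becomes the fallback.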

17
Threading - Fold Recognition

Identify best fit between target sequence &
template structure

Develop energy function
Develop template library
Align target sequence with each template & score
Identify top scoring template (1D to 3D
alignment)
Refine structure as in homology modeling
Threading - works "sometimes"
Computationally? Can be expensive or cheap,
depends on energy function & whether "all atom"
or "backbone only" threading
Accuracy? in theory, should not depend on
sequence identity (should depend on quality of
template library & "luck")
Usually, higher sequence identity to protein of
known structure → better model
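
The threading loop (energy function, template library, score each template, pick the top one) can be sketched with a deliberately tiny model. Each template is reduced to a string of environments, B for buried and E for exposed, and the "energy" rewards hydrophobic residues in buried positions and polar residues in exposed ones. The scoring values, the ungapped placement, the hydrophobic set and the fold names are all hypothetical; real threading programs use far richer potentials and gapped 1D to 3D alignment.

```python
# Assumed set of residues treated as hydrophobic in this toy potential.
HYDROPHOBIC = set("AVLIMFWC")


def pair_energy(residue, env):
    """-1.0 if the residue type suits the environment, +1.0 otherwise."""
    return -1.0 if (residue in HYDROPHOBIC) == (env == "B") else 1.0


def thread_energy(target, template_env):
    """Lowest ungapped energy of target over all offsets on the template.
    Returns None if the target is empty or longer than the template."""
    n = len(target)
    if n == 0 or n > len(template_env):
        return None
    return min(
        sum(pair_energy(r, e) for r, e in zip(target, template_env[o:o + n]))
        for o in range(len(template_env) - n + 1)
    )


def rank_templates(target, library):
    """Return [(energy, template_name), ...] sorted best (lowest) first."""
    scored = []
    for name, env in library.items():
        energy = thread_energy(target, env)
        if energy is not None:
            scored.append((energy, name))
    return sorted(scored)


if __name__ == "__main__":
    library = {"foldA": "BEBE", "foldB": "EBEB", "foldC": "BB"}
    print(rank_templates("LKVE", library))
```

The ranking depends only on the template library and the energy function, not on sequence identity, which is the "in theory" point made above.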