Protein Structure Prediction - PowerPoint PPT Presentation


1
Protein Structure Prediction

Shandar Ahmad
Kyushu Institute of Technology,
Iizuka 820 8502,
Fukuoka-ken, Japan
shandar_at_bse.kyutech.ac.jp

2
Secondary structure: the basic unit of protein structure

Protein structures are stabilized by hydrogen bonds between atoms of the amino acid residues; in secondary structure these are bonds between backbone C=O and N-H groups.
Each residue has a choice of hydrogen bond partners.
The pattern of hydrogen bond pairing determines the secondary structure.
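The slide does not say how a program decides that two backbone groups are hydrogen bonded. DSSP (Kabsch and Sander) uses an electrostatic model: partial charges of 0.42e on C and O and 0.20e on N and H, with f = 332, and a bond is assigned when the energy is below -0.5 kcal/mol. The minimal sketch below computes that energy. The function names are hypothetical, and the coordinates in the usage line are made-up values for a straight C=O...H-N arrangement, not real protein data.

```python
import math

# Kabsch & Sander electrostatic H-bond model (as commonly quoted):
# q1 = 0.42e on C and O, q2 = 0.20e on N and H, f = 332 kcal*Angstrom/(mol*e^2).
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol; energies below this count as a hydrogen bond

Point = tuple[float, float, float]

def hbond_energy(c: Point, o: Point, n: Point, h: Point) -> float:
    """Electrostatic energy (kcal/mol) between a C=O group and an N-H group."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)

def is_hbond(c: Point, o: Point, n: Point, h: Point) -> bool:
    """True when the pair is hydrogen bonded under the -0.5 kcal/mol cutoff."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF

# Made-up linear geometry: C at 0, O at 1.23, H at 3.13, N at 4.13 Angstrom.
print(hbond_energy((0, 0, 0), (1.23, 0, 0), (4.13, 0, 0), (3.13, 0, 0)))
```

Moving the N-H group about 10 Angstrom further away drives the energy towards zero, so `is_hbond` returns False.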

3
Types of secondary structure

Eight types of secondary structure have been defined by Kabsch and Sander in DSSP (Dictionary of Secondary Structure of Proteins). They are:
1. Alpha helix (H)
2. Isolated beta bridge (B)
3. Extended beta strand (E)
4. 3-10 helix (G)
5. Pi-helix (I)
6. Turn (T)
7. Bend (S)
8. Coil (C)
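Most prediction methods (and the Q3 and SOV scores used later) work with three classes rather than the eight DSSP states listed above. The sketch below shows one common reduction: H, G and I become helix, E and B become strand, everything else becomes coil. This mapping is an assumption; other reduction schemes exist (for example mapping only H to helix and only E to strand), and reported accuracies depend on the choice.

```python
# One common reduction of the eight DSSP states to three classes.
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # alpha, 3-10 and pi helices -> helix
    "E": "E", "B": "E",             # extended strand and isolated bridge -> strand
    "T": "C", "S": "C", "C": "C",   # turn, bend, coil -> coil
    " ": "C", "-": "C",             # blank/dash are also used for coil in DSSP output
}

def reduce_to_three_states(dssp: str) -> str:
    """Map a string of DSSP codes to H/E/C; unknown codes are treated as coil."""
    return "".join(DSSP_TO_3.get(code, "C") for code in dssp)

print(reduce_to_three_states("CCHHHHGGGTTEEEEBSC"))  # CCHHHHHHHCCEEEEECC
```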

4
Alpha helix: a hydrogen bond is formed between the C=O of residue n and the N-H of residue n+4
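The i to i+4 bonding pattern is what a DSSP-style program looks for when it assigns H. A simplified sketch, assuming the hydrogen bonds are already known: a "4-turn" starts at residue i when the C=O of i bonds to the N-H of i+4, and two consecutive 4-turns at i-1 and i mark residues i to i+3 as helix. The function name and the 12-residue example are hypothetical; real DSSP also handles 3-10 and pi helices and other priorities that are omitted here.

```python
def assign_alpha_helix(n_residues: int, hbonds: set[tuple[int, int]]) -> str:
    """Simplified DSSP-style alpha-helix assignment.

    hbonds holds (i, j) pairs meaning "C=O of residue i bonds to N-H of residue j".
    A 4-turn starts at i when (i, i + 4) is bonded; two consecutive 4-turns
    at i - 1 and i mark residues i .. i + 3 as helix (H).
    """
    turn = [(i, i + 4) in hbonds for i in range(n_residues)]
    ss = ["-"] * n_residues
    for i in range(1, n_residues):
        if turn[i - 1] and turn[i]:
            for k in range(i, min(i + 4, n_residues)):
                ss[k] = "H"
    return "".join(ss)

# Hypothetical 12-residue chain with backbone bonds 1->5, 2->6, 3->7, 4->8.
print(assign_alpha_helix(12, {(1, 5), (2, 6), (3, 7), (4, 8)}))  # --HHHHHH----
```

A single isolated i to i+4 bond gives only a turn, not a helix, which is why two consecutive turns are required.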
5
Beta strand (E), part of beta ladder
6
Beta strand (E) cont..
7
Turn structure (T)
8
Other helices
9
Bend Conformation

A bend is caused by interactions with other parts of the protein.
Proline introduces bends because of its conformational constraints.
Water molecules cause bends that maximize the exposure of C=O groups to water.
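The slide explains what causes bends. DSSP itself assigns the bend state (S) from geometry alone: residue i is a bend when the angle between the vectors CA(i-2)->CA(i) and CA(i)->CA(i+2) exceeds 70 degrees. A minimal sketch of that test; the function names and the coordinates in the usage line are illustrative only.

```python
import math

Point = tuple[float, float, float]

def bend_angle(ca_prev2: Point, ca_i: Point, ca_next2: Point) -> float:
    """Angle (degrees) between the vectors CA(i-2)->CA(i) and CA(i)->CA(i+2)."""
    u = [b - a for a, b in zip(ca_prev2, ca_i)]
    v = [b - a for a, b in zip(ca_i, ca_next2)]
    dot = sum(x * y for x, y in zip(u, v))
    cos_t = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to protect acos from rounding error.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def is_bend(ca_prev2: Point, ca_i: Point, ca_next2: Point,
            threshold: float = 70.0) -> bool:
    """DSSP-style bend test: the chain direction changes by more than `threshold`."""
    return bend_angle(ca_prev2, ca_i, ca_next2) > threshold

print(is_bend((0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0)))  # True (90 degrees)
```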

10
Some structural domains in proteins

Movies data

11
Methods to get secondary structure from
experimentally known structures

DSSP is the most commonly used program to calculate the secondary structure of proteins from their atomic coordinates.
DSSP also provides a database from which secondary structure can be retrieved by searching with PDB codes.
The database and programs can be accessed at http://www.cmbi.kun.nl/gv/dssp/
The program can be downloaded for local calculations.
PDB files also list secondary structure in their headers (HELIX and SHEET records), but only in broad detail.
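For local calculations, the classic DSSP output is a fixed-column text file. A minimal parser sketch, assuming the classic layout: residue records follow a header line beginning "  #  RESIDUE", with the residue number in columns 6-10, the chain in column 12, the amino acid in column 14 and the secondary-structure code in column 17 (blank meaning coil). Newer DSSP releases can also write mmCIF, which this sketch does not handle. The function name `parse_dssp` and the sample lines are hypothetical.

```python
def parse_dssp(text: str) -> list[tuple[str, int, str, str]]:
    """Return (chain, residue number, amino acid, SS code) from classic DSSP output.

    Assumes the classic fixed-column layout; a blank SS column is reported as "C".
    """
    records = []
    in_body = False
    for line in text.splitlines():
        if line.startswith("  #  RESIDUE"):
            in_body = True  # residue records start after this header
            continue
        if not in_body or len(line) < 17:
            continue
        if line[13] == "!":  # chain-break marker line, no residue here
            continue
        ss = line[16] if line[16] != " " else "C"
        records.append((line[11], int(line[5:10]), line[13], ss))
    return records

# Synthetic two-residue sample built with the assumed column positions.
sample = "\n".join([
    "  #  RESIDUE AA STRUCTURE BP1 BP2  ACC",
    f"{1:5d}{10:5d} A M  H",
    f"{2:5d}{11:5d} A K   ",
])
print(parse_dssp(sample))  # [('A', 10, 'M', 'H'), ('A', 11, 'K', 'C')]
```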

12
Prediction of secondary structure

Older methods
Chou and Fasman method (1974)
The Chou-Fasman method of secondary structure prediction assigns a set of prediction values (propensities) to each residue and then applies a simple algorithm to those numbers.
For example, the probability of a turn at the tetrapeptide starting at residue j is
$$p(t) = f(j)\,f(j+1)\,f(j+2)\,f(j+3)$$
where f(j+k) is the f(i+k) value of the residue at position j+k (see next table).
Online predictions: http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
The typical success rate of prediction is about 50%.

13
Name           P(a)  P(b)  P(turn)  f(i)    f(i+1)  f(i+2)  f(i+3)
Alanine        142   83    66       0.06    0.076   0.035   0.058
Arginine       98    93    95       0.070   0.106   0.099   0.085
Aspartic Acid  101   54    146      0.147   0.110   0.179   0.081
Asparagine     67    89    156      0.161   0.083   0.191   0.091
Cysteine       70    119   119      0.149   0.050   0.117   0.128
Glutamic Acid  151   37    74       0.056   0.060   0.077   0.064
Glutamine      111   110   98       0.074   0.098   0.037   0.098
Glycine        57    75    156      0.102   0.085   0.190   0.152
Histidine      100   87    95       0.140   0.047   0.093   0.054
Isoleucine     108   160   47       0.043   0.034   0.013   0.056
Leucine        121   130   59       0.061   0.025   0.036   0.070
Lysine         114   74    101      0.055   0.115   0.072   0.095
Methionine     145   105   60       0.068   0.082   0.014   0.055
Phenylalanine  113   138   60       0.059   0.041   0.065   0.065
Proline        57    55    152      0.102   0.301   0.034   0.068
Serine         77    75    143      0.120   0.139   0.125   0.106
Threonine      83    119   96       0.086   0.108   0.065   0.079
Tryptophan     108   137   96       0.077   0.013   0.064   0.167
Tyrosine       69    147   114      0.082   0.065   0.114   0.125
Valine         106   170   50       0.062   0.048   0.028   0.053
Source: http://prowl.rockefeller.edu/aainfo/chou.htm
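A minimal sketch of the turn test using values copied from the Chou-Fasman table above (only six residues are included to keep it short). It computes p(t) from the positional f columns and applies the turn rule as it is usually described: p(t) above 0.000075, mean P(turn) over the tetrapeptide above 100, and mean P(turn) greater than both mean P(a) and mean P(b). The cutoff and rule are stated from the usual description of the method rather than from this slide, and the function names are hypothetical.

```python
from math import prod

# (P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3)) from the Chou-Fasman table.
CF = {
    "N": (67, 89, 156, 0.161, 0.083, 0.191, 0.091),   # Asparagine
    "P": (57, 55, 152, 0.102, 0.301, 0.034, 0.068),   # Proline
    "G": (57, 75, 156, 0.102, 0.085, 0.190, 0.152),   # Glycine
    "S": (77, 75, 143, 0.120, 0.139, 0.125, 0.106),   # Serine
    "A": (142, 83, 66, 0.06, 0.076, 0.035, 0.058),    # Alanine
    "L": (121, 130, 59, 0.061, 0.025, 0.036, 0.070),  # Leucine
}

def turn_probability(tetrapeptide: str) -> float:
    """p(t) = f(j) * f(j+1) * f(j+2) * f(j+3): residue k uses column f(i+k)."""
    return prod(CF[aa][3 + k] for k, aa in enumerate(tetrapeptide))

def is_turn(tetrapeptide: str, p_cutoff: float = 7.5e-5) -> bool:
    """Turn if p(t) > cutoff, mean P(turn) > 100, and mean P(turn) > mean P(a), P(b)."""
    if len(tetrapeptide) != 4:
        raise ValueError("expected exactly four residues")

    def mean(col: int) -> float:
        return sum(CF[aa][col] for aa in tetrapeptide) / 4

    p_a, p_b, p_turn = mean(0), mean(1), mean(2)
    return (turn_probability(tetrapeptide) > p_cutoff
            and p_turn > 100 and p_turn > p_a and p_turn > p_b)

print(is_turn("NPGS"), is_turn("ALLA"))  # True False
```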
14
Further improvements

1978: Garnier improved the method by using statistically significant interactions between the residue being predicted and its neighbours.
This improved the success rate to 62%.
1993: Levin improved the prediction level by using multiple sequence alignments.
The reasoning is as follows.
Conserved regions in a multiple sequence alignment provide a strong evolutionary indicator of a role in the function of the protein.
Those regions are also likely to have conserved structure, including secondary structure, so the joint propensities of the aligned residues strengthen the prediction.
This improved the success rate to 69%.
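To make "joint propensities" concrete, the sketch below averages the Chou-Fasman helix propensity P(a) over every residue in each alignment column instead of using the query residue alone, skipping gaps. This is an illustrative averaging scheme under that assumption, not Levin's exact algorithm; the alignment and function name are hypothetical, and the P(a) values are taken from the table above.

```python
# P(a) helix propensities for a few residues, from the Chou-Fasman table.
P_ALPHA = {"A": 142, "E": 151, "L": 121, "M": 145, "K": 114,
           "G": 57, "P": 57, "S": 77, "N": 67, "V": 106}

def column_helix_propensity(alignment: list[str]) -> list[float]:
    """Average P(a) per alignment column, ignoring gap characters ('-')."""
    scores = []
    for column in zip(*alignment):
        values = [P_ALPHA[aa] for aa in column if aa != "-"]
        scores.append(sum(values) / len(values) if values else 0.0)
    return scores

# Hypothetical alignment of three homologous fragments.
print(column_helix_propensity(["AELK", "AEMK", "SE-K"]))
```

Column 3 averages L and M only, because the third sequence has a gap there.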

15
Neural network based methods

Qian and Sejnowski (1988) and Holley and Karplus (1989) introduced the first neural network based methods.
Sequence information is fed to the neural network, and the output is classified as helix, beta strand, or other secondary structure.
See next figure

16
(No Transcript)
17
Encoding the amino acids

Amino acid residues are coded as 21-bit binary vectors (one bit for each of the 20 amino acids, plus one bit for window positions beyond the ends of the chain).
To predict the secondary structure of a residue, the codes for that residue and its neighbours are sent to the neural network.
The network is trained and validated on proteins of known structure.
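A minimal sketch of this encoding: each residue in a sliding window becomes a 21-bit one-hot vector, and the window vectors are concatenated into the network input. The default half-width of 6 (a 13-residue window) is an assumption chosen for illustration, and the function names are hypothetical. Non-standard residue letters raise an error here; a real encoder would need a policy for them.

```python
from typing import Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # bits 0-19; bit 20 marks "outside the chain"

def encode_residue(aa: Optional[str]) -> list[int]:
    """21-bit one-hot vector; None (beyond a chain end) sets the last bit."""
    bits = [0] * 21
    bits[AMINO_ACIDS.index(aa) if aa is not None else 20] = 1
    return bits

def encode_window(sequence: str, center: int, half_width: int = 6) -> list[int]:
    """Concatenate 21-bit codes for residues center-half_width .. center+half_width."""
    vector = []
    for pos in range(center - half_width, center + half_width + 1):
        aa = sequence[pos] if 0 <= pos < len(sequence) else None
        vector.extend(encode_residue(aa))
    return vector

x = encode_window("MKTAYIAKQR", center=0)  # 13 positions x 21 bits = 273 inputs
print(len(x))  # 273
```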

18
Other advanced methods of prediction

PHD (PredictProtein)
1994: Rost and Sander combined neural networks with multiple sequence alignments. The success rate is 72%.
http://www.embl-heidelberg.de/predictprotein/predictprotein.html
Jpred (Cuff and Barton)
http://www.compbio.dundee.ac.uk/www-jpred/
Predator (Frishman D, Argos P)
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_preda.html
PSIPRED (DT Jones)
http://bioinf.cs.ucl.ac.uk/psipred/

19
Solvent accessibility of amino acid residues

This is another important property of amino acid residues in proteins that we want to predict.
Solvent accessibility is defined as the surface area of a residue that is exposed to water (or any other solvent).
Higher solvent accessibility, or accessible surface area (ASA), indicates a greater chance of interactions with DNA, ligands etc., and of being in active sites.
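The slide defines ASA but not how it is computed from coordinates. One standard approach is the Shrake-Rupley algorithm: inflate each atom's radius by a water probe radius, scatter test points on that sphere, and count the points not buried inside any neighbouring inflated atom. A minimal sketch, assuming a 1.4 Angstrom probe and caller-supplied atomic radii; this is not necessarily the exact method DSSP or RVP-Net uses. A residue's ASA is the sum over its atoms.

```python
import math

PROBE = 1.4  # water probe radius in Angstrom (a common choice, assumed here)

def sphere_points(n: int) -> list[tuple[float, float, float]]:
    """Roughly uniform points on a unit sphere (golden-section spiral)."""
    golden = math.pi * (3 - math.sqrt(5))
    pts = []
    for k in range(n):
        z = 1 - 2 * (k + 0.5) / n
        r = math.sqrt(1 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts

def shrake_rupley(atoms, n_points: int = 960) -> list[float]:
    """Per-atom accessible surface area in Angstrom^2.

    atoms: list of ((x, y, z), radius). A test point on an atom's inflated
    sphere is accessible if it lies outside every other inflated atom.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(atoms):
        big_r = ri + PROBE
        accessible = 0
        for ux, uy, uz in unit:
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            if all(math.dist(p, cj) >= rj + PROBE
                   for j, (cj, rj) in enumerate(atoms) if j != i):
                accessible += 1
        areas.append(4 * math.pi * big_r ** 2 * accessible / n_points)
    return areas

# An isolated atom exposes its full inflated sphere: 4*pi*(1.8 + 1.4)^2.
print(shrake_rupley([((0.0, 0.0, 0.0), 1.8)]))
```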

20
Accessible surface area or solvent accessibility
21
Relative solvent accessibility

The total ASA of an amino acid is normalised to a percentage scale.
The scaling is different for each of the 20 amino acid types: the ASA of the residue in an extended state (Gly-X-Gly or Ala-X-Ala) is used as the reference.
This relative ASA is sometimes used to say whether a residue is exposed or buried. For example, if the relative ASA is more than 25%, the residue may be called exposed, and if it is less than 25% it may be called buried.
Different threshold values (other than 25%) are used by different people.
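The normalisation and two-state labelling above amount to two small functions. In this sketch the reference ASA (the residue's ASA in Gly-X-Gly or Ala-X-Ala) is supplied by the caller, because published reference tables differ; the 200 Angstrom^2 value in the usage line is a hypothetical number for illustration only. The threshold is a parameter, since different groups use different cutoffs.

```python
def relative_asa(asa: float, reference_asa: float) -> float:
    """Relative accessibility in percent: ASA / extended-state ASA * 100."""
    return 100.0 * asa / reference_asa

def exposure(rsa: float, threshold: float = 25.0) -> str:
    """Two-state label; residues above `threshold` percent are called exposed."""
    return "exposed" if rsa > threshold else "buried"

rsa = relative_asa(60.0, 200.0)  # hypothetical reference ASA of 200 A^2
print(rsa, exposure(rsa))        # 30.0 exposed
```

Passing `threshold=16.0` gives the cutoff used by PHD.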

22
Solvent accessibility prediction methods

The PHD server described above also gives ASA predictions. It divides residues into buried and exposed categories at a 16% threshold and predicts the category.
A real-value prediction method based on neural networks was developed by us (Ahmad and Sarai 2003); it makes predictions with a mean absolute error of 18% (better than any other prediction method available).
http://gibk26.bse.kyutech.ac.jp/shandar/netasa/rvp-net/
This is the only server that also provides graphical output.
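The 18% figure quoted above is a mean absolute error over real-valued relative accessibilities, i.e. the average of |observed - predicted| over all residues. A minimal sketch; the function name and the values in the usage line are made up for illustration.

```python
def mean_absolute_error(observed: list[float], predicted: list[float]) -> float:
    """Mean |observed - predicted|, e.g. over relative ASA values in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Made-up relative ASA values (%) for three residues.
print(mean_absolute_error([10.0, 40.0, 75.0], [20.0, 30.0, 60.0]))
```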

23
A graphical prediction of Solvent accessibility
by RVP-Net. Shandar Ahmad and Akinori Sarai, 2003
24
Measuring prediction accuracy

Different scales of prediction are used.
Single-residue accuracy, or the Q index (Qhelix, Qstrand, Qcoil, Q3), gives the percentage of residues predicted correctly as helix, strand or coil, or over all three conformational states. The Q index is defined as follows.
For a single conformational state,
$$Q_i = \frac{\text{number of residues correctly predicted in state } i}{\text{number of residues observed in state } i} \times 100$$
where i is either helix, strand or coil. For Q3, the numerator is the number of residues predicted correctly in any of the three states and the denominator is the total number of residues.
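A minimal sketch of the Q index for three-state (H/E/C) strings of equal length, following the definition above. The function name, the dictionary keys and the example strings are hypothetical.

```python
def q_scores(observed: str, predicted: str) -> dict[str, float]:
    """Per-state Q_i and overall Q3, in percent, for equal-length H/E/C strings."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    scores = {}
    for state in "HEC":
        n_obs = observed.count(state)  # residues observed in this state
        n_correct = sum(o == p == state for o, p in zip(observed, predicted))
        scores["Q" + state] = 100.0 * n_correct / n_obs if n_obs else float("nan")
    matches = sum(o == p for o, p in zip(observed, predicted))
    scores["Q3"] = 100.0 * matches / len(observed)
    return scores

print(q_scores("HHHHCCEEEC", "HHHCCCEECC"))
```

Here 3 of 4 helix residues, 2 of 3 strand residues and all 3 coil residues are correct, so QH = 75, QE is about 66.7, QC = 100 and Q3 = 80.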

25
Other scores

$$R = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \quad S_{xy} = \sum (x - x_o)(y - y_o), \quad S_{xx} = \sum (x - x_o)^2, \quad S_{yy} = \sum (y - y_o)^2$$
Subscript o represents the mean value of the corresponding variable.
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP) (T = true, F = false, P = positive, N = negative)
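The three scores above translate directly into code. A minimal sketch, with hypothetical function names, computing the correlation coefficient R from the S sums (x_o and y_o are the means) and sensitivity and specificity from confusion-matrix counts.

```python
import math

def correlation(x: list[float], y: list[float]) -> float:
    """R = Sxy / sqrt(Sxx * Syy), with x_o and y_o the means of x and y."""
    xo = sum(x) / len(x)
    yo = sum(y) / len(y)
    sxy = sum((a - xo) * (b - yo) for a, b in zip(x, y))
    sxx = sum((a - xo) ** 2 for a in x)
    syy = sum((b - yo) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positives that are predicted positive."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives that are predicted negative."""
    return tn / (tn + fp)

print(correlation([1, 2, 3], [2, 4, 6]), sensitivity(8, 2), specificity(90, 10))
```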

26
Segment overlap (SOV) score

SOV (Segment OVerlap) is a quantity measured for a single conformational state i:
$$\mathrm{SOV}(i) = \frac{1}{N(i)} \sum_{S(i)} \frac{\mathrm{MINOV}(S_1,S_2) + \mathrm{DELTA}(S_1,S_2)}{\mathrm{MAXOV}(S_1,S_2)} \times \mathrm{LEN}(S_1)$$
where
S1 and S2 are the observed and predicted secondary structure segments in state i, which can be either H, E or C;
LEN(S1) is the number of residues in segment S1;
MINOV(S1,S2) is the length of the actual overlap of S1 and S2, i.e. the extent over which both segments have residues in state i (for example H);
MAXOV(S1,S2) is the total extent over which either of the segments S1 or S2 has a residue in state i;
DELTA(S1,S2) is the integer value defined as
$$\mathrm{DELTA}(S_1,S_2) = \min\{\mathrm{MAXOV}(S_1,S_2) - \mathrm{MINOV}(S_1,S_2);\ \mathrm{MINOV}(S_1,S_2);\ \mathrm{INT}(\mathrm{LEN}(S_1)/2);\ \mathrm{INT}(\mathrm{LEN}(S_2)/2)\}$$
the sum is taken over S(i), all the pairs of segments (S1, S2) in which S1 and S2 have at least one residue in state i in common;
N(i) is the number of residues in state i.
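A minimal sketch of SOV(i) that follows the definitions on this slide literally, with N(i) taken as the number of observed residues in state i. Later revisions of SOV normalise differently, so values from this sketch may not match other SOV programs. The result is a fraction; multiply by 100 for a percentage. Function names and example strings are hypothetical.

```python
def segments(ss: str, state: str) -> list[tuple[int, int]]:
    """(start, end) index pairs, inclusive, of maximal runs of `state`."""
    segs, start = [], None
    for k, s in enumerate(ss + "\0"):  # sentinel character closes a final run
        if s == state and start is None:
            start = k
        elif s != state and start is not None:
            segs.append((start, k - 1))
            start = None
    return segs

def sov(observed: str, predicted: str, state: str) -> float:
    """SOV(i) as defined on the slide, returned as a fraction."""
    n_i = observed.count(state)
    if n_i == 0:
        return float("nan")
    total = 0.0
    for s1 in segments(observed, state):
        len1 = s1[1] - s1[0] + 1
        for s2 in segments(predicted, state):
            minov = min(s1[1], s2[1]) - max(s1[0], s2[0]) + 1
            if minov <= 0:
                continue  # no residue in common: pair not in S(i)
            maxov = max(s1[1], s2[1]) - min(s1[0], s2[0]) + 1
            len2 = s2[1] - s2[0] + 1
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
    return total / n_i

print(sov("CHHHHHHC", "CHHHCCCC", "H"))
```

In the example the observed helix has 6 residues and the predicted one 3, so MINOV = 3, MAXOV = 6 and DELTA = 1, giving (3 + 1) / 6 x 6 / 6, about 0.67.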

27
Higher level predictions of protein structure

This includes prediction of
Structure class as defined by SCOP or CATH
Folds, as defined by SCOP and CATH
Complete three-dimensional structure.

28
Prediction of protein structure class

Some secondary structure prediction servers also predict the class, e.g.
http://www.cmpharm.ucsf.edu/jmc/pred2ary/ (Chandonia and Karplus)
All-alpha and all-beta classes are easier to predict than the alpha/beta and alpha+beta structural classes.

29
Protein fold prediction

Approaches to fold prediction may be classified into two categories:
Sequence-to-sequence methods: the fold is predicted from the best alignments with sequences of known structure.
Sequence-to-structure methods: the structure is encoded as a sequence of residue environments. Each residue is scored for how well it fits its environment, and the scores are summed to estimate the probability that the sequence adopts a given fold.
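A toy version of the sequence-to-structure idea: a fold is represented as a string of residue environments, each amino acid gets a score for each environment, and the scores are summed for each ungapped placement of the sequence on the fold. The environment classes and all score values below are invented for illustration; real methods use more environment classes, scores derived from known structures, and gapped alignments. Function names are hypothetical.

```python
# Hypothetical residue-environment scores (higher = better fit).
ENV_SCORES = {
    "buried":  {"L": 1.0, "I": 1.0, "V": 0.8, "F": 0.9, "K": -1.0, "E": -1.0, "S": -0.2},
    "exposed": {"L": -0.5, "I": -0.5, "V": -0.3, "F": -0.4, "K": 0.8, "E": 0.8, "S": 0.3},
}

def fold_score(sequence: str, environments: list[str]) -> float:
    """Sum of per-residue environment scores; unknown residues score 0."""
    return sum(ENV_SCORES[env].get(aa, 0.0) for aa, env in zip(sequence, environments))

def best_placement(sequence: str, environments: list[str]) -> tuple[int, float]:
    """Slide the fold's environment string along the sequence; return (offset, score)."""
    best = (0, float("-inf"))
    for offset in range(len(sequence) - len(environments) + 1):
        s = fold_score(sequence[offset:offset + len(environments)], environments)
        if s > best[1]:
            best = (offset, s)
    return best

env = ["buried", "buried", "exposed", "exposed"]
print(best_placement("KELIKE", env))
```

The best placement puts the hydrophobic L and I in the buried positions and K and E in the exposed ones.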

30
Best fold predictors

CASP is a biennial meeting for evaluating structure prediction methods. The following methods were found to be the best in 2002 (CASP5).
Krzysztof Ginalski and Leszek Rychlewski, Nucleic Acids Research, 2003, Vol. 31, No. 13, 3291-3292
http://BioInfo.PL/Meta
This is a meta server based on the 3D-Jury method. It collects predictions from many servers and builds a consensus prediction from them.
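A minimal sketch of the consensus idea only: given pairwise similarity scores between models returned by different servers, pick the model that agrees most with all the others. How the similarity between two 3D models is computed (3D-Jury uses a structural comparison of the models) is assumed to happen elsewhere; the score table, server names and function name below are hypothetical.

```python
def consensus_choice(models: list[str], similarity: dict[tuple[str, str], float]) -> str:
    """Return the model whose summed similarity to all other models is largest.

    `similarity` maps model pairs (in either order) to a score; missing pairs count as 0.
    """
    def total(a: str) -> float:
        return sum(similarity.get((a, b), similarity.get((b, a), 0.0))
                   for b in models if b != a)
    return max(models, key=total)

# Hypothetical similarity scores between models from three servers.
scores = {("server1", "server2"): 50.0, ("server1", "server3"): 45.0,
          ("server2", "server3"): 10.0}
print(consensus_choice(["server1", "server2", "server3"], scores))  # server1
```

A model that many independent servers converge on is more likely to be correct than an outlier, which is the reasoning behind consensus servers.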

31
(No Transcript)
32
Pcons Consensus predictor

An earlier consensus predictor, similar to the Meta server but with slight differences in how the final prediction is built.
http://www.sbc.su.se/arne/pmodeller/

33
ROSETTA predictor by Baker

This is an ab initio method of structure prediction.
When no significant alignment to a known structure is available, this is the only way to predict the structure.
It performs better than all other predictors.
There are no online predictions, but the group website is here:
http://depts.washington.edu/bakerpg/highlights1.html

34
Three-dimensional structure prediction