Protein Structure Prediction - PowerPoint PPT Presentation



1
Protein Structure Prediction
  • Shandar Ahmad
  • Kyushu Institute of Technology,
  • Iizuka 820-8502,
  • Fukuoka-ken, Japan
  • shandar_at_bse.kyutech.ac.jp

2
Secondary structure: the basic unit of protein structure
  • Protein secondary structures are stabilized by hydrogen
    bonds between backbone atoms (C=O and N-H groups) of
    the polypeptide chain.
  • Each residue has a choice of hydrogen-bond partners.
  • The pattern of hydrogen-bond pairing determines
    secondary structure.
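DSSP decides whether a backbone C=O and N-H pair is hydrogen-bonded using a simple electrostatic energy. That energy places partial charges of 0.42e on C/O and 0.20e on N/H, and uses a factor of 332. A pair counts as an H-bond when the energy is below -0.5 kcal/mol. The sketch below shows that criterion. It is not DSSP itself: real DSSP also has to place the amide hydrogen, which is assumed here to be given.

```python
import math

# Electrostatic model used by DSSP (Kabsch and Sander, 1983):
# partial charges q1 = 0.42e (C, O) and q2 = 0.20e (N, H), factor f = 332.
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol; energies below this count as an H-bond

Vec = tuple[float, float, float]


def hbond_energy(c: Vec, o: Vec, n: Vec, h: Vec) -> float:
    """Energy (kcal/mol) of the C=O ... H-N interaction between two residues."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c: Vec, o: Vec, n: Vec, h: Vec) -> bool:
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF


# Linear C=O ... H-N arrangement with an O ... H distance of 1.9 Angstrom.
print(round(hbond_energy((0, 0, 0), (1.23, 0, 0), (4.13, 0, 0), (3.13, 0, 0)), 2))  # -2.9
```

The coordinates in the usage line are made up for illustration. Moving the N-H group about 10 Angstrom away brings the energy close to zero, and the pair is no longer counted as an H-bond.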

3
Types of secondary structure
  • Eight types of secondary structure have been
    defined by Kabsch and Sander in DSSP (Dictionary
    of Secondary Structure of Proteins). They are:
  1. Alpha helix (H)
  2. Isolated beta bridge (B)
  3. Extended beta strand (E)
  4. 3-10 helix (G)
  5. Pi-helix (I)
  6. Turn (T)
  7. Bend (S)
  8. Coil (C)
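Prediction methods usually work with three classes (helix, strand, coil) rather than the eight DSSP states. The sketch below shows one common reduction: H, G and I become helix; E and B become strand; everything else becomes coil. Other conventions exist (for example, mapping only H to helix and only E to strand), so treat this mapping as an assumption.

```python
# One common eight-to-three state reduction (an assumption; other mappings are used too).
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10 and pi helices -> helix
    "E": "E", "B": "E",            # extended strand and isolated bridge -> strand
    "T": "C", "S": "C", "C": "C",  # turn, bend and coil -> coil
}


def reduce_to_three_states(dssp: str) -> str:
    """Map a DSSP eight-state string to H/E/C; unknown symbols (e.g. ' ') become C."""
    return "".join(DSSP_TO_3.get(state, "C") for state in dssp)


print(reduce_to_three_states("CHHHHGGGTTEEEBSC"))  # CHHHHHHHCCEEEECC
```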

4
Alpha helix: a hydrogen bond is formed between the C=O of the nth
residue and the N-H of the (n+4)th residue
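The n to n+4 hydrogen-bond pattern is what DSSP uses to assign helices. Two consecutive "4-turns" (bonds from i-1 to i+3 and from i to i+4) mark residues i to i+3 as helix. The sketch below applies that minimal-helix rule to a set of H-bonds that is assumed to be already known. With n = 3 or n = 5 the same rule gives 3-10 (G) or pi (I) helices. Two simplifications are assumed: the function name is illustrative, and overlaps between different helix types are not resolved.

```python
def assign_helix(hbonds: set[tuple[int, int]], length: int, n: int = 4, label: str = "H") -> str:
    """Mark helical residues from backbone H-bonds using DSSP's minimal-helix rule.

    hbonds holds (i, j) pairs meaning the C=O of residue i is bonded to the N-H of residue j.
    An n-turn starts at i when (i, i + n) is present; n-turns at both i - 1 and i make
    residues i .. i + n - 1 helical.
    """
    ss = ["-"] * length
    turn = [(i, i + n) in hbonds for i in range(length)]
    for i in range(1, length):
        if turn[i - 1] and turn[i]:
            for k in range(i, min(i + n, length)):
                ss[k] = label
    return "".join(ss)


print(assign_helix({(0, 4), (1, 5), (2, 6), (3, 7)}, length=10))  # -HHHHHH---
```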
5
Beta strand (E), part of beta ladder
6
Beta strand (E), continued
7
Turn structure (T)
8
Other helices
9
Bend Conformation
  • A bend is caused by interactions with other
    parts of the protein.
  • Proline introduces bends because of its
    conformational constraints.
  • Water molecules can cause bends that maximize
    exposure of backbone C=O groups to water.
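In DSSP, a bend (S) is assigned from geometry alone. Residue i is called a bend when the angle between the C-alpha(i-2) to C-alpha(i) vector and the C-alpha(i) to C-alpha(i+2) vector exceeds 70 degrees. The sketch assumes a plain list of C-alpha coordinates and uses "-" for residues that are not bends.

```python
import math


def bend_angle(ca: list[tuple[float, float, float]], i: int) -> float:
    """Angle (degrees) between CA(i-2)->CA(i) and CA(i)->CA(i+2)."""
    a, b, c = ca[i - 2], ca[i], ca[i + 2]
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - b[k] for k in range(3)]
    cos_angle = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def bends(ca: list[tuple[float, float, float]], threshold: float = 70.0) -> str:
    """'S' where the bend angle exceeds the threshold, '-' elsewhere (ends cannot be bends)."""
    ss = ["-"] * len(ca)
    for i in range(2, len(ca) - 2):
        if bend_angle(ca, i) > threshold:
            ss[i] = "S"
    return "".join(ss)


corner = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0), (7.6, 7.6, 0.0)]
print(bends(corner))  # --S--
```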

10
Some structural domains in proteins

11
Methods to get secondary structure from
experimentally known structures
  • DSSP is the most commonly used program to
    assign secondary structure from the atomic
    coordinates of proteins.
  • DSSP also provides a database of precomputed
    secondary structure assignments, searchable by PDB code.
  • The database and programs can be accessed at
  • http://www.cmbi.kun.nl/gv/dssp/
  • The program can be downloaded for local calculations.
  • PDB files also contain secondary structure (HELIX and
    SHEET records) in their headers, but only in broad detail.
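Once DSSP has been run locally, its output can be read with a few lines of code. This sketch assumes the classic fixed-width DSSP format:
- residue records follow the header line beginning "  #  RESIDUE";
- the residue number is in columns 6-10 and the chain in column 12;
- the amino acid is in column 14 and the secondary structure code in column 17;
- ACC is in columns 35-38.

Check these positions against your DSSP version before relying on them. Chain-break lines ("!") are skipped, and a blank structure code is reported as C.

```python
def parse_dssp(text: str) -> list[tuple[int, str, str, str, int]]:
    """Return (residue number, chain, amino acid, SS code, ACC) for each residue record."""
    records = []
    in_body = False
    for line in text.splitlines():
        if line.startswith("  #  RESIDUE"):
            in_body = True  # residue records start after this header line
            continue
        if not in_body or len(line) < 38 or line[13] == "!":
            continue  # header text, short lines and chain breaks are ignored
        resnum = int(line[5:10])
        chain = line[11]
        aa = line[13]
        ss = line[16] if line[16] != " " else "C"
        acc = int(line[34:38])
        records.append((resnum, chain, aa, ss, acc))
    return records
```

Usage: `parse_dssp(open("1abc.dssp").read())` (the file name is hypothetical) gives one tuple per residue. The ACC values are also a convenient source of observed solvent accessibility.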

12
Prediction of secondary structure
  • Older methods
  • Chou and Fasman method (1974)
  • The Chou-Fasman method of secondary structure
    prediction assigns a set of propensity values to
    each residue and then applies a simple algorithm
    to those numbers.
  • For example, the probability that a turn starts at
    residue j is $p(t) = f(j)\,f(j+1)\,f(j+2)\,f(j+3)$
    (see next table)
  • Online predictions: http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
  • The typical success rate of prediction is about 50%.

13
Name            P(a)  P(b) P(turn)    f(i)  f(i+1)  f(i+2)  f(i+3)
Alanine          142    83      66   0.060   0.076   0.035   0.058
Arginine          98    93      95   0.070   0.106   0.099   0.085
Aspartic Acid    101    54     146   0.147   0.110   0.179   0.081
Asparagine        67    89     156   0.161   0.083   0.191   0.091
Cysteine          70   119     119   0.149   0.050   0.117   0.128
Glutamic Acid    151    37      74   0.056   0.060   0.077   0.064
Glutamine        111   110      98   0.074   0.098   0.037   0.098
Glycine           57    75     156   0.102   0.085   0.190   0.152
Histidine        100    87      95   0.140   0.047   0.093   0.054
Isoleucine       108   160      47   0.043   0.034   0.013   0.056
Leucine          121   130      59   0.061   0.025   0.036   0.070
Lysine           114    74     101   0.055   0.115   0.072   0.095
Methionine       145   105      60   0.068   0.082   0.014   0.055
Phenylalanine    113   138      60   0.059   0.041   0.065   0.065
Proline           57    55     152   0.102   0.301   0.034   0.068
Serine            77    75     143   0.120   0.139   0.125   0.106
Threonine         83   119      96   0.086   0.108   0.065   0.079
Tryptophan       108   137      96   0.077   0.013   0.064   0.167
Tyrosine          69   147     114   0.082   0.065   0.114   0.125
Valine           106   170      50   0.062   0.048   0.028   0.053
Source: http://prowl.rockefeller.edu/aainfo/chou.htm
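The turn part of the Chou-Fasman method can be computed directly from the table. The sketch uses p(t) as defined on the previous slide. The decision rule is the one commonly quoted for Chou-Fasman turns: p(t) > 0.000075, the average P(turn) over the four residues > 100, and that average greater than both the average P(a) and the average P(b). Only a subset of the table is copied in, to keep the sketch short.

```python
# Subset of the table: residue -> (P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3)).
CHOU_FASMAN = {
    "N": (67, 89, 156, 0.161, 0.083, 0.191, 0.091),
    "P": (57, 55, 152, 0.102, 0.301, 0.034, 0.068),
    "D": (101, 54, 146, 0.147, 0.110, 0.179, 0.081),
    "G": (57, 75, 156, 0.102, 0.085, 0.190, 0.152),
    "V": (106, 170, 50, 0.062, 0.048, 0.028, 0.053),
    "I": (108, 160, 47, 0.043, 0.034, 0.013, 0.056),
    "L": (121, 130, 59, 0.061, 0.025, 0.036, 0.070),
}


def turn_probability(tetrapeptide: str) -> float:
    """p(t) = f(j) * f(j+1) * f(j+2) * f(j+3) for a turn starting at residue j."""
    if len(tetrapeptide) != 4:
        raise ValueError("expected exactly four residues")
    p = 1.0
    for k, aa in enumerate(tetrapeptide):
        p *= CHOU_FASMAN[aa][3 + k]  # residue at position j+k uses column f(i+k)
    return p


def predict_turn(tetrapeptide: str) -> bool:
    """Commonly quoted Chou-Fasman turn rule applied to one four-residue window."""
    p_a = sum(CHOU_FASMAN[aa][0] for aa in tetrapeptide) / 4
    p_b = sum(CHOU_FASMAN[aa][1] for aa in tetrapeptide) / 4
    p_t = sum(CHOU_FASMAN[aa][2] for aa in tetrapeptide) / 4
    return turn_probability(tetrapeptide) > 0.75e-4 and p_t > 100 and p_t > p_a and p_t > p_b


print(predict_turn("NPDG"), predict_turn("VIVL"))  # True False
```

To scan a whole sequence, apply `predict_turn` to every window `seq[j:j + 4]`.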
14
Further improvements
  • In 1978, Garnier improved the method by using
    statistically significant pair-wise interactions
    between residues as determinants of secondary
    structure. This improved the success rate to 62%.
  • In 1993, Levin improved the prediction level by using
    multiple sequence alignments.
  • The reasoning is as follows.
  • Conserved regions in a multiple sequence
    alignment provide a strong evolutionary
    indication that they play a role in the function
    of the protein.
  • Such regions are also likely to have conserved
    structure, including secondary structure, so the
    joint propensities of the aligned residues
    strengthen the prediction.
  • This improved the success rate to 69%.
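A simple way to picture "joint propensities" is to average a residue propensity over each column of an alignment. One sequence then no longer decides the value on its own. The sketch does this with the Chou-Fasman P(a) values and skips gaps ("-"). It illustrates the idea only; it is not Levin et al.'s actual procedure.

```python
# Helix propensities P(a) for a few residues, copied from the Chou-Fasman table.
P_ALPHA = {"A": 142, "E": 151, "K": 114, "L": 121, "M": 145}


def column_profile(alignment: list[str], scale: dict[str, float]) -> list[float]:
    """Average a per-residue propensity over each alignment column, ignoring gaps."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        values = [scale[seq[col]] for seq in alignment if seq[col] != "-"]
        profile.append(sum(values) / len(values) if values else float("nan"))
    return profile


print(column_profile(["AEL", "AEK", "MEL"], P_ALPHA)[:2])  # [143.0, 151.0]
```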

15
Neural network based methods
  • In 1988-1989, Qian and Sejnowski, and Holley and
    Karplus, introduced the first neural-network-based
    methods.
  • Sequence information is fed to the neural
    network, and the output is classified as helix,
    beta strand, or other (coil).
  • See next figure

16
17
Encoding the amino acids
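  • Each amino acid residue is coded as a 21-bit binary
    vector (one bit per amino acid type, plus one extra
    bit, typically used for positions beyond the chain ends).
  • To predict the secondary structure of a residue,
    the codes of that residue and its neighbours in a
    sliding window are fed to the neural network.
  • The network is trained and validated on proteins
    of known structure.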
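The input encoding can be sketched as follows. Each window position becomes a 21-bit vector with a single 1: either the amino acid's bit, or bit 20 for positions outside the chain. The vectors for the whole window are concatenated. The window length of 13 is an assumption for illustration, and so is the alphabetical residue order.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (order is an arbitrary choice)
N_BITS = 21                           # 20 residue bits + 1 "outside the chain" bit


def encode_window(seq: str, center: int, window: int = 13) -> list[int]:
    """One-hot encode the residues in a window centred on `center` (window length is assumed)."""
    half = window // 2
    bits = []
    for pos in range(center - half, center + half + 1):
        vec = [0] * N_BITS
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            vec[AMINO_ACIDS.index(seq[pos])] = 1
        else:
            vec[20] = 1  # position beyond the chain end (or an unknown residue)
        bits.extend(vec)
    return bits


x = encode_window("ACDEFG", center=0)
print(len(x), sum(x))  # 273 13
```

Each such vector is one training example for the network. Its target is the observed state (helix, strand or coil) of the centre residue.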

18
Other advanced methods of prediction
  • PHD (PredictProtein)
  • 1994: Rost and Sander combined neural networks
    with multiple sequence alignments. The success
    rate is about 72%.
  • http://www.embl-heidelberg.de/predictprotein/predictprotein.html
  • Jpred (Cuff and Barton)
  • http://www.compbio.dundee.ac.uk/www-jpred/
  • Predator (Frishman D, Argos P)
  • http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_preda.html
  • PSIPRED (DT Jones)
  • http://bioinf.cs.ucl.ac.uk/psipred/

19
Solvent accessibility of amino acid residues
  • This is another important property of amino acids
    in proteins that we want to predict.
  • Solvent accessibility is the surface area of a
    residue that is exposed to water (or another
    solvent).
  • Higher solvent accessibility, or accessible
    surface area (ASA), indicates a greater chance of
    interacting with DNA, ligands, etc., and of being
    in active sites.
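ASA is usually computed by rolling a probe sphere over the atoms. The sketch below uses the Shrake-Rupley method, named here as one standard choice. The slides do not say which algorithm DSSP or other programs use. Points are spread over each atom's sphere, expanded by the probe radius; the atom's ASA is that sphere's area times the fraction of points not inside any neighbouring expanded sphere. The probe radius of 1.4 Angstrom and the 960 points are common but assumed values. A residue's ASA is the sum over its atoms.

```python
import math

PROBE_RADIUS = 1.4  # Angstrom; a common choice for a water probe


def sphere_points(n: int) -> list[tuple[float, float, float]]:
    """Approximately uniform points on the unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * k), r * math.sin(golden_angle * k), z))
    return points


def shrake_rupley(coords, radii, n_points: int = 960) -> list[float]:
    """Accessible surface area (Angstrom^2) of each atom by the Shrake-Rupley method."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, radii)):
        big_r = ri + PROBE_RADIUS
        exposed = 0
        for ux, uy, uz in unit:
            # Candidate probe centre on the expanded sphere around atom i.
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            buried = any(
                j != i and math.dist(p, cj) < rj + PROBE_RADIUS
                for j, (cj, rj) in enumerate(zip(coords, radii))
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas
```

For an isolated atom every point is exposed, so the result is the full sphere area, 4π(r + 1.4)². Bringing a second atom close reduces both atoms' areas.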

20
Accessible surface area or solvent accessibility
21
Relative solvent accessibility
  • The total ASA of a residue is normalised to a
    percentage scale.
  • The scaling differs for each of the 20 amino acid
    types.
  • The ASA of each residue type X in an extended state
    (Gly-X-Gly or Ala-X-Ala) is used for scaling.
  • This relative ASA is sometimes used to say whether a
    residue is exposed or buried, e.g. exposed if its
    relative ASA is more than 25% and buried if it is
    less than 25%.
  • Different people use different threshold values
    (other than 25%).
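The normalisation and the exposed/buried split amount to a division and a comparison. In the sketch, the reference (extended-state) ASA is passed in by the caller. The value of 200 Angstrom² in the usage line is hypothetical, not a published reference value. The slide leaves a value of exactly 25% unclassified; here it is treated as buried.

```python
def relative_asa(asa: float, max_asa: float) -> float:
    """Relative ASA in percent; max_asa is the residue type's extended-state (Gly-X-Gly) ASA."""
    return 100.0 * asa / max_asa


def exposure(rsa: float, threshold: float = 25.0) -> str:
    """Two-state label; a value exactly at the threshold counts as buried here."""
    return "exposed" if rsa > threshold else "buried"


rsa = relative_asa(60.0, max_asa=200.0)  # max_asa value is hypothetical
print(rsa, exposure(rsa))  # 30.0 exposed
```

Changing `threshold` (for example to 16, as PHD uses) reproduces the different conventions mentioned above.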

22
Solvent accessibility prediction methods
  • The PHD server described above also gives ASA
    predictions. It divides residues into buried and
    exposed categories at a 16% threshold and predicts
    one of the two.
  • A real-value prediction method based on a neural
    network was developed by us (Ahmad and Sarai,
    2003). It predicts relative ASA with a mean
    absolute error of about 18% (better than any other
    prediction method available).
  • http://gibk26.bse.kyutech.ac.jp/shandar/netasa/rvp-net/
  • This is the only server that also provides
    graphical output.
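The mean absolute error quoted for real-value ASA prediction averages the absolute difference between observed and predicted relative ASA over all residues. A minimal sketch, assuming both lists are in percent:

```python
def mean_absolute_error(observed: list[float], predicted: list[float]) -> float:
    """Mean of |observed - predicted| over residues (same units as the inputs)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


print(mean_absolute_error([10.0, 20.0, 30.0], [12.0, 18.0, 36.0]))  # 3.3333333333333335
```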

23
A graphical prediction of solvent accessibility
by RVP-Net (Shandar Ahmad and Akinori Sarai, 2003)
24
Measuring prediction accuracy
  • Different accuracy scales are used.
  • Single-residue accuracy, or Q index
  • (Qhelix, Qstrand, Qcoil, Q3) gives the percentage of
    residues predicted correctly as helix, strand or
    coil, or over all three conformational states. The
    Q index is defined as follows.
  • For a single conformational state

$$Q_i = \frac{\text{number of residues correctly predicted in state } i}{\text{number of residues observed in state } i} \times 100$$

  • where i is either helix, strand or coil. Q3 is
    computed the same way over all residues, counting
    every correctly predicted residue.
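The Q index can be computed by comparing observed and predicted three-state strings position by position. For Q_i, only the residues observed in state i are counted. For Q3, all residues are counted. The sketch assumes both strings use the same H/E/C alphabet and have equal length.

```python
from typing import Optional


def q_score(observed: str, predicted: str, state: Optional[str] = None) -> float:
    """Q_i for one state (residues observed in that state), or Q3 when state is None."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    pairs = [(o, p) for o, p in zip(observed, predicted) if state is None or o == state]
    if not pairs:
        return float("nan")  # state never observed
    correct = sum(o == p for o, p in pairs)
    return 100.0 * correct / len(pairs)


obs, pred = "HHHEECCC", "HHCEECCH"
print(q_score(obs, pred), q_score(obs, pred, "E"))  # 75.0 100.0
```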

25
Other scores
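  • Correlation coefficient:

$$R = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \quad S_{xy} = \sum (x - x_o)(y - y_o), \quad S_{xx} = \sum (x - x_o)^2, \quad S_{yy} = \sum (y - y_o)^2$$

  • Subscript o represents the mean value of the
    corresponding variable.
  • Sensitivity = TP / (TP + FN)
  • Specificity = TN / (TN + FP) (T = true, F = false,
    P = positive, N = negative)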
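These scores can be written out directly. R is computed from the S sums above. For sensitivity and specificity of a single state (say helix), "positive" is taken here to mean "in that state". TP, FP, TN and FN are then counted residue by residue from observed and predicted strings. That reading of positive/negative is an assumption for illustration.

```python
import math


def correlation(x: list[float], y: list[float]) -> float:
    """R = Sxy / sqrt(Sxx * Syy), with sums of deviations from the means."""
    xo, yo = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xo) * (b - yo) for a, b in zip(x, y))
    sxx = sum((a - xo) ** 2 for a in x)
    syy = sum((b - yo) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def confusion_counts(observed: str, predicted: str, state: str) -> tuple[int, int, int, int]:
    """(TP, FP, TN, FN) for one state, treating that state as 'positive'."""
    tp = sum(o == state and p == state for o, p in zip(observed, predicted))
    fp = sum(o != state and p == state for o, p in zip(observed, predicted))
    tn = sum(o != state and p != state for o, p in zip(observed, predicted))
    fn = sum(o == state and p != state for o, p in zip(observed, predicted))
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)


tp, fp, tn, fn = confusion_counts("HHHEECCC", "HHCEECCH", "H")
print(tp, fp, tn, fn, specificity(tn, fp))  # 2 1 4 1 0.8
```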

26
Segment overlap SOV score
  • SOV (Segment OVerlap) is a quantity measured for a
    single conformational state i:

$$\mathrm{SOV}(i) = \frac{1}{N(i)} \sum_{S(i)} \frac{\mathrm{MINOV}(S_1,S_2) + \mathrm{DELTA}(S_1,S_2)}{\mathrm{MAXOV}(S_1,S_2)} \times \mathrm{LEN}(S_1)$$

  • where
  • S1 and S2 are the observed and predicted
    secondary structure segments (in state i, which
    can be either H, E or C), and LEN(S1) is the number
    of residues in segment S1.
  • MINOV(S1, S2) is the length of the actual overlap
    of S1 and S2, i.e. the extent over which both
    segments have residues in state i, for example H.
  • MAXOV(S1, S2) is the length of the total extent
    over which either of the segments S1 or S2 has a
    residue in state i.
  • DELTA(S1, S2) is the integer value

$$\mathrm{DELTA}(S_1,S_2) = \min\{\mathrm{MAXOV}(S_1,S_2) - \mathrm{MINOV}(S_1,S_2);\ \mathrm{MINOV}(S_1,S_2);\ \mathrm{INT}(\mathrm{LEN}(S_1)/2);\ \mathrm{INT}(\mathrm{LEN}(S_2)/2)\}$$

  • The sum is taken over S(i), the set of all pairs of
    segments (S1, S2) in which S1 and S2 have at least
    one residue in state i in common. N(i) is the
    number of residues in state i.
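The SOV formula translates almost line by line into code. The sketch follows the definition as given on this slide. It takes N(i) as the number of residues observed in state i and returns a fraction; multiply by 100 for a percentage. The later SOV'99 revision normalises somewhat differently, so results from published SOV tools may not match this sketch exactly.

```python
def segments(ss: str, state: str) -> list[tuple[int, int]]:
    """Inclusive (start, end) indices of maximal runs of `state` in ss."""
    segs, start = [], None
    for i, s in enumerate(ss):
        if s == state and start is None:
            start = i
        elif s != state and start is not None:
            segs.append((start, i - 1))
            start = None
    if start is not None:
        segs.append((start, len(ss) - 1))
    return segs


def sov(observed: str, predicted: str, state: str) -> float:
    """SOV(i) as defined on the slide, with N(i) = residues observed in state i."""
    n_i = observed.count(state)
    if n_i == 0:
        return float("nan")
    total = 0.0
    for s1 in segments(observed, state):
        len1 = s1[1] - s1[0] + 1
        for s2 in segments(predicted, state):
            len2 = s2[1] - s2[0] + 1
            minov = min(s1[1], s2[1]) - max(s1[0], s2[0]) + 1
            if minov <= 0:
                continue  # the pair shares no residue in state i
            maxov = max(s1[1], s2[1]) - min(s1[0], s2[0]) + 1
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
    return total / n_i


print(sov("HHHHHHCC", "CCHHHHHH", "H"))  # 0.875
```

In the usage line, the observed and predicted helices overlap by 4 residues out of a total extent of 8. DELTA adds 3 back, so the shifted prediction scores 0.875 rather than the 4/6 a per-residue count would give.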

27
Higher level predictions of protein structure
  • This includes prediction of
  • Structure class as defined by SCOP or CATH
  • Folds, as defined by SCOP and CATH
  • Complete three dimensional structure.

28
Prediction of protein structure class
  • Some secondary structure prediction servers also
    predict structural class, e.g.
  • http://www.cmpharm.ucsf.edu/jmc/pred2ary/
  • (Chandonia and Karplus)
  • All-alpha and all-beta classes are easier to predict
    than the alpha/beta and alpha+beta structural classes.

29
Protein fold prediction
  • Approaches to fold prediction may be classified
    into two categories:
  • Sequence-to-sequence methods: based on finding
    the best alignments with sequences of known
    structure and predicting the fold from them.
  • Sequence-to-structure methods: the structure is
    encoded as a sequence of residue environments.
    A score is assigned to each residue in its
    environment, and the scores are summed to
    estimate the probability of a given fold.
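The sequence-to-structure idea can be sketched as follows. A template fold is written as a string of environment classes. Each residue placed on an environment gets a compatibility score, and the scores are summed. The environment classes ('b' buried, 'e' exposed) and the score values below are hypothetical, chosen only so that hydrophobic residues prefer buried sites. Only ungapped placements are tried, whereas real threading methods also allow gaps.

```python
# Hypothetical environment classes and scores, for illustration only:
# 'b' = buried, 'e' = exposed; hydrophobic residues score well when buried.
SCORES = {
    ("L", "b"): 2.0, ("L", "e"): -1.0,
    ("V", "b"): 2.0, ("V", "e"): -1.0,
    ("K", "b"): -2.0, ("K", "e"): 1.5,
    ("E", "b"): -2.0, ("E", "e"): 1.5,
}


def threading_score(seq_window: str, environments: str) -> float:
    """Sum of residue-environment scores; unlisted pairs score 0."""
    if len(seq_window) != len(environments):
        raise ValueError("window and environment string must have equal length")
    return sum(SCORES.get((aa, env), 0.0) for aa, env in zip(seq_window, environments))


def best_placement(sequence: str, environments: str) -> tuple[int, float]:
    """Try every ungapped offset of the template along the sequence; return (offset, score)."""
    m = len(environments)
    if len(sequence) < m:
        raise ValueError("sequence is shorter than the template")
    scored = [(threading_score(sequence[o:o + m], environments), o)
              for o in range(len(sequence) - m + 1)]
    score, offset = max(scored)
    return offset, score


print(best_placement("KELVKE", "bbee"))  # (2, 7.0)
```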

30
Best fold predictors
  • CASP is a biennial meeting for evaluating
    structure prediction. The following methods were
    found to be the best in 2002.
  • Krzysztof Ginalski and Leszek Rychlewski
  • Nucleic Acids Research, 2003, Vol. 31, No. 13,
    3291-3292
  • http://BioInfo.PL/Meta
  • This is a meta-server working on the 3D-Jury
    method. It collects predictions from many servers
    and builds a consensus prediction.
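The consensus idea can be illustrated by scoring each candidate model by how similar it is, on average, to all the other models, and picking the highest-scoring one. The sketch is in the spirit of 3D-Jury, but it is not its exact scoring scheme. It assumes the models are already superimposed. Similarity is counted as the number of C-alpha positions within 3.5 Angstrom, and that cutoff is an assumption.

```python
import math

Model = list[tuple[float, float, float]]  # C-alpha coordinates, one per residue


def similarity(a: Model, b: Model, cutoff: float = 3.5) -> int:
    """Number of residues whose CA atoms lie within `cutoff` (models assumed superimposed)."""
    return sum(math.dist(p, q) < cutoff for p, q in zip(a, b))


def consensus_pick(models: list[Model]) -> tuple[int, list[float]]:
    """Index of the model with the highest mean similarity to the others, and all scores."""
    scores = []
    for i, model in enumerate(models):
        others = [similarity(model, other) for j, other in enumerate(models) if j != i]
        scores.append(sum(others) / len(others))
    best = max(range(len(models)), key=lambda k: scores[k])
    return best, scores


# Four toy three-residue models, shifted along y by -3, 0, 3 and 20 Angstrom.
models = [[(3.8 * k, y, 0.0) for k in range(3)] for y in (-3.0, 0.0, 3.0, 20.0)]
print(consensus_pick(models))  # (1, [1.0, 2.0, 1.0, 0.0])
```

The model that agrees with the most other predictions wins, even if no single server ranked it first. This is the main intuition behind consensus meta-servers.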

31
32
Pcons Consensus predictor
  • An earlier meta-server, similar to the Meta server
    but slightly different in how it builds the final
    prediction
  • http://www.sbc.su.se/arne/pmodeller/

33
ROSETTA predictor by Baker
  • This is an ab initio method of structure
    prediction.
  • When no significant alignment to a known structure
    is available, ab initio prediction is the only option.
  • Performs better than all other predictors.
  • No online predictions, but the group website is here:
  • http://depts.washington.edu/bakerpg/highlights1.html

34
Three dimensional structure prediction
  • Methods are based on:
  • Ab initio, molecular dynamics and Monte Carlo
    methods of energy minimization.
  • Comparative modelling using sequence alignments
    with known structures.
  • A combination of the two approaches above.
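Monte Carlo energy minimization repeatedly proposes a small random change to the current conformation. A move that lowers the energy is always accepted. A move that raises it is accepted with probability exp(-dE/T), the Metropolis criterion. The sketch applies this to a toy quadratic energy over a list of coordinates. The energy function, step size, temperature and step count are assumptions; a real structure predictor would use a protein force field and moves such as torsion-angle changes.

```python
import math
import random
from typing import Callable


def monte_carlo_minimize(
    energy: Callable[[list[float]], float],
    x0: list[float],
    steps: int = 5000,
    step_size: float = 0.5,
    temperature: float = 0.1,
    seed: int = 0,
) -> tuple[list[float], float]:
    """Metropolis Monte Carlo search; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    for _ in range(steps):
        trial = [xi + rng.uniform(-step_size, step_size) for xi in x]
        e_trial = energy(trial)
        # Accept downhill moves always and uphill moves with probability exp(-dE / T).
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
    return best_x, best_e


def toy_energy(x: list[float]) -> float:
    # Stand-in for a force field: a single minimum at x = (1, 1, ...).
    return sum((xi - 1.0) ** 2 for xi in x)


best_x, best_e = monte_carlo_minimize(toy_energy, [5.0, 5.0])
```

Raising the temperature lets the search escape local minima more easily, at the cost of slower convergence. Simulated annealing lowers T gradually to get both.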

35
Some prediction methods/ predicted model databases
  • Modeller
  • http//www.salilab.org/modeller/modeller.html
  • SwissModel
  • http//www.expasy.org/swissmod/
  • FAMSBASE
  • http//famsbase.bio.nagoya-u.ac.jp/famsbase/
  • GenTHREADER and PSIPRED
  • http//bioinf.cs.ucl.ac.uk/psipred/
  • UCLA/DOE Fold Server (Includes DASEY, in case no
    alignments are found).
  • http//fold.doe-mbi.ucla.edu/
