6B -1 - PowerPoint PPT Presentation

Title:

6B -1

Provided by: cbyang

Transcript and Presenter's Notes

1
The Prediction of Protein Structures
2
Amino Acids
  • There are 20 kinds of amino acids.
3
Amino Acids

4
Protein

5
Primary Structure of Protein
  • Primary structure: the linear sequence of amino
    acids

6
Secondary Structure of Protein
  • Secondary structure
  • α-helix
  • β-sheet
  • loop

7
Tertiary Structure of Protein

8
Quaternary Structure of Protein

9
Source: http://elearning.bioinfo.ntu.edu.tw/
10
Source: http://elearning.bioinfo.ntu.edu.tw/
11
Relation between Structures
  • Sequence → structure → function

12
Reason for Prediction
  • Why do we need protein structure prediction?
  • Biological techniques
  • X-ray crystallography
  • Nuclear magnetic resonance (NMR)
  • Expensive, time-consuming, and limited to small or
    medium proteins (up to about 700 residues)
  • → Computational strategies

13
Prediction Competition
  • Advance the methods of identifying protein
    structure from sequence
  • CASP (Critical Assessment of Techniques for
    Protein Structure Prediction)
  • http://predictioncenter.org
  • Every 2 years (1994 to now)
  • CASP6 (Gaeta, Italy, Dec. 2004)
  • CASP7 (Pacific Grove, USA, Nov. 2006)

14
(No Transcript)
15
Accuracy Measurement
  • RMSD (Root Mean Square Deviation)
  • Distance RMSD
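
A minimal sketch of the two measures named above, assuming the standard textbook definitions (the slide's own formulas are not reproduced here). The function names are illustrative, and `coordinate_rmsd` assumes the two structures have already been superimposed.

```python
import math
from itertools import combinations


def coordinate_rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def distance_rmsd(a, b):
    """Distance RMSD: compares intra-structure pairwise distances,
    so no superposition is required."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two coordinate lists of equal length >= 2")
    pairs = list(combinations(range(len(a)), 2))
    total = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs
    )
    return math.sqrt(total / len(pairs))
```

A rigid translation changes the coordinate RMSD but leaves the distance RMSD at zero, which is why the latter is convenient when no superposition is available.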
16
Prediction of Protein Structures
  • Ab initio methods
  • Thermodynamics
  • No reference to other known structures
  • Homology modeling
  • Knowledge-based modeling
  • Sequence similarity
  • More accurate

17
Previous Works
  • PHDthreader (http://www.embl-heidelberg.de/predictprotein)
  • < 30% of the predicted first hits are true remote
    homologues
  • Ab initio method
  • SWISS-MODEL (http://expasy.hcuge.ch/swissmod/SWISS-MODEL.html)
  • An automated knowledge-based protein modeling
    server
  • InsightII (http://www.accelrys.com/products/insight/index.html) (paid software)
  • Protein structure prediction
  • Paircoil (http://ostrich.lcs.mit.edu/cgi-bin/score)
  • Prediction of coiled-coil regions
  • List of other methods or programs
  • http://restools.sdsc.edu/biotools/biotools9.html

18
Properties of Ab Initio Methods
  • Score functions
  • HMM (Hidden Markov Model)
  • Electrostatics, van der Waals (VdW) forces, H-bonds, and
    others
  • Hydrophobic and hydrophilic
  • → Protein folding problem

19
Homology Modeling
  • General assumption
  • Small changes in a protein sequence cause only
    small changes in its structure.
  • Sequence identity > 30%
  • General procedure
  • Database searching and template selection
  • Energy minimization
  • Rationality evaluation

20
General Procedure of Protein Structure Prediction
by Homology Modeling
  • Input: S1 = SSKCSRLKTFPQNACVYHK
  • Output: the backbone conformation model of S1.
  • Step 1: Select a template.
  • S2 = SVYCSSLACSDHN
  • Step 2: Perform sequence alignment.
  • S1: SSKCSRLKTFPQNACVYHK
  • S2: SVYCSSL------ACSDHN
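
As a small illustration of how an alignment like the one above can be scored, the sketch below computes percent identity over the gap-free columns. The function name and the choice of denominator (aligned columns only) are assumptions; other conventions divide by the full alignment length or by the shorter sequence.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns whose residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


s1 = "SSKCSRLKTFPQNACVYHK"
s2 = "SVYCSSL------ACSDHN"
print(round(percent_identity(s1, s2), 1))  # 7 of 13 aligned columns match
```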

21
  • Step 3: Find the structurally conserved regions.
    Copy the coordinates of the structurally conserved
    regions from S2 to S1.

22
(No Transcript)
23
  • Step 4: Apply the folding algorithm to position
    the residues that lack sequence similarity.
  • LKTFPQNA → 10011001

24
  • Step 5
  • - Find proteins of known structure with
    70% or higher sequence similarity.
  • - Construct a B-spline curve segment for
    every four points.
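
The B-spline step can be sketched as follows, assuming a uniform cubic B-spline and a sliding window of four consecutive points (for example, consecutive backbone positions). Both assumptions, and the function names, are illustrative; the slides do not specify the exact spline variant.

```python
def bspline_point(p0, p1, p2, p3, t):
    """Point at parameter t in [0, 1] on a uniform cubic B-spline segment."""
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    b3 = t**3 / 6
    return tuple(
        b0 * a + b1 * b + b2 * c + b3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def bspline_curve(points, samples=5):
    """Sample one segment for every window of four consecutive points."""
    curve = []
    for i in range(len(points) - 3):
        for s in range(samples):
            curve.append(bspline_point(*points[i:i + 4], s / samples))
    return curve
```

The four basis weights sum to 1 for every t, so each sampled point stays inside the convex hull of its four control points and the curve smooths the input rather than interpolating it.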

25
Final Conformation
26
Template Search on Protein Databases
  • PDB (Protein Data Bank)
  • http://www.rcsb.org/pdb/
  • Swiss-Prot
  • http://tw.expasy.org/sprot/
  • Classification
  • CATH (Class, Architecture, Topology and Homologous
    superfamily)
  • http://cathwww.biochem.ucl.ac.uk/latest/
  • SCOP (Structural Classification of Proteins)
  • http://scop.mrc-lmb.cam.ac.uk/scop/index.html

27
(No Transcript)
28
Template Selection Methods (Tools)
  • How to select?
  • Sequence alignment
  • ClustalW, Blastp and others
  • Secondary structure prediction [Al-Lazikani et
    al.]
  • → Structurally conserved blocks

29
PAM250 Score Matrix
30
Blosum62 Matrix
31
Protein Folding Problem
  • Given the primary structure of a protein,
    compute its 3-dimensional structure.
  • The H-P model was proposed by Dill in 1985 [Dill85]
  • Minimizing the total free energy
  • The characteristic of each of the 20 amino acids
  • H (hydrophobic, non-polar) = 1
  • (water-hating)
  • P (hydrophilic, polar) = 0
  • (water-loving)
  • The amino acid sequence of a protein can be
    viewed as a binary sequence of Hs (1s) and Ps
    (0s).
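
A minimal sketch of the binary encoding described above. The hydrophobic set below is an assumption (classifications vary between authors); it was chosen so that it reproduces the LKTFPQNA / 10011001 example given in Step 4 of the homology-modeling procedure.

```python
# Assumed hydrophobic (H) residues; every other letter is treated as polar (P).
HYDROPHOBIC = set("AVLIMFWPC")


def to_hp(sequence):
    """Encode a one-letter amino acid sequence as a string of 1s (H) and 0s (P)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence.upper())


print(to_hp("LKTFPQNA"))  # 10011001
```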

32
Example of H-P Model
  • Input sequence: 011001001110010

[Figure: two lattice foldings of the input sequence, labeled Score 5 and Score 3]
33
Protein Folding on H-P Model
  • Protein folding on the H-P model: given a
    sequence of 1s (Hs) and 0s (Ps), find a
    self-avoiding path embedded in either a 2D or 3D
    lattice such that the number of pairs of adjacent
    1s is maximized.
  • NP-complete even for the 2D lattice [Hart97].
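
The objective above can be made concrete with a small scorer for the 2D square lattice. This is a sketch under two assumptions: a fold is given as a string of moves (U, D, L, R, a hypothetical encoding), and only pairs of 1s that are lattice neighbors but not consecutive in the sequence are counted, which is the usual convention.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_path(moves):
    """Lattice coordinates visited when starting at (0, 0) and following moves."""
    x, y = 0, 0
    path = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def hp_score(sequence, moves):
    """Number of non-consecutive 1-1 contacts of a self-avoiding 2D fold."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    path = fold_path(moves)
    if len(set(path)) != len(path):
        raise ValueError("path is not self-avoiding")
    index_at = {pos: i for i, pos in enumerate(path)}
    score = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "1":
            continue
        # Look right and up only, so each neighboring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "1" and abs(i - j) > 1:
                score += 1
    return score
```

Scoring one fold is cheap; the hardness lies in searching the exponentially many self-avoiding paths for the best one.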

34
U-Fold Algorithm for HP
  • Find a suitable point at which to split the string
    into two substrings.
  • Example: 0100101001110101000010
  • 0100----101001
  • 01000010101--1
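
A simplified sketch of the split-point search: the first substring is laid along the top row and the second runs back underneath it, forming a U. Unlike the layout shown in the example, this version inserts no gaps (the dashes above), so it is an assumption-laden simplification and not the full U-Fold algorithm; the function names are illustrative.

```python
def u_fold_contacts(seq, k):
    """1-1 contacts when seq[:k] is the top row and seq[k:] folds back below it."""
    top, bottom = seq[:k], seq[k:]
    contacts = 0
    # Offset 0 is the turn itself (two consecutive residues), so start at 1.
    for offset in range(1, min(len(top), len(bottom))):
        if top[-1 - offset] == "1" and bottom[offset] == "1":
            contacts += 1
    return contacts


def best_u_fold(seq):
    """Split point that maximizes the gap-free U-fold contact count."""
    return max(range(1, len(seq)), key=lambda k: u_fold_contacts(seq, k))
```

Trying every split point costs O(n^2) comparisons in total, which makes the U-fold a fast heuristic next to an exhaustive search of all lattice paths.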

35
Ant Colony Optimization System
  • The ant colony optimization (ACO) algorithm was
    presented by Dorigo et al. in 1991.

36
General Lattice Model
Square Lattice Model
Triangular Lattice Model
37
Experiments of Different Models
Model  1b1u      1a6n      118l      102l      1b8k
Cubic  12.08891  13.35721  13.01421  13.98656  17.50644
FCC    10.18907  12.09836  12.39913  11.93452  15.06346
FCC: Face-Centered Cubic model
  • Measured by RMSD (Å)
  • Data source: PDB
  • Folding by genetic algorithm

38
Structure Alignment by Curve Fitting
  • B-spline curves

39
Curve Matching
  • Curve matching - measure function

40
  • - Apply the curve alignment.
  • Our score function of the curve alignment

41
Additional Constraints
  • Improvement on the HP model
  • Prediction results are not successful enough
  • Consideration of hydrophobicity is not enough.
  • Other features should also be considered
  • Secondary structure elements (SSEs)
  • α-helix
  • β-sheet
  • Electrostatic attractions
  • Disulfide bonds

42
Electrostatic Attractions and Disulfide Bonds
  • Electrostatic attractions
  • A disulfide bond forms between two cysteines (C)

43
Probabilistic Disulfide Bonds
  • Folding with the constraint of disulfide bonds.

44
Experiments for Disulfide Bonds
  • Experiments of folding with disulfide constraints

45
Secondary Structures
  • Conformations of the α-helix
  • Distance between the i-th amino acid
    and the (i+4)-th amino acid

46
Secondary Structures
  • Conformations of the β-sheet

47
Further Improvement--Sliced Lattice Model
  • The original lattice models do not work well.
  • Slice the lattice into smaller lattices.

48
Sliced Lattice Model
49
Global Folding
50
Experimental Materials
  • Database: PDB (http://www.rcsb.org/pdb/)
  • April 17, 2005
  • 20,380 proteins
  • Data of CASP6 (http://predictioncenter.llnl.gov/)
  • 2004
  • Alignment: Blastp (http://www.ncbi.nlm.nih.gov/)
  • Sequence identity < 90%
  • Blosum-62

51
Experiment Results
  • Target protein: 1LIN (146)

Template Protein  Sequence Similarity  RMSD(03)  RMSD(04)  RMSD(05)
1CFD              100                  7.34      -         -
1TNW              69                   18.72     13.37     10.56
1IQ5              55                   15.15     9.18      7.35
1DTL              52.9                 10.22     7.48      6.17
5PAL              36.4                 12.18     8.43      5.89
Measured by RMSD
52
Experiment Results
  • Target protein: 1QG1 (104)

Template Protein  Sequence Similarity  RMSD(03)  RMSD(04)  RMSD(05)
1JYQ              90.4                 4.15      -         4.24
1JYU              90.4                 13.89     -         10.89
1SHA              46.7                 4.82      4.82      3.65
1SHD              45.2                 8.89      6.77      5.55
5PDR              24.4                 10.55     8.0       6.76
Measured by RMSD
53
Experimental Results of CASP6
  • Compared with [Chen03]

# of proteins                77
# of positive improvement    59
# of negative improvement    12
Average improvement          21.44
Average sequence length      208 (53-435)
Average template identity    36
Average template similarity  21
54
Compared with Palu et al.
  • Palu et al. [Palu04], without template
  • FCC lattice model

55
Compared with Zheng et al.
  • Zheng et al. [Zheng02]
  • Homology
  • Lattice model

56
An Example of Our Results
  • PDB code: 7RSA, length: 124, RMSD: 1.48 Å

Our result
Real structure
57
Protein Structure Prediction System
  • Target protein: 7RSA
  • Step 1: Prepare

58
Protein Structure Prediction System
(http://par.cse.nsysu.edu.tw/main.html)
  • Step 2: Predict

59
Protein Structure Prediction System
  • Step 3 Display result

60
Protein Structure Prediction System
  • Step 3 Display result

61
Protein Structure Prediction System
  • Step 3 Display result

62
Protein Structure Prediction System
  • Step 4 Compare

Our result
Real structure
RMSD
63
Protein Structure Prediction System
  • Step 4 Compare

Our result
Real structure
64
Protein Side Chain Packing
65
Amino Acids Side-chain
  • Elements of protein
  • Three groups

[Figure: lysine (LYS) with its side chain indicated]
66
Protein Structure Prediction
  • Input: 1D sequence
  • Output: 3D structure
  • 3D backbone structure in general
  • Protein structure
  • Backbone structure + side-chain structure

ACE GLY ASP VAL GLU LYS GLY LYS LYS ILE PHE VAL
GLN
67
Backbone and Side Chain
Backbone
Side-chain
Protein SAV1595, Journal of Biomolecular NMR
(2004) 29: 391-394
68
Protein Side Chain Packing Problem
  • PSCPP
  • Given the fixed backbone of the protein
  • For each residue of backbone other than Glycine,
    there is a set of possible rotamers.
  • Problem: choose one suitable rotamer for each
    residue, such that the total energy of the
    protein is minimized.
  • The PSCPP is NP-hard.

69
Graph Model of PSCPP Problem
  • Let $R = \{r_1, r_2, \ldots, r_n\}$ be the set of
    residues of the target protein.
  • Let an undirected graph $G = (V, E)$ represent the
    side chains of a protein.
  • $V_i = \{v_{i,j} \mid v_{i,j}$ does not collide with any
    backbone atom$\}$.
  • Then we have $V = \bigcup_i V_i$ and $E = \{(v_{i,j},
    v_{i+1,k}) \mid v_{i,j}$ does not collide with $v_{i+1,k}\}$.

rotamer
70
Dihedral Angles
  • Side-chain atoms
  • Cα, Cβ, Oγ.
  • Dihedral angles [Iupa70]
  • φ: C(i-1)-N(i)-Cα(i)-C(i)
  • ψ: N(i)-Cα(i)-C(i)-N(i+1)
  • χ1: N(i)-Cα(i)-Cβ(i)-Oγ(i)
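
Each of the angles listed above is the dihedral defined by four consecutive atoms. The sketch below uses the common atan2 formulation, which returns a signed angle in degrees (0 for cis, 180 for trans); the function names are illustrative.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four atoms given as (x, y, z)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # y carries the sign of the rotation about the central bond p1 -> p2.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For χ1, the four arguments would be the N, Cα, Cβ and γ-atom coordinates of one residue, in that order.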

71
The Rotamer Library
  • The accuracy of side-chain prediction depends
    primarily on the quality of the rotamer library.
  • Our rotamer library is a coordinate rotamer
    library, which preserves the bond lengths and bond
    angles that do not appear in the standard rotamer
    library.
  • Our rotamer library is built from 850
    proteins, the same set used for the
    backbone-dependent rotamer library proposed by
    Dunbrack and Karplus [Dunb93].

72
Example of the Rotamer Library
  • A.A., φ, ψ, χ1, Prob., 3-D Coordinate

73
Formulas of ACO for PSCPP
  • Pheromone probability formula
  • Pheromone update formula
  • 0 < ρ < 1, where ρ is the pheromone evaporation rate

74
ACO Prediction for PSCPP
  • Input: backbone coordinate data.
  • Output: the route with near-minimum score.
  • Step 1: Set parameters and initialize pheromone
    trails.
  • Step 2: Each ant k chooses one rotamer u of
    residue i according to the probability function
    p_k(s, u), for all 1 ≤ i ≤ n, u ∈ Vi.
  • Step 3: Update the pheromone trails.
  • Step 4: If the current best solution has not improved
    by some percentage after a predefined number of
    generations, or the number of generations has reached
    the predefined value, return the route with minimum
    score; otherwise, go to Step 2.
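
The loop above can be sketched as a toy ant colony search. Several simplifications are assumptions and not the presenters' method: pheromone is stored per (residue, rotamer) pair instead of per graph edge, there is no heuristic term, the deposit is a fixed 1.0 on the best route so far, `rho=0.1` is an arbitrary evaporation rate, the run stops after a fixed number of generations, and `energy` is a hypothetical user-supplied scoring function.

```python
import random


def aco_select_rotamers(rotamer_counts, energy, ants=50, generations=300,
                        alpha=1.0, rho=0.1, tau0=1.0, seed=0):
    """Pick one rotamer index per residue, minimizing energy(choice)."""
    rng = random.Random(seed)
    # Step 1: one pheromone value per (residue, rotamer) pair.
    tau = [[tau0] * n for n in rotamer_counts]
    best, best_score = None, float("inf")
    for _ in range(generations):
        # Step 2: each ant samples a rotamer for every residue.
        for _ in range(ants):
            choice = [
                rng.choices(range(n), weights=[t ** alpha for t in tau[i]])[0]
                for i, n in enumerate(rotamer_counts)
            ]
            score = energy(choice)
            if score < best_score:
                best, best_score = choice, score
        # Step 3: evaporate, then deposit along the best route found so far.
        for i, row in enumerate(tau):
            for u in range(len(row)):
                row[u] *= 1.0 - rho
            row[best[i]] += 1.0
    return best, best_score
```

A call such as `aco_select_rotamers([3, 3, 3], energy)` expects `energy` to accept a list with one rotamer index per residue and return a number, lower being better.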

75
The Score Function
  • Features in the ACO score function
  • Disulfide bonds
  • $S_1 = BonS \times (\#\text{ of disulfide bonds})$
  • Hydrogen bonds
  • $S_2 = BonH \times (\#\text{ of hydrogen bonds})$
  • Charge-charge interactions
  • $S_3 = BonC \times ((\#\text{ of different-charge pairs}) -
    (\#\text{ of same-charge pairs}))$
  • Van der Waals interactions
  • $S_4 = BonV \times \sum E_{i,j}$
  • Energy score function: $E = S_1 + S_2 + S_3 + S_4$

76
Experiments
  • Two test sets
  • 25 proteins from Xiang and Honig 2001
  • 5 proteins from Canutescu et al. 2003
  • Cutoff value
  • 20° [Xie06] (R3)
  • If χ1 is within 20° of the corresponding angle in
    the real structure, the predicted angle is
    considered correct.
  • Compared with SCWRL 3.0 [Canu03] and R3 [Xie06]
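
The cutoff rule above can be sketched as follows. The function names are illustrative, angles are taken in degrees, and treating a difference of exactly 20 as correct is an assumption. Differences are wrapped so that, for example, -175 and 175 count as 10 degrees apart.

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi1_accuracy(predicted, actual, cutoff=20.0):
    """Percent of residues whose predicted chi1 is within cutoff of the real one."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty angle lists of equal length")
    correct = sum(
        angle_difference(p, a) <= cutoff for p, a in zip(predicted, actual)
    )
    return 100.0 * correct / len(predicted)
```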

77
Parameters in Experiments
  • Parameters used in the ACO algorithm

Parameter          Value
Population         50
Generation         300-600
α                  1.0
β                  1.0
Initial pheromone  1.0

  • Weights of features in the score function

Feature  Value
BonS     0.5S4
BonH     5
BonC     2
BonV     1
78
Experimental Results (First Case)
No.  Protein  Length  Our method χ1 (%)  SCWRL 3.0 χ1 (%)  R3 χ1 (%)
1 1AAC 85 87.1 84.7/95 76.5/86
2 1AHO 54 85.2 68.5/67 64.8/65
3 1B9O 112 70.5 68.8/73 66.1/77
4 1C5E 71 81.7 81.7/86 73.2/82
5 1C9O 53 84.9 66.0/72 71.7/70
6 1CC7 66 80.3 68.2/83 63.6/79
7 1CEX 146 85.6 76.7/82 75.3/77
8 1CKU 60 81.7 76.7/82 68.3/80
Columns 5-6: value under IUPAC-IUB rules / value reported by
Xie and Sahinidis (R3)
79
Experimental Results (First Case)
No.  Protein  Length  Our method χ1 (%)  SCWRL 3.0 χ1 (%)  R3 χ1 (%)
9 1CTJ 61 77.0 68.9/79 70.5/80
10 1CZ9 111 70.3 64.0/73 64.0/76
11 1CZP 83 79.5 77.1/86 73.5/81
12 1D4T 89 77.5 76.4/86 67.4/82
13 1IGD 50 82.0 68.0/74 54.0/68
14 1MFM 118 75.4 68.6/80 70.3/81
15 1PLC 82 72.0 67.1/72 70.7/71
16 1QJ4 221 71.5 72.9/84 67.9/80
17 1QQ4 143 83.9 73.4/78 71.3/78
80
Experimental Results (First Case)
No.  Protein  Length  Our method χ1 (%)  SCWRL 3.0 χ1 (%)  R3 χ1 (%)
18 1QTN 134 86.6 74.6/82 67.9/78
19 1QU9 99 79.8 71.7/81 73.7/78
20 1RCF 142 79.6 83.8/86 81.7/80
21 1VFY 63 79.4 69.8/76 71.4/75
22 2PTH 151 82.1 78.8/83 78.1/84
23 3LZT 105 73.3 78.1/86 69.5/82
24 5P2L 144 78.5 70.8/78 63.2/71
25 7RSA 109 75.2 65.1/75 61.5/67
Columns 5-6: value under IUPAC-IUB rules / value reported by
Xie and Sahinidis (R3)
81
Experimental Results (Second Case)
No.  Protein  Length  Our method χ1 (%)  SCWRL 3.0 χ1 (%)  R3 χ1 (%)
1 1A8I 704 73.4 71.3 / 80 64.1 / 75
2 1B0P 978 70.8 62.3 / 69 - / 66
3 1BU7 399 74.9 70.4 / 78 64.4 / 72
4 1GAI 386 73.6 72.8 / 81 66.6 / 72
5 1XWL 496 71.5 66.7 / 73 61.5 / 72
Columns 5-6: value under IUPAC-IUB rules / value reported by
Xie and Sahinidis (R3)