Title: 6B -1
1The Prediction of Protein Structures
2Amino Acids
- 20 kinds of amino acids
3Amino Acids
4Protein
5Primary Structure of Protein
- Primary structure: the linear sequence of amino acids
6Secondary Structure of Protein
- Secondary structure
- α-helix
- β-sheet
- loop
7Tertiary Structure of Protein
8Quaternary Structure of Protein
9
Source: http://elearning.bioinfo.ntu.edu.tw/
10
Source: http://elearning.bioinfo.ntu.edu.tw/
11Relation between Structures
- Sequence → structure → function
12Reason for Prediction
- Why do we need protein structure prediction?
- Biological techniques
- X-ray crystallography
- Nuclear magnetic resonance (NMR)
- Expensive, time-consuming, and limited to small or medium-sized proteins (up to about 700 residues)
- → Computational strategies
13Prediction Competition
- Advance the methods of identifying protein structure from sequence
- CASP (Critical Assessment of Techniques for Protein Structure Prediction)
- http://predictioncenter.org
- Every 2 years (since 1994)
- CASP6 (Gaeta, Italy, Dec. 2004)
- CASP7 (Pacific Grove, USA, Nov. 2006)
14(No Transcript)
15Accuracy Measurement
- RMSD (Root Mean Square Deviation)
- Distance RMSD
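A minimal sketch of the two measures named on this slide, assuming the usual definitions: coordinate RMSD compares matching atoms of two structures that are already superimposed, while distance RMSD compares the pairwise distances inside each structure and so needs no superposition. The function names and the three-point example are illustrative only.

```python
import math

def coordinate_rmsd(a, b):
    """RMSD over matching points of two structures (assumed already superimposed)."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))

def distance_rmsd(a, b):
    """RMSD over all intra-structure pairwise distances (no superposition needed)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("structures must have equal length of at least 2")
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += diff * diff
            pairs += 1
    return math.sqrt(total / pairs)

# Toy backbone of three points, a distorted model, and a rigidly shifted copy.
native = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
model = [(0, 0, 0), (1, 0, 0), (2, 1, 0)]
shifted = [(x + 5, y + 5, z + 5) for x, y, z in native]

print(coordinate_rmsd(native, model))
print(distance_rmsd(native, shifted))
```

The shifted copy shows the difference between the two measures: its coordinate RMSD is large, but its distance RMSD is zero because the shape is unchanged.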
16Prediction of Protein Structures
- Ab initio methods
- Thermodynamics
- Without reference to other known structures
- Homology modeling
- Knowledge-based modeling
- Sequence similarity
- More accurate
17Previous Works
- PHDthreader (http://www.embl-heidelberg.de/predictprotein)
- < 30% of the predicted first hits are true remote homologues
- Ab initio method
- SWISS-MODEL (http://expasy.hcuge.ch/swissmod/SWISS-MODEL.html)
- An automated knowledge-based protein modeling server
- InsightII (http://www.accelrys.com/products/insight/index.html) (commercial)
- Protein structure prediction
- Paircoil (http://ostrich.lcs.mit.edu/cgi-bin/score)
- Prediction of coiled-coil regions
- List of other methods or programs
- http://restools.sdsc.edu/biotools/biotools9.html
18Properties of Ab Initio Methods
- Score functions
- HMM (Hidden Markov Model)
- Electrostatics, van der Waals (VdW) forces, H-bonds, and others
- Hydrophobic and hydrophilic properties
- → Protein folding problem
19Homology Modeling
- General presumption
- Small changes in the protein sequence cause only small changes in the structure.
- Sequence identity > 30%
- General procedure
- Database searching and template selection
- Energy minimization
- Rationality evaluation
20General Procedure of Protein Structure Prediction
on Homology Model
- Input: S1 = SSKCSRLKTFPQNACVYHK
- Output: the backbone conformation model of S1
- Step 1: Select a template.
- S2 = SVYCSSLACSDHN
- Step 2: Perform sequence alignment.
- S1: SSKCSRLKTFPQNACVYHK
- S2: SVYCSSL------ACSDHN
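The alignment in Step 2 can be summarized with a short sketch that counts identical columns and collects the target residues facing a template gap, which are the residues that cannot simply inherit template coordinates. The function name and return shape are hypothetical, and the alignment strings are the ones shown on the slide (with the stray space removed).

```python
def alignment_summary(aligned_target, aligned_template):
    """Return (matches, aligned_columns, gap_segments) for two aligned strings.

    gap_segments lists the runs of target residues that face a template gap.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    aligned = 0
    gap_segments = []
    current = ""
    for t, s in zip(aligned_target, aligned_template):
        if s == "-":
            if t != "-":
                current += t  # target residue with no template partner
            continue
        if current:
            gap_segments.append(current)
            current = ""
        if t != "-":
            aligned += 1
            if t == s:
                matches += 1
    if current:
        gap_segments.append(current)
    return matches, aligned, gap_segments

s1 = "SSKCSRLKTFPQNACVYHK"
s2 = "SVYCSSL------ACSDHN"
matches, aligned, gaps = alignment_summary(s1, s2)
print(matches, aligned, gaps)  # 7 13 ['KTFPQN']
```

Here 7 of the 13 aligned columns are identical, and the six residues KTFPQN have no template counterpart.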
21- Step 3: Find the structurally conserved regions. Copy the coordinates of the structurally conserved regions from S2 to S1.
22(No Transcript)
23- Step 4: Apply the folding algorithm to position the residues that lack sequence similarity.
- LKTFPQNA → 10011001
24- Step 5:
- Find the proteins of known structure with 70% or higher sequence similarity.
- Construct a B-spline curve segment for every four points.
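A B-spline segment over four points can be sketched as follows. This assumes a uniform cubic B-spline, which is a common choice but is not stated on the slide; the function names and the sampling density are illustrative.

```python
def bspline_point(p0, p1, p2, p3, t):
    """Point on the uniform cubic B-spline segment of four control points, 0 <= t <= 1."""
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    b3 = t**3 / 6
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))

def bspline_curve(points, samples_per_segment=4):
    """Sample one segment for every window of four consecutive points."""
    curve = []
    for i in range(len(points) - 3):
        window = points[i:i + 4]
        for s in range(samples_per_segment):
            curve.append(bspline_point(*window, s / samples_per_segment))
    return curve

# Five control points give two overlapping four-point windows.
control = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 1, 0)]
smooth = bspline_curve(control)
print(len(smooth))  # 8
```

Consecutive segments share three control points, so the sampled curve stays smooth across segment boundaries.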
25Final Conformation
26Template Search on Protein Databases
- PDB (Protein Data Bank)
- http://www.rcsb.org/pdb/
- Swiss-Prot
- http://tw.expasy.org/sprot/
- Classification
- CATH (Class, Architecture, Topology and Homologous superfamily)
- http://cathwww.biochem.ucl.ac.uk/latest/
- SCOP (Structural Classification of Proteins)
- http://scop.mrc-lmb.cam.ac.uk/scop/index.html
27(No Transcript)
28Template Selection Methods (Tools)
- How to select?
- Sequence alignment
- ClustalW, Blastp, and others
- Secondary structure prediction [Al-Lazikani et al.]
- → Structurally conserved blocks
29PAM250 Score Matrix
30Blosum62 Matrix
31Protein Folding Problem
- Given the primary structure of a protein, compute its 3-dimensional structure.
- The H-P model was proposed by Dill in 1985 [Dill85].
- Minimizing the total free energy
- The characteristic of each of the 20 amino acids
- H (hydrophobic, non-polar): 1
- (water-hating)
- P (hydrophilic, polar): 0
- (water-loving)
- The amino acid sequence of a protein can be viewed as a binary sequence of Hs (1s) and Ps (0s).
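The binary view of a sequence can be sketched in a few lines. The hydrophobic set below is an assumption for illustration (published H-P classifications differ); it was chosen so that the segment LKTFPQNA from the homology-modeling example maps to 10011001 as on that slide.

```python
# Assumed hydrophobic residues (one-letter codes); not taken from the slides.
HYDROPHOBIC = set("ACFILMPVW")

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (H, hydrophobic) or '0' (P, polar)."""
    return "".join("1" if aa in hydrophobic else "0" for aa in sequence.upper())

print(hp_encode("LKTFPQNA"))  # 10011001
```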
32Example of H-P Model
- Input sequence: 011001001110010
[Figure: two lattice foldings of the input sequence, one with score 5 and one with score 3]
33Protein Folding on H-P Model
- Protein folding on the H-P model: given a sequence of 1s (Hs) and 0s (Ps), find a self-avoiding path embedded in either a 2D or 3D lattice such that the number of pairs of adjacent 1s is maximized.
- NP-complete even for the 2D lattice [Hart97].
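The objective can be made concrete with a small scorer for the 2D square lattice. This is a sketch under two assumptions: a folding is written as a string of U/D/L/R moves (a hypothetical encoding), and only 1-1 pairs that are lattice neighbors but not consecutive in the chain are counted, since consecutive pairs are the same in every folding.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def fold_path(moves):
    """Turn a move string into lattice coordinates; None if the path revisits a site."""
    pos = (0, 0)
    path = [pos]
    seen = {pos}
    for m in moves:
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in seen:
            return None  # not self-avoiding
        seen.add(pos)
        path.append(pos)
    return path

def hp_score(hp, moves):
    """Count non-consecutive 1-1 lattice contacts; None for an invalid folding."""
    if len(moves) != len(hp) - 1:
        raise ValueError("need exactly one move per bond")
    path = fold_path(moves)
    if path is None:
        return None
    index = {p: i for i, p in enumerate(path)}
    score = 0
    for i, (x, y) in enumerate(path):
        if hp[i] != "1":
            continue
        # Look right and up only, so every contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and hp[j] == "1" and abs(i - j) > 1:
                score += 1
    return score

print(hp_score("1001", "RUL"))  # 1: the chain bends so both ends touch
print(hp_score("1001", "RRR"))  # 0: a straight chain has no contacts
```

A search method such as a genetic algorithm or ACO would call a scorer like this on each candidate folding.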
34U-Fold Algorithm for HP
- Find a suitable point at which to split the string into two substrings.
- Example: 0100101001110101000010
- 0100----101001
- 01000010101--1
35Ant Colony Optimization System
- The ant colony optimization (ACO) algorithm was
presented by Dorigo et al. in 1991.
36General Lattice Model
Square Lattice Model
Triangular Lattice Model
37Experiments of Different Models
Model  1b1u      1a6n      118l      102l      1b8k
Cubic  12.08891  13.35721  13.01421  13.98656  17.50644
FCC    10.18907  12.09836  12.39913  11.93452  15.06346
FCC: Face-Centered Cubic model
- Measured by RMSD (Å)
- Data source: PDB
- Folding by genetic algorithm
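One structural difference between the two lattices is the number of directions a chain can take at each step: 6 on the cubic lattice and 12 on the FCC lattice. The sketch below enumerates both neighbor sets using the standard integer representation; whether this accounts for the lower FCC RMSD values in the table is a plausible reading, not a claim made by the slides.

```python
from itertools import product

def cubic_neighbors():
    """Unit steps along one axis: 6 directions."""
    return [v for v in product((-1, 0, 1), repeat=3)
            if sum(abs(c) for c in v) == 1]

def fcc_neighbors():
    """Steps with exactly two non-zero coordinates of size 1: 12 directions."""
    return [v for v in product((-1, 0, 1), repeat=3)
            if sum(abs(c) for c in v) == 2]

print(len(cubic_neighbors()), len(fcc_neighbors()))  # 6 12
```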
38Structure Alignment by Curve Fitting
39Curve Matching
- Curve matching: measure function
40- Apply the curve alignment.
- Our score function for the curve alignment
41Additional Constraints
- Improvement on the HP model
- Prediction results are not successful enough
- Consideration of hydrophobicity is not enough.
- Other features should also be considered
- Secondary structure elements (SSEs)
- α-helix
- β-sheet
- Electrostatic attractions
- Disulfide bonds
42Electrostatic Attractions and Disulfide Bonds
- Electrostatic attractions
- A disulfide bond forms between two cysteine (C) residues
43Probabilistic Disulfide Bonds
- Folding with the constraint of disulfide bonds.
44Experiments for Disulfide Bonds
- Experiments of folding with disulfide constraints
45Secondary Structures
- Conformation of an α-helix
- Distance between the i-th amino acid and the (i+4)-th amino acid
46Secondary Structures
47Further Improvement--Sliced Lattice Model
- The original lattice models do not work well.
- Slice the lattice into smaller lattices.
48Sliced Lattice Model
49Global Folding
50Experimental Materials
- Database: PDB (http://www.rcsb.org/pdb/)
- April 17, 2005
- 20,380 proteins
- Data of CASP6 (http://predictioncenter.llnl.gov/)
- 2004
- Alignment: Blastp (http://www.ncbi.nlm.nih.gov/)
- Sequence identity < 90%
- Blosum-62
51Experiment Results
- Target protein: 1LIN (146)
Template Protein  Sequence Similarity (%)  RMSD(03)  RMSD(04)  RMSD(05)
1CFD              100                      7.34      -         -
1TNW              69                       18.72     13.37     10.56
1IQ5              55                       15.15     9.18      7.35
1DTL              52.9                     10.22     7.48      6.17
5PAL              36.4                     12.18     8.43      5.89
Measured by RMSD
52Experiment Results
Template Protein  Sequence Similarity (%)  RMSD(03)  RMSD(04)  RMSD(05)
1JYQ              90.4                     4.15      -         4.24
1JYU              90.4                     13.89     -         10.89
1SHA              46.7                     4.82      4.82      3.65
1SHD              45.2                     8.89      6.77      5.55
5PDR              24.4                     10.55     8.0       6.76
Measured by RMSD
53Experimental Results of CASP6
Number of proteins                77
Number of positive improvements   59
Number of negative improvements   12
Average improvement               21.44
Average sequence length           208 (53 to 435)
Average template identity         36
Average template similarity       21
54Compared with Palu et al.
- Palu et al. [Palu04], without template
- FCC lattice model
55Comparing with Zheng et al.
- Zheng et al. [Zheng02]
- Homology
- Lattice model
56An Example of Our Results
- PDB code: 7RSA, length: 124, RMSD: 1.48 Å
[Figure: our result compared with the real structure]
57Protein Structure Prediction System
- Target protein: 7RSA
- Step 1: Prepare
58 Protein Structure Prediction System
http://par.cse.nsysu.edu.tw/main.html
59 Protein Structure Prediction System
60 Protein Structure Prediction System
61 Protein Structure Prediction System
62 Protein Structure Prediction System
Our result
Real structure
RMSD
63 Protein Structure Prediction System
Our result
Real structure
64Protein Side Chain Packing
65Amino Acids Side-chain
- Elements of protein
- Three groups
Lysine (LYS)
Side-chain
66Protein Structure Prediction
- Input: 1D sequence
- Output: 3D structure
- 3D backbone structure in general
- Protein structure
- Backbone structure + side-chain structure
ACE GLY ASP VAL GLU LYS GLY LYS LYS ILE PHE VAL GLN
67Backbone and Side Chain
Backbone
Side-chain
Protein SAV1595, Journal of Biomolecular NMR (2004) 29: 391-394
68Protein Side Chain Packing Problem
- PSCPP
- Given the fixed backbone of the protein
- For each residue of the backbone other than glycine, there is a set of possible rotamers.
- Problem: choose one suitable rotamer for each residue such that the total energy of the protein is minimized.
- The PSCPP is NP-hard.
69Graph Model of PSCPP Problem
- Let R = {r_1, r_2, ..., r_n} be the set of residues of the target protein.
- Let an undirected graph G = (V, E) represent the side chains of a protein.
- V_i = { v_{i,j} | v_{i,j} does not collide with any backbone atom }.
- Then V = ∪ V_i and E = { (v_{i,j}, v_{i+1,k}) | v_{i,j} does not collide with v_{i+1,k} }.
rotamer
70Dihedral Angles
- Side-chain atoms
- Cα, Cβ, Oγ
- Dihedral angles [Iupa70]
- φ: C_{i-1}-N_i-Cα_i-C_i
- ψ: N_i-Cα_i-C_i-N_{i+1}
- χ1: N_i-Cα_i-Cβ_i-Oγ_i
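Each of these angles is the dihedral defined by four consecutive atoms. A minimal sketch of the standard computation from 3D coordinates follows, using the atan2 form so that the sign of the angle is kept; the helper names and the test geometry are illustrative.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (cis = 0, trans = 180)."""
    u1 = _sub(p1, p0)
    u2 = _sub(p2, p1)
    u3 = _sub(p3, p2)
    n1 = _cross(u1, u2)  # normal of the plane through p0, p1, p2
    n2 = _cross(u2, u3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(u2, u2)) * _dot(u1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Example: the fourth point is rotated a quarter turn about the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a side-chain angle, the four points would be the coordinates of the four atoms in the order listed on the slide.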
71The Rotamer Library
- The accuracy of side chain prediction depends primarily on the quality of the rotamer library.
- Our rotamer library is a coordinate rotamer library, which preserves the bond lengths and bond angles that do not appear in the standard rotamer library.
- Our rotamer library is built from 850 proteins, the same set used for the backbone-dependent rotamer library proposed by Dunbrack and Karplus [Dunb93].
72Example of the Rotamer Library
- Columns: A.A., φ, ψ, χ1, Prob., 3-D coordinate
73Formulas of ACO for PSCPP
- Pheromone probability formula
- Pheromone update formula
- 0 < ρ < 1, where ρ is the pheromone evaporation rate
74ACO Prediction for PSCPP
- Input: backbone coordinate data.
- Output: the route with near-minimum score.
- Step 1: Set parameters and initialize pheromone trails.
- Step 2: Each ant k chooses one rotamer u of residue i according to the probability function p_k(s, u), for all 1 ≤ i ≤ n, u ∈ V_i.
- Step 3: Update the pheromone trails.
- Step 4: If the current best solution has not improved by more than a given percentage over a predefined number of generations, or the number of generations has reached the predefined limit, return the route with the minimum score; otherwise, go to Step 2.
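The four steps can be sketched as a small ACO loop. This is a simplified illustration, not the authors' implementation: pheromone is stored per (residue, rotamer) choice rather than per edge, the deposit rule and the stagnation test (`patience`) are assumptions, and the toy energy function is invented for the example. The default population, α, β, and initial pheromone follow the parameter values listed later in the slides.

```python
import random

def aco_side_chain_packing(rotamer_counts, score, ants=50, generations=300,
                           alpha=1.0, beta=1.0, rho=0.1, tau0=1.0,
                           heuristic=None, patience=30, seed=0):
    """Return (best_route, best_score); route[i] is the rotamer index of residue i."""
    rng = random.Random(seed)
    n = len(rotamer_counts)
    # Step 1: initialize one pheromone value per (residue, rotamer) choice.
    tau = [[tau0] * count for count in rotamer_counts]
    eta = heuristic or [[1.0] * count for count in rotamer_counts]
    best_route, best_score = None, float("inf")
    stale = 0
    for _ in range(generations):
        improved = False
        # Step 2: each ant picks one rotamer per residue with probability
        # proportional to tau^alpha * eta^beta.
        for _ant in range(ants):
            route = []
            for i in range(n):
                weights = [tau[i][u] ** alpha * eta[i][u] ** beta
                           for u in range(rotamer_counts[i])]
                route.append(rng.choices(range(rotamer_counts[i]),
                                         weights=weights)[0])
            s = score(route)
            if s < best_score:
                best_route, best_score, improved = route, s, True
        # Step 3: evaporate all trails, then reinforce the best route so far
        # (assumed deposit rule).
        for i in range(n):
            tau[i] = [(1.0 - rho) * t for t in tau[i]]
            tau[i][best_route[i]] += rho
        # Step 4: stop early once the best score has stagnated.
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break
    return best_route, best_score

# Toy problem (invented): 3 residues with 2 rotamers each, plus one clash penalty.
SELF_ENERGY = [[1.0, 0.0], [0.5, 0.0], [0.0, 2.0]]

def toy_score(route):
    energy = sum(SELF_ENERGY[i][u] for i, u in enumerate(route))
    if route[0] == 1 and route[1] == 1:
        energy += 5.0  # rotamer 1 of residue 0 clashes with rotamer 1 of residue 1
    return energy

route, energy = aco_side_chain_packing([2, 2, 2], toy_score, generations=50)
print(route, energy)
```

In the toy problem the greedy per-residue choice runs into the clash penalty, so the search has to trade a slightly worse rotamer on one residue for a lower total energy.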
75The Score Function
- Features in ACO score functions
- The disulfide bonds
- S1 = BonS × (number of disulfide bonds)
- The hydrogen bonds
- S2 = BonH × (number of hydrogen bonds)
- The charge-charge interactions
- S3 = BonC × ((number of different-charge pairs) − (number of same-charge pairs))
- The van der Waals interactions
- S4 = BonV × Σ E_{i,j}
- Energy score function: E combines S1, S2, S3, and S4
76Experiments
- Two test sets
- 25 proteins from Xiang and Honig (2001)
- 5 proteins from Canutescu et al. (2003)
- Cutoff value
- 20° [Xie06, R3]
- If χ1 is within 20° of the corresponding angle in the real structure, the predicted angle is considered correct.
- Compared with SCWRL 3.0 [Canu03] and R3 [Xie06]
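The cutoff rule can be sketched as a small accuracy function. Two details are assumptions: angles are compared on the circle (so 175° and -175° differ by 10°), and "within" is read as less than or equal to the cutoff. The sample angles are invented.

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def chi1_accuracy(predicted, native, cutoff=20.0):
    """Percentage of predicted angles within the cutoff of the native angles."""
    if len(predicted) != len(native) or not predicted:
        raise ValueError("angle lists must be non-empty and of equal length")
    correct = sum(1 for p, q in zip(predicted, native)
                  if angle_difference(p, q) <= cutoff)
    return 100.0 * correct / len(predicted)

predicted = [60.0, -65.0, 175.0, -60.0]
native = [62.0, -60.0, -175.0, 180.0]
print(chi1_accuracy(predicted, native))  # 75.0
```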
77Parameters in Experiments
- Parameters used in the ACO algorithm
Parameter          Value
Population         50
Generation         300 to 600
α                  1.0
β                  1.0
Initial pheromone  1.0
- Weights of features in the score function
Feature  Value
BonS     0.5S4
BonH     5
BonC     2
BonV     1
78Experimental Results (First Case)
No.  Protein  Length  Our Method χ1  SCWRL 3.0 χ1  R3 Method χ1
1    1AAC     85      87.1           84.7/95       76.5/86
2    1AHO     54      85.2           68.5/67       64.8/65
3    1B9O     112     70.5           68.8/73       66.1/77
4    1C5E     71      81.7           81.7/86       73.2/82
5    1C9O     53      84.9           66.0/72       71.7/70
6    1CC7     66      80.3           68.2/83       63.6/79
7    1CEX     146     85.6           76.7/82       75.3/77
8    1CKU     60      81.7           76.7/82       68.3/80
Columns 5-6: IUPAC-IUB rules / Xie and Sahinidis's (R3) result
79Experimental Results (First Case)
No.  Protein  Length  Our Method χ1  SCWRL 3.0 χ1  R3 Method χ1
9    1CTJ     61      77.0           68.9/79       70.5/80
10   1CZ9     111     70.3           64.0/73       64.0/76
11   1CZP     83      79.5           77.1/86       73.5/81
12   1D4T     89      77.5           76.4/86       67.4/82
13   1IGD     50      82.0           68.0/74       54.0/68
14   1MFM     118     75.4           68.6/80       70.3/81
15   1PLC     82      72.0           67.1/72       70.7/71
16   1QJ4     221     71.5           72.9/84       67.9/80
17   1QQ4     143     83.9           73.4/78       71.3/78
80Experimental Results (First Case)
No.  Protein  Length  Our Method χ1  SCWRL 3.0 χ1  R3 Method χ1
18   1QTN     134     86.6           74.6/82       67.9/78
19   1QU9     99      79.8           71.7/81       73.7/78
20   1RCF     142     79.6           83.8/86       81.7/80
21   1VFY     63      79.4           69.8/76       71.4/75
22   2PTH     151     82.1           78.8/83       78.1/84
23   3LZT     105     73.3           78.1/86       69.5/82
24   5P2L     144     78.5           70.8/78       63.2/71
25   7RSA     109     75.2           65.1/75       61.5/67
Columns 5-6: IUPAC-IUB rules / Xie and Sahinidis's (R3) result
81Experimental Results (Second Case)
No.  Protein  Length  Our Method χ1  SCWRL 3.0 χ1  R3 Method χ1
1    1A8I     704     73.4           71.3/80       64.1/75
2    1B0P     978     70.8           62.3/69       -/66
3    1BU7     399     74.9           70.4/78       64.4/72
4    1GAI     386     73.6           72.8/81       66.6/72
5    1XWL     496     71.5           66.7/73       61.5/72
Columns 5-6: IUPAC-IUB rules / Xie and Sahinidis's (R3) result