Title: Functional Site Prediction Selects Correct Protein Models
1Functional Site Prediction Selects Correct Protein Models
Vijayalakshmi Chelliah (vchelli_at_nimr.mrc.ac.uk)
Division of Mathematical Biology, National Institute for Medical Research, Mill Hill, London
Sixth International Conference on Bioinformatics (InCoB2007)
HKUST, Hong Kong, 27th-30th August 2007
2Functional site prediction - applications
- To predict the function of a protein (Pazos & Sternberg, 2004, PNAS 101:14754-9).
- In protein-protein docking, to select the near-native docked solution (Chelliah et al., 2006, JMB 357:1669-82).
- In sequence-structure homology recognition, and to improve alignment accuracy (Chelliah et al., 2005, Proteins 61:722-31).
3[Flow diagram: gene sequence -> protein sequence -> protein structure, either determined by X-ray/NMR or predicted de-novo/ab-initio; functional site prediction is used to select correct models among the predicted structures]
4Overview
- De-novo protein structure prediction method (decoy generation)
- Functional site prediction method
- Evaluating models
- Conclusions
5De-novo protein structure prediction method
[Flow diagram, after Taylor (2002), Nature 416:657-660. Inputs: sequence alignment (predicted residue burial, predicted secondary structure), ideal forms, structure patterns. Stages: fold generation and scoring; Cα models; threading; refinement. Levels: secondary structure stick level, residue level, main-chain level. Selection labels: top 1/3, top 100N, top 100N. Output: top 200 models.]
6Functional site prediction method
- Biochemically important residues are typically found in close proximity and are also highly conserved.
- Functional site prediction is done using CRESCENDO, which gives a score for each residue position.
Chelliah, V., L. Chen, et al. (2004). J Mol Biol 342(5):1487-504.
7CRESCENDO Functional site prediction method
- Input: multiple sequence alignment of the homologous sequences and a structure-based sequence alignment (alignment positions 1, 2, 3, 4, 5, 6, ...).
- Expected substitution pattern (q) for each amino acid at the t-th position, taken from the environment-specific substitution tables sp_1, sp_2, sp_3, ..., sp_N: $q = (sp_1 + sp_2 + sp_3 + \dots + sp_N)/N$
- Observed substitution pattern (p) for each amino acid at the t-th position.
- Score: divergence between the observed (p) and expected (q) substitution patterns.
Overington et al. (1992). Protein Science 1:216-26.
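The comparison of observed and expected substitution patterns can be illustrated with a minimal sketch. The slide does not state which divergence measure CRESCENDO uses, so the Kullback-Leibler divergence below is an assumption chosen for illustration; the function names and the two-letter toy alphabet are hypothetical.

```python
import math

def expected_pattern(tables):
    """Average N substitution distributions: q = (sp1 + ... + spN) / N."""
    n = len(tables)
    return {aa: sum(t[aa] for t in tables) / n for aa in tables[0]}

def divergence(p, q, eps=1e-9):
    """Kullback-Leibler divergence of observed p from expected q.

    ASSUMPTION: the source only says "divergent score"; KL is a stand-in.
    eps guards against division by zero when q has no mass for a residue.
    """
    total = 0.0
    for aa, p_val in p.items():
        if p_val > 0.0:
            total += p_val * math.log(p_val / max(q.get(aa, 0.0), eps))
    return total

# Toy data over a two-letter alphabet (illustrative values only).
tables = [{"A": 0.5, "G": 0.5}, {"A": 1.0, "G": 0.0}]
q = expected_pattern(tables)          # expected pattern at one position
p = {"A": 1.0, "G": 0.0}              # observed pattern at the same position
score = divergence(p, q)              # larger when p departs more from q
```

Under this reading, a position whose observed substitutions are more restricted than its structural environment would predict receives a higher score.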
8Assumptions
- Correct or near-native models will have the critical residues important for binding (identified by CRESCENDO) in close proximity to each other, i.e. functional residues in the correct models form clusters.
- Functional residues in the incorrect models might be scattered.
- Can correct and incorrect models be distinguished by looking at how the functional residues are packed in the models?
9Clustering of models
- The 200 decoy models are classified by fold type (F1, F2, F3, F4, ..., Fn).
- Models of each fold type are clustered with SAP, using cut-offs of 2 Å rmsd and 60% PID.
- The average Cα coordinates of the models in each cluster are used to find the pair-wise distances between residues.
Taylor (1999). Prot. Sci. 8:654-665.
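The averaging step can be sketched as follows. This is only an illustration: it assumes the models in a cluster have already been superposed (for example by SAP) and have the same number of residues, and the function names are hypothetical.

```python
import math

def average_coords(models):
    """Mean alpha-carbon position per residue over the models of one cluster.

    ASSUMPTION: models are pre-superposed lists of (x, y, z) tuples of equal length.
    """
    n_models = len(models)
    n_res = len(models[0])
    return [
        tuple(sum(m[r][k] for m in models) / n_models for k in range(3))
        for r in range(n_res)
    ]

def distance_matrix(coords):
    """Pair-wise Euclidean distances between residues."""
    return [[math.dist(a, b) for b in coords] for a in coords]

# Two toy two-residue models (illustrative coordinates only).
m1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
m2 = [(0.0, 2.0, 0.0), (3.0, 6.0, 0.0)]
avg = average_coords([m1, m2])
dist = distance_matrix(avg)
```

Averaging within a cluster gives one representative set of coordinates per cluster, so the pair-wise distances are computed once per cluster rather than once per model.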
10Model score
- For each pair of residues at least 8 residues apart in the linear sequence, the pair-wise distance and the product of the CRESCENDO scores are calculated.
- Among the top 40 pairs (ranked by product of CRESCENDO scores), the number (in %) of residue pairs within a spatial distance of 12 Å is calculated.
- The percentage scores from each step (in steps of 5 pairs) are added to give the final score of the model.
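One possible reading of this scoring scheme is sketched below. It assumes the steps are cumulative (top 5, top 10, ..., top 40 pairs) and that "within 12 Å" includes the boundary; the function name, parameter names, and toy coordinates are hypothetical and not taken from the original method.

```python
import math
from itertools import combinations

def model_score(coords, scores, min_sep=8, cutoff=12.0, top_n=40, step=5):
    """Sum of percentages of close pairs among the top-scoring residue pairs.

    coords: one (x, y, z) tuple per residue (e.g. averaged cluster coordinates).
    scores: one functional-site score per residue (e.g. from CRESCENDO).
    """
    # Keep only pairs at least min_sep residues apart in sequence.
    pairs = [
        (scores[i] * scores[j], i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_sep
    ]
    # Rank pairs by the product of their scores, highest first.
    pairs.sort(key=lambda t: t[0], reverse=True)
    top = pairs[:top_n]
    total = 0.0
    # ASSUMPTION: each step looks at the cumulative top k pairs.
    for k in range(step, len(top) + 1, step):
        close = sum(
            1 for _, i, j in top[:k]
            if math.dist(coords[i], coords[j]) <= cutoff
        )
        total += 100.0 * close / k
    return total

# Toy 20-residue chains: one tightly packed, one fully extended.
uniform = [1.0] * 20
compact = [(0.1 * i, 0.0, 0.0) for i in range(20)]
extended = [(3.8 * i, 0.0, 0.0) for i in range(20)]
compact_score = model_score(compact, uniform)
extended_score = model_score(extended, uniform)
```

With these defaults the score ranges from 0 to 800 (eight steps of up to 100% each), so a model in which the highest-scoring residues pack together scores higher than one in which they are scattered.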
11Good and poor models of same fold type
2trxA: 34 clusters (with 2 Å rmsd and 60% PID cut-offs) were obtained from 81 correct models.
Why is clustering needed between models of the same fold type? Functional site prediction differs between models of the same type due to a) differences in loop conformation, and b) a shift of a beta strand or helix, even by a single residue. So even correct folds might have poor models (based on site prediction).
12 3chy
[Figure: topology diagram of 3chy with layers 1-3, helices H1-H5, strands S1-S5, and the N- and C-termini labelled]
Helix and strand order: H1(1,5)S2(2,1,3,4,5)H3(2,3,4)
13Proximity plot: 3chy, best model in each fold type
[Figure: proximity plots; labelled panels include the native structure and the correct model]
14Decoy fold distribution for 3chy
Fold type | Strand and helix order | No. of models in each fold type (of 200) | No. of clusters (2 Å rmsd, 60% PID cut-off) | Score of the best model
native | H1(1,5)S2(2,1,3,4,5)H3(2,3,4) | - | - | 330.96
F1 | H1(1,5)S2(2,1,3,4,5)H3(2,3,4) | 161 | 61 | 314.76
F2 | H1(1,5)S2(2,1,3,4,5)H3(2,3,4) | 3 | 2 | 202.21
F3 | H1(1,5)S2(2,3,1,4,5)H3(2,3,4) | 16 | 11 | 145.19
F4 | H1(1,3,4)S2(2,1,3,4,5)H3(2,5) | 2 | 2 | 150.83
F5 | H1(1,4)S2(2,3,1,4,5)H3(2,3,5) | 1 | 1 | 108.62
F6 | H1(1,3,5)S2(2,1,4,3,5)H3(2,4) | 11 | 7 | 250.20
F7 | H1(1,5)S2(2,1,3,4,5)H3(2,3,4) | 5 | 4 | 260.29
F8 | H1(1,5)S2(2,1,3,4,5)H3(2,3,4) | 1 | 1 | 67.24
15Summary plot 3chy
16Top 4 ranking models: rmsd (PID %)
PDB (length) | Rank-1 | Rank-2 | Rank-3 | Rank-4
3chy (128) | 3.6 (70.6) | 3.8 (63.0) | 4.8 (65.6) | 8.83 (21.8)
1cozA (126) | 6.5 (63.1) | 9.0 (77.3) | 14.2 (61.3) | 7.9 (80.3)
2trxA (108) | 4.7 (100.0) | 12.9 (77.6) | 14.1 (100.0) | 5.6 (100.0)
1f4pA (148) | 5.9 (80.1) | 5.3 (82.9) | 5.8 (100.0) | 14.6 (100.0)
1di0A (147) | 4.6 (82.5) | 16.2 (71.4) | 16.1 (96.5) | 5.8 (53.6)
17Top 4 ranking models: rmsd (PID %)
PDB (length) | Rank-1 | Rank-2 | Rank-3 | Rank-4
1v9w (130) | 13.4 (100.0) | 11.3 (76.2) | 6.9 (77.7) | 6.3 (100.0)
1rlj (135) | 13.7 (94.8) | 4.9 (88.0) | 11.2 (94.3) | 13.7 (100.0)
1kjnA (159) | 3.4 (26.3) | 5.0 (59.7) | 5.0 (62.6) | 9.4 (5.8)
1vq1A (178) | 8.5 (80.3) | 9.5 (100.0) | 7.1 (89.5) | 7.9 (94.3)
1uxoA (186) | 13.7 (90.7) | 11.4 (100.0) | 8.9 (100.0) | 11.8 (94.6)
1t57A (186) | 14.8 (32.0) | 9.8 (77.2) | 14.9 (96.2) | 9.9 (92.2)
1vk2A (187) | 16.4 (100.0) | 14.7 (100.0) | 14.5 (97.3) | 15.9 (94.2)
18Thioredoxin 2trxA
[Figure: three models shown (one labelled correct, two labelled incorrect) at Rank 1, Rank 4, and Rank 10 (last); helix H5 is marked]
19Conclusions
- The requirement of proteins to form functional sites can be used to select the correct protein fold.
- In larger proteins, selection is difficult due to the conformation of longer loops.
- The competing incorrect folds are mostly strand-swapped models.
- The method discriminates efficiently between incorrect and correct folds when the direction of a secondary structure element that contains functional residues is altered, and when the fold is messy.
20Thanks to
- Dr Willie Taylor, National Institute for Medical Research, Mill Hill, London, UK.
- Prof Sir Tom Blundell, Department of Biochemistry, University of Cambridge, Cambridge, UK.