Title: PREDICTING LOCAL STRUCTURAL CANDIDATES
1 PREDICTING LOCAL STRUCTURAL CANDIDATES FROM SEQUENCE
Cristina Benros, Alexandre G. de Brevern and Serge Hazout
Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM E0346, Université Denis Diderot - Paris 7, case 7113, 2 place Jussieu, 75251 Paris, FRANCE (benros_at_ebgm.jussieu.fr)
- We describe a new method for local protein structure prediction from sequence.
- It is based on building a structural profile that characterizes a library of protein structural fragments.
- Our goal is to propose long structural fragments.
2 METHODS
1) How is the structural profile built?
2) What is the principle of the prediction method?
3 METHODS: The protein fragment databank
The protein 3D structures are encoded by a structural alphabet
Protein 3D structure databank (1120 structures)
1D representation
16 Protein Blocks (PBs): a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p
de Brevern AG, Etchebest C, Hazout S. Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins 2000; 41: 271-287.
4 METHODS: The protein fragment databank
The 1D representation is cut into overlapping fragments of 7 PBs, i.e. 11 amino acids
1D representation
Databank of 277,618 protein fragments
de Brevern AG, Etchebest C, Hazout S. Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins 2000; 41: 271-287.
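The cutting step can be illustrated with a minimal sketch. It assumes the 1D representation is a plain string of PB letters and that each PB describes 5 consecutive residues with a shift of one residue between neighbouring PBs (as in the cited Proteins 2000 paper), which is why 7 PBs span 7 + 5 - 1 = 11 amino acids. The function names and the example string are hypothetical.

```python
def cut_into_fragments(pb_sequence, n_pbs=7):
    """Return every overlapping window of n_pbs consecutive PBs (step of 1)."""
    return [pb_sequence[i:i + n_pbs]
            for i in range(len(pb_sequence) - n_pbs + 1)]


def residues_spanned(n_pbs, pb_length=5):
    """Residues covered by n_pbs PBs, assuming each PB is pb_length residues
    long and consecutive PBs are shifted by one residue."""
    return n_pbs + pb_length - 1


# Made-up PB string, for illustration only.
fragments = cut_into_fragments("mmnopacdfklmm")
print(fragments[0], len(fragments), residues_spanned(7))
```

A PB string shorter than 7 letters yields an empty list.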
5 METHODS: Learning of the protein fragment databank
The Hybrid Protein Model approach
- Iterative process resulting in a multiple structural alignment of the fragments
- The final result corresponds to a structural profile characterizing a library of representative fragments
- Analysis of the amino acid sequence specificity
Benros C, de Brevern AG, Hazout S. Hybrid Protein Model (HPM): A method for building a library of overlapping local structural prototypes. Sensitivity study and improvements of the training. IEEE Int Work 2003: 53-70.
6 METHODS: Proposition of local structural candidates
Structural profile based method
- Computation of a scoring matrix (1) between the amino acid query sequence (2) and the HPM structural profile (3)
- Search for optimal trajectories (4), i.e. structural sub-profiles (5)
- Extraction from the databank of structural candidates that strongly match the structural sub-profiles
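The first two steps above can be sketched as follows. This is an illustration under stated assumptions, not the scoring scheme of the method: the profile is assumed to be a list of sites, each holding a per-amino-acid score (for example a log-odds value), and a trajectory is assumed to be a diagonal path that matches consecutive residues to consecutive profile sites. All names and values are hypothetical.

```python
def scoring_matrix(sequence, profile):
    """scores[i][j] = score of residue i of the query at profile site j.
    Amino acids missing from a site are given a score of 0.0 (assumption)."""
    return [[site.get(aa, 0.0) for site in profile] for aa in sequence]


def best_trajectory(scores, length):
    """Exhaustively search diagonal paths of the given length and return
    (total score, first residue index, first profile site index)."""
    n_res = len(scores)
    n_sites = len(scores[0]) if scores else 0
    best = None
    for i in range(n_res - length + 1):
        for j in range(n_sites - length + 1):
            total = sum(scores[i + k][j + k] for k in range(length))
            if best is None or total > best[0]:
                best = (total, i, j)
    return best


# Toy 4-site profile and 5-residue query, for illustration only.
toy_profile = [{"A": 1.0}, {"G": 1.0}, {"L": 1.0}, {"K": 1.0}]
toy_scores = scoring_matrix("MAGLK", toy_profile)
best = best_trajectory(toy_scores, length=4)
print(best)
```

In this sketch the best path starts at the second residue of the query and the first site of the profile; the corresponding sub-profile would then be used to select candidates from the databank.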
7 RESULTS: Local structure prediction evaluation
Evaluation with the rmsd criterion
The table gives the proportion of sites with at least 1 candidate at less than 2.5 Å (3 Å, respectively) from the true local structure, when at most 1, 3 or 5 candidates (cd) are proposed.
The distribution of the Cα rmsd of pairs of randomly chosen 11-residue fragments has a mean of 4.47 ± 1.09 Å. The test set is composed of 262 proteins (56,143 fragments).
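The proportion described above can be computed as in the following minimal sketch. It assumes that, for each site, the Cα rmsd values of the ranked candidates are already available as a list; the function name and the toy values are hypothetical and are not the reported results.

```python
def hit_rate(candidate_rmsds, threshold, max_candidates):
    """Fraction of sites with at least one of the first max_candidates
    candidates at an rmsd strictly below threshold (in Å).
    Sites with no candidate count as misses."""
    hits = sum(
        1 for rmsds in candidate_rmsds
        if any(r < threshold for r in rmsds[:max_candidates])
    )
    return hits / len(candidate_rmsds)


# Toy rmsd lists (one list per site, candidates in rank order).
toy_sites = [[1.20], [1.54, 3.53], [2.47], [3.80, 2.90]]
rate_1cd = hit_rate(toy_sites, threshold=2.5, max_candidates=1)
rate_5cd = hit_rate(toy_sites, threshold=3.0, max_candidates=5)
print(rate_1cd, rate_5cd)
```

Raising either the threshold or the number of candidates allowed can only keep or increase the proportion, which is why the table is read across both dimensions.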
Evaluation with other criteria
8 RESULTS: Example of structural candidates
- Structural candidates proposed for E. coli CheY (PDB code 3chy)
(a) True local structure (22-32); candidates confidence region ()
Candidate 1) 1.20 Å
PBs: mmnopac, mmnopac
(b) True local structure (47-57); candidates confidence region ()
PBs: iabdcdd
9 RESULTS: Example of structural candidates
- Structural candidates proposed for E. coli CheY (PDB code 3chy)
(c) True local structure (39-49); candidates confidence region ()
Candidates 1) 1.54 Å, 2) 3.53 Å
PBs: mmmmmnn, mmmmmmk, mmmmmmn
(d) True local structure (59-69); candidates confidence region (- -)
Candidate 1) 2.47 Å
PBs: lcfklmm, iafklmm
10 CONCLUSION
- New methodology to predict protein local structure from sequence by proposing long structural candidates (11-residue fragments).
- 60.4% (44.3%) of the sites have at least one candidate at less than 3 Å (2.5 Å) from the true local structure when at most 5 candidates are proposed.
- Optimization through improvement of the sequence-structure relationship.
- Contribution to ab initio global structure prediction, as structural constraints or by combinatorial assembly.