1
A new protein-protein docking scoring function based on Voronoï tessellation
Yeast Structural Genomics, IBMMC, CNRS UMR 8619, Université Paris-Sud, Orsay
LEBS, CNRS UPR 9063, Bât 34, Gif sur Yvette
2
Systematic searches for protein-protein complexes in yeast
Thousands of protein-protein complexes have been identified by genetic and biochemical techniques.
A large part of these complexes corresponds to false positives.
→ Need for efficient algorithms to analyse those protein-protein complexes.
3
Protein-protein docking algorithms: the scoring issue
Protein-protein docking procedure steps:
  • exploration
  • scoring
In this study:
  • exploration is done using Dock from Wodak and Janin (1)
  • scoring is done using no biological information
(1) Janin J., Wodak S., Biopolymers, 1985
4
Voronoï tessellation
5
Voronoï tessellation
6
Voronoï tessellation as the dual of Delaunay
triangulation
7
Voronoï tessellation
Complexity for 3D building:
  • Naive algorithm: O(n^3)
  • Incremental randomized algorithm (CGAL): O(n^2)
Computation time: 10000 points in 3D in less than a second
http://www.cgal.org
8
Voronoï cells of amino acid residues
Two residues are Voronoï neighbours if their Voronoï cells share a face.
[Figure: Voronoï cell of a leucine (Leu 64, complex 1BTH)]
[Figure: schematic view of the Voronoï neighbourhood definition]
Neighbourhood definition is not based on distance.
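The neighbour rule can be illustrated with a small sketch. It is not the CGAL-based 3D construction used in the slides: it assumes a 2D point set for brevity and uses a brute-force empty-circumcircle test, relying on the fact that two points are Voronoï neighbours exactly when they share a Delaunay edge (points in general position). The function names are illustrative.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (cx, cy, r_squared) of the circle through a, b, c, or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def voronoi_neighbours(points, eps=1e-9):
    """Map each point index to the indices of its Voronoï neighbours (2D, brute force)."""
    neighbours = {i: set() for i in range(len(points))}
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        # A triangle is Delaunay when no other point lies strictly inside its circumcircle.
        empty = all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - eps
            for m, (px, py) in enumerate(points)
            if m not in (i, j, k)
        )
        if empty:
            for p, q in ((i, j), (j, k), (i, k)):
                neighbours[p].add(q)
                neighbours[q].add(p)
    return neighbours

# A centre point, four square corners, and one far-away point.
points = [(0, 0), (1, 1), (1, -1), (-1, -1), (-1, 1), (5, 0)]
nb = voronoi_neighbours(points)
```

In this toy set the far point (index 5) is a neighbour of corner 1 although it lies farther from it than the opposite corner (index 3), which is not a neighbour. This is the sense in which the definition does not depend on a distance cutoff.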
9
Building solvent sphere
Complex 1BTH and its "solvent sphere"
10
Voronoï protein-protein interface definition
Voronoï interface
A residue is at the Voronoï interface between
two subunits if at least one of its Voronoï
neighbours belongs to another subunit and none of
its Voronoï neighbours is of solvent type.
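The interface rule stated above translates directly into a filter over the neighbour graph. The sketch below assumes the Voronoï neighbours have already been computed and that the points of the solvent sphere carry a special label; the data layout and names are assumptions made for illustration.

```python
def interface_residues(neighbours, subunit, solvent="SOLVENT"):
    """Return residues that touch another subunit and have no solvent neighbour.

    neighbours: dict mapping a residue id to the set of its Voronoï neighbour ids
    subunit:    dict mapping every id to a subunit label (solvent points use `solvent`)
    """
    result = set()
    for res, nbrs in neighbours.items():
        own = subunit[res]
        if own == solvent:
            continue  # solvent points are never interface residues
        touches_other = any(subunit[n] not in (own, solvent) for n in nbrs)
        touches_solvent = any(subunit[n] == solvent for n in nbrs)
        if touches_other and not touches_solvent:
            result.add(res)
    return result

# Toy example: A2 touches subunit B but also a solvent point, so it is excluded.
toy_neighbours = {
    "A1": {"A2", "B1"},
    "A2": {"A1", "B1", "W1"},
    "B1": {"A1", "A2"},
    "W1": {"A2"},
}
toy_subunit = {"A1": "A", "A2": "A", "B1": "B", "W1": "SOLVENT"}
toy_interface = interface_residues(toy_neighbours, toy_subunit)
```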
11
Measurements
12
Distances
All distance distributions between cell centroids at the interface were computed. They can be fitted with classical statistical functions.
Usually, neighbourhood between residues is defined from distance, so distance cannot be used as a parameter. Here, neighbourhood is defined from the Voronoï diagram and is not based on distance.
[Figure: distance distribution between leucine and valine at the interface; axes: distance, number]
[Figure: distance distribution between lysine and aspartic acid at the interface; axes: distance, number]
13
Searching for a scoring function
Need for a learning dataset:
  • exhaustive extraction of a non-redundant dataset of binary protein-protein complexes from the whole PDB release → 102 binary complexes
  • generation of non-native conformations for learning
Validation:
  • ten-fold cross-validation for each machine learning procedure
14
Learning attributes
  • Input variables
  • - Voronoï interface area: 1
  • At the interface
  • - number of amino acids: 1
  • - amino acid frequencies: 20
  • - mean Voronoï volumes: 20
  • - amino acid pair frequencies: 210
  • - mean pair distances: 210
  • Total: 462 variables
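The variable counts can be checked with a few lines. The only assumption is that the 210 pairs are unordered pairs of the 20 amino acid types, same-type pairs included (20 × 21 / 2).

```python
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues

# Unordered pairs, same-type pairs included: 20 * 21 / 2 = 210.
pairs = list(combinations_with_replacement(AMINO_ACIDS, 2))

counts = {
    "interface area": 1,
    "number of amino acids": 1,
    "amino acid frequencies": len(AMINO_ACIDS),
    "mean Voronoi volumes": len(AMINO_ACIDS),
    "pair frequencies": len(pairs),
    "mean pair distances": len(pairs),
}
total = sum(counts.values())
```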

15
Grouping of input parameters
According to physico-chemical properties: 6 groups
  • 21 (6×7/2) possible distances
  • 84 (1+1+20+20+21+21) input variables for learning procedures
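A sketch of how grouping reduces the pair descriptors. The transcript does not list the six groups, so the partition below is purely hypothetical and serves only to show the mechanics; the function name and the input format (a list of residue-type pairs in contact at the interface) are assumptions as well.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical partition of the 20 residues into 6 groups (not taken from the slides).
GROUPS = {
    "hydrophobic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}

# Unordered group pairs, same-group pairs included: 6 * 7 / 2 = 21.
GROUP_PAIRS = list(combinations_with_replacement(sorted(GROUPS), 2))

def group_pair_frequencies(contacts):
    """Frequency of each of the 21 group pairs among residue-residue contacts."""
    counts = Counter(tuple(sorted((GROUP_OF[a], GROUP_OF[b]))) for a, b in contacts)
    n = sum(counts.values())
    return {pair: (counts[pair] / n if n else 0.0) for pair in GROUP_PAIRS}

# 1 area + 1 residue count + 20 frequencies + 20 volumes + 21 pair frequencies + 21 distances
n_variables = 1 + 1 + 20 + 20 + 2 * len(GROUP_PAIRS)

freqs = group_pair_frequencies([("L", "V"), ("K", "D"), ("D", "K")])
```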
16
Machine learning procedures
  • Logistic function
  • Support Vector Machines (SVM) (won't be described here)
  • ROc based GEnetic LearneR (ROGER)
17

First model: logistic function
Parameters are discriminating
18
ROC curve for logistic function
19
Relative performance Logistic function and mean
square deviation
20
ROGER (1): ROc based GEnetic learneR
  • Genetic algorithm
  • Goal: learning a ranking function by optimization
    of the area under the ROC curve (AUC)
  • Input: set of putative complexes described by
    values of measures of interest
  • Output: a rank for each putative complex
    indicating its relevance

(1) M. Roche, J. Azé, Y. Kodratoff, M. Sebag, ECAI04, 2004
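The fitness that ROGER optimizes, the area under the ROC curve, can be computed directly from scores and labels. This minimal sketch uses the pairwise (Mann-Whitney) formulation; it is not taken from the ROGER implementation, and the function name is illustrative.

```python
def auc(scores, labels):
    """Area under the ROC curve.

    Equals the probability that a randomly chosen positive (label 1) receives
    a higher score than a randomly chosen negative (label 0); ties count 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranking of near-native conformations above decoys gives 1.0, while a random ranking gives about 0.5.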
21
ROGER: ROc based GEnetic learneR
Genetic algorithm
  • Initialisation
  • Parents (20 non-linear functions)
  • Mutation, crossover
  • 200 children
  • Selection based on fitness
  • Replacement: best of 20 + 200
  • Stop? (fitness: AUC, iterations...)
  • Output: scoring function
  • 21 independent runs
  • 10-fold cross-validation
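The loop above can be sketched as a toy (20 + 200) genetic algorithm with AUC as fitness. This is not ROGER itself: it assumes a linear scoring function (a weight vector) instead of the non-linear functions mentioned on the slide, and the tournament selection, uniform crossover and Gaussian mutation are illustrative choices.

```python
import random

def auc(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def evolve(X, y, n_parents=20, n_children=200, generations=30, sigma=0.3, seed=0):
    """Return (best weight vector, best AUC per generation)."""
    rng = random.Random(seed)
    dim = len(X[0])

    def scored(w):
        # Fitness of a weight vector = AUC of the linear scores it produces.
        fit = auc([sum(wi * xi for wi, xi in zip(w, x)) for x in X], y)
        return (fit, w)

    def pick(population):
        # Selection based on fitness: binary tournament.
        return max(rng.sample(population, 2), key=lambda t: t[0])[1]

    population = [scored([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n_parents)]
    history = []
    for _ in range(generations):
        children = []
        for _ in range(n_children):
            mum, dad = pick(population), pick(population)
            child = [rng.choice(genes) for genes in zip(mum, dad)]  # uniform crossover
            child = [g + rng.gauss(0, sigma) for g in child]        # Gaussian mutation
            children.append(scored(child))
        # Replacement: keep the best 20 of the 20 parents + 200 children.
        merged = sorted(population + children, key=lambda t: t[0], reverse=True)
        population = merged[:n_parents]
        history.append(population[0][0])
    return population[0][1], history
```

Because parents compete with their children for survival, the best AUC can never decrease from one generation to the next.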
22
Relative performance ROGER and Logistic function
23
The CAPRI Experiment
http://www.capri.ebi.ac.uk
24
Procedure used on CAPRI targets
In this study:
  • exploration is done using Dock from Wodak and Janin (1) and Haddock from Dominguez, Boelens and Bonvin (2)
  • scoring is done using no biological information
  • the scoring function was obtained with ROGER (3)
(1) J. Janin, S. Wodak (1985) Biopolymers 24:509
(2) C. Dominguez, R. Boelens, A. Bonvin (2003) J. Am. Chem. Soc. 125:1731
(3) M. Roche, J. Azé, Y. Kodratoff, M. Sebag (2004) ECAI04
25
CAPRI Target 11: cohesin-dockerin complex, rank 3 (Dock exploration)
Percentage of interface residues predicted: 43%
26
CAPRI Target 11: cohesin-dockerin complex, ranks 1-5 (Dock exploration)
27
CAPRI Target 11: cohesin-dockerin complex, ranks 1-5 (Dock exploration)
28
CAPRI Target 11: cohesin-dockerin complex, rank 4 (Haddock exploration)
Percentage of interface residues predicted: 87%
29
CAPRI Target 11: cohesin-dockerin complex, ranks 1-5 (Haddock exploration)
30
Conclusion
  • To compare scoring performance, 10 classes were
    defined depending on the fraction of native
    contacts.
  • Dock: 10 CAPRI targets analysed
  • - For 9 targets (9-15,7-19), one of the best-class
    solutions of the set was ranked number 1.
  • - For 1 target (16), the solution ranked number 1
    belonged to the 2nd best class.
  • Haddock: 5 CAPRI targets analysed
  • - For 3 targets (11,13,14), one of the best-class
    solutions of the set was ranked in the top 4. The top 50
    contains very few false positives.
  • - For 2 targets (12,15), one of the best-class
    solutions of the set was ranked in the top 2, but the top
    50 contains false positives.
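The evaluation used in these conclusions can be sketched as follows. The transcript does not give the class boundaries, so equal-width bins over the fraction of native contacts are an assumption, as are the function names; class 9 is taken to be the most native-like.

```python
def contact_class(fraction_native, n_classes=10):
    """Class index 0..n_classes-1 from the fraction of native contacts.

    Assumes equal-width bins (hypothetical; the slides do not state the bounds).
    """
    if not 0.0 <= fraction_native <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return min(int(fraction_native * n_classes), n_classes - 1)

def rank_of_best_class(classes_in_rank_order):
    """1-based rank of the first conformation belonging to the best class present.

    classes_in_rank_order: class of each conformation, sorted by the scoring
    function from best-scored to worst-scored.
    """
    best = max(classes_in_rank_order)
    return classes_in_rank_order.index(best) + 1
```

With this convention, "one of the best-class solutions was ranked number 1" corresponds to `rank_of_best_class(...) == 1`.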

31
Perspectives
  • Among all those targets, improve detection of false positives in the top 50.
  • First results of the scoring function are very promising.
  • Try other fitness functions.
  • Atomic refinement of protein-protein complexes is needed.
  • Address the ligand problem.
32
Acknowledgments
Anne Poupon, Herman van Tilbeurgh, Joël Janin
Jérôme Azé
Aalt-Jan van Dijk, Alexandre Bonvin
33
First model: logistic function
Study of the influence of the explanatory variables xi on the response variable Y, where Y has only two possible values, 1 or 0 (i.e. true or false).
  • Influence of xi on the success rate (probability)
  • Regression: logistic regression model
  • The influence of each variable is known.
  • This logistic function is equivalent to a perceptron, i.e. a neural network without a hidden layer.
  • Estimation of variable weights by maximum likelihood with the R software
  • Cross-validation
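A minimal sketch of such a logistic model, written in plain Python rather than R. It assumes the standard form P(Y = 1 | x) = 1 / (1 + exp(-(w0 + Σ wi·xi))) and fits the weights by gradient ascent on the log-likelihood, which is only one of several ways to reach the maximum-likelihood estimate and is not necessarily what the R fit used. Names and hyperparameters are illustrative.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    """P(Y = 1 | x) for weights w, where w[0] is the intercept."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximize the log-likelihood by batch gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = t - predict(w, x)  # gradient of the log-likelihood w.r.t. the linear term
            grad[0] += err
            for i, xi in enumerate(x, start=1):
                grad[i] += err * xi
        w = [wi + lr * g / len(X) for wi, g in zip(w, grad)]
    return w
```

The sign and size of each fitted weight show how the corresponding variable influences the predicted probability, which is the sense in which each variable's influence is known.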
34
A second model
Measure of mean square deviations
Each variable xi measured on a complex is compared to its mean value on the reference set of true complexes.
That means
35
Recall and sensitivity definitions
36
Volumes
Mean Voronoï volumes for the amino acids
Correlation between core Voronoï volumes and Pontius volumes → 0.93 → linear relationship
Mean Voronoï volumes of interface cells are significantly larger than Voronoï volumes of inner residues.
[Figure: interface residues, buried residues and Pontius (1) volumes of amino acids; axes: amino acid, volume (Å³)]
(1) Pontius et al., J. Mol. Biol., 1996 Nov 22; 264(1):121-36