Prediction Enhancement of Protein-Water Binding Conservation through Evolutionary Computation - PowerPoint PPT Presentation

Slides: 13
Provided by: michaelr95

Transcript and Presenter's Notes



1
Prediction Enhancement of Protein-Water Binding
Conservation through Evolutionary Computation
  • Michael Peterson, Travis Doom, Michael Raymer
  • Abstract
  • The design of drugs to combat various
    diseases is an extremely expensive and
    time-consuming process. Computational ligand
    screening has the potential to reduce the
    time and expense associated with drug lead
    discovery. Correctly predicting sites of water
    conservation on a protein surface can
    significantly increase the accuracy of ligand
    screening efforts. Traditional classification
    methods make correct predictions with
    approximately 60% accuracy. The goal of our
    research is to improve prediction accuracy by
    applying evolutionary computation (EC) to
    traditional methods of data classification. We
    present a method that improves accuracy by
    applying EC feature selection and extraction
    techniques to k-nearest neighbor (knn) and naïve
    Bayes classifiers. To facilitate this research,
    we developed a versatile EC engine in Java.
    Despite Java's object-oriented nature, few
    general-purpose Java-based EC engines exist. Our
    engine, which has several unique features, will
    therefore be useful to the EC community, and it
    will be made available via the World Wide Web.

2
I. Protein-Ligand Binding
  • When a ligand binds to the surface of a protein,
    water molecules near the binding site are either
    conserved or displaced.
  • Our goal is to predict sites of water
    conservation with high accuracy.

[Figure: protein surface, water molecule, and ligand]
3
II. Protein-Water Measurements
  • Within a set of 30 proteins, 8 features are
    measured for each water molecule:
  • Temperature factor (BVAL)
  • Atomic Density (ADN)
  • Atomic Hydrophilicity (AHP)
  • Hydrogen bonds to protein (HBDP)
  • Hydrogen bonds to water (HBDW)
  • Mobility (MOB)
  • ABVAL (Avg. B-val of protein atom neighbors)
  • NBVAL (Net B-val of protein atom neighbors)

4
III. Feature-Weighted knn Classification
Applying weights to measured features can improve
the accuracy of a k-nearest neighbor (knn) classifier.
The weights can be optimized by a genetic algorithm.
[Figure: (a) points labeled P and W plotted against Feature 1 and Feature 2; (b) the same points after an axis scale is extended]
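The weighting idea can be sketched in Java, the language of the authors' EC engine. This is a minimal illustration, not the authors' code: the class name WeightedKnn, the integer label encoding, and the choice of a weighted Euclidean distance with majority voting are assumptions. Each feature difference is multiplied by its weight, so a larger weight stretches that axis and a weight of zero removes it.

```java
import java.util.Arrays;

/** Sketch of a feature-weighted k-nearest-neighbor classifier (illustrative, not the authors' code). */
final class WeightedKnn {
    private final double[][] trainX; // training feature vectors, one row per example
    private final int[] trainY;      // class labels 0..C-1 (hypothetical encoding, e.g. 0 = P, 1 = W)
    private final int k;             // number of neighbors that vote

    WeightedKnn(double[][] trainX, int[] trainY, int k) {
        this.trainX = trainX;
        this.trainY = trainY;
        this.k = k;
    }

    /** Weighted Euclidean distance: each feature difference is scaled by its weight before squaring. */
    static double distance(double[] a, double[] b, double[] w) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = w[i] * (a[i] - b[i]);
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the majority label among the k training points nearest to the query under weights w. */
    int classify(double[] query, double[] w) {
        Integer[] order = new Integer[trainX.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort training indices by weighted distance to the query.
        Arrays.sort(order, (i, j) -> Double.compare(
                distance(query, trainX[i], w), distance(query, trainX[j], w)));
        int numClasses = Arrays.stream(trainY).max().getAsInt() + 1;
        int[] votes = new int[numClasses];
        for (int n = 0; n < Math.min(k, order.length); n++) votes[trainY[order[n]]]++;
        int best = 0; // ties go to the lowest label
        for (int c = 1; c < numClasses; c++) if (votes[c] > votes[best]) best = c;
        return best;
    }
}
```

For example, with training points (1, 0) labeled 0 and (0, 2) labeled 1 and k = 1, a query at the origin is classified as 0 with weights (1, 1) but as 1 with weights (3, 1): stretching the Feature 1 axis pushes (1, 0) to distance 3, beyond (0, 2) at distance 2.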
5
IV. A Parameterized Discriminant for a Bayes
Classifier
  • A confidence value P is computed for each
    class.
  • The class with the greatest value of P is
    selected.
  • When C1 = C2 = ... = Cd = 1, the discriminant
    function is equivalent to the naïve Bayes
    classifier.

The values for the coefficients, C1..Cd, are
supplied by an evolutionary computation optimizer.
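The slide's discriminant formula did not survive extraction, so the sketch below uses an assumed form that has the stated property: g_j(x) = log P(class j) + C1·log p(x1 | class j) + ... + Cd·log p(xd | class j). With every Ci = 1 this is the logarithm of the naïve Bayes product, so it selects the same class. The class name WeightedBayes and the precomputed log-likelihood inputs are hypothetical; this is not necessarily the authors' discriminant.

```java
/**
 * Sketch of a coefficient-weighted naive Bayes discriminant.
 * ASSUMED form (the original slide formula is not available):
 *   g_j(x) = log P(class j) + sum_i C_i * log p(x_i | class j)
 * With every C_i = 1 this is the log of the naive Bayes product, so it picks the same class.
 */
final class WeightedBayes {
    /** Discriminant value for one class; assumes every likelihood is positive (finite log). */
    static double discriminant(double logPrior, double[] logLikelihood, double[] c) {
        double g = logPrior;
        for (int i = 0; i < c.length; i++) {
            g += c[i] * logLikelihood[i]; // coefficient C_i scales feature i's evidence
        }
        return g;
    }

    /** Returns the index of the class with the greatest discriminant value. */
    static int classify(double[] logPrior, double[][] logLikelihood, double[] c) {
        int best = 0;
        double bestG = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < logPrior.length; j++) {
            double g = discriminant(logPrior[j], logLikelihood[j], c);
            if (g > bestG) {
                bestG = g;
                best = j;
            }
        }
        return best;
    }
}
```

With equal priors and per-feature likelihoods (0.9, 0.2) for class 0 and (0.3, 0.8) for class 1, coefficients (1, 1) reproduce naïve Bayes and select class 1 (0.09 < 0.12 after the prior), while coefficients (1, 0) discount the second feature entirely and select class 0.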
6
Va. A Simple Evolutionary Algorithm
Over a number of generations, the values in a
population of individuals are optimized via
operations abstracted from natural selection.
[Figure: cycle of evaluation, selection, and genetic operators, with fitness increasing over generations]
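The evaluate/select/vary cycle can be sketched as a small generational loop. Binary tournament selection, uniform crossover, Gaussian mutation, elitism, and every name and rate here (SimpleEA, the 0.1 mutation probability and step size) are illustrative assumptions, not details from the slides.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Minimal generational evolutionary algorithm (illustrative sketch; maximizes fitness). */
final class SimpleEA {
    /** Evolves real-valued individuals initialized in [0, 1) and returns the fittest one evaluated. */
    static double[] run(ToDoubleFunction<double[]> fitness, int genes, int popSize,
                        int generations, long seed) {
        Random rng = new Random(seed);
        double[][] pop = new double[popSize][genes];
        for (double[] ind : pop) {
            for (int g = 0; g < genes; g++) ind[g] = rng.nextDouble();
        }
        double[] best = pop[0].clone();
        double bestFit = fitness.applyAsDouble(best);

        for (int gen = 0; gen < generations; gen++) {
            // Evaluation: score every individual and remember the best seen so far.
            double[] fit = new double[popSize];
            for (int p = 0; p < popSize; p++) {
                fit[p] = fitness.applyAsDouble(pop[p]);
                if (fit[p] > bestFit) {
                    bestFit = fit[p];
                    best = pop[p].clone();
                }
            }
            double[][] next = new double[popSize][];
            next[0] = best.clone(); // elitism: the best individual survives unchanged
            for (int p = 1; p < popSize; p++) {
                // Selection: two binary tournaments pick the parents.
                double[] a = pop[tournament(fit, rng)];
                double[] b = pop[tournament(fit, rng)];
                // Genetic operators: uniform crossover, then occasional Gaussian mutation.
                double[] child = new double[genes];
                for (int g = 0; g < genes; g++) {
                    child[g] = rng.nextBoolean() ? a[g] : b[g];
                    if (rng.nextDouble() < 0.1) child[g] += 0.1 * rng.nextGaussian();
                }
                next[p] = child;
            }
            pop = next;
        }
        return best;
    }

    /** Binary tournament: index of the fitter of two randomly chosen individuals. */
    private static int tournament(double[] fit, Random rng) {
        int i = rng.nextInt(fit.length);
        int j = rng.nextInt(fit.length);
        return fit[i] >= fit[j] ? i : j;
    }
}
```

Because the best individual is carried forward unchanged, the best fitness found never decreases from one generation to the next, which is one simple way to obtain the "increasing fitness" trend in the figure.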
7
V. GA/Classifier Hybrid Architecture
  • Genetic Algorithm: maintains a population of
    feature weight/mask sets, each of the form
    W1 W2 ... W8 M1 M2 ... M8.
  • Weight Vector: weights to use for each feature
    axis during classification.
  • Mask Vector: masks used to hide features during
    classification, as a method of feature selection.
  • KNN Classifier: classifies the data using each
    weight/mask set supplied by the genetic algorithm.
  • Fitness: based on the number of correct
    predictions using the weight vector and the
    number of masked features.
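A hypothetical fitness function for one weight/mask chromosome can be sketched as follows. The slide says only that fitness depends on the number of correct predictions and the number of masked features; the leave-one-out knn evaluation, the additive combination, the maskBonus parameter, and the class name WeightMaskFitness are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of a GA fitness for a (weight vector, mask vector) chromosome.
 * ASSUMED form: leave-one-out knn correct predictions + maskBonus * (number of masked features).
 */
final class WeightMaskFitness {
    private final double[][] x;     // feature vectors, one row per water molecule
    private final int[] y;          // class labels 0..C-1
    private final int k;            // neighbors that vote
    private final double maskBonus; // hypothetical reward per masked feature

    WeightMaskFitness(double[][] x, int[] y, int k, double maskBonus) {
        this.x = x;
        this.y = y;
        this.k = k;
        this.maskBonus = maskBonus;
    }

    /** masked[i] == true hides feature i from the classifier (its effective weight becomes 0). */
    double evaluate(double[] weights, boolean[] masked) {
        double[] w = new double[weights.length];
        int maskedCount = 0;
        for (int i = 0; i < w.length; i++) {
            if (masked[i]) maskedCount++;
            else w[i] = weights[i];
        }
        int correct = 0;
        for (int q = 0; q < x.length; q++) {
            if (predictLeavingOut(q, w) == y[q]) correct++;
        }
        return correct + maskBonus * maskedCount;
    }

    /** Classifies example q using every other example as the knn training set. */
    private int predictLeavingOut(int q, double[] w) {
        List<Integer> others = new ArrayList<>();
        for (int i = 0; i < x.length; i++) if (i != q) others.add(i);
        others.sort(Comparator.comparingDouble(i -> squaredDistance(x[q], x[i], w)));
        int[] votes = new int[Arrays.stream(y).max().getAsInt() + 1];
        for (int n = 0; n < Math.min(k, others.size()); n++) votes[y[others.get(n)]]++;
        int best = 0; // ties go to the lowest label
        for (int c = 1; c < votes.length; c++) if (votes[c] > votes[best]) best = c;
        return best;
    }

    /** Squared weighted Euclidean distance (the square root is unnecessary for ranking). */
    private static double squaredDistance(double[] a, double[] b, double[] w) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = w[i] * (a[i] - b[i]);
            sum += d * d;
        }
        return sum;
    }
}
```

On the toy data {(0, 5), (0.1, 0)} labeled 0 and {(1, 5), (1.1, 0)} labeled 1 with k = 1, the second feature is pure noise: leaving it unmasked makes every leave-one-out prediction wrong (fitness 0), while masking it gives 4 correct predictions plus a 0.5 bonus (fitness 4.5).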
8
VI. Original EC Engine
  • To optimize weights and masks, we
    implemented a new EC engine with several useful
    features.
  • Chromosome consists of a feature vector and an
    optional mask vector. Mask bits are used to
    block features from being passed to the fitness
    function.

F1 F2 ... Fn M1 M2 ... Mn
  • Groups: the feature vector may consist of groups
    of real, integer, or boolean values. Each group
    may have its own mutation rate and reproduction
    method.
  • Mutation: each group may have its own mutation
    rate and method, and the mask vector has a
    separate mutation rate and method. Random,
    range-based, and variance mutations are permitted.
  • Reproduction: 1-point, 2-point, or uniform
    crossover is permitted, either on a per-group
    basis or across the entire feature chromosome.
    These methods also apply to the mask chromosome.
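The three crossover styles can be sketched by first building a "take this gene from parent A" pattern and then applying it to either a real-valued feature vector or a boolean mask vector, which lets the mask use its own crossover method. The class name Crossover and its methods are hypothetical; per-group crossover could presumably call pattern once per group's index range, but how the engine actually does this is not described.

```java
import java.util.Random;

/** Sketch of 1-point, 2-point, and uniform crossover for feature and mask vectors. */
final class Crossover {
    enum Kind { ONE_POINT, TWO_POINT, UNIFORM }

    /** Pattern of length n: true means "take this gene from parent A", false means parent B. */
    static boolean[] pattern(int n, Kind kind, Random rng) {
        boolean[] fromA = new boolean[n];
        switch (kind) {
            case ONE_POINT: {
                if (n < 2) throw new IllegalArgumentException("1-point crossover needs n >= 2");
                int cut = 1 + rng.nextInt(n - 1);              // cut point in [1, n-1]
                for (int i = 0; i < cut; i++) fromA[i] = true; // prefix from A, suffix from B
                break;
            }
            case TWO_POINT: {
                int p = rng.nextInt(n + 1);
                int q = rng.nextInt(n + 1);
                int lo = Math.min(p, q);
                int hi = Math.max(p, q);
                for (int i = 0; i < n; i++) fromA[i] = i < lo || i >= hi; // middle segment from B
                break;
            }
            case UNIFORM:
                for (int i = 0; i < n; i++) fromA[i] = rng.nextBoolean(); // each gene chosen independently
                break;
        }
        return fromA;
    }

    /** Builds a real-valued child (feature vector) from parents a and b. */
    static double[] apply(double[] a, double[] b, boolean[] fromA) {
        double[] child = new double[a.length];
        for (int i = 0; i < a.length; i++) child[i] = fromA[i] ? a[i] : b[i];
        return child;
    }

    /** Builds a boolean child (mask vector) from parents a and b. */
    static boolean[] apply(boolean[] a, boolean[] b, boolean[] fromA) {
        boolean[] child = new boolean[a.length];
        for (int i = 0; i < a.length; i++) child[i] = fromA[i] ? a[i] : b[i];
        return child;
    }
}
```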

9
VII. Original EC Engine
  • Our EC engine is implemented in Java, adding
    versatility and portability.
  • The user provides the fitness function, which
    can be implemented in either Java or C/C++.
  • Our engine is ideally suited for feature
    selection and extraction problems, such as this
    protein-water binding example.
  • We use it to optimize feature weights and remove
    unnecessary features for both knn and Bayes
    classifiers.
  • The engine can be used for many problems to
    which genetic algorithms have been applied in
    the past.
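One way a Java engine might accept a user-supplied fitness function while honoring the mask bits (which block features from reaching the fitness function) is sketched below. FitnessFunction and MaskedEvaluator are invented names; the engine's real API is not described in the slides, and a C/C++ fitness function would additionally need a native bridge that is not shown.

```java
import java.util.Arrays;

/** Hypothetical user-supplied fitness hook: receives only the features that are not masked. */
@FunctionalInterface
interface FitnessFunction {
    double evaluate(double[] visibleFeatures);
}

/** Hypothetical engine-side step: drop masked features, then call the user's fitness function. */
final class MaskedEvaluator {
    static double evaluate(double[] features, boolean[] mask, FitnessFunction userFitness) {
        double[] visible = new double[features.length];
        int n = 0;
        for (int i = 0; i < features.length; i++) {
            if (!mask[i]) visible[n++] = features[i]; // a set mask bit blocks feature i
        }
        return userFitness.evaluate(Arrays.copyOf(visible, n));
    }
}
```

For instance, with features (1.0, 2.0, 3.0) and the second mask bit set, a user fitness that sums its input receives only (1.0, 3.0) and returns 4.0.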

10
VIII. Conserved vs. Non-Conserved
[Table: column headings not recoverable; values in extracted order: 0.000, 0.000, 0.000, 0.000, 0.000, 0.683, 59.83, 60.68, 60.25, 3.40, 2.48, 69, 0.000, 0.000, 0.000, 0.336, 0.000, 0.000, 0.000, 0.664]
This data is the result of previous work on
this problem by Mike Raymer, using a knn
classifier with a genetic algorithm implemented
by the University of California, San Diego.
11
VIIIb. Determinants of Solvation
  • A continuum of favorability
  • Binding: atomic hydrophilicity, atomic density,
    and hydrogen bonding potential
  • Conservation: low B-value, high occupancy, and
    many hydrogen bonds to protein atoms

12
IX. Other EC Applications
  • Structural Bioinformatics
  • Prediction of metal binding sites
  • Location of protein active sites
  • Prediction of drug lead activity (QSAR)
  • Gene Classification and Discovery
  • Prediction of gene function from microarray data
  • Ligand Screening and Docking
  • SLIDE, Volker Schnecke, MSU
  • Crystallographic Solvent Fitting
  • Xfit, Duncan McRee, Scripps