Title: Enhancing Search Space Diversity in Multi-Objective Evolutionary Drug Molecule Design using Niching
M.T.M. Emmerich (1), T. Bäck (1, 3), A. Aleman (1), A.P. IJzerman (2), E. van der Horst (2), J.W. Kruisselbrink (1), A. Bender (2)
1. Leiden Institute of Advanced Computer Science (LIACS)
2. Leiden/Amsterdam Center for Drug Research (LACDR)
3. NuTech Solutions, Inc.
Scope: drug design and development
- Search for molecular structures with a specific pharmacological or biological activity that influences the behavior of certain targeted cells
- Objectives: maximize the potency of the drug (and minimize side effects)
- Constraints: stability, synthesizability, drug-likeness, etc.
- A huge search space: 10^20 to 10^60 drug-like molecules
- Aim: provide the medicinal chemist with a set of molecular structures that are promising candidates for further research
Molecule Evolution
- Standard evolution cycle
- Graph-based mutation and recombination operators
- Deterministic elitist (µ+λ) parent selection (NSGA-II)
Fragments extracted from drug databases
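The (µ+λ) selection step can be sketched in Python as below. This is a minimal illustration, not the authors' implementation. It assumes every objective is maximized. It uses a simple quadratic front-peeling routine instead of NSGA-II's fast non-dominated sort. The names `dominates`, `nondominated_fronts`, `crowding_distance` and `select_parents` are hypothetical. Parents and offspring are pooled and filled into the next generation front by front. The last front that does not fit entirely is truncated by crowding distance.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def dominates(u: Point, v: Point) -> bool:
    """True if u Pareto-dominates v (all objectives maximized, an assumption here)."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def nondominated_fronts(points: List[Point]) -> List[List[int]]:
    """Split indices into successive non-dominated fronts (simple, not the fast NSGA-II sort)."""
    remaining = list(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        in_front = set(front)
        remaining = [i for i in remaining if i not in in_front]
    return fronts


def crowding_distance(points: List[Point], front: List[int]) -> Dict[int, float]:
    """NSGA-II crowding distance of every index in one front."""
    dist = {i: 0.0 for i in front}
    for k in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][k])
        lo, hi = points[ordered[0]][k], points[ordered[-1]][k]
        # Boundary points of each objective are always preferred
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (points[nxt][k] - points[prev][k]) / (hi - lo)
    return dist


def select_parents(points: List[Point], mu: int) -> List[int]:
    """Deterministic (mu + lambda) truncation over parents and offspring combined."""
    chosen: List[int] = []
    for front in nondominated_fronts(points):
        room = mu - len(chosen)
        if room <= 0:
            break
        if len(front) <= room:
            chosen.extend(front)
        else:
            dist = crowding_distance(points, front)
            chosen.extend(sorted(front, key=lambda i: dist[i], reverse=True)[:room])
    return chosen


pts = [(1, 1), (2, 2), (3, 1), (1, 3), (0, 0)]
print(nondominated_fronts(pts))  # [[1, 2, 3], [0], [4]]
print(select_parents(pts, 4))    # [1, 2, 3, 0]
```

With the standard NSGA-II settings from the experiments, `points` would hold the objective vectors of 50 parents plus 150 offspring, and `mu` would be 50.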
Molecule Evolution (continued) [figure]
Fitness
- Objectives
  - activity predictors based on support vector machines:
    - f1: activity predictor based on ECFP6 fingerprints
    - f2: activity predictor based on AlogP2 Estate Counts
    - f3: activity predictor based on MDL
- Constraints
  - a fuzzy constraint score based on Lipinski's rule of five and on bounds on the minimal-energy conformation
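As a crisp baseline for the constraint part, Lipinski's rule of five can be checked as follows. The sketch assumes the four descriptors have already been computed by some cheminformatics toolkit; the `Descriptors` class and function names are hypothetical. Treating at most one violation as acceptable is a common convention, not something stated on the slide. The slides use a fuzzy score rather than this pass/fail test.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


# Lipinski's upper bounds for each descriptor
RULE_OF_FIVE = {
    "mol_weight": 500.0,
    "logp": 5.0,
    "h_donors": 5,
    "h_acceptors": 10,
}


def lipinski_violations(d: Descriptors) -> int:
    """Number of rule-of-five bounds that the molecule exceeds."""
    return sum(getattr(d, name) > limit for name, limit in RULE_OF_FIVE.items())


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: at most one violation is tolerated."""
    return lipinski_violations(d) <= max_violations


print(lipinski_violations(Descriptors(620.0, 6.3, 2, 5)))  # 2
```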
Desirability indexes for modeling fuzzy constraints
- The degree of satisfaction can be measured on a scale between 0 and 1
- Constraints can be modeled in the form of desirability values
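A minimal sketch of how a desirability function could turn such a bound into a value between 0 and 1. It assumes a Derringer-style one-sided ("smaller is better") function and a geometric mean to combine several desirabilities. The `target`/`limit` values in the example are illustrative assumptions, not the settings used in the study: full satisfaction up to 500 Da, zero desirability from 600 Da on.

```python
import math
from typing import Sequence


def smaller_is_better(y: float, target: float, limit: float, s: float = 1.0) -> float:
    """One-sided desirability: 1 up to `target`, 0 from `limit` on, a power curve in between."""
    if y <= target:
        return 1.0
    if y >= limit:
        return 0.0
    return ((limit - y) / (limit - target)) ** s


def overall_desirability(ds: Sequence[float]) -> float:
    """Geometric mean of individual desirabilities; any 0 makes the result 0."""
    if not ds:
        raise ValueError("need at least one desirability value")
    if any(d == 0.0 for d in ds):
        return 0.0
    return math.prod(ds) ** (1.0 / len(ds))


# Illustrative soft bounds (assumed): molecular weight in Da, logP
d_weight = smaller_is_better(550.0, target=500.0, limit=600.0)  # 0.5
d_logp = smaller_is_better(4.0, target=5.0, limit=7.0)          # 1.0
print(overall_desirability([d_weight, d_logp]))                  # ≈ 0.707
```

The geometric mean is one design choice among several. It makes a single fully violated constraint (desirability 0) veto the whole molecule, while partial violations only lower the score.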
Diversity for Molecule Evolution
- A normal search yields very similar molecular structures
- Aim for a set of diverse candidate structures, because:
  - vague objective functions may lead to structures that fail in practice
  - the chemist wants a set of promising structures rather than a single solution
- Explicit methods, i.e. niching, are required to enforce diversity in the search space
Typical output of a normal evolutionary search [figure]
All molecules are variations of the same theme!
Niching in Multi-Objective EA
- Explicitly aim for diversity in the decision space
- This is different from aiming for diversity in the objective space
- Points that lie far apart in the objective space do not necessarily lie far apart in the decision space
Niching-based NSGA-II
A niching-based NSGA-II algorithm as proposed by Shir et al. [figure]
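The slides do not spell out the selection step of the niching-based NSGA-II, so the following is only a hedged sketch of the general idea. Once every individual carries a niche label, parents are chosen separately within each niche. As a simplification, individuals inside a niche are ranked by how many niche members dominate them. This dominance count is swapped in for full NSGA-II front ranking. All objectives are assumed to be maximized, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Point = Sequence[float]


def dominates(u: Point, v: Point) -> bool:
    """True if u Pareto-dominates v, assuming all objectives are maximized."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def niche_parent_selection(points: List[Point], niche_of: List[Hashable],
                           parents_per_niche: int) -> List[int]:
    """Choose up to `parents_per_niche` parents from each niche independently."""
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for i, niche in enumerate(niche_of):
        members[niche].append(i)
    parents: List[int] = []
    for group in members.values():
        # Dominance count within the niche: how many niche members dominate i
        dominated_by = {i: sum(dominates(points[j], points[i]) for j in group) for i in group}
        ranked = sorted(group, key=lambda i: dominated_by[i])
        parents.extend(ranked[:parents_per_niche])
    return parents


pts = [(1, 1), (2, 2), (0, 5), (0, 4)]
print(niche_parent_selection(pts, [0, 0, 1, 1], 1))  # [1, 2]
```

With the experimental settings, there would be 10 niche labels and `parents_per_niche` would be 5.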
Dynamic Niche Identification
B.L. Miller and M.J. Shaw, "Genetic algorithms with dynamic niche sharing for multimodal function optimization", Proceedings of the IEEE International Conference on Evolutionary Computation, May 1996, pp. 786-791.
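Miller and Shaw's dynamic niche identification can be sketched as a greedy procedure. Walk through the population from best to worst, and declare an individual a new peak if it lies farther than the niche radius from every peak found so far. Stop when the maximum number of niches is reached. Every individual then joins the niche of its nearest peak, provided that peak is within the radius. This is a simplified sketch with hypothetical function names. In particular, how a multi-objective population is ranked best-first (for example by non-dominated front) is an assumption here. The toy example uses numbers on a line with the absolute difference as distance; for molecules, `dist` would be a fingerprint distance.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def dynamic_peak_identification(ranked: Sequence[T], radius: float, max_niches: int,
                                dist: Callable[[T, T], float]) -> List[int]:
    """Greedy peak selection; `ranked` must already be sorted best-first."""
    peaks: List[int] = []
    for i, candidate in enumerate(ranked):
        if len(peaks) == max_niches:
            break
        # A new peak must lie farther than `radius` from every existing peak
        if all(dist(candidate, ranked[p]) > radius for p in peaks):
            peaks.append(i)
    return peaks


def assign_to_niches(ranked: Sequence[T], peaks: List[int], radius: float,
                     dist: Callable[[T, T], float]) -> List[Optional[int]]:
    """Label each individual with its nearest peak's niche, or None if none is within `radius`."""
    labels: List[Optional[int]] = []
    for ind in ranked:
        if not peaks:
            labels.append(None)
            continue
        d, niche = min((dist(ind, ranked[p]), n) for n, p in enumerate(peaks))
        labels.append(niche if d <= radius else None)
    return labels


def line_distance(u: float, v: float) -> float:
    return abs(u - v)


ranked = [0.0, 0.1, 1.0, 1.05, 2.0]
peaks = dynamic_peak_identification(ranked, radius=0.5, max_niches=2, dist=line_distance)
print(peaks)                                                # [0, 2]
print(assign_to_niches(ranked, peaks, 0.5, line_distance))  # [0, 0, 1, 1, None]
```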
Similarity in Molecular Spaces
- How can a similarity measure be defined for graph-like molecular structures?
- Idea: use molecular fingerprints
  - Molecules are represented by bitstrings that encode certain structural properties
  - A 1 at position i denotes the presence of property i in the molecule; a 0 at position i denotes its absence
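To make the bitstring idea concrete, the sketch below builds a fingerprint from a fixed, hypothetical list of structural features. Real fingerprints such as ECFP6 are derived by hashing circular atom environments into a much longer bit vector. This keyed toy version only illustrates the presence/absence encoding.

```python
from typing import AbstractSet, List

# Hypothetical feature dictionary; real fingerprints use many more bits
FEATURES: List[str] = [
    "aromatic ring",
    "carboxylic acid",
    "amide",
    "halogen",
    "hydroxyl",
    "primary amine",
]


def fingerprint(present: AbstractSet[str]) -> str:
    """Bit i is '1' if FEATURES[i] occurs in the molecule, else '0'."""
    return "".join("1" if feature in present else "0" for feature in FEATURES)


print(fingerprint({"amide", "hydroxyl"}))  # 001010
```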
Distance based on fingerprints
- The distance between two molecules A and B can be based on four counts:
  - a: the number of properties present only in A
  - b: the number of properties present only in B
  - c: the number of properties present in both A and B
  - d: the number of properties present in neither A nor B
- One possible distance measure is based on the Jaccard coefficient (also known as the Tanimoto coefficient), which does not use d:
$$S_J(A,B) = \frac{c}{a+b+c}, \qquad d_J(A,B) = 1 - S_J(A,B) = \frac{a+b}{a+b+c}$$
The Jaccard distance satisfies the triangle inequality, unlike, for example, the cosine distance!
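The four counts and the Jaccard distance translate directly into code. This sketch assumes fingerprints are equal-length strings of '0' and '1' characters; the function names are hypothetical. The formula is undefined when a + b + c = 0, so treating two all-zero fingerprints as identical (distance 0) is a convention chosen here.

```python
from typing import Tuple


def counts(fp_a: str, fp_b: str) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length '0'/'1' fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have equal length")
    pairs = list(zip(fp_a, fp_b))
    a = sum(p == ("1", "0") for p in pairs)  # only in A
    b = sum(p == ("0", "1") for p in pairs)  # only in B
    c = sum(p == ("1", "1") for p in pairs)  # in both
    d = sum(p == ("0", "0") for p in pairs)  # in neither
    return a, b, c, d


def jaccard_distance(fp_a: str, fp_b: str) -> float:
    a, b, c, _ = counts(fp_a, fp_b)  # d plays no role in the Jaccard coefficient
    if a + b + c == 0:
        return 0.0  # convention chosen here: two empty fingerprints count as identical
    return (a + b) / (a + b + c)


print(counts("1101", "1011"))            # (1, 1, 2, 0)
print(jaccard_distance("1101", "1011"))  # 0.5
```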
Triangle inequality
- Why do we want a dissimilarity (distance) measure that obeys the triangle inequality?
- If molecule A is very similar to B, and A is also very similar to C,
- then we want to be able to say that B is similar to C
- Formally, d(B, C) ≤ d(B, A) + d(A, C): if both distances on the right are small, d(B, C) must be small as well
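A quick numerical check illustrates the claim about the triangle inequality. The sketch enumerates every non-empty fingerprint over three hypothetical features, with each fingerprint represented as the set of its 'on' bits. It then tests every ordered triple for d(x, z) > d(x, y) + d(y, z). It compares the Jaccard distance with the cosine distance 1 - cos(x, y). Function names are illustrative.

```python
import itertools
import math
from typing import Callable, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def jaccard_distance(x: Fingerprint, y: Fingerprint) -> float:
    union = len(x | y)
    return 1.0 - len(x & y) / union if union else 0.0


def cosine_distance(x: Fingerprint, y: Fingerprint) -> float:
    # For binary vectors, cos(x, y) = c / sqrt(|x| * |y|); inputs must be non-empty
    return 1.0 - len(x & y) / math.sqrt(len(x) * len(y))


def triangle_violations(dist: Callable[[Fingerprint, Fingerprint], float],
                        items: List[Fingerprint],
                        tol: float = 1e-12) -> List[Tuple[Fingerprint, ...]]:
    """All ordered triples (x, y, z) with d(x, z) > d(x, y) + d(y, z)."""
    return [(x, y, z) for x, y, z in itertools.permutations(items, 3)
            if dist(x, z) > dist(x, y) + dist(y, z) + tol]


# Every non-empty fingerprint over 3 hypothetical features
items = [frozenset(bits) for r in range(1, 4)
         for bits in itertools.combinations(range(3), r)]

print(len(triangle_violations(jaccard_distance, items)))     # 0
print(len(triangle_violations(cosine_distance, items)) > 0)  # True
```

For the cosine distance, one violating triple is x = {0}, y = {0, 1}, z = {1}. Here d(x, y) = d(y, z) = 1 - 1/√2 ≈ 0.29, whereas d(x, z) = 1. So y is close to both x and z, yet x and z are maximally dissimilar, which is exactly the situation the triangle inequality rules out.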
Molecule Evolution with Niching [figure]
Experiments
- Aim
  - Compare the niching-based NSGA-II with the standard NSGA-II
- Two test cases
  - Find ligands for the Neuropeptide Y2 receptor (NPY2)
  - Find inhibitors of lipoxygenase (LOX)
- Two objectives
  - Aggregated fitness score based on the activity predictors
  - Aggregated constraint score
Experimental setup
- 5 runs of each method on each test case
- 1000 generations per run
- Standard NSGA-II
  - 50 parents
  - 150 offspring
- Niching-based NSGA-II
  - 10 niches
  - 5 parents per niche
  - 150 offspring
  - niche radius: 0.85 (set empirically)
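The settings above imply the following budget, assuming each offspring is evaluated once and ignoring the initial population. Both variants keep the same total number of parents and offspring. They differ mainly in how the parents are distributed over the search space.

```latex
\begin{aligned}
\text{parents (niching)} &= 10\ \text{niches} \times 5\ \text{parents per niche} = 50 = \text{parents (standard)}\\
\text{offspring per run} &= 1000\ \text{generations} \times 150\ \text{offspring} = 150\,000
\end{aligned}
```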
Average Pareto Fronts [figure; panels: NPY2, LOX]
Average distance between the individuals in the final populations [figure; panels: NPY2, LOX]
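The diversity measure behind this figure can be sketched as the mean distance over all unordered pairs of individuals in a population. The slide does not state which distance was averaged. Using the Jaccard distance on fingerprints is therefore an assumption, as are the function names. Here each fingerprint is represented as the set of its 'on' bits.

```python
from itertools import combinations
from statistics import mean
from typing import FrozenSet, Sequence


def jaccard_distance(x: FrozenSet[int], y: FrozenSet[int]) -> float:
    union = len(x | y)
    return 1.0 - len(x & y) / union if union else 0.0


def average_pairwise_distance(population: Sequence[FrozenSet[int]]) -> float:
    """Mean Jaccard distance over all unordered pairs (needs at least 2 individuals)."""
    return mean(jaccard_distance(x, y) for x, y in combinations(population, 2))


# A population of identical molecules has average distance 0.0
print(average_pairwise_distance([frozenset({0, 1})] * 3))  # 0.0
```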
Output sets of an NPY2 run without and with niching [figure]
Output sets of a LOX run without and with niching [figure]
Multi-Dimensional Scaling Plots [figure]
The chemists' view on the output
- Regarding niching
  - The molecules found with the niching method are clearly more diverse than those found by the non-niching approach
- In general
  - The molecules look reasonable overall, but
  - most molecules still possess unstable and/or toxic features, which are not easy to synthesize in practice
  - similar types of uncommon features seem to recur
Conclusions and Outlook
- Conclusions
  - Applying niching with the Jaccard distance on molecular fingerprints is a way to enhance search space diversity in molecule evolution
  - It yields more diverse sets of molecules than a normal evolutionary algorithm for molecule evolution
- Future research
  - Apply these methods to other (more sophisticated) models as well
  - In vitro testing of selected molecules found with this method
  - Incorporate more sophisticated measures of the synthesizability of candidate molecules
Thank you!
Alexander Aleman, Natural Computing Group, LIACS, Universiteit Leiden
e-mail: alexander.aleman_at_gmail.com