Machine Learning in Drug Design - PowerPoint PPT Presentation

Slides: 58

Provided by: pagesC

Learn more at: https://pages.cs.wisc.edu

1
Machine Learning in Drug Design

David Page
Dept. of Biostatistics and Medical Informatics
and Dept. of Computer Sciences

2
Collaborators

Michael Waddell
Paul Finn
Ashwin Srinivasan
John Shaughnessy
Bart Barlogie

Frank Zhan
Stephen Muggleton
Arno Spatola
Sean McIlwain
Brian Kay

3
Outline

Overview of Drug Design
How Machine Learning Fits Into the Process
Target Search: Single Nucleotide Polymorphisms
(SNPs)
Machine Learning from Feature Vectors
Decision Trees
Support Vector Machines
Voting/Ensembles
Predicting Molecular Activity: Learning from
Structure

4
Drugs Typically Are

Small organic molecules that
modulate disease by binding to some target protein
at a location that alters the protein's behavior
(e.g., as an antagonist or agonist).
The target protein might be human (e.g., ACE for
blood pressure) or belong to an invading organism
(e.g., a surface protein of a bacterium).

5
Example of Binding
6
So, To Design a Drug...
Identify Target Protein: knowledge of proteome/genome; relevant biochemical pathways.
Determine Target Site Structure: crystallography, NMR (difficult if membrane-bound).
Synthesize a Molecule that Will Bind: imperfect modeling of structure; structures may change at binding.
And even then...
7
Molecule Binds Target But May

Bind too tightly or not tightly enough.
Be toxic.
Have other effects (side-effects) in the body.
Break down as soon as it gets into the body, or
not leave the body soon enough.
Not get to where it should in the body
(e.g., across the blood-brain barrier).
Not diffuse from the gut to the bloodstream.

8
And Every Body is Different

Even if a molecule works in the test tube and
in animal studies, it may not work in
people (i.e., it will fail in clinical trials).
A molecule may work for some people but not
others.
A molecule may cause harmful side-effects in some
people but not others.

9
Outline

Overview of Drug Design
How Machine Learning Fits Into the Process
Target Search: Single Nucleotide Polymorphisms
(SNPs)
Machine Learning from Feature Vectors
Decision Trees
Support Vector Machines
Voting/Ensembles
Predicting Molecular Activity: Learning from
Structure

10
Places to use Machine Learning

Finding target proteins.
Inferring target site structure.
Predicting who will respond positively/negatively.

11
Places to use Machine Learning

Finding target proteins.
Inferring target site structure.
Predicting who will respond positively/negatively.

12
Healthy vs. Disease
Healthy
Diseased
13
If We Could Sequence DNA Quickly and Cheaply, We
Could...

Sequence the DNA of people taking a drug, and use ML
to identify consistent differences between those
who respond well and those who do not.
Sequence the DNA of cancer cells and healthy cells,
and use ML to detect dangerous mutations; the
proteins these genes code for may be useful
targets.
Sequence the DNA of people who get a disease and
those who don't, and use ML to determine genes
related to susceptibility; the proteins these genes
code for may be useful targets.

14
Problem: Can't Sequence Quickly

Can quickly test single positions where variation
is common: Single Nucleotide Polymorphisms
(SNPs).
Can quickly test the degree to which every gene is
being transcribed: Gene Expression Microarrays
(e.g., Affymetrix Gene Chips).
Can (moderately) quickly test which proteins are
present in a sample: Proteomics.

15
Outline

Overview of Drug Design
How Machine Learning Fits Into the Process
Target Search: Single Nucleotide Polymorphisms
(SNPs)
Machine Learning from Feature Vectors
Decision Trees
Support Vector Machines
Voting/Ensembles
Predicting Molecular Activity: Learning from
Structure

16
Example of SNP Data
17
Problem: SNPs are not Genes

If we find a predictive SNP, it may not be part
of a gene; we can only infer that the SNP is
near a gene that may be involved in the
disease.
Even if the SNP is part of a gene, another
nearby gene may be the key gene.

18
Problem: Even SNPs are Costly

Typically cannot use all known SNPs.
Can focus on a particular chromosome and area if
knowledge permits that.
Can use a scattering of SNPs, since SNPs that are
very close together may be redundant: use one SNP
per haplotype block, i.e., a region where
recombination is rare.

19
Why Machine Learning?

There may be no single SNP in our data that
distinguishes disease from healthy.
It may still be possible to predict with some
combination of SNPs, and we can gain insight from
that combination.
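To make the point concrete, here is a minimal sketch on hypothetical toy data (not the study's SNPs) in which disease status is the exclusive-or of two SNPs. Every rule that looks at only one SNP stays at chance, while a rule that combines both is perfect. The variable names and the XOR pattern are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy data (not from the study): each row is (snp_a, snp_b),
# 1 = minor allele present. Disease status is snp_a XOR snp_b, so neither
# SNP on its own separates diseased from healthy.
rows = list(product([0, 1], repeat=2)) * 10
labels = [a ^ b for a, b in rows]

def accuracy(predict):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

# Best rule that looks at a single SNP: predict disease iff SNP i equals v.
# Trying both values of v also covers the complementary rule.
single_snp_best = max(
    accuracy(lambda r, i=i, v=v: int(r[i] == v))
    for i in (0, 1)
    for v in (0, 1)
)

# A rule that combines both SNPs.
combined = accuracy(lambda r: r[0] ^ r[1])

print(f"best single-SNP rule: {single_snp_best:.2f}")  # 0.50
print(f"two-SNP combination:  {combined:.2f}")         # 1.00
```

Real SNP interactions are rarely this clean; the toy case only shows why a learner that considers combinations can succeed where per-SNP screening fails.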

20
Outline

Overview of Drug Design
How Machine Learning Fits Into the Process
Target Search: Single Nucleotide Polymorphisms
(SNPs)
Machine Learning from Feature Vectors
Decision Trees
Support Vector Machines
Voting/Ensembles
Predicting Molecular Activity: Learning from
Structure

21
Decision Trees in One Picture
22
(No Transcript)
23
Naïve Bayes in One Picture
(Figure: Naive Bayes network over the features Age, SNP 1, SNP 2, ..., SNP 3000.)
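Here is a minimal sketch of how the pictured model classifies. It assumes categorical features (an age bracket plus SNP genotypes) and Laplace smoothing. The function names, the smoothing constant and the toy data are hypothetical, not taken from the study. Each feature adds its own likelihood term given the class, which is the independence assumption the picture encodes.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(X, y, alpha=1.0):
    """Count-based categorical naive Bayes with Laplace smoothing.

    X: list of feature tuples (e.g. an age bracket followed by SNP genotypes).
    y: list of class labels.
    alpha: smoothing pseudo-count (an assumed default, not from the slides).
    """
    class_counts = Counter(y)
    counts = defaultdict(Counter)  # counts[(c, j)][v]: class-c rows with feature j == v
    values = defaultdict(set)      # distinct values seen for each feature j
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            counts[(c, j)][v] += 1
            values[j].add(v)
    return class_counts, counts, values, alpha

def predict_nb(model, xs):
    """Return the class with the highest log posterior for feature tuple xs."""
    class_counts, counts, values, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior P(c)
        for j, v in enumerate(xs):
            # Naive assumption: features are independent given the class.
            k = max(len(values[j]), 1)
            score += math.log((counts[(c, j)][v] + alpha) / (nc + alpha * k))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical toy data: (age_bracket, snp_1, snp_2) -> onset class.
X = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0), (0, 1, 1), (1, 0, 0)]
y = ["early", "early", "late", "late", "early", "late"]
model = train_naive_bayes(X, y)
print(predict_nb(model, (0, 1, 1)))  # early
print(predict_nb(model, (1, 0, 0)))  # late
```

Working in log space avoids numerical underflow when thousands of SNP terms are multiplied together.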
24
Voting Approach

Score SNPs using information gain.
Choose the top-scoring 1% of SNPs.
To classify a new case, let these SNPs vote
(majority or weighted majority vote).
We use majority vote here.
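The voting procedure above can be sketched as follows. The slide does not say how a single SNP casts its vote, so this sketch assumes each selected SNP votes for the class seen most often with its value in the training data. The helper names and toy data are hypothetical. With 3000 SNPs and `fraction=0.01`, 30 SNPs would vote.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(column, labels):
    """Drop in label entropy after splitting on one SNP's observed values."""
    n = len(labels)
    remainder = 0.0
    for v in set(column):
        subset = [lab for x, lab in zip(column, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def fit_voters(X, y, fraction=0.01):
    """Keep the top `fraction` of SNPs by information gain (at least one).

    Assumption: each kept SNP votes for the majority training class seen
    with its value; unseen values fall back to the overall majority class.
    """
    n_snps = len(X[0])
    gains = [information_gain([row[j] for row in X], y) for j in range(n_snps)]
    k = max(1, int(fraction * n_snps))
    top = sorted(range(n_snps), key=lambda j: gains[j], reverse=True)[:k]
    default = Counter(y).most_common(1)[0][0]
    voters = {}
    for j in top:
        voters[j] = {
            v: Counter(lab for row, lab in zip(X, y) if row[j] == v).most_common(1)[0][0]
            for v in set(row[j] for row in X)
        }
    return voters, default

def predict_by_vote(model, row):
    """Unweighted majority vote of the selected SNPs (ties broken arbitrarily)."""
    voters, default = model
    votes = Counter(table.get(row[j], default) for j, table in voters.items())
    return votes.most_common(1)[0][0]

# Hypothetical toy data: 4 SNPs coded 0/1/2; SNPs 0 and 1 track the label.
X = [(0, 1, 0, 2), (0, 1, 1, 0), (1, 0, 0, 1), (1, 0, 1, 2)]
y = ["healthy", "healthy", "disease", "disease"]
model = fit_voters(X, y, fraction=0.5)       # keep the top 2 of 4 SNPs
print(sorted(model[0]))                      # [0, 1]
print(predict_by_vote(model, (1, 0, 1, 1)))  # disease
```

A weighted majority vote would scale each SNP's vote, for example by its information gain. This sketch keeps the unweighted form the slide says was used.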

25
Task: Predict Early Onset Disease from SNP Data

Only 3000 SNPs, coarsely sampled over the entire
genome.
80 patients (examples), 40 with early onset.
Using technology from Orchid.
Can a predictor be learned that performs
significantly better than chance on unseen data?

26
Results

Use all features, only the top 1% of features, or only
the top 10% of features (according to the decision tree's
purity measure).
Use trees, SVMs, and voting.
SVMs with the top 10% achieve 71% accuracy,
significantly better than chance (50%).

27
Lessons

Feature selection is important for performance.
Methodology note for machine learning
specialists: this entire process must be repeated on
each fold of cross-validation, or results will be
overly optimistic.
The SNP approach is promising: get funding to measure
more SNPs.
More work is needed on SVM comprehensibility.
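Here is a minimal harness for the methodology note. It assumes caller-supplied `select`, `fit` and `predict` functions, all hypothetical names. The key point is that `select` is called inside the loop, on the training folds only. Running it once on all 80 patients before splitting would let the held-out patients influence which SNPs are kept.

```python
import random

def cross_validate(X, y, select, fit, predict, k=10, seed=0):
    """k-fold cross-validation that redoes feature selection inside every fold.

    select(X_train, y_train) -> list of feature indices to keep
    fit(X_train, y_train)    -> model
    predict(model, row)      -> predicted label
    The held-out fold never reaches `select`, so it cannot influence
    which features are chosen.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        keep = select(X_tr, y_tr)  # selection sees training rows only
        model = fit([[row[j] for j in keep] for row in X_tr], y_tr)
        for i in test_idx:
            row = [X[i][j] for j in keep]
            correct += predict(model, row) == y[i]
    return correct / len(X)

# Hypothetical toy usage: feature 0 copies the label, features 1..20 are noise.
rng = random.Random(1)
X = [[lab] + [rng.randint(0, 1) for _ in range(20)] for lab in [0, 1] * 20]
y = [row[0] for row in X]

def select_top1(X_tr, y_tr):
    """Keep the single feature that agrees with the label most often."""
    n = len(X_tr[0])
    scores = [sum(r[j] == lab for r, lab in zip(X_tr, y_tr)) for j in range(n)]
    return [max(range(n), key=scores.__getitem__)]

def fit_none(X_tr, y_tr):
    return None  # the kept feature is used directly as the prediction

def predict_kept(model, row):
    return row[0]

print(cross_validate(X, y, select_top1, fit_none, predict_kept, k=5))  # 1.0
```

The same harness accepts any selector (e.g. the information-gain ranking used for voting) and any learner (trees, SVMs, voting).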

28
Outline

Overview of Drug Design
How Machine Learning Fits Into the Process
Target Search: Single Nucleotide Polymorphisms
(SNPs)
Machine Learning from Feature Vectors
Decision Trees
Support Vector Machines
Voting/Ensembles
Predicting Molecular Activity: Learning from
Structure

29
Places to use Machine Learning

Finding target proteins.
Inferring target site structure.
Predicting who will respond positively/negatively.

30
Typical Practice when Target Structure is Unknown

Test many molecules (1,000,000) to find some that
bind to the target (ligands).
Infer (induce) the shape of the target site from 3D
structural similarities.
The shared 3D substructure is called a pharmacophore.
A perfect example of a machine learning task with a
spatial target.

31
An Example of Structure Learning
Inactive
Active
32
Inductive Logic Programming

Represents data points in mathematical logic
Uses Background Knowledge
Returns results in logic

33
The Logical Representation of a Pharmacophore
34
Background Knowledge I

Information about atoms and bonds in the
molecules
atm(m1,a1,o,3,5.915800,-2.441200,1.799700).
atm(m1,a2,c,3,0.574700,-2.773300,0.337600).
atm(m1,a3,s,3,0.408000,-3.511700,-1.314000).
bond(m1,a1,a2,1).
bond(m1,a2,a3,1).

35
Background knowledge II

Definition of distance equivalence
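% Atom1 and Atom2 of Drug lie Dist apart, to within Error.
dist(Drug,Atom1,Atom2,Dist,Error) :-
    number(Error),
    coord(Drug,Atom1,X1,Y1,Z1),
    coord(Drug,Atom2,X2,Y2,Z2),
    euc_dist(p(X1,Y1,Z1),p(X2,Y2,Z2),Dist1),
    Diff is Dist1-Dist,
    absolute_value(Diff,E1),
    E1 < Error.
% Euclidean distance between two 3D points.
euc_dist(p(X1,Y1,Z1),p(X2,Y2,Z2),D) :-
    Dsq is (X1-X2)**2+(Y1-Y2)**2+(Z1-Z2)**2,
    D is sqrt(Dsq).
% Helpers not shown on the slide (assumed): coordinates are read from the
% atm/7 facts above, and absolute_value/2 wraps the built-in abs/1.
coord(Drug,Atom,X,Y,Z) :- atm(Drug,Atom,_,_,X,Y,Z).
absolute_value(X,Y) :- Y is abs(X).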

36
Central Idea: Generalize by searching a lattice
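The sketch below is a rough illustration only. It replaces first-order clauses and θ-subsumption with sets of propositional conditions ordered by inclusion, which also form a lattice. The search starts from the most general rule (no conditions) and refines downward by adding conditions drawn from a "bottom" description of one positive example. Any rule that no longer covers a positive is pruned. The condition names, toy examples and coverage score are hypothetical assumptions, not the ILP system used in this work.

```python
# Hypothetical toy setting: each example is the set of ground conditions it
# satisfies (a propositional stand-in for first-order ILP literals).
positives = [
    {"acceptor", "basic_N", "dist_CD_7A"},
    {"acceptor", "basic_N", "dist_CD_7A", "hydrophobe"},
    {"acceptor", "basic_N", "dist_CD_7A", "donor"},
]
negatives = [
    {"acceptor", "basic_N"},
    {"acceptor", "hydrophobe", "dist_CD_7A"},
]

def covers(rule, example):
    """A conjunctive rule covers an example if all its conditions hold."""
    return rule <= example

def score(rule):
    """Positives covered minus negatives covered (a simple, assumed metric)."""
    return (sum(covers(rule, e) for e in positives)
            - sum(covers(rule, e) for e in negatives))

def refine(rule, bottom):
    """Downward refinement: add one condition taken from the bottom clause."""
    return {rule | {c} for c in bottom - rule}

def search_lattice(bottom):
    """Breadth-first, general-to-specific search of the subset lattice."""
    best = frozenset()
    frontier = {frozenset()}
    while frontier:
        next_frontier = set()
        for rule in frontier:
            for child in refine(rule, bottom):
                # Specializing never gains coverage, so a rule covering no
                # positives can be pruned together with everything below it.
                if not any(covers(child, e) for e in positives):
                    continue
                if score(child) > score(best):
                    best = child
                next_frontier.add(child)
        frontier = next_frontier
    return best

bottom = positives[0]  # most specific description of one seed example
print(sorted(search_lattice(bottom)))  # ['basic_N', 'dist_CD_7A']
```

Real ILP systems search the same kind of ordering over first-order clauses, where variables let one rule refer to "some atom C" and "some atom D" in any molecule.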
37
Conformational model

Conformational flexibility modelled as multiple
conformations
Sybyl RandomSearch
Catalyst

38
Pharmacophore description

Atom and site centred
Hydrogen bond donor
Hydrogen bond acceptor
Hydrophobe
Site points (limited at present)
User definable
Distance based

39
Example I: Dopamine agonists

Agonists taken from the Martin data set on the QSAR
Society web pages
Examples (5-50 conformations/molecule)

40
Pharmacophore identified

Molecule A has the desired activity if
in conformation B molecule A contains a hydrogen acceptor at C, and
in conformation B molecule A contains a basic nitrogen group at D, and
the distance between C and D is 7.05966 ± 0.75 Angstroms, and
in conformation B molecule A contains a hydrogen acceptor at E, and
the distance between C and E is 2.80871 ± 0.75 Angstroms, and
the distance between D and E is 6.36846 ± 0.75 Angstroms, and
in conformation B molecule A contains a hydrophobic group at F, and
the distance between C and F is 2.68136 ± 0.75 Angstroms, and
the distance between D and F is 4.80399 ± 0.75 Angstroms, and
the distance between E and F is 2.74602 ± 0.75 Angstroms.
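A learned rule like the one above can be checked against new molecules with a short matcher. This is a sketch under several assumptions:

- Each conformation is a list of typed feature points with 3D coordinates.
- A molecule counts as matching if any of its stored conformations admits an assignment of distinct features to pharmacophore points with every pairwise distance within the tolerance.

The function names and the 3-point test pharmacophore are hypothetical. To encode the dopamine rule instead, use point types [acceptor, basic nitrogen, acceptor, hydrophobe] for C, D, E, F with distances {(0, 1): 7.05966, (0, 2): 2.80871, (1, 2): 6.36846, (0, 3): 2.68136, (1, 3): 4.80399, (2, 3): 2.74602} and tol=0.75.

```python
import math

def matches(conformation, point_types, distances, tol):
    """Backtracking search for an assignment of pharmacophore points to features.

    conformation: list of (feature_type, (x, y, z)) tuples for one conformer
    point_types:  required feature type for each pharmacophore point
    distances:    {(i, j): target distance in Angstroms} for point pairs i < j
    tol:          allowed deviation from each target distance
    """
    assignment = []  # assignment[i] = index of the feature playing point i

    def extend(i):
        if i == len(point_types):
            return True  # every point placed with all distances satisfied
        for f, (ftype, xyz) in enumerate(conformation):
            if ftype != point_types[i] or f in assignment:
                continue
            # Compare against every already-placed point that has a constraint.
            ok = all(
                abs(math.dist(xyz, conformation[assignment[j]][1]) - distances[(j, i)]) <= tol
                for j in range(i)
                if (j, i) in distances
            )
            if ok:
                assignment.append(f)
                if extend(i + 1):
                    return True
                assignment.pop()
        return False

    return extend(0)

def is_active(conformations, point_types, distances, tol=0.75):
    """Predict active if ANY stored conformation contains the pharmacophore."""
    return any(matches(c, point_types, distances, tol) for c in conformations)

# Hypothetical 3-point pharmacophore (a 3-4-5 triangle, not from the slides).
types = ["acceptor", "basic_N", "hydrophobe"]
dists = {(0, 1): 5.0, (0, 2): 3.0, (1, 2): 4.0}
conf_a = [("acceptor", (3.0, 0.0, 0.0)), ("basic_N", (0.0, 4.0, 0.0)),
          ("hydrophobe", (0.0, 0.0, 0.0))]
conf_b = [("acceptor", (6.0, 0.0, 0.0)), ("basic_N", (0.0, 4.0, 0.0)),
          ("hydrophobe", (0.0, 0.0, 0.0))]
print(is_active([conf_b, conf_a], types, dists))  # True
print(is_active([conf_b], types, dists))          # False
```

The "any conformation" test mirrors how the rule is stated: it asks for some conformation B in which all the distance conditions hold at once.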

41
Example II: ACE inhibitors

28 angiotensin-converting enzyme inhibitors taken
from the literature:
D. Mayer et al., J. Comput.-Aided Mol. Design, 1,
3-16 (1987)

42
Experiment 1

Attempt to identify the pharmacophore using the original
Mayer et al. data (final conformations).
The initial failed attempt was traced to bugs in the
background knowledge definition.
4 pharmacophores found with the corrected code
(variations on a common theme)

43
ACE pharmacophore

Molecule A is an ACE inhibitor if
molecule A contains a zinc-site B,
molecule A contains a hydrogen acceptor C,
the distance between B and C is 7.899 ± 0.750 Å,
molecule A contains a hydrogen acceptor D,
the distance between B and D is 8.475 ± 0.750 Å,
the distance between C and D is 2.133 ± 0.750 Å,
molecule A contains a hydrogen acceptor E,
the distance between B and E is 4.891 ± 0.750 Å,
the distance between C and E is 3.114 ± 0.750 Å,
the distance between D and E is 3.753 ± 0.750 Å.

44
Pharmacophore discovered
Zinc site H-bond acceptor
45
Experiment 2

Definition of zinc ligand added to background
knowledge
based on crystallographic data
Multiple conformations
Sybyl RandomSearch

46
Experiment 2

Original pharmacophore rediscovered, plus one
other:
a different zinc ligand position,
similar to an alternative proposed by Ciba-Geigy

47
Example III: Thermolysin inhibitors

10 inhibitors for which crystallographic data are
available in the PDB
Conformationally challenging molecules
Experimentally observed superposition

48
Key binding site interactions
(Figure: binding-site labels Asn112-NH, O=C Asn112, S2, Arg203-NH, S1, O=C Ala113, Zn.)
49
Interactions made by inhibitors
50
Pharmacophore Identification

Structures considered 1HYT 1THL 1TLP 1TMN 2TMN
4TLN 4TMN 5TLN 5TMN 6TMN
Conformational analysis using Best conformer
generation in Catalyst
98-251 conformations/molecule

51
Thermolysin Results

10 five-point pharmacophores identified, falling into
2 groups (7/10 molecules):
3 acceptors, 1 hydrophobe, 1 donor
4 acceptors, 1 donor
Common core of Zn ligands, Arg203 and Asn112
interactions identified
Correct assignments of functional groups
Correct geometry to 1 Angstrom tolerance

52
Thermolysin results

Increasing the tolerance to 1.5 Angstroms finds a common
6-point pharmacophore, including one extra
interaction

53
Example IV: Antibacterial peptides

Dataset of 11 pentapeptides showing activity
against Pseudomonas aeruginosa
6 actives (IC50 < 64 mg/ml)
5 inactives

54
Pharmacophore Identified
A molecule M is active against Pseudomonas aeruginosa if it has a conformation B such that:
M has a hydrophobic group C;
M has a hydrogen acceptor D;
the distance between C and D in conformation B is 11.7 Angstroms;
M has a positively-charged atom E;
the distance between C and E in conformation B is 4 Angstroms;
the distance between D and E in conformation B is 9.4 Angstroms;
M has a positively-charged atom F;
the distance between C and F in conformation B is 11.1 Angstroms;
the distance between D and F in conformation B is 12.6 Angstroms;
the distance between E and F in conformation B is 8.7 Angstroms.
Tolerance: 1.5 Angstroms.
55
(No Transcript)
56
Ongoing ILP developments (pharmacophores)