Machine Learning in Drug Design - PowerPoint PPT Presentation

About This Presentation
Slides: 58
Provided by: pagesC

Transcript and Presenter's Notes


1
Machine Learning in Drug Design
  • David Page
  • Dept. of Biostatistics and Medical Informatics
    and Dept. of Computer Sciences

2
Collaborators
  • Michael Waddell
  • Paul Finn
  • Ashwin Srinivasan
  • John Shaughnessy
  • Bart Barlogie
  • Frank Zhan
  • Stephen Muggleton
  • Arno Spatola
  • Sean McIlwain
  • Brian Kay

3
Outline
  • Overview of Drug Design
  • How Machine Learning Fits Into the Process
  • Target Search: Single Nucleotide Polymorphisms (SNPs)
  • Machine Learning from Feature Vectors
  • Decision Trees
  • Support Vector Machines
  • Voting/Ensembles
  • Predicting Molecular Activity: Learning from Structure

4
Drugs Typically Are
  • Small organic molecules that
  • Modulate disease by binding to some target
    protein
  • At a location that alters the protein's behavior
    (e.g., antagonist or agonist).
  • Target protein might be human (e.g., ACE for
    blood pressure) or belong to invading organism
    (e.g., surface protein of a bacterium).

5
Example of Binding
6
So To Design a Drug
  • Identify Target Protein: knowledge of proteome/genome;
    relevant biochemical pathways.
  • Determine Target Site Structure: crystallography, NMR
    (difficult if membrane-bound).
  • Synthesize a Molecule that Will Bind: imperfect modeling
    of structure; structures may change at binding.
And even then...
7
Molecule Binds Target But May
  • Bind too tightly or not tightly enough.
  • Be toxic.
  • Have other effects (side-effects) in the body.
  • Break down as soon as it gets into the body, or
    not leave the body soon enough.
  • Not get to where it should in the body
    (e.g., crossing the blood-brain barrier).
  • Not diffuse from gut to bloodstream.

8
And Every Body is Different
  • Even if a molecule works in the test tube and
    works in animal studies, it may not work in
    people (and will then fail in clinical trials).
  • A molecule may work for some people but not
    others.
  • A molecule may cause harmful side-effects in some
    people but not others.

9
Outline
  • Overview of Drug Design
  • How Machine Learning Fits Into the Process
  • Target Search: Single Nucleotide Polymorphisms (SNPs)
  • Machine Learning from Feature Vectors
  • Decision Trees
  • Support Vector Machines
  • Voting/Ensembles
  • Predicting Molecular Activity: Learning from Structure

10
Places to use Machine Learning
  • Finding target proteins.
  • Inferring target site structure.
  • Predicting who will respond positively/negatively.

11
Places to use Machine Learning
  • Finding target proteins.
  • Inferring target site structure.
  • Predicting who will respond positively/negatively.

12
Healthy vs. Disease
(Figure: panels labeled Healthy and Diseased.)
13
If We Could Sequence DNA Quickly and Cheaply, We Could
  • Sequence DNA of people taking a drug, and use ML
    to identify consistent differences between those
    who respond well and those who do not.
  • Sequence DNA of cancer cells and healthy cells,
    and use ML to detect dangerous mutations; the
    proteins the mutated genes code for may be useful
    targets.
  • Sequence DNA of people who get a disease and
    those who don't, and use ML to determine genes
    related to susceptibility; the proteins these genes
    code for may be useful targets.

14
Problem: Can't Sequence Quickly
  • Can quickly test single positions where variation
    is common: Single Nucleotide Polymorphisms
    (SNPs).
  • Can quickly test the degree to which every gene is
    being transcribed: Gene Expression Microarrays
    (e.g., Affymetrix Gene Chips).
  • Can (moderately) quickly test which proteins are
    present in a sample (Proteomics).

15
Outline
  • Overview of Drug Design
  • How Machine Learning Fits Into the Process
  • Target Search: Single Nucleotide Polymorphisms (SNPs)
  • Machine Learning from Feature Vectors
  • Decision Trees
  • Support Vector Machines
  • Voting/Ensembles
  • Predicting Molecular Activity: Learning from Structure

16
Example of SNP Data
17
Problem: SNPs are not Genes
  • If we find a predictive SNP, it may not be part
    of a gene; we can only infer that the SNP is
    near a gene that may be involved in the
    disease.
  • Even if the SNP is part of a gene, it may be
    another nearby gene that is the key gene.

18
Problem: Even SNPs are Costly
  • Typically cannot use all known SNPs.
  • Can focus on a particular chromosome and area if
    knowledge permits that.
  • Can use a scattering of SNPs, since SNPs that are
    very close together may be redundant: use one SNP
    per haplotype block, a region where
    recombination is rare.
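The one-SNP-per-block idea can be sketched as below. This is a hypothetical helper, not from the talk: it assumes SNP positions and haplotype block start coordinates on a single chromosome are already known, treats blocks as contiguous intervals, and simply keeps the first SNP in each block (a real study might prefer the most informative SNP instead).

```python
from bisect import bisect_right

def one_snp_per_block(snp_positions, block_starts):
    """Keep the first SNP (by position) in each haplotype block.

    snp_positions: dict mapping a SNP id to its position on one chromosome.
    block_starts: sorted start positions; block i covers [block_starts[i], block_starts[i + 1]).
    """
    chosen = {}
    for snp_id, pos in sorted(snp_positions.items(), key=lambda item: item[1]):
        block = bisect_right(block_starts, pos) - 1  # index of the block containing pos
        if block >= 0 and block not in chosen:       # SNPs before the first block are skipped
            chosen[block] = snp_id
    return [chosen[b] for b in sorted(chosen)]

# Hypothetical SNP ids and coordinates, for illustration only.
snps = {"snpA": 100, "snpB": 150, "snpC": 1200, "snpD": 1300, "snpE": 5000}
print(one_snp_per_block(snps, block_starts=[0, 1000, 4000]))  # ['snpA', 'snpC', 'snpE']
```

Because the SNPs are visited in position order, the choice is deterministic; swapping in "best-scoring SNP per block" would only change the tie-breaking rule inside the loop.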

19
Why Machine Learning?
  • There may be no single SNP in our data that
    distinguishes disease vs. healthy.
  • It may still be possible to predict from some
    combination of SNPs, and that combination can
    itself give insight.

20
Outline
  • Overview of Drug Design
  • How Machine Learning Fits Into the Process
  • Target Search: Single Nucleotide Polymorphisms (SNPs)
  • Machine Learning from Feature Vectors
  • Decision Trees
  • Support Vector Machines
  • Voting/Ensembles
  • Predicting Molecular Activity: Learning from Structure

21
Decision Trees in One Picture
22
(No Transcript)
23
Naïve Bayes in One Picture
(Figure: nodes labeled Age, SNP 1, SNP 2, . . ., SNP 3000.)
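As a minimal sketch of what slide 23 pictures (not the talk's implementation), the classifier below assumes every feature is categorical, for example a SNP genotype coded 0, 1 or 2 (a hypothetical encoding); a continuous feature such as Age would need binning first. It scores each class by P(class) times P(value | class) for every feature, uses Laplace smoothing `alpha`, and works in log space so thousands of SNPs do not underflow.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples, labels, alpha=1.0):
    """Count class frequencies and per-class value frequencies for each feature."""
    class_counts = Counter(labels)
    value_counts = defaultdict(lambda: defaultdict(Counter))  # [class][feature][value]
    feature_values = defaultdict(set)                          # distinct values seen per feature
    for x, y in zip(examples, labels):
        for j, v in enumerate(x):
            value_counts[y][j][v] += 1
            feature_values[j].add(v)
    return class_counts, value_counts, feature_values, alpha

def predict_naive_bayes(model, x):
    """Return the class maximizing log P(class) + sum_j log P(x_j | class)."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)
        for j, v in enumerate(x):
            k = len(feature_values[j])  # number of distinct values seen for feature j
            score += math.log((value_counts[cls][j][v] + alpha) / (n_cls + alpha * k))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

# Toy genotypes for two SNPs (hypothetical data).
X = [(0, 0), (0, 1), (1, 0), (2, 2), (2, 1), (1, 2)]
y = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]
model = train_naive_bayes(X, y)
print(predict_naive_bayes(model, (0, 0)))  # healthy
```

The "naïve" part is the independence assumption: each SNP's contribution is added separately, given the class, which is why the picture has no edges between the SNP nodes.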
24
Voting Approach
  • Score SNPs using information gain.
  • Choose the top 1% of SNPs by score.
  • To classify a new case, let these SNPs vote
    (majority or weighted majority vote).
  • We use majority vote here.
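The voting procedure above can be sketched as follows. The slide does not say how a single SNP casts its vote; this sketch assumes each selected SNP votes for the majority training class among patients who share the new patient's genotype at that SNP, falling back to the overall majority class for an unseen genotype. `fraction=0.01` corresponds to the top 1%, and ties in the final vote go to whichever class `Counter.most_common` lists first.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Label entropy minus the expected entropy after splitting on this SNP's values."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for v, y in zip(column, labels) if v == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def train_voters(examples, labels, fraction=0.01):
    n_features = len(examples[0])
    columns = [[x[j] for x in examples] for j in range(n_features)]
    ranked = sorted(range(n_features),
                    key=lambda j: information_gain(columns[j], labels), reverse=True)
    k = max(1, round(fraction * n_features))
    voters = {}
    for j in ranked[:k]:
        # Each kept SNP remembers the majority class seen with each of its values.
        by_value = {}
        for v, y in zip(columns[j], labels):
            by_value.setdefault(v, Counter())[y] += 1
        voters[j] = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    default = Counter(labels).most_common(1)[0][0]
    return voters, default

def classify(model, x):
    """Unweighted majority vote of the selected SNPs."""
    voters, default = model
    votes = Counter(table.get(x[j], default) for j, table in voters.items())
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): SNP 0 separates the classes perfectly.
X = [(0, 1, 2), (0, 2, 1), (1, 1, 1), (1, 2, 2)]
y = ["healthy", "healthy", "disease", "disease"]
model = train_voters(X, y, fraction=0.34)  # keeps 1 of 3 SNPs
print(classify(model, (1, 0, 0)))  # disease
```

A weighted majority vote, the other option on the slide, would replace the plain `Counter` tally with a sum of per-SNP weights such as their information gain.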

25
Task: Predict Early Onset Disease from SNP Data
  • Only 3000 SNPs, coarsely sampled over entire
    genome.
  • 80 patients (examples), 40 with early onset.
  • Using technology from Orchid.
  • Can a predictor be learned that performs
    significantly better than chance on unseen data?

26
Results
  • Use all features, only the top 1% of features, or only
    the top 10% of features (ranked by the decision tree's
    purity measure).
  • Use Trees, SVMs, Voting.
  • SVMs with the top 10% of features achieve 71% accuracy,
    significantly better than chance (50%).

27
Lessons
  • Feature selection is important for performance.
  • Methodology note for machine learning
    specialists: the entire process, feature selection
    included, must be repeated on each fold of
    cross-validation, or results will be overly optimistic.
  • The SNP approach is promising: get funding to measure
    more SNPs.
  • More work is needed on SVM comprehensibility.
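The methodology note can be made concrete with a small cross-validation harness. This is a sketch with hypothetical plug-in functions `select_features`, `train` and `predict`; what it illustrates is that feature selection is called on the training folds only, so held-out patients never influence which SNPs are chosen. Folds are interleaved, so the data are assumed to be shuffled beforehand.

```python
def k_fold_indices(n, k):
    """Interleaved folds: fold i holds indices i, i + k, i + 2k, ... (shuffle data first)."""
    return [list(range(i, n, k)) for i in range(k)]

def cross_validate(examples, labels, k, select_features, train, predict):
    """Accuracy estimate in which feature selection is repeated inside every fold."""
    correct = 0
    for test_idx in k_fold_indices(len(examples), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(examples)) if i not in held_out]
        X_train = [examples[i] for i in train_idx]
        y_train = [labels[i] for i in train_idx]
        # Selection sees only this fold's training data. Selecting once on the
        # full data set would leak test labels and inflate the estimate.
        features = select_features(X_train, y_train)

        def project(x):
            return tuple(x[j] for j in features)

        model = train([project(x) for x in X_train], y_train)
        correct += sum(predict(model, project(examples[i])) == labels[i] for i in test_idx)
    return correct / len(examples)

# Toy plug-ins (hypothetical): keep the single feature that most often equals
# the label, then predict by copying that feature's value.
def select_best_feature(X, y):
    return [max(range(len(X[0])), key=lambda j: sum(x[j] == t for x, t in zip(X, y)))]

X = [(1, 0), (0, 1), (1, 0), (0, 1), (1, 0), (0, 1)]
y = [0, 1, 0, 1, 0, 1]
acc = cross_validate(X, y, 3, select_best_feature,
                     train=lambda X, y: None, predict=lambda model, x: x[0])
print(acc)  # 1.0
```

Any selector (information gain, the decision tree's purity measure) and any learner (tree, SVM, voting) can be passed in; the harness only guarantees where selection happens.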

28
Outline
  • Overview of Drug Design
  • How Machine Learning Fits Into the Process
  • Target Search: Single Nucleotide Polymorphisms (SNPs)
  • Machine Learning from Feature Vectors
  • Decision Trees
  • Support Vector Machines
  • Voting/Ensembles
  • Predicting Molecular Activity: Learning from Structure

29
Places to use Machine Learning
  • Finding target proteins.
  • Inferring target site structure.
  • Predicting who will respond positively/negatively.

30
Typical Practice when Target Structure is Unknown
  • Test many molecules (1,000,000) to find some that
    bind to target (ligands).
  • Infer (induce) shape of target site from 3D
    structural similarities.
  • Shared 3D substructure is called a pharmacophore.
  • A perfect example of a machine learning task with a
    spatial target.

31
An Example of Structure Learning
(Figure: molecules labeled Inactive and Active.)
32
Inductive Logic Programming
  • Represents data points in mathematical logic
  • Uses Background Knowledge
  • Returns results in logic

33
The Logical Representation of a Pharmacophore
34
Background Knowledge I
  • Information about atoms and bonds in the
    molecules
  • atm(m1,a1,o,3,5.915800,-2.441200,1.799700).
  • atm(m1,a2,c,3,0.574700,-2.773300,0.337600).
  • atm(m1,a3,s,3,0.408000,-3.511700,-1.314000).
  • bond(m1,a1,a2,1).
  • bond(m1,a2,a3,1).
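To reuse facts like these outside Prolog (for example, to inspect coordinates), they can be parsed with a short script. This sketch assumes the argument order atm(Molecule, Atom, Element, Type, X, Y, Z) and bond(Molecule, Atom1, Atom2, BondType), as the examples suggest; the meaning of the fourth `atm` argument (3 in every example) is not explained on the slide, so it is kept as an opaque string. The regular expressions only handle facts written without spaces, exactly as shown.

```python
import re

ATM = re.compile(r"atm\((\w+),(\w+),(\w+),(\w+),(-?[\d.]+),(-?[\d.]+),(-?[\d.]+)\)\.")
BOND = re.compile(r"bond\((\w+),(\w+),(\w+),(\w+)\)\.")

def parse_background(text):
    """Collect atoms (keyed by molecule and atom id) and bonds from atm/7 and bond/4 facts."""
    atoms = {}
    for m in ATM.finditer(text):
        mol, atom, element, atom_type, x, y, z = m.groups()
        atoms[(mol, atom)] = {"element": element, "type": atom_type,
                              "xyz": (float(x), float(y), float(z))}
    bonds = [m.groups() for m in BOND.finditer(text)]  # (mol, atom1, atom2, bond_type)
    return atoms, bonds

FACTS = """
atm(m1,a1,o,3,5.915800,-2.441200,1.799700).
atm(m1,a2,c,3,0.574700,-2.773300,0.337600).
atm(m1,a3,s,3,0.408000,-3.511700,-1.314000).
bond(m1,a1,a2,1).
bond(m1,a2,a3,1).
"""
atoms, bonds = parse_background(FACTS)
print(atoms[("m1", "a3")]["element"], len(bonds))  # s 2
```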

35
Background Knowledge II
  • Definition of distance equivalence
      dist(Drug,Atom1,Atom2,Dist,Error) :-
          number(Error),
          coord(Drug,Atom1,X1,Y1,Z1),
          coord(Drug,Atom2,X2,Y2,Z2),
          euc_dist(p(X1,Y1,Z1),p(X2,Y2,Z2),Dist1),
          Diff is Dist1-Dist,
          absolute_value(Diff,E1),
          E1 < Error.
      euc_dist(p(X1,Y1,Z1),p(X2,Y2,Z2),D) :-
          Dsq is (X1-X2)^2 + (Y1-Y2)^2 + (Z1-Z2)^2,
          D is sqrt(Dsq).
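The same distance test can be sketched in Python for readers more at home outside Prolog. This is an illustrative translation, not the system's code: `coords` is a hypothetical dictionary standing in for the coord/5 relation, and, like `E1 < Error`, the comparison is strict.

```python
import math

def euc_dist(p1, p2):
    """Euclidean distance between two 3-D points (mirrors euc_dist/3)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

def dist(coords, atom1, atom2, target, error):
    """True if the atoms' separation differs from `target` by less than `error` (mirrors dist/5)."""
    if not isinstance(error, (int, float)):  # the Prolog clause requires number(Error)
        return False
    return abs(euc_dist(coords[atom1], coords[atom2]) - target) < error

# Coordinates of atoms a1 and a2 of molecule m1, copied from the atm/7 facts.
coords = {"a1": (5.9158, -2.4412, 1.7997), "a2": (0.5747, -2.7733, 0.3376)}
print(round(euc_dist(coords["a1"], coords["a2"]), 2))  # 5.55
print(dist(coords, "a1", "a2", 5.5, 0.75))  # True
```

The tolerance argument is what lets a learned rule say "about 7 Angstroms apart" rather than demanding an exact distance that no two real molecules share.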

36
Central Idea: Generalize by Searching a Lattice
37
Conformational model
  • Conformational flexibility modelled as multiple
    conformations
  • Sybyl RandomSearch
  • Catalyst

38
Pharmacophore description
  • Atom and site centred
  • Hydrogen bond donor
  • Hydrogen bond acceptor
  • Hydrophobe
  • Site points (limited at present)
  • User definable
  • Distance based

39
Example I: Dopamine agonists
  • Agonists taken from the Martin data set on the QSAR
    society web pages
  • Examples (5-50 conformations/molecule)

40
Pharmacophore identified
  • Molecule A has the desired activity if
  • in conformation B molecule A contains a
    hydrogen acceptor at C, and
  • in conformation B molecule A contains a basic
    nitrogen group at D, and
  • the distance between C and D is 7.05966 +/-
    0.75 Angstroms, and
  • in conformation B molecule A contains a
    hydrogen acceptor at E, and
  • the distance between C and E is 2.80871 +/-
    0.75 Angstroms, and
  • the distance between D and E is 6.36846 +/-
    0.75 Angstroms, and
  • in conformation B molecule A contains a
    hydrophobic group at F, and
  • the distance between C and F is 2.68136 +/-
    0.75 Angstroms, and
  • the distance between D and F is 4.80399 +/-
    0.75 Angstroms, and
  • the distance between E and F is 2.74602 +/-
    0.75 Angstroms.
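Checking whether a molecule satisfies a rule of this form is a small search problem, sketched below under assumptions of our own: each conformation is a list of (feature type, xyz) points, type names such as "acceptor" are hypothetical labels, and every assignment of distinct points to the pharmacophore points is tried exhaustively. Following the rule above, a molecule counts as active if any one of its conformations places all required features within +/- `tol` of every target distance. `math.dist` needs Python 3.8 or later.

```python
import math
from itertools import permutations

# The dopamine rule, with points C, D, E, F numbered 0-3 (type names are hypothetical labels).
DOPAMINE_TYPES = ("acceptor", "basic_nitrogen", "acceptor", "hydrophobe")
DOPAMINE_DISTS = {(0, 1): 7.05966, (0, 2): 2.80871, (1, 2): 6.36846,
                  (0, 3): 2.68136, (1, 3): 4.80399, (2, 3): 2.74602}

def matches_conformation(points, types, dists, tol):
    """True if distinct points of the right types realize every target distance."""
    for combo in permutations(range(len(points)), len(types)):
        if any(points[i][0] != t for i, t in zip(combo, types)):
            continue  # feature types must agree point by point
        if all(abs(math.dist(points[combo[a]][1], points[combo[b]][1]) - d) <= tol
               for (a, b), d in dists.items()):
            return True
    return False

def molecule_active(conformations, types, dists, tol=0.75):
    """Active if ANY conformation contains the pharmacophore."""
    return any(matches_conformation(c, types, dists, tol) for c in conformations)

# Toy 3-point pharmacophore on a 3-4-5 triangle (hypothetical data).
types = ("acceptor", "donor", "hydrophobe")
dists = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
good = [("acceptor", (0, 0, 0)), ("donor", (3, 0, 0)), ("hydrophobe", (0, 4, 0))]
bad = [("acceptor", (0, 0, 0)), ("donor", (6, 0, 0)), ("hydrophobe", (0, 4, 0))]
print(molecule_active([bad], types, dists), molecule_active([bad, good], types, dists))  # False True
```

The exhaustive permutation loop is fine for a handful of feature points per conformation; with the 5-50 conformations per molecule mentioned above, the outer `any` simply stops at the first match.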

41
Example II: ACE inhibitors
  • 28 angiotensin converting enzyme inhibitors taken
    from literature
  • D. Mayer et al., J. Comput.-Aided Mol. Design, 1,
    3-16, (1987)

42
Experiment 1
  • Attempt to identify pharmacophore using original
    Mayer et al. data (final conformations).
  • Initial failed attempt traced to bugs in
    background knowledge definition.
  • 4 pharmacophores found with corrected code
    (variations on a common theme).

43
ACE pharmacophore
  • Molecule A is an ACE inhibitor if
  • molecule A contains a zinc-site B,
  • molecule A contains a hydrogen acceptor C,
  • the distance between B and C is 7.899 +/-
    0.750 Angstroms,
  • molecule A contains a hydrogen acceptor D,
  • the distance between B and D is 8.475 +/-
    0.750 Angstroms,
  • the distance between C and D is 2.133 +/-
    0.750 Angstroms,
  • molecule A contains a hydrogen acceptor E,
  • the distance between B and E is 4.891 +/-
    0.750 Angstroms,
  • the distance between C and E is 3.114 +/-
    0.750 Angstroms,
  • the distance between D and E is 3.753 +/-
    0.750 Angstroms.

44
Pharmacophore discovered
(Figure legend: zinc site; H-bond acceptor.)
45
Experiment 2
  • Definition of zinc ligand added to background
    knowledge
  • based on crystallographic data
  • Multiple conformations
  • Sybyl RandomSearch

46
Experiment 2
  • Original pharmacophore rediscovered plus one
    other
  • different zinc ligand position
  • similar to alternative proposed by Ciba-Geigy

47
Example III: Thermolysin inhibitors
  • 10 inhibitors for which crystallographic data is
    available in PDB
  • Conformationally challenging molecules
  • Experimentally observed superposition

48
Key binding site interactions
(Figure labels: Asn112-NH, O=C Asn112, S2, Arg203-NH, S1, O=C Ala113, Zn.)
49
Interactions made by inhibitors
50
Pharmacophore Identification
  • Structures considered 1HYT 1THL 1TLP 1TMN 2TMN
    4TLN 4TMN 5TLN 5TMN 6TMN
  • Conformational analysis using Best conformer
    generation in Catalyst
  • 98-251 conformations/molecule

51
Thermolysin Results
  • 10 five-point pharmacophores identified, falling into
    2 groups (7/10 molecules):
  • 3 acceptors, 1 hydrophobe, 1 donor
  • 4 acceptors, 1 donor
  • Common core of Zn ligands, Arg203 and Asn112
    interactions identified
  • Correct assignments of functional groups
  • Correct geometry to 1 Angstrom tolerance

52
Thermolysin results
  • Increasing tolerance to 1.5 Angstroms finds a common
    6-point pharmacophore including one extra
    interaction.

53
Example IV: Antibacterial peptides
  • Dataset of 11 pentapeptides showing activity
    against Pseudomonas aeruginosa
  • 6 actives (IC50 < 64 mg/ml)
  • 5 inactives

54
Pharmacophore Identified
  • A molecule M is active against Pseudomonas
    aeruginosa if it has a conformation B such that
  • M has a hydrophobic group C,
  • M has a hydrogen acceptor D,
  • the distance between C and D in conformation B
    is 11.7 Angstroms,
  • M has a positively-charged atom E,
  • the distance between C and E in conformation B
    is 4 Angstroms,
  • the distance between D and E in conformation B
    is 9.4 Angstroms,
  • M has a positively-charged atom F,
  • the distance between C and F in conformation B
    is 11.1 Angstroms,
  • the distance between D and F in conformation B
    is 12.6 Angstroms,
  • the distance between E and F in conformation B
    is 8.7 Angstroms.
  • Tolerance: 1.5 Angstroms.
55
(No Transcript)
56
Ongoing ILP developments (pharmacophores)
  • Continue to extend method validation
  • Extending to combinatorial mixtures
  • Quantitative models
  • Mixing different datatypes in background
    knowledge
  • Developing graphical front-end

57
Ongoing developments (Other)
  • Analysis of HTS datasets
  • Analysis of drug-likeness
  • Derivation of new descriptors
  • e.g., empirical binding functions