Molecular Diversity - A Shell Game?

1
Molecular Diversity - A Shell Game? Experiments in Measuring Molecular Diversity
C. John Blankley
Daylight User Group Meeting MUG 97, Laguna Beach, CA, February 25-28, 1997
Parke-Davis Pharmaceutical Research, Division of Warner Lambert Company
2800 Plymouth Road, Ann Arbor, MI 48105
2
Basic Concepts
  • What do we mean by molecular diversity?
  • Structural Diversity
  • templates / scaffolds / backbones
  • functional groups / fragments
  • bridges / bioisosteres
  • aromatic / aliphatic
  • geometry (shape, chirality, connectivity, spatial disposition)
  • Property Diversity
  • lipophilicity
  • acid/base
  • H-bonding
  • dipolarity
  • charge
  • size

3
Basic Concepts (cont.)
  • Parameters/metrics/descriptors
  • continuous, discrete, categorical
  • Structural Descriptors
  • topological indices
  • molecular fingerprints
  • atom / group / fragment counts
  • molecular dimensions (volume, area, moments)
  • distances between key groups (atom pairs, pharmacophores)
  • Property Descriptors
  • log P
  • pKa
  • molecular orbital indices
  • charge
  • spectroscopic data
  • molecular fields
  • composite descriptors (principal properties)
  • similarity/dissimilarity metrics

4
Issues in diversity
  • basis for comparison
  • perspective
  • macro or micro
  • expansionary or inclusionary: expanding space,
    filling holes, increasing density
  • biologically relevant vs. chemical - correlative
  • how much diversity is necessary, possible,
    desirable (random, bias)
  • concordance between quantitative and qualitative
    notions
  • tailor to information available and purpose
    required

5
Types of structural diversity
  • Global or macro diversity: implies neither
    consistent nor significant similar features
  • Local or micro diversity (variety): implies a
    consistent common feature(s)
  • template or scaffold
  • common functional group

6
Small (< ca. 500) datasets of practical interest
  • Building block datasets (functional group based)
  • Combinatorial arrays (template based)
  • FAQs
  • select a diverse subset(s) for screening
  • is one dataset more diverse than another
  • what subset will represent the diversity of a
    library
  • what compounds will increase/extend the diversity
    of an existing set

7
Other small datasets
  • SAR datasets
  • defined by potency/selectivity
  • defined by activity for an enzyme/receptor family
    or subtype
  • Benchmark datasets
  • miscellaneous drugs
  • miscellaneous chemical compounds
  • 20 natural amino acids
  • 400 natural dipeptides

8
Questions of a structural diversity measure
  • How does it perform for different types of
    diversity? (suitability, sensitivity)
  • How does it accord with measures of property
    diversity?
  • Is there a saturation effect?
  • How does it behave on partition or combination of
    datasets?
  • How can it be validated?
  • How does it accord with chemists' visual
    perceptions?
  • Can it capture unperceived aspects of diversity?

9
Questions for a given dataset
  • Which class?
  • Extent of common feature in dataset
  • Variation around common feature
  • template variation
  • appendage variation
  • Outliers and their influence

10
Possibilities for quantitation
  • Statistics on bit counts
  • univariate measures
  • comparisons to mean e.g., modal
  • Statistics on fragment counts
  • Statistics on dissimilarities
  • all or partial pairwise
  • comparison to mean e.g., modal, centroid
  • Parametric on topological indices

11
Some proposed database diversity measures
  • mean pairwise dissimilarity (Willett et al.)
  • self-similarity (Tripos) (mean similarity to
    1st nearest neighbor)
  • maximum bits set (E. Martin et al.) (union
    bit set or modal fingerprint at t = 0)
  • diversity density (E. Martin et al.) (bits
    per molecular mass)
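
The first three measures listed above can be sketched in a few lines. This is an illustrative sketch only: it assumes each fingerprint is represented as a set of on-bit positions and that similarity is the Tanimoto coefficient, neither of which is stated on the slide. The function names and the toy data are hypothetical.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bit positions (assumed metric)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def mean_pairwise_dissimilarity(fps):
    """Mean of (1 - similarity) over all distinct pairs of fingerprints."""
    pairs = list(combinations(fps, 2))
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)

def mean_nearest_neighbor_similarity(fps):
    """Mean similarity of each fingerprint to its most similar other member."""
    total = 0.0
    for i, a in enumerate(fps):
        total += max(tanimoto(a, b) for j, b in enumerate(fps) if j != i)
    return total / len(fps)

def union_bit_count(fps):
    """Number of distinct bits set anywhere in the dataset."""
    return len(frozenset().union(*fps))

# Hypothetical toy dataset: two close analogs and one unrelated compound.
toy = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({7, 8, 9})]
print(mean_pairwise_dissimilarity(toy))
print(mean_nearest_neighbor_similarity(toy))
print(union_bit_count(toy))
```

In the toy set the unrelated compound raises the mean pairwise dissimilarity sharply while lowering the nearest-neighbor similarity, which is the kind of outlier influence the earlier slides ask about.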

12
Stigmata
  • extracts similarity of bit strings with flexible
    threshold
  • low bits at high stringency - not much in common
  • high bits at low stringency - much variety
  • plateau bits at intermediate stringencies -
    significant common element
  • similarity to common element - large range
    signals diversity of dataset
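
The threshold behavior described above can be illustrated with a small sketch. It assumes that a bit belongs to the modal fingerprint when the fraction of compounds setting it is at least the threshold t, and that as t approaches 0 a bit must still occur in at least two compounds. This is a reading of the slides, not Stigmata's documented algorithm, and the function name and toy data are hypothetical.

```python
from collections import Counter

def modal_fingerprint(fps, t):
    """Bits set in at least a fraction t of the compounds (and in >= 2 compounds)."""
    n = len(fps)
    counts = Counter(bit for fp in fps for bit in fp)
    return frozenset(bit for bit, c in counts.items() if c >= 2 and c / n >= t)

# Hypothetical toy dataset: bit 1 is shared by all, bit 2 by most,
# bits 3 and 4 by two compounds each, the rest are unique.
toy = [
    frozenset({1, 2, 3, 10}),
    frozenset({1, 2, 3, 11}),
    frozenset({1, 2, 4, 12}),
    frozenset({1, 5, 4, 13}),
    frozenset({1, 2, 6, 14}),
]

for t in (0.0, 0.5, 1.0):
    modal = modal_fingerprint(toy, t)
    print(t, sorted(modal), len(modal))
```

The modal bit count falls from 4 to 2 to 1 as the stringency rises, so the ratio of the loosest to the strictest count is 4 for this toy set.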

13
Diversity measures and modal fingerprints
  • modal fingerprint ? degree of similarity
  • existing metrics in Stigmata (Daylight
    fingerprints)
  • modp, msim, alab, Rminfp, Rmaxfp
  • new metrics
  • alab_av (regional or partial similarity(?))
  • Rfp = Rmaxfp / Rminfp

14
Extend threshold analysis
  • t → 0 (2 compounds)
  • maximal modal (= non-unique bits) (≠ total
    bits set)
  • concept of maximal, median and minimal modal

15
Relative vs. absolute
  • Rminfp (Rifp(Rm)) f(modp, msim)
  • 1/Rm modp modp/msim - 1
  • bits set for modal fp
  • bcom = bits(i) x Rm(i)
  • Thus
  • maxbcom = modal bits at t → 0
  • medbcom = modal bits at t = 0.5
  • minbcom = modal bits at t = 1.0
  • Rtmax = maxbcom / minbcom

16
Other measures derived from bits or similarities
  • Average bits
  • Mean similarities (msim) at t = 0, 0.5, 1.0
  • fraction of pairwise similarities > 0.85 or <
    0.50
  • mean distance from dataset centroid
  • standard deviations or coefficients of variation
  • normalize by
  • dataset size
  • average bits
  • molecular mass
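
Two of the summary statistics listed above are easy to show on a list of precomputed pairwise similarities. A minimal sketch: the 0.85 and 0.50 cutoffs come from the slide, while the function names, the use of the population standard deviation, and the sample values are assumptions made for illustration.

```python
from statistics import mean, pstdev

def similarity_fractions(sims, high=0.85, low=0.50):
    """Fractions of pairwise similarities above `high` and below `low`."""
    n = len(sims)
    above = sum(s > high for s in sims) / n
    below = sum(s < low for s in sims) / n
    return above, below

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)

# Hypothetical pairwise similarities for a small dataset.
sims = [0.90, 0.88, 0.60, 0.45, 0.30, 0.70]
above, below = similarity_fractions(sims)
print(above, below)
print(coefficient_of_variation(sims))
```

A dataset with many pairs above 0.85 is dominated by close analogs, while many pairs below 0.50 points to global diversity; the coefficient of variation gives a size-independent way to compare spread across datasets.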

17
dataset           type1  type2    N   av_mwt  source
BIOLOGICAL
kappa             act    p       59   409.46  CIPSLINE
nonkappa          act    n       97   441.95  CIPSLINE
pipopiate         act    p       48   382.84
opiate            act    p       32   383.37
piperidine        act    SS      18   374.76
peptide_op        act    p       24   612.41
D2_ag             act    p       33   249.94  Seeman et al.
D2_antag          act    n       25   373.97  Seeman et al.
renin_hisleu      act    SS     112   752.66  CIPSLINE
ci976             sar    p       74   410.47  Roth et al.
acathet           sar    p       41   453.82  White et al.
BUILDING BLOCK
bbd_phnco         bbd    T      129   185.51  ACD
bbd_arncs         bbd    SS     202   216.40  ACD
aa20              bbd    T       20   136.92
bbd_allaa         bbd    T      651   218.63  ACD
COMBINATORIAL
dipeptides        comb   T      400   255.83
dhydantoin        comb   T       40   242.47  deWitt et al.
benzodiaz1        comb   SS      40   312.37  deWitt et al.
benzodiaz2        comb   SS     160   382.43  Ellman et al.
REFERENCE
newtopdrugs_95    misc   n       56   395.35  Med. Ad. News
introdrugs_9295   misc   n      141   403.51  Ann. Repts. Med. Chem.
ACDrandom2        misc   n       51   181.29  ACD
ACDrandom3        misc   n       51   221.37  ACD
18
                  modal bits                      compound bits
dataset            max  median  min    Rtmax      min   max    mean    Rfp
BIOLOGICAL
kappa             1052    169    43    24.47      163   734  249.30   4.50
nonkappa          1467    164     7   209.33       88   689  305.42   7.83
pipopiate         1219    276    47    25.94      138   689  378.63   4.99
opiate             451    404    88    11.69      164   689  439.71   4.20
piperidine         570    169    75     7.60      138   368  249.61   2.67
peptide_op         425    170    40    10.63       88   329  206.96   3.74
D2_ag              606    145    43    14.09       80   390  203.82   4.87
D2_antag           841    115    33    25.48      165   327  235.01   1.98
renin_hisleu       998    278   179     5.58      249   466  337.04   1.87
ci976              371    138    68     5.46      117   222  152.50   1.90
acathet            695    195    93     7.47      167   303  245.80   1.81
BUILDING BLOCK
bbd_phnco          433     63    61     7.10       61   136   92.48   2.23
bbd_arncs          883     64    45    19.64       61   286  107.37   4.69
aa20               155     43    34     4.56       34   171   67.00   5.03
bbd_allaa         1785     55    34    52.50       34   429  127.10  12.62
COMBINATORIAL
dipeptides         448     78    53     8.45       53   269  126.53   5.08
dhydantoin         437    129    71     6.15       71   323  176.70   4.55
benzodiaz1         586    220   183     3.20      208   410  284.06   1.97
benzodiaz2         533    294   222     2.40      229   431  308.92   1.88
REFERENCE
newtopdrugs_95    1629     80     4   407.75       52   640  254.85  12.31
introdrugs_9295   1983     84     4   495.75       44   709  265.80  16.11
ACDrandom2         592     19     0    >1000       23   166   73.75   7.22
ACDrandom3        1067     38     0    >1000       23   497  130.08  21.61
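
Most rows of the table above are consistent with the two ratio definitions given earlier in the talk: Rtmax as maximal over minimal modal bits, and Rfp as the largest over the smallest compound fingerprint. The sketch below recomputes both for three rows. The helper name and the choice to return infinity when the minimal modal count is zero are assumptions; the slide itself reports such cases as ">1000".

```python
import math

def ratio(numerator, denominator):
    """Plain ratio; returns infinity when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else math.inf

# dataset: (max modal bits, min modal bits, min compound bits, max compound bits),
# values copied from the table.
rows = {
    "kappa": (1052, 43, 163, 734),
    "benzodiaz2": (533, 222, 229, 431),
    "ACDrandom2": (592, 0, 23, 166),
}

results = {}
for name, (maxbcom, minbcom, minbits, maxbits) in rows.items():
    rtmax = ratio(maxbcom, minbcom)
    rfp = ratio(maxbits, minbits)
    results[name] = (rtmax, rfp)
    print(f"{name}: Rtmax = {rtmax:.2f}, Rfp = {rfp:.2f}")
```

The tight combinatorial array (benzodiaz2) gives small values for both ratios, while the random ACD sample has no bit common to all compounds, so its Rtmax is unbounded.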
19
                  average msim                mean centroid
dataset           t = 0   t = 0.5   t = 1     distance
BIOLOGICAL
kappa             0.23    0.56      0.18      0.34
nonkappa          0.21    0.42      0.03      0.48
pipopiate         0.31    0.50      0.15      0.40
opiate            0.42    0.62      0.24      0.26
piperidine        0.39    0.54      0.32      0.36
peptide_op        0.45    0.68      0.21      0.23
D2_ag             0.32    0.55      0.25      0.35
D2_antag          0.26    0.34      0.15      0.53
renin_hisleu      0.33    0.75      0.54      0.12
ci976             0.40    0.75      0.45      0.14
acathet           0.34    0.64      0.38      0.25
BUILDING BLOCK
bbd_phNCO         0.21    0.68      0.67      0.20
bbd_arNCS         0.12    0.61      0.45      0.28
aa20              0.35    0.70      0.60      0.25
bbd_allaa         0.07    0.41      0.35      0.49
COMBINATORIAL
dipeptides        0.28    0.62      0.48      0.27
dhydantoin        0.40    0.58      0.47      0.28
benzodiaz1        0.48    0.76      0.66      0.12
benzodiaz2        0.58    0.81      0.73      0.07
REFERENCE
newtopdrugs_95    0.15    0.23      0.02      0.68
introdrugs_9295   0.22    0.24      0.04      0.68
ACDrandom2        0.11    0.19      0.00      0.74
ACDrandom3        0.11    0.20      0.00      0.75
20
dataset           mps    ss_nn1  mdd    mfrgs_BCI
BIOLOGICAL
kappa             0.45   0.83    0.62   117.69
nonkappa          0.33   0.77    0.74   110.47
pipopiate         0.40   0.82    0.98   122.52
opiate            0.53   0.86    1.13   133.13
piperidine        0.45   0.72    0.66   229.94
peptide_op        0.57   0.79    0.35   277.92
D2_ag             0.46   0.88    0.80    78.48
D2_antag          0.32   0.71    0.64   195.64
renin_hisleu      0.67   0.90    0.45   112.26
ci976             0.67   0.94    0.37    48.01
acathet           0.55   0.89    0.54   120.10
BUILDING BLOCK
bbd_phnco         0.58   0.88    0.51    23.87
bbd_arncs         0.49   0.89    0.50    32.87
aa20              0.57   0.81    0.48    78.05
bbd_allaa         0.31   0.87    0.57    21.79
COMBINATORIAL
dipeptides        0.51   0.96    0.49     7.77
dhydantoin        0.52   0.92    0.72    56.83
benzodiaz1        0.68   0.94    0.92    59.68
benzodiaz2        0.75   0.98    0.81    16.64
REFERENCE
newtopdrugs_95    0.19   0.48    0.72   238.73
introdrugs_9295   0.19   0.50    0.70   142.89
ACDrandom2        0.16   0.49    0.44   115.45
ACDrandom3        0.15   0.38    0.58   156.53
21
Principal Components (n = 23, k = 18)
Eigenvalue    9.19   4.35   1.85   1.15   0.44
Percent      51.04  24.14  10.27   6.40   2.44
CumPercent   51.04  75.18  85.45  91.86  94.30
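
The percentages above can be checked against the eigenvalues. The sketch assumes the analysis was run on the correlation matrix of the k = 18 variables, so that the total variance equals 18; the slide does not state this. Small differences from the reported figures are expected because the eigenvalues are rounded to two decimals.

```python
from itertools import accumulate

eigenvalues = [9.19, 4.35, 1.85, 1.15, 0.44]  # from the slide
k = 18  # number of variables; total variance under the correlation-matrix assumption

percent = [100.0 * ev / k for ev in eigenvalues]
cumulative = list(accumulate(percent))

reported_percent = [51.04, 24.14, 10.27, 6.40, 2.44]
reported_cumulative = [51.04, 75.18, 85.45, 91.86, 94.30]

for ev, p, c, rp, rc in zip(eigenvalues, percent, cumulative,
                            reported_percent, reported_cumulative):
    print(f"eigenvalue {ev:5.2f}: {p:6.2f}% (reported {rp:5.2f}), "
          f"cumulative {c:6.2f}% (reported {rc:5.2f})")
```

Under this assumption the first four components, each with an eigenvalue above 1, account for roughly 92% of the variance, which matches the four factors retained in the rotated pattern.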
22
Rotated Factor Pattern (dissimilarities and ln)
variable      Factor 1  Factor 2  Factor 3  Factor 4
N              -0.04     -0.12     -0.95     -0.06
av_mwt          0.16      0.12      0.17      0.94
Rfp            -0.84      0.06     -0.21     -0.14
ln_Rtmax       -0.98     -0.01     -0.01     -0.01
mdsim0         -0.77     -0.27     -0.38     -0.10
mdsim5         -0.97      0.04      0.02     -0.08
mdsim1         -0.88      0.15      0.23      0.19
ln_mxbcom      -0.65      0.34     -0.42      0.45
ln_mdbcom       0.56      0.63      0.14      0.50
ln_mnbcom       0.93      0.18     -0.12      0.17
minbits         0.60      0.40      0.15      0.54
maxbits        -0.39      0.77     -0.06      0.39
avbits          0.14      0.81      0.18      0.51
mpds           -0.98      0.02     -0.02     -0.10
msds_nn1       -0.91     -0.08      0.30     -0.02
mcentr_dist    -0.98      0.00      0.04     -0.11
mdd             0.03      0.97      0.10     -0.18
ln_mBCI        -0.45      0.12      0.72      0.41
23
Datasets Plotted vs. First Two Rotated Factors
24
Datasets plotted vs. mean modal dissimilarity
(t = 0.5) and average bits
25
msds_nn1: Relative diversity by dataset type (groups along x axis)
[Figure: plot of the 23 datasets, labeled by name; axes x, y, z]
26
mean centr_dist: Relative diversity by dataset type (groups along x axis)
[Figure: plot of the 23 datasets, labeled by name; axes x, y, z]
27
ln_mxbcom: Relative diversity by dataset type (groups along x axis)
[Figure: plot of the 23 datasets, labeled by name; axes x, y, z]
28
Directions
  • behavior of metrics e.g.
  • on combining or subsetting datasets
  • other similarity functions - same or
    different?
  • calibration of metrics with chemists' perception
  • other fingerprints
  • BCI, MACCS, Tripos
  • 3D information
  • use of modal similarities as dataset parameters
    for correlation or classification
  • how to discover the congruence between molecular
    similarity and biological function

29
Conclusions to date
  • Different metrics can capture different aspects
    of structural diversity
  • One metric will not suffice to provide adequate
    discrimination for all different types of
    diversity
  • Modal fingerprints and similarities may prove to
    be useful additions to the measurement of
    diversity

30
Acknowledgments
  • Parke-Davis
  • Biomolecular Structure and Drug Design
  • Christine Humblet
  • Daylight CIS, Inc.
  • Norah Shemetulskis
  • David Weininger
  • Jeremy Yang

31
Correlations among diversity measures (n = 23)
32
Correlations (cont.)
33
Datasets plotted vs. mean pairwise dissimilarity
and average bits
34
Datasets plotted vs. mean centroid distance and
average bits
35
?