JKlustor clustering chemical libraries presented by


About This Presentation


Description:

Well-known and most commonly used clustering methods in cheminformatics ... 8 centroids (cluster representative element) corresponded to the 8 activity classes ...


Slides: 26

Provided by: chem2


Transcript and Presenter's Notes


1
JKlustor: clustering chemical libraries
presented by
maintained by Miklós Vargyas
Last update 25 March 2010
2
JKlustor

Chemical clustering by similarity and structure

3
JKlustor
Description of the product
JKlustor performs similarity and structure based
clustering of compound libraries and focused sets
in both hierarchical and non-hierarchical fashion.
Availability

part of JChem
parts available in Instant JChem (IJC)
server version (accessible via API)
batch application programs
HTML user interface
one desktop application with a GUI
GUI also available as an applet

4
Summary of key features

Wide range of methods
Unsupervised, agglomerative clustering
Hierarchical and non-hierarchical methods
Similarity based and structure based techniques
Flexible search options
Tanimoto and Euclidean metrics, weighting
Maximum common substructure identification
chemical property matching including atom type,
bond type, hybridization, charge
Interactive display
interactive hierarchy browser (dendrogram viewer)
SAR-table
R-table
Efficient
running times of the tools scale between linear
and quadratic
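As an illustration of the two metrics and the weighting option listed above, here is a minimal Python sketch. It assumes binary fingerprints stored as sets of on-bit indices and numerical descriptors stored as plain float lists; the function names and the per-dimension weight vector are hypothetical and are not JKlustor's API.

```python
import math
from typing import Optional, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def weighted_euclidean(x: Sequence[float], y: Sequence[float],
                       w: Optional[Sequence[float]] = None) -> float:
    """Euclidean distance between descriptor vectors, optionally weighting each dimension."""
    if w is None:
        w = [1.0] * len(x)
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))           # 2 / (4 + 3 - 2) = 0.4
print(weighted_euclidean([0, 0], [3, 4]))          # 5.0
print(weighted_euclidean([0, 0], [3, 4], [1, 0]))  # 3.0, second dimension ignored
```

Setting a weight to zero drops that descriptor from the distance, which is one simple way to emphasise trusted descriptors.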

5
Benefits

Versatile
Choose the most appropriate method for the
clustering problem
Combine methods to achieve best results
Use your trusted molecular descriptors in
similarity calculation
Easy integration in corporate discovery pipelines
Cluster chemical files directly; no need to import
structures into a database
Intuitive
Cluster formation is self-explanatory

6
Similarity based clustering

Hierarchical
Ward
Non-hierarchical
Sphere exclusion
k-means
Jarvis-Patrick

7
Ward Clustering Features

Ward's minimum variance method results in tight,
well-separated clusters
uses Murtagh's reciprocal nearest neighbor (RNN)
algorithm to speed it up
quadratic scaling of running time (with respect
to number of input structures)
memory consumption scales linearly
best used with smaller sets (like focused
libraries), copes with < 100K structures
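Ward's criterion and the reciprocal-nearest-neighbour idea can be sketched in plain Python. This is an illustration under stated assumptions, not JKlustor's implementation: structures are assumed to be Euclidean descriptor vectors, and merging clusters A and B costs n_A·n_B/(n_A+n_B)·‖c_A − c_B‖², the increase in within-cluster sum of squares. The sketch follows a chain of nearest neighbours until two clusters are each other's nearest neighbour, then merges them. Only a size and a centroid are stored per cluster (linear memory), and each chain step scans all active clusters (quadratic time overall). The names `ward_rnn` and `cut` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Centroid = List[float]
Merge = Tuple[int, int, float, int]  # (cluster a, cluster b, Ward cost, new cluster id)


def ward_cost(na: int, ca: Centroid, nb: int, cb: Centroid) -> float:
    """Increase in within-cluster sum of squares if clusters A and B are merged."""
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * d2


def ward_rnn(points: Sequence[Sequence[float]]) -> List[Merge]:
    """Ward clustering with a nearest-neighbour chain; returns merges in the order made."""
    clusters: Dict[int, Tuple[int, Centroid]] = {i: (1, list(p)) for i, p in enumerate(points)}
    next_id = len(points)
    merges: List[Merge] = []
    chain: List[int] = []
    while len(clusters) > 1:
        if not chain:
            chain.append(next(iter(clusters)))
        a = chain[-1]
        na, ca = clusters[a]
        prev = chain[-2] if len(chain) > 1 else None
        # Start from the previous chain element so that ties resolve in its favour.
        best = prev
        best_cost = ward_cost(na, ca, *clusters[prev]) if prev is not None else float("inf")
        for b, (nb, cb) in clusters.items():
            if b != a:
                cost = ward_cost(na, ca, nb, cb)
                if cost < best_cost:
                    best, best_cost = b, cost
        if best == prev:
            # a and prev are reciprocal nearest neighbours: merge them.
            chain.pop()
            chain.pop()
            nb, cb = clusters.pop(prev)
            del clusters[a]
            n = na + nb
            clusters[next_id] = (n, [(na * x + nb * y) / n for x, y in zip(ca, cb)])
            merges.append((a, prev, best_cost, next_id))
            next_id += 1
        else:
            chain.append(best)
    return merges


def cut(n_points: int, merges: List[Merge], k: int) -> List[int]:
    """Flat clustering into k clusters by applying the n - k cheapest merges."""
    parent = list(range(n_points + len(merges)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # sorted() is stable, so equal-cost merges keep production order (children first).
    for a, b, _, new in sorted(merges, key=lambda m: m[2])[: n_points - k]:
        parent[find(a)] = new
        parent[find(b)] = new
    roots = [find(i) for i in range(n_points)]
    ids = {r: i for i, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]


pts = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]
merges = ward_rnn(pts)
print(cut(len(pts), merges, 3))  # [0, 0, 1, 1, 2]
```

Because Ward's method never produces inversions, cutting at the n − k cheapest merges yields a consistent flat partition.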

8
Sphere Exclusion Clustering Features

based on fingerprints and/or other numerical data
running time linear with respect to number of
input structures
memory scales sub-linearly
can easily cope with millions of structures
suitable for diverse subset selection
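A leader-style variant of sphere exclusion can be sketched as follows, assuming Tanimoto similarity on set-based fingerprints and a hypothetical `threshold` parameter. This is an illustration, not JKlustor's exact procedure. Each compound is compared only with the centroids chosen so far: it joins the most similar centroid if that similarity reaches the threshold, otherwise it becomes a new centroid. Cost therefore grows with the number of compounds times the number of centroids. Any two centroids are less similar than the threshold, which is why the centroid list doubles as a diverse subset.

```python
from typing import List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def sphere_exclusion(fps: Sequence[Set[int]], threshold: float) -> Tuple[List[int], List[int]]:
    """Return (indices of centroid compounds, cluster label of every compound)."""
    centroids: List[int] = []
    labels: List[int] = []
    for i, fp in enumerate(fps):
        sims = [tanimoto(fp, fps[c]) for c in centroids]
        if sims and max(sims) >= threshold:
            labels.append(sims.index(max(sims)))  # inside an existing sphere
        else:
            centroids.append(i)                   # outside all spheres: new centroid
            labels.append(len(centroids) - 1)
    return centroids, labels


fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}, {7, 8}, {20}]
print(sphere_exclusion(fps, 0.6))  # ([0, 2, 4], [0, 0, 1, 1, 2])
```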

9
k-means Clustering Features

based on fingerprints and/or other numerical data
minimises variance within each cluster
number of clusters can directly be controlled
finds the centre of natural clusters in the input
data
running time scales exponentially with respect to
number of input structures
can cope with up to a few 100K structures
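The standard Lloyd iteration for k-means is shown below as a sketch; the slide does not say which k-means variant JKlustor uses. Seeding with the first k points is a simplifying assumption (practical tools use smarter seeding). Each iteration assigns every point to its nearest centroid and moves each centroid to the mean of its members, which never increases the within-cluster variance. One iteration costs on the order of n·k·d operations for n points in d dimensions; the number of iterations depends on the data.

```python
from typing import List, Sequence, Tuple


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means; returns (centroids, label per point)."""
    centroids = [list(p) for p in points[:k]]  # assumed naive seeding: first k points
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


pts = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
cents, labels = kmeans(pts, 2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Because k is an explicit argument, the number of clusters is controlled directly, as the slide notes.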

10
Jarp Clustering Features

variable-length Jarvis-Patrick clustering
based on fingerprints and/or other numerical data
takes structures/fingerprint and data values from
either files or from database tables
running time scales better than quadratic but
worse than linear (with respect to number of
input structures)
memory scales linearly
Jarp can cope with 100Ks of structures
depending on data and parameters may create large
number of singletons
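A Jarvis-Patrick sketch is given below. The slide does not define "variable-length"; this sketch assumes one common reading, in which each compound's neighbour list holds every compound above a similarity cutoff instead of a fixed number K of nearest neighbours. Two neighbours are linked when they share at least `min_common` neighbours, and clusters are the connected components of those links. `sim_cutoff` and `min_common` are hypothetical parameter names. Unlike the tuned implementation described above, this sketch compares all pairs, so it is quadratic. Compounds with no qualifying link stay as singletons, which is why strict parameters can produce many of them.

```python
from itertools import combinations
from typing import List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def jarvis_patrick(fps: Sequence[Set[int]], sim_cutoff: float, min_common: int) -> List[int]:
    """Variable-length Jarvis-Patrick clustering (assumed variant); returns a label per compound."""
    n = len(fps)
    # Variable-length neighbour lists: every compound at or above the similarity cutoff.
    nbrs: List[Set[int]] = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if tanimoto(fps[i], fps[j]) >= sim_cutoff:
            nbrs[i].add(j)
            nbrs[j].add(i)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link mutual neighbours that share enough neighbours (union-find components).
    for i, j in combinations(range(n), 2):
        if j in nbrs[i] and len(nbrs[i] & nbrs[j]) >= min_common:
            parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    ids = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]


fps = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2, 4}, {1, 2, 3, 5},
       {10, 11, 12}, {10, 11, 12, 13}, {10, 11, 13}, {50}]
print(jarvis_patrick(fps, 0.5, 1))  # [0, 0, 0, 0, 1, 1, 1, 2]
```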

11
Ward Clustering Example

8 different sets of known active compounds mixed
together
5-HT3-antagonists
ACE inhibitors
angiotensin 2 antagonists
D2 antagonists
delta antagonists
FTP antagonists
mGluR1 antagonists
thrombin inhibitors
ChemAxon's 2D pharmacophore fingerprint was
generated
Fingerprints of the mixture were clustered with
Ward's method
9 clusters were formed
8 centroids (cluster representative elements)
corresponded to the 8 activity classes
1 was a singleton
All 8 real clusters contained structures only
from the activity class of the centroid (over 95%
true positive classification)
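The classification figure above compares each compound's activity class with that of its cluster centroid. A minimal sketch of that arithmetic follows, on hypothetical toy data rather than the data behind the slide; `centroid_agreement` is an illustrative name, and the slide does not specify the exact formula used.

```python
from typing import Dict, Sequence


def centroid_agreement(labels: Sequence[int], classes: Sequence[str],
                       centroids: Dict[int, int]) -> float:
    """Fraction of compounds whose activity class equals that of their cluster's centroid.

    labels[i]    -- cluster label of compound i
    classes[i]   -- known activity class of compound i
    centroids[c] -- index of the representative compound of cluster c
    """
    hits = sum(1 for lab, cls in zip(labels, classes) if classes[centroids[lab]] == cls)
    return hits / len(labels)


# Hypothetical toy data: compound 2 is misassigned to the ACE cluster.
labels = [0, 0, 0, 1, 1, 1]
classes = ["ACE", "ACE", "D2", "D2", "D2", "D2"]
centroids = {0: 0, 1: 3}
print(round(centroid_agreement(labels, classes, centroids), 3))  # 0.833
```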

12
Ward Clustering Example
Centroids
13
Ward Clustering Example
Cluster of the D2 antagonists
14
Structure based clustering

Non-hierarchical
Bemis-Murcko frameworks
Hierarchical
LibraryMCS

15
Bemis-Murcko frameworks
16
Bemis-Murcko frameworks
17
Bemis-Murcko frameworks features

based on structure of molecules
cluster formation is apparent, visual, meets
human expectations
running time linear with respect to number of
input structures
memory scales sub-linearly
can easily cope with millions of structures
suitable for quick overview of very large sets
spots scaffold hops
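The core of framework-based clustering can be sketched on a bare molecular graph: repeatedly strip terminal (degree ≤ 1) atoms, so side chains vanish while ring systems and the linkers between them survive, then group molecules sharing a framework. This is an illustration, not JKlustor's code. The molecule format (element list plus bond index pairs) is hypothetical, bond orders are ignored (so benzene and cyclohexane give the same key), and `framework_key` is a crude, non-canonical key that can collide; real tools compare canonical framework representations instead.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Molecule = Tuple[List[str], List[Tuple[int, int]]]  # (element per atom, bonds as index pairs)


def murcko_atoms(mol: Molecule) -> Set[int]:
    """Atoms left after iteratively removing terminal atoms (ring systems plus linkers)."""
    elements, bonds = mol
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    terminal = [i for i, nb in adj.items() if len(nb) <= 1]
    while terminal:
        i = terminal.pop()
        if i not in adj:
            continue  # already removed
        for j in adj.pop(i):
            adj[j].discard(i)
            if len(adj[j]) <= 1:
                terminal.append(j)  # neighbour became terminal
    return set(adj)


def framework_key(mol: Molecule) -> Tuple[Tuple[str, ...], ...]:
    """Crude, NON-canonical grouping key: sorted element pairs of framework bonds."""
    elements, bonds = mol
    keep = murcko_atoms(mol)
    return tuple(sorted(tuple(sorted((elements[a], elements[b])))
                        for a, b in bonds if a in keep and b in keep))


def cluster_by_framework(mols: Sequence[Molecule]) -> Dict[Tuple, List[int]]:
    groups: Dict[Tuple, List[int]] = defaultdict(list)
    for idx, mol in enumerate(mols):
        groups[framework_key(mol)].append(idx)
    return dict(groups)


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
toluene: Molecule = (["C"] * 7, ring + [(0, 6)])
phenol: Molecule = (["C"] * 6 + ["O"], ring + [(0, 6)])
hexane: Molecule = (["C"] * 6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)])
print(sorted(cluster_by_framework([toluene, phenol, hexane]).values()))  # [[0, 1], [2]]
```

Acyclic molecules such as hexane lose every atom and end up together in an empty-framework group.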

18
LibraryMCS
Identifies the largest subgraph shared by several
molecular structures
19
LibraryMCS Hierarchical MCS
20
SAR table view
21
R-group decomposition
22
LibraryMCS features

based on structure of molecules
cluster formation is apparent, visual, meets
human expectations
running time near-linear with respect to number
of input structures
can cope with 100K-200K structures
suitable for very thorough analysis
spots scaffold hops
substituent-activity (property) analysis

23
LibraryMCS integration at Abbott
Clustering for the masses, presented by Derek
Debe at ChemAxon's US UGM, Boston, 2008
24
Clustering performance comparison
25
JKlustor roadmap