JKlustor clustering chemical libraries

1
JKlustor clustering chemical libraries
presented by / maintained by Miklós Vargyas
Last update 25 March 2010
2
JKlustor
  • Chemical clustering by similarity and structure

3
JKlustor
Description of the product
JKlustor performs similarity and structure based
clustering of compound libraries and focused sets
in both hierarchical and non-hierarchical fashion.
Availability
  • part of JChem
  • IJC (parts)
  • server version (accessible via API)
  • batch application programs
  • HTML user interface
  • one desktop application with GUI
  • GUI is available as an applet

4
Summary of key features
  • Wide range of methods
      • Unsupervised, agglomerative clustering
      • Hierarchical and non-hierarchical methods
      • Similarity based and structure based techniques
  • Flexible search options
      • Tanimoto and Euclidean metrics, weighting
      • Maximum common substructure identification
      • chemical property matching including atom type,
        bond type, hybridization, charge
  • Interactive display
      • interactive hierarchy browser (dendrogram viewer)
      • SAR-table
      • R-table
  • Efficient
      • running time of the tools scales between linearly
        and quadratically with the number of input structures
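
As an illustration of the two metrics named above, here is a minimal sketch in plain Python. It assumes binary fingerprints are stored as sets of on-bit indices and numerical descriptors as equal-length lists; the function names and the optional per-dimension weights are hypothetical and are not JKlustor's API.

```python
from math import sqrt

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def weighted_euclidean(x, y, weights=None):
    """Euclidean distance between two descriptor vectors, with optional weights."""
    if weights is None:
        weights = [1.0] * len(x)
    return sqrt(sum(w * (p - q) ** 2 for w, p, q in zip(weights, x, y)))

fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
print(tanimoto(fp1, fp2))                          # 2 shared bits / 5 distinct bits = 0.4
print(weighted_euclidean([0.0, 3.0], [4.0, 0.0]))  # 5.0
```

Tanimoto is a similarity (1 means identical bit sets), so a clustering method that expects a distance would typically use 1 minus this value.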

5
Benefits
  • Versatile
      • Choose the most appropriate method for the
        clustering problem
      • Combine methods to achieve best results
      • Use your trusted molecular descriptors in
        similarity calculation
  • Easy integration in corporate discovery pipelines
      • Cluster chemical files directly; no need to import
        structures into a database
  • Intuitive
      • Cluster formation is self-explanatory

6
Similarity based clustering
  • Hierarchical
      • Ward
  • Non-hierarchical
      • Sphere exclusion
      • k-means
      • Jarvis-Patrick

7
Ward Clustering Features
  • Ward's minimum variance method results in tight,
    well separated clusters
  • Murtagh's reciprocal nearest neighbor (RNN)
    algorithm to speed it up
  • quadratic scaling of running time (with respect
    to number of input structures)
  • memory consumption scales linearly
  • best used with smaller sets (like focused
    libraries), copes with < 100K structures
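
The minimum variance criterion can be sketched as follows. This is a deliberately naive version that rescans all cluster pairs at every step, so it is much slower than the reciprocal nearest neighbor approach mentioned above; it is not JKlustor's implementation, and the function name `ward` and the list-of-coordinates input are assumptions made for the example.

```python
def ward(points, n_clusters):
    """Naive Ward clustering: repeatedly merge the pair of clusters whose
    merger increases the within-cluster sum of squares the least."""
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                ni, nj = len(mi), len(mj)
                d2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                # Increase in total variance caused by merging i and j.
                cost = ni * nj / (ni + nj) * d2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        ni, nj = len(mi), len(mj)
        centroid = [(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)]
        merged = (mi + mj, centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(members) for members, _ in clusters)

data = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
print(ward(data, 3))  # [[0, 1], [2, 3], [4]]
```

Recording the order of the merges instead of stopping at `n_clusters` would give the hierarchy that a dendrogram viewer displays.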

8
Sphere Exclusion Clustering Features
  • based on fingerprints and/or other numerical data
  • running time linear with respect to number of
    input structures
  • memory scales sub-linearly
  • can easily cope with millions of structures
  • suitable for diverse subset selection
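
A minimal sketch of the sphere exclusion idea, assuming fingerprints stored as sets of on-bit indices and Tanimoto dissimilarity as the distance. The function names and the rule of taking the first remaining compound as the next centre are assumptions for illustration; the actual tool may select centres differently.

```python
def tanimoto_dissimilarity(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union

def sphere_exclusion(fingerprints, radius):
    """Pick a centre, claim everything within `radius` of it, exclude those
    compounds from further consideration, and repeat until nothing is left."""
    remaining = list(range(len(fingerprints)))
    clusters = []
    while remaining:
        centre = remaining[0]
        members = [i for i in remaining
                   if tanimoto_dissimilarity(fingerprints[centre], fingerprints[i]) <= radius]
        clusters.append((centre, members))
        claimed = set(members)
        remaining = [i for i in remaining if i not in claimed]
    return clusters

fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}, {7, 8}, {20}]
print(sphere_exclusion(fps, 0.5))  # [(0, [0, 1]), (2, [2, 3]), (4, [4])]
```

The centres alone (here compounds 0, 2 and 4) are pairwise further apart than the radius, which is why the method lends itself to diverse subset selection.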

9
k-means Clustering Features
  • based on fingerprints and/or other numerical data
  • minimises variance within each cluster
  • number of clusters can directly be controlled
  • finds the centre of natural clusters in the input
    data
  • running time scales exponentially with respect to
    number of input structures
  • can cope with <100Ks of structures
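
The usual iterative scheme for k-means (Lloyd's algorithm) can be sketched as below. Whether JKlustor uses exactly this scheme is not stated in the slides; seeding the centres with the first k points is an assumption made here so that the example is deterministic, and the function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Alternate between assigning each point to its nearest centre and
    moving each centre to the mean of the points assigned to it."""
    centres = [list(p) for p in points[:k]]  # assumed seeding: first k points
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centres

labels, centres = kmeans([(0, 0), (10, 10), (0, 1), (10, 11)], k=2)
print(labels)   # [0, 1, 0, 1]
print(centres)  # [[0.0, 0.5], [10.0, 10.5]]
```

The number of clusters is fixed up front by `k`, which is the direct control over cluster count that the slide refers to.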

10
Jarp Clustering Features
  • variable-length Jarvis-Patrick clustering
  • based on fingerprints and/or other numerical data
  • takes structures/fingerprint and data values from
    either files or from database tables
  • running time scales better than quadratic but
    worse than linear (with respect to number of
    input structures)
  • memory scales linearly
  • Jarp can cope with 100Ks of structures
  • depending on data and parameters may create large
    number of singletons
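
A rough sketch of a variable-length Jarvis-Patrick scheme follows. The details are assumptions for illustration, not Jarp's documented behaviour: a compound's neighbour list holds every compound (itself included) within a dissimilarity threshold, and two neighbouring compounds are joined when the share of common neighbours, relative to the shorter list, reaches `min_common_ratio`. The neighbour search here is a naive all-pairs scan.

```python
def tanimoto_dissimilarity(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union

def jarvis_patrick(fingerprints, threshold, min_common_ratio):
    n = len(fingerprints)
    # Variable-length neighbour lists: everything closer than the threshold.
    neighbours = [{j for j in range(n)
                   if tanimoto_dissimilarity(fingerprints[i], fingerprints[j]) < threshold}
                  for i in range(n)]
    parent = list(range(n))  # union-find forest over compounds

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in neighbours[i]:
            if j > i:  # neighbourhood is symmetric, so visit each pair once
                common = len(neighbours[i] & neighbours[j])
                shorter = min(len(neighbours[i]), len(neighbours[j]))
                if common >= min_common_ratio * shorter:
                    parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}, {10, 11}]
print(jarvis_patrick(fps, threshold=0.5, min_common_ratio=0.5))  # [[0, 1, 2], [3]]
```

Compound 3 has no neighbour within the threshold and ends up alone, which shows how a tight threshold can produce many singletons.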

11
Ward Clustering Example
  • 8 different sets of known active compounds mixed
    together
      • 5-HT3 antagonists
      • ACE inhibitors
      • angiotensin 2 antagonists
      • D2 antagonists
      • delta antagonists
      • FTP antagonists
      • mGluR1 antagonists
      • thrombin inhibitors
  • ChemAxon's 2D Pharmacophore fingerprint was
    generated
  • Fingerprints of the mixture were clustered by
    Ward
  • 9 clusters were formed
      • 8 centroids (cluster representative elements)
        corresponded to the 8 activity classes
      • 1 was a singleton
  • All 8 real clusters contained structures only
    from the activity class of the centroid (over 95%
    true positive classification)

12
Ward Clustering Example
Centroids
13
Ward Clustering Example
Cluster of the D2 antagonists
14
Structure based clustering
  • Non-hierarchical
      • Bemis-Murcko frameworks
  • Hierarchical
      • LibraryMCS

15
Bemis-Murcko frameworks
16
Bemis-Murcko frameworks
17
Bemis-Murcko frameworks features
  • based on structure of molecules
  • cluster formation is apparent, visual, meets
    human expectations
  • running time linear with respect to number of
    input structures
  • memory scales sub-linearly
  • can easily cope with millions of structures
  • suitable for quick overview of very large sets
  • spots scaffold hops
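
The core of framework extraction can be sketched on a bare molecular graph: strip terminal atoms repeatedly until only ring systems and the linkers between them remain. This simplified version ignores atom and bond types and the finer points of the published definition, and the helper names and bond-list input are hypothetical.

```python
def from_bonds(bonds):
    """Build an adjacency map {atom: set of neighbours} from a list of bonds."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def framework(adjacency):
    """Prune atoms with at most one neighbour until none are left; what
    remains is the rings plus the chains linking them (empty if acyclic)."""
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}
    terminal = [atom for atom, nbrs in graph.items() if len(nbrs) <= 1]
    while terminal:
        atom = terminal.pop()
        if atom not in graph:
            continue  # already pruned via another path
        for nbr in graph.pop(atom):
            graph[nbr].discard(atom)
            if len(graph[nbr]) <= 1:
                terminal.append(nbr)
    return graph

# Six-membered ring (atoms 0-5) carrying a two-atom side chain (atoms 6, 7).
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
molecule = from_bonds(ring + [(0, 6), (6, 7)])
print(sorted(framework(molecule)))  # [0, 1, 2, 3, 4, 5]
```

Each molecule is processed once and independently of the others, which is consistent with the linear running time listed above; grouping molecules then amounts to comparing a canonical form of each framework, which this sketch does not attempt.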

18
LibraryMCS
Identifies the largest subgraph shared by several
molecular structures
19
LibraryMCS Hierarchical MCS
20
SAR table view
21
R-group decomposition
22
LibraryMCS features
  • based on structure of molecules
  • cluster formation is apparent, visual, meets
    human expectations
  • running time near-linear with respect to number
    of input structures
  • can cope with 100K-200K structures
  • suitable for very thorough analysis
  • spots scaffold hops
  • substituent-activity (property) analysis

23
LibraryMCS integration at Abbott
"Clustering for the masses", presented by Derek
Debe at ChemAxon's US UGM, Boston, 2008
24
Clustering performance comparison
25
JKlustor roadmap
  • In the development pipeline
      • Bemis-Murcko generalisations
      • IJC integration
      • KNIME integration
      • New GUI
      • Manual clustering
      • Multiple class membership
      • Disconnected MCS (MOS)
  • Planned
      • PipelinePilot integration
      • Spotfire integration
      • JChemBase, JChemCartridge integration
      • JC4XLS integration
  • Blue sky
      • Multitouch gestures
      • LibraryMCS for 1M compound libraries