Title: Java Solutions for Cheminformatics
1 Java Solutions for Cheminformatics
March 2005
2 About Us
- About Us
- Molecule Drawing and Visualization
- Structure Searching
- Cartridge
- Structure Standardization
- Molecular Predictions
- Chemical Expressions
- Screening
- Clustering
- Fragment Analysis
- Virtual Synthesis
- Current Developments
3 History
- Formed
  - 1998, Budapest, Hungary
- Skills base
  - Chemistry
  - Software development
  - Predictive tools
- Aim
  - Platform independent software for chemistry
- Highlights
  - 1998: Custom projects
  - 1999: Java tools for sketching/viewing structures
  - 2000: Structure database support
  - 2001: Clustering and diversity analysis
  - 2003: Pharmacophore screening, property predictions, reaction processing, fragmenting
  - 2004: Cartridge technology, virtual synthesis, improved SMARTS support
4 People
- Developers: 17 (7 PhD, 10 MSc)
- Technical expertise
  - Cheminformatics
  - Synthetic and physico-chemistry
  - Virtual drug design
  - Java
  - Web technology
- Business support: 3 (1 MSc, 2 BSc)
- Commercial expertise
  - Negotiation and contracting
  - Relationship management
  - Collaboration steering and development
  - Strategic marketing
  - Mutually beneficial (win-win) business relationships
5 Selected Application Areas
Global licenses
Custom development projects
Value added constructions
Websites/portal front and back end
Educational
6 Product development
[Figure: product development timeline, 1998 to 2004, listing the features added to Marvin and JChem]
Marvin
- Applets, Molfiles, stereo support, Windows, Unix
- Mac support, signed applets, Java Web Start, atom mapping
- SMILES, SMARTS, PDB, Rgroups, isotopes, shortcuts, Marvin Beans
- Ball and stick, JPG, PNG, SVG, cut and paste with Isis/ChemDraw, 2D cleaning, (de)aromatization, reactions
- SDF, RDF, XYZ animations, CML, templates, compressed formats, Swing, 3D rendering
- Partial charge, pKa, logP, logD, 3D generation, radicals, Sgroups
- Marvin file format, enhanced stereo, enhanced SMARTS support, shapes, text boxes, multiple groups, TPSA, Donor/Acceptor...
JChem
- Oracle, MySQL, SQLServer, Access, hashed fingerprints, substructure and similarity searching
- DB2, PostgreSQL, Rgroup searching
- clustering, diversity
- reaction searching, reaction processing, pharmacophore analysis, screening, standardization, fragmentation
- cartridge, enhanced stereo searching, recursive SMARTS, chemical expressions, virtual synthesis
Structure Database and Cheminformatics toolkit
7 Current Products Overview
8 Multiple Deployment Formats
- Applications
- Java Applets
- Signed Java Applets
- Java Web Start
- Java Beans
- Plugins
- JSP
9 Why ChemAxon?
- Sophisticated virtual chemistry technology
- Platform independence and Web (Java)
- High performance tools (speed, capacity)
- Client oriented development
- Comprehensive API for the developers
- Detailed documentation
- Competitive prices
- Fast and reliable support
10 Product Support
- Developers supporting developers
- Fast response to support questions: 24-hour maximum response time (and fast solutions, too)
- Final and beta releases available online
- Detailed documents available online and extensive help bundled within the software
- Skilled and relevant human support (direct developer to developer)
- Product development based on support requests
11 Molecule Drawing and Visualization
12 Operating Systems
- 100% pure Java
- Windows
  - 95, 98, Me, NT, 2000, XP
- Macintosh
  - OS 9, OS X
- Unix
  - Linux, Solaris, Irix, etc.
13 Web Browsers
- Internet Explorer
- Netscape
- Mozilla
- Safari
- Opera
14 Marvin
- Various file formats
- Isotopes, charges, radicals
- Alias, pseudo atoms
- Templates
- Abbreviated groups
- Reactions
- Atom maps
- R-groups
- Stereo bonds, stereo configurations (R/S, E/Z)
- Enhanced stereo (ABS/AND/OR)
- SMARTS properties (atoms, bonds, recursive SMARTS)
- Chemical error checking
- Generic atoms and bonds
- Atom lists and not lists
- 2D cleaning
- 3D cleaning
- Various 3D models
- Shapes, text boxes
- Plugins
15 Various File Formats
16 Isotopes, Charges, Radicals
17 Templates
18 Abbreviated Groups
19 R-groups
20 Reactions
21 Rendered 3D displays with MarvinSpace
22 Structure Cleaning
[Figure: the topology given by the SMILES string CC(C)NCC(O)COC1C2CC(C)NC2CCC1, cleaned into 2D and 3D structures]
23 Structure Searching
24 JChem Base Features
- Rapid fingerprint-based database scanning
- Sophisticated graph-based searching
- Integration with databases
  - Oracle
  - MS SQL Server
  - DB2
  - MySQL
  - PostgreSQL
  - InterBase
  - Access
- Custom standardization
- JChem Cartridge for searching in Oracle
- JSP integration
25 Import with JChem Base Manager
26 Query Features
- Exact structure
- Substructure
- Atom lists and notlists
- Explicit hydrogens
- Generic atoms
- Generic bonds
- SMARTS atom properties
- Aliphatic, aromatic
- Hydrogen count
- Connection count
- Valence
- Ring count
- Smallest ring size
- Recursive SMARTS
- Stereo atoms
- Stereo bonds
- R-group queries
- R-groups
- Occurrence
- if / then conditions
- RestH
- Reaction search
- Transformation recognition
- Component identification
- Stereospecific reactions (inversion, retention)
- Diastereomers
- Enhanced stereo groups (Abs, And, Or)
27 JChem Base JSP Integration
Thin client support: only a web browser and Java are required
28 Cartridge Technology
29 JChem Cartridge for Oracle
Oracle can be extended to support chemical database operations using the JChem Cartridge for Oracle.
Examples
Substructure search displaying ID, SMILES codes, and molweight:
    SELECT cd_id, cd_smiles, cd_molweight
    FROM my_structures
    WHERE jc_contains(cd_smiles, 'CC(=O)Oc1ccccc1C(=O)O') = 1
Finding benzene derivatives conforming to Lipinski's rule of five:
    SELECT count(*)
    FROM my_structures
    WHERE jc_compare(structure, 'c1ccccc1',
        'sep=! t:s!ctFilter:(mass() < 500) && (logP() < 5) && (donorCount() < 5) && (acceptorCount() < 10)') = 1
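The filter in the second query can be illustrated outside the database. The following is a minimal Python sketch, not the cartridge's implementation: it assumes the four properties have already been calculated, and the `Molecule` record, its field names, and the example values are hypothetical. The strict comparisons mirror the `lt` conditions in the filter expression.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Hypothetical record holding precomputed properties."""
    name: str
    mass: float
    logp: float
    donor_count: int
    acceptor_count: int


def passes_filter(mol: Molecule) -> bool:
    """Apply the four rule-of-five style conditions from the query."""
    return (
        mol.mass < 500
        and mol.logp < 5
        and mol.donor_count < 5
        and mol.acceptor_count < 10
    )


# Invented example values, for illustration only.
library = [
    Molecule("small_polar", 180.2, 1.2, 1, 4),
    Molecule("too_heavy", 612.7, 3.1, 2, 8),
    Molecule("too_lipophilic", 410.5, 6.3, 1, 3),
]

hits = [mol.name for mol in library if passes_filter(mol)]
print(hits)
```

In the cartridge query the same conditions are evaluated inside Oracle, so only the count of matching rows leaves the database.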
30 JChem Cartridge for Oracle
Example: an Oracle search returning similar structures with logP > 1 that were acquired after April 14th, 2002. The results are displayed in MarvinView.
31 Structure Standardization
32 Standardization
- Explicit hydrogens
- Aromatic bonds
- Mesomers
- Tautomers
- Counterions
33 Standardization Example
[Figure: structures before and after standardization]
34 Molecular Predictions
35 Calculator Plugins
- Available calculations
  - Elemental analysis
  - Charge distribution
  - Polarizability
  - pKa
  - logP
  - logD
  - Polar surface area
  - Hückel analysis
  - H-bond donor-acceptor
  - Major microspecies
  - Refractivity
- Calculation interface
  - Marvin GUI
  - Command line
  - Chemical Terms
  - API
36 Elemental Analysis
37 Polar Surface Area
38 Partial Charge Distribution
39 Partial Charge Distribution Calculation
Partial Equalization of Orbital Electronegativities (PEOE)
Orbital electronegativity is defined following Mulliken. The orbital electronegativity of atom i is
\[ \chi_i = a_t + b_t q_i + c_t q_i^2 \]
where q_i is the partial charge. The partial charge of atom i is calculated iteratively, based on Gasteiger's method:
\[ \chi_i^{(0)} = a_t, \qquad q_i^{(0)} = 0 \]
\[ q_i^{(n+1)} = q_i^{(n)} + \sum_k (0.5)^n \, \frac{\chi_i - \chi_k}{\max(\chi_i, \chi_k)} \]
where k is the index of a neighbor of atom i.
40 Polarizability
41 logP
42 logP Example
\[ \log P = \sum_i f_i \]
where f_i is the atomic logP increment.
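The additive scheme on this slide (logP as a sum of atomic increments) can be sketched in a few lines of Python. This is only an illustration of the summation: the atom type names and increment values below are hypothetical and are not the parameters used by the logP plugin.

```python
def estimate_logp(atom_types, increments):
    """Sum the atomic logP increment f_i over all atoms of a molecule."""
    return sum(increments[atom_type] for atom_type in atom_types)


# Hypothetical increment table, invented for illustration only.
HYPOTHETICAL_INCREMENTS = {
    "C_aliphatic": 0.5,
    "O_hydroxyl": -1.0,
}

# A made-up molecule described by its typed heavy atoms.
atoms = ["C_aliphatic", "C_aliphatic", "O_hydroxyl"]
logp = estimate_logp(atoms, HYPOTHETICAL_INCREMENTS)
print(logp)
```

The quality of such a model depends on how the atom types are defined and how the increments are fitted, which the sketch does not address.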
43 Validation of the logP prediction
44 logD
45 logD Example
[Figure: ionization scheme of a molecule with three ionizable sites, showing microspecies 0 to 7 connected by the micro ionization constants k1 to k7]
logD is computed from the micro ionization constants (k_i), the micro partition coefficients (p_i), and the pH.
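How ionization constants, partition coefficients, and pH combine can be shown for the simplest case. The sketch below assumes a single acidic site, so there is one ionization constant and two species (neutral and anion); it is a textbook simplification, not the multi-site microspecies calculation of the logD plugin, and the function and parameter names are chosen here for illustration.

```python
import math


def log_d_monoprotic_acid(ph, pka, log_p_neutral, log_p_ion):
    """logD of an acid HA with one ionizable site.

    D is the sum of the partition coefficients of the two species,
    weighted by their mole fractions in water at the given pH.
    """
    ratio = 10 ** (ph - pka)            # [A-] / [HA] in the aqueous phase
    p_neutral = 10 ** log_p_neutral     # partition coefficient of HA
    p_ion = 10 ** log_p_ion             # partition coefficient of A-
    d = (p_neutral + p_ion * ratio) / (1 + ratio)
    return math.log10(d)


# Invented example values: pKa 4.0, logP 2.0 (neutral), -2.0 (anion).
for ph in (1.0, 4.0, 7.4):
    print(ph, round(log_d_monoprotic_acid(ph, 4.0, 2.0, -2.0), 3))
```

At low pH the result approaches the logP of the neutral form, and it falls as the ionized fraction grows. With several ionizable sites the same weighting runs over all microspecies.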
46 pKa
47 pKa Plugin - Microconstants
Micro ionization constants (log k) are calculated from regression equations that have three types of calculated parameters:
- Intramolecular interactions
- Partial charges
- Polarizabilities
48 pKa Plugin - Macroconstants
Macro ionization constants (pKa) are calculated from the microconstants (log k).
[Figure: ionization scheme]
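The relation between micro- and macroconstants can be sketched for a molecule with two acidic sites, which is smaller than the three-site scheme on the slide. The code uses the standard thermodynamic relations (the first macroconstant is the sum of the two first-step microconstants; the reciprocal of the second is the sum of the reciprocals of the two second-step microconstants). The function and parameter names are assumptions made for this example, not the plugin's API.

```python
import math


def macro_pkas(pk1, pk2, pk12, pk21):
    """Macroscopic pKa values of a two-site acid from micro pK values.

    pk1, pk2 : loss of the first proton from site 1 or from site 2
    pk12     : loss of the proton at site 2 after site 1 has ionized
    pk21     : loss of the proton at site 1 after site 2 has ionized
    """
    ka1 = 10 ** -pk1 + 10 ** -pk2          # Ka1 = k1 + k2
    inv_ka2 = 10 ** pk12 + 10 ** pk21      # 1/Ka2 = 1/k12 + 1/k21
    return -math.log10(ka1), math.log10(inv_ka2)


# Invented, thermodynamically consistent values (pk1 + pk12 == pk2 + pk21).
pka1, pka2 = macro_pkas(4.0, 4.0, 5.0, 5.0)
print(round(pka1, 3), round(pka2, 3))
```

For two equivalent sites the macroconstants differ from the microconstants only by the statistical factor log10(2).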
49 Hydrogen Bonds in pKa Calculation
Intramolecular hydrogen bonds are also taken into account:
\[ \Delta \log k = a \, (q_i - q_k) + b \]
where a and b are regression parameters.
50 Validation of the pKa prediction
51 Chemical Expressions
52 Chemical Terms
Elements of the language
- structure matching functions (describing functional groups, reaction sites, similarity)
- property calculations (partial charge distribution, pKa, logP, electrophilicity)
- arithmetic and logic operators
Chemical Terms examples
53 Applications of Chemical Terms
- Virtual synthesis: reaction and synthesis rules
- Pharmacophore analysis: pharmacophore definitions
- Drug design: goal functions
- Structure searching: advanced query expressions
54 Screening
55 Pharmacophore Mapping
[Figure: a structure colored by atom type and by pharmacophore type]
Pharmacophore types: hydrophobic (h), aromatic (r), acceptor (a), acceptor/donor (a/d), donor/cationic (d/c), donor/aromatic (d/r)
56 Topological Pharmacophore Fingerprint
57 Hypothesis Fingerprints
58 Dissimilarity Metrics
- Euclidean
  - standard
  - normalized
  - weighted
  - asymmetric
- Tanimoto
  - standard
  - scaled
  - asymmetric
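The two standard metrics in this list can be written down directly. The sketch below is a plain Python illustration using the usual textbook definitions, with a binary fingerprint represented as the set of its on-bit positions and a descriptor vector as a tuple of numbers. The normalized, weighted, scaled, and asymmetric variants are not shown, because their exact definitions are not given on the slide.

```python
import math


def tanimoto_dissimilarity(bits_a, bits_b):
    """1 - |A & B| / |A | B| for two sets of on-bit positions."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(bits_a & bits_b) / union


def euclidean_dissimilarity(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


# Invented example data.
print(tanimoto_dissimilarity({1, 2, 3}, {2, 3, 4}))
print(euclidean_dissimilarity((0, 0), (3, 4)))
```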
59 Screening Optimization
- 10,000 test compounds (from NCI)
  - 300 for optimization (training)
  - 9,700 for validation
- 50 active compounds (β-adrenoreceptor antagonists)
  - 1/3 training set
  - 1/3 query set
  - 1/3 spikes
60 Screening Validation: β2-adrenoreceptor antagonists
- All compounds: 9,700
- Known active compounds: 18
- Minimum hypothesis
61 Active Hit Distribution: β2-adrenoreceptor antagonists
The 18 active compounds were mixed with 9,700 random NCI molecules and sorted by pharmacophore similarity.
62 Screening Validation
63 Optimized Screening: JSP Example
64 Optimized Screening: JSP Example Hits
65 Clustering
66 JKlustor
67 Ward Clustering Features
- Ward's minimum variance method
- Murtagh's reciprocal nearest neighbor (RNN) algorithm
- O(n^2) time complexity
- O(n) memory complexity
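The combination of Ward's criterion with a reciprocal nearest neighbor search can be sketched as follows. This is a generic nearest-neighbor-chain illustration in Python, not JKlustor's code: each cluster is stored only as a size and a centroid (linear memory), and two clusters are merged as soon as each is the other's nearest neighbor. The function names and the tuple layout of the returned merges are choices made for this example.

```python
import math


def ward_cost(size_a, cen_a, size_b, cen_b):
    """Increase in within-cluster variance caused by merging a and b."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(cen_a, cen_b))
    return size_a * size_b / (size_a + size_b) * sq_dist


def ward_rnn(points):
    """Agglomerate points; return merges as (id_a, id_b, cost) tuples."""
    clusters = {i: (1, tuple(p)) for i, p in enumerate(points)}
    next_id = len(points)
    merges, chain = [], []
    while len(clusters) > 1:
        if not chain:
            chain.append(min(clusters))      # arbitrary starting cluster
        while True:
            top = chain[-1]
            prev = chain[-2] if len(chain) > 1 else None
            size_t, cen_t = clusters[top]
            best, best_cost = None, math.inf
            for cid, (size_c, cen_c) in clusters.items():
                if cid == top:
                    continue
                cost = ward_cost(size_t, cen_t, size_c, cen_c)
                # On ties prefer the previous chain element.
                if cost < best_cost or (cost == best_cost and cid == prev):
                    best, best_cost = cid, cost
            if best == prev:
                break                        # reciprocal nearest neighbors
            chain.append(best)
        a, b = chain.pop(), chain.pop()
        size_a, cen_a = clusters.pop(a)
        size_b, cen_b = clusters.pop(b)
        total = size_a + size_b
        centroid = tuple((size_a * x + size_b * y) / total
                         for x, y in zip(cen_a, cen_b))
        clusters[next_id] = (total, centroid)
        merges.append((a, b, best_cost))
        next_id += 1
    return merges


merges = ward_rnn([(0, 0), (0, 1), (10, 0), (10, 1)])
for merge in merges:
    print(merge)
```

The remainder of the chain stays valid after a merge because Ward's criterion is reducible, which is what keeps the total work quadratic without storing a distance matrix.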
68 Ward Pharmacophore Clustering Example
- 8 active compound sets
- 5-HT3-antagonists
- ACE inhibitors
- angiotensin 2 antagonists
- D2 antagonists
- delta antagonists
- FTP antagonists
- mGluR1 antagonists
- thrombin inhibitors
69 Ward Centroids
70 A Ward Cluster: D2 antagonists
71 Maximum Common Substructure Clustering
72 Drug Design
73 RECAP fragmentation example
74 Virtual Synthesis
75 The Ideal Virtual Reaction
- Generic (simple)
  - the equation describes the transformation only
  - a few hundred generic reactions can form the basic armory of a preparative chemist
- Specific (complex)
  - chemo-: recognizes reactive and inactive functional groups
  - regio-: "knows" directing rules
  - stereo-: inversion/retention
- Customizable
  - to improve reaction model quality
76 Reaction Modeling
- Processing selective "smart" reactions
- Batch mode (sequential or combinatorial combinations)
- Reverse direction
- High performance (speed and capacity)
- Customizable Reaction Engine!
77 Chemoselective Reaction Definition
REACTIVITY: !match(ratom(3), "[#6][N,O,S:1][N,O,S]", 1) && !match(ratom(3), "[N,O,S:1][C,P,S]=[N,O,S]", 1)
78 Reactants
- 369 isocyanates and isothiocyanates
- 2,920 amines, alcohols and thiols
79 Chemoselective Reaction Products
- 1,264,391 single site products
80 Regioselectivity (Markovnikov, Zaitsev)
An addition reaction definition with Markovnikov's rule:
SELECTIVITY: hcount(ratom(2))
An elimination reaction definition with Zaitsev's rule:
SELECTIVITY: -hcount(ratom(2))
81 Regioselective Reaction Example
Chlorine migration in four steps through consecutive elimination and addition reactions.
82 Regioselectivity (SeAr)
Reaction definition for the electrophilic aromatic bromination of the benzene ring. The expression defines a regioselectivity rule for the major product.
SELECTIVITY: -charge(ratom(1)) TOLERANCE: 0.0045
83 Regioselectivity (SeAr) Products
The virtual bromination of toluene with the above reaction definition gives the ortho and para isomers as the main products, whereas bromine is directed into the meta position in the case of nitrobenzene.
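One way to read a selectivity rule with a tolerance is as a ranking of candidate reaction sites. The Python sketch below works under that assumption: each site gets a score (here the negated partial charge, as in the SeAr rule), and every site whose score lies within the tolerance of the best score is kept as a main product. The interpretation, the function name, and the charge values are assumptions made for illustration; the charges are invented, not calculated.

```python
def select_sites(scores, tolerance):
    """Keep the sites whose score is within `tolerance` of the best score."""
    best = max(scores.values())
    return sorted(site for site, score in scores.items()
                  if best - score <= tolerance)


# Invented partial charges for the ring carbons of a toluene-like substrate.
hypothetical_charges = {"ortho": -0.060, "meta": -0.050, "para": -0.061}

# The score is the negated charge, so the most negative carbon ranks first.
scores = {site: -charge for site, charge in hypothetical_charges.items()}

main_products = select_sites(scores, tolerance=0.0045)
single_product = select_sites(scores, tolerance=0.0)
print(main_products, single_product)
```

With the wider tolerance both the ortho and para sites survive, while a zero tolerance keeps only the single best-scoring site.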
84 Regioselectivity (SeAr) Example Products
1,198 monobrominated main products (tolerance set to zero)
85 Virtual Synthesis
- Multiple steps
- Flexible compound dispatching
- Synthesis rules
- Synthesis tree building
- Memory, file and database mode
- Graphical synthesis browser
- Building block coloring
- Customizable Synthesis Engine!
86 Synthesis Example
esterification
Derek S. Tan, Michael A. Foley, Matthew D. Shair,
Stuart L. Schreiber, J. Am. Chem. Soc., 1998,
120, 8565-8566
87 Synthesis Definition
88 Synthesis Browser
89 Current Developments
90 Recent Developments
- Automatic searching of low-energy conformers
- Improved Oracle cartridge
- Structure searching combined with chemical calculations
- Exhaustive Synthesis for metabolism applications
- R-group decomposition
- Maximum common substructure search in molecule pairs and in libraries
91 Current Developments
- MarvinSpace, an OpenGL based 3D molecule and surface visualisation engine for small and macromolecules
- Instant JChem Base, a desktop and enterprise chemical database client with form builder
- IUPAC naming plugin
- Isoelectric point plugin
- Random Synthesis for building up a diverse virtual space of synthetically feasible compounds
- Extension of the reaction library
- Further descriptors in the Topology Analysis plugin
92 Future Plans
- Metabolic transformation library
- Diverse database of synthetically accessible compounds
- Search in Markush compounds
- Peptide builder
- Fragment-based activity analysis of compound libraries
- AnalogMaker (fragment based random evolutionary analog design)
- Retrosynthesis
93 Visit us
- Home page
- www.chemaxon.com
- Forum
- www.chemaxon.com/forum
- Animated demos and tutorials
- www.chemaxon.com/demos
- Presentations and posters
- www.chemaxon.com/conf
94 Thank you for your attention
Máramaros köz 3/a, Budapest, 1037 Hungary
info_at_chemaxon.com
www.chemaxon.com