Title: Szilárd Dóránt
1 Scientific technical presentation: JChem Base
Feb 2008
2 Contents
- Introduction
- Structural overview
- Compatibility
- Administration
- JChem tables
- Fingerprints
- Structural search
- Structure cache
- Standardization
- Search options
- JSP example
- API examples
- Performance
- Future plans
3 Introduction to JChem Base
High-performance Java-based tools for
- storage
- search
- retrieval
of chemical structures and associated data. The components can be integrated into web-based or standalone applications in association with other ChemAxon tools.
4 Structural overview
- Web browser
- Application
- Web application
- JChem Base API: chemical logic, structure cache
- JDBC driver: standard interface to the RDBMS
- RDBMS (e.g. Oracle, MySQL, etc.): storage and security
5 Compatibility and integration
6 Administration with JChemManager
7 The property table
- The property table stores information about JChem structure tables, including
  - Fingerprint parameters
  - Custom standardization rules
  - Other table options and information
  - Database-related licence keys
- More than one property table can be used; each property table represents a particular JChem environment.
8 The structure of JChem tables
9 Table types
- Molecules: specific structures, e.g. single molecules, mixtures, salts, polymers
- Reactions: single-step reactions
- Any structures: all types of structures are allowed, but there is no structure type-specific searching
10 Chemical Hashed Fingerprints
- Chemical Hashed Fingerprints encode structural patterns in bit strings
- If structure A is a substructure of structure B, every bit that is set in A's fingerprint is also set in B's fingerprint
- Tanimoto similarity of hashed fingerprints can be used for diversity analysis and similarity search
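The two fingerprint properties above can be illustrated with plain bit sets. The following is a minimal sketch using only `java.util.BitSet`; the class name `FingerprintDemo`, its helper methods and the bit positions are made up for illustration and are not part of the JChem API. Returning 1.0 for two empty fingerprints is an arbitrary choice of this sketch.

```java
import java.util.BitSet;

public class FingerprintDemo {
    /** Screening condition: every bit set in the query is also set in the target. */
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);          // bits set in query but not in target
        return missing.isEmpty();
    }

    /** Tanimoto coefficient: |A AND B| / |A OR B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        // Two empty fingerprints are treated as identical (assumption of this sketch).
        return union == 0 ? 1.0 : (double) and.cardinality() / union;
    }

    /** Builds a 512-bit fingerprint with the given (hypothetical) bits set. */
    static BitSet bits(int... positions) {
        BitSet bs = new BitSet(512);
        for (int p : positions) {
            bs.set(p);
        }
        return bs;
    }

    public static void main(String[] args) {
        BitSet query = bits(3, 17, 200);
        BitSet target = bits(3, 17, 42, 200);
        System.out.println(passesScreen(query, target)); // true: target may contain query
        System.out.println(passesScreen(target, query)); // false: bit 42 is missing
        System.out.println(tanimoto(query, target));     // 3 common bits / 4 bits in union
    }
}
```

Passing the screen is only a necessary condition: different patterns can hash to the same bit, so a candidate that passes still has to be confirmed by an exact comparison.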
11 Structural search in database
- Two-stage method provides optimal performance
  - Rapid pre-screening reduces the number of possible hit candidates
    - Chemical Hashed Fingerprints are used for substructure and superstructure searches
    - Hash code is used for duplicate filtering (usually during compound registration)
  - Graph search algorithm is used to determine the final hit list
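The screen-then-verify flow can be sketched with a deliberately simplified stand-in. In the toy below, "structures" are plain strings, the fingerprint hashes two-character fragments into 64 bits, and `String.contains` plays the role of the exact graph search. None of this is how JChem Base computes fingerprints or matches structures, and a text substring is not a chemical substructure; the class `TwoStageSearchDemo` and all its methods are hypothetical and only show the control flow.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class TwoStageSearchDemo {
    static final int BITS = 64;

    /** Toy fingerprint: hashes every two-character fragment of the text into one bit. */
    static BitSet fingerprint(String s) {
        BitSet fp = new BitSet(BITS);
        for (int i = 0; i + 2 <= s.length(); i++) {
            fp.set(Math.floorMod(s.substring(i, i + 2).hashCode(), BITS));
        }
        return fp;
    }

    /** Stage 1 condition: all query bits are present in the target fingerprint. */
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    /** Returns the indices of table entries that contain the query text. */
    static List<Integer> search(String query, List<String> table, List<BitSet> cache) {
        BitSet queryFp = fingerprint(query);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < table.size(); i++) {
            if (!passesScreen(queryFp, cache.get(i))) {
                continue;                       // stage 1: cheap rejection
            }
            if (table.get(i).contains(query)) { // stage 2: exact (expensive) check
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> table = List.of("CCOC(=O)C", "c1ccccc1O", "CCN", "Oc1ccccc1C");
        List<BitSet> cache = new ArrayList<>();
        for (String s : table) {
            cache.add(fingerprint(s));          // computed once, reused for every query
        }
        System.out.println(search("c1ccccc1", table, cache)); // [1, 3]
    }
}
```

The screen never rejects a true hit in this toy, because every fragment of the query also occurs in any text that contains it. It can let false candidates through, which is why the second stage is still required.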
12 Structure Cache
- Contains fingerprints for screening and ChemAxon Extended SMILES for ABAS
- Instant access to the structures for the search process
- Reduced load on the database server
- Incremental update ensures minimum overhead after changes in the table
- Small memory footprint due to
  - SMILES compression
  - Optimized storage technique
- Approximately 100 MB of memory is needed for 1 million typical drug-like structures (using the default 512-bit fingerprints)
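As a rough plausibility check of the memory figure, the fingerprint share can be worked out directly. The sketch below assumes that "approximately 100MB" means 10^8 bytes and ignores any per-object overhead of the Java runtime. The class `CacheMemoryEstimate` is hypothetical, and the split between fingerprints and compressed SMILES is an inference from the numbers on the slide, not a published breakdown.

```java
public class CacheMemoryEstimate {
    /** Bytes needed to hold the raw fingerprints alone. */
    static long fingerprintBytes(long structures, int fingerprintBits) {
        return structures * (fingerprintBits / 8);
    }

    public static void main(String[] args) {
        long structures = 1_000_000L;
        long total = 100_000_000L;                        // "approximately 100MB", read as 10^8 bytes
        long fp = fingerprintBytes(structures, 512);      // 512 bits = 64 bytes per structure
        long rest = total - fp;                           // what remains for everything else
        System.out.println("fingerprints: " + fp / 1_000_000 + " MB");               // fingerprints: 64 MB
        System.out.println("left per structure: " + rest / structures + " bytes");   // left per structure: 36 bytes
    }
}
```

Under these assumptions, roughly two thirds of the cache would be fingerprint bits, leaving a few dozen bytes per structure for the compressed SMILES.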
13 Standardization
- Default standardization includes
  - Hydrogen removal
  - Aromatization
- Custom standardization can be specified for each table by supplying an XML configuration file at table creation or in the Regenerate dialog of JChem Manager (jcman)
14 Custom Standardization Example
JChem Cartridge: http://www.chemaxon.com/JChem_Cartridge.ppt
15 Database search options
16 JSP example application
- Open source, customizable
- Features
  - Substructure, Superstructure, Exact and Similarity search
  - Molecular Descriptor similarity search with descriptor coloring
  - Substructure hit alignment and coloring, inverse hit list
  - Chemical Terms filter
  - Import / Export
  - Export of hits
  - Insert / Modify / Delete structures
17 API example: connecting to a database
ConnectionHandler ch = new chemaxon.jchem.db.ConnectionHandler();
ch.setDriver("oracle.jdbc.driver.OracleDriver");
ch.setUrl("jdbc:oracle:thin:@localhost:1521:mydb");
ch.setPropertyTable("JChemProperties");
ch.setLoginName("scott");
ch.setPassword("tiger");
ch.connect();
// the java.sql.Connection object is available if needed
Connection con = ch.getConnection();
// closing the connection
ch.close();
18 API example: database import
Importer importer = new chemaxon.jchem.db.Importer();
importer.setConnectionHandler(conh);
importer.setInput("sample.sdf");
// importer.setInput(is); // alternatively a stream can also be specified
importer.setTableName("SCOTT.STRUCTURES");
importer.setHaltOnError(false);
importer.setDuplicateImportAllowed(false); // can filter duplicates
// specifying SDFile field - table field pairs
// (the "=" and ";" separators are assumed; the delimiters did not survive in the slide text)
String fieldPairs = "DB_Field1=SDF_Field1;DB_Field2=SDF_Field2";
importer.setFieldConnections(fieldPairs);
int importedCount = importer.importMols();
System.out.println("Imported " + importedCount + " structures");
19 API example: database export
Exporter exporter = new chemaxon.jchem.db.Exporter();
exporter.setConnectionHandler(conh);
exporter.setTableName("structures");
// data fields to be exported with the structure
exporter.setFieldList("cd_id cd_formula name comments");
String fileName = "output.sdf";
OutputStream os = new FileOutputStream(fileName);
exporter.setOutputStream(os);
exporter.setFormat("sdf");
int exportedCount = exporter.writeAll();
System.out.println("Exported " + exportedCount + " structures");
20 API example: database search
JChemSearch searcher = new chemaxon.jchem.db.JChemSearch();
searcher.setConnectionHandler(ch);
searcher.setSearchType(JChemSearch.SUBSTRUCTURE);
searcher.setQueryStructure("c1ccccc1");
searcher.setStructureTable("SCOTT.STRUCTURES");
// a query that returns cd_id values can be used for prefiltering
// (the toxicity threshold is missing from the slide text and is left open here)
searcher.setFilterQuery("SELECT cd_id FROM structures, biodata WHERE structures.cd_id = biodata.cd_id AND biodata.toxicity < ...");
searcher.setWaitingForResult(true); // otherwise runs in a separate thread
searcher.run();
// getting the results as cd_id values
int[] results = searcher.getResults();
21 API example: inserting a structure
// ConnectionHandler, mode, table name and data field names
UpdateHandler uh = new chemaxon.jchem.db.UpdateHandler(ch, UpdateHandler.INSERT, "structures", "comment, stock");
uh.setStructure("c1ccccc1"); // the structure
// specifying data field values
uh.setValueForAdditionalColumn(1, "some text");
uh.setValueForAdditionalColumn(2, new Double(8.5));
uh.setDuplicateFiltering(true); // filtering duplicate structures
int id = uh.execute(true); // getting back the cd_id of the inserted structure
if (id > 0)
    System.out.println("Inserted, cd_id value: " + id);
else
    System.out.println("Already exists with cd_id value: " + (-id));
// storing update information, the database connection remains open
uh.close();
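Judging from the insert example, which prints `-id` in its "already exists" branch, the returned integer carries two pieces of information: its sign says whether a row was inserted, and its magnitude is the cd_id. A small wrapper can make that explicit for calling code. This is a sketch under that reading of the example; the class `InsertResult` is hypothetical and not part of the JChem API.

```java
public class InsertResult {
    final boolean inserted; // true if a new row was created
    final int cdId;         // cd_id of the new row, or of the existing duplicate

    private InsertResult(boolean inserted, int cdId) {
        this.inserted = inserted;
        this.cdId = cdId;
    }

    /** Decodes the sign convention: positive = inserted, otherwise the negated id of the duplicate. */
    static InsertResult fromReturnValue(int id) {
        return id > 0 ? new InsertResult(true, id) : new InsertResult(false, -id);
    }

    @Override
    public String toString() {
        return (inserted ? "Inserted, cd_id value: " : "Already exists with cd_id value: ") + cdId;
    }

    public static void main(String[] args) {
        System.out.println(fromReturnValue(42));  // Inserted, cd_id value: 42
        System.out.println(fromReturnValue(-17)); // Already exists with cd_id value: 17
    }
}
```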
22 Performance (1)
- Compound registration
- Substructure search in a table of 3 million compounds
- JChem Base 3.2, Dual Xeon 3GHz, 2GB RAM, Oracle 9.2.0.7.0
23 Performance (2)
- Similarity search, Tanimoto 0.9
- JChem Base 3.2, Dual Xeon 3GHz, 2GB RAM, Oracle 9.2.0.7.0
24 Future plans
- Additional layer: JChem Server (later also as grid)
- Tables for storing query structures
- Tables for storing general (Markush) structures
- Partial clean option for hit alignment
25 Summary
- ChemAxon's JChem Base API provides sophisticated, high-performance tools for the developer to deal with chemical structures and associated data.
- Building on the JChem API is convenient, because
  - Our various tools integrate seamlessly
  - Both high- and low-level API classes are available
  - Responsive developer-to-developer support
26 Links
- JChem home page: www.jchem.com
- Live demos: www.jchem.com/examples
- API documentation: www.jchem.com/doc/api
- Brochure: www.chemaxon.com/brochures/JChemBase.pdf
27 Visit other technical presentations
- MarvinSketch/View: http://www.chemaxon.com/MarvinSketch_View.ppt
- MarvinSpace: http://www.chemaxon.com/MarvinSpace.ppt
- Calculator Plugins: http://www.chemaxon.com/Calculator_Plugins.ppt
- JChem Base: http://www.chemaxon.com/JChem_Base.ppt
- JChem Cartridge: http://www.chemaxon.com/JChem_Cartridge.ppt
- Standardizer: http://www.chemaxon.com/Standardizer.ppt
- Screen: http://www.chemaxon.com/Screen.ppt
- JKlustor: http://www.chemaxon.com/JKlustor.ppt
- Fragmenter: http://www.chemaxon.com/Fragmenter.ppt
- Reactor: http://www.chemaxon.com/Reactor.ppt