Title: AMBIT Chemoinformatics Software for Data Management
1 AMBIT Chemoinformatics Software for Data
Management
- Joanna Jaworska, P&G Brussels, Belgium
- Nina Jeliazkova, Ideaconsult Ltd., Bulgaria
2 Introduction: why AMBIT?
- The limited availability of free, publicly
accessible, methodologically transparent software
was identified as one of the roadblocks to
broadening the use of in-silico methods (ICCA
Workshop in Setubal 2002, OECD)
- Realization that efficient use of existing
information on chemicals requires better ways for
- Storage
- standardized formats, computer-automated
verification of structures, capability to store
large amounts of data
- Taking advantage of the rapidly evolving field of
data mining and extraction of relevant information
3 IT strategy
- AMBIT: building blocks for a Decision Support
System
- High emphasis on
- Interoperability: plug and play
- Flexibility: modular design
- Transparency
- Open source, relying on open standards. Open
source software lowers the barrier for users,
facilitates dissemination and enables the
reproducibility of models and results
- The cheminformatics functionality relies on the
open source Java library The Chemistry
Development Kit (CDK), http://cdk.sourceforge.net/
- The software is based on the MySQL database
(www.mysql.com), which is the most popular open
source relational database.
- Chemical Markup Language (CML)
- An acknowledged method of encoding chemical data
in XML
- Is being adopted by a large number of chemical
organisations, from government, through
commercial to academia.
- The choice of CML for the internal format makes
the database independent of the software which is
able to access it, in contrast to some
proprietary solutions.
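To illustrate why an XML-based format such as CML can be read by any software, the sketch below parses a minimal CML fragment with Python's standard XML library. The water molecule, its atom ids and the helper name read_cml are illustrative assumptions; this is not how AMBIT itself reads CML.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# Hand-written CML fragment for water (illustrative, not taken from AMBIT).
CML_TEXT = f"""
<molecule xmlns="{CML_NS}" id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
""".strip()


def read_cml(text):
    """Return (atoms, bonds) for one CML molecule.

    atoms maps atom id -> element symbol; bonds is a list of
    (atom_id, atom_id, order) tuples. Hypothetical helper.
    """
    ns = {"cml": CML_NS}
    root = ET.fromstring(text)
    atoms = {
        atom.get("id"): atom.get("elementType")
        for atom in root.findall("cml:atomArray/cml:atom", ns)
    }
    bonds = []
    for bond in root.findall("cml:bondArray/cml:bond", ns):
        first, second = bond.get("atomRefs2").split()
        bonds.append((first, second, bond.get("order")))
    return atoms, bonds


atoms, bonds = read_cml(CML_TEXT)
print(atoms)
print(bonds)
```

Because the structure is plain XML, any language with an XML parser can recover the atoms and bonds without vendor-specific code.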
4 AMBIT - Overview
- AMBIT software is a set of libraries and tools,
providing various cheminformatics functionalities
for data management.
- The AMBIT system consists of a database and
functional modules allowing a variety of flexible
searches and mining of the data stored in the
database.
- The unique feature of AMBIT is the ability to
store multifaceted information about chemical
structures and provide a searchable interface
linking these diverse components.
5 AMBIT overview
- The AMBIT database
- stores chemical structures, their identifiers
such as CAS and InChI numbers, attributes such as
molecular descriptors, experimental data together
with test descriptions, and literature
references. The database can also store QSAR
models. In addition, the software can generate a
suite of 2D and 3D molecular descriptors.
- can be searched by identifiers, attribute value
or range, experimental data value or range,
user-defined structure and substructure, and
structural similarity
- The AMBIT database contains over 450 000 chemical
compounds with data imported from over a dozen
databases (http://ambit.acad.bg/ambit/stats/).
The number of compounds is growing all the time,
and one of the system's great strengths is that
any dataset can be imported for comparison and
analysis. AMBITDatabaseTools 1.10 allows users
to create a local database and to import their
own sets of chemical compounds.
- AMBIT Discovery performs chemical grouping and
assesses the applicability domain of a QSAR,
offering a variety of methods that use different
approaches to similarity assessment: statistical
approaches that rely on descriptor space,
approaches based on mechanistic understanding,
and approaches based on structural similarity.
- ToxTree: a flexible, user-friendly application
which integrates structure-based classification
schemes. Currently 3 schemes are available:
Verhaar for fish toxicity, Cramer for human acute
toxicity, and BfR rules for skin irritation.
ToxTree implements a plug-in mechanism, allowing
it to be extended by modules developed at a
future time, without recompiling the application.
ToxTree and AMBIT modules can be integrated one
within another.
- Toxmatch: stand-alone application for pairwise
similarity assessments, intended for read-across.
- QSAR database: under development. Will store
information in QMRF. Large effort on
standardization
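One of the simplest descriptor-space approaches to applicability domain is a range check: a query compound is considered inside the domain when each of its descriptors falls within the range spanned by the training set. The sketch below illustrates that idea only; the slides do not state which methods AMBIT Discovery uses internally, and the descriptor values are made up for illustration.

```python
def descriptor_ranges(training_set):
    """Per-descriptor (min, max) over equal-length descriptor vectors."""
    columns = list(zip(*training_set))
    return [(min(col), max(col)) for col in columns]


def in_domain(query, ranges):
    """True if every descriptor of the query lies within the training range."""
    return all(low <= value <= high for value, (low, high) in zip(query, ranges))


# Hypothetical training set of (logP, molecular weight) pairs.
training = [(1.2, 120.0), (2.5, 180.0), (3.1, 150.0)]
ranges = descriptor_ranges(training)

print(in_domain((2.0, 160.0), ranges))  # inside both ranges
print(in_domain((4.5, 160.0), ranges))  # logP above the training maximum
```

A range check is easy to explain but treats the domain as a box, so it can accept queries in empty corners of descriptor space; distance-based or density-based methods address that at the cost of more parameters.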
6 AMBIT Database Today
Not restricted to these datasets! Any dataset can
be imported (e.g. DSSTox, AQUIRE, LLNA).
7 AMBIT Database Schema
8 Experimental results repository
9 AMBIT database
- Two user interfaces to the database
- Online
- Standalone
- Online
- A more restricted interface
- Standalone
- Full interface
- Can be used for storing and managing confidential
data
- Common
- Can link with other databases and pull
information via web services
10 AMBIT database functionalities
- Storage: information about chemicals (name and
structure), descriptors, experimental data and
QSAR models
- Example with a tailored template: BCF golden
database, LRI project (EURAS), Q2 2007
- QSAR database with QMRF (ECB funded)
- Conversion
- Different computer formats of structure,
CAS-structure
- Calculation
- Variety of descriptors
- The available list is growing thanks to
contributions to CDK
- Search
- Identification search (CAS, SMILES, chemical
name)
- Descriptor search
- Experimental data search
- Substructure and similarity search
- Complex searches with multiple criteria
(standalone)
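A complex search with multiple criteria can be pictured as a relational query that joins compounds with their descriptors and experimental data. The sketch below uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server. The table names, column names and numeric values are hypothetical placeholders, not the AMBIT schema or real measurements; only the CAS numbers and SMILES are real.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (id INTEGER PRIMARY KEY, cas TEXT, smiles TEXT);
CREATE TABLE descriptor_value (compound_id INTEGER, name TEXT, value REAL);
CREATE TABLE experiment (compound_id INTEGER, endpoint TEXT, value REAL);
""")
conn.executemany("INSERT INTO compound VALUES (?, ?, ?)", [
    (1, "71-43-2", "c1ccccc1"),    # benzene
    (2, "108-95-2", "Oc1ccccc1"),  # phenol
    (3, "64-17-5", "CCO"),         # ethanol
])
# Placeholder numbers for illustration only, not real data.
conn.executemany("INSERT INTO descriptor_value VALUES (?, ?, ?)", [
    (1, "logP", 2.1), (2, "logP", 1.5), (3, "logP", -0.3),
])
conn.executemany("INSERT INTO experiment VALUES (?, ?, ?)", [
    (1, "LC50", 10.0), (2, "LC50", 30.0), (3, "LC50", 9000.0),
])


def search(conn, descriptor, d_min, d_max, endpoint, e_max):
    """Compounds whose descriptor is in [d_min, d_max] and whose
    experimental value for the endpoint is at most e_max."""
    query = """
        SELECT c.cas, c.smiles
        FROM compound c
        JOIN descriptor_value d ON d.compound_id = c.id
        JOIN experiment e ON e.compound_id = c.id
        WHERE d.name = ? AND d.value BETWEEN ? AND ?
          AND e.endpoint = ? AND e.value <= ?
        ORDER BY c.id
    """
    params = (descriptor, d_min, d_max, endpoint, e_max)
    return conn.execute(query, params).fetchall()


hits = search(conn, "logP", 1.0, 3.0, "LC50", 20.0)
print(hits)
```

Keeping descriptors and experimental results in separate tables keyed by compound lets new descriptors or endpoints be added as rows rather than as schema changes.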
13 What kinds of searches are desired?
- Detailed analyses for pairwise similarity
- Similarity of a compound to compounds in the
database
- Similarity of a compound to a reference set
- Similarity of a set of compounds to compounds in
the database
- Grouping based on chemical class
14 AMBIT online
- Searching for basic information
15 AMBIT Online: Similarity search
16 AMBIT Online: Query result
17 Links to other databases (example: KEGG)
18 Link to AQUIRE
19 Information about QSAR models
20 Ambit Database Tools 1.20
Standalone application available at
http://ambit.acad.bg/downloads
21 Ambit converter (Batch search)
- Ambit converter can open CML, CSV, HIN, ICHI,
INCHI, MDL MOL, MDL SDF, MOL2, PDB, SMI, TXT and
XYZ file types
- Ambit converter can save SDF, MOL, CSV, TXT,
SMI file types.
- CAS-SMILES conversion based on a database lookup
- Descriptors calculation
- Cramer rules
- Verhaar scheme
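The CAS-SMILES conversion by lookup can be sketched as a dictionary query guarded by the standard CAS Registry Number check-digit test, which catches most typing errors before the lookup. The in-memory table below stands in for the database and the function names are hypothetical; the slides do not say whether the Ambit converter validates CAS numbers this way.

```python
import re


def is_valid_cas(cas):
    """Check the format and the check digit of a CAS Registry Number."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one;
    # the sum modulo 10 must equal the final check digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == int(match.group(3))


# Hypothetical lookup table standing in for the database.
CAS_TO_SMILES = {
    "71-43-2": "c1ccccc1",  # benzene
    "64-17-5": "CCO",       # ethanol
    "7732-18-5": "O",       # water
}


def cas_to_smiles(cas):
    """Return the SMILES for a CAS RN, or None if it is not in the table."""
    if not is_valid_cas(cas):
        raise ValueError(f"not a valid CAS RN: {cas!r}")
    return CAS_TO_SMILES.get(cas)


print(cas_to_smiles("64-17-5"))
print(is_valid_cas("71-43-3"))  # wrong check digit
```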
22 Ambit Database Tools 1.20
- Import to Database
- Compounds: several file formats
- Descriptors: SDF, CSV, TXT
- Experimental data: SDF, CSV, TXT
- QSAR models: SDF, CSV, TXT
- Database processing
- Calculate SMILES/Fingerprints/Atom environments:
necessary in order to perform substructure and
similarity search. Should be invoked after
importing compounds into the database
- several file formats
- Descriptors calculation
- Distances calculation: used to speed up queries
on the distance between heavy atoms
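The idea of precomputing distances can be sketched as follows: the distances between all pairs of heavy (non-hydrogen) atoms are calculated once at import time, so that a later distance query only filters stored rows. The coordinates, row layout and function names are assumptions made for illustration; the slides do not describe how AMBIT stores these distances.

```python
from itertools import combinations
from math import dist


def heavy_atom_distances(atoms):
    """atoms: list of (element, (x, y, z)).

    Returns (element_a, element_b, distance) rows for every pair of
    non-hydrogen atoms, with the element pair in alphabetical order.
    """
    heavy = [(element, xyz) for element, xyz in atoms if element != "H"]
    rows = []
    for (el_a, xyz_a), (el_b, xyz_b) in combinations(heavy, 2):
        first, second = sorted((el_a, el_b))
        rows.append((first, second, round(dist(xyz_a, xyz_b), 3)))
    return rows


def find_pairs(rows, el_a, el_b, d_min, d_max):
    """Stored rows for the given element pair within a distance window."""
    wanted = tuple(sorted((el_a, el_b)))
    return [r for r in rows if (r[0], r[1]) == wanted and d_min <= r[2] <= d_max]


# Made-up coordinates in angstroms, not a real molecule.
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("O", (1.2, 0.0, 0.0)),
    ("N", (0.0, 1.5, 0.0)),
    ("H", (0.0, 0.0, 1.0)),
]
rows = heavy_atom_distances(atoms)
print(find_pairs(rows, "O", "N", 1.5, 2.5))
```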
23 Ambit Database Tools 1.20
- perform a CAS RN search in the database (submenu
"Search -> CAS RN search")
- perform a SMILES search in the database (submenu
"Search -> SMILES")
- perform a molecular formula search in the
database (submenu "Search -> Molecular formula")
- define structure, descriptor, distance-based and
experimental data criteria and perform searches
in the database
The user can select between the different
datasets existing in the AMBIT database.
Subsequent searches will be performed only
within the selected dataset
24 AMBIT User Interface Example: Search by structure
- Exact search
- Substructure search
- Similarity search
- Fingerprints
- Atom environments
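A fingerprint-based similarity search can be sketched with the Tanimoto coefficient, a common choice for comparing bit fingerprints. The slides do not say which coefficient AMBIT uses, so the measure, the compound names and the bit sets below are all assumptions for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def similarity_search(query_fp, library, threshold):
    """Library entries with similarity >= threshold, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: each set lists the indices of the bits set.
library = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {1, 2, 5},
    "cmpd_C": {7, 8},
}
results = similarity_search({1, 2, 3}, library, threshold=0.5)
print(results)  # [('cmpd_A', 0.75), ('cmpd_B', 0.5)]
```

An exact search corresponds to requiring identical structures, while a substructure search needs graph matching; fingerprints are typically used there only as a fast pre-filter.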
25 AMBIT User Interface Example: Search by
descriptors
26 AMBIT User Interface Example: Search by
experimental data
27 Similarity based on toxicity mechanism: Verhaar
scheme
Verhaar H.J.M., Van Leeuwen C.J., Hermens J.L.M.,
Classifying Environmental Pollutants. 1:
Structure-Activity Relationships for Prediction
of Aquatic Toxicity, Chemosphere, Vol. 25, No. 4,
pp. 471-491, 1992
- 34 rules
- 5 classes
- Class 1: Narcosis or baseline toxicity
- Class 2: Less inert compounds
- Class 3: Unspecific reactivity
- Class 4: Compounds and groups of compounds acting
by a specific mechanism
- Class 5: Not possible to classify according to
these rules
29 Chemical similarity assessment using the database
- Exact substructure search based on 2D structure
- Structural similarity search (various methods)
- Criteria on descriptors
- Based on mechanistic understanding (Verhaar
scheme)
30 Another view on similarity assessments with
Toxmatch and Discovery
- Discovery
- similarity to a set (summary representation)
- Toxmatch
- pairwise similarities
- Similarity to a set (nearest neighbours)
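The difference between these views can be made concrete with a small sketch on descriptor vectors. Pairwise similarity compares two compounds directly; similarity to a set can be summarised either by a single representative such as the centroid, or by the nearest neighbours in the set. Euclidean distance, the centroid as the summary representation and all values below are assumptions for illustration, not the methods documented for Toxmatch or Discovery.

```python
from math import dist


def pairwise_distances(compounds):
    """Distance for every unordered pair in a name -> vector mapping."""
    names = list(compounds)
    return {
        (a, b): dist(compounds[a], compounds[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def distance_to_centroid(query, reference_set):
    """Summary view: distance from the query to the mean of the set."""
    n = len(reference_set)
    centroid = [sum(column) / n for column in zip(*reference_set)]
    return dist(query, centroid)


def knn_distance(query, reference_set, k):
    """Nearest-neighbour view: mean distance to the k closest members."""
    nearest = sorted(dist(query, ref) for ref in reference_set)[:k]
    return sum(nearest) / len(nearest)


# Hypothetical two-descriptor reference set forming a square.
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
query = (1.0, 1.0)

print(pairwise_distances({"A": (0.0, 0.0), "B": (3.0, 4.0)}))
print(distance_to_centroid(query, reference))
print(knn_distance(query, reference, k=2))
```

Here the query sits exactly on the centroid, so the summary view reports zero distance, while the nearest-neighbour view shows that no reference compound is actually close to it. This is one reason the two views can lead to different read-across conclusions.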
31 Thank you
Questions?