Title: CEFIC LRI Tools Ambit 1.21
1CEFIC LRI Tools Ambit 1.21
- Nina Jeliazkova
- Ideaconsult Ltd. Sofia, Bulgaria
2Outline
- Ambit overview
- Demo
- Finding basic information about a query compound
in the database
- Complex query in the database: retrieve data
meeting multiple criteria from the Ambit database
- Import data from the EURAS Gold Standard
Bioconcentration database
3Introduction: why Ambit?
- The limited availability of free, publicly
accessible, methodologically transparent software
was identified as one of the roadblocks to
broader use of in-silico methods (ICCA
Workshop in Setubal 2002, OECD)
- Realization that efficient use of existing
information on chemicals requires better ways for
- Storage: standardized formats, computer-automated
verification of structures, capability to store
large amounts of data
- Taking advantage of the rapidly evolving field of
data mining and extraction of relevant information
4IT strategy
- Ambit: building blocks for a Decision Support
System
- High emphasis on
- Interoperability, for plug and play
- Flexibility: modular design
- Transparency
- Open source, relying on open standards. Open
source software lowers the user barrier,
facilitates dissemination activities and
enables the reproducibility of models and results
- The cheminformatics functionality relies on the
open source Java library The Chemistry
Development Kit (http://cdk.sourceforge.net/)
- The software is based on the MySQL database
(www.mysql.com), which is the most popular open
source relational database.
- Chemical Markup Language (CML)
- An acknowledged method of encoding chemical data in
XML
- Is being adopted by a large number of chemical
organisations, from government, through
commercial to academia.
- The choice of CML for the internal format makes
the database independent of the software that
accesses it, in contrast to some
proprietary solutions.
5IT strategy
- Desktop installation: MySQL database and
standalone application (AmbitDatabaseTools) on
the same PC
- Intranet installation: MySQL database on a server
and standalone application (AmbitDatabaseTools)
on the user PCs
- Internet installation: MySQL database and web
server (JSP and Servlets), Web browser as user
interface
6Ambit overview
- The AMBIT database
- stores chemical structures, their identifiers
such as CAS and InChI numbers, attributes such as
molecular descriptors, experimental data together
with test descriptions, and literature
references. The database can also store QSAR
models. In addition, the software can generate a
suite of 2D and 3D molecular descriptors.
- can be searched by identifiers, attribute value
or range, experimental data value or range, user
defined structure and substructure, and structural
similarity
- The AMBIT database contains over 450 000 chemical
compounds with data imported from over a dozen
databases (http://ambit.acad.bg/ambit/stats/).
The number of compounds is growing all the time,
and one of the system's great strengths is that
any dataset can be imported for comparison and
analysis. AMBITDatabaseTools 1.21 allows users
to create a local database and to import their own
sets of chemical compounds.
- AMBIT Discovery performs chemical grouping and
assesses the applicability domain of a QSAR,
offering a variety of methods built on
different approaches to similarity assessment:
statistical approaches that rely on descriptor space,
approaches based on mechanistic understanding,
and approaches based on structural similarity.
- ECB QMRF inventory: a tailored version of the Ambit
database (under development). Will store
information in QMRF. Large effort on
standardization.
7AMBIT Database Today
Not restricted to these datasets! Any dataset can
be imported (e.g. DSSTox, AQUIRE, LLNA).
8AMBIT Database Schema
9AMBIT Online: Similarity search
10AMBIT Online: Query result
11Links to other databases: example KEGG
12Information about QSAR models
13Search AQUIRE database online
14Search EURAS Bioconcentration database online
15Ambit Database Tools 1.21
Standalone application available at
http://ambit.acad.bg/downloads
- The AMBITDatabase main window consists of the
following areas
- Task bar on the left
- Molecule browser (top right)
- Molecule data tabs (bottom right)
- Fast SMILES entry panel (top)
- Status bar at the bottom.
16Demo
- Finding basic information about a query compound
in the database
- Complex query in the database: retrieve data
meeting multiple criteria from the Ambit database
- Import data from the EURAS Gold Standard
Bioconcentration database
17Exercise 1. Finding basic information about a
query compound in the database
- Launch AmbitDatabaseTools 1.20
- Start menu/ All Programs/ CEFIC-LRI/Ambit 1.20
Ambit database tools main screen. Various tasks
can be started from the menu options in the left
panel. This exercise uses the Search / CAS RN menu
to look up a compound with a specific CAS RN.
18Exercise 1a. Lookup by CAS RN
- An input box appears
- Enter 66-25-1 and click OK.
- The result appears in the top panel (Molecule
browser)
- Click on the 3D tab to view the 3D structure
- Further processing: save, calculate descriptors,
etc.
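Before a CAS RN such as 66-25-1 is sent to the database, its check digit can be verified. The sketch below applies the public CAS check-digit rule (weight the digits 1, 2, 3, ... from the right, excluding the check digit, and take the sum modulo 10). The function name `cas_check_digit_ok` is hypothetical, and the slides do not say whether Ambit performs this check itself.

```python
def cas_check_digit_ok(cas_rn: str) -> bool:
    """Return True if a CAS RN such as '66-25-1' has a valid check digit."""
    parts = cas_rn.strip().split("-")
    # Expected shape: 2-7 digits, then 2 digits, then 1 check digit.
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return False
    if not (2 <= len(parts[0]) <= 7 and len(parts[1]) == 2 and len(parts[2]) == 1):
        return False
    body = parts[0] + parts[1]
    # Weights 1, 2, 3, ... are applied starting from the rightmost body digit.
    total = sum(w * int(d) for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(parts[2])


# 66-25-1: 5*1 + 2*2 + 6*3 + 6*4 = 51, and 51 mod 10 = 1, the check digit.
hexaldehyde_ok = cas_check_digit_ok("66-25-1")
mistyped_ok = cas_check_digit_ok("66-25-2")
```

A check like this only catches typing errors; it does not tell whether the compound is present in the database.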
19Exercise 1b. Retrieve descriptors
- The objective of this exercise is to retrieve
values of several descriptors from the database.
The descriptors we are interested in are
- LogP
- Cross-sectional diameter
- Maximum diameter
- Molecular weight
- Use the Molecule/Advanced data retrieval menu
20Exercise 1b. Retrieve descriptors
- The following window appears
- Check the Read descriptors row
- The following window appears
- Check the following descriptors
- XLogPDescriptor
- WeightDescriptor
- CrossSectionalDiameterDescriptor
- MaximumDiameterDescriptor
21Exercise 1b. Retrieve descriptors
- The results appear in Descriptors tab
- Further processing: save, etc.
22Exercise 1c. Retrieve AQUIRE data
- Use the Molecule/AQUIRE menu to retrieve toxicity
data for hexaldehyde
- The results can be observed in the bottom panel,
EXPERIMENTAL data tab. Click on each row to view
more details.
- Save to a file using the File/Save menu (sdf, csv,
xls, txt)
23SDF file for hexaldehyde
CDK 6/23/07,13:23
19 18 0 0 0 0 0 0 0 0999 V2000
-0.0187 1.5258 0.0104 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0021 -0.0041 0.0020 C 0 0 0 0 0 0 0 0 0 0 0 0
1.4167 2.0553 -0.0004 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.4333 -0.5336 0.0129 C 0 0 0 0 0 0 0 0 0 0 0 0
1.3963 3.5622 0.0079 C 0 0 0 0 0 0 0 0 0 0 0 0
...
6 18 1 0 0 0 0
6 19 1 0 0 0 0
M END
> <NSC>
2596
> <CrossSectionalDiameterDescriptor Angstrom>
2.4897
> <XLogPDescriptor>
1.7530
> <MaximumDiameterDescriptor Angstrom>
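The descriptor values saved in the SDF file can be read back by any program that understands SDF data items. The sketch below is a minimal reader, assuming the standard SDF layout in which each data header has the form `> <Name>`, the value follows on the next line(s), and a record ends with `$$$$`. The function name `read_sdf_data_items` is hypothetical, and the sample text is a shortened stand-in for the file on this slide, not a complete molfile.

```python
def read_sdf_data_items(sdf_text: str) -> dict:
    """Collect the '> <Name>' data items of the first record in an SDF text."""
    items = {}
    name = None
    for line in sdf_text.splitlines():
        stripped = line.strip()
        if stripped == "$$$$":          # end of the record
            break
        start, end = stripped.find("<"), stripped.rfind(">")
        if stripped.startswith(">") and 0 < start < end:
            name = stripped[start + 1:end]   # text between '<' and the last '>'
            items[name] = ""
        elif name is not None:
            if stripped == "":          # a blank line closes the current item
                name = None
            else:
                items[name] = f"{items[name]}\n{stripped}" if items[name] else stripped
    return items


sample = """\
M  END
> <NSC>
2596

> <CrossSectionalDiameterDescriptor Angstrom>
2.4897

> <XLogPDescriptor>
1.7530

$$$$
"""
items = read_sdf_data_items(sample)
xlogp = float(items["XLogPDescriptor"])
```

Values are kept as strings because SDF data items are untyped; the caller converts the ones it knows to be numeric.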
24XLS file for hexaldehyde
25Exercise 2. Complex queries: use the Ambit database
to retrieve data that meet multiple criteria
- Use the Search options/Options menu to configure
the desired searches
- Switch to the Similarity tab and set the Tanimoto
threshold to 0.7 (we will be searching for structures
with Tanimoto similarity > 0.7)
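For two fingerprints A and B, the Tanimoto similarity is the number of bits set in both divided by the number of bits set in either. The sketch below shows how a threshold selects hits, with fingerprints represented as sets of "on" bit positions. The fingerprints and compound names are made up for illustration, and the slides do not specify which fingerprint type Ambit uses.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: |A and B| / |A or B| for sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar(query: set, library: dict, threshold: float) -> list:
    """Names of library entries whose similarity to the query exceeds the threshold."""
    return sorted(name for name, fp in library.items()
                  if tanimoto(query, fp) > threshold)


# Hypothetical fingerprints (bit positions), not real Ambit data.
query = {1, 2, 3, 4, 5, 6, 7, 8}
library = {
    "compound_a": {1, 2, 3, 4, 5, 6, 7, 8, 9},   # 8 shared / 9 total
    "compound_b": {1, 2, 3, 4, 5, 6, 9},         # 6 shared / 9 total
    "compound_c": {1, 2, 20, 21},                # 2 shared / 10 total
}
hits_07 = similar(query, library, threshold=0.7)
hits_06 = similar(query, library, threshold=0.6)
```

Lowering the threshold from 0.7 to 0.6 admits `compound_b` as well, which is the same effect the exercise demonstrates on the real database.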
26Exercise 2a. Similarity search
- Use the Search/Structure search menu to invoke the
advanced query window
- Draw dimethyl phthalate as shown in the figure
- Click the Similarity button
- Browse the 7 compounds found (in the Molecule
Browser)
- Go to Search/options and lower the threshold to 0.6
- Use Search/Structure search/Similarity again with
the same compound
27Exercise 2a. Similarity search
- Now there are 156 compounds with Tanimoto
similarity > 0.6
- We will use the Molecule/Save as dataset menu
to store the query results in the database
- Hint: you can store query results directly in the
database, without loading them into the Molecule
Browser, by setting Search Options/Result destination
to DATABASE and then performing the query
28Database and datasets - background
- There can be many Ambit databases running on one
MySQL server
- Within an Ambit database the chemical compounds can
be grouped in many subsets.
- Typically, one database consists of multiple
subsets (datasets), corresponding to the origin
of the data (e.g. the file used to import the
compounds)
- The search results can be marked as a separate
subset within the Ambit database
- The search can be performed within the entire Ambit
database or just on a selected subset.
- This allows the results of one query to be used as
input to another, restricting the set of
structures step by step
- Database server (MySQL)
  - Ambit Database 1 (e.g. ambit)
    - Dataset 1 (200 000 structures from NCI)
    - Dataset 2 (600 structures from DSSTox EPA Fathead
      Minnow)
    - Dataset 3 (AQUIRE)
    - Dataset 4 (DSSTox carcinogenic potency data)
    - Dataset 5 (EURAS Bioconcentration factor data)
    - Dataset 6 (my similarity search results)
  - Ambit Database 2 (e.g. test_database)
  - ...
  - Ambit Database N (e.g. my_secret_dataset)
  - Other (non-Ambit) databases
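The idea of saving query results as a dataset and then searching only within that dataset can be sketched with plain Python sets. This is a toy in-memory model, not the Ambit schema or API: the class `ToyDatabase`, its methods, and all compound values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    compounds: dict = field(default_factory=dict)  # compound id -> properties
    datasets: dict = field(default_factory=dict)   # dataset name -> set of ids

    def search(self, predicate, within=None):
        """Ids of compounds satisfying predicate, optionally limited to one dataset."""
        ids = self.datasets[within] if within is not None else self.compounds.keys()
        return {cid for cid in ids if predicate(self.compounds[cid])}

    def save_as_dataset(self, name, ids):
        """Mark a set of compound ids as a named subset of the database."""
        self.datasets[name] = set(ids)


# Made-up values for illustration only.
db = ToyDatabase(compounds={
    1: {"similarity": 0.90, "logp": 2.0},
    2: {"similarity": 0.65, "logp": 5.1},
    3: {"similarity": 0.30, "logp": 1.0},
})
hits = db.search(lambda c: c["similarity"] > 0.6)
db.save_as_dataset("similarity search results", hits)
narrowed = db.search(lambda c: c["logp"] < 4.5, within="similarity search results")
```

Compound 3 satisfies the second condition but is not returned, because the second search runs only inside the saved subset. That is the step-by-step restriction described above.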
29Exercise 2a. Similarity search
- Use the Molecule/Save as dataset menu to store the
query results in the database
- In the dialog box (shown at right), use the add
button to add a new entry for the dataset.
- Type in the name for the dataset (e.g.
Similarity search Tanimoto > 0.6)
- Click OK
30Exercise 2a. Similarity search
- Now the new dataset is available in the datasets
list and can be used to restrict subsequent
queries
- Use the Search options/Dataset menu to select which
dataset is to be searched: select Similarity search
Tanimoto > 0.6 and click OK
Note: this will not load any structures into the
Molecule browser!
31Exercise 2b. Pre-set physicochemical profile
- The objective is to extract compounds that have
physicochemical properties relevant for
bioaccumulation from the set of structurally
similar compounds found by the previous query.
- The recommended descriptors and ranges are
- LogP < 4.5
- Molecular weight < 1100
- Cross-sectional diameter < 17.4 Å
- Maximum diameter < 43 Å
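The four conditions are combined with a logical AND: a compound matches the profile only if every descriptor is below its limit. A minimal sketch of that check follows, using the descriptor names that appear in the Ambit output and the limits from this slide. The function `matches_profile`, the rule for missing values, and the example descriptor values are assumptions made for illustration.

```python
import operator

# (descriptor name, comparison, limit) taken from the recommended ranges.
PROFILE = [
    ("XLogPDescriptor", operator.lt, 4.5),
    ("WeightDescriptor", operator.lt, 1100.0),
    ("CrossSectionalDiameterDescriptor", operator.lt, 17.4),  # Angstrom
    ("MaximumDiameterDescriptor", operator.lt, 43.0),         # Angstrom
]


def matches_profile(descriptors: dict, profile=PROFILE) -> bool:
    """True only if every descriptor is present and satisfies its condition."""
    return all(
        name in descriptors and compare(descriptors[name], limit)
        for name, compare, limit in profile
    )


# Hypothetical descriptor values, not taken from the database.
small = {"XLogPDescriptor": 1.75, "WeightDescriptor": 100.16,
         "CrossSectionalDiameterDescriptor": 2.49, "MaximumDiameterDescriptor": 9.0}
lipophilic = dict(small, XLogPDescriptor=6.2)
incomplete = {"XLogPDescriptor": 1.75, "WeightDescriptor": 100.16}

results = [matches_profile(d) for d in (small, lipophilic, incomplete)]
```

Treating a missing descriptor as a non-match is a design choice of this sketch; a real query might instead calculate the missing value first.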
32Exercise 2b. Pre-set physicochemical profile
- Use the Search/Structure search menu
- The window with options for structure,
descriptors and experimental data queries
appears.
- Click on the Descriptors icon to obtain a list of
descriptors available in the database
33Exercise 2b. Pre-set physicochemical profile
- Select the XLogP descriptor (click on the first column)
- Click on the Condition column and select the < sign.
- Double click on the next column and enter 4.5
- Repeat with the descriptors
- WeightDescriptor (molecular weight) < 1100
- CrossSectionalDiameterDescriptor (cross-sectional
diameter) < 17.4
- MaximumDiameterDescriptor (maximum diameter or
maximum length) < 43
- Click the Search button
34Exercise 2b. Pre-set physicochemical profile
- 123 out of the 156 structurally similar compounds
have the predefined profile.
- The descriptor values can be inspected in the
Descriptors tab
35Exercise 2c. Retrieve available toxicity data
- Use the Search Options/Options menu to select the
endpoint
- Select the AQUIRE tab
- Select LC50 (lethal concentration to 50% of test
organisms) from the first list box
36Exercise 2c. Retrieve available toxicity data
- The next step is to tell the software we want to
retrieve the data for all retrieved compounds
(not only for the current structure). To do this:
- Select the Molecule processing tab
- Select Molecule Browser Current set of
structures from the first list box
37Exercise 2c. Retrieve available toxicity data
- Use the Molecule/AQUIRE menu to retrieve LC50 data
for the current set of compounds
- Click the Start button.
38Exercise 2c. Retrieve available toxicity data
- Browse the compounds to view AQUIRE data in the
bottom panel
- Repeat the same procedure to retrieve BCF data
from AQUIRE
39Exercise 2d. Retrieve available toxicity data
(ER Binding)
- Use the Search/Structure search menu
- Click experiments
- Select DSSTox-ERBinding
- Select Endpoint: ER RBA
- Click Search
40Exercise 2d. Retrieve available toxicity data
(ER Binding)
- Browse the ER Binding data and save the results to a file
41More exercises
- Batch search
- Import structures into database
- Import descriptors and experimental data (e.g.
bioconcentration factor dataset)
- Import QSAR models
- Database processing
- Descriptor calculation
- Atom environments, Fingerprint, SMILES generation
- Create new (empty) database.
- Create users for the new database
- Import compounds
42Ambit - Summary
- AMBIT software is a set of libraries and tools,
providing various chemoinformatics
functionalities for data management.
- The AMBIT system consists of a database and
functional modules allowing a variety of flexible
searches and mining of the data stored in the
database.
- The unique feature of AMBIT is the ability to
store multifaceted information about chemical
structures and provide a searchable interface
linking these diverse components.
43Thank you!