CEFIC LRI Tools Ambit 1.21 - PowerPoint PPT Presentation

Provided by: ninajel


1
CEFIC LRI Tools Ambit 1.21
  • Nina Jeliazkova
  • Ideaconsult Ltd. Sofia, Bulgaria

2
Outline
  • Ambit overview
  • Demo
  • Finding basic information about a query compound
    in the database
  • Complex queries: retrieve data meeting multiple
    criteria from the Ambit database
  • Import data from the EURAS Gold Standard
    Bioconcentration database

3
Introduction: why Ambit?
  • The limited availability of free, publicly
    accessible, methodologically transparent software
    was identified as one of the roadblocks to broader
    use of in silico methods (ICCA Workshop in
    Setubal 2002, OECD)
  • Realization that efficient use of existing
    information on chemicals requires better ways of:
      • Storage: standardized formats, computer-automated
        verification of structures, capacity to store
        large amounts of data
      • Taking advantage of the rapidly evolving field of
        data mining and extraction of relevant information

4
IT strategy
  • Ambit: building blocks for a Decision Support
    System
  • Strong emphasis on:
      • interoperability (plug and play)
      • flexibility (modular design)
      • transparency
  • Open source, relying on open standards. Open
    source software lowers the barrier for users,
    facilitates dissemination, and enables the
    reproducibility of models and results
  • The cheminformatics functionality relies on the
    Chemistry Development Kit (CDK), an open source
    Java library (http://cdk.sourceforge.net/)
  • The software is based on the MySQL database
    (www.mysql.com), the most popular open source
    relational database.
  • Chemical Markup Language (CML):
      • an acknowledged method of encoding chemical
        data in XML
      • being adopted by a large number of chemical
        organisations, from government through
        commercial to academia
      • choosing CML as the internal format makes the
        database independent of the software used to
        access it, in contrast to some proprietary
        solutions.
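To illustrate what encoding chemical data in XML looks like, here is a minimal sketch (Python, standard library only) that writes a CML `<molecule>` with `atomArray` and `bondArray` elements for the heavy atoms of hexaldehyde. It assumes the CML namespace `http://www.xml-cml.org/schema` and numeric bond orders; it is not Ambit's or CDK's CML writer, which emit considerably more detail (coordinates, hydrogen counts, metadata), and the helper names `cml_tag` and `molecule_to_cml` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by CML documents; registering it as the default (empty) prefix
# makes ElementTree write plain <molecule>, <atom>, ... tags.
CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)


def cml_tag(name: str) -> str:
    """Qualify a local element name with the CML namespace."""
    return f"{{{CML_NS}}}{name}"


def molecule_to_cml(mol_id: str, atoms: list, bonds: list) -> str:
    """Serialize a molecule as a minimal CML <molecule> element.

    atoms: element symbols; atom ids are generated as a1, a2, ...
    bonds: (i, j, order) tuples with 1-based atom indices.
    """
    molecule = ET.Element(cml_tag("molecule"), id=mol_id)
    atom_array = ET.SubElement(molecule, cml_tag("atomArray"))
    for n, symbol in enumerate(atoms, start=1):
        ET.SubElement(atom_array, cml_tag("atom"), id=f"a{n}", elementType=symbol)
    bond_array = ET.SubElement(molecule, cml_tag("bondArray"))
    for i, j, order in bonds:
        ET.SubElement(bond_array, cml_tag("bond"), atomRefs2=f"a{i} a{j}", order=str(order))
    return ET.tostring(molecule, encoding="unicode")


# Heavy-atom skeleton of hexaldehyde (hexanal), hydrogens left implicit:
# C1-C2-C3-C4-C5-C6=O7
atoms = ["C"] * 6 + ["O"]
bonds = [(1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1), (6, 7, 2)]
print(molecule_to_cml("hexaldehyde", atoms, bonds))
```

Because the output is plain XML, any program with an XML parser can read it back, which is the kind of software independence the slide describes. Check which CML schema version you target before relying on numeric bond orders.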

5
IT strategy
  • Desktop installation: MySQL database and the
    standalone application (AmbitDatabaseTools) on
    the same PC
  • Intranet installation: MySQL database on a server
    and the standalone application (AmbitDatabaseTools)
    on the user PCs
  • Internet installation: MySQL database and web
    server (JSP and servlets), with a web browser as
    the user interface

6
Ambit overview
  • The AMBIT database
  • stores chemical structures; their identifiers,
    such as CAS RN and InChI; attributes such as
    molecular descriptors; experimental data together
    with test descriptions; and literature
    references. The database can also store QSAR
    models. In addition, the software can generate a
    suite of 2D and 3D molecular descriptors.
  • can be searched by identifier, attribute value
    or range, experimental data value or range,
    user-defined structure or substructure, and
    structural similarity
  • The AMBIT database contains over 450 000 chemical
    compounds with data imported from over a dozen
    databases (http://ambit.acad.bg/ambit/stats/).
    The number of compounds keeps growing, and one of
    the system's great strengths is that any dataset
    can be imported for comparison and analysis.
    AMBITDatabaseTools 1.21 allows users to create a
    local database and import their own sets of
    chemical compounds.
  • AMBIT Discovery performs chemical grouping and
    assesses the applicability domain of a QSAR
    model, offering a variety of methods based on
    different approaches to similarity assessment:
    statistical approaches that rely on descriptor
    space, approaches based on mechanistic
    understanding, and approaches based on
    structural similarity.
  • ECB QMRF inventory: a tailored version of the
    Ambit database (under development) that will
    store information in the QSAR Model Reporting
    Format (QMRF). Large effort on standardization.
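As one concrete example of a descriptor-space approach, the sketch below implements the simple range (bounding box) method: a query compound is considered inside the applicability domain if every descriptor value lies within the minimum and maximum observed in the model's training set. This is a generic illustration under that assumption, not AMBIT Discovery's code; the function names, descriptor names and training values are invented.

```python
def descriptor_ranges(training: list) -> dict:
    """Minimum and maximum of each descriptor over the training set."""
    names = training[0].keys()
    return {
        name: (min(row[name] for row in training), max(row[name] for row in training))
        for name in names
    }


def in_domain(query: dict, ranges: dict) -> bool:
    """Inside the domain if every descriptor lies within its training range."""
    return all(low <= query[name] <= high for name, (low, high) in ranges.items())


# Hypothetical training set of a two-descriptor model (values invented).
training = [
    {"XLogP": 1.2, "MW": 120.0},
    {"XLogP": 2.8, "MW": 210.0},
    {"XLogP": 4.1, "MW": 330.0},
]
ranges = descriptor_ranges(training)
print(in_domain({"XLogP": 3.0, "MW": 250.0}, ranges))  # True
print(in_domain({"XLogP": 1.8, "MW": 100.0}, ranges))  # False: MW below training range
```

The range method is cheap but ignores correlations between descriptors; distance-based and probability-density methods are common alternatives in descriptor space.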

7
AMBIT Database Today
Not restricted to these datasets! Any dataset can
be imported (e.g. DSSTox, AQUIRE, LLNA).

8
AMBIT Database Schema

9
AMBIT Online: Similarity search

10
AMBIT Online: Query result

11
Links to other databases: example KEGG

12
Information about QSAR models

13
Search AQUIRE database online

14
Search EURAS Bioconcentration database online
15
Ambit Database Tools 1.21
Standalone application available at
http://ambit.acad.bg/downloads
  • The AMBITDatabase main window consists of the
    following areas:
      • Task bar on the left
      • Molecule browser (top right)
      • Molecule data tabs (bottom right)
      • Fast SMILES entry panel (top)
      • Status bar at the bottom

16
Demo
  • Finding basic information about a query compound
    in the database
  • Complex queries: retrieve data meeting multiple
    criteria from the Ambit database
  • Import data from the EURAS Gold Standard
    Bioconcentration database

17
Exercise 1. Finding basic information about a
query compound in the database
  • Launch AmbitDatabaseTools 1.20
  • Start menu / All Programs / CEFIC-LRI / Ambit 1.20

AmbitDatabaseTools main screen. Various tasks can
be started from the menu options in the left
panel. This exercise uses the Search / CAS RN menu
to look up a compound with a specific CAS RN.
18
Exercise 1a. Lookup by CAS RN
  • An input box appears
  • Enter 66-25-1 (hexaldehyde) and click OK.
  • The result appears in the top panel (Molecule
    Browser)
  • Click on the 3D tab to view the 3D structure
  • Further processing: save, calculate
    descriptors, etc.
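A CAS RN carries its own check digit, so a mistyped number can be caught before it is sent to the database. The sketch below applies the standard CAS check-digit rule (a weighted sum of the preceding digits, with weights 1, 2, 3, ... counted from the right, taken modulo 10). Whether AmbitDatabaseTools performs this check itself is not stated on the slide, and the function name `is_valid_cas` is hypothetical.

```python
import re

# Two to seven digits, two digits, one check digit, e.g. 66-25-1.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas_rn: str) -> bool:
    """Check the format and the check digit of a CAS Registry Number."""
    match = CAS_PATTERN.match(cas_rn.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weights 1, 2, 3, ... applied from the rightmost digit leftwards.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


print(is_valid_cas("66-25-1"))  # True (hexaldehyde)
print(is_valid_cas("66-25-2"))  # False (wrong check digit)
```

For 66-25-1 the weighted sum is 5×1 + 2×2 + 6×3 + 6×4 = 51, so the check digit must be 1.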

19
Exercise 1b. Retrieve descriptors
  • The objective of this exercise is to retrieve
    values of several descriptors from the database.
    The descriptors we are interested in are:
      • LogP
      • Cross-sectional diameter
      • Maximum diameter
      • Molecular weight
  • Use the Molecule/Advanced data retrieval menu

20
Exercise 1b. Retrieve descriptors
  • The following window appears
  • Check the Read descriptors row
  • The following window appears
  • Check the following descriptors:
      • XLogPDescriptor
      • WeightDescriptor
      • CrossSectionalDiameterDescriptor
      • MaximumDiameterDescriptor

21
Exercise 1b. Retrieve descriptors
  • The results appear in the Descriptors tab
  • Further processing: save, etc.

22
Exercise 1c. Retrieve AQUIRE data
  • Use the Molecule/AQUIRE menu to retrieve toxicity
    data for hexaldehyde
  • The results can be observed in the bottom panel,
    on the EXPERIMENTAL data tab. Click on each row
    to view more details.
  • Save to a file using the File/Save menu (sdf,
    csv, xls, txt)

23
SDF file for hexaldehyde
    CDK 6/23/07,13:23
     19 18  0  0  0  0  0  0  0  0999 V2000
       -0.0187    1.5258    0.0104 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.0021   -0.0041    0.0020 C   0  0  0  0  0  0  0  0  0  0  0  0
        1.4167    2.0553   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
       -1.4333   -0.5336    0.0129 C   0  0  0  0  0  0  0  0  0  0  0  0
        1.3963    3.5622    0.0079 C   0  0  0  0  0  0  0  0  0  0  0  0
    ...
      6 18  1  0  0  0  0
      6 19  1  0  0  0  0
    M  END
    > <NSC>
    2596

    > <CrossSectionalDiameterDescriptor Angstrom>
    2.4897

    > <XLogPDescriptor>
    1.7530

    > <MaximumDiameterDescriptor Angstrom>
    ...
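The data items after `M  END` follow a simple pattern: a header line `> <FieldName>`, one or more value lines, and a blank line. A minimal sketch of reading them from one SD record, assuming a single record per string and ignoring the connection table; the function name `read_sd_fields` is hypothetical, and toolkits such as CDK provide complete SDF readers. The sample values are the ones shown on the slide.

```python
import re

# Matches an SD data header such as "> <XLogPDescriptor>" and captures the field name.
HEADER = re.compile(r"^>.*?<([^>]+)>")


def read_sd_fields(record: str) -> dict:
    """Return the data items that follow 'M  END' in one SD record."""
    fields = {}
    # Everything after "M  END" is the data block; empty if the marker is absent.
    _, _, data_block = record.partition("M  END")
    name = None
    for line in data_block.splitlines():
        header = HEADER.match(line)
        if header:
            name = header.group(1)
            fields[name] = ""
        elif name is not None:
            if line.strip() in ("", "$$$$"):
                name = None  # a blank line (or end of record) closes the value
            else:
                fields[name] = (fields[name] + "\n" + line).strip()
    return fields


# One SD record; the connection table (atom and bond lines) is omitted for brevity.
record = """hexaldehyde
M  END
> <NSC>
2596

> <CrossSectionalDiameterDescriptor Angstrom>
2.4897

> <XLogPDescriptor>
1.7530

$$$$
"""
fields = read_sd_fields(record)
print(float(fields["XLogPDescriptor"]))  # 1.753
```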

24
XLS file for hexaldehyde
25
Exercise 2. Complex queries: use the Ambit database
to retrieve data that meet multiple criteria
  • Use the Search options / Options menu to
    configure the desired searches
  • Switch to the Similarity tab and set the Tanimoto
    threshold to 0.7 (we will be searching for
    structures with Tanimoto similarity > 0.7)

26
Exercise 2a. Similarity search
  • Use the Search/Structure search menu to invoke
    the advanced query window
  • Draw dimethyl phthalate as shown in the figure
  • Click the Similarity button
  • Browse the 7 compounds found (in the Molecule
    Browser)
  • Go to Search options / Options and lower the
    threshold to 0.6
  • Use Search/Structure search/Similarity again
    with the same compound
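The Tanimoto coefficient compares binary fingerprints: if c bits are set in both fingerprints and a and b bits are set in each, T = c / (a + b − c). A minimal sketch of a threshold search, assuming fingerprints are given as sets of bit positions; the fingerprints and compound names are invented, and the fingerprint type and length Ambit uses are not described on the slide.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query: set, library: dict, threshold: float) -> list:
    """Return (name, score) pairs with score strictly above threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy fingerprints (bit positions).
query = {1, 4, 7, 9, 12, 15, 20, 22, 30, 31}
library = {
    "compound_A": {1, 4, 7, 9, 12, 15, 20, 22, 30},      # T = 9/10
    "compound_B": {1, 4, 7, 9, 12, 15, 20, 40, 41},      # T = 7/12
    "compound_C": {1, 4, 7, 9, 12, 15, 20, 22, 50, 51},  # T = 8/12
}
print(similarity_search(query, library, 0.7))  # only compound_A
print(similarity_search(query, library, 0.6))  # compound_A and compound_C
```

Lowering the threshold can only add hits, never remove them, which is why the search at 0.6 returns more compounds than the search at 0.7.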

27
Exercise 2a. Similarity search
  • Now there are 156 compounds with Tanimoto
    similarity > 0.6
  • We will use the Molecule/Save as dataset menu to
    store the query results in the database
  • Hint: you can store query results directly in
    the database, without loading them into the
    Molecule Browser, by setting Search Options /
    Result destination to DATABASE before running
    the query

28
Database and datasets - background
  • There can be many Ambit databases running on one
    MySQL server
  • Within an Ambit database, the chemical compounds
    can be grouped into many subsets.
  • Typically, one database consists of multiple
    subsets (datasets), corresponding to the origin
    of the data (e.g. the file used to import the
    compounds)
  • The search results can be marked as a separate
    subset within the Ambit database
  • The search can be performed within the entire
    Ambit database or just on a selected subset.
  • This allows the results of one query to be used
    as input to another, restricting the set of
    structures step by step
  • Database server (MySQL)
      • Ambit Database 1 (e.g. ambit)
          • Dataset 1 (200 000 structures from NCI)
          • Dataset 2 (600 structures from DSSTox EPA
            Fathead Minnow)
          • Dataset 3 (AQUIRE)
          • Dataset 4 (DSSTox carcinogenic potency data)
          • Dataset 5 (EURAS Bioconcentration factor data)
          • Dataset 6 (my similarity search results)
      • Ambit Database 2 (e.g. test_database)
      • Ambit Database N (e.g. my_secret_dataset)
      • Other (non-Ambit) databases
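The idea of saving a query result as a dataset and restricting the next query to it can be sketched in a few lines. This is a hypothetical in-memory model (dictionaries instead of MySQL tables, invented compound ids and values, similarity scores precomputed), intended only to show the step-by-step narrowing; it does not reflect Ambit's actual schema, and `search` and `save_as_dataset` are illustrative names.

```python
# Hypothetical compounds: id -> stored values (similarity to the query is precomputed here).
compounds = {
    1: {"similarity": 0.92, "XLogP": 2.1},
    2: {"similarity": 0.65, "XLogP": 5.3},
    3: {"similarity": 0.61, "XLogP": 3.8},
    4: {"similarity": 0.40, "XLogP": 1.2},
}
# Named subsets of compound ids; "all" stands for the whole database.
datasets = {"all": set(compounds)}


def search(predicate, within="all"):
    """Run a query, restricted to the compounds in one named dataset."""
    return {cid for cid in datasets[within] if predicate(compounds[cid])}


def save_as_dataset(name, compound_ids):
    """Store a query result as a new named subset (cf. Molecule/Save as dataset)."""
    datasets[name] = set(compound_ids)


# Step 1: similarity query over the whole database, saved as a dataset.
save_as_dataset("Similarity search Tanimoto > 0.6",
                search(lambda c: c["similarity"] > 0.6))
# Step 2: descriptor query restricted to that dataset; compound 4 also has
# XLogP < 4.5 but is excluded because it is not in the saved subset.
profile_hits = search(lambda c: c["XLogP"] < 4.5,
                      within="Similarity search Tanimoto > 0.6")
print(sorted(profile_hits))  # [1, 3]
```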

29
Exercise 2a. Similarity search
  • Use Molecule/Save as dataset menu to store the
    query results into the database
  • In the dialog box (shown at right), click the
    Add button to add a new entry for the dataset.
  • Type in the name for the dataset (e.g.
    Similarity search Tanimoto > 0.6)
  • Click OK

30
Exercise 2a. Similarity search
  • Now the new dataset is available in the datasets
    list and can be used to restrict subsequent
    queries
  • Use the Search options/Dataset menu to select the
    dataset to be searched: select Similarity search
    Tanimoto > 0.6 and click OK

Note: this will not load any structures into the
Molecule Browser!
31
Exercise 2b. Pre-set physicochemical profile
  • The objective is to extract, from the set of
    structurally similar compounds found by the
    previous query, those compounds with
    physicochemical properties relevant for
    bioaccumulation.
  • The recommended descriptors and ranges are:
      • LogP < 4.5
      • Molecular weight < 1100
      • Cross-sectional diameter < 17.4 Å
      • Maximum diameter < 43 Å
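The profile above is a conjunction of upper limits. A minimal sketch of applying it to descriptor values, assuming strict `<` comparisons as on the slide and treating a compound with a missing descriptor as not matching (a design choice made here, not taken from Ambit). The descriptor names follow the ones used in the exercise; the candidate compounds and their values are hypothetical.

```python
# Upper limits from the slide (strict "<"); diameters are in Angstrom.
PROFILE = {
    "XLogPDescriptor": 4.5,
    "WeightDescriptor": 1100.0,
    "CrossSectionalDiameterDescriptor": 17.4,
    "MaximumDiameterDescriptor": 43.0,
}


def matches_profile(descriptors: dict, profile: dict = PROFILE) -> bool:
    """True if every profile descriptor is present and strictly below its limit."""
    return all(
        name in descriptors and descriptors[name] < limit
        for name, limit in profile.items()
    )


# Hypothetical descriptor values for three candidate compounds.
candidates = {
    "compound_X": {"XLogPDescriptor": 3.2, "WeightDescriptor": 310.0,
                   "CrossSectionalDiameterDescriptor": 9.8, "MaximumDiameterDescriptor": 15.1},
    "compound_Y": {"XLogPDescriptor": 6.1, "WeightDescriptor": 450.0,
                   "CrossSectionalDiameterDescriptor": 11.0, "MaximumDiameterDescriptor": 20.5},
    "compound_Z": {"XLogPDescriptor": 2.0, "WeightDescriptor": 180.0},  # diameters missing
}
hits = [name for name, values in candidates.items() if matches_profile(values)]
print(hits)  # ['compound_X']
```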

32
Exercise 2b. Pre-set physicochemical profile
  • Use the Search/Structure search menu
  • A window with options for structure,
    descriptor and experimental data queries
    appears.
  • Click on the Descriptors icon to obtain a list of
    descriptors available in the database

33
Exercise 2b. Pre-set physicochemical profile
  • Select the XLogP descriptor (click on the first
    column)
  • Click on the Condition column and select the
    < sign.
  • Double-click on the next column and enter 4.5
  • Repeat with the descriptors:
      • WeightDescriptor (molecular weight) < 1100
      • CrossSectionalDiameterDescriptor
        (cross-sectional diameter) < 17.4
      • MaximumDiameterDescriptor (maximum diameter or
        maximum length) < 43
  • Click the Search button

34
Exercise 2b. Pre-set physicochemical profile
  • 123 out of the 156 structurally similar compounds
    have the predefined profile.
  • The descriptor values can be inspected in the
    Descriptors tab

35
Exercise 2c. Retrieve available toxicity data
  • Use the Search Options/Options menu to select the
    endpoint
  • Select the AQUIRE tab
  • Select LC50 (concentration lethal to 50% of the
    test organisms) from the first list box

36
Exercise 2c. Retrieve available toxicity data
  • The next step is to tell the software that we
    want to retrieve the data for all retrieved
    compounds (not only for the current structure).
    To do this:
      • Select the Molecule processing tab
      • Select Molecule Browser: Current set of
        structures from the first list box

37
Exercise 2c. Retrieve available toxicity data
  • Use the Molecule/AQUIRE menu to retrieve LC50
    data for the current set of compounds
  • Click the Start button.

38
Exercise 2c. Retrieve available toxicity data
  • Browse the compounds to view the AQUIRE data in
    the bottom panel
  • Repeat the same procedure to retrieve BCF
    (bioconcentration factor) data from AQUIRE

39
Exercise 2d. Retrieve available toxicity data
(ER Binding)
  • Use the Search/Structure search menu
  • Click Experiments
  • Select DSSTox-ERBinding
  • Select Endpoint: ER RBA
  • Click Search

40
Exercise 2d. Retrieve available toxicity data
(ER Binding)
  • Browse the ER binding data and save the results
    to a file

41
More exercises
  • Batch search
  • Import structures into database
  • Import descriptors and experimental data (e.g.
    bioconcentration factor dataset)
  • Import QSAR models
  • Database processing
      • Descriptor calculation
      • Atom environments, fingerprint and SMILES
        generation
  • Create a new (empty) database
  • Create users for the new database
  • Import compounds

42
Ambit - Summary
  • AMBIT software is a set of libraries and tools,
    providing various chemoinformatics
    functionalities for data management.
  • The AMBIT system consists of a database and
    functional modules allowing a variety of flexible
    searches and mining of the data stored in the
    database.
  • The unique feature of AMBIT is the ability to
    store multifaceted information about chemical
    structures and provide a searchable interface
    linking these diverse components.

43
Thank you!
  • Questions?