1
An Introduction to Open Small-molecule Resources
of High Utility for Systems Biologists

Tutorial for the International Conference on
Systems Biology
Göteborg, August 2008
Christopher Southan, European Bioinformatics
Institute,
Wellcome Trust Genome Campus, Cambridge, UK

2
Context

Medicinal chemistry has a long history of
providing a bridge between biology and chemistry
by identifying compounds that produce biological
effects
It is increasingly recognised that bioactive
compounds are an essential part of the
perturbation toolbox for systems biology
Advancing biological knowledge via a broad
spectrum of small molecule investigations can
lead to improved understanding not only of
systems biology but also disease mechanisms and
new opportunities for therapeutic intervention

3
Systems Chemical Biology

Oprea et al. Nat Chem Biol. 2007;3(8):447-50
PMID 17637771
The increasing availability of data related
to genes, proteins and their modulation by small
molecules has provided a vast amount of
biological information leading to the emergence
of systems biology and the broad use of
simulation tools for data analysis. However,
there is a critical need to develop
cheminformatics tools that can integrate chemical
knowledge with these biological databases and
simulation approaches, with the goal of creating
systems chemical biology.

4
Chemical Biology goes back a long way.
5
So does Bioactive Compound Structure Representation.
6
But ... Times Have Changed for Chemical Information
7
Strophanthidin from 1952 to 2008: Now just a click to Hinxton
8
Or Bethesda.
9
The times have also changed for Chemical Biology
10
And the Union of Chemistry and Biology
11
November 2004: The Seeds of Revolution
16
PubChem and ChEBI: Revolutionary Consequences

Arrival of the missing entity of formal and
linked chemical structure representation within
the global web of bioinformatic relationships
Ability to search across links between
biochemical data, biological effects and chemical
structure information
Deposition not just of HTS results but a wide
range of other types of screening data directly
linked to chemical structure information in
public repositories
Proliferation of cheminformatics tools,
databases, nomenclatures, and ontologies in the
public domain
A quantum jump in the global enablement of
chemical biology and medicinal chemistry

17
Post-Revolution: How Many Compounds are Out There?

Chemical Structure Lookup Service: 36 million, 100 sources
ChemSpider: 21.5 million, 150 sources
PubChem: 19,296,269, 70 sources
SureChem: 9 million from US, European and WO patents

But how many are verified as bioactive?
18
Relationships in Bioactive Chemical Space
[Diagram labels: metabolomes; natural products; drugs; chemical genomics; systems biology probes; assay data; drug-like compounds from literature and patents; protein sequences]
19
Searchable Chemical Structure Designations and
Representations in Databases

SD/MOL files
IUPAC standard name
Sketched Image
SMILES
InChI codes
InChI strings
Experimental 3D structure

Code names (CID 121880)
Generic, trade and MeSH names
CAS numbers
Database accession numbers, e.g. PubChem CID, SID,
ChEBI ID, ChemSpider ID

All can be exact-match searched, some allow
similarity searching, some also inter-convert
20
SD/MOL file
The basic MDL chemical table files of atoms,
bonds, connectivity and 3D coordinates

benzene
  ACD/Labs0812062058
  6  6  0  0  0  0  0  0  0  0  1 V2000
    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  1  2  0  0  0  0
  4  2  2  0  0  0  0
  5  3  1  0  0  0  0
  6  4  1  0  0  0  0
  6  5  2  0  0  0  0
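
To make the layout of the connection table concrete, here is a minimal sketch of reading the benzene record above with plain Python. It assumes a complete V2000 molfile (three header lines, a counts line, then atom and bond lines, closed by the standard "M  END" terminator, which is added here for completeness). The function name parse_molfile is illustrative only, and the whitespace splitting is a shortcut: the real format is fixed-width, so a production reader should slice columns instead.

```python
# Benzene as a V2000 molfile: title, program line, comment line, counts line,
# six atom lines (x, y, z, element) and six bond lines (atom1, atom2, order).
MOLFILE = """benzene
  ACD/Labs0812062058

  6  6  0  0  0  0  0  0  0  0  1 V2000
    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  1  2  0  0  0  0
  4  2  2  0  0  0  0
  5  3  1  0  0  0  0
  6  4  1  0  0  0  0
  6  5  2  0  0  0  0
M  END
"""


def parse_molfile(text):
    """Return (name, atoms, bonds) from a small V2000 molfile (sketch only)."""
    lines = text.splitlines()
    name = lines[0].strip()
    # Line 4 is the counts line: number of atoms, then number of bonds.
    counts = lines[3].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        first, second, order = (int(value) for value in line.split()[:3])
        bonds.append((first, second, order))
    return name, atoms, bonds


name, atoms, bonds = parse_molfile(MOLFILE)
print(name, len(atoms), "atoms,", len(bonds), "bonds")
```

The alternating bond orders (1, 2, 2, 1, 1, 2) show that the file stores benzene as a Kekulé structure, and the zero z-coordinates show that this particular record is a 2D depiction.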

21
Experimental 3D Structures
Cn3D view of PDB 1I7G on the left; PubChem
tesaglitazar (CID 208901) on the right
22
SMILES: simplified molecular-input line-entry
system, a notation for encoding molecular structures

Interconverts with 2D sketchers
Can then be searched
Human readable
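
As a small illustration of why SMILES counts as human readable, the sketch below tallies the heavy (non-hydrogen) atoms in a SMILES string using only a regular expression. It is a toy tokenizer, not a SMILES parser: it assumes the common organic-subset symbols plus simple bracket atoms, ignores hydrogens, charges and stereochemistry, and the function name heavy_atom_counts is illustrative only.

```python
import re
from collections import Counter

# Bracket atoms such as [NH4+] or [nH] are matched first; outside brackets,
# two-letter symbols come first so "Cl" is not read as "C" followed by "l".
ATOM_PATTERN = re.compile(
    r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]|(Cl|Br|[BCNOPSFI]|[bcnops])"
)


def heavy_atom_counts(smiles):
    """Count element symbols in a SMILES string (hydrogens are ignored)."""
    counts = Counter()
    for match in ATOM_PATTERN.finditer(smiles):
        symbol = match.group(1) or match.group(2)
        # Aromatic atoms are written in lower case; normalise them to the element.
        counts[symbol.capitalize()] += 1
    return counts


caffeine = heavy_atom_counts("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")
benzene_aromatic = heavy_atom_counts("c1ccccc1")
benzene_kekule = heavy_atom_counts("C1=CC=CC=C1")
print(dict(caffeine))
```

The two benzene strings give the same tally, which also shows that one structure can be written as more than one valid SMILES unless a canonicalisation step is applied.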

23
Structure Sketchers/Converters
24
IUPAC Systematic Naming of Organic Chemical
Compounds

International Union of Pure and Applied Chemistry
(IUPAC)
Should be human readable and allow an unambiguous
structural formula to be drawn
Usable for automated text-to-structure conversion
Taxol:
(2aR,4S,4aS,6R,9S,11S,12S,12aR,12bS)-1,2a,3,4,4a,6,9,10,11,12,12a,12b-Dodecahydro-4,6,9,11,12,12b-hexahydroxy-4a,8,13,13-tetramethyl-7,11-methano-5H-cyclodeca(3,4)benz(1,2-b)oxet-5-one 6,12b-diacetate, 12-benzoate, 9-ester with (2R,3S)-N-benzoyl-3-phenylisoserine

25
IUPAC International Chemical Identifier (InChI)
Textual Identifier for Chemical Substances

A formalized string conversion of IUPAC names but
not human readable
InChI strings express more information than the
simpler SMILES notation and differ in that every
structure has a unique InChI string
The InChI algorithm converts structural information
in a three-step process: normalization (to remove
redundant information), canonicalization (to
generate a unique number label for each atom),
and serialization (to give a string of
characters), but without explicit 3D information
The 25 character InChIKey is a hashed version of
the full InChI designed to allow for easy web
searches of chemical compounds (e.g. Google)
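
The serialized string is organised as slash-separated layers, which the following minimal sketch pulls apart. It assumes the simple case in which the first field after the version is the molecular formula and every later field starts with a one-letter layer prefix (for example c for connectivity, h for hydrogens). The function name split_inchi is illustrative only, and the benzene string is a standard InChI (version 1S) of the kind issued by later software releases.

```python
def split_inchi(inchi):
    """Split an InChI string into version, formula and (prefix, content) layers."""
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI" or not body:
        raise ValueError("not an InChI string: %r" % inchi)
    version, *fields = body.split("/")
    # Assumption: the first field is the formula layer, which has no prefix letter.
    formula = fields[0] if fields else None
    # A list of pairs is used because a prefix letter may occur more than once.
    layers = [(field[0], field[1:]) for field in fields[1:] if field]
    return version, formula, layers


version, formula, layers = split_inchi("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H")
print(version, formula, layers)
```

Here the c layer lists how the six numbered carbon atoms are connected into a ring and the h layer says that atoms 1 to 6 each carry one hydrogen.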

26
CAS Registry Number

Unique numeric identifier
Contains up to 10 digits, divided by hyphens into
three parts, e.g. 58-08-2 for caffeine (Google it)
Has no chemical significance
Widely used but not open-access because the
source chemical information links to the CAS
commercial databases e.g. SciFinder
Consequently the consistency of mappings to open
identifiers cannot be verified
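
Although the number carries no chemical meaning, its third part is a check digit, so the format at least can be validated offline. The sketch below applies the published checksum rule: each digit of the first two parts is multiplied by its position counted from the right, and the sum modulo 10 must equal the final digit. The function name cas_is_valid is illustrative only, and passing the check says nothing about whether the number is actually assigned to a substance.

```python
def cas_is_valid(cas):
    """Check the format and check digit of a CAS Registry Number string."""
    parts = cas.split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        return False
    first, second, check = parts
    # Layout: 2 to 7 digits, then 2 digits, then a single check digit.
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    digits = first + second
    # Weight 1 for the rightmost digit, 2 for the next one, and so on.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == int(check)


print(cas_is_valid("58-08-2"))
```

For caffeine the sum is 8*1 + 0*2 + 8*3 + 5*4 = 52, and 52 modulo 10 gives the check digit 2.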

27
PubChem Identifiers: CIDs and SIDs

PubChem is the NCBI informatics backbone for
the NIH Molecular Libraries Initiative (MLI)
A suite of three databases: PubChem Compound
(unique structures with computed properties),
PubChem BioAssay (results supplied by
depositors) and PubChem Substance (deposited
compound structures)
The ten MLI-funded screening centers run
cellular and target-based HTSs using a compound
collection of 250 K and submit the results
to PubChem

28
PubChem is now a Global Hub, Including
bioinformatic dbs with in-links
ChEBI, enzyme ligands: 8K
MMDB, PDB ligands: 55K
ZINC, ready-to-dock: 3.8 million
KEGG, drugs and metabolites: 14K
ChemBank, chemical genomics: 0.4 million
Human Metabolite db: 2K
ChemIDplus, NIH tox data: 383K
MEROPS protease inhibitors
ChemSpider: 20 million
DrugBank, drugs and targets: 4K
Drugs of the Future: 3.4K
GPCR-Ligand Database
Nature Chemical Biology: 0.8K
LIPID MAPS, metabolism: 8.8K
29
Searchable Measures of Chemical Similarity

1D: measured or computed molecular properties,
e.g. molecular weight, number of rings,
molecular surface area or volume, pKa, logP etc.
3D: maps of a molecular surface, chemical graphs,
spectral descriptors, distribution of
electrostatic charge around a molecule
2D: fingerprints are by far the most common, based
on a bit-string encoding of substructural
occurrences
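
As an example of a computed 1D property, the sketch below derives a molecular weight from a molecular formula. It assumes a flat formula without brackets, charges or isotopes, and uses a deliberately short table of rounded standard atomic weights. The names ATOMIC_MASS and molecular_weight are illustrative only.

```python
import re

# Rounded standard atomic weights for a few common elements (g/mol).
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45,
}


def molecular_weight(formula):
    """Sum atomic weights for a flat formula such as 'C8H10N4O2'."""
    weight = 0.0
    # Each match is an element symbol followed by an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        weight += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return weight


caffeine_mw = molecular_weight("C8H10N4O2")
print(round(caffeine_mw, 2))
```

Properties like this reduce a molecule to a single number, which makes them easy to index and range-search but says little about which parts of two structures actually match.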

30
Molecular Fingerprints for Similarity Searching

Each bit in the fingerprint (or fragment
bit-string) represents one molecular fragment.
Typical length is 1000 bits
The bit string for a molecule records the
presence (1) or absence (0) of each fragment
in the molecule
Compare fingerprints of two molecules to identify
common bits and hence common substructures (and
hence overall structural resemblance)
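
A minimal sketch of the bit-string idea follows, assuming a hypothetical eight-entry fragment dictionary (real dictionaries run to hundreds or thousands of keys) and assuming the fragments present in each molecule have already been detected by a substructure search, which is not shown. The names FRAGMENT_KEYS and fingerprint are illustrative only.

```python
# Hypothetical fragment dictionary: one bit position per fragment, in this order.
FRAGMENT_KEYS = [
    "benzene ring", "carbonyl", "hydroxyl", "amine",
    "chlorine", "ether", "amide", "carboxylic acid",
]


def fingerprint(fragments_present):
    """Return a bit string with 1 where a dictionary fragment is present."""
    present = set(fragments_present)
    return "".join("1" if key in present else "0" for key in FRAGMENT_KEYS)


# Fragment sets written out by hand for illustration.
phenol = fingerprint({"benzene ring", "hydroxyl"})
paracetamol = fingerprint({"benzene ring", "carbonyl", "hydroxyl", "amide"})

# Positions where both bit strings are 1 correspond to shared fragments.
shared = [
    key for key, x, y in zip(FRAGMENT_KEYS, phenol, paracetamol)
    if x == "1" and y == "1"
]
print(phenol, paracetamol, shared)
```

Both molecules set the benzene ring and hydroxyl bits, so those two positions are the common substructures that a fingerprint comparison would pick up.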

31
Tanimoto Chemical Similarity

Tally features:
Unique to one molecule (a, b)
Both on (c)
Both off (d)
Similarity formula:
Tanimoto = c / (a + b + c)

Beware: chemical similarity searches are not
standardised between databases
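
The tally can be sketched directly on two bit strings of equal length, with a and b counting the bits set in only one molecule and c the bits set in both, giving c / (a + b + c). The function name tanimoto and the two example strings are illustrative only, and returning 0.0 when no bits are set at all is a convention chosen here, not a standard.

```python
def tanimoto(fp1, fp2):
    """Tanimoto coefficient c / (a + b + c) for two equal-length bit strings."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    a = sum(x == "1" and y == "0" for x, y in zip(fp1, fp2))  # on in fp1 only
    b = sum(x == "0" and y == "1" for x, y in zip(fp1, fp2))  # on in fp2 only
    c = sum(x == "1" and y == "1" for x, y in zip(fp1, fp2))  # on in both
    if a + b + c == 0:
        return 0.0  # convention for two empty fingerprints (an assumption)
    return c / (a + b + c)


score = tanimoto("10100000", "11100010")
print(score)
```

For these two strings a = 0, b = 2 and c = 2, so the score is 2 / 4. Bits that are off in both (d) do not enter the formula, which is one reason scores depend on the fingerprint definition each database uses.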
32

PubChem Chemical Searching

33
Bio-Chem Data Joins
34
A Pharmaceutical Portfolio from PubChem
35
Disambiguation
From Wells et al. Reaching for high-hanging
fruit in drug discovery at protein-protein
interfaces
1R6N
1Y2F
36
OSRA: Optical Structure Recognition
37
Checking Chemical Patents

Taking Nutlin-3 as an example, the SMILES entry
from PubChem
CC(C)OC1=C(C=CC(=C1)OC)C2=NC(C(N2C(=O)N3CCNC(=O)C3)C4=CC=C(C=C4)Cl)C5=CC=C(C=C5)Cl
was pasted into the SureChem search box
There are nine exact matches, including the
granted patent application from Roche shown below

38
Exploring Relationships in Entrez
BLAST Sequence Similarity
Protein Sequence
Biological Terms MeSH indexed
Literature PubMed
VAST Structure Similarity
Protein 3D Structures
Bioactivity Assay Results
2D Chemical Structure Similarity (3D soon)
Small Molecule Structures
Protein Sequences
Activity Profile Similarity
39
Linkage between Swiss-Prot-DrugBank-PubChem-MMDB
(411) (15728) 181 (2501)
see these marketed target links
