An Introduction to Open Small-molecule Resources of High Utility for Systems Biologists

Description:

Medicinal chemistry has a long history of providing a bridge between biology and ... levitra = CID 110634 PDE-5 inhibitor. carvedilol = CID 2585 alpha/beta blocker ...

1
An Introduction to Open Small-molecule Resources
of High Utility for Systems Biologists
  • Tutorial for the International Conference on
    Systems Biology
  • Göteborg, August 2008
  • Christopher Southan, European Bioinformatics
    Institute,
  • Wellcome Trust Genome Campus, Cambridge, UK

2
Context
  • Medicinal chemistry has a long history of
    providing a bridge between biology and chemistry
    by identifying compounds that produce biological
    effects
  • It is increasingly recognised that bioactive
    compounds are an essential part of the
    perturbation toolbox for systems biology
  • Advancing biological knowledge via a broad
    spectrum of small molecule investigations can
    lead to improved understanding not only of
    systems biology but also disease mechanisms and
    new opportunities for therapeutic intervention

3
Systems Chemical Biology
  • Oprea et al. Nat Chem Biol. 2007;3(8):447-50
    PMID 17637771
  • The increasing availability of data related
    to genes, proteins and their modulation by small
    molecules has provided a vast amount of
    biological information leading to the emergence
    of systems biology and the broad use of
    simulation tools for data analysis. However,
    there is a critical need to develop
    cheminformatics tools that can integrate chemical
    knowledge with these biological databases and
    simulation approaches, with the goal of creating
    systems chemical biology.

4
Chemical Biology goes back a long way...
5
So does Bioactive Compound Structure
Representation...
6
But... Times Have Changed for Chemical
Information
7
Strophanthidin from 1952 to 2008: Now just a
click to Hinxton
8
Or Bethesda.
9
The times have also changed for Chemical Biology
10
And the Union of Chemistry and Biology
11
November 2004: The Seeds of Revolution
16
PubChem and ChEBI: Revolutionary Consequences
  • Arrival of the missing entity of formal and
    linked chemical structure representation within
    the global web of bioinformatic relationships
  • Ability to search across links between
    biochemical data, biological effects and chemical
    structure information
  • Deposition not just of HTS results but a wide
    range of other types of screening data directly
    linked to chemical structure information in
    public repositories
  • Proliferation of cheminformatics tools,
    databases, nomenclatures, and ontologies in the
    public domain
  • A quantum jump in the global enablement of
    chemical biology and medicinal chemistry

17
Post-Revolution: How Many Compounds are Out There?
  • Chemical Structure Lookup Service: 36 million,
    100 sources
  • ChemSpider: 21.5 million, 150 sources
  • PubChem: 19,296,269, 70 sources
  • SureChem: 9 million from US, European and WO
    patents

But how many are verified as bioactive?
18
Relationships in Bioactive Chemical Space
[Diagram; labels: metabolomes, natural products, drugs, chemical genomics and systems biology probes, assay data, drug-like compounds from literature and patents, protein sequences]
19
Searchable Chemical Structure Designations and
Representations in Databases
  • SD/MOL files
  • IUPAC standard name
  • Sketched Image
  • SMILES
  • InChI codes
  • InChI strings
  • Experimental 3D structure
  • Code names (CID 121880)
  • Generic, trade and MeSH names
  • CAS numbers
  • Database accession numbers e.g. PubChem CID, SID,
    ChEBI ID, ChemSpider ID

All can be exact-match searched, some allow
similarity searching, some also inter-convert
20
SD/MOLfile
The basic MDL chemical table files of atoms,
bonds, connectivity and 3D coordinates

    benzene
    ACD/Labs0812062058
    6 6 0 0 0 0 0 0 0 0 1 V2000
    1.9050 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    1.9050 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.7531 -0.1282 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.7531 -2.7882 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    -0.3987 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    -0.3987 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    2 1 1 0 0 0 0
    3 1 2 0 0 0 0
    4 2 2 0 0 0 0
    5 3 1 0 0 0 0
    6 4 1 0 0 0 0
    6 5 2 0 0 0 0
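The layout of the benzene record can be illustrated with a small reader. This is only a sketch under stated assumptions: the function name `parse_molfile` is hypothetical, the third header line (a comment line, not visible on the slide) is assumed to be blank, and fields are split on whitespace, whereas real V2000 files use fixed-width columns and a dedicated toolkit would normally be used.

```python
# Toy reader for the V2000 molfile layout shown on the slide.
# Assumptions: whitespace-separated fields; blank comment line in the header.
BENZENE_MOLFILE = "\n".join([
    "benzene",                # header line 1: molecule name
    "  ACD/Labs0812062058",   # header line 2: program and timestamp
    "",                       # header line 3: comment (assumed blank)
    "  6  6  0  0  0  0  0  0  0  0  1 V2000",  # counts: 6 atoms, 6 bonds
    "    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  2  1  1  0  0  0  0",  # bond: atom 2 - atom 1, order 1
    "  3  1  2  0  0  0  0",
    "  4  2  2  0  0  0  0",
    "  5  3  1  0  0  0  0",
    "  6  4  1  0  0  0  0",
    "  6  5  2  0  0  0  0",
])


def parse_molfile(text):
    """Return (name, atoms, bonds) from a small V2000 molfile string."""
    lines = text.splitlines()
    name = lines[0].strip()
    counts = lines[3].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])

    atoms = []  # (element symbol, x, y, z)
    for line in lines[4:4 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    bonds = []  # (first atom index, second atom index, bond order), 1-based
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        first, second, order = (int(v) for v in line.split()[:3])
        bonds.append((first, second, order))

    return name, atoms, bonds


name, atoms, bonds = parse_molfile(BENZENE_MOLFILE)
print(name, len(atoms), "atoms,", len(bonds), "bonds")
```

The alternating bond orders (1, 2, 2, 1, 1, 2) around the six carbons are the Kekulé form of the benzene ring, and the zero z-coordinates show that this particular record is a flat 2D depiction.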

21
Experimental 3D Structures
Cn3D view of PDB 1I7G on the left; PubChem
tesaglitazar (CID 208901) on the right
22
SMILES: simplified molecular input line entry
system for encoding molecular structures
  • Interconverts with 2D sketchers
  • Can then be searched
  • Human readable

23
Structure Sketchers/Converters
24
IUPAC Systematic Naming of Organic Chemical
Compounds
  • International Union of Pure and Applied Chemistry
    (IUPAC)
  • Should be human readable and allow an unambiguous
    structural formula to be drawn
  • Usable for automated text-to-structure conversion
  • Taxol:
    (2aR,4S,4aS,6R,9S,11S,12S,12aR,12bS)-1,2a,3,4,4a,6,9,10,11,12,12a,12b-Dodecahydro-4,6,9,11,12,12b-hexahydroxy-4a,8,13,13-tetramethyl-7,11-methano-5H-cyclodeca(3,4)benz(1,2-b)oxet-5-one
    6,12b-diacetate, 12-benzoate, 9-ester with
    (2R,3S)-N-benzoyl-3-phenylisoserine

25
IUPAC International Chemical Identifier (InChI)
Textual Identifier for Chemical Substances
  • A formalized string conversion of IUPAC names but
    not human readable
  • Expresses more information than the simpler SMILES
    notation and differs in that every structure has a
    unique InChI string
  • InChI algorithm converts structural information
    in a three-step process: normalization (to remove
    redundant information), canonicalization (to
    generate a unique number label for each atom),
    and serialization (to give a string of
    characters), but without explicit 3D information
  • The 25-character InChIKey is a hashed version of
    the full InChI designed to allow for easy web
    searches of chemical compounds (e.g. Google)

26
CAS Registry Number
  • Unique numeric identifier containing up to 10
    digits, divided by hyphens into three parts, e.g.
    58-08-2 for caffeine (Google it)
  • Has no chemical significance
  • Widely used but not open-access because the
    source chemical information links to the CAS
    commercial databases e.g. SciFinder
  • Consequently the consistency of mappings to open
    identifiers cannot be verified
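Although the mapping of a CAS number to a structure cannot be checked openly, its internal consistency can: the final part is a check digit. A minimal sketch follows, assuming the standard checksum rule (weight the preceding digits 1, 2, 3, ... from right to left, sum, and take the result modulo 10). The function name `cas_check_digit_ok` is hypothetical.

```python
def cas_check_digit_ok(cas_rn: str) -> bool:
    """Check the format and check digit of a CAS Registry Number string."""
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    first, second, check = parts
    # Format: 2-7 digits, then 2 digits, then a single check digit.
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    digits = first + second
    # Weight digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d)
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(check)


print(cas_check_digit_ok("58-08-2"))  # caffeine, the example from the slide
```

For 58-08-2 the weighted sum is 8×1 + 0×2 + 8×3 + 5×4 = 52, and 52 mod 10 = 2, which matches the final digit. A passing checksum only catches transcription errors; it says nothing about which structure the number refers to.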

27
PubChem Identifiers: CIDs and SIDs
  • PubChem is the NCBI informatics backbone for
    the NIH Molecular Libraries Initiative
  • A suite of three databases: PubChem Compound
    (unique structures with computed properties),
    PubChem BioAssay (results supplied by
    depositors) and PubChem Substance (deposited
    compound structures)
  • The ten MLI-funded screening centers run
    cellular and target-based HTSs using a compound
    collection of 250K and submit the results
    to PubChem

28
PubChem is now a Global Hub, Including
bioinformatic dbs with in-links
  • ChEBI, enzyme ligands: 8K
  • MMDB, PDB ligands: 55K
  • ZINC, ready-to-dock: 3.8 million
  • KEGG, drugs and metabolites: 14K
  • ChemBank, chemical genomics: 0.4 million
  • Human Metabolite db: 2K
  • ChemIDplus, NIH tox data: 383K
  • MEROPS protease inhibitors
  • ChemSpider: 20 million
  • DrugBank, drugs and targets: 4K
  • Drugs of the Future: 3.4K
  • GPCR-Ligand Database
  • Nature Chemical Biology: 0.8K
  • LIPID MAPS, metabolism: 8.8K
29
Searchable Measures of Chemical Similarity
  • 1D: measured or computed molecular properties,
    e.g., molecular weight, number of rings,
    molecular surface area or volume, pKa, logP etc
  • 3D: map of a molecular surface, chemical graphs,
    spectral descriptors, distribution of
    electrostatic charge around a molecule
  • 2D: fingerprints are by far the most common, based
    on a bit-string encoding of substructural
    occurrences

30
Molecular Fingerprints for Similarity Searching
  • Each bit in the fingerprint (or fragment
    bit-string) represents one molecular fragment.
    Typical length is 1000 bits
  • The bit string for a molecule records the
    presence (1) or absence (0) of each fragment
    in the molecule
  • Compare fingerprints of two molecules to identify
    common bits and hence common substructures (and
    hence overall structural resemblance)
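The bit-string idea can be sketched in a few lines. This is a toy illustration, not a real fingerprinting scheme: the six-entry fragment dictionary `FRAGMENT_KEYS` is hypothetical (real dictionaries have hundreds of keys or hash fragments into roughly 1000 bits), and the fragments present in each molecule are supplied by hand rather than detected by substructure matching.

```python
# Hypothetical fragment dictionary: one bit position per fragment.
FRAGMENT_KEYS = [
    "benzene ring",
    "carboxylic acid",
    "primary amide",
    "carbonyl",
    "chlorine",
    "hydroxyl",
]


def fingerprint(fragments_present):
    """Record presence (1) or absence (0) of each dictionary fragment."""
    return [1 if key in fragments_present else 0 for key in FRAGMENT_KEYS]


def common_fragments(fp_a, fp_b):
    """Name the fragments whose bits are set in both fingerprints."""
    return [key for key, x, y in zip(FRAGMENT_KEYS, fp_a, fp_b) if x and y]


# Fragment sets assigned by hand for illustration.
benzoic_acid = {"benzene ring", "carboxylic acid", "carbonyl", "hydroxyl"}
benzamide = {"benzene ring", "primary amide", "carbonyl"}

fp_acid = fingerprint(benzoic_acid)
fp_amide = fingerprint(benzamide)
print(fp_acid)   # [1, 1, 0, 1, 0, 1]
print(fp_amide)  # [1, 0, 1, 1, 0, 0]
print(common_fragments(fp_acid, fp_amide))  # ['benzene ring', 'carbonyl']
```

The two molecules share the bits for the ring and the carbonyl group, which is the sense in which common bits point to common substructures.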

31
Tanimoto Chemical Similarity
  • Tally features
  • Unique to one molecule or the other (a, b)
  • Both on (c)
  • Both off (d)
  • Similarity Formula
  • Tanimoto = c/(a+b+c)

Beware: chemical similarity searches are not
standardised between databases
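The tally can be written out directly for two equal-length bit lists. This is a minimal sketch: the function name `tanimoto` and the two example fingerprints are illustrative, and returning 0.0 when neither molecule has any bit set is an assumed convention, since the formula is undefined in that case.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity c / (a + b + c) for two equal-length bit lists."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(1 for x, y in zip(fp_a, fp_b) if x and not y)  # on only in A
    b = sum(1 for x, y in zip(fp_a, fp_b) if y and not x)  # on only in B
    c = sum(1 for x, y in zip(fp_a, fp_b) if x and y)      # on in both
    # Bits off in both (d) do not enter the formula.
    if a + b + c == 0:
        return 0.0  # assumed convention for two all-zero fingerprints
    return c / (a + b + c)


fp_1 = [1, 1, 0, 1, 0, 1]
fp_2 = [1, 0, 1, 1, 0, 0]
print(tanimoto(fp_1, fp_2))  # a=2, b=1, c=2, so 2/5 = 0.4
```

Because the score depends entirely on which fragments define the bits, the same pair of molecules can score differently in different databases, which is the reason for the warning above.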
32
  • PubChem Chemical Searching

33
Bio-Chem Data Joins
34
A Pharmaceutical Portfolio from PubChem
35
Disambiguation
From Wells et al. Reaching for high-hanging
fruit in drug discovery at protein-protein
interfaces
1R6N
1Y2F
36
OSRA: Optical Structure Recognition
37
Checking Chemical Patents
  • Taking Nutlin-3 as an example, the SMILES entry
    from PubChem
  • CC(C)OC1=C(C=CC(=C1)OC)C2=NC(C(N2C(=O)N3CCNC(=O)C3)C4=CC=C(C=C4)Cl)C5=CC=C(C=C5)Cl
  • was pasted into the SureChem search box
  • There are nine exact matches, including the
    granted patent application from Roche shown below

38
Exploring Relationships in Entrez
[Diagram; labels as extracted:]
  • BLAST Sequence Similarity
  • Protein Sequence
  • Biological Terms (MeSH indexed)
  • Literature (PubMed)
  • VAST Structure Similarity
  • Protein 3D Structures
  • Bioactivity Assay Results
  • 2D Chemical Structure Similarity (3D soon)
  • Small Molecule Structures
  • Protein Sequences
  • Activity Profile Similarity
39
Linkage between Swiss-Prot-DrugBank-PubChem-MMDB
(411) (15728) 181 (2501)
see these marketed target links