Title: The Bioinformatics of Small Molecules
1The Bioinformatics of Small Molecules
- David Wishart
- University of Alberta
- david.wishart_at_ualberta.ca
VanBUG Oct. 13, 2005
225,000
metabolite
3Why Are Small Molecules Important?
- Constituents of all macromolecules (DNA, RNA, protein, carbohydrates)
- Serve as cofactors and signaling molecules to 1000s of proteins
- The chemistry part of biochemistry
- 99% of all drug entities and 90% of all drug types are small molecules
- 90% of all biomarkers used in clinical chemistry are small molecules
4Small Molecules sit on top of the Pyramid of Life
Metabolomics Proteomics Genomics
1400 Chemicals
3000 Enzymes
25,000 Genes
5Molecular Informatics
Cheminformatics Bioinformatics Bioinformatics
1400 Chemicals
3000 Enzymes
25,000 Genes
6Cheminformatics vs. Bioinformatics
- Cheminformatics: The application of information technology to the study, analysis, distribution and archiving of chemical data
- Bioinformatics: The application of information technology to the study, analysis, distribution and archiving of molecular biological data
7Two Solitudes
Bioinformatics
Cheminformatics
8Cheminformatics vs. Bioinformatics
Cheminformatics
- Established in the 1960s
- Designed for the needs of organic chemists
- User-pay, limited public access
- Funded by large companies (MDL, Beilstein, Sigma, CAS)
Bioinformatics
- Established in the 1990s
- Designed for the needs of molecular biologists
- Web-based, open access model
- Funded by large govt agencies (NCBI, EBI, NIH, GC)
9Blurring the Borders
2000
2005
Metabolomics
Systems Biology
Proteomics
Genomics
10What's Driving This?
NIH Roadmap
11What's Driving This?
- Govt funded drug discovery and drug research
- Drive to find newer and better clinical biomarkers
- Molecular imaging (fMRI, PET)
- Biosimulation and improved modeling of metabolic pathways
- Carrying the successful open data access model over from biology to chemistry
12Major New Initiatives
- PubChem - NIH/NCBI initiative
- BIND/SMID - Genome Canada initiative
- KEGG - Japanese initiative
- ChEBI - European initiative
- DrugBank - U of Alberta
- Human Metabolome Project - U of Alberta
- SimCell - U of Alberta
Primary Focus on Databases
13PubChem
http://pubchem.ncbi.nlm.nih.gov/
14PubChem
- Released Sept. 16, 2004
- Part of the NIH Molecular Libraries Roadmap, led by Steve Bryant
- Contains more than 850,000 molecules
- 3 linked databases: Compound, Substance and Bioassay
- Links out to PubMed abstracts, NCBI 3D structures and other Entrez resources
15PubChem Details
16NIH vs ACS
17BIND Small Molecules
www.bind.ca
18SMID: Small Molecule Interaction Database
http://smid.blueprint.org/
19BIND/SMID
- Shows links (and mol. contacts) between small molecules and the macromolecules to which they bind
- Extracted from PDB data
- Supports search by SMID ID, Protein GI, PDB ID, Domain ID, Taxonomy
- Supports BLAST sequence searches
- SMID Genomes lists putative ligand interactions based on SMID/SMID BLAST
20SMID Genomes
21KEGG: Kyoto Encyclopedia of Genes and Genomes
http://www.genome.jp/kegg/
22KEGG
- First small molecule database, established in 1996
- Links small molecules to EC data and known pathways
- Source data for many other small molecule databases and tools
- Provides limited linkage between small molecules and the enzymes they interact with
23KEGG Contents
- PATHWAY: 29,921 pathways generated from 246 reference pathways
- GENES: 1,138,129 genes in 31 eukaryotes, 241 bacteria, 24 archaea
- LIGAND: 12,973 compounds, 2,469 drugs, 11,148 glycans, 6,442 reactions
- BRITE: 7,526 KO (KEGG Orthology) groups
24ChEBI: Chemical Entities of Biological Interest
http://www.ebi.ac.uk/chebi/
25ChEBI
- Includes 5719 compounds and other molecular entities that are either products of nature or synthetic products used to intervene in the processes of living organisms
- Derived from KEGG Ligand, IntEnz and Chemical Ontology
- Provides structures, names, synonyms, InChI, SMILES, ontology, Registry numbers
27DrugBank
http://redpoll.pharmacy.ualberta.ca/drugbank/
28DrugBank
- A freely accessible, web-enabled, fully queryable database that links drug structure/activity data with protein structure/function/sequence data
- Brings well-developed bioinformatics concepts of search and comparison to medicinal chemistry
- Links bioinformatics, proteomics and drug discovery together
29DrugBank
- Contains nomenclature, synthesis, structure/activity, physical chemistry info on 1000 FDA approved drugs
- Contains nomenclature, structure, sequence, pharmacology, drug metabolism info on corresponding biomolecular targets
- Wrapped with extensive querying and search tools
30DrugBank Browser
31DrugCard Links
32Query Tools
PharmaBrowse
ChemQuery
33Query Tools
SeqSearch
DataExtractor
34DrugBank Stats
35DrugBank Applications
- Newly sequenced proteomes can be analyzed automatically for similarities to existing drug targets, giving researchers quick lead ideas
- Newly determined protein structures can be Autodocked to a large 3D structure database of known, well-behaved compounds to suggest lead ideas
36DrugBank Applications
- Newly synthesized or identified lead compounds can be compared to existing structures to assess/predict possible efficacy, cross-reactivity, metabolism or physical properties
- Existing drugs can be compared or analyzed for key trends, properties or features to help in drug design and drug synthesis efforts
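One common way to compare a lead compound against existing structures is a fingerprint similarity score. The sketch below uses the Tanimoto coefficient over sets of structural features. The slides do not say which comparison method DrugBank uses, so this is an illustrative stand-in, and the feature sets, compound names and function names are all hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: set, library: dict) -> list:
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, tanimoto(query, feats)) for name, feats in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature sets; real fingerprints would come from a
# cheminformatics toolkit, not from hand-written labels like these.
library = {
    "drug_A": {"aromatic_ring", "carboxylic_acid", "ester"},
    "drug_B": {"aromatic_ring", "amide"},
    "drug_C": {"sulfonamide", "amine"},
}
query = {"aromatic_ring", "carboxylic_acid", "hydroxyl"}

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A score near 1 means the two compounds share most of their features, and a score of 0 means they share none. Any threshold for calling two compounds "similar" would be a separate assumption.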
38Human Metabolome Database
www.hmdb.ca
39HMDB
- A web-accessible database that links endogenous human metabolite data to genes and diseases
- Brings phys/chem data, structure data, spectroscopic data, concentration data, disease data and molecular biology data (SNPs, sequences, EC, GenBank, UniProt, GO, reactions, pathway, KEGG) into a single repository
40The HMDB MetaboCard
54Biosimulation: Three Types of Simulation
- Atomic Scale (molecular dynamics): length 0.1 - 1.0 nm, time 0.1 - 10 ns; uses coordinate data and dynamic data
- Meso Scale (mesodynamics): length 1.0 - 10 nm, time 10 ns - 10 ms; uses interaction data (Kon, Koff, Kd)
- Continuum Model (fluid dynamics): length 10 - 100 nm, time 10 ms - 1000 s; uses concentrations and diffusion rates
55Nationalism in Simulation
- Petri Nets - Germany, Japan
- Flux-Balance Analysis - USA
- Pi Calculus - France
- ODEs and PDEs - Japan, UK
- Agent-Based methods (CA) - Canada
56CA Methods in Games
SimCity 2000
The SIMS
57Dynamic Cellular Automata
- A novel method to apply Brownian motion to objects in the Cellular Automata lattice (mimics collisions)
- Takes advantage of the scale-free nature of Brownian motion and the scale-free nature of heterogeneous mixtures to allow simulations to span many orders of time (nanosec to hours) and space (nanometers to meters)
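To make the Brownian-motion idea concrete, here is a minimal sketch of an unbiased random walk on a square lattice. It is not SimCell's algorithm, only a plain lattice walk. It illustrates why such a walk can be rescaled: the mean squared displacement grows in proportion to the number of steps, whatever physical length and time are assigned to one lattice step. The function name, walker counts and seed are assumptions chosen for illustration.

```python
import random


def mean_squared_displacement(n_walkers: int, n_steps: int, seed: int = 0) -> float:
    """Average squared distance from the origin after n_steps unit moves."""
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    total = 0
    for _ in range(n_walkers):
        x = y = 0
        for _ in range(n_steps):
            dx, dy = rng.choice(moves)  # unbiased step to a neighbouring square
            x += dx
            y += dy
        total += x * x + y * y
    return total / n_walkers


# For an unbiased unit-step walk the expected value equals the step count,
# so quadrupling the steps should roughly quadruple the result.
msd_100 = mean_squared_displacement(n_walkers=2000, n_steps=100)
msd_400 = mean_squared_displacement(n_walkers=2000, n_steps=400)
print(msd_100, msd_400)
```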
58SimCell
http://wishart.biology.ualberta.ca/SimCell/
59SimCell
- Java application that uses Dynamic Cellular Automata (DCA) to model motions, interactions, transport and transformations at the meso-scale (10^-8 to 10^-6 m)
- Uses a square, 2D lattice to model processes, lattice squares are equivalent to 3x3 nm regions
- Molecular objects are moved randomly and interactions determined according to a set of interaction rules that are only applied when objects are in contact (collision detection)
60Diffusion in Cytoplasm
61Enzyme-Substrate Progress Curves
$[\text{Lactate}](t) = L_0 \left(1 - e^{-kt}\right)$
pyruvate + NADH → lactate + NAD+
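The progress curve on this slide appears to be the first-order form [Lactate](t) = L0 (1 - e^(-kt)). Assuming that reading, the short sketch below evaluates the curve and its half-time. The values of L0 and k are hypothetical placeholders, not figures from the talk.

```python
import math


def lactate(t: float, l0: float, k: float) -> float:
    """Lactate formed at time t for plateau l0 and rate constant k."""
    return l0 * (1.0 - math.exp(-k * t))


def half_time(k: float) -> float:
    """Time at which the curve reaches half of its plateau value."""
    return math.log(2) / k


# Hypothetical parameters chosen only for illustration.
L0 = 1.0   # plateau concentration, arbitrary units
K = 0.5    # rate constant, per unit time

for t in (0.0, half_time(K), 5.0, 20.0):
    print(f"t = {t:6.3f}  lactate = {lactate(t, L0, K):.4f}")
```

The curve starts at zero, reaches L0/2 at t = ln(2)/k, and levels off at L0.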
62The TCA Cycle SimCell
[Figure: TCA cycle pathway diagram. Labeled metabolites: Acetate, Acetyl-CoA, Glycerol, Pyruvate, Oxaloacetate, Citrate, Isocitrate, α-Ketoglutarate, Succinyl-CoA, Succinate, Fumarate, L-Malate. Labeled enzyme: Succinate dehydrogenase.]
63Succinate Production
Observed Predicted (SimCell)
64Glycerol Consumption
Observed Predicted (SimCell)
65Summary
- Small molecules are an integral part of genomics, proteomics and systems biology
- Several drivers are pushing or merging cheminformatics into bioinformatics
- New databases and new techniques are emerging to assist in drug discovery, toxicology, biomarker ID, cellular metabolism and cellular modelling
- Bioinformatics → Biosimulation
66Thanks
- Craig Knox
- Savita Shrivastava
- An Chi Guo
- Murtaza Hassanali
- Zhan Chang
- Jennifer Woolsey
- Kevin Jewell
- Dan Tzur
- Kevin Jeroncic
- Joey Cruz
- David Arndt
- David Block
- Peter Tang
- Russ Greiner
AICML
67BioAssay Database
- The BioAssay Database contains bioactivity screens of chemical substances described in PubChem Substance
- Provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to a screening protocol