Title: Bioinformatics: Impact on Health and Drug Development
1 Bioinformatics: Impact on Health and Drug Development
- Symposium 6 Ballroom B
- 7th International ISSX Meeting
- Vancouver, BC Aug. 31, 2004
2 Bioinformatics: Impact on Health and Drug Development
- 7:40 am Bioinformatics in Drug Discovery and Development, D.S. Wishart
- 8:20 am PharmGKB: The Pharmacogenetics and Pharmacogenomics Knowledge Base, R. Altman
- 9:00 am Bioinformatics and Visual Genomics: Seeing Genes, Proteins and Metabolism, C. Sensen
3 Bioinformatics: Impact on Health and Drug Development
- 9:40 am Coffee Break
- 10:20 am Automated Docking and MD Simulations of Substrate Binding in Cytochrome P450, N. Vermeulen
- 11:00 am Metabolic Profiling Using an LC/MS and NMR Based Approach, J. Shockcor
- 11:40 am Posters and Refreshments
4 Bioinformatics in Drug Discovery and Development
- David Wishart, University of Alberta
- 7th International ISSX Meeting
- Vancouver, BC Aug. 29-Sept. 2, 2004
5 The Pyramid of Life
[Figure: pyramid of three tiers linked by bioinformatics]
- Metabolomics: 1400 chemicals
- Proteomics: 10,000 proteins
- Genomics: 30,000 genes
6 Drug Discovery and Development
[Figure: Drug Development Pipeline]
Stage          Cost (million)   Duration
Discovery      80               3.5 yrs
Phase I        40               1 yr
Phase II       50               2 yrs
Phase III      200              3 yrs
FDA Approval   50               2.5 yrs
Inputs to the pipeline: chemistry, genomics, proteomics and metabolomics, linked by bioinformatics
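As a quick arithmetic check, the pipeline figures can be totalled. This is a minimal sketch that assumes the cost and duration numbers on the slide are listed in the same order as the stages; the slide gives the unit only as "million", so no currency is implied here.

```python
# Stage figures as read from the slide, assuming they appear in pipeline order.
PIPELINE = [
    ("Discovery", 80, 3.5),
    ("Phase I", 40, 1.0),
    ("Phase II", 50, 2.0),
    ("Phase III", 200, 3.0),
    ("FDA Approval", 50, 2.5),
]


def pipeline_totals(stages):
    """Return (total cost in millions, total duration in years)."""
    total_cost = sum(cost for _, cost, _ in stages)
    total_years = sum(years for _, _, years in stages)
    return total_cost, total_years


total_cost, total_years = pipeline_totals(PIPELINE)
print(f"Total: {total_cost} million over {total_years} years")
```

Under that reading, the stages sum to 420 million and 12 years from discovery to approval.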
7 Bioinformatics (or Computational Biology)
- Not just the study of DNA or protein sequence data
- Inclusive definition: concerns the storage, display, reduction, management, analysis, extraction, simulation, modelling, fitting or prediction of biological, medical or pharmaceutical data
8 Key Informatics Challenges in Drug Development
- Using genomic, proteomic, metabolomic and structural data to ID drug targets or drug leads
- Using genomic, metabolomic and structural data to predict drug metabolism and xenobiotic toxicity and to characterize adverse drug reactions
9 Drugs from Genomes
Gene Therapies
Protein Drugs
Drug Targets
10 Two Types of Diseases
- Endogenous disease: arises from inborn sequence errors in germ cells or from spontaneous (or age-related) mutations in somatic cells
- Exogenous disease: arises from an infectious vector (virus, bacterium or parasite) that has its origins outside the body
11 Endogenous Diseases
- Select cohort with disease or condition
- Isolate gene region showing distinct features
- Sequence whole region of interest
- Compare to Human UniGene Map
- ID location of common mutations
- Predict function and cellular location of gene product
- Predict/determine structure of gene product
- Design antagonists, agonists or replacement
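The "ID location of common mutations" step can be illustrated with a small sketch. It assumes the cohort sequences are already aligned to the reference and of equal length (a real pipeline would run an alignment first); the function names, the toy sequences and the `min_fraction` cut-off are all hypothetical.

```python
def find_point_mutations(reference, sample):
    """List (position, ref_base, sample_base) per mismatch; positions are 1-based."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


def common_mutations(reference, cohort, min_fraction=0.5):
    """Return mutations present in at least min_fraction of the cohort samples."""
    counts = {}
    for sample in cohort:
        for mutation in find_point_mutations(reference, sample):
            counts[mutation] = counts.get(mutation, 0) + 1
    threshold = min_fraction * len(cohort)
    return sorted(m for m, n in counts.items() if n >= threshold)


# Toy data: two of three samples carry G->A at position 4.
reference = "ATGGCCATTG"
cohort = ["ATGGCCATTG", "ATGACCATTG", "ATGACCATTA"]
print(common_mutations(reference, cohort))
```

Only the substitution shared by a majority of the cohort is reported; the single-sample change at position 10 falls below the cut-off.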
12 Exogenous Diseases
- Sequence pathogen or pathogens
- Identify critical genes
  - metabolic enzymes
  - toxins or pseudo-toxins
  - targeting receptors or coat proteins
- Select unique (low homology) genes
- Use prior knowledge to ID lead compounds
- Develop vaccine candidates
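The "select unique (low homology) genes" step might be sketched as below. In practice this comparison would normally be done with a sequence search tool such as BLAST; here a crude k-mer Jaccard similarity stands in for it so the example stays self-contained. The protein names, sequences and the 0.2 cut-off are invented for illustration.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_unique_genes(pathogen, host, k=3, max_similarity=0.2):
    """Keep pathogen proteins whose best similarity to any host protein is low."""
    host_kmers = [kmers(seq, k) for seq in host.values()]
    unique = []
    for name, seq in pathogen.items():
        pk = kmers(seq, k)
        best = max((jaccard(pk, hk) for hk in host_kmers), default=0.0)
        if best <= max_similarity:
            unique.append(name)
    return unique


# Hypothetical sequences: enzB closely resembles a host protein, toxA does not.
pathogen = {"toxA": "MKKLLWWYYHH", "enzB": "MSTAVLENPGL"}
host = {"hostEnz": "MSTAVLENPGLG"}
print(select_unique_genes(pathogen, host))
```

Proteins that survive this filter are the ones least likely to have a close host counterpart, which is the rationale for preferring them as targets.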
13 Bioinformatics
- Both exogenous and endogenous diseases require methods for rapid and comprehensive genomic, proteomic and metabolomic annotation
- Identifying drug targets or drug candidates requires linking metabolomic or chemical compound data with sequence and pathway data
14 Genome Annotation - Magpie
C. Sensen
15 Metabolomes (KEGG)
- Number of pathways: 17,263
- Number of organisms: 213
- Number of genes: 754,236
- Number of compounds: 11,165
- Number of glycans: 10,895
- Number of chemical reactions: 6,140
http://www.genome.jp/kegg/kegg1.html
16 Therapeutic Target DB (C.Y. Zong)
http://xin.cz3.nus.edu.sg/group/cjttd/TTD_ns.asp
17 Database Integration
KEGG
Magpie
DrugBank
TTD
18 The DrugBank Home Page
http://redpoll.pharmacy.ualberta.ca
19 DrugBank
- A freely accessible, web-enabled, fully queryable database that links drug structure/activity data with protein structure/function/sequence data
- Contains nomenclature, synthesis, structure, activity and chemistry info on FDA drugs
- Contains nomenclature, structure, sequence, pharmacology and drug metabolism info on corresponding biomolecule targets
- Extensive querying and search tools
20 DrugBank Browser
http://redpoll.pharmacy.ualberta.ca
21 DrugBank DrugCard
22 DrugBank DrugCard
- Common names, alternate names, brand names, IUPAC names, CAS number, mixtures, source, manufacturer, MSDS link, PIN, DIN
- Structure, formula, solubility, toxicity, state, LogP, melting/boiling point, synthesis, 3D structure, SMILES, MOL-file, PDB file, NMR and MS spectra, λ max
- Drug class, indication, pharmacology, mechanism, drug target, prescription information, metabolites and metabolism, metabolism SNPs
- Target sequence, GenBank link, target structure (2°, 3° or model), PDB file, target MW, target AA, cellular location, chromosome, chromosome position, SNPs
23 DrugBank Querying
- Sorting (by MW, indication, category)
- Text query (boolean query: AND, OR, NOT) using GLIMPSE
- Sequence query (BLAST search)
- Structure query (draw structure, search for similar structures)
- Relational data extraction (columns of numbers or text for graphing)
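The sorting and boolean text query ideas can be illustrated with a toy in-memory version. This is only a sketch: the real site is described as using GLIMPSE for text search, whereas the record layout, the keyword sets and the `query` helper below are assumptions made up for the example.

```python
# Illustrative records only; this is not the DrugBank schema.
RECORDS = [
    {"name": "Aspirin", "mw": 180.16,
     "keywords": {"analgesic", "antipyretic", "anti-inflammatory"}},
    {"name": "Ibuprofen", "mw": 206.28,
     "keywords": {"analgesic", "antipyretic", "anti-inflammatory"}},
    {"name": "Acetaminophen", "mw": 151.16,
     "keywords": {"analgesic", "antipyretic"}},
]


def matches(record, all_of=(), any_of=(), none_of=()):
    """Boolean match: every all_of term (AND), some any_of term (OR), no none_of term (NOT)."""
    words = record["keywords"]
    if not all(term in words for term in all_of):
        return False
    if any_of and not any(term in words for term in any_of):
        return False
    return not any(term in words for term in none_of)


def query(records, sort_by="mw", **terms):
    """Filter records with a boolean query, then sort by the chosen field."""
    hits = [r for r in records if matches(r, **terms)]
    return sorted(hits, key=lambda r: r[sort_by])


# "analgesic AND NOT anti-inflammatory", sorted by molecular weight.
hits = query(RECORDS, all_of=["analgesic"], none_of=["anti-inflammatory"])
print([r["name"] for r in hits])
```

Keeping the filter and the sort separate mirrors the slide's distinction between text querying and sorting by MW, indication or category.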
24 DrugBank Applications
- Newly sequenced proteomes can be analyzed automatically for similarities to existing drug targets, giving researchers quick lead ideas
- Newly determined protein structures can be Autodocked to a large database of known, well-behaved compounds to suggest lead ideas
25 DrugBank Applications
- Newly synthesized or identified lead compounds can be compared to existing structures to assess/predict possible efficacy, cross-reactivity, metabolism or physical properties
- Existing drugs can be compared or analyzed for key trends, properties or features to help in drug design and synthesis efforts
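Comparing a lead compound to existing structures is commonly done with a fingerprint similarity score. The sketch below uses the Tanimoto coefficient on sets of substructure labels; the slides do not say which measure DrugBank uses, and the compound names and feature sets here are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def most_similar(lead, library, top_n=3):
    """Rank library compounds by similarity to the lead, best first."""
    scored = [(tanimoto(lead, features), name) for name, features in library.items()]
    return sorted(scored, reverse=True)[:top_n]


# Hypothetical substructure fingerprints.
library = {
    "drug_A": {"aromatic_ring", "carboxylic_acid", "ester"},
    "drug_B": {"aromatic_ring", "amide"},
    "drug_C": {"alkyl_chain", "amine"},
}
lead = {"aromatic_ring", "carboxylic_acid", "hydroxyl"}

for score, name in most_similar(lead, library):
    print(f"{name}: {score:.2f}")
```

The closest known compounds then serve as a starting point for guessing the lead's likely properties.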
26 Key Informatics Challenges in Drug Development
- Using genomic, metabolomic and structural data to ID drug targets or drug leads
- Using genomic, metabolomic and structural data to predict or characterize drug metabolism, xenobiotic toxicity and adverse drug reactions
27 Predicting Drug Metabolism Through CYP450 Docking
N. Vermeulen
28 Predicting Gene-Drug Interactions via Curated Community Knowledge
R. Altman
29 Seeking Gene-Drug Relations through PolySearch
http://redpoll.pharmacy.ualberta.ca
30 PolySearch
- Supports PubMed text searching for gene, drug and disease associations (user provides disease/gene/drug name)
- Automatically scores and IDs genes and searches for known SNPs or mutations against standard SNP databases
- Grabs gene sequences and generates primers around SNPs
- Archives results (MySQL database) or sends them as an HTML page to user
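The association-scoring idea can be sketched as simple co-occurrence counting over abstracts. This is not PolySearch's actual scoring scheme, which the slides do not describe; the function name and the toy abstracts are invented for illustration.

```python
import re
from collections import Counter


def score_associations(abstracts, query, candidates):
    """Count abstracts in which the query term co-occurs with each candidate term."""
    def word(term):
        # Whole-word, case-insensitive match for a literal term.
        return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

    query_pattern = word(query)
    candidate_patterns = {c: word(c) for c in candidates}
    counts = Counter()
    for text in abstracts:
        if not query_pattern.search(text):
            continue
        for candidate, pattern in candidate_patterns.items():
            if pattern.search(text):
                counts[candidate] += 1
    return counts.most_common()


# Toy abstracts written for this example.
abstracts = [
    "CYP2C9 variants alter warfarin clearance.",
    "Warfarin dose depends on VKORC1 and CYP2C9 genotype.",
    "TPMT deficiency increases thiopurine toxicity.",
]
print(score_associations(abstracts, "warfarin", ["CYP2C9", "VKORC1", "TPMT"]))
```

A production system would also need synonym expansion, which is why the synonym lists matter for a tool of this kind.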
31 PolySearch
- Searches over 14 million PubMed records, >3400 diseases (and synonyms), 14,000 human genes (43,000 synonyms), >1000 compounds or drugs (>3000 compound synonyms)
- Assesses quality using SCI list of impact factors for 8600 journals
- Example of growing use of text mining in bioinformatics
32 Characterizing ADR and Drug Metabolism via Spectroscopy
- Not all ADRs can be predicted in vitro or in silico
- Identifying drug metabolites and characterizing metabolic changes in blood or urine requires advanced computational/bioinformatics methods
- Represents an emerging application of bioinformatics and computational biology
33 Metabonomics
[Figure labels: Efficacy, Toxicity; Primary Molecules, Secondary Molecules; Filtration, Dilution, Concentration, Resorption; Chemical Fingerprint]
34 Characterizing ADR and Drug Metabolism via Spectroscopy
[Figure label: Sample Injection]
35 Classifying ADR via PCA
J. Shockcor
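To show what PCA-based classification involves, here is a minimal pure-Python sketch that finds the first principal component by power iteration on the covariance matrix and projects each sample onto it. It is a generic illustration, not the method from the talk; the sample values (three spectral bins per sample) are made up.

```python
def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component of the rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration; assumes the start vector is not orthogonal to the answer.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical binned intensities: three control samples, then three dosed samples.
samples = [
    [1.0, 2.0, 0.5], [1.1, 2.1, 0.4], [0.9, 1.9, 0.6],
    [3.0, 0.5, 0.5], [3.2, 0.4, 0.6], [2.9, 0.6, 0.4],
]
direction, scores = first_principal_component(samples)
for label, score in zip(["control"] * 3 + ["dosed"] * 3, scores):
    print(f"{label}: {score:+.2f}")
```

The two groups land on opposite sides of zero along the first component, which is the kind of separation a scores plot is used to reveal. Real spectra have hundreds of bins, so a numerical library would normally replace the hand-written loops.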
36 Chemical Shift Chromatography
- Mixture separation by HPLC (followed by ID via mass spec)
- Mixture separation by NMR (simultaneous separation and ID): chemical shift chromatography
37 Spectral Fitting (Principles)
Constrained Least Squares Fitting
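One way to read "constrained least squares fitting" is to model the observed spectrum as a non-negative combination of reference compound spectra and solve for the concentrations. The sketch below does this with projected gradient descent; the slides do not specify the constraint or solver actually used, and the five-point spectra are invented.

```python
def fit_concentrations(references, observed, iterations=5000):
    """Non-negative least squares: minimise ||sum_k c_k * ref_k - observed||^2 with c_k >= 0."""
    k, m = len(references), len(observed)
    # Gram matrix and right-hand side of the normal equations.
    gram = [
        [sum(references[a][i] * references[b][i] for i in range(m)) for b in range(k)]
        for a in range(k)
    ]
    rhs = [sum(references[a][i] * observed[i] for i in range(m)) for a in range(k)]
    # The trace bounds the largest eigenvalue, so this step size is safe.
    step = 1.0 / sum(gram[a][a] for a in range(k))
    coeffs = [0.0] * k
    for _ in range(iterations):
        grad = [
            sum(gram[a][b] * coeffs[b] for b in range(k)) - rhs[a] for a in range(k)
        ]
        # Gradient step, then projection back onto the non-negative region.
        coeffs = [max(0.0, c - step * g) for c, g in zip(coeffs, grad)]
    return coeffs


# Hypothetical reference spectra with slightly overlapping peaks.
ref_a = [0.0, 1.0, 0.2, 0.0, 0.5]
ref_b = [0.0, 0.2, 1.0, 0.5, 0.0]
mixture = [2.0 * a + 3.0 * b for a, b in zip(ref_a, ref_b)]
print([round(c, 3) for c in fit_concentrations([ref_a, ref_b], mixture)])
```

The non-negativity constraint matters because a concentration cannot be negative, even when noise or overlapping peaks would push an unconstrained fit below zero.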
38 NMR Analysis of Urine
Chenomx Inc. Eclipse 2.0
39 Current Compound List
- L-Isoleucine
- L-Lactic Acid
- L-Lysine
- L-Methionine
- L-phenylalanine
- L-Serine
- L-Threonine
- L-Valine
- Malonic Acid
- Methylamine
- Mono-methylmalonate
- N,N-dimethylglycine
- N-Butyric Acid
- Pimelic Acid
- Propionic Acid
- Pyruvic Acid
- Salicylic acid
- Sarcosine
- ()-(-)-Methylsuccinic Acid
- 2,5-Dihydroxyphenylacetic Acid
- 2-hydroxy-3-methylbutyric acid
- 2-Oxoglutaric acid
- 3-Hydroxy-3-methylglutaric acid
- 3-Indoxyl Sulfate
- 5-Hydroxyindole-3-acetic Acid
- Acetamide
- Acetic Acid
- Acetoacetic Acid
- Acetone
- Acetyl-L-carnitine
- Alpha-Glucose
- Alpha-ketoisocaproic acid
- Benzoic Acid
- Betaine
- Beta-Lactose
- Citric Acid
- Creatine
- DL-Carnitine
- DL-Citrulline
- DL-Malic Acid
- Ethanol
- Formic Acid
- Fumaric Acid
- Gamma-Amino-N-Butyric Acid
- Gamma-Hydroxybutyric Acid
- Gentisic Acid
- Glutaric acid
- Glycerol
- Glycine
- Glycolic Acid
- Hippuric acid
- Homovanillic acid
- Hypoxanthine
- Imidazole
- Inositol
- isovaleric acid
40 Metabolic Microarray
[Figure: patient-by-metabolite grid, each cell marked with one of the legend categories]
Metabolites: Acetic Acid, Betaine, Carnitine, Citric Acid, Creatinine, Dimethylglycine, Dimethylamine, Hippuric Acid, Lactic Acid, Succinic Acid, Trimethylamine, Trimethylamine-N-Oxide, Urea, Lactose, Suberic Acid, Sebacic Acid, Homovanillic Acid, Threonine, Alanine, Glycine, Glucose
Legend: Normal, Below Normal, Above Normal, Absent
Patients: Patient 1 to Patient 15
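The four legend categories suggest a simple rule that compares each measured concentration with a reference range. A minimal sketch follows; the reference ranges, units and patient values are placeholders, not data from the talk.

```python
def classify(concentration, low, high, detection_limit=0.0):
    """Map a measured concentration onto the four legend categories."""
    if concentration is None or concentration <= detection_limit:
        return "Absent"
    if concentration < low:
        return "Below Normal"
    if concentration > high:
        return "Above Normal"
    return "Normal"


def metabolic_row(measurements, reference_ranges):
    """Classify every metabolite measured for one patient."""
    return {
        name: classify(measurements.get(name), *reference_ranges[name])
        for name in reference_ranges
    }


# Placeholder reference ranges (arbitrary units) and one hypothetical patient.
reference_ranges = {"Citric Acid": (100, 500), "Creatinine": (200, 800), "Urea": (50, 300)}
patient = {"Citric Acid": 50, "Creatinine": 950, "Urea": 120}
print(metabolic_row(patient, reference_ranges))
```

Running this once per patient yields one row of the grid, which is what makes the display read like a microarray.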
41 The Human Metabolome Project
- 7.2 million Genome Canada project starting Sept. 1, 2004 (10 PIs in analytical and clinical chemistry and bioinformatics)
- Expect to ID and archive >1400 metabolites and metabolite ranges using NMR, MS, HPLC and informatics
- Establishment of the Human Metabolome Databank (HMD)
42 The HMD
- Web-accessible, freely available, continuously updated compilation of baseline metabolites in urine and plasma
- Similar content to DrugBank, including pathway prediction and metabolic modeling
- Compound ordering
43 Conclusions
- Bioinformatics is being used to integrate genomic, metabolomic and structural data to help ID drug targets or drug leads
- Bioinformatics combines genomic, metabolomic and structural data to help predict or characterize drug metabolism, xenobiotic toxicity and adverse drug reactions
44 Conclusions
- Unlike genomics/proteomics data, most drug, drug metabolism, ADR and ADME data is still in books or journals, not in electronic form
- This limits development of tools, databases and predictive software
- As more data is made electronic, look to increased use of simulation and modelling software to predict ADME, ADR and toxicology
45 The Future
- Greater integration
- More freeware and greater web-accessibility
- Greater use of text mining and machine learning methods
- Focus on predictions
[Figure labels: Metabolomics, Proteomics, Genomics, linked by Bioinformatics]
46 Acknowledgements
- Anchi Guo (PDF)
- Murtaza Hassanali (student)
- Nelson Young (RA/Programmer)
- Haiyan Zhang (Programmer/Analyst)
- Bahram Habibi-Nazhad (PDF)
- Jennifer Woolsey (student)
- Chenomx Inc. (Edmonton)
- Genome Canada, NSERC