Title: Using CDK applications in the BioMeta database
1 Using CDK applications in the BioMeta database
CDK Workshop 2007, Jan 29 - Feb 2, 2007, Cologne
2 Concept
M.A. Ott
Creating the right chemistry in Pathways
- Enzymes don't do new chemistry: they do old chemistry faster
- Enzymes don't perform magic: they perform known (organic) chemistry
3 Contents
- Introduction
- BioMeta database
- Molecular structure validation
- Substructure searching
- Conclusion
4 Ongoing progress in omics
- Genomics: going strong
- Transcriptomics: good progress
- Proteomics: some progress
- Metabolomics: less progress
- Systems biology: not in ten years
5 Metabolite and pathway databases
- Kyoto Encyclopedia of Genes and Genomes (http://www.genome.ad.jp/kegg/)
- Chemical Entities of Biological Interest (http://www.ebi.ac.uk/chebi/)
- BRENDA: enzyme database (http://www.brenda.uni-koeln.de/)
- ExPASy: biochemical pathway (http://www.expasy.org/tools/pathways/)
- EMP: enzymology and metabolism data (http://www.empproject.com/)
- MetaCyc: metabolic pathways (http://metacyc.org/)
- EcoCyc: E. coli genome and metabolism (http://ecocyc.org/)
- PathDB: biological pathways and networks (http://www.ncgr.org/pathdb/)
- B-NET: Biochemical Network (http://medicago.vbi.vt.edu/B-Net/)
- BioPath: C@ROL database (http://www.mol-net.de/biopath/)
- ERGO: comprehensive genome analysis (http://www.ergo-light.com/ERGO/)
- Reactome: biological processes (http://www.reactome.org/)
- IUBMB: enzymes (http://www.chem.qmul.ac.uk/iubmb/enzyme/)
6 Metabolism and friends
- Metabolism: the reactions in organisms
- Anabolism: synthesis of larger biomolecules from smaller ones; usually requires energy input
- Catabolism: breakdown of larger molecules into smaller ones; usually releases energy
7 Enzymes
- Virtually all metabolic reactions require enzymatic catalysis
- Rate enhancement factor can be up to 10^15
- Enzymes are highly reaction-specific, catalyzing only one conversion
- Enzymes are substrate-specific, limiting their action to only one compound or to related compounds
8 The Metabolic Network
9 Chemistry in Bioinformatics
- Molecular structures are illustrations rather than data in their own right
- However, many research areas require structures
  - Drug design, ligand docking
  - Metabolic network reconstruction
  - Systems biology
- A need for accurate molecular (2D!) structure data
10 Some metabolites
- Housekeeping metabolites: Water, NADH, ATP, Phosphate, ...
- End products of pathways: Vitamin B12, Urea, Penicillin, Ecdysone, ...
- Intermediates in pathways: Glucose-6-P, Lanosterol, Precorrins, Shikimate, ...
11 Housekeeping metabolites: ATP / ADP
[Figure: ATP and ADP drawn as full structures and in shorthand]
12 Housekeeping metabolites: NADH / NAD
[Figure: NADH and NAD drawn as full structures and in shorthand]
13 Housekeeping metabolites and chemical distance
What is the chemical distance between B and D?
14 Important housekeeping metabolites
Metabolite                  Role
--------------------------  ---------------------------
Water
Ammonia
Phosphate
ATP, ADP                    Phosphorylation
NADH, NAD                   Reduction/oxidation
NADPH, NADP                 Reduction/oxidation
FAD, FADH2                  Reduction/oxidation
FMN, FMNH2                  Reduction/oxidation
SAM, SAH                    Methyl group transfer
Coenzyme A                  Acyl group transfer
Glutamate/2-Oxoglutarate    Amino/ketone group exchange
15 The BioMeta Database
16 The BioMeta Database
17 BioMeta database structure
18 BioMeta database structure
Reaction 1: Mol_1 + Mol_2 <-> Mol_3 + Mol_4
Reaction 2: Mol_2 + Mol_3 -> Mol_5 + Mol_6
Reaction 3: 2 Mol_6 -> Mol_7
[Figure: three linked tables labelled Molecules, Reactions, and R-M Links]
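The Molecules / Reactions / R-M Links layout can be sketched with an in-memory SQLite database. This is only an illustration: the table names, the column names, and the 'r' code for reversible reactions are assumptions and do not reflect the actual BioMeta schema. The 'ir', 's' and 'p' codes are borrowed from the sample query output on the next slide.

```python
import sqlite3

def build_example_db():
    """Build a toy molecules/reactions/links database (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE molecules (mol_nr INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE reactions (rxn_nr INTEGER PRIMARY KEY, rev TEXT);
        -- One row per (reaction, molecule) pair: the R-M link.
        CREATE TABLE rm_links (
            rxn_nr INTEGER REFERENCES reactions(rxn_nr),
            mol_nr INTEGER REFERENCES molecules(mol_nr),
            role   TEXT,     -- 's' = substrate, 'p' = product
            stoich INTEGER
        );
    """)
    con.executemany("INSERT INTO molecules VALUES (?, ?)",
                    [(i, f"Mol_{i}") for i in range(1, 8)])
    # 'r' = reversible (assumed code), 'ir' = irreversible
    con.executemany("INSERT INTO reactions VALUES (?, ?)",
                    [(1, "r"), (2, "ir"), (3, "ir")])
    links = [
        # Reaction 1: Mol_1 + Mol_2 <-> Mol_3 + Mol_4
        (1, 1, "s", 1), (1, 2, "s", 1), (1, 3, "p", 1), (1, 4, "p", 1),
        # Reaction 2: Mol_2 + Mol_3 -> Mol_5 + Mol_6
        (2, 2, "s", 1), (2, 3, "s", 1), (2, 5, "p", 1), (2, 6, "p", 1),
        # Reaction 3: 2 Mol_6 -> Mol_7
        (3, 6, "s", 2), (3, 7, "p", 1),
    ]
    con.executemany("INSERT INTO rm_links VALUES (?, ?, ?, ?)", links)
    return con

def reactions_of(con, mol_name):
    """Return the reaction numbers in which a molecule takes part."""
    rows = con.execute(
        """SELECT DISTINCT l.rxn_nr FROM rm_links l
           JOIN molecules m ON m.mol_nr = l.mol_nr
           WHERE m.name = ? ORDER BY l.rxn_nr""", (mol_name,))
    return [r[0] for r in rows]

con = build_example_db()
print(reactions_of(con, "Mol_3"))  # reactions that involve Mol_3
```

Keeping stoichiometry and role on the link row lets one molecule appear in many reactions without duplicating the molecule record.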
19 Sample database query
ec_nr    enzyme_name      rxn_id    rev  role  stoich  name
-------  ---------------  --------  ---  ----  ------  -----------
1.2.3.4  oxalate oxidase  MR000247  ir   s     1       Oxalic acid
,,       ,,               ,,        ,,   s     1       O2
,,       ,,               ,,        ,,   p     1       H2O2
,,       ,,               ,,        ,,   p     2       CO2
20 Accuracy required!
- Without accurate chemical structures and balanced reaction descriptions, a biochemical pathways database is unsuitable for chemical computation
- Reaction imbalances cause incorrect fluxes
- Duplicate molecules cause metabolic networks to be incomplete
- Docking applications require correct stereochemistry
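A reaction imbalance of the kind mentioned above can be detected by comparing element counts on both sides. The following is a minimal sketch, assuming molecular formulas are already available as plain strings without brackets or charges; it uses the oxalate oxidase reaction from the sample query, with formulas typed in by hand. Charge balance is not checked here, and the function names are illustrative only.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C2H2O4' (no brackets or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts

def side_total(side):
    """Sum atom counts over one reaction side given as [(stoich, formula), ...]."""
    total = Counter()
    for stoich, formula in side:
        for symbol, n in parse_formula(formula).items():
            total[symbol] += stoich * n
    return total

def imbalance(substrates, products):
    """Return {element: products - substrates} for every unbalanced element."""
    left, right = side_total(substrates), side_total(products)
    return {el: right[el] - left[el]
            for el in sorted(set(left) | set(right))
            if right[el] != left[el]}

# Oxalic acid + O2 -> H2O2 + 2 CO2
substrates = [(1, "C2H2O4"), (1, "O2")]
products = [(1, "H2O2"), (2, "CO2")]
print(imbalance(substrates, products))   # empty dict when balanced

# Same reaction with the CO2 stoichiometry entered as 1 instead of 2
broken_products = [(1, "H2O2"), (1, "CO2")]
print(imbalance(substrates, broken_products))
```

An empty result means every element balances; a non-empty result names the elements that need attention and by how many atoms.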
21 Molecular structure validation
- Valence checking
- Rings and aromaticity detection
- Calculation of molecular formula, weight, and exact mass
- Stereochemistry detection
- Canonicalization
- Calculation of canonical string identifiers
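As an illustration of the weight and exact-mass step, the sketch below derives both values from a molecular formula string. It is not how CDK computes them: the small mass tables (rounded standard atomic weights and monoisotopic masses for six elements) and the function names are assumptions made for this example, and formulas with brackets or charges are not handled.

```python
import re

# Rounded standard atomic weights (u) for a few common elements.
AVERAGE = {"H": 1.008, "C": 12.011, "N": 14.007,
           "O": 15.999, "P": 30.974, "S": 32.06}
# Rounded masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"H": 1.007825, "C": 12.0, "N": 14.003074,
                "O": 15.994915, "P": 30.973762, "S": 31.972071}

def atom_counts(formula):
    """Count atoms in a simple formula such as 'C2H2O4'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def mass(formula, table):
    """Sum the per-element masses from the given table over all atoms."""
    return sum(table[el] * n for el, n in atom_counts(formula).items())

oxalic_acid = "C2H2O4"
molecular_weight = mass(oxalic_acid, AVERAGE)
exact_mass = mass(oxalic_acid, MONOISOTOPIC)
print(f"weight {molecular_weight:.3f}  exact mass {exact_mass:.5f}")
```

The molecular weight uses isotope-averaged atomic weights, while the exact mass uses only the most abundant isotope of each element, which is why the two values differ slightly.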
22 Molecular structure identification
- Mesomerism (resonance forms)
- Tautomerism (variable positions of hydrogens)
- Other chemical flexibility
- Protonation state
23 Molecular structures and reactions
[Figure: example reactions annotated "Carbon imbalance", "Direction", "Phosphorus imbalance", and "No stereochemistry" (twice)]
- Reactions: balancing atoms and charges
             adding direction/reversibility
- Molecules: adding stereochemical configurations
             canonicalizing tautomeric form
24 And now what's really going on
25 Validation results (structures)
26 Validation results (reaction balance)
27 Some unresolved issues
28 Substructure search
- Two CDK applications are used
  - FingerPrinter: to assist in pre-screening
  - SDFSubstructureFinder: actual substructure search
29 Substructure search: prescreen
- Query substructure: MolFile and SMILES
- Calculation of molecular formula and of rings
- FingerPrinter yields a 1024-bit fingerprint
- Values are used in the SQL query:
    SELECT molecules.* FROM molecules, mol_elem me_0, mol_elem me_1
    WHERE str_type = 1
    AND ring_count > 1
    AND molecules.mol_nr = me_0.mol_nr AND me_0.atomic_nr = 6 AND me_0.coeff > 4
    AND molecules.mol_nr = me_1.mol_nr AND me_1.atomic_nr = 16 AND me_1.coeff > 0
    AND fingerprint & '0000110100...011101000' = '0000110100...011101000'
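The prescreen combines three cheap necessary conditions: ring count, per-element atom counts (atomic numbers 6 and 16 are carbon and sulfur), and a fingerprint test in which every bit set in the query fingerprint must also be set in the candidate. A minimal sketch of that logic follows. The 8-bit fingerprints, the candidate records, and the "at least as many" comparisons are made-up assumptions for illustration; the exact thresholds used by BioMeta are not reproduced here.

```python
def passes_prescreen(candidate, query):
    """Cheap necessary conditions for 'query is a substructure of candidate'."""
    # Ring count: the candidate needs at least as many rings as the query.
    if candidate["ring_count"] < query["ring_count"]:
        return False
    # Element counts: at least as many atoms of each element as in the query.
    for element, needed in query["elements"].items():
        if candidate["elements"].get(element, 0) < needed:
            return False
    # Fingerprint: every bit set in the query must also be set in the candidate.
    return (candidate["fingerprint"] & query["fingerprint"]) == query["fingerprint"]

# Hypothetical query: one ring, four carbons, one sulfur, toy 8-bit fingerprint.
query = {"ring_count": 1, "elements": {"C": 4, "S": 1}, "fingerprint": 0b00101100}

# Hypothetical candidates (names and values are invented for the example).
candidates = [
    {"name": "A", "ring_count": 2,
     "elements": {"C": 10, "N": 2, "O": 3, "S": 1}, "fingerprint": 0b10111101},
    {"name": "B", "ring_count": 1,
     "elements": {"C": 6, "O": 6}, "fingerprint": 0b00101110},   # no sulfur
    {"name": "C", "ring_count": 1,
     "elements": {"C": 5, "S": 1}, "fingerprint": 0b00100100},   # missing a bit
]

survivors = [c["name"] for c in candidates if passes_prescreen(c, query)]
print(survivors)
```

A prescreen like this can only rule candidates out; the survivors still have to go through a real substructure match.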
30 Substructure search: final match
- Database size: 13,242 entries
- Prescreening yields a list of entries (422)
- An SDFile is written from their molfiles
- SDFSubstructureFinder is run using the query SMILES string (C1CCSC1)
- Final result: 14 hits
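The final match has to confirm that the query atoms and bonds really occur in each prescreened candidate. The sketch below shows the idea with a naive backtracking search over hand-written molecular graphs. It is not the algorithm used by SDFSubstructureFinder or CDK; it ignores hydrogens, bond orders, and aromaticity, and the example molecules are chosen only for illustration.

```python
def has_substructure(mol_atoms, mol_bonds, q_atoms, q_bonds):
    """Backtracking search for a mapping of query atoms onto molecule atoms."""
    mol_adj = {i: set() for i in range(len(mol_atoms))}
    for a, b in mol_bonds:
        mol_adj[a].add(b)
        mol_adj[b].add(a)
    q_adj = {i: set() for i in range(len(q_atoms))}
    for a, b in q_bonds:
        q_adj[a].add(b)
        q_adj[b].add(a)

    mapping = {}  # query atom index -> molecule atom index

    def extend(q):
        if q == len(q_atoms):
            return True  # every query atom has been placed
        for m in range(len(mol_atoms)):
            if m in mapping.values() or mol_atoms[m] != q_atoms[q]:
                continue
            # Every already-mapped query neighbour must be bonded to m.
            if all(mapping[n] in mol_adj[m] for n in q_adj[q] if n in mapping):
                mapping[q] = m
                if extend(q + 1):
                    return True
                del mapping[q]  # undo and try the next candidate atom
        return False

    return extend(0)

# Query C1CCSC1 (five-membered ring with one sulfur), heavy atoms only.
q_atoms = ["C", "C", "C", "S", "C"]
q_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

# Same ring carrying a carboxyl group: O-C(-O)-C1CCCS1 as a plain graph.
acid_atoms = ["O", "C", "O", "C", "C", "C", "C", "S"]
acid_bonds = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 3)]

# Six-membered ring C1CCSCC1: enough C and S atoms, but no five-membered ring.
six_atoms = ["C", "C", "C", "S", "C", "C"]
six_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(has_substructure(acid_atoms, acid_bonds, q_atoms, q_bonds))
print(has_substructure(six_atoms, six_bonds, q_atoms, q_bonds))
```

The six-membered ring shows why the final step is needed: it has enough carbon and sulfur atoms to pass an element-count filter but does not contain the query ring.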
32 Preliminary evaluation (I)
FingerPrinter can be surprisingly slow!
Timeouts are encountered for these structures
33 Preliminary evaluation (II)
Presently, SDFSubstructureFinder does not treat aromatic bonds
34 BioMeta database
- Based on freely available data (KEGG Ligand database)
- Augmentation, completion and correction of small-molecule information
  - Correcting wrong structures
  - Adding missing stereochemistry to structures
  - Balancing atoms and charges in reactions
  - Standardizing structure and reaction representation
35 BioMeta: things done/being done
- Correcting compound structures and reactions
- Generating 3D models (using Corina)
- Judging directionality/reversibility of reactions
- Distinguishing cofactors/housekeeping/common metabolites from real metabolites
- Increasing functionality of WWW interface
http://biometa.cmbi.ru.nl/
36 BioMeta: future directions
- Tautomer handling (using InChI strings)
- Identifying and merging duplicate entries
- Atom-to-atom correspondence (reaction mapping)
- Putative/hypothetical reactions for predicting metabolism
37 Acknowledgements
- Felix van Diggelen
- Gert Vriend
- CMBI