1
Open chemical dictionaries and ontologies for
biosciences
Kirill Degtyarenko, EMBL-EBI
2
The team
  • Rafael Alcántara
  • Michael Ashburner
  • Volker Ast
  • Sergio Contrino
  • Michael Darsow
  • Paula de Matos
  • Marcus Ennis
  • Janna Hastings
  • Alan McNaught
  • Martin Zbinden

3
Thanks
  • EU funding
  • Tamara Kulikova photo

4
What is EBI ?
5
EMBL-EBI: The European Bioinformatics Institute
  • We develop and provide
  • EMBL Nucleotide Sequence Database
  • UniProt (Swiss-Prot/TrEMBL/PIR)
  • InterPro
  • Macromolecular Structure Database
  • ENSEMBL
  • ArrayExpress

6
EMBL-EBI: The European Bioinformatics Institute
  • Is also home to
  • Gene Ontology editorial office: http://www.geneontology.org/

7
What is an ontology?
8
What is an ontology?
9
Ontology definitions
  • Ontology: the theory or study of being as such;
    i.e., of the basic characteristics of all reality
    (Encyclopædia Britannica)
  • Ontology, or the science of something and of
    nothing, of being and not-being, of the thing and
    the mode of the thing, of substance and accident
    (Gottfried Wilhelm Leibniz)
  • Ontology: a formal definition of concepts
    (entities, relationships) of a given area of
    knowledge, described in a standardized form
    (Carugo & Pongor, 2002)
  • An ontology is a specification of a
    conceptualization (Tom Gruber)
  • More cracking definitions at
    http://www.formalontology.it/

10
Working definition
Ontology: an explicit specification of some
topic which includes a vocabulary of terms
(names) with defined logical relationships to
each other. (Jane Lomax, EBI)
11
NCBI Taxonomy
Eukaryota; Metazoa; Chordata; Craniata;
Vertebrata; Euteleostomi; Archosauria;
Aves; Neognathae;
Passeriformes; Hirundinidae;
Hirundo; Hirundo rustica
Phylum? Subphylum? Class? Order? Family? Genus? Species?
12
Enzyme Taxonomy
EC 2 Transferases
EC 2.8 Transferring sulfur-containing groups
EC 2.8.2 Sulfotransferases
EC 2.8.2.25 Flavonol 3-sulfotransferase
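The nesting of EC numbers can be made explicit with a few lines of Python. This is only a sketch: the function name `ec_lineage` and the `EC_NAMES` lookup are illustrative, and the names are the four shown on the slide.

```python
# Names copied from the slide; the dictionary itself is an illustrative stand-in
# for a real enzyme nomenclature source.
EC_NAMES = {
    "2": "Transferases",
    "2.8": "Transferring sulfur-containing groups",
    "2.8.2": "Sulfotransferases",
    "2.8.2.25": "Flavonol 3-sulfotransferase",
}


def ec_lineage(ec_number: str) -> list[str]:
    """Return the EC number and all of its ancestors, most general first."""
    fields = ec_number.split(".")
    # Each ancestor is obtained by keeping only the first i fields.
    return [".".join(fields[:i]) for i in range(1, len(fields) + 1)]


for code in ec_lineage("2.8.2.25"):
    print(f"EC {code}: {EC_NAMES[code]}")
```

Each class is the parent of the next one in the list, so the hierarchy is encoded in the identifier itself.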
13
OBO
Open Biomedical Ontologies is an umbrella web
address for well-structured controlled
vocabularies for shared use across different
biological and medical domains:
http://obo.sourceforge.net/
14
ChEBI: What is it?
Chemical Entities of Biological Interest: an
EBI database/dictionary of biochemical compounds
15
What are the biochemical compounds?
Can be defined as consisting of molecules not
directly encoded by the genome ... that are
either the products of nature or are synthetic
products used ... to intervene in the processes
of living organisms (Michael Ashburner)
16
Molecular entity
Any constitutionally or isotopically distinct
atom, molecule, ion, ion pair, radical, radical
ion, complex, conformer etc., identifiable as a
separately distinguishable entity (IUPAC Gold
Book)
17
In fact, ChEBI contains
  • Molecular entities
  • trans-vaccenic acid
  • Groups
  • trans-vaccenoyl group
  • Classes
  • fatty acids

18
Small molecules?
  • Yes, but big molecules as well!
  • alumina
  • amylose
  • metaborate
  • poly(vinyl alcohol)

19
1-D ChEBI
  • Numeric ID
  • Carefully checked terminology
  • Unambiguous ChEBI name
  • IUPAC names
  • Cross-references to free resources

20
Unambiguous ChEBI name
  • CHEBI:28918
  • L-adrenaline
  • not just adrenaline

21
IUPAC name
  • 4-[(1R)-1-hydroxy-2-(methylamino)ethyl]benzene-1,2-diol

22
The Unpronounceables
CHEBI:32902 gibberellin A4
IUPAC name: (1R,2R,5R,8R,9S,10R,11S,12S)-12-hydroxy-11-methyl-6-methylidene-16-oxo-15-oxapentacyclo[9.3.2.15,8.01,10.02,8]heptadecane-9-carboxylic acid
23
Need for 2-D
  • Better to see the face than to hear the name
    (Zen proverb)
  • Structures and identifiers based on structures
    offer new ways of crosslinking to other databases
  • Our users desperately want it!

24
Connection table
ChEBI
  9 10  0  0  0  0            999 V2000
   11.8219   -7.2713    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.8219   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6074   -7.0165    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -6.8574    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6039   -8.3505    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -8.5027    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   13.0886   -7.6818    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.3923   -7.2713    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   10.3888   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
  2  5  1  0  0  0  0
  2  6  1  0  0  0  0
  3  7  1  0  0  0  0
  4  8  2  0  0  0  0
  6  9  2  0  0  0  0
  5  7  2  0  0  0  0
  8  9  1  0  0  0  0
M  END
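To show what a connection table encodes, here is a minimal Python sketch that reads the table from this slide. It assumes one title line before the counts line and splits on whitespace; a real V2000 molfile has three header lines and fixed-width columns, so this is not a general parser. The function name `parse_ctab` is made up for the example.

```python
from collections import Counter

# The slide's connection table, one record per line (values copied from the slide).
CTAB = """\
ChEBI
9 10 0 0 0 0 999 V2000
11.8219 -7.2713 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
11.8219 -8.0922 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
12.6074 -7.0165 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
11.1072 -6.8574 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
12.6039 -8.3505 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
11.1072 -8.5027 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
13.0886 -7.6818 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
10.3923 -7.2713 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
10.3888 -8.0922 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 4 1 0 0 0 0
2 5 1 0 0 0 0
2 6 1 0 0 0 0
3 7 1 0 0 0 0
4 8 2 0 0 0 0
6 9 2 0 0 0 0
5 7 2 0 0 0 0
8 9 1 0 0 0 0
M END
"""


def parse_ctab(text):
    """Return (atoms, bonds) from a simplified, whitespace-separated table."""
    lines = text.strip().splitlines()
    # Assumption: line 0 is a title, line 1 is the counts line.
    n_atoms, n_bonds = (int(v) for v in lines[1].split()[:2])
    atoms = []
    for line in lines[2:2 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    bonds = []
    for line in lines[2 + n_atoms:2 + n_atoms + n_bonds]:
        first, second, order = (int(v) for v in line.split()[:3])
        bonds.append((first, second, order))  # atom indices are 1-based
    return atoms, bonds


atoms, bonds = parse_ctab(CTAB)
composition = Counter(symbol for symbol, *_ in atoms)
double_bonds = sum(1 for *_, order in bonds if order == 2)
print(dict(composition), double_bonds)
```

The table lists heavy atoms only; hydrogens are implicit, which is why the composition shows carbon and nitrogen but no hydrogen.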
25
2-D ChEBI
  • One or more 2-D (or 3-D) connection tables
  • One is default
  • Autogenerated images (PNG)
  • Default diagrams should be unambiguous

26
Art of chemical drawing
(R)-camphor
ambiguous
unambiguous
27
From 2-D back to 1-D
  • SMILES
  • InChI

28
SMILES (1)
  • Simplified Molecular Input Line Entry
    System
  • Developed by David Weininger in 1988
  • Extended by others (e.g. Daylight)
  • String of standard ASCII characters
  • A number of valid SMILES can be produced for the
    same molecule

29
SMILES (2)
  • N1C=NC2=C1C=NC=N2
  • c1ncc2ncnc2n1
  • C1N\CN/C\2N/CN\C1/2
  • c1ncnc2/N=C\Nc12
  • n1cc2c(nc1)ncn2
  • [H]c1nc([H])c2n([H])c([H])nc2n1

30
InChI (1)
  • IUPAC International Chemical Identifier or InChI
  • Open source
  • Developed by Stein, Heller, Tchekhovskoi and
    McNaught
  • Used by NIST, PubChem, CML and ChEBI

31
InChI (2)
InChI=1/C5H4N4/c1-4-5(8-2-6-1)9-3-7-4/h1-3H,(H,6,7,8,9)
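An InChI string is a sequence of layers separated by slashes. The following Python sketch splits the identifier from this slide into its layers, assuming it carries the usual `InChI=` prefix. The function name `split_inchi` and the `LAYER_NAMES` labels are illustrative, and only four layer prefixes are named; stereo, isotopic and fixed-hydrogen layers are not handled.

```python
# Purine identifier from the slide, written with the "InChI=" prefix.
INCHI = "InChI=1/C5H4N4/c1-4-5(8-2-6-1)9-3-7-4/h1-3H,(H,6,7,8,9)"

# Human-readable labels for a few layer prefixes (illustrative, not exhaustive).
LAYER_NAMES = {
    "c": "connections",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
}


def split_inchi(inchi: str) -> dict[str, str]:
    """Split an InChI string into version, formula and prefixed layers."""
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI" or not body:
        raise ValueError("expected a string starting with 'InChI='")
    version, formula, *rest = body.split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        # The first character of each remaining layer identifies its type.
        layers[LAYER_NAMES.get(layer[0], layer[0])] = layer[1:]
    return layers


layers = split_inchi(INCHI)
for name, value in layers.items():
    print(f"{name:12s} {value}")
```

The `(H,6,7,8,9)` part of the hydrogen layer marks a mobile hydrogen shared among atoms 6 to 9, which is how one identifier covers several tautomers.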
32
Limitations (1)
  • Stereochemistry other than sp3 tetrahedral and
    sp2 trigonal planar
  • Polymers
  • Conformers
  • Radicals/different spin state
  • Topological isomers
  • Mixtures
  • Markush structures

33
Limitations (2)
cisplatin
transplatin
InChI=1/2ClH.2H3N.Pt/h2*1H;2*1H3;/q;;;;+2/p-2
34
3-D ChEBI
cisplatin
35
ChEBI ontology
  • Molecular structure ontology
  • Subatomic particle ontology
  • Biological role ontology
  • Application ontology

36
L-adrenaline
  • Molecular structure ontology
  • catecholamines
  • Biological role ontology
  • hormone
  • Application ontology
  • antiglaucoma
  • bronchodilator
  • cardiostimulant

37
The family relations
L-cystein-S-yl
L-cysteine()
L-cysteine zwitterion
cysteine
D-cysteine
L-cysteino
L-cysteine
L-cysteinium
L-cysteinyl
L-cysteinate(1−)
L-cysteine residue
L-cysteinate(2−)
L-cysteinate residue
38
Relationships in ChEBI
39
Is A relationship
L-cysteine is a cysteine
40
Is Enantiomer Of
L-cysteine is enantiomer of D-cysteine
41
Is Tautomer Of
L-cysteine is tautomer of L-cysteine zwitterion
42
Is Conjugate Acid Of
L-cysteinium is conjugate acid of L-cysteine
L-cysteine is conjugate acid of L-cysteinate(1−)
L-cysteinate(1−) is conjugate acid of L-cysteinate(2−)
43
Is Conjugate Base Of
L-cysteinate(2−) is conjugate base of L-cysteinate(1−)
L-cysteinate(1−) is conjugate base of L-cysteine
L-cysteine is conjugate base of L-cysteinium
44
Acid/base relationships
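L-cysteinium ⇌ L-cysteine ⇌ L-cysteinate(1−) ⇌ L-cysteinate(2−)
(each species is the conjugate acid of the one to its right
and the conjugate base of the one to its left)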
45
Is Part Of
L-cysteinium is part of L-cysteine hydrochloride
46
Is Substituent Group From
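L-cysteinyl is substituent group from L-cysteine
L-cysteino is substituent group from L-cysteine
L-cysteine residue is substituent group from L-cysteine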
47
Has Parent Hydride
1,2,3-trichlorobenzene has parent hydride benzene
benzene is parent hydride of 1,2,3-trichlorobenzene
48
Has Functional Parent
S-(4-bromophenyl)-L-cysteine has functional parent L-cysteine
L-cysteine is functional parent of S-(4-bromophenyl)-L-cysteine
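The relationship types on the preceding slides can be pictured as subject-relation-object triples. The Python sketch below is one possible encoding, not ChEBI's actual data model: the relation spellings, the `INVERSE` and `SYMMETRIC` tables and the helper names are assumptions made for illustration, and the triples are the examples read off the slides.

```python
# Example statements taken from the slides (relation names spelled with underscores).
TRIPLES = [
    ("L-cysteine", "is_a", "cysteine"),
    ("L-cysteine", "is_enantiomer_of", "D-cysteine"),
    ("L-cysteine", "is_tautomer_of", "L-cysteine zwitterion"),
    ("L-cysteinium", "is_conjugate_acid_of", "L-cysteine"),
    ("L-cysteinium", "is_part_of", "L-cysteine hydrochloride"),
    ("L-cysteinyl", "is_substituent_group_from", "L-cysteine"),
    ("1,2,3-trichlorobenzene", "has_parent_hydride", "benzene"),
    ("S-(4-bromophenyl)-L-cysteine", "has_functional_parent", "L-cysteine"),
]

# Relations that come in named inverse pairs on the slides.
INVERSE = {
    "is_conjugate_acid_of": "is_conjugate_base_of",
    "has_parent_hydride": "is_parent_hydride_of",
    "has_functional_parent": "is_functional_parent_of",
}
# Relations that read the same in both directions.
SYMMETRIC = {"is_enantiomer_of", "is_tautomer_of"}


def expand(triples):
    """Add the inverse or mirrored statement implied by each triple."""
    result = set(triples)
    for subject, relation, obj in triples:
        if relation in INVERSE:
            result.add((obj, INVERSE[relation], subject))
        elif relation in SYMMETRIC:
            result.add((obj, relation, subject))
    return result


def related(graph, subject):
    """List (relation, object) pairs for one subject, sorted for stable output."""
    return sorted((r, o) for s, r, o in graph if s == subject)


graph = expand(TRIPLES)
for relation, obj in related(graph, "L-cysteine"):
    print(f"L-cysteine {relation} {obj}")
```

Storing only one direction and deriving the other keeps the curated data small while still letting a query start from either end of a relationship.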
49
The family relations
L-cysteine()
L-cysteinium
L-cystein-S-yl
cysteine
L-cysteine zwitterion
L-cysteine
D-cysteine
L-cysteino
L-cysteinyl
L-cysteinate(1−)
L-cysteine residue
L-cysteinate(2−)
L-cysteinate residue
50
Ontology of L-cysteine
51
Ontology of L-cysteine (1)
52
Ontology of L-cysteine (2)
53
Current status (25.04.07)
54
Users of ChEBI
  • ArrayExpress
  • BIND
  • BioModels
  • ChemIDplus
  • Human Metabolite Database
  • KEGG COMPOUND
  • Reactome
  • Industry (Chenomx, Lion, etc.)

55
http://www.ebi.ac.uk/come/
  • Italian word come (how)
  • English word come (not GO)
  • Classification Of Metalloproteins
  • COfactors and Metals
  • COMplex proteins, etc.
  • Co-Ordination of Metals in proteins
  • Contrino and me

56
COMe version 5.1
  • Controlled vocabulary
  • 1376 protein classes (PRX)
  • 524 bioinorganic motifs (BIM)
  • 179 small molecules (MOL)
  • organised as
  • XML version (master)
  • Oracle version

57
COMe top of hierarchy
Complex proteins belong to at least one of three
groups:
  • Metalloprotein
  • Organic prosthetic group protein
  • Modified amino acid protein

58
COMe entry PRX000552
59
Path to PRX000552
complex protein  PRX000001 includes
  metalloprotein  PRX000002 includes
    iron protein  PRX000004 includes
      iron-sulphur protein  PRX000007 includes
        Fe(34)S4 protein  PRX000054 includes
          Fe4S4Cys4 protein  PRX000088 includes
            Fe4S4/DMSO reductase-like  PRX000546 includes
              formate dehydrogenase catalytic subunit  PRX000557 includes
                molybdenum formate dehydrogenase catalytic subunit  PRX000733 includes
                  formate dehydrogenase N, catalytic subunit  PRX000552 includes
Instance: formate dehydrogenase, nitrate-inducible, major subunit,
Escherichia coli, UniProt:P24183
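A path like the one on this slide can be recovered by walking child-to-parent links up to the root. The Python sketch below uses the identifiers from the slide and assumes, for simplicity, that every class has a single parent; the dictionary `PARENT` and the function `path_to` are hypothetical names, not part of COMe.

```python
# child -> parent, read off the "includes" chain on the slide
PARENT = {
    "PRX000552": "PRX000733",
    "PRX000733": "PRX000557",
    "PRX000557": "PRX000546",
    "PRX000546": "PRX000088",
    "PRX000088": "PRX000054",
    "PRX000054": "PRX000007",
    "PRX000007": "PRX000004",
    "PRX000004": "PRX000002",
    "PRX000002": "PRX000001",
}


def path_to(entry: str) -> list[str]:
    """Return the chain of classes from the root down to `entry`."""
    path = [entry]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    path.reverse()  # the walk went upwards, so flip it to start at the root
    return path


for depth, identifier in enumerate(path_to("PRX000552")):
    print("  " * depth + identifier)
```

With more than one parent per class the same walk would branch, and the result would be a set of paths rather than a single list.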
60
Molecule (MOL)
  • Controlled vocabulary of small molecular
    entities bound to complex proteins
  • Cross-references to (bio)chemical resources:
    chemPDB, NIST Chemistry WebBook, LIGAND, RESID
  • In future: ChEBI

61
COMe entry MOL000015
62
Bioinorganic motif (BIM)
  • A common structural feature of a class of
    functionally related, but not necessarily
    homologous, proteins, that includes the metal
    atom(s) and first coordination shell ligands
  • Degtyarenko (2000) Bioinformatics 16, 851–864

63
Example BIM000027
T-4
Fe(SG.Cys)4
64
Example BIM000056
(Fe2S2)(ND.His)2(SG.Cys)2
65
Example BIM000061
Fe4(µ3-S)4(OD.Asp)(SG.Cys)3
66
Fe4S4–sirohaem centre (1)
MOL000131 Fe4S4
67
Fe4S4–sirohaem centre (2)
BIM000008 Fe4(µ3-S)4(SG.Cys)4
68
Fe4S4–sirohaem centre (3)
BIM000026 (Fe4S4)(SG.Cys)3Fe(por)µ-(SG.Cys)

69
Relationships in COMe
  • IsA: inherits all attributes
  • PRX to PRX
  • cytochrome c IsA cytochrome
  • Is_Part_Of: no inheritance
  • BIM to BIM; MOL to MOL; MOL to BIM; BIM to
    PRX
  • Fe(por)(NE.His)2 Is_Part_Of cytochrome b5
  • Is_Bound_To: no inheritance
  • MOL to PRX
  • haem b Is_Bound_To cytochrome c
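The difference between an inheriting and a non-inheriting relationship can be sketched in Python. The two relationship examples come from the slide; the attribute names and values in `OWN_ATTRIBUTES` are invented purely for illustration and are not COMe data.

```python
# Relationships from the slide.
IS_A = {"cytochrome c": "cytochrome"}                 # child -> parent
IS_PART_OF = {"Fe(por)(NE.His)2": "cytochrome b5"}    # part -> whole

# Hypothetical attributes, made up for this example only.
OWN_ATTRIBUTES = {
    "cytochrome": {"cofactor": "haem"},
    "cytochrome c": {"haem attachment": "covalent"},
    "cytochrome b5": {"cofactor": "haem b"},
}


def attributes(term: str) -> dict[str, str]:
    """Collect a term's own attributes plus those inherited through IsA."""
    chain = []
    current = term
    while current is not None:
        chain.append(current)
        current = IS_A.get(current)  # Is_Part_Of links are deliberately ignored
    merged = {}
    for ancestor in reversed(chain):  # most general first, so children override
        merged.update(OWN_ATTRIBUTES.get(ancestor, {}))
    return merged


print(attributes("cytochrome c"))       # own attributes plus those of cytochrome
print(attributes("Fe(por)(NE.His)2"))   # nothing flows down from cytochrome b5
```

Only the IsA chain is followed, so a part never picks up the attributes of the whole it belongs to.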

70
Paths to PRX000552
71
Physico-chemical ontology ???
  • Physico-chemical property
  • Physico-chemical method
  • Available at OBO web site
    (http://obo.sourceforge.net/)

72
Molecular entity has
  • Mass (molecular weight)
  • Size
  • Shape
  • Charge
  • Structure
  • One can derive many properties from known
    complete structure
  • Spectra
  • ?

73
Relationships in FIX
  • IsA
  • Raman spectroscopy IsA vibrational spectroscopy
  • Is_Part_Of
  • phasing method Is_Part_Of crystallography
  • Can_Be_Determined_By
  • molecular structure Can_Be_Determined_By
    crystallography

74
Molecular Property vs Method
Properties: Heat capacity; Mass; Net charge; Shape; Size; Structure;
Geometry; Connectivity; Topography
Methods: Calorimetry; Centrifugation; Crystallography; Electrophoresis;
Isotope method; Mass spectrometry; Microscopy; Spectroscopy
75
A snapshot of FIX (1)
76
A snapshot of FIX (2)
77
A snapshot of FIX (3)
78
Physico-chemical process (REX)
  • IUPAC definitions (if available)
  • Macroscopic and microscopic processes
  • Available at OBO web site

79
Biochemical reactions (1)
  • Enzymatic reactions
  • Non-enzymatic reactions

80
Biochemical reactions (2)
  • Catalytic (type: catalyst)
  • Enzymatic: protein
  • Abzymatic: antibody
  • Deoxyribozymatic: DNA
  • Ribozymatic: RNA
  • Heterogeneous: surface (e.g. metal)
  • Homogeneous: solute (e.g. metal)
  • Non-catalytic
  • Photoinduced
  • Spontaneous

81
Biochemical reactions (3)
  • Biotransformation
  • A + B → C + D (A, B, C, D: small molecules)
  • Binding
  • A + M → AM (M: macromolecule)
  • Molecular transport
  • A(compartment X) → A(compartment Y)
  • Electron and exciton transfer reactions
  • Conformation change (e.g. folding)

82
Relationships in REX
  • IsA
  • redox reaction IsA chemical reaction
  • Is_Part_Of
  • photoexcitation Is_Part_Of photoabsorption
  • Is_Reverse_Of
  • associative desorption Is_Reverse_Of
    dissociative adsorption
  • Not DAG!
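One plausible reading of "Not DAG!" is that a symmetric relationship such as Is_Reverse_Of puts a two-node cycle into the graph. The Python sketch below checks that reading with a depth-first search. The edge lists use the examples from the slide; treating Is_Reverse_Of as a pair of opposite directed edges is an assumption, and `has_cycle` is an illustrative helper.

```python
from collections import defaultdict


def has_cycle(edges) -> bool:
    """Return True if the directed graph given as (source, target) pairs has a cycle."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)

    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = defaultdict(int)  # every node starts as UNSEEN

    def visit(node) -> bool:
        state[node] = ACTIVE
        for successor in graph.get(node, []):
            if state[successor] == ACTIVE:
                return True  # reached a node that is still on the current path
            if state[successor] == UNSEEN and visit(successor):
                return True
        state[node] = DONE
        return False

    return any(state[node] == UNSEEN and visit(node) for node in list(graph))


# IsA and Is_Part_Of examples from the slide: strictly hierarchical.
hierarchy = [
    ("redox reaction", "chemical reaction"),
    ("photoexcitation", "photoabsorption"),
]
# Is_Reverse_Of modelled as two opposite edges (assumption).
reverse_pair = [
    ("associative desorption", "dissociative adsorption"),
    ("dissociative adsorption", "associative desorption"),
]

print(has_cycle(hierarchy))                  # hierarchy alone
print(has_cycle(hierarchy + reverse_pair))   # with the symmetric relationship
```

Tools that assume a directed acyclic graph, for example to compute ancestors, would need to skip or special-case such symmetric relationships.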

83
A snapshot of REX (1)
84
A snapshot of REX (2)
85
A snapshot of REX (3)
86
Users
  • ChEBI, FIX, REX
  • Oscar3 (University of Cambridge)
  • ProjectProspect (Royal Society of Chemistry)
  • Gene Ontology
  • ChEBI REX
  • Coming soon: IntEnz
  • COMe
  • InterPro

87
Summary
  • Ontologies provide controlled vocabulary
    organised as a directed graph
  • ChEBI: standard terminology and structure of
    (bio)chemical compounds
  • COMe: ontology for bioinorganic proteins
  • FIX: controlled vocabulary for physico-chemical
    properties and methods
  • REX: controlled vocabulary for physico-chemical
    processes

88
Links to remember
http://www.ebi.ac.uk/chebi/
http://www.ebi.ac.uk/come/
http://www.ebi.ac.uk/kirill/FIX/
http://www.ebi.ac.uk/kirill/REX/
89
Grazie / Thank you
90
Future plans
  • Diversity of biochemical reactions and
    mechanistic aspects of enzymatic catalysis
  • Development of database for quantitative
    properties of functional centres in
    metalloproteins and other complex proteins
  • Further development of ontology of complex
    proteins based on the concept of bioinorganic
    motif

91
Diversity of biochemical reactions
  • Unambiguous chemical representation of reactions
  • Further development of REX
  • Collaboration with IntEnz, MACiE

92
Quantitative properties of functional centres
  • From qualitative to quantitative annotation
  • Proteins utilise and modulate properties of
    non-peptide groups
  • Redox potentials
  • Absorption maxima (better, spectra)
  • Dissociation constants
  • Collaboration with experimentalists (here?)

93
Metalloproteins
  • Further development of COMe
  • Annotation and prediction of metal-binding sites
  • Selectivity and specificity of metalloprotein
    cofactors
  • Collaborations (University of Edinburgh, Academia
    Sinica)

94
Metal-binding sites in proteins
  • Position-specific annotation of experimentally
    determined and inferrable metal-binding sites in
    the UniProt and closely connected resources (e.g.
    InterPro)
  • Automation of major annotation steps to ensure
    the sustainability of this feature with minimum
    maintenance/curation in the future
  • Development of prediction methodology for protein
    metal binding sites from multiple sequence
    alignments

95
Selectivity and specificity of metalloprotein
cofactors
  • Two separate problems: binding selectivity and
    catalytic efficiency
  • The database of experimentally defined
    qualitative and quantitative data for
    protein-metal interactions (coordination
    geometry, binding constants, redox potentials)
  • Density functional theory (DFT)
  • Continuum dielectric methods (CDM)

96
Computational metallomics
  • Metallomics: comprehensive analysis of the
    entirety of metal and metalloid species within a
    cell or tissue type
  • A branch of metabolomics
  • Enzymatic reactions, transport phenomena, metal
    targeting (metallochaperones) and non-catalytic
    reactions
  • The database of metal metabolism and transport
    pathways