Title: GO in the coming months
1 Ontologies for Informatics Infrastructure for Systems Biology
Oxford, October 19, 2004
2 To provide structured controlled vocabularies for the representation of biological knowledge in biological databases.
3 Manifesto of Liberation Bioinformatics
- Be open source
- Use open standards
- Make data and code available without constraint
- Involve your community
4 Gene Ontology - 1998
- FlyBase (Drosophila): Cambridge, EBI, Harvard, Berkeley, Bloomington
- SGD (Saccharomyces): Stanford
- MGI (Mus): Jackson Labs, Bar Harbor
5 Gene Ontology - 2004
- Fruit fly - FlyBase
- Budding yeast - Saccharomyces Genome Database (SGD)
- Mouse - Mouse Genome Database (MGD, GXD)
- Rat - Rat Genome Database (RGD)
- Weed - The Arabidopsis Information Resource (TAIR)
- Worm - WormBase
- Dictyostelium discoideum - dictyBase
- InterPro/UniProt at EBI - InterPro
- Fission yeast - Pombase
- Human - UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
- Parasites - Plasmodium, Trypanosoma, Leishmania - GeneDB (Sanger)
- Microbes - Vibrio, Shewanella, B. anthracis - TIGR
- Grasses - rice, maize - Gramene database
- Zebrafish - ZFIN
- Coming: Xenopus, Chlamydomonas, Tetrahymena, Gallus, and more
6 GO: Three (Orthogonal) Ontologies
- Biological Process
  - Goal or objective within a cell, tissue, ...
- Molecular Function
  - Elemental activity or task
- Cellular Component
  - Location or complex
7 Content of GO
- molecular function: 7,422 terms
- biological process: 8,972 terms
- cellular component: 1,472 terms
- all: 17,866 terms
- definitions: 16,600 (93%)
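The totals on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the counts shown above; the variable names are illustrative.

```python
# Term counts per ontology, as reported on the slide (2004).
term_counts = {
    "molecular function": 7422,
    "biological process": 8972,
    "cellular component": 1472,
}
defined_terms = 16600  # terms that carry a text definition

total_terms = sum(term_counts.values())
coverage_pct = round(100 * defined_terms / total_terms)

print(f"total terms: {total_terms}")             # total terms: 17866
print(f"definition coverage: {coverage_pct}%")   # definition coverage: 93%
```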
8 What is the least complex data structure that is sufficient?
What data structure should we use?
- Keyword list?
- Hierarchical tree?
- Directed acyclic graph?
- Other?
9 Directed Acyclic Graph
[Figure: a tree compared with a directed acyclic graph]
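The difference between the two structures can be sketched in code: in a tree every node has at most one parent, while a directed acyclic graph lets a node have several parents as long as no path loops back on itself. This is an illustrative sketch; the child-to-parent edges below are assumed examples, not taken from a GO release.

```python
from collections import defaultdict

# Each edge points from a child term to one of its parents (assumed examples).
edges = [
    ("vacuolar membrane", "membrane"),
    ("vacuolar membrane", "vacuole"),
    ("nuclear membrane", "membrane"),
    ("nuclear membrane", "nucleus"),
]

parents = defaultdict(list)
for child, parent in edges:
    parents[child].append(parent)

def is_tree(parents):
    """True if no node has more than one parent."""
    return all(len(ps) <= 1 for ps in parents.values())

def has_cycle(parents):
    """Depth-first search for a back edge in the child->parent graph."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for p in parents.get(node, []):
            if color[p] == GREY:
                return True  # reached a node already on the current path
            if color[p] == WHITE and visit(p):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(parents))

print(is_tree(parents))    # False: two terms have two parents each
print(has_cycle(parents))  # False: so the structure is a DAG, not a tree
```

The cycle check is what keeps a multi-parent graph "acyclic"; an editor for such graphs would need to run something like it before accepting a new link.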
10 Classes of parent-child relationship
- ISA (hypernymy/hyponymy)
  - as in an elephant is a mammal
- PARTOF (meronymy/holonymy)
  - as in a trunk is part of an elephant
- REGULATES
  - as in regulation of carbohydrate metabolism regulates carbohydrate metabolism
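Because a trunk is part of an elephant but is not a kind of elephant, the relationship type has to be stored on each edge rather than implied by the hierarchy. A minimal sketch using (subject, relationship, object) triples built from the slide's examples; the `Rel` enum and `objects` helper are hypothetical names.

```python
from enum import Enum

class Rel(Enum):
    """Relationship types listed on the slide."""
    IS_A = "is_a"            # hypernymy/hyponymy
    PART_OF = "part_of"      # meronymy/holonymy
    REGULATES = "regulates"

# (subject, relationship, object) triples from the slide's examples
triples = [
    ("elephant", Rel.IS_A, "mammal"),
    ("trunk", Rel.PART_OF, "elephant"),
    ("regulation of carbohydrate metabolism", Rel.REGULATES, "carbohydrate metabolism"),
]

def objects(subject, rel):
    """Return every object linked to `subject` by relationship type `rel`."""
    return [o for s, r, o in triples if s == subject and r is rel]

print(objects("trunk", Rel.PART_OF))  # ['elephant']
print(objects("trunk", Rel.IS_A))     # []
```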
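11 Structure of the Ontologies
[Figure: a fragment of the Cellular component ontology with the terms membrane, vacuolar membrane, nuclear membrane, intracellular, cell, cytoplasm, vacuole, vacuolar lumen, nucleus and nuclear membrane; cytoplasm, vacuole, vacuolar membrane, vacuolar lumen, nucleus and nuclear membrane are marked <. Legend: ISA, PARTOF (<)]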
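A term such as vacuolar membrane can sit under both membrane (ISA) and vacuole (PARTOF), so a query for either parent should find it. The exact parent links on this slide are not fully recoverable, so the graph below is an assumed reading of it; the sketch collects every ancestor reachable through ISA and PARTOF edges with a breadth-first walk.

```python
from collections import deque

# child -> list of (relationship, parent); links are assumed for illustration
graph = {
    "intracellular": [("part_of", "cell")],
    "cytoplasm": [("part_of", "intracellular")],
    "vacuole": [("part_of", "cytoplasm")],
    "vacuolar membrane": [("is_a", "membrane"), ("part_of", "vacuole")],
    "vacuolar lumen": [("part_of", "vacuole")],
    "nucleus": [("part_of", "intracellular")],
    "nuclear membrane": [("is_a", "membrane"), ("part_of", "nucleus")],
}

def ancestors(term):
    """All terms reachable from `term` by following is_a and part_of edges upward."""
    seen = set()
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for _rel, parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(ancestors("vacuolar membrane")))
# ['cell', 'cytoplasm', 'intracellular', 'membrane', 'vacuole']
```

A gene product annotated to vacuolar membrane is therefore also retrieved by a query for membrane, vacuole or cell, which is the practical payoff of allowing multiple parents.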
12 GO terms are defined and have unique IDs
term: chloroplast
go_id: GO:0009507
definition: A chlorophyll-containing plastid with thylakoids organized into grana and frets, or stroma thylakoids, and embedded in a stroma.
definition_reference: ISBN:0471245208

term: ketone catabolism
go_id: GO:0042182
definition: The breakdown into simpler components of ketones, a class of organic compounds that contain the carbonyl group, C=O, and in which the carbonyl group is bonded only to carbon atoms. The general formula for a ketone is RCOR', where R and R' are alkyl or aryl groups.
definition_reference: GO:curators
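Records like these are easy to load because every term carries a unique ID. A minimal sketch, assuming each field sits on its own line as `key: value` and that a `term` line starts a new record; this is not necessarily the exact file layout GO distributed, and `parse_term_stanzas` is a hypothetical helper.

```python
def parse_term_stanzas(text):
    """Split 'key: value' lines into one dict per term; a 'term' key starts a new record."""
    terms = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(": ")
        if key == "term" or not terms:
            terms.append({})
        terms[-1][key] = value
    return terms

sample = """\
term: chloroplast
go_id: GO:0009507
definition: A chlorophyll-containing plastid with thylakoids organized into grana and frets, or stroma thylakoids, and embedded in a stroma.
definition_reference: ISBN:0471245208
term: ketone catabolism
go_id: GO:0042182
definition_reference: GO:curators
"""

# Index term names by their stable GO identifier.
by_id = {t["go_id"]: t["term"] for t in parse_term_stanzas(sample)}
print(by_id["GO:0042182"])  # ketone catabolism
```

Splitting on `": "` (colon plus space) keeps values such as `GO:0009507` and `ISBN:0471245208` intact.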
13 Annotation of GO terms to gene products
- literature curation
  - Inferred from Mutant Phenotype (IMP)
  - Inferred from Direct Assay (IDA)
  - Inferred from Genetic Interaction (IGI)
  - Inferred from Physical Interaction (IPI)
  - Inferred from Expression Pattern (IEP)
  - Traceable Author Statement (TAS)
  - Non-traceable Author Statement (NAS)
- homologies
  - Inferred from Sequence Similarity (ISS)
- computed annotation
  - Inferred from Electronic Annotation (IEA)
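Every annotation records one of these evidence codes, which lets a user separate curated from computed annotations, for example to build a higher-confidence set that excludes IEA. A minimal sketch; the grouping follows this slide, and `split_by_source` is a hypothetical helper.

```python
# Evidence codes grouped by annotation source, as listed on the slide.
EVIDENCE = {
    "IMP": ("Inferred from Mutant Phenotype", "literature curation"),
    "IDA": ("Inferred from Direct Assay", "literature curation"),
    "IGI": ("Inferred from Genetic Interaction", "literature curation"),
    "IPI": ("Inferred from Physical Interaction", "literature curation"),
    "IEP": ("Inferred from Expression Pattern", "literature curation"),
    "TAS": ("Traceable Author Statement", "literature curation"),
    "NAS": ("Non-traceable Author Statement", "literature curation"),
    "ISS": ("Inferred from Sequence Similarity", "homologies"),
    "IEA": ("Inferred from Electronic Annotation", "computed annotation"),
}

def split_by_source(codes):
    """Return (curated, computed) lists, where computed means IEA-style annotation."""
    curated = [c for c in codes if EVIDENCE[c][1] != "computed annotation"]
    computed = [c for c in codes if EVIDENCE[c][1] == "computed annotation"]
    return curated, computed

print(split_by_source(["IDA", "IEA", "ISS"]))  # (['IDA', 'ISS'], ['IEA'])
```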
14 GO Gene Association Tables
- Herpes viruses
- Vibrio cholerae, B. anthracis, Coxiella burnetii, Pseudomonas syringae, Shewanella oneidensis
- Dictyostelium discoideum
- Saccharomyces cerevisiae, Schizosaccharomyces pombe
- Trypanosoma brucei, Leishmania major, Plasmodium falciparum
- Caenorhabditis elegans
- Drosophila melanogaster, Glossina morsitans
- Danio rerio
- Mus domesticus, Rattus norvegicus, Homo sapiens bioinformaticus
- Arabidopsis thaliana, Oryza sativa
15 go/gene_associations/
FB FBgn0015567 α-Adaptin GO:0005886 FB:FBrf0093110|PMID:9118220 IDA C
FB FBgn0015567 α-Adaptin GO:0007269 FB:FBrf0108281|PMID:10218159 NAS P
FB FBgn0015567 α-Adaptin GO:0016192 FB:FBrf0124164 NAS P
FB FBgn0015567 α-Adaptin GO:0030122 FB:FBrf0115359 NAS C
FB FBgn0015567 α-Adaptin GO:0030122 FB:FBrf0124164 NAS C
FB FBgn0015567 α-Adaptin GO:0006901 FB:FBrf0108281|PMID:10218159 TAS P
FB FBgn0015567 α-Adaptin GO:0008021 FB:FBrf0108281|PMID:10218159 TAS C
FB FBgn0015567 α-Adaptin GO:0016181 FB:FBrf0141528|PMID:11697879 TAS P
FB FBgn0015567 α-Adaptin GO:0016183 FB:FBrf0108281|PMID:10218159 TAS P
FB FBgn0015567 α-Adaptin GO:0030135 FB:FBrf0108281|PMID:10218159 TAS C
FB FBgn0010215 α-Cat GO:0003779 FB:FBrf0132100 ISS F
FB FBgn0010215 α-Cat GO:0007016 FB:FBrf0129868|PMID:10908592 ISS P
FB FBgn0010215 α-Cat GO:0008092 FB:FBrf0132100 ISS F
FB FBgn0010215 α-Cat GO:0016342 FB:FBrf0129868|PMID:10908592 ISS C
FB FBgn0010215 α-Cat GO:0016343 FB:FBrf0129868|PMID:10908592 ISS F
FB FBgn0010215 α-Cat GO:0005912 FB:FBrf0151280|PMID:12147138 NAS C
SGD S0004660 AAC1 GO:0005743 SGD_REF:12031|PMID:2167309 TAS C
SGD S0004660 AAC1 GO:0006854 SGD_REF:12031|PMID:2167309 IDA P
SGD S0004660 AAC1 GO:0005471 SGD_REF:12031|PMID:2167309 IDA F
SGD S0000289 AAC3 GO:0005743 SGD_REF:13606|PMID:1915842 TAS C
SGD S0000289 AAC3 GO:0006854 SGD_REF:13606|PMID:1915842 IMP P
SGD S0000289 AAC3 GO:0005471 SGD_REF:13606|PMID:1915842 IMP F ADP/ATP translocator YBR085W|ANC3 gene taxon:4932 20010213 SGD
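Gene association files are tab-separated, one annotation per line. The sketch below reads them using the 15 column names of the GAF 1.0 layout (database, object ID, symbol, qualifier, GO ID, reference, evidence, with/from, aspect, name, synonym, type, taxon, date, assigned by); the sample row is rebuilt from the AAC3 line on this slide, and `read_gaf` is a hypothetical helper rather than a GO-supplied tool.

```python
import csv
import io

# Column names of the 15-column gene association format (GAF 1.0).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence", "with_from", "aspect", "db_object_name",
    "synonym", "db_object_type", "taxon", "date", "assigned_by",
]

def read_gaf(handle):
    """Yield one dict per annotation line, skipping '!' comment lines."""
    # QUOTE_NONE: GAF fields are plain text, so quote characters are not special.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("!"):
            continue
        record = dict(zip(GAF_COLUMNS, row))
        # Multi-valued fields such as the reference column are pipe-separated.
        record["db_reference"] = record.get("db_reference", "").split("|")
        yield record

sample = "\t".join([
    "SGD", "S0000289", "AAC3", "", "GO:0005471", "SGD_REF:13606|PMID:1915842",
    "IMP", "", "F", "ADP/ATP translocator", "YBR085W|ANC3", "gene",
    "taxon:4932", "20010213", "SGD",
]) + "\n"

records = list(read_gaf(io.StringIO("!a comment line\n" + sample)))
print(records[0]["go_id"], records[0]["evidence"], records[0]["aspect"])  # GO:0005471 IMP F
```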
16 Curated GO Annotations
                 1.12.2001   1.12.2003
Gene products    42,421      253,962
GO terms         4,262       7,741
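Reading the two columns as counts at the two snapshot dates, the growth can be expressed as a fold change. A small sketch using only the numbers on this slide; `fold_change` is a hypothetical helper.

```python
def fold_change(before, after):
    """Ratio of the later count to the earlier count."""
    return after / before

# Curated annotation counts from the slide: (1.12.2001, 1.12.2003)
counts = {
    "gene products": (42421, 253962),
    "GO terms": (4262, 7741),
}

for label, (earlier, later) in counts.items():
    print(f"{label}: {fold_change(earlier, later):.1f}-fold increase")
# gene products: 6.0-fold increase
# GO terms: 1.8-fold increase
```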
17
Expression studies
- Human ontogenic tumor gene expression
- Human breast cancer gene expression
- Human endothelial cell gene expression
- Human fibrosarcoma cell cDNAs
- Human osteoblast progenitor cell gene expression
- Human fibrosarcoma cell gene expression
- Mouse cDNAs - FANTOM/FANTOM2 Projects
- Mouse lung gene expression
- Mouse dendritic cell gene expression
- Mouse hepatic and hippocampal gene expression
- Mouse liver tumor gene expression
- Drosophila gene expression during aging
- Drosophila embryo gene expression
- Affymetrix Probe Sets
Protein annotation
- Vertebrate nuclear proteins
- Human GPCR proteins
- Mouse proteome
- PANTHER protein families
EST collections
- Cattle ESTs, Pig ESTs, Dog ESTs
- Paracoccidioides brasiliensis ESTs
- Plasmodium falciparum ESTs
- Honey bee ESTs
- Schizophyllum commune ESTs
- Meloidogyne incognita ESTs
- Plasmodium vivax ESTs
- Amblyomma variegatum ESTs
Genomic annotation
- Drosophila melanogaster genome
- Caenorhabditis briggsae genome
- Anopheles gambiae genome
- Schizosaccharomyces pombe genome
- Plasmodium yoelii genome
- Plasmodium falciparum genome
- Dictyostelium genome
- Rice genome
- Plant alternatively spliced genes
- Human pseudogenes
http://www.geneontology.org/GO.biblio.html
18 Database annotations
SGD (Dwight et al. 2002)
19 Annotation summaries
Meloidogyne incognita (McCarter et al. 2003)
20 The combinatorial nightmare
21 Combinatorial explosion
Regulation
Negative or Positive
Process
Body part
Induction
2 × 2 × (# of processes - 2)
2 × 2 × (# of processes - 2) × (# of body parts)
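The two formulas show why precomposing every combination does not scale: each extra axis, such as body part, multiplies the count. A sketch that simply evaluates the slide's formulas; the slide does not spell out which choices the two factors of 2 stand for, and the input sizes used below (100 processes, 50 body parts) are hypothetical.

```python
def precomposed_terms(n_processes, n_body_parts=None):
    """Evaluate the slide's formulas: 2 x 2 x (processes - 2), optionally x body parts."""
    count = 2 * 2 * (n_processes - 2)
    if n_body_parts is not None:
        count *= n_body_parts  # every combination must also be crossed with each body part
    return count

print(precomposed_terms(100))      # 392
print(precomposed_terms(100, 50))  # 19600
```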
24 OBOL - Open Biological Ontologies Language
Chris Mungall
25 The OBOL System
- Approach: annotation-time term composition vs. tools for maintaining large directed acyclic graphs
- Requires new generalization hierarchies
- Term decomposition using grammars
- Generating computable logical definitions
- Using logical definitions for term creation and error checking
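One way logical definitions support error checking: if two terms share the same differentia and one term's genus is a subtype of the other's, the first term should be an ISA child of the second, and a missing link can be reported automatically. This is a much simplified sketch of that idea, not OBOL's actual implementation; the definitions, genus hierarchy and the deliberately missing link are all hypothetical.

```python
# Genus hierarchy used for reasoning (assumed): negative/positive regulation is_a regulation.
genus_parents = {
    "negative regulation": {"regulation"},
    "positive regulation": {"regulation"},
}

# Hypothetical logical definitions: term -> (genus, differentia)
logical_defs = {
    "regulation of transcription": ("regulation", "transcription"),
    "negative regulation of transcription": ("negative regulation", "transcription"),
    "positive regulation of transcription": ("positive regulation", "transcription"),
}

# is_a links present in the hand-built DAG; one is missing on purpose.
asserted_is_a = {
    "negative regulation of transcription": {"regulation of transcription"},
    "positive regulation of transcription": set(),
}

def inferred_is_a(defs, genus_parents):
    """Infer A is_a B when A and B share a differentia and A's genus is_a B's genus."""
    inferred = set()
    for a, (genus_a, diff_a) in defs.items():
        for b, (genus_b, diff_b) in defs.items():
            if a != b and diff_a == diff_b and genus_b in genus_parents.get(genus_a, set()):
                inferred.add((a, b))
    return inferred

def missing_links(defs, genus_parents, asserted):
    """Report inferred is_a links that the curated graph lacks."""
    return sorted(
        (a, b) for a, b in inferred_is_a(defs, genus_parents)
        if b not in asserted.get(a, set())
    )

print(missing_links(logical_defs, genus_parents, asserted_is_a))
# [('positive regulation of transcription', 'regulation of transcription')]
```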
26 A Formal Grammar for OBO terms
- All GO terms are NOUN-PHRASES
- A NOUN-PHRASE is (recursively) made from:
  - a NOUN (includes inflected verbs, e.g. binding)
  - an ADJECTIVE followed by a NOUN-PHRASE
  - a NOUN-PHRASE preceded by a NOUN-PHRASE acting as an ADJECTIVE, e.g. clathrin coat
  - a NOUN-PHRASE, then a PREPOSITION, then a NOUN-PHRASE, e.g. regulation of transcription
  - an (optional) NOUN-PHRASE, then a RELATIONAL ADJECTIVE, then a NOUN-PHRASE, e.g. clathrin-coated vesicle
- Precedence rules are also required to prune the parse forest
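Several of these rules can be sketched as a small recursive parser. This toy applies one fixed precedence order (preposition rule first, then adjective, then noun-modifier) instead of building and pruning a full parse forest, and its lexicon is a hypothetical handful of words; the relational-adjective rule is left out.

```python
PREPOSITIONS = {"of", "in", "to", "from"}  # hypothetical tiny lexicon
ADJECTIVES = {"negative", "positive"}

def parse_np(words):
    """Parse a list of words as a NOUN-PHRASE, returning a nested tuple tree."""
    if not words:
        raise ValueError("empty noun phrase")
    # NOUN-PHRASE PREPOSITION NOUN-PHRASE (split at the first preposition)
    for i, w in enumerate(words):
        if w in PREPOSITIONS and 0 < i < len(words) - 1:
            return ("PP", w, parse_np(words[:i]), parse_np(words[i + 1:]))
    # ADJECTIVE NOUN-PHRASE
    if len(words) > 1 and words[0] in ADJECTIVES:
        return ("ADJ", words[0], parse_np(words[1:]))
    # NOUN-PHRASE acting as ADJECTIVE, then NOUN-PHRASE (e.g. "clathrin coat")
    if len(words) > 1:
        return ("MOD", parse_np(words[:-1]), parse_np(words[-1:]))
    return ("N", words[0])

print(parse_np("negative regulation of transcription".split()))
# ('PP', 'of', ('ADJ', 'negative', ('N', 'regulation')), ('N', 'transcription'))
```

Trying the preposition rule first yields the bracketing (negative regulation) of transcription rather than negative (regulation of transcription); a real grammar would generate both parses and let precedence rules choose.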
27 Gene Ontology Software
- Browser - AmiGO
- Database - MySQL
- Editor - DAG-Edit
- geneontology.sourceforge.net
- Third-party software (e.g. Spotfire, TreeMap, GoFish, FatiGO)
32 OBO-Edit - a powerful editor for directed acyclic graphs
- data adaptors
- multiple edits on the same graph
- define your own relationship types
- plug-in architecture, e.g. add an external in-line dictionary
34 The importance of community feedback
Everyone can suggest new terms for GO and tell us what errors we have made.
geneontology.sourceforge.net