Title: Annotating genomes using ontologies
find all lactose metabolism genes
find all transmembrane receptor genes
what are the genes that are involved in cell migration?
find all genes expressed during wing development
- text search is unreliable
- no integration with genomic data
- what about new genomes?
- Solution
  - annotate using ontologies and controlled vocabularies
Outline
- What is an ontology?
- The Gene Ontology (GO)
  - function, process and location
  - the GO Graph
- The GO Annotation model
- Using GO to annotate your genome
What is an ontology?
- A formal representation of some domain of knowledge (e.g. anatomy, disease, biochemistry)
- What are the different kinds of entity which exist?
- How are these entities related?
GPCR activity
- receptor activity
  Combining with an extracellular or intracellular messenger to initiate a change in cell activity
- transmembrane receptor activity (is_a receptor activity)
  Combining with an extracellular or intracellular messenger to initiate a change in cell activity, and spanning to the membrane of either the cell or an organelle
- G-protein coupled receptor activity (is_a transmembrane receptor activity)
  A receptor that binds an extracellular ligand and transmits the signal to a heterotrimeric G-protein complex. These receptors are characteristically seven-transmembrane receptors and are made up of hetero- or homodimers
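The is_a chain above is what lets a query for a general class, such as the earlier question about transmembrane receptor genes, also return genes annotated only to a more specific class. A minimal Python sketch, assuming the three terms are held in a plain child-to-parents dictionary; the gene names geneA and geneB are hypothetical:

```python
from collections import deque

# is_a links (child -> parents) for the three terms above; a list of parents
# leaves room for terms that have more than one.
IS_A: dict[str, list[str]] = {
    "G-protein coupled receptor activity": ["transmembrane receptor activity"],
    "transmembrane receptor activity": ["receptor activity"],
    "receptor activity": [],
}

# Hypothetical gene -> term annotations, for illustration only.
ANNOTATIONS: dict[str, str] = {
    "geneA": "G-protein coupled receptor activity",
    "geneB": "receptor activity",
}


def ancestors(term: str) -> list[str]:
    """Breadth-first walk up the is_a links; nearer ancestors come first."""
    found: list[str] = []
    queue = deque(IS_A.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in found:
            found.append(parent)
            queue.extend(IS_A.get(parent, []))
    return found


def is_a(term: str, ancestor: str) -> bool:
    """True if term is the ancestor itself or a transitive subclass of it."""
    return term == ancestor or ancestor in ancestors(term)


def genes_with(ancestor: str) -> list[str]:
    """Genes annotated to the ancestor class or to any class below it."""
    return sorted(g for g, t in ANNOTATIONS.items() if is_a(t, ancestor))


print(ancestors("G-protein coupled receptor activity"))
# -> ['transmembrane receptor activity', 'receptor activity']
print(genes_with("transmembrane receptor activity"))  # -> ['geneA']
```

A text search for "transmembrane receptor" would miss geneA if its annotation only used the GPCR term name; following the graph does not.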
How are ontologies used in bioinformatics?
- Biological data (genes, gene products) can be annotated using terms/classes from ontologies
- Knowledge can be automatically transferred from experiment-rich model organisms to newly sequenced organisms, or to Homo sapiens
OBO - Open Bio Ontologies
- Gene Ontology
  - molecular function
  - biological process
  - cellular component
- Cell Ontology
- ChEBI - Chemical Ontology
- Anatomical Ontologies
  - human, mouse, fly, worm, zebrafish
- Others
  - Pathogen life cycle
  - Pathway ontologies
  - Sequence Ontology
  - Phenotype and disease ontologies
  - Organismal taxonomies
http://obo.sourceforge.net
The Gene Ontology
- Since 1999
- Originally fly, mouse, yeast
- Currently
  - 16,000 classes (terms)
  - 6 million annotations covering a wide range of eukaryotes and prokaryotes
  - D. melanogaster: 11,000 out of 14,000 genes have GO annotations
The 3 Gene Ontologies
- Molecular Function: elemental activity/task
  - the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
- Biological Process: biological goal or objective
  - broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
- Cellular Component: location or complex
  - subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
Molecular Function
- activities or jobs of a gene product
GO:0004347 glucose-6-phosphate isomerase activity
Molecular Function
- insulin binding
- GO:0005009 insulin receptor activity
Molecular Function
- GO:0015238 drug transporter activity
Molecular Function
[Term]
id: GO:0015238
name: drug transporter activity
namespace: molecular_function
def: "Enables the directed movement of a drug into, out of, within or between cells. A drug is any naturally occurring or synthetic substance, other than a nutrient, that, when administered or applied to an organism, affects the structure or functioning of the organism; in particular, any such substance used in the diagnosis, prevention, or treatment of disease." [ISBN:0198506732]
is_a: GO:0005215 ! transporter activity
Molecular Function
- A gene product may have several functions; a function term refers to a single reaction or activity, not to a gene product.
- Sets of functions make up a biological process.
Biological Process
- a commonly recognized series of events
Biological Process
- GO:0006111 regulation of gluconeogenesis
Biological Process
- GO:0035108 limb morphogenesis
Biological Process
- GO:0008049 male courtship behavior
Biological Process
[Term]
id: GO:0008049
name: male courtship behavior
namespace: biological_process
def: "The actions or reactions of a male, for the purpose of attracting a sexual partner." [FB:bf]
exact_synonym: "male courtship behaviour"
is_a: GO:0007619 ! courtship behavior
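Both term stanzas shown so far use the same "tag: value" layout, which is why simple OBO readers are easy to write. A minimal Python sketch for a single stanza, not a full OBO parser: it ignores escaped characters, trailing modifiers and multi-stanza files, all of which the Java and Perl parsers mentioned under Downloading GO are meant to handle:

```python
STANZA = '''[Term]
id: GO:0008049
name: male courtship behavior
namespace: biological_process
def: "The actions or reactions of a male, for the purpose of attracting a sexual partner." [FB:bf]
exact_synonym: "male courtship behaviour"
is_a: GO:0007619 ! courtship behavior
'''


def parse_stanza(text: str) -> dict[str, list[str]]:
    """Split each 'tag: value' line on its first colon; tags may repeat."""
    tags: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("["):  # skip blanks and the [Term] header
            continue
        tag, _, value = line.partition(":")
        tags.setdefault(tag.strip(), []).append(value.strip())
    return tags


def parent_ids(tags: dict[str, list[str]]) -> list[str]:
    """is_a values look like 'GO:0007619 ! courtship behavior'; drop the '!' comment."""
    return [v.split("!", 1)[0].strip() for v in tags.get("is_a", [])]


term = parse_stanza(STANZA)
print(term["id"], term["namespace"], parent_ids(term))
# -> ['GO:0008049'] ['biological_process'] ['GO:0007619']
```

Splitting on the first colon only matters because values such as GO IDs contain colons themselves.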
- What is the difference between molecular function and biological process?
Example Gene Product: hammer
Function (what)            Process (why)
Drive nail (into wood)     Carpentry
Drive stake (into soil)    Gardening
Smash roach                Pest Control
Clown's juggling object    Entertainment
Cellular Component
- where a gene product acts
Cellular Component
Image from http://microscopy.fsu.edu
Cellular Component
- A cell can be part of an organism or a whole organism
Images from http://microscopy.fsu.edu
No GO Areas
- GO covers normal functions and processes
- No pathological processes
  - see the OBO Disease Ontology
- No experimental conditions
  - see MGED
- No evolutionary relationships
- No gene products
- Not a system of nomenclature
Ontology Structure
[Figure: cellular component terms (including cell, membrane, chloroplast and chloroplast membrane) connected by is-a and part-of edges]
Ontology Relationships
- Directed Acyclic Graph (DAG): a term can have more than one parent, and no path leads from a term back to itself
http://www.ebi.ac.uk/ego
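Because a term can have several parents, reached through different relations, finding everything a term implies means walking the graph rather than a single chain. This walk is also what makes an annotation to a specific term imply its more general ancestors (the idea behind GO's "true path rule"). A minimal Python sketch; the edges are illustrative, in the spirit of the figure, and not copied from a GO release:

```python
from collections import defaultdict

# Illustrative (child, relation, parent) edges; not taken from a GO release.
EDGES = [
    ("chloroplast membrane", "is_a", "membrane"),
    ("chloroplast membrane", "part_of", "chloroplast"),
    ("chloroplast", "part_of", "cell"),
    ("cell membrane", "is_a", "membrane"),
    ("cell membrane", "part_of", "cell"),
]


def build_parents(edges):
    """Index edges by child so each term can list all of its parents."""
    parents = defaultdict(set)
    for child, relation, parent in edges:
        parents[child].add((relation, parent))
    return parents


def all_ancestors(term, parents, relations=("is_a", "part_of")):
    """Follow the chosen relation types upward; the found set stops revisits."""
    found = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for relation, parent in parents.get(node, ()):
            if relation in relations and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


parents = build_parents(EDGES)
print(sorted(all_ancestors("chloroplast membrane", parents)))
# -> ['cell', 'chloroplast', 'membrane']
print(sorted(all_ancestors("chloroplast membrane", parents, relations=("is_a",))))
# -> ['membrane']
```

Restricting the walk to is_a alone answers "what kind of thing is this?", while adding part_of also answers "what is it part of?".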
Relations
- OBO Relations ontology
  - is_a
  - part_of
  - derives_from
  - located_in
- http://obo.sourceforge.net/relationship
Synonyms
- GO classes (terms) represent types/kinds
- independent of language
- but language is important to the user
What's in a name?
- Glucose synthesis
- Glucose biosynthesis
- Glucose formation
- Glucose anabolism
- Gluconeogenesis
- All refer to the process of making glucose from
simpler components
Synonyms
- GO synonyms include alternative wordings, spellings, and related concepts
  - broader, narrower, exact or related
- Useful search aid
  - name: glucose transport
  - exact_synonym: gluco-hexose transport
  - narrow_synonym: glucose shuttling
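As a search aid, synonyms can be folded into a lookup table that resolves any wording to its term and reports what kind of match it was. A minimal Python sketch, assuming a hypothetical in-memory record for the example term above:

```python
from typing import Optional, Tuple

# Hypothetical in-memory record for the example term above.
TERMS = {
    "glucose transport": {
        "exact_synonym": ["gluco-hexose transport"],
        "narrow_synonym": ["glucose shuttling"],
    },
}


def build_index(terms):
    """Map every lower-cased name and synonym to (term name, match kind)."""
    index = {}
    for name, synonyms in terms.items():
        index[name.lower()] = (name, "name")
        for kind, values in synonyms.items():
            for value in values:
                # setdefault: a term's own name wins over another term's synonym
                index.setdefault(value.lower(), (name, kind))
    return index


def lookup(query: str, index) -> Optional[Tuple[str, str]]:
    """Case-insensitive exact match on names and synonyms; None if unknown."""
    return index.get(query.strip().lower())


index = build_index(TERMS)
print(lookup("Glucose Shuttling", index))  # -> ('glucose transport', 'narrow_synonym')
```

Returning the match kind lets a search interface show that "glucose shuttling" resolved through a narrow synonym rather than an exact one.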
Downloading GO
- http://www.geneontology.org/GO.downloads.shtmlont
- OBO Format
  - supported parsers exist for Java and Perl
  - http://search.cpan.org/cmungall/go-perl/
Annotation using GO
- GO Annotation
  - assigning roles to genes and gene products using Gene Ontology classes (terms)
  - gene products can have multiple roles
- Annotation can be manual (curation) or automated
GO Annotation
- Database object
  - gene or gene product
- GO term ID
  - e.g. GO:0003677
- Reference for annotation
  - e.g. PubMed paper, BLAST results
- Evidence code
  - from the evidence code ontology
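The four parts of an annotation map naturally onto a small record type, and because a gene product can have many roles, one object usually carries many such records. A minimal Python sketch; the class and field names are assumptions, and the values come from the Tre1 annotation lines shown further on:

```python
import re
from collections import defaultdict
from dataclasses import dataclass

GO_ID = re.compile(r"GO:\d{7}")


@dataclass(frozen=True)
class Annotation:
    """One GO annotation: database object, term, reference and evidence."""
    db_object: str  # gene or gene product identifier
    go_id: str      # e.g. "GO:0003677"
    reference: str  # e.g. a PubMed ID or a BLAST-based reference
    evidence: str   # evidence code, e.g. "IMP"

    def __post_init__(self):
        # Reject malformed term IDs and annotations without a supporting reference.
        if not GO_ID.fullmatch(self.go_id):
            raise ValueError(f"not a GO ID: {self.go_id!r}")
        if not self.reference:
            raise ValueError("an annotation needs a reference")


def roles_by_object(annotations):
    """Group GO term IDs per database object (one object, many roles)."""
    roles = defaultdict(set)
    for a in annotations:
        roles[a.db_object].add(a.go_id)
    return dict(roles)


tre1 = [
    Annotation("FBgn0046687", "GO:0008527", "PMID:10884225", "IMP"),
    Annotation("FBgn0046687", "GO:0050916", "PMID:10884225", "IMP"),
]
print(roles_by_object(tre1))
```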
GO Annotation
- ISS: Inferred from Sequence/Structural Similarity
- IDA: Inferred from Direct Assay
- IPI: Inferred from Physical Interaction
- TAS: Traceable Author Statement
- NAS: Non-traceable Author Statement
- IMP: Inferred from Mutant Phenotype
- IGI: Inferred from Genetic Interaction
- IEP: Inferred from Expression Pattern
- IC: Inferred by Curator
- ND: No Data available
- IEA: Inferred from Electronic Annotation
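Evidence codes let a user decide how much to trust each annotation, for example by keeping only experimentally supported ones. A minimal Python sketch, assuming annotations are (GO ID, evidence code) pairs; the "experimental" set follows GO's documented grouping of IDA, IPI, IMP, IGI and IEP, but the filter itself is only an illustration:

```python
# Evidence codes from the list above.
EVIDENCE = {
    "ISS": "Inferred from Sequence/Structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}

# Codes that GO groups as experimental evidence.
EXPERIMENTAL = frozenset({"IDA", "IPI", "IMP", "IGI", "IEP"})


def filter_by_evidence(annotations, allowed=EXPERIMENTAL):
    """Keep (go_id, code) pairs whose code is allowed; unknown codes raise."""
    kept = []
    for go_id, code in annotations:
        if code not in EVIDENCE:
            raise ValueError(f"unknown evidence code: {code}")
        if code in allowed:
            kept.append((go_id, code))
    return kept


# (GO ID, evidence) pairs taken from the Tre1 annotation lines.
tre1 = [("GO:0050909", "NAS"), ("GO:0050916", "IMP"),
        ("GO:0007186", "ISS"), ("GO:0008354", "TAS")]
print(filter_by_evidence(tre1))  # -> [('GO:0050916', 'IMP')]
print(len(filter_by_evidence(tre1, allowed=set(EVIDENCE) - {"IEA", "ND"})))  # -> 4
```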
Annotation file format
DB  Object ID    Symbol  GO ID       Reference                     Evidence  Aspect  Object name
FB  FBgn0046687  Tre1    GO:0050909  FB:FBrf0127919                NAS       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0008527  FB:FBrf0127919                NAS       F       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0050916  FB:FBrf0128502|PMID:10884225  IMP       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0050909  FB:FBrf0128502|PMID:10884225  IMP       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0008527  FB:FBrf0128502|PMID:10884225  IMP       F       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0016021  FB:FBrf0128502|PMID:10884225  NAS       C       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0007186  FB:FBrf0129744|PMID:10908591  ISS       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0016526  FB:FBrf0129744|PMID:10908591  ISS       F       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0016021  FB:FBrf0129744|PMID:10908591  ISS       C       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0016021  FB:FBrf0138267|PMID:11566105  NAS       C       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0008354  FB:FBrf0167988|PMID:14691551  IMP       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0008354  FB:FBrf0174508|PMID:15108810  TAS       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0007280  FB:FBrf0179403|PMID:15196560  TAS       P       trapped in endoderm-1
http://www.geneontology.org/GO.current.annotations.shtml
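Each row above is one tab-separated line in the annotation file; columns left empty in these rows (the qualifier, where NOT would appear, and the with/from column) are easy to lose when the file is viewed as plain text. A minimal Python sketch, assuming the standard GO annotation file (GAF) column order and reading only the first ten columns; the two sample lines reuse Tre1 values from the table with the later columns omitted:

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class GafRecord(NamedTuple):
    db: str
    object_id: str
    symbol: str
    qualifier: str  # e.g. "NOT"; empty in the Tre1 rows above
    go_id: str
    references: List[str]
    evidence: str
    aspect: str     # P (process), F (function) or C (component)
    object_name: str


def parse_gaf_line(line: str) -> Optional[GafRecord]:
    """Parse one tab-separated annotation line; None for comments and blanks."""
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(cols)}")
    return GafRecord(
        db=cols[0], object_id=cols[1], symbol=cols[2], qualifier=cols[3],
        go_id=cols[4], references=cols[5].split("|"), evidence=cols[6],
        aspect=cols[8],  # cols[7] is the with/from column, unused here
        object_name=cols[9],
    )


lines = [
    "FB\tFBgn0046687\tTre1\t\tGO:0050916\tFB:FBrf0128502|PMID:10884225\tIMP\t\tP\ttrapped in endoderm-1",
    "FB\tFBgn0046687\tTre1\t\tGO:0016021\tFB:FBrf0128502|PMID:10884225\tNAS\t\tC\ttrapped in endoderm-1",
]
records = [r for r in map(parse_gaf_line, lines) if r is not None]
print(Counter(r.aspect for r in records))
```

Splitting on tabs rather than whitespace keeps empty columns in place, so the column positions stay fixed.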
How do you annotate using GO?
- Manual curation
  - from literature
  - time consuming
  - accurate
- Automated
  - sequence similarity, motif prediction (InterPro)
  - scales to whole genomes
  - less accurate
Annotation using BLAST
- Download the GO annotations FASTA file
  - http://www.godatabase.org/dev/database
  - download file go_YYYYMM-seqdb.fasta
  - peptide sequences of gene products annotated with GO
  - GO annotation classes included in FASTA header
  - can be used to transfer annotations
- or use GOst
  - http://www.godatabase.org/cgi-bin/gost/gost.cgi
  - web interface to GO-BLAST
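Transferring annotations from BLAST hits comes down to reading GO IDs off each subject's header and copying them to queries with good enough hits. A minimal Python sketch, assuming only that GO IDs appear in each FASTA header in GO:nnnnnnn form (check the actual header layout of the downloaded file) and that BLAST was run with its 12-column tabular output (-outfmt 6 in BLAST+). The thresholds, the header and the sequence IDs are hypothetical:

```python
import re

GO_ID = re.compile(r"GO:\d{7}")


def go_ids_from_header(header: str) -> set:
    """Collect GO IDs from a FASTA header (assumes the GO:nnnnnnn form)."""
    return set(GO_ID.findall(header))


def transfer_annotations(blast_lines, subject_go, max_evalue=1e-20, min_identity=40.0):
    """Copy GO IDs from sufficiently similar subjects to each query.

    Tabular columns: qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    transferred = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        identity, evalue = float(cols[2]), float(cols[10])
        go_ids = subject_go.get(subject)
        if go_ids and evalue <= max_evalue and identity >= min_identity:
            transferred.setdefault(query, set()).update(go_ids)
    # Unreviewed transfers like these would carry the IEA evidence code.
    return transferred


# Hypothetical header and BLAST rows, for illustration only.
subject_go = {"subj1": go_ids_from_header(">subj1 GO:0008527 GO:0016021")}
hits = [
    "query1\tsubj1\t85.0\t300\t40\t2\t1\t300\t5\t304\t1e-50\t500",
    "query2\tsubj1\t30.0\t120\t80\t6\t1\t120\t10\t129\t1e-05\t60",
]
print(transfer_annotations(hits, subject_go))
```

Here query2 gets nothing because its hit fails both thresholds; where to set them is exactly the accuracy trade-off of automated annotation.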
Annotation using InterProScan
- InterPro is a database of protein families, domains and functional sites
  - InterProScan predicts these
  - incorporates Pfam, PRINTS, PANTHER and other member databases
- http://www.ebi.ac.uk/InterProScan
  - can be downloaded and run locally
- InterPro IDs can be mapped to GO IDs
  - all mappings to GO
  - http://www.geneontology.org/GO.indices.shtml
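Once InterProScan has predicted InterPro entries for a protein, the mapping file turns them into GO IDs. A minimal Python sketch, assuming the one-mapping-per-line layout "InterPro:IPRnnnnnn description > GO:term name ; GO:nnnnnnn" with "!" comment lines (check a downloaded copy). The IPR IDs and descriptions in the sample are hypothetical; the GO IDs are real terms:

```python
import re

# Assumed line layout:  InterPro:IPR000000 Description > GO:term name ; GO:0000000
MAPPING_LINE = re.compile(r"^InterPro:(IPR\d+)\s.*>\s*GO:.*?\s*;\s*(GO:\d{7})\s*$")


def parse_interpro2go(lines):
    """Build InterPro ID -> set of GO IDs, skipping '!' comment lines."""
    mapping = {}
    for line in lines:
        if line.startswith("!"):
            continue
        match = MAPPING_LINE.match(line.strip())
        if match:
            ipr, go_id = match.groups()
            mapping.setdefault(ipr, set()).add(go_id)
    return mapping


def go_from_interpro(hits, mapping):
    """Union of GO IDs for all InterPro entries predicted for one protein."""
    terms = set()
    for ipr in hits:
        terms |= mapping.get(ipr, set())
    return terms


# Hypothetical IPR entries mapped to real GO terms, for illustration only.
sample = [
    "! hypothetical comment line",
    "InterPro:IPR999998 Example kinase domain > GO:ATP binding ; GO:0005524",
    "InterPro:IPR999998 Example kinase domain > GO:protein kinase activity ; GO:0004672",
    "InterPro:IPR999999 Example DNA-binding domain > GO:DNA binding ; GO:0003677",
]
mapping = parse_interpro2go(sample)
print(sorted(go_from_interpro(["IPR999998", "IPR999999"], mapping)))
# -> ['GO:0003677', 'GO:0004672', 'GO:0005524']
```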
Other approaches
- Natural Language Processing and mining the literature
  - http://www.gmod.org/textpresso.shtml
Perils of automated annotation
- Pay attention to the evidence code
- Be aware of negative annotations
  - use the graph structure
- Small sequence changes can lead to big functional differences
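Negative (NOT) annotations and the graph interact: a positive annotation implies every ancestor term, while a NOT annotation is usually read as ruling out the term and everything more specific below it. Checking an automated annotation against curated NOT annotations therefore needs both directions of the graph. A minimal Python sketch; the upper term "reproductive behavior" and both annotations are assumptions for illustration:

```python
# Illustrative is_a links (child -> parents); the top term is an assumption.
PARENTS = {
    "male courtship behavior": {"courtship behavior"},
    "courtship behavior": {"reproductive behavior"},
    "reproductive behavior": set(),
}


def ancestors(term):
    """All terms reachable by following parent links upward."""
    found, stack = set(), [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def descendants(term):
    """All terms that have the given term among their ancestors."""
    return {t for t in PARENTS if term in ancestors(t)}


def implied_terms(annotations):
    """Positive annotations propagate up the graph; NOT annotations propagate down."""
    positive, negative = set(), set()
    for term, is_not in annotations:
        if is_not:
            negative |= {term} | descendants(term)
        else:
            positive |= {term} | ancestors(term)
    return positive, negative


# A hypothetical electronic annotation vs. a hypothetical curated NOT annotation.
pos, neg = implied_terms([("male courtship behavior", False), ("courtship behavior", True)])
print(sorted(pos & neg))  # -> ['courtship behavior', 'male courtship behavior']
```

A non-empty intersection flags the automated annotation for review, even though the two annotations name different terms.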
GO SQL Database
- Can be queried online via AmiGO
  - http://www.godatabase.org
- Can be downloaded as a MySQL export
  - supports powerful queries
  - Perl API
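A typical query against such a database is "every gene annotated to this term or to anything below it". A common way to support this is a precomputed table of all ancestor/descendant pairs, so no recursion is needed at query time. A sketch using Python's built-in sqlite3 module in place of MySQL; the table and column names are hypothetical and are not the GO database's actual schema, and the gene names are made up:

```python
import sqlite3

# Hypothetical schema, not the real GO database schema.
SCHEMA = """
CREATE TABLE term (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- one row per (ancestor, descendant) pair, including each term with itself
CREATE TABLE path (ancestor_id INTEGER, descendant_id INTEGER, distance INTEGER);
CREATE TABLE association (term_id INTEGER, gene TEXT);
"""

QUERY = """
SELECT DISTINCT a.gene
FROM term t
JOIN path p ON p.ancestor_id = t.id
JOIN association a ON a.term_id = p.descendant_id
WHERE t.name = ?
ORDER BY a.gene
"""


def build_demo_db():
    """Create an in-memory database with three terms and three annotated genes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO term VALUES (?, ?)",
                     [(1, "membrane"), (2, "chloroplast membrane"), (3, "cell membrane")])
    conn.executemany("INSERT INTO path VALUES (?, ?, ?)",
                     [(1, 1, 0), (2, 2, 0), (3, 3, 0), (1, 2, 1), (1, 3, 1)])
    conn.executemany("INSERT INTO association VALUES (?, ?)",
                     [(2, "geneX"), (3, "geneY"), (1, "geneZ")])
    return conn


conn = build_demo_db()
print([row[0] for row in conn.execute(QUERY, ("membrane",))])       # -> ['geneX', 'geneY', 'geneZ']
print([row[0] for row in conn.execute(QUERY, ("cell membrane",))])  # -> ['geneY']
```

The self-paths with distance 0 are what include genes annotated directly to the queried term.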
Current challenges
- Integrating OBO ontologies
  - erythrocyte differentiation
  - regulation of smooth muscle contraction
- Complex annotations
  - dynamically combining classes from different OBO ontologies in meaningful ways
  - transepithelial migration of germ cells using GPCR activity
- Obol - a grammar for Bio-Ontologies
  - http://www.fruitfly.org/cjm/obol
Resources
- http://www.geneontology.org
  - Main GO site
- http://www.godatabase.org
  - Query the GO Database (web interface), GOst
- http://www.godatabase.org/dev
  - Download database, FASTA files, Perl code
- http://obo.sourceforge.net
  - Open Bio Ontologies