Annotating genomes using ontologies - PowerPoint PPT Presentation

Provided by: chris1009
1
Annotating genomes using ontologies
  • BioE131/231

3
find all lactose metabolism genes
find all transmembrane receptor genes
what are the genes that are involved in cell migration?
find all genes expressed during wing development
6
  • text search unreliable
  • no integration with genomic data
  • what about new genomes?
  • Solution: annotate using ontologies and controlled
    vocabularies

7
Outline
  • What is an ontology?
  • The Gene Ontology (GO)
  • function, process and location
  • the GO Graph
  • The GO Annotation model
  • Using GO to annotate your genome

8
What is an ontology?
  • A formal representation of some domain of
    knowledge (e.g. anatomy, disease, biochemistry)
  • What are the different kinds of entity which
    exist?
  • How are these entities related?

9
GPCR activity
  • receptor activity: Combining with an extracellular
    or intracellular messenger to initiate a change in
    cell activity
  • transmembrane receptor activity (is_a receptor
    activity): Combining with an extracellular or
    intracellular messenger to initiate a change in cell
    activity, and spanning to the membrane of either the
    cell or an organelle
  • G-protein coupled receptor activity (is_a
    transmembrane receptor activity): A receptor that
    binds an extracellular ligand and transmits the
    signal to a heterotrimeric G-protein complex. These
    receptors are characteristically seven-transmembrane
    receptors and are made up of hetero- or homodimers
10
How are ontologies used in bioinformatics?
  • Biological data (genes, gene products) can be
    annotated using terms/classes from ontologies
  • Knowledge can be automatically transferred from
    experiment-rich model organisms to newly
    sequenced organisms, or to Homo sapiens

11
OBO - Open Bio Ontologies
  • Gene Ontology: molecular function, biological
    process, cellular component
  • Cell Ontology
  • ChEBI - Chemical Ontology
  • Anatomical Ontologies: human, mouse, fly, worm,
    zebrafish
  • Others
  • Pathogen life cycle
  • Pathway ontologies
  • Sequence Ontology
  • Phenotype and disease ontologies
  • Organismal taxonomies

http://obo.sourceforge.net
12
The Gene Ontology
  • Since 1999
  • Originally fly, mouse, yeast
  • Currently
  • 16,000 classes (terms)
  • 6 million annotations covering a wide range of
    eukaryotes and prokaryotes
  • D. melanogaster: 11,000 out of 14,000 genes have
    GO annotations

13
The 3 Gene Ontologies
  • Molecular Function: elemental activity/task
  • the tasks performed by individual gene products;
    examples are carbohydrate binding and ATPase
    activity
  • Biological Process: biological goal or objective
  • broad biological goals, such as mitosis or purine
    metabolism, that are accomplished by ordered
    assemblies of molecular functions
  • Cellular Component: location or complex
  • subcellular structures, locations, and
    macromolecular complexes; examples include
    nucleus, telomere, and RNA polymerase II
    holoenzyme

14
Molecular Function
  • activities or jobs of a gene product

GO:0004347 glucose-6-phosphate isomerase activity
15
Molecular Function
  • insulin binding
  • GO:0005009 insulin receptor activity

16
Molecular Function
  • GO:0015238 drug transporter activity

17
Molecular Function
[Term]
id: GO:0015238
name: drug transporter activity
namespace: molecular_function
def: "Enables the directed movement of a drug into, out of, within or between cells. A drug is any naturally occurring or synthetic substance, other than a nutrient, that, when administered or applied to an organism, affects the structure or functioning of the organism; in particular, any such substance used in the diagnosis, prevention, or treatment of disease." [ISBN:0198506732]
is_a: GO:0005215 ! transporter activity
  • GO:0015238 drug transporter activity
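
A term record like the one on this slide is a block of tag: value lines under a [Term] header, so it can be read with very little code. The following is a minimal sketch, not the supported parser: the function name parse_obo_terms is made up for illustration, the sample text omits the def line, and the handling of the trailing "! comment" is a simplification that would mis-split a quoted value containing " ! ".

```python
# Minimal sketch of reading [Term] stanzas from OBO-style text.
# Assumption: one "tag: value" pair per line; quoted values are not parsed.
OBO_TEXT = """\
[Term]
id: GO:0015238
name: drug transporter activity
namespace: molecular_function
is_a: GO:0005215 ! transporter activity
"""


def parse_obo_terms(text):
    """Return one dict per [Term] stanza; each tag maps to a list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # Only [Term] stanzas are collected; other stanza types are skipped.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None:
            continue  # header lines or lines inside a skipped stanza
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop the trailing "! comment"
        current.setdefault(tag.strip(), []).append(value)
    return terms


terms = parse_obo_terms(OBO_TEXT)
print(terms[0]["id"], terms[0]["is_a"])
```

Values are kept as lists because tags such as is_a can occur more than once in a single stanza.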

18
Molecular Function
  • A gene product may have several functions; a
    function term refers to a single reaction or
    activity, not a gene product.
  • Sets of functions make up a biological process.

19
Biological Process
  • a commonly recognized series of events

20
Biological Process
  • GO:0006350 transcription

21
Biological Process
  • GO:0006111 regulation of gluconeogenesis

22
Biological Process
  • GO:0035108 limb morphogenesis

23
Biological Process
  • GO:0008049 male courtship behavior

24
Biological Process
[Term]
id: GO:0008049
name: male courtship behavior
namespace: biological_process
def: "The actions or reactions of a male, for the purpose of attracting a sexual partner." [FB:bf]
exact_synonym: "male courtship behaviour"
is_a: GO:0007619 ! courtship behavior
GO:0008049 male courtship behavior
25
  • What is the difference between molecular function
    and biological process?

26
Example: Gene Product = hammer
Function (what)            Process (why)
Drive nail (into wood)     Carpentry
Drive stake (into soil)    Gardening
Smash roach                Pest Control
Clown's juggling object    Entertainment
27
Cellular Component
  • where a gene product acts

28
Cellular component
Image from http://microscopy.fsu.edu
29
Cellular component
  • A cell can be a part of an organism or a whole
    organism

Images from http://microscopy.fsu.edu
30
Cellular Component
31
No GO Areas
  • GO covers normal functions and processes
  • No pathological processes (see the OBO Disease
    Ontology)
  • No experimental conditions (see MGED)
  • No evolutionary relationships
  • No gene products
  • Not a system of nomenclature

32
Ontology Structure
[Diagram: graph of the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, connected by is-a and part-of edges]
33
Ontology Relationships
Directed Acyclic Graph
http://www.ebi.ac.uk/ego
34
Relations
  • OBO Relations ontology
  • is_a
  • part_of
  • derives_from
  • located_in
  • http://obo.sourceforge.net/relationship
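
Because the ontology is a directed acyclic graph with typed edges, a term can have several parents through different relations. The sketch below shows one way to walk upward through such a graph while choosing which relations to follow. The edge list is illustrative (it reuses the term names from the Ontology Structure slide, but the exact edges are an assumption), and the function name ancestors is hypothetical.

```python
# (child, relation, parent) edges; illustrative only, not copied from GO.
EDGES = [
    ("chloroplast membrane", "is_a", "membrane"),
    ("chloroplast membrane", "part_of", "chloroplast"),
    ("mitochondrial membrane", "is_a", "membrane"),
    ("chloroplast", "part_of", "cell"),
]


def ancestors(term, edges, relations=("is_a", "part_of")):
    """All terms reachable from `term` by following the given relations upward."""
    parents = {}
    for child, rel, parent in edges:
        if rel in relations:
            parents.setdefault(child, set()).add(parent)
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("chloroplast membrane", EDGES)))
print(sorted(ancestors("chloroplast membrane", EDGES, relations=("is_a",))))
```

Restricting the relations argument matters: following only is_a answers "what kind of thing is this?", while adding part_of also answers "what is it a component of?".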

35
Synonyms
  • GO classes (terms) represent types/kinds
  • independent of language
  • but language is important to the user

36
What's in a name?
37
What's in a name?
  • Glucose synthesis
  • Glucose biosynthesis
  • Glucose formation
  • Glucose anabolism
  • Gluconeogenesis
  • All refer to the process of making glucose from
    simpler components

38
Synonyms
  • GO synonyms include alternative wordings,
    spellings, and related concepts
  • Broader, narrower, exact or related
  • Useful search aid
  • name: glucose transport
  • exact_synonym: gluco-hexose transport
  • narrow_synonym: glucose shuttling
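
To illustrate why synonyms are a useful search aid, here is a small sketch that matches a query against both the term name and its scoped synonyms. The data structure and the function name search_terms are assumptions made for the example; only the term name and the two synonyms come from the slide.

```python
# One term with scoped synonyms, as listed on the slide.
TERMS = [
    {
        "name": "glucose transport",
        "synonyms": [
            ("exact", "gluco-hexose transport"),
            ("narrow", "glucose shuttling"),
        ],
    },
]


def search_terms(query, terms):
    """Return (term name, match type) pairs for case-insensitive exact matches."""
    q = query.strip().lower()
    hits = []
    for term in terms:
        if q == term["name"].lower():
            hits.append((term["name"], "name"))
        for scope, text in term["synonyms"]:
            if q == text.lower():
                hits.append((term["name"], scope + "_synonym"))
    return hits


print(search_terms("Glucose shuttling", TERMS))
```

Returning the synonym scope with each hit lets a caller treat an exact synonym as equivalent to the name while flagging narrow or related matches for review.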

39
Downloading GO
  • http://www.geneontology.org/GO.downloads.shtml#ont
  • OBO Format
  • supported parsers exist for Java, Perl
  • http://search.cpan.org/~cmungall/go-perl/

40
Annotation using GO
  • GO Annotation
  • assigning roles to genes and gene products using
    Gene Ontology classes (terms)
  • gene products can have multiple roles
  • Annotation can be manual (curation) or automated

43
GO Annotation
  • Database object: gene or gene product
  • GO term ID, e.g. GO:0003677
  • Reference for annotation, e.g. PubMed paper, BLAST
    results
  • Evidence code, from the evidence code ontology

44
GO Annotation
  • ISS: Inferred from Sequence/Structural Similarity
  • IDA: Inferred from Direct Assay
  • IPI: Inferred from Physical Interaction
  • TAS: Traceable Author Statement
  • NAS: Non-traceable Author Statement
  • IMP: Inferred from Mutant Phenotype
  • IGI: Inferred from Genetic Interaction
  • IEP: Inferred from Expression Pattern
  • IC: Inferred by Curator
  • ND: No Data available
  • IEA: Inferred from Electronic Annotation
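
Evidence codes are most useful as a filter. The sketch below keeps the codes from this slide in a lookup table and selects only annotations backed by direct experimental evidence. Grouping IDA, IPI, IMP, IGI and IEP as "experimental" follows the usual GO convention but is not stated on the slide, and the function name and tuple layout are hypothetical. The sample annotations reuse Tre1 entries from the annotation file slide.

```python
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/Structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}

# Assumption: these codes are treated as experimental evidence.
EXPERIMENTAL = {"IDA", "IPI", "IMP", "IGI", "IEP"}

# (gene symbol, GO ID, evidence code)
ANNOTATIONS = [
    ("Tre1", "GO:0050916", "IMP"),
    ("Tre1", "GO:0007186", "ISS"),
    ("Tre1", "GO:0050909", "NAS"),
]


def filter_by_evidence(annotations, allowed):
    """Keep only annotations whose evidence code is in `allowed`."""
    return [a for a in annotations if a[2] in allowed]


for gene, go_id, code in filter_by_evidence(ANNOTATIONS, EXPERIMENTAL):
    print(gene, go_id, EVIDENCE_CODES[code])
```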
45
Annotation file format
DB  Object ID    Symbol  GO ID       Reference                     Evidence  Aspect  Name
FB  FBgn0046687  Tre1    GO:0050909  FB:FBrf0127919                NAS       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0008527  FB:FBrf0127919                NAS       F       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0050916  FB:FBrf0128502|PMID:10884225  IMP       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0050909  FB:FBrf0128502|PMID:10884225  IMP       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0008527  FB:FBrf0128502|PMID:10884225  IMP       F       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0016021  FB:FBrf0128502|PMID:10884225  NAS       C       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0007186  FB:FBrf0129744|PMID:10908591  ISS       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0016526  FB:FBrf0129744|PMID:10908591  ISS       F       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0016021  FB:FBrf0129744|PMID:10908591  ISS       C       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0016021  FB:FBrf0138267|PMID:11566105  NAS       C       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0008354  FB:FBrf0167988|PMID:14691551  IMP       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0008354  FB:FBrf0174508|PMID:15108810  TAS       P       trapped in endoderm-1
FB  FBgn0046687  Tre1    GO:0007280  FB:FBrf0179403|PMID:15196560  TAS       P       trapped in endoderm-1
http://www.geneontology.org/GO.current.annotations.shtml
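
GO annotation files are tab-delimited, one annotation per line, so they can be read without a dedicated library. The sketch below assumes the standard gene association column order (database, object ID, symbol, qualifier, GO ID, references, evidence code, with/from, aspect, object name), with comment lines starting with "!" and multiple references separated by "|". The record type and function name are made up for the example, and only the first ten columns are kept.

```python
from collections import namedtuple

# Hypothetical record type covering the first ten columns of an annotation line.
Annotation = namedtuple(
    "Annotation",
    "db object_id symbol qualifier go_id references evidence with_from aspect name",
)

# Two lines from the Tre1 example, written out with explicit tabs.
SAMPLE = (
    "FB\tFBgn0046687\tTre1\t\tGO:0050916\tFB:FBrf0128502|PMID:10884225"
    "\tIMP\t\tP\ttrapped in endoderm-1\n"
    "FB\tFBgn0046687\tTre1\t\tGO:0016021\tFB:FBrf0138267|PMID:11566105"
    "\tNAS\t\tC\ttrapped in endoderm-1\n"
)


def parse_annotations(text):
    """Parse tab-delimited annotation lines into Annotation records."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        cols += [""] * (10 - len(cols))  # pad short lines
        records.append(
            Annotation(
                cols[0], cols[1], cols[2], cols[3], cols[4],
                cols[5].split("|"),  # several references may be listed
                cols[6], cols[7], cols[8], cols[9],
            )
        )
    return records


records = parse_annotations(SAMPLE)
for rec in records:
    print(rec.symbol, rec.go_id, rec.evidence, rec.aspect, rec.references)
```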
46
How do you annotate using GO?
  • Manual curation
  • from literature
  • time consuming
  • accurate
  • Automated
  • sequence similarity, motif prediction (InterPro)
  • scales to whole genomes
  • inaccurate

47
Annotation using BLAST
  • Download GO annotations FASTA file
  • http://www.godatabase.org/dev/database
  • download file go_YYYYMM-seqdb.fasta
  • peptide sequences of gene products annotated with
    GO
  • GO annotation classes included in FASTA header
  • can be used to transfer annotations
  • or use GOst
  • http://www.godatabase.org/cgi-bin/gost/gost.cgi
  • web interface to GO-Blast

50
Annotation using InterProScan
  • InterPro is a database of protein families,
    domains and functional sites
  • InterProScan predicts these
  • incorporates Pfam, PRINTS, PANTHER, among others
  • http://www.ebi.ac.uk/InterProScan
  • Can be downloaded and run locally
  • InterPro IDs can be mapped to GO IDs
  • all mappings to GO
  • http://www.geneontology.org/GO.indices.shtml
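
Once a protein has InterPro hits, mapping them to GO IDs is a table lookup. The sketch below assumes the interpro2go mapping layout, in which each line reads "InterPro:ID description > GO:term name ; GO:ID" and comment lines start with "!". The sample lines are illustrative and not copied from the current mapping file, and both function names are hypothetical.

```python
import re
from collections import defaultdict

SAMPLE = """\
!illustrative lines in the interpro2go layout
InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677
InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:nucleus ; GO:0005634
"""

# Captures the InterPro accession and the GO ID at the end of the line.
LINE_RE = re.compile(r"^InterPro:(IPR\d+)\s.*>\s*GO:.*;\s*(GO:\d{7})\s*$")


def load_interpro2go(text):
    """Build a dict mapping each InterPro accession to a set of GO IDs."""
    mapping = defaultdict(set)
    for line in text.splitlines():
        if line.startswith("!"):
            continue  # comment line
        match = LINE_RE.match(line)
        if match:
            mapping[match.group(1)].add(match.group(2))
    return mapping


def go_terms_for_hits(interpro_ids, mapping):
    """Union of GO IDs for a protein's InterPro hits; unmapped hits are ignored."""
    go_ids = set()
    for ipr in interpro_ids:
        go_ids |= mapping.get(ipr, set())
    return go_ids


mapping = load_interpro2go(SAMPLE)
print(sorted(go_terms_for_hits(["IPR000003", "IPR999999"], mapping)))
```

Annotations produced this way are electronic inferences, so they would normally carry the IEA evidence code.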

53
Other approaches
  • Natural Language Processing and mining the
    literature
  • http://www.gmod.org/textpresso.shtml

54
Perils of automated annotation
  • Pay attention to the evidence code
  • Be aware of negative annotations
  • use the graph structure
  • small sequence changes can lead to big functional
    differences
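
The first three points can be combined in a single query: find the genes annotated to a term or to any of its descendants, skip negative (NOT) annotations, and optionally exclude weaker evidence codes. The sketch below uses the receptor activity hierarchy from the earlier slide. The gene names and annotations are invented for illustration, and both function names are hypothetical.

```python
# child -> parents over is_a, following the receptor activity example.
IS_A = {
    "G-protein coupled receptor activity": ["transmembrane receptor activity"],
    "transmembrane receptor activity": ["receptor activity"],
}

# (gene, qualifier, term, evidence code); all entries are made up.
ANNOTATIONS = [
    ("geneA", "", "G-protein coupled receptor activity", "ISS"),
    ("geneB", "NOT", "transmembrane receptor activity", "IDA"),
    ("geneC", "", "receptor activity", "IEA"),
]


def is_a_or_descendant(term, target, is_a):
    """True if `term` equals `target` or reaches it by following is_a upward."""
    stack = [term]
    seen = set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(is_a.get(current, []))
    return False


def genes_for(target, annotations, is_a, exclude_evidence=()):
    """Genes positively annotated to `target` or to any of its descendants."""
    return sorted(
        {
            gene
            for gene, qualifier, term, evidence in annotations
            if qualifier != "NOT"
            and evidence not in exclude_evidence
            and is_a_or_descendant(term, target, is_a)
        }
    )


print(genes_for("transmembrane receptor activity", ANNOTATIONS, IS_A))
print(genes_for("receptor activity", ANNOTATIONS, IS_A, exclude_evidence=("IEA",)))
```

A plain text match on "transmembrane receptor activity" would have returned geneB, whose annotation is negative, and missed geneA, which is annotated to a more specific term.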

55
GO SQL Database
  • Can be queried online via AmiGO
  • http://www.godatabase.org
  • Can be downloaded as MySQL export
  • supports powerful queries
  • Perl API

56
Current challenges
  • Integrating OBO ontologies
  • erythrocyte differentiation
  • regulation of smooth muscle contraction
  • Complex annotations
  • dynamically combining classes from different OBO
    ontologies in meaningful ways
  • transepithelial migration of germ cells using
    GPCR activity
  • Obol - a grammar for Bio-Ontologies
  • http://www.fruitfly.org/~cjm/obol

57
Resources
  • http://www.geneontology.org
  • Main GO site
  • http://www.godatabase.org
  • Query the GO Database (web interface), GOst
  • http://www.godatabase.org/dev
  • Download database, FASTA files, Perl code
  • http://obo.sourceforge.net
  • Open Bio Ontologies