Title: Aligning the Gene Ontologies
1Aligning the Gene Ontologies
- Antonio Sanfilippo
- Christian Posse
- Banu Gopalan
- Pacific Northwest National Laboratory
- Richland, WA
2Problem
- The Gene Ontologies address one knowledge domain at a time and are not cross-indexed
- Related concepts across distinct ontologies have to be explicitly used to retrieve all relevant information
3Solution: Develop an automatic alignment methodology for the Gene Ontologies
- Current approaches to automatic ontology alignment
  - Based on logical structure of ontologies
  - Based on the reference content of ontology nodes
- Our approach
  - Establish weighted cross-ontological links using reference content of ontology nodes
  - Calculate semantic similarity between nodes in the same ontology using reference content and ontology structure
  - Align ontology nodes by combining weighted cross-ontological links and semantic similarity values
4Establish Weighted Links across Ontologies
- Construct word-based vector signatures of GO terms
- Apply Singular Value Decomposition (SVD) so that
vector signatures are smaller and more
discriminative
- Derive weighted links by taking cosine measure of
SVD vector signatures for GO terms of distinct
ontologies
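The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the word lists in `term_words`, the choice of raw word counts as signature weights, and the rank `k=2` are all assumptions made for the example, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical toy data: words taken from text associated with each GO term.
term_words = {
    "GO:0007165": "signal transduction receptor cascade cell".split(),
    "GO:0007242": "intracellular signaling cascade kinase signal".split(),
    "GO:0004871": "signal transducer activity receptor".split(),
    "GO:0005057": "receptor signaling protein activity kinase".split(),
}
ontology = {
    "GO:0007165": "BP", "GO:0007242": "BP",
    "GO:0004871": "MF", "GO:0005057": "MF",
}

def build_signatures(term_words):
    """Return (terms, matrix) where each row is a word-count vector for a term."""
    vocab = sorted({w for words in term_words.values() for w in words})
    column = {w: i for i, w in enumerate(vocab)}
    terms = sorted(term_words)
    matrix = np.zeros((len(terms), len(vocab)))
    for row, term in enumerate(terms):
        for word in term_words[term]:
            matrix[row, column[word]] += 1.0
    return terms, matrix

def reduce_with_svd(matrix, k):
    """Project each row onto the top-k singular directions (smaller signatures)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def weighted_links(terms, reduced, ontology):
    """Cosine of reduced signatures, kept only for terms of distinct ontologies."""
    links = {}
    for i, a in enumerate(terms):
        for j, b in enumerate(terms):
            if i < j and ontology[a] != ontology[b]:
                links[(a, b)] = cosine(reduced[i], reduced[j])
    return links

terms, matrix = build_signatures(term_words)
reduced = reduce_with_svd(matrix, k=2)
links = weighted_links(terms, reduced, ontology)
```

Each entry of `links` is keyed by a cross-ontology pair of GO terms; a real run would build the signatures from the documents associated with each term and would likely weight words rather than count them.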
5Calculate Semantic Similarity between GO Terms in
the Same Ontology
- Adapt entropy-based approaches developed for WordNet (Lord et al. 2003, Couto et al. 2003)
- The semantic similarity between two nodes in a GO ontology is equal to the entropy value of their most immediately dominating parent (after Resnik 95)
sem_sim(GO:0007166, GO:0007242) = entropy(GO:0007165) = -log P(GO:0007165)
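A small sketch of this Resnik-style measure, under stated assumptions: the ontology fragment is simplified (GO:0007165 is attached directly to the biological_process root, which skips intermediate terms), the annotation counts are invented, and P(term) is estimated as the share of annotations falling on the term or any of its descendants.

```python
import math

# Hypothetical, simplified fragment of BP: child -> list of parents.
parents = {
    "GO:0008150": [],                 # biological_process (root)
    "GO:0007165": ["GO:0008150"],     # signal transduction
    "GO:0008152": ["GO:0008150"],     # metabolic process
    "GO:0007166": ["GO:0007165"],
    "GO:0007242": ["GO:0007165"],
}
# Hypothetical counts of annotations made directly to each term.
direct_counts = {
    "GO:0007165": 10, "GO:0007166": 30,
    "GO:0007242": 60, "GO:0008152": 100,
}

def ancestors(term):
    """All terms dominating `term`, including the term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def term_probabilities():
    """P(term) = annotations on the term or its descendants / all annotations."""
    total = sum(direct_counts.values())
    cumulative = {term: 0 for term in parents}
    for term, count in direct_counts.items():
        for ancestor in ancestors(term):
            cumulative[ancestor] += count
    return {term: cumulative[term] / total for term in parents}

def sem_sim(a, b, prob):
    """Entropy (-log P) of the most informative term dominating both a and b."""
    common = ancestors(a) & ancestors(b)
    return max(-math.log(prob[term]) for term in common)

prob = term_probabilities()
similarity = sem_sim("GO:0007166", "GO:0007242", prob)
```

With these toy counts the most informative shared ancestor is GO:0007165, so `similarity` equals `-log P(GO:0007165)`, mirroring the example on the slide.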
6GO Alignment
- Compute alignment values between GO nodes across
Gene Ontologies by combining weighted links and
semantic similarity values
BP - GO:0007165 (signal transduction)
  - GO:0007166 cell surface receptor linked signal transduction
  - GO:0007242 intracellular signaling cascade
MF - GO:0004871 (signal transducer activity)
  - GO:0005057 receptor signaling protein activity
  - GO:0004872 receptor activity
Link types shown: sem_sim (within an ontology), weighted_link (across ontologies)
alignment_strength(GO:0005057, GO:0007242) combines weighted_link(GO:0005057, GO:0007242) and sem_sim(GO:0007166, GO:0007242)
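The slide does not say how the two values are combined, so the sketch below leaves the operator as a parameter. The default product is only an assumed placeholder, and both scores are invented for illustration.

```python
from typing import Callable

# Hypothetical scores; neither value comes from the slides.
weighted_link = {("GO:0005057", "GO:0007242"): 0.5}
sem_sim = {("GO:0007166", "GO:0007242"): 0.5}

def alignment_strength(
    link_score: float,
    similarity: float,
    combine: Callable[[float, float], float] = lambda w, s: w * s,
) -> float:
    """Combine a cross-ontology link weight with a within-ontology similarity.

    The default product is an assumption; the source does not state the operator.
    """
    return combine(link_score, similarity)

strength = alignment_strength(
    weighted_link[("GO:0005057", "GO:0007242")],
    sem_sim[("GO:0007166", "GO:0007242")],
)
```

Passing a different `combine` function (for example `min`) swaps the combination rule without touching the rest of the pipeline.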
7A Sample Application: Discover Relation between Genes
- Prior systemic administration of Lipopolysaccharide (LPS) induces neuro-protection against subsequent stroke injury in mice (Stevens et al. 2004)
- LPS-induced neuro-protection involves changes in the expression of 12 genes
  - All have similar regulation patterns
  - Regulated only with LPS
  - Candidates for mediators of protective phenotype
- Q: How to help understand the molecular mechanisms of regulation for these genes?
- A: One approach is to investigate ways in which the 12 genes are related via Molecular Functions, Biological Processes and Cellular Components
8Our Goal
9Relating Genes Via MF, BP and CC
- Data gathering
  - Collect documents and GO codes associated with each of the 12 genes
- GO alignment
  - Using documents and GO to make alignment assessments
- Gene comparison by aligned GO codes
  - If gene A has GO1 from BP and gene B has GO2 from MF, then the comparison of genes A and B will have GO1-GO2
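The gene comparison step might look like the sketch below. The gene names, GO codes, alignment score and the `threshold` parameter are all hypothetical; the only part taken from the slide is that genes are related through pairs of GO codes drawn from different ontologies.

```python
from itertools import product

# Hypothetical annotations: gene -> list of (GO code, ontology).
gene_go = {
    "geneA": [("GO1", "BP")],
    "geneB": [("GO2", "MF")],
}
# Hypothetical alignment strengths for cross-ontology GO pairs.
alignment = {frozenset(("GO1", "GO2")): 0.7}

def compare_genes(a, b, gene_go, alignment, threshold=0.0):
    """Return aligned (GO of a, GO of b, score) triples, strongest first."""
    pairs = []
    for (go_a, ont_a), (go_b, ont_b) in product(gene_go[a], gene_go[b]):
        if ont_a == ont_b:
            continue  # only cross-ontology pairs are considered here
        score = alignment.get(frozenset((go_a, go_b)))
        if score is not None and score > threshold:
            pairs.append((go_a, go_b, score))
    return sorted(pairs, key=lambda p: -p[2])

relation = compare_genes("geneA", "geneB", gene_go, alignment)
```

Here `relation` holds the single pair GO1-GO2, matching the example in the bullet above; with real data each gene would carry many GO codes and the threshold would filter out weak alignments.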
11Further Work
- Refine alignment approach using extraction of links and relations from text to inform the creation of GO signatures
- Carry out full alignment of the Gene Ontologies
- Evaluation
  - Collect cross-ontology GO links from databases to establish ground truth set
  - Evaluate utility of correlations between genes and cross-ontological links to explain the mechanism of LPS preconditioning
- Unify the three Gene Ontologies without loss of specificity in the three distinct domains
- Provide a semantic characterization of cross-ontological links to relate concepts across the three Gene Ontologies in a more meaningful way
12Thank you!
13Some Alignment Results
14Data Set