Title: GS2PATH: Linking Gene Ontology and Pathways
1GS2PATH: Linking Gene Ontology and Pathways
6th InCoB 2007
- Jin Ok Yang
- Korean BioInformation Center
2KOBIC (Korean BioInformation Center)
- The national bioinformatics center of Korea
- Integration of diverse biological information
- Genome information
- Biodiversity information
- Bioresource information
- Bioinformatics training
- International exchange program
- Collaborative Development of bioinformatic tools
- Bioportal (Biowiki)
- Biopipeline (Bioworkflow engine)
3BioWiki
- Wiki
- a web technology that enables anyone to create and update website contents
- suited for developing online knowledge bases (e.g., Wikipedia)
- BioWiki
- To adopt the wiki paradigm in biology
- Collaborative development of biological knowledge bases
- BioWiki Contest (http://biowiki.net)
4BioPipe (http://www.biopipe.net)
- BioWorkFlow Engine
- No installation required
- Drag & Drop, and then Connect
- BioPipe Contest!!
- Aug 15th to Sep 20th
- Open & free, Web 2.0
(Screenshot of the BioPipe editor, with panels labeled Toolbar, Ontology View, Design View, and Monitoring View. Caption: Drag the module from the list and drop it into the design view.)
6Background
GO & Pathways
- Efforts on analyzing functional relationships among gene sets with GO terms and pathways
- Gene Ontology (GO) term based analysis -> Analysis focused on function
- GO term related pathways -> More useful information
How do you interpret the gene set?
7Gene set enrichment
- Enrichment Test
- A test to investigate which specific GO terms the given gene set is enriched for
- The P-value for each GO term is calculated using the hypergeometric probability
- Gene set enrichment
- Derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation
- Evaluates microarray data at the level of gene sets, which are defined based on prior biological knowledge
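The enrichment test above can be sketched as an upper-tail hypergeometric probability. This is a minimal illustration, not the GS2PATH implementation: the function name and the parameter names (N for the organism's annotated genes, K for genes carrying the GO term, n for the size of the query gene set, k for the overlap) are assumptions made here.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Return P(X >= k) for a hypergeometric draw.

    N: total number of annotated genes in the background (assumed name)
    K: number of background genes annotated to the GO term
    n: number of genes in the query gene set
    k: number of query genes annotated to the GO term
    """
    upper = min(K, n)  # the overlap cannot exceed either set size
    total = comb(N, n)  # number of ways to draw the query set
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Example: 10 background genes, 4 carry the term, 3 are queried, 2 overlap.
p = hypergeom_enrichment_p(10, 4, 3, 2)
```

A small P-value suggests that the GO term is over-represented in the gene set relative to the background.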
8Introduction: GO
- GO databases and tools
- GO terms have mostly been used to analyze data sets to identify significant biological changes
- Pathways can also be exploited to find functional relationships among genes
9Introduction: Pathways
10GS2PATH
- A system that finds gene set enrichment in each Gene Ontology (GO) term and maps the part of the gene set annotated to a GO term onto biological pathways (KEGG and BioCarta)
- An integrated search tool for analyzing the functional relationships in gene sets and for providing comprehensive results
11Features
- Functional relationships between GO terms and pathways
- Hypergeometric test for gene set enrichment
- Dual search for up- and down-regulated gene sets
- Various filtering options for GO terms
- the number of descendant nodes, the evidence of GO terms, and statistical values for the gene set mapped to each GO term
- User-specified coloring of genes on pathways
12Implementation (1/3)
- GS2PATH consists of
- one internal database (mapping database)
- four components
- Query Processor, GO Accessor, KEGG Accessor, and BioCarta Accessor
13Schema of internal mapping DB
14Architecture
15Implementation (2/3)
- Query Processor
- receives a user query
- converts the query into gene-related information
- distributes it to the other components and waits for their results
- GO Accessor
- retrieves statistical values for mapping the gene set in each GO term to KEGG and BioCarta pathways
- calculates the P-value using the cumulative hypergeometric distribution
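The Query Processor's receive, convert, distribute, and wait cycle might look roughly like the sketch below. Everything here is hypothetical: the accessor functions are placeholders that only report how many genes they received, and the use of a thread pool is an assumption, since the slides do not say how the components communicate.

```python
from concurrent.futures import ThreadPoolExecutor


def go_accessor(genes):
    # Placeholder: a real accessor would query the GO mapping data.
    return {"source": "GO", "n_genes": len(genes)}


def kegg_accessor(genes):
    # Placeholder for the KEGG lookup.
    return {"source": "KEGG", "n_genes": len(genes)}


def biocarta_accessor(genes):
    # Placeholder for the BioCarta lookup.
    return {"source": "BioCarta", "n_genes": len(genes)}


def process_query(raw_query):
    """Convert a whitespace-separated gene ID list and fan it out."""
    # "Gene-related information" is reduced here to a sorted, de-duplicated list.
    genes = sorted(set(raw_query.split()))
    accessors = [go_accessor, kegg_accessor, biocarta_accessor]
    with ThreadPoolExecutor(max_workers=len(accessors)) as pool:
        futures = [pool.submit(accessor, genes) for accessor in accessors]
        # Wait for every component and keep the results in accessor order.
        return [future.result() for future in futures]


results = process_query("TP53 BRCA1\nTP53")
```

Collecting results in submission order keeps the output deterministic even if the accessors finish at different times.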
16Implementation (3/3)
- BioCarta and KEGG Accessor
- retrieve results from BioCarta and KEGG databases, respectively
- To support user-specified coloring:
- For KEGG, we exploit the web service API (SOAP/WSDL) of KEGG
- For BioCarta, there is no API supporting user-defined coloring. Thus, after retrieving the image of a pathway from the BioCarta database, we color genes in the image on the fly.
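Before genes can be colored on a pathway, each gene needs a color. The sketch below shows one way to derive a gene-to-color mapping from an up-regulated and a down-regulated gene set. The function name, the default colors, and the rule for genes that appear in both sets are illustrative assumptions; the calls to KEGG or the BioCarta image rendering are not shown.

```python
def assign_colors(up_genes, down_genes,
                  up_color="red", down_color="blue", both_color="yellow"):
    """Map each gene ID to a display color (all defaults are assumptions)."""
    up, down = set(up_genes), set(down_genes)
    colors = {gene: up_color for gene in up - down}
    colors.update({gene: down_color for gene in down - up})
    # Genes listed in both sets get a separate color so the conflict is visible.
    colors.update({gene: both_color for gene in up & down})
    return colors


# Hypothetical gene IDs, for illustration only.
colors = assign_colors(["A", "B"], ["B", "C"])
```

The resulting dictionary could then be handed to whichever component draws the pathway.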
17GO Term based Pathways Analysis
18Search
- Gene set enrichment test in organism total profile: GO, KEGG and BioCarta
- Single- or two-part analysis (up- and down-regulation)
- Pathway viewer for KEGG and BioCarta
19Input
- Database
- GO category
- Biological Process
- Molecular Function
- Cellular Component
- Pathways: KEGG and BioCarta
- Organism
- Human, Mouse, Rat, and Yeast
- Gene ID list
20Test
- Enrichment test
- P-value: hypergeometric probability
- FDR (False Discovery Rate)
- Adjustment of p-value
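The slides do not say which FDR procedure is used to adjust the P-values. As one common choice, the sketch below applies the Benjamini-Hochberg step-up adjustment; treat both the method and the function name as assumptions rather than a description of GS2PATH.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each adjusted value is the smallest FDR level at which the corresponding term would still be reported.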
21Filtering
- GO Term
- Evidence
- Slim
- Number of genes in term
- P-value
- Pathways: KEGG and BioCarta
- Number of genes in term
- P-value
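The gene-count and P-value filters listed above can be illustrated with a short sketch. The record layout (dictionaries with "term", "n_genes", and "p_value" keys), the function name, and the 0.05 cutoff are assumptions; the minimum of 5 genes mirrors the example shown later in the slides.

```python
def filter_terms(results, min_genes=5, max_p=0.05):
    """Keep terms with enough mapped genes and a small enough P-value."""
    return [
        record for record in results
        if record["n_genes"] >= min_genes and record["p_value"] <= max_p
    ]


# Hypothetical result records, for illustration only.
example = [
    {"term": "term_A", "n_genes": 12, "p_value": 0.001},
    {"term": "term_B", "n_genes": 3, "p_value": 0.0005},
    {"term": "term_C", "n_genes": 8, "p_value": 0.2},
]
kept = filter_terms(example)
```

The same function would apply to GO terms and to pathways, since both are filtered by the number of genes and the P-value.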
22Example: microarray clustering data
Part A
Part B
23Interface
Select GO category or Pathways
Select Organism
Put the gene set
24Click
25Retaining only GO terms having at least 5 genes
27Select customized colors
30Genes colored in KEGG and BioCarta
31Conclusion
- Using GS2PATH, users can
- Get integrated Gene Ontology term and pathway information together
- Filter the results with various conditions
- Capture relationships between Gene Ontology terms and pathways
- Available at http://array.kobic.re.kr:8080/arrayport/gs2path/