Gene Ontology

Provided by: ovidiu

1
Gene Ontology
2
Gene Ontology
  • Gene Ontology (GO) is a controlled vocabulary
    that can be applied to all organisms even as
    knowledge of gene and protein roles in cells is
    accumulating and changing.
  • It is used to annotate genes.
  • It is computable biological knowledge!
  • Parent-child relationship and hierarchical
    organization.

3
A motivating example: utilizing GO in a microarray study
  • Use microarrays to find heart circadian genes and
    liver circadian genes
  • Map these two sets of genes onto the GO space
  • Color code the GO terms that are exclusively
    occupied by either heart circadian genes or liver
    circadian genes.

Storch et al., Nature 2002; 417:78-83
4
Finding Enriched GO terms
  • A comparison between two gene sets
  • Looking for the GO terms that are enriched in one
    of the gene sets and relatively depleted in the
    other
  • For every GO term, a p-value can be calculated
    from the following table
  • Using the hypergeometric test (the 2nd generation
    of GoSurfer)

The null hypothesis
The alternative
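
A per-term p-value of this kind can be sketched as follows. This is a minimal illustration, not GoSurfer's actual code: the function name `hypergeom_enrichment_p` and the example counts are hypothetical, and it assumes a one-sided test for enrichment in gene list 1, with the two gene lists pooled to form the background.

```python
from math import comb

def hypergeom_enrichment_p(k, n1, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N  : total number of genes in both lists combined (assumed background)
    K  : genes (from either list) annotated to the GO term
    n1 : size of gene list 1
    k  : genes in gene list 1 annotated to the GO term
    """
    upper = min(n1, K)
    denom = comb(N, n1)
    # Sum the upper tail; comb() returns 0 for impossible combinations.
    tail = sum(comb(K, x) * comb(N - K, n1 - x) for x in range(k, upper + 1))
    return tail / denom

# Hypothetical counts: two lists of 10 genes each; the GO term holds
# 5 genes, all of which come from gene list 1.
p = hypergeom_enrichment_p(k=5, n1=10, K=5, N=20)
print(round(p, 4))  # about 0.0163
```

A small p-value here indicates that the term holds more genes from list 1 than expected if genes were assigned to the two lists at random.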
5
(No Transcript)
6
Multiple hypothesis testing
  • Suppose a total of N GO terms are being tested.
  • The N GO terms are tested simultaneously
  • Picking GO terms by p-value alone becomes
    problematic
  • If the null hypothesis were true for all GO terms
    and all the tests were independent of each other,
    the N p-values would follow a uniform (0,1)
    distribution.
  • I.e., 5 percent of the GO terms are expected to
    have p-values < 0.05, even if none of the GO
    terms is in fact enriched.
  • In reality, the situation is more involved.
  • The GO terms are not independent of each other,
    owing to
    • the hierarchical structure of the GO, and
    • the usage of multiple GO terms in the annotation
      of one gene.
  • We do not even know the distribution of the
    p-values when the null is true for all the GO
    terms
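
The 5 percent figure can be checked with a small simulation. This sketch assumes the idealized case described above (all nulls true, all tests independent), so the p-values are drawn directly from Uniform(0,1); the function name `null_fraction_below` is hypothetical.

```python
import random

def null_fraction_below(n_terms, alpha=0.05, seed=0):
    """Fraction of n_terms independent null p-values that fall below alpha."""
    rng = random.Random(seed)
    # Under the null with independent tests, each p-value is Uniform(0,1).
    pvalues = [rng.random() for _ in range(n_terms)]
    return sum(p < alpha for p in pvalues) / n_terms

# With 10,000 GO terms and no true enrichment, roughly 500 terms
# would still pass a p < 0.05 cutoff.
print(null_fraction_below(10_000))
```

For real GO data the tests are dependent, so this simulation only illustrates the baseline problem, not the actual null distribution.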

7


8
Quantities in multiple testing
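  • False discovery rate (FDR): the expected
    proportion of false positives among the rejected
    hypotheses, i.e., FDR = E[V/R], where V is the
    number of false positives and R is the number of
    rejected hypotheses (V/R is taken as 0 when R = 0)
  • Q-value: the smallest achievable FDR for a given
    rejection region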
  • We propose to use FDR (or q-value) in finding
    enriched GO terms
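
To make the idea of an FDR-adjusted value concrete, here is a minimal sketch of the Benjamini-Hochberg step-up adjustment. Note that this is a swapped-in, simpler technique that assumes independent (or positively dependent) tests; it is not the permutation-based estimate proposed in these slides. The function name `bh_adjust` is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four GO terms.
print(bh_adjust([0.01, 0.04, 0.03, 0.5]))
```

Selecting the GO terms whose adjusted value is at most 0.05 controls the FDR at 5 percent under the stated assumptions.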

9
The procedure
  1. Calculate P_i for every GO term i.
  2. Permute the genes in the two gene lists.
  3. Calculate P_i^perm for the permuted data. Rank
     order the P_i^perm values.
  4. Repeat steps 2 and 3 many times.
  5. Estimate the FDR for the i-th GO term from the
     rank-ordered P_i^perm values.
  6. Use a threshold on FDR to identify enriched GO
     terms.
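
The procedure can be sketched as below. The exact FDR estimator is not given in the text above, so this sketch assumes a common choice: the average number of permuted p-values at or below P_i, divided by the number of observed p-values at or below P_i. All function names, the gene identifiers, and the toy annotations are hypothetical, and the two gene lists are assumed to be disjoint.

```python
import random
from bisect import bisect_right
from math import comb

def hypergeom_p(k, n1, K, N):
    """One-sided hypergeometric p-value P(X >= k)."""
    tail = sum(comb(K, x) * comb(N - K, n1 - x)
               for x in range(k, min(n1, K) + 1))
    return tail / comb(N, n1)

def term_pvalues(list1, list2, annotations):
    """p-value per GO term for enrichment in list1 relative to list2."""
    s1, s2 = set(list1), set(list2)
    total = len(s1) + len(s2)
    pvals = {}
    for term, genes in annotations.items():
        a, b = len(genes & s1), len(genes & s2)
        pvals[term] = hypergeom_p(a, len(s1), a + b, total)
    return pvals

def permutation_fdr(list1, list2, annotations, n_perm=200, seed=0):
    rng = random.Random(seed)
    observed = term_pvalues(list1, list2, annotations)      # step 1
    pooled = list(list1) + list(list2)
    n1 = len(list1)
    null_counts = {term: 0 for term in observed}
    for _ in range(n_perm):                                  # step 4
        rng.shuffle(pooled)                                  # step 2
        perm = term_pvalues(pooled[:n1], pooled[n1:], annotations)
        ranked = sorted(perm.values())                       # step 3
        for term, p in observed.items():
            # Count permuted p-values that are <= the observed one.
            null_counts[term] += bisect_right(ranked, p)
    fdr = {}
    for term, p in observed.items():                         # step 5 (assumed estimator)
        expected_false = null_counts[term] / n_perm
        n_rejected = sum(1 for q in observed.values() if q <= p)
        fdr[term] = min(1.0, expected_false / n_rejected)
    return observed, fdr

# Toy data: 20 genes per list, three made-up GO terms.
list1 = [f"g{i}" for i in range(20)]
list2 = [f"g{i}" for i in range(20, 40)]
annotations = {
    "termA": {f"g{i}" for i in range(8)},            # only list1 genes
    "termB": {"g8", "g9", "g20", "g21"},             # evenly mixed
    "termC": {"g10", "g22", "g23", "g24"},           # mostly list2
}
pvals, fdr = permutation_fdr(list1, list2, annotations)
enriched = [t for t in fdr if fdr[t] <= 0.05]                # step 6
print(pvals, fdr, enriched)
```

Because the permutations keep each gene's full set of annotations intact, the dependence among GO terms carries over into the permuted p-values, which is the reason for permuting genes rather than assuming a uniform null.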

10
GoSurfer software
  • Windows-based graphical data mining tool
  • Map genes (Affymetrix probe set IDs, etc.) onto
    the gene ontology space
  • Perform statistical tests to search for
    interesting GO terms
  • Highlight these GO terms in the GO tree
  • All the GO terms can be interactively interrogated

11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
Using GoSurfer as downstream software for
microarray analysis
  • The dChip-GoSurfer software pipeline
  • dChip is a microarray analysis software package
  • The interesting gene lists generated by dChip
    can be used as input to GoSurfer
  • GoSurfer can be called from within dChip

16
Reference
  • Gene Ontology: tool for the unification of
    biology. The Gene Ontology Consortium. Nature
    Genetics 2000; 25(1):25-9
  • Comparative Analysis of Gene Sets in the Gene
    Ontology Space under the Multiple Hypothesis
    Testing Framework. Zhong S, Tian L, Li C,
    Storch KF, and Wong WH. Proc IEEE Computational
    Systems Bioinformatics (CSB) 2004; 425-435