Gene Ontology as a tool for the systematic analysis of large-scale gene-expression data

1
Gene Ontology as a tool for the systematic
analysis of large-scale gene-expression data
  • Stefan Bentink
  • Joint group meeting Klipp/Spang
  • 11-20-2002

2
Overview
  • Microarrays and the Gene Ontology (GO) database
  • Scoring differential gene-expression in GO groups
  • Checking scores against different null
    hypotheses
  • Sample data (two types of Breast Cancer) and
    results

3
Overview
  • Microarrays and the Gene Ontology (GO) database
  • Scoring differential gene-expression in GO groups
  • Checking scores against different null
    hypotheses
  • Sample data (two types of Breast Cancer) and
    results

4
Microarrays: sample scheme
5
Microarrays: comparative analysis
sample    tissue I (1, 2, ...)    tissue II (1, 2, ...)
gene 1    mean                    mean                     => t-value
gene 2    mean                    mean                     => t-value
gene 3    mean                    mean                     => t-value
...
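As an illustration of the table above, the following minimal Python sketch computes the two group means and a two-sample t-value per gene. The slides do not state which variant of the t-test was used; the pooled-variance (Student) form, the function name `t_value`, and the toy expression values are assumptions made for this example.

```python
import math
from statistics import mean, variance

def t_value(group_a, group_b):
    """Two-sample t-statistic with pooled variance (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Hypothetical toy data: gene -> (samples of tissue I, samples of tissue II)
expression = {
    "gene 1": ([5.1, 4.9, 5.3, 5.0], [6.2, 6.0, 6.4, 6.1]),
    "gene 2": ([3.0, 3.2, 2.9, 3.1], [3.1, 2.9, 3.0, 3.2]),
}

# One row per gene: mean in tissue I, mean in tissue II, t-value
t_values = {gene: t_value(a, b) for gene, (a, b) in expression.items()}
for gene, (a, b) in expression.items():
    print(gene, round(mean(a), 3), round(mean(b), 3), round(t_values[gene], 3))
```

With this sign convention, a negative t-value means lower expression in tissue I than in tissue II.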
6
How to interpret the data?
  • Long list of significant genes
  • Which genes are of interest?
  • Solution: pooling of genes into functional
    classes
  • provides a general overview
  • Gene Ontology database provides such a functional
    classification

7
The Gene Ontology database
8
The Gene Ontology database
  • GO is a database of terms for genes
  • Known genes are annotated to the terms
  • Terms are connected as a directed acyclic graph
  • Levels represent specificity of the terms

9
The Gene Ontology database
Apoptotic protease activator
10
The Gene Ontology database
  • Every child-term is a member of its parent-term
  • GO contains three different sub-ontologies
  • Molecular function
  • Biological process
  • Cellular component
  • Unique identifier for every term
  • GO:0003673 (root: Gene Ontology)
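The rule that every child-term is a member of its parent-term implies that a gene annotated to a specific term also belongs to all ancestor terms. The sketch below shows one possible way to propagate annotations up a directed acyclic graph. The toy graph, the gene names, and the helper `ancestors` are hypothetical and are not taken from the GO database.

```python
# Hypothetical toy DAG: term -> list of parent terms (a term may have several parents).
parents = {
    "root": [],
    "biological process": ["root"],
    "cell cycle": ["biological process"],
    "cell death": ["biological process"],
    "M phase": ["cell cycle"],
    "toy term with two parents": ["cell cycle", "cell death"],
}

# Direct annotations: gene -> most specific terms it is annotated to (toy data).
direct = {
    "gene 1": ["M phase"],
    "gene 2": ["toy term with two parents"],
    "gene 3": ["cell death"],
}

def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

# GO-group of a term = all genes annotated to the term or to any of its descendants.
go_groups = {term: set() for term in parents}
for gene, terms in direct.items():
    for term in terms:
        for node in ancestors(term):
            go_groups[node].add(gene)

for term, genes in go_groups.items():
    print(term, sorted(genes))
```

Groups near the root collect many genes, which is one reason to restrict scoring to groups of moderate size.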

11
Gene Ontology and microarrays
  • Hypothesis: functionally related, differentially
    expressed genes should accumulate in the
    corresponding GO-group.
  • Problem: find a method that scores the accumulation
    of differential gene expression in a node of the
    Gene Ontology.

12
Gene Ontology and microarrays
P-value for every gene by a two-sample t-test
13
Overview
  • Microarrays and the Gene Ontology (GO) database
  • Scoring differential gene-expression in GO groups
  • Checking scores against different null
    hypotheses
  • Sample data (two types of Breast Cancer) and
    results

14
Scoring methods
  • Number of significant genes in a GO-group
  • Sum of negative logarithms of all p-values
  • $\sup_n |P(n) - F(n)|$ according to Kolmogorov-Smirnov

15
The p-value
$t < 0 \Rightarrow p = \mathrm{cdf}(t)$; $t > 0 \Rightarrow p = 1 - \mathrm{cdf}(t)$
$\Rightarrow p \in (0, 0.5]$, $m \in (0, 1]$, $m = 2p$
  • cdf: cumulative distribution function
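The mapping from a t-value to the one-sided p-value and to m = 2p can be sketched in plain Python as follows. The cdf of Student's t-distribution is obtained here by numerically integrating its density with Simpson's rule; this is only one possible way to evaluate it, and the function names, the step count, and the degrees of freedom in the usage lines are assumptions of this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """cdf(t) = 0.5 + integral of the density from 0 to t (Simpson's rule, even step count)."""
    if t == 0:
        return 0.5
    h = t / steps  # h is negative for t < 0, so the integral carries the right sign
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def one_sided_p(t, df):
    """p = cdf(t) for t < 0 and p = 1 - cdf(t) for t > 0; by symmetry both equal cdf(-|t|)."""
    return t_cdf(-abs(t), df)

def m_value(t, df):
    """m = 2p rescales p from (0, 0.5] to (0, 1]."""
    return 2.0 * one_sided_p(t, df)

# Usage with assumed degrees of freedom (two groups of 25 and 24 samples, pooled variance).
df = 25 + 24 - 2
print(one_sided_p(-3.45, df), m_value(-3.45, df))
```

Because the integration subtracts from 0.5, very small p-values lose relative precision; a library implementation of the t-distribution would be preferable for production use.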

16
Sum of log-score
  • Pavlidis, Lewis, Noble 2001; Zien, Küffner,
    Zimmer, Lengauer 2000
  • $2p \to 1 \Rightarrow -\log(2p) \to 0$
  • Small p-values, high score

17
Kolmogorov-Smirnov score
$S = \sup_n |P(n) - F(n)|$
P(n): p-values for genes that fall into a GO-group.
F(n): uniformly distributed values between 0 and 1.
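One way to turn this definition into code is the classical one-sample Kolmogorov-Smirnov statistic against the uniform distribution on (0, 1), sketched below. It may differ in detail from the variant used for the slides. The sketch assumes that the input values are the rescaled p-values m = 2p; the function name and toy values are hypothetical.

```python
def ks_score(values):
    """One-sample Kolmogorov-Smirnov statistic against Uniform(0, 1)."""
    xs = sorted(values)
    n = len(xs)
    # Largest gap where the empirical cdf lies above / below the uniform cdf.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)

# Hypothetical groups of rescaled p-values
print(ks_score([0.25, 0.75]))        # evenly spread values
print(ks_score([0.01, 0.02, 0.03]))  # values piled up near 0
```

Unlike the sum-of-log score, this statistic reacts to any shift of the whole distribution, including many moderately small p-values.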
18
Overview
  • Microarrays and the Gene Ontology (GO) database
  • Scoring differential gene-expression in GO groups
  • Checking scores against different null
    hypotheses
  • Sample data (two types of Breast Cancer) and
    results

19
Null hypotheses
  • The significant genes (according to Bonferroni,
    $\alpha = 0.05/n$) are distributed over the GO-groups by
    chance
  • The existing differential gene expression is
    distributed over the GO-groups by chance
  • There is no differential gene expression in a
    GO-group
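The first null hypothesis depends on a per-gene significance call. The sketch below shows the Bonferroni threshold 0.05/n and the resulting count of significant genes in one GO-group. The value n = 7000 is the array size given later in the slides; whether the correction used all genes on the array or only the annotated ones is not stated, so treat it as an assumption. The function name and the toy p-values are hypothetical.

```python
def count_score(p_values, threshold):
    """Number of genes in a GO-group whose p-value is below the threshold."""
    return sum(1 for p in p_values if p < threshold)

n_genes = 7000                 # assumed number of tests
bonferroni = 0.05 / n_genes    # about 7.1e-6

# Hypothetical p-values of the genes in one GO-group
group_p_values = [1e-7, 0.2, 3e-6, 0.04, 0.5]
n_significant = count_score(group_p_values, bonferroni)
print(bonferroni, n_significant)
```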

20
Checking H0 by permutation
Permutation of rows: the mapping of p-values into
GO-groups is randomized. H0: distribution of
differential gene expression.
Permutation of columns: the level of p-values is
randomized. H0: no differential gene expression
in a GO-group.
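The row permutation can be sketched as follows. For a single GO-group, shuffling the gene labels is equivalent to drawing a random gene set of the same size, which is what this example does. The sum-of-log score is used as the scoring function; all names, the fixed seed, and the toy p-values are assumptions of this sketch.

```python
import math
import random

def sum_log_score(ps):
    """Sum of -log(2p) over a list of one-sided p-values."""
    return sum(-math.log(2.0 * p) for p in ps)

def row_permutation_p(p_values, group_genes, score, n_perm=1000, seed=0):
    """Fraction of random gene sets of equal size whose score reaches the observed score."""
    rng = random.Random(seed)
    genes = list(p_values)
    observed = score([p_values[g] for g in group_genes])
    exceed = 0
    for _ in range(n_perm):
        random_group = rng.sample(genes, len(group_genes))
        if score([p_values[g] for g in random_group]) >= observed:
            exceed += 1
    return exceed / n_perm

# Hypothetical data: 200 genes with p-values spread evenly over (0, 0.5]
p_values = {f"gene {i}": 0.5 * (i + 1) / 200 for i in range(200)}
top_group = [f"gene {i}" for i in range(10)]            # ten smallest p-values
bottom_group = [f"gene {i}" for i in range(190, 200)]   # ten largest p-values

print(row_permutation_p(p_values, top_group, sum_log_score))
print(row_permutation_p(p_values, bottom_group, sum_log_score))
```

The per-gene p-values stay fixed under this scheme; only their assignment to groups changes, so it tests how differential expression is distributed over the groups.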
21
Checking H0 by permutation
  • 1000 random permutations => background
    distributions
  • H0: distribution of significant genes
  • Randomizing GO-groups (rows)
  • H0: distribution of all p-values
  • Randomizing GO-groups (rows)
  • H0: level of p-values
  • Permutation of columns
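The column permutation can be read as shuffling the sample labels and recomputing the per-gene statistics, which destroys any real difference between the two tissue types. The sketch below follows that reading. To stay self-contained it uses a pooled-variance t-value and a normal approximation for the p-value, both of which are simplifying assumptions and not necessarily what was used for the slides; the toy expression matrix and all names are hypothetical.

```python
import math
import random
from statistics import mean, variance

def t_value(a, b):
    """Two-sample t-statistic with pooled variance (assumed variant)."""
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / len(a) + 1 / len(b)))

def group_score(rows, labels):
    """Sum of -log(2p) over the genes (rows) of one GO-group for a given labelling."""
    score = 0.0
    for row in rows:
        a = [x for x, lab in zip(row, labels) if lab == "I"]
        b = [x for x, lab in zip(row, labels) if lab == "II"]
        # One-sided p-value via the normal approximation (assumption).
        p = 0.5 * math.erfc(abs(t_value(a, b)) / math.sqrt(2))
        score += -math.log(2.0 * p)
    return score

def column_permutation_p(rows, labels, n_perm=1000, seed=0):
    """Fraction of label shufflings whose group score reaches the observed score."""
    rng = random.Random(seed)
    observed = group_score(rows, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if group_score(rows, shuffled) >= observed:
            exceed += 1
    return exceed / n_perm

# Hypothetical GO-group with two clearly differential genes, 6 + 6 samples
labels = ["I"] * 6 + ["II"] * 6
rows = [
    [1.0, 1.1, 0.9, 1.2, 0.8, 1.05, 2.0, 2.1, 1.9, 2.2, 1.8, 2.05],
    [3.0, 3.3, 2.9, 3.1, 2.8, 3.2, 4.1, 4.0, 3.9, 4.3, 3.8, 4.2],
]
result = column_permutation_p(rows, labels)
print(result)
```

Shuffling whole columns keeps the correlation between genes intact, since all genes of a sample move together.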

22
Methods (summary)
Data P-values
23
Overview
  • Microarrays and the Gene Ontology (GO) database
  • Scoring differential gene-expression in GO groups
  • Checking scores against different null
    hypotheses
  • Sample data (two types of Breast Cancer) and
    results

24
Results: Data (breast cancer)
  • Two major subclasses
  • Estrogen receptor positive (ER+)
  • Estrogen receptor negative (ER-)
  • Estrogen receptor positive
  • Susceptible to Tamoxifen
  • Slightly better survival rate
  • Great molecular differences between the two types

25
Results: Data (breast cancer)
  • Data: 25 ER+, 24 ER-
  • Array: Affymetrix HuGeneFL
  • 7000 genes
  • 4000 annotated to GO-terms
  • Data were normalized by variance stabilization
    (Heydebreck et al. 2001)

26
Results: Pre-conditions
  • GO-group considered to be significant if less
    than 5% of the random permutations exceed the
    score
  • Only GO-groups with more than 5 and less than
    1000 genes were taken into account
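The two pre-conditions can be written down as a pair of small checks. The sketch assumes that the significance threshold is meant as 5% of the permutations (the percent sign appears to be missing in the transcript) and that the size limits are exclusive, as the wording "more than" and "less than" suggests. Function names and example scores are hypothetical.

```python
def eligible(group_size, min_size=5, max_size=1000):
    """Only GO-groups with more than 5 and fewer than 1000 genes are scored."""
    return min_size < group_size < max_size

def is_significant(observed_score, permutation_scores, max_fraction=0.05):
    """Significant if fewer than 5% of the permutation scores exceed the observed score."""
    exceeding = sum(1 for s in permutation_scores if s > observed_score)
    return exceeding / len(permutation_scores) < max_fraction

# Hypothetical background of 1000 permutation scores, 40 of them above the observed score
background = [1.0] * 960 + [11.0] * 40
print(eligible(37), is_significant(10.0, background))
```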

27
Results: Number of significant genes
According to the pre-conditions, 16 GO-groups were
found
28
Results: Permutation of rows (distribution
hypothesis)
[Figure panels: Sum of log P; Kolmogorov-Smirnov]
29
Results: Permutation of columns (differential
gene-expression hypothesis)
[Figure panels: Sum of log P; Kolmogorov-Smirnov]
30
Results
  • The column-permutation leads to a very low
    background distribution
  • Many significant GO-groups
  • May help to find functional groups without
    differential gene-expression
  • Different scoring methods seem to be
    complementary as indicated by the results of the
    row-permutation

31
Results: Permutation of the rows
Sum of log: 44 GO-groups were found (5% cond.,
...)
KS-score: 77 GO-groups were found (5% cond., ...)
GO:0000087 M-Phase of mitotic cell-cycle (37
genes)
32
Results: Comparing the scoring-methods (from the
row-permutation)
A: counting of significant genes in GO-groups
B: Kolmogorov-Smirnov
C: sum of logarithms
Set            GO-groups
A              16
B              77
C              43
A and B        3
A and C        13
C and B        13
A, B and C     3
C without A    30
B without A    74
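The "without" counts follow from the set sizes and the pairwise overlaps by simple subtraction. The short check below only reuses the numbers quoted on this slide; the variable names are hypothetical.

```python
# Counts quoted on the slide
sizes = {"A": 16, "B": 77, "C": 43}
overlap = {"A and B": 3, "A and C": 13}

# Groups found by C (or B) but not by A = size of the set minus its overlap with A
c_without_a = sizes["C"] - overlap["A and C"]
b_without_a = sizes["B"] - overlap["A and B"]
print(c_without_a, b_without_a)
```

Both results agree with the values listed for "C without A" and "B without A".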
33
Browsing the results
34
Results: Interesting GO-term (M-Phase)
  • Contains a couple of interesting proliferative
    genes (p-value $5 \cdot 10^{-4}$ => not significant)
  • E.g. polo-like kinase
  • t-value: -3.45, p-value: $5.59 \cdot 10^{-4}$
  • would not have been found by a single-gene approach
  • correlation with the estrogen receptor (ER) could be found in
    literature (Wolf et al., 2000)
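Why a p-value of this size fails a single-gene test can be seen by comparing it with a Bonferroni threshold of 0.05/n. The slides do not say whether n was the roughly 7000 genes on the array or the 4000 genes annotated to GO-terms, so the sketch below checks both as an assumption; the variable names are hypothetical.

```python
p_polo_like_kinase = 5.59e-4   # p-value quoted on the slide
alpha = 0.05

# n = genes on the array / genes annotated to GO-terms (which one was used is an assumption)
passes = {}
for n_genes in (7000, 4000):
    threshold = alpha / n_genes
    passes[n_genes] = p_polo_like_kinase < threshold
    print(n_genes, threshold, passes[n_genes])
```

In both cases the threshold is on the order of 1e-5, well below the quoted p-value.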

35
Summary / outlook
  • GO provides a general view on large-scale
    gene-expression data
  • Less deregulated but very interesting genes could
    be found
  • Third null hypothesis => differential gene
    expression over a wide range of genes (outlook:
    which GO-groups contain no differential
    gene-expression)
  • No bias of scores by top-level genes (outlook:
    leaving out top-level genes for scoring)
  • Possible modification of scoring-methods: up- and
    downregulation