Title: Some discussion on multiple hypothesis test and false discovery adjustment


1
Some discussion on multiple hypothesis test and
false discovery adjustment
2
P-values under the NULL distribution
  • 6000 genes
  • 100 samples, 50 per group
  • All values drawn from N(0,1)
  • P-values follow a uniform distribution on [0,1]
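
The setup on this slide can be reproduced with a small simulation. The sketch below is only an illustration: it assumes a two-sample z-test with known unit variance (the slide does not say which test was used), and the function name and seed are made up for the example.

```python
import math
import random

def null_p_values(n_genes=6000, n_per_group=50, seed=0):
    """Two-sided p-values for pure-noise genes (no real group difference)."""
    rng = random.Random(seed)
    # Standard error of the difference of two means when the variance is known to be 1.
    se = math.sqrt(2.0 / n_per_group)
    p_values = []
    for _ in range(n_genes):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        z = (sum(group_a) / n_per_group - sum(group_b) / n_per_group) / se
        # Two-sided normal tail probability: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
        p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return p_values

p_values = null_p_values()
# Under the null, each tenth of [0, 1] should hold roughly 10% of the p-values.
bins = [0] * 10
for p in p_values:
    bins[min(int(p * 10), 9)] += 1
print([round(count / len(p_values), 3) for count in bins])
```

Each printed fraction should sit close to 0.1, which is what a flat P-value histogram looks like.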

3
P-value and hypothesis test
  • The P-value of a statistic is the probability of
    obtaining a value at least as extreme under the
    NULL hypothesis
  • A P-value cutoff sets an arbitrary threshold on a
    single hypothesis test
  • With multiple hypothesis tests, the same cutoff
    risks producing too many false discoveries
  • e.g. with 6000 genes and a P-value cutoff of 0.05,
    up to 6000 × 0.05 = 300 false discoveries are
    expected
  • The statistics need some adjustment to guarantee
    that the False Discovery Rate in the final
    rejected list is lower than a pre-defined level f.
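
The arithmetic behind the 300 false discoveries can be checked directly. This is a minimal sketch; the function names are illustrative, and the second function additionally assumes the tests are independent, which the slides do not claim.

```python
def expected_false_discoveries(n_null_tests, cutoff):
    """Expected number of true-null tests whose p-value falls below the cutoff."""
    return n_null_tests * cutoff

def prob_at_least_one(n_null_tests, cutoff):
    """Chance of one or more false discoveries, assuming independent tests."""
    return 1.0 - (1.0 - cutoff) ** n_null_tests

print(expected_false_discoveries(6000, 0.05))
print(prob_at_least_one(10, 0.05))
```

Even with only 10 independent null tests, the chance of at least one false discovery at a 0.05 cutoff is already around 40%.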

4
FDR adjusted P-value
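  • Sort the raw P-values in ascending order
  • Reject the test with the i-th smallest P-value if
    P_i <= f × i / m, where m is the number of tests and
    f is a pre-defined FDR level (the Benjamini-Hochberg
    step-up rule; every test ranked below the largest
    such i is rejected as well).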
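
The rejection rule can be sketched in a few lines. The code below assumes the slide refers to the Benjamini-Hochberg step-up procedure, i.e. it finds the largest rank i with P_i <= (i / m) × f and rejects everything up to that rank; the function name and the example P-values are made up for illustration.

```python
def benjamini_hochberg(p_values, f=0.05):
    """Return the indices (into p_values) of the tests rejected at FDR level f."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda idx: p_values[idx])
    # Step-up rule: find the largest rank i (1-based) with P_(i) <= (i / m) * f.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * f:
            largest_rank = rank
    # Reject every test ranked at or below that rank.
    return sorted(order[:largest_rank])

raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(raw, f=0.05))
```

Note that a test can be rejected even if its own P-value misses its own threshold, as long as a larger P-value further down the sorted list meets its threshold.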

5
Difficulties
  • It may be too stringent, especially when the
    signal is very weak and the number of genes is
    very large
  • Pre-screening may help to clean the data set and
    give better results after FDR adjustment.
  • What does FDR < 0.05 mean if only 10 genes are
    rejected?

6
Paired t-test of pre- vs. post-exposure
  • Raw P-values of the paired t-test
  • Scatter plot of raw P-value against mean difference
  • There is some information, but it is too weak and
    buried under noise.
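
For reference, the statistic behind a paired t-test can be written out as follows. This is a sketch on made-up numbers, not the presenter's data; it computes only the t-statistic, and the P-value would come from a t distribution with n - 1 degrees of freedom (a library routine such as `scipy.stats.ttest_rel` returns both).

```python
import math

def paired_t_statistic(pre, post):
    """t-statistic of the per-sample differences post - pre (n - 1 degrees of freedom)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the differences.
    var = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    return mean_diff / math.sqrt(var / n)

# Hypothetical expression values of one gene in four subjects.
pre = [1.0, 2.0, 3.0, 4.0]
post = [2.0, 4.0, 5.0, 4.0]
print(round(paired_t_statistic(pre, post), 3))
```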

7
Linear regression of gene expression versus
exposure
  • Raw P-values of the linear regression
  • Scatter plot of R-squared against raw P-value
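
For a simple linear regression with one predictor, R-squared and the P-value of the slope are tied together: with n samples, the slope's t-statistic satisfies t² = R² (n - 2) / (1 - R²). The sketch below illustrates this with hypothetical helper names and assumes R-squared is strictly below 1.

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation, equal to R-squared for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def slope_t_from_r2(r2, n):
    """Absolute t-statistic of the slope (n - 2 degrees of freedom) implied by R-squared."""
    return math.sqrt(r2 * (n - 2) / (1.0 - r2))

# Hypothetical exposure levels and expression values for one gene.
exposure = [0.0, 1.0, 2.0, 3.0]
expression = [0.0, 1.0, 0.0, 1.0]
r2 = r_squared(exposure, expression)
print(r2, slope_t_from_r2(r2, len(exposure)))
```

Because the sample size is the same for every gene, a larger R-squared always means a larger |t| and therefore a smaller raw P-value, so the scatter plot should trace a single monotone curve.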

8
Alternative idea: estimate the overall
distribution by permutation
9
Randomly flip the pre/post pair of all genes for some
samples, and take the maximum t-statistic over the
whole permuted data set
10
Distribution of the maximum t-statistic in 1000
permuted data sets
  • Find the genes whose true t-statistic is above
    the 95th percentile of this distribution
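
The permutation scheme can be sketched as below. Several details are assumptions made for the example rather than taken from the slides: the data are stored as post minus pre differences, each sample is flipped with probability 1/2, the absolute value of t is used (a two-sided version), and the toy data and function names are invented.

```python
import math
import random
import statistics

def t_stat(diffs):
    """One-sample t-statistic of the paired differences of one gene."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

def max_t_threshold(diff_matrix, n_perm=1000, seed=0):
    """95th percentile of max |t| over genes across sign-flip permutations.

    diff_matrix[g][s] is the post - pre difference of gene g in sample s.
    """
    rng = random.Random(seed)
    n_samples = len(diff_matrix[0])
    null_max = []
    for _ in range(n_perm):
        # Swapping pre and post for a sample negates its difference in every gene.
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n_samples)]
        null_max.append(max(
            abs(t_stat([sign * d for sign, d in zip(signs, row)]))
            for row in diff_matrix
        ))
    # quantiles(..., n=20) returns 19 cut points; the last is the 95th percentile.
    return statistics.quantiles(null_max, n=20)[-1]

def select_genes(diff_matrix, n_perm=1000, seed=0):
    """Indices of genes whose observed |t| exceeds the permutation threshold."""
    threshold = max_t_threshold(diff_matrix, n_perm, seed)
    return [g for g, row in enumerate(diff_matrix) if abs(t_stat(row)) > threshold]

# Toy data: 50 genes x 10 samples of noise; gene 0 gets a strong artificial shift.
rng = random.Random(1)
toy = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(50)]
toy[0] = [d + 5.0 for d in toy[0]]
print(select_genes(toy, n_perm=200))
```

Flipping whole samples, rather than individual genes, keeps the correlation between genes intact in every permuted data set, which is the reason for comparing against the maximum statistic.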

11
Consider sets of genes, not individual ones
  • Genes are correlated
  • We have prior knowledge to group genes together
  • GO terms
  • Pathways
  • Functional groups
  • Clustering
  • Design statistics on sets of genes, not individual
    genes, to test differential expression between
    conditions
  • Combine the weak information of the genes in a set,
    and reduce the number of hypothesis tests
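
One simple way to combine weak per-gene evidence is to pool per-gene z-scores within a set. The sketch below is a hypothetical illustration, not the statistic used in the presentation: it assumes the genes in a set are independent under the null, which the first bullet says is not true in practice, so a permutation-based null would be a safer reference distribution. Gene names, scores and the set definition are made up.

```python
import math

def gene_set_z(gene_z, gene_set):
    """Combine the per-gene z-scores of one set into a single set-level z-score."""
    members = [gene_z[g] for g in gene_set if g in gene_z]
    # A sum of k independent N(0,1) scores has variance k, so divide by sqrt(k).
    return sum(members) / math.sqrt(len(members))

def two_sided_p(z):
    """Two-sided normal tail probability for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

gene_z = {"g1": 1.0, "g2": 1.0, "g3": 1.0, "g4": 1.0, "g5": -0.3}
pathway = ["g1", "g2", "g3", "g4"]  # hypothetical gene set
z_set = gene_set_z(gene_z, pathway)
print(z_set, two_sided_p(z_set))
```

Four genes that are individually unremarkable (z = 1 each) give a noticeably stronger set-level score, and testing one set instead of four genes also cuts the number of hypotheses.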

12
(No Transcript)
13
(No Transcript)
14
Mootha's data