Some discussion on multiple hypothesis test and false discovery adjustment presentation

1
Some discussion on multiple hypothesis test and
false discovery adjustment
2
P-values under the NULL distribution

6000 genes
100 samples, 50 per group
All values drawn from N(0,1)
P-values are uniformly distributed on [0, 1]
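
The setup on this slide can be reproduced with a small simulation. The sketch below is an illustration, not the presenter's code. It assumes a two-sample z-test with known unit variance (instead of a t-test) so that it runs on the Python standard library alone; the function name and the seed are made up for the example.

```python
import random
from statistics import NormalDist, fmean

def simulate_null_pvalues(n_genes=6000, n_per_group=50, seed=0):
    """Two-sided p-values for genes with no true group difference."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of a difference of two means, each from N(0,1) data.
    se = (2.0 / n_per_group) ** 0.5
    pvalues = []
    for _ in range(n_genes):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        z = (fmean(group_a) - fmean(group_b)) / se
        pvalues.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return pvalues

pvalues = simulate_null_pvalues()
# Under the null, about 5% of p-values fall below 0.05, 10% below 0.10, etc.
frac_below_005 = sum(p < 0.05 for p in pvalues) / len(pvalues)
print(f"fraction of null p-values below 0.05: {frac_below_005:.3f}")
```

A histogram of `pvalues` should look roughly flat, which is what a uniform distribution on [0, 1] means in practice.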

3
P-value and hypothesis testing

P-value of a statistic: the probability of getting a
statistic at least as extreme under the NULL hypothesis
P-value cutoff: sets an arbitrary threshold on a single
hypothesis test
When there are multiple hypothesis tests, there is a
risk of getting too many false discoveries
e.g. with 6000 genes and a p-value cutoff of 0.05, we
expect up to
6000 × 0.05 = 300 false discoveries
Some adjustment to these statistics is needed to
guarantee that the False Discovery Rate in the
final rejected list is lower than a
pre-defined level f.
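
The arithmetic on this slide can be checked directly. The sketch below assumes the worst case in which every gene is null. The standard deviation it prints additionally assumes the tests are independent, which the slides do not state.

```python
import math

n_tests = 6000   # genes tested, from the slide
alpha = 0.05     # per-test p-value cutoff, from the slide

# If every null hypothesis is true, each test is a false positive with
# probability alpha, so the count is Binomial(n_tests, alpha).
expected_false = n_tests * alpha
# The spread below additionally assumes the tests are independent.
sd_false = math.sqrt(n_tests * alpha * (1 - alpha))

print(f"expected false discoveries: {expected_false:.0f}")
print(f"standard deviation: {sd_false:.1f}")
```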

4
FDR adjusted P-value

Sort the raw P-values
Reject the test with the i-th smallest P-value if
P_i <= (i/m) f, where m is the number of tests and f is a pre-defined FDR level.
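
A minimal sketch of the rule, assuming this slide refers to the Benjamini-Hochberg step-up procedure, in which the i-th smallest p-value is compared with (i/m) f for m tests. The function name and the example p-values are made up for illustration.

```python
def bh_reject(pvalues, f=0.05):
    """Return the sorted indices of tests rejected by the BH step-up rule."""
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    # Find the largest rank k whose p-value is at or below (k / m) * f.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * f:
            k = rank
    # Step-up: reject every test ranked at or below k, even if an
    # intermediate one missed its own threshold.
    return sorted(order[:k])

example = [0.004, 0.025, 0.028, 0.039, 0.5]
print(bh_reject(example, f=0.05))
```

In the example, the second p-value (0.025) misses its own threshold of 0.02 but is still rejected, because the fourth p-value (0.039) falls below its threshold of 0.04.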

5
Difficulties

It may be too stringent, especially when the
signal is very weak and the number of genes is
very large
Pre-screening may help to purify the data set and
give better results after FDR adjustment.
What does FDR < 0.05 mean if
only 10 genes are rejected?

6
Paired t-test of pre- versus post-exposure

Raw P-values of the paired t-test
Scatter plot of raw P-value versus mean difference
There is some information, but it is too weak and buried
under noise.
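
For reference, the paired t-statistic behind this slide can be sketched as follows. The expression values are hypothetical, and the function name is an assumption made for the example.

```python
import math
from statistics import fmean, stdev

def paired_t_statistic(pre, post):
    """t-statistic for the mean of the per-subject differences post - pre."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have one value per subject")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # Mean difference divided by its standard error; n - 1 degrees of freedom.
    return fmean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical expression values for one gene in four subjects.
pre = [5.1, 4.8, 5.6, 5.0]
post = [5.4, 5.0, 5.5, 5.6]
print(f"paired t = {paired_t_statistic(pre, post):.2f}")
```

Turning the statistic into a p-value requires the t distribution with n - 1 degrees of freedom, which the standard library does not provide; `scipy.stats.ttest_rel` is one option if SciPy is available.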

7
Linear regression of gene expression versus
exposure

Raw P-values of the linear regression
Scatter plot of R-squared versus raw P-value
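
A minimal sketch of the per-gene regression, assuming a simple least-squares fit of expression on a single exposure variable. The data values and the function name are hypothetical.

```python
from statistics import fmean

def simple_regression(x, y):
    """Least-squares slope, intercept and R-squared of y regressed on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # For a single predictor, R-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical exposure levels and expression values for one gene.
exposure = [0.0, 1.0, 2.0, 3.0, 4.0]
expression = [1.1, 1.9, 3.2, 3.8, 5.1]
slope, intercept, r2 = simple_regression(exposure, expression)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```

With one predictor and a fixed number of samples, the slope's p-value is a monotone decreasing function of R-squared, so a scatter plot of the two traces a single curve.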

8
Alternative idea: estimate the overall
distribution by permutation
9
Randomly flip the pre-post pair for all genes in some
samples, and take the maximum t-statistic over the
whole permuted data set
10
Distribution of the maximum t-statistic in 1000
permuted data sets

Find the genes whose true t-statistic is above
the 95th percentile of this distribution
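
The permutation scheme on the last three slides can be sketched as below. This is an illustration under stated assumptions, not the presenter's code: the data are simulated, the input is taken to be the per-sample differences (post minus pre) for each gene, and the absolute value of t is used so that changes in both directions are caught.

```python
import math
import random
from statistics import fmean, quantiles, stdev

def t_stat(diffs):
    """One-sample t-statistic of paired differences."""
    return fmean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def max_t_permutation(diff_by_gene, n_perm=1000, seed=0):
    """diff_by_gene[g][s] is post - pre for gene g in sample s.

    Returns (threshold, selected), where threshold is the 95th percentile
    of the permuted maximum |t| and selected lists the genes above it.
    """
    rng = random.Random(seed)
    n_samples = len(diff_by_gene[0])
    max_stats = []
    for _ in range(n_perm):
        # Swapping pre and post in a sample negates its difference; the
        # same sign is used for every gene so gene-gene correlation is kept.
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n_samples)]
        max_stats.append(max(
            abs(t_stat([s * d for s, d in zip(signs, diffs)]))
            for diffs in diff_by_gene
        ))
    # quantiles(..., n=20) returns 19 cut points; the last is the 95% point.
    threshold = quantiles(max_stats, n=20)[-1]
    observed = [abs(t_stat(diffs)) for diffs in diff_by_gene]
    selected = [g for g, t in enumerate(observed) if t > threshold]
    return threshold, selected

# Hypothetical data: 20 genes, 10 samples, only gene 0 truly shifted.
rng = random.Random(1)
data = [[rng.gauss(3.0 if g == 0 else 0.0, 0.5) for _ in range(10)]
        for g in range(20)]
threshold, selected = max_t_permutation(data)
print(f"95th percentile of max |t|: {threshold:.2f}; selected genes: {selected}")
```

Because each permuted data set contributes only its maximum statistic, a gene that exceeds the threshold does so against the best score any gene could reach by chance, which controls false positives across all genes at once.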

11
Consider sets of genes, not individual ones

Genes are correlated
We have prior knowledge to group genes together:
GO terms
Pathways
Functional groups
Clustering
Design statistics on sets of genes, not individual
genes, to test for differential expression between
conditions
Combine the weak information from the genes in each set,
and reduce the number of hypothesis tests
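
One simple way to combine weak per-gene information is to average the per-gene statistics within each set. The slides do not say which set-level statistic was used, so the sketch below is only one possible choice; the gene names, set names and values are hypothetical.

```python
from statistics import fmean

def gene_set_scores(gene_sets, gene_stats):
    """Average the per-gene statistics within each gene set."""
    scores = {}
    for set_name, genes in gene_sets.items():
        # Skip genes that have no statistic (for example, not measured).
        values = [gene_stats[g] for g in genes if g in gene_stats]
        if values:
            scores[set_name] = fmean(values)
    return scores

# Hypothetical per-gene t-statistics and set memberships.
gene_stats = {"g1": 1.2, "g2": 0.9, "g3": 1.5, "g4": -0.3, "g5": 0.1}
gene_sets = {"set_A": ["g1", "g2", "g3"], "set_B": ["g4", "g5", "g6"]}
print(gene_set_scores(gene_sets, gene_stats))
```

Each set then needs only one hypothesis test instead of one per gene, and a set whose genes all shift slightly in the same direction can stand out even when no single gene does.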

12
(No Transcript)
13
(No Transcript)
14
Mootha's data
