Identifying Differentially Expressed Genes presentation

About This Presentation

Transcript and Presenter's Notes

Title: Identifying Differentially Expressed Genes

1
Identifying Differentially Expressed Genes

Suppose we have T genes which we measured under
two experimental conditions (Ctl and Nic) in n
replicated experiments
ti and pi are the t-statistic and the
corresponding p-value for the ith gene, i1,...,T
P-value is the probability of observing as
extreme or more extreme value of the t-statistic
under the null-distribution (i.e. the
distributions assuming that ?iCtl ?iNic ) than
the one calculated from the data (t)
The ith gene is "differentially expressed" if we
can reject the ith null hypothesis ?iCtl ?iNic
and conclude that ?iCtl ? ?iNic at a significance
level ? (i.e. if pilt?)
Type I error is committed when a null-hypothesis
is falsely rejected
Type II error is committed when a null-hypothesis
is not rejected but it is false
Experiment-wise Type I Error is committed if any
of a set of (T) null hypothesis is falsely
rejected
If the significance level is chosen prior to
conducting experiment, we know that by following
the hypothesis testing procedure, we will have
the probability of falsely concluding that any
one gene is differentially expressed (i.e.
falsely reject the null hypothesis) is equal to ?

2
Experiment-wise error rate
Assuming that individual tests of hypothesis are
independent and true p(Not Committing The
Experiment-Wise Error) p(Not Rejecting H01 AND
Not Rejecting H02 AND ... AND Not Rejecting H0T)
(1- ? )(1- ? )...(1- ? ) (1- ?
)T p(Committing The Experiment-Wise Error) 1-(1-
? )T If we want to keep the FWER at ?
level Sidaks adjustment ?a 1-(1- ? )1/T
Bonferroni adjustment ?a ?/T
3
Adjusting p-value

Individual HypothesesH0i ?iCtl
?iNic ? pip(tdf gt ti) , i1,...,T
"Composite" Hypothesis
H0 ?iCtl ?iNic, i1,...,T ? pminpi,
i1,...,T
The composite null hypothesis is rejected if even
a single individual hypothesis is rejected
Consequently the p-value for the composite
hypothesis is equal to the minimum of individual
p-values
If all tests have the same reference
distribution, this is equivalent topp(tdf gt
tmax)

4
Null distribution of the composite p-value
p(minpi, i1,...,T lt ?) assuming
independence1-1-?T Instead of adjusting the
significance level, can adjust all p-values pia
1-1-piT pib piT
5
This is not good enough

Traditional statistical approaches to multiple
comparison adjustments which strictly control the
experiment-wise error rates are not optimal
probability of rejecting a single true null
hypothesis ?
Need a balance between the false positive and
false negative rates
Instead of controlling the probability of
generating a single false positive, we control
the proportion of false positives

6
False Discovery Rate

Using ? to test each hypothesis Expected number
of false positives is (m0)? - large if most null
hypothesis true and the probability of committing
a FWE is approximately 1-(1-?)(m0)
When using ?adjusted, the probability of
committing FWE is ?
False Discovery Rate (FDR) is equal to expected
V/R
We don't know m0 so multiple comparisons
adjustments are usually made assuming m0m
Turns out we can control FDR

7
False Discovery Rate
Following decision making procedure will keep FDR
below q
Alternatively, adjust p-values as
8
Randomization Issue
The ProblemIdentify genes whose expression in a
target organ (Lung) of a model organism (Rat) is
affected by an environmental toxicant
(W) PopulationAll model organisms of this type
(Rats) Sample12 randomly selected rats from the
population of all rats. (Randomly means that all
rats in the population have the equal chance of
being selected) RandomizationRandomly select 6
rats to be treated by the toxicant. Randomly is
the key word here that allows us to ascribe
observed changes to the treatment alone. Prepare
samples and extract RNA from all 12 rats Randomly
assign labeled RNA to different
microarrays Process microarrays in a random order
9
Randomization Issue
The ProblemIdentify genes whose expression in a
target organ (Lung) of a model organism (Mouse)
is affected by an environmental toxicant
(Nickel) PopulationAll model organisms of this
type (Mice) Sample6 randomly selected rats from
the population of all rats. (Randomly means that
all rats in the population have the equal chance
of being selected) RandomizationRandomly select
3 rats to be treated by the toxicant. Randomly is
the key word here that allows us to ascribe
observed changes to the treatment alone. Prepare
samples and extract RNA from all 6 mice Randomly
assign labeled RNA to different
microarrays Process microarrays in a random order
10
limma

... is a package for the analysis of microarray
data, especially the use of linear models for
analyzing designed experiments and the assessment
of differential expression.
Specially constructed data objects to represent
various aspects of microarray data
Specially constructed "object methods" for
importing, normalizing, displaying and analyzing
microarray data
All objects and methods are transparent
All objects can be accessed and modified outside
of limma
Unique in the implementation of the empirical
Bayes procedure for identifying differentially
expressed genes by "borrowing" information from
different genes (everything so far has been gene
by gene)

11
Measurement Error Model With Additive
BackgroundWNickel CCtl
Old Model
Foreground (F)
Background (B)
New Model

There are other models for accounting for the
background signal
Simple subtraction of the background intensities
often introduces additional variability in the
observed signal
The problem is in the fact that we use a
single-observation estimate for ?B
With this in mind, various strategies have been
proposed to pool background information from more
than one spot to estimate ?B

12
Single Channel Microarrays Each Sample Assigned
to a Different Microarray

6 microarrays, 6 samples (C1,...,C3,W1,...,W3)
Randomly assign samples to different microarrays
In terms of a single gene, 6 different "spots"

Proceed with a two-sample t-test as we did so far

13
Two-Channel Microarrays One C and One W Sample
Assigned to Each Microarray

3 microarrays, 6 samples (C1,...,C3,W1,...,W3)
Randomly select pairs and assign then to
different microarrays
In terms of a single gene, 3 different "spots"

Individual samples are no longer "free" to be
assigned to any microarray restriction on the
randomization process
Measurements are "blocked" within a microarray
(terminology)
We could still randomly assign samples and not
have treatment and the control on each
microarray, but this would be unreasonable
(arguments to come)
Need to use a paired t-test

14
Paired t-test

For a specific gene ri xiw -xic ith
difference, i1,,3

Statistical Model of observed data

Differential expression ? ? ? 0

t
-t
15
Two-sample t-test vs paired t-test

Reference Distribution t2n-2 tn-1

16
limma
Data to import http//eh3.uc.edu/teaching/cfg/200
6/data/07-21-03_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5
-B.gpr http//eh3.uc.edu/teaching/cfg/2006/data/07
-21-03_MO-S-06-23-03N73-CSV3-3-vs-72SV3-5-A.gpr ht
tp//eh3.uc.edu/teaching/cfg/2006/data/07-18-03_MO
-S-06-23-03N98-CSV1-3-vs-72SV1-5-B.gpr File
descriptionshttp//eh3.uc.edu/teaching/cfg/2006/
data/NTargets.txt Spot descriptionshttp//eh3.uc
.edu/teaching/cfg/2006/data/SpotTypes.txt Importin
g datasource("http//eh3.uc.edu/teaching/cfg/200
6/R/LimmaDataImport.R")
17
limma
gt library(limma) gt data.directorylt-"http//eh3.uc.
edu/teaching/cfg/2006/data/" gt targetslt-readTarget
s("http//eh3.uc.edu/teaching/cfg/2006/data/NTarge
ts.txt") gt targets array
experiment cy3 1 17.0
07-21-03_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5-B.gpr
Nic-WT72hr_2 2 73.0 07-21-03_MO-S-06-23-03N73-CSV
3-3-vs-72SV3-5-A.gpr Ctl-WT00hr_3 3 98.1
07-18-03_MO-S-06-23-03N98-CSV1-3-vs-72SV1-5-B.gpr
Ctl-WT00hr_1 cy5 date 1
Ctl-WT00hr_2 7/21/2003 2 Nic-WT72hr_3 7/21/2003 3
Nic-WT72hr_1 7/18/2003
18
limma
gt spottypeslt-readSpotTypes("http//eh3.uc.edu/teac
hing/cfg/2006/data/SpotTypes.txt") gt spottypes
SpotType ID Name Color 1 cDNA
black 2 Blank Blank
blue 3 Control control red 4
Empty Empty blue 5 empty empty
blue gt
19
RGList class
gt LimmaDataNickellt-read.maimages(filestargetsexp
eriment,source"genepix", path data.directory,
columnslist(Gf "F532 Median",Gb "B532
Median", Rf "F635 Median", Rb "B635
Median"), annotationc("Name","ID","Block","Row"
,"Column"),wt.funwtflags(0)) Read
http//eh3.uc.edu/teaching/cfg/2006/data//07-21-03
_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5-B.gpr Read
http//eh3.uc.edu/teaching/cfg/2006/data//07-21-03
_MO-S-06-23-03N73-CSV3-3-vs-72SV3-5-A.gpr Read
http//eh3.uc.edu/teaching/cfg/2006/data//07-18-03
_MO-S-06-23-03N98-CSV1-3-vs-72SV1-5-B.gpr gt
attributes(LimmaDataNickel) names 1 "R"
"G" "Rb" "Gb" "weights" "targets"
"genes" class 1 "RGList" attr(,"package") 1
"limma"
20
RGList class
gt LimmaDataNickelR13, 07-21-03_MO-S-06-23
-03N17-72SV2-3-vs-CSV2-5-B 07-21-03_MO-S-06-23-03N
73-CSV3-3-vs-72SV3-5-A 1,
2264
3642 2,
303
734 3,
140
164 07-18-03_MO-S-06-23-03N98-CSV1-
3-vs-72SV1-5-B 1,
726 2,
248 3,
120 gt LimmaDataNickelRb13,
07-21-03_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5-B
07-21-03_MO-S-06-23-03N73-CSV3-3-vs-72SV3-5-A 1,
126
154 2,
126
150 3,
128
155
07-18-03_MO-S-06-23-03N98-CSV1-3-vs-72SV1-5-B 1,

129 2,
127 3,
127 gt

Write a Comment

User Comments (0)

About PowerShow.com

Identifying Differentially Expressed Genes PowerPoint PPT Presentation