Identifying Differentially Expressed Genes - PowerPoint PPT Presentation

About This Presentation
Title:

Identifying Differentially Expressed Genes

Description:

P-value is the probability of observing as extreme or more ... Randomly is the key word here that allows us to ascribe observed changes to the treatment alone. ... – PowerPoint PPT presentation

Number of Views:45
Avg rating:3.0/5.0
Slides: 21
Provided by: mariomed
Learn more at: http://eh3.uc.edu
Category:

less

Transcript and Presenter's Notes

Title: Identifying Differentially Expressed Genes


1
Identifying Differentially Expressed Genes
  • Suppose we have T genes which we measured under
    two experimental conditions (Ctl and Nic) in n
    replicated experiments
  • ti and pi are the t-statistic and the
    corresponding p-value for the ith gene, i1,...,T
  • P-value is the probability of observing as
    extreme or more extreme value of the t-statistic
    under the null-distribution (i.e. the
    distributions assuming that ?iCtl ?iNic ) than
    the one calculated from the data (t)
  • The ith gene is "differentially expressed" if we
    can reject the ith null hypothesis ?iCtl ?iNic
    and conclude that ?iCtl ? ?iNic at a significance
    level ? (i.e. if pilt?)
  • Type I error is committed when a null-hypothesis
    is falsely rejected
  • Type II error is committed when a null-hypothesis
    is not rejected but it is false
  • Experiment-wise Type I Error is committed if any
    of a set of (T) null hypothesis is falsely
    rejected
  • If the significance level is chosen prior to
    conducting experiment, we know that by following
    the hypothesis testing procedure, we will have
    the probability of falsely concluding that any
    one gene is differentially expressed (i.e.
    falsely reject the null hypothesis) is equal to ?

2
Experiment-wise error rate
Assuming that individual tests of hypothesis are
independent and true p(Not Committing The
Experiment-Wise Error) p(Not Rejecting H01 AND
Not Rejecting H02 AND ... AND Not Rejecting H0T)
(1- ? )(1- ? )...(1- ? ) (1- ?
)T p(Committing The Experiment-Wise Error) 1-(1-
? )T If we want to keep the FWER at ?
level Sidaks adjustment ?a 1-(1- ? )1/T
Bonferroni adjustment ?a ?/T
3
Adjusting p-value
  • Individual HypothesesH0i ?iCtl
    ?iNic ? pip(tdf gt ti) , i1,...,T
  • "Composite" Hypothesis
  • H0 ?iCtl ?iNic, i1,...,T ? pminpi,
    i1,...,T
  • The composite null hypothesis is rejected if even
    a single individual hypothesis is rejected
  • Consequently the p-value for the composite
    hypothesis is equal to the minimum of individual
    p-values
  • If all tests have the same reference
    distribution, this is equivalent topp(tdf gt
    tmax)

4
Null distribution of the composite p-value
p(minpi, i1,...,T lt ?) assuming
independence1-1-?T Instead of adjusting the
significance level, can adjust all p-values pia
1-1-piT pib piT
5
This is not good enough
  • Traditional statistical approaches to multiple
    comparison adjustments which strictly control the
    experiment-wise error rates are not optimal
    probability of rejecting a single true null
    hypothesis ?
  • Need a balance between the false positive and
    false negative rates
  • Instead of controlling the probability of
    generating a single false positive, we control
    the proportion of false positives

6
False Discovery Rate
  • Using ? to test each hypothesis Expected number
    of false positives is (m0)? - large if most null
    hypothesis true and the probability of committing
    a FWE is approximately 1-(1-?)(m0)
  • When using ?adjusted, the probability of
    committing FWE is ?
  • False Discovery Rate (FDR) is equal to expected
    V/R
  • We don't know m0 so multiple comparisons
    adjustments are usually made assuming m0m
  • Turns out we can control FDR

7
False Discovery Rate
Following decision making procedure will keep FDR
below q
Alternatively, adjust p-values as
8
Randomization Issue
The ProblemIdentify genes whose expression in a
target organ (Lung) of a model organism (Rat) is
affected by an environmental toxicant
(W) PopulationAll model organisms of this type
(Rats) Sample12 randomly selected rats from the
population of all rats. (Randomly means that all
rats in the population have the equal chance of
being selected) RandomizationRandomly select 6
rats to be treated by the toxicant. Randomly is
the key word here that allows us to ascribe
observed changes to the treatment alone. Prepare
samples and extract RNA from all 12 rats Randomly
assign labeled RNA to different
microarrays Process microarrays in a random order
9
Randomization Issue
The ProblemIdentify genes whose expression in a
target organ (Lung) of a model organism (Mouse)
is affected by an environmental toxicant
(Nickel) PopulationAll model organisms of this
type (Mice) Sample6 randomly selected rats from
the population of all rats. (Randomly means that
all rats in the population have the equal chance
of being selected) RandomizationRandomly select
3 rats to be treated by the toxicant. Randomly is
the key word here that allows us to ascribe
observed changes to the treatment alone. Prepare
samples and extract RNA from all 6 mice Randomly
assign labeled RNA to different
microarrays Process microarrays in a random order
10
limma
  • ... is a package for the analysis of microarray
    data, especially the use of linear models for
    analyzing designed experiments and the assessment
    of differential expression.
  • Specially constructed data objects to represent
    various aspects of microarray data
  • Specially constructed "object methods" for
    importing, normalizing, displaying and analyzing
    microarray data
  • All objects and methods are transparent
  • All objects can be accessed and modified outside
    of limma
  • Unique in the implementation of the empirical
    Bayes procedure for identifying differentially
    expressed genes by "borrowing" information from
    different genes (everything so far has been gene
    by gene)

11
Measurement Error Model With Additive
BackgroundWNickel CCtl
Old Model
Foreground (F)
Background (B)
New Model
  • There are other models for accounting for the
    background signal
  • Simple subtraction of the background intensities
    often introduces additional variability in the
    observed signal
  • The problem is in the fact that we use a
    single-observation estimate for ?B
  • With this in mind, various strategies have been
    proposed to pool background information from more
    than one spot to estimate ?B

12
Single Channel Microarrays Each Sample Assigned
to a Different Microarray
  • 6 microarrays, 6 samples (C1,...,C3,W1,...,W3)
  • Randomly assign samples to different microarrays
  • In terms of a single gene, 6 different "spots"
  • Proceed with a two-sample t-test as we did so far

13
Two-Channel Microarrays One C and One W Sample
Assigned to Each Microarray
  • 3 microarrays, 6 samples (C1,...,C3,W1,...,W3)
  • Randomly select pairs and assign then to
    different microarrays
  • In terms of a single gene, 3 different "spots"
  • Individual samples are no longer "free" to be
    assigned to any microarray restriction on the
    randomization process
  • Measurements are "blocked" within a microarray
    (terminology)
  • We could still randomly assign samples and not
    have treatment and the control on each
    microarray, but this would be unreasonable
    (arguments to come)
  • Need to use a paired t-test

14
Paired t-test
  • For a specific gene ri xiw -xic ith
    difference, i1,,3
  • Statistical Model of observed data
  • Differential expression ? ? ? 0

t
-t
15
Two-sample t-test vs paired t-test
  • Reference Distribution t2n-2 tn-1

16
limma
Data to import http//eh3.uc.edu/teaching/cfg/200
6/data/07-21-03_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5
-B.gpr http//eh3.uc.edu/teaching/cfg/2006/data/07
-21-03_MO-S-06-23-03N73-CSV3-3-vs-72SV3-5-A.gpr ht
tp//eh3.uc.edu/teaching/cfg/2006/data/07-18-03_MO
-S-06-23-03N98-CSV1-3-vs-72SV1-5-B.gpr File
descriptionshttp//eh3.uc.edu/teaching/cfg/2006/
data/NTargets.txt Spot descriptionshttp//eh3.uc
.edu/teaching/cfg/2006/data/SpotTypes.txt Importin
g datasource("http//eh3.uc.edu/teaching/cfg/200
6/R/LimmaDataImport.R")
17
limma
gt library(limma) gt data.directorylt-"http//eh3.uc.
edu/teaching/cfg/2006/data/" gt targetslt-readTarget
s("http//eh3.uc.edu/teaching/cfg/2006/data/NTarge
ts.txt") gt targets array
experiment cy3 1 17.0
07-21-03_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5-B.gpr
Nic-WT72hr_2 2 73.0 07-21-03_MO-S-06-23-03N73-CSV
3-3-vs-72SV3-5-A.gpr Ctl-WT00hr_3 3 98.1
07-18-03_MO-S-06-23-03N98-CSV1-3-vs-72SV1-5-B.gpr
Ctl-WT00hr_1 cy5 date 1
Ctl-WT00hr_2 7/21/2003 2 Nic-WT72hr_3 7/21/2003 3
Nic-WT72hr_1 7/18/2003
18
limma
gt spottypeslt-readSpotTypes("http//eh3.uc.edu/teac
hing/cfg/2006/data/SpotTypes.txt") gt spottypes
SpotType ID Name Color 1 cDNA
black 2 Blank Blank
blue 3 Control control red 4
Empty Empty blue 5 empty empty
blue gt
19
RGList class
gt LimmaDataNickellt-read.maimages(filestargetsexp
eriment,source"genepix", path data.directory,
columnslist(Gf "F532 Median",Gb "B532
Median", Rf "F635 Median", Rb "B635
Median"), annotationc("Name","ID","Block","Row"
,"Column"),wt.funwtflags(0)) Read
http//eh3.uc.edu/teaching/cfg/2006/data//07-21-03
_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5-B.gpr Read
http//eh3.uc.edu/teaching/cfg/2006/data//07-21-03
_MO-S-06-23-03N73-CSV3-3-vs-72SV3-5-A.gpr Read
http//eh3.uc.edu/teaching/cfg/2006/data//07-18-03
_MO-S-06-23-03N98-CSV1-3-vs-72SV1-5-B.gpr gt
attributes(LimmaDataNickel) names 1 "R"
"G" "Rb" "Gb" "weights" "targets"
"genes" class 1 "RGList" attr(,"package") 1
"limma"
20
RGList class
gt LimmaDataNickelR13, 07-21-03_MO-S-06-23
-03N17-72SV2-3-vs-CSV2-5-B 07-21-03_MO-S-06-23-03N
73-CSV3-3-vs-72SV3-5-A 1,
2264
3642 2,
303
734 3,
140
164 07-18-03_MO-S-06-23-03N98-CSV1-
3-vs-72SV1-5-B 1,
726 2,
248 3,
120 gt LimmaDataNickelRb13,
07-21-03_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5-B
07-21-03_MO-S-06-23-03N73-CSV3-3-vs-72SV3-5-A 1,
126
154 2,
126
150 3,
128
155
07-18-03_MO-S-06-23-03N98-CSV1-3-vs-72SV1-5-B 1,

129 2,
127 3,
127 gt
Write a Comment
User Comments (0)
About PowerShow.com