Title: Estimating parameters in a statistical model

1 Estimating parameters in a statistical model
source("http://eh3.uc.edu/teaching/cfg/2006/R/NickelLinearModels.R")
- Likelihood and Maximum likelihood estimation
- General Linear Model - the general framework that will allow us to perform complicated analyses relatively easily
2 Random Sample
- Random sample: a set of observations generated independently by the statistical model.
- For example, n replicated measurements of the differences in expression levels for a gene under two different treatments: x1, x2, ..., xn iid N(μ, σ²)
- Given the parameters, the statistical model defines the probability of observing any particular combination of the values in this sample.
- Since the observations are independent, the probability distribution function describing the probability of observing a particular combination is the product of the individual probability distribution functions.
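A quick base-R illustration of that product rule (the numbers below are made up for illustration, not the actual Nickel measurements):

```r
# Joint pdf of an iid N(mu, sigma^2) sample = product of the
# individual normal densities, by independence.
x <- c(-0.12, 1.22, -0.48)   # hypothetical replicate measurements
mu <- 0.2; sigma <- 0.8

joint <- prod(dnorm(x, mean = mu, sd = sigma))

# The same product written out from the normal pdf formula
joint.manual <- prod(1/(sqrt(2*pi)*sigma) * exp(-(x - mu)^2 / (2*sigma^2)))

all.equal(joint, joint.manual)   # TRUE
```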
3 Probability distribution function vs probability
- In the case of discrete random variables, which have a countable number of potential values (assume finitely many for now), the probability distribution function is equal to the probability of each value (outcome).
- In the case of a continuous random variable, which can yield any number in a particular interval, the probability distribution function is different from the probability.
- The probability of any particular number for a continuous random variable is equal to zero.
- The probability density function defines the probability of the number falling into any particular sub-interval as the area under the curve defined by the probability density function.
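In R terms, this is a sketch with the standard normal (nothing specific to the course data):

```r
# The density is not a probability: probabilities are areas under it.
mu <- 0; sigma <- 1

# P(0 < x < 1) as the area under the density curve...
area <- integrate(dnorm, lower = 0, upper = 1, mean = mu, sd = sigma)$value
# ...matches the cdf difference
p <- pnorm(1, mu, sigma) - pnorm(0, mu, sigma)
all.equal(area, p)   # TRUE

# A density value itself can exceed 1, so it cannot be a probability
dnorm(0, mean = 0, sd = 0.1)   # about 3.99
```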
4 Probability distribution function vs probability
- Example: the assumption of our Normal model is that the outcome can be pretty much any real number. This is obviously a wrong assumption, but it turns out that this model is a good approximation of reality.
- We could "discretize" this random variable. Define the r.v. y = 1 if x > c and y = 0 otherwise, for some constant c.
- This random variable can assume 2 different values, and its probability distribution function is defined by p(y = 1).
- Although the probability distribution function in the case of a continuous random variable does not give probabilities, it satisfies the key properties of a probability.
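The discretization above can be simulated directly (c = 0.5 and the standard normal are arbitrary choices):

```r
# y = 1 if x > c and 0 otherwise: a 2-valued r.v. whose pdf is
# fully determined by p(y = 1) = 1 - pnorm(c, mu, sigma).
set.seed(1)
mu <- 0; sigma <- 1
cc <- 0.5   # the constant c

x <- rnorm(1e5, mu, sigma)
y <- as.numeric(x > cc)

c(empirical = mean(y), theoretical = 1 - pnorm(cc, mu, sigma))
```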
5 Back to basics: Probability, Conditional Probability and Independence
- For x1, ..., xn iid with pdf f(x)
- For y1, ..., yn iid with pdf p(y)
- From now on, we will talk in terms of just a pdf, and things will hold for both discrete and continuous random variables.
6 Expectation, Expected value and Variance
- Discrete pdf p(y)
- Expectation of any function g of the random variable y (the average value of the function after a large number of experiments): E(g(y)) = sum over y of g(y) p(y)
- Continuous pdf f(x)
- Expectation of any function g of the random variable x: E(g(x)) = integral of g(x) f(x) dx
- Expected value - the average x after a very large number of experiments: E(x)
- Expected value - the average y after a very large number of experiments: E(y)
- Variance - the expected value of (y - E(y))²
- Variance - the expected value of (x - E(x))²
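These definitions can be illustrated in R; the sketch below uses g(x) = x² and an N(1, 2²) variable, both chosen arbitrarily:

```r
# E(g(x)) two ways: as a long-run average over many simulated
# "experiments", and via the integral definition for a continuous pdf.
set.seed(2)
mu <- 1; sigma <- 2
g <- function(x) x^2

x <- rnorm(1e6, mu, sigma)
mc <- mean(g(x))   # average of g after many experiments

exact <- integrate(function(x) g(x) * dnorm(x, mu, sigma),
                   lower = -Inf, upper = Inf)$value

c(monte.carlo = mc, integral = exact)   # both near mu^2 + sigma^2 = 5
```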
7 Expected Value and Variance of a Normal Random Variable
- Expected value - the average x after a very large number of experiments: E(x) = μ
- Variance - the expected value of (x - E(x))²: Var(x) = σ²
8 Maximum Likelihood
- x1, x2, ..., xn iid N(μ, σ²)
- Joint pdf for the whole random sample: the product of the individual pdfs f(xi)
- The likelihood function is basically the joint pdf evaluated for the fixed sample, viewed as a function of the parameters
- Maximum likelihood estimates of the model parameters μ and σ² are the numbers that maximize the joint pdf for the fixed sample, which is called the likelihood function
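A base-R sketch of this idea on simulated data (not the course's Nickel data): maximize the log-likelihood numerically and compare with the closed-form normal MLEs.

```r
# Maximum likelihood for x1,...,xn iid N(mu, sigma^2)
set.seed(3)
x <- rnorm(50, mean = 2, sd = 1.5)   # simulated sample
n <- length(x)

# Numerical MLE: minimize the negative log-likelihood;
# sigma is parameterized as exp(logsigma) to keep it positive.
negloglik <- function(par) {
  -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
}
fit <- optim(c(0, 0), negloglik)
c(mu = fit$par[1], sigma = exp(fit$par[2]))

# Closed-form MLEs: the sample mean, and the variance with divisor n
# (not n-1), which is why the MLE of sigma^2 is slightly biased.
c(mu = mean(x), sigma = sqrt((n - 1)/n * var(x)))
```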
9 Simple Linear Model (Use y for x)
Statistical Model:
Trtj = "Effect due to treatment j" (Ctl for j=1 and Nic for j=2)
yij = log-intensity for replicate i and treatment j, where i=1,2,3 and j=1,2
Maximum Likelihood Estimation
We are interested in d = Trt2 - Trt1
10 Alternative Linear Model
Statistical Model:
Tj = "Effect due to treatment j" (Ctl for j=1 and Nic for j=2)
μ = Overall Mean Expression Level
yij = log-intensity for replicate i and treatment j, where i=1,2,3 and j=1,2
Maximum Likelihood Estimation
We are interested in d = T2 - T1
11 We can specify different linear models and still get the same answer if we know how to ask the right question
M1: Estimating Two Parameters
M2: Estimating Three Parameters - need an additional "constraint": Set T1 = 0
Specifying the model for limma and calculating:
> designMatrix1 <- cbind(Ctl=c(1,0,0,0,1,1), Nic=c(0,1,1,1,0,0))
> designMatrix1
     Ctl Nic
[1,]   1   0
[2,]   0   1
[3,]   0   1
[4,]   0   1
[5,]   1   0
[6,]   1   0
> contrast <- c(-1,1)
> FitLM1 <- lmFit(LinearModelData, designMatrix1)
> FitCLM1 <- contrasts.fit(FitLM1, contrast)
> EFitLM1 <- eBayes(FitCLM1)
> designMatrix2 <- cbind(Int=c(1,1,1,1,1,1), Nic=c(0,1,1,1,0,0))
> designMatrix2
     Int Nic
[1,]   1   0
[2,]   1   1
[3,]   1   1
[4,]   1   1
[5,]   1   0
[6,]   1   0
> FitLM2 <- lmFit(LinearModelData, designMatrix2)
> EFitLM2 <- eBayes(FitLM2)
12 Pretending That Data is Single Channel Again
> LinearModelData <- cbind(NLNBLimmaData$Nickel, NLNBLimmaData$Nickel)
> dim(LinearModelData)
[1] 14112     6
> LinearModelData$M[1,]
Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2 Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3
                          -0.1216908                            1.2155011
Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1 Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2
                          -0.4833670                           -0.1216908
Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3 Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1
                           1.2155011                           -0.4833670
> LinearModelData$M <- cbind((NLNBLimmaData$Nickel$A + NLNBLimmaData$Nickel$M/2),
+                            NLNBLimmaData$Nickel$A - (NLNBLimmaData$Nickel$M/2))
> LinearModelData$M[1,]
Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2 Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3
                           11.354734                            11.842679
Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1 Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2
                            9.485143                            11.476425
Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3 Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1
                           10.627177                             9.968510
- This will "split" the ratios back into two-channel data: since M = log2(R/G) and A = (log2 R + log2 G)/2, A + M/2 recovers log2(R) and A - M/2 recovers log2(G).
- We could use the original R and G data, but we want to make use of the loess normalization.
13 Comparing Results of Two Models
> par(mfrow=c(1,2))
> plot(EFitLM1$coefficients, EFitLM2$coefficients[,"Nic"],
+      main="Estimated Differential Expressions", xlab="Model 1", ylab="Model 2")
> plot(-log10(EFitLM1$p.value), -log10(EFitLM2$p.value[,"Nic"]),
+      main="-log10(P-values)", xlab="Model 1", ylab="Model 2")
14 General Linear Model
y = Xβ + ε
Maximum Likelihood Estimation: BetaHat = (X'X)^-1 X'y
15 General Linear Model
y = Xβ + ε
> XpX <- t(designMatrix2) %*% designMatrix2
> XpX
    Int Nic
Int   6   3
Nic   3   3
> XpXInv <- solve(XpX)
> XpXInv
           Int        Nic
Int  0.3333333 -0.3333333
Nic -0.3333333  0.6666667
> XpXInv %*% XpX
    Int Nic
Int   1   0
Nic   0   1
> BetaHat <- (XpXInv %*% t(designMatrix2)) %*% LinearModelData$M[1,]
> BetaHat
          [,1]
Int 10.6501404
Nic  0.2846083
> EFitLM2$coefficients[1,]
       Int        Nic
10.6501404  0.2846083
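As a self-contained check of the normal-equation arithmetic above, here is a base-R sketch on simulated data (the design matrix mirrors designMatrix2, but y is hypothetical, not the Nickel data):

```r
# BetaHat = (X'X)^-1 X'y, verified against R's built-in lm()
set.seed(4)
X <- cbind(Int = rep(1, 6), Nic = c(0, 1, 1, 1, 0, 0))  # same layout as designMatrix2
y <- 10 + 0.3 * X[, "Nic"] + rnorm(6, sd = 0.2)         # hypothetical log-intensities

BetaHat <- solve(t(X) %*% X) %*% t(X) %*% y
BetaHat

# lm() solves the same least-squares / maximum likelihood problem
fit <- lm(y ~ X[, "Nic"])
coef(fit)   # same two numbers as BetaHat
```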
16 Linear Models for paired t-test (The model changes but the Linear Model estimation "machinery" is the same)
y = Xβ + ε
yi is the log-ratio of the Nic-measurement to the Ctl-measurement on the ith microarray
Set T1 = 0, A1 = 0
y = Xβ + ε
17 General Linear Model
> PairedTTSimpleDesign <- c(-1,1,1)
> PairedTT <- lmFit(NLNBLimmaData$Nickel, PairedTTSimpleDesign)
> EPairedTT <- eBayes(PairedTT)
> PairedTTDesign <- cbind(Int=c(1,1,1,1,1,1), Array2=c(0,1,0,0,1,0),
+                         Array3=c(0,0,1,0,0,1), Nic=c(0,1,1,1,0,0))
> PairedTTDesign
     Int Array2 Array3 Nic
[1,]   1      0      0   0
[2,]   1      1      0   1
[3,]   1      0      1   1
[4,]   1      0      0   1
[5,]   1      1      0   0
[6,]   1      0      1   0
> LMPairedTT <- lmFit(LinearModelData, PairedTTDesign)
> ELMPairedTT <- eBayes(LMPairedTT)
18 General Linear Model
> par(mfrow=c(1,2))
> plot(EPairedTT$coefficients, ELMPairedTT$coefficients[,"Nic"],
+      main="Estimated Differential Expressions", xlab="Simple t-test", ylab="Linear Model")
> plot(-log10(EPairedTT$p.value), -log10(ELMPairedTT$p.value[,"Nic"]),
+      main="-log10(P-values)", xlab="Simple t-test", ylab="Linear Model")
19 Why Bother? Turns out that we can do things in the GLM framework that would be really complicated without it - Factoring Out the Gene-Specific Dye Effect
D1 = Cy3 effect, D2 = Cy5 effect
y = Xβ + ε
Set T1 = 0, A1 = 0, D1 = 0
20 General Linear Model
> designMatrixDye <- cbind(Int=c(1,1,1,1,1,1), Array2=c(0,1,0,0,1,0),
+                          Array3=c(0,0,1,0,0,1), Nic=c(0,1,1,1,0,0),
+                          Cy5=c(1,1,1,0,0,0))
> designMatrixDye
     Int Array2 Array3 Nic Cy5
[1,]   1      0      0   0   1
[2,]   1      1      0   1   1
[3,]   1      0      1   1   1
[4,]   1      0      0   1   0
[5,]   1      1      0   0   0
[6,]   1      0      1   0   0
> FitLMAD <- lmFit(LinearModelData, designMatrixDye)
> EFitLMAD <- eBayes(FitLMAD)
> EBayesFDRD <- p.adjust(EFitLMAD$p.value[,"Nic"], method="fdr")
> DyeFDR <- p.adjust(EFitLMAD$p.value[,"Cy5"], method="fdr")
> ELMPairedTTFDR <- p.adjust(ELMPairedTT$p.value[,"Nic"], method="fdr")
21 General Linear Model
> plot(EFitLMAD$coefficients[,"Nic"], ELMPairedTT$coefficients[,"Nic"],
+      xlab="Linear Model with Dye", ylab="Linear Model without Dye")
22 General Linear Model
> boxplot(list(NoArray=FitLM2$sigma, NoDyeEffect=LMPairedTT$sigma,
+              WithDyeEffect=FitLMAD$sigma, NoDyeEffect=LMPairedTT$sigma),
+         xlab="Linear Model with Dye", ylab="Linear Model without Dye")
> boxplot(list(NoArray=FitLM2$sigma, NoDyeEffect=LMPairedTT$sigma,
+              WithDyeEffect=FitLMAD$sigma),
+         xlab="Linear Model with Dye", ylab="Linear Model without Dye")
> boxplot(list(NoArray=FitLM2$sigma, NoDyeEffect=LMPairedTT$sigma,
+              WithDyeEffect=FitLMAD$sigma),
+         xlab="", ylab="Estimated Sigma")
23 Factoring out Dye effect - what it really means
D1 = Cy3 effect, D2 = Cy5 effect
y = Xβ + ε
Set T1 = 0, A1 = 0, D1 = 0
Set T1 = 0, A1 = 0
y = Xβ + ε
24 What happens if we don't factor out the Dye effect?
D1 = Cy3 effect, D2 = Cy5 effect
25 Factoring out Dye effect - what it really means
y = Xβ + ε
Set T1 = 0, A1 = 0, D1 = 0
26 To Dye or not to Dye
- If there is a significant dye effect, we need to factor it out or the estimates of the effects of interest will be biased.
- The estimates of variance are likely to be smaller after factoring out the dye effect, even if it is not significant.
- The estimates of standard errors (the denominator in the t-statistic) would also likely be smaller.
- P-values could increase due to a lower t-statistic or because of the loss of one degree of freedom.
27 General Linear Model
> par(mfrow=c(1,3))
> plot(ELMPairedTT$coefficients[,"Nic"], -log(ELMPairedTTFDR, base=10),
+      main="Empirical Bayes t-test",
+      xlab="Mean Difference", ylab="-log10(FDRpvalue)", ylim=c(0,3))
> grid(nx=NULL, ny=NULL, col="lightgray", lty="dotted", lwd=NULL, equilogs=TRUE)
> plot(EFitLMAD$coefficients[,"Nic"], -log(EBayesFDRD, base=10),
+      main="Empirical Bayes t-test with Dye",
+      xlab="Mean Difference", ylab="-log10(FDRpvalue)", ylim=c(0,3))
> grid(nx=NULL, ny=NULL, col="lightgray", lty="dotted", lwd=NULL, equilogs=TRUE)
> plot(EFitLMAD$coefficients[,"Nic"], -log(DyeFDR, base=10),
+      main="Empirical Bayes t-test for Dye",
+      xlab="Mean Difference", ylab="-log10(FDRpvalue)", ylim=c(0,3))
> grid(nx=NULL, ny=NULL, col="lightgray", lty="dotted", lwd=NULL, equilogs=TRUE)
28 What happens if we don't factor out the Dye effect?
- Because the design is unbalanced with respect to the dye assignment (i.e., not all treatments have an equal number of Cy5 and Cy3 labelings), the existence of a dye effect (i.e., D2 ≠ 0) would introduce "bias".
- If the design were balanced with respect to dye, there would be no bias, but the existence of a dye effect would increase the variability of the estimates.
- The dye effect is not present in Affymetrix technology, but the same issues arise in multi-factorial experiments where the effect of one treatment needs to be assessed after factoring out the effects of other experimental factors.
- For example: the effect of nickel on two different strains of mice - the overall effect of nickel needs to be adjusted for the effect of the mouse genotype.
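The bias claim can be checked with a small simulation; the dye and treatment columns below copy the unbalanced design used in this lecture, while the effect sizes and noise level are made up for illustration:

```r
# With an unbalanced design, an unmodeled dye effect leaks into the
# estimated treatment effect; adding the dye column removes the bias.
set.seed(5)
nic <- c(0, 1, 1, 1, 0, 0)   # treatment column from the design above
cy5 <- c(1, 1, 1, 0, 0, 0)   # dye column: unbalanced w.r.t. treatment
nic.true <- 0.3              # hypothetical treatment effect
d.true <- 1                  # hypothetical dye effect

est <- replicate(2000, {
  y <- nic.true * nic + d.true * cy5 + rnorm(6, sd = 0.1)
  c(without = coef(lm(y ~ nic))["nic"],        # dye effect ignored
    with    = coef(lm(y ~ nic + cy5))["nic"])  # dye effect factored out
})
rowMeans(est)   # "without" is biased away from 0.3; "with" is not
```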
29 Creating Design Matrices
> arrays <- factor(c(1,2,3,1,2,3))
> dye <- factor(c("Cy5","Cy5","Cy5","Cy3","Cy3","Cy3"))
> trt <- factor(c("Ctl","Nic","Nic","Nic","Ctl","Ctl"))
> DMRDye <- model.matrix(~arrays+dye+trt)
> DMRDye
  (Intercept) arrays2 arrays3 dyeCy5 trtNic
1           1       0       0      1      0
2           1       1       0      1      1
3           1       0       1      1      1
4           1       0       0      0      1
5           1       1       0      0      0
6           1       0       1      0      0
attr(,"assign")
[1] 0 1 1 2 3
attr(,"contrasts")
attr(,"contrasts")$arrays
[1] "contr.treatment"
attr(,"contrasts")$dye
[1] "contr.treatment"
attr(,"contrasts")$trt
[1] "contr.treatment"