Estimating parameters in a statistical model source"http:eh3'uc'eduteachingcfg2006RNickelLinearModel - PowerPoint PPT Presentation

About This Presentation
Title:

Estimating parameters in a statistical model source"http:eh3'uc'eduteachingcfg2006RNickelLinearModel

Description:

source('http://eh3.uc.edu/teaching/cfg/2006/R/NickelLinearModels.R' ... [1] 'contr.treatment' attr(,'contrasts')$dye [1] 'contr.treatment' attr(,'contrasts')$trt ... – PowerPoint PPT presentation

Number of Views:80
Avg rating:3.0/5.0
Slides: 30
Provided by: mariomed
Learn more at: http://eh3.uc.edu
Category:

less

Transcript and Presenter's Notes

Title: Estimating parameters in a statistical model source"http:eh3'uc'eduteachingcfg2006RNickelLinearModel


1
Estimating parameters in a statistical
modelsource("http//eh3.uc.edu/teaching/cfg/2006/
R/NickelLinearModels.R")
  • Likelihood and Maximum likelihood estimation
  • General Linear Model - the general framework that
    will allow us to perform complicated analysis
    relatively easily

2
Random Sample
  • Random sample A set of independently generated
    observations by the statistical model.
  • For example, n replicated measurements of the
    differences in expression levels for a gene under
    two different treatments x1,x2,....,xn iid
    N(?,?2)
  • Given parameters, the statistical model defines
    the probability of observing any particular
    combination of the values in this sample
  • Since the observations are independent, the
    probability distribution function describing the
    probability of observing a particular combination
    is the product of probability distribution
    functions

3
Probability distribution function vs probability
  • In the case of the discrete random variables
    which have a countable number of potential values
    (can assume finitely many for now), probability
    density function is equal to the probability of
    each value (outcome)
  • In the case of a continuous random variable which
    can yield any number in a particular interval,
    the probability distribution function is
    different from the probability
  • Probability of any particular number for a
    continuous random variable is equal to zero
  • Probability density function defines the
    probability of the number falling into any
    particular sub-interval as the area under the
    curve defined by the probability density
    function.

4
Probability distribution function vs probability
  • Example The assumption of our Normal model is
    that there the outcome can be pretty much any
    real number. This is obviously a wrong
    assumption, but it turns out that this model is a
    good approximation of the reality.
  • We could "discretize" this random variable.
    Define r.v. y1 if xgtc and 0 otherwise for
    some constant c
  • This random variable can be assume 2 different
    values and the probability distribution function
    is define by p(y1)
  • Although the probability distribution function in
    the case of a continuous random variable does not
    give probabilities, it satisfies key properties
    of the probability.

5
Back to basics Probability, Conditional
Probability and Independence
  • Discrete pdf (y)
  • Continuous pdf (x)
  • For x1,...,xn iid of f(x)
  • For y1,...,yn iid of p(y)
  • From now on, we will talk in terms of just a pdf
    and things will hold for both discrete and
    continuous random variables

6
Expectation, Expected value and Variance
  • Discrete pdf (y)
  • Expectation of any function g of the random
    variable y (average value of the function after a
    large number of experiments
  • Continuous pdf (x)
  • Expectation of any function g of the random
    variable x
  • Expected value - average x after a very large
    number of experiments
  • Expected value - average y after a very large
    number of experiments
  • Variance - Expected value of (y-E(y))2
  • Variance - Expected value of (x-E(x))2

7
Expected Value and Variance of a Normal Random
Variable
  • Normal pdf
  • Expected value - average x after a very large
    number of experiments
  • Variance - Expected value of (x-E(x))2

8
Maximum Likelihood
  • x1,x2,....,xn iid N(?,?2)
  • Joint pdf for the whole random sample
  • Likelihood function is basically the pdf for the
    fixed sample
  • Maximum likelihood estimates of the model
    parameters ? and ?2 are numbers that maximize the
    joint pdf for the fixed sample which is called
    the Likelihood function

9
Simple Linear Model (Use y for x)
Statistical Model Trtj"Effect due to treatmet
j" Ctl for j1 and Nic for j2 yij
log-intensity for replicate i and
treatment j, where i1,2,3 and j1,2
Maximum Likelihood Estimation
We are interested in d ?2 - ?1
10
Alternative Linear Model
Statistical Model Tj"Effect due to treatmet
j"Ctl for j1 and Nic for j2 ?Overall Mean
Expression Level yij log-intensity for
replicate i and treatment j, where
i1,2,3 and j1,2
Maximum Likelihood Estimation
We are interested in d ?2 - ?1
11
We can specify different linear models and still
get the same answer if we know how to ask the
right question
M1 Estimating Two Parameters
M2 Estimating Three Parameters - need additional
"constraint"Set T10
Specifying the model for limma and calculating
gt designMatrix1lt-cbind(Ctlc(1,0,0,0,1,1), Nicc(0
,1,1,1,0,0)) gt designMatrix1 Ctl Nic 1,
1 0 2, 0 1 3, 0 1 4, 0 1 5,
1 0 6, 1 0 gt contrastlt-c(-1,1) gt
FitLM1lt-lmFit(LinearModelData,designMatrix1) gt
FitCLM1lt-contrasts.fit(FitLM1,contrast) gt
EFitLM1lt-eBayes(FitCLM1)
gt designMatrix2lt-cbind(Intc(1,1,1,1,1,1), Nicc(0
,1,1,1,0,0)) gt designMatrix2 Int Nic 1,
1 0 2, 1 1 3, 1 1 4, 1 1 5,
1 0 6, 1 0 gt FitLM2lt-lmFit(LinearModelDa
ta,designMatrix2) gt EFitLM2lt-eBayes(FitLM2) gt
12
Pretending That Data is Single Channel Again
  • gt LinearModelDatalt-cbind(NLNBLimmaDataNickel,NLNBL
    immaDataNickel)
  • gt dim(LinearModelData)
  • 1 14112 6
  • gt LinearModelDataM1,
  • Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2
    Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3
    Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1
    Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2
  • -0.1216908
    1.2155011
    -0.4833670 -0.1216908
  • Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3
    Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1
  • 1.2155011
    -0.4833670
  • gt LinearModelDataMlt-cbind((NLNBLimmaDataNickelA
    NLNBLimmaDataNickelM/2),NLNBLimmaDataNickelA-(NL
    NBLimmaDataNickelM/2))
  • gt LinearModelDataM1,
  • Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2
    Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3
    Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1
    Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2
  • 11.354734
    11.842679
    9.485143 476425
  • Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3
    Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1
  • 10.627177
    9.968510
  • gt
  • This will "split" ratios back into the two
    channel data
  • We could use original R and G data, but want to
    make use of loess normalization

13
Comparing Results of Two Models
gt par(mfrowc(1,2)) gt plot(EFitLM1coefficients,EF
itLM2coefficients,"Nic",main"Estimated
Differential Expressions", xlab"Model
1",ylab"Model 2") gt gt plot(-log10(EFitLM1p.valu
e),-log10(EFitLM2p.value),"Nic",main"-log10(P-
values)", xlab"Model 1",ylab"Model 2")
14
General Linear Model
y X? ?
Maximum Likelihood Estimation
15
General Linear Model
y X? ?
gt XpXlt-t(designMatrix2)designMatrix2 gt XpX
Int Nic Int 6 3 Nic 3 3 gt
XpXInvlt-solve(XpX) gt XpXInv Int
Nic Int 0.3333333 -0.3333333 Nic -0.3333333
0.6666667 gt XpXInvXpX Int Nic Int 1
0 Nic 0 1 gt BetaHatlt-(XpXInvt(designMatrix2
))LinearModelDataM1, gt BetaHat
,1 Int 10.6501404 Nic 0.2846083 gt
EFitLM2coefficients1, Int Nic
10.6501404 0.2846083 gt
16
Linear Models for paired t-test(The model
changes but the Linear Model estimation
"machinery" is the same)
y X? ?
yi is the log-ratio of Nic-measurement
toCtl-measurement at the ith microarray
Set T10 A10
y X? ?
17
General Linear Model
gt PairedTTSimpleDesignlt-c(-1,1,1) gt
PairedTTlt-lmFit(NLNBLimmaDataNickel,PairedTTSimple
Design) gt EPairedTTlt-eBayes(PairedTT) gt gt
PairedTTDesignlt-cbind(Intc(1,1,1,1,1,1),Array2c(
0,1,0,0,1,0),Array3c(0,0,1,0,0,1),Nicc(0,1,1,1,0
,0)) gt PairedTTDesign Int Array2 Array3
Nic 1, 1 0 0 0 2, 1 1
0 1 3, 1 0 1 1 4, 1
0 0 1 5, 1 1 0 0 6, 1
0 1 0 gt LMPairedTTlt-lmFit(LinearModelDa
ta,PairedTTDesign) gt ELMPairedTTlt-eBayes(LMPairedT
T)
18
General Linear Model
gt par(mfrowc(1,2)) gt plot(EPairedTTcoefficients,
ELMPairedTTcoefficients,"Nic",main"Estimated
Differential Expressions", xlab"Simple
t-test",ylab"Linear Model") gt gt
plot(-log10(EPairedTTp.value),-log10(ELMPairedTT
p.value),"Nic",main"-log10(P-values)",
xlab"Simple t-test",ylab"Linear Model") gt
19
Why Bother? Turns out that we can do things in
the GLM framework that would be really
complicated without it - Factoring Out
Gene-Specific Dye Effect
D1Cy3 effectD2Cy5 effect
y X? ?
Set T10 A10 D10
20
General Linear Model
gt designMatrixDyelt-cbind(Intc(1,1,1,1,1,1),Array2
c(0,1,0,0,1,0),Array3c(0,0,1,0,0,1),
Nicc(0,1,1,1,0,0),Cy5c(1,1,1,0,0,0)) gt
designMatrixDye Int Array2 Array3 Nic
Cy5 1, 1 0 0 0 1 2, 1
1 0 1 1 3, 1 0 1 1
1 4, 1 0 0 1 0 5, 1 1
0 0 0 6, 1 0 1 0 0 gt
FitLMADlt-lmFit(LinearModelData,designMatrixDye) gt
EFitLMADlt-eBayes(FitLMAD) gt gt EBayesFDRDlt-p.adjus
t(EFitLMADp.value,"Nic",method"fdr") gt
DyeFDRlt-p.adjust(EFitLMADp.value,"Cy5",method"
fdr") gt ELMPairedTTFDRlt-p.adjust(ELMPairedTTp.val
ue,"Nic",method"fdr") gt
21
General Linear Model
gt plot(EFitLMADcoefficients,"Nic",ELMPairedTTc
oefficients,"Nic",xlab"Linear Model with
Dye",ylab"Linear Model without Dye") gt
22
General Linear Model
gt boxplot(list(NoArrayFitLM2sigma,NoDyeEffectLM
PairedTTsigma,WithDyeEffectFitLMADsigma,NoDyeEf
fectLMPairedTTsigma),xlab"Linear Model with
Dye",ylab"Linear Model without Dye") gt
boxplot(list(NoArrayFitLM2sigma,NoDyeEffectLMPa
iredTTsigma,WithDyeEffectFitLMADsigma),xlab"Li
near Model with Dye",ylab"Linear Model without
Dye") gt boxplot(list(NoArrayFitLM2sigma,NoDyeEff
ectLMPairedTTsigma,WithDyeEffectFitLMADsigma),
xlab"",ylab"Estimated Sigma") gt
23
Factoring out Dye effect - what it really means
D1Cy3 effectD2Cy5 effect
y X? ?
Set T10 A10 D10
Set T10 A10
y X? ?
24
What happens if we don't factor out Dye effect?
D1Cy3 effectD2Cy5 effect
25
Factoring out Dye effect - what it really means
y X? ?
Set T10 A10 D10
26
To Dye or not to Dye
  • If there is a significant dye effect we need to
    factor it out or the estimates of effects of
    interest will be biased
  • The estimates of variance are likely going to be
    smaller after factoring the Dye effect even if it
    is not significant
  • The estimates of standard errors (denominator in
    the t-statistic) would also likely be smaller
  • P-values could increase due to the lower
    t-statistic or because of the loss of one degree
    of freedom

27
General Linear Model
gt par(mfrowc(1,3)) gt plot(ELMPairedTTcoefficient
s,"Nic",-log(ELMPairedTTFDR,base10),main"Empir
ical Bayes t-test", xlab"Mean
Difference",ylab"-log10(FDRpvalue)",ylimc(0,3))
gt grid(nx NULL, ny NULL, col "lightgray",
lty "dotted",lwd NULL, equilogs TRUE) gt gt
plot(EFitLMADcoefficients,"Nic",-log(EBayesFDRD
,base10),main"Empirical Bayes t-test with
Dye", xlab"Mean Difference",ylab"-log10(FDRpva
lue)",ylimc(0,3)) gt grid(nx NULL, ny NULL,
col "lightgray", lty "dotted",lwd NULL,
equilogs TRUE) gt gt plot(EFitLMADcoefficients,
"Nic",-log(DyeFDR,base10),main"Empirical Bayes
t-test for Dye", xlab"Mean Difference",ylab"-l
og10(FDRpvalue)",ylimc(0,3)) gt grid(nx NULL,
ny NULL, col "lightgray", lty "dotted",lwd
NULL, equilogs TRUE) gt
28
What happens if we don't factor out Dye effect?
  • Because the design is unbalanced (i.e. all
    treatments don't have equal number of Cy5 and Cy3
    labelings) with respect to Dye assignment - the
    existence of a dye-effect (i.e. D2?0) would
    introduce "bias"
  • If the design was balanced with respect to Dye -
    there would be no bias but the existence of a
    dye-effect would increase the variability of the
    estimates
  • The dye effect is not present in Affymetrix
    technology, but the same issues arise in
    multi-factorial experiments where the effect of
    one treatment needs to be assessed after
    factoring out the effect of other experimental
    factors
  • For example Effect of nickel on two different
    strains of mice - the overall effects of Nickel
    needs to be adjusted for the effects of the mouse
    genotype

29
Creating Design Matrices
gt arrayslt-factor(c(1,2,3,1,2,3)) gt
dyelt-factor(c("Cy5","Cy5","Cy5","Cy3","Cy3","Cy3")
) gt trtlt-factor(c("Ctl","Nic","Nic","Nic","Ctl","C
tl")) gt gt DMRDyelt-model.matrix(arraysdyetrt) gt
DMRDye (Intercept) arrays2 arrays3 dyeCy5
trtNic 1 1 0 0 1
0 2 1 1 0 1 1 3
1 0 1 1 1 4
1 0 0 0 1 5 1
1 0 0 0 6 1 0
1 0 0 attr(,"assign") 1 0 1 1 2
3 attr(,"contrasts") attr(,"contrasts")arrays 1
"contr.treatment" attr(,"contrasts")dye 1
"contr.treatment" attr(,"contrasts")trt 1
"contr.treatment" gt
Write a Comment
User Comments (0)
About PowerShow.com