Design of experiment II: ANOVA and multiple hypotheses testing
1
Design of experiment II: ANOVA and multiple
hypotheses testing
  • ANOVA
  • Interactions
  • Contrasts
  • Multiple hypothesis tests

2
ANOVA tables
  • A standard ANOVA table has one row per factor v1,…,vp, where each vi is a
    vector of levels. For each vi we want to test whether all of its levels have
    the same effect (H0). The df column gives the degrees of freedom
    corresponding to each hypothesis, and SSh the corresponding sum of squares
    (h denotes hypothesis). F is the F value, referred to the F distribution
    with degrees of freedom (dfh, dfe), and Pr is the corresponding probability.
    If a particular probability is very low (less than 0.05) then we reject the
    hypothesis that all levels of that factor have the same effect. These values
    are calculated using a likelihood ratio test. Let us say we want to test the
    hypothesis
  • H0: the effects of all levels of vi are equal to each other, vs H1: at least
    one of the effects is different.
  • Then we maximise the likelihood under the null hypothesis to find the
    corresponding variance, maximise the likelihood under the alternative
    hypothesis to find its variance, calculate the sums of squares for the null
    and alternative hypotheses, and form the F statistic.

3
LR test for ANOVA
  • Suppose the variances under the null and the alternative hypotheses have
    been estimated by maximising the corresponding likelihoods.
  • Then the mean sums of squares for the hypothesis and for the error are
    MSh = SSh/dfh and MSe = SSe/dfe, and the test statistic is F = MSh/MSe.
  • Since the hypothesis sum of squares is a multiple of a χ² with dfh degrees
    of freedom, the error sum of squares is a multiple of a χ² with dfe degrees
    of freedom, they are independent, and the multiplicative factors for both
    are the same, their ratio has an F distribution with degrees of freedom
    (dfh, dfe). In the simplest case the degrees of freedom of the hypothesis is
    the number of levels of the factor minus 1.
  • In general, the degrees of freedom of a hypothesis is defined by the number
    of constraints it implies; the degrees of freedom of the error is the number
    of observations minus the number of (estimated) parameters.
  • Using this type of ANOVA table we can only tell whether there are
    significant differences between means. It does not tell us which means are
    significantly different.
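  • As a minimal sketch (not from the slides), a single row of such a table can
    be reproduced in R from the sums of squares and degrees of freedom; the
    numbers below are taken from the pois row of the table on slide 5:

      ## Sketch: recomputing an F value and Pr(>F) from SSh, dfh, SSe and dfe.
      SSh <- 1.03828; dfh <- 2     # hypothesis sum of squares and degrees of freedom
      SSe <- 0.83013; dfe <- 36    # residual (error) sum of squares and degrees of freedom
      Fval <- (SSh / dfh) / (SSe / dfe)       # ratio of mean squares
      pf(Fval, dfh, dfe, lower.tail = FALSE)  # Pr(>F), about 4.6e-07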

4
Example: Two-way ANOVA
  • Let us consider an example taken from Box, Hunter and Hunter. The experiment
    was done on animals: survival times were recorded for three poisons and four
    treatments, with four animals per combination. The data are:
  •              treatment
                   A     B     C     D
  • poison I     0.31  0.82  0.43  0.45
                 0.45  1.10  0.45  0.71
                 0.46  0.88  0.63  0.66
                 0.43  0.72  0.76  0.62
  • poison II    0.36  0.92  0.44  0.56
                 0.29  0.61  0.35  1.02
                 0.40  0.49  0.31  0.71
                 0.23  1.24  0.40  0.38
  • poison III   0.22  0.30  0.23  0.30
                 0.21  0.37  0.25  0.36
                 0.18  0.38  0.24  0.31
                 0.23  0.29  0.22  0.33
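  • In R this data set might be set up as follows (a sketch of the assumed
    layout; the column names time, poison and treat and the data frame name
    poisons match those used in the commands on the later slides):

      ## Sketch: survival times read row by row from the table above
      ## (each table row holds one replicate for treatments A-D).
      time <- c(0.31, 0.82, 0.43, 0.45,  0.45, 1.10, 0.45, 0.71,
                0.46, 0.88, 0.63, 0.66,  0.43, 0.72, 0.76, 0.62,
                0.36, 0.92, 0.44, 0.56,  0.29, 0.61, 0.35, 1.02,
                0.40, 0.49, 0.31, 0.71,  0.23, 1.24, 0.40, 0.38,
                0.22, 0.30, 0.23, 0.30,  0.21, 0.37, 0.25, 0.36,
                0.18, 0.38, 0.24, 0.31,  0.23, 0.29, 0.22, 0.33)
      poisons <- data.frame(time   = time,
                            poison = factor(rep(1:3, each = 16)),
                            treat  = factor(rep(LETTERS[1:4], times = 12)))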

5
ANOVA table
  • ANOVA table produced by R
  •              Df  Sum Sq  Mean Sq  F value     Pr(>F)
  • pois          2 1.03828  0.51914  22.5135  4.551e-07
  • treat         3 0.92569  0.30856  13.3814  5.057e-06
  • pois:treat    6 0.25580  0.04263   1.8489     0.1170
  • Residuals    36 0.83013  0.02306
  • The most important values are F and Pr(>F).
  • In this table we have tests for the main effects, pois and treat, and for
    the interaction between these two factors. If there are significant
    interactions then it is difficult to separate the effects of the two
    factors; they should be considered simultaneously. In this case the
    probability for the interaction (0.117) is not very small, but it is also
    not large enough to discard the interaction effects with confidence. In such
    situations a transformation of the observations might help. Let us consider
    the ANOVA table for the transformed observations, using the transformation
    1/y. Now the ANOVA table looks like:
  •              Df  Sum Sq  Mean Sq  F value     Pr(>F)
  • pois          2  34.903   17.452  72.2347  2.501e-13
  • treat         3  20.449    6.816  28.2131  1.457e-09
  • pois:treat    6   1.579    0.263   1.0892     0.3874
  • Residuals    36   8.697    0.242
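  • The slide shows only the tables; a sketch of the calls that would produce
    them (using the poisons data frame above; the poison factor appears
    abbreviated as pois in the printed output):

      ## Sketch: two-way ANOVA with interaction, on the original and on the
      ## transformed (1/y) survival times.
      anova(lm(time     ~ poison * treat, data = poisons))   # first table
      anova(lm(1 / time ~ poison * treat, data = poisons))   # table for the 1/y transformation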

6
ANOVA table
  • According to the table, after the transformation the probability
    corresponding to the interaction term is high, which means that the
    interaction is not significant for the transformed observations. We can
    therefore remove the interaction terms and build the ANOVA table without
    them. It looks like:
  •              Df  Sum Sq  Mean Sq  F value     Pr(>F)
  • pois          2  34.903   17.452   71.326  3.124e-14
  • treat         3  20.449    6.816   27.858  4.456e-10
  • Residuals    42  10.276    0.245
  • Now we can say that there are significant differences between poisons as
    well as between treatments.
  • Sometimes it is useful to use a transformation to reduce the effect of
    interactions. Several different transformations (inverse, inverse square,
    log) could be tried, and the Box-Cox transformation (the boxcox command in
    R) may help to find a suitable one.
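  • As a sketch of those last two points (boxcox is in the MASS package; the
    model and data are as above):

      ## Sketch: the no-interaction model, and a Box-Cox search for a
      ## variance-stabilising transformation of the response.
      library(MASS)
      anova(lm(time ~ poison + treat, data = poisons))    # table without interaction terms
      boxcox(lm(time ~ poison + treat, data = poisons))   # profile log-likelihood over lambda
      ## a maximum near lambda = -1 would support the 1/y transformation used above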

7
Interactions
  • For one-way ANOVA we need 1) to find out if there is a significant effect
    (difference between means) and 2) to analyse and find out where this
    difference is located.
  • For two-way or higher-order ANOVA we first need to consider interactions.
    We would like to remove interactions if we can; otherwise tests for the main
    effects (differences between the means of observations corresponding to
    different levels of the factors) may not be reliable. If we see significant
    interactions then we can try transformations of the data. The Box-Cox family
    of variance-stabilising transformations is one way of removing interactions.
    If we find the necessary transformation then we can carry out further
    analysis using the transformed data, or use a GLM with the corresponding
    link function.
  • If interactions cannot be removed then we may need to combine all pairs of
    levels and work with them as if we had a one-way ANOVA with I·J levels,
    where I is the number of levels of the first factor and J is the number of
    levels of the second factor.

8
What to do if there is an effect
  • An ANOVA table tells us if there is an effect somewhere, but it does not say
    where this effect is. If we conclude that there are significant differences
    then we should carry on and find out where these differences are. One way
    (not recommended) is to look at the confidence intervals produced by
    confint. For example, for the poison data:
  •                   2.5 %       97.5 %
  • (Intercept)  0.22201644   0.40631690
  • poison1     -0.09299276   0.01986776
  • poison2     -0.13414253  -0.06898247
  • treatB       0.23217989   0.49282011
  • treatC      -0.05198677   0.20865344
  • treatD       0.08967989   0.35032011
  • Since these intervals are examined only after deciding that there are
    significant main effects, they are meant to be more reliable (we already
    know that effects exist, so the effects we see in this table are more likely
    to correspond to real effects). This approach is called Fisher's least
    significant difference (Fisher's LSD). Confidence intervals based directly
    on the t distribution are very optimistic when many tests are performed
    simultaneously.
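  • A sketch of the call that would produce an output like this (assuming the
    additive model on the poisons data; the exact contrast coding determines
    the row labels, and lm0 is just an illustrative name):

      ## Sketch: per-coefficient 95% confidence intervals from the additive model.
      lm0 <- lm(time ~ poison + treat, data = poisons)
      confint(lm0)   # individual intervals; optimistic when many are inspected at once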

9
Contrasts
  • Let us say we have m parameters θ1,…,θm to estimate and a vector c with m
    elements with the property Σi ci = 0 (if we are dealing with unbalanced
    ANOVA then Σi ni ci = 0 should be used instead). Then we can form the
    hypothesis H0: Σi θi ci = 0.
  • Σi θi ci is called a contrast. If we have an m×k matrix C then we can form
    k contrasts; C is called a contrast matrix. Usually this matrix is
    orthogonal, i.e.
  • Σj cij = 0
  • Σj cij ckj = 0 for i ≠ k
  • In ANOVA we are interested in effects, i.e. differences between the effects
    of different levels of the factors. These can be written using a contrast
    matrix. The default contrast matrix used in R, contr.treatment, is of this
    type.
  • Note that R uses contrasts by default, and during estimation it is usually
    not the parameters themselves that are estimated but the quantities
    corresponding to the contrasts.
  • Finding where the effect is, is a multiple hypothesis testing problem.
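  • For illustration (a sketch, not from the slides), the contrast matrices R
    attaches to a 4-level factor can be inspected directly; contr.treatment is
    the default and contr.helmert is an orthogonal alternative:

      ## Sketch: contrast matrices for a factor with 4 levels.
      contr.treatment(4)        # default coding: each level compared with the first
      contr.helmert(4)          # orthogonal contrasts (columns sum to zero)
      contrasts(poisons$treat)  # coding currently attached to the treat factor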

10
Multiple comparison
  • One comparison - use t-test or equivalent
  • Few comparisons - use Bonferroni
  • Many comparisons - Tukey's honest significant differences, Holm, Scheffé or
    others

11
Bonferroni correction
  • If there is only one comparison then we can use a t-test or intervals based
    on the t distribution. However, as the number of tests increases, the
    probability that a significant effect will be observed when there is no real
    effect becomes very large. It can be calculated as 1-(1-α)^n, where α is the
    significance level and n is the number of comparisons. For example, if the
    significance level is 0.05 and the number of comparisons (tests) is 10, then
    the probability that at least one significant effect will be detected by
    chance is 1-(1-0.05)^10 ≈ 0.40. Bonferroni suggested using α/n instead of α
    when designing simultaneous confidence intervals, i.e. the intervals are
    calculated as
  • μi - μj ± t(α/(2n)) × (standard error of the comparison)
  • Clearly, when n becomes very large these intervals become very conservative.
    The Bonferroni correction is recommended when only a few effects are
    compared. pairwise.t.test in R can apply the Bonferroni correction.
  • If the Bonferroni correction is used then the p-values are multiplied by the
    number of comparisons (note that if we are testing the effects of I levels
    of a factor then the number of comparisons is I(I-1)/2).
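  • A quick sketch of these two calculations (the family-wise error probability
    and the Bonferroni adjustment of a set of p-values; p.adjust applies the
    same adjustment that pairwise.t.test uses):

      ## Sketch: chance of at least one false positive in n independent tests,
      ## and Bonferroni adjustment of raw p-values.
      alpha <- 0.05; n <- 10
      1 - (1 - alpha)^n                        # about 0.40
      p_raw <- c(0.32844, 3.3e-05, 0.00074)    # raw pairwise p-values from the next slide
      p.adjust(p_raw, method = "bonferroni")   # each multiplied by 3, capped at 1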

12
Bonferroni correction: Example
  • Let us take the example data set (poisons) and try the Bonferroni correction
    for each factor; here the poison factor is shown:
  • with(poisons, pairwise.t.test(time, poison, "none"))
  •   1       2
  • 2 0.32844 -
  • 3 3.3e-05 0.00074
  • with(poisons, pairwise.t.test(time, poison, "bonferroni"))
  •   1      2
  • 2 0.9853 -
  • 3 1e-04  0.0022
  • As can be seen, each p-value is multiplied by the number of comparisons,
    3·2/2 = 3. If an adjusted p-value becomes more than one then it is
    truncated to one.
  • It says that there are significant differences between the effects of
    poisons 1 and 3 and between poisons 2 and 3; the difference between the
    effects of poisons 1 and 2 is not significant.
  • Note: the R command pairwise.t.test can be used for one-way ANOVA only.

13
Holm correction
  • Another correction for multiple tests, Holm's correction, is less
    conservative than the Bonferroni correction. It is a modification of the
    Bonferroni correction that works in a sequential manner.
  • Let us say we need to make n comparisons at significance level α. We
    calculate the p-values for all of them, sort them in ascending order and
    apply the following procedure:
  • 1) set i = 1
  • 2) if pi < α/(n-i+1) then comparison i is significant, otherwise stop
  • 3) if comparison i is significant then increment i by one and, if i ≤ n, go
    to step 2
  • All comparisons before the one at which the procedure stops are declared
    significant.
  • When reporting p-values the Holm correction works similarly to Bonferroni
    but sequentially: if we have m comparisons then the smallest p-value is
    multiplied by m, the second smallest by m-1, and the j-th smallest by m-j+1.
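  • A sketch of the same sequential adjustment using p.adjust (the raw p-values
    are those shown on the next slide):

      ## Sketch: Holm adjustment of the three raw pairwise p-values.
      p_raw <- c(0.32844, 3.3e-05, 0.00074)
      p.adjust(p_raw, method = "holm")   # smallest*3, next*2, largest*1 (kept monotone)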

14
Holm correction example
  • Let us take the example data set (poisons) and try the Holm correction for
    each factor; here the poison factor is shown:
  • with(poisons, pairwise.t.test(time, poison, "none"))
  •   1       2
  • 2 0.32844 -
  • 3 3.3e-05 0.00074
  • with(poisons, pairwise.t.test(time, poison, "holm"))  # this correction is the default in R
  •   1      2
  • 2 0.3284 -
  • 3 1e-04  0.0015
  • The smallest p-value is multiplied by 3, the second by 2 and the largest
    by 1.
  • It says that there are significant differences between the effects of
    poisons 1 and 3 and between poisons 2 and 3; the difference between the
    effects of poisons 1 and 2 is not significant.

15
Tukey's honest significant difference
  • This test is used to calculate simultaneous confidence intervals for the
    differences of all effects.
  • Tukey's range distribution: if we have a random sample of size N from a
    normal distribution then the distribution of the studentised range,
    (maxi(xi) - mini(xi)) / sd, is called Tukey's distribution.
  • Let us say we want to test whether μi - μj is 0. For simultaneous 100(1-α)%
    confidence intervals we calculate, for all pairs, the lower and upper limits
    of the interval using
  • difference ± q(l,ν) × sd × (1/Ji + 1/Jj)^0.5 / √2
  • where q(l,ν) is the relevant quantile of Tukey's distribution for the chosen
    confidence level, Ji and Jj are the numbers of observations used to
    calculate the means μi and μj, sd is the estimated standard deviation, l is
    the number of levels being compared and ν is the degrees of freedom used to
    calculate sd.
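  • A sketch of this formula in R (assuming the additive model fitted to the
    poisons data; qtukey gives the quantiles of the studentised range, and the
    result should be close to the half-widths of the poison intervals reported
    by TukeyHSD on the next slide):

      ## Sketch: half-width of a Tukey interval for comparing two poison means
      ## (l = 3 poison levels, J = 16 observations per level).
      fit   <- aov(time ~ poison + treat, data = poisons)
      sdres <- sqrt(deviance(fit) / df.residual(fit))        # residual standard deviation
      q     <- qtukey(0.95, nmeans = 3, df = df.residual(fit))
      q * sdres * sqrt(1/16 + 1/16) / sqrt(2)                # compare with TukeyHSD(fit)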

16
Tukey's honest significant difference
  • The R command to perform this test is TukeyHSD. It takes an object produced
    by aov as input and gives confidence intervals for all possible differences.
    For example, for the poison data (if you want to use this command you should
    use aov for the analysis):
  • lm1 <- aov(time ~ poison + treat, data = poisons)
  • TukeyHSD(lm1)
  • poison
  •          diff        lwr         upr      p adj
  • 2-1 -0.073125  -0.2089936   0.0627436  0.3989657   not significant
  • 3-1 -0.341250  -0.4771186  -0.2053814  0.0000008   significant
  • 3-2 -0.268125  -0.4039936  -0.1322564  0.0000606   significant
  • treat
  •          diff          lwr          upr      p adj
  • B-A  0.36250000   0.18976135   0.53523865  0.0000083   significant
  • C-A  0.07833333  -0.09440532   0.25107198  0.6221729   not significant
  • D-A  0.22000000   0.04726135   0.39273865  0.0076661   significant
  • C-B -0.28416667  -0.45690532  -0.11142802  0.0004090   significant
  • D-B -0.14250000  -0.31523865   0.03023865  0.1380432   not significant
  • D-C  0.14166667  -0.03107198   0.31440532  0.1416151   not significant

17
Scheffes simultaneous confidence intervals
  • If we have a parameter vector β then a linear combination c^T β is estimable
    if there exists a vector a such that E(a^T y) = c^T β.
  • Scheffé's theorem states that a simultaneous 100(1-α)% confidence interval
    for every estimable ψ = c^T β is
  • ψ̂ ± (q F(q, n-r; α))^(1/2) (var(ψ̂))^(1/2)
  • where q is the dimension of the space of all possible contrasts, r is the
    rank of X (the design matrix) and n is the number of observations. It can
    also be applied to confidence intervals for the regression surface:
  • x^T β̂ ± (q F(q, n-r; α))^(1/2) (var(x^T β̂))^(1/2),  where
    var(x^T β̂) = σ̂² x^T (X^T X)^(-1) x
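  • As a sketch of the multiplier in that interval (hypothetical numbers; q, n
    and r as defined above):

      ## Sketch: Scheffe multiplier (q * F_{q,n-r}(alpha))^(1/2) compared with the
      ## per-comparison t multiplier.
      alpha <- 0.05
      q <- 5; n <- 48; r <- 6             # e.g. 5-dimensional contrast space, 48 observations, rank-6 design
      sqrt(q * qf(1 - alpha, q, n - r))   # Scheffe multiplier
      qt(1 - alpha/2, n - r)              # ordinary t multiplier, much smaller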

18
Testing for homogeneity of variances
  • One of the simplest tests for homogeneity of variances is Levene's test. It
    can be carried out with the following algorithm: 1) fit the linear model;
    2) calculate the residuals; 3) fit a linear model using the absolute values
    of the residuals as the response vector; and 4) test for differences. If the
    p-values are very small then the variances are not equal; otherwise the
    variances can be treated as homogeneous. There is also an R command,
    levene.test.
  • Another test for homogeneity of variances is Bartlett's test, available as
    bartlett.test in R. Bartlett's test is sensitive to violations of the
    normality assumption; Levene's test is less sensitive to such violations.
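  • As a sketch of the second test on the poisons data (bartlett.test needs a
    single grouping factor, so the two factors are combined into one; the
    Levene-type procedure is written out step by step on the next slide):

      ## Sketch: Bartlett's test for equal variances across the 12 poison/treatment cells.
      bartlett.test(time ~ interaction(poison, treat), data = poisons)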

19
Testing for homogeneity of variances
  • Example:
  • lm2 <- lm(time ~ poison + treat, data = poisons)
  • summary(lm(abs(lm2$res) ~ poison + treat, data = poisons))
  • Residual standard error: 0.09448 on 42 degrees of freedom
  • Multiple R-squared: 0.2533,  Adjusted R-squared: 0.1644
  • F-statistic: 2.849 on 5 and 42 DF,  p-value: 0.02649
  • The p-value, 0.026, is small enough (usually one looks at around 0.05) to
    conclude that the variances are not homogeneous.
  • lm2 <- lm(1/time ~ poison + treat, data = poisons)
  • summary(lm(abs(lm2$res) ~ poison + treat, data = poisons))
  • Residual standard error: 0.2634 on 42 degrees of freedom
  • Multiple R-squared: 0.1494,  Adjusted R-squared: 0.04812
  • F-statistic: 1.475 on 5 and 42 DF,  p-value: 0.2183
  • Now the p-value is sufficiently large, so we can say that the variances of
    the transformed observations are homogeneous.

20
References
  • Stuart, A., Ord, J.K. and Arnold, S. (1999) Kendall's Advanced Theory of
    Statistics, Volume 2A
  • Box, G.E.P., Hunter, W.G. and Hunter, J.S. (1978) Statistics for
    Experimenters
  • Berthold, M. and Hand, D.J. Intelligent Data Analysis
  • Cox, G.M. and Cochran, W.G. Experimental Designs
  • Dalgaard, P. Introductory Statistics with R

21
Exercise 4
  • Take the data set from
  • http://www.ysbl.york.ac.uk/garib/mres_course/2009/data4.txt
  • and analyse it using ANOVA.
  • A description of this data set is at
  • http://www.ysbl.york.ac.uk/garib/mres_course/2009/data4_descr.txt