Title: Two Factor ANOVA and the BACI sampling design
1. Two Factor ANOVA and the BACI sampling design
- Non-parametric two-factor tests
- Resampling methods for two-factor tests
2. Two Factor Designs
- Consider studying the impact of two factors on the yield (response).
- Note: the 1 and 2, etc., mean Level 1, Level 2, etc., NOT metric values.
- Here we have R = 3 rows (levels of the row factor), C = 4 columns (levels of the column factor), and n = 2 replicates per cell (n_ij for cell (i, j) if the cell counts are not all equal).
3. Model
- i = 1, ..., R
- j = 1, ..., C
- k = 1, ..., n
- In general, n observations per cell, R × C cells.
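The model equation itself did not survive transcription; a standard two-factor model with interaction, consistent with the indices above, would be:

```latex
% Assumed reconstruction of the untranscribed model equation
Y_{ijk} = \mu + \rho_i + \gamma_j + (\rho\gamma)_{ij} + \varepsilon_{ijk},
\qquad i = 1,\dots,R,\quad j = 1,\dots,C,\quad k = 1,\dots,n,
```

where μ is the grand mean, ρ_i and γ_j are the row and column effects, (ργ)_ij is the interaction, and ε_ijk is the error.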
5.
- ALL the terms are somewhat intuitive, except for the interaction term.
- The interaction term is more intuitively written in terms of three pieces (see the identity below):
- How a cell differs from the grand mean
- Adjustment for row membership
- Adjustment for column membership
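The identity being referred to was not transcribed; the usual way of writing the interaction term, matching the three labels above, is (an assumed reconstruction, with μ_{i·}, μ_{·j}, μ_{ij} the row, column, and cell means):

```latex
(\rho\gamma)_{ij}
  = \underbrace{(\mu_{ij} - \mu)}_{\text{cell vs.\ grand mean}}
  - \underbrace{(\mu_{i\cdot} - \mu)}_{\text{row adjustment}}
  - \underbrace{(\mu_{\cdot j} - \mu)}_{\text{column adjustment}} .
```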
6.
- We can, without loss of generality, assume (for a moment) that there is no error. Why then might the above equation be non-zero? Answer: INTERACTION.
- Two basic ways to look at interaction:
1)
- When B goes from BL → BH, yield goes up by 3 (5 → 8).
- When A goes from AL → AH, yield goes up by 5 (5 → 10).
- When both changes of level occur, does yield go up by the sum of 3 and 5?
- If the mean at (AH, BH) = 13, there is no interaction; if the mean at (AH, BH) ≠ 13, there is interaction.
7.
- Interaction = the degree of difference from the sum of the separate effects.
- Holding B at BL, what happens as A goes from AL → AH? +5.
- Holding B at BH, what happens as A goes from AL → AH? +9.
- If the effect of one factor (i.e., the impact of changing its level) is DIFFERENT across the levels of another factor, then INTERACTION exists between the two factors.
2)
NOTE:
- Holding A at AL, BL → BH has impact 3.
- Holding A at AH, BL → BH has impact 7.
- The interaction is the same measured either way: (9 − 5) = (7 − 3) = 4.
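The 2 × 2 table of cell means implied by the numbers above (5, 8, and 10 are stated; 17 follows from the +9 and +7 differences), and the resulting interaction:

```latex
\begin{array}{c|cc}
      & B_L & B_H \\ \hline
  A_L &  5  &  8  \\
  A_H & 10  & 17
\end{array}
\qquad
\text{interaction} = 17 - (5 + 3 + 5) = (9 - 5) = (7 - 3) = 4 .
```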
8. Means in a 2-factor ANOVA, with various effects of the factors and the interaction.
- a) No effect of factor A, small effect of factor B
- b) Large effect of factor A, small effect of factor B, and no interaction
- c) No effect of factor A, small effect of factor B, and no interaction
- d) Large effect of factor A, large effect of B, and no interaction
9. (Interaction plots, panels e–h: cell means X plotted against the factor A levels A1 and A2, with separate lines for B1 and B2.)
- e) No effect of A, no effect of B, but interaction between A and B
- f) Large effect of A, but no effect of B, with slight interaction
- g) No effect of A, large effect of B, with large interaction
- h) Effect of A, effect of B, with large interaction
10.
- Going back to the (model) equation: bringing μ to the other side of the equation, we get each observation's deviation from the grand mean written as a sum of row, column, interaction, and error pieces.
- If we then square both sides and triple-sum both sides over i, j, and k, we get (after noting that all cross-product terms cancel) the sum-of-squares identity.
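The identity itself was not transcribed; an assumed reconstruction in the usual notation (bars denote averaging over the dotted subscripts):

```latex
\sum_{i=1}^{R}\sum_{j=1}^{C}\sum_{k=1}^{n}\bigl(Y_{ijk}-\bar{Y}_{\cdots}\bigr)^2
 = nC\sum_{i}\bigl(\bar{Y}_{i\cdot\cdot}-\bar{Y}_{\cdots}\bigr)^2
 + nR\sum_{j}\bigl(\bar{Y}_{\cdot j\cdot}-\bar{Y}_{\cdots}\bigr)^2
 + n\sum_{i}\sum_{j}\bigl(\bar{Y}_{ij\cdot}-\bar{Y}_{i\cdot\cdot}-\bar{Y}_{\cdot j\cdot}+\bar{Y}_{\cdots}\bigr)^2
 + \sum_{i}\sum_{j}\sum_{k}\bigl(Y_{ijk}-\bar{Y}_{ij\cdot}\bigr)^2 .
```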
11.
- Or, TSS = SSR + SSC + SSI + SSW (total = rows + columns + interaction + within).
- And in terms of degrees of freedom: RCn − 1 = (R − 1) + (C − 1) + (R − 1)(C − 1) + RC(n − 1).
- In our example (R = 3, C = 4, n = 2): 23 = 2 + 3 + 6 + 12.
13. ANOVA
- Rows: H0: all row means equal; H1: not all row means equal. FTV(2, 12) = 3.88 → Reject H0.
- Columns: H0: all column means equal; H1: not all column means equal. FTV(3, 12) = 3.49 → Accept H0.
- Interaction: H0: no interaction between factors; H1: there is interaction between the factors. FTV(6, 12) = 3.00 → Accept H0.
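For reference, a two-factor ANOVA table of this form can be produced in Python with statsmodels; the data frame below is made up for illustration (the slide's original data were not transcribed, so the F values above are not reproduced):

```python
# Sketch: two-factor ANOVA with replication (R = 3 rows, C = 4 columns, n = 2).
# The response values are illustrative only.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "row": [r for r in "ABC" for _ in range(8)],                  # 3 row levels
    "col": ["c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4"] * 3,  # 4 column levels, 2 reps each
    "y":   [12, 14, 11, 13, 15, 16, 10, 12,
            18, 17, 13, 15, 19, 21, 14, 16,
            11, 13, 12, 11, 14, 15, 13, 12],
})

model = ols("y ~ C(row) * C(col)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))   # SS, d.f., and F for rows, columns, interaction, residual
```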
14. An issue to think about
- Since V_intn cannot be negative, and MSI = 1.83, there is no strong evidence that V_intn is not 0.
- If this is true, E(MSI) = σ², and we should combine (i.e., pool) the MSI and MSW estimates. This gives the pooled mean square sketched below.
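A sketch of the pooling step (an assumed reconstruction of the untranscribed formulas, using the degrees of freedom from the ANOVA above):

```latex
MS_{pooled} = \frac{SSI + SSW}{(R-1)(C-1) + RC(n-1)} = \frac{SSI + SSW}{6 + 12},
```

and the row and column F ratios are then re-formed with MS_pooled (18 d.f.) in the denominator.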
15. Another Issue
- The table of four slides ago assumes what is called a Fixed Model. There is also what is called a Random Model, and a Mixed Model (e.g., column factor fixed, row factor random).
16.
- Fixed: specific levels chosen by the experimenter.
- Random: levels chosen randomly from a large number of possibilities.
- Fixed: all levels about which inferences are to be made are included in the experiment.
- Random: levels are some of a large number possible.
- Fixed: a definite number of qualitatively distinguishable levels, and we plan to study them all; or a continuous set of quantitative settings, but we choose a suitable, definite subset in a limited region and confine inferences to that subset.
- Random: levels are a random sample from an infinite population.
17.
- In a great number of cases the investigator may argue either way, depending on his mood and his handling of the subject matter. In other words, it is more a matter of assumption than of reality.
- Some authors say that if in doubt, assume the fixed model. Others say things like "I think in most experimental situations the random model is applicable." The latter quote is from a person whose experiments are in the field of biology.
18. Two Factors with No Replication, No Interaction
- When there's no replication, there is no "pure" way to estimate ERROR.
- Error is measured by considering more than one observation (i.e., replication) at the same treatment combination (i.e., the same experimental conditions).
19.
- Our model for analysis is, technically, the two-factor model without the interaction term.
- We can write each observation's deviation from the grand mean as a sum of row, column, and residual pieces.
- After bringing μ to the other side of the equation, squaring both sides, and double-summing over i and j, we find the identity below.
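An assumed reconstruction of the untranscribed model and identity: with no replication the interaction term is dropped, and the decomposition becomes

```latex
Y_{ij} = \mu + \rho_i + \gamma_j + \varepsilon_{ij},
\qquad
\sum_{i}\sum_{j}\bigl(Y_{ij}-\bar{Y}_{\cdot\cdot}\bigr)^2
 = C\sum_{i}\bigl(\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot}\bigr)^2
 + R\sum_{j}\bigl(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot}\bigr)^2
 + \sum_{i}\sum_{j}\bigl(Y_{ij}-\bar{Y}_{i\cdot}-\bar{Y}_{\cdot j}+\bar{Y}_{\cdot\cdot}\bigr)^2 .
```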
20.
- Degrees of freedom: RC − 1 = (R − 1) + (C − 1) + (R − 1)(C − 1).
- We know the expected mean squares of the row, column, and residual terms.
- If we assume there is no interaction,
- then we can call the residual term the error term.
21.
- And our ANOVA table may be rewritten accordingly,
- and the label on the last line would become Error;
- in our problem the d.f. are 2 (rows), 3 (columns), and 6 (error).
22.
- And the F ratios for rows and columns are then formed against the error mean square.
- What if we're wrong about there being no interaction?
ANOVA (at α = .01):
- FTV(3, 6) = 9.78
- FTV(2, 6) = 10.93
- TSS = 62 (11 d.f.)
23.
- If we think our ratio is, in expectation, one quantity (say, for ROWS), and it really is another (because there's interaction), being wrong can lead only to an underestimated Fcalc (see the sketch below).
- Thus, if we've REJECTED H0, we can feel confident of our conclusion, even if there's interaction.
- If we've ACCEPTED H0, only then could the no-interaction assumption be CRITICAL.
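A sketch of the expected-mean-square argument (an assumed reconstruction of the untranscribed ratios; V_rows and V_intn denote the row and interaction components):

```latex
\text{We think:}\quad
\frac{E(MS_{ROWS})}{E(MS_{ERROR})} = \frac{\sigma^2 + C\,V_{rows}}{\sigma^2},
\qquad
\text{with interaction:}\quad
\frac{E(MS_{ROWS})}{E(MS_{ERROR})} = \frac{\sigma^2 + C\,V_{rows}}{\sigma^2 + V_{intn}} .
```

Because the true denominator is at least as large as the one we assumed, Fcalc can only be pulled downward, never inflated.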
24. Non-parametric 2-Factor ANOVA with replications
- If the assumptions of normality and constant variance are not met by the data, rank the data, then use the usual parametric ANOVA on the ranked data (see the sketch below).
- Using ranks is more robust than finding a transformation that works.
- If there is no interaction, you can use the 2-factor with replication procedure given in Conover (1980).
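A minimal sketch of the rank-transform approach just described (illustrative data; variable names are hypothetical):

```python
# Rank the pooled observations, then run the usual two-factor ANOVA on the ranks.
import pandas as pd
from scipy.stats import rankdata
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "row": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "col": ["c1", "c1", "c2", "c2"] * 3,
    "y":   [3.1, 2.9, 8.7, 15.2, 4.0, 3.5, 9.9, 14.1, 2.7, 3.3, 7.8, 16.0],
})

df["rank_y"] = rankdata(df["y"])          # average ranks are assigned to ties
model = ols("rank_y ~ C(row) * C(col)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))    # the usual parametric ANOVA, applied to the ranks
```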
25. Non-parametric alternative to 2-Factor ANOVA without replication: Friedman's Test
- Example: TSS at 9 sites during 4 seasons.
- H0: M_A = M_B = M_C = M_D; HA: not all median TSS are equal during the 4 seasons.
26. Convert to ranks within each row.
27.
- Test Statistic: FR (the Friedman statistic).
- For the given problem, R = 9, C = 4.
- Under the null hypothesis, FR may be approximated by a Chi-Square distribution with (C − 1) d.f.
- For our problem with 3 d.f., the critical value of the Chi-Square distribution at α = 5% is 7.815.
- Since 20.03 > 7.815, we reject the null hypothesis and conclude that there are differences among the seasons with respect to TSS.
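A sketch of Friedman's test in Python with scipy; the 9 × 4 matrix below is simulated stand-in data (the slide's TSS values were not transcribed), so it will not reproduce FR = 20.03:

```python
# Friedman's test: blocks (sites) are rows, treatments (seasons) are columns.
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(1)
tss = rng.gamma(shape=4.0, scale=5.0, size=(9, 4))   # 9 sites x 4 seasons (stand-in values)

# One argument per season; each contains the 9 site (block) measurements.
stat, p = friedmanchisquare(tss[:, 0], tss[:, 1], tss[:, 2], tss[:, 3])
print(f"FR = {stat:.2f}, p = {p:.4f}")   # compare FR against chi-square with C - 1 = 3 d.f.
```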
28. Stratified Shuffling
- Shuffling or randomization in its simplest form is used to test the generic null hypothesis that one variable (or group of variables) is unrelated to another variable (or group of variables).
- Significance is assessed by shuffling one variable (or set of variables) relative to the other variable (or set of variables). Shuffling ensures that there is in fact no relationship between the variables.
- If the variables are related, then the original unshuffled data should be unusual relative to the values of the test statistic obtained by shuffling. When a blocking factor is present, observations cannot be shuffled freely across blocks; hence each block must be treated as a stratum, and shuffling is done within blocks.
29.
- Consider the case of 2 blocks (nests) and 3 treatments (distance), with the R.V. (changes in exposure times, in seconds) as given below.
30.
- To test whether distance has an effect, we can use the test statistic given by the pairwise sum of squared differences of the mean exposure times at each distance. That is, the observed SSD for the example above is
SSD = (mean @ 0.75 − mean @ 1.25)² + (mean @ 0.75 − mean @ 2.5)² + (mean @ 1.25 − mean @ 2.5)²
31.
- The question now is: how likely is it that the observed value of 12.48 is equaled or exceeded by chance alone, i.e., if in fact there is no distance effect? If that probability is very low (less than 0.05), we say that it is unlikely that there is no distance effect. Hence the hypothesis of no distance effect is rejected.
- If there is no distance effect, we can in fact combine the data within each stratum (nest) and shuffle them.
- For example, for nest 1, the combined data are 2, -1, 6, 5, 7, 0, and 8. If we shuffle them once (randomly rearrange them), we may get 0, 7, 6, 2, 8, -1, 5. Hence, the values 0, 7, 6 could have been at the 0.75 nm distance, 2 and 8 could have been at the 1.25 nm distance, and -1 and 5 could have been at the 2.5 nm distance.
32.
- Similarly, we do the same for the nest 2 data. After one cycle of shuffling, we would get one value of SSD. Repeating this, say, 10,000 times, we get 10,000 values of SSD, giving us a (null) sampling distribution of SSD (see the sketch below).
- An estimate of the p-value is obtained by counting the proportion of SSDs greater than or equal to the observed SSD.
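A sketch of the stratified shuffling procedure, shuffling within each nest only. The nest 1 values and the 3/2/2 split across distances follow the example above; the nest 2 values are placeholders (they were not transcribed), so the observed SSD of 12.48 will not be reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observations per nest, listed in distance order (0.75, 1.25, 2.5), with group sizes.
nest1 = np.array([2, -1, 6, 5, 7, 0, 8], dtype=float)   # from the slide
sizes1 = [3, 2, 2]                                      # 3 obs @ 0.75, 2 @ 1.25, 2 @ 2.5
nest2 = np.array([1, 4, 3, 9, 2, 6, 5], dtype=float)    # placeholder values
sizes2 = [3, 2, 2]                                      # assumed same layout as nest 1

def ssd(nests):
    """Pairwise sum of squared differences of the mean exposure time at each
    distance, pooling each distance's observations across both nests."""
    totals, counts = np.zeros(3), np.zeros(3)
    for values, sizes in nests:
        start = 0
        for g, s in enumerate(sizes):
            totals[g] += values[start:start + s].sum()
            counts[g] += s
            start += s
    m = totals / counts
    return (m[0] - m[1])**2 + (m[0] - m[2])**2 + (m[1] - m[2])**2

observed = ssd([(nest1, sizes1), (nest2, sizes2)])

n_shuffles = 10_000
exceed = 0
for _ in range(n_shuffles):
    shuffled = [(rng.permutation(nest1), sizes1),   # shuffle within each nest (stratum) only
                (rng.permutation(nest2), sizes2)]
    if ssd(shuffled) >= observed:
        exceed += 1

print(f"observed SSD = {observed:.2f}, estimated p-value = {exceed / n_shuffles:.4f}")
```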