Title: Multiple comparisons
1. Chapter 13
- Multiple comparisons
- Also known as: post hoc tests
2. What ANOVA does and doesn't do
- Does
  - Prevents the large number of t tests from generating a type I error
  - Tells us if the variance between groups is significantly greater than the variance within groups
  - In other words, whether your groups mean anything
- Doesn't
  - Tell us which levels are different from other levels
3. What ANOVA does and doesn't do
- This chapter will present a number of tests to help investigate specific differences.
- These are called post hoc tests, from the Latin meaning "after this."
- These tests are done after the ANOVA.
4. What specific level-difference tests need to do
- Tell us which levels are significantly different from which other levels
- Continue to protect us from type I errors, as ANOVA did
5. Fisher's protected t-test
- Fisher's idea is that, if we run the ANOVA first, and only run pair-wise t tests if the ANOVA shows significance, then we are protected against the type I error which can occur due to repetitive tests.
- The ANOVA shows that there is in fact a significant difference somewhere in the many pairs of levels.
- There is a problem hidden in this, however. What is it?
6. Fisher's protected t-test
- Fisher's test is similar to the t-test, except for the estimate of the variance:
  t = (X̄1 - X̄2) / √( MSW (1/n1 + 1/n2) )
7. Fisher's protected t-test
- MSW can substitute for the pooled variance when we assume homogeneity of variance.
- Under these conditions, it is actually a better estimate.
- Why?
8. Fisher's protected t-test
- Because if all the variances are the same,
- and we are averaging over more samples,
- we get a better estimate than if we were averaging over just two variances.
9. Fisher's protected t-test
- The other advantage of MSW is that the degrees of freedom are dfW = NT - k, more than the n1 + n2 - 2 available to an ordinary pooled t-test.
10. Fisher's protected t-test
- Fisher can be applied to as many pairs of levels as you are interested in.
11. Fisher's protected t-test
- Now for the weakness.
- Suppose that the ANOVA allows for the pair-wise tests.
- It will do so if F is large.
- But F can be large based on as few as one pair difference.
12. Fisher's protected t-test
- Suppose there are a large number of pairs.
- Say a few of these have different means:
  - µ1 ≠ µ2, µ1 ≠ µ3, µ3 ≠ µ2
- All of the other pairs could follow the null hypothesis:
  - µ1 = µ4, µ1 = µ5, µ1 = µ6, µ2 = µ4, µ2 = µ5, µ2 = µ6, µ3 = µ4, µ3 = µ5, µ3 = µ6, µ4 = µ5, µ4 = µ6, etc.
- There are so many of these other pairs that we could still get a type I error in a pair for which the null hypothesis is true (see the sketch below).
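A quick Python illustration of how fast the familywise error rate grows with the number of pairs. This is a sketch only: it treats the pairwise tests as independent, which tests sharing the same data are not, so it shows the trend rather than exact probabilities.

    alpha = 0.05
    for k in (3, 4, 6, 10):              # number of levels
        m = k * (k - 1) // 2             # number of pairwise comparisons
        familywise = 1 - (1 - alpha) ** m
        print(f"k={k:2d}  pairs={m:2d}  P(at least one type I error) ~ {familywise:.3f}")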
13. Fisher's protected t-test
- So, with Fisher, the ANOVA prevents you from finding different pairs when there are none,
- but it doesn't keep you from finding a few more different pairs than you should.
14. Fisher's protected t-test
- Nevertheless, Fisher is the post hoc test with the most power for k = 3.
- And because we do not have a large number of pairs, the increase in type I error is not significant.
- When k = 3 there are 3 pairs.
- Generally, given k levels, how many pairs are there?
15. Fisher's protected t-test
- Number of pairs in k levels: k(k - 1)/2
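This is easy to check in Python with the standard library:

    from math import comb

    # k choose 2 = k(k - 1)/2 distinct pairs among k levels
    for k in range(3, 7):
        print(k, "levels ->", comb(k, 2), "pairs")   # 3, 6, 10, 15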
16. Fisher's protected t-test: Example
- A researcher wants to know if strenuous exercise tends to delay the onset of puberty (see exercise 12B10).
- The age of the onset of puberty is measured for 6 young athletes, 4 violin players, and 7 controls.
- The collected data appear as follows.
17. Fisher's protected t-test: Example
- An ANOVA (conducted using the methods from chapter 12) shows significant variance between these groups.
- Next we want to know which groups are different from which other specific groups.
18. Fisher's protected t-test: Example
- The ANOVA provided all the information needed to compute t.
19. Fisher's protected t-test: Example
- Plug and chug for each pair.
20. Fisher's protected t-test: Example
- Find t_crit for α = .05, two-tailed, and dfW = NT - k = 17 - 3 = 14.
- t_crit = 2.145
21. Fisher's protected t-test: Example
- Thus:
  - athletes are different from controls and musicians,
  - but musicians are not significantly different from controls.
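A minimal Python sketch of the whole Fisher procedure. The group means and MSW below are hypothetical placeholders (the actual data of exercise 12B10 are not reproduced on these slides); only the sample sizes (6, 4, 7) and dfW = 14 come from the example.

    import numpy as np
    from scipy import stats

    means = {"athletes": 15.0, "musicians": 13.0, "controls": 13.2}  # hypothetical
    ns    = {"athletes": 6,    "musicians": 4,    "controls": 7}     # from the example
    MSW   = 1.5                                  # hypothetical within-group mean square
    df_W  = sum(ns.values()) - len(ns)           # NT - k = 17 - 3 = 14
    t_crit = stats.t.ppf(1 - 0.05 / 2, df_W)     # two-tailed, alpha = .05 -> 2.145

    groups = list(means)
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            a, b = groups[i], groups[j]
            se = np.sqrt(MSW * (1 / ns[a] + 1 / ns[b]))  # MSW replaces the pooled variance
            t = (means[a] - means[b]) / se
            verdict = "significant" if abs(t) > t_crit else "not significant"
            print(f"{a} vs {b}: t = {t:+.2f} ({verdict})")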
22. What if k > 3? Introduction to Tukey
- If k > 3, Fisher may be too liberal in finding significance.
- Resort to Tukey's honestly significant difference (HSD) test.
- Key idea: if the largest and smallest means are close together, all the means must be close together.
23. Tukey: Range spread
- Recall that the pair (min, max) is called the range.
- In our case we are going to consider the range of the means.
- This range underlies a new test statistic, the studentized range statistic q.
- As k increases, it is likely that the range will spread out.
- Imagine throwing darts at a dart board.
- More darts means a wider spread, because there is more opportunity for wild shots to occur.
- This is true even under the null hypothesis (all darts thrown at a single bull's eye).
24. Tukey: Example of range spread with increasing k
- A psychologist is studying the relationship between food color and appetite.
- Cookies are baked with 3 colors of frosting (the independent variable, or factor):
  - Green
  - Red
  - Blue
- The cookies are provided to 3 groups of students while they perform a boring task.
- The number of cookies eaten by each student is recorded (the dependent variable).
25. Tukey: Example of range spread with increasing k
- We will assume that there is, in fact, no difference in appetite caused by the color (the null is true).
26. Tukey: Example of range spread with increasing k
27. Tukey: Example of range spread with increasing k
- An ANOVA gives the following results:
  - F = .794 (which is not significant)
  - X̄green = 3.7
  - X̄red = 4.7
  - X̄blue = 2.8
- The range of the means is then 4.7 - 2.8 = 1.9.
28. Tukey: Example of range spread with increasing k
- Now suppose we add another color (sample).
29. Tukey: Example of range spread with increasing k
- Now, in accordance with our thought experiment, the null hypothesis is still true.
- However, the range has now increased:
  - X̄green = 3.7
  - X̄red = 4.7
  - X̄blue = 2.8
  - X̄purple = 5.0
- The range is now 5.0 - 2.8 = 2.2 (up from 1.9).
- This is purely by chance.
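The chance-driven spread of the range is easy to demonstrate by simulation. A sketch, with made-up population parameters, under a true null:

    import numpy as np

    rng = np.random.default_rng(0)
    n_per_group, n_sims = 10, 10_000

    # Every group is drawn from the SAME population (the null is true),
    # yet the range of the k sample means grows as k increases.
    for k in (2, 3, 4, 6):
        means = rng.normal(0, 1, size=(n_sims, k, n_per_group)).mean(axis=2)
        ranges = means.max(axis=1) - means.min(axis=1)
        print(f"k={k}: average range of the {k} sample means = {ranges.mean():.3f}")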
30. Tukey
- So, how does Tukey take this range spreading into account?
- He simply modifies the t distribution, creating a new one.
31. Tukey's test
- The test statistic is called q, the studentized range statistic:
  q = (X̄max - X̄min) / √(MSW / n)
- This is the same as the Fisher test, except:
  - for a factor of the square root of 2,
  - and a different distribution table is used.
32. Tukey's test
- The missing √2 in Tukey is factored into his new distribution.
- He didn't have to do it this way (but he did).
- The main difference in doing the Tukey test vs. Fisher is the critical value of t (or, in this case, q).
- Since q_crit is defined within a new distribution, we must look it up in a new table (A11).
- To use table A11, use k to select the column.
- Use dfW to select the row.
33. Tukey's test
- Notice that in calculating the test statistic there are no ni, just n.
- This is because Tukey works best when the sample sizes are equal.
- That is the situation where it is usually used.
34. Tukey's test
- However, if the ni are not equal, but close to equal, we can use the harmonic mean for n.
- Recall that when we introduced summary statistics and measures of central tendency, we learned that there were some exotic variants on the mean.
- Well, here is your chance to use one.
35. Tukey: Example
- Let's use the data from the Fisher example.
36. Tukey's test
- Since the ni are not equal, we will have to compute the harmonic mean.
- We have everything else from doing the ANOVA.
37. Tukey's test
- Plug the ni into the formula for the harmonic mean:
  nh = k / (1/n1 + 1/n2 + ... + 1/nk) = 3 / (1/6 + 1/4 + 1/7) ≈ 5.36
- Note that if the ni were all equal, we could just use n.
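In Python, using the sample sizes from the example:

    # Harmonic mean of the unequal sample sizes (6, 4, 7)
    ns = [6, 4, 7]
    n_h = len(ns) / sum(1 / n for n in ns)
    print(round(n_h, 2))   # ~5.36; if all ni were equal, this reduces to n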
38. Tukey's test
- Plug everything into the formula for q.
39. Tukey's test
- k = 3, dfW = (6 + 4 + 7) - 3 = 14.
- From table A11, we get q_crit = 3.70.
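The table lookup can be checked in Python: scipy.stats.studentized_range (available in SciPy 1.7 and later) implements the same distribution as table A11.

    from scipy.stats import studentized_range

    # q_crit for alpha = .05, k = 3 means, df_W = 14
    q_crit = studentized_range.ppf(0.95, 3, 14)   # args: probability, k, df
    print(round(q_crit, 2))                       # ~3.70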
40. Tukey's test
- So, athletes are different from musicians and controls.
- q_crit = 3.70
41. Tukey's test
- The conclusions are the same as Fisher gave.
- We didn't really need to use Tukey here, because k = 3 suggests using Fisher.
- Also, we might not have used Tukey because of the unequal ni.
- However, this example shows how to do the computation.
- The results were consistent with Fisher but, as expected, more conservative.
- We barely made significance between musicians and athletes.
42. Exercises
- Page 349: compute Fisher and Tukey post hocs for the data in problem 7.
- Page 372: problems 1, 3, 7, 9, 10.