Title: Tests after a significant F
1. Tests after a significant F
- 1. The F test is only a preliminary analysis
- 2. Planned comparisons vs. post-hoc comparisons
- 3. What goes in the denominator of our test?
- 4. What happens to α when we make multiple comparisons among means?
- 5. t-test for planned comparisons
- 6. Tukey's HSD test for post-hoc comparisons
- 7. Newman-Keuls test for post-hoc comparisons
2. An aside
- We have a set of treatment means, e.g., X̄1, X̄2, X̄3, X̄4.
- From this set, we can form a number of pairs for comparisons of treatment means. Here are just a few examples of the possible pairs:
- X̄1 vs. X̄2
- X̄1 vs. X̄3
- X̄2 vs. X̄3
3. The F test is only a preliminary analysis
- You have a number of treatments (levels of the independent variable).
- Each treatment produces a treatment mean.
- The significant F tells you only that there is a difference among these means somewhere.
- Pairwise comparisons of the means are then necessary to pinpoint exactly where your effect is.
4. Planned comparisons
- Planned comparisons are tests of differences among the treatment means that you designed your experiment to make possible. Is X̄1 different from X̄2?
- We usually don't do all possible comparisons among the entire set of treatment means.
- We choose a few specific comparisons on the basis of a theory of the behavior being studied.
5. Planned comparisons
- Doing only a few comparisons is important for two reasons:
- 1. With α = .05, we would expect to reject H0 by mistake once in 20 tests.
- If you do all possible comparisons, you might do 20 tests for one experiment, so the odds are good that one of them will be significant by chance.
6. Planned comparisons
- 2. When you select a few comparisons out of the set of all possible comparisons, you put your theory in jeopardy.
- Such specific predictions (of differences between means) are unlikely to be correct by chance.
- If you put your theory in jeopardy and it survives, you have more confidence in your theory.
- If it doesn't survive, at least you know the theory was wrong.
7. Planned comparisons
- Because we only do a few comparisons when using planned comparisons, we do not need to adjust α.
- We do not correct for a higher probability of Type 1 error when doing a small number of planned comparisons.
8. The denominator of our t-test
- In a Completely Randomized design, a planned comparison uses an independent-groups t-test.
- The t-test requires an estimate of σ² for the denominator.
- Where should that estimate come from?
9. The denominator of our t-test
- Previously, to estimate σ², we used a pooled variance based on the two sample variances (s²P).
- In the CRD ANOVA, each sample variance gives an independent estimate of σ².
- But the average of the sample variances gives a better estimate of σ².
10. The denominator of our t-test
- In the ANOVA design, we have multiple samples, so we have multiple sample variances.
- We can use all of these sample variances to compute an estimate of σ².
- In fact, we have already computed such an estimate: the Mean Square Error (MSE) produced for the ANOVA.
11. Planned Comparisons t-test
- t_obt = (X̄i − X̄j) / √[ MSE (1/ni + 1/nj) ]
- Choose the pair of means you want to test.
- Find MSE in the ANOVA summary table.
- Feed these values into the equation above.
- Evaluate t_obt against t_α with df = df for MSE.
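A minimal sketch of this computation in Python, assuming MSE comes from the ANOVA summary table; the helper name and the use of scipy.stats.t for the critical value are illustrative choices, not part of the slides:

```python
# Planned-comparison t-test using MSE from the ANOVA as the error term.
from math import sqrt
from scipy import stats

def planned_t(mean_i, mean_j, mse, n_i, n_j, df_error, alpha=0.05, one_tailed=True):
    t_obt = (mean_i - mean_j) / sqrt(mse * (1 / n_i + 1 / n_j))
    tail = 1 - alpha if one_tailed else 1 - alpha / 2
    t_crit = stats.t.ppf(tail, df_error)   # critical value with df = df for MSE
    return t_obt, t_crit

# Numbers from Example 1 below (group 1 vs. group 4): means 77 and 58.75, MSE = 53.979, n = 4, df = 12
print(planned_t(77, 58.75, 53.979, 4, 4, 12, alpha=0.01))  # roughly (3.51, 2.68)
```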
12. Post-hoc tests
- Post-hoc tests are also tests of differences among treatment means.
- Here, you decide which means you want to test post-hoc, that is, after looking at the data.
- Post-hoc means "after the fact": after collecting and looking at the data.
- A priori comparisons are those decided on before data collection: differences predicted on the basis of theory.
13. Post-hoc tests
- The problem for post-hoc tests is α.
- If you do one test with α = .05, the long-run probability of a Type 1 error is .05.
- But when you do many such comparisons, the probability of at least one Type 1 error is no longer .05. It is roughly .05 × k, where k = the number of comparisons (see the numeric sketch below).
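A quick numeric sketch of that inflation; the exact expression 1 − (1 − α)^k assumes the k tests are independent and is shown only for comparison with the slide's rough α × k figure:

```python
# Family-wise Type 1 error rate when k comparisons are each run at alpha = .05.
alpha, k = 0.05, 20
rough = alpha * k                 # the slide's approximation
exact = 1 - (1 - alpha) ** k      # exact value for k independent tests
print(rough, round(exact, 3))     # 1.0 vs. about 0.642
```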
14. Post-hoc tests
- IMPORTANT POINT
- Even if you do not do all possible comparisons among a set of means explicitly, if you just test the biggest difference among all the pairs of means, you have implicitly tested all the others.
- This means that the problem alluded to on the previous slides always exists for post-hoc tests.
15. Two types of Post-hoc Tests
- 1. Tukey's Honestly Significant Difference (HSD)
- Compares all possible pairs of means.
- Maintains the Type 1 error rate at α for the entire set of comparisons.
- Q_obt = (X̄i − X̄j) / √(MSE/n)
- (n = sample size per group)
16. Tukey's HSD test
- To evaluate Q_obt, get Q_crit from the table. You will need:
- df = df for MSE
- k = number of samples in the experiment
- α
- In Tukey's HSD test, use the same Q_crit for all the comparisons in the experiment (a short software sketch follows below).
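As a rough cross-check on the table lookup, the studentized-range critical value can also be computed in software. A minimal sketch, assuming SciPy 1.7 or later (which provides scipy.stats.studentized_range); the helper name is illustrative:

```python
# Tukey HSD: Q_obt = (Xbar_i - Xbar_j) / sqrt(MSE / n), one Q_crit for every pair.
from math import sqrt
from scipy.stats import studentized_range

def tukey_q(mean_i, mean_j, mse, n, k, df_error, alpha=0.05):
    q_obt = abs(mean_i - mean_j) / sqrt(mse / n)
    q_crit = studentized_range.ppf(1 - alpha, k, df_error)  # depends only on k, df, and alpha
    return q_obt, q_crit

# Example 1 numbers (k = 4 groups, n = 4, MSE = 53.979, df = 12), largest vs. smallest mean:
print(tukey_q(77, 58.75, 53.979, 4, 4, 12))  # roughly (4.97, 4.20)
```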
17. Tukey's HSD test
- NOTE:
- If sample sizes are not equal, use the harmonic mean of the sample sizes:
- ñ = k / Σ(1/ni)
- (k = number of samples)
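A tiny numeric sketch of the harmonic-mean rule; the unequal sample sizes here are hypothetical, and Python's statistics.harmonic_mean gives the same result:

```python
# Harmonic mean of sample sizes: n_tilde = k / sum(1 / n_i)
from statistics import harmonic_mean

ns = [4, 4, 3, 5]                                      # hypothetical unequal group sizes
n_tilde = len(ns) / sum(1 / n for n in ns)
print(round(n_tilde, 3), round(harmonic_mean(ns), 3))  # both about 3.871
```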
18. Two types of Post-hoc tests
- 2. Newman-Keuls test
- The N-K is like Tukey's HSD in that it makes all possible comparisons among the sample means, and in that it uses the Q statistic.
- N-K differs from HSD in that Q_crit varies for different comparisons.
19. Newman-Keuls test
- As with HSD:
- Q_obt = (X̄i − X̄j) / √(MSE/n)
- (n = sample size)
- Evaluate Q_obt against Q_crit obtained from the table, using df, α, and r.
- r may vary for different comparisons.
20. Newman-Keuls test
- To find r for a given comparison, begin by ordering the sample means from highest to lowest.
- r is then the number of means spanned by the comparison you want to make. For example, with the ordered means
- X̄1 = 77, X̄3 = 74, X̄2 = 72.5, X̄4 = 58.75
- X̄1 vs. X̄4 spans all four means, so r = 4
- X̄3 vs. X̄2 spans two means, so r = 2
- X̄1 vs. X̄2 spans three means, so r = 3
- (a short code sketch of this counting rule follows below)
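A short sketch of that counting rule; the means are the ones shown above and the helper name is illustrative:

```python
# r = number of means spanned by a comparison, after ordering means from highest to lowest.
means = {"X1": 77.0, "X3": 74.0, "X2": 72.5, "X4": 58.75}

def span_r(a, b, means):
    order = sorted(means, key=means.get, reverse=True)  # highest to lowest
    i, j = order.index(a), order.index(b)
    return abs(i - j) + 1                               # count both endpoints

print(span_r("X1", "X4", means), span_r("X3", "X2", means), span_r("X1", "X2", means))  # 4 2 3
```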
21. Example 1
- 1. Students taking Summer School courses sometimes attempt to take more than one course at the same time and/or have a full-time job on top of their course(s). To study the effect that these situations may have on a student's performance, four randomly selected students in each of four conditions are compared on their final exam grades in the statistics course they all took.
22. Example 1
- a. Prior to data collection, it was predicted that students taking just one course (no job) would obtain a significantly higher mean final exam grade than students in the two-courses-plus-job group. It was also predicted that the mean final exam grade of students in the two courses (no job) group would not differ significantly from that of students in the one-course-plus-job group. Perform the necessary analyses to determine whether these predictions are borne out by the data, using α = .01 for each prediction.
23. Example 1a
- Notice these words:
- "Prior to data collection, it was predicted that..."
- That means this question calls for a planned comparison, so to answer it you do not have to do the ANOVA first, as you would if this were a post-hoc test. But you do need MSE.
24. Example 1
- We have the raw data, so we can use the computational formulas learned last week:
- CM = (ΣXi)²/n = 1129²/16 = 79665.0625
- SSTotal = ΣXi² − CM
- SSTreat = Σ(Ti²/ni) − CM
- SSE = SSTotal − SSTreat
25. Example 1
- The data:
- S only   S + C.S.   S + Job   S + C.S. + J
-   78        67         74          59
-   69        72         63          62
-   86        74         81          68
-   75        77         78          46
- T: 308      290        296         235
26. Example 1
- SSE = ΣXi² − Σ(Ti²/ni)
- ΣXi² = 78² + 69² + ... + 46² = 81099
- Σ(Ti²/ni) = (308² + 290² + 296² + 235²)/4 = 80451.25
- (each ni = 4)
27. Example 1
- SSE = SSTotal − SSTreat
- = (ΣXi² − CM) − (Σ(Ti²/ni) − CM)
- = ΣXi² − Σ(Ti²/ni) − CM + CM
- = ΣXi² − Σ(Ti²/ni)
28. Example 1
- SSE = 81099 − 80451.25 = 647.75
- MSE = SSE/df = SSE/(n − p) = 647.75/12 = 53.979
- Now, we're ready to make the comparisons.
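Before moving on to the comparisons, a short Python sketch recomputes CM, the sums of squares, and MSE from the raw scores listed above (variable names are illustrative):

```python
# Recompute Example 1's CM, SS terms, and MSE from the raw data.
groups = [
    [78, 69, 86, 75],   # Stats only
    [67, 72, 74, 77],   # Stats + second course
    [74, 63, 81, 78],   # Stats + job
    [59, 62, 68, 46],   # Stats + second course + job
]
scores = [x for g in groups for x in g]
n_total = len(scores)                                        # 16
cm = sum(scores) ** 2 / n_total                              # 79665.0625
ss_total = sum(x ** 2 for x in scores) - cm                  # 1433.9375
ss_treat = sum(sum(g) ** 2 / len(g) for g in groups) - cm    # 786.1875
ss_error = ss_total - ss_treat                               # 647.75
mse = ss_error / (n_total - len(groups))                     # about 53.979
print(cm, ss_total, ss_treat, ss_error, round(mse, 3))
```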
29. Example 1
- H0: µ1 = µ4
- HA: µ1 > µ4
- Rejection region: t_obt > t(n−p, α) = t(12, .01) = 2.681
- Reject H0 if t_obt > 2.681
30. Example 1
- 1 vs. 4:
- t = (77 − 58.75) / √(53.979/4 + 53.979/4)
- = 18.25 / 5.195 = 3.513. Reject H0.
- (prediction is supported)
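The same comparison can be checked in software (a sketch; scipy.stats.t reproduces the tabled critical value):

```python
# Comparison 1 vs. 4: t_obt against the one-tailed .01 critical value, df = 12.
from math import sqrt
from scipy import stats

mse, n, df = 53.979, 4, 12
t_obt = (77 - 58.75) / sqrt(mse / n + mse / n)
t_crit = stats.t.ppf(0.99, df)              # one-tailed, alpha = .01
print(round(t_obt, 3), round(t_crit, 3))    # about 3.513 and 2.681
```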
31. Example 1
- H0: µ2 = µ3
- HA: µ2 ≠ µ3
- Rejection region: |t_obt| > t(n−p, α/2) = t(12, .005) = 3.055
- Reject H0 if |t_obt| > 3.055
32. Example 1
- 2 vs. 3:
- t = (72.5 − 74) / 5.195 = −0.29
- |t| = 0.29 < 3.055
- Do not reject H0.
33. Example 1
- b. After data collection, it was decided to compare the mean final exam grades of the one course (no job) and two courses (no job) groups, and also to compare the mean grade of the one-course-plus-job group with the two-courses-plus-job group. Each comparison was to be tested with α = .05. Perform the appropriate procedures.
34. Example 1b
- Notice these words:
- "After data collection, it was decided to compare..."
- This is a post-hoc test. That means we have to do the ANOVA first (by definition, the ANOVA is the "hoc" that this test is "post").
35. Example 1
- H0: µ1 = µ2 = µ3 = µ4
- HA: At least two means differ significantly
- Rejection region: F_obt > F(3, 12, .05) = 3.49
- SSTreat = 80451.25 − 79665.0625 = 786.1875
- SSTotal = 81099 − 79665.0625 = 1433.9375
36. Example 1
- Source       df      SS         MS         F
- Treatment     3    786.1875   262.0625   4.85
- Error        12    647.75      53.979
- Total        15   1433.9375
- Decision: Reject H0. Now, do the post-hoc test.
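Before the post-hoc tests, the ANOVA itself can be cross-checked directly on the raw scores (a sketch using scipy.stats.f_oneway):

```python
# One-way ANOVA on the four Example 1 groups; should reproduce F(3, 12) of about 4.85.
from scipy import stats

f_obt, p_value = stats.f_oneway(
    [78, 69, 86, 75],   # Stats only
    [67, 72, 74, 77],   # Stats + second course
    [74, 63, 81, 78],   # Stats + job
    [59, 62, 68, 46],   # Stats + second course + job
)
print(round(f_obt, 2), round(p_value, 3))   # F about 4.85, p below .05
```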
37. Example 1
- Using the Newman-Keuls procedure, order the means:
- X̄1 = 77, X̄3 = 74.0, X̄2 = 72.5, X̄4 = 58.75
- Comparison 1: one course, no job vs. two courses, no job (X̄1 vs. X̄2) spans 3 means, so r = 3
- Comparison 2: one course plus job vs. two courses plus job (X̄3 vs. X̄4) spans 3 means, so r = 3
38. Example 1
- H0: µi = µj
- HA: µi ≠ µj
- Rejection region:
- Q_obt > Q(r, n−p, α) = Q(3, 12, .05) = 3.77
- Note: this Q_crit applies to both of the following tests, because both span 3 means.
39. Example 1
- 1 vs. 2:
- Q_obt = (77 − 72.5) / √(53.979/4)
- = 4.5 / 3.67 = 1.23 (Do not reject H0.)
40. Example 1
- 3 vs. 4:
- Q_obt = (74 − 58.75) / √(53.979/4)
- = 15.25 / 3.67 = 4.16 (Reject H0.)
41. Example 2a
- H0: µ1 = µ2 = µ3
- HA: At least two means differ significantly
- Rejection region: F_obt > F(2, 87, .05) ≈ F(2, 60, .05) = 3.15 (87 error df is not in the table, so the nearby tabled value is used)
- Note: We cannot use the computational formulas because we do not have raw data. So, we'll use the conceptual formulas.
42. Example 2
- 1. Compute X̄G (the Grand Mean).
- Since the ns are all equal:
- X̄G = (10.5 + 18.0 + 21.1) / 3 = 16.533
43. Example 2
- SSTreat = Σ ni(X̄i − X̄G)²
- = 30[(10.5 − 16.53)² + (18.0 − 16.53)² + (21.1 − 16.53)²]
- = 1782.2
- Now we can create the summary table.
44. Example 2
- Source       df      SS        MS       F
- Treatment     2    1782.2     891.1    32.7
- Error        87    2370.75     27.25
- Total        89    4152.95
- (SSE = 87 × 27.25 = 2370.75, filled in from MSE and its df)
- Decision: Reject H0. Rotation skill differs significantly across the grades.
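Because only summary statistics are available here, the table can be reproduced from the group means and the reported MSE (a sketch; scipy.stats.f gives the exact critical value instead of the F(2, 60) table approximation):

```python
# Rebuild the Example 2 ANOVA from summary statistics: n = 30 per group, MSE = 27.25.
from scipy import stats

means, n, mse = [10.5, 18.0, 21.1], 30, 27.25
k = len(means)
grand_mean = sum(means) / k                                 # equal ns, so a simple average: about 16.533
ss_treat = n * sum((m - grand_mean) ** 2 for m in means)    # about 1782.2
ms_treat = ss_treat / (k - 1)                               # about 891.1
f_obt = ms_treat / mse                                      # about 32.7
f_crit = stats.f.ppf(0.95, k - 1, k * (n - 1))              # F(2, 87, .05), about 3.10
print(round(f_obt, 1), round(f_crit, 2))
```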
45. Example 2b
- H0: µ8 = µ4
- HA: µ8 > µ4
- Rejection region: t_obt > t(87, .05) ≈ t(29, .05) = 1.699
- Reject H0 if t_obt > 1.699
46. Example 2
- 8 vs. 4:
- t = (18.0 − 10.5) / √(27.25/30 + 27.25/30)
- = 7.5 / 1.348 = 5.56. Reject H0.
- (prediction is supported)
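A final software check of this planned comparison from the summary statistics (scipy.stats.t gives the exact df = 87 critical value, a bit smaller than the table value used above):

```python
# Grade 8 vs. grade 4 comparison using MSE from the Example 2 ANOVA.
from math import sqrt
from scipy import stats

mse, n, df_error = 27.25, 30, 87
t_obt = (18.0 - 10.5) / sqrt(mse / n + mse / n)
t_crit = stats.t.ppf(0.95, df_error)        # one-tailed, alpha = .05
print(round(t_obt, 2), round(t_crit, 2))    # about 5.56 and 1.66
```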