Title: Comparing Many Means
Slide 1: Chapter 7
- Comparing Many Means
- One-Way ANOVA
Slide 2: Introduction
- A t-test compares two groups; it can't be used to compare three or more groups at once
- If certain assumptions are met, can use an F-test
Slide 3: [Figure: the data with the grand mean and the three group means marked]
- Total SS measures variability about the grand mean (i.e., the total variability of the data)
- Error SS measures variability about the three group means (i.e., the variability not explained by the existence of the groups)
- MODEL 0: the three samples come from the same population
- MODEL 1: the three samples come from populations with possibly different means
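The two sums of squares can be read as the residuals left by each model: Model 0 predicts every observation with the grand mean, Model 1 with its own group mean. A minimal sketch on made-up data (the three groups below are hypothetical, not from the course):

```python
from statistics import mean

# Hypothetical data for three groups (not from the course).
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
values = [y for g in groups for y in g]

# Model 0: every observation is predicted by the grand mean.
grand_mean = mean(values)
ss_total = sum((y - grand_mean) ** 2 for y in values)

# Model 1: every observation is predicted by its own group mean.
ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)

print(grand_mean, ss_total, ss_error)
```

SS(Error) can never exceed SS(Total), because Model 1 is free to use a separate mean for each group.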
Slide 4: More on ANOVA
- SS(Total) is the sum of the squared residuals from Model 0
- SS(Error) is the sum of the squared residuals from Model 1
- The F-test compares the fit of the data to the two models
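The comparison of the two models can be written as the usual one-way F statistic: the drop in squared residuals from Model 0 to Model 1, per extra parameter, divided by Model 1's mean squared residual. A sketch, with the function name `anova_f_from_ss` chosen here for illustration and hypothetical input numbers:

```python
def anova_f_from_ss(ss_total, ss_error, k, n_total):
    """F statistic comparing Model 0 (one common mean) with Model 1 (k group means).

    ss_total: sum of squared residuals from Model 0
    ss_error: sum of squared residuals from Model 1
    """
    ss_model = ss_total - ss_error        # improvement in fit from allowing separate means
    ms_model = ss_model / (k - 1)         # Model 1 uses k - 1 extra parameters
    ms_error = ss_error / (n_total - k)   # residual degrees of freedom of Model 1
    return ms_model / ms_error


# Hypothetical numbers: 3 groups of 3, SS(Total) = 30, SS(Error) = 6.
f_stat = anova_f_from_ss(30, 6, k=3, n_total=9)
print(f_stat)  # 12.0
```

A large F means Model 1 fits much better than Model 0; the p-value comes from the F distribution with (k - 1, N - k) degrees of freedom.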
Slide 5: [Figure: panels contrasting "Small" and "Big" values of SS(drug) and SS(error)]
Slide 6: ANOVA Assumptions
- As per the rule for sample means (must hold for ALL samples):
  - Random sample
  - Population of measurements either follows a bell-shaped curve, or is not bell-shaped but the sample is large
- The population variances are equal
Slide 7: Checking ANOVA Assumptions
- Look at the within-group variances: do they look much the same?
- Can test for equality of variances (robust?). If the test fails, can do a Welch ANOVA.
- Look at the distribution of the residuals (Save > Save Centered): it should be normal
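The three checks above can be sketched in a few lines on hypothetical data. The centered residuals are each observation minus its own group mean. For the formal variance test this sketch uses Levene's test in its original form (a one-way ANOVA on the absolute deviations from the group means); that choice, the helper name `one_way_f`, and the data are assumptions for illustration, and the p-value step (F distribution with k - 1 and N - k degrees of freedom) is left out:

```python
from statistics import mean, variance

# Hypothetical data (not from the course): group C is much more spread out.
groups = {
    "A": [4, 5, 6],
    "B": [6, 7, 8],
    "C": [5, 9, 13],
}

# 1. Within-group variances: do they look much the same?
within_var = {g: variance(ys) for g, ys in groups.items()}

# 2. Centered residuals: each observation minus its own group mean.
#    These are what you would check for normality.
residuals = {g: [y - mean(ys) for y in ys] for g, ys in groups.items()}


def one_way_f(samples):
    """Plain one-way ANOVA F statistic for a list of samples."""
    values = [y for s in samples for y in s]
    k, n_total = len(samples), len(values)
    grand = mean(values)
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_error = sum((y - mean(s)) ** 2 for s in samples for y in s)
    return ((ss_total - ss_error) / (k - 1)) / (ss_error / (n_total - k))


# 3. Levene-style test: ANOVA on absolute deviations from the group means.
abs_dev = [[abs(r) for r in rs] for rs in residuals.values()]
levene_f = one_way_f(abs_dev)

print(within_var)
print(residuals)
print(levene_f)
```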
Slide 8: Which means are significantly different from which other means?
- If the data are balanced, can use JMP's overlap marks
- To be significantly different at the 5% level, the overlap marks of the two means must not overlap
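Why the overlap rule works for balanced data: if each mark sits at mean ± (√2/2)·t·√(MSE/n), two sets of marks fail to overlap exactly when the means differ by more than t·√(2·MSE/n), which is the cutoff of a pooled two-sample t-test. The sketch below assumes that construction rather than taking it from JMP's documentation; the data and the critical value `T_CRIT` (t with 6 df, from a t table) are illustrative:

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Hypothetical balanced data: three groups of n = 3 (not from the course).
groups = {"A": [4, 5, 6], "B": [7, 8, 9], "C": [8, 9, 10]}
n = 3
k = len(groups)
df_error = k * n - k
mse = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys) / df_error

T_CRIT = 2.447  # t_{0.975, 6} from a t table; must match df_error

# Assumed construction: marks at mean +/- (sqrt(2)/2) * t * sqrt(MSE / n).
half_width = (sqrt(2) / 2) * T_CRIT * sqrt(mse / n)
marks = {g: (mean(ys) - half_width, mean(ys) + half_width) for g, ys in groups.items()}

# Least significant difference for a pooled two-sample t-test with equal n.
lsd = T_CRIT * sqrt(2 * mse / n)


def overlap(a, b):
    """True if the intervals a and b share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


for g1, g2 in combinations(groups, 2):
    separated = not overlap(marks[g1], marks[g2])
    diff = abs(mean(groups[g1]) - mean(groups[g2]))
    print(g1, g2, "different" if separated else "not different", diff > lsd)
```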
Slide 9: Means comparisons for unbalanced data
- Compare Means > Each Pair, Student's t
- Click on a circle: groups that are not significantly different turn red
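"Each Pair, Student's t" amounts to ordinary two-sample t statistics that share the pooled ANOVA mean square error, so unequal group sizes enter only through 1/n₁ + 1/n₂. A sketch on hypothetical unbalanced data; the critical value `T_CRIT` (t with 9 df) is read from a t table and is an input here, not computed:

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Hypothetical unbalanced data (not from the course).
groups = {
    "A": [4, 5, 6, 5],
    "B": [7, 8, 9],
    "C": [8, 9, 10, 9, 9],
}

k = len(groups)
n_total = sum(len(ys) for ys in groups.values())
df_error = n_total - k
# Pooled mean square error from all groups (the ANOVA MSE).
mse = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys) / df_error

T_CRIT = 2.262  # t_{0.975, 9} from a t table; must match df_error

results = {}
for g1, g2 in combinations(groups, 2):
    y1, y2 = groups[g1], groups[g2]
    se_diff = sqrt(mse * (1 / len(y1) + 1 / len(y2)))  # unequal n handled here
    t_stat = (mean(y1) - mean(y2)) / se_diff
    results[(g1, g2)] = (round(t_stat, 2), abs(t_stat) > T_CRIT)

for pair, (t_stat, significant) in results.items():
    print(pair, t_stat, "significant" if significant else "not significant")
```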
Slide 10: Similar information in table form
Slide 11: Adjusting for Multiple Comparisons
- There is a problem with doing lots of pairwise t-tests (e.g., comparing 10 treatments takes 45 t-tests!)
- At the 5% level, each test has a 5% chance of committing a Type I error
- If the null hypothesis is true and you perform 45 t-tests, the chance of at least one Type I error is about 90% (treating the tests as independent)
- The Tukey-Kramer Honestly Significant Difference (HSD) method makes an adjustment to control the overall Type I error rate
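The 45 and the 90% come from counting pairs and then assuming the tests behave like independent trials. A short sketch of that arithmetic (the function name is illustrative):

```python
from math import comb


def familywise_error(num_treatments, alpha=0.05):
    """Number of pairwise tests and P(at least one Type I error) if all nulls are true.

    Treats the tests as independent, which pairwise comparisons sharing groups
    are not, so this is only the rough calculation behind the 90% figure.
    """
    num_tests = comb(num_treatments, 2)
    return num_tests, 1 - (1 - alpha) ** num_tests


tests, fwer = familywise_error(10)
print(tests, round(fwer, 2))  # 45 0.9
```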
Slide 12: [Figure annotated "bigger!"]
Slide 13: Power
- Especially valuable in designing studies
- For example, in the Colposcopy study, we needed to make sure that we had enough data to detect a clinically meaningful difference if one existed
- First, revisit power in the matched pairs context
Slide 14: Power in the Matched Pairs Context
- Power is the probability of getting a p-value at or below a chosen cutoff (e.g., 0.05 or 0.01) when the true average difference and the true standard deviation of the difference are specified
- Example (colpo): difference = patient pain score minus physician pain score
- Expect the standard deviation of the difference to be about 8 (based on previous studies)
- Therefore, if we study n women, expect the standard error of the sample average difference to be $8/\sqrt{n}$
Slide 15: Matched Pair Power (cont.)
- Let's say we will reject the null hypothesis of no difference if the p-value is 0.05 or less
- Need an absolute standardized score of 2 or more to get this
- So, need $|\text{sample mean} - 0| / (8/\sqrt{n}) \ge 2$
- That is, a sample mean of at least $16/\sqrt{n}$ in absolute value
- What is the probability that this happens?
- E.g., for $n = 100$, what is the chance that the sample mean is at least 1.6?
[Figure label: Hypothesized mean]
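The cutoff on the sample mean follows directly from the standard error: a standardized score of 2 means the sample mean must be at least 2 × 8/√n away from 0. A small sketch of that arithmetic, using the slides' SD of 8 and the rule-of-thumb cutoff of 2 (the constant and function names are illustrative):

```python
from math import sqrt

SD_DIFF = 8.0   # expected SD of (patient - physician) pain score, from previous studies
Z_CUTOFF = 2.0  # |standardized score| >= 2 corresponds roughly to p <= 0.05


def standard_error(n):
    """Standard error of the sample average difference for n women."""
    return SD_DIFF / sqrt(n)


def rejection_threshold(n):
    """Smallest |sample mean| that gives |score| >= 2, i.e. 16 / sqrt(n)."""
    return Z_CUTOFF * standard_error(n)


for n in (100, 500):
    print(n, standard_error(n), rejection_threshold(n))
```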
Slide 16: Matched Pair Power (cont.)
- Depends on what the true difference between patient and physician scores really is!

True average difference | 95% of sample averages | Probability of sample mean > 1.6
5 | (3.4, 6.6) | close to 1
3 | (1.4, 4.6) | 96%
2 | (0.4, 3.6) | 69%
1 | (-0.6, 2.6) | 23%

So, with n = 100 the study is well powered to detect a true difference of 3 or more
Slide 17: Matched Pair Power (cont.)
- If the sample size is 500:

True average difference | 95% of sample averages | Probability of sample mean > 0.7
5 | (4.3, 5.7) | close to 1
3 | (2.3, 3.7) | close to 1
2 | (1.3, 2.7) | close to 1
1 | (0.3, 1.7) | 79%

With n = 500, we have a good chance of detecting a true difference of 1
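Both power tables can be recomputed from the normal sampling distribution of the sample mean, centered at the true difference with standard error 8/√n. This sketch assumes a normal approximation and the slides' cutoff of 2 standard errors; unlike the tables, it also adds the tiny chance of rejecting in the wrong direction (sample mean below the negative cutoff):

```python
from math import sqrt
from statistics import NormalDist


def power(true_diff, n, sd=8.0, z_cutoff=2.0):
    """P(|sample mean| / (sd / sqrt(n)) >= z_cutoff) when the true mean difference is true_diff."""
    se = sd / sqrt(n)
    threshold = z_cutoff * se                       # 1.6 for n = 100
    sampling = NormalDist(mu=true_diff, sigma=se)   # sampling distribution of the mean
    upper = 1 - sampling.cdf(threshold)             # the tail the tables report
    lower = sampling.cdf(-threshold)                # wrong-direction tail, tiny here
    return upper + lower


for n in (100, 500):
    for true_diff in (5, 3, 2, 1):
        print(n, true_diff, round(power(true_diff, n), 2))
```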
Slide 18: Power
- General idea: make sure you have enough data to have a good chance (e.g., 80%) of detecting a meaningful difference
- "Detecting" here means getting a small p-value
- Ingredients: effect size, standard deviation, alpha, sample size
- Effect size in ANOVA is complicated
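The four ingredients combine in the standard normal-approximation sample-size formula for a two-sided matched-pairs test, n ≈ ((z₁₋α/₂ + z_power) · SD / effect)². The sketch below is that approximation (it uses 1.96 rather than the slides' rounded 2, and ignores the wrong-direction tail); the function name is illustrative:

```python
from math import ceil
from statistics import NormalDist


def n_for_power(effect, sd, alpha=0.05, target_power=0.80):
    """Approximate n for a two-sided matched-pairs test (normal approximation).

    effect: true average difference worth detecting
    sd:     standard deviation of the differences
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(target_power)    # about 0.84 for 80% power
    return ceil(((z_alpha + z_power) * sd / effect) ** 2)


for effect in (3, 2, 1):
    print(effect, n_for_power(effect, sd=8))
```

With SD 8, a difference of 1 needs roughly 500 pairs for 80% power, in line with the n = 500 table.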
Slide 19: Effect size in ANOVA
$$\text{effect size} = \frac{\text{average distance of group means to the grand mean}}{\sqrt{\text{mean square error}}}$$
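A sketch of the ratio, under two stated assumptions: a balanced design (so the grand mean is the plain average of the group means) and "average distance" read as the mean absolute distance. Power software may define the numerator differently (for example as a root-mean-square distance), so treat the function below as illustrative only:

```python
from math import sqrt


def anova_effect_size(group_means, mse):
    """Average distance of group means from the grand mean, divided by sqrt(MSE).

    Assumes a balanced design and reads "average distance" as the mean
    absolute distance; both are illustrative choices.
    """
    grand_mean = sum(group_means) / len(group_means)
    avg_distance = sum(abs(m - grand_mean) for m in group_means) / len(group_means)
    return avg_distance / sqrt(mse)


# Hypothetical group means and MSE.
print(anova_effect_size([5, 7, 9], mse=1.0))
```

Larger group separation or smaller within-group noise both raise the effect size, and with it the power.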
Slide 20: Unequal Variances
- ANOVA assumes the population variances are equal
- Can test this
- Welch ANOVA does not make the equal-variance assumption but is less powerful
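Welch's ANOVA drops the pooled MSE and instead weights each group by nᵢ/sᵢ², so a noisy group counts for less; the denominator degrees of freedom are approximated and usually are not whole numbers. A minimal sketch of Welch's F statistic on hypothetical data (the function name `welch_anova` is illustrative, and converting F to a p-value would need an F-distribution CDF, which is not included):

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's F statistic and approximate degrees of freedom (no equal-variance assumption)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n_i / s_i^2, so noisier groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    numerator = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Hypothetical data: group C has a much larger spread than A and B.
f_stat, df1, df2 = welch_anova([[4, 5, 6], [6, 7, 8], [5, 9, 13]])
print(round(f_stat, 3), df1, round(df2, 2))
```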
Slide 21: Nonparametric Tests
- Relax the normality assumption
- Wilcoxon uses ranks
- Median test dichotomizes each observation (above or below the overall median)
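For three or more groups, the rank-based test is commonly reported as the Kruskal-Wallis statistic (for two groups it is equivalent to the Wilcoxon rank-sum test). The sketch below computes it on hypothetical data, assuming no tied values (so no tie correction), alongside the counts a median test would tabulate. Counting values equal to the overall median as "not above" is one of several conventions and is an assumption here:

```python
from statistics import median


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from pooled ranks (assumes no tied values)."""
    pooled = sorted(y for g in groups for y in g)
    rank = {y: i + 1 for i, y in enumerate(pooled)}  # rank 1 = smallest value
    n_total = len(pooled)
    rank_term = sum(sum(rank[y] for y in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


def median_test_counts(groups):
    """(count above, count not above) the overall median, for each group."""
    overall = median([y for g in groups for y in g])
    return [(sum(y > overall for y in g), sum(y <= overall for y in g)) for g in groups]


# Hypothetical data with no ties.
data = [[1.2, 3.4, 2.2], [4.1, 5.0, 3.9], [6.3, 7.7, 5.5]]
print(kruskal_wallis_h(data))
print(median_test_counts(data))
```

Only the ordering of the values matters to H, so replacing the largest value with a huge outlier leaves it unchanged.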