One-Way Analysis of Variance - PowerPoint PPT Presentation

Provided by: robertas5
Transcript and Presenter's Notes

1
One-Way Analysis of Variance
2
Recapitulation
1. Comparing differences among three or more subsamples requires a different statistical test than either z-tests or t-tests.
2. The solution is to perform an analysis of variance (ANOVA).
3. ANOVA involves the comparison of two estimates of the population variance.
4. One variance estimate captures only the random differences among sampled units; the other captures these random differences plus the effects of being in the different subsamples.
5. The ratio between the two estimated variances is evaluated using the F-statistic sampling distributions.
3
Recapitulation (continued)
6. ANOVA is based on the general linear model.
7. The general linear model is $Y_{ij} = \mu + \beta_j X_{ij} + \varepsilon_{ij}$, where $X_{ij}$ is the subgroup difference and $\beta_j$ is a constant estimating its effect on $Y_{ij}$.
8. When subgroup differences do not exist, $\beta_j = 0.0$.
9. The null hypothesis is $H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_j$.
4
As an example, consider an experiment on worker
productivity in an introductory psychology class.
Thirty students were randomly selected for the
experiment from PSYCH 100 and randomly assigned
to one of three subgroups. The productivity
measure (Yij) was the number of puzzles that
these students solved in a fixed period of time.
The three experimental conditions (treatments,
Xij) were: left alone to solve puzzles; solving
puzzles in the presence of the other nine group
members (so that each subject could observe her
or his own rate of puzzle solving); and solving
puzzles in the presence of other subjects AND in
the presence of a monitor, meant to simulate a
supervisor. The results look like this:
5
Subject    Not Monitored,    Not Monitored,    Monitored
(within    Alone             Together
group)     Yi,1              Yi,2              Yi,3
 1         13                 9                 8
 2         14                11                 6
 3         10                10                 9
 4         11                 8                 7
 5         12                10                 8
 6         10                12                10
 7         12                11                 8
 8         12                10                 9
 9         13                 9                 6
10         11                10                11

N          10                10                10
Sum        118               100               82
Mean       11.8              10.0              8.2

Grand mean = 10.0
6
Our hypothesis (H1) is that working conditions affect worker performance in ways that we do not fully understand:
$H_1: \mu_1 \neq \mu_2 \neq \mu_3$
Our null hypothesis (H0) is that worker performance is unaffected by working conditions:
$H_0: \mu_1 = \mu_2 = \mu_3$
Since a comparison of THREE subgroup means is required, t-tests are inappropriate. The approach known generically as the analysis of variance must be used.
8
First we calculate the total sum of squares. We begin with the first score in the first group and continue through the 30th score in the third group, as follows:
$SS_{Total} = (13 - 10.0)^2 + (14 - 10.0)^2 + \dots + (11 - 10.0)^2 + (9 - 10.0)^2 + \dots + (10 - 10.0)^2 + (8 - 10.0)^2 + \dots + (11 - 10.0)^2 = 116$
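The same total can be checked in a few lines of Python. This is only an illustrative sketch: the variable names are invented here, and the scores are copied from the data table shown earlier.

```python
# Puzzle counts from the data table, one list per condition.
alone = [13, 14, 10, 11, 12, 10, 12, 12, 13, 11]
together = [9, 11, 10, 8, 10, 12, 11, 10, 9, 10]
monitored = [8, 6, 9, 7, 8, 10, 8, 9, 6, 11]

scores = alone + together + monitored
grand_mean = sum(scores) / len(scores)

# Total sum of squares: squared deviation of every score from the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in scores)

print(grand_mean, ss_total)  # 10.0 116.0
```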
9
Next we calculate the sum of squares between, as follows. For the first of the three subgroups, we find the difference between the group mean and the grand mean, square that difference, then multiply it by the size of the subgroup; then we do the same for the other two subgroups. Then we sum these three products, as follows:
$SS_{Between} = 10(11.8 - 10.0)^2 + 10(10.0 - 10.0)^2 + 10(8.2 - 10.0)^2 = 64.8$
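A short Python sketch of this step follows. The dictionary layout and the helper `mean` are choices made for this illustration, not part of the original slides.

```python
# Puzzle counts per condition, copied from the data table.
groups = {
    "alone": [13, 14, 10, 11, 12, 10, 12, 12, 13, 11],
    "together": [9, 11, 10, 8, 10, 12, 11, 10, 9, 10],
    "monitored": [8, 6, 9, 7, 8, 10, 8, 9, 6, 11],
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

all_scores = [y for g in groups.values() for y in g]
grand_mean = mean(all_scores)

# Sum of squares between: group size times the squared distance
# of each group mean from the grand mean, summed over groups.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())

print(round(ss_between, 2))  # 64.8
```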
11
Finally, we calculate the sum of squares within. This means that we find the squared difference between each of the ten scores in the first group and the mean for the first group, then the squared difference between each of the ten scores in the second group and the mean for the SECOND group, then the squared difference between each of the ten scores in the third group and the mean for the THIRD group, and finally add all 30 squared differences together:
$SS_{Within} = (13 - 11.8)^2 + (14 - 11.8)^2 + \dots + (11 - 11.8)^2 + (9 - 10.0)^2 + \dots + (10 - 10.0)^2 + (8 - 8.2)^2 + \dots + (11 - 8.2)^2 = 51.2$
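As a rough check, the within-group sum of squares can be computed directly from the raw scores. The sketch below is illustrative only; its names are not taken from the slides.

```python
# Puzzle counts per condition, copied from the data table.
groups = {
    "alone": [13, 14, 10, 11, 12, 10, 12, 12, 13, 11],
    "together": [9, 11, 10, 8, 10, 12, 11, 10, 9, 10],
    "monitored": [8, 6, 9, 7, 8, 10, 8, 9, 6, 11],
}

def within_group_ss(values):
    """Squared deviations of each score from its own group mean."""
    group_mean = sum(values) / len(values)
    return sum((y - group_mean) ** 2 for y in values)

# One partial sum per group, then the overall sum of squares within.
ss_by_group = {name: within_group_ss(g) for name, g in groups.items()}
ss_within = sum(ss_by_group.values())

print(round(ss_within, 2))  # 51.2
```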
12
To check our calculations, remember the identity:
$SS_{Total} = SS_{Between} + SS_{Within}$
$116 = 64.8 + 51.2$
Next, we need the degrees of freedom. Total degrees of freedom is simply the number of cases less one, N - 1. Here, there are 30 cases, so there are 29 total degrees of freedom. For degrees of freedom between, the three subgroup means are treated as scores, so there are J - 1 across subgroups, here 3 - 1, giving us 2 degrees of freedom between. Finally, for degrees of freedom within, we lose one degree of freedom for each subgroup into which the cases are partitioned, i.e., N - J. Here we have three subgroups, so we lose a degree of freedom for each, giving us 30 - 3 or 27 degrees of freedom within.
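The identity and the three degrees-of-freedom rules can be written out as a small Python check. This is a sketch using the values reported above; the variable names are illustrative.

```python
# Values reported in the worked example.
n_cases = 30    # N, total number of students
n_groups = 3    # J, number of experimental conditions
ss_total, ss_between, ss_within = 116.0, 64.8, 51.2

# The sums of squares must partition exactly (up to rounding error).
assert abs(ss_total - (ss_between + ss_within)) < 1e-9

df_total = n_cases - 1          # N - 1
df_between = n_groups - 1       # J - 1
df_within = n_cases - n_groups  # N - J

# Degrees of freedom partition the same way as the sums of squares.
assert df_total == df_between + df_within

print(df_total, df_between, df_within)  # 29 2 27
```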
13
Analysis of variance results by convention are reported in what is called an "ANOVA summary table":

Source            SS        df    Mean Square    F
Between Groups     64.80     2    32.40          17.05
Within Groups      51.20    27     1.90
Total             116.00    29
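The mean squares and the F-ratio follow from dividing each sum of squares by its degrees of freedom. The sketch below is illustrative. Note that dividing by the unrounded mean square within gives about 17.09, the value in the SAS output later in the presentation, while dividing by the rounded 1.90 gives the 17.05 shown in the summary table.

```python
# Sums of squares and degrees of freedom from the worked example.
ss_between, df_between = 64.8, 2
ss_within, df_within = 51.2, 27

ms_between = ss_between / df_between  # mean square between
ms_within = ss_within / df_within     # mean square within (unrounded)

f_ratio = ms_between / ms_within

# F computed from the rounded mean square, as in the summary table.
f_from_rounded = round(ms_between, 2) / round(ms_within, 2)

print(round(ms_between, 2), round(ms_within, 2), round(f_ratio, 2))  # 32.4 1.9 17.09
print(round(f_from_rounded, 2))  # 17.05
```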

14
We perform a significance test in the usual way,
first by selecting alpha, then locating the
appropriate sampling distribution, finding the
critical value, and comparing this value to the
value of the F-statistic. With alpha = 0.05, we
turn to Appendix 3, p. 544. In this example we have
2 and 27 degrees of freedom. The table of
critical values has degrees of freedom between as
COLUMN headings (n1) and degrees of freedom
within as ROW headings (n2). In column 2, row
27 we find the critical value to be 3.35. Since
our F-value is 17.05, GREATER than 3.35, we know
that it lies well inside the region of rejection,
hence we REJECT the null hypothesis at the 0.05
level. Substantively, this means that we infer
that the conditions under which one performs a
task DO have an effect on performance.
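The decision rule itself is a single comparison. In this minimal sketch the critical value 3.35 is taken from the table lookup described above rather than computed, and the function name `reject_null` is made up for illustration.

```python
def reject_null(f_statistic, f_critical):
    """Reject H0 when the observed F exceeds the critical value."""
    return f_statistic > f_critical

f_statistic = 17.05  # from the ANOVA summary table
f_critical = 3.35    # tabled value for alpha = 0.05 with 2 and 27 df

decision = reject_null(f_statistic, f_critical)
print(decision)  # True
```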
16
The F-test is a significance test, an inferential
statistic. It tells us only whether or not
exposure to the treatment variable has measurable
consequences that are different from chance. It
does NOT tell us about the strength of
association between the treatment (Xij) and the
dependent variable, Yij. For this we need a
measure of association. The sum of squares
BETWEEN represents the variance attributable to
the treatment variable, Xij. The TOTAL sum of
squares expresses the total amount of variance in
the dependent variable, Yij, that is, the total
variance "to be explained" statistically. A
ratio of the two is a straightforward description
of the percentage of variance in Yij accounted
for by its association with Xij. Statistically
this is called R-square.
17
From the example above, the sum of squares between is 64.80 and the total sum of squares is 116.00. Thus, R-square is
$R^2 = SS_{Between} / SS_{Total} = 64.80 / 116.00 = 0.5586$
The F-test tells us that treatment categories (working conditions) differ in ways that cannot be explained as chance. R-square tells us that 56 percent of the variation in task performance is associated with differences in working conditions.
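R-square is a single ratio of the two sums of squares, as this illustrative Python sketch shows (the variable names are not from the slides).

```python
ss_between = 64.80  # sum of squares between, from the example
ss_total = 116.00   # total sum of squares, from the example

# Proportion of total variation associated with the treatment.
r_square = ss_between / ss_total

print(round(r_square, 4))  # 0.5586
```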
18
Knowing that the treatment variable has a
statistically significant effect does not tell us
WHICH specific treatment category or categories
have greater impact than others. In our example,
we know only that AT LEAST ONE of the
puzzle-solving conditions differs from one (or
both) of the remaining two, but we do not know
which. In other words, we do not know which of
the following alternative hypotheses is (are)
supported:
$\mu_1 \neq \mu_2 = \mu_3$
$\mu_1 = \mu_2 \neq \mu_3$
$\mu_1 \neq \mu_3 = \mu_2$
$\mu_1 \neq \mu_2 \neq \mu_3$
We need a way to statistically compare the subgroups.
19
There are two strategies: comparisons explicitly planned in advance are called a priori tests; those performed after an initial ANOVA are called post hoc comparison tests. Of the latter, we will use only the method known as the Scheffé test.
The Scheffé method creates a threshold for comparing subgroup means (once an ANOVA null hypothesis has been rejected) called the minimum significant difference. Differences between two subgroup means that exceed this minimum significant difference are statistically significant; that is, their difference appears to be real rather than due to chance.
The algorithm is in Sirkin (1999), p. 333.
20
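The minimum significant difference is
$\sqrt{df_{Between} \cdot F_\alpha \cdot MS_{Within} \left( \frac{1}{n_j} + \frac{1}{n_{j+1}} \right)}$
where
$\bar{Y}_j$ and $\bar{Y}_{j+1}$ are the subsample means being compared;
$df_{Between}$ is the degrees of freedom between in the ANOVA;
$F_\alpha$ is the critical value of F at the significance level ($\alpha$) chosen for the comparison;
$MS_{Within}$ is the ANOVA mean square within; and
$n_j$ and $n_{j+1}$ are the sizes of the two subsamples being compared.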
21
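In the puzzle-solving example, $\bar{Y}_1 = 11.8$, $\bar{Y}_2 = 10.0$, and $\bar{Y}_3 = 8.2$; $df_{Between} = 2$; $F_\alpha = 2.51$ ($\alpha = .10$, df = 2, 27); $MS_{Within} = 1.90$; and $n_1 = n_2 = n_3 = 10$. Hence, the minimum significant difference is
$\sqrt{2(2.51)(1.90)\left(\frac{1}{10} + \frac{1}{10}\right)} = \sqrt{1.9076} \approx 1.381$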
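The arithmetic for the Scheffé minimum significant difference can be sketched in Python as below. The function name `scheffe_msd` and its parameter names are invented for this illustration, and the formula is the one implied by the quantities listed above: the square root of dfBetween times the critical F times MSWithin times (1/nj + 1/nj+1).

```python
import math

def scheffe_msd(df_between, f_critical, ms_within, n_a, n_b):
    """Minimum significant difference for comparing two subsample means."""
    return math.sqrt(df_between * f_critical * ms_within * (1 / n_a + 1 / n_b))

# Values from the puzzle-solving example (alpha = .10, df = 2 and 27).
msd = scheffe_msd(df_between=2, f_critical=2.51, ms_within=1.90, n_a=10, n_b=10)

print(round(msd, 3))  # 1.381
```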
22
The value 1.381 is the minimum significant difference, the threshold we use to compare subsample means with $\alpha$ set at 0.10. Sirkin (1999) contains no 0.10 F table. Here is how Sirkin would organize our comparison tests:

H0                   $\bar{Y}_j$   $\bar{Y}_{j+1}$   Difference vs. Critical Value   Conclusion
$\mu_1 = \mu_2$      11.8          10.0              1.80 > 1.381                    Reject H0
$\mu_2 = \mu_3$      10.0           8.2              1.80 > 1.381                    Reject H0
$\mu_1 = \mu_3$      11.8           8.2              3.60 > 1.381                    Reject H0
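The three pairwise comparisons can be reproduced with a short loop. This is a sketch only: the condition labels and the `results` dictionary are illustrative, and the threshold 1.381 is the minimum significant difference reported above.

```python
from itertools import combinations

# Subsample means from the example and the Scheffe threshold.
means = {"alone": 11.8, "together": 10.0, "monitored": 8.2}
msd = 1.381

results = {}
for (name_a, mean_a), (name_b, mean_b) in combinations(means.items(), 2):
    difference = abs(mean_a - mean_b)
    # A pair differs significantly when the gap exceeds the threshold.
    results[(name_a, name_b)] = difference > msd
    verdict = "reject H0" if difference > msd else "fail to reject H0"
    print(f"{name_a} vs {name_b}: {difference:.2f} -> {verdict}")
```

With equal group sizes a single threshold serves every pair; with unequal sizes the threshold would need to be recomputed for each pair.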

23
Sample SAS Program: Puzzle-Solving Example

    libname old 'a:\';
    libname library 'a:\';
    options nodate nonumber ps=66;
    proc glm data=old.example;
      class setting;
      model puzzles = setting;
      means setting / scheffe alpha=0.1;
      contrast 'Alone vs. Together' setting 1 -1 0;
      contrast 'Alone vs. Monitor' setting 1 0 -1;
      contrast 'Together vs. Monitor' setting 0 1 -1;
      contrast 'Alone vs. Others' setting 2 -1 -1;
      contrast 'Together vs. Others' setting -1 2 -1;
      title1 'ANOVA With Comparison Tests';
    run;
24
ANOVA With Comparison Tests
General Linear Models Procedure
Class Level Information

Class      Levels   Values
SETTING    3        (1) alone   (2) monitor   (3) together

Number of observations in data set = 30
25
ANOVA With Comparison Tests
General Linear Models Procedure

Dependent Variable: PUZZLES

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2       64.80000000    32.40000000      17.09    0.0001
Error              27       51.20000000     1.89629630
Corrected Total    29      116.00000000

R-Square    C.V.        Root MSE     PUZZLES Mean
0.558621    13.77061    1.3770607    10.000000

Source     DF    Type I SS      Mean Square    F Value    Pr > F
SETTING     2    64.80000000    32.40000000      17.09    0.0001

Source     DF    Type III SS    Mean Square    F Value    Pr > F
SETTING     2    64.80000000    32.40000000      17.09    0.0001
26
ANOVA With Comparison Tests
General Linear Models Procedure

Scheffe's test for variable: PUZZLES

NOTE: This test controls the type I experimentwise error rate but generally has a higher type II error rate than REGWF for all pairwise comparisons.

Alpha = 0.1   df = 27   MSE = 1.896296
Critical Value of F = 2.51061
Minimum Significant Difference = 1.38

Means with the same letter are not significantly different.

Scheffe Grouping    Mean       N     SETTING
A                   11.8000    10    alone
B                   10.0000    10    together
C                    8.2000    10    monitor
27
ANOVA With Comparison Tests
General Linear Models Procedure

Scheffe's test for variable: PUZZLES

NOTE: This test controls the type I experimentwise error rate but generally has a higher type II error rate than REGWF for all pairwise comparisons.

Alpha = 0.1   df = 27   MSE = 1.896296
Critical Value of F = 2.51061
Minimum Significant Difference = 1.38

Means with the same letter are not significantly different.

Scheffe Grouping    Mean       N     SETTING
A                   11.8000    10    alone
B                   10.0000    10    together
B                    8.2000    10    monitor
28
ANOVA With Comparison Tests
General Linear Models Procedure

Scheffe's test for variable: PUZZLES

NOTE: This test controls the type I experimentwise error rate but generally has a higher type II error rate than REGWF for all pairwise comparisons.

Alpha = 0.1   df = 27   MSE = 1.896296
Critical Value of F = 2.51061
Minimum Significant Difference = 1.38

Means with the same letter are not significantly different.

Scheffe Grouping    Mean       N     SETTING
A                   11.8000    10    alone
A
A                   10.0000    10    together
A
A                    8.2000    10    monitor
29
ANOVA With Comparison Tests
General Linear Models Procedure

Dependent Variable: PUZZLES

Contrast                DF    Contrast SS    Mean Square    F Value    Pr > F
Alone vs. Together       1    64.80000000    64.80000000      34.17    0.0001
Alone vs. Monitor        1    16.20000000    16.20000000       8.54    0.0069
Together vs. Monitor     1    16.20000000    16.20000000       8.54    0.0069
Alone vs. Others         1    48.60000000    48.60000000      25.63    0.0001
Together vs. Others      1    48.60000000    48.60000000      25.63    0.0001
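The contrast sums of squares in this output can be reproduced by hand with the standard single-degree-of-freedom contrast formula: the squared weighted sum of the group means, divided by the sum of squared coefficients over group sizes. The Python sketch below is illustrative, not a description of SAS internals. It assumes the coefficients apply to the levels in the order listed under Class Level Information (alone, monitor, together), and it keys each contrast by its coefficient string rather than by the label used in the program.

```python
# Group means in the assumed class-level order: alone, monitor, together.
means = [11.8, 8.2, 10.0]
sizes = [10, 10, 10]
ms_error = 51.2 / 27  # mean square error from the ANOVA table

def contrast_ss(coefficients, group_means, group_sizes):
    """Sum of squares for a single-degree-of-freedom contrast."""
    estimate = sum(c * m for c, m in zip(coefficients, group_means))
    scale = sum(c ** 2 / n for c, n in zip(coefficients, group_sizes))
    return estimate ** 2 / scale

# Coefficient vectors as written in the contrast statements.
contrasts = {
    "1 -1 0": (1, -1, 0),
    "1 0 -1": (1, 0, -1),
    "0 1 -1": (0, 1, -1),
    "2 -1 -1": (2, -1, -1),
    "-1 2 -1": (-1, 2, -1),
}

table = {}
for label, coefficients in contrasts.items():
    ss = contrast_ss(coefficients, means, sizes)
    # Each contrast has 1 df, so its mean square equals its SS.
    table[label] = (round(ss, 2), round(ss / ms_error, 2))
    print(label, table[label])
```

Under that ordering assumption, the five rows come out as 64.8, 16.2, 16.2, 48.6, and 48.6, with F values matching the output.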
31
One-Way Analysis of Variance Exercise

Four groups of randomly selected and randomly assigned students were taught a basic course in statistics by four different methods. A standardized test was given at the end of the semester to all four groups. Evaluate the differences in teaching approaches using the Analysis of Variance. Assume that $\alpha = 0.05$, and use the F distribution (Appendix 3, p. 544).

Group 1    Group 2    Group 3    Group 4
20         15         22         19
22         18         21         23
21         20         24         20
20         18         25         18
19         19         24         15

1. Expressed symbolically, what is the null hypothesis? ______________
2. What is the value of the sum of squares between? ______________
3. What is the value of the sum of squares within? ______________
4. How many degrees of freedom between? ______________
5. How many degrees of freedom within? ______________
6. What is the value of the mean square between? ______________
7. What is the value of the mean square within? ______________
8. What is the value of the F-ratio? ______________
9. What is the critical value of F? ______________
10. Do you reject the null hypothesis? ______________
32
One-Way Analysis of Variance Exercise Answers

1. Expressed symbolically, what is the null hypothesis? $\mu_1 = \mu_2 = \mu_3 = \mu_4$
2. What is the value of the sum of squares between? 76.55
3. What is the value of the sum of squares within? 64.00
4. How many degrees of freedom between? 3
5. How many degrees of freedom within? 16
6. What is the value of the mean square between? 25.517
7. What is the value of the mean square within? 4.000
8. What is the value of the F-ratio? 6.379
9. What is the critical value of F? 3.24
10. Do you reject the null hypothesis? Yes, Reject
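The exercise answers can be checked with a small one-way ANOVA routine. This sketch assumes the data table is read row by row, so that each row supplies one score per group; the function `one_way_anova` and its return keys are invented for illustration. The critical value 3.24 is the tabled value given in the answers, not something the code computes.

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares and F."""
    all_scores = [y for g in groups for y in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }

# Test scores by teaching method (columns of the exercise table).
groups = [
    [20, 22, 21, 20, 19],  # Group 1
    [15, 18, 20, 18, 19],  # Group 2
    [22, 21, 24, 25, 24],  # Group 3
    [19, 23, 20, 18, 15],  # Group 4
]

result = one_way_anova(groups)
f_critical = 3.24  # tabled value for alpha = 0.05 with 3 and 16 df
reject = result["f_ratio"] > f_critical

for key, value in result.items():
    print(key, round(value, 3))
print("reject H0:", reject)
```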