Title: Ch. 10, part II
1Ch. 10, part II
- Analysis of Variance (ANOVA)
2An Introduction to Analysis of Variance
- Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means using data obtained from observational or experimental studies.
- We want to use the sample results to test the following hypotheses.
- H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
- Ha: Not all population means are equal
- If H0 is rejected, we cannot conclude that all population means are different.
- Rejecting H0 means that at least two population means have different values.
3Example: Reed Manufacturing
- Analysis of Variance
- J. R. Reed would like to know if the mean number of hours worked per week is the same for the department managers at her three manufacturing plants (Buffalo, Pittsburgh, and Detroit).
- A simple random sample of 5 managers from each of the three plants was taken, and the number of hours worked by each manager for the previous week is shown on the next slide.
4Example: Reed Manufacturing
- Analysis of Variance
Observation       Plant 1    Plant 2      Plant 3
                  Buffalo    Pittsburgh   Detroit
1                 48         73           51
2                 54         63           63
3                 57         66           61
4                 54         64           54
5                 62         74           56
Sample Mean       55         68           57
Sample Variance   26.0       26.5         24.5
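The sample means and sample variances in the table can be reproduced from the raw observations. The following is a minimal sketch in Python using only the standard library; the names `hours` and `summary` are illustrative choices and do not come from the slides.

```python
from statistics import mean, variance

# Hours worked in the previous week by the five sampled managers at each plant.
hours = {
    "Buffalo": [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit": [51, 63, 61, 54, 56],
}

# statistics.variance divides by n - 1, i.e. it is the sample variance.
summary = {plant: (mean(obs), variance(obs)) for plant, obs in hours.items()}

for plant, (xbar, s2) in summary.items():
    print(f"{plant}: sample mean = {xbar}, sample variance = {s2}")
```

The printed values should agree with the Sample Mean and Sample Variance rows of the table.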
5Example: Reed Manufacturing
- Analysis of Variance
- Hypotheses
- H0: $\mu_1 = \mu_2 = \mu_3$
- Ha: Not all the means are equal
- where
- $\mu_1$ = mean number of hours worked per week by the managers at Plant 1
- $\mu_2$ = mean number of hours worked per week by the managers at Plant 2
- $\mu_3$ = mean number of hours worked per week by the managers at Plant 3
6Example: Reed Manufacturing
- Two variables
- Average hours worked: dependent or response variable.
- Plant location: independent variable or factor.
- The values of a factor selected for investigation are referred to as the levels of the factor or treatments.
- Three treatments: Buffalo, Pittsburgh, Detroit.
7Assumptions for Analysis of Variance
- For each population, the response variable is normally distributed.
- The variance of the response variable, denoted $\sigma^2$, is the same for all of the populations.
- The observations must be independent.
8Conceptual Overview
- If the null hypothesis (H0: $\mu_1 = \mu_2 = \mu_3$) is true
- Each sample will have come from the same normal probability distribution with mean $\mu$ and variance $\sigma^2$.
9Conceptual Overview
- If the null hypothesis (H0: $\mu_1 = \mu_2 = \mu_3$) is true
- Thus, we can think of each of the 3 sample means as values drawn at random from the following sampling distribution
[Figure: sampling distribution of $\bar{x}$, centered at $\mu$]
11Conceptual Overview
- If the null hypothesis (H0: $\mu_1 = \mu_2 = \mu_3$) is true
- And, we can use the mean and variance of the three $\bar{x}$ values to estimate the mean and variance of the sampling distribution
12Conceptual Overview
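- Estimate of the mean of the sampling distribution of $\bar{x}$: the overall sample mean, $\bar{\bar{x}}$
- Estimate of the variance of the sampling distribution of $\bar{x}$: $s_{\bar{x}}^2$, the sample variance of the three sample means
- Because $\sigma_{\bar{x}}^2 = \sigma^2 / n$, solving for $\sigma^2$ gives $\sigma^2 = n \sigma_{\bar{x}}^2$
- Between-treatments estimate of $\sigma^2$: $n s_{\bar{x}}^2$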
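As a check on this reasoning, the between-treatments estimate for the Reed Manufacturing data can be computed directly from the three sample means. The sketch below assumes equal sample sizes (n = 5 managers per plant), which is what allows the variance of the sample means to be scaled by n; the variable names are illustrative.

```python
from statistics import mean, variance

sample_means = [55, 68, 57]  # Buffalo, Pittsburgh, Detroit
n = 5                        # managers sampled at each plant

# Estimate of the mean of the sampling distribution of the sample mean.
overall_mean = mean(sample_means)

# Estimate of the variance of the sampling distribution (n - 1 divisor over the 3 means).
var_of_means = variance(sample_means)

# Between-treatments estimate of the population variance: n times the variance of the means.
between_estimate = n * var_of_means

print(overall_mean, var_of_means, between_estimate)
```

The three quantities work out to 60, 49, and 245, the last of which is the between-treatments estimate quoted later in the deck.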
16Conceptual Overview
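- Between-treatments estimate of $\sigma^2$ is based on the assumption that H0 is true (H0: $\mu_1 = \mu_2 = \mu_3$).
- If H0 is false, 2 or more samples will be from normal populations with different means, resulting in 3 different sampling distributions.
[Figure: three sampling distributions centered at $\mu_1$, $\mu_3$, and $\mu_2$, with sample means $\bar{x}_1$, $\bar{x}_3$, and $\bar{x}_2$]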
17Conceptual Overview
- Between-treatments estimate of $\sigma^2$ is based on the assumption that H0 is true (H0: $\mu_1 = \mu_2 = \mu_3$).
- If H0 is false, 2 or more samples will be from normal populations with different means, resulting in 3 different sampling distributions.
- Therefore, when the population means are not equal, the between-treatments estimate will overestimate $\sigma^2$.
18Conceptual Overview
Within-Treatments Estimate of $\sigma^2$
- Each $s_j^2$ is a point estimator of $\sigma^2$
- Pooled or within-treatments estimate of $\sigma^2$: the average of the sample variances $s_1^2$, $s_2^2$, $s_3^2$
- Within-treatments estimate is not affected by whether the population means are equal.
- If H0 is true, the between-treatments estimate and the within-treatments estimate will be close.
- If H0 is false, the between-treatments estimate will overestimate $\sigma^2$ and will be larger than the within-treatments estimate.
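For the Reed Manufacturing data, where all three samples have the same size, the pooled estimate reduces to the simple average of the three sample variances. A worked version, using the variances from the data table:

```latex
\[
\text{within-treatments estimate of } \sigma^2
  = \frac{s_1^2 + s_2^2 + s_3^2}{3}
  = \frac{26.0 + 26.5 + 24.5}{3}
  = \frac{77.0}{3}
  \approx 25.67
\]
```

With unequal sample sizes the variances would need to be weighted by their degrees of freedom instead of averaged directly.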
19Conceptual Overview
- Between-treatments estimate of $\sigma^2$ = 245
- Within-treatments estimate of $\sigma^2$ = 25.67
- The ratio: 245/25.67 ≈ 9.5
- If H0 is true, the estimates will be similar and the ratio will be close to 1.
- If H0 is false, the ratio will be large.
- How large must the ratio be to reject H0?
20Analysis of Variance: Testing for the Equality of k Population Means
- H0: $\mu_1 = \mu_2 = \dots = \mu_k$
- Ha: Not all population means are equal
- Between-Treatments Estimate of Population Variance
- Within-Treatments Estimate of Population Variance
- Comparing the Variance Estimates: The F Test
- The ANOVA Table
21Between-Treatments Estimate of Population Variance
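- A between-treatments estimate of $\sigma^2$ is called the mean square treatment and is denoted MSTR.
- $\text{MSTR} = \dfrac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$
- Where
- $k$ = the number of treatments
- $n_j$ = the number of observations in sample $j$
- $\bar{x}_j$ = the mean of sample $j$
- $\bar{\bar{x}}$ = the overall sample mean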
22Between-Treatments Estimate of Population Variance
- A between-treatments estimate of $\sigma^2$ is called the mean square treatment and is denoted MSTR.
- $\text{MSTR} = \dfrac{\text{SSTR}}{k - 1}$
- The numerator of MSTR is called the sum of squares treatment and is denoted SSTR.
- The denominator of MSTR represents the degrees of freedom associated with SSTR.
24Example: Reed Manufacturing, MSTR
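The worked figures for this slide did not survive extraction. A sketch of the calculation, assuming the usual definition of MSTR as SSTR divided by k - 1 degrees of freedom and using the sample sizes and sample means from the Reed Manufacturing data table, is given below in Python; the variable names are illustrative.

```python
# Sample sizes and sample means for Buffalo, Pittsburgh, and Detroit.
sizes = [5, 5, 5]
means = [55, 68, 57]

k = len(means)        # number of treatments
n_total = sum(sizes)  # total number of observations, nT

# Overall sample mean: size-weighted average of the sample means.
overall_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

# SSTR: sum over treatments of n_j * (sample mean j - overall mean)^2.
sstr = sum(n * (m - overall_mean) ** 2 for n, m in zip(sizes, means))

# MSTR: SSTR divided by its degrees of freedom, k - 1.
mstr = sstr / (k - 1)

print(overall_mean, sstr, mstr)
```

This gives an overall mean of 60, SSTR = 490, and MSTR = 245, consistent with the ANOVA table later in the deck.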
25Within-Treatments Estimate of Population Variance
- The estimate of $\sigma^2$ based on the variation of the sample observations within each sample is called the mean square error and is denoted by MSE.
- $\text{MSE} = \dfrac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$
- Where
- $s_j^2$ = the variance of sample $j$
- $n_T$ = the sum of all sample sizes
26Within-Treatments Estimate of Population Variance
- The estimate of $\sigma^2$ based on the variation of the sample observations within each sample is called the mean square error and is denoted by MSE.
- $\text{MSE} = \dfrac{\text{SSE}}{n_T - k}$
- The numerator of MSE is called the sum of squares error and is denoted by SSE.
- The denominator of MSE represents the degrees of freedom associated with SSE.
27Example: Reed Manufacturing, MSE
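The worked figures for this slide did not survive extraction. A sketch of the calculation, assuming the usual definition of MSE as SSE divided by nT - k degrees of freedom and using the sample variances from the Reed Manufacturing data table, is given below in Python; the variable names are illustrative.

```python
# Sample sizes and sample variances for Buffalo, Pittsburgh, and Detroit.
sizes = [5, 5, 5]
variances = [26.0, 26.5, 24.5]

k = len(variances)    # number of treatments
n_total = sum(sizes)  # total number of observations, nT

# SSE: sum over treatments of (n_j - 1) * sample variance j.
sse = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))

# MSE: SSE divided by its degrees of freedom, nT - k.
mse = sse / (n_total - k)

print(sse, round(mse, 3))
```

This gives SSE = 308 and MSE of about 25.667, consistent with the ANOVA table later in the deck.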
28Comparing the Variance Estimates: The F Test
- If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with numerator (MSTR) d.f. equal to $k - 1$ and denominator (MSE) d.f. equal to $n_T - k$.
- If the means of the k populations are not equal, the value of MSTR/MSE will be inflated because MSTR overestimates $\sigma^2$.
- Hence, we will reject H0 if the resulting value of MSTR/MSE appears to be too large to have been selected at random from the appropriate F distribution.
29Test for the Equality of k Population Means
- Hypotheses
- H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
- Ha: Not all population means are equal
- Test Statistic
- $F = \text{MSTR}/\text{MSE}$
- Rejection Rule
- Reject H0 if $F > F_\alpha$
- where the value of $F_\alpha$ is based on an F distribution with $k - 1$ numerator degrees of freedom and $n_T - k$ denominator degrees of freedom.
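The test procedure can be sketched as a small Python function that takes the raw samples and returns the F statistic with its degrees of freedom. This is an illustrative sketch, not the deck's own procedure: the function names are hypothetical, and the critical value is supplied by the caller (here 3.89, the F table value for alpha = .05 with 2 and 12 degrees of freedom used in the Reed Manufacturing example).

```python
from statistics import mean, variance


def anova_f_statistic(samples):
    """Return (F, numerator d.f., denominator d.f.) for a list of samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total

    # Between-treatments and within-treatments sums of squares.
    sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    sse = sum((len(s) - 1) * variance(s) for s in samples)

    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return mstr / mse, k - 1, n_total - k


def reject_h0(f_stat, f_critical):
    """Rejection rule: reject H0 when F exceeds the critical value."""
    return f_stat > f_critical


samples = [
    [48, 54, 57, 54, 62],  # Buffalo
    [73, 63, 66, 64, 74],  # Pittsburgh
    [51, 63, 61, 54, 56],  # Detroit
]
f_stat, df_num, df_den = anova_f_statistic(samples)
decision = reject_h0(f_stat, 3.89)  # critical value taken from an F table
print(round(f_stat, 2), df_num, df_den, decision)
```

Passing the critical value in explicitly keeps the sketch free of third-party dependencies; a statistics library could be used to look it up instead.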
30Example: Reed Manufacturing
- Analysis of Variance
- F Test
- If H0 is true, the ratio MSTR/MSE should be near 1 since both MSTR and MSE are estimating $\sigma^2$. If Ha is true, the ratio should be significantly larger than 1 since MSTR tends to overestimate $\sigma^2$.
- Rejection Rule
- Assuming $\alpha = .05$, $F_{.05} = 3.89$ (2 d.f. numerator, 12 d.f. denominator). Reject H0 if F > 3.89
- Test Statistic
- F = MSTR/MSE = 245/25.667 = 9.55
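In this particular example the table value 3.89 can be checked without an F table. When the numerator has exactly 2 degrees of freedom (k = 3 treatments), the upper-tail area of the F distribution has the closed form P(F > f) = (1 + 2f/d2)^(-d2/2), where d2 is the denominator degrees of freedom. This shortcut is not part of the original slides and does not apply to other numerator degrees of freedom; the function names below are hypothetical.

```python
def f_upper_tail_2df(f, d2):
    """P(F > f) for an F distribution with 2 numerator and d2 denominator d.f."""
    return (1 + 2 * f / d2) ** (-d2 / 2)


def f_critical_2df(alpha, d2):
    """Value f whose upper-tail area is alpha (inverse of f_upper_tail_2df)."""
    return (d2 / 2) * (alpha ** (-2 / d2) - 1)


f_stat = 245 / (308 / 12)               # MSTR / MSE for the Reed data
f_crit = f_critical_2df(0.05, 12)       # should round to the table value 3.89
p_value = f_upper_tail_2df(f_stat, 12)  # upper-tail area beyond the observed F

print(round(f_crit, 2), round(p_value, 4))
```

Because the upper-tail area beyond the observed F is well below .05, this agrees with the rejection rule stated above.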
31Example: Reed Manufacturing
- Analysis of Variance
- Conclusion
- F = 9.55 > $F_{.05}$ = 3.89, so we reject H0. The mean number of hours worked per week by department managers is not the same at each plant.
- ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Treatments            490              2                    245           9.55
Error                 308              12                   25.667
Total                 798              14
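The sums of squares in the ANOVA table can be recomputed from the raw observations, which also confirms that the total sum of squares splits into the treatment and error parts (798 = 490 + 308). The sketch below uses only the Python standard library; the variable names and the print layout are illustrative.

```python
from statistics import mean

samples = {
    "Buffalo": [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit": [51, 63, 61, 54, 56],
}

all_obs = [x for obs in samples.values() for x in obs]
grand_mean = mean(all_obs)
k, n_total = len(samples), len(all_obs)

# Treatments: variation of the sample means around the overall mean.
sstr = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in samples.values())
# Error: variation of the observations around their own sample mean.
sse = sum((x - mean(obs)) ** 2 for obs in samples.values() for x in obs)
# Total: variation of all observations around the overall mean.
sst = sum((x - grand_mean) ** 2 for x in all_obs)

rows = [
    ("Treatments", sstr, k - 1),
    ("Error", sse, n_total - k),
    ("Total", sst, n_total - 1),
]
for source, ss, df in rows:
    print(f"{source:<12}{ss:>8.0f}{df:>6}")
```

The degrees of freedom add up in the same way as the sums of squares: 2 + 12 = 14.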
32Using Excel to Test for the Equality of k Population Means
- Excel's Anova: Single Factor Tool
- Step 1: Select the Tools pull-down menu
- Step 2: Choose the Data Analysis option
- Step 3: Choose Anova: Single Factor from the list of Analysis Tools
33Using Excel to Test for the Equality of k Population Means
- Excel's Anova: Single Factor Tool
- Step 4: When the Anova: Single Factor dialog box appears:
- Enter B1:D6 in the Input Range box
- Select Grouped By Columns
- Select Labels in First Row
- Enter .05 in the Alpha box
- Select Output Range
- Enter A8 (your choice) in the Output Range box
- Select OK
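The input range B1:D6 combined with "Labels in First Row" implies a worksheet in which row 1 holds the plant names and rows 2 to 6 hold the five observations in columns B through D. The sketch below writes the Reed Manufacturing data in that layout as CSV text that could be saved and opened in Excel. Placing the observation number in column A is an assumption, since the worksheet itself is not shown in the slides.

```python
import csv
import io

# Row 1: labels. Column A (observation number) is an assumed layout choice.
header = ["Observation", "Buffalo", "Pittsburgh", "Detroit"]
data = [
    [1, 48, 73, 51],
    [2, 54, 63, 63],
    [3, 57, 66, 61],
    [4, 54, 64, 54],
    [5, 62, 74, 56],
]

# Build the CSV in memory; write csv_text to a .csv file to open it in Excel.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(data)
csv_text = buffer.getvalue()

print(csv_text)
```

With this layout, cells B1:D6 contain the three labeled columns that the dialog box steps refer to.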
34Using Excel to Test for the Equality of k Population Means
- Value Worksheet (top portion)
35Using Excel to Test for the Equality of k Population Means
- Value Worksheet (bottom portion)
36Now You Try. Page 436, 29
37End of Chapter 10