Title: Week 12 objectives
Slide 1: Week 12 objectives
- 1. Overview of hypothesis tests
- 2. ANOVA
  - (a) Example
  - (b) Minitab
  - (c) ANOVA Table
  - (d) Conceptual view of ANOVA
- 3. Assumptions and Conditions for ANOVA
Slide 2: 1. Overview of hypothesis tests
Slide 3: When might ANOVA be needed?
- A supermarket chain executive needs to determine whether the sales of a new product are affected by the aisle in which the product is stored.
- Possible experiment
  - 10 aisles in the store
  - Locate the product in each of the 10 aisles for a week and record daily sales
- Are there differences among mean daily sales?
- A claim about ten populations of quantitative data is to be tested using sample information
Slide 4: 2. ANOVA introduction
- An analysis of variance (ANOVA) is a test about population means
- The usual null hypothesis states that a group of population means are all equal
- The alternative hypothesis is simply that those means are not all equal
- This test about means is carried out by decomposing certain sample variances
Slide 5: ANOVA hypotheses
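In symbols, the two hypotheses described in the ANOVA introduction can be sketched as follows. The notation is assumed here: k is the number of populations and \mu_i is the mean of population i.

```latex
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad\text{versus}\qquad
H_a:\ \text{not all of } \mu_1, \ldots, \mu_k \text{ are equal}
```

Note that the alternative does not say that every mean differs from every other, only that at least one differs.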
Slide 6: 2(a) Example 1
- Random samples of 15 students from each of three faculties record the dollar amounts spent on textbooks and stationery
- Are the mean spending levels different across the three faculties?
- How to answer the question?
Slide 7: The boxplots for the faculty spending example
Slide 8: Discussion
- Conclusions about population mean spending levels are to be based on samples
  - This requires a statistical inference method
- Which inference method?
  - The question to be investigated is whether population mean spending levels are the same or not, i.e. it needs a yes/no answer
  - This indicates a hypothesis test
  - For differences between more than two population means, ANOVA is required
Slide 9: ANOVA: the F ratio test statistic
The form of the test statistic is the F ratio
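Using the terms that appear elsewhere in these slides (SSQ for a sum of squares, MSQ for a mean square, k samples and N observations in total), the F ratio can be written as the sketch below.

```latex
F = \frac{\text{Between samples MSQ}}{\text{Within samples MSQ}}
  = \frac{\text{Between samples SSQ}/(k-1)}{\text{Within samples SSQ}/(N-k)}
```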
Slide 10: ANOVA: the F ratio test statistic (cont.)
- The ratio is sensitive to whether or not the null hypothesis is true
- The null distribution of the F ratio is the F distribution
- Using the F ratio to carry out a test is called an F-test
- The results are laid out in a very convenient ANOVA Table
Slide 11: 2(b) ANOVA using Minitab
- For one-way ANOVA, data can either be presented in separate columns
  - (use Stat > ANOVA > One-way (unstacked))
- or
- stacked in a single column, in which case another column is needed to contain the sample labels.
  - Then use Stat > ANOVA > One-way.
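The two data layouts can be illustrated outside Minitab. The following is a minimal Python sketch, not Minitab code: the sample labels "A", "B", "C", the values, and the function names `stack` and `unstack` are all made up for illustration.

```python
# Hypothetical unstacked layout: one "column" (list) per sample.
# The values are made up for illustration only.
unstacked = {
    "A": [12.0, 15.0, 11.0],
    "B": [18.0, 17.0],
    "C": [14.0, 13.0, 16.0, 15.0],
}


def stack(columns):
    """Convert unstacked columns into (values, labels) stacked columns."""
    values, labels = [], []
    for label, column in columns.items():
        for value in column:
            values.append(value)
            labels.append(label)
    return values, labels


def unstack(values, labels):
    """Rebuild one list per sample label from the stacked layout."""
    columns = {}
    for value, label in zip(values, labels):
        columns.setdefault(label, []).append(value)
    return columns


values, labels = stack(unstacked)
print(values)  # all observations in a single column
print(labels)  # the matching column of sample labels
```

Both layouts carry the same information; the stacked form simply trades several columns for one data column plus one label column.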
Slide 12: 2(c) Use the ANOVA Table to write out a six-step solution
Minitab output for Example 1
(Note: the within samples SSQ is called the error SSQ in Minitab.)
Slide 13: Six-step solution
Slide 14: The six steps (cont.)
- P-value = 0.003
- Decision rule: Reject H0 if P-value < 0.05, but if P-value > 0.05 then we cannot reject H0. In the present case P-value = 0.003 < 0.05, so H0 is rejected.
- There is strong evidence to conclude that at least two population mean expenditure levels are different.
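The decision rule above can be sketched as a small Python function. The name `decide` and its return strings are illustrative choices; treating a P-value exactly equal to the significance level as "cannot reject" is an assumption, since the rule on the slide does not cover that case.

```python
def decide(p_value, alpha=0.05):
    """Apply the P-value decision rule at significance level alpha.

    ASSUMPTION: a P-value exactly equal to alpha is treated as
    "cannot reject H0"; the slide's rule leaves this case open.
    """
    if p_value < alpha:
        return "reject H0"
    return "cannot reject H0"


# P-value reported for Example 1 (faculty spending), tested at the 5% level.
print(decide(0.003))
```

The same function covers other significance levels by passing, for example, `alpha=0.01`.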
Slide 15: More comments: P-values and the distribution of the F ratio
- The F ratio tends to have a value around 1 if H0 is true, but becomes inflated if H0 is not true.
- Thus large values of F are significant.
- The P-value is the probability Pr(an F-distributed variable > observed F value)
- From F(k-1, N-k) distribution tables, or by Minitab.
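For the special case of three samples (k = 3, so the numerator degrees of freedom are 2), the upper-tail probability of the F distribution has a simple closed form: Pr(F > f) = (1 + 2f/(N-k))^(-(N-k)/2). The sketch below uses that identity as a stand-in for tables or Minitab; it does not apply for other values of k. The function name is an illustrative choice, and the inputs are the F value and error degrees of freedom reported for Lecture Exercise 2 later in these slides.

```python
def f_upper_tail_df1_2(f_observed, df_error):
    """Pr(F > f_observed) for an F distribution with 2 and df_error
    degrees of freedom. Valid only when the numerator d.f. is 2 (k = 3)."""
    if f_observed < 0:
        raise ValueError("an F ratio cannot be negative")
    return (1.0 + 2.0 * f_observed / df_error) ** (-df_error / 2.0)


# Lecture Exercise 2: F = 3.43 with 2 and 20 degrees of freedom.
p_value = f_upper_tail_df1_2(3.43, 20)
print(round(p_value, 3))
```

Because the function is decreasing in the observed F value, larger F ratios give smaller P-values, which is why large values of F count as significant.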
Slide 16: Example F distribution
Slide 17: More about the F distribution
Slide 18: 2(d) Conceptual View of ANOVA (1)
Consider the following two experiments to examine the effectiveness of three different teaching methods on two campuses (City West and Mawson Lakes). Here are the raw data.
Which experiment has better evidence of a difference in the true (POPULATION) average results among the methods?
Slide 19: Conceptual view of ANOVA (2)
Could variations among the means this large plausibly be due to chance, OR is this good evidence that the POPULATION means differ?
It seems that in experiment 1 it is easier to justify the differences between the levels of the factor because the results are so consistent. The heart of ANOVA is to compare the variability among the group means to the variability within each group.
Slide 20: Conceptual View of ANOVA (3)
- In experiment 1, the variability among the group means is much larger than the variability of individual observations within each single group.
- This is the basic idea behind ANOVA.
- This technique examines the data for evidence of differences in the corresponding population means by looking at the ratio of the variability among the group means to the variability within the groups.
Slide 21: Review of SSQ (from week 2)
In ANOVA a variance is called a mean square.
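As a reminder in symbols, for a single sample x_1, ..., x_n with mean \bar{x} (notation assumed here), the sum of squares and the sample variance are related as in this sketch:

```latex
\text{SSQ} = \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
s^2 = \frac{\text{SSQ}}{n-1}
```

Dividing a sum of squares by its degrees of freedom gives an average squared deviation, which is why a variance is called a mean square.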
Slide 22: 2(a) ANOVA decomposition of SSQ (leave out as non-examinable)
Slide 23: ANOVA decomposition of SSQ, contd.
- The decomposition
  - Total SSQ = Between samples SSQ + Within samples SSQ
- applies also to degrees of freedom
  - N - 1 = (k - 1) + (N - k)
- Dividing SSQ terms by degrees of freedom gives mean-square, or MSQ, terms
- If some of the SSQ terms become inflated more than others, so will the corresponding MSQ terms
- But if the null hypothesis is true, both MSQ terms are estimates of the same thing: the natural experimental variability of the data.
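The decomposition can be checked numerically. The following Python sketch uses a small made-up data set (k = 3 samples, N = 9 observations); the data and the function names are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
# Hypothetical data: k = 3 samples, N = 9 observations in total.
samples = [
    [4.0, 5.0, 6.0],
    [7.0, 9.0, 8.0],
    [1.0, 2.0, 3.0],
]


def mean(values):
    return sum(values) / len(values)


def ssq_decomposition(groups):
    """Return (total, between, within) sums of squares for a list of samples."""
    everything = [x for group in groups for x in group]
    grand_mean = mean(everything)
    total = sum((x - grand_mean) ** 2 for x in everything)
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return total, between, within


total, between, within = ssq_decomposition(samples)
k = len(samples)
n_total = sum(len(g) for g in samples)

print(total, between + within)                    # the two SSQ figures agree
print(n_total - 1, (k - 1) + (n_total - k))       # so do the degrees of freedom
print(between / (k - 1), within / (n_total - k))  # the two MSQ terms
```

In this made-up data set the group means are far apart relative to the spread inside each group, so the between samples MSQ is much larger than the within samples MSQ.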
Slide 24: ANOVA: how the decomposition leads to a test
- If the null hypothesis of equal population means is true, the corresponding MSQ terms both estimate error variance and are approximately equal.
- But if the population means differ, the Between samples SSQ becomes inflated and its MSQ tends to be bigger than the Within samples, or error, MSQ.
- Thus the ratio of MSQ terms provides a test statistic: if H0 is true, both MSQ terms estimate the same thing, and the ratio is about 1.
- So the value 1 is roughly in the middle of the null distribution of the test ratio.
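A small simulation can illustrate this behaviour. The sketch below is an illustration under assumed settings, not part of the lecture: it draws three samples of 10 from normal populations with standard deviation 1, and the population means, the number of repeats and the seed are arbitrary choices.

```python
import random


def f_ratio(groups):
    """Between samples MSQ divided by within samples MSQ."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (between / (k - 1)) / (within / (n_total - k))


def simulate(population_means, n_per_group=10, repeats=2000, seed=1):
    """Average F ratio over repeated samples from normal populations (sd = 1)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(repeats):
        groups = [[rng.gauss(mu, 1.0) for _ in range(n_per_group)]
                  for mu in population_means]
        ratios.append(f_ratio(groups))
    return sum(ratios) / repeats


print(simulate([0.0, 0.0, 0.0]))  # H0 true: average F ratio close to 1
print(simulate([0.0, 0.0, 1.5]))  # H0 false: average F ratio inflated
```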
Slide 25: Lecture Exercise 1: the textbook example (12.2.1)
Slide 26: Will there be evidence of different population means?
Boxplots in Lecture Exercise 1
Slide 27: Lecture Exercise 1 cont.
Analysis of Variance
Source   DF     SS     MS     F      P
Brands    2   5.09   2.54  0.87  0.437
Error    18  52.87   2.94
Total    20  57.95
- The between and within samples MSQ terms are 5.09/2 ≈ 2.54 and 52.87/18 ≈ 2.94
- The ratio of these MSQ terms is 2.54/2.94 ≈ 0.87, which is less than 1.
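The arithmetic behind the MS and F columns can be reproduced with a few lines of Python. This is only a sketch; the function name `mean_squares` is an illustrative choice, and the inputs are the SS and DF values from the table for Lecture Exercise 1.

```python
def mean_squares(ss_between, df_between, ss_within, df_within):
    """Return (between MSQ, within MSQ, F ratio) from SSQ and d.f. values."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# SS and DF columns of the Minitab table for Lecture Exercise 1.
ms_brands, ms_error, f_value = mean_squares(5.09, 2, 52.87, 18)
print(f"{ms_brands:.3f} {ms_error:.3f} {f_value:.3f}")
```

Small differences from the printed table in the last digit come from rounding in the reported SS values.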
Slide 28: Solution Steps (i), (ii), (iii)
Slide 29: Solution Steps (iv), (v) and (vi)
- P-value = 0.437
- Decision rule: Reject H0 if P-value < 0.05, but if P-value > 0.05 then we cannot reject H0. In the present case P-value = 0.437 > 0.05, so H0 cannot be rejected.
- There is no evidence suggesting differences between the brands' population mean toxin levels.
Slide 30: Lecture Exercise 2
- The XYZ Corporation is interested in possible differences in days worked by salaried employees in three different departments in the financial area.
- A survey of 23 randomly chosen employees reveals the data shown below.
- At the 1% significance level, are the mean annual attendance rates different for employees in these three departments?
Slide 31: Boxplots for Lecture Exercise 2
Slide 32: Minitab output for Exercise 2
One-way ANOVA: Budgets, Payables, Pricing

Analysis of Variance
Source   DF     SS    MS     F      P
Factor    2   1804   902  3.43  0.052
Error    20   5257   263
Total    22   7060

Level      N    Mean  StDev
Budgets    5  261.20  11.95
Payables  10  238.00  21.24
Pricing    8  244.38   9.46

Pooled StDev = 16.21
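The ANOVA table can be rebuilt, to within rounding, from the per-department summary alone. The Python sketch below uses only the N, Mean and StDev values from the Minitab output for Exercise 2; the variable names are illustrative, and it relies on the fact that the within samples SSQ equals the sum of (n - 1) times each sample variance.

```python
import math

# (sample size, mean, standard deviation) from the Minitab summary above.
summary = {
    "Budgets": (5, 261.20, 11.95),
    "Payables": (10, 238.00, 21.24),
    "Pricing": (8, 244.38, 9.46),
}

k = len(summary)
n_total = sum(n for n, _, _ in summary.values())
grand_mean = sum(n * m for n, m, _ in summary.values()) / n_total

# Between samples (Factor) and within samples (Error) sums of squares.
ss_factor = sum(n * (m - grand_mean) ** 2 for n, m, _ in summary.values())
ss_error = sum((n - 1) * s ** 2 for n, _, s in summary.values())

ms_factor = ss_factor / (k - 1)
ms_error = ss_error / (n_total - k)
f_value = ms_factor / ms_error
pooled_sd = math.sqrt(ms_error)

print(f"Factor: DF={k - 1} SS={ss_factor:.0f} MS={ms_factor:.0f} F={f_value:.2f}")
print(f"Error:  DF={n_total - k} SS={ss_error:.0f} MS={ms_error:.0f}")
print(f"Pooled StDev = {pooled_sd:.2f}")
```

The recomputed figures can differ from Minitab's in the last digit because the means and standard deviations in the summary are themselves rounded. The pooled standard deviation is simply the square root of the error MSQ.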
Slide 33: Solution Steps (i), (ii), (iii)
Slide 34: Solution Steps (iv), (v) and (vi)
- P-value = 0.052
- Decision rule: Reject H0 if P-value < 0.01, but if P-value > 0.01 then we cannot reject H0. In the present case P-value = 0.052 > 0.01, so H0 cannot be rejected.
- There is not enough evidence to suggest differences between mean annual attendance rates in the three departments.
Slide 35: 3. Assumptions and Conditions
- Well defined continuous variables?
- Representative sample?
- Large sample sizes or normally distributed variables?
  - Look at a normal probability plot of residuals, but large degrees of freedom for the error term helps the CLT to work
- Equal variances?
  - Look at the sample sizes. If equal, this protects against adverse consequences. If not equal, look at the sample s.d.s.
- Independence?
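When the sample sizes are not equal, one simple way to look at the sample standard deviations is to compare the largest with the smallest. The Python sketch below does this for the three departments of Lecture Exercise 2 (sample sizes 5, 10 and 8). The threshold of 2 is a common rule of thumb and an assumption here, not something stated in the lecture.

```python
def sd_ratio(std_devs):
    """Ratio of the largest to the smallest sample standard deviation."""
    return max(std_devs) / min(std_devs)


# Sample standard deviations reported for Lecture Exercise 2.
ratio = sd_ratio([11.95, 21.24, 9.46])

# ASSUMPTION: a threshold of 2 is a common rule of thumb, not part of the lecture.
RULE_OF_THUMB = 2.0
print(round(ratio, 2), "needs a closer look" if ratio > RULE_OF_THUMB else "looks fine")
```

A ratio above the threshold does not invalidate the analysis by itself; it is a prompt to examine the equal variances condition more carefully.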
Slide 36: Assumptions and conditions in Lecture Exercise 1 (Toxin readings)
- Well defined continuous variables?
  - Toxin readings are continuous variables
- Equal variances?
  - Equal sample sizes, so the test is protected against unequal variances
- Representative sample?
  - Yes, as a result of random sampling.
- Independence?
  - It would be OK, say, if the readings were carried out separately
Slide 37: Normality condition?
- The Error (Within Samples) degrees of freedom is 18 < 30, which is not large enough
- However, a normal probability plot of the residuals, shown on the next slide, is consistent with normally distributed residuals.
Slide 38: Normal probability plot of residuals
Slide 39: The case k = 2: comparing two means
- Can be tested either with
  - a 2-sample t-test, or
  - a 2-sample z-test, or
  - one-way ANOVA based on two samples
- The t-test and z-test use a CI for the difference between means (see Week 10)
- The P-value is identical for the two-sided, pooled-variance 2-sample t-test and ANOVA
- See the textbook for an example, and more explanation
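The reason the P-values agree is that, for two samples, the ANOVA F ratio equals the square of the pooled-variance t statistic. The Python sketch below checks this on made-up data; the samples and function names are hypothetical, and the identity applies to the pooled (equal variances) form of the t-test only.

```python
import math

# Hypothetical two-sample data, made up for illustration.
sample_a = [5.1, 4.8, 6.0, 5.5, 5.2]
sample_b = [6.3, 5.9, 6.8, 6.1]


def mean(values):
    return sum(values) / len(values)


def ssq(values):
    """Sum of squared deviations from the sample mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    df = len(a) + len(b) - 2
    pooled_var = (ssq(a) + ssq(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se


def anova_f(a, b):
    """One-way ANOVA F ratio for k = 2 samples."""
    n_total = len(a) + len(b)
    grand_mean = mean(a + b)
    between = (len(a) * (mean(a) - grand_mean) ** 2
               + len(b) * (mean(b) - grand_mean) ** 2)
    within = ssq(a) + ssq(b)
    return (between / 1) / (within / (n_total - 2))


t = pooled_t(sample_a, sample_b)
f = anova_f(sample_a, sample_b)
print(t ** 2, f)  # equal up to floating-point rounding
```

Since a large F corresponds to a large |t| in either direction, the one-sided F-test matches the two-sided t-test.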