Week 12 objectives - PowerPoint PPT Presentation

Slides: 37
Provided by: mishal8

1
Week 12 objectives
  • 1. Overview of hypothesis tests
  • 2. ANOVA
  • (a) Example
  • (b) Minitab
  • (c) ANOVA Table
  • (d) Conceptual view of ANOVA
  • 3. Assumptions and Conditions for ANOVA

2
1. Overview of hypothesis tests
3
When may ANOVA be needed?
  • A supermarket chain store executive needs to
    determine whether the sales of a new product are
    affected by the aisle in which the product is
    stored.
  • Possible experiment
  • 10 aisles in the store
  • Locate the product in each of the 10 aisles for a
    week and record daily sales
  • Are there differences among mean daily sales?
  • A claim about ten populations of quantitative
    data is to be tested using sample information

4
2. ANOVA: introduction
  • An analysis of variance (ANOVA) is a test about
    population means
  • The usual null hypothesis states that the means of
    some group of populations are all equal
  • The alternative hypothesis is simply that those
    means are not all equal
  • This test about means is carried out by
    decomposing certain sample variances

5
ANOVA hypotheses
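In symbols, the hypotheses described in the introduction could be written as follows. This is a sketch that assumes k populations with means mu_1, ..., mu_k; the notation is an assumption and is not taken from the slide itself.

```latex
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \cdots = \mu_k \\
H_a &: \text{not all of } \mu_1, \ldots, \mu_k \text{ are equal}
\end{aligned}
```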
6
2(a) Example 1
  • Random samples of 15 students from each of three
    faculties record the dollar amounts spent on
    textbooks and stationery
  • Are the mean spending levels different across the
    three faculties?
  • How to answer the question?

7
The boxplots for the faculty spending example
8
Discussion
  • Conclusions about population mean spending levels
    are to be based on samples
  • This requires a statistical inference method
  • Which inference method?
  • The question to be investigated is whether the
    population mean spending levels are the same or
    not, i.e. a yes/no answer is needed
  • This indicates a hypothesis test
  • For differences between more than two population
    means, ANOVA is required

9
ANOVA: the F ratio test statistic
The form of the test statistic is the F ratio
10
ANOVA: the F ratio test statistic (cont.)
  • The ratio is sensitive to whether or not the null
    hypothesis is true
  • The null distribution of the F-ratio is the
    F-distribution
  • Using the F-ratio to carry out a test is called
    an F-test
  • The results are laid out in a very convenient
    ANOVA Table
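As a sketch of the form this ratio takes, the between-samples and within-samples SSQ and MSQ terms used elsewhere in these slides can be combined as follows, with k samples and N observations in total. The layout below is a reconstruction under those assumptions, not the original slide's formula.

```latex
F = \frac{\text{Between samples MSQ}}{\text{Within samples MSQ}}
  = \frac{\text{Between samples SSQ} / (k - 1)}{\text{Within samples SSQ} / (N - k)}
```

When H0 is true, this statistic follows the F distribution with k - 1 and N - k degrees of freedom.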

11
2(b) ANOVA using Minitab
  • For one-way ANOVA, data can either be presented
    in separate columns
  • (use Stat > ANOVA > One-way (Unstacked))
  • or
  • stacked in a single column, in which case another
    column is needed to contain the sample labels.
  • Then use Stat > ANOVA > One-way.

12
2(c) Use the ANOVA Table to write out a six-step
solution
Minitab output for Example 1
(Note: the within-samples SSQ is called the
error SSQ in Minitab.)
13
Six-step solution
14
The six steps (cont.)
  • P-value = 0.003
  • Decision rule: Reject H0 if P-value < 0.05, but
    if P-value > 0.05 then we cannot reject H0. In
    the present case P-value = 0.003 < 0.05, so H0
    is rejected.
  • There is strong evidence to conclude that at
    least two population mean expenditure levels are
    different.

15
More comments: P-values and the distribution of
the F ratio
  • The F-ratio (between-samples MSQ divided by
    within-samples MSQ) tends to have a value around
    1 if H0 is true, but becomes inflated if H0 is
    not true.
  • Thus large values of F are significant.
  • The P-value is the probability
    Pr(an F-distributed variable > observed F value)
  • It is obtained from F(k-1, N-k) distribution
    tables, or by Minitab.
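The upper-tail probability can also be computed directly. The sketch below uses a closed form that holds only when the numerator degrees of freedom equal 2 (that is, k = 3 samples), which happens to cover both lecture exercises in these slides. The function name is hypothetical. For general degrees of freedom, a statistics library would be needed; for example, SciPy's `scipy.stats.f.sf(f_obs, k - 1, N - k)` is one option, assuming SciPy is available.

```python
def f_tail_prob_df1_2(f_obs: float, df_error: int) -> float:
    """Pr(F > f_obs) for an F distribution with 2 and df_error degrees of freedom.

    Closed form valid ONLY when the numerator df is 2 (k = 3 samples).
    """
    if f_obs < 0:
        raise ValueError("an F ratio cannot be negative")
    return (1.0 + 2.0 * f_obs / df_error) ** (-df_error / 2.0)


# F ratios and error df as quoted in Lecture Exercises 1 and 2 of these slides
p_ex1 = f_tail_prob_df1_2(0.87, 18)
p_ex2 = f_tail_prob_df1_2(3.43, 20)
print(f"Exercise 1: P-value = {p_ex1:.3f}")
print(f"Exercise 2: P-value = {p_ex2:.3f}")
```

Because the F ratios are entered to two decimal places, the results should agree with the Minitab P-values (0.437 and 0.052) only up to rounding.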

16
Example F distribution
17
More about the F distribution
18
2(d) Conceptual View of ANOVA (1)
Consider the following two experiments to examine
the effectiveness of three different teaching
methods on two campuses (City West and Mawson
Lakes). Here is the raw data.
Which experiment has better evidence of a
difference in the true (POPULATION) average
results among the methods?
19
Conceptual view of ANOVA (2)
Could variation among the means this large
plausibly be due to chance,
OR is it good evidence that the
POPULATION means differ?
It seems that in experiment 1, it is easier to
justify the differences between the levels of the
factor because the results are so consistent. The
heart of ANOVA is to compare the variability
among the group means to the variability within
each group.
20
Conceptual View of ANOVA (3)
  • In experiment 1, the variability among the group
    means is much larger than the variability of
    individual observations within each single group.
  • This is the basic idea behind ANOVA.
  • This technique examines the data for evidence of
    differences in the corresponding population means
    by looking at the ratio of the variability among
    the group means to the variability within the
    groups.
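To make the idea concrete, here is a minimal sketch with two invented experiments that share the same group means but differ in how consistent the results are within each group. The scores are hypothetical (they are not the data from the slide), and `f_ratio` is an illustrative helper name.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F ratio: between-samples MSQ / within-samples MSQ."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ssq_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ssq_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssq_between / (k - 1)) / (ssq_within / (n_total - k))


# Hypothetical scores for three teaching methods (NOT the slide's data).
# Both experiments have the same group means: 60, 70 and 80.
experiment_1 = [[59, 60, 61], [69, 70, 71], [79, 80, 81]]   # consistent results
experiment_2 = [[40, 60, 80], [50, 70, 90], [60, 80, 100]]  # widely scattered

f1 = f_ratio(experiment_1)
f2 = f_ratio(experiment_2)
print(f"Experiment 1: F = {f1:.2f}")
print(f"Experiment 2: F = {f2:.2f}")
```

The differences among the group means are identical in both cases; only the within-group variability changes, and that alone moves the ratio from very large to below 1.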

21
Review of SSQ (from week 2)
In ANOVA a variance is called a mean square.
22
2(a) ANOVA decomposition of SSQ (leave out as
non-examinable)
23
ANOVA decomposition of SSQ, contd.
  • The decomposition
    Total SSQ = Between samples SSQ + Within samples SSQ
    applies also to degrees of freedom:
    N - 1 = (k - 1) + (N - k)
  • Dividing SSQ terms by degrees of freedom gives
    mean-square, or MSQ terms
  • If some of the SSQ terms become inflated more
    than others, so will the corresponding MSQ terms
  • But if the null hypothesis is true, both MSQ
    terms are estimates of the same thing, the
    natural experimental variability of the data.
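The decomposition can be checked numerically. The sketch below uses a small invented data set with k = 3 samples of unequal size; the values are hypothetical and serve only to show that the SSQ terms and the degrees of freedom both add up.

```python
from statistics import mean

# Hypothetical data: k = 3 samples of unequal size (invented for illustration)
samples = [[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 3.0, 6.0]]

k = len(samples)
all_values = [x for s in samples for x in s]
n_total = len(all_values)
grand_mean = mean(all_values)

# Sums of squares
ssq_total = sum((x - grand_mean) ** 2 for x in all_values)
ssq_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
ssq_within = sum((x - mean(s)) ** 2 for s in samples for x in s)

# Degrees of freedom and mean squares
df_total, df_between, df_within = n_total - 1, k - 1, n_total - k
msq_between = ssq_between / df_between
msq_within = ssq_within / df_within

print(f"Total SSQ   = {ssq_total:.4f}")
print(f"Between SSQ = {ssq_between:.4f}, Within SSQ = {ssq_within:.4f}")
print(f"df: {df_total} = {df_between} + {df_within}")
print(f"Between MSQ = {msq_between:.4f}, Within MSQ = {msq_within:.4f}")
```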

24
ANOVA: how the decomposition leads to a test
  • If the null hypothesis of equal population means
    is true, the corresponding MSQ terms both
    estimate error variance and are approximately
    equal.
  • But if the population means differ, the Between
    samples SSQ becomes inflated and its MSQ tends to
    be bigger than the Within samples, or error, MSQ.
  • Thus the ratio of MSQ terms provides a test
    statistic: if H0 is true, both MSQ terms estimate
    the same thing, and the ratio is about 1.
  • So the value 1 is roughly in the middle of the
    null distribution of the test ratio.

25
Lecture Exercise 1: the textbook example (12.2.1)
26
Boxplots in Lecture Exercise 1
Will there be evidence of different population
means?
27
Lecture Exercise 1 cont.
Analysis of Variance
Source  DF     SS    MS     F      P
Brands   2   5.09  2.54  0.87  0.437
Error   18  52.87  2.94
Total   20  57.95
  • The between and within samples MSQ terms are
    5.09/2 = 2.54 and 52.87/18 = 2.94
  • The ratio of these MSQ terms is 2.54/2.94 = 0.87,
    which is less than 1.

28
Solution Steps (i), (ii), (iii)
29
Solution Steps (iv), (v) and (vi)
  • P-value = 0.437
  • Decision rule: Reject H0 if P-value < 0.05, but
    if P-value > 0.05 then we cannot reject H0. In
    the present case P-value = 0.437 > 0.05, so H0
    cannot be rejected.
  • There is no evidence suggesting differences
    between population mean brand levels of toxin.

30
Lecture Exercise 2
  • The XYZ Corporation is interested in possible
    differences in days worked by salaried employees
    in three different departments in the financial
    area.
  • A survey of 23 randomly chosen employees reveals
    the data shown below.
  • At the 1% significance level, are the mean annual
    attendance rates different for employees in these
    three departments?

31
Boxplots for Lecture Exercise 2
32
Minitab output for Exercise 2
One-way ANOVA: Budgets, Payables, Pricing

Analysis of Variance
Source  DF    SS   MS     F      P
Factor   2  1804  902  3.43  0.052
Error   20  5257  263
Total   22  7060

Level      N    Mean  StDev
Budgets    5  261.20  11.95
Payables  10  238.00  21.24
Pricing    8  244.38   9.46

Pooled StDev = 16.21
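The ANOVA table above can be rebuilt from the per-level summary statistics alone. The following sketch does so using only the N, Mean and StDev values printed in the Minitab output; the variable names are illustrative, and the tail-probability formula is a closed form that is valid only because the factor has 2 degrees of freedom here.

```python
import math

# Summary statistics copied from the Minitab output: (level, N, mean, StDev)
groups = [
    ("Budgets", 5, 261.20, 11.95),
    ("Payables", 10, 238.00, 21.24),
    ("Pricing", 8, 244.38, 9.46),
]

k = len(groups)
n_total = sum(n for _, n, _, _ in groups)
grand_mean = sum(n * m for _, n, m, _ in groups) / n_total

# Between-samples (Factor) and within-samples (Error) sums of squares
ss_factor = sum(n * (m - grand_mean) ** 2 for _, n, m, _ in groups)
ss_error = sum((n - 1) * sd ** 2 for _, n, _, sd in groups)

df_factor, df_error = k - 1, n_total - k
ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error
f_ratio = ms_factor / ms_error
pooled_sd = math.sqrt(ms_error)

# Upper-tail probability; this closed form holds only when df_factor == 2
p_value = (1 + 2 * f_ratio / df_error) ** (-df_error / 2)

print(f"Factor: DF={df_factor} SS={ss_factor:.0f} MS={ms_factor:.0f} "
      f"F={f_ratio:.2f} P={p_value:.3f}")
print(f"Error:  DF={df_error} SS={ss_error:.0f} MS={ms_error:.0f}")
print(f"Pooled StDev = {pooled_sd:.2f}")
```

Because the printed means and standard deviations are rounded, the recomputed sums of squares may differ from Minitab's by a unit or so in the last digit.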
33
Solution Steps (i), (ii), (iii)
34
Solution Steps (iv), (v) and (vi)
  • P-value = 0.052
  • Decision rule: Reject H0 if P-value < 0.01, but
    if P-value > 0.01 then we cannot reject H0. In
    the present case P-value = 0.052 > 0.01, so H0
    cannot be rejected.
  • There is not enough evidence to suggest
    differences between mean annual attendance rates
    in the three departments.

35
3. Assumptions and Conditions
  • Well defined continuous variables?
  • Representative sample?
  • Large sample sizes or normally distributed
    variables?
  • Look at a normal probability plot of residuals,
    but large degrees of freedom for error term helps
    the CLT to work
  • Equal variances?
  • Look at the sample sizes. If equal, this
    protects against adverse consequences. If not
    equal, look at sample s.d.s.
  • Independence?

36
Assumptions and conditions in Lecture Exercise 1
(Toxin readings)
  • Well defined continuous variables?
  • Toxin readings are continuous variables
  • Equal variances?
  • Equal sample sizes, which protects against the
    adverse consequences of unequal variances
  • Representative sample?
  • Yes, as a result of random sampling.
  • Independence?
  • It would be OK if, say, the readings were carried
    out separately

37
Normality condition?
  • The Error (Within Samples) degrees of freedom is
    18 < 30, which is not large enough
  • However, a normal probability plot of the
    residuals shown on the next slide supports the
    normal distribution of the residuals.

38
Normal probability plot of residuals
39
The case k = 2: comparing two means
  • Can be tested either with
  • a 2-sample t-test or
  • a 2-sample z-test, or
  • one way ANOVA based on two samples
  • The t-test and z-test use a CI for the difference
    between means (see Week 10)
  • The P-value is identical for the two-sided
    pooled-variance 2-sample t-test and ANOVA
  • See the textbook for an example, and more
    explanation
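The agreement between the two tests can be seen numerically: with k = 2, the ANOVA F ratio equals the square of the pooled-variance t statistic, so the two-sided t-test and the F-test give the same P-value. The sketch below checks this on invented data; the sample values are hypothetical, and it assumes the pooled (equal-variance) form of the t-test.

```python
import math
from statistics import mean, variance

# Hypothetical two-sample data (invented for illustration)
sample_a = [12.0, 15.0, 11.0, 14.0, 13.0]
sample_b = [16.0, 18.0, 15.0, 19.0]

n_a, n_b = len(sample_a), len(sample_b)
mean_a, mean_b = mean(sample_a), mean(sample_b)

# Pooled-variance 2-sample t statistic
pooled_var = ((n_a - 1) * variance(sample_a)
              + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
t_stat = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# One-way ANOVA F ratio with k = 2 samples
grand_mean = mean(sample_a + sample_b)
ssq_between = (n_a * (mean_a - grand_mean) ** 2
               + n_b * (mean_b - grand_mean) ** 2)
msq_between = ssq_between / (2 - 1)
msq_within = pooled_var  # the within-samples MSQ is the pooled variance
f_ratio = msq_between / msq_within

print(f"t = {t_stat:.4f}, t squared = {t_stat ** 2:.4f}, F = {f_ratio:.4f}")
```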