Analysis of Variance - PowerPoint PPT Presentation

About This Presentation
Title:

Analysis of Variance

Description:

Chapter 15 Analysis of Variance 15.1 Introduction Analysis of variance compares two or more populations of interval data. Specifically, we are interested in ... – PowerPoint PPT presentation

Slides: 83
Provided by: sbae9

Transcript and Presenter's Notes

Title: Analysis of Variance


1
Analysis of Variance
  • Chapter 15

2
15.1 Introduction
  • Analysis of variance compares two or more
    populations of interval data.
  • Specifically, we are interested in determining
    whether differences exist between the population
    means.
  • The procedure works by analyzing the sample
    variance.

3
15.2 One Way Analysis of Variance
  • The analysis of variance is a procedure that
    tests to determine whether differences exist
    between two or more population means.
  • To do this, the technique analyzes the sample
    variances.

4
One Way Analysis of Variance
  • Example 15.1
  • An apple juice manufacturer is planning to
    develop a new product: a liquid concentrate.
  • The marketing manager has to decide how to market
    the new product.
  • Three strategies are considered:
  • Emphasize the convenience of using the product.
  • Emphasize the quality of the product.
  • Emphasize the product's low price.

5
One Way Analysis of Variance
  • Example 15.1 - continued
  • An experiment was conducted as follows:
  • In three cities an advertising campaign was
    launched.
  • In each city only one of the three
    characteristics (convenience, quality, and price)
    was emphasized.
  • The weekly sales were recorded for twenty weeks
    following the beginning of the campaigns.

6
One Way Analysis of Variance
  • See file Xm15-01
[Data table: weekly sales, one column for each of the three cities]
7
One Way Analysis of Variance
  • Solution
  • The data are interval
  • The problem objective is to compare sales in
    three cities.
  • We hypothesize that the three population means
    are equal

8
Defining the Hypotheses
  • Solution

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least two means differ.
To build the statistic needed to test the
hypotheses, use the following notation.
9
Notation
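Independent samples are drawn from k populations
(treatments).
Sample 1: $x_{11}, x_{21}, \ldots, x_{n_1,1}$
Sample 2: $x_{12}, x_{22}, \ldots, x_{n_2,2}$
...
Sample k: $x_{1k}, x_{2k}, \ldots, x_{n_k,k}$
Sample j has sample size $n_j$ and sample mean $\bar{x}_j$.
X is the response variable. The variable's
values are called responses.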
10
Terminology
  • In the context of this problem
  • Response variable: weekly sales.
  • Responses: actual sale values.
  • Experimental unit: weeks in the three cities
    when we record sales figures.
  • Factor: the criterion by which we classify the
    populations (the treatments). In this problem
    the factor is the marketing strategy.
  • Factor levels: the population (treatment)
    names. In this problem the factor levels are the
    marketing strategies.

11
The rationale of the test statistic
Two types of variability are employed when
testing for the equality of the population means.
12
Graphical demonstration: employing two types of
variability
13
[Figure: responses plotted for Treatment 1, Treatment 2, and Treatment 3]
A small variability within the samples makes it
easier to draw a conclusion about the population
means.
The sample means are the same as before, but the
larger within-sample variability makes it harder
to draw a conclusion about the population means.
14
The rationale behind the test statistic I
  • If the null hypothesis is true, we would expect
    all the sample means to be close to one another
    (and as a result, close to the grand mean).
  • If the alternative hypothesis is true, at least
    some of the sample means would differ.
  • Thus, we measure variability between sample
    means.

15
Variability between sample means
  • The variability between the sample means is
    measured as the sum of squared distances between
    each mean and the grand mean.
  • This sum is called the
  • Sum of Squares for Treatments
  • SST

In our example treatments are represented by the
different advertising strategies.
16
Sum of squares for treatments (SST)
$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$
where there are k treatments, $\bar{x}_j$ is the mean of sample j,
$n_j$ is the size of sample j, and $\bar{\bar{x}}$ is the grand mean.
Note: When the sample means are close to one
another, their distance from the grand mean is
small, leading to a small SST. Thus, a large SST
indicates large variation between sample means,
which supports H1.
17
Sum of squares for treatments (SST)
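  • Solution - continued: Calculate SST

The grand mean is calculated by
$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} = 613.07$
$SST = 20(577.55 - 613.07)^2 + 20(653.00 - 613.07)^2 + 20(608.65 - 613.07)^2 = 57{,}512.23$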
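The arithmetic on this slide can be checked with a short Python sketch. It recomputes the grand mean and SST from the three sample means and the common sample size of 20; the function name and argument layout are illustrative choices, not part of the original slides.

```python
def sum_of_squares_treatments(means, sizes):
    """Return (grand_mean, SST) for k samples given their means and sizes."""
    n = sum(sizes)
    # Grand mean: sample means weighted by their sample sizes.
    grand_mean = sum(nj * xj for nj, xj in zip(sizes, means)) / n
    # SST: size-weighted squared distances of each sample mean from the grand mean.
    sst = sum(nj * (xj - grand_mean) ** 2 for nj, xj in zip(sizes, means))
    return grand_mean, sst


# Sample means and sizes from Example 15.1.
grand_mean, sst = sum_of_squares_treatments([577.55, 653.00, 608.65], [20, 20, 20])
print(round(grand_mean, 2), round(sst, 2))
```

With equal sample sizes the grand mean is simply the average of the three sample means, which is why it lands between them at about 613.07.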
18
Sum of squares for treatments (SST)
  • Is SST = 57,512.23 large enough to reject H0 in
    favor of H1? See next.

19
The rationale behind test statistic II
  • Large variability within the samples weakens the
    ability of the sample means to represent their
    corresponding population means.
  • Therefore, even though sample means may markedly
    differ from one another, SST must be judged
    relative to the within samples variability.

20
Within samples variability
  • The variability within samples is measured by
    adding all the squared distances between
    observations and their sample means.
  • This sum is called the
  • Sum of Squares for Error
  • SSE

In our example this is the sum of all squared
differences between sales in city j and
the sample mean of city j (over all the three
cities).
21
Sum of squares for errors (SSE)
  • Solution - continued: Calculate SSE

$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$
$= (20 - 1)10{,}774.44 + (20 - 1)7{,}238.61 + (20 - 1)8{,}670.24 = 506{,}983.50$
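A minimal Python sketch of the same SSE calculation follows, using the three sample variances quoted on the slide. The helper name is an assumption made for illustration.

```python
def sum_of_squares_error(variances, sizes):
    """SSE as the sum over samples of (n_j - 1) * s_j^2."""
    return sum((nj - 1) * s2 for nj, s2 in zip(sizes, variances))


# Sample variances and sizes from Example 15.1.
sse = sum_of_squares_error([10774.44, 7238.61, 8670.24], [20, 20, 20])
print(round(sse, 2))
```

Computed from the variances as printed, the result comes out about 1 below the slide's 506,983.50, presumably because the figures shown on the slide are rounded.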
22
Sum of squares for errors (SSE)
  • Is SST = 57,512.23 large enough relative to SSE =
    506,983.50 to reject the null hypothesis that
    specifies that all the means are equal?

23
The mean sum of squares
To perform the test we need to calculate the mean
squares as follows:
$MST = \frac{SST}{k - 1}$, $MSE = \frac{SSE}{n - k}$
24
Calculation of the test statistic
$F = \frac{MST}{MSE}$
with the following degrees of freedom: $\nu_1 = k - 1$
and $\nu_2 = n - k$.
Required conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
25
The F test rejection region
And finally, the hypothesis test:
reject $H_0$ if $F > F_{\alpha, k-1, n-k}$.
26
The F test
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least two means differ.
Test statistic: $F = MST/MSE = 3.23$
Since 3.23 > 3.15, there is sufficient evidence
to reject H0 in favor of H1, and argue that at
least one of the mean sales is different from
the others.
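The test can be reproduced with a few lines of Python. The sketch below is an illustration, not the slides' own method: it uses the SST and SSE values from the earlier slides, and it relies on a closed form for the F distribution that holds only when the numerator has 2 degrees of freedom, P(F > f) = (1 + 2f/ν2)^(-ν2/2). The function names are hypothetical.

```python
def f_statistic(sst, sse, k, n):
    """F = MST / MSE for a one-way ANOVA with k samples and n observations."""
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return mst / mse


def f_critical_two_numerator_df(alpha, df2):
    """Upper-alpha critical value of F(2, df2).

    Valid ONLY for 2 numerator degrees of freedom, where
    P(F > f) = (1 + 2*f/df2) ** (-df2/2) can be inverted directly.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


f_stat = f_statistic(57512.23, 506983.50, k=3, n=60)
f_crit = f_critical_two_numerator_df(0.05, 57)
reject_h0 = f_stat > f_crit
print(round(f_stat, 2), round(f_crit, 2), reject_h0)
```

The exact critical value for 57 denominator degrees of freedom comes out slightly above the 3.15 used on the slide, which looks like a table value; the conclusion is the same either way.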
27
The F test p-value
  • Use Excel to find the p-value
  • fx > Statistical >
    FDIST(3.23,2,57) = .0467

p-value = P(F > 3.23) = .0467
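For readers without Excel, the same tail probability can be sketched in Python. This is a special case rather than a general replacement for FDIST: with 2 numerator degrees of freedom the upper tail of the F distribution has the closed form (1 + 2f/ν2)^(-ν2/2). The function name is an assumption.

```python
def f_upper_tail_two_numerator_df(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom.

    Only valid when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)


p_value = f_upper_tail_two_numerator_df(3.23, 57)
print(round(p_value, 3))
```

The result agrees with the Excel figure to about three decimal places; the small gap in the fourth decimal arises because 3.23 is itself a rounded F value.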
28
Excel single factor ANOVA
Xm15-01.xls
SS(Total) = SST + SSE
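The partition of the total sum of squares can be verified numerically. The sketch below uses a small made-up data set (three samples of three observations each, not the Xm15-01 data) and a hypothetical helper name.

```python
from statistics import mean


def one_way_sums_of_squares(samples):
    """Return (SST, SSE, SS(Total)) for a list of samples."""
    all_obs = [x for s in samples for x in s]
    grand = mean(all_obs)
    # Between-samples variability: sample means around the grand mean.
    sst = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    # Within-samples variability: observations around their own sample mean.
    sse = sum((x - mean(s)) ** 2 for s in samples for x in s)
    # Total variability: observations around the grand mean.
    ss_total = sum((x - grand) ** 2 for x in all_obs)
    return sst, sse, ss_total


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
samples = [[10, 12, 14], [15, 17, 19], [11, 13, 18]]
sst, sse, ss_total = one_way_sums_of_squares(samples)
print(sst, sse, ss_total)
```

For this toy data SST is 38 and SSE is 42, which add up to the total of 80.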
29
15.3 Analysis of Variance Experimental Designs
  • Several elements may distinguish between one
    experimental design and others.
  • The number of factors.
  • Each characteristic investigated is called a
    factor.
  • Each factor has several levels.

30
One-way ANOVA: single factor
Two-way ANOVA: two factors
[Figure: in the one-way design the response is shown for Treatments 1-3 (the levels of the single factor); in the two-way design the response is shown for Levels 1-3 of Factor A combined with Levels 1-2 of Factor B]
31
Independent samples or blocks
  • Groups of matched observations are formed into
    blocks, in order to remove the effects of
    unwanted variability.
  • By doing so we improve the chances of detecting
    the variability of interest.

32
Models of Fixed and Random Effects
  • Fixed effects
  • If all possible levels of a factor are included
    in our analysis we have a fixed effect ANOVA.
  • The conclusion of a fixed effect ANOVA applies
    only to the levels studied.
  • Random effects
  • If the levels included in our analysis represent
    a random sample of all the possible levels, we
    have a random-effect ANOVA.
  • The conclusion of the random-effect ANOVA applies
    to all the levels (not only those studied).

33
Models of Fixed and Random Effects.
  • In some ANOVA models the test statistic of the
    fixed effects case may differ from the test
    statistic of the random effect case.
  • Fixed and random effects - examples
  • Fixed effects - The advertisement example
    (15.1): all the levels of the marketing
    strategies were included.
  • Random effects - To determine if there is a
    difference in the production rate of 50 machines,
    four machines are randomly selected and their
    production recorded.

34
15.4 Randomized Blocks (Two-way) Analysis of
Variance
  • The purpose of designing a randomized block
    experiment is to reduce the within-treatments
    variation thus increasing the relative amount of
    between treatment variation.
  • This helps in detecting differences between the
    treatment means more easily.

35
Randomized Blocks
Block all the observations with some commonality
across treatments
[Figure: Treatments 1-4 observed within Block 1, Block 2, and Block 3]
36
Randomized Blocks
Block all the observations with some commonality
across treatments
37
Partitioning the total variability
  • The total sum of squares is partitioned into three
    sources of variation
  • Treatments
  • Blocks
  • Within samples (Error)

Recall: for the independent samples design we have
SS(Total) = SST + SSE
For the randomized block design:
SS(Total) = SST + SSB + SSE
38
Calculating the sums of squares
  • Formulas for the calculation of the sums of
    squares

39
Calculating the sums of squares
  • Formulas for the calculation of the sums of
    squares
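As an illustration of how the randomized block sums of squares fit together, here is a minimal Python sketch using the standard textbook definitions (treatment means and block means measured against the grand mean, with the error term as what is left over). The data are made up and the function name is hypothetical; this is not the Xm15-02 data set.

```python
from statistics import mean


def randomized_block_sums_of_squares(table):
    """table[i][j] is the response in block i under treatment j.

    Returns (SST, SSB, SSE, SS(Total)).
    """
    b, k = len(table), len(table[0])
    grand = mean([x for row in table for x in row])
    treatment_means = [mean([row[j] for row in table]) for j in range(k)]
    block_means = [mean(row) for row in table]
    sst = b * sum((m - grand) ** 2 for m in treatment_means)
    ssb = k * sum((m - grand) ** 2 for m in block_means)
    # Error: what remains after removing treatment and block effects.
    sse = sum(
        (table[i][j] - treatment_means[j] - block_means[i] + grand) ** 2
        for i in range(b) for j in range(k)
    )
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    return sst, ssb, sse, ss_total


# Hypothetical data: 3 blocks (rows) by 3 treatments (columns).
table = [[5, 7, 9], [6, 9, 12], [4, 8, 12]]
sst, ssb, sse, ss_total = randomized_block_sums_of_squares(table)
print(sst, ssb, sse, ss_total)
```

Here SST = 54, SSB = 6 and SSE = 4, which sum to SS(Total) = 64. Without blocking, the 6 units attributed to blocks would have been counted as error.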

40
Mean Squares
  • To perform hypothesis tests for treatments and
    blocks we need
  • Mean square for treatments: $MST = SST/(k-1)$
  • Mean square for blocks: $MSB = SSB/(b-1)$
  • Mean square for error: $MSE = SSE/(n-k-b+1)$

41
Test statistics for the randomized block design
ANOVA
42
The F test rejection regions
  • Testing the mean responses for treatments
  • $F > F_{\alpha, k-1, n-k-b+1}$
  • Testing the mean responses for blocks
  • $F > F_{\alpha, b-1, n-k-b+1}$
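The degrees of freedom in these rejection regions are easy to get wrong, so a small Python sketch may help. It assumes one observation per treatment in every block, so that n = kb, and uses four treatments and 25 blocks as sample dimensions (the layout of Example 15.2 in these slides). The function name is illustrative.

```python
def randomized_block_degrees_of_freedom(k, b):
    """Degrees of freedom for a randomized block design with k treatments and b blocks."""
    n = k * b  # one observation per treatment in every block (assumed)
    return {
        "treatments": k - 1,
        "blocks": b - 1,
        "error": n - k - b + 1,  # equals (k - 1) * (b - 1)
    }


df = randomized_block_degrees_of_freedom(k=4, b=25)
print(df)
```

With these dimensions the treatment test uses 3 and 72 degrees of freedom and the block test uses 24 and 72.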

43
Randomized Blocks ANOVA - Example
  • Example 15.2
  • Are there differences in the effectiveness of
    cholesterol reduction drugs?
  • To answer this question the following experiment
    was organized
  • 25 groups of men with high cholesterol were
    matched by age and weight. Each group consisted
    of 4 men.
  • Each person in a group received a different drug.
  • The cholesterol level reduction in two months was
    recorded.
  • Can we infer from the data in Xm15-02 that there
    are differences in mean cholesterol reduction
    among the four drugs?

44
Randomized Blocks ANOVA - Example
  • Solution
  • Each drug can be considered a treatment.
  • Each 4 records (per group) can be blocked,
    because they are matched by age and weight.
  • This procedure eliminates the variability in
    cholesterol reduction related to different
    combinations of age and weight.
  • This helps detect differences in the mean
    cholesterol reduction attributed to the different
    drugs.

45
Randomized Blocks ANOVA - Example
[ANOVA output table]
Treatments: k - 1 degrees of freedom, F = MST / MSE
Blocks: b - 1 degrees of freedom, F = MSB / MSE
46
Analysis of Variance
  • Chapter 15 - continued

47
15.5 Two-Factor Analysis of Variance
  • Example 15.3
  • Suppose in Example 15.1, two factors are to be
    examined
  • The effects of the marketing strategy on sales.
  • Emphasis on convenience
  • Emphasis on quality
  • Emphasis on price
  • The effects of the selected media on sales.
  • Advertise on TV
  • Advertise in newspapers

48
Attempting one-way ANOVA
  • Solution
  • We may attempt to analyze combinations of levels,
    one from each factor using one-way ANOVA.
  • The treatments will be
  • Treatment 1: Emphasize convenience and advertise
    on TV
  • Treatment 2: Emphasize convenience and advertise
    in newspapers
  • ...
  • Treatment 6: Emphasize price and advertise in
    newspapers

49
Attempting one-way ANOVA
  • Solution
  • The hypotheses tested are
  • $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6$
  • H1 At least two means differ.

50
Attempting one-way ANOVA
  • Solution
  • In each one of six cities sales are recorded
    for ten weeks.
  • In each city a different combination of
    marketing emphasis and media usage is
    employed.

  City      City 1       City 2       City 3   City 4     City 5   City 6
  Emphasis  Convenience  Convenience  Quality  Quality    Price    Price
  Medium    TV           Newspaper    TV       Newspaper  TV       Newspaper

51
Attempting one-way ANOVA
  • Solution

Xm15-03
  • The p-value = .0452.
  • We conclude that there is evidence that
    differences exist in the mean weekly sales
    among the six cities.

52
Interesting questions, no answers
  • These results raise some questions
  • Are the differences in sales caused by the
    different marketing strategies?
  • Are the differences in sales caused by the
    different media used for advertising?
  • Are there combinations of marketing strategy and
    media that interact to affect the weekly sales?

53
Two-way ANOVA (two factors)
  • The current experimental design cannot provide
    answers to these questions.
  • A new experimental design is needed.

54
Two-way ANOVA (two factors)
              Convenience    Quality        Price
  TV          City 1 sales   City 3 sales   City 5 sales
  Newspapers  City 2 sales   City 4 sales   City 6 sales
Are there differences in the mean sales caused
by different marketing strategies?
55
Two-way ANOVA (two factors)
  • Test whether mean sales of Convenience,
    Quality, and Price significantly differ from one
    another.
  • $H_0: \mu_{\text{Conv.}} = \mu_{\text{Quality}} = \mu_{\text{Price}}$
  • $H_1$: At least two means differ

56
Two-way ANOVA (two factors)
Factor A: Marketing strategy (columns)
Factor B: Advertising media (rows)
              Convenience    Quality        Price
  TV          City 1 sales   City 3 sales   City 5 sales
  Newspapers  City 2 sales   City 4 sales   City 6 sales
Are there differences in the mean sales caused
by different advertising media?
57
Two-way ANOVA (two factors)
Test whether mean sales of TV and
Newspapers significantly differ from one
another.
$H_0: \mu_{\text{TV}} = \mu_{\text{Newspapers}}$
$H_1$: The means differ
58
Two-way ANOVA (two factors)
Factor A: Marketing strategy (columns)
Factor B: Advertising media (rows)
              Convenience    Quality        Price
  TV          City 1 sales   City 3 sales   City 5 sales
  Newspapers  City 2 sales   City 4 sales   City 6 sales
Are there differences in the mean sales caused
by interaction between marketing strategy and
advertising medium?
59
Two-way ANOVA (two factors)
  • Test whether mean sales of certain cells
    are different from the level expected.
  • Calculations are based on the sum of squares for
    interaction, SS(AB).

60
Graphical description of the possible
relationships between factors A and B.
61
[Figure: four panels plotting mean response against the levels of factor A (1, 2, 3), with lines for Level 1 and Level 2 of factor B]
Panel 1: Difference between the levels of factor A. No
difference between the levels of factor B (the lines for
Level 1 and Level 2 of factor B coincide).
Panel 2: Difference between the levels of factor A,
and difference between the levels of factor B;
no interaction.
Panel 3: No difference between the levels of factor
A. Difference between the levels of factor B.
Panel 4: Interaction.
62
Sums of squares
63
F tests for the Two-way ANOVA
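  • Test for the difference between the levels of the
    main factors A and B

$F = MS(A)/MSE$ and $F = MS(B)/MSE$, where
$MS(A) = SS(A)/(a-1)$, $MS(B) = SS(B)/(b-1)$, $MSE = SSE/(n-ab)$
Rejection regions: $F > F_{\alpha, a-1, n-ab}$ for factor A;
$F > F_{\alpha, b-1, n-ab}$ for factor B
  • Test for interaction between factors A and B

$F = MS(AB)/MSE$, where $MS(AB) = SS(AB)/[(a-1)(b-1)]$
Rejection region: $F > F_{\alpha, (a-1)(b-1), n-ab}$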
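The three F ratios can be sketched in Python for a balanced design (the same number of replicates in every cell). The sums of squares below follow the standard two-factor definitions, which are assumed here; the data are invented for illustration and the function name is hypothetical.

```python
from statistics import mean


def two_way_anova(cells):
    """cells[i][j] holds the replicates for level i of factor A and level j of factor B.

    Assumes a balanced design: every cell has the same number of replicates.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    n = a * b * r
    grand = mean([x for row in cells for cell in row for x in cell])
    a_means = [mean([x for cell in cells[i] for x in cell]) for i in range(a)]
    b_means = [mean([x for i in range(a) for x in cells[i][j]]) for j in range(b)]
    cell_means = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]

    ss_a = r * b * sum((m - grand) ** 2 for m in a_means)
    ss_b = r * a * sum((m - grand) ** 2 for m in b_means)
    ss_ab = r * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    sse = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    mse = sse / (n - a * b)
    return {
        "SS(A)": ss_a, "SS(B)": ss_b, "SS(AB)": ss_ab, "SSE": sse,
        "F(A)": (ss_a / (a - 1)) / mse,
        "F(B)": (ss_b / (b - 1)) / mse,
        "F(AB)": (ss_ab / ((a - 1) * (b - 1))) / mse,
    }


# Hypothetical data: 3 levels of A (rows), 2 levels of B (columns), 2 replicates per cell.
cells = [
    [[10, 12], [14, 16]],
    [[20, 22], [21, 23]],
    [[15, 17], [25, 27]],
]
result = two_way_anova(cells)
print(result)
```

Each F ratio is then compared with the critical value that has the matching numerator degrees of freedom and n - ab denominator degrees of freedom.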
64
Required conditions
  1. The response distribution is normal.
  2. The treatment variances are equal.
  3. The samples are independent.

65
F tests for the Two-way ANOVA
  • Example 15.3 - continued (Xm15-03)

66
F tests for the Two-way ANOVA
  • Example 15.3 continued
  • Test of the difference in mean sales between the
    three marketing strategies
  • $H_0: \mu_{\text{conv.}} = \mu_{\text{quality}} = \mu_{\text{price}}$
  • $H_1$: At least two mean sales are different

Factor A: Marketing strategies
67
F tests for the Two-way ANOVA
  • Example 15.3 continued
  • Test of the difference in mean sales between the
    three marketing strategies
  • $H_0: \mu_{\text{conv.}} = \mu_{\text{quality}} = \mu_{\text{price}}$
  • $H_1$: At least two mean sales are different
  • $F = MS(A)/MSE = MS(\text{Marketing strategy})/MSE = 5.33$
  • $F_{critical} = F_{\alpha, a-1, n-ab} = F_{.05, 3-1, 60-(3)(2)} = 3.17$
    (p-value = .0077)
  • At the 5% significance level there is evidence to
    infer that differences in weekly sales exist
    among the marketing strategies.
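The critical value and p-value for the marketing strategy test can be cross-checked in Python. As a simplifying assumption, the sketch uses the closed form of the F distribution that applies when the numerator has exactly 2 degrees of freedom (here a - 1 = 2), P(F > f) = (1 + 2f/ν2)^(-ν2/2); it would not apply to the advertising media test, whose numerator has 1 degree of freedom. The variable and function names are illustrative.

```python
a, b, n = 3, 2, 60            # levels of A, levels of B, total observations (Example 15.3)
df_a = a - 1                  # numerator df for factor A
df_error = n - a * b          # denominator df


def f_upper_tail_df1_is_2(f, df2):
    """P(F > f) for F(2, df2); valid only for 2 numerator degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


def f_critical_df1_is_2(alpha, df2):
    """Upper-alpha critical value of F(2, df2), from inverting the tail formula."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


f_crit_a = f_critical_df1_is_2(0.05, df_error)
p_value_a = f_upper_tail_df1_is_2(5.33, df_error)
print(df_a, df_error, round(f_crit_a, 2), round(p_value_a, 4))
```

Both figures agree with the slide: a critical value of about 3.17 and a p-value of about .0077.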
68
F tests for the Two-way ANOVA
  • Example 15.3 - continued
  • Test of the difference in mean sales between the
    two advertising media
  • $H_0: \mu_{\text{TV}} = \mu_{\text{Newspaper}}$
  • $H_1$: The two mean sales differ

Factor B: Advertising media
69
F tests for the Two-way ANOVA
  • Example 15.3 - continued
  • Test of the difference in mean sales between the
    two advertising media
  • $H_0: \mu_{\text{TV}} = \mu_{\text{Newspaper}}$
  • $H_1$: The two mean sales differ
  • $F = MS(B)/MSE = MS(\text{Media})/MSE = 1.42$
  • $F_{critical} = F_{\alpha, b-1, n-ab} = F_{.05, 2-1, 60-(3)(2)} = 4.02$
    (p-value = .2387)
  • At the 5% significance level there is insufficient
    evidence to infer that differences in weekly
    sales exist between the two advertising media.
70
F tests for the Two-way ANOVA
  • Example 15.3 - continued
  • Test for interaction between factors A and B
  • $H_0: \mu_{\text{TV, conv.}} = \mu_{\text{TV, quality}} = \cdots = \mu_{\text{newsp., price}}$
  • $H_1$: At least two means differ

Interaction AB = Marketing*Media
71
F tests for the Two-way ANOVA
  • Example 15.3 - continued
  • Test for interaction between factor A and B
  • $H_0: \mu_{\text{TV, conv.}} = \mu_{\text{TV, quality}} = \cdots = \mu_{\text{newsp., price}}$
  • $H_1$: At least two means differ
  • $F = MS(AB)/MSE = MS(\text{Marketing*Media})/MSE = .09$
  • $F_{critical} = F_{\alpha, (a-1)(b-1), n-ab} = F_{.05, (3-1)(2-1), 60-(3)(2)} = 3.17$
    (p-value = .9171)
  • At the 5% significance level there is insufficient
    evidence to infer that the two factors interact
    to affect the mean weekly sales.
72
15.7 Multiple Comparisons
  • When the null hypothesis is rejected, it may be
    desirable to find which mean(s) is (are)
    different, and in what ranking order.
  • Three statistical inference procedures, geared
    toward doing this, are presented:
  • Fisher's least significant difference (LSD)
    method
  • Bonferroni adjustment
  • Tukey's multiple comparison method

73
15.7 Multiple Comparisons
  • Two means are considered different if the
    difference between the corresponding sample means
    is larger than a critical number. Then, the
    larger sample mean is believed to be associated
    with a larger population mean.
  • Conditions common to all the methods here
  • The ANOVA model is the one way analysis of
    variance
  • The conditions required to perform the ANOVA are
    satisfied.
  • The experiment is fixed-effect

74
Fisher Least Significant Difference (LSD) Method
  • This method builds on the equal variances t-test
    of the difference between two means.
  • The test statistic is improved by using MSE
    rather than $s_p^2$.
  • We can conclude that $\mu_i$ and $\mu_j$ differ (at the
    $\alpha$ significance level) if $|\bar{x}_i - \bar{x}_j| > LSD$, where
    $LSD = t_{\alpha/2,\,n-k}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$

75
Experimentwise Type I error rate ($\alpha_E$) (the
effective Type I error)
  • Fisher's method may result in an increased
    probability of committing a Type I error.
  • The experimentwise Type I error rate is the
    probability of committing at least one Type I
    error when each comparison is made at significance
    level $\alpha$. It is calculated by
    $\alpha_E = 1 - (1 - \alpha)^C$, where C is the
    number of pairwise comparisons (i.e., $C = k(k-1)/2$).
  • The Bonferroni adjustment determines the required
    Type I error probability per pairwise comparison
    ($\alpha$), to secure a pre-determined overall $\alpha_E$.
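A short Python sketch makes the inflation concrete. It evaluates the experimentwise rate formula for three populations tested pairwise at the 5% level, then the per-comparison level that the Bonferroni adjustment would use for a 5% overall target. The function names are illustrative.

```python
def pairwise_comparisons(k):
    """Number of pairwise comparisons among k populations: C = k(k-1)/2."""
    return k * (k - 1) // 2


def experimentwise_error_rate(alpha, c):
    """alpha_E = 1 - (1 - alpha)^C for C comparisons, each made at level alpha."""
    return 1 - (1 - alpha) ** c


def bonferroni_alpha(alpha_e, c):
    """Per-comparison level that keeps the overall rate at roughly alpha_E."""
    return alpha_e / c


c = pairwise_comparisons(3)
alpha_e = experimentwise_error_rate(0.05, c)
alpha_per_comparison = bonferroni_alpha(0.05, c)
print(c, round(alpha_e, 4), round(alpha_per_comparison, 4))
```

With three comparisons at the 5% level, the chance of at least one Type I error rises to about 14%, which is the motivation for testing each pair at .05/3 instead.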

76
Bonferroni Adjustment
  • The procedure
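  • Compute the number of pairwise comparisons
    (C): $C = k(k-1)/2$, where k is the number of
    populations.
  • Set $\alpha = \alpha_E/C$, where $\alpha_E$ is the true
    probability of making at least one Type I error
    (called the experimentwise Type I error).
  • We can conclude that $\mu_i$ and $\mu_j$ differ (at the
    $\alpha_E/C$ significance level) if
    $|\bar{x}_i - \bar{x}_j| > t_{\alpha_E/(2C),\,n-k}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$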

77
Fisher and Bonferroni Methods
  • Example 15.1 - continued
  • Rank the effectiveness of the marketing
    strategies (based on mean weekly sales).
  • Use Fisher's method and the Bonferroni
    adjustment method.
  • Solution (Fisher's method)
  • The sample mean sales were 577.55, 653.0, 608.65.
  • Then,

78
Fisher and Bonferroni Methods
  • Solution (the Bonferroni adjustment)
  • We calculate $C = k(k-1)/2$ to be 3(2)/2 = 3.
  • We set $\alpha = .05/3 = .0167$, thus
    $t_{.0167/2,\,60-3} = 2.467$ (Excel).

Again, the significant difference is between $\mu_1$
and $\mu_2$.
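The two procedures can be compared side by side with a Python sketch. It uses the sample means, sample size, and SSE reported for Example 15.1, together with the usual critical-difference form t × sqrt(MSE(1/n_i + 1/n_j)). Two inputs deserve a caution: the Bonferroni t value of 2.467 is the Excel figure quoted on the slide, while the Fisher t value of about 2.002 is an assumed t-table value for 57 degrees of freedom that does not appear in the slides. The function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def significant_pairs(means, mse, n_per_sample, t_crit):
    """Return (critical difference, pairs whose sample means differ by more than it)."""
    critical = t_crit * sqrt(mse * (1 / n_per_sample + 1 / n_per_sample))
    pairs = [
        (i, j)
        for (i, xi), (j, xj) in combinations(sorted(means.items()), 2)
        if abs(xi - xj) > critical
    ]
    return critical, pairs


means = {1: 577.55, 2: 653.00, 3: 608.65}  # sample means from Example 15.1
mse = 506983.50 / 57                        # MSE = SSE / (n - k)

# Fisher LSD at alpha = .05; t value ASSUMED from a t table (not given in the slides).
lsd, fisher_pairs = significant_pairs(means, mse, 20, t_crit=2.002)
# Bonferroni at alpha = .05/3; t value as quoted on the slide.
bonf, bonferroni_pairs = significant_pairs(means, mse, 20, t_crit=2.467)
print(round(lsd, 2), fisher_pairs)
print(round(bonf, 2), bonferroni_pairs)
```

The Bonferroni threshold is wider than the LSD, yet under both only the difference between populations 1 and 2 (75.45) exceeds it.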
79
Tukey Multiple Comparisons
  • The test procedure
  • Find a critical number $\omega$ as follows:

$\omega = q_\alpha(k, \nu)\sqrt{\frac{MSE}{n_g}}$
k = the number of samples
$\nu$ = degrees of freedom = n - k
$n_g$ = number of observations per sample (recall, all the sample sizes are the same)
$\alpha$ = significance level
$q_\alpha(k, \nu)$ = a critical value obtained from the studentized range table
80
Tukey Multiple Comparisons
  • Select a pair of means. Calculate the difference
    between the larger and the smaller mean,
    $\bar{x}_{max} - \bar{x}_{min}$.
  • If $\bar{x}_{max} - \bar{x}_{min} > \omega$, there is
    sufficient evidence to conclude that
    $\mu_{max} > \mu_{min}$.
  • Repeat this procedure for each pair of samples.
    Rank the means if possible.

81
Tukey Multiple Comparisons
  • Example 15.1 - continued: We had three populations
    (three marketing strategies), k = 3.
  • Sample sizes were equal: $n_1 = n_2 = n_3 = 20$,
    $\nu = n - k = 60 - 3 = 57$, MSE = 8894.

Take $q_{.05}(3, 60)$ from the table.
  Population   Sales - City 1   Sales - City 2   Sales - City 3
  Mean         577.55           653              608.65

  City 1 vs. City 2: 653 - 577.55 = 75.45
  City 1 vs. City 3: 608.65 - 577.55 = 31.1
  City 2 vs. City 3: 653 - 608.65 = 44.35
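To see how the comparison plays out, here is a minimal Python sketch of Tukey's critical number for Example 15.1, using the sample means 577.55, 653.00 and 608.65 reported earlier for this example. The slides do not print the studentized range value, so q = 3.40 for k = 3 and 60 degrees of freedom is an assumption taken from a standard studentized range table; the function name is also illustrative.

```python
from itertools import combinations
from math import sqrt


def tukey_omega(q_crit, mse, n_per_sample):
    """Critical number omega = q * sqrt(MSE / n_g) for equal sample sizes."""
    return q_crit * sqrt(mse / n_per_sample)


q_crit = 3.40  # ASSUMED table value for q_.05(3, 60); not stated in the slides
omega = tukey_omega(q_crit, mse=8894, n_per_sample=20)

means = {"City 1": 577.55, "City 2": 653.00, "City 3": 608.65}
differing = [
    (i, j)
    for (i, xi), (j, xj) in combinations(means.items(), 2)
    if abs(xi - xj) > omega
]
print(round(omega, 2), differing)
```

Under that assumed q value the critical number is roughly 71.7, so only the City 1 vs. City 2 difference of 75.45 is large enough to indicate different population means.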
82
Excel Tukey and Fisher LSD method
Xm15-01
Fisher's LSD
$\alpha = .05$
Bonferroni adjustments
$\alpha = .05/3 = .0167$