Analysis of Variance - PowerPoint PPT Presentation

Slides: 40
Provided by: sba42
Transcript and Presenter's Notes

Title: Analysis of Variance


1
Analysis of Variance
  • Chapter 15

2
Introduction
  • Analysis of variance helps compare two or more
    populations of quantitative data.
  • Specifically, we are interested in the
    relationships among the population means (are
    they equal or not).
  • The procedure works by analyzing the sample
    variance.

3
15.1 One-Way Analysis of Variance
  • The analysis of variance is a procedure that
    tests whether differences exist
    among two or more population means.
  • To do this, the technique analyzes the sample
    variances.

4
One-Way Analysis of Variance
  • Example 1
  • An apple juice manufacturer is planning to
    develop a new product, a liquid concentrate.
  • The marketing manager has to decide how to market
    the new product.
  • Three strategies are considered:
  • Emphasize the convenience of using the product.
  • Emphasize the quality of the product.
  • Emphasize the product's low price.

5
One-Way Analysis of Variance
  • Example 1 - continued
  • An experiment was conducted as follows:
  • In three cities an advertising campaign was
    launched.
  • In each city only one of the three
    characteristics (convenience, quality, and price)
    was emphasized.
  • The weekly sales were recorded for twenty weeks
    following the beginning of the campaigns.

6
One-Way Analysis of Variance
  • See file Xm1.xls, which lists the weekly sales
    for each of the three cities.

7
One-Way Analysis of Variance
  • Solution
  • The data are quantitative.
  • Our problem objective is to compare sales in
    three cities.
  • We hypothesize on the relationships among the
    three mean weekly sales.

8
Defining the Hypotheses
  • Solution

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least two means differ.
To build the statistic needed to test the
hypotheses, use the following notation.
9
Notation
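Independent samples are drawn from k populations
(treatments).
Sample 1: $x_{11}, x_{21}, \ldots, x_{n_1 1}$
Sample 2: $x_{12}, x_{22}, \ldots, x_{n_2 2}$
Sample k: $x_{1k}, x_{2k}, \ldots, x_{n_k k}$
For each sample j we record the sample size $n_j$
and the sample mean $\bar{x}_j$.
X is the response variable. The variable's
values are called responses.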
10
Terminology
  • In the context of this problem:
  • Response variable: weekly sales.
  • Responses: actual sale values.
  • Experimental unit: weeks in the three cities
    when we record sales figures.
  • Factor: the criterion by which we classify the
    populations (the treatments). In this problem
    the factor is the marketing strategy.
  • Factor levels: the population (treatment)
    names. In this problem the factor levels are the
    marketing strategies.

11
The rationale of the test statistic
Two types of variability are employed when
testing for the equality of the population means.
12
Graphical demonstration: Employing two types of
variability
13
[Figure: observations for Treatment 1, Treatment 2, and Treatment 3, shown once with small and once with large within-sample variability.]
A small variability within the samples makes it
easier to draw a conclusion about the population
means.
The sample means are the same as before, but the
larger within-sample variability makes it harder
to draw a conclusion about the population means.
14
The rationale behind the test statistic I
  • If the null hypothesis is true, we would expect
    all the sample means to be close to one another
    (and, as a result, to the grand mean).
  • If the alternative hypothesis is true, at least
    some of the sample means would reside away from
    one another.
  • Thus, we measure variability among sample means.

15
Variability among sample means
  • The variability among the sample means is
    measured as the sum of squared distances between
    each mean and the grand mean.
  • This sum is called the Sum of Squares for
    Treatments (SST).

In our example treatments are represented by the
different advertising strategies.
16
Sum of squares for treatments (SST)
$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$
Here there are k treatments, $\bar{x}_j$ is the mean of sample j,
$n_j$ is the size of sample j, and $\bar{\bar{x}}$ is the grand mean.
Note: When the sample means are close to one
another, their distance from the grand mean is
small, leading to a small SST. Thus, a large SST
indicates large variation among sample means,
which supports H1.
17
Sum of squares for treatments (SST)
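  • Solution continued: Calculate SST

The grand mean is calculated by
$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} = 613.07$
$SST = 20(577.55 - 613.07)^2 + 20(653.00 - 613.07)^2 + 20(608.65 - 613.07)^2 = 57{,}512.23$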
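The SST calculation can be reproduced in a few lines of Python. This is a minimal sketch: the sample means and sizes are the ones quoted on the slide, while the variable names are illustrative assumptions.

```python
# Sample means and sizes quoted on the slide for the three samples.
sample_means = [577.55, 653.00, 608.65]
sample_sizes = [20, 20, 20]

# Grand mean: the size-weighted average of the sample means.
n_total = sum(sample_sizes)
grand_mean = sum(n * m for n, m in zip(sample_sizes, sample_means)) / n_total

# SST: size-weighted squared distances of each sample mean from the grand mean.
sst = sum(n * (m - grand_mean) ** 2 for n, m in zip(sample_sizes, sample_means))

print(f"grand mean = {grand_mean:.2f}")
print(f"SST = {sst:,.2f}")
```

Because all three samples have the same size, the grand mean here is simply the average of the three sample means.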
18
Sum of squares for treatments (SST)
  • Is SST = 57,512.23 large enough to favor
    H1? See next.

19
The rationale behind test statistic II
  • Large variability within the samples weakens the
    ability of the sample means to represent their
    corresponding population means.
  • Therefore, even though sample means may markedly
    differ from one another, SST must be judged
    relative to the within-sample variability.

20
Within samples variability
  • The variability within samples is measured by
    adding all the squared distances between
    observations and their sample means.
  • This sum is called the Sum of Squares for
    Error (SSE).

In our example this is the sum of all squared
differences between sales in city j and
the sample mean of city j (over all the three
cities).
21
Sum of squares for errors (SSE)
  • Solution continued: Calculate SSE

$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$
$= (20 - 1)10{,}774.44 + (20 - 1)7{,}238.61 + (20 - 1)8{,}669.47 = 506{,}967.88$
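As a quick check, SSE can be rebuilt from the sample variances. This sketch uses the variances and sample sizes quoted on the slide; the variable names are illustrative assumptions.

```python
# Sample variances and sizes quoted on the slide.
sample_variances = [10774.44, 7238.61, 8669.47]
sample_sizes = [20, 20, 20]

# Each sample contributes (n_j - 1) * s_j^2, the sum of squared deviations
# of its observations from its own sample mean.
sse = sum((n - 1) * s2 for n, s2 in zip(sample_sizes, sample_variances))

print(f"SSE = {sse:,.2f}")
```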
22
Sum of squares for errors (SSE)
  • Is SST = 57,512.23 small enough relative to SSE =
    506,983.50 to avoid rejecting H0 (all the means
    are equal)?

23
The mean sum of squares
To perform the test we need to calculate the mean
sums of squares as follows:
$MST = \frac{SST}{k - 1}$ and $MSE = \frac{SSE}{n - k}$
24
Calculation of the test statistic
$F = \frac{MST}{MSE}$
with the following degrees of freedom: $\nu_1 = k - 1$
and $\nu_2 = n - k$.
We assume:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
For honors class: Testing normality
For honors class: Testing equal variances
25
The F test rejection region
And finally, the hypothesis test:
Reject H0 if $F > F_{\alpha,\,k-1,\,n-k}$
26
The F test
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least two means differ.
Test statistic: $F = MST / MSE = 3.23$
Since 3.23 > 3.15, there is sufficient evidence
to reject H0 in favor of H1, and argue that at
least one of the mean sales is different from
the others.
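The test statistic can be recomputed from the sums of squares. The sketch below assumes the values quoted on the slides (SST = 57,512.23, SSE = 506,983.50, k = 3 treatments, n = 60 observations) and takes the critical value 3.15 as given there; the variable names are illustrative.

```python
# Sums of squares, number of treatments, and total sample size from the slides.
sst = 57_512.23
sse = 506_983.50
k = 3
n = 60

mst = sst / (k - 1)  # mean square for treatments, k - 1 degrees of freedom
mse = sse / (n - k)  # mean square for error, n - k degrees of freedom
f_stat = mst / mse

critical_value = 3.15  # critical value quoted on the slide, not computed here
reject_h0 = f_stat > critical_value

print(f"MST = {mst:,.2f}, MSE = {mse:,.2f}, F = {f_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```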
27
The F test p- value
  • Use Excel to find the p-value
  • FDIST(3.23,2,57) = .0467

p-value = P(F > 3.23) = .0467
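Without Excel, the p-value can be approximated in plain Python. This sketch relies on a closed form that holds only when the numerator degrees of freedom equal 2 (as here, since k - 1 = 2): P(F > x) = (1 + 2x/ν2)^(-ν2/2). The function name is an illustrative assumption, not part of any library.

```python
def f_survival_df1_2(x: float, df2: int) -> float:
    """Return P(F > x) for an F distribution with 2 and df2 degrees of freedom.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * x / df2) ** (-df2 / 2.0)


# F statistic and denominator degrees of freedom from the slides.
p_value = f_survival_df1_2(3.23, 57)
print(f"p-value = {p_value:.4f}")
```

The result should land close to the .0467 reported above; small differences can come from rounding the F statistic to 3.23. For other numerator degrees of freedom a statistics library would be needed.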
28
Excel single factor printout
See file (Xm1.xls)
SS(Total) = SST + SSE
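The partition of the total sum of squares into SST and SSE can be checked numerically. The sketch below uses small made-up samples (hypothetical values, not the Xm1.xls data) and illustrative variable names.

```python
from statistics import mean

# Hypothetical samples, one list per treatment (not the Xm1.xls data).
samples = [
    [4.0, 6.0, 5.0, 7.0],
    [8.0, 9.0, 10.0, 9.0],
    [6.0, 7.0, 5.0, 6.0],
]

all_values = [x for sample in samples for x in sample]
grand_mean = mean(all_values)

# Between-treatments variation: sample means around the grand mean.
sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
# Within-treatments variation: observations around their own sample mean.
sse = sum((x - mean(s)) ** 2 for s in samples for x in s)
# Total variation: observations around the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

print(f"SST = {sst:.3f}, SSE = {sse:.3f}, SS(Total) = {ss_total:.3f}")
```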
29
15.3 Randomized Blocks Design
  • The purpose of designing a randomized block
    experiment is to reduce the within-treatments
    variation, thus increasing the relative amount of
    among-treatments variation.
  • This helps in detecting differences among the
    treatment means more easily.

30
Randomized Blocks
Block all the observations with some commonality
across treatments.
[Figure: Treatments 1 to 4 laid out across Block 1, Block 2, and Block 3.]
31
Partitioning the total variability
  • The total sum of squares is partitioned into three
    sources of variation:
  • Treatments
  • Blocks
  • Within samples (Error)

Recall: for the independent samples design we have
SS(Total) = SST + SSE.
For the randomized block design,
SS(Total) = SST + SSB + SSE.
32
The mean sum of squares
  • To perform hypothesis tests for treatments and
    blocks we need:
  • Mean square for treatments: $MST = \frac{SST}{k - 1}$
  • Mean square for blocks: $MSB = \frac{SSB}{b - 1}$
  • Mean square for error: $MSE = \frac{SSE}{(k - 1)(b - 1)}$

33
The test statistic for the randomized block
design ANOVA
Test statistic for treatments: $F = MST / MSE$
Test statistic for blocks: $F = MSB / MSE$
34
The F test rejection region
  • Testing the mean responses for treatments:
    $F > F_{\alpha,\,k-1,\,(k-1)(b-1)}$
  • Testing the mean response for blocks:
    $F > F_{\alpha,\,b-1,\,(k-1)(b-1)}$
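The randomized block partition and its two F statistics can be sketched in plain Python. The data below are hypothetical (3 blocks by 4 treatments, invented for illustration), and the variable names are assumptions; critical values would still have to come from an F table or a statistics package.

```python
# Hypothetical responses: each row is a block, each column a treatment.
data = [
    [10.0, 12.0, 14.0, 11.0],
    [8.0, 11.0, 13.0, 9.0],
    [12.0, 15.0, 16.0, 13.0],
]
b = len(data)     # number of blocks
k = len(data[0])  # number of treatments

grand_mean = sum(sum(row) for row in data) / (b * k)
treatment_means = [sum(row[j] for row in data) / b for j in range(k)]
block_means = [sum(row) / k for row in data]

# Sums of squares for treatments, blocks, error, and total.
sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
ssb = k * sum((m - grand_mean) ** 2 for m in block_means)
sse = sum(
    (data[i][j] - treatment_means[j] - block_means[i] + grand_mean) ** 2
    for i in range(b)
    for j in range(k)
)
ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)

# Mean squares and the two F statistics.
mst = sst / (k - 1)
msb = ssb / (b - 1)
mse = sse / ((k - 1) * (b - 1))
f_treatments = mst / mse
f_blocks = msb / mse

print(f"SS(Total) = {ss_total:.3f}, SST + SSB + SSE = {sst + ssb + sse:.3f}")
print(f"F(treatments) = {f_treatments:.2f}, F(blocks) = {f_blocks:.2f}")
```

Removing the block variation from the error term is what makes the treatment F statistic larger than it would be in the independent samples design applied to the same numbers.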

35
Randomized Blocks ANOVA - Example
Additional example
  • Example 2
  • Are there differences in the effectiveness of
    cholesterol reduction drugs?
  • To answer this question the following experiment
    was organized
  • 25 groups of men with high cholesterol were
    matched by age and weight. Each group consisted
    of 4 men.
  • Each person in a group received a different drug.
  • The cholesterol level reduction in two months was
    recorded.
  • Can we infer from the data in Xm2.xls that there
    are differences in mean cholesterol reduction
    among the four drugs?

36
Randomized Blocks ANOVA - Example
  • Solution
  • Each drug can be considered a treatment.
  • Each group of 4 records can be treated as a
    block, because the men are matched by age and
    weight.
  • This procedure eliminates the variability in
    cholesterol reduction related to different
    combinations of age and weight.
  • This helps detect differences in the mean
    cholesterol reduction attributed to the different
    drugs.

37
Randomized Blocks ANOVA - Example
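[Excel printout for Xm2.xls, annotated as follows]
Treatments: degrees of freedom k-1, F = MSTR / MSE
Blocks: degrees of freedom b-1, F = MSBL / MSE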
38
15.6 Multiple Comparisons
  • The Fisher Least Significant Difference (LSD)
    method is one procedure designed to determine
    which mean differences are significant.
  • The hypotheses: $H_0: \mu_i - \mu_j = 0$,
    $H_a: \mu_i - \mu_j \neq 0$.
  • The statistic: $\bar{x}_i - \bar{x}_j$

39
15.6 Multiple Comparisons
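  • The rejection region: $|\bar{x}_i - \bar{x}_j| > LSD$, where
    $LSD = t_{\alpha/2,\,n-k}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$

Example continued: Calculating LSD
MSE = 8894.44; n1 = n2 = n3 = 20.
t.05/2,60-3 = tinv(.05,57) = 2.002
LSD = (2.002)[8894.44(1/20 + 1/20)]^.5 = 59.72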
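The LSD comparison can be carried through for every pair of samples. This is a sketch under stated assumptions: MSE, the sample size, and the critical value 2.002 are taken from the slide, the sample means are the ones used earlier in the SST calculation, and the labels 1 to 3 simply follow that order.

```python
import math
from itertools import combinations

mse = 8894.44       # mean square for error, from the slide
n_per_sample = 20   # n1 = n2 = n3 = 20
t_critical = 2.002  # t_{.025, 57} as quoted on the slide, not computed here

# Least significant difference for two samples of equal size.
lsd = t_critical * math.sqrt(mse * (1 / n_per_sample + 1 / n_per_sample))

# Sample means from the SST calculation, labeled in the order given there.
sample_means = {1: 577.55, 2: 653.00, 3: 608.65}

# A pair differs significantly when its absolute mean difference exceeds LSD.
significant = {
    (i, j): abs(sample_means[i] - sample_means[j]) > lsd
    for i, j in combinations(sample_means, 2)
}

print(f"LSD = {lsd:.2f}")
for (i, j), is_significant in significant.items():
    diff = abs(sample_means[i] - sample_means[j])
    verdict = "significant" if is_significant else "not significant"
    print(f"samples {i} and {j}: |difference| = {diff:.2f} ({verdict})")
```

With the rounded critical value the computed LSD can differ from 59.72 in the second decimal place. Only the difference between samples 1 and 2 (75.45) exceeds it.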