IE341: Introduction to Design of Experiments

Provided by: ie1Ka

Last term we talked about testing the
difference between two independent means. For
means from a normal population, the test
statistic is

We also covered the case where the two means
are not independent, and what we must do to
account for the fact that they are dependent.

And finally, we talked about the difference
between two variances, where we used the F ratio.
The F distribution is a ratio of two chi-square
variables. So if $\chi^2_1$ and $\chi^2_2$ possess
independent chi-square distributions with $v_1$ and
$v_2$ df, respectively, then
$$F = \frac{\chi^2_1 / v_1}{\chi^2_2 / v_2}$$
has the F distribution with $v_1$ and $v_2$ df.
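As a small illustration, the sketch below (Python, standard library only) forms the variance ratio $F = s_1^2 / s_2^2$ for two independent samples. If both samples come from normal populations with equal variances, each $v_i s_i^2 / \sigma^2$ is chi-square with $v_i = n_i - 1$ df, so this ratio is an F variable with $v_1$ and $v_2$ df. The two samples and the helper name are hypothetical, chosen only to keep the arithmetic easy.

```python
from statistics import variance


def variance_ratio(sample1, sample2):
    """Illustrative helper: return (F, v1, v2) with F = s1^2 / s2^2."""
    s1_sq = variance(sample1)  # unbiased sample variance (divisor n - 1)
    s2_sq = variance(sample2)
    return s1_sq / s2_sq, len(sample1) - 1, len(sample2) - 1


# Hypothetical data, for illustration only.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0]
f, v1, v2 = variance_ratio(a, b)
print(f, v1, v2)  # 0.25 4 4
```

The ratio would then be compared with a tabled F value for $v_1$ and $v_2$ df; the sketch stops short of a p-value so that it does not depend on a statistics library.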

All of this is valuable if we are testing
only two means. But what if we want to test to
see if there is a difference among three means,
or four, or ten?
What if we want to know whether fertilizer
A or fertilizer B or fertilizer C is best? In
this case, fertilizer is called a factor, which
is the condition under test.
A, B, C, the three types of fertilizer
under test, are called levels of the factor
fertilizer.
Or what if we want to know if treatment A
or treatment B or treatment C or treatment D is
best? In this case, treatment is called a
factor.
A,B,C,D, the four types of treatment under
test, are called levels of the factor treatment.
It should be noted that the factor may be
quantitative or qualitative.

Enter the analysis of variance!
ANOVA, as it is usually called, is a way to
test the differences between means in such
situations.
Previously, we tested single-factor
experiments with only two treatment levels.
These experiments are called single-factor
because there is only one factor under test.
Single-factor experiments are more commonly
called one-way experiments.
Now we move to single-factor experiments with
more than two treatment levels.

Let's start with some notation.
$Y_{ij}$ = ith observation in the jth level
$N$ = total number of experimental observations
$\bar{Y}_{\cdot\cdot}$ = the grand mean of all N experimental observations
$\bar{Y}_{\cdot j}$ = the mean of the observations in the jth level
$n_j$ = number of observations in the jth level; the $n_j$ are called replicates.
Replication of the design refers to using
more than one experimental unit for each level.

Designs are more powerful if they are
balanced, but balance is not always possible.
Suppose you are doing an experiment and the
equipment breaks down on one of the tests. Now,
not by design but by circumstance, you have
unequal numbers of replicates for the levels.
In all the formulas we use $n_j$, the number
of replicates in treatment j, rather than a
common n, so there is no problem.

Notation continued
$\tau_j$ = the effect of the jth level
L = number of treatment levels
$\varepsilon_{ij}$ = the error associated with the ith observation in the jth level,
assumed to be independent normally distributed random variables with mean 0 and
variance $\sigma^2$, which is constant for all levels of the factor.

For all experiments, randomization is
critical. So to draw any conclusions from the
experiment, we must require that the treatments
be applied in random order.
We must also assign the experimental units to
the treatments randomly.
If all this randomization occurs, the design
is called a completely randomized design.
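Randomizing the run order is easy to do in software. As a minimal sketch, assuming Python's standard `random` module is adequate for planning purposes, the code below lists every level-replicate combination once and shuffles it. The five levels echo the tensile strength example later in these notes; the helper name and the seed value 341 are arbitrary assumptions, the seed being used only so the plan can be reproduced.

```python
import random


def completely_randomized_order(levels, replicates, seed=None):
    """Illustrative helper: every (level, replicate) run once, in random order."""
    runs = [(level, rep) for level in levels for rep in range(1, replicates + 1)]
    rng = random.Random(seed)  # seeded generator so the plan is reproducible
    rng.shuffle(runs)
    return runs


plan = completely_randomized_order([15, 20, 25, 30, 35], replicates=5, seed=341)
for run_number, (level, rep) in enumerate(plan, start=1):
    print(f"run {run_number:2d}: level {level}, replicate {rep}")
```

Randomly assigning experimental units works the same way: shuffle the list of units and pair them off with the treatment runs.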

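ANOVA begins with a linear statistical model
$$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}, \qquad i = 1, \ldots, n_j, \quad j = 1, \ldots, L$$
where $\mu$ is the overall mean.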

This model is for a one-way or single-factor
ANOVA. The goal of the model is to test
hypotheses about the treatment effects and to
estimate them.
If the treatments have been selected by the
experimenter, the model is called a fixed-effects
model. In this case, the conclusions will apply
only to the treatments under consideration.

Another type of model is the random effects
model or components of variance model.
In this situation, the treatments used are a
random sample from a large population of
treatments. Here the $\tau_j$ are random variables and
we are interested in their variability, not in
the differences among the means being tested.

First, we will talk about fixed effects,
completely randomized, balanced models.
In the model we showed earlier, the $\tau_j$ are
defined as deviations from the grand mean, so
$$\sum_{j=1}^{L} \tau_j = 0.$$
It follows that the mean of the jth treatment
is
$$\mu_j = \mu + \tau_j.$$

Now the hypothesis under test is
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_L$
$H_a: \mu_j \neq \mu_k$ for at least one (j, k) pair
The test procedure is ANOVA, which is a
decomposition of the total sum of squares into
its component parts according to the model.

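The total SS is
$$SS_{total} = \sum_{j=1}^{L} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$$
and ANOVA is about dividing it into its
component parts:
$SS_{treatments}$ = variability of the differences among the L levels
$SS_{error}$ = pooled variability of the random error within levels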

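This is easy to see because
$$\sum_{j=1}^{L}\sum_{i=1}^{n_j}(Y_{ij}-\bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^{L}\sum_{i=1}^{n_j}\left[(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot}) + (Y_{ij}-\bar{Y}_{\cdot j})\right]^2$$
$$= \sum_{j=1}^{L} n_j(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{L}\sum_{i=1}^{n_j}(Y_{ij}-\bar{Y}_{\cdot j})^2 + 2\sum_{j=1}^{L}\sum_{i=1}^{n_j}(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})(Y_{ij}-\bar{Y}_{\cdot j})$$
But the cross-product term vanishes because
$$\sum_{i=1}^{n_j}(Y_{ij}-\bar{Y}_{\cdot j}) = 0 \quad \text{for every } j.$$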

So $SS_{total} = SS_{treatments} + SS_{error}$.
More commonly, this is written
$SS_{total} = SS_{between} + SS_{within}$
Each of these terms becomes an MS (mean
square) term when divided by the appropriate df.
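The decomposition can be checked numerically. The sketch below (Python, standard library only; the helper name is illustrative) computes $SS_{total}$, $SS_{treatments}$, $SS_{error}$, and the cross-product term for a small, deliberately unbalanced, hypothetical data set. The cross-product comes out to zero and the two components add back to the total.

```python
def decompose(groups):
    """Illustrative helper: SS_total, SS_treatments, SS_error, cross-product."""
    all_obs = [y for g in groups for y in g]
    grand = sum(all_obs) / len(all_obs)
    means = [sum(g) / len(g) for g in groups]
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    cross = 2 * sum((m - grand) * (y - m) for g, m in zip(groups, means) for y in g)
    return ss_total, ss_treat, ss_error, cross


# Hypothetical, deliberately unbalanced data (n_j = 3, 2, 4).
groups = [[4.0, 6.0, 5.0], [8.0, 10.0], [3.0, 5.0, 2.0, 2.0]]
ss_total, ss_treat, ss_error, cross = decompose(groups)
print(ss_total, ss_treat, ss_error, cross)  # 58.0 48.0 10.0 0.0
```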

The df for $SS_{error}$ is N - L because
$$\sum_{j=1}^{L}(n_j - 1) = N - L,$$
and the df for $SS_{between}$ is L - 1 because
there are L levels.

Now the expected values of each of these terms
are
$$E(MS_{error}) = \sigma^2$$
$$E(MS_{treatments}) = \sigma^2 + \frac{\sum_{j=1}^{L} n_j \tau_j^2}{L - 1}$$

Now if there are no differences among the
treatment means, then $\tau_j = 0$ for all j.
So we can test for differences with our old
friend F
$$F_0 = \frac{MS_{treatments}}{MS_{error}}$$
with L - 1 and N - L df.
Under $H_0$, both numerator and denominator are
estimates of $\sigma^2$, so $F_0$ should be close to 1
and will usually not be significant.
Under $H_a$, the result should be significant
because the numerator is estimating the treatment
effects as well as $\sigma^2$.

The results of an ANOVA are presented in an
ANOVA table. For this one-way, fixed-effects,
balanced model
Source SS df MS p
Model SSbetween L-1 MSbetween p
Error SSwithin N-L MSwithin
Total SStotal N-1

Let's look at a simple example.
A product engineer is investigating the
tensile strength of a synthetic fiber to make
men's shirts. He knows from prior experience
that the strength is affected by the weight
percent of cotton in the material. He also knows
that the percent should range between 10 and
40 percent so that the shirts can receive permanent
press treatment.

The engineer decides to test 5 levels
15, 20, 25, 30, 35 (weight percent cotton)
and to have 5 replicates in this design.
His data are

Weight % cotton   Observed tensile strength   Mean
15                 7   7  15  11   9            9.8
20                12  17  12  18  18           15.4
25                14  18  18  19  19           17.6
30                19  25  22  19  23           21.6
35                 7  10  11  15  11           10.8
Grand mean                                     15.04

In this tensile strength example, the
ANOVA table is

Source   SS       df   MS       p
Model    475.76    4   118.94   <0.01
Error    161.20   20     8.06
Total    636.96   24

In this case, we would reject $H_0$ and declare
that there is an effect of the cotton weight
percent.
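The sketch below recomputes this table from the raw tensile data using the one-way formulas above (Python, standard library only; the helper name and dictionary keys are illustrative). It also reports $F_0 = MS_{treatments}/MS_{error}$, about 14.76 here; the p-value itself would come from an F table with 4 and 20 df, which this sketch does not attempt.

```python
def one_way_anova(groups):
    """Illustrative helper: SS, df, MS for treatments and error, plus F0."""
    all_obs = [y for g in groups for y in g]
    n_total, n_levels = len(all_obs), len(groups)
    grand = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_treat, df_error = n_levels - 1, n_total - n_levels
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {
        "SS_treatments": ss_treat, "df_treatments": df_treat, "MS_treatments": ms_treat,
        "SS_error": ss_error, "df_error": df_error, "MS_error": ms_error,
        "SS_total": ss_treat + ss_error, "df_total": n_total - 1,
        "F0": ms_treat / ms_error,
    }


# Tensile strength data from the slides, one list per cotton weight percent.
tensile = [
    [7, 7, 15, 11, 9],     # 15%
    [12, 17, 12, 18, 18],  # 20%
    [14, 18, 18, 19, 19],  # 25%
    [19, 25, 22, 19, 23],  # 30%
    [7, 10, 11, 15, 11],   # 35%
]
table = one_way_anova(tensile)
for key, value in table.items():
    print(f"{key:>14}: {value:.2f}")
```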

We can estimate the treatment parameters by
subtracting the grand mean from the treatment
means. In this example,
$\hat{\tau}_1 = 9.80 - 15.04 = -5.24$
$\hat{\tau}_2 = 15.40 - 15.04 = 0.36$
$\hat{\tau}_3 = 17.60 - 15.04 = 2.56$
$\hat{\tau}_4 = 21.60 - 15.04 = 6.56$
$\hat{\tau}_5 = 10.80 - 15.04 = -4.24$
Clearly, treatment 4 is the best because it
provides the greatest tensile strength.
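The same estimates can be computed directly from the raw data. In this minimal sketch (Python, standard library only), each level mean minus the grand mean gives $\hat{\tau}_j$; in a balanced design the estimates sum to zero, which is a handy arithmetic check.

```python
# Tensile strength data from the slides, keyed by cotton weight percent.
tensile = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

all_obs = [y for ys in tensile.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)  # 376 / 25 = 15.04

# tau_hat_j = (mean of level j) - (grand mean)
effects = {level: sum(ys) / len(ys) - grand_mean for level, ys in tensile.items()}
for level, tau in effects.items():
    print(f"{level}%: {tau:+.2f}")
```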

Now you could have computed these values from
the raw data yourself instead of doing the ANOVA.
You would get the same results, but you wouldn't
know if treatment 4 was significantly better.
But if you did a scatter diagram of the
original data, you would see that treatment 4 was
best, with no analysis whatsoever.
In fact, you should always look at the
original data to see if the results do make
sense. A scatter diagram of the raw data usually
tells as much as any analysis can.


How do you test the adequacy of the model?
The model rests on certain assumptions that must
hold for the ANOVA to be useful, most
importantly that the errors are normally and
independently distributed.
The estimated error for each observation,
called the residual, is
$$e_{ij} = Y_{ij} - \bar{Y}_{\cdot j}$$
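As a minimal sketch (Python, standard library only; the helper name is illustrative), the residuals for the tensile data can be computed by subtracting each level mean from its observations. Two checks follow from the definitions: the residuals within each level sum to zero, and their sum of squares equals $SS_{error}$ (161.20 in this example).

```python
def residuals(groups):
    """Illustrative helper: e_ij = Y_ij - (mean of level j), grouped by level."""
    out = []
    for g in groups:
        mean = sum(g) / len(g)
        out.append([y - mean for y in g])
    return out


tensile = [
    [7, 7, 15, 11, 9],     # 15% cotton
    [12, 17, 12, 18, 18],  # 20%
    [14, 18, 18, 19, 19],  # 25%
    [19, 25, 22, 19, 23],  # 30%
    [7, 10, 11, 15, 11],   # 35%
]
res = residuals(tensile)
ss_error = sum(e ** 2 for level in res for e in level)
for level in res:
    print([round(e, 1) for e in level])
print(round(ss_error, 2))  # 161.2
```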

A residual check is very important for testing
for nonconstant variance. The residuals should
be structureless, that is, they should have no
pattern whatsoever, which, in this case, they do
not.

These residuals show no extreme differences in
variation because they all have about the same
spread.
They also do not show the presence of any
outlier. An outlier is a residual value that is
very much larger than any of the others. The
presence of an outlier can seriously jeopardize
the ANOVA, so if one is found, its cause should
be carefully investigated.

A histogram of residuals shows the
distribution is slightly skewed. Small
departures from symmetry are of less concern than
heavy tails.

Another check is for normality. If we do a
normal probability plot of the residuals, we can
see whether normality holds.

A normal probability plot is made with
ascending ordered residuals on the x-axis and
their cumulative probability points, 100(k - 0.5)/n,
on the y-axis, where k is the rank of the residual
and n is the number of residuals. There is no
evidence of an outlier here.
The previous slide is not exactly a normal
probability plot because the y-axis is not
scaled properly. But it does give a pretty good
suggestion of linearity.
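One way to get a properly scaled normal probability plot is to plot each ordered residual against the standard normal quantile of $(k - 0.5)/n$ rather than against the percentage itself. The sketch below does that with Python's `statistics.NormalDist` (available since Python 3.8), using the tensile residuals; the helper name is illustrative, and it only produces the plotting coordinates, leaving the drawing to whatever plotting tool is at hand.

```python
from statistics import NormalDist

tensile = [
    [7, 7, 15, 11, 9], [12, 17, 12, 18, 18], [14, 18, 18, 19, 19],
    [19, 25, 22, 19, 23], [7, 10, 11, 15, 11],
]
# Residuals: each observation minus its level mean.
resids = [y - sum(g) / len(g) for g in tensile for y in g]


def normal_plot_points(values):
    """Illustrative helper: (ordered residual, 100(k - 0.5)/n, normal quantile)."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for k, r in enumerate(sorted(values), start=1):
        pct = 100 * (k - 0.5) / n
        points.append((r, pct, std_normal.inv_cdf(pct / 100)))
    return points


for r, pct, z in normal_plot_points(resids):
    print(f"{r:6.2f} {pct:5.1f} {z:6.3f}")
```

If the residuals are roughly normal, the (residual, z) pairs fall close to a straight line.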

A plot of residuals vs run order is useful to
detect correlation between the residuals, a
violation of the independence assumption.
Runs of positive or of negative residuals
indicate correlation. None is observed here.

One of the goals of the analysis is to
estimate the level means. If the results of the
ANOVA show that the factor is significant, we
know that at least one of the means stands out
from the rest. But which one or ones?
The procedures for making these mean
comparisons are called multiple comparison
methods. These methods use linear combinations
called contrasts.

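A contrast is a particular linear combination
of level means, such as
$$C = \bar{Y}_{\cdot 4} - \bar{Y}_{\cdot 5}$$
to test the difference between level 4 and level
5.
Or if one wished to test the average of levels
1 and 3 vs levels 4 and 5, one would use
$$C = (\bar{Y}_{\cdot 1} + \bar{Y}_{\cdot 3}) - (\bar{Y}_{\cdot 4} + \bar{Y}_{\cdot 5}).$$
In general,
$$C = \sum_{j=1}^{L} c_j \bar{Y}_{\cdot j}, \quad \text{where } \sum_{j=1}^{L} c_j = 0.$$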

An important case of contrasts is called
orthogonal contrasts. Two contrasts in a balanced
design with coefficients $c_j$ and $d_j$ are orthogonal if
$$\sum_{j=1}^{L} c_j d_j = 0.$$

There are many ways to choose the orthogonal
contrast coefficients for a set of levels. For
example, if level 1 is a control and levels 2 and
3 are two real treatments, a logical choice is to
compare the average of the two treatments with
the control
$$C_1 = \bar{Y}_{\cdot 2} + \bar{Y}_{\cdot 3} - 2\bar{Y}_{\cdot 1}$$
and then the two treatments against one
another
$$C_2 = \bar{Y}_{\cdot 2} - \bar{Y}_{\cdot 3}$$
These two contrasts are orthogonal because
$$(-2)(0) + (1)(1) + (1)(-1) = 0.$$
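Checking orthogonality is just a dot product. The minimal sketch below (Python; the helper names are illustrative), which assumes a balanced design, confirms that both coefficient vectors of the control example sum to zero (so each is a contrast) and that their cross-products sum to zero.

```python
def is_contrast(c):
    """Illustrative helper: contrast coefficients must sum to zero."""
    return sum(c) == 0


def orthogonal(c, d):
    """Illustrative helper (balanced design): sum of c_j * d_j equals zero."""
    return sum(cj * dj for cj, dj in zip(c, d)) == 0


control_vs_treatments = [-2, 1, 1]  # level 1 is the control
treatment_2_vs_3 = [0, 1, -1]
print(is_contrast(control_vs_treatments), is_contrast(treatment_2_vs_3))  # True True
print(orthogonal(control_vs_treatments, treatment_2_vs_3))                # True
```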

Only L-1 orthogonal contrasts may be chosen
because the L levels have only L-1 df. So for
only three levels, the contrasts chosen exhaust
those available for this experiment.
Contrasts must be chosen before seeing the data
so that experimenters aren't tempted to contrast
the levels with the greatest differences.

For the tensile strength experiment with 5
levels and thus 4 df, the 4 contrasts (each
coefficient multiplies a level total, $5\bar{Y}_{\cdot j}$) are
$C_1 = 0(5)(9.8) + 0(5)(15.4) + 0(5)(17.6) - 1(5)(21.6) + 1(5)(10.8) = -54$
$C_2 = 1(5)(9.8) + 0(5)(15.4) + 1(5)(17.6) - 1(5)(21.6) - 1(5)(10.8) = -25$
$C_3 = 1(5)(9.8) + 0(5)(15.4) - 1(5)(17.6) + 0(5)(21.6) + 0(5)(10.8) = -39$
$C_4 = -1(5)(9.8) + 4(5)(15.4) - 1(5)(17.6) - 1(5)(21.6) - 1(5)(10.8) = 9$
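These 4 contrasts completely partition
$SS_{treatments}$. The SS for each contrast is
formed as
$$SS_C = \frac{\left(\sum_{j=1}^{L} c_j Y_{\cdot j}\right)^2}{n \sum_{j=1}^{L} c_j^2}$$
where $Y_{\cdot j} = n\bar{Y}_{\cdot j}$ is the jth level total.

So for the 4 contrasts we have
$SS_{C_1} = (-54)^2 / (5 \cdot 2) = 291.60$
$SS_{C_2} = (-25)^2 / (5 \cdot 4) = 31.25$
$SS_{C_3} = (-39)^2 / (5 \cdot 2) = 152.10$
$SS_{C_4} = (9)^2 / (5 \cdot 20) = 0.81$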
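These calculations are easy to reproduce. The sketch below (Python, standard library only) uses the level totals $Y_{\cdot j}$ of the tensile data, checks that the four coefficient vectors are mutually orthogonal, and computes each contrast SS as $(\sum c_j Y_{\cdot j})^2 / (n \sum c_j^2)$. The four SS should add up to $SS_{treatments} = 475.76$.

```python
from itertools import combinations

tensile = [[7, 7, 15, 11, 9], [12, 17, 12, 18, 18], [14, 18, 18, 19, 19],
           [19, 25, 22, 19, 23], [7, 10, 11, 15, 11]]
n = 5                               # replicates per level (balanced)
totals = [sum(g) for g in tensile]  # level totals: 49, 77, 88, 108, 54

contrasts = {
    "C1": [0, 0, 0, -1, 1],
    "C2": [1, 0, 1, -1, -1],
    "C3": [1, 0, -1, 0, 0],
    "C4": [-1, 4, -1, -1, -1],
}
# Every pair must be orthogonal for the contrast SS to partition SS_treatments.
assert all(sum(a * b for a, b in zip(c, d)) == 0
           for c, d in combinations(contrasts.values(), 2))

ss = {}
for name, c in contrasts.items():
    value = sum(cj * yj for cj, yj in zip(c, totals))
    ss[name] = value ** 2 / (n * sum(cj ** 2 for cj in c))
    print(name, value, round(ss[name], 2))
print("sum", round(sum(ss.values()), 2))  # 475.76
```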

Now the revised ANOVA table is

Source   SS       df   MS       p
Weight   475.76    4   118.94   <0.001
  C1     291.60    1   291.60   <0.001
  C2      31.25    1    31.25   ≈0.06
  C3     152.10    1   152.10   <0.001
  C4       0.81    1     0.81   ≈0.76
Error    161.20   20     8.06
Total    636.96   24

So contrast 1 (level 5 - level 4) and contrast
3 (level 1 - level 3) are significant.
Although the orthogonal contrast approach is
widely used, the experimenter may not know in
advance which levels to test or they may be
interested in more than L-1 comparisons. A
number of other methods are available for such
testing.

These methods include
Scheffé's Method
Least Significant Difference Method
Duncan's Multiple Range Test
Newman-Keuls test
There is some disagreement about which is the
best method, but all of them are best applied
only after the overall F test is significant.

Now let's look at the random effects model.
Suppose there is a factor of interest with an
extremely large number of levels. If the
experimenter selects L of these levels at random,
we have a random effects model or a components of
variance model.

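The linear statistical model is
$$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$$
as before, except that both $\tau_j$ and $\varepsilon_{ij}$
are random variables instead of simply $\varepsilon_{ij}$.
Because $\tau_j$ and $\varepsilon_{ij}$ are independent, the variance
of any observation is
$$V(Y_{ij}) = \sigma_\tau^2 + \sigma^2$$
These two variances are called variance
components, hence the name of the model.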

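The requirements of this model are that the
$\varepsilon_{ij}$ are NID(0, $\sigma^2$), as before, that the
$\tau_j$ are NID(0, $\sigma_\tau^2$), and that $\tau_j$ and $\varepsilon_{ij}$ are
independent. The normality assumption is not
required to estimate the variance components by
the ANOVA method, although it is needed for the
F test.
As before,
$SS_{total} = SS_{treatments} + SS_{error}$
and $E(MS_{error}) = \sigma^2$.
But now $E(MS_{treatments}) = \sigma^2 + n\sigma_\tau^2$.
So the estimate of $\sigma_\tau^2$ is
$$\hat{\sigma}_\tau^2 = \frac{MS_{treatments} - MS_{error}}{n}$$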

The computations and the ANOVA table are the
same as before, but the conclusions are quite
different.
Let's look at an example.
A textile company uses a large number of
looms. The process engineer suspects that the
looms are of different strength, and selects 4
looms at random to investigate this.

The results of the experiment are shown in the
table below.

Loom   Observations      Mean
1      98  97  99  96    97.5
2      91  90  93  92    91.5
3      96  95  97  95    95.75
4      95  96  99  98    97.0
Grand mean               95.44

The ANOVA table is

Source   SS       df   MS      p
Looms     89.19    3   29.73   <0.001
Error     22.75   12    1.90
Total    111.94   15

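In this case, the estimates of the variances
are
$\hat{\sigma}^2 = MS_{error} = 1.90$
$\hat{\sigma}_\tau^2 = (29.73 - 1.90)/4 = 6.96$
Thus most of the variability in the
observations (6.96 of an estimated total of
6.96 + 1.90 = 8.86, about 79%) is due to
variability in loom strength. If you can isolate
the causes of this variability and eliminate them,
you can reduce the variability of the output and
increase its quality.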
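The variance component estimates can be reproduced from the raw loom data with $\hat{\sigma}_\tau^2 = (MS_{treatments} - MS_{error})/n$. The sketch below is a minimal standard-library version. Note that this ANOVA estimator can come out negative when $MS_{treatments} < MS_{error}$, a case that does not arise here.

```python
looms = [
    [98, 97, 99, 96],
    [91, 90, 93, 92],
    [96, 95, 97, 95],
    [95, 96, 99, 98],
]
n = len(looms[0])  # observations per loom
L = len(looms)     # looms sampled at random
N = n * L
grand = sum(map(sum, looms)) / N
means = [sum(g) / n for g in looms]

ss_treat = n * sum((m - grand) ** 2 for m in means)
ss_error = sum((y - m) ** 2 for g, m in zip(looms, means) for y in g)
ms_treat = ss_treat / (L - 1)
ms_error = ss_error / (N - L)

sigma2_hat = ms_error                       # estimate of sigma^2
sigma2_tau_hat = (ms_treat - ms_error) / n  # ANOVA estimate of sigma_tau^2
share = sigma2_tau_hat / (sigma2_tau_hat + sigma2_hat)
print(ss_treat, ss_error)                               # 89.1875 22.75
print(round(sigma2_hat, 3), round(sigma2_tau_hat, 3))   # 1.896 6.958
print(f"share of variance due to looms: {share:.0%}")   # 79%
```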

When we studied the differences between two
treatment means, we considered repeated measures
on the same individual experimental unit.
With three or more treatments, we can still do
this. The result is a repeated measures design.

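Consider a repeated measures ANOVA partitioning
$SS_{total}$, with n subjects (i = 1, ..., n), each
measured under all L treatments (j = 1, ..., L):
$$\sum_{i=1}^{n}\sum_{j=1}^{L}(Y_{ij}-\bar{Y}_{\cdot\cdot})^2 = L\sum_{i=1}^{n}(\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot})^2 + \sum_{i=1}^{n}\sum_{j=1}^{L}(Y_{ij}-\bar{Y}_{i\cdot})^2$$
This is the same as
$SS_{total} = SS_{between\ subjects} + SS_{within\ subjects}$
The within-subjects SS may be further
partitioned into $SS_{treatments} + SS_{error}$:
$$\sum_{i=1}^{n}\sum_{j=1}^{L}(Y_{ij}-\bar{Y}_{i\cdot})^2 = n\sum_{j=1}^{L}(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})^2 + \sum_{i=1}^{n}\sum_{j=1}^{L}(Y_{ij}-\bar{Y}_{i\cdot}-\bar{Y}_{\cdot j}+\bar{Y}_{\cdot\cdot})^2$$

In this case, the first term on the RHS represents the
differences between treatment effects and the
second term on the RHS is the random error.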

Now the ANOVA table looks like this.

Source              SS                     df
Between subjects    SSbetween subjects     n-1
Within subjects     SSwithin subjects      n(L-1)
  Treatments        SStreatments           L-1
  Error             SSerror                (L-1)(n-1)
Total               SStotal                Ln-1

The test for treatment effects is the usual
$$F_0 = \frac{MS_{treatments}}{MS_{error}}$$
but now it is done entirely within subjects.
This design is really a randomized complete
block design with subjects considered to be the
blocks.

Now what is a randomized complete blocks
design?
Blocking is a way to eliminate the effect of a
nuisance factor on the comparisons of interest.
Blocking can be used only if the nuisance factor
is known and controllable.

Let's use an illustration. Suppose we want to
test the effect of four different tips on the
readings from a hardness testing machine.
The tip is pressed into a metal test coupon,
and from the depth of the depression, the
hardness of the coupon can be measured.

The only factor is tip type and it has four
levels. If 4 replications are desired for each
tip, a completely randomized design would seem to
be appropriate.
This would require assigning each of the 4 x 4
= 16 runs randomly to 16 different coupons.
The only problem is that the coupons need to
be all of the same hardness, and if they are not,
then the differences in coupon hardness will
contribute to the variability observed.
Blocking is the way to deal with this problem.

In the block design, only 4 coupons are used
and each tip is tested on each of the 4 coupons.
So the blocking factor is the coupon, with 4
levels.
In this setup, the block forms a homogeneous
unit on which to test the tips.
This strategy improves the accuracy of the tip
comparison by eliminating variability due to
coupons.

Because all 4 tips are tested on each coupon,
the design is a complete block design. The data
from this design are shown below.

Tip type   Coupon 1   Coupon 2   Coupon 3   Coupon 4
1          9.3        9.4        9.6        10.0
2          9.4        9.3        9.8         9.9
3          9.2        9.4        9.5         9.7
4          9.7        9.6        10.0       10.2

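Now we analyze these data the same way we did
for the repeated measures design. The model is
$$Y_{jk} = \mu + \tau_j + \beta_k + \varepsilon_{jk}, \qquad j = 1, \ldots, L, \quad k = 1, \ldots, B$$
where $\beta_k$ is the effect of the kth block and the
rest of the terms are those we already know.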

Since the block effects are deviations from the
grand mean,
$$\sum_{k=1}^{B} \beta_k = 0$$
just as
$$\sum_{j=1}^{L} \tau_j = 0$$

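We can express the total SS as
$$\sum_{j=1}^{L}\sum_{k=1}^{B}(Y_{jk}-\bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^{L}\sum_{k=1}^{B}\left[(\bar{Y}_{j\cdot}-\bar{Y}_{\cdot\cdot}) + (\bar{Y}_{\cdot k}-\bar{Y}_{\cdot\cdot}) + (Y_{jk}-\bar{Y}_{j\cdot}-\bar{Y}_{\cdot k}+\bar{Y}_{\cdot\cdot})\right]^2$$
which is equivalent to
$SS_{total} = SS_{treatments} + SS_{blocks} + SS_{error}$
with df
$N - 1 = (L - 1) + (B - 1) + (L - 1)(B - 1)$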

The test for equality of treatment means
is
$$F_0 = \frac{MS_{treatments}}{MS_{error}}$$
and the ANOVA table is

Source       SS             df           MS             p
Treatments   SStreatments   L-1          MStreatments
Blocks       SSblocks       B-1          MSblocks
Error        SSerror        (L-1)(B-1)   MSerror
Total        SStotal        N-1

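For the hardness experiment, the ANOVA table is
shown below. It was computed on the coded data
10 x (reading - 9.5); for the raw readings each SS
and MS is 1/100 of the value shown, and F is unchanged.

Source     SS       df   MS      p
Tip type    38.50    3   12.83   0.0009
Coupons     82.50    3   27.50
Error        8.00    9    0.89
Total      129.00   15

As is obvious, this is the same analysis as the
repeated measures design.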
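A minimal sketch of the randomized complete block computations follows (Python, standard library only; the helper name and dictionary keys are illustrative). Run on the coded values 10 x (reading - 9.5), it reproduces the SS in the table; on the raw readings every SS is 100 times smaller, while $F_0$ is unchanged because the coding cancels in the ratio.

```python
def rcbd_anova(y):
    """Illustrative helper: y[j][k] = treatment j in block k, one observation per cell."""
    L, B = len(y), len(y[0])
    N = L * B
    grand = sum(map(sum, y)) / N
    treat_means = [sum(row) / B for row in y]
    block_means = [sum(y[j][k] for j in range(L)) / L for k in range(B)]
    ss_total = sum((y[j][k] - grand) ** 2 for j in range(L) for k in range(B))
    ss_treat = B * sum((m - grand) ** 2 for m in treat_means)
    ss_block = L * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block  # whatever is left over
    df_treat, df_error = L - 1, (L - 1) * (B - 1)
    f0 = (ss_treat / df_treat) / (ss_error / df_error)
    return {"SS_treatments": ss_treat, "SS_blocks": ss_block,
            "SS_error": ss_error, "SS_total": ss_total, "F0": f0}


# Rows are tip types 1-4, columns are test coupons 1-4 (the blocks).
hardness = [
    [9.3, 9.4, 9.6, 10.0],
    [9.4, 9.3, 9.8, 9.9],
    [9.2, 9.4, 9.5, 9.7],
    [9.7, 9.6, 10.0, 10.2],
]
coded = [[10 * (v - 9.5) for v in row] for row in hardness]

for label, data in (("coded", coded), ("raw", hardness)):
    result = rcbd_anova(data)
    print(label, {key: round(value, 4) for key, value in result.items()})
```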

Now let's consider the Latin Square design.
We'll introduce it with an example.
The object is to study the effect of 5 different
formulations of a rocket propellant on the burning
rate of aircraft escape systems. Each formulation
is mixed from a batch of raw material large enough
for only 5 formulations. Moreover, the formulations
are prepared by 5 different operators, who differ
in skill and experience.

The way to test in this situation is with a
5x5 Latin Square, which allows for double
blocking and therefore the removal of two
nuisance factors. The Latin Square for this
example is

                 Operators
Batch        1   2   3   4   5
1            A   B   C   D   E
2            B   C   D   E   A
3            C   D   E   A   B
4            D   E   A   B   C
5            E   A   B   C   D

Note that each row and each column has all 5
letters, and each letter occurs exactly once in
each row and column.
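The statistical model for a Latin Square is
$$Y_{jkl} = \mu + \tau_j + \alpha_k + \beta_l + \varepsilon_{jkl}$$
where $Y_{jkl}$ is the jth treatment observation in
the kth row and the lth column, $\alpha_k$ is the kth
row effect, and $\beta_l$ is the lth column effect.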
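A cyclic construction gives a valid Latin Square of any size: with rows i and columns j counted from 0, put treatment number $(i + j) \bmod p$ in row i, column j. The minimal sketch below (Python; helper names are illustrative) builds the 5 x 5 case, which reproduces the propellant square above, and checks the defining property that every letter appears exactly once in each row and each column.

```python
import string


def cyclic_latin_square(p):
    """Illustrative helper: row i, column j (from 0) gets letter (i + j) mod p."""
    letters = string.ascii_uppercase[:p]
    return [[letters[(i + j) % p] for j in range(p)] for i in range(p)]


def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    p = len(square)
    symbols = {s for row in square for s in row}
    rows_ok = all(len(row) == p and len(set(row)) == p for row in square)
    cols_ok = all(len({row[j] for row in square}) == p for j in range(p))
    return len(symbols) == p and rows_ok and cols_ok


square = cyclic_latin_square(5)  # rows = batches, columns = operators
for row in square:
    print(" ".join(row))
print(is_latin_square(square))  # True
```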

Again we have
$SS_{total} = SS_{rows} + SS_{columns} + SS_{treatments} + SS_{error}$
with df (here R = C = L)
$N - 1 = (R - 1) + (C - 1) + (L - 1) + (R - 2)(C - 1)$
The ANOVA table for the propellant data is

Source             SS       df   MS      p
Formulations       330.00    4   82.50   0.0025
Material batches    68.00    4   17.00
Operators          150.00    4   37.50   0.04
Error              128.00   12   10.67
Total              676.00   24

So both the formulations and the operators
were significantly different. The batches of raw
material were not, but it still is a good idea to
block on them because they often are different.
This design was not replicated, and Latin
Squares often are not, but it is possible to put
n replicates in each cell.

Now if you superimposed one Latin Square on
another Latin Square of the same size, you would
get a Graeco-Latin Square. In one Latin Square,
the treatments are designated by roman letters.
In the other Latin Square, the treatments are
designated by Greek letters.
Hence the name Graeco-Latin Square.

A 5 x 5 Graeco-Latin Square is

                 Operators
Batch        1    2    3    4    5
1            Aα   Bγ   Cε   Dβ   Eδ
2            Bβ   Cδ   Dα   Eγ   Aε
3            Cγ   Dε   Eβ   Aδ   Bα
4            Dδ   Eα   Aγ   Bε   Cβ
5            Eε   Aβ   Bδ   Cα   Dγ

Note that the five Greek treatments appear
exactly once in each row and column, just as the
Latin treatments did, and each Latin letter
appears exactly once with each Greek letter.
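The square above can be generated cyclically: with rows i and columns j counted from 0, the Latin letter is number $(i + j) \bmod 5$ and the Greek letter is number $(i + 2j) \bmod 5$, in the order α, β, γ, δ, ε. The sketch below (Python; helper names are illustrative) builds it and checks that both letter sets form Latin Squares and that every Latin-Greek pair occurs exactly once, which is what makes the two squares orthogonal. This construction is one convenient choice, not necessarily how the slide's square was produced.

```python
LATIN = "ABCDE"
GREEK = "αβγδε"


def graeco_latin_5():
    """Illustrative helper: Latin letter (i + j) mod 5, Greek letter (i + 2j) mod 5."""
    return [[LATIN[(i + j) % 5] + GREEK[(i + 2 * j) % 5] for j in range(5)]
            for i in range(5)]


def is_latin(grid):
    """Each of p symbols appears once per row and once per column."""
    p = len(grid)
    symbols = {s for row in grid for s in row}
    return (len(symbols) == p
            and all(len(set(row)) == p for row in grid)
            and all(len({grid[i][j] for i in range(p)}) == p for j in range(p)))


def is_graeco_latin(square):
    latin_part = [[cell[0] for cell in row] for row in square]
    greek_part = [[cell[1] for cell in row] for row in square]
    pairs = {cell for row in square for cell in row}
    # Orthogonality: all p * p Latin-Greek pairs are distinct.
    return is_latin(latin_part) and is_latin(greek_part) and len(pairs) == len(square) ** 2


square = graeco_latin_5()
for row in square:
    print(" ".join(row))
print(is_graeco_latin(square))  # True
```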

If Test Assemblies had been added as an
additional factor to the original propellant
experiment, the ANOVA table for the propellant data
would be

Source             SS       df   MS      p
Formulations       330.00    4   82.50   0.0033
Material batches    68.00    4   17.00
Operators          150.00    4   37.50   0.0329
Test assemblies     62.00    4   15.50
Error               66.00    8    8.25
Total              676.00   24

The test assemblies turned out to be
nonsignificant.

Note that the ANOVA tables for the Latin Square
and the Graeco-Latin Square designs are
identical, except for the error term.
The SS(error) of 128.00 (12 df) for the Latin
Square design was decomposed into Test Assemblies
(62.00, 4 df) and error (66.00, 8 df) in the
Graeco-Latin Square. This is a good example of how
the error term is really a residual: whatever
isn't controlled falls into error.

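Before we leave one-way designs, we should look
at the regression approach to ANOVA. The model
is
$$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$$
Using the method of least squares, we rewrite
this as
$$Q = \sum_{j=1}^{L}\sum_{i=1}^{n_j}\varepsilon_{ij}^2 = \sum_{j=1}^{L}\sum_{i=1}^{n_j}(Y_{ij} - \mu - \tau_j)^2$$

Now to find the LS estimates of $\mu$ and $\tau_j$, we minimize Q.
When we do this differentiation with respect to
$\mu$ and $\tau_j$, and equate to 0, we obtain
$$\frac{\partial Q}{\partial \mu} = -2\sum_{j=1}^{L}\sum_{i=1}^{n_j}(Y_{ij} - \hat{\mu} - \hat{\tau}_j) = 0$$
$$\frac{\partial Q}{\partial \tau_j} = -2\sum_{i=1}^{n_j}(Y_{ij} - \hat{\mu} - \hat{\tau}_j) = 0 \quad \text{for all } j$$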

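After simplification, these reduce to
$$N\hat{\mu} + n_1\hat{\tau}_1 + n_2\hat{\tau}_2 + \cdots + n_L\hat{\tau}_L = Y_{\cdot\cdot}$$
$$n_j\hat{\mu} + n_j\hat{\tau}_j = Y_{\cdot j}, \qquad j = 1, \ldots, L$$
In these equations,
$$Y_{\cdot\cdot} = \sum_{j=1}^{L}\sum_{i=1}^{n_j} Y_{ij}, \qquad Y_{\cdot j} = \sum_{i=1}^{n_j} Y_{ij}$$

These L + 1 equations are called the least
squares normal equations.
If we add the constraint
$$\sum_{j=1}^{L} n_j\hat{\tau}_j = 0$$
we get a unique solution to these normal
equations:
$$\hat{\mu} = \bar{Y}_{\cdot\cdot}, \qquad \hat{\tau}_j = \bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}$$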
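A quick numerical check, as a minimal sketch on hypothetical unbalanced data: taking $\hat{\mu}$ as the grand mean and $\hat{\tau}_j$ as each level mean minus the grand mean, every normal equation and the constraint $\sum n_j \hat{\tau}_j = 0$ hold exactly.

```python
# Hypothetical unbalanced data: three levels with n_j = 3, 2, 4.
groups = [[4.0, 6.0, 5.0], [8.0, 10.0], [3.0, 5.0, 2.0, 2.0]]

N = sum(len(g) for g in groups)
y_dotdot = sum(sum(g) for g in groups)   # Y.. (grand total)
mu_hat = y_dotdot / N                     # grand mean
tau_hat = [sum(g) / len(g) - mu_hat for g in groups]

# Normal equation for mu:  N*mu + sum_j n_j*tau_j - Y.. should be 0
eq_mu = N * mu_hat + sum(len(g) * t for g, t in zip(groups, tau_hat)) - y_dotdot
# Normal equations for tau_j:  n_j*mu + n_j*tau_j - Y.j should be 0
eq_tau = [len(g) * mu_hat + len(g) * t - sum(g) for g, t in zip(groups, tau_hat)]
# Constraint: sum_j n_j*tau_j should be 0
constraint = sum(len(g) * t for g, t in zip(groups, tau_hat))
print(mu_hat, tau_hat)           # 5.0 [0.0, 4.0, -2.0]
print(eq_mu, eq_tau, constraint)
```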

It is important to see that ANOVA designs are
simply regression models. If we have a one-way
design with 3 levels, the regression model is
$$Y_{ij} = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_{ij}$$
where $X_{i1} = 1$ if from level 1, 0 otherwise,
and $X_{i2} = 1$ if from level 2, 0 otherwise.
Although the treatment levels may be
qualitative, they are represented by dummy
variables.

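For observations from level 1, $X_{i1} = 1$ and $X_{i2} = 0$, so
$$Y_{i1} = \beta_0 + \beta_1 + \varepsilon_{i1}$$
and therefore $\beta_0 + \beta_1 = \mu_1$.
Similarly, if the observations are from level
2,
$$Y_{i2} = \beta_0 + \beta_2 + \varepsilon_{i2}$$
so $\beta_0 + \beta_2 = \mu_2$.

Finally, consider observations from level 3,
for which $X_{i1} = X_{i2} = 0$. Then the regression
model becomes
$$Y_{i3} = \beta_0 + \varepsilon_{i3}$$
so $\beta_0 = \mu_3$.
Thus in the regression model formulation of
the one-way ANOVA, the regression coefficients
describe comparisons of the first two level means
with the third.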

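So
$$\beta_0 = \mu_3, \qquad \beta_1 = \mu_1 - \mu_3, \qquad \beta_2 = \mu_2 - \mu_3$$
Thus, testing $\beta_1 = \beta_2 = 0$ provides a test of
the equality of the three means.
In general, for L levels, the regression model
will have L - 1 variables
$$Y_{ij} = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{L-1} X_{i,L-1} + \varepsilon_{ij}$$
and
$$\beta_0 = \mu_L, \qquad \beta_j = \mu_j - \mu_L, \quad j = 1, \ldots, L-1$$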
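To see the dummy-variable coding at work, the sketch below fits the three-level regression model by ordinary least squares, solving the normal equations $(X^\top X)b = X^\top y$ with a small hand-written Gaussian elimination (Python, standard library only; helper names are illustrative). The data are the first three cotton levels of the tensile example (means 9.8, 15.4, and 17.6), so the fitted coefficients should be $\hat{\beta}_0 = 17.6$, $\hat{\beta}_1 = 9.8 - 17.6 = -7.8$, and $\hat{\beta}_2 = 15.4 - 17.6 = -2.2$.

```python
def solve(a, b):
    """Illustrative helper: solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Levels 1-3 of the tensile data (15%, 20%, 25% cotton).
levels = [[7, 7, 15, 11, 9], [12, 17, 12, 18, 18], [14, 18, 18, 19, 19]]
X, y = [], []
for j, obs in enumerate(levels, start=1):
    for value in obs:
        # Row of the design matrix: [1, X_i1, X_i2]
        X.append([1.0, 1.0 if j == 1 else 0.0, 1.0 if j == 2 else 0.0])
        y.append(float(value))

xtx = [[sum(row[p] * row[q] for row in X) for q in range(3)] for p in range(3)]
xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(3)]
b0, b1, b2 = solve(xtx, xty)
print(round(b0, 4), round(b1, 4), round(b2, 4))  # 17.6 -7.8 -2.2
```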

Now what if you have two factors under test?
Or three?
Here the answer is the factorial design. A
factorial design crosses all factors. Let's take
a two-way design. If there are L levels of
factor A and M levels of factor B, then all LM
treatment combinations appear in the experiment.
Most commonly, L = M = 2.

In a two-way design, with two levels of each
factor, we have

Factor A          Factor B          Response
-1 (low level)    -1 (low level)    20
 1 (high level)   -1 (low level)    50
-1 (low level)     1 (high level)   40
 1 (high level)    1 (high level)   12

We can have as many replicates as we want in
this design. With n replicates, there are n
observations in each cell of the design.

$SS_{total} = SS_A + SS_B + SS_{AB} + SS_{error}$
This decomposition should be familiar by now
except for $SS_{AB}$. What is this term? Its
official name is interaction.
This is the magic of factorial designs. We
find out about not only the effect of factor A
and the effect of factor B, but the effect of the
two factors in combination.

Now let's look at the main effects of the
factors graphically.

Now let's look at the interaction effect. This
is the effect of factors A and B in combination,
and is often the most important effect.

Interaction of factors is the key to the East,
as we say in the West.
Suppose you wanted the factor levels that give
the lowest possible response. If you picked by
main effects, you would pick A low and B high.
But look at the interaction plot and it will
tell you to pick A high and B high.
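The arithmetic behind this argument fits in a few lines. The sketch below (Python, standard library only; names are illustrative) uses the single-replicate 2 x 2 table above and computes each effect as the average response where the sign is +1 minus the average where it is -1, with the AB sign given by the product of the A and B codes. The main effects point to A low and B high (response 40), while the cell that actually minimizes the response is A high and B high (response 12).

```python
# (A level, B level) -> response, coded -1 = low, +1 = high
runs = {(-1, -1): 20, (1, -1): 50, (-1, 1): 40, (1, 1): 12}


def effect(sign):
    """Average response where sign(a, b) == +1 minus average where it is -1."""
    plus = [y for (a, b), y in runs.items() if sign(a, b) == 1]
    minus = [y for (a, b), y in runs.items() if sign(a, b) == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


main_a = effect(lambda a, b: a)              # 31 - 30
main_b = effect(lambda a, b: b)              # 26 - 35
interaction_ab = effect(lambda a, b: a * b)  # 16 - 45

# To minimize the response, move each factor against the sign of its main effect.
pick_a = -1 if main_a > 0 else 1
pick_b = -1 if main_b > 0 else 1
best_cell = min(runs, key=runs.get)
print(main_a, main_b, interaction_ab)  # 1.0 -9.0 -29.0
print("main effects suggest", (pick_a, pick_b), "->", runs[(pick_a, pick_b)])
print("actual minimum at", best_cell, "->", runs[best_cell])
```

The large interaction effect (-29, against main effects of 1 and -9) is what overturns the choice suggested by the main effects.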

This is why, if the interaction term is
significant, you never interpret main effects.
They are meaningless in the presence of
interaction.
And it is because factorial designs provide
interactions that they are so popular and so
successful.