Title: IE341: Introduction to Design of Experiments
1- IE341: Introduction to Design of Experiments
2- Last term we talked about testing the difference between two independent means. For means from a normal population, we used a test statistic based on the difference between the two sample means.
3- We also covered the case where the two means
are not independent, and what we must do to
account for the fact that they are dependent.
4- And finally, we talked about the difference between two variances, where we used the F ratio. The F distribution is the ratio of two independent chi-square variables, each divided by its degrees of freedom. So if $s_1^2$ and $s_2^2$ possess independent chi-square distributions with $v_1$ and $v_2$ df, respectively, then
- $F = \dfrac{s_1^2 / v_1}{s_2^2 / v_2}$
- has the F distribution with $v_1$ and $v_2$ df.
5- All of this is valuable if we are testing only two means. But what if we want to test to see if there is a difference among three means, or four, or ten?
- What if we want to know whether fertilizer A or fertilizer B or fertilizer C is best? In this case, fertilizer is called a factor, which is the condition under test.
- A, B, C, the three types of fertilizer under test, are called levels of the factor fertilizer.
- Or what if we want to know if treatment A or treatment B or treatment C or treatment D is best? In this case, treatment is called a factor.
- A, B, C, D, the four types of treatment under test, are called levels of the factor treatment.
- It should be noted that the factor may be quantitative or qualitative.
6- Enter the analysis of variance!
- ANOVA, as it is usually called, is a way to test the differences between means in such situations.
- Previously, we tested single-factor experiments with only two treatment levels. These experiments are called single-factor because there is only one factor under test. Single-factor experiments are more commonly called one-way experiments.
- Now we move to single-factor experiments with more than two treatment levels.
7- Let's start with some notation.
- $Y_{ij}$ = ith observation in the jth level
- $N$ = total number of experimental observations
- $\bar{Y}$ = the grand mean of all N experimental observations
- $\bar{Y}_j$ = the mean of the observations in the jth level
- $n_j$ = number of observations in the jth level; the $n_j$ are called replicates.
- Replication of the design refers to using more than one experimental unit for each level.
8- Designs are more powerful if they are balanced, that is, if every level has the same number of replicates, but balance is not always possible.
- Suppose you are doing an experiment and the equipment breaks down on one of the tests. Now, not by design but by circumstance, you have unequal numbers of replicates for the levels.
- In all the formulas, we use $n_j$ as the number of replicates in treatment j, not n, so there is no problem.
9- Notation continued
- $\tau_j$ = the effect of the jth level
- $L$ = number of treatment levels
- $\varepsilon_{ij}$ = the error associated with the ith observation in the jth level, assumed to be independent, normally distributed random variables with mean 0 and variance $\sigma^2$, which is constant for all levels of the factor.
10- For all experiments, randomization is critical. So to draw any conclusions from the experiment, we must require that the treatments be applied in random order.
- We must also assign the experimental units to the treatments randomly.
- If all this randomization occurs, the design is called a completely randomized design.
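11- ANOVA begins with a linear statistical model
- $Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
- where $\mu$ is the grand mean, $\tau_j$ is the effect of the jth level, and $\varepsilon_{ij}$ is the random error.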
12- This model is for a one-way or single-factor ANOVA. The goal of the model is to test hypotheses about the treatment effects and to estimate them.
- If the treatments have been selected by the experimenter, the model is called a fixed-effects model. In this case, the conclusions will apply only to the treatments under consideration.
13- Another type of model is the random effects model or components of variance model.
- In this situation, the treatments used are a random sample from a large population of treatments. Here the $\tau_j$ are random variables and we are interested in their variability, not in the differences among the means being tested.
14- First, we will talk about fixed-effects, completely randomized, balanced models.
- In the model we showed earlier, the $\tau_j$ are defined as deviations from the grand mean, so
- $\sum_{j=1}^{L} \tau_j = 0$
- It follows that the mean of the jth treatment is
- $\mu_j = \mu + \tau_j$
15- Now the hypothesis under test is
- $H_0$: $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_L$
- $H_a$: $\mu_j \neq \mu_k$ for at least one j,k pair
- The test procedure is ANOVA, which is a decomposition of the total sum of squares into its component parts according to the model.
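16- The total SS is
- $SS_{total} = \sum_{j=1}^{L} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2$
- and ANOVA is about dividing it into its component parts.
- $SS_{treatments} = \sum_{j=1}^{L} n_j (\bar{Y}_j - \bar{Y})^2$ = variability of the differences among the L levels
- $SS_{error} = \sum_{j=1}^{L} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2$ = pooled variability of the random error within levels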
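17- This is easy to see because
- $\sum_j \sum_i (Y_{ij} - \bar{Y})^2 = \sum_j \sum_i [(\bar{Y}_j - \bar{Y}) + (Y_{ij} - \bar{Y}_j)]^2$
- $= \sum_j n_j (\bar{Y}_j - \bar{Y})^2 + \sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2 + 2 \sum_j \sum_i (\bar{Y}_j - \bar{Y})(Y_{ij} - \bar{Y}_j)$
- But the cross-product term vanishes because
- $\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j) = 0$ for every level j.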
18- So $SS_{total} = SS_{treatments} + SS_{error}$
- Most of the time, this is called
- $SS_{total} = SS_{between} + SS_{within}$
- Each of these terms becomes an MS (mean square) term when divided by the appropriate df.
19- The df for $SS_{error}$ is N - L because
- $\sum_{j=1}^{L} (n_j - 1) = N - L$
- and the df for $SS_{between}$ is L - 1 because there are L levels.
20- Now the expected values of each of these terms are
- $E(MS_{error}) = \sigma^2$
- $E(MS_{treatments}) = \sigma^2 + \dfrac{\sum_{j=1}^{L} n_j \tau_j^2}{L - 1}$
21- Now if there are no differences among the treatment means, then $\tau_j = 0$ for all j.
- So we can test for differences with our old friend F
- $F = \dfrac{MS_{treatments}}{MS_{error}}$
- with L - 1 and N - L df.
- Under $H_0$, both numerator and denominator are estimates of $\sigma^2$, so F should be close to 1 and the result will usually not be significant.
- Under $H_a$, the result should be significant because the numerator is estimating the treatment effects as well as $\sigma^2$.
22- The results of an ANOVA are presented in an ANOVA table. For this one-way, fixed-effects, balanced model
Source   SS          df    MS          p
Model    SSbetween   L-1   MSbetween   p
Error    SSwithin    N-L   MSwithin
Total    SStotal     N-1
23- Let's look at a simple example.
- A product engineer is investigating the tensile strength of a synthetic fiber used to make men's shirts. He knows from prior experience that the strength is affected by the weight percent of cotton in the material. He also knows that the percent should range between 10 and 40 so that the shirts can receive permanent press treatment.
24- The engineer decides to test 5 levels
- 15, 20, 25, 30, 35 percent cotton
- and to have 5 replicates in this design.
- His data are
Cotton %     Obs 1   Obs 2   Obs 3   Obs 4   Obs 5   Mean
15           7       7       15      11      9       9.8
20           12      17      12      18      18      15.4
25           14      18      18      19      19      17.6
30           19      25      22      19      23      21.6
35           7       10      11      15      11      10.8
Grand mean                                           15.04
25- In this tensile strength example, the ANOVA table is
Source   SS       df   MS       p
Model    475.76   4    118.94   <0.01
Error    161.20   20   8.06
Total    636.96   24
- In this case, we would reject Ho and declare that there is an effect of the cotton weight percent.
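The sums of squares in this table can be reproduced directly from the raw data. The following is a minimal sketch in plain Python; the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the course material, and the p-value is left out because it would need an F-distribution routine from a statistics library.

```python
# Tensile strength data from the example: cotton weight percent -> 5 replicates.
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def one_way_anova(groups):
    """Return SS, df, MS and F for a one-way fixed-effects ANOVA.

    `groups` maps each level to its list of observations (hypothetical layout).
    """
    all_obs = [y for ys in groups.values() for y in ys]
    N, L = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / N
    means = {lvl: sum(ys) / len(ys) for lvl, ys in groups.items()}

    # Between-level SS uses n_j, so unbalanced data are handled as well.
    ss_between = sum(len(ys) * (means[lvl] - grand_mean) ** 2
                     for lvl, ys in groups.items())
    ss_within = sum((y - means[lvl]) ** 2
                    for lvl, ys in groups.items() for y in ys)

    ms_between = ss_between / (L - 1)
    ms_within = ss_within / (N - L)
    return {
        "ss_between": ss_between, "df_between": L - 1, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": N - L, "ms_within": ms_within,
        "ss_total": ss_between + ss_within, "F": ms_between / ms_within,
    }


result = one_way_anova(data)
for name, value in result.items():
    print(f"{name:11s} {value:10.2f}")
```

The F ratio is then compared with the F distribution on 4 and 20 df to obtain the p-value reported in the table.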
26- We can estimate the treatment parameters by subtracting the grand mean from the treatment means. In this example,
- $\hat{\tau}_1$ = 9.80 - 15.04 = -5.24
- $\hat{\tau}_2$ = 15.40 - 15.04 = 0.36
- $\hat{\tau}_3$ = 17.60 - 15.04 = 2.56
- $\hat{\tau}_4$ = 21.60 - 15.04 = 6.56
- $\hat{\tau}_5$ = 10.80 - 15.04 = -4.24
- Clearly, treatment 4 is the best because it provides the greatest tensile strength.
27- Now you could have computed these values from the raw data yourself instead of doing the ANOVA. You would get the same results, but you wouldn't know if treatment 4 was significantly better.
- But if you did a scatter diagram of the original data, you would see that treatment 4 was best, with no analysis whatsoever.
- In fact, you should always look at the original data to see if the results make sense. A scatter diagram of the raw data usually tells as much as any analysis can.
29- How do you test the adequacy of the model?
- The model makes certain assumptions that must hold for the ANOVA to be useful. Most importantly, the errors must be distributed normally and independently, with constant variance.
- The error for each observation, sometimes called the residual, is
- $e_{ij} = Y_{ij} - \bar{Y}_j$
30- A residual check is very important for testing for nonconstant variance. The residuals should be structureless, that is, they should have no pattern whatsoever, and in this case they show none.
31- These residuals show no extreme differences in variation because they all have about the same spread.
- They also do not show the presence of any outlier. An outlier is a residual value that is very much larger in magnitude than any of the others. The presence of an outlier can seriously jeopardize the ANOVA, so if one is found, its cause should be carefully investigated.
32- A histogram of residuals shows the
distribution is slightly skewed. Small
departures from symmetry are of less concern than
heavy tails.
33- Another check is for normality. If we do a
normal probability plot of the residuals, we can
see whether normality holds.
34- A normal probability plot is made with ascending ordered residuals on the x-axis and their cumulative probability points, 100(k - 0.5)/n, on the y-axis, where k is the rank of the residual and n is the number of residuals. There is no evidence of an outlier here.
- The previous slide is not exactly a normal probability plot because the y-axis is not scaled properly. But it does give a pretty good suggestion of linearity.
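The residuals and their plotting positions can be computed in a few lines. This is a rough sketch under the assumption that the residual is the observation minus its level mean; the variable names are illustrative, and drawing the plot itself (with a properly scaled normal axis) is left to whatever plotting tool is at hand.

```python
# Tensile strength data: cotton weight percent -> 5 replicates.
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

# Residual = observation minus the mean of its own level.
residuals = []
for level, ys in data.items():
    level_mean = sum(ys) / len(ys)
    residuals.extend(y - level_mean for y in ys)

# Ascending ordered residuals go on the x-axis ...
ordered = sorted(residuals)
n = len(ordered)

# ... and the cumulative probability points 100(k - 0.5)/n on the y-axis,
# where k = 1, ..., n is the rank of the residual.
positions = [100 * (k - 0.5) / n for k in range(1, n + 1)]

for e, p in zip(ordered, positions):
    print(f"{e:6.1f}  {p:5.1f}")
```

If the points fall close to a straight line on normal probability paper, the normality assumption is reasonable; a point far off the line at either end would suggest an outlier.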
35- A plot of residuals vs run order is useful to detect correlation between the residuals, a violation of the independence assumption.
- Runs of positive or of negative residuals indicate correlation. None is observed here.
36- One of the goals of the analysis is to estimate the level means. If the results of the ANOVA show that the factor is significant, we know that at least one of the means stands out from the rest. But which one or ones?
- The procedures for making these mean comparisons are called multiple comparison methods. These methods use linear combinations called contrasts.
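37- A contrast is a particular linear combination of level means, such as
- $C = \mu_4 - \mu_5$
- to test the difference between level 4 and level 5.
- Or if one wished to test the average of levels 1 and 3 vs the average of levels 4 and 5, one would use
- $C = \mu_1 + \mu_3 - \mu_4 - \mu_5$
- In general, $C = \sum_{j=1}^{L} c_j \mu_j$, where $\sum_{j=1}^{L} c_j = 0$.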
38- An important case of contrasts is called orthogonal contrasts. Two contrasts in a balanced design with coefficients $c_j$ and $d_j$ are orthogonal if
- $\sum_{j=1}^{L} c_j d_j = 0$
39- There are many ways to choose the orthogonal contrast coefficients for a set of levels. For example, if level 1 is a control and levels 2 and 3 are two real treatments, a logical choice is to compare the average of the two treatments with the control
- $C_1 = -2\mu_1 + \mu_2 + \mu_3$
- and then the two treatments against one another
- $C_2 = \mu_2 - \mu_3$
- These two contrasts are orthogonal because
- $(-2)(0) + (1)(1) + (1)(-1) = 0$
40- Only L-1 orthogonal contrasts may be chosen because the L levels have only L-1 df. So for only three levels, the contrasts chosen exhaust those available for this experiment.
- Contrasts must be chosen before seeing the data so that experimenters aren't tempted to contrast the levels with the greatest differences.
41- For the tensile strength experiment with 5 levels and thus 4 df, the 4 contrasts, computed from the level totals (5 replicates times each level mean), are
- C1 = 0(5)(9.8) + 0(5)(15.4) + 0(5)(17.6) - 1(5)(21.6) + 1(5)(10.8) = -54
- C2 = 1(5)(9.8) + 0(5)(15.4) + 1(5)(17.6) - 1(5)(21.6) - 1(5)(10.8) = -25
- C3 = 1(5)(9.8) + 0(5)(15.4) - 1(5)(17.6) + 0(5)(21.6) + 0(5)(10.8) = -39
- C4 = -1(5)(9.8) + 4(5)(15.4) - 1(5)(17.6) - 1(5)(21.6) - 1(5)(10.8) = 9
- These 4 contrasts completely partition the SStreatments. Then the SS for each contrast is formed
- $SS_C = \dfrac{C^2}{n \sum_{j} c_j^2}$, where n = 5 is the number of replicates per level.
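42- So for the 4 contrasts we have
- $SS_{C1} = (-54)^2 / [5(2)] = 291.60$
- $SS_{C2} = (-25)^2 / [5(4)] = 31.25$
- $SS_{C3} = (-39)^2 / [5(2)] = 152.10$
- $SS_{C4} = (9)^2 / [5(20)] = 0.81$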
43- Now the revised ANOVA table is
Source   SS       df   MS       p
Weight   475.76   4    118.94   <0.001
  C1     291.60   1    291.60   <0.001
  C2     31.25    1    31.25    <0.06
  C3     152.10   1    152.10   <0.001
  C4     0.81     1    0.81     <0.76
Error    161.20   20   8.06
Total    636.96   24
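The contrast values and their sums of squares can be checked numerically. The sketch below assumes the usual balanced-design formula, SS = C^2 / (n * sum of squared coefficients), with C computed from level totals; the helper names are made up for illustration.

```python
from itertools import combinations

# Level means and replicates per level from the tensile strength example.
means = [9.8, 15.4, 17.6, 21.6, 10.8]
n = 5

# Contrast coefficients as used on the slides.
contrasts = {
    "C1": [0, 0, 0, -1, 1],
    "C2": [1, 0, 1, -1, -1],
    "C3": [1, 0, -1, 0, 0],
    "C4": [-1, 4, -1, -1, -1],
}


def contrast_value(coeffs):
    """Contrast computed on level totals (n times each level mean)."""
    return sum(c * n * m for c, m in zip(coeffs, means))


def contrast_ss(coeffs):
    """Sum of squares carried by one contrast (1 df) in a balanced design."""
    return contrast_value(coeffs) ** 2 / (n * sum(c * c for c in coeffs))


def are_orthogonal(c, d):
    """Two contrasts are orthogonal if their coefficient products sum to 0."""
    return sum(cj * dj for cj, dj in zip(c, d)) == 0


for name, coeffs in contrasts.items():
    print(name, round(contrast_value(coeffs), 2), round(contrast_ss(coeffs), 2))

all_orthogonal = all(are_orthogonal(contrasts[a], contrasts[b])
                     for a, b in combinations(contrasts, 2))
total_ss = sum(contrast_ss(c) for c in contrasts.values())
print("mutually orthogonal:", all_orthogonal)
print("sum of contrast SS:", round(total_ss, 2))
```

Because the four contrasts are mutually orthogonal, their sums of squares add up to the treatment sum of squares, which is what "completely partition" means here.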
44- So contrast 1 (level 5 vs level 4) and contrast 3 (level 1 vs level 3) are significant.
- Although the orthogonal contrast approach is widely used, the experimenter may not know in advance which levels to test, or may be interested in more than L-1 comparisons. A number of other methods are available for such testing.
45- These methods include
- Scheffé's method
- Least Significant Difference method
- Duncan's Multiple Range Test
- Newman-Keuls test
- There is some disagreement about which is the best method, but it is best if all are applied only after there is significance in the overall F test.
46- Now let's look at the random effects model.
- Suppose there is a factor of interest with an extremely large number of levels. If the experimenter selects L of these levels at random, we have a random effects model or a components of variance model.
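47- The linear statistical model is
- $Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
- as before, except that both $\tau_j$ and $\varepsilon_{ij}$ are random variables instead of simply $\varepsilon_{ij}$.
- Because $\tau_j$ and $\varepsilon_{ij}$ are independent, the variance of any observation is
- $Var(Y_{ij}) = \sigma_\tau^2 + \sigma^2$
- These two variances are called variance components, hence the name of the model.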
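48- The requirements of this model are that the $\varepsilon_{ij}$ are NID(0, $\sigma^2$), as before, and that the $\tau_j$ are NID(0, $\sigma_\tau^2$) and that $\tau_j$ and $\varepsilon_{ij}$ are independent. The normality assumption is needed for the F test, but not for the point estimates of the variance components.
- As before,
- $SS_{total} = SS_{treatments} + SS_{error}$
- And $E(MS_{error}) = \sigma^2$.
- But now $E(MS_{treatments}) = \sigma^2 + n\sigma_\tau^2$
- So the estimate of $\sigma_\tau^2$ is
- $\hat{\sigma}_\tau^2 = \dfrac{MS_{treatments} - MS_{error}}{n}$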
49- The computations and the ANOVA table are the same as before, but the conclusions are quite different.
- Let's look at an example.
- A textile company uses a large number of looms. The process engineer suspects that the looms are of different strength, and selects 4 looms at random to investigate this.
50- The results of the experiment are shown in the table below.
Loom         Obs 1   Obs 2   Obs 3   Obs 4   Mean
1            98      97      99      96      97.5
2            91      90      93      92      91.5
3            96      95      97      95      95.75
4            95      96      99      98      97.0
Grand mean                                   95.44
- The ANOVA table is
Source   SS       df   MS      p
Looms    89.19    3    29.73   <0.001
Error    22.75    12   1.90
Total    111.94   15
51- In this case, the estimates of the variances are
- $\hat{\sigma}^2$ = 1.90
- $\hat{\sigma}_\tau^2$ = (29.73 - 1.90)/4 = 6.96
- Thus most of the variability in the observations is due to variability in loom strength. If you can isolate the causes of this variability and eliminate them, you can reduce the variability of the output and increase its quality.
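The variance component estimates follow from the two mean squares. Here is a small sketch that recomputes them from the loom data, assuming the method-of-moments estimator (MS for looms minus MS for error, divided by the n = 4 observations per loom); the variable names are illustrative.

```python
# Loom data from the example: loom -> 4 strength observations.
looms = {
    1: [98, 97, 99, 96],
    2: [91, 90, 93, 92],
    3: [96, 95, 97, 95],
    4: [95, 96, 99, 98],
}

n = 4                      # observations per loom (balanced design)
L = len(looms)             # number of looms sampled
N = n * L

grand_mean = sum(sum(ys) for ys in looms.values()) / N
loom_means = {k: sum(ys) / n for k, ys in looms.items()}

ss_looms = n * sum((m - grand_mean) ** 2 for m in loom_means.values())
ss_error = sum((y - loom_means[k]) ** 2 for k, ys in looms.items() for y in ys)

ms_looms = ss_looms / (L - 1)
ms_error = ss_error / (N - L)

# Variance components: within-loom variance and between-loom variance.
sigma2_hat = ms_error
sigma2_tau_hat = (ms_looms - ms_error) / n

# Share of the total variance of an observation attributable to looms.
loom_share = sigma2_tau_hat / (sigma2_tau_hat + sigma2_hat)

print(f"sigma^2 estimate      = {sigma2_hat:.2f}")
print(f"sigma_tau^2 estimate  = {sigma2_tau_hat:.2f}")
print(f"share due to looms    = {loom_share:.0%}")
```

With these data, roughly three quarters of the estimated variance of an observation comes from differences between looms, which supports the conclusion drawn above.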
52- When we studied the differences between two treatment means, we considered repeated measures on the same individual experimental unit.
- With three or more treatments, we can still do this. The result is a repeated measures design.
53- Consider how a repeated measures ANOVA partitions SStotal:
- SStotal = SSbetween subjects + SSwithin subjects
- The within-subjects SS may be further partitioned into SStreatment + SSerror.
54- In this case, the first term on the RHS is the differences between treatment effects and the second term on the RHS is the random error.
55- Now the ANOVA table looks like this, with n subjects and L treatments.
Source             SS   df           MS   p
Between subjects        n-1
Within subjects         n(L-1)
  Treatments            L-1
  Error                 (L-1)(n-1)
Total                   Ln-1
56- The test for treatment effects is the usual
- $F = \dfrac{MS_{treatments}}{MS_{error}}$
- but now it is done entirely within subjects.
- This design is really a randomized complete block design with subjects considered to be the blocks.
57- Now what is a randomized complete block design?
- Blocking is a way to eliminate the effect of a nuisance factor on the comparisons of interest. Blocking can be used only if the nuisance factor is known and controllable.
58- Let's use an illustration. Suppose we want to test the effect of four different tips on the readings from a hardness testing machine.
- The tip is pressed into a metal test coupon, and from the depth of the depression, the hardness of the coupon can be measured.
59- The only factor is tip type and it has four levels. If 4 replications are desired for each tip, a completely randomized design would seem to be appropriate.
- This would require assigning each of the 4 x 4 = 16 runs randomly to 16 different coupons.
- The only problem is that the coupons need to be all of the same hardness, and if they are not, then the differences in coupon hardness will contribute to the variability observed.
- Blocking is the way to deal with this problem.
60- In the block design, only 4 coupons are used and each tip is tested on each of the 4 coupons. So the blocking factor is the coupon, with 4 levels.
- In this setup, the block forms a homogeneous unit on which to test the tips.
- This strategy improves the accuracy of the tip comparison by eliminating variability due to coupons.
61- Because all 4 tips are tested on each coupon, the design is a complete block design. The data from this design are shown below.
            Test coupon
Tip type    1      2      3      4
1           9.3    9.4    9.6    10.0
2           9.4    9.3    9.8    9.9
3           9.2    9.4    9.5    9.7
4           9.7    9.6    10.0   10.2
62- Now we analyze these data the same way we did for the repeated measures design. The model is
- $Y_{jk} = \mu + \tau_j + \beta_k + \varepsilon_{jk}$
- where $\beta_k$ is the effect of the kth block and the rest of the terms are those we already know.
63- Since the block effects are deviations from the grand mean,
- $\sum_{k=1}^{B} \beta_k = 0$
- just as
- $\sum_{j=1}^{L} \tau_j = 0$
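64- We can express the total SS as
- $\sum_j \sum_k (Y_{jk} - \bar{Y})^2 = B \sum_j (\bar{Y}_{j\cdot} - \bar{Y})^2 + L \sum_k (\bar{Y}_{\cdot k} - \bar{Y})^2 + \sum_j \sum_k (Y_{jk} - \bar{Y}_{j\cdot} - \bar{Y}_{\cdot k} + \bar{Y})^2$
- where $\bar{Y}_{j\cdot}$ is the mean of treatment j and $\bar{Y}_{\cdot k}$ is the mean of block k,
- which is equivalent to
- SStotal = SStreatments + SSblocks + SSerror
- with df
- N - 1 = (L - 1) + (B - 1) + (L - 1)(B - 1)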
65- The test for equality of treatment means is
- $F = \dfrac{MS_{treatments}}{MS_{error}}$ with L - 1 and (L - 1)(B - 1) df
- and the ANOVA table is
Source       SS             df           MS             p
Treatments   SStreatments   L-1          MStreatments
Blocks       SSblocks       B-1          MSblocks
Error        SSerror        (L-1)(B-1)   MSerror
Total        SStotal        N-1
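66- For the hardness experiment, the ANOVA table is
Source     SS       df   MS      p
Tip type   38.50    3    12.83   0.0009
Coupons    82.50    3    27.50
Error      8.00     9    0.89
Total      129.00   15
- The SS and MS values in this table are on a coded scale: they are 100 times the values obtained from the raw readings in the data table (0.385, 0.825, 0.08 and 1.29). The F ratio and the p-value are unaffected by this scaling.
- As is obvious, this is the same analysis as the repeated measures design.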
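The block design sums of squares can be recomputed from the hardness readings. This is a sketch only: the function name `rcbd_anova` is hypothetical, and the factor of 100 used to match the table is an inference from the arithmetic (the raw-unit sums of squares come out exactly 100 times smaller than the tabled values).

```python
# Hardness readings: rows are tip types 1-4, columns are test coupons 1-4.
hardness = [
    [9.3, 9.4, 9.6, 10.0],
    [9.4, 9.3, 9.8, 9.9],
    [9.2, 9.4, 9.5, 9.7],
    [9.7, 9.6, 10.0, 10.2],
]


def rcbd_anova(table):
    """ANOVA for a randomized complete block design with one obs per cell."""
    L = len(table)        # number of treatments (rows)
    B = len(table[0])     # number of blocks (columns)
    grand = sum(sum(row) for row in table) / (L * B)
    trt_means = [sum(row) / B for row in table]
    blk_means = [sum(table[j][k] for j in range(L)) / L for k in range(B)]

    ss_trt = B * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = L * sum((m - grand) ** 2 for m in blk_means)
    ss_tot = sum((y - grand) ** 2 for row in table for y in row)
    ss_err = ss_tot - ss_trt - ss_blk   # what neither tips nor coupons explain

    ms_trt = ss_trt / (L - 1)
    ms_err = ss_err / ((L - 1) * (B - 1))
    return {
        "ss_treatments": ss_trt, "ss_blocks": ss_blk,
        "ss_error": ss_err, "ss_total": ss_tot,
        "F": ms_trt / ms_err,
    }


anova = rcbd_anova(hardness)

# Rescale the raw-unit sums of squares by 100 to compare with the table.
coded = {k: 100 * v for k, v in anova.items() if k.startswith("ss_")}
for name, value in coded.items():
    print(f"{name:14s} {value:8.2f}")
print(f"F for tips     {anova['F']:8.2f}")
```

Dropping the coupon term from this calculation would push the coupon sum of squares into error, which is exactly the loss of sensitivity that blocking is meant to prevent.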
67- Now let's consider the Latin Square design. We'll introduce it with an example.
- The object of study is the effect of 5 different formulations of a rocket propellant on the burning rate of aircraft escape systems. Each formulation comes from a batch of raw material large enough for only 5 formulations. Moreover, the formulations are prepared by 5 different operators, who differ in skill and experience.
68- The way to test in this situation is with a 5x5 Latin Square, which allows for double blocking and therefore the removal of two nuisance factors. The Latin Square for this example is
                          Operators
Batch of raw material     1   2   3   4   5
1                         A   B   C   D   E
2                         B   C   D   E   A
3                         C   D   E   A   B
4                         D   E   A   B   C
5                         E   A   B   C   D
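69- Note that each row and each column has all 5 letters, and each letter occurs exactly once in each row and column.
- The statistical model for a Latin Square is
- $Y_{jkl} = \mu + \tau_j + \rho_k + \gamma_l + \varepsilon_{jkl}$
- where $Y_{jkl}$ is the jth treatment observation in the kth row and the lth column, and $\rho_k$ and $\gamma_l$ denote the row and column effects.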
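The defining property of a Latin Square is easy to check mechanically. The following sketch tests the 5x5 square from the propellant example; the function name `is_latin_square` is an illustrative choice.

```python
# The 5x5 Latin Square from the example: rows are batches of raw material,
# columns are operators, letters are propellant formulations.
square = ["ABCDE", "BCDEA", "CDEAB", "DEABC", "EABCD"]


def is_latin_square(rows):
    """True if every symbol occurs exactly once in each row and each column."""
    size = len(rows)
    symbols = set(rows[0])
    if len(symbols) != size:
        return False
    # Every row must be the right length and contain every symbol.
    if not all(len(r) == size and set(r) == symbols for r in rows):
        return False
    # Every column must contain every symbol as well.
    return all({r[c] for r in rows} == symbols for c in range(size))


print(is_latin_square(square))            # the design from the example
print(is_latin_square(["AB", "AB"]))      # fails: columns repeat a letter
```

This particular square is cyclic (each row is the previous one shifted by one position); in practice the rows, columns and letters would normally be assigned at random.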
70- Again we have
- SStotal = SSrows + SScolumns + SStreatments + SSerror
- with df
- N - 1 = (R - 1) + (C - 1) + (L - 1) + (R - 2)(C - 1)
- The ANOVA table for propellant data is
Source             SS       df   MS      p
Formulations       330.00   4    82.50   0.0025
Material batches   68.00    4    17.00
Operators          150.00   4    37.50   0.04
Error              128.00   12   10.67
Total              676.00   24
71- So both the formulations and the operators were significantly different. The batches of raw material were not, but it still is a good idea to block on them because they often are different.
- This design was not replicated, and Latin Squares often are not, but it is possible to put n replicates in each cell.
72- Now if you superimposed one Latin Square on another Latin Square of the same size, you would get a Graeco-Latin Square. In one Latin Square, the treatments are designated by Latin letters. In the other Latin Square, the treatments are designated by Greek letters.
- Hence the name Graeco-Latin Square.
73- A 5x5 Graeco-Latin Square is
                          Operators
Batch of raw material     1    2    3    4    5
1                         Aα   Bγ   Cε   Dβ   Eδ
2                         Bβ   Cδ   Dα   Eγ   Aε
3                         Cγ   Dε   Eβ   Aδ   Bα
4                         Dδ   Eα   Aγ   Bε   Cβ
5                         Eε   Aβ   Bδ   Cα   Dγ
- Note that the five Greek treatments appear exactly once in each row and column, just as the Latin treatments did.
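A Graeco-Latin Square needs one more property beyond each layer being a Latin Square: every Latin-Greek pair must occur exactly once. The sketch below checks both conditions for the 5x5 square, writing the Greek letters in ASCII (a = alpha, b = beta, g = gamma, d = delta, e = epsilon); the function names are illustrative.

```python
# Latin layer (formulations) and Greek layer (test assemblies) of the square.
# Greek letters in ASCII: a=alpha, b=beta, g=gamma, d=delta, e=epsilon.
latin = ["ABCDE", "BCDEA", "CDEAB", "DEABC", "EABCD"]
greek = ["agebd", "bdage", "gebda", "dageb", "ebdag"]


def is_latin_square(rows):
    """True if every symbol occurs exactly once in each row and each column."""
    size = len(rows)
    symbols = set(rows[0])
    if len(symbols) != size:
        return False
    if not all(len(r) == size and set(r) == symbols for r in rows):
        return False
    return all({r[c] for r in rows} == symbols for c in range(size))


def are_orthogonal(first, second):
    """True if every (first, second) symbol pair occurs exactly once."""
    pairs = {(x, y)
             for row1, row2 in zip(first, second)
             for x, y in zip(row1, row2)}
    return len(pairs) == len(first) ** 2


is_graeco_latin = (is_latin_square(latin)
                   and is_latin_square(greek)
                   and are_orthogonal(latin, greek))
print(is_graeco_latin)

# Superimposing a square on itself gives only 5 distinct pairs, not 25.
print(are_orthogonal(latin, latin))
```

The pair condition is what lets the fourth factor be separated from the other three in the analysis.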
74- If Test Assemblies had been added as an additional factor to the original propellant experiment, the ANOVA table for propellant data would be
Source             SS       df   MS      p
Formulations       330.00   4    82.50   0.0033
Material batches   68.00    4    17.00
Operators          150.00   4    37.50   0.0329
Test Assemblies    62.00    4    15.50
Error              66.00    8    8.25
Total              676.00   24
- The test assemblies turned out to be nonsignificant.
75- Note that the ANOVA tables for the Latin Square and the Graeco-Latin Square designs are identical, except for the error term.
- The SS(error) of 128.00 for the Latin Square design was decomposed into Test Assemblies (62.00) and error (66.00) in the Graeco-Latin Square. This is a good example of how the error term is really a residual. Whatever isn't controlled falls into error.
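76- Before we leave one-way designs, we should look at the regression approach to ANOVA. The model is
- $Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
- Using the method of least squares, we rewrite this as
- $SSE = \sum_{j=1}^{L} \sum_{i=1}^{n} \varepsilon_{ij}^2 = \sum_{j=1}^{L} \sum_{i=1}^{n} (Y_{ij} - \mu - \tau_j)^2$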
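77- Now to find the LS estimates of µ and $\tau_j$, we minimize the sum of squared errors.
- When we do this differentiation with respect to µ and $\tau_j$, and equate to 0, we obtain
- $-2 \sum_{j=1}^{L} \sum_{i=1}^{n} (Y_{ij} - \hat{\mu} - \hat{\tau}_j) = 0$
- $-2 \sum_{i=1}^{n} (Y_{ij} - \hat{\mu} - \hat{\tau}_j) = 0$ for all j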
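78- After simplification, these reduce to
- $N\hat{\mu} + n\hat{\tau}_1 + n\hat{\tau}_2 + \cdots + n\hat{\tau}_L = Y_{\cdot\cdot}$
- $n\hat{\mu} + n\hat{\tau}_j = Y_{\cdot j}$ for j = 1, ..., L
- In these equations, $Y_{\cdot\cdot}$ is the total of all N observations and $Y_{\cdot j}$ is the total of the n observations in level j.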
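79- These L + 1 equations are called the least squares normal equations.
- If we add the constraint
- $\sum_{j=1}^{L} \hat{\tau}_j = 0$
- we get a unique solution to these normal equations: $\hat{\mu} = \bar{Y}$ and $\hat{\tau}_j = \bar{Y}_j - \bar{Y}$.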
80- It is important to see that ANOVA designs are simply regression models. If we have a one-way design with 3 levels, the regression model is
- $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$
- where $X_{i1}$ = 1 if observation i is from level 1, 0 otherwise
- and $X_{i2}$ = 1 if observation i is from level 2, 0 otherwise
- Although the treatment levels may be qualitative, they are coded as dummy variables.
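81- Since $X_{i1} = 1$ and $X_{i2} = 0$ for observations from level 1,
- $Y_i = \beta_0 + \beta_1 + \varepsilon_i$, so $\beta_0 + \beta_1 = \mu_1$
- Similarly, if the observations are from level 2,
- $Y_i = \beta_0 + \beta_2 + \varepsilon_i$, so $\beta_0 + \beta_2 = \mu_2$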
82- Finally, consider observations from level 3, for which $X_{i1} = X_{i2} = 0$. Then the regression model becomes
- $Y_i = \beta_0 + \varepsilon_i$, so $\beta_0 = \mu_3$
- Thus in the regression model formulation of the one-way ANOVA, the regression coefficients describe comparisons of the first two level means with the third.
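83- So
- $\beta_0 = \mu_3$, $\beta_1 = \mu_1 - \mu_3$, $\beta_2 = \mu_2 - \mu_3$
- Thus, testing $\beta_1 = \beta_2 = 0$ provides a test of the equality of the three means.
- In general, for L levels, the regression model will have L-1 variables
- $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{L-1} X_{i,L-1} + \varepsilon_i$
- and
- $\beta_0 = \mu_L$, $\beta_j = \mu_j - \mu_L$ for j = 1, ..., L-1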
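The link between dummy-variable regression and level means can be seen numerically. The sketch below fits the 3-level regression model by solving the least squares normal equations in plain Python; as an illustrative choice (not from the slides) it uses the 15, 20 and 25 percent cotton levels of the tensile strength data as levels 1, 2 and 3, and the small `solve` helper is a hypothetical stand-in for a linear algebra library.

```python
# Three levels of a one-way design (here: 15%, 20%, 25% cotton, by assumption).
levels = {
    1: [7, 7, 15, 11, 9],
    2: [12, 17, 12, 18, 18],
    3: [14, 18, 18, 19, 19],
}

# Design matrix rows: [intercept, X1, X2] with level 3 as the reference level.
X, y = [], []
for lvl, observations in levels.items():
    for obs in observations:
        X.append([1.0, 1.0 if lvl == 1 else 0.0, 1.0 if lvl == 2 else 0.0])
        y.append(float(obs))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    size = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, size):
            factor = M[r][col] / M[col][col]
            for c in range(col, size + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (M[r][size] - tail) / M[r][r]
    return x


# Normal equations (X'X) beta = X'y.
p = 3
XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
beta = solve(XtX, Xty)

means = {lvl: sum(obs) / len(obs) for lvl, obs in levels.items()}
print("beta0, beta1, beta2 =", [round(b, 2) for b in beta])
print("mean3, mean1 - mean3, mean2 - mean3 =",
      [round(v, 2) for v in (means[3], means[1] - means[3], means[2] - means[3])])
```

The fitted intercept equals the mean of the reference level and the two slopes equal the differences of the other level means from it, matching the relationships derived above.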
84- Now what if you have two factors under test? Or three?
- Here the answer is the factorial design. A factorial design crosses all factors. Let's take a two-way design. If there are L levels of factor A and M levels of factor B, then all LM treatment combinations appear in the experiment.
- Most commonly, L = M = 2.
85- In a two-way design, with two levels of each factor, we have
Factor A          Factor B          Response
-1 (low level)    -1 (low level)    20
 1 (high level)   -1 (low level)    50
-1 (low level)     1 (high level)   40
 1 (high level)    1 (high level)   12
- We can have as many replicates as we want in this design. With n replicates, there are n observations in each cell of the design.
86- SStotal = SSA + SSB + SSAB + SSerror
- This decomposition should be familiar by now except for SSAB. What is this term? Its official name is interaction.
- This is the magic of factorial designs. We find out about not only the effect of factor A and the effect of factor B, but the effect of the two factors in combination.
87- Now let's look at the main effects of the factors graphically.
88- Now let's look at the interaction effect. This is the effect of factors A and B in combination, and is often the most important effect.
89- Interaction of factors is the key to the East, as we say in the West.
- Suppose you wanted the factor levels that give the lowest possible response. If you picked by main effects, you would pick A low (average response 30 vs 31 at A high) and B high (average response 26 vs 35 at B low).
- But look at the interaction plot and it will tell you to pick A high and B high, where the response is 12 rather than 40.
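The main effects and the interaction for the 2x2 example can be computed from the four responses. The sketch below assumes the usual definition of an effect as the average response at the high (+1) setting minus the average at the low (-1) setting, with the interaction using the product of the two coded levels; the names are illustrative.

```python
# Responses from the 2x2 example, keyed by coded (A, B) levels.
response = {
    (-1, -1): 20,
    (1, -1): 50,
    (-1, 1): 40,
    (1, 1): 12,
}


def effect(sign_of):
    """Average response where sign_of(cell) is +1 minus average where it is -1."""
    high = [y for cell, y in response.items() if sign_of(cell) == 1]
    low = [y for cell, y in response.items() if sign_of(cell) == -1]
    return sum(high) / len(high) - sum(low) / len(low)


effect_A = effect(lambda cell: cell[0])             # main effect of A
effect_B = effect(lambda cell: cell[1])             # main effect of B
effect_AB = effect(lambda cell: cell[0] * cell[1])  # A x B interaction

# The cell that actually gives the lowest response.
best_cell = min(response, key=response.get)

print("A effect :", effect_A)
print("B effect :", effect_B)
print("AB effect:", effect_AB)
print("lowest response at (A, B) =", best_cell)
```

The interaction effect dwarfs both main effects here, which is why choosing levels from the main effects alone picks the wrong cell.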
90- This is why, if the interaction term is significant, you never interpret main effects. They are meaningless in the presence of interaction.
- And it is because factorial designs provide interactions that they are so popular and so successful.
91- Now what if the interaction term is not
significant? What if the results instead were
92- and the interaction is
- The clearest indication of no interaction is the parallel lines.
93- So this time, if you wanted the lowest
response, you would pick A low and B low and that
would be correct.