Title: 7. Design of Experiments I / Diseño de Experimentos I
- Profesor Simon Wilson
- Departamento de Estadística y Econometría
2. The Real World and the Laboratory...
- Are not the same!
- In the real world (in industry, etc.) we collect data under the same conditions and often get very different results.
- For example, a company makes hard drives in batches / lotes of 10, using identical machines in the same factory.
- The reliability / fiabilidad of a hard drive is the amount of time until the disc fails (e.g. fails to read or write a file).
- The company thinks that the temperature of the room where the drives are assembled can affect the reliability.
3. The Real World and the Laboratory...
- It does an experiment to see how much temperature affects reliability. It makes 3 batches of 10 hard drives, at 20°C, 30°C and 35°C, and observes how long they take to fail. The data are on the next slide.
- This is an ANOVA problem: we have 3 levels of temperature and want to see if there is a difference in the mean time to failure between the groups.
- However, the experimental error in these data is very large relative to the difference in failure time due to temperature. We cannot detect any difference between the groups. We would always accept H0 (that is, fail to reject it), even though there might be a difference in reliability with temperature.
4. The Real World and the Laboratory...
5. What does Design of Experiments do?
- This is very common in the F-test. When we accept H0:
  - perhaps there really is no difference in the means,
  - or there is a difference in the means, but we cannot see it because the experimental error s is too big.
- It is impossible to know which of these is actually true.
- However, if we can reduce the experimental error, perhaps we can detect the difference.
- Design of experiments studies problems like these so that the experimental error is as small as possible.
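To make the point above concrete, here is a minimal sketch of the one-way ANOVA F statistic (between-group mean square divided by within-group mean square). The group means and the residual pattern are hypothetical numbers chosen for illustration, not the data from the slides. The same three means are used twice; only the size of the experimental error changes.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical values, for illustration only.
GROUP_MEANS = [100.0, 110.0, 120.0]       # assumed mean time to failure per temperature
RESIDUALS = [-2.0, -1.0, 0.0, 1.0, 2.0]   # fixed pattern that sums to zero


def make_groups(error_scale):
    """Same group means every time; only the spread around them changes."""
    return [[m + error_scale * r for r in RESIDUALS] for m in GROUP_MEANS]


f_large_error = f_statistic(make_groups(20.0))
f_small_error = f_statistic(make_groups(2.0))
print(f"large experimental error: F = {f_large_error:.2f}")  # F = 0.50
print(f"small experimental error: F = {f_small_error:.2f}")  # F = 50.00
```

With 3 groups of 5 observations, the 5% critical value of F on (2, 12) degrees of freedom is about 3.89, so in this sketch the difference in means is only detected when the experimental error is small, even though the means are identical in both cases.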
6. What does Design of Experiments do?
- Design of experiments is a method for making comparisons as homogeneous as possible, so that there is a better chance of detecting the factors that affect the characteristic of interest.
- La metodología de diseño de experimentos estudia cómo realizar comparaciones lo más homogéneas posibles, para aumentar la probabilidad de identificar variables influyentes.
7. Why do we often have a large experimental error? (1)
- Usually because there are factors in production that the company cannot measure and cannot control.
- The variability is so great that it hides any effect of the factors that interest us (like temperature).
- Why so much variability?
  - The Friday effect
  - The 7pm effect
  - Materials, etc.
8. Why do we often have a large experimental error? (2)
- Example: we are investigating the effect of marital status on salary. It is possible that marital status affects salary, but many other factors also affect it, such as:
  - Level of education
  - Age
  - Sex
  - Where you live
  - Whether you have been unemployed
- It may be impossible to detect the effect of marital status on salary because the effect of all these other factors hides it.
9. Design of Experiments: some definitions (1)
- The response variable / variable respuesta is the variable of interest. We want to know how this variable changes as a function of other variables (e.g. the response is the time to failure in the hard disc example).
- The factors / factores or experimental variables / variables experimentales are those variables that we think will affect the value of the response (e.g. temperature).
- We only observe the response variable.
- However, we control the value of the factors, then observe the value of the response.
10. Design of Experiments: some definitions (2)
- Also, here we suppose that:
  - the response is continuous (like time to failure)
  - the factors are discrete: they have different levels, exactly as in ANOVA (e.g. 3 levels of temperature, 4 levels of marital status)
11. Solving the problem of unknown factors
- As I have said, in all experiments like this there are a large number of factors that we cannot control and cannot measure.
- These contribute to the experimental error that we are trying to reduce.
- There are 3 ways to control and reduce this problem: randomization / aleatorización, repetition / repetición and factorial design / diseños factoriales.
12. The principle of randomization / el principio de aleatorización (1)
- We assign all the factors that we do not control to the observations by chance.
- Los factores no controlados se asignan al azar a las observaciones.
- Randomization:
  - Prevents biases / sesgos in the observations
  - Makes the observations independent (or, at least, less dependent)
  - Supports the validity of many common statistical methods
13. The principle of randomization (2)
- Example: there are 2 machines that make the hard drives. This is a factor that we are not interested in.
- Suppose that we make all the drives at 20°C on machine 1, and all the drives at 30°C on machine 2.
- Then we do not know if differences in reliability are because of temperature or machine!
- However, if we randomly assign each drive at each temperature to a machine (by tossing a coin, for example), we do not have this problem.
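The coin-toss assignment just described can be sketched as follows. This is only an illustration: the seed is arbitrary, the constant and function names are made up for the example, and Python's `random` module stands in for the coin.

```python
import random

TEMPERATURES = [20, 30, 35]   # degrees C, as in the hard drive example
DRIVES_PER_BATCH = 10
MACHINES = [1, 2]


def assign_by_coin_toss(rng):
    """Give every drive a machine chosen independently at random."""
    plan = []
    for temp in TEMPERATURES:
        for drive in range(1, DRIVES_PER_BATCH + 1):
            plan.append((temp, drive, rng.choice(MACHINES)))
    return plan


rng = random.Random(2024)  # arbitrary seed, only so the plan can be reproduced
plan = assign_by_coin_toss(rng)
for temp, drive, machine in plan[:5]:
    print(f"{temp} C, drive {drive}: machine {machine}")
```

Because the machine is chosen without looking at the temperature, no temperature is systematically tied to one machine, although the split within a batch will usually not be exactly 5 and 5.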
14. The principle of randomization (3)
- Now, is it not better to assign 5 of the discs at 20°C to machine 1, and 5 to machine 2?
- If we do this, we have shared the effect of machine equally among all the temperatures.
- Yes, this is better IF machine is the only other factor.
- But it never is! We could spend the rest of our lives thinking about other factors, but we will almost certainly never think of all of them. We never know all the factors that affect our response.
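As a rough sketch of the 5-and-5 split discussed above: each batch is divided equally between the machines, and the order of the machine labels is shuffled so that which drive goes to which machine is still left to chance. The function name and seed are hypothetical, and the sketch assumes the batch size divides evenly by the number of machines.

```python
import random


def balanced_assignment(temperatures, drives_per_batch, machines, rng):
    """For each temperature, split the batch equally between machines, in random order."""
    per_machine = drives_per_batch // len(machines)  # assumes an exact division
    plan = {}
    for temp in temperatures:
        labels = [m for m in machines for _ in range(per_machine)]
        rng.shuffle(labels)  # position i is the machine for drive i + 1
        plan[temp] = labels
    return plan


plan = balanced_assignment([20, 30, 35], 10, [1, 2], random.Random(7))
for temp, labels in plan.items():
    print(f"{temp} C: {labels}")
```

This balances the one factor we thought of (machine). Any factor we did not think of is handled only by the shuffle, which is the slide's point.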
15. The principle of randomization (4)
- Further, to divide up the observations between the factors that we are not interested in, we need at least 1 observation for each combination of factors.
- Randomization works much better in general, since it reduces the influence of all possible factors, including the ones we have not thought of.
16. Repetition / repetición
- The variance of the sample mean is s² / n.
- So, if we increase n, we estimate the means more precisely, and so can better distinguish the effect of the factors.
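A small numerical illustration of the s² / n rule, using a hypothetical sample variance (the value 400 is invented for the example): each time n is multiplied by 4, the variance of the mean falls by a factor of 4 and its standard error is halved.

```python
import math


def variance_of_mean(s2, n):
    """Variance of the sample mean of n observations with sample variance s2."""
    return s2 / n


s2 = 400.0  # hypothetical sample variance of time to failure
for n in (10, 40, 160):
    var = variance_of_mean(s2, n)
    print(f"n = {n:3d}: variance of mean = {var:6.2f}, standard error = {math.sqrt(var):.2f}")
```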
17. Factorial design / diseños factoriales (1)
- Clearly, when we measure reliability as a function of temperature, we try to make all the other factors as equal as possible.
- If a factor (like machine) affects the response, we have two options:
  - Use the same machine in all experiments. In general, eliminate all other variables that affect the response. This is called the classical design / diseño clásico method.
18. Factorial design (2)
- Alternatively, use the different machines at each temperature, and compare reliability across temperatures by taking the mean obtained with the different machines.
- In general, introduce the factors that can affect the response into the experiment, and take the average of the observations with respect to that factor. This is called the modern experimental / experimentación moderna method.
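The averaging step of the modern experimental method can be sketched like this. The observations are hypothetical numbers invented for the example, with machine 2 assumed to add about 10 units at every temperature.

```python
from statistics import mean

# Hypothetical times to failure; keys are (temperature in C, machine).
observations = {
    (20, 1): [52.0, 48.0],
    (20, 2): [61.0, 59.0],
    (30, 1): [57.0, 53.0],
    (30, 2): [66.0, 64.0],
}


def mean_by_temperature(obs):
    """Average the response over machines, separately for each temperature."""
    temps = sorted({temp for temp, _ in obs})
    result = {}
    for temp in temps:
        machine_means = [mean(values) for (t, _), values in obs.items() if t == temp]
        result[temp] = mean(machine_means)
    return result


print(mean_by_temperature(observations))  # {20: 55.0, 30: 60.0}
```

Since both machines appear at every temperature, the machine effect enters each temperature average equally and cancels when the temperatures are compared.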
19. Factorial design (3)
- In factorial design, we follow the modern experimental method. We cross all possible combinations of the factor that interests us with the one that does not. We can see this in a table.
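One way to picture the crossing is to enumerate every machine and temperature pair. The sketch below uses `itertools.product` from the Python standard library and the levels from the hard drive example; the printed layout is only an illustration of the cells to be filled with observations, not the table from the original slide.

```python
from itertools import product

TEMPERATURES = [20, 30, 35]  # factor of interest (degrees C)
MACHINES = [1, 2]            # factor we are not interested in

# Every combination of machine and temperature appears exactly once.
design = list(product(MACHINES, TEMPERATURES))

# Lay the combinations out as a machine-by-temperature grid.
print("machine | " + " | ".join(f"{t} C" for t in TEMPERATURES))
for machine in MACHINES:
    cells = " | ".join(f"({machine},{t})" for t in TEMPERATURES)
    print(f"{machine:7d} | {cells}")
```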
20. Factorial design (4): problems with the classical design method
- Suppose machine 1 works better at high temperature and machine 2 works better at low temperature, and we want to choose both the best temperature and the best machine.
- In classical design, we choose one machine (say, no. 1) and measure the reliability of drives at the 3 temperatures. We see an increase in reliability with increase in temperature. The highest temperature is best.
- To choose the best machine, we choose one temperature and observe the reliability of the drives from machine 1 and machine 2. Machine 2 is better.
21. Factorial design (5): problems with the classical design method
- So, it appears that machine 2 and high temperature is the best combination.
- However, this is wrong, since machine 2 works better at lower temperatures! The diagram on the next slide explains what has happened.
- The problem is that classical design assumes that the effects of the two factors are additive.
- But, in reality, they are not, because the effect of temperature changes with machine (and vice versa).
22. Factorial design (6)
23. Factorial design (7): interaction
- We call this interaction / interacción. Machine and temperature interact.
- If there is no interaction, the classical method will work.
- However, if there is interaction, we need the modern experimental method to find the best combination of machine and temperature.
- If we look at all combinations of machine and temperature, then we discover the best combination.
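The failure of the classical method under interaction can be reproduced with a small made-up table. All the numbers below are hypothetical, chosen so that machine 1 improves with temperature and machine 2 is best when cool. The sketch also assumes that the single temperature used to compare machines is the lowest one.

```python
# Hypothetical mean times to failure, keyed by (machine, temperature in C).
reliability = {
    (1, 20): 50, (1, 30): 60, (1, 35): 70,   # machine 1 improves with heat
    (2, 20): 90, (2, 30): 65, (2, 35): 55,   # machine 2 is best when cool
}
TEMPERATURES = [20, 30, 35]
MACHINES = [1, 2]


def classical_choice(fixed_machine, fixed_temperature):
    """Vary one factor at a time, holding the other fixed."""
    best_temp = max(TEMPERATURES, key=lambda t: reliability[(fixed_machine, t)])
    best_machine = max(MACHINES, key=lambda m: reliability[(m, fixed_temperature)])
    return best_machine, best_temp


def factorial_choice():
    """Look at every machine and temperature combination."""
    return max(reliability, key=reliability.get)


classical = classical_choice(fixed_machine=1, fixed_temperature=20)
factorial = factorial_choice()
print("classical:", classical, "->", reliability[classical])  # (2, 35) -> 55
print("factorial:", factorial, "->", reliability[factorial])  # (2, 20) -> 90
```

In this table the effect of raising the temperature from 20°C to 35°C is +20 on machine 1 and -35 on machine 2. That difference in effects is the interaction, and it is why combining the two one-factor conclusions picks a poor combination.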
24. Factorial design (8): blocking variables
- A factor that:
  - does not interest us
  - but that we include in the experiment to obtain more equal comparisons (with the modern experimental method)
  is called a blocking variable / variable bloque.
- For example, machine is a blocking variable in our hard drive reliability example.