Module 3: Experimental Design

CHEE418/801 Fall 2006. J. McLellan.

Provided by: jamesmc9
1
Module 3 Experimental Design
2
Outline - Module 3
  • definition and motivation
  • limitations of routine operating data
  • terminology
  • considerations in planning an investigation
  • limitations of one-variable-at-a-time strategies
  • two-level factorial designs

3
Definition
  • An experimental design is a disciplined plan for
    collecting data
  • What should we observe, and how should we perturb
    the process?
  • How can we maximize the information content of
    the data?

4
Motivation
  • Experimental design is an integral component of
    quality improvement, and supports improvement in
  • product design
  • process design
  • process operation

5
Process Investigations
The Course

Process Operation
Experimental Design
Statistically Designed Expts.
Data
Statistical Analysis
Regression Analysis
Information
Insight Through Experience
Knowledge
Quality Improvement
Improve Process/ Product Performance
6
The Iterative Nature of Process Investigations

Identify Objectives
Design Data Collection
Collect Data
Analyze Data
7
Why not use routine operating data?
  • Routine operating data frequently does not
    contain sufficient information of interest, due
    to
  • limited range of operating variables due to tight
    control
  • values don't vary significantly, so the effects
    of the variables can't be seen
  • systematic relationships between operating
    variables
  • arising from process control and/or other
    operating policies
  • coincidental (correlation) relationships that
    don't necessarily represent cause and effect

8
Active vs. Passive Data Collection
  • Active Data Collection
  • we actively intervene in the process, and cause
    changes
  • Passive Data Collection
  • we passively observe, without introducing any
    perturbations into the process
  • The only way to ensure that our observations
    represent cause and effect is to introduce
    perturbations (causes) and observe the
    responses (effects).

9
Terminology
  • Responses - measurable outcomes of interest
  • frequently have more than one response
  • e.g., melt grafting - degree of grafting,
    grafting efficiency
  • Factors - controllable variables thought to
    influence the response
  • deliberately manipulated to determine effect on
    response
  • Level - value or setting of a factor
  • Test run - set of factor level combinations for
    one experimental run

10
Terminology
  • Covariates - variables affecting process or
    product performance which cannot be, or are not,
    controlled
  • Extraneous variation - variation in measured
    response values in an experiment attributable to
    sources other than deliberate changes in the
    levels of the factors
  • Design - selection of test run factor levels -
    set of experimental runs
  • Effect - of factors on response, measured by
    change in average response under two or more
    factor level combinations

11
Example - Wave Solder Process
  • Scenario - wave solder process producing too many
    defective items - investigate extent to which
    conveyor speed, solder pot temperature and flux
    density affect incidence of defects
  • Response -
  • Factors -
  • Potential noise variables -

12
Considerations in Planning an Investigation
  • What are the objectives of the investigation?
  • What are the performance characteristics of
    interest?
  • What responses will be used to assess these
    characteristics?
  • What factors will be deliberately manipulated?
  • What is the feasible operating region?
  • What is the operating region of interest?
  • What other variables may influence the results?
  • Will future additional tests be possible?
  • What sets of operating conditions are to be
    tested?
  • In what order will the test runs be carried out?

13
Considerations in Planning an Investigation
  • Factor effects -
  • eliminate systematic bias by including as much as
    possible all factors suspected of having an
    effect - e.g., in a screening study, to
    identify which factors do have a significant
    effect
  • what types of relationships do we think exist?
  • quadratic, or linear?

14
One-variable-at-a-time investigations
  • Starting at a nominal point, conduct experiments
    in which the first factor is varied, then conduct
    experiments in which only the second factor is
    varied, and so forth
  • Example - reactor yield vs. temperature,
    concentration

[Figure: contours of constant reactor yield plotted against temperature (C) and concentration, with the point of maximum yield marked]
15
One-variable-at-a-time investigations
  • If we conduct one sequence of a
    one-variable-at-a-time investigation (i.e.,
    conduct an experimental sequence in each factor
    only once, rather than repeat the sequence of
    experiments), we will not locate the true value
    of the point of maximum yield
  • one-factor-at-a-time testing does NOT account for
    possible interactions between the effects of the
    variables - two-factor interactions
  • the yield surface contours are rotated ellipses,
    which have cross-product terms --> two-factor
    interactions

16
Two-Level Factorial Designs
  • Suppose that we have k factors being
    investigated, and we have a region of interest
    defined by low and high limits for each factor

[Figure: region of interest in the temperature-concentration plane, bounded by the low (L) and high (H) limits of the range for temperature and the range for concentration]
17
Two-Level Factorial Designs
  • we conduct an experiment at every combination of
    high and low values for all factors

Runs    Coded Values
L,L     -1,-1
L,H     -1,1
H,L     1,-1
H,H     1,1

[Figure: the four runs at the corners (L, H) of the temperature-concentration square]
18
Coding
  • The standard coding is
    $\tilde{x} = \dfrac{x - (x_{\text{high}} + x_{\text{low}})/2}{(x_{\text{high}} - x_{\text{low}})/2}$
  • so that -
  • -1 corresponds to the low limit of interest
  • 1 corresponds to the upper limit of interest
  • note that the average of the upper and low limits
    is the midpoint of the interval of interest
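
A minimal sketch of this coding, assuming the usual form (value minus midpoint, divided by half the range). The function names and the 160 to 180 temperature range are illustrative choices, not part of the original slides.

```python
def to_coded(x, low, high):
    """Map a value in natural units onto the coded scale, where low -> -1 and high -> +1."""
    midpoint = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return (x - midpoint) / half_range


def to_natural(x_coded, low, high):
    """Inverse mapping: coded value back to natural units."""
    midpoint = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return midpoint + x_coded * half_range


# Illustrative temperature range of 160 to 180
print(to_coded(160, 160, 180))    # low limit
print(to_coded(180, 160, 180))    # high limit
print(to_coded(170, 160, 180))    # midpoint
print(to_natural(0.5, 160, 180))  # halfway between midpoint and high limit
```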

19
Coding for Qualitative Factors
  • Sometimes the factors under consideration are not
    numerical, but are instead qualitative
  • catalyst types A and B
  • catalyst preparation method I, II
  • suppliers A and B
  • machines I and II
  • These factors can be coded as -1, 1
  • e.g., -1 for catalyst type A, 1 for catalyst
    type B

20
Two-Level Factorial Designs
  • If we have k factors under investigation, a
    two-level factorial design will consist of $2^k$
    runs
  • the number of combinations of high and low values
    (two levels) for k factors
  • These designs are also known as $2^k$ designs, which
    identifies the number of levels (2), and the
    number of factors (k).

21
General Factorial Designs
  • If we have k factors, each considered at $m_i$
    levels, a general factorial design consists of
    experimental runs at all possible combinations of
    the levels for each factor, yielding
    $m_1 \times m_2 \times \cdots \times m_k = \prod_{i=1}^{k} m_i$
    experimental runs.
  • Examples
  • $2^k$ - two-level factorial design
  • $3^k$ - three-level factorial design
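
As a sketch of the run count, the full set of level combinations can be enumerated directly. The three factors and their levels below are hypothetical, chosen only to show a mixed-level case.

```python
from itertools import product
from math import prod


def full_factorial(levels_per_factor):
    """Return every combination of factor levels, one tuple per experimental run."""
    return list(product(*levels_per_factor))


# Hypothetical factors: catalyst type (2 levels), temperature (3 levels), supplier (2 levels)
levels = [("A", "B"), (160, 170, 180), ("I", "II")]
runs = full_factorial(levels)

# The number of runs equals the product of the numbers of levels
expected_runs = prod(len(factor_levels) for factor_levels in levels)
print(len(runs), expected_runs)
```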

22
Two-Level Factorial Designs
  • Why place the runs at the limits of the region of
    interest?
  • Think of the variance of the slope parameter
    estimate in a straight line model,
    $\mathrm{Var}(\hat{\beta}_1) = \sigma^2 / \sum_i (x_i - \bar{x})^2$
  • Placing the $x_i$'s as far as possible from the
    average minimizes the variance of the parameter
    estimates
  • improved precision of the parameter estimates

23
Two-Level Factorial Designs
  • Parameter estimates contain information about the
    effects of the factors --> precision in parameter
    estimates translates into precision of the
    knowledge of the effects of the factors
  • Multiple linear regression case
  • placing the points as far from the average point
    as possible maximizes the determinant of $X^T X$
  • covariance matrix is based on inverse of $X^T X$ -
    area of joint confidence region is proportional
    to $1/\sqrt{\det(X^T X)}$
  • maximizing determinant minimizes area of joint
    confidence region
  • yields most precise parameter estimates

24
Randomization
  • When implementing a designed experiment, the runs
    should be conducted in a completely randomized
    manner
  • guard against systematic trends caused by other
    variables which would lead to misinterpretation
    of the results -- biased results
  • e.g., systematic noise component associated with
    increasingly higher temperatures, slow drift in
    one of the instruments, all low pressure runs
    conducted on a cool day, ...
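
One simple way to obtain a completely randomized run order is to shuffle the list of design points before running them. This is only a sketch: the function name and the fixed seed (used so the order can be reproduced in a lab notebook) are assumptions.

```python
import random


def randomized_order(runs, seed=None):
    """Return the design runs in a random execution order (the input list is not modified)."""
    order = list(runs)
    random.Random(seed).shuffle(order)
    return order


# A two-factor, two-level design in coded units
design = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
run_order = randomized_order(design, seed=2006)
for position, run in enumerate(run_order, start=1):
    print(position, run)
```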

25
Information Provided by a Designed Experiment
  • Given M distinct sets of factor levels in the
    experimental design, we can estimate
  • the overall average response
  • M-1 pieces of information about the effects of
    the factors on the response
  • This is viewed as providing M-1 independent
    pieces of information about the process (the
    overall average is not viewed as a piece of
    information about the factor effects).
  • Link to regression - for M distinct sets of
    experimental runs, we can estimate
  • intercept parameter
  • M-1 other parameters

total of M parameters
26
Example - Reactor Yield
  • What is the effect of concentration (C) and
    temperature (T) on chemical reactor yield?
  • prepare a $2^2$ factorial design in T, C -- 4 runs
  • Experimental Design

Runs    Coded Values
L,L     -1,-1
L,H     -1,1
H,L     1,-1
H,H     1,1

[Figure: the four runs at the corners (L, H) of the temperature-concentration square]
27
Example - Reactor Yield
  • Information -
  • main effects - effect of C, T on yield (2
    pieces of info)
  • interaction effect - effect of CT on yield (1
    piece of info)
  • total of 3 pieces of information from 4 runs
  • remaining run helps provide overall average yield

28
Main Effects
  • The main effect of a factor is the average
    influence of a change in level of the single
    factor on the response.
  • In the 2-level factorial design,
    Main effect of a factor =
    (average of responses at high level of factor)
    - (average of responses at low level of factor)

29
Main Effects - Example
  • For temperature -
  • average yield at high T is 70
  • average yield at low T is 57
  • main effect is 70 - 57 = 13

Yield at each run:

temperature    concentration L    concentration H    average
H              72                 68                 70
L              60                 54                 57
30
Interaction Effects
  • Interaction - extent to which influence of one
    factor on response depends on level of another
    factor
  • Visually - for reactor example

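[Figure: interaction plot of yield vs. temperature (L, H), with one line for low concentration and one line for high concentration, through the yields 54, 60, 68 and 72. Examine the influence of temperature at low conc. and at high conc.]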
31
Interaction Effects
  • Reactor Yield example -
  • the influence of temperature at high
    concentration is slightly larger than the
    influence of temperature at low concentration --
    mild interaction effect
  • Interaction effect between temperature and
    concentration -
  • 1/2 [influence of T at high conc. - influence
    of T at low conc.]
  • = 1/2 [14 - 12] = 1
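
The definitions of main and interaction effects can be checked numerically. The sketch below uses the four reactor yields; the assignment of each yield to a (temperature, concentration) corner is inferred from the averages 70 and 57 and the influences 14 and 12 quoted on these slides, and the function names are illustrative.

```python
# Reactor yields keyed by (temperature, concentration); -1 = low, +1 = high.
yields = {(-1, -1): 60, (-1, +1): 54, (+1, -1): 72, (+1, +1): 68}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def main_effect(y, factor):
    """Average response at the high level of a factor minus average at its low level."""
    high = mean(v for levels, v in y.items() if levels[factor] == +1)
    low = mean(v for levels, v in y.items() if levels[factor] == -1)
    return high - low


def temperature_effect_at(y, conc):
    """Influence of temperature at one fixed concentration level."""
    return y[(+1, conc)] - y[(-1, conc)]


effect_T = main_effect(yields, 0)   # 70 - 57
effect_C = main_effect(yields, 1)   # 61 - 66
interaction_TC = 0.5 * (temperature_effect_at(yields, +1)
                        - temperature_effect_at(yields, -1))  # 1/2 [14 - 12]
print(effect_T, effect_C, interaction_TC)
```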

32
Interaction Effects - General Definition
  • For two factors, x1 and x2, the interaction
    effect is
  • 1/2 [effect of factor 1 on response at high
    level of factor 2
  • - effect of factor 1 on response at low level
    of factor 2]
  • Why divide by 2? - place assessment of
    interaction effect on same basis as that of main
    effects

33
Interaction and Main Effects
  • We can return to the interaction plot, and
    visualize the main effects as well

[Figure: interaction plot of response vs. X2 (L, H), annotated with the main effect of X1 and the main effect of X2]
34
Using Regression to Estimate Effects
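  • For the $2^2$ case for the reactor example, we can
    estimate the main effects and 2-factor
    interaction by fitting the following model to the
    data
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \epsilon$
  • Main effect of factor 1 - from defn -
    difference between avg high, avg low
    $\frac{1}{2}(\hat{y}_{HH} + \hat{y}_{HL}) - \frac{1}{2}(\hat{y}_{LH} + \hat{y}_{LL}) = (\hat{\beta}_0 + \hat{\beta}_1) - (\hat{\beta}_0 - \hat{\beta}_1) = 2\hat{\beta}_1$
    where the first subscript is the level of factor 1
    and the second is the level of factor 2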
35
Using Regression to Estimate Effects
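  • General case - to obtain main and 2-factor
    interaction effects for $2^k$ design, fit a
    first-order plus 2-factor interaction model
    $y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i<j} \beta_{ij} x_i x_j + \epsilon$
  • In general
  • Main effect of factor i = $2\hat{\beta}_i$
  • Interaction effect between factors i, j = $2\hat{\beta}_{ij}$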

36
Example - Chemical Reactor Yield
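  • Form the X matrix and the observation vector y
    (runs listed in standard order)

int.  x1  x2  (x1 x2)    y
  1   -1  -1     1       60
  1    1  -1    -1       54
  1   -1   1    -1       72
  1    1   1     1       68

  • The parameter estimate vector
    $\hat{\beta} = (X^T X)^{-1} X^T y = (63.5, -2.5, 6.5, 0.5)^T$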
37
Effects - for Reactor Example, from Regression
  • Using these coefficients, the effects are
  • Main effect of x1 = 2(-2.5) = -5
  • Main effect of x2 = 2(6.5) = 13
  • Interaction effect x1 x2 = 2(0.5) = 1
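
A sketch of the regression route for the reactor example. Two assumptions are made here: the runs are listed in standard order with x1 = concentration and x2 = temperature (inferred from the effects -5 and 13), and the least-squares solution is computed with the shortcut that applies only when the columns of X are orthogonal with entries of -1 and 1, so that each coefficient is (column . y) / (number of runs).

```python
y = [60, 54, 72, 68]   # yields, runs in standard order (assumed)
x1 = [-1, 1, -1, 1]    # concentration, coded
x2 = [-1, -1, 1, 1]    # temperature, coded

# Model matrix columns: intercept, x1, x2, x1*x2
X = [[1, a, b, a * b] for a, b in zip(x1, x2)]


def fit_orthogonal(X, y):
    """Least-squares coefficients for a design whose columns are orthogonal with +/-1 entries."""
    n_runs = len(y)
    n_params = len(X[0])
    return [sum(row[j] * yi for row, yi in zip(X, y)) / n_runs for j in range(n_params)]


b0, b1, b2, b12 = fit_orthogonal(X, y)

# Each effect is twice the corresponding coefficient
effects = {"x1": 2 * b1, "x2": 2 * b2, "x1x2": 2 * b12}
print([b0, b1, b2, b12], effects)
```

For an unbalanced design the shortcut no longer holds and the full normal equations would have to be solved instead.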

38
The Effects Representation - another approach!
  • The Effects Representation is another approach to
    compute effects for 2-level factorial designs
  • Steps
  • 1) Form data table
  • 2) For each column, compute the sum of (factor
    column value x corresponding response column value)
  • e.g., for column 1 (the x1 column),
    (-1)(60) + (1)(54) + (-1)(72) + (1)(68) = -10
  • 3) Effect for the column is obtained by dividing
    the weighted sum by $2^{k-1}$ where k is the number
    of factors
  • e.g., for column 1, main effect for factor 1 is
    (-10)/2 = -5
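
The steps above can be sketched as a small sign-table calculation. The data table is assumed to be in standard order, with the responses 60, 54, 72, 68 taken from the column 1 example; the function name is illustrative.

```python
# Step 1: data table (sign columns and responses), runs in standard order (assumed)
table = {
    "x1": [-1, 1, -1, 1],
    "x2": [-1, -1, 1, 1],
}
table["x1x2"] = [a * b for a, b in zip(table["x1"], table["x2"])]
responses = [60, 54, 72, 68]


def column_effect(signs, y):
    """Steps 2 and 3: signed sum of the responses, divided by 2^(k-1) (half the number of runs)."""
    weighted_sum = sum(s * yi for s, yi in zip(signs, y))
    return weighted_sum / (len(y) / 2)


effects = {name: column_effect(signs, responses) for name, signs in table.items()}
print(effects)
```

Dividing by half the number of runs is only valid for a complete $2^k$ design, where each column contains equal numbers of -1 and 1.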

39
The Effects Representation - Example
  • Summarizing for the reactor example
  • which compares to the results from the other
    approaches.
  • If you use the Effects Representation approach,
    check that the design you are analyzing is a
    proper $2^k$ design.

40
Computing Effects
  • In industry, you will find several approaches
    used for computing effects
  • Effects representation
  • Formal definition
  • Regression
  • The approach used is a reflection of how the
    material was learned (did you learn regression
    first?) and where you learned it (statistics
    dept., engineering dept.)
  • The regression approach is a fail-safe approach
    as long as you remember how the parameters are
    related to the effects.

41
Two-Level Factorial Designs - the $2^3$ Case
  • We have 3 factors, and we conduct a $2^3$ design.
  • What information can we obtain?
  • we have 8 - 1 = 7 pieces of independent information
  • main effects for 3 factors = 3 pieces of info
  • 2-factor interaction effects x1x2 , x1x3 , x2x3
    = 3 pieces of info.
  • 3-factor interaction effect x1x2x3 = 1 piece of
    info
  • total of 7 pieces

42
The $2^3$ Design
  • Picture

[Figure: cube with axes x1, x2 and x3, each running from L to H; the eight runs are at the corners]
43
Example - Reactor Yield
Information Available
  • main effects (3)
  • two factor interactions (3)
  • 3-factor interaction (1)
  • total of 7 pieces in 8 runs

[Figure: cube plot of yield at the eight runs, with axes concentration (20, 40), temperature (160, 180) and catalyst type (I, II). Yields at the corners: 83, 80, 72, 68, 45, 52, 60, 54]
44
Example - Reactor Yield
  • Main effect of catalyst type -

[Figure: cube plot of yield at the eight runs (83, 80, 72, 68, 45, 52, 60, 54), with axes concentration (20, 40), temperature (160, 180) and catalyst type (I, II)]
45
Example - Reactor Yield
  • Main effect for catalyst type

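(avg for Catalyst Type II) - (avg for Catalyst Type I) = 65 - 63.5 = 1.5

[Figure: the Catalyst Type II and Catalyst Type I faces of the cube; the Type I face holds the yields 72, 68, 60 and 54 (avg 63.5)]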
46
Example - Reactor Yield
  • 2-factor interaction effect between catalyst type
    and temperature
  • 1/2 [effect of cat type at high T - effect of
    cat type at low T]

catalyst type    high T                low T
II               83, 80 (avg 81.5)     52, 45 (avg 48.5)
I                72, 68 (avg 70)       60, 54 (avg 57)

effect at high T = 81.5 - 70 = 11.5
effect at low T = 48.5 - 57 = -8.5
2-factor interaction effect for cat type and T is
1/2 [11.5 - (-8.5)] = 10
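
The catalyst calculations can be reproduced with a short sketch. The yields are grouped only by catalyst type and temperature, because these two effects average over concentration; the dictionary layout and function names are illustrative.

```python
# Yields grouped by (catalyst type, temperature); each list holds the two concentration levels.
yields = {
    ("II", 180): [83, 80],
    ("II", 160): [52, 45],
    ("I", 180): [72, 68],
    ("I", 160): [60, 54],
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def catalyst_effect_at(temperature):
    """Average yield with catalyst II minus average yield with catalyst I at one temperature."""
    return mean(yields[("II", temperature)]) - mean(yields[("I", temperature)])


effect_high_T = catalyst_effect_at(180)   # 81.5 - 70
effect_low_T = catalyst_effect_at(160)    # 48.5 - 57
interaction = 0.5 * (effect_high_T - effect_low_T)

# Main effect of catalyst type: average over all type II runs minus average over all type I runs
avg_II = mean(yields[("II", 180)] + yields[("II", 160)])
avg_I = mean(yields[("I", 180)] + yields[("I", 160)])
main_effect_catalyst = avg_II - avg_I
print(effect_high_T, effect_low_T, interaction, main_effect_catalyst)
```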
47
Example - Reactor Yield
  • The 2-factor interaction is essentially the
    difference of the averages on the following two
    planes

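[Figure: cube plot with axes concentration (20, 40), temperature (160, 180) and catalyst type (I, II), showing two diagonal planes: cat type x T = 1 (avg = 69.25) and cat type x T = -1 (avg = 59.25)]

The difference of the two plane averages, 69.25 - 59.25 = 10, is the 2-factor interaction effect.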
48
Example - Reactor Yield
  • We can also summarize the 2-factor interaction
    information as we did before, but averaging over
    the values for different concentrations

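[Figure: interaction plot of yield vs. catalyst type, one line per temperature level, using the yields averaged over concentration (48.5, 57, 70 and 81.5)]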
49
Constructing Factorial Designs
  • Standard order -
  • levels for first factor alternate (-1, 1)
  • levels for second factor alternate with every
    pair of runs (-1,-1,1,1)
  • levels for third factor alternate every four runs
    (-1,-1,-1,-1,1,1,1,1)
  • and so forth...
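
A minimal sketch of a standard-order generator for k two-level factors. It relies on the observation that factor j changes level every 2^j runs, which is the same pattern as bit j of the run index; the function name is illustrative.

```python
def standard_order(k):
    """Return the 2^k runs of a two-level factorial design in standard order (coded -1 / 1)."""
    runs = []
    for run_index in range(2 ** k):
        # Bit j of the run index gives the level of factor j: 0 -> -1, 1 -> +1
        runs.append(tuple(1 if (run_index >> j) & 1 else -1 for j in range(k)))
    return runs


for run in standard_order(3):
    print(run)
```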

50
Design Decisions for 2-level Factorial Designs
  • 1) High and low levels for each factor
  • from process understanding, objectives,
    preliminary investigations
  • 2) Number of runs at each factor level
  • 3) Inclusion of centre point runs
  • estimate inherent noise variance
  • assess curvature over the experimental region
  • 4) Balanced design
  • preserve cancellation structure of runs

51
Number of Runs
  • Conceptually -
  • Perform enough runs so that the precision of the
    predicted effects is sufficient to allow
    detection of a certain effect size
  • strengthen signal to noise
  • Precision
  • depends on inherent noise variance (noise) and
    number of runs
  • as the number of runs increases, precision increases
    (increased information from data)

52
Number of Runs
  • Assessment of Significance of Effects
  • hypothesis test - effect is not significant
  • two types of risk
  • type I error - erroneous conclusion that effect
    IS significant (reject null hypothesis) -
    alpha-risk
  • type II error - failure to detect a significant
    effect (accept null hypothesis) - beta-risk
  • analogy - control charts

53
Number of Runs
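  • Given - size of effect to be detected ($\delta$)
  • - variance of inherent noise ($\sigma^2$)
  • - type I error risk (false detection) ($\alpha$)
  • - type II error risk (failure to detect) ($\beta$)
  • the number of runs n required at each factor
    level in 2-level factorial design in order to
    detect an effect of the stated size is
    $n = 2 (z_{\alpha/2} + z_{\beta})^2 (\sigma/\delta)^2$

where $z_{\alpha/2}$ is the value of the standard normal r.v. with upper tail
probability alpha/2, and $z_{\beta}$ is the value with upper tail probability beta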
54
Number of Runs
  • Note - if inherent noise variance is estimated,
    replace Z with Student's t-distribution value
  • degrees of freedom from noise variance estimate
  • Example
  • want to detect effect whose magnitude > 2 sigma
  • alpha-risk = 0.05, beta-risk = 0.1
  • $n = 2 (1.96 + 1.28)^2 (0.5)^2 \approx 5.25$ --> 6 runs at
    each factor level
  • Interpretation
  • suppose we have a number of factors
  • if we conduct a 2-level design with 16 runs or
    more, at least 8 runs will be conducted at each
    factor level --> we need at least a 16-run design
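
The worked example can be checked with a short calculation. The sketch assumes the run-count formula has the form n = 2 (z_alpha/2 + z_beta)^2 (sigma / effect size)^2, which is the form implied by the numbers in the example; the function and argument names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def runs_per_level(effect_size, sigma, alpha, beta):
    """Runs needed at each factor level (known noise variance, two-sided test).

    Returns the unrounded value and the value rounded up to a whole number of runs.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)   # upper tail area alpha/2
    z_beta = standard_normal.inv_cdf(1 - beta)         # upper tail area beta
    n = 2 * (z_alpha + z_beta) ** 2 * (sigma / effect_size) ** 2
    return n, ceil(n)


# Effect of 2 sigma, alpha-risk 0.05, beta-risk 0.1 (sigma set to 1 without loss of generality)
n_exact, n_runs = runs_per_level(effect_size=2.0, sigma=1.0, alpha=0.05, beta=0.10)
print(round(n_exact, 2), n_runs)
```

Halving the effect size to be detected quadruples the required number of runs at each level, which is why the target effect size should be chosen before the design is sized.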

55
Centre-Point Runs
  • The 2-level factorial design is improved by
    adding several runs at the centre point of the
    design ( i.e., set factor levels to 0).
  • Benefits -
  • have replicates to enable estimation of inherent
    noise variation
  • can assess curvature of response surface
  • compare average of corner values to average at
    the center - if there is a significant
    difference, curvature is present

56
Centre-Point Runs
  • Centre-point runs contribute no additional
    information about main and interaction effects
  • think of effects representation - multiplication
    by zeros
  • To assess curvature of the response surface
  • calculate average of $2^k$ runs
  • calculate average of replicate runs at the centre
  • use a t-test for differences in means - assuming
    both have the same variance (that estimated from
    the replicates at the centre)
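
A sketch of the curvature check described above. The corner yields are the four reactor values used in this module; the three centre-point replicates are hypothetical numbers invented for illustration, and the function name is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

corner_yields = [60, 54, 72, 68]     # the 2^2 factorial runs
centre_yields = [64.0, 66.0, 65.0]   # hypothetical centre-point replicates


def curvature_t_ratio(corners, centres):
    """t-ratio for (corner average - centre average), noise estimated from the centre replicates."""
    s = stdev(centres)                                      # noise std devn estimate
    difference = mean(corners) - mean(centres)
    std_error = s * sqrt(1 / len(corners) + 1 / len(centres))
    degrees_of_freedom = len(centres) - 1
    return difference / std_error, degrees_of_freedom


t_ratio, df = curvature_t_ratio(corner_yields, centre_yields)
print(t_ratio, df)
```

The ratio would be compared with the Student's t value for the stated degrees of freedom; with only a few centre points the test has little power, so a nonsignificant result does not rule out curvature.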

57
Balanced Designs and Replicate Runs
  • Replicate runs are
  • independent
  • repeat runs to estimate inherent noise variance
  • Balanced design
  • design in which each level of every individual
    factor appears the same number of times in
    combination with each of the levels of every
    other factor
  • e.g., $2^4$ design - low level of X1 appears 4 times
    with low level of X2, and ...
  • Main point - changing the balance of the design
    can dramatically alter the properties provided by
    the design

58
Upsetting the Balance
  • Example - $2^2$ design for two factors
  • add an additional run at the LL combination
  • $X^T X$ is no longer diagonal
  • parameter estimates are no longer uncorrelated
  • effects calculations are no longer uncorrelated
  • potential for misleading conclusions
  • imbalance because we no longer have an equal
    number of -1, 1 combinations
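
The loss of balance can be seen by forming X-transpose-X for the model with intercept, two main-effect terms and the interaction term, before and after adding the extra LL run. This is a sketch in plain Python; the helper names are illustrative.

```python
def model_matrix(runs):
    """Columns: intercept, x1, x2, x1*x2."""
    return [[1, x1, x2, x1 * x2] for x1, x2 in runs]


def xtx(X):
    """Compute X-transpose times X as a list of lists."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def is_diagonal(M):
    return all(M[i][j] == 0 for i in range(len(M)) for j in range(len(M)) if i != j)


balanced = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
unbalanced = balanced + [(-1, -1)]   # one additional run at the LL combination

for name, runs in (("balanced", balanced), ("unbalanced", unbalanced)):
    M = xtx(model_matrix(runs))
    print(name, is_diagonal(M))
    for row in M:
        print(row)
```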

59
Properties of $2^k$ Designs
  • 1) Parameter estimates are uncorrelated - $X^T X$ is
    diagonal
  • 2) Parameter estimates have uniform precision
  • diagonal entries in $X^T X$ are identical (equal to $2^k$)
  • addition of centre-points improves precision of
    intercept estimate (overall average), but not the
    single factor and interaction term parameters
  • 3) Optimality - for any 2-level experimental
    design, 2-level factorial designs -
  • provide the most precise parameter estimates
  • provide the most precise predicted responses for
    any prediction at a point in the experimental
    region

60
Properties of $2^k$ Designs
  • 4) Terms - 2-level factorial designs
  • allow estimation of main effects - linear in x
  • allow estimation of interactions - products of
    x's
  • do NOT allow estimation of quadratics - only two
    levels, minimum of three levels required

61
Precision of Predicted Effects
  • Recall the definition of the main effect
    Main effect of a factor =
    (average of responses at high level of factor)
    - (average of responses at low level of factor)
  • Each average involves half of the $2^k$ points in
    the factorial design - what is the variance of an
    average?
    $\mathrm{Var}(\bar{y}) = \sigma^2 / m$
  • when m points are used in the average, and the
    variance of the measurements is $\sigma^2$.
62
Precision of Predicted Effects
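  • Main effect calculation consists of difference
    between two averages, each with $2^k/2$ points
  • remember that variances are ADDITIVE even if we
    subtract random variables
  • Variance of calculated main effect is
    $\dfrac{\sigma^2}{2^{k-1}} + \dfrac{\sigma^2}{2^{k-1}} = \dfrac{\sigma^2}{2^{k-2}}$
  • Standard devn of calculated main effect is
    $\dfrac{\sigma}{2^{(k-2)/2}} = \dfrac{2\sigma}{\sqrt{2^k}}$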

63
Precision of Predicted Effects
  • Precision of interaction effects for $2^k$ designs
  • - precision can be derived by thinking of -
  • interaction effect as difference between averages
    of $2^{k-1}$ points (the diagonal planes)
  • the formal defn
  • the effects representation
  • The precision of the interaction effects is the
    same as for the main effects

64
Precision of Predicted Effects
  • For cases with replicate runs, think of the
    underlying principle -
  • effects are differences of averages at different
    levels
  • variances of averages are those of the noise
    divided by the number of points in the average
  • variances of sums (and differences) of random
    variables are ADDITIVE

65
Precision of Predicted Effects
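  • Regression Perspective -
  • can examine variance of associated parameter
    estimate
    $\mathrm{Var}(\hat{\beta}_i) = \sigma^2 / 2^k$ for a $2^k$ design
  • To determine variance of the effect (as opposed
    to the parameter), consider
    $\mathrm{Var}(\text{effect}_i) = \mathrm{Var}(2\hat{\beta}_i) = 4\,\mathrm{Var}(\hat{\beta}_i) = \sigma^2 / 2^{k-2}$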
66
Using Precision of Predicted Responses
  • Is an effect statistically significant?
  • Hypothesis Test
  • null hypothesis H0: effect is zero
  • alternate hypothesis Ha: effect is non-zero
  • test statistic
    $z = \text{effect} / \sigma_{\text{effect}}$ if noise variance is known
    $t = \text{effect} / s_{\text{effect}}$ if noise variance is estimated
67
Testing Significance of Effects
  • To test for significance, compare against
  • $z_{\alpha/2}$ if noise variance is known
  • $t_{df,\alpha/2}$ if noise variance is estimated - df =
    degrees of freedom of variance estimate
  • this is a two-tailed test - compare absolute
    value of test ratio against the entry from the
    table with upper tail area of alpha/2
  • if test ratio exceeds the fence - significant
    effect
  • if test ratio is inside the fence - effect is not
    statistically significant
  • Confidence intervals can also be formed in a
    manner similar to those for parameter estimates.

68
Obtaining Estimates of the Noise Variance
  • - three possible methods, listed from best to least preferred -
  • replicate runs in the current experimental design
  • replicate runs from a previous design
  • from nonsignificant effects
  • Replicate estimates of variance
  • pool if more than one replicate set (e.g., centre
    points vs. vertices)
  • test for constant variance
  • if replicates from previous design are used,
    confirm that conditions for data collection were
    same as those for current experimental design
69
Variance Estimate from Nonsignificant Effects
  • If certain effects are not statistically
    significant, we conclude that the values reflect
    the extraneous variation (inherent noise).
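  • These nonsignificant effects can be used directly
    to estimate the variance of the significant
    calculated effects -
    $s^2_{\text{effect}} = \dfrac{1}{m} \sum_{j=1}^{m} (\text{effect}_j)^2$
    where the sum is over the m nonsignificant effects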

This is an estimate of the variance of a
(significant) calculated effect - it is NOT an
estimate of the noise variance.
70
Variance Estimate from Nonsignificant Effects
  • The Regression Perspective -
  • identifying effects as nonsignificant is
    equivalent to identifying nonsignificant
    parameters in a model
  • deleting terms from the model provides additional
    degrees of freedom for variance estimate - we
    now estimate model in p parameters (p-1 effects),
    and we have more data points than parameters
  • we can estimate noise variance as MSE
  • use variance estimate to form confidence
    intervals, perform hypothesis tests on parameters

71
Normal Probability Plots and Effects
  • Normal probability plot - plot of cumulative
    probability vs. observation
  • Premise - if values are from a normal
    distribution, normal probability plot will be
    LINEAR, centred at zero
  • Procedure -
  • order calculated effects from smallest to largest
  • assign rank i to each effect, from 1 to n (number of
    calculated effects)
  • calculate cumulative probability $P_i$ for each
    effect
  • plot $P_i$ vs. effects on normal probability paper
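
The procedure can be sketched as below. Two assumptions are made: the cumulative probability is taken as P_i = (i - 0.5)/n, which is one common plotting position among several in use, and the seven effect values are hypothetical numbers for a $2^3$ design. Plotting the effects against the normal quantile of P_i on ordinary axes is equivalent to using normal probability paper.

```python
from statistics import NormalDist


def normal_plot_points(effects):
    """Return (name, effect, P_i, normal quantile) for each effect, ordered smallest to largest."""
    ordered = sorted(effects.items(), key=lambda item: item[1])
    n = len(ordered)
    points = []
    for i, (name, value) in enumerate(ordered, start=1):
        p_i = (i - 0.5) / n                       # assumed plotting position
        points.append((name, value, p_i, NormalDist().inv_cdf(p_i)))
    return points


# Hypothetical calculated effects for a 2^3 design
effects = {"A": 0.4, "B": -0.7, "C": 9.8, "AB": 0.2, "AC": -0.3, "BC": 0.6, "ABC": -0.1}

points = normal_plot_points(effects)
for name, value, p_i, quantile in points:
    print(f"{name:>4} {value:6.1f} {p_i:6.3f} {quantile:7.3f}")
```

In this made-up data set the effect of C sits far from the line formed by the other six, which is the pattern that would flag it as likely significant.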

72
Normal Probability Plots and Effects
  • Nonsignificant effects -
  • form line centred about zero
  • behave as normal deviates --> likely associated
    with noise
  • Significant effects -
  • will not lie on straight line
  • e.g., kinks, steeper tail

73
Repeat vs. Replicate Runs
  • Replicates -
  • represent additional, independent trials at the
    same factor levels
  • all sources of extraneous variation must be
    present
  • Repeat measurements -
  • represent repeated measurements for a given run
  • don't have all sources of variation present
  • provide indication of measurement noise, not
    entire extraneous variation
  • Example - two separate experiments conducted at
    high P, high T vs. two measurements from an
    experiment at high P, high T