Title: When Least Squares Estimates Do Not Work
1When Least Squares Estimates Do Not Work
- Ildiko E. Frank
- Minitab, Inc.
2OUTLINE
- PLS
- Mixture Design
- Example 1 Octane Blending
- Example 2 Plastic Formulation
- ILS
- Example 3 Epoxy Adhesive
3PLS
- Linear regression model
- Multiple responses treated in true multivariate way
- Non-least squares, biased estimate of coefficients
- Intermediate results (scores, loadings) for exploration
- Coefficients nonlinear in response; cross-validation
- Model with orthogonal predictors has a single component
- Includes least squares model
4PLS
- Is a biased estimator (as are Ridge, PCR, Stepwise)
- Handles collinearity (as do Ridge, PCR)
- Treats responses in true multivariate way (as none of the others)
- Calculates intermediate components (as does PCR)
- Is nonlinear in response (as none of the others)
- Includes least squares solution (as do PCR, Ridge, Stepwise)
5PLS
-                  PCR                  PLS
- Components       linear combination   linear combination
- Input            X only               X and Y
- Criteria         max(var)             max(var) max(corr)
- Meta parameter   cross-validation     cross-validation
- Multiple Y       no                   yes
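The "Input" and "Criteria" rows can be illustrated with a minimal NumPy sketch for a single response. It assumes the usual textbook reading of the two criteria: the first PCR direction maximizes the variance of the score and uses X only, while the first PLS direction is proportional to X'y and therefore maximizes the covariance of the score with y. The data and the function name `first_directions` are hypothetical.

```python
import numpy as np

def first_directions(X, y):
    """Return centered data and unit-length first weight vectors for PCR and PLS."""
    Xc = X - X.mean(axis=0)              # center predictors
    yc = y - y.mean()                    # center response
    # PCR: direction of maximum variance in X (first right singular vector); ignores y
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    w_pcr = vt[0]
    # PLS: direction proportional to X'y; uses both X and y
    w_pls = Xc.T @ yc
    w_pls = w_pls / np.linalg.norm(w_pls)
    return Xc, yc, w_pcr, w_pls

# Hypothetical data: the response depends on the lowest-variance predictor
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4)) * np.array([3.0, 1.0, 1.0, 0.5])
y = X[:, 3] + 0.1 * rng.normal(size=30)

Xc, yc, w_pcr, w_pls = first_directions(X, y)
var_pcr, var_pls = np.var(Xc @ w_pcr), np.var(Xc @ w_pls)
cov_pcr, cov_pls = abs((Xc @ w_pcr) @ yc), abs((Xc @ w_pls) @ yc)
print("score variance   PCR, PLS:", var_pcr, var_pls)
print("|score'y|        PCR, PLS:", cov_pcr, cov_pls)
```

Among unit-length directions, the PCR score has the larger variance and the PLS score has the larger covariance with y, which is the trade-off the table summarizes.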
6PLS
X: predictors, n rows, p columns
Y: responses, n rows, r columns
T: X scores, linear comb. with coeff. W
U: Y scores, linear comb. with coeff. C
m: number of components
7PLS
1. X weight: $W = U^{\top} X$
2. X score: $T = X W^{\top}$
3. Y weight: $C = T^{\top} Y$
4. Y score: $U = Y C^{\top}$
5. X loading: $L = T^{\top} X$
6. X residual: $X = X - T L$
7. Y residual: $Y = Y - T C$
8PLS
- All predictors in the model
- Collinear predictors
- Underdetermined system (fewer rows than columns)
- Collinear responses
I. E. Frank, J. H. Friedman, A Statistical View of Some Chemometrics Regression Tools, Technometrics (1993), p. 109
9Mixture Design
- Components (factors) are proportions of total amount
- Sum of components is constant: components correlated
- Special models needed: Scheffe and Cox
10Mixture Design
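- Linear, Scheffe: $y = \sum_i b_i x_i + e$
- Linear, Cox: $y = b_0 + \sum_i b_i x_i + e$, with $\sum_i b_i r_i = 0$
- Scheffe $b_i$ = Cox $(b_0 + b_i)$
- Quadratic, Scheffe: $y = \sum_i b_i x_i + \sum_{i<j} b_{ij} x_i x_j + e$
- Quadratic, Cox: $y = b_0 + \sum_i b_i x_i + \sum_i \sum_{j \ge i} b_{ij} x_i x_j + e$
- Scheffe $b_i$ = Cox $(b_0 + b_i + b_{ii})$
- Scheffe $b_{ij}$ = Cox $(b_{ij} - b_{ii} - b_{jj})$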
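The Scheffe and Cox coefficient relations follow from the mixture constraint (the proportions sum to 1), so they can be checked numerically. The sketch below converts Cox quadratic coefficients to Scheffe form and compares the two models at random mixture points. The coefficient values and function names are hypothetical, the Cox reference-point constraints are not enforced here, and the Cox quadratic term is assumed to run over i <= j.

```python
import numpy as np

def cox_to_scheffe(b0, b, B):
    """Convert Cox quadratic coefficients to Scheffe coefficients.

    b0: intercept; b: q linear terms;
    B: q x q array, B[i, i] = b_ii and B[i, j] = b_ij for i < j.
    """
    b = np.asarray(b, dtype=float)
    B = np.triu(np.asarray(B, dtype=float))
    d = np.diag(B)
    lin = b0 + b + d                                     # Scheffe b_i
    cross = np.triu(B - d[:, None] - d[None, :], k=1)    # Scheffe b_ij, i < j
    return lin, cross

def cox_predict(x, b0, b, B):
    """Cox quadratic model; the quadratic sum runs over i <= j."""
    return b0 + x @ np.asarray(b, dtype=float) + x @ np.triu(B) @ x

def scheffe_predict(x, lin, cross):
    """Scheffe quadratic model; the cross-product sum runs over i < j."""
    return x @ lin + x @ cross @ x

# Hypothetical coefficients for a three-component mixture
b0, b = 2.0, [-1.0, 0.0, 1.0]
B = np.array([[0.5, 1.0, -2.0],
              [0.0, -0.3, 0.4],
              [0.0, 0.0, 0.1]])
lin, cross = cox_to_scheffe(b0, b, B)

# Compare both parameterizations at random points on the simplex
rng = np.random.default_rng(0)
points = rng.dirichlet(np.ones(3), size=5)
diffs = [cox_predict(x, b0, b, B) - scheffe_predict(x, lin, cross) for x in points]
print(np.max(np.abs(diffs)))
```

The two parameterizations describe the same response surface on the simplex; they differ only in how the coefficients are read.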
11Mixture Design
- Scheffe effect: $e_i = b_i - \sum_{j \ne i} b_j / (q - 1)$
- Scheffe adjusted effect: $ae_i = e_i \cdot range_i$
- Cox effect: $e_i = b_i / (1 - r_i)$
- Cox adjusted effect: $ae_i = e_i \cdot range_i$
- $b_i$: coefficient for ith component
- $range_i$: (upper bound - lower bound) of ith component
- $r_i$: reference point, ith component proportion
- $q$: number of components
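A short sketch of these effect formulas, assuming the adjusted effect is the effect multiplied by the component range. The Scheffe coefficients (1, 2, 3) and Cox slopes (-1, 0, 1) are the ones from the three-component example later in the deck. The centroid reference point r = (1/3, 1/3, 1/3) and unit ranges are assumptions made here for illustration.

```python
import numpy as np

def scheffe_effects(b, ranges):
    """Scheffe effect e_i = b_i - sum_{j != i} b_j / (q - 1), and range-adjusted effect."""
    b = np.asarray(b, dtype=float)
    q = b.size
    e = b - (b.sum() - b) / (q - 1)
    return e, e * np.asarray(ranges, dtype=float)

def cox_effects(b, r, ranges):
    """Cox effect e_i = b_i / (1 - r_i), and range-adjusted effect."""
    e = np.asarray(b, dtype=float) / (1.0 - np.asarray(r, dtype=float))
    return e, e * np.asarray(ranges, dtype=float)

scheffe_b = [1.0, 2.0, 3.0]
cox_b = [-1.0, 0.0, 1.0]
r = [1 / 3, 1 / 3, 1 / 3]       # assumption: centroid as reference point
ranges = [1.0, 1.0, 1.0]        # assumption: each component ranges from 0 to 1

e_s, ae_s = scheffe_effects(scheffe_b, ranges)
e_c, ae_c = cox_effects(cox_b, r, ranges)
print("Scheffe effects:", e_s)
print("Cox effects:    ", e_c)
```

Under these assumptions both parameterizations give effects of about (-1.5, 0, 1.5). The Cox effect of a component uses only its own coefficient, while the Scheffe effect needs all coefficients.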
12Mixture Design
- Component Screening
- Scheffe coefficients do not represent change in response
- Scheffe component effect depends on all coefficients
- Cox coefficients are directly interpretable
- Component effects must be adjusted with range
13Mixture Design
Scheffe: Y = 1A + 2B + 3C
Cox: Y = 2 - 1A + 0B + 1C
14Example 1 Octane Blending
John Cornell, Experiments with Mixtures, p. 249
15Example 1 Octane Blending
- Components and response correlation

        A       B       C       D       E       F       G
B   0.104
C   1.000   0.101
D   0.371  -0.537   0.374
E  -0.548  -0.293  -0.548  -0.211
F  -0.805  -0.191  -0.805  -0.646   0.463
G   0.603  -0.590   0.607   0.916  -0.274  -0.656
Y  -0.838  -0.071  -0.838  -0.707   0.494   0.985  -0.741
16Example 1 Octane Blending
R-sq 0.99 R-sq(pred) 0.95
17Example 1 Octane Blending
18Example 1 Octane Blending
19Example 1 Octane Blending
20Example 1 Octane Blending
21Example 1 Octane Blending
22Example 1 Octane Blending
Conclusions
1. LS coefficients / effects for highly correlated components cannot be interpreted; trace plot is misleading
2. PLS is biased away from high variance solutions
3. PLS score plot helps visualize the design space
4. PLS loading plots and biplots indicate important components
5. PLS model leads to the conclusion reached in several iterations in the literature: most important component is F
23Example 2 Plastic Formulation
John Cornell, Experiments with Mixtures, p. 500
24Example 2 Plastic Formulation
Response correlation

                    Tensile    Tensile    Flexural   Flexural
                    Strength   Modulus    Strength   Modulus
Tensile Modulus     0.855
Flexural Strength   0.995      0.875
Flexural Modulus    0.866      0.992      0.881
Warp                0.910      0.871      0.900      0.915
25Example 2 Plastic Formulation
- PLS Cox

                Tensile     Tensile     Flexural    Flexural    Warp
                Strength    Modulus     Strength    Modulus
Constant         255.375    1008.63      260.508     743.958     428.183
Resin           -150.000    -930.00     -152.667    -666.667    -348.667
Glass Fibers     442.000    1204.00      397.333     941.333     895.333
Microspheres    -292.000    -274.00     -244.667    -274.667    -546.667

R-sq              98.2        98.5        98.0        95.9        89.0
R-sq Pred         93.3        93.4        90.1        83.7        52.4
26Example 2 Plastic Formulation
[Figure: panels labeled X and Y]
27Example 2 Plastic Formulation
[Figure: panels labeled X and Y]
Cornell: Moduli increase with increasing microspheres ??
28Example 2 Plastic Formulation
Glass fibers: positive effect
Resin: negative effect
Microspheres: questionable effect
29Example 2 Plastic Formulation
[Figure: panels labeled TStrength, FStrength, TModulus, FModulus]
30Example 2 Plastic Formulation
- Conclusions
- 1. PLS score plot displays multiple response fit
- 2. PLS biplot indicates that increasing Glass Fibers increases all responses and increasing Resin or Microspheres decreases all responses
- 3. PLS plots confirm contour and trace plot findings, contradicting literature
31ILS
- Motivation
- PLS model includes all predictors
- PLS does not result in variable selection
- For better interpretation simpler components needed
- Solution
- Different assumption for biasing schema
- Similar to factor rotation, i.e. emphasize large weights
- Continuum from PLS to Stepwise
32ILS
-              PLS                    ILS
- Predictors   all in                 some in
- Parameters   number of components   number of components, number of non-zero W
- Stepwise     not included           included
33ILS
- Algorithm
- Calculate a component
- Rank X weights according to absolute values
- Set weights to zero for predictors with smallest ranks
- Cross-validate number of components and number of non-zero weights
I. E. Frank, Intermediate Least Squares Regression Method, Chemolab (1987), p. 233
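The first three algorithm steps can be sketched as follows for a single response. This is an illustration in the spirit of the slide, not a reproduction of the published ILS method: it assumes a PLS-type weight X'y, keeps only the `n_nonzero` largest absolute weights, and deflates as in PLS. The function names and data are hypothetical, and the cross-validation step that would choose the number of components and non-zero weights is not shown.

```python
import numpy as np

def ils_component(X, y, n_nonzero):
    """One component whose weight vector keeps only the n_nonzero largest |weights|."""
    w = X.T @ y                                  # PLS-type X weight
    keep = np.argsort(np.abs(w))[-n_nonzero:]    # predictors with the largest |weight|
    w_sparse = np.zeros_like(w)
    w_sparse[keep] = w[keep]                     # all other weights are set to zero
    w_sparse /= np.linalg.norm(w_sparse)
    t = X @ w_sparse                             # X score from the retained predictors
    c = (t @ y) / (t @ t)                        # regression of y on the score
    return w_sparse, t, c

def ils_fit(X, y, m, n_nonzero):
    """Fit m sparse components; return selected predictor indices and fitted values."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    selected = set()
    y_hat = np.zeros_like(y)
    for _ in range(m):
        w, t, c = ils_component(X, y, n_nonzero)
        selected.update(np.flatnonzero(w).tolist())
        y_hat += c * t
        X = X - np.outer(t, X.T @ t) / (t @ t)   # deflate X
        y = y - c * t                            # deflate y
    return sorted(selected), y_hat

# Hypothetical data: only predictor 2 drives the response
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=50)

selected, y_hat = ils_fit(X, y, m=1, n_nonzero=1)
print(selected)
```

With one non-zero weight per component each component picks a single predictor, which resembles a stepwise selection. With all weights kept, the first component is the ordinary PLS component.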
34Example 3 Epoxy Adhesive
- Design: 24 two-level factors (e.g. curing agent, temperature, applied stretch)
- Plackett-Burman design: 28 runs
- Supersaturated design (half PB): 14 runs
- Response: adhesion
- Goal: select relevant factors to maximize adhesion
- Williams, K. R. (1968), Rubber Age
- Lin, D. K. J. (1993), Technometrics
- Wu, C. F. J. and Hamada, M. (2000), Experiments, p. 373
35Example 3 Epoxy Adhesive
- Plackett-Burman design
- Factors 13 and 16 are the same
- Orthogonal factors: PLS(1) = LS
- R-sq 0.94, R-sq(pred) 0.0
- P value < 0.05 only for factors 15 and 20
- Regression p = 0.151
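The statement that PLS with one component equals LS for orthogonal factors can be checked on a small orthogonal two-level design. The sketch uses a hypothetical 8-run, 3-factor full factorial with made-up effects, not the 28-run Plackett-Burman data, and a single response.

```python
import itertools
import numpy as np

# Orthogonal two-level design: 2^3 full factorial in +/-1 coding (8 runs)
X = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
rng = np.random.default_rng(0)
y = 5.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=8)
yc = y - y.mean()                        # the columns of X are already centered

# One-component PLS: weight, score, and implied regression coefficients
w = X.T @ yc
t = X @ w
b_pls1 = w * (t @ yc) / (t @ t)

# Ordinary least squares on the same centered data
b_ls = np.linalg.lstsq(X, yc, rcond=None)[0]

# Weight of a second component, computed from the deflated X and y
X_res = X - np.outer(t, X.T @ t) / (t @ t)
y_res = yc - t * (t @ yc) / (t @ t)
w2 = X_res.T @ y_res

print(b_pls1, b_ls, w2)
```

Because X'X is a multiple of the identity here, the first PLS component already reproduces the LS coefficients, and the weight of a second component vanishes up to rounding.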
36Example 3 Epoxy Adhesive
- ILS on Plackett-Burman design
- Comp   Variables        R-sq   R-sq(pred)
- 1 non-zero
- 1      15               36     26
- 2      15 20            48     35
- 3      15 17 20         57     42
- 4      4 15 17 20       64     46
- 5      4 15 17 20 22    69     49
- 2 non-zero
- 1      15 20            48     35
- 2      4 15 17 20       64     46
- 3 non-zero
- 1      15 17 20         57     42
37Example 3 Epoxy Adhesive
- Half fraction of PB design, fraction 1
- PLS(4) R-sq 100 R-sq(pred) 7
- Half fraction of PB design, fraction 2
- PLS(4) R-sq 100 R-sq(pred) 29
38Example 3 Epoxy Adhesive
Fractions are different!
39Example 3 Epoxy Adhesive
- ILS on half fraction (1) of PB design
- Comp   Variables        R-sq   R-sq(pred)
- 1 non-zero
- 1      15               63     50
- 2      12 15            74     56
- 3      12 15 20         87     71
- 4      4 12 15 20       95     85
- 5      4 10 12 15 20    97     88
- 2 non-zero
- 1      15 17            69     53
- 2      8 12 15 17       79     49
- 3 non-zero
- 1      2 15 17          74     53
40Example 3 Epoxy Adhesive
- ILS on half fraction (2) of PB design
- Comp   Variables        R-sq   R-sq(pred)
- 1 non-zero
- 1      4                36     13
- 2      4 22             55     27
- 3      4 22 23          74     49
- 4      4 18 22 23       82     55
- 5      4 18 22 23 24    88     60
- 2 non-zero
- 1      2 4              49     20
- 2      2 4 20 23        74     47
- 3 non-zero
- 1      2 4 20           63     27
41Example 3 Epoxy Adhesive
- Conclusions
- 1. For orthogonal factors the only model is PLS(1) = LS
- 2. R-sq is not a realistic measure of model quality
- 3. Variable selection with ILS includes stepwise models
- 4. Literature is over-optimistic about supersaturated designs
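The gap between R-sq and R-sq(pred) in conclusion 2 can be illustrated with a leave-one-out computation. The sketch assumes the common definition R-sq(pred) = 1 - PRESS/SST and uses an ordinary LS fit with intercept on hypothetical random two-level data with a pure-noise response. It does not use the epoxy data or the PLS/ILS cross-validation behind the slide values.

```python
import numpy as np

def r2_and_r2_pred(X, y):
    """Return R-sq and leave-one-out R-sq(pred) = 1 - PRESS/SST for an LS fit."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])         # design matrix with intercept
    sst = np.sum((y - y.mean()) ** 2)
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    sse = np.sum((y - A @ beta) ** 2)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i                 # leave run i out
        beta_i = np.linalg.lstsq(A[mask], y[mask], rcond=None)[0]
        press += (y[i] - A[i] @ beta_i) ** 2     # squared prediction error for run i
    return 1.0 - sse / sst, 1.0 - press / sst

# Hypothetical data: 12 runs, 8 two-level factors, response is pure noise
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(12, 8))
y = rng.normal(size=12)

r2, r2_pred = r2_and_r2_pred(X, y)
print("R-sq:", r2, "R-sq(pred):", r2_pred)
```

With nearly as many parameters as runs, R-sq can be high even when the response is noise, while R-sq(pred) is never larger than R-sq and can be negative.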
42Summary
- PLS is a useful alternative to LS in DOE analysis
- PLS outperforms LS in case of high collinearity
- PLS can calculate Cox models for mixture designs
- PLS works with underdetermined (supersaturated) X
- PLS takes advantage of collinear responses
- PLS offers various graphical exploratory tools
- ILS, an extension of PLS, results in variable selection