Title: Multivariate Regression Analysis
1. Multivariate Regression Analysis
2. Aim
- Establish a predictive model between one or more response variables and one or more input variables
[Diagram labels: Measurement, Response]
3. Areas where Regression Analysis is useful
- Process and Environmental Monitoring
- Process Control
- Product Quality/Product Properties
4. Why?
- Reveal correspondences/correlations
- Increased Accuracy/Precision in the Information Process
- Improved (reduced) Response time in the Information Process (on-line, at-line)
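5. How?
- 1. Collect Data
- 2. Analyse Data
- 3. Establish a Predictive Model
Several responses: $\mathbf{Y} = \mathbf{B}\mathbf{X}$, $y_i = f(x_1, x_2, \ldots, x_m)$
Single response: $y = \mathbf{b}\mathbf{x}$, $y = f(x_1, x_2, \ldots, x_m)$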
6. Start
7. Multivariate Regression
Model: $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$
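8. The solution of regression problems
Model: $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$
When $\mathbf{e}$ is minimised:
$\mathbf{y} = \mathbf{X}\mathbf{b}$
$\mathbf{X}^t\mathbf{y} = \mathbf{X}^t\mathbf{X}\mathbf{b}$ (the normal equation)
$(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{y} = \mathbf{b}$
Minimise $\mathbf{e}^t\mathbf{e}$ (the sum of squared residuals) with respect to $b_0, b_1, \ldots, b_M$
Condition: $\mathbf{X}^t\mathbf{X}$ must have full rank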
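As an illustration of the normal equation, here is a minimal NumPy sketch. The data are simulated and the names (`b_true`, `b_normal`, `b_lstsq`) are hypothetical, not taken from the slides. It solves $\mathbf{X}^t\mathbf{X}\mathbf{b} = \mathbf{X}^t\mathbf{y}$ directly and compares the result with NumPy's least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 3  # objects (measurements) and x-variables

# Simulated data (assumption): first column of ones carries the intercept b0
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])
b_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ b_true + 0.01 * rng.normal(size=n)

# Normal equation: (X^t X) b = X^t y, solved without forming the inverse explicitly
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares solver
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b_normal)
print(b_lstsq)
```

Both routes should agree here because the simulated $\mathbf{X}^t\mathbf{X}$ has full rank; solving the linear system is generally preferred to computing the inverse.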
9. Problems
- Many x-variables, few objects (measurements)
- Correlation between the x-variables
$\det(\mathbf{X}^t\mathbf{X}) \approx 0 \Rightarrow (\mathbf{X}^t\mathbf{X})^{-1}$ does not exist!
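Both problems can be seen numerically. The sketch below uses simulated data (all names are hypothetical): two nearly identical x-variables give a determinant of $\mathbf{X}^t\mathbf{X}$ close to zero and a very large condition number, and a matrix with fewer objects than variables cannot reach full column rank.

```python
import numpy as np

rng = np.random.default_rng(1)

# Problem 1: correlated x-variables (x2 is almost a copy of x1)
x1 = rng.normal(size=20)
x2 = x1 + 1e-6 * rng.normal(size=20)
X = np.column_stack([x1, x2])
XtX = X.T @ X
det = np.linalg.det(XtX)    # close to zero
cond = np.linalg.cond(XtX)  # very large: inversion is numerically unreliable

# Problem 2: many x-variables, few objects (5 objects, 10 variables)
X_wide = rng.normal(size=(5, 10))
# rank(X^t X) equals rank(X), which cannot exceed the number of objects
rank_wide = np.linalg.matrix_rank(X_wide)

print(det, cond, rank_wide)
```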
10. Generalised inverse
Generalised inverse: $\mathbf{X}^{+} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t$ $\Rightarrow$ Normal equation: $\mathbf{b} = \mathbf{X}^{+}\mathbf{y}$
Biased Regression Methods differ in the way that the Generalised Inverse is calculated
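One common way to compute a generalised inverse is the SVD-based Moore-Penrose pseudo-inverse, available as `np.linalg.pinv`. The sketch below is an illustration on simulated data, not necessarily the calculation the slides had in mind. It checks that the pseudo-inverse matches $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t$ for a well-conditioned matrix, then uses the `rcond` cutoff to discard the near-zero singular value of a collinear matrix.

```python
import numpy as np

rng = np.random.default_rng(2)

# Well-conditioned case: the formula and the SVD-based pseudo-inverse agree
X_good = rng.normal(size=(30, 2))
X_plus_formula = np.linalg.inv(X_good.T @ X_good) @ X_good.T
X_plus_svd = np.linalg.pinv(X_good)

# Collinear case (simulated): two almost identical columns
x1 = rng.normal(size=30)
X = np.column_stack([x1, x1 + 1e-8 * rng.normal(size=30)])
y = x1 + 0.01 * rng.normal(size=30)

# rcond drops singular values below 1e-6 times the largest one,
# which gives a biased but stable estimate of b
b = np.linalg.pinv(X, rcond=1e-6) @ y
print(b)
```

With the small singular value discarded, the response is shared roughly equally between the two collinear variables instead of producing huge coefficients of opposite sign.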
11. Latent Variable Regression
12. Problem Specification
Standards with known concentrations are measured at two highly correlated wavelengths. Make a calibration model between the concentrations and the measured intensities at the two wavelengths: $c = f(x_1, x_2)$
13. Dimensionality Reduction
$\mathbf{t}$, score vector $\leftrightarrow$ $\mathbf{c}$, concentration vector
Quantitative information about the concentration is contained in $\mathbf{t}$
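The two-wavelength calibration can be sketched as follows. The concentrations, the sensitivities 2.0 and 1.5, and the noise level are invented for illustration, and the first principal component is used as the latent variable (an assumption, since the slides do not say how the score vector is obtained). The two correlated intensities are projected onto one score vector $\mathbf{t}$, and the concentration is regressed on $\mathbf{t}$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated standards: known concentrations, intensities at two correlated wavelengths
c = np.linspace(0.1, 1.0, 10)
X = np.column_stack([2.0 * c, 1.5 * c]) + 0.01 * rng.normal(size=(10, 2))

# Centre the data
x_mean, c_mean = X.mean(axis=0), c.mean()
Xc, cc = X - x_mean, c - c_mean

# First principal component: loading vector p and score vector t
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p = Vt[0]
t = Xc @ p

# Regress concentration on the single score vector
q = (t @ cc) / (t @ t)

# Regression vector expressed in the original two-wavelength space
b = p * q
c_hat = c_mean + Xc @ b

print(np.corrcoef(t, c)[0, 1])  # close to +1 or -1 (the sign of p is arbitrary)
```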
14. The Regression
15. Calculation of the Regression Coefficient
16. Regression modelling
17. Solution
- 1. Decompose the matrix of spectral data (X) into (orthogonal) latent variables (LVs)
- 2. Model the dependent variable in terms of the latent-variable score vectors
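The two steps can be sketched with principal components as the latent variables, i.e. principal component regression. The function name `pcr_fit` and the simulated two-component "spectra" are assumptions made for this example.

```python
import numpy as np

def pcr_fit(X, y, n_lv):
    """Principal component regression with n_lv latent variables (illustrative sketch)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Step 1: decompose X into orthogonal latent variables
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_lv].T   # loadings (variables x n_lv)
    T = Xc @ P        # scores (objects x n_lv), mutually orthogonal columns
    # Step 2: model y in terms of the scores (columns are orthogonal, so fit one by one)
    q = (T.T @ yc) / np.sum(T**2, axis=0)
    b = P @ q                  # regression vector in the original variable space
    b0 = y_mean - x_mean @ b   # intercept
    return b0, b, T, P

# Simulated spectra: 15 objects, 40 wavelengths, two underlying components
rng = np.random.default_rng(4)
C = rng.uniform(size=(15, 2))   # hypothetical concentrations
S = rng.uniform(size=(2, 40))   # hypothetical pure-component spectra
X = C @ S + 0.001 * rng.normal(size=(15, 40))
y = C[:, 0]

b0, b, T, P = pcr_fit(X, y, n_lv=2)
y_hat = b0 + X @ b
print(np.max(np.abs(y_hat - y)))
```

Note that this works although there are more x-variables (40) than objects (15), because the regression is carried out on two score vectors only.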
18. Scores and Loadings
- Scores
  - $\mathbf{t} = f(c_1, c_2, \ldots)$
  - Contains quantitative info about the concentrations
- Loadings
  - $\mathbf{p} = f(\lambda_1, \lambda_2, \ldots)$
  - Contains qualitative info about the spectra
19. Regression Methods
- Partial Least Squares (PLS)
  - best for prediction
- Principal Component Regression (PCR)
  - best for outlier checking
⇒ Combine the methods
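For comparison with PCR, here is a minimal sketch of PLS for a single response, using the widely used PLS1 scheme with deflation. The slides do not specify which PLS algorithm is meant, so this choice, the function name `pls1_fit`, and the simulated data are all assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 regression with n_lv latent variables (illustrative sketch)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean   # residual matrices, deflated in each step
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f                 # weight: direction in X most covariant with y
        w /= np.linalg.norm(w)
        t = E @ w                   # score vector
        tt = t @ t
        p = E.T @ t / tt            # X loading
        qa = f @ t / tt             # y loading
        E = E - np.outer(t, p)      # remove the modelled part from X
        f = f - qa * t              # remove the modelled part from y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)   # regression vector
    b0 = y_mean - x_mean @ b              # intercept
    return b0, b

# Simulated spectra: 15 objects, 40 wavelengths, two underlying components
rng = np.random.default_rng(5)
C = rng.uniform(size=(15, 2))
S = rng.uniform(size=(2, 40))
X = C @ S + 0.001 * rng.normal(size=(15, 40))
y = C[:, 0]

b0, b = pls1_fit(X, y, n_lv=2)
y_hat = b0 + X @ b
print(np.max(np.abs(y_hat - y)))
```

Unlike PCR, the weight vectors here are chosen using the response as well as the x-variables, which is the usual argument for PLS being oriented towards prediction.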
20. Visualisation of PLS
[Figure: X and Y data blocks]
21. Data described by several Latent Variables
Model
22. Calculation of the regression vector
23. Latent-Variable Regression Modelling
The Modelling process
- Validation
- Interpretation (Regr. coeff., loadings)
- Number of latent variables (Explained var. in X and Y, Cross Validation, Regr. Coeff., Loadings etc.)
- Outlier Detection
24. Cross Validation (statistical validation)
- i) Divide the samples into a number of groups, $n_g$.
- ii) For each LV dimension, $a = 1, 2, \ldots, A+1$, perform the following calculations:
  1. Estimate LV $a$ with group $k$ of samples excluded.
  2. Predict the responses for samples in group $k$.
  3. Calculate the squared prediction error for the left-out samples.
- iii) Repeat step ii) until all samples have been kept out once, and only once, then calculate SEP($a$).
- iv) If SEP($a$) < SEP($a-1$), go to ii); otherwise stop and select the number of dimensions (LVs) in the model as $A = a-1$.
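The cross-validation procedure can be sketched as follows. Several details are assumptions, because the corresponding formulas are not present in the text: SEP is taken as the square root of the summed squared prediction errors divided by the number of samples, the latent variables come from principal component regression, and the names `pcr_coef` and `select_n_lv` are hypothetical.

```python
import numpy as np

def pcr_coef(X, y, n_lv):
    """Intercept and regression vector of a PCR model with n_lv latent variables."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_lv].T
    q = np.linalg.lstsq(Xc @ P, y - y_mean, rcond=None)[0]
    b = P @ q
    return y_mean - x_mean @ b, b

def select_n_lv(X, y, n_groups, max_lv):
    n = len(y)
    groups = np.arange(n) % n_groups      # i) each sample belongs to exactly one group
    sep_prev, seps = np.inf, []
    for a in range(1, max_lv + 1):        # ii) loop over LV dimensions
        press = 0.0
        for k in range(n_groups):         # iii) every group is left out once
            out = groups == k
            b0, b = pcr_coef(X[~out], y[~out], a)
            press += np.sum((y[out] - (b0 + X[out] @ b)) ** 2)
        sep = np.sqrt(press / n)          # assumed definition of SEP(a)
        seps.append(sep)
        if sep >= sep_prev:               # iv) no improvement: keep a - 1 LVs
            return a - 1, seps
        sep_prev = sep
    return max_lv, seps

# Simulated spectra: 20 objects, 40 wavelengths, two underlying components
rng = np.random.default_rng(6)
C = rng.uniform(size=(20, 2))
S = rng.uniform(size=(2, 40))
X = C @ S + 0.001 * rng.normal(size=(20, 40))
y = C[:, 0] + 0.001 * rng.normal(size=20)

n_lv, seps = select_n_lv(X, y, n_groups=4, max_lv=5)
print(n_lv, seps)
```

Because the stopping rule compares noisy SEP estimates, the selected number of LVs can exceed the true number of components when an extra dimension happens to lower SEP slightly.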
25. Application Example 1
- Process industry, where the principal qualities [1] of products are linked to chemical composition of raw material and the manufacturing process.
[1] O. M. Kvalheim, Chemom. Intell. Lab. Syst. 19 (1993) iii-iv.
26. Application Example 2
- Environmental sciences, such as the prediction of
the diversity of a biological system from
instrumental fingerprinting of the chemical
environment, principal environmental responses.