Regression analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Regression analysis

Description:

Spectroscopy: Predict chemistry from spectral measurements ... If there are many highly collinear variables. unstable regression equations ... – PowerPoint PPT presentation

Slides: 34
Provided by: torm8
Transcript and Presenter's Notes

1
Regression analysis
Relating two data matrices/tables to each other
Purpose: prediction and interpretation
Y-data
X-data
2
Typical examples
  • Spectroscopy: predict chemistry from spectral
    measurements
  • Product development: relating sensory to
    chemistry data
  • Marketing: relating sensory data to consumer
    preferences

3
Topics covered
  • Simple linear regression
  • The selectivity problem: a reason why
    multivariate methods are needed
  • The collinearity problem: a reason why data
    compression is needed
  • The outlier problem: why and how to detect

4
Simple linear regression
  • One y and one x. Use x to predict y.
  • Use a linear model/equation and fit it by least
    squares

5
Data structure
X-variable: 2 4 1 . . .
Y-variable: 7 6 8 . . .
Objects: the same number in the x- and y-columns
6
Least squares (LS) used for estimation of
regression coefficients
$y = b_0 + b_1 x + e$
[Figure: simple linear regression, y plotted against x, with b0 and b1 marked]
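
A minimal sketch of the least-squares fit for the model y = b0 + b1*x + e. The data values and the function name `fit_simple_linear` are made up for illustration and are not taken from the slides.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + e."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: cross-products of centred x and y divided by sum of squares of centred x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical data, invented for this example
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = fit_simple_linear(x, y)
residuals = y - (b0 + b1 * x)   # estimates of the error term e
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any fit.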
7
Regression analysis
Interpretation
Outliers?
Pre-processing
8
The selectivity problem: a reason why multivariate
methods are needed
9
Can be used for several Ys also
10
Multiple linear regression
  • Provides
      • predicted values
      • regression coefficients
      • diagnostics
  • If there are many highly collinear variables
      • unstable regression equations
      • coefficients that are difficult to interpret:
        many and unstable

11
Collinearity, the problem of correlated X-variables
$y = b_0 + b_1 x_1 + b_2 x_2 + e$
Regression in this case is fitting a plane to
the data (open circles)
The two x's are highly correlated. This leads to an
unstable equation/plane (in the direction with
little variability)
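
The instability can be illustrated with simulated data. This is only a sketch: the data, noise levels and seeds are assumptions chosen for the demonstration. Two nearly identical x-variables are used to fit the plane twice, with different noise in y each time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # x2 is almost a copy of x1
X = np.column_stack([np.ones(n), x1, x2])  # columns: intercept, x1, x2

def ols(X, y):
    """Ordinary least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Same underlying relation y = 1 + x1 + x2, two different noise draws
coefs = []
for seed in (1, 2):
    noise = np.random.default_rng(seed).normal(scale=0.1, size=n)
    y = 1.0 + x1 + x2 + noise
    coefs.append(ols(X, y))
coefs = np.array(coefs)

cond = np.linalg.cond(X)             # large value signals collinearity
sum_b = coefs[:, 1] + coefs[:, 2]    # b1 + b2 for each fit

print("condition number:", round(cond, 1))
print("b1, b2 per fit:\n", coefs[:, 1:])
print("b1 + b2 per fit:", sum_b)
```

The individual coefficients b1 and b2 can differ considerably between the two fits, while their sum stays close to 2. The data determine the plane well along the direction where x1 and x2 vary together and poorly across it.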
12
Possible solutions
  • Select the most important wavelengths/variables
    (stepwise methods)
  • Compress the variables to the most dominating
    dimensions (PCR, PLS)
  • We will concentrate on the latter (can be
    combined)

13
Data compression
  • We will first discuss the situation with one
    y-variable
  • Focus on ideas and principles
  • Provides regression equation (as above) and plots
    for interpretation

14
Model for data compression methods
$X = T P^{\top} + E$
Centred X and y
$y = T q + f$
T: scores, carrier of information from X to y
P, q: loadings
E, f: residuals (noise)
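
A sketch of this model for PCR, assuming the scores are principal components obtained from a singular value decomposition of the centred X. The function name `pcr_fit` and the simulated data are illustrative only.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Fit X = T P^T + E and y = T q + f on centred data using principal components."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean                # centred X and y
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                        # loadings, one column per component
    T = Xc @ P                                     # scores
    q = (T.T @ yc) / np.sum(T ** 2, axis=0)        # y-loadings (score columns are orthogonal)
    E = Xc - T @ P.T                               # X-residuals
    f = yc - T @ q                                 # y-residuals
    b = P @ q                                      # coefficients for the original x-variables
    b0 = y_mean - x_mean @ b
    return {"T": T, "P": P, "q": q, "E": E, "f": f, "b": b, "b0": b0}

# Simulated data, invented for this example
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=20)

model = pcr_fit(X, y, n_components=2)
y_hat = model["b0"] + X @ model["b"]               # predictions from the regression equation
```

Because the scores are orthogonal, each element of q can be estimated separately. When all components are kept, the result coincides with ordinary least squares.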
15
Regression by data compression
PC1
Regression on scores
16
[Diagram: model structures for the three methods]
MLR: x1, x2, x3, x4 -> y
PCR: x1, x2, x3, x4 -> t1, t2 -> y
PLS: x1, x2, x3, x4 -> t1, t2 -> y
17
PCR and PLS
  • For each factor/component
      • PCR: maximize the variance of linear
        combinations of X
      • PLS: maximize the covariance between linear
        combinations of X and y
  • Each factor is subtracted before the next is
    computed
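
For one y-variable, the PLS steps can be sketched as follows. This is one common formulation (a NIPALS-style PLS1 with orthogonal scores), offered as an assumption about the algorithm meant here. The function name `pls1_fit` and the data are illustrative.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS regression for a single y, computed one factor at a time."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa, ya = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xa.T @ ya
        w = w / np.linalg.norm(w)     # unit weight vector maximizing covariance of Xa @ w with ya
        t = Xa @ w                    # scores for this factor
        tt = t @ t
        p = Xa.T @ t / tt             # X-loadings
        qa = ya @ t / tt              # y-loading
        Xa = Xa - np.outer(t, p)      # subtract the factor before computing the next
        ya = ya - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)   # coefficients for the original x-variables
    b0 = y_mean - x_mean @ b
    return b0, b

# Simulated data, invented for this example
rng = np.random.default_rng(3)
X = rng.normal(size=(25, 3))
y = X @ np.array([0.5, 1.5, -1.0]) + 0.1 * rng.normal(size=25)

b0, b = pls1_fit(X, y, n_components=2)
y_hat = b0 + X @ b
```

The PCR counterpart would replace the weight vector w with the leading principal-component direction of X, which depends on X alone.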

18
Principal component regression (PCR)
  • Uses principal components
  • Solves the collinearity problem, stable solutions
  • Provides plots for interpretation (scores and
    loadings)
  • Well understood
  • Outlier diagnostics
  • Easy to modify
  • But uses only X to determine components

19
(No Transcript)
20
PLS-regression
  • Easy to compute
  • Stable solutions
  • Provides scores and loadings
  • Often fewer components than PCR
  • Sometimes better predictions

21
PCR and PLS for several Y-variables
  • PCR is computed for each Y. Each Y is regressed
    onto the principal components
  • PLS: the algorithm is easily modified. Maximises
    covariance between linear combinations of X and Y.
  • For both methods: regression equations and plots

22
Validation is important
  • Measure quality of the predictor
  • Determine A, the number of components
  • Compare methods

23
Prediction testing
Calibration: estimate coefficients
Testing/validation: predict y, using the coefficients
24
Cross-validation

25
Validation
  • Compute RMSEP
  • Plot RMSEP versus number of components
  • Choose the number of components with best RMSEP
    properties
  • Compare for different methods
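
A sketch of these steps, assuming the usual definition of RMSEP as the square root of the mean squared difference between measured and predicted y over the test samples. PCR is used as the predictor, and the data are simulated; none of the numbers come from the presentation.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pcr_coefficients(X, y, n_components):
    """Intercept and coefficients of a PCR model with the given number of components."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    T = Xc @ P
    q = (T.T @ yc) / np.sum(T ** 2, axis=0)
    b = P @ q
    return y_mean - x_mean @ b, b

# Simulated collinear data: 6 x-variables driven by 2 underlying factors
rng = np.random.default_rng(1)
latent = rng.normal(size=(60, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(60, 6))
y = latent @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=60)
X_cal, y_cal, X_test, y_test = X[:30], y[:30], X[30:], y[30:]

# RMSEP on the test set for each number of components
rmsep_by_a = {}
for a in range(1, 7):
    b0, b = pcr_coefficients(X_cal, y_cal, a)
    rmsep_by_a[a] = rmsep(y_test, b0 + X_test @ b)

best_a = min(rmsep_by_a, key=rmsep_by_a.get)
for a, value in rmsep_by_a.items():
    print(a, round(value, 4))
```

Taking the strict minimum is only one possible rule. In practice a smaller number of components with nearly the same RMSEP may be preferred.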

26
[Figure: RMSEP plot, with MLR marked]
NIR calibration of protein in wheat: 6 NIR
wavelengths, 12 calibration samples, 26 test
samples
27
[Figure: conceptual illustration of important phenomena, with labels "Estimation error" and "Model error"]
28
Prediction vs. cross-validation
  • Prediction testing: prediction ability of the
    predictor at hand. Requires much data.
  • Cross-validation: property of the method. Better
    for smaller data sets.
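
A sketch of cross-validation in its leave-one-out form, which is an assumption since the slides do not say which segmentation is used. Each sample is left out in turn, the model is fitted to the rest, and the left-out y is predicted. MLR is used here only to keep the example short; the function names and data are illustrative.

```python
import numpy as np

def ols_fit_predict(X_cal, y_cal, X_new):
    """Fit MLR with intercept on the calibration samples and predict new samples."""
    A = np.column_stack([np.ones(len(X_cal)), X_cal])
    coef = np.linalg.lstsq(A, y_cal, rcond=None)[0]
    return coef[0] + X_new @ coef[1:]

def loo_cv_rmse(X, y, fit_predict):
    """Leave-one-out cross-validated root mean square error."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # all samples except sample i
        pred = fit_predict(X[keep], y[keep], X[i:i + 1])
        errors[i] = y[i] - pred[0]
    return float(np.sqrt(np.mean(errors ** 2)))

# Simulated data, invented for this example
rng = np.random.default_rng(4)
X = rng.normal(size=(15, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + 0.2 * rng.normal(size=15)

rmsecv = loo_cv_rmse(X, y, ols_fit_predict)
print("leave-one-out RMSE:", round(rmsecv, 4))
```

Any other fitting routine with the same three arguments can be passed as `fit_predict`, for example a PCR or PLS fit with a fixed number of components.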

29
Validation
  • One should also plot measured versus predicted
    y-value
  • Correlation can be computed, but can sometimes be
    misleading
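
A small made-up example of how correlation can mislead: a predictor with a systematic error can correlate perfectly with the measured values and still predict poorly. The numbers are hypothetical.

```python
import numpy as np

y_measured = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
y_predicted = 2.0 * y_measured - 9.0   # hypothetical predictor with wrong slope and offset

corr = np.corrcoef(y_measured, y_predicted)[0, 1]
rmsep = np.sqrt(np.mean((y_measured - y_predicted) ** 2))

print("correlation:", corr)   # equal to 1 up to rounding
print("RMSEP:", rmsep)        # about 3.3, large relative to the spread of y
```

A plot of measured versus predicted y would show the points on a straight line that is not the 45-degree line, which the correlation coefficient alone does not reveal.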

30
Example, plot of y versus predicted y
Plot of measured and predicted protein, NIR
calibration
31
Outlier detection
  • Instrument error or noise
  • Drift of signal (over time)
  • Misprints
  • Samples outside normal range (different
    population)

32
Outlier detection
  • Outliers can be detected because we have
      • a model for the spectral data ($X = T P^{\top} + E$)
      • a model for the relationship between X and y ($y = T q + f$)

33
Outlier detection tools
  • Residuals
      • X- and y-residuals
      • X-residuals as before; the y-residual is the
        difference between measured and predicted y
  • Leverage
      • $h_i$
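
A sketch of both tools for the X-data. The leverage formula is not preserved in the transcript, so the definition below, h_i = 1/N + sum over components of t_ia^2 / (t_a' t_a), is an assumption; it is a form commonly used with score-based models. The function name, the data and the planted outlier are illustrative.

```python
import numpy as np

def pca_outlier_stats(X, n_components):
    """Leverage and X-residual size per sample, from a principal component model of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    T = Xc @ P                                   # scores
    E = Xc - T @ P.T                             # X-residuals
    # Assumed leverage definition: 1/N plus each sample's share of every score's sum of squares
    leverage = 1.0 / len(X) + np.sum(T ** 2 / np.sum(T ** 2, axis=0), axis=1)
    x_resid = np.sqrt(np.sum(E ** 2, axis=1))    # length of each sample's residual vector
    return leverage, x_resid

# Simulated data with one planted outlier (sample 0 shifted in all variables)
rng = np.random.default_rng(2)
X = rng.normal(size=(25, 5))
X[0] += 6.0

leverage, x_resid = pca_outlier_stats(X, n_components=2)
print("sample with highest leverage:", int(np.argmax(leverage)))
print("sum of leverages:", round(float(leverage.sum()), 6))
```

With this definition the leverages sum to 1 plus the number of components, so a sample whose leverage is several times the average deserves a closer look. The y-residual would be computed in the same spirit, as measured y minus predicted y from the fitted model.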