CALIBRATION - PowerPoint PPT Presentation

Provided by: RichardB72

1
CALIBRATION
Prof. Dr. Cevdet Demir, cevdet_at_uludag.edu.tr
2

LINKING TWO SETS OF DATA TOGETHER
Peak height to concentration
Spectra to concentrations
Taste to chemical constituents
Biological activity to structure
Biological classification to chromatographic peak areas

3
NORMALLY WE ARE INTERESTED IN SOME FUNDAMENTAL PARAMETER, e.g. concentration or biological classification.
WE TAKE SOME MEASUREMENTS, e.g. spectra or chromatograms.
WE WANT TO USE THESE MEASUREMENTS TO GIVE US A PREDICTION OF THE FUNDAMENTAL PARAMETER.
4
UNIVARIATE CALIBRATION
One measurement, e.g. a peak height.
MULTIVARIATE CALIBRATION
Several measurements, e.g. spectra.
5
NOTATION
The x block is the measured data, e.g. spectra, chromatograms, GCMS of a biological extract, structural parameters.
The c block is what we are trying to predict, e.g. concentration, species, acceptability of a product, taste.
6
(No Transcript)
7
(Diagram: three pairings of the two blocks: c with x, c with X, and C with X.)
8

MULTIVARIATE CALIBRATION IN ANALYTICAL CHEMISTRY
Single component.
Example: concentration of chlorophyll a by UV/vis spectra.
Mixture of components, all compounds known.
Example: mixture of pharmaceuticals, all pure compounds known.

Mixture of components, only some compounds known.
Example: coal tar pitch volatiles in industrial waste studied by spectroscopy, only some known.
Statistical parameters.
Example: protein in wheat by NIR spectroscopy.

10
UNIVARIATE CALIBRATION
The x and c blocks each consist of single measurements. This is traditional analytical chemistry.
CLASSICAL CALIBRATION
$x \approx c \cdot s$
Unknown: s
$s \approx c^{+} \cdot x$, where $c^{+}$ is the pseudo-inverse of c
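As a minimal sketch of classical univariate calibration, the NumPy snippet below estimates s from the pseudo-inverse of c and then predicts an unknown concentration as x / s. The five standards and the unknown peak height are invented for illustration and are not taken from the lecture.

```python
import numpy as np

# Hypothetical calibration standards (invented values):
# concentrations c and the corresponding peak heights x.
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([0.21, 0.39, 0.62, 0.79, 1.01])

# Classical model x ~ c . s, so s ~ c+ . x with c+ the pseudo-inverse of c.
c_column = c.reshape(-1, 1)              # 5 x 1 column vector
c_pinv = np.linalg.pinv(c_column)        # 1 x 5 pseudo-inverse
s = (c_pinv @ x).item()

# For a single column the pseudo-inverse reduces to (c'c)^-1 c'.
s_closed_form = (c @ x) / (c @ c)

# Predict the concentration of an unknown from its (hypothetical) peak height.
x_unknown = 0.50
c_predicted = x_unknown / s
print(f"s = {s:.4f}, predicted concentration = {c_predicted:.3f}")
```

Note that this model has no intercept, so the fitted line is forced through the origin.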
11
(No Transcript)
12
TREATMENT OF ERRORS IN CLASSICAL CALIBRATION
13
PROBLEMS
1. In a modern lab, dilution and sample preparation errors (in c) are probably bigger than spectroscopic errors (in x). Spectra are more reproducible. This differs from classical statistics.
2. We want to predict concentration from spectra etc., not vice versa.
Most classical textbooks in analytical chemistry and most spreadsheets incorrectly recommend classical calibration.
14
INVERSE CALIBRATION
$c \approx x \cdot b$
Unknown: b
$b \approx x^{+} \cdot c$
15
(Diagram of the inverse calibration model, relating c, x and b.)

16
COMPARING FORWARD AND INVERSE CALIBRATION
17
INCLUDING THE INTERCEPT
The first column of X is 1s.
$c \approx b_0 + b_1 x$
$c \approx X \cdot b$
$b \approx X^{+} \cdot c$
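A short sketch of inverse calibration with an intercept, assuming NumPy: X is built with a first column of 1s and b is obtained from the pseudo-inverse of X. The data values are invented for illustration.

```python
import numpy as np

# Hypothetical standards (invented values): peak heights x, concentrations c.
x = np.array([0.21, 0.39, 0.62, 0.79, 1.01])
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Inverse model with intercept: c ~ b0 + b1 * x, written as c ~ X . b
# where the first column of X is all 1s.
X = np.column_stack([np.ones_like(x), x])

# b ~ X+ . c, with X+ the pseudo-inverse of X.
b = np.linalg.pinv(X) @ c
b0, b1 = b

# Predict the concentration of an unknown directly from its measurement.
x_unknown = 0.50
c_predicted = b0 + b1 * x_unknown
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, predicted c = {c_predicted:.3f}")
```

Here the prediction comes straight from the measurement, with no division by a sensitivity as in the classical model.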

18

HOW WELL IS THE MODEL PREDICTED?
There is a huge number of approaches.
Root mean square error (divide by the degrees of freedom: number of samples minus 1 or 2, according to the parameters in the model).
Often expressed as a percentage of either the mean measurement or the standard deviation of the measurements.

Correlation coefficient of predicted versus true: has problems if the number of samples is small.
ANOVA and replicate analysis using lack-of-fit error, as discussed in the experimental design lectures.
Leaving samples out and predicting them: cross-validation and testing will be discussed later.
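The root mean square error described above can be sketched as follows. The function name `rms_error` and the example numbers are assumptions made for illustration; the divisor is the number of samples minus the number of fitted parameters.

```python
import numpy as np

def rms_error(c_true, c_pred, n_params):
    """Root mean square error using degrees of freedom as the divisor.

    n_params is 1 for a model with a slope only and 2 for slope plus intercept.
    """
    residuals = np.asarray(c_true, dtype=float) - np.asarray(c_pred, dtype=float)
    degrees_of_freedom = residuals.size - n_params
    return np.sqrt(np.sum(residuals ** 2) / degrees_of_freedom)

# Hypothetical true and predicted concentrations (invented values).
c_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
c_pred = np.array([1.1, 1.9, 3.2, 3.9, 4.9])

error = rms_error(c_true, c_pred, n_params=2)

# Express the error as a percentage of the mean or of the standard deviation.
percent_of_mean = 100 * error / c_true.mean()
percent_of_std = 100 * error / c_true.std(ddof=1)
print(f"RMS error = {error:.4f} ({percent_of_mean:.1f}% of mean)")
```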

PROBLEMS
Outliers can be a major difficulty. Graphical ways of looking for outliers are a big area.
Outliers have undue influence on least squares models.

MULTIWAVELENGTH
Example: four compounds, four wavelengths.
MULTIPLE LINEAR REGRESSION (MLR)
$C = X \cdot B$
Known:
X, a series of spectra
C, concentrations

WAYS OF PERFORMING THE CALIBRATION
Producing a series of mixture spectra of known
concentrations by weighing different amounts and
adding together
Taking a series of spectra and calibrating against an independent method, e.g. HPLC.

23
(No Transcript)
24
EXAMPLE: UV/VIS OF PAHs AT 4 WAVELENGTHS, NO WAVELENGTH IS UNIQUE
25
$B = X^{+} \cdot C$
$\text{estimated pyrene} = -3.870\,A_{330} + 8.609\,A_{335} + 5.098\,A_{340} + 1.848\,A_{345}$
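To show how one column of B is used, the sketch below applies the pyrene coefficients to a set of absorbances. Two things are assumptions: the signs lost from the transcript are taken to be plus signs, and the absorbance values are invented for illustration.

```python
import numpy as np

# Coefficients for pyrene as printed on the slide (one column of B).
# Assumption: the signs missing from the transcript are all plus signs.
wavelengths_nm = [330, 335, 340, 345]
b_pyrene = np.array([-3.870, 8.609, 5.098, 1.848])

# Hypothetical absorbances of an unknown mixture at the four wavelengths.
absorbances = np.array([0.10, 0.20, 0.30, 0.40])

# The estimate is a weighted sum of the four absorbances (c = x . b).
pyrene_estimate = float(absorbances @ b_pyrene)

for wavelength, a, b in zip(wavelengths_nm, absorbances, b_pyrene):
    print(f"A{wavelength} = {a:.2f}, coefficient = {b:+.3f}")
print(f"estimated pyrene = {pyrene_estimate:.4f}")
```

A negative coefficient is not a problem in itself: because no wavelength is unique to one compound, some absorbances have to be subtracted to correct for the other compounds.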
26
Can also use classical methods.
This can be done with knowledge of the pure spectra. This differs from calibration, where a series of mixtures is recorded.
27

MULTIPLE LINEAR REGRESSION
Why use only 4 wavelengths?
Why not 10 or 100 wavelengths?
More information, not an arbitrary choice of wavelengths.
The number of wavelengths can be greater than the number of compounds.

Example
25 spectra
10 compounds
100 wavelengths

$B = X^{+} \cdot C$
In this case
B is a matrix of coefficients, 100 × 10
X is a spectral matrix, 25 × 100
C is a concentration matrix, 25 × 10
There are some technical problems using inverse calibration in this case, and often it does not work.
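One way to see a technical problem with these dimensions is sketched below on synthetic data (random concentrations and spectra, invented for illustration). With 25 spectra and 100 wavelengths, X has rank at most 25, so the 100 × 100 matrix X'X cannot be inverted. The pseudo-inverse still returns a B, but it reproduces the calibration concentrations exactly, noise included. This is offered as one plausible reading of the slide, not as the lecturer's own explanation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spectra, n_compounds, n_wavelengths = 25, 10, 100

# Synthetic data (invented): random concentrations and pure spectra plus noise.
C = rng.random((n_spectra, n_compounds))
S = rng.random((n_compounds, n_wavelengths))
X = C @ S + 0.01 * rng.standard_normal((n_spectra, n_wavelengths))

# X has only 25 rows, so its rank cannot exceed 25 and X'X (100 x 100)
# is singular: the ordinary least squares inverse does not exist.
rank_X = np.linalg.matrix_rank(X)

# The pseudo-inverse gives the minimum-norm solution B (100 x 10).
B = np.linalg.pinv(X) @ C

# With more coefficients than samples the calibration set is fitted exactly.
fits_exactly = np.allclose(X @ B, C)
print(f"rank of X = {rank_X}, B shape = {B.shape}, exact fit = {fits_exactly}")
```

An exact fit to noisy calibration data says nothing about how well unknowns will be predicted.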

Better approach
1. First predict the spectra S.
Either they are known from the calibration of the pure standards,
or they can be predicted from the mixture spectra:
$S \approx C^{+} \cdot X$
2. Then use these predictions in a model (e.g. of unknowns):
$C \approx X \cdot S^{+}$
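The two steps can be sketched with NumPy on synthetic, noise-free data. The random spectra and concentrations are invented for illustration, and the variable names are assumptions rather than the lecture's own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free data (invented): 10 pure spectra at 100 wavelengths
# and 25 calibration mixtures of known concentration.
S_true = rng.random((10, 100))
C_cal = rng.random((25, 10))
X_cal = C_cal @ S_true

# Step 1: estimate the pure spectra from the mixtures, S ~ C+ . X (10 x 100).
S_hat = np.linalg.pinv(C_cal) @ X_cal

# Step 2: predict concentrations of new samples, C ~ X . S+.
C_new = rng.random((5, 10))             # "unknowns", kept only for checking
X_new = C_new @ S_true
C_pred = X_new @ np.linalg.pinv(S_hat)  # 5 x 10

print("largest prediction error:", np.abs(C_pred - C_new).max())
```

Because the data contain no noise, the spectra and concentrations are recovered essentially exactly; with real data both steps only approximate them.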

31
MLR effectively models a spectrum as a sum of the spectra of the components, e.g. for a 3 component model:
Observed spectrum = conc A × spectrum A + conc B × spectrum B + conc C × spectrum C
32

ENHANCEMENTS
Selecting only certain variables, not all the
wavelengths.
Weighting of variables.

33
ERROR ANALYSIS
This now becomes more sophisticated. In addition to errors in the c block (concentration errors), there are now also errors in the x block (reconstruction of spectra). Discussed later.
34

LIMITATIONS AND PROBLEMS WITH MLR
The number of experiments and the number of wavelengths must never be less than the number of compounds.
All significant compounds must be known. If there are still unknowns, then these are mixed up with the knowns. There are problems if there are no pure standards and no reliable reference method. THIS IS THE BIGGEST LIMITATION.
Sometimes extra wavelengths can be bad ones, e.g. noise or background.
It is assumed that concentrations are perfectly known, with errors in only one variable, using the classical approach.

35
However, if information on all the significant compounds is known, then MLR is a simple and effective method.
36
PRINCIPAL COMPONENTS REGRESSION (PCR)
We do not need to know all components in advance, simply "how many components", and the compounds of interest. This overcomes a major limitation of MLR.
37
$c \approx T \cdot r$
38
The first step is to perform PCA. Obtain a scores matrix T, retaining A components. The value of A may be a guess of the number of compounds in the mixture.
Then $r = T^{+} \cdot c$
39
Can extend to more than one concentration:
$C \approx T \cdot R$

40
Example: 25 spectra taken at 100 wavelengths.
We know about and want to predict 4 compounds.
We think there are around 10 compounds in the mixture; 6 are unknown.
T is a matrix of dimensions 25 × 10
C is a matrix of dimensions 25 × 4
R is a matrix of dimensions 10 × 4
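A minimal PCR sketch with these dimensions, assuming NumPy and synthetic data invented for illustration. PCA is done here by singular value decomposition of the mean-centred spectra; the centring is an assumption, since the slides do not say whether the data are centred.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spectra, n_wavelengths, n_total, n_known, A = 25, 100, 10, 4, 10

# Synthetic mixtures (invented): 10 compounds, of which only 4 are known.
C_all = rng.random((n_spectra, n_total))
S = rng.random((n_total, n_wavelengths))
X = C_all @ S + 0.001 * rng.standard_normal((n_spectra, n_wavelengths))
C = C_all[:, :n_known]                   # 25 x 4 known concentrations

# Step 1: PCA of the mean-centred spectra, keeping A components.
x_mean, c_mean = X.mean(axis=0), C.mean(axis=0)
U, sv, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
T = U[:, :A] * sv[:A]                    # scores, 25 x 10

# Step 2: regress the known concentrations on the scores, R = T+ . C.
R = np.linalg.pinv(T) @ (C - c_mean)     # 10 x 4

# Autoprediction of the calibration set, C ~ T . R.
C_hat = T @ R + c_mean
rmse = np.sqrt(np.mean((C_hat - C) ** 2))
print(T.shape, R.shape, f"RMS error = {rmse:.4f}")
```

The six unknown compounds never appear in C; they only influence the scores through X.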
41
Example of the calculation of the concentration of pyrene in a set of 25 UV/vis spectra containing 10 different PAHs.
How many PCA components to use? The prediction gets better as the number of components increases.
42
ERRORS: x block
Simply as in PCA, look at the eigenvalues as more principal components are calculated.
43
ERRORS: c block
Look at the errors in the calculation of concentrations; these often show different behaviour.
44
Predictions for pyrene concentration using 1, 5
and 10 principal components.

45
Why not use a large number of PCA components? Then one can get perfect prediction?
FALLACY: the idea is to predict unknowns, after the knowns have been modelled. Later PCs often model noise.
Choose the number of PCs equal to the number of compounds in the mixture? Methods for determining the number of PCs when this is unknown are described later.
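The fallacy can be illustrated on synthetic data (invented for this sketch, with three compounds and noisy spectra). The error on the calibration samples can only fall as components are added, whereas the error on samples left out of the model need not. The train/test split used here is an assumption, standing in for the validation methods that the lecture covers later.

```python
import numpy as np

rng = np.random.default_rng(1)
n_compounds, n_wavelengths = 3, 50
S = rng.random((n_compounds, n_wavelengths))   # invented pure spectra

def make_set(n_samples):
    """Return noisy mixture spectra and the concentration of compound 1."""
    C = rng.random((n_samples, n_compounds))
    X = C @ S + 0.05 * rng.standard_normal((n_samples, n_wavelengths))
    return X, C[:, 0]

X_train, c_train = make_set(20)
X_test, c_test = make_set(20)

# PCA on the mean-centred calibration spectra.
x_mean, c_mean = X_train.mean(axis=0), c_train.mean()
U, sv, Vt = np.linalg.svd(X_train - x_mean, full_matrices=False)

train_err, test_err = [], []
for A in range(1, 16):
    P = Vt[:A]                                  # loadings, A x wavelengths
    T = (X_train - x_mean) @ P.T                # calibration scores
    r = np.linalg.pinv(T) @ (c_train - c_mean)  # PCR coefficients
    T_test = (X_test - x_mean) @ P.T            # scores of left-out samples
    train_err.append(np.sqrt(np.mean((T @ r + c_mean - c_train) ** 2)))
    test_err.append(np.sqrt(np.mean((T_test @ r + c_mean - c_test) ** 2)))

for A, (e_train, e_test) in enumerate(zip(train_err, test_err), start=1):
    print(f"{A:2d} PCs: calibration {e_train:.4f}, left-out {e_test:.4f}")
```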
46

Advantage over MLR: only partial knowledge is necessary.
Disadvantage: the assumption that all errors are in the "x" block.
Practical situation:
Modern instruments are very reproducible.
Volumetrics, measuring cylinders and syringes are inaccurate.

47
PARTIAL LEAST SQUARES (PLS)
This technique assumes that errors in both the x and c blocks are equally significant.
48

49
What does this mean?
$X = T \cdot P + E$
$c = T \cdot q + f$
50
THERE IS A COMMON SCORES MATRIX FOR BOTH x AND c BLOCKS.
In PCR we calculate the scores just for the x block and then use a separate step for regression. A big difference between PCR and PLS is that in PCR there is only one scores matrix, whereas for PLS (using one c column) there is a different scores matrix for each compound. The vector q is analogous to loadings.
51

PLS components have some analogies to PC
components.
In PCA, each component consists of a
scores vector
loadings vector
eigenvalue.

In PLS, each component consists of a
scores vector
x loadings vector (p)
c loading (q), a single number
magnitude.

FOR THE TECHNICALLY MINDED
Unlike eigenvalues, the magnitudes of successive PLS components do not necessarily decrease in size, although they do model the overall datasets.
Unlike loadings for PCA, loadings in PLS are not orthogonal.
In most cases PLS loadings are not normalised.
There are many algorithms for PLS and it can be confusing.

54
ERROR ANALYSIS
Similar principles to PCR, but different curves for different compounds. Sometimes different numbers of PLS components are used to model different compounds in one mixture.
55

For a dataset consisting of 25 spectra observed at 27 wavelengths, for which 8 PLS components are calculated, there will be
a T matrix of dimensions 25 × 8,
a P matrix of dimensions 8 × 27,
an E matrix of dimensions 25 × 27,
a q vector of dimensions 8 × 1 and
an f vector of dimensions 25 × 1.
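These dimensions can be checked with a small PLS1 sketch. The algorithm below is one common NIPALS-style variant with unnormalised scores; it is an assumption that it matches the algorithm used in the lecture, since many PLS algorithms exist. The data are synthetic, the function name `pls1` is hypothetical, and mean-centring is also an assumption.

```python
import numpy as np

def pls1(X, c, n_components):
    """One PLS1 variant; returns T, P, q and the residuals E and f."""
    E = X - X.mean(axis=0)            # x residuals, start as centred X
    f = c - c.mean()                  # c residuals, start as centred c
    n_samples, n_variables = E.shape
    T = np.zeros((n_samples, n_components))
    P = np.zeros((n_components, n_variables))
    q = np.zeros(n_components)
    for a in range(n_components):
        h = E.T @ f                   # weights linking x and c residuals
        t = E @ h / np.linalg.norm(h) # scores vector
        p = t @ E / (t @ t)           # x loadings vector
        q_a = f @ t / (t @ t)         # c loading, a single number
        E = E - np.outer(t, p)        # remove the component from x
        f = f - t * q_a               # remove the component from c
        T[:, a], P[a], q[a] = t, p, q_a
    return T, P, q, E, f

# Synthetic data (invented): 25 spectra at 27 wavelengths, 3 compounds.
rng = np.random.default_rng(0)
C = rng.random((25, 3))
S = rng.random((3, 27))
X = C @ S + 0.01 * rng.standard_normal((25, 27))
c = C[:, 0]

T, P, q, E, f = pls1(X, c, n_components=8)
# q and f are stored as 1-D arrays of length 8 and 25 (8 x 1 and 25 x 1).
print(T.shape, P.shape, E.shape, q.shape, f.shape)
```

By construction, the centred X equals T . P + E and the centred c equals T . q + f.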

56
PLS2: when there is more than one c variable.
(Diagram of the two decompositions: X into T, P and E, and C into T, Q and F.)
57

$X = T \cdot P + E$
$C = T \cdot Q + F$
Differences from PLS1:
C is now a matrix
Q is also a matrix
F is also a matrix
A single scores matrix for all compounds in the mixture.

Theoretically PLS2 should perform better than PLS1, but in practice it often performs worse.
Computationally faster, which was important 10 years ago.
Useful for non-linear problems such as QSAR where there are interactions, but not so useful in analytical chemistry, which is very linear.