Title: CALIBRATION
1 CALIBRATION Prof. Dr. Cevdet Demir cevdet_at_uludag.edu.tr
2- LINKING TWO SETS OF DATA TOGETHER
- Peak height to concentration
- Spectra to concentrations
- Taste to chemical constituents
- Biological activity to structure
- Biological classification to chromatographic
peak areas
3 NORMALLY WE ARE INTERESTED IN SOME FUNDAMENTAL PARAMETER, e.g. concentration or biological classification.
WE TAKE SOME MEASUREMENTS, e.g. spectra or chromatograms.
WE WANT TO USE THESE MEASUREMENTS TO GIVE US A PREDICTION OF THE FUNDAMENTAL PARAMETER.
4 UNIVARIATE CALIBRATION: one measurement, e.g. a peak height.
MULTIVARIATE CALIBRATION: several measurements, e.g. spectra.
5 NOTATION
The x block is the measured data, e.g. spectra, chromatograms, GCMS of a biological extract, structural parameters.
The c block is what we are trying to predict, e.g. concentration, species, acceptability of a product, taste.
7 [Diagram: pairings of the c and x blocks: c with x, c with X, C with X]
8- MULTIVARIATE CALIBRATION IN ANALYTICAL CHEMISTRY
- Single component.
- Example: concentration of chlorophyll a by UV/vis spectra.
- Mixture of components, all compounds known.
- Example: mixture of pharmaceuticals, all pure compounds known.
9- Mixture of components, only some compounds known.
- Example: coal tar pitch volatiles in industrial waste studied by spectroscopy, only some known.
- Statistical parameters.
- Example: protein in wheat by NIR spectroscopy.
10 UNIVARIATE CALIBRATION
The x and c blocks consist of single measurements. Traditional analytical chemistry.
CLASSICAL CALIBRATION: x ≈ c . s, where s is unknown. s ≈ c^+ . x, where c^+ is the pseudo-inverse of c.
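A minimal sketch of classical calibration in Python with NumPy. The peak heights and concentrations are made-up illustrative numbers, not data from these slides, and the variable names are assumptions.

```python
import numpy as np

# Hypothetical calibration standards (illustrative values only)
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # known concentrations
x = np.array([0.21, 0.39, 0.62, 0.80, 1.01])   # measured peak heights

# Classical model x ≈ c . s, so s ≈ c^+ . x with c^+ the pseudo-inverse of c.
# For a single column this equals (c'x) / (c'c).
c_col = c.reshape(-1, 1)
s = (np.linalg.pinv(c_col) @ x).item()

# Predicting an unknown means inverting the fitted model: c = x / s
x_unknown = 0.50
c_pred = x_unknown / s
print(f"s = {s:.4f}, predicted concentration = {c_pred:.3f}")
```

Note that the model is fitted with x as the response, so a prediction of c requires rearranging the fitted equation.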
12 TREATMENT OF ERRORS IN CLASSICAL CALIBRATION
13 PROBLEMS
1. In a modern lab, dilution and sample preparation errors (in c) are probably bigger than spectroscopic errors (in x). Spectra are more reproducible. This differs from classical statistics.
2. We want to predict concentration from spectra etc., not vice versa.
Most classical textbooks in analytical chemistry and most spreadsheets incorrectly recommend classical calibration.
14 INVERSE CALIBRATION: c ≈ x . b, where b is unknown. b ≈ x^+ . c, where x^+ is the pseudo-inverse of x.
15 [Diagram: the c, x and b blocks]
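A minimal sketch of inverse calibration, with the classical estimate computed alongside for comparison. The numbers are made-up illustrative values and the variable names are assumptions.

```python
import numpy as np

# Hypothetical calibration standards (illustrative values only)
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # known concentrations
x = np.array([0.21, 0.39, 0.62, 0.80, 1.01])   # measured peak heights

# Inverse model c ≈ x . b, so b ≈ x^+ . c (errors assumed to be in c)
b = (np.linalg.pinv(x.reshape(-1, 1)) @ c).item()

# Classical model x ≈ c . s, so s ≈ c^+ . x (errors assumed to be in x)
s = (np.linalg.pinv(c.reshape(-1, 1)) @ x).item()

x_unknown = 0.50
c_inverse = x_unknown * b      # direct prediction from the inverse model
c_classical = x_unknown / s    # classical model rearranged
print(f"inverse: {c_inverse:.4f}, classical: {c_classical:.4f}")
```

The two predictions are close but not identical, because b is not simply 1/s when the data contain noise.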
16 COMPARING FORWARD AND INVERSE CALIBRATION
17 INCLUDING THE INTERCEPT
The first column of X is 1s.
c ≈ b0 + b1 x
c ≈ X . b
b ≈ X^+ . c
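A minimal sketch of inverse calibration with an intercept, again on made-up illustrative numbers. The column of 1s in X carries the intercept b0.

```python
import numpy as np

# Hypothetical calibration standards (illustrative values only)
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # known concentrations
x = np.array([0.21, 0.39, 0.62, 0.80, 1.01])   # measured peak heights

# Build X with a first column of 1s so that c ≈ X . b = b0 + b1 x
X = np.column_stack([np.ones_like(x), x])

# b ≈ X^+ . c
b = np.linalg.pinv(X) @ c
b0, b1 = b

x_unknown = 0.50
c_pred = b0 + b1 * x_unknown
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, predicted c = {c_pred:.3f}")
```

The result agrees with an ordinary straight-line least squares fit of c on x.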
18- HOW WELL IS THE MODEL PREDICTED?
- Huge number of approaches.
- Root mean square error (divide by the degrees of freedom: number of samples minus 1 or 2, according to the number of parameters in the model).
- Often expressed as a percentage of either the mean measurement or the standard deviation of the measurements.
19- Correlation coefficient of predicted versus true has problems if the number of samples is small.
- ANOVA and replicates analysis using lack-of-fit error, as discussed in the experimental design lectures.
- Leaving samples out and predicting them: cross-validation and testing will be discussed later.
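The root mean square error described above can be sketched as follows. The function name `rmse`, its `n_params` argument and the example numbers are illustrative assumptions.

```python
import numpy as np

def rmse(c_true, c_pred, n_params):
    """Root mean square error, dividing by the degrees of freedom
    (number of samples minus number of model parameters)."""
    c_true = np.asarray(c_true, dtype=float)
    c_pred = np.asarray(c_pred, dtype=float)
    dof = len(c_true) - n_params
    return np.sqrt(np.sum((c_true - c_pred) ** 2) / dof)

# Made-up true and predicted concentrations for four samples
c_true = np.array([1.0, 2.0, 3.0, 4.0])
c_pred = np.array([1.1, 1.9, 3.2, 3.8])

# Two parameters (intercept and slope), so divide by 4 - 2 = 2
error = rmse(c_true, c_pred, n_params=2)

# Express as a percentage of the mean or of the standard deviation
percent_of_mean = 100 * error / c_true.mean()
percent_of_sd = 100 * error / c_true.std(ddof=1)
print(error, percent_of_mean, percent_of_sd)
```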
20- PROBLEMS
- Outliers can be a major difficulty. Graphical ways of looking for outliers are a big area.
- Outliers have undue influence on least squares models.
21- MULTIWAVELENGTH
- Example: four compounds, four wavelengths.
- MULTIPLE LINEAR REGRESSION (MLR)
- C ≈ X . B
- Known:
- X, a series of spectra
- C, concentrations
22- WAYS OF PERFORMING THE CALIBRATION
- Producing a series of mixture spectra of known concentrations by weighing different amounts and adding them together.
- Taking a series of spectra and calibrating against an independent method, e.g. HPLC.
24 EXAMPLE: UV/VIS OF PAHs AT 4 WAVELENGTHS, NO WAVELENGTH IS UNIQUE
25 B = X^+ . C
estimated pyrene -3.870 A330 8.609 A335
5.098 A340 1.848 A345
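A minimal sketch of how such coefficients could be obtained, assuming the inverse form C ≈ X . B. The data are synthetic and noise-free (random "spectra" of four hypothetical compounds at four wavelengths), not the PAH data from the slide, so the numbers will not match the coefficients quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_compounds, n_wavelengths = 10, 4, 4

# Synthetic pure spectra: every compound absorbs at every wavelength,
# so no wavelength is unique to one compound (identity added for conditioning)
S = rng.uniform(0.1, 1.0, (n_compounds, n_wavelengths)) + np.eye(4)

# Synthetic calibration mixtures and their noise-free spectra
C = rng.uniform(0.0, 1.0, (n_samples, n_compounds))
X = C @ S

# B = X^+ . C : each column of B holds the coefficients for one compound
B = np.linalg.pinv(X) @ C

# Estimated concentration of compound 0 is a weighted sum of the 4 absorbances
C_hat = X @ B
print("coefficients for compound 0:", B[:, 0])
```

With noise-free data the calibration concentrations are reproduced exactly. With real spectra the fit is only approximate.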
26 Can also use classical methods.
This can be done using knowledge of the pure spectra. This differs from calibration, where a series of mixtures is recorded.
27- MULTIPLE LINEAR REGRESSION
- Why use only 4 wavelengths?
- Why not 10 or 100 wavelengths?
- More information, not an arbitrary choice of wavelengths.
- Number of wavelengths can be greater than number of compounds.
28- Example
- 25 spectra
- 10 compounds
- 100 wavelengths
29- B = X^+ . C
- In this case
- B is a matrix of coefficients, 100 × 10
- X is a spectral matrix, 25 × 100
- C is a concentration matrix, 25 × 10
- There are some technical problems using inverse calibration in this case, and often it does not work.
30- Better approach
- 1. First predict the spectra S.
- Either they are known from the calibration of the pure standards
- Or they can be predicted from the mixture spectra
- S ≈ C^+ . X
- 2. Then use these predictions in a model (e.g. of unknowns)
- C ≈ X . S^+
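The two steps above can be sketched as follows, using the dimensions of the example (25 spectra, 10 compounds, 100 wavelengths). The data are synthetic and noise-free, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_compounds, n_wavelengths = 25, 10, 100

# Synthetic pure spectra and calibration mixtures (noise-free)
S_true = rng.uniform(0.0, 1.0, (n_compounds, n_wavelengths))
C = rng.uniform(0.0, 1.0, (n_samples, n_compounds))
X = C @ S_true                                   # 25 x 100

# Step 1: estimate the pure spectra, S ≈ C^+ . X
S_hat = np.linalg.pinv(C) @ X                    # 10 x 100

# Step 2: predict concentrations of new mixtures, C ≈ X . S^+
C_new = rng.uniform(0.0, 1.0, (5, n_compounds))  # "unknowns" for checking
X_new = C_new @ S_true
C_pred = X_new @ np.linalg.pinv(S_hat)           # 5 x 10

print(S_hat.shape, C_pred.shape)
```

Because the synthetic data contain no noise and every compound is included in C, both the spectra and the new concentrations are recovered exactly.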
31 MLR effectively models a spectrum as a sum of the spectra of the components, e.g. for a 3 component model:
Observed spectrum = conc A × spectrum A + conc B × spectrum B + conc C × spectrum C
32- ENHANCEMENTS
- Selecting only certain variables, not all the wavelengths.
- Weighting of variables.
33 ERROR ANALYSIS
This now becomes more sophisticated. In addition to errors in the c block (concentration errors), there are now also errors in the x block (reconstruction of spectra). Discussed later.
34- LIMITATIONS AND PROBLEMS WITH MLR
- Number of experiments and number of wavelengths must never be less than number of compounds.
- All significant compounds must be known. If there are still unknowns, then these are mixed up with the knowns. Problems arise if there are no pure standards and no reliable reference method. THIS IS THE BIGGEST LIMITATION.
- Sometimes extra wavelengths can be bad ones, e.g. noise or background.
- Assumes that concentrations are perfectly known, with errors in only one variable, when using the classical approach.
35 However, if information on all the significant compounds is known, then MLR is a simple and effective method.
36 PRINCIPAL COMPONENTS REGRESSION (PCR)
Do not need to know all components in advance, simply "how many components", and the compounds of interest. Overcomes a major limitation of MLR.
37 c ≈ T . r
38 The first step is to perform PCA. Obtain a scores matrix T, retaining A components. The value of A may be a guess of the number of compounds in the mixture. Then r = T^+ . c
39 Can extend to more than one concentration: C ≈ T . R
40 Example: 25 spectra taken at 100 wavelengths.
We know about and want to predict 4 compounds.
We think there are around 10 compounds in the mixture; 6 are unknown.
T is a matrix of dimensions 25 × 10
C is a matrix of dimensions 25 × 4
R is a matrix of dimensions 10 × 4
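A minimal PCR sketch with the dimensions of this example. The data are synthetic and noise-free, PCA is done here by singular value decomposition without mean-centring (an assumption, other conventions exist), and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_wavelengths, n_total, n_known = 25, 100, 10, 4

# Synthetic mixtures of 10 compounds; only the first 4 are "known"
S = rng.uniform(0.0, 1.0, (n_total, n_wavelengths))
C_all = rng.uniform(0.0, 1.0, (n_samples, n_total))
X = C_all @ S                       # 25 x 100 spectra
C = C_all[:, :n_known]              # 25 x 4 concentrations to predict

# Step 1: PCA on X, retaining A components
A = 10
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :A] * sv[:A]               # scores, 25 x 10
P = Vt[:A]                          # loadings, 10 x 100

# Step 2: regress the known concentrations on the scores, R = T^+ . C
R = np.linalg.pinv(T) @ C           # 10 x 4

# Fitted concentrations for the calibration set
C_hat = T @ R
print(T.shape, C.shape, R.shape)
```

For a new spectrum, scores would be obtained by projecting onto the loadings (`X_new @ P.T`) before multiplying by R. Only 4 of the 10 compounds need to be known to fit the model.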
41 Example of the calculation of the concentration of pyrene in a set of 25 UV/vis spectra containing 10 different PAHs.
How many PCA components to use? The prediction gets better the more components are used.
42 ERRORS: x block
Simply as in PCA, look at eigenvalues as more principal components are calculated.
43 ERRORS: c block
Look at errors in the calculation of concentrations; often different behaviour.
44 Predictions for pyrene concentration using 1, 5 and 10 principal components.
45 Why not use a large number of PCA components? Then one can get perfect prediction?
FALLACY: the idea is to predict unknowns, after the knowns have been modelled. Later PCs often model noise.
Choose the number of PCs equal to the number of compounds in the mixture? Methods for determining the number of PCs when this is unknown are described later.
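The fallacy can be illustrated with a small simulation. This is a sketch on synthetic data with an assumed noise level of 0.05. It compares the error on the calibration samples with the error on separate test samples as the number of PCs grows.

```python
import numpy as np

rng = np.random.default_rng(1)
n_compounds, n_wavelengths = 10, 100
S = rng.uniform(0.0, 1.0, (n_compounds, n_wavelengths))  # synthetic spectra

def make_set(n_samples):
    """Simulate noisy mixture spectra; return X and the first compound's c."""
    C = rng.uniform(0.0, 1.0, (n_samples, n_compounds))
    X = C @ S + rng.normal(0.0, 0.05, (n_samples, n_wavelengths))
    return X, C[:, 0]

X_cal, c_cal = make_set(25)     # calibration ("known") samples
X_test, c_test = make_set(25)   # independent samples standing in for unknowns

U, sv, Vt = np.linalg.svd(X_cal, full_matrices=False)

cal_err, test_err = [], []
for A in range(1, 21):
    T = U[:, :A] * sv[:A]               # scores for A components
    P = Vt[:A]                          # loadings for A components
    r = np.linalg.pinv(T) @ c_cal       # r = T^+ . c
    cal_err.append(np.sqrt(np.mean((T @ r - c_cal) ** 2)))
    test_err.append(np.sqrt(np.mean((X_test @ P.T @ r - c_test) ** 2)))

for A in (1, 5, 10, 20):
    print(A, round(cal_err[A - 1], 4), round(test_err[A - 1], 4))
```

The calibration error can only decrease as components are added, whereas the error on the independent samples need not keep improving once the later components are fitting noise.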
46- Advantage over MLR: only partial knowledge necessary.
- Disadvantage: assumption that all errors are in the "x" block.
- Practical situation.
- Modern instruments are very reproducible.
- Volumetrics, measuring cylinders, syringes are inaccurate.
47 PARTIAL LEAST SQUARES (PLS)
This technique assumes that errors in both the x and c blocks are equally significant.
49 What does this mean?
X = T . P + E
c = T . q + f
50 THERE IS A COMMON SCORES MATRIX FOR BOTH x AND c BLOCKS.
In PCR we calculate the scores just for the x block and then use a separate step for regression. A big difference between PCR and PLS is that in PCR there is only one scores matrix, whereas for PLS (using 1 column of c) there is a different scores matrix for each compound. The vector q is analogous to loadings.
51- PLS components have some analogies to PC components.
- In PCA, each component consists of a
- scores vector
- loadings vector
- eigenvalue.
52- In PLS, each component consists of a
- scores vector
- x loadings vector (p)
- c loading (q), a single number
- magnitude.
53- FOR THE TECHNICALLY MINDED.
- Unlike eigenvalues, the magnitudes of successive PLS components do not necessarily decrease in size, although they do model the overall datasets.
- Unlike loadings for PCA, loadings in PLS are not orthogonal.
- In most cases PLS loadings are not normalised.
- There are many algorithms for PLS and it can be confusing.
54 ERROR ANALYSIS: similar principles to PCR but different curves for different compounds. Sometimes different numbers of PLS components are used to model different compounds in one mixture.
55- For a dataset consisting of 25 spectra observed at 27 wavelengths, for which 8 PLS components are calculated, there will be
- a T matrix of dimensions 25 × 8,
- a P matrix of dimensions 8 × 27,
- an E matrix of dimensions 25 × 27,
- a q vector of dimensions 8 × 1 and
- an f vector of dimensions 25 × 1.
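The dimensions listed above can be checked with a short PLS1 sketch. This uses one common NIPALS-style variant with a normalised weight vector w, which is an assumption since there are many PLS algorithms. The data are random numbers, centring is omitted for brevity, and the function name `pls1` is illustrative. The vectors q and f are held as 1-D arrays of length 8 and 25.

```python
import numpy as np

def pls1(X, c, n_components):
    """PLS1 by successive deflation: X = T . P + E and c = T . q + f."""
    E = X.astype(float).copy()       # x residuals, deflated in place
    f = c.astype(float).copy()       # c residuals, deflated in place
    n_samples, n_vars = X.shape
    T = np.zeros((n_samples, n_components))
    P = np.zeros((n_components, n_vars))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = E.T @ f                  # weights from x/c covariance
        w /= np.linalg.norm(w)
        t = E @ w                    # scores vector
        tt = t @ t
        p = E.T @ t / tt             # x loadings vector
        q_a = f @ t / tt             # c loading, a single number
        E -= np.outer(t, p)          # remove this component from X
        f -= q_a * t                 # remove this component from c
        T[:, a], P[a], q[a] = t, p, q_a
    return T, P, q, E, f

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 27))        # 25 spectra at 27 wavelengths (random)
c = rng.normal(size=25)              # one concentration per spectrum (random)

T, P, q, E, f = pls1(X, c, n_components=8)
print(T.shape, P.shape, E.shape, q.shape, f.shape)
```

By construction the decomposition is exact, with E and f holding whatever the 8 components leave unexplained.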
56 PLS2: when there is more than one c variable
[Diagram: X = T . P + E and C = T . Q + F]
57- X = T . P + E
- C = T . Q + F
- Differences from PLS1
- C is now a matrix
- Q is also a matrix
- F is also a matrix
- Single scores matrix for all compounds in the mixture.
58- Theoretically PLS2 should perform better than PLS1, but in practice it often performs worse.
- Computationally faster, important 10 years ago.
- Useful for non-linear problems such as QSAR where there are interactions, but not so useful in analytical chemistry, which is very linear.
59- SUMMARY OF MAIN METHODS
- Univariate calibration
- Classical
- Inverse
- Multiple linear regression
- Principal components regression
- Partial least squares
- PLS1
- PLS2