Title: INTRODUCTION TO CHEMOMETRICS
1 INTRODUCTION TO CHEMOMETRICS
Richard G. Brereton, Centre for Chemometrics, University of Bristol, United Kingdom
r.g.brereton_at_bris.ac.uk
Phone 44-117-9287658
2- History
- Applications
- How chemometrics relates to other disciplines
- Hierarchy of users
- Four building blocks
- Methods
- Experimental design
- Pattern Recognition
- Calibration
- Software
- Instrumentation
- Applications
- Finding out about Chemometrics
4- HISTORICAL
- Name first proposed in the early 1970s by the Swedish organic chemist S. Wold.
- International Chemometrics Society: 1970s.
- International meeting: Cosenza, 1983.
- Journals: 1986 (Chemometrics and Intelligent Laboratory Systems) and 1987 (J Chemometrics).
- Books: mid 1980s.
- Courses: mainly at continuing professional development level in the late 1980s.
5- DEFINITION
- No universal definition
- Catalysed by different groups of people
- Modern day: huge volumes of data from instruments
- Data analysis in the laboratory and also in the chemical plant
6- Many groups and philosophies.
- As an aid to NIR.
- As a very broadly based subject relevant in biology, chemistry, geology, engineering etc.
- As an aid to quantitative analytical chemistry including chromatography and many forms of spectroscopy.
- As an aid throughout chemistry e.g. QSAR, molecular mechanics, spectroscopy.
- Analytical chemists.
- Statisticians.
- Chemical engineers.
7- HISTORICAL CHEMOMETRICS
- 1980s and early 1990s
- NIR calibration
- UV/vis calibration
- Simple PCA e.g. in chromatography
9- APPLICATIONS
- Biggest growth area in past 10 years
- Classical chemometrics
- A few classical applications, mainly related to the interests and funding of pioneers
- Process control and analysis
- Chromatographic optimisation
- Food analysis
10- Big growth
- Forensic science
- Biology: metabolomics etc.
- Clinical science: disease diagnosis
- Materials characterisation
- Environmental monitoring
- Fermentation technology
- Reaction monitoring
- Synthesis optimisation
- Analytical chemistry
12 HOW CHEMOMETRICS RELATES TO OTHER DISCIPLINES
The common theme: the computerised instrument. Huge quantities of data are available, millions of pieces of information every day. How to cope? Use statistical and computational methods.
13 [Diagram: CHEMOMETRICS linked to Mathematics, Statistics, Computing, Engineering, Biology, Organic Chemistry, Analytical Chemistry, Theoretical and Physical Chemistry, Pharmaceuticals and Industrial Applications, among others]
14 CHEMOMETRICS IS NOT A UNITARY SUBJECT LIKE ORGANIC CHEMISTRY
In organic chemistry, a solid skill base that all organic chemists have is built upon over the years. All organic chemists have roughly the same skill base. More experienced ones have a bigger knowledge base. Good organic chemists read the literature a lot and know many reactions well.
15 ORGANIC CHEMISTRY IS BASICALLY A KNOWLEDGE-BASED SUBJECT: certain basic skills, and then increase the knowledge.
CHEMOMETRICS IS MORE A SKILL-BASED SUBJECT: it is not necessary to have a huge knowledge of named methods, only a very few basic principles, but one must have hands-on experience to expand one's problem-solving ability.
16 DIFFERENT GROUPS HAVE DIFFERENT BACKGROUNDS AND EXPECTATIONS AS TO HOW CHEMOMETRICS SHOULD BE INTRODUCED
Statisticians want to start with distributions, hypothesis tests etc. and build up from there. They are dissatisfied if the maths is not explained.
Chemical engineers like to start with linear algebra such as matrices, and expect a mathematical approach, but are not always so interested in distributions etc.
17 Computer scientists are often most interested in algorithms.
Analytical chemists often know a little statistics but are not necessarily very confident in maths and algorithms, so like to approach this via statistical analytical chemistry. A difficult group, because the ability to run instruments is not necessarily an ability in maths and computing.
Organic chemists do not like maths and want automated packages they can use. They often require elaborate courses that avoid matrices. The course an organic chemist would regard as good is one a statistician would regard as bad.
19 HIERARCHY OF USERS
20 [Diagram: hierarchy of users]
- Statisticians and computer scientists
- Chemical and process engineers
- Analytical, organic, environmental, biological, physical, industrial, archaeological, process etc. chemists
21- THE ULTIMATE DREAM
- A lower level where no knowledge of chemometrics is required: good software.
- E.g. technician in warehouse looking at quality of drug
- Nurse in hospital looking at diagnosis
- Operator in manufacturing plant looking at whether product is OK.
22- How you introduce chemometrics methods to your organisation depends on your size and needs.
- A large organisation.
- Employ one or more chemometrics / data analysis experts, probably Matlab proficient, probably with a Ph.D. from one of a small number of groups that graduate specialists.
- Train a wider range of people in chemometrics but not to expert level, able to use commercial packages and to consult the expert when needed.
- Contract out method development, research and any customised software.
23- Smaller organisations have different choices.
- Beware: a little bit of knowledge can be dangerous.
- Can you afford a full-time employee? Is there enough work for a full-time employee?
- What level of expertise is required? Often the problem cannot be solved by someone doing an afternoon a week. Don't get obsessed by software.
- Training courses and software: these will solve the problem of some users and are important to get people to recognise where chemometrics can be applied, but are not in themselves sufficient.
- Need higher level of expertise occasionally.
- Consultancy.
- Externally funded projects, with experts.
- Part-time expert, who has other jobs.
24 Getting started
To get started you do need to tap into the middle of the triangle.
SOME EXPERTISE NEEDED AT THIS LEVEL
26- FOUR BUILDING BLOCKS
- Methods.
- Software.
- Instrumental techniques.
- Applications.
28- METHODS
- Experimental design
- Pattern recognition
- Calibration
29- MOTIVATIONS FOR DESIGN
- Screening
- Saving time
- Quantitative modelling
- Optimisation
30 WHY DESIGN EXPERIMENTS?
Example: optimisation of a reaction with pH and temperature. Can we find the combination of pH and temperature that produces the best yield?
31- IMAGINARY RESPONSE SURFACE
- We want to find optimum
- Response surface unknown
- Mathematical model may not be of interest in its own right
- Not necessarily interested in underlying molecular mechanism
- Reproducibility and flat optimum
33 One-factor-at-a-time strategy may miss optima
34 DIFFICULTY
Interactions: the response for each factor is not independent. The optimum temperature at pH 5 differs from that at pH 6.
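The effect of an interaction can be illustrated with a small numerical sketch. The response function below is purely hypothetical (an invented yield surface with a pH x temperature cross term, optimum placed at pH 6 and 50 degrees); it is not taken from the slides and only serves to show how varying one factor at a time can stop short of the optimum that a full grid finds.

```python
# Hypothetical yield surface: the 0.8 * x * y term is the pH x temperature interaction.
def response(ph, temp):
    x, y = ph - 6.0, temp - 50.0
    return 90.0 - 5.0 * x**2 - 0.05 * y**2 + 0.8 * x * y


ph_levels = [4.0 + 0.5 * i for i in range(9)]      # 4.0, 4.5, ..., 8.0
temp_levels = [30.0 + 5.0 * i for i in range(9)]   # 30, 35, ..., 70

# One factor at a time: vary pH at the starting temperature,
# then vary temperature at the pH that looked best.
start_temp = temp_levels[0]
best_ph = max(ph_levels, key=lambda ph: response(ph, start_temp))
best_temp = max(temp_levels, key=lambda t: response(best_ph, t))
ofat = (best_ph, best_temp)

# Full grid: every combination of the two factors.
grid = max(((ph, t) for ph in ph_levels for t in temp_levels),
           key=lambda point: response(*point))

print("one factor at a time:", ofat, round(response(*ofat), 2))
print("full grid:           ", grid, round(response(*grid), 2))
```

With these assumed numbers the one-factor-at-a-time search settles at pH 4.5 and 40 degrees (yield about 85.75), while the grid reaches pH 6 and 50 degrees (yield 90), because the best pH depends on the temperature at which it was measured.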
35- How to be on the safe side?
- Grid search. 10 pHs, 10 temperatures, 100 experiments.
- Big grid. Then smaller grid.
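A rough sketch of the "big grid, then smaller grid" idea follows. The yield surface, the factor ranges and the helper names `levels` and `grid_search` are all assumptions made for illustration; the point is only the bookkeeping: 10 x 10 runs on a coarse grid, then another 10 x 10 on a grid spanning one coarse step either side of the best coarse point.

```python
def response(ph, temp):
    """Hypothetical yield surface with its optimum at pH 6.2 and 47 degrees."""
    x, y = ph - 6.2, temp - 47.0
    return 90.0 - 5.0 * x**2 - 0.05 * y**2 + 0.8 * x * y


def levels(low, high, n):
    """n equally spaced settings from low to high inclusive."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def grid_search(f, ph_range, temp_range, n=10):
    """Evaluate f at every pH/temperature combination; return best point and run count."""
    points = [(ph, t) for ph in levels(*ph_range, n) for t in levels(*temp_range, n)]
    return max(points, key=lambda point: f(*point)), len(points)


ph_range, temp_range = (3.0, 9.0), (20.0, 80.0)
coarse_best, n_coarse = grid_search(response, ph_range, temp_range)

# Smaller grid: one coarse step either side of the best coarse point.
ph_step = (ph_range[1] - ph_range[0]) / 9
temp_step = (temp_range[1] - temp_range[0]) / 9
fine_best, n_fine = grid_search(
    response,
    (coarse_best[0] - ph_step, coarse_best[0] + ph_step),
    (coarse_best[1] - temp_step, coarse_best[1] + temp_step),
)
total_experiments = n_coarse + n_fine

print(coarse_best, fine_best, total_experiments)
```

Even in this two-factor case the refinement costs 200 experiments in total, and the count grows multiplicatively with every extra factor.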
36- PROBLEMS
- Time consuming and expensive.
- Many experiments we are almost certain are not near the optimum, so are obviously a waste of time
- Reproducibility and experimental error
37 WHAT DO WE DO?
We need rules! Formal experimental design
38- Screening
- Factorial designs
- Partial factorials and Plackett-Burman designs
- Modelling and optimisation
- Response surface designs
- Mixture designs
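As a minimal sketch of the first two design types listed above, the snippet below builds a two-level full factorial design and a half fraction of it. The three factor names are hypothetical, levels are coded as -1 (low) and +1 (high), and the half fraction uses the common textbook choice of setting the third factor equal to the product of the first two. Plackett-Burman, response surface and mixture designs are not covered here.

```python
from itertools import product

factors = ["pH", "temperature", "catalyst"]   # hypothetical factor names

# Full two-level factorial: every combination of low (-1) and high (+1).
full = list(product([-1, 1], repeat=len(factors)))        # 2**3 = 8 runs

# Half fraction 2**(3-1): keep the runs where factor 3 = factor 1 * factor 2.
half = [run for run in full if run[2] == run[0] * run[1]]  # 4 runs

for run in half:
    print(dict(zip(factors, run)))
```

Each column of the half fraction still contains as many high as low settings, and the columns remain mutually orthogonal; this is what allows main effects to be screened with half the experiments, at the price of confounding them with two-factor interactions.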
39- PATTERN RECOGNITION
- Grouping of objects e.g. how similar is the behaviour of compounds, how similar are products. Also PCA used a great deal to visualise changes e.g. in a reaction or product.
- PCA
- Can we classify products into acceptable or unacceptable?
- Discriminant analysis and Cluster analysis
- Exploratory Data Analysis
- Unsupervised Pattern Recognition
- Supervised Pattern Recognition
40- Exploratory data analysis
- e.g. PCA: Principal Components and Factor Analysis
- Looking at relationships
- between samples
- patients, food samples, organisms, chromatographic columns, wood, spectra, people
- between variables
- compound concentrations, spectral peaks, expenditure, chromatographic tests, elemental compositions
41 Families: MA (manual), EM (Employee), CA (Manager), together with number of children and monthly expenditure
42 PCA SCORES PLOT
43 PCA LOADINGS PLOT
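Scores and loadings of the kind plotted on such slides can be computed in several ways. The sketch below uses the iterative NIPALS algorithm in plain Python on a small invented data table (five samples, three variables); the numbers are illustrative only and are not the families dataset. The function names `column_centre` and `nipals` are assumptions of this sketch, not routines from the book.

```python
def column_centre(X):
    """Subtract each column mean so that PCA describes variation about the mean."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def nipals(X, n_components, tol=1e-12, max_iter=1000):
    """Return (scores, loadings): one score vector and one unit-length loading vector per PC."""
    E = [row[:] for row in X]          # residual matrix, deflated after each PC
    scores, loadings = [], []
    for _ in range(n_components):
        # Initial guess for the scores: the column with the largest sum of squares.
        t = list(max(zip(*E), key=lambda col: sum(v * v for v in col)))
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loadings: project each column of E onto t, then normalise to unit length.
            p = [sum(ti * e for ti, e in zip(t, col)) / tt for col in zip(*E)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            # Scores: project each row of E onto the loadings.
            t_new = [sum(e * pj for e, pj in zip(row, p)) for row in E]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol:
                break
        scores.append(t)
        loadings.append(p)
        # Deflate: remove the part of E explained by this component.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
    return scores, loadings


# Invented data: rows are samples, columns are measured variables.
data = [
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 0.5],
    [4.0, 7.9, 1.5],
    [5.0, 10.1, 0.8],
    [6.0, 12.0, 1.2],
]
scores, loadings = nipals(column_centre(data), n_components=2)
eigenvalues = [sum(v * v for v in t) for t in scores]
print(eigenvalues)
```

Plotting `scores[0]` against `scores[1]` gives a scores plot (one point per sample), and plotting `loadings[0]` against `loadings[1]` gives a loadings plot (one point per variable).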
44 Unsupervised pattern recognition: dendrograms
Example: toxicology, urine samples. Discrimination between acute and chronic intoxication helps to elucidate the source of contamination and may suggest the best cure for effective remediation. Use capillary electrophoresis and then compare chromatograms.
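A dendrogram is a picture of the order in which samples are merged by a hierarchical clustering method. The sketch below records that merge order using single linkage (nearest-neighbour distance between clusters), which is one of several possible linkage choices and is assumed here for simplicity. The four two-variable "samples" are invented numbers, not toxicology data.

```python
from math import dist


def single_linkage(points):
    """Return the merges as (cluster_a, cluster_b, distance), closest pair first."""
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Invented measurements for four samples (e.g. two peak areas each).
samples = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 7.0)]
merges = single_linkage(samples)
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

The merge distances are the heights at which branches join in the dendrogram: here samples 0 and 1 join first, then 2 and 3, and the two groups join last at a much larger distance.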
46 Supervised pattern recognition: classification
Examples: Tablets, can we class into origins and can we detect adulteration from NIR spectra? Class modelling of mussels, can we find which come from a polluted site from GC?
Detailed mathematical model
47 Multivariate data: several measurements per class
Example: Fisher Iris data, four measurements per iris: petal width, petal length, sepal width, sepal length. 150 irises, divided into 50 of each species:
I. setosa
I. versicolor
I. virginica
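One simple supervised method for data of this kind is k nearest neighbours (KNN) with Euclidean distance: an unknown sample is assigned to the class most common among its k closest training samples. The sketch below is a minimal version using only two of the four measurements; the training values are illustrative numbers chosen to resemble the three species and are not Fisher's actual data, and `knn_classify` is a hypothetical helper name.

```python
from collections import Counter
from math import dist


def knn_classify(training, unknown, k=3):
    """Majority vote among the k training samples closest to the unknown."""
    nearest = sorted(training, key=lambda item: dist(item[0], unknown))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# (petal length, petal width) in cm: illustrative values, not the real dataset.
training = [
    ((1.4, 0.2), "I. setosa"),
    ((1.5, 0.3), "I. setosa"),
    ((1.3, 0.2), "I. setosa"),
    ((4.5, 1.4), "I. versicolor"),
    ((4.2, 1.3), "I. versicolor"),
    ((4.7, 1.5), "I. versicolor"),
    ((5.8, 2.2), "I. virginica"),
    ((6.0, 2.3), "I. virginica"),
    ((5.6, 2.1), "I. virginica"),
]

print(knn_classify(training, (4.4, 1.4)))
```

With real data the variables would normally be scaled first, and k chosen by testing on samples that were left out of the training set.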
50 Calibration
Quantitative estimation. Especially mixtures. Estimation of bulk parameters. Multivariate process control. PLS
51- LINKING TWO SETS OF DATA TOGETHER
- Peak height to concentration
- Spectra to concentrations
- Taste to chemical constituents
- Biological activity to structure
- Biological classification to chromatographic
peak areas
52 NORMALLY WE ARE INTERESTED IN SOME FUNDAMENTAL PARAMETER, e.g. concentration or biological classification.
WE TAKE SOME MEASUREMENTS, e.g. spectra or chromatograms.
WE WANT TO USE THESE MEASUREMENTS TO GIVE US A PREDICTION OF THE FUNDAMENTAL PARAMETER.
53 UNIVARIATE CALIBRATION: one measurement, e.g. a peak height.
MULTIVARIATE CALIBRATION: several measurements, e.g. spectra.
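The univariate case can be sketched in a few lines: fit a straight line of peak height against known concentration by least squares, then invert it to predict the concentration of an unknown. The standards below are invented numbers, and `fit_line` and `predict_concentration` are hypothetical helper names.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented calibration standards: concentration (mM) and measured peak height.
concentration = [1.0, 2.0, 3.0, 4.0, 5.0]
peak_height = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(concentration, peak_height)


def predict_concentration(height):
    """Invert the calibration line to estimate an unknown concentration."""
    return (height - intercept) / slope


print(slope, intercept, predict_concentration(6.0))
```

Multivariate calibration follows the same pattern, with a whole spectrum replacing the single peak height.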
54- MULTIVARIATE CALIBRATION IN ANALYTICAL CHEMISTRY
- Single component.
- Example, concentration of pharmaceutical by uv/vis spectra.
- Mixture of components, all compounds known.
- Example, mixture of pharmaceuticals, all pure compounds known.
55- Mixture of components, only some compounds known.
- Example, coal tar pitch volatiles in industrial waste studied by spectroscopy, only some known.
- Statistical parameters.
- Example, protein in wheat by NIR spectroscopy.
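For the case where all compounds in the mixture and their pure spectra are known, one textbook approach is classical least squares: assume the mixture spectrum is a sum of the pure spectra weighted by concentration, and solve for the concentrations. The sketch below does this for two components at four wavelengths; the spectra are invented, the mixture is noise-free, and additivity of absorbances is an assumption. The other cases listed above (unknown interferents, statistical parameters) generally need methods such as PLS and are not shown.

```python
# Invented pure-component spectra: absorbance per unit concentration at four wavelengths.
pure_a = [0.9, 0.5, 0.2, 0.1]
pure_b = [0.1, 0.3, 0.6, 0.8]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def estimate_two_components(mixture, s1, s2):
    """Least-squares concentrations c1, c2 such that mixture ~ c1 * s1 + c2 * s2."""
    # Normal equations for two unknowns, solved by Cramer's rule.
    a11, a12, a22 = dot(s1, s1), dot(s1, s2), dot(s2, s2)
    b1, b2 = dot(s1, mixture), dot(s2, mixture)
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


# Simulated mixture spectrum: 2.0 units of A plus 1.5 units of B.
mixture = [2.0 * a + 1.5 * b for a, b in zip(pure_a, pure_b)]

conc_a, conc_b = estimate_two_components(mixture, pure_a, pure_b)
print(conc_a, conc_b)
```

Using four wavelengths for two unknowns means the system is overdetermined, which is what lets measurement noise average out in real data.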
57- SOFTWARE
- Many approaches according to the background of the user.
- Programmers, unlikely in industrial environment
- C / C++
- VB and VBA
- Matlab
- Users, specialist industrial chemometricians
- Matlab very common
- Python
- Excel
- VBA using Excel
58- Packages customised
- e.g. in Bristol often specially commissioned for contracts
- Packages
- Distance learning package for European teaching (R.Brereton 1994)
- MIR deconvolution for HSE (S.Gurden 1996)
- Madcap deconvolution for SB (C.Bessant 2000)
- Excel add-in for calibration and pattern recognition related to Wiley textbook (L.Erskine and T.Thurston 2002)
- Boris software package commissioned for GSK (T.Thurston et al 2004).
- Packages commercial, most people in industry
- Many limited to philosophy of users but quite user friendly and well supported.
- SIMCA, Pirouette, Unscrambler
- SAS, SPSS, Splus
- Both statistical and chemometric packages.
60- INSTRUMENTAL TECHNIQUES
- Classical chemometrics
- 1980s, early 1990s
- Dominated by a few techniques e.g.
- Near Infrared Spectroscopy (NIR)
- High Performance Liquid Chromatography (HPLC)
- Much early work using these methods.
61 MODERN DAY
Sophistication e.g. LC-MS, LC-NMR, MS/MS etc.
Rapid and convenient e.g. Raman, uv/vis, MIR
Huge range of techniques
63- APPLICATIONS
- Biggest growth area in past 10 years
- Classical chemometrics
- A few classical applications, mainly related to the interests and funding of pioneers
- Process control and analysis
- Chromatographic optimisation
- Food analysis
65 FINDING OUT ABOUT CHEMOMETRICS
Book and Website: R.G.Brereton, Chemometrics: Data Analysis for the Laboratory and Chemical Plant, Wiley, Chichester, 2003 and 2004
Website: www.spectroscopynow.com, Chemometrics Channel
68- INTRODUCTION
- EXPERIMENTAL DESIGN
- SIGNAL PROCESSING
- PATTERN RECOGNITION
- CALIBRATION
- EVOLUTIONARY SIGNALS
- APPENDICES
- Vectors and Matrices, Algorithms, Basic
statistics, Excel, Matlab
69 54 Problems, some quite lengthy, e.g. Chapter 4. Many case studies.
- Problem 4.1 Grouping of elements from fundamental properties using PCA.
- Problem 4.2 Introductory PCA
- Problem 4.3 Introduction to Cluster Analysis
- Problem 4.4 Classification using Euclidean distance and KNN
- Problem 4.5 Certification of NIR filters using PC scores plots.
- Problem 4.6 Simple KNN classification
- Problem 4.7 Classification of swedes into fresh and stored using SIMCA.
- Problem 4.8 Classification of Pottery from pre-classical sites in Italy, using Euclidean and Mahalanobis distance measures.
- Problem 4.9 Effect of Centring on PCA
- Problem 4.10 Linear discriminant analysis in QSAR to study the toxicity of Polyaromatic Hydrocarbons.
- Problem 4.11 Class Modelling Using PCA
- Problem 4.12 Effect of Preprocessing on PCA in LCMS.
- Problem 4.13 Determining the number of significant components in a dataset by cross-validation.
70 Around 70 datasets, available on the Web: www.spectroscopynow.com
Excel Add-in for simple chemometrics, for learning
71 Over 80 Matlab routines corresponding to main methods in book, cross-referenced.

    function [T,P,s] = pca(X,maxrank)
    % PERFORMS PCA, RETAINING maxrank PCs
    % See Sections 4.3.2 and 4.3.3.1
    % Note that Matlab uses svd rather than NIPALS as a default
    %   X        matrix subject to PCA
    %   maxrank  (optional) the number of PCs to be kept
    %   T        Scores matrix
    %   P        Loadings matrix
    %   s        Eigenvalues, a column vector
    % Hailin Shen & Richard G. Brereton 20/02/02
    if nargin<1
        help pca; demo; return
    elseif nargin<2
        [m,n] = size(X);
        maxrank = min(m,n);
    end
    [m,n] = size(X);
    [u,s,P] = svd(X);        % X = u*s*P'
    T = u*s;                 % scores
    T = T(:,1:maxrank);
    P = P(:,1:maxrank);
    s = diag(T'*T);          % eigenvalues: sum of squares of each scores vector
    for i = 1:maxrank
        % make sure the maximum absolute value of scores vectors are positive.
        if max(abs(T(:,i))) > max(T(:,i))
            T(:,i) = -T(:,i);
            P(:,i) = -P(:,i);
        end
    end
72 Chemweb www.chemweb.com: The Alchemist > Features > Chemometrics > Archive
76- Other sources
- www.spectroscopynow.com
- Chemometrics channel
- Websites run by companies e.g.
- Applied Spectroscopy
- Infometrix
- Umetrics
- Camo
77- Other selected books
- D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics Part A, Elsevier, Amsterdam, 1997
- B.G.M. Vandeginste, D.L. Massart, L.M.C. Buydens, S. De Jong, P.J. Lewi and J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics Part B, Elsevier, Amsterdam, 1998
- M. Otto, Chemometrics: Statistics and Computer Applications in Analytical Chemistry, Wiley-VCH, Weinheim, 1998
- K.R. Beebe, R.J. Pell and M.B. Seasholtz, Chemometrics: A Practical Guide, Wiley, New York, 1998
- R. Kramer, Chemometric Techniques for Quantitative Analysis, Marcel Dekker, New York, 1998
78 Journals
Specialist chemometrics journals: for the enthusiast and expert at the cutting edge.
Applications journals: most chemometrics papers are published in applied journals, a very large variety of journals, probably 50 or more.