Title: Interactive Series Baseline Correction Algorithm
1. Interactive Series Baseline Correction Algorithm
- Andrey Bogomolov (a), Willem Windig (b), Susan M. Geer (c), Debra B. Blondell (c), and Mark J. Robbins (c)
- (a) ACD/Labs, Russian Chemometrics Society, Moscow, Russia
- (b) Eigenvector Research Inc., Rochester, NY, USA
- (c) Eastman Kodak Company, Rochester, NY, USA
2. Baseline (Background) Problem
- Baseline is an eternal issue in analytical data processing
- Baseline or background?
  - no clear distinction
  - "baseline" is associated with a smooth line reflecting a physical interference
  - "background" tends to be used in a more general sense to designate ANY unwanted signal, including noise and chemical components
- Our preference is given to the term "baseline" because smoothness of the background signal is the main assumption of the proposed correction algorithm
3. Classical Approach to the Baseline Correction Problem
- Classical baseline correction algorithms for a single curve have been treated almost exhaustively in the literature
- The baseline to be subtracted is fitted by a linear (polynomial) function to nodes that belong to signal-free regions
- The nodes can be detected automatically by the software or placed manually by the user
- These methods are advantageous for semi-automatic processing, where software-generated results need to be revised by a human expert
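The classical single-curve approach can be illustrated with a minimal sketch. It assumes NumPy, a synthetic curve, and hand-picked node indices; the function name `polynomial_baseline` and all data are hypothetical and are not taken from the presentation.

```python
import numpy as np

def polynomial_baseline(x, y, node_idx, degree=1):
    """Fit a polynomial of the given degree to the curve values at the
    node positions (assumed to be signal-free) and evaluate it over x."""
    coeffs = np.polyfit(x[node_idx], y[node_idx], degree)
    return np.polyval(coeffs, x)

# Synthetic curve (illustrative only): one Gaussian peak on a sloping line.
x = np.linspace(0.0, 10.0, 201)
true_baseline = 0.5 + 0.1 * x
peak = np.exp(-0.5 * ((x - 5.0) / 0.3) ** 2)
y = true_baseline + peak

# Nodes placed by hand in regions where the peak is negligible.
node_idx = np.array([0, 20, 40, 160, 180, 200])

baseline = polynomial_baseline(x, y, node_idx, degree=1)
corrected = y - baseline
```

In a semi-automatic workflow the node indices would come from a detection routine and then be reviewed by the expert; here they are simply typed in.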
4. Serial (Batch) Methods
- The development of two-dimensional spectroscopy and hyphenated techniques demanded new methods applicable to data matrices
- Early works in this direction applied automated baseline correction algorithms to every individual curve in a matrix dataset
- The main problem with this approach is that it neglects internal (inter-spectral) correlations
- Instead of the expected rank reduction, it may introduce additional variance into the dataset
- It is a black-box routine that is difficult to control
5. Multivariate Background Correction
- Multivariate data analysis has had a revolutionary impact on the baseline problem in general
- The paradigm shift from hard- (knowledge-driven) to soft- or self- (data-driven) modeling has opened new horizons and introduced new concepts
- PLS provides the means to address the background in the calibration context without subtracting it
- OSC by S. Wold turns the problem inside out, eliminating from the data (X) the variance that is irrelevant for calibration (orthogonal to Y)
- There are a number of other excellent algorithms
6. Our Objectives
- Researchers typically concentrate on developing fully automated background correction methods
- The fuzzy statement of the baseline problem in general casts doubt on the feasibility of automated (expert-free) baseline correction routines
- In contrast, we present an alternative approach that aims to maximize the means of control for a human operator
  - simplicity
  - visualization
  - interactive stepwise algorithm
7. The Method
- The method is applied to a series of curves (e.g., spectra or chromatograms)
- The method consists of two distinct steps
- First, a prototype baseline is constructed from linear segments by selecting a set of nodes
- To aid in node selection, mean values are calculated to represent the entire series
- Second, the prototype baseline is used to construct the individual baselines to be subtracted from the series curves, by adjusting the nodes vertically to each curve being corrected
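The two steps can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: it assumes NumPy, a data matrix with one curve per row, node positions already chosen by the operator on the mean curve, and node heights read directly from each curve (optionally averaged over a small window). The function name `series_baseline_correction`, the `half_window` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np

def series_baseline_correction(x, data, node_idx, half_window=0):
    """Build one piecewise-linear baseline per curve from shared node
    positions and subtract it. Node heights are taken from each curve,
    averaged over +/- half_window points around the node.
    Note: np.interp holds the end values constant outside the first and
    last node, so the example places nodes at both ends of the axis."""
    node_idx = np.sort(np.asarray(node_idx))
    baselines = np.empty_like(data, dtype=float)
    for i, curve in enumerate(data):
        heights = [curve[max(j - half_window, 0): j + half_window + 1].mean()
                   for j in node_idx]
        baselines[i] = np.interp(x, x[node_idx], heights)
    return data - baselines, baselines

# Synthetic series (illustrative only): same peak, different baselines.
x = np.linspace(0.0, 10.0, 201)
peak = np.exp(-0.5 * ((x - 5.0) / 0.3) ** 2)
amplitudes = np.array([1.0, 2.0, 0.5])
offsets = np.array([0.2, 0.5, 0.1])
slopes = np.array([0.05, -0.02, 0.10])
data = amplitudes[:, None] * peak + offsets[:, None] + slopes[:, None] * x

# Step 1: prototype baseline drawn on the mean curve of the series.
mean_curve = data.mean(axis=0)
node_idx = [0, 40, 160, 200]          # chosen by the operator
prototype = np.interp(x, x[node_idx], mean_curve[node_idx])

# Step 2: the same nodes, adjusted vertically to every individual curve.
corrected, baselines = series_baseline_correction(x, data, node_idx)
```

Because the operator only edits the node positions on a single mean curve, the whole matrix is corrected as if it were one curve, while each row still receives its own baseline.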
8. HPLC/DAD Sample Data
9. 2nd Derivative for Node Selection
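The slide gives only a title, so the following is one plausible reading rather than the presented procedure: regions where the second derivative of the mean curve stays close to zero are flat or linear and are therefore candidate locations for baseline nodes. The sketch assumes NumPy; the function name `flat_regions`, the threshold, the window size, and the synthetic data are hypothetical.

```python
import numpy as np

def flat_regions(x, mean_curve, threshold, half_window=5):
    """Return indices where |second derivative| stays below threshold
    over a window of +/- half_window points. The window check rejects
    isolated zero crossings of the second derivative, such as the
    inflection points on the flanks of a peak."""
    d2 = np.abs(np.gradient(np.gradient(mean_curve, x), x))
    n = d2.size
    flat = np.array([
        d2[max(i - half_window, 0): i + half_window + 1].max() < threshold
        for i in range(n)
    ])
    return np.flatnonzero(flat)

# Synthetic mean curve (illustrative only): a peak on a sloping baseline.
x = np.linspace(0.0, 10.0, 201)
mean_curve = 0.5 + 0.1 * x + np.exp(-0.5 * ((x - 5.0) / 0.3) ** 2)

candidates = flat_regions(x, mean_curve, threshold=1e-3)
```

The returned indices are only suggestions; in an interactive setting the operator would still pick the final nodes from them.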
10. Baseline Correction for Curve Resolution
- Baseline correction is an application-specific preprocessing technique
- The present baseline correction algorithm was developed to improve the performance of the SIMPLISMA (SIMPLe-to-use Interactive Self-modeling Mixture Analysis) curve resolution technique
- The algorithm has been used at Eastman Kodak Company for over 10 years for routine analysis of TGA/IR data, which represent a challenging case for curve resolution
  - many components
  - high degree of overlap
  - intense background signal
11. TGA/IR Sample Data
Reprinted with permission from Eastman Kodak
Company, 2005
12. Baseline Nature in TGA/IR
- The most common reasons for TGA/IR baseline drift
  - Temperature fluctuations over time
  - Instrument drift
  - Material scattering
  - Impurities
  - Inappropriate background, etc.
- In the present dataset: miscellaneous reasons
- The spectral domain is more suitable for series baseline correction because of narrow peaks and explicit baseline areas
13. TGA/IR Baseline Correction
Reprinted with permission from Eastman Kodak
Company, 2005
14. TGA/IR Corrected Data Map
Reprinted with permission from Eastman Kodak
Company, 2005
15. TGA/IR SIMPLISMA Curve Resolution
Reprinted with permission from Eastman Kodak
Company, 2005
16. IR Library Identification
Reprinted with permission from Eastman Kodak
Company, 2005
17. Conclusions
- A new interactive approach to the baseline correction problem has been suggested
- It allows for adapting traditional automated single-scan baseline correction routines or for performing manual correction on matrix data as if they were a single curve
- Advantages of the method include transparency of the process and the means for extensive operator interaction
- The method has passed long-term testing in an industrial laboratory and was integrated into a professional software package
- In spite of the simplicity of the algorithm, it allows for successful elimination of baselines even in complex cases such as TGA/IR data
18. Acknowledgements
- Antony Williams for his friendly support, and
- Michel Hachey for his help and valuable ideas