Title: Dept seminar
Slide 1: Chemometric Methods for GC x GC
LCDR Gregory J. Hall and Glenn S. Frysinger
Department of Science, U.S. Coast Guard Academy, New London, Connecticut
gregory.hall_at_uscga.edu
Slide 2: LCDR Gregory J. Hall
- 1995: B.S. Marine Science, U.S. Coast Guard Academy
- 1995-1997: Operations Officer, USCGC SPAR
- 1997-1998: M.S. Chemistry, Tufts University
- 1998-2000: Rotating Military Faculty, USCGA
- 2000: Appointed to the PCTS
- 2002-2004: Ph.D. sabbatical, Tufts University
- 2006: Ph.D. Chemistry, Tufts University, "Chemometric Characterization and Classification of Estuarine Water through Multidimensional Fluorescence"
Slide 3: Permanent Commissioned Teaching Staff (PCTS)
- About 23 officers, ranked from LT to CAPT
- Serve as interpreters between the military and civilian faculty, and provide leadership for the college
- Teaching, service, and scholarship expected
- Ph.D. required
Slide 4: LCDR Gregory J. Hall
Slide 5: What IS Chemometrics?
"Chemometrics is the chemical discipline that uses mathematical, statistical and other methods employing formal logic to design or select optimal measurement procedures and experiments, and to provide maximum relevant chemical information by analyzing chemical data." (D. L. Massart, Chemometrics, Elsevier, NY, 1988)
Slide 6: Chemometrics Already Covered and to Come
- Difference chromatograms
- Property modeling
- Clustering
- Chromatogram prediction
- Mass spectral searching
- Template construction
- XICs (extracted ion chromatograms)
- Retention indices
You are all already chemometricians!
Slide 7: Today
- Data structures: how I view GC x GC data
- Variance: PCA
- Classification: SIMCA, PCR-DA
- Regression: PLS
- Peak resolution: PARAFAC
- Preprocessing: alignment
- The way forward: humble opinions
Slide 8: Data, GC x GC-FID
[Figure: a single chromatogram (first dimension x second dimension, with intensity values) is a two-way data object, plotted in 3 dimensions; a stack of chromatograms from several samples, X (I x J x K), is a three-way dataset, plotted in 4 dimensions.]
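The two-way/three-way view above can be made concrete with NumPy. This is a minimal sketch, assuming the raw FID signal is sampled at a constant rate and that one modulation period is an exact whole number of points (real data may also need a phase offset). `fold_trace` is a hypothetical helper, and samples are placed on the first axis for convenience, which may not match the I, J, K ordering in the figure.

```python
import numpy as np

def fold_trace(signal, points_per_modulation):
    """Fold a raw 1D detector trace into a two-way GC x GC chromatogram.

    Each row is one modulation cycle (first-dimension time); each column is a
    point within the modulation period (second-dimension time). Points left
    over after the last complete cycle are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n_cycles = signal.size // points_per_modulation
    usable = signal[: n_cycles * points_per_modulation]
    return usable.reshape(n_cycles, points_per_modulation)

# Hypothetical example: 3 samples, 1000 points each, 100 points per modulation period.
rng = np.random.default_rng(0)
traces = [rng.random(1000) for _ in range(3)]
chromatograms = [fold_trace(t, 100) for t in traces]  # each two-way: (10, 100)
stack = np.stack(chromatograms, axis=0)                # three-way: (sample, 1st dim, 2nd dim)
print(chromatograms[0].shape, stack.shape)             # (10, 100) (3, 10, 100)
```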
Slide 9: Data, GC x GC-TOF
[Figure: a GC x GC-TOF dataset X spans first dimension x second dimension x m/z x sample (date?): a four-way array, plotted in 5 dimensions!]
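Adding the m/z axis turns the stack into a four-way array. A minimal sketch with made-up dimensions (the sizes and the chosen m/z index are hypothetical) shows how the familiar two- and three-way objects fall out of it: summing over m/z gives a total ion chromatogram (TIC) stack shaped like GC x GC-FID data, and slicing one m/z channel gives an XIC.

```python
import numpy as np

# Hypothetical dimensions: samples x first dimension x second dimension x m/z channels.
n_samples, n_first, n_second, n_mz = 4, 20, 50, 30
rng = np.random.default_rng(1)
data = rng.random((n_samples, n_first, n_second, n_mz))  # four-way array

# Total ion chromatogram: sum intensities over the m/z axis (three-way, like GC x GC-FID).
tic = data.sum(axis=3)

# Extracted ion chromatogram for one m/z channel (index chosen only for illustration).
mz_index = 7
xic = data[:, :, :, mz_index]

# Mass spectrum at one (first, second) retention-time pixel of sample 0.
spectrum = data[0, 5, 10, :]
print(data.ndim, tic.shape, xic.shape, spectrum.shape)  # 4 (4, 20, 50) (4, 20, 50) (30,)
```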
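Slide 10: Principal Components Analysis (PCA)
[Figure: samples in a three-variable space (variables 1, 2, 3) with principal components PC 1 and PC 2. Q is a sample's distance from the PC plane; T2 is its scaled distance within the plane.]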
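Slide 11: Principal Components Analysis (PCA)
$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf{T}} + \mathbf{E}$$
data X (samples x variables) = model (scores T times loadings P on the components) + residuals E
Goal: variance capture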
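The decomposition above can be computed with a singular value decomposition of the mean-centered data. This is a minimal sketch, not the PLS Toolbox implementation; `pca` is a hypothetical helper, and the data are synthetic.

```python
import numpy as np

def pca(X, n_components):
    """PCA of mean-centered data by SVD: X_c = T P^T + E.

    Returns scores T (samples x components), loadings P (variables x
    components), residuals E, the column means, and the fraction of total
    variance captured by each retained component.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T          # loadings: orthonormal columns
    T = Xc @ P                       # scores: projections onto the loadings
    E = Xc - T @ P.T                 # residuals: what the model leaves out
    captured = s[:n_components] ** 2 / np.sum(s ** 2)
    return T, P, E, mean, captured

# Hypothetical data: 10 samples x 3 variables, varying mostly along one direction.
rng = np.random.default_rng(1)
direction = np.array([1.0, 2.0, -1.0])
X = np.outer(rng.normal(size=10), direction) + 0.05 * rng.normal(size=(10, 3))
T, P, E, mean, captured = pca(X, n_components=2)
print(captured.round(3))  # PC 1 should capture nearly all of the variance
```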
Slide 12: Multi-way Principal Components Analysis (MPCA)
Our data: 15 x 410,000
Wise, B. M.; Gallagher, N. B.; Bro, R.; Shaver, J. M.; Windig, W.; Koch, R. S. PLS Toolbox 4.0; Eigenvector Research, Inc.: Wenatchee, WA, 2006.
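MPCA on a chromatogram stack amounts to unfolding each sample's 2D chromatogram into one long row, which is how a handful of samples becomes a matrix with hundreds of thousands of columns. This is a minimal sketch, assuming sample-wise unfolding and mean-centering (the slide does not state the preprocessing); `mpca` is hypothetical, and the stack is random data far smaller than 15 x 410,000. Refolding each loading vector gives a 2D map like the PC loading plots that follow.

```python
import numpy as np

def mpca(stack, n_components):
    """Multi-way PCA sketch: unfold a (sample, I, J) stack sample-wise, run PCA,
    and refold each loading vector into an I x J map for plotting."""
    n_samples, n_i, n_j = stack.shape
    X = stack.reshape(n_samples, n_i * n_j)   # unfold: samples x (I*J) variables
    Xc = X - X.mean(axis=0)                   # mean-center every retention-time pixel
    # Economy SVD keeps the factors no larger than samples x variables,
    # so a very wide matrix with few rows stays tractable.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loading_maps = Vt[:n_components].reshape(n_components, n_i, n_j)
    captured = np.sum(s[:n_components] ** 2) / np.sum(s ** 2)
    return scores, loading_maps, captured

# Hypothetical stack: 15 samples of 40 x 50 chromatograms.
rng = np.random.default_rng(2)
stack = rng.random((15, 40, 50))
scores, maps, captured = mpca(stack, n_components=4)
print(scores.shape, maps.shape)  # (15, 4) (4, 40, 50)
```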
Slide 13: GC x GC/MS TIC of Fire Debris
[Figure: total ion chromatogram; first-dimension time (min) vs. second-dimension time (s, 0.0-4.0).]
Samples: 6 clean carpet, 5 gasoline, and 6 doped carpet.
Slide 14: PCA Model Specifics
- Only the two carpet classes included
- 4 PCs, 98% of variance captured
- Two random samples per class left out; all gasoline samples left out of the training set
- Left-out samples projected onto the model later
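Projecting left-out samples onto a fitted model yields their scores plus the two fit statistics pictured in the PCA diagram: Q (squared residual distance from the PC subspace) and Hotelling's T2 (distance within the subspace, scaled by each score's variance). This is a minimal sketch with hypothetical helper names and synthetic data; it omits the statistical confidence limits that a package such as PLS Toolbox would place on Q and T2.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a PCA model; keep what is needed to project new samples later."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)  # variance of each score column
    return mean, P, score_var

def project(X_new, mean, P, score_var):
    """Project samples onto a fitted PCA model; return scores, Q, and Hotelling T2."""
    Xc = np.atleast_2d(X_new) - mean
    T = Xc @ P
    E = Xc - T @ P.T
    Q = np.sum(E ** 2, axis=1)               # squared distance from the model subspace
    T2 = np.sum(T ** 2 / score_var, axis=1)  # scaled distance within the model subspace
    return T, Q, T2

# Hypothetical training set (20 samples x 4 variables) and one sample unlike it.
rng = np.random.default_rng(3)
train = np.outer(rng.normal(size=20), [1.0, 2.0, -1.0, 0.5]) + 0.05 * rng.normal(size=(20, 4))
mean, P, score_var = fit_pca(train, n_components=1)
_, Q_train, T2_train = project(train, mean, P, score_var)
_, Q_out, _ = project(mean + np.array([0.0, 0.0, 3.0, 3.0]), mean, P, score_var)
print(Q_out, Q_train.max())
```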
Slide 15: PC 1 Loadings
[Figure: PC 1 loading map. Red: positive loadings (correlated); blue: negative loadings (anti-correlated).]
Slide 16: PC 2 Loadings
Chemically interpretable results! Next step: classification.
Slide 17: Principal Components Regression Discriminant Analysis (PCR-DA)
Y (class membership):
w/ accelerant   w/o accelerant
0               1
0               1
0               1
0               1
1               0
1               0
1               0
1               0
1               0
[Figure: samples in variable space (variables 1, 2, 3) with PC 1 and PC 2; Q and T2 distances shown.]
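PCR-DA regresses the class-membership column of Y on PCA scores; with only two classes, one 0/1 column carries all the information. This is a minimal sketch under stated assumptions: the helper names and synthetic data are hypothetical, and the 0.5 decision threshold is an assumption rather than the rule used in the talk. The returned `b` has one coefficient per variable, so reshaping it to the chromatogram's 2D shape gives a regression-vector map like the one on the next slide.

```python
import numpy as np

def pcr_da_fit(X, y, n_components):
    """Regress a 0/1 class-membership vector y on the PCA scores of X.

    Returns the regression vector b in the original variable space and the
    centering terms, so that y_hat = (x - x_mean) @ b + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                     # PCA loadings
    T = Xc @ P                                  # PCA scores
    q, *_ = np.linalg.lstsq(T, yc, rcond=None)  # least squares on the scores
    b = P @ q                                   # one coefficient per original variable
    return b, x_mean, y_mean

def pcr_da_predict(X, b, x_mean, y_mean, threshold=0.5):
    """Predicted membership value and a class call (the threshold is an assumption)."""
    y_hat = (np.asarray(X, dtype=float) - x_mean) @ b + y_mean
    return y_hat, y_hat > threshold

# Hypothetical data: 4 samples without and 5 with accelerant, 30 variables.
rng = np.random.default_rng(4)
clean = rng.normal(0.0, 0.1, size=(4, 30))
doped = rng.normal(0.0, 0.1, size=(5, 30))
doped[:, :5] += 1.0                                       # made-up accelerant features
X = np.vstack([clean, doped])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)   # "w/ accelerant" column of Y
b, x_mean, y_mean = pcr_da_fit(X, y, n_components=2)
y_hat, is_member = pcr_da_predict(X, b, x_mean, y_mean)
print(y_hat.round(2), is_member)
```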
Slide 18: Regression Vector
[Figure: regression vector map. Red: positive coefficients; blue: negative coefficients.]
Slide 19: Regression Vector Zoom
[Figure: zoomed view of the regression vector map.]
Slide 20: Principal Components Regression Predictions
[Figure: sample scores on the regression vector (about -0.2 to 1.8) for samples 1-17, grouped as unaltered carpet, arson debris, and gasoline.]
Discriminant analysis: 1 = member of the arson class.
Slide 21: Classification: Soft Independent Modeling of Class Analogy (SIMCA)
[Figure: a separate PCA model (PC 1, PC 2) for each class in variable space (variables 1, 2, 3), with Q and T2 distances.]
Slide 22: SIMCA Model Specifics
- PCA models built for 2 classes: arson and not arson
- Each model had 2 PCs, capturing 99% of variance
- One random sample per class left out; all gasoline samples left out of the training set
- Left-out samples projected onto each model later
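SIMCA fits one PCA model per class and asks, for each new sample, which class models it fits; a sample can fit one class, several, or none (as a gasoline sample might). This is a minimal sketch under loud assumptions: the class names and data are hypothetical, only Q is used, and the acceptance limit (three times the largest training Q) is a made-up rule of thumb, not the statistical limits used by real SIMCA software.

```python
import numpy as np

class SimcaClass:
    """PCA model of a single class; SIMCA builds one of these per class."""

    def __init__(self, X, n_components, limit_factor=3.0):
        X = np.asarray(X, dtype=float)
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_components].T
        # Hypothetical acceptance limit: a multiple of the largest training Q.
        # Real SIMCA implementations use statistical confidence limits instead.
        self.q_limit = limit_factor * self.q(X).max()

    def q(self, X):
        """Q residual: squared distance of each sample from this class's PC subspace."""
        Xc = np.atleast_2d(X) - self.mean
        E = Xc - Xc @ self.P @ self.P.T
        return np.sum(E ** 2, axis=1)

def simca_classify(models, X):
    """Return, per sample, the list of accepting classes and the nearest class."""
    names = list(models)
    # Reduced Q: Q divided by the class limit, so <= 1 means "inside the class".
    R = np.column_stack([models[n].q(X) / models[n].q_limit for n in names])
    accepted = [[n for n, r in zip(names, row) if r <= 1.0] for row in R]
    nearest = [names[i] for i in R.argmin(axis=1)]
    return accepted, nearest

# Hypothetical data: two classes in 20 variables plus one sample unlike either.
rng = np.random.default_rng(5)
doped_center = np.zeros(20)
doped_center[:5] = 2.0
models = {
    "carpet": SimcaClass(rng.normal(0.0, 0.1, (30, 20)), n_components=2),
    "doped": SimcaClass(doped_center + rng.normal(0.0, 0.1, (30, 20)), n_components=2),
}
tests = np.vstack([
    rng.normal(0.0, 0.1, 20),                  # carpet-like test sample
    doped_center + rng.normal(0.0, 0.1, 20),   # doped-like test sample
    np.full(20, 5.0),                          # gasoline-like: fits neither model
])
accepted, nearest = simca_classify(models, tests)
print(accepted, nearest)
```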
Slide 23: Arson Case SIMCA Results
[Figure: four panels (In Carpet Class, In Doped Class, Not in any Class, Nearest Class) for carpet, doped, and gasoline samples. Legend: carpet samples, carpet test, doped samples, doped test, gasoline test.]
Slide 24: Arson Case SIMCA Fit Statistics
[Figure: T2 vs. Q residuals for the carpet class model and the doped carpet class model. Legend: carpet samples, carpet test, doped samples, doped test, gasoline test.]
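Slide 25: Parallel Factor Analysis (PARAFAC)
$$x_{ijk} = \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr} + e_{ijk}$$
[Figure: the I x J x K array X modeled with loading matrices A (I x R), B (J x R), and C (K x R) and a superdiagonal core G; equivalently, a sum of R outer products of a_r, b_r, c_r (shown for R = 3) plus residuals E.]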
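The trilinear model above is commonly fit by alternating least squares (ALS): holding two loading matrices fixed makes the third an ordinary least-squares problem on an unfolded copy of X. This is a minimal, unconstrained sketch with hypothetical helper names and synthetic Gaussian-peak data; production code would add non-negativity options, multiple random starts, and core-consistency checks.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: (J x R) and (K x R) -> (J*K x R)."""
    J, R = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def parafac_als(X, rank, n_iter=3000, tol=1e-12, seed=0):
    """Fit x_ijk ~ sum_r A[i,r] B[j,r] C[k,r] by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    # Unfold X along each mode so every update is a linear least-squares solve.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    x_norm = np.linalg.norm(X1)
    prev_err = np.inf
    for _ in range(n_iter):
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        err = np.linalg.norm(X1 - A @ khatri_rao(B, C).T) / x_norm
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return A, B, C, err

# Hypothetical two-compound stack: 5 samples, Gaussian peaks in both dimensions.
A_true = np.array([[1.0, 0.2], [0.5, 1.0], [0.8, 0.6], [0.1, 0.9], [0.7, 0.3]])
j, k = np.arange(30), np.arange(20)
B_true = np.column_stack([np.exp(-(j - 10) ** 2 / 18), np.exp(-(j - 18) ** 2 / 18)])
C_true = np.column_stack([np.exp(-(k - 6) ** 2 / 8), np.exp(-(k - 12) ** 2 / 8)])
X = np.einsum("ir,jr,kr->ijk", A_true, B_true, C_true)
A, B, C, rel_err = parafac_als(X, rank=2)
print(f"relative error: {rel_err:.1e}")
```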
Slide 26: Parallel Factor Analysis (PARAFAC)
[Figure: PARAFAC of a GC x GC-FID chromatogram stack X (I x J x K). Each factor (Factor 1, Factor 2, ...) yields a score per sample and loading profiles along the first and second dimensions.]
Slide 27: Parallel Factor Analysis (PARAFAC), GC x GC-TOF
Sinha, A. E.; Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Journal of Chromatography A 2004, 1027, 269-277.
Slide 28: Parallel Factor Analysis (PARAFAC)
[Figure: PARAFAC of a single GC x GC-TOF sample X (first dimension x second dimension x m/z). Each factor (Factor 1, Factor 2, ...) yields a mass spectrum along the m/z mode and loading profiles along the first and second dimensions.]
Slide 29: Parallel Factor Analysis (PARAFAC), GC x GC-TOF: Complex Environmental Sample
Sinha, A. E.; Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Journal of Chromatography A 2004, 1027, 269-277.
Slide 30: PARAFAC Results
Sinha, A. E.; Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Journal of Chromatography A 2004, 1027, 269-277.
Slide 31: PARAFAC Results
Sinha, A. E.; Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Journal of Chromatography A 2004, 1027, 269-277.
Slide 32: GCImage Screen Capture
GC x GC/MS peak deconvolution: PARAFAC?
NIJ0221: 100 µg 75 Wx gasoline / nylon carpet matrix
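Slide 33: Partial Least Squares (PLS)
$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf{T}} + \mathbf{E}, \qquad \mathbf{Y} = \mathbf{T}\mathbf{Q}^{\mathsf{T}} + \mathbf{F}$$
X: data (samples x variables); Y: properties (samples x properties); T: scores on the latent variables shared by both blocks; P, Q: loadings; E, F: residuals (data = model + residuals).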
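For a single property y (for example, a naphthalene concentration), the two-block model can be fit one latent variable at a time. This is a minimal PLS1 sketch using NIPALS-style deflation, not necessarily the algorithm used in the cited jet-fuel work; `pls1_nipals` and the synthetic data are hypothetical. With as many latent variables as (full-rank) variables, PLS1 reduces to ordinary least squares, which the usage example exploits as a sanity check.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """PLS1 by NIPALS. Returns regression vector b and centering terms,
    so predictions are y_hat = (x - x_mean) @ b + y_mean."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean           # residual blocks, deflated each step
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))    # weights
    P = np.zeros((n_vars, n_components))    # X loadings
    q = np.zeros(n_components)              # y loadings
    for a in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)              # weight: direction of max covariance with y
        t = E @ w                           # scores on this latent variable
        tt = t @ t
        P[:, a] = E.T @ t / tt
        q[a] = f @ t / tt
        E = E - np.outer(t, P[:, a])        # remove what this latent variable explains
        f = f - q[a] * t
        W[:, a] = w
    b = W @ np.linalg.solve(P.T @ W, q)     # regression vector in variable space
    return b, x_mean, y_mean

# Hypothetical calibration set: 20 samples, 5 variables, exact linear property.
rng = np.random.default_rng(6)
X = rng.normal(size=(20, 5))
beta = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
y = X @ beta + 3.0
b, x_mean, y_mean = pls1_nipals(X, y, n_components=5)
y_hat = (X - x_mean) @ b + y_mean
print(np.round(b, 3))
```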
Slide 34: PLS Results: Naphthalenes in Jet Fuel
Johnson, K. J.; Prazen, B. J.; Young, D. C.; Synovec, R. E. Journal of Separation Science 2004, 27, 410-416.
Slide 35: Alignment Strategy 1
Alignment Strategy 2
Alignment Strategy 3
Slide 36: Alignment Strategy 4
Piecewise Correlation Maximization
Pierce, K. M.; Wood, L. F.; Wright, B. W.; Synovec, R. E. Analytical Chemistry 2005, 77, 7735-7743.
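The piecewise idea can be illustrated along one retention axis: cut the target chromatogram into windows and, for each window, pick the integer shift of the sample that maximizes the correlation coefficient. This is a simplified sketch inspired by the slide, not a reproduction of the Pierce et al. algorithm; `piecewise_align`, the window size, the edge padding, and the synthetic peaks are all assumptions, and sample and target are assumed to have equal length.

```python
import numpy as np

def piecewise_align(sample, target, window, max_shift):
    """Shift each window of `sample` by the integer offset (within +/- max_shift)
    that maximizes its correlation with the same window of `target`."""
    sample = np.asarray(sample, dtype=float)
    target = np.asarray(target, dtype=float)
    padded = np.pad(sample, max_shift, mode="edge")  # keeps every candidate shift in bounds
    aligned = np.empty_like(target)
    shifts = []
    for start in range(0, target.size, window):
        stop = min(start + window, target.size)
        tgt = target[start:stop]
        best_shift, best_r = 0, -np.inf
        for s in range(-max_shift, max_shift + 1):
            seg = padded[start + s + max_shift : stop + s + max_shift]
            if seg.std() == 0 or tgt.std() == 0:
                continue  # correlation is undefined for flat segments
            r = np.corrcoef(seg, tgt)[0, 1]
            if r > best_r:
                best_shift, best_r = s, r
        aligned[start:stop] = padded[start + best_shift + max_shift : stop + best_shift + max_shift]
        shifts.append(best_shift)
    return aligned, shifts

# Hypothetical chromatograms: the sample's peaks elute 3 points later than the target's.
x = np.arange(100)
target = np.exp(-(x - 30) ** 2 / 18) + 0.6 * np.exp(-(x - 70) ** 2 / 18)
sample = np.exp(-(x - 33) ** 2 / 18) + 0.6 * np.exp(-(x - 73) ** 2 / 18)
aligned, shifts = piecewise_align(sample, target, window=25, max_shift=5)
print(shifts)
```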
Slide 37: Alignment Strategy 5
Kaczmarek, K.; Walczak, B.; de Jong, S.; Vandeginste, B. G. M. Journal of Chemical Information and Computer Sciences 2003, 43, 978-986.
Slide 38: Alignment Strategy Proposal 1
Slide 39: Alignment Strategy Proposal 1
Slide 40: Alignment Strategy Proposal 2
- 2nd dimension: piecewise
- 1st dimension: DTW (alkanes?)
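Dynamic time warping (DTW) aligns two profiles by finding the monotonic pairing of points with the lowest total cost, which allows local stretching and compression of the retention axis rather than a single shift. This is a minimal textbook DTW sketch; applying it to a first-dimension profile (for example, one built from alkane marker peaks) is an assumption about how the proposal might be realized, and `dtw_path` is a hypothetical helper.

```python
import numpy as np

def dtw_path(a, b):
    """Classic dynamic time warping between two 1D profiles.

    Returns the total squared-difference cost and the optimal warping path
    as a list of (index_in_a, index_in_b) pairs.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = a.size, b.size
    D = np.full((n + 1, m + 1), np.inf)  # cumulative cost, padded with an inf border
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = {(i - 1, j - 1): D[i - 1, j - 1], (i - 1, j): D[i - 1, j], (i, j - 1): D[i, j - 1]}
        i, j = min(steps, key=steps.get)
        path.append((i - 1, j - 1))
    return D[n, m], path[::-1]

# Hypothetical profiles: b is a with its first point held one step longer.
a = [0, 1, 2, 3, 2, 1, 0]
b = [0, 0, 1, 2, 3, 2, 1, 0]
cost, path = dtw_path(a, b)
print(cost)  # 0.0
```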
Slide 41: Humble Opinions
- GC x GC produces tremendously interesting data
- A tremendous amount of work is possible, even with data that already exist
- Good alignment will open up even more possibilities
- Include the chemist in the analysis
- Include the chemometrician in the experimental design
Slide 42: Future?
- More PCA, PCR, PLS, PARAFAC
- Regression certainty calculations
- NPLS, NPLS-DA
- Holistic, automatic alignment strategies
  - 2D COW or DTW?
  - PARAFAC2?
- User-driven alignment strategies
  - Anchor warping
- Inclusion of the m/z axis
  - Purity, CODA?
Slide 43: Acknowledgements
- U.S. Coast Guard Academy
- Alexander Trust
- You all!