INVESTIGATION OF MAIN CONTAMINATION SOURCES OF HEAVY METAL IONS IN FISH, SEDIMENTS, AND WATERS FROM CATALONIA RIVERS USING DIFFERENT MULTIWAY DATA ANALYSIS METHODS

Slides: 62
Provided by: RTau
1
INVESTIGATION OF MAIN CONTAMINATION SOURCES OF
HEAVY METAL IONS IN FISH, SEDIMENTS, AND WATERS
FROM CATALONIA RIVERS USING DIFFERENT MULTIWAY
DATA ANALYSIS METHODS
  • Emma Peré-Trepat (1) and Romà Tauler (2)
  • (1) Dept. of Analytical Chemistry, Universitat de
    Barcelona, Diagonal 647, 08028 Barcelona, Spain
  • (2) IIQAB-CSIC, Jordi Girona 18-26, 08034
    Barcelona, Spain
  • e-mail: rtaqam_at_iiqab.csic.es

2
  • Outline
  • Introduction and motivations of this work
  • Environmental data tables and chemometrics models
    and methods
  • Example of application: metal contamination
    sources in fish, sediment and surface water river
    samples
  • Conclusions

3
  • Introduction and motivations of this work
  • Pollution and toxic chemical compounds are a
    threat to the environment and to health, and
    call for urgent measures and actions
  • Environmental monitoring studies produce huge
    amounts of multivariate data ordered in large
    data tables (data matrices)
  • The bottleneck in the study of these
    environmental data tables is their analysis and
    interpretation
  • There is a need for chemometric analysis
    (statistical and numerical analysis of
    multivariate chemical data) of these data tables!

4
  • What kind of information can be obtained from
    chemometric analysis of environmental
    multivariate data tables?
  • Detection, identification, interpretation and
    resolution of the main sources of contamination
  • Distribution of these contamination sources in
    the environment: geographically, temporally, by
    environmental compartment (air, water, sediments,
    biota, ...)
  • Distinction between point and diffuse
    contamination sources
  • Quantitative apportionment of these sources

5
  • Introduction and motivations of this work
  • In this work, different chemometric multiway data
    analysis methods are compared for the resolution
    of the environmental sources of 11 metal ions in
    17 river samples of fish, sediment and water
    taken at the same site locations in Catalonia
    (NE Spain).
  • Two-way bilinear model based methods:
  • MA-PCA: Matrix Augmentation Principal Component
    Analysis
  • MA-MCR-ALS: Matrix Augmentation Multivariate Curve
    Resolution Alternating Least Squares
  • Three-way trilinear model based methods:
  • PARAFAC
  • TUCKER3
  • MCR-ALS trilinear
  • MCR-ALS TUCKER3

6
  • Introduction and motivations of this work
  • Special attention will be paid to:
  • Finding ways to compare results obtained using
    bilinear and trilinear models for three-way data:
    getting profiles in three modes from bilinear
    models of three-way data
  • Adaptation of MCR-ALS to the fulfillment of
    PARAFAC and TUCKER3 trilinear models
  • Reliability of solutions: calculation of the
    boundaries of the bands of feasible solutions
  • Integration of Geostatistics and Chemometrics in
    the investigation of environmental data

7
  • Outline
  • Introduction and motivations of this work
  • Environmental data tables and chemometrics models
    and methods
  • Example of application: metal contamination
    sources in fish, sediment and river surface water
    samples
  • Conclusions

8
Environmental data tables (two-way data)
[Figure: data table or matrix X with I samples (rows) and J variables (columns: conc. of chemicals, physical properties, biological properties, other ...). Entries may be missing (m) or below the limit of detection (<LOD). Shown alongside: plot of variables (columns) and plot of samples (rows).]
9
Environmental three-way data sets
Measured data usually consist of concentrations of
different chemical compounds (variables) measured
in different samples at different
times/situations/conditions/compartments. Data
are ordered in a two-way or in a three-way data
table according to their structure.
  • Three measurement modes
  • variables mode
  • samples mode
  • times/situations/conditions/compartments mode

[Figure: 3-way data set with axes samples, variables (conc. of chemical compounds) and time/compartment]
10
Chemometric models to describe environmental
measurements
  • Models for what?
  • Models for
  • identification of contamination sources?
  • exploration of contamination sources?
  • interpretation of contamination sources?
  • resolution of environmental source?
  • apportionment/quantitation of environmental
    source?
  • ??????..............................

11
Chemometric models to describe environmental
measurements
Bilinear models for two way data
[Figure: data matrix D (I × J) with elements dij]
$$ d_{ij} = \sum_{n=1}^{N} x_{in}\, y_{nj} + e_{ij} $$
dij is the concentration of chemical contaminant j in sample i
n = 1,...,N are a reduced number of independent environmental sources
xin is the amount of source n in sample i
ynj is the amount of contaminant j in source n
eij is the part of dij not explained by the N sources
12
Chemometric models to describe environmental
measurements
Bilinear models for two way data
D (I × J) = X (I × N) YT (N × J) + E (I × J), with N << I or J
PCA: X orthogonal, YT orthonormal; YT in the
direction of maximum variance. Unique solutions
but without physical meaning. Identification and
interpretation!
MCR-ALS: X and YT non-negative; X or YT
normalization; other constraints (unimodality,
local rank, ...). Non-unique solutions but with
physical meaning. Resolution and apportionment!
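The contrast between the two bilinear decompositions can be sketched on synthetic data. This is a minimal illustration, not the authors' implementation: the function names, the starting estimate taken from absolute PCA loadings, the use of simple clipping for non-negativity, and the fixed iteration count are all assumptions made for the example.

```python
import numpy as np

def pca_bilinear(D, n):
    """PCA without mean-centring via truncated SVD: D ~ X @ Yt."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    X = U[:, :n] * s[:n]        # orthogonal scores
    Yt = Vt[:n]                 # orthonormal loadings
    return X, Yt

def mcr_als(D, n, n_iter=200):
    """Toy MCR-ALS: alternating least squares with clipping to >= 0."""
    # Assumed initial estimate: absolute values of the PCA loadings
    Yt = np.abs(pca_bilinear(D, n)[1])
    for _ in range(n_iter):
        X = np.clip(D @ np.linalg.pinv(Yt), 0, None)    # non-negative scores
        Yt = np.clip(np.linalg.pinv(X) @ D, 0, None)    # non-negative loadings
        norms = np.linalg.norm(Yt, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        Yt = Yt / norms                                 # YT normalization
    X = np.clip(D @ np.linalg.pinv(Yt), 0, None)
    return X, Yt

def explained_variance(D, X, Yt):
    """Percentage of the sum of squares of D reproduced by X @ Yt."""
    return 100 * (1 - np.sum((D - X @ Yt) ** 2) / np.sum(D ** 2))

rng = np.random.default_rng(1)
D = rng.random((17, 2)) @ rng.random((2, 11))   # synthetic non-negative rank-2 data
Xp, Ytp = pca_bilinear(D, 2)    # loadings may contain negative values
Xm, Ytm = mcr_als(D, 2)         # both profiles are non-negative
```

The PCA loadings are orthonormal by construction, while the MCR-ALS profiles trade that property for non-negativity, which is what makes them readable as source compositions.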
13
Chemometric models to describe environmental
measurements
Extension of bilinear models for simultaneous
analysis of multiple two way data sets
Individual analysis: Dk (I × J) = Xk (I, n) YT (n, J) for each data set k
Matrix augmentation strategy: Daug = [D1; D2; ...; DK] = Xaug YT, with Xaug = [X1; X2; ...; XK] and a common YT (n, J)
PCA: orthogonality, max. variance
MCR: non-negativity, natural constraints
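The matrix augmentation strategy amounts to stacking the individual data matrices on top of each other so that they share one loadings matrix. A minimal sketch with synthetic blocks (sizes and variable names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, N = 17, 11, 2                                  # sites, variables, components

Yt = rng.random((N, J))                              # common loadings YT (N x J)
X_blocks = [rng.random((I, N)) for _ in range(3)]    # one scores matrix Xk per data set
D_blocks = [Xk @ Yt for Xk in X_blocks]              # Dk = Xk YT

D_aug = np.vstack(D_blocks)     # column-wise augmented data matrix (3I x J)
X_aug = np.vstack(X_blocks)     # augmented scores matrix (3I x N)
```

Because all blocks share YT, a single bilinear decomposition of D_aug recovers a different scores block for each data set.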
14
Environmental data sets
15
Chemometric models to describe environmental
measurements
Trilinear models for three-way data
$$ d_{ijk} = \sum_{n=1}^{N} x_{in}\, y_{nj}\, z_{nk} + e_{ijk} $$
dijk is the concentration of chemical contaminant j in sample i at time (condition) k
n = 1,...,N are a reduced number of independent environmental sources
xin is the amount of source n in sample i
ynj is the amount of contaminant j in source n
znk is the contribution of source n to compartment k
eijk is the part of dijk not explained by the N sources
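The trilinear model can be written out directly as a sum over sources. The sketch below builds a noise-free array from random profiles; storing each profile matrix with one column per source is a layout choice made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K, N = 17, 11, 3, 2       # samples, contaminants, compartments, sources

X = rng.random((I, N))          # x_in: amount of source n in sample i
Y = rng.random((J, N))          # y_nj, stored as (J x N)
Z = rng.random((K, N))          # z_nk, stored as (K x N)

# d_ijk = sum_n x_in * y_nj * z_nk (residual term omitted)
D = np.einsum("in,jn,kn->ijk", X, Y, Z)

# Each slice k is a bilinear model whose scores are X rescaled by z_nk
k = 1
slice_k = X @ np.diag(Z[k]) @ Y.T
```

The last two lines show why the trilinear model is a restricted bilinear model: every compartment slice uses the same sample and contaminant profiles and differs only by the scaling in Z.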
16
Three Way data models
17
PARAFAC (trilinear model)
The same number of components in the three modes:
Ni = Nj = Nk = N
No interactions between components
Different slices Xk are decomposed into
bilinear profiles having the same shape!
18
Tucker3 models
In PARAFAC, Ni = Nj = Nk = N and the core array G is a
superdiagonal identity cube
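The relation between the two models can be checked numerically: a Tucker3 reconstruction with a superdiagonal identity core reduces to the PARAFAC sum. A minimal sketch, where the function name and array sizes are assumptions for illustration:

```python
import numpy as np

def tucker3(G, X, Y, Z):
    """Tucker3 reconstruction: d_ijk = sum_{p,q,r} g_pqr x_ip y_jq z_kr."""
    return np.einsum("pqr,ip,jq,kr->ijk", G, X, Y, Z)

rng = np.random.default_rng(4)
I, J, K = 17, 11, 3

# Tucker3 with a different number of components per mode, e.g. [1 2 2]
G = rng.random((1, 2, 2))
D_tucker = tucker3(G, rng.random((I, 1)), rng.random((J, 2)), rng.random((K, 2)))

# PARAFAC as the special case with a superdiagonal identity core
N = 2
X, Y, Z = rng.random((I, N)), rng.random((J, N)), rng.random((K, N))
G_id = np.zeros((N, N, N))
G_id[np.arange(N), np.arange(N), np.arange(N)] = 1.0
D_parafac = np.einsum("in,jn,kn->ijk", X, Y, Z)
```

Off-diagonal entries of G are what allow components of different modes to interact; setting them to zero removes those interactions.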
19
Guidelines for method selection (resolution
purposes)
Deviations from trilinearity: mild, medium, strong
Array size: small, medium, large
Methods placed along these axes: PARAFAC, PARAFAC2, TUCKER, MCR, PCA, SVD, ...
Journal of Chemometrics, 2001, 15, 749-771
20
INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS
(Geographical Information Systems, GIS)
21
  • Outline
  • Introduction and motivations of this work
  • Environmental data tables
  • Chemometrics bilinear and trilinear models and
    methods
  • Example of application: metal contamination
    sources in fish, sediment and river surface water
    samples
  • Conclusions

22
METAL CONTAMINATION SOURCES IN SEDIMENTS, FISH
AND WATERS FROM CATALONIA RIVERS USING MULTIWAY
DATA ANALYSIS METHODS
Emma Peré-Trepat (UB), Mónica Flo, Montserrat Muñoz, Antoni Ginebreda
(ACA), Marta Terrado, Romà Tauler (CSIC)
[Map of the sampling sites; labels: France, Pyrenees, Aragón, Barcelona, Mediterranean Sea]
1. RIU MUGA, Castelló d'Empúries, J052
2. RIU FLUVIÀ, Besalú, J022
3. RIU FLUVIÀ, L'Armentera, J011
4. RIU TER, Manlleu, J034
5. RIU TERRI, Sant Julià de Ramis, J028
6. RIU TER, Colomers, J112
7. RIU TORDERA, Fogars de Tordera, J062
8. RIU CONGOST, La Garriga, J037
9. RIU LLOBREGAT, El Pont de Vilomara, J031
10. RIU CARDENER, Castellgalí, J002
11. RIU LLOBREGAT, Abrera, J084
12. RIU LLOBREGAT, Martorell, J005
13. RIU LLOBREGAT, Sant Joan Despí, J049
14. RIU FOIX, Castellet, J008
15. RIU FRANCOLÍ, La Masó, J059
16. RIU EBRE, Flix, J056
17. RIU SEGRE, Térmens, J207
17 river sites, 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe,
Mn, Ni, Pb, Zn), 3 environmental compartments:
fish (barb, bagra comuna, bleak, carp and
trout), sediment and water samples
23
  • Missing data (m)
  • Unknown values produce empty holes in data
    matrices
  • When they are few and evenly distributed, they
    may be estimated by PCA imputation (or another
    method)
  • Values below the LOD (<LOD)
  • This is a common problem in environmental data
    tables
  • If most of the values are below LOD, data
    matrices are sparse
  • For calculations, it is better either to use
    the experimental values or to set them to LOD/2,
    instead of to zero or to LOD
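Both treatments can be sketched in a few lines. This is an illustrative version only: the function names are hypothetical, missing values are assumed to be coded as NaN, and the imputation shown is a simple iterative truncated-SVD scheme, which is one common way to do PCA imputation and not necessarily the one used in this work.

```python
import numpy as np

def replace_below_lod(D, lod):
    """Set values below the per-variable LOD to LOD/2 (NaN entries are left alone)."""
    D = np.array(D, dtype=float)
    lod = np.broadcast_to(np.asarray(lod, dtype=float), D.shape)
    below = D < lod                 # comparisons with NaN are False
    D[below] = lod[below] / 2
    return D

def pca_impute(D, n_components=2, n_iter=100):
    """Fill NaN entries by iterating a truncated-SVD (PCA) reconstruction."""
    D = np.array(D, dtype=float)
    missing = np.isnan(D)
    filled = np.where(missing, np.nanmean(D, axis=0), D)   # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        model = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        filled[missing] = model[missing]                   # update only the holes
    return filled

# Usage on a small synthetic table
raw = np.array([[1.0, 0.001], [2.0, 0.5]])
clean = replace_below_lod(raw, lod=[0.1, 0.01])

full = np.outer(np.arange(1.0, 7.0), [1.0, 2.0, 3.0])
holed = full.copy()
holed[2, 1] = np.nan
imputed = pca_impute(holed, n_components=1)
```

Only the missing cells are ever overwritten, so the measured values stay exactly as reported.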

24
  • Preliminary data description: use of descriptive
    statistics
  • Individual sample plots
  • Individual variable plots
  • Descriptive statistics (Excel Statistics)
  • Histograms/Box plots
  • Binary correlation between variables
  • ...

[Figure: box plot with outliers, upper whisker, upper quartile, median, lower quartile and lower whisker marked]
25
Effect of different data pre-treatments: sediment
samples
[Figure: panels for raw, mean-centred, auto-scaled and scaled data; variables As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn; Mo is eliminated]
26
Data Pretreatment
  • No mean-centering was applied, to allow an
    improved physical interpretation of factors
    (application of non-negativity constraints
    instead of orthogonality constraints) and the
    comparison of results with MCR-ALS methods
  • Two scaling possibilities:
  • First data matrix augmentation, then column
    scaling to equal variance (each column element
    divided by its standard deviation)
  • First column scaling of each data matrix
    separately, then data matrix augmentation
  • Variables with nearly no changes and with values
    equal or close to their limit of detection were
    excluded from scaling and divided by 20 instead
    (to avoid overweighting them)
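The two scaling orders give different augmented matrices, which a short sketch makes concrete. The helper name, the synthetic blocks and the column indices chosen for downweighting are assumptions for illustration; only the rules themselves (no centering, division by the standard deviation, division by 20 for near-constant variables) come from the slide.

```python
import numpy as np

def scale_columns(D, downweight_cols=(), factor=20.0):
    """Divide each column by its standard deviation, without mean-centring.

    Columns listed in downweight_cols are divided by a fixed factor instead.
    """
    D = np.asarray(D, dtype=float)
    scale = D.std(axis=0, ddof=1)
    scale[np.asarray(downweight_cols, dtype=int)] = factor
    return D / scale

rng = np.random.default_rng(5)
fish, sediment, water = (rng.random((17, 11)) + 0.1 for _ in range(3))

# Possibility 1: augment first, then scale the augmented columns
option1 = scale_columns(np.vstack([fish, sediment, water]))

# Possibility 2: scale each matrix separately, then augment
option2 = np.vstack([scale_columns(m) for m in (fish, sediment, water)])

# Downweighting instead of scaling for two hypothetical water variables
water_scaled = scale_columns(water, downweight_cols=[2, 3])
```

With the first option the compartments keep their relative magnitudes within each metal; with the second, every metal has unit variance inside every compartment.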

27
Description of scaled data: metal distribution in
the three compartments
[Figure: scaled data plotted by metals (variables)]
Cd, Co and Ld in water were not scaled, only
downweighted
28
Description of scaled data: different sites in
the three compartments
[Figure: scaled data plotted by sample sites; site labels: Llobregat, Tordera, Segre, Ter, Foix, Llobregat, Congost, Cardener, Fluvià, Llobregat, Muga, Terri, Ebre, Francolí, Ter, Fluvià, Llobregat]
29
[Figure: variables As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn]
30
SVD of augmented data matrices in the
three directions (Fish, Sediment, Water)
THREE-WAY DATA ARRAY MATRICIZING or MATRIX
AUGMENTATION
[Figure: singular values (y axis 0 to 45) versus component number (x axis 0 to 10) for svd column-wise (variables), svd row-wise (samples) and svd tube-wise (type); the 2nd component is marked]
How many components are needed to explain each
mode?
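The singular values of the three augmented (unfolded) matrices can be computed as follows. This is a sketch under assumptions: the array is taken as (sites × metals × compartments), the mapping of the labels "column-wise", "row-wise" and "tube-wise" to the three unfoldings is inferred from the slide, and synthetic trilinear data stand in for the real measurements.

```python
import numpy as np

def mode_singular_values(D):
    """Singular values of the three augmented matrices of an (I, J, K) array."""
    I, J, K = D.shape
    unfoldings = {
        "column-wise (variables)": D.transpose(2, 0, 1).reshape(K * I, J),
        "row-wise (samples)": D.reshape(I, J * K),
        "tube-wise (type)": D.transpose(2, 0, 1).reshape(K, I * J),
    }
    return {name: np.linalg.svd(M, compute_uv=False)
            for name, M in unfoldings.items()}

rng = np.random.default_rng(6)
X, Y, Z = rng.random((17, 2)), rng.random((11, 2)), rng.random((3, 2))
D = np.einsum("in,jn,kn->ijk", X, Y, Z)     # synthetic two-source trilinear array

sv = mode_singular_values(D)
for name, values in sv.items():
    print(name, np.round(values[:4], 3))
```

Comparing how quickly the singular values drop in each direction gives a first estimate of the number of components needed per mode, which matters when choosing between PARAFAC and Tucker3 models.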
31
Bilinear modelling of three-way data (matrix
augmentation or matricizing, stretching,
unfolding)
MA-PCA, MA-MCR-ALS
[Diagram: the augmented data matrix Daug (fish F, sediment S and water W blocks, each sites × contaminants) is decomposed into the augmented scores matrix Xaug (F, S and W blocks of site scores) and the loadings Y (contaminants)]
32
Explained variances using bilinear models
(profiles in two modes)
33
MA-PCA of scaled data without scores refolding
[Figure: MA-PCA profiles (x axis 0 to 50); annotations: water samples; sediment and fish samples; Ba, As, Cu, Zn; water soluble metal ions]
34
MA-MCR-ALS of scaled data with nn and without
scores refolding
[Figure: two panels of MA-MCR-ALS profiles (y axis 0 to 10, x axis 0 to 50); annotations: sediment and fish samples (Ba, Zn, Cu, As); water samples]
More easily interpretable!
MA-MCR-ALS MA-PCA
35
Calculation of the boundaries of feasible band
solutions (Journal of Chemometrics, 2001, 15,
627-646)
[Figure: max and min boundaries of the feasible bands]
Nearly no rotation ambiguities are present in
non-negative environmental profiles calculated by
MCR-ALS (very different from spectroscopy!)
36
Bilinear modelling of three-way data (matrix
augmentation or matricizing, stretching,
unfolding)
Scores refolding strategy (applied only to the
final augmented scores)
Loadings recalculation in two modes from
augmented scores
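Scores refolding can be sketched as follows: each column of the final augmented scores is folded into a sites × compartments matrix, and its first singular vectors give one profile per mode. The function name, the block ordering of the augmented scores (one block of sites per compartment) and the sign convention are assumptions for the example.

```python
import numpy as np

def refold_scores(X_aug, n_sites, n_comp):
    """Recover site and compartment profiles from augmented scores.

    X_aug has n_comp blocks of n_sites rows stacked on top of each other.
    """
    N = X_aug.shape[1]
    sites = np.zeros((n_sites, N))
    comps = np.zeros((n_comp, N))
    for n in range(N):
        M = X_aug[:, n].reshape(n_comp, n_sites).T      # fold: (sites x compartments)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        x, z = U[:, 0], s[0] * Vt[0]                    # best rank-1 description
        if x.sum() < 0:                                 # fix the SVD sign ambiguity
            x, z = -x, -z
        sites[:, n], comps[:, n] = x, z
    return sites, comps

# Usage: one augmented score column that is exactly trilinear
rng = np.random.default_rng(7)
x_true, z_true = rng.random(17), rng.random(3)
X_aug = np.concatenate([zk * x_true for zk in z_true]).reshape(-1, 1)
sites, comps = refold_scores(X_aug, n_sites=17, n_comp=3)
```

Together with the loadings from the bilinear model, this yields profiles in three modes that can be set side by side with PARAFAC or Tucker3 results.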
37
Explained variances using trilinear models
(profiles in three modes)
38
MA-PCA of scaled data with nn and scores refolding
Little differences in samples mode!!!
MA-PCA refolding MA-PCA
39
MA-MCR-ALS of scaled data with scores refolding
MA-MCR-ALS refolding MA-MCR-ALS
40
Trilinear modelling of three-way data
41
PARAFAC of scaled data
PARAFAC MA-PCA (bilinear)
42
(No Transcript)
43
MA-MCR-ALS Trilinear constraint
[Diagram: MCR-ALS decomposition of the augmented data matrix D (fish F, sediment S and water W blocks, each sites × contaminants) into augmented scores Xaug and loadings Y (contaminants); the augmented scores are folded into X (sites) and Z (compartments F, S, W)]
TRILINEARITY CONSTRAINT (ALS iteration step):
selection of species profile, folding, SVD,
rebuilding augmented scores, substitution of
species profile
This constraint is applied at each step of the
ALS optimization and independently for each
component individually
Every augmented score wanted to follow the
trilinear model is refolded
Loadings recalculation in two modes from
augmented scores
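One pass of the trilinearity constraint can be sketched as a function applied to the augmented scores inside each ALS iteration. The function name and the block ordering of the augmented scores are assumptions; the sequence of steps (select, fold, SVD, rebuild, substitute) follows the slide.

```python
import numpy as np

def trilinearity_constraint(X_aug, n_sites, n_comp):
    """Force every augmented score column to follow the trilinear model."""
    X_new = np.empty_like(X_aug, dtype=float)
    for n in range(X_aug.shape[1]):
        column = X_aug[:, n]                            # selection of species profile
        M = column.reshape(n_comp, n_sites).T           # folding: (sites x compartments)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        rank1 = s[0] * np.outer(U[:, 0], Vt[0])         # first SVD component only
        X_new[:, n] = rank1.T.reshape(-1)               # rebuild and substitute
    return X_new

rng = np.random.default_rng(8)
X_aug = rng.random((3 * 17, 2))                         # unconstrained augmented scores
X_tri = trilinearity_constraint(X_aug, n_sites=17, n_comp=3)
```

Since each column is handled separately, the constraint could be applied to some components only, which is how intermediate situations between bilinear and trilinear models can be set up.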
44
MA-MCR-ALS of scaled data with nn, trilinearity
(without scores refolding)
MA-MCR-ALS nn trilinear MA-MCR-ALS nn
45
Calculation of the boundaries of feasible band
solutions (Journal of Chemometrics, 2001, 15,
627-646)
No rotation ambiguities are present in trilinear
non-negative environmental profiles calculated by
MCR-ALS (very different to spectroscopy!!!!!)
46
MA-MCR-ALS of scaled data with nn, trilinearity
and with scores refolding
MA-MCR-ALS nn trilinear PARAFAC nn
47
Comparison PARAFAC vs MCR-ALS (trilinearity)
48
Tucker3 modelling of three-way data
49
Tucker Models with non-negativity constraints
Models compared: [2 3 3], [3 3 3], [1 3 3], [3 2 3], [2 2 2], [2 2 3], [1 2 2], [1 2 3]
Parsimonious model: [1 2 2]
50
Tucker3 of scaled data
TUCKER3 (model [1 2 2]) and PARAFAC (model [2 2 2])
51
MA-MCR-ALS Tucker3 constraint
[Diagram: MCR-ALS decomposition of the augmented data matrix D (fish F, sediment S and water W blocks, each sites × metals) into augmented scores Xaug and loadings Y (metals); the augmented scores are folded into X (sites) and Z (compartments F, S, W)]
Tucker3 CONSTRAINT (ALS iteration step): folding, SVD
This constraint is applied at each step of the
ALS optimization and independently and
individually for each component i
Interacting augmented scores are folded together
Loadings recalculation in two modes from
augmented scores
52
MA-MCR-ALS of scaled data with nn, Tucker3
(without scores refolding)
[Figure: two panels of profiles (y axis 0 to 10, x axis 0 to 50): MA-MCR-ALS nn Tucker3, model [1 2 2], and MA-MCR-ALS nn PARAFAC, model [2 2 2]]
53
MA-MCR-ALS of scaled data with nn, Tucker3 and
with scores refolding
MA-MCR-ALS nn Tucker3 (model [1 2 2]) and Tucker3 (model [1 2 2])
54
Summary of Results
55
INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS
(Geographical Information Systems, GIS)
(67.3)
(13.2)
56
INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS
(Geographical Information Systems, GIS)
(67.3)
(13.2)
57
INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS
(Geographical Information Systems, GIS)
(67.3)
(13.2)
58
  • Outline
  • Introduction and motivations of this work
  • Environmental data tables
  • Chemometrics bilinear and trilinear models and
    methods
  • Example of application: metal contamination
    sources in fish, sediment and river surface water
    samples
  • Conclusions

59
Conclusions
  • Chemometric methods allow resolution of
    environmental sources of chemical contaminants.
    However, we should be aware of how every method
    displays the information, because the
    mathematical properties of the methods used are
    different (i.e. orthogonality vs non-negativity,
    bilinearity vs trilinearity, nr. of components, ...)
  • This interpretation and resolution of
    environmental sources is not easy because
    contamination sources in the real world are
    correlated and because of experimental data
    limitations (environmental sources should show
    variation in the investigated data set)
  • Bilinear PCA and MCR-ALS can be used to study
    multiway data sets and compared with multiway
    methods (like PARAFAC and Tucker) if appropriate
    scores refolding is performed
  • Bilinear non-negative MCR-ALS solutions may
    provide a good approximation of the real sources
    because non-negative environmental profiles have
    little rotation ambiguity
60
Conclusions
  • PARAFAC and Tucker3 may provide simpler models
    and they are especially useful for trilinear data
    or when not the same number of components is
    present in the different modes
  • Intermediate situations between pure bilinear
    and pure trilinear models can be easily
    implemented in MCR-ALS
  • Bilinear based models are more flexible than
    trilinear based models to resolve true sources of
    data variation
  • Different numbers of components and interactions
    between components in different modes (constraint
    under development) can be considered in mixed
    bilinear-trilinear-Tucker MA-MCR models
  • For an optimal RESOLUTION, the model should be in
    accordance with the 'true' data structure
  • Integration of Chemometrics-GIS results may
    facilitate geographical and temporal
    interpretation of contamination sources and their
    correlation with land uses, population and
    industrial activities
61
Acknowledgements
  • The Catalan Water Agency is acknowledged for its
    financial support and for providing the
    experimental data sets
  • Research grant: Project MCYT, Nr. BQU2003-00191,
    Spain