Fitting the PARAFAC model

1
Fitting the PARAFAC model

Giorgio Tomasi, Chemometrics group, LMT, MLI,
KVL, Frederiksberg, Denmark. E-mail: gt_at_kvl.dk
2
PARAFAC model

PARAFAC (PARallel FACtor analysis)
Fitting an n-linear model to an n-way array.
For a three-way array
The associated loss function is
Where
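The formulas of this slide are not in the transcript. In the usual PARAFAC notation, which is assumed here rather than taken from the slides, with loading matrices A (I x F), B (J x F) and C (K x F) and residuals e_ijk, the three-way model and its least squares loss can be written as:

```latex
x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}

L(\mathbf{A}, \mathbf{B}, \mathbf{C})
  = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K}
    \Bigl( x_{ijk} - \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} \Bigr)^{2}
```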

3
The algorithms

Direct methods
DTLD/GRAM (Direct TriLinear Decomposition /
Generalised Rank Annihilation Method)
Alternating methods
ALS (Alternating Least Squares)
ASD (Alternating Slice-wise Diagonalisation)
SWATLD (Self-Weighted Alternating Trilinear
Decomposition)
Derivative based
Levenberg-Marquardt
PMF3 (Positive Matrix Factorisation for 3 way
arrays)

4
Direct method

DTLD-GRAM (Sanchez and Kowalski, 1986)
Based on a generalised eigenvalue problem
Originally applicable only to arrays having only
two slabs in one of the modes (GRAM)
Generalised by means of a Tucker compression
(DTLD)
Advantage: quick
Shortcomings
The algorithm does not provide the solution in
terms of least squares
Sensitivity to noise
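The rank annihilation idea behind GRAM can be sketched for two noise-free slabs. This is an illustration of the eigenvalue formulation, not the published implementation: the function name `gram`, the SVD-based compression of the summed slabs, and the assumptions that every entry of c2 is non-zero and that the ratios c1/c2 are distinct are all choices made for this example.

```python
import numpy as np

def gram(X1, X2, n_factors):
    """Sketch of GRAM for X1 = A diag(c1) B^T and X2 = A diag(c2) B^T."""
    # Compress both slabs onto the dominant subspaces of their sum.
    U, _, Vt = np.linalg.svd(X1 + X2, full_matrices=False)
    U, V = U[:, :n_factors], Vt[:n_factors].T
    G1, G2 = U.T @ X1 @ V, U.T @ X2 @ V
    # Eigenvectors of G1 G2^{-1} give U^T A (up to scaling);
    # the eigenvalues are the ratios c1 / c2.
    ratios, P = np.linalg.eig(G1 @ np.linalg.inv(G2))
    ratios, P = np.real(ratios), np.real(P)
    A = U @ P
    # B, with the scale of (c1 + c2) absorbed, follows from the summed slab.
    B = (np.linalg.pinv(A) @ (X1 + X2)).T
    return A, B, ratios

# Hypothetical noise-free example with two components.
rng = np.random.default_rng(0)
A_true = rng.normal(size=(8, 2))
B_true = rng.normal(size=(7, 2))
c1, c2 = np.array([1.0, 2.0]), np.array([3.0, 1.0])
X1 = A_true @ np.diag(c1) @ B_true.T
X2 = A_true @ np.diag(c2) @ B_true.T
A_hat, B_hat, ratios = gram(X1, X2, n_factors=2)
```

No loss function is minimised anywhere in this sketch, which is consistent with the shortcoming listed above: with noisy data the eigenvalue solution is in general not the least squares one.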

5
Alternating methods - 1

The loss function is alternately minimised
with respect to one of the sets of parameters
involved
PARAFAC-ALS (Harshman, 1970; Carroll and Chang,
1970)
Well established algorithm
Several improvements have been added
(compression, line search, variable separation)
The solution is found in the least squares sense
Shortcomings
Slow convergence rate
Sensitivity to over- (and under-) factoring
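A minimal sketch of the alternating scheme for a three-way array, without any of the improvements mentioned above (compression, line search). The names `khatri_rao` and `parafac_als`, the random initialisation and the stopping rule on the relative decrease of the loss are assumptions of this example, not details taken from the slides.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product; row (p, q) holds P[p, f] * Q[q, f]."""
    return np.einsum("pf,qf->pqf", P, Q).reshape(-1, P.shape[1])

def parafac_als(X, n_factors, max_iter=500, tol=1e-10, seed=0):
    """Fit a trilinear model by ALS; returns the loadings and the loss history."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    B = rng.normal(size=(J, n_factors))
    C = rng.normal(size=(K, n_factors))
    # Unfoldings whose column order matches the rows of khatri_rao.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    history = []
    for _ in range(max_iter):
        # Each update is the least squares solution for one loading matrix
        # with the other two held fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
        history.append(np.sum((X1 - A @ khatri_rao(B, C).T) ** 2))
        # Stop on a small relative decrease of the loss.
        if len(history) > 1 and history[-2] - history[-1] <= tol * history[-2]:
            break
    return A, B, C, history

# Hypothetical noise-free rank-2 example.
rng = np.random.default_rng(42)
true = [rng.normal(size=(n, 2)) for n in (6, 5, 4)]
X = np.einsum("if,jf,kf->ijk", *true)
A, B, C, history = parafac_als(X, n_factors=2)
```

Because every sub-step solves a least squares problem exactly, the loss recorded in `history` cannot increase from one sweep to the next, apart from round-off.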

6
Alternating methods - 2

SWATLD (Chen ZP et al, 2000)
Alternates in the minimisation of three
different loss functions (one each for A, B and
C)
The solution for each step is found as
Not expressed in terms of least squares.
General properties and mechanisms have not been
studied yet.

7
Alternating methods - 3

ASD (Jiang JH et al., 2000)
Based on a modified loss function employing five
sets of parameters for a trilinear model
The solution is not expressed in terms of least
squares
is minimised and not the residuals
It includes compression based on SVD
Unknown properties

8
Derivative-based methods - 1

Based on the linearisation of the loss function
with respect to the parameters of the model.
All the parameters are unified in a single vector
Vectorisation of the 3-way array

9
Derivative-based methods - 2

Levenberg-Marquardt (Paatero, 1997; Bijlsma, 1998)
The update for vector p is found as a solution to
the system
The parameter λ makes the system matrix
positive definite and non-singular.
The solution is found in the least squares sense
provided that λ becomes small enough
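The system itself is missing from the transcript. A sketch under the standard Levenberg-Marquardt formulation, assumed here, solves (JᵀJ + λI) Δp = -Jᵀr, where p stacks the vectorised loadings, r is the vectorised residual and J its Jacobian. The function names, the factor-of-10 damping rule and the floor on λ are illustrative choices, not the settings used in the presentation.

```python
import numpy as np

def model(A, B, C):
    return np.einsum("if,jf,kf->ijk", A, B, C)

def jacobian(A, B, C):
    """Derivative of vec(model) with respect to p = [vec A; vec B; vec C]."""
    I, J, K = A.shape[0], B.shape[0], C.shape[0]
    JA = np.einsum("ip,jf,kf->ijkpf", np.eye(I), B, C).reshape(I * J * K, -1)
    JB = np.einsum("if,jp,kf->ijkpf", A, np.eye(J), C).reshape(I * J * K, -1)
    JC = np.einsum("if,jf,kp->ijkpf", A, B, np.eye(K)).reshape(I * J * K, -1)
    return np.hstack([JA, JB, JC])

def lm_step(X, A, B, C, lam):
    """One damped Gauss-Newton update; returns the trial loadings."""
    r = (model(A, B, C) - X).ravel()
    Jm = jacobian(A, B, C)
    # Damped normal equations: (J^T J + lam I) dp = -J^T r.
    dp = np.linalg.solve(Jm.T @ Jm + lam * np.eye(Jm.shape[1]), -Jm.T @ r)
    dA, dB, dC = np.split(dp, np.cumsum([A.size, B.size]))
    return A + dA.reshape(A.shape), B + dB.reshape(B.shape), C + dC.reshape(C.shape)

# Hypothetical example: start close to a known noise-free solution.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.normal(size=(n, 2)) for n in (5, 4, 3))
X = model(A0, B0, C0)
A, B, C = (M + 0.1 * rng.normal(size=M.shape) for M in (A0, B0, C0))
lam = 1.0
initial_loss = loss = np.sum((model(A, B, C) - X) ** 2)
for _ in range(50):
    trial = lm_step(X, A, B, C, lam)
    trial_loss = np.sum((model(*trial) - X) ** 2)
    if trial_loss < loss:   # accept the step and relax the damping
        (A, B, C), loss, lam = trial, trial_loss, max(lam / 10, 1e-8)
    else:                   # reject the step and damp more strongly
        lam *= 10
```

The floor on `lam` keeps the system solvable in this sketch, since JᵀJ alone is singular for this model.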

10
Derivative-based methods - 3
11
Derivative-based methods - 4

PMF3 (Paatero, 1997)
The loss function includes a penalty term
The system of normal equations is modified
accordingly
A non-linear update is calculated and used if it
provides a better solution. The right hand side
is modified into
Line search is applied whenever the algorithm
diverges

12
Compression

A Tucker3 model with
components is fitted
A PARAFAC model is fitted on the Tucker3 core
The PARAFAC solution is expanded to the original
dimensions by means of the Tucker3 loadings
The expanded matrices provide the starting values
for more expensive computations in the original
space (here always by means of PARAFAC-ALS)
To compare the effect of compression on the
computational expense, the ALS, LM and PMF3
algorithms were employed both with and without
compression
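The compress and expand steps can be sketched as follows. A truncated HOSVD stands in for the Tucker3 fit, which is an assumption of this example (the slides do not say how the Tucker3 model was computed), and the names `hosvd_compress` and `expand` are hypothetical. The PARAFAC fit on the core is not repeated here: because the example data are noise-free, the projected true loadings are used in its place.

```python
import numpy as np

def hosvd_compress(X, ranks):
    """Truncated HOSVD: orthonormal loadings per mode and the compressed core."""
    loadings = []
    for mode, r in enumerate(ranks):
        unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        loadings.append(U[:, :r])
    U, V, W = loadings
    core = np.einsum("ijk,ip,jq,kr->pqr", X, U, V, W)
    return core, (U, V, W)

def expand(core_factors, loadings):
    """Map PARAFAC loadings fitted on the core back to the original dimensions."""
    return tuple(L @ F for L, F in zip(loadings, core_factors))

# Hypothetical noise-free 20 x 20 x 20 array of rank 2.
rng = np.random.default_rng(3)
A, B, C = (rng.normal(size=(20, 2)) for _ in range(3))
X = np.einsum("if,jf,kf->ijk", A, B, C)
core, loadings = hosvd_compress(X, ranks=(2, 2, 2))
# Stand-in for the PARAFAC fit on the 2 x 2 x 2 core.
core_factors = tuple(L.T @ M for L, M in zip(loadings, (A, B, C)))
expanded = expand(core_factors, loadings)
```

In practice the expanded loadings would then be refined in the original space, as described above.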

13
PARAFAC indeterminacies

Permutational indeterminacy (trivial)
Scaling indeterminacy
The two models are equivalent so long as
The consequence is the rank deficiency of J
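The condition itself is missing from the transcript. A common way to state the scaling indeterminacy, given here as an assumption about what the slide showed, is that the columns of A, B and C can be rescaled by any non-zero factors α_f, β_f, γ_f whose product is one:

```latex
\sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf}
  = \sum_{f=1}^{F} (\alpha_f a_{if})(\beta_f b_{jf})(\gamma_f c_{kf})
  \quad \text{whenever } \alpha_f \beta_f \gamma_f = 1, \quad f = 1, \dots, F
```

Under this reading each component has two free scaling parameters, so there are 2F directions in parameter space along which the model does not change, and J loses rank accordingly.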

14
Tests

Monte Carlo simulations
720 data sets of dimension 20 x 20 x 20
Four features were varied
Rank (3 and 5)
Homoscedastic and heteroscedastic noise (3 levels
each)
Collinearity between the components (cosine 0.5
or 0.9)
Two models were fitted to each data set, with F
and F+1 components
Two real data sets: fluorescence spectra
Data set 1: 6 replicates, 15 x 66 x 15, rank 4
Data set 2: 3 replicates, 22 x 87 x 13, rank 4
Measured on solutions of four compounds, whose
concentrations were then calculated based on the
PARAFAC model
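One way to generate a simulated data set with a prescribed collinearity is sketched below. The slides do not describe the generator that was actually used, so the construction (orthonormal columns mixed by a Cholesky factor), the function name `collinear_loadings` and the 5 % homoscedastic noise level are all assumptions of this example.

```python
import numpy as np

def collinear_loadings(n_rows, n_factors, cosine, rng):
    """Random loadings whose unit-norm columns all have the given pairwise cosine."""
    target = np.full((n_factors, n_factors), cosine)
    np.fill_diagonal(target, 1.0)
    # Orthonormal columns, then mixed so that M^T M equals the target matrix.
    Q, _ = np.linalg.qr(rng.normal(size=(n_rows, n_factors)))
    return Q @ np.linalg.cholesky(target).T

rng = np.random.default_rng(7)
A, B, C = (collinear_loadings(20, 3, 0.9, rng) for _ in range(3))
signal = np.einsum("if,jf,kf->ijk", A, B, C)
# Homoscedastic noise scaled to a hypothetical 5 % of the signal norm.
noise = rng.normal(size=signal.shape)
noise *= 0.05 * np.linalg.norm(signal) / np.linalg.norm(noise)
X = signal + noise
```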

15
Initialisation and convergence

All the algorithms but DTLD were initialised
using matrices of random numbers
10 sets of loading matrices were generated with
random numbers
10 iterations of PARAFAC-ALS were run on each of
them
The best-fitting set was used as the initial value
Convergence criteria
Relative decrease in fit
Relative change of the parameters (only LM and
PMF3)
Gradient norm (only LM and PMF3)
Consecutive almost singular left-hand sides: 5
Maximum number of iterations: 10000 for the
alternating algorithms and 500 for the
derivative-based ones

16
Evaluation parameters
18
Quality of the solution

ALS, both with and without compression, is very
much affected by over-factoring
SWATLD is very resistant to it and has a better
chance to retrieve the correct factors
ASD seems rather nice but the components tend to
be extremely noisy

19
Time consumption

dGN and PMF3 are the most expensive in terms of
computational time
Filling the Jacobian takes up to 50% of
the time
Compression significantly helps
Need for more efficient routines to calculate
and

20
Iterations

Compressed methods require more iterations for
fitting and many fewer for refining
Compressed methods are more affected by
over-factoring as far as the number of iterations
is concerned
Derivative-based methods need fewer iterations,
but each iteration is more expensive.
Compression allows a similar cost per iteration
for the derivative-based methods

21
RMSEP for 1st data set, 4 factors
22
RMSEP for 1st data set, 5 factors
23
RMSEP for 2nd data set
24
Conclusions

PARAFAC-ALS is more sensitive than the other
methods to over-factoring
SWATLD appears as the most efficient method when
it comes to retrieval of the underlying factors
(on simulated data). Conversely it is not as
efficient on real data and hardly ever provides
the least squares solution. It is likely a good
method for initialisation.
Derivative-based methods require compression in
order to be feasible for large-scale problems
Compression does not seem to affect the recovery
capability of the algorithms it is combined with.

25
Future aspects

The growing number of PARAFAC applications in
spectrometry implies dealing with larger data
sets
Need for more efficient routines for the
derivative-based methods
Development of more refined methods exploiting
the sparsity of the Jacobian and the
multilinearity (e.g. use of second derivatives,
variable separation)
Alternative algorithms providing the least
squares solution (e.g. simulated annealing)