Title: Intro to Latent Variable Models: Independent Component Analysis
1. Intro to Latent Variable Models: Independent Component Analysis
- Ata Kabán
- http://www.cs.bham.ac.uk/~axk
2. Overview
- Today we learn about:
  - The cocktail party problem, also called blind source separation (BSS)
  - Independent Component Analysis (ICA) for solving BSS
  - The general form of factor models
  - Other applications of ICA / BSS
- At an intuitive, introductory, practical level
3. Signals, joint density
[Figure: amplitude plots of two signals S1(t) and S2(t) over time, together with their joint density and marginal densities]
4. Original signals (hidden sources): s1(t), s2(t), s3(t), s4(t), t = 1, ..., T
5. The ICA model
x_i(t) = a_i1 s_1(t) + a_i2 s_2(t) + a_i3 s_3(t) + a_i4 s_4(t)
Here, i = 1, ..., 4. In vector-matrix notation, and dropping the index t (i.e. assuming stationary s), this is
x = A s
[Figure: mixing diagram in which the latent variables s1, ..., s4 feed through the weights a11, ..., a14 into the observations x1, ..., x4]
6. This is recorded by the microphones: a linear mixture of the sources,
x_i(t) = a_i1 s_1(t) + a_i2 s_2(t) + a_i3 s_3(t) + a_i4 s_4(t)
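To make the generative model concrete, here is a minimal sketch of x = A s in Python/NumPy. The two sources (a sine and a sawtooth) and the 2x2 mixing matrix are illustrative assumptions, not values from the slides.

```python
import numpy as np

T = 1000
t = np.linspace(0, 1, T)

# Two hypothetical independent sources s1(t), s2(t)
S = np.vstack([np.sin(2 * np.pi * 5 * t),     # s1: sine wave
               2 * ((13 * t) % 1) - 1])       # s2: sawtooth wave

# A hypothetical mixing matrix A (unknown to the ICA algorithm)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

# Each microphone records x_i(t) = a_i1 s_1(t) + a_i2 s_2(t)
X = A @ S
```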
7. The cocktail party problem
- Also called the Blind Source Separation (BSS) problem: determine the source signals, given only the mixtures.
- An ill-posed problem, unless assumptions are made!
- The most common assumption is that the source signals are statistically independent: knowing the value of one of them gives no information about the others.
- Methods based on this assumption are called Independent Component Analysis (ICA) methods. These are statistical techniques for decomposing a complex data set into independent parts.
- It can be shown that, under some reasonable conditions, if the ICA assumption holds then the source signals can be recovered up to permutation and scaling.
8. Latent variable models
A wider class of time-independent latent variable models has the following form:
x = f(s; Θ) + e
where x is the observed data, s are the latent variables, e is the noise, and Θ are the parameters.
Two main design specifications are required! 1) p(s), 2) f(.)
- Linear models with Gaussian latent prior: FA, PPCA, PCA
- Generalised linear models with discrete latent prior: FMM
- Linear models with non-Gaussian latent prior: IFA, ICA
- Linear models with latent prior over the +ve domain and +ve parameters: NMF
- Non-linear models with uniform latent prior: GTM, LTM
9. What are these acronyms?
- These are some classical and quite useful models and associated techniques for data analysis:
- FA: Factor Analysis
- PCA: Principal Component Analysis
- PPCA: Probabilistic Principal Component Analysis
- FMM: Finite Mixture Models
- IFA: Independent Factor Analysis
- ICA: Independent Component Analysis (noise-free IFA)
- NMF: Non-negative Matrix Factorisation
- GTM: Generative Topographic Mapping
- LTM: Latent Trait Model
10. Three main categories of models: the intuition
- Dense, distributed (for compression)
- Prototype-based (for clustering)
- Sparse, distributed (for compression, clustering, structure discovery, data mining)
[Figures taken from Lee & Seung, Nature 401, 788 (1999)]
11. Back to the Cocktail Party
[Figure: the recovered signals]
12. Some further considerations
- If we knew the mixing parameters aij, then we would just need to solve a linear system of equations.
- But we know neither aij nor si.
- ICA was initially developed to deal with problems closely related to the cocktail party problem.
- Later it became evident that ICA has many other applications too, e.g. recovering underlying components of brain activity from electrical recordings taken at different locations on the scalp (EEG signals).
13. Illustration of ICA with 2 signals
[Figure: scatter plots of the original sources (s1, s2) and of the mixed signals (x1, x2), with the mixing directions a1 and a2 marked]
14. Illustration of ICA with 2 signals
[Figure: the mixed signals (x1, x2), after Step 1: sphering, and after Step 2: rotation]
15. Illustration of ICA with 2 signals
[Figure: the full pipeline: original sources s, mixed signals x, after Step 1 (sphering), and after Step 2 (rotation), with the mixing directions a1 and a2 marked]
16. Excluded case
There is one case in which rotation doesn't matter: when both source densities are Gaussian. This case cannot be solved by basic ICA.
[Figure: example of a non-Gaussian density (solid line) vs. a Gaussian density (dash-dotted line)]
We seek non-Gaussian sources for two reasons:
- identifiability
- interestingness: Gaussians are not interesting, since the superposition of independent sources tends to be Gaussian (see the numerical check below).
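The claim that superpositions drift towards Gaussianity is easy to check numerically. The following sketch (an illustration added here, with arbitrary sample sizes) compares the excess kurtosis of one uniform source with that of a sum of 50 such sources.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000

def excess_kurtosis(x):
    # 0 for a Gaussian; negative for sub-Gaussian, positive for super-Gaussian
    z = (x - x.mean()) / x.std()
    return (z**4).mean() - 3.0

one = rng.uniform(-1, 1, T)                     # a single uniform source
many = rng.uniform(-1, 1, (50, T)).sum(axis=0)  # superposition of 50 sources

print(excess_kurtosis(one))    # about -1.2: clearly non-Gaussian
print(excess_kurtosis(many))   # close to 0: nearly Gaussian
```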
17. Computing the pre-processing steps for ICA
- 0) Centring: make the signals zero-mean:
  xi <- xi - E[xi] for each i
- 1) Sphering: make the signals uncorrelated, i.e. apply a transform V to x such that Cov(Vx) = I, where Cov(y) = E[y y^T] denotes the covariance matrix:
  V = E[x x^T]^(-1/2)  // can be computed using the sqrtm function in MatLab
  x <- Vx  // for all t (the index t is dropped here)
  // bold lowercase refers to column vectors, bold uppercase to matrices
- The purpose is to make the remaining computations simpler: independent variables must be uncorrelated, so this necessary condition can be fulfilled before proceeding to the full ICA. (A sketch in code follows below.)
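A minimal NumPy sketch of these two pre-processing steps, assuming the mixtures are stored row-wise; SciPy's sqrtm plays the role of the MatLab function mentioned above, and the names are illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm

def centre_and_sphere(X):
    """X: (n_signals, n_samples) array of observed mixtures."""
    # 0) Centring: subtract each signal's mean
    X = X - X.mean(axis=1, keepdims=True)
    # 1) Sphering: V = E[x x^T]^(-1/2), so that Cov(V x) = I
    C = (X @ X.T) / X.shape[1]          # sample covariance E[x x^T]
    V = np.linalg.inv(sqrtm(C)).real    # C^(-1/2)
    return V @ X, V
```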
18. Computing the rotation step
Aapo Hyvarinen (97): FastICA
- Fixed Point Algorithm
- Input: X
- Random init of W
- Iterate until convergence:
  w <- E[x g(w^T x)] - E[g'(w^T x)] w, then orthonormalise W
- Output: W, S
This is based on the maximisation of an objective function G(.) which contains an approximate non-Gaussianity measure. Here g(.) is the derivative of G(.), W is the rotation transform sought, and λ is a Lagrange multiplier enforcing that W is an orthogonal transform, i.e. a rotation. Solve by fixed-point iterations; the effect of λ is an orthogonal de-correlation.
- Cubic convergence
- The overall transform to take X back to S is then (W^T V).
- There are several options for g(.); each works best in special cases. See the FastICA software / tutorial for details. (A sketch in code follows below.)
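Below is a minimal sketch of the FastICA fixed-point iteration with symmetric orthogonalisation, assuming the data have already been centred and sphered as on the previous slide, and using the common tanh choice for g(.). It is an illustration, not the official FastICA package.

```python
import numpy as np

def fastica_rotation(Z, n_iter=200, tol=1e-6, seed=0):
    """Z: (n_components, n_samples) sphered data; returns the rotation W."""
    n, T = Z.shape
    W = np.random.default_rng(seed).standard_normal((n, n))

    def sym_orthogonalise(W):
        # W <- (W W^T)^(-1/2) W, the "orthogonal de-correlation"
        d, E = np.linalg.eigh(W @ W.T)
        return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W

    W = sym_orthogonalise(W)
    for _ in range(n_iter):
        G = np.tanh(W @ Z)                                  # g(w^T z)
        # w <- E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once
        W_new = (G @ Z.T) / T - np.diag((1 - G**2).mean(axis=1)) @ W
        W_new = sym_orthogonalise(W_new)
        # converged when the new rows align (up to sign) with the old ones
        if np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol:
            return W_new
        W = W_new
    return W   # estimated sources: S = W @ Z
```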
19. Numerous other ways of approaching the ICA problem
- Higher-order moments and cumulants [Comon 94; Hyvarinen 97]
- Nonlinear PCA [Karhunen 94; Oja 97]
- Maximisation of information transfer [Bell & Sejnowski 95; Amari 96; Lee 97-98]
- Maximum likelihood [MacKay 96; Pearlmutter & Parra 96; Cardoso 97]
- Negentropy maximisation [Girolami & Fyfe 97]
- Probabilistic / Bayesian formulations of noisy ICA [MacKay, Valpola, Bishop, etc.]: recent advances
  - More general, more principled
  - More difficult, since exact inference is intractable due to the non-Gaussian continuous latent variables in the model
20. The ML derivation of ICA
David J.C. MacKay (97)
The case of an invertible mixing matrix: x = A s, s = W x, where W = A^(-1)
Natural Gradient Algorithm:
- Random init of W
- Iterate until convergence:
  s = W x
  W <- W + ε (I + f(s) s^T) W
If p(s) ∝ exp(-|s|) then f(s) = -sign(s): sparse (super-Gaussian) sources.
Note that the latent prior is encoded in the nonlinear function f(.)! That is why the choice of f(.) is problem-dependent. (A sketch in code follows below.)
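A minimal sketch of this natural gradient update in batch form, using f(s) = -sign(s) for the Laplacian (sparse) prior from the slide; the step size and iteration count are assumptions.

```python
import numpy as np

def natural_gradient_ica(X, eps=1e-3, n_iter=500, seed=0):
    """X: (n_signals, n_samples) centred mixtures; returns unmixing W."""
    n, T = X.shape
    rng = np.random.default_rng(seed)
    W = np.eye(n) + 0.1 * rng.standard_normal((n, n))
    for _ in range(n_iter):
        S = W @ X                 # current source estimates s = W x
        F = -np.sign(S)           # f(s) for the Laplacian latent prior
        # Batch average of the update W <- W + eps (I + f(s) s^T) W
        W = W + eps * (np.eye(n) + (F @ S.T) / T) @ W
    return W
```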
21. Application domains of ICA
- Blind source separation (Bell & Sejnowski, Te-Won Lee, Girolami, Hyvarinen, etc.)
- Image denoising (Hyvarinen)
- Medical signal processing: fMRI, ECG, EEG (Makeig)
- Modelling of the hippocampus and visual cortex (Lorincz, Hyvarinen)
- Feature extraction, face recognition (Marni Bartlett)
- Compression, redundancy reduction
- Watermarking (D. Lowe)
- Clustering (Girolami, Kolenda)
- Time series analysis (Back, Valpola)
- Topic extraction (Kolenda, Bingham, Kabán)
- Scientific data mining
22. Image de-noising
[Figure: original image, noisy image, result of Wiener filtering, result of ICA filtering]
23. Clustering
In multivariate data, search for the direction along which the projection of the data is maximally non-Gaussian, i.e. has the most structure. (A sketch in code follows below.)
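As a sketch of this idea in two dimensions (an illustration added here; the angle grid and the kurtosis-based measure are assumptions), one can scan candidate directions and keep the one whose projection is most non-Gaussian:

```python
import numpy as np

def most_structured_direction(X, n_angles=180):
    """X: (2, n_samples) centred data; returns the unit direction whose
    projection has the largest |excess kurtosis|, i.e. the most structure."""
    best_w, best_score = None, -np.inf
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        w = np.array([np.cos(theta), np.sin(theta)])
        p = w @ X
        p = (p - p.mean()) / p.std()
        score = abs((p**4).mean() - 3.0)   # |excess kurtosis|, 0 if Gaussian
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```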
24. Blind Separation of Information from Galaxy Spectra
25. Application: Mining stellar populations of elliptical galaxies
Nolan, Harva, Kabán & Raychaudhury, MNRAS, 2006.
- Elliptical galaxies:
  - the oldest galactic systems
  - believed to consist of a single population of old stars
  - recent theories indicate the presence of younger populations of stars
  - what does the data tell us?
26. Problem analysis
- Data:
  - One optical spectrum per galaxy
  - This is a linear superposition of the spectra of its billions of constituent stars
  - If both young and old stars exist in a galaxy, their emissions cannot be measured separately
- Inverse problem:
  - Find spectral components that can explain all elliptical spectra by an unknown linear superposition
- Prior knowledge from general characteristics of the data:
  - Spectral elements are positive
  - The flux at neighbouring wavelengths is likely to be similar
- This prior knowledge was formulated into the ICA model in order to obtain components with these characteristics (details skipped)
27. Decomposition using Physical Models / Decomposition using ICA
[Figure: the two decompositions shown side by side]
28. Physical interpretability of the components found
[Figure: young and old stellar populations as recovered by physical models vs. ICA-type models (NMF, cICA)]
29. Summing Up
- Assumption: the data consists of unknown components
  - analogous to the individual signals in an acoustic mixture
- We try to solve the inverse problem:
  - from observing the superposition only,
  - recover components of a specified non-Gaussian density form
- The components often give a simpler, clearer view of the data, which makes ICA useful in data mining
30. ICA: Related resources / Starting points
- http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi (demo and links to further info on ICA)
- http://www.cis.hut.fi/projects/ica/fastica/code/dlcode.shtml (ICA software in MatLab)
- http://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf (comprehensive tutorial paper)