Title: Intro to Latent Variable Models: Independent Component Analysis
1. Intro to Latent Variable Models: Independent Component Analysis
- Ata Kabán
- http://www.cs.bham.ac.uk/~axk
2. Overview
- Today we learn about:
  - The cocktail party problem, also called blind source separation (BSS)
  - Independent Component Analysis (ICA) for solving BSS
  - The general form of factor models
  - Other applications of ICA / BSS
- At an intuitive, introductory, practical level
3. Signals, joint density
[Figure: amplitude plots of two signals S1(t) and S2(t) over time, together with their joint density and marginal densities]
4. Original signals (hidden sources): s1(t), s2(t), s3(t), s4(t), t = 1, ..., T
5. The ICA model
x_i(t) = a_i1 s_1(t) + a_i2 s_2(t) + a_i3 s_3(t) + a_i4 s_4(t)
Here, i = 1, ..., 4. In vector-matrix notation, and dropping the index t (i.e. assuming stationary s), this is
x = A s
[Figure: mixing diagram in which the latent variables s1, ..., s4 feed through the weights a11, ..., a14 into the observations x1, ..., x4]
6. This is recorded by the microphones: a linear mixture of the sources,
x_i(t) = a_i1 s_1(t) + a_i2 s_2(t) + a_i3 s_3(t) + a_i4 s_4(t)
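To make the generative model concrete, here is a minimal sketch of x = A s in Python/NumPy. The two sources (a sine and a sawtooth) and the 2x2 mixing matrix are illustrative assumptions, not values from the slides.

```python
import numpy as np

T = 1000
t = np.linspace(0, 1, T)

# Two hypothetical independent sources s1(t), s2(t)
S = np.vstack([np.sin(2 * np.pi * 5 * t),     # s1: sine wave
               2 * ((13 * t) % 1) - 1])       # s2: sawtooth wave

# A hypothetical mixing matrix A (unknown to the ICA algorithm)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

# Each microphone records x_i(t) = a_i1 s_1(t) + a_i2 s_2(t)
X = A @ S
```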
7. The cocktail party problem
- Also called the Blind Source Separation (BSS) problem: determine the source signals, given only the mixtures.
- An ill-posed problem, unless assumptions are made!
- The most common assumption is that the source signals are statistically independent: knowing the value of one of them gives no information about the others.
- Methods based on this assumption are called Independent Component Analysis (ICA) methods. These are statistical techniques for decomposing a complex data set into independent parts.
- It can be shown that, under some reasonable conditions, if the ICA assumption holds then the source signals can be recovered up to permutation and scaling.
8. Latent variable models
A wider class of time-independent latent variable models has the following form:
x = f(s; Θ) + e
where x is the observed data, s are the latent variables, e is the noise, and Θ are the parameters.
Two main design specifications are required! 1) p(s), 2) f(.)
- Linear models with Gaussian latent prior: FA, PPCA, PCA
- Generalised linear models with discrete latent prior: FMM
- Linear models with non-Gaussian latent prior: IFA, ICA
- Linear models with latent prior over the +ve domain and +ve parameters: NMF
- Non-linear models with uniform latent prior: GTM, LTM
9. What are these acronyms?
- These are some classical and quite useful models and associated techniques for data analysis:
- FA: Factor Analysis
- PCA: Principal Component Analysis
- PPCA: Probabilistic Principal Component Analysis
- FMM: Finite Mixture Models
- IFA: Independent Factor Analysis
- ICA: Independent Component Analysis (noise-free IFA)
- NMF: Non-negative Matrix Factorisation
- GTM: Generative Topographic Mapping
- LTM: Latent Trait Model
10. Three main categories of models: the intuition
- Dense, distributed (for compression)
- Prototype-based (for clustering)
- Sparse, distributed (for compression, clustering, structure discovery, data mining)
[Figures taken from Lee & Seung, Nature 401, 788 (1999)]
11. Back to the Cocktail Party
[Figure: the recovered signals]
12. Some further considerations
- If we knew the mixing parameters aij, then we would just need to solve a linear system of equations.
- But we know neither aij nor si.
- ICA was initially developed to deal with problems closely related to the cocktail party problem.
- Later it became evident that ICA has many other applications too, e.g. recovering underlying components of brain activity from electrical recordings taken at different locations on the scalp (EEG signals).
13. Illustration of ICA with 2 signals
[Figure: scatter plots of the original sources (s1, s2) and of the mixed signals (x1, x2), with the mixing directions a1 and a2 marked]
14. Illustration of ICA with 2 signals
[Figure: the mixed signals (x1, x2), after Step 1: sphering, and after Step 2: rotation]
15. Illustration of ICA with 2 signals
[Figure: the full pipeline: original sources s, mixed signals x, after Step 1 (sphering), and after Step 2 (rotation), with the mixing directions a1 and a2 marked]
16. Excluded case
There is one case in which rotation doesn't matter: when both source densities are Gaussian. This case cannot be solved by basic ICA.
[Figure: example of a non-Gaussian density (solid line) vs. a Gaussian density (dash-dotted line)]
We seek non-Gaussian sources for two reasons:
- identifiability
- interestingness: Gaussians are not interesting, since the superposition of independent sources tends to be Gaussian (see the numerical check below).
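The claim that superpositions drift towards Gaussianity is easy to check numerically. The following sketch (an illustration added here, with arbitrary sample sizes) compares the excess kurtosis of one uniform source with that of a sum of 50 such sources.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000

def excess_kurtosis(x):
    # 0 for a Gaussian; negative for sub-Gaussian, positive for super-Gaussian
    z = (x - x.mean()) / x.std()
    return (z**4).mean() - 3.0

one = rng.uniform(-1, 1, T)                     # a single uniform source
many = rng.uniform(-1, 1, (50, T)).sum(axis=0)  # superposition of 50 sources

print(excess_kurtosis(one))    # about -1.2: clearly non-Gaussian
print(excess_kurtosis(many))   # close to 0: nearly Gaussian
```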
17. Computing the pre-processing steps for ICA
- 0) Centring: make the signals zero-mean:
  xi <- xi - E[xi] for each i
- 1) Sphering: make the signals uncorrelated, i.e. apply a transform V to x such that Cov(Vx) = I, where Cov(y) = E[y y^T] denotes the covariance matrix:
  V = E[x x^T]^(-1/2)  // can be computed using the sqrtm function in MatLab
  x <- Vx  // for all t (the index t is dropped here)
  // bold lowercase refers to column vectors, bold uppercase to matrices
- The purpose is to make the remaining computations simpler: independent variables must be uncorrelated, so this necessary condition can be fulfilled before proceeding to the full ICA. (A sketch in code follows below.)
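A minimal NumPy sketch of these two pre-processing steps, assuming the mixtures are stored row-wise; SciPy's sqrtm plays the role of the MatLab function mentioned above, and the names are illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm

def centre_and_sphere(X):
    """X: (n_signals, n_samples) array of observed mixtures."""
    # 0) Centring: subtract each signal's mean
    X = X - X.mean(axis=1, keepdims=True)
    # 1) Sphering: V = E[x x^T]^(-1/2), so that Cov(V x) = I
    C = (X @ X.T) / X.shape[1]          # sample covariance E[x x^T]
    V = np.linalg.inv(sqrtm(C)).real    # C^(-1/2)
    return V @ X, V
```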
18. Computing the rotation step
Aapo Hyvarinen (97): FastICA
- Fixed Point Algorithm
- Input: X
- Random init of W
- Iterate until convergence:
  w <- E[x g(w^T x)] - E[g'(w^T x)] w, then orthonormalise W
- Output: W, S
This is based on the maximisation of an objective function G(.) which contains an approximate non-Gaussianity measure. Here g(.) is the derivative of G(.), W is the rotation transform sought, and λ is a Lagrange multiplier enforcing that W is an orthogonal transform, i.e. a rotation. Solve by fixed-point iterations; the effect of λ is an orthogonal de-correlation.
- Cubic convergence
- The overall transform to take X back to S is then (W^T V).
- There are several options for g(.); each works best in special cases. See the FastICA software / tutorial for details. (A sketch in code follows below.)
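Below is a minimal sketch of the FastICA fixed-point iteration with symmetric orthogonalisation, assuming the data have already been centred and sphered as on the previous slide, and using the common tanh choice for g(.). It is an illustration, not the official FastICA package.

```python
import numpy as np

def fastica_rotation(Z, n_iter=200, tol=1e-6, seed=0):
    """Z: (n_components, n_samples) sphered data; returns the rotation W."""
    n, T = Z.shape
    W = np.random.default_rng(seed).standard_normal((n, n))

    def sym_orthogonalise(W):
        # W <- (W W^T)^(-1/2) W, the "orthogonal de-correlation"
        d, E = np.linalg.eigh(W @ W.T)
        return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W

    W = sym_orthogonalise(W)
    for _ in range(n_iter):
        G = np.tanh(W @ Z)                                  # g(w^T z)
        # w <- E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once
        W_new = (G @ Z.T) / T - np.diag((1 - G**2).mean(axis=1)) @ W
        W_new = sym_orthogonalise(W_new)
        # converged when the new rows align (up to sign) with the old ones
        if np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol:
            return W_new
        W = W_new
    return W   # estimated sources: S = W @ Z
```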
19. Numerous other ways of approaching the ICA problem
- Higher-order moments and cumulants [Comon 94; Hyvarinen 97]
- Nonlinear PCA [Karhunen 94; Oja 97]
- Maximisation of information transfer [Bell & Sejnowski 95; Amari 96; Lee 97-98]
- Maximum likelihood [MacKay 96; Pearlmutter & Parra 96; Cardoso 97]
- Negentropy maximisation [Girolami & Fyfe 97]
- Probabilistic / Bayesian formulations of noisy ICA [MacKay, Valpola, Bishop, etc.]: recent advances
  - More general, more principled
  - More difficult, since exact inference is intractable due to the non-Gaussian continuous latent variables in the model
20. The ML derivation of ICA
David J.C. MacKay (97)
The case of an invertible mixing matrix: x = A s, s = W x, where W = A^(-1)
Natural Gradient Algorithm:
- Random init of W
- Iterate until convergence:
  s = W x
  W <- W + ε (I + f(s) s^T) W
If p(s) ∝ exp(-|s|) then f(s) = -sign(s): sparse (super-Gaussian) sources.
Note that the latent prior is encoded in the nonlinear function f(.)! That is why the choice of f(.) is problem-dependent. (A sketch in code follows below.)
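A minimal sketch of this natural gradient update in batch form, using f(s) = -sign(s) for the Laplacian (sparse) prior from the slide; the step size and iteration count are assumptions.

```python
import numpy as np

def natural_gradient_ica(X, eps=1e-3, n_iter=500, seed=0):
    """X: (n_signals, n_samples) centred mixtures; returns unmixing W."""
    n, T = X.shape
    rng = np.random.default_rng(seed)
    W = np.eye(n) + 0.1 * rng.standard_normal((n, n))
    for _ in range(n_iter):
        S = W @ X                 # current source estimates s = W x
        F = -np.sign(S)           # f(s) for the Laplacian latent prior
        # Batch average of the update W <- W + eps (I + f(s) s^T) W
        W = W + eps * (np.eye(n) + (F @ S.T) / T) @ W
    return W
```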
21. Application domains of ICA
- Blind source separation (Bell & Sejnowski, Te-Won Lee, Girolami, Hyvarinen, etc.)
- Image denoising (Hyvarinen)
- Medical signal processing: fMRI, ECG, EEG (Makeig)
- Modelling of the hippocampus and visual cortex (Lorincz, Hyvarinen)
- Feature extraction, face recognition (Marni Bartlett)
- Compression, redundancy reduction
- Watermarking (D. Lowe)
- Clustering (Girolami, Kolenda)
- Time series analysis (Back, Valpola)
- Topic extraction (Kolenda, Bingham, Kabán)
- Scientific data mining
22. Image de-noising
[Figure: original image, noisy image, result of Wiener filtering, result of ICA filtering]
23. Clustering
In multivariate data, search for the direction along which the projection of the data is maximally non-Gaussian, i.e. has the most structure. (A sketch in code follows below.)
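As a sketch of this idea in two dimensions (an illustration added here; the angle grid and the kurtosis-based measure are assumptions), one can scan candidate directions and keep the one whose projection is most non-Gaussian:

```python
import numpy as np

def most_structured_direction(X, n_angles=180):
    """X: (2, n_samples) centred data; returns the unit direction whose
    projection has the largest |excess kurtosis|, i.e. the most structure."""
    best_w, best_score = None, -np.inf
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        w = np.array([np.cos(theta), np.sin(theta)])
        p = w @ X
        p = (p - p.mean()) / p.std()
        score = abs((p**4).mean() - 3.0)   # |excess kurtosis|, 0 if Gaussian
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```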
24. Blind Separation of Information from Galaxy Spectra
25. Application: Mining stellar populations of elliptical galaxies
Nolan, Harva, Kabán & Raychaudhury, MNRAS, 2006.
- Elliptical galaxies:
  - the oldest galactic systems
  - believed to consist of a single population of old stars
  - recent theories indicate the presence of younger populations of stars
  - what does the data tell us?
26. Problem analysis
- Data:
  - One optical spectrum per galaxy
  - This is a linear superposition of the spectra of its billions of constituent stars
  - If both young and old stars exist in a galaxy, their emissions cannot be measured separately
- Inverse problem:
  - Find spectral components that can explain all elliptical spectra by an unknown linear superposition
- Prior knowledge from general characteristics of the data:
  - Spectral elements are positive
  - The flux at neighbouring wavelengths is likely to be similar
- This prior knowledge was formulated into the ICA model in order to obtain components with these characteristics (details skipped)
27. Decomposition using Physical Models / Decomposition using ICA
[Figure: the two decompositions shown side by side]
28. Physical interpretability of the components found
[Figure: young and old stellar populations as recovered by physical models vs. ICA-type models (NMF, cICA)]
29. Summing Up
- Assumption: the data consists of unknown components
  - analogous to the individual signals in an acoustic mixture
- We try to solve the inverse problem:
  - from observing the superposition only,
  - recover components of a specified non-Gaussian density form
- The components often give a simpler, clearer view of the data, which makes ICA useful in data mining
30. ICA: Related resources / Starting points
- http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi (demo and links to further info on ICA)
- http://www.cis.hut.fi/projects/ica/fastica/code/dlcode.shtml (ICA software in MatLab)
- http://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf (comprehensive tutorial paper)