Title: Independent Components Analysis
1. Independent Components Analysis
2. What is ICA?
- "Independent component analysis (ICA) is a method for finding underlying factors or components from multivariate (multi-dimensional) statistical data. What distinguishes ICA from other methods is that it looks for components that are both statistically independent and non-Gaussian." - A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis
3. ICA
- Blind Signal Separation (BSS) or Independent Component Analysis (ICA) is the identification and separation of mixtures of sources with little prior information.
- Applications include:
  - Audio processing
  - Medical data
  - Finance
  - Array processing (beamforming)
  - Coding
  - and most applications where Factor Analysis and PCA are currently used.
- While PCA seeks directions that represent the data best in a least-squares sense, Σ‖x0 − x‖², ICA seeks directions that are most independent from each other.
- Often used for time-series separation of multiple targets.
4. ICA estimation principles (from A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis)
- Principle 1: Nonlinear decorrelation. Find the matrix W so that for any i ≠ j, the components yi and yj are uncorrelated, and the transformed components g(yi) and h(yj) are uncorrelated, where g and h are some suitable nonlinear functions.
- Principle 2: Maximum non-Gaussianity. Find the local maxima of non-Gaussianity of a linear combination y = Wx under the constraint that the variance of y is constant.
- Each local maximum gives one independent component.
5. ICA mathematical approach (from A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis)
- Given a set of observations of random variables x1(t), x2(t), ..., xn(t), where t is the time or sample index, assume that they are generated as a linear mixture of independent components: y = Wx, where W is some unknown matrix. Independent component analysis now consists of estimating both the matrix W and the yi(t), when we only observe the xi(t).
6. The simple Cocktail Party Problem
[Figure: sources s1 and s2 pass through the mixing matrix A to give the observations x1 and x2.]
- x = As, with n sources and m = n observations
7. Classical ICA (FastICA) estimation
[Figure: the observed mixed signals, the original source signals, and the sources recovered by ICA.]
8. Motivation
Two independent sources, mixed at two microphones:
x1(t) = a11·s1(t) + a12·s2(t)
x2(t) = a21·s1(t) + a22·s2(t)
- The coefficients aij depend on the distances of the microphones from the speakers.
9. Motivation
Get the independent signals out of the mixture.
10. ICA Model (Noise Free)
- Use a statistical latent-variable model
- Random variable sk instead of a time signal
- xj = aj1·s1 + aj2·s2 + ... + ajn·sn, for all j
- x = As
- The ICs s are latent variables and are unknown, AND the mixing matrix A is also unknown
- Task: estimate A and s using only the observable random vector x
- Assume that the number of ICs equals the number of observable mixtures, and that A is square and invertible
- So after estimating A, we can compute W = A⁻¹ and hence s = Wx = A⁻¹x (see the sketch below)
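A minimal numpy sketch of this noise-free model; the uniform sources and the 2×2 mixing matrix are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian (uniform) sources -- an illustrative choice.
n_samples = 1000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n_samples))  # zero mean, unit variance

# An assumed, invertible 2x2 mixing matrix (not from the slides).
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])

x = A @ s            # observed mixtures: x = As

# If A were known, unmixing would be trivial: W = A^-1, s = Wx.
W = np.linalg.inv(A)
s_hat = W @ x
print(np.allclose(s, s_hat))  # True: perfect recovery with the true A
```

ICA's job is exactly the hard part this sketch skips: estimating W when only x is observed.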
11. Illustration
- Two ICs with uniform distributions, zero mean and variance equal to 1; the mixing matrix A is shown on the slide.
- The edges of the parallelogram (the support of the joint density of x1, x2) are in the direction of the columns of A. So if we can estimate the joint pdf of x1 and x2 and then locate the edges, we can estimate A (see the sketch below).
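A small sketch of this illustration, assuming uniform sources and an example mixing matrix (the slide's actual matrix is not recoverable here): the scatter plot of the mixtures forms a parallelogram whose edges point along the columns of A.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Uniform ICs with zero mean and unit variance.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))

# Example mixing matrix (an assumption; the slide's matrix is not shown here).
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
x = A @ s

# The point cloud of (x1, x2) is a parallelogram; its edges run parallel
# to the columns of A, which is what makes A identifiable from the data.
plt.scatter(x[0], x[1], s=2, alpha=0.3)
plt.xlabel("x1"); plt.ylabel("x2")
plt.title("Mixtures of two uniform ICs")
plt.show()
```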
12. Restrictions
- The si must be statistically independent:
  - p(s1, s2) = p(s1)·p(s2)
- The si must have non-Gaussian distributions:
  - For Gaussian sources, the joint density of unit-variance s1, s2 is rotationally symmetric. So it does not contain any information about the directions of the columns of the mixing matrix A, and A cannot be estimated.
  - If only one IC is Gaussian, the estimation is still possible.
13. Ambiguities
- Cannot determine the variances (energies) of the ICs:
  - Since both s and A are unknown, any scalar multiple of one of the sources can always be cancelled by dividing the corresponding column of A by it (see the check below).
  - Fix the magnitudes of the ICs by assuming unit variance: E{si²} = 1
  - Only the ambiguity of sign remains
- Cannot determine the order of the ICs:
  - The terms can be freely reordered, because both s and A are unknown. So we can call any IC the first one.
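A tiny numeric check of the scaling ambiguity, with made-up values: rescaling a source by c while dividing the corresponding column of A by c leaves the observations x unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2))
s = rng.uniform(-1, 1, size=(2, 100))

c = 5.0                      # arbitrary rescaling of source 0
A2 = A.copy()
A2[:, 0] /= c                # divide column 0 of A by c
s2 = s.copy()
s2[0] *= c                   # multiply source 0 by c

# The observed mixtures are identical, so the scale of each si is unidentifiable.
print(np.allclose(A @ s, A2 @ s2))  # True
```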
14. ICA Principle (Non-Gaussian is Independent)
- The key to estimating A is non-Gaussianity.
- By the central limit theorem, the distribution of a sum of independent random variables tends toward a Gaussian distribution.
[Figure: source densities f(s1), f(s2) and the density f(x1) of the mixture s1 + s2, which is more Gaussian.]
- Consider y = wᵀx = wᵀAs = zᵀs, where w is one of the rows of matrix W and z = Aᵀw.
- So y is a linear combination of the si, with weights given by the zi.
- Since a sum of two independent random variables is more Gaussian than the individual variables, zᵀs is more Gaussian than either of the si, AND becomes least Gaussian when it equals one of the si.
- So we can take as w a vector that maximizes the non-Gaussianity of wᵀx.
- Such a w corresponds to a z with only one nonzero component, so we get back one of the si (see the demo below).
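A quick demo of the central-limit effect driving this principle, with uniform sources as an assumed example: the excess kurtosis of the normalized sum is closer to 0 (i.e. closer to Gaussian) than that of either source.

```python
import numpy as np

rng = np.random.default_rng(3)

def excess_kurtosis(y):
    # kurt(y) = E{y^4} - 3 (E{y^2})^2; zero for a Gaussian variable.
    y = y - y.mean()
    return np.mean(y**4) - 3 * np.mean(y**2)**2

s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), 100_000)  # unit-variance uniform
s2 = rng.uniform(-np.sqrt(3), np.sqrt(3), 100_000)
mix = (s1 + s2) / np.sqrt(2)                        # unit-variance mixture

print(excess_kurtosis(s1))   # about -1.2 (uniform is sub-Gaussian)
print(excess_kurtosis(mix))  # about -0.6: closer to 0, i.e. more Gaussian
```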
15. Measures of Non-Gaussianity
- We need a quantitative measure of non-Gaussianity for ICA estimation:
- Kurtosis: zero for a Gaussian (but sensitive to outliers)
- Entropy: largest for a Gaussian among variables of equal variance
- Negentropy: zero for a Gaussian (but difficult to estimate)
- Approximations: J(y) ≈ Σi ki·[E{Gi(y)} − E{Gi(v)}]², where v is a standard Gaussian random variable, the ki are positive constants, and the Gi are suitable nonquadratic functions (see the sketch below).
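A minimal sketch of these measures, assuming the standard log-cosh contrast function from the FastICA literature; estimating E{G(v)} by Monte Carlo is a simplification:

```python
import numpy as np

rng = np.random.default_rng(4)

def excess_kurtosis(y):
    """Kurtosis-based measure: 0 for a Gaussian, but sensitive to outliers."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3

def negentropy_approx(y, G=lambda u: np.log(np.cosh(u))):
    """One-term approximation J(y) ~ (E{G(y)} - E{G(v)})^2 with v ~ N(0,1).

    G(u) = log cosh(u) is a standard contrast choice; E{G(v)} is estimated
    by Monte Carlo here for simplicity.
    """
    y = (y - y.mean()) / y.std()
    v = rng.standard_normal(100_000)
    return (np.mean(G(y)) - np.mean(G(v))) ** 2

lap = rng.laplace(size=100_000) / np.sqrt(2)   # unit-variance Laplacian
gau = rng.standard_normal(100_000)
print(excess_kurtosis(lap), excess_kurtosis(gau))      # ~3 vs ~0
print(negentropy_approx(lap), negentropy_approx(gau))  # clearly > 0 vs ~0
```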
16. Data Centering & Whitening
- Centering:
  - x ← x − E{x}
  - This does not mean that ICA cannot estimate the mean; it just simplifies the algorithm.
  - The ICs are also zero-mean, because E{s} = W·E{x}
  - After ICA, add W·E{x} back to the zero-mean ICs
- Whitening:
  - We transform the x's linearly so that the resulting x̃ are white (uncorrelated, unit variance). This is done by eigenvalue decomposition (EVD):
  - x̃ = (E·D^(−1/2)·Eᵀ)·x = E·D^(−1/2)·Eᵀ·A·s = Ã·s
  - where E{xxᵀ} = E·D·Eᵀ
  - The new mixing matrix Ã is orthogonal, so we only have to estimate an orthonormal matrix.
  - An orthonormal matrix has n(n−1)/2 degrees of freedom, so for a large-dimensional A we have to estimate only half as many parameters. This greatly simplifies ICA.
  - Reducing the dimension of the data (keeping only the dominant eigenvalues) during whitening also helps.
17. Computing the pre-processing steps for ICA
- 0) Centering: make the signals centered at zero
  - xi ← xi − E{xi} for each i
- 1) Sphering: make the signals uncorrelated, i.e. apply a transform V to x such that Cov(Vx) = I, where Cov(y) = E{yyᵀ} denotes the covariance matrix
  - V = E{xxᵀ}^(−1/2) // can be computed with the sqrtm function in MATLAB
  - x ← Vx // for all t (the index t is dropped here)
  - // bold lowercase refers to a column vector, bold uppercase to a matrix
- The purpose is to make the remaining computations simpler: independent variables must be uncorrelated, so this condition can be fulfilled before proceeding to the full ICA (see the sketch below).
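A minimal numpy sketch of both pre-processing steps, using the EVD route from the previous slide (numpy's eigendecomposition stands in for MATLAB's sqrtm; the mixing matrix is an assumed example):

```python
import numpy as np

def center(x):
    """Step 0: subtract the mean of each signal (row)."""
    return x - x.mean(axis=1, keepdims=True)

def whiten(x):
    """Step 1: sphere the data so that Cov(Vx) = I, via the EVD
    E{xx^T} = E D E^T, giving V = E D^(-1/2) E^T = E{xx^T}^(-1/2)."""
    cov = x @ x.T / x.shape[1]
    d, E = np.linalg.eigh(cov)          # d: eigenvalues, E: eigenvectors
    V = E @ np.diag(d ** -0.5) @ E.T    # the whitening matrix
    return V @ x, V

# Example with an assumed mixture of two uniform signals.
rng = np.random.default_rng(5)
x = np.array([[2.0, 0.0], [1.0, 1.0]]) @ rng.uniform(-1, 1, size=(2, 10_000))
x = center(x)
x_white, V = whiten(x)
print(np.round(x_white @ x_white.T / x_white.shape[1], 3))  # ~ identity
```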
18. Computing the rotation step
Aapo Hyvärinen (1997): this is based on the maximisation of an objective function G(·) which contains an approximate non-Gaussianity measure.
- Fixed-Point Algorithm:
  - Input: X
  - Random initialization of W
  - Iterate until convergence
  - Output: W, S
- where g(·) is the derivative of G(·), W is the rotation transform sought, and λ is a Lagrange multiplier enforcing that W is an orthogonal transform, i.e. a rotation. Solve by fixed-point iterations; the effect of λ is an orthogonal de-correlation.
- The overall transform taking X back to S is then (WᵀV).
- There are several options for g(·); each works best in special cases. See the FastICA software and tutorial for details (a sketch follows below).
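A compact sketch of the fixed-point iteration on whitened data, using the symmetric-decorrelation variant with g = tanh; this follows the published FastICA algorithm in outline, not any specific code from the slides, and the mixing matrix in the usage example is assumed:

```python
import numpy as np

def sym_orth(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def fastica(x_white, n_iter=200, tol=1e-6, seed=0):
    """Symmetric fixed-point FastICA on whitened data x_white (n x T).

    Per-unit update: w <- E{x g(w^T x)} - E{g'(w^T x)} w, with g = tanh,
    followed by symmetric orthogonalization of all rows at once.
    """
    n, T = x_white.shape
    W = sym_orth(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(n_iter):
        y = W @ x_white                        # current source estimates
        g, g_prime = np.tanh(y), 1 - np.tanh(y) ** 2
        W_new = sym_orth(g @ x_white.T / T - np.diag(g_prime.mean(axis=1)) @ W)
        if np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol:
            return W_new                       # rows stopped rotating
        W = W_new
    return W

# Usage: demix two uniform sources (mixing matrix is an assumed example).
rng = np.random.default_rng(6)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20_000))
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s

# Pre-processing as on the previous slides: center, then whiten via EVD.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(x @ x.T / x.shape[1])
V = E @ np.diag(d ** -0.5) @ E.T
W = fastica(V @ x)
s_hat = W @ V @ x   # recovers s up to sign and permutation
```

The tanh nonlinearity is just one of the g(·) options mentioned above; swapping in another contrast only changes the `g, g_prime` pair.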
19. Application domains of ICA
- Blind source separation (Bell & Sejnowski, Te-Won Lee, Girolami, Hyvärinen, etc.)
- Image denoising (Hyvärinen)
- Medical signal processing: fMRI, ECG, EEG (Makeig)
- Modelling of the hippocampus and visual cortex (Lőrincz, Hyvärinen)
- Feature extraction, face recognition (Marni Bartlett)
- Compression, redundancy reduction
- Watermarking (D. Lowe)
- Clustering (Girolami, Kolenda)
- Time series analysis (Back, Valpola)
- Topic extraction (Kolenda, Bingham, Kabán)
- Scientific data mining (Kabán, etc.)
20. Image denoising
[Figure: original image, noisy image, and the results of Wiener filtering vs. ICA filtering.]
21. Noisy ICA Model
- x = As + n
- A ... m×n mixing matrix
- s ... n-dimensional vector of ICs
- n ... m-dimensional random noise vector
- Same assumptions as for the noise-free model, if we use measures of non-Gaussianity which are immune to Gaussian noise.
- So Gaussian moments are used as contrast functions.
- However, in pre-whitening the effect of noise must be taken into account (see the sketch below):
  - x̃ = (E{xxᵀ} − Σ)^(−1/2)·x
  - x̃ = B·s + ñ
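A small sketch of this bias-corrected whitening, assuming the noise covariance Σ is known and, for illustration, isotropic (Σ = σ²I):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(7)
s = rng.laplace(size=(2, 50_000))                   # non-Gaussian sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])
sigma2 = 0.1                                        # assumed known noise variance
x = A @ s + np.sqrt(sigma2) * rng.standard_normal((2, 50_000))

C = x @ x.T / x.shape[1]                            # sample E{xx^T}
Sigma = sigma2 * np.eye(2)                          # noise covariance
V = np.linalg.inv(np.real(sqrtm(C - Sigma)))        # (E{xx^T} - Sigma)^(-1/2)
x_tilde = V @ x                                     # x~ = Bs + n~

# Removing the transformed noise term shows the signal part is white:
print(np.round(x_tilde @ x_tilde.T / x_tilde.shape[1] - V @ Sigma @ V.T, 2))
```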
22. Exercise (part 1, updated Nov 10)
- How would you calculate efficiently the PCA of data where the dimensionality d is much larger than the number of vector observations n?
- Download the Wisconsin data from the UC Irvine repository, extract the principal components from the data, examine scatter plots of the original data and of the data projected onto the principal components, and plot the eigenvalues.
23. Ex1, Part 2 (send to ninbbelt_at_gmail.com, subject: "Ex1" and your last names)
- Given high-dimensional data, is there a way to know if all possible projections of the data are Gaussian? Explain.
- What if there is some additive Gaussian noise?
24. Ex1 (cont.)
- 2. Use FastICA (easily found via Google): http://www.cis.hut.fi/projects/ica/fastica/code/dlcode.html
- Choose your favorite two songs
- Create 3 mixing matrices and mix them
- Apply FastICA to de-mix
25. Ex1 (cont.)
- Discuss the results
- What happens when the mixing matrix is symmetric?
- Why did you get different results with different mixing matrices?
- Demonstrate that you got close to the original files
- Try different nonlinearities in FastICA. Which one is best? Can you see that from the data?
26. References
- Feature extraction (images, video): http://hlab.phys.rug.nl/demos/ica/
- Aapo Hyvärinen, ICA (1999): http://www.cis.hut.fi/aapo/papers/NCS99web/node11.html
- ICA demo step-by-step: http://www.cis.hut.fi/projects/ica/icademo/
- Lots of links: http://sound.media.mit.edu/paris/ica.html
- Object-based audio capture demos: http://www.media.mit.edu/westner/sepdemo.html
- Demo for BSS with CoBliSS (wav-files): http://www.esp.ele.tue.nl/onderzoek/daniels/BSS.html
- Tomas Zeman's page on BSS research: http://ica.fun-thom.misto.cz/page3.html
- Virtual Laboratories in Probability and Statistics: http://www.math.uah.edu/stat/index.html