1
Independent Components Analysis
2
What is ICA?
  • Independent component analysis (ICA) is a
    method for finding underlying factors or
    components from multivariate (multi-dimensional)
    statistical data. What distinguishes ICA from
    other methods is that it looks for components
    that are both statistically independent and
    non-Gaussian.
  • A. Hyvarinen, J. Karhunen, E. Oja,
    Independent Component Analysis

3
ICA
  • Blind Signal Separation (BSS) or Independent
    Component Analysis (ICA) is the identification
    and separation of mixtures of sources with little
    prior information.
  • Applications include
  • Audio processing
  • Medical data
  • Finance
  • Array processing (beamforming)
  • Coding
  • and most applications where Factor Analysis and
    PCA are currently used.
  • While PCA seeks the directions that represent the
    data best in a minimum reconstruction-error sense
    (minimizing Σ ||x0 - x||^2), ICA seeks the
    directions that are most independent from each
    other.
  • Often used on time series, separation of multiple
    targets

4
ICA estimation principles (from A. Hyvarinen,
J. Karhunen, E. Oja, Independent Component
Analysis)
  • Principle 1: Nonlinear decorrelation. Find the
    matrix W so that for any i ≠ j, the components
    yi and yj are uncorrelated, and the transformed
    components g(yi) and h(yj) are uncorrelated,
    where g and h are some suitable nonlinear
    functions.
  • Principle 2: Maximum non-Gaussianity. Find the
    local maxima of non-Gaussianity of a linear
    combination y = Wx under the constraint that the
    variance of y is constant.
  • Each local maximum gives one independent
    component.

5
ICA mathematical approach (from A. Hyvarinen,
J. Karhunen, E. Oja, Independent Component Analysis)
  • Given a set of observations of random
    variables x1(t), x2(t), ..., xn(t), where t is the
    time or sample index, assume that they are
    generated as a linear mixture of independent
    components: y = Wx, where W is some unknown
    matrix. Independent component analysis now
    consists of estimating both the matrix W and the
    yi(t), when we only observe the xi(t).

6
The simple Cocktail Party Problem
[Figure: sources s1, s2 pass through the mixing
matrix A to give the observations x1, x2]
x = As
n sources, m = n observations
7
Classical ICA (FastICA) estimation
[Figure: the observed mixed signals, and the
original source signals recovered by ICA]
8
Motivation
[Figure: two independent sources and their
mixtures at two microphones]
The coefficients aij depend on the distances of the
microphones from the speakers.
9
Motivation
Get the Independent Signals out of the Mixture
10
ICA Model (Noise Free)
  • Use a statistical latent-variables system
  • Random variable sk instead of a time signal
  • xj = aj1 s1 + aj2 s2 + ... + ajn sn, for all j
  • x = As
  • The ICs s are latent variables, i.e. unknown, AND
    the mixing matrix A is also unknown
  • Task: estimate A and s using only the observable
    random vector x
  • Let's assume that the no. of ICs = no. of
    observable mixtures,
  • and that A is square and invertible
  • So after estimating A, we can compute W = A^-1 and
    hence
  • s = Wx = A^-1 x (see the sketch below)

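As a concrete illustration of the noise-free model, here is a minimal
numpy sketch (not from the original slides; the source distribution and
mixing matrix are chosen arbitrarily) that mixes two independent
sources with a known square A and recovers them with W = A^-1:

```python
import numpy as np

rng = np.random.default_rng(0)

# two independent, non-Gaussian sources (uniform, zero mean, unit variance)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 1000))

A = np.array([[1.0, 0.5],
              [0.3, 1.0]])      # square, invertible mixing matrix (arbitrary)

x = A @ s                       # observed mixtures: x = A s

W = np.linalg.inv(A)            # with A known, the unmixing matrix is W = A^-1
s_hat = W @ x                   # s = W x = A^-1 x

print(np.allclose(s, s_hat))    # True
```

In ICA proper, A is of course unknown; the rest of the deck is about
estimating W from x alone.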
11
Illustration
Two ICs with uniform distributions, zero mean and
variance equal to 1, mixed by a matrix A.
The edges of the resulting parallelogram (the
support of the joint density of x1, x2) are in the
directions of the columns of A. So if we can
estimate the joint pdf of x1, x2 and then locate
the edges, we can estimate A (see the sketch below).
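A scatter-plot version of this illustration (the slide's actual matrix
values are not shown, so the A below is made up):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 5000))  # 2 uniform ICs
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])      # illustrative mixing matrix (values assumed)
x = A @ s

plt.scatter(x[0], x[1], s=2)    # support of the joint pdf is a parallelogram
plt.title("Edges of the parallelogram align with the columns of A")
plt.show()
```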
12
Restrictions
  • The si are statistically independent
  • p(s1, s2) = p(s1) p(s2)
  • Non-Gaussian distributions
  • If s1 and s2 are Gaussian, the joint density of
    the unit-variance pair is rotationally symmetric.
    So it doesn't contain any information about the
    directions of the columns of the mixing matrix A,
    and A cannot be estimated (see the sketch below).
  • If only one IC is Gaussian, the estimation is
    still possible.

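The Gaussian restriction can be checked numerically. A minimal sketch
(my own illustration, not from the slides): rotating two independent
unit-variance Gaussians yields data that look identical for every
rotation, so the mixing matrix is unidentifiable:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 100_000))         # two independent Gaussian ICs

theta = np.pi / 5                             # any rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = R @ s                                     # "mixed" observations

# the covariance (and the whole joint density) is unchanged by R,
# so nothing in the data distinguishes one rotation from another
print(np.cov(x))                              # ~ identity, same as np.cov(s)
```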
13
Ambiguities
  • Can't determine the variances (energies) of the
    ICs
  • Since both s and A are unknown, any scalar
    multiple of one of the sources can always be
    cancelled by dividing the corresponding column of
    A by it (see the sketch below).
  • Fix the magnitudes of the ICs by assuming unit
    variance: E{si^2} = 1
  • Only the ambiguity of sign remains
  • Can't determine the order of the ICs
  • The terms of the sum can be freely reordered,
    because both s and A are unknown. So we can call
    any IC the first one.

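A two-line numerical check of the scaling ambiguity (illustrative
values, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(2, 100))
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])

c = 2.5                              # arbitrary scalar (c < 0 flips the sign)
A2, s2 = A.copy(), s.copy()
A2[:, 0] /= c                        # divide the first column of A by c ...
s2[0, :] *= c                        # ... and multiply the first source by c

print(np.allclose(A @ s, A2 @ s2))   # True: the observed x = A s is unchanged
```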
14
ICA Principle (Non-Gaussian is Independent)
  • The key to estimating A is non-Gaussianity
  • The distribution of a sum of independent random
    variables tends toward a Gaussian distribution
    (by the CLT).
[Figure: densities f(s1), f(s2) and f(x1) = f(s1 + s2);
the mixture is more Gaussian than either source]
  • Consider y = wTx = wTAs = zTs, where w is one of
    the rows of matrix W and z = ATw.
  • y is a linear combination of the si, with weights
    given by the zi.
  • Since a sum of two independent r.v.s is more
    Gaussian than the individual r.v.s, zTs is more
    Gaussian than either of the si, AND becomes least
    Gaussian when it is equal to one of the si.
  • So we could take w to be a vector that maximizes
    the non-Gaussianity of wTx.
  • Such a w would correspond to a z with only one
    non-zero component, so we get back one of the si
    (see the sketch below).

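A small numerical check of this principle (a sketch under assumed
uniform sources, not from the slides): the projection wTx is most
non-Gaussian, as measured by kurtosis, exactly at the unmixing
direction:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 100_000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

def kurtosis(y):
    return np.mean(y**4) - 3.0       # 0 for a unit-variance Gaussian

w_mix = np.array([1.0, 0.0])         # picks out x1, a genuine mixture
w_sep = np.linalg.inv(A)[0]          # first row of W = A^-1, recovers s1
for w in (w_mix, w_sep):
    y = w @ x
    print(kurtosis(y / y.std()))     # mixture is closer to 0 (more Gaussian);
                                     # the separated source is near -1.2
```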
15
Measures of Non-Gaussianity
  • We need a quantitative measure of
    non-Gaussianity for ICA estimation.
  • Kurtosis: 0 for a Gaussian (but sensitive to
    outliers)
  • Entropy: largest for a Gaussian
  • Negentropy: 0 for a Gaussian (but difficult to
    estimate)
  • Approximations, e.g.
  • J(y) ≈ (E{G(y)} - E{G(v)})^2
  • where v is a standard Gaussian random variable
    and G is a nonquadratic function (see the sketch
    below).

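Both measures can be estimated from samples; a sketch (the choice
G(u) = -exp(-u^2/2) is one standard FastICA nonlinearity, assumed here):

```python
import numpy as np

rng = np.random.default_rng(0)

def kurtosis(y):
    """Fourth-order cumulant: 0 for a Gaussian, but sensitive to outliers."""
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

def negentropy_approx(y, G=lambda u: -np.exp(-u**2 / 2.0)):
    """J(y) ~ (E{G(y)} - E{G(v)})^2 with v standard Gaussian,
    estimated here by sampling v."""
    y = (y - y.mean()) / y.std()
    v = rng.standard_normal(y.size)
    return (np.mean(G(y)) - np.mean(G(v)))**2

print(kurtosis(rng.standard_normal(100_000)))          # ~ 0 (Gaussian)
print(kurtosis(rng.uniform(-1, 1, 100_000)))           # ~ -1.2 (sub-Gaussian)
print(negentropy_approx(rng.uniform(-1, 1, 100_000)))  # > 0 for non-Gaussian y
```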
16
Data Centering & Whitening
  • Centering
  • x ← x - E{x}
  • This doesn't mean that ICA can't estimate the
    mean; it just simplifies the algorithm.
  • The ICs are also zero mean, because
  • E{s} = W E{x}
  • After ICA, add W E{x} back to the zero-mean ICs
  • Whitening
  • We transform the x's linearly so that they are
    white (uncorrelated, unit variance). This is done
    by EVD:
  • x~ = (E D^(-1/2) E^T) x = E D^(-1/2) E^T A s = A~ s
  • where E{x x^T} = E D E^T
  • So we now have to estimate the orthonormal
    matrix A~
  • An orthonormal matrix has n(n-1)/2 degrees of
    freedom, so for a large-dimensional A we have to
    estimate only about half as many parameters. This
    greatly simplifies ICA.
  • Reducing the dimension of the data (choosing the
    dominant eigenvalues) while whitening also helps
    (see the sketch below).

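A minimal numpy sketch of the EVD whitening step described above (my
own illustration, not the slides' code):

```python
import numpy as np

def whiten(x):
    """Whiten centred data x (d x n) via EVD of the covariance:
    x_tilde = E D^(-1/2) E^T x, so that E{x_tilde x_tilde^T} = I."""
    C = np.cov(x)                        # sample estimate of E{x x^T}
    d, E = np.linalg.eigh(C)             # C = E D E^T
    V = E @ np.diag(d ** -0.5) @ E.T     # whitening matrix E D^(-1/2) E^T
    return V @ x, V

rng = np.random.default_rng(0)
x = np.array([[1.0, 0.5],
              [0.3, 1.0]]) @ rng.uniform(-1, 1, size=(2, 10_000))
x -= x.mean(axis=1, keepdims=True)       # centre first
z, V = whiten(x)
print(np.cov(z))                         # ~ identity
```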
17
Computing the pre-processing steps for ICA
  • 0) Centering: make the signals centred at zero
  • xi ← xi - E{xi} for each i
  • 1) Sphering: make the signals uncorrelated, i.e.
    apply a transform V to x such that Cov(Vx) = I //
    where Cov(y) = E{y y^T} denotes the covariance
    matrix
  • V = E{x x^T}^(-1/2) // can be done using the
    sqrtm function in MATLAB
  • x ← Vx // for all t (indexes t dropped here)
  • // bold lower-case refers to a column
    vector, bold upper-case to a matrix
  • The purpose is to make the remaining computations
    simpler. It is known that independent variables
    must be uncorrelated, so this can be fulfilled
    before proceeding to the full ICA (see the sketch
    below).

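In Python, the MATLAB sqrtm route maps to scipy.linalg.sqrtm; a sketch
of steps 0) and 1):

```python
import numpy as np
from scipy.linalg import inv, sqrtm

def sphere(x):
    """V = E{x x^T}^(-1/2) via a matrix square root, so Cov(Vx) = I
    (the Python analogue of the MATLAB sqrtm approach on the slide)."""
    C = np.cov(x)
    V = np.real(inv(sqrtm(C)))      # sqrtm may return a tiny imaginary part
    return V @ x, V

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5_000))
x -= x.mean(axis=1, keepdims=True)  # 0) centering: xi <- xi - E{xi}
z, V = sphere(x)                    # 1) sphering
print(np.cov(z))                    # ~ identity
```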
18
Computing the rotation step
(Aapo Hyvarinen, '97)
This is based on the maximisation of an
objective function G(.) which contains an
approximate non-Gaussianity measure.
  • Fixed-Point Algorithm
  • Input: X
  • Random init of W
  • Iterate until convergence, updating each row w of
    W from the condition E{x g(w^T x)} - βw = 0
  • Output: W, S

where g(.) is the derivative of G(.),
W is the rotation transform sought, and
β is a Lagrange multiplier enforcing that
W is an orthogonal transform, i.e. a
rotation. Solve by fixed-point iterations; the
effect of β is an orthogonal de-correlation.
  • The overall transform to take X back to S is then
    (W^T V)
  • There are several options for g(.); each works
    best in special cases. See the FastICA software /
    tutorial for details (see the sketch below).

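A compact sketch of the symmetric fixed-point iteration on whitened
data (this uses the common tanh nonlinearity and SVD-based symmetric
orthogonalization; details differ across FastICA variants, so treat it
as illustrative, not as the authors' exact algorithm):

```python
import numpy as np

def fastica_rotation(x, n_iter=200, seed=0):
    """Fixed-point FastICA on whitened x (d x n): returns the rotation W
    and the estimated sources S = W x."""
    d, n = x.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, d))            # random init of W

    for _ in range(n_iter):
        Y = W @ x
        g = np.tanh(Y)                         # g(.) = tanh, derivative of G
        g_prime = 1.0 - g**2
        # row-wise update: w <- E{x g(w^T x)} - E{g'(w^T x)} w
        W = (g @ x.T) / n - np.diag(g_prime.mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)            # symmetric orthogonalization:
        W = U @ Vt                             # W <- (W W^T)^(-1/2) W
    return W, W @ x

# usage: z, V = whiten(centred_x); W, S = fastica_rotation(z)
# with the conventions used here, the overall transform X -> S is W V
```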
19
Application domains of ICA
  • Blind source separation (Bell &amp; Sejnowski, Te-Won
    Lee, Girolami, Hyvarinen, etc.)
  • Image denoising (Hyvarinen)
  • Medical signal processing: fMRI, ECG, EEG
    (Makeig)
  • Modelling of the hippocampus and visual cortex
    (Lorincz, Hyvarinen)
  • Feature extraction, face recognition (Marni
    Bartlett)
  • Compression, redundancy reduction
  • Watermarking (D. Lowe)
  • Clustering (Girolami, Kolenda)
  • Time series analysis (Back, Valpola)
  • Topic extraction (Kolenda, Bingham, Kaban)
  • Scientific data mining (Kaban, etc.)

20
Image denoising
[Figure: original image, noisy image, and the results
of Wiener filtering vs. ICA filtering]
21
Noisy ICA Model
  • x = As + n
  • A ... m x n mixing matrix
  • s ... n-dimensional vector of ICs
  • n ... m-dimensional random noise vector
  • Same assumptions as for the noise-free model, if
    we use measures of non-Gaussianity which are
    immune to Gaussian noise.
  • So Gaussian moments are used as contrast
    functions.
  • However, in pre-whitening the effect of the noise
    must be taken into account:
  • x~ = (E{x x^T} - Σ)^(-1/2) x, where Σ is the
    noise covariance matrix
  • x~ = Bs + n~ (see the sketch below)

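A sketch of this quasi-whitening step, assuming the noise covariance
Σ is known (illustrative only):

```python
import numpy as np
from scipy.linalg import inv, sqrtm

def noisy_whiten(x, noise_cov):
    """Quasi-whitening for x = A s + n: subtract the noise covariance
    Sigma from E{x x^T} before taking the inverse square root, giving
    x_tilde = B s + n_tilde."""
    C = np.cov(x) - noise_cov            # E{x x^T} - Sigma
    V = np.real(inv(sqrtm(C)))           # (E{x x^T} - Sigma)^(-1/2)
    return V @ x
```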
22
Exercise (part 1, Updated Nov 10)
  • How would you efficiently calculate the PCA of
    data where the dimensionality d is much larger
    than the number of vector observations n?
  • Download the Wisconsin data from the UC Irvine
    repository, extract the principal components of
    the data, test scatter plots of the original data
    and of the data projected onto the principal
    components, and plot the eigenvalues.

23
Ex1, Part 2 (to: ninbbelt@gmail.com, subject: Ex1
and last names)
  • Given high-dimensional data, is there a way to
    know whether all possible projections of the data
    are Gaussian? Explain.
  • What if there is some additive Gaussian noise?

24
Ex1. (cont.)
  • 2. Use FastICA (easily found via Google):
    http://www.cis.hut.fi/projects/ica/fastica/code/dlcode.html
  • Choose your favourite two songs
  • Create 3 mixing matrices and mix them
  • Apply FastICA to de-mix

25
Ex1 (cont.)
  • Discuss the results
  • What happens when the mixing matrix is symmetric?
  • Why did you get different results with different
    mixing matrices?
  • Demonstrate that you got close to the original
    files
  • Try the different nonlinearities of FastICA: which
    one is best? Can you see that from the data?

26
References
  • Feature extraction (images, video)
  • http://hlab.phys.rug.nl/demos/ica/
  • Aapo Hyvarinen, ICA (1999)
  • http://www.cis.hut.fi/aapo/papers/NCS99web/node11.html
  • ICA demo step-by-step
  • http://www.cis.hut.fi/projects/ica/icademo/
  • Lots of links
  • http://sound.media.mit.edu/paris/ica.html
  • Object-based audio capture demos
  • http://www.media.mit.edu/westner/sepdemo.html
  • Demo for BSS with CoBliSS (wav files)
  • http://www.esp.ele.tue.nl/onderzoek/daniels/BSS.html
  • Tomas Zeman's page on BSS research
  • http://ica.fun-thom.misto.cz/page3.html
  • Virtual Laboratories in Probability and Statistics
  • http://www.math.uah.edu/stat/index.html