1
Independent Components Analysis
2
What is ICA?
  • Independent component analysis (ICA) is a
    method for finding underlying factors or
    components from multivariate (multi-dimensional)
    statistical data. What distinguishes ICA from
    other methods is that it looks for components
    that are both statistically independent and
    non-Gaussian.
  • A. Hyvarinen, J. Karhunen, E. Oja,
    Independent Component Analysis

3
ICA
  • Blind Signal Separation (BSS) or Independent
    Component Analysis (ICA) is the identification
    and separation of mixtures of sources with little
    prior information.
  • Applications include
  • Audio processing
  • Medical data
  • Finance
  • Array processing (beamforming)
  • Coding
  • and most applications where Factor Analysis and
    PCA are currently used.
  • While PCA seeks the directions that represent the
    data best in a minimum reconstruction-error sense
    (minimizing Σ ||x0 - x||^2), ICA seeks the
    directions that are most independent from each
    other.
  • Often used on time series, separation of multiple
    targets

4
ICA estimation principles (from A. Hyvarinen,
J. Karhunen, E. Oja, Independent Component
Analysis)
  • Principle 1: Nonlinear decorrelation. Find the
    matrix W so that for any i ≠ j, the components
    yi and yj are uncorrelated, and the transformed
    components g(yi) and h(yj) are uncorrelated,
    where g and h are some suitable nonlinear
    functions.
  • Principle 2: Maximum non-Gaussianity. Find the
    local maxima of non-Gaussianity of a linear
    combination y = Wx under the constraint that the
    variance of y is constant.
  • Each local maximum gives one independent
    component.

5
ICA mathematical approach (from A. Hyvarinen,
J. Karhunen, E. Oja, Independent Component Analysis)
  • Given a set of observations of random
    variables x1(t), x2(t), ..., xn(t), where t is the
    time or sample index, assume that they are
    generated as a linear mixture of independent
    components: y = Wx, where W is some unknown
    matrix. Independent component analysis now
    consists of estimating both the matrix W and the
    yi(t), when we only observe the xi(t).

6
The simple Cocktail Party Problem
[Figure: sources s1, s2 pass through the mixing
matrix A to give the observations x1, x2]
x = As
n sources, m = n observations
7
Classical ICA (FastICA) estimation
[Figure: the observed mixed signals, and the
original source signals recovered by ICA]
8
Motivation
[Figure: two independent sources and their
mixtures at two microphones]
The coefficients aij depend on the distances of the
microphones from the speakers.
9
Motivation
Get the Independent Signals out of the Mixture
10
ICA Model (Noise Free)
  • Use a statistical latent-variables system
  • Random variable sk instead of a time signal
  • xj = aj1 s1 + aj2 s2 + ... + ajn sn, for all j
  • x = As
  • The ICs s are latent variables, i.e. unknown, AND
    the mixing matrix A is also unknown
  • Task: estimate A and s using only the observable
    random vector x
  • Let's assume that the no. of ICs = no. of
    observable mixtures,
  • and that A is square and invertible
  • So after estimating A, we can compute W = A^-1 and
    hence
  • s = Wx = A^-1 x (see the sketch below)

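As a concrete illustration of the noise-free model, here is a minimal
numpy sketch (not from the original slides; the source distribution and
mixing matrix are chosen arbitrarily) that mixes two independent
sources with a known square A and recovers them with W = A^-1:

```python
import numpy as np

rng = np.random.default_rng(0)

# two independent, non-Gaussian sources (uniform, zero mean, unit variance)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 1000))

A = np.array([[1.0, 0.5],
              [0.3, 1.0]])      # square, invertible mixing matrix (arbitrary)

x = A @ s                       # observed mixtures: x = A s

W = np.linalg.inv(A)            # with A known, the unmixing matrix is W = A^-1
s_hat = W @ x                   # s = W x = A^-1 x

print(np.allclose(s, s_hat))    # True
```

In ICA proper, A is of course unknown; the rest of the deck is about
estimating W from x alone.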
11
Illustration
Two ICs with uniform distributions, zero mean and
variance equal to 1, mixed by a matrix A.
The edges of the resulting parallelogram (the
support of the joint density of x1, x2) are in the
directions of the columns of A. So if we can
estimate the joint pdf of x1, x2 and then locate
the edges, we can estimate A (see the sketch below).
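A scatter-plot version of this illustration (the slide's actual matrix
values are not shown, so the A below is made up):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 5000))  # 2 uniform ICs
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])      # illustrative mixing matrix (values assumed)
x = A @ s

plt.scatter(x[0], x[1], s=2)    # support of the joint pdf is a parallelogram
plt.title("Edges of the parallelogram align with the columns of A")
plt.show()
```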
12
Restrictions
  • The si are statistically independent
  • p(s1, s2) = p(s1) p(s2)
  • Non-Gaussian distributions
  • If s1 and s2 are Gaussian, the joint density of
    the unit-variance pair is rotationally symmetric.
    So it doesn't contain any information about the
    directions of the columns of the mixing matrix A,
    and A cannot be estimated (see the sketch below).
  • If only one IC is Gaussian, the estimation is
    still possible.

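The Gaussian restriction can be checked numerically. A minimal sketch
(my own illustration, not from the slides): rotating two independent
unit-variance Gaussians yields data that look identical for every
rotation, so the mixing matrix is unidentifiable:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 100_000))         # two independent Gaussian ICs

theta = np.pi / 5                             # any rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = R @ s                                     # "mixed" observations

# the covariance (and the whole joint density) is unchanged by R,
# so nothing in the data distinguishes one rotation from another
print(np.cov(x))                              # ~ identity, same as np.cov(s)
```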
13
Ambiguities
  • Can't determine the variances (energies) of the
    ICs
  • Since both s and A are unknown, any scalar
    multiple of one of the sources can always be
    cancelled by dividing the corresponding column of
    A by it (see the sketch below).
  • Fix the magnitudes of the ICs by assuming unit
    variance: E{si^2} = 1
  • Only the ambiguity of sign remains
  • Can't determine the order of the ICs
  • The terms of the sum can be freely reordered,
    because both s and A are unknown. So we can call
    any IC the first one.

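A two-line numerical check of the scaling ambiguity (illustrative
values, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(2, 100))
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])

c = 2.5                              # arbitrary scalar (c < 0 flips the sign)
A2, s2 = A.copy(), s.copy()
A2[:, 0] /= c                        # divide the first column of A by c ...
s2[0, :] *= c                        # ... and multiply the first source by c

print(np.allclose(A @ s, A2 @ s2))   # True: the observed x = A s is unchanged
```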
14
ICA Principle (Non-Gaussian is Independent)
  • The key to estimating A is non-Gaussianity
  • The distribution of a sum of independent random
    variables tends toward a Gaussian distribution
    (by the CLT).
[Figure: densities f(s1), f(s2) and f(x1) = f(s1 + s2);
the mixture is more Gaussian than either source]
  • Consider y = wTx = wTAs = zTs, where w is one of
    the rows of matrix W and z = ATw.
  • y is a linear combination of the si, with weights
    given by the zi.
  • Since a sum of two independent r.v.s is more
    Gaussian than the individual r.v.s, zTs is more
    Gaussian than either of the si, AND becomes least
    Gaussian when it is equal to one of the si.
  • So we could take w to be a vector that maximizes
    the non-Gaussianity of wTx.
  • Such a w would correspond to a z with only one
    non-zero component, so we get back one of the si
    (see the sketch below).

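A small numerical check of this principle (a sketch under assumed
uniform sources, not from the slides): the projection wTx is most
non-Gaussian, as measured by kurtosis, exactly at the unmixing
direction:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 100_000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

def kurtosis(y):
    return np.mean(y**4) - 3.0       # 0 for a unit-variance Gaussian

w_mix = np.array([1.0, 0.0])         # picks out x1, a genuine mixture
w_sep = np.linalg.inv(A)[0]          # first row of W = A^-1, recovers s1
for w in (w_mix, w_sep):
    y = w @ x
    print(kurtosis(y / y.std()))     # mixture is closer to 0 (more Gaussian);
                                     # the separated source is near -1.2
```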
15
Measures of Non-Gaussianity
  • We need a quantitative measure of
    non-Gaussianity for ICA estimation.
  • Kurtosis: 0 for a Gaussian (but sensitive to
    outliers)
  • Entropy: largest for a Gaussian
  • Negentropy: 0 for a Gaussian (but difficult to
    estimate)
  • Approximations, e.g.
  • J(y) ≈ (E{G(y)} - E{G(v)})^2
  • where v is a standard Gaussian random variable
    and G is a nonquadratic function (see the sketch
    below).

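Both measures can be estimated from samples; a sketch (the choice
G(u) = -exp(-u^2/2) is one standard FastICA nonlinearity, assumed here):

```python
import numpy as np

rng = np.random.default_rng(0)

def kurtosis(y):
    """Fourth-order cumulant: 0 for a Gaussian, but sensitive to outliers."""
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

def negentropy_approx(y, G=lambda u: -np.exp(-u**2 / 2.0)):
    """J(y) ~ (E{G(y)} - E{G(v)})^2 with v standard Gaussian,
    estimated here by sampling v."""
    y = (y - y.mean()) / y.std()
    v = rng.standard_normal(y.size)
    return (np.mean(G(y)) - np.mean(G(v)))**2

print(kurtosis(rng.standard_normal(100_000)))          # ~ 0 (Gaussian)
print(kurtosis(rng.uniform(-1, 1, 100_000)))           # ~ -1.2 (sub-Gaussian)
print(negentropy_approx(rng.uniform(-1, 1, 100_000)))  # > 0 for non-Gaussian y
```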
16
Data Centering & Whitening
  • Centering
  • x ← x - E{x}
  • This doesn't mean that ICA can't estimate the
    mean; it just simplifies the algorithm.
  • The ICs are also zero mean, because
  • E{s} = W E{x}
  • After ICA, add W E{x} back to the zero-mean ICs
  • Whitening
  • We transform the x's linearly so that they are
    white (uncorrelated, unit variance). This is done
    by EVD:
  • x~ = (E D^(-1/2) E^T) x = E D^(-1/2) E^T A s = A~ s
  • where E{x x^T} = E D E^T
  • So we now have to estimate the orthonormal
    matrix A~
  • An orthonormal matrix has n(n-1)/2 degrees of
    freedom, so for a large-dimensional A we have to
    estimate only about half as many parameters. This
    greatly simplifies ICA.
  • Reducing the dimension of the data (choosing the
    dominant eigenvalues) while whitening also helps
    (see the sketch below).

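A minimal numpy sketch of the EVD whitening step described above (my
own illustration, not the slides' code):

```python
import numpy as np

def whiten(x):
    """Whiten centred data x (d x n) via EVD of the covariance:
    x_tilde = E D^(-1/2) E^T x, so that E{x_tilde x_tilde^T} = I."""
    C = np.cov(x)                        # sample estimate of E{x x^T}
    d, E = np.linalg.eigh(C)             # C = E D E^T
    V = E @ np.diag(d ** -0.5) @ E.T     # whitening matrix E D^(-1/2) E^T
    return V @ x, V

rng = np.random.default_rng(0)
x = np.array([[1.0, 0.5],
              [0.3, 1.0]]) @ rng.uniform(-1, 1, size=(2, 10_000))
x -= x.mean(axis=1, keepdims=True)       # centre first
z, V = whiten(x)
print(np.cov(z))                         # ~ identity
```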
17
Computing the pre-processing steps for ICA
  • 0) Centering: make the signals centred at zero
  • xi ← xi - E{xi} for each i
  • 1) Sphering: make the signals uncorrelated, i.e.
    apply a transform V to x such that Cov(Vx) = I //
    where Cov(y) = E{y y^T} denotes the covariance
    matrix
  • V = E{x x^T}^(-1/2) // can be done using the
    sqrtm function in MATLAB
  • x ← Vx // for all t (indexes t dropped here)
  • // bold lower-case refers to a column
    vector, bold upper-case to a matrix
  • The purpose is to make the remaining computations
    simpler. It is known that independent variables
    must be uncorrelated, so this can be fulfilled
    before proceeding to the full ICA (see the sketch
    below).

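In Python, the MATLAB sqrtm route maps to scipy.linalg.sqrtm; a sketch
of steps 0) and 1):

```python
import numpy as np
from scipy.linalg import inv, sqrtm

def sphere(x):
    """V = E{x x^T}^(-1/2) via a matrix square root, so Cov(Vx) = I
    (the Python analogue of the MATLAB sqrtm approach on the slide)."""
    C = np.cov(x)
    V = np.real(inv(sqrtm(C)))      # sqrtm may return a tiny imaginary part
    return V @ x, V

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5_000))
x -= x.mean(axis=1, keepdims=True)  # 0) centering: xi <- xi - E{xi}
z, V = sphere(x)                    # 1) sphering
print(np.cov(z))                    # ~ identity
```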
18
Computing the rotation step
(Aapo Hyvarinen, '97)
This is based on the maximisation of an
objective function G(.) which contains an
approximate non-Gaussianity measure.
  • Fixed-Point Algorithm
  • Input: X
  • Random init of W
  • Iterate until convergence, updating each row w of
    W from the condition E{x g(w^T x)} - βw = 0
  • Output: W, S

where g(.) is the derivative of G(.),
W is the rotation transform sought, and
β is a Lagrange multiplier enforcing that
W is an orthogonal transform, i.e. a
rotation. Solve by fixed-point iterations; the
effect of β is an orthogonal de-correlation.
  • The overall transform to take X back to S is then
    (W^T V)
  • There are several options for g(.); each works
    best in special cases. See the FastICA software /
    tutorial for details (see the sketch below).

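A compact sketch of the symmetric fixed-point iteration on whitened
data (this uses the common tanh nonlinearity and SVD-based symmetric
orthogonalization; details differ across FastICA variants, so treat it
as illustrative, not as the authors' exact algorithm):

```python
import numpy as np

def fastica_rotation(x, n_iter=200, seed=0):
    """Fixed-point FastICA on whitened x (d x n): returns the rotation W
    and the estimated sources S = W x."""
    d, n = x.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, d))            # random init of W

    for _ in range(n_iter):
        Y = W @ x
        g = np.tanh(Y)                         # g(.) = tanh, derivative of G
        g_prime = 1.0 - g**2
        # row-wise update: w <- E{x g(w^T x)} - E{g'(w^T x)} w
        W = (g @ x.T) / n - np.diag(g_prime.mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)            # symmetric orthogonalization:
        W = U @ Vt                             # W <- (W W^T)^(-1/2) W
    return W, W @ x

# usage: z, V = whiten(centred_x); W, S = fastica_rotation(z)
# with the conventions used here, the overall transform X -> S is W V
```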
19
Application domains of ICA
  • Blind source separation (Bell &amp; Sejnowski, Te-Won
    Lee, Girolami, Hyvarinen, etc.)
  • Image denoising (Hyvarinen)
  • Medical signal processing: fMRI, ECG, EEG
    (Makeig)
  • Modelling of the hippocampus and visual cortex
    (Lorincz, Hyvarinen)
  • Feature extraction, face recognition (Marni
    Bartlett)
  • Compression, redundancy reduction
  • Watermarking (D. Lowe)
  • Clustering (Girolami, Kolenda)
  • Time series analysis (Back, Valpola)
  • Topic extraction (Kolenda, Bingham, Kaban)
  • Scientific data mining (Kaban, etc.)

20
Image denoising
[Figure: original image, noisy image, and the results
of Wiener filtering vs. ICA filtering]
21
Noisy ICA Model
  • x = As + n
  • A ... m x n mixing matrix
  • s ... n-dimensional vector of ICs
  • n ... m-dimensional random noise vector
  • Same assumptions as for the noise-free model, if
    we use measures of non-Gaussianity which are
    immune to Gaussian noise.
  • So Gaussian moments are used as contrast
    functions.
  • However, in pre-whitening the effect of the noise
    must be taken into account:
  • x~ = (E{x x^T} - Σ)^(-1/2) x, where Σ is the
    noise covariance matrix
  • x~ = Bs + n~ (see the sketch below)

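A sketch of this quasi-whitening step, assuming the noise covariance
Σ is known (illustrative only):

```python
import numpy as np
from scipy.linalg import inv, sqrtm

def noisy_whiten(x, noise_cov):
    """Quasi-whitening for x = A s + n: subtract the noise covariance
    Sigma from E{x x^T} before taking the inverse square root, giving
    x_tilde = B s + n_tilde."""
    C = np.cov(x) - noise_cov            # E{x x^T} - Sigma
    V = np.real(inv(sqrtm(C)))           # (E{x x^T} - Sigma)^(-1/2)
    return V @ x
```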
22
Exercise (part 1, Updated Nov 10)
  • How would you efficiently calculate the PCA of
    data where the dimensionality d is much larger
    than the number of vector observations n?
  • Download the Wisconsin data from the UC Irvine
    repository, extract the principal components of
    the data, test scatter plots of the original data
    and of the data projected onto the principal
    components, and plot the eigenvalues.

23
Ex1, Part 2 (to: ninbbelt@gmail.com, subject: Ex1
and last names)
  • Given high-dimensional data, is there a way to
    know whether all possible projections of the data
    are Gaussian? Explain.
  • What if there is some additive Gaussian noise?

24
Ex1. (cont.)
  • 2. Use FastICA (easily found via Google):
    http://www.cis.hut.fi/projects/ica/fastica/code/dlcode.html
  • Choose your favourite two songs
  • Create 3 mixing matrices and mix them
  • Apply FastICA to de-mix

25
Ex1 (cont.)
  • Discuss the results
  • What happens when the mixing matrix is symmetric?
  • Why did you get different results with different
    mixing matrices?
  • Demonstrate that you got close to the original
    files
  • Try the different nonlinearities of FastICA: which
    one is best? Can you see that from the data?

26
References
  • Feature extraction (images, video)
  • http://hlab.phys.rug.nl/demos/ica/
  • Aapo Hyvarinen, ICA (1999)
  • http://www.cis.hut.fi/aapo/papers/NCS99web/node11.html
  • ICA demo step-by-step
  • http://www.cis.hut.fi/projects/ica/icademo/
  • Lots of links
  • http://sound.media.mit.edu/paris/ica.html
  • Object-based audio capture demos
  • http://www.media.mit.edu/westner/sepdemo.html
  • Demo for BSS with CoBliSS (wav files)
  • http://www.esp.ele.tue.nl/onderzoek/daniels/BSS.html
  • Tomas Zeman's page on BSS research
  • http://ica.fun-thom.misto.cz/page3.html
  • Virtual Laboratories in Probability and Statistics
  • http://www.math.uah.edu/stat/index.html