1
Intro to Latent Variable Models Independent
Component Analysis
  • Ata Kabán
  • http://www.cs.bham.ac.uk/~axk

2
Overview
  • Today we learn about:
  • The cocktail party problem, also called blind
    source separation (BSS)
  • Independent Component Analysis (ICA) for solving
    BSS
  • The general form of factor models
  • Other applications of ICA / BSS
  • All at an intuitive, introductory, practical level

3
Signals, joint density
[Figure: two source signals, amplitudes s1(t) and s2(t) over time, shown together with their joint density and the marginal densities]
4
Original signals (hidden sources) s1(t), s2(t), s3(t), s4(t), t = 1, ..., T
5
The ICA model
xi(t) = ai1 s1(t) + ai2 s2(t) + ai3 s3(t) + ai4 s4(t). Here, i = 1, ..., 4. In vector-matrix notation, and dropping the index t (i.e. assuming stationary s), this is x = A s.
[Figure: graphical model in which the latent variables s1, ..., s4 feed into the observations x1, ..., x4 through the mixing weights aij]
6
This is what the microphones record: a linear
mixture of the sources,
xi(t) = ai1 s1(t) + ai2 s2(t) + ai3 s3(t) + ai4 s4(t)
7
  • The cocktail party problem
  • Also called the Blind Source Separation (BSS) problem
  • An ill-posed problem, unless assumptions are made!
  • The most common assumption is that source signals
    are statistically independent. This means that
    knowing the value of one of them does not give
    any information about the other.
  • The methods based on this assumption are called
    Independent Component Analysis methods. These are
    statistical techniques of decomposing a complex
    data set into independent parts.
  • It can be shown that under some reasonable
    conditions, if the ICA assumption holds, then the
    source signals can be recovered up to permutation
    and scaling.

Determine the source signals, given only the
mixtures
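To make the setting concrete, here is a minimal NumPy sketch (my addition, not from the slides) that simulates the four-source mixing model above; the BSS task is then to recover S from X alone, without access to A.

  import numpy as np

  rng = np.random.default_rng(0)
  T = 10000
  # four independent, non-Gaussian (here Laplacian) hidden sources s1..s4
  S = rng.laplace(size=(4, T))
  # an arbitrary invertible mixing matrix A -- unknown to the observer in BSS
  A = rng.standard_normal((4, 4))
  # what the four microphones record: xi(t) = sum_j aij sj(t), i.e. X = A S
  X = A @ S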
8
Latent variable models
A wider class of time-independent latent variable
models has the following form:
x = f(s; Θ) + e, where x is the observed data, s the latent variables, e the noise, and Θ the parameters.
Two main design specifications are required! 1) p(s) 2) f(.)
  • Linear models with Gaussian latent prior: FA, PPCA, PCA
  • Generalised linear models with discrete latent prior: FMM
  • Linear models with non-Gaussian latent prior: IFA, ICA
  • Linear models with latent prior over the +ve domain and +ve parameters: NMF
  • Non-linear models with uniform latent prior: GTM, LTM
9
What are these acronyms?
  • These are some classical and quite useful models
    and associated techniques for data analysis
  • FA: Factor Analysis
  • PCA: Principal Component Analysis
  • PPCA: Probabilistic Principal Component Analysis
  • FMM: Finite Mixture Models
  • IFA: Independent Factor Analysis
  • ICA: Independent Component Analysis (noise-free
    IFA)
  • NMF: Non-negative Matrix Factorisation
  • GTM: Generative Topographic Mapping
  • LTM: Latent Trait Model

10
Three main categories of models: the intuition
[Figure: three panels. Dense, distributed representations (for compression); prototype-based representations (for clustering); sparse, distributed representations (for compression, clustering, structure discovery, data mining). Figures taken from Lee & Seung, Nature 401, 788 (1999)]
11
Back to the Cocktail Party
Recovered signals
12
Some further considerations
  • If we knew the mixing parameters aij, then we
    would just need to solve a linear system of
    equations (see the sketch after this list).
  • We know neither aij nor si.
  • ICA was initially developed to deal with problems
    closely related to the cocktail party problem.
  • Later it became evident that ICA has many other
    applications too, e.g. from electrical recordings
    of brain activity at different locations on the
    scalp (EEG signals), recover the underlying
    components of brain activity.
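A tiny self-contained sketch of that first point (toy data, my addition): when A is known, recovery is an exact linear solve; ICA is needed precisely because A is not known.

  import numpy as np

  rng = np.random.default_rng(0)
  S = rng.laplace(size=(4, 10000))   # hidden sources (known here only for the demo)
  A = rng.standard_normal((4, 4))    # mixing matrix, pretended known
  X = A @ S                          # the observed mixtures
  # with A given, the sources follow from a linear solve: s = A^(-1) x
  S_hat = np.linalg.solve(A, X)
  assert np.allclose(S_hat, S)       # exact recovery -- only possible if A is known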

13
Illustration of ICA with 2 signals
[Figure: scatter plots of the original sources s (axes s1, s2) and of the mixed signals x (axes x1, x2), with the mixing directions a1 and a2 marked]
14
Illustration of ICA with 2 signals
[Figure: scatter plot of the mixed signals (axes x1, x2). Step 1: Sphering. Step 2: Rotation]
15
Illustration of ICA with 2 signals
[Figure: the full two-signal pipeline: original sources s, mixed signals x with mixing directions a1 and a2, then Step 1: Sphering and Step 2: Rotation recovering the sources]
16
Excluded case
There is one case in which the rotation doesn't matter: when both densities are Gaussian. This case cannot be solved by basic ICA.
[Figure: example of a non-Gaussian density (solid) vs. a Gaussian density (dash-dot)]
We seek non-Gaussian sources for two reasons:
  • identifiability
  • interestingness: Gaussians are not interesting,
    since the superposition of independent sources
    tends to be Gaussian
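A quick numerical check of that last point (my addition, not in the slides): the excess kurtosis of a sum of independent Laplacian sources shrinks towards 0, the Gaussian value, as more sources are superposed.

  import numpy as np

  def excess_kurtosis(x):
      # fourth standardised moment minus 3; equals 0 for a Gaussian
      x = x - x.mean()
      return np.mean(x**4) / np.mean(x**2)**2 - 3.0

  rng = np.random.default_rng(0)
  S = rng.laplace(size=(10, 200000))   # independent super-Gaussian sources
  for k in (1, 2, 5, 10):
      print(k, round(excess_kurtosis(S[:k].sum(axis=0)), 2))
  # prints values falling from about 3 (Laplacian) towards 0 (Gaussian)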
17
Computing the pre-processing steps for ICA
  • 0) Centring: make the signals zero-centred:
  • xi ← xi − E{xi} for each i
  • 1) Sphering: make the signals uncorrelated, i.e.
    apply a transform V to x such that Cov(Vx) = I //
    where Cov(y) = E{yy^T} denotes the covariance matrix
  • V = E{xx^T}^(−1/2) // can be done using the sqrtm
    function in MatLab
  • x ← Vx // for all t (index t dropped here)
  • // bold lowercase refers to a column vector,
    bold upper to a matrix
  • The aim is to make the remaining computations
    simpler. It is known that independent variables
    must be uncorrelated, so this can be fulfilled
    before proceeding to the full ICA.
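A minimal NumPy sketch of these two steps (my rendering; the slide suggests MatLab's sqrtm, here the inverse square root is built from an eigendecomposition instead). Rows of X are signals, columns are time samples.

  import numpy as np

  def centre_and_sphere(X):
      # Step 0) centring: xi <- xi - E{xi}
      X = X - X.mean(axis=1, keepdims=True)
      # Step 1) sphering: V = E{xx^T}^(-1/2), so that Cov(VX) = I
      C = (X @ X.T) / X.shape[1]           # covariance matrix E{xx^T}
      d, E = np.linalg.eigh(C)             # C = E diag(d) E^T
      V = E @ np.diag(d ** -0.5) @ E.T     # symmetric inverse square root
      return V @ X, V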

18
Computing the rotation step
Aapo Hyvarinen (97) FastICA
  • Fixed-point algorithm
  • Input: X
  • Random init of W
  • Iterate until convergence
  • Output: W, S

This is based on the maximisation of an objective function G(.) which contains an approximate non-Gaussianity measure. Here g(.) is the derivative of G(.), W is the rotation transform sought, and λ is a Lagrange multiplier enforcing that W is an orthogonal transform, i.e. a rotation. Solve by fixed-point iterations. The effect of λ is an orthogonal de-correlation. Cubic convergence.
  • The overall transform to take X back to S is then W^T V.
  • There are several choices of g(.); each works best
    in particular cases. See the FastICA software /
    tutorial for details.
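For concreteness, a compact fixed-point sketch (my reconstruction under the usual FastICA conventions, with the common contrast g(u) = tanh(u); the actual FastICA package offers several alternatives). X is assumed already centred and sphered as on the previous slide.

  import numpy as np

  def fastica_rotation(X, n_iter=200, tol=1e-6):
      # symmetric FastICA on centred, sphered X (rows = signals)
      n, T = X.shape
      W = np.linalg.qr(np.random.default_rng(0).standard_normal((n, n)))[0]
      for _ in range(n_iter):
          G = np.tanh(W @ X)                  # g(u) = tanh(u)
          Gp = 1.0 - G**2                     # its derivative g'(u)
          # fixed-point update, row-wise: w <- E{x g(w^T x)} - E{g'(w^T x)} w
          W_new = (G @ X.T) / T - np.diag(Gp.mean(axis=1)) @ W
          # symmetric orthogonalisation keeps W a rotation: W <- (W W^T)^(-1/2) W
          U, _, Vt = np.linalg.svd(W_new)
          W_new = U @ Vt
          if np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol:
              W = W_new
              break
          W = W_new
      return W, W @ X                         # the rotation and the sources S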

19
Numerous other ways of approaching the ICA
problem
  • Higher-order moments and cumulants [Comon 94;
    Hyvarinen 97]
  • Nonlinear PCA [Karhunen 94; Oja 97]
  • Maximisation of information transfer [Bell &
    Sejnowski 95; Amari 96; Lee 97-98]
  • Maximum likelihood [MacKay 96; Pearlmutter &
    Parra 96; Cardoso 97]
  • Negentropy maximisation [Girolami & Fyfe 97]
  • Probabilistic / Bayesian formulations of the
    noisy ICA (MacKay, Valpola, Bishop, etc.):
    recent advances.
  • More general, more principled
  • More difficult, since exact inference is
    intractable due to the non-Gaussian continuous
    latent variables in the model

20
The ML derivation of ICA
David J.C. MacKay (97)
The case of an invertible mixing matrix: x = A s, s = W x, where W = A^(−1).
Natural Gradient Algorithm:
  Random init of W
  Iterate until convergence:
    s = W x
    W ← W + ε (I + f(s) s^T) W
If p(s) ∝ exp(−|s|) then f(s) = −sign(s): sparse (super-Gaussian) sources.
Note that the latent prior is encoded in the nonlinear function f(.)! That is why the choice of f(.) is problem-dependent.
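A minimal sketch of this update rule (my rendering; for a differentiable super-Gaussian prior, the slide's f(s) = −sign(s) is commonly smoothed to f(s) = −tanh(s), which is what is used below). X is assumed centred, rows = mixtures.

  import numpy as np

  def natural_gradient_ica(X, eps=0.01, n_iter=500):
      # ML / natural-gradient ICA for a square, invertible mixing matrix
      n, T = X.shape
      W = np.eye(n)                      # initialisation of the unmixing matrix
      for _ in range(n_iter):
          S = W @ X                      # current source estimates, s = Wx
          F = -np.tanh(S)                # f(s) for a smooth super-Gaussian prior
          # W <- W + eps (I + E{f(s) s^T}) W
          W += eps * (np.eye(n) + (F @ S.T) / T) @ W
      return W, W @ X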
21
Application domains of ICA
  • Blind source separation (Bell & Sejnowski, Te-Won
    Lee, Girolami, Hyvarinen, etc.)
  • Image denoising (Hyvarinen)
  • Medical signal processing: fMRI, ECG, EEG
    (Makeig)
  • Modelling of the hippocampus and visual cortex
    (Lorincz, Hyvarinen)
  • Feature extraction, face recognition (Marni
    Bartlett)
  • Compression, redundancy reduction
  • Watermarking (D Lowe)
  • Clustering (Girolami, Kolenda)
  • Time series analysis (Back, Valpola)
  • Topic extraction (Kolenda, Bingham, Kabán)
  • Scientific Data Mining

22
Image de-noising
[Figure: comparison of the original image, a noisy version, Wiener filtering, and ICA filtering]
23
Clustering
In multivariate data, search for the direction
along which the projection of the data is
maximally non-Gaussian, i.e. has the most structure
24
Blind Separation of Information from Galaxy
Spectra
25
Application: Mining stellar populations of
elliptical galaxies
Nolan, Harva, Kabán & Raychaudhury, MNRAS, 2006.
  • Elliptical galaxies:
  • the oldest galactic systems
  • believed to consist of a single population of
    old stars
  • recent theories indicate the presence of younger
    populations of stars
  • what does the data tell us?

26
Problem analysis
  • Data:
  • One optical spectrum per galaxy
  • This is a linear superposition of the spectra of
    its billions of constituent stars
  • If both young and old stars exist in a galaxy,
    their emissions cannot be measured separately
  • Inverse problem:
  • Find spectral components that can explain all
    elliptical spectra through an unknown linear
    superposition
  • Prior knowledge from general characteristics of
    the data:
  • Spectral elements are positive
  • The flux at neighbouring wavelengths is likely to
    be similar
  • This prior knowledge was formulated within the
    ICA model in order to obtain components with
    these characteristics (details skipped)

27
[Figure: decomposition of galaxy spectra using physical models vs. decomposition using ICA]
28
Physical interpretability of the components found
[Figure: components recovered by the physical models and by the ICA models (NMF, cICA), interpreted as young and old stellar populations]
29
Summing Up
  • Assumption: the data consists of unknown
    components
  • analogous to the individual signals in an
    acoustic mixture
  • We try to solve the inverse problem:
  • from observing the superposition only,
  • recover components of a specified non-Gaussian
    density form
  • The components often give a simpler, clearer view
    of the data, which makes ICA useful in data mining

30
ICA Related resources / Starting Points
  • http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi :
    Demo and links to further info on ICA.
  • http://www.cis.hut.fi/projects/ica/fastica/code/dlcode.shtml :
    ICA software in MatLab.
  • http://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf :
    Comprehensive tutorial paper.