
1
Measure Independence in Kernel Space
  • Presented by
  • Qiang Lou

2
References
  • These slides are based on the following papers:
  • F. Bach and M. Jordan. Kernel Independent
    Component Analysis. Journal of Machine Learning
    Research, 2002.
  • Arthur Gretton, Ralf Herbrich, Alexander Smola,
    Olivier Bousquet, Bernhard Schölkopf. Kernel
    Methods for Measuring Independence. Journal of
    Machine Learning Research, 2005.

3
Outline
  • Introduction
  • Canonical Correlation
  • Kernel Canonical Correlation
  • Application Example

4
Introduction
  • What is Independence?
  • Intuitively, two variables y1, y2 are said to be
    independent if information on the value of one
    variable does not give any information on the
    value of the other variable.
  • Technically, y1 and y2 are independent if and
    only if the joint pdf is factorizable in the
    following way:
  • p(y1, y2) = p1(y1) p2(y2)

5
Introduction
  • How do we measure independence?
  • -- Can we use correlation?
  • -- Do uncorrelated variables mean independent
    variables?
  • Remark:
  • y1 and y2 are uncorrelated means
  • E[y1 y2] − E[y1]E[y2] = 0

6
Introduction
  • The answer is no.
  • Fact:
  • Independence implies uncorrelatedness, but the
    reverse is not true.
  • Which means:
  • p(y1, y2) = p1(y1) p2(y2)  ⇒  E[y1 y2] − E[y1]E[y2] = 0
  • E[y1 y2] − E[y1]E[y2] = 0  ⇏  p(y1, y2) = p1(y1) p2(y2)
  • This is easy to prove; a numerical counterexample
    is sketched below.
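A standard counterexample (not from the slides, added here to make the one-way implication concrete): take y1 symmetric around zero and y2 = y1². The pair is uncorrelated, since E[y1 y2] = E[y1³] = 0, yet y2 is a deterministic function of y1. A minimal Python check:

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.normal(size=100_000)   # symmetric around 0, so E[y1^3] = 0
y2 = y1**2                      # deterministic function of y1: clearly dependent

# Uncorrelated: E[y1*y2] - E[y1]*E[y2] = E[y1^3] = 0
print(np.mean(y1 * y2) - np.mean(y1) * np.mean(y2))   # ~ 0 up to sampling noise
```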

7
Introduction
  • Now comes the question
  • How to measure independence?

8
Canonical Correlation
  • Canonical Correlation Analysis (CCA) is concerned
    with finding a pair of linear transformations
    such that one component within each set of
    transformed variables is correlated with a single
    component in the other set.
  • We focus on the first canonical correlation,
    defined as the maximum possible correlation
    between the two projections w1ᵀx1 and w2ᵀx2
    of x1 and x2:

ρ(x1, x2) = max_{w1, w2} corr(w1ᵀx1, w2ᵀx2)
          = max_{w1, w2} (w1ᵀ C12 w2) / ( √(w1ᵀ C11 w1) √(w2ᵀ C22 w2) )

where C = [C11 C12; C21 C22] is the covariance matrix of (x1, x2).
9
Canonical Correlation
  • Taking derivatives with respect to w1 and w2 and
    setting them to zero, we obtain the generalized
    eigenvalue problem

[  0    C12 ] [w1]        [ C11    0  ] [w1]
[ C21    0  ] [w2]  =  ρ  [  0    C22 ] [w2]

10
Canonical Correlation
(Equation-only slide; the derivation steps did not
survive the transcript.)
11
Canonical Correlation
(Equation-only slide; the derivation steps did not
survive the transcript.)
Canonical Correlation
In an equivalent formulation the first canonical
correlation appears through the smallest generalized
eigenvalue, so the method can be extended to more than
two sets of variables (stack all pairwise covariance
blocks and find the smallest eigenvalue). A small
solver sketch follows below.
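As a minimal sketch of the two-set case (not code from the cited papers; the helper first_cca and the toy data are illustrative), the first canonical correlation can be computed with a symmetric generalized eigensolver:

```python
import numpy as np
from scipy.linalg import eigh

def first_cca(x1, x2):
    """First canonical correlation of two multivariate samples (rows = observations)."""
    d1 = x1.shape[1]
    C = np.cov(np.column_stack([x1, x2]), rowvar=False)  # joint covariance of (x1, x2)
    C11, C12 = C[:d1, :d1], C[:d1, d1:]
    C21, C22 = C[d1:, :d1], C[d1:, d1:]
    A = np.block([[np.zeros_like(C11), C12],
                  [C21, np.zeros_like(C22)]])
    B = np.block([[C11, np.zeros_like(C12)],
                  [np.zeros_like(C21), C22]])
    return eigh(A, B, eigvals_only=True)[-1]  # largest generalized eigenvalue

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 1))                # shared latent component
x1 = np.hstack([z + 0.5 * rng.normal(size=(1000, 1)), rng.normal(size=(1000, 1))])
x2 = np.hstack([z + 0.5 * rng.normal(size=(1000, 1)), rng.normal(size=(1000, 1))])
print(first_cca(x1, x2))   # well above 0, reflecting the shared component z
```

The eigenvalues of this problem come in ±ρ pairs, so the largest one is the first canonical correlation.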
13
Kernel Canonical Correlation
Kernel trick: define a map Φ from X to a feature
space F such that we can find a kernel k satisfying

k(x, y) = ⟨Φ(x), Φ(y)⟩
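To make the trick concrete, here is a tiny check (an illustration, not from the slides): for the homogeneous polynomial kernel k(x, y) = (xᵀy)² on R², the feature map is explicit, so we can verify k(x, y) = ⟨Φ(x), Φ(y)⟩ directly. (For the Gaussian kernel used later, F is infinite-dimensional, so no such finite check exists.)

```python
import numpy as np

# Explicit feature map for k(x, y) = (x . y)^2 on R^2:
# Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(x, y)**2)          # kernel evaluated directly on X
print(np.dot(phi(x), phi(y)))   # same value as an inner product in F
```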
14
Kernel Canonical Correlation
F-correlation -- the canonical correlation between
Φ1(x1) and Φ2(x2):

ρ_F = max_{f1 ∈ F1, f2 ∈ F2} corr(f1(x1), f2(x2))
    = max_{f1, f2} cov(f1(x1), f2(x2)) / ( √var f1(x1) √var f2(x2) )
15
Kernel Canonical Correlation
Notes: if x1 and x2 are independent, then ρ_F = 0.
Is the converse true?
  -- If F is large enough, it is true.
  -- In particular, it holds if F is the space
     corresponding to a Gaussian kernel, which is a
     positive definite kernel on X = R.
16
Kernel Canonical Correlation
Estimation of the F-correlation -- a kernelized
version of canonical correlation.
We will show that the estimate depends only on the
Gram matrices K1 and K2 of the observations; we will
use ρ̂_F(K1, K2) to denote this canonical correlation.
Suppose the data are centered in feature space
(i.e. Σ_{i=1}^{N} Φ(x_i) = 0).
17
Kernel Canonical Correlation
We want to estimate

ρ̂_F(K1, K2) = max_{f1, f2} cov(f1(x1), f2(x2)) / ( √var f1(x1) √var f2(x2) )

from the sample, which means we want to know three
things: the empirical covariance and the two
empirical variances.
18
Kernel Canonical Correlation
For fixed f1 = Σ_k α1^k Φ1(x1^k) and
f2 = Σ_k α2^k Φ2(x2^k), the empirical covariance of
the projections in feature space can be written

cov(f1(x1), f2(x2)) = (1/N) α1ᵀ K1 K2 α2

(a numerical check of this identity follows below).
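A quick sanity check of this identity (a sketch under the stated centering assumption; gram_gaussian is an illustrative helper, not from the papers): the vector of sample projections under f1 is K1 α1, so the empirical covariance reduces to (1/N) α1ᵀ K1 K2 α2.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
x1, x2 = rng.normal(size=N), rng.normal(size=N)

def gram_gaussian(x, sigma=1.0):
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

H = np.eye(N) - np.ones((N, N)) / N              # centers the data in feature space
K1 = H @ gram_gaussian(x1) @ H
K2 = H @ gram_gaussian(x2) @ H

a1, a2 = rng.normal(size=N), rng.normal(size=N)  # coefficients of f1 and f2
p1, p2 = K1 @ a1, K2 @ a2                        # projections f1(x1^i), f2(x2^i)
# With centered Gram matrices the projections have zero mean, so the
# empirical covariance is just the mean of the products:
print(np.mean(p1 * p2))
print(a1 @ K1 @ K2 @ a2 / N)                     # (1/N) a1' K1 K2 a2 -- same value
```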
19
Kernel Canonical Correlation
Similarly, we can get the following for the empirical
variances:

var f1(x1) = (1/N) α1ᵀ K1² α1,   var f2(x2) = (1/N) α2ᵀ K2² α2
20
Kernel Canonical Correlation
Putting the three expressions together, we get

ρ̂_F(K1, K2) = max_{α1, α2} (α1ᵀ K1 K2 α2) / ( √(α1ᵀ K1² α1) √(α2ᵀ K2² α2) )

As with the problem we discussed before, this is
equivalent to the following generalized eigenvalue
problem:

[   0     K1 K2 ] [α1]        [ K1²    0  ] [α1]
[ K2 K1     0   ] [α2]  =  ρ  [  0    K2² ] [α2]
21
Kernel Canonical Correlation
Problem: suppose that the Gram matrices K1 and K2
have full rank; then the canonical correlation will
always be 1, whatever K1 and K2 are. Let V1 and V2
denote the subspaces of R^N generated by the columns
of K1 and K2; then we can rewrite

ρ̂_F(K1, K2) = max_{v1 ∈ V1, v2 ∈ V2} (v1ᵀ v2) / ( ‖v1‖ ‖v2‖ )

If K1 and K2 have full rank, V1 and V2 are both equal
to R^N, so the maximum is 1.
22
Kernel Canonical Correlation
Solution: regularize by penalizing the RKHS norms of
f1 and f2, giving the regularized F-correlation

ρ_F^κ = max_{f1, f2} cov(f1(x1), f2(x2)) / ( √(var f1(x1) + κ‖f1‖_F²) √(var f2(x2) + κ‖f2‖_F²) )

where κ is a small positive constant. We expand each
regularized variance (dropping the κ² term) as

var f1(x1) + κ‖f1‖_F² ≈ (1/N) α1ᵀ (K1 + (Nκ/2) I)² α1
23
Kernel Canonical Correlation
Now we can get the regularized kernel canonical
correlation (KCC):

ρ̂_F^κ(K1, K2) = max_{α1, α2} (α1ᵀ K1 K2 α2) / ( √(α1ᵀ (K1 + (Nκ/2) I)² α1) √(α2ᵀ (K2 + (Nκ/2) I)² α2) )

A runnable sketch of the whole computation follows
below.
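Putting the pipeline together, here is a minimal sketch of the regularized KCC (assumptions: Gaussian kernels, the centering and regularization above; kcc, gram_gaussian, and the parameter values are illustrative, not the papers' reference code):

```python
import numpy as np
from scipy.linalg import eigh

def gram_gaussian(x, sigma=1.0):
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def center(K):
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return H @ K @ H

def kcc(x1, x2, sigma=1.0, kappa=2e-2):
    """First regularized kernel canonical correlation of two 1-D samples."""
    N = len(x1)
    K1 = center(gram_gaussian(x1, sigma))
    K2 = center(gram_gaussian(x2, sigma))
    R1 = K1 + (N * kappa / 2) * np.eye(N)   # regularized block (K + (N*kappa/2) I)
    R2 = K2 + (N * kappa / 2) * np.eye(N)
    A = np.block([[np.zeros((N, N)), K1 @ K2],
                  [K2 @ K1, np.zeros((N, N))]])
    B = np.block([[R1 @ R1, np.zeros((N, N))],
                  [np.zeros((N, N)), R2 @ R2]])
    return eigh(A, B, eigvals_only=True)[-1]  # largest generalized eigenvalue

rng = np.random.default_rng(0)
s = rng.normal(size=300)
print(kcc(s, rng.normal(size=300)))  # independent samples: small
print(kcc(s, s**2))                  # uncorrelated but dependent: noticeably larger
```

The κ term is what keeps the estimate informative: without it the estimate degenerates toward 1 for full-rank Gram matrices, exactly the problem described on slide 21.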
24
Kernel Canonical Correlation
Generalizing to more than two sets of variables, this
is equivalent to the generalized eigenvalue problem
K̃_κ α = λ D_κ α, where K̃_κ has off-diagonal blocks
K_i K_j and diagonal blocks (K_i + (Nκ/2) I)², and
D_κ is block diagonal with blocks (K_i + (Nκ/2) I)².
25
Example Application
Applications:
  -- ICA (Independent Component Analysis)
  -- Feature Selection
See the demo for the application to ICA.
26
Thank you!
  • Questions?