
1
Measure Independence in Kernel Space
  • Presented by
  • Qiang Lou

2
References
  • These slides are based on the following papers:
  • F. Bach and M. Jordan. Kernel Independent
    Component Analysis. Journal of Machine Learning
    Research, 2002.
  • Arthur Gretton, Ralf Herbrich, Alexander Smola,
    Olivier Bousquet, Bernhard Schölkopf. Kernel
    Methods for Measuring Independence. Journal of
    Machine Learning Research, 2005.

3
Outline
  • Introduction
  • Canonical Correlation
  • Kernel Canonical Correlation
  • Application Example

4
Introduction
  • What is Independence?
  • Intuitively, two variables y1, y2 are said to be
    independent if information on the value of one
    variable does not give any information on the
    value of the other variable.
  • Technically, y1 and y2 are independent if and
    only if the joint pdf is factorizable in the
    following way:
  • p(y1, y2) = p1(y1) p2(y2)

5
Introduction
  • How do we measure independence?
  • -- Can we use correlation?
  • -- Do uncorrelated variables mean independent
    variables?
  • Remark:
  • y1 and y2 are uncorrelated means
  • E[y1 y2] − E[y1]E[y2] = 0

6
Introduction
  • The answer is no.
  • Fact:
  • Independence implies uncorrelatedness, but the
    reverse is not true.
  • Which means:
  • p(y1, y2) = p1(y1) p2(y2)  ⇒  E[y1 y2] − E[y1]E[y2] = 0
  • E[y1 y2] − E[y1]E[y2] = 0  ⇏  p(y1, y2) = p1(y1) p2(y2)
  • This is easy to prove; a numerical counterexample
    is sketched below.
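A standard counterexample (not from the slides, added here to make the one-way implication concrete): take y1 symmetric around zero and y2 = y1². The pair is uncorrelated, since E[y1 y2] = E[y1³] = 0, yet y2 is a deterministic function of y1. A minimal Python check:

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.normal(size=100_000)   # symmetric around 0, so E[y1^3] = 0
y2 = y1**2                      # deterministic function of y1: clearly dependent

# Uncorrelated: E[y1*y2] - E[y1]*E[y2] = E[y1^3] = 0
print(np.mean(y1 * y2) - np.mean(y1) * np.mean(y2))   # ~ 0 up to sampling noise
```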

7
Introduction
  • Now comes the question
  • How to measure independence?

8
Canonical Correlation
  • Canonical Correlation Analysis (CCA) is concerned
    with finding a pair of linear transformations
    such that one component within each set of
    transformed variables is correlated with a single
    component in the other set.
  • We focus on the first canonical correlation,
    defined as the maximum possible correlation
    between the two projections w1ᵀx1 and w2ᵀx2
    of x1 and x2:

ρ(x1, x2) = max_{w1, w2} corr(w1ᵀx1, w2ᵀx2)
          = max_{w1, w2} (w1ᵀ C12 w2) / ( √(w1ᵀ C11 w1) √(w2ᵀ C22 w2) )

where C = [C11 C12; C21 C22] is the covariance matrix of (x1, x2).
9
Canonical Correlation
  • Taking derivatives with respect to w1 and w2 and
    setting them to zero, we obtain the generalized
    eigenvalue problem

[  0    C12 ] [w1]        [ C11    0  ] [w1]
[ C21    0  ] [w2]  =  ρ  [  0    C22 ] [w2]

10
Canonical Correlation
(Equation-only slide; the derivation steps did not
survive the transcript.)
11
Canonical Correlation
(Equation-only slide; the derivation steps did not
survive the transcript.)
Canonical Correlation
In an equivalent formulation the first canonical
correlation appears through the smallest generalized
eigenvalue, so the method can be extended to more than
two sets of variables (stack all pairwise covariance
blocks and find the smallest eigenvalue). A small
solver sketch follows below.
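As a minimal sketch of the two-set case (not code from the cited papers; the helper first_cca and the toy data are illustrative), the first canonical correlation can be computed with a symmetric generalized eigensolver:

```python
import numpy as np
from scipy.linalg import eigh

def first_cca(x1, x2):
    """First canonical correlation of two multivariate samples (rows = observations)."""
    d1 = x1.shape[1]
    C = np.cov(np.column_stack([x1, x2]), rowvar=False)  # joint covariance of (x1, x2)
    C11, C12 = C[:d1, :d1], C[:d1, d1:]
    C21, C22 = C[d1:, :d1], C[d1:, d1:]
    A = np.block([[np.zeros_like(C11), C12],
                  [C21, np.zeros_like(C22)]])
    B = np.block([[C11, np.zeros_like(C12)],
                  [np.zeros_like(C21), C22]])
    return eigh(A, B, eigvals_only=True)[-1]  # largest generalized eigenvalue

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 1))                # shared latent component
x1 = np.hstack([z + 0.5 * rng.normal(size=(1000, 1)), rng.normal(size=(1000, 1))])
x2 = np.hstack([z + 0.5 * rng.normal(size=(1000, 1)), rng.normal(size=(1000, 1))])
print(first_cca(x1, x2))   # well above 0, reflecting the shared component z
```

The eigenvalues of this problem come in ±ρ pairs, so the largest one is the first canonical correlation.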
13
Kernel Canonical Correlation
Kernel trick: define a map Φ from X to a feature
space F such that we can find a kernel k satisfying

k(x, y) = ⟨Φ(x), Φ(y)⟩
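To make the trick concrete, here is a tiny check (an illustration, not from the slides): for the homogeneous polynomial kernel k(x, y) = (xᵀy)² on R², the feature map is explicit, so we can verify k(x, y) = ⟨Φ(x), Φ(y)⟩ directly. (For the Gaussian kernel used later, F is infinite-dimensional, so no such finite check exists.)

```python
import numpy as np

# Explicit feature map for k(x, y) = (x . y)^2 on R^2:
# Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(x, y)**2)          # kernel evaluated directly on X
print(np.dot(phi(x), phi(y)))   # same value as an inner product in F
```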
14
Kernel Canonical Correlation
F-correlation -- the canonical correlation between
Φ1(x1) and Φ2(x2):

ρ_F = max_{f1 ∈ F1, f2 ∈ F2} corr(f1(x1), f2(x2))
    = max_{f1, f2} cov(f1(x1), f2(x2)) / ( √var f1(x1) √var f2(x2) )
15
Kernel Canonical Correlation
Notes: if x1 and x2 are independent, then ρ_F = 0.
Is the converse true?
  -- If F is large enough, it is true.
  -- In particular, it holds if F is the space
     corresponding to a Gaussian kernel, which is a
     positive definite kernel on X = R.
16
Kernel Canonical Correlation
Estimation of the F-correlation -- a kernelized
version of canonical correlation.
We will show that the estimate depends only on the
Gram matrices K1 and K2 of the observations; we will
use ρ̂_F(K1, K2) to denote this canonical correlation.
Suppose the data are centered in feature space
(i.e. Σ_{i=1}^{N} Φ(x_i) = 0).
17
Kernel Canonical Correlation
We want to estimate

ρ̂_F(K1, K2) = max_{f1, f2} cov(f1(x1), f2(x2)) / ( √var f1(x1) √var f2(x2) )

from the sample, which means we want to know three
things: the empirical covariance and the two
empirical variances.
18
Kernel Canonical Correlation
For fixed f1 = Σ_k α1^k Φ1(x1^k) and
f2 = Σ_k α2^k Φ2(x2^k), the empirical covariance of
the projections in feature space can be written

cov(f1(x1), f2(x2)) = (1/N) α1ᵀ K1 K2 α2

(a numerical check of this identity follows below).
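A quick sanity check of this identity (a sketch under the stated centering assumption; gram_gaussian is an illustrative helper, not from the papers): the vector of sample projections under f1 is K1 α1, so the empirical covariance reduces to (1/N) α1ᵀ K1 K2 α2.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
x1, x2 = rng.normal(size=N), rng.normal(size=N)

def gram_gaussian(x, sigma=1.0):
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

H = np.eye(N) - np.ones((N, N)) / N              # centers the data in feature space
K1 = H @ gram_gaussian(x1) @ H
K2 = H @ gram_gaussian(x2) @ H

a1, a2 = rng.normal(size=N), rng.normal(size=N)  # coefficients of f1 and f2
p1, p2 = K1 @ a1, K2 @ a2                        # projections f1(x1^i), f2(x2^i)
# With centered Gram matrices the projections have zero mean, so the
# empirical covariance is just the mean of the products:
print(np.mean(p1 * p2))
print(a1 @ K1 @ K2 @ a2 / N)                     # (1/N) a1' K1 K2 a2 -- same value
```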
19
Kernel Canonical Correlation
Similarly, we can get the following for the empirical
variances:

var f1(x1) = (1/N) α1ᵀ K1² α1,   var f2(x2) = (1/N) α2ᵀ K2² α2
20
Kernel Canonical Correlation
Putting the three expressions together, we get

ρ̂_F(K1, K2) = max_{α1, α2} (α1ᵀ K1 K2 α2) / ( √(α1ᵀ K1² α1) √(α2ᵀ K2² α2) )

As with the problem we discussed before, this is
equivalent to the following generalized eigenvalue
problem:

[   0     K1 K2 ] [α1]        [ K1²    0  ] [α1]
[ K2 K1     0   ] [α2]  =  ρ  [  0    K2² ] [α2]
21
Kernel Canonical Correlation
Problem: suppose that the Gram matrices K1 and K2
have full rank; then the canonical correlation will
always be 1, whatever K1 and K2 are. Let V1 and V2
denote the subspaces of R^N generated by the columns
of K1 and K2; then we can rewrite

ρ̂_F(K1, K2) = max_{v1 ∈ V1, v2 ∈ V2} (v1ᵀ v2) / ( ‖v1‖ ‖v2‖ )

If K1 and K2 have full rank, V1 and V2 are both equal
to R^N, so the maximum is 1.
22
Kernel Canonical Correlation
Solution: regularize by penalizing the RKHS norms of
f1 and f2, giving the regularized F-correlation

ρ_F^κ = max_{f1, f2} cov(f1(x1), f2(x2)) / ( √(var f1(x1) + κ‖f1‖_F²) √(var f2(x2) + κ‖f2‖_F²) )

where κ is a small positive constant. We expand each
regularized variance (dropping the κ² term) as

var f1(x1) + κ‖f1‖_F² ≈ (1/N) α1ᵀ (K1 + (Nκ/2) I)² α1
23
Kernel Canonical Correlation
Now we can get the regularized kernel canonical
correlation (KCC):

ρ̂_F^κ(K1, K2) = max_{α1, α2} (α1ᵀ K1 K2 α2) / ( √(α1ᵀ (K1 + (Nκ/2) I)² α1) √(α2ᵀ (K2 + (Nκ/2) I)² α2) )

A runnable sketch of the whole computation follows
below.
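Putting the pipeline together, here is a minimal sketch of the regularized KCC (assumptions: Gaussian kernels, the centering and regularization above; kcc, gram_gaussian, and the parameter values are illustrative, not the papers' reference code):

```python
import numpy as np
from scipy.linalg import eigh

def gram_gaussian(x, sigma=1.0):
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def center(K):
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return H @ K @ H

def kcc(x1, x2, sigma=1.0, kappa=2e-2):
    """First regularized kernel canonical correlation of two 1-D samples."""
    N = len(x1)
    K1 = center(gram_gaussian(x1, sigma))
    K2 = center(gram_gaussian(x2, sigma))
    R1 = K1 + (N * kappa / 2) * np.eye(N)   # regularized block (K + (N*kappa/2) I)
    R2 = K2 + (N * kappa / 2) * np.eye(N)
    A = np.block([[np.zeros((N, N)), K1 @ K2],
                  [K2 @ K1, np.zeros((N, N))]])
    B = np.block([[R1 @ R1, np.zeros((N, N))],
                  [np.zeros((N, N)), R2 @ R2]])
    return eigh(A, B, eigvals_only=True)[-1]  # largest generalized eigenvalue

rng = np.random.default_rng(0)
s = rng.normal(size=300)
print(kcc(s, rng.normal(size=300)))  # independent samples: small
print(kcc(s, s**2))                  # uncorrelated but dependent: noticeably larger
```

The κ term is what keeps the estimate informative: without it the estimate degenerates toward 1 for full-rank Gram matrices, exactly the problem described on slide 21.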
24
Kernel Canonical Correlation
Generalizing to more than two sets of variables, this
is equivalent to the generalized eigenvalue problem
K̃_κ α = λ D_κ α, where K̃_κ has off-diagonal blocks
K_i K_j and diagonal blocks (K_i + (Nκ/2) I)², and
D_κ is block diagonal with blocks (K_i + (Nκ/2) I)².
25
Example Application
Applications:
  -- ICA (Independent Component Analysis)
  -- Feature Selection
See the demo for the application to ICA.
26
Thank you!
  • Questions?