Title: A Transductive Framework of Distance Metric Learning by Spectral Dimensionality Reduction
1 A Transductive Framework of Distance Metric Learning by Spectral Dimensionality Reduction
- Fuxin Li (1), Jian Yang (2) and Jue Wang (1)
- (1) Institute of Automation, Chinese Academy of Sciences
- (2) Beijing University of Technology
2 Metric Learning: What does it do?
- Given an a priori metric, metric learning tries to adapt it to the information coming from the training items.
- For example, items from the same class can be pushed together, while items from different classes can be pulled apart.
3 What's good?
- Many pattern recognition and machine learning methods depend on a good choice of metric, which may not always be available.
- As a compromise, the Euclidean metric is often used as the default one. Questionable!
- Metric learning can be used to make things better.
4 Endless Learning Cycle
- Moreover, the learned metric can be used again as the a priori metric the next time around.
5 How to learn?
- The basic idea in many metric learning algorithms is to balance the information from the training data against the a priori metric.
- Meanwhile, many methods seek to learn a Mahalanobis metric in the original space or in the feature space induced by kernels:
- (Xing, Ng, Jordan and Russell 2003)
- (Kwok and Tsang 2003)
- (Weinberger, Blitzer and Saul 2006)
6 Wait a minute
- We can get a Mahalanobis metric by linearly transforming the input data.
- Basically, when we say we learn a Mahalanobis metric, we are looking for good directions in the input space and projecting onto them (see the one-line check below).
- This reminds us of another field of research: dimensionality reduction.
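A one-line check of the linear-transformation claim (standard linear algebra, not specific to this work): for a Mahalanobis metric with matrix M = A^T A,

\[
d_M(x, y)^2 \;=\; (x - y)^\top M \, (x - y) \;=\; \|A x - A y\|^2,
\]

so learning M amounts to picking the directions given by the rows of A and measuring Euclidean distance after projecting onto them.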
7 Dimensionality Reduction
- Linear or nonlinear.
- Tries to find a low-dimensional representation of the original data.
- Nonlinear dimensionality reduction has been quite popular in recent years.
- Mainly unsupervised.
8 And Metric Learning?
- Dimensionality reduction learns metrics.
- Is metric learning a supervised or semi-supervised version of dimensionality reduction?
- Under certain assumptions and formulations, yes.
- A combined view of these two topics can give something interesting.
9 A Metric Learning Formulation
- c_ijk is positive if d_ij should be larger than d_ik, where x_j and x_k are neighbors of x_i.
- Intuitively, minimizing the first part of the criterion moves labeled items with the same label together and items with different labels apart.
- The second part ensures the a priori structure is preserved. (One possible form of the criterion is sketched after this slide.)
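A minimal sketch of what such a criterion can look like (the slide does not reproduce the exact formula, so this form and the trade-off parameter λ are assumptions consistent with the notation above):

\[
\min_{d}\;\; \sum_{i,j,k} c_{ijk}\,\big(d_{ik} - d_{ij}\big) \;+\; \lambda \sum_{i,j} w_{ij}\, d_{ij}^2,
\]

where d_ij = d(x_i, x_j): whenever c_ijk > 0 the first term rewards making d_ij larger than d_ik, and the second term, built from a priori similarity weights w_ij (from which the penalty P is formed), keeps a priori-similar items close.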
10 Graph Transduction
- A possible way to specify the second term is to link the items into a graph and use the graph Laplacian as the penalty P.
- Intuitively, when an item is dragged, the items linked to it will follow. (A small construction sketch follows.)
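A minimal sketch (assumed, not the paper's code) of building such a graph penalty: a k-nearest-neighbor graph with heat-kernel weights and its unnormalized Laplacian L = D - W.

import numpy as np
from sklearn.neighbors import kneighbors_graph

def graph_laplacian(X, k=5, sigma=1.0):
    # Distances to the k nearest neighbors of each row of X.
    W = kneighbors_graph(X, n_neighbors=k, mode='distance').toarray()
    W = np.exp(-W**2 / (2 * sigma**2)) * (W > 0)   # heat-kernel weights on existing edges
    W = np.maximum(W, W.T)                         # symmetrize the graph
    D = np.diag(W.sum(axis=1))                     # degree matrix
    return D - W                                   # penalty P = unnormalized Laplacian

Using this L as P realizes the "dragging" intuition: the quadratic penalty x^T L x sums w_ij (x_i - x_j)^2 over linked pairs, so moving one item pulls its neighbors along.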
11 The Euclidean Assumption
- Further assume that the metric is Euclidean.
- This means that there exists a Euclidean space in which we can place all the input items as points, and our metric is just the Euclidean metric in that space.
- This assumption was first used in metric learning in (Zhang 2003), but they did not make a connection with spectral methods.
12 And Kernels
- Being Euclidean implies that there exists an inner product on the space.
- The distance is completely determined by the inner product, so it suffices to learn the inner product only.
- For a finite sample, this means that it suffices to learn the Gram matrix. Thus the metric learning problem in this case is the same as the kernel learning problem. (See the identity below.)
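The step from inner products to distances is the standard identity: with Gram matrix G, G_ij = <x_i, x_j>,

\[
d^2(x_i, x_j) \;=\; \|x_i - x_j\|^2 \;=\; G_{ii} + G_{jj} - 2\,G_{ij},
\]

so learning G fixes every pairwise distance on the sample.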
13 Learning a Kernel
- The following semi-supervised optimization problem learns a kernel matrix G of order n+m, with G = XX^T. The rows of X give the coordinates of each training item in a low-dimensional Euclidean space.
- C_i and P are penalty matrices from the training data and the a priori structure, respectively. With suitable choices of these matrices, this problem is the same as the previous one. (A sketch of recovering X from G follows.)
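A minimal sketch of the embedding step (an assumption about implementation, in the spirit of kernel PCA / MDS rather than the authors' exact code): given a learned positive semidefinite G, coordinates X with G ≈ XX^T come from its top eigenpairs.

import numpy as np

def coords_from_gram(G, dim=2):
    # G: learned (n+m) x (n+m) symmetric PSD Gram matrix.
    vals, vecs = np.linalg.eigh(G)                 # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]             # indices of the 'dim' largest
    vals = np.clip(vals[top], 0.0, None)           # guard against tiny negative values
    return vecs[:, top] * np.sqrt(vals)            # rows: low-dimensional coordinates

Each row of the returned matrix is the Euclidean embedding of one (labeled or unlabeled) item, and G ≈ XX^T up to the discarded eigenvalues.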
14 Dimensionality Reduction
[Diagram: Kernel PCA / Kernel MDS / Laplacian Eigenmaps / other spectral methods (Bengio et al., 2004), combined with (semi-)supervision through the penalty matrices C_i and P, give metric learning under the Euclidean assumption.]
- With the particular loss function used, the problem remains an eigenvalue problem in the semi-supervised case.
- If we use a sparse graph Laplacian penalty, the algorithm has a time complexity of O((n+m)^2). (A sketch of the corresponding sparse eigensolver call follows.)
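A minimal sketch of the kind of sparse eigensolver call the complexity claim refers to (illustrative, not the authors' implementation): with a sparse Laplacian-style penalty, only a few extreme eigenpairs are needed, as in Laplacian Eigenmaps.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh

def embedding_from_sparse_penalty(M, dim=2):
    # M: sparse symmetric matrix combining the graph Laplacian and label penalties.
    # The eigenvectors with the smallest eigenvalues give the embedding coordinates.
    vals, vecs = eigsh(csr_matrix(M), k=dim + 1, which='SM')
    return vecs[:, 1:dim + 1]                      # drop the (near-)constant eigenvector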
15 More to give: RKHS regularization
- In the one-dimensional case, the problem is proved to be equivalent to an RKHS regularization problem.
- This is interesting because, unlike common regularization problems, it penalizes pairwise differences: the loss is a function of (f(x_i), f(x_j)) instead of (f(x_i), y_i). (A sketch of the general shape is below.)
- It also gives a natural out-of-sample extension.
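A sketch of the general shape of such a pairwise-loss problem (the exact functional is not reproduced on the slide, so this form is an assumption):

\[
\min_{f \in \mathcal{H}} \;\; \sum_{i,j} w_{ij}\, \ell\big(f(x_i), f(x_j)\big) \;+\; \lambda\, \|f\|_{\mathcal{H}}^2
\]

versus the usual \(\sum_i \ell(f(x_i), y_i) + \lambda \|f\|_{\mathcal{H}}^2\). Since the loss still depends on f only through its values at the sample points, the representer theorem applies, and the resulting expansion over the samples is what provides the out-of-sample extension.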
16 Moving y to the weights
- The information in y is moved into the weights.
- Setting w_ij = t(d(x_i, x_j), d(y_i, y_j)) makes it possible to solve problems whose output y lives in any metric space. (An illustrative choice of t is sketched below.)
- Example: multi-class classification.
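An illustrative choice of the weight function t for multi-class classification (the function name and the Gaussian form are assumptions, not taken from the slides); here d(y_i, y_j) is 0 for same-label pairs and positive otherwise.

import numpy as np

def pair_weight(dx, dy, sigma=1.0):
    # dx: input-space distance d(x_i, x_j); dy: output-space distance d(y_i, y_j).
    # Same-label pairs get a positive (attracting) weight that decays with input
    # distance; different-label pairs get a negative (repelling) weight.
    w = np.exp(-dx**2 / (2 * sigma**2))
    return w if dy == 0 else -w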
17 The parameter λ
- λ controls the strength of our prior belief.
- Intuitively, someone learned should have strong prior beliefs that are not easily changed by what they see, while a novice might have weaker beliefs that change rapidly with observations.
- However, in semi-supervised settings it is difficult to decide a good value of λ. In our experiments it is chosen by grid search. (A sketch of such a search loop follows.)
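A minimal sketch of the grid search meant here (the candidate grid and the scoring callable are illustrative assumptions):

def grid_search_lambda(score_fn, lambdas=(1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0)):
    # score_fn(lam) -> validation score, e.g. k-NN accuracy in the embedding
    # learned with regularization strength lam.
    scores = {lam: score_fn(lam) for lam in lambdas}
    return max(scores, key=scores.get)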
18 Experiments: Two Moons
19 Experiments: UCI Data
20 Experiments: MNIST
- In the MNIST experiments we tried Tangent Distance (Simard et al. 1998) to see whether better a priori knowledge leads to better results.
- The results show that Tangent Distance is much better than Euclidean distance, while our algorithm gives only marginal improvements over Laplacian Eigenmaps in this case.
21 Conclusion
- The work is preliminary, but a framework for distance metric learning is useful.
- Under the Euclidean assumption, the distance metric learning problem can be solved by adding label information to spectral dimensionality reduction methods.
- It also yields a regularization problem that is different from the usual ones and quite interesting.
22 Ongoing Work
- We are currently experimenting with loss functions other than the current one.
- Some particularly interesting loss functions include the hinge loss and the exponential loss.
- However, optimizing these loss functions requires semidefinite programming or other convex optimization, and the scalability is not as good as that of the current algorithm. We are working on fast solvers for these optimization problems.
23 Beyond Euclidean
- What happens when the Euclidean assumption is not satisfied?
- A particularly interesting scenario is when the Euclidean assumption only holds locally.
- Then the dataset is locally homeomorphic to a subset of a Euclidean space, which means it lies on a topological manifold.
- Unlike the locally linear techniques in current manifold learning, it is possible to design locally Euclidean techniques grounded in the current framework.
24 Thanks!