Title: A Transductive Framework of Distance Metric Learning by Spectral Dimensionality Reduction
1 A Transductive Framework of Distance Metric Learning by Spectral Dimensionality Reduction
- Fuxin Li (1), Jian Yang (2) and Jue Wang (1)
- (1) Institute of Automation, Chinese Academy of Sciences
- (2) Beijing University of Technology
2 Metric Learning: What does it do?
- Given an a priori metric, metric learning tries to adapt it to the information coming from the training items.
- For example, items from the same class can be pushed together, while items from different classes can be pulled apart.
3 What's good?
- Many pattern recognition and machine learning methods depend on a good choice of metric, which may not always be available.
- As a compromise, the Euclidean metric is often used as the default one. Questionable!
- Metric learning can be used to make things better.
4 Endless Learning Cycle
- Moreover, the learned metric can be used again as the a priori metric the next time around.
5 How to learn?
- The basic idea in many metric learning algorithms is to balance the information from the training data against the a priori metric.
- Meanwhile, many methods seek to learn a Mahalanobis metric in the original space or in the feature space induced by kernels:
- (Xing, Ng, Jordan and Russell 2003)
- (Kwok and Tsang 2003)
- (Weinberger, Blitzer and Saul 2006)
6 Wait a minute
- We can get a Mahalanobis metric by linearly transforming the input data.
- Basically, when we say we learn a Mahalanobis metric, we are looking for good directions in the input space and projecting onto them (see the one-line check below).
- This reminds us of another field of research: dimensionality reduction.
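A one-line check of the linear-transformation claim (standard linear algebra, not specific to this work): for a Mahalanobis metric with matrix M = A^T A,

\[
d_M(x, y)^2 \;=\; (x - y)^\top M \, (x - y) \;=\; \|A x - A y\|^2,
\]

so learning M amounts to picking the directions given by the rows of A and measuring Euclidean distance after projecting onto them.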
7 Dimensionality Reduction
- Linear or nonlinear.
- Tries to find a low-dimensional representation of the original data.
- Nonlinear dimensionality reduction has been quite popular in recent years.
- Mainly unsupervised.
8 And Metric Learning?
- Dimensionality reduction learns metrics.
- Is metric learning a supervised or semi-supervised version of dimensionality reduction?
- Under certain assumptions and formulations, yes.
- A combined view of these two topics can give something interesting.
9 A Metric Learning Formulation
- c_ijk is positive if d_ij should be larger than d_ik, where x_j and x_k are neighbors of x_i.
- Intuitively, minimizing the first part of the criterion moves labeled items with the same label together and items with different labels apart.
- The second part ensures the a priori structure is preserved. (One possible form of the criterion is sketched after this slide.)
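A minimal sketch of what such a criterion can look like (the slide does not reproduce the exact formula, so this form and the trade-off parameter λ are assumptions consistent with the notation above):

\[
\min_{d}\;\; \sum_{i,j,k} c_{ijk}\,\big(d_{ik} - d_{ij}\big) \;+\; \lambda \sum_{i,j} w_{ij}\, d_{ij}^2,
\]

where d_ij = d(x_i, x_j): whenever c_ijk > 0 the first term rewards making d_ij larger than d_ik, and the second term, built from a priori similarity weights w_ij (from which the penalty P is formed), keeps a priori-similar items close.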
10 Graph Transduction
- A possible way to specify the second term is to link the items into a graph and use the graph Laplacian as the penalty P.
- Intuitively, when an item is dragged, the items linked to it will follow. (A small construction sketch follows.)
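A minimal sketch (assumed, not the paper's code) of building such a graph penalty: a k-nearest-neighbor graph with heat-kernel weights and its unnormalized Laplacian L = D - W.

import numpy as np
from sklearn.neighbors import kneighbors_graph

def graph_laplacian(X, k=5, sigma=1.0):
    # Distances to the k nearest neighbors of each row of X.
    W = kneighbors_graph(X, n_neighbors=k, mode='distance').toarray()
    W = np.exp(-W**2 / (2 * sigma**2)) * (W > 0)   # heat-kernel weights on existing edges
    W = np.maximum(W, W.T)                         # symmetrize the graph
    D = np.diag(W.sum(axis=1))                     # degree matrix
    return D - W                                   # penalty P = unnormalized Laplacian

Using this L as P realizes the "dragging" intuition: the quadratic penalty x^T L x sums w_ij (x_i - x_j)^2 over linked pairs, so moving one item pulls its neighbors along.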
11 The Euclidean Assumption
- Further assume that the metric is Euclidean.
- This means that there exists a Euclidean space in which we can place all the input items as points, and our metric is just the Euclidean metric in that space.
- This assumption was first used in metric learning in (Zhang 2003), but they did not make a connection with spectral methods.
12 And Kernels
- Being Euclidean implies that there exists an inner product on the space.
- The distance is completely determined by the inner product, so it suffices to learn the inner product only.
- For a finite sample, this means that it suffices to learn the Gram matrix. Thus the metric learning problem in this case is the same as the kernel learning problem. (See the identity below.)
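The step from inner products to distances is the standard identity: with Gram matrix G, G_ij = <x_i, x_j>,

\[
d^2(x_i, x_j) \;=\; \|x_i - x_j\|^2 \;=\; G_{ii} + G_{jj} - 2\,G_{ij},
\]

so learning G fixes every pairwise distance on the sample.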
13 Learning a Kernel
- The following semi-supervised optimization problem learns a kernel matrix G of order n+m, with G = XX^T. The rows of X give the coordinates of each training item in a low-dimensional Euclidean space.
- C_i and P are penalty matrices from the training data and the a priori structure, respectively. With suitable choices of these matrices, this problem is the same as the previous one. (A sketch of recovering X from G follows.)
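A minimal sketch of the embedding step (an assumption about implementation, in the spirit of kernel PCA / MDS rather than the authors' exact code): given a learned positive semidefinite G, coordinates X with G ≈ XX^T come from its top eigenpairs.

import numpy as np

def coords_from_gram(G, dim=2):
    # G: learned (n+m) x (n+m) symmetric PSD Gram matrix.
    vals, vecs = np.linalg.eigh(G)                 # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]             # indices of the 'dim' largest
    vals = np.clip(vals[top], 0.0, None)           # guard against tiny negative values
    return vecs[:, top] * np.sqrt(vals)            # rows: low-dimensional coordinates

Each row of the returned matrix is the Euclidean embedding of one (labeled or unlabeled) item, and G ≈ XX^T up to the discarded eigenvalues.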
14 Dimensionality Reduction
[Diagram: Kernel PCA / Kernel MDS / Laplacian Eigenmaps / other spectral methods (Bengio et al., 2004), combined with (semi-)supervision through the penalty matrices C_i and P, give metric learning under the Euclidean assumption.]
- With the particular loss function used, the problem remains an eigenvalue problem in the semi-supervised case.
- If we use a sparse graph Laplacian penalty, the algorithm has a time complexity of O((n+m)^2). (A sketch of the corresponding sparse eigensolver call follows.)
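A minimal sketch of the kind of sparse eigensolver call the complexity claim refers to (illustrative, not the authors' implementation): with a sparse Laplacian-style penalty, only a few extreme eigenpairs are needed, as in Laplacian Eigenmaps.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh

def embedding_from_sparse_penalty(M, dim=2):
    # M: sparse symmetric matrix combining the graph Laplacian and label penalties.
    # The eigenvectors with the smallest eigenvalues give the embedding coordinates.
    vals, vecs = eigsh(csr_matrix(M), k=dim + 1, which='SM')
    return vecs[:, 1:dim + 1]                      # drop the (near-)constant eigenvector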
15 More to give: RKHS regularization
- In the one-dimensional case, the problem is proved to be equivalent to an RKHS regularization problem.
- This is interesting because, unlike common regularization problems, it penalizes pairwise differences: the loss is a function of (f(x_i), f(x_j)) instead of (f(x_i), y_i). (A sketch of the general shape is below.)
- It also gives a natural out-of-sample extension.
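A sketch of the general shape of such a pairwise-loss problem (the exact functional is not reproduced on the slide, so this form is an assumption):

\[
\min_{f \in \mathcal{H}} \;\; \sum_{i,j} w_{ij}\, \ell\big(f(x_i), f(x_j)\big) \;+\; \lambda\, \|f\|_{\mathcal{H}}^2
\]

versus the usual \(\sum_i \ell(f(x_i), y_i) + \lambda \|f\|_{\mathcal{H}}^2\). Since the loss still depends on f only through its values at the sample points, the representer theorem applies, and the resulting expansion over the samples is what provides the out-of-sample extension.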
16 Moving y to the weights
- The information in y is moved into the weights.
- Setting w_ij = t(d(x_i, x_j), d(y_i, y_j)) makes it possible to solve problems whose output y lives in any metric space. (An illustrative choice of t is sketched below.)
- Example: multi-class classification.
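An illustrative choice of the weight function t for multi-class classification (the function name and the Gaussian form are assumptions, not taken from the slides); here d(y_i, y_j) is 0 for same-label pairs and positive otherwise.

import numpy as np

def pair_weight(dx, dy, sigma=1.0):
    # dx: input-space distance d(x_i, x_j); dy: output-space distance d(y_i, y_j).
    # Same-label pairs get a positive (attracting) weight that decays with input
    # distance; different-label pairs get a negative (repelling) weight.
    w = np.exp(-dx**2 / (2 * sigma**2))
    return w if dy == 0 else -w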
17 The parameter λ
- λ controls the strength of our prior belief.
- Intuitively, someone learned should have strong prior beliefs that are not easily changed by what they see, while a novice might have weaker beliefs that change rapidly with observations.
- However, in semi-supervised settings it is difficult to decide a good value of λ. In our experiments it is chosen by grid search. (A sketch of such a search loop follows.)
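A minimal sketch of the grid search meant here (the candidate grid and the scoring callable are illustrative assumptions):

def grid_search_lambda(score_fn, lambdas=(1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0)):
    # score_fn(lam) -> validation score, e.g. k-NN accuracy in the embedding
    # learned with regularization strength lam.
    scores = {lam: score_fn(lam) for lam in lambdas}
    return max(scores, key=scores.get)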
18 Experiments: Two Moons
19 Experiments: UCI Data
20 Experiments: MNIST
- In the MNIST experiments we tried Tangent Distance (Simard et al. 1998) to see whether better a priori knowledge leads to better results.
- The results show that Tangent Distance is much better than Euclidean distance, while our algorithm gives only marginal improvements over Laplacian Eigenmaps in this case.
21 Conclusion
- The work is preliminary, but a framework for distance metric learning is useful.
- Under the Euclidean assumption, the distance metric learning problem can be solved by adding label information to spectral dimensionality reduction methods.
- It also yields a regularization problem that is different from the usual ones and quite interesting.
22 Ongoing Work
- We are currently experimenting with loss functions other than the current one.
- Some particularly interesting loss functions include the hinge loss and the exponential loss.
- However, optimizing these loss functions requires semidefinite programming or other convex optimization, and the scalability is not as good as that of the current algorithm. We are working on fast solvers for these optimization problems.
23 Beyond Euclidean
- What happens when the Euclidean assumption is not satisfied?
- A particularly interesting scenario is when the Euclidean assumption only holds locally.
- Then the dataset is locally homeomorphic to a subset of a Euclidean space, which means it lies on a topological manifold.
- Unlike the locally linear techniques in current manifold learning, it is possible to design locally Euclidean techniques grounded in the current framework.
24 Thanks!