1
A Transductive Framework of Distance Metric
Learning by Spectral Dimensionality Reduction
  • Fuxin Li [1], Jian Yang [2] and Jue Wang [1]
  • [1] Institute of Automation, Chinese Academy of
    Sciences
  • [2] Beijing University of Technology

2
Metric Learning: What does it do?
  • Given an a priori metric, metric learning tries
    to adapt it to the information coming from the
    training items.
  • For example, items from the same class can be
    pushed together, while items from different
    classes can be pulled apart.

3
What is it good for?
  • Many pattern recognition and machine learning
    methods depend on a good choice of metric, which
    may not always be available.
  • As a compromise, the Euclidean metric is often
    used as the default.

Questionable!
  • Metric learning can be used to make things
    better.

4
Endless Learning Cycle
  • Moreover, the learned metric can be used again
    as the a priori metric the next time around.

5
How to learn?
  • The basic idea in many metric learning
    algorithms is to balance the information from
    the training data against the a priori metric.
  • Meanwhile, many methods seek to learn a
    Mahalanobis metric in the original space or in
    the feature space induced by kernels.
  • (Xing, Ng., Jordan, Russell 2003)
  • (Kwok and Tsang 2003)
  • (Weinberger, Blitzer and Saul 2006)

6
Wait a minute
  • We can get a Mahalanobis metric by linearly
    transforming the input data.
  • Basically, when we say we learn a Mahalanobis
    metric, we are looking for good directions in
    the input space and projecting on them.
  • This reminds us of another field of research:
    dimensionality reduction.

7
Dimensionality Reduction
  • Linear or nonlinear.
  • Try to find a low-dimensional representation of
    the original data.
  • Nonlinear dimensionality reduction quite popular
    in recent years.
  • Mainly unsupervised.

8
And Metric Learning?
  • Dimensionality reduction learns metrics.
  • Is metric learning a supervised or
    semi-supervised version of dimensionality
    reduction?
  • Under certain assumptions and formulations, yes.
  • A combined view of these two topics can give
    something interesting.

9
A Metric Learning Formulation
  • c_ijk is positive if d_ij should be larger than
    d_ik, where x_j and x_k are neighbors of x_i.
  • Intuitively, minimizing the first part of the
    criterion moves labeled items with the same
    label together and items with different labels
    apart (a sketch of one plausible form is given
    below).
  • The second part ensures that the a priori
    structure is preserved.
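A minimal sketch of one plausible form of the
criterion, using the slide's symbols c_ijk and
d_ij; the exact loss and the trade-off weight
lambda are assumptions, not taken from the paper:

  % Hypothetical form: minimizing the first sum makes d_ij large
  % relative to d_ik whenever c_ijk > 0; the second term, built
  % from the penalty P, keeps the a priori structure.
  \min_{d} \; \sum_{i,j,k} c_{ijk}\,\bigl(d_{ik}^{2} - d_{ij}^{2}\bigr)
  \;+\; \lambda\, \Omega_{P}(d)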

10
Graph Transduction
  • A possible way to specify the second term is to
    link the items as a graph and use the graph
    Laplacian as the penalty P (see the identity
    below).
  • Intuitively, when dragging an item, the items
    linked to it will follow.
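As background, a standard fact about graph
Laplacians (not specific to these slides): with
edge weights w_ij and Laplacian L = D - W, the
penalty measures how far linked items are pulled
apart,

  \operatorname{tr}\!\bigl(X^{\top} L X\bigr)
  \;=\; \tfrac{1}{2} \sum_{i,j} w_{ij}\,\lVert x_i - x_j \rVert^{2}.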

11
The Euclidean Assumption
  • Further assume that the metric is Euclidean.
  • This means there exists a Euclidean space in
    which we can place all the input items as
    points, and our metric is just the Euclidean
    metric of that space (a classical
    characterization is given below).
  • This assumption was first used in metric
    learning in (Zhang 2003), but they did not make
    a connection with spectral methods.
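For reference, the classical characterization from
multidimensional scaling (Schoenberg's condition,
stated here as background): a finite metric is
Euclidean exactly when the doubly centered matrix
of squared distances is positive semidefinite,

  -\tfrac{1}{2}\, J D^{(2)} J \;\succeq\; 0,
  \qquad J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top},
  \quad D^{(2)}_{ij} = d_{ij}^{2}.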

12
And Kernels
  • To be Euclidean implies that there exists an
    inner product in the space.
  • The distance is completely decided by the inner
    product, so it suffices to learn the inner
    product only.
  • For a finite sample, this means that it suffices
    to learn the Gram matrix. Thus the metric
    learning problem in this case is the same as the
    kernel learning problem (see the identity below).

13
Learning a Kernel
  • The following semi-supervised optimization
    problem learns a kernel matrix G of order n + m
    (a sketch of its form is given below).
  • C_i and P are penalty matrices from the training
    data and the a priori structure, respectively.
    With suitable choices of these matrices, this
    problem is the same as the previous one.

with G = XX^T. The rows of X give the coordinates
of each training item in a low-dimensional
Euclidean space.
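Since the optimization problem itself is not
reproduced here, the following is only a sketch of
a problem of this shape, assuming a linear
trade-off with weight lambda between the two
penalty terms:

  \min_{G \,=\, X X^{\top}} \;\;
  \sum_{i} \operatorname{tr}\!\bigl(C_{i}\, G\bigr)
  \;+\; \lambda\, \operatorname{tr}\!\bigl(P\, G\bigr)

With some scale constraint on G (for example,
fixing its trace), minimizing such a trace
objective over G = XX^T reduces to an eigenvalue
problem, consistent with the remark on the next
slide; the exact constraint is an assumption here.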
14
[Diagram: spectral dimensionality reduction methods
(kernel PCA, kernel MDS, Laplacian Eigenmaps and
other spectral methods; cf. Bengio et al., 2004)
plus (semi-)supervision through the penalty
matrices C_i and P yield metric learning under the
Euclidean assumption.]

With the particular loss function, it remains an
eigenvalue problem in the semi-supervised case. If
we use a sparse graph Laplacian penalty, the
algorithm has a time complexity of O((n + m)^2).
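The algorithm itself is not reproduced in the
transcript. The sketch below is a generic
illustration of this kind of computation, assuming
the problem reduces to the bottom eigenvectors of a
sparse k-nearest-neighbour graph Laplacian plus a
supervised term; the function name embed and the
construction of the supervised penalty C are
hypothetical, not the authors' code.

  # Generic sketch: bottom eigenvectors of L + lam * C, where L is a
  # sparse kNN graph Laplacian (a priori penalty) and C is a
  # hypothetical supervised attract/repel penalty.
  import numpy as np
  from scipy.sparse import csgraph
  from sklearn.neighbors import kneighbors_graph

  def embed(X, y, labeled_idx, dim=2, k=10, lam=1.0):
      n = X.shape[0]
      W = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
      W = 0.5 * (W + W.T)                      # symmetrize the kNN graph
      L = csgraph.laplacian(W, normed=False)   # sparse graph Laplacian

      # Attract same-label pairs, repel different-label pairs (assumed form).
      C = np.zeros((n, n))
      for i in labeled_idx:
          for j in labeled_idx:
              if i != j:
                  C[i, j] = -1.0 if y[i] == y[j] else 1.0
      np.fill_diagonal(C, -C.sum(axis=1))      # zero row sums, Laplacian-like

      M = L.toarray() + lam * C                # combined symmetric penalty
      vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
      # Drop the trivial constant eigenvector (row sums of M are zero).
      keep = [i for i in range(len(vals)) if np.std(vecs[:, i]) > 1e-8]
      return vecs[:, keep[:dim]]               # rows = learned coordinates

For large n + m one would swap the dense eigh call
for a sparse eigensolver such as
scipy.sparse.linalg.eigsh.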
15
More to give: RKHS regularization
  • In the one-dimensional case, it is proved to be
    equivalent to an RKHS regularization problem.
  • Interesting because, unlike common
    regularization problems, it penalizes pairwise
    differences: a loss on ( f(x_i), f(x_j) )
    instead of on ( f(x_i), y_i ) (a generic form is
    sketched below).
  • It also gives a natural out-of-sample extension.
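A generic form of such a pairwise-penalizing
problem (the loss V, the weights w_ij, and lambda
are placeholders; this is a sketch of the type of
problem described, not the paper's exact
functional):

  \min_{f \in \mathcal{H}} \;\;
  \sum_{i,j} w_{ij}\, V\!\bigl(f(x_i),\, f(x_j)\bigr)
  \;+\; \lambda\, \lVert f \rVert_{\mathcal{H}}^{2}

versus the common supervised form
\sum_{i} V(f(x_i), y_i) + \lambda \lVert f \rVert_{\mathcal{H}}^{2}.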

16
Moving y to the weights
  • The information in y is moved into the weights.
  • Setting w_ij = t( d(x_i, x_j), d(y_i, y_j) ), it
    is possible to solve problems whose outputs y
    live in any metric space.
  • Example: multi-class classification (one
    hypothetical choice is sketched below).
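One hypothetical choice for multi-class
classification uses the discrete metric on labels,
so that same-class pairs get positive weights and
different-class pairs negative ones (the bandwidth
sigma and the form of t are assumptions, not the
paper's choice):

  d(y_i, y_j) = \begin{cases} 0, & y_i = y_j \\ 1, & y_i \neq y_j \end{cases}
  \qquad
  w_{ij} = e^{-d(x_i, x_j)^{2}/\sigma^{2}}\,\bigl(1 - 2\,d(y_i, y_j)\bigr)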

17
The parameter λ
  • λ controls the strength of our prior belief.
  • Intuitively, an experienced learner should have
    strong prior beliefs that are not easily changed
    by what he sees, while a newcomer might have
    weaker beliefs that change rapidly with new
    observations.
  • However, in semi-supervised settings it is
    difficult to decide a good value of λ. In our
    experiments it is chosen by grid search (a
    sketch of such a search follows below).
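A minimal sketch of such a grid search, assuming
the hypothetical embed(...) from the earlier sketch
and scoring each candidate lambda by
nearest-neighbour accuracy on held-out labeled
points; the cross-validation protocol here is an
illustration, not the paper's:

  # Hypothetical grid search over the trade-off parameter lambda.
  import numpy as np
  from sklearn.model_selection import StratifiedKFold
  from sklearn.neighbors import KNeighborsClassifier

  def grid_search_lambda(X, y, labeled_idx, lambdas=(0.01, 0.1, 1.0, 10.0)):
      labeled_idx = np.asarray(labeled_idx)
      cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
      best_lam, best_acc = None, -np.inf
      for lam in lambdas:
          accs = []
          for tr, te in cv.split(labeled_idx, y[labeled_idx]):
              # Re-embed using only the training fold's labels to avoid leakage.
              Z = embed(X, y, labeled_idx[tr], lam=lam)
              clf = KNeighborsClassifier(n_neighbors=1)
              clf.fit(Z[labeled_idx[tr]], y[labeled_idx[tr]])
              accs.append(clf.score(Z[labeled_idx[te]], y[labeled_idx[te]]))
          if np.mean(accs) > best_acc:
              best_lam, best_acc = lam, float(np.mean(accs))
      return best_lam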

18
Experiments: Two Moons
19
Experiments: UCI Data
20
Experiments MNIST
  • In the MNIST experiments we tried the Tangent
    Distance (Simard et al. 1998) to see whether
    better a priori knowledge leads to better
    results.
  • The results show that the Tangent Distance is
    much better than the Euclidean distance, while
    our algorithm gives only marginal improvements
    over Laplacian Eigenmaps in this case.

21
Conclusion
  • The work is preliminary, but a framework for
    distance metric learning is useful.
  • Under the Euclidean assumption, distance metric
    learning can be done by adding label information
    to spectral dimensionality reduction methods.
  • It also gives a regularization problem that is
    different from others and quite interesting.

22
Ongoing Work
  • We are currently experimenting with loss
    functions other than the current one.
  • Particularly interesting loss functions include
    the hinge loss and the exponential loss.
  • However, optimizing these loss functions
    requires semidefinite programming or general
    convex optimization, and the scalability is not
    as good as that of the current algorithm. We are
    working on fast solvers for these optimization
    problems (an illustrative SDP sketch follows
    below).
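To illustrate why a hinge loss leads to
semidefinite programming, here is a minimal sketch
using cvxpy; the triplet constraints, the margin of
1, and the weight lam are assumptions for
illustration, not the paper's formulation:

  # Hypothetical SDP sketch: learn a Gram matrix G so that, for each
  # triplet (i, j, k), item i ends up closer to j than to k, with a
  # hinge penalty on violations and a trace penalty toward the a
  # priori structure P.
  import cvxpy as cp

  def learn_gram_sdp(n, triplets, P, lam=1.0):
      G = cp.Variable((n, n), PSD=True)        # positive semidefinite Gram matrix

      def sqdist(i, j):
          # Squared distance induced by G: G_ii - 2 G_ij + G_jj.
          return G[i, i] - 2 * G[i, j] + G[j, j]

      hinge = [cp.pos(1 + sqdist(i, j) - sqdist(i, k)) for (i, j, k) in triplets]
      objective = cp.Minimize(sum(hinge) + lam * cp.trace(P @ G))
      problem = cp.Problem(objective)
      problem.solve()                          # SDP-capable solver (e.g. SCS)
      return G.value

The PSD constraint on G is what makes this a
semidefinite program, and it is also the part that
limits scalability.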

23
Beyond Euclidean
  • What happens when the Euclidean assumption is
    not satisfied?
  • A particularly interesting scenario is when the
    Euclidean assumption only holds locally.
  • Then the dataset is locally homeomorphic to a
    subset of a Euclidean space, which means it lies
    on a topological manifold.
  • Unlike the locally linear techniques in current
    manifold learning, it is possible to design
    locally Euclidean techniques grounded in the
    current framework.

24
Thanks!