Title: Learning Instance Specific Distance Using Metric Propagation
Slide 1: Learning Instance Specific Distance Using Metric Propagation
- De-Chuan Zhan, Ming Li, Yu-Feng Li, Zhi-Hua Zhou
- LAMDA Group
- National Key Lab for Novel Software Technology
- Nanjing University, China
- {zhandc, lim, liyf, zhouzh}@lamda.nju.edu.cn
Slide 2: Distance-based classification
- K-nearest neighbor classification
- SVM with Gaussian kernels
Is the distance reliable?
Are there any more natural measurements?
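Both methods hinge on the distance between instances: kNN votes among the closest training points, and the Gaussian (RBF) kernel used by the SVM is a direct function of the pairwise distance, so an unreliable distance hurts both. As a reminder of the standard kernel form (not taken from the slides):

$k(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$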
Slide 3: Any more natural measurements?
- When the sky is compared to other pictures: color, and probably texture, features
- When Phelps II is compared to other athletes: swimming speed, shape of feet
Can we assign a specific distance measurement to each instance, both labeled and unlabeled?
(This is our work.)
Slide 4: Outline
- Introduction
- Our Methods
- Experiments
- Conclusion
Slide 5: Introduction - Distance Metric Learning
- Many machine learning algorithms rely on a distance metric over the input data patterns:
  - Classification
  - Clustering
  - Retrieval
- Many metric learning algorithms have been developed [Yang, 2006].
Problem: they focus on learning a uniform Mahalanobis distance for ALL instances.
Slide 6: Introduction - Other distance functions
- Instead of applying a uniform distance metric to every example, it is more natural to measure distances according to the specific properties of the data.
- Some researchers define distance from the sample's own perspective:
  - QSim [Zhou and Dai, ICDM'06]; [Athitsos et al., TODS'07]
  - Local distance functions [Frome et al., NIPS'06, ICCV'07]
Slide 7: Introduction - Query-sensitive similarity
Instance-specific or query-specific similarities have in fact been studied in other fields before.
In content-based image retrieval, there has been a study that computes query-sensitive similarities: the similarities among different images are decided only after a query image is received [Zhou and Dai, ICDM'06].
The problem: the query-sensitive similarity is based on pure heuristics.
Slide 8: Introduction - Local distance functions
The distance from the j-th instance to the i-th instance is larger than that from the j-th instance to the k-th: $D_{ji} > D_{jk}$ (similarly, $D_{ij} > D_{kj}$).
1. The learned local distance functions cannot generalize directly.
2. The local distances defined are not directly comparable.
All constraints can be tied together, but this requires more heuristics for testing.
The problem: local distance functions are not available for unlabeled data.
Slide 9: Introduction - Our Work
Can we assign a specific distance measurement to each instance, both labeled and unlabeled?
Yes: we learn an Instance Specific Distance (ISD) via Metric Propagation.
Slide 10: Outline
- Introduction
- Our Methods
- Experiments
- Conclusion
Slide 11: Our Methods - Intuition
- Focus on learning an instance-specific distance for both labeled and unlabeled data.
- For labeled data:
  - a pair of examples from the same class should be closer to each other
- For unlabeled data:
  - metric propagation on a relationship graph
Slide 12: Our Methods - The ISD Framework
- Instead of directly conducting metric propagation after learning the distances for the labeled examples, we formulate the metric propagation within a regularized framework.
- Constraints are imposed when the j-th instance belongs to a class other than that of the i-th instance, or when the j-th instance is a neighbor of the i-th instance; i.e., all cannot-links and some of the must-links are considered.
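To make the shape of such a regularized framework concrete, a schematic objective of this kind could be written as follows (the notation here is illustrative and is not copied from the slides):

$\min_{\{w_i \succeq 0\}} \ \sum_{i,j} a_{ij}\, \|w_i - w_j\|^2 \ + \ C \sum_{i \in \mathrm{labeled}} \ \sum_{(j,k)} \big[\, 1 + w_i^{\top} d_{ij} - w_i^{\top} d_{ik} \,\big]_+$

where $w_i$ parameterizes the distance specific to instance $i$ (the distance from $x_j$ to $x_i$ being $w_i^{\top} d_{ij}$ for a per-dimension difference vector $d_{ij}$), $(j,k)$ ranges over same-class neighbors $x_j$ and differently labeled instances $x_k$ of $x_i$ (the must-links and cannot-links above), $[\cdot]_+ = \max(0, \cdot)$ is the hinge, and $a_{ij}$ are the edge weights of the relationship graph over all labeled and unlabeled instances, so the first term propagates the learned metrics along the graph.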
Slide 13: Our Methods - The ISD Framework: relationship to FSM
Although only pairwise side information is investigated in our work, the ISD framework is a general one:
FSM [Frome et al., NIPS'06] is a special case of ISD.
Slide 14: Our Methods - The ISD Framework: update graph
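The relationship graph used for propagation connects labeled and unlabeled instances. As a rough illustration only, one plausible construction is a symmetric kNN affinity graph with Gaussian edge weights (the construction actually used in ISD may differ):

import numpy as np

def knn_affinity_graph(X, k=5, sigma=1.0):
    """Build a symmetric kNN affinity matrix over all (labeled and unlabeled) instances."""
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, np.inf)            # exclude self-loops

    n = X.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[:k]        # the k nearest neighbors of instance i
        A[i, nbrs] = np.exp(-D[i, nbrs] / (2.0 * sigma ** 2))   # Gaussian edge weights
    return np.maximum(A, A.T)              # symmetrize: keep an edge if either endpoint selects it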
Slide 15: Our Methods - ISD with L1-loss
This is a convex problem, so we employ an alternating descent method to solve it: sequentially solve for the w of one instance at a time while fixing the other w's, until convergence or until the maximum number of iterations is reached.
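A minimal sketch of that alternating-descent loop, with the per-instance subproblem left as a placeholder (solve_single_w is an assumed callback, not part of the slides; it would solve for one w_i, e.g. via an off-the-shelf LP/QP solver, with the other rows of W held fixed):

import numpy as np

def isd_alternating_descent(solve_single_w, n_instances, dim, max_iters=50, tol=1e-4):
    # Start every instance from the Euclidean metric (uniform weights over dimensions).
    W = np.ones((n_instances, dim))
    for _ in range(max_iters):
        max_change = 0.0
        for i in range(n_instances):
            # Solve the subproblem for w_i with all other rows of W fixed.
            w_new = solve_single_w(i, W)
            max_change = max(max_change, float(np.max(np.abs(w_new - W[i]))))
            W[i] = w_new
        if max_change < tol:               # stop once no w_i moves appreciably
            break
    return W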
Slide 16: Our Methods - ISD with L1-loss (cont.)
The primal formulation
Slide 17: Our Methods - Acceleration: ISD with L2-loss
However, the number of inequality constraints may be large.
For acceleration:
- the alternating descent method is used to solve the problem
- the number of constraints is reduced by considering only some of the must-links
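For reference (this is the standard distinction between the two losses, not a statement taken from the slides): the L1 version penalizes each constraint violation $z$ with the hinge $[z]_+ = \max(0, z)$, whereas the L2 version uses the squared hinge $[z]_+^2$, which is differentiable and typically makes the per-instance subproblems smoother to optimize.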
Slide 18: Our Methods - Acceleration: ISD with L2-loss (cont.)
Slide 19: Outline
- Introduction
- Our Methods
- Experiments
- Conclusion
Slide 20: Experiments - Configurations
- Data sets
- 15 UCI data sets
- COREL image dataset (20 classes, 100 images/class)
- 2/3 labeled for training, 1/3 unlabeled for testing; 30 runs
- Compared methods
- ISD-L1/L2
- FSM/FSSM (Frome et al., 2006, 2007)
- LMNN (Weinberger et al. 2005)
- DNE (Zhang et al., 2007)
- Parameters are selected via cross validation
Slide 21: Experiments - Classification Performance
Comparison of test error rates (mean ± std.)
Slide 22: Experiments - Influence of the number of iteration rounds
Updating rounds, starting from the Euclidean distance:
- The error rates of ISD-L1 are reduced on most datasets as the number of updates increases.
- The error rates of ISD-L2 are reduced on some datasets; on others, however, the performance degenerates. Overfitting: the L2-loss is more sensitive to noise.
Slide 23: Experiments - Influence of the amount of labeled data
- ISD is less sensitive to the amount of labeled data.
- When the amount of labeled samples is limited, the superiority of ISD is more apparent.
Slide 24: Conclusion
- Main contribution
  - A method for learning instance-specific distances for labeled as well as unlabeled instances.
- Future work
  - The construction of the initial graph
  - Label propagation, metric propagation: are there any more properties to propagate?
Thanks!