Title: Discriminant Adaptive Nearest Neighbor Classification
1. Discriminant Adaptive Nearest Neighbor Classification
Distance metric learning, with application to clustering with side-information
2. k-NN classification
- Given n training pairs (x_i, y_i), with x_i the feature vector and y_i denoting class membership.
- Given a new point x0, predict its class y0.
- Find the K training points x_(i) closest in distance to x0.
- Classify x0 by a majority vote among these K neighbors (sketch below).
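To make the voting rule concrete, here is a minimal k-NN sketch (illustrative, not taken from the slides; the function name knn_predict and the toy data are assumptions):

```python
import numpy as np
from collections import Counter

# Minimal k-NN sketch: predict the class of x0 by majority vote over the K
# training points closest to x0 in Euclidean distance.
def knn_predict(X, y, x0, K=5):
    dists = np.linalg.norm(X - x0, axis=1)   # distance from x0 to every training point
    nearest = np.argsort(dists)[:K]          # indices of the K closest points
    return Counter(y[nearest]).most_common(1)[0][0]

# Toy usage
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), K=3))   # -> 0
```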
3. Radius of NN neighborhood
- N data points uniformly distributed in the unit cube [-1/2, 1/2]^d; let R be the distance from the origin to its nearest neighbor.
- v_d r^d is the volume of a sphere of radius r in d dimensions, so P(R > r) = (1 - v_d r^d)^N and the median of R satisfies v_d R^d = 1 - (1/2)^{1/N}.
- As d grows, this median radius moves out toward the edge of the cube: the "nearest" neighbor is no longer local (simulated below).
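A quick Monte Carlo check of this effect (a sketch, not from the slides; the choices of N, the dimensions tried, and the number of trials are arbitrary):

```python
import numpy as np

# Estimate the median 1-NN radius at the origin for N uniform points
# in the unit cube [-1/2, 1/2]^d; the radius grows rapidly with d.
rng = np.random.default_rng(0)
N, trials = 500, 200
for d in (1, 2, 5, 10, 20):
    radii = [np.linalg.norm(rng.uniform(-0.5, 0.5, size=(N, d)), axis=1).min()
             for _ in range(trials)]
    print(f"d={d:>2}  median 1-NN radius ~ {np.median(radii):.3f}")
```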
4. Solution
- Nearest neighbor techniques are based on the assumption that locally the class posterior probabilities P(j|x) are approximately constant.
- In high dimensions, nearest neighbors are far away, causing bias and degrading performance.
- Adapt the metric used in k-NN so that the resulting neighborhoods stretch out in directions in which the class probabilities change the least.
5. Discriminant Adaptive NN
- Two classes in two dimensions; Class 1 almost completely surrounds Class 2.
- The modified neighborhood extends further parallel to the decision boundary and shrinks in the direction orthogonal to it.
6. DANN metric
- The metric Σ is defined by Σ = W^{-1/2} [ W^{-1/2} B W^{-1/2} + ε I ] W^{-1/2}, where
- W is the (pooled) within-class covariance matrix,
- B is the between-class covariance matrix,
- and ε rounds the neighborhood, preventing it from stretching infinitely in directions with no between-class variation.
7. DANN neighborhoods
8. DANN Classifier
1. Initialize the metric Σ = I.
2. Spread out a nearest neighborhood of K_M points around the test point x0, in the metric Σ.
3. Calculate the weighted between- and within-class sum-of-squares matrices B and W using the points in the neighborhood.
4. Define a new metric Σ = W^{-1/2} [ W^{-1/2} B W^{-1/2} + ε I ] W^{-1/2}.
5. Iterate steps 2, 3 and 4.
6. At completion, use the metric Σ for K-nearest-neighbor classification at the test point x0 (see the sketch below).
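A sketch of the procedure above, under simplifying assumptions (equal-weight points in the adaptation neighborhood, a fixed number of iterations); the names dann_metric and dann_predict are made up for illustration:

```python
import numpy as np

def dann_metric(X, y, x0, K_M=50, eps=1.0, n_iter=3):
    """Locally adapted metric Sigma at the test point x0 (steps 1-5 above)."""
    p = X.shape[1]
    Sigma = np.eye(p)                                   # step 1: Sigma = I
    for _ in range(n_iter):
        # step 2: K_M nearest points to x0 in the current metric Sigma
        diffs = X - x0
        d2 = np.einsum('ij,jk,ik->i', diffs, Sigma, diffs)
        nbrs = np.argsort(d2)[:K_M]
        Xn, yn = X[nbrs], y[nbrs]
        # step 3: within- and between-class sum-of-squares matrices W and B
        mean = Xn.mean(axis=0)
        W, B = np.zeros((p, p)), np.zeros((p, p))
        for j in np.unique(yn):
            Xj = Xn[yn == j]
            mj = Xj.mean(axis=0)
            W += (Xj - mj).T @ (Xj - mj) / len(Xn)
            B += len(Xj) / len(Xn) * np.outer(mj - mean, mj - mean)
        # step 4: Sigma = W^{-1/2} [W^{-1/2} B W^{-1/2} + eps*I] W^{-1/2}
        evals, evecs = np.linalg.eigh(W)
        W_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-8))) @ evecs.T
        Sigma = W_inv_sqrt @ (W_inv_sqrt @ B @ W_inv_sqrt + eps * np.eye(p)) @ W_inv_sqrt
    return Sigma

def dann_predict(X, y, x0, K=5, **kwargs):
    """K-NN vote at x0 using the adapted metric (step 6 above)."""
    Sigma = dann_metric(X, y, x0, **kwargs)
    diffs = X - x0
    d2 = np.einsum('ij,jk,ik->i', diffs, Sigma, diffs)
    labels, counts = np.unique(y[np.argsort(d2)[:K]], return_counts=True)
    return labels[np.argmax(counts)]
```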
9. Parameters
- K_M, the size of the adaptation neighborhood, should be relatively large: min(50, n/5).
- K, the number of neighbors in the final vote, should be around 5.
- ε = 1 gave good results.
10. Global Dimension Reduction
- For the local neighborhood N(i) of x_i, the local class centroids are contained in a subspace useful for classification.
- At each training point x_i, the between-centroids sum-of-squares matrix B_i is computed, and these matrices are averaged over all training points.
- The leading eigenvectors e_1, e_2, ..., e_p of the averaged matrix span the optimal subspaces for global subspace reduction (sketch below).
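A hedged sketch of this averaging step (the helper names are invented, and plain Euclidean neighborhoods are used as a simplification in place of the locally adapted DANN neighborhoods):

```python
import numpy as np

def local_between_matrix(Xn, yn):
    """Between-centroids sum-of-squares matrix B_i for one neighborhood."""
    mean = Xn.mean(axis=0)
    p = Xn.shape[1]
    B = np.zeros((p, p))
    for j in np.unique(yn):
        Xj = Xn[yn == j]
        B += len(Xj) / len(Xn) * np.outer(Xj.mean(axis=0) - mean, Xj.mean(axis=0) - mean)
    return B

def global_subspace(X, y, K_M=50, n_components=2):
    """Average the local B_i over all training points; keep the leading eigenvectors."""
    p = X.shape[1]
    B_bar = np.zeros((p, p))
    for i in range(len(X)):
        nbrs = np.argsort(np.linalg.norm(X - X[i], axis=1))[:K_M]
        B_bar += local_between_matrix(X[nbrs], y[nbrs])
    B_bar /= len(X)
    evals, evecs = np.linalg.eigh(B_bar)
    order = np.argsort(evals)[::-1]          # largest eigenvalues first
    return evecs[:, order[:n_components]]    # columns e_1, ..., e_k span the subspace
```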
11. Global Dimension Reduction
- Eigenvalues of the averaged between-centroids matrix for a two-class, 4-dimensional sphere model with 6 noise dimensions.
- The decision boundary is a 4-dimensional sphere.
12. Global Dimension Reduction
- Two-dimensional Gaussian data with two classes (substantial within-class covariance).
13. Distance metric learning
- Data mining algorithms require good metrics that reflect the important relationships in the data.
- If a user indicates that certain points in input space are similar, can we learn a metric that assigns small distances to such pairs?
- The learned metric can be used as a preprocessing step to help unsupervised algorithms find better solutions.
14. Distance metric learning, with application to clustering with side-information
E.P. Xing, A.Y. Ng, M.I. Jordan and S. Russell
15. Learning Distance Metrics
- Given a set S of pairs of points known to be similar.
- Consider the distance metric d_A(x, y) = ||x - y||_A = sqrt((x - y)^T A (x - y)).
- A must be positive semi-definite so that d_A is non-negative and satisfies the triangle inequality.
- d_A(x, y) = 0 does not imply x = y, so d_A is strictly a pseudo-metric.
- If A = I, d_A is the Euclidean distance.
- If A is diagonal, d_A is a Mahalanobis distance that rescales the individual axes.
- Using d_A is equivalent to rescaling each data point x to A^{1/2} x and applying the standard Euclidean metric to the rescaled data, as illustrated below.
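A small numerical check of that equivalence (illustrative only; the matrix A below is an arbitrary positive semi-definite example):

```python
import numpy as np

def d_A(x, y, A):
    """d_A(x, y) = sqrt((x - y)^T A (x - y))."""
    diff = x - y
    return np.sqrt(diff @ A @ diff)

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                            # positive semi-definite example
evals, evecs = np.linalg.eigh(A)
A_sqrt = evecs @ np.diag(np.sqrt(evals)) @ evecs.T    # symmetric square root A^{1/2}

x, y = np.array([1.0, 2.0]), np.array([0.0, -1.0])
print(d_A(x, y, A))                                   # distance under the metric A
print(np.linalg.norm(A_sqrt @ x - A_sqrt @ y))        # same value after rescaling x -> A^{1/2} x
```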
16. Learning the Metric
- S: set of similar pairs; D: set of dissimilar pairs.
- To learn a diagonal A, use the Newton-Raphson method to minimize g(A) = sum over S of ||x_i - x_j||_A^2 minus log of the sum over D of ||x_i - x_j||_A (a sketch follows).
- To learn a full A, use gradient ascent combined with iterative projections onto the constraint sets.
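A sketch of the diagonal case under stated assumptions: a generic bound-constrained optimizer stands in for Newton-Raphson, and the pairs are passed as lists of (x_i, x_j) tuples; the function name learn_diagonal_metric is invented.

```python
import numpy as np
from scipy.optimize import minimize

def learn_diagonal_metric(S_pairs, D_pairs, dim):
    """Learn A = diag(a) minimizing sum_S ||xi - xj||_A^2 - log(sum_D ||xi - xj||_A)."""
    S_sq = np.array([(x - y) ** 2 for x, y in S_pairs])  # element-wise squared diffs
    D_sq = np.array([(x - y) ** 2 for x, y in D_pairs])

    def g(a):
        similar_term = np.sum(S_sq @ a)                   # sum over S of d_A^2
        dissimilar_term = np.sum(np.sqrt(D_sq @ a))       # sum over D of d_A
        return similar_term - np.log(dissimilar_term + 1e-12)

    res = minimize(g, x0=np.ones(dim), bounds=[(1e-6, None)] * dim)
    return np.diag(res.x)
```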
17. Experiments
- The center and left panels show the rescaled data (diagonal A and full A): x -> A^{1/2} x.
18. K-means Clustering
- Learn the metric from the side-information, then use it to cluster the data.
- K-means: standard K-means using the Euclidean metric.
- Constrained K-means: K-means subject to the similar pairs always being assigned to the same cluster.
- K-means + metric: K-means with distortion defined using the learned distance metric.
- Constrained K-means + metric: constrained K-means using the learned distance metric (sketch below).
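A sketch of "K-means + metric", assuming a learned A is available from the previous step: rescale the data by A^{1/2}, so that Euclidean distortion in the rescaled space equals distortion in the learned metric, then run ordinary K-means (sklearn is used for brevity; the constrained variants need a modified assignment step not shown here).

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_with_metric(X, A, n_clusters=2, seed=0):
    """K-means whose distortion is measured in the learned metric A."""
    evals, evecs = np.linalg.eigh(A)
    A_sqrt = evecs @ np.diag(np.sqrt(np.maximum(evals, 0.0))) @ evecs.T
    X_rescaled = X @ A_sqrt                  # x -> A^{1/2} x (A_sqrt is symmetric)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(X_rescaled)
```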
19. Clustering Results (accuracy)
- K-means: 0.4975
- Constrained K-means: 0.5060
- K-means + metric: 1.0
- Constrained K-means + metric: 1.0
20. Clustering Results (accuracy)
- K-means: 0.4993
- Constrained K-means: 0.5701
- K-means + metric: 1.0
- Constrained K-means + metric: 1.0
21. Clustering Results
- Accuracy vs. amount of side-information for the UCI protein and wine data sets.