1
Discriminant Adaptive Nearest Neighbor Classification
Distance metric learning, with application to clustering with side-information
  • 02/15/04
  • Thomas DSilva

2
k-NN classification
  • Given n training pairs (xi, yi), with xi ∈ R^p and
    yi denoting class membership.
  • Given a new point x0, predict its class y0
  • Find the K training points x(i) closest in distance
    to x0
  • Classify x0 using a majority vote among the K
    neighbors (a sketch follows below)
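A minimal NumPy sketch of this procedure; the function name, Euclidean distance, and default K are illustrative choices, not from the slides:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x0, K=5):
    """Classify x0 by majority vote among its K nearest training points."""
    dists = np.linalg.norm(X_train - x0, axis=1)      # Euclidean distances to x0
    nearest = np.argsort(dists)[:K]                   # indices of the K closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]   # majority vote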

3
Radius of NN neighborhood
  • N data points uniformly distributed in the unit cube
    [-1/2, 1/2]^d; let R be the radius of the
    1-nearest-neighborhood centered at the origin.
  • v_d r^d is the volume of the sphere with radius r in
    d dimensions, so P(R > r) = (1 - v_d r^d)^N and the
    median radius is
    median(R) = v_d^{-1/d} (1 - (1/2)^{1/N})^{1/d},
    which grows toward the cube boundary as d increases
    (see the sketch below).
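A small NumPy/SciPy illustration of the median-radius formula; the helper name is hypothetical, and the formula assumes the neighborhood ball still fits inside the cube:

import numpy as np
from scipy.special import gamma

def median_nn_radius(N, d):
    """Median 1-NN radius at the origin for N uniform points in [-1/2, 1/2]^d."""
    v_d = np.pi ** (d / 2) / gamma(d / 2 + 1)       # volume of the unit d-ball
    return ((1 - 0.5 ** (1 / N)) / v_d) ** (1 / d)

# For N = 1000 the median radius grows quickly with dimension, approaching
# (and for large d nominally exceeding) the cube half-width of 1/2:
for d in (2, 5, 10, 15):
    print(d, round(median_nn_radius(1000, d), 3))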

4
Solution
  • Nearest-neighbor techniques are based on the
    assumption that locally the class posterior
    probabilities P(j|x) are approximately constant.
  • In high dimensions, nearest neighbors are far away,
    causing bias and degrading performance.
  • Adapt the metric used in k-NN so that the resulting
    neighborhoods stretch out in directions in which the
    class probabilities change the least.

5
Discriminant Adaptive NN
  • Two classes in two dimensions; Class 1 almost
    completely surrounds Class 2.
  • The modified neighborhood extends further parallel
    to the decision boundary and shrinks in the
    direction orthogonal to it.

6
DANN metric
  • The metric S is defined by
    S = W^{-1/2} [ W^{-1/2} B W^{-1/2} + ε I ] W^{-1/2},
    giving squared distances D(x, x0) = (x - x0)^T S (x - x0)
  • W = Σj Σ{yi = j} (xi - x̄j)(xi - x̄j)^T, the
    within-class covariance matrix
  • B = Σj πj (x̄j - x̄)(x̄j - x̄)^T, the
    between-class covariance matrix
  • Both are computed from the points in the local
    neighborhood of x0 (a sketch follows below)
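A NumPy sketch of computing this metric from a neighborhood; it uses unweighted W and B for clarity, whereas the paper additionally weights points by their distance to x0, and the function name and the eigenvalue clipping are illustrative:

import numpy as np

def dann_metric(X_nbr, y_nbr, eps=1.0):
    """Local DANN metric S built from neighborhood points X_nbr with labels y_nbr."""
    n, p = X_nbr.shape
    xbar = X_nbr.mean(axis=0)
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for j in np.unique(y_nbr):
        Xj = X_nbr[y_nbr == j]
        xbar_j = Xj.mean(axis=0)
        Xc = Xj - xbar_j
        W += Xc.T @ Xc / n                                           # within-class covariance
        B += len(Xj) / n * np.outer(xbar_j - xbar, xbar_j - xbar)    # between-class covariance
    # W^{-1/2} via eigendecomposition, clipping tiny eigenvalues for stability
    vals, vecs = np.linalg.eigh(W)
    W_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-8))) @ vecs.T
    B_star = W_inv_sqrt @ B @ W_inv_sqrt
    return W_inv_sqrt @ (B_star + eps * np.eye(p)) @ W_inv_sqrt      # the metric S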

7
DANN neighborhoods
8
DANN Classifier
  • 1. Initialize the metric S = I
  • 2. Spread out the nearest neighborhood of KM points
    around the test point x0, in the metric S.
  • 3. Calculate the weighted between and within
    sum-of-squares matrices B and W using the points in
    the neighborhood.
  • 4. Define a new metric
    S = W^{-1/2} [ W^{-1/2} B W^{-1/2} + ε I ] W^{-1/2}
  • 5. Iterate steps 2, 3 and 4.
  • 6. At completion, use S for K-NN classification at
    the test point x0 (a sketch follows below).
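A sketch of the full loop, reusing dann_metric from the earlier slide; the parameter names and the single-iteration default are illustrative, not prescribed by the slides:

import numpy as np
from collections import Counter

def dann_classify(X_train, y_train, x0, K=5, K_M=50, eps=1.0, n_iter=1):
    """Classify a single test point x0 with the DANN procedure (sketch)."""
    diffs = X_train - x0
    S = np.eye(X_train.shape[1])                         # 1. initialize S = I
    for _ in range(n_iter):                              # 5. iterate steps 2-4
        d2 = np.einsum('ij,jk,ik->i', diffs, S, diffs)   # squared distances under S
        nbr = np.argsort(d2)[:K_M]                       # 2. K_M-point neighborhood of x0
        S = dann_metric(X_train[nbr], y_train[nbr], eps) # 3.-4. new metric from W and B
    d2 = np.einsum('ij,jk,ik->i', diffs, S, diffs)       # 6. K-NN vote at x0 using S
    nearest = np.argsort(d2)[:K]
    return Counter(y_train[nearest]).most_common(1)[0][0]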

9
Parameters
  • KM should be relatively large, e.g. min(50, n/5)
  • K should be around 5
  • ε = 1 gave good results

10
Global Dimension Reduction
  • For the local neighborhood N(i) around each training
    point xi, the local class centroids are contained in
    a subspace useful for classification.
  • At each training point xi, the between-centroids
    sum-of-squares matrix Bi is computed, and these
    matrices are then averaged over all training points:
    B̄ = (1/N) Σi Bi
  • The eigenvectors e1, e2, ..., ep of the matrix B̄
    span the optimal subspaces for global subspace
    reduction (a sketch follows below).
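A NumPy sketch of this global reduction; the Euclidean neighborhoods, neighborhood size, and function name are illustrative simplifications:

import numpy as np

def global_subspace(X, y, K_M=50, n_components=2):
    """Average the local between-centroid matrices Bi over all training points
    and return the leading eigenvectors of the average."""
    N, p = X.shape
    B_bar = np.zeros((p, p))
    for i in range(N):
        # local neighborhood of xi (Euclidean, for simplicity)
        nbr = np.argsort(np.linalg.norm(X - X[i], axis=1))[:K_M]
        Xn, yn = X[nbr], y[nbr]
        xbar = Xn.mean(axis=0)
        for j in np.unique(yn):
            Xj = Xn[yn == j]
            xbar_j = Xj.mean(axis=0)
            B_bar += (len(Xj) / len(Xn)) * np.outer(xbar_j - xbar, xbar_j - xbar) / N
    vals, vecs = np.linalg.eigh(B_bar)
    # eigenvectors with the largest eigenvalues span the reduced subspace
    return vecs[:, np.argsort(vals)[::-1][:n_components]]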

11
Global Dimension Reduction
  • Eigenvalues of B̄ for a two-class, 4-dimensional
    sphere model with 6 noise dimensions
  • The decision boundary is a 4-dimensional sphere.

12
Global Dimension Reduction
  • Two-dimensional Gaussian data with two classes
    (substantial within-class covariance).

13
Distance metric learning
  • Data mining algorithms require good metrics that
    reflect the important relationships in the data.
  • If a user indicates that certain points in input
    space are similar, can we learn a metric that
    assigns small distances to those similar pairs?
  • Such a metric can be used in a preprocessing step to
    help unsupervised algorithms find better solutions.

14
Distance metric learning, with application to
clustering with side-information
E.P. Xing, A.Y. Ng, M.I. Jordan and S. Russell
15
Learning Distance Metrics
  • Given a set S of pairs of points known to be similar
  • Consider the distance metric
    d_A(x, y) = ||x - y||_A = sqrt( (x - y)^T A (x - y) )
  • A must be positive semi-definite (so that d_A is a
    valid metric satisfying the triangle inequality)
  • d_A(x, y) = 0 does not imply x = y, so strictly it is
    a pseudometric
  • If A = I, this is the Euclidean distance
  • If A is diagonal, the axes are simply rescaled with
    different weights; a general A gives a family of
    Mahalanobis distances
  • Using d_A equals rescaling each datapoint x to
    A^{1/2} x and applying the standard Euclidean metric
    to the rescaled data (a sketch follows below).
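In code, the metric and the equivalent rescaling look like this; a minimal NumPy sketch with illustrative function names:

import numpy as np

def dist_A(x, y, A):
    """d_A(x, y) = sqrt((x - y)^T A (x - y)) for a positive semi-definite A."""
    diff = x - y
    return np.sqrt(diff @ A @ diff)

def rescale(X, A):
    """Map each row x of X to A^{1/2} x, so Euclidean distance on the result equals d_A."""
    vals, vecs = np.linalg.eigh(A)
    A_sqrt = vecs @ np.diag(np.sqrt(np.maximum(vals, 0))) @ vecs.T   # symmetric square root
    return X @ A_sqrt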

16
Learning the Metric
  • S: set of similar pairs, D: set of dissimilar pairs
  • To learn a diagonal A, use the Newton-Raphson method
    to minimize
    g(A) = Σ_{(xi,xj) ∈ S} ||xi - xj||_A^2
           - log( Σ_{(xi,xj) ∈ D} ||xi - xj||_A )
  • To learn a full A, use gradient ascent together with
    an iterative projection algorithm (a simplified
    sketch for the diagonal case follows below)
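A simplified sketch of the diagonal case using plain gradient descent on g(A) rather than the paper's Newton-Raphson update; the pair containers, step size, and iteration count are illustrative assumptions:

import numpy as np

def learn_diag_A(S_pairs, D_pairs, lr=0.05, n_steps=500):
    """Learn a diagonal metric A = diag(a) by gradient descent on
    g(a) = sum_S ||xi - xj||_A^2 - log( sum_D ||xi - xj||_A ).
    S_pairs / D_pairs are lists of (xi, xj) tuples of similar / dissimilar points."""
    dS = np.array([(x - y) ** 2 for x, y in S_pairs])   # elementwise squared diffs, similar pairs
    dD = np.array([(x - y) ** 2 for x, y in D_pairs])   # elementwise squared diffs, dissimilar pairs
    a = np.ones(dS.shape[1])                            # diagonal entries of A
    for _ in range(n_steps):
        distD = np.sqrt(dD @ a) + 1e-12                 # ||xi - xj||_A over dissimilar pairs
        grad = dS.sum(axis=0) - (dD / (2 * distD[:, None])).sum(axis=0) / distD.sum()
        a = np.maximum(a - lr * grad, 1e-8)             # keep the diagonal non-negative
    return np.diag(a)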

17
Experiments
  • The center and left panels represent the rescaled
    data (diagonal A and full A): x → A^{1/2} x

18
K-means Clustering
  • Learn the metric from the side-information and use it
    to cluster the data; four variants are compared:
  • K-means using the Euclidean metric
  • Constrained K-means: K-means subject to similar pairs
    always being assigned to the same cluster
  • K-means + metric: K-means with distortion defined
    using the learned distance metric (a sketch follows
    below)
  • Constrained K-means + metric: constrained K-means
    using the learned distance metric
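A sketch of the "K-means + metric" variant: rescale the data by A^{1/2} (equivalent to measuring distortion with the learned metric, as on the earlier slide) and run ordinary K-means; scikit-learn is assumed available, and the constrained variants would additionally force similar pairs into a common cluster:

import numpy as np
from sklearn.cluster import KMeans

def kmeans_with_metric(X, A, k):
    """Run K-means with distortion measured by the learned metric A."""
    vals, vecs = np.linalg.eigh(A)
    A_sqrt = vecs @ np.diag(np.sqrt(np.maximum(vals, 0))) @ vecs.T   # symmetric square root
    return KMeans(n_clusters=k, n_init=10).fit_predict(X @ A_sqrt)   # cluster the rescaled data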

19
Clustering Results
  • K-means: 0.4975
  • Constrained K-means: 0.5060
  • K-means + metric: 1
  • Constrained K-means + metric: 1

20
Clustering Results
  • K-means: 0.4993
  • Constrained K-means: 0.5701
  • K-means + metric: 1
  • Constrained K-means + metric: 1

21
Clustering Results
  • Accuracy vs. amount of side-information for the UCI
    protein and wine data sets.