Title: Optimal Dimensionality of Metric Space for kNN Classification
1. Optimal Dimensionality of Metric Space for kNN Classification
- Wei Zhang, Xiangyang Xue, Zichen Sun, Yuefei Guo, and Hong Lu
- Dept. of Computer Science and Engineering
- Fudan University, Shanghai, China
2. Outline
- Motivation
- Related Work
- Main Idea
- Proposed Algorithm
- Discriminant Neighborhood Embedding
- Dimensionality Selection Criterion
- Experimental Results
- Toy Datasets
- Real-world Datasets
- Conclusions
3. Related Work
- Many recent techniques have been proposed to learn a more appropriate metric space for better performance of many learning and data mining algorithms, for example:
- Relevant Component Analysis, Bar-Hillel, A., et al., ICML 2003
- Locality Preserving Projections, He, X., et al., NIPS 2003
- Neighborhood Components Analysis, Goldberger, J., et al., NIPS 2004
- Marginal Fisher Analysis, Yan, S., et al., CVPR 2005
- Local Discriminant Embedding, Chen, H.-T., et al., CVPR 2005
- Local Fisher Discriminant Analysis, Sugiyama, M., ICML 2006
- However, the target dimensionality of the new space is selected empirically in the above-mentioned approaches
4. Main Idea
- Given finite labeled multi-class samples, what can we do for better performance of kNN classification?
- Can we learn a low-dimensional embedding such that, among their k nearest neighbors, points in the same class have smaller distances to each other than to points in different classes?
- Can we estimate the optimal dimensionality of the new metric space at the same time?
5. Outline
- Motivation
- Related Work
- Main Idea
- Proposed Algorithm
- Discriminant Neighborhood Embedding
- Dimensionality Selection Criterion
- Experimental Results
- Toy Datasets
- Real-world Datasets
- Conclusions
6. Setup
- N labeled multi-class points
- k nearest neighbors of each point within the same class
- k nearest neighbors of each point among the other classes
- Discriminant adjacency matrix F (a construction sketch follows below)
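The entries of F are not reproduced in this text, so the following is a minimal Python sketch assuming the usual Discriminant Neighborhood Embedding convention: F[i, j] = +1 if one point is among the k same-class neighbors of the other, -1 if it is among the k different-class neighbors, and 0 otherwise. The function name and the symmetrization are illustrative choices, not the authors' code.

    import numpy as np

    def discriminant_adjacency(X, y, k=3):
        """Build the discriminant adjacency matrix F (N x N).

        X: (N, D) data matrix with one sample per row; y: (N,) class labels.
        Assumed convention: +1 for same-class kNN pairs, -1 for
        different-class kNN pairs, 0 otherwise (symmetrized).
        """
        N = X.shape[0]
        F = np.zeros((N, N))
        dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
        for i in range(N):
            same = np.where(y == y[i])[0]
            same = same[same != i]
            diff = np.where(y != y[i])[0]
            intra = same[np.argsort(dist2[i, same])[:k]]  # k homogeneous neighbors of x_i
            extra = diff[np.argsort(dist2[i, diff])[:k]]  # k heterogeneous neighbors of x_i
            F[i, intra] = F[intra, i] = 1.0
            F[i, extra] = F[extra, i] = -1.0
        return F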
7. Objective Function
- Objective Function (a reconstructed form is given below)
- Intra-class compactness in the new space
- Inter-class separability in the new space
- (S is a diagonal matrix whose entries are the column sums of F)
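The formula itself is not reproduced in this text; the following is a hedged LaTeX reconstruction consistent with the F and S defined above (the standard Discriminant Neighborhood Embedding criterion):

    \min_{P}\ \Phi(P) \;=\; \sum_{i,j} F_{ij}\,\bigl\|P^{\top}x_i - P^{\top}x_j\bigr\|^{2}
    \;=\; 2\,\mathrm{tr}\!\left(P^{\top} X (S - F) X^{\top} P\right),
    \qquad S_{ii} = \sum_{j} F_{ij}

Minimizing Phi shrinks distances between same-class neighbor pairs (F_ij = +1, intra-class compactness) while enlarging distances between different-class neighbor pairs (F_ij = -1, inter-class separability).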
8. How to Compute P
- Note
- The matrix X(S-F)X^T is symmetric, but not positive definite; it may have negative, zero, or positive eigenvalues
- The optimal transformation P can be obtained from the eigenvectors of X(S-F)X^T corresponding to all d of its negative eigenvalues (see the sketch below)
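A minimal NumPy sketch of this step, assuming the data are stored row-wise (so X.T @ (S - F) @ X plays the role of X(S-F)X^T on the slide); the function name is illustrative:

    import numpy as np

    def learn_projection(X, F):
        """Return the eigenvectors of X^T (S-F) X with negative eigenvalues.

        X: (N, D) data matrix (rows are samples); F: (N, N) discriminant
        adjacency matrix. The columns of the returned P span the new space.
        """
        S = np.diag(F.sum(axis=0))            # diagonal matrix of column sums of F
        M = X.T @ (S - F) @ X                 # D x D, symmetric but indefinite
        eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
        neg = eigvals < 0
        return eigvecs[:, neg], eigvals[neg]  # P (D x d) and its d negative eigenvalues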
9. What Does a Positive/Negative Eigenvalue Mean?
- For the i-th eigenvector P_i, the corresponding i-th eigenvalue measures
- the total kNN pairwise distance within the same class, minus
- the total kNN pairwise distance across different classes
- (both measured along P_i in the new space; a reconstructed identity follows below)
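A hedged reconstruction of the identity behind this slide (unit eigenvector p_i of X(S-F)X^T, with F and S as defined earlier):

    \lambda_i \;=\; p_i^{\top} X (S-F) X^{\top} p_i
    \;=\; \tfrac{1}{2}\Bigl(\sum_{F_{ab}=+1} \bigl(p_i^{\top}x_a - p_i^{\top}x_b\bigr)^{2}
    \;-\; \sum_{F_{ab}=-1} \bigl(p_i^{\top}x_a - p_i^{\top}x_b\bigr)^{2}\Bigr)

so lambda_i < 0 means that, along the direction p_i, same-class neighbors are on average closer than different-class neighbors, and lambda_i > 0 means the opposite.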
10. Choosing the Leading Negative Eigenvalues
- Among all the negative eigenvalues, some might have much larger absolute values, while the others with small absolute values could be ignored
- We can then choose the t (t < d) negative eigenvalues with the largest absolute values, so that they account for most of the cumulative eigenvalue magnitude (an illustrative selection sketch follows below)
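A Python sketch of this selection step; the cumulative-magnitude threshold `ratio` is an assumption for illustration only, since the exact selection criterion is a formula not reproduced in this text:

    import numpy as np

    def select_leading_negative(eigvals, eigvecs, ratio=0.95):
        """Keep the t negative eigenvalues with the largest absolute values."""
        neg_idx = np.where(eigvals < 0)[0]
        if neg_idx.size == 0:                 # no useful directions at all
            return eigvecs[:, :0], eigvals[:0]
        order = neg_idx[np.argsort(np.abs(eigvals[neg_idx]))[::-1]]  # largest |lambda| first
        mags = np.abs(eigvals[order])
        cum = np.cumsum(mags) / mags.sum()    # cumulative eigenvalue magnitude
        t = int(np.searchsorted(cum, ratio)) + 1
        keep = order[:t]
        return eigvecs[:, keep], eigvals[keep]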
11. Learned Mahalanobis Distance
- In the original space, the distance between any pair of points can be obtained from the learned Mahalanobis metric (a reconstructed form is given below)
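The following is the standard form of the Mahalanobis distance induced by the learned projection P (metric matrix PP^T), reconstructed here as an assumption consistent with the slide title:

    d(x_i, x_j) \;=\; \bigl\|P^{\top}(x_i - x_j)\bigr\|_2
    \;=\; \sqrt{(x_i - x_j)^{\top} P P^{\top} (x_i - x_j)}

Under this metric, kNN classification in the original space is equivalent to Euclidean kNN in the new space y = P^T x.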
12. Outline
- Motivation
- Related Work
- Main Idea
- Proposed Algorithm
- Discriminant Neighborhood Embedding
- Dimensionality Selection Criterion
- Experimental Results
- Toy Datasets
- Real-world Datasets
- Conclusions
13. Three Classes of Well-Clustered Data
- Both eigenvalues are negative and comparable
- No dimensionality reduction is needed
14. Two Classes of Data with a Multimodal Distribution
- A big difference between the two negative eigenvalues
- The leading eigenvector P1, corresponding to the eigenvalue with the larger absolute value, will be kept
15. Three Classes of Data
- Two eigenvectors corresponding to a positive and a negative eigenvalue, respectively
- The eigenvector with the positive eigenvalue should be discarded from the point of view of kNN classification
16. Five Classes of Non-Separable Data
- Both eigenvalues are positive, which means that kNN classification cannot perform well in either the original or the new space
17. UCI Sonar Dataset
- While the eigenvalues are < 0, the more dimensions, the higher the accuracy
- When the eigenvalues are near 0, the optimum is achieved
- When the eigenvalues are > 0, performance decreases
- (Figure: cumulative eigenvalue curve; an illustrative measurement sketch follows below)
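To make this relationship concrete, here is a hypothetical measurement sketch (not the authors' code) that tracks kNN accuracy as eigenvectors are added in ascending eigenvalue order, mirroring the cumulative eigenvalue curve; it assumes scikit-learn is available:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def accuracy_vs_dimension(X_train, y_train, X_test, y_test, eigvals, eigvecs, k=3):
        """kNN accuracy after projecting onto the first d eigenvectors, d = 1..D."""
        order = np.argsort(eigvals)                  # most negative eigenvalues first
        accuracies = []
        for d in range(1, len(order) + 1):
            P = eigvecs[:, order[:d]]
            clf = KNeighborsClassifier(n_neighbors=k).fit(X_train @ P, y_train)
            accuracies.append(clf.score(X_test @ P, y_test))
        return accuracies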
18. Comparisons with the State-of-the-Art
19. UMIST Face Database
20. Comparisons with the State-of-the-Art: UMIST Face Database
21. Outline
- Motivation
- Related Work
- Main Idea
- Proposed Algorithm
- Discriminant Neighborhood Embedding
- Dimensionality Selection Criterion
- Experimental Results
- Toy Datasets
- Real-world Datasets
- Conclusions
22. Conclusions
- Summary
- A low-dimensional embedding can be learned for better kNN classification accuracy given finite training samples
- The optimal dimensionality can be estimated at the same time
- Future work
- For large-scale datasets, how can the computational complexity be reduced?
23. Thanks for your attention! Any questions?