Title: Evaluation of Distance Metrics for Recognition Based on
1 Evaluation of Distance Metrics for Recognition
Based on Non-Negative Matrix Factorization
- David Guillamet, Jordi Vitrià
- Pattern Recognition Letters
- 24:1599-1605, June 2003
- John Galeotti
- Advanced Perception
- March 23, 2004
2 Actually, Two ICPR'02 Papers
- Analyzing Non-Negative Matrix Factorization for Image Classification
  - David Guillamet, Bernt Schiele, Jordi Vitrià
- Determining a Suitable Metric When Using Non-negative Matrix Factorization
  - David Guillamet, Jordi Vitrià
3 Non-Negative Matrix Factorization
- TLA: NMF
- Used for dimensionality reduction
- $V_{n \times m} \approx W_{n \times r} H_{r \times m}$, with $r < nm/(n+m)$
- V has non-negative training samples as its columns
- W contains the non-negative basis vectors
- H contains the non-negative coefficients to approximate each column of V using W
- Results similar in concept to PCA, but with non-negative basis vectors
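As a quick worked check of the rank condition above, the sketch below computes the largest r that still satisfies (n + m) r < nm, i.e. the point up to which storing W and H takes fewer numbers than storing V. The function name `max_useful_rank` is hypothetical, and the sizes in the usage lines (512-dimensional vectors, 1000 samples) are borrowed from the texture experiment later in the talk purely for illustration.

```python
def max_useful_rank(n, m):
    """Largest integer r with (n + m) * r < n * m."""
    return (n * m - 1) // (n + m)


n, m = 512, 1000            # illustrative sizes only
r = max_useful_rank(n, m)
print(r, (n + m) * r, n * m)  # rank, numbers stored in W and H, numbers stored in V
```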
4 NMF Distinguishing Properties
- Requires non-negative data
- Computationally expensive
- Part-based decomposition
  - Because only additive combinations of original data are allowed
- Not an orthonormal basis
5 Different Decomposition Types
- [Figure] 20 dimensions of numeric digits: PCA vs. NMF
- [Figure] 50 dimensions of numeric digits: PCA vs. NMF
6 Why not just use PCA?
- PCA is optimal for reconstruction
- PCA is not optimal for separation and recognition
of classes
7 NMF Issues Addressed
- If/when is NMF better at dimensionality reduction than PCA for classification?
- Can combining PCA and NMF lead to better performance?
- What is the best distance metric to use with the nonorthonormal basis of NMF?
8 How NMF Works
- $V_{n \times m} \approx W_{n \times r} H_{r \times m}$, with $r < nm/(n+m)$
- Begin with an n x m matrix of training data V
  - Each column is a vectorized data point
- Randomly initialize W and H with positive values
- Iterate according to update rules
- [Equations: multiplicative update rules for W and H]
9 How NMF Works
- In general, NMF requires the non-linear optimization of an objective function
- The update rules just given correspond to a popular objective function, and are guaranteed to converge.
- That objective function relates to the probability of generating the images in V from the bases W and encodings H
- [Equation: objective function]
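The iteration described above can be sketched in a few lines. This is a minimal sketch, assuming the multiplicative update rules of Lee and Seung for the divergence objective (a common choice that matches the "probability of generating V" description); the exact equations from the slides are not preserved here and may differ in detail, for example in how W is normalized. The names `nmf`, `divergence`, and `matmul`, the `eps` guard against division by zero, and the toy matrix are all assumptions made for illustration.

```python
import math
import random


def matmul(A, B):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH); lower means a better fit."""
    total = 0.0
    for v_row, wh_row in zip(V, WH):
        for v, wh in zip(v_row, wh_row):
            if v > 0:
                total += v * math.log(v / (wh + eps))
            total += wh - v
    return total


def nmf(V, r, iters=200, seed=0, eps=1e-12):
    """Factor a non-negative n x m matrix V into W (n x r) and H (r x m)."""
    n, m = len(V), len(V[0])
    rng = random.Random(seed)
    # Random strictly positive initialization, as on the slide.
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        # Update H: H[a][u] *= sum_i W[i][a] * V[i][u] / (WH)[i][u] / sum_i W[i][a]
        WH = matmul(W, H)
        R = [[V[i][u] / (WH[i][u] + eps) for u in range(m)] for i in range(n)]
        w_col = [sum(W[i][a] for i in range(n)) for a in range(r)]
        H = [[H[a][u] * sum(W[i][a] * R[i][u] for i in range(n)) / (w_col[a] + eps)
              for u in range(m)] for a in range(r)]
        # Update W: W[i][a] *= sum_u H[a][u] * V[i][u] / (WH)[i][u] / sum_u H[a][u]
        WH = matmul(W, H)
        R = [[V[i][u] / (WH[i][u] + eps) for u in range(m)] for i in range(n)]
        h_row = [sum(H[a]) for a in range(r)]
        W = [[W[i][a] * sum(H[a][u] * R[i][u] for u in range(m)) / (h_row[a] + eps)
              for a in range(r)] for i in range(n)]
    return W, H


# Toy data: 4-dimensional samples in 5 columns, rank 2 by construction.
V = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1], [6, 6, 6, 6, 6]]
W, H = nmf(V, r=2)
print(divergence(V, matmul(W, H)))
```

Because the updates only multiply by non-negative factors, W and H stay non-negative without any explicit projection step, which is the main appeal of this scheme despite its slow convergence.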
10 NMF vs. PCA Experiments
- Dataset: 10 classes of natural textures
- Clouds, grass, ice, trees, sand, sky, etc.
- 932 color images total
- Each image tessellated into 10x10 patches
- 1000 patches for training, 1000 for testing
- Each patch classified as a single texture
- Raw feature vectors: color histograms
- Each region histogrammed into 8 bins per color, 16 colors → 512-dimensional vectors
11 NMF vs. PCA Experiments
- Learn both NMF and PCA subspaces for each class of histogram
- For both NMF and PCA:
  - Project queries onto the learned subspaces of each class
  - Label each query by the subspace that best reconstructs the query
- This seems like a poor scheme for NMF
  - (Other experiments allow better schemes)
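The labeling scheme above (pick the class whose subspace reconstructs the query best) can be sketched as follows. This is a minimal sketch for the orthonormal case only, as PCA provides; for a non-orthonormal NMF basis the coefficients would have to come from a (non-negative) least-squares fit instead of dot products, which is not shown. The names `residual` and `classify`, the `(mean, basis)` layout, and the toy subspaces are assumptions for illustration.

```python
import math


def residual(x, mean, basis):
    """Reconstruction error of x in the affine subspace mean + span(basis).

    Assumes the rows of `basis` are orthonormal, so projection is a dot product.
    """
    d = [xi - mi for xi, mi in zip(x, mean)]
    recon = [0.0] * len(x)
    for b in basis:
        coeff = sum(di * bi for di, bi in zip(d, b))
        recon = [ri + coeff * bi for ri, bi in zip(recon, b)]
    return math.sqrt(sum((di - ri) ** 2 for di, ri in zip(d, recon)))


def classify(x, subspaces):
    """Return the label whose subspace gives the smallest reconstruction error."""
    return min(subspaces, key=lambda label: residual(x, *subspaces[label]))


# Toy example: two one-dimensional class subspaces in a 3-D feature space.
subspaces = {
    "sky": ([0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]]),
    "grass": ([0.0, 0.0, 0.0], [[0.0, 1.0, 0.0]]),
}
print(classify([3.0, 0.1, 0.0], subspaces))
```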
12 NMF vs. PCA Results
- NMF works best for dispersed classes
- PCA works best for compact classes
- Both seem useful; try combining them
- But why are fewer than half of the sky vectors best reconstructed by PCA, when for sky PCA has a mean reconstruction error less than 1/4 that of NMF? Mistakes?
13 NMF+PCA Experiments
- During training, we learned whether NMF or PCA worked best for each class
- Project a query to a class using only the method that works best for that class
- Result: 2.3% improvement in the recognition rate over NMF alone (PCA: 5.8%), but is this significant at 60%?
14 Hierarchy Experiments
- At level k of the hierarchy, project the query onto each original class's NMF or PCA subspace
- But, to choose the direction to descend the hierarchy, we only care about the level-k super-class containing the matching class
- Furthermore, for each class the choice of PCA vs. NMF can be independently set at each level of the hierarchy
15 Hierarchy Results
- 2% improvement in recognition rate
- I really suspect that this is insignificant, and results only from the additional degrees of freedom
- They employ various additional neighborhood-based hacks to increase their accuracy further, but I don't see any relevance to NMF specifically
16 Need for a better metric
- Want to classify based on nearest neighbor, rather than reprojection error
- Unfortunately, NMF generates a nonorthonormal basis, and so the relative distance to a base depends on the uniqueness of that base
- Bases will share a lot of pixels in common areas
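A small numeric illustration of the problem above: with overlapping (non-orthogonal) bases, two coefficient changes of the same Euclidean length can correspond to very different changes in pixel space, because the squared pixel distance is Δh^T (W^T W) Δh rather than ||Δh||^2. The two 3-pixel bases and the coefficient vectors below are made up for the example.

```python
import math


def combine(bases, coeffs):
    """Reconstruct a pixel vector as sum_k coeffs[k] * bases[k]."""
    return [sum(c * b[i] for c, b in zip(coeffs, bases))
            for i in range(len(bases[0]))]


def euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


bases = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]  # the two bases share the middle pixel
h_q, h_a, h_b = [1.0, 1.0], [2.0, 0.0], [2.0, 2.0]

# Same distance from the query in coefficient space ...
print(euclid(h_q, h_a), euclid(h_q, h_b))
# ... but different distances once mapped back to pixels.
print(euclid(combine(bases, h_q), combine(bases, h_a)),
      euclid(combine(bases, h_q), combine(bases, h_b)))
```

With an orthonormal basis the two pairs of numbers would agree, which is why plain Euclidean nearest neighbor is safe for PCA coefficients but questionable for NMF coefficients.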
17 Earth Mover's Distance (EMD)
- Defined as the minimal amount of work that must be performed to transform one feature distribution into the other
- A special case of the transportation problem from linear optimization
- Let $I$ = set of suppliers, $J$ = set of consumers, $c_{ij}$ = cost to ship from $i \in I$ to $j \in J$, $f_{ij}$ = amount shipped from $i$ to $j$
- Distance = cost to make datasets equal
18 Earth Mover's Distance (EMD)
- Based on finding a measure of correlation between bases to define its cost matrix
- The cost matrix weights the transition of one basis ($b_i$) to another ($b_j$)
- $c_{ij} = \mathrm{dist}_{\mathrm{angle}}(b_i, b_j) = -\dfrac{x \cdot y}{\|x\|\,\|y\|}$, with $x = b_i$ and $y = b_j$
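The cost matrix described above can be sketched directly from the formula, reading it as the negative cosine of the angle between two basis vectors. The function names and the three toy bases are assumptions for illustration; the original experiments used NMF bases learned from data.

```python
import math


def dist_angle(x, y):
    """Negative cosine of the angle between x and y (both must be nonzero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return -dot / norms


def cost_matrix(bases):
    """c[i][j] = dist_angle(b_i, b_j) for every pair of basis vectors."""
    return [[dist_angle(bi, bj) for bj in bases] for bi in bases]


bases = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
for row in cost_matrix(bases):
    print(row)
```

Bases that overlap heavily get a cost near -1 and bases with no pixels in common get a cost of 0, so moving weight between similar bases is treated as cheap.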
19 EMD Transportation Problem
- $f_{ij}$ = quantity shipped from $i \to j$
- Consumers don't ship
- Don't exceed demand
- Don't exceed supply
- Demand must equal supply for EMD to be a metric
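The constraints listed above match the usual transportation-problem form of the EMD. The block below writes that form out as a sketch; the original slide's equations are not preserved, so the notation is an assumption: $s_i$ (supply at $i$) and $d_j$ (demand at $j$) are names introduced here, and the total-flow constraint and the normalization by total flow follow the formulation commonly attributed to Rubner et al. rather than anything stated on the slide.

```latex
\begin{aligned}
\min_{f}\quad & \sum_{i \in I} \sum_{j \in J} c_{ij}\, f_{ij} \\
\text{s.t.}\quad
  & f_{ij} \ge 0 && \text{(consumers don't ship)} \\
  & \sum_{i \in I} f_{ij} \le d_j && \text{(don't exceed demand)} \\
  & \sum_{j \in J} f_{ij} \le s_i && \text{(don't exceed supply)} \\
  & \sum_{i \in I} \sum_{j \in J} f_{ij} = \min\Big(\sum_{i \in I} s_i,\ \sum_{j \in J} d_j\Big) \\[4pt]
\mathrm{EMD} \;=\; & \frac{\sum_{i,j} c_{ij}\, f_{ij}}{\sum_{i,j} f_{ij}}
\end{aligned}
```

When total supply equals total demand, both inequality constraints hold with equality at the optimum, which is the case the last bullet singles out.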
20 EMD vs. Other Experiments
- Digit recognition from MNIST digit database
- 60,000 training images; 10,000 for test
- Classify by NN and 5NN in the subspace
- Result: EMD works best in low-dimensional subspaces, but in high-dimensional subspaces EMD does not work well
- More specifically, EMD works well when the bases contain some intersecting pixels
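The NN and 5NN classification step can be sketched with the distance left as a parameter, so that Euclidean distance, distangle, or EMD could be plugged in. This is a minimal sketch under that assumption; `knn_classify` and the toy 2-D "subspace coefficients" are hypothetical, and only a Euclidean distance is shown.

```python
import math
from collections import Counter


def euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_classify(query, train, labels, dist, k=1):
    """Majority vote among the k training vectors nearest to `query` under `dist`."""
    order = sorted(range(len(train)), key=lambda i: dist(query, train[i]))
    votes = Counter(labels[i] for i in order[:k])
    # Ties in the vote go to the label seen first, i.e. the nearer neighbor.
    return votes.most_common(1)[0][0]


# Toy subspace coefficients for two digit classes.
train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = [3, 3, 3, 8, 8, 8]
print(knn_classify([0.2, 0.1], train, labels, euclid, k=1))
print(knn_classify([0.2, 0.1], train, labels, euclid, k=5))
```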
21 Occlusion Experiments
- Randomly occlude either 1 or 2 of the 4 quadrants of an image (25% and 50% occlusion)
- Why does distangle do so well?

Best subspace distance with occlusions
                 Low dim.                High dim.
25% occlusion    NMF+distangle           PCA sometimes better
50% occlusion    NMF+distangle OR EMD    NMF+distangle
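The occlusion procedure described above can be sketched as follows. How the occluded pixels were filled is not stated in the slides; setting them to 0 is an assumption, as are the function name `occlude_quadrants` and the 28x28 test image (the MNIST image size).

```python
import random


def occlude_quadrants(image, count, rng=None, fill=0):
    """Return a copy of `image` (a list of rows) with `count` random quadrants set to `fill`."""
    if rng is None:
        rng = random.Random()
    h, w = len(image), len(image[0])
    out = [list(row) for row in image]
    quadrants = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (row half, column half)
    for qr, qc in rng.sample(quadrants, count):
        r0, r1 = (0, h // 2) if qr == 0 else (h // 2, h)
        c0, c1 = (0, w // 2) if qc == 0 else (w // 2, w)
        for r in range(r0, r1):
            for c in range(c0, c1):
                out[r][c] = fill
    return out


image = [[1] * 28 for _ in range(28)]
for count in (1, 2):  # 25% and 50% occlusion
    occluded = occlude_quadrants(image, count, random.Random(0))
    print(count, sum(row.count(0) for row in occluded))
```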
22 Demo
- NMF difficulties
- EMD experiments instead
- Demonstrate using existing code within the desired framework of a cost matrix
- Their code: http://robotics.stanford.edu/rubner/emd/default.htm
- My code: http://www.vialab.org/john/Pres9-code/
23 Conclusion
- NMF is a parts-based alternative to PCA
- NMF and PCA should be combined for minimum-reprojection-error classification
- For nearest-neighbor classification, NMF needs a better metric
- When the subspace dimensionality is chosen appropriately for good bases, NMF+EMD or NMF+distangle have the highest recognition rates