Title: Spectral Hashing
Slide 1: Spectral Hashing
- Y. Weiss (Hebrew U.)
- A. Torralba (MIT)
- Rob Fergus (NYU)
Slide 2: What does the world look like?
- Motivation
- High-level image statistics
- Object recognition for large-scale search
Slide 3: Semantic Hashing
- Salakhutdinov & Hinton, 2007
- [Diagram: a query image is passed through the semantic hash function to produce a binary code; the code is used as the query address into the address space, and nearby addresses hold semantically similar images from the database]
- Quite different to a (conventional) randomizing hash
Slide 4: 1. Locality Sensitive Hashing
- Gionis, Indyk & Motwani (1999)
- Take random projections of the data
- Quantize each projection with a few bits (a minimal sketch follows below)
- No learning involved
- [Diagram: a Gist descriptor is projected and quantized into a short binary code, e.g. 101]
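A minimal sketch of this scheme, assuming a 512-dimensional Gist descriptor and a 32-bit code (the code length is illustrative, not from the slides):

```python
# LSH-style binary coding sketch (not the authors' code): random projections
# of a Gist-like descriptor, each quantized to a single bit by its sign.
import numpy as np

rng = np.random.default_rng(0)
d, n_bits = 512, 32                     # Gist dimensionality; code length (assumed)
R = rng.standard_normal((n_bits, d))    # random projection directions

def lsh_code(x):
    """Quantize each random projection with one bit (its sign)."""
    return (R @ x > 0).astype(np.uint8)

gist = rng.standard_normal(d)           # stand-in for a real Gist descriptor
print(lsh_code(gist))                   # e.g. array([1, 0, 1, ...])
```

Because the projections are random, no training data is needed, which is the "no learning involved" point above.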
Slide 5: Toy Example
Slide 6: 2. Boosting
- Modified form of BoostSSC (Shakhnarovich, Viola & Darrell, 2003)
- Positive examples are pairs of similar images
- Negative examples are pairs of unrelated images
- Learn a threshold and dimension for each bit (weak classifier)
Slide 7: Toy Example
Slide 8: 3. Restricted Boltzmann Machine (RBM)
- Type of Deep Belief Network
- Hinton & Salakhutdinov, Science 2006
- Units are binary stochastic
- [Diagram: a single RBM layer with weight matrix W between the visible and hidden units]
- Attempts to reconstruct the input at the visible layer from the activation of the hidden layer
Slide 9: Multi-layer RBM: non-linear dimensionality reduction
- [Diagram of the encoder stack, bottom to top:]
- Input: Gist vector (512 dimensions); linear units at the first layer
- Layer 1 (w1): 512 -> 512
- Layer 2 (w2): 512 -> 256
- Layer 3 (w3): 256 -> N
- Output: binary code (N-dimensional)
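A minimal forward-pass sketch of that stack; the weights w1-w3 would come from RBM pre-training and fine-tuning (here they are random placeholders), and the exact placement of the nonlinearities is an assumption:

```python
# Forward pass of the 512 -> 512 -> 256 -> N encoder described above.
# Weights are placeholders; in practice they are learned, not random.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
N = 32                                   # code length (assumed)
w1 = rng.standard_normal((512, 512)) * 0.01
w2 = rng.standard_normal((512, 256)) * 0.01
w3 = rng.standard_normal((256, N)) * 0.01

def encode(gist):
    h1 = sigmoid(gist @ w1)              # visible units are linear (real-valued Gist); hidden units logistic
    h2 = sigmoid(h1 @ w2)
    h3 = sigmoid(h2 @ w3)
    return (h3 > 0.5).astype(np.uint8)   # threshold the top layer to get the binary code

print(encode(rng.standard_normal(512)))
```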
Slide 10: Toy Example
Slide 11: 2-D Toy Example
- 7 bits vs. 15 bits
- [Plots: distance from the query point, color-coded: red = 0 bits, green = 1 bit, blue = 2 bits, black = more than 2 bits]
Slide 12: Toy Results
- [Plot legend: distance: red = 0 bits, green = 1 bit, blue = 2 bits]
Slide 13: Semantic Hashing (recap)
- Salakhutdinov & Hinton, 2007
- [Diagram, as on slide 3: query image -> semantic hash function -> binary code -> query address in the address space; nearby addresses hold semantically similar images from the database]
- Quite different to a (conventional) randomizing hash
Slide 14: Spectral Hash
- [Diagram: the same pipeline, with the Spectral Hash, a non-linear dimensionality reduction, as the hash function: query image -> real-valued vector -> binary code -> query address in the address space; nearby addresses hold semantically similar images from the database]
- Quite different to a (conventional) randomizing hash
Slide 15: Spectral Hashing (NIPS 08)
- Assume points are embedded in Euclidean space
- How to binarize so that Hamming distance approximates Euclidean distance?
- e.g. Ham_Dist(10001010, 11101110) = 3 (checked below)
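A quick check of that example, using XOR and a popcount (an illustrative helper, not from the slides):

```python
# Hamming distance: count differing bits via XOR and popcount.
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

print(hamming(0b10001010, 0b11101110))  # -> 3
```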
Slide 16: Spectral Hashing theory
- Want to minimize trace(Y^T (D - W) Y), subject to:
- Each bit is on 50% of the time
- Bits are independent
- Sadly, this is NP-complete
- Relax the problem by letting Y be continuous
- It now becomes an eigenvector problem (written out below)
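Spelled out, as I read the paper's formulation (W is the affinity matrix between data points, D its diagonal degree matrix, and the rows y_i of Y are the k-bit codes):

```latex
\begin{aligned}
\text{minimize: }\; & \sum_{ij} W_{ij}\,\lVert y_i - y_j \rVert^2 \;=\; \operatorname{trace}\!\big(Y^\top (D - W)\,Y\big) \\
\text{subject to: }\; & y_i \in \{-1, 1\}^k \quad \text{(binary codes)} \\
& \textstyle\sum_i y_i = 0 \quad \text{(each bit on 50\% of the time)} \\
& \tfrac{1}{n}\, Y^\top Y = I \quad \text{(bits uncorrelated)}
\end{aligned}
```

Dropping the binary constraint leaves a standard spectral problem: the relaxed solutions are the eigenvectors of the graph Laplacian D - W with the smallest non-zero eigenvalues.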
Slide 17: Nyström Approximation
- Method for approximating eigenfunctions
- Interpolate between existing data points
- Requires evaluating the distance to the existing data points, so the cost grows linearly with the number of points
- Also overfits badly in practice
Slide 18: What about a novel data point?
- Need a function to map new points into the space
- Take the limit as n -> infinity (the eigenvectors of the graph Laplacian become eigenfunctions)
- Need to carefully normalize the graph Laplacian
- An analytical form of the eigenfunctions exists for certain distributions (uniform, Gaussian)
- Constant time to compute/evaluate a new point
- For a uniform distribution, the form only depends on the extent of the distribution (b - a); see below
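For reference, the one-dimensional closed form for a uniform distribution on [a, b], as best I can reconstruct it (ε is the width of the affinity kernel; constants may differ slightly from the paper):

```latex
\Phi_k(x) \;=\; \sin\!\left(\frac{\pi}{2} + \frac{k\pi}{b - a}\, x\right),
\qquad
\lambda_k \;=\; 1 - e^{-\frac{\epsilon^2}{2}\left|\frac{k\pi}{b - a}\right|^2},
\qquad k = 1, 2, \ldots
```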
Slide 19: Eigenfunctions for a uniform distribution
Slide 20: The Algorithm
- Input: data points x_i of dimensionality d, and the desired number of bits k
- Fit a multidimensional rectangle to the data
- Run PCA to align the axes, then bound the uniform distribution
- For each dimension, calculate the k smallest eigenfunctions
- This gives d x k eigenfunctions; pick the k with the smallest eigenvalues
- Threshold the chosen eigenfunctions at zero to give the binary codes (an end-to-end sketch follows below)
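A minimal end-to-end sketch of those steps (not the authors' code); it uses the uniform-distribution eigenfunctions from slide 18, and all sizes and data are illustrative:

```python
# Spectral hashing sketch: PCA, bound a box, rank 1-D eigenfunctions, threshold.
import numpy as np

def train_spectral_hash(X, n_bits):
    """X: (n, d) data matrix. Returns the parameters needed to encode new points."""
    # 1. PCA to align the axes (keep the top n_bits principal directions).
    mean = X.mean(axis=0)
    Xc = X - mean
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / len(X))
    pcs = evecs[:, ::-1][:, :n_bits]           # principal components, largest variance first
    Z = Xc @ pcs
    # 2. Bound a uniform distribution (a box) along each PCA axis.
    mn, mx = Z.min(axis=0), Z.max(axis=0)
    # 3. Candidate eigenfunctions: modes k = 1..n_bits on every dimension,
    #    ranked by (k*pi / (b - a))^2, which orders them like the eigenvalues.
    modes = []
    for dim in range(Z.shape[1]):
        extent = mx[dim] - mn[dim]
        for k in range(1, n_bits + 1):
            modes.append((((k * np.pi) / extent) ** 2, dim, k))
    modes.sort()                                # pick the n_bits smallest
    chosen = [(dim, k) for _, dim, k in modes[:n_bits]]
    return dict(mean=mean, pcs=pcs, mn=mn, mx=mx, chosen=chosen)

def encode(X, params):
    """Threshold the chosen 1-D eigenfunctions at zero to get binary codes."""
    Z = (X - params["mean"]) @ params["pcs"]
    codes = np.empty((len(X), len(params["chosen"])), dtype=np.uint8)
    for j, (dim, k) in enumerate(params["chosen"]):
        a, b = params["mn"][dim], params["mx"][dim]
        phi = np.sin(np.pi / 2 + k * np.pi / (b - a) * (Z[:, dim] - a))
        codes[:, j] = (phi > 0).astype(np.uint8)
    return codes

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 512))            # stand-in for Gist vectors
params = train_spectral_hash(X, n_bits=16)
print(encode(X[:3], params))
```

The eigenvalue in the closed form also involves the kernel width ε, but since that factor is monotone, sorting by (kπ/(b-a))² selects the same eigenfunctions.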
Slide 21: 1. Fit a Multidimensional Rectangle
- Run PCA to align the axes
- Bound the uniform distribution
Slide 22: 2. Calculate Eigenfunctions
Slide 23: 3. Pick the k Smallest Eigenfunctions
- [Plot of the sorted eigenvalues]
- e.g. k = 3
Slide 24: 4. Threshold the Chosen Eigenfunctions
Slide 25: Back to the 2-D Toy Example
- 7 bits vs. 15 bits
- [Plots: distance from the query point: red = 0 bits, green = 1 bit, blue = 2 bits]
Slide 26: 2-D Toy Example Comparison
Slide 27: 10-D Toy Example
Slide 28: Experiments on Real Data
Slide 29: Input image representation: Gist vectors
- Pixels are not a convenient representation
- Use the Gist descriptor instead (Oliva & Torralba, IJCV 2001)
- 512 dimensions/image (real-valued, i.e. 16,384 bits)
- L2 distance between Gist vectors is not a bad substitute for human perceptual distance
- No color information
Slide 30: LabelMe images
- 22,000 images (20,000 train, 2,000 test)
- Ground-truth segmentations for all
- Assume L2 Gist distance is the true distance (an evaluation sketch follows below)
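A sketch of the evaluation this implies (illustrative, not the authors' script): take the L2-nearest neighbours in Gist space as ground truth and measure how many of them fall within a small Hamming radius of the query code.

```python
import numpy as np

def recall_at_hamming_radius(gist_db, gist_q, codes_db, codes_q, n_true=50, radius=2):
    # Ground truth: the n_true nearest database items under L2 Gist distance.
    l2 = np.linalg.norm(gist_db - gist_q, axis=1)
    true_nn = set(np.argsort(l2)[:n_true])
    # Retrieved: database items whose code is within `radius` bits of the query code.
    ham = (codes_db != codes_q).sum(axis=1)
    retrieved = set(np.flatnonzero(ham <= radius))
    return len(true_nn & retrieved) / n_true

rng = np.random.default_rng(0)
gist_db, gist_q = rng.standard_normal((2000, 512)), rng.standard_normal(512)
codes_db = rng.integers(0, 2, (2000, 16), dtype=np.uint8)
codes_q = rng.integers(0, 2, 16, dtype=np.uint8)
print(recall_at_hamming_radius(gist_db, gist_q, codes_db, codes_q))
```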
Slide 31: LabelMe data
Slide 33: Extensions
Slide 34: How to handle non-uniform distributions
Slide 35: Bit allocation between dimensions
- Compare the value of the cuts in the original space, i.e. before the pointwise nonlinearity
Slide 36: Summary
- Spectral Hashing
- A simple way of computing good binary codes
- Forced to make a big assumption about the data distribution
- Use point-wise non-linearities to map the distribution to uniform
- Need more experiments on real data
Slide 39: Overview
- Assume points are embedded in Euclidean space (e.g. the output from an RBM)
- How to binarize the space so that the Hamming distance between points approximates the L2 distance?
Slide 40: Semantic Hashing beyond 30 bits
Slide 41: Strategies for Binarization
- Deliberately add noise during backprop: it forces extreme values that can overcome the noise (a toy sketch follows below)
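A toy sketch of that idea (an assumption about how it might be implemented, not the authors' training code): Gaussian noise is added to the code layer's pre-activations during training, so only saturated, near-binary activations survive it.

```python
# Noise injection at the code layer: during training, additive Gaussian noise
# pushes the network towards saturated (near 0/1) code activations.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def code_layer(pre_activation, training, rng, noise_std=4.0):
    if training:
        pre_activation = pre_activation + rng.normal(0.0, noise_std, pre_activation.shape)
    return sigmoid(pre_activation)

rng = np.random.default_rng(0)
z = np.array([-6.0, -0.5, 0.5, 6.0])
print(code_layer(z, training=True, rng=rng))    # noisy pass used during backprop
print(code_layer(z, training=False, rng=rng))   # clean activations at test time
```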