Title: Efficient Image Search and Retrieval using Compact Binary Codes
Slide 1: Efficient Image Search and Retrieval using Compact Binary Codes
- Rob Fergus (NYU)
- Jon Barron (NYU/UC Berkeley)
- Antonio Torralba (MIT)
- Yair Weiss (Hebrew U.)
CVPR 2008
Slide 2: Large scale image search
Internet contains many billions of images
How can we search them, based on visual content?
- The Challenge:
  - Need a way of measuring similarity between images
  - Needs to scale to the Internet
Slide 3: Current Image Search Engines
Essentially text-based
Slide 4: Existing approaches to Content-Based Image Retrieval
- Focus on scaling rather than understanding the image
- Variety of simple/hand-designed cues
  - Color and/or texture histograms, shape, PCA, etc.
- Various distance metrics
  - e.g. Earth Mover's Distance (Rubner et al. 98)
- Most recognition approaches are slow (~1 sec/image)
Slide 5: Our Approach
- Learn the metric from training data
- Use compact binary codes for speed
DO BOTH TOGETHER
Slide 6: Large scale image/video search
- Representation must fit in memory (disk too slow)
- Facebook has ~10 billion images (10^10)
- A PC has ~10 Gbytes of memory (~10^11 bits)
- → Budget of 10^1 bits/image
- YouTube has ~a trillion video frames (10^12)
- A big cluster of PCs has ~10 Tbytes (~10^14 bits)
- → Budget of 10^2 bits/frame
Slide 7: Some file sizes
- Typical YouTube clip (compressed) is ~10^8 bits
- 1 Megapixel JPEG image is ~10^7 bits
- 32x32 color image is ~10^4 bits
  - Smallest useful image size
10^1 - 10^2 bits is not much → need a really compact image representation
Slide 8: Binary codes for images
- Want images with similar content to have similar binary codes
- Use Hamming distance between codes
  - Number of bit flips
  - E.g. Ham_Dist(10001010, 10001110) = 1
        Ham_Dist(10001010, 11101110) = 3
- Semantic Hashing [Salakhutdinov & Hinton, 2007]
  - Text documents
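The Hamming distance is cheap to compute on packed codes. A minimal sketch reproducing the slide's two examples (representing codes as plain Python integers is my choice, not the authors'):

```python
def ham_dist(a: int, b: int) -> int:
    """Number of bit flips between two binary codes (packed as ints)."""
    return bin(a ^ b).count("1")  # XOR marks the differing bits

assert ham_dist(0b10001010, 0b10001110) == 1
assert ham_dist(0b10001010, 0b11101110) == 3
```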
Slide 9: Semantic Hashing
[Salakhutdinov & Hinton, 2007] for text documents
[Figure: a query image is passed through the semantic hash function to produce a binary code, used as a query address into the address space holding the images in the database; semantically similar images lie at nearby addresses]
Quite different to a (conventional) randomizing hash
Slide 10: Semantic Hashing
- Each image code is a memory address
- Find neighbors by exploring the Hamming ball around the query address
- Lookup time is independent of the number of data points
- Depends on radius of ball & length of code
[Figure: address space with images in database; a Hamming ball of chosen radius around the query address, for a given code length]
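A sketch of the lookup, assuming the database is a hash table from code to image ids (the radius, code length, and table layout here are illustrative choices, not the authors' implementation). Enumerating the Hamming ball visits a number of addresses that depends only on the radius and code length, never on the database size:

```python
from itertools import combinations

def hamming_ball(code: int, n_bits: int, radius: int):
    """Yield every address within `radius` bit flips of `code`."""
    for r in range(radius + 1):
        for bits in combinations(range(n_bits), r):
            addr = code
            for b in bits:
                addr ^= 1 << b  # flip bit b
            yield addr

def lookup(table: dict, query: int, n_bits: int, radius: int = 2):
    """Collect image ids stored at all addresses inside the Hamming ball."""
    hits = []
    for addr in hamming_ball(query, n_bits, radius):
        hits.extend(table.get(addr, []))
    return hits
```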
Slide 11: Code requirements
- Similar images → similar codes
- Very compact (<10^2 bits/image)
- Fast to compute
- Does NOT have to reconstruct the image
- Three approaches:
  - Locality Sensitive Hashing (LSH)
  - Boosting
  - Restricted Boltzmann Machines (RBMs)
Slide 12: Input image representation: Gist vectors
- Pixels are not a convenient representation
- Use the Gist descriptor instead (Oliva & Torralba, 2001)
- 512 dimensions/image (real-valued → 16,384 bits)
- L2 distance between Gist vectors is not a bad substitute for human perceptual distance
- NO COLOR INFORMATION
[Figure: Gist descriptor; Oliva & Torralba, IJCV 2001]
Slide 13: 1. Locality Sensitive Hashing
- Gionis, Indyk & Motwani (1999)
- Take random projections of the data
- Quantize each projection with a few bits
- No learning involved
[Figure: Gist descriptor projected and quantized into a binary code, e.g. 101]
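A minimal sketch of the idea, with one bit per projection for brevity (the slide quantizes each projection with a few bits, and Gaussian random directions are a common choice rather than necessarily the authors'):

```python
import numpy as np

def lsh_code(gist: np.ndarray, n_bits: int = 30, seed: int = 0) -> np.ndarray:
    """Random projections of the Gist vector, quantized to one bit each."""
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((n_bits, gist.shape[0]))
    return (directions @ gist > 0).astype(np.uint8)  # sign of each projection
```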
Slide 14: 2. Boosting
- Modified form of BoostSSC [Shakhnarovich, Viola & Darrell, 2003]
- Positive examples are pairs of similar images
- Negative examples are pairs of unrelated images
- Learn a threshold & dimension for each bit (weak classifier)
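Once boosting has picked a (dimension, threshold) pair per bit, encoding is trivial. A sketch of just the encoding step (the boosting loop itself is omitted; `dims` and `thresholds` stand for the learned weak classifiers and are assumed given):

```python
import numpy as np

def boosting_code(gist: np.ndarray, dims: np.ndarray,
                  thresholds: np.ndarray) -> np.ndarray:
    """One bit per weak classifier: is the chosen Gist dimension above its threshold?"""
    return (gist[dims] > thresholds).astype(np.uint8)
```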
Slide 15: 3. Restricted Boltzmann Machine (RBM)
- Type of Deep Belief Network [Hinton & Salakhutdinov, Science 2006]
- Units are binary stochastic
[Figure: single RBM layer with weights W between visible and hidden units]
- Attempts to reconstruct the input at the visible layer from the activation of the hidden layer
Slide 16: 3. Restricted Boltzmann Machine (RBM)
- Probability that hidden unit j activates: p(h_j = 1 | v) = σ(b_j + Σ_i w_ij v_i), where σ is the sigmoid
- Symmetric situation for the visible units: p(v_i = 1 | h) = σ(c_i + Σ_j w_ij h_j)
[Figure: single RBM layer]
- Learn weights (& biases) in an unsupervised manner (via a sampling-based approach)
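A sketch of the activation rule above (the weight shapes and sampling convention are standard RBM practice, not taken from the slides):

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v: np.ndarray, W: np.ndarray, b: np.ndarray, rng) -> np.ndarray:
    """Binary stochastic hidden units: on with probability sigmoid(b + v.W)."""
    p = sigmoid(b + v @ W)                        # p(h_j = 1 | v)
    return (rng.random(p.shape) < p).astype(np.float64)
```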
Slide 17: Multi-layer RBM: non-linear dimensionality reduction
[Figure: stack of RBM layers. Input Gist vector (512 dimensions, linear units at the first layer) → Layer 1 (w1): 512 units → Layer 2 (w2): 256 units → Layer 3 (w3): N units → output binary code (N dimensions)]
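A sketch of the resulting encoder's forward pass once the stack is trained (the first RBM has linear visible units so it can take the real-valued Gist vector; layer widths follow the slide, everything else is my assumption):

```python
import numpy as np

def encode(gist: np.ndarray, weights: list, biases: list) -> np.ndarray:
    """Forward pass through the 512 -> 512 -> 256 -> N stack."""
    h = gist                                        # real-valued input
    for W, b in zip(weights, biases):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))      # logistic hidden units
    return h  # N activations in (0, 1); binarized later (slide 26)
```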
18Training RBM models
1st Phase Pre-training Unsupervised Can use
unlabeled data (unlimited quantity) Learn
parameters greedily per layer Gets them to
right ballpark
2nd Phase Fine-tuning Supervised Requires
labeled data (limited quantity) Back propagate
gradients of chosen error function Moves
parameters to local minimum
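A compact sketch of the greedy phase using one step of contrastive divergence (CD-1), the usual sampling-based learning rule for RBMs. Biases, learning rate, and epoch count are placeholder assumptions, and the first layer's linear visible units are simplified to logistic units here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, lr, rng):
    """One contrastive-divergence update of the weight matrix W."""
    ph0 = sigmoid(v0 @ W)                       # p(h | data)
    h0 = (rng.random(ph0.shape) < ph0) * 1.0    # sample hidden states
    v1 = sigmoid(h0 @ W.T)                      # reconstruct visibles
    ph1 = sigmoid(v1 @ W)                       # p(h | reconstruction)
    return W + lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)

def pretrain(data, sizes=(512, 512, 256, 32), lr=0.05, epochs=10, seed=0):
    """Train each layer greedily, then feed its activations to the next."""
    rng, weights, h = np.random.default_rng(seed), [], data
    for n_in, n_out in zip(sizes, sizes[1:]):
        W = 0.01 * rng.standard_normal((n_in, n_out))
        for _ in range(epochs):
            W = cd1_step(h, W, lr, rng)
        weights.append(W)
        h = sigmoid(h @ W)   # this layer's activations feed the next layer
    return weights
```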
Slide 19: Greedy pre-training (Unsupervised)
[Figure: Layer 1 (w1): 512 → 512. Input: Gist vector (512 real dimensions)]
Slide 20: Greedy pre-training (Unsupervised)
[Figure: Layer 2 (w2): 512 → 256. Input: activations of hidden units from layer 1 (512 binary dimensions)]
Slide 21: Greedy pre-training (Unsupervised)
[Figure: Layer 3 (w3): 256 → N. Input: activations of hidden units from layer 2 (256 binary dimensions)]
Slide 22: Fine-tuning: back-propagation of the Neighborhood Components Analysis objective
[Figure: the same 512 → 512 → 256 → N stack (w1, w2, w3), from input Gist vector (512 real dimensions) to output binary code (N dimensions)]
Slide 23: Neighborhood Components Analysis
- Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004
- Tries to preserve the neighborhood structure of the input space
- Assumes this structure is given (will explain later)
[Figure: toy example with 2 classes and N=2 units at the top of the network; points in output space (each coordinate is the activation probability of a unit)]
Slide 24: Neighborhood Components Analysis
- Adjust network parameters (weights and biases) to move:
  - Points of the SAME class closer
  - Points of DIFFERENT classes away
Slide 25: Neighborhood Components Analysis
- Adjust network parameters (weights and biases) to move:
  - Points of the SAME class closer
  - Points of DIFFERENT classes away
- Points close in the input (Gist) space will be close in the output code space
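For reference, the objective that fine-tuning back-propagates, as defined by Goldberger et al. (NIPS 2004), with $f(x;\theta)$ the network's output code and $c_i$ the class of point $i$:

$$p_{ij} = \frac{\exp\left(-\lVert f(x_i;\theta) - f(x_j;\theta)\rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert f(x_i;\theta) - f(x_k;\theta)\rVert^2\right)}, \qquad \mathcal{O}(\theta) = \sum_i \sum_{j \,:\, c_j = c_i} p_{ij}$$

Maximizing $\mathcal{O}$ raises the probability that each point's stochastic nearest neighbor is of the same class, which is exactly the pull/push behavior described on the slide.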
Slide 26: Simple Binarization Strategy
- Deliberately add noise during training
- Set a threshold, e.g. use the median
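A sketch of the thresholding step, assuming the real-valued code activations for the whole database are available as a matrix. The median choice makes each bit fire on half the images, balancing the hash:

```python
import numpy as np

def binarize(activations: np.ndarray) -> np.ndarray:
    """activations: (n_images, n_bits) real codes -> binary codes."""
    thresholds = np.median(activations, axis=0)   # per-bit median threshold
    return (activations > thresholds).astype(np.uint8)
```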
27Overall Query Scheme
lt10µs
Binary code
Image 1
RBM
Retrieved images
lt1ms
Semantic Hash
Query Image
Gist descriptor
Compute Gist
1ms (in Matlab)
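Tying the stages together, a pseudocode-level sketch. It assumes the `encode`, median-threshold, and `lookup` sketches from earlier slides plus some `compute_gist` implementation; none of these names come from the authors' code, and the timings in comments are the slide's figures:

```python
import numpy as np

def query(image, compute_gist, encode_fn, thresholds, table, n_bits, radius=2):
    """Query image -> Gist -> RBM code -> semantic-hash lookup."""
    gist = compute_gist(image)                   # ~1 ms in the authors' Matlab
    act = encode_fn(gist)                        # RBM forward pass, <10 us
    bits = (act > thresholds).astype(np.uint8)   # binarize with stored thresholds
    code = int("".join(map(str, bits)), 2)       # pack bits into an address
    return lookup(table, code, n_bits, radius)   # hash-table probe, <1 ms
```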
Slide 28: Retrieval Experiments
Slide 29: Test set 1: LabelMe
- 22,000 images (20,000 train, 2,000 test)
- Ground-truth segmentations for all
- Can define a ground-truth distance between images using these segmentations
Slide 30: Defining ground truth
- Boosting and NCA back-propagation require a ground-truth distance between images
- Define this using the labeled images from LabelMe
Slide 31: Defining ground truth
- Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)
Slide 32: Defining ground truth
- Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)
- Varying spatial resolution to capture approximate spatial correspondence
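A rough sketch of the flavor of this comparison over LabelMe label maps. This is my simplification for illustration only: histogram intersection of object-label counts over grids of increasing resolution, with finer levels weighted more; the cited papers' exact weighting and matching schemes differ:

```python
import numpy as np

def pyramid_match(labels_a, labels_b, n_labels, levels=3):
    """labels_*: 2-D integer arrays of per-pixel object labels."""
    score = 0.0
    for lvl in range(levels):
        cells, weight = 2 ** lvl, 2.0 ** (lvl - levels + 1)
        for ra, rb in zip(np.array_split(labels_a, cells, axis=0),
                          np.array_split(labels_b, cells, axis=0)):
            for ca, cb in zip(np.array_split(ra, cells, axis=1),
                              np.array_split(rb, cells, axis=1)):
                ha = np.bincount(ca.ravel(), minlength=n_labels)
                hb = np.bincount(cb.ravel(), minlength=n_labels)
                score += weight * np.minimum(ha, hb).sum()  # intersection
    return score
```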
Slide 33: Examples of LabelMe retrieval
- 12 closest neighbors under different distance metrics
Slide 34: LabelMe Retrieval
[Plot: % of 50 true neighbors in the retrieval set vs. size of retrieval set (0 to 20,000)]
Slide 35: LabelMe Retrieval
[Plots: % of 50 true neighbors in the retrieval set vs. size of retrieval set (0 to 20,000); % of 50 true neighbors in the first 500 retrieved vs. number of bits]
Slide 36: Test set 2: Web images
- 12.9 million images, collected from the Internet
- No labels, so use the Euclidean distance between Gist vectors as the ground-truth distance
Slide 37: Web images retrieval
[Plot: % of 50 true neighbors in the retrieval set vs. size of retrieval set]
Slide 38: Web images retrieval
[Plots: % of 50 true neighbors in the retrieval set vs. size of retrieval set, two panels]
Slide 39: Examples of Web retrieval
- 12 neighbors using different distance metrics
Slide 40: Retrieval Timings
Slide 41: Scaling it up
- Google is very interested in this
- Jon Barron: summer internship at Google NYC
- NCA has O(N^2) cost
  - Use DrLIM instead (Hadsell, Chopra & LeCun 2006)
- Train on Google proprietary labels
Slide 42: Further Directions
- Spectral Hashing
- Brute Force Object Recognition
Slide 43: Spectral Hashing (NIPS 08)
- Assume points are embedded in a Euclidean space
- How to binarize so that Hamming distance approximates Euclidean distance?
- Under certain (reasonable) assumptions, an analytic form exists
- No learning, super-simple
- Come to the Machine Learning seminar on Dec 2nd
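A rough sketch of the analytic recipe under the paper's uniform-distribution assumption (candidate enumeration and edge cases are simplified; treat this as illustration, not the reference implementation): PCA-align the data, rank sinusoidal eigenfunctions along each principal direction by their 1-D Laplacian eigenvalue, and threshold the best n at zero.

```python
import numpy as np

def spectral_hash(X: np.ndarray, n_bits: int) -> np.ndarray:
    """Analytic spectral-hashing codes for data X: (n_points, n_dims)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Xc @ Vt[:n_bits].T                      # PCA projections
    lo, hi = P.min(axis=0), P.max(axis=0)
    # Rank (direction, mode) pairs by eigenvalue, which grows with k / range
    cands = sorted((k / (hi[d] - lo[d]), d, k)
                   for d in range(P.shape[1]) for k in range(1, n_bits + 1))
    bits = [np.sin(np.pi / 2 + np.pi * k * (P[:, d] - lo[d]) / (hi[d] - lo[d])) > 0
            for _, d, k in cands[:n_bits]]      # threshold each sinusoid at zero
    return np.stack(bits, axis=1).astype(np.uint8)
```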
Slide 44: 2-D Toy Example
[Figure: codes with 7 bits and 15 bits; color shows Hamming distance from the query point: red = 0 bits, green = 1 bit, blue = 2 bits, black = >2 bits]
Slide 45: 2-D Toy Example Comparison
Slide 46: Further Directions
- Spectral Hashing
- Brute Force Object Recognition
Slide 47: 80 Million Tiny Images (PAMI 08)
[Figure: dataset sizes 10^5, 10^6, 10^8 images]
Slide 48: LabelMe Recognition examples
Slide 49: Summary
- Explored various approaches to learning binary codes for hashing-based retrieval
- Very quick, with performance comparable to complex descriptors
- Remaining issues:
  - How to learn the metric (so that it scales)
  - How to produce binary codes
  - How to use them for recognition