Title: Efficient Image Search and Retrieval using Compact Binary Codes
Slide 1: Efficient Image Search and Retrieval using Compact Binary Codes
- Rob Fergus (NYU)
- Jon Barron (NYU/UC Berkeley)
- Antonio Torralba (MIT)
- Yair Weiss (Hebrew U.)
CVPR 2008
Slide 2: Large scale image search
Internet contains many billions of images
How can we search them, based on visual content?
- The Challenge:
  - Need a way of measuring similarity between images
  - Needs to scale to the Internet
Slide 3: Current Image Search Engines
Essentially text-based
Slide 4: Existing approaches to Content-Based Image Retrieval
- Focus on scaling rather than understanding the image
- Variety of simple/hand-designed cues
  - Color and/or texture histograms, shape, PCA, etc.
- Various distance metrics
  - e.g. Earth Mover's Distance (Rubner et al. 98)
- Most recognition approaches are slow (~1 sec/image)
Slide 5: Our Approach
- Learn the metric from training data
- Use compact binary codes for speed
DO BOTH TOGETHER
Slide 6: Large scale image/video search
- Representation must fit in memory (disk too slow)
- Facebook has ~10 billion images (10^10)
- A PC has ~10 Gbytes of memory (~10^11 bits)
- → Budget of 10^1 bits/image
- YouTube has ~a trillion video frames (10^12)
- A big cluster of PCs has ~10 Tbytes (~10^14 bits)
- → Budget of 10^2 bits/frame
Slide 7: Some file sizes
- Typical YouTube clip (compressed) is ~10^8 bits
- 1 Megapixel JPEG image is ~10^7 bits
- 32x32 color image is ~10^4 bits
  - Smallest useful image size
10^1 - 10^2 bits is not much → need a really compact image representation
Slide 8: Binary codes for images
- Want images with similar content to have similar binary codes
- Use Hamming distance between codes
  - Number of bit flips
  - E.g. Ham_Dist(10001010, 10001110) = 1
        Ham_Dist(10001010, 11101110) = 3
- Semantic Hashing [Salakhutdinov & Hinton, 2007]
  - Text documents
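The Hamming distance is cheap to compute on packed codes. A minimal sketch reproducing the slide's two examples (representing codes as plain Python integers is my choice, not the authors'):

```python
def ham_dist(a: int, b: int) -> int:
    """Number of bit flips between two binary codes (packed as ints)."""
    return bin(a ^ b).count("1")  # XOR marks the differing bits

assert ham_dist(0b10001010, 0b10001110) == 1
assert ham_dist(0b10001010, 0b11101110) == 3
```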
Slide 9: Semantic Hashing
[Salakhutdinov & Hinton, 2007] for text documents
[Figure: a query image is passed through the semantic hash function to produce a binary code, used as a query address into the address space holding the images in the database; semantically similar images lie at nearby addresses]
Quite different to a (conventional) randomizing hash
Slide 10: Semantic Hashing
- Each image code is a memory address
- Find neighbors by exploring the Hamming ball around the query address
- Lookup time is independent of the number of data points
- Depends on radius of ball & length of code
[Figure: address space with images in database; a Hamming ball of chosen radius around the query address, for a given code length]
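A sketch of the lookup, assuming the database is a hash table from code to image ids (the radius, code length, and table layout here are illustrative choices, not the authors' implementation). Enumerating the Hamming ball visits a number of addresses that depends only on the radius and code length, never on the database size:

```python
from itertools import combinations

def hamming_ball(code: int, n_bits: int, radius: int):
    """Yield every address within `radius` bit flips of `code`."""
    for r in range(radius + 1):
        for bits in combinations(range(n_bits), r):
            addr = code
            for b in bits:
                addr ^= 1 << b  # flip bit b
            yield addr

def lookup(table: dict, query: int, n_bits: int, radius: int = 2):
    """Collect image ids stored at all addresses inside the Hamming ball."""
    hits = []
    for addr in hamming_ball(query, n_bits, radius):
        hits.extend(table.get(addr, []))
    return hits
```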
Slide 11: Code requirements
- Similar images → similar codes
- Very compact (<10^2 bits/image)
- Fast to compute
- Does NOT have to reconstruct the image
- Three approaches:
  - Locality Sensitive Hashing (LSH)
  - Boosting
  - Restricted Boltzmann Machines (RBMs)
Slide 12: Input image representation: Gist vectors
- Pixels are not a convenient representation
- Use the Gist descriptor instead (Oliva & Torralba, 2001)
- 512 dimensions/image (real-valued → 16,384 bits)
- L2 distance between Gist vectors is not a bad substitute for human perceptual distance
- NO COLOR INFORMATION
[Figure: Gist descriptor; Oliva & Torralba, IJCV 2001]
Slide 13: 1. Locality Sensitive Hashing
- Gionis, Indyk & Motwani (1999)
- Take random projections of the data
- Quantize each projection with a few bits
- No learning involved
[Figure: Gist descriptor projected and quantized into a binary code, e.g. 101]
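A minimal sketch of the idea, with one bit per projection for brevity (the slide quantizes each projection with a few bits, and Gaussian random directions are a common choice rather than necessarily the authors'):

```python
import numpy as np

def lsh_code(gist: np.ndarray, n_bits: int = 30, seed: int = 0) -> np.ndarray:
    """Random projections of the Gist vector, quantized to one bit each."""
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((n_bits, gist.shape[0]))
    return (directions @ gist > 0).astype(np.uint8)  # sign of each projection
```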
Slide 14: 2. Boosting
- Modified form of BoostSSC [Shakhnarovich, Viola & Darrell, 2003]
- Positive examples are pairs of similar images
- Negative examples are pairs of unrelated images
- Learn a threshold & dimension for each bit (weak classifier)
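Once boosting has picked a (dimension, threshold) pair per bit, encoding is trivial. A sketch of just the encoding step (the boosting loop itself is omitted; `dims` and `thresholds` stand for the learned weak classifiers and are assumed given):

```python
import numpy as np

def boosting_code(gist: np.ndarray, dims: np.ndarray,
                  thresholds: np.ndarray) -> np.ndarray:
    """One bit per weak classifier: is the chosen Gist dimension above its threshold?"""
    return (gist[dims] > thresholds).astype(np.uint8)
```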
Slide 15: 3. Restricted Boltzmann Machine (RBM)
- Type of Deep Belief Network [Hinton & Salakhutdinov, Science 2006]
- Units are binary stochastic
[Figure: single RBM layer with weights W between visible and hidden units]
- Attempts to reconstruct the input at the visible layer from the activation of the hidden layer
Slide 16: 3. Restricted Boltzmann Machine (RBM)
- Probability that hidden unit j activates: p(h_j = 1 | v) = σ(b_j + Σ_i w_ij v_i), where σ is the sigmoid
- Symmetric situation for the visible units: p(v_i = 1 | h) = σ(c_i + Σ_j w_ij h_j)
[Figure: single RBM layer]
- Learn weights (& biases) in an unsupervised manner (via a sampling-based approach)
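A sketch of the activation rule above (the weight shapes and sampling convention are standard RBM practice, not taken from the slides):

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v: np.ndarray, W: np.ndarray, b: np.ndarray, rng) -> np.ndarray:
    """Binary stochastic hidden units: on with probability sigmoid(b + v.W)."""
    p = sigmoid(b + v @ W)                        # p(h_j = 1 | v)
    return (rng.random(p.shape) < p).astype(np.float64)
```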
Slide 17: Multi-layer RBM: non-linear dimensionality reduction
[Figure: stack of RBM layers. Input Gist vector (512 dimensions, linear units at the first layer) → Layer 1 (w1): 512 units → Layer 2 (w2): 256 units → Layer 3 (w3): N units → output binary code (N dimensions)]
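A sketch of the resulting encoder's forward pass once the stack is trained (the first RBM has linear visible units so it can take the real-valued Gist vector; layer widths follow the slide, everything else is my assumption):

```python
import numpy as np

def encode(gist: np.ndarray, weights: list, biases: list) -> np.ndarray:
    """Forward pass through the 512 -> 512 -> 256 -> N stack."""
    h = gist                                        # real-valued input
    for W, b in zip(weights, biases):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))      # logistic hidden units
    return h  # N activations in (0, 1); binarized later (slide 26)
```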
18Training RBM models
1st Phase Pre-training Unsupervised Can use
unlabeled data (unlimited quantity) Learn
parameters greedily per layer Gets them to
right ballpark
2nd Phase Fine-tuning Supervised Requires
labeled data (limited quantity) Back propagate
gradients of chosen error function Moves
parameters to local minimum
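A compact sketch of the greedy phase using one step of contrastive divergence (CD-1), the usual sampling-based learning rule for RBMs. Biases, learning rate, and epoch count are placeholder assumptions, and the first layer's linear visible units are simplified to logistic units here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, lr, rng):
    """One contrastive-divergence update of the weight matrix W."""
    ph0 = sigmoid(v0 @ W)                       # p(h | data)
    h0 = (rng.random(ph0.shape) < ph0) * 1.0    # sample hidden states
    v1 = sigmoid(h0 @ W.T)                      # reconstruct visibles
    ph1 = sigmoid(v1 @ W)                       # p(h | reconstruction)
    return W + lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)

def pretrain(data, sizes=(512, 512, 256, 32), lr=0.05, epochs=10, seed=0):
    """Train each layer greedily, then feed its activations to the next."""
    rng, weights, h = np.random.default_rng(seed), [], data
    for n_in, n_out in zip(sizes, sizes[1:]):
        W = 0.01 * rng.standard_normal((n_in, n_out))
        for _ in range(epochs):
            W = cd1_step(h, W, lr, rng)
        weights.append(W)
        h = sigmoid(h @ W)   # this layer's activations feed the next layer
    return weights
```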
Slide 19: Greedy pre-training (Unsupervised)
[Figure: Layer 1 (w1): 512 → 512. Input: Gist vector (512 real dimensions)]
Slide 20: Greedy pre-training (Unsupervised)
[Figure: Layer 2 (w2): 512 → 256. Input: activations of hidden units from layer 1 (512 binary dimensions)]
Slide 21: Greedy pre-training (Unsupervised)
[Figure: Layer 3 (w3): 256 → N. Input: activations of hidden units from layer 2 (256 binary dimensions)]
Slide 22: Fine-tuning: back-propagation of the Neighborhood Components Analysis objective
[Figure: the same 512 → 512 → 256 → N stack (w1, w2, w3), from input Gist vector (512 real dimensions) to output binary code (N dimensions)]
Slide 23: Neighborhood Components Analysis
- Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004
- Tries to preserve the neighborhood structure of the input space
- Assumes this structure is given (will explain later)
[Figure: toy example with 2 classes and N=2 units at the top of the network; points in output space (each coordinate is the activation probability of a unit)]
Slide 24: Neighborhood Components Analysis
- Adjust network parameters (weights and biases) to move:
  - Points of the SAME class closer
  - Points of DIFFERENT classes away
Slide 25: Neighborhood Components Analysis
- Adjust network parameters (weights and biases) to move:
  - Points of the SAME class closer
  - Points of DIFFERENT classes away
- Points close in the input (Gist) space will be close in the output code space
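For reference, the objective that fine-tuning back-propagates, as defined by Goldberger et al. (NIPS 2004), with $f(x;\theta)$ the network's output code and $c_i$ the class of point $i$:

$$p_{ij} = \frac{\exp\left(-\lVert f(x_i;\theta) - f(x_j;\theta)\rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert f(x_i;\theta) - f(x_k;\theta)\rVert^2\right)}, \qquad \mathcal{O}(\theta) = \sum_i \sum_{j \,:\, c_j = c_i} p_{ij}$$

Maximizing $\mathcal{O}$ raises the probability that each point's stochastic nearest neighbor is of the same class, which is exactly the pull/push behavior described on the slide.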
Slide 26: Simple Binarization Strategy
- Deliberately add noise during training
- Set a threshold, e.g. use the median
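A sketch of the thresholding step, assuming the real-valued code activations for the whole database are available as a matrix. The median choice makes each bit fire on half the images, balancing the hash:

```python
import numpy as np

def binarize(activations: np.ndarray) -> np.ndarray:
    """activations: (n_images, n_bits) real codes -> binary codes."""
    thresholds = np.median(activations, axis=0)   # per-bit median threshold
    return (activations > thresholds).astype(np.uint8)
```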
27Overall Query Scheme
lt10µs
Binary code
Image 1
RBM
Retrieved images
lt1ms
Semantic Hash
Query Image
Gist descriptor
Compute Gist
1ms (in Matlab)
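Tying the stages together, a pseudocode-level sketch. It assumes the `encode`, median-threshold, and `lookup` sketches from earlier slides plus some `compute_gist` implementation; none of these names come from the authors' code, and the timings in comments are the slide's figures:

```python
import numpy as np

def query(image, compute_gist, encode_fn, thresholds, table, n_bits, radius=2):
    """Query image -> Gist -> RBM code -> semantic-hash lookup."""
    gist = compute_gist(image)                   # ~1 ms in the authors' Matlab
    act = encode_fn(gist)                        # RBM forward pass, <10 us
    bits = (act > thresholds).astype(np.uint8)   # binarize with stored thresholds
    code = int("".join(map(str, bits)), 2)       # pack bits into an address
    return lookup(table, code, n_bits, radius)   # hash-table probe, <1 ms
```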
Slide 28: Retrieval Experiments
Slide 29: Test set 1: LabelMe
- 22,000 images (20,000 train, 2,000 test)
- Ground-truth segmentations for all
- Can define a ground-truth distance between images using these segmentations
Slide 30: Defining ground truth
- Boosting and NCA back-propagation require a ground-truth distance between images
- Define this using the labeled images from LabelMe
Slide 31: Defining ground truth
- Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)
Slide 32: Defining ground truth
- Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)
- Varying spatial resolution to capture approximate spatial correspondence
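A rough sketch of the flavor of this comparison over LabelMe label maps. This is my simplification for illustration only: histogram intersection of object-label counts over grids of increasing resolution, with finer levels weighted more; the cited papers' exact weighting and matching schemes differ:

```python
import numpy as np

def pyramid_match(labels_a, labels_b, n_labels, levels=3):
    """labels_*: 2-D integer arrays of per-pixel object labels."""
    score = 0.0
    for lvl in range(levels):
        cells, weight = 2 ** lvl, 2.0 ** (lvl - levels + 1)
        for ra, rb in zip(np.array_split(labels_a, cells, axis=0),
                          np.array_split(labels_b, cells, axis=0)):
            for ca, cb in zip(np.array_split(ra, cells, axis=1),
                              np.array_split(rb, cells, axis=1)):
                ha = np.bincount(ca.ravel(), minlength=n_labels)
                hb = np.bincount(cb.ravel(), minlength=n_labels)
                score += weight * np.minimum(ha, hb).sum()  # intersection
    return score
```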
Slide 33: Examples of LabelMe retrieval
- 12 closest neighbors under different distance metrics
Slide 34: LabelMe Retrieval
[Plot: % of 50 true neighbors in the retrieval set vs. size of retrieval set (0 to 20,000)]
Slide 35: LabelMe Retrieval
[Plots: % of 50 true neighbors in the retrieval set vs. size of retrieval set (0 to 20,000); % of 50 true neighbors in the first 500 retrieved vs. number of bits]
Slide 36: Test set 2: Web images
- 12.9 million images, collected from the Internet
- No labels, so use the Euclidean distance between Gist vectors as the ground-truth distance
Slide 37: Web images retrieval
[Plot: % of 50 true neighbors in the retrieval set vs. size of retrieval set]
Slide 38: Web images retrieval
[Plots: % of 50 true neighbors in the retrieval set vs. size of retrieval set, two panels]
Slide 39: Examples of Web retrieval
- 12 neighbors using different distance metrics
Slide 40: Retrieval Timings
Slide 41: Scaling it up
- Google is very interested in this
- Jon Barron: summer internship at Google NYC
- NCA has O(N^2) cost
  - Use DrLIM instead (Hadsell, Chopra & LeCun 2006)
- Train on Google proprietary labels
Slide 42: Further Directions
- Spectral Hashing
- Brute Force Object Recognition
Slide 43: Spectral Hashing (NIPS 08)
- Assume points are embedded in a Euclidean space
- How to binarize so that Hamming distance approximates Euclidean distance?
- Under certain (reasonable) assumptions, an analytic form exists
- No learning, super-simple
- Come to the Machine Learning seminar on Dec 2nd
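A rough sketch of the analytic recipe under the paper's uniform-distribution assumption (candidate enumeration and edge cases are simplified; treat this as illustration, not the reference implementation): PCA-align the data, rank sinusoidal eigenfunctions along each principal direction by their 1-D Laplacian eigenvalue, and threshold the best n at zero.

```python
import numpy as np

def spectral_hash(X: np.ndarray, n_bits: int) -> np.ndarray:
    """Analytic spectral-hashing codes for data X: (n_points, n_dims)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Xc @ Vt[:n_bits].T                      # PCA projections
    lo, hi = P.min(axis=0), P.max(axis=0)
    # Rank (direction, mode) pairs by eigenvalue, which grows with k / range
    cands = sorted((k / (hi[d] - lo[d]), d, k)
                   for d in range(P.shape[1]) for k in range(1, n_bits + 1))
    bits = [np.sin(np.pi / 2 + np.pi * k * (P[:, d] - lo[d]) / (hi[d] - lo[d])) > 0
            for _, d, k in cands[:n_bits]]      # threshold each sinusoid at zero
    return np.stack(bits, axis=1).astype(np.uint8)
```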
Slide 44: 2-D Toy Example
[Figure: codes with 7 bits and 15 bits; color shows Hamming distance from the query point: red = 0 bits, green = 1 bit, blue = 2 bits, black = >2 bits]
Slide 45: 2-D Toy Example Comparison
Slide 46: Further Directions
- Spectral Hashing
- Brute Force Object Recognition
Slide 47: 80 Million Tiny Images (PAMI 08)
[Figure: dataset sizes 10^5, 10^6, 10^8 images]
Slide 48: LabelMe Recognition examples
Slide 49: Summary
- Explored various approaches to learning binary codes for hashing-based retrieval
- Very quick, with performance comparable to complex descriptors
- Remaining issues:
  - How to learn the metric (so that it scales)
  - How to produce binary codes
  - How to use them for recognition