Efficient Image Search and Retrieval using Compact Binary Codes - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Efficient Image Search and Retrieval using
Compact Binary Codes
  • Rob Fergus (NYU)
  • Jon Barron (NYU/UC Berkeley)
  • Antonio Torralba (MIT)
  • Yair Weiss (Hebrew U.)

CVPR 2008
2
Large scale image search
Internet contains many billions of images
How can we search them, based on visual content?
  • The challenge:
  • Need a way of measuring similarity between images
  • Needs to scale to the Internet

3
Current Image Search Engines
Essentially text-based
4
Existing approaches to Content-Based Image
Retrieval
  • Focus on scaling rather than understanding the image
  • Variety of simple/hand-designed cues
  • Color and/or texture histograms, shape, PCA, etc.
  • Various distance metrics
  • Earth Mover's Distance (Rubner et al. '98)
  • Most recognition approaches are slow (> 1 sec/image)

5
Our Approach
  • Learn the metric from training data
  • Use compact binary codes for speed

DO BOTH TOGETHER
6
Large scale image/video search
  • Representation must fit in memory (disk too slow)
  • Facebook has 10 billion images (10^10)
  • A PC has 10 Gbytes of memory (10^11 bits)
  • → Budget of 10^1 bits/image
  • YouTube has a trillion video frames (10^12)
  • A big cluster of PCs has 10 Tbytes (10^14 bits)
  • → Budget of 10^2 bits/frame

7
Some file sizes
  • A typical YouTube clip (compressed) is 10^8 bits
  • A 1-megapixel JPEG image is 10^7 bits
  • A 32x32 color image is 10^4 bits
  • The smallest useful image size

10^1 - 10^2 bits is not much → need a really compact image representation
8
Binary codes for images
  • Want images with similar content to have similar binary codes
  • Use Hamming distance between codes
  • Number of bit flips
  • E.g. Ham_Dist(10001010, 10001110) = 1 and Ham_Dist(10001010, 11101110) = 3
  • Semantic Hashing [Salakhutdinov & Hinton, 2007]
  • Originally applied to text documents
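A minimal check of these examples in Python (representing codes as integers is an illustrative choice):

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit flips between two binary codes: popcount of XOR."""
    return bin(a ^ b).count("1")

# The slide's examples:
assert hamming_distance(0b10001010, 0b10001110) == 1
assert hamming_distance(0b10001010, 0b11101110) == 3
```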
9
Semantic Hashing
Salakhutdinov & Hinton, 2007, for text documents
[Diagram: the query image passes through a semantic hash function to produce a binary code, used as a query address into the address space; images in the database stored at nearby addresses are semantically similar. Quite different to a (conventional) randomizing hash.]
10
Semantic Hashing
  • Each image code is a memory address
  • Find neighbors by exploring Hamming ball around
    query address

  • Lookup time is independent of the number of data points
  • Depends on the radius of the ball & the length of the code

[Diagram: address space with images in the database; a Hamming ball of chosen radius is explored around the query address; code length and radius are the tunable parameters.]
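A minimal sketch of this lookup, assuming a Python dict as the address table (the image ids are made up for illustration):

```python
from itertools import combinations

def hamming_ball_lookup(table, query_code, n_bits, radius):
    """Collect items stored at every address within `radius` bit flips of
    the query address. Cost grows with code length and radius only,
    independent of the number of images stored."""
    results = list(table.get(query_code, []))
    for r in range(1, radius + 1):
        for bits in combinations(range(n_bits), r):
            address = query_code
            for b in bits:
                address ^= 1 << b          # flip bit b
            results.extend(table.get(address, []))
    return results

table = {0b10001010: ["img_1"], 0b10001110: ["img_2"]}
print(hamming_ball_lookup(table, 0b10001010, n_bits=8, radius=1))
# ['img_1', 'img_2']
```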
11
Code requirements
  • Similar images → similar codes
  • Very compact (< 10^2 bits/image)
  • Fast to compute
  • Does NOT have to reconstruct the image
  • Three approaches
  • Locality Sensitive Hashing (LSH)
  • Boosting
  • Restricted Boltzmann Machines (RBMs)

12
Input image representation: Gist vectors
  • Pixels are not a convenient representation
  • Use the Gist descriptor instead (Oliva & Torralba, 2001)
  • 512 dimensions/image (real-valued → 16,384 bits)
  • L2 distance between Gist vectors is not a bad substitute for human perceptual distance

NO COLOR INFORMATION
Oliva & Torralba, IJCV 2001
13
1. Locality Sensitive Hashing
  • Gionis, Indyk & Motwani (1999)
  • Take random projections of data
  • Quantize each projection with few bits

[Diagram: the Gist descriptor is projected onto random directions and each projection is quantized to a bit, giving a code such as 101.]
No learning involved
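A minimal sketch of such an encoder, assuming one bit per projection, quantized by sign:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_codes(gists, n_bits):
    """Project the data onto random directions and keep one bit per
    projection (its sign). No learning involved."""
    projections = rng.normal(size=(gists.shape[1], n_bits))
    return (gists @ projections > 0).astype(np.uint8)

gists = rng.normal(size=(1000, 512))   # stand-in for 512-D Gist vectors
codes = lsh_codes(gists, n_bits=30)
print(codes.shape)                     # (1000, 30)
```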
14
2. Boosting
  • Modified form of BoostSSC (Shakhnarovich, Viola & Darrell, 2003)
  • Positive examples are pairs of similar images
  • Negative examples are pairs of unrelated images

Learn a threshold & dimension for each bit (weak classifier)
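A heavily simplified sketch of learning one such bit; real BoostSSC searches candidate thresholds and reweights the pairs after every round, while this unweighted version just scores a median threshold per dimension:

```python
import numpy as np

def learn_bit(pos_pairs, neg_pairs, X):
    """Pick the (dimension, threshold) whose one-bit code most often
    agrees on similar pairs and disagrees on dissimilar pairs."""
    best = None
    for dim in range(X.shape[1]):
        thresh = np.median(X[:, dim])
        bits = X[:, dim] > thresh
        score = (sum(bits[i] == bits[j] for i, j in pos_pairs) +
                 sum(bits[i] != bits[j] for i, j in neg_pairs))
        if best is None or score > best[0]:
            best = (score, dim, thresh)
    return best[1], best[2]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))                     # toy descriptors
pos = [(0, 1), (2, 3)]; neg = [(0, 50), (1, 60)]   # made-up pair labels
print(learn_bit(pos, neg, X))
```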
15
3. Restricted Boltzmann Machine (RBM)
  • Type of Deep Belief Network
  • Hinton & Salakhutdinov, Science 2006

Units are binary & stochastic

[Diagram: a single RBM layer, with weights W between the visible and hidden units.]
  • Attempts to reconstruct input at visible layer
    from activation of hidden layer

16
3. Restricted Boltzmann Machine (RBM)
  • p(hidden unit j on) = σ(b_j + Σ_i w_ij v_i), with sigmoid σ(x) = 1 / (1 + e^(−x))
  • Symmetric situation for the visible units

[Diagram: single RBM layer.]
  • Learn weights (& biases) in an unsupervised manner (via a sampling-based approach)
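A sketch of the activation rule in NumPy (shapes match the first RBM layer of the network on slide 17):

```python
import numpy as np

def hidden_probs(v, W, b):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i w_ij * v_i)."""
    return 1.0 / (1.0 + np.exp(-(b + v @ W)))

def sample_hidden(v, W, b, rng):
    """Units are binary & stochastic: turn each hidden unit on with
    its activation probability."""
    p = hidden_probs(v, W, b)
    return (rng.random(p.shape) < p).astype(float)

rng = np.random.default_rng(0)
v = rng.random(512)                          # visible layer (Gist input)
W = rng.normal(scale=0.01, size=(512, 256))  # weights
b = np.zeros(256)                            # hidden biases
h = sample_hidden(v, W, b, rng)
```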

17
Multi-Layer RBM: non-linear dimensionality reduction
[Architecture: Input Gist vector (512 dimensions, linear units at the first layer) → Layer 1 (w1): 512 hidden units → Layer 2 (w2): 256 hidden units → Layer 3 (w3): N units → output binary code (N dimensions).]
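At test time the stack is used as a deterministic encoder. A minimal sketch (N = 30 bits and the 0.5 threshold are illustrative choices; the talk's actual binarization, with a median threshold, appears on slide 26):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(gist, weights, biases):
    """Forward pass through the stacked layers 512 -> 512 -> 256 -> N,
    then threshold the top-layer activations to get the binary code."""
    h = gist
    for W, b in zip(weights, biases):
        h = sigmoid(h @ W + b)
    return (h > 0.5).astype(np.uint8)

rng = np.random.default_rng(0)
sizes = [512, 512, 256, 30]                  # N = 30, for illustration
weights = [rng.normal(scale=0.01, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
code = encode(rng.random(512), weights, biases)
```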
18
Training RBM models
1st phase: Pre-training
  • Unsupervised
  • Can use unlabeled data (unlimited quantity)
  • Learn parameters greedily, one layer at a time
  • Gets them to the right ballpark

2nd phase: Fine-tuning
  • Supervised
  • Requires labeled data (limited quantity)
  • Back-propagate gradients of a chosen error function
  • Moves parameters to a local minimum
19
Greedy pre-training (Unsupervised)
[Diagram: Layer 1 (w1, 512 hidden units) is trained on the input Gist vector (512 real dimensions).]
20
Greedy pre-training (Unsupervised)
[Diagram: Layer 2 (w2, 256 hidden units) is trained on the activations of the layer-1 hidden units (512 binary dimensions).]
21
Greedy pre-training (Unsupervised)
[Diagram: Layer 3 (w3, N units) is trained on the activations of the layer-2 hidden units (256 binary dimensions).]
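Each greedy stage can be pre-trained with contrastive divergence; a minimal CD-1 sketch for one layer (CD-1 is an assumption standing in for the "sampling-based approach" mentioned on slide 16):

```python
import numpy as np

def cd1_update(v0, W, b_h, b_v, lr, rng):
    """One CD-1 step: drive the hidden units from the data, sample them,
    reconstruct the visibles, and nudge the parameters toward the data
    statistics and away from the reconstruction statistics."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    ph0 = sig(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # stochastic hiddens
    v1 = sig(h0 @ W.T + b_v)                          # reconstruction
    ph1 = sig(v1 @ W + b_h)
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b_h += lr * (ph0 - ph1)
    b_v += lr * (v0 - v1)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(512, 256))
b_h, b_v = np.zeros(256), np.zeros(512)
cd1_update(rng.random(512), W, b_h, b_v, lr=0.05, rng=rng)
```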
22
Fine-tuning: back-propagation of the Neighborhood Components Analysis objective
[Diagram: the same 512 → 512 → 256 → N network, now trained end-to-end from the input Gist vector (512 real dimensions) to the output binary code (N dimensions).]
23
Neighborhood Components Analysis
  • Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004
  • Tries to preserve neighborhood structure of input
    space
  • Assumes this structure is given (will explain
    later)

Toy example with 2 classes and N=2 units at the top of the network
Points in output space (each coordinate is the activation probability of one unit)
24
Neighborhood Components Analysis
  • Adjust network parameters (weights and biases) to move:
  • Points of the SAME class closer together
  • Points of DIFFERENT classes further apart

25
Neighborhood Components Analysis
  • Adjust network parameters (weights and biases) to move:
  • Points of the SAME class closer together
  • Points of DIFFERENT classes further apart

Points close in input space (Gist) will be close
in output code space
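A sketch of the NCA objective on toy network outputs Y (in the talk, Y would be the top-layer activations of the RBM stack). The n x n pairwise-distance matrix is also where the O(N^2) cost mentioned on slide 41 comes from:

```python
import numpy as np

def nca_objective(Y, labels):
    """Each point picks a neighbor j with probability proportional to
    exp(-||y_i - y_j||^2); maximize the probability mass that lands on
    points of the SAME class."""
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)          # a point never picks itself
    p = np.exp(-d2)
    p /= p.sum(axis=1, keepdims=True)
    same = labels[:, None] == labels[None, :]
    return (p * same).sum()               # maximized w.r.t. network parameters

rng = np.random.default_rng(0)
Y = rng.random((10, 2))                   # N = 2 output units, as in the toy example
labels = np.array([0] * 5 + [1] * 5)
print(nca_objective(Y, labels))
```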
26
Simple Binarization Strategy
  1. Deliberately add noise
  2. Set threshold, e.g. use the median
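A minimal sketch of this strategy (the noise level is an illustrative choice, not a value from the talk):

```python
import numpy as np

def binarize(activations, rng, noise_std=0.1):
    """Add noise (pushing unit activations toward 0/1 during training),
    then threshold each code unit at its median over the dataset."""
    noisy = activations + rng.normal(scale=noise_std, size=activations.shape)
    thresholds = np.median(noisy, axis=0)   # one threshold per bit
    return (noisy > thresholds).astype(np.uint8)

rng = np.random.default_rng(0)
acts = rng.random((1000, 30))               # top-layer activations for 1,000 images
codes = binarize(acts, rng)
print(codes.mean(axis=0)[:5])               # ~0.5 per bit: balanced codes
```

Thresholding at the median turns each bit on for half of the images, which spreads the database evenly over the address space.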

27
Overall Query Scheme
[Pipeline: Query image → compute Gist descriptor (≈1 ms in Matlab) → RBM → binary code (< 10 µs) → semantic hash lookup (< 1 ms) → retrieved images from the database.]
28
Retrieval Experiments
29
Test set 1: LabelMe
  • 22,000 images (20,000 train, 2,000 test)
  • Ground-truth segmentations for all
  • Can define a ground-truth distance between images using these segmentations

30
Defining ground truth
  • Boosting and NCA back-propagation require ground
    truth distance between images
  • Define this using labeled images from LabelMe

31
Defining ground truth
  • Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)

32
Defining ground truth
  • Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)

Varying spatial resolution to capture approximate spatial correspondence
33
Examples of LabelMe retrieval
  • 12 closest neighbors under different distance
    metrics

34
LabelMe Retrieval
[Plot: % of 50 true neighbors in the retrieval set vs. size of retrieval set (0 to 20,000).]
35
LabelMe Retrieval
[Plots: % of 50 true neighbors in the retrieval set vs. size of retrieval set (0 to 20,000), and % of 50 true neighbors in the first 500 retrieved vs. number of bits.]
36
Test set 2: Web images
  • 12.9 million images
  • Collected from the Internet
  • No labels, so use Euclidean distance between Gist vectors as the ground-truth distance

37
Web images retrieval
[Plot: % of 50 true neighbors in the retrieval set vs. size of retrieval set.]
38
Web images retrieval
[Plots: % of 50 true neighbors in the retrieval set vs. size of retrieval set, two panels.]
39
Examples of Web retrieval
  • 12 neighbors using different distance metrics

40
Retrieval Timings
41
Scaling it up
  • Google is very interested in this
  • Jon Barron: summer internship at Google NYC
  • NCA has O(N^2) cost
  • Use DrLIM instead (Hadsell, Chopra & LeCun 2006)
  • Train on Google proprietary labels

42
Further Directions
  • Spectral Hashing
  • Brute Force Object Recognition

43
Spectral Hashing (NIPS 08)
  • Assume points are embedded in Euclidean space
  • How to binarize so Hamming distance approximates
    Euclidean distance?
  • Under certain (reasonable) assumptions, analytic
    form exists
  • No learning, super-simple
  • Come to the Machine Learning seminar on Dec 2nd
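A heavily simplified sketch of that analytic form, assuming (as the paper does) roughly uniformly distributed data along the principal directions, and keeping only the lowest-frequency (k = 1) eigenfunction per direction; the full method ranks eigenfunctions across directions and frequencies by eigenvalue:

```python
import numpy as np

def spectral_hash(X, n_bits):
    """PCA-rotate the data, then threshold the analytic 1-D Laplacian
    eigenfunctions phi_k(x) = sin(pi/2 + k*pi*x/(b - a)) at zero."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[:n_bits].T                     # top principal directions
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    phi = np.sin(np.pi / 2 + np.pi * (proj - lo) / (hi - lo))   # k = 1
    return (phi > 0).astype(np.uint8)

rng = np.random.default_rng(0)
codes = spectral_hash(rng.random((1000, 512)), n_bits=32)
print(codes.shape)   # (1000, 32)
```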

44
2-D Toy example
[Figure: codes of 3, 7 and 15 bits on 2-D toy data. Color shows Hamming distance from the query point: red = 0 bits, green = 1 bit, blue = 2 bits, black = > 2 bits.]
45
2-D Toy Example Comparison
46
Further Directions
  • Spectral Hashing
  • Brute Force Object Recognition

47
80 Million Tiny Images (PAMI 08)
[Figure: retrieval results as the database grows from 10^5 to 10^6 to 10^8 images.]
48
LabelMe Recognition examples
49
Summary
  • Explored various approaches to learning binary
    codes for hashing-based retrieval
  • Very quick, with performance comparable to complex descriptors
  • Remaining issues:
  • How to learn the metric (so that it scales)
  • How to produce binary codes
  • How to use them for recognition