1
Fast Image Search
Presented by
  • Uri Shabi and Shiri Chechik

For the Advanced Topics in Computer Vision course, Spring 2007
2
- The tasks
Introduction
  • Recognition
  • Given a database of images and an input query image, we wish to find an image in the database that represents the same object as in the query image.
  • Classification
  • Given a database of images, the algorithm divides the images into groups.
  • Given a query image, it returns the group that the image belongs to.

3
- The tasks
Introduction
Database
Query Image
Results
D. Nistér, H. Stewénius. Scalable Recognition with a Vocabulary Tree. CVPR 2006.
4
- The tasks
Introduction
Query Image
Database
5
- The problem
Introduction
D. Nistér, H. Stewénius. Scalable Recognition with a Vocabulary Tree. CVPR 2006.
6
- How many object categories are there?
Introduction
Biederman 1987
7
- The Challenges
Introduction
Challenge 1: viewpoint variation
Michelangelo 1475-1564
Adapted with permission from Fei-Fei Li - http://people.csail.mit.edu/torralba/iccv2005/
8
- The Challenges
Introduction
Challenge 2: illumination
Adapted with permission from Fei-Fei Li - http://people.csail.mit.edu/torralba/iccv2005/
9
- The Challenges
Introduction
Challenge 3: occlusion
Magritte, 1957
Adapted with permission from Fei-Fei Li - http://people.csail.mit.edu/torralba/iccv2005/
10
- The Challenges
Introduction
Challenge 4: scale
Adapted with permission from Fei-Fei Li - http://people.csail.mit.edu/torralba/iccv2005/
11
- The Challenges
Introduction
Challenge 5: deformation
Xu, Beihong 1943
Adapted with permission from Fei-Fei Li - http://people.csail.mit.edu/torralba/iccv2005/
12
Introduction
- The Challenges
  • To sum up, we face a few challenges:
  • Viewpoint variation
  • Illumination
  • Occlusion
  • Scale
  • Deformation

13
- Bag of Words (Documents)
Introduction
  • A document can be represented by a collection of words
  • Common words can be ignored (the, an, etc.)
  • This is called a stop list
  • Words are represented by their stems
  • walk, walking, walks → walk
  • A topic can be recognized by word frequencies

J. Sivic, A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV 2003.
14
Analogy to documents
- Bag of Words (Documents)
Introduction
Of all the sensory impressions proceeding to the
brain, the visual experiences are the dominant
ones. Our perception of the world around us is
based essentially on the messages that reach the
brain from our eyes. For a long time it was
thought that the retinal image was transmitted
point by point to visual centers in the brain;
the cerebral cortex was a movie screen, so to
speak, upon which the image in the eye was
projected. Through the discoveries of Hubel and
Wiesel we now know that behind the origin of the
visual perception in the brain there is a
considerably more complicated course of events.
By following the visual impulses along their path
to the various cell layers of the optical cortex,
Hubel and Wiesel have been able to demonstrate
that the message about the image falling on the
retina undergoes a step-wise analysis in a system
of nerve cells stored in columns. In this system
each cell has its specific function and is
responsible for a specific detail in the pattern
of the retinal image.
Extracted words (with counts): sensory, perception, visual ×2, brain, retinal, cerebral, image, cell, eye ×2
15
- Bag of Words
Introduction
  • Images can be represented by visual words
  • An object in an image can be recognized by visual
    word frequencies

J. Sivic, A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV 2003.
16
- Visual word
Introduction
  • We could use a feature as a visual word, but:
  • there are too many features
  • two features of the same object will never look exactly the same
  • A visual word is a visual stem, represented by a descriptor
  • What is a good code word (visual word)?
  • Invariant to different viewpoints, illumination, scale, shift and transformation

17
- Bag of Words
Introduction
Adapted with permission from Fei-Fei Li - http://people.csail.mit.edu/torralba/iccv2005/
18
Adapted with permission from Fei-Fei Li - http://people.csail.mit.edu/torralba/iccv2005/
19
- Bag of Words
Introduction
  • The fact that we only use the frequencies of visual words implies that this method is translation invariant.
  • This is why it is called a Bag of Words: two images containing the same words are identified as the same image, regardless of where the words appear.

20
Breaking down the problem
  • Feature Detection
  • Feature Description
  • Feature Recognition: how to find words similar to a query feature in a database of code words
  • Image Recognition/Classification: how to find images similar to the query image / how to classify our image

21
Adapted with permission from Fei-Fei Li - http://people.csail.mit.edu/torralba/iccv2005/
22
Feature Detection
  • We can use any feature detection algorithm
  • We can use a mixture of feature detectors and capture more types of features
  • What is a good detector?
  • Invariant to rotation, illumination, scale, shift and transformation

23
Feature Description
  • What is a good descriptor?
  • Invariant to different viewpoints, illumination, scale, shift and transformation
  • Image recognition is rotation- or scale-invariant only if the detector and descriptor are as well.

24
Feature Description
  • SIFT descriptor
  • Local Frames of Reference

25
- SIFT
Feature Description
  • We determine a local orientation according to the dominant gradient
  • Define a native coordinate system relative to that orientation
  • We take a 16×16 window and divide it into sixteen 4×4 windows
  • We then compute a gradient orientation histogram with 8 main directions for each window (a sketch follows)
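
A minimal illustrative sketch of this 4×4 grid of 8-bin histograms in Python (not Lowe's full SIFT: Gaussian weighting, trilinear interpolation and the rotation to the dominant orientation are omitted; the patch is assumed already rotated):

```python
import numpy as np

def sift_like_descriptor(patch):
    """Toy 128-D descriptor: a 16x16 patch -> 4x4 grid of 8-bin
    gradient-orientation histograms, weighted by gradient magnitude."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # orientation in [0, 2*pi)
    desc = []
    for i in range(0, 16, 4):                     # sixteen 4x4 sub-windows
        for j in range(0, 16, 4):
            bins = (ang[i:i+4, j:j+4] * 8 / (2 * np.pi)).astype(int) % 8
            hist = np.bincount(bins.ravel(),
                               weights=mag[i:i+4, j:j+4].ravel(),
                               minlength=8)
            desc.append(hist)
    d = np.concatenate(desc)                      # 16 * 8 = 128 values
    return d / (np.linalg.norm(d) + 1e-8)         # L2-normalize
```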

26
- SIFT
Feature Description
  • Properties
  • Rotation invariant

27
Feature Description
- Local Affine Frames of Reference
  • Works together with a Distinguished Regions detector
  • Assumption: two frames of the same object are related by an affine transformation
  • Idea: find the affine transformation that best normalizes the frame.
  • Two normalized frames of the same object will look similar

Š. Obdržálek, J. Matas. Object Recognition Using Local Affine Frames on Distinguished Regions.
28
- Local Frames of Reference
Feature Description
  • Properties
  • Rotation invariant (depending on the shape)
  • Brings different features of the same object to be similar: a great advantage!
  • Similarity of features can then be tested very efficiently

29
- How to normalize?
Feature Description
  • An affine transformation has 6 degrees of freedom, so we can enforce 6 constraints
  • An example of such a constraint:
  • rotate the object so that the line from the center of gravity to the most extreme point has a fixed direction (see the sketch below)
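
One standard way to spend the 6 degrees of freedom, consistent with the slide's example (a reconstruction; the slide's figures are not in the transcript):

\[
x' = A\,x + t, \qquad
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad
t = \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}
\qquad \text{(6 parameters)}
\]

2 constraints: translate the center of gravity to the origin (fixes t). 3 constraints: normalize the region's second-moment matrix, \( A \Sigma A^{\mathsf T} = I \) (fixes A up to rotation). 1 constraint: rotate so that the direction from the center of gravity to the most extreme point is fixed.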

30
Fast Feature Recognition
  • A reminder:
  • Given a database of code words and a query feature, we find the closest code word to the feature

Query Feature
Database
31
- Inverted File
Fast Feature Recognition
  • Each visual word is associated with a list of documents (images) containing it

[Figure: an inverted-file table mapping each visual word to the images that contain it]
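
A minimal sketch of such an inverted file in Python (illustrative names and data layout, not taken from the papers):

```python
from collections import defaultdict

# Inverted file: visual-word id -> set of image ids containing it.
index = defaultdict(set)

def add_image(image_id, word_ids):
    for w in word_ids:
        index[w].add(image_id)

def candidates(query_word_ids):
    """Rank images by how many visual words they share with the query."""
    common = defaultdict(int)          # image id -> number of common words
    for w in query_word_ids:
        for img in index.get(w, ()):
            common[img] += 1
    return sorted(common.items(), key=lambda kv: -kv[1])
```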
32
- Inverted File
Fast Feature Recognition
  • Each image in the database is scored according to how many common features it has with the query image.
  • The image with the best score is selected
  • Also note that, in order for the object to be recognized successfully (to compete with background regions), it needs to be large enough (at least ¼ of the image area)

33

Fast Feature Recognition
  • Why do we need different approaches?
  • Why can't we just use a table?
  • There could be too many visual words, and we want a fast solution!

34
Fast Feature Recognition
  • Three different approaches
  • A small number of words
  • Vocabulary Tree
  • Decision Tree

35
- A small number of words
Fast Feature Recognition
  • Construction of the vocabulary:
  • We take a large training set of images from many categories
  • Then form a codebook containing W words using the k-means algorithm
  • Recognition phase:
  • We sequentially find the nearest neighbor of the query feature

R. Fergus, L. Fei-Fei, P. Perona, A. Zisserman. Learning Object Categories from Google's Image Search. ICCV 2005.
36
- A small number of words
Fast Feature Recognition
37
- A small number of words
Fast Feature Recognition
38
- A small number of words
Fast Feature Recognition
  • Pros
  • Going sequentially over the words leads to high accuracy
  • Space efficiency: we save only a small number of words
  • Cons
  • A small number of words doesn't capture all the features

39
- K-Means Clustering
A Small Detour
  • Input:
  • A set of n points x1, x2, ..., xn in a d-dimensional feature space (the descriptors)
  • Number of clusters: K
  • Objective:
  • To find a partition of the points into K non-empty disjoint subsets
  • So that each group consists of the descriptors closest to a particular center

40
- K-Means Clustering
A Small Detour
  • Step 1:
  • Randomly partition the points into K equal-sized sets and calculate their centers

41
- K-Means Clustering
A Small Detour
  • Step 2:
  • For each xi:
  • Assign xi to the cluster with the closest center

42
- K-Means Clustering
A Small Detour
  • Step 3 (repeat until no update):
  • Compute the mean (mass center) of each cluster
  • For each xi:
  • Assign xi to the cluster with the closest center

43
- K-Means Clustering
A Small Detour
  • The final result (a minimal sketch of the full loop follows)
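
A minimal sketch of the algorithm in Python (it initializes centers from randomly chosen points, a common variant of the random equal-sized-sets initialization on the earlier slide):

```python
import numpy as np

def kmeans(x, k, iters=100, seed=0):
    """Lloyd's algorithm on the rows of x (an n-by-d array)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to the cluster with the closest center.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        # Recompute each center as the mean (mass center) of its cluster.
        new = np.array([x[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):   # no update -> done
            break
        centers = new
    return centers, labels
```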

44
- Vocabulary Tree
Fast Feature Recognition
  • Idea:
  • Use many visual words to capture all features
  • But since we can't sequentially go over a large number of words, we'll use a tree!

D. Nistér, H. Stewénius. Scalable Recognition with a Vocabulary Tree. CVPR 2006.
45
- Vocabulary Tree
Fast Feature Recognition
  • Construction of the vocabulary:
  • Input: a large set of descriptor vectors
  • Partition the training data into K groups, where each group consists of the descriptors closest to a particular center
  • Continue recursively for each group, up to L levels
  • Recognition phase:
  • Traverse the tree down to a leaf, which will hopefully contain the closest word (a sketch follows)
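
A minimal sketch of the hierarchical construction and lookup, reusing the kmeans function sketched earlier (illustrative, not the paper's implementation):

```python
def build_vocab_tree(descriptors, k, levels):
    """Hierarchical k-means: partition into k groups, recurse up to L levels."""
    if levels == 0 or len(descriptors) <= k:
        return {'center': descriptors.mean(axis=0)}      # leaf = visual word
    centers, labels = kmeans(descriptors, k)
    children = []
    for j in range(k):
        sub = descriptors[labels == j]
        children.append(build_vocab_tree(sub, k, levels - 1)
                        if len(sub) else {'center': centers[j]})
    return {'centers': centers, 'children': children}

def lookup(tree, q):
    """Route a query descriptor to a leaf: k comparisons per level,
    so roughly k*L comparisons instead of k**L for a flat search."""
    while 'children' in tree:
        j = ((tree['centers'] - q) ** 2).sum(axis=1).argmin()
        tree = tree['children'][j]
    return tree['center']
```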

46
- Vocabulary Tree
Fast Feature Recognition
(Slides 46-67: figure-only vocabulary-tree illustrations; no transcript.)
68
- Vocabulary Tree
Fast Feature Recognition
  • Pros
  • We can save many visual words and thus capture more features
  • Cons
  • We can't go sequentially over all words, so it is not perfectly accurate
  • Space: we need to save many words

69
- Decision Tree
Fast Feature Recognition
  • Idea: use a tree, but at each non-terminal node make a very simple check
  • To overcome accuracy problems:
  • We can save some frames in both subtrees
  • We need to recheck similarity when we reach the leaves

S. Obdržálek, J. Matas. Sub-linear Indexing for Large Scale Object Recognition. Proc. British Machine Vision Conference, 2005.
70
- Decision Tree
Fast Feature Recognition
  • Assume the Local Affine Frames descriptor was used
  • A very simple check: at each non-terminal node we examine only one pixel and compare it to some threshold
  • The leaves are associated with a list of frames
  • The affine normalization is not perfect, so:
  • Frames close to the threshold are saved in both subtrees
  • We must recheck the similarity of the frames in the leaves

71
- Decision Tree
Fast Feature Recognition
  • The tree construction (a sketch follows):
  • Every node gets a set of frames
  • If the number of frames is below some threshold, or the frames are indistinguishable, create a leaf
  • Else, find a weak classifier:
  • All frames below or close to the threshold at pixel x are added to the left list
  • All frames above or close to the threshold at pixel x are added to the right list
  • Continue recursively
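
A minimal sketch of this construction in Python (the thresholds and the weak-classifier choice below are illustrative placeholders; the paper instead optimizes the expected recall time, as the next slide explains):

```python
import numpy as np

MIN_FRAMES = 10   # illustrative leaf-size threshold
MARGIN = 8        # intensity band around t treated as ambiguous

def choose_weak_classifier(frames):
    """Toy choice: the pixel whose value varies most across the frames,
    thresholded at its median."""
    stack = np.stack(frames)                         # (n, h, w)
    y, x = np.unravel_index(stack.var(axis=0).argmax(), stack.shape[1:])
    return (y, x), float(np.median(stack[:, y, x]))

def build(frames):
    if len(frames) < MIN_FRAMES:
        return {'leaf': frames}                      # recheck similarity here
    (y, x), t = choose_weak_classifier(frames)
    left  = [f for f in frames if f[y, x] <  t + MARGIN]  # ambiguous frames
    right = [f for f in frames if f[y, x] >= t - MARGIN]  # go to both lists
    if len(left) == len(frames) or len(right) == len(frames):
        return {'leaf': frames}                      # indistinguishable
    return {'pix': (y, x), 't': t,
            'left': build(left), 'right': build(right)}
```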

72
Weak Classifier
- Decision Tree
Fast Feature Recognition
  • The goal: minimize the expected recall time for a query, i.e. build a tree in which, on average, we reach the leaves in minimal time
  • Two requirements:
  • The tree is balanced
  • The number of ambiguous frames that are stored in both subtrees is minimized.

73
- Decision Tree
Fast Feature Recognition
  • Recognition Phase
  • Traverse the tree according to the weak
    classifiers
  • Check similarity to the frames in the leaf

74
- Decision Tree
Fast Feature Recognition
  • Pros
  • We can save many visual words and thus capture more features
  • A very efficient test at each non-terminal node
  • Cons
  • Since the normalization transformation is not perfect, it won't work on all frames
  • Ambiguous saving (some frames are stored twice)
  • We need to make another check in the leaves

75
Fast Image Recognition
  • Inverted Files
  • Probabilistic approach
  • Implementation of pLSA: Google Image Search

76
Fast Image Recognition
  • After recognizing the features in our query image, our aim is to recognize the image in the database, or the group of images (classification), that is most similar to our query image.

77
Inverted Files
  • Each visual word is associated with a list of documents (images) containing it, along with the frequency and positions of the word within each
  • Analogous to text search engines

78
Inverted Files cont.
  • Results are sorted according to a complex scoring method that Google calls PageRank

79
Inverted Files: Recognizing an Image
  • The list of features in our query image provides us with a list of corresponding images.
  • The task is to rank those images according to their similarity to the query image
  • Similarity is based on the words common to the query and a DB image.

80
Inverted Files: Voting
  • Each word independently votes on the relevance of
    images.
  • To improve recognition we add weights to the
    different words.

81
Inverted Files: Possible Scoring of Words
With N documents (images) in the database, N_i of them containing word i, and n_i the frequency of word i in the query image, the weight of word i and the query's entry for it are
\[ w_i = \ln\frac{N}{N_i}, \qquad q_i = n_i \, w_i \]
  • This considers how frequent the word is in the database. The rarer the word, the higher the score it gets (because it is more discriminative).
  • This is similar to the entropy of a word.

82
Inverted Files: Scoring of Images
  • q and d are the vectors of the (weighted) frequencies of all words in the query image and in a database image (document).
  • The similarity score between a DB image and the query image is the distance between the normalized vectors:
\[ s(q, d) = \left\| \frac{q}{\|q\|} - \frac{d}{\|d\|} \right\| \]
  • A lower score is a better match.
  • Rare words, when their frequency does not match, contribute much more to the score.
  • We need to go over all words, so the implementation is not obvious with inverted files

83
Inverted Files: Scoring Fast
Once the vectors are normalized, the (L2) score expands into a function of the common words only:
\[ \|q - d\|^2 \;=\; 2 - 2 \sum_{i \,:\, q_i \neq 0,\ d_i \neq 0} q_i \, d_i \]
  • The score is a function of common words only.
  • Scoring is now straightforward with inverted files (a sketch follows)
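
A minimal sketch of this L2 scoring through an inverted file (illustrative data layout; note that Nistér and Stewénius report an L1 scheme working even better in practice):

```python
import numpy as np
from collections import defaultdict

def score_database(query_tf, inverted, idf, db_norms):
    """||q - d||^2 = 2 - 2 * sum(q_i * d_i), summed over common words only.
    query_tf: {word: count in query};  inverted: {word: {image: count}};
    idf: {word: ln(N / N_i)};  db_norms: {image: norm of its tf-idf vector}."""
    q = {w: n * idf[w] for w, n in query_tf.items() if w in idf}
    qnorm = np.sqrt(sum(v * v for v in q.values()))
    scores = defaultdict(lambda: 2.0)      # no common words -> worst score 2
    for w, qv in q.items():
        for img, n in inverted.get(w, {}).items():
            scores[img] -= 2.0 * (qv / qnorm) * (n * idf[w] / db_norms[img])
    return sorted(scores.items(), key=lambda kv: kv[1])  # 0 is a perfect match
```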

84
Inverted Files: Scoring Implications
  • This part can be done fast
  • Allows scaling the database without linear growth in search time, because common features are rare (with a big vocabulary) for non-matching images.
  • The score lies in [0, 2], with 0 for the best match and 2 for images with no common features.

85
Experiment: Effect of Vocabulary Size
  • 1400-image database (small)
  • Inverted Files
  • Detector: MSER (Maximally Stable Extremal Regions)
  • Descriptor: SIFT
  • Hierarchical k-means clustering

David Nister Henrik Stewenius 2006
86
Experiment: Effect of Vocabulary Size
  • Best parameters:
  • 1 million words, 6-level tree
  • L1 normalization scheme
  • 90% success on first hit
  • Non-hierarchical:
  • 10,000 words, one level
  • Much slower (linear)
  • Only 86% correct on first hit
  • Good performance requires a large vocabulary!

David Nister Henrik Stewenius 2006
87
Experiment: Scalability
  • Performance scales well with database size
  • Losing <5% hit rate while expanding the database 100-fold (the green curve is the more realistic scenario)

David Nister Henrik Stewenius 2006
88
Demonstration: Robustness
  • A CD cover is photographed using a digital camera.
  • Severe occlusions and specularities.
  • Viewpoint, rotation and scale are different.
  • The CD cover is identified in real time from a database of 50,000 images.

David Nister Henrik Stewenius 2006
89
Summary
  • A big vocabulary can be used effectively and quickly for high-performance image identification.
  • Using inverted files, the database can be scaled relatively easily.
  • The Bag of Words model allows for identification under noisy conditions.

David Nister Henrik Stewenius 2006
90
Bag of Words: Drawbacks
  • Positional information is not taken into account.
  • These two images have the same frequencies of words. Is this the best we can do?
  • In fact, in real life, the interaction between the features is important.

91
Probabilistic approach: TSI-pLSA
  • Translation and Scale Invariant pLSA
  • We'll try to guess a window surrounding our object.
  • We'll use the position, as well as the frequency, of a word to identify the object.
  • We'll do all of this in an unsupervised manner.
  • First, what is pLSA?

R. Fergus et al. 2005
92
Probabilistic approach: pLSA
  • A probabilistic, unsupervised approach to object classification.
  • Probabilistic: instead of identifying a single topic in the image, we view an image as a collection of topics in different proportions, e.g. 50% motorbike, 50% house.
  • Unsupervised: topics are not defined in advance; rather, they are learned from the database.

R. Fergus et al. 2005
93
Probabilistic approach: Image is made of Topics
[Figure: a W × D co-occurrence table of word counts, with W = 6 visual words and D = 7 documents]
94
Probabilistic approach: Image is made of Topics
[Figure: the same W × D table factored through Z = 2 topics (Car, Horse). Each of the 6 words has a probability under each topic, with (Car, Horse) columns (1, 0), (0, 1), (0, ½), (0, 1), (1, 0), (?, 0), and each of the 7 documents mixes the two topics.]
95
Probabilistic approach: The pLSA model
  • The distribution of words in a document is mediated by a single latent variable z that represents the topic.
  • The assumption is that, given the topic, the distribution of words is independent of the specific document: it is a function only of the topics in the document (see the model below).
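
The slide's formula appeared as an image; the standard pLSA decomposition it refers to is

\[
P(w \mid d) \;=\; \sum_{z} P(w \mid z)\, P(z \mid d),
\]

where w is a visual word, d a document (image) and z the latent topic.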

96
Probabilistic approach: Topics and Words
  • Each topic (z) is characterized by its own frequencies of visual words, P(w|z)
  • Each image (d) contains a mixture of topics, P(z|d)

97
Probabilistic approach: Learning
  • In learning we compute the best values for P(w|z) and P(z|d)
  • EM is used to maximize the likelihood of the model over the data.

98
Probabilistic approach: Learning
  • In learning we compute the best values for P(w|z) and P(z|d)
  • EM is used to maximize the likelihood of the data, computed from the model (reconstructed below).
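
Reconstructed from the standard pLSA formulation (the slide showed these as images): with n(w,d) the count of word w in document d, EM maximizes the likelihood of the data

\[
\mathcal{L} \;=\; \prod_{d} \prod_{w} P(w \mid d)^{\,n(w,d)},
\]

alternating the E-step

\[
P(z \mid w, d) \;=\; \frac{P(w \mid z)\, P(z \mid d)}{\sum_{z'} P(w \mid z')\, P(z' \mid d)}
\]

with the M-step

\[
P(w \mid z) \;\propto\; \sum_{d} n(w,d)\, P(z \mid w, d),
\qquad
P(z \mid d) \;\propto\; \sum_{w} n(w,d)\, P(z \mid w, d).
\]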
99
Probabilistic approach: Topics
  • Note that learning is unsupervised.
  • Topics can be thought of as a group of features that tend to appear together in an image.
  • Only the number of topics is provided in advance.

[Figure: results of 8-topic learning with images of motorbikes, topics 1-8]
100
Probabilistic approach: Topics Example
  • The parts of a motorbike tend to appear together in an image and therefore, most likely, will be grouped under one topic.
  • Images sorted by their prominent topic show, under Topic 7, the most images of motorbikes
  • We'll name Topic 7 "Motorbikes": our classifier.

[Figure: images sorted by prominent topic, topics 1-8]
101
Probabilistic approach: Recognition
  • In recognition, P(w|z) is locked after the learning phase.
  • Using EM, P(z|d) is estimated.

102
Probabilistic approach: Recognition Example

[Figure: distribution of topics 1-5 for a query image]
  • In recognition, we identify the most likely distribution of topics.
  • If topic 4 was our classifier for faces, we'd classify this image as Face

103
Probabilistic approach: Choosing Topics
  • The number of topics is chosen in advance as a parameter.
  • It is not related to the actual number of object classes in the data.
  • We still need to pick the topic that best describes our object class.
  • One option is to pick the best topic by hand.

104
Probabilistic approach: Validation set
  • Or we can use a validation set: a few high-quality images, and automatically pick the single best classifier for this validation set.
  • In the future, a combination of topics could be used to give superior classification.

105
pLSA - Summary
  • Word frequency is a function of the topics in the image.
  • Topics are learned and recognized using EM
  • Positional information is still not taken into account.

106
Probabilistic approach: TSI-pLSA
  • We'll guess a window surrounding our object.
  • We'll use the position of a word, relative to the object, in the model.

107
Probabilistic approach: Object Boundaries
  • c describes the boundaries of an object.
  • P(c) is calculated by fitting a mixture of Gaussians to the features.
  • The center of the object is given by the mean of the Gaussian.
  • The scale is given by the variance.
  • Features are weighted by P(w|z) for a given topic
  • This is repeated with k = 1, 2, ..., K for all possible bounding boxes (a rough sketch follows).

Image of two planes: fitting a mixture of Gaussians with k = 2 to the features, weighted by their color, would give the two centers of the planes.
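
A rough sketch of this centroid proposal in Python (not Fergus et al.'s exact procedure; scikit-learn's GaussianMixture takes no per-sample weights, so the P(w|z) weighting is approximated here by resampling points in proportion to their weights):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def propose_boxes(xy, weights, k=2, seed=0):
    """xy: (n, 2) array of feature locations; weights: per-feature P(w|z)
    for one topic. Each Gaussian mean is a candidate object center; the
    per-axis standard deviation gives the scale."""
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    idx = rng.choice(len(xy), size=4 * len(xy), p=p / p.sum())
    gmm = GaussianMixture(n_components=k, random_state=seed).fit(xy[idx])
    centers = gmm.means_                                  # object centroids
    scales = np.sqrt(np.diagonal(gmm.covariances_, axis1=1, axis2=2))
    return centers, scales       # bounding box = center +/- a few scales
```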
108
Probabilistic approach: Word Position
  • x describes the position of a word in relation to the object.
  • Locations are quantized into 36 internal positions and one background position.

109
Probabilistic approach: Word Position
  • Word positions (x) are used in the model: words and their positions are modelled jointly per topic, P(w, x | z)
110
Probabilistic approach: Robustness
  • Hopefully, recognizing object centroids should allow TSI-pLSA to be scale invariant while preserving its translation invariance.

111
Probabilistic approach: Results
  • Z = 2 (number of topics)
  • K = 2 (maximum number of Gaussians)
  • Airplanes
  • R. Fergus et al. 2005

112
Probabilistic approach: Centroid selection
  • Red is the first topic, which was found to correspond to airplanes
  • Green is the second topic.
  • Bounding boxes are suggested centroids.
  • Solid rectangles are the centroids with highest likelihood.

113
Probabilistic approach: TSI-pLSA cont.
  • In learning, the results of moving the sub-window over all possible locations, c, are aggregated.
  • In addition, P(c) is a flat distribution, since there is no reason for more confidence in one position over another.

114
Probabilistic approach: Parameters
  • Need to specify the number of topics; there is no theory on the optimal number
  • Many parameters in the model, for example:
  • 350 visual words
  • 37 discrete positions (6×6 grid + background position)
  • 8 topics (irrespective of the way we divide the dataset)
  • 350 × 37 × 8 = 103,600 parameters to learn
  • Need to provide many data points:
  • 500 images
  • 700 features in each image
  • Only 350,000 data points; that is only about 3 data points per parameter.

115
Probabilistic approach: Summary
  • The number of topics needs to be specified in advance.
  • Images do not have to be annotated individually.
  • The group of images should contain only a few object classes; otherwise topics would be meaningless.

116
Usefulness?
  • So, we could use a database of airplanes and faces, but we do not need to know which is which.
  • How does that help us?
  • TSI-pLSA can be used to improve Google image search.

117
Google image search
  • Returns many thousands of images.
  • Currently, many are of low quality (bad viewpoint, small objects, junk)

118
Google image search: Quality images
  • This is for the entire set of images returned for each keyword
  • Green means good images
  • Intermediate means some relevance (like a cartoon)
  • Bad means a junk, unrelated image.

119
Google image search: Validation set
  • We need a set of images to serve as the ground truth for identifying the best topics.
  • We could choose some 20 images by hand.
  • Empirically, the top 5 images returned by Google search are usually of good quality.
  • We could use Google Translate to obtain the top 5 images in every language.
  • For example, 30 images of airplanes from 6 languages

120
Google image search: Probabilistic Approach Results (TSI-pLSA)
  • 7 keywords (datasets)
  • 600 images/keyword
  • Validation set: 30 images/keyword
  • Z = 8 (number of topics)
  • A single topic was chosen as classifier from the 8 topics for each keyword, using the validation set.
  • Descriptor: SIFT
  • Vocabulary: 350 words.

R. Fergus et al. 2005
121
Results cont.
  • Tested on a pre-annotated (hand-labeled) set.
  • Each labeled image had to be classified according to the predetermined classifier for each category
  • Notable problems: half of the (A)irplanes were classified as (C)ars Rear,
  • and both (F)aces and (W)ristwatches were classified as (G)uitars

122
Google image search: Improving
  • TSI-pLSA was used to reorder the images returned by Google according to the chosen topic.
  • The graph shows how many of the top 15 hits are relevant
  • Note that, since this method is completely unsupervised, it could be used immediately to improve the results of Google image search, although at a high computational cost.

123
Google image search: Automatic Classification
  • Give a keyword.
  • Use Google image search to find somewhat related images.
  • Use TSI-pLSA to classify the images by topics.
  • Use the validation set to choose the best topic.
  • Classify any image in the world.

124
Summary
  • The probabilistic method uses the results returned by an image search engine, despite their being extremely noisy, to automatically construct a classifier for objects.
  • This classifier uses the frequency and position of visual words to identify the most likely topic of the object in the image.
  • This topic can be further used to improve the relevance of such an image search.
  • Positional data is not always helpful (results not shown). In some cases, using only the frequencies of the words gives better performance.

125
References
  • R. Fergus, L. Fei-Fei, P. Perona, A. Zisserman. Learning Object Categories from Google's Image Search. ICCV 2005.
  • D. Nistér, H. Stewénius. Scalable Recognition with a Vocabulary Tree. CVPR 2006.
  • J. Sivic, A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV 2003.
  • S. Obdržálek, J. Matas. Sub-linear Indexing for Large Scale Object Recognition. Proc. British Machine Vision Conference, 2005.
  • L. Fei-Fei, R. Fergus, A. Torralba. Recognizing and Learning Object Categories. ICCV 2005 short courses. http://people.csail.mit.edu/torralba/iccv2005/