1
Content-based Image Retrieval
2
  • Retrieval
  • The process of getting back information that has
    been stored in a database
  • Image Retrieval
  • User, search engine, and database

3
  • Text-based image retrieval
  • Use keywords or captions,
  • interpret with high-level semantics,
  • retrieve by matching the keywords

Retrieval result of Google Images for "Airplane"
4
  • However,
  • Labelling is time-consuming and expensive
  • Keywords only partially describe the visual
    content
  • Human subjectivity

"sky" (50 points), "bird" (60 points), "soaring"
(120 points), or "frigate bird" (150 points).
5
  • Content-based image retrieval
  • Annotators are replaced by computers
  • Keywords are replaced by low-level visual
    features (Color, texture, shape)
  • Retrieval by comparing the visual features
  • Semantic gap
  • A critical issue in content-based image retrieval
  • Low-level visual features used by computers
  • High-level concepts used by humans

6
  • Three retrieval modes
  • Target search (Pattern matching)
  • Search for a specific image
  • Category search (Most studied in CBIR)
  • Search for images from a particular class
  • Association search (Most difficult)
  • No specific target at the beginning of retrieval

Vincent van Gogh, Starry Night (1889)
7
  • The image domain in CBIR
  • Narrow domain
  • Limited and predictable variability in visual
    content
  • Domain knowledge can be very helpful
  • Semantic gap is small
  • Broad domain
  • Unlimited and unpredictable variability in visual
    content
  • Less domain knowledge for CBIR
  • Semantic gap is large

8
  • Visual features
  • Invariant vs. discriminative
  • Local vs. global
  • Color
  • Color spaces: RGB, Lab, HSV, ...
  • Color-based features: color histogram, color
    moments, ... (see the sketch after this list)
  • Texture
  • Co-occurrence matrix, Tamura texture
    representation, wavelet-transform-based
    representations, ...
  • Shape
  • Boundary-based or region-based
  • Chain code, Fourier descriptors, moment
    invariants, finite element method, ...
  • Salient features
  • Focus attention on the most salient fragments or
    objects
  • Local invariant features
  • Structure and layout
  • Spatial relationships between feature values,
    point sets, or object sets
  • Graph-based description

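A minimal sketch of one such color feature, assuming NumPy and an HxWx3 uint8 RGB array (the function name and bin count are illustrative, not from the slides):

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color feature: an 8x8x8-bin RGB histogram, L1-normalized."""
    # Quantize each channel into `bins` levels (256 / 8 = 32 values per bin).
    quantized = (image // (256 // bins)).reshape(-1, 3)
    # Flatten each pixel's 3-D bin index into a single histogram index.
    idx = quantized[:, 0] * bins * bins + quantized[:, 1] * bins + quantized[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()  # normalize so images of different sizes compare
```
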
9
  • Measure image similarity
  • Derive an interpretation from the features
  • Measure with a similarity function
  • Euclidean distance
  • Mahalanobis distance
  • Minkowski distance
  • Distances between histograms
  • Minkowski distance, histogram intersection,
    Kullback-Leibler divergence between
    distributions, chi-square statistics,
    quadratic-form distance, cumulative histogram
    distance, earth mover's distance, ... (see the
    sketch after this list)
  • Similarity of structural features, similarity of
    salient features, similarity of two shapes, ...

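A minimal sketch of three of these histogram measures, assuming L1-normalized NumPy histograms (function names are illustrative):

```python
import numpy as np

def histogram_intersection(h1, h2):
    # Similarity in [0, 1] for L1-normalized histograms: sum of bin-wise minima.
    return np.minimum(h1, h2).sum()

def chi_square_distance(h1, h2, eps=1e-10):
    # Chi-square statistic between two histograms; eps avoids division by zero.
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def minkowski_distance(h1, h2, p=2):
    # p = 1 gives the city-block distance, p = 2 the Euclidean distance.
    return np.sum(np.abs(h1 - h2) ** p) ** (1.0 / p)
```
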
A demonstration system
10
  • Learn a similarity metric
  • Narrow down the semantic gap
  • Adapt to different retrieval sessions
  • Deal with the subjectivity of human perception on
    visual content
  • Interaction between users and a retrieval system

11
  • Relevance feedback
  • Originates from text information retrieval
  • Judge the relevance of the initial retrieval
    result and refine the query
  • Explicit feedback
  • The user ranks or labels the retrieved documents
  • Implicit feedback
  • Whether the user read a document, and for how
    long
  • Pseudo feedback
  • Assume the top k retrieved documents are relevant
  • Rocchio algorithm (a sketch follows below)

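A minimal sketch of one Rocchio refinement step, assuming NumPy feature vectors; the weights alpha, beta, and gamma are conventional defaults, not values from the slides:

```python
import numpy as np

def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Refine a query vector from user-labeled relevant/irrelevant examples."""
    new_query = alpha * query
    if len(relevant) > 0:
        new_query += beta * np.mean(relevant, axis=0)     # move toward relevant
    if len(irrelevant) > 0:
        new_query -= gamma * np.mean(irrelevant, axis=0)  # move away from irrelevant
    return new_query
```
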
12
  • Image retrieval with relevance feedback
  • Ask the user to evaluate the retrieved images and
    feed the evaluation back to the retrieval system
  • The system responds to the evaluation and gives a
    refined result
  • Explicit feedback dominates in CBIR
  • Relevant (positive) vs. irrelevant (negative)
    (most popular)
  • Multi-level score
  • Ranking
  • Relative judgment (image A is more relevant than
    image B)

13
  • What can be learned from user feedback
  • Refine the query
  • The importance of different features in retrieval
  • Combination of different visual features
  • Color, texture, shape, ...
  • The subspace where the relevant images lie
  • PCA, kernel PCA, MDS, ...
  • A classifier separating relevant images from
    irrelevant ones (see the sketch after this list)
  • Linear classifier (Fisher discriminant analysis)
  • Quadratic classifier (model with Gaussian
    distributions)
  • Neural networks
  • Support vector machines, kernel-based
    classifiers, boosting, ...

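A minimal sketch of the SVM option, assuming scikit-learn and pre-computed feature arrays; `relevant_features`, `irrelevant_features`, and `database_features` are hypothetical names:

```python
import numpy as np
from sklearn.svm import SVC

# Feedback from the current session: 1 = relevant, 0 = irrelevant.
X = np.vstack([relevant_features, irrelevant_features])
y = np.array([1] * len(relevant_features) + [0] * len(irrelevant_features))

# An RBF-kernel SVM as the relevance classifier; probability estimates
# let us re-rank the whole database by predicted relevance.
clf = SVC(kernel="rbf", probability=True).fit(X, y)
scores = clf.predict_proba(database_features)[:, 1]
ranking = np.argsort(-scores)  # most relevant images first
```
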
14
  • The small sample problem
  • Nobody is willing to label many images in a
    retrieval session
  • The scarcity of labeled images prevents efficient
    and reliable learning
  • The class of irrelevant images is too
    heterogeneous to be represented by a small number
    of labeled images
  • Existing ways to deal with the small sample
    problem
  • Make assumptions about data distributions
  • Perform dimensionality reduction
  • Active learning
  • Learning with unlabelled data (Semi-supervised
    learning)
  • Incorporate prior knowledge
  • Long-term relevance feedback learning

15
  • Image retrieval with active learning
  • Goal: learn the most but label the least
  • What should be presented to the user for
    labeling?
  • Most relevant images vs. most uncertain images
  • Active learning
  • A classifier, for example an SVM
  • An initial labeled data set
  • A pool of unlabelled data
  • A query function
  • Goal: maximize the classification accuracy with
    minimal labeling cost
  • Present the user with the most uncertain images
    (see the sketch after this list)

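A minimal sketch of the query function as uncertainty sampling with a margin classifier such as the SVM above (the function name is illustrative):

```python
import numpy as np

def most_uncertain(clf, unlabeled_features, batch_size=10):
    """Pick the unlabeled images the current classifier is least sure about."""
    # For a margin classifier, uncertainty is distance to the decision
    # boundary: |decision_function| near zero means "most uncertain".
    margins = np.abs(clf.decision_function(unlabeled_features))
    return np.argsort(margins)[:batch_size]  # indices to present for labeling
```
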
16
  • Incorporating prior knowledge into CBIR
  • Learning = prior knowledge (domain theory) +
    observation
  • Prior knowledge should be built into the design
    of a classifier
  • Knowledge for CBIR
  • PK1: For a randomly sampled unlabelled image,
    before seeing any additional evidence, its label
    is negative with higher probability
  • PK2: The negative image samples collected from
    relevance feedback do not sufficiently represent
    the true distribution of the negative image class
  • PK3: For a randomly sampled unlabelled image, its
    probability of being positive can be predicted
    based on the given query and the retrieval
    history

17
  • A general model of knowledge-based CBIR
  • Given
  • a training sample set D = {(xi, yi)},
  • a prior knowledge set K,
  • a classifier f
  • Find the classifier f* which best fits both D
    and K
  • That is,
    f* = argmin_f  sum_i L(f(xi), yi) + lambda R(f, K)
  • where L denotes a loss function and lambda is a
    regularization parameter weighting the prior
    knowledge term R(f, K)

18
  • Kernel biased discriminant analysis
  • When finding a discriminative projection
    direction, different strategies are applied to
    the relevant and irrelevant image classes
  • Relevant image samples are required to be well
    clustered after projection
  • Irrelevant image samples are only required to be
    far from the relevant images
  • S1: the scatter matrix of the irrelevant images
    about the mean of the relevant images
  • S2: the scatter matrix of the relevant images
    about the mean of the relevant images
    (see the sketch after this list)

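A minimal sketch of the linear core of this idea (plain biased discriminant analysis, without the kernel), assuming NumPy/SciPy arrays of relevant (`pos`) and irrelevant (`neg`) feature vectors; the regularizer is an assumption added to keep S2 invertible with few samples:

```python
import numpy as np
from scipy.linalg import eigh

def biased_discriminant_directions(pos, neg, n_dims=2, reg=1e-3):
    """Directions maximizing S1 (irrelevant scatter about the relevant mean)
    relative to S2 (relevant scatter about the relevant mean)."""
    mu = pos.mean(axis=0)
    d_neg, d_pos = neg - mu, pos - mu
    S1 = d_neg.T @ d_neg                               # irrelevant vs. relevant mean
    S2 = d_pos.T @ d_pos + reg * np.eye(pos.shape[1])  # relevant vs. relevant mean
    # Generalized eigenproblem S1 w = lambda S2 w: the largest eigenvalues
    # give projections where irrelevant samples scatter far while relevant
    # samples stay tightly clustered.
    vals, vecs = eigh(S1, S2)
    return vecs[:, np.argsort(vals)[::-1][:n_dims]]
```
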
19
  • Long-term learning from user feedback
  • The log of user feedback from different users and
    different retrieval sessions
  • It shows the consistency of human perception of
    the relationships between images
  • It also includes the subjectivity of human
    perception ---- the relationships may be quite
    noisy
  • Boost the accuracy of the initial retrieval
    result to a higher level, so that the user
    feedback is only for adjusting
  • Some research on using log data has appeared in
    recent years, but it is still at an initial stage
  • Advanced machine learning and data analysis tools
    are needed to handle the vast amount of noisy
    data and extract the information behind it

20
  • Concluding remarks on CBIR
  • CBIR has passed its early years and now
    concentrates on deeper problems
  • On narrow domains, the retrieval performance can
    be quite satisfactory; however, retrieval on
    broad domains still faces many problems
  • For generic CBIR, the human is an indispensable
    component of a retrieval system
  • Incorporating the information in log data will be
    a promising way to significantly improve
    retrieval performance
  • Learning to narrow down the semantic gap
  • Integration of text-based retrieval with
    content-based retrieval

21
Object retrieval in a supermarket --- Where Is
The WeetBix?
  • Efficiently locate an item in a supermarket
  • Formulated as an object retrieval problem
  • Query: an image of the item you are looking for
  • Database: images of all the shelves in a
    supermarket

22
(No Transcript)
23
  • Characteristics of this retrieval problem
  • Target search (Pattern matching)
  • Relatively narrow domain (images within a
    supermarket)
  • No relevance feedback
  • The query image only occupies a small area of the
    database image---all the other areas are actually
    clutter
  • Large scale difference between the query image
    and the database image
  • In each database image, there are often multiple
    copies of each item
  • Each database image is full of striking signs and
    patterns in all sorts of colors

How to find it?
24
  • A local invariant feature approach
  • Interest points/regions in an image
  • Invariant to scale and affine transformations
  • Robust to illumination and viewpoint changes
  • Corners, edges, ridges, multi-junctions,
    blob-like structures, ...
  • Repeatability
  • Descriptors of the detected regions
  • Each image becomes a bag of descriptors

(Mikolajczyk & Schmid, ECCV 2002)
25
Local invariant feature detection: the
Hessian-Affine detector
26
The detected image region
27
A close-up view
28
A close-up view
29
  • Describe the visual content in a detected region
  • The SIFT descriptor (Histogram of local
    gradients)

Example descriptor values: 29 0 1 2 0 0 8 24 132 1
26 14 89 1 0 ...
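A minimal sketch of extracting such descriptors with OpenCV; note that OpenCV's SIFT uses a difference-of-Gaussians detector rather than the Hessian-Affine detector shown earlier, and the filename is hypothetical:

```python
import cv2

# Detect interest regions and compute 128-d SIFT descriptors
# (OpenCV >= 4.4; older versions need the contrib build).
img = cv2.imread("shelf.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors: an N x 128 array, one histogram-of-gradients row per region.
print(descriptors.shape)
```
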
30
A 2272-by-1704 image yields about 20,000 SIFT
features; the database of 3000 images yields 52
million SIFT features in total.
31
  • Each image becomes a bag of descriptors
  • How do we match the query with the database
    images?
  • A straightforward point-to-point comparison is
    computationally infeasible

32
  • Visual vocabulary
  • The descriptors from all images are clustered
    into k clusters. Each cluster is called a visual
    word.
  • Each image is projected to a k-dimensional
    histogram representing the distribution of the
    visual words in this image (see the sketch after
    this list)
  • A more efficient way to evaluate the similarity
    of images

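A minimal sketch of both steps, assuming scikit-learn and a pooled descriptor array `all_descriptors` (a hypothetical name); k is kept small here, unlike the 1M-word vocabulary on the following slides:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Step 1: cluster the pooled descriptors of all database images into
# k visual words (cluster centers).
k = 1000
kmeans = MiniBatchKMeans(n_clusters=k, batch_size=10_000).fit(all_descriptors)

# Step 2: project one image's bag of descriptors onto the vocabulary by
# assigning each descriptor to its nearest visual word and counting.
def to_histogram(descriptors):
    words = kmeans.predict(descriptors)
    return np.bincount(words, minlength=k).astype(np.float64)
```
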
33
Database: 52 million SIFT descriptors.
After normalization, each SIFT descriptor maps to
a point on the unit sphere in 128-d space.
34
Hierarchical clustering via k-means: the 52
million SIFT descriptors are clustered
hierarchically (1 root, then 100, 10K, and
finally 1M nodes).
35
Descriptors in the same cluster have similar
visual appearance ----- a visual word. If two
images contain descriptors falling in the same
cluster, they share this visual word.
Some visual words
36
  • Each image becomes a histogram

(w1, w2, ..., w1M): a 1M-dimensional vector in
which wi shows how many instances of word i can be
found in this image.

Actually, this vector is very sparse: only about
20K of the entries are non-zero. This
representation is much more efficient than a
20K-by-128 matrix full of SIFT descriptors.
37
(Figure: the 1M-dimensional histogram vector
(w1, w2, ..., w1M).)
38
  • Similarity between two images
  • Count the number of non-zero dimensions shared
    by the two sparse vectors,
  • or take the exact count of each visual word into
    account (see the sketch below)

im1  0 0 1 0 1 0 0 3 0
im2  1 0 2 0 0 1 0 1 1
min  0 0 1 0 0 0 0 1 0   (2 shared words)
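A minimal sketch of both options on the toy vectors above:

```python
import numpy as np

im1 = np.array([0, 0, 1, 0, 1, 0, 0, 3, 0])
im2 = np.array([1, 0, 2, 0, 0, 1, 0, 1, 1])

# Option 1: count the non-zero dimensions (visual words) the images share.
shared = np.count_nonzero((im1 > 0) & (im2 > 0))   # -> 2

# Option 2: take the counts into account via bin-wise minima, as in the
# table above, and sum them.
intersection = np.minimum(im1, im2).sum()          # -> 2
```
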
39
However, in text categorization, when classifying
a file, some of the words are not as descriptive
as others, such as "the", "and", "is", etc. The
same situation exists for visual words, too. How
can we know which visual words are discriminative
and which are not?
40
Previous work in text categorization has a good
solution: a weighting scheme named term
frequency-inverse document frequency (tf-idf),

    ti = (nid / nd) x log(N / ni)

where
    nid --- the number of occurrences of word i in
            document d,
    nd  --- the total number of words in document d,
    ni  --- the number of occurrences of word i in
            the whole database,
    N   --- the number of documents in the whole
            database,
    ti  --- the weight of word i.
41
  • Stop list
  • The weights of some visual words are extremely
    high, whereas those of others are extremely low.
    They need to be ignored in retrieval because
  • an extremely low weight means such a visual word
    is so non-discriminative that it helps nothing in
    retrieval, and
  • an extremely high weight means such a visual word
    will dominate the similarity evaluation.
  • A widely used practice for the stop list is to
    ignore the 5% of visual words with the highest or
    lowest weights.

42
  • A loose geometric constraint
  • The local features are much richer in a database
    image.
  • As a result, a high matching score can easily be
    obtained between the query image and most
    database images.
  • Partition each database image into 25 overlapping
    sub-images, each of which is one ninth of the
    area of the original (see the sketch after this
    list).
  • A sub-image is large enough to contain the object
    to retrieve but has much less background clutter.
  • Neighboring sub-images overlap by half their area
    to reduce the risk of splitting one object across
    two sub-images.

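A minimal sketch of this partition, assuming a NumPy image array; for a 5x5 grid, windows of one third of the width and height placed at half-window strides give exactly 25 sub-images with half-area overlap:

```python
def sub_images(image, grid=5):
    """Partition an image into grid x grid overlapping windows."""
    h, w = image.shape[:2]
    win_h, win_w = h // 3, w // 3            # each window: 1/9 of the area
    step_h, step_w = win_h // 2, win_w // 2  # half-window stride -> half overlap
    return [image[r:r + win_h, c:c + win_w]
            for r in range(0, step_h * grid, step_h)
            for c in range(0, step_w * grid, step_w)]
```
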
43
  • Similarity measure

44
One last issue: the speed
(Figure: the database as a 3000 x 1M sparse matrix
of histograms, one row per image, im1 ... im3000.)
Although we can take advantage of the sparsity,
scanning through 3000 images will still cost some
time.
45
What does an inverted file mean?
(Figure: the same 3000 x 1M sparse matrix of
histograms, rows im1 ... im3000.)
46
Actually, in a normal (forward) file, given an
image id, we get the words it contains, but in an
inverted file, given a visual word, we know the
ids of the images containing it! (A sketch follows
below.)
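A minimal sketch of an inverted file, assuming `database_word_lists` (a hypothetical name) holds the list of visual word ids per database image:

```python
from collections import defaultdict

# Build: visual word id -> set of image ids containing that word.
inverted = defaultdict(set)
for image_id, words in enumerate(database_word_lists):
    for w in words:
        inverted[w].add(image_id)

def candidate_images(query_words):
    """Rank images by how many of the query's visual words they contain;
    only images sharing at least one word are ever touched."""
    votes = defaultdict(int)
    for w in set(query_words):
        for image_id in inverted[w]:
            votes[image_id] += 1
    return sorted(votes, key=votes.get, reverse=True)
```
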
47
With the visual vocabulary and the inverted file,
we can search through a database containing 52
million 128-dimensional features within 0.025
seconds.
48
(No Transcript)
49
(No Transcript)
(Figures, slides 50-61: the spatial consistency
check between matched regions in a query image A
and a database image B.)
62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
(No Transcript)
66
(No Transcript)
67
(No Transcript)
68
  • Thanks!
  • Questions?