1
Content-Based Image Retrieval at the End of the
Early Years
  • A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain

2
Overview
  • Scope: domain, sources of knowledge
  • Image processing: color, texture, shape
  • Image partition and feature identification
  • Image interpretation and similarity assessment
  • Use: query, display, and interaction

3
Applications
  • Search by association: browsing a large set of
    images
  • Looking for interesting things
  • Typically starts with a sketch or an example
  • Iterative refinement of the search
  • Target search: search for
  • a precise copy of an image (art catalogs)
  • another image of a particular object
  • an image similar to example images (general
    catalog search)

4
Applications
  • Category Search
  • Looking for an arbitrary image from a specific class
  • Search may be based on example(s)
  • Usually applied in specific domain with clear
    definition of similarity
  • See Fig. 3

5
Image Domains
  • A narrow domain has a limited and predictable
    variability in all relevant aspects of its
    appearance.
  • Usually recording circumstances similar
  • Semantics well-defined and unique
  • Examples: lithographs, frontal views of faces
  • A broad domain has an unlimited and unpredictable
    variability in its appearance even for the same
    semantic meaning.
  • Images are polysemic, with only partial semantic
    descriptions
  • Objects of unknown class; multiple possible
    interpretations

6
Image Domains
  • The breadth of the domain guides the pattern of
    use, the selection of features, and system design
  • Narrow domains have smaller gap between features
    and interpretation
  • computational techniques may close that gap
  • Broad domains can only be described at high
    levels
  • characterize illumination, occlusion, perspective

7
The Sensory Gap
  • The gap between the object in the world and the
    information in a (computational) description
    derived from a recording of that scene
  • Uncertainty about state of objects in scene
  • More severe without knowledge of recording
    circumstances
  • Extra information about the scene and the sensing
    narrows the gap

8
Domain Knowledge
  • Image search and retrieval relies on explicit
    knowledge of the image domain
  • Literal equality and similarity
  • Considering pixels/features regardless of
    physical or perceptual causes (e.g., images with
    blue in the upper part may be outdoor scenes)
  • Reasons for similarity are irrelevant to this kind
    of knowledge; it is purely syntactic
  • Human perception of equality and similarity
  • Some color spaces designed to match human
    perception of color similarity
  • Conspicuous features or regions may be
    identified by analysis of human perception

9
Domain Knowledge
  • Differences in sensing and object surface
  • Physics of illumination, reflection, etc
  • Allow description of color features independent
    of illumination, viewpoint
  • Patterns in space
  • Geometry, independent of surface or sensing
  • E.g., objects near the horizon appear smaller
  • Category-based knowledge
  • Characteristics common to a particular class
  • Most useful in narrow domains
  • E.g., medieval art attaches particular meaning to
    color and relative position
  • Man-made customs, culture, language
  • See Fig. 5

10
The Semantic Gap
  • The lack of coincidence between the information
    that one can extract from the visual data and the
    interpretation that the same data have for a user
    in a given situation
  • Images cannot be fully described linguistically
  • Unknown how to automatically recognize objects in
    an image
  • Research focus on associating high-level
    semantics with image data
  • Current solutions: labeling (expensive, often
    incomplete) and association with contextual text
  • The user seeks semantic similarity; the database
    only provides similarity by data processing

11
Scope
  • Target search related to problem of pattern
    matching
  • Complicated by huge search space, incomplete
    query specification, incomplete image
    description, varied sensing conditions
  • Category search related to problem of object
    recognition and statistical pattern matching
  • Complicated by huge number of object classes,
    interaction, absence of learning/tuning.
  • Search by association limited by semantic gap

12
Image Processing
  • Goal: enhance query-relevant aspects of the image
    data; reduce the rest
  • One tool: invariance
  • Deal with distortions introduced by the sensory
    gap
  • Invariant features may carry more relevant
    information
  • May allow identification of objects, though
    losing some information content (which could be
    OK)
  • An invariant shouldn't be so broad that it can't
    distinguish among essential features

13
Color Image Processing
  • Two main issues
  • Recorded color very sensitive to sensing
    conditions
  • Human perception of color (esp. similarity) is
    complex
  • RGB colorspace not great for real world
    applications
  • Other colorspaces offer more invariance
  • Opponent color scheme: the two chromatic axes can
    be downsampled due to their low perceptual
    significance, and offer some invariance to
    illumination intensity and shadow
  • HSV: hue is invariant under the orientation of the
    object with respect to the light source and camera
    (sketched below)
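A minimal sketch (not from the slides) of the hue invariance claimed above: scaling the intensity of an RGB triple leaves its hue unchanged. It uses only Python's standard-library colorsys module; the sample color is arbitrary.

```python
import colorsys

def hue_of(rgb):
    """Return hue in [0, 1) for an RGB triple with channels in [0, 1]."""
    h, _s, _v = colorsys.rgb_to_hsv(*rgb)
    return h

base = (0.8, 0.3, 0.2)                 # a reddish surface patch
dimmed = tuple(0.5 * c for c in base)  # same patch under half the light

# The raw RGB values differ, but hue is identical under this intensity
# scaling, illustrating the invariance claimed for HSV above.
print(hue_of(base), hue_of(dimmed))    # both ~0.028
```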

14
Color Image Processing
  • Based on studies of reflection, other color spaces
    have been developed that are robust against
    changes in viewpoint
  • Human perception of color constancy despite
    changes in illumination can be modeled with
    carefully determined color maps

15
Shape and Texture Image Processing
  • Local Shape refers to the identification of
    conspicuous geometric details
  • Texture is anything other than color and shape
  • Typically composed of units too small to be
    perceived individually
  • Separating color/shape/texture may not be helpful
  • Only in combination can they help distinguish
    within large image sets

16
Description of Content
  • Often desirable to divide image into subregions
    or sets of points
  • Supports more precise description
  • Several ways to partition
  • Strong segmentation divide image into regions
    that exactly represent silhouettes of objects
  • Difficult except in very narrow domains such as
    mugshots
  • Weak segmentation divide image into regions with
    similar characteristics hopefully objects are
    contained within each region
  • Partitioning dividing up image regardless of
    data/content

17
Features
  • Emerge from image division
  • Accumulating features operate across the entire
    image or tiles of the image
  • Color histogram (a tiled variant is sketched at
    the end of this slide)
  • Possibly augmented with texture or distance
    information
  • Compression transforms
  • Intention: preserve all relevant information
  • Should preserve distance information
  • Should carry semantics of image components
  • Should allow indexing of compressed image
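A minimal sketch of the tiled color histogram mentioned above, assuming a NumPy image of shape (H, W, 3) with 8-bit channels; the grid size and bin count are arbitrary choices, not values from the paper.

```python
import numpy as np

def tiled_color_histogram(img, grid=2, bins=8):
    """Concatenate a coarse RGB histogram from each tile of a grid x grid split."""
    h, w, _ = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            tile = img[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            # Joint 3-D histogram over (R, G, B), flattened to a vector
            hist, _ = np.histogramdd(tile.reshape(-1, 3),
                                     bins=(bins, bins, bins),
                                     range=((0, 256),) * 3)
            feats.append(hist.ravel() / hist.sum())   # normalize per tile
    return np.concatenate(feats)

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
print(tiled_color_histogram(img).shape)   # (2*2*8*8*8,) = (2048,)
```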

18
Features
  • Salient features
  • Weak segmentation yields homogeneous regions
  • Store information about the most conspicuous
    regions
  • Goal: store the information most robust to changes
  • Often based on invariants
  • Yields regions or points with known location
  • Signs
  • Content of an image with one clear meaning
  • Icon, character, trademark
  • Semantic connection is clear

19
Features
  • Shape and Object Features
  • Often desirable to extract object-specific
    information
  • Relies on effective segmentation, which can be
    hard
  • May suffice to detect shapes in the image and
    infer presence of object
  • Relies on techniques such as Fourier features,
    edge detection, contour modeling

20
Interpretation and Similarity
  • How to give meaning to features?
  • One possibility assign a semantic interpretation
  • Semantic features aim at encoding interpretations
    of the image which may be relevant to the
    application
  • Each feature may tell us something about the
    range of possible interpretations of the image,
    yielding a weak semantics
  • Alternatively, try to characterize
    images/features in terms of similarity

21
Similarity
  • Could be the distance between two features,
    considered as vectors
  • Could be a probabilistic measure based on
    psychological modeling of stimulus, noise
  • Could be one of several measures of distance
    between histograms (two are sketched below)
  • Or a measure of similarity of object shapes
  • Or a measure of similarity of layout or structure
    (esp in narrow domains)
  • Or a measure of the identity of salient points
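A minimal sketch of two of the histogram measures listed above: Euclidean distance between histograms viewed as vectors, and histogram intersection (a similarity rather than a distance, known from Swain and Ballard's color-indexing work). Inputs are assumed to be normalized histograms.

```python
import numpy as np

def euclidean(h1, h2):
    """Distance between two histograms viewed as feature vectors."""
    return np.linalg.norm(h1 - h2)

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms,
    0.0 for histograms with no overlapping mass."""
    return np.minimum(h1, h2).sum()

a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
print(euclidean(a, b))      # ~0.141
print(intersection(a, b))   # 0.9
```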

22
Interaction
  • User interaction with an image database more
    complex than traditional information retrieval
  • Image meaning only emerges from context (this is
    the semantic gap again)
  • Process depends on the user, the precise set of
    images, and semantic interpretations
  • Depends on query space, within which specific
    queries are specified and further refined through
    interaction

23
Query Space
  • A query may have 4 components
  • Identification of a subset of images from a large
    archive
  • E.g. in terms of owner, date of creation, URL
  • Selection of features along which to reach the
    goal
  • At minimum, user should be able to indicate
    shape/color/texture as being important or not
  • User may also be capable of indicating interest
    in invariant qualities (eg viewpoint)
  • Similarity function
  • User may provide weights indicating how important
    various features are in determining similarity (a
    weighted combination is sketched below)
  • Labels that indicate semantics related to goal

24
Initial Query Space
  • At first, the query space shouldn't bias domain or
    distance, so:
  • Start with entire set of images
  • Features should be normalized to be equally
    important
  • An unbiased similarity measure can be made by
    normalizing similarity between individual
    features to a fixed range (sketched at the end of
    this slide)
  • Assign a probability to each (image,label) pair,
    rather than a certain label
  • Queries themselves may be exact or approximate
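A minimal sketch of the normalization step above: rescaling every feature dimension to [0, 1] over the whole collection so that no single feature dominates the initial, unbiased similarity measure. Min-max scaling is one of several reasonable choices.

```python
import numpy as np

def normalize_features(X):
    """X: (n_images, n_features). Min-max rescale each column to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
    return (X - lo) / span

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0]])
print(normalize_features(X))   # each column now spans [0, 1]
```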

25
Exact Queries
  • Answers will be a set of images that satisfy some
    query criteria
  • Query by spatial predicate allows user to
    describe spatial structure of image
  • Query by image predicate allows description of
    global features of the image, e.g., mostly blue,
    some yellow, amount of sky > 50%
  • Query by group predicate allows queries based on
    labels, where the set of possible labels
    partitions the set of images
  • Can be basis for hierarchical queries, either of
    contextual information or of content

26
Approximate Queries
  • Allow the user to give (e.g.) a feature vector or
    spatial arrangement of features; no one image
    will exactly satisfy the query
  • Query by spatial example allows user to provide
    a feature-annotated sketch of what they want
  • Query by image example allows the user to provide
    an image; the answer is a set of the nearest
    neighbors in feature space (sketched below)
  • Less computation if the query image is internal
  • Query by group example allows the user to provide
    a set of images that together describe the goal
    (or its opposite)
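A minimal sketch of query by image example as described above: rank the whole collection by distance to the query's feature vector and return the k nearest neighbors. A real system would use the index structures of the later slides instead of this linear scan.

```python
import numpy as np

def nearest_neighbors(query, X, k=5):
    """query: (d,) feature vector; X: (n_images, d). Returns top-k row indices."""
    dists = np.linalg.norm(X - query, axis=1)   # distance to every image
    return np.argsort(dists)[:k]                # indices of the k closest

X = np.random.rand(1000, 64)   # stand-in feature vectors for 1000 images
q = np.random.rand(64)         # features extracted from the example image
print(nearest_neighbors(q, X, k=3))
```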

27
Display
  • Important to display answers in a meaningful
    arrangement
  • If the query was exact, then the set of answers
    can just be displayed in arbitrary order
  • If it was approximate, could just display some
    number of the closest matches
  • Alternatively, images can be visualized within
    feature space using techniques that illustrate
    distance between them

28
Interaction in Query Space
  • Early basic model: iterative revision of the query
  • But with approximate queries, better to think of
    the query space itself being modified as the
    system learns the user's goal
  • Requires some way of getting more information
    from user (without big burden)
  • Not all of the query space is necessarily updated
  • Could zoom in on a subset of images
  • Could train the system about the desired
    similarity measure
  • Ultimately requires dynamic indexing of database

29
Storage and Indexing
  • Computational performance is an issue!
  • Store feature vectors linearly in a file?
  • Some systems calculate 10,000 features per image,
    or 512² histograms of 512 bins each
  • Very difficult to index in high-dimensional spaces
  • Some techniques to deal with this problem
  • Space partitioning
  • Data partitioning
  • Distance-based techniques

30
Space partitioning
  • Associate every node in a tree with a region of
    feature space
  • If too many points in the region, split the
    region into subregions that are children of the
    original node
  • Example: the k-d tree, a generalization of the
    binary tree to k dimensions
  • Splits a full node along one of the k dimensions,
    dividing the data at the median value (sketched
    below)
  • Can be extended to produce the balanced k-d-B-tree
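A minimal sketch of the k-d tree construction just described: a full node is split along one of the k dimensions (cycling by depth) at the median of the data. Search and balancing are omitted; the leaf size is an arbitrary choice.

```python
import numpy as np

def build_kdtree(points, depth=0, leaf_size=8):
    """points: (n, k) array. Returns a nested (axis, median, left, right)
    tuple, or the raw points at a leaf."""
    if len(points) <= leaf_size:
        return points                      # leaf: store the points directly
    axis = depth % points.shape[1]         # cycle through the k dimensions
    points = points[np.argsort(points[:, axis])]
    mid = len(points) // 2                 # split at the median value
    return (axis, points[mid, axis],
            build_kdtree(points[:mid], depth + 1, leaf_size),
            build_kdtree(points[mid:], depth + 1, leaf_size))

tree = build_kdtree(np.random.rand(100, 3))   # 100 points in 3-D feature space
```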

31
Data partitioning
  • Associate each point in feature space with a
    region representing its neighborhood
  • R-tree indexes hyper-rectangles in M-dimensional
    space
  • Leaf nodes are minimum bounding rectangles of
    regions in feature space
  • Internal nodes are rectangles enclosing all
    children
  • Variants differ in their policies on how rectangles
    may overlap and how nodes are split (the basic
    bounding-rectangle test is sketched below)
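A minimal sketch of the R-tree's basic ingredients referenced above: the minimum bounding rectangle (MBR) of a region and the rectangle-overlap test used when descending the tree. Construction and split policies are omitted.

```python
import numpy as np

def mbr(points):
    """Minimum bounding rectangle of a point set: (lower corner, upper corner)."""
    return points.min(axis=0), points.max(axis=0)

def overlaps(r1, r2):
    """True if two hyper-rectangles intersect in every dimension."""
    (lo1, hi1), (lo2, hi2) = r1, r2
    return bool(np.all(lo1 <= hi2) and np.all(lo2 <= hi1))

region = mbr(np.random.rand(20, 2))                   # a leaf's MBR
query = (np.array([0.4, 0.4]), np.array([0.6, 0.6]))  # a query rectangle
print(overlaps(region, query))   # descend into this node only if True
```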

32
Distance-based techniques
  • Based on particular examples (vantage points)
  • Feature space organized into concentric rings
    around each vantage point (sketched below)
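A minimal sketch of the vantage-point idea above: precompute each image's distance to a vantage point; by the triangle inequality, an image at distance d_i from the vantage point can lie within radius r of a query at distance d_q only if |d_i - d_q| <= r, so entire rings of feature space can be skipped.

```python
import numpy as np

def candidates(query, vantage, X, radius):
    """Indices of images that could be within `radius` of the query."""
    d_q = np.linalg.norm(query - vantage)
    d_i = np.linalg.norm(X - vantage, axis=1)   # precomputable offline
    return np.where(np.abs(d_i - d_q) <= radius)[0]

X = np.random.rand(1000, 16)   # stand-in feature vectors
v = X[0]                       # an arbitrary vantage point
q = np.random.rand(16)         # query features
print(len(candidates(q, v, X, radius=0.5)))   # only these need exact checks
```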

33
Conclusions
  • Driving forces: wide availability of sensors,
    falling price of storage, and the Internet
  • Major need: foundations
  • Classification of usage types, purposes
  • Image retrieval is not the same as image
    understanding
  • New perspective on image segmentation (or maybe
    it's not necessary at all)
  • Color image processing increasingly important as
    domains get larger and broader
  • Human-based measures of similarity also
    increasingly important

34
Many outstanding problems
  • New database techniques for high-dimensional
    indexes, large data sets, new forms of queries
  • Techniques for evaluating image retrieval systems
  • Benchmarks, standards
  • Closing the semantic gap
  • Significant obstacle to widely usable tools
  • Need more complex ways of attaching other
    information to images and queries
  • Integration of natural language processing and
    computer vision?