1
Content-Based Image Retrieval at the End of the
Early Years
  • A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain

2
Overview
  • Scope: domain, sources of knowledge
  • Image processing: color, texture, shape
  • Image partition and feature identification
  • Image interpretation and similarity assessment
  • Use: query, display, and interaction

3
Applications
  • Search by association: browsing a large set of
    images
  • Looking for interesting things
  • Typically starts with a sketch or an example
  • Iterative refinement of the search
  • Target search: search for
  • a precise copy of an image (art catalogs)
  • another image of a particular object
  • an image similar to example images (general
    catalog search)

4
Applications
  • Category Search
  • Looking for an arbitrary image from a specific class
  • Search may be based on example(s)
  • Usually applied in specific domain with clear
    definition of similarity
  • See Fig. 3

5
Image Domains
  • A narrow domain has a limited and predictable
    variability in all relevant aspects of its
    appearance.
  • Usually recording circumstances similar
  • Semantics well-defined and unique
  • Examples: lithographs, frontal views of faces
  • A broad domain has an unlimited and unpredictable
    variability in its appearance even for the same
    semantic meaning.
  • Images are polysemic, with only partial semantic
    descriptions
  • Objects of unknown class; multiple possible
    interpretations

6
Image Domains
  • The breadth of the domain guides the pattern of
    use, the selection of features, and system design
  • Narrow domains have smaller gap between features
    and interpretation
  • computational techniques may close that gap
  • Broad domains can only be described at high
    levels
  • characterize illumination, occlusion, perspective

7
The Sensory Gap
  • The gap between the object in the world and the
    information in a (computational) description
    derived from a recording of that scene
  • Uncertainty about state of objects in scene
  • More severe without knowledge of recording
    circumstances
  • Extra information about the scene and the sensing
    narrows the gap

8
Domain Knowledge
  • Image search and retrieval relies on explicit
    knowledge of the image domain
  • Literal equality and similarity
  • Considering pixels/features regardless of
    physical or perceptual causes (e.g., images with
    blue in the upper part may be outdoor scenes)
  • Reasons for similarity are irrelevant to this kind
    of knowledge; it is purely syntactic
  • Human perception of equality and similarity
  • Some color spaces designed to match human
    perception of color similarity
  • Conspicuous features or regions may be
    identified by analysis of human perception

9
Domain Knowledge
  • Differences in sensing and object surface
  • Physics of illumination, reflection, etc
  • Allow description of color features independent
    of illumination, viewpoint
  • Patterns in space
  • Geometry, independent of surface or sensing
  • E.g., objects near the horizon appear smaller
  • Category-based knowledge
  • Characteristics common to a particular class
  • Most useful in narrow domains
  • E.g., medieval art attaches particular meaning to
    color and relative position
  • Man-made customs, culture, language
  • See Fig. 5

10
The Semantic Gap
  • The lack of coincidence between the information
    that one can extract from the visual data and the
    interpretation that the same data have for a user
    in a given situation
  • Images cannot be fully described linguistically
  • Unknown how to automatically recognize objects in
    an image
  • Research focus on associating high-level
    semantics with image data
  • Current solutions: labeling (expensive, often
    incomplete) and association with contextual text
  • The user seeks semantic similarity; the database
    only provides similarity by data processing

11
Scope
  • Target search related to problem of pattern
    matching
  • Complicated by huge search space, incomplete
    query specification, incomplete image
    description, varied sensing conditions
  • Category search related to problem of object
    recognition and statistical pattern matching
  • Complicated by huge number of object classes,
    interaction, absence of learning/tuning.
  • Search by association limited by semantic gap

12
Image Processing
  • Goal: enhance query-relevant aspects of the image
    data; reduce the rest
  • One tool: invariance
  • Deal with distortions introduced by the sensory
    gap
  • Invariant features may carry more relevant
    information
  • May allow identification of objects, though
    losing some information content (which could be
    OK)
  • An invariant shouldn't be so broad that it can't
    distinguish among essential features

13
Color Image Processing
  • Two main issues
  • Recorded color very sensitive to sensing
    conditions
  • Human perception of color (esp. similarity) is
    complex
  • RGB colorspace not great for real world
    applications
  • Other colorspaces offer more invariance
  • Opponent color scheme: the two chromatic axes can
    be downsampled due to their low perceptual
    significance, and offer some invariance to
    illumination intensity and shadow
  • HSV: hue is invariant under the orientation of the
    object with respect to the light source and camera
    (sketched below)
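A minimal sketch (not from the slides) of the hue invariance claimed above: scaling the intensity of an RGB triple leaves its hue unchanged. It uses only Python's standard-library colorsys module; the sample color is arbitrary.

```python
import colorsys

def hue_of(rgb):
    """Return hue in [0, 1) for an RGB triple with channels in [0, 1]."""
    h, _s, _v = colorsys.rgb_to_hsv(*rgb)
    return h

base = (0.8, 0.3, 0.2)                 # a reddish surface patch
dimmed = tuple(0.5 * c for c in base)  # same patch under half the light

# The raw RGB values differ, but hue is identical under this intensity
# scaling, illustrating the invariance claimed for HSV above.
print(hue_of(base), hue_of(dimmed))    # both ~0.028
```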

14
Color Image Processing
  • Based on studies of reflection, other color spaces
    have been developed that are robust against
    changes in viewpoint
  • Human perception of color constancy despite
    changes in illumination can be modeled with
    carefully determined color maps

15
Shape and Texture Image Processing
  • Local Shape refers to the identification of
    conspicuous geometric details
  • Texture is anything other than color and shape
  • Typically composed of units too small to be
    perceived individually
  • Separating color/shape/texture may not be helpful
  • Only in combination can they help distinguish
    within large image sets

16
Description of Content
  • Often desirable to divide image into subregions
    or sets of points
  • Supports more precise description
  • Several ways to partition
  • Strong segmentation divide image into regions
    that exactly represent silhouettes of objects
  • Difficult except in very narrow domains such as
    mugshots
  • Weak segmentation divide image into regions with
    similar characteristics hopefully objects are
    contained within each region
  • Partitioning dividing up image regardless of
    data/content

17
Features
  • Emerge from image division
  • Accumulating features operate across the entire
    image or tiles of the image
  • Color histogram (a tiled variant is sketched at
    the end of this slide)
  • Possibly augmented with texture or distance
    information
  • Compression transforms
  • Intention: preserve all relevant information
  • Should preserve distance information
  • Should carry semantics of image components
  • Should allow indexing of compressed image
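A minimal sketch of the tiled color histogram mentioned above, assuming a NumPy image of shape (H, W, 3) with 8-bit channels; the grid size and bin count are arbitrary choices, not values from the paper.

```python
import numpy as np

def tiled_color_histogram(img, grid=2, bins=8):
    """Concatenate a coarse RGB histogram from each tile of a grid x grid split."""
    h, w, _ = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            tile = img[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            # Joint 3-D histogram over (R, G, B), flattened to a vector
            hist, _ = np.histogramdd(tile.reshape(-1, 3),
                                     bins=(bins, bins, bins),
                                     range=((0, 256),) * 3)
            feats.append(hist.ravel() / hist.sum())   # normalize per tile
    return np.concatenate(feats)

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
print(tiled_color_histogram(img).shape)   # (2*2*8*8*8,) = (2048,)
```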

18
Features
  • Salient features
  • Weak segmentation yields homogeneous regions
  • Store information about the most conspicuous
    regions
  • Goal: store the information most robust to changes
  • Often based on invariants
  • Yields regions or points with known location
  • Signs
  • Content of an image with one clear meaning
  • Icon, character, trademark
  • Semantic connection is clear

19
Features
  • Shape and Object Features
  • Often desirable to extract object-specific
    information
  • Relies on effective segmentation, which can be
    hard
  • May suffice to detect shapes in the image and
    infer presence of object
  • Relies on techniques such as Fourier features,
    edge detection, contour modeling

20
Interpretation and Similarity
  • How to give meaning to features?
  • One possibility assign a semantic interpretation
  • Semantic features aim at encoding interpretations
    of the image which may be relevant to the
    application
  • Each feature may tell us something about the
    range of possible interpretations of the image,
    yielding a weak semantics
  • Alternatively, try to characterize
    images/features in terms of similarity

21
Similarity
  • Could be the distance between two features,
    considered as vectors
  • Could be a probabilistic measure based on
    psychological modeling of stimulus, noise
  • Could be one of several measures of distance
    between histograms (two are sketched below)
  • Or a measure of similarity of object shapes
  • Or a measure of similarity of layout or structure
    (esp in narrow domains)
  • Or a measure of the identity of salient points
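A minimal sketch of two of the histogram measures listed above: Euclidean distance between histograms viewed as vectors, and histogram intersection (a similarity rather than a distance, known from Swain and Ballard's color-indexing work). Inputs are assumed to be normalized histograms.

```python
import numpy as np

def euclidean(h1, h2):
    """Distance between two histograms viewed as feature vectors."""
    return np.linalg.norm(h1 - h2)

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms,
    0.0 for histograms with no overlapping mass."""
    return np.minimum(h1, h2).sum()

a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
print(euclidean(a, b))      # ~0.141
print(intersection(a, b))   # 0.9
```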

22
Interaction
  • User interaction with an image database more
    complex than traditional information retrieval
  • Image meaning only emerges from context (this is
    the semantic gap again)
  • Process depends on the user, the precise set of
    images, and semantic interpretations
  • Depends on query space, within which specific
    queries are specified and further refined through
    interaction

23
Query Space
  • A query may have 4 components
  • Identification of a subset of images from a large
    archive
  • E.g. in terms of owner, date of creation, URL
  • Selection of features along which to reach the
    goal
  • At minimum, user should be able to indicate
    shape/color/texture as being important or not
  • User may also be capable of indicating interest
    in invariant qualities (eg viewpoint)
  • Similarity function
  • User may provide weights indicating how important
    various features are in determining similarity (a
    weighted combination is sketched below)
  • Labels that indicate semantics related to goal

24
Initial Query Space
  • At first, the query space shouldn't bias domain or
    distance, so:
  • Start with entire set of images
  • Features should be normalized to be equally
    important
  • An unbiased similarity measure can be made by
    normalizing similarity between individual
    features to a fixed range (sketched at the end of
    this slide)
  • Assign a probability to each (image,label) pair,
    rather than a certain label
  • Queries themselves may be exact or approximate
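A minimal sketch of the normalization step above: rescaling every feature dimension to [0, 1] over the whole collection so that no single feature dominates the initial, unbiased similarity measure. Min-max scaling is one of several reasonable choices.

```python
import numpy as np

def normalize_features(X):
    """X: (n_images, n_features). Min-max rescale each column to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
    return (X - lo) / span

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0]])
print(normalize_features(X))   # each column now spans [0, 1]
```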

25
Exact Queries
  • Answers will be a set of images that satisfy some
    query criteria
  • Query by spatial predicate allows user to
    describe spatial structure of image
  • Query by image predicate allows description of
    global features of the image, e.g., mostly blue,
    some yellow, amount of sky > 50%
  • Query by group predicate allows queries based on
    labels, where the set of possible labels
    partitions the set of images
  • Can be basis for hierarchical queries, either of
    contextual information or of content

26
Approximate Queries
  • Allow the user to give (e.g.) a feature vector or
    spatial arrangement of features; no one image
    will exactly satisfy the query
  • Query by spatial example allows user to provide
    a feature-annotated sketch of what they want
  • Query by image example allows the user to provide
    an image; the answer is a set of the nearest
    neighbors in feature space (sketched below)
  • Less computation if the query image is internal
  • Query by group example allows the user to provide
    a set of images that together describe the goal
    (or its opposite)
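A minimal sketch of query by image example as described above: rank the whole collection by distance to the query's feature vector and return the k nearest neighbors. A real system would use the index structures of the later slides instead of this linear scan.

```python
import numpy as np

def nearest_neighbors(query, X, k=5):
    """query: (d,) feature vector; X: (n_images, d). Returns top-k row indices."""
    dists = np.linalg.norm(X - query, axis=1)   # distance to every image
    return np.argsort(dists)[:k]                # indices of the k closest

X = np.random.rand(1000, 64)   # stand-in feature vectors for 1000 images
q = np.random.rand(64)         # features extracted from the example image
print(nearest_neighbors(q, X, k=3))
```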

27
Display
  • Important to display answers in a meaningful
    arrangement
  • If the query was exact, then the set of answers
    can just be displayed in arbitrary order
  • If it was approximate, could just display some
    number of the closest matches
  • Alternatively, images can be visualized within
    feature space using techniques that illustrate
    distance between them

28
Interaction in Query Space
  • Early basic model: iterative revision of the query
  • But with approximate queries, better to think of
    the query space itself being modified as the
    system learns the user's goal
  • Requires some way of getting more information
    from user (without big burden)
  • Not all of the query space is necessarily updated
  • Could zoom in on a subset of images
  • Could train the system about the desired
    similarity measure
  • Ultimately requires dynamic indexing of database

29
Storage and Indexing
  • Computational performance is an issue!
  • Store feature vectors linearly in a file?
  • Some systems calculate 10,000 features per image,
    or 512² histograms of 512 bins each
  • Very difficult to index in high-dimensional spaces
  • Some techniques to deal with this problem
  • Space partitioning
  • Data partitioning
  • Distance-based techniques

30
Space partitioning
  • Associate every node in a tree with a region of
    feature space
  • If too many points in the region, split the
    region into subregions that are children of the
    original node
  • Example: the k-d tree, a generalization of the
    binary tree to k dimensions
  • Splits a full node along one of the k dimensions,
    dividing the data at the median value (sketched
    below)
  • Can be extended to produce the balanced k-d-B-tree
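A minimal sketch of the k-d tree construction just described: a full node is split along one of the k dimensions (cycling by depth) at the median of the data. Search and balancing are omitted; the leaf size is an arbitrary choice.

```python
import numpy as np

def build_kdtree(points, depth=0, leaf_size=8):
    """points: (n, k) array. Returns a nested (axis, median, left, right)
    tuple, or the raw points at a leaf."""
    if len(points) <= leaf_size:
        return points                      # leaf: store the points directly
    axis = depth % points.shape[1]         # cycle through the k dimensions
    points = points[np.argsort(points[:, axis])]
    mid = len(points) // 2                 # split at the median value
    return (axis, points[mid, axis],
            build_kdtree(points[:mid], depth + 1, leaf_size),
            build_kdtree(points[mid:], depth + 1, leaf_size))

tree = build_kdtree(np.random.rand(100, 3))   # 100 points in 3-D feature space
```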

31
Data partitioning
  • Associate each point in feature space with a
    region representing its neighborhood
  • R-tree indexes hyper-rectangles in M-dimensional
    space
  • Leaf nodes are minimum bounding rectangles of
    regions in feature space
  • Internal nodes are rectangles enclosing all
    children
  • Variants differ in their policies on how rectangles
    may overlap and how nodes are split (the basic
    bounding-rectangle test is sketched below)
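A minimal sketch of the R-tree's basic ingredients referenced above: the minimum bounding rectangle (MBR) of a region and the rectangle-overlap test used when descending the tree. Construction and split policies are omitted.

```python
import numpy as np

def mbr(points):
    """Minimum bounding rectangle of a point set: (lower corner, upper corner)."""
    return points.min(axis=0), points.max(axis=0)

def overlaps(r1, r2):
    """True if two hyper-rectangles intersect in every dimension."""
    (lo1, hi1), (lo2, hi2) = r1, r2
    return bool(np.all(lo1 <= hi2) and np.all(lo2 <= hi1))

region = mbr(np.random.rand(20, 2))                   # a leaf's MBR
query = (np.array([0.4, 0.4]), np.array([0.6, 0.6]))  # a query rectangle
print(overlaps(region, query))   # descend into this node only if True
```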

32
Distance-based techniques
  • Based on particular examples (vantage points)
  • Feature space organized into concentric rings
    around each vantage point (sketched below)
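A minimal sketch of the vantage-point idea above: precompute each image's distance to a vantage point; by the triangle inequality, an image at distance d_i from the vantage point can lie within radius r of a query at distance d_q only if |d_i - d_q| <= r, so entire rings of feature space can be skipped.

```python
import numpy as np

def candidates(query, vantage, X, radius):
    """Indices of images that could be within `radius` of the query."""
    d_q = np.linalg.norm(query - vantage)
    d_i = np.linalg.norm(X - vantage, axis=1)   # precomputable offline
    return np.where(np.abs(d_i - d_q) <= radius)[0]

X = np.random.rand(1000, 16)   # stand-in feature vectors
v = X[0]                       # an arbitrary vantage point
q = np.random.rand(16)         # query features
print(len(candidates(q, v, X, radius=0.5)))   # only these need exact checks
```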

33
Conclusions
  • Driving forces: wide availability of sensors,
    falling price of storage, and the Internet
  • Major need: foundations
  • Classification of usage types, purposes
  • Image retrieval is not the same as image
    understanding
  • New perspective on image segmentation (or maybe
    it's not necessary at all)
  • Color image processing increasingly important as
    domains get larger and broader
  • Human-based measures of similarity also
    increasingly important

34
Many outstanding problems
  • New database techniques for high-dimensional
    indexes, large data sets, new forms of queries
  • Techniques for evaluating image retrieval systems
  • Benchmarks, standards
  • Closing the semantic gap
  • Significant obstacle to widely usable tools
  • Need more complex ways of attaching other
    information to images and queries
  • Integration of natural language processing and
    computer vision?