Title: CSM06 Information Retrieval
CSM06 Information Retrieval
- LECTURE 7 Tuesday 16th November
- Dr Andrew Salway
- a.salway@surrey.ac.uk
Lecture 8 Image Retrieval
- Different kinds of metadata for visual information
- Manual Annotation of Images
- Similarity-based Image Retrieval using Perceptual Features, e.g. QBIC and Blobworld
- The Sensory Gap, and the Semantic Gap
- Automatic Image Annotation using Collateral Text, e.g. the WebSEEK system
Image Data
- Raw image data: a bitmap with a value for every picture element (pixel) (cf. vector graphics)
- Captured with a digital camera or scanner
- Different kinds of images:
  - Photographs: people, scenes, actions; holiday albums, criminal investigations
  - Fine art and museum artefacts
  - Medical images: x-rays, scans
  - Geographic Information Systems: images from satellites
  - Meteorological images
Querying Strategies
- Language-based query
  - Text-based query describing entities, actions, meanings, etc.
- By visual example
  - Sketch-based query: draw coloured regions
  - Image: choose an image and ask for more which are visually similar
Image Description Exercise
- Imagine you are the indexer of an image collection.
- 1) List all the words you can think of that describe the following image, so that it could be retrieved by as many users as possible who might be interested in it. Your words do NOT need to be factually correct, but they should show the range of things that could be said about the image.
- 2) Try and put your words into groups, so that each group of words says the same sort of thing about the image.
- 3) Which words (metadata) do you think a machine could extract from the image automatically?
Words to describe the image
Organising Image Metadata
- "A picture is worth a thousand words": it's a cliché, but then clichés are often true
- These words relate to different aspects of the image, so:
  - we need to have labels to talk about different kinds of metadata for images
  - and to structure how we store metadata
- Some kinds of metadata are more likely to require human input than others
Metadata for Visual Information
- Del Bimbo (1999): content-independent, content-dependent, content-descriptive
- Shatford (1986) in effect refines content-descriptive into pre-iconographic, iconographic and iconological, based on Panofsky's ideas
3 Kinds of Metadata for Visual Information (Del Bimbo 1999)
- Content-independent: data which is not directly concerned with image content, and could not necessarily be extracted from it, e.g. artist name, date, ownership
- Content-dependent: perceptual facts to do with colour, texture, shape; can be automatically (and therefore objectively) extracted from image data
- Content-descriptive: entities, actions, relationships between them, as well as meanings conveyed by the image; more subjective and much harder to extract automatically
Three Levels of Visual Content
- Pre-iconographic: generic who, what, where, when
- Iconographic: specific who, what, where, when
- Iconological: abstract "aboutness"
- Based on Panofsky (1939), adapted by Shatford (1986) for indexing visual information
Image Annotation: manual
- Keyword-based descriptions of image content can be manually annotated
- May use a controlled vocabulary and consensus decisions to minimise subjectivity and ambiguity
Systems
- For examples of manually annotated image libraries, see:
  - www.tate.org.uk
  - www.corbis.com
- Iconclass has been developed as an extensive classification scheme for the content of paintings, see:
  - www.iconclass.nl
Visual Similarity
- Remember: images can be indexed / queried at different levels of abstraction (cf. del Bimbo's metadata scheme)
- When dealing with content-dependent metadata (e.g. perceptual features like colour, texture and shape) it is possible to automate indexing
- To query:
  - draw coloured regions (sketch-based query)
  - or choose an example image (query by example)
- Images with similar perceptual features are retrieved (not necessarily similar semantic content)
Similarity-based Retrieval
- Perceptual features (for visual similarity):
  - Colour
  - Texture
  - Shape
  - Spatial relations
- These features can be computed directly from image data; they characterise the pixel distribution in different ways
- Different features may help retrieve different kinds of images
Perceptual Features: Colour
- Colour can be computed as a global metric, i.e. a feature of an entire image, or of a region
- Colour is considered a good metric because it is invariant to image translation and rotation, and changes only slowly under the effects of different viewpoints, scale and occlusion
- Colour values of pixels in an image are discretized and a colour histogram is made to represent the image / region (a minimal sketch follows)
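To make the idea concrete, here is a minimal sketch of colour-histogram indexing, assuming images arrive as lists of RGB tuples; the bin count and the use of histogram intersection are illustrative choices, not details from the lecture.

```python
from collections import Counter

BINS = 4  # quantization levels per channel -> 4*4*4 = 64 colour buckets

def colour_histogram(pixels):
    """pixels: iterable of (r, g, b) tuples with 0-255 values.
    Returns a normalised histogram over quantized colours."""
    counts = Counter(
        (r * BINS // 256, g * BINS // 256, b * BINS // 256)
        for r, g, b in pixels
    )
    n = sum(counts.values())
    return {bucket: c / n for bucket, c in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical colour distributions."""
    return sum(min(p, h2.get(bucket, 0.0)) for bucket, p in h1.items())

# Toy example: a mostly-red image vs. a blue one scores near 0.
red_image = [(200, 30, 30)] * 90 + [(180, 60, 40)] * 10
blue_image = [(20, 40, 210)] * 100
print(histogram_intersection(colour_histogram(red_image),
                             colour_histogram(blue_image)))
```

Because the histogram discards pixel positions entirely, it is unchanged by translation and rotation, which is exactly the invariance the slide credits colour with.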
Perceptual Features: Texture
- There is less agreement about what constitutes texture, and there are a variety of metrics
- Generally they capture patterns in the image data (or the lack of them), e.g. repetitiveness and granularity (a toy measure is sketched below)
- (Compare the texture of a brick wall, a stainless steel kettle, ripples in a puddle and a grassy field)
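As one toy illustration of a granularity-style measure (the lecture names no specific texture metric, so this choice is an assumption made for illustration), the mean local variance of grey levels scores smooth regions near zero and busy, granular regions high:

```python
def local_variance_texture(grey, window=3):
    """grey: 2D list of grey levels (0-255). Returns the mean variance
    over all window x window neighbourhoods, as a crude texture score."""
    h, w = len(grey), len(grey[0])
    variances = []
    for y in range(h - window + 1):
        for x in range(w - window + 1):
            patch = [grey[y + dy][x + dx]
                     for dy in range(window) for dx in range(window)]
            mean = sum(patch) / len(patch)
            variances.append(sum((v - mean) ** 2 for v in patch) / len(patch))
    return sum(variances) / len(variances)

flat = [[128] * 8 for _ in range(8)]  # uniform, like a plain wall: score 0
stripes = [[255 if x % 2 else 0 for x in range(8)] for _ in range(8)]
print(local_variance_texture(flat), local_variance_texture(stripes))
```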
Perceptual Features: Shape
- Unless it is a simple geometric form (e.g. rectangle, circle, triangle), an object's shape will be captured by a set of features relating to, e.g. (a sketch of the first three follows this list):
  - Area
  - Elongatedness
  - Major axis orientation
  - Shape outline
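A hedged sketch of the first three features, computed from a binary object mask with image moments; moments are a standard way to obtain these numbers, though the lecture does not commit to any particular method:

```python
import math

def shape_features(mask):
    """mask: 2D list of 0/1 values marking the object's pixels."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v]
    area = len(pts)
    cx = sum(x for x, _ in pts) / area
    cy = sum(y for _, y in pts) / area
    # Second-order central moments of the pixel coordinates.
    mxx = sum((x - cx) ** 2 for x, _ in pts) / area
    myy = sum((y - cy) ** 2 for _, y in pts) / area
    mxy = sum((x - cx) * (y - cy) for x, y in pts) / area
    # Major axis orientation, in radians from the x-axis.
    orientation = 0.5 * math.atan2(2 * mxy, mxx - myy)
    # Eigenvalues of the coordinate covariance give the principal axes;
    # their ratio is a simple elongatedness measure (1 = not elongated).
    common = math.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    lam1 = (mxx + myy) / 2 + common
    lam2 = (mxx + myy) / 2 - common
    elongatedness = math.sqrt(lam1 / lam2) if lam2 > 0 else float("inf")
    return {"area": area, "orientation": orientation,
            "elongatedness": elongatedness}

# A 2x6 horizontal bar: clearly elongated, orientation ~0 (along x-axis).
bar = [[1] * 6, [1] * 6]
print(shape_features(bar))
```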
Similarity-based Retrieval
- Based on a distance function: one metric, or a combination of metrics, e.g. colour, shape, texture, is chosen to measure similarity between images or regions
- Key features may be extracted for each image/region to reduce dimensionality
- Retrieval is a matter of finding the nearest neighbours to the query (sketch-based or example image), as sketched below
- Similarity-based retrieval is more appropriate for some kinds of image collections / users than others
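The sketch below puts the pieces together, assuming each image has already been reduced to a small feature vector; the feature names, weights and collection are invented for illustration:

```python
import math

WEIGHTS = {"colour": 0.5, "texture": 0.3, "shape": 0.2}  # assumed mix

def distance(a, b):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (a[f] - b[f]) ** 2 for f, w in WEIGHTS.items()))

def retrieve(query, collection, k=2):
    """Return the k images whose feature vectors lie nearest the query."""
    return sorted(collection, key=lambda item: distance(query, item[1]))[:k]

collection = [
    ("sunset.jpg",  {"colour": 0.90, "texture": 0.20, "shape": 0.10}),
    ("forest.jpg",  {"colour": 0.30, "texture": 0.80, "shape": 0.40}),
    ("sunrise.jpg", {"colour": 0.85, "texture": 0.25, "shape": 0.15}),
]
query = {"colour": 0.88, "texture": 0.20, "shape": 0.10}  # from example image
print([name for name, _ in retrieve(query, collection)])
# -> ['sunset.jpg', 'sunrise.jpg']
```

Note that the ranking reflects perceptual closeness only: the nearest neighbours need not share the query's semantic content, which is the point made earlier about visual similarity.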
Systems
- For examples of image retrieval systems using visual similarity, see:
  - QBIC (Query By Image Content), developed by IBM and used by, among others, the Hermitage Art Museum: http://wwwqbic.almaden.ibm.com/
  - Blobworld, developed by researchers at the University of California: http://elib.cs.berkeley.edu/photos/blobworld/start.html
The Sensory Gap
- "The sensory gap is the gap between the object in the world and the information in a (computational) description derived from a recording of that scene." (Smeulders et al. 2000)
The Semantic Gap
- "The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation." (Smeulders et al. 2000)
Possible Solution?
- "One way to resolve the semantic gap comes from sources outside the image by integrating other sources of information about the image in the query. Information about an image can come from a number of different sources: the image content, labels attached to the image, images embedded in a text, and so on." (Smeulders et al. 2000)
Image Annotation using collateral text
- Images are often accompanied by, or associated with, collateral text, e.g. the caption of a photograph in a newspaper, or the caption of a painting in an art gallery
- Keywords, and possibly other information, can be extracted from the collateral text and used to index the image
Image Annotation using collateral text: WebSEEK System
- The WebSEEK system processes HTML tags linking to image data files (as well as processing the image data itself) in order to index visual information on the Web
- Here we concentrate on how WebSEEK exploits collateral text, e.g. HTML tags, for image indexing-retrieval
- NB: current web search engines, like Google and AltaVista, appear to be doing something similar
Image Annotation with collateral text: WebSEEK (Smith and Chang 1997)
- Keyword indexing and subject-based classification for WWW-based image retrieval; the user can query or browse a hierarchy
- The system trawls the Web to find HTML pages with links to images
- The HTML text in which the link to an image is embedded is used for indexing and classifying the image
- >500,000 images and videos indexed with 11,500 terms; 2,128 classes manually created
Image Annotation using collateral text
- The WebSEEK system processed HTML tags linking to image and video data files in order to index visual information on the Web
- The success of this kind of approach depends on how well the keywords in the collateral text relate to the image
- Keywords are mapped automatically to subject categories; the categories are created beforehand with human input
Image Annotation using collateral text: WebSEEK System
- Term Extraction: terms are extracted from URLs, alt tags and hyperlink text, e.g.
  - http://www.mynet.net/animals/domestic-beasts/dog37.jpg
  - -> animals, domestic, beasts, dog
- Terms are used to make an inverted index for keyword-based retrieval
- Directory names are also extracted, e.g. animals/domestic-beasts
- (A sketch of this kind of term extraction follows)
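A rough sketch of URL term extraction in the spirit of the example above; WebSEEK's actual extraction rules are more elaborate, so treat the tokenisation and the inverted-index layout here as illustrative assumptions:

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

def extract_terms(url):
    """Split the URL path into index terms, dropping the file extension
    and any trailing digits (so 'dog37' yields 'dog')."""
    path = urlparse(url).path                    # /animals/domestic-beasts/dog37.jpg
    path = re.sub(r"\.[A-Za-z0-9]+$", "", path)  # strip file extension
    tokens = re.split(r"[/\-_.]+", path)         # split on separators
    return [re.sub(r"\d+$", "", t).lower()
            for t in tokens if re.search(r"[A-Za-z]", t)]

# Building the inverted index: term -> set of image URLs.
index = defaultdict(set)
url = "http://www.mynet.net/animals/domestic-beasts/dog37.jpg"
for term in extract_terms(url):
    index[term].add(url)

print(extract_terms(url))    # -> ['animals', 'domestic', 'beasts', 'dog']
print(sorted(index["dog"]))  # a keyword query retrieves the image's URL
```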
Image Annotation using collateral text: WebSEEK System
- Subject Taxonomy: a manually created is-a hierarchy, with key-term mappings to map key-terms automatically to subject classes
- Facilitates browsing of the image collection (a toy illustration follows)
Image Annotation using collateral text: WebSEEK System
- The success of this kind of approach depends on how well the keywords in the collateral text relate to the image
- URLs, alt tags and hyperlink text may or may not be informative about the image content; even when informative, they tend to be brief; perhaps further kinds of collateral text could be exploited
Image Retrieval in Google
- Rather like WebSEEK, Google appears to match keywords in file names and in the alt caption, e.g. (a sketch of extracting such keywords follows):
- <img src="/images/020900.jpg" width=150 height=180 alt="David Beckham tussles with Emmanuel Petit">
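A small sketch of pulling indexable keywords out of an img tag's alt text and file name; the slide only says engines "appear to" do something like this, so the tokenisation below is an assumption, not Google's algorithm:

```python
import re
from html.parser import HTMLParser

class ImgAltIndexer(HTMLParser):
    """Collects lower-cased alphabetic tokens from <img> alt and src."""
    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        text = (a.get("alt") or "") + " " + (a.get("src") or "")
        self.keywords += [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

parser = ImgAltIndexer()
parser.feed('<img src="/images/020900.jpg" width=150 height=180 '
            'alt="David Beckham tussles with Emmanuel Petit">')
print(parser.keywords)
# -> ['david', 'beckham', 'tussles', 'with', 'emmanuel', 'petit', 'images', 'jpg']
```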
Essential Exercise
- Image Retrieval Exercise: download this from the module webpage.
Further Reading
- A paper about the WebSEEK system:
  - Smith and Chang (1997), "Visually Searching the Web for Content", IEEE Multimedia, July-September 1997, pp. 12-20. Available via the library's eJournal service.
- Different kinds of metadata for images, and an overview of content-based image retrieval:
  - Excerpts from del Bimbo (1999), Visual Information Retrieval; available in the library's short-term loan articles.
- For a comprehensive review of CBIR, and discussions of the sensory gap and semantic gap:
  - Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A. and Jain, R. (2000), "Content-based image retrieval at the end of the early years", IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, Number 12, pp. 1349-1380. Available online through the library's eJournals.
  - Eakins (2002), "Towards Intelligent Image Retrieval", Pattern Recognition 35, pp. 3-14.
  - Enser (2000), "Visual Image Retrieval: seeking the alliance of concept-based and content-based paradigms", Journal of Information Science 26(4), pp. 199-210.