Special Topics in Computer Science: Advanced Topics in Information Retrieval, Lecture 5 (book chapter 11)

1
Special Topics in Computer Science: Advanced
Topics in Information Retrieval. Lecture 5 (book
chapter 11): Multimedia IR: Models and
Languages
  • Alexander Gelbukh
  • www.Gelbukh.com

2
Previous Chapter: Conclusions
  • Inverted files seem to be the best option
  • Other structures are good for specific cases,
    e.g., genetic databases
  • Sequential searching is an integral part of
    many indexing-based search techniques
  • Many methods to improve sequential searching
  • Compression can be integrated with search

3
Previous Chapter: Research topics
  • Perhaps, new details in integration of
    compression and search
  • Linguistic indexing allowing linguistic
    variations
  • Search in plural or only singular
  • Search with or without synonyms

4
Motivation
  • Applications: office, CAD, medical, Internet
  • Example: an artist sings a melody and sees all
    the songs with a similar melody

5
What's different
  • Different from text IR
  • Structure of data is more complex. Efficiency is
    an issue
  • Use of metadata
  • Characteristics of multimedia data
  • Operations to be performed
  • Aspects
  • Data modeling: extract and maintain the features
    of objects
  • Data retrieval: based not only on description but
    also on content

6
Retrieval process
  • Query specification
  • fuzzy predicates: similar to ...
  • content predicates: images containing an apple
  • data type predicates: video, ...
  • Query processing and optimization
  • Parsed, compiled, optimized for order of
    execution
  • Problem: many data types, different processing
    for each
  • Answer
  • Relevance: similarity to query
  • Iteration
  • Answer quality is often poor, so the query needs
    to be refined

7
Modeling
8
Data modeling
  • To model is to simplify in order to make the data
    manageable: "We will represent an image as ..."
  • From the user's point of view
  • From the system's point of view (technically)
  • A problem: very large storage size. Modeling is
    needed
  • Objects are represented as feature vectors
  • Images / video: shape (house, car, ...)
  • Sound: style. Music: merry, sad, ...
  • Features are defined directly or by comparison
  • Degree of certainty is stored
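
A minimal sketch of the representation described on this slide, assuming a toy in-memory model: each object is reduced to a few named features, and every feature value is stored together with its degree of certainty. The names `Feature`, `MediaObject` and `has`, and the 0.5 default threshold, are hypothetical and not taken from the book or from any real system.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    value: str        # e.g. "house" for a shape, "sad" for a music style
    certainty: float  # degree of certainty, assumed to lie in [0, 1]


@dataclass
class MediaObject:
    object_id: str
    features: dict = field(default_factory=dict)  # feature name -> Feature

    def has(self, name, value, min_certainty=0.5):
        """True if the feature has this value with at least the given certainty."""
        f = self.features.get(name)
        return f is not None and f.value == value and f.certainty >= min_certainty


image = MediaObject("img-1", {"shape": Feature("house", 0.8)})
song = MediaObject("snd-1", {"style": Feature("sad", 0.4)})
```

With these objects, `image.has("shape", "house")` holds, while `song.has("style", "sad")` fails at the default threshold and succeeds only if the caller accepts a lower certainty.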

9
Multimedia support in commercial DBMSs (1999)
  • Variable length data.
  • Non-standard
  • Different and usually very limited sets of
    operations
  • SQL3
  • provides user-extensible data types
  • Object-oriented
  • Implemented partially in many systems
  • Example: data blades of Informix
  • Content-based functions on text and images
  • E.g., date = 1997 AND contains(car)
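
To make the idea of a user-defined content function inside a query concrete, here is a small sketch that uses SQLite from the Python standard library as a stand-in. It is not Informix and not SQL3; the `images` table, its columns, and the `contains` function are assumptions made for illustration, and the "detected objects" are simply stored as a comma-separated string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER, year INTEGER, objects TEXT)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?)",
    [(1, 1997, "car,tree"), (2, 1997, "house"), (3, 1998, "car")],
)


def contains(objects, name):
    """Hypothetical content predicate: is `name` among the detected objects?"""
    return int(name in objects.split(","))


# Register the Python function so that SQL queries can call it by name.
conn.create_function("contains", 2, contains)

# An ordinary attribute condition combined with the content-based function.
rows = conn.execute(
    "SELECT id FROM images WHERE year = 1997 AND contains(objects, 'car')"
).fetchall()
```

Only image 1 satisfies both conditions, so `rows` holds a single row. Note that the result is an unranked set, which is the limitation of plain SQL that the later slides point out.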

10
Spatial data types
  • Informix: 2D, 3D data blades
  • Boxes, vectors, ...
  • Operations: intersect, contains, center, ...
  • Text: containWords, ...
  • Supports querying images by content
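
The spatial operations named above can be sketched for axis-aligned 2D boxes as follows. This is only an illustration of what such a data type offers; the `Box` class and its method names are hypothetical and do not reproduce the Informix interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float  # assumes x1 <= x2 and y1 <= y2

    def intersects(self, other):
        """True if the two boxes share at least one point."""
        return (self.x1 <= other.x2 and other.x1 <= self.x2
                and self.y1 <= other.y2 and other.y1 <= self.y2)

    def contains(self, other):
        """True if `other` lies entirely inside this box."""
        return (self.x1 <= other.x1 and other.x2 <= self.x2
                and self.y1 <= other.y1 and other.y2 <= self.y2)

    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


a = Box(0, 0, 4, 4)
b = Box(1, 1, 2, 2)
c = Box(5, 5, 6, 6)
```

Here `a` both intersects and contains `b`, while `c` lies completely outside `a`.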

11
Example: MULTOS
  • Multimedia document server
  • Documents are described by
  • logical structure: title, intro, chapter, ...
  • layout structure: pages, frames, ...
  • conceptual structure: allows content-based
    queries
  • Documents with similar conceptual structures are
    grouped into conceptual types
  • Example: Generic_Letter

12
Example of conceptual structure...
13
...continued
14
Image data in MULTOS
  • Analysis
  • low level: detect objects and positions
  • high level: image interpretation
  • Result of analysis
  • description of objects found and their classes
  • certainty values
  • Indices are used for fast access to this info
  • Object index. Includes pointers to objects and
    certainty values
  • Cluster index, with fuzzy clusters of similar
    images
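
A rough sketch of an object index of the kind described above, assuming the image analysis has already produced object classes with certainty values. It is not the MULTOS implementation; the function names and the in-memory dictionary are hypothetical, and the cluster index is not covered.

```python
from collections import defaultdict

# Object index: object class -> list of (image id, certainty) entries.
object_index = defaultdict(list)


def add_image(image_id, detected):
    """`detected` maps an object class to the certainty of its recognition."""
    for obj_class, certainty in detected.items():
        object_index[obj_class].append((image_id, certainty))


def find(obj_class, min_certainty=0.0):
    """Ids of images containing the class, most certain first."""
    hits = [(c, i) for i, c in object_index.get(obj_class, []) if c >= min_certainty]
    return [i for c, i in sorted(hits, reverse=True)]


add_image("img-1", {"car": 0.9, "tree": 0.6})
add_image("img-2", {"car": 0.4})
add_image("img-3", {"house": 0.7})
```

A lookup such as `find("car", min_certainty=0.5)` touches only the entries for one class, which is what makes the access fast compared with re-analysing every image.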

15
Internet
  • How does Google do it?
  • No image processing. Textual context!
  • File names, nearby words
  • Distance from image to words
  • "Give me images with flower in the file name or
    near the image"
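
The idea of retrieving images purely from their textual context can be sketched as below. This is an illustrative toy, not a description of how any real search engine scores images: the function name, the window of 5 words, and the particular weights are all assumptions.

```python
def context_score(query, file_name, page_words, image_pos, window=5):
    """Score an image for a query word using only the text around it.

    `page_words` is the page text as a list of words; `image_pos` is the
    index in that list at which the image appears.
    """
    query = query.lower()
    score = 0.0
    if query in file_name.lower():
        score += 1.0  # the word occurs in the file name
    for pos, word in enumerate(page_words):
        distance = abs(pos - image_pos)
        if word.lower() == query and distance <= window:
            score += 1.0 / (1 + distance)  # closer words count more
    return score


words = "a red flower in our garden and some notes about the weather".split()
near = context_score("flower", "img001.jpg", words, image_pos=3)
named = context_score("flower", "flower.jpg", words, image_pos=11)
```

The first image is credited because the word appears one position away from it; the second is credited only through its file name, since the word is outside the window.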

16
Languages
17
Query languages
  • As a query, either a description of the object or
    an example object is submitted
  • "Show me images similar to this one"
  • Similar in what respects?!
  • Exact match is inadequate. Additional means are
    needed
  • Content is not a single feature

18
What defines a query language
  • Interface: how to enter the query
  • Types of conditions to specify
  • Handling of uncertainty, proximity, weights

19
Interface
  • Browsing and navigation
  • Search: description or query by example
  • Query by example
  • Specify which features are important: "Give me
    all houses with similar shape but different colors"
  • Libraries of examples can be provided
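
One possible way to let the user say which features matter in a query by example is a weighted distance, sketched here under the assumption that every feature has already been reduced to a number. The feature values and the function name are made up for illustration, and the sketch covers only the "similar shape" half of the sample query, not the "different colors" half.

```python
import math


def weighted_distance(example, candidate, weights):
    """Weighted Euclidean distance over the features the user cares about.

    Features that do not appear in `weights` are ignored.
    """
    total = 0.0
    for name, w in weights.items():
        total += w * (example[name] - candidate[name]) ** 2
    return math.sqrt(total)


example = {"shape": 0.9, "color": 0.2}
houses = {
    "h1": {"shape": 0.9, "color": 0.8},  # same shape, different color
    "h2": {"shape": 0.1, "color": 0.2},  # different shape, same color
}

# "Similar shape": only the shape feature carries weight.
weights = {"shape": 1.0}
ranked = sorted(houses, key=lambda h: weighted_distance(example, houses[h], weights))
```

With the weight on shape, `h1` comes first; moving the weight to color reverses the order, which is exactly the "similar in what respects?" question.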

20
Conditions...
  • Attribute predicates
  • structured content: the predefined types,
    extracted beforehand
  • Exact match. E.g., size, type (video, audio, ...)
  • Structural predicates
  • structure: title, sections, ...
  • Metadata are used. "Find objects containing an
    image and a video clip"
  • Semantic predicates
  • unrestricted content
  • "Find all red houses": what is red? What is a
    house? Fuzzy

21
... conditions
  • Predicates
  • Spatial: contain, intersect, is contained in, is
    adjacent to, ...
  • Temporal: "Find audio where first politics and
    then economy is discussed"
  • Spatial and temporal predicates can be combined:
    "Find clips where the logo disappears and then a
    graph appears at the same place"
  • A predicate can be applied to a part of a document
  • As path expressions in OO databases
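
A temporal predicate such as "first politics and then economy" can be sketched as follows, assuming the audio has already been segmented and each segment labelled with a topic. The tuple layout and the function name are hypothetical.

```python
def discussed_in_order(segments, first, then):
    """True if some segment about `first` ends before one about `then` starts.

    `segments` is a list of (start_sec, end_sec, topic) tuples.
    """
    first_ends = [end for start, end, topic in segments if topic == first]
    then_starts = [start for start, end, topic in segments if topic == then]
    if not first_ends or not then_starts:
        return False
    return min(first_ends) <= max(then_starts)


news = [(0, 60, "politics"), (60, 120, "sports"), (120, 200, "economy")]
reversed_news = [(0, 60, "economy"), (60, 120, "politics")]
```

The predicate holds for `news` and fails for `reversed_news`, where the economy segment precedes the politics segment.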

22
Uncertainty, proximity, weights
  • Similarity function
  • The user can assign importance weights to
    individual predicates in a complex query
  • This gives ranking, as in text IR
  • The same models can be used, e.g., probabilistic
    model
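
As a sketch of how importance weights turn a complex query into a ranking, the example below combines fuzzy predicate scores by a weighted average. This is just one simple choice of similarity function, not the one prescribed by the book; the predicate names and the scores are invented.

```python
def rank(objects, predicates, weights):
    """Rank object ids by the weighted average of fuzzy predicate scores.

    Each predicate maps an object to a score in [0, 1]; `weights` holds the
    user-assigned importance of each predicate.
    """
    total_weight = sum(weights.values())

    def score(obj):
        weighted = sum(weights[name] * pred(obj) for name, pred in predicates.items())
        return weighted / total_weight

    return sorted(objects, key=lambda oid: score(objects[oid]), reverse=True)


objects = {
    "a": {"red": 0.9, "house": 0.3},
    "b": {"red": 0.4, "house": 0.9},
}
predicates = {
    "is_red": lambda o: o["red"],
    "is_house": lambda o: o["house"],
}
```

Calling `rank(objects, predicates, {"is_red": 1, "is_house": 3})` puts `b` first, while swapping the two weights puts `a` first: the same query yields a different ranking depending on what the user declares important.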

23
Examples of query languages: SQL3
  • Functions and stored procedures: user-defined
    data manipulation
  • Active database support: the database reacts to
    events, not only to commands. This can enforce
    integrity constraints
  • Good news: rather standard
  • Bad news: no ranking supported!
  • Efforts to integrate SQL3 with IR techniques:
    SQL/MM Full Text and other similar languages

24
... examples: MULTOS
  • One of the design goals: easy navigation
  • Paths are supported
  • Identification of components by type, not by
    position
  • "All images in the document", not "the image in
    the 3rd chapter"
  • Types of predicates
  • on data attributes, on textual components, on
    images (image type, objects contained, ...)
  • Example

25
MULTOS example
26
Another example of MULTOS
27
Research topics
  • How can a similarity function be defined?
  • What features of images (video, sound) are there?
  • How to better specify the importance of
    individual features? ("Give me similar houses":
    similar size? Color? Structure? Architectural
    style?)
  • How to determine the objects in an image?
  • Integration with DBMSs and SQL for fast access
    and rich semantics
  • Integration with XML
  • Ranking by similarity, taking into account
    history, profile

28
Conclusions
  • Basically, images are handled as the text that
    describes them
  • Namely, feature vectors (or feature hierarchies)
  • Context can be used when available to determine
    features
  • Also, queries by example are common
  • From the point of view of DBMS, integration with
    IR and multimedia-specific techniques is needed
  • Object-oriented technology is adequate

29
Thank you! Till ??, 6 pm