AVIR: Audio-Visual Information Retrieval for Non Expert Users PowerPoint PPT Presentation


Title: AVIR: Audio-Visual Information Retrieval for Non Expert Users


1
AVIR: Audio-Visual Information Retrieval for
Non-Expert Users
R. Leonardi, Univ. of Brescia
Email: leon_at_ing.unibs.it
http://www.extra.research.philips.com/euprojects/avir
2
AVIR PROJECT
  • Audio Video Indexing and Retrieval
  • for non-IT-expert users
  • ESPRIT project 28798
  • Start date: September 1998
  • Duration: 2 years
  • Theme: Information Access Interfaces
  • Context: video metadata production and
    applications to digital TV programme guides

3
AVIR Consortium
  • Philips - NL (Prime contractor)
  • Philips LEP - F
  • RAI Radiotelevisione Italiana - I
  • Tecmath - D
  • TV Spielfilm Verlag - D
  • University of Brescia - I
  • University of Paris, Pierre et Marie Curie - F
  • BBC Archive - GB (sponsor)

4
AVIR objective
  • Audio Video Indexing and Retrieval for
    non-IT-expert users
  • Objective: create end-to-end solutions for
    delivering new added-value services on top of
    video broadcast systems
  • Focus: personalised TV information access
  • Content/service provider: indexing system
    generating metadata
  • Delivery of service: stream of AV content
    descriptors
  • Consumer system: advanced EPG on personalised TV
    receiver/recorder, with intelligent filtering
    and search.

5
AVIR delivery chain
[Figure: AVIR delivery chain. Blocks: Content Provider
Systems, Service Provider System, Delivery System,
Service Consumer System. Labels: Video, A/V Content,
A/V Archive, Metadata, Metadata DB, DVB, Receiver.]
6
AVIR broadcast services
  • Two kinds of services:
    • enriched TV programme description - attractors
      (RAI)
    • full-fledged electronic programme guide
      (TV Spielfilm)
  • No return channel needed.
  • Use of intelligent software agents based on user
    profiles.
  • Multimodal interaction for information filtering
    and advanced retrieval.
  • A key issue is the use of high-capacity
    consumer video recorders, which will result in a
    paradigm shift from VCR to personal multimedia
    repository (VOD).

7
Home Storage and Interoperability
  • Keywords: low cost, short-term exploitation
  • The cost of storage decreases quickly, the cost of
    bandwidth does not, so fully interactive services
    will not arrive soon
  • High-capacity home digital video recorders will
    soon become available (D-VHS, 50 hrs, 1999; video
    discs, 10-12 GB, 2001)
  • A broadband delivery channel such as DVB is
    suited to deliver service information commonly
    used by many users
  • Low-cost home storage devices can satisfy the
    different interests of each user
  • Shift from linear model of broadcast services to
    interactive system for infotainment, thanks to
    intermediating role of storage device

8
Research issues in AVIR
  • AV content analysis and indexing
  • Speaker-independent continuous speech recognition
    in noisy environments
  • Intelligent software agents for information
    filtering and searching
  • User profiling, cooperative annotation and
    filtering
  • Multimodal interfaces (representation and
    interfaces)
  • AV search and retrieval based on text or visual
    info
  • Voice control (speech recognition)
  • Applications on consumer platforms

9
Content and Service Provider Systems
  • AVIR will develop new techniques for
    semi-automatic content extraction from AV
    material
  • Unsupervised learning system for video sequence
    indexing
  • Structured key-info in database (text, pics,
    clips) with content description interface to
    ensure interoperability with consumer systems
  • Procedures for operators to generate metadata
    (annotation) for internal management and
    distribution to public
  • Descriptors must be streamable, partly linked
    with the content, partly repeated in a carousel
  • Multiplexing at system level with content in DVB
    stream
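The carousel idea on this slide can be illustrated with a minimal sketch: descriptors that are not tied to a specific instant of the content are re-sent cyclically, so a receiver that tunes in at any moment still collects all of them within one period. The function names `carousel` and `receive` and the sample descriptor strings are hypothetical, and this is not the DVB data carousel protocol itself.

```python
from itertools import cycle, islice

def carousel(descriptors):
    """Yield the descriptors forever, in a fixed cyclic order."""
    return cycle(descriptors)

def receive(stream, start, count):
    """Collect `count` items from the stream, starting at offset `start`."""
    return list(islice(stream, start, start + count))

# Hypothetical descriptor names, for illustration only.
descriptors = ["title", "genre", "cast", "schedule"]

# A receiver that tunes in mid-cycle still sees every descriptor
# after one full carousel period.
late_joiner = receive(carousel(descriptors), start=2, count=len(descriptors))
print(late_joiner)
```

The trade-off in such a scheme is between bandwidth spent on repetition and the worst-case waiting time, which equals one carousel period.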

10
Consumer System
  • Descriptors are extracted, analysed and stored in
    a database (automatic indexing) with references
    (locators) to AV material and documents
  • Descriptors help users to easily navigate between
    different resources (DVB/Internet programs and
    services, on-air, scheduled, or stored on the
    system)
  • Intelligent software agents, based on user
    interest profiles, can take care of filtering
    and recording AV programmes and information on
    behalf of the user
  • Metadata will also be used for easy management
    of AV material and resources in the storage
    system (e.g. garbage collection)
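As an illustration of profile-based filtering, the sketch below scores each programme by summing the profile weights of its keywords and selects those above a threshold for recording. The slides do not describe how the AVIR agents work, so the weighted-keyword profile, the `Programme` class, and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Programme:
    title: str
    keywords: frozenset

def score(programme, profile):
    """Sum the profile weights of the keywords the programme carries."""
    return sum(profile.get(k, 0.0) for k in programme.keywords)

def select_for_recording(programmes, profile, threshold):
    """Return programmes scoring at or above the threshold, best first."""
    scored = [(score(p, profile), p) for p in programmes]
    keep = [(s, p) for s, p in scored if s >= threshold]
    keep.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in keep]

# Hypothetical user interest profile: keyword -> weight.
profile = {"football": 1.0, "news": 0.5, "cooking": -1.0}
guide = [
    Programme("Evening News", frozenset({"news"})),
    Programme("Cup Final", frozenset({"football", "live"})),
    Programme("Kitchen Hour", frozenset({"cooking"})),
]
chosen = select_for_recording(guide, profile, threshold=0.5)
print([p.title for p in chosen])
```

The same scores could drive storage management: the lowest-scoring stored items would be the first candidates for garbage collection.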

11
Metadata - Information flow
12
Metadata in AVIR
  • Interest in international standardization (MPEG-7)
    as to:
    • AV consumer applications (specific profile?)
    • push and broadcast applications (streamability,
      scalability, etc.)
    • consumer browsing and search on local AV
      databases (user-friendliness of procedures, etc.)
  • Definition of adequate DSs and Ds for
    application needs
  • Applications will be tested in experiments with
    users.
  • Metadata for TV broadcasting:
    • MPEG-7 I.S. ready in year 2001: short-term
      solutions needed for DVB?
    • DVB-SI extended with TV-Anytime
    • New MHP (Multimedia Home Platform) solution using
      DVB data carousels

13
Visual content extraction methods
  • Temporal segmentation of video
  • Shot separation
  • Correlations between non-consecutive camera
    records (VQ)
  • Shot description
  • Editing effects
  • Mosaicing, outlier detection
  • Camera motion descriptors

14
Audio Analysis
  • Speech / Music / Noise / Silence separation
  • Audio model
  • Characteristic features
  • Classification method
  • Speaker indexing and clustering
  • Script alignment with speech for 3 movies
    (270 min.)
  • Specification of vocal server experimentation for
    speech transcription (French language)

15
Contributions to MPEG-7
  • DDL, Description scheme Definition Language (2):
    P625b, m4591 (UPMC)
  • DS, Description Scheme (3):
    655 (PhNL), 502 (UNIBS), 624 (UPMC)
  • D, Descriptors (10):
    635, 636 (LEP), 384, 488, 490, 491, 492, 493,
    494, 497 (UNIBS)
  • Non-normative tools (extraction methods) (3):
    499, 500, 501 (UNIBS)

16
Editing effect extraction method (XM)
[Figure: examples of editing effects: cut, wipe, dissolve]
University of Brescia
17
Statistical independence of shots
  • Associated histograms: those of two independent
    random variables (R.V.)

18
Statistical independence of shots
  • Histogram of the central frame of a dissolve:
    convolution of the scaled In and Out shot
    histograms.
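The property on this slide can be checked numerically with a small sketch. If the In and Out shots behave as independent random variables, the central dissolve frame is (In + Out) / 2, and its histogram should match the convolution of the two shot histograms once each is scaled by 1/2. The synthetic pixel distributions, the 8 grey levels, and the helper names below are assumptions chosen for illustration; to keep bins on an integer grid the code indexes the central frame by twice its grey value.

```python
import random

def histogram(values, n_bins):
    """Normalised histogram of integer values in [0, n_bins)."""
    h = [0.0] * n_bins
    for v in values:
        h[v] += 1.0 / len(values)
    return h

def convolve(a, b):
    """Discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

rng = random.Random(0)
levels, n_pixels = 8, 50_000
# Two statistically independent synthetic shots: dark In, bright Out.
in_shot = [rng.choice([0, 1, 1, 2]) for _ in range(n_pixels)]
out_shot = [rng.choice([5, 6, 6, 7]) for _ in range(n_pixels)]

# Central dissolve frame is (in + out) / 2; bins are indexed by
# 2 * value so half-integer grey levels stay on an integer grid.
central_x2 = [a + b for a, b in zip(in_shot, out_shot)]

predicted = convolve(histogram(in_shot, levels), histogram(out_shot, levels))
observed = histogram(central_x2, 2 * levels - 1)
max_gap = max(abs(p - o) for p, o in zip(predicted, observed))
print(f"largest bin difference: {max_gap:.4f}")
```

With independent shots the difference stays at the level of sampling noise; frames taken from a single shot would not satisfy the relation, which is what makes it usable as a dissolve cue.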

19
Mosaic generation process
[Figure: mosaic generation block diagram. Blocks:
Current Image Fn, Warping Estimation, Perspective
motion model, Warping, Warped Image WFn, Error Map,
Object based weighting operator, Weight Map, Blending,
Mosaic Accretion, Previous Mosaic Mn-1, Mosaic Mn.]
20
Camera model
For any image point, the velocity induced by the
camera motion is given by
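[Equation and camera-model figure not preserved in the
transcript. Figure labels: axes X, Y, Z with origin O;
image plane at distance f with coordinates (x, y);
scene point P(X,Y,Z) and image point p; q max; motion
parameters Tx, Ty, Rx, Ry, Rz, Rzoom; motion types
tracking, booming, panning, tilting, rolling, zooming.]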
An external coordinate system OXYZ moving with
the camera, and the corresponding retinal
coordinates (x,y)
21
Camera motion parameters extraction
22
Results on Stefan sequence
  • Key-frame Mosaic

Camera motion parameters
23
Results on Coastguard sequence
  • Key-frame Mosaic

Camera motion parameters
24
Measuring Shot Correlations
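  • For each shot, construct a VQ codebook
    (videms), so as to allow a given reconstruction
    quality.
  • Two shots S1 and S2, with codebooks C1 and C2, are
    declared similar when
    $d(S_1, S_2) = |D_{C_2}(S_1) - D_{C_1}(S_1)| + |D_{C_1}(S_2) - D_{C_2}(S_2)|$
    is sufficiently small, where $D_C(S)$ denotes the
    distortion of shot S coded with codebook C.
  • Assign indices accordingly.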
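A minimal sketch of this codebook-based comparison follows. It assumes the distance is the sum of two absolute differences: how much worse each shot is coded with the other shot's codebook than with its own. The scalar features, the tiny Lloyd (k-means) training loop, and all function names are illustrative assumptions, not the project's implementation.

```python
def nearest(codebook, x):
    """Squared distance from scalar x to its closest codeword."""
    return min((x - c) ** 2 for c in codebook)

def distortion(codebook, shot):
    """Mean squared quantisation error of a shot coded with a codebook."""
    return sum(nearest(codebook, x) for x in shot) / len(shot)

def train_codebook(shot, size, iterations=10):
    """Tiny Lloyd (k-means) codebook for scalar features."""
    ordered = sorted(shot)
    step = len(ordered) / size
    codebook = [ordered[int((i + 0.5) * step)] for i in range(size)]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for x in shot:
            idx = min(range(size), key=lambda i: (x - codebook[i]) ** 2)
            cells[idx].append(x)
        # Keep the old codeword when a cell receives no samples.
        codebook = [sum(c) / len(c) if c else codebook[i]
                    for i, c in enumerate(cells)]
    return codebook

def shot_distance(s1, s2, size=2):
    """Cross-coding penalty of s1 with C2 plus that of s2 with C1."""
    c1, c2 = train_codebook(s1, size), train_codebook(s2, size)
    return (abs(distortion(c2, s1) - distortion(c1, s1))
            + abs(distortion(c1, s2) - distortion(c2, s2)))

# Hypothetical per-frame features (e.g. mean luminance of blocks).
shot_a = [10, 11, 12, 50, 51, 52]
shot_b = [11, 12, 13, 49, 50, 51]      # visually close to shot_a
shot_c = [100, 101, 102, 200, 201, 202]
print(shot_distance(shot_a, shot_b), shot_distance(shot_a, shot_c))
```

Shots whose pairwise distance falls below a threshold would receive the same index, so alternating indices such as A, B, A, B hint at a dialogue pattern.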

Dialogue
25
Query Engine for MPEG-7 description
  • Characteristics of Query Engine
    • Parsing DSs and descriptions: checking description
      validity vs. DS
    • Querying descriptions
    • TOCAI based
    • query-by-example / similarity-based retrieval
    • value-based query associated with a specific
      attribute
    • agent-based querying
  • Architecture issues under investigation
    • Need for standard parser interface
    • Need for persistent parsing representation
    • Need to meet consumer system specification
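Two of the query modes listed on this slide can be sketched over a toy set of descriptions. The dictionary layout, the attribute names `scene_type` and `motion`, and the Euclidean distance used for similarity are assumptions for illustration; real MPEG-7 descriptions are structured documents validated against a DS.

```python
import math

# Hypothetical parsed descriptions, one per shot.
descriptions = [
    {"id": "shot-1", "scene_type": "dialogue", "motion": (0.1, 0.0)},
    {"id": "shot-2", "scene_type": "action", "motion": (0.9, 0.4)},
    {"id": "shot-3", "scene_type": "action", "motion": (0.2, 0.1)},
]

def value_query(items, attribute, value):
    """Value-based query: exact match on one attribute."""
    return [d["id"] for d in items if d.get(attribute) == value]

def query_by_example(items, example, attribute, top_k=2):
    """Similarity query: rank by Euclidean distance to the example."""
    ranked = sorted(
        items, key=lambda d: math.dist(d[attribute], example[attribute])
    )
    return [d["id"] for d in ranked[:top_k]]

print(value_query(descriptions, "scene_type", "action"))
print(query_by_example(descriptions, descriptions[0], "motion"))
```

The example itself is returned first in the similarity query because its distance to itself is zero; a caller could drop it if only other matches are wanted.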

26
TOCAI description scheme
  • Features
    • multiple levels of abstraction
    • multiple ordering capability:
      chronological/alphabetical
  • Analogy: indexing of a book (with enhanced
    features)
    • Table of Content (ToC): what is the book about?
      (chapters/sections/subsections/paragraphs)
    • Analytical index (AI): find all pages containing
      this topic (keyword search).

27
TOCAI description scheme
  • Table of Content (ToC) -> NAVIGATION
    • Maintain the chronological order
    • Hierarchical overview (multi-layer semantics)
  • Analytical index (AI) -> RETRIEVAL
    • Create an order of key elements according to a
      certain ordering key
    • ordering key: color, size, speed, scene type...
    • key element:
      • key-image: mosaic, MPEG-4 object
      • key-scene: dialogue, action, ...
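The two access paths of the scheme can be sketched with a small tree: the ToC walks the hierarchy in chronological order, while the AI flattens the key elements and sorts them by a chosen ordering key. The `Segment` class, its fields, and the sample programme are hypothetical and only illustrate the idea, not the TOCAI DS syntax.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    label: str
    start: float                      # seconds from programme start
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def table_of_content(node, depth=0):
    """ToC: walk the hierarchy, keeping chronological order per level."""
    lines = ["  " * depth + node.label]
    for child in sorted(node.children, key=lambda s: s.start):
        lines.extend(table_of_content(child, depth + 1))
    return lines

def leaves(node):
    """Return the lowest-level segments (the key elements here)."""
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]

def analytical_index(root, ordering_key):
    """AI: order key elements by an ordering key, ignoring time."""
    items = [s for s in leaves(root) if ordering_key in s.attributes]
    return [s.label
            for s in sorted(items, key=lambda s: s.attributes[ordering_key])]

programme = Segment("programme", 0.0, children=[
    Segment("scene 2", 60.0, children=[
        Segment("shot 2.1", 60.0, {"speed": 0.2}),
    ]),
    Segment("scene 1", 0.0, children=[
        Segment("shot 1.1", 0.0, {"speed": 0.9}),
        Segment("shot 1.2", 25.0, {"speed": 0.5}),
    ]),
])
print("\n".join(table_of_content(programme)))
print(analytical_index(programme, "speed"))
```

Several analytical indexes can coexist over the same tree, one per ordering key, which is the "multiple ordering capability" mentioned among the features.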

28
Conclusion
  • AVIR objective
  • AVIR delivery chain
  • Consumer provider system specification
  • Automatic extraction tools
  • Adequate DS (TOCAI) for navigation and retrieval
  • Adequate Ds: camera motion parameters, editing
    effects, mosaicing, temporal video segmentation
    (shots/scenes)