Title: AVIR: Audio-Visual Information Retrieval for Non Expert Users
1 AVIR: Audio-Visual Information Retrieval for Non-Expert Users
R. Leonardi, Univ. of Brescia
Email: leon_at_ing.unibs.it
http://www.extra.research.philips.com/euprojects/avir
2 AVIR PROJECT
- Audio Video Indexing and Retrieval
- for non-IT-expert users
- ESPRIT project 28798
- Start date: September 1998
- Duration: 2 years
- Theme: Information Access Interfaces
- Context: video metadata production and applications to digital TV programme guides
3 AVIR Consortium
- Philips - NL (Prime contractor)
- Philips LEP - F
- RAI Radiotelevisione Italiana - I
- Tecmath - D
- TV Spielfilm Verlag - D
- University of Brescia - I
- University of Paris, Pierre et Marie Curie - F
- BBC Archive - GB (sponsor)
4 AVIR objective
- Audio Video Indexing and Retrieval for non-IT-expert users
- Objective: create end-to-end solutions for delivering new added-value services on top of video broadcast systems
- Focus: personalised TV information access
- Content/Service provider: indexing system generating metadata
- Delivery of service: stream of AV content descriptors
- Consumer system: advanced EPG on personalised TV receiver/recorder, with intelligent filtering and search.
5 AVIR delivery chain
[Figure: delivery chain linking Content Provider Systems, Service Provider System, Delivery System and Service Consumer System. Diagram labels: Video, A/V Content, A/V Archive, Metadata, Metadata DB, DVB, Receiver.]
6 AVIR broadcast services
- Two kinds of services:
  - enriched TV programme description: attractors (RAI).
  - full-fledged electronic programme guide (TV Spielfilm).
- No return channel needed.
- Usage of intelligent software agents based on user profiles.
- Multimodal interaction for information filtering and advanced retrieval.
- A key issue is the usage of high-capacity consumer video recorders, which will result in a paradigm shift from VCR to personal multimedia repository (VOD).
7 Home Storage and Interoperability
- Keywords: low costs, short-term exploitation
- Cost of storage decreases quickly, the cost of bandwidth does not => full interactive services will not arrive soon
- High-capacity home digital video recorders will soon become available (DVHS, 50 hrs, 1999; video discs, 10-12 GB, 2001)
- A broadband delivery channel such as DVB is suited to deliver service information commonly used by many users
- Low-cost home storage devices can satisfy the different interests of each user
- Shift from a linear model of broadcast services to an interactive system for infotainment, thanks to the intermediating role of the storage device
8 Research issues in AVIR
- AV content analysis and indexing
- Speaker-independent continuous speech recognition in noisy environments
- Intelligent software agents for information filtering and searching
- User profiling, cooperative annotation and filtering
- Multimodal interfaces (representation and interfaces)
- AV search and retrieval based on text or visual info
- Voice control (speech recognition)
- Applications on consumer platforms
9 Content and Service Provider Systems
- AVIR will develop new techniques for semi-automatic content extraction from AV material
- Unsupervised learning system for video sequence indexing
- Structured key-info in database (text, pics, clips) with content description interface to ensure interoperability with consumer systems
- Procedures for operators to generate metadata (annotation) for internal management and distribution to the public
- Descriptors must be streamable, partly linked with the content, partly repeated in a carousel
- Multiplexing at system level with content in DVB stream
10 Consumer System
- Descriptors are extracted, analysed and stored in a database (automatic indexing) with references (locators) to AV material and documents
- Descriptors help users to easily navigate between different resources (DVB/Internet programmes and services, on-air, scheduled, or stored on the system)
- Intelligent software agents, based on user interest profiles, can take care of filtering/recording AV programmes and information on behalf of the user
- Metadata will also be used for easy management of AV material and resources in the storage system (e.g. garbage collection)
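The filtering role of a profile-based agent can be illustrated with a minimal sketch. It assumes, purely for illustration, that each broadcast descriptor carries a title and a keyword list and that a user interest profile is a table of keyword weights. The names `Descriptor`, `score` and `select_for_recording`, the weights and the threshold are hypothetical and are not taken from the AVIR specification.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """Hypothetical programme descriptor received in the broadcast stream."""
    title: str
    keywords: list = field(default_factory=list)


def score(descriptor, profile):
    """Sum the profile weights of the keywords present in the descriptor."""
    return sum(profile.get(k, 0.0) for k in descriptor.keywords)


def select_for_recording(descriptors, profile, threshold=1.0):
    """Return the titles whose score reaches the threshold, best first."""
    scored = [(score(d, profile), d.title) for d in descriptors]
    kept = [(s, t) for s, t in scored if s >= threshold]
    kept.sort(key=lambda st: (-st[0], st[1]))
    return [t for _, t in kept]


# Illustrative profile: positive weights mark interests, negative ones dislikes.
profile = {"football": 1.5, "news": 0.5, "cooking": -1.0}
guide = [
    Descriptor("Evening News", ["news", "politics"]),
    Descriptor("Cup Final", ["football", "sport"]),
    Descriptor("Kitchen Hour", ["cooking"]),
    Descriptor("Sports Report", ["news", "football"]),
]
selected = select_for_recording(guide, profile)
```

Because the scoring only needs the descriptors already delivered in the broadcast stream, a scheme of this kind works without a return channel; the same scores could in principle drive storage clean-up by deleting the lowest-ranked stored items first.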
11 Metadata - Information flow
12 Metadata in AVIR
- Interest in international standardization (MPEG-7) as to:
  - AV consumer applications (specific profile?)
  - push and broadcast applications (streamability, scalability, etc.)
  - consumer browsing and search on local AV databases (user-friendliness of procedures, etc.)
- Definition of adequate DSs and Ds for application needs
- Applications will be tested in experiments with users.
- Metadata for TV broadcasting
  - MPEG-7 I.S. ready in year 2001: short-term solutions needed for DVB?
  - DVB-SI extended with TV-Anytime
  - New MHP (Multimedia Home Platform) solution using DVB data carousels
13 Visual content extraction methods
- Temporal segmentation of video
- Shot separation
- Correlations between non-consecutive camera records (VQ)
- Shot description
- Editing effects
- Mosaicing, outlier detection
- Camera motion descriptors
14 Audio Analysis
- Speech / Music / Noise / Silence separation
- Audio model
- Characteristic features
- Classification method
- Speaker indexing and clustering
- Script alignment with speech for 3 movies (270 min.)
- Specification of vocal server experimentation for
speech transcription (French language)
15 Contributions to MPEG-7
- DDL, Description Definition Language (2): P625b, m4591 (UPMC)
- DS, Description Scheme (3): 655 (PhNL), 502 (UNIBS), 624 (UPMC)
- D, Descriptors (10): 635, 636 (LEP); 384, 488, 490, 491, 492, 493, 494, 497 (UNIBS)
- Non-normative tools (extraction methods) (3): 499, 500, 501 (UNIBS)
16 Editing effect extraction method (XM)
[Figure: examples of the editing effects handled: cut, wipe, dissolve.]
University of Brescia
17 Statistical independence of shots
- Associated histograms: those of two independent random variables (R.V.)
18 Statistical independence of shots
- Histogram of the central frame of a dissolve: convolution of the scaled In and Out shot histograms.
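The histogram property of a dissolve can be sketched numerically. The example below assumes an equal-weight central frame, that is, each pixel is floor((X + Y) / 2) with X drawn from the In shot and Y from the Out shot, and that X and Y are statistically independent. The function names are hypothetical, and convolving first and rescaling afterwards is an implementation convenience that is equivalent to convolving the scaled histograms for equal weights.

```python
def normalise(hist):
    """Turn bin counts into probabilities."""
    total = float(sum(hist))
    return [h / total for h in hist]


def predicted_dissolve_histogram(h_in, h_out):
    """Histogram of floor((X + Y) / 2) for independent X ~ h_in, Y ~ h_out.

    Both histograms must have the same number of grey levels.
    """
    p_in, p_out = normalise(h_in), normalise(h_out)
    levels = len(p_in)
    # Distribution of the sum X + Y: discrete convolution of the histograms.
    conv = [0.0] * (2 * levels - 1)
    for i, a in enumerate(p_in):
        for j, b in enumerate(p_out):
            conv[i + j] += a * b
    # Scaling by 1/2 folds the sums 2k and 2k + 1 onto grey level k.
    pred = [0.0] * levels
    for s, p in enumerate(conv):
        pred[s // 2] += p
    return pred


# Toy histograms over four grey levels (bin counts, not yet normalised).
h_in = [2, 1, 0, 1]
h_out = [1, 0, 2, 1]
pred = predicted_dissolve_histogram(h_in, h_out)
```

A detector built on this idea would compare the measured histogram of a candidate frame with the prediction obtained from the shots on either side; a close match supports the dissolve hypothesis.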
19 Mosaic generation process
[Figure: mosaic generation loop. Blocks: current image Fn, warping estimation (perspective motion model), warping, warped image WFn, error map, object-based weighting operator, weight map, blending, mosaic accretion, previous mosaic Mn-1, mosaic Mn.]
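20 Camera model
For any image point, the velocity induced by the camera motion is given by [equation omitted in this transcript].
[Figure: camera model. Axes X, Y, Z with origin O; image plane at focal length f with retinal coordinates (x, y); scene point P(X,Y,Z) projected onto image point p; angle θmax. Motion parameters: tracking (Tx), booming (Ty), tilting (Rx), panning (Ry), rolling (Rz), zooming (Rzoom).]
An external coordinate system OXYZ moving with the camera, and the corresponding retinal coordinates (x,y).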
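As an illustration of how camera motion induces image velocity, the sketch below implements the classical instantaneous rigid-motion model for a pinhole camera, with x = f X / Z and y = f Y / Z. This is a textbook model with an assumed sign convention (a static scene point moves as dP/dt = -T - R x P in the camera frame); it is not necessarily the exact equation shown on the slide, and it leaves out the zoom term Rzoom. The function names are hypothetical.

```python
def image_velocity(x, y, Z, f, T, R):
    """Velocity (u, v) of image point (x, y) seen at depth Z, focal length f.

    T = (Tx, Ty, Tz) is the camera translation and R = (Rx, Ry, Rz) the
    camera rotation rate. Assumed convention: a static scene point P moves
    as dP/dt = -T - R x P in the camera coordinate system.
    """
    Tx, Ty, Tz = T
    Rx, Ry, Rz = R
    u = (-f * Tx + x * Tz) / Z + (x * y / f) * Rx - (f + x * x / f) * Ry + y * Rz
    v = (-f * Ty + y * Tz) / Z + (f + y * y / f) * Rx - (x * y / f) * Ry - x * Rz
    return u, v


def project(P, f):
    """Pinhole projection of scene point P = (X, Y, Z) onto the image plane."""
    X, Y, Z = P
    return f * X / Z, f * Y / Z


# Pure panning (rotation about the Y axis) seen at the image centre:
# the induced velocity is horizontal and does not depend on depth.
u_pan, v_pan = image_velocity(0.0, 0.0, 5.0, 1.0, (0.0, 0.0, 0.0), (0.0, 0.01, 0.0))
```

The translational terms scale with 1/Z while the rotational terms do not, which is why pan, tilt and roll can be estimated without knowing scene depth, whereas tracking and booming cannot.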
21 Camera motion parameters extraction
22 Results on Stefan sequence
[Figure: camera motion parameters]
23 Results on Coastguard sequence
[Figure: camera motion parameters]
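24 Measuring Shot Correlations
- For each shot, construct a VQ codebook (videms), so as to allow a given reconstruction quality.
- Two shots are declared similar when $d(S_1, S_2) = |D_{C_2}(S_1) - D_{C_1}(S_1)| + |D_{C_1}(S_2) - D_{C_2}(S_2)|$ is sufficiently small, where $D_{C_j}(S_i)$ is the distortion obtained when shot $S_i$ is coded with the codebook $C_j$ of shot $S_j$.
- Assign indices accordingly.
[Figure: example labelled "Dialogue"]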
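The cross-codebook comparison can be sketched as follows. The example assumes that each frame is summarised by a small feature vector and that the codebook is trained with plain k-means at a fixed size; the slide instead sizes the codebook to reach a given reconstruction quality, so the fixed size is a simplification. The distance combines the two cross-coding penalties as a sum of absolute differences, which is one reading of the slide. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def distortion(vectors, codebook):
    """Mean squared error when each vector is coded by its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def train_codebook(vectors, size, iterations=10):
    """Plain k-means codebook; initial codewords are evenly spaced samples."""
    step = max(1, len(vectors) // size)
    codebook = [tuple(vectors[i]) for i in range(0, len(vectors), step)][:size]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
            cells[nearest].append(v)
        # Move each codeword to the centroid of its cell; keep it if the cell is empty.
        codebook = [
            tuple(sum(col) / len(cell) for col in zip(*cell)) if cell else c
            for cell, c in zip(cells, codebook)
        ]
    return codebook


def shot_distance(s1, s2, size=2):
    """Extra distortion paid when each shot is coded with the other's codebook."""
    c1, c2 = train_codebook(s1, size), train_codebook(s2, size)
    return (abs(distortion(s1, c2) - distortion(s1, c1))
            + abs(distortion(s2, c1) - distortion(s2, c2)))


# Toy shots: A and B show nearly the same content, C is different.
shot_a = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
shot_b = [(0.0, 0.1), (0.1, 0.1), (1.0, 0.9), (1.1, 0.9)]
shot_c = [(5.0, 5.0), (5.1, 5.0), (6.0, 6.0), (6.1, 6.0)]
d_ab = shot_distance(shot_a, shot_b)
d_ac = shot_distance(shot_a, shot_c)
```

In a dialogue, the alternating shots of each speaker would yield small mutual distances and could therefore be given the same index even though they are not consecutive.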
25 Query Engine for MPEG-7 description
- Characteristics of Query Engine
  - Parsing DS and Descriptions: checking description validity vs DS
  - Querying Descriptions
    - TOCAI based
    - query-by-example / similarity-based retrieval
    - value-based query associated to specific attribute
    - agent-based querying
- Architecture issues under investigation
  - Need for standard parser interface
  - Need for persistent parsing representation
  - Need to meet consumer system specification
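Two of the query modes listed above, value-based query and query-by-example, can be illustrated with a minimal sketch. It assumes that parsed descriptions are available as plain dictionaries and that similarity is the Euclidean distance between numeric descriptor vectors; the attribute names (`scene_type`, `motion`) and function names are hypothetical and do not reflect the actual query engine or the MPEG-7 syntax.

```python
import math

# Hypothetical parsed descriptions, one per shot.
descriptions = [
    {"id": "shot-1", "scene_type": "dialogue", "motion": (0.1, 0.0, 0.0)},
    {"id": "shot-2", "scene_type": "action", "motion": (0.9, 0.4, 0.2)},
    {"id": "shot-3", "scene_type": "dialogue", "motion": (0.2, 0.1, 0.0)},
]


def query_by_value(items, attribute, value):
    """Value-based query: ids of the items whose attribute equals the value."""
    return [d["id"] for d in items if d.get(attribute) == value]


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def query_by_example(items, example, attribute, top_k=2):
    """Similarity query: ids of the top_k items closest to the example."""
    ranked = sorted(items, key=lambda d: euclidean(d[attribute], example[attribute]))
    return [d["id"] for d in ranked[:top_k]]


dialogues = query_by_value(descriptions, "scene_type", "dialogue")
closest = query_by_example(descriptions, {"motion": (0.85, 0.4, 0.2)}, "motion", top_k=1)
```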
26 TOCAI description scheme
- Features
  - multiple levels of abstraction
  - multiple ordering capability: chronological/alphabetical
- Analogy: indexing of a book (with enhanced features)
  - Table of Content (ToC)
    - What is the book about? (chapters/sections/subsections/paragraphs)
  - Analytical index (AI)
    - Find all pages containing this topic: keyword search.
27 TOCAI description scheme
- Table of Content (ToC) -> NAVIGATION
  - Maintain the chronological order
  - Hierarchical overview (multi-layer semantics)
- Analytical index (AI) -> RETRIEVAL
  - Create an order of key elements according to a certain ordering key
  - ordering key: color, size, speed, scene type...
  - key element:
    - key-image: mosaic, MPEG-4 object.
    - key-scene: dialogue, action, ...
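The two halves of the TOCAI idea can be sketched as a data structure: a hierarchical table of content kept in chronological order for navigation, and an analytical index that reorders key elements by a chosen ordering key for retrieval. The class and function names, the fields and the sample values are hypothetical; they illustrate the book analogy and are not the normative description scheme.

```python
from dataclasses import dataclass, field


@dataclass
class TocNode:
    """One entry of the table of content (programme, scene, shot...)."""
    title: str
    start: float  # start time in seconds
    children: list = field(default_factory=list)


def outline(node, depth=0):
    """Hierarchical overview, with the children listed in chronological order."""
    lines = ["  " * depth + node.title]
    for child in sorted(node.children, key=lambda c: c.start):
        lines.extend(outline(child, depth + 1))
    return lines


def analytical_index(key_elements, ordering_key):
    """Order the key elements according to the chosen ordering key."""
    return sorted(key_elements, key=lambda e: e[ordering_key])


programme = TocNode("Programme", 0.0, [
    TocNode("Scene 2", 60.0, [TocNode("Shot 2.1", 60.0)]),
    TocNode("Scene 1", 0.0, [TocNode("Shot 1.2", 30.0), TocNode("Shot 1.1", 0.0)]),
])
key_elements = [
    {"label": "Shot 1.1", "scene_type": "dialogue", "speed": 0.2},
    {"label": "Shot 1.2", "scene_type": "action", "speed": 0.9},
    {"label": "Shot 2.1", "scene_type": "dialogue", "speed": 0.5},
]
toc_lines = outline(programme)
by_speed = [e["label"] for e in analytical_index(key_elements, "speed")]
```

Keeping the two structures separate means the same key elements can be re-sorted by any ordering key (color, size, speed, scene type) without touching the chronological hierarchy.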
28 Conclusion
- AVIR objective
- AVIR delivery chain
- Consumer provider system specification
- Automatic extraction tools
- Adequate DS (TOCAI) for navigation and retrieval
- Adequate Ds: camera motion parameters, editing effects, mosaicing, temporal video segmentations (shots/scenes)