Title: Content-based Video Indexing, Classification and Retrieval
1 Content-based Video Indexing, Classification and Retrieval
- Presented by HOI, Chu Hong
- Nov. 27, 2002
2 Outline
- Motivation
- Introduction
- Two approaches for semantic analysis
- A probabilistic framework (Naphade, Huang 01)
- Object-based abstraction and modeling (Lee, Kim, Hwang 01)
- A multimodal framework for video content interpretation
- Conclusion
3 Motivation
- The amount of digital video data has grown rapidly in recent years.
- Tools for classifying and retrieving video content are lacking.
- A gap exists between low-level features and high-level semantic content.
- Enabling machines to understand video is important and challenging.
4 Introduction
- Content-based video indexing
- The process of attaching content-based labels to video shots
- Essential for content-based classification and retrieval
- Using automatic analysis techniques
- - shot detection, video segmentation
- - key frame selection
- - object segmentation and recognition
- - visual/audio feature extraction
- - speech recognition, video text, VOCR
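Shot detection, the first technique listed above, is often approximated by comparing intensity histograms of consecutive frames. The sketch below is a minimal illustration under stated assumptions, not the method used in the cited work: frames are plain nested lists of 0-255 values, and the function names, bin count, and threshold of 0.5 are all hypothetical choices.

```python
def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of a frame given as rows of 0-255 values."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[pixel * bins // 256] += 1
            total += 1
    return [c / total for c in counts]


def detect_shot_boundaries(frames, threshold=0.5):
    """Return indices i where a cut is declared between frame i-1 and frame i.

    The distance is half the L1 difference of consecutive histograms,
    so it lies in [0, 1]. The threshold is an illustrative assumption.
    """
    boundaries = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        distance = 0.5 * sum(abs(a - b) for a, b in zip(cur, prev))
        if distance > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries


# Synthetic clip: three dark frames followed by three bright frames.
dark = [[20] * 4 for _ in range(4)]
bright = [[220] * 4 for _ in range(4)]
clip = [dark, dark, dark, bright, bright, bright]
cuts = detect_shot_boundaries(clip)
print(cuts)  # [3]
```

A fixed threshold only handles hard cuts; gradual transitions such as fades would need a different criterion.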
5 Introduction
- Content-based video classification
- Segment and classify videos into meaningful categories
- Classify videos based on predefined topic
- Useful for browsing and searching by topic
- Multimodal method
- Visual features
- Audio features
- Motion features
- Textual features
- Domain-specific knowledge
6 Introduction
- Content-based video retrieval
- Simple visual feature query
- Retrieve video with key-frame color R(80), G(10), B(10)
- Feature combination query
- Retrieve video with high upward motion (70), blue (30)
- Query by example (QBE)
- Retrieve video that is similar to an example
- Localized feature query
- Retrieve video with a car running toward the right
- Object relationship query
- Retrieve video with a girl watching the sunset
- Concept query (query by keyword)
- Retrieve "explosion", "White Christmas"
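Query by example can be illustrated by ranking stored key-frame features against the features of an example. The sketch below assumes each clip is summarized by a normalized three-bin color histogram and uses histogram intersection as the similarity measure; the clip names, histogram values, and function names are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def query_by_example(query_hist, database, top_k=2):
    """Rank database clips by similarity to the query histogram."""
    scored = [
        (name, histogram_intersection(query_hist, hist))
        for name, hist in database.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical key-frame color histograms as (R, G, B) proportions.
database = {
    "sunset_clip": [0.7, 0.2, 0.1],
    "ocean_clip": [0.1, 0.2, 0.7],
    "forest_clip": [0.2, 0.7, 0.1],
}
query = [0.8, 0.1, 0.1]  # a mostly red example key frame
results = query_by_example(query, database)
for name, score in results:
    print(name, round(score, 2))
```

Because the ranking relies only on a low-level color feature, two clips with similar colors but unrelated content would score as similar.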
7 Introduction
- Feature Extraction
- Color features
- Texture features
- Shape features
- Sketch features
- Audio features
- Camera motion features
- Object motion features
8 Semantic Indexing and Querying
- Limitations of QBE
- Similarity is measured using only low-level features
- Does not reflect the user's perception
- High-level features are difficult to annotate
- Syntactic to semantic
- Bridge the gap between low-level features and semantic content
- Semantic indexing, Query By Keyword (QBK)
- Semantic description scheme: MPEG-7
- Semantic interaction between concepts
- No scheme to learn the model for individual concepts
9 Semantic Modeling and Indexing
- Two approaches
- Probabilistic framework, Multiject (Naphade, Huang 01)
- Object-based abstraction and indexing (Lee, Kim, Hwang 01)
10 A probabilistic approach (Multiject and Multinet) (Naphade, Huang 01)
- Multiject: a probabilistic multimedia object
- Three categories of semantic concepts
- Objects
- Face, car, animal, building
- Sites
- Sky, mountain, outdoor, cityscape
- Events
- Explosion, waterfall, gunshot, dancing
11 Multiject for semantic concept
P(Outdoor = Present | features, other multijects) = 0.7
[Figure: the Outdoor multiject, connected to other multijects and to visual, audio, and text features]
12 How to create a Multiject
- Shot-boundary detection
- Spatio-temporal segmentation of within-shot frames
- Feature extraction (color, texture, edge direction, etc.)
- Modeling
- Sites: mixture of Gaussians
- Events: hidden Markov models (HMMs) with Gaussian mixtures as observation densities
- All audio events are modeled using HMMs
- Each segment is tested for each concept, and the information is then composed at the frame level
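To make the "sites as mixture of Gaussians" step concrete, the sketch below scores a segment's feature vector under two diagonal-covariance Gaussian mixtures, one for the concept being present and one for it being absent, and turns the two likelihoods into a posterior with Bayes' rule. This is an illustrative assumption about how a segment could be tested for a concept, not the authors' implementation; the two-dimensional feature, all parameter values, and the function names are hypothetical.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of x under a diagonal-covariance Gaussian mixture."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * math.log(2.0 * math.pi * vi)
            log_p += -0.5 * (xi - mi) ** 2 / vi
        log_terms.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def concept_posterior(x, present_gmm, absent_gmm, prior_present=0.5):
    """P(concept present | x) from two class-conditional mixtures."""
    log_p = math.log(prior_present) + gmm_log_likelihood(x, *present_gmm)
    log_a = math.log(1.0 - prior_present) + gmm_log_likelihood(x, *absent_gmm)
    return 1.0 / (1.0 + math.exp(log_a - log_p))


# Hypothetical 2-D feature: (blue ratio, brightness) of a segment.
sky_present = ([0.6, 0.4], [[0.8, 0.7], [0.6, 0.9]], [[0.01, 0.01], [0.02, 0.01]])
sky_absent = ([1.0], [[0.3, 0.4]], [[0.02, 0.02]])

p_sky = concept_posterior([0.78, 0.72], sky_present, sky_absent)
p_not_sky = concept_posterior([0.3, 0.4], sky_present, sky_absent)
print(p_sky > 0.99, p_not_sky < 0.01)  # True True
```

In practice the mixture parameters would be estimated from labeled training segments rather than set by hand.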
13 Multiject Hierarchical HMM
- ss1 - ssm: state sequence for supervisor HMM
- sa1 - sam: state sequence for audio HMM
- xa1 - xam: audio observations
- sv1 - svm: state sequence for video HMM
- xv1 - xvm: video observations
14 Multinet: Concept Building based on Multiject
- A network of multijects modeling the interaction between them
- +/-: positive/negative interaction between multijects
15 Bayesian Multinet
- Nodes: binary random variables (presence/absence of multiject)
- Layer 0: frame-level multiject-based semantic features
- Layer 1: inference from layer 0
- Layer 2: higher level for performance improvement
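The idea of inferring one concept from evidence about another can be shown with a very small network of binary nodes. The sketch below assumes a hand-specified joint prior over two concepts with a positive interaction and computes posterior marginals by brute-force enumeration; it is a toy stand-in for the multinet, and the concepts, probabilities, and function name are hypothetical.

```python
import itertools


def multinet_marginals(joint_prior, likelihoods):
    """Posterior P(concept_i = present | evidence) by enumeration.

    joint_prior: dict mapping a tuple of 0/1 states to its prior probability.
    likelihoods: one (P(e_i | absent), P(e_i | present)) pair per concept.
    """
    n = len(likelihoods)
    posterior = {}
    for states in itertools.product((0, 1), repeat=n):
        p = joint_prior[states]
        for i, s in enumerate(states):
            p *= likelihoods[i][s]
        posterior[states] = p
    total = sum(posterior.values())
    return [
        sum(p for s, p in posterior.items() if s[i] == 1) / total
        for i in range(n)
    ]


# Concepts: (outdoor, sky). The prior favors both present or both absent,
# which encodes a positive interaction between the two.
joint_prior = {(0, 0): 0.40, (0, 1): 0.05, (1, 0): 0.15, (1, 1): 0.40}

# Layer-0 style evidence: uninformative for outdoor, strong for sky.
likelihoods = [(0.5, 0.5), (0.1, 0.9)]

marginals = multinet_marginals(joint_prior, likelihoods)
print([round(m, 3) for m in marginals])  # [0.815, 0.88]
```

The prior probability of outdoor is 0.55, so the strong sky evidence raises it to about 0.815 even though the outdoor detector itself is uninformative. Enumeration is exponential in the number of nodes, so it only suits toy examples like this one.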
16 Object-based Semantic Video Modeling
17 Object Extraction based on Object Tracking (Kim, Hwang 00)
18 Semantic Feature Modeling
- Modeling based on temporal variation of object features
- Boundary shape and motion statistics of object area
19 HMM Modeling
1. Observation sequence O1 ... OT (object features)
2. Left-right 1-D HMM modeling
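A left-right HMM scores an observation sequence with the forward algorithm. The sketch below assumes the object features have been quantized to two discrete symbols, which is a simplification chosen for brevity; the three-state model, its probabilities, and the function name are hypothetical. It also assumes every observation sequence has nonzero probability under the model.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, using the scaled forward pass."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        # Rescale at each step to avoid underflow on long sequences.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
        if t + 1 < len(obs):
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t + 1]]
                for j in range(n)
            ]
    return log_lik


# Left-right topology: each state may only stay or move one state forward.
start = [1.0, 0.0, 0.0]
trans = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
# Symbols: 0 = small object motion, 1 = large object motion (hypothetical).
emit = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.9, 0.1],
]

matching = forward_log_likelihood([0, 0, 1, 1, 0], start, trans, emit)
mismatched = forward_log_likelihood([1, 1, 0, 0, 1], start, trans, emit)
print(matching > mismatched)  # True
```

Classification would then train one such model per semantic class and pick the model with the highest likelihood for a new sequence.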
20 Video Modeling: Three Layer Structure
Three-layer structure of video modeling, compared to NLP (Video Understanding vs. Natural Language Processing)
- Content Interpretation vs. Interpretation
- Semantic Video Modeling (frame-based and object-based structural modeling) vs. Sentence Structure (grammar)
- Audio-Visual Feature Extraction vs. Word Recognition
21 A Multimodal Framework for Video Content Interpretation
- Long-term goal
- Application to an automatic TV program scout
- Allow users to request topic-level programs
- Integrate multiple modalities: visual, audio, and text information
- Multi-level concepts
- Low: low-level features
- Mid: object detection, event modeling
- High: classification result of semantic content
- Probabilistic model, using a Bayesian network for classification (causal relationships, domain knowledge)
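The simplest Bayesian network for combining modalities is naive Bayes, where the topic is the single parent of every mid-level detector. The sketch below uses that simplification in place of the richer network the slide describes; the topics, detector names, and all probabilities are hypothetical.

```python
import math


def classify_topic(evidence, priors, cond_probs):
    """Posterior over topics given binary mid-level detections.

    evidence: dict mapping detector name to True/False.
    priors: dict mapping topic to P(topic).
    cond_probs: dict mapping topic to {detector: P(detector fires | topic)}.
    """
    log_scores = {}
    for topic, prior in priors.items():
        score = math.log(prior)
        for name, fired in evidence.items():
            p = cond_probs[topic][name]
            score += math.log(p if fired else 1.0 - p)
        log_scores[topic] = score
    # Normalize in a numerically stable way.
    m = max(log_scores.values())
    unnorm = {t: math.exp(s - m) for t, s in log_scores.items()}
    total = sum(unnorm.values())
    return {t: v / total for t, v in unnorm.items()}


priors = {"sports": 0.5, "news": 0.5}
cond_probs = {
    "sports": {"crowd_cheering": 0.8, "anchor_face": 0.1, "politics_keyword": 0.05},
    "news": {"crowd_cheering": 0.1, "anchor_face": 0.7, "politics_keyword": 0.6},
}
# One detector per modality: audio, visual, text.
evidence = {"crowd_cheering": True, "anchor_face": False, "politics_keyword": False}

posterior = classify_topic(evidence, priors, cond_probs)
print(max(posterior, key=posterior.get))  # sports
```

Naive Bayes treats the detectors as independent given the topic, which ignores the causal relationships and domain knowledge a fuller network could encode.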
23 How to work with the framework?
- Preprocessing
- Story segmentation (shot detection)
- VOCR, speech recognition
- Key frame selection
- Feature extraction
- Visual features based on key frames
- Color, texture, shape, sketch, etc.
- Audio features
- Average energy, bandwidth, pitch, mel-frequency cepstral coefficients, etc.
- Textual features (transcript)
- Knowledge tree with many keyword categories: politics, entertainment, stock, art, war, etc.
- Word spotting, vote histogram
- Motion features
- Camera operation: panning, tilting, zooming, tracking, booming, dollying
- Motion trajectories (moving objects)
- Object abstraction, recognition
- Building and training the Bayesian network
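The word spotting and vote histogram step among the textual features can be sketched as a count of transcript words that fall into each keyword category. The category keyword sets and the sample transcript below are hypothetical, and a flat dictionary stands in for the knowledge tree.

```python
import re
from collections import Counter

# Hypothetical keyword categories, flattened from a knowledge tree.
KEYWORD_CATEGORIES = {
    "politics": {"election", "senate", "president", "vote"},
    "stock": {"market", "shares", "nasdaq", "stock"},
    "war": {"troops", "missile", "ceasefire"},
}


def vote_histogram(transcript, categories=KEYWORD_CATEGORIES):
    """Count how many transcript words hit each keyword category."""
    votes = Counter({name: 0 for name in categories})
    for word in re.findall(r"[a-z]+", transcript.lower()):
        for name, keywords in categories.items():
            if word in keywords:
                votes[name] += 1
    return dict(votes)


transcript = "The president spoke to the senate before the election while the market fell."
hist = vote_histogram(transcript)
print(hist)  # {'politics': 3, 'stock': 1, 'war': 0}
```

The resulting histogram could serve as the textual feature vector passed to the classifier. Transcripts produced by speech recognition or VOCR contain errors, so exact word matching is a simplification.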
24 Challenging points
- Preprocessing is significant in the framework.
- Accuracy of key-frame selection
- Accuracy of speech recognition and VOCR
- Good feature extraction is important for classification performance.
- Modeling semantic video objects and events
- How to integrate multiple modalities still needs careful consideration.
25 Conclusion
- Introduced several basic concepts
- Semantic video modeling and indexing
- Proposed a multimodal framework for topic classification of video
- Discussed challenging problems
26 Q&A