Content-based Video Indexing, Classification


1
Content-based Video Indexing,
Classification & Retrieval
  • Presented by HOI, Chu Hong
  • Nov. 27, 2002

2
Outline
  • Motivation
  • Introduction
  • Two approaches for semantic analysis
  • A probabilistic framework (Naphade, Huang 01)
  • Object-based abstraction and modeling (Lee, Kim,
    Hwang 01)
  • A multimodal framework for video content
    interpretation
  • Conclusion

3
Motivation
  • There is an amazing growth in the amount of
    digital video data in recent years.
  • Lack of tools to classify and retrieve video
    content
  • There exists a gap between low-level features and
    high-level semantic content.
  • Enabling machines to understand video is
    important and challenging.

4
Introduction
  • Content-based Video indexing
  • the process of attaching content based labels to
    video shots
  • essential for content-based classification and
    retrieval
  • Using automatic analysis techniques
  • - shot detection, video segmentation
  • - key frame selection
  • - object segmentation and recognition
  • - visual/audio feature extraction
  • - speech recognition, video text, VOCR
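
To make the shot-detection step listed above concrete, here is a minimal sketch of one common baseline, which is not necessarily the method the talk has in mind: compare gray-level histograms of consecutive frames and declare a shot boundary when they differ by more than a threshold. The frame representation (flat lists of 0-255 intensities), the bin count, and the threshold value are all assumptions made for illustration.

```python
def histogram(frame, bins=8):
    """Normalized gray-level histogram of a frame given as a flat list of 0-255 values."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def detect_shot_boundaries(frames, threshold=0.5, bins=8):
    """Return indices i where a new shot is assumed to start at frames[i]."""
    boundaries = []
    previous = histogram(frames[0], bins)
    for i in range(1, len(frames)):
        current = histogram(frames[i], bins)
        # L1 distance between normalized histograms lies in [0, 2].
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


# Toy video (assumed data): three dark frames followed by three bright frames.
dark = [10, 20, 30, 25]
bright = [200, 220, 240, 230]
video = [dark, dark, dark, bright, bright, bright]
boundaries = detect_shot_boundaries(video)
print(boundaries)
```

A fixed threshold of this kind tends to miss gradual transitions such as fades and dissolves, which is one reason preprocessing accuracy matters for everything built on top of it.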

5
Introduction
  • Content-based Video Classification
  • Segment & classify videos into meaningful categories
  • Classify videos based on predefined topic
  • Useful for browsing and searching by topic
  • Multimodal method
  • Visual features
  • Audio features
  • Motion features
  • Textual features
  • Domain-specific knowledge

6
Introduction
  • Content-based Video Retrieval
  • Simple visual feature query
  • Retrieve video with key-frame color R(80), G(10),
    B(10)
  • Feature combination query
  • Retrieve video with high motion upward(70),
    Blue(30)
  • Query by example (QBE)
  • Retrieve video which is similar to example
  • Localized feature query
  • Retrieve video with a car running toward the right
  • Object relationship query
  • Retrieve video with a girl watching the sunset
  • Concept query (query by keyword)
  • Retrieve explosion, White Christmas
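
Query by example can be sketched as a nearest-neighbour search over low-level features. The snippet below is an illustrative assumption rather than anything taken from the talk: each video is represented by one normalized key-frame color histogram, similarity is histogram intersection, and the clip names and values are made up.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 1.0 means identical, 0.0 means disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def query_by_example(example, database, top_k=2):
    """Rank database entries (name -> key-frame histogram) by similarity to the example."""
    scored = [(histogram_intersection(example, hist), name)
              for name, hist in database.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical key-frame color histograms over (red, green, blue) bins.
database = {
    "sunset_clip": [0.8, 0.1, 0.1],
    "forest_clip": [0.1, 0.8, 0.1],
    "ocean_clip": [0.0, 0.2, 0.8],
}
example = [0.7, 0.2, 0.1]
results = query_by_example(example, database)
print(results)
```

Because the ranking depends only on color statistics, two clips with very different meaning can score as similar, which is the limitation that motivates semantic indexing.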

7
Introduction
  • Feature Extraction
  • Color features
  • Texture features
  • Shape features
  • Sketch features
  • Audio features
  • Camera motion features
  • Object motion features

8
Semantic Indexing & Querying
  • Limitation of QBE
  • Measuring similarity using only low-level
    features
  • Does not reflect the user's perception
  • Difficult annotation of high level features
  • Syntactic to Semantic
  • Bridge the gap between low-level features and
    semantic content
  • Semantic indexing, Query By Keyword (QBK)
  • Semantic description scheme: MPEG-7
  • Semantic interaction between concepts
  • no scheme to learn the model for individual
    concepts

9
Semantic Modeling & Indexing
  • Two approaches
  • Probabilistic framework, Multiject
    (Naphade 01)
  • Object-based abstraction and indexing (Lee, Kim,
    Hwang 01)

10
A probabilistic approach (Multiject &
Multinet) (Naphade, Huang 01)
  • Multiject: a probabilistic multimedia object
  • 3 categories of semantic concepts
  • Objects
  • Face, car, animal, building
  • Sites
  • Sky, mountain, outdoor, cityscape
  • Events
  • Explosion, waterfall, gunshot, dancing

11
Multiject for semantic concept
P(Outdoor = Present | features, other multijects) = 0.7
[Figure: the Outdoor multiject, with visual features, audio features, text features, and other multijects as inputs]
12
How to create a Multiject
  • Shot-boundary detection
  • Spatio-temporal segmentation of within-shot
    frames
  • Feature extraction (color, texture, edge
    direction, etc.)
  • Modeling
  • Sites: mixture of Gaussians
  • Events: hidden Markov models (HMMs) with
    observation densities as Gaussian mixtures
  • All audio events modeled using HMMs
  • Each segment is tested for each concept and the
    information is then composed at frame level

13
Multiject Hierarchical HMM
  • ss1 - ssm: state sequence for supervisor HMM
  • sa1 - sam: state sequence for audio HMM
  • xa1 - xam: audio observations
  • sv1 - svm: state sequence for video HMM
  • xv1 - xvm: video observations
14
Multinet: Concept Building based on Multiject
  • A network of multijects modeling interaction
    between them
  • +/-: positive/negative interaction between
    multijects

15
Bayesian Multinet
  • Nodes: binary random variables
    (presence/absence of multiject)
  • Layer 0: frame-level multiject-based semantic
    features
  • Layer 1: inference from layer 0
  • Layer 2: higher level for performance
    improvement
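
The idea of inferring one concept from frame-level evidence about related concepts can be sketched with a two-node network. This is a toy assumption, not the multinet of Naphade and Huang: Outdoor and Sky are binary nodes, Sky depends on Outdoor (a positive interaction), each node receives a layer-0 detector likelihood, and the posterior is obtained by brute-force enumeration. All probabilities are invented.

```python
from itertools import product


def outdoor_posterior(p_outdoor, p_sky_given_outdoor, like_outdoor, like_sky):
    """P(Outdoor=1 | evidence) by enumerating the joint over (Outdoor, Sky).

    p_sky_given_outdoor[o] is P(Sky=1 | Outdoor=o).
    like_outdoor[o] and like_sky[s] are layer-0 detector likelihoods
    P(evidence | state) for state 0 (absent) and 1 (present).
    """
    joint = {}
    for o, s in product((0, 1), repeat=2):
        prior_o = p_outdoor if o else 1 - p_outdoor
        p_s = p_sky_given_outdoor[o] if s else 1 - p_sky_given_outdoor[o]
        joint[(o, s)] = prior_o * p_s * like_outdoor[o] * like_sky[s]
    total = sum(joint.values())
    return (joint[(1, 0)] + joint[(1, 1)]) / total


# Hypothetical numbers: sky is far more likely in outdoor shots.
p_sky_given_outdoor = {0: 0.05, 1: 0.7}
undecided = {0: 0.5, 1: 0.5}             # detector gives no information
strong_sky_evidence = {0: 0.1, 1: 0.9}   # the sky detector fires confidently

without_sky = outdoor_posterior(0.5, p_sky_given_outdoor, undecided, undecided)
with_sky = outdoor_posterior(0.5, p_sky_given_outdoor, undecided, strong_sky_evidence)
print(round(without_sky, 3), round(with_sky, 3))
```

Even though the outdoor detector itself is undecided here, confident sky evidence raises the outdoor posterior, which is the kind of interaction between concepts the network is meant to capture.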

16
Object-based Semantic Video Modeling
17
Object Extraction based on Object Tracking (Kim,
Hwang 00)
18
Semantic Feature Modeling
  • Modeling based on temporal variation of object
    features
  • Boundary shape and motion statistics of object
    area

19
HMM Modeling
1. Observation sequence O1, ..., OT of object features
2. Left-Right 1-D HMM modeling
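
For illustration, a left-right HMM can score an observation sequence with the standard forward algorithm. The sketch below assumes discrete (quantized) object features, three states, and hand-set probabilities; none of these values come from the talk, and a real system would train one model per semantic event and compare their likelihoods.

```python
def forward_likelihood(observations, start, transition, emission):
    """P(observations | HMM) computed with the forward algorithm.

    start[i]         : probability of starting in state i
    transition[i][j] : probability of moving from state i to state j
    emission[i][o]   : probability of emitting symbol o in state i
    """
    n = len(start)
    alpha = [start[i] * emission[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n)) * emission[j][obs]
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical 3-state left-right model: a state may only stay or move one step right.
start = [1.0, 0.0, 0.0]
transition = [
    [0.6, 0.4, 0.0],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
]
# Two quantized feature symbols, e.g. 0 = "small motion", 1 = "large motion".
emission = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.9, 0.1],
]

matching = forward_likelihood([0, 1, 1, 0], start, transition, emission)
mismatched = forward_likelihood([1, 0, 0, 1], start, transition, emission)
print(matching > mismatched)
```

The sequence that follows the model's small-large-small motion pattern receives the higher likelihood, which is how a bank of such models could be used to label an object's behavior.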
20
Video Modeling: Three-Layer Structure
Three-layer structure of video modeling (Video Understanding), compared
to Natural Language Processing
  • Content Interpretation vs. Interpretation
  • Semantic Video Modeling (frame-based and object-based structural
    modeling) vs. Sentence Structure (grammar)
  • Audio-Visual Feature Extraction vs. Word Recognition
21
A Multimodal Framework for Video Content
Interpretation
  • Long-term goal
  • Application: automatic TV program scout
  • Allow user to request topic-level programs
  • Integrate multiple modalities: visual, audio and
    text information
  • Multi-level concepts
  • Low: low-level features
  • Mid: object detection, event modeling
  • High: classification result of semantic content
  • Probabilistic model: using a Bayesian network for
    classification (causal relationship,
    domain knowledge)

23
How to work with the framework?
  • Preprocessing
  • Story segmentation (shot detection)
  • VOCR, Speech Recognition
  • Key frame selection
  • Feature Extraction
  • Visual features based on key-frame
  • Color, texture, shape, sketch, etc.
  • Audio features
  • average energy, bandwidth, pitch, mel-frequency
    cepstral coefficients, etc.
  • Textual features (Transcript)
  • Knowledge tree with many keyword categories:
    politics, entertainment, stock, art, war, etc.
  • Word spotting, vote histogram
  • Motion features
  • Camera operation: panning, tilting, zooming,
    tracking, booming, dollying
  • Motion trajectories (moving objects)
  • Object abstraction, recognition
  • Building and training the Bayesian network

24
Challenging points
  • Preprocessing is significant in the framework.
  • Accuracy of key-frame selection
  • Accuracy of speech recognition & VOCR
  • Good feature extraction is important for the
    performance of classification.
  • Modeling semantic video objects and events
  • How to integrate multiple modalities still needs
    careful consideration.

25
Conclusion
  • Introduction of several basic concepts
  • Semantic video modeling and indexing
  • Proposal of a multimodal framework for topic
    classification of video
  • Discussion of challenging problems

26
Q & A
  • Thank you!