Title: Content-Based Video Analysis based on Audiovisual Features for Knowledge Discovery

1
Content-Based Video Analysis based on Audiovisual
Features for Knowledge Discovery

Chia-Hung Yeh
Signal and Image Processing Institute
Department of Electrical Engineering
University of Southern California

2
Vision
Parsing or Segmentation
3
Guidelines

Motivation
Introduction
Overview of visual and audio content
Video abstraction
Multimodal information concept
Knowledge discovery via video mining
Our previous work
Conclusion and future work

4
Motivation

Amazing growth in the amount of digital video
data in recent years.
Develop tools to classify, retrieve and abstract
video content
Develop tools for summarization and abstraction
Bridge a gap between low-level features and
high-level semantic content
Enabling machines to understand video is important
and challenging

5
Why, What and How

Why video content analysis?
Modern multimedia technologies have led to huge
digital video collections, but efficient access
to video content is still in its infancy because
of its bulky data volume and unstructured data
format.
What is video content analysis?
Video content analysis analyzes the video content
and attempts to automatically understand the
embedded video semantics as humans do
How to do video content analysis?

6
Overview of Visual Content

Structured analysis
Extract hierarchical video structure

[Figure: text document analogy. A text document
is segmented into words, words are grouped into
sentences, and key sentences are selected.]
7
Overview of Audio Content

Continuous in the time domain, unlike visual
content
Multiple sound sources exist in a sound track,
like many objects in a single frame
It is tough to separate audio content and give a
suitable description
Framework in MPEG-7: silence, timbre, waveform,
spectral, harmonic and fundamental frequency
Some special features for music and speech

8
Content-Based Video Indexing

Process of attaching content based labels to
video shots
Essential for content-based classification and
retrieval
Some required techniques
Shot detection
Key frame selection
Object segmentation and recognition
Visual/audio feature extraction
Speech recognition, video text, VOCR

9
Content-Based Video Classification

Segment and classify videos into meaningful
categories
Classify videos based on predefined topics
Multimodal concept
Visual features
Audio features
Metadata features
Domain-specific knowledge

10
Query (Retrieval Methods)

Simple visual feature query
Feature combination query
Query by example (QBE)
Retrieve video that is similar to an example
Localized feature query
Example: retrieve video with a car moving toward
the right
Object relationship query
Concept query (query by keyword)
Metadata
Time, date, etc.
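
Query by example can be illustrated with a small sketch. It assumes each video is already described by one feature vector (here a hypothetical 3-bin color histogram) and ranks the collection by Euclidean distance to the example. The video names, the feature values and the choice of distance are assumptions for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_by_example(example, database, top_k=2):
    """Return the names of the top_k videos whose features are closest to the example."""
    ranked = sorted(database, key=lambda name: euclidean(example, database[name]))
    return ranked[:top_k]

# Hypothetical 3-bin color histograms, one per video.
database = {
    "sunset": [0.7, 0.2, 0.1],
    "forest": [0.1, 0.8, 0.1],
    "ocean": [0.1, 0.2, 0.7],
}

print(query_by_example([0.6, 0.3, 0.1], database))  # ['sunset', 'forest']
```

A feature combination query could reuse the same ranking with several feature vectors concatenated or with a weighted sum of per-feature distances.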

11
The Ways to Browse a Video

Playback faster
Audio time-scale modification: time-saving
factor of 1.5 to 2.5
15-20% time reduction by removing and
shortening pauses
Storyboard
Composed of representative still frames
(Keyframes)
Moving storyboard
Display keyframes while synchronized with the
original audio track
Highlight
Pre-defined special events (for example, sports
and news)
Skimming
Extract short video clips to build a much shorter
video

12
Timeline of Related Technique Development
13
Image Retrieval and Video Browsing

Query by Image Content (QBIC), IBM, 1995
Complex multi-feature and multi-object queries
Video browsing
Quickly and efficiently discover the information
Browsing and searching usually complement
each other
Visual content browsing is easier than audio
content browsing
Achieved by static storyboard, dynamic video
clips, fast forward
Representative work
Gary Marchionini, University of Maryland
S.-F. Chang, Columbia University

14
Video Abstraction

Video summarization and video skimming
Both belong to video abstraction and differ from
video browsing
Automatically retrieve a collection of the most
significant and most representative segments
Required techniques
Shot detection, scene generation
Motion analysis
Face recognition
Audio segmentation
Text detection
Music detection

15
Video Abstraction

A video abstract
A sequence of still or moving images which
preserve essential original video content while
it is much shorter than the original one
Applications
Automated authoring of web
content
Web news
Web seminar
Consumer domain applications
Analyzing, filtering, and browsing

16
Video Summarization (I)

A collection of salient frames that represent the
underlying content
Most related work focuses on ways to extract
still frames
Categorized into three classes
Frame-based
Randomly or uniformly select
Shot-based
Keyframe
Feature-based
Motion, color and so on

17
Video Summarization (II)

Representative work
Y. Taniguchi, (1995)
Frame-based scheme
Simple, but may not be representative because
shots are not of uniform length
H.-J. Zhang, Microsoft Research China (1997)
Keyframe based on color histogram
Gong and Liu, NEC Laboratories America (2003)
SVD (Singular Value Decomposition)
Capture temporal and spatial characteristics
Tseng, Lin and J. R. Smith, IBM T. J. Watson
Research Center (2002)
Video summarization scheme for pervasive mobile
devices

18
Video Skimming

A good skim is much like a movie trailer
A synopsis of the entire video
Representative work
M. Smith and T. Kanade, Carnegie Mellon
University (1995)
Audio and image characterization
S. Pfeiffer, University of Mannheim (1996)
VAbstract system
Detection of special events such as dialogs,
explosions and text occurrences
H. Sundaram and S.-F. Chang, Columbia University
(2001)
A semantic skimming system
Visual complexity for human understanding
Film syntax

19
Video Skimming Application

Video content transcoding
Content-based live sport video filtering

20
Video Shot Structure

Shot, a cinematic term, is the smallest
addressable video unit (the building block). A
shot contains a set of continuously recorded
frames
Two types of video shots
Camera break: abrupt content change between
neighboring frames. Usually corresponds to an
editing cut
Gradual transition: smooth content change over a
set of consecutive frames. Usually caused by
special effects
Shot detection is usually the first step towards
video content analysis
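
A minimal sketch of camera-break detection follows. It assumes frames are flat lists of 8-bit gray levels and declares a cut when the L1 distance between consecutive gray-level histograms exceeds a threshold. The histogram size and the threshold are assumptions, not values from the slides, and gradual transitions would need a different test (for example, accumulating differences over several frames).

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized gray-level histogram of a frame given as a flat list of pixels."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]

def detect_cuts(frames, threshold=0.5):
    """Return indices i where a cut is declared between frame i-1 and frame i."""
    cuts = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        # L1 distance between normalized histograms lies in [0, 2].
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts

# Toy frames: two dark frames, two bright frames, one dark frame.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
frames = [dark, dark, bright, bright, dark]

print(detect_cuts(frames))  # [2, 4]
```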

21
Scene Characteristics

Scene is a semantic concept which refers to a
relatively complete video paragraph with coherent
semantic meaning. It is subjectively defined
Shots within a movie scene have the following 3
features
Visual similarity
Since a scene could only be developed within
certain spatial and temporal localities, the
directors have to repeat some essential shots to
convey parallelism and continuity of activities
due to the sequential nature of film making
Audio similarity
Similar background noises
Speeches from the same person have similar
acoustic characteristics
Time locality
Visually similar shots should also be temporally
close to each other if they do belong to the same
scene

22
Basic Audio Features

Energy
Silence or pause detection
Zero crossing rate (ZCR)
The frequency of the audio signal amplitude
passing through the zero value in a given time
Energy centroid
Speech range: 100 Hz to 7 kHz
Music range: 16 Hz to 16 kHz
Band periodicity
Harmonic sounds
Music: high-frequency components are integer
multiples of the lowest one
Speech: pitch
MFCC - (Mel-Frequency Cepstral Coefficients)
13 linearly-spaced filters
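
The first two features on this slide are simple enough to sketch directly. The code below assumes an audio frame is a list of samples in the range -1 to 1, computes the short-time energy and the zero crossing rate, and uses a hypothetical energy threshold for silence detection.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_silence(frame, energy_threshold=1e-3):
    """Label a frame as silence when its energy is below a (hypothetical) threshold."""
    return short_time_energy(frame) < energy_threshold

loud = [1.0, -1.0, 1.0, -1.0]      # alternating signal: high energy, high ZCR
quiet = [0.01, 0.01, 0.01, 0.01]   # near-constant low level: low energy, no crossings

print(short_time_energy(loud), zero_crossing_rate(loud), is_silence(loud))
print(zero_crossing_rate(quiet), is_silence(quiet))
```

In practice these values are computed per frame over a sliding window, and the resulting curves feed the pause detection and speech/music discrimination mentioned above.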

23
Multimodal Information Concept
24
Multimodal Framework for Video Content
Interpretation

Application on automatic TV Programs abstraction
Allow user to request topic-level programs
Integrate multiple modalities: visual, audio and
text information
Multi-level concepts
Low: low-level features
Mid: object detection, event modeling
High: classification result of semantic content
Probabilistic model using Bayesian network for
classification (causal relationship,
domain-knowledge)
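
As a rough illustration of probabilistic fusion across modalities, the sketch below uses a naive Bayes classifier, which is the simplest special case of a Bayesian network (the topic is the only parent of each modality node). This is a simplification of the framework named on the slide, not its actual structure. All topics, observation labels and probabilities are hypothetical.

```python
import math

def classify(observations, priors, likelihoods):
    """Return (best_topic, posterior) assuming modalities are independent given the topic."""
    log_scores = {}
    for topic, prior in priors.items():
        score = math.log(prior)
        for modality, value in observations.items():
            score += math.log(likelihoods[topic][modality][value])
        log_scores[topic] = score
    # Normalize in a numerically stable way.
    peak = max(log_scores.values())
    unnorm = {t: math.exp(s - peak) for t, s in log_scores.items()}
    total = sum(unnorm.values())
    posterior = {t: u / total for t, u in unnorm.items()}
    return max(posterior, key=posterior.get), posterior

# Hypothetical priors and conditional probability tables.
priors = {"news": 0.5, "sports": 0.5}
likelihoods = {
    "news": {
        "visual": {"studio": 0.8, "field": 0.2},
        "audio": {"speech": 0.9, "crowd": 0.1},
        "text": {"politics": 0.7, "score": 0.3},
    },
    "sports": {
        "visual": {"studio": 0.2, "field": 0.8},
        "audio": {"speech": 0.4, "crowd": 0.6},
        "text": {"politics": 0.1, "score": 0.9},
    },
}

observations = {"visual": "field", "audio": "crowd", "text": "score"}
best, posterior = classify(observations, priors, likelihoods)
print(best, posterior)
```

A full Bayesian network would add edges between mid-level concepts (objects, events) and encode domain knowledge in its structure; the conditional tables would be learned from labeled programs.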

25
Probabilistic Model Data Fusion
26
How to Work with the Framework

Preprocessing
Video segmentation (shot detection) and key frame
selection
VOCR, speech recognition
Feature Extraction
Visual features based on key-frame
Color, texture, shape, sketch, etc.
Motion features
Camera operation: panning, tilting, zooming,
tracking, booming, dollying
Motion trajectories (moving objects)
Object abstraction, recognition
Audio features
average energy, bandwidth, pitch, mel-frequency
cepstral coefficients, etc.
Textual features (Transcript)
Knowledge tree with many keyword categories:
politics, entertainment, stock, art, war, etc.
Word spotting, vote histogram
Building and training the Bayesian network

27
Challenging Points

Preprocessing is significant in the framework.
Accuracy of key-frame selection
Accuracy of speech recognition and VOCR
Good feature extraction is important for the
performance of classification.
Modeling semantic video objects and events
How to integrate multiple modalities still needs
careful consideration

28
Knowledge Discovery via Video Mining

Objectives
Find the hidden links between isolated news,
events, etc.
Find the general trend of an event's development
Predict possible future events
Discover abnormal events
Required Technologies
Domain-specific knowledge model
Mining association rules, sequential patterns and
correlations
Effective and fast classification and clustering
Challenges
Model build-up in special knowledge domain
Integration of semantic mining and feature-based
mining
Effective and scalable classification and
clustering algorithms
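
The association-rule idea can be made concrete with the two standard measures, support and confidence. The sketch below assumes each news story has already been tagged with a set of event labels; the labels and the data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); antecedent must occur at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical event labels attached to four news stories.
stories = [
    {"election", "protest"},
    {"election", "protest", "strike"},
    {"election"},
    {"strike"},
]

print(support(stories, {"election", "protest"}))       # 0.5
print(confidence(stories, {"election"}, {"protest"}))
```

A rule such as "election implies protest" would be reported only when both measures exceed chosen thresholds; scalable miners avoid enumerating every candidate itemset.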

29
Video Mining Issues

Frequent/Sequential Pattern Discovery
Fast and scalable algorithms for mining frequent,
sequential and structured patterns and for
correlation analysis
Similarity of rule/event search/measurement
Efficient and fast classification and clustering
algorithms
Constraint-based classification and clustering
algorithms
Spatiotemporal data mining algorithms
Stream data mining (classification and
clustering) algorithms
Surprise/outlier discovery and measurement
Detection of outliers based on similarity and
trend analysis
Detection of outliers and surprising events based
on stream data mining algorithms
Multidimensional data mining for trend prediction

30
Framework of Video Mining
31
Our Previous Work

TV Commercial Detection
Visual/audio information processing
Cinema rules
Intensity mapping
Tempo analysis in digital video (Professional
video)
Audio tempo
Motion tempo
Home video processing (Non-professional)
Quality enhancement (Bad shot detection)
Music and video matching

32
Commercial Detection

First step in any TV program content
management
Monitor broadcast
Government
Advertising companies
Commercial features
Delimiting black frame (not available in some
countries)
High cut frequency and short shot interval
(important feature)
Still images
Special editing styles and effects
Text and logo
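
The cut-frequency feature can be sketched as a cut rate measured in fixed windows. The code assumes cut timestamps (in seconds) are already available from a shot detector; the window length and the minimum rate are hypothetical values chosen for the toy data.

```python
def cut_rate(cut_times, window_start, window_length):
    """Cuts per second inside [window_start, window_start + window_length)."""
    end = window_start + window_length
    n = sum(1 for t in cut_times if window_start <= t < end)
    return n / window_length

def flag_commercial_windows(cut_times, duration, window_length=10.0, min_rate=0.5):
    """Return start times of non-overlapping windows whose cut rate reaches min_rate."""
    flagged = []
    start = 0.0
    while start < duration:
        if cut_rate(cut_times, start, window_length) >= min_rate:
            flagged.append(start)
        start += window_length
    return flagged

# Toy data: sparse cuts in the program, dense cuts between 10 s and 20 s.
cut_times = [4.0, 12.0, 13.5, 15.0, 16.0, 18.0, 19.5, 27.0]

print(flag_commercial_windows(cut_times, duration=30.0))  # [10.0]
```

A high cut rate alone is only a candidate signal; the other features on this slide (black frames, still images, text and logo) would be combined with it before declaring a commercial.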

33
Commercial Detection

Visual information processing
Black frame detection
Shot detection and its statistical analysis
Still image detection
Text-region detection
Edge change rate detection
Audio information processing
Volume control
Silence

34
Commercial Detection

Structure of TV program

[Figure: structure of a TV program, showing normal
program segments (one with a station logo),
spots, and a black frame]
35
Shot Detection and Its Statistical Analysis
[Figure annotation: commercial start point]
36
Still Image Detection

Still Image
A video clip is composed of a sequence of images
Find a set of consecutive images that change
little over a period of time
Difficulty
Even when a video clip looks still, the
difference between two consecutive images is
seldom zero
It is tough to measure the moving part (human
eyes are sensitive to motion)
Main idea
Quantify motion in each image to detect still
images
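
One way to quantify motion is the mean absolute pixel difference between consecutive frames. The sketch below uses that measure with a small tolerance (since the difference is seldom exactly zero) and reports runs of frames that stay below it. The measure, the threshold and the minimum run length are assumptions, not the method used in the original work.

```python
def frame_motion(prev, cur):
    """Mean absolute pixel difference between two frames given as flat lists."""
    return sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)

def still_segments(frames, motion_threshold=2.0, min_length=3):
    """Return (start, end) inclusive frame-index pairs of runs with little motion."""
    segments = []
    run_start = 0
    for i in range(1, len(frames) + 1):
        # A run ends at the last frame or when motion exceeds the tolerance.
        moving = i == len(frames) or frame_motion(frames[i - 1], frames[i]) > motion_threshold
        if moving:
            if i - run_start >= min_length:
                segments.append((run_start, i - 1))
            run_start = i
    return segments

# Toy clip: four nearly identical frames followed by two very different ones.
frames = [[10, 10], [11, 10], [10, 11], [10, 10], [100, 100], [50, 50]]

print(still_segments(frames))  # [(0, 3)]
```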

37
Still Image Detection
[Figure labels: false detections; truly still
images]
38
Tempo Analysis and Cinema Rules

The Visual Story: Seeing the Structure of Film,
TV, and New Media, by Bruce Block
Relationship between story structure and visual
structure
Their intensity maps are correlated
Principle of contrast and affinity
The greater the contrast in a visual component,
the more the visual intensity or dynamic
increases

39
Cinema Rules

Every feature film has a well designed story
structure, which contains the beginning
(exposition), the middle (conflict), and the end
(resolution)

EX (exposition): gives the facts needed to begin
the story
CO (conflict): contains rising actions or conflict
CX (climax)
R (resolution): ends the story
40
Cinema Rules

Scene
A simple theme in a scene
Each scene is composed of setup part, progressing
part, and resolution part
Final film is just a way to present this theme
Dialog
Close-up view
A story unit
An example of a scene
The main actor drove the main actress from the
train station back home
A simple action
Met at train station -> On the road -> Another main
actor joined them -> Arrive home

41
Audio Tempo

Music tempo
Definition in music
Note
Meter: a longer period that contains several
beats. For example, we can count ONE-two-three,
ONE-two-three
Tempo (pace/beat period)
It is often indicated at the beginning of a
piece. For example, the rate may be 100 quarter
notes per minute (100 claps per minute)
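
The relation between a tempo marking and the beat period is a one-line conversion, shown here as a small worked example. The function names are hypothetical.

```python
def beat_period_seconds(beats_per_minute):
    """Time between consecutive beats for a given tempo marking."""
    return 60.0 / beats_per_minute

def beats_in(duration_seconds, beats_per_minute):
    """Number of beats that fit in a duration at a constant tempo."""
    return duration_seconds * beats_per_minute / 60.0

# At 100 quarter notes per minute, one beat lasts 0.6 s,
# and a 30-second excerpt contains 50 beats.
print(beat_period_seconds(100), beats_in(30, 100))
```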

42
Audio Tempo

Speech tempo
Emotion detection
Segmental durations
Syllable or phoneme
Audio tempo
Short time pace
Short-term memory
The number of sound events per unit of time
The more events, the faster it seems to go
Onset
A new note or a new syllable

43
Audio Tempo

Diagram of audio tempo analysis

44
Audio Tempo

Frequency filterbank
Perceptual frequency
Critical bands
Wavelet-packet
Multirate system
Envelope extractor
Rectify
Filtering: 50 ms half-Hamming window
Differentiator
First-order difference
Half-wave rectified

Input signal and detected onsets
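
The envelope extractor and differentiator stages can be sketched for a single band as follows. The code skips the frequency filterbank, so it is a simplified illustration rather than the full system: it rectifies the signal, smooths it with a normalized 50 ms half-Hamming window, takes the half-wave rectified first-order difference, and picks peaks above a hypothetical threshold.

```python
import math

def half_hamming(length):
    """Decaying half of a Hamming window, normalized to sum to 1."""
    if length == 1:
        return [1.0]
    half = [0.54 + 0.46 * math.cos(math.pi * n / (length - 1)) for n in range(length)]
    total = sum(half)
    return [w / total for w in half]

def envelope(signal, window):
    """Full-wave rectify, then smooth with a causal filter."""
    rect = [abs(s) for s in signal]
    out = []
    for i in range(len(rect)):
        acc = 0.0
        for k, w in enumerate(window):
            if i - k >= 0:
                acc += w * rect[i - k]
        out.append(acc)
    return out

def onset_strength(env):
    """First-order difference followed by half-wave rectification."""
    return [0.0] + [max(0.0, b - a) for a, b in zip(env, env[1:])]

def detect_onsets(signal, sample_rate, window_ms=50.0, threshold=0.05):
    """Return sample indices of local peaks in the onset strength curve."""
    length = max(1, int(sample_rate * window_ms / 1000.0))
    strength = onset_strength(envelope(signal, half_hamming(length)))
    onsets = []
    for i in range(1, len(strength) - 1):
        if (strength[i] > threshold
                and strength[i] >= strength[i - 1]
                and strength[i] > strength[i + 1]):
            onsets.append(i)
    return onsets

# Toy signal at 100 Hz: two bursts starting at samples 10 and 30.
signal = [0.0] * 10 + [1.0] * 10 + [0.0] * 10 + [1.0] * 10

print(detect_onsets(signal, sample_rate=100))  # [10, 30]
```

Counting the detected onsets per unit of time then gives the audio tempo curve described on the earlier slides.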
45
Audio Tempo

Boundary of story units
Local minima of audio tempo
Post signal processing
Help to get local minima
Three steps
Lowpass filtering
Morphological operation
Minmax
Close operation
Detect local minima
Detected valleys

Post processing for audio tempo analysis
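
The three post-processing steps can be sketched on a one-dimensional tempo curve. The code below uses a moving average as the lowpass filter and a morphological closing (sliding maximum followed by sliding minimum) to fill narrow dips before locating valleys. The filter widths are assumptions, and the closing stands in for the morphological operations listed on the slide.

```python
def moving_average(x, width=3):
    """Simple lowpass filter: centered moving average with shrinking edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def _sliding(x, width, func):
    half = width // 2
    return [func(x[max(0, i - half): i + half + 1]) for i in range(len(x))]

def close(x, width=3):
    """Morphological closing: sliding max (dilation), then sliding min (erosion)."""
    return _sliding(_sliding(x, width, max), width, min)

def local_minima(x):
    """Return the center index of each valley, treating flat bottoms as one valley."""
    minima = []
    i = 1
    while i < len(x) - 1:
        if x[i] < x[i - 1]:
            j = i
            while j < len(x) - 1 and x[j + 1] == x[i]:
                j += 1
            if j < len(x) - 1 and x[j + 1] > x[i]:
                minima.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return minima

def story_boundaries(tempo, smooth_width=3, close_width=5):
    """Lowpass filter, close, then detect valleys in a tempo curve."""
    return local_minima(close(moving_average(tempo, smooth_width), close_width))

# Toy tempo curve: a one-sample dip at index 2 and a wide valley at 6-8.
tempo = [5, 5, 4, 5, 5, 5, 1, 1, 1, 5, 5, 5]

print(local_minima(tempo))      # [2, 7]
print(story_boundaries(tempo))  # [7]
```

On the toy curve the narrow dip is removed while the wide valley survives, which is the intended effect of the closing step.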
46
Motion Analysis

The variance of motion vectors
[Equation: variance of the motion vectors,
defined in terms of a window, the average length
of motion vectors for each shot, and the shot
index]
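
Since the formula itself is not preserved in this transcript, the following sketch shows one plausible reading: the variance of the per-shot average motion-vector length, computed over a window centered on each shot index. The window shape and the population-variance normalization are assumptions.

```python
def windowed_variance(avg_motion, window=3):
    """Variance of per-shot average motion-vector length over a centered window."""
    half = window // 2
    out = []
    for k in range(len(avg_motion)):
        vals = avg_motion[max(0, k - half): k + half + 1]
        mean = sum(vals) / len(vals)
        out.append(sum((v - mean) ** 2 for v in vals) / len(vals))
    return out

# Toy data: three calm shots followed by three high-motion shots.
avg_motion = [2.0, 2.0, 2.0, 8.0, 8.0, 8.0]

print(windowed_variance(avg_motion))  # [0.0, 0.0, 8.0, 8.0, 0.0, 0.0]
```

The variance peaks where the motion level changes, which is consistent with using transition edges as story-unit boundaries.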

47
Motion Analysis

Boundary of story units
Transition Edges
Post processing
Morphological operation
Median
Maxmin
Minmax
Gradient
Detect edges

Post processing for visual tempo
48
Skimming Video

Test data
Legends of The Fall
First 26 minutes
MPEG format
352×240 pixels
44.1 kHz

49
Home Video Processing

Home video characteristics
Fragmental
Sound may not be very important
Bad shots
Stabilization
Focus
Lighting

50
Bad Shots

Shaky
Drive
Walk
Vibration in the camera motion between successive
frames

51
Bad Shots

Ill-light
Too dark/bright
Too much variance
Diaphragm
Lighting Problem
Average of luminance
Highest 1/3 pixels and lowest 1/3 pixels
Negative feedback
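
The luminance statistics named above can be sketched as follows. The code computes the mean luminance together with the means of the darkest and brightest thirds of the pixels, then applies a hypothetical decision rule: a frame is too dark if even its brightest third is dark, and too bright if even its darkest third is bright. The thresholds and the rule are assumptions.

```python
def luminance_stats(pixels):
    """Return (mean, mean of darkest third, mean of brightest third)."""
    ordered = sorted(pixels)
    third = max(1, len(ordered) // 3)
    mean = sum(ordered) / len(ordered)
    low = sum(ordered[:third]) / third
    high = sum(ordered[-third:]) / third
    return mean, low, high

def lighting_label(pixels, dark=60, bright=200):
    """Classify a frame's exposure using hypothetical 8-bit luminance thresholds."""
    _, low, high = luminance_stats(pixels)
    if high < dark:
        return "too dark"
    if low > bright:
        return "too bright"
    return "ok"

print(luminance_stats([0, 30, 60, 90, 120, 150]))  # (75.0, 15.0, 135.0)
print(lighting_label([10, 20, 30, 40, 50, 55]))    # too dark
```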

52
Bad Shots

Blur
Motion blur
Out-of-focus blur
Foggy blur

53
Music and Video Matching

Shot detection
Remove bad shots
Match music tempo
Shot length
Motion activity

54
Authoring Scheme

Match music tempo
High tempo
Small segment length
Transition time
High motion activity

55
Experimental Results

Test data
Input music: 5.5-minute piece, Canon
Input video clips
Activities of babies 0 to 3 years old
Man-made bad shots
Average clip length is about 20 seconds
Total length is 50 minutes

56
Well-Known Research in Video Content Analysis
Field

Well-known universities
Digital Video Multimedia laboratory (DVMM),
Columbia University
MIT Media laboratory
Informedia Digital Video Understanding, Carnegie
Mellon University
Department of Electrical and Computer
Engineering, University of Illinois at
Urbana-Champaign
Signal and Image Processing Institute, University
of Southern California
Department of Electrical Engineering, Princeton
University
Language and media processing laboratory,
University of Maryland

57
Well-Known Research in Video Content Analysis
Field

Well-known R&D laboratories
IBM T. J. Watson research center
IBM Almaden research center
Intel corporation
Sharp Laboratory of America (SLA)
Microsoft research laboratory
Microsoft research China
Hewlett-Packard research laboratory
AT&T Bell Laboratories
InterVideo
Pinnacle

58
Conclusion

Introduction of several basic concepts
Basic processing and low-level feature extraction
Semantic video modeling and indexing
Multimodal framework for topic classification of
Video
Knowledge discovery via video mining
Our research results
Discussion of Challenging problems

59
Questions
Thank You
