Title: Semantic Analysis for Video Contents Extraction - Spotting by Association in News Video
1. Semantic Analysis for Video Contents Extraction - Spotting by Association in News Video
- Paper by Yuichi NAKAMURA and Takeo KANADE
- Presented by Hemant Joshi
2. Introduction
- Enormous amount of multimedia data
- Linking two news items together
- Semantic linking
- Using closed captions along with video
3. Video Content Spotting by Association
- Necessity for multiple modalities
- Video content extraction from language or image data alone is not reliable; the authors note that it is "difficult to determine without semantics."
4. Situation Spotting by Association
- The association between language and image clues is an important key.
- Two advantages
- Reliable detection by utilizing both images and language
- Data explained by both modalities is clearly understandable to users
5. Situation Spotting by Association (Cont.)
6. Situation Spotting by Association (Cont.)
7. Language Clue Detection
- Simple keyword spotting
- Direct vs. indirect narration
- Keyword usage for speech
8. Language Clue Detection (Cont.)
- Keyword usage for meeting and visiting
9. Screening Keywords
- To avoid false detection of keywords not related to the subject matter of interest, parse each sentence in the transcript, check the role of each keyword, and check the semantics of the subject, the verb, and the objects. Also consider the following:
- The part of speech of each word can be used as a keyword condition (e.g., "talk" as a verb).
- If a keyword is a verb, its subject or object is checked semantically; for semantic checking, the hypernym relation in WordNet is used.
- Negative sentences and those in the future tense can be ignored.
- A location name following prepositions such as "in" or "to" is considered a language clue.
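The screening rules above can be sketched as a small filter. This is a toy illustration only: a hand-coded dictionary stands in for a parser's part-of-speech tags and for WordNet's hypernym relation, and all names and data are made up, not taken from the paper.

```python
# Toy sketch of keyword screening. A dictionary stands in for WordNet
# hypernyms; POS tags are assumed to come from a parser.

# Toy hypernym sets: word -> ancestors (stand-in for WordNet hypernyms).
HYPERNYMS = {
    "president": {"person", "leader"},
    "storm": {"phenomenon"},
}

def is_person(word):
    """Semantic check: does the word's hypernym set contain 'person'?"""
    return "person" in HYPERNYMS.get(word, set())

def screen_keyword(keyword, pos, subject, negative=False, future=False):
    """Return True if the keyword survives screening as a language clue."""
    if negative or future:                 # ignore negative / future-tense sentences
        return False
    if keyword == "talk" and pos != "VB":  # "talk" counts only as a verb
        return False
    if pos == "VB":                        # verbs need a semantically valid subject
        return is_person(subject)
    return True

print(screen_keyword("talk", "VB", "president"))  # True: verb with person subject
print(screen_keyword("talk", "NN", "president"))  # False: wrong part of speech
print(screen_keyword("talk", "VB", "storm"))      # False: subject is not a person
```

A sentence containing one or more keywords that pass such checks would then be a candidate key-sentence.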
10. Process - Conditions for Key-Sentence Detection
- In key-sentence detection, keywords are detected from transcripts.
- Keywords are syntactically and semantically checked, and evaluated using the parsing results.
- Focusing only on subjects and verbs gives more acceptable results (80% correct on CNN news headlines).
- A sentence including one or more words that satisfy these conditions is considered a key-sentence.
11. Process - Key-Sentence Detection Results
- The figures (X/Y/Z) in each table show the numbers of detected key-sentences:
- X is the number of sentences that include keywords
- Y is the number of sentences removed by the above keyword screening
- Z is the number of sentences incorrectly removed
12. Image Clue Detection - Key Images
- Image clues:
- Face close-ups
- People images
- Outdoor scenes
- Usage of face close-ups
13. Key Image - Usage of People Images
- The usage of people images is to describe crowds, such as people in a demonstration.
14. Key Image - Outdoor Scenes
- In the case of outdoor scenes, images describe the place, the degree of a disaster, etc.
15. Key Image Detection
- Face close-up detection
- Human faces are detected by a neural-network-based face detection program. Most face close-ups are easily detected because they are large and frontal; in practice, most frontal faces are detected, but fewer than half of the small faces and profiles are.
- People image and outdoor scene detection
- For images containing many people, the problem becomes harder because small faces and human figures are more difficult to detect. The same can be said of outdoor scene detection.
- Automatic face and outdoor scene detection is still under development, so for the experiments in this paper these images were picked manually. Since the representative image of each cut is detected automatically, picking those images from a 30-minute news video takes only a few minutes.
16. Association by Dynamic Programming
- Basic idea
- The detected data are a sequence of key images and a sequence of key-sentences, each with starting and ending times. If a key image duration and a key-sentence duration overlap sufficiently (or are close to each other) and the suggested situations are compatible, they should be associated.
- Basic assumption
- The order of the key image sequence and that of the key-sentence sequence are the same.
- The basic idea is to minimize the following penalty value P:
- P = Σ_{j ∈ Sn} Skip_s(j) + Σ_{k ∈ In} Skip_i(k) - Σ_{j ∈ S, k ∈ I} Match(j, k)
- where S and I are the key-sentences and key images that have corresponding clues in the other modality, and Sn and In are those without corresponding clues. Skip_s is the penalty value for a key-sentence without inter-modal correspondence, Skip_i is that for a key image without inter-modal correspondence, and Match(j, k) is the penalty for the correspondence between the j-th key-sentence and the k-th key image.
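Under the same-order assumption, this minimization is a standard order-preserving sequence alignment, solvable by dynamic programming. The sketch below is illustrative only: it uses a simplified convention in which all skip and match costs are non-negative and simply added, and the penalty functions in the example are made up rather than taken from the paper.

```python
# Order-preserving alignment of key-sentences and key images by DP,
# minimizing a total penalty (simplified all-additive cost convention).

def associate(sentences, images, skip_s, skip_i, match):
    """Return (total penalty, list of (sentence_idx, image_idx) pairs)."""
    n, m = len(sentences), len(images)
    INF = float("inf")
    # cost[j][k]: minimal penalty aligning first j sentences with first k images
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for j in range(n + 1):
        for k in range(m + 1):
            if cost[j][k] == INF:
                continue
            if j < n and cost[j][k] + skip_s(sentences[j]) < cost[j + 1][k]:
                cost[j + 1][k] = cost[j][k] + skip_s(sentences[j])
                back[j + 1][k] = ("skip_s", j, k)
            if k < m and cost[j][k] + skip_i(images[k]) < cost[j][k + 1]:
                cost[j][k + 1] = cost[j][k] + skip_i(images[k])
                back[j][k + 1] = ("skip_i", j, k)
            if j < n and k < m:
                c = cost[j][k] + match(sentences[j], images[k])
                if c < cost[j + 1][k + 1]:
                    cost[j + 1][k + 1] = c
                    back[j + 1][k + 1] = ("match", j, k)
    # Trace back the optimal alignment.
    pairs, j, k = [], n, m
    while back[j][k] is not None:
        op, pj, pk = back[j][k]
        if op == "match":
            pairs.append((pj, pk))
        j, k = pj, pk
    return cost[n][m], list(reversed(pairs))

# Example with made-up penalties: compatible pairs cost 0, others 2, skips 1.
sentences = ["speech", "meeting"]
images = ["face", "people"]
compatible = {("speech", "face"), ("meeting", "people")}
penalty, pairs = associate(
    sentences, images,
    skip_s=lambda s: 1.0,
    skip_i=lambda i: 1.0,
    match=lambda s, i: 0.0 if (s, i) in compatible else 2.0,
)
print(penalty, pairs)  # 0.0 [(0, 0), (1, 1)]
```

Incompatible or badly timed clues would instead be skipped, at the skip cost defined by their importance.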
17. Association by DP - Cost Evaluation
- Skipping cost (Skip)
- The penalty values are determined by the importance of the data, that is, the likelihood of each datum having an inter-modal correspondence. In this research, the importance of each clue is calculated by the following formula, and the skip penalty Skip is taken to be -E.
- E = E_type × E_data
- where E_type is the evaluation of the clue type (for example, of the type "face close-up"), and E_data is the evaluation of the individual clue (for example, the face-size evaluation for a face close-up).
- Example of cost definition
- key-sentence: speech 1.0, meeting 0.6, crowd 0.6, travel/visit 0.6, location 0.6
- key image: face 1.0, people 0.6, scene 0.6
18. Association by DP - Cost Evaluation (Cont.)
- Matching cost (Match)
- The evaluation of a correspondence is calculated by the following formula:
- Match(i, j) = M_time(i, j) × M_type(i, j)
- where M_time is the duration compatibility between an image and a sentence: the more their durations overlap, the smaller the penalty becomes.
- A key image's duration (d_i) is the duration of the cut from which the key image is taken; the starting and ending times of a sentence in the speech are used for the key-sentence duration (d_s). Where the exact speech time is difficult to obtain, it is substituted by the time at which the closed caption appears.
- The actual values for M_type are shown in the table; they are roughly determined by the number of correspondences in the sample videos.
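The temporal side of the matching cost can be sketched as an overlap-based penalty. The exact mapping from overlap to penalty used below (one minus the overlap fraction of the combined span) is an assumption for illustration, not the paper's formula:

```python
# Duration compatibility M_time: penalty decreases as the key image duration
# and the key-sentence duration overlap more. The mapping used here
# (1 - overlap / combined span) is an illustrative assumption.

def m_time(d_image, d_sentence):
    """d_image, d_sentence: (start, end) tuples in seconds."""
    (s1, e1), (s2, e2) = d_image, d_sentence
    overlap = max(0.0, min(e1, e2) - max(s1, s2))
    span = max(e1, e2) - min(s1, s2)  # combined span of both durations
    return 1.0 - overlap / span       # 0 = identical, 1 = disjoint

print(m_time((0, 10), (0, 10)))   # identical durations: 0.0
print(m_time((0, 10), (5, 15)))   # partial overlap: between 0 and 1
print(m_time((0, 10), (20, 30)))  # disjoint durations: 1.0
```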
19. Experimental Results
20. Results (Cont.)
21. Usage of Results
- Summarization and presentation tool
- Around 70 segments are spotted in each 30-minute news video, an average of 3 segments per minute. If a topic is not too long, all of the segments of one topic can be placed into one window. This view could be a good presentation of a topic as well as a good summarization tool.
- Each pair of a picture and a sentence is an associated pair: the picture is a key image and the sentence is a key-sentence. The position of the pair is determined by the situations defined.
- This view lets us see at a glance how the topic is organized: visit and place information is given first, meeting information is given second, then a few public speeches and opinions are given.
22. Usage of Results (Cont.)
- Data tagging for video segments
23. News Video Topic Explainer (Category / Time Order)
24. Details in the Topic Explainer
25. Conclusion
- The idea of Spotting by Association in news video.
- Video segments with typical semantics are detected by associating language clues and image clues.
- Most of the detected segments fit the typical situations.
- New applications were proposed using the detected news segments.
- Future work
- Improvement of key image and key-sentence detection
- Checking the effectiveness of this method with other kinds of videos
26. Questions?