Semantic Analysis for Video Contents Extraction - Spotting by Association in News Video

Transcript and Presenter's Notes
1
Semantic Analysis for Video Contents Extraction
-Spotting by Association in News Video
  • Paper by
  • Yuichi Nakamura
  • Takeo Kanade
  • Presented by: Hemant Joshi

2
Introduction
  • Enormous amounts of multimedia data
  • Linking related news items together
  • Semantic linking
  • Using closed captions along with video

3
Video Content Spotting by Association
  • Necessity of multiple modalities
  • Video content extraction from language data or
    image data alone is not reliable
  • Content is often difficult to determine without
    semantics

4
Situation Spotting by Association
  • Association between language and image clues is
    the important key.
  • Two advantages
  • Reliable detection utilizing both images and
    language
  • The data explained by both modalities is clearly
    understandable to users.

5
Situation Spotting by Association (Cont.)
6
Situation Spotting by Association (Cont.)
7
Language Clue Detection
  • Simple Keyword Spotting
  • Direct vs. indirect narration
  • Keyword usage for speech

8
Language Clue Detection (Cont.)
  • Keyword usage for meeting and visiting

9
Screening Keywords
  • To avoid falsely detecting keywords unrelated to
    the subject matter of interest, parse each
    sentence in the transcript, check the role of
    each keyword, and check the semantics of the
    subject, the verb, and the objects. Also consider
    the following
  • The part of speech of each word can be used as a
    keyword condition. Example: talk as a verb
  • If a keyword is a verb, its subject or object is
    checked semantically. For the semantic check, use
    the hypernym relation in WordNet
  • Negative sentences and those in the future tense
    can be ignored.
  • A location name following prepositions such as
    in or to is considered a language clue.

10
Process - Conditions for key-sentence detection
  • In key-sentence detection, keywords are detected
    from transcripts.
  • Keywords are syntactically and semantically
    checked and evaluated by using the parsing
    results.
  • By focusing only on subjects and verbs, the
    results are more acceptable (80% correct on CNN
    news headlines)
  • A sentence including one or more words which
    satisfy these conditions is considered a
    key-sentence.

11
Process - Key-sentence detection result
  • The figures (X/Y/Z) in each table show the
    key-sentence detection counts
  • X is the number of sentences which include
    keywords
  • Y is the number of sentences removed by the
    keyword screening above
  • Z is the number of sentences incorrectly removed

12
Image Clue Detection - Key Image
  • Image clues
  • Face close-ups
  • People Images
  • Outdoor Scenes
  • Usage of Face close-up

13
Key Image - Usage of People Images
  • People images typically describe crowds, such as
    people in a demonstration

14
Key Image - Outdoor Scenes
  • In the case of outdoor scenes, images describe
    the place, the degree of a disaster, etc.

15
Key Image Detection
  • Face Close-up Detection
  • In this research, human faces are detected by a
    neural-network-based face detection program. Most
    face close-ups are easily detected because they
    are large and frontal. As a result, most frontal
    faces are detected, but less than half of the
    small faces and profiles are.
  • People Image and Outdoor Scene Detection
  • As for images with many people, the problem
    becomes difficult because small faces and human
    figures are more difficult to detect. The same
    can be said of outdoor scene detection.
  • Automatic face and outdoor scene detection is
    still under development. For the experiments in
    this paper, we manually pick them. Since the
    representative image of each cut is automatically
    detected, it takes only a few minutes for us to
    pick those images from a 30-minute news video.

16
Association by Dynamic Programming
  • Basic Idea
  • The detected data are the sequence of key images
    and the sequence of key-sentences, each given
    starting and ending times. If a key image
    duration and a key-sentence duration overlap
    enough (or are close to each other) and the
    suggested situations are compatible, they should
    be associated.
  • Basic Assumption
  • Order of a key image sequence and that of a
    key-sentence sequence are the same.
  • The basic idea is to minimize the following
    penalty value P.
  • P = Sum_{j in Sn} Skip_s(j) + Sum_{k in In} Skip_i(k)
        + Sum_{j in S, k in I} Match(j, k)
  • where S and I are the key-sentences and key
    images which have corresponding clues in the
    other modality, and Sn and In are those without
    corresponding clues. Skip_s is the penalty for a
    key-sentence without an inter-modal
    correspondence, Skip_i is that for a key image
    without one, and Match(j, k) is the penalty for
    the correspondence between the j-th key-sentence
    and the k-th key image.
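The order-preserving minimization of P described above can be sketched as a classic edit-distance-style dynamic program. The function name and penalty values below are illustrative assumptions; match penalties are encoded as negative for good pairings so that minimizing the total favors associations over skips.

```python
# Sketch of the order-preserving alignment as dynamic programming.
# skip_s / skip_i give each item's skip penalty; match[j][k] gives the
# penalty (negative = good) for pairing sentence j with image k.
def associate(skip_s, skip_i, match):
    n, m = len(skip_s), len(skip_i)
    INF = float("inf")
    # P[j][k] = minimal total penalty over the first j key-sentences
    # and the first k key images.
    P = [[INF] * (m + 1) for _ in range(n + 1)]
    P[0][0] = 0.0
    for j in range(n + 1):
        for k in range(m + 1):
            if P[j][k] == INF:
                continue
            if j < n:  # leave sentence j without a correspondence
                P[j + 1][k] = min(P[j + 1][k], P[j][k] + skip_s[j])
            if k < m:  # leave image k without a correspondence
                P[j][k + 1] = min(P[j][k + 1], P[j][k] + skip_i[k])
            if j < n and k < m:  # associate sentence j with image k
                P[j + 1][k + 1] = min(P[j + 1][k + 1],
                                      P[j][k] + match[j][k])
    return P[n][m]

# Two sentences, one image: pairing sentence 0 with the image and
# skipping sentence 1 gives the lowest total penalty.
best = associate([1.0, 0.6], [1.0], [[-2.0], [-0.5]])
```

Because both sequences are processed left to right, the alignment automatically respects the basic assumption that the key image order and the key-sentence order agree.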

17
Association by DP - Cost Evaluation
  • Skipping Cost(Skip)
  • The penalty values are determined by the
    importance of the data, that is, the likelihood
    of each item having an inter-modal
    correspondence. In this research, the importance
    of each clue is calculated by the following
    formula, and the skip penalty Skip is taken as
    -E.
  • E = Etype x Edata
  • where Etype is the evaluation of the clue's
    type, for example, of the type face close-up,
    and Edata is the evaluation of the individual
    clue, for example, the face-size evaluation for
    a face close-up.
  • Example of cost definition
  • Key-sentence: speech 1.0, meeting 0.6, crowd 0.6,
    travel/visit 0.6, location 0.6
  • Key image: face 1.0, people 0.6, scene 0.6
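The cost definition above can be turned into a small sketch. Only E = Etype x Edata and Skip = -E come from the slides; the Edata value used below (a normalized face-size score) is an illustrative assumption.

```python
# Sketch of the skip-penalty computation from the slide's tables.
SENTENCE_ETYPE = {"speech": 1.0, "meeting": 0.6, "crowd": 0.6,
                  "travel/visit": 0.6, "location": 0.6}
IMAGE_ETYPE = {"face": 1.0, "people": 0.6, "scene": 0.6}

def skip_penalty(etype_table, clue_type, e_data):
    """Skip = -E, with E = Etype * Edata per the slides.
    e_data is the per-clue evaluation, e.g. a face-size score."""
    return -(etype_table[clue_type] * e_data)

# A large frontal face, assuming a normalized size score of 0.9:
skip_penalty(IMAGE_ETYPE, "face", 0.9)
```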

18
Association by DP - Cost Evaluation
  • Matching Cost (Match)
  • The evaluation of correspondences is calculated
    by the following formula.
  • Match(i, j) = Mtime(i, j) + Mtype(i, j)
  • where Mtime is the duration compatibility between
    an image and a sentence. The more their durations
    overlap, the less the penalty becomes.
  • A key image's duration (di) is the duration of
    the cut from which the key image is taken; the
    starting and ending times of a sentence in the
    speech are used for the key-sentence duration
    (ds). Where the exact speech time is difficult
    to obtain, it is substituted by the time when
    the closed caption appears.
  • The actual values for Mtype are shown in Table.
    They are roughly determined by the number of
    correspondences in our sample videos.
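The slides do not give Mtime's exact form; a plausible sketch (an assumption, not the paper's definition) is a normalized-overlap penalty over the two intervals, which is 0 when the durations coincide and grows toward 1 as they pull apart.

```python
# Illustrative duration-compatibility term: the more the key image's
# cut interval and the key-sentence's speech (or closed-caption)
# interval overlap, the smaller the penalty.
def m_time(image_iv, sentence_iv):
    """Penalty in [0, 1]: 0 for identical intervals, 1 for disjoint."""
    (a0, a1), (b0, b1) = image_iv, sentence_iv
    overlap = max(0.0, min(a1, b1) - max(a0, b0))
    span = max(a1, b1) - min(a0, b0)  # total span covered by both
    return 1.0 - overlap / span

m_time((0.0, 10.0), (0.0, 10.0))   # identical durations -> 0.0
m_time((0.0, 10.0), (20.0, 30.0))  # disjoint durations  -> 1.0
```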

19
Experiment Results
20
Results (Cont.)
21
Usage of Results
  • Summarization and Presentation tool
  • Around 70 segments are spotted for each 30-minute
    news video. This means an average of 3 segments
    in a minute. If a topic is not too long, we can
    place all of the segments in one topic into one
    window. This view could be a good presentation of
    a topic as well as a good summarization tool.
  • Each pair of a picture and a sentence is an
    associated pair. The picture is a key image, and
    the sentence is a key-sentence. The position of
    each pair is determined by the situations defined
    above.
  • This view lets us see at a glance how the topic
    is organized. Visit and place information is
    given first, meeting information second, then a
    few public speeches and opinions.

22
Usage of Results (Cont.)
  • Data tagging to video segments

23
News Video Topic Explainer (Category / Time Order)
24
Details in Topic Explainer
25
Conclusion
  • Proposed the idea of Spotting by Association in
    news video
  • Video segments with typical semantics are
    detected by associating language clues with
    image clues
  • Most of the detected segments fit the typical
    situations
  • Proposed new applications using the detected
    news segments
  • Future work
  • Improve key image and key-sentence detection
  • Check the effectiveness of this method with
    other kinds of video

26
Questions?