Semantic Content based Modeling - PowerPoint PPT Presentation

About This Presentation
Title:

Semantic Content based Modeling


Slides: 57
Provided by: csewe

Transcript and Presenter's Notes

Title: Semantic Content based Modeling


1
Semantic Content based Modeling
  • Video semantics are captured and organized to
    support video retrieval
  • Difficult to automate
  • Relies on manual annotation
  • Capable of supporting natural-language-like
    queries

2
Video Content Extraction
  • Other forms of information extraction can also be
    employed:
  • Closed-captioned text
  • Speech recognition
  • Descriptive information from the screenplay
  • Key frames that characterize a shot
  • This content information can be associated with
    the video story units.

3
Existing Semantic-Level Models
  • Segmentation-based Models
  • Stratification-based Models
  • Temporal Coherent Models

4
Segmentation-based Modeling
  • A video stream is segmented into temporally
    continuous segments
  • Each segment is associated with a description
    which could be natural text, keywords, or other
    kinds of annotation.
  • Disadvantages
  • Lack of flexibility
  • Limited in representing semantics

5
Stratification-based Modeling
[Figure: events and their (possibly overlapping) strata along a timeline marked from 0 to 90]
We partition the contextual information into
single events. Each event is associated with a
video segment called a stratum. Strata can
overlap or encompass each other.
6
Temporal Coherent
  • Each event is associated with a set of video
    segments where it happens.
  • More flexible in structuring video semantics.
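A minimal sketch of the temporal coherent idea in Python, assuming events are identified by name and segments are inclusive (start, end) frame ranges; the class name `TemporalCoherentIndex` and the sample event names are hypothetical. Unlike a stratum, one event here owns a whole set of segments.

```python
from collections import defaultdict


class TemporalCoherentIndex:
    """Hypothetical index: each event maps to the set of segments where it happens."""

    def __init__(self):
        self.events = defaultdict(list)  # event name -> list of (start, end) frames

    def add(self, event, start, end):
        """Record one more (possibly non-contiguous) segment for an event."""
        self.events[event].append((start, end))

    def segments_of(self, event):
        """All segments where the event happens, sorted by start frame."""
        return sorted(self.events.get(event, []))

    def events_at(self, frame):
        """Names of events whose segment set covers the frame (inclusive bounds)."""
        return sorted(e for e, segs in self.events.items()
                      if any(s <= frame <= t for s, t in segs))


idx = TemporalCoherentIndex()
idx.add("interview", 50, 70)
idx.add("interview", 0, 20)   # the same event happens again, earlier
idx.add("crowd", 15, 55)
```

Querying `idx.events_at(18)` returns both events, while `idx.segments_of("interview")` returns its two separate segments in time order.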

7
Stratum
The concept of stratification can be used to
assign descriptions to video footage.
  • Each stratum refers to a sequence of video
    frames.
  • The strata may overlap or totally encompass each
    other.
[Figure: overlapping strata labeled Car wreck rescue mission, Medics, Victim, In ambulance, In stretcher, Pulled free, Siren, and Ambulance, laid over a sequence of video frames]
Advantage: allows easy retrieval by keyword
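The stratification idea can be sketched in a few lines of Python. The `Stratum` class, the helper names, and the frame ranges assigned to the car wreck labels below are all hypothetical. They only show that overlapping strata let one frame carry several descriptions and that keyword lookup returns frame ranges directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    description: str
    start: int  # first frame (inclusive)
    end: int    # last frame (inclusive)


def find_by_keyword(strata, keyword):
    """Frame ranges of strata whose description contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [(s.start, s.end) for s in strata if kw in s.description.lower()]


def descriptions_at(strata, frame):
    """Every description that applies to a frame; overlapping strata all contribute."""
    return [s.description for s in strata if s.start <= frame <= s.end]


# Hypothetical frame ranges for the car wreck example.
strata = [
    Stratum("Car wreck rescue mission", 0, 90),
    Stratum("Medics", 10, 60),
    Stratum("Victim", 20, 85),
    Stratum("Pulled free", 20, 35),
    Stratum("In stretcher", 35, 60),
    Stratum("In ambulance", 60, 85),
    Stratum("Ambulance", 50, 90),
    Stratum("Siren", 70, 90),
]
```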
8
Video Algebra
  • Goal: to provide a high-level abstraction that
    models complex information associated with
    digital video data and supports content-based
    access
  • Strategy:
  • The algebraic video data model consists of
    hierarchical compositions of video expressions
    with high-level semantic descriptions
  • The video expressions are constructed using video
    algebra operations

9
Presentation
  • In the algebraic video data model, the
    fundamental entity is a presentation.
  • A presentation is a multiwindow spatial,
    temporal, and content combination of video
    segments.
  • Presentations are described by video
    expressions.
  • The most primitive video expression creates a
    single-window presentation from a raw video
    segment.
  • Compound video expressions are constructed from
    simpler ones using video algebra operations.

[Figure: a compound video expression built from video expressions; primitive video expressions at the leaves refer to raw video, and an algebraic video node names an expression]
Note: An algebraic video node provides a means of
abstraction by which video expressions can be
named, stored, and manipulated as units.
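One way to picture the node tree is as a small Python data structure. This is a sketch under assumptions: the class names, the `op` label, and the `leaves` helper are hypothetical, and a plain dictionary stands in for the store of named algebraic video nodes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Primitive:
    """Single-window presentation created from a raw video segment."""
    raw: str      # raw video file name
    start: float
    end: float

    def leaves(self) -> List[Tuple[str, float, float]]:
        return [(self.raw, self.start, self.end)]


@dataclass
class Compound:
    """Expression built from simpler ones by a video algebra operation."""
    op: str                                   # hypothetical operation label
    children: list = field(default_factory=list)

    def leaves(self):
        """Raw segments referenced by this expression, in child order."""
        out = []
        for child in self.children:
            out.extend(child.leaves())
        return out


# Named nodes: expressions that can be stored and reused as units.
nodes = {}
nodes["C1"] = Primitive("Cnn.HeadlineNews.rv", 10, 30)
nodes["C2"] = Primitive("Cnn.HeadlineNews.rv", 20, 40)
nodes["P"] = Compound("concat", [nodes["C1"], nodes["C2"]])
```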
10
Video Algebra Operations
  • The video algebra operations fall into four
    categories:
  • 1. Creation defines the construction of
    video expressions from raw video.
  • 2. Composition defines temporal relationships
    between component video expressions.
  • 3. Output defines spatial layout and audio
    output for component video expressions.
  • 4. Description associates content attributes
    with a video expression.

11
Composition
The composition operations can be combined
to produce complex scheduling definitions and
constraints.
(create builds a video presentation, e.g., C1,
from a raw video segment, e.g.,
Cnn.HeadlineNews.rv.)
C1 = create Cnn.HeadlineNews.rv 10 30
C2 = create Cnn.HeadlineNews.rv 20 40
C3 = create Cnn.HeadlineNews.rv 32 65
D1 = (description C1 'Anchor speaking')
D2 = (description C2 'Professor Smith')
D3 = (description C3 'Economic reform')
D3 follows D2, which follows D1, and common
footage is not repeated. (This creates a
non-redundant video stream from three overlapping
segments.)
[Figure: the overlapping segments C1, C2, and C3 with their descriptions Anchor speaking, Professor Smith, and Economic reform]
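The effect of the union in this example can be checked with a short sketch that treats each `create` range as a half-open interval [start, end) of the same raw video. This interval convention is an assumption, since the slides do not state it, and the helper names are hypothetical.

```python
def subtract(interval, covered):
    """Parts of the half-open interval [s, e) not covered by any interval in covered."""
    pieces = [interval]
    for cs, ce in covered:
        next_pieces = []
        for s, e in pieces:
            if ce <= s or cs >= e:        # no overlap: keep the piece whole
                next_pieces.append((s, e))
                continue
            if s < cs:                    # part before the covered interval
                next_pieces.append((s, cs))
            if ce < e:                    # part after the covered interval
                next_pieces.append((ce, e))
        pieces = next_pieces
    return pieces


def union_play(*segments):
    """Playback order for E1 ∪ E2 ∪ ...: each segment follows the previous ones,
    and footage that has already been played is skipped."""
    played = []
    for seg in segments:
        played.extend(subtract(seg, played))
    return played
```

For C1, C2, and C3 this yields (10, 30), (30, 40), (40, 65): 55 units of footage, none played twice.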
12
Composition Operators (1)
  • E1 ∘ E2 defines the presentation where E2
    follows E1.
  • E1 ∪ E2 defines the presentation where E2
    follows E1 and common footage is not repeated.
  • E1 ∩ E2 defines the presentation where only
    common footage of E1 and E2 is played.
  • E1 - E2 defines the presentation where only
    footage of E1 that is not in E2 is played.
  • E1 ‖ E2: E1 and E2 are played concurrently and
    terminate simultaneously.
  • (test) ? E1 E2 ... En: Ei is played if test
    evaluates to i.
  • loop E1 time defines a repetition of E1 for a
    duration of time.
  • stretch E1 factor sets the duration of the
    presentation equal to factor times the duration
    of E1 by changing the playback speed of the video
    segment.
  • limit E1 time sets the duration of the
    presentation equal to the minimum of time and the
    duration of E1, but the playback speed is not
    changed.
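Several of these operators reduce to interval arithmetic when both operands come from the same raw video. The sketch below assumes single [start, end) intervals and models only the duration and playback-speed effects of `stretch` and `limit`; the function names are hypothetical.

```python
def intersect(e1, e2):
    """E1 ∩ E2: only the footage common to both, or None if they do not overlap."""
    s, e = max(e1[0], e2[0]), min(e1[1], e2[1])
    return (s, e) if s < e else None


def difference(e1, e2):
    """E1 - E2: footage of E1 not in E2 (may split into two pieces)."""
    s1, t1 = e1
    s2, t2 = e2
    pieces = []
    if s2 > s1:                      # E1 footage before E2 starts
        pieces.append((s1, min(t1, s2)))
    if t2 < t1:                      # E1 footage after E2 ends
        pieces.append((max(s1, t2), t1))
    return [(s, t) for s, t in pieces if s < t]


def stretch(duration, factor):
    """stretch: new duration is factor * duration; playback speed becomes 1 / factor."""
    return duration * factor, 1.0 / factor


def limit(duration, time):
    """limit: duration is min(time, duration); playback speed is unchanged."""
    return min(time, duration), 1.0
```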

13
Composition Operators (2)
  • transition E1 E2 type time defines a transition
    effect of the given type between E1 and E2; time
    defines the duration of the transition effect.
  • The transition type is one of a set of
    transition effects, such as dissolve, fade, and
    wipe.
  • contains E1 query defines the presentation that
    contains the component expressions of E1 that
    match query.
  • A query is a Boolean combination of attributes.
  • Example: text smith and text question
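A query such as `text smith and text question` can be modeled as a small Boolean expression tree evaluated against a component's attributes. The tuple encoding, the exact-match comparison of values, and the function names below are assumptions for illustration, not the prototype's query syntax.

```python
def matches(query, attrs):
    """Evaluate a Boolean query against a set of (field, value) attribute pairs."""
    kind = query[0]
    if kind == "attr":                       # ("attr", field, value)
        _, fld, value = query
        return (fld, value) in attrs
    if kind == "and":
        return all(matches(q, attrs) for q in query[1:])
    if kind == "or":
        return any(matches(q, attrs) for q in query[1:])
    if kind == "not":
        return not matches(query[1], attrs)
    raise ValueError(f"unknown query node: {kind!r}")


def contains(components, query):
    """contains E1 query: names of E1's components whose attributes match the query."""
    return [name for name, attrs in components if matches(query, attrs)]


# "text smith and text question" in this hypothetical tuple form
query = ("and", ("attr", "text", "smith"), ("attr", "text", "question"))
```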

14
Descriptions
  • description E1 content specifies that E1 is
    described by content.
  • Content is a Boolean combination of attributes,
    each consisting of a field name and a value.
  • Some field names have predefined semantics (e.g.,
    title), while other fields are user-definable.
  • Values can assume a variety of types, including
    strings and video node names.
  • Field names and values do not have to be unique
    within a description.
  • hide-content E1 defines a presentation that
    hides the content of E1 (i.e., E1 does not
    contain any description).
  • This operation provides a method for creating
    abstraction barriers for content-based access.

Example: title = 'CNN Headline News'
15
Output Characteristics
  • Video expressions include output characteristics
    that specify the screen layout and audio output
    for playing back child streams.
  • Since expressions can be nested, the spatial
    layout of any particular video expression is
    defined relative to the parent rectangle.
  • window E1 (X1, Y1) - (X2, Y2) priority
  • specifies that E1 will be displayed with
    priority in the window defined by the top-left
    corner (X1, Y1) and the bottom-right corner
    (X2, Y2) such that Xi ∈ [0, 1] and Yi ∈ [0, 1].
  • Window priorities are used to resolve overlap
    conflicts of screen display.
  • audio E1 channel force priority
  • specifies that the audio of E1 will be output to
    channel with priority; if force is true, then the
    audio operation overrides any channel
    specifications of the component video expressions.
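Because windows are specified relative to the parent rectangle, absolute screen positions come from composing relative coordinates down the nesting chain, and overlaps go to the window with the larger priority. A hypothetical sketch, assuming (x1, y1) is the top-left and (x2, y2) the bottom-right corner; the function names and the 640 x 480 screen are illustrative only.

```python
def to_absolute(parent, rel):
    """Map a window given in parent-relative coordinates (each in [0, 1]) to
    absolute coordinates. Both arguments are ((x1, y1), (x2, y2))."""
    (px1, py1), (px2, py2) = parent
    (rx1, ry1), (rx2, ry2) = rel
    for v in (rx1, ry1, rx2, ry2):
        if not 0.0 <= v <= 1.0:
            raise ValueError("relative coordinates must lie in [0, 1]")
    w, h = px2 - px1, py2 - py1
    return ((px1 + rx1 * w, py1 + ry1 * h), (px1 + rx2 * w, py1 + ry2 * h))


def visible_at(windows, x, y):
    """Name of the window shown at (x, y): among the windows containing the
    point, the one with the largest priority (None if no window contains it)."""
    hits = [(prio, name) for name, ((x1, y1), (x2, y2)), prio in windows
            if x1 <= x <= x2 and y1 <= y <= y2]
    return max(hits)[1] if hits else None


screen = ((0, 0), (640, 480))
bottom_right = to_absolute(screen, ((0.5, 0.5), (1, 1)))
# Nesting: the top-left quarter of the bottom-right quadrant.
nested = to_absolute(bottom_right, ((0, 0), (0.5, 0.5)))
```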

16
Output Characteristics An example
  • C1 = create MavericksvsBulls.rv 300 500
  • P1 = window C1 (0, 0) - (0.5, 0.5) 10
  • P2 = window C1 (0, 0.5) - (0.5, 1) 20
  • P3 = window C1 (0.5, 0.5) - (1, 1) 30
  • P4 = window C1 (0.5, 0) - (1, 0.5) 40
  • P5 = (P1 ‖ P2 ‖ P4)
  • P6 = (P1 ‖ P2 ‖ P3 ‖ P4)
  • (P5 ‖ (window (P5 ‖ (window P6 (0.5, 0.5) - (1, 1)
    60)) (0.5, 0.5) - (1, 1) 50))

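[Figure: screen layouts produced by the example, with windows P1 to P4 placed in the quadrants; coordinates run from (0, 0) at the top-left to (1, 1) at the bottom-right, and a larger priority value means higher priority]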
17
Scope of a video node description
  • The scope of a given algebraic video node
    description is the subgraph that originates from
    the node.
  • The components of a video expression inherit
    descriptions by context.
  • All the content attributes associated with a
    parent video node are also associated with all of
    its descendant nodes.
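Inheritance by context amounts to accumulating attributes along the path from the root to a node. This sketch assumes a tree and stores attributes as (field, value) pairs; the `Node` class and `effective_attrs` helper are hypothetical. In a shared graph, a node reached by several paths would need the attributes of each path considered, whereas this version returns the first path found.

```python
class Node:
    def __init__(self, name, attrs=(), children=()):
        self.name = name
        self.attrs = set(attrs)          # descriptions attached directly to this node
        self.children = list(children)


def effective_attrs(root, target_name):
    """Attributes of the target node, including those inherited from every
    ancestor on the path from root; None if the target is not found."""
    def walk(node, inherited):
        here = inherited | node.attrs    # parent attributes flow down to children
        if node.name == target_name:
            return here
        for child in node.children:
            found = walk(child, here)
            if found is not None:
                return found
        return None
    return walk(root, frozenset())


raw = Node("raw", {("text", "question")})
smith = Node("smith", {("text", "smith")}, [raw])
root = Node("news", {("title", "CNN Headline News")}, [smith])
```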

18
Content-Based Access
  • search query: search a collection of video nodes
    for video expressions that match query.
  • Strategy: matching a query to the attributes of
    an expression must take into account all of
    the attributes of that expression, including the
    attributes of its encompassing expressions.
  • Example: search text smith AND text question

[Figure: search over a tree of video nodes rooted at Smith on economic reform, with nodes labeled Smith, Anchor, Question from audience, Question, and Raw video. One node is marked as the result of the query. Another node also satisfies the query but is not returned because it is a descendant of a node already in the result set.]
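The search strategy can be sketched as a depth-first walk that carries inherited attributes down the tree and stops descending once a node matches, since its descendants inherit the same attributes. This assumes a conjunctive query of (field, value) pairs; the node names and attribute assignments below are a hypothetical reading of the figure.

```python
class Node:
    def __init__(self, name, attrs=(), children=()):
        self.name, self.attrs, self.children = name, set(attrs), list(children)


def search(root, required):
    """Names of the topmost nodes whose own plus inherited attributes include
    every (field, value) pair in `required`. Descendants of a node already in
    the result are not returned."""
    required = set(required)
    result = []

    def walk(node, inherited):
        attrs = inherited | node.attrs
        if required <= attrs:
            result.append(node.name)   # descendants inherit these attrs: stop here
            return
        for child in node.children:
            walk(child, attrs)

    walk(root, set())
    return result


raw = Node("raw video", {("text", "question")})
question = Node("question from audience", {("text", "question")}, [raw])
anchor = Node("anchor", {("text", "anchor")})
root = Node("Smith on economic reform", {("text", "smith")}, [anchor, question])
```

Here `raw video` also satisfies the query through inheritance, but only its ancestor `question from audience` is returned.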
19
Browsing and Navigation
  • Playback presentation
  • Plays back the video expression, enabling the
    user to view the presentation defined by the
    expression.
  • Display video-expression
  • Displays the video expression, allowing the user
    to inspect it.
  • Get-parent video-expression
  • Returns the set of nodes that directly point to
    video-expression.
  • Get-children video-expression
  • Returns the set of nodes that video-expression
    directly points to.
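Since the same expression can be reused by several parents, the nodes form a graph rather than a tree, so get-parent can return several nodes. A hypothetical sketch that stores both edge directions so each lookup is a single dictionary access:

```python
from collections import defaultdict


class ExpressionGraph:
    """Hypothetical store of algebraic video nodes; link(u, v) means u directly points to v."""

    def __init__(self):
        self.children = defaultdict(set)
        self.parents = defaultdict(set)

    def link(self, parent, child):
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def get_parent(self, expr):
        """Nodes that directly point to expr (several if expr is shared)."""
        return set(self.parents.get(expr, ()))

    def get_children(self, expr):
        """Nodes that expr directly points to."""
        return set(self.children.get(expr, ()))


g = ExpressionGraph()
g.link("P5", "P1")
g.link("P5", "P2")
g.link("P6", "P1")   # P1 is shared by P5 and P6
```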

20
Algebraic Video System Prototype
  • The Algebraic Video System is a prototype
    implementation of the algebraic video data model
    and its associated operations.
  • The implementation is built on top of three
    existing subsystems:
  • The VuSystem is used for managing raw video data
    and for its support of Tcl (Tool Command
    Language) programming. It provides an environment
    for recording, processing, and playing video.
  • The Semantic File System is used as a storage
    subsystem with content-based access to data for
    indexing and retrieving files that represent
    algebraic video nodes.
  • The WWW server provides a graphical interface to
    the system that includes facilities for querying,
    navigating, video editing and composing, and
    invoking the video player.

21
Multimedia Objects in Relational Databases
  • The most straightforward and fundamental support
    of multimedia data types in a RDBMS is the
    ability to declare variable-length fields in the
    tables.
  • Some of the names for variable-length bit or
    character string types used in commercial
    products include:
  • VARCHAR
  • BLOB
  • TEXT
  • IMAGE
  • CHARACTER VARYING (SQL92)
  • VARGRAPHIC
  • LONG RAW
  • BYTE VARYING
  • BIT VARYING (SQL92)
  • In some systems the maximum variable-length field
    is as small as 256 bytes; other systems allow
    field values as large as 2 GBytes.
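As a concrete illustration of a variable-length binary column, the sketch below uses Python's built-in `sqlite3` module. SQLite is a stand-in here, not one of the products listed, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clips (id INTEGER PRIMARY KEY, title VARCHAR(100), data BLOB)")

# Stand-in for raw media bytes (1 KiB); Python bytes are stored as a BLOB.
payload = bytes(range(256)) * 4
conn.execute("INSERT INTO clips (title, data) VALUES (?, ?)", ("intro", payload))

# length() on a BLOB value returns its size in bytes.
title, data, size = conn.execute(
    "SELECT title, data, length(data) FROM clips WHERE id = 1").fetchone()
```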

22
BLOBs in InterBase
  • InterBase stores BLOBs in collections of
    segments. A segment in InterBase can be thought
    of as a fixed-length page or I/O block.
  • InterBase provides special API calls to retrieve
    and modify the segments.
  • open-BLOB opens the BLOB for
    reading
  • get-segment reads the next segment
  • create-BLOB opens the BLOB for writes
    or updates
  • put-segment saves the changes to the
    BLOB
  • Users can specify the length of each segment.
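The segment-at-a-time access pattern can be simulated in Python. This is a toy model only: the class and method names mirror the operations listed above but are hypothetical and are not the InterBase API.

```python
class SegmentedBlob:
    """Toy model of a BLOB stored as an ordered list of segments."""

    def __init__(self):
        self.segments = []


class BlobWriter:
    """Analogue of create-BLOB / put-segment: data is written one segment at a time."""

    def __init__(self, blob):
        self.blob = blob
        blob.segments.clear()            # simplification: start a fresh BLOB value

    def put_segment(self, chunk):
        self.blob.segments.append(bytes(chunk))


class BlobReader:
    """Analogue of open-BLOB / get-segment: segments are read back in order."""

    def __init__(self, blob):
        self.blob, self.pos = blob, 0

    def get_segment(self):
        if self.pos >= len(self.blob.segments):
            return None                  # end of BLOB
        seg = self.blob.segments[self.pos]
        self.pos += 1
        return seg


def write_in_segments(blob, data, segment_length):
    """Split data into segments of the user-chosen length and store them."""
    writer = BlobWriter(blob)
    for i in range(0, len(data), segment_length):
        writer.put_segment(data[i:i + segment_length])


def read_all(blob):
    """Call get_segment until it signals the end, then join the pieces."""
    return b"".join(iter(BlobReader(blob).get_segment, None))
```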

23
IMAGE and TEXT in Sybase's SQL Server
  • TEXT and IMAGE data types are supported in
    Sybase's Transact-SQL, which is an enhanced
    version of the SQL standard.
  • TEXT and IMAGE data types can be as large as 2
    GBytes.
  • Internally, TEXT and IMAGE column values contain
    pointers to the first page of a linked list of
    pages.
  • Some of the supported functions:
  • PATINDEX(pattern, column) returns the
    starting position of the first occurrence of the
    pattern in the column.
  • TEXTPTR(column) returns a pointer to the
    variable-length field.
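A toy Python model of the linked-page layout and a PATINDEX-like lookup. The page size and the class and function names are hypothetical. Unlike the real `PATINDEX`, which takes a pattern with wildcards such as `'%abc%'`, this version searches for a literal substring, but it keeps the 1-based position and the 0-when-absent result.

```python
class Page:
    def __init__(self, data, next_page=None):
        self.data, self.next = data, next_page


def store_text(text, page_size):
    """Split text into a linked list of pages; the returned first page plays
    the role of the pointer stored in the column value."""
    first = None
    for i in reversed(range(0, len(text), page_size)):
        first = Page(text[i:i + page_size], first)
    return first


def read_text(first):
    """Follow the page links and reassemble the full value."""
    parts, page = [], first
    while page is not None:
        parts.append(page.data)
        page = page.next
    return "".join(parts)


def patindex_literal(substring, first):
    """1-based start of the first occurrence of a literal substring, or 0 if absent."""
    return read_text(first).find(substring) + 1
```

A match that straddles two pages (such as "europe" below) is still found, because the pages are reassembled before searching.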

24
OODBs and Multimedia Applications
  • Object-oriented databases are more suitable for
    multimedia application development.
  • Better complex object support: by their nature,
    many multimedia database applications, such as
    compound documents, need complex object support.
  • Extensibility and the ability to add new types
    (classes): users can add new types and extend
    the existing class hierarchies to address the
    specific needs of the multimedia application.
  • Better concurrency control and transaction model
    support: transaction concepts such as long
    transactions and nested transactions are
    important for multimedia applications.

25
Multimedia Data Types in UniSQL/X
  • UniSQL/X supports a class hierarchy rooted at
    the generalized large object (GLO) class.
  • The GLO class serves as the root of the
    multimedia data type classes and provides a
    number of built-in attributes and methods.
  • For the content of GLO objects, the user can
    create either a Large Object (LO) or a File Based
    Object (FBO).
  • LOs can only be accessed through UniSQL/X.
  • FBOs are stored in the host file system. The
    database stores a reference or a path for each
    FBO.
  • In addition to the base class GLO, UniSQL/X
    supports subclasses of GLO for specific
    multimedia data types:
  • Audio class
  • Image class
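The LO/FBO split can be sketched as a class hierarchy in Python. All names are hypothetical; the point is that a GLO subclass such as `Audio` reads its bytes the same way whether they are held by the database (LO) or live in a host file referenced by path (FBO).

```python
class LargeObject:
    """LO: the bytes are held by the database itself (here, simply in memory)."""

    def __init__(self, data):
        self._data = bytes(data)

    def read(self):
        return self._data


class FileBasedObject:
    """FBO: the database keeps only a path; the bytes stay in the host file system."""

    def __init__(self, path):
        self.path = path

    def read(self):
        with open(self.path, "rb") as f:
            return f.read()


class GLO:
    """Root of the multimedia classes; content is either an LO or an FBO."""

    def __init__(self, name, content):
        self.name, self.content = name, content

    def read(self):
        return self.content.read()   # same call regardless of storage kind


class Audio(GLO):
    pass


class Image(GLO):
    pass
```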

26
Programming Multimedia Applications
  • An application is considered to be a multimedia
    object.
  • An application object uses or consists of many
    Basic Multimedia Objects (BMOs) and Compound
    Multimedia Objects (CMOs).
  • The specification of an object includes:
  • binding information to a file
  • methods
  • event-driven processing (e.g., displaying the
    last image if the video ends before the audio).
  • The use of methods and events allows the
    application to create a script which expresses the
    interactions of different objects precisely and
    relatively simply.
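The video-ends-before-audio case can be sketched with a tiny event dispatcher. The `Presentation` class, the event names, and the one-frame-per-tick timeline are hypothetical simplifications. The handler shows how a script could override the default behavior, which here is a blank video window.

```python
class Presentation:
    """Hypothetical event-driven script: handlers run when an event is fired."""

    def __init__(self):
        self.handlers = {}
        self.screen = None     # what is currently displayed
        self.log = []          # events fired, in order

    def on(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def fire(self, event, **info):
        self.log.append(event)
        for handler in self.handlers.get(event, []):
            handler(self, **info)


def play(p, video_frames, audio_length):
    """Step through the audio timeline, showing one video frame per tick;
    fire 'video_end' if the video runs out before the audio does."""
    for t in range(audio_length):
        if t < len(video_frames):
            p.screen = video_frames[t]
        elif t == len(video_frames):
            p.screen = None                      # default: video window goes blank
            last = video_frames[-1] if video_frames else None
            p.fire("video_end", last_frame=last)
    p.fire("audio_end")


def freeze_last_frame(p, last_frame):
    """Script rule: keep showing the last image once the video has ended."""
    p.screen = last_frame
```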

27
A Multimedia-Program Example
28
Multimedia Information Retrieval (and Indexing)
  • Multimedia information retrieval
  • deals with the storage, retrieval, transport and
    presentation of different types of multimedia
    data (e.g., images, video clips, audio clips,
    texts)
  • there is a real need for managing multimedia
    data, including its retrieval
  • Multimedia information retrieval in general
  • retrieval process
  • queries
  • indexing the medias
  • matching media and query representations

29
MMDBMS and Retrieval: What is it? A first
attempt at a clearer meaning
  • Example
  • an insurance company's accident claim report as
    a multimedia object; it includes
  • images (or video) of the accident
  • insurance forms with structured data
  • audio recordings of the parties involved in the
    accident
  • text report of the insurance company's
    representative
  • Multimedia databases store structured data and
    unstructured data
  • Multimedia retrieval systems must retrieve
    structured and unstructured data

30
MMDBMS and Retrieval (cont.)
  • Retrieval of structured data from databases
  • typically handled by a Database Management System
    (DBMS)
  • DBMS provides a query language (e.g., Structured
    Query Language, SQL for the relational data
    model)
  • deterministic matching of query and data
  • Retrieval of unstructured data from databases
  • typically handled by an Information Retrieval (IR)
    system
  • similarity matching of uncertain query and
    document representations
  • result: a list of documents ranked by relevance
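The contrast between the two retrieval styles can be shown with a small sketch: exact matching either accepts or rejects a record, while similarity matching scores every record and ranks the results. The record fields, feature vectors, and use of cosine similarity are assumptions for illustration.

```python
import math


def exact_match(records, **conditions):
    """DBMS style: a record qualifies only if every condition holds exactly."""
    return [r["id"] for r in records
            if all(r.get(k) == v for k, v in conditions.items())]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ranked_match(records, query_vec):
    """IR style: every record gets a similarity score; results are ranked by it."""
    scored = [(cosine(r["features"], query_vec), r["id"]) for r in records]
    return [rid for score, rid in sorted(scored, key=lambda s: -s[0])]


records = [
    {"id": "claim-1", "type": "accident", "features": [1.0, 0.0, 0.0]},
    {"id": "claim-2", "type": "theft", "features": [0.7, 0.7, 0.0]},
    {"id": "claim-3", "type": "accident", "features": [0.0, 1.0, 0.0]},
]
```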

31
MMDBMS and Retrieval (cont.)
  • Multimedia database management systems should
    combine the Database Management System (DBMS) and
    information retrieval (IR) technology
  • data modeling capabilities of DBMSs with the
    advanced and similarity based query capabilities
    of IR systems
  • Challenge: finding a data model that ensures
  • effective query formulation and document
    representation
  • efficient storage
  • efficient matching
  • effective delivery

32
MMDBMS and Retrieval (cont.)
  • Query formulation
  • must accommodate information needs of users of
    multimedia systems
  • Document representations and their storage
  • an appropriate modeling of the structure and
    content of the wide range of data of many
    different formats (indexing); e.g., XML?, MPEG-7
  • cf. dealing with thousands of images, documents,
    audio and video segments, and free text
  • at the same time, modeling of physical properties
    for
  • compression/decompression, synchronization,
    delivery (e.g., MPEG-21)

33
MMDBMS and Retrieval (cont.)
  • Matching of query and document representations
  • taking into account the variety of attributes of
    query and document representations and their
    relationships
  • combination of exact matching of structured data
    with uncertain matching of unstructured data
  • Delivery of data
  • browsing, retrieval
  • temporal constraints of video and audio
    presentation
  • merging of data from different sources (e.g., in
    medical networks)

34
MMDBMS Queries
  • 1) As in many retrieval systems, the user has the
    opportunity to browse and navigate through
    hyperlinks, which requires
  • topic maps
  • summary descriptions of the multimedia objects
  • 2) Queries specifying the conditions of the
    objects of interest
  • idea of multimedia query language
  • should provide predicates for expressing
    conditions on the attributes, structure and
    content (semantics) of multimedia objects

35
MMDBMS Queries (cont.)
  • attribute predicates
  • concern the attributes of multimedia objects with
    an exact value (cf. traditional DB attributes)
  • e.g., date of a picture, name of a show
  • structural predicates
  • temporal predicates to specify temporal
    synchronization
  • for continuous media such as audio and video
  • for expressing temporal relationships between the
    frame representations of a single audio or video
  • e.g., Find all the objects in which a jingle is
    playing for the duration of an image display
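The jingle example reduces to interval containment. A minimal sketch, assuming each object lists its jingle and image-display intervals as (start, end) pairs in seconds; the field names are hypothetical, and `covers` accepts equal endpoints, unlike Allen's strict "during" relation.

```python
def covers(outer, inner):
    """True if interval `outer` spans all of interval `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def jingle_during_image(objects):
    """IDs of objects in which some jingle plays for the whole duration of some image display."""
    hits = []
    for obj in objects:
        if any(covers(j, img) for j in obj["jingles"] for img in obj["images"]):
            hits.append(obj["id"])
    return hits


objects = [
    {"id": "ad1", "jingles": [(0, 10)], "images": [(2, 8)]},
    {"id": "ad2", "jingles": [(0, 5)], "images": [(3, 9)]},   # jingle stops too early
]
```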

36
MMDBMS Queries (cont.)
  • spatial predicates to specify spatial layout
    properties for the presentation of multimedia
    objects
  • examples of predicates: contains, is contained in,
    intersects, is adjacent to
  • e.g., Find all the objects containing an image
    overlapping the associated text
  • temporal and spatial predicates can be combined
  • e.g., Find all the objects in which the logo of
    the car company is displayed, and when it
    disappears, a graphic (showing the increase in
    the company sales) is shown in the same position
    where the logo was
  • temporal and spatial predicates can
  • refer to whole objects
  • refer to subcomponents of objects with data
    model that supports complex object representation
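The listed spatial predicates can be sketched on axis-aligned bounding boxes. This is a simplification (real objects may need exact shapes rather than boxes), and the `Box` type and function names are hypothetical; y grows downward, so (x1, y1) is the top-left corner.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x1: float  # left
    y1: float  # top
    x2: float  # right
    y2: float  # bottom


def contains(a: Box, b: Box) -> bool:
    return a.x1 <= b.x1 and a.y1 <= b.y1 and b.x2 <= a.x2 and b.y2 <= a.y2


def is_contained_in(a: Box, b: Box) -> bool:
    return contains(b, a)


def intersects(a: Box, b: Box) -> bool:
    """Interiors overlap; boxes that merely touch do not count."""
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2


def is_adjacent(a: Box, b: Box) -> bool:
    """Boxes share part of an edge but their interiors do not overlap."""
    if intersects(a, b):
        return False
    touch_x = (a.x2 == b.x1 or b.x2 == a.x1) and a.y1 < b.y2 and b.y1 < a.y2
    touch_y = (a.y2 == b.y1 or b.y2 == a.y1) and a.x1 < b.x2 and b.x1 < a.x2
    return touch_x or touch_y


image = Box(0, 0, 10, 10)
text = Box(5, 5, 20, 20)      # overlaps the image
caption = Box(10, 0, 20, 10)  # sits right next to the image
```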

37
MMDBMS Queries (cont.)
  • semantic predicates
  • concern the semantic and unstructured content of
    the data involved
  • represented by the features that have been
    extracted and stored for each multimedia object
  • e.g., Find all the objects containing the word
    OFFICE, or Find all red houses
  • uncertainty, proximity and weights can be
    expressed in query
  • multimedia query language
  • structured language
  • users do not formulate queries in this language,
    but enter query conditions by means of interfaces
  • natural language queries?
  • interface translates query to correct query syntax

38
MMDBMS Queries
  • 3) Query by example
  • e.g., video, audio
  • the query is composed by picking an example and
    choosing the features the object must comply with
  • e.g., in a graphical user interface (GUI), the user
    chooses an image of a house and domain features
    for the query Retrieve all houses of similar
    shape and different color
  • e.g., music: a recorded melody, or a note
    sequence entered via the Musical Instrument
    Digital Interface (MIDI)
  • 4) Question-answering?
  • e.g., questioning video images: How many
    helicopters were involved in the attack on Kabul
    of December 20, 2001?

39
MMDBMS Example: Oracle's interMedia
  • Enables Oracle9i to manage rich content,
    including images, audio, and video information in
    an integrated fashion with other traditional
    business data.
  • With interMedia, one can parse, index, and store
    rich content, develop content-rich Web
    applications, deploy rich content on the Web, and
    tune Oracle9i content repositories.
  • interMedia enables data management services to
    support the rich data types used in electronic
    commerce catalogs, corporate repositories, Web
    publishing, corporate communications and
    training, media asset management, and other
    applications for internet, intranet, extranet,
    and traditional applications in an integrated
    fashion
  • http://technet.oracle.com

40
MMDBMS Indexing
  • Remember: indexing and retrieval systems.
  • Indexing: assigning or extracting features that
    will be used for unstructured and structured
    queries (unfortunately, this often refers only to
    low-level features)
  • Often also segmentation: detection of retrieval
    units
  • Two main approaches:
  • manual
  • segmentation
  • indexing: naming of objects and their
    relationships with key terms (natural language or
    controlled language)
  • automatic analysis
  • identify the mathematical characteristics of the
    contents
  • different techniques depending on the type of
    multimedia source (image, text, video, or audio)
  • possible manual correction

41
Indexing multimedia and features
  • a multimedia object is typically represented as a
    set of features (e.g., as a vector of features)
  • features can be weighted (expressing the
    uncertainty or significance of their values)
  • features can be stored and searched in an index
    tree
  • Features have to be associated with the semantic
    content
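A weighted feature vector can be compared with a weighted Euclidean distance. The sketch below is one possible choice, not a method from the slides. It uses a linear scan where a real system would use an index tree, and the object IDs and vectors are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; a larger weight makes that feature count more."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def nearest(query, index, weights, k=1):
    """k nearest objects by linear scan over {object_id: feature_vector}.
    An index tree would avoid comparing the query with every entry."""
    return sorted(index, key=lambda oid: weighted_distance(query, index[oid], weights))[:k]


index = {"img1": [0.9, 0.1], "img2": [0.1, 0.9], "img3": [0.5, 0.5]}
```

Setting a weight to zero removes that feature from the comparison entirely, which is one way to express that it is irrelevant to a query.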

42
Indexing images
  • Automatic indexing of images
  • segmentation in homogeneous segments
  • homogeneity predicate defines the conditions for
    automatically grouping the cells
  • e.g., in a color image, cells that are adjacent
    to one another and whose pixel values are close
    are grouped into a segment
  • indexing: recognition of objects and simple
    patterns
  • recognition of low-level features: color
    histograms, textures, shapes (e.g., person,
    house), position
  • appearance features often not important in
    retrieval
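The homogeneity-predicate idea can be sketched as region growing on a grid of gray levels. Assumptions: one value per cell instead of a color, 4-adjacency, and a predicate that groups neighbors whose values differ by at most a threshold. Because the test compares neighbors rather than a region mean, a slow gradient can chain into one segment.

```python
from collections import deque


def segment(image, threshold):
    """Label cells so that 4-adjacent cells whose values differ by at most
    `threshold` end up in the same segment. Returns a grid of labels."""
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue                      # already part of a segment
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:                      # breadth-first region growing
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and labels[ny][nx] == -1
                            and abs(image[ny][nx] - image[y][x]) <= threshold):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


image = [
    [10, 12, 200],
    [11, 13, 205],
    [90, 91, 210],
]
```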

43
Indexing audio
  • Automatic indexing of audio
  • segmentation into sequences (basic units for
    retrieval), often manual
  • indexing
  • speech recognition and indexing of the resulting
    transcripts (cf. indexing for written text
    retrieval)
  • acoustic analysis (e.g., sounds, music, songs:
    melody transcription, note encoding, interval and
    rhythm detection, and chord information)
    translated into strings
  • e.g., key melody extraction (Tseng, 1999)
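One common way to turn a melody into a searchable string is to encode the intervals between successive notes, which makes matching independent of key. This sketch uses MIDI note numbers and a brute-force subsequence search. It is not the key-melody extraction method of Tseng (1999), and the function names are hypothetical.

```python
def intervals(notes):
    """Successive pitch differences in semitones (transposition invariant)."""
    return [b - a for a, b in zip(notes, notes[1:])]


def contour_string(notes):
    """Coarser encoding: U(p), D(own), or R(epeat) for each step."""
    return "".join("U" if d > 0 else "D" if d < 0 else "R" for d in intervals(notes))


def find_melody(query_notes, songs):
    """Titles of songs whose interval sequence contains the query's interval sequence."""
    q = intervals(query_notes)
    n = len(q)
    hits = []
    for title, notes in songs.items():
        iv = intervals(notes)
        if any(iv[i:i + n] == q for i in range(len(iv) - n + 1)):
            hits.append(title)
    return hits


songs = {
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "Twinkle Twinkle": [60, 60, 67, 67, 69, 69, 67],
}
```

A query hummed in a different key, e.g. `[67, 67, 68, 70]`, still matches "Ode to Joy" because only the intervals (0, +1, +2) are compared.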

44
Scene Segmentation based on Audio Information
  • Short Time Energy (STE) is a reliable indicator
    for silence detection.
  • Zero-Crossing Rate (ZCR) is a useful feature to
    characterize different non-silence audio signals
    (especially to discern unvoiced speech)
  • Pitch (P value) is the fundamental frequency of
    an audio waveform
  • Spectrum Flux (SF) is defined as the average
    variation value of the spectrum between two
    adjacent frames in a short-time analysis window;
    it is used to discriminate speech from
    environmental sound
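Short Time Energy and Zero-Crossing Rate are simple per-frame computations. A sketch in plain Python, assuming a mono signal given as a list of floats and a hypothetical energy threshold for silence; pitch and spectrum flux need frequency analysis and are omitted.

```python
import math


def frames(signal, size, hop):
    """Split the signal into frames of `size` samples, starting every `hop` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of the frame; near zero for silence."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def silent_frames(signal, size, hop, energy_threshold):
    """Indices of frames whose short-time energy is below the threshold."""
    return [i for i, f in enumerate(frames(signal, size, hop))
            if short_time_energy(f) < energy_threshold]


# 0.1 s of silence followed by 0.1 s of a 440 Hz tone, sampled at 8 kHz.
sr = 8000
signal = [0.0] * 800 + [math.sin(2 * math.pi * 440 * t / sr) for t in range(800)]
```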

45
Indexing video
  • Automatic indexing of video
  • segment: the basic unit for retrieval
  • objects and activities identified in each video
    segment can be used to index the segment
  • segmentation
  • detection of video shot breaks, camera motions
  • boundaries in audio material (e.g., a new music
    tune, changes of speaker)
  • textual topic segmentation of transcripts of
    audio and of closed captions (see below)
  • heuristic rules based on knowledge of
  • type-specific schematic structure of video (e.g.,
    documentary, sports)
  • certain cues: e.g., the appearance of the anchor
    person in news signals a new topic
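Shot-break detection is often done by comparing color or gray-level histograms of consecutive frames. This is a widely used technique, not necessarily the one behind the slide. The sketch assumes grayscale frames as flat lists of 0-255 values, and the bin count and threshold are hypothetical settings.

```python
def histogram(frame, bins, max_value=256):
    """Normalized gray-level histogram of a frame given as a flat list of pixels."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def shot_breaks(frames, bins=16, threshold=0.5):
    """Indices i where frame i starts a new shot: the L1 distance between the
    histograms of frames i-1 and i (which ranges from 0 to 2) exceeds the threshold."""
    hists = [histogram(f, bins) for f in frames]
    return [i for i in range(1, len(hists))
            if sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i])) > threshold]


dark = [20] * 100
bright = [230] * 100
```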

46
An example of indexing
  • Learning of textual descriptions of images from
    surrounding text (Mori et al., 2000)
  • training
  • images are divided into image parts of equal size
  • feature extraction for each image part (by
    quantization)
  • 4 x 4 x 4 RGB color histogram
  • 8 directions x 4 resolutions intensity histogram
  • words that accompany the image are inherited by
    each image part
  • words are selected from the text of the document
    that contains the image by selecting nouns and
    adjectives that occur with a frequency above a
    threshold
  • cluster similar image parts based on their
    extracted features
  • single-pass partitioning algorithm with minimum
    similarity threshold value
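The training-side feature extraction and clustering can be sketched as follows. The 4 x 4 x 4 histogram matches the slide; using cosine similarity and running-mean centroids for the single-pass algorithm are assumptions, and the intensity histogram is omitted.

```python
import math


def rgb_histogram_4x4x4(pixels):
    """64-bin color histogram: each 0-255 channel is quantized into 4 levels."""
    hist = [0.0] * 64
    for r, g, b in pixels:
        hist[(r // 64) * 16 + (g // 64) * 4 + (b // 64)] += 1
    return [h / len(pixels) for h in hist]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def single_pass_cluster(vectors, min_similarity):
    """Assign each vector to its most similar centroid if the similarity reaches
    min_similarity; otherwise start a new cluster. Centroids are running means."""
    centroids, members, labels = [], [], []
    for v in vectors:
        best, best_sim = None, -1.0
        for j, c in enumerate(centroids):
            s = cosine(v, c)
            if s > best_sim:
                best, best_sim = j, s
        if best is None or best_sim < min_similarity:
            centroids.append(list(v))          # new cluster seeded by this part
            members.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = members[best] + 1              # update the running mean
            centroids[best] = [(c * (n - 1) + x) / n for c, x in zip(centroids[best], v)]
            members[best] = n
            labels.append(best)
    return labels, centroids


red = rgb_histogram_4x4x4([(255, 0, 0)] * 10)
green = rgb_histogram_4x4x4([(0, 255, 0)] * 10)
reddish = rgb_histogram_4x4x4([(250, 10, 10)] * 10)
```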

47
An example of indexing
  • for each word $w_i$ and each cluster $c_j$,
    $P(w_i \mid c_j)$ is estimated as
    $P(w_i \mid c_j) = \frac{m_{ji}}{M_j}$
  • where $m_{ji}$ is the total frequency of word $w_i$
    in cluster $c_j$
  • and $M_j$ is the total frequency of all words in $c_j$
  • testing
  • unknown image is divided into parts and image
    features are extracted
  • for each part, the nearest cluster is found as
    the cluster whose centroid is most similar to the
    part
  • the average likelihood of all the words of the
    nearest clusters is computed
  • the k words with the largest average likelihood
    are chosen to index the new image (in the example,
    k = 3)
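The estimation and testing steps can be sketched directly from $P(w_i \mid c_j) = m_{ji} / M_j$. Assumptions: clusters carry hypothetical IDs, part features are plain vectors, and "most similar" is taken to mean the smallest Euclidean distance to the centroid.

```python
import math
from collections import Counter


def train(cluster_words):
    """cluster_words: {cluster_id: words inherited by the parts in that cluster}.
    Returns P(w | c) = m_wc / M_c for every word and cluster."""
    model = {}
    for c, words in cluster_words.items():
        counts = Counter(words)
        total = sum(counts.values())                 # M_c
        model[c] = {w: m / total for w, m in counts.items()}
    return model


def nearest_cluster(part, centroids):
    """Cluster whose centroid is closest to the part's feature vector."""
    return min(centroids, key=lambda c: math.dist(part, centroids[c]))


def annotate(parts, centroids, model, k=3):
    """Average P(w | c) over the nearest clusters of all parts; top-k words
    (ties broken alphabetically)."""
    nearest = [nearest_cluster(p, centroids) for p in parts]
    vocab = {w for c in nearest for w in model[c]}
    avg = {w: sum(model[c].get(w, 0.0) for c in nearest) / len(nearest) for w in vocab}
    return sorted(avg, key=lambda w: (-avg[w], w))[:k]


model = train({
    "sky": ["sky", "sky", "blue", "cloud"],
    "grass": ["grass", "green", "green", "field"],
})
centroids = {"sky": [0.0, 1.0], "grass": [1.0, 0.0]}
parts = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]   # features of an unknown image's parts
```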

48
(No Transcript)
49
source Mori et al.
50
source Mori et al.
51
Demo Systems
  • Hermitage Museum Web Site (QBIC)
  • http://hermitagemuseum.org/
  • http://hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
  • Media Portal WebSEEk
  • http://www.ctr.columbia.edu/webseek/
  • Video Search Engine VideoQ
  • http://www.ctr.columbia.edu/videoq
  • Geographical Application
  • http://nayana.ece.ucsb.edu/M7TextureDemo/Demo/client/M7TextureDemo.html
  • http://www-db.stanford.edu/IMAGE/

52
QBIC features
  • Color: QBIC computes the average Munsell
    (Miyahara et al., 1988) coordinates of each
    object and image, plus a k-element color
    histogram (k is typically 64 or 256) that gives
    the percentage of the pixels in each image in
    each of the k colors.
  • Texture: QBIC's texture features are based on
    modified versions of the coarseness, contrast,
    and directionality features proposed in
    (H. Tamura et al., 1978). Coarseness measures the
    scale of the texture (pebbles vs. boulders),
    contrast describes the vividness of the pattern,
    and directionality describes whether or not the
    image has a favored direction or is isotropic
    (grass versus a smooth object).
  • Shape: QBIC has used several different sets of
    shape features. One is based on a combination of
    area, circularity, eccentricity, major axis
    orientation, and a set of algebraic moment
    invariants. A second is the turning angles or
    tangent vectors around the perimeter of an
    object, computed from smooth splines fit to the
    perimeter. The result is a list of 64 values of
    turning angle.
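The turning-angle representation can be sketched for a polygon outline. Simplifications, all assumptions: straight edges instead of fitted splines, equally spaced resampling to 64 points, and angles wrapped into [-pi, pi); the polygon must not repeat consecutive vertices. For a closed outline traversed counterclockwise (in y-up coordinates), the turning angles sum to 2*pi.

```python
import math


def resample(polygon, n):
    """n points equally spaced along the closed polygon's perimeter."""
    edges = list(zip(polygon, polygon[1:] + polygon[:1]))
    lengths = [math.dist(a, b) for a, b in edges]
    perimeter = sum(lengths)
    points, edge, walked = [], 0, 0.0
    for i in range(n):
        target = i * perimeter / n
        while walked + lengths[edge] < target:   # advance to the edge holding target
            walked += lengths[edge]
            edge += 1
        (x0, y0), (x1, y1) = edges[edge]
        t = (target - walked) / lengths[edge]
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


def turning_angles(polygon, n=64):
    """Change in tangent direction between successive resampled points, in radians."""
    pts = resample(polygon, n)
    headings = [math.atan2(b[1] - a[1], b[0] - a[0])
                for a, b in zip(pts, pts[1:] + pts[:1])]
    turns = []
    for h0, h1 in zip(headings, headings[1:] + headings[:1]):
        d = h1 - h0
        turns.append((d + math.pi) % (2 * math.pi) - math.pi)  # wrap into [-pi, pi)
    return turns


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

For the unit square, 60 of the 64 values are zero and the four corners each contribute pi/2.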

53
WebSeek
54
WebSeek (cont.)
55
VideoQ
56
VideoQ (cont.)