Title: MediaView Towards a Semantic Multimedia Database Model
1MediaView -- Towards a Semantic Multimedia
Database Model
- Qing Li
- Dept of Computer Science
- City University of Hong Kong
2Outline
- Motivation & Introduction
- Modeling Constructs
- Logical Implementation
- Real-World Applications
- Conclusion
3State-of-the-art
- Multimedia Systems and Applications
- an explosive growth in recent years
- demand on managing multimedia using databases
- Database techniques for multimedia
- data modeling
- indexing
- query processing
- presentation synchronization
4Semantic Gap
- Semantics-intensive multimedia systems and applications require the semantic meaning of the data
- Non-semantic multimedia data models model raw data and primitive properties (size, format, etc.)
- The mismatch between the two is the semantic gap
5Semantic modeling of multimedia -- Why hard?
- Context-dependency
- Semantics is not a static and intrinsic property
- The semantics of an object often depends on
- the application/user who manipulates the object
- the role that the object plays
- other objects in the same context
- Example (figure): the contexts "Van Gogh's paintings" and "flower"
6Why hard? (cont.)
- Modality-independency
- Media objects of different modalities may suggest similar or related semantic meanings
- Example (figure): a query and its results in several modalities
- text: "Harry Potter has never been the star of a Quidditch team, scoring points while riding a broom far above the ground. He knows no spells, has never helped to hatch a dragon, and has never worn a cloak of invisibility."
- image
- video
7MediaView A Semantic Bridge
- An object-oriented view mechanism that bridges the semantic gap between multimedia systems and databases
- Core concept: media view (MV)
- a customized context for semantic interpretation of media objects (text docs, images, video, etc.)
- media views collectively constitute the conceptual infrastructure of a multimedia system or application
8Architecture
MediaView Mechanism
9Fundamentals of MediaView
- Basic concepts: class vs. MV
- View operators: basic functions of MV
- View algebra: derivations of MV
- Comparison: other dynamic data models
10Basic Concepts
- Definition 1: Let C be the set of base classes. A base class Ci ∈ C has a unique class name, a type description, and a set of objects associated with it. The type of Ci is referred to as type(Ci), which defines a set of properties as the common interface of all the instances of Ci. This set of properties is referred to as properties(Ci), and each property in it can be a value of a simple type, an instance of a certain class, or a method. The set of objects associated with Ci is defined as extent(Ci) = {o | o ∈ Ci}.
11Basic Concepts
- Definition 2: A media view MVi is a virtual class that has a unique view name, a type description, and a set of objects associated with it. The type of MVi is referred to as type(MVi), which defines a set of properties, properties(MVi), as the common interface of all its instances. Similarly, a property can be a value of a simple type, an instance of a media view, or a method. The set of objects associated with MVi is defined as extent(MVi) = {o | o ∈ MVi}.
12Basic Concepts
- So, a media view MVi can be represented as a triple
- MVi = <Mi, Pi, Ri>
- where
- Mi: a set of objects that are included in MVi as its members. Each object o ∈ Mi belongs to a certain source class, and different members of MVi may belong to different source classes.
- Pi: a set of properties (attributes and methods) applied either to MVi itself (Piv) or to all the members (Pim).
- Ri: a set of relationships; each r ∈ Ri has the form <oj, ok, t>, which denotes a relationship of type t between members oj and ok in MVi. Ri itself may form a graph.
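The triple can be pictured as a small data structure. The following is only a sketch under stated assumptions, not the MediaView implementation: the class names, field names and sample objects are hypothetical, and member-level properties (Pim) are stored here as one property dictionary per member.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MediaObject:
    oid: str           # hypothetical object identifier
    source_class: str  # name of the base class the object belongs to


@dataclass
class MediaView:
    name: str
    members: set = field(default_factory=set)         # Mi
    view_props: dict = field(default_factory=dict)    # Piv: properties of the view itself
    member_props: dict = field(default_factory=dict)  # Pim: oid -> properties inside this view
    relationships: set = field(default_factory=set)   # Ri: (oj, ok, t) triples

    def add_member(self, obj, **props):
        """Include an object in the view and attach its member-level properties."""
        self.members.add(obj)
        self.member_props[obj.oid] = dict(props)

    def relate(self, oj, ok, rel_type):
        """Record a relationship of type rel_type between two members."""
        if oj not in self.members or ok not in self.members:
            raise ValueError("both objects must be members of the view")
        self.relationships.add((oj.oid, ok.oid, rel_type))

    def extent(self):
        return frozenset(self.members)


# Members of one view may come from different source classes.
painting = MediaObject("img-1", "Image")
essay = MediaObject("doc-7", "TextDocument")
mv = MediaView("van_gogh", view_props={"topic": "Van Gogh"})
mv.add_member(painting, caption="Sunflowers")
mv.add_member(essay, caption="Biography")
mv.relate(essay, painting, "describes")
```

Keeping member properties inside the view rather than on the object reflects the idea that the same object can carry different semantics in different media views.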
13Basic Concepts
- Definition 3: A base class Ci is defined as a subclass of another base class Cj if and only if the following two conditions hold: (1) properties(Cj) ⊆ properties(Ci), and (2) extent(Ci) ⊆ extent(Cj). If Ci is a subclass of Cj, we also say that there is an is-a relationship from Ci to Cj. A base schema (BS) is a directed acyclic graph G = (V, E), where V is a finite set of vertices and E is a finite set of edges given as a binary relation defined on V × V. Each element in V corresponds to a base class Ci. Each edge of the form e = <Ci, Cj> ∈ E represents an is-a relationship from Ci to Cj (or Ci is a subclass of Cj).
14Basic Concepts
- Definition 4: A media view MVi is a subview of another media view MVj (or there is an is-a relationship from MVi to MVj) if and only if properties(MVj) ⊆ properties(MVi) and extent(MVi) ⊆ extent(MVj). A view schema (VS) is a directed acyclic graph G = (V, E), where a vertex in V corresponds to a media view MVi, and an edge e = <MVi, MVj> ∈ E represents an is-a relationship from MVi to MVj (or MVi is a subview of MVj).
15Basic Concepts
16Basic Concepts
- Semantics-based data reorganization via media
views
17Basic Concepts
- Definition 5: The semantic graph (SG) is an undirected graph G = (V, E), where V is a finite set of vertices and E is a finite set of edges. Each element Vi ∈ V corresponds to a multimedia object Oi in the database. E is a ternary relation defined on V × V × N. Each e = <Vi, Vj, n> ∈ E represents a semantic link of degree n between objects Oi and Oj, where n is the number of media views to which both objects belong. We define n as the correlation factor between Oi and Oj.
18Basic Concepts
- Definition 6: The correlation matrix M = [Mij] is an adjacency matrix of the semantic graph. Specifically, each element Mij contains the correlation factor between Oi and Oj, with all the diagonal elements set to zero.
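Definitions 5 and 6 can be illustrated with a short computation. This is a minimal sketch in which media views are plain sets of object identifiers; the function name and the sample views are hypothetical.

```python
def correlation_matrix(objects, views):
    """Build M where M[i][j] is the number of media views containing both
    objects[i] and objects[j]; the diagonal is left at zero."""
    index = {o: i for i, o in enumerate(objects)}
    n = len(objects)
    m = [[0] * n for _ in range(n)]
    for extent in views.values():
        present = [o for o in objects if o in extent]
        for a in present:
            for b in present:
                if a != b:
                    m[index[a]][index[b]] += 1
    return m


objects = ["o1", "o2", "o3"]
views = {
    "flower": {"o1", "o2"},
    "van_gogh": {"o1", "o2", "o3"},
    "season": {"o3"},
}
M = correlation_matrix(objects, views)
# o1 and o2 share two views, so their correlation factor is 2.
```

Because the semantic graph is undirected, the resulting matrix is symmetric.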
19Basic Concepts
20View Operators
- A set of operators that take media views and view instances as operands.
- Our intention is not to come up with a complete set of operators, but to focus on those that are indispensable in supporting queries and navigation over multimedia objects.
21View Operators
- type-level
- V-overlap
- syntax: <boolean> := v-overlap(<media view1>, <media view2>)
- semantics: true if and only if (∃ o ∈ O)(o ∈ extent(<media view1>) and o ∈ extent(<media view2>))
- Cross
- syntax: {<object>} := cross(<media view1>, <media view2>)
- semantics: {<object>} := {o ∈ O | o ∈ extent(<media view1>) and o ∈ extent(<media view2>)}
- Sum
- syntax: {<object>} := sum(<media view1>, <media view2>)
- semantics: {<object>} := {o ∈ O | o ∈ extent(<media view1>) or o ∈ extent(<media view2>)}
- Subtract
- syntax: {<object>} := subtract(<media view1>, <media view2>)
- semantics: {<object>} := {o ∈ O | o ∈ extent(<media view1>) and o ∉ extent(<media view2>)}
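The type-level operators map directly onto set operations over view extents. The sketch below assumes each media view is represented only by its extent, a frozenset of object identifiers; the function names mirror the slide (sum_ avoids shadowing the Python built-in) and the sample data is hypothetical.

```python
def v_overlap(mv1, mv2):
    """True iff some object belongs to both extents."""
    return not mv1.isdisjoint(mv2)


def cross(mv1, mv2):
    """Objects that belong to both extents."""
    return mv1 & mv2


def sum_(mv1, mv2):
    """Objects that belong to either extent."""
    return mv1 | mv2


def subtract(mv1, mv2):
    """Objects in the first extent that are not in the second."""
    return mv1 - mv2


flower = frozenset({"img-1", "img-2"})
van_gogh = frozenset({"img-1", "doc-7"})
```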
22View Operators
- instance-level
- Class
- syntax: <base class> := class(<view instance>)
- semantics: <view instance> is an instance of <base class>
- Components
- syntax: {<object>} := components(<view instance>)
- semantics: {<object>} := {o ∈ O | o is a component (direct or indirect) of <view instance>}
- I-overlap
- syntax: <boolean> := i-overlap(<view instance1>, <view instance2>)
- semantics: true if and only if (∃ o ∈ O)(o ∈ components(<view instance1>) and o ∈ components(<view instance2>))
23View Algebra
- Functions
- -- derivation of new MVs from existing MVs
- Heuristic Enumeration
- Blind enumeration
- Content-based enumeration
- Semantics-based enumeration
24View Algebra
- Definition 7. The n-level correlation matrix M(n) is derived from the correlation matrix M by the following formula: [formula]
- where n is a positive integer and k (0 < k < 1) is a constant. Each element M(n)ij is defined as the n-level correlation factor between objects Oi and Oj.
25View Algebra
- Algebra Operators
- select from src-MV where <predicate>
- project <property-list> from src-MV
- intersect (src-MV1, src-MV2)
- union (src-MV1, src-MV2)
- difference (src-MV1, src-MV2)
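The select and project operators can be sketched as follows, assuming a source media view is a list of member records (dictionaries of properties). This representation and the sample data are hypothetical; intersect, union and difference behave like the corresponding set operations on extents.

```python
def select(src_mv, predicate):
    """select from src-MV where <predicate>: keep members satisfying the predicate."""
    return [member for member in src_mv if predicate(member)]


def project(src_mv, property_list):
    """project <property-list> from src-MV: keep only the listed properties."""
    return [{p: member[p] for p in property_list if p in member} for member in src_mv]


paintings = [
    {"oid": "img-1", "title": "Sunflowers", "year": 1888},
    {"oid": "img-2", "title": "The Starry Night", "year": 1889},
]
late = select(paintings, lambda m: m["year"] >= 1889)
titles = project(paintings, ["title"])
```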
26Comparison (vs. class)
27Comparison (vs. traditional object view)
28Logical Implementation
- MediaView Construction
- MediaView Customization
- MediaView Evolution
29MediaViews Construction
- Work with CBIR systems to acquire knowledge from queries
- Learn from previously performed queries
- A multi-system approach to support multi-modality of media objects
- Organize the semantics by following WordNet
30Why WordNet?
- Different queries may vary greatly because users are free to choose query keywords
- We need an approach to organize this knowledge into a logical structure
- A simple context: a concept in WordNet
- Common media views correspond to simple contexts
- We provide all common media views, based on which users can build complex ones.
31Navigating the Multimedia Database
- Navigating via semantic relationships of WordNet
- Semantic relationship examples
- Synonymy (similar): pipe, tube
- Antonymy (opposite): fast, slow
- Hyponymy (subordinate): tree, plant
- Meronymy (part): chimney, house
- Troponymy (manner): march, walk
- Entailment: drive, ride
32Navigating the Multimedia Database
33MediaViews Construction
34Multi-dimensional Semantic Space
- IS-A relationship in thesaurus
- For example, Season has a 4-dimensional semantic space: spring, summer, autumn, winter
35Encoding with Probabilistic Tree
- A Probabilistic Tree specifies the probability of
one media object semantically matching a certain
concept in thesaurus.
36Encoding with Probabilistic Tree
- Procedure
- Step i: Following the thesaurus, trace from the target concept C1 to the root concept Root in the thesaurus. Assume the path is <C1, C2, ..., Root = Cn>. Start from CC = Cn and initially set P = 1.
- Step ii: Suppose CC = Ci, and the next concept Ci-1 is one of the k sub-concepts of Ci. If CC is encoded in the Probabilistic Tree of this media object, then let [formula]
- If not, we let [formula]
- Step iii: If CC has not reached C1, repeat Step ii. Otherwise, P is the probability of the media object matching concept C1.
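The procedure can be sketched in code, but note that the two update formulas of Step ii did not survive in this transcript. The sketch below therefore rests on an assumption that is not taken from the source: when the current concept is encoded in the object's Probabilistic Tree, P is multiplied by the encoded probability of the next concept; otherwise P is multiplied by 1/k, a uniform split over the k sub-concepts. All names and sample values are hypothetical.

```python
def match_probability(path_to_root, children, encoded):
    """Estimate the probability that a media object matches path_to_root[0].

    path_to_root: [C1, C2, ..., Root], from the target concept up to the root.
    children:     concept -> list of its sub-concepts in the thesaurus.
    encoded:      concept -> {sub-concept: probability}, for the concepts that
                  are encoded in this object's Probabilistic Tree.
    """
    p = 1.0
    # Walk downwards from the root (Cn) towards the target concept (C1).
    for i in range(len(path_to_root) - 1, 0, -1):
        cc, nxt = path_to_root[i], path_to_root[i - 1]
        if cc in encoded:
            # ASSUMPTION: use the probability stored in the tree.
            p *= encoded[cc].get(nxt, 0.0)
        else:
            # ASSUMPTION: uniform split over the k sub-concepts of cc.
            p *= 1.0 / len(children[cc])
    return p


children = {
    "Root": ["season", "animal"],
    "season": ["spring", "summer", "autumn", "winter"],
}
encoded = {"Root": {"season": 0.8, "animal": 0.2}}
p_spring = match_probability(["spring", "season", "Root"], children, encoded)
```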
37Evolution through Feedback
- A progressive approach
- MediaView is accumulated along with the processes of user interaction
- Two phases of feedback
- System-feedback
- User-feedback
38Evolution through Feedback
39Evolution through Feedback
- Procedure
- Record each feedback performed by users.
- For each CBIR system i involved, calculate its accuracy rate of retrieval. That is, simply divide the number of correct results, according to user feedback, by the total number of retrieved results.
- Reset the value of [symbol] for each system to its accuracy rate.
- Wait for the next session of user feedback.
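The accuracy computation in this procedure can be sketched as below, assuming the accuracy rate of a CBIR system is the fraction of its retrieved results that users marked as correct. The log format and the system names are hypothetical.

```python
def accuracy_rates(feedback_log):
    """feedback_log: iterable of (system_id, is_correct) pairs, one per
    retrieved result that received user feedback."""
    retrieved = {}
    correct = {}
    for system_id, is_correct in feedback_log:
        retrieved[system_id] = retrieved.get(system_id, 0) + 1
        if is_correct:
            correct[system_id] = correct.get(system_id, 0) + 1
    return {s: correct.get(s, 0) / retrieved[s] for s in retrieved}


log = [
    ("system_a", True), ("system_a", False), ("system_a", True), ("system_a", True),
    ("system_b", False), ("system_b", True),
]
rates = accuracy_rates(log)
```

The resulting rates would then replace the per-system value mentioned in the procedure before the next feedback session.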
40Fuzzy Logic based Evolution Approach
- Due to the uncertainty of the semantics, we cannot make an absolute assertion that a media object is relevant or irrelevant to a context
- A media object in a database may be retrieved as a relevant result to a context several times; the more times a media object is retrieved, the more confidently it can be considered relevant to the context.
41Fuzzy Logic based Evolution Approach
- For a media object e and a context c:
- [symbol]: the accumulation of historical feedback information (from both system and users)
- [symbol]: the adjustment of [symbol] after each feedback session
42Inverse Propagation of Feedback
- The drawback of calculating the probability in a top-down fashion
- E.g. whether a media object matches season cannot benefit from the fact that the media object was a match for spring
- Solution: propagate the confidence value of a media object being relevant to a concept bottom-up along the hierarchical structure
43Inverse Propagation of Feedback
- Procedure
- Wait for a feedback session.
- For each positive feedback, namely one stating that a concept C is relevant to a media object: following the thesaurus, trace from C to the root concept Root in the thesaurus. Assume the path is <C, C1, C2, ..., Root = Cn>.
- Append Ci as positive feedback to that media object as well, for i = 1 to n.
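The bottom-up propagation can be sketched as follows, assuming the thesaurus is given as a child-to-parent mapping and the positive feedback for one media object is a set of concepts. Both representations and the sample concepts are hypothetical.

```python
def propagate_positive(concept, parent, feedback):
    """Add concept and all of its ancestors up to the root to the set of
    concepts with positive feedback for one media object."""
    node = concept
    while node is not None:
        feedback.add(node)
        node = parent.get(node)  # the root has no parent, which ends the loop
    return feedback


parent = {"spring": "season", "season": "time", "time": None}
positives = propagate_positive("spring", parent, set())
```

With this, a media object confirmed as a match for spring also counts as a match for season.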
44MediaView Customization
- Two level MediaView Framework
45MediaView Customization
- Dynamically construct complex-context-based media views based on simple ones
- An example complex context: the Grand Hall in City University
- Several user-level operators are devised to support more complex/advanced contexts, besides the basic operators
46User-level Operators
- INHERIT_MV(N: mv-name, NS: set-of-mv-refs, VP: set-of-property-ref, MP: set-of-property-ref): mv-ref
- UNION_MV(N: mv-name, NS: set-of-mv-refs): mv-ref
- INTERSECTION_MV(N: mv-name, NS: set-of-mv-refs): mv-ref
- DIFFERENCE_MV(N1: mv-ref, N2: mv-ref): mv-ref
47Build a MediaView in Run-time
- Example: find out info about "Van Gogh"
- Who is "Van Gogh"?
- What is his work?
- Know more about his whole life.
- Know more about his country.
- See his famous painting "sunflower"
48Build a MediaView in Run-time
- Who is Van Gogh?
- INHERIT_MV("V. Gogh", {<painter>}, {name = "Van Gogh"}, {})
- What is his work?
- INTERSECTION_MV("work", {<painting>, vg})
- Know more about his whole life.
- INTERSECTION_MV("life", {<biography>, vg})
- Know more about his country.
- INTERSECTION_MV("country", {<country>, vg})
- See his famous painting "sunflower"
- Set sunflower = INTERSECTION_MV("sunflower", {<sunflower>, <painting>})
- Set vg_sunflower = INTERSECTION_MV("vg_sunflower", {vg_work, sunflower})
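The way complex views are composed from simple ones can be sketched with set operations. This is a rough illustration only: a media view is reduced to a name plus an extent, INHERIT_MV is left out because its property handling is not specified here, and the helper make_view, the derived name used by difference_mv and the sample extents are all hypothetical.

```python
from functools import reduce


def make_view(name, extent):
    return {"name": name, "extent": frozenset(extent)}


def union_mv(name, sources):
    """UNION_MV: a new view holding every member of the source views."""
    return make_view(name, frozenset().union(*(s["extent"] for s in sources)))


def intersection_mv(name, sources):
    """INTERSECTION_MV: a new view holding the members common to all sources."""
    extents = [s["extent"] for s in sources]
    if not extents:
        raise ValueError("at least one source view is required")
    return make_view(name, reduce(frozenset.intersection, extents))


def difference_mv(n1, n2):
    """DIFFERENCE_MV: members of n1 that are not members of n2."""
    return make_view(n1["name"] + "_minus_" + n2["name"], n1["extent"] - n2["extent"])


# Hypothetical common media views.
painting = make_view("painting", {"p1", "p2", "p3"})
sunflower_concept = make_view("sunflower", {"p1", "photo9"})
vg = make_view("vg", {"p1", "p2", "bio1"})

vg_work = intersection_mv("vg_work", [painting, vg])
sunflower = intersection_mv("sunflower", [sunflower_concept, painting])
vg_sunflower = intersection_mv("vg_sunflower", [vg_work, sunflower])
```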
49Authoring Scenario
- Creates a new media view named after the subject
- All multimedia materials used in the document would be put into this MediaView for further reference.
- To collect the most relevant materials for authoring, the user performs the MediaView building process.
- Import suitable media objects by browsing media views
- To reference the manner and style of authoring, find other media views with similar topics.
- Drag & Drop
- learning-from-references
50Interface of Our Authoring System
51System Features
- A Dynamic Environment
- Helps a user select materials from the database to incorporate into the document
- Query other similar media views for referencing the manner and/or style of authoring
52Real-World Applications
- A Multimedia Recipe Database
- Modeling basis
- Personalized (context-aware) manipulation
- Cross-media indexing and retrieval system
- Novel way of annotating and retrieving media objects
- Lead to new indexing strategies
53A Personalized Recipe Database System
- People cannot live without food
- Existing recipe websites provide huge numbers of recipes from throughout the world
- Fail to support analyzing and comparing recipes (What are the important cooking principles and skills? What makes two dishes taste so different? etc.)
- Unable to help users find similar recipes in a comprehensive manner (only keyword-based search on recipe names)
- Fail to adapt recipes to meet the real-world situation (e.g. due to lack of ingredients or user preference)
54A Personalized Recipe Database System -- Our
Contributions
- Propose a recipe model which encompasses static attributes as well as dynamic behaviours (e.g. cooking procedures and constraints)
- Present a novel perspective of evaluating the quality of a recipe by constructing and analysing its cooking graph (capturing both action flows and data/ingredient flows)
- Provide a promising way to address the problem of recipe adaptation heuristically (with flexible and feasible solutions)
55Recipe on the Web
56Sample Recipe -- The Cooking Procedure of Triple
Cheese Pasta Primavera
57Sample Recipe
Parsing the Cooking Procedure of Triple Cheese
Pasta Primavera
58Recipe Model
- A recipe R is modeled and represented by a tuple of three elements
- R = <M, RP, SP>
- where
- (a) M = {Mi | i = 1..m} is a set of ingredients. An ingredient Mi is either a basic ingredient or a set of ingredients
- Mi = <MID, MP>, where MID is a unique identity and MP is a set of member-level properties (and functions) such as the name, quantity and image
- An ingredient Mi belongs to one of three classes: Main, Minor and Seasoning
- (b) RP is a set of recipe-level properties (and functions) applied on R itself, such as the main cooking style, region, nutrition and images of the dish of the recipe
59Recipe Model
- (c) SP = (V, E, Cons, Ingr) is a labeled directed Cooking Graph
- V = {vi | i = 1..n} is a set of nodes
- vi: a cooking action
- cooking action constraints Cons(vi): associated constraint conditions that should be satisfied when the action of vi takes place, e.g. conditions on temperature and duration
- E is a set of directed edges on V: the temporal execution flow of the cooking actions, named action flows
- An edge <vi, vj>: vj should take place after vi
- cooking transition constraints Cons(vi, vj): the conditions that should be satisfied for the flow to take place
- Ingr(vi): ingredients that should be added into vi
- O(vi): the output ingredients of vi
- These inputs and outputs of the nodes are called ingredient flows
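The recipe tuple and its cooking graph can be pictured as plain data structures. The sketch below is an illustration under assumptions, not the RecipeView implementation: the class and field names are hypothetical, constraints are kept as free-text strings, and the sample recipe is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Ingredient:
    mid: str            # MID: unique identity
    name: str           # member-level property
    kind: str           # "Main", "Minor" or "Seasoning"
    quantity: str = ""  # member-level property


@dataclass
class CookingGraph:
    actions: dict = field(default_factory=dict)    # V: node id -> cooking action
    edges: set = field(default_factory=set)        # E: (vi, vj), vj takes place after vi
    node_cons: dict = field(default_factory=dict)  # Cons(vi)
    edge_cons: dict = field(default_factory=dict)  # Cons(vi, vj)
    ingr: dict = field(default_factory=dict)       # Ingr(vi): ingredient ids added at vi


@dataclass
class Recipe:
    ingredients: list    # M
    props: dict          # RP: recipe-level properties
    graph: CookingGraph  # SP


# Invented sample data, for illustration only.
pasta = Ingredient("m1", "pasta", "Main", "200 g")
salt = Ingredient("m2", "salt", "Seasoning", "1 tsp")
graph = CookingGraph(
    actions={"v1": "boil", "v2": "drain"},
    edges={("v1", "v2")},
    node_cons={"v1": "duration = 10 min"},
    edge_cons={("v1", "v2"): "pasta is tender"},
    ingr={"v1": ["m1", "m2"]},
)
recipe = Recipe([pasta, salt], {"style": "boiled", "region": "Italy"}, graph)
```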
60Cooking Graph
The Cooking Graph of Triple Cheese Pasta
Primavera
61Basic Properties
- Definition 1. (Reachability) A cooking graph is defined as reachable if each of its nodes is reachable; a node is reachable if it is on a directed path from a starting node to the end node.
- Definition 2. (Consistency) A cooking graph is defined to be consistent if the conditions for each node/edge are consistent (i.e. there exists an assignment to variables that makes the conditions true).
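The reachability property can be checked with two graph traversals: one forward from the starting nodes and one backward from the end node. The following is a minimal sketch; the function name and graph encoding are hypothetical, and it treats "lies on a path" as "reachable from a start and able to reach the end", which is exact when the cooking graph is acyclic.

```python
def is_reachable_graph(nodes, edges, starts, end):
    """Return True iff every node lies between some starting node and the end node."""
    succ = {v: set() for v in nodes}
    pred = {v: set() for v in nodes}
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)

    def closure(seeds, neighbours):
        # Depth-first traversal collecting everything reachable from the seeds.
        seen = set(seeds)
        stack = list(seeds)
        while stack:
            v = stack.pop()
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    forward = closure(starts, succ)   # nodes reachable from a starting node
    backward = closure([end], pred)   # nodes from which the end node is reachable
    return all(v in forward and v in backward for v in nodes)


edges = {("boil", "drain"), ("chop", "saute"), ("drain", "toss"), ("saute", "toss")}
ok = is_reachable_graph({"boil", "drain", "chop", "saute", "toss"}, edges, ["boil", "chop"], "toss")
broken = is_reachable_graph({"boil", "drain", "chop", "saute", "toss", "grill"}, edges, ["boil", "chop"], "toss")
```

Checking consistency is a different kind of problem (satisfiability of the node and edge conditions) and is not covered by this sketch.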
62Constraints and Rules
- Definition 3. (Constraint) A constraint is a predicate followed by one or more terms, enclosed in parentheses and separated by commas; a term is either a constant, a variable or a function expression.
- Constraints specify all kinds of conditions or restrictions in the recipe model
- Three categories: intra-recipe constraints, inter-recipe constraints and outer-recipe constraints.
- Incompatible(Spinach, Tofu) says spinach and tofu are incompatible and should not be cooked together.
63Constraints and Rules
- Definition 4. (Rule) A rule is a logical implication of the form If φ Then ψ (or φ → ψ), where φ and ψ are sentences.
- Validate the correctness of a recipe through a reasoning and recognition process.
- Handle complex situations, such as making a necessary adjustment or compensation once an improper cooking action occurs.
- Describe cooking skills that have been widely accepted and commonly used.
- Over_Put(salt) → Add(vinegar ∨ water) says that if too much salt has been put into a dish, then neutralize the salty taste by adding either vinegar or water.
64Recipe Cooking Graph Mining
- Pattern: some subgraphs occur in one or more cooking graphs and have a certain influence on the cooking effects (e.g. taste, appearance).
- Find patterns for a set of recipes
- What's usually done and what's usually put in during the cooking procedure (one action, a series of actions, an ingredient, a set of ingredients, actions combined with ingredients)
- Cooking graphs of different recipes may share the same pattern
- Distinct subgraphs that determine the cooking effect (e.g. taste) should be identified
65Sample Patterns
66Sample Cooking Style
Generally describe how a recipe is cooked in a
Pattern Combination or in Graph Abstraction.
67User Adaptation
- Usually a user wants to make a dish that has the same cooking result (e.g. taste, appearance) as the recipe exhibits.
- Unfortunately, the user is very likely to get a slightly or even totally different dish as he/she modifies the cooking procedure.
- Objective reasons: e.g. lack of some ingredients
- Subjective reasons: e.g. wrong cooking actions due to carelessness or personal preference
68User Adaptation
- When the user makes an adaptation, the system will check if the modified cooking graph is feasible.
- If not, a set of feasible templates is provided.
- The remaining subgraph is replaced by the one the user selects.
- Property check (Reachability, Consistency), then Template Selection and Instantiation
69Prototype System: Global System vs. User Space
70Prototype System: Recipe Browser
71Prototype System: Cooking Pattern Miner
72Prototype System: Similarity Calculator
73Summary
- Proposed a data model to represent a recipe
- Advocated cooking graph mining to find frequently used patterns (actions, ingredients)
- Attempted to solve the recipe adaptation problem by using patterns as templates
- Developed a prototype system: RecipeView
- Further work includes
- Discover patterns of cooking graphs
- Refine and strengthen the algorithm of recipe adaptation
74Application Scenario
75Application Scenario
- Advantages (vs. traditional retrieval techniques)
- Easy-to-compose query
- By browsing (to get seed objects of arbitrary modalities)
- By subject (simply a keyword) at various abstraction levels
- Multi-modal results
- a collection of images, text docs, videos, etc.
- vs. a single type of media
- Semantically relevant results
- natural outcome of exploring previously learnt knowledge
- vs. a set of specifically chosen features
76Advantages (contd)
- Hill-climbing effect: retrieval performance improves as more user interactions are conducted
77Conclusion
- MediaView: a semantic multimedia database modeling mechanism
- to bridge the semantic gap between conventional databases and semantics-intensive multimedia applications
- A set of user-level operators to accommodate the specialization/generalization relationships among the media views
78Conclusion
- MediaView promises more effective access to the content of media databases
- Users could get the right stuff and tailor it to the context of their application easily.
- Providing the most relevant content from pre-learnt semantic links between media and context
- → high-performance database browsing and multimedia authoring tools can enable more comprehensive applications for the user
79Conclusion
- Users could customize specific media views according to their tasks, by using user-level operators
- The effectiveness of using MediaView in the experimental problem domains
- Multimedia recipe database
- Cross-media indexing and retrieval
80Further Issues
- The development and transition of MediaView to a fully-fledged multimedia database system supporting declarative queries
- Intensive and extensive performance studies
- Advanced semantic relations (e.g. temporal and spatial ones) can also be incorporated in combining individual media views
81- Thank you!
- Q & A
- Email: Qing.Li_at_cityu.edu.hk