Title: The Multimedia Semantic Web
1The Multimedia Semantic Web
- Bill Grosky
- Multimedia Information Systems Laboratory
- University of Michigan-Dearborn
- Dearborn, Michigan
2Contents
- Introduction
- CBR: Where are we?
- Multimedia annotation
- Context-rich environments
- Semantic web
- Our work
- Anglograms
- Finding latent semantics
- Using text for improved image search
- Using images for improved text search
- Web page structure
- A cross-modal theory of linked document
semantics
3CBR: Where are We?
- Development of feature-based techniques for
content-based retrieval is a mature area, at
least for images
- CBR researchers should now concentrate on
extracting semantics from multimedia documents so
that retrievals using concept-based queries can
be tailored to individual users
- The semantic gap
- (Semi)-automated multimedia annotation
4Multimedia Annotation
- Multimedia annotations should be semantically
rich
- Multiple semantics
- A social theory based on how multimedia
information is used
- This can be discovered by placing multimedia
information in a natural, context-rich environment
5Context-Rich Environments
- Structural context: Author's contribution
- Document's author places semantically similar
pieces of information close to each other
- User can cluster together semantically similar
pieces of information
- Dynamic context: User's contribution
- Short browsing sub-paths are semantically coherent
6Context-Rich Environments
- The Web is a perfect example of a context-rich
environment
- Develop multimedia annotations through
cross-modal techniques
- Audio
- Images
- Text
- Video
7Semantic Web
- This program overlaps another very important
current research topic, the semantic web
- Web page annotations are the backbone of this
research effort
- We have something very important to offer to this
area
- Multimedia documents
- Deriving multiple semantics for a single
document
- Combining our efforts will enrich both
communities
8Semantic Web
- The Semantic Web is a new initiative to
transform the web into a structure that supports
more intelligent querying and browsing, both by
machines and by humans. This transformation is to
be supported through the generation and use of
metadata constructed via web annotation tools
using user-defined ontologies that can be related
to one another.
- Somewhere on the web
9Semantic Web
[Figure: Semantic Web architecture, based on www.semanticweb.org. Components: End User, Community Portal, Agents, Inference Engine, Ontology Articulation Toolkit, Ontology Construction Tool, Ontologies, Web-Page Annotation Tool, Annotated Web Pages, Metadata Repository]
10Semantic Web
- Plan a vacation within the next month
- Bill instructed his semantic web agent through
his handheld browser.
- An agent retrieved Bill's vacation profile from
his travel agent, retrieved Bill's availability
from his calendar, checked availability of
airlines, hotels and restaurants, and made all
the necessary arrangements.
11Semantic Web
- Multimedia semantic web
- Plan a vacation close to where [image] is being exhibited.
13Anglograms
- Image object
- Entire image
- Some meaningful portion of an image
- semcon
- Point-based features
- corner points
- color histograms
14Anglograms
Point feature map for shape
15Anglograms
Point feature map for color
16Anglograms
Voronoi diagram of n = 18 sites
17Anglograms
18Anglograms
- Delaunay triangulation of a set of n points
- O(n log n) algorithm
- Invariance of Delaunay triangles of a set of
points to
- translation
- rotation
- scaling
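The invariance claim can be checked numerically on a single triangle. The sketch below is only an illustration, assuming uniform scaling; the helper names `triangle_angles` and `similarity_transform` are hypothetical and do not come from the talk. Because translation, rotation, and uniform scaling preserve interior angles, any descriptor built from the angles of Delaunay triangles inherits the same invariance.

```python
import math

def triangle_angles(p, q, r):
    """Return the three interior angles (degrees) of triangle pqr, sorted ascending."""
    def angle_at(a, b, c):
        # Angle at vertex a between rays a->b and a->c.
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
    return sorted([angle_at(p, q, r), angle_at(q, r, p), angle_at(r, p, q)])

def similarity_transform(pt, scale, theta_deg, shift):
    """Uniformly scale, rotate by theta_deg, then translate a 2-D point."""
    t = math.radians(theta_deg)
    x, y = pt
    xr = scale * (x * math.cos(t) - y * math.sin(t))
    yr = scale * (x * math.sin(t) + y * math.cos(t))
    return (xr + shift[0], yr + shift[1])

# A 3-4-5 right triangle and a scaled, rotated, shifted copy of it.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
moved = [similarity_transform(p, scale=2.5, theta_deg=37.0, shift=(10.0, -4.0))
         for p in triangle]

before = triangle_angles(*triangle)
after = triangle_angles(*moved)
```

Comparing `before` and `after` element by element shows the angles agree up to floating-point error.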
19Anglograms
- Spatial layout of point set
- Anglogram
- Computed by discretizing and counting the angles
of the Delaunay triangles
- Which angles are counted?
- O(max(n, #bins)) algorithm
- What is bin size?
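A minimal sketch of the counting step, assuming the triangles have already been produced by some Delaunay routine. The function names are hypothetical. The defaults (two largest angles, 5-degree bins) follow the settings reported later in the talk; both are parameters here because the slide leaves them open.

```python
import math

def interior_angles(p, q, r):
    """Interior angles (degrees) of triangle pqr via the law of cosines."""
    a = math.dist(q, r)  # side opposite p
    b = math.dist(p, r)  # side opposite q
    c = math.dist(p, q)  # side opposite r
    def angle(opposite, s1, s2):
        cos_t = (s1 * s1 + s2 * s2 - opposite * opposite) / (2 * s1 * s2)
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
    return [angle(a, b, c), angle(b, a, c), angle(c, a, b)]

def anglogram(triangles, bin_size=5.0, keep=2):
    """Histogram over [0, 180) of the `keep` largest angles of each triangle."""
    n_bins = int(round(180.0 / bin_size))
    hist = [0] * n_bins
    for p, q, r in triangles:
        for ang in sorted(interior_angles(p, q, r), reverse=True)[:keep]:
            hist[min(int(ang // bin_size), n_bins - 1)] += 1
    return hist

# Two hand-picked triangles standing in for a Delaunay triangulation.
triangles = [
    ((0, 0), (5, 0), (1, 3)),   # angles of roughly 71.6, 71.6, 36.9 degrees
    ((0, 0), (6, 0), (1, 2)),   # angles of roughly 94.8, 63.4, 21.8 degrees
]
hist = anglogram(triangles)
```

Each triangle contributes a constant amount of work, which is consistent with a cost linear in the number of triangles plus the number of bins.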
21Anglograms
- Computation of color anglogram of an image
- Divide image evenly into a number of M × N
non-overlapping blocks
- Each individual block is abstracted as a unique
feature point labeled with its spatial location
and dominant colors
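The block abstraction can be sketched as follows, assuming the image is already a grid of quantized color labels whose height and width divide evenly by M and N. The function name is hypothetical, and this version keeps a single dominant label per block, whereas the slide allows several dominant colors.

```python
from collections import Counter

def block_feature_points(image, m, n):
    """Split a 2-D grid of quantized color labels into m x n blocks.

    Returns one (row_center, col_center, dominant_label) tuple per block.
    Assumes the image height is divisible by m and the width by n.
    """
    height, width = len(image), len(image[0])
    bh, bw = height // m, width // n
    points = []
    for bi in range(m):
        for bj in range(n):
            labels = [image[y][x]
                      for y in range(bi * bh, (bi + 1) * bh)
                      for x in range(bj * bw, (bj + 1) * bw)]
            dominant = Counter(labels).most_common(1)[0][0]
            points.append((bi * bh + bh / 2, bj * bw + bw / 2, dominant))
    return points

# Toy 4 x 4 "image" of quantized hue labels, split into 2 x 2 blocks.
image = [
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [4, 4, 5, 6],
    [4, 4, 5, 5],
]
points = block_feature_points(image, 2, 2)
```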
22Anglograms
- Computation of color anglogram of an image
- Point feature map
- Normalized feature points, after adjusting any
two neighboring feature points to a fixed
distance
- Construct Delaunay triangulation for each set of
feature points labeled with identical color
23Anglograms
- Computation of color anglogram of an image
- Compute anglogram based on each Delaunay
triangulation
- Color anglogram for image
- Concatenating all the anglograms together
24Anglograms
Pyramid image
25Anglograms
26Anglograms
Hue component
27Anglograms
Saturation component
28Anglograms
Point feature map
29Anglograms
Feature points of hue 2
30Anglograms
Delaunay triangulation of hue 2
31Anglograms
Delaunay triangulation of saturation 5
32Anglograms
Anglogram of saturation 5
34Finding Latent Semantics
- We want to transform low-level features to a
higher level of meaning
- Used for dimension reduction in QBIC
- Searching in high-dimensional spaces
- More importantly, it creates clusters of
co-occurring features
- So-called concepts
35Finding Latent Semantics
- Latent Semantic Analysis (LSA) was introduced to
overcome a fundamental problem in textual
information retrieval
- Users want to retrieve on the basis of conceptual
content
- Individual words provide unreliable evidence
about conceptual meanings
- Synonymy
- Many ways to refer to the same object
- Polysemy
- Most words have more than one distinct meaning
36Finding Latent Semantics
- Searching for documents concerning automobiles
- Tend to use the keyword "automobile"
- A statistical analysis determines that the
keywords "automobile" and "car" tend to co-occur
- LSA will retrieve documents in which the keyword
"car" appears, but not the keyword "automobile"
37Finding Latent Semantics
- Term-document association
- It is assumed that there exists some underlying
latent semantic structure in the data that is
partially obscured by the randomness of term
choice
- By semantic structure we mean the correlation
structure in which individual terms appear in
documents
- Semantic implies only the fact that terms in a
document may be taken as referents to the
document itself or to its topic
- Statistical techniques are used to estimate this
latent semantic structure, and to get rid of
obscuring noise
38Finding Latent Semantics
- Singular-value decomposition (SVD)
- Take a large matrix of term-document association
- Construct a semantic space wherein terms and
documents that are closely associated are placed
near to each other
- SVD allows the arrangement of space to reflect
the major associative patterns and ignore
smaller, less important influence
- As a result, terms that did not actually appear
in a document may still end up close to the
document, if that is consistent with the major
patterns of association
- Position in the space serves as the semantic
indexing
- Retrieval proceeds by using the terms in a query
to identify a point in the semantic space, and
documents in its neighborhood are returned as
relevant results
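The retrieval step can be illustrated on a toy collection. This is a sketch under stated assumptions: the five terms and five documents are invented, NumPy is assumed to be available, and the query is folded in with the common convention of multiplying by the transposed term factor and dividing by the singular values. The talk does not say which convention was used.

```python
import numpy as np

terms = ["automobile", "car", "engine", "flower", "garden"]
# Columns are documents d0..d4; entry (i, j) is 1 if term i occurs in document j.
A = np.array([
    [1, 0, 1, 0, 0],   # automobile
    [0, 1, 1, 0, 0],   # car
    [1, 1, 0, 0, 0],   # engine
    [0, 0, 0, 1, 0],   # flower
    [0, 0, 0, 1, 1],   # garden
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T   # V_k has one row per document

def lsa_scores(query_terms):
    """Cosine similarity between a folded-in query and every document."""
    q = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    q_k = (U_k.T @ q) / s_k            # fold the query into the k-dim space
    sims = V_k @ q_k
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k)
    return sims / np.where(norms == 0, 1.0, norms)

scores = lsa_scores({"automobile"})
keyword_hits = A[terms.index("automobile")]   # plain keyword match per document
```

Document d1 contains "car" and "engine" but not "automobile", so a plain keyword match misses it, while its position in the reduced space lies close to the query. The flower and garden documents stay far from the query.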
39Finding Latent Semantics
- Term-document matrix
- d documents
- t terms
- Represented by a $t \times d$ term-document matrix $A$
- Each document is represented by a column
- document vector
- Each term is represented by a row
- term vector
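A small sketch of how such a matrix might be assembled, assuming whitespace tokenization and raw occurrence counts as entries. The talk does not specify a weighting scheme, and the function name and sample documents are hypothetical.

```python
def term_document_matrix(documents):
    """Return (terms, A) where A[i][j] counts term i in document j."""
    terms = sorted({word for doc in documents for word in doc.lower().split()})
    index = {term: i for i, term in enumerate(terms)}
    matrix = [[0] * len(documents) for _ in terms]
    for j, doc in enumerate(documents):
        for word in doc.lower().split():
            matrix[index[word]][j] += 1
    return terms, matrix

docs = ["car engine", "automobile engine", "car car road"]
terms, A = term_document_matrix(docs)

# Row i is the term vector of terms[i]; column j is the document vector of docs[j].
document_vector_2 = [row[2] for row in A]
```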
40Finding Latent Semantics
41Finding Latent Semantics
42Finding Latent Semantics
- SVD is a dimension reduction technique
- Reduced-rank approximation to both column space
and row space
- Find a rank-k approximation to matrix A with
minimal change to that matrix for a given value
of k
- This decomposition exists for any matrix A
43Finding Latent Semantics
- SVD of a term-document matrix A
- $A = U \Sigma V^T$
- $A$ is $t \times d$
- $U$ is a $t \times r$ matrix with orthonormal columns, where $r$ is
rank($A$)
- The columns of $U$ are a basis for the column space
of $A$
- The columns of $U$ are eigenvectors of the matrix
$AA^T$
- $\Sigma$ is an $r \times r$ diagonal matrix having the singular
values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ of $A$ in order along its
diagonal
- $\Sigma^2$ holds the nonzero eigenvalues of $AA^T$ or $A^TA$
- $V^T$ is an $r \times d$ matrix with orthonormal rows
- The rows of $V^T$ are a basis for the row space of
$A$
- The columns of $V$ are eigenvectors of the matrix
$A^TA$
44Finding Latent Semantics
[Figure: $A$ ($t \times d$) $=$ $U$ ($t \times r$) $\Sigma$ ($r \times r$) $V^T$ ($r \times d$)]
45Finding Latent Semantics
- A special rank-k approximation, Ak
- $A_k = U_k \Sigma_k V_k^T$
- $U_k$
- First $k$ columns of $U$
- $\Sigma_k$
- First $k$ diagonal values of $\Sigma$
- $V_k^T$
- First $k$ rows of $V^T$
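The truncation above can be sketched with NumPy, assuming it is available. The matrix is random stand-in data rather than anything from the talk, and `rank_k_approximation` is a hypothetical helper name. Note that `np.linalg.svd` with `full_matrices=False` returns min(t, d) singular triplets rather than exactly rank(A) of them; for a full-rank matrix the two coincide.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Return (U_k, s_k, Vt_k, A_k) for the truncated SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    A_k = U_k @ np.diag(s_k) @ Vt_k
    return U_k, s_k, Vt_k, A_k

rng = np.random.default_rng(0)
A = rng.random((6, 4))            # hypothetical 6-term x 4-document matrix
U_k, s_k, Vt_k, A_k = rank_k_approximation(A, k=2)

all_s = np.linalg.svd(A, compute_uv=False)
error = np.linalg.norm(A - A_k)                      # Frobenius norm of the change
expected_error = np.sqrt(np.sum(all_s[2:] ** 2))     # from discarded singular values
```

The two error values agree: in the Frobenius norm, the change made by the truncated SVD equals the root of the sum of the squared discarded singular values, which is the smallest change any rank-k matrix can achieve.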
46Finding Latent Semantics
47Finding Latent Semantics
48Finding Latent Semantics
Query
Score
49Finding Latent Semantics
Query
Score
51Using Text for Improved Image Search
- 10 sets of 5 similar images
52Using Text for Improved Image Search
- Color anglogram
- Each image is divided into 64 non-overlapping
blocks
- Extract average hue and average saturation values
of each block
- Hue and saturation each quantized into 10 values
- Generate Delaunay triangles for each hue value
and each saturation value
- Count two largest angles and quantize them into
36 bins, each of 5°
- Feature vector has 720 elements
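The 720 elements follow from the counts above: (10 hue values + 10 saturation values) × 36 bins. A minimal sketch of the concatenation, assuming hue anglograms come before saturation anglograms and that a level with no triangles contributes an all-zero histogram; both choices, and the function name, are assumptions.

```python
N_LEVELS = 10      # quantization levels for hue and for saturation (from the slide)
N_BINS = 36        # 5-degree bins over [0, 180)

def color_anglogram(hue_histograms, saturation_histograms):
    """Concatenate per-level anglograms into one feature vector.

    Each argument maps a quantized level (0..9) to its 36-bin anglogram;
    levels with no triangles are filled with zeros.
    """
    empty = [0] * N_BINS
    vector = []
    for histograms in (hue_histograms, saturation_histograms):
        for level in range(N_LEVELS):
            hist = histograms.get(level, empty)
            if len(hist) != N_BINS:
                raise ValueError("each anglogram must have 36 bins")
            vector.extend(hist)
    return vector

# Hypothetical input: only hue level 2 and saturation level 5 produced triangles.
hue = {2: [0] * 14 + [3] + [0] * 21}
sat = {5: [0] * 12 + [1] + [0] * 23}
feature = color_anglogram(hue, sat)
```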
53Using Text for Improved Image Search
- Annotations
- Extra 15 elements
- Category positions
- sky, sun, land, water, boat, grass, horse, rhino,
bird, human, pyramid, column, tower, sphinx,
snow
- Each image annotated with appropriate keywords
and the area coverage of each of these keywords
- e.g., sky (0.55), sun (0.15), water (0.30)
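The annotation part of the feature vector can be sketched directly from the category list above. The function name is hypothetical, and storing 0.0 for absent keywords is an assumption.

```python
CATEGORIES = ["sky", "sun", "land", "water", "boat", "grass", "horse", "rhino",
              "bird", "human", "pyramid", "column", "tower", "sphinx", "snow"]

def annotation_vector(coverage):
    """Map {keyword: area coverage} to a 15-element vector in category order."""
    unknown = set(coverage) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown keywords: {sorted(unknown)}")
    return [coverage.get(category, 0.0) for category in CATEGORIES]

# The example annotation from the slide.
vec = annotation_vector({"sky": 0.55, "sun": 0.15, "water": 0.30})
```

Appending these 15 values to the 720-element color anglogram would give one combined vector per image.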
54Using Text for Improved Image Search
55Using Text for Improved Image Search
57Using Images for Improved Text Search
- Using documents collected from news Web sites
- News headlines are often used as URL anchors and
document titles
- Topic can be represented easily and clearly by a
group of keywords in the headline
- News web sites often have extensive coverage of
the same topic during a certain period of time
- News documents often include multimedia
components which are closely related to the topic
58Using Images for Improved Text Search
- Discover the semantic correlation between
keywords and image in the same document
- A collection of 20 documents from cnn.com
- 4 semantic categories of 5 documents each
- 43 keywords
- Select 1 image from each document
- Color anglogram
59Using Images for Improved Text Search
60Using Images for Improved Text Search
61Using Images for Improved Text Search
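- Integrated feature vector $F = [f_1, f_2, \ldots, f_{143}]^T$
- Textual feature vector $K = [k_1, k_2, \ldots, k_{43}]^T$
- Image feature vector $I = [i_1, i_2, \ldots, i_{100}]^T$
- Feature-document matrix $A = [F_1, F_2, \ldots, F_{20}]$
- $A = U S V^T$
- $U$ is $143 \times 143$, $S$ is $143 \times 20$, and $V$ is $20 \times 20$
- $k = 12$
- $A_k = U_k S_k V_k^T$
- $U_k$ is $143 \times 12$, $S_k$ is $12 \times 12$, and $V_k$ is $20 \times 12$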
62Using Images for Improved Text Search
- Each image is normalized to 192 × 128, and then
divided into 64 non-overlapping blocks
- Extract average hue and saturation values of each
block
- Hue and saturation each quantized into 10 values
- Generate Delaunay triangles for each hue value
and each saturation value
63Using Images for Improved Text Search
- Count two largest angles and quantize them into
36 bins, each of 5°
- Image feature vector has 720 elements
- Feature-document matrix $A$ is $763 \times 20$
- SVD
- $k = 12$
64Using Images for Improved Text Search
Keywords only
Keywords using LSA
1% improvement
3% improvement
Image (global color histogram) + annotated keywords using LSA
21% improvement
Image (anglogram) + annotated keywords using LSA
66Web Page Structure
- Genre detection
- We do the following
- Display web page in the program
- Get tag hierarchy with area co-ordinates
- Normalize the web page to size 512 × 512
- Divide page into 16 × 16 blocks
- Calculate area covered by each tag in each block
considering the level of the tag in tag
hierarchy
- For each feature tag get the center coordinates
of the blocks where it is covering maximum area
as compared with other tags on the same level
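The block-coverage steps above can be sketched as follows. This is a simplified illustration under stated assumptions: tags are given as axis-aligned rectangles on the normalized page, the tag's level in the hierarchy is ignored, and the function names and sample layout are hypothetical.

```python
PAGE = 512
GRID = 16
CELL = PAGE // GRID   # 32-pixel blocks

def overlap(box, cell):
    """Area of intersection of two (x0, y0, x1, y1) rectangles."""
    w = min(box[2], cell[2]) - max(box[0], cell[0])
    h = min(box[3], cell[3]) - max(box[1], cell[1])
    return max(w, 0) * max(h, 0)

def tag_feature_points(tag_boxes):
    """Return {tag: [(cx, cy), ...]} of centers of blocks the tag dominates.

    tag_boxes maps a tag name to its rectangles on the normalized page.
    Tag levels in the hierarchy are ignored in this simplified sketch.
    """
    points = {tag: [] for tag in tag_boxes}
    for row in range(GRID):
        for col in range(GRID):
            cell = (col * CELL, row * CELL, (col + 1) * CELL, (row + 1) * CELL)
            areas = {tag: sum(overlap(b, cell) for b in boxes)
                     for tag, boxes in tag_boxes.items()}
            best = max(areas, key=areas.get)
            if areas[best] > 0:
                points[best].append((col * CELL + CELL / 2, row * CELL + CELL / 2))
    return points

# Hypothetical layout: a table across the top, an image in the lower-left corner.
layout = {"table": [(0, 0, 512, 64)], "img": [(0, 448, 64, 512)]}
points = tag_feature_points(layout)
```

The resulting per-tag point sets play the same role as the per-color feature points in the image case.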
67Web Page Structure
68Web Page Structure
69Web Page Structure
- Histogram
- 36 bins with two largest angles
- Tags independent of level
- Try approach where tag on lower level overrides
upper-level tag
70Web Page Structure
- Set of tags defined
- Initially, a large set of feature tags (52) is
defined to ensure a powerful set of independent
features for the discrimination of web pages
- A second set of tags (3) is defined based on
histograms created for initial set of tags so
that these tags will better differentiate web
pages
71Web Page Structure
- Experiment 1
- Categories defined are
- Detroit News
- Times of India
- Tribune India
- Esakal
- Amazon.com
- Buy.com
72Web Page Structure
- Cluster category based on closest page
73Web Page Structure
- Experiment 2
- Categories defined are
- Newspaper environment
- Detroit News
- Times of India
- Tribune India
- Esakal
- E-commerce environment
- Amazon.com
- Buy.com
74Web Page Structure
76A Cross-Modal Theory of Linked Document Semantics
- Environment
- Suppose one has a linked set of multimedia
documents
- Web
- Content-based hypermedia
- This provides a rich context for individual
chunks of information
- The structure of individual multimedia documents
- The link structure
77A Cross-Modal Theory of Linked Document Semantics
- Goal
- Derive document semantics based on user browsing
behavior
- The same document has multiple semantics
- Different people see different meanings in the
same document
- Over short browsing paths, an individual user's
wants and needs are uniform
- The pages visited over these short paths exhibit
semantics in congruence with these wants and
needs
78A Cross-Modal Theory of Linked Document Semantics
- Questions
- How can the semantics of a web page be derived
given a set of user browsing paths that end at
that page?
- How can we characterize the semantics of a user
browsing path?
- How can web page semantics help us in navigating
the web more efficiently?
- How can our approach actually be implemented in
the real web world?
79A Cross-Modal Theory of Linked Document Semantics
- Our approach
- We use actual browsing paths to find the latent
semantics of web pages
- Textual features
- Image features
- Structural features
- We hope to find general concepts comprising
various textual and image features which
frequently co-occur
80A Cross-Modal Theory of Linked Document Semantics
- We believe that a user's browsing path exhibits
semantic coherence
- While the user's entire path exhibits multiple
semantics, especially pages far from each other
on the path, neighboring pages, especially the
portions close to the links taken, are
semantically close to each other
81A Cross-Modal Theory of Linked Document Semantics
- We would like to characterize the contiguous
sub-paths of a user's browsing path that exhibit
similar semantics and detect the semantic break
points along the path where the semantics
appreciably change
- Collect these sub-paths into a multiset
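One possible way to detect break points, offered only as an illustrative assumption since the talk does not specify a method: represent each page by a keyword vector and cut the path wherever the cosine similarity between consecutive pages drops below a threshold. The page vectors, threshold, and function names are hypothetical, and each path is assumed to be non-empty.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def split_at_breakpoints(path, vectors, threshold=0.5):
    """Cut a browsing path wherever consecutive pages fall below the threshold."""
    sub_paths, current = [], [path[0]]
    for prev, page in zip(path, path[1:]):
        if cosine(vectors[prev], vectors[page]) < threshold:
            sub_paths.append(tuple(current))
            current = []
        current.append(page)
    sub_paths.append(tuple(current))
    return sub_paths

# Hypothetical page vectors over three keyword dimensions.
vectors = {
    "a": [1, 1, 0], "b": [1, 2, 0],      # same topic
    "c": [0, 0, 3], "d": [0, 1, 3],      # different topic
}
multiset = Counter()
for path in (["a", "b", "c", "d"], ["a", "b"], ["c", "d", "a"]):
    multiset.update(split_at_breakpoints(path, vectors))
```

A `Counter` keyed by sub-path tuples serves as the multiset, recording how often each coherent sub-path occurs.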
82A Cross-Modal Theory of Linked Document Semantics
- We categorize the semantics of each web page
based on a history of the semantically-coherent
browsing paths of all users which end at that
page
- A browsing path will be represented by a
high-dimensional vector
- The various positions of the vector correspond to
the presence of
- textual keywords
- image features (visual keywords)
- structural features (structural keywords)
83A Cross-Modal Theory of Linked Document Semantics
- From the complete set of web pages under
consideration, we extract a set of textual,
visual, and structural keywords
- For each multiset, M, of sub-paths that we are to
analyze, we form three matrices
- term-path matrix
- image-path matrix
- structure-path matrix
84A Cross-Modal Theory of Linked Document Semantics
- The (i,j)th element of each of these matrices is
determined by
- Strength of the presence of ith keyword along the
jth browsing path
- Determined by
- How many times this term occurs on the pages
along the path
- How much time the user spends examining these
pages
- How close each occurrence of the ith keyword is
to both the outgoing and incoming anchor
positions
- How many times this browsing path occurs in M
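The slide names the factors but not how they are combined. The sketch below is one hypothetical combination, not the formula used in the talk: each occurrence contributes its dwell time weighted by proximity to an anchor, and the sum is scaled by how often the path occurs in M. The single `anchor_distance` value stands in for the distances to both the incoming and outgoing anchors.

```python
def keyword_strength(occurrences, path_count):
    """Toy weight for one keyword on one browsing path.

    occurrences: list of (dwell_seconds, anchor_distance) pairs, one per
    occurrence of the keyword on pages along the path.
    path_count: how many times this path occurs in the multiset M.
    """
    total = 0.0
    for dwell_seconds, anchor_distance in occurrences:
        proximity = 1.0 / (1.0 + anchor_distance)   # closer to an anchor counts more
        total += dwell_seconds * proximity
    return path_count * total

# Two occurrences: one at the anchor (distance 0), one a unit away, on a path seen twice.
strength = keyword_strength([(10, 0), (20, 1)], path_count=2)
absent = keyword_strength([], path_count=3)
```

A keyword that never occurs on the path gets strength zero regardless of how often the path is taken.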
85A Cross-Modal Theory of Linked Document Semantics
- These matrices may be concatenated together in
various ways to produce an overall keyword-path
matrix
- Perform latent-semantic analysis to get concepts
- A page is then represented by a set of concept
classes
86Conclusions
- Researchers in CBR should now be concentrating on
extracting semantics from multimedia documents
- The web is a perfect testbed for studying
(semi-)automated techniques for multimedia
annotation due to contextual richness
- CBR + Semantic Web = The Multimedia Semantic Web
- Get Involved!!!