Title: Semantic Annotation for Web Content Adaptation
1Semantic Annotation for Web Content Adaptation
- Unit 14 of Spinning the Semantic Web
2Introduction
- Web content needs to be adapted for transparent
access from a variety of client agents (cellular
phones, PDAs)
- A large, full-color image may be reduced in size
and color depth, and unimportant portions of the
content removed, when accessed by certain devices
- Better presentation and faster delivery to client
devices
- Transcoding: transformation of information from
one form to another
- Web content transcoding
- Crucial for universal Web access under varying
conditions that may depend on client
capabilities, network connectivity, or user
preferences
3Composite Capability/Preference Profiles (CC/PP)
4Introduction
- CC/PP stands for Composite Capability/Preference
Profiles, and is a system for expressing device
capabilities and user preferences.
- The goal of the CC/PP framework is to specify how
client devices express their capabilities and
preferences (the user agent profile) to the
server that originates content (the origin
server). The origin server uses the user agent
profile to produce and deliver content
appropriate to the client device. In addition to
computer-based client devices, particular
attention is being paid to other kinds of devices
such as mobile phones.
5Devices
- The Web is accessed by various devices
- PC, notebook, PDA, mobile phone
- Each one has different capabilities
- Hardware: screen size/color, audio, bandwidth
- Software: MPEG, MP3, 3GPP, AMR
6CC/PP RDF
- The CC/PP framework starts with RDF and then
overlays a CC/PP-defined set of semantics that
describe profiles.
- A CC/PP profile is RDF-based: it is a collection
of information about the capabilities of the
hardware platform and system software, and about
the preferences of the user.
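To make the idea of an RDF-based profile more concrete, here is a minimal sketch that reads a tiny two-level profile (components, each holding attributes) with Python's standard library. Only the RDF namespace is real; the `ccpp` namespace URI is a placeholder rather than the official one, and the `ex:` vocabulary (`displayWidth`, `colorCapable`, `acceptedFormat`) and the profile URIs are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CCPP = "http://example.org/ccpp-placeholder#"  # placeholder, NOT the official CC/PP URI
EX = "http://example.org/device-vocab#"        # hypothetical attribute vocabulary

PROFILE_XML = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:ccpp="{CCPP}" xmlns:ex="{EX}">
  <rdf:Description rdf:about="http://example.org/profile#MyPhone">
    <ccpp:component>
      <rdf:Description rdf:about="http://example.org/profile#Hardware">
        <ex:displayWidth>176</ex:displayWidth>
        <ex:colorCapable>No</ex:colorCapable>
      </rdf:Description>
    </ccpp:component>
    <ccpp:component>
      <rdf:Description rdf:about="http://example.org/profile#Software">
        <ex:acceptedFormat>text/html</ex:acceptedFormat>
      </rdf:Description>
    </ccpp:component>
  </rdf:Description>
</rdf:RDF>
"""


def parse_profile(xml_text):
    """Return {component name: {attribute name: value}} for a two-level profile."""
    root = ET.fromstring(xml_text)
    profile = {}
    # Each component wraps one rdf:Description that carries the attributes.
    for comp in root.iterfind(f".//{{{CCPP}}}component/{{{RDF}}}Description"):
        about = comp.get(f"{{{RDF}}}about", "")
        name = about.rsplit("#", 1)[-1]
        # Strip the "{namespace}" prefix from each child tag to get the attribute name.
        profile[name] = {
            child.tag.split("}", 1)[1]: (child.text or "").strip() for child in comp
        }
    return profile


profile = parse_profile(PROFILE_XML)
```

A real deployment would use a full RDF parser and the published CC/PP vocabulary; the sketch only shows the component/attribute shape of a profile.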
7Advantages of CC/PP
- By sending only the required content, no time or
bandwidth is wasted sending unwanted content.
This can also lead to faster page loading times.
- A server can provide information to a more
diverse range of browsers. This is beneficial
not only in economic terms, but also in terms
of site accessibility.
- You give the users what they want, not what you
think they want.
- And many more
8Deployment (Client and Server Proxies)
9Deployment (Server Proxy only)
10Deployment (Client Proxy only)
11Deployment (Ideal Approach)
12CC/PP Query
13Content adaptation
14Two ways to use CC/PP profiles
- Selection
- If the web server has a set of pre-written web
pages, suitable for a number of different
devices, then the profile can be used to decide
which of these pre-written pages is most suitable
for the web browser.
- Transformation
- Web page content can be kept in a neutral format
(e.g. XML). This can then be transformed into an
appropriate format, using the profile to decide
what that format is.
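The two uses of a profile can be sketched side by side. This is an illustration under stated assumptions, not a reference implementation: the file names in `PREWRITTEN`, the profile keys `displayWidth` and `acceptedFormat`, and the neutral `<article>` format are all hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical pre-written variants, keyed by the minimum display width they need.
PREWRITTEN = {0: "news-wml.html", 240: "news-pda.html", 800: "news-desktop.html"}


def select_page(profile):
    """Selection: pick the richest pre-written page the device can display."""
    width = int(profile.get("displayWidth", 0))
    best = max(w for w in PREWRITTEN if w <= width)
    return PREWRITTEN[best]


# Hypothetical neutral XML format for page content.
NEUTRAL = "<article><title>Launch</title><body>New model ships today.</body></article>"


def transform_page(profile, neutral_xml):
    """Transformation: render neutral XML into the format the profile asks for."""
    doc = ET.fromstring(neutral_xml)
    title, body = doc.findtext("title"), doc.findtext("body")
    if profile.get("acceptedFormat") == "text/plain":
        return f"{title}\n{body}"
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body><p>{html.escape(body)}</p></body></html>"
    )
```

Selection trades storage and authoring effort for speed; transformation keeps a single source but needs a renderer for every target format.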
15CC/PP Implementations
- DICE
- Hewlett Packard
- DELI
- Intel
- Inria
- Keio University - Portal
- UMBC
- JIGSAW
- X-Smiles Browser
- And many others
16Demonstrations
- An example of RDF file and graph
- A Demo Page presenting the functionality of the
CC/PP protocol
18External Annotation Framework
19Annotation Schemes
- Inline annotation: embed annotations in a Web
document
- Created as extra attributes of document elements
- HTML browsers ignore unknown attributes in an HTML
document
- Ease of annotation maintenance, eliminating the
bookkeeping task of associating annotations with
their target documents
- Require annotators to have document ownership
- External annotation: separate annotations from the
original document
- Raise no issues related to document ownership
- Facilitate the sharing and reuse of annotations
across documents
- Avoid the mixing of contents and metadata
20Applications of Web Content Annotation
- Discovery
- Accurate searches of Web resources
- Qualification
- Descriptions of users' preferences regarding
privacy
- Adaptation: the focus of this unit
21Overview of An Annotation-Based Transcoding
22External Annotation Files
- Contain metadata that address a part of a
document to be annotated
- XPath and XPointer are used to associate
annotated portions of a document with annotating
descriptions
- A reference may point to a single element or a
range of elements
- If a target element has an ID attribute, the
attribute can be used for direct addressing without
the need for a long path expression
- Use RDF as the fundamental syntax of annotation
files
- User preferences and device capability: Composite
Capability/Preference Profiles (CC/PP)
- Document profiles (http://www.w3.org/TR/xhtml-prof-req/)
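The two addressing styles (a direct ID reference versus a longer path expression) can be illustrated with the limited XPath subset in Python's `xml.etree.ElementTree`. This is only a sketch: the sample page, the `ANNOTATIONS` list, and its `target`/`role` keys are hypothetical, ElementTree does not implement XPointer or ranges of elements, and a real annotation file would be RDF rather than a Python list.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed page with three stacked tables.
PAGE = """<html><body>
<table id="header"><tr><td>Menu</td></tr></table>
<table><tr><td>Search</td></tr></table>
<table><tr><td>News</td></tr></table>
</body></html>"""

# Hypothetical external annotations: a target expression plus a description.
ANNOTATIONS = [
    {"target": ".//*[@id='header']", "role": "header"},       # direct ID addressing
    {"target": "./body/table[3]", "role": "proper content"},  # positional path
]


def resolve(root, annotations):
    """Pair each annotated element with the description attached to it."""
    resolved = []
    for note in annotations:
        for element in root.findall(note["target"]):
            resolved.append((element, note["role"]))
    return resolved


root = ET.fromstring(PAGE)
resolved = resolve(root, ANNOTATIONS)
```

The ID-based target keeps working if tables are reordered, whereas the positional path silently points at a different element, which is one reason direct addressing is preferred when an ID is available.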
23Framework of External Annotation
24Association
- How to select an annotation file for a Web
document
- Implicitly: by means of a structural analysis of
the subject document
- Explicitly: by means of a <link> tag
- An annotation file can be associated with a
single document file, but the relation is not
limited to one-to-one
- Many annotation files for one Web document
- One annotation file for multiple Web documents
- Useful when it is necessary to annotate common
parts of Web documents, such as page headers,
company logo images, and sidebar menus
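The many-to-many relation between documents and annotation files could be kept in a simple rule table. The sketch below assumes URL patterns with shell-style wildcards; the patterns, the file names, and the function `annotations_for` are hypothetical and not part of the framework described here.

```python
from fnmatch import fnmatchcase

# Hypothetical association rules: (URL pattern, annotation file).
ASSOCIATIONS = [
    ("http://example.org/*", "common-header.rdf"),            # shared by every page
    ("http://example.org/news/*", "news-layout.rdf"),         # shared by news pages
    ("http://example.org/news/today.html", "today-extra.rdf"),
]


def annotations_for(url):
    """Return every annotation file whose pattern matches the document URL."""
    return [
        annotation
        for pattern, annotation in ASSOCIATIONS
        if fnmatchcase(url, pattern)
    ]
```

With these rules, one annotation file describing the common page header applies to the whole site, while a single news page picks up three annotation files at once.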
25Annotation-Based Transcoding System
26Overview
- Content can be adapted on a content server, a
proxy, or a client terminal
- An adaptation engine should not be forced to
reside in any particular location
- Use a proxy-based approach for content adaptation
27Transcoding Architecture
- Intermediaries
- Computational entities that reside along the Web
transaction path
- Facilitate an approach to making ordinary
information streams into smart streams that
enhance the quality of communication
- An intermediary processor or a transcoding proxy
can operate on a document to be delivered and
transform the contents with reference to
associated annotation files
28Authoring-Time Transcoding
- Requirements for authoring-time transcoding
- WYSIWYG editor
- Let the annotator navigate from an existing
annotation to a portion of an annotated document
designated by XPath / XPointer
- Verify the results of content adaptation through
a previewer
- Authoring-time transcoding is crucial when
annotations are employed for content adaptation,
rather than discovery or qualification of
contents
- Content adaptation often changes the structure of
original documents as the result of transcoding
29Authoring-Time Transcoding
30WYSIWYG Annotation Tool
31HTML Page Splitting for Small-Screen Devices
32Annotation Vocabulary
- An annotation vocabulary for HTML page splitting
needs to be specified to constrain the
possibilities for decomposition, combination, and
partial replacement of contents
- Annotation of Web Content for Transcoding
- Alternatives
- Provide alternative representations of a document
or any set of its elements
- Color image -> grayscale image
- A transcoding proxy selects the alternative that
best suits the capabilities of the requesting
client device
- Elements in the annotated document can then be
altered either by replacement or by on-demand
conversion
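How a proxy might choose among alternatives can be sketched as a filter followed by a ranking. Everything named here is an assumption made for illustration: the `Alternative` fields, the `device` keys `width` and `color`, and the rule of preferring color and then the larger representation are not taken from the annotation vocabulary itself.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    uri: str           # where the alternative representation lives
    needs_color: bool  # True if it only makes sense on a color display
    min_width: int     # narrowest display (in pixels) it is suitable for


def choose_alternative(alternatives, device):
    """Return the richest alternative the device can show, or None."""
    usable = [
        a for a in alternatives
        if a.min_width <= device["width"] and (device["color"] or not a.needs_color)
    ]
    if not usable:
        return None
    # Prefer color over grayscale, then the representation needing more width.
    return max(usable, key=lambda a: (a.needs_color, a.min_width))


LOGO = [
    Alternative("logo-color.png", needs_color=True, min_width=320),
    Alternative("logo-gray.png", needs_color=False, min_width=120),
    Alternative("logo-text.txt", needs_color=False, min_width=0),
]
```

This covers alteration by replacement; on-demand conversion would instead generate the grayscale image from the color one when no stored alternative fits.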
33Annotation Vocabulary (Cont.)
- Splitting Hints
- An HTML file that can be shown as a single page
on a normal desktop PC may be divided into
multiple pages on clients with smaller display
screens
- pcd:Group specifies a set of elements to be
considered as a logical unit and provides hints
for determining appropriate page break points
- Selection Criteria
- Help a transcoding module select, from
alternatives, the one that best suits the client
device
- pcd:role: value attribute (proper content, side
menu, decoration)
- pcd:importance: priority (low-importance content
may be ignored or displayed in a smaller font)
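A rough sketch of how grouping and importance hints could drive page splitting: groups are never broken apart, low-importance groups are dropped, and a new page starts whenever the next group would overflow the device. The dictionary keys, the numeric importance scale, the abstract `size` units, and the greedy packing strategy are assumptions for illustration only.

```python
def split_into_pages(groups, capacity, min_importance=0.0):
    """Pack annotated groups into pages without splitting any group.

    groups: list of dicts with 'id', 'role', 'importance', and 'size'
    capacity: how many size units fit on one page of the client device
    """
    pages, current, used = [], [], 0
    for group in groups:
        if group["importance"] < min_importance:
            continue  # low-importance content is ignored on this device
        if current and used + group["size"] > capacity:
            pages.append(current)  # page break falls between logical units
            current, used = [], 0
        current.append(group["id"])
        used += group["size"]
    if current:
        pages.append(current)
    return pages


GROUPS = [
    {"id": "header", "role": "side menu", "importance": 0.2, "size": 3},
    {"id": "search", "role": "decoration", "importance": -0.5, "size": 2},
    {"id": "news1", "role": "proper content", "importance": 1.0, "size": 6},
    {"id": "news2", "role": "proper content", "importance": 0.8, "size": 5},
]
```

On a device with room for eight units the page falls apart into three fragments, while a desktop-sized capacity keeps everything except the decoration on one page.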
34Annotation Descriptions
35Adaptation Engine
- Runs on an intermediary server called WBI
- Flow chart
- Upon receipt of a request from a client
browser, the original page is retrieved for the
first time from a content server.
- The editor component of the plugin tries to find
the locations of annotation files
- If one is specified in a link element in the HTML
header section, retrieve the designated
annotation file.
- Otherwise, look up a table that maps the URL
of the original page to that of an annotation.
If it is found, retrieve the designated
annotation file.
- Otherwise, the original page is returned as it is
and the session is terminated.
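The lookup order in this flow (link element first, then the mapping table, otherwise no annotation) can be sketched as follows. The `rel="annotation"` value, the contents of `ANNOTATION_TABLE`, and the function name are hypothetical; the actual plugin may identify the link element differently.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Remember the href of the first <link rel="annotation"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel="annotation" is an assumed convention, not a documented one.
        if tag == "link" and attributes.get("rel") == "annotation" and self.href is None:
            self.href = attributes.get("href")


# Hypothetical table mapping page URLs to annotation URLs.
ANNOTATION_TABLE = {"http://example.org/news.html": "http://example.org/news-annot.rdf"}


def find_annotation(page_url, html_text):
    """Return the annotation URL for a page, or None if there is none."""
    finder = LinkFinder()
    finder.feed(html_text)
    if finder.href:
        return finder.href                 # step 1: explicit link element
    return ANNOTATION_TABLE.get(page_url)  # step 2: table lookup, else None
```

When `find_annotation` returns `None`, the proxy would pass the original page through unchanged, matching the last branch of the flow.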
36Adaptation Engine (Cont.)
- Flow Chart (Cont.)
- The generator component of the plugin generates the
current page to be returned.
- Taking account of client capabilities included in
an HTTP request header, the generator extracts a
portion of a document object tree and returns a
sub-tree to the client
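A simplified sketch of the generator step: inspect a request header, and for a small-screen client return only one sub-tree of the document. It assumes a well-formed (XHTML-like) page, uses the `User-Agent` header with hypothetical marker substrings as a stand-in for real capability information, and takes the fragment to return as an explicit `fragment_id` argument.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed page with two addressable parts.
PAGE = (
    "<html><body>"
    "<table id='menu'><tr><td>Menu</td></tr></table>"
    "<table id='news'><tr><td>News</td></tr></table>"
    "</body></html>"
)

SMALL_SCREEN_HINTS = ("PDA", "Mobile")  # assumed markers, for illustration only


def generate(page_xml, headers, fragment_id):
    """Return the whole page for large clients, one sub-tree for small ones."""
    root = ET.fromstring(page_xml)
    agent = headers.get("User-Agent", "")
    subtree = root.find(f".//*[@id='{fragment_id}']")
    small = any(hint in agent for hint in SMALL_SCREEN_HINTS)
    if not small or subtree is None:
        return ET.tostring(root, encoding="unicode")
    # Wrap the extracted sub-tree in a minimal new document.
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    body.append(subtree)
    return ET.tostring(html, encoding="unicode")
```

A fuller generator would pick the fragment from the splitting annotations and add navigation links between fragments rather than rely on a caller-supplied ID.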
37Adaptation Engine System Flow
38Application to Real-Life HTML Pages
- The Web page used as an example is a news page
from a corporate Web site
- The news page consists of three tables stacked
from top to bottom.
- The top and middle tables correspond respectively
to a header menu and a search form.
- The bottom table is used for layout.
39Layout of A Real-Life News Page
40Annotations for Splitting the News Page
41Annotation for fragmentation of an actual news
page
42Screen copy of a small display preview on an
authoring tool
43Comparison of display contents on a small-screen
device
44Splitting Result
- Page splitting not only reduces the content
to be delivered, but also places the primary
content near the top of the fragmented page, which
is provided with navigational features
- Placing navigational features (menu bars etc.)
near the top of pages
- Placing key information at the top of pages
- Reducing the amount of information on the page
- Page fragmentation based on semantic annotation
will be more appropriate than page transformation
done solely with syntactic information (removing
white spaces, shrinking or removing images)
- The lack of semantic rearrangement is one of the
critical limitations of the syntactic
transformation approach.
- The navigational features achieved by this
semantic annotation are noteworthy from the
perspective of Web content accessibility.
45Issues
- Consistency between an Original Document and Its
Annotation
- Necessary to provide a way of keeping them
synchronized
- Extensibility
- Custom-tailored transcoding module that runs
without any external meta-information.
- Using a general-purpose transformation engine,
such as XSLT, which employs externally provided
transformation rules
- Task-specific semantics
- Roles such as header, auxiliary, and layouter
supplement semantics that cannot be fully
prescribed in the definitions of Web documents
46Comparison of transcoding approaches in terms of
extensibility