Transcoding Web Content to VoiceXML

1
Transcoding Web Content to VoiceXML

Speech Technology and Research Group
Department of Electrical Engineering
University of Cape Town
Mduduzi Nxumalo

2
Outline

Advances in communication technology.
Why they are expensive to achieve.
Solutions: model-based design and transcoding.
Aim of the project.
Understanding VoiceXML and the Semantic Web.
Discussion of the proposed solution.

3
World of Communication

New technologies: access anywhere, anytime, on any device.
Choose a suitable device: desktop PCs, mobile phones, telephones.
Different modalities: graphical, e.g. websites.
Interaction goes beyond keyboard and mouse.
Speech technology: access services using voice.
Multimodal applications will allow end users to interact with technology in the ways most suitable to the situation.
Different contexts of use: portability, familiarity, use during a meeting, carrying your PC.
Networks anywhere: Wireless Application Protocol (WAP), ADSL.
Telecommunications networks: access to web services.
Language translation: any language, an emerging area, e.g. English to Zulu.

4
Expensive to achieve

A variety of devices with different rendering capabilities.
Most cell phones, unlike desktops, do not have a pointing device.
It is difficult to develop an application for multi-platform deployment without duplicating development effort.
Content must be rendered in a form appropriate to the accessing device, e.g. multidimensional tables are not suitable for small screens.
Nature of voice applications: users should not be bombarded with long menus or multidimensional tables.
Diversity in network connections: big images take long to load.
Differences in supported standards:
XHTML for the normal Web.
XHTML Mobile Profile for mobile devices, which replaced Wireless Markup Language (WML).
XHTML+Voice for multimodal interfaces, e.g. the Opera browser.
VoiceXML for telephone users.
In South Africa: 11 different spoken languages.
Platforms: 3 categories of devices (mobile, web and voice).
3 × 11 = 33 duplicates of the same information.
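
The count above is simply the number of (platform, language) pairs. A minimal sketch in Python, assuming the three platform categories named on the slide; the language codes are placeholders, since the slide only gives the number of languages:

```python
from itertools import product

# Platform categories named on the slide.
PLATFORMS = ["mobile", "web", "voice"]

# Hypothetical placeholder codes: the slide states only that there are 11 languages.
LANGUAGES = [f"lang{i}" for i in range(1, 12)]

# Without transcoding, each (platform, language) pair is authored by hand.
versions = list(product(PLATFORMS, LANGUAGES))

print(len(versions))  # 3 * 11 = 33
```

Every additional platform category adds another 11 hand-authored versions, which is the duplication the following slides try to avoid.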

5
Solutions: Model-Based Design

The focus of research has changed and old techniques are being revisited.
Model-based design: design interfaces for different platforms at the same time by providing only high-level descriptions and translating them to a specific platform later.
Interface description languages: XML-based specifications.
User Interface Markup Language (UIML): one markup to create user interfaces for any target language (Java, VoiceXML, HTML) and any device (cellphone, PDA or desktop PC).
Different people get involved:
A design specialist uses UIML to describe the interaction.
A language specialist for each platform transforms it to, e.g., XHTML, Wireless Markup Language (WAP 1.0) or VoiceXML.

6
Solutions: Transcoding

Transcoding is a method for translating one type of code (e.g. HTML) into a different type (e.g. VoiceXML).
Wide applications: used as an alternative where the original design does not cater for specific needs.
People living with disabilities: screen readers.
Transcoding proxies: remove junk, trim or remove images.
Older people.
Mobile devices: most websites are not designed for small devices.
Academic institutions, government, news.
Transcoding Web content into forms suitable for small devices:
Composite Capability/Preference Profiles (CC/PP): tailor the interface to a specific cellphone.
Convenient navigation: generate a navigation map that helps users reach different parts easily.
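
As an illustration of what a transcoding proxy does when it removes "junk" and images, here is a minimal sketch using only the Python standard library. The class and function names are hypothetical, and a real proxy would also consult the device profile before deciding what to drop:

```python
from html import escape
from html.parser import HTMLParser


class SlimmingParser(HTMLParser):
    """Re-emit HTML while dropping images and script/style blocks.

    Comments and doctype declarations are also dropped, because their
    handlers are left at the no-op defaults.
    """

    SKIP_BLOCKS = {"script", "style"}  # dropped together with their content
    DROP_TAGS = {"img"}                # dropped outright

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_BLOCKS:
            self.skip_depth += 1
        elif not self.skip_depth and tag not in self.DROP_TAGS:
            parts = [tag]
            for name, value in attrs:
                parts.append(name if value is None else f'{name}="{escape(value)}"')
            self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in self.SKIP_BLOCKS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag not in self.DROP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Character references were decoded by the parser, so re-escape.
            self.out.append(escape(data, quote=False))


def slim(html_text):
    """Return html_text without images, scripts and style blocks."""
    parser = SlimmingParser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


PAGE = '<div><img src="big.png"><p>News &amp; events</p><script>var x = 1;</script></div>'
print(slim(PAGE))
```

The same filtering step is a natural place to apply device-specific rules, for example shrinking images instead of removing them.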

7
Transcoding for Voice

It is not only a matter of mapping HTML to VoiceXML.
Web design is optimized for graphical presentation: different colors, formatting.
This optimization arranges content into groups, e.g. a navigation menu at the top, advertisements on the left and right, and main content in the middle.
However, these groups of information cannot easily be conveyed to users who rely on alternative access.
Telephone users rely on what the system reads to them.
Navigation: users cannot go straight to the information they want as if they were surfing the Web.
Rendering complex HTML tables and forms in a non-graphical manner is difficult, as are speech input and output.
Improving voice browsing navigation by letting users access important information first.
Inserting text which helps the user to identify different sections of the Web page and different pages of the website.
Relating telephone browsing to Web browsing, where users can use forward and back buttons to move around, mimicking the Web.
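
The two navigation ideas above (read the important content first, and insert text that announces each section) can be sketched as follows. This is an illustration, not the tool described in this presentation: it assumes well-formed XHTML input whose groups are marked with the hypothetical ids `main` and `nav`, and it emits a small VoiceXML 2.0 document.

```python
import xml.etree.ElementTree as ET

# Hypothetical input page; the ids "nav" and "main" are assumptions.
XHTML_PAGE = """<html><body>
<div id="nav"><a href="news.html">News</a><a href="contact.html">Contact</a></div>
<div id="main"><p>Registration opens on Monday.</p></div>
</body></html>"""


def transcode(xhtml_text):
    """Return a VoiceXML string that reads main content before navigation."""
    page = ET.fromstring(xhtml_text)
    main = page.find(".//div[@id='main']")
    nav = page.find(".//div[@id='nav']")

    vxml = ET.Element(
        "vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"}
    )

    # The first dialog in document order is entered first, so the main
    # content goes in front even though the menu comes first on the page.
    form = ET.SubElement(vxml, "form", {"id": "main"})
    block = ET.SubElement(form, "block")
    body_text = " ".join("".join(main.itertext()).split())
    # "Main content." is the inserted announcement text.
    ET.SubElement(block, "prompt").text = "Main content. " + body_text
    ET.SubElement(block, "goto", {"next": "#nav"})

    menu = ET.SubElement(vxml, "menu", {"id": "nav"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = "Navigation menu. Say one of: "
    ET.SubElement(prompt, "enumerate")
    for link in nav.findall("a"):
        # The href is copied as-is; a real transcoder would rewrite it to
        # point at the transcoded version of the target page.
        choice = ET.SubElement(menu, "choice", {"next": link.get("href")})
        choice.text = link.text

    return ET.tostring(vxml, encoding="unicode")


print(transcode(XHTML_PAGE))
```

Real web pages are rarely well-formed XHTML, so a practical transcoder would need a tolerant HTML parser and annotations to identify the groups instead of fixed ids.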

8
Aim

No downloadable tool is available yet.
Proprietary tool: IBM WebSphere Transcoding Publisher.
A lot can be done: everyone picks a small component, especially in improving navigation techniques.
The transcoding process has been guided by annotations, but we have not looked at how these annotations are created.
A framework which can be used to understand how different versions of content suitable for the telephone can be created, maintained and discovered during the transcoding process.
It is more of a server-side solution.

9
VoiceXML Applications

Understanding VoiceXML

10
Semantic Web

Web annotation: providing not only human-readable remarks but also machine-understandable descriptions.
Applications such as discovery, qualification and adaptation of Web documents; maintaining usability.
Internal or external:
Internal annotations: embedded within a markup language.
External annotations: internal annotation is bad design practice; separate content and metadata.
External document: uses XPath or XPointer to point to a specific element of an XML document.
Maintenance: keeping annotations consistent with content.
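
A minimal sketch of external annotations, assuming a hypothetical annotation vocabulary (`role`, `language`) and using the limited XPath subset supported by Python's `xml.etree.ElementTree`. It also shows the maintenance problem: an annotation whose path no longer matches anything in the content is reported as dangling.

```python
import xml.etree.ElementTree as ET

CONTENT = """<article>
  <title>Exam timetable</title>
  <body>Exams start in June.</body>
</article>"""

# External annotation set, stored apart from the content.
# Keys are XPath expressions; the property names are hypothetical.
ANNOTATIONS = {
    "./title": {"role": "heading", "language": "en"},
    "./body": {"role": "main-content", "language": "en"},
    "./summary": {"role": "summary", "language": "en"},  # stale: no such element
}


def resolve(content_xml, annotations):
    """Match each annotation to content elements; collect paths that match nothing."""
    root = ET.fromstring(content_xml)
    resolved, dangling = [], []
    for path, props in annotations.items():
        matches = root.findall(path)
        if not matches:
            dangling.append(path)
        for element in matches:
            resolved.append((element.tag, props))
    return resolved, dangling


resolved, dangling = resolve(CONTENT, ANNOTATIONS)
print(resolved)
print(dangling)
```

Running such a check whenever content changes is one simple way to keep external annotations consistent with the documents they describe.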

11
Role of Semantics

In transcoding to voice, semantics need to be attached to the HTML.
Language: a web browser does not need to know anything about the content, but for speech you need to know which of the TTS engines (11) and automatic speech recognizers (11) to use.
Relationships between resources: discovering already existing audio alternatives rather than synthesizing.
Estimating the quality of a resource based on how it was created, e.g. low quality if the original text was written in English, translated to Zulu using language translation tools and then synthesized to speech. This knowledge can be used to choose the best possible version of content to ensure quality of service.
Facilitating the creation of resources: which versions of content do we need to create?
Maintenance: remove a version if a better-quality one exists.
Other roles of semantics:
Searching: it is difficult to search an audio resource; if it is not annotated you might need to use ASR, which requires some prior knowledge about the resource.
Role allocation: people are allocated based on the languages they understand.
Business processes: e.g. how much work was done by employee X, in order to work out salaries.
12
Solution Overview

Resource creation: use the traditional way; create each resource and provide annotations about it, and annotate XML documents as well.
Transcoding process: separate the annotations which guide the transcoding process into two groups.
Source XML annotators: the first group is the semantic annotations about content and its relationship to other resources on the server; it is attached to the tags which define the structure of content in its XML form.
We are trying to come up with a way of automatically relating semantics in source XML documents to HTML.
Interface specialists: the second group gives annotations which guide the adaptation of the web context to the telephone context; it is attached to HTML elements.

13
Resources and Semantics

Different versions: relationships between them need to be maintained.
Before creating HTML:
XML (Extensible Markup Language): agree on the schema.
Collaboration: create content, annotations and interface.

14
Proposed Solution

[Figure: architecture diagram with the components Source XML, XSLT, HTML, Annotator, Transcoder and VoiceXML, and the annotation sets Annotations A and Annotations B.]

15
Extracting Annotations from XSLT

We are coming up with a way of automatically relating semantics on XML documents to the contents of HTML elements.
Semantics were attached to XML elements because they exist independently of the HTML interface.
Benefits?
It is time-consuming to create annotations.
There is evidence that people are reluctant to create them because the author of annotations is usually not the one who benefits from them.
Re-use of annotations about content: content and annotations can be created before the interface is created. These annotations can be used during the transcoding process without any human intervention, which is the promise of the Semantic Web.
Re-use of annotations about HTML: annotations which adapt the web to the telephone context can be re-used when the annotations about the content being disseminated change but the interface structure does not, e.g. interfaces with the same HTML structure but with content written in different languages and variants will use the same annotations to adapt the interface.
Continuous creation of resources: new versions of content can be created even after the content has been transformed to HTML and converted to VoiceXML, so more knowledge about content can still be discovered.

16
How?

Analyze the XSLT document which transformed the source XML document.
So far we have managed to rediscover these annotations by modifying the transformation rules in each template.
We are exploring the possibility of integrating the annotation tool into an XSLT processor.
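
To make the idea concrete, here is a simplified sketch in which each transformation rule also carries the source element's annotation over to the HTML it produces. It is not the implementation described here: a plain Python dictionary stands in for the XSLT templates, and the annotation vocabulary and the `data-role` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = """<notice>
  <headline>Library hours</headline>
  <text>Open until 22:00.</text>
</notice>"""

# Semantic annotations attached to source XML elements (hypothetical vocabulary).
SOURCE_ANNOTATIONS = {"headline": "title", "text": "main-content"}

# Stand-in for XSLT templates: source tag -> HTML tag it is rendered as.
TEMPLATES = {"headline": "h1", "text": "p"}


def transform_with_annotations(source_xml):
    """Apply the templates and copy each source annotation onto its HTML output."""
    source = ET.fromstring(source_xml)
    body = ET.Element("body")
    for child in source:
        html_tag = TEMPLATES.get(child.tag)
        if html_tag is None:
            continue  # no template matches, so nothing is emitted
        out = ET.SubElement(body, html_tag)
        out.text = child.text
        role = SOURCE_ANNOTATIONS.get(child.tag)
        if role is not None:
            # The annotation follows the content into the HTML interface.
            out.set("data-role", role)
    return body


body = transform_with_annotations(SOURCE)
print(ET.tostring(body, encoding="unicode"))
```

Because the annotation is copied at the moment a rule fires, the HTML element and the source semantics stay linked without anyone annotating the HTML by hand.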

17
Web to Voice Annotations

Complicated HTML: lots of tags which define visual orientation.
Helper tool: a visual interface, since the aim is to simplify the adaptation process.
Existing tools are not re-usable: we cannot define our own ontology concepts.
Firefox: customizable, able to add your own functionality.
XML User Interface Language (XUL): used for Firefox extensions.
XUL dynamic overlays: developers modify the behavior of the window interface without changing the original user interface code.
Scripting: use JavaScript.

18
Customizing Firefox

Interface used to understand the structure of the
HTML document being annotated.
