Transcoding Web Content to VoiceXML

Provided by: hantho

1
Transcoding Web Content to VoiceXML
  • Speech Technology and Research Group
  • Department of Electrical Engineering
  • University of Cape Town
  • Mduduzi Nxumalo

2
Outline
  • Advances in communication technology.
  • Expensive to achieve.
  • Solutions: model-based design and transcoding.
  • Aim of the project.
  • Understanding VoiceXML and the Semantic Web.
  • Discussion of the proposed solution.

3
World of Communication
  • New technologies: access anywhere, anytime, from
    any device.
  • Choose a suitable device: desktop PCs, mobile
    phones, telephones.
  • Different modalities: graphical, e.g. websites.
  • Interaction goes beyond keyboard and mouse.
  • Speech technology: access services using voice.
  • Multimodal applications will allow end users to
    interact with technology in ways that are most
    suitable to the situation.
  • Different contexts of use: portability,
    familiarity, use during a meeting, carrying your
    PC.
  • Networks anywhere: Wireless Application
    Protocol, ADSL.
  • Telecommunications networks: access web services.
  • Language translation: any language, emerging,
    e.g. English to Zulu.

4
Expensive to achieve
  • Variety of devices with different rendering
    capabilities.
  • Most cell phones do not have a pointing device,
    unlike desktops.
  • Difficult to develop an application for
    multi-platform deployment without duplicating
    development effort.
  • Render an appropriate form of content depending
    on the accessing device, e.g. multidimensional
    tables are not suitable for small screens.
  • Nature of voice applications: users should not be
    bombarded with long menus or multidimensional
    tables.
  • Diversity in network connections: big images take
    long to load.
  • Differences in supported standards:
  • XHTML for the normal Web.
  • XHTML Mobile Profile for mobile devices; it
    replaced Wireless Markup Language.
  • XHTML+Voice: multimodal interface, e.g. the Opera
    browser.
  • VoiceXML: telephone users.
  • In SA: 11 different spoken languages.
  • Platforms: 3 categories of devices (mobile, web
    and voice).
  • 3 × 11 = 33 duplicates of the same information.

5
Solutions: Model-Based
  • The focus of research has changed and old
    techniques are being revisited.
  • Model-based design: design interfaces for
    different platforms at the same time; provide only
    high-level descriptions and translate them to a
    specific platform later.
  • Interface description languages: XML-based
    specifications.
  • User Interface Markup Language (UIML): one markup
    to create user interfaces for any target language
    (Java, VoiceXML, HTML) and any device (cellphone,
    PDA or desktop PC).
  • Different people get involved:
  • Design specialists use UIML to describe the
    interaction.
  • Language specialists for each platform transform
    it to e.g. XHTML, Wireless Markup Language in
    WAP 1.0, or VoiceXML.

6
Solutions: Transcoding
  • Transcoding is a method for translating one type
    of code (e.g. HTML) into a different type (e.g.
    VoiceXML).
  • Wide applications: used as an alternative where
    the design does not cater for specific needs.
  • People living with disabilities: screen readers.
  • Transcoding proxies: remove junk, trim or remove
    images.
  • Older people.
  • Mobile devices: most websites are not designed
    for small devices.
  • Academic institutions, government, news.
  • Transcoding Web content into forms suitable for
    small devices.
  • Composite Capability/Preference Profiles (CC/PP):
    tailor the interface to a specific cellphone.
  • Convenient navigation: generate a navigation map
    to help users reach different parts easily.
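
A minimal sketch of the HTML-to-VoiceXML idea above, assuming the only thing we transcode is a page's links, which become the choices of a VoiceXML menu. The class and function names (LinkCollector, html_links_to_vxml_menu) and the default prompt are hypothetical; this is an illustration, not the tool described in this presentation.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def html_links_to_vxml_menu(html, prompt="Please choose one of the following."):
    """Return a VoiceXML 2.0 document whose menu mirrors the page's links."""
    collector = LinkCollector()
    collector.feed(html)
    vxml = ET.Element(
        "vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"}
    )
    menu = ET.SubElement(vxml, "menu")
    ET.SubElement(menu, "prompt").text = prompt
    for href, text in collector.links:
        # Each link becomes one spoken choice; "next" is the target document.
        ET.SubElement(menu, "choice", {"next": href}).text = text
    return ET.tostring(vxml, encoding="unicode")


page = '<ul><li><a href="news.vxml">News</a></li><li><a href="sport.vxml">Sport</a></li></ul>'
print(html_links_to_vxml_menu(page))
```

Images, layout tables and styling are simply ignored here, which is the "remove junk" step in its crudest form; a real transcoder would need rules or annotations to decide what to keep.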

7
Transcoding for Voice
  • Not only a matter of mapping HTML to VoiceXML.
  • Web design is optimized for graphical
    presentation:
  • different colors, formatting.
  • This optimization arranges content into groups,
    e.g. navigation menu at the top, advertisements
    on the left and right, and main content in the
    middle.
  • However, these groups of information cannot be
    easily conveyed to users who use alternative
    access.
  • Telephone users rely on what the system reads to
    them.
  • Navigation: they cannot go straight to the
    information they want as if they were surfing the
    Web.
  • Rendering complex HTML tables and forms in a
    non-graphical manner is hard, and there are
    difficulties in inputting and outputting speech.
  • Improving voice browsing navigation by letting
    users access important information first.
  • Inserting text which helps the user to perceive
    the different sections of the Web page and the
    different pages of the website.
  • Relating telephone browsing to Web browsing,
    where users are able to use forward and back
    commands to move around, mimicking the Web.
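
The two navigation ideas above (important information first, and inserted text that announces each section) can be sketched as follows. The role names, their priorities, and the wording of the announcements are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical roles and reading priorities: lower numbers are read first.
ROLE_PRIORITY = {"main": 0, "navigation": 1, "advertisement": 2}


@dataclass
class Section:
    role: str   # what the visual grouping meant, e.g. "main"
    text: str   # the text to be spoken


def voice_reading_order(sections, skip_roles=("advertisement",)):
    """Drop unwanted groups, put important ones first, announce each one."""
    kept = [s for s in sections if s.role not in skip_roles]
    # sorted() is stable, so sections with equal priority keep page order.
    ordered = sorted(
        kept, key=lambda s: ROLE_PRIORITY.get(s.role, len(ROLE_PRIORITY))
    )
    total = len(ordered)
    return [
        f"Section {i} of {total}, {s.role}. {s.text}"
        for i, s in enumerate(ordered, start=1)
    ]


page = [
    Section("navigation", "Home, News, Contact."),
    Section("advertisement", "Buy now."),
    Section("main", "Exam results are out."),
]
for line in voice_reading_order(page):
    print(line)
```

The spoken "Section 1 of 2" prefix plays the part that visual position plays on screen: it tells a telephone user where they are and how much is left.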

8
Aim
  • There is no downloadable tool yet.
  • Proprietary tool: IBM WebSphere Transcoding
    Publisher.
  • A lot can be done: everyone picks a small
    component, especially in improving navigation
    techniques.
  • The transcoding process has been guided by
    annotations, but we have not looked at how these
    annotations are created.
  • A framework which can be used to understand how
    different versions of content suitable for the
    telephone can be created, maintained and
    discovered during the transcoding process.
  • It is more of a server-side solution.

9
VoiceXML Applications
  • Understanding VoiceXML

10
Semantic Web
  • Web annotation: providing not only human-readable
    remarks but also machine-understandable
    descriptions.
  • Applications such as discovery, qualification,
    and adaptation of Web documents; maintain
    usability.
  • Internal or external:
  • Internal annotations: embedded within a markup
    language.
  • External annotations: internal annotations are bad
    design practice; separate content and metadata.
  • External document: XPath or XPointer to point to
    a specific element of an XML document.
  • Maintenance: keeping consistency with content.
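
A minimal sketch of external annotations, assuming a made-up annotation file format in which each entry carries a "target" XPath and some properties. It uses the limited XPath subset of Python's xml.etree.ElementTree; the element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

CONTENT = """<page>
  <menu id="nav"><item>Home</item><item>News</item></menu>
  <article id="story"><title>Exam results</title><body>Results are out.</body></article>
</page>"""

# Hypothetical external annotation document, kept apart from the content.
ANNOTATIONS = """<annotations>
  <annotation target=".//menu[@id='nav']" role="navigation"/>
  <annotation target=".//article[@id='story']/title" role="heading" lang="en"/>
  <annotation target=".//footer" role="boilerplate"/>
</annotations>"""


def resolve_annotations(content_xml, annotations_xml):
    """Attach each external annotation to the element(s) its XPath selects.

    Returns (resolved, stale): resolved is a list of (tag, properties) pairs;
    stale lists XPaths that no longer match anything in the content.
    """
    content = ET.fromstring(content_xml)
    resolved, stale = [], []
    for note in ET.fromstring(annotations_xml).iter("annotation"):
        props = dict(note.attrib)
        target = props.pop("target")
        matches = content.findall(target)
        if not matches:
            stale.append(target)
        for element in matches:
            resolved.append((element.tag, props))
    return resolved, stale


resolved, stale = resolve_annotations(CONTENT, ANNOTATIONS)
print(resolved)
print("stale targets:", stale)
```

The stale list illustrates the maintenance problem: when the content changes, an external pointer can silently stop matching, so consistency has to be checked.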

11
Role of Semantics
  • In transcoding to voice, semantics need to be
    attached to the HTML.
  • Language: the web browser does not need to know
    anything about the content, but for speech you
    need to know which TTS (one of 11) and which
    automatic speech recognizer (one of 11) to use.
  • Relationship between resources: discovering
    already existing audio alternatives rather than
    synthesizing.
  • Estimate the quality of a resource based on how it
    was created, e.g. low quality: the original text
    was written in English, translated to Zulu using
    language translation tools and then synthesized
    to speech. This knowledge can be used to choose
    the best possible version of content to ensure
    quality of service.
  • Facilitate creation of resources: what versions
    of content do we need to create?
  • Maintenance: remove a version if there is a
    better-quality one.
  • Other roles:
  • Searching: an audio resource is difficult to
    search; if it is not annotated you might need to
    use ASR, which requires you to have some
    knowledge about the resource.
  • Role allocation: people are allocated based on
    the languages they understand.
  • Business processes: e.g. how much work was done by
    employee X, in order to work out salaries.
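
The quality-estimation bullet above can be illustrated with a small sketch in which each version of a resource records how it was produced, and the version with the fewest lossy steps is preferred. The step names and penalty values are invented for illustration; the presentation does not give a scoring scheme.

```python
from dataclasses import dataclass

# Hypothetical penalties: automated steps are assumed to cost more quality.
STEP_PENALTY = {
    "human_authoring": 0.0,
    "human_translation": 0.1,
    "human_recording": 0.0,
    "machine_translation": 0.4,
    "speech_synthesis": 0.3,
}


@dataclass
class ResourceVersion:
    uri: str
    language: str
    steps: tuple  # provenance: how this version was created, in order


def estimated_quality(version):
    """Start from 1.0 and subtract a penalty per step (unknown steps: 0.5)."""
    score = 1.0
    for step in version.steps:
        score -= STEP_PENALTY.get(step, 0.5)
    return max(score, 0.0)


def best_version(versions, language):
    """Pick the highest-scoring version in the caller's language, if any."""
    candidates = [v for v in versions if v.language == language]
    return max(candidates, key=estimated_quality, default=None)


versions = [
    ResourceVersion("news_zu_tts.wav", "zu",
                    ("human_authoring", "machine_translation", "speech_synthesis")),
    ResourceVersion("news_zu_recorded.wav", "zu",
                    ("human_authoring", "human_translation", "human_recording")),
    ResourceVersion("news_en.wav", "en",
                    ("human_authoring", "human_recording")),
]
print(best_version(versions, "zu").uri)
```

The same provenance records also support the maintenance bullet: once a better-scoring version exists for a language, the weaker one becomes a candidate for removal.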

12
Solution Overview
  • Resource creation: use the traditional way;
    create each resource and provide annotations
    about it; annotate XML documents as well.
  • Transcoding process: separate the annotations
    which guide the transcoding process into two
    groups.
  • Source XML annotators: the first group is the
    semantic annotations about content and its
    relationship to other resources on the server. It
    is attached to the tags which define the structure
    of content in its XML form.
  • We are trying to come up with a way of
    automatically relating semantics in source XML
    documents with HTML.
  • Interface specialists: the second group gives
    annotations which guide the adaptation of the web
    context to the telephone context. It is attached
    to HTML elements.

13
Resources and Semantics
  • Different versions: need to maintain
    relationships.
  • Before creating HTML.
  • XML (Extensible Markup Language): agree on the
    schema.
  • Collaboration: create content, annotations and
    interface.

14
Proposed Solution
  • [Diagram. Labels: Annotations, Source XML,
    Annotations B, XSLT, HTML, Annotator, Transcoder,
    Annotations A, VoiceXML.]

15
Extracting Annotations from XSLT
  • We are coming up with a way of automatically
    relating semantics on XML documents with the
    contents of HTML elements.
  • Semantics are attached to XML elements because
    they exist independently of the HTML interface.
  • Benefits?
  • It is time-consuming to create annotations.
  • There is evidence that people are reluctant to
    create them because the author of annotations is
    usually not the one who benefits from them.
  • Re-use of annotations about content: content and
    annotations can be created before the interface
    is created. These annotations can be used during
    the transcoding process without any human
    intervention. This is the promise of the Semantic
    Web.
  • Re-use of annotations about HTML: annotations
    which adapt the web to the telephone context can
    be re-used when the annotations about the content
    being disseminated change but the interface
    structure does not, e.g. interfaces with the same
    HTML structure but with content written in
    different languages and variants will use the
    same annotations to adapt the interface.
  • Continuous creation of resources: new versions of
    content can be created even after the interface
    has been created. Since more resources can be
    created even after content has been transformed
    to HTML and converted to VoiceXML, more knowledge
    about content can still be discovered.

16
How?
  • Analyze the XSLT document which transformed the
    source XML document.
  • So far we have managed to rediscover these
    annotations by interfering with the transformation
    rules in each template.
  • Exploring the possibility of integrating the
    annotation tool into an XSLT processor.
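
One possible reading of "analyze the XSLT document" can be sketched as below: for each template, record which source XML pattern it matches and which literal HTML elements it emits, so that annotations on the source element can be related to those HTML elements. This is an assumption-laden illustration, not the authors' method; the stylesheet and the function name are hypothetical, and only literal result elements are considered.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical stylesheet that turns source XML into HTML.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="headline">
    <h1><xsl:value-of select="."/></h1>
  </xsl:template>
  <xsl:template match="story/body">
    <div><p><xsl:apply-templates/></p></div>
  </xsl:template>
</xsl:stylesheet>"""


def html_elements_per_template(stylesheet_xml):
    """Map each template's match pattern to the HTML tags it writes out."""
    root = ET.fromstring(stylesheet_xml)
    xsl_prefix = f"{{{XSL_NS}}}"
    mapping = {}
    for template in root.iter(f"{xsl_prefix}template"):
        match = template.get("match")
        if match is None:
            continue  # named templates have no match pattern
        # Anything inside the template that is not an xsl: instruction is a
        # literal result element, i.e. HTML that ends up in the output.
        mapping[match] = [
            el.tag for el in template.iter() if not el.tag.startswith(xsl_prefix)
        ]
    return mapping


print(html_elements_per_template(STYLESHEET))
```

With such a mapping, an annotation attached to the source element "headline" could be carried over to the h1 element in the generated HTML without a person re-annotating the page.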

17
Web to Voice Annotations
  • Complicated HTML: lots of tags which define
    visual orientation.
  • Helper tool: visual interface, since the aim is to
    simplify the adaptation process.
  • Existing tools are not re-usable: we cannot
    define our own ontology concepts.
  • Firefox: customizable, able to add your own
    functionality.
  • XML User Interface Language (XUL): Firefox
    extensions.
  • XUL dynamic overlays: developers modify the
    behavior of the window interface without
    changing the original user interface code.
  • Scripting: use JavaScript.

18
Customizing Firefox
  • Interface used to understand the structure of the
    HTML document being annotated.

20
Thank You
  • Questions and comments are welcome.