Title: VoiceXML
1. VoiceXML
- Team 1
- Matt Ganis, Jonathan Hill, Henry Wong
- Anne I. Mannette-Wright
2. Agenda
- History of Voice Applications and VoiceXML
- Related Voice Type Languages
- Advantages of VoiceXML
- Architecture of VoiceXML
- Paper 1
- Paper 2
- Paper 3
- Demonstration
- VoiceXML 2.0
- Differences between VoiceXML 1.0 and 2.0
- The Future: VoiceXML 2.1
3. History of Voice Applications
- Voice technologies emerged in the 1990s
  - Automatic Speech Recognition (ASR)
    - Small-vocabulary speech recognition problems were solved
  - Text-to-Speech (TTS) systems
    - Can generate spoken responses on the fly
  - Interactive Voice Response (IVR) applications
4. History of Voice Applications
- IVRs became programmable, but programmable IVRs
  - were difficult to program (call scripting was often vendor-specific), so each vendor had to reinvent the wheel
  - did not allow an application to be moved easily from one IVR to another, because of the proprietary nature of IVRs
5. History of VoiceXML
- 1995: AT&T started work on the Phone Markup Language (PML)
- Oct. 1998: Motorola developed VoxML (Voice Markup Language)
- Feb. 1999: IBM developed SpeechML technology
- Mar. 1999: The VoiceXML Forum was formed by IBM, AT&T, Lucent, and Motorola
  - Its mission was to design a standard dialog design language that developers could use to build conversational applications
- March 2000: The VoiceXML Forum released VoiceXML 1.0 to the general public
- May 2000: VoiceXML 1.0 was submitted to the W3C, which published it as a W3C Note
6. W3C Speech Interface Framework
[Figure: W3C Speech Interface Framework. From McGlashan, Dr. Scott, "VoiceXML 2.0 from the Inside," retrieved from www.voicexmlreview.org/Dec2001/features/inside.html]
7. Related Voice Type Languages
- Related to VoiceXML
  - Grammar XML (grXML)
    - Provides the speech grammars used by speech recognition engines
  - Speech Synthesis Markup Language (SSML)
    - The SSML specification is based upon the JSML (JSpeech Markup Language) and JSGF (JSpeech Grammar Format) specifications, which are owned by Sun
    - Became a W3C Recommendation in September 2004 and is currently a W3C standard at Version 1.0
    - Provides a standardized way of specifying how text is rendered as speech, including tags for pronunciation, tone, inflection, etc.
    - Often embedded in VoiceXML scripts to drive interactive telephony systems
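To make the SSML description concrete, the following Python sketch builds a small SSML 1.0 document with the standard library's xml.etree.ElementTree. It uses only core SSML elements (speak, s, emphasis, break, prosody); the function name build_ssml and the spoken text are illustrative assumptions, and how a given synthesizer actually renders emphasis or a slower rate is platform-dependent.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for SSML 1.0 and for the reserved xml: prefix (used by xml:lang).
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialize SSML elements in the default (unprefixed) namespace.
ET.register_namespace("", SSML_NS)


def q(tag: str) -> str:
    """Qualify an element name with the SSML namespace."""
    return f"{{{SSML_NS}}}{tag}"


def build_ssml(drink: str, minutes: int) -> str:
    """Build a short SSML document using emphasis, a pause, and a slower speaking rate."""
    speak = ET.Element(q("speak"), {"version": "1.0", XML_LANG: "en-US"})

    first = ET.SubElement(speak, q("s"))
    first.text = "Your "
    emphasis = ET.SubElement(first, q("emphasis"), {"level": "strong"})
    emphasis.text = drink
    emphasis.tail = " is on its way."

    ET.SubElement(speak, q("break"), {"time": "500ms"})  # half-second pause

    second = ET.SubElement(speak, q("s"))
    slow = ET.SubElement(second, q("prosody"), {"rate": "slow"})
    slow.text = f"It will be ready in {minutes} minutes."

    return ET.tostring(speak, encoding="unicode")
```

Calling build_ssml("coffee", 5) returns a speak document that a VoiceXML 2.0 prompt or an SSML-capable synthesizer could consume.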
8. Related Voice Type Languages
- Related to VoiceXML (continued)
  - Call Control XML (CCXML)
    - W3C standard markup language for controlling telephony and telephony equipment, currently at Version 1.0
    - Performs tasks such as setting up conference calls, transferring incoming calls, etc.
    - Works hand-in-hand with VoiceXML
9. Architecture of VoiceXML
[Figure: VoiceXML architecture. From http://www.w3.org/TR/voicexml/, Voice eXtensible Markup Language (VoiceXML) version 1.0]
10. Advantages of VoiceXML
- VoiceXML is a markup language that
  - minimizes client/server interactions by specifying multiple interactions per document
  - shields application authors from low-level and platform-specific details
  - separates user interaction code (in VoiceXML) from service logic (e.g., CGI scripts)
  - promotes service portability across implementation platforms; VoiceXML is a common language for content providers, tool providers, and platform providers
  - is easy to use for simple interactions, yet provides language features to support complex dialogs
11. Paper 1
- "VoiceXML for Web-based Distributed Conversational Applications," by Bruce Lucas
- Presents an introduction to VoiceXML
  - Comparison to HTML
  - Support for natural dialogue
12. Paper 1
- VoiceXML is an XML application, which results in the following benefits:
  - Allows the reuse and easy retooling of existing tools for creating, transforming, and parsing XML documents
  - Allows VoiceXML to make use of other complementary XML-based standards, for example the Java Speech Markup Language for speech synthesis
- A form is VoiceXML's basic dialogue unit
  - Contains a set of inputs (fields)
  - Specifies what to do with the fields after the data is collected
  - A field includes a prompt and a specification of what the user is allowed to say
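The form concept above can be sketched as a simple loop: pick the first field that has no value yet, play its prompt, and accept the caller's answer only if it matches that field's grammar. This is a heavily simplified Python sketch, loosely modeled on the select/collect/process cycle of VoiceXML's form interpretation algorithm; it ignores events, timeouts, blocks, and real speech recognition. The class and function names are hypothetical, the drink phrases come from the Paper 1 drink example, and the sandwich list is an invented stand-in for sandwiches.gram.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Field:
    name: str
    prompt: str
    grammar: List[str]  # the phrases the caller is allowed to say


@dataclass
class Form:
    fields: List[Field]
    on_filled: Callable[[Dict[str, str]], str]  # what to do once every field has a value


def run_form(form: Form, utterances: List[str]) -> Tuple[List[str], str]:
    """Drive a form with scripted caller input; return the transcript and the final action."""
    values: Dict[str, str] = {}
    transcript: List[str] = []
    heard = iter(utterances)
    while True:
        # Select: the first field whose value is still undefined.
        pending = next((f for f in form.fields if f.name not in values), None)
        if pending is None:
            return transcript, form.on_filled(values)
        # Collect: play the field's prompt and take the caller's next utterance.
        transcript.append(f"C: {pending.prompt}")
        said = next(heard, None)
        if said is None:
            raise RuntimeError("caller input ran out before the form was filled")
        said = said.strip().lower()
        transcript.append(f"H: {said}")
        # Process: fill the field only if the utterance matches its grammar.
        if said in pending.grammar:
            values[pending.name] = said
        else:
            transcript.append("C: Sorry, I didn't understand.")


order = Form(
    fields=[
        Field("drink", "What would you like to drink?",
              ["coffee", "tea", "orange juice", "milk", "nothing"]),
        Field("sandwich", "What sandwich would you like?",
              ["ham", "turkey", "cheese"]),  # hypothetical sandwiches.gram contents
    ],
    on_filled=lambda v: f"submit drink={v['drink']}&sandwich={v['sandwich']}",
)
transcript, action = run_form(order, ["Tea", "soup", "turkey"])
```

Here "soup" is rejected by the sandwich grammar, so the sandwich prompt is replayed before the form completes.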
13. Paper 1 - VoiceXML Code Example
    <?xml version="1.0"?>
    <vxml version="1.0">
      <menu>
        <prompt>Say one of: <enumerate/></prompt>
        <choice next="http://www.sports.example/sports.vxml">
          Sports scores
        </choice>
        <choice next="http://www.weather.example/weather.vxml">
          Weather information
        </choice>
        <choice next="#login">
          Log in
        </choice>
      </menu>
      <form id="login">
        <field name="phone_number" type="phone">
          <prompt>Please say your complete phone number</prompt>
        </field>
      </form>
    </vxml>
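Because a VoiceXML page is ordinary XML, generic XML tools can inspect it, as Paper 1 points out. As a sketch, Python's built-in ElementTree parser can list the menu choices of the example above and show which transitions stay inside the document (next="#login") and which fetch a new page. The helper names menu_choices and form_fields are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

MENU_DOC = """<?xml version="1.0"?>
<vxml version="1.0">
  <menu>
    <prompt>Say one of: <enumerate/></prompt>
    <choice next="http://www.sports.example/sports.vxml">Sports scores</choice>
    <choice next="http://www.weather.example/weather.vxml">Weather information</choice>
    <choice next="#login">Log in</choice>
  </menu>
  <form id="login">
    <field name="phone_number" type="phone">
      <prompt>Please say your complete phone number</prompt>
    </field>
  </form>
</vxml>"""


def menu_choices(doc: str) -> List[Tuple[str, str, bool]]:
    """List each <choice> as (spoken text, next target, target is inside this document)."""
    root = ET.fromstring(doc)
    return [
        (c.text.strip(), c.get("next", ""), c.get("next", "").startswith("#"))
        for c in root.iter("choice")
    ]


def form_fields(doc: str) -> Dict[str, List[str]]:
    """Map each form id to the names of the fields it collects."""
    root = ET.fromstring(doc)
    return {f.get("id"): [fld.get("name") for fld in f.iter("field")] for f in root.iter("form")}
```

The "Log in" choice resolves to the login form in the same document, while the other two choices point to external pages.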
14. Paper 1
- VoiceXML includes support for common field types, including number, digits, phone, date, and time, AND for user-specified fields using grammars
    <form>
      <field name="drink">
        <prompt>What would you like to drink?</prompt>
        <grammar>
          coffee | tea | orange juice | milk | nothing
        </grammar>
      </field>
      <field name="sandwich">
        <prompt>What sandwich would you like?</prompt>
        <grammar src="sandwiches.gram"/>
      </field>
      <block>
        <submit next="/servlet/order"/>
      </block>
    </form>
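In the form above, <submit> hands the collected values to server-side service logic. The sketch below imagines what a handler behind /servlet/order might do, written in Python rather than as a servlet: read the drink and sandwich values from the query string (<submit> uses HTTP GET unless method="post" is given) and reply with a new VoiceXML page. The function name, the confirmation wording, and ending the call with <disconnect/> are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs


def handle_order(query_string: str) -> str:
    """Hypothetical service logic behind /servlet/order.

    Reads the submitted field values and answers with a VoiceXML 1.0 page,
    matching the version used in the Paper 1 examples.
    """
    params = parse_qs(query_string)  # decodes '+' and %-escapes
    drink = params.get("drink", ["nothing"])[0]
    sandwich = params.get("sandwich", ["nothing"])[0]

    vxml = ET.Element("vxml", {"version": "1.0"})
    block = ET.SubElement(ET.SubElement(vxml, "form"), "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = f"You ordered {drink} to drink and {sandwich} to eat."
    ET.SubElement(block, "disconnect")  # end the call after confirming
    return '<?xml version="1.0"?>\n' + ET.tostring(vxml, encoding="unicode")
```

Keeping the wording of the dialog in VoiceXML and the order handling on the server is the separation of interaction code from service logic described in the Advantages slide.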
15. Paper 1: The Distributed Model
[Figure: The distributed model. From Lucas, Bruce, "VoiceXML for Web-Based Distributed Conversational Applications," Communications of the ACM, Vol. 43, No. 9, September 2000.]
- VoiceXML provides support for advanced features such as
  - Local validation and processing
  - Audio playback and recording
  - Support for context-specific and tapered help, and reusable subdialogues
16. Paper 1: VoiceXML Compared with HTML
- An HTML document is a single unit, specified by a URI and presented to the user all at once
- A VoiceXML document contains a number of dialogue units (menus or forms) that are presented sequentially
- HTML has no markup for identifying distinct dialogue units
- A VoiceXML document is structured to reflect the sequential nature of the voice medium
- An HTML document is like one single dialogue
- A VoiceXML document requires dialogue elements so that they can be presented one at a time
- VoiceXML has application logic for sequencing among dialogue units
17. Paper 1: Support for Natural Dialogue
- VoiceXML supports directed and mixed-initiative dialogues
  - Directed dialogues: the computer directs the conversation at each step by prompting the user for the next piece of information
    - Example:
      - C: On what date do you wish to fly?
      - H: May 6th
  - Mixed-initiative dialogues: each participant can take the initiative in leading the conversation. VoiceXML does this by allowing input grammars to be specified at the form level
    - Example:
      - C: How can I help you?
      - H: I'd like to fly from New York on May 8th.
      - C: Where would you like to fly to?
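A rough Python sketch of the mixed-initiative behavior above: one utterance may fill several slots at once, and the dialogue then falls back to directed prompts for whatever is still missing. The regular expressions are only a stand-in for a real form-level grammar, and the city list, slot names, and prompts are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical vocabulary standing in for a real form-level grammar.
CITIES = r"(new york|boston|chicago|denver)"
MONTHS = (r"(january|february|march|april|may|june|july|august|"
          r"september|october|november|december)")

# Each pattern may fill its slot from ANY utterance; this is what lets the caller take the initiative.
SLOT_PATTERNS = {
    "origin": re.compile(rf"\bfrom {CITIES}\b"),
    "destination": re.compile(rf"\bto {CITIES}\b"),
    "date": re.compile(rf"\bon {MONTHS} (\d{{1,2}})(?:st|nd|rd|th)?\b"),
}

# Directed prompts, used in this order only for slots that are still empty.
PROMPTS = {
    "origin": "Where are you flying from?",
    "destination": "Where would you like to fly to?",
    "date": "On what date do you wish to fly?",
}


def fill_slots(utterance: str, slots: Dict[str, str]) -> None:
    """Fill every still-empty slot that the utterance mentions."""
    text = utterance.lower()
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match and name not in slots:
            slots[name] = " ".join(match.groups())


def next_prompt(slots: Dict[str, str]) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when the form is complete."""
    for name, prompt in PROMPTS.items():
        if name not in slots:
            return prompt
    return None


def mixed_initiative(utterances: List[str]) -> Tuple[Dict[str, str], List[str]]:
    slots: Dict[str, str] = {}
    transcript = ["C: How can I help you?"]  # open, form-level prompt
    for said in utterances:
        transcript.append(f"H: {said}")
        fill_slots(said, slots)
        prompt = next_prompt(slots)
        if prompt is None:
            break
        transcript.append(f"C: {prompt}")
    return slots, transcript
```

With the caller's first sentence from the slide, the origin and date are filled together, so the only follow-up question is the destination.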
18. Paper 2
- Concepts of Programming by Voice
- Motivated by the need to program without typing, thereby preventing repetitive stress injuries (RSI), a common injury among those who spend long hours typing
- Voice-activated software for the disabled is a prime motivator in development
- The paper proposes a system that creates an environment for voice-activated programming
19. Paper 2
- The cost of such software has fallen dramatically
  - $7,500 in 1998
  - $100 in 2005
- Products include
  - Dragon NaturallySpeaking
  - IBM ViaVoice
  - Lernout & Hauspie Voice Xpress
20. Paper 2
- The authors developed a generator called VocalGenerator using Dragon NaturallySpeaking with MS Visual C++
  - Input: a context-free grammar, compatible with most programming languages
  - Output: an environment in which a voice-recognition, syntax-directed program can be written by voice input alone
  - Allows for better recognition and selection of sections of code
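To illustrate what "syntax-directed" means for voice programming, here is a small Python sketch in which each spoken command must expand the leftmost placeholder of a partial program, so only phrases that are grammatical at that point are accepted. This is not VocalGenerator; the three-production grammar, the placeholder syntax (<stmt>, <expr>, <id>), and the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical productions of a tiny C-like grammar, keyed by the phrase a programmer says.
# Each maps to (nonterminal it expands, replacement text that may contain more nonterminals).
PRODUCTIONS: Dict[str, Tuple[str, str]] = {
    "if statement": ("stmt", "if (<expr>) { <stmt> }"),
    "while loop": ("stmt", "while (<expr>) { <stmt> }"),
    "assignment": ("stmt", "<id> = <expr>;"),
}
PLACEHOLDER = re.compile(r"<(stmt|expr|id)>")


def apply_command(program: str, spoken: str) -> str:
    """Expand the leftmost placeholder of the program using one spoken command."""
    match = PLACEHOLDER.search(program)
    if match is None:
        raise ValueError("program is complete; nothing left to fill")
    expected = match.group(1)
    if spoken in PRODUCTIONS:
        category, template = PRODUCTIONS[spoken]
        if category != expected:
            raise ValueError(f"'{spoken}' cannot appear where <{expected}> is expected")
        replacement = template
    elif expected in ("expr", "id"):
        replacement = spoken  # free text is accepted only for expressions and identifiers
    else:
        raise ValueError(f"expected a statement, heard '{spoken}'")
    return program[:match.start()] + replacement + program[match.end():]


def dictate(commands: List[str]) -> str:
    """Start from a single statement placeholder and apply each spoken command in turn."""
    program = "<stmt>"
    for spoken in commands:
        program = apply_command(program, spoken)
    return program
```

Because the recognizer only has to choose among the phrases valid for the current placeholder, the set of candidates at each step stays small, which is one way a grammar can help recognition.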
21. Paper 2
- Evaluation of the product
  - Programming is faster using a syntax-directed voice recognition system than a natural-language DVR
  - A programmer suffering from repetitive stress injuries will be able to program at a speed sufficient to maintain competitive employment
22. Paper 3
- Paper 3 focuses on V-commerce (voice commerce) through a survey of VoiceXML applications for business communication
- Looks at the inherent risks in human-to-human communication and the challenges these pose to human-to-computer communication
- Examines speech recognition
- Seeks to leverage the global predominance of telephone usage
23. Paper 3
- Uses the W3C Voice Browser Working Group design criteria, including
  - Consistency
  - Interoperability
  - Generality
  - Internationalization
  - Generation and Readability
  - Implementation
24. Paper 3
- Looks at the potential for a voice-activated Web interface
- Looks at a transactional communication method with six phases
  - Sender has an idea
  - Sender transforms the idea into a message
  - Sender transmits the message
  - Receiver gets the message
  - Receiver interprets the message
  - Receiver reacts and sends feedback
25. Paper 3
- Challenges include
  - Unproven business models
  - Business process change requirements
  - Channel conflicts
  - Technology hurdles
  - Legal issues
  - Security and privacy
26. Paper 3
- Conclusions
  - Speech is natural, flexible, and efficient
  - Voice technology will improve
  - Voice recognition capabilities will improve
  - The intersection of voice recognition, telecom, and Web technologies may lead to a large market for products that take advantage of this intersection
27. Demo
- Using Tellme Studio (http://studio.tellme.com)
- Tellme Studio provides resources to build and test your own Internet-powered "phone sites" with nothing but your Web browser and an ordinary telephone, in the following ways:
  - Type VoiceXML directly into an area called the Scratchpad, then call the phone number to preview the code
  - Publish the VoiceXML and audio files on a publicly accessible Web server, point Studio at the URL for your application's "home page", and again call the Studio phone number to preview the application
  - Browse and leverage an extensive library of sample code, grammars, audio, and VoiceXML documentation
  - Participate in the Voice Web development community through open newsgroups
28. Demo (Continued)
- This demo, Drink Recipes I, uses one of the prebuilt VoiceXML scripts available from the Tellme Studio Code Library
- This version of Drink Recipes
  - asks the caller for a drink name
  - in response, plays back the drink's ingredients list and mixing instructions
  - demonstrates the use of large grammars and how to create data-driven applications
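The data-driven idea behind Drink Recipes can be sketched in Python: once the caller's drink name has been recognized, the application looks it up in a table and renders a VoiceXML page that reads the recipe back. The recipe table, the function names, and the wording are hypothetical; none of it is taken from the Tellme Studio sample.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

VXML_NS = "http://www.w3.org/2001/vxml"
ET.register_namespace("", VXML_NS)  # VoiceXML 2.0 documents use this default namespace

# Hypothetical recipe table; a real application would query a database or Web service.
RECIPES: Dict[str, Tuple[List[str], str]] = {
    "lemonade": (["lemon juice", "sugar", "water"], "Stir until the sugar dissolves."),
    "iced tea": (["black tea", "ice", "lemon"], "Pour the tea over ice and add the lemon."),
}


def v(tag: str) -> str:
    """Qualify an element name with the VoiceXML namespace."""
    return f"{{{VXML_NS}}}{tag}"


def recipe_page(drink: str) -> str:
    """Render the recipe for one recognized drink name as a VoiceXML 2.0 page."""
    vxml = ET.Element(v("vxml"), {"version": "2.0"})
    block = ET.SubElement(ET.SubElement(vxml, v("form")), v("block"))
    prompt = ET.SubElement(block, v("prompt"))
    entry = RECIPES.get(drink.lower())
    if entry is None:
        prompt.text = f"Sorry, I don't have a recipe for {drink}."
    else:
        ingredients, steps = entry
        prompt.text = f"{drink.capitalize()} needs {', '.join(ingredients)}. {steps}"
    return ET.tostring(vxml, encoding="unicode")
```

In a design like this, the keys of the same table could also be used to generate the drink-name grammar, so adding a recipe updates both what callers can say and what they hear.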
29. VoiceXML 2.0
[Figure: From McGlashan, Dr. Scott, "VoiceXML 2.0 from the Inside," retrieved from http://www.voicexmlreview.org/Dec2001/features/inside.html]
30. Differences Between VoiceXML 1.0 and 2.0
- Differences between VoiceXML 1.0 and 2.0:
  - Interoperability
  - Functional completeness
  - Clarity
31. VoiceXML 2.0
- Interoperability: VoiceXML 2.0 requires the following formats, so that developers can rely on their applications running on any VoiceXML platform that conforms to the VoiceXML 2.0 specification
  - Input: the XML format of the Speech Recognition Grammar Specification (SRGS) is used for speech and DTMF input. VoiceXML 1.0 did not require any particular speech grammar format
  - Output: the Speech Synthesis Markup Language (SSML) is used for text-to-speech and audio output. VoiceXML 1.0 did not use SSML, and VoiceXML 1.0's own speech markup elements are not supported in VoiceXML 2.0
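As an illustration of the SRGS XML format, the Python sketch below turns a plain word list, here the drink choices from the Paper 1 example, into an SRGS 1.0 grammar with a single public rule built from <one-of> and <item>. The function name and the en-US language tag are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", SRGS_NS)  # serialize grammar elements without a prefix


def g(tag: str) -> str:
    """Qualify an element name with the SRGS namespace."""
    return f"{{{SRGS_NS}}}{tag}"


def word_list_grammar(rule_id: str, phrases: List[str]) -> str:
    """Build an SRGS 1.0 XML grammar whose root rule accepts exactly one of the phrases."""
    grammar = ET.Element(g("grammar"), {
        "version": "1.0",
        "mode": "voice",
        "root": rule_id,
        XML_LANG: "en-US",
    })
    rule = ET.SubElement(grammar, g("rule"), {"id": rule_id, "scope": "public"})
    choices = ET.SubElement(rule, g("one-of"))
    for phrase in phrases:
        ET.SubElement(choices, g("item")).text = phrase
    return ET.tostring(grammar, encoding="unicode")


drink_grammar = word_list_grammar("drink", ["coffee", "tea", "orange juice", "milk", "nothing"])
```

The result could be placed inline inside a VoiceXML 2.0 <grammar> element, or saved to a file and referenced with <grammar src="..." type="application/srgs+xml"/>.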
32. VoiceXML 2.0
- Interoperability (continued)
  - Protocol: support for HTTP for fetching documents and resources is required. VoiceXML 1.0 did not require support for HTTP
  - Audio: audio formats that were recommended for support in VoiceXML 1.0 are now required in VoiceXML 2.0
33. VoiceXML 2.0
- Functional completeness: new elements, attributes, and variables have been added in VoiceXML 2.0 so that key aspects of the cycle of generating system output, interpreting user input, and transitioning from one dialog to another are fully described
  - NOTE: VoiceXML 1.0 left gaps, for example in specifying when prompts are played to the user
- Some of the new or enhanced elements, variables, and features include
  - The application.lastresult$ variable provides information about the last recognition in the application
  - The <log> element generates a debug message
  - The <throw> and <catch> elements are enhanced to provide more information
  - The <audio> element is enhanced with an expr attribute
  - The <menu> element is enhanced with an accept attribute
  - Enhanced support for greater control over universal grammars
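A minimal Python model of application.lastresult$: each entry carries the confidence, utterance, inputmode, and interpretation properties that VoiceXML 2.0 defines for it. The sketch orders hypotheses by decreasing confidence, so entry 0 is the best guess; the confirmation threshold and the function names are hypothetical application choices, not part of the specification.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class RecognitionResult:
    """One entry of application.lastresult$, using the property names VoiceXML 2.0 defines."""
    confidence: float    # 0.0 (least) to 1.0 (most confident)
    utterance: str       # the raw words (or DTMF digits) that were recognized
    inputmode: str       # "voice" or "dtmf"
    interpretation: Any  # the semantic result; an ECMAScript value in a real interpreter


def as_lastresult(hypotheses: List[RecognitionResult]) -> List[RecognitionResult]:
    """Order N-best hypotheses from most to least confident, so index 0 is the best guess."""
    return sorted(hypotheses, key=lambda r: r.confidence, reverse=True)


def needs_confirmation(lastresult: List[RecognitionResult], threshold: float = 0.6) -> bool:
    """Hypothetical application policy: confirm when the best guess is missing or weak."""
    return not lastresult or lastresult[0].confidence < threshold
```

An application might use a check like needs_confirmation to decide whether to ask "Did you say Austin?" before acting on the result.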
34. VoiceXML 2.0
- Clarity: VoiceXML 2.0 provides a clear description and interpretation of ALL elements (and their attributes), how they interact with one another, and their expected behavior
  - NOTE: VoiceXML 1.0 contains omissions and contradictions in this respect
- Some clarifications include
  - Subdialogs: the <subdialog> description is clarified
  - Root and leaf documents: explicitly defined
  - Prompt queueing and input collection: the relationship between the two is clarified
  - The relationship between VoiceXML 2.0 and ECMAScript variables is clarified
  - Conformance of VoiceXML documents and VoiceXML processors is clarified
  - VoiceXML 2.0 is aligned with the Speech Grammar and Speech Synthesis specifications
35. VoiceXML 2.1
- VoiceXML 2.1 was released by the W3C as a Candidate Recommendation on June 13, 2005
- VoiceXML 2.1 proposes eight enhancements to VoiceXML 2.0:
  - Referencing grammars dynamically
  - Referencing scripts dynamically
  - Using <mark> to detect barge-in during prompt playback
  - Using <data> to fetch XML without requiring a dialog transition
  - Concatenating prompts dynamically using <foreach>
  - Recording user utterances while attempting recognition
  - Adding namelist to <disconnect>
  - Adding type to <transfer>
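A Python sketch of what <data> and <foreach> make possible together: fetch XML, turn it into an array, and build one prompt with an entry per item, without moving to another dialog. The fetched document, its element names, and the function names are hypothetical; a real VoiceXML 2.1 interpreter would do this with ECMAScript over the DOM that <data> returns.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical document that a <data name="news" src="..."/> element might fetch.
FETCHED = """<headlines>
  <headline>Storm expected tonight</headline>
  <headline>Local team wins the final</headline>
</headlines>"""


def headlines(xml_text: str) -> List[str]:
    """Turn the fetched XML into an array, as a script walking the <data> DOM might."""
    return [h.text.strip() for h in ET.fromstring(xml_text).iter("headline")]


def expanded_prompt(items: List[str]) -> str:
    """Build the prompt that a loop such as
         <foreach item="h" array="items"> <break time="300ms"/> <value expr="h"/> </foreach>
       would produce: one pause followed by one item for each array element."""
    prompt = ET.Element("prompt")
    prompt.text = "Here are the headlines."
    for item in items:
        pause = ET.SubElement(prompt, "break", {"time": "300ms"})
        pause.tail = f"{item}."
    return ET.tostring(prompt, encoding="unicode")
```

In VoiceXML 2.0 the same result needed either a separate dialog per item or a server round trip to generate the prompt; <foreach> keeps the loop on the client.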
36. References
- Ali, Sanwar, Albohali, Mohamed, and Wibowo, Kustim, "VoiceXML for Business Applications: A Survey," First Annual ABIT Conference, May 3-5, 2001, Pittsburgh, Pennsylvania.
- Arnold, Stephen A., Mark, Leo, and Goldthwaite, John, "Programming by Voice, VocalProgramming," ASSETS'00, November 13-15, 2000, Arlington, Virginia.
- Lucas, Bruce, "VoiceXML for Web-based Distributed Conversational Applications," Communications of the ACM, Vol. 43, No. 9, September 2000, pp. 53-57.
- Voice eXtensible Markup Language (VoiceXML) version 1.0, http://www.w3.org/TR/voicexml/
- Voice eXtensible Markup Language (VoiceXML) version 2.0, http://www.w3.org/TR/voicexml20/
- Voice eXtensible Markup Language (VoiceXML) version 2.1, http://www.w3.org/TR/voicexml21/
- https://studio.tellme.com/vxml2/ovw/migrating21.html
- http://www.voicexmlreview.org/Dec2001/features/inside-full.html
- McGlashan, Dr. Scott, "VoiceXML 2.0 from the Inside," retrieved from www.voicexmlreview.org/Dec2001/features/inside.html