Voice XML

1
Voice XML

Team 1
Matt Ganis, Jonathan Hill, Henry Wong
Anne I. Mannette-Wright

2
Agenda

History of Voice Applications and Voice XML
Related Voice Type Languages
Advantages of Voice XML
Architecture of VoiceXML
Paper 1
Paper 2
Paper 3
Demonstration
Voice XML 2.0
Differences between Voice XML 1.0 and 2.0
The Future: Voice XML 2.1

3
History of Voice Applications

Voice technologies emerged in the 1990s
Automatic Speech Recognition (ASR)
Small-vocabulary speech recognition problems were solved
Text-to-Speech Systems
Can generate speech responses on the fly
Interactive Voice Response (IVR) applications

4
History of Voice Applications

IVRs became programmable, but programmable IVRs
Are difficult to program (call scripting is often vendor specific), so each vendor had to reinvent the wheel
Do not allow easy movement of an application from one IVR to another, due to the proprietary nature of IVRs

5
History of Voice XML

1995: AT&T started work on Phone Markup Language (PML)
Oct. 1998: Motorola developed VoxML (Voice Markup Language)
Feb. 1999: IBM developed SpeechML technology
Mar. 1999: the VoiceXML Forum was formed by IBM, AT&T, Lucent, and Motorola
Its mission was to design a standard dialog design language that developers could use to build conversational applications
March 2000: the VoiceXML Forum released VoiceXML 1.0 to the general public
May 2000: accepted by the W3C

6
W3C Speech Interface Framework
From McGlashan, Dr. Scott, "VoiceXML 2.0 from the Inside," retrieved from www.voicexmlreview.org/Dec2001/features/inside.html
7
Related Voice Type Languages

Related to VoiceXML
Grammar XML (grXML)
Provides speech grammars used by speech recognition engines (the XML form of the W3C Speech Recognition Grammar Specification)
Speech Synthesis Markup Language (SSML)
The SSML specification is based upon the JSpeech Markup Language (JSML) and JSpeech Grammar Format (JSGF) specifications, which are owned by Sun
Became a W3C Recommendation (Version 1.0) in September 2004
A standardized way of specifying how text is rendered as speech, with tags for pronunciation, tone, inflection, etc.
Often embedded in VoiceXML scripts to drive interactive telephony systems
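In VoiceXML 2.0, the content of a <prompt> is SSML, so speech markup can be written directly inside a dialog. The following is a minimal sketch of that embedding. The greeting wording and the form id are illustrative only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="greeting">
    <block>
      <prompt>
        <!-- SSML elements used directly inside a VoiceXML 2.0 prompt -->
        Welcome to <emphasis>Drink Recipes</emphasis>.
        <!-- Half-second pause before the next sentence -->
        <break time="500ms"/>
        <!-- Speak this phrase more slowly than the default rate -->
        <prosody rate="slow">Please listen carefully,</prosody>
        because our menu options have changed.
      </prompt>
    </block>
  </form>
</vxml>
```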

8
Related Voice Type Languages

Related to VoiceXML (Continued)
Call Control XML (CCXML)
W3C markup language for controlling telephone calls and telephony equipment, currently at Version 1.0
Performs tasks such as setting up conference calls, transferring incoming calls, etc.
Works hand-in-hand with VoiceXML
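The following sketch shows the division of labor between the two languages. It assumes a platform that implements the W3C CCXML 1.0 drafts: CCXML answers an incoming call and then hands the connection to a VoiceXML dialog. The file name welcome.vxml is hypothetical, and platforms may differ in the event details:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <eventprocessor>
    <!-- A new call is arriving: answer it -->
    <transition event="connection.alerting">
      <accept/>
    </transition>
    <!-- Call answered: start a VoiceXML dialog on the connection.
         src is an ECMAScript expression, hence the inner quotes.
         welcome.vxml is a hypothetical dialog document. -->
    <transition event="connection.connected">
      <dialogstart src="'welcome.vxml'"/>
    </transition>
    <!-- The VoiceXML dialog has finished: end this CCXML session -->
    <transition event="dialog.exit">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
```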

9
Architecture of VoiceXML
From Voice eXtensible Markup Language (VoiceXML) Version 1.0, http://www.w3.org/TR/voicexml/
10
Advantages of Voice XML

VoiceXML is a markup language that
Minimizes client/server interactions by
specifying multiple interactions per document.
Shields application authors from low-level and platform-specific details.
Separates user interaction code (in VoiceXML)
from service logic (e.g. CGI scripts).
Promotes service portability across
implementation platforms. VoiceXML is a common
language for content providers, tool providers,
and platform providers.
Is easy to use for simple interactions, and yet
provides language features to support complex
dialogs.

11
Paper 1

Authored by Bruce Lucas: "VoiceXML for Web-based Distributed Conversational Applications"
Presents an introduction to VoiceXML
Comparison to HTML
Support for natural dialogue

12
Paper 1

VoiceXML is an XML application, which results in the following benefits:
Allows the reuse and easy retooling of existing tools for creating, transforming, and parsing XML documents
Allows VoiceXML to make use of other complementary XML-based standards, for example, Java Speech Markup Language for speech synthesis
A form is VoiceXML's basic dialogue unit
Contains a set of inputs (fields)
Specifies what to do with a set of fields after
data is collected
A field includes a prompt and a specification of
what the user is allowed to say

13
Paper 1 - VoiceXML Code Example

<?xml version="1.0"?>
<vxml version="1.0">
  <menu>
    <prompt>Say one of <enumerate/></prompt>
    <choice next="http://www.sports.example/sports.vxml">
      Sports scores
    </choice>
    <choice next="http://www.weather.example/weather.vxml">
      Weather information
    </choice>
    <choice next="#login">
      Log in
    </choice>
  </menu>
  <form id="login">
    <field name="phone_number" type="phone">
      <prompt>Please say your complete phone number</prompt>
    </field>
  </form>
</vxml>

14
Paper 1

VoiceXML includes support for common field types, including numbers, digits, phone, date and time, AND for user-specified fields using grammars
<form>
  <field name="drink">
    <prompt>What would you like to drink?</prompt>
    <grammar>
      coffee | tea | orange juice | milk | nothing
    </grammar>
  </field>
  <field name="sandwich">
    <prompt>What sandwich would you like?</prompt>
    <grammar src="sandwiches.gram"/>
  </field>
  <block>
    <submit next="/servlet/order"/>
  </block>
</form>

15
Paper 1: The Distributed Model
From Lucas, Bruce, "VoiceXML for Web-Based Distributed Conversational Applications," Communications of the ACM, Vol. 43, No. 9, September 2000.

VoiceXML provides support for advanced features such as
Local validation and processing
Audio playback and recording
Support for context-specific and tapered help, and for reusable subdialogues
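The following sketch illustrates tapered help and local validation. It uses VoiceXML 2.0 syntax, although the paper describes VoiceXML 1.0. The cups field, its 1 to 10 range, and /servlet/order are hypothetical. The universals property is set because VoiceXML 2.0 does not activate the built-in "help" command by default:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="quantity">
    <!-- Enable the platform's universal "help" grammar for this form -->
    <property name="universals" value="help"/>
    <field name="cups" type="number">
      <prompt>How many cups would you like?</prompt>
      <!-- Tapered help: the handler with the highest count not exceeding
           the number of help requests so far is selected -->
      <help count="1">Say a number of cups.</help>
      <help count="2">Say a whole number between one and ten, for example, three.</help>
      <nomatch>Sorry, I didn't understand. <reprompt/></nomatch>
      <filled>
        <!-- Local validation: checked in the voice browser, no server round trip -->
        <if cond="cups &lt; 1 || cups &gt; 10">
          <prompt>Please choose between one and ten cups.</prompt>
          <!-- Clearing the field makes the form ask for it again -->
          <clear namelist="cups"/>
        </if>
      </filled>
    </field>
    <block>
      <!-- Reached only once cups holds a valid value -->
      <submit next="/servlet/order" namelist="cups"/>
    </block>
  </form>
</vxml>
```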

16
Paper 1: VoiceXML Compared with HTML

An HTML document is a single unit, specified by a URI and presented to the user all at once
A VoiceXML document contains a number of dialogue units (menus or forms) presented sequentially
HTML has no markup to identify distinct dialogue units within a document
A VoiceXML document is structured to reflect the sequential nature of the voice medium
An HTML document is like one single dialogue
A VoiceXML document requires distinct dialogue elements so they can be presented one at a time
VoiceXML has application logic for sequencing among dialogue units

17
Paper 1: Support for Natural Dialogue

VoiceXML supports directed and mixed-initiative dialogues
Directed dialogues: the computer directs the conversation at each step by prompting the user for the next piece of information
Example:
C: On what date do you wish to fly?
H: May 6th
Mixed-initiative dialogues: each participant can take the initiative in leading the conversation. VoiceXML does this by allowing input grammars to be specified at the form level
C: How can I help you?
H: I'd like to fly from New York on May 8th
C: Where would you like to fly to?
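The following sketch follows the VoiceXML 2.0 pattern for mixed-initiative forms and reproduces the flight dialogue above. A form-level grammar is combined with an <initial> element that asks an open question, and directed prompts follow for anything the caller left out. The grammars flight.grxml and city.grxml are hypothetical. flight.grxml is assumed to return origin, destination and date properties whose names match the fields:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="flight">
    <!-- Form-level grammar (hypothetical): active across the whole form,
         so one utterance can fill several fields at once -->
    <grammar src="flight.grxml" type="application/srgs+xml"/>

    <!-- Open-ended prompt that lets the caller take the initiative -->
    <initial name="start">
      <prompt>How can I help you?</prompt>
      <nomatch count="1">
        Please say something like, from New York on May 8th.
        <reprompt/>
      </nomatch>
      <nomatch count="2">
        <!-- Give up on mixed initiative: mark initial as done so the
             form falls back to directed prompts -->
        <assign name="start" expr="true"/>
        <reprompt/>
      </nomatch>
    </initial>

    <!-- Directed prompts, visited only for fields still unfilled -->
    <field name="origin">
      <grammar src="city.grxml" type="application/srgs+xml"/>
      <prompt>Where would you like to fly from?</prompt>
    </field>
    <field name="destination">
      <grammar src="city.grxml" type="application/srgs+xml"/>
      <prompt>Where would you like to fly to?</prompt>
    </field>
    <field name="date" type="date">
      <prompt>On what date do you wish to fly?</prompt>
    </field>

    <!-- Runs once all fields are filled; /servlet/flights is hypothetical -->
    <filled>
      <submit next="/servlet/flights" namelist="origin destination date"/>
    </filled>
  </form>
</vxml>
```

For the caller in the example, "from New York on May 8th" fills origin and date in one turn, so the form only needs to ask for the destination.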

18
Paper 2

Concepts of Programming by Voice
Motivated by the need to program without typing, thereby preventing repetitive stress injuries (RSI), a common injury among those who spend long hours typing
Voice-activated software for the disabled is a prime motivator in development
The paper proposes a system that creates an environment for voice-activated programming

19
Paper 2

The cost of such software has fallen dramatically
$7,500 in 1998
$100 in 2005
Products include
Dragon NaturallySpeaking
IBM ViaVoice
Lernout & Hauspie Voice Xpress

20
Paper 2

The authors developed a generator called VocalGenerator using Dragon NaturallySpeaking with MS Visual C++
Input: a context-free grammar compatible with most programming languages
Output: a syntax-directed voice recognition environment in which a program can be written by voice input alone
Allows for better recognition and selection of sections of code

21
Paper 2

Evaluation of the product
Programming is faster using a syntax-directed voice recognition system than using a natural-language DVR
A programmer suffering from repetitive stress injuries will be able to program at a speed sufficient to maintain competitive employment

22
Paper 3

Paper 3 focuses on V-commerce through a survey of VoiceXML applications for business communication
Looks at the inherent risks in human-to-human communication and the challenges these pose to human-to-computer communication
Examines speech recognition
Seeks to leverage the predominance of telephone
usage globally

23
Paper 3

Utilizes the W3C Voice Browser Working Group
design criteria including
Consistency
Interoperability
Generality
Internationalization
Generalization and Readability
Implementation

24
Paper 3

Looks at the potential for a voice-activated Web interface
Looks at a transactional communication method
with six phases
Sender has an idea
Sender transforms the idea into a message
Sender transmits a message
Receiver gets the message
Receiver interprets the message
Receiver reacts and sends feedback

25
Paper 3

Challenges Include
Unproven business models
Business Process Change Requirements
Channel conflicts
Technology hurdles
Legal issues
Security and privacy

26
Paper 3

Conclusions
Speech is natural, flexible and efficient
Voice technology will improve
Voice recognition capabilities will improve
The intersection of voice recognition, telecom
and Web technologies may lead to a large market
for products that take advantage of this
intersection

27
Demo

Using TellMe Studio (http://studio.tellme.com)
TellMe Studio provides resources to build and test your own Internet-powered "phone sites" with nothing but a Web browser and an ordinary telephone, in the following ways:
Type VoiceXML directly into an area called the Scratchpad, then call the Studio phone number to preview the code
Publish the VoiceXML and audio files on a publicly accessible Web server, point Studio at the URL of your application's "home page", and again call the Studio phone number to preview the application
Browse and leverage an extensive library of sample code, grammars, audio, and VoiceXML documentation
Participate in the Voice Web development community through open newsgroups

28
Demo (Continued)

This demo, Drink Recipes I, uses one of the prebuilt VoiceXML scripts available from the TellMe Studio Code Library
This version of Drink Recipes
Asks the caller for a drink name
In response, plays back the drink's ingredients list and mixing instructions
Demonstrates the use of large grammars and how to create data-driven applications

29
VoiceXML 2.0
From McGlashan, Dr. Scott, "VoiceXML 2.0 from the Inside," retrieved from http://www.voicexmlreview.org/Dec2001/features/inside.html
30
Differences Between VoiceXML 1.0 and 2.0

Differences between VoiceXML 1.0 and 2.0:
Interoperability
Functional Completeness
Clarity

31
VoiceXML 2.0

Interoperability: VoiceXML 2.0 requires the following formats, so that developers can expect their applications to run on any VoiceXML platform conforming to the VoiceXML 2.0 specification
Input: the XML Format of the Speech Recognition Grammar Specification for speech and DTMF input
VoiceXML 1.0 did not require any particular speech grammar format
Output: Speech Synthesis Markup Language (SSML) is used for text-to-speech and audio output
VoiceXML 1.0 did not use SSML, and its speech markup elements are not supported in VoiceXML 2.0
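Paper 1's inline drink grammar can be rewritten in the XML Format of the Speech Recognition Grammar Specification that VoiceXML 2.0 requires. The following is a sketch. The file name drink.grxml is hypothetical. A VoiceXML 2.0 field could reference it with <grammar src="drink.grxml" type="application/srgs+xml"/>:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="drink">
  <!-- The root rule: the caller may say exactly one of these alternatives -->
  <rule id="drink" scope="public">
    <one-of>
      <item>coffee</item>
      <item>tea</item>
      <item>orange juice</item>
      <item>milk</item>
      <item>nothing</item>
    </one-of>
  </rule>
</grammar>
```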

32
VoiceXML 2.0

Interoperability (Continued)
Protocol: the HTTP protocol for fetching documents and resources must be supported. VoiceXML 1.0 did not require support for HTTP
Audio: audio formats recommended for support in VoiceXML 1.0 are now required in VoiceXML 2.0

33
VoiceXML 2.0

Functional Completeness: new elements, attributes and variables have been added in VoiceXML 2.0 so that key aspects of the cycle of generating system output, interpreting user input, and transitioning from one dialog to another are fully described.
NOTE: VoiceXML 1.0 contained gaps, for example in specifying when prompts were played to the user
Some of the new or enhanced elements, variables and features include
application.lastresult$ variable provides information about the last recognition in the application
<log> element generates a debug message
<throw> and <catch> elements enhanced to provide more information
<audio> element enhanced with an expr attribute
<menu> enhanced with an accept attribute
Greater control over universal grammars
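The following sketch combines several of the VoiceXML 2.0 additions listed above: a document-level <catch> that logs _event and _message, the <log> element, application.lastresult$, and <audio> with an expr attribute. The grammar drink.grxml and the audio/ directory of per-drink recordings are hypothetical. If a recording cannot be played, the text inside <audio> is spoken instead:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Catches every error.* event; _event and _message describe it -->
  <catch event="error">
    <log expr="'caught ' + _event + ': ' + _message"/>
    <exit/>
  </catch>
  <form id="choose">
    <field name="drink">
      <grammar src="drink.grxml" type="application/srgs+xml"/>
      <prompt>What would you like to drink?</prompt>
      <filled>
        <!-- application.lastresult$ describes the most recent recognition -->
        <log expr="'heard ' + application.lastresult$.utterance + ' with confidence ' + application.lastresult$.confidence"/>
        <!-- expr computes the audio URI at run time (hypothetical file layout);
             the element content is the fallback if playback fails -->
        <audio expr="'audio/' + encodeURIComponent(drink) + '.wav'">
          You chose <value expr="drink"/>.
        </audio>
      </filled>
    </field>
  </form>
</vxml>
```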

34
VoiceXML 2.0

Clarity: VoiceXML 2.0 provides a clear description and interpretation of ALL elements (and their attributes), how they interact with one another, and their expected behavior.
NOTE: VoiceXML 1.0 contains omissions and contradictions in this respect
Some clarifications include
Subdialogs: the <subdialog> description clarified
Root and leaf documents: explicitly defined
Prompt queueing and input collection: the relationship between these two clarified
Relationship between VoiceXML 2.0 and ECMAScript variables clarified
Conformance of VoiceXML documents and VoiceXML processors clarified
Alignment of VoiceXML 2.0 with the Speech Grammar and Speech Synthesis specifications
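The following sketch shows a <subdialog> call under the VoiceXML 2.0 rules. The caller passes a value with <param>, the subdialog receives it in a form-level <var> of the same name, and <return> sends results back as properties of the subdialog variable. The two files, order.vxml and confirm.vxml, are hypothetical and are shown together here:

```xml
<!-- order.vxml (hypothetical): calls a reusable confirmation subdialog -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <subdialog name="result" src="confirm.vxml">
      <!-- Initializes the form-level <var name="question"> in confirm.vxml -->
      <param name="question" expr="'Do you want to place your order?'"/>
      <filled>
        <!-- result.answer is the value returned by <return namelist="answer"> -->
        <if cond="result.answer">
          <prompt>Placing your order.</prompt>
        <else/>
          <prompt>Your order has been cancelled.</prompt>
        </if>
      </filled>
    </subdialog>
  </form>
</vxml>

<!-- confirm.vxml (hypothetical): a reusable yes/no subdialog -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm">
    <var name="question"/>
    <field name="answer" type="boolean">
      <prompt><value expr="question"/></prompt>
      <filled>
        <return namelist="answer"/>
      </filled>
    </field>
  </form>
</vxml>
```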

35
VoiceXML 2.1

VoiceXML 2.1 was released by the W3C as a Candidate Recommendation on June 13, 2005
VoiceXML 2.1 proposes 8 enhancements to VoiceXML 2.0, as follows:
Referencing grammars dynamically
Referencing scripts dynamically
Using <mark> to detect barge-in during prompt playback
Using <data> to fetch XML without requiring a dialog transition
Concatenating prompts dynamically using <foreach>
Recording user utterances while attempting recognition
Adding namelist to <disconnect>
Adding type to <transfer>
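The following sketch shows two of these VoiceXML 2.1 features working together, in the spirit of the data-driven Drink Recipes demo. <data> fetches XML without a dialog transition, and <foreach> builds one prompt from the result. The URL /servlet/recipe and the grammar drinks.grxml are hypothetical. The recipe XML layout is also assumed: a root element containing <ingredient> children:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <script>
    <![CDATA[
      // Copy the text of every <ingredient> element in a DOM document into
      // a plain ECMAScript array, which is what <foreach> iterates over.
      function ingredientList(doc) {
        var nodes = doc.documentElement.getElementsByTagName("ingredient");
        var list = [];
        for (var i = 0; i < nodes.length; i++) {
          list.push(nodes.item(i).firstChild.data);
        }
        return list;
      }
    ]]>
  </script>
  <form id="makeDrink">
    <field name="drink">
      <grammar src="drinks.grxml" type="application/srgs+xml"/>
      <prompt>Which drink would you like to make?</prompt>
      <filled>
        <!-- Fetch the recipe as XML (hypothetical servlet), passing the
             drink name as a request parameter; the dialog does not change -->
        <data name="recipe" src="/servlet/recipe" namelist="drink"/>
        <var name="ingredients" expr="ingredientList(recipe)"/>
        <!-- One prompt assembled from however many ingredients came back -->
        <prompt>
          You will need
          <foreach item="ingredient" array="ingredients">
            <value expr="ingredient"/>
            <break time="300ms"/>
          </foreach>
        </prompt>
      </filled>
    </field>
  </form>
</vxml>
```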

36
References

Ali, Sanwar, Albohali, Mohamed, and Wibowo, Kustim, "VoiceXML for Business Applications: A Survey," First Annual ABIT Conference, May 3-5, 2001, Pittsburgh, Pennsylvania.
Arnold, Stephen A., Mark, Leo, and Goldthwaite, John, "Programming by Voice, VocalProgramming," ASSETS '00, November 13-15, 2000, Arlington, Virginia.
Lucas, Bruce, "VoiceXML for Web-based Distributed Conversational Applications," Communications of the ACM, Vol. 43, No. 9, September 2000, pp. 53-57.
Voice eXtensible Markup Language (VoiceXML) Version 1.0, http://www.w3.org/TR/voicexml/
Voice eXtensible Markup Language (VoiceXML) Version 2.0, http://www.w3.org/TR/voicexml/
Voice eXtensible Markup Language (VoiceXML) Version 2.1, http://www.w3.org/TR/voicexml/
https://studio.tellme.com/vxml2/ovw/migrating21.html
http://www.voicexmlreview.org/Dec2001/features/inside-full.html
McGlashan, Dr. Scott, "VoiceXML 2.0 from the Inside," retrieved from www.voicexmlreview.org/Dec2001/features/inside.html