VOICE XML - PowerPoint PPT Presentation

1
VOICE XML
by Jan Bechstein
2
What is Voice XML?
  • VoiceXML is the HTML of the voice web, the
    open standard markup language for voice
    applications.
  • VoiceXML 1.0 was published in March 2000 by the
    VoiceXML Forum, a consortium of over 500
    companies. (Main supporting developers: AT&T,
    IBM, Motorola)
  • The Forum then gave control of the standard to
    the World Wide Web Consortium (W3C).

3
What is Voice XML?
  • VoiceXML 2.0 is already widely used
  • While HTML assumes a graphical web browser with
    display, keyboard, and mouse, VoiceXML assumes a
    voice browser with audio output, audio input, and
    keypad input. Audio input is handled by the voice
    browser's speech recognizer. Audio output
    consists both of recordings and speech
    synthesized by the voice browser's text-to-speech
    system

4
What is Voice XML?
  • A voice browser typically runs on a specialized
    voice gateway node that is connected both to the
    Internet and to the public switched telephone
    network

5
(No Transcript)
6
Why is VoiceXML so new?
VoiceXML takes advantage of several trends:
  • The growth of the World-Wide Web and of its
    capabilities.
  • Improvements in computer-based speech
    recognition and text-to-speech synthesis.
  • The spread of the WWW beyond the desktop
    computer.

7
The WWW
  • Web servers once delivered only static content,
    but now generate it dynamically using scripts,
    server pages, servlets and other technologies.
    They also provide access to databases and legacy
    systems. VoiceXML takes advantage of all these
    generation technologies.
  • The Internet is improving in performance,
    bandwidth, and quality of service. These
    improvements lead to new types of web
    applications and services, which in turn spur
    more improvements. VoiceXML strongly benefits
    from the ability to move audio data efficiently
    across the web

8
The speech technology
  • Over the phone, and with no speaker training, a
    speech recognition system needs to be given a set
    of speech grammars that tell it what words and
    phrases it should expect
  • Advances are also being made in speech synthesis,
    or text-to-speech (TTS). Synthesized voices no
    longer sound like drunken robots.

9
The speech technology
  • Automated speech recognition (ASR) systems have
    greatly improved in recent years as better
    algorithms and acoustic models are developed, and
    as more computer power can be brought to bear on
    the task.
  • An ASR system running on an inexpensive home or
    office computer with a good microphone can take
    free-form dictation, as long as it has been
    pre-trained for the speaker's voice.

10
What is so good about Voice XML?
  • VoiceXML devices are smaller (no mouse and
    keyboard)
  • There are more phones (1.5 billion) than
    computers connected to the WWW
  • Visual browsing and voice browsing are easy to
    combine
  • Cheaper and easier to use

11
So what is it good for?
  • Information retrieval
  • Directory assistance (AT&T saved 20 million last
    year with it)
  • E-commerce
  • Telephone services
  • E-mail over phone
  • Payments and scheduling orders

12
Let's code some...
  • VoiceXML 2.0 is an XML-based markup language
    for the creation of automated speech
    recognition (ASR) and interactive voice response
    (IVR) applications. Based on the XML
    tag/attribute format, the VoiceXML syntax
    involves enclosing instructions (items) within a
    tag structure in the following manner:
  • <element_name attribute_name="attribute_value">
  • ......contained items......
  • </element_name>

13
Let's code some...
  • A VoiceXML application consists of one or more
    text files called documents. These document files
    are denoted by a ".vxml" file extension.
  • The first line is the XML declaration, <?xml version="1.0"?>
  • followed by the root tag, <vxml version="2.0">

14
Let's code some...
  • Inside of the <vxml> tag, a document is broken
    up into discrete dialog elements called
    forms. Each form has an ID.
  • Like this:
  • <form id="welcome">
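
Putting the XML declaration, the root tag, and one form together, a minimal complete document might look like the sketch below. The `xmlns` attribute is not on the slides; it is the namespace the W3C VoiceXML 2.0 specification defines for the root element. The `<block>` element used here is described on a later slide, and the prompt text is made up.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- One form with the ID used on the slide -->
  <form id="welcome">
    <!-- A block only speaks; it collects no input -->
    <block>
      <prompt>Welcome to the voice web.</prompt>
    </block>
  </form>
</vxml>
```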

15
Let's code some...
  • Each form has items which control the session
    and interact with the user
  • Field items:
  • <field> - gathers input from the user via speech
    or DTMF recognition as defined by a grammar
  • <record> - records an audio clip from the user
  • <transfer> - transfers the user to another phone
    number

  • DTMF: Dual-Tone Multi-Frequency (the tones sent by telephone keypad presses)
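
The three items above can be combined in one form. This is only a sketch: the form ID, variable names, prompts, and the telephone number are placeholders invented for illustration. The attributes used (`type`, `cond`, `beep`, `maxtime`, `dtmfterm`, `dest`, `bridge`) are taken from the VoiceXML 2.0 specification, but platform support for `<record>` and `<transfer>` varies.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="leaveMessage">
    <!-- field: yes/no answer by voice or keypad, using the built-in boolean type -->
    <field name="wantsOperator" type="boolean">
      <prompt>Would you like to speak to an operator?</prompt>
    </field>

    <!-- record: visited only if the caller said no; stops after 30 seconds or a key press -->
    <record name="message" cond="!wantsOperator"
            beep="true" maxtime="30s" dtmfterm="true">
      <prompt>Please leave a message after the beep.</prompt>
    </record>

    <!-- transfer: visited only if the caller said yes; the number is a placeholder -->
    <transfer name="operatorCall" cond="wantsOperator"
              dest="tel:+15555550100" bridge="true">
      <prompt>Connecting you now.</prompt>
    </transfer>
  </form>
</vxml>
```

The `cond` attribute keeps each item from being visited unless its expression is true, so only one of the two branches runs.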

16
Let's code some...
  • Field items (continued):
  • <object> - invokes a platform-specific object
    that may gather user input, returning the result
    as an ECMAScript object
  • <subdialog> - performs a call to another dialog
    or document (similar to a function call),
    returning the result as an ECMAScript object
  • ECMA: a forum for the standardization of
    information and communication systems
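
The function-call analogy for `<subdialog>` can be sketched as follows. The form IDs and variable names are invented for illustration. The called form hands its result back with `<return namelist="..."/>`, and the caller reads it as a property of the subdialog's variable.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Caller: "customer" receives the ECMAScript object returned by the subdialog -->
  <form id="order">
    <subdialog name="customer" src="#askPhone">
      <filled>
        <prompt>Thanks. I heard <value expr="customer.phone"/>.</prompt>
      </filled>
    </subdialog>
  </form>

  <!-- Callee: collects one field, then returns it to the caller -->
  <form id="askPhone">
    <field name="phone" type="phone">
      <prompt>What's your phone number?</prompt>
    </field>
    <block>
      <return namelist="phone"/>
    </block>
  </form>
</vxml>
```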

17
Let's code some...
  • Control items:
  • <block> - encloses a sequence of statements for
    prompting and computation
  • <initial> - controls mixed-initiative interactions
    within a form
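
A rough sketch of both control items in one form follows. It assumes a form-level grammar file (`order.gram`, a hypothetical name) whose results fill slots named like the two fields, plus a hypothetical `sizes.gram`. Neither file is part of the original slides. The idea is that `<initial>` asks an open question first, and the form falls back to field-by-field prompting if the caller is not understood twice.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="quickOrder">
    <!-- Hypothetical form-level grammar that can fill both fields from one utterance -->
    <grammar src="../grammars/order.gram" type="application/srgs+xml"/>

    <!-- block: runs once and collects no input -->
    <block>
      <prompt>Welcome to the pizza line.</prompt>
    </block>

    <!-- initial: open question; on the second no-match, mark it done
         so the form moves on to the individual fields -->
    <initial name="openQuestion">
      <prompt>What size and type of pizza would you like?</prompt>
      <nomatch count="2">
        <prompt>Let's take it one step at a time.</prompt>
        <assign name="openQuestion" expr="true"/>
      </nomatch>
    </initial>

    <field name="pizzaSize">
      <prompt>What size?</prompt>
      <grammar src="../grammars/sizes.gram" type="application/srgs+xml"/>
    </field>

    <field name="pizzaTopping">
      <prompt>What type of pizza?</prompt>
      <grammar src="../grammars/pizzas.gram" type="application/srgs+xml"/>
    </field>
  </form>
</vxml>
```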
18
Here's some code
    <?xml version="1.0"?>
    <vxml version="2.0">
      <form id="getPhoneNumber">
        <field name="PhoneNumber" type="phone">
          <grammar src="../grammars/phone.gram" type="application/srgs+xml"/>
          <prompt>What's your phone number?</prompt>
          <help>Please say your ten digit phone number.</help>
        </field>
      </form>
    </vxml>
19
Let's make it a bit more complicated...
  • There can be different forms in one document.
  • Let's take a pizza place as an example.
  • The pizza ordering application is going to need
    to do more than just get a phone number from
    the caller. It should probably also have the
    ability to find out the type of pizza that
    the caller wants and the address for delivery.

20
Let's make it a bit more complicated...
  • We need three forms:
  • Asking for the telephone number of the customer
    (the info can be used to look up the customer
    in the database)
  • Asking what kind of pizza the customer wants
  • Checking the address of the customer (as a
    security check, and in case he/she is not at the
    address on file)

21
Let's make it a bit more complicated...
  • To transition between forms, one typically uses
    the <goto> tag. Execution continues in the next
    portion of the dialog (contained in another form)
    as dictated by the logic of the application.

22
Let's make it a bit more complicated...
    <form id="getPhoneNumber">
      <field name="PhoneNumber" type="phone">
        <grammar src="../grammars/phone.gram" type="application/srgs+xml"/>
        <prompt>What's your phone number?</prompt>
        <help>Please say your ten digit phone number.</help>
      </field>
      <block>
        <goto next="#pizzaType"/>
      </block>
    </form>

23
Let's make it a bit more complicated...
    <form id="pizzaType">
      <field name="pizzaTopping">
        <prompt>What type of pizza do you want?</prompt>
        <grammar src="../grammars/pizzas.gram" type="application/srgs+xml"/>
      </field>
    </form>
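
The slides do not show what the referenced grammar file contains. As an illustration only, a grammar in the W3C SRGS XML format that accepts three made-up pizza names could look like this. The rule name and the pizza names are assumptions, not taken from the presentation.

```xml
<?xml version="1.0"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="pizza">
  <!-- The recognizer will only accept one of the listed phrases -->
  <rule id="pizza" scope="public">
    <one-of>
      <item>margherita</item>
      <item>pepperoni</item>
      <item>four cheese</item>
    </one-of>
  </rule>
</grammar>
```

A short, closed list like this is what lets recognition work over the phone without speaker training: the recognizer only has to choose among the phrases it was told to expect.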

24
Let's make it a bit more complicated...
  • Transitioning to a form item within a form:
  • <goto nextitem="some_form_items_var_name"/>
  • Transitioning to another form in the current
    document:
  • <goto next="#some_form_id"/>
  • Transitioning to another document:
  • <goto next="http://www.some_url.com/some_doc.vxml"/>

25
  • You can also split the forms across different
    documents and make transitions between them
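
As a sketch of such a split, the address form could live in its own document. The file name `address.vxml`, the form ID, the field name, and the grammar file are all hypothetical. The pizza form in the first document would then end with something like `<goto next="address.vxml#getAddress"/>`, where the part after `#` names the form to enter.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- address.vxml: hypothetical second document holding the delivery-address form -->
  <form id="getAddress">
    <field name="streetAddress">
      <prompt>What is the delivery address?</prompt>
      <grammar src="../grammars/address.gram" type="application/srgs+xml"/>
      <filled>
        <prompt>Thank you. Your order is on its way.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```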

26
Not complicated enough? Let me teach you...
  • Conditional Statements
  • <if>, <elseif>, and <else> are the three elements
    used for conditional statements in VoiceXML.
  • <if> and <elseif> take a cond attribute
    specifying an ECMAScript boolean condition;
    <else> takes no condition.
    Examples of the usage of each tag are shown on
    the next slides

27
Not complicated enough? Let me teach you...
Example 1.
    <if cond="total &gt; 1000">
      <prompt>This is too much to spend.</prompt>
    </if>
Example 2.
    <if cond="amount &lt; 29.95">
      <goto next="#debit"/>
    <else/>
      <prompt>You are out of cash.</prompt>
    </if>
28
Not complicated enough? Let me teach you...
Example 3.
    <if cond="flavor == 'vanilla'">
      <prompt>You ordered vanilla.</prompt>
    <elseif cond="flavor == 'chocolate'"/>
      <prompt>You ordered chocolate.</prompt>
    <else/>
      <prompt>You didn't order vanilla or chocolate.</prompt>
    </if>
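
The examples above are fragments. In a real document, conditionals are executable content and typically sit inside a `<filled>` or `<block>` element. A minimal sketch, with an invented form and prompts, that checks a spoken amount right after it is recognized:

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="checkTotal">
    <!-- Built-in number type: the recognized value can be compared numerically -->
    <field name="total" type="number">
      <prompt>How much would you like to spend?</prompt>
      <filled>
        <if cond="total &gt; 1000">
          <prompt>This is too much to spend.</prompt>
          <!-- Clearing the field makes the form ask again -->
          <clear namelist="total"/>
        <elseif cond="total &gt; 0"/>
          <prompt>Thank you.</prompt>
        <else/>
          <prompt>Please say an amount greater than zero.</prompt>
          <clear namelist="total"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Note that `<` and `>` inside a `cond` attribute are written as `&lt;` and `&gt;`, because the document must remain well-formed XML.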
29
The big scope...
There are also other important tags and values,
for example the variable scopes:
  • session - read-only variables pertaining to an
    entire user session. These variables are declared
    by the platform and cannot be set within VoiceXML
    documents.
  • application - declared by <var> elements that are
    children of the application root document's
    <vxml> tag (declared directly under this tag and
    no other). They exist while the root document is
    loaded and can be accessed at any level within
    any document in the application.
  • document - declared as children of a supporting
    document's <vxml> tag. They are initialized upon
    loading the supporting document and may be
    accessed only within that document.
  • dialog - declared as children of <form> or
    <menu>, these variables are accessible only
    within that dialog element and are initialized
    when the form is visited. If declared inside of
    executable content, initialization occurs when
    the content is executed. Form/field item
    variables initialize as the form item is
    collected (see Tutorials 1 & 2).
  • (anonymous) - each <block>, <filled>, and <catch>
    element defines a new anonymous scope in which
    variables may be declared.
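
A small sketch of where each kind of variable is declared follows. All variable names and values are made up. `session.connection.remote.uri` is one of the standard read-only session variables in VoiceXML 2.0, though what a given platform reports there may differ.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- document scope: declared directly under <vxml>, visible in this whole document -->
  <var name="shopName" expr="'Pizza Place'"/>

  <form id="scopes">
    <!-- dialog scope: visible only inside this form -->
    <var name="attempts" expr="0"/>

    <block>
      <!-- anonymous scope: exists only while this block executes -->
      <var name="greeting" expr="'Welcome to ' + shopName"/>
      <assign name="attempts" expr="attempts + 1"/>
      <prompt><value expr="greeting"/>.</prompt>

      <!-- session scope: read-only, set by the platform -->
      <log>Caller: <value expr="session.connection.remote.uri"/></log>
    </block>
  </form>
</vxml>
```

Inner scopes can read variables from the scopes that enclose them, which is why the block can use both `shopName` and `attempts`.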
30
Chill out...
Are there any questions?
31
Sources of information
  • VoiceXML Forum website
  • W3C.org website
  • WebDevelopersConsortiumForum

32
Additional sources
  • A number of VoiceXML Forum Members provide access
    to developer sites and tool kits that will allow
    you to try out VoiceXML for yourself. A few of
    these are
  • BeVocal Cafe
  • IBM WebSphere Voice Server SDK
  • Motorola Mobile Application Developer's Kit
  • Nuance Voice Site Staging Center
  • Tellme.Studio
  • VoiceGenie Developer Workshop
  • Nuance VBuilder Desktop GUI Developer's Tool

33
 
Tank U weri mäni ... and have a niiiice
test... good luck 2 you all !!!