Creating User Interfaces - PowerPoint PPT Presentation

About This Presentation
Title:

Creating User Interfaces

Description:

Creating User Interfaces [Continue presentations as needed] Speech recognition. Speech synthesis Homework: Report on current products. Register on Tellme Studio. – PowerPoint PPT presentation

Slides: 30
Provided by: Jeani174

Transcript and Presenter's Notes

Title: Creating User Interfaces


1
Creating User Interfaces
  • [Continue presentations as needed] Speech
    recognition. Speech synthesis.
  • Homework: Report on current products. Register on
    Tellme Studio. Study VoiceXML.

2
Speech recognition
  • User speaks. System 'understands', at least
    enough to perform some action.
  • Related to (but not the same as)
  • Natural language understanding
  • Voice print identification
  • Record information to be re-played to human in
    compressed form for later interaction
  • Speech synthesis (other direction): words to
    speech
  • ?

3
Natural language understanding
  • Skip speech altogether, but type in statements or
    phrases in normal language
  • What is normal? We tend not to speak that
    grammatically
  • Many 'natural language systems' actually use
    keywords
  • Histor
  • Moon rocks example
  • Combine speech to natural language

4
Continuous versus discrete
  • Speaker speaks 'naturally' versus
  • Speaker separates words

5
Examples
  • Dictation: no understanding as such, produce
    words/sentences in a program
  • (Telephone) Help desk / Information: generally
    restricted or directed speech, choosing from
    alternatives (may or may not be given). Advances
    the process
  • Restricted commands: actually carrying out
    operations
  • Factory example: start and stop
  • Car: radio, heat/AC
  • Phone: call specific number

6
Training
  • Dictation application: user takes time to read a
    specific text to train the system
  • Note: some systems also adapt with use. When the
    user corrects the results, the system may do better
    next time.
  • Phone lookup: user records names. No
    'understanding', just record for matching.

7
Audience content
  • Some systems may allow adapting to audiences, for
    example, male versus female
  • Some systems have restrictions on types of
    content
  • Historical note: an IBM system in the 1980s-1990s was
    restricted to male, American-born speakers (no
    speech impediments) and legal text.

8
Speech recognition concepts
  • Air pressure → diaphragm in phone → electrical
    signal → (Fourier Transform) → wave pattern
  • matched against
  • sets of canonical patterns (native speaker of
    English, perhaps male/female, young/old
    alternatives)
  • generated for the specified grammar (using a
    segmentation: dividing up of the parts)
  • Note: interplay of grammar and statistics
    distinguishes different approaches

9
Fourier Transform (Discrete Fourier Transform,
computed with the Fast Fourier Transform, FFT)
  • Takes data representing a signal
  • And produces numbers representing the combination
    of sine and cosine waves that make up the signal

10
Speech recognition
  • Works on the product of the FFT
  • Uses (in most cases)
  • Segmentation attempt to break up into pieces,
    perhaps syllables or words
  • Grammar definition of what is to be expected
  • Probabilities if first part matched X, then
    greater probability that then next would match to
    Y

11
Current State of the Art
  • General speech recognition with no restrictions,
    good enough to act on the speech: always about to
    happen?
  • Dictation / substitute for keyboard exists and
    satisfies many
  • Is this the most important application for most
    users?
  • May not be the killer app, but may be good for
    motivating research
  • Homework: prepare a brief report on a current
    product or application. Can be one you use
    yourself.

12
Speech synthesis
  • aka TTS (text to speech)
  • Application determines that the computer needs to
    say certain words
  • lexical units (syllables of words) → phonemes
    → pre-recorded (wav) files of phonemes

13
Speech synthesis
  • This is again a segmentation process: need to
    divide up the words and then put them together so
    speech sounds 'natural'.
  • A particular phoneme may need to sound different
    in different contexts.
  • Also need to deal with abbreviations, local
    accents
  • Place names (important in travel, weather
    applications)
  • Special case: detect and use wav file for each
    name.
  • Older methods were all synthesized
  • similar distinction between all synthesized and
    samples of music

14
Speech synthesis
  • is essentially the computer reading out loud.
  • Easy to do most things
  • More and more difficult to do complete job
  • Different languages may be easier than English.
  • People who are not monolingual please comment!

15
Restricted / directed speech applications
  • We will use the Tellme studio engine to create
    directed speech applications.
  • These make use of
  • Grammars
  • Options to use numbers (buttons)
  • Recorded (.wav) sounds
  • Text to speech

16
studio.tellme.com
  • Company that provides engine for applications
  • Provides developing environment
  • We are doing the Tellme version of VoiceXML, but
    it appears to be standard.
  • Register as a developer
  • Provide your own id; you are assigned a PIN
  • Put VoiceXML in ScratchPad place (no audio files)
  • 1-800-555-VXML (8965)
  • SAY id and then PIN or can give phone number.
    Tellme runs either
  • program in ScratchPad OR
  • program at Application URL for projects with
    multiple files
  • To look at someone else's project, you change
    your Application URL
  • called pointing your account to a new source.

17
XML
  • Generalization of HTML
  • XML documents have markup.
  • An element has an opening tag indicating the type
    of element (possibly with attributes), content,
    and a closing tag.
  • Document must be well-formed.
  • Developers decide on element types.
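
As a small illustration of these points, here is a sketch of a well-formed XML document. The element types (course, title, homework) and the attribute names are made up for this example; they are the kind of thing a developer would decide on.

```xml
<?xml version="1.0"?>
<!-- Hypothetical element types, chosen only to illustrate the structure. -->
<course term="spring">
  <!-- Opening tag, content, closing tag -->
  <title>Creating User Interfaces</title>
  <!-- Attribute values must be quoted; elements must nest properly -->
  <homework topic="speech">
    <task>Report on a current speech product.</task>
    <task>Start learning VoiceXML.</task>
  </homework>
</course>
```

The document is well-formed because there is a single root element, every opening tag has a matching closing tag, elements do not overlap, and all attribute values are in quotation marks.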

18
VoiceXML
  • XML document (VXML header)
  • This means proper nesting of elements, quotation
    marks on attributes
  • VoiceXML has tags for flow-of-control and
    calculations.
  • Also can use <script> for JavaScript
  • Grammars come in different varieties. We will
    use the Tellme way.
  • Grammars are included in CDATA sections to prevent
    XML interpretation.
  • Many grammars are constructed for you.
  • <field name="answer" type="boolean"> will
    listen for yes or no. <field name="price"
    type="currency"> will listen for currency.
  • <menu> <choice> </choice> for list
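
A sketch of how the built-in boolean grammar might be used in a complete document. It assumes standard VoiceXML 2.0 as defined by the W3C; the form id, field name, and prompt wording are invented for illustration, and the Tellme engine may differ in details.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm">
    <!-- type="boolean" selects the built-in yes/no grammar,
         so no grammar element is needed here -->
    <field name="answer" type="boolean">
      <prompt>Would you like to hear the choices again?</prompt>
      <noinput>Sorry, I did not hear you. <reprompt/></noinput>
      <nomatch>Please say yes or no. <reprompt/></nomatch>
      <filled>
        <!-- The boolean type fills the field with true or false -->
        <if cond="answer">
          <prompt>Okay, starting over.</prompt>
        <else/>
          <prompt>Good bye.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Because the field uses a built-in type, the only pieces the developer writes are the prompt, the error handlers, and the follow-on logic.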

19
Very brief overview
  • <vxml> document contains <form> and/or <menu>
    elements.
  • <form> can contain <block>, <field>
  • <block> can contain <audio> or do its own audio
  • <field> can contain <prompt>, <grammar>,
    <noinput>, etc.
  • NOTE: certain types of <field> elements use
    built-in grammars, for example, boolean
  • Can have a child node <filled> that indicates
    what to do if there is a match
  • <menu> is a compressed way to use a simple grammar
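
To make the last point concrete, here is a sketch of a menu in standard W3C VoiceXML 2.0. The ids, choices, and spoken text are hypothetical, and this has not been checked against the Tellme engine specifically.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <!-- enumerate reads out the choices listed below -->
    <prompt>Say one of: <enumerate/></prompt>
    <!-- Each choice is both a phrase to listen for and a destination;
         the dtmf attribute also lets the caller press a button -->
    <choice dtmf="1" next="#weather">weather</choice>
    <choice dtmf="2" next="#news">news</choice>
    <noinput>I did not hear anything. <reprompt/></noinput>
    <nomatch>I did not understand. <reprompt/></nomatch>
  </menu>
  <form id="weather">
    <block>Today is sunny.</block>
  </form>
  <form id="news">
    <block>There is no news today.</block>
  </form>
</vxml>
```

The menu builds its grammar from the text of the choice elements, which is why no separate grammar element or filled element appears.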

20
Very brief, cont.
  • Logic can be done using a <script> element that
    contains a variant of JavaScript and/or
  • vxml logic elements, including
  • <var>
  • <if>, <else>, <elseif>
  • other
  • These may be part of a <filled> element
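
The following sketch combines a script function with the vxml logic elements inside a filled element. It assumes standard W3C VoiceXML 2.0; the function name remaining, the total of 120 credits, and the prompts are assumptions made up for this example.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <script>
    <![CDATA[
      // Hypothetical helper: credits still needed out of an assumed 120.
      // Subtraction converts the recognized value to a number.
      function remaining(earned) { return 120 - earned; }
    ]]>
  </script>
  <form id="credits">
    <var name="left" expr="0"/>
    <field name="earned" type="number">
      <prompt>How many credits have you earned?</prompt>
      <filled>
        <assign name="left" expr="remaining(earned)"/>
        <!-- Inside an attribute, the less-than sign is written as &lt; -->
        <if cond="left &lt;= 0">
          <prompt>You have enough credits.</prompt>
        <elseif cond="left &lt; 30"/>
          <prompt>You need <value expr="left"/> more credits. Almost there.</prompt>
        <else/>
          <prompt>You need <value expr="left"/> more credits.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

The script body sits in a CDATA section so that characters such as the less-than sign are not interpreted as XML, while the cond attributes must escape them.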

21
Audio
  • Tellme studio provides a way to record your
    speech as a wav file to upload to a website.
    It sends the file to your email address.
  • You upload your VoiceXML file plus any wav files
    (and anything else).
  • <audio src="mygreeting.wav">Welcome to my site
    </audio>
  • If Tellme can't find the mygreeting.wav file, it
    uses its Text to Speech on the string "Welcome to
    my site".
  • Note: you also can use a full URL
    http://....
  • You put the URL for the VoiceXML file into
    your Tellme studio account, called pointing to
    the URL.
  • TEST

22
VoiceXML basics, continued
  • <form> element can contain
  • <block> elements, which can contain <audio>,
    <goto>, other
  • <field> which can contain
  • <prompt>
  • <grammar> (if not one of built-in grammars)
  • <filled>
  • <var> tags can be at different levels (for
    example, document, block, or higher levels)
  • <if>, <elseif>, <else> tags
  • <script> elements for JavaScript (which can also
    appear in expressions)

23
VoiceXML basics typical case
  • a form element
  • <field>
  • <prompt>, made up of <audio>, with reference to
    recorded wav file and backup text
  • <grammar>, if NOT using built-in grammars
    designated by type attribute of field. This is a
    CDATA section.
  • <filled> with (follow-on) code using field
  • <catch> for nomatch, noinput cases

24
Caution
  • A form contains various elements, including a
    field.
  • If a field has a grammar and the grammar is
    satisfied, control goes to a filled tag.

25
obligatory
  • <?xml version="1.0"?>
  • <vxml version="2.0">
  • <form>
  • <block>
  • <audio src="prompt1.wav">Hello, world </audio>
  • </block>
  • </form>
  • </vxml>

Callouts: prompt1.wav is recorded using Tellme studio; the text "Hello, world" is the backup using TTS, just in case the src file is missing.

26
example
  • Asks for number of credits and calculates when
    you/caller can register
  • uses built-in grammar for number
  • No error recovery. You need to do better than
    this in your project.
  • Unfortunate situation: there is an element type
    filled and an element type field.
  • The < symbols are represented using &lt;

27
  • <?xml version="1.0"?>
  • <vxml version="2.1"
  • xmlns="http://www.w3.org/2001/vxml">
  • <form id="credit">
  • <var name="rest" expr="1000"/>
  • <field name="bcount" type="number">
  • <prompt>
  • <audio src="howmanycredits.wav">Hello there.
    How many credits have you earned? </audio>
  • </prompt>
  • <grammar type="application/x-gsl" mode="voice">
  • <![CDATA[
  • NATURAL_NUMBER_THRU_999
  • ]]>
  • </grammar>
  • <catch event="noinput nomatch"> <audio
    src="sorry.wav">Sorry. I didn't get that.</audio>
    <exit/> </catch>

28
  • <filled>
  • <assign name="rest" expr="bcount"/>
  • <audio> <value expr="rest"/> </audio>
  • <if cond="rest&lt;30">
  • <audio src="homestretch.wav">You can
    register on the third day </audio>
  • <elseif cond="rest&lt;60"/>
  • <audio src="morethanhalf.wav">You can
    register on the second day </audio>
  • <elseif cond="rest&lt;90"/>
  • <audio src="goodstart.wav">You can
    register on the first day</audio>
  • <else/>
  • <audio>You can register on the fourth
    day </audio>
  • </if>
  • <audio src="goodbye.wav">Good bye.
    </audio>
  • </filled> </field> </form> </vxml>

29
Homework
  • Do research / think about your own experiences
    and come prepared to report on a speech
    recognition / speech synthesis application
  • Start learning VoiceXML