Title: LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing
1 LING 138/238 SYMBSYS 138: Intro to Computer Speech and Language Processing
- Lecture 3, October 5, 2004
- Dan Jurafsky
2 Week 2: Dialogue and Conversational Agents
- Examples of spoken language systems
- Components of a dialogue system, with a focus on these 3:
- ASR
- NLU
- Dialogue management
- VoiceXML
- Grounding and Confirmation
3 Conversational Agents
- AKA:
- Spoken Language Systems
- Dialogue Systems
- Speech Dialogue Systems
- Applications:
- Travel arrangements (Amtrak, United Airlines)
- Telephone call routing
- Tutoring
- Communicating with robots
- Anything with limited screen/keyboard
4 A travel dialog: Communicator
5 Call routing: AT&T HMIHY
6 A tutorial dialogue: ITSPOKE
7 Dialogue System Architecture
- Simplest possible architecture: ELIZA
- Read-search/replace-print loop
- We'll need something with more sophisticated dialogue control
- And speech
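A minimal sketch of a read-search/replace-print loop in the ELIZA style. The three rules and the default reply are hypothetical stand-ins, not ELIZA's actual script; the point is only that each turn is handled by pattern search and substitution, with no memory of earlier turns.

```python
import re

# Hypothetical pattern/response rules; a real ELIZA script is much larger.
RULES = [
    (re.compile(r".*\bI am (.*)", re.IGNORECASE), r"Why do you say you are \1?"),
    (re.compile(r".*\bI need (.*)", re.IGNORECASE), r"What would it mean to you if you got \1?"),
    (re.compile(r".*\bmy (mother|father)\b.*", re.IGNORECASE), r"Tell me more about your \1."),
]
DEFAULT_REPLY = "Please go on."


def respond(sentence: str) -> str:
    """Search for the first matching pattern and apply its replacement."""
    text = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.fullmatch(text)
        if match:
            return match.expand(template)
    return DEFAULT_REPLY


def eliza_loop() -> None:
    """Read a line, search/replace, print. No state is kept between turns."""
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        if line.strip().lower() in {"quit", "bye"}:
            break
        print(respond(line))
```

Calling `respond("I am sad")` picks the first rule and echoes the captured text back inside the template; `eliza_loop()` wraps that in the interactive loop.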
8 Dialogue System Architecture
9 ASR engine
- ASR: Automatic Speech Recognition
- Job of an ASR system is to go from speech (telephone or microphone) to words
- We will be studying this in a few weeks
10 ASR Overview (pic from Yook 2003)
11 ASR in Dialogue Systems
- ASR systems work better if we can constrain what words the speaker is likely to say.
- A dialogue system often has these constraints
- System: What city are you departing from?
- Can expect sentences of the form
- I want to (leave|depart) from CITYNAME
- From CITYNAME
- CITYNAME
- etc.
12 ASR in Dialogue Systems
- Also, can adapt to speaker
- But!! ASR is errorful
- So unlike ELIZA, we can't count on the words being correct
- As we will see, this fact about error plays a huge role in dialogue system design
13 Natural Language Understanding
- Also called NLU
- We will discuss this later in the quarter
- There are many ways to represent the meaning of sentences
- For speech dialogue systems, perhaps the most common is a simple one called "frame and slot semantics".
- Semantics = meaning
14 An example of a frame
- Show me morning flights from Boston to SF on Tuesday.
- SHOW:
  - FLIGHTS:
    - ORIGIN:
      - CITY: Boston
      - DATE: Tuesday
      - TIME: morning
    - DEST:
      - CITY: San Francisco
15 How to generate this semantics?
- Many methods, as we will see in week 9
- Simplest: semantic grammars
- LIST -> show me | I want | can I see
- DEPARTTIME -> (after|around|before) HOUR | morning | afternoon | evening
- HOUR -> one | two | three | ... | twelve (am|pm)
- FLIGHTS -> (a) flight | flights
- ORIGIN -> from CITY
- DESTINATION -> to CITY
- CITY -> Boston | San Francisco | Denver | Washington
16 Semantics for a sentence
- LIST: Show me
- FLIGHTS: flights
- ORIGIN: from Boston
- DESTINATION: to San Francisco
- DEPARTDATE: on Tuesday
- DEPARTTIME: morning
17 Frame-filling
- We use a parser (week 10) to take these rules and apply them to the sentence.
- Resulting in a semantics for the sentence
- We can then write some simple code
- That takes the semantically labeled sentence
- And fills in the frame.
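The "simple code" step might look roughly like the sketch below. Note the substitution: instead of a real parser it uses one regular expression per slot, which is enough for the toy grammar on the slides but is not the method the course covers in week 10. The `DAY` word list and the function name `fill_frame` are assumptions added for illustration.

```python
import re

# Toy lexicons standing in for the CITY and DEPARTTIME rules on the slides.
CITY = r"(?:boston|san francisco|denver|washington)"
TIME = r"(?:morning|afternoon|evening)"
# Assumed day lexicon; the slides do not spell out a DEPARTDATE rule.
DAY = r"(?:monday|tuesday|wednesday|thursday|friday|saturday|sunday)"

# Each slot is paired with a pattern whose named group captures the filler.
SLOT_PATTERNS = {
    "ORIGIN": re.compile(rf"\bfrom (?P<value>{CITY})\b"),
    "DESTINATION": re.compile(rf"\bto (?P<value>{CITY})\b"),
    "DEPARTDATE": re.compile(rf"\bon (?P<value>{DAY})\b"),
    "DEPARTTIME": re.compile(rf"\b(?P<value>{TIME})\b"),
}


def fill_frame(sentence: str) -> dict:
    """Return a frame mapping each slot to its filler, or None if unfilled."""
    text = sentence.lower()
    frame = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        frame[slot] = match.group("value") if match else None
    return frame
```

Slots the sentence does not mention stay `None`, which is what a dialogue manager would later use to decide which question to ask.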
18 Other NLU Approaches
- Cascade of finite-state transducers
- Instead of a parser, we could use FSTs, which are very fast, to create the semantics.
- Or we could use syntactic rules with semantic attachments
- The latter is what is done in VoiceXML, so we will see that today.
19 Generation and TTS
- Won't say much about this today
- TTS next week!
- Generation: two main approaches
- Simple templates (prescripted sentences)
- Unification: use similar grammar rules as for parsing, but run them backwards!
20 Dialogue Manager
- ELIZA was the simplest dialogue manager
- Read-search/replace-print loop
- No state was kept; system did the same thing on every sentence
- A real dialogue manager needs to keep state
- We can't keep asking the same question over and over!
21 Three architectures for dialogue management
- Finite State
- Frame-based
- Planning Agents
22 Finite State Dialogue Manager
23 Finite-state dialogue managers
- System completely controls the conversation with the user.
- It asks the user a series of questions
- Ignoring (or misinterpreting) anything the user says that is not a direct answer to the system's questions
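As a rough illustration, a finite-state manager can be written as a table of states, each with one prompt and one successor. The state names, prompts, and accepted vocabularies below are hypothetical; the behavior to notice is that any reply that is not a direct answer is ignored and the same question is asked again.

```python
# Each state has a prompt and the name of the next state; the order is fixed.
STATES = {
    "ask_origin": ("What city are you leaving from?", "ask_dest"),
    "ask_dest": ("Where are you going?", "ask_date"),
    "ask_date": ("What day would you like to leave?", "done"),
}
# Hypothetical vocabularies: the only answers each state accepts.
VALID = {
    "ask_origin": {"boston", "denver", "seattle"},
    "ask_dest": {"boston", "denver", "seattle"},
    "ask_date": {"monday", "tuesday", "wednesday"},
}


def run_dialogue(user_turns):
    """Drive the fixed question sequence over a list of user replies.

    Returns (answers, transcript). A reply that is not a direct answer
    to the current question is ignored and the question is asked again.
    """
    state, answers, transcript = "ask_origin", {}, []
    turns = iter(user_turns)
    while state != "done":
        prompt, next_state = STATES[state]
        transcript.append(prompt)
        reply = next(turns, None)
        if reply is None:  # no more user input
            break
        word = reply.strip().lower()
        if word in VALID[state]:
            answers[state] = word
            state = next_state  # only a valid answer advances the FSA
    return answers, transcript
```

Feeding it a reply such as "I want to leave Tuesday" while it is asking for the destination shows the limitation: the date is thrown away and the destination question is repeated.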
24 Dialogue Initiative
- Initiative means who has control of the conversation at any point
- Single initiative
- System
- User
- Mixed initiative
25 System Initiative
- Systems which completely control the conversation at all times are called system initiative.
- Advantages
- Simple to build
- User always knows what they can say next
- System always knows what user can say next
- Known words: better performance from ASR
- Known topic: better performance from NLU
- Disadvantage
- Too limited
26 User Initiative
- User directs the system
- Generally, user asks a single question, system answers
- System can't ask questions back, engage in clarification dialogue, confirmation dialogue
- Used for simple database queries
- User asks question, system gives answer
- Web search is user initiative dialogue.
27 Problems with System Initiative
- Real dialogue involves give and take!
- In travel planning, users might want to say something that is not the direct answer to the question.
- For example, answering more than one question in a sentence
- Hi, I'd like to fly from Seattle Tuesday morning
- I want a flight from Milwaukee to Orlando one way leaving after 5 p.m. on Wednesday.
28 Single initiative + universals
- We can give users a little more flexibility by adding universal commands
- Universals: commands you can say anywhere
- As if we augmented every state of FSA with these
- Help
- Correct
- This describes many implemented systems
- But still doesn't deal with mixed initiative
29 Mixed Initiative
- Conversational initiative can shift between system and user
- Simplest kind of mixed initiative: use the structure of the frame itself to guide dialogue
- Slot: Question
- ORIGIN: What city are you leaving from?
- DEST: Where are you going?
- DEPT DATE: What day would you like to leave?
- DEPT TIME: What time would you like to leave?
- AIRLINE: What is your preferred airline?
30 Frames are mixed-initiative
- User can answer multiple questions at once.
- System asks questions of user, filling any slots that user specifies
- When frame is filled, do database query
- If user answers 3 questions at once, system has to fill slots and not ask these questions again!
- Anyhow, we avoid the strict constraints on order of the finite-state architecture.
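A small sketch of this frame-driven control: every user turn may fill any slot it mentions, and the next question comes from the first slot that is still empty. The regular expressions are toy stand-ins for a real NLU component, and the slot names `DEPT_DATE`, the city list, and the function names are assumptions made for the example.

```python
import re

# Slot -> question, in the order the system would ask them.
QUESTIONS = {
    "ORIGIN": "What city are you leaving from?",
    "DEST": "Where are you going?",
    "DEPT_DATE": "What day would you like to leave?",
}
# Hypothetical toy patterns standing in for a real NLU component.
CITY = r"(boston|denver|seattle|orlando|milwaukee)"
PATTERNS = {
    "ORIGIN": re.compile(rf"\bfrom {CITY}\b"),
    "DEST": re.compile(rf"\bto {CITY}\b"),
    "DEPT_DATE": re.compile(r"\b(?:on )?(monday|tuesday|wednesday|thursday|friday)\b"),
}


def update_frame(frame: dict, utterance: str) -> None:
    """Fill every slot the utterance mentions, whatever question was asked."""
    text = utterance.lower()
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            frame[slot] = match.group(1)


def next_question(frame: dict):
    """Return the question for the first unfilled slot, or None when full."""
    for slot, question in QUESTIONS.items():
        if frame.get(slot) is None:
            return question
    return None
```

When `next_question` returns `None` the frame is full and the database query can run. This toy version only recognizes cities introduced by "from" or "to"; a bare answer such as "Seattle" would need a grammar that takes the pending question into account.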
31 Multiple frames
- Flights, hotels, rental cars
- Flight legs: each flight can have multiple legs, which might need to be discussed separately
- Presenting the flights (if there are multiple flights meeting user's constraints)
- It has slots like 1ST_FLIGHT or 2ND_FLIGHT so user can ask "How much is the second one?"
- General route information
- Which airlines fly from Boston to San Francisco?
- Airfare practices
- Do I have to stay over Saturday to get a decent airfare?
32 Multiple Frames
- Need to be able to switch from frame to frame
- Based on what user says.
- Disambiguate which slot of which frame an input
is supposed to fill, then switch dialogue control
to that frame.
33 VoiceXML
- Voice eXtensible Markup Language
- An XML-based dialogue design language
- Makes use of ASR and TTS
- Deals well with simple, frame-based mixed initiative dialogue.
- Most common in commercial world (too limited for research systems)
- But useful to get a handle on the concepts.
34 Voice XML
- Each dialogue is a <form>. (Form is the VoiceXML word for frame)
- Each <form> generally consists of a sequence of <field>s, with other commands
35 Sample vxml doc
    <form>
      <field name="transporttype">
        <prompt>
          Please choose airline, hotel, or rental car.
        </prompt>
        <grammar type="application/x-nuance-gsl">
          [airline hotel "rental car"]
        </grammar>
      </field>
      <block>
        <prompt>
          You have chosen <value expr="transporttype"/>.
        </prompt>
      </block>
    </form>
36 VoiceXML interpreter
- Walks through a VXML form in document order
- Iteratively selecting each item
- If multiple fields, visit each one in order.
- Special commands for events
37 Another vxml doc (1)
    <noinput>
      I'm sorry, I didn't hear you. <reprompt/>
    </noinput>
    <nomatch>
      I'm sorry, I didn't understand that. <reprompt/>
    </nomatch>
38 Another vxml doc (2)
    <form>
      <block> Welcome to the air travel consultant. </block>
      <field name="origin">
        <prompt> Which city do you want to leave from? </prompt>
        <grammar type="application/x-nuance-gsl">
          [(san francisco) denver (new york) barcelona]
        </grammar>
        <filled>
          <prompt> OK, from <value expr="origin"/> </prompt>
        </filled>
      </field>
39 Another vxml doc (3)
      <field name="destination">
        <prompt> And which city do you want to go to? </prompt>
        <grammar type="application/x-nuance-gsl">
          [(san francisco) denver (new york) barcelona]
        </grammar>
        <filled>
          <prompt> OK, to <value expr="destination"/> </prompt>
        </filled>
      </field>
      <field name="departdate" type="date">
        <prompt> And what date do you want to leave? </prompt>
        <filled>
          <prompt> OK, on <value expr="departdate"/> </prompt>
        </filled>
      </field>
40 Another vxml doc (4)
      <block>
        <prompt> OK, I have you are departing from
          <value expr="origin"/> to <value expr="destination"/> on <value expr="departdate"/>
        </prompt>
        <!-- send the info to book a flight... -->
      </block>
    </form>
41 A mixed initiative VXML doc
- Mixed initiative: user might answer a different question
- So VoiceXML interpreter can't just evaluate each field of form in order
- User might answer field2 when system asked field1
- So need grammar which can handle all sorts of input
- Field1
- Field2
- Field1 and Field2
- etc.
42 VXML Nuance-style grammars
- Rewrite rules
- Wantsentence -> I want to (fly|go)
- Nuance VXML format is
- () for concatenation, [] for disjunction
- Each rule has a name
- Wantsentence (I want to [fly go])
- Airports [(san francisco) denver]
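To make the two operators concrete, the sketch below mirrors them with Python data structures: a tuple plays the role of GSL's `( ... )` concatenation and a list plays the role of `[ ... ]` disjunction. This is only an illustration of what the rules generate, not a parser for the Nuance format; the function name `expand` is made up for the example.

```python
from itertools import product


def expand(rule):
    """Yield every word sequence a rule generates.

    str   -> a single word
    tuple -> concatenation, like GSL ( ... )
    list  -> disjunction, like GSL [ ... ]
    """
    if isinstance(rule, str):
        yield rule
    elif isinstance(rule, list):
        for alternative in rule:
            yield from expand(alternative)
    elif isinstance(rule, tuple):
        # Expand each part, then take every combination in order.
        parts = [list(expand(part)) for part in rule]
        for combo in product(*parts):
            yield " ".join(combo)
    else:
        raise TypeError(f"unsupported rule element: {rule!r}")


# The two named rules from the slide, transcribed into this representation.
WANT_SENTENCE = ("i", "want", "to", ["fly", "go"])
AIRPORTS = [("san", "francisco"), "denver"]
```

Enumerating `expand(WANT_SENTENCE)` lists the sentences the rule accepts, which is a handy way to check a small grammar by eye before handing it to a recognizer.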
43 Mixed-init VXML example (3)
    <noinput> I'm sorry, I didn't hear you. <reprompt/> </noinput>
    <nomatch> I'm sorry, I didn't understand that. <reprompt/> </nomatch>
    <form>
      <grammar type="application/x-nuance-gsl">
        <![CDATA[
44 Grammar
        Flight ( ?[
                   (i [wanna (want to)] [fly go])
                   (i'd like to [fly go])
                   ([(i wanna) (i'd like a)] flight)
                 ]
                 [
                   ( [from leaving departing] City:x ) {<origin $x>}
                   ( [(?going to) (arriving in)] City:x ) {<dest $x>}
                   ( [from leaving departing] City:x
                     [(?going to) (arriving in)] City:y ) {<origin $x> <dest $y>}
                 ]
                 ?please
               )
45 Grammar
        City [ [(san francisco) (s f o)] {return( "san francisco, california")}
               [(denver) (d e n)] {return( "denver, colorado")}
               [(seattle) (s t x)] {return( "seattle, washington")}
             ]
        ]]> </grammar>
46 Grammar
      <initial name="init">
        <prompt> Welcome to the air travel consultant. What are your travel plans? </prompt>
      </initial>
      <field name="origin">
        <prompt> Which city do you want to leave from? </prompt>
        <filled>
          <prompt> OK, from <value expr="origin"/> </prompt>
        </filled>
      </field>
47 Grammar
      <field name="dest">
        <prompt> And which city do you want to go to? </prompt>
        <filled>
          <prompt> OK, to <value expr="dest"/> </prompt>
        </filled>
      </field>
      <block>
        <prompt> OK, I have you are departing from <value expr="origin"/>
          to <value expr="dest"/>. </prompt>
        <!-- send the info to book a flight... -->
      </block>
    </form>
48 Grounding and Confirmation
- Dialogue is a collective act performed by speaker and hearer
- Common ground: set of things mutually believed by both speaker and hearer
- Need to achieve common ground, so hearer must ground or acknowledge speaker's utterance.
- Clark (1996)
- Principle of closure: agents performing an action require evidence, sufficient for current purposes, that they have succeeded in performing it
49 Clark and Schaefer: Grounding
- Continued attention: B continues attending to A
- Relevant next contribution: B starts in on next relevant contribution
- Acknowledgement: B nods or says continuer like uh-huh, yeah, or an assessment (great!)
- Demonstration: B demonstrates understanding A by paraphrasing or reformulating A's contribution, or by collaboratively completing A's utterance
- Display: B displays verbatim all or part of A's presentation
51 Grounding examples
- Display
- C: I need to travel in May
- A: And, what day in May did you want to travel?
- Acknowledgement
- C: He wants to fly from Boston
- A: mm-hmm
- C: to Baltimore Washington International
52 Grounding Examples (2)
- Acknowledgement + next relevant contribution
- And, what day in May did you want to travel?
- And you're flying into what city?
- And what time would you like to leave?
54 Grounding and Dialogue Systems
- Grounding is not just a tidbit about humans
- Is key to design of conversational agent
- Why?
- HCI researchers find users of speech-based interfaces are confused when system doesn't give them an explicit acknowledgement signal
- Experiment with this
55 Confirmation
- Another reason for grounding
- Speech is a pretty errorful channel
- Hearer could misinterpret the speaker
- This is important in Conv. Agents
- Since we are using ASR, which is still really buggy.
- So we need to do lots of grounding and confirmation
56 Explicit confirmation
- S: Which city do you want to leave from?
- U: Baltimore
- S: Do you want to leave from Baltimore?
- U: Yes
57 Explicit confirmation
- U: I'd like to fly from Denver Colorado to New York City on September 21st in the morning on United Airlines
- S: Let's see then. I have you going from Denver Colorado to New York on September 21st. Is that correct?
- U: Yes
58 Implicit confirmation: display
- U: I'd like to travel to Berlin
- S: When do you want to travel to Berlin?
- U: Hi, I'd like to fly to Seattle Tuesday morning
- S: Traveling to Seattle on Tuesday, August eleventh in the morning. Your name?
59 Implicit vs. Explicit
- Complementary strengths
- Explicit: easier for users to correct system's mistakes (can just say "no")
- But explicit is cumbersome and long
- Implicit: much more natural, quicker, simpler (if system guesses right).
60 Implicit and Explicit
- Early systems: all-implicit or all-explicit
- Modern systems: adaptive
- How to decide?
- ASR system can give confidence metric.
- This expresses how convinced system is of its transcription of the speech
- If high confidence, use implicit confirmation
- If low confidence, use explicit confirmation
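The adaptive rule can be sketched in a few lines. This assumes the recognizer reports confidence as a number between 0 and 1, and the cutoff of 0.8 is an arbitrary illustrative value; neither detail comes from the lecture. The function name `confirm_origin` is likewise hypothetical.

```python
def confirm_origin(city: str, confidence: float, threshold: float = 0.8) -> str:
    """Choose the system's next utterance after recognizing an origin city.

    `confidence` is assumed to lie in [0, 1]; `threshold` is an
    illustrative cutoff, not a value given in the lecture.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        # Implicit confirmation: display the value inside the next question.
        return f"What day do you want to leave from {city}?"
    # Explicit confirmation: ask a yes/no question before moving on.
    return f"Do you want to leave from {city}?"
```

In practice the threshold would be tuned per system, since the cost of a wrong implicit confirmation (a longer repair dialogue) has to be weighed against the cost of asking an extra question.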
61 Next Lecture
- Dialogue acts
- More on VXML
- More on design of dialogue agents
- Evaluation of dialogue agents
- Don't forget to look at the homework early!!!!