1. Human-computer dialogue processing: where are we?
- Yorick Wilks
- Oxford Internet Institute
- and
- University of Sheffield
- www.dcs.shef.ac.uk/yorick
- Tartu, February 2005
2. Main points of talk
- Data-driven performance systems versus AI/linguistic theoretical systems, separate since the 1970s
- This distinction has mapped, till now, onto domain versus general systems
- Both will require modeling of intentional phenomena like belief, possibly speech acts, but not in the old way
- It is unclear whether speech technology alone can accomplish this
- Some Sheffield empirical work in dialogue technology
- The COMPANIONS development environment
3. Problems of dialogue theory
- It can easily be vacuous: dialogues are systems of turn-taking
- Speech act analysis has led to a rationalist tradition: implausibly deep levels of reasoning, and no data gathering
- There are apparently competing theories but little or no evaluation of dialogue systems (compare Information Extraction: Hobbs)
4. Three important historical systems
- Colby's PARRY (Stanford, 1971)
- Grosz's partitioned network focus system (SRI, 1979)
- Perrault, Cohen and Allen's speech act system (Toronto, 1979)
5. Colby's PARRY
- Perhaps the best performance ever: many users, robust, but not a normal subject (i.e. a paranoid)
- Primitive individual models, some control of the dialogue process, but it had lots to say!
- Primitive simulation of intentionality
- Not syntax but fast pattern matching
- Much better than ELIZA
6. A PARRY conversation
- Have you been hospitalized before?
- THIS IS THE FIRST TIME
- How long have you been there?
- ABOUT TWO WEEKS
- Any headaches?
- MY HEALTH IS FINE
- Do you have memory difficulties?
7. More PARRY
- NO
- Then can you tell me who is the President of the US?
- NIXON IS PRESIDENT
- And what day is today?
- TODAY IS THURSDAY -- CAN WE GET ON WITH THE INTERVIEW?
8. How is this achieved in PARRY?
- WHAT IS YOUR MAIN PROBLEM
- WHAT BE YOU MAIN PROBLEM
- BE YOU MAIN PROBLEM
- WHAT BE MAIN PROBLEM
- WHAT BE YOU PROBLEM ?
- WHAT BE YOU MAIN
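The canonicalise-then-delete scheme on this slide can be sketched as follows. This is a minimal reconstruction in Python, not Colby's actual code; the pattern table and the response string are invented for illustration:

```python
import re

# Stored patterns map a canonicalised word tuple to a canned response.
# (Pattern table and response invented for illustration.)
PATTERNS = {
    ("WHAT", "BE", "YOU", "MAIN", "PROBLEM"): "MY MAIN PROBLEM IS THE MAFIA",
}

# Canonical substitutions of the kind shown on the slide (IS -> BE, YOUR -> YOU).
CANONICAL = {"IS": "BE", "ARE": "BE", "AM": "BE", "YOUR": "YOU"}

def canonicalise(utterance):
    words = re.sub(r"[^A-Z ]", "", utterance.upper()).split()
    return tuple(CANONICAL.get(w, w) for w in words)

def match(utterance, patterns=PATTERNS):
    """Try the canonicalised input, then all successive one-word deletions."""
    candidates = [canonicalise(utterance)]
    while candidates:
        shorter = []
        for cand in candidates:
            if cand in patterns:
                return patterns[cand]
            # generate every one-word deletion of this candidate
            shorter.extend(cand[:i] + cand[i + 1:] for i in range(len(cand)))
        candidates = shorter  # exhaustive search; fine for a short sketch
    return None

print(match("WELL WHAT IS YOUR MAIN PROBLEM?"))  # matched after one deletion
```

The deletion step is what makes this kind of matching robust: extra words in the user's input cannot prevent a stored pattern from firing.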
9. Grosz's robot domain model
- A primitive notion of focus based on a structure of partitioned semantic nets (Hendrix)
- Unlike PARRY, has knowledge of a complex domain
- No real performance at all
10. Grosz (IJCAI 1979)
- (Explicit focus)
- S1: The lid is attached to the container with 1/2 bolts.
- R1: Where are the BOLTS?
- (Implicit focus)
- S1: Attach the lid to the container.
- R1: Where are the BOLTS?
11. The Perrault, Cohen and Allen system
- Based on speech act reasoning
- The user must have one of two goals: meeting or catching a train
- Passenger/User: Do you know when the Windsor train arrives?
- This is labelled a REQUEST, not a REQUEST-INFORM (Y/N), because the system knows the user knows it knows!
12. Perrault et al.
- The system has domain knowledge and reasoning power
- but virtually no performance
- was the first to assign speech act labels to dialogue items
- has a simple, rigid model of nested belief
13. Fixed nested beliefs: the passenger's view of the system's view of the passenger's beliefs
[Figure: three nested boxes, labelled outermost to innermost: passenger, system, passenger]
14. Perrault et al.
- They had the appropriate notion of extreme ellipsis and zero anaphor
- Passenger: The 3.15 train to Windsor?
- System: Gate 10
- Note: NOT "It leaves from Gate 10"
15. The 1970s division of approaches to machine conversation
- Domain-dependent systems with coded world knowledge and some parsing and reasoning
- VERSUS
- Wide, shallow systems with little knowledge and high performance
- Published academic AI systems were all in the first group
- Only the second group performed at all
16. Moving academic systems towards performance
- Best is Traum's TRAINS system, a descendant of Allen's work (the Toronto-Rochester tradition)
- uses corpora and retains reasoning
- moved to the movies in California!
17. (No transcript)
18. TRAINS corpus - David Traum (Rochester)
- utt1 s: hello can I help you
- utt2 u: yeah I want t- I want to determine the maximum number of boxcars of oranges by seven a.m. tomorrow morning
- utt3: so hm so I guess all the boxcars will have to go through oran- through Corning because that's where the orange juice orange factory is
- utt4: so from Corning to Bath how far is that
- utt5 s: two hours
- utt6 u: and it's gonna take us also an hour to load boxcars right
- utt7 s: right
- utt8 u: okay so hm so every trip will take at least three hours then
- utt9: um
- utt10 s: right we can unload any amount of cargo onto a train in one hour
- utt11: so we can so a maximum of three boxcars in an hour
19. AI people join the performers: the Loebner Competition
- A well-designed competition over the last ten years, essentially a modified Turing test
- Domain-dependent entries till 1997, then domain-free
- The game is for journalists to separate and rank human and machine partners
- Search "Loebner" on the web for full transcripts
20. Loebner 1994
- NEW JUDGE
- 1:50 Judge 1: Do you think the country is headed generally in a more conservative or more liberal direction right now?
- 1:51 Terminal: Didn't you say would you discribe that old texas billionaire as a conservative?? Answer me one question why are liberals portrayed as donkees? Are you a caring person? You're not really talking about me, are you?
- 1:52 Judge 1: I've got the feeling I've heard these answers before. Is our nation swinging leftward or rightward right now?
- Terminal: We are at the beginning of a strong and wonderful economic recovery, and we can thank the liberals!
21. Loebner 1997
- PROGRAM [10:18:04]: Hi, my name is Catherine, what's yours?
- JUDGE04 [10:18:32]: My name is Elliott.
- PROGRAM [10:18:34]: Well, it's nice to meet you, Elliott.
- PROGRAM [10:18:48]: Did you see that story on CNN last night about the lesbian
- PROGRAM [10:19:04]: couple who came out at a White House party on Sunday?
- JUDGE04 [10:19:55]: Yes, I did. I think it may all be a publicity stunt for Ellen.
- PROGRAM [10:20:01]: Ellen Degeneres was one of them she was kissing her lover
- PROGRAM [10:20:11]: in front of Bill Clinton. Do you think he was very
- PROGRAM [10:20:25]: embarrassed?
- JUDGE04 [10:21:00]: I think he want the gay vote. He will do anything to get it and keep it.
- JUDGE04 [10:21:15]: He does not have a conservative base.
- PROGRAM [10:21:21]: Are you still a Friend of Bill Clinton?
22. We won in 1997!
- The CONVERSE program pretended to be Catherine, a 34-year-old British journalist living in New York (to explain any usage problems).
- She could talk about 80 appropriate topics
- (80 = domain-free???) clothes, shopping, food, abortion, fitness, sex, etc.
23. Moving performance towards theory
- The Loebner Prize Competition: www.acm.org/~loebner/loebner-prize.html
24. The CONVERSE prototype, 1997
- Push-me-pull-you architecture
- strong driving top-down scripts (80) in a re-enterable network with complex output functions
- bottom-up parsing of user input adapted from a statistical prose parser
- minimal models of individuals
- contained WordNet and Collins PNs
- some learning from past Loebners + the BNC
25. CONVERSE architecture
26. Restarting Sheffield dialogue work in 2001
- An empirical corpus-based stochastic dialogue grammar that maps directly to dialogue acts and uses IE to match concepts with templates
- A better virtual machine for script-like objects encapsulating both the domain and the conversational strategy (cf. PARRY and Grosz), to maintain the push-pull approach
- The need for resources to build belief-system representations and quasi-linguistic models of dialogue structure, scripts, etc.
- A model of speakers, incrementally reaching VIEWGEN-style belief-ascription procedures, to give dialogue act reasoning functionality
27. (No transcript)
28. VIEWGEN: a belief model that computes agents' states
- Not a static nested belief structure like that of Perrault and Allen
- Computes other agents' RELEVANT states at the time of need
- Topic-restricted search for relevant information
- Can represent and maintain conflicting agent attitudes
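The three properties above (on-demand computation, topic restriction, tolerance of conflict) can be sketched in a few lines. This is an illustrative reconstruction, not the VIEWGEN implementation; all agent names, topics and beliefs below are invented:

```python
# All agent names, topics and beliefs are invented for illustration.
def ascribe(own_beliefs, known_other_beliefs, topic):
    """Compute another agent's belief environment for one topic, at need.

    Beliefs the other agent is known to hold win; otherwise the system's
    own topic-relevant beliefs are ascribed by default. Beliefs about
    other topics are never searched (topic-restricted search).
    """
    env = {k: v for k, v in known_other_beliefs.items() if k[0] == topic}
    for key, value in own_beliefs.items():
        if key[0] == topic and key not in env:  # default ascription
            env[key] = value
    return env

system_beliefs = {
    ("train", "windsor_gate"): "gate 10",
    ("train", "windsor_arrives"): "3.15",
    ("weather", "today"): "rain",
}
# The passenger is known to hold a conflicting belief about the gate.
passenger_beliefs = {("train", "windsor_gate"): "unknown"}

view = ascribe(system_beliefs, passenger_beliefs, "train")
print(view)
```

The conflicting belief survives in the computed view while everything else topic-relevant is filled in by default, which is what distinguishes this from a static nested-belief table.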
29. VIEWGEN as a knowledge basis for reference/anaphora resolution procedures
- Not just pronouns, but grounding of descriptive phrases in a knowledge basis
- Reconsider finding the ground of "that old Texas billionaire" as Ross Perot, against a background of what the hearer may assume the speaker knows when he says that
30. (No transcript)
31. Belief representation of the first turn of an exchange: "I have just deleted the subdirectory"
[Figure: VIEWGEN belief boxes for this turn, showing
- cause(deleted(x, sub-directory), not(happy(david)))
- the system's belief: deleted(system, sub-directory)
- simon's belief, in the system's view: deleted(system, sub_directory)
- the system's goal, with candidate planning actions for the system (plan recognition / perform action)]
32. CONVERSE and VIEWGEN: references
- For VIEWGEN: Ballim, A. and Wilks, Y. (1991) Artificial Believers: The Ascription of Belief. Erlbaum, Hillsdale, NJ.
- For CONVERSE and contemporary paradigms (esp. in industry): Proceedings of the Bellagio Workshops on Human-Machine Conversation, 1997, 1998, 2000.
33. Meanwhile, the world had moved on and empiricism had reached dialogue processing
- Dialogue act-to-utterance learning using machine learning over n-grams and preceding dialogue acts (cf. Samuel et al. 1998)
- Speech act sequence statistics from Verbmobil (cf. Maier and Reithinger)
- Longman BNC dialogue n-grams (50, all domains)
- Tagged Switchboard, Verbmobil, BNC and domain corpora
34. And especially, Steve Young (2002)
- ASR is now a mature, statistically based technology
- In other areas of dialogue, statistical methods are less mature
- A complete dialogue system can be seen as a Partially Observable Markov process
- Subcomponents can be observed in turn, with intermediate variables
35. Young's statistical modules
- Speech understanding
- Semantic decoding
- Dialogue act detection
- Dialogue management and control
- Speech generation
- i.e. roughly the same as everyone else's!!
36. (No transcript)
37. The strategy is not like Jelinek's MT strategy of 1989!
- which was non-linguistic (even anti-linguistic), with no intermediate representations hypothesised
- Young assumes roughly the same intermediate objects as we do, but in very simplified forms
- The aim is to obtain training data for all of them, so that the whole process becomes a single throughput Markov model
38. Young concedes this model may only be for simple domains
- His domain is a pizza-ordering system
- A typical DialogueAct+Semantics could therefore be:
- Purchase_Request(qty=2, topping=pepperoni)
39. Speech understanding
- Classic mapping of waveforms Y to dialogue acts Au, given system beliefs Bs: seek
- argmax_Au P(Au | Y, Bs), which expands to
- argmax_Au P(Y | Au, Bs) P(Au | Bs)
- Then the intermediate word sequence W is introduced, giving
- P(Y | Au, Bs) = Σ_W P(Y, W | Au, Bs)
- This is a redescription of what empirical NLP people had been doing for five years, but with no mention of them!
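The decomposition above can be illustrated with toy numbers. Everything in this sketch is invented (the acts, word strings and probabilities); it only shows the shape of the computation, with the sum over word sequences W made explicit:

```python
# Invented toy numbers for one fixed acoustic signal Y:
# P(Y, W | Au, Bs) per dialogue act Au and word string W,
# plus a prior P(Au | Bs) from the system state.
p_y_w_given_a = {
    "REQUEST": {"when does it arrive": 0.06, "when is it arriving": 0.02},
    "INFORM": {"it arrives at ten": 0.01},
}
p_a_given_b = {"REQUEST": 0.7, "INFORM": 0.3}

def best_act():
    """argmax over Au of P(Au | Bs) * sum_W P(Y, W | Au, Bs)."""
    scores = {a: p_a_given_b[a] * sum(p_y_w_given_a[a].values())
              for a in p_a_given_b}
    return max(scores, key=scores.get), scores

act, scores = best_act()
print(act)  # REQUEST wins: 0.7 * (0.06 + 0.02) = 0.056 vs 0.3 * 0.01 = 0.003
```

Marginalising over W is what lets two different recognised word strings both count as evidence for the same dialogue act.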
40. This breaks into two parts
- A classic two-stage matching of sounds to a word lattice (i.e. ASR)
- Followed by the mapping of the dialogue acts to the words, given the system beliefs (i.e. the system state in general)
- The dialogue state is crucial for handling ambiguity and identifying underspecified dialogue acts
41. Introduction of concepts as well as DAs
- It is convenient to introduce another intermediate representation: C represents the set of semantic concepts encoded within W
- So P(Au | W, Bs) = Σ_C P(Au | C, Bs) P(C | W, Bs)
- i.e. training for concepts given words + states
- and dialogue acts given concepts + states
- Experience suggests both can be trained directly on the words!
42. (No transcript)
43. Semantic decoding
- Means adding one concept per main input word (!) by means of a semantic template grammar, such as:
- PIZZA -> QTY TOPPING pizza(s)
- A parse tree is an option if recursive syntactic structure is required, over simple rule matching, where concepts label states and concept bigram probabilities are computed
- A parse tree gives a Hidden Understanding Model using lexical and concept bigrams
- Or learn the simple grammar templates above directly
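A hedged sketch of how such a template grammar might be applied. The rules and concept labels below are illustrative, not taken from Young's pizza system: simple patterns map word spans to concept labels, giving roughly one concept per content word:

```python
import re

# Illustrative rules only: each pattern maps a word span to a concept label.
RULES = [
    (r"\b(\d+|one|two|three)\b", "QTY"),
    (r"\b(pepperoni|mushroom|ham)\b", "TOPPING"),
    (r"\bpizzas?\b", "PIZZA"),
]

def decode(utterance):
    """Return (concept, word) pairs in utterance order."""
    concepts = []
    for pattern, label in RULES:
        for m in re.finditer(pattern, utterance.lower()):
            concepts.append((m.start(), label, m.group()))
    return [(label, word) for _, label, word in sorted(concepts)]

print(decode("I'd like two pepperoni pizzas"))
# [('QTY', 'two'), ('TOPPING', 'pepperoni'), ('PIZZA', 'pizzas')]
```

This is the flat, non-recursive case from the slide; the parse-tree option would come in only when the concepts need to nest.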
44. Dialogue act detection
- A range of methods (including MDL) for solving argmax_Au P(Au | C, Bs)
- Interesting that this is not thought to depend on the WORDS (as most people do it)
- No reference to linguistic methods for this (since Samuel et al. 1998)
45. Dialogue management
- This is the one where it is hard to see how he can get non-trivial data
- Data can be seen as a transition matrix of system states S against system actions As, filling the matrix cells with new system states
- The model of training is reinforcement learning, ascribed to Pieraccini and then to Walker
- No evidence of what such training changes in practice for a non-trivial system
- The typical system S will be intractably large and must be approximated
- Puzzle: the user's beliefs cannot be directly observed and must therefore be inferred. True, but...
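The transition-matrix-plus-reinforcement-learning idea can be made concrete with a toy example. The states, actions, transitions and reward below are all invented (a deliberately trivial slot-filling domain), and tabular Q-learning stands in for whatever training scheme a real system would use:

```python
import random

random.seed(0)
STATES = ["start", "got_qty", "got_topping", "done"]
ACTIONS = ["ask_qty", "ask_topping", "confirm"]

def step(state, action):
    """Deterministic toy transition matrix; reward only on completion."""
    table = {("start", "ask_qty"): "got_qty",
             ("got_qty", "ask_topping"): "got_topping",
             ("got_topping", "confirm"): "done"}
    nxt = table.get((state, action), state)  # wrong actions leave state unchanged
    return nxt, (1.0 if nxt == "done" else 0.0)

# Tabular Q-learning over the (state, action) matrix.
q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
for _ in range(2000):
    s = "start"
    while s != "done":
        a = random.choice(ACTIONS)           # pure exploration
        s2, r = step(s, a)
        best_next = max(q[(s2, a2)] for a2 in ACTIONS)
        q[(s, a)] += 0.5 * (r + 0.9 * best_next - q[(s, a)])
        s = s2

policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES[:-1]}
print(policy)
```

The slide's worry is visible even here: this only works because the state set has four members; a realistic system state would make the table intractably large, forcing approximation.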
46. Questions about the speech program for dialogue
- Is this just a description of the empirical NLP program of work, or an attempted reduction like:
- Jelinek's MT without linguists
- Pollack's RAAM for (Fodor's) syntactic recursion
- Can data be found to reduce all of a (fairly general) DM to a transition space that can be reward-trained?
- What is the real effect of training a DM, however found? Reducing unused paths?
- How, in principle, to express changes to planning and belief spaces?
47. Back to work in NLP: learning to tag for Dialogue Acts - initial work
- Reithinger and Klesen (1997): n-gram language modelling, Verbmobil corpus (75%)
- Samuel et al. (1998): TBL learning on n-gram DA cues, Verbmobil corpus (75%)
- Stolcke et al. (2000): more complex n-grams (including over DA sequences), more complex Switchboard corpus (71%)
48. Work at Sheffield: starting with a naive classifier for DAs
- Direct predictivity of DAs by n-grams, as a preprocess to any ML algorithm
- Get P(d | n) for all 1-4-word n-grams n and the DA set d over the Switchboard corpus, and take the DA indicated by the n-gram with the highest predictivity (with a threshold on probability levels)
- Do 10-fold cross-validation (which lowers scores)
- Gives 63% over Switchboard, but using only a fraction of the data Stolcke needed
- Work by Nick Webb (in the EC AMITIES project)
49. Extending the pretagging with TBL
- Gives 72% (Stolcke's 71%) over the Switchboard data, but only 3% is due to TBL (rather than the naive classifier)
- Samuel was unable to see what TBL was doing for him
- This is just a base for a range of more complex ML algorithms (e.g. via WEKA)
- The figure is the same as Stolcke's without the cross-validation
50. Design of a Dialogue Action Manager at Sheffield
- A general-purpose DAM where domain-dependent features are separated from the control structure
- The domain-dependent features are stored as Dialogue Action Forms (DAFs), which are similar to Augmented Transition Networks (ATNs)
- The DAFs represent general-purpose dialogue manoeuvres as well as application-specific knowledge
- The control mechanism is based on a basic stack structure where DAFs are pushed and popped during the course of a user session
51. The structure of the DAM
- The control mechanism together with the DAFs provides a flexible means of guiding the user through the system goals (allowing for topic change and barge-in where needed)
- User push is given by the ability to suspend and stack a new DAF at any point (for a topic change or any user manoeuvre)
- System push is given by the prestacked DAFs corresponding to what the system wants to show or elicit
52. DAM implementation
- The core mechanism for the DAM is a simple push-pop stack (slightly augmented), onto which structures are loaded and run
- The structures are Dialogue Action Forms (DAFs), which are implemented as Augmented Transition Networks (ATNs)
- ATNs are transition networks that can function at any level, from finite-state machines up to Turing-machine power
53. DAM implementation 2
- The stack is preloaded with DAFs that satisfy the overall goal of designing a bathroom in the COMIC system
- The control structure has a preference for continuing to run the DAFs on the stack (system initiative) unless the user input is outside the scope of the current DAF (user initiative), in which case it pushes onto the stack the new DAF that most closely matches the user input
54. DAM implementation 3
- DAFs model the individual topics and conversational manoeuvres in the application domain
- The stack is preloaded with those DAFs which are necessary for the COMIC bathroom design task, and the dialogue ends when the Goodbye DAF is popped
- The DAFs and the stack interpreter together control the flow of the dialogue
55. Dialogue management: the stack
- Work by Catizone and Setzer (EU COMIC project)
- Contrast with work at CSLI (Stanford) by Peters and Lemon
[Stack, top to bottom: Greeting DAF / Room measurement DAF / Style DAF / Good-bye DAF]
56. Dialogue management: the augmented stack
- What happens if we match and push onto the stack a DAF that is already on the stack (further down)?
- In this case, we run the DAF, and it is then either satisfied (poppable) or not
- If it is, then on popping we remove any (unrun/prestored) copies of the DAF below it on the stack (a slight augmentation here of a simple stack!)
57. An augmented stack 2
- If it is not satisfied (almost certainly because of context-dependent DAFs still to come on the stack), the DAF traps the response, recommends waiting to discuss that topic, and is popped. The copy of the same DAF lower down the stack is then accessed and run later, at the appropriate place in the dialogue (this is one of a set of possible strategies with which we shall experiment).
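The pop-and-discard augmentation described on these slides can be sketched like this. Class and method names are invented; real DAFs are ATNs, stubbed here as a simple keyword test:

```python
class DAF:
    """Stub for a Dialogue Action Form; a real one is an ATN."""
    def __init__(self, topic):
        self.topic = topic
        self.satisfied = False

    def run(self, user_input):
        # Stand-in for traversing the DAF's transition network.
        self.satisfied = self.topic in user_input
        return self.satisfied

class DialogueActionManager:
    def __init__(self, preloaded):
        self.stack = list(preloaded)  # top of stack = end of list

    def step(self, user_input):
        daf = self.stack[-1]
        if daf.run(user_input):
            self.stack.pop()
            # The augmentation: on a satisfied pop, discard any unrun
            # copies of the same topic lower down the stack.
            self.stack = [d for d in self.stack if d.topic != daf.topic]
        return [d.topic for d in self.stack]

dam = DialogueActionManager(
    [DAF("goodbye"), DAF("style"), DAF("style"), DAF("measurement")])
print(dam.step("room measurement done"))       # pops 'measurement'
print(dam.step("the style I want is modern"))  # pops both 'style' DAFs
```

The duplicate-removal step is exactly what keeps a topic, once satisfied, from being reopened by a stale prestored copy further down.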
58. An augmented stack 3
- The result we would like from this mechanism is that a relevant and partly run DAF can be forced to the top of the stack and all intermediate partly run DAFs discarded (cf. Grosz's notion of conversational topics being closed and unreachable for restarting).
59. DAF example
60. Learning to segment the dialogue corpora?
- Segmenting the corpora we have with a range of tiling-style algorithms (initially by topic)
- The aim is to segment them plausibly, hopefully into segments that correspond to structures for DM (Dialogue Action Frames in our naming)
- Being done on the annotated corpus (i.e. a word model of the corpus) and on the corpus annotated with Information Extraction semantic tags (a semantic model of the corpus)
- Repetitions of DA + semantics template patterns may make an MDL packing of the corpus possible, as an alternative segmentation method
61. Sheffield dialogue research challenges
- Will a dialogue manager raise the 75% DA ceiling top-down?
- Multimodal dialogue managers: are they completely independent of the modality? Are they really language-independent?
- What is a good virtual machine for running a dialogue engine? Do DAFs provide a robust and efficient mechanism for doing dialogue management?
- Will they offer any interesting discoveries on stack access to, and discard of, incomplete topics (cf. stacks and syntax)?
62. More research challenges
- Applying machine learning to transcripts so as to determine the content of dialogue management, i.e. the scope and content of candidate DAFs
- Can the state set of DAFs and a stack be trained with reinforcement learning?
- Can we add a strong belief/planning component to this and populate it empirically?
- Fusion with QA
63. Why the dialogue task is still hard
- Where am I in the conversation: what is being talked about now, and what do they want?
- Does topic stereotypy help, or are Finite-State pairs enough (VoiceXML? TRINDIKIT?)?
- How to gather the beliefs/knowledge required, preferably from existing sources?
- Are there distinctive procedures for managing conversations that can be modelled by simple virtual machines?
- How to learn the structures we need (assuming we do): how to get and annotate the data?
64. How this research is funded
- AMITIES is an EU-US cooperative R&D project (the first in NLP), 2001-2005, to automate call centres:
- University of Sheffield (UK prime)
- SUNY Albany (US prime)
- Duke U. (US)
- LIMSI Paris (Fr)
- IBM (US)
- COMIC is an EU R&D project (2001-2005) to model multimodal dialogue:
- Max Planck Inst. (Nijmegen) (Coordinator)
- University of Edinburgh
- Max Planck Inst. (Tuebingen)
- KUL Nijmegen
- University of Sheffield
- ViSoft GmbH
65. There are now four, not two, competing approaches
- Logic-based systems with reasoning (old and still unvalidated by performance)
- Extensions of speech engineering methods: machine learning and simple intermediate structure
- Simple hand-coded finite-state systems in VoiceXML (chatbots and commercial systems)
- Rational hybrids based on structure and machine learning