1. Human-computer dialogue processing: where are we?
- Yorick Wilks
- Oxford Internet Institute
- and
- University of Sheffield
- www.dcs.shef.ac.uk/yorick
- Tartu, February 2005
2. Main points of talk
- Data-driven performance systems versus AI/linguistic theoretical systems, separate since the 1970s
- This distinction has mapped, till now, onto domain versus general systems
- Both will require modeling of intentional phenomena like belief, possibly speech acts, but not in the old way
- It is unclear whether speech technology alone can accomplish this
- Some Sheffield empirical work in dialogue technology
- The COMPANIONS development environment
3. Problems of dialogue theory
- It can easily be vacuous: dialogues are systems of turn-taking
- Speech act analysis has led to a rationalist tradition: implausibly deep levels of reasoning, and no data gathering
- There are apparently competing theories but little or no evaluation of dialogue systems (compare Information Extraction: Hobbs)
4. Three important historical systems
- Colby's PARRY (Stanford, 1971)
- Grosz's partitioned network focus system (SRI, 1979)
- Perrault, Cohen and Allen's speech act system (Toronto, 1979)
5. Colby's PARRY
- Perhaps the best performance ever: many users, robust, but not a normal subject (i.e. a paranoid)
- Primitive individual models, some control of the dialogue process, but it had lots to say!
- Primitive simulation of intentionality
- Not syntax but fast pattern matching
- Much better than ELIZA
6. A PARRY conversation
- Have you been hospitalized before?
- THIS IS THE FIRST TIME
- How long have you been there?
- ABOUT TWO WEEKS
- Any headaches?
- MY HEALTH IS FINE
- Do you have memory difficulties?
7. More PARRY
- NO
- Then can you tell me who is the President of the US?
- NIXON IS PRESIDENT
- And what day is today?
- TODAY IS THURSDAY -- CAN WE GET ON WITH THE INTERVIEW?
8. How is this achieved in PARRY?
- WHAT IS YOUR MAIN PROBLEM
- WHAT BE YOU MAIN PROBLEM
- BE YOU MAIN PROBLEM
- WHAT BE MAIN PROBLEM
- WHAT BE YOU PROBLEM ?
- WHAT BE YOU MAIN
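The canonicalise-then-delete scheme on this slide can be sketched as follows. This is a minimal reconstruction in Python, not Colby's actual code; the pattern table and the response string are invented for illustration:

```python
import re

# Stored patterns map a canonicalised word tuple to a canned response.
# (Pattern table and response invented for illustration.)
PATTERNS = {
    ("WHAT", "BE", "YOU", "MAIN", "PROBLEM"): "MY MAIN PROBLEM IS THE MAFIA",
}

# Canonical substitutions of the kind shown on the slide (IS -> BE, YOUR -> YOU).
CANONICAL = {"IS": "BE", "ARE": "BE", "AM": "BE", "YOUR": "YOU"}

def canonicalise(utterance):
    words = re.sub(r"[^A-Z ]", "", utterance.upper()).split()
    return tuple(CANONICAL.get(w, w) for w in words)

def match(utterance, patterns=PATTERNS):
    """Try the canonicalised input, then all successive one-word deletions."""
    candidates = [canonicalise(utterance)]
    while candidates:
        shorter = []
        for cand in candidates:
            if cand in patterns:
                return patterns[cand]
            # generate every one-word deletion of this candidate
            shorter.extend(cand[:i] + cand[i + 1:] for i in range(len(cand)))
        candidates = shorter  # exhaustive search; fine for a short sketch
    return None

print(match("WELL WHAT IS YOUR MAIN PROBLEM?"))  # matched after one deletion
```

The deletion step is what makes this kind of matching robust: extra words in the user's input cannot prevent a stored pattern from firing.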
9. Grosz's robot domain model
- A primitive notion of focus based on a structure of partitioned semantic nets (Hendrix)
- Unlike PARRY, has knowledge of a complex domain
- No real performance at all
10. Grosz (IJCAI 1979)
- (Explicit focus)
- S1: The lid is attached to the container with 1/2 bolts.
- R1: Where are the BOLTS?
- (Implicit focus)
- S1: Attach the lid to the container.
- R1: Where are the BOLTS?
11. The Perrault, Cohen and Allen system
- Based on speech act reasoning
- The user must have one of two goals: meeting or catching a train
- Passenger/User: Do you know when the Windsor train arrives?
- This is labelled a REQUEST, not a REQUEST-INFORM (Y/N), because the system knows the user knows it knows!
12. Perrault et al.
- The system has domain knowledge and reasoning power
- but virtually no performance
- was the first to assign speech act labels to dialogue items
- has a simple, rigid model of nested belief
13. Fixed nested beliefs: the passenger's view of the system's view of the passenger's beliefs
[Figure: three nested boxes, labelled outermost to innermost: passenger, system, passenger]
14. Perrault et al.
- They had the appropriate notion of extreme ellipsis and zero anaphor
- Passenger: The 3.15 train to Windsor?
- System: Gate 10
- Note: NOT "It leaves from Gate 10"
15. The 1970s division of approaches to machine conversation
- Domain-dependent systems with coded world knowledge and some parsing and reasoning
- VERSUS
- Wide, shallow systems with little knowledge and high performance
- Published academic AI systems were all in the first group
- Only the second group performed at all
16. Moving academic systems towards performance
- Best is Traum's TRAINS system, a descendant of Allen's work (the Toronto-Rochester tradition)
- uses corpora and retains reasoning
- moved to the movies in California!
17. (No transcript)
18. TRAINS corpus - David Traum (Rochester)
- utt1 s: hello can I help you
- utt2 u: yeah I want t- I want to determine the maximum number of boxcars of oranges by seven a.m. tomorrow morning
- utt3: so hm so I guess all the boxcars will have to go through oran- through Corning because that's where the orange juice orange factory is
- utt4: so from Corning to Bath how far is that
- utt5 s: two hours
- utt6 u: and it's gonna take us also an hour to load boxcars right
- utt7 s: right
- utt8 u: okay so hm so every trip will take at least three hours then
- utt9: um
- utt10 s: right we can unload any amount of cargo onto a train in one hour
- utt11: so we can so a maximum of three boxcars in an hour
19. AI people join the performers: the Loebner Competition
- A well-designed competition over the last ten years, essentially a modified Turing test
- Domain-dependent entries till 1997, then domain-free
- The game is for journalists to separate and rank human and machine partners
- Search "Loebner" on the web for full transcripts
20. Loebner 1994
- NEW JUDGE
- 1:50 Judge 1: Do you think the country is headed generally in a more conservative or more liberal direction right now?
- 1:51 Terminal: Didn't you say would you discribe that old texas billionaire as a conservative?? Answer me one question why are liberals portrayed as donkees? Are you a caring person? You're not really talking about me, are you?
- 1:52 Judge 1: I've got the feeling I've heard these answers before. Is our nation swinging leftward or rightward right now?
- Terminal: We are at the beginning of a strong and wonderful economic recovery, and we can thank the liberals!
21. Loebner 1997
- PROGRAM [10:18:04]: Hi, my name is Catherine, what's yours?
- JUDGE04 [10:18:32]: My name is Elliott.
- PROGRAM [10:18:34]: Well, it's nice to meet you, Elliott.
- PROGRAM [10:18:48]: Did you see that story on CNN last night about the lesbian
- PROGRAM [10:19:04]: couple who came out at a White House party on Sunday?
- JUDGE04 [10:19:55]: Yes, I did. I think it may all be a publicity stunt for Ellen.
- PROGRAM [10:20:01]: Ellen Degeneres was one of them she was kissing her lover
- PROGRAM [10:20:11]: in front of Bill Clinton. Do you think he was very
- PROGRAM [10:20:25]: embarrassed?
- JUDGE04 [10:21:00]: I think he want the gay vote. He will do anything to get it and keep it.
- JUDGE04 [10:21:15]: He does not have a conservative base.
- PROGRAM [10:21:21]: Are you still a Friend of Bill Clinton?
22. We won in 1997!
- The CONVERSE program pretended to be Catherine, a 34-year-old British journalist living in New York (to explain any usage problems).
- She could talk about 80 appropriate topics
- (80 = domain-free???) clothes, shopping, food, abortion, fitness, sex, etc.
23. Moving performance towards theory
- The Loebner Prize Competition: www.acm.org/~loebner/loebner-prize.html
24. The CONVERSE prototype, 1997
- Push-me-pull-you architecture
- strong driving top-down scripts (80) in a re-enterable network with complex output functions
- bottom-up parsing of user input adapted from a statistical prose parser
- minimal models of individuals
- contained WordNet and Collins PNs
- some learning from past Loebners + the BNC
25. CONVERSE architecture
26. Restarting Sheffield dialogue work in 2001
- An empirical corpus-based stochastic dialogue grammar that maps directly to dialogue acts and uses IE to match concepts with templates
- A better virtual machine for script-like objects encapsulating both the domain and the conversational strategy (cf. PARRY and Grosz), to maintain the push-pull approach
- The need for resources to build belief-system representations and quasi-linguistic models of dialogue structure, scripts, etc.
- A model of speakers, incrementally reaching VIEWGEN-style belief-ascription procedures, to give dialogue act reasoning functionality
27. (No transcript)
28. VIEWGEN: a belief model that computes agents' states
- Not a static nested belief structure like that of Perrault and Allen
- Computes other agents' RELEVANT states at the time of need
- Topic-restricted search for relevant information
- Can represent and maintain conflicting agent attitudes
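The three properties above (on-demand computation, topic restriction, tolerance of conflict) can be sketched in a few lines. This is an illustrative reconstruction, not the VIEWGEN implementation; all agent names, topics and beliefs below are invented:

```python
# All agent names, topics and beliefs are invented for illustration.
def ascribe(own_beliefs, known_other_beliefs, topic):
    """Compute another agent's belief environment for one topic, at need.

    Beliefs the other agent is known to hold win; otherwise the system's
    own topic-relevant beliefs are ascribed by default. Beliefs about
    other topics are never searched (topic-restricted search).
    """
    env = {k: v for k, v in known_other_beliefs.items() if k[0] == topic}
    for key, value in own_beliefs.items():
        if key[0] == topic and key not in env:  # default ascription
            env[key] = value
    return env

system_beliefs = {
    ("train", "windsor_gate"): "gate 10",
    ("train", "windsor_arrives"): "3.15",
    ("weather", "today"): "rain",
}
# The passenger is known to hold a conflicting belief about the gate.
passenger_beliefs = {("train", "windsor_gate"): "unknown"}

view = ascribe(system_beliefs, passenger_beliefs, "train")
print(view)
```

The conflicting belief survives in the computed view while everything else topic-relevant is filled in by default, which is what distinguishes this from a static nested-belief table.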
29. VIEWGEN as a knowledge basis for reference/anaphora resolution procedures
- Not just pronouns, but grounding of descriptive phrases in a knowledge basis
- Reconsider finding the ground of "that old Texas billionaire" as Ross Perot, against a background of what the hearer may assume the speaker knows when he says that
30. (No transcript)
31. Belief representation of the first turn of an exchange: "I have just deleted the subdirectory"
[Figure: VIEWGEN belief boxes for this turn, showing
- cause(deleted(x, sub-directory), not(happy(david)))
- the system's belief: deleted(system, sub-directory)
- simon's belief, in the system's view: deleted(system, sub_directory)
- the system's goal, with candidate planning actions for the system (plan recognition / perform action)]
32. CONVERSE and VIEWGEN: references
- For VIEWGEN: Ballim, A. and Wilks, Y. (1991) Artificial Believers: The Ascription of Belief. Erlbaum, Hillsdale, NJ.
- For CONVERSE and contemporary paradigms (esp. in industry): Proceedings of the Bellagio Workshops on Human-Machine Conversation, 1997, 1998, 2000.
33. Meanwhile, the world had moved on and empiricism had reached dialogue processing
- Dialogue act-to-utterance learning using machine learning over n-grams and preceding dialogue acts (cf. Samuel et al. 1998)
- Speech act sequence statistics from Verbmobil (cf. Maier and Reithinger)
- Longman BNC dialogue n-grams (50, all domains)
- Tagged Switchboard, Verbmobil, BNC and domain corpora
34. And especially, Steve Young (2002)
- ASR is now a mature, statistically based technology
- In other areas of dialogue, statistical methods are less mature
- A complete dialogue system can be seen as a Partially Observable Markov process
- Subcomponents can be observed in turn, with intermediate variables
35. Young's statistical modules
- Speech understanding
- Semantic decoding
- Dialogue act detection
- Dialogue management and control
- Speech generation
- i.e. roughly the same as everyone else's!!
36. (No transcript)
37. The strategy is not like Jelinek's MT strategy of 1989!
- which was non-linguistic (even anti-linguistic), with no intermediate representations hypothesised
- Young assumes roughly the same intermediate objects as we do, but in very simplified forms
- The aim is to obtain training data for all of them, so that the whole process becomes a single throughput Markov model
38. Young concedes this model may only be for simple domains
- His domain is a pizza-ordering system
- A typical DialogueAct+Semantics could therefore be:
- Purchase_Request(qty=2, topping=pepperoni)
39. Speech understanding
- Classic mapping of waveforms Y to dialogue acts Au, given system beliefs Bs: seek
- argmax_Au P(Au | Y, Bs), which expands to
- argmax_Au P(Y | Au, Bs) P(Au | Bs)
- Then the intermediate word sequence W is introduced, giving
- P(Y | Au, Bs) = Σ_W P(Y, W | Au, Bs)
- This is a redescription of what empirical NLP people had been doing for five years, but with no mention of them!
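The decomposition above can be illustrated with toy numbers. Everything in this sketch is invented (the acts, word strings and probabilities); it only shows the shape of the computation, with the sum over word sequences W made explicit:

```python
# Invented toy numbers for one fixed acoustic signal Y:
# P(Y, W | Au, Bs) per dialogue act Au and word string W,
# plus a prior P(Au | Bs) from the system state.
p_y_w_given_a = {
    "REQUEST": {"when does it arrive": 0.06, "when is it arriving": 0.02},
    "INFORM": {"it arrives at ten": 0.01},
}
p_a_given_b = {"REQUEST": 0.7, "INFORM": 0.3}

def best_act():
    """argmax over Au of P(Au | Bs) * sum_W P(Y, W | Au, Bs)."""
    scores = {a: p_a_given_b[a] * sum(p_y_w_given_a[a].values())
              for a in p_a_given_b}
    return max(scores, key=scores.get), scores

act, scores = best_act()
print(act)  # REQUEST wins: 0.7 * (0.06 + 0.02) = 0.056 vs 0.3 * 0.01 = 0.003
```

Marginalising over W is what lets two different recognised word strings both count as evidence for the same dialogue act.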
40. This breaks into two parts
- A classic two-stage matching of sounds to a word lattice (i.e. ASR)
- Followed by the mapping of the dialogue acts to the words, given the system beliefs (i.e. the system state in general)
- The dialogue state is crucial for handling ambiguity and identifying underspecified dialogue acts
41. Introduction of concepts as well as DAs
- It is convenient to introduce another intermediate representation: C represents the set of semantic concepts encoded within W
- So P(Au | W, Bs) = Σ_C P(Au | C, Bs) P(C | W, Bs)
- i.e. training for concepts given words + states
- and dialogue acts given concepts + states
- Experience suggests both can be trained directly on the words!
42. (No transcript)
43. Semantic decoding
- Means adding one concept per main input word (!) by means of a semantic template grammar, such as:
- PIZZA -> QTY TOPPING pizza(s)
- A parse tree is an option if recursive syntactic structure is required, over simple rule matching, where concepts label states and concept bigram probabilities are computed
- A parse tree gives a Hidden Understanding Model using lexical and concept bigrams
- Or learn the simple grammar templates above directly
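A hedged sketch of how such a template grammar might be applied. The rules and concept labels below are illustrative, not taken from Young's pizza system: simple patterns map word spans to concept labels, giving roughly one concept per content word:

```python
import re

# Illustrative rules only: each pattern maps a word span to a concept label.
RULES = [
    (r"\b(\d+|one|two|three)\b", "QTY"),
    (r"\b(pepperoni|mushroom|ham)\b", "TOPPING"),
    (r"\bpizzas?\b", "PIZZA"),
]

def decode(utterance):
    """Return (concept, word) pairs in utterance order."""
    concepts = []
    for pattern, label in RULES:
        for m in re.finditer(pattern, utterance.lower()):
            concepts.append((m.start(), label, m.group()))
    return [(label, word) for _, label, word in sorted(concepts)]

print(decode("I'd like two pepperoni pizzas"))
# [('QTY', 'two'), ('TOPPING', 'pepperoni'), ('PIZZA', 'pizzas')]
```

This is the flat, non-recursive case from the slide; the parse-tree option would come in only when the concepts need to nest.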
44. Dialogue act detection
- A range of methods (including MDL) for solving argmax_Au P(Au | C, Bs)
- Interesting that this is not thought to depend on the WORDS (as most people do it)
- No reference to linguistic methods for this (since Samuel et al. 1998)
45. Dialogue management
- This is the one where it is hard to see how he can get non-trivial data
- Data can be seen as a transition matrix of system states S against system actions As, filling the matrix cells with new system states
- The model of training is reinforcement learning, ascribed to Pieraccini and then to Walker
- No evidence of what such training changes in practice for a non-trivial system
- The typical system S will be intractably large and must be approximated
- Puzzle: the user's beliefs cannot be directly observed and must therefore be inferred. True, but...
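The transition-matrix-plus-reinforcement-learning idea can be made concrete with a toy example. The states, actions, transitions and reward below are all invented (a deliberately trivial slot-filling domain), and tabular Q-learning stands in for whatever training scheme a real system would use:

```python
import random

random.seed(0)
STATES = ["start", "got_qty", "got_topping", "done"]
ACTIONS = ["ask_qty", "ask_topping", "confirm"]

def step(state, action):
    """Deterministic toy transition matrix; reward only on completion."""
    table = {("start", "ask_qty"): "got_qty",
             ("got_qty", "ask_topping"): "got_topping",
             ("got_topping", "confirm"): "done"}
    nxt = table.get((state, action), state)  # wrong actions leave state unchanged
    return nxt, (1.0 if nxt == "done" else 0.0)

# Tabular Q-learning over the (state, action) matrix.
q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
for _ in range(2000):
    s = "start"
    while s != "done":
        a = random.choice(ACTIONS)           # pure exploration
        s2, r = step(s, a)
        best_next = max(q[(s2, a2)] for a2 in ACTIONS)
        q[(s, a)] += 0.5 * (r + 0.9 * best_next - q[(s, a)])
        s = s2

policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES[:-1]}
print(policy)
```

The slide's worry is visible even here: this only works because the state set has four members; a realistic system state would make the table intractably large, forcing approximation.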
46. Questions about the speech program for dialogue
- Is this just a description of the empirical NLP program of work, or an attempted reduction like:
- Jelinek's MT without linguists
- Pollack's RAAM for (Fodor's) syntactic recursion
- Can data be found to reduce all of a (fairly general) DM to a transition space that can be reward-trained?
- What is the real effect of training a DM, however found? Reducing unused paths?
- How, in principle, to express changes to planning and belief spaces?
47. Back to work in NLP: learning to tag for Dialogue Acts - initial work
- Reithinger and Klesen (1997): n-gram language modelling, Verbmobil corpus (75%)
- Samuel et al. (1998): TBL learning on n-gram DA cues, Verbmobil corpus (75%)
- Stolcke et al. (2000): more complex n-grams (including over DA sequences), more complex Switchboard corpus (71%)
48. Work at Sheffield: starting with a naive classifier for DAs
- Direct predictivity of DAs by n-grams, as a preprocess to any ML algorithm
- Get P(d | n) for all 1-4-word n-grams n and the DA set d over the Switchboard corpus, and take the DA indicated by the n-gram with the highest predictivity (with a threshold on probability levels)
- Do 10-fold cross-validation (which lowers scores)
- Gives 63% over Switchboard, but using only a fraction of the data Stolcke needed
- Work by Nick Webb (in the EC AMITIES project)
49. Extending the pretagging with TBL
- Gives 72% (Stolcke's 71%) over the Switchboard data, but only 3% is due to TBL (rather than the naive classifier)
- Samuel was unable to see what TBL was doing for him
- This is just a base for a range of more complex ML algorithms (e.g. via WEKA)
- The figure is the same as Stolcke's without the cross-validation
50. Design of a Dialogue Action Manager at Sheffield
- A general-purpose DAM where domain-dependent features are separated from the control structure
- The domain-dependent features are stored as Dialogue Action Forms (DAFs), which are similar to Augmented Transition Networks (ATNs)
- The DAFs represent general-purpose dialogue manoeuvres as well as application-specific knowledge
- The control mechanism is based on a basic stack structure where DAFs are pushed and popped during the course of a user session
51. The structure of the DAM
- The control mechanism together with the DAFs provides a flexible means of guiding the user through the system goals (allowing for topic change and barge-in where needed)
- User push is given by the ability to suspend and stack a new DAF at any point (for a topic change or any user manoeuvre)
- System push is given by the prestacked DAFs corresponding to what the system wants to show or elicit
52. DAM implementation
- The core mechanism for the DAM is a simple push-pop stack (slightly augmented), onto which structures are loaded and run
- The structures are Dialogue Action Forms (DAFs), which are implemented as Augmented Transition Networks (ATNs)
- ATNs are transition networks that can function at any level, from finite-state machines up to Turing-machine power
53. DAM implementation 2
- The stack is preloaded with DAFs that satisfy the overall goal of designing a bathroom in the COMIC system
- The control structure has a preference for continuing to run the DAFs on the stack (system initiative) unless the user input is outside the scope of the current DAF (user initiative), in which case it pushes onto the stack the new DAF that most closely matches the user input
54. DAM implementation 3
- DAFs model the individual topics and conversational manoeuvres in the application domain
- The stack is preloaded with those DAFs which are necessary for the COMIC bathroom design task, and the dialogue ends when the Goodbye DAF is popped
- The DAFs and the stack interpreter together control the flow of the dialogue
55. Dialogue management: the stack
- Work by Catizone and Setzer (EU COMIC project)
- Contrast with work at CSLI (Stanford) by Peters and Lemon
[Stack, top to bottom: Greeting DAF / Room measurement DAF / Style DAF / Good-bye DAF]
56. Dialogue management: the augmented stack
- What happens if we match and push onto the stack a DAF that is already on the stack (further down)?
- In this case, we run the DAF, and it is then either satisfied (poppable) or not
- If it is, then on popping we remove any (unrun/prestored) copies of the DAF below it on the stack (a slight augmentation here of a simple stack!)
57. An augmented stack 2
- If it is not satisfied (almost certainly because of context-dependent DAFs still to come on the stack), the DAF traps the response, recommends waiting to discuss that topic, and is popped. The copy of the same DAF lower down the stack is then accessed and run later, at the appropriate place in the dialogue (this is one of a set of possible strategies with which we shall experiment).
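The pop-and-discard augmentation described on these slides can be sketched like this. Class and method names are invented; real DAFs are ATNs, stubbed here as a simple keyword test:

```python
class DAF:
    """Stub for a Dialogue Action Form; a real one is an ATN."""
    def __init__(self, topic):
        self.topic = topic
        self.satisfied = False

    def run(self, user_input):
        # Stand-in for traversing the DAF's transition network.
        self.satisfied = self.topic in user_input
        return self.satisfied

class DialogueActionManager:
    def __init__(self, preloaded):
        self.stack = list(preloaded)  # top of stack = end of list

    def step(self, user_input):
        daf = self.stack[-1]
        if daf.run(user_input):
            self.stack.pop()
            # The augmentation: on a satisfied pop, discard any unrun
            # copies of the same topic lower down the stack.
            self.stack = [d for d in self.stack if d.topic != daf.topic]
        return [d.topic for d in self.stack]

dam = DialogueActionManager(
    [DAF("goodbye"), DAF("style"), DAF("style"), DAF("measurement")])
print(dam.step("room measurement done"))       # pops 'measurement'
print(dam.step("the style I want is modern"))  # pops both 'style' DAFs
```

The duplicate-removal step is exactly what keeps a topic, once satisfied, from being reopened by a stale prestored copy further down.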
58. An augmented stack 3
- The result we would like from this mechanism is that a relevant and partly run DAF can be forced to the top of the stack and all intermediate partly run DAFs discarded (cf. Grosz's notion of conversational topics being closed and unreachable for restarting).
59. DAF example
60. Learning to segment the dialogue corpora?
- Segmenting the corpora we have with a range of tiling-style algorithms (initially by topic)
- The aim is to segment them plausibly, hopefully into segments that correspond to structures for DM (Dialogue Action Frames in our naming)
- Being done on the annotated corpus (i.e. a word model of the corpus) and on the corpus annotated with Information Extraction semantic tags (a semantic model of the corpus)
- Repetitions of DA + semantics template patterns may make an MDL packing of the corpus possible, as an alternative segmentation method
61. Sheffield dialogue research challenges
- Will a dialogue manager raise the 75% DA ceiling top-down?
- Multimodal dialogue managers: are they completely independent of the modality? Are they really language-independent?
- What is a good virtual machine for running a dialogue engine? Do DAFs provide a robust and efficient mechanism for doing dialogue management?
- Will they offer any interesting discoveries on stack access to, and discard of, incomplete topics (cf. stacks and syntax)?
62. More research challenges
- Applying machine learning to transcripts so as to determine the content of dialogue management, i.e. the scope and content of candidate DAFs
- Can the state set of DAFs and a stack be trained with reinforcement learning?
- Can we add a strong belief/planning component to this and populate it empirically?
- Fusion with QA
63. Why the dialogue task is still hard
- Where am I in the conversation: what is being talked about now, and what do they want?
- Does topic stereotypy help, or are Finite-State pairs enough (VoiceXML? TRINDIKIT?)?
- How to gather the beliefs/knowledge required, preferably from existing sources?
- Are there distinctive procedures for managing conversations that can be modelled by simple virtual machines?
- How to learn the structures we need (assuming we do): how to get and annotate the data?
64. How this research is funded
- AMITIES is an EU-US cooperative R&D project (the first in NLP), 2001-2005, to automate call centres:
- University of Sheffield (UK prime)
- SUNY Albany (US prime)
- Duke U. (US)
- LIMSI Paris (Fr)
- IBM (US)
- COMIC is an EU R&D project (2001-2005) to model multimodal dialogue:
- Max Planck Inst. (Nijmegen) (Coordinator)
- University of Edinburgh
- Max Planck Inst. (Tuebingen)
- KUL Nijmegen
- University of Sheffield
- ViSoft GmbH
65. There are now four, not two, competing approaches
- Logic-based systems with reasoning (old and still unvalidated by performance)
- Extensions of speech engineering methods: machine learning and simple intermediate structure
- Simple hand-coded finite-state systems in VoiceXML (chatbots and commercial systems)
- Rational hybrids based on structure and machine learning