Human-computer dialogue processing: where are we? (presentation transcript)

1
Human-computer dialogue processing: where are we?
  • Yorick Wilks
  • Oxford Internet Institute
  • and
  • University of Sheffield
  • www.dcs.shef.ac.uk/yorick
  • Tartu, February 2005

2
Main points of talk
  • Data-driven performance systems versus
    AI/linguistic theoretical systems, separate since
    1970s
  • This distinction has mapped, till now, on domain
    versus general systems
  • Both will require modeling of intentional
    phenomena like belief, possibly speech acts, but
    not in the old way.
  • It is unclear whether speech technology alone can
    accomplish this
  • Some Sheffield empirical work in dialogue
    technology
  • The COMPANIONS development environment.

3
Problems of dialogue theory
  • It can easily be vacuous: dialogues are systems
    of turn-taking
  • Speech act analysis has led to a rationalist
    tradition: implausibly deep levels of reasoning,
    and no data gathering
  • There are apparently competing theories but
    little or no evaluation of dialogue systems
    (compare Information Extraction--Hobbs)

4
Three important historical systems
  • Colby's PARRY (Stanford, 1971)
  • Grosz's partitioned network focus system (SRI,
    1979)
  • Perrault, Cohen and Allen's speech act system
    (Toronto, 1979)

5
Colby's PARRY
  • Perhaps best ever performance, many users,
    robust, but not a normal subject (i.e. a
    paranoid)
  • primitive individual models, some control of
    dialogue process but it had lots to say!
  • Primitive simulation of intentionality
  • not syntax but fast pattern matching
  • much better than ELIZA

6
PARRY conversation
  • Have you been hospitalized before?
  • THIS IS THE FIRST TIME
  • How long have you been there?
  • ABOUT TWO WEEKS
  • Any headaches?
  • MY HEALTH IS FINE
  • Are you having memory difficulties?

7
More PARRY
  • NO
  • Then can you tell me who is the President of the
    US?
  • NIXON IS PRESIDENT
  • And what day is today?
  • TODAY IS THURSDAY--CAN WE GET ON WITH THE
    INTERVIEW?

8
How is this achieved in PARRY?
  • Input: WHAT IS YOUR MAIN PROBLEM
  • reduced to the canonical pattern WHAT BE YOU MAIN
    PROBLEM, then matched (with single-word deletions)
    against stored patterns (sketched below):
  • BE YOU MAIN PROBLEM
  • WHAT BE MAIN PROBLEM
  • WHAT BE YOU PROBLEM
  • WHAT BE YOU MAIN
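
A minimal Python sketch of this matching style (our
reconstruction for illustration only; PARRY's actual
lexicon, patterns and replies differed):

```python
# PARRY-style robust pattern matching (illustrative sketch):
# canonicalise the input, then try the exact form and every
# single-word-deletion variant against stored patterns.
# The lexicon, pattern and replies below are invented.

CANONICAL = {"is": "be", "are": "be", "am": "be", "your": "you", "my": "i"}

PATTERNS = {
    ("what", "be", "you", "main", "problem"): "PEOPLE BOTHER ME",
}

def canonicalise(text):
    words = text.lower().strip(" ?!.").split()
    return tuple(CANONICAL.get(w, w) for w in words)

def variants(words):
    yield words                                  # exact canonical form first
    for i in range(len(words)):                  # then each one-word deletion
        yield words[:i] + words[i + 1:]

def respond(text):
    for v in variants(canonicalise(text)):
        if v in PATTERNS:
            return PATTERNS[v]
    return "WHY DO YOU ASK?"                     # fallback keeps dialogue going

print(respond("What is your main problem?"))     # exact match
print(respond("So what is your main problem?"))  # matched via a deletion
```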

9
Grosz's robot domain model
  • Primitive notion of focus based on a structure of
    partitioned semantic nets (Hendrix)
  • Unlike PARRY, has knowledge of a complex domain
  • no real performance at all

10
Grosz (IJCAI 1979)
  • (Explicit focus)
  • S1: The lid is attached to the container with
    1/2 bolts.
  • R1: Where are the BOLTS?
  • (Implicit focus)
  • S1: Attach the lid to the container.
  • R1: Where are the BOLTS?

11
Perrault, Cohen, Allen system
  • Based on speech act reasoning
  • user must have one of two goals, meeting or
    catching a train
  • Passenger/User: Do you know when the Windsor
    train arrives?
  • This is labelled as a REQUEST, not a
    REQUEST-INFORM (Y/N), because the system knows the
    user knows it knows!

12
Perrault et al.
  • System has domain knowledge and reasoning power
  • but virtually no performance
  • was the first to assign speech act labels to
    dialogue items
  • has a simple rigid model of nested belief

13
Fixed nested beliefs: the passenger's view of the
system's view of the passenger's beliefs.
(Diagram: three nested belief boxes, passenger
containing system containing passenger.)
14
Perrault et al.
  • They had the appropriate notion of extreme
    ellipsis and zero anaphor
  • Passenger: The 3.15 train to Windsor?
  • System: Gate 10
  • Note: NOT "It leaves from Gate 10"

15
1970s division of approaches to machine
conversation.
  • Domain-dependent systems with coded world
    knowledge and some parsing and reasoning
  • VERSUS
  • Wide shallow systems with little knowledge and
    high performance
  • Published AI academic systems all in first group
  • Only the second group performed at all

16
Moving academic systems towards performance
  • Best is Traum's TRAINS system, descendant of
    Allen's work (Toronto-Rochester tradition)
  • uses corpora and retains reasoning
  • moved to the movies in California!

17
(No Transcript)
18
TRAINS CORPUS (David Traum, Rochester)
  • utt1 s hello can I help you
  • utt2 u yeah I want t- I want to determine the
    maximum number of boxcars of oranges by
    seven a.m. tomorrow morning
  • utt3 so hm
  • so I guess all the boxcars will have to go
    through oran- through Corning because
    that's where the orange juice orange
    factory is
  • utt4 so from Corning to Bath how far is
    that
  • utt5 s two hours
  • utt6 u and it's gonna take us also an hour to
    load boxcars right
  • utt7 s right
  • utt8 u okay so hm so every
    trip will take at least three hours
    then
  • utt9 um
  • utt10 s right we can unload any amount of
    cargo onto a train in one hour
  • utt11 so we can so a maximum of
    three boxcars in an hour

19
AI people join the performers: the Loebner
Competition
  • Well-designed competition over the last ten
    years: essentially a modified Turing test.
  • Domain dependent entries till 1997, then domain
    free.
  • The game is for journalists to separate and rank
    human and machine partners
  • Search Loebner on web for full transcripts

20
Loebner 1994
  • NEW JUDGE
  • 150 Judge 1: Do you think the country is headed
    generally in a more conservative or more liberal
    direction right now?
  • 151 Terminal: Didn't you say would you discribe
    that old texas billionaire as a conservative??
    Answer me one question why are liberals portrayed
    as donkees? Are you a caring person? You're not
    really talking about me, are you?
  • 152 Judge 1: I've got the feeling I've heard
    these answers before. Is our nation swinging
    leftward or rightward right now?
  • Terminal: We are at the beginning of a strong and
    wonderful economic recovery, and we can thank the
    liberals!

21
Loebner 1997
  • PROGRAM[10:18:04] Hi, my name is Catherine,
    what's yours?
  • JUDGE04[10:18:32] My name is Elliott.
  • PROGRAM[10:18:34] Well, it's nice to meet you,
    Elliott.
  • PROGRAM[10:18:48] Did you see that story on CNN
    last night about the lesbian
  • PROGRAM[10:19:04] couple who came out at a White
    House party on Sunday?
  • JUDGE04[10:19:55] Yes, I did. I think it may all
    be a publicity stunt for Ellen.
  • PROGRAM[10:20:01] Ellen Degeneres was one of them
    she was kissing her lover
  • PROGRAM[10:20:11] in front of Bill Clinton. Do you
    think he was very
  • PROGRAM[10:20:25] embarrassed?
  • JUDGE04[10:21:00] I think he want the gay vote. He
    will do anything to get it and keep it.
  • JUDGE04[10:21:15] He does not have a conservative
    base.
  • PROGRAM[10:21:21]
  • PROGRAM[10:21:21] Are you still a Friend of Bill
    Clinton?

22
We won in 1997!
  • The CONVERSE program pretended to be Catherine, a
    34-year-old British journalist living in New York
    (to explain any usage problems).
  • She could talk about 80 appropriate topics
  • (80 topics = domain-free???): clothes, shopping,
    food, abortion, fitness, sex, etc.

23
Moving performance towards theory
  • The Loebner Prize Competition
    www.acm.org/~loebner/loebner-prize.html

24
The CONVERSE prototype 1997
  • Push-me-pull-you architecture
  • strong driving top-down scripts (80) in a
    re-enterable network with complex output
    functions
  • bottom-up parsing of user input adapted from a
    statistical prose parser
  • minimal models of individuals
  • contained WordNet and Collins PNs
  • some learning from past Loebners and the BNC

25
CONVERSE architecture
26
Restarting Sheffield dialogue work in 2001
  • an empirical corpus-based stochastic dialogue
    grammar that maps directly to dialogue acts and
    uses IE to match concepts with templates.
  • A better virtual machine for script-like objects
    encapsulating both the domain and conversational
    strategy (cf. PARRY and Grosz) to maintain the
    push-pull approach.
  • Need for resources to build belief system
    representations and quasi-linguistic models of
    dialogue structure, scripts etc.
  • A model of speakers, incrementally reaching
    VIEWGEN-style ascription-of-belief procedures to
    give dialogue act reasoning functionality

27
(No Transcript)
28
VIEWGEN: a belief model that computes agents'
states
  • Not a static nested belief structure like that of
    Perrault and Allen.
  • Computes other agents' RELEVANT states at time of
    need
  • Topic restricted search for relevant information
  • Can represent and maintain conflicting agent
    attitudes

29
VIEWGEN as a knowledge basis for
reference/anaphora resolution procedures
  • Not just pronouns but grounding of descriptive
    phrases in a knowledge basis
  • Reconsider finding the ground of
    "that old Texas billionaire" as
    Ross Perot, against a background of what the
    hearer may assume the speaker knows when he says
    that.

30
(No Transcript)
31
(Diagram: belief representation of the first turn
of an exchange. The system's belief box contains
deleted(system, sub-directory); Simon's belief
box, as ascribed by the system, contains
deleted(system, sub_directory); the system's goal
box contains cause(deleted(x, sub-directory),
not(happy(david))). Candidate planning actions
for the system: perform plan recognition, or
perform the action of uttering "I have just
deleted the subdirectory".)
32
CONVERSE and VIEWGEN
  • For ViewGen:
  • Ballim, A. and Wilks, Y. (1991) Artificial
    Believers: The Ascription of Belief. Erlbaum,
    Hillsdale, NJ.
  • For CONVERSE and contemporary paradigms (esp. in
    industry):
  • Proceedings of the Bellagio Workshops on
    Human-Machine Conversation, '97, '98, '00.

33
Meanwhile, the world had moved on and empiricism
had reached dialogue processing
  • Dialogue act-to-utterance learning using machine
    learning over n-grams and preceding dialogue acts
    (cf. Samuel et al. 1998)
  • Speech act sequence statistics from Verbmobil
    (cf. Maier and Reithinger)
  • Longman BNC dialogue n-grams (50%, all domains)
  • Tagged Switchboard, Verbmobil, BNC and domain
    corpora

34
And especially, Steve Young 2002
  • ASR now a mature statistically based technology
  • In other areas of dialogue, statistical methods
    less mature
  • A complete dialogue system can be seen as a
    Partially Observable Markov process
  • Subcomponents can be observed in turn with
    intermediate variables

35
Young's statistical modules
  • Speech understanding
  • Semantic decoding
  • Dialogue act detection
  • Dialogue management and control
  • Speech generation
  • i.e. roughly the same as everyone else's!

36
(No Transcript)
37
Strategy not like Jelinek's MT strategy of 1989!
  • Which was non/anti-linguistic, with no
    intermediate representations hypothesised
  • Young assumes roughly the same intermediate
    objects as we do, but in very simplified forms.
  • The aim is to obtain training data for all of
    them so the whole process becomes a single
    throughput Markov model.

38
Young concedes this model may only be for simple
domains
  • His domain is a pizza ordering system
  • A typical DialogueAct + Semantics could therefore
    be
  • Purchase_Request(qty=2, topping=pepperoni)

39
Speech Understanding
  • Classic mapping of waveforms Y to dialogue acts
    A_u, given system beliefs B_s: seek
  • argmax_{A_u} P(A_u | Y, B_s), which expands to
  • argmax_{A_u} P(Y | A_u, B_s) P(A_u | B_s)
  • Then the intermediate word sequence W is
    introduced, giving
  • P(Y | A_u, B_s) = Σ_W P(Y, W | A_u, B_s)
  • This is a redescription of what empirical NLP
    people had been doing for 5 years, but no mention
    of them!

40
This breaks into two parts
  • A classic two-stage matching of sounds to a word
    lattice (i.e. ASR)
  • Followed by the mapping of the Dialogue acts to
    the words, given the system beliefs (i.e. system
    state in general)
  • the dialogue state is crucial for handling
    ambiguity and identifying underspecified dialogue
    acts

41
Introduction of concepts as well as DAs
  • "it is convenient to introduce another
    intermediate representation. C represents the set
    of semantic concepts encoded within W"
  • So P(A_u | W, B_s) = Σ_C P(A_u | C, B_s) P(C | W, B_s)
  • i.e. training for concepts given words + states,
  • and dialogue acts given concepts + states.
  • Experience suggests both can be trained directly
    on the words!
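
Putting slides 39 and 41 together, the full chain from
waveform to dialogue act, reassembled from the slide
fragments (the factorisation of P(Y, W | A_u, B_s) into
P(Y | W) P(W | A_u, B_s) is the standard ASR independence
assumption, implicit rather than stated on the slides):

```latex
% Chain from waveform Y to dialogue act A_u via words W and
% concepts C, given system beliefs B_s.
\begin{align*}
  \hat{A}_u &= \arg\max_{A_u} P(A_u \mid Y, B_s)
             = \arg\max_{A_u} P(Y \mid A_u, B_s)\, P(A_u \mid B_s) \\
  P(Y \mid A_u, B_s) &= \sum_{W} P(Y, W \mid A_u, B_s)
                      = \sum_{W} P(Y \mid W)\, P(W \mid A_u, B_s) \\
  P(A_u \mid W, B_s) &= \sum_{C} P(A_u \mid C, B_s)\, P(C \mid W, B_s)
\end{align*}
```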

42
(No Transcript)
43
Semantic decoding
  • Means adding one "concept" per main word input
    (!) by means of a "semantic template grammar"
    such as
  • PIZZA → QTY TOPPING pizza(s)
  • A parse tree is an option if recursive syntactic
    structure is required, over simple rule matching,
    where concepts label states and concept bigram
    probabilities are computed.
  • Parse tree gives Hidden Understanding Model using
    lexical and concept bigrams.
  • Or learn the simple grammar templates above
    directly
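
A minimal sketch of template-style semantic decoding (our
illustration, not Young's code; the lexicon, the PIZZA
template and the output frame are invented for the pizza
domain):

```python
# Semantic template decoding sketch: map each main word to a
# concept, then match the concept sequence against the template
# PIZZA -> QTY TOPPING pizza(s).
import re

CONCEPT_LEXICON = {
    "one": ("QTY", 1), "two": ("QTY", 2), "three": ("QTY", 3),
    "pepperoni": ("TOPPING", "pepperoni"), "mushroom": ("TOPPING", "mushroom"),
}

def decode(utterance):
    """Return a semantic frame if the PIZZA template matches, else None."""
    concepts = []
    for word in re.findall(r"[a-z]+", utterance.lower()):
        if word in CONCEPT_LEXICON:
            concepts.append(CONCEPT_LEXICON[word])
        elif word in ("pizza", "pizzas"):
            concepts.append(("PIZZA_WORD", word))
    labels = [c[0] for c in concepts]
    if labels == ["QTY", "TOPPING", "PIZZA_WORD"]:   # PIZZA -> QTY TOPPING pizza(s)
        return {"act": "Purchase_Request",
                "qty": concepts[0][1], "topping": concepts[1][1]}
    return None

print(decode("I'd like two pepperoni pizzas please"))
# {'act': 'Purchase_Request', 'qty': 2, 'topping': 'pepperoni'}
```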

44
Dialogue act detection
  • Range of methods (including MDL) for solving
    argmax_{A_u} P(A_u | C, B_s)
  • Interesting that this is not thought to depend on
    the WORDS (as most people do it)
  • No reference to linguistic methods for this
    (since Samuel et al. '98).

45
Dialogue management
  • This is the one where it is hard to see how he
    can get non-trivial data.
  • Data can be seen as a transition matrix of system
    states S against system actions A_s, filling the
    matrix cells with new system states.
  • Model of training is reinforcement learning,
    ascribed to Pieraccini and then Walker (toy
    sketch below).
  • No evidence of what such training changes in
    practice for a non-trivial system
  • "the typical system S will typically be
    intractably large and must be approximated"
  • Puzzle: "the user's beliefs cannot be directly
    observed and must therefore be inferred". True,
    but...
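
A toy sketch of the kind of reinforcement training
described, with tabular Q-learning over an invented
three-state ordering dialogue (states, actions,
transitions and rewards are all made up; no cited system
works exactly this way):

```python
# Tabular Q-learning over a tiny hand-invented dialogue
# transition matrix for a pizza-ordering task.
import random
from collections import defaultdict

STATES = ["need_qty", "need_topping", "done"]
ACTIONS = ["ask_qty", "ask_topping", "confirm"]

def step(state, action):
    """Hand-coded environment: returns (next_state, reward)."""
    if state == "need_qty" and action == "ask_qty":
        return "need_topping", 0.0
    if state == "need_topping" and action == "ask_topping":
        return "done", 0.0
    if state == "done" and action == "confirm":
        return "done", 1.0          # successful order
    return state, -0.1              # useless turn: small penalty

Q = defaultdict(float)
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(500):
    s = "need_qty"
    for _ in range(6):              # cap dialogue length
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(ACTIONS, key=lambda x: Q[s, x]))
        s2, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * max(Q[s2, b] for b in ACTIONS) - Q[s, a])
        s = s2

policy = {s: max(ACTIONS, key=lambda x: Q[s, x]) for s in STATES}
print(policy)   # expected: ask_qty, then ask_topping, then confirm
```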

46
Questions about the Speech program for dialogue
  • Is this just a description of the empirical NLP
    program of work, or an attempted reduction like
  • Jelinek's MT without linguists
  • Pollack's RAAM for (Fodor's) syntactic recursion
  • Can data be found to reduce all of a (fairly
    general) DM to a transition space that can be
    reward-trained?
  • What is the real effect of training a DM,
    however found? Reducing unused paths?
  • How, in principle, to express changes to planning
    and belief spaces?

47
Back to work in NLP: learning to tag for Dialogue
Acts, initial work
  • Reithinger and Klesen (1997), n-gram language
    modelling, Verbmobil corpus (75%)
  • Samuel et al. (1998), TBL learning on n-gram DA
    cues, Verbmobil corpus (75%)
  • Stolcke et al. (2000), more complex n-grams
    (including over DA sequences), more complex
    Switchboard corpus (71%)

48
Work at Sheffield: starting with a naive
classifier for DAs
  • Direct predictivity of DAs by n-grams as a
    preprocess to any ML algorithm.
  • Get P(d|n) for all 1-4 word n-grams and the DA
    set over the Switchboard corpus, and take the DA
    indicated by the n-gram with highest predictivity
    (threshold for probability levels; sketched below)
  • Do 10-fold cross-validation (which lowers scores)
  • Gives 63% over Switchboard, but using only a
    fraction of the data Stolcke needed.
  • Work by Nick Webb (in EC AMITIES Project)
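
A minimal sketch of the naive classifier as described
(our reconstruction, not Webb's code; the toy corpus,
threshold and minimum count are invented):

```python
# Naive n-gram DA classifier: estimate P(da | n-gram) from a
# tagged corpus, keep n-grams above a predictivity threshold,
# and tag a new utterance with the DA of its most predictive
# n-gram.
from collections import Counter, defaultdict

def ngrams(words, n_max=4):
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])

def train(corpus, threshold=0.7, min_count=3):
    """corpus: list of (utterance_words, dialogue_act) pairs."""
    da_counts = defaultdict(Counter)
    for words, da in corpus:
        for g in set(ngrams(words)):
            da_counts[g][da] += 1
    cues = {}
    for g, counts in da_counts.items():
        total = sum(counts.values())
        da, top = counts.most_common(1)[0]
        p = top / total                  # estimate of P(da | g)
        if total >= min_count and p >= threshold:
            cues[g] = (da, p)
    return cues

def classify(words, cues, default="statement"):
    candidates = [cues[g] for g in ngrams(words) if g in cues]
    return max(candidates, key=lambda c: c[1])[0] if candidates else default

# Toy usage:
corpus = [("do you know".split(), "yes-no-question"),
          ("yeah right".split(), "agree"),
          ("do you want".split(), "yes-no-question")]
cues = train(corpus, threshold=0.5, min_count=2)
print(classify("do you like it".split(), cues))   # yes-no-question
```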

49
Extending the pretagging with TBL
  • Gives 72% (Stolcke's 71%) over the Switchboard
    data, but only 3% is due to TBL (rather than the
    naive classifier).
  • Samuel was unable to see what TBL was doing for
    him.
  • This is just a base for a range of more complex
    ML algorithms (e.g. WEKA).
  • Figure same as Stolcke's without the
    cross-validation

50
Design of a Dialogue Action Manager at Sheffield
  • General purpose DAM where domain dependent
    features are separated from the control
    structure.
  • The domain dependent features are stored as
    Dialogue Action Forms (DAFs) which are similar to
    Augmented Transition Networks (ATNs)
  • The DAFs represent general purpose dialogue
    manoeuvres as well as application-specific
    knowledge.
  • The control mechanism is based on a basic stack
    structure where DAFs are pushed and popped during
    the course of a user session.

51
The structure of the DAM
  • The control mechanism together with the DAFs
    provide a flexible means for guiding the user
    through the system goals (allowing for topic
    change and barge in where needed).
  • "User push" is given by the ability to suspend
    and stack a new DAF at any point (for a topic or
    any user manoeuvre)
  • "System push" is given by the prestacked DAFs
    corresponding to what the system wants to show
    or elicit.

52
DAM Implementation
  • The core mechanism for the DAM is a simple
    pop-push stack (slightly augmented), onto which
    structures are loaded and run.
  • The structures are Dialogue Action Forms (DAFs),
    which are implemented as Augmented Transition
    Networks (ATNs).
  • ATNs are transition networks that can function
    at any level from finite state machines to Turing
    Machine power.

53
DAM Implementation 2
  • The stack is pre-loaded with DAFs that satisfy
    the overall goal of designing a bathroom in the
    COMIC system.
  • The control structure has a preference for
    continuing to run the DAFs on the stack (system
    initiative) unless the user input is outside the
    scope of the DAF (user initiative), in which
    case the control structure pushes onto the stack
    the new DAF that most closely matches the user
    input (sketched below).
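
A minimal sketch of the described control regime (our
illustration; the real DAM's DAFs are ATN-based and far
richer). DAF names echo the COMIC stack on slide 55;
scopes and the one-turn handler are invented:

```python
# Stack-based DAM sketch: keep running the top DAF unless the
# user's input falls outside its scope, in which case push and
# run the best-matching DAF instead.
class DAF:
    """Dialogue Action Form: here just a named scope plus a turn handler."""
    def __init__(self, name, scope):
        self.name = name
        self.scope = set(scope)      # keywords this DAF can handle
        self.satisfied = False

    def in_scope(self, user_input):
        return bool(self.scope & set(user_input.lower().split()))

    def run(self, user_input):
        print(f"[{self.name}] handling: {user_input!r}")
        self.satisfied = True        # toy DAFs finish in one turn

def dam_loop(stack, turns, all_dafs):
    while stack and turns:
        user_input = turns.pop(0)
        top = stack[-1]
        if not top.in_scope(user_input):            # user initiative
            matches = [d for d in all_dafs if d.in_scope(user_input)]
            if matches:
                stack.append(matches[0])            # push new topic
                top = stack[-1]
        top.run(user_input)                         # system initiative
        if top.satisfied:
            stack.pop()

greeting = DAF("Greeting", ["hello", "hi"])
measure = DAF("RoomMeasurement", ["metres", "size", "width"])
style = DAF("Style", ["style", "modern", "classic"])
goodbye = DAF("Goodbye", ["bye", "goodbye"])
# Preload in reverse so Greeting runs first and Goodbye pops last.
stack = [goodbye, style, measure, greeting]
dam_loop(stack, ["hello there", "the width is three metres",
                 "I want a modern style", "goodbye"], stack[:])
```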

54
DAM Implementation 3
  • DAFs model the individual topics and
    conversational manoeuvres in the application
    domain.
  • The stack structure will be preloaded with those
    DAFs which are necessary for the COMIC bathroom
    design task and the dialogue ends when the
    Goodbye DAF is popped.
  • DAFs and stack interpreters together control the
    flow of the dialogue

55
Dialogue Management: the stack
  • Work by Catizone and Setzer (EU COMIC Project)
  • Contrast with work at CSLI (Stanford) by Peters
    and Lemon.

Greeting DAF
Room measurement DAF
Style DAF

Good-bye DAF
56
Dialogue Management: the augmented stack
  • What happens if we match/push onto the stack a
    DAF that is already on the stack (further down)?
  • In this case, we run the DAF and it is then
    either satisfied (poppable) or not.
  • If it is then, on popping, we remove any
    (unrun/prestored) copies of the DAF below it on
    the stack (slight augmentation here of a simple
    stack!)
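
A sketch of the augmentation, reusing the toy DAF class
above (again our illustration, and only one of the
strategies the next slide says they will experiment with):

```python
# Augmented pop: when a satisfied DAF is popped, unrun copies of
# the same DAF lower in the stack are removed; an unsatisfiable
# DAF instead traps the response, recommends deferring the topic,
# and is popped, leaving its lower copy to run at the proper
# place in the dialogue.
def augmented_pop(stack):
    daf = stack.pop()
    if daf.satisfied:
        stack[:] = [d for d in stack
                    if not (d.name == daf.name and not d.satisfied)]
    else:
        print(f"[{daf.name}] let's come back to that topic later")
    return daf
```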

57
An augmented stack 2
  • If it is not satisfied (almost certainly because
    of context dependent DAFs still to come on stack)
    the DAF traps the response and recommends waiting
    to discuss that topic and is popped. The copy of
    the same DAF lower down the stack is then
    accessed and run later at the appropriate place
    in the dialogue (this is one of a set of possible
    strategies with which we shall experiment).

58
An augmented stack 3
  • The result we would like with this mechanism is
    that a relevant and partly run DAF can be forced
    to the top of the stack and all intermediate
    partly run DAFs discarded (cf. Grosz's notion of
    conversational topics being closed and
    unreachable for restarting).

59
DAF example
60
Learning to segment the dialogue corpora?
  • Segmenting the corpora we have with a range of
    tiling-style algorithms (initially by topic)
  • To segment it plausibly, hopefully into segments
    that correspond to structures for DM (Dialogue
    Action Frames in our naming)
  • Being done on the annotated corpus (i.e. a corpus
    word model) and on the corpus annotated by
    Information Extraction semantic tags (a semantic
    model of the corpus)
  • Repetitions of DA + semantics template patterns
    may make an MDL packing of the corpus possible as
    an alternative segmenting method.

61
Sheffield Dialogue Research Challenges
  • Will a Dialogue Manager raise the DA 75% ceiling
    top-down?
  • Multimodal dialogue managers. Are they completely
    independent of the modality? Are they really
    language independent?
  • What is a good virtual machine for running a
    dialogue engine? Do DAFs provide a robust and
    efficient mechanism for doing Dialogue
    Management?
  • Will they offer any interesting discoveries on
    stack access to, and discard of, incomplete
    topics (cf. Stacks and syntax).

62
More research challenges
  • Applying machine learning to transcripts so as to
    determine the content of dialogue management,
    i.e. the scope and content of candidate DAFs.
  • Can the state set of DAFs and a stack be trained
    with reinforcement learning?
  • Can we add a strong belief/planning component to
    this and populate it empirically?
  • Fusion with QA

63
Why the dialogue task is still hard
  • "Where am I" in the conversation: what is being
    talked about now, what do they want?
  • Does topic stereotypy help, or are just
    finite-state pairs enough (VoiceXML? TRINDIKIT?)?
  • How to gather the beliefs/knowledge required,
    preferably from existing sources?
  • Are there distinctive procedures for managing
    conversations that can be modelled by simple
    virtual machines?
  • How to learn the structures we need (assuming we
    do), and how to get and annotate the data?

64
How this research is funded
  • AMITIES is an EU-US cooperative R&D project
    (the first in NLP), 2001-2005, to automate call
    centers.
  • University of Sheffield (UK prime)
  • SUNY Albany (US prime)
  • Duke U. (US)
  • LIMSI Paris (Fr)
  • IBM (US)
  • COMIC is an EU R&D project (2001-2005) to
    model multimodal dialogue
  • Max Planck Inst. (Nijmegen) (Coordinator)
  • University of Edinburgh
  • Max Planck Inst. (Tuebingen)
  • KUL Nijmegen
  • University of Sheffield
  • ViSoft GmbH

65
There are now four, not two, competing approaches
  • Logic-based systems with reasoning (old and still
    unvalidated by performance)
  • Extensions of speech engineering methods, machine
    learning and simple intermediate structure
  • Simple handcoded finite state systems in VoiceXML
    (Chatbots and commercial systems)
  • Rational hybrids based on structure and machine
    learning.