1. Artificial Companions: Explorations in machine personality and dialogue
- Yorick Wilks
- Computer Science, University of Sheffield, and Oxford Internet Institute
- MLMI04, Martigny CH, June 2004
2. What the talk contains
- Two natural language technologies I work within:
- Human dialogue modelling
- Information extraction from the web
- What drives NLP dialogue models: ML, speech?
- Conversational agents as essential for:
- personalizing the web
- making it tractable
- Companions for the non-technical as a cosier kind of persistent agent
- For niche groups, some of them non-technical or handicapped
- As an interface to the web
- As an interface to their stored lives
3. Machine dialogue: problems with available theory
- Dialogue is the Cinderella of NLP
- It can be vacuous: dialogues are systems of turn-taking
- Speech act analysis initially led to implausibly deep levels of reasoning--you don't need plans to sell an air ticket
- For some researchers, dialogue theory is still a question of how best to deploy logic
- Much conversation is not task-oriented at all, nor does it have plausible info-states
4. Important historical systems have all the modern traits and functionalities in miniature
- Colby's PARRY (Stanford, 1971)
- Winograd's SHRDLU (MIT, 1971)
- Perrault, Cohen and Allen's speech act system (Toronto, 1979)
5. Colby's PARRY
- Perhaps the best performance ever: many users, robust, but not a normal subject (i.e. a paranoid)
- Primitive individual models, some control of the dialogue process--but it had lots to say!
- Primitive simulation of intentionality, and emotion in output choice
- No syntax analysis, but fast pattern matching
- Far, far better than ELIZA
6. PARRY conversation
- Have you been hospitalized before?
- THIS IS THE FIRST TIME
- How long have you been there?
- ABOUT TWO WEEKS
- Any headaches?
- MY HEALTH IS FINE
- Are you having memory difficulties?
- JUST A FEW
7. SHRDLU example: "Does the little block that the hatched pyramid's support supports support anything black?"
8. Perrault, Cohen and Allen at Toronto
- System has domain knowledge and reasoning power
- Was the first to assign speech act labels to dialogue items
- User must have one of two goals: meeting or catching a train
- Passenger/User: "Do you know when the Windsor train arrives?"
- This is labelled as a REQUEST, not a REQUEST-INFORM (Y/N), because the system knows the user knows it knows!
- But again, virtually no performance
9. 1970s division of approaches to machine conversation
- Domain-dependent systems with coded world knowledge and some parsing and reasoning
- VERSUS
- Wide, shallow systems with little knowledge and high performance
- Published AI academic systems were all in the first group
- Only the second group (i.e. PARRY) performed at all
10. Academic systems have moved towards performance
- Best is Traum's TRAINS system--a descendant of Allen's work (Toronto-Rochester tradition)
- Semi-empiricist: uses corpora but retains reasoning
- Also TRINDIKIT at Gothenburg/Edinburgh, which uses a large rule base
- Pressure from ASR advances
- Pressure from the Loebner competition to perform?
11. AI people joined the performers: the Loebner Competition
- Well-designed competition over the last ten years--essentially a modified Turing test
- Domain-dependent entries till 1998, then domain-free
- The game is for journalists to separate and rank human and machine partners
- Search "Loebner" on the web for full transcripts
12. Sheffield University/Intelligent Systems won in 1998!
- The CONVERSE program pretended to be Catherine, a 34-year-old British journalist living in New York (to explain any usage problems)
- She could talk about 80 appropriate topics
- (80 topics = domain-free???) clothes, shopping, food, abortion, fitness, sex, etc.
13. Loebner Competition 1998--Sheffield's winning entry!
- PROGRAM [10:18:04] Hi, my name is Catherine, what's yours?
- JUDGE04 [10:18:32] My name is Elliott.
- PROGRAM [10:18:34] Well, it's nice to meet you, Elliott.
- PROGRAM [10:18:48] Did you see that story on CNN last night about the lesbian
- PROGRAM [10:19:04] couple who came out at a White House party on Sunday?
- JUDGE04 [10:19:55] Yes, I did. I think it may all be a publicity stunt for Ellen.
- PROGRAM [10:20:01] Ellen Degeneres was one of them she was kissing her lover
- PROGRAM [10:20:11] in front of Bill Clinton. Do you think he was very
- PROGRAM [10:20:25] embarrassed?
- JUDGE04 [10:21:00] I think he want the gay vote. He will do anything to get it and keep it.
- JUDGE04 [10:21:15] He does not have a conservative base.
- PROGRAM [10:21:21] Are you still a Friend of Bill Clinton?
14. The CONVERSE prototype 1998
- "Push-me-pull-you" architecture
- Strong driving top-down scripts (80) in a re-enterable network with complex output functions
- Bottom-up parsing of user input adapted from a statistical prose parser
- Minimal models of individuals
- Contained WordNet and Collins proper names
- Some machine learning from past Loebners and the BNC
- It owed more to PARRY than to Toronto!
15. Sheffield dialogue circa 2002
- Empirical corpus-based stochastic dialogue grammar that maps utterances directly to dialogue acts, and uses IE to match concepts with templates to provide semantic content
- A better virtual machine for script-like (DAF) objects encapsulating both the domain moves and conversational strategy (cf. PARRY and Grosz), to maintain the push-pull (alias mixed-initiative) approach
- The Dialogue Action Frames provide domain context, and the stack provides topic change and re-access to partially fulfilled DAFs
16. Resources vs. highest-level structure
- Need for resources to build belief-system representations and quasi-linguistic models of dialogue structure, scripts etc., and to provide a base for learning optimal Dialogue Act assignments
- A model of speakers, incrementally reaching VIEWGEN-style ascription-of-belief procedures to give dialogue act reasoning functionality
- Cf. A. Ballim and Y. Wilks (1991), Artificial Believers, Erlbaum
17. How this research is funded
- AMITIES is an EU-US cooperative R&D project (2001-2005) to automate call centers
- University of Sheffield (EU prime)
- SUNY Albany (US prime)
- Duke U. (US)
- LIMSI Paris (Fr)
- IBM (US)
- COMIC is an EU R&D project (2001-2005) to model multimodal dialogue
- Max Planck Inst. (Nijmegen) (Coordinator)
- University of Edinburgh
- Max Planck Inst. (Tuebingen)
- KUL Nijmegen
- University of Sheffield
- ViSoft GmbH
18. COMIC
- Three-year project
- Focussed on multimodal dialogue
- Speech and pen input/output
- Bathroom design application
- Helps the customer to make bathroom design decisions
- Will be based on existing bathroom design software
- Spoken output is done with a talking head which includes facial expressions etc.
19. Design of a Dialogue Action Manager
- General-purpose DAM where domain-dependent features are separated from the control structure
- The domain-dependent features are stored as Dialogue Action Frames (DAFs), which are similar to Augmented Transition Networks (ATNs)
- The DAFs represent general-purpose dialogue manoeuvres as well as application-specific knowledge
- The control mechanism is based on a basic stack structure, where DAFs are pushed and popped during the course of a user session
- The control mechanism together with the DAFs provides a flexible means for guiding the user through the system goals (allowing for topic change and barge-in where needed)
- User "push" is given by the ability to suspend and stack a new DAF at any point (for a topic change or any user manoeuvre)
- System "push" is given by the pre-stacked DAFs corresponding to what the system wants to show or elicit
- Research question: how much of the stack's unpopped DAFs can/should be re-accessed (cf. Grosz's limits on reachability)?
20. Dialogue Management
- DAFs model the individual topics and conversational manoeuvres in the application domain
- The stack structure will be preloaded with those DAFs which are necessary for the COMIC bathroom design task, and the dialogue ends when the Good-bye DAF is popped
- The DAF and stack interpreters together control the flow of the dialogue
- Greeting DAF
- Room measurement DAF
- Style DAF
- Good-bye DAF
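The push/pop control regime above can be sketched in a few lines. This is a minimal illustrative toy, not the COMIC implementation; all class names, prompts and the single-prompt-per-turn loop are assumptions, and real DAFs are ATN-like networks rather than flat prompt lists.

```python
class DAF:
    """A Dialogue Action Frame, reduced here to a named list of
    remaining system moves (real DAFs are ATN-like networks)."""
    def __init__(self, name, prompts):
        self.name = name
        self.prompts = list(prompts)

    def next_prompt(self):
        return self.prompts.pop(0)

    @property
    def finished(self):
        return not self.prompts

class DialogueManager:
    def __init__(self, preloaded):
        # System "push": pre-stacked DAFs for the task (top of stack = end of list).
        self.stack = list(preloaded)

    def user_push(self, daf):
        # User "push": a topic change suspends the current DAF by
        # stacking a new one on top of it.
        self.stack.append(daf)

    def step(self):
        # Discard finished frames; the dialogue ends when the stack
        # empties, i.e. when the Good-bye DAF is popped.
        while self.stack and self.stack[-1].finished:
            self.stack.pop()
        if not self.stack:
            return None
        return self.stack[-1].next_prompt()

# The bathroom-design stack from the slide, pushed in reverse order so
# Greeting is handled first and Good-bye last.
dm = DialogueManager([
    DAF("Good-bye", ["Goodbye!"]),
    DAF("Style", ["Which tile style do you prefer?"]),
    DAF("Room measurement", ["How wide is the room?", "And how long?"]),
    DAF("Greeting", ["Hello, let's design your bathroom."]),
])

turns = []
t = dm.step()
while t is not None:
    turns.append(t)
    t = dm.step()
```

A user-initiated topic change would call `user_push` mid-session, suspending the current frame until the new one is popped, which is exactly the mixed-initiative behaviour the slides describe.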
21. DAF example
22. Current work: learning to segment the dialogue corpora
- Segmenting the corpora we have with a range of tiling-style and MDL algorithms (by topic and by strategic manoeuvre)
- To segment them plausibly, hopefully into segments that correspond to structures for DM (i.e. Dialogue Action Frames)
- Being done on the annotated corpus (i.e. a corpus word model) and on the corpus annotated by Information Extraction semantic tags (a semantic model of the corpus)
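A tiling-style segmenter of the kind mentioned above hypothesises topic boundaries where lexical cohesion between adjacent windows of utterances drops. The toy below is only a sketch of that idea, not the project's algorithm; the window size, threshold and mini-dialogue are all illustrative assumptions.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in a if w in b)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def segment(utterances, window=2, threshold=0.3):
    """Return indices i such that a topic boundary is hypothesised
    before utterance i (cohesion between the two windows around i
    falls below the threshold)."""
    boundaries = []
    for i in range(window, len(utterances) - window + 1):
        left = Counter(w for u in utterances[i - window:i] for w in u.lower().split())
        right = Counter(w for u in utterances[i:i + window] for w in u.lower().split())
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries

calls = [
    "i want to check my balance",
    "your balance is two hundred pounds",
    "can i also change my address",
    "sure what is the new address",
]
# The vocabulary shifts from balances to addresses between turns 2 and 3.
print(segment(calls))  # → [2]
```

Each hypothesised segment would then be a candidate for a DAF in the sense of the previous slides.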
23. AMITIÉS Objectives
- Call center/customer access automation
- Multilingual access to customer information and services
- Now: speech over the telephone (call centers)
- Later: speech, text and pointing over the Internet (e-service)
- Multilingual natural language dialogue
- Unscripted, spontaneous conversation
- Models derived from real call center data
- Tested and verified in a real call center environment
- Showcase applications at real call centers
- Financial services centers (English, French, German)
- Expand into public service/government applications (US, EC)
24. Corpora
- GE Financial call centres
- 1k English calls (transcribed, annotated)
- 1k French calls (transcribed, annotated)
- IBM software support call centre
- 5k English calls (transcribed)
- 5k French calls (transcribed)
- AGF insurance claim call centre
- 5k French calls (recording)
- VIEL et CIE
- 100 French calls (transcribed, annotated)
25. AMITIÉS System
- Data-driven dialogue strategy
- Similar to Colorado's Communicator system
- Statistical: a dialogue transition graph is derived from a large body of transcribed, annotated conversations
- Task and ID identification
- Task identification: automatically trained vector-based approach (Chu-Carroll & Carpenter 1999)
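The vector-based routing idea attributed to Chu-Carroll & Carpenter can be sketched as follows: each task is a term vector built from training calls, and a new caller utterance goes to the most similar task. The task names, training utterances and bag-of-words preprocessing here are illustrative assumptions, not the AMITIÉS setup.

```python
from collections import Counter
import math

def vec(texts):
    """Bag-of-words term vector over a list of example utterances."""
    return Counter(w for t in texts for w in t.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a if w in b)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

# Hypothetical routing tasks, each trained from a few example calls.
tasks = {
    "balance-enquiry": vec(["what is my balance", "how much is in my account"]),
    "address-change": vec(["i moved house", "please update my address"]),
}

def identify(utterance):
    """Route a new utterance to the task with the closest vector."""
    u = vec([utterance])
    return max(tasks, key=lambda t: cosine(u, tasks[t]))

print(identify("could you tell me my account balance"))  # → balance-enquiry
```

In a real router the vectors would be weighted (e.g. tf-idf style) and trained from thousands of transcribed calls rather than two examples per task.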
26. Sheffield does the post-ASR fusion in AMITIES
- Language understanding
- Use of ANNIE IE for robust extraction
- Partial matching (creates a list of possible entities)
- Dialogue Act classifier
- Recognises domain-independent dialogue acts
- Works well (86% accuracy) for a subset of Dialogue Act labels
27. Evaluation
- 10 native speakers of English
- Each made 9 calls to the system, following scenarios they were given
- Overall call success was 70%
- Compare this to Communicator scores of 56%
- Similar number of concepts/scenario (9)
- Word error rates:
- 17% for successful calls
- 22% for failed calls
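Word error rate figures like these are standardly computed as the word-level edit (Levenshtein) distance between reference transcript and ASR hypothesis, divided by the reference length. A minimal implementation of that standard metric:

```python
def wer(reference, hypothesis):
    """Word error rate: edit distance over words / reference length."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deletions only
    for j in range(len(h) + 1):
        d[0][j] = j  # insertions only
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

# One substitution in a six-word reference gives WER ≈ 0.17, roughly
# the 17% reported for successful calls.
print(wer("i want to book a flight", "i want to book a fight"))
```

The example sentences are invented for illustration; the real figures come from the evaluation transcripts.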
28. Evaluation: interesting numbers
- Avg. num. turns/dialogue: 18.28
- Avg. num. words/user turn: 6.89
- High in comparison to Communicator scores, reflecting:
- Lengthier responses to open prompts
- Responses to requests for multiple attributes
- Greater user initiative
- Avg. user satisfaction score: 20.45 (range 5-25)
29. Learning to tag for Dialogue Acts: initial work
- Samuel et al. (1998): TBL learning on n-gram DA cues, Verbmobil corpus (75%)
- Stolcke et al. (2000): full language modelling (including DA sequences), more complex Switchboard corpus (71%)
30. Starting with a naive classifier for DAs
- Direct predictivity of DAs by n-grams as a preprocess to any ML algorithm
- Get P(d|n) for all 1-4 word n-grams and the DA set over the Switchboard corpus, and take the DA indicated by the n-gram with the highest predictivity (thresholded for probability level)
- Do 10-fold cross-validation (which lowers scores)
- Gives a best cross-validated score of around 63% over Switchboard, but using only some of the data Stolcke needed
- Single highest score currently 71.2%--higher than that reported in Stolcke
- Up to 86% with a small (6) DA set
31. Extending the pretagging with TBL
- Gives 66% (Stolcke's 71%) over the Switchboard data, but only 3% is due to TBL (rather than the naive classifier)
- Samuel was unable to see what TBL was doing for him
- This is just a base for a range of more complex ML algorithms (e.g. WEKA)
32. Dialogue Research Challenges
- Will a dialogue manager raise the DA 75/85% ceiling top-down?
- Multimodal dialogue managers: are they completely independent of the modality? Are they really language-independent?
- What is the best virtual machine for running a dialogue engine? Do DAFs + stack provide a robust and efficient mechanism for doing dialogue management, e.g. topic change (vs. simple rule systems)?
- Will they offer any interesting discoveries on stack access to, and discarding of, incomplete topics (cf. stacks and syntax)?
- Applying machine learning to transcripts so as to determine the content of dialogue management, i.e. the scope and content of candidate DAFs
- Can the state set of DAFs and a stack be trained with reinforcement learning (like a finite-state matrix)?
- Can we add a strong belief/planning component to this and populate it empirically?
- Fusion with QA functionality
33. What is the most structure that might be needed, and how much of it can be learned?
- Steve Young (Cambridge) says: learn all modules, with no need for rich a priori structures (cf. MT history and Jelinek at IBM)
- Availability of data (dialogue is unlike MT)?
- Learning to partition the data into structures
- Learning the semantic/speech act interpretation of inputs alone has now reached a (low) ceiling (75/85%)
34. Young's strategy is not quite like Jelinek's MT strategy of 1989!
- Which was non/anti-linguistic, with no intermediate representations hypothesised
- Young assumes roughly the same intermediate objects as we do, but in very simplified forms
- The aim is to obtain training data for all of them, so the whole process becomes a single Partially Observable Markov model
- It remains unclear how to train complex state models that may not represent tasks, let alone belief and intention models
35. There are now four, not two, competing approaches to machine dialogue in NLP
- Logic-based systems with reasoning (traditional, and still unvalidated by performance)
- Extensions of speech engineering methods: machine learning and no structure (new)
- Simple hand-coded finite-state systems in VoiceXML (chatbots and commercial systems)
- Rational hybrids based on structure and machine learning
36. Modes of dialogue with machine agents
- Current mode of phone/multimodal interactions at terminals
- The internet (possibly becoming the Semantic Web) will be for machine agents that understand its content, and with which users dialogue, e.g. "Find me the best camera under 500"
- Interaction with mobile phone agents (more or less monomodal)
- Some or all of these services as part of the function of persistent, more personal, cosy, lifelong Companion agents
37. The Companions: a new economic and social goal for dialogue systems
38. An idea for integrating the dialogue research agenda in a new style of application...
- That meets social and economic needs
- That is not simply a product, but everyone will want one if it succeeds
- That cannot be done now, but could be in a few years via a series of staged prototypes
- That modularises easily for large-project management, and whose modules cover the research issues
- Whose speech and language technology components are now basically available
39. A series of intelligent and sociable COMPANIONS
- The SeniorCompanion
- The EU will have more and more old people who find technological life hard to handle, but who will have access to funds
- The SC will sit beside you on the sofa, but be easy to carry about--like a furry handbag--not a robot
- It will explain the plots of TV programs and help choose them for you
- It will know you and what you like and don't
- It will send your messages, make calls and summon emergency help
- It will debrief your life
41. Other COMPANIONS
- The JuniorCompanion
- Teaches and advises, maybe from a backpack
- Warns of dangerous situations
- Helps with homework and web search
- Helps with languages
- Always knows where the child is
- Explains ambient signals and information
- It's what e-learning might really mean!
43. The Senior Companion is a major technical and social challenge
- It could represent old people as their agent and help in difficult situations, e.g. with landlords, or guess when to summon human assistance
- It could debrief an elderly user about events and memories in their lives
- It could aid them to organise their life-memories (this is now hard!) (see Lifelog and Memories for Life)
- It would be a repository for relatives later
- Has Loebner chat aspects as well as information--it is to divert, like a pet, not just inform
- It is a persistent and personal social agent interfacing with Semantic Web agents
44. Other issues for Companions we can hardly begin to formulate
- Companion identity as an issue that can be settled many ways--like that of the owner's web identity, now a hot issue?
- Responsibilities of Companion agents--to whom?
- Communications between agents, and our access to them
- Are simulations of emotional behaviour or politeness desirable in a Companion?
- Protection of the vulnerable (young and old here)
- What happens to your Companion when you are gone?
45. Companions and the Web
- A new kind of agent as the answer to a passive web
- The web/internet must become more personal to be tractable as it gets bigger (and more structured, or unstructured?)
- Personal agents will need to be autonomous and trusted (like spacecraft on missions)
- But also personal and persistent, particularly for large sections of populations now largely excluded from the web
- The Semantic Web is a start at structuring the web for comprehension and activity, but web agents are currently abstract and transitory
- The old are a good group to start with (growing, and with funds)
46. The technologies for a Companion are all there already
- ASR for a single user (who may be dysarthric)
- Ascribing personality? Remember Tamagotchi?
- Quite intelligent people rushed home to feed one (and later Furby), even though they knew it was a simple empty mechanism
- And Tamagotchi could not even talk!
- People with pets live longer
- Wouldn't you like a warm pet to remind you what happened in the last episode of your favourite TV soap?
- No? OK, but perhaps millions of your compatriots would?!
47. This isn't just about furry talking handbags on sofas, but about any persistent and personalised entity that will interface to information sources--phones above all--and deal with the web in a more personal manner. "...claim the internet is killing their trade because customers seem to prefer an electronic serf with limitless memory and no conversation." (Guardian, 8.11.03)
48. Conclusions
- Companions are a plausible binding concept for exploring and evaluating a richer concept of human-machine interaction (useful too!)
- Interactions beyond simple task-driven dialogues
- That require more interesting theories underpinning them, even ones we cannot immediately see how to reinforce/learn
- Interactions with persistent personality, affect, emotion, interesting beliefs and goals
- Above all, we need a more sophisticated and generally accepted evaluation regime