Title: Share and Share Alike: Resources for Language Generation
1. Share and Share Alike: Resources for Language Generation
- Prof. Marilyn Walker
- University of Sheffield, NSF, 20 April 2007
2. What type of resource is needed for generation?
- What type of scientific problem is generation?
- An essential difference between language generation and language interpretation problems (parsing, WSD, relation extraction, coreference) is that there is no single right answer for language generation
- Language Productivity Assumption: an optimal generation resource will represent multiple outputs for each input, with a human-generated quality metric associated with each output (see the sketch below)
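The Language Productivity Assumption suggests a very simple data layout: one input, many rated outputs. A minimal Python sketch of such a record, with all field names and example values invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class RatedOutput:
    """One candidate realization plus a human-assigned quality score."""
    text: str
    quality: float                      # e.g. a mean rating on a 1-5 scale

@dataclass
class GenerationExample:
    """One resource entry: a single input paired with many rated outputs."""
    input_repr: dict                    # attribute-value input to the generator
    outputs: list[RatedOutput] = field(default_factory=list)

# A toy entry: one communicative goal, several alternative realizations.
example = GenerationExample(
    input_repr={"goal": "RECOMMEND", "entity": "Babbo", "foodquality": "superb"},
    outputs=[
        RatedOutput("Babbo has superb food, so I recommend it.", 4.6),
        RatedOutput("I recommend Babbo; its food quality is superb.", 4.1),
        RatedOutput("Superb food Babbo has recommend.", 1.3),
    ],
)
best = max(example.outputs, key=lambda o: o.quality)
```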
3. Dialogue vs. generation?
- Dialogue is like generation in that there is no single right answer for how to do a task in dialogue
- Information gathering and information presentation in dialogue systems are generation problems
- DARPA evaluation for dialogue systems
- Fixed domain: TRAVEL PLANNING
- First, the ATIS evaluations compared dialogue system behaviour against human behaviour in a corpus of human-wizard dialogues (Hirschman 2000)
- No mixed initiative, different dialogue strategies, divergence of context, user modeling
4. Dialogue vs. generation?
- Second, define the context and evaluate the system response to a user utterance in that particular context
- Much more like generation: the context is defined, the system's communicative goal is defined
- Form: how is the "same" response defined? Some forms for identical content may be better than others
- Content: user models, definitions of context. The dialogue system should also be able to decide on the communicative goal.
5. Dialogue vs. generation?
- Third, the Communicator evaluation: given a user task (NYC to LHR, Continental, April 22nd, 2007), collect metrics (time to completion, ASR error, utterance output quality, concept understanding, user satisfaction)
- Corpus semi-automatically labelled with dialogue acts (quality/strategy metrics) for system utterances (8 or more different instantiations from different systems for particular communicative goals)
- Try to understand which metrics are contributors to user satisfaction (PARADISE); see the regression sketch after this list
- User utterances labelled subsequently, used in RL experiments comparing dialogue strategies
- Hard to compare particular scientific techniques for particular modules in systems; plug and play never worked
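A hedged sketch of the PARADISE-style analysis referred to above: user satisfaction is modelled as a linear combination of task success and dialogue cost metrics, with weights fit by multiple linear regression. The metric values and satisfaction scores below are invented for illustration.

```python
import numpy as np

# One row per dialogue; columns: task success, time to completion (s), ASR word error rate.
metrics = np.array([
    [1.0,  90.0, 0.05],
    [1.0, 180.0, 0.20],
    [0.0, 240.0, 0.35],
    [1.0, 120.0, 0.10],
    [0.0, 300.0, 0.40],
])
user_satisfaction = np.array([4.5, 3.5, 2.0, 4.0, 1.5])   # e.g. exit-survey scores

# z-score the predictors so the fitted weights are comparable, then solve
# satisfaction ~ w0 + w . metrics by ordinary least squares.
z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)
design = np.column_stack([np.ones(len(z)), z])
weights, *_ = np.linalg.lstsq(design, user_satisfaction, rcond=None)

for name, w in zip(["intercept", "task success", "time", "ASR error"], weights):
    print(f"{name:>12s}: {w:+.2f}")
```

The relative magnitudes of the fitted weights indicate which metrics contribute most to satisfaction in that corpus.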
6. Dialogue vs. generation: Conclusions
- Just having a fixed task (TRAVEL) by itself does not necessarily lead to scientific progress
- We want to compare particular scientific techniques for particular modules in systems
- Plug and play is the only way to do this
- BUT it is very hard to define, for a whole community, what the interfaces between modules should be
7. Position
- What type of resources would be useful for scientific advancement in language generation?
- Almost anything!
- If you build it, they will come: if it's useful, people will use it
- Can we leverage what we already have in our own research groups, share it, and make it better?
8. What is needed to incentivize data sharing?
- Many different domains/problems/modules > NEED LOTS OF DIFFERENT RESOURCES
- Resources are costly (the developing group is not finished with them yet) > FINANCIAL INCENTIVE, SCIENTIFIC INCENTIVE, CITATION INCENTIVE
- It costs too much to support resource preparation, maintenance, distribution and re-use > NSF/LDC FINANCIAL SUPPORT
- NOTE: MANY LDC RESOURCES ARE FOUND DATA (not explicitly commissioned)
9. A proposal for one shared resource
10. Information presentation of one or more database entities
- Natural language interfaces / SDS (McKeown 85, McCoy 89, the cooperative response literature, Carenini & Moore 01, Polifroni et al. 03, COGENTEX w/ active buyers website, Walker et al. 04, Demberg & Moore 06, etc.)
- Different communicative goals: Summarize, Recommend, Compare, Describe (DB entities)
- Representation not controversial (attributes and values for DB entities, relations between entity and attribute); see the sketch below
- Application not dependent on NLU
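A sketch of the kind of representation meant here: DB entities as attribute-value structures, and communicative goals as speech acts over entities or sets of entities. Entity names, attributes and values are illustrative only.

```python
# DB entities as attribute-value dictionaries (values invented for illustration).
babbo = {"name": "Babbo", "cuisine": "Italian", "foodquality": "superb",
         "decor": "excellent", "service": "excellent", "price": 60}
lupa = {"name": "Lupa", "cuisine": "Italian", "foodquality": "very good",
        "decor": "good", "service": "good", "price": 40}
restaurants = [babbo, lupa]

# Communicative goals as speech acts over entities or sets of entities.
goals = [
    ("SUMMARIZE", restaurants),          # summarize a set
    ("DESCRIBE",  babbo),                # describe a single entity
    ("RECOMMEND", babbo, restaurants),   # recommend one entity relative to a set
    ("COMPARE",   restaurants),          # compare the members of a set
]
```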
11. What type of resource is needed for generation?
- What type of scientific problem is generation?
- An essential difference between language generation and language interpretation problems (parsing, WSD, relation extraction, coreference) is that there is no single right answer for language generation
- Language Productivity Assumption: an optimal generation resource will represent multiple outputs for each input, with a human-generated quality metric associated with each output
12. We could make available a resource of
- INPUT-1: speech act, set of DB entities
- SUMMARIZE(SET), DESCRIBE(ENTITY), RECOMMEND(ENTITY, SET), COMPARE(SET)
- INPUT-2: user model, discourse/dialogue context, style parameters, etc.
- OUTPUT-1: a set of alternative outputs, possibly with TTS markup
- OUTPUT-2: human-generated ratings or rankings for the outputs, oriented to the criteria specified by INPUT-2 (a possible record format is sketched below)
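One way an entry of such a resource could be serialized, assuming a JSON-style record; the field names and example values below are hypothetical, not a committed format.

```python
import json

entry = {
    "input_1": {                        # speech act over DB entities
        "speech_act": "RECOMMEND",
        "entity": "Babbo",
        "entity_set": ["Babbo", "Lupa", "Il Mulino"],
    },
    "input_2": {                        # user model, context, style parameters
        "user_model": {"cares_about": ["foodquality", "service"]},
        "dialogue_context": "user asked for Italian restaurants nearby",
        "style": {"verbosity": "low"},
    },
    "output_1": [                       # alternative outputs, optionally with TTS markup
        {"text": "Babbo has superb food and excellent service.",
         "tts": "<prosody rate='medium'>Babbo has superb food and excellent service.</prosody>"},
        {"text": "I would go with Babbo; the food is superb."},
    ],
    "output_2": [                       # human ratings aligned with output_1, per criterion
        {"output_index": 0, "criterion": "informational coherence", "rating": 4.5},
        {"output_index": 1, "criterion": "informational coherence", "rating": 4.0},
    ],
}
print(json.dumps(entry, indent=2))
```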
13. A Content Plan for a Recommend
- strategy: recommend
- relations: justify(nuc1, sat2), justify(nuc1, sat3), justify(nuc1, sat4)
- content:
  1. assert(best(Babbo))
  2. assert(has-att(Babbo, foodquality(superb)))
  3. assert(has-att(Babbo, decor(excellent)))
  4. assert(has-att(Babbo, service(excellent)))
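The plan above could be encoded, for instance, as a set of numbered assertions plus RST-style relations over them; this is only an illustrative encoding, not the internal format of any particular generator.

```python
# Illustrative encoding of the Recommend content plan above.
content_plan = {
    "strategy": "recommend",
    "content": {
        1: ("assert", "best", ["Babbo"]),
        2: ("assert", "has-att", ["Babbo", "foodquality", "superb"]),
        3: ("assert", "has-att", ["Babbo", "decor", "excellent"]),
        4: ("assert", "has-att", ["Babbo", "service", "excellent"]),
    },
    # Each relation justifies the nucleus (1) with one satellite (2-4).
    "relations": [("justify", {"nuc": 1, "sat": 2}),
                  ("justify", {"nuc": 1, "sat": 3}),
                  ("justify", {"nuc": 1, "sat": 4})],
}

# A sentence planner would then decide how to aggregate and order the
# assertions, e.g. "Babbo is the best: the food is superb, and the decor
# and service are excellent."
```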
14. Human Feedback for Ranking
- The ratings can represent any metric associated with the possible response, e.g. coherence, information quality, social appropriateness, personality.
- Informational coherence:
- SPaRKy, a generator for MATCH
- SPoT, a generator for the AT&T Communicator
- Users are shown response variants and then told: "For each variant, please rate to what extent you agree with this statement: The utterance is easy to understand, well-formed and appropriate to the dialogue context."
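A minimal sketch of turning such judgments into a ranking over response variants: average each variant's ratings across judges and sort. The variants and scores are invented; in SPoT and SPaRKy this kind of feedback was used to train a ranking function over candidate sentence plans rather than merely to sort a fixed list.

```python
from statistics import mean

# Hypothetical agreement ratings (1-5) with the statement above,
# keyed by response variant, one score per judge.
ratings = {
    "Babbo has superb food, excellent decor and excellent service.": [5, 4, 5],
    "Babbo has superb food. Babbo has excellent decor. Babbo has excellent service.": [3, 3, 2],
    "With superb food, Babbo also offers excellent decor and service.": [4, 5, 4],
}

ranked = sorted(ratings.items(), key=lambda kv: mean(kv[1]), reverse=True)
for rank, (variant, scores) in enumerate(ranked, start=1):
    print(f"{rank}. ({mean(scores):.2f}) {variant}")
```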
15. Examples: Learned rules applied to the test fold
16. Individual Differences (Sentence Planning Preferences)
17. Human Feedback for Ranking (2)
- Ten Item Personality Inventory (TIPI) questionnaire (Gosling 2003)
- PERSONAGE
- Users are shown response variants and then told: for each variant, rate on a scale of 1 to 7 whether
- The speaker is quiet, reserved
- The speaker is enthusiastic
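As a hedged illustration of how two such items combine, the TIPI scores extraversion from the positively keyed "enthusiastic" item and the reverse-scored "quiet, reserved" item; the example ratings are invented.

```python
def tipi_extraversion(enthusiastic: int, quiet_reserved: int) -> float:
    """Extraversion on a 1-7 scale: mean of the positively keyed item and
    the reverse-scored (8 - x) negatively keyed item."""
    return (enthusiastic + (8 - quiet_reserved)) / 2

# Invented ratings of one response variant by one judge.
print(tipi_extraversion(enthusiastic=6, quiet_reserved=2))   # -> 6.0
```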
18. Personality judgments: Recommend (Le Marais)
19. What else is out there?
- COCONUT corpus: referring expression generation, but add alternatives and ratings?
- Boston Directions corpus (NSF-funded, early 1990s)
- Communicator corpus (8 different system outputs for dialogue contexts that can be characterized)
- Tools: HALogen, Penman, FUF/SURGE, RealPro
- A library of text plans, content plans, sentence planners?