Title: Microplanning (Sentence Planning), Part 1
1. Microplanning (Sentence Planning), Part 1
2. Natural Language Generation
- Taking some computer-readable gibberish
- Translating it into proper English
- Applications include:
  - dialogue/chat systems
  - on-line help
  - summarisation
  - document authoring
3. NLG Tasks (as explained by Anja)
- Content determination: decide what to say; construct a set of messages
- Discourse planning: ordering and structuring; concepts, rhetorical relationships
- Sentence aggregation: divide content into sentences; construct sentence plans
- Lexicalisation: map concepts and relations to lexemes (words)
- Referring expression generation: decide how to refer to objects
- Linguistic realisation: put it all together in acceptable words and sentences
4. Modular structure of NLG systems (in theory!)
- TEXT PLANNER:
  - Content determination
  - Discourse planning
- SENTENCE PLANNER / MICROPLANNER:
  - Sentence aggregation
  - Lexicalisation
  - Referring expressions
- REALISER:
  - Realisation
5. Last week: Input to realisation
- message-id: msg02
- relation: C_DEPARTURE
- departing-entity: C_CALEDON-EXPRESS
- args:
  - departure-location: C_ABERDEEN
  - departure-time: C_1000
  - departure-platform: C_7
6. Microplanning 1: Aggregation
- Distributing information over different sentences. Example:
  - a. The Caledonian express departs Aberdeen at 1000, from platform 7.
  - b. The Caledonian express departs Aberdeen at 1000. The Caledonian express departs from platform 7.
7. Microplanning 2: GRE
- GRE = Generation of Referring Expressions
- Making clear which objects you're talking about. Example:
  - a. The Caledonian express departs Aberdeen at 1000, from platform 7.
  - b. The Caledonian express departs -- at 1000. The train departs from this platform.
8. Microplanning 3: Lexical choice
- Using different words for the same concept. Example:
  - a. The Caledonian express departs Aberdeen at ten o'clock, from platform 7.
  - b. The Caledonian express departs Aberdeen at ten. The Caledonian express leaves from platform 7.
9. In practice, tasks can be performed in a different order
- Example: aggregation can be performed on messages
10.
- message-id: msg02
- relation: C_DEPARTURE_1
- departing-entity: C_CALEDON-EXPRESS
- args:
  - departure-location: C_ABERDEEN
  - departure-time: C_1000

- message-id: msg03
- relation: C_DEPARTURE_2
- args:
  - departure-entity: C_CALEDON-EXPRESS
  - departure-platform: C_7
11.
- Aggregation can also be performed after realisation:
  - The Caledonian express departs Aberdeen at 1000 from platform 7
  - →
  - The Caledonian express departs Aberdeen at 1000. The Caledonian express departs from platform 7
12. Let's focus on GRE, but ...
- A little detour: NLG systems do not always work as you've been told
- Some practically deployed systems combine canned text with NLG
- One possibility: the system has a library of language templates, with gaps that need to be filled. E.g.,
13.
- [TRAIN] departs [TOWN] at [TIME]
- [TRAIN] departs [TOWN] from [PLATFORM]
- We apologise for the fact that [TRAIN] is delayed by [AMOUNT]
- Question: which of the other tasks are still relevant?
14. Let's move on to GRE
15.
- 1. The referent has a familiar name, but it's not unique, e.g., John Smith
- 2. The referent has no familiar name: trains, furniture, trees, atomic particles, ...
  - (Databases use keys, e.g., Smith73527, TRAIN-3821)
- 3. Similar sets of objects
- 4. NL is too economical to have names for everything
16. Last week: Input to realisation
- message-id: msg02
- relation: C_DEPARTURE
- departing-entity: C_CALEDON-EXPRESS
- args:
  - departure-location: C_ABERDEEN
  - departure-time: C_1000
18. This week: more realistic input
- message-id: msg02
- relation: C_DEPARTURE
- departing-entity: C_34435
- args:
  - departure-location: .....
  - departure-time: .....
- Possible realisations of C_34435: the Caledonian (express), the Aberdeen-Glasgow express, the blue train on your left, the train
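A minimal sketch of how such input might look as a Python structure (the dictionary layout is an illustrative assumption, not the lecture's actual format); the point is that the entity slot now holds a database key that GRE must turn into one of the phrases above:

```python
# Hypothetical message: the departing entity is a key, not a name.
msg02 = {
    "message-id": "msg02",
    "relation": "C_DEPARTURE",
    "departing-entity": "C_34435",   # to be resolved by GRE
    "args": {
        "departure-location": None,  # elided on the slide
        "departure-time": None,
    },
}

# Candidate realisations listed on the slide:
candidates = ["the Caledonian express", "the Aberdeen-Glasgow express",
              "the blue train on your left", "the train"]
```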
19.
- Communication is about saying the truth ...
- ... but that's not all there is to it
- Paul Grice (around 1970): principles of rational, cooperative communication
- GRE is a good case study (R. Dale and E. Reiter, Cognitive Science, 1995)
20. Grice: maxims of conversation
- Quality: only say what you know to be true
- Quantity: give enough but not too much information
- Relevance: be relevant
- Manner: be clear and brief
- (There is overlap between these four)
21. Maxims are a two-edged sword
- 1. They say how one should normally speak/write. Example:
  - "Yes, there's a gasoline station around the corner" (when it's no longer operational)
  - Quality: yes, it's true
  - Quantity: probably yes
  - Relevance: no, not relevant to the hearer's intentions
  - Manner: it's brief, clear, etc.
22. Maxims are a two-edged sword
- 2. They can also be exploited. Example:
  - Asked to write an academic reference: "Kees always came to my lectures and he's a nice guy"
  - Quality: yes, it's true (let's assume)
  - Quantity: no -- how about academic achievements?
  - Relevance: yes
  - Manner: yes
23. Application to GRE
- Dale & Reiter: the best description of an object fulfils the Gricean maxims. E.g.,
  - (Quality) list properties truthfully
  - (Quantity) use properties that allow identification, without containing more info
  - (Relevance) use properties that are of interest in the situation
  - (Manner) be brief
24. D&R's expectation
- Violation of a maxim leads to implicatures. For example,
  - Quantity: "the pitbull" (when there is only one dog)
  - Manner: "Get the cordless drill that's in the toolbox" (Appelt)
- There's just one problem ...
25. ... people don't always speak this way
- For example,
  - Manner: "the red chair" (when there is only one red object in the domain)
  - Manner/Quantity: "I broke my arm" (when I have two)
- General: empirical work shows much redundancy
- Similar for other maxims, e.g.,
  - Quality: "the man with the martini" (Donnellan)
26. Example Situation
[Figure: five pieces of furniture labelled a-e, with prices (a: 100, b: 150, c: 100, d: 150, e: ?) and origins (Swedish, Italian)]
27. Formalized in a KB
- Type: furniture (abcde), desk (ab), chair (cde)
- Origin: Sweden (ac), Italy (bde)
- Colour: dark (ade), light (bc), grey (a)
- Price: 100 (ac), 150 (bd), 250 ()
- Contains: wood (), metal (abcde), cotton (d)
- Assumption: all this is shared knowledge.
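For use in the sketches below, here is one way this KB might be encoded in Python (the dictionary-of-extensions layout is my assumption, not the lecture's notation):

```python
# Each property maps to its extension: the set of objects it is true of.
KB = {
    "furniture": set("abcde"), "desk": set("ab"),     "chair": set("cde"),
    "sweden":    set("ac"),    "italy": set("bde"),
    "dark":      set("ade"),   "light": set("bc"),    "grey": set("a"),
    "100":       set("ac"),    "150":   set("bd"),    "250": set(),
    "wood":      set(),        "metal": set("abcde"), "cotton": set("d"),
}
DOMAIN = set("abcde")
```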
28. Game
- 1. Describe object a.
- 2. Describe object e.
- 3. Describe object d.
29. Game
- 1. Describe object a: desk, Sweden, grey
- 2. Describe object e: no solution
- 3. Describe object d: Italy, 150
30. Violations of ...
- Manner:
  - "The 100 grey Swedish desk which is made of metal" (description of a)
- Relevance:
  - "The cotton chair is a fire hazard?"
  - ? "Then why not buy the Swedish chair?"
  - (Descriptions of d and c respectively)
31.
- In fact, there is a second problem with Quantity/Manner. Consider the following formalization:
- Full Brevity: never use more than the minimal number of properties required for identification (Dale 1989)
- An algorithm ...
32.
- Dale 1989:
  - Check whether 1 property is enough
  - Check whether 2 properties are enough
  - ...
- Etc., until
  - success: a minimal description is generated, or
  - failure: no description is possible
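A Python sketch of this search over the KB encoding above (the function name and structure are mine, not Dale's original formulation):

```python
from itertools import combinations

def full_brevity(target, domain, kb):
    """Try every 1-property description, then every 2-property
    description, etc., returning the first one that rules out
    all distractors."""
    props = [p for p, ext in kb.items() if target in ext]
    distractors = domain - {target}
    for size in range(1, len(props) + 1):
        for combo in combinations(props, size):
            remaining = distractors
            for p in combo:
                remaining = remaining & kb[p]
            if not remaining:          # success: minimal description
                return list(combo)
    return None                        # failure: no description possible

print(full_brevity("a", DOMAIN, KB))   # ['grey'] -- one property suffices
print(full_brevity("e", DOMAIN, KB))   # None -- matches "no solution" above
```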
33. Problem: exponential complexity
- Worst-case, this algorithm would have to inspect all combinations of properties: n properties give 2^n combinations.
- Recall: one grain of rice on square one, twice as many on any subsequent square.
- Some algorithms may be faster, but ...
- Theoretical result: the algorithm must be exponential in the number of properties.
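For concreteness (a standard counting identity, not stated on the slide), the number of non-empty property combinations a worst-case search must consider is

$$\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1,$$

which doubles with each added property, exactly like the grains of rice.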
34.
- D&R conclude that Full Brevity cannot be achieved in practice.
- They designed an algorithm that only approximates Full Brevity: the Incremental Algorithm (next time).
35. Microplanning (Sentence Planning), Part 2
36. Exercise
- Two issues to think about:
  - 1. Which NLG tasks does a template-based system need to perform?
  - 2. How would you set up a GRE program to resolve the issues that have come up so far?
37. Ex. 1. Earlier examples of templates
- [TRAIN] departs [TOWN] at [TIME]
- [TRAIN] departs [TOWN] from [PLATFORM]
- We apologise for the fact that [TRAIN] is delayed by [AMOUNT]
38.
- Simple extreme: use canned text
  - C_3445: the Caledonian Express
  - C_3446: the Gatwick Express
  - C_327: the train from B'ton to Hastings
- Sophisticated extreme: use full NLG
39. Ex. 2. Some problems in GRE
- Producing a distinguishing description whenever one exists
- Honouring Grice's maxims ...
- ... except where people don't
- Doing everything in a computationally feasible way
- We'll show you Dale & Reiter's response.
40. Incremental Algorithm (informal)
- Properties are considered in a fixed order, P
- A property is included if it is "useful": true of the target, false of some distractors
- Stop when "done", so earlier properties have a greater chance of being included (e.g., a perceptually salient property)
- The order is therefore called a preference order.
41.
- r: the individual to be described (target referent)
- P: the list of properties, in preference order
- p: a property
- L: the list of properties in the description
- (Recall: we're not worried about realization today)
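Slide 42 gave the algorithm itself; its pseudocode did not survive in this transcript, but here is a Python sketch of the published Dale & Reiter algorithm, using the variables above and the KB encoding from earlier (the preference order shown is an illustrative assumption):

```python
def incremental_algorithm(r, domain, P, kb):
    """Incremental Algorithm: scan properties in preference order,
    keeping each one that is true of r and rules out a distractor."""
    L = []                        # properties chosen so far
    C = domain - {r}              # distractors still to be ruled out
    for p in P:
        ext = kb[p]
        if r in ext and (C - ext):    # 'useful': true of r, false of some distractor
            L.append(p)
            C = C & ext
            if not C:
                return L              # success: r uniquely identified
    return None                       # failure: some distractors remain

# An assumed preference order (types first, then origin, colour, price, material):
P = ["chair", "desk", "furniture", "sweden", "italy", "grey", "dark",
     "light", "100", "150", "250", "cotton", "metal", "wood"]
print(incremental_algorithm("a", DOMAIN, P, KB))   # ['desk', 'sweden']
```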
43. Properties of the IA
- Hill climbing: better and better approximations of the target
- Linear complexity: if m = the number of properties, then O(m) actions are needed in the worst case
- Can do smart things with the preference order (think of properties that are hard to test)
44. Back to the KB
- Type: furniture (abcde), desk (ab), chair (cde)
- Origin: Sweden (ac), Italy (bde)
- Colour: dark (ade), light (bc), grey (a)
- Price: 100 (ac), 150 (bd), 250 ()
- Contains: wood (), metal (abcde), cotton (d)
- Assumption: all this is shared knowledge.
45. Back to our game
- 1. Describe object a.
- 3. Describe object d.
- Can you see room for improvement?
46. Improving the Incremental Algorithm
- 1. Absent nouns. The IA may fail to include a property that can be expressed through a noun, e.g., L = <grey>
- Solution:
  - Make noun-ish properties most preferred
  - Last line of the algorithm: if no noun-ish properties have been included, add one
47. Improving the Incremental Algorithm
- 2. Relations between properties
- Add f to the domain. Now, to describe a, the IA includes these properties:
  - furniture → abcde
  - desk → ab
  - Sweden → a
- After lexicalisation and realisation, e.g.: "the piece of furniture that's a Swedish desk"
48. Relations between properties
- Dale and Reiter proposed to take the attribute-value structure into account:
  - Type: furniture (abcde), desk (ab), chair (cde)
- For a given attribute, use the value that removes most distractors
- If there is a tie, use the most general value
49. Relations between properties
- Type: furniture (abcde), desk (ab), chair (cde)
- To refer to a when C = {a,b,c,d,e,f}: desk is better than furniture (desk removes c,d,e,f; furniture removes only f)
- To refer to a when C = {a,b,f}: furniture is better than desk (each removes only f: a tie, so the more general value wins)
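A sketch of this value-selection rule in Python (the function name and the general-to-specific list encoding are mine); iterating from general to specific and requiring a strict improvement implements the tie-break:

```python
def best_value(values, target, C, kb):
    """Pick the value of one attribute that removes most distractors.
    values: one attribute's values, ordered general -> specific,
    e.g. ["furniture", "desk"]. C: the current distractor set."""
    best, best_removed = None, -1
    for v in values:                   # general first
        if target not in kb[v]:
            continue
        removed = len(C - kb[v])
        if removed > best_removed:     # strict '>': a tie keeps the more general value
            best, best_removed = v, removed
    return best

# f is in the domain but in no property's extension, as on the slide:
print(best_value(["furniture", "desk"], "a", set("bcdef"), KB))  # 'desk'
print(best_value(["furniture", "desk"], "a", set("bf"), KB))     # 'furniture'
```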
50. Improving the Incremental Algorithm
- 3. Negations. Try to describe object e. The IA fails:
  - furniture → abcde
  - chair → cde
  - italy → de
  - (dark) → de
  - (metal) → de
- It might have succeeded:
  - Italy → bde
  - not-150 → e
51. Adding negations where useful
- Type: furniture (abcde), desk (ab), chair (cde)
- Origin: Sweden (ac), Italy (bde)
- Colour: dark (ade), light (bc), grey (a); not-grey (bcde)
- Price: 100 (ac), 150 (bd), 250 (); not-150 (ace)
- Contains: wood (), metal (abcde), cotton (d); not-cotton (abce)
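In the Python encoding above, the extended KB might be built like this (naively negating every property; the slide only adds negations "where useful", so treat this as a simplification):

```python
def add_negations(kb, domain):
    """Extend the KB with the complement of every property,
    e.g. not-grey = domain minus the extension of grey."""
    out = dict(kb)
    for prop, ext in kb.items():
        out["not-" + prop] = domain - ext
    return out

KB_NEG = add_negations(KB, DOMAIN)
# Now e is describable, as in the success trace above:
print(incremental_algorithm("e", DOMAIN, ["italy", "not-150"], KB_NEG))
# ['italy', 'not-150']
```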
54. Improving the Incremental Algorithm
- 4. Referring to sets. Try to describe the set {d,e}. No problem!
- The algorithm was designed for reference to individuals, but it can be reused for sets:
  - chair → {c,d,e}
  - Italy → {d,e}
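The required change to the sketch above is small (my generalisation of the slide's point): a property is now useful if it is true of every target and still rules out a distractor.

```python
def incremental_algorithm_sets(targets, domain, P, kb):
    """Incremental Algorithm reused for sets: include a property if it
    covers all targets and removes at least one distractor."""
    targets = set(targets)
    L, C = [], domain - targets
    for p in P:
        ext = kb[p]
        if targets <= ext and (C - ext):
            L.append(p)
            C = C & ext
            if not C:
                return L
    return None

print(incremental_algorithm_sets({"d", "e"}, DOMAIN, P, KB))
# ['chair', 'italy'] with the assumed preference order
```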
56. Improving the Incremental Algorithm
- 5. Other logical operations. Try to describe the set {b,c,d}.
- It's not possible (even with negation):
  - furniture → {a,b,c,d,e}
  - not-grey → {b,c,d,e}
- Yet, a description might have been possible:
  - "light or 150" → {b,c} ∪ {b,d} = {b,c,d}
57. Some other limitations
- How might one generate "the small mouse", "the large mouse" (where large and small are context dependent)?
- How could you generate non-literal descriptions, e.g., "she's married to the yellow spectacles"?
- Recursive descriptions: "the mother of the father of the brother of ... of my neighbour"
- And so on ... We'll focus on one more issue: context
58. Sometimes no description is needed
- "The ..... leaves at 10. It departs from ..." (earlier example)
- Let every xi be a referring expression:
  ....x1....x2.....x3.......
  ..........x2....x2....x1..
  ....x1....x4.....x5.......
- Full definite descriptions are one option among many
59. Some different ways of referring
- the train from Aberdeen to Glasgow (description)
- the Caledonian Express (proper name)
- this train (demonstrative)
- that train over there (demonstrative)
- it (pronoun)
60. Category choice
- Choosing between pronouns, demonstratives, descriptions, proper names, etc.
- Theories about category choice are often studied using corpora, via hypothesis testing or learning.
- Salience is a key concept, which takes a different form in different theories.
61. Salience is difficult to define
- Talking about John may make him salient:
  - John ... Henry ... Henry ... John ... John ... John ... John
- John's physical proximity may make him salient
- Salient / important / striking / attention-catching / ...: measured in different ways
62. Henschel, Cheng & Poesio (2000) (just to give you the flavour)
- Choose a pronoun if:
  - the antecedent is realized as subject, or is discourse-old
  - no competing referent is realized as subject or discourse-old
  - no competing referent is amplified by an appositive or a restrictive relative clause
- Otherwise choose a definite description
63.
- In this lecture, we ignore category choice, focussing on the generation of definite descriptions.
- So far, we have also ignored salience, even though it is also relevant for definite descriptions.
64. Salience in GRE
- Book: Reiter and Dale (2000), Building Natural Language Generation Systems: restrict attention to domain elements that are "salient enough"
- Krahmer and Theune (2002):
  - This disregards different degrees of salience within the domain
  - It fails to reflect that even the least salient object can be referable
65. Degrees of salience
- If our chihuahua is the most salient object in the domain, then "the dog" refers to it.
- If our chihuahua is the least salient object in the domain, then we might refer to him saying "the small ratty creature that's trying to hide behind the chair".
66. Krahmer and Theune (2002)
- "the dog" = the most salient dog
- This assumption is exploited by re-interpreting the Domain (in the Incremental Algorithm) as the set of objects at least as salient as the target.
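Slide 67 gave the formal version, which did not survive in this transcript; the exercise answers on slide 69 pin it down, though. A sketch on top of the IA above (the numeric salience encoding is my assumption):

```python
def ia_salient(r, domain, salience, P, kb):
    """Krahmer & Theune-style GRE: run the Incremental Algorithm on
    the objects that are at least as salient as the target."""
    restricted = {x for x in domain if salience[x] >= salience[r]}
    return incremental_algorithm(r, restricted, P, kb)

# SalMax = {a,c}, SalMid = {b}, SalMin = {d,e}, encoded numerically:
SAL = {"a": 3, "c": 3, "b": 2, "d": 1, "e": 1}
print(ia_salient("a", DOMAIN, SAL, P, KB))  # Domain {a,c}: ['desk']
print(ia_salient("d", DOMAIN, SAL, P, KB))  # full domain: ['chair', 'italy', '150']
```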
68. SalMax = {a,c}, SalMid = {b}, SalMin = {d,e}
- Type: furniture (abcde), desk (ab), chair (cde)
- Origin: Sweden (ac), Italy (bde)
- Colour: dark (ade), light (bc), brown (a)
- Price: 100 (ac), 150 (bd), 250 ()
- Contains: wood (), metal (abcde), cotton (d)
- Exercise: Describe a. Describe b. Describe d.
69. SalMax = {a,c}, SalMid = {b}, SalMin = {d,e} (same KB as above)
- a: Domain = {a,c}; description: desk
- b: Domain = {a,b,c}; description: desk, Italy
- d: Domain = {a,b,c,d,e}; description: chair, Italy, 150
70.
- Krahmer & Theune are agnostic about how salience is determined
- They focus on textual salience (the xi schematic above)
- Salience is also influenced by physical proximity (e.g., "the door" = the nearest door)
71. Before abandoning GRE ...
- Let's return to the pipeline architecture
72. Modular structure of NLG systems
- TEXT PLANNER:
  - Content determination
  - Discourse planning
- SENTENCE PLANNER / MICROPLANNER:
  - Sentence aggregation
  - Lexicalisation
  - Referring expressions
- REALISER:
  - Realisation
73.
- We have talked about microplanning, focussing on GRE
- We've focussed even further, on determining the properties in a description
- Who decides:
  - What words to use?
  - How to put them together? ("The chair that's from Italy and that contains wood")
74.
- One can see GRE as a microcosm of NLG, containing:
  - content determination (which properties?)
  - aggregation (which grouping?)
  - lexical choice (which words?)
  - realisation (which linguistic construction?)
75. Next week