Title: 11682: Introduction to IR, NLP, MT and Speech
1. 11-682 Introduction to IR, NLP, MT and Speech
Natural Language Generation Overview
2. Today's Topics
- An overview of practical issues in building natural language generation (NLG) systems
- Based on Reiter & Dale, 1997
- Goal: produce understandable texts in a human language from some underlying representation
3. NLG Ingredients
- A representation of the input (probably not human-friendly)
- Knowledge of the domain
- Knowledge of the target language
- A human-friendly output format
- documents, reports, explanations, help messages, technical instructions, etc.
4. Example NLG Applications
- Forecasts from weather maps
- Summarize results of DB queries
- Explain complex (e.g. medical) information
- Describe a chain of reasoning in an expert system
- Answering questions about an object in a
knowledge base
5. Authoring Aids
- Template-based generation of routine documents
- Examples
- discharge summaries, referral letters
- letters to customers
- management summaries
- job descriptions
- technical manuals
6. When Is NLG Appropriate?
- Are graphics more useful?
- Is human-quality output required?
- How much stylistic variation?
- Any legal liabilities / requirements?
- Constraints posed by the problem domain? (e.g.
bandwidth)
7. Templates (mail-merge)
- Insert input data into pre-defined slots in a template document
- More complex systems vary structure based on input
- More limited than NLG
- NLG can achieve higher quality
- NLG is easier to adapt to changes
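To make the mail-merge idea concrete, here is a minimal sketch using Python's standard `string.Template`. The letter wording, the slot names (`name`, `order_id`, `ship_date`) and the `fill` helper are all hypothetical; the second template illustrates a system that varies structure based on the input.

```python
from string import Template

# Hypothetical template documents with pre-defined slots.
SHIPPED = Template("Dear $name, your order $order_id will ship on $ship_date.")
PENDING = Template("Dear $name, your order $order_id is being processed.")


def fill(record: dict) -> str:
    """Pick a template based on the input, then substitute the slots."""
    template = SHIPPED if record.get("ship_date") else PENDING
    return template.substitute(record)


print(fill({"name": "Ms. Lee", "order_id": "A17", "ship_date": "3 May"}))
print(fill({"name": "Mr. Roy", "order_id": "B02"}))
```

Any wording the templates do not anticipate needs a new template, which is one way to read the "more limited than NLG" point.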
8. Human vs. Machine
- Is NLG a cost-effective solution?
- Economics of NLG development
- Systems are expensive
- A large volume of output is necessary to justify the expenditure
- The cost / quality threshold
- Can NLG provide the necessary quality at an acceptable price? (or at all?)
9. Requirements Analysis
- NLG is an evolving technology...
- ...so iterative prototyping is the most appropriate SE technique
- Corpus-Based Methods
- Identify target text sample
- Associate with internal representations (input to NLG)
- Specify required NLG algorithms and data
10. Gathering a Corpus
- Archived examples of human texts
- Cover a full range of texts
- If no corpus, ask experts to create one (associated costs & conflicts)
- Document Table
- rows: domain categories (e.g., product lines, business areas)
- columns: document types (installation, user, maintenance, etc.)
11. Example Document Table
12. Analyzing the Information Content
- Which parts convey information that isn't available to the NLG system? E.g., "When is the next train to Glasgow?" (requires external DB)
- Analysis: classifying sentences according to information required
- unchanging text, direct data, computed data, unavailable data
13. Sentence Types
- Unchanging Text: "Thank you for flying US Airways"
- Directly-Available Data: "Scheduled departure is 6:30pm"
- Computable Data: "There are 20 flights to Boston"
- Unavailable Data: "Due to ground delay in Pittsburgh"
These run from Easy (first) to Hard or Impossible (last)
(Rely on Humans for Unavailable Data)
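The four categories can be recorded as labels on corpus sentences, which makes it easy to estimate how much of a corpus a system could produce unaided. The sketch below is illustrative only: the `SentenceType` enum and the `needs_human` and `coverage` helpers are hypothetical names, and the labels are simply the ones given on the slide.

```python
from enum import Enum


class SentenceType(Enum):
    UNCHANGING_TEXT = "unchanging text"
    DIRECT_DATA = "directly-available data"
    COMPUTABLE_DATA = "computable data"
    UNAVAILABLE_DATA = "unavailable data"


# Hand-labelled corpus sentences (labels taken from the slide).
CORPUS = [
    ("Thank you for flying US Airways", SentenceType.UNCHANGING_TEXT),
    ("Scheduled departure is 6:30pm", SentenceType.DIRECT_DATA),
    ("There are 20 flights to Boston", SentenceType.COMPUTABLE_DATA),
    ("Due to ground delay in Pittsburgh", SentenceType.UNAVAILABLE_DATA),
]


def needs_human(kind: SentenceType) -> bool:
    """Only unavailable data has to be supplied by a person."""
    return kind is SentenceType.UNAVAILABLE_DATA


def coverage(corpus) -> float:
    """Fraction of sentences the system could generate without human input."""
    automatic = [kind for _, kind in corpus if not needs_human(kind)]
    return len(automatic) / len(corpus)


print(coverage(CORPUS))
```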
14. 6 Basic NLG Tasks
- 1. Content Determination: what information should be conveyed?
- 2. Discourse Planning: order & structure of message set
- 3. Sentence Aggregation: grouping messages into sentences
- 4. Lexicalization: words & phrases for concepts, relations
- 5. Referring Expression Generation: words & phrases for entities
- 6. Linguistic Realisation: syntax, morphology, orthography
15. Typical 3-Module Architecture
Q: How should these be represented?
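The slide's diagram did not survive, so the sketch below assumes the grouping usually attributed to Reiter & Dale (1997): a text planner (tasks 1-2), a sentence planner (tasks 3-5) and a linguistic realiser (task 6). The message and plan formats, the function names and the tiny verb table are all hypothetical, chosen only to show data flowing through three stages.

```python
# Toy verb table: lemma -> (singular form, plural form).
VERB_FORMS = {"be": ("is", "are"), "leave": ("leaves", "leave")}


def text_planner(data):
    """Content determination + discourse planning: choose and order messages."""
    messages = [{"type": "count", "n": len(data["departures"]), "dest": data["dest"]}]
    if data["departures"]:
        messages.append({"type": "earliest", "time": min(data["departures"])})
    return messages


def sentence_planner(messages):
    """Aggregation, lexicalization, referring expressions: one plan per message."""
    plans = []
    for m in messages:
        if m["type"] == "count":
            plural = m["n"] != 1
            noun = "trains" if plural else "train"
            plans.append({"subject": "there", "verb": "be", "plural": plural,
                          "rest": f"{m['n']} {noun} to {m['dest']}"})
        else:
            plans.append({"subject": "the first one", "verb": "leave",
                          "plural": False, "rest": f"at {m['time']}"})
    return plans


def realiser(plans):
    """Linguistic realisation: verb agreement, capitalisation, punctuation."""
    sentences = []
    for p in plans:
        verb = VERB_FORMS[p["verb"]][1 if p["plural"] else 0]
        s = f"{p['subject']} {verb} {p['rest']}"
        sentences.append(s[0].upper() + s[1:] + ".")
    return " ".join(sentences)


def generate(data):
    return realiser(sentence_planner(text_planner(data)))


print(generate({"dest": "Glasgow", "departures": ["10:00", "09:15"]}))
```

The intermediate values passed between the functions play the role of the text plans and sentence plans discussed next.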
16. Text Plans
- Common representation: tree
- Leaf nodes: messages
- Internal nodes: message groupings
- Simple text plans: templates OK
- Complex text plans require full representation language (e.g., TAMERLAN, DIOGENES)
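A text plan tree of this kind might be modelled as follows. This is only a sketch: the `Message` and `Group` classes, the `leaves` helper and the example contents are hypothetical, not the format of any particular planning language.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Message:
    """Leaf node: one message."""
    content: str


@dataclass
class Group:
    """Internal node: a labelled grouping of messages or sub-groups."""
    label: str
    children: List[Union["Group", Message]] = field(default_factory=list)


def leaves(node):
    """Return message contents in left-to-right (presentation) order."""
    if isinstance(node, Message):
        return [node.content]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out


plan = Group("timetable-answer", [
    Message("summary: 20 trains daily"),
    Group("details", [
        Message("next train: 10am"),
        Message("next train is the express"),
    ]),
])

print(leaves(plan))
```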
17. Sentence Plans
- Simple: templates (select & fill)
- Complex: abstract representation (SPL: Sentence Planning Language)
18. Example SPL Expression
(S1/exist
  :object (O1/train
    :cardinality 20
    :relations ((R1/period :value daily)
                (R2/source :value Aberdeen)
                (R3/destination :value Glasgow))))
"There will be 20 trains to Glasgow"
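One way to see what a realiser does with such an expression is to hold it as nested data and walk it. The sketch below is an assumption-laden toy: the dictionary layout and `realise_exist` are hypothetical, it handles only "exist" plans, and its wording differs from the slide's output because it spells out every relation and uses the present tense.

```python
# The SPL expression above, transcribed into nested dictionaries.
spl = {
    "id": "S1", "type": "exist",
    "object": {
        "id": "O1", "type": "train", "cardinality": 20,
        "relations": [
            {"id": "R1", "type": "period", "value": "daily"},
            {"id": "R2", "type": "source", "value": "Aberdeen"},
            {"id": "R3", "type": "destination", "value": "Glasgow"},
        ],
    },
}


def realise_exist(plan):
    """Toy realiser for 'exist' plans: agreement, word order, punctuation."""
    obj = plan["object"]
    n = obj["cardinality"]
    noun = obj["type"] + ("" if n == 1 else "s")
    verb = "is" if n == 1 else "are"
    rel = {r["type"]: r["value"] for r in obj["relations"]}
    parts = [f"There {verb} {n} {noun}"]
    if "period" in rel:
        parts.append(rel["period"])
    if "source" in rel:
        parts.append(f"from {rel['source']}")
    if "destination" in rel:
        parts.append(f"to {rel['destination']}")
    return " ".join(parts) + "."


print(realise_exist(spl))
```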
19. Content Determination
- Messages (raw content)
- User Model (influences content)
- Is Reasoning Required? "Find a train from Aberdeen to Leeds" (It requires two trains to get there)
- Deep Reasoning Systems
- represent the user's goals as well as any immediate query
- utilize plan recognition & reasoning
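The Aberdeen to Leeds query shows why content determination can need reasoning: no single timetable row answers it. A minimal sketch of that reasoning is a breadth-first search over direct services. The `DIRECT` table is invented for illustration (the slide says only that two trains are needed, not where the change happens), and `find_route` and `route_message` are hypothetical names.

```python
from collections import deque

# Hypothetical direct services; the change at Edinburgh is an assumption.
DIRECT = {
    "Aberdeen": ["Edinburgh", "Glasgow"],
    "Edinburgh": ["Leeds"],
    "Glasgow": [],
    "Leeds": [],
}


def find_route(origin, dest):
    """Breadth-first search: the station list with the fewest trains, or None."""
    queue = deque([[origin]])
    seen = {origin}
    while queue:
        path = queue.popleft()
        if path[-1] == dest:
            return path
        for nxt in DIRECT.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def route_message(origin, dest):
    """Turn the search result into a message for later planning stages."""
    path = find_route(origin, dest)
    if path is None:
        return {"type": "no-route", "origin": origin, "dest": dest}
    return {"type": "route", "legs": list(zip(path, path[1:]))}


print(route_message("Aberdeen", "Leeds"))
```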
20. Discourse Planning
- Structure messages into a coherent text
- Example: start with a summary, then give details
- Discourse relations, e.g.
- elaboration: "More specifically, X"
- exemplification: "For example, X"
- contrast / exception: "However, X"
- Rhetorical Structure Theory (RST)
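As a rough illustration of the "summary first, then details" pattern, the sketch below attaches each detail to the summary with the cue phrase for its discourse relation. It is not an implementation of RST; it only borrows the nucleus/satellite vocabulary. The `CUE` table, `plan_discourse` and the example sentences (including the Sunday exception) are made up for the example.

```python
# Cue phrases from the slide, keyed by discourse relation.
CUE = {
    "elaboration": "More specifically,",
    "exemplification": "For example,",
    "contrast": "However,",
}


def plan_discourse(nucleus, satellites):
    """nucleus: the summary sentence; satellites: (relation, clause) pairs."""
    text = [nucleus]
    for relation, clause in satellites:
        text.append(f"{CUE[relation]} {clause}")
    return " ".join(text)


print(plan_discourse(
    "There are 20 trains to Glasgow each day.",
    [("elaboration", "the next train leaves at 10am."),
     ("contrast", "there is no service on Sunday.")],
))
```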
21. Sentence Aggregation
- No aggregation (1 sentence / message)
- Relative Clause: "...which leaves at 10am"
- Conjunction: "...and the next train is the express"
- Combinations: "...and the next train is the express which leaves at 10am"
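The aggregation options can be sketched as small string operations over message clauses. The helper names are hypothetical, and the opening clause ("there are 20 trains to Glasgow") is an assumed stand-in for the part the slide elides with "...".

```python
def sentence(clause):
    """Capitalise a clause and add a full stop."""
    return clause[0].upper() + clause[1:] + "."


def no_aggregation(clauses):
    """One sentence per message."""
    return " ".join(sentence(c) for c in clauses)


def conjoin(first, second):
    """Aggregate two clauses with a conjunction."""
    return f"{first} and {second}"


def add_relative(clause, predicate):
    """Attach a message as a relative clause on the preceding noun phrase."""
    return f"{clause} which {predicate}"


main = "there are 20 trains to Glasgow"  # assumed opening clause
nxt = "the next train is the express"

print(no_aggregation([main, nxt, "it leaves at 10am"]))
print(sentence(add_relative(conjoin(main, nxt), "leaves at 10am")))
```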
22. Lexicalization
- Choosing words to realize concepts or relations
- Example:
(action/change
  (measure outside_temperature)
  (delta (quantity/deg_F -10)))
"The temperature dropped 10 degrees"
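A lexicalization rule for this example could look like the sketch below, which picks the verb from the sign of the change. Only "dropped" comes from the slide; "rose", the zero case, the noun table and the function name are assumptions added for illustration.

```python
# Hypothetical concept-to-noun table.
NOUN = {"outside_temperature": "temperature"}


def lexicalize_change(measure, delta_deg_f):
    """Choose words for an action/change concept with a degree-Fahrenheit delta."""
    noun = NOUN[measure]
    if delta_deg_f == 0:
        return f"The {noun} did not change"
    verb = "dropped" if delta_deg_f < 0 else "rose"  # "rose" is an assumption
    n = abs(delta_deg_f)
    unit = "degree" if n == 1 else "degrees"
    return f"The {noun} {verb} {n} {unit}"


print(lexicalize_change("outside_temperature", -10))
```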
23. Lexical Selection Rules
24. Case Creation
- Additional structure is required to realize the meaning of the semantic representation
(A-KICK (AGENT O-JOHN) (PATIENT O-BALL))
"John propelled the ball with his foot"
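Both case creation (above) and case absorption (the following slide) can be viewed as properties of a lexical rule: the rule's pattern may add material that has no role in the input, or it may leave an input role unexpressed because the chosen verb already implies it. The sketch below is a hypothetical rule table, not the format of any real lexicon; the entity wordings and the hard-coded "his" are assumptions.

```python
# Hypothetical entity lexicon.
ENTITY = {
    "O-JOHN": "John", "O-BALL": "the ball",
    "O-BOB": "Bob", "O-SUIT": "a suit", "O-ACME": "Acme",
}

# head -> (surface pattern, roles absorbed by the chosen verb)
RULES = {
    # Case creation: "with his foot" has no counterpart in the input frame.
    "A-KICK": ("{AGENT} propelled {PATIENT} with his foot", []),
    # Case absorption: "sued" already implies the PATIENT (the suit).
    "A-FILE-LEGAL-ACTION": ("{AGENT} sued {RECIPIENT}", ["PATIENT"]),
}


def realise(head, roles):
    """Fill the pattern with the words for every role that is not absorbed."""
    pattern, absorbed = RULES[head]
    words = {role: ENTITY[concept] for role, concept in roles.items()
             if role not in absorbed}
    return pattern.format(**words)


print(realise("A-KICK", {"AGENT": "O-JOHN", "PATIENT": "O-BALL"}))
print(realise("A-FILE-LEGAL-ACTION",
              {"AGENT": "O-BOB", "PATIENT": "O-SUIT", "RECIPIENT": "O-ACME"}))
```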
25. Case Absorption
- Word chosen to realize a semantic head also implies the meaning conveyed by a semantic role
(A-FILE-LEGAL-ACTION (AGENT O-BOB) (PATIENT O-SUIT) (RECIPIENT O-ACME))
"Bob sued Acme"
26. Referring Expression Generation
- Initial introduction: "A man in the park looked up"
- Pronouns: "He saw a bird fly over"
- Definite Descriptions: "The man covered his head with a newspaper"
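The three cases can be approximated with a very small discourse model: use an indefinite description on first mention, a pronoun when the entity was the one mentioned most recently, and a definite description otherwise. The `Referrer` class is a hypothetical sketch of that heuristic, not a full referring-expression algorithm.

```python
class Referrer:
    """Tracks which entities have been mentioned and which was mentioned last."""

    def __init__(self):
        self.seen = set()
        self.last = None

    def refer(self, noun, pronoun):
        if noun not in self.seen:
            article = "an" if noun[0] in "aeiou" else "a"
            form = f"{article} {noun}"   # initial introduction
        elif self.last == noun:
            form = pronoun               # still in focus: pronoun
        else:
            form = f"the {noun}"         # re-introduce with a definite description
        self.seen.add(noun)
        self.last = noun
        return form


r = Referrer()
mentions = [r.refer("man", "he"), r.refer("man", "he"),
            r.refer("bird", "it"), r.refer("man", "he")]
print(mentions)
```

The sequence mirrors the slide's example: "a man", then "he", then (after the bird intervenes) "the man".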
27. Fixing Robot Text
- "Start the engine_i and run the engine_i until the engine_i reaches normal operating temperature"
- "Start Ø_i and run the engine_i until it_i reaches normal operating temperature"
- Second example introduces ellipsis and anaphora
28. Journalistic Style
A dissident Spanish priest was charged here today with attempting to murder the Pope. Juan Fernandez Krohn, aged 32, was arrested after a man armed with a bayonet approached the Pope while he was saying prayers at Fatima on Wednesday night. According to the police, Fernandez told the investigating magistrates today, he trained for the past six months for the assault. If found guilty, the Spaniard faces a prison sentence of 15-20 years. (Brown and Yule, 1983)
29. Summary
- 6 Basic Steps in NLG
- Architectures group those steps into different modules
- Input / output / approach depend on the domain
- Design of internal data structures depends on complexity of task