Title: An Overview from an applied system-building perspective
1An Overview from an applied system-building
perspective
- Natural Language Generation
Based on "Building Applied Natural Language Generation Systems" by Ehud Reiter and Robert Dale, 1997
2Introduction
- What is Natural Language Generation (NLG) ?
- This presentation: an applied system-building perspective
3Overview
- Applications of NLG
- When to use NLG
- Requirements Analysis and System Specification
- Architecture and Components
4Applications of NLG
- Automatically present information to humans
- Textual weather forecasts
- Summarize statistical data from database
- Explain medical information in a patient-friendly way
- Authoring aid
- Customer service (letters)
- Help technical authors with software instructions
5When to use NLG?
- Text versus Graphics
- NLG versus Mail-merge
- NLG versus Human authoring
6Requirements Analysis and System Specification
- Corpus-based
- Initial Corpus of Output Texts
- Target Text Corpus
- Classify each sentence of the corpus text
- Unchanging text
- Directly available data
- Computable data
- Unavailable data
7Example
There are 20 trains each day from Aberdeen to
Glasgow. The next train is the Caledonian
Express; it leaves Aberdeen at 10am. It is due to
arrive in Glasgow at 1pm, but arrival may be
slightly delayed because of snow on the track
near Stirling. Thank you for considering rail
travel.
11Example
- Required Output Sample
- Unchanging text
- Directly available data
- Computable data
- Unavailable data
There are 20 trains each day from Aberdeen to
Glasgow. The next train is the Caledonian
Express; it leaves Aberdeen at 10am. It is due to
arrive in Glasgow at 1pm, but arrival may be
slightly delayed because of snow on the track
near Stirling. Thank you for considering rail
travel.
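The outcome of this classification can be recorded as a small table that maps each corpus fragment to one of the four categories. The sketch below is illustrative only: the `Category` enum, the fragment boundaries, and the label given to each fragment are assumptions made for this example, since in practice the labels depend on what the system's input data actually contains.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    UNCHANGING = "unchanging text"
    DIRECT = "directly available data"
    COMPUTABLE = "computable data"
    UNAVAILABLE = "unavailable data"


# Fragment boundaries and labels are illustrative assumptions.
CLASSIFIED_CORPUS = [
    ("There are 20 trains each day from Aberdeen to Glasgow.", Category.COMPUTABLE),
    ("The next train is the Caledonian Express; it leaves Aberdeen at 10am.", Category.DIRECT),
    ("It is due to arrive in Glasgow at 1pm,", Category.DIRECT),
    ("but arrival may be slightly delayed because of snow on the track near Stirling.",
     Category.UNAVAILABLE),
    ("Thank you for considering rail travel.", Category.UNCHANGING),
]


def generatable(corpus):
    """Keep the fragments the system could produce from its input data."""
    return [text for text, cat in corpus if cat is not Category.UNAVAILABLE]


def category_counts(corpus):
    """Count how many fragments fall into each category."""
    return Counter(cat for _, cat in corpus)
```

Fragments labelled as unavailable data are the ones to renegotiate with the client: either the input data is extended, or the fragment is dropped when moving from the initial corpus to the target text corpus.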
12Architecture and Components
- Six basic tasks
- Three-stage pipeline
13NLG Tasks (1)
- Content Determination
- What should be communicated?
- Filter and summarize input data
- Entities
- e.g. specific trains, places and times
- Concepts
- e.g. the property of being the next train
- Relations
- e.g. the departure relation between a train and a time
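As a rough sketch of content determination, the code below filters a timetable down to a handful of messages built from entities (trains, times), a concept (being the next train), and relations (departure, arrival). The `Train` and `Message` classes, the relation names, and the `determine_content` function are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Train:
    """An entity in the domain."""
    name: str
    origin: str
    destination: str
    departs: str  # "HH:MM", 24-hour clock, zero-padded
    arrives: str


@dataclass(frozen=True)
class Message:
    """A relation (or concept) applied to some arguments."""
    relation: str
    args: tuple


def determine_content(timetable, now):
    """Filter and summarize the timetable into a small set of messages."""
    # Zero-padded "HH:MM" strings compare correctly as plain strings.
    later = sorted((t for t in timetable if t.departs >= now),
                   key=lambda t: t.departs)
    messages = [Message("NUMBER_OF_TRAINS", (len(timetable),))]
    if later:
        nxt = later[0]
        messages.append(Message("IDENTITY", ("next-train", nxt.name)))
        messages.append(Message("DEPARTURE", (nxt.name, nxt.departs)))
        messages.append(Message("ARRIVAL", (nxt.name, nxt.arrives)))
    return messages
```

Summarizing shows up in the first message (a count computed from the data), filtering in the choice of a single next train out of the whole timetable.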
14NLG Tasks (2)
- Content Determination
- Discourse planning
- Order and structure the set of messages that are
to be communicated
15NLG Tasks (3)
- Content Determination
- Discourse planning
- Sentence aggregation
- e.g. The next train, which leaves at 10am, is
the Caledonian Express.
16NLG Tasks (4)
- Content Determination
- Discourse planning
- Sentence aggregation
- Lexicalization
- Which specific words and phrases should be chosen?
- e.g. leave or depart
- Especially important with multiple languages
17NLG Tasks (5)
- Content Determination
- Discourse planning
- Sentence aggregation
- Lexicalization
- Referring expression generation
- e.g. The next train is the Caledonian Express; it leaves Aberdeen at 10am. ("it" refers back to the train)
18NLG Tasks (6)
- Content Determination
- Discourse planning
- Sentence aggregation
- Lexicalization
- Referring expression generation
- Linguistic realization
- Apply the rules of grammar to produce the final text
- e.g. There are 20 trains each day from Aberdeen to Glasgow.
19NLG Architecture
- Three-stage pipeline
- Text Planner
- Content Determination
- Discourse planning
- Sentence Planner
- Sentence aggregation
- Lexicalization
- Referring expression generation
- Linguistic Realizer
- Linguistic realization
[Diagram: Goal -> Text Planner -> Text Plan -> Sentence Planner -> Sentence Plans -> Linguistic Realizer -> Surface Text]
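The three-stage pipeline can be pictured as three functions composed in sequence, each consuming the previous stage's output. The sketch below stubs every stage heavily; the function names, the dictionary-based plan formats, and the single hard-coded message are assumptions for illustration, not a prescribed interface.

```python
from typing import List

TextPlan = List[dict]   # ordered messages
SentencePlan = dict     # one abstract sentence specification


def text_planner(goal: dict) -> TextPlan:
    # Content determination + discourse planning (stub: one message).
    return [{"relation": "DEPARTURE", "train": goal["train"], "time": goal["time"]}]


def sentence_planner(plan: TextPlan) -> List[SentencePlan]:
    # Aggregation, lexicalization, referring expressions
    # (stub: one sentence per message, fixed verb "leave").
    return [{"subject": "the " + m["train"],
             "verb": "leave",
             "complement": "at " + m["time"]} for m in plan]


def linguistic_realizer(sentences: List[SentencePlan]) -> str:
    # Grammar rules (stub: third-person singular "-s", capital, full stop).
    out = []
    for s in sentences:
        text = f'{s["subject"]} {s["verb"]}s {s["complement"]}.'
        out.append(text[0].upper() + text[1:])
    return " ".join(out)


def generate(goal: dict) -> str:
    """Goal -> text plan -> sentence plans -> surface text."""
    return linguistic_realizer(sentence_planner(text_planner(goal)))


print(generate({"train": "Caledonian Express", "time": "10am"}))
# prints: The Caledonian Express leaves at 10am.
```

Because each stage only depends on the data structure handed to it, a stage can be replaced (for example, a realizer for another language) without touching the others.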
21Text Planning: Content Determination
- For most systems, content determination is based on content-specific rules
- Less flexible than deep reasoning
- Easy to accommodate bureaucratic/legal concerns
- Results are more similar to existing (human-written) texts
- To acquire content rules
- Separate all phrases that contain information
- Classify them (unchanging, computable, etc.)
- Group similar phrases
- Try to find the conditions under which different messages appear
- Discuss results with domain expert
- Repeat with larger set of texts
22Text Planning: Discourse Planning
- Assigns discourse relations that link sentences
- e.g. Elaboration
- I like fruits.
- My favourite snack is an apple.
- Contrast
- I like fruits.
- However, my favourite snack is a candy bar.
- Schema-based approaches
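A minimal sketch of both ideas on this slide: a discourse relation linking two sentences, and a schema that fixes the order of message types. The `Relation` class, the cue-phrase table, and the `TRAIN_INQUIRY_SCHEMA` list are hypothetical; real schemas are usually richer than a flat ordering.

```python
from dataclasses import dataclass

# Assumed cue phrases; elaboration is left unmarked.
CUE_PHRASES = {"Elaboration": "", "Contrast": "However, "}

# Hypothetical schema: a fixed order of message types for a train inquiry.
TRAIN_INQUIRY_SCHEMA = ["NUMBER_OF_TRAINS", "IDENTITY", "DEPARTURE", "ARRIVAL"]


@dataclass
class Relation:
    name: str       # discourse relation, e.g. "Elaboration" or "Contrast"
    nucleus: str    # main sentence
    satellite: str  # sentence related to the nucleus


def linearize(rel: Relation) -> str:
    """Render the nucleus, then the satellite with its cue phrase (if any)."""
    cue = CUE_PHRASES[rel.name]
    satellite = rel.satellite
    if cue:
        # Naive decapitalization; wrong for sentences starting with "I" or a name.
        satellite = cue + satellite[0].lower() + satellite[1:]
    return f"{rel.nucleus} {satellite}"


def apply_schema(messages, schema=TRAIN_INQUIRY_SCHEMA):
    """Order messages (dicts with a "type" key) as the schema prescribes."""
    by_type = {m["type"]: m for m in messages}
    return [by_type[t] for t in schema if t in by_type]


print(linearize(Relation("Contrast", "I like fruits.",
                         "My favourite snack is a candy bar.")))
# prints: I like fruits. However, my favourite snack is a candy bar.
```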
24Sentence Planning: Sentence Aggregation
- Several kinds of sentence-formation aggregation, including
- Simple conjunction
- and
- Ellipsis: merge sentences with a common constituent
- "John saw." and "John won." -> "John saw and won."
- Set formation: group sentences that are identical except for a single constituent
- "John bought an apple." and "John bought a banana." -> "John bought an apple and a banana."
- Embedding
- "John is ill." and "John smiles." -> "John, who is ill, smiles."
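The aggregation operations above can be sketched over a very small clause representation. The `Clause` tuple and the function names are assumptions for this example, and the rules only cover the simple cases shown on the slide.

```python
from typing import NamedTuple, Optional


class Clause(NamedTuple):
    subject: str
    verb: str
    obj: Optional[str] = None  # object or complement, if any


def predicate(c: Clause) -> str:
    return c.verb if c.obj is None else f"{c.verb} {c.obj}"


def simple_conjunction(a: Clause, b: Clause) -> str:
    """Join two full clauses with "and"."""
    return f"{a.subject} {predicate(a)} and {b.subject} {predicate(b)}."


def ellipsis(a: Clause, b: Clause) -> str:
    """Drop the repeated subject of the second clause."""
    assert a.subject == b.subject, "ellipsis needs a common constituent"
    return f"{a.subject} {predicate(a)} and {predicate(b)}."


def set_formation(a: Clause, b: Clause) -> str:
    """Merge clauses that differ only in their object."""
    assert (a.subject, a.verb) == (b.subject, b.verb)
    return f"{a.subject} {a.verb} {a.obj} and {b.obj}."


def embedding(a: Clause, b: Clause) -> str:
    """Turn the first clause into a relative clause inside the second."""
    assert a.subject == b.subject
    return f"{a.subject}, who {predicate(a)}, {predicate(b)}."


print(ellipsis(Clause("John", "saw"), Clause("John", "won")))
# prints: John saw and won.
print(embedding(Clause("John", "is", "ill"), Clause("John", "smiles")))
# prints: John, who is ill, smiles.
```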
25Sentence Planning: Lexicalization
- Most systems use decision trees, which can be used to
- Select different synonyms to add variety
- e.g. "leave" and "depart"
- Select a different word for a different context
- e.g. "but" if a contrast appears within one sentence, "however" if it relates multiple sentences
- Select words based on stylistic parameters
- e.g. "father" is formal, "dad" is informal
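A hand-written decision tree for the three cases above might look like the following sketch. The concept labels (`"CONTRAST"`, `"MALE_PARENT"`, `"DEPARTURE"`) and the parameter names are hypothetical, chosen only to mirror the slide's examples.

```python
def lexicalize(concept, *, same_sentence=True, formal=True, last_used=None):
    """Pick a word for a concept by walking a small decision tree."""
    if concept == "CONTRAST":
        # Context: inside one sentence vs. between sentences.
        return "but" if same_sentence else "however"
    if concept == "MALE_PARENT":
        # Stylistic parameter: formality.
        return "father" if formal else "dad"
    if concept == "DEPARTURE":
        # Variety: avoid repeating the verb used last time.
        return "depart" if last_used == "leave" else "leave"
    raise KeyError(f"no lexical rule for concept {concept!r}")


print(lexicalize("CONTRAST", same_sentence=False))  # prints: however
print(lexicalize("DEPARTURE", last_used="leave"))   # prints: depart
```

In a multilingual system, one such tree per language could map the same concept labels to language-specific words.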
26Sentence Planning: Referring Expression Generation
- Initial introduction
- Give the name of the object (if it has one)
- Describe the physical location of the object
- Pronouns
- Use pronouns when the entity was mentioned in the previous clause and no ambiguity is present
- Definite descriptions
- e.g. use "the train" instead of "the 10:15am train from Aberdeen to Edinburgh"; add information if ambiguous
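The three strategies can be combined into one selection function, sketched below. The `Entity` class, the use of "it" as the only pronoun, and the ambiguity test (another entity in the previous clause, or another mentioned entity of the same kind) are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    kind: str                    # e.g. "train"
    name: Optional[str]          # proper name, if the entity has one
    full_description: str        # description that identifies it uniquely


def refer(entity, mentioned, previous_clause):
    """Choose a referring expression.

    mentioned: entities introduced so far in the text.
    previous_clause: entities mentioned in the immediately preceding clause.
    """
    if entity not in mentioned:
        # Initial introduction: prefer the name, else a full description.
        return entity.name or entity.full_description
    if entity in previous_clause and len(previous_clause) == 1:
        # Pronoun: just mentioned, and nothing else it could refer to.
        return "it"
    same_kind = [e for e in mentioned if e.kind == entity.kind]
    if len(same_kind) == 1:
        # Short definite description is unambiguous.
        return f"the {entity.kind}"
    # Ambiguous: add information.
    return entity.full_description
```

For example, with a single train already introduced but not in the previous clause, `refer` returns "the train"; once a second train has been mentioned, it falls back to the full description.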
28Linguistic Realization
- Apply rules from the grammar of the natural language, e.g.
- Rules about verb group formation
- Rules about agreement
- Rules about syntactically required pronominalization
- "John saw John in the mirror" should be "John saw himself in the mirror"
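Two of these rule types can be illustrated with a toy realizer: number agreement in an existential sentence, and reflexive pronominalization when subject and object are the same entity. The `REFLEXIVES` lexicon, the naive "-s" plural, and the function names are assumptions for this sketch; a real realizer uses a full grammar and lexicon.

```python
# Toy lexicon of reflexive pronouns (assumed for this example).
REFLEXIVES = {"John": "himself", "Mary": "herself"}


def existential(count: int, noun: str) -> str:
    """Agreement: the verb and the noun both agree with the count."""
    verb = "is" if count == 1 else "are"
    form = noun if count == 1 else noun + "s"  # naive regular plural
    return f"There {verb} {count} {form}."


def transitive(subject: str, verb: str, obj: str, adjunct: str = "") -> str:
    """Pronominalization: an object identical to the subject becomes reflexive."""
    if obj == subject:
        obj = REFLEXIVES.get(subject, "itself")
    words = [subject, verb, obj] + ([adjunct] if adjunct else [])
    return " ".join(words) + "."


print(existential(20, "train"))
# prints: There are 20 trains.
print(transitive("John", "saw", "John", "in the mirror"))
# prints: John saw himself in the mirror.
```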
29Conclusion
30Questions?