1
Natural Language Generation
  • Alice Oh
  • aliceo@cs.cmu.edu
  • 18 November 2014

2
What is NLG?
  • Natural Language Understanding (NLU)
  • Natural Language Generation (NLG)
  • There has been active NLG research in the
    machine translation and automatic summarization
    (e.g., stock quotes, medical information)
    communities.

(Diagram: NLU maps Text → Semantic/Syntactic
Representation; NLG maps Semantic/Syntactic
Representation → Text)
3
Application of NLG in MT
  • Generation of output
  • Generation of paraphrase
  • Generation of interlingua (SQL queries,
    summarization tables, etc.)
  • Generation of controlled language
  • The whole MT problem can be viewed as an NLG
    problem!

(Diagram: Source Language → NLG → Target Language)
4
Why spend time/effort on NLG?
  • What is the problem here? There are many more
    researchers working on NLU than on NLG (esp. in
    the U.S.)
  • Why? Because of the "squeezing out toothpaste"
    analogy (the assumption that generation is the
    easy, mechanical last step) -- which is not true!
  • Why is NLG important? It's the only thing users
    see/hear (do not ruin your otherwise great
    system)
  • What makes NLG difficult? NLG needs to KNOW vs.
    UNDERSTAND
  • "Existing comprehension systems as a rule extract
    considerably less information from a text than a
    generator must appreciate in generating one.
    Examples include the reasons why a given word or
    syntactic construction is used rather than an
    alternative, what constitutes the style and
    rhetoric appropriate to a given genre and
    situation, or why information is clustered in one
    pattern of sentences rather than another."
    (McDonald, 1993)

5
Levels of NLG
6
Surface Realization
  • "Determining how the underlying content of a text
    should be mapped into a sequence of grammatically
    correct sentences. An NLG system has to decide
    which syntactic form to use, and it has to ensure
    that the resulting text is syntactically and
    morphologically correct."
  • (Mellish and Dale, 1998)

7
Why Surface Realization?
  • Relative agreement on
    • input
    • output
    • goals
  • More universally needed in MT than
    • content planning
    • text planning
  • Somewhat easier to compare different techniques

8
Surface Realization: Different Techniques
  • Rule-based
  • Templates
  • Corpus-based
  • Rules + Corpus
  • Rules + Templates

9
Surface Realization: Rule-Based
  • Generates text using generation grammar rules
    (similar to rule-based understanding techniques)
  • Most popular in the research community
  • Long development time
  • Often based on a specific linguistic theory
    (Systemic Functional Grammar seems most popular)
  • Portable across domains (when grammar coverage is
    good)
  • Systems: FUF/SURGE, Penman, KPML

10
Surface Realization: Rule-Based
  • Input: extensive semantic/syntactic features
    (e.g., from interlingua)
  • Output: high-quality sentences
  • Knowledge Sources: generation grammar,
    (domain-specific) lexicon
  • Knowledge Acquisition: hand-crafted
  • Degradation from underspecified input: default
    handling
  • Degradation from lacking knowledge: lower-quality
    output

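To make the rule-based approach concrete, here is a
minimal Python sketch in which hand-crafted
generation rules map a semantic frame to a sentence.
The frame layout, lexicon, and realize() function are
illustrative assumptions, not the actual formalisms
of FUF/SURGE, Penman, or KPML.

# A toy generation grammar: rules keyed on the frame's semantic type.
LEXICON = {
    "flight": "flight",
    "depart": "departs",          # third-person singular, present tense
    "PIT": "Pittsburgh",
}

def realize(frame):
    """Apply a generation rule keyed on the frame's semantic type."""
    if frame["type"] == "departure":
        np = f"the {LEXICON['flight']}"          # NP rule: default definite article
        vp = LEXICON["depart"]                   # V rule: subject-verb agreement
        pp = f"from {LEXICON[frame['origin']]}"  # PP rule: origin city
        return f"{np.capitalize()} {vp} {pp} at {frame['time']}."
    raise ValueError(f"no grammar rule covers type {frame['type']!r}")

print(realize({"type": "departure", "origin": "PIT", "time": "10:30 a.m."}))
# -> The flight departs from Pittsburgh at 10:30 a.m.
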
11
Surface Realization: Templates
  • Generates text using canned expressions and
    hand-crafted templates
  • Popular in commercial applications where similar
    documents are produced in large quantities (e.g.,
    customer service letter writing)
  • Also popular in systems where generated output
    spans a narrow range (e.g., spoken dialog
    systems)
  • Systems: CLINT (business-letter writing)

12
Surface Realization: Templates
  • Input: minimally specified
  • Output: limited set of sentences
  • Knowledge Sources: templates
  • Knowledge Acquisition: hand-crafted by looking at
    domain-specific corpora
  • Degradation from underspecified input: N/A
  • Degradation from lacking knowledge: no output

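A minimal Python sketch of the template technique:
canned strings with slots, filled from minimally
specified input. The dialog acts, template texts, and
fill() helper are hypothetical examples, not taken
from CLINT.

TEMPLATES = {
    "greet": "Hello, welcome to the flight information service.",
    "confirm_flight": "You are booked on flight {flight} to {dest} on {date}.",
}

def fill(act, **slots):
    """Fill the template for a dialog act. An unknown act or a missing
    slot yields no output ('degradation from lacking knowledge')."""
    try:
        return TEMPLATES[act].format(**slots)
    except KeyError:
        return None

print(fill("greet"))
print(fill("confirm_flight", flight="UA 123", dest="Boston", date="May 4"))
print(fill("book_hotel"))  # no template for this act -> None
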
13
Surface Realization: Corpus-Based
  • Developed for a task-oriented spoken dialog
    system
  • Implemented in the CMU Communicator System
  • Fast prototyping for different domains
  • Natural output for spoken dialog
  • A first attempt at truly corpus-based
    stochastic generation
  • Systems: CMU, IBM

14
Surface Realization: Corpus-Based
  • Input: dialog act
  • Output: medium-quality sentences
  • Knowledge Sources: language models
  • Knowledge Acquisition: domain-specific corpora
  • Degradation from underspecified input: N/A
  • Degradation from lacking knowledge: no output

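A minimal Python sketch of corpus-based stochastic
generation in the spirit of the CMU Communicator
generator: estimate an n-gram language model from a
corpus of domain utterances for one dialog act, then
sample word by word. The three-utterance toy corpus
and the bigram order are illustrative assumptions;
the real system is more elaborate.

import random
from collections import defaultdict

corpus = [
    "<s> what time would you like to depart </s>",
    "<s> what time do you want to leave </s>",
    "<s> when would you like to depart </s>",
]

# Collect bigram successors: word -> list of observed next words.
successors = defaultdict(list)
for utterance in corpus:
    words = utterance.split()
    for w1, w2 in zip(words, words[1:]):
        successors[w1].append(w2)

def sample(max_len=20):
    """Random walk through the bigram model from <s> until </s>."""
    word, output = "<s>", []
    for _ in range(max_len):
        word = random.choice(successors[word])
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)

print(sample())  # e.g. "what time would you like to leave"
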
15
Surface Realization: Rules + Corpus
  • Nitrogen
    • Developed to account for underspecified input,
      which causes problems for rule-based techniques
    • Developed for a machine translation project at
      ISI/USC
  • Stochastic Generation at U. Edinburgh
    • Accounts for sentence-level attributes (in
      addition to word-level attributes)
    • An interesting technique to apply a certain
      author's style to another text (applying
      Shakespeare's style, e.g., sentence length
      distribution and vocabulary diversity, to Mark
      Twain!)

16
Surface Realization: Rules + Corpus
  • Input: underspecified syntactic/semantic features
  • Output: high-quality sentences
  • Knowledge Sources: language models
  • Knowledge Acquisition: (domain-specific) corpora
  • Degradation from underspecified input: accounted
    for by language models
  • Degradation from lacking knowledge: lower-quality
    output

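A minimal Python sketch of the overgenerate-and-rank
idea behind Nitrogen: when the input leaves choices
open (here, determiner and tense), simple rules emit
every combination and an n-gram language model picks
the most fluent string. The toy scoring corpus and
the candidate rules are illustrative assumptions, not
Nitrogen's actual word-lattice machinery.

import itertools
import math
from collections import Counter

# Toy corpus supplying bigram statistics for ranking.
corpus = "the flight departs at noon . the flight departs at ten .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def logprob(words):
    """Add-one-smoothed bigram log-probability of a word sequence."""
    score = 0.0
    for w1, w2 in zip(words, words[1:]):
        score += math.log((bigrams[(w1, w2)] + 1) /
                          (unigrams[w1] + len(unigrams)))
    return score

# Underspecified input: definiteness and tense are open, so the "rules"
# overgenerate every combination as a candidate realization.
candidates = [f"{det} flight {verb} at noon .".split()
              for det, verb in itertools.product(["the", "a"],
                                                 ["departs", "departed"])]
best = max(candidates, key=logprob)
print(" ".join(best))  # -> the flight departs at noon .
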
17
Surface Realization: Rules + Templates
  • Enables more efficient development of an NLG
    system for a concrete application
  • Combines generation grammar rules, templates, and
    canned expressions
  • Compensates for the shortcomings of each
    technique (i.e., utilizes the advantages of each
    technique)

18
Surface Realization: Rules + Templates
  • Input: underspecified syntactic/semantic features
  • Output: high-quality sentences
  • Knowledge Sources: generation grammar rules,
    lexicon, templates
  • Knowledge Acquisition: hand-crafted
  • Degradation from underspecified input: accounted
    for by templates and canned expressions
  • Degradation from lacking knowledge: lower-quality
    output

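A minimal Python sketch of a hybrid rules + templates
realizer: try a grammar rule first, fall back to a
template, then to a canned expression, so
underspecified input still yields output. Every act
name, rule, and template here is a hypothetical
illustration.

TEMPLATES = {"confirm": "Okay, {item} it is."}
CANNED = {"greet": "Hello! How can I help you?"}

def realize_rule(act, slots):
    """One toy grammar rule; returns None when the rule cannot apply."""
    if act == "inform_time" and {"event", "time"} <= slots.keys():
        return f"The {slots['event']} is at {slots['time']}."
    return None

def realize(act, **slots):
    out = realize_rule(act, slots)        # 1. grammar rules: best quality
    if out is not None:
        return out
    if act in TEMPLATES:                  # 2. templates: frequent acts
        try:
            return TEMPLATES[act].format(**slots)
        except KeyError:
            pass                          # missing slot: keep degrading
    return CANNED.get(act, "Sorry, I did not understand.")  # 3. canned

print(realize("inform_time", event="meeting", time="3 p.m."))
print(realize("confirm", item="the 10:30 flight"))
print(realize("greet"))
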
19
Future Directions
  • Corpus-Based Techniques
    • Langkilde, USC
  • Multimodal (Multimedia) Generation
    • McKeown et al., Columbia University
  • Concept-to-Speech Generation
    • Hitzeman et al., U. Edinburgh
  • Hypertext (Web documents) Generation
    • Dale, Macquarie University
  • Reference Architecture for NLG
    • RAGS Project (U. Edinburgh, U. Brighton)