The Question Generation Task

Slides: 19
Provided by: ArtG152
Transcript and Presenter's Notes

1
The Question Generation Task
  • Vasile Rus, Zhiqiang Cai, and Art Graesser

2
Outline
  • Shared task for NLG?
  • Why is question generation important?
  • Landscape of example questions
  • Definition of Question Generation
  • Subtasks
  • Evaluation Methodologies
  • Black-box vs. Glass-box
  • Manual vs. Automatic
  • Data sets

3
NLG Shared Task(s) or Not?
  • Shared Tasks (3)
  • Pros
  • Define evaluation metrics
  • Compare approaches to the chosen task
  • Monitor task
  • Community wide efforts are needed for building
    resources, infrastructure
  • Bring the community together
  • Increase visibility of NLG
  • Cons
  • Too much effort spent on the chosen task
  • Shadow other basic research effort

4
What Shared Task(s)?
  • Principle: due to the inherent difficulty of
    Language Generation, choose a (relatively) simple
    task
  • Question Answering has avoided deep questions
  • Summarization focuses on extractive summaries
  • Textual Entailment: text understanding?
  • Full-fledged NLU evaluation?

5
Why is Question Generation Important?
  • Help systems and FAQ facilities need example
    questions for users to model
  • Information retrieval queries need suggested
    revised questions
  • A need for automated systems with proactive
    question asking and answering
  • Intelligent tutoring systems need automated hints
    and other question probes

6
Who may care about Question Generation?
  • Natural Language Generation community
  • Learning Technologies community
  • Intelligent Tutoring Systems
  • Subject testing (ETS)
  • Question Answering community

7
Landscape of Questions to Generate
  • (Graesser and Person, 1994; Lehnert, 1978)
  • LEVEL 1: SIMPLE or SHALLOW
  • 1. Verification: Is X true or false? Did an
    event occur?
  • 2. Disjunctive: Is X, Y, or Z the case?
  • 3. Concept completion: Who? What? When?
    Where?
  • 4. Example: What is an example or instance of a
    category?
  • LEVEL 2: INTERMEDIATE
  • 5. Feature specification: What qualitative
    properties does entity X have?
  • 6. Quantification: What is the value of a
    quantitative variable? How much?
  • 7. Definition questions: What does X mean?
  • 8. Comparison: How is X similar to Y? How is X
    different from Y?
  • LEVEL 3: COMPLEX or DEEP
  • 9. Interpretation: What concept/claim can be
    inferred from a pattern of data?
  • 10. Causal antecedent: Why did an event occur?
  • 11. Causal consequence: What are the consequences
    of an event or state?

8
Question Generation
  • Input: one or more sentences
  • Output: a set of questions related to the input
    text

9
Examples
  • AutoTutor
  • INPUT: There are no horizontal forces on the
    packet after release.
  • OUTPUT: What can you say about the horizontal
    forces on the packet?
  • NIST QA track
  • INPUT: But here is who will actually direct
    Dreamgirls -- none other than Frank Oz, the voice
    of Miss Piggy on the Muppets.
  • OUTPUT: Who is the voice of Miss Piggy?

10
Subtasks - Input
  • INPUT
  • Input: one sentence
  • Input: one paragraph
  • Input: specified in a formalism appropriate for
    Language Generation

11
Subtasks - Output
  • OUTPUT
  • Subtask 1: generate questions containing only
    words from input
  • Subtask 2: generate questions containing only
    words from input, except for one word
  • Subtask 3: generate questions containing replaced
    phrases from input
  • Subtask 4: generate WHO questions, WHEN
    questions, etc.
  • Subtask 5: freely generate questions
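
The word-level constraints of Subtasks 1 and 2 can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function names (`tokens`, `novel_words`, `satisfies_subtask`), the regex tokenizer, case-insensitive matching, and the reading of "except for one word" as "at most one distinct new word" are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Lowercase word tokens; punctuation is dropped (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def novel_words(source, question):
    """Distinct words in the question that never occur in the source text."""
    seen = set(tokens(source))
    return sorted({w for w in tokens(question) if w not in seen})


def satisfies_subtask(source, question, subtask):
    """Check the vocabulary constraint of Subtask 1 or Subtask 2.

    Subtask 1: no words outside the input.
    Subtask 2: at most one word outside the input (assumed reading).
    """
    extra = len(novel_words(source, question))
    if subtask == 1:
        return extra == 0
    if subtask == 2:
        return extra <= 1
    raise ValueError("only Subtasks 1 and 2 have a word-level constraint")


nist_input = ("But here is who will actually direct Dreamgirls -- none other "
              "than Frank Oz, the voice of Miss Piggy on the Muppets.")
nist_output = "Who is the voice of Miss Piggy?"
print(satisfies_subtask(nist_input, nist_output, 1))

tutor_input = "There are no horizontal forces on the packet after release."
tutor_output = "What can you say about the horizontal forces on the packet?"
print(novel_words(tutor_input, tutor_output))
```

Under these assumptions the NIST example from the Examples slide fits Subtask 1, while the AutoTutor example introduces several new words and would fall under Subtask 5.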

12
Evaluation
  • Black-box
  • Simply look at the quality of the output
  • Glass-box
  • Some subtasks are designed to test particular
    components of language generation
  • Subtask 1 is suitable for testing syntactic
    variability and microplanning
  • Subtask 2 is suitable for testing lexical
    generation

13
Evaluation
  • Manual
  • Human experts judge the questions on quality
    and/or relevance
  • What is a good question?
  • Automatic
  • Suitable for some subtasks
  • Use automatic evaluation techniques from
    summarization (extractive summarization)
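
The slide does not name a specific technique borrowed from summarization. As one hedged illustration, the sketch below scores a generated question by unigram recall against a reference question, in the spirit of word-overlap measures used for summaries. The function name `unigram_recall`, the tokenizer, and the reference question are hypothetical choices, not taken from the slides.

```python
import re
from collections import Counter


def unigram_recall(candidate, reference):
    """Fraction of reference word occurrences also found in the candidate.

    Counts are clipped, so a word repeated in the candidate cannot be
    credited more often than it appears in the reference.
    """
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / total


# Hypothetical reference question for the NIST example input.
score = unigram_recall("Who is the voice of Miss Piggy?", "Who voices Miss Piggy?")
print(score)
```

Word overlap rewards surface similarity only, so it suits the subtasks that restrict questions to input words better than free generation.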

14
Evaluation - Metrics
  • Precision
  • Recall
  • Prepare a set of good questions for each input
  • Re-use existing data, e.g. NIST QA data
  • Use NIST method
  • Collect all good questions from all submissions
    and use them as the pool of GOLD STANDARD questions
  • Ranking: MRR (mean reciprocal rank)
  • Confidence measure: confidence-weighted measure
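
The metrics listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' scoring procedure: questions are matched to the gold pool by exact string equality, the function names are hypothetical, and the confidence-weighted measure is read as the running-precision average used for confidence-ranked answers in the NIST QA track.

```python
def precision_recall(generated, gold):
    """Precision and recall of generated questions against a gold pool."""
    gen, ref = set(generated), set(gold)
    hits = len(gen & ref)
    precision = hits / len(gen) if gen else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


def mean_reciprocal_rank(ranked_lists, gold_sets):
    """Average of 1/rank of the first gold question in each ranked list.

    An input whose list contains no gold question contributes 0.
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        for rank, question in enumerate(ranked, start=1):
            if question in gold:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


def confidence_weighted_score(judgments):
    """Mean running precision over outputs sorted by decreasing confidence.

    `judgments` holds True/False correctness flags, most confident first.
    """
    if not judgments:
        return 0.0
    correct_so_far = 0
    total = 0.0
    for i, is_correct in enumerate(judgments, start=1):
        correct_so_far += int(is_correct)
        total += correct_so_far / i
    return total / len(judgments)


p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
mrr = mean_reciprocal_rank([["x", "a"], ["c"], ["y", "z"]],
                           [{"a"}, {"c"}, {"q"}])
cws = confidence_weighted_score([True, False, True])
print(p, r, mrr, cws)
```

Exact matching is the strictest choice; in practice a pooled gold standard would likely need human judgment or a looser match to credit paraphrased questions.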

15
Data
  • AutoTutor
  • Hints and prompts to elicit physics principles
  • Expert-generated questions in curriculum scripts
  • NIST QA track
  • Thousands of Question-Answer pairs
  • Manipulate existing data
  • New data

16
Pros and Cons
  • Pros
  • Textual input could help with wide adoption
  • Suitable for glass- and black-box evaluation
  • Automatic evaluation is possible
  • Data sets already available or almost available
  • Cons
  • Discourse planning
  • Alternative: generate a set of related questions
    where anaphora and other discourse aspects are
    present
  • Pre-posed context clause
  • Fundamental issue
  • What is a good question?

17
Summary
  • Simple and attractive
  • Automatic evaluation possible
  • Data sets available

18
Thank You!