Transcript and Presenter's Notes

Title: Automatic Question Generation for Vocabulary Assessment


1
Automatic Question Generation for Vocabulary
Assessment
  • J. C. Brown (CMU), M. Eskenazi (CMU), G. A. Frishkoff (University of
    Pittsburgh)
  • HLT/EMNLP 2005

2
Abstract
  • REAP (Reader-Specific Lexical Practice)
  • We describe an approach to automatically
    generating questions for vocabulary assessment.
  • We suggest that these automatically-generated
    questions give a measure of vocabulary skill that
    correlates well with subject performance on
    independently developed human-written questions.

3
Measuring Vocabulary Knowledge
  • Vocabulary knowledge has many components, such as knowledge of the
    spoken form, the written form, grammatical behavior, collocation
    behavior, word frequency, conceptual meaning, ... (Nation, 1990).
  • In this work, we focus on knowledge of conceptual word meaning.

4
Question Types
  • We generated six types of questions: definition, synonym, antonym,
    hypernym, hyponym, and cloze questions.
  • Generating a question requires the correct sense of the target word
    and its POS tag, both of which are human-annotated (see the sketch
    below).
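  • As an illustration only: this slide does not name the lexical resource,
    so the sketch below assumes WordNet via NLTK (with the WordNet corpus
    downloaded); the function name question_stems and the example word are
    hypothetical, not taken from the paper.

    # Hypothetical sketch: collecting raw material for the question types
    # for one human-annotated word sense, using WordNet relations.
    from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

    def question_stems(lemma, pos, sense_index):
        """Gather definition/synonym/antonym/hypernym/hyponym material."""
        synset = wn.synsets(lemma, pos=pos)[sense_index]   # annotated sense
        target = next(l for l in synset.lemmas() if l.name() == lemma)
        stems = {"definition": synset.definition()}
        stems["synonym"] = [l.name() for l in synset.lemmas() if l.name() != lemma]
        stems["antonym"] = [a.name() for a in target.antonyms()]
        stems["hypernym"] = [l.name() for h in synset.hypernyms() for l in h.lemmas()]
        stems["hyponym"] = [l.name() for h in synset.hyponyms() for l in h.lemmas()]
        # Cloze material could come from example sentences containing the
        # target word (e.g. synset.examples()) with the word blanked out.
        return stems

    print(question_stems("remedy", wn.NOUN, 0))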

5
Question Forms
  • Each of the six question types can be generated in several forms, the
    primary ones being wordbank and multiple-choice (a rendering sketch
    follows).
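  • A minimal sketch of rendering one generated item in multiple-choice
    form; the prompt wording, option letters, and function name are
    assumptions for illustration, not taken from the paper.

    # Illustrative only: format a question stem plus distractors as a
    # multiple-choice item with the options in random order.
    import random

    def render_multiple_choice(prompt, correct, distractors):
        """Return the question text and the letter of the correct option."""
        options = distractors + [correct]
        random.shuffle(options)                    # randomize answer position
        letters = "abcd"[:len(options)]
        lines = [prompt] + [f"  {l}) {opt}" for l, opt in zip(letters, options)]
        answer_key = letters[options.index(correct)]
        return "\n".join(lines), answer_key

    question, key = render_multiple_choice(
        "Which word is a synonym of 'lenitive'?",
        "soothing", ["bitter", "ancient", "rapid"])
    print(question)
    print("answer:", key)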

6
Question Forms
  • Following Coniam (1997), distractors are chosen with the same POS as,
    and similar frequency to, the correct answer.
  • Word frequencies come from Kilgarriff's (1995) frequency database,
    based on the British National Corpus (BNC) and POS-tagged with the
    CLAWS tagger.
  • The generator selects 20 words from this database that have the same
    POS and equal or similar frequency to the correct answer, then randomly
    chooses the distractors from among them (see the sketch below).
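  • The sketch below illustrates this selection strategy; the tiny
    frequency table, its column layout, and the function name are made up
    for the example and are not the study's actual data.

    # Illustrative sketch of the distractor selection described above:
    # same POS, similar corpus frequency, random choice from 20 candidates.
    import random

    def choose_distractors(answer, answer_pos, answer_freq, freq_db,
                           n_candidates=20, n_distractors=3):
        """freq_db: iterable of (word, pos, frequency) tuples."""
        # Keep words with the answer's POS, excluding the answer itself.
        same_pos = [(w, f) for (w, pos, f) in freq_db
                    if pos == answer_pos and w != answer]
        # Rank by closeness in frequency and keep the nearest candidates.
        same_pos.sort(key=lambda wf: abs(wf[1] - answer_freq))
        pool = [w for (w, _) in same_pos[:n_candidates]]
        # Randomly pick the distractors from that candidate pool.
        return random.sample(pool, min(n_distractors, len(pool)))

    freq_db = [("soothing", "adj", 120), ("bitter", "adj", 300),
               ("brittle", "adj", 90), ("ancient", "adj", 500)]
    print(choose_distractors("lenitive", "adj", 40, freq_db))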

7
Question Coverage
  • The target list contained 156 low-frequency and rare English words.
  • No questions could be generated for 16 (9%) of these words.
  • All four question types could be generated for 75 (50%) of the words.

8
Experiment Design
  • The human-generated questions were developed by a
    group of three learning researchers.
  • Examples of each question type were hand-written
    for each of the 75 words.

9
Experiment Design
  • The human-written synonym and cloze questions were similar in form to
    the corresponding computer-generated question types.
  • The remaining human-written types were an inference task, a sentence
    completion task, and a question based on the Osgood semantic
    differential task.

10
Experiment Design
  • Inference task: Which of the following is most likely to be lenitive?
  • a glass of iced tea
  • a shot of tequila
  • a bowl of rice
  • a cup of chowder

11
  • Sentence completion task: The music was so lenitive, ...
  • it was tempting to lie back and go to sleep.
  • it took some concentration to appreciate the complexity.

12
  • Semantic differential task: participants were asked to classify a word
    such as lenitive along one of Osgood's dimensions (e.g., more good or
    more bad).
  • Osgood's three dimensions: valence (good–bad), potency (strong–weak),
    and activity (active–passive).

13
  • We administered a battery of standardized tests, including the
    Nelson-Denny Reading Test, the Raven's Matrices Test, and the Lexical
    Knowledge Battery.
  • 21 native English-speaking adults participated in two experimental
    sessions.

14
  • Session 1 lasted for 1 hour and included the
    battery of vocabulary and reading-related
    assessments described above.
  • Session 2 lasted 2-3 hours and comprised 10 tasks, including the 5
    human-generated and 4 computer-generated question types.
  • A confidence rating task was also included, using a 1-5 scale.

15
Experiment Results
  • We report on three aspects of this study:
  • participant performance on questions
  • correlations between question types
  • correlations with confidence ratings

16
Participant Performance on Questions
  • Mean accuracy scores for each question type
    varied from .5286 to .6452.
  • The lowest score was for the computer-generated cloze task (.5286).
  • The highest were for the computer-generated definition task and the
    human-generated semantic differential task, both with mean accuracy
    scores of .6452.

17
Correlations Between Question Types
  • r > .7, p < .01 for all correlations.
  • The correlation between the computer-generated and human-generated
    synonym questions was particularly high (r = .906).
  • The correlation between the human- and computer-generated cloze
    questions was also high (r = .860); a correlation sketch follows.
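  • A minimal sketch of how such per-word correlations can be computed;
    the score arrays below are made-up placeholders, not the study's data.

    # Pearson correlation between per-word accuracies on two question types.
    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical per-word mean accuracies (one value per target word).
    computer_synonym = np.array([0.95, 0.40, 0.71, 0.25, 0.88, 0.60])
    human_synonym    = np.array([0.90, 0.35, 0.80, 0.30, 0.85, 0.55])

    r, p = pearsonr(computer_synonym, human_synonym)
    print(f"r = {r:.3f}, p = {p:.3f}")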

18
Correlations with Confidence Ratings
  • The average correlation between accuracy on the
    question types and confidence ratings for a
    particular word was .265 (low).
  • This may be because participants thought they knew these words but
    were confused by their rarity.
  • Alternatively, confidence may simply not correlate well with accuracy.

19
Conclusion
  • Computer-generated questions give a measure of vocabulary skill for
    individual words that correlates well with human-written questions and
    standardized assessments of vocabulary skill.