1
A Maximum Entropy-based Model for Answer Extraction
  • Dan Shen
  • IGK, Saarland University
  • Supervisors: Prof. Dietrich Klakow and Dr. ir. Geert-Jan M. Kruijff

2
Part I -- Introduction
  • Answer Extraction Module in QA
  • Statistical Method for Answer Extraction
  • Motivation
  • Framework

3
Answer Extraction Module in QA
  • Open-Domain factoid Question Answering
  • Basic modules
  • Information Retrieval Module
  • → a set of relevant sentences / paragraphs
  • Answer Extraction (AE) Module
  • → the appropriate answer phrase

Q: What is the capital of Japan ? A: Tokyo
Q: How far is it from Earth to Mars ? A: 249 million miles
4
Techniques and Resources for AE
Techniques: Pattern Matching, NER, Parsing, Semantic analysis, Reasoning, ...
Resources: WordNet, Web, Database, Ontology
  • → How to incorporate them?
  • Pipeline structure
  • Mathematical framework

5
Motivation: Use Statistical Methods?
  • Flexibility
  • Integrating various techniques / resources
  • Easy to extend to more techniques / resources in the future
  • Effectiveness

6
(No Transcript)
7
Research Issues
  • Answer Candidate Selection
  • Which constituents are regarded as answer candidates (ACs)?
  • Methods
  • classification / ranking / ...
  • Features

8
Part II -- ME-based Model
  • Method
  • Features
  • Experiments and Results

9
Part II -- ME-based Model
  • Method
  • Features
  • Experiments and Results

10
Maximum Entropy Formulation I
  • Given a set of answer candidates
  • Model the probability
  • Define feature functions
  • Decision Rule
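
The slide's equations are not part of the transcript. As a hedged sketch, assuming the standard conditional maximum entropy form and a binary label per candidate (the symbols q, a_i, c, λ_m and f_m are notation chosen here, not taken from the slides), Formulation I could read:

```latex
% Answer candidates of one sentence: A = \{a_1, \dots, a_n\}; question q;
% label c \in \{0, 1\} (1 = correct answer).
P(c \mid a_i, q) =
  \frac{\exp\left(\sum_{m} \lambda_m f_m(a_i, q, c)\right)}
       {\sum_{c' \in \{0,1\}} \exp\left(\sum_{m} \lambda_m f_m(a_i, q, c')\right)}

% Decision rule (assumed): accept a_i as an answer when
P(c = 1 \mid a_i, q) > P(c = 0 \mid a_i, q)
```

Under this reading every candidate is judged on its own, so the probabilities of two candidates are normalized separately.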

11
Maximum Entropy Formulation II
  • Given a set of answer candidates
  • Model the probability
  • Define feature functions
  • Decision Rule
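
The slide's equations are not part of the transcript. As a hedged sketch, assuming a maximum entropy model normalized over the candidates of one sentence (the symbols q, a_i, A, λ_m and f_m are notation chosen here, not taken from the slides), Formulation II could read:

```latex
% Answer candidates of one sentence: A = \{a_1, \dots, a_n\}; question q.
P(a_i \mid q, A) =
  \frac{\exp\left(\sum_{m} \lambda_m f_m(a_i, q)\right)}
       {\sum_{j=1}^{n} \exp\left(\sum_{m} \lambda_m f_m(a_j, q)\right)}

% Decision rule (assumed): return the most probable candidate
\hat{a} = \arg\max_{a_i \in A} P(a_i \mid q, A)
```

Because the normalization runs over all candidates, the probabilities within one sentence sum to one and can be compared directly.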

12
Some Considerations
  • Model I
  • Judges whether each candidate is a correct answer
  • ✓ Can find more than one correct answer in a sentence
  • Are the probabilities of different candidates comparable?
  • Suffers from the unbalanced data set (1 positive / >20 negatives)
  • Model II
  • Finds the best answer among the candidates
  • Finds only one correct answer per sentence
  • ✓ Directly makes the probabilities of the candidates comparable
  • Experiment
  • Model II outperforms Model I by about 5%
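
The difference between the two models can be illustrated with a minimal Python sketch. It assumes each candidate already has a weighted feature sum; the scores and the function names `model_one_probs` and `model_two_probs` are hypothetical, and the logistic form for Model I is the usual reduction of a binary maximum entropy model, not something stated on the slides.

```python
import math

def model_one_probs(scores):
    # Model I: each candidate gets its own yes/no probability (logistic form).
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

def model_two_probs(scores):
    # Model II: a single distribution over all candidates of one sentence.
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 0.5, -1.0]  # hypothetical weighted feature sums, one per candidate
p1 = model_one_probs(scores)
p2 = model_two_probs(scores)
best = max(range(len(scores)), key=lambda i: p2[i])
print(p1, p2, best)
```

In this sketch the Model I values do not sum to one, so several candidates can pass a threshold, while the Model II values form one distribution from which a single best candidate is taken.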

13
Part II -- ME-based Model
  • Method
  • Features
  • Experiments and Results

14
Question Analysis
Q: What US biochemists won the Nobel Prize in medicine in 1992 ?
Question Word -- what
Target Word -- biochemist
Subject Words -- Nobel Prize / medicine / 1992
Verb -- win
Answer Type -- PERSON

Q: What is the name of the highest mountain in Africa ?
Question Word -- what
Target Word -- mountain
Subject Words -- highest / Africa
Verb -- be
Answer Type -- LOCATION
15
Answer Candidate Selection
  • Preprocessing
  • Named Entity Recognition
  • Parsing: Collins Parser
  • Conversion to dependency tree
  • Answer Candidate Selection
  • Base noun phrases
  • Named entities
  • Leaf nodes
  • Answer Candidate Coverage
  • 11876 / 14039 = 84.6%
  • The answer is missed in some sentences → consider all of the nodes?
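
The three candidate types can be collected with a simple tree walk. The sketch below is only an illustration under stated assumptions: the `Node` class, the `NPB` label for base noun phrases, and the hand-built toy tree are hypothetical and do not reproduce the Collins Parser output or the dependency conversion used in the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    text: str
    label: str                      # phrase label or POS tag (assumed tag names)
    ne_type: Optional[str] = None   # named entity type, if any
    children: List["Node"] = field(default_factory=list)

def select_candidates(root):
    # Keep base noun phrases, named entities, and leaf nodes.
    candidates = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.label == "NPB" or node.ne_type is not None or not node.children:
            candidates.append(node.text)
        stack.extend(reversed(node.children))  # visit children left to right
    return candidates

# Hand-built toy tree for "The world is divided into 24 time zones ."
tree = Node("divided", "VBN", children=[
    Node("The world", "NPB", children=[Node("The", "DT"), Node("world", "NN")]),
    Node("24 time zones", "NPB", children=[
        Node("24", "CD"), Node("time", "NN"), Node("zones", "NNS")]),
])
candidates = select_candidates(tree)
coverage = round(100 * 11876 / 14039, 1)  # coverage figure from the slide
print(candidates, coverage)
```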

16
Features: Syntactic / POS Tag Features
  • Observation
  • For who / where questions, answers are proper nouns (NNP)
  • For how / when questions, answers are cardinal numbers (CD)
  • Question Word & Syntactic tag / POS tag
  • QWord = how & SynTag = CD
  • QWord = who & SynTag = NNP
  • QWord = when & SynTag = NNP
  • QWord = when & SynTag = CD
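
A conjunction feature of this kind fires only when the question word and the candidate's tag both match. The following is a minimal sketch; the feature naming scheme, the `weights` values, and the helper names are hypothetical, chosen only to show how such binary features enter a weighted sum.

```python
def qword_tag_features(qword, syn_tag):
    # One binary feature per (question word, tag) pair.
    return {f"QWord={qword.lower()}&SynTag={syn_tag}": 1.0}

def score(features, weights):
    # Weighted feature sum; features without a learned weight contribute 0.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Hypothetical weights; in practice they come from maximum entropy training.
weights = {"QWord=who&SynTag=NNP": 1.2, "QWord=how&SynTag=CD": 0.9}

who_nnp = score(qword_tag_features("Who", "NNP"), weights)
who_cd = score(qword_tag_features("Who", "CD"), weights)
print(who_nnp, who_cd)
```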

17
Features: Surface Word Features
  • Word formations
  • Length / Capitalized / Digits, ...
  • Question Word & word formations
  • QWord = who & word is capitalized
  • QWord = who & word length < 3
  • Word co-occurrence between Q and A
  • Observation -- answers aren't a subsequence of the question
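
The surface features can be sketched as follows. This is an illustration under assumptions: length is counted in whitespace tokens, co-occurrence is measured as the share of candidate tokens that also appear in the question, and all feature names are hypothetical.

```python
def surface_features(qword, candidate, question):
    cand_tokens = candidate.split()
    question_tokens = {t.lower() for t in question.split()}
    feats = {}
    if cand_tokens and all(t[:1].isupper() for t in cand_tokens):
        feats[f"QWord={qword}&capitalized"] = 1.0
    if len(cand_tokens) < 3:  # length in tokens (assumption)
        feats[f"QWord={qword}&length<3"] = 1.0
    if any(ch.isdigit() for ch in candidate):
        feats[f"QWord={qword}&has_digit"] = 1.0
    # Share of candidate tokens that repeat question words.
    overlap = sum(1 for t in cand_tokens if t.lower() in question_tokens)
    feats["overlap_ratio"] = overlap / len(cand_tokens) if cand_tokens else 0.0
    return feats

question = "Who invented the paper clip ?"
good = surface_features("who", "Johan Vaaler", question)
bad = surface_features("who", "the paper clip", question)
print(good, bad)
```

A candidate that only repeats question words, such as "the paper clip", gets a high overlap ratio, which matches the observation that answers are not a subsequence of the question.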

18
Features: Named Entity Features
  • Question Type & NE type
  • QType = Person & NE type = Person
  • QType = Date & NE type = Date
  • QType = how much & NE type = Money
  • Useful for who, where, when questions
  • But for what / which / how questions?
  • Many expected answer types do not belong to a defined NE type

Q1: What language is most commonly used in Bombay ?
Q2: What city is ...
Q3: Which movie win ...
19
Features: TWord Relation for WHAT I
  • TWord is a hypernym of the answer
  • TWord is the head of the answer

Q: What city is Disneyland in ?
A: Not bad for a struggling actor who was working at Tokyo Disneyland just a few years ago .
Q: What is the name of the airport in Dallas Ft. Worth ?
A: Wednesday morning , the low temperature at the Dallas-Fort Worth International Airport was 81 degrees .
20
Features: TWord Relation for WHAT II
  • TWord is the appositive of the answer
  • Feature Function
  • QWord = what & TWord is a hypernym of the answer candidate

Q: What book did Rachel Carson write in 1962 ?
A1: In her 1962 book Silent Spring , Rachel Carson , a marine biologist , chronicled DDT 's poisonous effects , ...
A2: In 1962 , former U.S. Fish and Wildlife Service biologist Rachel Carson shocked the nation with her landmark book , Silent Spring .
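
The three target word relations (hypernym, head, appositive) can be sketched as binary features. The code below rests on stated assumptions: the `HYPERNYMS` table is a hand-written stand-in for a lexical resource such as WordNet, the head of a candidate is approximated by its last token, and the appositive is passed in by the caller rather than read from a parse.

```python
# Hypothetical stand-in for a hypernym lookup in a lexical resource.
HYPERNYMS = {"tokyo": {"city", "capital"}, "silent spring": {"book"}}

def tword_features(tword, candidate, appositive_of=None):
    feats = {}
    cand = candidate.lower()
    tokens = cand.split()
    if tword in HYPERNYMS.get(cand, set()):
        feats["QWord=what&TWord_is_hypernym"] = 1.0
    if tokens and tokens[-1] == tword:  # last token as head (assumption)
        feats["QWord=what&TWord_is_head"] = 1.0
    if appositive_of is not None and appositive_of.lower() == tword:
        feats["QWord=what&TWord_is_appositive"] = 1.0
    return feats

city = tword_features("city", "Tokyo")
airport = tword_features("airport", "Dallas-Fort Worth International Airport")
book = tword_features("book", "Silent Spring", appositive_of="book")
print(city, airport, book)
```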
21
Features: TWord Relation for HOW
  • How many / much + NN
  • How long / far / tall / fast
  • How long → year / day / month / ...
  • How tall → feet / inch / mile / ...
  • How fast → per day / per hour / ...
  • Use some trigger word features

Q: How many time zones are there in the world ?
A: The world is divided into 24 time zones .
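
Trigger word features for HOW questions can be sketched as a lookup from the adjective after "how" to unit words. The table below copies only the triggers listed on the slide; the function names, the optional plural "s", and the digit check for "how many" are assumptions made for illustration.

```python
import re

# Trigger words as listed on the slide (the slide's lists are open-ended).
HOW_TRIGGERS = {
    "long": ["year", "day", "month"],
    "tall": ["feet", "inch", "mile"],
    "fast": ["per day", "per hour"],
}

def how_trigger_feature(how_adj, candidate):
    # True if the candidate contains a trigger word for this HOW type.
    text = candidate.lower()
    return any(
        re.search(r"\b" + re.escape(trigger) + r"s?\b", text)
        for trigger in HOW_TRIGGERS.get(how_adj, [])
    )

def how_many_feature(tword, candidate):
    # "How many / much + NN": a number together with the target noun.
    text = candidate.lower()
    return bool(re.search(r"\d", text)) and tword.lower() in text

print(how_trigger_feature("long", "three years"))
print(how_many_feature("time zones", "24 time zones"))
```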
22
Features: Subject Word Relations I
Q: Who invented the paper clip ?
S1: The paper clip , weighing a desk-crushing 1320 pounds , is a faithful copy of Norwegian Johan Vaaler 's 1899 invention , said ...
S2: Like the guy who invented the safety pin , or the guy who invented the paper clip , David says .

23
Features: Subject Word Relations II
  • Match subject words in the answer sentence
  • Minimal Edit Distance
  • Dependency Relationship Matching
  • Observation: answers are close to the SWord in the dependency tree → the answer and the SWord are related
  • Answer candidate is a subject word
  • Answer candidate is the parent / child / sibling of the SWord
  • The path from the answer candidate to the SWord

Q: What is the name of the airport in Dallas Ft. Worth ?
A: Wednesday morning , the low temperature at the Dallas-Fort Worth International Airport was 81 degrees .
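
The dependency relations between a candidate and a subject word can be sketched on a tree stored as a child-to-parent map. The tree below is hand-built for part of the example sentence and is an assumption, not parser output; words are used as node identifiers, which only works while each word occurs once.

```python
def path_to_root(node, parent):
    # List of nodes from `node` up to the root.
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path

def tree_distance(a, b, parent):
    # Number of edges on the path between a and b through their common ancestor.
    depth_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in depth_from_a:
            return depth_from_a[n] + steps_from_b
    return None  # no common ancestor: not the same tree

def relation(candidate, sword, parent):
    if candidate == sword:
        return "same"
    if parent.get(sword) == candidate:
        return "parent"
    if parent.get(candidate) == sword:
        return "child"
    if parent.get(candidate) is not None and parent.get(candidate) == parent.get(sword):
        return "sibling"
    return "other"

# Hypothetical dependency tree (child -> parent) for part of the sentence.
parent = {"was": None, "temperature": "was", "degrees": "was", "81": "degrees",
          "at": "temperature", "Airport": "at", "Dallas-Fort": "Airport"}

print(relation("Airport", "Dallas-Fort", parent),
      tree_distance("Airport", "Dallas-Fort", parent),
      tree_distance("81", "Dallas-Fort", parent))
```

In this toy tree the candidate headed by "Airport" is one edge from the subject word "Dallas-Fort", while "81" is six edges away, which is the kind of contrast the path feature is meant to capture.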
24
Part II -- ME-based Model
  • Method
  • Features
  • Experiments and Results

25
Experiment Settings
  • Training Data
  • TREC 1999, TREC 2000, TREC 2002
  • Total number of questions: 1108
  • Total number of sentences: 11331
  • Test Data
  • TREC 2003
  • Total number of questions: 362 (NIL questions removed)
  • Total number of sentences: 2708

26
Question Word Distribution
27
Overall Performance
        Who    When   Where  What   Which  How    Why  Other  Overall
MRR     0.75   0.745  1      0.609  1      0.508  0    0      0.60
  • MRR: Mean Reciprocal Rank
  • Five answers are returned for each question
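
MRR with a cutoff of five answers can be computed as in the sketch below. The function names and the toy rankings are hypothetical; only the definition (reciprocal of the rank of the first correct answer, zero if none is in the top five, averaged over questions) is standard.

```python
def reciprocal_rank(ranked_answers, gold, k=5):
    # 1 / rank of the first correct answer within the top k, else 0.
    for rank, answer in enumerate(ranked_answers[:k], start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs, k=5):
    # runs: list of (ranked answers, set of correct answers), one per question.
    return sum(reciprocal_rank(ranked, gold, k) for ranked, gold in runs) / len(runs)

runs = [
    (["Tokyo", "Kyoto"], {"Tokyo"}),            # correct at rank 1 -> 1.0
    (["a", "b", "c"], {"b"}),                   # correct at rank 2 -> 0.5
    (["a", "b", "c", "d", "e", "f"], {"f"}),    # correct at rank 6 -> 0.0
]
mrr = mean_reciprocal_rank(runs)
print(mrr)
```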

28
Contribution of Different Features
29
Features: Syntactic / POS Tag Features
30
Features: Surface Word Features
31
Features: Named Entity Features
32
Features: TWord Relations for WHAT
33
Features: TWord Relations for HOW
34
Features: Subject Word Relations
35
Error Analysis I
  • Target Word Concept Unresolved
  • Q: What is the traditional dish served at Wimbledon?
  • ✓ A: And she said she wasn't wild about Wimbledon 's famed strawberries and cream .
  • A: And she said she wasn't wild about Wimbledon 's famed strawberries and cream .
  • Choosing the Wrong Entity
  • Q: What actress has received the most Oscar nominations?
  • ✓ A: Oscar perennial Meryl Streep is up for best actress for the film , tying Katharine Hepburn for most acting nominations with 12 .
  • A: Oscar perennial Meryl Streep is up for best actress for the film , tying Katharine Hepburn for most acting nominations with 12 .

36
Error Analysis II
  • Answer Candidate Granularity
  • Q: What city is Disneyland in?
  • ✓ A: Not bad for a struggling actor who was working at Tokyo Disneyland just a few years ago .
  • A: Not bad for a struggling actor who was working at Tokyo Disneyland just a few years ago .
  • Repeated Target Word in Answer
  • Q: How many grams in an ounce?
  • ✓ A: NOTE 30 grams is about 1 ounce .
  • A: NOTE 30 grams is about 1 ounce .
  • Misc.

37
Future Work
  • Extract answers from the Web
  • Evaluate on other data sets
  • Knowledge Master Corpus
  • How to deal with NIL questions?
  • Incorporate more linguistically motivated features

38
The End