Title: A QA Database and Extended KINDLE Resources
1. A Q/A Database and Extended KINDLE Resources
- Edward Loper
- University of Pennsylvania
2. Reading Comprehension Corpora
- Two evaluation corpora
- A collection of existing reading comprehension tests
- New reading comprehension questions based on existing WSJ articles (to leverage existing annotation)
- Annotate both passages and questions with
- Syntax (TreeBank)
- Verbal predicate information (PropBank)
- Noun predicate information (NomBank)
- Coarse-grained sense tagging (PropBank II)
- Nominal coreference (PropBank II)
- Discourse connectives (PropBank II)
- Event variables (PropBank II)
3. Goals
- Evaluate system performance
- Unbiased questions
- Drive research forward
- See how our current techniques need to be improved
- Include examples just beyond the current state of the art
- Include examples that are well beyond current capabilities
4. Why Reading Comprehension?
- Each answer occurs exactly once
- Only one chance to find the answer
- More incentive to solve the difficult cases
- Answer is guaranteed to exist
- Limited text to process
- Allows us to experiment w/ deep processing
- Simplifies evaluation
5. Which Examples to Annotate?
- Passage
- Random or Selected Topic? (person, episode, location, ...)
- 2 degrees of Difficulty (writing style, vocabulary, etc.)
- Question
- Topic (person, duration, cause, etc.)
- Reasoning needed (spatial, event, domain specific, etc.)
- Knowledge needed (coref, discourse analysis, etc.)
- Difficulty
6. An Informal Survey of Reading Comprehension Passages
- What knowledge resources do we need?
- Examine Q/A pairs from reading comprehension tests
- What resources do we need to get an extended subsumption match between the question and answer?
7. Dale Earnhardt
Q: When was his first win?
A: He won his first championship in 1980.
Need:
- Mapping arguments between a verb and its nominalization
- Coreference (in both question and passage)
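The two needs on this slide can be sketched together. The following is a minimal illustration, not the project's actual method: the lexicon `NOMINAL_TO_VERB`, the `possessor` slot, and the hand-written coreference table are all hypothetical stand-ins for what NomBank, PropBank, and a coreference resolver would supply. It also ignores the ordinal "first", which a real system would have to handle.

```python
# Hypothetical lexicon: nominal predicate -> (verb lemma, modifier-to-role map).
# The entries are illustrative and are not copied from NomBank or PropBank.
NOMINAL_TO_VERB = {
    "win": ("win", {"possessor": "arg0"}),
}


def resolve(mention, coref):
    """Replace a mention with its antecedent if the (toy) coref table has one."""
    return coref.get(mention.lower(), mention)


def nominal_frame(noun, modifiers, coref):
    """Rewrite a nominal predicate as a verb-style frame."""
    verb, role_map = NOMINAL_TO_VERB[noun]
    frame = {"pred": verb}
    for slot, filler in modifiers.items():
        if slot in role_map:
            frame[role_map[slot]] = resolve(filler, coref)
    return frame


def verbal_frame(verb, args, coref):
    """Build a frame for a verbal predicate, resolving each argument."""
    frame = {"pred": verb}
    for role, filler in args.items():
        frame[role] = resolve(filler, coref)
    return frame


def answer_slot(question, answer, wanted_role):
    """Return the answer's filler for wanted_role if every question slot matches."""
    if all(answer.get(key) == value for key, value in question.items()):
        return answer.get(wanted_role)
    return None


# Hand-written coreference links (an assumption; a resolver would produce these).
coref = {"his": "Dale Earnhardt", "he": "Dale Earnhardt"}

# Q: "When was his first win?"  (nominal "win", possessor "his")
q = nominal_frame("win", {"possessor": "his"}, coref)
# A: "He won his first championship in 1980."
a = verbal_frame(
    "win",
    {"arg0": "He", "arg1": "his first championship", "argm-tmp": "1980"},
    coref,
)

print(answer_slot(q, a, "argm-tmp"))
```

Once the nominal "win" and the verb "won" are expressed in the same frame and both pronouns resolve to the same entity, the "when" question reduces to reading off the temporal modifier.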
8. Abraham Lincoln
Q: How did Lincoln die?
A: He was shot by John Wilkes Booth in 1865.
Need:
- Match shoot w/ kill (via VerbNet or WordNet)
- WN: hypernym
- VerbNet: share a superclass
- Match kill w/ cause to die (WN def.)
- Use PropBank to line up arguments (Lincoln = arg1)
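The lexical matching chain above (shoot, then kill, then cause to die) can be sketched as a search over hypernym links. This is a minimal sketch under stated assumptions: `TOY_HYPERNYMS` and `QUESTION_CONCEPT` are small hand-built tables invented for illustration, where a real system would read the links from WordNet or VerbNet.

```python
from collections import deque

# Hypothetical, hand-built fragment of a hypernym graph (child -> parents).
# A real system would take these links from WordNet or VerbNet instead.
TOY_HYPERNYMS = {
    "shoot": ["kill"],
    "stab": ["kill"],
    "kill": ["cause_to_die"],
    "win": ["succeed"],
}

# Hypothetical mapping from a question predicate to the concept it asks about.
QUESTION_CONCEPT = {"die": "cause_to_die"}


def hypernym_path(word, target, graph=TOY_HYPERNYMS):
    """Return the chain word -> ... -> target, or None if target is unreachable."""
    queue = deque([[word]])
    seen = {word}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for parent in graph.get(path[-1], []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


def answers_question(answer_verb, question_verb):
    """True if the answer verb is subsumed by the concept the question asks about."""
    target = QUESTION_CONCEPT.get(question_verb, question_verb)
    return hypernym_path(answer_verb, target) is not None


print(hypernym_path("shoot", "cause_to_die"))
print(answers_question("shoot", "die"))
print(answers_question("win", "die"))
```

The search only establishes that the predicates are compatible; lining up who was shot with who died still needs the PropBank-style argument alignment listed on the slide.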
9. One of those days
Q: What did he have for breakfast?
A: Then he went downstairs and had his breakfast. This consisted of a bowl of cereal with a banana and some milk.
Need:
- Sense tagging (consume sense of have, WN6)
- Specify answer type as food (from predicate/argument information about the consume sense of have).
- Treat consists as equality (consists is a hyponym of to be)
- Coreference (to match this = his breakfast)
10. Teddy Bears
Q: Why did President Roosevelt refuse to shoot the bear?
A: President Roosevelt was on a hunting trip in Mississippi when members of the hunting party caught a black bear and tied him to a tree. President Roosevelt was called to the area to shoot the bear, which he refused to do and said it was unsportsmanlike and showed poor manners.
Need:
- Coreference (do = shoot the bear)
- Coreference (he = President Roosevelt)
- Discourse connective: "and" = "because" in the last sentence.
- For a complete answer, discover that "it" = to shoot a bear tied to a tree, not just to shoot a bear. (general reasoning?)
11. Knowledge Resources Needed
- Vocabulary variation: Lexical matching
- Syntactic variation: Phrasal matching
- Alternations (incl. lexical-specific alternations)
- Coreference
- Discourse structure analysis: discourse connectives
- Reasoning about locations
- Reasoning about times and events
- Reasoning about opinions
- World knowledge (scenario reasoning/matching)
12. Extending Resources
- Extending VerbNet
- Extending coverage of PropBank
- PropBank II
- Event variables (continue growing rice)
- Sense tagging w/ grouped senses
- Nominal coreference
- Discourse connectives
13. http://www.cis.upenn.edu/pdtb/
Joshi, Miltsakaki, Prasad
He failed the exam although he had studied hard.
When the stock market dropped nearly 7% Oct 13, for instance, the Mexico Fund plunged about 18%, and the Spain Fund fell 16%.
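One way to picture a discourse connective annotation is as a small record holding the connective and its two arguments. The sketch below is illustrative only: the class `ConnectiveAnnotation`, the helper `annotate_infix`, and the informal `relation` label are hypothetical and do not reproduce the PDTB file format or tagset. It assumes the convention that arg2 is the clause the connective attaches to, and it handles only a connective that appears in the middle of a sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectiveAnnotation:
    """Hypothetical record for one annotated discourse connective."""
    connective: str
    arg1: str      # the other argument
    arg2: str      # the clause the connective attaches to (assumed convention)
    relation: str  # informal label for illustration, not an official tag


def annotate_infix(sentence, connective, relation):
    """Split 'X <connective> Y.' into arg1 = X and arg2 = Y.

    Only sentence-medial connectives are handled; anything else raises.
    """
    left, sep, right = sentence.rstrip(".").partition(f" {connective} ")
    if not sep:
        raise ValueError(f"connective {connective!r} not found mid-sentence")
    return ConnectiveAnnotation(connective, left.strip(), right.strip(), relation)


ann = annotate_infix(
    "He failed the exam although he had studied hard.",
    "although",
    "concession",
)
print(ann.arg1)
print(ann.arg2)
```

The second example on the slide would need more than string splitting: "When" opens the sentence and "for instance" sits inside the clause, so locating the arguments there requires syntactic analysis.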
14. http://www.cis.upenn.edu/group/verbnet/
15. http://www.cis.upenn.edu/ace
16. Q/A Corpus Construction
- How should we select passages?
- What kind(s) of topic? Bibliographic? Events? Fictional?
- What kind of questions?
- What difficulty level(s)? How far do we push state of the art? How much reasoning?