Title: QUALIFIER in TREC-12 QA Main Task
1QUALIFIER in TREC-12 QA Main Task
- Hui Yang, Hang Cui, Min-Yen Kan, Mstislav Maslennikov, Long Qiu, Tat-Seng Chua
- School of Computing
- National University of Singapore
- Email: yangh_at_comp.nus.edu.sg
2Outline
- Introduction
- Factoid Subsystem
- List Subsystem
- Definition Subsystem
- Result
- Conclusion and Future Work
3Introduction
- Given a question and a large text corpus, return an answer rather than relevant documents
- QA is at the intersection of IR, IE, and NLP
- Our system - QUALIFIER
- Consists of 3 subsystems
- External Resources: Web, WordNet, Ontology
- Event-based Question Answering
- New Modules introduced
5Factoid System Overview
6Factoid Subsystem
- Detailed Question Analysis
- QA Event Construction
- QA Event Mining
- Answer Selection
- Answer Justification
- Fine-grained Named Entity Recognition
- Anaphora Resolution
- Canonicalization Coreference
- Successive Constraint Relaxation
8Why Event-based QA - I
- The world consists of two basic types of things, entities and events, and people often ask questions about them.
- From Question Answering's point of view
- Questions are enquiries about entities or events.
9Why Event-based QA - II
- QA Entities
- Anything having existence (living or nonliving)
- E.g. What is the Democratic Party symbol?
- QA Events
- Something that happens at a given place and time.
- E.g. How did the donkey become the Democratic Party symbol?
Thomas Nast
1870
Harper's Weekly cartoon
10Why Event-based QA - III
- Entity Questions
- Properties, or
- entities themselves
- definition questions.
- Event Questions
- Elements of events
- Location,
- Time,
- Subject,
- Object,
- Quantity
- Description
- Action, etc.
- Table 1: Correspondence of WH-Questions and Event Elements
question ::= event | event_element | entity | entity_property
event ::= { event_element }
event_element ::= time | location | subject | object | quantity | description | action | other
entity ::= object | subject
entity_property ::= quantity | description | other
11Event-based QA Hypothesis
- Equivalency: $\forall$ QA events $E_i, E_j$, if $\mathrm{all\_elements}(E_i) = \mathrm{all\_elements}(E_j)$, then $E_i = E_j$, and vice versa
- Generality: if $\mathrm{all\_elements}(E_i)$ is a subset of $\mathrm{all\_elements}(E_j)$, then $E_i$ is more general than $E_j$
- Cohesiveness: if elements $a$, $b$ both belong to an event $E_i$, and $a$, $c$ do not belong to a known event, then co-occurrence$(a, b)$ is greater than co-occurrence$(a, c)$
- Predictability: if elements $a$, $b$ both belong to an event $E_i$, then $a \rightarrow b$ and $b \rightarrow a$.
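The Equivalency and Generality hypotheses can be illustrated with plain set operations. This is a minimal sketch, assuming an event is modelled as the set of its elements; the names `is_equivalent`, `is_more_general`, and the sample element strings are illustrative and not part of QUALIFIER.

```python
from typing import FrozenSet

# Assumption: an event is represented by the set of its elements.
Event = FrozenSet[str]


def is_equivalent(e_i: Event, e_j: Event) -> bool:
    """Equivalency: two events are the same iff their element sets are equal."""
    return e_i == e_j


def is_more_general(e_i: Event, e_j: Event) -> bool:
    """Generality: e_i is more general than e_j if its elements are a subset of e_j's."""
    return e_i <= e_j


# Illustrative element sets built from the donkey-symbol example.
full_event = frozenset({"Thomas Nast", "1870", "Harper's Weekly cartoon", "donkey"})
partial_event = frozenset({"Thomas Nast", "donkey"})

print(is_more_general(partial_event, full_event))
print(is_equivalent(partial_event, full_event))
```

An event described by fewer known elements is the more general one, which is why finding additional elements narrows the search.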
12QA Event Space
- Consider an event to be a point in a multi-dimensional QA event space.
- If we know all the elements about an event, then we can easily answer different questions about it
- E.g. When did Bob Marley die?
- As there are innate associations among these elements if they belong to the same event (Cohesiveness), we can use what is already known
- To narrow the search scope
- To find the rest of the unknown event elements, i.e. the answer (Predictability)
13Problems to be Solved
- However, in most cases it is difficult to find the correct unknown element(s), i.e., the correct answer
- Two major problems
- Insufficient known elements
- Inexact known elements
- Solution
- Explore the use of world knowledge (Web and WordNet glosses) to find more known elements
- Exploit lexical knowledge (WordNet synsets and morphemics) to find exact forms.
14How to Find a QA Event
- Using Web
- From the original query terms $q^{(0)}$, retrieve the top $N$ web documents
- $\forall q_i^{(0)} \in q^{(0)}$, extract nearby non-trivial words in the same sentence or within $n$ words (in $C_q$) and rank them by their probability of correlation with $q_i^{(0)}$
- Using WordNet
- $\forall q_i^{(0)} \in q^{(0)}$, extract terms that are lexically related to $q_i^{(0)}$ by locating them in Gloss $G_q$ and Synset $S_q$
- Combine the external knowledge resources to form the term collection
- $K_q = C_q + (G_q \cup S_q)$
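The Web step above can be sketched as a windowed co-occurrence count. This is only an illustration under stated assumptions: the scoring (the fraction of query-term occurrences that have the word nearby) stands in for the "probability of correlation", whose exact formula the slides do not give. The window is applied within a single sentence, and `STOPWORDS`, `context_terms`, and the toy sentences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical list of "trivial" words to ignore.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "was", "by", "to"}


def context_terms(sentences: Iterable[List[str]], query_term: str, window: int = 3) -> Dict[str, float]:
    """Rank non-trivial words found within `window` tokens of `query_term`.

    Score = (occurrences of the query term with the word nearby) / (occurrences
    of the query term). This is an assumed stand-in, not the authors' formula.
    """
    neighbour_counts: Counter = Counter()
    occurrences = 0
    q = query_term.lower()
    for tokens in sentences:
        lowered = [t.lower() for t in tokens]
        for pos, tok in enumerate(lowered):
            if tok != q:
                continue
            occurrences += 1
            lo, hi = max(0, pos - window), pos + window + 1
            nearby = set(lowered[lo:hi]) - {q} - STOPWORDS
            neighbour_counts.update(nearby)
    if occurrences == 0:
        return {}
    return {word: count / occurrences for word, count in neighbour_counts.most_common()}


# Toy tokenised sentences standing in for retrieved web text.
toy_sentences = [
    ["Soto", "discovered", "the", "Mississippi", "in", "1541"],
    ["the", "Mississippi", "river", "flows", "south"],
]
scores = context_terms(toy_sentences, "Mississippi", window=2)
```

The resulting ranked terms would play the role of $C_q$; the WordNet gloss and synset lookups are left out because they depend on the lexical resource in use.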
15QA Event Construction
- Structured Query Formulation
- We perform structural analysis on $K_q$ to form semantic groups of terms
- Given any two distinct terms $t_i, t_j \in K_q$, we compute their
- Lexical correlation
- Co-occurrence correlation
- Distance correlation
16QA Event Construction
- For example: What Spanish explorer discovered the Mississippi River?
The final Boolean query becomes (Mississippi) AND (French OR Spanish) AND (Hernando OR Soto OR De) AND (1541) AND (explorer) AND (first OR European OR river).
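A Boolean query of this shape can be assembled mechanically from the semantic groups. The connectives are not legible in the slide text, so this sketch assumes OR inside a group and AND between groups; `build_boolean_query` is a hypothetical helper, not QUALIFIER's code.

```python
from typing import List, Sequence


def build_boolean_query(groups: Sequence[Sequence[str]]) -> str:
    """Join terms inside a group with OR and the groups themselves with AND (assumed connectives)."""
    clauses: List[str] = []
    for group in groups:
        if group:  # skip empty groups
            clauses.append("(" + " OR ".join(group) + ")")
    return " AND ".join(clauses)


# Semantic groups taken from the Mississippi example on the slide.
groups = [
    ["Mississippi"],
    ["French", "Spanish"],
    ["Hernando", "Soto", "De"],
    ["1541"],
    ["explorer"],
    ["first", "European", "river"],
]
query = build_boolean_query(groups)
print(query)
```

Keeping each semantic group as its own clause means a single clause can later be dropped or loosened without rebuilding the whole query.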
17QA Event Mining
- Extract important association rules among the elements by using data mining techniques.
- Given a QA event $E_i$, we define $X$, $Y$ as two sets of event elements.
- Event mining studies rules of the form $X \rightarrow Y$, where $X$, $Y$ are QA event element sets, $X \cap Y = \emptyset$, and $Y \cap \mathrm{element}_{\mathrm{original}} = \emptyset$.
- if $X \cap Y \neq \emptyset$, ignore $X \rightarrow Y$.
- if $\mathrm{cardinality}(Y) > 1$, ignore $X \rightarrow Y$.
- if $Y \cap \mathrm{element}_{\mathrm{original}} \neq \emptyset$, ignore $X \rightarrow Y$.
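The rule constraints above can be sketched with the standard association-rule definitions of support and confidence. The sketch assumes each passage is reduced to a set of elements, that the antecedent is drawn from the original (known) question elements, and that the consequent is a single element that is not an original one. `mine_rules` and the toy passages are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# (X, y, support, confidence) for a rule X -> {y}
Rule = Tuple[FrozenSet[str], str, float, float]


def mine_rules(passages: List[Set[str]], known: Set[str], min_support: float = 0.0) -> List[Rule]:
    """Enumerate rules X -> {y} with X a non-empty subset of `known` and y not in `known`.

    support(X -> y)    = fraction of passages containing X and y
    confidence(X -> y) = passages containing X and y / passages containing X
    """
    n = len(passages)
    candidates = set().union(*passages) - known  # possible consequents
    rules: List[Rule] = []
    for size in range(1, len(known) + 1):
        for x in combinations(sorted(known), size):
            x_set = frozenset(x)
            x_count = sum(1 for p in passages if x_set <= p)
            if x_count == 0:
                continue
            for y in sorted(candidates):
                xy_count = sum(1 for p in passages if x_set <= p and y in p)
                support = xy_count / n
                if xy_count == 0 or support < min_support:
                    continue
                rules.append((x_set, y, support, xy_count / x_count))
    return rules


# Toy passages, each reduced to the set of elements it mentions.
toy_passages = [
    {"Mississippi", "explorer", "Soto"},
    {"Mississippi", "Soto", "1541"},
    {"Mississippi", "explorer", "river"},
]
toy_rules = mine_rules(toy_passages, known={"Mississippi", "explorer"})
```

Restricting the consequent to one non-original element satisfies all three filters at once, so no rule has to be discarded after enumeration.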
18Passage Answer Selection
- Select passages based on Answer Event Score (AES) from the relevant documents in the QA corpus
- Support ($X \rightarrow Y$)
- Confidence ($X \rightarrow Y$)
- The weight for answer candidate $j$ is defined as
19Related Modules: Fine-grained Named Entity Recognition
- Fine-grained NE Tagging
- Non-ASCII Character Remover
- Number Format Converter
- E.g. one hundred eleven → 111
- Rule Conflict Resolver
- Longer Length
- Ontology
- Handcrafted Priorities
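The Number Format Converter in the list above turns spelled-out numbers into digits. The following is a minimal sketch of one way to do that for English cardinal numbers; `words_to_number` and its word tables are illustrative and do not reproduce QUALIFIER's converter.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}


def words_to_number(text: str) -> int:
    """Convert an English cardinal such as 'one hundred eleven' to an int."""
    total = 0    # sum of completed thousand/million/billion groups
    current = 0  # value of the group currently being read
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(words_to_number("one hundred eleven"))
```

Normalising numbers before tagging lets later stages compare quantities as values rather than as strings.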
20Related Modules: Answer Justification
- We generate axioms based on our manually constructed ontology. For example,
- q1425: What is the population of Maryland?
- Sentence: Maryland's population is 50,000 and growing rapidly.
- Ontology Axiom (OA): Maryland(c1) ∧ population(c1, c2) → 5000000(c2)
- In this way, we can identify 50,000, the value shown in the surface text, as a wrong answer.
21Factoid Results
22Factoid Results
24List System Overview
25List Subsystem
- Multiple Answers from Same Paragraph
- Canonicalization Resolution
- Unique answer
- the States, USA, United States, etc.
- Pattern-based Answer Extraction
- , and verb
- include , ,
- list of
- top number adj-superlative
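The canonicalization step for list answers, where "the States", "USA" and "United States" count as one unique answer, can be sketched with an alias table. The table and the helper names `canonicalize` and `unique_answers` are hypothetical; a real system would presumably derive aliases from a gazetteer or ontology.

```python
from typing import Dict, Iterable, List

# Hypothetical alias table mapping surface forms to one canonical name.
ALIASES: Dict[str, str] = {
    "the states": "United States",
    "usa": "United States",
    "united states": "United States",
}


def canonicalize(answer: str) -> str:
    """Map a surface form to its canonical name, or return it unchanged."""
    key = " ".join(answer.lower().split())  # normalise case and whitespace
    return ALIASES.get(key, answer.strip())


def unique_answers(candidates: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each canonical answer, preserving order."""
    seen = set()
    result: List[str] = []
    for candidate in candidates:
        canon = canonicalize(candidate)
        if canon not in seen:
            seen.add(canon)
            result.append(canon)
    return result


print(unique_answers(["the States", "USA", "Singapore", "United States"]))
```

Deduplicating on the canonical form rather than the surface string keeps a list answer from counting the same entity several times.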
26List Results
28System Overview
29Definition Subsystem
30Definition Subsystem
- Pre-processing
- document filter
- anaphora resolution
- sentence positive set and negative set
- Sentence Ranking
- Sentence weighting in Corpus
- Sentence weighting in Web
- Overall weighting
31Definition Subsystem
- Answer Generation (Progressive Maximal Margin Relevance)
- 1) All sentences are ordered in descending order by weight.
- 2) Add the first sentence to the summary.
- 3) Examine the following sentences. If Weight(stc) - Weight(next_stc) avg_sim(stc), add next_stc to the summary.
- 4) Go to Step 3) until the length limit of the target summary is satisfied.
32Definition Results
- We empirically set the length of the summary for
People and Objects based on question
classification results.
34Overall Performance
35Conclusion and Future Work
- Conclusion
- Event-based Question Answering
- Factoid and list questions explore the power of Event-based QA
- Definition question answering combines IR and Summarization
- Use Ontology to boost the performance of our NE and answer justification modules
- Future Work
- Give a formal proof of our QA event hypothesis
- Working towards an online question answering system
- Interactive QA
- Analysis and opinion questions
- VideoQA: question answering on news video