Title: TREC2003 QA Report
1 TREC 2003 QA Report
A re-examination of IR techniques in a QA system
Changyi Xuhongbo
2003-11-17
LCC, ICT
2 Outline
- What is QA
- TREC 2003 QA Task
- Related Work
- Our Approach
- Error Analysis
- Comparative Analysis
- Future Work
- Conclusion
3 What is QA?
- ask a question in natural language
- return the most likely information as the answer
- a difficult problem that has existed for decades
4 QA system categorization
- Closed Domain
- Knowledge input
- Open Domain (our focus)
- IR + answer extraction
5 When did the Titanic sink?
9 Why difficult?
- How to analyze question?
- How to gather information?
- How to distill the possible answers?
- How to select answer?
10 Outline
- What is QA
- TREC 2003 QA Task
- Related Work
- Our Approach
- Error Analysis
- Comparative Analysis
- Future Work
- Conclusion
11 Task
- Main task
- Return actual answers
- Passage task
- Only for factoid questions
- Return a passage of less than 250 bytes
12 Document Set
- AQUAINT distribution set (1,033,461 documents)
- New York Times
- Associated Press
- Xinhua News Agency newswires
13 Question Types
- Factoid questions (413)
- List questions (37)
- Definition questions (50)
- final score = 1/2 × factoid-score + 1/4 × list-score + 1/4 × definition-score
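To make the weighting concrete, here is a minimal sketch; the component scores in the example are made-up placeholders, not results from this report.

    # Minimal sketch of the TREC 2003 main-task scoring formula.
    def final_score(factoid_score, list_score, definition_score):
        # final score = 1/2 * factoid + 1/4 * list + 1/4 * definition
        return 0.5 * factoid_score + 0.25 * list_score + 0.25 * definition_score

    # Placeholder component scores (not actual results):
    print(final_score(0.30, 0.10, 0.40))  # about 0.275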
14 Factoid question
- short, fact-based answer
- may not have an answer in the document collection, in which case the answer is NIL
- for example
- How far is it from Earth to Mars?
- What book did Rachel Carson write in 1962?
- When was "Cold Mountain" written?
15 Factoid question (evaluation)
- Right
- Not Right
- Wrong
- Inexact
- e.g., China vs. Chinese
- Unsupported
- e.g., China (population) vs. China (United Nations)
16 List question
- a set of instances of a specified type
- for example
- Which past and present NFL players have the last name of Johnson?
- List the names of chewing gums.
- Which U.S. presidents have died while in office?
17 Definition question
- a set of interesting and salient information items about a person, organization, or thing
- for example
- Who is Aaron Copland?
- What is the vagus nerve?
18 Outline
- What is QA
- TREC 2003 QA Task
- Related Work
- Our Approach
- Error Analysis
- Comparative Analysis
- Future Work
19 Special methods (performed excellently)
- IR + Logic Prover (LCC)
- The essential component is the extended WordNet, which supplies the Prover with world knowledge axioms.
- IR + Indicative Patterns (InsightSoft)
- The indicative patterns can be considered a special case of a more general approach to text information retrieval.
20 General method (IR + IE)
- Named Entity Identifier
- GATE (Sheffield), IdentiFinder (BBN), CASS (AT&T), Textract (IBM)
- PERSON, ORGANIZATION, LOCATION, COUNTRY, MONEY
21 Outline
- What is QA
- TREC 2003 QA Task
- Related Work
- Our Approach
- Error Analysis
- Comparative Analysis
- Future Work
- Summary
22 Tools incorporated
- LT_CHUNK (Edinburgh)
- sentence chunks
- POS tags of words
- GATE (Sheffield)
- named entities
23 System Description
24 To answer each question, the system
- makes use of the chunks to identify the required NE type
- selects the top 50 out of the 1000 given relevant documents
- matches the 50 documents at different levels and retrieves the top-ranked Bi-sentences
- identifies the candidate entities
- selects the answer by voting
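A rough, self-contained sketch of this per-question flow is given below. The helper functions are toy stand-ins (keyword overlap for passage selection, a tiny pattern-based NE recognizer), and all names are illustrative assumptions rather than the actual modules.

    import re
    from collections import Counter

    def analyze_question(question):
        # Direct mapping from the question word to the required NE type (toy version).
        q = question.lower()
        if q.startswith("who"):
            return "PERSON"
        if q.startswith("where"):
            return "LOCATION"
        if q.startswith("when"):
            return "DATE"
        return None

    def multilevel_select(question, docs, top_n=5):
        # Stand-in for multilevel Bi-sentence selection: rank sentences by keyword overlap.
        keywords = set(re.findall(r"\w+", question.lower()))
        sentences = [s for d in docs for s in re.split(r"(?<=[.!?])\s+", d)]
        return sorted(sentences,
                      key=lambda s: len(keywords & set(re.findall(r"\w+", s.lower()))),
                      reverse=True)[:top_n]

    def recognize_entities(sentences, ne_type):
        # Toy NE recognizer: years for DATE, capitalized words otherwise.
        pattern = r"\b(?:1[0-9]{3}|20[0-9]{2})\b" if ne_type == "DATE" else r"\b[A-Z][a-z]+\b"
        return [m for s in sentences for m in re.findall(pattern, s)]

    def vote(candidates):
        # The most frequent candidate wins; NIL when nothing was found.
        return Counter(candidates).most_common(1)[0][0] if candidates else "NIL"

    def answer_question(question, ranked_docs, top_k=50):
        ne_type = analyze_question(question)               # identify the required NE type
        docs = ranked_docs[:top_k]                         # keep the top 50 given documents
        bi_sentences = multilevel_select(question, docs)   # select top-ranked passages
        candidates = recognize_entities(bi_sentences, ne_type)
        return vote(candidates)                            # select the answer by voting

    print(answer_question("When did the Titanic sink?",
                          ["The Titanic sank in 1912. It struck an iceberg.",
                           "In 1912 the Titanic went down in the North Atlantic."]))  # 1912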
25 Question Analyzing Module
- Direct map question
- who (PERSON), where (LOCATION), how many (NUMBER)
- Indirect map question
- Which N
- What N
- Other question
- Simply answer NIL
26 Indirect map question
- Which city is home to Superman?
- Which past and present NFL players have the last name of Johnson?
- What type of bee drills holes in wood?
- Core Noun
- the noun in the question that indicates the answer
- Use a predefined Map Lexicon to identify the required NE type
- build an Abstract Noun Lexicon
- breed, type, name
27 Algorithm to find the Core Noun
- Step 1: Take the last noun in the first Noun Group as the Core Noun.
- Step 2: If the Core Noun is in the Abstract Noun Lexicon, take the last noun in the next Noun Group as the Core Noun.
- Step 3: If no suitable noun can be found, the Core Noun is empty.
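A minimal sketch of this heuristic is given below, assuming the chunker (e.g., LT_CHUNK) has already produced noun groups as lists of (word, POS-tag) pairs; the representation and the example grouping are assumptions for illustration.

    # The Abstract Noun Lexicon (example entries from the slide above).
    ABSTRACT_NOUNS = {"breed", "type", "name"}

    def core_noun(noun_groups):
        # Step 1/2: take the last noun of the first noun group whose last noun
        # is not an abstract noun; otherwise move on to the next noun group.
        for group in noun_groups:
            nouns = [word for word, tag in group if tag.startswith("NN")]
            if not nouns:
                continue
            if nouns[-1].lower() in ABSTRACT_NOUNS:
                continue
            return nouns[-1].lower()
        # Step 3: no suitable noun found -> the Core Noun is empty.
        return None

    # "What type of bee drills holes in wood?" (illustrative chunking)
    groups = [[("What", "WDT"), ("type", "NN")], [("bee", "NN")]]
    print(core_noun(groups))  # bee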
28 Multilevel Bi-sentence Selecting Module
- Bi-sentence
- two consecutive sentences without repetition
- S1_S2, S3_S4
- Keyword
- a word in the question but not in our stop-word list
- Phrase
- a sequence of keywords, or a single keyword, in the question
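The two building blocks can be made concrete with a short sketch; the stop-word list here is a tiny illustrative one, not the list actually used.

    import re

    STOP_WORDS = {"the", "a", "an", "of", "is", "was", "did", "who", "what", "which", "when", "in", "to"}

    def bi_sentences(sentences):
        # Two consecutive sentences without repetition: S1_S2, S3_S4, ...
        return [" ".join(sentences[i:i + 2]) for i in range(0, len(sentences), 2)]

    def keywords(question):
        # A keyword is any question word that is not a stop word.
        return [w for w in re.findall(r"\w+", question.lower()) if w not in STOP_WORDS]

    print(bi_sentences(["S1.", "S2.", "S3.", "S4."]))  # ['S1. S2.', 'S3. S4.']
    print(keywords("When did the Titanic sink?"))      # ['titanic', 'sink']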
29 Assumptions
- 1) Bi-sentences that match a phrase of more than one keyword are more relevant than those that only match separate keywords.
- e.g., "Snow White" vs. snow, white
- 2) Bi-sentences that match a phrase in its original form are more relevant than those that only match in stemmed form.
- e.g., "Happy Days" (book name) vs. happy days
30 Four-level method (list, factoid)
- All relevant Bi-sentences are ranked
- a Bi-sentence selected at a higher level has higher priority
- within the same level, the Bi-sentence with the larger weight has higher priority
- the first level is based on raw matching
- the other three levels are based on stemmed matching
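The ranking rule can be written compactly: level first, then weight within a level. The weight function is not specified on the slide, so the weights below are arbitrary placeholders.

    def rank(bi_sentences, level_of, weight_of):
        # Lower level number = higher level = higher priority; within a level, larger weight first.
        return sorted(bi_sentences, key=lambda b: (level_of[b], -weight_of[b]))

    candidates = ["b1", "b2", "b3"]
    levels = {"b1": 2, "b2": 1, "b3": 2}    # b2 matched in raw (level-1) form
    weights = {"b1": 5, "b2": 2, "b3": 3}   # placeholder weights
    print(rank(candidates, levels, weights))  # ['b2', 'b1', 'b3']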
31 Two-level method (definition)
- Makes use of the definition patterns proposed by InsightSoft at the first level
- 1. <A is/are a/an/the X>
- 2. <A comma a/an/the X comma/period>
- 3. <A comma or X comma>
- 4. <A comma also called X comma>
- 5. <X comma dash A dash A dash X dash>
- 6. <X parenthesis A parenthesis>
- Our own indicative words at the second level
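As an illustration, the first two patterns can be approximated with regular expressions; these regexes are our rough rendering, not the exact InsightSoft patterns or the indicative words used in the system.

    import re

    def definition_patterns(target):
        # A = the question target, X = the captured definition text.
        a = re.escape(target)
        return [
            re.compile(rf"{a}\s+(?:is|are)\s+(?:a|an|the)\s+([^.]+)", re.I),  # <A is/are a/an/the X>
            re.compile(rf"{a},\s+(?:a|an|the)\s+([^,.]+)[,.]", re.I),         # <A comma a/an/the X comma/period>
        ]

    text = "The vagus nerve is a cranial nerve that controls the heart."
    for pattern in definition_patterns("the vagus nerve"):
        match = pattern.search(text)
        if match:
            print(match.group(1))  # cranial nerve that controls the heart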
32 Entity Recognizing Module
- GATE
- PERSON, LOCATION, COUNTRY
- Our own strategies
- YEAR, COLOR, DISEASE
- construct the possible phrase based on the Core Noun
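The rule-based side could look roughly like the sketch below; these exact patterns and word lists are assumptions, not the strategies actually implemented.

    import re

    def find_years(text):
        # YEAR strategy: four-digit years (assumed range 1000-2099).
        return re.findall(r"\b(?:1[0-9]{3}|20[0-9]{2})\b", text)

    COLORS = {"red", "green", "blue", "white", "black", "yellow"}

    def find_colors(text):
        # COLOR strategy: membership in a small color word list.
        return [w for w in re.findall(r"\w+", text.lower()) if w in COLORS]

    print(find_years("The Titanic sank in 1912."))       # ['1912']
    print(find_colors("Snow White wore a blue dress."))  # ['white', 'blue']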
33 Answer Selecting Module
- when more than one suitable Named Entity is found
- assume that the right answer is more likely to appear several times
- Voting vs. the First Answer
- an improvement of 15.58% (TREC 2002 QA corpus)
- Voting vs. Weighted Voting
- the results are similar (TREC 2002 QA corpus)
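The difference between the two strategies is easy to see in a small sketch; the candidate list is fabricated for illustration.

    from collections import Counter

    def first_answer(candidates):
        # Take whatever entity was found first.
        return candidates[0] if candidates else "NIL"

    def vote(candidates):
        # Take the entity that occurs most often among the candidates.
        return Counter(candidates).most_common(1)[0][0] if candidates else "NIL"

    candidates = ["1985", "1912", "1912", "1912"]
    print(first_answer(candidates))  # 1985
    print(vote(candidates))          # 1912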
34
- list questions
- choose the entities whose frequency in the voting is above a threshold
- the threshold varies with the required NE type
- the threshold is obtained by training on the TREC 2002 QA corpus
- definition questions
- choose the first passage as the answer
35 Result
36 Outline
- What is QA
- TREC 2003 QA Task
- Related Work
- Our Approach
- Error Analysis
- Comparative Analysis
- Future Work
- Summary
37 Question Analyzing Errors: 30/500
- 1) 16.7% (5/30) of the errors are caused by chunking errors from LT_CHUNK.
- 2) 83.3% (25/30) of the errors occur because our Question Analyzing algorithm cannot cover all questions.
38 Retrieval Errors (2 modules)
- we only focus our analysis on questions whose answer is not NIL
- the maximum correct rate of document retrieval with the top 50 documents is 71.80% (275/383)
- the maximum correct rate of passage selecting by our Multilevel Passage Selecting Module is 48.0% (132/275)
39 Loss Distribution
40 Answer Errors
- inexact answer errors
- inexact identification of the required NE type
- inexact recognition of the NE
- unsupported answer errors
- cannot be avoided
- Error distribution
41 Conclusion (1)
- Specific retrieval techniques should be improved.
- 65.54% of the errors result from the retrieval process, including document retrieval and passage selecting, while only 22.45% of the errors result from question analyzing, NE recognizing and NE selecting.
42 Outline
- What is QA
- TREC 2003 QA Task
- Related Work
- Our Approach
- Error Analysis
- Comparative Analysis
- Future Work
43 Reasons for the poor result
- Loss in Document Retrieval
- accuracy 71.80%
- Loss in Sentence Retrieval
- accuracy 48.0%
- Accumulated Loss
- accuracy 34.46%
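The accumulated figure follows directly from the two stage accuracies: 0.7180 × 0.480 ≈ 0.3446, i.e., roughly 34.46% (132 of the 383 non-NIL questions) still have the answer present after both stages.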
44 Document Retrieval Methods
- Motivation
- Algorithm vs. Query
- Algorithms
- PRISE
- Basic SMART
- Enhanced-SMART (ICT)
- Enhanced-SMART with pseudo-relevance feedback
45 Comparison
46 What we learned
- PRISE is good
- Our Enhanced-SMART really does take effect
- The performance is still not satisfactory
- The key point is the query
- pseudo-relevance feedback (query expansion) does not take effect
- semantic information is important in IR
- Query reformulation is necessary!
47 Sentence-level Retrieval Methods
- Motivation
- finding relevant sentences within a document is difficult using VSM (Allan, 2003)
- Algorithms
- Keyword-match retrieval
- number of keywords matched
- TFIDF-based retrieval
- similarity between question and sentence
- Multilevel retrieval
- Enhanced-SMART-based retrieval
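A minimal version of the TFIDF-based option is sketched below; the tokenization and weighting details are assumptions rather than the exact settings used in the comparison.

    import math
    import re
    from collections import Counter

    def tokenize(text):
        return re.findall(r"\w+", text.lower())

    def rank_sentences(question, sentences):
        # Score each sentence by the cosine similarity of its TF-IDF vector with the question's.
        docs = [tokenize(s) for s in sentences] + [tokenize(question)]
        n = len(docs)
        df = Counter(w for d in docs for w in set(d))

        def vec(tokens):
            tf = Counter(tokens)
            return {w: tf[w] * math.log(n / df[w]) for w in tf}

        def cosine(u, v):
            dot = sum(u[w] * v.get(w, 0.0) for w in u)
            nu = math.sqrt(sum(x * x for x in u.values()))
            nv = math.sqrt(sum(x * x for x in v.values()))
            return dot / (nu * nv) if nu and nv else 0.0

        q_vec = vec(docs[-1])
        scored = [(s, cosine(q_vec, vec(d))) for s, d in zip(sentences, docs)]
        return [s for s, _ in sorted(scored, key=lambda p: p[1], reverse=True)]

    print(rank_sentences("When did the Titanic sink?",
                         ["The Titanic sank in 1912.", "The weather was calm."]))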
48 Comparison
49 What we learned
- Our approach (Multilevel) is not effective
- the techniques for document retrieval are also effective in sentence-level retrieval
50 Retrieval Granularity
- Bi-sentence
- S1_S2, S3_S4
- Overlapping Bi-sentence
- S1_S2, S2_S3, S3_S4
- Single sentence
- S1, S2, S3, S4
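For clarity, the three granularities can be generated as follows (a small sketch over a toy sentence list):

    def bi_sentences(s):
        # Non-overlapping pairs: S1_S2, S3_S4
        return [" ".join(s[i:i + 2]) for i in range(0, len(s), 2)]

    def overlapping_bi_sentences(s):
        # Overlapping pairs: S1_S2, S2_S3, S3_S4
        return [" ".join(s[i:i + 2]) for i in range(len(s) - 1)]

    def single_sentences(s):
        # Individual sentences: S1, S2, S3, S4
        return list(s)

    s = ["S1.", "S2.", "S3.", "S4."]
    print(bi_sentences(s))              # ['S1. S2.', 'S3. S4.']
    print(overlapping_bi_sentences(s))  # ['S1. S2.', 'S2. S3.', 'S3. S4.']
    print(single_sentences(s))          # ['S1.', 'S2.', 'S3.', 'S4.']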
51 Comparison
52 What we learned
- Our granularity is the worst
- The other two are similar
- We could improve sentence-level retrieval by 45.45%
- from 132 to 192 correct passages
- Without information reformulation, the single sentence is the best granularity.
53 Future Work
- Directly retrieve sentences from the corpus
- to eliminate the Accumulated Loss
- Query reformulation
- especially for the required NE type
54 Conclusion
- IR is the current bottleneck in our QA system
- A question is not a good query; it needs to be reformulated
- Using E-SMART to retrieve single sentences is much more effective than our approach.
55 Thank you for your attention