Title: SEARCHING QUESTION AND ANSWER ARCHIVES Dr. Jiwoon Jeon
- Presented by
- CHARANYA VENKATESH KUMAR
2Discussion
- Current Information Retrieval systems?
3OVERVIEW
- Introduction
- QA Retrieval
- Test Collections
- Translation Based QA retrieval framework
- Learning word-to-word translations
4INTRODUCTION
- QA Retrieval problem
- Challenges
- Semantically similar questions
- Problem: Word mismatch problem
- Solution: Machine translation-based information retrieval model
- Quality of the Answers
- Problem: Many answers to a given question
- Solution: Answer Quality Prediction Technique
5What is New?
- New Type of Information System
- New Translation-based Retrieval Model
- New Document Quality Estimation Method
- Integration of Advances in Multiple Research Areas
- New Paraphrase Generation Method
- Utilizing Web as a Resource for Retrieval
7Q&A RETRIEVAL
- Question Answer Archives
- Websites with FAQ
- Community based question answering services
- Task Definition
8Q&A Retrieval (Contd..)
9Q&A Retrieval (Contd..)
- Advantages
- Handle natural language questions
- Return answers instead of relevant documents
- Disadvantages
- Can answer only previously answered questions
10Q&A RETRIEVAL SYSTEM ARCHITECTURE
11CHALLENGES
- Finding relevant Question Answer Pairs
- Importance of question parts
- Word mismatch problem
- Estimating Answer Quality
- Importance
13TEST COLLECTIONS
- Components
- Set of documents
- Set of information needs (queries)
- Set of relevance judgments
- Pooling Method
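A minimal sketch of the pooling method, assuming each retrieval system supplies a ranked list of document IDs: the pool is the union of the top results from every system, and only pooled documents are judged for relevance. The function name `build_pool`, the `depth` parameter, and the sample runs are hypothetical and not taken from the slides.

```python
def build_pool(ranked_runs, depth):
    """Return the union of the top-`depth` documents from each ranked run.

    Order of first appearance is preserved so the pool is deterministic.
    """
    pool = []
    seen = set()
    for run in ranked_runs:
        for doc_id in run[:depth]:
            if doc_id not in seen:
                seen.add(doc_id)
                pool.append(doc_id)
    return pool


# Three hypothetical systems, each returning a ranked list for one query.
runs = [
    ["d3", "d1", "d7", "d2"],
    ["d1", "d5", "d3", "d9"],
    ["d8", "d1", "d4", "d3"],
]
pool = build_pool(runs, depth=2)
print(pool)  # ['d3', 'd1', 'd5', 'd8']
```

Documents outside the pool are typically treated as non-relevant, which keeps the judging effort manageable for large collections.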
14WONDIR COLLECTION
- Earliest community based QA service in the US.
- 1 million question and answer pairs used from this service
- Average question length: 27 words
- Average answer length: 28 words
15Examples
16Queries
- Closed-class questions that ask for short, fact-based answers.
- E.g., Where is Charlotte located?
- Relevance Judgment
- 220 relevant QA pairs for 50 queries using the pooling method.
- Relevance Judgment Criteria
17WebFAQ COLLECTION by Jijkoun and de Rijke
- Collection of FAQs gathered using web crawlers; made public for research purposes.
- Found web pages that contain the word FAQ.
- Used heuristic methods to automatically extract question and answer pairs from the web pages.
18NAVER COLLECTION
- Leading portal site in South Korea
- Community-based answering service
- Collection A
- Category information: to test category-specific translations
- Collection B
- Non-textual information: to build the answer quality prediction technique
19Naver Collection (Contd..)
- Question: Title and Body
- Naver Test Collection A
- Naver Test Collection B
- Relevance
- Question semantically related to query and
- Question contains all query terms
- QA pair was clicked multiple times for the
query.
20Comparison of test Collections
22Translation Based QA Retrieval framework
- Use of machine translation techniques for information retrieval
- Word mismatch problem
- Translation based approach
23IBM Statistical Machine translation Models
- Do not require any linguistic knowledge of the source or target language.
- Exploit only co-occurrence statistics of terms in training data.
24IBM Models
- Model 1
- Treats every possible word alignment equally
- Model 2
- Assumes only positions of terms are related to the word alignment
- Model 3
- The first term and the second term generated from the same term are independent
25IBM Models (Contd..)
- Model 4
- First order alignment model
- Every word is dependent only on the previous aligned word.
- Model 5
- Reformulation of Model 4
26Advantages of Model 1
- Efficient implementation is possible using a form of query expansion.
- Performance gain of using low level translation models is high.
- Can be easily integrated into the query likelihood language model.
27IBM Model 1 Equation
- The probability that a query Q of length m is the
translation of a document D (of length n) is
given as
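The equation itself is not preserved in this text. As a hedged reconstruction, the standard form of IBM Model 1, written with the slide's symbols (query Q of length m, document D of length n), is shown below. The constant \epsilon, the null word d_0, and the notation P(q_i | d_j) for the word-to-word translation probability are assumptions and may differ from the slide's own notation.

```latex
P(Q \mid D) = \frac{\epsilon}{(n+1)^{m}} \prod_{i=1}^{m} \sum_{j=0}^{n} P(q_i \mid d_j)
```

Here q_i is the i-th query word and d_j the j-th document word. Because Model 1 treats every alignment as equally likely, the sum over alignments factorizes into a product of per-word sums.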
28IBM Model 1 Equation
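As a rough sketch of how the Model 1 probability could be computed, the function below scores a query against a document given a table of word-to-word translation probabilities. The table layout (keys are `(target, source)` pairs), the `<null>` token, and the default `epsilon` are illustrative assumptions, not details from the slides.

```python
import math


def model1_log_prob(query, doc, trans, epsilon=1.0, null="<null>"):
    """Return log P(Q|D) under IBM Model 1.

    trans maps (target_word, source_word) -> translation probability.
    All alignments are treated as equally likely, so each query word
    sums its translation probability over every document word plus null.
    """
    sources = [null] + list(doc)
    log_p = math.log(epsilon) - len(query) * math.log(len(sources))
    for q in query:
        total = sum(trans.get((q, d), 0.0) for d in sources)
        if total == 0.0:
            return float("-inf")  # query word cannot be generated at all
        log_p += math.log(total)
    return log_p


# Hypothetical translation table for illustration only.
trans = {("car", "auto"): 0.5, ("car", "<null>"): 0.1}
score = model1_log_prob(["car"], ["auto"], trans)
```

Working in log space avoids numerical underflow when queries or documents are long.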
29Translation based Language Models
- A language model is a mechanism for generating text.
- Unigram language model
- Assumes each word is generated independently
- Concerns only probabilities of sampling a single word.
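A minimal sketch of a unigram language model estimated by maximum likelihood, assuming whitespace-tokenized text. Because each word is generated independently, the probability of a sequence is the product of the single-word probabilities. The function names and the sample sentence are hypothetical.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum likelihood unigram model: P(w) = count(w) / total tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def sequence_probability(model, tokens):
    """Probability of a token sequence under the independence assumption."""
    p = 1.0
    for w in tokens:
        p *= model.get(w, 0.0)  # unseen words get zero probability
    return p


doc = "the cat sat on the mat".split()
model = unigram_model(doc)
p_seen = sequence_probability(model, ["the", "cat"])
p_unseen = sequence_probability(model, ["the", "dog"])
```

In this sketch `p_unseen` is zero because "dog" never occurs in the document, which is the problem that smoothing addresses.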
30Language modeling approach to IR
- With the maximum likelihood estimator, unseen words in a document have zero probability.
- Smoothing
- Transfers some probability mass from the seen words to the unseen words.
- Dirichlet smoothing: good performance and cheap computational cost.
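A small sketch of Dirichlet smoothing, assuming the collection statistics are available as a word-count dictionary. The parameter name `mu` and its default value are conventional choices rather than values given in the slides.

```python
def dirichlet_prob(word, doc_tokens, coll_counts, coll_len, mu=2000.0):
    """Dirichlet-smoothed P(word | document).

    Blends the document count with mu pseudo-counts distributed
    according to the collection language model.
    """
    tf = doc_tokens.count(word)
    p_coll = coll_counts.get(word, 0) / coll_len
    return (tf + mu * p_coll) / (len(doc_tokens) + mu)


# Toy collection statistics for illustration only.
coll_counts = {"a": 2, "b": 1, "c": 1}
coll_len = 4
doc = ["a", "b"]
p_unseen = dirichlet_prob("c", doc, coll_counts, coll_len, mu=2.0)
```

The word "c" does not occur in the document, yet it receives a non-zero probability from the collection model, and the probabilities over the vocabulary still sum to one.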
31Language modeling approach to IR (Contd..)
- The ranking function for the query likelihood
language model with Dirichlet smoothing can be
written as
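The slide's equation is not preserved in this text. The standard form of the query likelihood ranking function with Dirichlet smoothing is given below as a hedged reconstruction. The symbols \mu (smoothing parameter), |D| (document length), P_{ml} (maximum likelihood estimate), and C (the collection) are assumed notation.

```latex
P(Q \mid D) = \prod_{w \in Q} \left[ \frac{|D|}{|D| + \mu} \, P_{ml}(w \mid D) + \frac{\mu}{|D| + \mu} \, P_{ml}(w \mid C) \right]
```

The first term is the document's own evidence for w, and the second is the probability mass borrowed from the collection. Longer documents rely less on the collection model.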
32IBM Model 1 vs. Query Likelihood
- Comparable components in the two models
33Self Translation Model
- Every word has some probability of translating to itself.
- This probability cannot be 1.
- If it is too low, retrieval performance deteriorates.
34TransLM
- Final ranking Function looks like
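The ranking function on the original slide is not preserved here. One plausible form, assuming TransLM replaces the maximum likelihood document model in the Dirichlet-smoothed query likelihood with a translation-weighted estimate, is sketched below. The notation T(w | t) for the probability of translating document word t into query word w, and the symbols \mu, |D|, P_{ml}, and C, are assumptions rather than the slide's own notation.

```latex
P(Q \mid D) = \prod_{w \in Q} \left[ \frac{|D|}{|D| + \mu} \sum_{t \in D} T(w \mid t) \, P_{ml}(t \mid D) + \frac{\mu}{|D| + \mu} \, P_{ml}(w \mid C) \right]
```

If T(w | w) = 1 and all other translation probabilities are zero, this reduces to plain query likelihood, which is why the self-translation probability controls how much word-mismatch bridging the model performs.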
35Efficiency Issues and Implementation of TransLM
- Flipped Translation Tables
36Term-at-a-time Algorithm
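The following is a hedged sketch of how flipped translation tables and term-at-a-time scoring might fit together. It is not the implementation from the slides. It assumes a translation table stored per source word, an inverted index mapping each term to document term frequencies, and a Dirichlet-smoothed translation-based score. All names (`flip_table`, `score_term_at_a_time`, `coll_prob`, `mu`) are hypothetical.

```python
import math
from collections import defaultdict


def flip_table(trans):
    """Flip trans[t][w] = T(w|t) into flipped[w] = [(t, T(w|t)), ...].

    The flipped table lets us look up, for a query word w, every source
    word t that can translate into it.
    """
    flipped = defaultdict(list)
    for t, targets in trans.items():
        for w, p in targets.items():
            flipped[w].append((t, p))
    return flipped


def score_term_at_a_time(query, index, doc_len, flipped, coll_prob, mu=2000.0):
    """Return {doc_id: log P(Q|D)}, processing one query term at a time.

    index[t] = {doc_id: term frequency of t in that document}.
    """
    scores = {d: 0.0 for d in doc_len}
    for w in query:
        # Accumulate sum_t T(w|t) * tf(t, D) via the postings of each source t.
        acc = defaultdict(float)
        for t, p in flipped.get(w, []):
            for d, tf in index.get(t, {}).items():
                acc[d] += p * tf
        for d, n in doc_len.items():
            p_w = (acc.get(d, 0.0) + mu * coll_prob.get(w, 0.0)) / (n + mu)
            scores[d] += math.log(p_w) if p_w > 0 else float("-inf")
    return scores


# Toy data for illustration only.
trans = {"car": {"car": 0.6, "auto": 0.4}, "auto": {"auto": 0.7, "car": 0.3}}
index = {"car": {"d1": 1}, "auto": {"d2": 2}}
doc_len = {"d1": 1, "d2": 2}
coll_prob = {"car": 0.5, "auto": 0.5}
scores = score_term_at_a_time(["auto"], index, doc_len, trans and flip_table(trans), coll_prob, mu=1.0)
```

In this toy example document d1 contains only "car" but still receives credit for the query word "auto" through the translation probability, which illustrates how the approach bridges word mismatch.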
38Properties of Word Relationships
- Not symmetric
- Not fixed
- Change depending on retrieval or translation tasks.
- Must be given as probability values.
39Training Sample Generation
- Key Idea
- If two answers are very similar, then the corresponding questions are semantically similar.
- Similarity Measures
- Cosine Similarity
- Query Likelihood scores between two answers (LM-SCORE)
- LM-HRANK
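The key idea can be sketched with the simplest of the listed measures, cosine similarity. The code below pairs up questions whose answers are similar. It assumes raw term-frequency vectors, whitespace tokenization, and a hypothetical similarity `threshold`. None of these details are specified in the slides.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a_tokens, b_tokens):
    """Cosine similarity between two term-frequency vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_question_pairs(qa_pairs, threshold):
    """Return (question, question) pairs whose answers are similar enough."""
    pairs = []
    for (q1, a1), (q2, a2) in combinations(qa_pairs, 2):
        if cosine(a1.split(), a2.split()) >= threshold:
            pairs.append((q1, q2))
    return pairs


# Toy archive for illustration only.
qa = [
    ("how do i reset my password", "click forgot password on the login page"),
    ("i lost my password what now", "click forgot password on the login page link"),
    ("where is charlotte", "charlotte is in north carolina"),
]
training_pairs = similar_question_pairs(qa, threshold=0.8)
```

Comparing every pair of answers is quadratic in the archive size, so a collection with a million QA pairs would need a retrieval step to find candidate answers first. This sketch ignores that concern.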
40Word Relationship Types
- P(Q|A)
- Source: Answer, Target: Question
- P(A|Q)
- Source: Question, Target: Answer
- P(Q|Q)
- P(Q<->Q)
41EM Algorithm
- Find word relationships that maximize the
likelihood of sampling the target text from the
source text in training samples.
42EM Algorithm (Contd..)
- The translation probability from a source word t
to a target word w is given as
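The equation is not preserved in this text. A hedged reconstruction, based on the standard EM update for IBM Model 1 and written with the slide's symbols (source word t, target word w), is given below. The notation T(w | t), the normalization factor \lambda_t, the training pairs J^{1}, ..., J^{N}, and the count function \# are assumptions.

```latex
T(w \mid t) = \lambda_t^{-1} \sum_{i=1}^{N} c(w \mid t; J^{i}),
\qquad
c(w \mid t; J^{i}) = \frac{T(w \mid t)}{\sum_{k=1}^{n} T(w \mid t_k)} \; \#(w, J^{i}) \; \#(t, J^{i})
```

Here t_1, ..., t_n are the source words of training pair J^{i}, \#(w, J^{i}) is the number of times w occurs on the target side, \#(t, J^{i}) is the number of times t occurs on the source side, and \lambda_t makes the probabilities T(\cdot \mid t) sum to one.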
43EM Algorithm (Contd..)
- The translation probability from a source word t
to a target word w is given as
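A compact sketch of EM training for Model 1 style word-to-word translation probabilities, assuming the training samples are pairs of tokenized source and target texts. It omits the null word and any self-translation adjustment, uses a fixed number of iterations, and expects at least one non-empty training pair. The function name `train_model1` is hypothetical.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate T[t][w] = T(w|t) from (source_tokens, target_tokens) pairs."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform distribution over the target vocabulary.
    T = defaultdict(lambda: defaultdict(lambda: uniform))
    for _ in range(iterations):
        # E-step: collect expected counts of t generating w.
        counts = defaultdict(lambda: defaultdict(float))
        for src, tgt in pairs:
            for w in tgt:
                z = sum(T[t][w] for t in src)
                for t in src:
                    counts[t][w] += T[t][w] / z
        # M-step: renormalize the counts for each source word.
        T = defaultdict(lambda: defaultdict(float))
        for t, row in counts.items():
            total = sum(row.values())
            for w, c in row.items():
                T[t][w] = c / total
    return T
```

For the word relationship types in this deck, `pairs` would hold, for example, answers as source and questions as target, or pairs of questions judged semantically similar.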
44Examples
45Examples (Contd..)
46SUMMARY
- Introduction
- QA Retrieval
- Test Collections
- Translation Based QA retrieval framework
- Learning word-to-word translations
47Coming Up Next
- Estimating Answer Quality
- Experiments