SEARCHING QUESTION AND ANSWER ARCHIVES Dr. Jiwoon Jeon

Slides: 48

Provided by: uncc152

Transcript and Presenter's Notes

1
SEARCHING QUESTION AND ANSWER ARCHIVES
Dr. Jiwoon Jeon

Presented by
CHARANYA VENKATESH KUMAR

2
Discussion

Current Information Retrieval systems?

3
OVERVIEW

Introduction
QA Retrieval
Test Collections
Translation Based QA retrieval framework
Learning word-to-word translations

4
INTRODUCTION

QA Retrieval problem
Challenges
Semantically similar questions
Problem: Word mismatch
Solution: Machine translation-based information retrieval model
Quality of the Answers
Problem: Many answers to a given question
Solution: Answer quality prediction technique

5
What is New?

New Type of Information System
New Translation-based Retrieval Model
New Document Quality Estimation Method
Integration of Advances in Multiple Research Areas
New Paraphrase Generation Method
Utilizing Web as a Resource for Retrieval

6
OVERVIEW

Introduction
QA Retrieval
Test Collections
Translation Based QA retrieval framework
Learning word-to-word translations

7
Q&A RETRIEVAL

Question Answer Archives
Websites with FAQs
Community-based question answering services
Task Definition

8
Q&A Retrieval (Contd.)
9
Q&A Retrieval (Contd.)

Advantages
Handle natural language questions
Return answers instead of relevant documents
Disadvantages
Can answer only previously answered questions

10
Q&A RETRIEVAL SYSTEM ARCHITECTURE
11
CHALLENGES

Finding relevant Question Answer Pairs
Importance of question parts
Word mismatch problem
Estimating Answer Quality
Importance

12
OVERVIEW

Introduction
QA Retrieval
Test Collections
Translation Based QA retrieval framework
Learning word-to-word translations

13
TEST COLLECTIONS

Components
Set of documents
Set of information needs (queries)
Set of relevance judgments
Pooling Method
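
The pooling method can be sketched as follows: the top-ranked results from several retrieval systems are merged into one pool per query, and only the pooled items are judged for relevance. This is a minimal Python sketch, not the procedure used for these collections; the function name `build_pool`, the run format, and the pool depth `k` are illustrative assumptions.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], k: int = 10) -> Set[str]:
    """Return the union of the top-k item ids from each system's ranked list."""
    pool: Set[str] = set()
    for ranked_ids in runs.values():
        pool.update(ranked_ids[:k])
    return pool


# Hypothetical ranked lists from three systems for a single query.
runs = {
    "system_a": ["qa7", "qa2", "qa9"],
    "system_b": ["qa2", "qa4", "qa7"],
    "system_c": ["qa1", "qa2", "qa5"],
}
pool = build_pool(runs, k=2)  # only these items would be sent to assessors
```

Items outside the pool are treated as unjudged, which keeps the judging effort manageable for large archives.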

14
WONDIR COLLECTION

Earliest community-based QA service in the US.
1 million question and answer pairs from this service were used.
Average question length: 27 words
Average answer length: 28 words

15
Examples
16
Queries

Closed-class questions that ask for short, fact-based answers.
E.g., Where is Charlotte located?
Relevance Judgment
220 relevant QA pairs for 50 queries, found using the pooling method.
Relevance Judgment Criteria

17
WebFAQ COLLECTION by Jijkoun and de Rijke

Collection of FAQs gathered by web crawlers and made public for research purposes.
Found web pages that contain the word FAQ.
Used heuristic methods to automatically extract question and answer pairs from the web pages.

18
NAVER COLLECTION

Leading portal site in South Korea
Community-based answering service
Collection A
Category information: used to test category-specific translations
Collection B
Non-textual information: used to build the answer quality prediction technique

19
Naver Collection (Contd..)

Question: Title and Body
Naver Test Collection A
Naver Test Collection B
Relevance
Question semantically related to query and
Question contains all query terms
QA pair was clicked multiple times for the
query.

20
Comparison of test Collections
21
OVERVIEW

Introduction
QA Retrieval
Test Collections
Translation Based QA retrieval framework
Learning word-to-word translations

22
Translation Based QA Retrieval framework

Use of machine translation techniques for information retrieval
Word mismatch problem
Translation based approach

23
IBM Statistical Machine Translation Models

Do not require any linguistic knowledge of the source or target language.
Exploit only co-occurrence statistics of terms in the training data.
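
To illustrate how translation probabilities can be learned from co-occurrence alone, here is a minimal sketch of EM training in the style of IBM Model 1. It omits the null word and uses a toy monolingual corpus; the function name `train_model1`, the data, and the iteration count are assumptions for illustration, not the presenter's setup.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (source words, target words)


def train_model1(pairs: List[Pair], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate t[(target, source)] = P(target | source) with EM."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        for src, tgt in pairs:
            for w in tgt:
                # E-step: split each target word's mass over the source words.
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalize expected counts per source word.
        t = defaultdict(float, {(w, s): c / total[s] for (w, s), c in count.items()})
    return t


# Toy parallel corpus (hypothetical): each pair is (source text, target text).
pairs = [
    (["cheap", "hotel"], ["budget", "inn"]),
    (["cheap", "flight"], ["budget", "airfare"]),
    (["direct", "flight"], ["nonstop", "airfare"]),
]
t = train_model1(pairs)
```

No dictionary or grammar is involved: "budget" ends up as the most likely translation of "cheap" only because the two words keep appearing in paired texts.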

24
IBM Models

Model 1
Treats every possible word alignment equally
Model 2
Assumes only positions of terms are related to
the word alignment
Model 3
The first term and the second term generated from
the same term are independent

25
IBM Models (Contd..)

Model 4
First order alignment model
Every word is dependent only on the previous
aligned word.
Model 5
Reformulation of Model 4

26
Advantages of Model 1

Efficient implementation is possible using a form of query expansion.
Performance gain from using low-level translation models is high.
Can be easily integrated into the query likelihood language model.

27
IBM Model 1 Equation

The probability that a query Q of length m is the
translation of a document D (of length n) is
given as
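
The slide's equation is not reproduced in this transcript. For reference, the standard IBM Model 1 form from the statistical machine translation literature, written with the slide's Q, D, m and n, is sketched below. The symbols are assumed notation rather than the slide's own: \epsilon is a normalization constant, t(q_i | d_j) is a word-to-word translation probability, and d_0 is the null word.

```latex
P(Q \mid D) = \frac{\epsilon}{(n+1)^{m}} \prod_{i=1}^{m} \sum_{j=0}^{n} t(q_i \mid d_j)
```

Every query word q_i may be generated by any of the n document words or by the null word, and all alignments are weighted equally.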

28
IBM Model 1 Equation
29
Translation based Language Models

A language model is a mechanism for generating text.
Unigram language model
Assumes each word is generated independently
Concerns only the probability of sampling a single word.

30
Language modeling approach to IR

With a maximum likelihood estimator, words unseen in a document have zero probability.
Smoothing
Transfers some probability mass from the seen words to the unseen words.
Dirichlet smoothing: good performance and cheap computational cost.

31
Language modeling approach to IR (Contd..)

The ranking function for the query likelihood
language model with Dirichlet smoothing can be
written as
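
The slide's equation is not reproduced in this transcript. The usual textbook form of query likelihood with Dirichlet smoothing is sketched below; the notation is an assumption: \mu is the Dirichlet smoothing parameter, |D| the document length, P_{ml}(w \mid D) the maximum likelihood estimate from the document, and P_{ml}(w \mid C) the estimate from the whole collection C.

```latex
P(Q \mid D) = \prod_{w \in Q} P(w \mid D), \qquad
P(w \mid D) = \frac{|D|}{|D| + \mu} \, P_{ml}(w \mid D) + \frac{\mu}{|D| + \mu} \, P_{ml}(w \mid C)
```

Short documents lean more heavily on the collection estimate, so a query word missing from a document no longer forces the whole product to zero.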

32
IBM Model 1 vs. Query Likelihood

Comparable components in the two models

33
Self Translation Model

Every word has some probability of translating to itself.
This probability cannot be 1.
If it is too low, retrieval performance deteriorates.

34
TransLM

Final ranking Function looks like
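
The ranking function itself is not reproduced in this transcript. As a rough sketch of the general idea, the code below scores a document by mixing the document's own maximum likelihood estimate with a translation-based estimate and then applying Dirichlet smoothing against the collection. The mixing weight `beta`, the smoothing parameter `mu`, the table layout, and the function name `translm_log_score` are assumptions for illustration, not the presenter's exact formulation.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def translm_log_score(
    query: List[str],
    doc: List[str],
    collection_prob: Dict[str, float],
    trans: Dict[Tuple[str, str], float],
    beta: float = 0.5,
    mu: float = 10.0,
) -> float:
    """Log-probability of the query given a non-empty document.

    trans[(w, t)] is an assumed table of P(query word w | document word t).
    """
    tf = Counter(doc)
    n = len(doc)
    score = 0.0
    for w in query:
        p_ml = tf[w] / n
        # Translation estimate: sum over document words t of P(w | t) * P_ml(t | D).
        p_tr = sum(trans.get((w, t), 0.0) * c / n for t, c in tf.items())
        p_mix = (1 - beta) * p_ml + beta * p_tr
        # Dirichlet smoothing with the collection model; 1e-9 is an arbitrary floor.
        p = (n / (n + mu)) * p_mix + (mu / (n + mu)) * collection_prob.get(w, 1e-9)
        score += math.log(p)
    return score


# Hypothetical translation table and collection model.
trans = {("budget", "cheap"): 0.4}
collection_prob = {"budget": 0.01}
related = translm_log_score(["budget"], ["cheap", "hotel"], collection_prob, trans)
unrelated = translm_log_score(["budget"], ["weather", "forecast"], collection_prob, trans)
```

Neither document contains the query word "budget", yet the document containing "cheap" scores higher, which is how a translation component addresses the word mismatch problem.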

35
Efficiency Issues and Implementation of TransLM

Flipped Translation Tables
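
One plausible reading of a flipped translation table is a table re-indexed by the target (query) word, so that at query time each query term leads directly to the source words that can translate into it. The sketch below follows that reading; the nested-dictionary layout and the name `flip_table` are assumptions, not taken from the slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def flip_table(
    table: Dict[str, Dict[str, float]]
) -> Dict[str, List[Tuple[str, float]]]:
    """Turn source -> {target: prob} into target -> [(source, prob), ...]."""
    flipped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for source, targets in table.items():
        for target, prob in targets.items():
            flipped[target].append((source, prob))
    # Most probable source words first, so lookups can be truncated cheaply.
    for entries in flipped.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return dict(flipped)


# Hypothetical table: P(target | source), indexed by source word.
table = {
    "cheap": {"budget": 0.4, "cheap": 0.6},
    "inexpensive": {"budget": 0.5},
}
flipped = flip_table(table)
```

Looking up `flipped["budget"]` then gives the expansion terms for the query word "budget" without scanning the whole table.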

36
Term-at-a-time Algorithm
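
The slide's algorithm is not reproduced in this transcript. As general background, term-at-a-time evaluation processes one query term's posting list completely before moving to the next, adding partial scores into per-document accumulators. The sketch below shows that generic pattern with additive weights; the index layout and the name `term_at_a_time` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Postings = Dict[str, List[Tuple[str, float]]]  # term -> [(doc_id, weight), ...]


def term_at_a_time(query: List[str], index: Postings) -> List[Tuple[str, float]]:
    """Rank documents by summing per-term weights into accumulators."""
    accumulators: Dict[str, float] = defaultdict(float)
    for term in query:
        # Finish this term's whole posting list before touching the next term.
        for doc_id, weight in index.get(term, []):
            accumulators[doc_id] += weight
    return sorted(accumulators.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inverted index with precomputed term weights.
index = {
    "cheap": [("d1", 1.0), ("d2", 0.5)],
    "hotel": [("d2", 1.0)],
}
ranked = term_at_a_time(["cheap", "hotel"], index)
```

In a translation-based setting, the query would first be expanded with translation terms, and each expansion term's posting list would be processed in the same way.
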
37
OVERVIEW

Introduction
QA Retrieval
Test Collections
Translation Based QA retrieval framework
Learning word-to-word translations

38
Properties of Word Relationships

Not symmetric
Not fixed
Change depending on the retrieval or translation task.
Must be given as probability values.

39
Training Sample Generation

Key Idea
If two answers are very similar, then the
corresponding questions are semantically similar.
Similarity Measures
Cosine Similarity
Query likelihood scores between two answers (LM-SCORE)
LM-HRANK
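
The key idea can be sketched with the simplest of the listed measures, cosine similarity: if two answers are similar enough, their questions are emitted as a training pair. This sketch assumes raw term-frequency vectors, whitespace tokenization, and an arbitrary threshold of 0.8; the function names are illustrative, and the LM-based measures are not covered here.

```python
import itertools
import math
from collections import Counter
from typing import List, Tuple


def cosine_similarity(a: List[str], b: List[str]) -> float:
    """Cosine of the angle between two raw term-frequency vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def similar_question_pairs(
    qa_pairs: List[Tuple[str, str]], threshold: float = 0.8
) -> List[Tuple[str, str]]:
    """Pair up questions whose answers are at least `threshold` similar."""
    pairs = []
    for (q1, a1), (q2, a2) in itertools.combinations(qa_pairs, 2):
        if cosine_similarity(a1.split(), a2.split()) >= threshold:
            pairs.append((q1, q2))
    return pairs


# Hypothetical archive entries: (question, answer).
archive = [
    ("How do I reset my router?", "unplug the router wait ten seconds plug it back"),
    ("My wifi box is frozen, what now?", "unplug the router wait ten seconds then plug it back"),
    ("Where is Charlotte located?", "Charlotte is in North Carolina"),
]
training_pairs = similar_question_pairs(archive)
```

The two router questions share almost no words, yet they are paired because their answers nearly coincide, which is the kind of question pair needed to learn word-to-word translations.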

40
Word Relationship Types