SEARCHING QUESTION AND ANSWER ARCHIVES Dr. Jiwoon Jeon

Description:

SEARCHING QUESTION AND ANSWER ARCHIVES Dr. Jiwoon Jeon Presented by CHARANYA VENKATESH KUMAR * Test collection A 1. constructed from collection A. 2.100 pairs of Q&A ... – PowerPoint PPT presentation

Provided by: uncc152

1
SEARCHING QUESTION AND ANSWER ARCHIVES
Dr. Jiwoon Jeon
  • Presented by
  • CHARANYA VENKATESH KUMAR

2
Discussion
  • Current Information Retrieval systems?

3
OVERVIEW
  • Introduction
  • QA Retrieval
  • Test Collections
  • Translation Based QA retrieval framework
  • Learning word-to-word translations

4
INTRODUCTION
  • QA Retrieval problem
  • Challenges
  • Semantically similar questions
  • Problem: Word mismatch
  • Solution: Machine translation-based information
    retrieval model
  • Quality of the Answers
  • Problem: Many answers to a given question
  • Solution: Answer quality prediction technique

5
What is New?
  • New Type of Information System
  • New Translation-based Retrieval Model
  • New Document Quality Estimation Method
  • Integration of Advances in Multiple Research
    Areas
  • New Paraphrase Generation Method
  • Utilizing Web as a Resource for Retrieval

6
OVERVIEW
  • Introduction
  • QA Retrieval
  • Test Collections
  • Translation Based QA retrieval framework
  • Learning word-to-word translations

7
Q&A RETRIEVAL
  • Question Answer Archives
  • Websites with FAQ
  • Community based question answering services
  • Task Definition

8
Q&A Retrieval (Contd..)
9
Q&A Retrieval (Contd..)
  • Advantages
  • Handle natural language questions
  • Return answers instead of relevant documents
  • Disadvantages
  • Can answer only previously answered questions

10
Q&A RETRIEVAL SYSTEM ARCHITECTURE
11
CHALLENGES
  • Finding relevant Question Answer Pairs
  • Importance of question parts
  • Word mismatch problem
  • Estimating Answer Quality
  • Importance

12
OVERVIEW
  • Introduction
  • QA Retrieval
  • Test Collections
  • Translation Based QA retrieval framework
  • Learning word-to-word translations

13
TEST COLLECTIONS
  • Components
  • Set of documents
  • Set of information needs (queries)
  • Set of relevance judgments
  • Pooling Method
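The pooling method named above can be sketched as follows. This is a minimal illustration, not the procedure used to build these collections: the function name `build_pool`, the run names, and the document IDs are all hypothetical. The idea is that only the union of the top-ranked results from several retrieval systems is handed to human assessors, instead of the whole collection.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Return the union of the top-`depth` items from each ranked list."""
    pool: Set[str] = set()
    for ranked_items in runs.values():
        pool.update(ranked_items[:depth])
    return pool


# Hypothetical ranked lists from two retrieval systems for one query.
runs = {
    "system_a": ["d3", "d1", "d7", "d2"],
    "system_b": ["d1", "d5", "d3", "d9"],
}

# Only the pooled items would be judged for relevance.
pool = build_pool(runs, depth=2)
print(sorted(pool))
```

A deeper pool catches more relevant items at the cost of more judging effort.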

14
WONDIR COLLECTION
  • Earliest community based QA service in the US.
  • 1 million question and answer pairs used from
    this service
  • Average question length: 27 words
  • Average answer length: 28 words

15
Examples
16
Queries
  • Closed-class questions that ask for short,
    fact-based answers.
  • E.g. Where is Charlotte located?
  • Relevance Judgment
  • 220 relevant QA pairs for 50 queries using
    pooling method.
  • Relevance Judgment Criteria

17
WebFAQ COLLECTION by Jijkoun and de Rijke
  • Collection of FAQs gathered with web crawlers and
    made public for research purposes.
  • Found web pages that contain the word FAQ.
  • Used heuristic methods to automatically extract
    question and answer pairs from the web pages.

18
NAVER COLLECTION
  • Leading portal site in South Korea
  • Community-based answering service
  • Collection A
  • Category information: to test category-specific
    translations
  • Collection B
  • Non-textual information: to build the answer
    quality prediction technique

19
Naver Collection (Contd..)
  • Question: Title and Body
  • Naver Test Collection A
  • Naver Test Collection B
  • Relevance
  • Question semantically related to query and
  • Question contains all query terms
  • QA pair was clicked multiple times for the
    query.

20
Comparison of Test Collections
21
OVERVIEW
  • Introduction
  • QA Retrieval
  • Test Collections
  • Translation Based QA retrieval framework
  • Learning word-to-word translations

22
Translation Based QA Retrieval framework
  • Use of Machine Translation technique for
    information retrieval
  • Word mismatch problem
  • Translation based approach

23
IBM Statistical Machine Translation Models
  • Do not require any linguistic knowledge of the
    source or target language.
  • Exploit only co-occurrence statistics of terms
    in the training data.

24
IBM Models
  • Model 1
  • Treats every possible word alignment equally
  • Model 2
  • Assumes only positions of terms are related to
    the word alignment
  • Model 3
  • The first term and the second term generated from
    the same term are independent

25
IBM Models (Contd..)
  • Model 4
  • First order alignment model
  • Every word is dependent only on the previous
    aligned word.
  • Model 5
  • Reformulation of Model 4

26
Advantages of Model 1
  • Efficient implementation is possible using a form
    of query expansion.
  • Performance gain of using low level translation
    models is high.
  • Can be easily integrated into the query likelihood
    language model.

27
IBM Model 1 Equation
  • The probability that a query Q of length m is the
    translation of a document D (of length n) is
    given as
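The equation itself is not preserved in this transcript. As an assumption, the standard textbook form of IBM Model 1, written with the slide's symbols (query Q = q_1 ... q_m, document D = d_1 ... d_n), is shown below. Here d_0 is the conventional null word, epsilon is a small constant, and P(q_i | d_j) is the word-to-word translation probability. The slide's exact notation may differ.

```latex
P(Q \mid D) \;=\; \frac{\epsilon}{(n+1)^{m}} \prod_{i=1}^{m} \sum_{j=0}^{n} P(q_i \mid d_j)
```

Every alignment between a query word and a document word is weighted equally, which is why the sum over j carries no position-dependent term.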

28
IBM Model 1 Equation
29
Translation based Language Models
  • Language model is a mechanism for generating
    text.
  • Unigram language model
  • Assumes each word is generated independently
  • Concerns only probabilities of sampling a single
    word.
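A minimal sketch of a unigram language model, assuming whitespace tokenization and a maximum likelihood estimate. The function names and the sample text are illustrative only. Because each word is generated independently, the probability of a word sequence is the product of single-word probabilities.

```python
from collections import Counter
from typing import Dict, List


def unigram_mle(tokens: List[str]) -> Dict[str, float]:
    """Maximum likelihood unigram model: relative frequency of each word."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def sequence_probability(model: Dict[str, float], tokens: List[str]) -> float:
    """Probability of generating `tokens`, one independent draw per word."""
    prob = 1.0
    for word in tokens:
        prob *= model.get(word, 0.0)  # unseen words get zero probability
    return prob


doc = "where is charlotte located charlotte is in north carolina".split()
model = unigram_mle(doc)
print(sequence_probability(model, ["charlotte", "is"]))
print(sequence_probability(model, ["charlotte", "airport"]))
```

The second call returns zero because "airport" never occurs in the text. That zero-probability case is the problem smoothing addresses.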

30
Language modeling approach to IR
  • With a maximum likelihood estimator, words unseen
    in a document have zero probability.
  • Smoothing
  • Transfers some probability mass from the seen
    words to the unseen words.
  • Dirichlet smoothing: good performance and cheap
    computational cost.

31
Language modeling approach to IR (Contd..)
  • The ranking function for the query likelihood
    language model with Dirichlet smoothing can be
    written as
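The ranking function is not preserved in this transcript. The sketch below assumes the usual Dirichlet-smoothed estimate, P(w|D) = (c(w; D) + mu * P(w|C)) / (|D| + mu), and scores a document by the sum of log probabilities of the query words. The function names, the toy collection, and the value of `mu` are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def collection_model(docs: List[List[str]]) -> Dict[str, float]:
    """Relative frequency of each word over the whole collection."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def query_log_likelihood(query: List[str], doc: List[str],
                         coll: Dict[str, float], mu: float = 2000.0) -> float:
    """Log query likelihood with Dirichlet smoothing."""
    doc_counts = Counter(doc)
    score = 0.0
    for word in query:
        prob = (doc_counts[word] + mu * coll.get(word, 0.0)) / (len(doc) + mu)
        if prob == 0.0:
            return float("-inf")  # word absent from the entire collection
        score += math.log(prob)
    return score


docs = [["a", "b"], ["b", "c"]]
coll = collection_model(docs)
for doc in docs:
    print(doc, query_log_likelihood(["a"], doc, coll, mu=2.0))
```

The document that lacks the query word still receives a nonzero score from the collection model, but ranks below the document that contains it.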

32
IBM Model 1 vs. Query Likelihood
  • Comparable components in the two models

33
Self Translation Model
  • Every word has some probability of translating
    to itself.
  • This probability cannot be 1.
  • If it is too low, retrieval performance
    deteriorates.

34
TransLM
  • Final ranking Function looks like
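The ranking function is not preserved in this transcript. The sketch below assumes one commonly described form of a translation-based language model: a mixture of the direct document estimate and a translation estimate, smoothed with the collection model. The parameter names `beta` and `lam`, the nested-dictionary layout of the translation table, and the toy values are assumptions, not the presenter's notation.

```python
from collections import Counter
from typing import Dict, List


def translm_word_prob(word: str, doc: List[str], coll: Dict[str, float],
                      trans: Dict[str, Dict[str, float]],
                      beta: float = 0.5, lam: float = 2000.0) -> float:
    """P(word | doc) mixing direct match, translation, and collection model.

    trans[t][w] holds the probability of translating source word t into w.
    """
    n = len(doc)
    if n == 0:
        return coll.get(word, 0.0)
    doc_counts = Counter(doc)
    p_direct = doc_counts[word] / n
    # Sum over document words t of T(word | t) * P_ml(t | doc).
    p_trans = sum(trans.get(t, {}).get(word, 0.0) * c / n
                  for t, c in doc_counts.items())
    p_mix = (1.0 - beta) * p_direct + beta * p_trans
    return (n / (n + lam)) * p_mix + (lam / (n + lam)) * coll.get(word, 0.0)


doc = ["car", "engine"]
coll = {"automobile": 0.01, "car": 0.02, "engine": 0.01}
trans = {"car": {"automobile": 0.4, "car": 0.6}}

with_translation = translm_word_prob("automobile", doc, coll, trans, 0.5, 2.0)
without_translation = translm_word_prob("automobile", doc, coll, trans, 0.0, 2.0)
print(with_translation, without_translation)
```

With `beta` set to zero the model falls back to plain smoothed query likelihood, so the query word "automobile" gets credit only from the collection model even though the document mentions "car".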

35
Efficiency Issues and Implementation of TransLM
  • Flipped Translation Tables
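The slide gives only the name of the technique. One plausible reading, offered here as an assumption, is that a table stored as source word to target words is inverted so that each query word can directly look up the document words that translate into it. The function name and toy table below are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def flip_translation_table(
        trans: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Re-index a source -> target table as target -> source."""
    flipped: Dict[str, Dict[str, float]] = defaultdict(dict)
    for source, targets in trans.items():
        for target, prob in targets.items():
            flipped[target][source] = prob
    return dict(flipped)


trans = {
    "car": {"automobile": 0.4, "car": 0.6},
    "vehicle": {"automobile": 0.3, "vehicle": 0.7},
}
flipped = flip_translation_table(trans)

# For the query word "automobile", the document words worth looking up
# in the index are now available in a single dictionary access.
print(flipped["automobile"])
```

Under this reading, each query word is expanded into a short list of source words at query time, which is the query-expansion style of implementation mentioned among the advantages of Model 1.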

36
Term-at-a-time Algorithm
37
OVERVIEW
  • Introduction
  • QA Retrieval
  • Test Collections
  • Translation Based QA retrieval framework
  • Learning word-to-word translations

38
Properties of Word Relationships
  • Not Symmetric
  • Not fixed
  • Change depending on retrieval or translation
    tasks.
  • Must be given as probability values.

39
Training Sample Generation
  • Key Idea
  • If two answers are very similar, then the
    corresponding questions are semantically similar.
  • Similarity Measures
  • Cosine Similarity
  • Query Likelihood scores between two answers (LM
    SCORE)
  • LM-HRANK

40
Word Relationship Types
  • P(Q|A)
  • Source: Answer, Target: Question
  • P(A|Q)
  • Source: Question, Target: Answer
  • P(Q|Q)
  • P(Q<->Q)

41
EM Algorithm
  • Find word relationships that maximize the
    likelihood of sampling the target text from the
    source text in training samples.

42
EM Algorithm (Contd..)
  • The translation probability from a source word t
    to a target word w is given as
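The equation is not preserved in this transcript. The sketch below shows the standard EM procedure for IBM Model 1 under simplifying assumptions: no null word, uniform initialization, and a fixed number of iterations. The function name and the toy training pairs are hypothetical. The table entry `prob[(w, t)]` holds the estimated probability of translating source word t into target word w.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def train_model1(pairs: List[Pair],
                 iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate P(target word | source word) with EM (IBM Model 1 style)."""
    target_vocab = {w for _, target in pairs for w in target}
    uniform = 1.0 / len(target_vocab)
    prob: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: share each target word among the source words of its pair.
        for source, target in pairs:
            for w in target:
                norm = sum(prob[(w, t)] for t in source)
                for t in source:
                    frac = prob[(w, t)] / norm
                    count[(w, t)] += frac
                    total[t] += frac
        # M-step: renormalize expected counts per source word.
        prob = defaultdict(float, {(w, t): c / total[t]
                                   for (w, t), c in count.items()})
    return prob


pairs = [
    (["laptop"], ["notebook"]),
    (["cheap", "laptop"], ["affordable", "notebook"]),
    (["cheap", "phone"], ["affordable", "mobile"]),
]
prob = train_model1(pairs)
print(prob[("notebook", "laptop")], prob[("affordable", "laptop")])
```

Because "laptop" and "notebook" co-occur in every pair that contains either word, EM shifts probability toward that pairing and away from "affordable".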

43
EM Algorithm (Contd..)
  • The translation probability from a source word t
    to a target word w is given as

44
Examples
45
Examples (Contd..)
46
SUMMARY
  • Introduction
  • QA Retrieval
  • Test Collections
  • Translation Based QA retrieval framework
  • Learning word-to-word translations

47
Coming Up Next
  • Estimating Answer Quality
  • Experiments