Title: Finding Question-Answer Pairs from Online Forums
Gao Cong, Aalborg University, Aalborg, Denmark
Long Wang, Tianjin University, Tianjin, China
Chin-Yew Lin, Microsoft Research Asia, Beijing, China
Young-In Song, Korea University, Seoul, South Korea
Yueheng Sun, Tianjin University, Tianjin, China
Introduction
- Community question answering services such as Yahoo! Answers.
- Forums contain a huge amount of valuable user-generated content on a variety of topics.
- Goal: find question-answer pairs in forums.
Algorithms
- Question detection
  - 5W1H words (who, what, when, where, why, how)
    - Most questions do not begin with a 5W1H word.
  - Question mark
    - About 30% of questions do not end with a question mark.
    - Example: "I am wondering where I can buy cheap and good clothing in Beijing."
  - Labeled Sequential Patterns (LSPs)
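The two baseline heuristics and LSP-style matching can be sketched as follows. This is a minimal illustration under stated assumptions: the tokenizer, the pattern list `QUESTION_LSPS`, and all helper names are hypothetical. An LSP is treated here as a word sequence that matches a sentence when its items occur in order (gaps allowed), with support and confidence computed in the usual class-sequential-rule sense; matching patterns are used directly as rules, a simplification of mining them from labeled data.

```python
import re

# Assumed 5W1H word list.
WH_WORDS = {"who", "what", "when", "where", "why", "how"}

def tokenize(sentence: str) -> list[str]:
    """Lowercase word tokens; '?' is kept as its own token."""
    return re.findall(r"[a-z0-9']+|\?", sentence.lower())

def starts_with_5w1h(sentence: str) -> bool:
    tokens = tokenize(sentence)
    return bool(tokens) and tokens[0] in WH_WORDS

def ends_with_question_mark(sentence: str) -> bool:
    return sentence.rstrip().endswith("?")

def contains_subsequence(pattern, tokens) -> bool:
    """True if the pattern items occur in tokens in order, gaps allowed."""
    it = iter(tokens)
    # `item in it` consumes the iterator up to the match, enforcing order.
    return all(item in it for item in pattern)

def support_confidence(pattern, label, data):
    """Support and confidence of the rule pattern -> label over (tokens, label) pairs."""
    covered = [lab for tokens, lab in data if contains_subsequence(pattern, tokens)]
    hits = sum(1 for lab in covered if lab == label)
    support = hits / len(data) if data else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence

# Hypothetical question patterns; real LSPs would be mined from labeled sentences.
QUESTION_LSPS = [("wondering", "where"), ("anyone", "know"), ("?",)]

def is_question_lsp(sentence: str) -> bool:
    tokens = tokenize(sentence)
    return any(contains_subsequence(p, tokens) for p in QUESTION_LSPS)

s = "I am wondering where I can buy cheap and good clothing in Beijing."
print(starts_with_5w1h(s), ends_with_question_mark(s), is_question_lsp(s))  # False False True
```

On the example sentence both heuristics fail, while the hypothetical pattern `("wondering", "where")` fires, which is the gap LSPs are meant to close.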
Graph-based propagation method
- Building the graph
  - Given a question $q$ and the set $A_q$ of its candidate answers:
  - For two candidate answers $a_1, a_2 \in A_q$, compute the KL divergence $\mathrm{KL}(a_1 \| a_2)$.
  - If $1/(1 + \mathrm{KL}(a_1 \| a_2))$ is larger than a threshold, add an edge from $a_1$ to $a_2$.
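A minimal sketch of the graph-building step, assuming each answer is represented by a unigram language model with additive smoothing over the candidates' shared vocabulary (the slide does not say how the models are estimated). The function names and the threshold in the usage note are hypothetical. Smoothing matters because $\mathrm{KL}(a_1 \| a_2)$ is infinite whenever $a_2$'s model gives zero probability to a word that $a_1$ uses.

```python
import math
from collections import Counter

def smoothed_lm(text: str, vocab: set[str], alpha: float = 1.0) -> dict[str, float]:
    """Unigram model with additive (Laplace) smoothing over a shared vocabulary.

    Every probability is positive, so the KL divergence below stays finite.
    """
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """KL(p || q) = sum_w p(w) * log(p(w) / q(w)) over the shared vocabulary."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items())

def build_answer_graph(answers: list[str], threshold: float) -> dict[tuple[int, int], float]:
    """Directed edges i -> j whose similarity 1 / (1 + KL(a_i || a_j)) exceeds threshold."""
    vocab = {w for a in answers for w in a.lower().split()}
    models = [smoothed_lm(a, vocab) for a in answers]
    edges = {}
    for i, p in enumerate(models):
        for j, q in enumerate(models):
            if i == j:
                continue
            sim = 1.0 / (1.0 + kl_divergence(p, q))
            if sim > threshold:
                edges[(i, j)] = sim  # kept so later steps can derive edge weights
    return edges
```

For example, `build_answer_graph(["a b", "a b", "x y z"], 0.9)` links the two identical answers in both directions with similarity 1.0 and leaves the third answer unconnected. Because KL divergence is asymmetric, an edge $a_1 \to a_2$ does not imply an edge $a_2 \to a_1$ in general.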
Graph-based propagation method
- Edge weight
  - Weights are normalized.
  - ? = 0.01
Computing Propagated Scores
- Propagation without initial score
- Propagation with initial score
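The slide names the two variants without formulas. One common way to realize them, assumed here rather than taken from the paper, is a PageRank-style iteration: each answer passes its score to its neighbors in proportion to its out-edge weights, normalized per source node (one reading of the "normalized" edge weights), and a restart term mixes in either a uniform distribution (propagation without initial score) or normalized initial scores such as score(q, a) (propagation with initial score). The function name `propagate` and the damping factor 0.85 are hypothetical.

```python
def propagate(edges, n, init=None, damping=0.85, iters=200, tol=1e-12):
    """PageRank-style score propagation over a weighted directed graph.

    edges: {(i, j): weight} for an edge i -> j, weights assumed positive.
    n: number of nodes. init=None uses a uniform prior (no initial scores);
    otherwise init is normalized and used as the restart distribution.
    """
    out_sum = [0.0] * n
    for (i, _), w in edges.items():
        out_sum[i] += w
    if init is None:
        prior = [1.0 / n] * n
    else:
        total = sum(init)
        prior = [x / total for x in init]
    scores = prior[:]
    for _ in range(iters):
        new = [(1.0 - damping) * p for p in prior]
        # Each node spreads its score along out-edges by normalized weight.
        for (i, j), w in edges.items():
            new[j] += damping * scores[i] * w / out_sum[i]
        # Nodes without out-edges hand their score back according to the prior.
        dangling = sum(s for s, o in zip(scores, out_sum) if o == 0.0)
        for k in range(n):
            new[k] += damping * dangling * prior[k]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores
```

Scores stay a probability distribution at every step. With `init=None`, an answer that many similar answers point to ends up ranked highest; with an initial score vector, answers that start high keep an advantage proportional to the restart weight `1 - damping`.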
Answer Detection
- Ranking score score(q, a), computed with one of:
  - Cosine similarity
  - Query likelihood language model
  - KL-divergence language model
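The three scoring options can be sketched as below, under assumptions the slide does not fix: whitespace tokenization, raw term-frequency vectors for cosine similarity, and Dirichlet smoothing (with a hypothetical μ = 10) toward a background model built from the question and its candidate answers. All function names are hypothetical. With a maximum-likelihood query model, the KL-divergence score equals the query log-likelihood divided by the query length plus a constant that depends only on the query, so the two language-model scores rank answers identically.

```python
import math
from collections import Counter

def term_counts(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine_similarity(q: str, a: str) -> float:
    """Cosine of the angle between raw term-frequency vectors."""
    vq, va = term_counts(q), term_counts(a)
    dot = sum(c * va[w] for w, c in vq.items())
    norm = math.sqrt(sum(c * c for c in vq.values())) * math.sqrt(sum(c * c for c in va.values()))
    return dot / norm if norm else 0.0

def dirichlet_prob(w, va, a_len, coll, coll_len, mu):
    """P(w | a) with Dirichlet smoothing toward the collection model."""
    return (va[w] + mu * coll[w] / coll_len) / (a_len + mu)

def query_likelihood(q: str, a: str, coll: Counter, mu: float = 10.0) -> float:
    """log P(q | a); coll must contain every query word (e.g. built from q itself)."""
    va, coll_len = term_counts(a), sum(coll.values())
    a_len = sum(va.values())
    return sum(c * math.log(dirichlet_prob(w, va, a_len, coll, coll_len, mu))
               for w, c in term_counts(q).items())

def kl_score(q: str, a: str, coll: Counter, mu: float = 10.0) -> float:
    """-KL(theta_q || theta_a), with theta_q the maximum-likelihood query model."""
    vq, va, coll_len = term_counts(q), term_counts(a), sum(coll.values())
    q_len, a_len = sum(vq.values()), sum(va.values())
    return -sum((c / q_len) * math.log((c / q_len) / dirichlet_prob(w, va, a_len, coll, coll_len, mu))
                for w, c in vq.items())
```

Usage with toy text: for `q = "where to buy cheap clothing"` and candidates `"cheap clothing markets where locals buy"` and `"the museum opens at nine"`, with `coll = term_counts(" ".join([q] + candidates))`, all three functions score the first candidate higher.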
Experiment
- Data
  - Source data obtained from three forums of different scales.
- Two annotators
  - Kappa statistic for identifying questions: 0.96.
  - Kappa statistic for linking answers to a given question: 0.69.
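The agreement figures above are kappa statistics. Assuming Cohen's kappa for two annotators (the slide does not name the variant), the statistic compares the observed agreement $p_o$ with the agreement $p_e$ expected by chance, $\kappa = (p_o - p_e)/(1 - p_e)$. A minimal sketch; the labels in the usage note are toy data, not the study's annotations.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both annotators independently pick the same label.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

For toy labels `[1, 1, 0, 0]` and `[1, 0, 0, 0]`, $p_o = 0.75$ and $p_e = 0.5$, so $\kappa = 0.5$.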
Experiment
- Q-TInter: the intersection of the two annotators' question labels.
Experiment
- 1,535 questions from 600 threads; 284 of these questions have no answers.
Experiment
- Results improve on subsets.
- Of the 486 first questions, only 21 have no answers in the A-TUnion data, and 45 in the A-TInter data.
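Plain arithmetic on the counts above shows how much rarer unanswered questions are among first questions than among all 1,535 questions (284 unanswered):

$$
\frac{21}{486} \approx 4.3\%, \qquad \frac{45}{486} \approx 9.3\%, \qquad \frac{284}{1535} \approx 18.5\%
$$

In either annotation version, the share of unanswered first questions is well below the share in the full question set.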
Experiment
- G_K: edge weights computed with KL divergence alone.
- G_1: propagation without initial score.
- G_2: propagation with initial score.
Experiment
- Data from three forums: Tripadvisor, Lonely Planet, and Bootsnall.