Title: Recommending Questions Using the MDL-based Tree Cut Model
1Recommending Questions Using the MDL-based Tree
Cut Model
- Yunbo CAO, Huizhong DUAN,
- Chin-Yew LIN, Yong YU, and Hsiao-Wuen HON
- Natural Language Computing Group
- Microsoft Research Asia
2Community-based QA Service
- Question Search
- Other Aspects about Hamburg or Berlin
- More Aspects (NOT DISCOVERED)
- How far is it from Berlin to Hamburg?
- Where to see between Hamburg and Berlin?
3Question Recommendation
- The problem
- You ask
- Any cool clubs in Berlin or Hamburg?
- We recommend
- How far is it from Berlin to Hamburg?
- Where to see between Hamburg and Berlin?
- Any good hostels in Hamburg or Berlin?
- The principle of question recommendation
- A good recommendation should be different from
the queried question in question focus but
similar in question topic.
4Outline
- Question recommendation
- Our approach
- A walk-through of our approach
- The uses of the MDL-based tree cut model
- The flow of question recommendation
- Related work
- Experimental results
- Conclusions
5Our Approach
- The Principle: A good recommendation should be different from the queried question in question focus but similar in question topic.
- How can we discriminate question topic from question focus?
- Query: Any cool clubs in Hamburg or Berlin?
- Topic terms: cool clubs, Hamburg, Berlin
- Related question: Where to see in Hamburg or Berlin?
- Topic terms: where to see, Hamburg, Berlin
- Different: cool clubs vs. where to see
- Same or close: Hamburg, Berlin
6Specificity Weighing Terms
- China
- Anyone know where to see the Dragon Boat Festival in Beijing?
- Where is a good (less expensive) place to shop in Beijing?
- What's the cheapest way to get from Beijing to Hong Kong?
- Europe
- How far is it from Berlin to Hamburg?
- What is the cheapest way from Berlin to Hamburg?
- Where to see between Hamburg and Berlin?
- How long does it take from Hamburg to Berlin on the train?
The specificity of a topic term is the inverse
entropy of the distribution of the topic term
over the sub-categories.
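The definition above can be sketched in Python. This is a minimal illustration, assuming specificity is computed as 1 / (entropy + epsilon). The function name `specificity`, the smoothing constant `epsilon`, and the counts are hypothetical and are not taken from the talk or from the Yahoo! Answers data.

```python
import math

def specificity(counts, epsilon=0.001):
    """Inverse entropy of a term's distribution over sub-categories.

    counts: mapping sub-category -> number of questions containing the term.
    epsilon: hypothetical smoothing constant that avoids division by zero.
    """
    total = sum(counts.values())
    entropy = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return 1.0 / (entropy + epsilon)

# Made-up counts: "Hamburg" occurs only under Europe,
# while "cheapest way" occurs equally under China and Europe.
hamburg = specificity({"China": 0, "Europe": 4})
cheapest_way = specificity({"China": 1, "Europe": 1})
```

Under these assumed counts, a term concentrated in one sub-category (Hamburg) receives a much higher specificity than a term spread evenly over sub-categories (cheapest way).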
7Order Topic Terms by Specificity
- Query: Any cool clubs in Hamburg or Berlin?
- Topic Terms: cool clubs, Hamburg, Berlin
- Topic Chain: Hamburg → Berlin → cool clubs
- Question Topic: Hamburg, Berlin
- Question Focus: cool clubs, where to see, how far
- Related questions: Where to see in Hamburg or Berlin? How far is it from Berlin to Hamburg?
- Topic Terms: where to see, Hamburg, Berlin
- Topic Chains: Hamburg → Berlin → where to see; Hamburg → Berlin → how far
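Ordering topic terms into a topic chain can be sketched as a sort by decreasing specificity. This is only an illustration: the specificity values are hypothetical numbers chosen to reproduce the ordering shown on this slide, and the helper names `topic_chain` and `shared_prefix` are assumptions. The shared prefix merely shows what "same topic, different focus" looks like; the talk itself separates topic from focus with the MDL-based tree cut model.

```python
def topic_chain(terms, spec):
    """Order topic terms by decreasing specificity."""
    return sorted(terms, key=lambda t: spec[t], reverse=True)

def shared_prefix(a, b):
    """Return the longest common prefix of two topic chains."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix

# Hypothetical specificity values (not from the talk).
spec = {"Hamburg": 9.0, "Berlin": 7.5, "cool clubs": 1.2, "where to see": 0.8}

query_chain = topic_chain(["cool clubs", "Hamburg", "Berlin"], spec)
candidate_chain = topic_chain(["where to see", "Hamburg", "Berlin"], spec)
common = shared_prefix(query_chain, candidate_chain)
```

With these assumed values, both chains start with Hamburg and Berlin and differ only in their final term.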
8Scoring the Candidates
- The recommendation score over a queried question and a recommendation candidate is defined as [formula not preserved in the source]
10The MDL-based Tree Cut Model
- The MDL principle
- Model description length: uniform prior
- Parameter description length: number of parameters
- Data description length: minus log likelihood
- The tree cut model (Li and Abe, 1998)
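The three description lengths listed above can be sketched as a small tree cut search. This is a hedged illustration based on the common formulation of the tree cut model in Li and Abe (1998), not the talk's own implementation: the parameter cost is assumed to be (k/2) log2 |S| for a cut with k free parameters, the data cost is the minus log likelihood with each cut node's probability shared uniformly among its leaves, and the model description length is omitted because it is constant under a uniform prior. The names `Node`, `leaves`, `description_length`, and `find_mdl`, as well as the counts, are hypothetical.

```python
import math

class Node:
    def __init__(self, name, count=0, children=None):
        self.name = name
        self.count = count            # observed frequency (leaves only)
        self.children = children or []

def leaves(node):
    """Return all leaf nodes below (or equal to) the given node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def description_length(cut, total):
    """Parameter cost plus data cost of a cut, in bits."""
    k = len(cut) - 1                          # free parameters of the cut
    param_cost = k / 2 * math.log2(total)
    data_cost = 0.0
    for node in cut:
        below = leaves(node)
        freq = sum(leaf.count for leaf in below)
        if freq == 0:
            continue                          # no data to encode here
        p_leaf = freq / total / len(below)    # uniform share per leaf
        data_cost -= freq * math.log2(p_leaf)
    return param_cost + data_cost

def find_mdl(node, total):
    """Bottom-up search: keep a node if it is cheaper than its sub-cut."""
    if not node.children:
        return [node]
    sub_cut = [n for child in node.children for n in find_mdl(child, total)]
    if description_length([node], total) < description_length(sub_cut, total):
        return [node]
    return sub_cut

# Made-up counts: A's leaves are balanced, B's leaves are skewed.
tree = Node("root", children=[
    Node("A", children=[Node("a1", 10), Node("a2", 10)]),
    Node("B", children=[Node("b1", 40), Node("b2", 0)]),
])
total = sum(leaf.count for leaf in leaves(tree))
cut = [n.name for n in find_mdl(tree, total)]
```

In this toy tree the balanced leaves under A are generalized into the single node A, while the skewed leaves under B stay separate, which is the kind of trade-off between model size and fit that the MDL principle formalizes.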
11Reduction of Topic Terms
12Reduction of Topic Terms
13Determining the Cut
15Flow of Question Recommendation
17Related Work
- Question search (Jeon et al., 2005; Sneiders, 2002; Lai et al., 2002; Burke et al., 1997)
- Find semantically equivalent questions given queries
- Satisfies different user needs than question recommendation
- Query suggestion (Cucerzan and White, 2007; Jensen et al., 2006; Fonseca et al., 2003)
- Suggest related queries through query log mining
- Query logs are usually absent for questions
- Query substitution (Jones et al., 2006)
- Generate queries by replacing query terms
- New queries are close to the original queries
19Data and Evaluation Measures
- The data
- Resolved questions from Yahoo! Answers
- 314,616 about Travel and 210,785 about Computers & Internet
- The test set developed via human judgments
20Experimental Results (Basic)
- Travel
- Computers & Internet
21Experimental Results (Basic)
What's a good but cheap hotel/motel/anything in
downtown Chicago?
22Effectiveness of MDL
- The baseline methods
- First: our approach without the MDL-based reduction of topic terms
- Second: our approach without the MDL-based discrimination between question topic and question focus
- Third: our approach without both the MDL-based reduction of topic terms and the MDL-based discrimination between question topic and question focus
- The use of the MDL is significant
- The size of the vocabulary is 289,251 before the reduction of topic terms and 173,202 after it, a reduction of about 40%.
- The contribution given by the MDL-based selection of substitution is statistically significant
24Conclusions
- Studied question recommendation by identifying question topics and question foci
- Used the MDL-based tree cut model for
- Reducing the set of topic terms
- Discriminating question topics from question foci
- Empirically verified the effectiveness of our
approach to question recommendation
25Questions and Discussions!