1
Automatic Question Generation from Queries
Natural Language Computing, Microsoft Research
Asia
  • Chin-Yew LIN (cyl@microsoft.com)

2
Generating Questions from Queries
Where is the next Hannah Montana concert?
Q2Q as a question generation shared task
3
Remember Ask Jeeves?
  • How large is British Columbia?

4
Live Search QnA (English)
5
Naver Knowledge iN (Korea)
  • Naver Knowledge iN Service
  • Opened in October 2002
  • 70 million Knowledge iN DB entries collected (as
    of June 2007)
  • Number of users: 12 million
  • Upper-level users (higher than Kosu): 6,648
    (0.05%)
  • Distribution of knowledge
  • Education, Learning: 17.78%
  • Computer, Communication: 12.89%
  • Entertainment, Arts: 11.42%
  • Business, Economy: 11.42%
  • Home, Life: 7.44%

6
Baidu Zhidao (China)
  • 17,012,767 resolved questions in two years of
    operation.
  • 8,921,610 are knowledge related.
  • 96.7% of questions are resolved.
  • 10,000,000 daily visitors.
  • 71,308 new questions per day.
  • 3.14 answers per question.
  • http://www.searchlab.com.cn (User Research Lab
    of Chinese Search)

7
Yahoo! Answers (Global; Marciniak)
  • Launched in December 2005.
  • 20 million users in the U.S. (> 90 million
    worldwide).
  • 33,557,437 resolved questions (US, April 2008).
  • 70,000 new questions per day (US).
  • 6.76 answers per question (US).

8
Question Taxonomy
  • ISI's question answer typology (Hovy et al.
    2001, 2002)
  • Results of analyzing over 20K online questions
  • 140 different question types with examples
  • http://www.isi.edu/natural-language/projects/webclopedia/Taxonomy/taxonomy_toplevel.html
  • Liu et al.'s (COLING 2008) cQA question taxonomy
  • Derived from Broder's (SIGIR Forum 2002) web
    search taxonomy
  • Results of analyzing 100 randomly sampled
    questions from the top 4 Yahoo! Answers categories
  • Entertainment & Music, Society & Culture, Health,
    and Computers & Internet

9
Main Task: Q2Q
  • Generate questions given a query
  • Query: Hannah Montana concert
  • Questions:
  • How do I get Hannah Montana concert tickets for
    a really good price?
  • What should I wear to a Hannah Montana concert?
  • How long is the Hannah Montana concert?
  • Subtasks (see the sketch after this list)
  • Predict user goals
  • Learn question templates
  • Normalize questions
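
A minimal sketch of how these subtasks could fit together, assuming a goal-conditioned template approach. The goal labels, templates, and the trivial goal predictor below are illustrative placeholders, not part of the proposal.

```python
# Sketch: predict user goals for a query, then instantiate question
# templates associated with each goal. Goals and templates are hypothetical.
GOAL_TEMPLATES = {
    "buy_tickets": ["How do I get {query} tickets for a really good price?"],
    "attend_event": [
        "What should I wear to a {query}?",
        "How long is the {query}?",
    ],
}

def predict_goals(query: str) -> list[str]:
    """Toy goal predictor; a real system would learn goals from query logs."""
    if "concert" in query.lower():
        return ["buy_tickets", "attend_event"]
    return []

def generate_questions(query: str) -> list[str]:
    """Fill each template of every predicted goal with the raw query."""
    questions = []
    for goal in predict_goals(query):
        for template in GOAL_TEMPLATES.get(goal, []):
            questions.append(template.format(query=query))
    return questions

print(generate_questions("Hannah Montana concert"))
```

Question normalization (casing, spelling, deduplication) would then be applied to the generated list before it is shown to users or scored.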

10
Data Preparation
  • cQA archives
  • Live Search QnA
  • Yahoo! Answers
  • Ask.com
  • Other sources
  • Query logs
  • MSN/Live Search
  • Yahoo!
  • Ask.com
  • TREC and other sources
  • Possible process
  • Sample queries from search engine query logs
  • Ensure broad topic coverage
  • Find candidate questions from cQA archives given
    the queries
  • Create a mapped Q2Q corpus for training and
    testing (sketched below)
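
A minimal sketch of the mapping step, assuming a simple rule that a cQA question is a candidate when it contains every term of the query. The matching rule and the toy archive are assumptions for illustration; the proposal leaves the exact matching procedure open.

```python
# Sketch: map sampled queries to candidate cQA questions to form a Q2Q corpus.
def find_candidate_questions(query, cqa_questions):
    """Return archive questions that contain every term of the query."""
    terms = query.lower().split()
    return [q for q in cqa_questions if all(t in q.lower() for t in terms)]

def build_q2q_corpus(queries, cqa_questions):
    """Map each sampled query to its candidate questions for training/testing."""
    return {q: find_candidate_questions(q, cqa_questions) for q in queries}

# Toy archive standing in for Live Search QnA / Yahoo! Answers / Ask.com data.
archive = [
    "How long is the Hannah Montana concert?",
    "Where is the next Hannah Montana concert?",
    "How large is British Columbia?",
]
print(build_q2q_corpus(["Hannah Montana concert"], archive))
# {'Hannah Montana concert': ['How long is the Hannah Montana concert?',
#                             'Where is the next Hannah Montana concert?']}
```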

11
Intrinsic Evaluation
  • Given a query term
  • Generate a ranked list of questions related to
    the query term
  • Open set: use a pooling approach
  • Pool all questions from participants
  • Rate each question as relevant or not
  • Compute recall/precision/F1 scores (see the
    sketch after this list)
  • Closed set: use test-set data as the gold standard
  • Metrics
  • Diversity, interestingness, utility, and so on
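
A minimal sketch of the pooling-based scoring, assuming binary relevance judgments over the pooled questions; the example pool and system output are invented for illustration.

```python
# Sketch: score one participant's question list against the judged-relevant pool.
def precision_recall_f1(system_questions, relevant_pool):
    """Precision, recall, and F1 of a question list against a relevant pool."""
    retrieved_relevant = sum(1 for q in system_questions if q in relevant_pool)
    precision = retrieved_relevant / len(system_questions) if system_questions else 0.0
    recall = retrieved_relevant / len(relevant_pool) if relevant_pool else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical judgments: pooled questions rated relevant for the query term.
pool = {"How long is the Hannah Montana concert?",
        "Where is the next Hannah Montana concert?"}
system = ["How long is the Hannah Montana concert?",
          "What is Hannah Montana's real name?"]
print(precision_recall_f1(system, pool))  # (0.5, 0.5, 0.5)
```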

12
Extrinsic Evaluation
  • A straw-man scenario
  • Task: online information seeking
  • Setup
  • A user selects a topic (T) she is interested in.
  • Generate a set of N queries given T and a query
    log.
  • The user selects a query (q) from the set.
  • Generate a set of M questions given q.
  • The user selects the question (Q) that she has in
    mind.
  • If the user does not select any question, record
    it as not successful.
  • Send q to a search engine (S) and get results X.
  • Send q, Q, and anything inferred from Q to S and
    get results Y.
  • Compare results X and Y using standard IR
    relevance metrics (see the sketch after this list).
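
A minimal sketch of the final comparison, using DCG as one example of a standard IR relevance metric; the proposal does not fix the metric, and the graded judgments below are hypothetical.

```python
import math

def dcg(relevance):
    """Discounted cumulative gain of a ranked list of graded relevance scores."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevance))

# Hypothetical graded judgments (0-2) for the top 5 results of each run.
x_relevance = [1, 0, 2, 0, 1]   # results X: query q alone sent to engine S
y_relevance = [2, 2, 1, 0, 0]   # results Y: q plus generated question Q
print(dcg(x_relevance), dcg(y_relevance))
```

If Y consistently scores higher than X across users and topics, the generated questions are improving the search results, which is the real-world effect the extrinsic evaluation is meant to measure.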

13
Summary
  • Task: question generation from queries
  • Data
  • Search engine query logs
  • cQA question answer archives
  • Question taxonomies
  • Evaluation
  • Intrinsic: evaluate specific technology areas
  • Extrinsic: evaluate the effect on real-world
    scenarios
  • Real data, real task, and real impact

14
Analyze cQA Questions (Liu et al., COLING 2008)