Title: Predicting Question Quality
1. Predicting Question Quality
- Bruce Croft and Stephen Cronen-Townsend
- University of Massachusetts Amherst
2. Predicting Question Quality
- Actually predicting quality of retrieved passages (or documents)
- Basic result: we can predict retrieval performance (with some qualifications)
  - Works well on TREC ad-hoc queries
  - Can set thresholds automatically
  - Works with most TREC QA question classes
- For example:
  - "Where was Tesla born?": clarity score 3.57
  - "What is sake?": clarity score 1.28
3. Clarity Score Computation
[Diagram: the question Q retrieves text; the retrieved text is used to model question-related language; the divergence of that model from the collection model is then computed. For "Where was Tesla born?" the example term-contribution plot (log P per term) highlights tesla, born, nikola, yugoslavia, unit, and film.]
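The computation sketched on this slide (estimate a question-related language model from retrieved text, then measure its divergence from the collection model) can be written as a KL divergence in bits. This is a simplified sketch using maximum-likelihood unigram models; the actual method estimates the query model as a relevance model over the top retrieved documents, with smoothing.

```python
import math
from collections import Counter

def unigram_model(texts):
    """Maximum-likelihood unigram language model over a list of text strings."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def clarity_score(query_model, collection_model):
    """Clarity as KL divergence (in bits) of the question-related model
    from the collection model: sum_w P(w|Q) * log2(P(w|Q) / P_coll(w))."""
    score = 0.0
    for w, p_q in query_model.items():
        p_c = collection_model.get(w)
        if p_q > 0 and p_c:  # skip terms absent from the collection model
            score += p_q * math.log2(p_q / p_c)
    return score
```

A model concentrated on a few topical terms (tesla, born, ...) diverges more from the collection model and so scores higher than a diffuse one, matching the contrast between "Where was Tesla born?" (3.57) and "What is sake?" (1.28) on the previous slide.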
4. Predicting Ad-Hoc Performance
[Figure: correlations with average precision for TREC queries; average precision vs. clarity for 100 TREC title queries, with optimal and automatic threshold values shown.]
5. Passage-Based Clarity
- Passages
  - Whole-sentence based, 250-character maximum
  - From top retrieved docs
  - Passage models smoothed with all of TREC-9
- Measuring performance
  - Average precision (rather than MRR)
- Top-ranked passages used to estimate clarity scores
  - Top 100 gives 99% of max correlation
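The passage definition above (whole sentences, 250-character maximum) could be implemented roughly as follows; the sentence splitter and the greedy packing are illustrative assumptions, not the exact segmentation used in the experiments.

```python
import re

def sentence_passages(doc_text, max_chars=250):
    """Greedily pack whole sentences into passages of at most max_chars.
    A passage never splits a sentence; a sentence longer than max_chars
    becomes a passage on its own."""
    sentences = re.split(r'(?<=[.!?])\s+', doc_text.strip())
    passages, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_chars:
            passages.append(current)  # close the current passage
            current = s
        elif current:
            current = current + " " + s
        else:
            current = s
    if current:
        passages.append(current)
    return passages
```

Each resulting passage would then get its own language model, smoothed with the whole collection before the clarity computation.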
6. Correlation by Question Type
7. Correlation Analysis
- Strong on average (R = 0.255, P < 10^-8)
- Allows prediction of question performance
- Challenging cases: Amount and Famous
- General comments on difficulty
  - Questions have been preselected to be good questions for the TREC QA track
  - Questions are in general less ambiguous than short queries
8. Precision vs. Clarity (Location Qs)
[Scatter plot of average precision vs. clarity score for location questions; labeled points include "What is the location of Rider College?", "Where was Tesla born?", "What was Poe's birthplace?", and "Where is Venezuela?"]
9. Predictive Mistakes
- High clarity, low average precision
  - Answerless but coherent context
  - "What was Poe's birthplace?"
    - "birthplace" and "Poe" do not co-occur
    - Bad candidate passages
    - The variant "Where was Poe born?" performs well and predicts well
- Low clarity, high average precision
  - Very rare; often few correct passages
  - "What is the location of Rider College?"
    - One passage contains the correct answer
    - Cannot increase language coherence among passages
    - That passage is ranked first, so average precision = 1
10. Challenging Types: Famous
[Scatter plot of average precision vs. clarity score; labeled points include "Who is Zebulon Pike?" and "Define thalassemia."]
- "Who is Zebulon Pike?"
  - Many correct answers decrease the clarity of a good ranked list
- "Define thalassemia."
  - Passages using the term are highly coherent, but often do not define it
11. Web Experiments
- 445 well-formed questions randomly chosen from the Excite log
- WT10g test collection
- Human-predicted values of quality
  - "Where can I purchase an inexpensive computer?": clarity 0.89, human predicted ineffective
  - "Where can I find the lyrics to Eleanor Rigby?": clarity 8.08, human predicted effective
- Result: clarity scores are significantly correlated with human predictions
12. Distribution of Clarity Scores
13. Predicting When to Expand Questions
- Best simple strategy: always use expanded questions
  - e.g., always use relevance-model retrieval
- But some questions do not work well when expanded
  - NRRC workshop is looking at this
- Can clarity scores be used to predict which?
  - Initial idea: do ambiguous queries get worse when expanded? Not always.
  - New idea: perform the expansion retrieval, then use a modified clarity score to guess whether the expansion helped. Yes, this works.
14. Using Clarity to Predict Expansion
- Evaluated using TREC ad-hoc data
- Choice: query-likelihood retrieval or relevance-model retrieval
- Ranked-list clarity: measures the coherence of a ranked list
  - Mix documents according to their rank alone
  - For example, top 600 documents with linearly decreasing weights
- Compute the improvement in ranked-list clarity scores
  - First thought: if the difference is positive, choose the relevance-model results
  - Best thought: if the difference is higher than some threshold, choose the relevance-model results
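The decision rule above can be sketched under some simplifying assumptions: each retrieved document is represented by a pre-computed unigram model, the ranked-list model mixes the top documents with linearly decreasing rank weights, and the expanded run is chosen only when its clarity gain exceeds a threshold. The function names and toy models are illustrative, not from the original system.

```python
import math

def kl_bits(model, background):
    """KL divergence in bits of `model` from `background`."""
    return sum(p * math.log2(p / background[w])
               for w, p in model.items() if p > 0 and w in background)

def ranked_list_model(doc_models, top_k=600):
    """Mix the top_k document models by rank alone: rank 1 gets weight n,
    rank n gets weight 1, with weights normalized to sum to 1."""
    top = doc_models[:top_k]
    n = len(top)
    total = n * (n + 1) / 2.0
    mixed = {}
    for i, model in enumerate(top):
        weight = (n - i) / total  # linearly decreasing in rank
        for term, p in model.items():
            mixed[term] = mixed.get(term, 0.0) + weight * p
    return mixed

def choose_run(baseline_docs, expanded_docs, collection_model, threshold=0.0):
    """Pick the expanded (relevance-model) run only when its ranked-list
    clarity beats the baseline's by more than `threshold`."""
    gain = (kl_bits(ranked_list_model(expanded_docs), collection_model)
            - kl_bits(ranked_list_model(baseline_docs), collection_model))
    return ("expanded" if gain > threshold else "baseline"), gain
```

A coherent expanded ranked list concentrates probability on fewer terms, raising its divergence from the collection model, so a positive (or above-threshold) gain signals that expansion helped.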
15. Clarity and Expansion Results
- Choosing expansion with this method produces 51% of the optimal improvement for TREC-8
- Choosing when to expand has more impact in TREC-8, where expanded-query performance is more mixed (only marginally better, on average, than unexpanded)
- In TREC-7, only 4 queries perform really badly with the relevance model, and the clarity method predicts 2 of them
16. Predicting Expansion Improvements
[Scatter plot of change in average precision vs. change in clarity (new ranked list minus old), with the decision threshold marked; labeled queries include "tourists, violence", "women clergy", "Legionnaires disease", "killer bee attacks", and "Stirling engine".]
17. Future Work
- Continue expansion experiments
  - with queries and questions
- Understanding the role of the corpus
  - predicting when coverage is inadequate
  - more experiments on the Web and on heterogeneous collections
- Providing a clarity tool
  - user interface, or data for a QA system?
  - efficiency
- Better measures...