Applying Diversity Metrics to Improve the Selection of Web Search Term Refinements

Slides: 11
Provided by: janpf
Learn more at: https://nlp.stanford.edu

Transcript and Presenter's Notes


1
Applying Diversity Metrics to Improve the
Selection of Web Search Term Refinements
  • CS224N 2008
  • Tague Griffith, Jan Pfeifer

2
Web Search Refinements
3
Problem
  • Redundant refinements in a limited space
  • Technical senses dominate others
    • Java island vs. Java programming language
    • Amazon river/rain forest vs. Amazon the company
  • What happens with too much diversity?
    • Amazon grill houston
    • Embraer ERJ 145 Amazon

4
CBC Word Sense Similarity
  • Similarity of terms measured by feature vectors
  • Features are a combination of co-occurring words with their syntactic context
    • wine: sip _ (Verb-Object), ...
  • Data from Wikipedia corpus
  • Problems
    • Little overlap between web data and Wikipedia data
    • Hyponym siblings are too similar, yet make good refinements
      • planet jupiter and planet earth
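
As a rough illustration of the feature-vector idea on this slide, the sketch below compares terms by the cosine of their sparse feature vectors, where each feature is a (co-occurring word, syntactic relation) pair. Everything concrete here is an assumption: the slides do not say which similarity function or feature weighting was used, and the counts are invented for the example rather than taken from the Wikipedia corpus.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    # Only features present in both vectors contribute to the dot product.
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical counts: each key is (co-occurring word, syntactic relation).
wine = Counter({("sip", "Verb-Object"): 3, ("drink", "Verb-Object"): 5,
                ("red", "Adj-Noun"): 4})
beer = Counter({("sip", "Verb-Object"): 1, ("drink", "Verb-Object"): 6,
                ("cold", "Adj-Noun"): 2})
java = Counter({("compile", "Verb-Object"): 4, ("install", "Verb-Object"): 3})

print(cosine_similarity(wine, beer))  # shares "sip" and "drink" contexts
print(cosine_similarity(wine, java))  # no shared contexts
```

Terms that share many contexts score close to 1, which is also why hyponym siblings such as the two planet refinements would look nearly identical under this kind of measure.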

5
Web Semantic Similarity
  • Similarity as a function of web search engine results
  • Maximum Marginal Relevance (MMR) greedy algorithm
    • MMR = argmax_x [(1 - a) * popularity(x) + a * diversity(x)]
    • x = candidate refinement
    • popularity(x) given by recent search logs
    • diversity(x) given by overlapping search results
  • Clustering of terms demonstrates validity
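
The greedy selection on this slide can be sketched as follows. This is a minimal sketch under several assumptions that the slides do not confirm: result overlap is measured with Jaccard similarity over sets of result URLs, diversity(x) is one minus the largest overlap with any refinement already chosen, and the two terms are added with weight a. The function names, popularity scores and URLs are all hypothetical.

```python
def jaccard_overlap(a, b):
    """Fraction of search results shared by two result sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def diversity(x, selected, results):
    """1 minus the highest result overlap with any already selected refinement."""
    if not selected:
        return 1.0
    return 1.0 - max(jaccard_overlap(results[x], results[s]) for s in selected)


def select_refinements(candidates, popularity, results, a, k):
    """Greedily pick k refinements, maximizing the weighted score at each step."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda x: (1 - a) * popularity[x]
            + a * diversity(x, selected, results),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data for the query "amazon".
candidates = ["amazon books", "amazon kindle", "amazon river"]
popularity = {"amazon books": 1.0, "amazon kindle": 0.9, "amazon river": 0.5}
results = {
    "amazon books": {"u1", "u2", "u3"},
    "amazon kindle": {"u1", "u2", "u4"},
    "amazon river": {"u5", "u6", "u7"},
}

print(select_refinements(candidates, popularity, results, a=0.0, k=2))
print(select_refinements(candidates, popularity, results, a=0.8, k=2))
```

With a = 0.0 the selection is driven by popularity alone, so both picks come from the company sense. With a = 0.8 the second pick switches to "amazon river", because its results do not overlap with those of the first pick.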

6
Tools demo
  • http://abstract.homelinux.org:9240/janpf/fp/diversity_demo.php?term=target

7
Tools demo
8
Tools demo
9
A/B Editorial Test
  • Diversity settings: 0.0, 0.3 and 0.8
  • Evaluate utility of refinements
  • Scale: definitely better, slightly better, same
  • 17 editors
  • Mixed results, with high variability

10
Results
  • Problems with increased diversity
    • Editors penalized long refinements
    • Spam and adult terms have artificially high diversity under web semantic similarity
    • More mixed-language results
    • Esoteric refinements
  • Refinement selection should include:
    • Popularity feature
    • Diversity feature
    • Length feature
    • Category classification feature (spam, adult, etc.)?
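
One way the four proposed features might be combined is sketched below. It is purely illustrative: the slides list the features but give no formula, so the linear form, the weight a, the per-word length penalty, the two-word allowance and the category labels are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Refinement:
    text: str
    popularity: float          # assumed normalized to [0, 1]
    diversity: float           # assumed normalized to [0, 1]
    category: str = "general"  # hypothetical classifier label


# Hypothetical categories that are never shown as refinements.
BLOCKED_CATEGORIES = {"spam", "adult"}


def score(r, a=0.3, length_penalty=0.05):
    """Weighted popularity and diversity, minus a penalty for long refinements."""
    if r.category in BLOCKED_CATEGORIES:
        return float("-inf")
    # Words beyond the first two are penalized (an arbitrary allowance).
    extra_words = max(0, len(r.text.split()) - 2)
    return (1 - a) * r.popularity + a * r.diversity - length_penalty * extra_words


short = Refinement("amazon river", popularity=0.5, diversity=0.9)
long_one = Refinement("embraer erj 145 amazon", popularity=0.5, diversity=0.9)
print(score(short) > score(long_one))  # the longer refinement scores lower
```

Filtering blocked categories outright, rather than down-weighting them, is only one option. The question mark on the slide suggests the authors had not settled how the category feature should be used.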