Mining Anchor Text for Query Refinement

1
Mining Anchor Text for Query Refinement
Reiner Kraft and Jason Zien, IBM Almaden Research Center
Mark Strohmaier
2
Problem Motivation
  • 23% of search queries are single-term
  • Expanding the query can lead to more accurate
    searches
  • Previous studies indicate that anchor text is
    statistically similar to search queries
  • Can this similarity be exploited to improve
    search queries?

3
What is anchor text?
  • <a href="this is the website">This is the anchor text</a>
  • Destination pages can have multiple links
    pointing to them
  • Collections of anchor text can give a view of the
    destination page
  • Naïve approach:
      • Find links whose anchor text is similar to the
        query
      • Return the links' destination pages to the user
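
The naïve approach on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the `AnchorCollector` class, the `naive_search` function, the example URLs, and the choice of "similar" as "anchor text contains every query term" are all assumptions made for the sketch.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects anchor texts, grouped by the destination URL they point to."""

    def __init__(self):
        super().__init__()
        self.anchors = defaultdict(list)  # destination URL -> list of anchor texts
        self._href = None                 # href of the <a> currently open, if any
        self._parts = []                  # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._parts = []

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so "HR   benefits" and "HR benefits" match.
            text = " ".join("".join(self._parts).split())
            if text:
                self.anchors[self._href].append(text)
            self._href = None


def naive_search(anchors, query):
    """Return destinations having an anchor text that contains every query term.

    Assumption: "similar" is approximated by exact term containment.
    """
    terms = query.lower().split()
    return sorted(
        url
        for url, texts in anchors.items()
        if any(all(t in text.lower().split() for t in terms) for text in texts)
    )


# Hypothetical input: three links, two of which point to the same page.
html = (
    '<a href="http://example.com/hr">Human Resources home</a> '
    '<a href="http://example.com/hr">HR benefits</a> '
    '<a href="http://example.com/travel">Travel booking</a>'
)
collector = AnchorCollector()
collector.feed(html)
results = naive_search(collector.anchors, "benefits")
```

Grouping anchor texts by destination is what gives the "view of the destination page" mentioned above: each page ends up described by every label other pages used when linking to it.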

4
Problems with naïve approach
  • High term frequency is not directly related to
    page quality
  • Repeated terms may lead to unnatural queries
  • IDF is not necessarily relevant
  • Anchor text may appear multiple times

5
Methods of Query Refinement
  • Weighting the number of occurrences
  • Weight based on the type of anchor text
  • Number of terms in the anchor text
      • Fewer terms are better
  • Number of characters in the anchor text
      • More concise queries are better

6
Benefits of the Anchor Text
  • There is much less anchor text than document text
  • Pages can have many incoming links
  • Refined anchor text can capture a degree of site
    popularity

7
Mining Anchor Text
  • Initial web crawl covered 33 million links on IBM
    intranet
  • Additionally, roughly 350,000 queries were
    analyzed
  • Both categories showed a similar relationship
    between length and number of occurrences

8
Pre-processing Summaries
  • Query refinement is sensitive to the number of
    terms
  • Too few may not lead to much improvement
  • Too many may lead to overspecialization

Best results were for MAXCOUNT = 3
9
Studies Performed
  • Three different approaches were compared:
      • ANCHOR
          • Ranked anchor text refinement
      • Doc.SW
          • Ranked pages based on the most frequently
            occurring two- and three-term phrases
      • DOC
          • Similar to Doc.SW, but not counting stop words
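
As a rough illustration of the two document-based baselines, the sketch below counts the most frequent two- and three-term phrases in a text, with and without stop words. The function name `top_phrases`, the tiny `STOP_WORDS` set, and the decision to drop stop words before forming phrases are assumptions for this example; the slides do not say how the original systems handled these details.

```python
from collections import Counter

# Illustrative stop-word list only; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in", "for"}


def top_phrases(text, skip_stop_words=False, k=3):
    """Return the k most frequent 2- and 3-term phrases as (phrase, count) pairs."""
    words = text.lower().split()
    if skip_stop_words:
        # Assumption: stop words are removed before phrases are formed.
        words = [w for w in words if w not in STOP_WORDS]
    counts = Counter()
    for n in (2, 3):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)


# Hypothetical document text.
doc = "the travel policy and the travel form and the travel policy"
with_stop_words = top_phrases(doc, skip_stop_words=False, k=1)
without_stop_words = top_phrases(doc, skip_stop_words=True, k=1)
```

On this sample, keeping stop words makes "the travel" the top phrase, while ignoring them surfaces "travel policy", which reads more like something a user would type.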

10
Ranking Anchor Texts
  • The results are ranked based on:
      • WCOUNT score
      • Number of terms in the anchor summary
      • Number of characters in the anchor summary
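
The three ranking criteria can be read as a single sort key. The sketch below is one possible interpretation, not the authors' code: it assumes WCOUNT is a weighted occurrence count, that a higher WCOUNT ranks first, and that ties are broken by fewer terms and then fewer characters. The `TYPE_WEIGHTS` values and the "external"/"internal" anchor types are hypothetical placeholders, since the slides do not list the actual types or weights.

```python
from collections import defaultdict

# Hypothetical weights per anchor type; the real values are not given in the slides.
TYPE_WEIGHTS = {"external": 1.0, "internal": 0.5}


def rank_anchor_summaries(occurrences):
    """Rank anchor texts from (anchor_text, anchor_type) occurrence pairs.

    Order: highest weighted count first, then fewest terms, then fewest characters.
    """
    wcount = defaultdict(float)
    for text, anchor_type in occurrences:
        # Unknown anchor types fall back to a weight of 1.0 (an assumption).
        wcount[text] += TYPE_WEIGHTS.get(anchor_type, 1.0)
    return sorted(wcount, key=lambda t: (-wcount[t], len(t.split()), len(t)))


# Hypothetical anchor texts pointing at pages that matched a query.
occurrences = [
    ("hr benefits", "external"),
    ("hr benefits", "internal"),
    ("benefits", "external"),
    ("benefits", "internal"),
    ("human resources benefits page", "external"),
]
ranked = rank_anchor_summaries(occurrences)
```

Here "benefits" and "hr benefits" have the same weighted count, so the shorter one is ranked first, in line with the preference for concise refinements.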

11
Comparison of Methods
  • Second comparison tested 22 different queries
  • QUERYLOG processes and dynamically updates user
    queries based on previous ones, in a manner
    similar to ANCHOR

12
Conclusions
  • Using anchor text leads to better results than
    performing similar methods on document
    collections
  • A similar approach can be used to refine user
    search queries as well

13
Future Directions
  • Broadening search queries
  • Lexical analysis, rather than straight textual
    analysis
  • Pre- and post-anchor text