Relevance Propagation for Web Search

1
Relevance Propagation for Web Search
  • Dr. Tie-Yan Liu
  • Web Search and Mining Group
  • Microsoft Research Asia
  • Joint Work with Tao Qin, Tsinghua University.

2
Outline
  • Introduction
  • Generic framework for relevance propagation
  • Evaluations
  • Effectiveness analysis
  • Complexity analysis
  • Conclusions

3
Introduction
  • Web search ≠ information retrieval
  • Besides content relevance, various kinds of
    structural information also play an important
    role in Web search
    • Hyperlink graph
    • Local sitemap
    • Webpage layout

4
Introduction
  • Three ways of utilizing structure information
    for Web search
    • Linear combination of content relevance and
      importance scores computed from the hyperlink
      graph
      • β × Relevance + (1 − β) × PageRank
    • Enhance link analysis with the help of content
      relevance
      • Query-dependent link graph in HITS
      • Topic-sensitive PageRank
    • Propagate content relevance along the Web
      structure
      • The use of anchor text in search engines
      • Hyperlink-based relevance score propagation
        (TREC 2003)
      • Sitemap-based feature propagation (TREC 2004)
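
The linear combination listed as the first approach above can be sketched in a few lines. This is only an illustration: the function names, the default value of β, and the assumption that both scores are already normalized to comparable ranges are hypothetical and do not come from the slides.

```python
def combine_scores(relevance, pagerank, beta=0.8):
    """Linearly combine content relevance with a link-based importance score.

    beta = 1.0 uses content relevance only; beta = 0.0 uses PageRank only.
    Both inputs are assumed (hypothetically) to be on comparable scales.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * relevance + (1.0 - beta) * pagerank


def rank_pages(relevance_by_page, pagerank_by_page, beta=0.8):
    """Return page ids sorted by combined score, highest first."""
    combined = {
        page: combine_scores(rel, pagerank_by_page.get(page, 0.0), beta)
        for page, rel in relevance_by_page.items()
    }
    return sorted(combined, key=combined.get, reverse=True)
```

With β = 1 the ranking reduces to pure content relevance, so sweeping β is one simple way to see how much weight the link-based score carries.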

5
Hyperlink-based Relevance Score Propagation (Zhai
et al., TREC 2003)
  • Assumption
  • Hyperlinked pages have correlated content

[Figure: diagram of a page's links and outlinks]

6
Hyperlink-based Relevance Score Propagation (Zhai
et al., TREC 2003)
  • Assumption
  • Hyperlinked pages have correlated content
  • Propagation model
  • Weighted inlink model
  • Weighted outlink model
  • Uniform outlink model
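
The slides name the propagation models without giving their formulas in this transcript. As a rough sketch of what a weighted inlink model could look like, the code below assumes an update of the form h_new(p) = α · s(p) + (1 − α) · Σ w(q, p) · h(q), summed over pages q that link to p. The update rule, the parameter `alpha`, the default uniform weights, and all names are assumptions made for illustration, not the formulation used in the TREC 2003 work.

```python
def propagate_inlink(scores, inlinks, weights=None, alpha=0.7, iterations=10):
    """Iteratively propagate relevance scores along inlinks (assumed model).

    scores:   page -> initial content relevance (defines the working set)
    inlinks:  page -> list of pages that link to it
    weights:  optional (source, target) -> link weight; when omitted, each
              inlink of a page gets the uniform weight 1 / number of inlinks
    """
    current = dict(scores)
    for _ in range(iterations):
        updated = {}
        for page, base in scores.items():
            # Only links inside the working set take part in propagation.
            sources = [q for q in inlinks.get(page, []) if q in scores]
            received = 0.0
            for q in sources:
                if weights is not None:
                    w = weights[(q, page)]
                else:
                    w = 1.0 / len(sources)
                received += w * current[q]
            updated[page] = alpha * base + (1.0 - alpha) * received
        current = updated
    return current
```

Passing a map from each page to the pages it links to, instead of the pages linking to it, gives an outlink-style variant under the same assumptions.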

7
Sitemap-based Feature Propagation (Liu and Qin,
TREC 2004)
  • Assumption
  • Child pages are extensions of their parent page
  • One should consider the contribution of the child
    pages while computing the relevance of the parent
    page to a query.
  • Propagation model
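
The propagation formula itself is not preserved in this transcript. A minimal sketch of the idea, assuming each parent page adds a damped average of its direct children's features (for example, term frequencies) to its own, might look as follows. The averaging rule, the `alpha` parameter, and the data layout are hypothetical.

```python
def propagate_features(features, children, alpha=0.3):
    """One-step feature propagation from child pages to their parent.

    features: page -> {term: feature value}, e.g. term frequencies
    children: page -> list of direct child pages in the sitemap
    Assumed rule: f'(p)[t] = f(p)[t] + alpha * mean over children of f(c)[t]
    """
    propagated = {}
    for page, own in features.items():
        kids = [c for c in children.get(page, []) if c in features]
        merged = dict(own)
        for child in kids:
            for term, value in features[child].items():
                merged[term] = merged.get(term, 0.0) + alpha * value / len(kids)
        propagated[page] = merged
    return propagated
```

Because the propagated features do not depend on any query, a step like this could run at indexing time, before a ranking function is applied.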

8
Generic Relevance Propagation Framework
  • Modification of the sitemap-based feature
    propagation model
  • Reminder of the hyperlink-based propagation model
  • A generic framework to cover both hyperlink-based
    and sitemap-based propagations

9
More Derived Propagation Models

10
Summary: All Models Covered by the Generic
Framework

11
Benchmark Datasets
  • Corpora
    • .GOV
      • 1M pages
      • Queries: TREC topic distillation (TD) 2003
        and 2004
    • MSN
      • 2M pages
      • Queries: the 100 most popular queries from
        the MSN query log
  • Base ranking function
    • BM2500

12
Experimental Results (1)
  • TREC 2003

13
Experimental Results (2)
  • TREC 2004

14
Experimental Results (3)
  • MSN

15
Conclusions on Effectiveness
  • In general, relevance propagation can boost the
    search performance with proper parameter
    settings
  • The sitemap-based models are more effective than
    the hyperlink-based models
  • Hyperlinks do not necessarily imply content
    correlation, while pages in the same sub-site
    usually discuss correlated topics.
  • Detailed comparisons
  • The two sitemap-based models have similar
    performance.
  • Among the hyperlink-based models, the HF-WI model
    performs best.

16
Online Complexity
  • w is the size of the working set, q is the number
    of query terms, l is the average number of
    inlinks / outlinks, t is the number of
    iterations.
  • For the SS (sitemap-based, score-level) model, the
    complexity is O(w).
    • The SS model needs to propagate the relevance
      score of a page to its parent only once if we
      conduct the propagation from the leaf nodes in a
      bottom-up manner.
  • For the SF (sitemap-based, feature-level) model,
    the complexity is O(qw).
  • For the HS (hyperlink-based, score-level) models,
    the complexity is O(twl).
    • In each of the t iterations of the HS models, we
      need to propagate the relevance score of a page
      along its inlinks or outlinks in the subgraph of
      the working set.
  • For the HF (hyperlink-based, feature-level) models,
    the complexity is O(tqwl).
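
The bottom-up argument for the O(w) bound can be illustrated with a small sketch. It assumes the working set forms a tree (or forest) given as a child-to-parent map, and that a parent adds a damped average of its children's already-propagated scores; the update rule and the `alpha` parameter are hypothetical. The point is the traversal: every page hands its score to its parent exactly once.

```python
def propagate_bottom_up(scores, parent, alpha=0.5):
    """Propagate scores from leaves to roots in a single pass.

    scores: page -> relevance score (defines the working set)
    parent: page -> parent page; roots are absent or map to None
    Returns (propagated scores, number of child-to-parent transfers).
    """
    children = {}
    roots = []
    for page in scores:
        par = parent.get(page)
        if par in scores:
            children.setdefault(par, []).append(page)
        else:
            roots.append(page)

    # Pre-order walk from the roots; reversing it visits children first.
    order = []
    stack = list(roots)
    while stack:
        page = stack.pop()
        order.append(page)
        stack.extend(children.get(page, []))

    result = dict(scores)
    transfers = 0
    for page in reversed(order):
        kids = children.get(page, [])
        for child in kids:
            # Assumed rule: add a damped average of the children's scores.
            result[page] += alpha * result[child] / len(kids)
            transfers += 1
    return result, transfers
```

The transfer count equals the number of non-root pages in the working set, which is at most w.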

17
Online Complexity
  • The sitemap-based models are more efficient than
    the hyperlink-based models
  • The score-level propagation models are faster
    than the feature-level models
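
To make the comparison concrete, the leading-order costs O(w), O(qw), O(twl), and O(tqwl) can be evaluated for sample parameter values. The numbers used in the usage note are hypothetical, and constant factors are ignored, so this only shows relative growth.

```python
def online_cost(model, w, q, l, t):
    """Leading-order operation count per query for each model family.

    w: working-set size, q: number of query terms,
    l: average number of inlinks/outlinks, t: number of iterations.
    """
    costs = {
        "SS": w,              # sitemap-based, score-level
        "SF": q * w,          # sitemap-based, feature-level
        "HS": t * w * l,      # hyperlink-based, score-level
        "HF": t * q * w * l,  # hyperlink-based, feature-level
    }
    return costs[model]


def rank_models_by_cost(w, q, l, t):
    """Return the model names ordered from cheapest to most expensive."""
    models = ["SS", "SF", "HS", "HF"]
    return sorted(models, key=lambda m: online_cost(m, w, q, l, t))
```

For example, with the made-up values w = 1000, q = 3, l = 10, t = 5, the counts are 1000, 3000, 50000, and 150000 for SS, SF, HS, and HF respectively.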
18
Offline Complexity
  • Score-level propagation is very difficult to
    implement offline
    • The score can only be computed online, with
      respect to the query.
  • For feature-level propagation:
    • The time complexity of the SF model for offline
      implementation is acceptable
      • 62.2 hours, or 2.6 days, to re-index 8 billion
        pages
    • The time complexity of the HF model is
      prohibitive
      • 1083 hours, or 45 days, to re-index 8 billion
        pages
    • The SF model is easy to implement in parallel,
      while a parallel implementation of the HF model
      is non-trivial
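
The hour-to-day conversions quoted above are easy to check, and the same arithmetic gives the ratio between the two offline costs. The helper name is hypothetical; the hour figures are the ones stated on the slide.

```python
def hours_to_days(hours):
    """Convert a duration in hours to days."""
    return hours / 24.0


sf_hours = 62.2    # SF model, re-indexing 8 billion pages
hf_hours = 1083.0  # HF model, re-indexing 8 billion pages

print(round(hours_to_days(sf_hours), 1))  # 2.6
print(round(hours_to_days(hf_hours), 1))  # 45.1
print(round(hf_hours / sf_hours, 1))      # 17.4
```

On these figures the HF model takes roughly 17 times as long as the SF model for the same re-indexing job.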

19
Conclusions of this Study
  • Generally speaking, relevance propagation can
    boost the performance of web information
    retrieval.
  • Sitemap-based propagation models outperform
    hyperlink-based propagation models in terms of
    both effectiveness and efficiency. Notably,
    sitemap-based propagation can be implemented in
    parallel.
  • Score-level propagation and feature-level
    propagation have similar effectiveness.
    Although the former is more efficient in online
    implementations, it is not practical for
    real-world search engines because it cannot be
    implemented offline.
  • Overall, the sitemap-based feature propagation
    model is the best choice for real search
    engines.

20
Thanks!
  • tyliu_at_microsoft.com
  • http://research.microsoft.com/users/tyliu/