Exploiting the Hierarchical Structure for Link Analysis

Presented by: Xiaoguang Qi. 2005-10-18.

1
Exploiting the Hierarchical Structure for Link
Analysis
  • Gui-Rong Xue, Qiang Yang, Hua-Jun Zeng, Yong Yu,
    Zheng Chen
  • Presented by Xiaoguang Qi

2
Introduction
  • Existing link analysis algorithms often suffer
    from two problems:
  • Sparsity of the link graph
  • Biased ranking of newly emerging pages
  • Incorporate the inherent hierarchical structure
    of the web into link analysis to deal with these
    problems

3
Sketch of Hierarchical Ranking Algorithm
  1. Web pages are aggregated based on their
    hierarchical structure at directory, host or
    domain level
  2. Link analysis is performed on the aggregated
    graph
  3. The importance of each node on the aggregated
    graph is distributed to the individual pages
    belonging to that node
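
Step 1 can be illustrated with a small helper that maps a page URL to its supernode at a chosen aggregation level. This is a minimal sketch, not the authors' implementation: the function name `supernode_key`, the example URL, and the naive "last two labels" rule for domains are all assumptions made for illustration.

```python
from urllib.parse import urlsplit
import posixpath


def supernode_key(url: str, level: str = "host") -> str:
    """Map a page URL to the identifier of the supernode that contains it."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if level == "host":
        return host
    if level == "domain":
        # Naive assumption: the domain is the last two labels of the host.
        return ".".join(host.split(".")[-2:])
    if level == "directory":
        # Host plus the directory part of the path.
        directory = posixpath.dirname(parts.path) or "/"
        return host + directory
    raise ValueError(f"unknown aggregation level: {level}")


url = "http://www.cs.example.edu/people/alice/index.html"
for level in ("directory", "host", "domain"):
    print(level, supernode_key(url, level))
```

A real crawler would likely need a public-suffix list for domain-level grouping; the two-label rule is only good enough for a demonstration.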

4
Two-Layer Hierarchical Graph
  • Upper-layer graph
  • Partition the page set on a certain level
  • One supernode for each partition
  • Edges between supernodes are weighted
  • Weight(Si → Sj) = number of links from pages in Si to
    pages in Sj
  • Lower-layer graph
  • All the pages within a supernode are organized in
    a hierarchical structure based on the URL
    relationship
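
The upper-layer edge weights can be sketched as a simple count over page-level links. The code below assumes host-level partitioning and assumes that links staying inside one supernode are left to the lower layer; `build_supernode_graph` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def build_supernode_graph(page_links):
    """Count page-level links between supernodes (hosts in this sketch)."""
    weights = Counter()
    for src, dst in page_links:
        s_i = urlsplit(src).hostname
        s_j = urlsplit(dst).hostname
        # Assumption: intra-supernode links are not upper-layer edges.
        if s_i != s_j:
            weights[(s_i, s_j)] += 1
    return weights


links = [
    ("http://a.example/x.html", "http://b.example/"),
    ("http://a.example/y.html", "http://b.example/docs/"),
    ("http://a.example/x.html", "http://a.example/y.html"),
    ("http://b.example/", "http://a.example/x.html"),
]
print(build_supernode_graph(links))
```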

5
Hierarchical Random Walk Model
  • Surf on the lower-layer graph
  • Go to another page within current supernode
  • Surf on the upper-layer graph
  • Follow a link originated from current supernode
  • Jump to a random supernode

6
Calculating Supernode Importance
  • Supernode importance
  • In matrix form
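
The formulas on this slide are not present in the transcript. As a stand-in, the sketch below runs a standard weighted PageRank power iteration over the supernode graph, which matches the "follow a link or jump to a random supernode" walk only in spirit and may differ from the authors' exact formulation. The damping value 0.85 and the function name are assumptions.

```python
def supernode_importance(weights, damping=0.85, iterations=100):
    """Weighted PageRank over a dict {(src, dst): link_count}."""
    nodes = sorted({n for edge in weights for n in edge})
    out_total = {n: 0.0 for n in nodes}
    for (src, _), w in weights.items():
        out_total[src] += w

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Mass held by supernodes without out-links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in weights.items():
            # Follow an out-link with probability proportional to its weight.
            new_rank[dst] += damping * rank[src] * w / out_total[src]
        rank = new_rank
    return rank


weights = {("a", "b"): 3, ("a", "c"): 1, ("b", "a"): 1, ("c", "a"): 1}
print(supernode_importance(weights))
```

The scores sum to 1, so they can be read as the stationary visit probabilities of the walk over supernodes.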

7
Calculating Page Importance
  • Constructing weighted tree structure
  • Calculating page importance by DHC
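
The transcript does not describe the DHC computation itself, so the sketch below deliberately substitutes a much simpler rule: a supernode's importance flows down its weighted URL tree, each internal node keeps a fixed share, and the rest is split among children in proportion to edge weight. The `keep` parameter, the tree encoding, and the function name are all assumptions and should not be read as the DHC model.

```python
def distribute(tree, weights, root, importance, keep=0.5):
    """Split `importance` over the pages of one supernode, top-down."""
    scores = {}
    stack = [(root, importance)]
    while stack:
        node, amount = stack.pop()
        children = tree.get(node, [])
        if not children:
            scores[node] = amount  # leaves keep everything they receive
            continue
        scores[node] = keep * amount
        total = sum(weights[(node, child)] for child in children)
        for child in children:
            share = (1 - keep) * amount * weights[(node, child)] / total
            stack.append((child, share))
    return scores


# Hypothetical URL tree of a single supernode, keyed by path.
tree = {"/": ["/a/", "/b.html"], "/a/": ["/a/x.html"]}
weights = {("/", "/a/"): 3, ("/", "/b.html"): 1, ("/a/", "/a/x.html"): 1}
print(distribute(tree, weights, "/", 1.0))
```

Because every node passes on exactly what it does not keep, the page scores add up to the supernode's importance.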

8
Parameter Tuning
  • Aggregation level
  • Host level aggregation is the best choice
  • Parameter tuning
  • ? = 0.6
  • α = 0.6
  • β = 0.4
  • ? = 0.8

9
Experimental Results
  • Hierarchical ranking algorithm consistently
    outperforms other well-known ranking algorithms
  • BM2500, BlockRank, PageRank, LayerRank,
    WeightedRank, HostRank
  • Ranking on sparse data
  • Effectively alleviates the sparse link problem

10
Experimental Results (Cont.)
  • Ranking of new pages
  • Aims to assign a reasonable rank to newly emerging
    web pages
  • Tested in an analogous way
  • Test set: 10,000 pages randomly selected with
    different rank values
  • Remove 90% of their incoming links
  • Perform algorithms on the modified graph
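
The new-page simulation can be sketched as a graph edit that drops most incoming links of each test page before ranking is rerun. The fraction of 0.9, the fixed seed, and the function name `drop_incoming_links` are assumptions for illustration; the slides do not say how the removed links were chosen.

```python
import random


def drop_incoming_links(links, test_pages, fraction=0.9, seed=0):
    """Return a copy of `links` with a fraction of each test page's in-links removed."""
    rng = random.Random(seed)
    test_pages = set(test_pages)

    # Group the positions of incoming links by target test page.
    incoming = {}
    for idx, (_, dst) in enumerate(links):
        if dst in test_pages:
            incoming.setdefault(dst, []).append(idx)

    removed = set()
    for idxs in incoming.values():
        removed.update(rng.sample(idxs, int(len(idxs) * fraction)))
    return [link for idx, link in enumerate(links) if idx not in removed]


links = [(f"p{i}", "t") for i in range(10)] + [("t", "q")]
print(drop_incoming_links(links, ["t"]))
```

Ranking the modified graph and comparing each test page's new position with its original one then shows how well an algorithm handles pages that have few in-links.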