Distributed PageRank Computation Based on Iterative Aggregation-Disaggregation Methods - PowerPoint PPT Presentation

About This Presentation
Title:

Distributed PageRank Computation Based on Iterative Aggregation-Disaggregation Methods

Description:

Yangbo Zhu, Shaozhi Ye and Xing Li, Tsinghua University, Beijing, China

Slides: 35
Provided by: csCmuEdu108
Learn more at: http://www.cs.cmu.edu

1
Distributed PageRank Computation Based on
Iterative Aggregation-Disaggregation Methods
  • Yangbo Zhu, Shaozhi Ye and Xing Li
  • Tsinghua University, Beijing, China
  • ACM CIKM 2005, Bremen

2
Outline
  • Quick Review of PageRank
  • Distributed PageRank Computation
      • Motivation
      • Basic Idea
      • Algorithm
  • Experiments
  • Conclusion and Future Work

3
PageRank - Background
  • Ranking Web pages
      • Content-based methods
      • Link-based methods
          • PageRank (Page and Brin, 1998)
          • HITS (Kleinberg, 1998)
          • SALSA (Lempel and Moran, 2000)

4
PageRank - Intuition
  • A link from page A to page B means that the author
    of A recommends B.
  • A page is of high quality if it is
      • referred to by many other pages
      • referred to by pages of high quality

5
PageRank - Model
  • Random Surfer - Markov Chain
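The slide's formulas did not survive extraction. For reference, here is one common way to write the random-surfer model. The notation is assumed here and may differ from the slides: P is the N x N row-stochastic link matrix with P_ij = 1/outdeg(i) when page i links to page j, e is the all-ones vector, and c is the damping factor. The PageRank vector π is the stationary distribution of the resulting Markov chain G.

```latex
\begin{aligned}
\bar{P} &= P + \tfrac{1}{N}\, d\, e^{T},
  \qquad d_i = \begin{cases} 1 & \text{if page } i \text{ has no out-links} \\ 0 & \text{otherwise} \end{cases} \\
G &= c\,\bar{P} + \tfrac{1-c}{N}\, e\, e^{T}, \qquad 0 < c < 1 \\
\pi^{T} &= \pi^{T} G, \qquad \pi^{T} e = 1
\end{aligned}
```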

6
PageRank - Algorithm
  • Power method
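A minimal pure-Python sketch of the power method follows. It assumes the usual formulation G = c·P̄ + (1 - c)·ee^T/N, with uniform teleportation and with dangling pages treated as linking to every page. The function name `pagerank_power` and the defaults c = 0.85 and tol = 1e-10 are illustrative choices, not taken from the slides. The sketch never builds G explicitly. Each step follows the links and then spreads the dangling and teleport mass uniformly.

```python
def pagerank_power(out_links, c=0.85, tol=1e-10, max_iter=1000):
    """Power method for PageRank on a graph given as out-link lists.

    out_links[i] lists the pages that page i links to. A dangling page
    (no out-links) is treated as linking to every page uniformly, and with
    probability 1 - c the surfer teleports to a uniformly random page.
    Returns (PageRank vector, number of iterations performed).
    """
    n = len(out_links)
    x = [1.0 / n] * n                      # start from the uniform distribution
    for it in range(1, max_iter + 1):
        new = [0.0] * n
        dangling_mass = 0.0
        for i, targets in enumerate(out_links):
            if targets:
                share = c * x[i] / len(targets)
                for j in targets:
                    new[j] += share        # follow a random out-link
            else:
                dangling_mass += x[i]
        # Damped dangling mass plus teleportation, both spread uniformly.
        base = (c * dangling_mass + (1.0 - c)) / n
        new = [v + base for v in new]
        diff = sum(abs(a - b) for a, b in zip(new, x))   # L1 change
        x = new
        if diff < tol:
            return x, it
    return x, max_iter
```

For a 3-cycle, `pagerank_power([[1], [2], [0]])` returns the uniform vector after one iteration, because the uniform start is already stationary.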

7
Outline
  • Quick Review of PageRank
  • Distributed PageRank Computation
      • Motivation
      • Basic Idea
      • Algorithm
  • Experiments
  • Conclusion and Future Work

8
Motivation
  • Compass search engine confederation

9
Motivation (cont.)

10
Basic Idea
  • Divide and conquer
  • Make use of the natural block structure of web
    graphs
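The sketch below makes the block structure concrete. It groups pages by host name, as the experiments group pages by site, and measures what fraction of links stay inside a block. The helper names and the toy URLs are hypothetical. On real crawls the intra-site fraction is typically large, and that locality is what a divide-and-conquer scheme relies on.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def blocks_by_host(urls):
    """Group page ids into blocks, one block per host name."""
    blocks = defaultdict(list)
    for page_id, url in enumerate(urls):
        blocks[urlsplit(url).hostname].append(page_id)
    return dict(blocks)


def intra_block_fraction(urls, links):
    """Fraction of (source, target) links whose endpoints share a host."""
    host = [urlsplit(u).hostname for u in urls]
    internal = sum(1 for src, dst in links if host[src] == host[dst])
    return internal / len(links) if links else 0.0
```

Take two hosts with two pages each, four intra-host links and one cross-host link. `intra_block_fraction` returns 0.8.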

11
DPC Algorithm
  • Step 1 - Initialization
  • Local nodes compute local PageRank vectors.
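Step 1 for a single local node could look like the following sketch. It keeps only the links whose source and target both lie in the node's block, drops links that leave the block, and runs a damped power method on that subgraph. The slide does not say how DPC treats out-of-block links. That choice, the name `local_pagerank` and the default c = 0.85 are all assumptions.

```python
def local_pagerank(block, out_links, c=0.85, tol=1e-10, max_iter=1000):
    """PageRank of one block computed from its internal links only.

    block     -- global ids of the pages owned by this local node
    out_links -- global out-link lists: out_links[i] = pages that i links to
    Returns {global page id: local score}, with scores summing to 1.
    """
    pos = {page: k for k, page in enumerate(block)}   # global id -> local index
    # Assumption of this sketch: links leaving the block are simply dropped.
    local = [[pos[j] for j in out_links[page] if j in pos] for page in block]
    m = len(block)
    x = [1.0 / m] * m
    for _ in range(max_iter):
        new = [0.0] * m
        dangling = 0.0
        for i, targets in enumerate(local):
            if targets:
                share = c * x[i] / len(targets)
                for j in targets:
                    new[j] += share
            else:                                  # no internal out-links
                dangling += x[i]
        base = (c * dangling + 1.0 - c) / m         # dangling + teleport mass
        new = [v + base for v in new]
        diff = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if diff < tol:
            break
    return {page: x[k] for k, page in enumerate(block)}
```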

12
DPC Algorithm (cont.)
  • Step 2 - Aggregation
  • Central node computes the NodeRank vector.
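Here is a sketch of Step 2 under the textbook IAD construction, assumed here. The aggregated n x n matrix weights each block's rows of a row-stochastic Google-style matrix G by the block's normalized score vector. The NodeRank vector is then taken to be the stationary vector of that small matrix. The function names are hypothetical, and the exact DPC formulation may differ.

```python
def coupling_matrix(G, blocks, x):
    """Aggregated n x n matrix:
    A[I][J] = sum_{i in I} (x_i / ||x_I||_1) * sum_{j in J} G[i][j].
    Row I depends only on block I's rows of G and on x_I."""
    n = len(blocks)
    A = [[0.0] * n for _ in range(n)]
    for I, block_I in enumerate(blocks):
        weight = sum(x[i] for i in block_I)
        for i in block_I:
            for J, block_J in enumerate(blocks):
                A[I][J] += (x[i] / weight) * sum(G[i][j] for j in block_J)
    return A


def node_rank(A, tol=1e-12, max_iter=10000):
    """Stationary row vector of the small aggregated matrix (power method)."""
    n = len(A)
    xi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(xi[I] * A[I][J] for I in range(n)) for J in range(n)]
        if sum(abs(a - b) for a, b in zip(new, xi)) < tol:
            return new
        xi = new
    return xi
```

Row I of A uses only block I's rows of G and that block's own vector. In a distributed setting, each local node could therefore compute its row and send only n numbers to the central node.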

13
DPC Algorithm (cont.)
  • Step 3 - Disaggregation
  • Local nodes compute extended local PageRank
    vectors.

(Figure: X denotes the external nodes.)
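Step 3 can be read as in the sketch below, which assumes DPC follows the standard IAD disaggregation plus a Block Jacobi step. First, the NodeRank value ξ_J rescales block J's current vector. Each local node then solves x_I = x_I G_II + b_I, where b_I is the rank flowing in from pages outside the block (the external nodes X). The names `disaggregate` and `block_jacobi_solve` are hypothetical.

```python
def disaggregate(x, blocks, xi):
    """z_k = xi_J * x_k / ||x_J||_1 for every page k in block J."""
    z = [0.0] * len(x)
    for J, block_J in enumerate(blocks):
        weight = sum(x[k] for k in block_J)
        for k in block_J:
            z[k] = xi[J] * x[k] / weight
    return z


def block_jacobi_solve(G, block, z, tol=1e-13, max_iter=100000):
    """Solve x_I = x_I G_II + b_I for one block by fixed-point iteration.

    b_I[j] = sum over pages k outside the block of z[k] * G[k][j], i.e. the
    rank flowing into the block from the external pages. The iteration
    converges when every row of G_II sums to less than 1, which holds for
    a positive Google-style matrix G and a block smaller than the graph.
    """
    inside = set(block)
    b = [sum(z[k] * G[k][j] for k in range(len(G)) if k not in inside) for j in block]
    y = [z[j] for j in block]              # warm start from the disaggregated vector
    for _ in range(max_iter):
        y_next = [sum(y[a] * G[i][j] for a, i in enumerate(block)) + b[col]
                  for col, j in enumerate(block)]
        diff = sum(abs(p - q) for p, q in zip(y_next, y))
        y = y_next
        if diff < tol:
            break
    return y
```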
14
DPC Algorithm (cont.)
  • Step 4 - The central node computes the L1 distance
    between the current global PageRank vector and
    the previous one.
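The central node's side of Step 4 might look like the sketch below. It stitches the local vectors into a global vector, rescales it to sum to 1, and compares it with the previous iterate in the L1 norm. The slides give no stopping threshold, so tol = 1e-8 is an assumption, as are the function names.

```python
def l1_distance(x, y):
    """L1 distance sum_i |x_i - y_i| between two PageRank vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def assemble_and_check(local_parts, blocks, previous, tol=1e-8):
    """Stitch local vectors into one global vector, rescale it to sum to 1,
    and report (vector, L1 change, converged?)."""
    x = [0.0] * len(previous)
    for part, block in zip(local_parts, blocks):
        for value, page in zip(part, block):
            x[page] = value                # place each local score globally
    total = sum(x)
    x = [v / total for v in x]
    dist = l1_distance(x, previous)
    return x, dist, dist < tol
```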

15
Advantages
  • DPC consists mainly of standard PageRank
    computations.
  • The small per-node matrices fit into main memory.
  • Low communication overhead.

16
Outline
  • Quick Review of PageRank
  • Distributed PageRank Computation
      • Motivation
      • Basic Idea
      • Algorithm
  • Experiments
  • Conclusion and Future Work

17
Experimental Setup
  • Simulation on a single Linux box.
  • Group web pages by sites.
  • For comparison
      • Classic power method
      • LPR-Ref-2 algorithm (Wang, VLDB 2004)

18
Data Sets
  • ST01/03 - crawled in 2001/2003 by the Stanford
    WebBase Project
  • CN04 - crawled in 2004 from web sites in China.

19
Evaluation Metrics
  • L1 distance
  • Kendall's τ-distance
      • K(i, j) = 1 if pages i and j are in different
        order in the two ranking lists, and 0 otherwise.
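Both metrics are easy to compute on small examples. The sketch below normalizes the discordant-pair count by N(N-1)/2. That is the usual convention but an assumption here. It also counts pairs tied in either list as concordant. The O(N²) pair loop is for illustration only. Web-scale comparisons would need an O(N log N) method such as Knight's merge-sort algorithm.

```python
from itertools import combinations


def l1_distance(x, y):
    """Sum of absolute differences between two score vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def kendall_tau_distance(scores_a, scores_b):
    """Fraction of page pairs (i, j) that the two score vectors order differently.

    K(i, j) = 1 when the pair is discordant; the sum is normalized by
    N(N-1)/2. Pairs tied in either list count as concordant here.
    """
    n = len(scores_a)
    if n < 2:
        return 0.0
    discordant = sum(
        1
        for i, j in combinations(range(n), 2)
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) < 0
    )
    return discordant / (n * (n - 1) / 2)
```

Swapping one adjacent pair in a ranking of four pages gives a distance of 1/6. Fully reversing a ranking gives 1.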

20
Accuracy of the First Iteration
  • L1
  • Kendall

21
Convergence Rate
  • Number of iterations required for convergence

22
Outline
  • Quick Review of PageRank
  • Distributed PageRank Computation
  • Experiments
  • Conclusion and Future Work

23
Conclusion
  • A distributed PageRank computation algorithm
    based on iterative aggregation-disaggregation
    (IAD) methods with Block Jacobi smoothing.
  • Experiments on real web graphs show that DPC
    outperforms LPR-Ref-2 (Wang, VLDB'04) and
    converges 5-7 times faster than the power method.

24
Future Work
  • Implement DPC in a real distributed system and
    integrate it with the Compass search engine confederation.
  • How can PageRank vectors be updated efficiently
    within the DPC framework?

25
  • Thank you!

26
General PageRank Algorithm

27
IAD Method - Notations
  • Aggregation matrix (n × N)
  • Disaggregation matrix (N × n)
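Here is a small sketch of the two matrices. It assumes the column-vector convention that the stated dimensions suggest. R aggregates a page vector into block sums (ξ = R x). S spreads block values back in proportion to the current vector (z = S ξ). Hence R S = I_n, and for a row-stochastic G the aggregated matrix is S^T G R^T. All function names are hypothetical.

```python
def matmul(A, B):
    """Dense matrix product of nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def transpose(A):
    """Transpose of a nested-list matrix."""
    return [list(row) for row in zip(*A)]


def aggregation_matrix(blocks, N):
    """R (n x N): R[I][i] = 1 if page i belongs to block I, else 0."""
    R = [[0.0] * N for _ in blocks]
    for I, block in enumerate(blocks):
        for i in block:
            R[I][i] = 1.0
    return R


def disaggregation_matrix(blocks, x):
    """S (N x n): S[i][I] = x_i / ||x_I||_1 for page i in block I.
    S depends on the current iterate x, which is one way to see why the
    disaggregation step makes the overall map nonlinear in x."""
    S = [[0.0] * len(blocks) for _ in x]
    for I, block in enumerate(blocks):
        weight = sum(x[i] for i in block)
        for i in block:
            S[i][I] = x[i] / weight
    return S
```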

28
IAD Method
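The equations on this slide are lost. As a stand-in, here is an end-to-end sketch of a textbook IAD iteration with Block Jacobi smoothing on a small dense Google-style matrix. Several things are assumptions: the uniform start, uniform teleportation, c = 0.85, the tolerances shown and all function names. The sketch puts aggregation, disaggregation, smoothing and an L1 convergence test in one loop. It is not the authors' implementation and makes no claim about DPC's exact update rules.

```python
def google_matrix(out_links, c=0.85):
    """Dense row-stochastic matrix G = c * P_bar + (1 - c) * e e^T / N."""
    n = len(out_links)
    G = [[(1.0 - c) / n] * n for _ in range(n)]
    for i, targets in enumerate(out_links):
        if targets:
            for j in targets:
                G[i][j] += c / len(targets)
        else:                              # dangling page: jump uniformly
            for j in range(n):
                G[i][j] += c / n
    return G


def stationary(A, tol=1e-13, max_iter=100000):
    """Stationary row vector of a small row-stochastic matrix (power method)."""
    n = len(A)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(v[i] * A[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new
    return v


def iad_block_jacobi(G, blocks, tol=1e-10, max_outer=200):
    """Iterative aggregation-disaggregation with one Block Jacobi sweep per step.
    Returns (approximate stationary vector of G, number of outer iterations)."""
    N = len(G)
    x = [1.0 / N] * N
    for outer in range(1, max_outer + 1):
        # Aggregation: n x n coupling matrix and its stationary vector.
        weights = [sum(x[i] for i in b) for b in blocks]
        A = [[sum(x[i] / weights[I] * sum(G[i][j] for j in b_J) for i in b_I)
              for b_J in blocks]
             for I, b_I in enumerate(blocks)]
        xi = stationary(A)
        # Disaggregation: rescale each block's shape by its aggregate weight.
        z = [0.0] * N
        for I, b_I in enumerate(blocks):
            for i in b_I:
                z[i] = xi[I] * x[i] / weights[I]
        # Block Jacobi smoothing: solve x_I = x_I G_II + b_I for each block.
        new = [0.0] * N
        for b_I in blocks:
            inside = set(b_I)
            inflow = [sum(z[k] * G[k][j] for k in range(N) if k not in inside)
                      for j in b_I]
            y = [z[j] for j in b_I]
            for _ in range(100000):
                y_next = [sum(y[a] * G[i][j] for a, i in enumerate(b_I)) + inflow[col]
                          for col, j in enumerate(b_I)]
                done = sum(abs(p - q) for p, q in zip(y_next, y)) < 1e-13
                y = y_next
                if done:
                    break
            for value, j in zip(y, b_I):
                new[j] = value
        total = sum(new)
        new = [v / total for v in new]
        # Convergence test on the L1 distance between successive iterates.
        diff = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if diff < tol:
            return x, outer
    return x, max_outer
```

The check below uses three blocks of two pages each and compares the result with a plain power iteration on G.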
29
DPC Algorithm

30
DPC Algorithm (Cont.)

31
DPC Algorithm (Cont.)

32
DPC -Convergence Analysis
  • The global convergence of IAD methods is still an
    open problem.
  • The difficulty comes partly from the fact that the
    disaggregation step is nonlinear.
  • The paper proves the global convergence of the Block
    Jacobi method in the PageRank scenario when n > 2.

33
Experiments - Basic Facts
  • Distribution of the number of pages hosted by sites
    of different sizes
  • Distribution of site sizes

34
Experiments - Communication Overhead
  • Power method vs. LPR-Ref-2 / DPC
  • Pos(·) - Number of positive elements
  • L/U - Block strictly lower/upper triangular part
    of P
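The overhead formulas themselves did not survive, but the quantities they use are easy to count. Assume P is ordered block by block, with rows as link sources. Then Pos(L) counts links from a later block to an earlier one and Pos(U) counts the reverse, so Pos(L) + Pos(U) is the number of inter-block links. The helper `block_off_diagonal_counts` is hypothetical.

```python
def block_off_diagonal_counts(out_links, block_of):
    """Return (Pos(L), Pos(U)) for a link matrix P whose rows are link sources
    and whose pages are ordered block by block.

    block_of[i] is the index of the block that page i belongs to.
    """
    pos_L = pos_U = 0
    for i, targets in enumerate(out_links):
        for j in set(targets):                 # each positive entry P[i][j] counts once
            if block_of[i] > block_of[j]:
                pos_L += 1                     # entry in the strictly lower block part
            elif block_of[i] < block_of[j]:
                pos_U += 1                     # entry in the strictly upper block part
    return pos_L, pos_U
```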