Exploiting Web Matrix Permutations to Speedup PageRank Computation

Provided by: kayanar

1
Exploiting Web Matrix Permutations to Speedup
PageRank Computation

Presented by Aries Chan, Cody Lawson, and
Michael Dwyer

2
Introduction

Internet Statistics
151 million active Internet users as of January
2004
76% used a search engine at least once during the
month
Average time spent searching was about 40 minutes

3
Introduction

Search Engines
Most common means of accessing Web
Easiest method of organizing and accessing
information
Therefore, high quality / usable search engines
are very important

Rank  Provider                      Searches (000)  Share of Searches (%)
-     All Search                    10,812,734      100.0
1     Google Search                 6,986,580       64.6
2     Yahoo! Search                 1,726,060       16.0
3     MSN/Windows Live/Bing Search  1,156,415       10.7

4
Introduction

Basic Idea of a Search Engine
Scanning Web Graph
Using crawlers to create an index of the
information
Ranking
Organizing and ordering information in a usable
way

5
Introduction

Webpage Ranking Problems
Personalization
http://www.google.com/history/?hl=en
Updating: keeping the order up to date
25% of links change in one week
5% of content is new in one week
35% of the entire web changed in eleven weeks
Reducing Computation Time

6
Introduction

Accelerating the PageRank
Use compression to fit the Web Graph into main
memory
Use sequence of scans of the Web Graph to
efficiently compute in external memory
Reduce computation time through numerical methods

7
Introduction

Reduce computation time through numerical methods
Use iterative methods such as the Gauss-Seidel and
Jacobi methods (Arasu et al. and Bianchini et al.)
Subtract off estimates of non-principal
eigenvectors (Kamvar et al.)
Sort the graph lexicographically by URL, resulting
in an approximate block structure (Kamvar et al.)
Split into two problems: dangling nodes and
non-dangling nodes (Lee et al.)
Others have been working on ways to only update
the ranks of nodes influenced by Web Graph changes

8
Introduction

Del Corso, Gulli, and Romani (authors of the
paper)
Numerical optimization of PageRank
View the PageRank computation as a linear system
Transform a dense problem into one which uses a
sparse matrix
Treatment of dangling nodes which naturally
adapts to the random surfer model
Exploiting web matrix permutations
Increase data locality and reduce number of
iterations necessary to solve the problem

9
Google's PageRank Overview

Web as an oriented graph
The random surfer
$v_i = \sum_j p_{ji} v_j$
(sum of PageRanks of nodes linking to i weighted
by the transition probability)
Equilibrium distribution
$v^T = v^T H$ (left eigenvector of $H$ with eigenvalue 1)

10
Problems with the ideal model

Dangling nodes (trap the user)
Impose a random jump to every other page
$B = H + a u^T$
Cyclic paths in the Web Graph (reducibility)
Brin and Page suggested adding artificial
transitions (low probability jump to all nodes)
$G = \alpha B + (1 - \alpha) e u^T$

11
Current PageRank Solution

Since G is just a rank one modification of $\alpha H$,
the power method takes advantage of the sparsity
of matrix H.
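
As a minimal sketch of why the power method only needs the sparse part, the step $v^T G$ can be computed from the link lists alone, with the rank-one terms folded into a single scalar. The function name, the uniform choice of u, and the 4-page graph are assumptions made for illustration; they are not taken from the slides.

```python
def pagerank_power(out_links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration v^T <- v^T G using only the sparse link lists.

    out_links[i] lists the pages that page i links to (assumed distinct);
    an empty list marks a dangling node. u is assumed uniform.
    """
    n = len(out_links)
    u = [1.0 / n] * n
    v = u[:]
    for _ in range(max_iter):
        new = [0.0] * n
        dangling_mass = 0.0
        for i, targets in enumerate(out_links):
            if targets:
                share = v[i] / len(targets)  # row i of H holds 1/outdeg(i)
                for j in targets:
                    new[j] += share
            else:
                dangling_mass += v[i]        # accumulates v^T a
        # v^T G = alpha v^T H + (alpha v^T a + (1 - alpha) v^T e) u^T
        jump = alpha * dangling_mass + (1.0 - alpha) * sum(v)
        new = [alpha * x + jump * uj for x, uj in zip(new, u)]
        diff = sum(abs(x - y) for x, y in zip(new, v))
        v = new
        if diff < tol:
            break
    return v

# Hypothetical 4-page graph; page 3 is dangling.
links = [[1, 2], [2], [0, 3], []]
ranks = pagerank_power(links)
```

Each iteration costs one pass over the links plus O(n) work, so the dense matrix G is never formed.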

12
Google's PageRank eigenproblem as a linear system

Expand
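$v^T G = v^T$ (eigenproblem)
$G = \alpha (H + a u^T) + (1 - \alpha) e u^T$
(expansion of Google matrix)
$\Rightarrow v^T (\alpha H + \alpha a u^T) + (1 - \alpha) v^T e u^T = v^T$
Restructure
Taking $S = I - \alpha H^T - \alpha u a^T$
And with $v^T e = 1$
$\Rightarrow S v = (1 - \alpha) u$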
(after taking transpose and rearranging)

13
Dangling Nodes Problems

Dangling Nodes
Pages with no links to other pages
Pages whose existence is inferred but crawler has
not reached
According to a 2001 sample, approximately 76% of
pages are dangling nodes.

14
Dangling Nodes Problems

Natural Model
$B = H + a u^T$
Jump with probability 1 to any other node
Drastic Model
Completely removes dangling nodes
Problems
Dangling nodes themselves are not ranked
Removing nodes creates new dangling nodes
Self-loop Model, e.g.
$B = H + F$
$F_{ij} = 1$ if $i = j$ and $i$ is dangling, 0 otherwise
Still row stochastic and similar to the natural model
Problem
Gives unreasonably high rank to the dangling nodes
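
To make the difference between the natural and self-loop models concrete, the following sketch builds the rows of B for a tiny graph under both. The function name, the uniform u, and the example graph are illustrative assumptions; with uniform u the natural model's jump also includes the dangling page itself.

```python
def transition_rows(out_links, model="natural"):
    """Return B as a dense list of rows (fine for a toy graph only)."""
    n = len(out_links)
    rows = []
    for i, targets in enumerate(out_links):
        row = [0.0] * n
        if targets:
            for j in targets:                 # ordinary row of H
                row[j] = 1.0 / len(targets)
        elif model == "natural":
            row = [1.0 / n] * n               # row of a u^T, uniform u assumed
        elif model == "self-loop":
            row[i] = 1.0                      # F_ii = 1 for a dangling page
        rows.append(row)
    return rows

# Hypothetical 4-page graph; page 3 is dangling.
links = [[1, 2], [2], [0, 3], []]
natural = transition_rows(links, "natural")
self_loop = transition_rows(links, "self-loop")
```

Both variants keep every row summing to 1, but the natural model turns the dangling row into a full row, which is the density problem raised on the next slide.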

15
Which model is the best?

Natural model
the most accurate.
Problem
Gives a much more dense matrix B
It is at least partially for this reason that the
problem is approached as an eigenproblem to
exploit the sparsity of H
Can we have an equally lightweight iterative
approach?

16
Iterative Approach with Sparsity

Sparse matrix R
$R = I - \alpha H^T$
PageRank v obtained by solving
$R y = u$, $v = \gamma y$ such that $\|v\|_1 = 1$
Why does this work?
Since $S = R - \alpha u a^T$
and $S v = (1 - \alpha) u$
$(R - \alpha u a^T) v = (1 - \alpha) u$
Use Sherman-Morrison formula to calculate the
inverse

17
Iterative Approach with Sparsity

We get
$(R - \alpha u a^T)^{-1} = R^{-1} + \frac{R^{-1} u a^T R^{-1}}{1/\alpha - a^T R^{-1} u}$
$v = (1 - \alpha) \left( 1 + \frac{a^T y}{1/\alpha - a^T y} \right) y$, where $y = R^{-1} u$
$v = \gamma y$
$\gamma = (1 - \alpha) \left( 1 + \frac{a^T y}{1/\alpha - a^T y} \right)$
The vector v is our PageRank vector and was
solved using sparse matrix R

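
The Sherman-Morrison result can be checked numerically with a short sketch: solve $Ry = u$ using only the sparse links, then rescale by $\gamma$. The fixed-point solver below (y = u + alpha H^T y) is a stand-in chosen for brevity, and the function name, uniform u, and example graph are assumptions for illustration.

```python
def solve_sparse_system(out_links, alpha=0.85, tol=1e-12, max_iter=2000):
    """Solve R y = u with R = I - alpha H^T by iterating y <- u + alpha H^T y."""
    n = len(out_links)
    u = [1.0 / n] * n                          # uniform u (assumption)
    y = u[:]
    for _ in range(max_iter):
        new = u[:]
        for i, targets in enumerate(out_links):
            if targets:
                share = alpha * y[i] / len(targets)
                for j in targets:
                    new[j] += share            # adds alpha * (H^T y)_j
        diff = sum(abs(p - q) for p, q in zip(new, y))
        y = new
        if diff < tol:
            break
    return y

# Hypothetical 4-page graph; page 3 is dangling.
links = [[1, 2], [2], [0, 3], []]
alpha = 0.85
y = solve_sparse_system(links, alpha)

# a^T y is the sum of y over the dangling pages.
aTy = sum(y[i] for i, targets in enumerate(links) if not targets)
gamma = (1 - alpha) * (1 + aTy / (1 / alpha - aTy))
v = [gamma * yi for yi in y]                   # PageRank vector
```

The dangling-node correction enters only through the scalar $\gamma$, so the iteration itself never touches the dense term $u a^T$.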
18
Exploiting Web Matrix Permutations

Use a variety of cheap operators to permute the
web graph in an organized way, in the hope of
increasing data locality
reducing the number of iterations needed to solve
the problem
Explore different iterative methods in order to
solve the problem quicker.
Compare the performance of different iterative
methods based on specifically permuted web
graphs.

19
Permutation Strategies

The following operations were chosen based on
their limited impact on the computational cost
(Del Corso et al.)
O - orders nodes by increasing out-degree
Q - orders nodes by decreasing out-degree
X - orders nodes by increasing in-degree
Y - orders nodes by decreasing in-degree
B - ordering the nodes according to their BFS
(Breadth First Search) order of visit
T - transposes the matrix
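
The operators listed above can be sketched on an adjacency-list graph as follows. All function names and the example graph are illustrative assumptions, and the slides do not say where the BFS starts or how ties are broken, so the sketch starts at node 0, restarts at the lowest unvisited node, and keeps ties in their original order.

```python
from collections import deque

def in_degrees(out_links):
    deg = [0] * len(out_links)
    for targets in out_links:
        for j in targets:
            deg[j] += 1
    return deg

def degree_order(degrees, decreasing=False):
    # Stable sort: nodes with equal degree keep their original relative order.
    return sorted(range(len(degrees)), key=lambda i: degrees[i], reverse=decreasing)

def bfs_order(out_links):
    n = len(out_links)
    seen = [False] * n
    order = []
    for start in range(n):            # restart so unreachable nodes are covered
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            order.append(i)
            for j in out_links[i]:
                if not seen[j]:
                    seen[j] = True
                    queue.append(j)
    return order

def transpose(out_links):
    reversed_links = [[] for _ in out_links]
    for i, targets in enumerate(out_links):
        for j in targets:
            reversed_links[j].append(i)
    return reversed_links

def relabel(out_links, order):
    # order[k] is the old index of the node that becomes node k.
    position = {old: new for new, old in enumerate(order)}
    return [sorted(position[j] for j in out_links[old]) for old in order]

# Hypothetical 4-page graph; page 3 is dangling.
links = [[1, 2], [2], [0, 3], []]
out_deg = [len(t) for t in links]
O = degree_order(out_deg)                             # increasing out-degree
Q = degree_order(out_deg, decreasing=True)            # decreasing out-degree
X = degree_order(in_degrees(links))                   # increasing in-degree
Y = degree_order(in_degrees(links), decreasing=True)  # decreasing in-degree
B = bfs_order(links)                                  # BFS order of visit
T = transpose(links)                                  # reversed links
```

Each operator is a sort or a single graph traversal, which is consistent with the slide's point that they add little to the overall computational cost.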

20
Permutation Strategies (cont.)

The O, Q, X, Y, and T operators yield a full matrix.
The B operator conveniently arranges R into a
lower block triangular structure due to BFS order
of visit.
Combining these operations on R, the following
structures of the permuted web matrix are
produced.

21
Permutation Strategies (cont.)

Visual representations of the permuted web graph.

22
Iterative Strategies

Power Method
Computes dominant eigenvector
Jacobi Method
Using an initial guess, approximates the solution
to the linear system of equations. Each
successive iteration uses the previous
approximated solution as its next guess till a
degree of convergence is reached.
(further explained)
Gauss-Seidel Method (Reverse Gauss-Seidel)
Modification of the Jacobi Method, which
approximates the solution to each successive
equation in the linear system based on the values
derived from the previous equations, all within
each iterative loop. (further explained)
(http://college.cengage.com/mathematics/larson/elementary_linear/5e/students/ch08-10/chap_10_2.pdf)
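
The difference between the Jacobi and Gauss-Seidel sweeps can be sketched on the system $Ry = u$ with $R = I - \alpha H^T$. The sketch assumes no page links to itself (so the diagonal of R is 1) and a uniform u; the function names and the example graph are illustrative and not from the slides.

```python
def in_link_lists(out_links):
    incoming = [[] for _ in out_links]
    for i, targets in enumerate(out_links):
        for j in targets:
            incoming[j].append(i)
    return incoming

def jacobi(out_links, alpha=0.85, tol=1e-12, max_iter=5000):
    n = len(out_links)
    u = 1.0 / n
    incoming = in_link_lists(out_links)
    y = [u] * n
    for it in range(1, max_iter + 1):
        # Every component is computed from the previous iterate only.
        new = [u + alpha * sum(y[i] / len(out_links[i]) for i in incoming[j])
               for j in range(n)]
        diff = sum(abs(p - q) for p, q in zip(new, y))
        y = new
        if diff < tol:
            return y, it
    return y, max_iter

def gauss_seidel(out_links, alpha=0.85, tol=1e-12, max_iter=5000, reverse=False):
    n = len(out_links)
    u = 1.0 / n
    incoming = in_link_lists(out_links)
    order = range(n - 1, -1, -1) if reverse else range(n)
    y = [u] * n
    for it in range(1, max_iter + 1):
        diff = 0.0
        for j in order:
            # Values already updated in this sweep are used immediately.
            val = u + alpha * sum(y[i] / len(out_links[i]) for i in incoming[j])
            diff += abs(val - y[j])
            y[j] = val
        if diff < tol:
            return y, it
    return y, max_iter

# Hypothetical 4-page graph; page 3 is dangling.
links = [[1, 2], [2], [0, 3], []]
y_jacobi, jacobi_sweeps = jacobi(links)
y_gs, gs_sweeps = gauss_seidel(links)
y_rgs, rgs_sweeps = gauss_seidel(links, reverse=True)
```

Because Gauss-Seidel reads entries updated earlier in the same sweep, the order of the unknowns changes its behavior, which is why permuting the matrix can change the iteration count while leaving Jacobi unaffected.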

23
Iterative Strategies (cont.)

Further exploration led to iterative methods
based on the distinct block structure of certain
web graph permutations.
DN (or DNR) Method
The permuted matrix $R_{OT}$ has the property that the
lower diagonal block coincides with the identity
matrix. The matrix can be easily partitioned into
non-dangling and dangling portions. Then the
non-dangling part is solved by Gauss-Seidel (or
Reverse Gauss-Seidel respectively).
LB/UB/LBR/UBR Methods
Uses the Gauss-Seidel or Reverse Gauss-Seidel
methods to solve the individual blocks of the
triangular block matrices produced by the B
operator.
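
A rough sketch of the idea behind the DN method: since dangling pages have no out-links, they never feed back into the non-dangling unknowns, so Gauss-Seidel can run on the non-dangling block alone and the dangling entries follow in one final pass. This sketch skips the explicit permutation and works with index lists instead; the function name, uniform u, no-self-loop assumption, and example graph are all illustrative and not the authors' implementation.

```python
def dn_solve(out_links, alpha=0.85, tol=1e-12, max_iter=5000):
    """Solve R y = u, R = I - alpha H^T, treating dangling pages separately."""
    n = len(out_links)
    u = 1.0 / n                                   # uniform u (assumption)
    incoming = [[] for _ in range(n)]
    for i, targets in enumerate(out_links):
        for j in targets:
            incoming[j].append(i)                 # only non-dangling i appear here
    non_dangling = [i for i in range(n) if out_links[i]]
    dangling = [i for i in range(n) if not out_links[i]]

    y = [u] * n
    # Gauss-Seidel sweeps restricted to the non-dangling block.
    for _ in range(max_iter):
        diff = 0.0
        for j in non_dangling:
            val = u + alpha * sum(y[i] / len(out_links[i]) for i in incoming[j])
            diff += abs(val - y[j])
            y[j] = val
        if diff < tol:
            break
    # The dangling block has an identity diagonal block: one direct pass suffices.
    for j in dangling:
        y[j] = u + alpha * sum(y[i] / len(out_links[i]) for i in incoming[j])
    return y

# Hypothetical 5-page graph; pages 3 and 4 are dangling.
links = [[1, 2], [2, 4], [0, 3], [], []]
y = dn_solve(links)
```

With a large share of dangling pages, the iterative part runs on a much smaller system, which is where the savings come from.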

24
Results
25
Results (cont.)

For both the Power Method and Jacobi Method, the
number of Mflops is not dependent on permutations
of the web matrix. ("the small differences in
numerical data are due to the finite precision",
Del Corso et al.)
The Jacobi Method (applied to the matrix R) is
only a slight improvement (about 3%) compared to
the Power Method (applied to the matrix G).

26
Results (cont.)

The Gauss-Seidel and Reverse Gauss-Seidel Methods
reduced Mflops by around 40% and running time by
around 45% on the full matrix compared to the
Power Method on the full matrix.
In particular, the Reverse Gauss-Seidel Method
performed on the permuted matrix $R_{YTB}$ reduced the
number of Mflops by 51% and running time by 82%
when compared to the Power Method on the full
matrix.

27
Results (cont.)

The block methods achieved even better results
In particular, the best overall reduction in
computation time and number of Mflops was
achieved by the LBR method on the permuted matrix
$R_{QTB}$.
58% reduction in terms of Mflops
89% reduction in terms of running time

28
Conclusion

Objective
Accelerating Google PageRank by numerical methods
Contribution
Viewed web matrix as a sparse linear system
Formalized new method for treating dangling nodes
Explored new iterative methods and applied them
to web matrix permutations
Achievement
About 1/10 of the computation time
Mflops reduced by over 50%

29
References

G.M. Del Corso, A. Gulli, F. Romani, "Exploiting
Web Matrix Permutations to Speedup PageRank
Computation"
Nielsen MegaView Search (http://en-us.nielsen.com/rankings/insights/rankings/internet)
