Title: Extrapolation Methods for Accelerating PageRank Computations
1Extrapolation Methods for Accelerating PageRank
Computations
- Sepandar D. Kamvar
- Taher H. Haveliwala
- Christopher D. Manning
- Gene H. Golub
- Stanford University
2Motivation
- Problem
- Speed up PageRank
- Motivation
- Personalization
- Freshness
Note: PageRank computations don't get faster as
computers do.
3Outline
- Definition of PageRank
- Computation of PageRank
- Convergence Properties
- Outline of Our Approach
- Empirical Results
4Link Counts
Sep's Home Page
Taher's Home Page
CS361
CNN
DB Pub Server
Yahoo!
Linked by 2 Important Pages
Linked by 2 Unimportant pages
5Definition of PageRank
- The importance of a page is given by the
importance of the pages that link to it.
6Definition of PageRank
Taher
Sep
Yahoo!
CNN
DB Pub Server
7PageRank Diagram
0.333
0.333
0.333
Initialize all nodes to rank $x_i^{(0)} = 1/n$
8PageRank Diagram
0.167
0.333
0.333
0.167
Propagate ranks across links (multiplying by link
weights)
9PageRank Diagram
0.5
0.333
0.167
10PageRank Diagram
0.167
0.5
0.167
0.167
11PageRank Diagram
0.333
0.5
0.167
12PageRank Diagram
0.4
0.4
0.2
After a while
13Computing PageRank
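- Initialize: $x_i^{(0)} = 1/n$
- Repeat until convergence: $x_i^{(k+1)} = \sum_{j \to i} x_j^{(k)} / \deg(j)$, where $\deg(j)$ is the number of out-links of page j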
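The initialize-and-repeat procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the three-page graph `links` is a hypothetical example chosen so that the ranks match the numbers on the diagram slides (0.333 each at the start, 0.4 / 0.4 / 0.2 "after a while"), and the names `propagate` and `simple_pagerank` are made up for this sketch.

```python
# Hypothetical 3-page graph: A links to B and C, B links to A, C links to B.
links = {"A": ["B", "C"], "B": ["A"], "C": ["B"]}


def propagate(rank, links):
    """One step: every page splits its rank evenly across its out-links."""
    new_rank = {page: 0.0 for page in links}
    for page, targets in links.items():
        share = rank[page] / len(targets)  # link weight = 1 / out-degree
        for target in targets:
            new_rank[target] += share
    return new_rank


def simple_pagerank(links, tol=1e-12, max_iter=1000):
    """Start from rank 1/n everywhere and propagate until ranks stop changing."""
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(max_iter):
        new_rank = propagate(rank, links)
        change = sum(abs(new_rank[p] - rank[p]) for p in links)
        rank = new_rank
        if change < tol:
            break
    return rank


ranks = simple_pagerank(links)
print({page: round(value, 3) for page, value in ranks.items()})
# prints {'A': 0.4, 'B': 0.4, 'C': 0.2}
```

The sketch assumes every page has at least one out-link; pages without out-links would need separate handling.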
14Matrix Notation
15Matrix Notation
Find x that satisfies $x = P^T x$
16Power Method
- Initialize: $x^{(0)} = (1/n, \dots, 1/n)^T$
- Repeat until convergence: $x^{(k+1)} = P^T x^{(k)}$
17A side note
- PageRank doesn't actually use $P^T$. Instead, it
uses $A = cP^T + (1-c)E^T$.
- So the PageRank problem is really $x = Ax$,
- not $x = P^T x$.
18Power Method
- And the algorithm is really . . .
- Initialize: $x^{(0)} = (1/n, \dots, 1/n)^T$
- Repeat until convergence: $x^{(k+1)} = A x^{(k)}$
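A minimal sketch of this version of the Power Method follows. It rests on assumptions the slides do not spell out: E is taken to spread rank uniformly over all pages, so that for an iterate summing to 1 the term $(1-c)E^T x$ is just $(1-c)/n$ per page; every page is assumed to have at least one out-link; and the graph `links` and the function name `pagerank` are hypothetical. The matrix A is never formed explicitly, which matters when the link matrix is huge and sparse.

```python
# Hypothetical 3-page graph; every page has at least one out-link.
links = {"A": ["B", "C"], "B": ["A"], "C": ["B"]}


def pagerank(links, c=0.85, tol=1e-10, max_iter=1000):
    """Power Method for x = A x with A = c*P^T + (1-c)*E^T (uniform E assumed)."""
    n = len(links)
    x = {page: 1.0 / n for page in links}
    for iteration in range(1, max_iter + 1):
        # (1 - c) * E^T x: each page receives the same teleportation share.
        new_x = {page: (1.0 - c) / n for page in links}
        # c * P^T x: each page passes rank along its out-links.
        for page, targets in links.items():
            share = c * x[page] / len(targets)
            for target in targets:
                new_x[target] += share
        residual = sum(abs(new_x[p] - x[p]) for p in links)
        x = new_x
        if residual < tol:
            return x, iteration
    return x, max_iter


x, iterations = pagerank(links)
print(iterations, {page: round(value, 4) for page, value in x.items()})
```

Setting `c=1.0` recovers the plain $P^T$ iteration from the earlier slides.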
19Outline
- Definition of PageRank
- Computation of PageRank
- Convergence Properties
- Outline of Our Approach
- Empirical Results
20Power Method
Express $x^{(0)}$ in terms of the eigenvectors of A:
$x^{(0)} = u_1 + \alpha_2 u_2 + \alpha_3 u_3 + \alpha_4 u_4 + \alpha_5 u_5$
21Power Method
$x^{(1)} = u_1 + \alpha_2 \lambda_2 u_2 + \alpha_3 \lambda_3 u_3 + \alpha_4 \lambda_4 u_4 + \alpha_5 \lambda_5 u_5$
22Power Method
$x^{(2)} = u_1 + \alpha_2 \lambda_2^2 u_2 + \alpha_3 \lambda_3^2 u_3 + \alpha_4 \lambda_4^2 u_4 + \alpha_5 \lambda_5^2 u_5$
23Power Method
$x^{(k)} = u_1 + \alpha_2 \lambda_2^k u_2 + \alpha_3 \lambda_3^k u_3 + \alpha_4 \lambda_4^k u_4 + \alpha_5 \lambda_5^k u_5$
24Power Method
As $k \to \infty$, the coefficients of $u_2, \dots, u_5$ go to 0 and only $u_1$ (coefficient 1) remains.
25Why does it work?
- Imagine our $n \times n$ matrix A has n linearly independent
eigenvectors $u_i$, so that $x^{(0)} = u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$.
26Why does it work?
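- From the last slide: $x^{(0)} = u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$
- To get the first iterate, multiply $x^{(0)}$ by A: $x^{(1)} = A x^{(0)} = \lambda_1 u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n$
- The first eigenvalue is 1.
- Therefore: $x^{(1)} = u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n$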
27Power Method
$x^{(0)} = u_1 + \alpha_2 u_2 + \alpha_3 u_3 + \alpha_4 u_4 + \alpha_5 u_5$
28Convergence
- The smaller $|\lambda_2|$, the faster the convergence of the
Power Method.
$x^{(k)} = u_1 + \alpha_2 \lambda_2^k u_2 + \alpha_3 \lambda_3^k u_3 + \alpha_4 \lambda_4^k u_4 + \alpha_5 \lambda_5^k u_5$
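The effect of the second eigenvalue can be observed on a toy example. For a matrix of the form $cP^T + (1-c)E^T$ with uniform E, the second eigenvalue is known to be at most c in magnitude, so raising c should slow the Power Method down. The sketch below counts iterations for a few values of c; the three-page graph `links`, the uniform E, the tolerance, and the function name `iterations_to_converge` are all assumptions made for illustration.

```python
# Hypothetical 3-page graph; every page has at least one out-link.
links = {"A": ["B", "C"], "B": ["A"], "C": ["B"]}


def iterations_to_converge(links, c, tol=1e-10, max_iter=10000):
    """Count Power Method steps until the L1 change between iterates is below tol."""
    n = len(links)
    x = {page: 1.0 / n for page in links}
    for iteration in range(1, max_iter + 1):
        # Uniform teleportation share (assumed form of E), then link propagation.
        new_x = {page: (1.0 - c) / n for page in links}
        for page, targets in links.items():
            for target in targets:
                new_x[target] += c * x[page] / len(targets)
        change = sum(abs(new_x[p] - x[p]) for p in links)
        x = new_x
        if change < tol:
            return iteration
    return max_iter


counts = {c: iterations_to_converge(links, c) for c in (0.5, 0.85, 0.99)}
print(counts)  # the iteration count grows as c grows
```

On a graph this small the differences are modest; the point is only the direction of the trend.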
29Our Approach
Estimate the components of the current iterate in the
directions of the second and third eigenvectors ($u_2$ and $u_3$), and
eliminate them.
(Figure: components of the current iterate along $u_1, \dots, u_5$)
30Why this approach?
- For traditional problems:
- A is smaller, often dense.
- $\lambda_2$ is often close to $\lambda_1$, making the Power Method slow.
- In our problem:
- A is huge and sparse.
- More importantly, $\lambda_2$ is small [1].
- Therefore, the Power Method is actually much faster
than other methods.
- [1] "The Second Eigenvalue of the Google Matrix",
dbpubs.stanford.edu/pub/2003-20
31Using Successive Iterates
32Using Successive Iterates
x(0)
x(1)
u1
33Using Successive Iterates
x(0)
x(1)
x(2)
u1
34Using Successive Iterates
x(0)
x(1)
x(2)
u1
35Using Successive Iterates
x(0)
x(1)
x u1
36How do we do this?
- Assume $x^{(k)}$ can be written as a linear
combination of the first three eigenvectors ($u_1$,
$u_2$, $u_3$) of A.
- Compute an approximation to $u_2$, $u_3$ and subtract it
from $x^{(k)}$ to get a corrected iterate.
37Assume
- Assume that $x^{(k)}$ can be represented by the first 3
eigenvectors of A
38Linear Combination
- Let's take some linear combination of these 3
iterates.
39Rearranging Terms
- We can rearrange the terms to get
Goal: Find $\beta_1, \beta_2, \beta_3$ so that the coefficients of $u_2$
and $u_3$ are 0, and the coefficient of $u_1$ is 1.
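The goal can be written out explicitly. The derivation below is a reconstruction, not the slides' own equations: it assumes the three iterates are $x^{(k)}$, $x^{(k+1)} = Ax^{(k)}$ and $x^{(k+2)} = Ax^{(k+1)}$, that $\lambda_1 = 1$, that $\lambda_2 \neq \lambda_3$ and neither equals 1, and that the eigenvectors are scaled so that the coefficient of $u_1$ in $x^{(k)}$ is 1.

```latex
\begin{align*}
x^{(k+j)} &= u_1 + \alpha_2 \lambda_2^{j} u_2 + \alpha_3 \lambda_3^{j} u_3,
  \qquad j = 0, 1, 2 \\[4pt]
\beta_1 x^{(k)} + \beta_2 x^{(k+1)} + \beta_3 x^{(k+2)}
  &= (\beta_1 + \beta_2 + \beta_3)\, u_1 \\
  &\quad + (\beta_1 + \beta_2 \lambda_2 + \beta_3 \lambda_2^{2})\, \alpha_2 u_2 \\
  &\quad + (\beta_1 + \beta_2 \lambda_3 + \beta_3 \lambda_3^{2})\, \alpha_3 u_3 \\[4pt]
\text{Conditions:}\quad
  & \beta_1 + \beta_2 + \beta_3 = 1, \qquad
    \beta_1 + \beta_2 \lambda_i + \beta_3 \lambda_i^{2} = 0 \quad (i = 2, 3) \\[4pt]
\text{Solution:}\quad
  & \beta_1 = \frac{\lambda_2 \lambda_3}{(1 - \lambda_2)(1 - \lambda_3)}, \quad
    \beta_2 = \frac{-(\lambda_2 + \lambda_3)}{(1 - \lambda_2)(1 - \lambda_3)}, \quad
    \beta_3 = \frac{1}{(1 - \lambda_2)(1 - \lambda_3)}
\end{align*}
```

In other words, the $\beta_i$ are the coefficients of the polynomial $(\lambda - \lambda_2)(\lambda - \lambda_3)$ divided by its value at $\lambda = 1$. Since $\lambda_2$ and $\lambda_3$ are not known in advance, in practice the coefficients have to be estimated from the iterates themselves.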
40Summary
- We make an assumption about the current iterate.
- Solve for the dominant eigenvector as a linear
combination of the next three iterates.
- We use a few iterations of the Power Method to
clean it up.
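The summary can be illustrated with a small sketch. This is a simplified variant written for this transcript, not necessarily the exact algorithm in the paper: it takes four successive iterates, uses their differences (which contain no $u_1$ component under the three-eigenvector assumption) to estimate the coefficients of the quadratic that cancels $u_2$ and $u_3$ by least squares, and then combines the iterates. The matrix `A`, the function names, and the use of 2x2 normal equations are all assumptions of this sketch.

```python
def step(matrix, x):
    """One Power Method step: matrix times x, with the matrix stored as rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def extrapolate(x0, x1, x2, x3):
    """Estimate u1 from four successive iterates (three-eigenvector assumption)."""
    d0 = [b - a for a, b in zip(x0, x1)]
    d1 = [b - a for a, b in zip(x1, x2)]
    d2 = [b - a for a, b in zip(x2, x3)]
    # Least squares for c0, c1 in c0*d0 + c1*d1 + d2 ~= 0 (2x2 normal equations).
    g00, g01, g11 = dot(d0, d0), dot(d0, d1), dot(d1, d1)
    r0, r1 = -dot(d0, d2), -dot(d1, d2)
    det = g00 * g11 - g01 * g01  # zero if d0 and d1 are parallel
    c0 = (r0 * g11 - g01 * r1) / det
    c1 = (g00 * r1 - g01 * r0) / det
    # Normalize so the coefficients sum to 1, keeping the u1 coefficient at 1.
    total = c0 + c1 + 1.0
    return [(c0 * a + c1 * b + c) / total for a, b, c in zip(x0, x1, x2)]


# Hypothetical column-stochastic matrix for a 3-page graph (A->B, A->C, B->A, C->B).
A = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 1.0],
     [0.5, 0.0, 0.0]]

x0 = [1 / 3, 1 / 3, 1 / 3]
x1 = step(A, x0)
x2 = step(A, x1)
x3 = step(A, x2)
estimate = extrapolate(x0, x1, x2, x3)
print([round(v, 6) for v in estimate])
# prints [0.4, 0.4, 0.2]
```

Because this toy matrix has only three eigenvectors, the assumption holds exactly and the estimate matches the dominant eigenvector after three steps. On a real web graph the assumption is only approximate, which is why a few further Power Method iterations are needed to clean up the result.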
41Outline
- Definition of PageRank
- Computation of PageRank
- Convergence Properties
- Outline of Our Approach
- Empirical Results
42Results
Quadratic Extrapolation speeds up convergence.
Extrapolation was only used 5 times!
43Results
Extrapolation dramatically speeds up convergence
for high values of c (c = 0.99).
44Take-home message
- Speeds up PageRank by a fair amount, but not by
enough for true Personalized PageRank.
- Ideas are useful for further speedup algorithms.
- Quadratic Extrapolation can be used for a whole
class of problems.
45The End
- Paper available at http://dbpubs.stanford.edu/pub/2003-16