PageRank - PowerPoint PPT Presentation
1
PageRank
2
x1 = (p21·p34·p41 + p34·p42·p21 + p21·p31·p41 + p31·p42·p21) / S
x2 = (p31·p41·p12 + p31·p42·p12 + p34·p41·p12 + p34·p42·p12) / S
x3 = (p13·p34·p42 + p41·p21·p13 + p42·p21·p13) / S
x4 = (p21·p13·p34) / S

[Figure: four-state Markov chain s1, s2, s3, s4 with transition probabilities p12, p21, p31, p13, p41, p42, p34]

S = p21·p34·p41 + p34·p42·p21 + p21·p31·p41 + p31·p42·p21 + p31·p41·p12 + p31·p42·p12 + p34·p41·p12 + p34·p42·p12 + p13·p34·p42 + p41·p21·p13 + p42·p21·p13 + p21·p13·p34
3
Ergodic Theorem Revisited
  • If there exists a reverse spanning tree in the
    graph of the Markov chain associated to a
    stochastic system, then
  • (a) the stochastic system admits the probability
    vector above as a solution,
  • (b) the solution is unique, and
  • (c) the conditions xi ≥ 0, i = 1, ..., n are
    redundant, and the solution can be computed by
    Gaussian elimination.

4
Google PageRank Patent
  • The rank of a page can be interpreted as the
    probability that a surfer will be at the page
    after following a large number of forward links.

The Ergodic Theorem
5
Google PageRank Patent
  • The iteration circulates the probability through
    the linked nodes like energy flows through a
    circuit and accumulates in important places.

Kirchhoff (1847)
6
Rank Sinks
[Figure: a seven-node graph (nodes 1-7) containing a rank sink; no reverse spanning tree exists]
7
Ranking web pages
  • Web pages are not equally important
  • www.joe-schmoe.com vs. www.stanford.edu
  • Inlinks as votes
  • www.stanford.edu has 23,400 inlinks
  • www.joe-schmoe.com has 1 inlink
  • Are all inlinks equal?

9
Simple recursive formulation
  • Each link's vote is proportional to the
    importance of its source page
  • If page P with importance x has n outlinks, each
    link gets x/n votes

10
Matrix formulation
  • Matrix M has one row and one column for each web
    page
  • Suppose page j has n outlinks
  • If j links to i (j → i), then Mij = 1/n
  • Else Mij = 0
  • M is a column-stochastic matrix
  • Columns sum to 1
  • Suppose r is a vector with one entry per web page
  • ri is the importance score of page i
  • Call it the rank vector
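As a sketch, the column-stochastic matrix M can be built directly from an outlink list; the three-page graph below is an assumed example, not one from the slides:

```python
import numpy as np

# Assumed toy web graph: page index -> list of pages it links to.
outlinks = {0: [1, 2], 1: [0], 2: [0, 1]}
N = 3

# Column j gets 1/n in row i for each of page j's n outlinks (Mij = 1/n if j -> i).
M = np.zeros((N, N))
for j, dests in outlinks.items():
    for i in dests:
        M[i, j] = 1.0 / len(dests)
```

Each column of M then sums to 1, matching the column-stochastic property above.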

11
Example
Suppose page j links to 3 pages, including i; then Mij = 1/3, and page i receives one third of page j's rank
12
Eigenvector formulation
  • The flow equations can be written
  • r = Mr
  • So the rank vector is an eigenvector of the
    stochastic web matrix
  • In fact, it is the first, or principal, eigenvector,
    with corresponding eigenvalue 1

13
Example
14
Power Iteration method
  • Simple iterative scheme (aka relaxation)
  • Suppose there are N web pages
  • Initialize r^0 = [1/N, ..., 1/N]^T
  • Iterate r^(k+1) = M r^k
  • Stop when |r^(k+1) - r^k|_1 < ε
  • |x|_1 = Σ 1≤i≤N |xi| is the L1 norm
  • Can use any other vector norm, e.g., Euclidean
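The scheme above can be sketched in a few lines; the small column-stochastic matrix is an assumed example, not from the slides:

```python
import numpy as np

# Assumed column-stochastic web matrix for a three-page graph.
M = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
N = M.shape[0]

r = np.full(N, 1.0 / N)       # r^0 = [1/N, ..., 1/N]^T
eps = 1e-10
while True:
    r_next = M @ r            # r^(k+1) = M r^k
    if np.abs(r_next - r).sum() < eps:   # L1 norm of the change
        r = r_next
        break
    r = r_next
```

For this graph the iteration settles on r = [4/9, 2/9, 1/3], a fixed point of M r = r.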

15
Random Walk Interpretation
  • Imagine a random web surfer
  • At any time t, surfer is on some page P
  • At time t+1, the surfer follows an outlink from P
    uniformly at random
  • Ends up on some page Q linked from P
  • Process repeats indefinitely
  • Let p(t) be a vector whose ith component is the
    probability that the surfer is at page i at time
    t
  • p(t) is a probability distribution on pages

16
Spider traps
  • A group of pages is a spider trap if there are no
    links from within the group to outside the group
  • Random surfer gets trapped
  • Spider traps violate the conditions needed for
    the random walk theorem

17
Random teleports
  • The Google solution for spider traps
  • At each time step, the random surfer has two
    options
  • With probability β, follow a link at random
  • With probability 1-β, jump to some page uniformly
    at random
  • Common values for β are in the range 0.8 to 0.9
  • Surfer will teleport out of spider trap within a
    few time steps

18
Matrix formulation
  • Suppose there are N pages
  • Consider a page j, with set of outlinks O(j)
  • We have Mij = 1/|O(j)| when j links to i, and Mij = 0
    otherwise
  • The random teleport is equivalent to
  • adding a teleport link from j to every other page
    with probability (1-β)/N
  • reducing the probability of following each
    outlink from 1/|O(j)| to β/|O(j)|
  • Equivalently: tax each page a fraction (1-β) of its
    score and redistribute it evenly

19
  • The Google matrix
  • Gj,i = q/n + (1-q)·Ai,j/ni
  • where A is the adjacency matrix, n is the number of
    nodes, ni is the outdegree of node i, and q is the
    teleport probability, typically 0.15 (so q plays the
    role of 1-β above)

20
Page Rank
  • Construct the N×N matrix A as follows
  • Aij = βMij + (1-β)/N
  • Verify that A is a stochastic matrix (its columns
    sum to 1)
  • The PageRank vector r is the principal
    eigenvector of this matrix
  • satisfying r = Ar
  • Equivalently, r is the stationary distribution of
    the random walk with teleports
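A minimal sketch of this construction; β = 0.8 and the three-page matrix are assumptions for illustration:

```python
import numpy as np

beta = 0.8
# Assumed column-stochastic link matrix M for three pages.
M = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
N = M.shape[0]

# Aij = beta*Mij + (1-beta)/N
A = beta * M + (1 - beta) / N

# A is still column-stochastic, so power iteration converges to r = Ar.
r = np.full(N, 1.0 / N)
for _ in range(200):
    r = A @ r
```

The resulting r is the stationary distribution of the random walk with teleports.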

21
Example
Pages: Yahoo (y), Amazon (a), Msoft (m)

M (columns ordered y, a, m):
  y  1/2  1/2   0
  a  1/2   0    0
  m   0   1/2   1

A = 0.8·M + 0.2·(1/3 in every entry):
  y  7/15  7/15  1/15
  a  7/15  1/15  1/15
  m  1/15  7/15  13/15
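The entries above can be checked exactly with rational arithmetic; this is a verification sketch, with the pages ordered y, a, m as on the slide:

```python
from fractions import Fraction as F

half, third = F(1, 2), F(1, 3)
# Column-stochastic link matrix from the slide, pages ordered (y, a, m).
M = [[half, half, 0],
     [half, 0,    0],
     [0,    half, 1]]

beta = F(4, 5)  # 0.8
# A = 0.8*M + 0.2*(matrix of all 1/3), entrywise.
A = [[beta * M[i][j] + (1 - beta) * third for j in range(3)]
     for i in range(3)]
```

The y row comes out as 7/15, 7/15, 1/15 and the m row as 1/15, 7/15, 13/15, matching the slide.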
22
Dead ends
  • Pages with no outlinks are dead ends for the
    random surfer
  • Nowhere to go on next step

23
Microsoft becomes a dead end
Pages: Yahoo (y), Amazon (a), Msoft (m); Msoft now has no outlinks

M (columns ordered y, a, m):
  y  1/2  1/2   0
  a  1/2   0    0
  m   0   1/2   0

A = 0.8·M + 0.2·(1/3 in every entry):
  y  7/15  7/15  1/15
  a  7/15  1/15  1/15
  m  1/15  7/15  1/15

The m column of A now sums to only 3/15, so A is no longer stochastic: rank leaks out at the dead end.
24
Dealing with dead-ends
  • Teleport
  • Follow random teleport links with probability 1.0
    from dead-ends
  • Adjust matrix accordingly
  • Prune and propagate
  • Preprocess the graph to eliminate dead-ends
  • Might require multiple passes
  • Compute page rank on reduced graph
  • Approximate values for dead ends by propagating
    values from the reduced graph
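The teleport option can be sketched by replacing each dead-end (all-zero) column with the uniform distribution; the matrix below is the slide's example with Msoft as a dead end:

```python
import numpy as np

# Msoft (last column) has no outlinks, so its column is all zeros.
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
N = M.shape[0]

# Follow random teleport links with probability 1.0 from dead ends:
dead = M.sum(axis=0) == 0
M[:, dead] = 1.0 / N
```

After the adjustment, M is column-stochastic again and power iteration applies as before.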

25
Computing page rank
  • Key step is matrix-vector multiply
  • rnew = A·rold
  • Easy if we have enough main memory to hold A,
    rold, rnew
  • Say N 1 billion pages
  • We need 4 bytes for each entry (say)
  • 2 billion entries for the two vectors, approx 8GB
  • Matrix A has N² entries
  • 10^18 is a large number!

26
Computing PageRank
  • Ranks the entire web, global ranking
  • Only computed once a month
  • Few iterations!

27
Sparse matrix formulation
  • Although A is a dense matrix, it is obtained from
    a sparse matrix M
  • 10 links per node on average gives approx 10N entries
  • We can restate the PageRank equation
  • r = βMr + [(1-β)/N]_N
  • [(1-β)/N]_N is an N-vector with all entries
    (1-β)/N
  • So in each iteration, we need to
  • Compute rnew = βMrold
  • Add a constant value (1-β)/N to each entry in
    rnew
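One iteration of this sparse update can be sketched over an adjacency list (the three-node link structure is an assumption), never materializing the dense matrix A:

```python
import numpy as np

beta = 0.8
# Assumed sparse link structure: source -> destinations (no dead ends here).
outlinks = {0: [1, 2], 1: [2], 2: [0]}
N = 3

r = np.full(N, 1.0 / N)
for _ in range(100):
    r_new = np.zeros(N)
    # r_new = beta * M * r_old, touching only the nonzero entries of M
    for j, dests in outlinks.items():
        for i in dests:
            r_new[i] += beta * r[j] / len(dests)
    r_new += (1 - beta) / N   # add the constant (1-beta)/N to every entry
    r = r_new
```

Because the link structure has no dead ends, each pass preserves the total probability mass of 1.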

28
Sparse matrix encoding
  • Encode the sparse matrix using only its nonzero entries
  • Space is roughly proportional to the number of links
  • say 10N entries; at 4 bytes each, 4 × 10 × 1 billion = 40GB
  • still won't fit in memory, but will fit on disk

Record layout: source node | degree | destination nodes
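A sketch of this record layout in memory; the Python dict stand-in and the node numbers are assumptions (on disk, each record would be packed binary rather than a dict entry):

```python
# Each record: source node -> (out-degree, destination nodes).
links = {
    0: (3, [1, 5, 7]),   # source 0 has degree 3, links to nodes 1, 5, 7
    1: (2, [0, 7]),      # source 1 has degree 2, links to nodes 0, 7
}

# Reading one record gives everything needed to scatter beta*r[src]/degree
# to each destination during the sparse multiply.
degree, dests = links[0]
```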