Linkage Analysis

Slides: 17

Provided by: yxie

Transcript and Presenter's Notes

1
Linkage Analysis

Dr. Ying Xie

2
Acknowledgement

This lecture note cites, adapts, or refers to
information from the following sources:
http://www.stanford.edu/class/cs276a/
http://www.stanford.edu/class/cs276b/
http://www-clips.imag.fr/mrim/essir03/PDF/10.Melucci.pdf
Diligenti et al., "A unified probabilistic framework for web page scoring systems," IEEE Trans. Knowledge and Data Engineering, vol. 16(1), Jan. 2004
T. H. Haveliwala, "Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search," IEEE Trans. Knowledge and Data Engineering, vol. 15(4), July/August 2003

3
Web Search Process

(Diagram components: Crawler, Indexing, Inverted index and web graph; Information Need, Formulation, Query Rep; Ranking, Ranked List; User Relevance Feedback, Learning)

4
Some challenges of web search

There is no central editorial control.
Web page quality is highly heterogeneous.
- Some pages are relevant to your query but are
of very low quality.
Therefore web search should take into account
both the relevance and the quality/reputation of a page.
The question is how to evaluate the
quality/reputation of a web page.
The answer is linkage analysis.

5
The web as a graph

The Web can be viewed as a complex graph $G$.
Each page is a node $D_p$.
Each hyperlink is an arc $e = (D_p, D_q)$.
The topology of this graph $G$ is the result of the
behavior of the whole community of web
authors.
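
As a minimal sketch of this view, a small web graph can be stored as a mapping from each page to the pages it links to. The three page names and their links are made up for illustration, and the helper names `arcs` and `parents` are hypothetical.

```python
# Hypothetical three-page web: each key is a page, each value lists the
# pages it links to. An arc (source, target) is one hyperlink.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

def arcs(graph):
    """Return every hyperlink as a (source, target) pair."""
    return [(q, p) for q, targets in graph.items() for p in targets]

def parents(graph):
    """Invert the graph: map each page to the pages that link to it."""
    pa = {page: [] for page in graph}
    for q, p in arcs(graph):
        pa.setdefault(p, []).append(q)
    return pa

print(arcs(links))
print(parents(links))
```

The inverted mapping is what linkage analysis needs most often, because a page's score depends on the pages that point to it.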

6
Web as a graph (2)

Therefore, the graph topology carries much
information about the cooperative
interaction of many agents.
Based on this, two assumptions can be made:
- If a page is pointed to by many good
pages, the page itself should be good.
- Low-quality sites are unlikely to have many
high-quality sites linking to them.

7
Model for linkage analysis: Random walk theory

Based on the web graph G, random walk theory has
been proposed as a framework for conducting
linkage analysis.
Under random walk theory, the quality/reputation of
a page is computed as the probability of
visiting that page in a random walk on the web
graph.

Imagine a surfer surfing the WWW. At each step
of the walk, the surfer performs one of the
following three actions:
- Randomly jump to any node/page in the graph
(this action is denoted j)
- Follow a hyperlink from the current page
(this action is denoted l)
- Follow a hyperlink in the inverse direction
(this action is denoted b)

9
Random walk theory: single surfer walk

Based on the random walk model, we can define two
sets of probabilities.
Set 1
- $x(j \mid q)$: probability that the surfer chooses
to jump from page q.
- $x(l \mid q)$: probability that the surfer chooses
to follow a hyperlink in page q.
- $x(b \mid q)$: probability that the surfer follows
an inverse link from q.
These probabilities must satisfy the following
constraints:
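
Because the three actions are the only options at each step, the constraint amounts to a normalization for every page q. The following is a reconstruction from the definitions above, not a quotation from the slide:

```latex
\[
  x(j \mid q) + x(l \mid q) + x(b \mid q) = 1,
  \qquad
  x(j \mid q),\; x(l \mid q),\; x(b \mid q) \ge 0
  \quad \text{for every page } q .
\]
```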

10
Random walk theory: single surfer walk (2)

Set 2
- $x(p \mid q, j)$: probability of jumping from
page q to page p.
- $x(p \mid q, l)$: probability of
following a hyperlink in page q to page p.
- $x(p \mid q, b)$: probability of following an inverse link back
to page p from q.
These probabilities
should satisfy the following constraints:
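
Each of these is a conditional distribution over target pages, so a natural reading is that each one sums to 1. In the sketch below, $ch(q)$ (pages that q links to) and $pa(q)$ (pages that link to q) are notation assumed here for illustration:

```latex
\[
  \sum_{p=1}^{N} x(p \mid q, j) = 1,
  \qquad
  \sum_{p \in ch(q)} x(p \mid q, l) = 1,
  \qquad
  \sum_{p \in pa(q)} x(p \mid q, b) = 1 .
\]
```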
11
Random walk theory: single surfer walk (3)

Let $x_p(t)$ be the probability that the surfer is
at page p at time t.
Then $x(t) = [x_1(t), \dots, x_N(t)]$ is the probability
distribution over all the pages (N is the total
number of pages) at time t.
So, how do we calculate $x_p(t+1)$?
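
One way to answer this is the law of total probability: condition on the page q occupied at time t and on the action taken there. The shorthand $x(p \mid q)$ for the overall one-step transition probability is introduced here for convenience and is an assumption of this sketch:

```latex
\[
  x(p \mid q) = x(p \mid q, j)\, x(j \mid q)
              + x(p \mid q, l)\, x(l \mid q)
              + x(p \mid q, b)\, x(b \mid q),
\]
\[
  x_p(t+1) = \sum_{q=1}^{N} x(p \mid q)\, x_q(t) .
\]
```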

12
Random walk theory: single surfer walk (4)

So, if $x(0) = [x_1(0), \dots, x_N(0)]$ is known, we can
calculate
$x_p(t)$.
So, how do we get $x(0) = [x_1(0), \dots, x_N(0)]$?
Here is an interesting proposition:

That means $x = [x_1, \dots, x_N]$ can be used to
represent the quality/reputation of each page.
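
A small numerical experiment illustrates why the choice of x(0) may not matter. The sketch below iterates the three-action walk from two different starting distributions. The graph, the action probabilities (0.2, 0.6, 0.2), and the uniform choice among targets are all assumptions made for illustration.

```python
# Hypothetical 3-page graph: page -> pages it links to.
links = {0: [1, 2], 1: [2], 2: [0]}
N = len(links)
# Inverse links: page -> pages that link to it.
back = {p: [q for q in links if p in links[q]] for p in links}

# Assumed action probabilities x(j|q), x(l|q), x(b|q); same for every q.
X_JUMP, X_LINK, X_BACK = 0.2, 0.6, 0.2

def step(x):
    """One step of the walk: x_p(t+1) = sum_q x(p|q) * x_q(t)."""
    new = [0.0] * N
    for q in range(N):
        for p in range(N):
            new[p] += X_JUMP * x[q] / N              # jump: uniform over pages
        for p in links[q]:
            new[p] += X_LINK * x[q] / len(links[q])  # follow a link out of q
        for p in back[q]:
            new[p] += X_BACK * x[q] / len(back[q])   # follow a link backwards
    return new

def walk(x0, steps=200):
    """Apply `steps` updates starting from the distribution x0."""
    x = list(x0)
    for _ in range(steps):
        x = step(x)
    return x

a = walk([1.0, 0.0, 0.0])  # surfer starts on page 0
b = walk([0.0, 0.0, 1.0])  # surfer starts on page 2
print(a)
print(b)
```

With a positive jump probability, both runs should end at the same distribution, which is what lets the limit serve as a page score.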
13
Google's page ranking

The calculation of Google's page ranking can be
modeled by a simpler version of the single surfer
walk. Only two actions are allowed for the
surfer:
- Randomly jump to any node/page in the
graph from page q, with probability $x(j \mid q) = 1 - d$
- Follow a hyperlink from the current
page q, with probability $x(l \mid q) = d$
- Given that the jump action is taken, the
probability of jumping to page p is $x(p \mid j) = 1/N$
(the same for every current page q),
where N is the total number of web pages
- Given that the follow-a-link action is
taken, the probability of reaching page p from q
by following a link is $x(p \mid q, l) = 1/h_q$ when q links to p, where
$h_q$ is the number of links in page q.

14
Google's page ranking (2)

So we can calculate the probability that the
surfer will reach page p at time t.
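
Substituting the probabilities of the simplified two-action model into the law of total probability gives the familiar update below. The notation $pa(p)$ for the set of pages that link to p is an assumption of this sketch, and the jump term simplifies because the $x_q(t)$ sum to 1:

```latex
\[
  x_p(t+1)
  = \sum_{q=1}^{N} \frac{1-d}{N}\, x_q(t)
    + d \sum_{q \in pa(p)} \frac{x_q(t)}{h_q}
  = \frac{1-d}{N} + d \sum_{q \in pa(p)} \frac{x_q(t)}{h_q} .
\]
```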

$x(j \mid q) = 1 - d > 0$ guarantees that the PageRank
vector $x(t)$ converges to a distribution of page
scores that doesn't depend on the initial
distribution.
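
A minimal power-iteration sketch of this two-action model follows. The value d = 0.85 and the three-page graph are assumptions chosen for illustration (neither comes from the slides), and the function assumes every page has at least one outgoing link.

```python
def pagerank(links, d=0.85, tol=1e-12, max_iter=1000):
    """Iterate x_p(t+1) = (1-d)/N + d * sum over q linking to p of x_q(t)/h_q.

    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link (h_q > 0).
    """
    pages = list(links)
    n = len(pages)
    x = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        new = {p: (1.0 - d) / n for p in pages}  # jump contribution
        for q, targets in links.items():
            share = d * x[q] / len(targets)      # q splits its score evenly
            for p in targets:
                new[p] += share
        if sum(abs(new[p] - x[p]) for p in pages) < tol:
            return new
        x = new
    return x

# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(links)
print(scores)
```

In this toy graph, C is linked from both other pages and passes its whole score to A, so C and A should end up ahead of B.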

15
Google's page rank (3)

16
Google's page rank (4)

(Figure: graph with nodes A, B, and C)
