Prestige Seeley, 1949 Brin

About This Presentation

Description:

Use edge-weighted, directed graphs to model social networks. Status/Prestige ... Query Sensitive retrieving [Krishna Bharat SIGIR'98] PageRank vs. HITS ...

Slides: 31

Provided by: olvilman

Transcript and Presenter's Notes

1
Prestige (Seeley, 1949; Brin & Page, 1997;
Kleinberg, 1997)

Use edge-weighted, directed graphs to model
social networks
Status/Prestige
In-degree is a good first-order indicator

2
Notations

Document citation graph,
Node adjacency matrix E
$E_{i,j} = 1$ iff document i cites document j, and
zero otherwise.
Prestige $p_v$ associated with every node v
Prestige vector over all nodes: $p$

3
(No Transcript)
4
(No Transcript)
5
Fixpoint Prestige Vector

Confer to all nodes v the sum total of prestige
of all u which link to v
Gives a new prestige score $p' = E^T p$
Fixpoint for prestige vector:
iterative assignment $p \leftarrow E^T p$, normalizing $p$ after each step
Fixpoint = principal eigenvector of $E^T$
Variants: attenuation factor
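
The iterative assignment can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `prestige`, the fixed iteration count, the L1 rescaling, and the three-node graph are all hypothetical choices, not taken from the slides. The example graph is strongly connected and aperiodic, so the iteration settles on a single vector.

```python
def prestige(E, iterations=100):
    """Power iteration p <- E^T p, rescaled to unit L1 norm after each step."""
    n = len(E)
    p = [1.0 / n] * n  # start from uniform prestige
    for _ in range(iterations):
        # new_p[v] = sum of p[u] over all u that link to v
        new_p = [sum(E[u][v] * p[u] for u in range(n)) for v in range(n)]
        total = sum(new_p)
        if total == 0:  # no links at all: nothing to propagate
            return p
        p = [x / total for x in new_p]
    return p

# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
E = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
p = prestige(E)
```

Here node 2, which is cited by both other nodes, ends up with the highest prestige, and node 1 with the lowest.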

6
Centrality

Graph-based notions of centrality
Distance d(u,v): smallest number of links on a path between u and v
Radius of node u is $r(u) = \max_v d(u,v)$
Center of the graph is $\arg\min_u r(u)$
Example: find influential papers in an area of
research by looking for papers u with small r(u)
No single measure is suited for all applications
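
As a rough illustration of radius and center, the sketch below computes distances by breadth-first search. The function names, the adjacency-list representation, and the five-node path graph are assumptions made for this example; a node that cannot reach every other node is given an infinite radius here, which is one possible convention.

```python
from collections import deque

def distances_from(graph, source):
    """BFS distances (number of links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def radius(graph, u):
    """r(u) = max over v of d(u, v); infinite if some node is unreachable."""
    dist = distances_from(graph, u)
    if len(dist) < len(graph):
        return float("inf")
    return max(dist.values())

def center(graph):
    """All nodes with the smallest radius."""
    radii = {u: radius(graph, u) for u in graph}
    best = min(radii.values())
    return [u for u in radii if radii[u] == best]

# Hypothetical undirected path a - b - c - d - e, stored as adjacency lists.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
```

On this path, `radius(path, "a")` is 4, `radius(path, "c")` is 2, and `center(path)` returns `["c"]`.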

7
Co-citation

If document u cites documents v and w,
v and w are said to be co-cited by u.
$E_{i,j}$: document citation matrix
=> $E^T E$: co-citation index matrix; entry (v,w)
counts the documents that cite both v and w
Indicator of relatedness between v and w.
Clustering
Using above pair-wise relatedness measure in a
clustering algorithm
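
A minimal sketch of the co-citation index, assuming the citation matrix is stored as a list of 0/1 rows. The function name `cocitation` and the four-document example are hypothetical.

```python
def cocitation(E):
    """Return C = E^T E, where C[v][w] counts documents citing both v and w."""
    n = len(E)
    return [[sum(E[u][v] * E[u][w] for u in range(n)) for w in range(n)]
            for v in range(n)]

# Hypothetical example: documents 0 and 1 each cite 2 and 3; document 2 cites 3.
E = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
C = cocitation(E)
```

In this example `C[2][3]` is 2, because documents 0 and 1 cite both 2 and 3, and each diagonal entry `C[v][v]` is the in-degree of v. The off-diagonal entries are the pair-wise relatedness values a clustering algorithm would consume.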

8
MDS Map of WWW Co-citations: Social structure of
Web communities concerning geophysics, climate,
remote sensing, and ecology. The cluster labels
are generated manually. Courtesy Larson
9
The surfing model

Correspondence between surfer model and the
notion of prestige
Page v has high prestige if the visit rate is
high
This happens if there are many neighbors u with
high visit rates leading to v
Deficiency
Web graph is not strongly connected
Only a fourth of the graph is strongly connected!
Web graph is not aperiodic
Rank-sinks
Pages without out-links
Directed cyclic paths

10
Surfing Model: Simple fix

Two way choice at each node
With probability d (0.1 < d < 0.2), the surfer
jumps to a random page on the Web.
With probability 1 - d the surfer decides to
choose, uniformly at random, an out-neighbor
MODIFIED EQUATION 7.9
Direct solution of eigen-system not feasible.

11
Solution: Power Iterations
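
A small sketch of power iterations for the random-surfer model with jump probability d. It is not the slide's equation 7.9 or Google's implementation; the function name, the tolerance, the dictionary representation, and the toy graph are assumptions. Pages without out-links are handled by letting the surfer jump uniformly at random, which is one common convention and also an assumption here.

```python
def pagerank(out_links, d=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for the random-surfer model.

    out_links: dict mapping each page to the list of pages it links to
               (every link target must also be a key).
    d: probability of jumping to a random page (0.1 < d < 0.2 on the slide).
    """
    pages = list(out_links)
    n = len(pages)
    p = {v: 1.0 / n for v in pages}
    for _ in range(max_iter):
        # Assumption: a surfer on a page without out-links jumps uniformly.
        dangling = sum(p[u] for u in pages if not out_links[u])
        new_p = {v: (d + (1 - d) * dangling) / n for v in pages}
        for u in pages:
            if out_links[u]:
                # With probability 1 - d, follow one out-link chosen uniformly.
                share = (1 - d) * p[u] / len(out_links[u])
                for v in out_links[u]:
                    new_p[v] += share
        change = sum(abs(new_p[v] - p[v]) for v in pages)
        p = new_p
        if change < tol:
            break
    return p

# Hypothetical five-page web; page "e" has no out-links.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"], "e": []}
ranks = pagerank(web)
```

The scores always sum to one, and in this toy graph page "c", which has three in-links, receives the highest rank.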
12
PageRank Architecture at Google

Ranking of pages more important than exact values
of $p_i$
Convergence of page ranks in 52 iterations for a
crawl with 322 million links.
Pre-compute and store the PageRank of each page.
PageRank independent of any query or textual
content.

Ranking scheme combines PageRank with textual
match
Unpublished
Many empirical parameters, human effort and
regression testing.
Criticism: ad-hoc coupling and decoupling
between relevance and prestige

14
HITS: Hyperlink-Induced Topic Search

Relies on query-time processing
To select base set $V_q$ of links for query q,
constructed by
selecting a sub-graph R from the Web (root set)
relevant to the query
selecting any node u which neighbors any $r \in R$
via an inbound or outbound edge (expanded set)
To deduce hubs and authorities that exist in a
sub-graph of the Web
Every page u has two distinct measures of merit,
its hub score $h_u$ and its authority score $a_u$.
Recursive quantitative definitions of hub and
authority scores

15
Use text-based search engine to create a root set
of matching documents.
Expand root set to form base set: context graph
of depth 1, plus additional heuristics.
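
The depth-1 expansion can be sketched as follows, assuming the link structure is available as a dictionary of out-links. The function name `base_set` and the toy graph are hypothetical, and the "additional heuristics" mentioned on the slide are not modeled.

```python
def base_set(root, out_links):
    """Expand a root set by radius one along inbound and outbound edges."""
    root = set(root)
    base = set(root)
    for u, targets in out_links.items():
        if u in root:
            base.update(targets)          # pages the root set points to
        elif root.intersection(targets):
            base.add(u)                   # pages that point into the root set
    return base

# Hypothetical link structure: r1 and r2 form the root set.
links = {"r1": ["x"], "r2": [], "y": ["r2"], "z": ["x"], "x": []}
expanded = base_set({"r1", "r2"}, links)
```

Here `expanded` contains r1, r2, x and y. Page z is left out because it neither belongs to the root set nor shares an edge with it.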
16

Associate two numerical scores with each document
in a hyperlinked collection: authority score and
hub score
Authorities: most definitive information sources
(on a specific topic)
Like conference papers (new ideas)
Hubs: most useful compilation of links to
authoritative documents
Like journal papers or books (consolidate or
survey significant research)

17
J. Kleinberg. Authoritative sources in a
hyperlinked environment. Proc. 9th ACM-SIAM
Symposium on Discrete Algorithms, 1998

Basic presumptions
Creation of links indicates judgment: conferred
authority, endorsement
Authority is not conferred directly from page to
page, but rather mediated through hub nodes
Authorities may not be linked directly but
through co-citation
Example: major car manufacturer pages will not
point to each other, but there may be hub pages
that compile links to such pages

18
Hub & Authority Scores

"Hubs and authorities exhibit what could be
called a mutually reinforcing relationship: a
good hub is a page that points to many good
authorities; a good authority is a page that is
pointed to by many good hubs" (Kleinberg 1999)

19
(No Transcript)
20
The HITS algorithm. $\|h\|$ and $\|a\|$ are L1 vector
norms
21
Iterative Score Computation (1)

Translate mutual relationship
into iterative update equations

22
Iterative Score Computation (2)

Matrix notation

Adjacency matrix
Score vectors
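
The equations from these slides did not survive in the transcript. For reference, the standard HITS updates can be written as below, using the adjacency matrix $E$ from the Notations slide. This is a reconstruction from Kleinberg's usual formulation and may not match the slide's exact notation.

```latex
% Element-wise updates: u -> v means E_{u,v} = 1
a_v \leftarrow \sum_{u : E_{u,v} = 1} h_u
\qquad
h_u \leftarrow \sum_{v : E_{u,v} = 1} a_v

% The same updates in matrix notation, with score vectors a and h
a \leftarrow E^T h
\qquad
h \leftarrow E \, a
```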
23
Iterative Score Computation (3)

Condense into a single update equation (e.g.)
Question of convergence (ignore absolute scale)
Notice resemblance with eigenvector equations
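
A minimal sketch of the iterative score computation, assuming the updates a <- E^T h and h <- E a with L1 rescaling after each round. Substituting one update into the other gives the single-equation form a <- E^T E a, so the authority vector approaches the principal eigenvector of E^T E. The function name `hits`, the iteration count, and the five-page graph are hypothetical.

```python
def hits(E, iterations=50):
    """Iterate a <- E^T h, h <- E a, rescaling both to unit L1 norm."""
    n = len(E)
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(iterations):
        # Authority of v: sum of hub scores of pages pointing to v.
        a = [sum(E[u][v] * h[u] for u in range(n)) for v in range(n)]
        # Hub score of u: sum of authority scores of pages u points to.
        h = [sum(E[u][v] * a[v] for v in range(n)) for u in range(n)]
        a_norm, h_norm = sum(a), sum(h)
        if a_norm == 0 or h_norm == 0:  # graph has no links; scores stay zero
            break
        a = [x / a_norm for x in a]
        h = [x / h_norm for x in h]
    return h, a

# Hypothetical graph: page 0 links to 2 and 3; page 1 links to 2, 3 and 4.
E = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
h, a = hits(E)
```

In this graph pages 2 and 3 tie as the top authorities, page 4 is a weaker authority, and page 1 is a better hub than page 0 because it points to one more authority.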

24
Example

Simple example graph
Hub & authority matrices
Authority and Hub weights

25
HITS Topic Distillation Process

Send query to a text-based IR system and obtain
the root-set.
Expand the root-set by radius one to obtain an
expanded graph.
Run power iterations on the hub and authority
scores together.
Report top-ranking authorities and hubs.

26
HITS Applications

Clever model: http://www.almaden.ibm.com/cs/k53/clever.html
Fine-grained ranking [Soumen WWW10]
Query Sensitive retrieving [Krishna Bharat SIGIR'98]

27
PageRank vs. HITS

PageRank advantage over HITS
Query-time cost is low
HITS computes an eigenvector for every query
Less susceptible to localized link-spam
HITS advantage over PageRank
HITS ranking is sensitive to query
HITS has notion of hubs and authorities
Topic-sensitive PageRanking [Haveliwala WWW11]
Attempt to make PageRanking query sensitive

28
HITS Discussion

Pros
Derives topic-specific authority scores
Returns list of hubs in addition to authorities
Computationally tractable (due to focused
sub-graph)
Cons
Sensitive to Web spam (artificially increasing
hub and authority weight)
Query dependence requires expensive context graph
building step
Topic drift: dominant topic in base set may not
be the intended one

29
Relation between HITS, PageRank and LSI

HITS algorithm = running SVD on the hyperlink
relation (source,target)
LSI algorithm = running SVD on the relation
(term,document).
PageRank on root set R gives same ranking as the
ranking of hubs as given by HITS

30
(No Transcript)
