Title: Prestige (Seeley, 1949; Brin & Page, 1997; Kleinberg, 1997)
1 Prestige (Seeley, 1949; Brin & Page, 1997; Kleinberg, 1997)
- Use edge-weighted, directed graphs to model social networks
- Status/Prestige
- In-degree is a good first-order indicator
2 Notations
- Document citation graph
- Node adjacency matrix $E$
- $E_{i,j} = 1$ iff document $i$ cites document $j$, and zero otherwise
- Prestige $p_v$ associated with every node $v$
- Prestige vector over all nodes: $p$
5 Fixpoint Prestige Vector
- Confer to all nodes $v$ the sum total of prestige of all $u$ which link to $v$
- Gives a new prestige score $p' = E^T p$
- Fixpoint for prestige vector
- Iterative assignment $p \leftarrow E^T p$ (with normalization after each step)
- Fixpoint = principal eigenvector of $E^T$
- Variants: attenuation factor
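
The iterative assignment can be sketched in a few lines of Python. This is a minimal illustration rather than the deck's own code: the function name `prestige_fixpoint`, the L1 normalization, the stopping tolerance, and the three-node toy graph are all assumptions made for the example.

```python
def prestige_fixpoint(E, iterations=200, tol=1e-12):
    """Iterate p <- E^T p with L1 normalization until p stops changing."""
    n = len(E)
    p = [1.0 / n] * n  # start from a uniform prestige vector
    for _ in range(iterations):
        # p_new[v] = sum of p[u] over all u that link to v
        p_new = [sum(E[u][v] * p[u] for u in range(n)) for v in range(n)]
        total = sum(p_new)
        if total == 0:
            raise ValueError("prestige mass vanished (e.g. graph without cycles)")
        p_new = [x / total for x in p_new]
        if max(abs(a - b) for a, b in zip(p, p_new)) < tol:
            return p_new
        p = p_new
    return p


# Hypothetical toy graph: 0 cites 1 and 2, 1 cites 2, 2 cites 0.
E = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
p = prestige_fixpoint(E)
print(p)  # node 2, which is cited twice, ends up with the largest score
```

The toy graph is strongly connected and aperiodic, so the iteration settles; on graphs without those properties the plain iteration may oscillate or lose all its mass.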
6 Centrality
- Graph-based notions of centrality
- Distance $d(u,v)$: number of links between $u$ and $v$
- Radius of node $u$ is $r(u) = \max_v d(u,v)$
- Center of the graph is $\arg\min_u r(u)$
- Example: find influential papers in an area of research by looking for papers $u$ with small $r(u)$
- No single measure is suited for all applications
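
As a rough sketch of the radius and center definitions, the following uses breadth-first search on an unweighted graph. The helper names and the five-node path graph are hypothetical, and treating unreachable nodes as giving infinite radius is an assumption of this example.

```python
from collections import deque


def distances_from(graph, source):
    """Breadth-first search: smallest number of links from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def radius(graph, u):
    """r(u) = max over v of d(u, v); infinite if some node is unreachable."""
    dist = distances_from(graph, u)
    if len(dist) < len(graph):
        return float("inf")
    return max(dist.values())


def center(graph):
    """Return the node(s) with the smallest radius, plus all radii."""
    radii = {u: radius(graph, u) for u in graph}
    best = min(radii.values())
    return [u for u in graph if radii[u] == best], radii


# Hypothetical undirected path a - b - c - d - e, stored as adjacency lists.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
centers, radii = center(graph)
print(centers, radii)
```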
7 Co-citation
- If document $u$ cites documents $v$ and $w$, then $v$ and $w$ are said to be co-cited by $u$
- $E_{i,j}$: document citation matrix
- $\Rightarrow E^T E$: co-citation index matrix
- Indicator of relatedness between $v$ and $w$
- Clustering
- Use the above pairwise relatedness measure in a clustering algorithm
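
To make the co-citation index concrete, here is a small sketch that forms $E^T E$ directly from the citation matrix. The function name and the five-document example are made up for illustration; entry `C[v][w]` counts the documents that cite both `v` and `w`, and the diagonal holds each document's in-degree.

```python
def cocitation_matrix(E):
    """Return C = E^T E, where C[v][w] counts documents citing both v and w."""
    n = len(E)
    return [
        [sum(E[u][v] * E[u][w] for u in range(n)) for w in range(n)]
        for v in range(n)
    ]


# Hypothetical data: docs 0 and 1 each cite 2 and 3; doc 4 cites only 3.
E = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
C = cocitation_matrix(E)
print(C[2][3])  # documents 2 and 3 are co-cited by docs 0 and 1
```

The off-diagonal entries of `C` could then serve as the pairwise relatedness input to a clustering routine.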
8 MDS Map of WWW Co-citations
Social structure of Web communities concerning geophysics, climate, remote sensing, and ecology. The cluster labels are generated manually. (Courtesy: Larson)
9 The surfing model
- Correspondence between surfer model and the notion of prestige
- Page $v$ has high prestige if the visit rate is high
- This happens if there are many neighbors $u$ with high visit rates leading to $v$
- Deficiency
- Web graph is not strongly connected
- Only about a fourth of the graph is strongly connected
- Web graph is not aperiodic
- Rank-sinks
- Pages without out-links
- Directed cyclic paths
10 Surfing Model: Simple fix
- Two-way choice at each node
- With probability $d$ ($0.1 < d < 0.2$), the surfer jumps to a random page on the Web
- With probability $1 - d$ the surfer decides to choose, uniformly at random, an out-neighbor
- Modified equation (7.9)
- Direct solution of eigen-system not feasible
11 Solution: Power Iterations
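
A minimal sketch of power iteration for the modified surfing model follows. It is reconstructed from the two-way choice described above (jump with probability `d`, otherwise follow a uniformly chosen out-link) and is not necessarily identical to the referenced equation. The function name, the tolerance, the four-page toy Web, and in particular the handling of pages without out-links (their mass is spread uniformly) are assumptions of this example.

```python
def pagerank(out_links, d=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for the random-surfer model with jump probability d."""
    nodes = list(out_links)
    n = len(nodes)
    p = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Every page receives the random-jump share d / n.
        p_new = {u: d / n for u in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                # Follow one of the out-links uniformly at random.
                share = (1.0 - d) * p[u] / len(targets)
                for v in targets:
                    p_new[v] += share
            else:
                # Assumption: a page without out-links spreads its mass uniformly.
                share = (1.0 - d) * p[u] / n
                for v in nodes:
                    p_new[v] += share
        change = sum(abs(p_new[u] - p[u]) for u in nodes)
        p = p_new
        if change < tol:
            break
    return p


# Hypothetical four-page Web; page "d" has no in-links and no out-links.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Each pass touches every link once, which is why iterating is practical on graphs where solving the eigen-system directly is not.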
12 PageRank Architecture at Google
- Ranking of pages more important than exact values of $p_i$
- Convergence of page ranks in 52 iterations for a crawl with 322 million links
- Pre-compute and store the PageRank of each page
- PageRank independent of any query or textual content
13
- Ranking scheme combines PageRank with textual match
- Unpublished
- Many empirical parameters, human effort, and regression testing
- Criticism: ad hoc coupling and decoupling between relevance and prestige
14 HITS: Hyperlink-Induced Topic Search
- Relies on query-time processing
- To select a base set $V_q$ of pages for query $q$, constructed by
- selecting a sub-graph $R$ from the Web (root set) relevant to the query
- selecting any node $u$ which neighbors any $r \in R$ via an inbound or outbound edge (expanded set)
- To deduce hubs and authorities that exist in a sub-graph of the Web
- Every page $u$ has two distinct measures of merit, its hub score $h_u$ and its authority score $a_u$
- Recursive quantitative definitions of hub and authority scores
15
- Use a text-based search engine to create a root set of matching documents
- Expand the root set to form the base set: context graph of depth 1, plus additional heuristics
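
The depth-1 expansion can be sketched as below. All names (`invert`, `build_base_set`, `induced_edges`) and the link data are hypothetical, and the "additional heuristics" mentioned above are not modeled, since the slides do not spell them out.

```python
def invert(out_links):
    """Build the in-link index from an out-link index."""
    in_links = {}
    for u, targets in out_links.items():
        for v in targets:
            in_links.setdefault(v, []).append(u)
    return in_links


def build_base_set(root_set, out_links, in_links):
    """Expand a root set by one link in either direction (depth-1 context graph)."""
    base = set(root_set)
    for r in root_set:
        base.update(out_links.get(r, ()))  # pages the root page points to
        base.update(in_links.get(r, ()))   # pages that point to the root page
    return base


def induced_edges(nodes, out_links):
    """Keep only the links whose endpoints both lie inside the node set."""
    return {(u, v) for u in nodes for v in out_links.get(u, ()) if v in nodes}


# Hypothetical link data; "r1" and "r2" stand for text-search matches.
out_links = {"r1": ["x"], "r2": ["x", "y"], "h": ["r1", "r2"], "z": ["q"]}
in_links = invert(out_links)
base = build_base_set({"r1", "r2"}, out_links, in_links)
edges = induced_edges(base, out_links)
print(sorted(base), sorted(edges))
```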
16-18 Query-dependent input
[Figure, built up over three slides: the root set together with its OUT neighbors (pages it links to) and IN neighbors (pages that link to it) forms the base set.]
19
- Associate two numerical scores with each document in a hyperlinked collection: authority score and hub score
- Authorities: most definitive information sources (on a specific topic)
- Like conference papers (new ideas)
- Hubs: most useful compilation of links to authoritative documents
- Like journal papers or books (consolidate or survey significant research)
20 J. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998
- Basic presumptions
- Creation of links indicates judgment: conferred authority, endorsement
- Authority is not conferred directly from page to page, but rather mediated through hub nodes
- Authorities may not be linked directly but through co-citation
- Example: major car manufacturer pages will not point to each other, but there may be hub pages that compile links to such pages
21 Hub & Authority Scores
- Hubs and authorities exhibit what could be called a mutually reinforcing relationship: a good hub is a page that points to many good authorities; a good authority is a page that is pointed to by many good hubs [Kleinberg 1999]
23 The HITS algorithm. $\|h\|$ and $\|a\|$ are $L_1$ vector norms
24 Iterative Score Computation (1)
- Translate mutual relationship into iterative update equations
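
The update equations themselves did not survive in this transcript. In the usual formulation of HITS they take the following form, written here with the notation of the earlier slides ($E_{u,v} = 1$ iff $u$ links to $v$); treat this as a reconstruction rather than the slide's exact rendering.

```latex
\begin{align*}
a_v &\leftarrow \sum_{u \,:\, E_{u,v} = 1} h_u
  && \text{(authority: sum over pages that point to } v\text{)} \\
h_u &\leftarrow \sum_{v \,:\, E_{u,v} = 1} a_v
  && \text{(hub: sum over pages that } u \text{ points to)}
\end{align*}
```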
25 Iterative Score Computation (2)
- Adjacency matrix
- Score vectors
26 Iterative Score Computation (3)
- Condense into a single update equation (e.g., $a \leftarrow E^T E\,a$)
- Question of convergence (ignore absolute scale)
- Notice resemblance with eigenvector equations
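
A compact sketch of the iterative score computation is given below. It alternates the authority and hub updates and rescales both vectors to unit L1 norm so that only relative scores matter. The function name, the fixed iteration count, and the small bipartite example are assumptions; a non-empty edge list is assumed, so the norms are never zero.

```python
def hits(edges, iterations=50):
    """Alternate authority and hub updates, L1-normalizing after each round."""
    nodes = sorted({u for edge in edges for u in edge})
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority update: a[v] = sum of hub scores of pages pointing to v.
        auth = {v: 0.0 for v in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        # Hub update: h[u] = sum of authority scores of pages u points to.
        hub = {u: 0.0 for u in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        # Rescale both vectors; the absolute scale carries no information.
        a_norm = sum(auth.values())
        h_norm = sum(hub.values())
        auth = {v: s / a_norm for v, s in auth.items()}
        hub = {u: s / h_norm for u, s in hub.items()}
    return hub, auth


# Hypothetical graph: h1 and h2 link to both a1 and a2; h3 links only to a2.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a2")]
hub, auth = hits(edges)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Substituting one update into the other gives the condensed form: the authority vector is repeatedly multiplied by $E^T E$, which is why the scores approach a principal eigenvector.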
27 Example
- Simple example graph
- Hub & authority matrices
- Authority and hub weights
28 HITS Topic Distillation Process
- Send query to a text-based IR system and obtain the root set
- Expand the root set by radius one to obtain an expanded graph
- Run power iterations on the hub and authority scores together
- Report top-ranking authorities and hubs
29 HITS Applications
- Clever model: http://www.almaden.ibm.com/cs/k53/clever.html
- Fine-grained ranking [Soumen, WWW10]
- Query-sensitive retrieval [Krishna Bharat, SIGIR98]
30 PageRank vs. HITS
- PageRank advantage over HITS
- Query-time cost is low
- HITS computes an eigenvector for every query
- Less susceptible to localized link-spam
- HITS advantage over PageRank
- HITS ranking is sensitive to query
- HITS has notion of hubs and authorities
- Topic-sensitive PageRanking [Haveliwala, WWW11]
- Attempt to make PageRanking query sensitive
31 HITS Discussion
- Pros
- Derives topic-specific authority scores
- Returns list of hubs in addition to authorities
- Computationally tractable (due to focused sub-graph)
- Cons
- Sensitive to Web spam (artificially increasing hub and authority weight)
- Query dependence requires expensive context graph building step
- Topic drift: dominant topic in base set may not be the intended one
32 Relation between HITS, PageRank and LSI
- HITS algorithm: running SVD on the hyperlink relation (source, target)
- LSI algorithm: running SVD on the relation (term, document)
- PageRank on root set $R$ gives same ranking as the ranking of hubs as given by HITS