Prestige Seeley, 1949 Brin - PowerPoint PPT Presentation

Provided by: olvilman

Transcript and Presenter's Notes


1
Prestige (Seeley, 1949; Brin and Page, 1997;
Kleinberg, 1997)
  • Use edge-weighted, directed graphs to model
    social networks
  • Status/Prestige
  • In-degree is a good first-order indicator

2
Notations
  • Document citation graph
  • Node adjacency matrix E
  • $E_{ij} = 1$ iff document i cites document j, and
    zero otherwise.
  • Prestige $p_v$ associated with every node v
  • Prestige vector p over all nodes

3
(No Transcript)
4
(No Transcript)
5
Fixpoint Prestige Vector
  • Confer to every node v the sum total of the
    prestige of all nodes u that link to v
  • Gives a new prestige score: $p' = E^T p$
  • Fixpoint for the prestige vector: iterative
    assignment $p \leftarrow E^T p$, rescaling p after
    each step
  • Fixpoint: principal eigenvector of $E^T$
  • Variants: attenuation factor
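
The iterative assignment on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `prestige`, the L1 rescaling, the stopping tolerance, and the three-node example graph are all assumptions chosen for the sketch.

```python
def prestige(E, iterations=100, tol=1e-12):
    """Iterate p <- E^T p, rescaling p to unit L1 norm after each step.

    E is a square 0/1 adjacency matrix (list of lists) with
    E[u][v] == 1 iff document u cites document v.
    """
    n = len(E)
    p = [1.0 / n] * n
    for _ in range(iterations):
        # new_p[v] is the total prestige of all u that cite v
        new_p = [sum(E[u][v] * p[u] for u in range(n)) for v in range(n)]
        total = sum(new_p)
        if total == 0:
            return new_p  # no citations at all: nothing to rank
        new_p = [x / total for x in new_p]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Hypothetical graph: 0 cites 1 and 2, 1 cites 2, 2 cites 0.
E = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
p = prestige(E)
```

In this example node 2, which is cited by both other nodes, ends up with the largest prestige. The iteration only settles on a unique vector when the graph is strongly connected and aperiodic, as this small example is.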

6
Centrality
  • Graph-based notions of centrality
  • Distance d(u,v): number of links between u and v
  • Radius of node u is $r(u) = \max_v d(u,v)$
  • Center of the graph is $\arg\min_u r(u)$
  • Example: find influential papers in an area of
    research by looking for papers u with small r(u)
  • No single measure is suited for all applications
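
As a rough sketch of the radius and center definitions, the following Python computes d(u,v) by breadth-first search. The helper names, the adjacency-dictionary representation, and the choice to treat a node that cannot reach every other node as having infinite radius are assumptions made for illustration.

```python
from collections import deque


def distances_from(adj, u):
    """Breadth-first search: number of links on a shortest path from u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def radius(adj, u):
    """r(u) = max over v of d(u, v); infinite if some node is unreachable."""
    dist = distances_from(adj, u)
    if len(dist) < len(adj):
        return float("inf")
    return max(dist.values())


def center(adj):
    """A node with the smallest radius (every node must be a key of adj)."""
    return min(adj, key=lambda u: radius(adj, u))


# Hypothetical undirected path a - b - c - d - e, stored in both directions.
path = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
c = center(path)
```

Here the middle node "c" has radius 2 and is the center, while the end nodes have radius 4.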

7
Co-citation
  • If document u cites documents v and w, then v
    and w are said to be co-cited by u.
  • $E_{ij}$: document citation matrix
  • $E^T E$: co-citation index matrix
  • Entry $(E^T E)_{vw}$ is an indicator of relatedness
    between v and w.
  • Clustering: use the above pair-wise relatedness
    measure in a clustering algorithm
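
A small worked example may help show what the co-citation index matrix counts. The sketch below builds $E^T E$ in plain Python; the function name `cocitation` and the four-document citation matrix are illustrative assumptions.

```python
def cocitation(E):
    """Return C = E^T E, where C[v][w] counts documents citing both v and w."""
    n = len(E)
    return [
        [sum(E[u][v] * E[u][w] for u in range(n)) for w in range(n)]
        for v in range(n)
    ]


# Hypothetical citations: 0 cites 2 and 3, 1 cites 2 and 3, 2 cites 3.
E = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
C = cocitation(E)
```

Documents 2 and 3 are co-cited by documents 0 and 1, so C[2][3] is 2. Each diagonal entry C[v][v] is simply the in-degree of v.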

8
MDS Map of WWW Co-citations. Social structure of
Web communities concerning geophysics, climate,
remote sensing, and ecology. The cluster labels
are generated manually. Courtesy Larson.
9
The surfing model
  • Correspondence between surfer model and the
    notion of prestige
  • Page v has high prestige if the visit rate is
    high
  • This happens if there are many neighbors u with
    high visit rates leading to v
  • Deficiencies:
  • Web graph is not strongly connected; only about
    a fourth of the graph is
  • Web graph is not aperiodic
  • Rank-sinks: pages without out-links, and
    directed cyclic paths

10
Surfing Model: Simple fix
  • Two-way choice at each node
  • With probability d (0.1 < d < 0.2), the surfer
    jumps to a random page on the Web.
  • With probability 1 - d, the surfer chooses,
    uniformly at random, an out-neighbor of the
    current page.
  • MODIFIED EQUATION 7.9
  • Direct solution of eigen-system not feasible.

11
Solution: Power Iterations
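
The power iteration for the random-surfer model can be sketched as follows. This is an illustrative version only: the function name `pagerank`, the default d = 0.15, the stopping tolerance, the example graph, and in particular the handling of pages without out-links (their mass is spread uniformly over all pages) are assumptions, not details taken from the slides.

```python
def pagerank(out_links, d=0.15, iterations=100, tol=1e-10):
    """Power iteration for the surfer model with jump probability d.

    out_links maps every page to the list of pages it links to;
    every link target must also appear as a key.
    """
    nodes = list(out_links)
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Assumption: a surfer stuck on a page without out-links
        # jumps to a page chosen uniformly at random.
        dangling = sum(p[u] for u in nodes if not out_links[u])
        new_p = {v: d / n + (1 - d) * dangling / n for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                # With probability 1 - d, follow a random out-link of u.
                new_p[v] += (1 - d) * p[u] / len(out_links[u])
        delta = sum(abs(new_p[v] - p[v]) for v in nodes)
        p = new_p
        if delta < tol:
            break
    return p


# Hypothetical four-page Web graph.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

The scores always sum to 1. Page "d" has no in-links, so it keeps only the random-jump share d/N, while page "c", which collects links from three pages, ranks highest.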
12
PageRank Architecture at Google
  • Ranking of pages is more important than the exact
    values of $p_i$
  • Convergence of page ranks in 52 iterations for a
    crawl with 322 million links.
  • Pre-compute and store the PageRank of each page.
  • PageRank independent of any query or textual
    content.

13
  • Ranking scheme combines PageRank with textual
    match
  • Unpublished
  • Many empirical parameters, human effort, and
    regression testing.
  • Criticism: ad-hoc coupling and decoupling
    between relevance and prestige

14
HITS: Hyperlink-Induced Topic Search
  • Relies on query-time processing
  • To select a base set $V_q$ of pages for query q,
    constructed by
  • selecting a sub-graph R from the Web (root set)
    relevant to the query
  • selecting any node u which neighbors any $r \in R$
    via an inbound or outbound edge (expanded set)
  • To deduce hubs and authorities that exist in a
    sub-graph of the Web
  • Every page u has two distinct measures of merit:
    its hub score $h_u$ and its authority score $a_u$.
  • Recursive quantitative definitions of hub and
    authority scores

15
Use a text-based search engine to create a root set
of matching documents. Expand the root set to form
the base set: context graph of depth 1, plus
additional heuristics.
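
The depth-1 expansion from root set to base set might look like the sketch below. The function name `base_set`, the dictionary-based link tables, and the optional cap on in-linking pages per root page (shown here as one example of an "additional heuristic") are assumptions for illustration.

```python
def base_set(root, out_links, in_links, max_in_per_page=None):
    """Expand a root set by one link in either direction.

    out_links[r] lists pages that r points to; in_links[r] lists pages
    that point to r. max_in_per_page is a hypothetical cap that keeps
    very popular root pages from flooding the base set.
    """
    base = set(root)
    for r in root:
        base.update(out_links.get(r, ()))
        parents = list(in_links.get(r, ()))
        if max_in_per_page is not None:
            parents = parents[:max_in_per_page]
        base.update(parents)
    return base


# Hypothetical link tables for a two-page root set.
root = {"r1", "r2"}
out_links = {"r1": ["x"], "r2": ["y"]}
in_links = {"r1": ["h1", "h2"], "r2": ["h1"]}

full = base_set(root, out_links, in_links)
capped = base_set(root, out_links, in_links, max_in_per_page=1)
```

With no cap, the base set holds the root pages plus every page one link away. With a cap of one in-linking page per root page, "h2" is left out.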
16
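[Figure: query-dependent input. Root set with its IN and OUT link sets]
17
[Figure: query-dependent input. Root set with its IN and OUT link sets]
18
[Figure: query-dependent input. Base set containing the root set and its IN and OUT link sets]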
19
  • Associate two numerical scores with each document
    in a hyperlinked collection: authority score and
    hub score
  • Authorities: most definitive information sources
    (on a specific topic)
  • Like conference papers (new ideas)
  • Hubs: most useful compilations of links to
    authoritative documents
  • Like journal papers or books (consolidate or
    survey significant research)

20
J. Kleinberg. Authoritative sources in a
hyperlinked environment. Proc. 9th ACM-SIAM
Symposium on Discrete Algorithms, 1998.
  • Basic presumptions
  • Creation of links indicates judgment: conferred
    authority, endorsement
  • Authority is not conferred directly from page to
    page, but rather mediated through hub nodes
  • Authorities may not be linked directly, but
    through co-citation
  • Example: major car manufacturer pages will not
    point to each other, but there may be hub pages
    that compile links to such pages

21
Hub and Authority Scores
  • "Hubs and authorities exhibit what could be
    called a mutually reinforcing relationship: a
    good hub is a page that points to many good
    authorities; a good authority is a page that is
    pointed to by many good hubs." (Kleinberg, 1999)

22
(No Transcript)
23
The HITS algorithm. ||h|| and ||a|| are L1 vector
norms.
24
Iterative Score Computation (1)
  • Translate the mutual relationship into iterative
    update equations

25
Iterative Score Computation (2)
  • Matrix notation

Adjacency matrix
Score vectors
26
Iterative Score Computation (3)
  • Condense into a single update equation (e.g.)
  • Question of convergence (ignore absolute scale)
  • Notice resemblance with eigenvector equations
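
The update equations for these slides did not survive in the transcript, so the sketch below uses the standard form of the HITS updates, a <- E^T h followed by h <- E a, with L1 normalization after each round. The function names, the fixed iteration count, and the five-page example graph are assumptions made for illustration.

```python
def normalize(x):
    """Rescale a non-negative vector to unit L1 norm (unchanged if all zero)."""
    total = sum(x)
    return [v / total for v in x] if total else x


def hits(E, iterations=50):
    """Return (hub, authority) scores for 0/1 adjacency matrix E.

    E[u][v] == 1 iff page u links to page v.
    """
    n = len(E)
    h = [1.0] * n
    a = [1.0] * n
    for _ in range(iterations):
        # A good authority is pointed to by good hubs: a <- E^T h
        a = [sum(E[u][v] * h[u] for u in range(n)) for v in range(n)]
        # A good hub points to good authorities: h <- E a
        h = [sum(E[u][v] * a[v] for v in range(n)) for u in range(n)]
        a = normalize(a)
        h = normalize(h)
    return h, a


# Hypothetical graph: page 0 links to 2 and 3; page 1 links to 2, 3 and 4.
E = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
hub, auth = hits(E)
```

Substituting one update into the other gives a <- E^T E a, which is why the authority vector tends toward the principal eigenvector of E^T E, and the hub vector toward that of E E^T. In the example, pages 2 and 3 receive equal, top authority scores, and page 1 is the stronger hub.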

27
Example
  • Simple example graph
  • Hub and authority matrices
  • Authority and Hub weights

28
HITS Topic Distillation Process
  • Send query to a text-based IR system and obtain
    the root-set.
  • Expand the root-set by radius one to obtain an
    expanded graph.
  • Run power iterations on the hub and authority
    scores together.
  • Report top-ranking authorities and hubs.

29
HITS: Applications
  • Clever model: http://www.almaden.ibm.com/cs/k53/clever.html
  • Fine-grained ranking (Soumen, WWW10)
  • Query-sensitive retrieval (Krishna Bharat,
    SIGIR98)

30
PageRank vs. HITS
  • PageRank advantages over HITS:
  • Query-time cost is low
  • HITS computes an eigenvector for every query
  • Less susceptible to localized link-spam
  • HITS advantages over PageRank:
  • HITS ranking is sensitive to the query
  • HITS has a notion of hubs and authorities
  • Topic-sensitive PageRank (Haveliwala, WWW11):
  • Attempt to make PageRank query-sensitive

31
HITS: Discussion
  • Pros:
  • Derives topic-specific authority scores
  • Returns a list of hubs in addition to authorities
  • Computationally tractable (due to focused
    sub-graph)
  • Cons:
  • Sensitive to Web spam (artificially increasing
    hub and authority weights)
  • Query dependence requires an expensive context
    graph building step
  • Topic drift: the dominant topic in the base set
    may not be the intended one

32
Relation between HITS, PageRank and LSI
  • HITS algorithm: running SVD on the hyperlink
    relation (source, target)
  • LSI algorithm: running SVD on the relation
    (term, document)
  • PageRank on root set R gives same ranking as the
    ranking of hubs as given by HITS

33
(No Transcript)