Making PageRank Algorithm Robust to Collusion - PowerPoint PPT Presentation

About This Presentation
Title:

Making PageRank Algorithm Robust to Collusion

Description:

Study of PageRank's robustness to collusion. ... Collusion-proofness is an essential criterion in evaluating ... The basic concept is a democratic meritocracy. ... – PowerPoint PPT presentation

Slides: 38
Provided by: HuiZ5

Transcript and Presenter's Notes


1
Making PageRank Algorithm Robust to Collusion
Hui Zhang (1), Ashish Goel (2), Ramesh Govindan (1), Kahn Mason (2), Benjamin Van Roy (2)
(1) University of Southern California; (2) Stanford University
2
Outline
  • Research motivation.
  • PageRank algorithm: a brief introduction.
  • Study of PageRank's robustness to collusion.
  • Adaptive resetting: making PageRank robust to
    collusion.
  • Conclusion and future work.

3
Research motivation
  • Build reputation in large-scale systems
    • P2P file sharing systems
    • Blogging communities
    • Networked gaming, etc.
  • Collusion-proofness is an essential criterion in
    evaluating a rating scheme.

4
PageRank [Brin 1998]
  • A rating scheme to rank hypertext documents on
    the WWW.
  • An iterative algorithm to calculate the
    importance of a web page based on the importance
    of its parent pages.
  • Can be applied to systems other than the WWW.

5
PageRank: random walk model
[Figure: nodes X, Y, and Z joined by referential links, with the walker marked; link labels 1/2 and 1/3.]
  • As time goes on, the expected fraction of steps
    the walker spends at each node v converges to the
    PageRank weight PR(v).
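
The random walk model can be illustrated with a small simulation. This is a minimal sketch, assuming a walker that jumps to a uniformly random node with resetting probability `eps` and otherwise follows a random out-link. The function name `simulate_walk` and the three-node graph are hypothetical and are not taken from the slide's figure.

```python
import random

def simulate_walk(out_links, eps=0.15, steps=200_000, seed=0):
    """Return the fraction of steps a resetting random walker spends at each node."""
    rng = random.Random(seed)
    nodes = list(out_links)
    visits = dict.fromkeys(nodes, 0)
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < eps or not out_links[current]:
            current = rng.choice(nodes)               # reset: jump to a uniformly random node
        else:
            current = rng.choice(out_links[current])  # follow one out-link at random
    return {v: count / steps for v, count in visits.items()}

# Hypothetical three-node graph (an assumption, not the slide's graph).
graph = {"X": ["Y", "Z"], "Y": ["Z"], "Z": ["X"]}
freq = simulate_walk(graph)
print(freq)
```

The visit frequencies approximate the PageRank weights of this toy graph; longer walks give a closer approximation.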

6
PageRank: is it collusion-proof?
  • Can a node easily boost its rank by manipulating
    its out-going links with others?

7
Amp(G): a metric on group collusion
[Figure: a group G of nodes i and j, labeled W_G(G) = PR(i) + PR(j) and W_in(G).]
8
Theorem on Amp
  • In the original PageRank system, [formula not
    preserved in this transcript],
    where ε is the resetting probability.
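
The transcript does not preserve the definition of Amp(G), so the sketch below works under an explicit assumption: Amp(G) is taken to be the total PR weight held by the group divided by the weight entering the group, where the entering weight counts both links from outside nodes and the reset mass that lands in the group. Under that assumed definition, flow balance gives Amp(G) ≤ 1/ε, with equality when the group has no links leaving it. The helper names and the five-node graph are hypothetical.

```python
def pagerank(out_links, eps=0.15, iters=500):
    """Power iteration with uniform reset; assumes every node has an out-link."""
    n = len(out_links)
    rank = dict.fromkeys(out_links, 1.0 / n)
    for _ in range(iters):
        nxt = dict.fromkeys(out_links, eps / n)
        for u, targets in out_links.items():
            for v in targets:
                nxt[v] += (1.0 - eps) * rank[u] / len(targets)
        rank = nxt
    return rank

def amplification(out_links, group, eps=0.15):
    """Assumed definition: group weight / (outside link inflow + reset mass)."""
    rank = pagerank(out_links, eps)
    group = set(group)
    held = sum(rank[v] for v in group)
    inflow = eps * len(group) / len(out_links)  # reset mass landing in the group
    for u, targets in out_links.items():
        if u not in group:
            inflow += sum((1.0 - eps) * rank[u] / len(targets)
                          for v in targets if v in group)
    return held / inflow

# Hypothetical graph; in `colluding`, a and b drop their links and point at each other.
honest = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d", "a"], "d": ["e", "b"], "e": ["a"]}
colluding = dict(honest, a=["b"], b=["a"])

amp_honest = amplification(honest, {"a", "b"})
amp_colluding = amplification(colluding, {"a", "b"})
print(amp_honest, amp_colluding)
```

With ε = 0.15, the closed two-node group reaches the 1/ε ceiling of this assumed metric, while the same two nodes in the honest graph stay below it.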

9
Two experimental topologies
  • W, a Web link topology
    • Contains the link structure of upwards of 80
      million URLs.
    • Source: the Stanford WebBase.
  • B, a weblog blogrolling topology
    • Contains the blogrolling structure of upwards of
      72,000 blogs.
    • Source: www.blogstreet.com, the XML-RPC weblog
      service.

10
Experiment 1: Collusion200
  • Models a small number of web pages simultaneously
    colluding.
  • Methodology
    • 100 colluding groups
    • Each colluding group has the circle topology,
      consisting of two nodes with adjacent ranks
    • Arbitrarily chose nodes originally ranked around
      1000th, 2000th, ..., 100000th.
    • ε = 0.15.
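
The methodology can be mimicked at toy scale. This is a sketch only, assuming that "colluding" means the two nodes drop their original out-links and link solely to each other; the transcript does not spell this out. The ten-node graph, the helper `collude_pair`, and the choice of the 5th and 6th ranked nodes are all hypothetical stand-ins for the Web topology W.

```python
def pagerank(out_links, eps=0.15, iters=500):
    """Power iteration with uniform reset; assumes every node has an out-link."""
    n = len(out_links)
    rank = dict.fromkeys(out_links, 1.0 / n)
    for _ in range(iters):
        nxt = dict.fromkeys(out_links, eps / n)
        for u, targets in out_links.items():
            for v in targets:
                nxt[v] += (1.0 - eps) * rank[u] / len(targets)
        rank = nxt
    return rank

def collude_pair(out_links, a, b):
    """Return a copy of the graph where a and b link only to each other (2-node ring)."""
    colluded = {u: list(targets) for u, targets in out_links.items()}
    colluded[a] = [b]
    colluded[b] = [a]
    return colluded

# Hypothetical ten-node graph standing in for a real link topology.
n = 10
web = {i: [(i + 1) % n, (i * i + 2) % n] for i in range(n)}

before = pagerank(web)
by_rank = sorted(web, key=before.get, reverse=True)
a, b = by_rank[4], by_rank[5]                 # two nodes with adjacent ranks
after = pagerank(collude_pair(web, a, b))

gain = (after[a] + after[b]) / (before[a] + before[b])
print(a, b, gain)
```

The ratio `gain` compares the pair's combined PR weight after and before collusion at ε = 0.15.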

11
Experiment result of Collusion200 (I)
Figure 1: W - amplification factors of the 100
colluding groups in Collusion200.
12
Experiment result of Collusion200 (III)
Figure 2: W - new PR rank after Collusion200.
13
There is a long flat portion
Figure 3: The PR weight distribution of 4
topologies.
14
Next step: how to detect collusion?
  • Identifying colluding groups is unlikely to be
    computationally tractable. Related hard problems:
    • The densest k-subgraph problem [Feige et al.
      1997].
    • The classical CLIQUE problem.
    • The problem of finding large hidden cliques in
      random graphs [Juels 1998].

15
An observation on collusion behaviors
  • To increase their PR weight, i.e., the stationary
    weight in the random walk, the colluding nodes
    will stall the random walk.
  • When the resetting probability ε increases, the
    colluding nodes must suffer a significant drop in
    PR weight.
  • Therefore, we expect the PR weight of colluding
    nodes to be highly correlated with 1/ε (the
    average walk length), while that of non-colluding
    nodes is relatively insensitive to the change in
    ε.
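
The drop in PR weight for colluding nodes can be seen on a toy graph. This is a sketch under assumptions: the six-node graph (four honest nodes `h0`..`h3` in a cycle, one link from `h0` into a closed colluding pair `c0`, `c1`) is hypothetical, and on a graph this small the honest nodes also shift noticeably, so it illustrates only the colluding side of the observation.

```python
def pagerank(out_links, eps=0.15, iters=500):
    """Power iteration with uniform reset; assumes every node has an out-link."""
    n = len(out_links)
    rank = dict.fromkeys(out_links, 1.0 / n)
    for _ in range(iters):
        nxt = dict.fromkeys(out_links, eps / n)
        for u, targets in out_links.items():
            for v in targets:
                nxt[v] += (1.0 - eps) * rank[u] / len(targets)
        rank = nxt
    return rank

# Hypothetical graph: c0 and c1 form a closed colluding pair fed by h0.
graph = {
    "h0": ["h1", "c0"], "h1": ["h2"], "h2": ["h3"], "h3": ["h0"],
    "c0": ["c1"], "c1": ["c0"],
}

group_weight = []
for eps in (0.15, 0.3, 0.6):
    rank = pagerank(graph, eps)
    group_weight.append(rank["c0"] + rank["c1"])   # weight held by the colluders
    print(f"eps={eps}: colluders={group_weight[-1]:.4f}, h2={rank['h2']:.4f}")
```

A larger resetting probability cuts walks short before they can linger inside the closed pair, so the pair's combined weight falls as `eps` grows.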

16
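An intuitive example
[Figure: a graph of nodes and referential links.]
17
An intuitive example
[Figure: a graph of nodes and referential links, with a colluding group marked.]
18
An intuitive example
[Figure: a graph of nodes and referential links, with a colluding group marked.]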
19
Co-co distribution in real-world graphs
Figure 4: the co-co PDF in W and B. The [0, 0.1]
range actually corresponds to the [-1, 0.1] range.
20
Adaptive-resetting scheme
  • Part I: collusion detection
    • Given the topology, calculate the PR vector under
      different ε values.
    • ε ∈ {0.0375, 0.05, 0.075, 0.15, 0.3, 0.45,
      0.6}; ε_default = 0.15.
    • Calculate the correlation coefficient between the
      curve of each node x's PR weight and the curve of
      1/ε. Label it as co-co(x).
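
The detection step above can be sketched as follows. This is a minimal illustration, assuming the correlation coefficient is the standard Pearson coefficient and the reset is uniform over all nodes; the function names `pearson` and `co_co` and the six-node graph are hypothetical, and only the list of resetting probabilities comes from the slide.

```python
EPS_VALUES = [0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6]  # values listed on the slide

def pagerank(out_links, eps, iters=1000):
    """Power iteration with uniform reset; assumes every node has an out-link."""
    n = len(out_links)
    rank = dict.fromkeys(out_links, 1.0 / n)
    for _ in range(iters):
        nxt = dict.fromkeys(out_links, eps / n)
        for u, targets in out_links.items():
            for v in targets:
                nxt[v] += (1.0 - eps) * rank[u] / len(targets)
        rank = nxt
    return rank

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def co_co(out_links, eps_values=EPS_VALUES):
    """Correlate each node's PR-weight curve with the 1/eps curve."""
    inverse_eps = [1.0 / e for e in eps_values]
    curves = {v: [] for v in out_links}
    for e in eps_values:
        rank = pagerank(out_links, e)
        for v in out_links:
            curves[v].append(rank[v])
    return {v: pearson(curves[v], inverse_eps) for v in out_links}

# Hypothetical graph: c0 and c1 form a closed colluding pair fed by h0.
graph = {
    "h0": ["h1", "c0"], "h1": ["h2"], "h2": ["h3"], "h3": ["h0"],
    "c0": ["c1"], "c1": ["c0"],
}
scores = co_co(graph)
print(scores)
```

On this toy graph the colluding pair scores close to 1, while `h0`, whose weight rises as the colluders lose theirs, scores below 0.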

21
Experiment result of Collusion200 (IV)
Figure 5: W - amplification factors of the 100
colluding groups in Collusion200.
22
Experiment result of Collusion200 (V)
Figure 6: W - new PR weight after Collusion200.
23
Experiment result of Collusion200 (VI)
Figure 7: W - new PR rank after Collusion200.
24
Experiment 2: Collusion22
  • Models various colluding subgraphs.
  • Methodology
    • 3 colluding groups
      • G1: 10-node ring
      • G2: 10-node star topology
      • G3: 2-node ring
25
Experiment result of Collusion22 (I)
Figure 8: Amplification factors of the 3
colluding groups in Collusion22.
26
Experiment result of Collusion22 (II)
Figure 9: W - new PR weight after Collusion22.
27
New top-25 URL list in W
[Table: the new top-25 URL list in W, with entries marked as dropped out, dropping, or new.]
28
Conclusion and future work
  • A collusion-proof rating scheme based on the
    PageRank algorithm.
  • Future work
    • Optimality analysis of the adaptive-resetting
      scheme.
    • Study of Web link structure evolution under
      PageRank within the framework of game theory.

29
Backup slides
30
Reputation systems [Okita 2003]
  • A means of describing social trust networks.
  • The basic concept is a democratic meritocracy.
  • A rating system is used to evaluate individual
    members, and those results are then collated to
    produce a consensus about the merit of any given
    member.
  • Examples: LiveJournal, Friendster, eBay, Advogato

31
PageRank algorithm [Brin 1998]
  • Assume N pages.
  • Assign all pages the initial value 1/N.
  • Let N_u be the out-degree of page u, Rank(v) the
    importance of page v, and B_v the set of pages
    pointing to v.
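
The update rule itself is not preserved in this transcript. The sketch below assumes the common form with a uniform reset, Rank(v) = ε/N + (1 - ε) · Σ_{u ∈ B_v} Rank(u)/N_u, and reuses the slide's symbols N, N_u, and B_v as variable names. The three-node graph is hypothetical.

```python
def pagerank(out_links, eps=0.15, iters=200):
    """Iterate the assumed update rule starting from the initial value 1/N."""
    pages = list(out_links)
    N = len(pages)
    N_u = {u: len(out_links[u]) for u in pages}   # out-degree of each page
    B = {v: [] for v in pages}                    # B[v]: pages pointing to v
    for u in pages:
        for v in out_links[u]:
            B[v].append(u)
    rank = dict.fromkeys(pages, 1.0 / N)
    for _ in range(iters):
        rank = {
            v: eps / N + (1.0 - eps) * sum(rank[u] / N_u[u] for u in B[v])
            for v in pages
        }
    return rank

# Hypothetical three-page graph in which every page has an out-link.
graph = {"X": ["Y", "Z"], "Y": ["Z"], "Z": ["X"]}
rank = pagerank(graph)
print(rank)
```

Pages without out-links are not handled here; a fuller implementation would need a rule for redistributing their weight.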

32
Experiment result of Collusion200 (II)
Figure A: W - new PR weight after Collusion200.
33
Experiment result of Collusion200 (VII)
Figure B: B - new PR rank after Collusion200.
34
Experiment result of Collusion200 (X)
Figure C: B - new PR weight after Collusion200.
35
Correlation coefficient
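
The formula on this slide is not preserved in the transcript. Assuming it is the standard Pearson correlation coefficient, applied to the samples X_k = PR weight of node x under ε_k and Y_k = 1/ε_k, co-co(x) would read:

```latex
\[
  \text{co-co}(x) \;=\;
  \frac{\sum_{k} (X_k - \bar{X})(Y_k - \bar{Y})}
       {\sqrt{\sum_{k} (X_k - \bar{X})^2}\,\sqrt{\sum_{k} (Y_k - \bar{Y})^2}},
  \qquad
  X_k = PR_{\varepsilon_k}(x), \quad Y_k = \frac{1}{\varepsilon_k}
\]
```

Here the bars denote sample means over the chosen ε values, and the result always lies in [-1, 1].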
36
Experiment result of Collusion22 (III)
Figure D: W - new PR rank after Collusion22.
37
How about using finer statistics of the random
walk?
  • The revisit intervals of the random walk on a
    colluding node will likely have a large
    variance compared to their expectation.