Personalized PageRank Seminar: Link mining

1
Personalized PageRank Seminar: Link mining
2
Contents
  • PageRank Overview
  • Motivation for PPVs
  • PPVs
  • Main Theorems
  • Algorithms
  • Experimental Results
  • Other Related Work
  • Conclusion
  • References

3
PageRank Overview
  • Method to rank web pages by assigning each page
    a numeric value that represents its importance
  • Based on the link structure of the web
  • A page X has a high rank if:
  • It has many in-links, or few but highly ranked
    ones
  • The pages linking to it have few out-links, so
    each passes on a larger share of its rank
  • Damping factor: probability that the random
    surfer keeps clicking on links instead of
    jumping to a new random page
  • Dangling links: links to web pages that do not
    have any out-links
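
The overview above can be illustrated with a minimal power-iteration sketch. It is not taken from the slides: the function name `pagerank`, the four-page graph `toy_web`, and the damping value 0.85 are assumptions chosen for illustration, and a page without out-links is handled by spreading its rank evenly, which is only one common convention.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power iteration on a graph given as {page: [pages it links to]}."""
    pages = sorted(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the random-jump share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = out_links[p]
            if targets:
                # A page splits its rank evenly among its out-links.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Page without out-links: spread its rank over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank


# Hypothetical toy graph: "c" has three in-links, "d" has none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph the page with the most in-links ends up with the highest rank, and the page nobody links to keeps only the random-jump share.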

5
Motivation for PPVs
  • PageRank (PR) reflects a "democratic" notion of
    importance
  • A random user may have a set of preferred pages
  • Create "personalized views", i.e. take user
    preferences into account
  • Idea:
  • PPV: Personalized PageRank vector
  • A vector of length n, where n is the number of
    pages in the web graph
  • ppv(i): i-th component of the vector ppv

6
PPVs
  • Problem:
  • PPVs are too large to compute and store offline
  • Computing PPVs at query time takes too long
  • Solution: method from Jeh and Widom
  • Computes only a limited number of PageRank
    vectors offline
  • Theorems and a couple of algorithms for the
    computation
  • How:
  • Generalize the preference set P to a preference
    vector u
  • $|u| = 1$ and $u(p)$ is the preference for page
    p, e.g. $u(p) = 1/|P|$ for $p \in P$
  • Let A be the link matrix corresponding to the web
    graph
  • Calculate the PPV v by solving
    $v = cu + (1-c)Av$, where c is the probability
    of jumping back to a preferred page

7
  • Example:
  • P = {calvinandhobbes.com, tomtu.de, yahoo.com}
  • $u = (0, 1/3, 1/2, 1/6, 0)$
  • Link matrix A calculated from crawling the web
  • Apply the formula
  • $v = cu + (1-c)Av$
  • $v = (0, 9, 8, 10, 0)$
  • tomtu.de is preferred more, but the global PR of
    yahoo.com is higher
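
A minimal sketch of how the formula $v = cu + (1-c)Av$ might be iterated is shown here. The graph `toy_web`, the function name `personalized_pagerank`, and the value c = 0.15 are assumptions for illustration and do not reproduce the numbers of the example above. The sketch also assumes every page has at least one out-link.

```python
def personalized_pagerank(out_links, preference, c=0.15, iterations=200):
    """Iterate v = c*u + (1-c)*A*v for a preference vector u.

    out_links:  {page: [pages it links to]}, every list non-empty (assumed)
    preference: {page: weight}, weights summing to 1; missing pages count as 0
    """
    pages = sorted(out_links)
    v = {p: preference.get(p, 0.0) for p in pages}  # start from u
    for _ in range(iterations):
        # Jump term c*u: only preferred pages receive it.
        new_v = {p: c * preference.get(p, 0.0) for p in pages}
        # Link term (1-c)*A*v: each page passes its score to its out-links.
        for p in pages:
            share = (1.0 - c) * v[p] / len(out_links[p])
            for q in out_links[p]:
                new_v[q] += share
        v = new_v
    return v


# Hypothetical toy graph; page "d" has no in-links.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ppv_d = personalized_pagerank(toy_web, {"d": 1.0})
```

With all preference placed on "d", that page keeps exactly the jump share c even though no page links to it, while the rest of the score flows to the pages reachable from it.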

8
Main Theorems
  • Linearity Theorem
  • The Hubs Theorem
  • The Decomposition Theorem

9
  • Linearity Theorem
  • If v1 and v2 are the PPVs for the preference
    vectors u1 and u2, then for constants a1, a2 the
    combination $a_1 v_1 + a_2 v_2$ is the PPV for
    $a_1 u_1 + a_2 u_2$
  • $a_1 v_1 + a_2 v_2 = (1-c)A(a_1 v_1 + a_2 v_2) + c(a_1 u_1 + a_2 u_2)$
  • Represent PPVs as linear combinations of basis
    vectors
  • Basis vector $r_i$: entry j represents the
    importance of page j in page i's view
  • Still too hard to compute for every page
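
The linearity property can be checked numerically on a small example. The sketch below is an illustration only: the four-page graph, the weights 0.3 and 0.7, and the helper name `ppv` are assumptions, and every page is assumed to have at least one out-link.

```python
def ppv(out_links, u, c=0.15, iterations=200):
    """Iterate v = c*u + (1-c)*A*v; pages are indexed 0..n-1."""
    n = len(out_links)
    v = list(u)
    for _ in range(iterations):
        new_v = [c * u[i] for i in range(n)]
        for page, targets in enumerate(out_links):
            share = (1.0 - c) * v[page] / len(targets)
            for t in targets:
                new_v[t] += share
        v = new_v
    return v


# Hypothetical graph: out_links[i] lists the pages that page i links to.
graph = [[1, 2], [2], [0], [2]]
u1 = [1.0, 0.0, 0.0, 0.0]  # all preference on page 0
u2 = [0.0, 0.0, 0.0, 1.0]  # all preference on page 3
a1, a2 = 0.3, 0.7

v1 = ppv(graph, u1)
v2 = ppv(graph, u2)

# Combine the two precomputed vectors ...
combined = [a1 * x + a2 * y for x, y in zip(v1, v2)]
# ... and compare with the PPV computed directly for the mixed preference.
direct = ppv(graph, [a1 * x + a2 * y for x, y in zip(u1, u2)])
max_gap = max(abs(x - y) for x, y in zip(combined, direct))
```

Here `max_gap` stays at the level of floating-point rounding, which is what allows a PPV to be assembled from stored basis vectors instead of being recomputed from scratch.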

10
  • The Hubs Theorem
  • Restrict the preference set P to a subset of a
    hub set H
  • Basis vector → hub vector $r_p$ for each
    $p \in H$
  • Calculate the hub vectors of H ahead of time and
    store them
  • However, each hub vector requires multiple scans
    of the web graph
  • Computing and storing all hub vectors is
    impractical

11
  • The Decomposition Theorem
  • p's view is built from the views of its
    out-neighbours: pages important to p's
    out-neighbours are also important to p
  • Identifies relationships among basis vectors
  • To reduce redundancy, hub vectors can be encoded
    as:
  • Partial vectors: encode the part of $r_p$ unique
    to p (capture distances from the hub to arbitrary
    nodes without travelling through another hub)
  • Hubs skeleton: encodes the interrelationships
    among hub vectors (captures distances from hub
    to hub)
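
For reference, the relationship among basis vectors that the theorem expresses can be written as follows. The notation is an assumption not used on the slides: $O(p)$ denotes the set of out-neighbours of p and $x_p$ the unit vector with a 1 in position p; c is the constant from $v = cu + (1-c)Av$.

```latex
r_p \;=\; \frac{1-c}{|O(p)|} \sum_{q \in O(p)} r_q \;+\; c\,x_p
```

Read this way, the basis vector of p is the average of the basis vectors of its out-neighbours, scaled by $1-c$, plus an extra weight c on p itself.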

12

13
Algorithms
  • Computing Basis vectors
  • Selective expansion algorithm
  • Repeated squaring algorithm
  • Dynamic programming algorithm
  • Partial Quantities
  • Partial vectors
  • Hub skeleton
  • Web skeleton

14
  • Computing basis vectors
  • Selective expansion algorithm:
  • Expands pages to their out-neighbours
    selectively, depending on whether a tour passes
    through H
  • A random surfer starting at p reaches a
    high-rank page q sooner than a random page t
  • Repeated squaring algorithm:
  • Takes the set H as input and reduces the error
    at each iteration
  • Dynamic programming algorithm:
  • Computes a basis vector for each p using the
    vectors of p's out-neighbours
  • Computes all the possible hub vectors into a
    table for fast lookup
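
The dynamic programming idea, building each page's basis vector from the current vectors of its out-neighbours, might be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the name `basis_vectors_dp`, the toy graph, c = 0.15, and the fixed iteration count are all hypothetical, and every page is assumed to have at least one out-link.

```python
def basis_vectors_dp(out_links, c=0.15, iterations=100):
    """Approximate the basis vector r_p of every page p.

    table[p][q] approximates r_p(q), the importance of q in p's view.
    """
    pages = sorted(out_links)
    # Start from the zero vector for every page.
    table = {p: {q: 0.0 for q in pages} for p in pages}
    for _ in range(iterations):
        new_table = {}
        for p in pages:
            neighbours = out_links[p]
            weight = (1.0 - c) / len(neighbours)
            # Scaled sum of the out-neighbours' current vectors ...
            row = {q: weight * sum(table[o][q] for o in neighbours)
                   for q in pages}
            # ... plus the extra weight c on p itself.
            row[p] += c
            new_table[p] = row
        table = new_table
    return table


# Hypothetical toy graph; page "d" has no in-links.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
basis = basis_vectors_dp(toy_web)
```

Each row of `basis` is one precomputed vector that can be looked up later; after k iterations the entries of a row sum to 1 - (1-c)^k, so the remaining error shrinks geometrically.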

15
  • Partial quantities
  • Computing partial vectors:
  • Specialization of the selective expansion
    algorithm
  • Uses high-PR pages as hub pages to improve
    performance
  • Hub skeleton:
  • Specialization of the repeated squaring algorithm
  • Interested only in the computation of a subset of
    the web skeleton
  • Web skeleton:
  • Specialization of the basic dynamic programming
    algorithm
  • Yields an approximation of the PPV distances for
    arbitrary nodes
  • Restricts the computation by eliminating all
    $p \notin P$

16
  • Construction of PPVs
  • Now construct the ppv using the Hubs equation,
    where
  • $u = a_1 p_1 + \dots + a_z p_z$
  • h: hub pages from H
  • $a_1 + \dots + a_z = 1$
  • and $a_i$: weight of page i
  • $ppv = \sum_i a_i\,(\text{partial vector})_i + \frac{1}{c} \sum_h (\text{hub skeleton})_h$

17
Experimental Results
  • Stanford's web data (120 million pages)
  • After removing dangling links: 80 million pages
  • Using a 1.4 GHz CPU with 3.5 GB memory, with
    |H| = 100,000
  • Computation of full vectors: approx. 8 hrs
  • Computation of partial vectors:
  • Using the specialization of the selective
    expansion algorithm
  • Takes about 50 min
  • Computation of the hubs skeleton:
  • Using the specialization of the repeated
    squaring algorithm
  • Takes about 10 hrs
  • Result: a hub vector containing 14 million
    entries can be constructed in 6 sec.

18

19
Other Related Work
  • Personalized PageRank scores were used to enable:
  • Topic-Sensitive PageRank
  • The Weighting of Links Based on Content Analyses
  • Intelligent Surfer
  • Personalized Web Search engines

20
Conclusion
  • Personalized PageRanks (for all pages in the web
    graph) can be calculated from an initial set P of
    personally chosen pages and the out-link
    information of the web graph
  • Advantages:
  • Efficient calculations for computing PPVs
  • Experimentation has shown its utility and
    promise
  • Disadvantages:
  • Needs outside information, such as the
    calculation of the link matrix A
  • Performance depends on the choice of personal
    pages

21
References
  • Jeh, Widom. Scaling Personalized Web Search, 2003
  • Chirita, Olmedilla, Nejdl. PROS: A Personalized
    Ranking Platform for Web Search
  • Aktas, Nacar, Menczer. An Application of
    Personalized PageRank Vectors: Personalized
    Search Engine
  • Page, Brin. The PageRank Citation Ranking:
    Bringing Order to the Web, Stanford Digital
    Library Technologies, Working Paper 1999-0120,
    1998
  • Haveliwala. Topic-Sensitive PageRank, 2002
  • http://pr.efactory.de/
  • http://www-db.stanford.edu/backrub/google.html

22
Summary of Terms