Page Rank Seminar: Link mining

Karla Alcazar, 1772627.

1
Page Rank Seminar: Link mining
2
Contents
  • Introduction
  • PageRank
  • Random Surfer Model
  • Implementation
  • General principle
  • Algorithm
  • Examples
  • Convergence
  • Applications and Areas of Research
  • Conclusion
  • References

3
Introduction
  • Search engines are an important tool. They:
  • 1.- Find all pages that match the search
    query.
  • 2.- Rank all pages by some criterion.
  • Google is one of the most used because of its
  • High performance
  • Ease of use
  • Quality of the search results.

4
PageRank
  • Method for ranking web pages by assigning each
    a numeric value that represents its importance.
  • Lawrence Page and Sergey Brin (1998).
  • Based on the link structure of the web.
  • Developed to rank search results.

5
Random Surfer Model
  • The random surfer picks a web page and keeps
    clicking on links; the probability of continuing
    to click is called the damping factor.
  • The PageRank value of a page reflects the
    frequency of hits on that page by the random
    surfer.
  • If the random surfer arrives at a sink page or
    gets bored, they pick another URL at random.
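The random surfer can be simulated directly. The following is a minimal sketch, not part of the original slides: the three-page graph, the function name simulate_random_surfer, the step count and the seed are all illustrative assumptions.

```python
import random

def simulate_random_surfer(links, d=0.85, steps=20000, seed=0):
    """Estimate how often a random surfer visits each page.

    links maps each page to the list of pages it links to (hypothetical data).
    d is the damping factor: the probability of following a link.
    """
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        outbound = links[current]
        if outbound and rng.random() < d:
            # Keep clicking: follow one of the outbound links at random.
            current = rng.choice(outbound)
        else:
            # Sink page or bored surfer: jump to a random page.
            current = rng.choice(pages)
    return {p: visits[p] / steps for p in pages}

# Hypothetical graph: A -> B, B -> A and C, C -> B.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
freq = simulate_random_surfer(links)
print(freq)
```

In this example B is linked from both other pages, so the surfer lands on it most often.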

6
Implementation
  • 1.- Map each URL to an id (URL → id).
  • 2.- Store each hyperlink in a database.
  • 3.- Sort the link structure by parent id.
  • 4.- Remove dangling links.
  • 5.- Calculate the PR, giving each page an initial
    value.
  • 6.- Iterate until convergence.
  • 7.- Add the dangling links back.
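Steps 1 to 4 can be sketched as follows. This is an illustrative sketch only: an in-memory list stands in for the database, a dangling link is assumed to mean a link whose target has no outbound links, and the name prepare_links and the example URLs are hypothetical.

```python
def prepare_links(hyperlinks):
    """Preprocess (source_url, target_url) pairs for a PageRank computation."""
    hyperlinks = list(hyperlinks)

    # Step 1: map each URL to an integer id.
    url_to_id = {}
    for src, dst in hyperlinks:
        for url in (src, dst):
            url_to_id.setdefault(url, len(url_to_id))

    # Step 2: store each hyperlink (a list stands in for the database).
    edges = [(url_to_id[s], url_to_id[t]) for s, t in hyperlinks]

    # Step 3: sort the link structure by parent (source) id.
    edges.sort()

    # Step 4: remove dangling links, repeating because a removal
    # can leave another page without outbound links.
    dangling = []
    while True:
        parents = {s for s, _ in edges}
        removed = [e for e in edges if e[1] not in parents]
        if not removed:
            break
        dangling.extend(removed)
        edges = [e for e in edges if e[1] in parents]

    # The removed links are returned so they can be added back in step 7.
    return url_to_id, edges, dangling

url_to_id, edges, dangling = prepare_links([
    ("a.example", "b.example"),
    ("b.example", "a.example"),
    ("b.example", "c.example"),  # c.example has no outbound links
])
print(edges, dangling)
```

Steps 5 to 7 then run the rank iteration on edges and reinsert dangling afterwards.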

7
General Principle
  • Every page has some number of outbound links
    (forward links) and inbound links (backlinks).
  • A page X has a high rank if
  • It has many inbound links
  • Its inbound links come from highly ranked pages
  • The pages linking to it have few outbound links.

8
Algorithm
  • PR(A) = (1-d) + d (PR(T1)/N(T1) + ... + PR(Tn)/N(Tn))
  • PR(A): PR of site A
  • PR(Ti): PR of the sites linking to A
  • N(Ti): number of outbound links of Ti
  • d: damping factor (usually set to 0.85)
  • (1-d): probability that the random surfer gets bored.
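A minimal sketch of the iteration, assuming the update rule PR(A) = (1-d) + d * sum of PR(Ti)/N(Ti) over the pages Ti linking to A, which matches the arithmetic on the Examples slide. The function name pagerank, the tolerance and the three-page graph are illustrative assumptions; every page is assumed to appear as a key in links.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(A) = (1-d) + d * sum(PR(Ti)/N(Ti)) until values settle.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # initial value for every page
    for _ in range(max_iter):
        new = {}
        for a in pages:
            # Sum PR(Ti)/N(Ti) over every page Ti that links to A.
            incoming = sum(pr[t] / len(links[t]) for t in pages if a in links[t])
            new[a] = (1 - d) + d * incoming
        delta = max(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if delta < tol:
            break
    return pr

# Hypothetical graph: A -> B, B -> A and C, C -> B.
ranks = pagerank({"A": ["B"], "B": ["A", "C"], "C": ["B"]})
print(ranks)
```

With this form of the formula the ranks sum to the number of pages, and B, which has two inbound links, ends up with the highest value.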

9
Page-Brin Algorithm
  • R(u) = c (sum over v in Bu of R(v)/Nv) + c E(u)
  • R(u): PR of site u
  • c: factor used for normalization (< 1)
  • Bu: set of pages that point to u
  • Nv: number of outbound links of v
  • R(v): PR of a site v that points to site u
  • E(u): distribution over web pages to which a random
    surfer periodically jumps (set to 0.15).
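A sketch of this normalized form, assuming R(u) = c * (sum of R(v)/Nv over v in Bu) + c * E(u), with R rescaled to sum to 1 after each pass. E is assumed to be uniform with a total weight of 0.15; the function name page_brin_rank and the example graph are hypothetical.

```python
def page_brin_rank(links, e_total=0.15, tol=1e-12, max_iter=1000):
    """Iterate R = c * (A R + E), choosing c so that R sums to 1."""
    pages = list(links)
    n = len(pages)
    e = {u: e_total / n for u in pages}  # assumed uniform jump distribution
    r = {u: 1.0 / n for u in pages}
    # Bu: the set of pages that point to u.
    backlinks = {u: [v for v in pages if u in links[v]] for u in pages}
    c = 1.0
    for _ in range(max_iter):
        raw = {
            u: sum(r[v] / len(links[v]) for v in backlinks[u]) + e[u]
            for u in pages
        }
        c = 1.0 / sum(raw.values())  # normalization factor
        new_r = {u: c * raw[u] for u in pages}
        delta = sum(abs(new_r[u] - r[u]) for u in pages)
        r = new_r
        if delta < tol:
            break
    return r, c

# Hypothetical graph: A -> B, B -> A and C, C -> B.
r, c = page_brin_rank({"A": ["B"], "B": ["A", "C"], "C": ["B"]})
print(r, c)
```

When every page has at least one outbound link, the unnormalized values sum to 1 + 0.15, so c comes out as 1/1.15, which is below 1 as the slide states.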

10
Examples
  • PR(A) = 0.15 + 0.85 (PR(B)/1) = 1
  • PR(B) = 0.15 + 0.85 (PR(A)/1) = 1
  • Sum of the PR of all pages = number of pages
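The two-page result can be checked by substitution, assuming A and B link only to each other and d = 0.85:

```latex
\begin{align*}
PR(A) &= 0.15 + 0.85\,PR(B) \\
PR(B) &= 0.15 + 0.85\,PR(A) \\
PR(A) &= 0.15 + 0.85\,\bigl(0.15 + 0.85\,PR(A)\bigr) = 0.2775 + 0.7225\,PR(A) \\
PR(A) &= \frac{0.2775}{1 - 0.7225} = 1, \qquad PR(B) = 0.15 + 0.85 \cdot 1 = 1
\end{align*}
```

The two values sum to 2, the number of pages.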

PR1(A) = 0.15 + 0.85 (1/2) = 0.575
PR1(B) = 0.15 + 0.85 (1/1) = 1
PR1(C) = 0.15 + 0.85 (1/2) = 0.575
PR2(A) = 0.15 + 0.85 (1/2) = 0.575
PR2(B) = 0.15 + 0.85 (1+1) = 1.85
PR2(C) = 0.15 + 0.85 (1/2) = 0.575
11
[Figure: PR(A), PR(B) and PR(C) over iterations 0 to 12, with d = 0.5]
12
Convergence
  • PageRank will scale very well even for extremely
    large collections of links.
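One way to observe convergence is to record how much the ranks change per iteration. This is a small sketch under the same assumed update rule, PR(A) = (1-d) + d * sum of PR(Ti)/N(Ti); the function name residuals and the graph are illustrative, and a three-page example says nothing by itself about web-scale behaviour.

```python
def residuals(links, d=0.85, iterations=20):
    """Return the total absolute change in PR after each iteration."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}
    history = []
    for _ in range(iterations):
        new = {
            a: (1 - d) + d * sum(pr[t] / len(links[t]) for t in pages if a in links[t])
            for a in pages
        }
        # Sum of absolute changes across all pages in this iteration.
        history.append(sum(abs(new[p] - pr[p]) for p in pages))
        pr = new
    return history

# Hypothetical graph: A -> B, B -> A and C, C -> B.
history = residuals({"A": ["B"], "B": ["A", "C"], "C": ["B"]})
print(history[:5])
```

In this example each change is at most d times the previous one, so a smaller damping factor reaches a given tolerance in fewer iterations.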

13
Applications and Areas of Research
  • Some other Applications
  • - Find an optimal order.
  • - Help users decide whether a site is
    trustworthy.
  • - Estimate web traffic.
  • Areas of Research
  • Computational cost.
  • Updating problem.
  • Dynamic growth of the web.
  • Spam detection and prevention.

14
Conclusion
  • Global ranking based on the web's graph
    structure.
  • Represents how important a page is on the web.
  • Uses inbound-link information to bring order to
    the web.
  • Can be thought of as a model of random-surfer
    behaviour.

15
References
  • http://pr.efactory.de/d-index.shtml
  • http://www.webworkshop.net/pagerank.html
  • http://www.iprcom.com/papers/pagerank/
  • http://www.at-web.de/suchmaschinenoptimierung/pagerank.htm
  • http://www-db.stanford.edu/backrub/google.html
  • L. Page and S. Brin, The PageRank Citation Ranking:
    Bringing Order to the Web, Stanford Digital
    Library Technologies, Working Paper 1999-0120, 1998.
  • Monika Henzinger, Link Analysis in Web
    Information Retrieval, 2000.
  • Amy N. Langville and Carl D. Meyer, Deeper Inside
    PageRank, 2003.