The influence of search engines on preferential attachment

1
The influence of search engines on preferential
attachment
  • Dan Li
  • CS3150
  • Spring 2006

2
The paper
  • The influence of search engines on preferential
    attachment
  • Soumen Chakrabarti, Alan Frieze and Juan Vera

3
Background
  • The evolution of social networks through time
  • Web graph
  • Models
  • Preferential Attachment
  • Copying Model

4
Background
  • Evolution of the Web
  • Power-law
  • Preferential attachment (Barabási and Albert)
  • Copying Model
  • The author of a newborn page u picks a random
    reference page v from the web, and with some
    probability, copies out-links from v to u.
  • Power-law power 2
  • Organic Evolution
  • NO POWERFUL CENTRAL ENTITY!

5
The New Problem
  • How do page authors find existing pages and
    create links to them?
  • Highly popular search engines limit the attention
    of the page authors to a small set of celebrity
    pages.
  • Page authors frequently use search engines to
    locate pages, and include the HOT pages they
    visit (with probability p)

6
The New Problem
  • The evolution of the Web graph has been
    influenced permanently and pervasively by the
    existence of search engines.
  • When a search engine ranks a page highly,
    authors find the page more often and some of them
    link to it, raising its in-degree and PageRank,
    which leads to a further improvement or
    entrenchment of its rank.

7
The Results in This Paper
  • With high probability, the celebrity nodes
    eventually accumulate a constant fraction of all
    links created
  • The degrees of the other nodes still follow a
    power-law distribution with a steeper power
8
The New Model
  • Modeling how the web graph evolves if authors
    use a search engine to decide which links to
    insert into new pages.
  • How the degree distribution deviates from the
    traditional model

9
The New Model
  • Undirected Web Graph
  • Query to the Search Engine is fixed
  • The search engine returns a fixed number of URLs
    ordered by their degree at the previous time-step
  • Limit the analysis to one topic at a time
    without loss of generality??
  • Comment: A new page may involve multiple topics
    at the same time and include a different number
    of links for each topic.

10
The New Model
  • Growth process
  • Generates a sequence of graphs Gt, t = 1, 2, 3, ...
  • At time t, the graph Gt = (Vt, Et) has t vertices
    and mt edges.
  • Parameters
  • p: a probability
  • N: maximum number of celebrity nodes listed by
    the search engine

11
The New Model Comments
  • Comments
  • The number of links each new page creates is
    fixed? Is this real? How does this affect the
    results?
  • Intuitively, the page author may not have a
    number in mind of how many links he wants to
    include, he will only determine whether a link
    will be included based on the content of that
    link

12
Some Notations in the new model
13
Formal Definition of Process P
14
The New Model
  • In both cases yi is selected by preferential
    attachment within the target subset of old nodes,
    i.e. for x in U
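The growth process described on these slides can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not the formal definition of process P: the start state (one vertex with m self-loops), the tie-breaking rule in the celebrity ranking, and the names simulate and celebrity_share are all hypothetical choices made for illustration. With probability p each new edge targets the celebrity list, otherwise any old vertex, and in both cases the endpoint is drawn proportionally to degree within that subset.

```python
import random


def simulate(t_max, m, p, n_celebs, seed=0):
    """Grow the graph for t_max steps and return the list of vertex degrees."""
    rng = random.Random(seed)
    # Assumed start state: a single vertex carrying m self-loops (degree 2m).
    degree = [2 * m]
    for t in range(1, t_max):
        old = list(range(t))
        # Celebrity list: the (at most) N highest-degree old vertices, ranked
        # by degree at the previous step; ties go to the older vertex (assumed).
        celebs = sorted(old, key=lambda v: (-degree[v], v))[:n_celebs]
        w_all = [degree[v] for v in old]
        w_celebs = [degree[v] for v in celebs]
        targets = []
        for _ in range(m):
            # With probability p restrict attention to the celebrity list.
            if rng.random() < p:
                pool, weights = celebs, w_celebs
            else:
                pool, weights = old, w_all
            # Preferential attachment within the chosen subset.
            targets.append(rng.choices(pool, weights=weights, k=1)[0])
        degree.append(m)  # the new vertex has m edge endpoints
        for y in targets:  # repeated targets are allowed (multi-edges)
            degree[y] += 1
    return degree


def celebrity_share(degree, n_celebs):
    """Fraction of the total degree held by the n_celebs largest vertices."""
    top = sorted(degree, reverse=True)[:n_celebs]
    return sum(top) / sum(degree)


if __name__ == "__main__":
    for p in (0.0, 0.3, 0.6):
        deg = simulate(t_max=2000, m=3, p=p, n_celebs=5)
        print(p, round(celebrity_share(deg, 5), 3))
```

Each step adds exactly 2m to the total degree, so the sum of degrees after t_max steps is 2 * m * t_max, which is a convenient sanity check on any variant of this sketch.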

15
The New Model - Comments
  • The m random edges may have duplicate vertices.
    For different i, the same vertex may be selected!
    When t is smaller than m, we have a lot of loops.
  • Should we not start from one vertex? Instead, we
    can start from m vertices or N vertices and the
    initial web graph is created at random.
  • With high probability, the oldest pages become
    celebrity pages.
  • What happens in the real world?
  • A page becomes hot not only at random, but also
    because of its content; can we model this??

16
The simulation results
  • Very different from the standard preferential
    attachment!
  • The celebrities are far from the power-law
    straight line in the log-log plot.
  • As p increases, the power increases as well!
  • Simulated vs. computed power for different p:
      p      Simulated power   Computed power
      0      2.8               3
      0.3    3.96              3.857
      0.6    5.9               6
  • The celebrities command a constant fraction of
    the total degree over all nodes, this fraction
    grows with p.
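The three "computed power" values listed on this slide (3, 3.857, 6) are consistent with the expression 1 + 2/(1 - p). The slide text itself does not state a formula, so the expression below should be read as inferred from those three values, and the function name computed_power is hypothetical. A short check:

```python
def computed_power(p):
    """Power-law exponent inferred from the slide's values: 1 + 2 / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return 1 + 2 / (1 - p)


# (p, simulated power, computed power) as listed on the slide.
slide_rows = [(0.0, 2.8, 3.0), (0.3, 3.96, 3.857), (0.6, 5.9, 6.0)]

if __name__ == "__main__":
    for p, simulated, listed in slide_rows:
        print(p, simulated, listed, round(computed_power(p), 3))
```

At p = 0 the expression reduces to 3, the value shown on the slide for standard preferential attachment, and it grows without bound as p approaches 1, matching the remark that the power increases with p.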

17
The simulation results
18
Results
19
Theorem 1
20
Interpretations
  • Celebrities capture a large? (depends on the
    constant) fraction of links.
  • Non-celebrities follow a power-law degree
    distribution with a power steeper than in
    preferential attachment.

21
The Proof
  • The celebrity list becomes fixed whp after some
    time tf
  • Once the celebrity list is fixed, process P looks
    very similar to an analogous modified process
  • In each step, the modified process takes the N
    oldest vertices as St, instead of the N
    largest-degree vertices.
  • This is quite reasonable: the oldest vertices
    tend to have higher degree, since they have had
    more time to be linked to
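The idea that the celebrity list stops changing after some time tf, and that it then tends to coincide with the N oldest vertices, can be explored empirically. The sketch below is an illustration only: it reuses an assumed start state (one vertex with m self-loops) and an assumed tie-breaking rule, and the function name last_list_change is hypothetical. It records the last step at which the top-N set changed and reports whether the final list equals the N oldest vertices.

```python
import random


def last_list_change(t_max, m, p, n_celebs, seed=0):
    """Return (t_f, is_oldest): the last step at which the celebrity set
    changed, and whether the final celebrity set is the N oldest vertices."""
    rng = random.Random(seed)
    degree = [2 * m]  # assumed start: one vertex with m self-loops
    prev, t_f, celebs = None, 0, []
    for t in range(1, t_max):
        old = list(range(t))
        # Top-N by degree at the previous step; ties go to the older vertex.
        celebs = sorted(old, key=lambda v: (-degree[v], v))[:n_celebs]
        # Only count changes once the list has reached its full size.
        if t > n_celebs and set(celebs) != prev:
            t_f = t
        prev = set(celebs)
        w_all = list(degree)
        w_celebs = [degree[v] for v in celebs]
        picks = []
        for _ in range(m):
            if rng.random() < p:
                pool, weights = celebs, w_celebs
            else:
                pool, weights = old, w_all
            picks.append(rng.choices(pool, weights=weights, k=1)[0])
        degree.append(m)
        for y in picks:
            degree[y] += 1
    # The final list is the one used at the last step of the loop.
    oldest = set(range(min(n_celebs, t_max)))
    return t_f, set(celebs) == oldest


if __name__ == "__main__":
    for seed in range(3):
        print(seed, last_list_change(t_max=2000, m=3, p=0.6, n_celebs=5, seed=seed))
```

A single finite run cannot confirm a with-high-probability statement; the output only shows how early the list happened to settle for a given seed.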

22
Coupling Gt and Gt
23
Analysis of the degree distribution of Gt
24
Basic Proof to Lemma 2
  • Finding recurrence of
  • Finding a similar recurrence

25
Lemma 3
26
Basic Proof to Lemma 3
27
The celebrity list gets fixed
  • WHP, adding m edges to a single non-celebrity
    will not make it a celebrity.
  • The total degree of celebrities is concentrated
    to a constant fraction of all edges ever added to
    the graph

28
List-fixing Lemma
29
Proof to Lemma 4
30
Lemma 5
31
Lemma 6
  • With low probability, the celebrity has low degree

32
Lemma 7
  • With low probability, the non-celebrity has high
    degree

33
Lemma 8
  • With low probability, the gap will stay small

34
Proof of Theorem 1
  • Let tf be the last time that St changes in
    the process P

35
Proof of Theorem 1 cont.
36
Proof of Theorem 1 cont.
37
Proof of Theorem 1 cont.
38
Proof of Theorem 1 cont.
39
Proof of Theorem 1 cont.
40
Conclusions
  • Modeling the influence of a search engine within
    the preferential attachment framework leads to a
    qualitative change in the familiar power-law
    degree distribution.
  • Each of a clot of celebrities captures a constant
    fraction of the total degree of the graph, and
    the degrees of the remaining nodes follow a
    steeper power law.

41
Is this Model real?
  • The model differs from the reality.
  • Edges are undirected?
  • Outlinks are not modified after creation
  • Pages do not die
  • No topic-based clustering

42
Comments
  • This model is restricted to one topic
  • There may be interactions between topics
  • The author may include links for different topics
    into the same page
  • The number of links on a page is fixed, which is
    not realistic

43
Thank you!
Have a nice summer!