1
Small-World File-Sharing Communities
  • Adriana Iamnitchi, Matei Ripeanu, Ian Foster
  • Department of Computer Science
  • The University of Chicago
  • Chicago, IL 60637
  • {anda, matei, foster}@cs.uchicago.edu
  • Presenter: Sean

2
Outline
  • Problems
  • Short tutorial
  • Trace analysis
  • Small-World analysis
  • Understand the causes of characteristics
  • Applications

3
Problems
  • Problems in P2P resource sharing: Internet-scale and
    hard to manage, dynamic, high failure rates
  • Tradeoff: there is no perfect way to solve all
    problems simultaneously
  • Gnutella focuses on fast file retrieval
  • Freenet focuses on anonymity
  • How? Understand user behavior and request patterns
    (the goal of the paper)
4
Importance
  • Yes: knowing user behavior and request patterns is
    important for an efficient solution.
  • Many users in P2P systems are free riders
  • Napster: 40%
  • Gnutella: 26%
  • Stefan Saroiu, P. Krishna Gummadi, and Steven D.
    Gribble, A Measurement Study of Peer-to-Peer File
    Sharing Systems.
  • Power laws in networks
  • Lada Adamic, Bernardo Huberman, Rajan Lukose, and
    Amit Puniyani, Search in power law networks, in
    Proceedings of Computing in High-Energy and
    Nuclear Physics.
  • The popularity of web pages follows a Zipf
    distribution
  • Paul Barford, Azer Bestavros, Adam Bradley, and
    Mark Crovella, Changes in web client access
    patterns: characteristics and caching
    implications, Tech. Rep., 1998.
  • Lee Breslau, Pei Cao, Li Fan, Graham Phillips,
    and Scott Shenker, Web caching and Zipf-like
    distributions: Evidence and implications, in
    InfoCom.

5
Research Questions
  • Scientific collaborations are already known to
    exhibit the small-world phenomenon. Their members:
  • Analyze the same data
  • Use the same software tools
  • Read the same references
  • Questions
  • Q1. Are there any patterns in the way scientists
    share resources that could be exploited for
    designing mechanisms?
  • Q2. Are these characteristics typical of
    scientific communities or are they more general?
  • Q3. Are the small-world characteristics
    consequences of previously documented patterns or
    do they reflect a new observation concerning
    users' preferences in data?
  • Q4. Are the properties we identified in the
    data-sharing graph, especially the large
    clustering coefficient, an inherent consequence
    of these well-known behaviors?
  • Conclusion
  • Small-world patterns exist in diverse
    file-sharing communities.

6
Outline
  • Problems
  • Short tutorial
  • Trace analysis
  • Small-World analysis
  • Understand the causes of characteristics
  • Applications

7
Short Tutorial Small World
  • What is a small-world graph?
  • A loosely connected set of highly connected
    sub-graphs.
  • E.g., in a social network, nodes -> people and
    edges -> relationships
  • In the Web, nodes -> web pages and edges ->
    hyperlinks
  • Two characteristics of small-world graphs
  • A small average path length
  • A large clustering coefficient
  • (Note: the clustering coefficient captures how
    many of a node's neighbors are connected to each
    other.)

[Figure: clustering coefficient]
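
The note on the clustering coefficient can be made concrete with a small sketch. It assumes an undirected graph stored as a dictionary of neighbor sets; the toy graph and the function names are illustrative and do not come from the slides.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Fraction of pairs of `node`'s neighbors that are linked to each other."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair to check
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


# Toy graph (hypothetical): triangle a-b-c plus a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(clustering_coefficient(adj, "a"))  # both neighbors of a are connected
print(clustering_coefficient(adj, "c"))  # only one of three neighbor pairs is connected
print(average_clustering(adj))
```

Nodes inside the triangle score high, while the node that bridges to the pendant scores lower, which is the "highly connected sub-graphs, loosely connected to each other" picture described above.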
8
Short Tutorial Power Law
In a power-law network, a small number of nodes act
as hubs (having a large degree), while most nodes
have a small degree. Examples:
  • P2P applications (Gnutella)
  • WWW
  • Internet on router level
  • Internet on domain level
  • Email contacts

M. Ripeanu, A. Iamnitchi, and I. Foster, Mapping the
Gnutella network: Properties of large-scale
peer-to-peer systems and implications for system
design.
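
A minimal sketch of how the hub pattern shows up in a degree histogram, assuming the graph is given as a list of undirected edges. The toy edge list and the function name are made up for illustration; a real check for a power law would need a much larger graph and a proper fit.

```python
from collections import Counter


def degree_histogram(edges):
    """Map each degree value to the number of nodes that have that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Toy graph (hypothetical): node 0 is a hub linked to nodes 1..5, plus one edge 1-2.
edges = [(0, i) for i in range(1, 6)] + [(1, 2)]
hist = degree_histogram(edges)

for deg in sorted(hist):
    print(deg, hist[deg])  # degree, number of nodes with that degree
```

In this toy case one node has degree 5 while the other five nodes have degree 1 or 2, the same shape (few hubs, many low-degree nodes) that the examples above exhibit at scale.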
9
Short Tutorial - Zipf Distribution
[Figure: empirical log data from Sun's website
compared with a theoretical Zipf distribution,
plotted with linear scales on both axes and with
logarithmic scales on both axes]
  • Zipf curves have a tendency to hug the axes of
    the diagram when plotted on linear scales. This
    is why we usually plot them on double-logarithmic
    diagrams, even though most people are not used to
    interpreting such diagrams. A simple description
    of data that follow a Zipf distribution is that
    they have
  • a few elements that score very high (the left
    tail in the diagrams)
  • a medium number of elements with
    middle-of-the-road scores (the middle part of the
    diagram)
  • a huge number of elements that score very low
    (the right tail in the diagram)
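
The reason for the double-logarithmic plot can be sketched numerically: under a Zipf law the frequency of the item at rank r is proportional to 1 / r^alpha, so log(frequency) is a straight line in log(rank) with slope -alpha. The code below is an illustration under that assumption; the function names, the 1000 items, and the 100000 total requests are arbitrary choices, not values from the slides.

```python
import math


def zipf_frequencies(n_items, total, alpha=1.0):
    """Expected request counts for ranks 1..n_items when f(r) ~ 1 / r**alpha."""
    weights = [1.0 / rank ** alpha for rank in range(1, n_items + 1)]
    norm = sum(weights)
    return [total * w / norm for w in weights]


def loglog_slope(values):
    """Least-squares slope of log(value) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


freqs = zipf_frequencies(n_items=1000, total=100000)
print(freqs[0], freqs[9], freqs[999])  # a few high scores, then a long low tail
print(loglog_slope(freqs))             # close to -alpha on log-log axes
```

On linear axes the same numbers collapse against the axes, because the top-ranked item is a thousand times more popular than the last one.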

10
Outline
  • Problems
  • Short tutorial
  • Trace analysis
  • Small-World analysis
  • Understand the causes of characteristics
  • Applications

11
Trace Related Description
  • Data-sharing graph: a graph in which nodes are
    users and an edge connects two users with similar
    interests in data.
  • Three file-sharing communities
  • A high-energy physics collaboration (D0)
  • The Web as seen from the Boeing traces (Web)
  • Kazaa peer-to-peer file-sharing system seen from
    a large ISP in Israel (Kazaa)
  • Traces

12
Trace of Three Communities (1)
  • The D0 Experiment: a High-Energy Physics
    Collaboration

13
Trace of Three Communities (2)
  • The Web, Boeing proxy trace

14
Trace of Three Communities (3)
  • KaZaA Peer-to-Peer Network

15
Outline
  • Problems
  • Short tutorial
  • Trace analysis
  • Small-World analysis
  • Understand the causes of characteristics
  • Applications

16
Small-World Analysis
  • Reminder: the data-sharing graph is a graph in
    which nodes are users and an edge connects two
    users with similar interests in data.
  • Similarity criteria
  • Definition: we say that two users have similar
    data interests if the size of the intersection of
    their request sets is larger than some threshold.
  • Two degrees of freedom
  • Time interval
  • Threshold on the number of common requests
  • Conclusions
  • Not all data-sharing graphs are power law
  • All data-sharing graphs show small-world
    characteristics
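
The definition above can be sketched as follows, assuming the trace is a list of (time, user, file) records. The record layout, the function name, the half-open time window, and the choice to keep an edge when the number of common requests is at least `min_common` are all assumptions made for illustration; the slides only say the intersection must exceed some threshold.

```python
from collections import defaultdict
from itertools import combinations


def data_sharing_graph(trace, t_start, t_end, min_common):
    """Build weighted edges between users with enough common requests.

    trace: iterable of (time, user, file) tuples (hypothetical layout).
    Returns {(user_a, user_b): number_of_common_files}.
    """
    requests = defaultdict(set)
    for time, user, item in trace:
        if t_start <= time < t_end:  # first degree of freedom: time interval
            requests[user].add(item)

    edges = {}
    for u, v in combinations(sorted(requests), 2):
        common = len(requests[u] & requests[v])
        if common >= min_common:  # second degree of freedom: threshold
            edges[(u, v)] = common
    return edges


# Toy trace (made up): u3's request for f1 falls outside the window.
trace = [
    (1, "u1", "f1"), (2, "u1", "f2"),
    (3, "u2", "f1"), (4, "u2", "f2"),
    (5, "u3", "f2"), (50, "u3", "f1"),
]

print(data_sharing_graph(trace, 0, 10, min_common=2))
print(data_sharing_graph(trace, 0, 10, min_common=1))
```

Raising the threshold or shortening the interval removes edges, which is why both parameters have to be reported alongside any property measured on the graph.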

17
Small-World Distribution of Weights
  • Edge weights (number of common requests)
  • On the order of hundreds or thousands in D0
  • 5 in Kazaa
  • Conclusion: there is significantly more sharing
    in D0 than in Kazaa

18
Small-World Degree Distribution
The Kazaa data-sharing graph is the closest to a
power-law, while D0 graphs clearly are not
power-law.
19
Small-World Characteristics
  • Watts-Strogatz definition: a graph G(V,E) is a
    small world if it has small average path length
    and large clustering coefficient, much larger
    than that of a random graph with the same number
    of nodes and edges.
  • Clustering coefficient
  • Average path length: measure it over a random
    sample of nodes
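
Measuring the average path length over a random sample of nodes can be sketched with breadth-first search from each sampled source. This is only an illustration: the adjacency-set representation, the function names, the fixed seed, and the decision to ignore unreachable pairs are assumptions, not details from the slides.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def sampled_average_path_length(adj, sample_size, seed=0):
    """Average shortest-path length from a random sample of source nodes."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    sources = rng.sample(nodes, min(sample_size, len(nodes)))
    total = pairs = 0
    for source in sources:
        for target, d in bfs_distances(adj, source).items():
            if target != source:  # unreachable targets never appear here
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy graph (hypothetical): a ring of six nodes.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(sampled_average_path_length(ring, sample_size=3))
```

Sampling keeps the cost at one breadth-first search per sampled node instead of one per node, which matters for graphs with many users.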

20
Small-World Characteristic
Q2. Are these characteristics typical of
scientific communities or are they more
general? Answer: more general.
Figures 11, 12, and 13 show that the small-world
characteristics (large clustering coefficient and
small average path length) remain constant over
time.
Figure 14 summarizes the small-world result: most
data points are concentrated around y = 1 (same
average path length) and x > 10 (much larger
clustering coefficient).
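
The two ratios summarized in Figure 14 can be sketched as below. The random-graph baseline uses the standard approximations for a random graph with n nodes and average degree k (clustering about k / n, average path length about ln n / ln k); whether the paper used exactly these formulas is an assumption here, and the input numbers are invented for illustration.

```python
import math


def random_graph_baseline(n_nodes, n_edges):
    """Approximate clustering and path length of a comparable random graph.

    Assumes an average degree k = 2 * edges / nodes greater than 1.
    """
    k = 2.0 * n_edges / n_nodes
    return k / n_nodes, math.log(n_nodes) / math.log(k)


def small_world_ratios(n_nodes, n_edges, clustering, path_length):
    """Return (x, y): clustering ratio and path-length ratio vs. the baseline."""
    c_rand, l_rand = random_graph_baseline(n_nodes, n_edges)
    return clustering / c_rand, path_length / l_rand


# Hypothetical measurements, not taken from the traces.
x, y = small_world_ratios(n_nodes=1000, n_edges=5000,
                          clustering=0.5, path_length=3.3)
print(x, y)  # x well above 10 and y near 1 is the small-world signature
```

A point with x much larger than 10 and y near 1 matches the region where the slide says most data points fall.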
21
Outline
  • Problems
  • Short tutorial
  • Trace analysis
  • Small-World analysis
  • Understand the causes of characteristics
  • Applications

22
Cause of the Characteristics(1)
  • Does the definition of the data-sharing graph
    lead to the characteristics?
  • Method: compare the theoretical prediction with
    the measured result
  • Theory: affiliation networks and their unimodal
    projection
  • Results

The large clustering coefficient is not due to
the definition of the data-sharing graph.
The clustering coefficient in the data-sharing
graphs is always larger than predicted.
The average degree is always smaller than
predicted.
User requests for files are not random: users'
preferences are limited to a set of files.
23
Cause of the Characteristics(2)
  • Q4. Are the properties we identified in the
    data-sharing graph, especially the large
    clustering coefficient, an inherent consequence
    of these well-known behaviors?
  • Zipf properties?
  • Time and space locality properties?
  • Method: generate random traces
  • Break the user-request association.
  • Synthesize the random traces.
  • ST1: no correlation related to time is maintained
  • ST2: maintain the request times as in the real
    traces
  • ST3: maintain the users' activity over time as in
    the real traces
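
One simple way to break the user-request association is to permute the user column of the trace while leaving times and files untouched. The sketch below does only that; it is not claimed to reproduce ST1, ST2, or ST3 exactly, and the (time, user, file) record layout, the function name, and the toy trace are assumptions.

```python
import random
from collections import Counter


def shuffle_user_request_association(trace, seed=0):
    """Randomly reassign users to requests.

    Keeps every (time, file) pair and each user's total number of requests,
    but destroys the link between a user and the files that user chose.
    """
    rng = random.Random(seed)
    users = [user for _, user, _ in trace]
    rng.shuffle(users)
    return [(time, user, item)
            for (time, _, item), user in zip(trace, users)]


# Toy trace (made up): two users with disjoint interests.
trace = [
    (1, "u1", "f1"), (2, "u1", "f2"), (3, "u1", "f1"),
    (4, "u2", "f3"), (5, "u2", "f4"), (6, "u2", "f3"),
]
synthetic = shuffle_user_request_association(trace, seed=1)

print(synthetic)
print(Counter(item for _, _, item in synthetic))  # file popularity is unchanged
print(Counter(user for _, user, _ in synthetic))  # per-user request counts are unchanged
```

Because file popularity and per-user activity survive the shuffle, any difference between graphs built from the real and the shuffled trace points to user preferences rather than to those user-independent properties.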

24
Answer of Q4
The median weight in the real D0 data-sharing
graphs is 356 and the average is 657.9, while for
synthetic graphs the median is 137 (185 for ST3)
and the average is 13.8 (75.6 for ST3).
25
Answer of Q4
When the similarity criterion is raised to a
large number of common requests (say, 1000 in the
D0 case, Figure 19), the synthetic graphs are
much smaller or even disappear.
26
Answer of Q4
The synthetic data-sharing graphs are less
small-world than their corresponding real
graphs: the ratio between the clustering
coefficients is smaller and the ratio between
average path lengths is larger than in the real
data-sharing graphs (Figure 20). However, these
differences are not major.
27
Answer of Q4
  • Conclusion: user preferences for files have a
    significant influence on the data-sharing graphs
  • Their properties are not induced (solely) by
    user-independent trace characteristics; human
    nature has some impact.
  • Identifying small-world properties is not a
    sufficient metric to characterize the natural
    interest-based clustering of users

28
Outline
  • Problems
  • Short tutorial
  • Trace analysis
  • Small-World analysis
  • Understand the causes of characteristics
  • Applications

29
Significance of Small-World Data-Sharing Graph
  • Lower level: access to the same memory locations
    or to the same items in a database.
  • Prior work studies the correlation of program
    addresses that reference the same data and shows
    that these correlations can be used to eliminate
    load misses and partial hits.
  • Higher level: identify the structure of an
    organization based on the applications its
    members use
  • Identify interest-based clusters of users and
    then use this information to optimize an
    organization's infrastructure, such as servers or
    network topology.
  • Q1. Are there any patterns in the way scientists
    share resources that could be exploited for
    designing mechanisms?
  • Answer: yes

30
Maybe Hot Topic
  • User requests for files are not random: users'
    preferences are limited to a set of files. This
    needs a rigorous understanding.
  • We might need a metric of how small-world a
    small-world data-sharing graph is.
  • Design mechanisms around the data-sharing graph
    from two perspectives: its structure (definition)
    and its small-world properties.

31
Questions
Sean@wayne.edu and visit http://141.217.17.111/
sean