Title: Small-World File-Sharing Communities
1Small-World File-Sharing Communities
- Adriana Iamnitchi, Matei Ripeanu, Ian Foster
- Department of Computer Science
- The University of Chicago
- Chicago, IL 60637
- anda, matei, foster_at_cs.uchicago.edu
- Presenter: Sean
2Outline
- Problems
- Short tutorial
- Trace analysis
- Small-World analysis
- Understand the causes of characteristics
- Applications
3Problems
- Problems in P2P resource sharing
- Internet scale, hard to manage
- Dynamic
- High failure rates
- Tradeoffs
- No single design solves all problems simultaneously
- Gnutella focuses on fast file retrieval
- Freenet focuses on anonymity
- How?
- Understand user behavior and request patterns (the goal of the paper)
4Importance
- Yes: knowing user behavior and request patterns is important for an efficient solution.
- A sizable share of users in P2P systems are free riders
- Napster: 40%
- Gnutella: 26%
- Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble, A Measurement Study of Peer-to-Peer File Sharing Systems.
- Power law in networks
- Lada Adamic, Bernardo Huberman, Rajan Lukose, and Amit Puniyani, Search in power law networks, in Proceedings of Computing in High-Energy and Nuclear Physics.
- The popularity of web pages follows a Zipf distribution
- Paul Barford, Azer Bestavros, Adam Bradley, and Mark Crovella, Changes in web client access patterns: characteristics and caching implications, Tech. Rep., 1998.
- Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker, Web caching and Zipf-like distributions: Evidence and implications, in InfoCom.
5Research Questions
- The small-world phenomenon has been shown to exist in scientific collaborations, where scientists
- Analyze the same data
- Use the same software tools
- Read the same references
- Questions
- Q1. Are there any patterns in the way scientists share resources that could be exploited for designing mechanisms?
- Q2. Are these characteristics typical of scientific communities or are they more general?
- Q3. Are the small-world characteristics consequences of previously documented patterns or do they reflect a new observation concerning users' preferences in data?
- Q4. Are the properties we identified in the data-sharing graph, especially the large clustering coefficient, an inherent consequence of these well-known behaviors?
- Conclusion
- Small-world patterns exist in diverse file-sharing communities.
7Short Tutorial Small World
- What is a small-world graph?
- A loosely connected set of highly connected sub-graphs.
- e.g., in a social network, nodes -> people, edges -> relationships
- In the Web, nodes -> web pages, edges -> hyperlinks
- Two characteristics of small-world graphs
- A small average path length
- A large clustering coefficient
- (Note: the clustering coefficient captures how many of a node's neighbors are connected to each other)
(Figure: clustering coefficient)
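The two characteristics above can be computed directly on a small graph. The sketch below is a minimal illustration, not the tooling used in the paper: the function names and the toy graph (two triangles joined by one edge) are assumptions made for this example.

```python
from collections import deque

def clustering_coefficient(adj):
    """Average, over all nodes, of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue  # a node with fewer than two neighbors contributes 0
        nbrs = list(neighbors)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean BFS hop count over all ordered pairs of distinct, connected nodes."""
    dist_sum, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs if pairs else 0.0

# Hypothetical toy graph: two triangles (a-b-c and d-e-f) joined by edge c-d.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
cc = clustering_coefficient(toy)
apl = average_path_length(toy)
print(round(cc, 3), round(apl, 3))
```

The toy graph is "a loosely connected set of highly connected sub-graphs" in miniature: most neighbor pairs are linked, yet every node is reachable in a few hops.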
8Short Tutorial Power Law
A small number of nodes act as hubs (with a large degree), while most nodes have a small degree. Examples:
- P2P applications (Gnutella)
- WWW
- Internet at the router level
- Internet at the domain level
- Email contacts
M. Ripeanu, A. Iamnitchi, and I. Foster, Mapping the Gnutella network: Properties of large-scale peer-to-peer systems and implications for system design.
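A quick way to see the "few hubs, many small-degree nodes" shape is to count node degrees. The sketch below is illustrative only; the edge list and the function name `degree_distribution` are hypothetical and are not taken from any of the cited measurements.

```python
from collections import Counter

def degree_distribution(edges):
    """Return (histogram, degree): histogram maps a degree to the number of
    nodes with that degree; degree maps each node to its own degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    histogram = Counter(degree.values())
    return histogram, degree

# Hypothetical toy network: one hub "h" linked to five leaves, plus one leaf-leaf edge.
edges = [("h", "l1"), ("h", "l2"), ("h", "l3"), ("h", "l4"), ("h", "l5"), ("l1", "l2")]
histogram, degree = degree_distribution(edges)
hub, hub_degree = degree.most_common(1)[0]
print(dict(histogram), hub, hub_degree)
```

On real traces the histogram is usually plotted on logarithmic axes; a roughly straight line there is the usual informal sign of a power law.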
9Short Tutorial - Zipf Distribution
(Figure: empirical log data from Sun's website compared with a theoretical Zipf distribution; one panel uses linear scales on both axes, the other logarithmic scales on both axes.)
- Zipf curves tend to hug the axes of the diagram when plotted on linear scales. This is why they are usually plotted on double-logarithmic diagrams, even though most people are not used to interpreting such diagrams.
- Data that follow a Zipf distribution have
- a few elements that score very high (the left tail in the diagrams)
- a medium number of elements with middle-of-the-road scores (the middle part of the diagram)
- a huge number of elements that score very low (the right tail in the diagram)
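The reason for the double-logarithmic plot can be checked numerically: if the frequency of the item at rank r is proportional to 1/r^s, then log(frequency) is a straight line in log(rank) with slope -s. The sketch below assumes an ideal Zipf distribution with a hypothetical 1000 items and exponent 1; it does not use the Sun data.

```python
import math

def zipf_frequencies(n_items, exponent=1.0):
    """Normalized Zipf frequencies: rank r gets weight 1 / r**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

freqs = zipf_frequencies(1000, exponent=1.0)
slope = loglog_slope(freqs)
print(round(slope, 6))  # close to -1 for an ideal Zipf distribution with exponent 1
```

Measured popularity data only approximate this line, so the fitted slope on a real trace is an estimate rather than an exact exponent.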
11Trace Related Description
- Data-sharing graph: a graph in which nodes are users and an edge connects two users with similar interests in data.
- Three file-sharing communities
- A high-energy physics collaboration (D0)
- The Web as seen from the Boeing traces (Web)
- The Kazaa peer-to-peer file-sharing system as seen from a large ISP in Israel (Kazaa)
- Traces
12Trace of Three Communities (1)
- The D0 Experiment: a High-Energy Physics Collaboration
13Trace of Three Communities (2)
- The Web, Boeing proxy trace
14Trace of Three Communities (3)
- KaZaA Peer-to-Peer Network
16Small-World Analysis
- Reminder, data-sharing graph: a graph in which nodes are users and an edge connects two users with similar interests in data.
- Similarity criteria
- Definition: two users have similar data interests if the size of the intersection of their request sets is larger than some threshold.
- Two degrees of freedom
- Time interval
- Threshold on the number of common requests
- Conclusions
- Not all data-sharing graphs are power-law
- All data-sharing graphs show small-world characteristics
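The definition and its two degrees of freedom can be sketched as a small function over a request trace. This is a minimal sketch under stated assumptions: the trace format `(time, user, item)`, the function name, and the parameter names are hypothetical, and the threshold is read here as "at least `min_common` common requests", which is one possible reading of "larger than some threshold".

```python
from itertools import combinations

def data_sharing_graph(trace, t_start, t_end, min_common):
    """Return {(user_a, user_b): weight} for users with at least min_common
    common requests in [t_start, t_end); weight is the number of common items."""
    requests = {}
    for time, user, item in trace:
        if t_start <= time < t_end:
            requests.setdefault(user, set()).add(item)
    edges = {}
    for a, b in combinations(sorted(requests), 2):
        common = len(requests[a] & requests[b])
        if common >= min_common:
            edges[(a, b)] = common
    return edges

# Hypothetical trace of (time, user, item) records.
trace = [
    (1, "u1", "f1"), (2, "u1", "f2"),
    (3, "u2", "f1"), (4, "u2", "f2"),
    (5, "u3", "f2"), (40, "u3", "f1"),
]
short_strict = data_sharing_graph(trace, 0, 10, min_common=2)
short_loose = data_sharing_graph(trace, 0, 10, min_common=1)
long_strict = data_sharing_graph(trace, 0, 50, min_common=2)
print(short_strict, short_loose, long_strict)
```

Changing either the time interval or the threshold changes which edges exist, which is why the analysis has to be repeated across several settings of both.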
17Small-World Distribution of Weights
- Edge weights
- On the order of hundreds or thousands in D0
- 5 in Kazaa
- Conclusion: there is significantly more sharing in D0 than in Kazaa
18Small-World Degree Distribution
The Kazaa data-sharing graph is the closest to a
power-law, while D0 graphs clearly are not
power-law.
19Small-World Characteristics
- Watts-Strogatz definition: a graph G(V,E) is a small world if it has a small average path length and a large clustering coefficient, much larger than that of a random graph with the same number of nodes and edges.
- Clustering coefficient
- Average path length: measured over a random sample of nodes
20Small-World Characteristic
Q2. Are these characteristics typical of scientific communities or are they more general? Answer: more general.
Figures 11, 12, and 13 show that the small-world characteristics (large clustering coefficient and small average path length) remain constant over time.
Figure 14 summarizes the small-world result: most data points are concentrated around y = 1 (same average path length) and x > 10 (much larger clustering coefficient).
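The comparison against a random graph with the same number of nodes and edges can be sketched with the usual textbook approximations for random graphs: clustering of about k/n and average path length of about ln(n)/ln(k), where k is the average degree. Using these approximations, and all the numbers in the example, are assumptions of this sketch rather than values or methods taken from the traces.

```python
import math

def random_graph_baseline(n_nodes, n_edges):
    """Approximate clustering and path length of a random graph of the same size."""
    avg_degree = 2.0 * n_edges / n_nodes
    c_rand = avg_degree / n_nodes
    l_rand = math.log(n_nodes) / math.log(avg_degree)
    return c_rand, l_rand

def small_world_ratios(n_nodes, n_edges, clustering, path_length):
    """Return (clustering ratio, path-length ratio) relative to the random baseline."""
    c_rand, l_rand = random_graph_baseline(n_nodes, n_edges)
    return clustering / c_rand, path_length / l_rand

# Hypothetical graph: 1000 nodes, 5000 edges, measured C = 0.4 and L = 3.3.
c_ratio, l_ratio = small_world_ratios(1000, 5000, clustering=0.4, path_length=3.3)
print(round(c_ratio, 2), round(l_ratio, 2))
```

A clustering ratio far above 1 combined with a path-length ratio near 1 is the small-world signature; the sketch requires an average degree above 1 so that the logarithm in the baseline is positive.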
22Cause of the Characteristics(1)
- Does the definition of the data-sharing graph lead to these characteristics?
- Method: compare the theoretical prediction with the measured result
- Theory: affiliation networks and their unimodal projection
- Results
The large clustering coefficient is not due to the definition of the data-sharing graph.
The clustering coefficient in the data-sharing graphs is always larger than predicted.
The average degree is always smaller than predicted.
User requests for files are not random: their preferences are limited to a set of files.
23Cause of the Characteristics(2)
- Q4. Are the properties we identified in the data-sharing graph, especially the large clustering coefficient, an inherent consequence of these well-known behaviors?
- Zipf properties?
- Time and space locality properties?
- Method: generate random traces
- Break the user-request association.
- Synthesize the random traces.
- ST1: no correlation related to time is maintained
- ST2: maintain the request times as in the real traces
- ST3: maintain the users' activity over time as in the real traces
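The step "break the user-request association" can be illustrated by shuffling which item each request refers to. The sketch below is only one simple way to do this and is not claimed to reproduce ST1, ST2, or ST3: the trace format `(time, user, item)`, the function name, and the seed are assumptions of this example.

```python
import random
from collections import Counter

def shuffle_requests(trace, seed=0):
    """Reassign requested items to (time, user) slots at random.

    Keeps how often each item is requested and how many requests each user
    makes, but breaks the link between a user and what that user asked for.
    """
    rng = random.Random(seed)
    items = [item for _, _, item in trace]
    rng.shuffle(items)
    return [(time, user, item) for (time, user, _), item in zip(trace, items)]

# Hypothetical trace of (time, user, item) records.
trace = [
    (1, "u1", "f1"), (2, "u1", "f2"), (3, "u2", "f1"),
    (4, "u2", "f3"), (5, "u3", "f1"), (6, "u3", "f4"),
]
synthetic = shuffle_requests(trace, seed=42)
real_popularity = Counter(item for _, _, item in trace)
synthetic_popularity = Counter(item for _, _, item in synthetic)
print(synthetic_popularity == real_popularity)
```

Building a data-sharing graph from the shuffled trace and comparing it with the graph from the real trace shows how much of the structure comes from user preferences rather than from item popularity alone.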
24Answer of Q4
The median weight in the real D0 data-sharing
graphs is 356 and the average is 657.9, while for
synthetic graphs the median is 137 (185 for ST3)
and the average is 13.8 (75.6 for ST3).
25Answer of Q4
When the similarity criterion is raised to a large number of common requests (say, 1000 in the D0 case, Figure 19), the synthetic graphs become much smaller or even disappear.
26Answer of Q4
The synthetic data-sharing graphs are less small-world than their corresponding real graphs: the ratio between the clustering coefficients is smaller and the ratio between average path lengths is larger than in the real data-sharing graphs (Figure 20). However, these differences are not major.
27Answer of Q4
- Conclusion: user preferences for files have a significant influence on the data-sharing graphs
- Their properties are not induced (solely) by user-independent trace characteristics; human nature has some impact.
- Identifying small-world properties is not a sufficient metric to characterize the natural interest-based clustering of users
29Significance of Small-World Data-Sharing Graph
- Lower level: access to the same memory locations or to the same items in a database.
- Prior work investigates the correlation of program addresses that reference the same data and shows that these correlations can be used to eliminate load misses and partial hits.
- Higher level: identify the structure of an organization based on the applications its members use.
- Identify interest-based clusters of users, then use this information to optimize an organization's infrastructure, such as servers or network topology.
- Q1. Are there any patterns in the way scientists share resources that could be exploited for designing mechanisms?
- Answer: Yes
30Maybe Hot Topic
- User requests for files are not random: their preferences are limited to a set of files. This needs a rigorous understanding.
- We might need a metric of how small-world a small-world data-sharing graph is.
- Design mechanisms based on the data-sharing graph from two perspectives: its structure (definition) and its small-world properties.
31Questions
Sean_at_wayne.edu and visit http//141.217.17.111/
sean