Title: Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems
1 Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems
- Daniel Stutzbach, University of Oregon
- Reza Rejaie, University of Oregon
- Subhabrata Sen, AT&T Labs
Internet Measurement Conference, Berkeley, CA, USA, October 19th, 2005
2 Motivation
- P2P file-sharing systems are very popular in practice.
- Several million simultaneous users collectively.
- 60% of all Internet traffic [CacheLogic Research 2005]
- Most use an unstructured overlay
- Understanding overlay properties is important
- Understanding how existing P2P systems function
- Developing and evaluating new systems
- Unstructured overlays are not well-understood.
- We studied overlay properties in Gnutella.
- Size: one of the largest P2P systems (more than 1 million users)
- Mature: in use for several years (older studies available for comparison)
- Open: no reverse-engineering needed
3 Defining the Problem
[Figure: two-tier overlay; labels: Ultrapeer, Top-level overlay, Leaf]
- Gnutella uses a two-tier overlay.
- Improves scalability.
- Ultrapeers form an unstructured mesh.
- Leaf peers connect to the ultrapeers.
- eDonkey, FastTrack are similar.
- Studying the overlay requires snapshots.
- Snapshots capture the overlay as a graph.
- Individual snapshots reveal graph properties.
- Consecutive snapshots reveal dynamics.
- However, capturing accurate snapshots is difficult.
4 Challenges in Capturing Accurate Snapshots
- Snapshots are captured iteratively by a crawler.
- An ideal snapshot is instantaneous.
- But the overlay is large and rapidly changing.
- Therefore, captured snapshots are distorted.
- Sampling
- Partial snapshots are less distorted, but may be unrepresentative.
- For some types of analysis, the whole graph is needed.
- Previous studies capture either
- Complete snapshots slowly, or
- Partial snapshots.
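The iterative capture described on this slide can be sketched as a breadth-first crawl. This is a minimal illustration and not Cruiser itself: the `neighbors_of` callback and the toy `overlay` table are hypothetical stand-ins for contacting a live peer and asking for its neighbor list.

```python
from collections import deque

def crawl(seed, neighbors_of):
    """Breadth-first crawl; returns (nodes, edges) of the captured snapshot.

    neighbors_of(peer) is a hypothetical callback standing in for a
    network query to a live peer.
    """
    nodes = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        peer = queue.popleft()
        for nbr in neighbors_of(peer):
            edges.add(frozenset((peer, nbr)))  # store as an undirected edge
            if nbr not in nodes:
                nodes.add(nbr)
                queue.append(nbr)
    return nodes, edges

# Toy static overlay (assumed data). A real overlay keeps changing while
# this loop runs, which is the source of the distortion described above.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
nodes, edges = crawl("A", lambda p: overlay[p])
```

Because the queue is drained one peer at a time, the time between the first and last query grows with the size of the overlay; a faster crawler shortens that window.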
5 Cruiser: a Fast Gnutella Crawler
- Features
- Distributed, highly parallelized implementation
- Dynamic adaptation to bandwidth and CPU constraints
- Cruiser is orders of magnitude faster.
- Captures one million nodes in around 7 minutes
- 140,000 peers/min, compared to 2,500 peers/min [Saroiu 02]
- We investigated the effects of speed on distortion.
- Daniel Stutzbach and Reza Rejaie, "Capturing Accurate Snapshots of the Gnutella Network," the Global Internet Symposium, March 2005.
- 4% node distortion
- 15% edge distortion
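The crawl rates quoted on this slide imply the following crawl durations. This is a rough calculation that assumes a population of exactly one million peers and a constant crawl rate; the figure for the slower crawler is an extrapolation for comparison, not a measurement from the source.

```python
def crawl_minutes(num_peers, peers_per_minute):
    """Time to contact every peer once at a constant crawl rate."""
    return num_peers / peers_per_minute

POPULATION = 1_000_000  # assumed round figure from "one million nodes"

cruiser_minutes = crawl_minutes(POPULATION, 140_000)  # roughly 7 minutes
earlier_minutes = crawl_minutes(POPULATION, 2_500)    # 400 minutes at the older rate
speedup = 140_000 / 2_500                             # ratio of the two rates
```

The first value agrees with the "around 7 minutes" stated on the slide.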
6 Data Set
- More than 80,000 snapshots, over the past year.
- To examine static properties, we focus on four snapshots.
- To examine dynamic properties, we use slices
- Each slice is 2 days of 500 back-to-back snapshots
- Captured starting 10/14/04, 10/21/04, 11/25/04, 12/21/04, and 12/27/04
7 Summary of Characterizations
- Graph Properties
- Implementation heterogeneity
- Degree Distribution
- Top-level degree distribution
- Ultrapeer-leaf connectivity
- Degree-distance correlation
- Reachability
- Path lengths
- Eccentricity
- Small world properties
- Resiliency
- Dynamic Properties
- Existence of stable core
- Uptime distribution
- Biased connectivity
- Properties of stable core
- Largest connected component
- Path lengths
- Clustering coefficient
8 Top-level Degree
[Figure: degree distribution among ultrapeers; annotations: "Max 30 in most clients", "Max 75 in some clients", "Custom"]
- This is the degree distribution among ultrapeers.
- There are obvious peaks at 30 and 70 neighbors.
- A substantial number of ultrapeers have fewer than 30.
- What happened to the power-law seen in prior studies?
9 What happened to power-law?
Ripeanu 02 ICJ
- When a crawl is slow, many short-lived peers report long-lived peers as neighbors.
- However, those neighbors are not all present at the same time.
- Degree distribution from a slow crawl resembles prior results.
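The inflation of a long-lived peer's degree by a slow crawl can be reproduced on a toy model. The sketch below is an illustration under stated assumptions, not the authors' experiment: `overlay_at`, the hub `"H"`, and the peers `p0`, `p1`, ... are all hypothetical, and the hub always has exactly two neighbors at any instant.

```python
def overlay_at(t):
    """Toy overlay at time step t (assumed model): hub 'H' always has
    exactly two short-lived neighbors, one of which is replaced each step."""
    a, b = f"p{t}", f"p{t + 1}"
    return {"H": {a, b}, a: {"H"}, b: {"H"}}

def fast_crawl(t):
    """Idealized instantaneous snapshot: every peer is queried at time t."""
    return {peer: set(nbrs) for peer, nbrs in overlay_at(t).items()}

def slow_crawl(steps):
    """Queries one short-lived peer per time step and merges the reports."""
    merged = {}
    for t in range(steps):
        peer = f"p{t}"
        for nbr in overlay_at(t)[peer]:  # p_t reports 'H' as its neighbor
            merged.setdefault(peer, set()).add(nbr)
            merged.setdefault(nbr, set()).add(peer)
    return merged

true_degree = len(fast_crawl(0)["H"])      # degree of H at any single instant
observed_degree = len(slow_crawl(6)["H"])  # degree of H in the merged slow crawl
```

In this model the hub never has more than two neighbors at once, yet the slow crawl credits it with one neighbor per time step, which stretches the tail of the measured degree distribution.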
10 Shortest-Path Distances
- Distribution of distances among ultrapeers and among all peers
- In the top-level, 70% of distances are exactly 4 hops.
- Across all peers, most distances are 5 or 6 hops.
- Shows the effect of the two-tier overlay with multiple parents
- Despite large size, distances are short.
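A distance distribution like the one on this slide can be computed from a snapshot with one breadth-first search per node. The following is a minimal sketch on a four-node toy path graph (assumed data); running it on a full snapshot of around a million peers would need sampling or a more efficient implementation.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_histogram(adj):
    """Counts unordered node pairs at each hop distance."""
    hist = Counter()
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if s < t:  # count each unordered pair once
                hist[d] += 1
    return hist

# Toy path graph 0-1-2-3 (assumed data).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hist = distance_histogram(adj)
total_pairs = sum(hist.values())
share = {d: n / total_pairs for d, n in hist.items()}  # fraction of pairs per distance
```

The `share` mapping is the quantity behind statements such as "70% of distances are exactly 4 hops".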
11 Is Gnutella a Small World?
- Small worlds arise naturally in many places.
- Movie actors, power grid, co-authors of papers
- They have short distances, but significant clustering, compared to a similar random graph.
- Conclusion: Gnutella is a small world.
- Very high clustering adversely affects flooding queries
- But Gnutella isn't clustered enough to affect performance.
12 Clustering coefficient
$$\text{clustering coefficient of node } i = \frac{\text{number of edges between neighbors of node } i}{\text{maximum possible edges between neighbors of node } i}$$
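The ratio on this slide (edges that exist between the neighbors of node i, divided by the maximum possible number of such edges) can be computed directly from an adjacency map. The sketch below assumes an undirected graph stored as a dict of neighbor sets; returning 0.0 for nodes with fewer than two neighbors is a convention chosen here, not something the slide specifies.

```python
from itertools import combinations

def clustering_coefficient(adj, i):
    """Edges between neighbors of i, divided by the maximum possible."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined ratio treated as 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

# Toy graph (assumed data): triangle a-b-c plus a pendant node d on a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cc_a = clustering_coefficient(adj, "a")  # one of three neighbor pairs is linked
cc_b = clustering_coefficient(adj, "b")  # the single neighbor pair is linked
```

Averaging this value over all nodes and comparing it with the average for a random graph of the same size is the usual way to test the clustering half of the small-world definition.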
13 Resiliency to Node Failure
- After removing nodes, this figure shows how many remain connected.
- The Gnutella topology is extremely resilient to random node failure.
- It's resilient even when the highest-degree nodes are removed first.
- Complex algorithms are not necessary for ensuring resilience.
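The two removal strategies on this slide can be sketched as follows: remove a fraction of nodes either at random or in decreasing order of degree, then measure the largest connected component that survives. The function names, the toy ring and star graphs, and the fixed seed are assumptions for illustration only.

```python
import random

def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def surviving_component(adj, strategy, fraction, seed=0):
    """Remove `fraction` of the nodes by `strategy` ('random' or 'degree')."""
    nodes = list(adj)
    if strategy == "random":
        random.Random(seed).shuffle(nodes)
    else:  # highest-degree nodes first
        nodes.sort(key=lambda n: len(adj[n]), reverse=True)
    removed = nodes[: int(len(nodes) * fraction)]
    return largest_component_size(adj, removed)

# Toy graphs (assumed data): a 10-node ring and a 10-node star.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
star = {0: set(range(1, 10)), **{i: {0} for i in range(1, 10)}}

ring_after_random = surviving_component(ring, "random", 0.1)
star_after_degree = surviving_component(star, "degree", 0.1)
```

The star is the fragile extreme: losing its single hub disconnects everything. The slide's finding is that the Gnutella topology does not behave this way even under highest-degree-first removal.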
14 What about Dynamic Properties?
- Connections (i.e., edges) are constantly changing.
- Dynamics of Neighbor Selection (due to protocol)
- Dynamics of Peer Participation (due to user)
- Investigation
- whether a subset of participating peers forms a relatively stable core for the overlay
- what properties (such as size, diameter, degree of connectivity, and clustering) this stable core exhibits
- what underlying factors contribute to the formation and properties of such a stable core
15 What about Dynamic Properties?
- Prior work suggests many peers are short-lived while others are very long-lived. How do these nodes interact?
- Methodology
- Capture a long series of back-to-back snapshots
- Annotate the last snapshot with the uptime of each peer
- Examine the properties of the annotated topology
- Group peers by uptime
[Figure: timeline of consecutive snapshots; labels: "Present for 5 snapshots", "Present for 2 snapshots", "Departed peer", "Newly arrived peer"; axis: Time]
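The annotation step in the methodology can be sketched as counting, for each peer in the last snapshot, how many consecutive snapshots it has been present in. This is a minimal illustration that assumes each snapshot is reduced to a set of peer IDs; the peer names and the five-snapshot series are made up.

```python
from collections import defaultdict

def annotate_uptime(snapshots):
    """snapshots: list of sets of peer IDs, oldest first.

    Returns {peer: number of consecutive snapshots, counting back from
    the last one, in which the peer was present}.
    """
    uptime = {}
    for peer in snapshots[-1]:
        count = 0
        for snap in reversed(snapshots):
            if peer not in snap:
                break
            count += 1
        uptime[peer] = count
    return uptime

def group_by_uptime(uptime):
    """Inverts the annotation: {uptime: set of peers with that uptime}."""
    groups = defaultdict(set)
    for peer, up in uptime.items():
        groups[up].add(peer)
    return dict(groups)

snapshots = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},        # c departed
    {"a", "b", "d"},   # d newly arrived
    {"a", "c", "d"},   # b departed; c counts as newly arrived again
]
uptime = annotate_uptime(snapshots)
groups = group_by_uptime(uptime)
```

Uptime measured this way is a lower bound: a peer present in every snapshot may have joined before the series began.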
16 Stable Core
[Figure labels: "> 20 h", "> 10 h"]
- Most peers are recent arrivals.
- Other peers have been around for a long time.
- We can select a set of peers based on a minimum uptime threshold.
- We call this the stable core.
- Does the longevity of a peer affect who its neighbors are?
17 Biased Connectivity
- Hypothesis: long-lived nodes tend to be more connected to other long-lived nodes
- Rationale: Once connected, they stay connected.
- The longer they're around, the more opportunities they have to become neighbors.
- Verification Approach: Check for biased connectivity
- Randomize the edges to create a graph without biased connectivity
- Compare
- Are there more edges in the observed stable core compared to random?
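One common way to "randomize the edges" while keeping every node's degree unchanged is repeated double-edge swapping. The slides do not say which randomization the authors used, so the sketch below is an assumption, as are the function names, the toy edge list, and the choice of core.

```python
import random
from collections import Counter

def randomize_edges(edges, swaps, seed=0):
    """Degree-preserving randomization by repeated double-edge swaps."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # edges share a node: skip to avoid self-loops
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def core_edge_count(edges, core):
    """Number of edges with both endpoints inside the core."""
    return sum(1 for u, v in edges if u in core and v in core)

# Toy observed graph (assumed data): peers 1-4 form a dense core.
observed = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4),
            (5, 6), (7, 8), (9, 10), (5, 7), (6, 8), (9, 5)]
core = {1, 2, 3, 4}
randomized = randomize_edges(observed, swaps=200)
observed_core_edges = core_edge_count(observed, core)
random_core_edges = core_edge_count(randomized, core)
```

Comparing `observed_core_edges` with `random_core_edges`, ideally averaged over many seeds, is the kind of check the slide describes: a consistently higher observed count indicates biased connectivity.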
18 Comparing Randomized with SC
[Figure panels: "Randomized version", "SC version"]
19 Stable Core Edges
- 20% to 40% more edges in the stable core compared to random.
- There is an onion-like bias where long-lived peers are more likely to be connected to other long-lived peers.
- Peers within the core do not depend on peers outside the core for reachability
- Despite high churn, there is a relatively stable backbone
20 Summary
- Characterizations of recent and accurate snapshots
- Graph properties
- The degree distribution in Gnutella is not power law.
- Gnutella exhibits small world characteristics.
- Gnutella is resilient.
- Dynamic properties
- There is a stable core within the topology
- Peer churn causes the stable core to have an onion-like shape.
- This effect is likely to occur in any unstructured system.
21 Future Work
- Examining long-term trends in Gnutella using many snapshots.
- Characterizing churn
- Characterizing properties of other widely-deployed P2P systems
- Kad (a DHT with more than 1 million users)
- BitTorrent
- Developing sampling techniques for P2P
22 Ultrapeer->Leaf Degree
[Figure legend: LimeWire, BearShare, Other, Custom]
- LimeWire ultrapeers have a limit of 30 leaf peers.
- BearShare ultrapeers have a limit of 45 leaf peers.
- There are distinct spikes at those points, with an even distribution of fewer leaf peers.