On Unbiased Sampling for Unstructured Peer-to-Peer Networks

Learn more at: http://mirage.cs.uoregon.edu

Category:

more less

Transcript and Presenter's Notes

Title: On Unbiased Sampling for Unstructured Peer-to-Peer Networks

1
On Unbiased Sampling for Unstructured
Peer-to-Peer Networks

Daniel Stutzbach, University of Oregon
Reza Rejaie, University of Oregon
Nick Duffield, AT&T Labs-Research
Subhabrata Sen, AT&T Labs-Research
Walter Willinger, AT&T Labs-Research

Internet Measurement Conference, Rio de Janeiro,
Brazil, October 25th, 2006
2
Motivation

P2P systems are very popular in practice.
Several million simultaneous users collectively.
60% of all Internet traffic [CacheLogic Research,
2005]
Measurement studies aid in understanding existing
systems and user behavior.
Capturing an accurate global picture is often
infeasible.
P2P systems are distributed, large, and rapidly
changing.
Capturing a global picture is time-consuming,
resulting in a blurry picture.
Sampling is a natural approach, and has been used
implicitly in most earlier P2P measurement
studies.
But how do we know the samples are
representative?

3
The Problem

We focus on sampling peer properties.
Number of neighbors (degree)
Link bandwidth
Number of shared files
Remaining uptime
Sampling peer properties occurs in two steps:
Discover and select peers
Collect measurements from the selected peers
Selecting peers uniformly at random is hard.
Temporal: Peer dynamics can introduce bias.
Topological: The graph topology can introduce
bias.
We first examine these two problems in isolation.
We then examine them together.

4
Sampling with Dynamics

Define V_t as the set of peers present at time t.
We gather samples over a measurement window of
length Δ.
The most common approach is to gather peers from
the set of all peers present at any point during
the window.
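
The expression that followed this bullet on the original slide did not survive in the transcript. Given the definition of V_t above, one way to write the set the common approach draws from is the union below. The symbol t_0 (the start of the window) is an assumption introduced here, and Δ stands for the window length.

```latex
V_{t_0,\,t_0+\Delta} \;=\; \bigcup_{t=t_0}^{t_0+\Delta} V_t
```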

5
Bias towards Short-Lived Peers
[Figure: timeline with one long-lived peer and a succession of short-lived peers]

Consider a simple two-peer system, containing:
One long-lived peer
One rapidly-changing short-lived peer
The common approach over-selects short-lived
peers.
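
A small arithmetic sketch of why this happens, under an assumed toy model that is not from the slides: short-lived peers arrive back to back, each staying for `short_session` time units, so exactly one of them is online at any instant. The function name and the numbers are illustrative only.

```python
from fractions import Fraction

def common_approach_share(window, short_session):
    """Share of the long-lived peer among all distinct peers seen in the window.

    Toy assumption: `window` is a whole multiple of `short_session`, and
    short-lived peers replace each other with no gaps.
    """
    short_lived_seen = Fraction(window, short_session)  # distinct short-lived peers
    return 1 / (1 + short_lived_seen)

# The faster the short-lived slot turns over, the smaller the long-lived share,
# even though the long-lived peer is half of the system at every instant.
for session in (60, 10, 1):
    print(session, common_approach_share(60, session))
```

With a window of 60 time units and sessions of 10, the long-lived peer is only one of seven distinct peers, so picking uniformly from that set selects it far less than half the time.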

6
Handling Temporal Causes of Bias

The common approach is intuitive but incorrect.
Sampling peers is the wrong goal.
We want to sample peer properties.
Two samples from the same peer, but at different
times, are distinct.
Allow sampling the same peer more than once, at
different points in time.

7
Example of avoiding bias towards Short-Lived
Peers
[Figure: timeline with one long-lived peer and a succession of short-lived peers]

Allowing re-selecting a peer solves the problem.
The long-lived peer will be selected half the
time, reflecting the actual state of the system.
How do we select a peer uniformly at random at a
particular moment?
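
To illustrate the claim that re-selection removes the temporal bias, here is a minimal simulation of the two-peer system. It assumes an oracle that can list the peers present at any instant, which is exactly the part the rest of the talk works to replace; the model and all names are hypothetical.

```python
import random

def peers_present(t, short_session):
    """Peers online at time t in the toy system: the long-lived peer plus
    whichever short-lived peer occupies the current slot."""
    return ["long", f"short-{int(t // short_session)}"]

def sample_at_random_instants(n, window, short_session, rng):
    """Draw n samples: pick an instant, then pick uniformly among peers present.
    The same peer may be selected many times, at different instants."""
    samples = []
    for _ in range(n):
        t = rng.uniform(0, window)
        samples.append(rng.choice(peers_present(t, short_session)))
    return samples

rng = random.Random(1)
samples = sample_at_random_instants(10_000, window=60, short_session=1, rng=rng)
share = samples.count("long") / len(samples)
print(f"long-lived share: {share:.3f}")
```

The long-lived share comes out close to one half regardless of how quickly the short-lived slot turns over.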

8
Sampling from Static Graphs

Assume for the moment a static graph.
Goal: Select a peer uniformly from the graph
Discover:
Begin with one peer.
Query peers to discover neighbors.
Classic algorithms: Breadth-First Search,
Depth-First Search
Select:
Choose a subset of discovered peers
Gather samples from the selected peers

9
Advantages of Random Walks

Problems with classic approaches:
Peers are correlated by their neighbor
relationship
Peers with higher degree are discovered more often
A peer can only be selected once.
Random walks are a promising alternative:
The information in the starting location is
lost by repeatedly injecting randomness at
each step.
The results are biased, but the bias is precisely
known.
Random walks can implicitly visit the same peer
twice.

10
Random walks, formally

Random walks can be described with a transition
matrix, P(x,y).
P(x,y) is the probability of moving from x to y
P^r(x,y) is the probability of moving from x to y
after r moves
Random walks converge to a stationary
distribution in which each peer's probability is
proportional to its degree
Problem: we want a uniform distribution
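
A short numerical sketch of this convergence, using a hypothetical four-peer graph (undirected, connected, and containing a triangle so the walk is aperiodic). It builds the simple-walk matrix P(x,y) = 1/deg(x), raises it to a power, and compares each row with deg(x) / (2|E|), the standard stationary distribution for such a walk.

```python
def transition_matrix(adj):
    """Simple random walk: from x, move to each neighbor with probability 1/deg(x)."""
    nodes = sorted(adj)
    rows = [[(1 / len(adj[x]) if y in adj[x] else 0.0) for y in nodes] for x in nodes]
    return nodes, rows

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_pow(p, r):
    """P^r by repeated multiplication (fine for a tiny example)."""
    n = len(p)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(r):
        result = mat_mul(result, p)
    return result

# Hypothetical graph: edges a-b, a-c, a-d, b-c.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
nodes, P = transition_matrix(adj)
P_r = mat_pow(P, 100)

two_E = sum(len(adj[x]) for x in adj)            # sum of degrees = 2|E|
expected = [len(adj[x]) / two_E for x in nodes]  # deg(x) / (2|E|)
for x, row in zip(nodes, P_r):
    print(x, [round(v, 4) for v in row])
```

Every row of P^100 is practically identical, so the starting peer no longer matters, but the high-degree peer "a" is visited three times as often as the degree-1 peer "d".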

11
The Metropolis-Hastings Method

The Metropolis-Hastings method modifies the
transition matrix to yield a desired stationary
distribution µ(x)
Proven for static graphs
Plugging in our P(x,y) and a uniform µ(x):
Select a neighbor y of x uniformly at random
Transition to y with probability min(1, deg(x) / deg(y))
Otherwise, self-transition to x.
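
The three steps above can be sketched as a transition matrix and checked numerically. This is an illustration on a hypothetical four-peer graph, not the authors' code; the acceptance probability is capped at 1, as in the standard Metropolis-Hastings rule. If the resulting matrix is symmetric, its columns sum to 1 and the uniform distribution is stationary.

```python
def metropolized_matrix(adj):
    """Q(x,y) for the degree-correcting walk on an undirected graph."""
    nodes = sorted(adj)
    rows = []
    for i, x in enumerate(nodes):
        row = []
        for y in nodes:
            if y in adj[x]:
                propose = 1 / len(adj[x])                     # pick neighbor y uniformly
                accept = min(1.0, len(adj[x]) / len(adj[y]))  # deg(x) / deg(y), capped at 1
                row.append(propose * accept)
            else:
                row.append(0.0)
        row[i] = 1.0 - sum(row)  # rejected proposals become self-transitions
        rows.append(row)
    return nodes, rows

# Hypothetical graph: edges a-b, a-c, a-d, b-c.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
nodes, Q = metropolized_matrix(adj)

n = len(nodes)
uniform = [1 / n] * n
# One step of the walk starting from the uniform distribution: (uniform * Q).
after_step = [sum(uniform[i] * Q[i][j] for i in range(n)) for j in range(n)]
print(after_step)
```

The low-degree peer "d" compensates for being proposed rarely by staying put most of the time, which is what evens out the visit frequencies.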

12
Sampling from Dynamic Graphs

Adapting to vanishing peers
We maintain a stack of visited peers
If a query times out, go back in the stack
Hypothesis: A Metropolized random walk will yield
approximately unbiased samples in practice.
Trivially valid for extremely slowly changing
graphs
Trivially false for extremely rapidly changing
graphs
Where is the transition?
Methodology:
Session-level simulations of a wide variety of
situations
Determine what conditions lead to biased samples
Do those conditions arise in practice?
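
One possible reading of the backtracking rule at the top of this slide, as a simplified sketch. The slide gives only the outline (keep a stack, go back on timeout), so the details here are assumptions: `query(peer)` is a hypothetical callback that returns a neighbor list or raises `TimeoutError` for a departed peer, and a timed-out neighbor is simply dropped from the current peer's list.

```python
import random

def mrwb_walk(start, query, hops, rng):
    """Return the peer reached after `hops` steps of a Metropolized walk
    that backs up the stack when it runs out of live neighbors."""
    stack = [(start, list(query(start)))]  # (peer, neighbors not yet seen to time out)
    for _ in range(hops):
        x, x_nbrs = stack[-1]
        if not x_nbrs:
            raise RuntimeError("no reachable neighbors left")
        y = rng.choice(x_nbrs)
        try:
            y_nbrs = list(query(y))
        except TimeoutError:
            x_nbrs.remove(y)  # y has vanished; do not try it again from x
            # Go back in the stack while the top peer has no live neighbors.
            while len(stack) > 1 and not stack[-1][1]:
                stack.pop()
            continue
        # Metropolis-Hastings acceptance for a uniform target distribution.
        if y_nbrs and rng.random() < min(1.0, len(x_nbrs) / len(y_nbrs)):
            stack.append((y, y_nbrs))
        # Otherwise: self-transition, stay at x.
    return stack[-1][0]

# Hypothetical network in which peer "d" has already departed.
adj = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
departed = {"d"}

def query(peer):
    if peer in departed:
        raise TimeoutError(peer)
    return adj[peer]

print(mrwb_walk("b", query, hops=25, rng=random.Random(0)))
```

In this sketch a timeout consumes a hop and shrinks the degree used in later acceptance tests, both of which are simplifications rather than documented behavior.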

13
Metrics: Fundamental properties

We focus on three fundamental properties that
affect the walk
Degree
Session length
Query latency (in paper only)
We compute the Kolmogorov-Smirnov (KS) statistic (D)
for each distribution versus a snapshot from an
oracle.
We evaluate these metrics under a variety of
conditions:
Several models of churn
Several models of degree distribution
Four different peer discovery mechanisms
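
For readers unfamiliar with the KS statistic, a minimal sketch of the two-sample form: D is the largest vertical gap between the two empirical CDFs. The function name and the sample values are illustrative, not taken from the simulations.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # occurs at one of the observed values.
    for v in set(a) | set(b):
        cdf_a = bisect_right(a, v) / len(a)  # fraction of a that is <= v
        cdf_b = bisect_right(b, v) / len(b)  # fraction of b that is <= v
        d = max(d, abs(cdf_a - cdf_b))
    return d

sampled_degrees = [1, 2, 3, 4]  # e.g. degrees observed by the sampler
oracle_degrees = [3, 4, 5, 6]   # e.g. degrees in the oracle's snapshot
print(ks_statistic(sampled_degrees, oracle_degrees))
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap), so smaller values mean the sampled distribution tracks the oracle more closely.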

14
Base case

Base case
Session length distribution is Weibull (k = 0.59,
λ = 40)
Maximum degree: 30
Target degree: 15
Peer discovery mechanism: FIFO rendezvous point
Sampled and expected distributions are visually
indistinguishable.
Very low KS statistic: D < 0.004
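
A sketch of how session lengths for such a base case could be drawn, assuming the second Weibull parameter on the slide is a scale of 40 (the symbol is garbled in the transcript) and leaving the time unit unspecified. Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
import math
import random
import statistics

SHAPE_K = 0.59  # shape parameter k from the slide
SCALE = 40.0    # assumed scale parameter; time unit not stated on the slide

def session_length(rng):
    """Draw one session length from Weibull(shape=SHAPE_K, scale=SCALE)."""
    return rng.weibullvariate(SCALE, SHAPE_K)

def weibull_median(scale, shape):
    """Closed-form median of a Weibull distribution: scale * (ln 2)^(1/shape)."""
    return scale * math.log(2) ** (1 / shape)

rng = random.Random(42)
draws = [session_length(rng) for _ in range(20_000)]
print(f"analytic median: {weibull_median(SCALE, SHAPE_K):.2f}")
print(f"sample median:   {statistics.median(draws):.2f}")
```

A shape below 1 gives a heavy-tailed mix of many short sessions and a few very long ones, so the median (roughly 21.5 under these assumptions) sits well below the scale parameter.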

15
Varying churn

Each point represents a simulation; the y-axis shows
the KS statistic (D)
Error is low over a wide range of session lengths
Becomes significant for median session length < 2 min
High for median session length < 30 s
Type of distribution does not have a large impact

16
Varying topology

Little bias when target degree > 2
Degree 2 means network fragmentation
History mechanism bias is due to 2% of peers
with no neighbors.
More simulation results in the paper

17
Empirical results

We developed the technique into a tool called
ion-sampler.
Available from our website

bash$ ./ion-sampler gnutella --hops 25 -n 10
10.8.65.171:6348
10.199.20.183:5260
10.8.45.103:34717
10.21.0.29:6346
10.32.170.200:6346
10.201.162.49:30274
10.222.183.129:47272
10.245.64.85:6348
10.79.198.44:36520
10.216.54.169:44380

18
Empirical validation

Empirical validation is tricky because there is
no perfect baseline for comparison.
Full crawling performed by Cruiser [Stutzbach 05,
IMC]
The full crawl may be slightly biased towards
higher degree
Ion-sampler records slightly fewer high-degree
peers than a full crawl
Conclusion: ion-sampler is close to a full crawl
in accuracy, and may even be more accurate!

19
Conclusions and Future Work

Summary:
Temporal and topological bias can lead to
sampling error.
We present the Metropolized Random Walk with
Backtracking technique.
Extensive simulations show that it gathers nearly
unbiased samples in a wide variety of
circumstances.
Ion-sampler is a tool for gathering nearly
unbiased samples from real P2P systems.
Future work:
Explore improving sampling efficiency for
uncommon events.
Evaluate MRWB under flash crowd scenarios.
Develop additional plug-ins for ion-sampler.