On Unbiased Sampling for Unstructured Peer-to-Peer Networks

1
On Unbiased Sampling for Unstructured
Peer-to-Peer Networks
  • Daniel Stutzbach University of Oregon
  • Reza Rejaie University of Oregon
  • Nick Duffield AT&T Labs-Research
  • Subhabrata Sen AT&T Labs-Research
  • Walter Willinger AT&T Labs-Research

Internet Measurement Conference, Rio de Janeiro,
Brazil, October 25th, 2006
2
Motivation
  • P2P systems are very popular in practice.
    • Several million simultaneous users, collectively.
    • 60% of all Internet traffic [CacheLogic Research
      2005]
  • Measurement studies aid our understanding of
    existing systems and user behavior.
  • Capturing an accurate global picture is often
    infeasible.
    • P2P systems are distributed, large, and rapidly
      changing.
    • Capturing a global picture is time-consuming,
      resulting in a blurry picture.
  • Sampling is a natural approach, and has been used
    implicitly in most earlier P2P measurement
    studies.
  • But how do we know the samples are
    representative?

3
The Problem
  • We focus on sampling peer properties:
    • Number of neighbors (degree)
    • Link bandwidth
    • Number of shared files
    • Remaining uptime
  • Sampling peer properties occurs in two steps:
    • Discover and select peers.
    • Collect measurements from the selected peers.
  • Selecting peers uniformly at random is hard.
    • Temporal: Peer dynamics can introduce bias.
    • Topological: The graph topology can introduce
      bias.
  • We first examine these two problems in isolation.
  • We then examine them together.

4
Sampling with Dynamics
  • Define V_t as the set of peers present at time t.
  • We gather samples over a measurement window of
    length Δ.
  • The most common approach is to select peers
    uniformly from the set of all peers present at
    any point during the window.

5
Bias towards Short-Lived Peers
[Figure: timeline of one long-lived peer and a series of short-lived peers]
  • Consider a simple two-peer system containing:
    • One long-lived peer
    • One rapidly changing short-lived peer
  • The common approach over-selects short-lived
    peers.

6
Handling Temporal Causes of Bias
  • The common approach is intuitive but incorrect.
  • Sampling peers is the wrong goal.
  • We want to sample peer properties.
  • Two samples from the same peer, but at different
    times, are distinct.
  • Allow sampling the same peer more than once, at
    different points in time.

7
Example of Avoiding Bias towards Short-Lived Peers
[Figure: timeline of one long-lived peer and a series of short-lived peers]
  • Allowing a peer to be re-selected solves the
    problem.
  • The long-lived peer will be selected half the
    time, reflecting the actual state of the system.
  • How do we select a peer uniformly at random at a
    particular moment?
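A minimal Python sketch of the two-peer example, for illustration only. The window of 10 time units and the rule that a brand-new short-lived peer appears every unit are assumptions, and the function names are hypothetical. It compares the common approach (pick uniformly from every peer seen during the window) with the time-aware approach (pick a moment t, then a peer from V_t, allowing the same peer to be re-selected).

```python
from fractions import Fraction

# ASSUMED toy setup (not from the talk): a window of 10 time units; one slot
# holds the long-lived peer throughout, the other gets a brand-new
# short-lived peer at every time unit.
WINDOW = 10
LONG_LIVED = "L"
SHORT_LIVED = [f"S{t}" for t in range(WINDOW)]


def peers_at(t):
    """V_t: the peers present during time unit t."""
    return [LONG_LIVED, SHORT_LIVED[t]]


def common_approach():
    """P(long-lived peer chosen) when picking uniformly from every peer
    seen at any point in the window."""
    seen = set()
    for t in range(WINDOW):
        seen.update(peers_at(t))
    return Fraction(1, len(seen))


def time_aware_approach():
    """P(long-lived peer chosen) when picking a uniform moment t, then a
    uniform peer from V_t; the same peer may be chosen at different times."""
    total = Fraction(0)
    for t in range(WINDOW):
        v_t = peers_at(t)
        if LONG_LIVED in v_t:
            total += Fraction(1, WINDOW) * Fraction(1, len(v_t))
    return total


print(common_approach())      # 1/11
print(time_aware_approach())  # 1/2
```

With ten short-lived peers passing through, the common approach picks the long-lived peer only 1 time in 11, while the time-aware approach picks it half the time, matching the system's state at any given moment.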

8
Sampling from Static Graphs
  • Assume, for the moment, a static graph.
  • Goal: Select a peer uniformly at random from the
    graph.
  • Discover:
    • Begin with one peer.
    • Query peers to discover their neighbors.
    • Classic algorithms: Breadth-First Search,
      Depth-First Search
  • Select:
    • Choose a subset of the discovered peers.
    • Gather samples from the selected peers.
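The Discover step can be sketched as a breadth-first crawl. This is a minimal Python sketch on a toy static graph; query_neighbors is a hypothetical stand-in for querying a live peer, not an API from the talk.

```python
from collections import deque


def bfs_discover(start, query_neighbors):
    """Discover peers by breadth-first search from one starting peer.

    query_neighbors(peer) is a HYPOTHETICAL callback standing in for asking
    a live peer for its neighbor list. Returns peers in discovery order.
    """
    discovered = [start]
    seen = {start}
    frontier = deque([start])
    while frontier:
        peer = frontier.popleft()
        for nbr in query_neighbors(peer):
            if nbr not in seen:
                seen.add(nbr)
                discovered.append(nbr)
                frontier.append(nbr)
    return discovered


# Toy static topology, assumed for illustration.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
order = bfs_discover("a", lambda p: graph[p])
print(order)  # ['a', 'b', 'c', 'd', 'e']
```

The Select step would then choose a subset of `order` (for example with `random.sample`). Because BFS grows outward from the starting peer, a crawl cut short returns only that peer's neighborhood.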

9
Advantages of Random Walks
  • Problems with classic approaches:
    • Peers are correlated by their neighbor
      relationships.
    • Peers with higher degree are discovered more
      often.
    • A peer can only be selected once.
  • Random walks are a promising alternative:
    • Information about the starting location is lost
      because randomness is injected at every step.
    • The results are biased, but the bias is
      precisely known.
    • Random walks can naturally visit the same peer
      more than once.

10
Random walks, formally
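  • A random walk can be described by a transition
    matrix, P(x,y).
    • P(x,y) is the probability of moving from x to y
      in one move.
    • P^r(x,y) is the probability of moving from x to
      y after r moves.
  • Random walks converge to a stationary
    distribution; for a simple random walk on a
    connected, non-bipartite, undirected graph, the
    stationary probability of x is proportional to
    deg(x).
  • Problem: we want a uniform distribution.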
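To make P(x,y), P^r(x,y), and the stationary distribution concrete, here is a small Python sketch on a hypothetical four-peer graph. It builds the simple-random-walk matrix (P(x,y) = 1/deg(x) for each neighbor y), raises it to the r-th power by repeated multiplication, and compares each row with deg(x)/2|E|. The graph contains a triangle, so the walk is aperiodic and converges.

```python
def transition_matrix(graph):
    """Simple random walk: P(x, y) = 1/deg(x) if y is a neighbor of x, else 0."""
    n = len(graph)
    P = [[0.0] * n for _ in range(n)]
    for x, nbrs in graph.items():
        for y in nbrs:
            P[x][y] = 1.0 / len(nbrs)
    return P


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, r):
    """P^r(x, y): probability of moving from x to y after r moves."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(r):
        result = mat_mul(result, P)
    return result


# HYPOTHETICAL topology with peers numbered 0..3; the triangle 0-1-2 makes
# the walk aperiodic, so P^r converges.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
P = transition_matrix(graph)
P_r = mat_pow(P, 100)
two_e = sum(len(nbrs) for nbrs in graph.values())    # 2|E| = 8
stationary = [len(graph[x]) / two_e for x in graph]  # deg(x) / 2|E|
print([round(p, 3) for p in P_r[3]])  # [0.25, 0.375, 0.25, 0.125]
print(stationary)                     # [0.25, 0.375, 0.25, 0.125]
```

Every row of P^r approaches the same degree-proportional vector, regardless of the starting peer: this is the precisely known bias that Metropolis-Hastings is used to remove.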

11
The Metropolis-Hastings Method
  • The Metropolis-Hastings method modifies the
    transition matrix to yield a desired stationary
    distribution µ(x).
    • Proven for static graphs
  • Plugging in our P(x,y) and a uniform µ(x):
    • Select a neighbor y of x uniformly at random.
    • Transition to y with probability
      min(1, deg(x) / deg(y)).
    • Otherwise, self-transition to x.
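The slide's formula did not survive in this transcript. For reference, the textbook Metropolis-Hastings rule, written here with Q for the modified transition matrix (a symbol introduced for this note), reduces to the steps above when P(x,y) = 1/deg(x) and µ is uniform:

```latex
\begin{aligned}
Q(x,y) &= P(x,y)\,\min\!\left(1,\ \frac{\mu(y)\,P(y,x)}{\mu(x)\,P(x,y)}\right)
  \quad (y \neq x), \qquad Q(x,x) = 1 - \sum_{y \neq x} Q(x,y),\\
Q(x,y) &= \frac{1}{\deg(x)}\,\min\!\left(1,\ \frac{\deg(x)}{\deg(y)}\right)
  \quad \text{when } P(x,y) = \frac{1}{\deg(x)} \text{ and } \mu \text{ is uniform.}
\end{aligned}
```

Since µ(x)Q(x,y) is then proportional to min(1/deg(x), 1/deg(y)), which is symmetric in x and y, detailed balance holds and the uniform distribution is stationary.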

12
Sampling from Dynamic Graphs
  • Adapting to vanishing peers:
    • We maintain a stack of visited peers.
    • If a query times out, go back in the stack.
  • Hypothesis: A Metropolized random walk will yield
    approximately unbiased samples in practice.
    • Trivially valid for extremely slowly changing
      graphs
    • Trivially false for extremely rapidly changing
      graphs
    • Where is the transition?
  • Methodology:
    • Session-level simulations of a wide variety of
      situations
    • Determine what conditions lead to biased
      samples.
    • Do those conditions arise in practice?
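A minimal Python sketch of a Metropolized random walk with backtracking, assuming a hypothetical query_neighbors callback that returns a neighbor list or None on timeout. Remembering timed-out peers in a `dead` set, and raising an error if the walk backs up past its starting peer, are choices made for this sketch rather than details from the talk; it is not ion-sampler's implementation.

```python
import random


def mrwb_walk(start, query_neighbors, hops, rng=random):
    """Sketch of a Metropolized random walk with backtracking.

    query_neighbors(peer) is a HYPOTHETICAL callback: it returns the peer's
    neighbor list, or None if the query times out because the peer left.
    Returns the peer the walk occupies after `hops` moves.
    """
    stack = [start]  # visited peers; the top is the walk's current position
    dead = set()     # peers whose queries failed during this walk (assumption)
    moves = 0
    while moves < hops:
        if not stack:
            raise RuntimeError("walk backtracked past its starting peer")
        x = stack[-1]
        nbrs_x = query_neighbors(x)
        if nbrs_x is None:                # current peer vanished:
            dead.add(x)                   # go back in the stack
            stack.pop()
            continue
        candidates = [y for y in nbrs_x if y not in dead]
        if not candidates:                # no usable neighbors: back up too
            stack.pop()
            continue
        y = rng.choice(candidates)        # propose a neighbor uniformly
        nbrs_y = query_neighbors(y)
        if not nbrs_y:                    # timed out (or reports no neighbors)
            dead.add(y)
            continue
        # Metropolis-Hastings: accept with probability min(1, deg(x)/deg(y)).
        if rng.random() < len(nbrs_x) / len(nbrs_y):
            stack.append(y)
        moves += 1                        # a rejection counts as a self-transition
    return stack[-1]
```

On a static, connected graph where every query succeeds, repeated walks of a few dozen hops end at each peer with roughly equal frequency; a peer whose queries always time out is never returned.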

13
Metrics: Fundamental Properties
  • We focus on three fundamental properties that
    affect the walk:
    • Degree
    • Session length
    • Query latency (in paper only)
  • We compute the Kolmogorov-Smirnov (KS) statistic
    (D) between each sampled distribution and a
    snapshot taken by an oracle.
  • We evaluate these metrics under a variety of
    conditions:
    • Several models of churn
    • Several models of degree distribution
    • Four different peer discovery mechanisms
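The KS statistic D is the largest vertical gap between two empirical CDFs. A short Python sketch, comparing a hypothetical sampled degree list against a hypothetical oracle snapshot:

```python
from bisect import bisect_right


def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical gap
    between the empirical CDFs of the two data sets."""
    a, b = sorted(sample), sorted(reference)
    d = 0.0
    for v in set(a) | set(b):  # the gap can only peak at an observed value
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


sampled_degrees = [1, 2, 2, 3]  # HYPOTHETICAL walk output
oracle_degrees = [1, 2, 3, 3]   # HYPOTHETICAL oracle snapshot
print(ks_statistic(sampled_degrees, oracle_degrees))  # 0.25
```

D ranges from 0 (identical distributions) to 1 (no overlap). SciPy's `scipy.stats.ks_2samp` reports the same two-sided statistic along with a p-value.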

14
Base case
  • Base case:
    • Session length distribution is Weibull
      (k = 0.59, λ = 40)
    • Maximum degree: 30
    • Target degree: 15
    • Peer discovery mechanism: FIFO rendezvous point
  • Sampled and expected distributions are visually
    indistinguishable.
  • Very low KS statistic: D < 0.004
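The symbol before 40 is garbled in the transcript; this sketch assumes it is the Weibull scale λ (units are not stated) and that k = 0.59 is the shape. It draws session lengths with Python's `random.weibullvariate`, which takes the scale first and the shape second, and checks them against the closed-form median λ(ln 2)^(1/k).

```python
import math
import random

SHAPE_K = 0.59     # shape k from the base case
SCALE_LAMBDA = 40  # ASSUMPTION: the garbled symbol is the scale λ; units not stated


def weibull_median(k, lam):
    """Closed-form median of a Weibull(k, λ) distribution: λ (ln 2)^(1/k)."""
    return lam * math.log(2) ** (1 / k)


rng = random.Random(1)
# random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
sessions = sorted(rng.weibullvariate(SCALE_LAMBDA, SHAPE_K) for _ in range(100_001))

print(round(weibull_median(SHAPE_K, SCALE_LAMBDA), 1))  # 21.5
print(round(sessions[len(sessions) // 2], 1))           # close to the analytic median
```

A shape below 1 gives a heavy tail of long sessions alongside many short ones; shrinking the median speeds up churn, which is the axis varied on the next slide.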

15
Varying churn
  • Each point represents one simulation; the y-axis
    shows the KS statistic (D).
  • Error is low over a wide range of session lengths.
    • Becomes significant for median < 2 min
    • High for median < 30 s
  • The type of distribution does not have a large
    impact.

16
Varying topology
  • Little bias when target degree > 2
    • A target degree of 2 leads to network
      fragmentation.
  • Bias under the history mechanism is due to the 2%
    of peers with no neighbors.
  • More simulation results in the paper

17
Empirical results
  • We developed the technique into a tool called
    ion-sampler.
    • Available from our website

bash$ ./ion-sampler gnutella --hops 25 -n 10
10.8.65.171:6348
10.199.20.183:5260
10.8.45.103:34717
10.21.0.29:6346
10.32.170.200:6346
10.201.162.49:30274
10.222.183.129:47272
10.245.64.85:6348
10.79.198.44:36520
10.216.54.169:44380
18
Empirical validation
  • Empirical validation is tricky because there is
    no perfect baseline for comparison.
  • Full crawls performed by Cruiser [Stutzbach 05,
    IMC]
    • The full crawl may be slightly biased towards
      higher-degree peers.
  • Ion-sampler records slightly fewer higher-degree
    peers than a full crawl.
  • Conclusion: ion-sampler is close to a full crawl
    in accuracy, and may even be more accurate!

19
Conclusions and Future Work
  • Summary:
    • Temporal and topological bias can lead to
      sampling error.
    • We present the Metropolized Random Walk with
      Backtracking (MRWB) technique.
    • Extensive simulations show that it gathers
      nearly unbiased samples in a wide variety of
      circumstances.
    • Ion-sampler is a tool for gathering nearly
      unbiased samples from real P2P systems.
  • Future work:
    • Explore improving sampling efficiency for
      uncommon events.
    • Evaluate MRWB under flash crowd scenarios.
    • Develop additional plug-ins for ion-sampler.