Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload PowerPoint PPT Presentation

presentation player overlay
About This Presentation
Transcript and Presenter's Notes

Title: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload


1
Measurement, Modeling, and Analysis of a
Peer-2-Peer File-Sharing Workload
  • Presented For
  • Cs294-4 Fall 2003
  • By Jon Hess

2
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Goal - Overview
  • Determine if the KaZaA search space is queried in
    such a way that a group of 25,000 clients can
    satisfy most of their own requests.

3
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Goals - Details
  • Capture an extensive trace
  • Utilize that trace to understand file-sharing
    traffic flows
  • Model user and object activity
  • Determine inefficiencies in the distribution
    model
  • Propose solutions to inefficiencies

4
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Motivations
  • Beginning in 1999-2000 file-sharing traffic began
    to exceed HTTP traffic in terms of aggregate
    bandwidth consumed
  • File-sharing traffic is much less understood than
    HTTP traffic even though it represents such a
    large segment of bandwidth usage
  • Bandwidth is expensive

2000
2002
HTTP Traffic
P2P Traffic
5
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • The Trace
  • 2 Machines
  • 203 days 5 hours and 6 minutes
  • 22.7TB of KaZaA file transfer traffic
  • Captured seasonal variations
  • End of spring
  • Summer
  • Fall semester

6
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Trace Conclusions
  • Users are patient
  • 30 minutes to retrieve a small object
  • Up to 1 week to retrieve a large object
  • Users consume less as they age

7
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Trace Conclusions
  • Users machines are not very active
  • A session is an unbroken length of time where a
    client has one or more file transfers in
    progress.
  • Average sessions are only 2 minutes
  • 90th percentile 28 minutes
  • Over the life of a client, it is only active
    5.54 of the time or 0.20 of the trace period
  • 90th percentile clients are active most of their
    life, and 4.15 of the trace
  • Without control traffic analysis, is this
    meaningful?

8
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
Session Lengths
9
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Trace Conclusions Objects
  • Most requests are for small objects 91
  • Most bytes transferred are part of large objects
    65
  • There are many small objects
  • There are few large objects
  • Small Objects popularity is subject to heavy
    churn
  • No small object was in the top 10 for all 6
    months
  • Only 1 large object lived in the top 10 for 6
    months
  • 44 large files remained in the top 100 for 6
    months
  • The most popular small objects are new objects
  • Most requests are for old objects

10
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Fetch-at-most-once
  • Once a KaZaA user obtains an object, they will
    not need to retrieve another copy
  • 94 of Objects are fetched once per user
  • 99 are fetched less than twice per user
  • Stems from the fact that media files are
    immutable and never stale
  • You may refresh slashdot.org three times a day,
    but there is no point download thriller.mpeg
    seventeen times.
  • This keeps KaZaA workload from following a Zipf
    curve even though object popularity does.

11
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Workload Modeling
  • Create a set of objects and give them popularity
    based on a zipf distribution
  • Create a set of clients that requests objects in
    proportion to there popularity
  • Have each client fetch-at-most-once
  • Measure the distribution of transfers
  • Does it follow a zipf curve
  • How many big-object requests can a population of
    size N satisfy

12
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
Popular objects are not requested as curve would
predict
13
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Would a proxy cache help?
  • At first the proxy will cache the popular objects
    and succeed.
  • But as fetch-at-most-once draws clients away
    from the Zipf curve and the proxy begins to fail.
  • What happens if we increase density of
    popularity?
  • Curve starts higher and falls faster

14
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Previous model did not insert new objects.
  • New popular objects tend to correct the work
    load.
  • Through providing locality
  • New clients however do not help, they contribute
    to keeping old objects popular and destroy
    locality

15
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Validating The Model
  • Capture parameters that are inputs to the model
    from the trace
  • Number of clients
  • Number of objects
  • User request rate
  • Probability user requests given file - Guess
  • Probability of popularity of new objects - Guess
  • Object arrival rates Guess
  • Run simulation with harvested parameters
  • See if simulation predicts what actually happened

16
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
Simulation seems to successfully predict reality.
But with three free variables used to tune
results, is this fair?
17
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • What inefficiencies can we eliminate?
  • Analysis against the trace shows
  • 86 of object transfers were from external
    sources when an internal source possessed the
    object.
  • A traditional proxy, given the resources, could
    cut bandwidth utilization by 86
  • Would have to host pirated data
  • Could use a proxy redirector instead. Must know
    the availability of the objects
  • Control traffic is obfuscated
  • Build locality into the protocol
  • Does this sacrifice anonymity?

18
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • How successful would a locality aware protocol
    be?
  • Assume that a client is available for periods the
    trace shows it as active
  • During a file transfer - extremely conservative

19
Measurement, Modeling, and Analysis
of a Peer-2-Peer File-Sharing Workload
  • Questions?
  • Will increasing efficiency decrease load as the
    authors would like? Or simply increase work
    achieved per dollar? Do clients have insatiable
    appetites?
  • Are you worried that a large number of queries
    might have already been locally satisfied?
Write a Comment
User Comments (0)
About PowerShow.com