Title: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload
- Presented for
- CS294-4, Fall 2003
- By Jon Hess
- Goal - Overview
- Determine if the KaZaA search space is queried in
such a way that a group of 25,000 clients can
satisfy most of their own requests.
- Goals - Details
- Capture an extensive trace
- Utilize that trace to understand file-sharing traffic flows
- Model user and object activity
- Determine inefficiencies in the distribution model
- Propose solutions to inefficiencies
- Motivations
- Beginning in 1999-2000, file-sharing traffic began to exceed HTTP traffic in terms of aggregate bandwidth consumed
- File-sharing traffic is much less understood than HTTP traffic, even though it represents such a large segment of bandwidth usage
- Bandwidth is expensive
[Figure: HTTP traffic vs. P2P traffic, 2000 and 2002]
- The Trace
- 2 Machines
- 203 days 5 hours and 6 minutes
- 22.7TB of KaZaA file transfer traffic
- Captured seasonal variations
- End of spring
- Summer
- Fall semester
- Trace Conclusions
- Users are patient
- 30 minutes to retrieve a small object
- Up to 1 week to retrieve a large object
- Users consume less as they age
- Trace Conclusions
- Users' machines are not very active
- A session is an unbroken length of time during which a client has one or more file transfers in progress.
- Average sessions are only 2 minutes
- 90th percentile: 28 minutes
- Over the life of a client, it is only active 5.54% of the time, or 0.20% of the trace period
- 90th-percentile clients are active most of their life, and 4.15% of the trace
- Without control traffic analysis, is this meaningful?
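The session definition above can be sketched as an interval merge over one client's transfers. This is only an illustration, not the authors' tooling: the `(start, end)` tuple format and both function names are assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds; hypothetical trace format


def sessions_from_transfers(transfers: List[Interval]) -> List[Interval]:
    """Merge one client's overlapping transfer intervals into sessions."""
    sessions: List[Interval] = []
    for start, end in sorted(transfers):
        if sessions and start <= sessions[-1][1]:
            # This transfer began while another was still in progress,
            # so it extends the current session instead of opening a new one.
            last_start, last_end = sessions[-1]
            sessions[-1] = (last_start, max(last_end, end))
        else:
            sessions.append((start, end))
    return sessions


def active_fraction(transfers: List[Interval], lifetime: Interval) -> float:
    """Fraction of the client's lifetime spent inside any session."""
    active = sum(end - start for start, end in sessions_from_transfers(transfers))
    return active / (lifetime[1] - lifetime[0])
```

For example, transfers `(0, 10)`, `(5, 20)` and `(30, 40)` form two sessions, `(0, 20)` and `(30, 40)`. Because only file transfers are counted, a client that is online but idle contributes nothing, which is the caveat the last bullet raises.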
Session Lengths
- Trace Conclusions - Objects
- Most requests are for small objects (91%)
- Most bytes transferred are part of large objects (65%)
- There are many small objects
- There are few large objects
- Small-object popularity is subject to heavy churn
- No small object was in the top 10 for all 6 months
- Only 1 large object lived in the top 10 for 6 months
- 44 large files remained in the top 100 for 6 months
- The most popular small objects are new objects
- Most requests are for old objects
- Fetch-at-most-once
- Once a KaZaA user obtains an object, they will not need to retrieve another copy
- 94% of objects are fetched at most once per user
- 99% are fetched at most twice per user
- Stems from the fact that media files are immutable and never stale
- You may refresh slashdot.org three times a day, but there is no point in downloading thriller.mpeg seventeen times.
- This keeps the KaZaA workload from following a Zipf curve even though object popularity does.
- Workload Modeling
- Create a set of objects and give them popularity based on a Zipf distribution
- Create a set of clients that request objects in proportion to their popularity
- Have each client fetch-at-most-once
- Measure the distribution of transfers
- Does it follow a Zipf curve?
- How many big-object requests can a population of size N satisfy?
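The modeling steps above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' simulator: the function name `simulate`, the parameter defaults, and the choice to redraw a request when the client already owns the object are all made up for the example.

```python
import random
from collections import Counter
from itertools import accumulate


def simulate(num_objects=500, num_clients=200, requests_per_client=50,
             alpha=1.0, fetch_at_most_once=True, seed=0):
    """Return per-object transfer counts, indexed by popularity rank (0 = most popular)."""
    if fetch_at_most_once and requests_per_client > num_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    # Zipf popularity: the weight of rank r is proportional to 1 / r**alpha.
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    cum_weights = list(accumulate(weights))

    def draw():
        return rng.choices(range(num_objects), cum_weights=cum_weights)[0]

    transfers = Counter()
    for _ in range(num_clients):
        owned = set()
        for _ in range(requests_per_client):
            obj = draw()
            # Assumption: a client that already owns the object picks again,
            # so every request still results in exactly one transfer.
            while fetch_at_most_once and obj in owned:
                obj = draw()
            owned.add(obj)
            transfers[obj] += 1
    return [transfers[i] for i in range(num_objects)]
```

Calling `simulate(fetch_at_most_once=True)` and `simulate(fetch_at_most_once=False)` and plotting both results against rank on log-log axes shows the effect: with fetch-at-most-once, no object can be transferred more than `num_clients` times, so the head of the curve is flattened relative to pure Zipf sampling.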
Popular objects are not requested as often as the Zipf curve would predict
- Would a proxy cache help?
- At first the proxy will cache the popular objects
and succeed.
- But fetch-at-most-once draws clients away from the Zipf curve, and the proxy begins to fail.
- What happens if we increase the density of popularity?
- Curve starts higher and falls faster
- Previous model did not insert new objects.
- New popular objects tend to correct the workload.
- Through providing locality
- New clients, however, do not help; they contribute to keeping old objects popular and destroy locality
- Validating The Model
- Capture parameters that are inputs to the model from the trace
- Number of clients
- Number of objects
- User request rate
- Probability a user requests a given file - Guess
- Probability of popularity of new objects - Guess
- Object arrival rates - Guess
- Run simulation with harvested parameters
- See if simulation predicts what actually happened
Simulation seems to successfully predict reality.
But with three free variables used to tune
results, is this fair?
- What inefficiencies can we eliminate?
- Analysis against the trace shows
- 86% of object transfers were from external sources when an internal source possessed the object.
- A traditional proxy, given the resources, could cut bandwidth utilization by 86%
- Would have to host pirated data
- Could use a proxy redirector instead. Must know the availability of the objects
- Control traffic is obfuscated
- Build locality into the protocol
- Does this sacrifice anonymity?
- How successful would a locality-aware protocol be?
- Assume that a client is available for periods the trace shows it as active
- During a file transfer - extremely conservative
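The availability assumption above suggests a simple trace replay: an external transfer counts as locally satisfiable only if some other internal client already fetched the object and was active at the moment of the request. The sketch below is a hypothetical reconstruction; the `Transfer` record, its field names, and the simplification that a client holds an object from the moment it requests it are assumptions for illustration, not the trace's actual format.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Interval = Tuple[float, float]  # (start, end) of a period the trace shows as active


@dataclass
class Transfer:
    time: float      # when the request was made
    client: str      # internal client that requested the object
    obj: str         # object identifier
    external: bool   # True if the object was served from outside the organization


def is_active(intervals: List[Interval], t: float) -> bool:
    return any(start <= t <= end for start, end in intervals)


def locally_satisfiable_fraction(transfers: List[Transfer],
                                 active: Dict[str, List[Interval]]) -> float:
    """Fraction of external transfers that an active internal holder could have served."""
    holders: Dict[str, Set[str]] = {}  # object -> internal clients that fetched it
    external = hits = 0
    for t in sorted(transfers, key=lambda tr: tr.time):
        if t.external:
            external += 1
            peers = holders.get(t.obj, set()) - {t.client}
            if any(is_active(active.get(p, []), t.time) for p in peers):
                hits += 1
        # Simplification: the requester is treated as a holder from request time.
        holders.setdefault(t.obj, set()).add(t.client)
    return hits / external if external else 0.0
```

Because a peer only counts while it has a transfer of its own in progress, this estimate is a lower bound on what a locality-aware protocol could serve internally, which matches the "extremely conservative" caveat.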
- Questions?
- Will increasing efficiency decrease load as the authors would like? Or simply increase work achieved per dollar? Do clients have insatiable appetites?
- Are you worried that a large number of queries might have already been locally satisfied?