Title: Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload
1. Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload
- K. P. Gummadi, R. J. Dunn, et al.
- SOSP '03
- Presented by Lu-chuan Kung (kung_at_uiuc.edu)
2. Outline
- Trace methodology and analysis
- User characteristics
- Client activities
- Object dynamics
- Analyze why Kazaa workload is not Zipf
- A model of P2P file-sharing workloads
- A study of bandwidth-saving techniques
- Conclusion
3. Trace Methodology
- Passively collect Kazaa traffic at the border between the campus network and the Internet
- Query traffic was not captured because it is encrypted; file transfers are HTTP transfers with Kazaa-specific headers
- Summary statistics of the trace
4. Kazaa Users Are Patient
- Transfer time: the difference between the start time and the end time of a request
- Small objects: <10MB
- Large objects: >100MB (typically video files)
5. Users Slow Down As They Age
- Do people become hungrier for content as they gain experience with Kazaa?
- Older clients requested fewer bytes because of:
- Attrition: the population declines as clients age
- Slowing down: older clients ask for less
6. Client Activity
- It's difficult to quantify the availability of clients in a P2P system
- Client activity includes:
- Activity fraction: time spent in transfers / duration of lifetime; a lower bound on availability
- Average session length: typical duration of a session
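The two activity metrics can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes each transfer is a (start, end) pair in seconds, that a client's lifetime runs from its first transfer start to its last transfer end, and that a "session" is a maximal run of overlapping transfers. The function names are hypothetical.

```python
def merge_intervals(transfers):
    """Merge overlapping (start, end) transfers into disjoint sessions."""
    merged = []
    for start, end in sorted(transfers):
        if merged and start <= merged[-1][1]:
            # Overlaps the current session: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def activity_fraction(transfers):
    """Time spent in transfers divided by lifetime (assumed: first start to last end)."""
    sessions = merge_intervals(transfers)
    if not sessions:
        return 0.0
    lifetime = sessions[-1][1] - sessions[0][0]
    busy = sum(end - start for start, end in sessions)
    return busy / lifetime if lifetime > 0 else 0.0


def average_session_length(transfers):
    """Mean duration of the merged sessions."""
    sessions = merge_intervals(transfers)
    if not sessions:
        return 0.0
    return sum(end - start for start, end in sessions) / len(sessions)


# Hypothetical client with three transfers, two of which overlap.
example = [(0, 10), (5, 20), (80, 100)]
print(activity_fraction(example), average_session_length(example))
```

Overlapping transfers are counted once, so the fraction never exceeds 1 and stays a lower bound on availability: a client may be online without transferring.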
7. Object Characteristics
- Kazaa is not one workload
- Kazaa is a blend of workloads with different properties
- 3 ranges of objects: small (<10MB), medium (10MB-100MB), and large (>100MB)
- Majority of requests are for smaller objects
- Most bytes transferred are due to large objects
8. Kazaa Object Dynamics
- Multimedia objects are immutable, which affects object dynamics
- Kazaa clients fetch objects at most once
- A Kazaa client requests an object at most once 94% of the time
- A Kazaa client requests an object at most twice 99% of the time
- Most requests are for old (repeated) objects
- An object is old if at least one month has passed since the first request for the object
- 72% of requests for large objects are for old objects
- 52% of requests for small objects are for old objects
9. Kazaa Object Dynamics
- The popularity of Kazaa objects is often short-lived
- The most popular pages remain stable on the Web
- Popularity is fleeting in Kazaa
- Audio files lose popularity faster than popular video files
- The most popular Kazaa objects tend to be recently born objects
- Newly born objects: objects that did not receive any requests during the first month of the trace
10. Kazaa Is Not Zipf
- Zipf's law
- The popularity of the ith-most popular object is proportional to $i^{-\alpha}$, where $\alpha$ is the Zipf coefficient
- Kazaa is not Zipf
- The most popular objects are less popular than Zipf would predict
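Written as a normalized distribution, Zipf's law is a straight line of slope $-\alpha$ on a log-log plot of popularity against rank. The symbols $N$ (number of objects) and $p_i$ (request probability of the object at rank $i$) are notation introduced here for illustration and do not appear on the slides.

```latex
p_i = \frac{i^{-\alpha}}{\sum_{j=1}^{N} j^{-\alpha}},
\qquad
\log p_i = -\alpha \log i \;-\; \log \sum_{j=1}^{N} j^{-\alpha}
```

A workload whose top-ranked objects fall below this line on the log-log plot, as reported for Kazaa, has a flattened head relative to Zipf.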
11. Why Kazaa Is Not Zipf
- Fetch-repeatedly vs. fetch-at-most-once
- Simulate the two cases based on the same Zipf distribution
- The result of fetch-at-most-once is similar to Kazaa
- Non-Zipf workloads are also observed in web proxy caches and VoD servers
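The comparison can be sketched with a small simulation. This is an illustrative toy, not the authors' simulator: the population sizes, the rejection-sampling approach, and all function names are assumptions. Both cases draw from the same Zipf(1) distribution; the only difference is whether a client may fetch an object it already has.

```python
import random
from bisect import bisect_left
from collections import Counter
from itertools import accumulate


def zipf_cdf(num_objects, alpha=1.0):
    """Cumulative Zipf distribution over ranks 1..num_objects."""
    weights = [i ** -alpha for i in range(1, num_objects + 1)]
    total = sum(weights)
    return [c / total for c in accumulate(weights)]


def draw_rank(cdf, rng):
    """Inverse-CDF sampling; returns a 0-based rank (0 = most popular)."""
    return min(bisect_left(cdf, rng.random()), len(cdf) - 1)


def simulate(num_objects, num_clients, requests_per_client, at_most_once, seed=0):
    """Return a Counter of requests per object rank."""
    rng = random.Random(seed)
    cdf = zipf_cdf(num_objects)
    counts = Counter()
    for _ in range(num_clients):
        fetched = set()
        done = 0
        while done < requests_per_client:
            obj = draw_rank(cdf, rng)
            if at_most_once and obj in fetched:
                continue  # client already has this object: draw again
            fetched.add(obj)
            counts[obj] += 1
            done += 1
    return counts


# Hypothetical sizes, chosen only to keep the run short.
once = simulate(1000, 200, 50, at_most_once=True)
repeat = simulate(1000, 200, 50, at_most_once=False)
print("top object, fetch-at-most-once:", once[0])
print("top object, fetch-repeatedly:  ", repeat[0])
```

Under fetch-at-most-once, no object can receive more requests than there are clients, which caps the head of the popularity curve and produces the flattened shape.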
12. A Model of P2P File-Sharing Workloads
- Hypothesis: the underlying popularity of objects in a fetch-at-most-once system is driven by Zipf's law
- A client requests 2 objects per day, choosing which object to fetch from Zipf(1)
- Objects are born at rate $\lambda_o$; an object's popularity rank is selected from Zipf(1)
- The total object population cannot be observed from the trace. Use back-inference: given that 18,000 distinct objects are requested in the trace, what is the total number of objects? Answer: 40,000
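The idea of back-inference can be illustrated with a simplified calculation. This is not the authors' procedure: it assumes requests are independent Zipf(1) draws (ignoring fetch-at-most-once and object births), and the request count and target in the example are made-up toy values, not trace figures. The function names are hypothetical.

```python
def expected_distinct(num_objects, num_requests, alpha=1.0):
    """Expected number of distinct objects seen in num_requests independent Zipf draws."""
    weights = [i ** -alpha for i in range(1, num_objects + 1)]
    total = sum(weights)
    # An object with probability p is missed by all requests with prob. (1 - p)^R.
    return sum(1.0 - (1.0 - w / total) ** num_requests for w in weights)


def infer_population(target_distinct, num_requests):
    """Smallest population (found by bisection) whose expected distinct count reaches the target."""
    if num_requests <= target_distinct:
        raise ValueError("need more requests than distinct objects observed")
    lo = target_distinct - 1  # at most lo distinct objects can be seen among lo objects
    hi = target_distinct
    while expected_distinct(hi, num_requests) < target_distinct:
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if expected_distinct(mid, num_requests) < target_distinct:
            lo = mid
        else:
            hi = mid
    return hi


# Toy values for illustration only.
n = infer_population(target_distinct=1800, num_requests=5000)
print("inferred total population:", n)
```

The inferred population exceeds the number of distinct objects observed because many unpopular objects are never requested during the observation window.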
13. Model Structure and Notation
- Parameter values are chosen to reflect the measured data from the trace
14. File-Sharing Effectiveness
- How should an organization expect bandwidth demand to change over time, given a shared proxy server?
- The hit rate of the proxy cache decreases in the fetch-at-most-once case
- Fetch-at-most-once clients consume the most popular objects early
15. New Object Arrivals Improve Hit Rate
- Object updates on the Web lower the hit rate
- New object arrivals are beneficial in a P2P system
- Arrivals of popular objects increase the hit rate
- Without arrivals, clients are forced to choose from the remaining unpopular objects
16. New Clients Cannot Stabilize Performance
- The infusion of new clients at a constant rate cannot compensate for the increasing number of old clients
- To keep the hit rate constant, the client arrival rate must grow exponentially
17. Model Validation
- The underlying Zipf assumption cannot be validated directly
- Use the proposed model to replicate the object popularity distribution in the trace
- Estimate various parameters
- The arrival rate of new objects is chosen to fit the measured data: $\lambda_o$ = 5,475 objects per year
18. Exploring Locality-aware Request Routing
- A significant fraction of Internet bandwidth is consumed by Kazaa
- How would exploitation of locality help to save bandwidth?
- Different ways to exploit locality:
- A centralized proxy cache placed at the organization border
- Request redirection that favors organization-internal peers
- Centralized request redirection
- Decentralized request redirection
19. An Ideal Proxy Cache
- Assume an ideal proxy: infinite capacity and bandwidth
- 86% of external bandwidth would be saved
- However, some may not want to store P2P file-sharing content in a proxy server due to legal issues
20. Benefits of Locality-Awareness
- Trace-based simulation
- Infinite storage capacity
- At most 12 concurrent downloads
- Upload bandwidth: 500 Kb/s
- External bandwidth: 100 Kb/s
- Clients are available only when they're transferring (a very conservative assumption)
- Cold misses: objects cannot be found in peers
- Busy misses: objects found but the peer is unavailable due to concurrent transfers
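The hit, cold-miss, and busy-miss outcomes can be sketched for a single snapshot in time. This is a simplified illustration, not the trace-driven simulator from the paper: it treats the limit of 12 as a per-peer cap (an assumption), counts a peer with no transfer in progress as unavailable, and ignores bandwidth. The data structures and function names are hypothetical.

```python
MAX_CONCURRENT = 12  # cap from the slide; applying it per serving peer is an assumption


def classify_request(obj, holders, active_transfers):
    """Classify one request as 'hit', 'cold miss', or 'busy miss'.

    holders: dict mapping object id -> set of internal peers holding a copy
    active_transfers: dict mapping peer -> number of transfers in progress;
        a peer with no transfers counts as unavailable, following the
        "available only when transferring" assumption.
    """
    peers = holders.get(obj, set())
    if not peers:
        return "cold miss"  # no internal peer has the object
    for peer in peers:
        if 0 < active_transfers.get(peer, 0) < MAX_CONCURRENT:
            return "hit"  # an available peer with a free slot can serve it
    return "busy miss"  # copies exist, but no holder can serve right now


def byte_hit_rate(requests, holders, active_transfers):
    """requests: iterable of (object id, size in bytes)."""
    hit_bytes = total_bytes = 0
    for obj, size in requests:
        total_bytes += size
        if classify_request(obj, holders, active_transfers) == "hit":
            hit_bytes += size
    return hit_bytes / total_bytes if total_bytes else 0.0


# Hypothetical snapshot: p1 has spare capacity, p2 is saturated.
holders = {"a": {"p1"}, "b": {"p2"}}
active = {"p1": 3, "p2": 12}
print(classify_request("a", holders, active))
print(classify_request("b", holders, active))
print(classify_request("c", holders, active))
```

Weighting by object size rather than counting requests matters here because large objects dominate the bytes transferred.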
21. Benefits of Locality-Awareness
- Locality awareness obtained a 68% byte hit rate for large objects and a 37% byte hit rate for small objects
- A substantial fraction of miss bytes (62% for large objects, 43% for small objects) is due to unavailable clients
22. Benefits of Increased Availability
- Most of the bytes served and consumed come from highly available peers
- Adding availability to the most available hosts earns a higher hit rate than adding it to the least available hosts
23. Conclusion
- P2P file-sharing workloads are different from Web workloads
- Users are patient
- Aged clients demand less
- Fetch-at-most-once
- The proposed model suggests that client births and object births are the fundamental forces driving P2P workloads
- There's significant locality in the Kazaa workload
- Locality-aware peers would save 63% of external transfers even under conservative assumptions
24. Comments
- Some of the observed characteristics may be related to the design of Kazaa and the measurement methodology, and thus cannot be generalized
- The lack of portal sites in P2P systems may also be a reason that the most popular objects in P2P are less popular than Zipf's law would predict
25. Assessing the Quality of Voice Communications Over Internet Backbones
- A. P. Markopoulou, F. A. Tobagi, M. J. Karam
- IEEE/ACM Trans. on Networking, vol. 11, no. 5, Oct. 2003
- Presented by Lu-chuan Kung
26. Outline
- VoIP System
- Playout schemes
- Voice Impairment in Networks
- Internet measurements
- Numerical results
- Discussion
27. VoIP System
28. VoIP System
- Speech signal
- Talkspurts have a mean of 352 ms
- Silence periods have a mean of 650 ms
- Encoding schemes
- Packetizer: adds headers for different protocols
- Playout buffer: packets are held for a later playout time in order to smooth playout
- Decoder: reconstructs the speech signal
29. Playout Schemes
- Two types: fixed and adaptive
- Fixed playout scheme
- End-to-end delay $p$ is the same for all packets
- A large delay decreases packet loss due to late arrivals, but also decreases interactivity
- Adaptive playout scheme
- Estimate $p$ based on the average delay $d_{av}$ and the delay variation $v$
- $p = d_{av} + 4v$
- Estimate $p$ either:
- Talkspurt by talkspurt
- Packet by packet
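A packet-by-packet version of the adaptive estimate can be sketched as below. The slide gives only p = d_av + 4v; the exponentially weighted averages used here to track d_av and v, and the smoothing weight alpha, are assumptions borrowed from common practice rather than details stated on the slide.

```python
def adaptive_playout_delays(delays_ms, alpha=0.998):
    """Return the playout delay p = d_av + 4*v after each packet.

    delays_ms: per-packet network delays in milliseconds
    alpha: smoothing weight for the running averages (assumed value)
    """
    if not delays_ms:
        return []
    d_av = delays_ms[0]  # running estimate of the average delay
    v = 0.0              # running estimate of the delay variation
    playout = []
    for n in delays_ms:
        d_av = alpha * d_av + (1 - alpha) * n
        v = alpha * v + (1 - alpha) * abs(d_av - n)
        playout.append(d_av + 4 * v)
    return playout


# Hypothetical delay series with one spike at the end.
print(adaptive_playout_delays([50, 50, 50, 50, 50, 200])[-1])
```

A talkspurt-by-talkspurt scheme would keep updating the two estimates on every packet but apply a new p only at the start of each talkspurt, so playout timing is not adjusted in the middle of speech.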
30. Voice Impairment in Networks
- Quality of voice is affected by
- Encoding
- Packet loss
- Network delay jitter
- End-to-end delay
- Echo
- End-to-end delay consists of
- Encoding delay
- Packetization delay
- Network delay
- Playout buffering delay
- Decoding delay
31. Assessment of Voice Communication in Packet Networks
- Mean Opinion Score (MOS): a subjective rating given by listeners on a scale of 1-5
- Intrinsic quality $MOS_{intr}$: quality after compression
32. Degradation Due to Loss
- PLC: Packet Loss Concealment
- Convert loss rate to MOS
33. Loss of Interactivity
- Loss of interactivity due to large end-to-end delay
- NTT study
- 6 conversation modes (tasks); task 1 is the hardest, task 6 is the most relaxed
34. Echo Impairment
- Echo can cause major quality degradation
- The effect of echo is a function of delay and echo loss
35. E-model
- Published by ITU-T; provides formulas to predict the MOS of voice quality
- $R = (R_0 - I_s) - I_d - I_e + A$
- $R_0$: basic SNR
- $I_s$: impairment of the signal, e.g. sidetone and PCM
- $I_d$: impairment due to delay (echo and loss of interactivity)
- $I_e$: impairment due to distortion (loss)
- $A$: advantage factor (lenient users)
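The E-model rating can be turned into a MOS estimate with the R-to-MOS mapping given in ITU-T G.107, which is not shown on the slide. The sketch below combines that mapping with the R formula; the impairment values in the example are made-up inputs for illustration, not measurements from the paper.

```python
def r_factor(r0, i_s, i_d, i_e, a=0.0):
    """E-model rating: R = (R0 - Is) - Id - Ie + A."""
    return (r0 - i_s) - i_d - i_e + a


def r_to_mos(r):
    """Map an R rating to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


# Hypothetical impairment values, for illustration only.
r = r_factor(r0=94.8, i_s=1.4, i_d=10.0, i_e=15.0)
print(round(r, 1), round(r_to_mos(r), 2))
```

The mapping is nonlinear: the same loss of R costs more MOS in the middle of the scale than near either end.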
36. Internet Measurements
- Probe measurement
- 5 major U.S. cities
- 43 paths in total
- 7 providers: P1, P2, ..., P7
- 50-byte probes sent every 10 ms
37. Observations on the Traces
- Duration of the trace: 3 days
- Network loss
- 6 out of 7 providers have outages
- Outages happened at least once per day
- Delay characteristics
- Delay spikes
- Alternation between high and low states
- Periodic clustered delay spikes
38. Delay Characteristics
39. Consistent Characteristics Per Provider
40. One Example Call
- Apply the E-model to the traces using different playout buffer schemes
- Example of a 15-min call
41. One Example Call
- Fixed playout incurs many losses in the last 5 mins
42. How to Choose p for Fixed Scheme
- Tradeoff between loss and delay
- There is an optimal value of delay that achieves the maximum MOS value
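The loss/delay tradeoff can be illustrated by scanning candidate values of p. This sketch does not use the paper's impairment curves: it substitutes a widely cited simplified E-model (R = 94.2 - Id - Ie, with a piecewise-linear delay term and a logarithmic loss term for G.711) whose coefficients should be treated as assumptions, and it treats p as the full mouth-to-ear delay. The delay samples are made up.

```python
import math


def r_to_mos(r):
    """Map an R rating to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


def late_loss_fraction(delays_ms, p):
    """Fraction of packets that arrive after their playout time p."""
    return sum(d > p for d in delays_ms) / len(delays_ms)


def mos_for_playout(delays_ms, p):
    """Estimated MOS for fixed playout delay p (simplified model, assumed coefficients)."""
    loss = late_loss_fraction(delays_ms, p)
    i_d = 0.024 * p + (0.11 * (p - 177.3) if p > 177.3 else 0.0)  # delay impairment
    i_e = 30 * math.log(1 + 15 * loss)                            # loss impairment
    return r_to_mos(94.2 - i_d - i_e)


def best_fixed_playout(delays_ms, candidates):
    """Candidate p with the highest estimated MOS."""
    return max(candidates, key=lambda p: mos_for_playout(delays_ms, p))


# Hypothetical path: most packets take 40 ms, one in ten takes 120 ms.
delays = [40] * 90 + [120] * 10
print(best_fixed_playout(delays, range(20, 401, 20)))
```

Raising p reduces late-packet loss until every packet arrives in time; beyond that point, extra delay only hurts interactivity, so the estimated MOS peaks at an intermediate value.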
43. Example Path: Many Calls
- Random calls uniformly spread over an hour
- 150 short (3.5-min) and 50 long (10-min) calls
- Plot CDF vs. MOS (panels: fixed playout, adaptive playout)
44. Discussion
- Backbone networks have a wide range of performance
- Some are already able to support high-quality voice communications
- Some are barely able to provide acceptable VoIP service (MOS 3.6)
- Reliability problems are a more serious concern than the lack of QoS mechanisms
45. Comments
- How representative are the chosen paths of typical Internet paths?
- End-host-to-end-host paths have larger delay