Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload


1
Measurement, Modeling, and Analysis of a
Peer-to-Peer File-Sharing Workload
  • K.P. Gummadi, R. J. Dunn, et al.
  • SOSP '03
  • Presented by Lu-chuan Kung kung_at_uiuc.edu

2
Outline
  • Trace methodology and analysis
  • User characteristics
  • Client activities
  • Object dynamics
  • Why the Kazaa workload is not Zipf
  • A model of P2P file-sharing workloads
  • A study of bandwidth-saving techniques
  • Conclusion

3
Trace Methodology
  • Passively collect Kazaa traffic at the border
    between the campus network and the Internet
  • Query traffic was not captured because it is
    encrypted. File transfers are HTTP transfers
    with Kazaa-specific headers
  • Summary statistics of the trace

4
Kazaa Users Are Patient
  • Transfer time: the difference between the start
    time and the end time of a request
  • Small objects (<10 MB)
  • Large objects (>100 MB, typically video files)

5
Users Slow Down as They Age
  • Do people become hungrier for content as they
    gain experience with Kazaa?
  • Older clients requested fewer bytes because of:
  • Attrition: the population declines as clients age
  • Slowing down: older clients ask for less

6
Client Activity
  • It's difficult to quantify the availability of
    clients in a P2P system
  • Client activity includes:
  • Activity fraction: time spent in transfers /
    duration of lifetime. A lower bound on availability
  • Average session length: typical session duration
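
The two activity metrics above can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: it assumes each transfer is a (start, end) pair in seconds, and it treats a run of overlapping transfers as one "session", which is an assumption of this sketch. The function names are hypothetical.

```python
def merge_sessions(transfers):
    """Merge overlapping (start, end) transfers into disjoint sessions.

    Assumption of this sketch: a session is a maximal run of
    overlapping transfers.
    """
    sessions = []
    for start, end in sorted(transfers):
        if sessions and start <= sessions[-1][1]:
            # Overlaps the current session: extend it.
            sessions[-1] = (sessions[-1][0], max(sessions[-1][1], end))
        else:
            sessions.append((start, end))
    return sessions


def activity_fraction(transfers, first_seen, last_seen):
    """Time spent in transfers divided by the client's lifetime."""
    lifetime = last_seen - first_seen
    if lifetime <= 0:
        return 0.0
    busy = sum(end - start for start, end in merge_sessions(transfers))
    return busy / lifetime


def average_session_length(transfers):
    """Mean duration of the merged sessions (0.0 if there are none)."""
    sessions = merge_sessions(transfers)
    if not sessions:
        return 0.0
    return sum(end - start for start, end in sessions) / len(sessions)
```

Because a client may be online without transferring anything, the activity fraction computed this way can only underestimate availability, which is why the slide calls it a lower bound.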

7
Object Characteristics
  • Kazaa is not one workload
  • Kazaa is a blend of workloads of different
    properties
  • 3 ranges of objects: small (<10 MB), medium
    (10 MB to 100 MB), and large (>100 MB)
  • Majority of requests are for smaller objects
  • Most bytes transferred are due to large objects

8
Kazaa Object Dynamics
  • Multimedia objects are immutable, which affects
    object dynamics
  • Kazaa clients fetch objects at most once
  • A Kazaa client requests an object at most once
    94% of the time
  • A Kazaa client requests an object at most twice
    99% of the time
  • Most requests are for old (repeated) objects
  • An object is old if at least one month has passed
    since the first request for the object
  • 72% of requests for large objects are for old objects
  • 52% of requests for small objects are for old objects

9
Kazaa Object Dynamics
  • The popularity of Kazaa objects is often
    short-lived
  • On the Web, the most popular pages remain stable
  • Popularity is fleeting in Kazaa
  • Audio files lose popularity faster than popular
    video files
  • The most popular Kazaa objects tend to be
    recently born objects
  • An object counts as newly born if it received no
    requests during the first month of the trace

10
Kazaa Is Not Zipf
  • Zipf's law
  • The popularity of the i-th most popular object is
    proportional to $i^{-\alpha}$, where $\alpha$ is
    the Zipf coefficient
  • Kazaa is not Zipf
  • Most popular objects are less popular than Zipf
    would predict

11
Why Kazaa Is Not Zipf
  • Fetch-repeatedly vs. fetch-at-most-once
  • Simulate the two cases based on the same Zipf
    distribution
  • The result of fetch-at-most-once is similar to
    Kazaa.
  • Non-Zipf workloads are also observed in web proxy
    caches and VoD servers
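
The comparison on this slide can be reproduced in miniature. The sketch below is an illustration under assumptions, not the presenters' simulator: every client draws from the same Zipf(1) distribution, and in the fetch-at-most-once case a client simply redraws when it picks an object it already has. The function names and parameter values are hypothetical.

```python
import random
from collections import Counter
from itertools import accumulate


def zipf_cum_weights(n_objects, alpha=1.0):
    """Cumulative Zipf weights: the object at rank i has weight i**-alpha."""
    return list(accumulate(rank ** -alpha for rank in range(1, n_objects + 1)))


def simulate(n_clients, requests_per_client, n_objects, at_most_once, seed=0):
    """Return a Counter of request counts per object (index 0 = rank 1).

    Requires requests_per_client <= n_objects when at_most_once is True,
    otherwise a client would run out of objects to fetch.
    """
    rng = random.Random(seed)
    cum = zipf_cum_weights(n_objects)
    objects = range(n_objects)
    counts = Counter()
    for _ in range(n_clients):
        fetched = set()
        made = 0
        while made < requests_per_client:
            obj = rng.choices(objects, cum_weights=cum)[0]
            if at_most_once and obj in fetched:
                continue  # assumption: redraw instead of re-fetching
            fetched.add(obj)
            counts[obj] += 1
            made += 1
    return counts
```

With fetch-at-most-once, no object can collect more requests than there are clients, so the head of the popularity curve is flattened relative to the underlying Zipf line. Sorting `counts.values()` in descending order and plotting on log-log axes shows the difference between the two cases.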

12
A Model of P2P File-Sharing Workloads
  • Hypothesis: the underlying popularity of objects
    in a fetch-at-most-once system is driven by
    Zipf's law
  • A client requests 2 objects per day and chooses
    which object to fetch from Zipf(1)
  • Objects are born at rate $\lambda_O$; each new
    object's popularity rank is selected from Zipf(1)
  • The total object population cannot be observed
    from the trace. Use back-inference: given that
    18,000 distinct objects are requested in the
    trace, what is the total number of objects?
    Answer: 40,000

13
Model Structure and Notation
  • Parameter values are chosen to reflect the
    measured data from the trace

14
File-Sharing Effectiveness
  • How should an organization expect bandwidth demand
    to change over time, given a shared proxy server?
  • Hit rate of the proxy cache decreases in the
    fetch-at-most-once case
  • Fetch-at-most-once clients consume the most
    popular objects early
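
The declining hit rate can be illustrated with a small simulation. This is a sketch under stated assumptions: the slide does not give a cache size or replacement policy, so the finite LRU cache here is an assumption, as are all parameter values except the 2 requests per client per day. The function name is hypothetical.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def windowed_hit_rates(n_clients, n_objects, cache_size, days, window,
                       at_most_once, requests_per_day=2, seed=0):
    """Hit rate of a shared LRU proxy cache, reported per `window` days.

    Assumptions of this sketch: Zipf(1) popularity, a finite LRU cache,
    and redraw-on-repeat for fetch-at-most-once clients. Requires
    days * requests_per_day <= n_objects when at_most_once is True.
    """
    rng = random.Random(seed)
    cum = list(accumulate(1.0 / rank for rank in range(1, n_objects + 1)))
    objects = range(n_objects)
    fetched = [set() for _ in range(n_clients)]
    cache = OrderedDict()  # keys ordered from least to most recently used
    rates, hits, total = [], 0, 0
    for day in range(1, days + 1):
        for client in range(n_clients):
            for _ in range(requests_per_day):
                obj = rng.choices(objects, cum_weights=cum)[0]
                while at_most_once and obj in fetched[client]:
                    obj = rng.choices(objects, cum_weights=cum)[0]
                fetched[client].add(obj)
                total += 1
                if obj in cache:
                    hits += 1
                    cache.move_to_end(obj)
                else:
                    cache[obj] = True
                    if len(cache) > cache_size:
                        cache.popitem(last=False)  # evict the LRU entry
        if day % window == 0:
            rates.append(hits / total)
            hits = total = 0
    return rates
```

Comparing `at_most_once=True` against `at_most_once=False` with otherwise identical arguments separates the effect of client behavior from the effect of the cache itself: in the repeated-fetch case the request stream keeps its Zipf shape, while fetch-at-most-once clients drift toward the unpopular tail as they age.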

15
New Object Arrivals Improve Hit Rate
  • Object updates in Web lower the hit rate
  • New object arrivals are beneficial in a P2P system
  • Arrivals of popular objects increase hit rate
  • If no arrivals, clients are forced to choose from
    the remaining unpopular objects

16
New Clients Cannot Stabilize Performance
  • The infusion of new clients at a constant rate
    cannot compensate for the increasing number of
    old clients
  • To keep the hit rate constant, the client arrival
    rate would have to grow exponentially

17
Model Validation
  • Underlying Zipf assumption cannot be validated
    directly.
  • Use the proposed model to replicate the object
    popularity distribution in the trace
  • Estimate various parameters
  • Arrival rate of new objects is chosen to fit the
    measured data: $\lambda_O$ = 5,475 objects per year

18
Exploring Locality-aware Request Routing
  • A significant fraction of Internet bandwidth is
    consumed by Kazaa
  • How would exploitation of locality help to save
    bandwidth?
  • Different ways to exploit locality
  • A centralized proxy cache placed at the
    organization's border
  • Request redirection: favor organization-internal
    peers
  • Centralized request redirection
  • Decentralized request redirection

19
An Ideal Proxy Cache
  • Assume an ideal proxy: infinite capacity and
    bandwidth
  • 86% of external bandwidth would be saved
  • However, some may not want to store P2P
    file-sharing content in a proxy server due to
    legal issues

20
Benefits of Locality-Awareness
  • Trace-based simulation
  • Infinite storage capacity
  • At most 12 concurrent downloads
  • Upload bandwidth: 500 Kb/s
  • External bandwidth: 100 Kb/s
  • Clients are available only when they're
    transferring (a very conservative assumption)
  • Cold misses: the object cannot be found on any peer
  • Busy misses: the object is found, but the peer is
    unavailable due to concurrent transfers
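
The miss categories can be made concrete with a small classifier. This is an illustrative sketch, not the simulator from the paper: the `Peer` type, its fields, and the "unavailable_miss" label (for holders that are offline, as reported on the next slide) are assumptions of this sketch. Only the limit of 12 concurrent downloads comes from the slide.

```python
from dataclasses import dataclass, field

MAX_CONCURRENT = 12  # "at most 12 concurrent downloads" from the slide


@dataclass
class Peer:
    """Hypothetical organization-internal peer used by this sketch."""
    objects: set = field(default_factory=set)
    online: bool = True
    active_uploads: int = 0


def classify_request(obj, internal_peers, max_concurrent=MAX_CONCURRENT):
    """Return 'hit', 'cold_miss', 'busy_miss', or 'unavailable_miss'."""
    holders = [p for p in internal_peers if obj in p.objects]
    if not holders:
        return "cold_miss"  # no internal peer stores the object
    online = [p for p in holders if p.online]
    if not online:
        return "unavailable_miss"  # every holder is offline
    if any(p.active_uploads < max_concurrent for p in online):
        return "hit"  # some online holder has a free slot
    return "busy_miss"  # holders are online but saturated
```

Summing object sizes per label over a trace, rather than counting requests, would give byte hit and miss rates.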

21
Benefits of Locality-Awareness
  • Locality awareness achieved a 68% byte hit rate
    for large objects and a 37% byte hit rate for
    small objects
  • A substantial share of miss bytes (62% for large
    objects, 43% for small objects) is due to
    unavailable clients

22
Benefits of Increased Availability
  • Most bytes served and consumed come from
    highly available peers
  • Adding availability to the most available hosts
    earns a higher hit rate than adding it to the
    least available hosts

23
Conclusion
  • P2P file-sharing workloads are different from Web
    workloads
  • Users are patient
  • Aged clients demand less
  • Fetch-at-most-once
  • The proposed model suggests that client births
    and object births are the fundamental forces
    driving P2P workloads
  • There's significant locality in the Kazaa
    workload
  • Locality-aware peers would save 63% of external
    transfers even under conservative assumptions

24
Comments
  • Some of the observed characteristics may be
    related to the design of Kazaa and the measurement
    methodology, and thus cannot be generalized
  • The lack of portal sites in P2P systems may also
    be a reason that the most popular objects in P2P
    are less popular than Zipf's law would predict

25
Assessing the Quality of Voice Communications
Over Internet Backbones
  • A.P. Markopoulou, F.A. Tobagi, M.J. Karam
  • IEEE/ACM Trans. on Networking, vol. 11, no. 5, Oct. 2003
  • Presented by Lu-chuan Kung

26
Outline
  • VoIP System
  • Playout schemes
  • Voice Impairment in Networks
  • Internet measurements
  • Numerical results
  • Discussion

27
VoIP System

28
VoIP System
  • Speech signal
  • Talkspurts have a mean duration of 352 ms
  • Silence periods have a mean duration of 650 ms
  • Encoding schemes
  • Packetizer: adds headers for different protocols
  • Playout buffer: packets are held for a later
    playout time in order to smooth playout
  • Decoder: reconstructs the speech signal

29
Playout Schemes
  • Two types: fixed and adaptive
  • Fixed playout scheme
  • End-to-end delay $p$ is the same for all packets
  • A large delay decreases packet loss due to late
    arrivals, but also decreases interactivity
  • Adaptive playout scheme
  • Estimate $p$ based on the average delay $d_{av}$
    and the delay variation $v$
  • $p = d_{av} + 4v$
  • Estimate $p$
  • Talkspurt by talkspurt
  • Packet by packet
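
Both schemes can be sketched briefly. The slide gives only the rule p = d_av + 4v; how d_av and v are tracked is not stated, so the exponentially weighted moving averages and the smoothing factor `alpha` below are assumptions of this sketch, as are the function names.

```python
def late_loss_rate(delays_ms, playout_delay_ms):
    """Fixed scheme: fraction of packets arriving after the playout delay."""
    if not delays_ms:
        return 0.0
    late = sum(1 for d in delays_ms if d > playout_delay_ms)
    return late / len(delays_ms)


def adaptive_playout_delays(delays_ms, alpha=0.998):
    """Adaptive scheme: per-packet estimates of p = d_av + 4 * v.

    Assumption of this sketch: d_av and v are exponentially weighted
    moving averages with smoothing factor alpha.
    """
    if not delays_ms:
        return []
    d_av = float(delays_ms[0])
    v = 0.0
    estimates = []
    for delay in delays_ms:
        d_av = alpha * d_av + (1 - alpha) * delay
        v = alpha * v + (1 - alpha) * abs(d_av - delay)
        estimates.append(d_av + 4 * v)
    return estimates
```

A talkspurt-by-talkspurt variant would keep updating the estimates on every packet but apply a new value of p only at the first packet of each talkspurt, so that playout timing changes fall in silence periods.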

30
Voice Impairment in Networks
  • Quality of voice is affected by
  • Encoding
  • Packet loss
  • Network delay jitter
  • End-to-end delay
  • Echo
  • End-to-end delay consists of
  • Encoding delay
  • Packetization delay
  • Network delay
  • Playout buffering delay
  • Decoding delay

31
Assessment of Voice Communication in Packet
Networks
  • Mean Opinion Score (MOS): a subjective rating
    given by listeners on a scale of 1 to 5
  • Intrinsic quality (MOSintr): quality after
    compression

32
Degradation Due to Loss
  • PLC: Packet Loss Concealment
  • Convert loss rate to MOS

33
Loss of Interactivity
  • Loss of interactivity due to large end-to-end
    delay
  • NTT study
  • 6 conversation modes (tasks), task 1 is the
    hardest, task 6 is the most relaxed type

34
Echo Impairment
  • Echo can cause major quality degradation
  • The effect of echo is a function of delay and
    echo losses

35
E-model
  • Published by ITU-T. Provides formulas to predict
    the MOS of voice quality
  • $R = (R_0 - I_s) - I_d - I_e + A$
  • $R_0$: basic SNR
  • $I_s$: impairment of the signal, e.g. sidetone and PCM
  • $I_d$: impairment due to delay (echo,
    interactivity)
  • $I_e$: impairment due to distortion (loss)
  • $A$: advantage factor (lenient users)
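
As a worked illustration, the rating R can be computed from its components and mapped to an estimated MOS. The R-to-MOS mapping below is the standard one from ITU-T Recommendation G.107; the function names and any input values are hypothetical, and computing the individual impairments (I_s, I_d, I_e) is outside this sketch.

```python
def r_factor(r0, i_s, i_d, i_e, a=0.0):
    """E-model rating: R = (R0 - Is) - Id - Ie + A."""
    return (r0 - i_s) - i_d - i_e + a


def r_to_mos(r):
    """Map an R rating to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)
```

The mapping is monotonically increasing between R = 0 and R = 100, so any impairment that lowers R also lowers the predicted MOS.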

36
Internet Measurements
  • Probe measurement
  • 5 major U.S. cities
  • 43 paths in total
  • 7 providers: P1, P2, ..., P7
  • 50-byte probes sent every 10 ms

37
Observations on the Traces
  • Duration of the trace: 3 days
  • Network loss
  • 6 out of 7 providers have outages
  • Outages happened at least once per day
  • Delay characteristics
  • Delay spikes
  • Alternation between high and low states
  • Periodic clustered delay spikes

38
Delay Characteristics

39
Consistent Characteristics Per Provider

40
One Example Call
  • Apply the E-model to the traces using different
    playout buffer schemes
  • Example of a 15-minute call

41
One Example Call
  • Fixed playout incurs many losses in the last 5
    minutes

42
How to Choose p for Fixed Scheme
  • Tradeoff between loss and delay
  • There is an optimal value of delay that achieves
    the maximum MOS

43
Example Path: Many Calls
  • Random calls uniformly spread over an hour
  • 150 short (3.5-min) and 50 long (10-min) calls
  • Plot the CDF of MOS across calls (figure panels:
    fixed playout, adaptive playout)

44
Discussion
  • Backbone networks have a wide range of
    performance
  • Some are already able to support high quality
    voice communications
  • Some are barely able to provide acceptable VoIP
    service (MOS 3.6)
  • Reliability problems are more serious than the
    problems that QoS mechanisms address

45
Comments
  • How representative are the chosen paths of
    typical paths on the Internet?
  • End host to end host paths have larger delay