An Analysis of Internet Content Delivery Systems

Provided by: gmcg
Learn more at: http://web.cs.wpi.edu

1
An Analysis of Internet Content Delivery Systems
  • http://www.cs.washington.edu/research/networking/websys/pubs/osdi_2002/osdi.html
  • Stefan Saroiu, Krishna Gummadi, Richard Dunn,
    Steven Gribble, Henry Levy
  • U. Washington

2
HTTP traffic distribution at U.W.
3
Bandwidth use at U. Wash.
  • Bandwidth use (bidirectional) over time
  • Daily pattern: noon peaks, 4 a.m. nadirs.

4
What is being downloaded?
5
Is content type evenly distributed among delivery
schemes?
6
Content Delivery Systems
  • WWW
  • Content Delivery Networks (CDNs)
  • Peer-to-peer file sharing
  • Note that all 3 use HTTP for file transfer,
    though P2P uses an additional protocol for
    indexing/searching.
  • Study ignores streaming protocols, legacy
    protocols (FTP, email)

7
Traditional Web services
  • Client/Server model
  • Server (or farm) has a single location
  • Every client gets files from same place
    regardless of location
  • Zipf's law: 80/20 rule
  • Use caching to gain efficiency, usually at
    entrance to network

8
Zipf's Law
  • According to NIST (http://www.nist.gov/dads/HTML/zipfslaw.html)
  • Definition: The probability of occurrence of
    words or other items starts high and tapers off.
    Thus, a few occur very often while many others
    occur rarely.
  • Formal definition: $P_n \propto 1/n^a$, where $P_n$
    is the frequency of occurrence of the nth ranked
    item and $a$ is close to 1.
  • See also Zipfian distribution, Lotka's law,
    Benford's law, Bradford's law.
  • Note: In the English language, words like "and,"
    "the," "to," and "of" occur often while words
    like "undeniable" are rare. This law applies to
    words in human or computer languages, operating
    system calls, colors in images, etc., and is the
    basis of many (if not all!) compression
    approaches.
  • Named for George Kingsley Zipf.
  • Summarized: in a large data sample, 80% of the
    accesses refer to 20% of the objects.
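The 80/20 rule is only a rough summary of Zipf's law: how close a workload comes to it depends on the exponent $a$ and on how many objects there are. The sketch below normalizes $P_n \propto 1/n^a$ into probabilities and measures the share of accesses that land on the top 20% of objects. The 1000 objects and $a = 1$ are illustrative assumptions, not values from the study.

```python
def zipf_probabilities(n_items: int, a: float = 1.0) -> list[float]:
    """Normalized Zipf probabilities for ranks 1..n_items: P_n is proportional to 1/n**a."""
    weights = [1.0 / rank**a for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_fraction_share(probs: list[float], fraction: float) -> float:
    """Share of all accesses that go to the most popular `fraction` of items.

    `probs` must already be in rank order (most popular first).
    """
    k = max(1, round(len(probs) * fraction))
    return sum(probs[:k])


# Assumed parameters: 1000 objects, a = 1 (the definition only says a is close to 1).
probs = zipf_probabilities(1000, a=1.0)
share = top_fraction_share(probs, 0.20)
print(f"Top 20% of objects receive {share:.1%} of accesses")
```

With these assumptions the top 20% of objects receive about 78.5% of accesses, close to but not exactly the 80/20 rule of thumb.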

9
Content Delivery Networks
  • Akamai
  • A shadow network to provide content which is as
    (topologically) close to the client as possible.
  • Requests are redirected to nearest server based
    on user location (usually from IP address)
  • Similar to web caching
  • Low latency due to locality
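The redirection step can be sketched as a longest-prefix match from the client's address to an edge server. In practice CDNs commonly make this choice during DNS resolution, often from the resolver's address rather than the client's; this sketch skips DNS entirely. The prefix table and server names are hypothetical (the prefixes are IETF documentation ranges).

```python
import ipaddress

# Hypothetical mapping from client address blocks to the nearest edge server.
EDGE_MAP = {
    ipaddress.ip_network("192.0.2.0/24"): "edge-west.example.net",
    ipaddress.ip_network("198.51.100.0/24"): "edge-east.example.net",
    ipaddress.ip_network("198.51.100.128/25"): "edge-east-2.example.net",
}
DEFAULT_EDGE = "edge-default.example.net"


def pick_edge_server(client_ip: str) -> str:
    """Return the edge server for the most specific prefix containing client_ip."""
    addr = ipaddress.ip_address(client_ip)
    matches = [net for net in EDGE_MAP if addr in net]
    if not matches:
        return DEFAULT_EDGE  # no topological hint: fall back to a default server
    best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
    return EDGE_MAP[best]


print(pick_edge_server("198.51.100.200"))
```

An address inside the /25 is sent to `edge-east-2`, the rest of the /24 to `edge-east`, and unknown addresses to the default server.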

10
Peer-to-Peer Networks
  • Napster, Kazaa, Gnutella, BitTorrent
  • Files distributed evenly across all nodes
  • Replication for high availability
  • To access a file, first must search to find host,
    then use a file transfer protocol to retrieve
    file
  • Often use non-standard TCP ports to evade proxies
    and firewall policies
  • Files sometimes broken into blocks across
    different peers
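The two-step access pattern (search for hosts, then transfer) and block-level retrieval can be sketched in memory. Real systems search by flooding queries (Gnutella) or through supernodes (Kazaa); here the search is a flat scan over a list. The `Peer` class, the addresses, and the round-robin block assignment are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer: a network address plus the names of the files it shares."""
    address: str
    shared: set[str] = field(default_factory=set)


def search(peers: list[Peer], filename: str) -> list[Peer]:
    """Step 1 (search): return every peer that reports holding `filename`."""
    return [peer for peer in peers if filename in peer.shared]


def plan_block_fetches(hosts: list[Peer], n_blocks: int) -> dict[int, str]:
    """Step 2 (transfer): assign block i of the file to a host, round-robin,
    so consecutive blocks are fetched from different peers."""
    if not hosts:
        raise LookupError("no peer holds the requested file")
    return {block: hosts[block % len(hosts)].address for block in range(n_blocks)}


peers = [
    Peer("10.0.0.1", {"song.mp3", "clip.avi"}),
    Peer("10.0.0.2", {"clip.avi"}),
    Peer("10.0.0.3", {"notes.txt"}),
]
hosts = search(peers, "clip.avi")
print(plan_block_fetches(hosts, 4))
```

Here blocks 0 and 2 come from `10.0.0.1` and blocks 1 and 3 from `10.0.0.2`; a file held by no peer fails at the search step.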

11
Method
  • Snoop all traffic at network edge, looking for
    HTTP, regardless of port
  • Categorized by TCP port and server domain
  • This places P2P search traffic (but not data
    xfer) in the misc bin
  • Does not capture local traffic or remote
    server-server traffic
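A flow classifier along the lines of this method might look like the sketch below. It checks for an HTTP payload first, which is why non-HTTP P2P search traffic ends up in the misc bin even on a P2P port. The rule set is an assumption: 1214 (Kazaa/FastTrack) and 6346/6347 (Gnutella) are common default ports, and the Akamai domain suffixes are illustrative; the slide does not list the study's exact rules.

```python
# Hypothetical rule set; the study's exact ports and domains are not given on the slide.
KAZAA_PORTS = {1214}
GNUTELLA_PORTS = {6346, 6347}
WEB_PORTS = {80, 8080}
AKAMAI_SUFFIXES = (".akamai.net", ".akamaitech.net")


def classify_flow(server_port: int, server_domain: str, payload_is_http: bool) -> str:
    """Bin one TCP flow by payload type, server domain, and server port."""
    if not payload_is_http:
        return "misc"  # e.g. P2P search traffic, which is not HTTP
    if server_domain.lower().endswith(AKAMAI_SUFFIXES):
        return "akamai"
    if server_port in KAZAA_PORTS:
        return "kazaa"
    if server_port in GNUTELLA_PORTS:
        return "gnutella"
    if server_port in WEB_PORTS:
        return "www"
    return "misc"  # HTTP on an unrecognized port


print(classify_flow(1214, "peer.example.org", True))
```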

12
Open Questions
  • Is sample data representative of trends or
    internet at large?
  • One site only
  • Vast majority of users are aged 17-21 at a
    university campus
  • 9 sequential days: does time of year change
    patterns?
  • Identifies a trend, but results not precise
  • Some results presented orthogonally
  • Useful to see % of bytes compared with % of
    clients and % of objects, not raw numbers of each.
  • Useful to see # of bytes vs # of clients, not % of
    bytes. (How much does each new user add to bw
    load?)

13
Results
  • 97% of traffic (bps) is TCP
  • 43% of TCP bps is misc
  • 43% of TCP bps is P2P file xfer
  • 14% is WWW
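The slide mixes two denominators (all traffic and TCP traffic). Converting to a single base is simple arithmetic; the sketch below assumes the 14% WWW figure is, like the others, a share of TCP bytes per second.

```python
TCP_SHARE_OF_ALL = 0.97  # "97% of traffic (bps) is TCP"

# Category shares of TCP bps as listed on the slide (WWW assumed to be of TCP too).
SHARES_OF_TCP = {"misc": 0.43, "p2p": 0.43, "www": 0.14}


def share_of_all_traffic(share_of_tcp: float, tcp_share: float = TCP_SHARE_OF_ALL) -> float:
    """Convert a fraction of TCP traffic into a fraction of all traffic."""
    return share_of_tcp * tcp_share


for name, share in SHARES_OF_TCP.items():
    print(f"{name}: {share_of_all_traffic(share):.1%} of all traffic")

p2p_to_www = SHARES_OF_TCP["p2p"] / SHARES_OF_TCP["www"]
print(f"P2P to WWW ratio: {p2p_to_www:.1f}")
```

P2P file transfer comes to about 41.7% of all traffic and WWW to about 13.6%; the P2P to WWW ratio of roughly 3.1 is the "three-fold" figure in the conclusions.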

14
Results
  • Site is a net traffic provider
  • WWW traffic is a 2:1 provider on average, but peak
    traffic is symmetrical
  • Kazaa traffic is a 7.6:1 provider on average
  • Can't tell ratios of locally contained traffic to
    remote
  • 15% of outgoing HTTP bps is WWW, 85% is P2P
  • Assuming outgoing WWW traffic is university
    sponsored and P2P traffic is not, 85% of outgoing
    HTTP is NOT university sponsored.

15
Results
  • Kazaa traffic (incoming):
  • 79% video (AVI and MPG)
  • 13.6% MP3
  • ~7% hashed (probably encrypted premium content)
  • Negligible text and still images
  • WWW and Akamai breakdown is mostly text and images
  • Content mix has changed since 1999
  • Less HTML, GIF, JPG
  • Much more video, MP3

16
Results: object size
  • P2P services providing more large files
  • Heavy tail has more volume

17
Examining where the bandwidth goes
  • Half of Akamai and Kazaa traffic comes from the
    1000 most popular objects; WWW is more evenly
    distributed
  • Gnutella sample size too small to compare
  • WWW and Akamai: small popular files, large
    unpopular files
  • Kazaa: very large files, rarely downloaded

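A "top-k objects" statistic like the one above can be computed from a transfer log. The sketch assumes that "most popular" means the objects that moved the most bytes (ranking by request count would be an equally plausible reading), and the log format of `(object_id, bytes)` pairs is hypothetical.

```python
from collections import Counter


def top_k_byte_share(transfers: list[tuple[str, int]], k: int) -> float:
    """Fraction of all bytes contributed by the k objects that moved the most bytes."""
    bytes_per_object = Counter()
    for object_id, n_bytes in transfers:
        bytes_per_object[object_id] += n_bytes
    total = sum(bytes_per_object.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in bytes_per_object.most_common(k)) / total


# Hypothetical log: object "a" is fetched twice and dominates the byte count.
log = [("a", 100), ("b", 50), ("a", 100), ("c", 50)]
print(top_k_byte_share(log, 1))
```

For this log the single heaviest object accounts for two thirds of all bytes; on a real trace, `k=1000` gives the slide's statistic.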
18
Who is using the bandwidth?
  • A few Kazaa nodes cause a lot of incoming
    traffic. Biggest users cause lots of impact.

19
Who is using the bandwidth?
  • WWW: fewer inbound requests than outbound
  • Outbound WWW data rate still double inbound due
    to object size
  • Kazaa: 2x as many outbound requests as inbound
  • A small rate of Kazaa requests overwhelms a large
    rate of WWW requests

20
Who is using the bandwidth?
  • Kazaa xfers take so long (130 s vs. 120 ms) that
    the number of concurrent flows is double that of WWW
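Little's law, $L = \lambda W$ (long-run average number in the system equals arrival rate times average time in the system), links these figures: if Kazaa has twice the concurrent flows of WWW while each flow lasts about a thousand times longer, Kazaa flows must start far less often. The sketch treats 130 s and 120 ms as mean durations; the slide does not say whether they are means or medians.

```python
def arrival_rate(concurrent_flows: float, mean_duration_s: float) -> float:
    """Little's law, L = lambda * W, solved for the flow arrival rate lambda = L / W."""
    return concurrent_flows / mean_duration_s


# Relative units: WWW concurrency normalized to 1.0; Kazaa has twice as many concurrent flows.
www_rate = arrival_rate(concurrent_flows=1.0, mean_duration_s=0.120)
kazaa_rate = arrival_rate(concurrent_flows=2.0, mean_duration_s=130.0)

print(f"WWW flows start about {www_rate / kazaa_rate:.0f}x as often as Kazaa flows")
```

Under these assumptions WWW flows start roughly 540 times as often as Kazaa flows, which is how a small rate of Kazaa requests can still dominate the byte count.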

21
Where does the network load come from?
  • Most WWW load comes from a small number of
    servers
  • Kazaa traffic more evenly distributed
  • A small number of Kazaa servers consumes
    bandwidth very quickly

22
Where does the network load come from?
  • Kazaa distribution is flatter than WWW (no
    surprise)
  • Akamai has VERY sharp curve, out of only 350
    servers (no surprise)
  • Gnutella has a sharper distribution; its smaller
    user community may skew results
  • Would expect P2P curves to be flatter still
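One way to read "flatter" vs "sharper" is to ask how many of the heaviest servers are needed to cover a given fraction of bytes: a sharp curve needs few, a flat curve needs many. The byte counts below are hypothetical and only illustrate the contrast.

```python
def servers_for_fraction(bytes_per_server: list[int], fraction: float) -> int:
    """Smallest number of servers, heaviest first, that together account for
    at least `fraction` of all bytes served."""
    target = fraction * sum(bytes_per_server)
    running = 0
    for count, served in enumerate(sorted(bytes_per_server, reverse=True), start=1):
        running += served
        if running >= target:
            return count
    return len(bytes_per_server)


# Hypothetical byte counts: one skewed (CDN-like) and one flatter (P2P-like) population.
skewed = [900, 50, 20, 10, 10, 5, 5]
flat = [160, 150, 150, 140, 140, 130, 130]
print(servers_for_fraction(skewed, 0.8), servers_for_fraction(flat, 0.8))
```

One server covers 80% of the skewed population's bytes, while the flatter population needs six of its seven servers.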

23
Where does the network load come from?
  • P2P download error rates dwarf success rates,
    while WWW is mostly successful
  • Byte fractions are still comparable

24
Caching WWW traffic
  • WWW cacheability is still reasonably good: 35%
  • Cache hits for Akamai content are very good: 50%
  • Caching Akamai traffic could reduce need for
    Akamai servers
  • Without knowing more about their simulated
    caching technique I doubt this, since a CDN is
    already a form of cache. Is Akamai traffic
    caching tested as part of ALL HTTP? How do we
    know it stays in cache w/ bigger sample set?
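Since the slide does not say how the simulated cache works, here is one common idealization, offered only as an assumption: infinite capacity, no expiry, every object cacheable, so every repeat request is a hit. It also reports the request hit rate and the byte hit rate separately, because the two can differ sharply when object sizes vary, and the slide's 35% and 50% figures do not say which metric they are.

```python
def ideal_cache_hit_rates(requests: list[tuple[str, int]]) -> tuple[float, float]:
    """Idealized cache: infinite capacity, no expiry, every object cacheable.
    Any repeat request for an object is a hit. Returns (request hit rate, byte hit rate)."""
    seen: set[str] = set()
    hits = hit_bytes = total_bytes = 0
    for object_id, size in requests:
        total_bytes += size
        if object_id in seen:
            hits += 1
            hit_bytes += size
        else:
            seen.add(object_id)  # first request for an object is always a miss
    request_rate = hits / len(requests) if requests else 0.0
    byte_rate = hit_bytes / total_bytes if total_bytes else 0.0
    return request_rate, byte_rate


# Hypothetical trace: a small object requested three times, a large one once.
trace = [("logo.gif", 10), ("video.mpg", 90), ("logo.gif", 10), ("logo.gif", 10)]
print(ideal_cache_hit_rates(trace))
```

Half the requests hit, but only about 17% of the bytes do, because the one large object is never re-requested.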

25
Caching P2P traffic
  • Idealized outbound Kazaa cache warms after 6-7
    days, levels at 85% hit rate
  • Greater than idealized WWW, comparable to
    idealized Akamai
  • Unknown if there is a difference between ideal and
    practical
  • Inbound cache warms more slowly, only at 35%
    after 9 days, still growing. (Can't be
    extrapolated from their data set)
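"Warming" means the daily hit rate climbs as the cache accumulates objects that later requests reuse. The sketch below replays a trace day by day through an idealized cache that starts empty and is never flushed; the trace is hypothetical, and the infinite, never-expiring cache is an assumption, not the study's documented setup.

```python
def daily_hit_rates(requests_by_day: list[list[str]]) -> list[float]:
    """Request hit rate per day for an idealized (infinite, never-expiring) cache
    that starts empty on the first day and is never flushed."""
    seen: set[str] = set()
    rates = []
    for day in requests_by_day:
        hits = 0
        for object_id in day:
            if object_id in seen:
                hits += 1
            else:
                seen.add(object_id)
        rates.append(hits / len(day) if day else 0.0)
    return rates


# Hypothetical three-day trace of object IDs.
days = [["a", "b"], ["a", "c"], ["a", "b", "c", "d"]]
print(daily_hit_rates(days))  # [0.0, 0.5, 0.75]
```

The curve only levels off once few never-seen objects are being requested, so a trace that ends while it is still rising (as with the inbound cache here) cannot show where it would plateau.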

26
Caching P2P traffic
  • Effectiveness of caching P2P grows with remote
    client population (many clients fetching same
    files over and over)

27
Misc questions
  • Too much undifferentiated traffic
  • How much is Kazaa/Gnutella search traffic?
  • How much WWW is Napster or other P2P?
  • Akamai is the only CDN extracted
  • Some data is hard to compare apples to apples
  • Some stats are % of all traffic, some are % of
    TCP, some only include HTTP.
  • Why is P2P so asymmetric?
  • Nodes on LAN more likely to serve files than
    dial-up nodes?

28
Conclusions
  • P2P traffic has grown tremendously over past few
    years, exceeding traditional WWW three-fold
  • Cause is huge file size
  • Kazaa file distribution is very heavy-tailed as
    is bandwidth
  • A cache for outgoing HTTP traffic should greatly
    help save network bandwidth.
  • A small number of P2P nodes adds tremendously to
    traffic load.
  • The top downloaders chew up a large chunk of
    incoming bandwidth due to large files accessed
  • P2P distribution of serving load is not very fair
  • P2P does not appear to scale well within the I/O
    capacity of a campus environment: a P2P client
    needs roughly 90x the bandwidth of a WWW client.
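The claim that P2P serving load "is not very fair" can be put on a numeric scale. The sketch uses Jain's fairness index, a standard fairness measure that is swapped in here for illustration (the slides do not name a metric); the per-node loads are hypothetical.

```python
def jain_fairness(loads: list[float]) -> float:
    """Jain's fairness index, (sum x)^2 / (n * sum x^2).
    1.0 means every node serves the same amount; 1/n means one node serves everything."""
    sum_sq = sum(x * x for x in loads)
    if not loads or sum_sq == 0:
        raise ValueError("need at least one node with non-zero load")
    return sum(loads) ** 2 / (len(loads) * sum_sq)


print(jain_fairness([25, 25, 25, 25]))  # 1.0
print(jain_fairness([100, 0, 0, 0]))    # 0.25
```

Applied to per-peer bytes served, a value near 1/n would mean a handful of peers carry almost all of the serving load.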