Web Server Workload Characterization: The Search for Invariants

1
Web Server Workload Characterization: The Search
for Invariants
  • John Oleszkiewicz
  • CSE 807
  • January 14th, 2003

2
Problem
  • WWW is the biggest contributor to packet and byte
    traffic on the Internet (1996)
  • WWW is growing at an astonishing rate
      • 1992: 74 MB of traffic
      • 1996: 3.2 TB of traffic
  • We want to reduce traffic and response latency
  • We need a workload characterization

3
Data
  • Websites
      • University of Waterloo (CS department)
      • University of Calgary (CS department)
      • University of Saskatchewan (university-wide website)
      • NASA's Kennedy Space Center
      • ClarkNet (ISP)
      • National Center for Supercomputing Applications

4
Data
  • Ranges
      • Traffic: 776 to 355,787 requests per day
      • Time: 1 week to 1 year

5
Data
  • File access log
      • Name of client
      • Time of request
      • Name of document
      • Server's response
      • Number of bytes transferred
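
As a rough illustration of working with these fields, the sketch below parses one log line. It assumes the logs use the NCSA Common Log Format, which the slides do not state; the sample line, the host name, and the names LOG_PATTERN, LogEntry, and parse_line are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout (Common Log Format, not confirmed by the slides):
# host ident authuser [timestamp] "METHOD path PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


class LogEntry(NamedTuple):
    client: str    # name of client
    time: str      # time of request (kept as raw text here)
    document: str  # name of document
    status: int    # server's response code
    size: int      # number of bytes transferred


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not match the assumed format."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    parts = match.group("request").split()
    # A request looks like "GET /index.html HTTP/1.0"; the document is the second token.
    document = parts[1] if len(parts) >= 2 else ""
    raw_size = match.group("size")
    size = 0 if raw_size == "-" else int(raw_size)  # "-" means no bytes were sent
    return LogEntry(match.group("client"), match.group("time"),
                    document, int(match.group("status")), size)


# Made-up sample line, for illustration only.
sample = ('client.example.org - - [01/Jul/1995:00:00:01 -0400] '
          '"GET /index.html HTTP/1.0" 200 6245')
entry = parse_line(sample)
print(entry)
```

Lines that do not fit the assumed format return None rather than raising, so a real log with damaged entries can still be scanned end to end.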

6
Invariants
  • What observations apply across all data sets
    studied?

7
Invariant 1
  • Success rate for server file lookups is 88%
  • Success is one of four possible outcomes
      • Successful
      • Not Modified
      • Found
      • Unsuccessful
  • Not Modified came in second at 8%
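
A minimal sketch of how the four outcomes could be tallied from response codes. The mapping from HTTP status codes to categories (200, 304, 302, everything else) is an assumption for illustration, and the status list is toy data shaped to echo the figures above, not data from the study.

```python
from collections import Counter


def classify(status: int) -> str:
    """Map an HTTP status code to one of the four outcomes (assumed mapping)."""
    if status == 200:
        return "Successful"
    if status == 304:
        return "Not Modified"
    if status == 302:
        return "Found"
    # Simplification: every other code is lumped into "Unsuccessful".
    return "Unsuccessful"


def outcome_shares(statuses):
    """Return the fraction of requests falling into each outcome."""
    counts = Counter(classify(s) for s in statuses)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


# Toy data only: 100 made-up responses.
statuses = [200] * 88 + [304] * 8 + [302] * 2 + [404] * 2
shares = outcome_shares(statuses)
for outcome, share in shares.items():
    print(f"{outcome}: {share:.0%}")
```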

8
Invariant 2
  • HTML and image files account for 90-100% of
    requests
  • Documents were classed by file extension
  • Other file types include sound and video
  • Result is consistent with other studies
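
Classing documents by file extension might look like the sketch below. The extension table is a hypothetical one chosen for illustration (the slides do not list the extensions used), and treating a path ending in "/" as HTML is also an assumption.

```python
import os

# Hypothetical extension table; the study's actual table is not given in the slides.
EXTENSION_CLASSES = {
    ".html": "HTML", ".htm": "HTML",
    ".gif": "Images", ".jpg": "Images", ".jpeg": "Images", ".xbm": "Images",
    ".au": "Sound", ".wav": "Sound",
    ".mpg": "Video", ".mpeg": "Video", ".mov": "Video",
}


def document_class(path: str) -> str:
    """Return a coarse document class based only on the file extension."""
    path = path.split("?", 1)[0]  # ignore any query string
    if path.endswith("/"):
        # Assumption: a directory request is answered with an HTML index page.
        return "HTML"
    extension = os.path.splitext(path)[1].lower()
    return EXTENSION_CLASSES.get(extension, "Other")


for name in ["/", "/docs/intro.html", "/img/logo.GIF", "/clips/launch.mpg", "/paper.ps"]:
    print(name, "->", document_class(name))
```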

9
Invariant 3
  • Mean transfer size is less than 21 KB
  • Most files are small

10
Invariant 4
  • Among all server requests, fewer than 3% of the
    requests are for distinct files
  • Range: 0.3% to 2.1%

11
Invariant 5
  • Approximately 1/3 of the files accessed are
    accessed only once in the log
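
Invariants 4 and 5 are both simple ratios over the list of requested documents. The sketch below shows one way to compute them; the function name and the ten-request toy trace are made up, and the toy numbers are far from the real figures because the trace is tiny.

```python
from collections import Counter


def distinct_and_one_time(documents):
    """Return (distinct files / total requests, files seen once / distinct files)."""
    counts = Counter(documents)
    if not counts:
        raise ValueError("empty request list")
    total_requests = sum(counts.values())
    distinct_files = len(counts)
    seen_once = sum(1 for n in counts.values() if n == 1)
    return distinct_files / total_requests, seen_once / distinct_files


# Toy trace: 10 requests over 3 files, one of which is requested only once.
requests = ["/a"] * 6 + ["/b"] * 3 + ["/c"]
distinct_share, one_time_share = distinct_and_one_time(requests)
print(f"distinct files per request: {distinct_share:.1%}")
print(f"files accessed only once:   {one_time_share:.1%}")
```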

12
Invariant 6
  • File sizes follow a Pareto distribution
  • Most files are between 100 and 100,000 bytes
  • 10% of files are larger than 100,000 bytes
  • Consistent with other studies and literature
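
For reference, the standard form of the Pareto distribution is given below. The symbols k (minimum file size) and α (shape parameter) are generic notation, not values taken from the slides.

```latex
F(x) = P[X \le x] = 1 - \left(\frac{k}{x}\right)^{\alpha},
\qquad x \ge k,\ \alpha > 0,\ k > 0
```

The tail probability is then P[X > x] = (k/x)^α, which decays polynomially rather than exponentially. That slow decay is why a small share of very large files can coexist with a bulk of small ones; for α ≤ 2 the variance is infinite, and for α ≤ 1 the mean is infinite as well.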

13
Invariant 7
  • 10% of files accessed account for
      • 90% of server requests
      • 90% of bytes transferred
  • Example: sound and video account for
      • 0.1% to 1.2% of requests
      • 0.2% to 30.8% of data
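
The concentration in Invariant 7 can be measured by ranking files by reference count and summing the top 10%. The following is a sketch under that reading; top_share is a made-up helper and the trace is synthetic.

```python
import math
from collections import Counter


def top_share(documents, fraction=0.10):
    """Share of all requests that go to the most-requested `fraction` of files."""
    counts = sorted(Counter(documents).values(), reverse=True)
    if not counts:
        raise ValueError("empty request list")
    top_n = max(1, math.ceil(len(counts) * fraction))  # always include at least one file
    return sum(counts[:top_n]) / sum(counts)


# Synthetic trace: 10 distinct files, one of which receives 91 of 100 requests.
requests = ["/hot"] * 91 + [f"/cold{i}" for i in range(9)]
share = top_share(requests)
print(f"top 10% of files receive {share:.0%} of requests")
```

The same ranking works for bytes transferred if per-file byte totals are summed in place of reference counts, and for the domain concentration in Invariant 10 if client domains are passed in place of document names.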

14
Invariant 8
  • File inter-reference times are exponentially
    distributed and independent
  • Inter-reference time: the time between successive
    accesses to the same file
  • Mean depends on workload
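
A sketch of how inter-reference times could be extracted from a time-ordered trace. The coefficient of variation (standard deviation divided by mean) equals 1 for an exponential distribution, so it gives a rough sanity check only, not a formal goodness-of-fit test. The function names and the five-event trace are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def inter_reference_times(events):
    """events: (timestamp_seconds, document) pairs, sorted by timestamp.

    Returns {document: [gap, gap, ...]} for documents referenced at least twice.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for timestamp, document in events:
        if document in last_seen:
            gaps[document].append(timestamp - last_seen[document])
        last_seen[document] = timestamp
    return dict(gaps)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (1.0 for an exponential)."""
    return pstdev(values) / mean(values)


# Made-up trace with timestamps in seconds.
events = [(0, "/a"), (5, "/b"), (10, "/a"), (40, "/a"), (45, "/b")]
gaps = inter_reference_times(events)
print(gaps)
print(coefficient_of_variation(gaps["/a"]))
```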

15
Invariant 9
  • Remote sites account for
      • More than 70% of accesses
      • More than 60% of bytes transferred

16
Invariant 10
  • Web servers are accessed by thousands of domains
  • 10% of domains account for more than 75% of usage

17
Other findings
  • No temporal locality
  • Self-similarity on some workloads

18
Recommendations
  • Server-side caching
      • Large number of references to a small number of
        files
      • Large number of bytes transferred from a small
        number of files

19
Caching Strategies
  • Cache large number of small files
      • Allows more requests to be handled
      • 76% of all file references are less than 10 KB in
        size
      • Reduces CPU load
  • Cache small number of large files
      • Reduces bytes transferred by server
      • Only 26% of byte traffic is generated by files
        smaller than 10 KB
      • Reduces disk load
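
The trade-off between the two strategies comes down to two ratios: the share of requests and the share of bytes attributable to files below a size threshold. A minimal sketch, assuming 10 KB means 10 × 1024 bytes and using made-up transfer sizes:

```python
def shares_below(transfer_sizes, threshold_bytes=10 * 1024):
    """Return (share of requests, share of bytes) for transfers under the threshold."""
    if not transfer_sizes:
        raise ValueError("empty transfer list")
    small = [size for size in transfer_sizes if size < threshold_bytes]
    return len(small) / len(transfer_sizes), sum(small) / sum(transfer_sizes)


# Made-up sizes in bytes: three small transfers and one large one.
transfers = [2_000, 2_000, 2_000, 94_000]
request_share, byte_share = shares_below(transfers)
print(f"requests under 10 KB: {request_share:.0%}")
print(f"bytes under 10 KB:    {byte_share:.0%}")
```

A high request share with a low byte share, as in the figures above, is what makes the two strategies pull in different directions.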

20
Other Issues
  • One-time file referencing
      • Means one-third of server cache can be cluttered
        with useless files
  • No temporal locality
      • LFU cache replacement policy is attractive
      • LRU cache replacement policy is not
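
To illustrate why one-time references hurt LRU more than LFU, the sketch below replays a synthetic trace through two toy caches. The trace, the capacity of 3, and both simulators are assumptions made for this example; the LFU variant keeps reference counts across evictions and breaks ties by evicting the oldest-inserted entry. Real workloads will behave less starkly.

```python
from collections import Counter, OrderedDict


def simulate_lru(trace, capacity):
    """Hit rate of a least-recently-used cache holding `capacity` documents."""
    cache = OrderedDict()
    hits = 0
    for document in trace:
        if document in cache:
            hits += 1
            cache.move_to_end(document)    # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[document] = True
    return hits / len(trace)


def simulate_lfu(trace, capacity):
    """Hit rate of a least-frequently-used cache (counts persist after eviction)."""
    cache = {}                             # dict keeps insertion order for tie-breaking
    frequency = Counter()
    hits = 0
    for document in trace:
        frequency[document] += 1
        if document in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            # min() returns the first minimal item, i.e. the oldest among ties.
            victim = min(cache, key=lambda d: frequency[d])
            del cache[victim]
        cache[document] = True
    return hits / len(trace)


# Synthetic trace: two popular files interleaved with three one-time files per round.
trace = []
for i in range(20):
    trace += ["/hot1", "/hot2", f"/once{i}a", f"/once{i}b", f"/once{i}c"]

lru_rate = simulate_lru(trace, capacity=3)
lfu_rate = simulate_lfu(trace, capacity=3)
print(f"LRU hit rate: {lru_rate:.2f}")
print(f"LFU hit rate: {lfu_rate:.2f}")
```

On this trace the one-time files push both popular files out of the LRU cache before they are requested again, while LFU retains them once their counts exceed 1.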

21
Conclusion
  • Server-side caching can yield significant
    performance increases
  • Which caching scheme is used depends on system
    bottlenecks (disk, CPU)

22
THE END
  • Thanks for listening!