Title: Workload Characterization in Web Caching Hierarchies
1 Workload Characterization in Web Caching Hierarchies
- Guangwei Bai
- Carey Williamson
- Department of Computer Science
- University of Calgary
2 Talk Outline
- Problem Statement
- Experimental Methodology
- Simulation Results
- Modeling Results
- Summary and Conclusions
3 Introduction
- World Wide Web: one of the most popular applications on today's Internet
- Web caching: a technique used for improving performance and scalability of the Internet
4 Illustration of Web Proxy Cache Filtering Effect
[Figure: Web Clients -> Original Request Stream -> Web Proxy Caching System -> Filtered Request Stream -> Internet]
5 Example of Web cache filter effect
Arriving Request Stream (into the Web Proxy Cache)
Time   ID
0.001  A
0.025  B
0.150  C
0.689  A
0.890  D
1.358  B
1.777  B
2.190  A
2.460  E
Filtered Request Stream (out of the Web Proxy Cache)
Time   ID
0.001  A
0.025  B
0.150  C
0.890  D
1.358  B
2.460  E
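The filtering in this example can be reproduced with a few lines of simulation. The slide does not state the cache size or replacement policy behind it; the sketch below assumes an LRU cache that holds three objects, which happens to yield the same filtered stream. The function name `lru_filter` is illustrative only.

```python
from collections import OrderedDict

def lru_filter(requests, capacity):
    """Return the misses (the filtered stream) of an LRU cache.

    Assumes capacity >= 1 and counts capacity in objects, not bytes.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    filtered = []
    for time, obj in requests:
        if obj in cache:
            cache.move_to_end(obj)      # hit: absorbed by the cache
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)   # evict the least recently used object
        cache[obj] = True
        filtered.append((time, obj))    # miss: forwarded upstream
    return filtered

# Arriving request stream from the example.
arriving = [(0.001, "A"), (0.025, "B"), (0.150, "C"), (0.689, "A"), (0.890, "D"),
            (1.358, "B"), (1.777, "B"), (2.190, "A"), (2.460, "E")]
filtered = lru_filter(arriving, capacity=3)
print(filtered)
```

Under these assumptions, B misses again at 1.358 because it was evicted when D arrived.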
6 Example of Web cache filter effect
Frequency-domain effect
7 Example of Web cache filter effect
Time-domain effect
8 Goal of this Work
Time-domain analysis of cache filter effects in Web caching hierarchies
- Study impact of a cache on the structural characteristics of Web request workload (mean, peak, variance, self-similarity)
- Sensitivity of filter effect to cache configuration (cache size and cache replacement policy)
- Characterizing aggregate Web request streams in a multi-level Web proxy caching hierarchy
9 Multi-Level Web Proxy Caching System
10 Experimental Methodology
- Trace-driven simulation
- Web proxy cache simulator
- Synthetic Web proxy workloads
- Controllable characteristics
- Trace length: about 1M requests
- Zipf slope: -0.75, -0.8
- Request arrival process: Deterministic, Poisson, Self-Similar
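A synthetic workload with controllable popularity and arrival process might be generated along the following lines. This is a minimal sketch, not the generator used in the study: `synthetic_trace` and its parameters are hypothetical, the Zipf slope is passed as a positive magnitude (0.75 for a slope of -0.75), and the self-similar arrival process is left out because it needs a dedicated generator.

```python
import random

def synthetic_trace(n_requests, n_objects, zipf_slope, rate, arrivals="poisson", seed=1):
    """Generate (time, object_id) requests with Zipf-like popularity.

    The object at popularity rank r is requested with probability
    proportional to r ** (-zipf_slope). Object 0 is the most popular.
    """
    rng = random.Random(seed)
    weights = [r ** (-zipf_slope) for r in range(1, n_objects + 1)]
    objects = rng.choices(range(n_objects), weights=weights, k=n_requests)
    trace, now = [], 0.0
    for obj in objects:
        if arrivals == "poisson":
            now += rng.expovariate(rate)  # exponential inter-arrival times
        else:
            now += 1.0 / rate             # deterministic: evenly spaced requests
        trace.append((now, obj))
    return trace

trace = synthetic_trace(n_requests=10_000, n_objects=1_000, zipf_slope=0.75, rate=60.0)
```

The trace length here is much shorter than the roughly 1M requests mentioned above, purely to keep the example quick to run.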
11 General Observations: Filter Effects
[Figure, two panels: "Arrival Counts" (requests per 5-minute interval vs. time in seconds) and "Cache Hit Ratio" (hit ratio vs. time in seconds)]
12 Effect of Cache Configuration
- Experimental factors
- Cache size: determines the maximum number of Web content bytes that can be held in the cache at one time
- Cache replacement policy: determines which object(s) to remove from the cache when more space is needed to store an incoming object (e.g., RAND, FIFO, LRU, LFU, GDS)
- (Assumption: arrival process is Poisson)
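The two experimental factors can be made concrete with a small byte-limited cache simulator. The sketch below is an illustration under stated assumptions, not the simulator used in the study: `hit_ratio` is a hypothetical name, only FIFO and LRU are covered, and objects larger than the cache are simply not stored.

```python
from collections import OrderedDict

def hit_ratio(requests, capacity_bytes, policy="LRU"):
    """Return the hit ratio for a list of (object_id, size_bytes) requests."""
    cache = OrderedDict()  # object_id -> size; the front of the dict is evicted first
    used = 0
    hits = 0
    for obj, size in requests:
        if obj in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(obj)  # refresh recency; FIFO keeps insertion order
            continue
        if size > capacity_bytes:
            continue                    # object can never fit, so it is not cached
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size        # free space until the new object fits
        cache[obj] = size
        used += size
    return hits / len(requests) if requests else 0.0

# Hypothetical request sequence of unit-size objects, for illustration only.
toy = [(obj, 1) for obj in "ABCADA"]
lru = hit_ratio(toy, capacity_bytes=3, policy="LRU")
fifo = hit_ratio(toy, capacity_bytes=3, policy="FIFO")
```

RAND, LFU, and GDS need different bookkeeping (a random victim, frequency counters, and cost/size priorities respectively) and are not shown.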
13 Effect of Cache Size on Traffic Structure
[Figure: Marginal Distribution Plot (pdf)]
14 Effect of Cache Replacement Policy
(8 KB)
15 Input: Deterministic Arrival Process
Statistics           Before Cache   Cache Size (MB)
                                    1       4       16      64      256     1024
Mean                 60.00          36.88   31.45   28.71   27.31   25.37   23.03
Standard Deviation   0.00           4.84    4.60    4.01    4.00    4.31    4.78
Hit Ratio (%)        -              38.8    47.8    52.7    55.5    59.1    62.7
- Main Observations
- Reduces mean arrival rate of filtered request stream
- Increases variance of the filtered request stream
16 Input: Poisson Arrival Process
Statistics           Before Cache   Cache Size (MB)
                                    1       4       16      64      256     1024
Mean                 60.10          36.81   31.38   28.65   27.26   25.33   23.00
Standard Deviation   7.82           6.77    6.07    5.43    5.31    5.39    5.62
Hit Ratio (%)        -              38.8    47.8    52.7    55.5    59.1    62.7
- Main Observations
- Large impact on mean; little impact on variance
- Variance-to-mean ratio increases with cache size
- For small cache sizes, the filtered stream is well-characterized as a Poisson process.
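The variance-to-mean ratio mentioned above is the squared standard deviation divided by the mean; a Poisson process has a ratio of 1. The sketch below computes it from the Poisson-input statistics, assuming the values read in the order Before Cache, 1, 4, 16, 64, 256, 1024 MB (the column labels are shorthand introduced here).

```python
# Mean and standard deviation of arrival counts, copied from the Poisson-input table.
# Assumption: values are listed in the order before cache, then 1 MB to 1024 MB.
columns = ["before", "1MB", "4MB", "16MB", "64MB", "256MB", "1024MB"]
mean = [60.10, 36.81, 31.38, 28.65, 27.26, 25.33, 23.00]
std = [7.82, 6.77, 6.07, 5.43, 5.31, 5.39, 5.62]

def variance_to_mean(mean_value, std_value):
    """Index of dispersion: variance divided by mean."""
    return std_value ** 2 / mean_value

vmr = {c: variance_to_mean(m, s) for c, m, s in zip(columns, mean, std)}
for c in columns:
    print(f"{c:>7}: {vmr[c]:.2f}")
```

With the values as read here, the ratio is close to 1 before the cache, as expected for Poisson input, and is largest for the 1024 MB cache.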
17 Input: Self-Similar Arrival Process
Statistics           Before Cache   Cache Size (MB)
                                    1       4       16      64      256     1024
Mean                 62.87          38.50   32.79   29.88   28.27   26.05   23.49
Standard Deviation   12.24          9.03    7.98    7.12    6.94    7.02    7.14
Hit Ratio (%)        -              38.8    47.8    52.7    55.5    59.1    62.7
- Main Observations
- Large impact on mean; little impact on variance
- Variance-to-mean ratio increases with cache size
- Filtered request stream retains self-similar structure
18 Background: Self-Similar Traffic
- Network traffic self-similarity: the statistical characterization of the traffic is essentially invariant with time scale.
- Main measure: Hurst parameter, 0.5 < H < 1
- Examination
- autocorrelation (long-range dependence)
- variance-time plot
- rescaled adjusted range statistic (R/S)
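Of the three methods listed, the variance-time plot is the simplest to sketch. The series is averaged over non-overlapping blocks of size m, and the variance of the block means is plotted against m on log-log axes; for a self-similar process the slope is 2H - 2. The function name, block sizes, and test series below are illustrative assumptions, not the procedure used in the study.

```python
import math
import random

def hurst_variance_time(series, block_sizes=(1, 2, 4, 8, 16, 32, 64, 128)):
    """Estimate H from the slope of log(variance of block means) vs. log(block size)."""
    xs, ys = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        mu = sum(means) / n_blocks
        var = sum((x - mu) ** 2 for x in means) / (n_blocks - 1)
        xs.append(math.log(m))
        ys.append(math.log(var))
    # Least-squares slope of ys against xs; slope = 2H - 2, so H = 1 + slope / 2.
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1.0 + slope / 2.0

# Independent samples have no long-range dependence, so the estimate should be near 0.5.
rng = random.Random(7)
iid_counts = [rng.gauss(60.0, 8.0) for _ in range(16384)]
h_iid = hurst_variance_time(iid_counts)
print(f"H estimate for independent samples: {h_iid:.2f}")
```

An estimate noticeably above 0.5 on a real arrival-count series would point toward long-range dependence.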
19 Traffic Characterization in a Web Proxy Caching Hierarchy
- Filter effects of the first-level cache on Web workload
- Statistical multiplexing of filtered Web request streams after the first-level cache
- Modeling aggregate request stream offered to the second-level cache
20 Multi-Level Web Proxy Caching System
21 Synthetic Self-Similar Workload Traces offered to the first-level cache
Trace 1 (H = 0.70, Zipf slope = 0.75)
Trace 2 (H = 0.80, Zipf slope = 0.80)
22 Evidence of Self-Similar Request Arrival Process for Filtered Web Proxy Workload
[Figure] H = 0.699
23 Superposition of Web Workload in time-domain
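Superposition in the time domain amounts to merging the time-ordered filtered streams from several first-level caches into one stream, then counting arrivals per interval. The sketch below shows one way to do this; the two input streams are hypothetical and the function names are illustrative.

```python
import heapq

def superpose(*streams):
    """Merge time-ordered (time, object_id) streams into one time-ordered stream."""
    return list(heapq.merge(*streams, key=lambda request: request[0]))

def counts_per_interval(stream, interval, n_intervals):
    """Count the requests that fall in each consecutive interval of the given length."""
    counts = [0] * n_intervals
    for time, _ in stream:
        index = int(time // interval)
        if 0 <= index < n_intervals:
            counts[index] += 1
    return counts

# Hypothetical filtered streams leaving two first-level caches.
stream_1 = [(0.001, "A"), (0.890, "D"), (2.460, "E")]
stream_2 = [(0.150, "C"), (1.358, "B")]
aggregate = superpose(stream_1, stream_2)
counts = counts_per_interval(aggregate, interval=1.0, n_intervals=3)
print(aggregate)
print(counts)
```

The resulting count series is the kind of input on which mean, variance, and Hurst parameter of the aggregate stream would be measured.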
24 H = 0.76
25 Modeling of Aggregate Workload
26 Modeling of Aggregate Workload
27 Summary and Conclusions
- Recap: Trace-driven simulation of Web proxy caching hierarchy, with synthetic Web workloads
- Cache reduces peak and mean request arrival rate
- Cache filter effect does not remove self-similarity
- Superposition of Web request streams results in a bursty aggregate request stream
- Gamma distribution: a flexible and robust means to characterize request arrival count distribution at different stages in a Web caching hierarchy
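One simple way to obtain gamma parameters from a measured arrival-count distribution is the method of moments, which matches the mean (shape times scale) and the variance (shape times scale squared). The talk does not say which fitting method was used, so this is an assumed approach, and `gamma_from_moments` is a hypothetical helper.

```python
def gamma_from_moments(mean, std):
    """Method-of-moments gamma fit: mean = shape * scale, variance = shape * scale**2."""
    variance = std ** 2
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale

# Mean and standard deviation after a 1 MB cache with Poisson input (from the earlier table).
shape, scale = gamma_from_moments(36.81, 6.77)
print(f"shape = {shape:.2f}, scale = {scale:.3f}")
```

Under this fit the scale parameter equals the variance-to-mean ratio, which is one way to relate the gamma parameters to the cache statistics reported earlier.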
28 Future Work
- Bigger traces, more general workloads
- Studying the mathematical relationships between gamma (shape) and beta (scale) parameters versus cache size and hit ratio
- For more information
- Email: {bai,carey}@cpsc.ucalgary.ca
- http://www.cpsc.ucalgary.ca/carey