On the Sensitivity of Web Proxy Cache Performance to Workload Characteristics

1
On the Sensitivity of Web Proxy Cache Performance
to Workload Characteristics
  • Mudashiru Busari
  • Carey Williamson
  • Department of Computer Science
  • University of Saskatchewan

2
Talk Outline
  • Introduction and Motivation
  • ProWGen Proxy Workload Generator
  • Tool for Synthetic Web Proxy Workloads
  • Simulation Study
  • Simulation Evaluation of Web Proxy Caches
  • Conclusions and Future Work

3
Introduction
  • The Web is both a blessing and a curse
  • Blessing
  • Internet available to the masses
  • Seamless exchange of information
  • Curse
  • Internet available to the masses
  • Stress on networks, protocols, servers, users
  • Motivation: techniques to improve the performance
    and scalability of the Web

4
Why is the Web so slow?
  • Client-side bottlenecks (PC, modem)
  • Solution: better access technologies
  • Server-side bottlenecks (busy Web site)
  • Solution: faster, scalable server designs
  • Network bottlenecks (Internet congestion)
  • Solutions: caching, replication, improved
    protocols for client-server communication

5
Our Previous Work
  • Evaluation of Canada's national Web caching
    infrastructure for CANARIE's CA*net II backbone
  • Workload characterization and evaluation of
    CA*net II Web caching hierarchy
    (IEEE Network, May/June 2000)
  • Developed Web proxy caching simulator for
    trace-driven simulation evaluation of Web proxy
    caching architectures

6
CA*net II Web Caching Hierarchy (Dec 1998)
[Figure: diagram of the caching hierarchy, marking the selected measurement points for our traffic analyses (3-6 months of data from each). Labels: USask; CANARIE (Ottawa); To NLANR.]

7
Caching Hierarchy Overview
  • Cache hit ratios (empirically observed), by level
    of the hierarchy
  • Top-Level/International proxy (20-50 GB): 5-10%
  • National proxies (10-20 GB): 15-20%
  • Regional/Univ. proxies (5-10 GB): 30-40%
[Figure: tree of proxies with nodes labelled C at the leaves.]

8
Overview of This Paper
  • Constructed synthetic Web proxy workload
    generation tool (ProWGen) that captures the
    salient characteristics of empirical Web proxy
    workloads
  • Use ProWGen to evaluate sensitivity of proxy
    caches to selected Web proxy workload
    characteristics

9
Research Methodology
  • Design, construction, and parameterization of
    aggregate workload models, based on empirical
    traces (Web proxy access logs)
  • Validation of ProWGen (statistically, and versus
    empirical workloads)
  • Simulation evaluation of single-level caches
  • Sensitivity to workload characteristics
  • Effect of cache size
  • Effect of cache replacement policy

10
ProWGen: Key Workload Characteristics
  • One-timers (60-70% of docs are referenced only
    once, so caching them is useless!)
  • Zipf-like document referencing popularity
  • Heavy-tailed file size distribution (i.e., most
    files small, but most bytes are in big files)
  • Correlations (if any) between document size and
    document popularity (debate!)
  • Temporal locality (temporal correlation between
    recent past and near future references) [Mahanti
    et al., Perf. Eval. 2000]
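
The heavy-tail point above can be illustrated with a small sketch. It draws file sizes from a lognormal body and a Pareto tail (the two distributions the talk names for ProWGen's file size model) and reports how many bytes sit in the largest files. Every parameter value and function name below is an illustrative assumption, not something taken from ProWGen.

```python
import random

def sample_file_sizes(n, tail_fraction=0.05, mu=8.0, sigma=1.3,
                      tail_start=100_000, tail_index=1.2, seed=1):
    """Draw n file sizes in bytes: lognormal body, Pareto tail.

    All default values are assumptions chosen for illustration.
    Simplification: the body is not truncated at tail_start.
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(n):
        if rng.random() < tail_fraction:
            # paretovariate returns values >= 1, so scale by tail_start.
            size = tail_start * rng.paretovariate(tail_index)
        else:
            size = rng.lognormvariate(mu, sigma)
        sizes.append(max(1, int(size)))
    return sizes

def byte_share_of_largest(sizes, top_fraction=0.1):
    """Fraction of all bytes held by the largest top_fraction of files."""
    ordered = sorted(sizes, reverse=True)
    k = max(1, int(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)

if __name__ == "__main__":
    sizes = sample_file_sizes(20_000)
    median = sorted(sizes)[len(sizes) // 2]
    print("median size:", median)
    print("mean size:", sum(sizes) / len(sizes))
    print("byte share of largest 10%:", byte_share_of_largest(sizes))
```

Lowering `tail_index` makes the tail heavier, which is the knob the later sensitivity slides vary.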

11-15
ProWGen (Conceptual View)
[Figure, built up over five slides: Input Parameters (labelled 1, Z, a, c, L) feed the ProWGen Software, which produces a Synthetic Workload. Slide 12 adds a plot labelled Zipf with axes P and r.]

16
ProWGen Workload Modeling Details
  • Modeled workload characteristics
  • One-time referencing
  • Zipf-like referencing behaviour (Zipf's Law)
  • File size distribution
  • Body: lognormal distribution
  • Tail: Pareto distribution
  • Correlation between file size and popularity
  • Temporal locality
  • Static probabilities in finite-size LRU stack
    model
  • Dynamic probabilities in finite-size LRU stack
    model
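
As a rough illustration of the static-probability variant of the finite-size LRU stack model, the sketch below keeps a stack of recently referenced documents and re-references the document at depth i with a fixed probability. The function name, the probability values, and the uniform choice among documents outside the stack are assumptions made for this sketch; the talk does not specify how ProWGen implements these steps.

```python
import random

def lru_stack_trace(num_requests, num_docs, stack_size, stack_probs, seed=1):
    """Generate document ids from a finite LRU stack with static probabilities.

    stack_probs[i] is the assumed probability of re-referencing the document
    at stack depth i (0 = most recent). The remaining probability mass picks
    a document uniformly from those outside the stack (a simplification).
    """
    if len(stack_probs) != stack_size or sum(stack_probs) > 1.0:
        raise ValueError("need one probability per depth, summing to <= 1")
    if num_docs <= stack_size:
        raise ValueError("num_docs must exceed stack_size")
    rng = random.Random(seed)
    stack = []   # most recently referenced document at index 0
    trace = []
    for _ in range(num_requests):
        u = rng.random()
        chosen = None
        cumulative = 0.0
        for depth, p in enumerate(stack_probs):
            cumulative += p
            if u < cumulative:
                if depth < len(stack):
                    chosen = stack.pop(depth)
                break
        if chosen is None:
            # No stack reference this time: pick a document outside the stack.
            in_stack = set(stack)
            while chosen is None:
                doc = rng.randrange(num_docs)
                if doc not in in_stack:
                    chosen = doc
        stack.insert(0, chosen)
        if len(stack) > stack_size:
            stack.pop()   # drop the least recently used entry
        trace.append(chosen)
    return trace

if __name__ == "__main__":
    trace = lru_stack_trace(5000, 5000, 10, [0.05] * 10)
    print("distinct documents referenced:", len(set(trace)))
```

A dynamic variant would recompute the per-depth probabilities as the stack contents change rather than fixing them up front.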

17
Validation of ProWGen
  • To establish that the synthetic workloads possess
    the desired characteristics (quantitative and
    qualitative), and that the characteristics are
    similar to those in empirical workloads
  • Example: analyze 5 million requests from a proxy
    server trace and parameterize ProWGen to generate
    a similar workload

18
Workload Synthesis
19
Zipf-like Referencing Behaviour
Empirical Trace: Slope 0.81
Synthetic Trace: Slope 0.83
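
One common way to obtain a slope like those quoted above is a least-squares fit of log(reference count) against log(popularity rank). The sketch below generates a Zipf-like trace and fits that line; it is only an assumed procedure for illustration, since the talk does not say how its slopes were estimated, and the function names are hypothetical.

```python
import math
import random
from collections import Counter

def zipf_trace(num_docs, num_requests, slope, seed=1):
    """Sample document ids with P(rank r) proportional to 1 / r**slope."""
    rng = random.Random(seed)
    weights = [1.0 / (r ** slope) for r in range(1, num_docs + 1)]
    return rng.choices(range(num_docs), weights=weights, k=num_requests)

def estimate_zipf_slope(trace):
    """Least-squares slope of log(count) vs log(rank), as a positive value."""
    counts = sorted(Counter(trace).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx   # the fitted line falls, so negate for the magnitude

if __name__ == "__main__":
    trace = zipf_trace(num_docs=1000, num_requests=500_000, slope=0.8)
    print("estimated slope:", estimate_zipf_slope(trace))
```

With short traces the fit is distorted by documents that are referenced only a handful of times, so the estimate is only approximate.
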
20
Transfer Size Distribution
21
Simulation Evaluation of Single-Level Web Proxy
Caches: Some Research Questions
  • In a single-level proxy cache, how sensitive is
    Web proxy caching performance to certain workload
    characteristics (one-timers, Zipf slope,
    heavy-tail index)?
  • How does the degree of sensitivity change
    depending on the cache replacement policy?

22
Simulation Model
Web Servers
Web Clients
23
Experimental Design Factors and Levels
  • Cache size
  • 1 MB to 32 GB
  • Cache Replacement Policy
  • Recency-based: LRU
  • Frequency-based: LFU-Aging
  • Size-based: GD-Size
  • Workload Characteristics
  • One-timers, Zipf slope, tail index, correlation,
    temporal locality model
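
To make the size-based policy concrete, here is a minimal sketch of GreedyDual-Size, taking "GD-Size" to mean that algorithm and assuming unit cost per document, so each priority is H = L + 1/size. The class and method names are hypothetical, and the simulator used in the study may differ in its details.

```python
import heapq

class GDSizeCache:
    """GreedyDual-Size with unit cost: priority H = L + 1/size."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0   # the running value L
        self.entries = {}      # doc id -> (priority, size)
        self.heap = []         # (priority, doc id); may contain stale items

    def access(self, doc, size):
        """Return True on a hit; on a miss, admit the document if it fits."""
        hit = doc in self.entries
        if hit:
            size = self.entries[doc][1]   # assume sizes never change
        else:
            if size > self.capacity:
                return False              # larger than the whole cache
            while self.used + size > self.capacity:
                self._evict()
            self.used += size
        # Set or refresh the priority using the current inflation value.
        priority = self.inflation + 1.0 / size
        self.entries[doc] = (priority, size)
        heapq.heappush(self.heap, (priority, doc))
        return hit

    def _evict(self):
        """Remove the lowest-priority document and raise L to its priority."""
        while True:
            priority, doc = heapq.heappop(self.heap)
            current = self.entries.get(doc)
            if current is not None and current[0] == priority:
                del self.entries[doc]
                self.used -= current[1]
                self.inflation = priority
                return
            # Otherwise the heap item is stale (document re-accessed or gone).

if __name__ == "__main__":
    cache = GDSizeCache(100)
    for doc, size in [("big", 80), ("small", 10), ("small2", 10), ("x", 10)]:
        cache.access(doc, size)
    print(sorted(cache.entries))
```

Because 1/size is larger for small documents, large documents are evicted first, which tends to favour document hit ratio over byte hit ratio.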

24
Performance Metrics
  • Document Hit Ratio
  • Percent of requested docs found in cache (HR)
  • Byte Hit Ratio
  • Percent of requested bytes found in cache (BHR)
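
The two metrics can be computed in one pass over a trace. The sketch below replays (document id, size) requests through a byte-limited LRU cache and returns both ratios as fractions; the function name and trace format are assumptions for illustration.

```python
from collections import OrderedDict

def simulate_lru(trace, capacity_bytes):
    """Replay (doc_id, size_bytes) requests through a byte-limited LRU cache.

    Returns (document hit ratio, byte hit ratio) as fractions in [0, 1].
    The trace must be a non-empty list.
    """
    cache = OrderedDict()   # doc id -> size, least recently used first
    used = 0
    hits = hit_bytes = total_bytes = 0
    for doc, size in trace:
        total_bytes += size
        if doc in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(doc)   # mark as most recently used
            continue
        if size > capacity_bytes:
            continue                 # too large to cache at all
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[doc] = size
        used += size
    return hits / len(trace), hit_bytes / total_bytes

if __name__ == "__main__":
    trace = [("a", 10), ("b", 20), ("a", 10), ("c", 30), ("b", 20)]
    hr, bhr = simulate_lru(trace, capacity_bytes=40)
    print(f"HR = {hr:.1%}, BHR = {bhr:.1%}")
```

The two ratios diverge whenever hits fall mostly on small documents, which is why the talk reports both.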

25
Simulation Results (Preview)
  • Cache performance is very sensitive to
  • Slope of Zipf-like doc referencing popularity
  • Temporal locality property
  • Correlations between size and popularity
  • Cache performance relatively insensitive to
  • One-timers
  • Tail index of heavy-tailed file size distribution

26
Sensitivity to One-timers (LRU)
(a) Doc Hit Ratio
(b) Byte Hit Ratio
27
Sensitivity to Zipf Slope (LRU)
Difference of 0.2 in Zipf slope impacts
performance by as much as 10-15% in hit ratio
and byte hit ratio
(a) Hit Ratio
(b) Byte Hit Ratio
28
Sensitivity to Heavy Tail Index (LRU Replacement
Policy)
(a) Doc Hit Ratio
(b) Byte Hit Ratio
29
Sensitivity to Heavy Tail Index (GD-Size
Replacement Policy)
Difference of 0.2 in heavy tail index impacts
performance by less than 3%
(a) Hit Ratio
(b) Byte Hit Ratio
30
Sensitivity to Correlation (LRU)
(a) Doc Hit Ratio
(b) Byte Hit Ratio
31
Sensitivity to Temporal Locality (LRU)
(a) Doc Hit Ratio
(b) Byte Hit Ratio
32
Summary: Single-Level Caches
  • Cache performance is sensitive to
  • Slope of Zipf-like document referencing
    popularity (steeper slope implies better caching)
  • Temporal locality
  • Correlation between size and popularity
  • Cache performance is insensitive to
  • One-timers
  • Tail index of heavy-tailed file size
    distribution

33
Conclusions
  • ProWGen is a useful tool for the generation of
    synthetic Web proxy workloads for the evaluation
    of Web proxy caches and Web proxy caching
    architectures
  • Web proxy cache performance is quite sensitive to
    Zipf slope, temporal locality, and correlations
    (if any) between document size and document
    popularity

34
Future Work
  • Extend and improve ProWGen
  • Request arrival process (timestamps)
  • File modifications, types, and lifetimes
  • Web page structure (spatial locality)
  • Scaling the workload model(s)...
  • Evaluate multi-level Web proxy caches
  • Port to network emulation testbed

35
For More Information...
  • M. Busari, "Simulation Evaluation of Web Caching
    Hierarchies", M.Sc. Thesis, Dept. of Computer
    Science, U. Saskatchewan, June 2000
  • ProWGen tool
  • http://www.cs.usask.ca/faculty/carey/software/
  • Email: carey@cs.usask.ca
  • http://www.cs.usask.ca/faculty/carey/