Title: Web Cache Replacement Policies: Properties, Limitations and Implications
1. Web Cache Replacement Policies: Properties, Limitations and Implications
- Fabrício Benevenuto, Fernando Duarte, Virgílio Almeida, Jussara Almeida
- Computer Science Department, Federal University of Minas Gerais, Brazil
2. Summary
- Introduction to Web caching
- Motivations and goals
- Evaluation methodology
- Performance metrics
- Workload description
- Caching system simulator
- Experimental results
- Conclusions and future work
3. Web Caching
- Dramatic growth of the WWW in terms of content, users, servers and complexity
- Web caching is a common strategy used to
  - reduce traffic over the Internet
  - increase server scalability
  - reduce network latency
- Caching is deployed by means of Web proxies
4. Web Caching
- Web proxies act as intermediaries for the traffic between HTTP clients and servers
- Nowadays the Web has a hierarchical topology
5. Web Caching
- Cache replacement is one of the issues a proxy must manage
- Since the cache has finite size, how does a proxy choose which page to remove when the cache is full?
- Much research has addressed this question, and several cache replacement policies can be found in the literature
- Key questions
  - Is the design of new cache replacement policies needed?
  - Which properties should new policies exploit to improve a caching system?
6. Goals
- Investigate how much a new caching policy could improve caching system performance
- Explore the main causes of periods of poor and high performance in caching systems
7. Evaluation Methodology
- Evaluation of different metrics over time
  - Hit ratio
  - Percentage of first-timers
  - Maximum improvement
  - Entropy
- Time intervals of 1, 10 and 100 minutes
- Use of real workloads
8. Performance Metric: Hit Ratio
- Hit ratio is the percentage of requests satisfied by the cache
- It is the most general metric used to evaluate the effectiveness of a caching policy
- Hit ratio is measured over time to detect periods of performance variation
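A minimal sketch of measuring hit ratio per time interval, as described above. The input format (pairs of a timestamp in seconds and a hit flag) and the function name `hit_ratio_over_time` are assumptions made for illustration; they are not taken from the authors' simulator.

```python
from collections import defaultdict


def hit_ratio_over_time(events, interval_minutes=10):
    """Return {interval index: hit ratio in percent}.

    `events` is an iterable of (timestamp_seconds, was_hit) pairs.
    This input format is an assumption of the sketch.
    """
    width = interval_minutes * 60
    hits = defaultdict(int)
    total = defaultdict(int)
    for timestamp, was_hit in events:
        bucket = int(timestamp // width)  # which interval the request falls in
        total[bucket] += 1
        if was_hit:
            hits[bucket] += 1
    return {b: 100.0 * hits[b] / total[b] for b in sorted(total)}


# Made-up events: three requests in the first 10 minutes, two in the next.
events = [(0, False), (30, True), (590, True), (600, False), (900, False)]
ratios = hit_ratio_over_time(events, interval_minutes=10)
print(ratios)
```

Calling the function with `interval_minutes` set to 1, 10 or 100 gives the three granularities listed in the methodology.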
9. Performance Metric: Percentage of First-Timers
- A first-timer is the first request for an object in the trace
- Caching policies cannot satisfy first-timers
  - the object has never been requested in the past
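The definition above can be sketched in a few lines. The function name `first_timer_percentage` and the representation of the trace as a sequence of object identifiers (for example URLs) are assumptions of this example.

```python
def first_timer_percentage(object_ids):
    """Percentage of requests that are the first request for their object."""
    seen = set()
    first_timers = 0
    total = 0
    for obj in object_ids:
        total += 1
        if obj not in seen:  # never requested before: a first-timer
            seen.add(obj)
            first_timers += 1
    return 100.0 * first_timers / total if total else 0.0


# Made-up trace: "a", "b" and "c" each appear for the first time once.
print(first_timer_percentage(["a", "b", "a", "c", "a"]))  # 60.0
```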
10. Performance Metric: Maximum Improvement
- The maximum improvement MI is defined as
- Maximum improvement over LRU
  - We evaluate how much, at most, a new caching policy could raise the hit ratio over the simple LRU policy
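The exact MI formula did not survive on this slide. The sketch below therefore assumes one plausible reading: since no replacement policy can satisfy a first-timer, the best achievable hit ratio is 100 minus the percentage of first-timers, and the headroom over LRU is that bound minus the measured LRU hit ratio. Both this formula and the name `maximum_improvement` are assumptions, not the authors' definition.

```python
def maximum_improvement(lru_hit_ratio, first_timer_pct):
    """Upper bound on the hit-ratio gain over LRU, in percentage points.

    Assumed formula (not confirmed by the slides):
        MI = (100 - first_timer_pct) - lru_hit_ratio
    """
    for value in (lru_hit_ratio, first_timer_pct):
        if not 0.0 <= value <= 100.0:
            raise ValueError("percentages must lie in [0, 100]")
    best_possible = 100.0 - first_timer_pct  # first-timers always miss
    return max(0.0, best_possible - lru_hit_ratio)


# Made-up numbers: 55% first-timers and a 40% LRU hit ratio leave 5 points.
print(maximum_improvement(40.0, 55.0))  # 5.0
```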
11. Performance Metric: Entropy
- Given n distinct objects, where object i occurs with probability p_i, the entropy H(X) of a request stream is calculated as
  H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i
- Entropy measures the concentration of popularity of a request stream
  - The higher the entropy, the lower the concentration of popularity
- Caching policies should keep objects with a high probability of being referenced in the near future
12. Performance Metric: Entropy
- Entropy depends on the number of distinct objects
- Use of the normalized entropy H_N, i.e. H(X) divided by its maximum value \log_2 n, so that 0 ≤ H_N ≤ 1
- Investigate the influence of popularity on caching performance
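A short sketch of the entropy computation for a request stream, estimating each p_i as the relative request frequency of object i. It uses the standard Shannon entropy with base-2 logarithms and normalizes by log2 of the number of distinct objects; the function names are illustrative.

```python
import math
from collections import Counter


def entropy(object_ids):
    """Shannon entropy (in bits) of the empirical object popularity."""
    counts = Counter(object_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(object_ids):
    """Entropy divided by its maximum, log2(n), for n distinct objects."""
    n = len(set(object_ids))
    if n <= 1:
        return 0.0  # a single object (or empty stream) carries no uncertainty
    return entropy(object_ids) / math.log2(n)


uniform = ["a", "b", "c", "d"]  # all objects equally popular
skewed = ["a", "a", "a", "b"]   # popularity concentrated on "a"
print(normalized_entropy(uniform), normalized_entropy(skewed))
```

The uniform stream reaches the maximum normalized entropy of 1, while the skewed stream scores lower, matching the statement that lower entropy means higher concentration of popularity.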
13. Experiment Setup
- Real traces from proxy caches located at two points of the Web topology
  - Closer to clients
    - Federal University of Minas Gerais (UFMG)
  - Closer to servers
    - National Laboratory for Applied Network Research (NLANR)
- Cache size: 10% of the number of distinct objects
- Cache replacement policy: simple LRU
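A minimal sketch of a trace-driven LRU simulation in the spirit of this setup. It is not the authors' simulator: the cache capacity is counted in objects (object sizes are ignored), and the name `simulate_lru` and the toy trace are assumptions of the example.

```python
from collections import OrderedDict


def simulate_lru(object_ids, capacity):
    """Replay a request stream through an LRU cache of `capacity` objects.

    Returns one boolean per request: True for a hit, False for a miss.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    outcomes = []
    for obj in object_ids:
        if obj in cache:
            cache.move_to_end(obj)  # refresh recency on a hit
            outcomes.append(True)
        else:
            outcomes.append(False)
            cache[obj] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return outcomes


trace = ["a", "b", "a", "c", "b", "a"]  # made-up request stream
outcomes = simulate_lru(trace, capacity=2)
print(outcomes, 100.0 * sum(outcomes) / len(outcomes))
```

To mirror the setup above, `capacity` would be set to 10% of the number of distinct objects, and a warm-up trace would be replayed first so that only requests from the evaluation trace are counted.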
14. Workload Description
Name                University 1  University 2  NLANR 1     NLANR 2
Start date          01-10-2004    01-12-2004    01-18-2005  01-20-2005
Days                2             10            2           11
Requests            1,004,747     3,459,549     1,207,075   3,427,391
Distinct objects    299,367       623,164       891,906     2,350,215
Normalized entropy  0.8532        0.8268        0.9482      0.9329
- Traces used
  - Cache warming: University 1, NLANR 1
  - Performance evaluation: University 2, NLANR 2
- Higher concentration of popularity in the university traces (lower entropy)
- Larger fraction of distinct objects in the NLANR traces, which significantly reduces caching performance
15. Experimental Results: Hit Ratio
[Figure: hit ratio over time; panels: proxy closer to clients, proxy closer to servers]
- Higher hit ratio for the university trace
- Strong variation over time
- Which factors cause the variations in hit ratio?
16. Experimental Results: Percentage of First-Timers
[Figure: percentage of first-timers; panels: proxy closer to clients, proxy closer to servers]
- Smaller percentage of first-timers at the proxy closer to clients
- Correlation coefficient between hit ratio and the percentage of first-timers
  - -0.857 for NLANR and -0.962 for the university
- Caching policies cannot satisfy first-timers, which are the most important factor behind both poor and good performance in the analyzed traces
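A sketch of how such a correlation coefficient can be computed from two per-interval series. It assumes the Pearson coefficient is meant (the slides do not name the estimator), and the sample values are made up purely to exercise the function.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-interval percentages: hit ratio falls as first-timers rise.
hit_ratio = [50.0, 40.0, 30.0]
first_timers = [30.0, 40.0, 50.0]
print(pearson(hit_ratio, first_timers))  # -1.0
```

A value close to -1, as reported for both traces, means that intervals with many first-timers are almost always intervals with a low hit ratio.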
17. Experimental Results: Entropy
[Figure: entropy; panels: proxy closer to clients, proxy closer to servers]
- Proxy closer to clients: lower entropy, hence higher concentration of popularity
- The LRU policy does not take advantage of all the locality of reference
- Correlation coefficient between hit ratio and entropy
  - -0.787 for NLANR and -0.453 for the university
- If we had a caching policy able to filter all the locality (entropy 1), how much could the hit ratio be improved?
18. Experimental Results: Maximum Improvement
[Figure: maximum improvement; panels: proxy closer to clients, proxy closer to servers]
- The hit ratio cannot be significantly improved for the trace closer to clients
- A high number of first-timers reduces the hit ratio
- Improving caching performance
  - Reorganization of the cache hierarchy (cache placement)
  - A caching system able to deal with first-timers
19. Conclusions and Future Work
- Summary of main findings
  - Strong variation of hit ratio over time
  - High number of first-timers (higher close to servers)
    - Main cause of low hit ratio
  - The LRU policy is not able to filter the entire locality of a stream
    - Small correlation with hit ratio
  - The maximum improvement we could obtain over LRU
    - less than 5 percent closer to clients
    - on average 25 percent closer to servers
  - Results suggest a reorganization of the cache topology and a caching system able to deal with the high number of first-timers
- Future work
  - Cache placement: find the optimal cache organization to improve overall system performance
  - Auto-adaptive cache system able to minimize periods of poor performance
20. Questions?
- Fabrício Benevenuto, Fernando Duarte, Virgílio Almeida, Jussara Almeida
- {fabricio, fernando, virgilio, jussara}@dcc.ufmg.br