1. Who is more adaptive? A.C.M.E. (Adaptive Caching using Multiple Experts)
WDAS, March 2002
- Ismail Ari, Ahmed Amer
- Ethan L. Miller, Scott Brandt, Darrell D. E. Long
2. Introduction
- We describe an adaptive caching technique that finds the current best policy, or mixture of policies, for a workload
- Requires no manual tuning
- Allows exclusive caching without message exchanges
- Scalable distributed caching clusters can be formed
- Scalable distributed data access is achieved
3. Motivations for Proxy Caching
- Exponentially growing number of clients on the Internet
- These clients (we) access data/objects distributed all over the world
- Everybody wants fast response times
- Latency:
  - some objects are far away
  - some objects are very popular and create hot spots in networks and on servers
- Distributed caches bring data closer to the clients and enable data sharing
- They reduce latency, network load, and server load
4. Summarizing Proverb
"With twenty-five years of Internet experience we've learned one way to deal with exponential growth: caching." - Van Jacobson
5. Problems with Caching
- Cache sizes are dwarfed by the unique document space accessed by clients
- Only the most valuable objects should be cached
- What defines "most valuable"?
- Different cache replacement policies assign different values or priorities
- They combine multiple criteria (recency of access, popularity) into a priority key that defines their ordering of objects
6. Which Criteria or Policy to Use?
- FIFO: First In First Out
- LRU: Least Recently Used
- LFU: Least Frequently Used
- GDSF: Greedy Dual Size with Frequency [Cao97, Arlitt99, Jin00]
- LRV: Lowest Relative Value [Rizzo97]
8. Which Criteria or Policy to Use?
- Policies are statically embedded in systems a priori
- But we don't know where and how the system will be used
- Their performance is workload dependent
- Some policies are better than others on certain workloads
- Request streams may have sub-streams that favor other policies
- As the workload changes over time (hour, day, year), the performance of static policies degrades
- As the network topology changes, the workload changes
9. Solution: Be Adaptive
- Systems and workloads are complex and under continuous change
- Manual tuning and monitoring is tedious, if possible at all
- Systems that adapt through message and database exchanges are not scalable
- Choose all policies!
- Automatically adjust to the best mixture of policies that current conditions require
- How to mix cache policies?
10. How to Mix? Biological Motivations
- Imagine all cache policies as species competing for food (documents or objects) in a habitat (the cache)
- The fitness of a species is based on how well it eats
- The fitness of a policy is its hit rate or byte-hit rate
- The population share (or frequency) of a species depends on its fitness
- Highly fit species may starve the others, and if conditions change the whole system collapses
11. How to Mix? Biological Motivations
- Predators (probabilistically) prey on the most frequent (or easiest) species
- Predators protect diversity (mixing) among species
  - by preventing the most fit species from starving the others
- Our predator implementation (the resource manager) both assigns objects to caches and manages cache space
- Problem: we cannot allow duplication, so we assign objects probabilistically
- Problem: lucky draws (unfairness) can cause bad policies to gain fitness
12. How to Mix? Virtual Caches
- We define a pool of virtual caches, each simulating a single cache policy and its object ordering
- Virtual caches act as if they own the whole cache, but they keep only object header information, not the actual data
13. Weighted Experts
- In machine-learning terminology, experts are algorithms (e.g., LRU) that make predictions, denoted by the vector x_t
- The weights of the experts (w_t) represent the quality of their predictions
- The master algorithm predicts with a weighted average of the experts' predictions: ŷ_t = w_t . x_t
- Depending on the true outcome y_t (hit/miss), we incur a loss and then update the weights
- e.g., Loss(ŷ_t, y_t) = (1 - 0)^2 = 1
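As a minimal sketch of the prediction and loss steps above (the function names are ours, not ACME's API): the master's prediction is the dot product of the weight and prediction vectors, scored here with square loss.

```python
# Sketch of the master prediction described above (names are illustrative).
# Each expert i emits a prediction x_t[i] in [0, 1]; the master predicts
# the weighted average and suffers square loss against the true outcome.

def master_predict(weights, predictions):
    """Weighted average of the experts' predictions: y_hat = w_t . x_t."""
    return sum(w * x for w, x in zip(weights, predictions))

def square_loss(y_hat, y):
    """Loss(y_hat, y) = (y_hat - y)^2, e.g. (1 - 0)^2 = 1."""
    return (y_hat - y) ** 2

weights = [0.5, 0.3, 0.2]        # normalized expert weights w_t
predictions = [1.0, 0.0, 1.0]    # expert predictions x_t (hit = 1, miss = 0)
y_hat = master_predict(weights, predictions)   # 0.5*1 + 0.3*0 + 0.2*1
print(square_loss(y_hat, 1))     # the outcome was a hit
```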
14. Fitting Caching into the Expert Framework
- In our paper we describe:
  - a pool of virtual cache policies voting for objects
  - the highest-vote objects stay in the real cache
  - after the hits/misses, weights are updated in proportion to each policy's vote
- In the current implementation:
  - virtual caches tell whether they had hits or misses
  - this is their prediction, or vote
  - their predictions are compared to the true outcome
  - virtual policies that predict the workload well are rewarded with a weight increase, and vice versa
  - the real cache looks like the virtual cache with the highest weight, but is still a mixture of multiple policies
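A toy end-to-end sketch of this scheme, under our own simplifications (headers-only LRU and FIFO simulators, a hand-supplied stream of true outcomes, and a multiplicative reweighting step; none of these names come from the ACME code):

```python
import math

# Each "virtual cache" simulates one policy over object ids only (no data),
# reports hit (1) or miss (0) as its prediction, and is reweighted by how
# well that prediction matched the real outcome.

class VirtualLRU:
    """Headers-only LRU simulator: keeps object ids, not object data."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.order = []                      # most recently used at the end

    def access(self, obj):
        hit = obj in self.order
        if hit:
            self.order.remove(obj)
        elif len(self.order) >= self.capacity:
            self.order.pop(0)                # evict least recently used
        self.order.append(obj)
        return 1 if hit else 0               # prediction: hit or miss

class VirtualFIFO:
    """Headers-only FIFO simulator."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []

    def access(self, obj):
        hit = obj in self.queue
        if not hit:
            if len(self.queue) >= self.capacity:
                self.queue.pop(0)            # evict oldest insertion
            self.queue.append(obj)
        return 1 if hit else 0

def reweight(weights, predictions, outcome, eta=0.5):
    """Multiplicative loss update: experts that mispredict lose weight."""
    new = [w * math.exp(-eta * abs(p - outcome))
           for w, p in zip(weights, predictions)]
    total = sum(new)
    return [w / total for w in new]          # renormalize

experts = [VirtualLRU(2), VirtualFIFO(2)]
weights = [0.5, 0.5]
# (object, true outcome) pairs; in ACME the outcome comes from the real cache
for obj, outcome in [("a", 0), ("b", 0), ("a", 1), ("c", 0), ("a", 1)]:
    predictions = [e.access(obj) for e in experts]
    weights = reweight(weights, predictions, outcome)
print(weights)   # the policy tracking the stream better ends up heavier
```

Here the policy whose hit/miss predictions track the stream better ends up with the larger weight, which is the signal the real cache uses to pick its mixture.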
15. Weight Updates of Virtual Caches
- Discrete loss
- Size-based loss (not all misses are equal!)
  - e.g., f(object size) = log(size)
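A one-function sketch of the size-based loss above (the name `size_loss` and the choice of log base 2 are ours):

```python
import math

# Not all misses are equal: a hit costs nothing, and a miss is penalized
# by log2 of the object size, so missing a large object hurts more than
# missing a small one.
def size_loss(hit, size_bytes):
    return 0.0 if hit else math.log2(size_bytes)

print(size_loss(True, 1 << 20))    # hit: 0.0
print(size_loss(False, 1 << 10))   # missed 1 KB object: 10.0
print(size_loss(False, 1 << 20))   # missed 1 MB object: 20.0
```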
16. Machine Learning Algorithms
- Loss Update (Weighted Majority algorithm [LW92]):
  - w_{t+1,i} = w_{t,i} . e^(-η L_{t,i}) for i = 1..n
  - normalization: divide by Σ_i w_{t+1,i}
- where:
  - η (eta) is the learning rate
  - w_0 = (1/n, 1/n, ..., 1/n), i.e., weights are initialized equally
17. Share Update [HerbsterWarmuth95]
- The Loss Update learns fast, but does not recover fast enough (e^(-η L_{t,i}) = e^(-1) ≈ 1/2.7)
- The curse of multiplicative updates (M. Warmuth)
- We must make sure weights don't become too small
- A pool is created from the shared weights
- The algorithms are forced to share in proportion to their loss
- Weights are redistributed from the pool to make sure all policies keep some minimal weight w_{t+1,i}
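The share step can be sketched as follows (the names and the even redistribution are ours): each expert gives up weight in proportion to its loss, and the pool is split among the other experts, so a policy that has been losing for a long time keeps some minimal weight and can recover quickly when the workload turns in its favor.

```python
# Illustrative fixed-share step: experts shed weight in proportion to
# their loss; the pooled weight is split evenly among the *other* experts,
# which keeps every policy's weight bounded away from zero.

def share_update(weights, losses, alpha=0.1):
    n = len(weights)
    keep = [(1 - alpha) ** L * w for w, L in zip(weights, losses)]
    shared = [w - k for w, k in zip(weights, keep)]   # contributions to the pool
    pool = sum(shared)
    # each expert receives an equal part of what the others shared
    return [k + (pool - s) / (n - 1) for k, s in zip(keep, shared)]

# An expert whose weight has collapsed still receives a share of the pool:
weights = [0.98, 0.01, 0.01]
losses = [5, 0, 0]                 # the dominant expert suddenly starts losing
new = share_update(weights, losses, alpha=0.2)
print(new)                         # weight flows from expert 0 to the others
```

Note that the update conserves total weight: what each expert loses is exactly what the others gain.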
18. ACME Design
19. Adaptive Policy vs. Fixed Policies
- A synthetic workload switches its nature to favor SIZE over LRU at 500 s
- The adaptive policy can switch experts
20. NLANR-RTP Proxy Trace Results
- The adaptive policy chooses to stay with the best fixed policy (GDSF) and is just as good
21. Weights of Virtual Caches
- GDSF was better than the other fixed policies most of the time
- There was still a little bit of MIXING
22. Current Work
- Trying different adaptive mechanisms
- Fixed Share, Variable Share, Game Heuristics
- Using different sets of policies
- Currently implemented: 12 algorithms
- LRU, MRU, FIFO, LIFO, LFU, MFU,
- RAND, LSIZE, GDS, GDSF, LFUDA, GD
- With different workloads
- Web proxy, File System traces, Synthetic
- With different topologies
- N-Level caches, UCSC Network Topology
- With different memory sizes
23. Conclusions 1
- Today's systems are complex and have thousands of configuration parameters
- Real-life scenarios are dynamic
- With static policies:
  - either do continuous manual tuning and monitoring
  - or get poor performance
- We tried to manually select heterogeneous policies for a 2-level cache
  - it was impossible to choose an exact policy for the second level
  - the overall performance of unexpected pairs could be just as good
24. Conclusions 2
- Adaptive Caching using Multiple Experts:
  - automatically switches policies, or selects a mixture of policies, to track the nature of the workload
  - no manual tuning or assumptions about the workload
  - performance will be at least as good as the best fixed policy, or better if there is mixing in the workload
- If all caches adaptively tune to the workload they observe, they do not need to exchange control messages or summarized databases (e.g., for exclusive caching)
- The system is very scalable and allows construction of globally distributed caching clusters
25. Thank You
- Machine Learning Group in Santa Cruz
  - Manfred Warmuth
- Game Theory Group
  - Robert Gramacy, Jonathan Panttaja, Clayton Bjorland
- Storage Systems Research Center (SSRC), UCSC
- Storage Technologies Dept. (STD), HP Labs, Palo Alto
- CERIA and the WDAS Committee
26. References
- [Arlitt99] M. Arlitt et al., "Evaluating Content Management Techniques for Web Proxy Caches," Proceedings of the 2nd Workshop on Internet Server Performance (WISP '99)
- [Bousquet95] O. Bousquet and M. K. Warmuth, "Tracking a Small Set of Experts by Mixing Past Posteriors"
- [Cao97] P. Cao and S. Irani, "Cost-Aware WWW Proxy Caching Algorithms," USENIX Symposium on Internet Technologies and Systems (USITS '97)
- [Jin00] S. Jin and A. Bestavros, "GreedyDual* Web Caching Algorithm: Exploiting the Two Sources of Temporal Locality in Web Request Streams," Proceedings of the 5th International Web Caching and Content Delivery Workshop, 2000
- [LW92] N. Littlestone and M. Warmuth, "The Weighted Majority Algorithm," UCSC-CRL-91-28, revised October 26, 1992
- [Rizzo97] L. Rizzo and L. Vicisano, "Replacement Policies for a Proxy Cache," IEEE/ACM Transactions on Networking, 2000, V8-2, pp. 158-170
27. Backup Slides
28. Future Work
- Workload Characterization
[Figure: FIFO, LRU, LFU, SIZE, and GDSF placed in a time/frequency/size criteria space of unique documents held in limited cache space]
29. Future Work
- Workload Characterization
- Something like: "My workload is of nature 30% LRU, 40% size, 20% LFU"
30. Future Work
- Synthetic Workload Generation
- Adaptivity Benchmarks
[Figure: FIFO, LRU, LFU, SIZE, and GDSF placed in a time/frequency/size criteria space of unique documents held in limited cache space]
1. My workload is of nature 30% LRU, 40% size, 20% LFU
2. Use the stack-distance metric to form request subsequences
3. Merge the subsequences into a synthetic workload
31. Cases for Adaptive Caching
- Where to use adaptive caching:
  - system memory and cooperative caches
  - scalable OBSD clusters
  - Storage Embedded Network (SEN) clusters
32. SEN Device
- A router with embedded volatile and non-volatile storage to be used for object caching
  - via object snooping in trusted routers
  - reduces client response time, network bandwidth, and server load
  - enables globally scalable networked-caching clusters
33. A Case for Adaptive Caching
- SEN vs. Hierarchical Proxies
- UCSC network topology
- Parameters Changed
- workload
- total amount of memory
- link speeds
- departmental correlations
- replacement policies
- Metrics Measured
- hit rates and byte hit rates
- mean response times
- server load reductions
34. Salient Design Features
- Globally Unique Object Identification (GUOID)
- Ad-hoc multicast support
- Backwards compatible
- Operation:
  - SEN nodes cache all bypassing data objects
  - clients request an object
  - a SEN node sends a local copy if it exists
  - and forwards the request otherwise
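A minimal sketch of the operation described above (all class and method names are ours): a node answers from its local cache when it holds the object, and otherwise forwards the request toward the origin server, caching the reply as it passes back through.

```python
# Toy SEN lookup-or-forward: each node keeps a local object store and a
# pointer to the next hop toward the origin server.

class SenNode:
    def __init__(self, upstream=None):
        self.cache = {}            # object id -> data (stand-in for NV storage)
        self.upstream = upstream   # next hop toward the origin server

    def request(self, oid):
        if oid in self.cache:      # local copy exists: answer directly
            return self.cache[oid]
        data = self.upstream.request(oid)   # otherwise forward the request
        self.cache[oid] = data     # cache the bypassing object on the way back
        return data

class Origin:
    """Stand-in origin server that always has the object."""
    def request(self, oid):
        return f"data-for-{oid}"

edge = SenNode(upstream=SenNode(upstream=Origin()))
print(edge.request("obj1"))        # fetched via the chain, cached at each hop
print(edge.request("obj1"))        # now served from the edge node's cache
```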
35. Fixed Share Update [Bousquet95]
- pool = Σ_{i=1..n} (1 - (1-α)^loss(i)) . w_{t+1,i}
- w_{t+1,i} ← (1-α)^loss(i) . w_{t+1,i} + 1/(n-1) . (pool - (1 - (1-α)^loss(i)) . w_{t+1,i})
- where:
  - α is the share rate, α ∈ [0, 1)
  - w_0 = (1/n, 1/n, ..., 1/n)
36. Mixing Update [Bousquet95]
- Mixing Update: w_{t+1} = Σ_{q=0..t} β_{t+1}(q) . w^m_q, where Σ_q β_{t+1}(q) = 1
- Mixing schemes for the coefficients β_{t+1}(q), each giving the current posterior β_{t+1}(t) = 1-α:
  - FS to Start Vector: the remaining mass α goes to q = 0
  - FS to Uniform Past: β_{t+1}(q) = α/t for q = 0..t-1
  - FS to Decaying Past: β_{t+1}(q) ∝ α/(t-q) for q = 0..t-1
[Figure: the three coefficient profiles β_{t+1}(q) over q = 0..t]
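The three schemes can be sketched as follows (function names are ours, the normalization of the decaying-past coefficients is our assumption, and t ≥ 1 is assumed): each returns the coefficients β_{t+1}(q) for q = 0..t, with the current posterior always keeping mass 1 - alpha.

```python
# Three ways to spread the shared mass alpha over past posteriors q < t;
# index t (the current posterior) always gets 1 - alpha.  Assumes t >= 1.

def fs_to_start_vector(t, alpha):
    beta = [0.0] * (t + 1)
    beta[t] = 1 - alpha
    beta[0] = alpha                        # all shared mass goes to the start
    return beta

def fs_to_uniform_past(t, alpha):
    return [alpha / t] * t + [1 - alpha]   # past gets alpha spread evenly

def fs_to_decaying_past(t, alpha):
    raw = [1.0 / (t - q) for q in range(t)]    # weight ~ 1/(t - q)
    z = sum(raw)
    return [alpha * r / z for r in raw] + [1 - alpha]

for scheme in (fs_to_start_vector, fs_to_uniform_past, fs_to_decaying_past):
    beta = scheme(5, 0.1)
    print(scheme.__name__, [round(b, 3) for b in beta])
```

In every scheme the coefficients form a probability distribution over q = 0..t, so the mixing update always produces a valid weight vector.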
37. Another View: Weighted Virtual Caches