1. Who is more adaptive? A.C.M.E. (Adaptive Caching using Multiple Experts)
WDAS, March 2002
- Ismail Ari, Ahmed Amer
- Ethan L. Miller, Scott Brandt, Darrell D. E. Long
2. Introduction
- We describe an adaptive caching technique that finds the current best policy, or mixture of policies, for a workload
- Requires no manual tuning
- Allows exclusive caching without message exchanges
- Scalable distributed caching clusters can be formed
- Scalable distributed data access is achieved
3. Motivations for Proxy Caching
- Exponentially growing number of clients on the Internet
- These clients (we) access data/objects distributed all over the world
- Everybody wants fast response times
- Latency:
  - some objects are far away
  - some objects are very popular and create hot spots in networks and on servers
- Distributed caches bring data closer to the clients and enable data sharing
- They reduce latency, network load, and server load
4. Summarizing Proverb
"With twenty-five years of Internet experience we've learned one way to deal with exponential growth: caching." - Van Jacobson
5. Problems with Caching
- Cache sizes are dwarfed by the unique document space accessed by clients
- Only the most valuable objects should be cached
- What defines "most valuable"?
- Different cache replacement policies assign different values or priorities
- They combine multiple criteria (recency of access, popularity) into a priority key that defines their ordering of objects
6. Which Criteria or Policy to Use?
- FIFO: First In First Out
- LRU: Least Recently Used
- LFU: Least Frequently Used
- GDSF: Greedy Dual Size with Frequency [Cao97, Arlitt99, Jin00]
- LRV: Lowest Relative Value [Rizzo97]
8. Which Criteria or Policy to Use?
- Policies are statically embedded in systems a priori
- But we don't know where and how the system will be used
- Their performance is workload dependent
- Some policies are better than others on certain workloads
- Request streams may have sub-streams that favor other policies
- As the workload changes over time (hour, day, year), the performance of static policies degrades
- As the network topology changes, the workload changes
9. Solution: Be Adaptive
- Systems and workloads are complex and under continuous change
- Manual tuning and monitoring is tedious, if possible at all
- Systems that adapt through message and database exchanges are not scalable
- Choose all policies!
- Automatically adjust to the best mixture of policies that current conditions require
- How to mix cache policies?
10. How to Mix? Biological Motivations
- Imagine all cache policies as species competing for food (documents or objects) in a habitat (the cache)
- The fitness of a species is based on how well it eats
- The fitness of a policy is its hit rate or byte-hit rate
- The population share (or frequency) of a species depends on its fitness
- Highly fit species may starve the others, and if conditions change the whole system collapses
11. How to Mix? Biological Motivations
- Predators (probabilistically) prey on the most frequent (or easiest) species
- Predators protect diversity (mixing) among species
  - by preventing the most fit species from starving the others
- Our predator implementation (the resource manager) both assigns objects to caches and manages cache space
- Problem: we cannot allow duplication, so we assign objects probabilistically
- Problem: lucky draws (unfairness) can cause bad policies to gain fitness
12. How to Mix? Virtual Caches
- We define a pool of virtual caches, each simulating a single cache policy and its object ordering
- Virtual caches act as if they own the whole cache, but they keep only object header information, not the actual data
13. Weighted Experts
- In machine-learning terminology, experts are algorithms (e.g., LRU) that make predictions, denoted by the vector x_t
- The weights of the experts (w_t) represent the quality of their predictions
- The master algorithm predicts with a weighted average of the experts' predictions: ŷ_t = w_t . x_t
- Depending on the true outcome y_t (hit/miss), we incur a loss and then update the weights
- e.g., Loss(ŷ_t, y_t) = (1 - 0)^2 = 1
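As a minimal sketch of the prediction and loss steps above (the function names are ours, not ACME's API): the master's prediction is the dot product of the weight and prediction vectors, scored here with square loss.

```python
# Sketch of the master prediction described above (names are illustrative).
# Each expert i emits a prediction x_t[i] in [0, 1]; the master predicts
# the weighted average and suffers square loss against the true outcome.

def master_predict(weights, predictions):
    """Weighted average of the experts' predictions: y_hat = w_t . x_t."""
    return sum(w * x for w, x in zip(weights, predictions))

def square_loss(y_hat, y):
    """Loss(y_hat, y) = (y_hat - y)^2, e.g. (1 - 0)^2 = 1."""
    return (y_hat - y) ** 2

weights = [0.5, 0.3, 0.2]        # normalized expert weights w_t
predictions = [1.0, 0.0, 1.0]    # expert predictions x_t (hit = 1, miss = 0)
y_hat = master_predict(weights, predictions)   # 0.5*1 + 0.3*0 + 0.2*1
print(square_loss(y_hat, 1))     # the outcome was a hit
```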
14. Fitting Caching into the Expert Framework
- In our paper we describe:
  - a pool of virtual cache policies voting for objects
  - the highest-vote objects stay in the real cache
  - after the hits/misses, weights are updated in proportion to each policy's vote
- In the current implementation:
  - virtual caches tell whether they had hits or misses
  - this is their prediction, or vote
  - their predictions are compared to the true outcome
  - virtual policies that predict the workload well are rewarded with a weight increase, and vice versa
  - the real cache looks like the virtual cache with the highest weight, but is still a mixture of multiple policies
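A toy end-to-end sketch of this scheme, under our own simplifications (headers-only LRU and FIFO simulators, a hand-supplied stream of true outcomes, and a multiplicative reweighting step; none of these names come from the ACME code):

```python
import math

# Each "virtual cache" simulates one policy over object ids only (no data),
# reports hit (1) or miss (0) as its prediction, and is reweighted by how
# well that prediction matched the real outcome.

class VirtualLRU:
    """Headers-only LRU simulator: keeps object ids, not object data."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.order = []                      # most recently used at the end

    def access(self, obj):
        hit = obj in self.order
        if hit:
            self.order.remove(obj)
        elif len(self.order) >= self.capacity:
            self.order.pop(0)                # evict least recently used
        self.order.append(obj)
        return 1 if hit else 0               # prediction: hit or miss

class VirtualFIFO:
    """Headers-only FIFO simulator."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []

    def access(self, obj):
        hit = obj in self.queue
        if not hit:
            if len(self.queue) >= self.capacity:
                self.queue.pop(0)            # evict oldest insertion
            self.queue.append(obj)
        return 1 if hit else 0

def reweight(weights, predictions, outcome, eta=0.5):
    """Multiplicative loss update: experts that mispredict lose weight."""
    new = [w * math.exp(-eta * abs(p - outcome))
           for w, p in zip(weights, predictions)]
    total = sum(new)
    return [w / total for w in new]          # renormalize

experts = [VirtualLRU(2), VirtualFIFO(2)]
weights = [0.5, 0.5]
# (object, true outcome) pairs; in ACME the outcome comes from the real cache
for obj, outcome in [("a", 0), ("b", 0), ("a", 1), ("c", 0), ("a", 1)]:
    predictions = [e.access(obj) for e in experts]
    weights = reweight(weights, predictions, outcome)
print(weights)   # the policy tracking the stream better ends up heavier
```

Here the policy whose hit/miss predictions track the stream better ends up with the larger weight, which is the signal the real cache uses to pick its mixture.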
15. Weight Updates of Virtual Caches
- Discrete loss
- Size-based loss (not all misses are equal!)
  - e.g., f(object size) = log(size)
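A one-function sketch of the size-based loss above (the name `size_loss` and the choice of log base 2 are ours):

```python
import math

# Not all misses are equal: a hit costs nothing, and a miss is penalized
# by log2 of the object size, so missing a large object hurts more than
# missing a small one.
def size_loss(hit, size_bytes):
    return 0.0 if hit else math.log2(size_bytes)

print(size_loss(True, 1 << 20))    # hit: 0.0
print(size_loss(False, 1 << 10))   # missed 1 KB object: 10.0
print(size_loss(False, 1 << 20))   # missed 1 MB object: 20.0
```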
16. Machine Learning Algorithms
- Loss Update (Weighted Majority algorithm [LW92]):
  - w_{t+1,i} = w_{t,i} . e^(-η L_{t,i}) for i = 1..n
  - normalization: divide by Σ_i w_{t+1,i}
- where:
  - η (eta) is the learning rate
  - w_0 = (1/n, 1/n, ..., 1/n), i.e., weights are initialized equally
17. Share Update [HerbsterWarmuth95]
- The Loss Update learns fast, but does not recover fast enough (e^(-η L_{t,i}) = e^(-1) ≈ 1/2.7)
- The curse of multiplicative updates (M. Warmuth)
- We must make sure weights don't become too small
- A pool is created from the shared weights
- The algorithms are forced to share in proportion to their loss
- Weights are redistributed from the pool to make sure all policies keep some minimal weight w_{t+1,i}
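The share step can be sketched as follows (the names and the even redistribution are ours): each expert gives up weight in proportion to its loss, and the pool is split among the other experts, so a policy that has been losing for a long time keeps some minimal weight and can recover quickly when the workload turns in its favor.

```python
# Illustrative fixed-share step: experts shed weight in proportion to
# their loss; the pooled weight is split evenly among the *other* experts,
# which keeps every policy's weight bounded away from zero.

def share_update(weights, losses, alpha=0.1):
    n = len(weights)
    keep = [(1 - alpha) ** L * w for w, L in zip(weights, losses)]
    shared = [w - k for w, k in zip(weights, keep)]   # contributions to the pool
    pool = sum(shared)
    # each expert receives an equal part of what the others shared
    return [k + (pool - s) / (n - 1) for k, s in zip(keep, shared)]

# An expert whose weight has collapsed still receives a share of the pool:
weights = [0.98, 0.01, 0.01]
losses = [5, 0, 0]                 # the dominant expert suddenly starts losing
new = share_update(weights, losses, alpha=0.2)
print(new)                         # weight flows from expert 0 to the others
```

Note that the update conserves total weight: what each expert loses is exactly what the others gain.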
18. ACME Design
19. Adaptive Policy vs. Fixed Policies
- A synthetic workload switches its nature to favor SIZE over LRU at 500 s
- The adaptive policy can switch experts
20. NLANR-RTP Proxy Trace Results
- The adaptive policy chooses to stay with the best fixed policy (GDSF) and is just as good
21. Weights of Virtual Caches
- GDSF was better than the other fixed policies most of the time
- There was still a little bit of MIXING
22. Current Work
- Trying different adaptive mechanisms
- Fixed Share, Variable Share, Game Heuristics
- Using different sets of policies
- Currently implemented: 12 algorithms
- LRU, MRU, FIFO, LIFO, LFU, MFU,
- RAND, LSIZE, GDS, GDSF, LFUDA, GD
- With different workloads
- Web proxy, File System traces, Synthetic
- With different topologies
- N-Level caches, UCSC Network Topology
- With different memory sizes
23. Conclusions 1
- Today's systems are complex and have thousands of configuration parameters
- Real-life scenarios are dynamic
- With static policies:
  - either do continuous manual tuning and monitoring
  - or get poor performance
- We tried to manually select heterogeneous policies for a 2-level cache
  - it was impossible to choose an exact policy for the second level
  - the overall performance of unexpected pairs could be just as good
24. Conclusions 2
- Adaptive Caching using Multiple Experts:
  - automatically switches policies, or selects a mixture of policies, to track the nature of the workload
  - no manual tuning or assumptions about the workload
  - performance will be at least as good as the best fixed policy, or better if there is mixing in the workload
- If all caches adaptively tune to the workload they observe, they do not need to exchange control messages or summarized databases (e.g., for exclusive caching)
- The system is very scalable and allows construction of globally distributed caching clusters
25. Thank You
- Machine Learning Group in Santa Cruz
  - Manfred Warmuth
- Game Theory Group
  - Robert Gramacy, Jonathan Panttaja, Clayton Bjorland
- Storage Systems Research Center (SSRC), UCSC
- Storage Technologies Dept. (STD), HP Labs, Palo Alto
- CERIA and the WDAS Committee
26. References
- [Arlitt99] M. Arlitt et al., "Evaluating Content Management Techniques for Web Proxy Caches," Proceedings of the 2nd Workshop on Internet Server Performance (WISP '99)
- [Bousquet95] O. Bousquet and M. K. Warmuth, "Tracking a Small Set of Experts by Mixing Past Posteriors"
- [Cao97] P. Cao and S. Irani, "Cost-Aware WWW Proxy Caching Algorithms," USENIX Symposium on Internet Technologies and Systems (USITS '97)
- [Jin00] S. Jin and A. Bestavros, "GreedyDual* Web Caching Algorithm: Exploiting the Two Sources of Temporal Locality in Web Request Streams," Proceedings of the 5th International Web Caching and Content Delivery Workshop, 2000
- [LW92] N. Littlestone and M. Warmuth, "The Weighted Majority Algorithm," UCSC-CRL-91-28, revised October 26, 1992
- [Rizzo97] L. Rizzo and L. Vicisano, "Replacement Policies for a Proxy Cache," IEEE/ACM Transactions on Networking, 2000, V8-2, pp. 158-170
27. Backup Slides
28. Future Work
- Workload Characterization
[Figure: FIFO, LRU, LFU, SIZE, and GDSF placed in a time/frequency/size criteria space of unique documents held in limited cache space]
29. Future Work
- Workload Characterization
- Something like: "My workload is of nature 30% LRU, 40% size, 20% LFU"
30. Future Work
- Synthetic Workload Generation
- Adaptivity Benchmarks
[Figure: FIFO, LRU, LFU, SIZE, and GDSF placed in a time/frequency/size criteria space of unique documents held in limited cache space]
1. My workload is of nature 30% LRU, 40% size, 20% LFU
2. Use the stack-distance metric to form request subsequences
3. Merge the subsequences into a synthetic workload
31. Cases for Adaptive Caching
- Where to use adaptive caching:
  - system memory and cooperative caches
  - scalable OBSD clusters
  - Storage Embedded Network (SEN) clusters
32. SEN Device
- A router with embedded volatile and non-volatile storage to be used for object caching
  - via object snooping in trusted routers
  - reduces client response time, network bandwidth, and server load
  - enables globally scalable networked-caching clusters
33. A Case for Adaptive Caching
- SEN vs. Hierarchical Proxies
- UCSC network topology
- Parameters Changed
- workload
- total amount of memory
- link speeds
- departmental correlations
- replacement policies
- Metrics Measured
- hit rates and byte hit rates
- mean response times
- server load reductions
34. Salient Design Features
- Globally Unique Object Identification (GUOID)
- Ad-hoc multicast support
- Backwards compatible
- Operation:
  - SEN nodes cache all bypassing data objects
  - clients request an object
  - a SEN node sends a local copy if it exists
  - and forwards the request otherwise
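A minimal sketch of the operation described above (all class and method names are ours): a node answers from its local cache when it holds the object, and otherwise forwards the request toward the origin server, caching the reply as it passes back through.

```python
# Toy SEN lookup-or-forward: each node keeps a local object store and a
# pointer to the next hop toward the origin server.

class SenNode:
    def __init__(self, upstream=None):
        self.cache = {}            # object id -> data (stand-in for NV storage)
        self.upstream = upstream   # next hop toward the origin server

    def request(self, oid):
        if oid in self.cache:      # local copy exists: answer directly
            return self.cache[oid]
        data = self.upstream.request(oid)   # otherwise forward the request
        self.cache[oid] = data     # cache the bypassing object on the way back
        return data

class Origin:
    """Stand-in origin server that always has the object."""
    def request(self, oid):
        return f"data-for-{oid}"

edge = SenNode(upstream=SenNode(upstream=Origin()))
print(edge.request("obj1"))        # fetched via the chain, cached at each hop
print(edge.request("obj1"))        # now served from the edge node's cache
```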
35. Fixed Share Update [Bousquet95]
- pool = Σ_{i=1..n} (1 - (1-α)^loss(i)) . w_{t+1,i}
- w_{t+1,i} ← (1-α)^loss(i) . w_{t+1,i} + 1/(n-1) . (pool - (1 - (1-α)^loss(i)) . w_{t+1,i})
- where:
  - α is the share rate, α ∈ [0, 1)
  - w_0 = (1/n, 1/n, ..., 1/n)
36. Mixing Update [Bousquet95]
- Mixing Update: w_{t+1} = Σ_{q=0..t} β_{t+1}(q) . w^m_q, where Σ_q β_{t+1}(q) = 1
- Mixing schemes for the coefficients β_{t+1}(q), each giving the current posterior β_{t+1}(t) = 1-α:
  - FS to Start Vector: the remaining mass α goes to q = 0
  - FS to Uniform Past: β_{t+1}(q) = α/t for q = 0..t-1
  - FS to Decaying Past: β_{t+1}(q) ∝ α/(t-q) for q = 0..t-1
[Figure: the three coefficient profiles β_{t+1}(q) over q = 0..t]
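The three schemes can be sketched as follows (function names are ours, the normalization of the decaying-past coefficients is our assumption, and t ≥ 1 is assumed): each returns the coefficients β_{t+1}(q) for q = 0..t, with the current posterior always keeping mass 1 - alpha.

```python
# Three ways to spread the shared mass alpha over past posteriors q < t;
# index t (the current posterior) always gets 1 - alpha.  Assumes t >= 1.

def fs_to_start_vector(t, alpha):
    beta = [0.0] * (t + 1)
    beta[t] = 1 - alpha
    beta[0] = alpha                        # all shared mass goes to the start
    return beta

def fs_to_uniform_past(t, alpha):
    return [alpha / t] * t + [1 - alpha]   # past gets alpha spread evenly

def fs_to_decaying_past(t, alpha):
    raw = [1.0 / (t - q) for q in range(t)]    # weight ~ 1/(t - q)
    z = sum(raw)
    return [alpha * r / z for r in raw] + [1 - alpha]

for scheme in (fs_to_start_vector, fs_to_uniform_past, fs_to_decaying_past):
    beta = scheme(5, 0.1)
    print(scheme.__name__, [round(b, 3) for b in beta])
```

In every scheme the coefficients form a probability distribution over q = 0..t, so the mixing update always produces a valid weight vector.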
37. Another View: Weighted Virtual Caches