1
Who is more adaptive? A.C.M.E. (Adaptive Caching
using Multiple Experts)
WDAS, March 2002
  • Ismail Ari, Ahmed Amer
  • Ethan L. Miller, Scott Brandt, Darrell D. E. Long

2
Introduction
  • We describe an adaptive caching technique that
    can find the current best policy or mixture of
    policies for a workload
  • Requires no manual tuning
  • Allows exclusive caching without message
    exchanges
  • Scalable distributed caching clusters can be
    formed
  • Scalable distributed data access is achieved

3
Motivations for Proxy Caching
  • Exponentially growing number of clients on the
    Internet
  • These clients (we) access data/objects
    distributed all over the world
  • Everybody wants fast response times
  • Latency:
  • some objects are far away
  • some objects are very popular and create hot
    spots on networks and servers
  • Distributed caches bring data closer to the
    clients and enable data sharing
  • They reduce latency, network load and server load

4
Summarizing Proverb
With twenty-five years of Internet experience
we've learned one way to deal with exponential
growth: Caching. - Van Jacobson
5
Problems with Caching
  • Cache sizes are dwarfed when compared to the
    unique document space accessed by clients
  • Only the most valuable objects should be cached
  • What defines "most valuable"?
  • Different cache replacement policies have
    different notions of value or priority
  • They combine multiple criteria (recency of
    access, popularity) into a Priority Key that
    defines their ordering of objects (see the
    sketch below)
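To make the priority-key idea concrete, here is a minimal sketch in the style of GDSF [Cao97]; the function name and the cost parameter are ours, and the formula follows the commonly published GDSF key (clock + frequency × cost / size), not necessarily the exact variant used in any one system.

```python
# Illustrative GDSF-style priority key (a sketch, not production code):
# the key folds recency (an aging "clock"), popularity (frequency), and
# object size into one number; lowest-key objects are evicted first.
def gdsf_priority(clock, frequency, size, cost=1.0):
    return clock + frequency * cost / size

# On each hit the object's key is recomputed with the current clock;
# on eviction the clock advances to the evicted object's key, so stale
# objects age out even if they were once popular.
print(gdsf_priority(clock=0.0, frequency=3, size=1024))  # 0.0029296875
```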

6
Which Criteria or Policy to use?
  • FIFO: First In First Out
  • LRU: Least Recently Used
  • LFU: Least Frequently Used
  • GDSF: Greedy-Dual-Size with Frequency
    [Cao97, Arlitt99, Jin00]
  • LRV: Lowest Relative Value [Rizzo97]
8
Which Criteria or Policy to use?
  • Policies are statically embedded in systems
    a priori
  • But we don't know where and how the system will
    be used
  • Their performance is workload-dependent
  • Some policies are better than others on certain
    workloads
  • Request streams may have sub-streams that favor
    other policies
  • As the workload changes over time (hour, day,
    year), the performance of static policies degrades
  • As the network topology changes, the workload
    changes

9
Solution: Be adaptive
  • Systems and workloads are complex and under
    continuous change
  • Manual tuning and monitoring are tedious, if
    possible at all
  • Systems that adapt by message and database
    exchanges are not scalable
  • Choose all policies!
  • The system automatically adjusts to the best
    mixture of policies that current conditions
    require
  • How to mix cache policies?

10
How to mix? Biological Motivations
  • Imagine all cache policies as species competing
    for food (documents or objects) in a habitat
    (the cache)
  • The fitness of a species is based on how well it
    eats
  • The fitness of a policy is its Hit Rate or Byte-Hit
    Rate
  • The population share (or frequency) of a species
    depends on its fitness
  • Highly fit species may starve the others, and if
    conditions change the whole system collapses

11
How to mix? Biological Motivations
  • Predators (probabilistically) prey on the most
    frequent (or easiest) species
  • Predators protect diversity (mixing) among
    species
  • by keeping the most fit species from starving the
    others
  • Our predator implementation (resource manager)
    both assigns objects to caches and manages
    cache space
  • Problem: cannot allow duplications → assign
    probabilistically
  • Problem: lucky draws (unfairness) can cause bad
    policies to gain fitness

12
How to mix? Virtual Caches
  • We define a pool of virtual caches, each
    simulating a single cache policy and its object
    ordering
  • Virtual caches act as if they had the whole
    cache, but they only keep object header
    information, not the actual data (see the
    sketch below)
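As one way to picture this, here is a header-only virtual cache sketch in Python; VirtualLRU and its fields are our names, and LRU stands in for any of the simulated policies.

```python
# A minimal sketch of a "virtual cache": it replays a policy's eviction
# decisions over object headers (id, size) only, so many policies can
# be simulated for a small fraction of one real cache's memory.
from collections import OrderedDict

class VirtualLRU:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.headers = OrderedDict()   # object id -> size; no payloads

    def access(self, obj_id, size):
        """Return True on a (simulated) hit, False on a miss."""
        if obj_id in self.headers:
            self.headers.move_to_end(obj_id)   # refresh recency
            return True
        self.headers[obj_id] = size
        self.used += size
        while self.used > self.capacity:       # evict least recent
            _, evicted_size = self.headers.popitem(last=False)
            self.used -= evicted_size
        return False
```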

13
Weighted Experts
  • In machine-learning terminology, experts are
    algorithms (e.g. LRU) that make predictions,
    denoted by the vector x_t
  • The weights of the experts (w_t) represent the
    quality of their predictions
  • The Master Algorithm predicts with a weighted
    average of the experts' predictions:
  • ŷ_t = w_t · x_t
  • Depending on the true outcome y_t (hit/miss) we
    incur loss and then update the weights
  • e.g. Loss(ŷ_t, y_t) = (1 - 0)² = 1
    (worked through in the sketch below)
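A small worked instance of the prediction and loss above, with three hypothetical experts and made-up numbers:

```python
# ŷ_t = w_t · x_t with square loss against the true outcome y_t.
weights = [0.5, 0.3, 0.2]          # w_t, sums to 1
predictions = [1.0, 0.0, 1.0]      # x_t: each expert's hit(1)/miss(0) vote

y_hat = sum(w * x for w, x in zip(weights, predictions))  # master prediction
y_true = 0.0                                              # true outcome: miss
loss = (y_hat - y_true) ** 2                              # square loss
print(y_hat, loss)                 # 0.7 0.49
```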

14
Fitting Caching into Expert Framework
  • In our paper we describe
  • A pool of virtual cache policies voting for
    objects
  • The highest vote objects stay in the real cache
  • After the hits/misses their weights are updated
    proportional to their vote
  • In current implementation
  • Virtual caches tell whether they had hits or
    misses
  • This is their prediction or vote
  • Their predictions are compared to the true
    outcome
  • Virtual policies that predict workload well are
    rewarded with a weight increase or vice versa
  • Real cache looks like the virtual cache with the
    highest weight, but is still a mixture of
    multiple policies
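Putting the pieces together, a compressed sketch of one step of this loop; the names, the discrete loss, and η = 1 are our assumptions, and virtual_caches is a list of header-only simulators like the earlier VirtualLRU.

```python
import math

ETA = 1.0  # learning rate (assumed value)

def acme_step(virtual_caches, weights, obj_id, size, real_hit):
    """One request: each virtual cache votes hit/miss, votes are scored
    against the real outcome, and the weights get the multiplicative
    update followed by normalization."""
    votes = [vc.access(obj_id, size) for vc in virtual_caches]
    new_w = [w * math.exp(-ETA * (0.0 if v == real_hit else 1.0))
             for w, v in zip(weights, votes)]
    total = sum(new_w)
    return [w / total for w in new_w]   # normalized posterior weights
```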

15
Weight Updates of Virtual Caches
  • Discrete loss
  • Size-based loss (not all misses are equal!)
  • e.g. f(object size) = log(size) (in code below)
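In code, these two loss variants might look as follows; the function names are ours, and the log scaling follows the f(object size) = log(size) example above.

```python
import math

def discrete_loss(predicted_hit, actual_hit):
    # Every wrong prediction costs a flat 1.
    return 0.0 if predicted_hit == actual_hit else 1.0

def size_based_loss(predicted_hit, actual_hit, size_bytes):
    # Not all misses are equal: a wrong prediction on a large
    # object costs log(size) instead of a flat 1.
    return 0.0 if predicted_hit == actual_hit else math.log(size_bytes)
```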

16
Machine Learning Algorithms
  • Loss Update (Weighted Majority Algorithm [LW92])
  • w_{t+1,i} = w_{t,i} · e^{-η·L_{t,i}} for i = 1..n
  • Normalization: divide by Σ_i w_{t,i}
  • where
  • η (eta) = the learning rate
  • w_0 = (1/n, 1/n, …, 1/n) → weights initialized
    equally (a worked instance follows below)
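A worked instance of the update with assumed numbers: two experts, η = 1, expert 0 suffered loss 1 and expert 1 loss 0.

```python
import math

eta = 1.0
w = [0.5, 0.5]                      # w_0: equal initial weights
losses = [1.0, 0.0]                 # L_t per expert
w = [wi * math.exp(-eta * Li) for wi, Li in zip(w, losses)]
total = sum(w)                      # normalization step
w = [wi / total for wi in w]
print(w)                            # ≈ [0.269, 0.731]
```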

17
Share Update HerbsterWarmuth95
  • The Loss Update learns quickly, but does not
    recover quickly enough (e^{-η·L_{t,i}} = e^{-1}
    ≈ 1/2.7)
  • The curse of multiplicative updates (M. Warmuth)
  • We must make sure weights don't become too small
  • A pool is created from the shared weights
  • The algorithms are forced to share in proportion
    to their loss
  • Weights are redistributed from the pool to make
    sure all policies keep some minimal weight
    (w_{t+1,i})

18
ACME Design
19
Adaptive Policy vs. Fixed Policies
  • Synthetic workload switches its nature to favor
    SIZE over LRU at 500 sec
  • The adaptive policy can switch experts

20
NLANR-RTP Proxy Trace Results
  • The adaptive policy chooses to stay with the best
    fixed policy (GDSF) and is just as good

21
Weights of Virtual Caches
  • GDSF was better than the other fixed policies
    most of the time
  • There was still a little bit of MIXING

22
Current Work
  • Trying different adaptive mechanisms
  • Fixed Share, Variable Share, Game Heuristics
  • Using different sets of policies
  • Currently implemented 12 algorithms
  • LRU, MRU, FIFO, LIFO, LFU, MFU,
  • RAND, LSIZE, GDS, GDSF, LFUDA, GD
  • With different workloads
  • Web proxy, File System traces, Synthetic
  • With different topologies
  • N-Level caches, UCSC Network Topology
  • With different memory sizes

23
Conclusions 1
  • Today's systems are complex and have thousands of
    configuration parameters
  • Real-life scenarios are dynamic
  • With static policies:
  • either do continuous manual tuning and monitoring
  • OR get poor performance
  • We tried to manually select heterogeneous
    policies for a 2-level cache
  • It was impossible to choose an exact policy for
    the second level
  • The overall performance of unexpected pairs could
    be just as good

24
Conclusions 2
  • Adaptive Caching using Multiple Experts
  • Automatically switches policies or selects a
    mixture of policies to track the nature of the
    workload
  • No manual tuning or assumptions about the
    workload
  • Performance will be at least as good as the best
    fixed policy, or better if there is mixing in the
    workload
  • If all caches adaptively tune to the workload
    they observe, they would not need to exchange
    control messages or summarized databases (e.g.
    for exclusive caching)
  • This system is very scalable and allows
    construction of globally distributed caching
    clusters

25
Thank You
  • Machine Learning Group in Santa Cruz
  • Manfred Warmuth
  • Game Theory Group
  • Robert Gramacy, Jonathan Panttaja, Clayton
    Bjorland
  • Storage Systems Research Center (SSRC), UCSC
  • Storage Technologies Dept. (STD), HP Labs, Palo
    Alto
  • CERIA and WDAS Committee

26
References
  • [Arlitt99] M. Arlitt et al., "Evaluating Content
    Management Techniques for Web Proxy Caches,"
    Proceedings of the 2nd Workshop on Internet
    Server Performance (WISP '99)
  • [Bousquet95] O. Bousquet and M. K. Warmuth,
    "Tracking a Small Set of Experts by Mixing Past
    Posteriors"
  • [Cao97] P. Cao and S. Irani, "Cost-Aware WWW
    Proxy Caching Algorithms," Proceedings of the
    USENIX Symposium on Internet Technologies and
    Systems (USITS '97)
  • [HerbsterWarmuth95] M. Herbster and M. K.
    Warmuth, "Tracking the Best Expert," Proceedings
    of the 12th International Conference on Machine
    Learning, 1995
  • [Jin00] S. Jin and A. Bestavros, "GreedyDual* Web
    Caching Algorithm: Exploiting the Two Sources of
    Temporal Locality in Web Request Streams,"
    Proceedings of the 5th International Web Caching
    and Content Delivery Workshop, 2000
  • [LW92] N. Littlestone and M. Warmuth, "The
    Weighted Majority Algorithm," UCSC-CRL-91-28,
    revised October 26, 1992
  • [Rizzo97] L. Rizzo and L. Vicisano, "Replacement
    Policies for a Proxy Cache," IEEE/ACM
    Transactions on Networking, vol. 8, no. 2,
    pp. 158-170, 2000

27
BACKUP SLIDES
28
Future Work
  • Workload Characterization

[Figure: unique documents held in limited cache
space, characterized along time (recency), frequency,
and size axes, with regions favored by LRU, LFU,
FIFO, SIZE, and GDSF]
29
Future Work
  • Workload Characterization

[Figure: same workload-characterization diagram as
the previous slide]
Something like: "My workload is of nature 30% LRU,
40% SIZE, 20% LFU"
30
Future Work
  • Synthetic Workload Generation
  • Adaptivity Benchmarks

[Figure: same workload-characterization diagram as
the previous slides]
1- "My workload is of nature 30% LRU, 40% SIZE,
20% LFU"
2- Use a stack-distance metric to form request
subsequences
3- Merge the subsequences into a synthetic workload
31
Cases for Adaptive Caching
  • Where to use adaptive caching
  • System Memory and Cooperative Caches
  • Scalable OBSD clusters
  • Storage Embedded Network (SEN) Clusters

32
SEN Device
  • A router with embedded volatile and non-volatile
    storage, to be used for object caching
  • via object snooping in trusted routers
  • reduces client response time, network bandwidth,
    and server load
  • enables globally scalable networked-caching
    clusters

33
A case for Adaptive Caching
  • SEN vs. Hierarchical Proxies
  • UCSC network topology
  • Parameters Changed
  • workload
  • total amount of memory
  • link speeds
  • departmental correlations
  • replacement policies
  • Metrics Measured
  • hit rates and byte hit rates
  • mean response times
  • server load reductions

34
Salient Design Features
  • Globally Unique Object Identification (GUOID)
  • Ad-hoc multicast support
  • Backwards compatible
  • Operation:
  • SEN nodes cache all data objects passing through
  • A client requests an object
  • The SEN node sends a local copy if it exists
  • and forwards the request otherwise

35
Fixed Share Update Bousquet95
  • pool = Σ_{i=1..n} (1 - (1-α)^{loss(i)}) · w_{t+1,i}
  • w^{share}_{t+1,i} = (1-α)^{loss(i)} · w_{t+1,i} +
    1/(n-1) · (pool - (1 - (1-α)^{loss(i)}) · w_{t+1,i})
  • where
  • α = the share rate, α ∈ [0, 1)
  • w_0 = (1/n, 1/n, …, 1/n)
    (transcribed to code below)
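Read literally, the update above transcribes to the following sketch; w_after_loss is the weight vector right after the Loss Update, and the function name is ours.

```python
def fixed_share(w_after_loss, losses, alpha):
    """Each expert i gives up a (1 - (1-alpha)**loss_i) fraction of its
    weight to a pool; the pool is split evenly among the other n-1
    experts, so the total weight is preserved."""
    n = len(w_after_loss)
    shared = [(1 - (1 - alpha) ** L) * w
              for L, w in zip(losses, w_after_loss)]
    pool = sum(shared)
    return [(1 - alpha) ** L * w + (pool - s) / (n - 1)
            for L, w, s in zip(losses, w_after_loss, shared)]
```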

36
Mixing Update Bousquet95
  • Mixing Update:
  • w_{t+1} = Σ_{q=0..t} β_{t+1}(q) · w^m_q,
    where Σ_q β_{t+1}(q) = 1
  • Mixing Schemes:

[Figure: three choices of the coefficients β_{t+1}(q)
over the past posteriors q = 0..t; each puts mass 1-α
on the current posterior (q = t). FS to Start Vector:
the remaining α goes to q = 0. FS to Uniform Past:
α/t on each past q. FS to Decaying Past: mass
∝ α/(t-q), favoring the recent past.]
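One possible reading of the three schemes in code; the β values are reconstructed from the slide, so the exact normalization (especially for the decaying scheme) is our assumption. Each function returns the coefficients β_{t+1}(q) for q = 0..t.

```python
def fs_to_start_vector(t, alpha):
    if t == 0:
        return [1.0]                  # no past yet
    beta = [0.0] * (t + 1)
    beta[t] = 1 - alpha               # current posterior keeps most mass
    beta[0] = alpha                   # the rest goes to the start vector
    return beta

def fs_to_uniform_past(t, alpha):
    if t == 0:
        return [1.0]
    beta = [alpha / t] * (t + 1)      # alpha spread evenly over q = 0..t-1
    beta[t] = 1 - alpha
    return beta

def fs_to_decaying_past(t, alpha):
    if t == 0:
        return [1.0]
    raw = [1.0 / (t - q) for q in range(t)]   # ∝ α/(t-q): recent past favored
    z = sum(raw)
    return [alpha * r / z for r in raw] + [1 - alpha]
```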
37
Another View: Weighted Virtual Caches