Title: Web Prefetching: Costs, Benefits and Performance
1Web Prefetching Costs, Benefits and Performance
- Yingyin Jiang, Min-You Wu, Wei Shu
- Dept. of Electrical and Computer Engineering
- The University of New Mexico
- Aug 15, 2002
- WCW 2002, Boulder, Colorado
2Talk Focus
- A solution space of web prefetching
- Several object-selection criteria
- New simple selection algorithm
- Select a good set of objects
- Maximize benefit/cost
- Can be tuned to achieve different goals
- New performance metric
- Balance of benefits (hit rate improvement)
against costs (network bandwidth increase)
3Outline
- Motivation
- Our approach
- Performance Evaluation
- Discussions and Conclusions
4Motivation for Prefetching
- Limited hit ratio by passive caching
- Typical -- 20% to 40%
- Limited by newly introduced, dynamically generated data and rapid changes of objects in the web
- Prefetching can further improve hit ratio (reduce client latency) but sacrifices network bandwidth
- Predict future accesses to objects
- Fetch objects before users request them
5Key Parameters for Prefetching Algorithms
- Object popularity
- Zipf popularity distribution
- $p_i = C / i^{\alpha}$
- The probability of a request for the i-th most popular document is inversely proportional to $i^{\alpha}$
- Object lifetime
- Indication of object modification
- Key factor for the design of prefetching
algorithm
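A minimal sketch of the Zipf popularity model above, assuming the probabilities are normalised so that they sum to 1 over a finite set of objects. The function name `zipf_probabilities` and the object count are illustrative only; the exponent 0.986 is the value quoted later in the talk's experimental settings.

```python
def zipf_probabilities(num_objects, alpha=1.0):
    """Return [p_1, ..., p_N] with p_i = C / i**alpha, where C makes the sum 1."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    c = 1.0 / sum(weights)  # normalising constant C
    return [c * w for w in weights]


# Illustrative use: 1,000 objects (an assumed size, much smaller than the talk's workload).
probs = zipf_probabilities(1000, alpha=0.986)
top10_share = sum(probs[:10])  # fraction of requests going to the 10 most popular objects
```

The skew is the point of the model: a small set of top-ranked objects attracts a large share of requests, which is why popularity is a natural selection criterion.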
6Solution Space for Web Prefetching
- Six models
- Two extreme cases
- Passive caches(non-prefetching)
- Prefetching all objects
- The other four algorithms use different object-selection criteria and fetch objects whose values exceed a threshold
- Popularity
- Lifetime
- Good Fetch
- APL
7Two Simple Schemes
- Popularity
- Keep the most popular objects in the system
- Update these objects immediately whenever they are modified
- Threshold: the object's popularity
- Lifetime
- Keep objects with the longest lifetimes
- Mostly considers the network resource demands
- Threshold: the expected lifetime of the object
8Good Fetch
- Proposed by Venkataramani in [Venkataramani01]
- Attempts to balance the benefit against the cost of keeping an object
- Threshold: probability that a prefetched object is accessed before it changes
- $l_i$: object i's expected lifetime
- $a$: avg. request arrival rate
- $p_i$: object i's popularity
- Prefetch object i if this probability exceeds the threshold
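The formula on this slide did not survive in the text. The sketch below assumes the form commonly given for Good Fetch, $1 - (1 - p_i)^{a \, l_i}$: about $a \, l_i$ requests arrive during the object's lifetime, and each one independently targets object i with probability $p_i$. The function names and the sample numbers are hypothetical.

```python
def good_fetch_probability(p_i, arrival_rate, lifetime):
    """Assumed Good Fetch value: chance that at least one request arriving
    during the object's lifetime is for object i."""
    expected_requests = arrival_rate * lifetime  # a * l_i, all requests to the system
    return 1.0 - (1.0 - p_i) ** expected_requests


def should_prefetch(p_i, arrival_rate, lifetime, threshold):
    """Prefetch object i when the probability exceeds the chosen threshold."""
    return good_fetch_probability(p_i, arrival_rate, lifetime) > threshold


# Hypothetical object: popularity 0.001, 10 requests/s overall, 100 s lifetime.
prob = good_fetch_probability(0.001, arrival_rate=10.0, lifetime=100.0)
decision = should_prefetch(0.001, 10.0, 100.0, threshold=0.5)
```

Arrival rate and lifetime must use the same time unit, since only their product enters the formula.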
9APL
- Attempt to balance benefit against cost
- Threshold: the expected number of requests for object i that arrive during its lifetime, $a \, p_i \, l_i$
- $l_i$: object i's expected lifetime
- $a$: avg. request arrival rate
- $p_i$: object i's popularity
- Prefetch object i if $a \, p_i \, l_i$ exceeds the threshold
10Enhanced APL
- Enhanced APL
- Prefetch object i if $a \, (p_i)^n \, l_i$ exceeds the threshold
- Motivation -- adapt to network status
- When the network has abundant bandwidth, a larger value of n can be used to fetch more popular objects and improve hit ratio
- When the network is congested, a smaller value of n can be used, or prefetching can even be disabled, to save bandwidth
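A minimal sketch of how the exponent n shifts the selection, assuming the score $a \, (p_i)^n \, l_i$ quoted on the APL family slides. Ranking by score and keeping the top k objects is equivalent to picking a threshold that admits k objects. The helper names and the two sample objects are hypothetical.

```python
def apl_score(p_i, arrival_rate, lifetime, n=1.0):
    """Enhanced APL score a * p_i**n * l_i; n = 1 gives plain APL."""
    return arrival_rate * (p_i ** n) * lifetime


def top_k_by_apl(objects, arrival_rate, k, n=1.0):
    """objects: list of (name, popularity, lifetime). Returns the k best-scoring names."""
    ranked = sorted(
        objects,
        key=lambda obj: apl_score(obj[1], arrival_rate, obj[2], n),
        reverse=True,
    )
    return [name for name, _, _ in ranked[:k]]


# Two hypothetical objects: one popular but short-lived, one unpopular but long-lived.
catalog = [("popular", 0.1, 10.0), ("long_lived", 0.01, 1000.0)]
pick_n1 = top_k_by_apl(catalog, arrival_rate=1.0, k=1, n=1.0)
pick_n5 = top_k_by_apl(catalog, arrival_rate=1.0, k=1, n=5.0)
```

With n = 1 the long-lived object wins because its lifetime outweighs its lower popularity. With n = 5 popularity dominates and the popular object is chosen instead.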
11Performance Evaluation for Prefetching
- Evaluation metrics
- Benefit -- hit ratio
- Cost -- bandwidth
- Benefit/cost -- H/B
- Algorithms to be evaluated
- Popularity
- Lifetime
- Good Fetch
- APL
12New Evaluation Metric H/B
- Measure benefit/cost
- Passive caching serves as a baseline for comparison
- Enhanced metric: $H^k/B$
- Emphasize benefit -- hit ratio improvement
- When system has plenty of spare bandwidth, a
small fraction of hit ratio improvement can still
be justified
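The metric's formula is not preserved in this text. The sketch below assumes H is the hit ratio with prefetching divided by the passive-caching hit ratio, and B is the bandwidth with prefetching divided by the passive-caching (demand) bandwidth. The function name is hypothetical; the sample figures are the ones quoted on the bandwidth results slide.

```python
def hk_over_b(hit_prefetch, hit_passive, bw_prefetch, bw_passive, k=1):
    """Assumed definition: (relative hit ratio) ** k / (relative bandwidth)."""
    hit_gain = hit_prefetch / hit_passive   # H: benefit relative to passive caching
    bw_growth = bw_prefetch / bw_passive    # B: cost relative to passive caching
    return hit_gain ** k / bw_growth


# Figures from the talk: hit ratio 39.54% vs. 24.3%, bandwidth 113.50 kbps vs. 60.57 kbps.
h1 = hk_over_b(39.54, 24.3, 113.50, 60.57, k=1)  # roughly 0.87
h2 = hk_over_b(39.54, 24.3, 113.50, 60.57, k=2)  # roughly 1.41
```

Under this assumed definition the same operating point falls below 1 for k = 1 and above 1 for k = 2, which illustrates how a larger k gives more weight to hit ratio improvement.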
13Evaluation Methodology
- Analytical simulation evaluation
- Give a proof of concept for performance of different algorithms
- Experimental settings
- Poisson model of user request arrival
- Workload of one million objects [Douglis97, Breslau99, Nishikawa98]
- Zipf popularity distribution, with parameter 0.986 [Breslau99]
- Object lifetime distribution obtained from [Douglis97]
- Fixed object size: 10 KB [Bray96, Williams96, Abdulla98, Arlitt99]
- No correlation between lifetimes, sizes, popularities [Crovella98, Breslau99]
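A rough sketch of how a request trace matching these settings could be generated: exponential inter-arrival gaps give a Poisson arrival process, and object ranks are drawn from a Zipf distribution with exponent 0.986. This is not the authors' simulator. The function names, the arrival rate, and the reduced object count (10,000 rather than one million, to keep the example quick) are assumptions.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def zipf_cdf(num_objects, alpha):
    """Cumulative distribution over ranks 1..N for p_i proportional to 1 / i**alpha."""
    weights = [1.0 / rank ** alpha for rank in range(1, num_objects + 1)]
    total = sum(weights)
    return [c / total for c in accumulate(weights)]


def generate_trace(num_requests, arrival_rate, cdf, seed=0):
    """Return a list of (timestamp, object_rank) pairs."""
    rng = random.Random(seed)
    now = 0.0
    trace = []
    for _ in range(num_requests):
        now += rng.expovariate(arrival_rate)  # exponential gap => Poisson arrivals
        # Inverse-CDF sampling; the min() guards against float rounding at the tail.
        rank = min(bisect_left(cdf, rng.random()) + 1, len(cdf))
        trace.append((now, rank))
    return trace


cdf = zipf_cdf(10_000, alpha=0.986)
trace = generate_trace(1_000, arrival_rate=5.0, cdf=cdf)
```

Object lifetimes and sizes are left out here; per the settings above they would be drawn independently of popularity.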
14Distribution of Object Lifetimes [Douglis97]
- We vary the mean lifetime of objects across several orders of magnitude
- Shift factor 0, mean 3.8 months
- Shift factor -2, mean 1.2 days
- Shift factor -4, mean 16.7 minutes
- The shift factor denotes the horizontal
displacement along the lifetime axis (on log
scale) of the Cumulative Distribution Function
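A short check of the three means above, assuming that a shift factor s on a log10 axis multiplies every lifetime (and hence the mean) by 10^s, and taking a month as 365.25 / 12 days. The function name is hypothetical.

```python
DAYS_PER_MONTH = 365.25 / 12  # assumption: average calendar month


def shifted_mean_days(base_mean_months, shift_factor):
    """Mean lifetime in days after shifting the distribution by 10**shift_factor."""
    return base_mean_months * DAYS_PER_MONTH * 10.0 ** shift_factor


mean_days_minus2 = shifted_mean_days(3.8, -2)                   # about 1.2 days
mean_minutes_minus4 = shifted_mean_days(3.8, -4) * 24 * 60      # about 16.7 minutes
```

Both values agree with the figures on the slide, so each step of 2 in the shift factor shortens lifetimes by a factor of 100.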
15Results -- Hit Ratio
[Figure: hit ratio vs. Log10(# of prefetched objects); panels: shift factor -4 and shift factor 0]
- Popularity -- the highest hit ratio
- APL (n = 1) works very close to GoodFetch
- GoodFetch and APL (n = 1) work closer to Lifetime at longer mean lifetime, and closer to Popularity at shorter mean lifetime
- Lifetime -- the lowest hit ratio
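One way to see why APL with n = 1 tracks GoodFetch so closely, assuming the GoodFetch probability has the form $1 - (1 - p_i)^{a \, l_i}$ (the slide's own formula is not preserved here) and that individual popularities are small:

```latex
\begin{align*}
P_{\text{goodFetch}} &= 1 - (1 - p_i)^{a\,l_i}
  = 1 - e^{\,a\,l_i \ln(1 - p_i)} \\
  &\approx 1 - e^{-a\,p_i\,l_i}
  \qquad (p_i \ll 1,\ \text{since } \ln(1 - p_i) \approx -p_i)
\end{align*}
```

Because $1 - e^{-x}$ is strictly increasing in $x$, thresholding this probability at $T$ is approximately the same as thresholding the APL value $a \, p_i \, l_i$ at $-\ln(1 - T)$, so the two criteria select nearly the same objects.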
16Results -- Bandwidth
[Figure: BW (kbps) vs. Log10(# of prefetched objects); shift factor -4]
- Popularity consumes the most network bandwidth compared to the others
- GoodFetch and APL obtain a significant improvement in hit ratio at the expense of a moderate bandwidth increase
- E.g., when prefetching 0.1% of objects, hit ratio rises by about 15 percentage points (39.54% vs. 24.3%)
- Total bandwidth < 2 x demand bandwidth (113.50 kbps vs. 60.57 kbps)
- Lifetime consumes the smallest amount of bandwidth
17Results -- H/B
[Figure: H/B vs. Log10(# of prefetched objects); panels: shift factor 0 and shift factor -4]
- Popularity -- drops quickly, not comparable with the others
- GoodFetch and APL -- attain high H/B values and show their effectiveness at maximizing benefit/cost
- Lifetime -- decreases slowly from the very beginning
18APL Family Hit Ratio
[Figure: hit ratio vs. Log10(# of prefetched objects)]
- $a \, (p_i)^n \, l_i$, n = 0.5, 1, 2, 5
- n = 1: APL -> Good Fetch, maximizes benefit/cost
- n > 1: APL -> Popularity, increases hit ratio
- n < 1: APL -> Lifetime, reduces bandwidth consumption
19APL Family Bandwidth
[Figure: BW (kbps) vs. Log10(# of prefetched objects)]
- E.g., with n = 5, APL's hit ratio is very close to Popularity's, while its bandwidth cost is lower.
20APL Family H/B Ratio
[Figure: H/B vs. Log10(# of prefetched objects)]
- From the H/B point of view, APL can be biased toward popular or long-lived objects without sacrificing too much benefit/cost
- n = 1: achieves the best benefit/cost
- n > 1: gains more hit ratio with fair BW consumption
- n < 1: reduces bandwidth while keeping a reasonable hit ratio
21Enhanced $H^k/B$
- Recall -- why do we extend H/B to $H^k/B$?
- Emphasize hit ratio improvement when evaluating benefit/cost
- When evaluating with $H^k/B$, a small fraction of hit ratio improvement can still be justified even at the cost of a disproportionate bandwidth increase
22$H^k/B$ Evaluation
[Figure: $H^k/B$ vs. Log10(# of prefetched objects)]
- A higher k allows more objects to be prefetched -> $H^k/B$ encourages more hit ratio improvement
- For $H^k/B > 1$, how many objects can be prefetched?
- k = 1 -- 700; k = 2 -- 2,000; k = 3 -- 7,000; k = 4 -- 30,000; k = 5 -- 200,000
23Discussions
[Figure: prefetching strategies plotted on axes of hit ratio and bandwidth]
- We obtain a solution space for prefetching where
different strategies lie along axes of hit ratio
and bandwidth with different performance
24Conclusions
- We propose a new prefetching algorithm, $AP^nL$, that can adapt to different network conditions by varying n
- Prefetching must consider both object popularity and lifetime in order to significantly improve hit ratios at modest cost