Title: Objective-Optimal Algorithms for Long-term Web Prefetching
1Objective-Optimal Algorithms for Long-term Web
Prefetching
- Bin Wu and Ajay Kshemkalyani
- Dept. of Computer Science,
- Univ. of Illinois at Chicago
- ajayk_at_cs.uic.edu
2Outline
- Problem definition and background
- Web prefetching algorithms
- Performance metrics
- Objective-Greedy algorithms (O(n) time)
- Hit rate greedy (also hit rate optimal)
- Bandwidth greedy (also bandwidth optimal)
- H/B greedy
- H/B-Optimal algorithm (expected O(n) time)
- Simulation results
- Conclusions
3Introduction
- Web caching reduces user-perceived latency
- Client-Server mode
- Bottleneck occurs at server side
- Means of improving performance
- local cache, proxy server, server farm, etc.
- Cache management: LRU, Greedy Dual-Size, etc.
- On-demand caching vs. (long-term) prefetching
- Prefetching is effective in dynamic environments.
- Clients subscribe to web objects
- Server pushes fresh copies into web caches
- Selection of prefetched objects is based on
long-term statistical characteristics, maintained
by content distribution servers (CDS)
4Introduction
- Web prefetching
- Caches web objects in advance
- Updated by web server
- Reduces retrieval latency and user access time
- Requires more bandwidth and increases traffic.
- Performance metrics
- Hit rate
- Bandwidth usage
- Balance of the two
5Object Selection Criteria
- Popularity
- (Access frequency)
- Lifetime
- Good Fetch
- APL
6Web Object Characteristics
- Access frequency
- Zipf-like request model is used in web traffic
modeling.
- Relates the access frequency p of a web object to its
popularity rank i
7Web Object Characteristics
- The generalized Zipf-like distribution of web
requests is calculated as $p_i = k / i^{\alpha}$
- k is a normalization constant, i is the object ID
(popularity rank), and $\alpha$ is the Zipf parameter
- Reported values of $\alpha$: 0.986 (Cunha et al.),
0.75 (Nishikawa et al.) and
0.64 (Breslau et al.)
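As an illustration of the Zipf-like model on this slide, the sketch below computes normalized request probabilities for n objects. The function name `zipf_probabilities` and the choice n = 1000 are assumptions made for the example, not part of the slides.

```python
def zipf_probabilities(n, alpha):
    """Return [p_1, ..., p_n] with p_i = k / i**alpha.

    k is the normalization constant that makes the probabilities sum to 1.
    """
    weights = [1.0 / i ** alpha for i in range(1, n + 1)]
    k = 1.0 / sum(weights)
    return [k * w for w in weights]


# Example: 1000 objects with alpha = 0.75 (one of the values quoted above).
probs = zipf_probabilities(1000, 0.75)
```

A smaller alpha flattens the distribution, so requests are spread over more objects.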
8Web Object Characteristics
- Size of Objects
- Average object size: 10-15 KB.
- No strong correlation between object size and its
access frequency.
- Lifetime of web objects
- Average time interval between updates
- Weak correlation between access frequency and
lifetime.
9Caching Architecture
- Prefetching selection algorithms use these global
statistics as input
- Estimates of object reference frequencies
- Estimates of object lifetimes
- Content distribution servers cooperate to
maintain these statistics
- When an object is updated in the original server,
the new version will be sent to any cache that
has subscribed to it.
10Solution space for web prefetching
- Two extreme cases
- Passive caches (non-prefetching)
- Least network bandwidth and lowest cache hit rate
- Prefetching all objects
- 100% cache hit rate
- Huge amount of unnecessary bandwidth
- Existing algorithms use different
object-selecting criteria and fetch the objects
whose criterion value exceeds a threshold.
11Steady State Properties
- Steady state hit rate for object i
is defined as the freshness factor, f(i)
- Overall hit rate
- Especially,
- (Venkataramani et al.)
12Steady State Properties
- Steady state bandwidth for object i
- Total bandwidth
- Especially
13Objective Metrics
- Hit rate benefit
- Bandwidth cost
- H/B model: balance of benefit and cost
- Basic H/B
- Enhanced H/B
- (Jiang et al.)
14Existing Prefetching Algorithms
- Popularity (Markatos et al.)
- Keep the most popular objects in the system
- Update these objects immediately when they change
- Criterion: the object's popularity
- Expected to achieve high hit rate
- Lifetime (Jiang et al.)
- Keep objects with longest lifetimes
- Mostly considers the network resource demands
- Threshold: the expected lifetime of the object
- Expected to minimize bandwidth usage
15Existing Prefetching Algorithms
- Good Fetch (Venkataramani et al.)
- Computes the probability that an object is
accessed before it changes.
- Prefetch objects with high probability of being
accessed during their average lifetime
- Prefetch object i if the probability exceeds
the threshold.
- Objects with higher access frequencies and longer
update intervals are more likely to be prefetched
- Balances the benefit (hit rate increase) against
the cost (bandwidth increase) of keeping an
object.
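The selection rule on this slide can be sketched as follows. The formula did not survive in the transcript, so the probability used here, 1 - (1 - p_i)^(a * l_i) with a the total request rate and l_i the average lifetime, is an assumed form and should be checked against Venkataramani et al. The function names and sample objects are hypothetical.

```python
def good_fetch_probability(p_i, a, lifetime):
    """Probability that an object is accessed before it changes.

    ASSUMED form: 1 - (1 - p_i) ** (a * lifetime), where p_i is the access
    probability, a the total request rate, and lifetime the mean update interval.
    """
    return 1.0 - (1.0 - p_i) ** (a * lifetime)


def good_fetch_select(objects, a, threshold):
    """Return names of objects whose probability exceeds the threshold.

    objects is a list of (name, p_i, lifetime) tuples.
    """
    return [name for name, p_i, lifetime in objects
            if good_fetch_probability(p_i, a, lifetime) > threshold]


# Hypothetical data: a popular object and a rarely requested one.
sample = [("x", 0.5, 1.0), ("y", 0.01, 1.0)]
chosen = good_fetch_select(sample, a=2, threshold=0.5)
```

Both a higher access probability and a longer lifetime push the value toward 1, which matches the slide's remark about which objects are more likely to be prefetched.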
16Existing Prefetching Algorithms
- APL (Jiang et al.)
- Computes apl values of web objects.
- apl of an object represents the expected number of
accesses during its lifetime
- Prefetch object i if its apl exceeds the threshold.
- Tends to improve hit rate; attempts to balance
benefit (hit rate) against cost (bandwidth).
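A minimal sketch of the APL rule, assuming the expected number of accesses during a lifetime is the product a * p_i * l_i (total request rate times access probability times mean lifetime). That product, the function names, and the sample numbers are illustrative assumptions, not taken from the slides.

```python
def apl(p_i, a, lifetime):
    """Expected number of accesses during one lifetime (assumed a * p_i * l_i)."""
    return a * p_i * lifetime


def apl_select(objects, a, threshold):
    """Return names of objects whose apl exceeds the threshold.

    objects is a list of (name, p_i, lifetime) tuples; lifetime and a must
    use the same time unit (hours in the example below).
    """
    return [name for name, p_i, lifetime in objects
            if apl(p_i, a, lifetime) > threshold]


# Hypothetical data: 100 requests per hour in total.
sample = [("news", 0.02, 0.5), ("logo", 0.01, 24.0)]
chosen = apl_select(sample, a=100, threshold=2.0)
```

Here the less popular but long-lived object is selected, which shows how lifetime can outweigh popularity under this criterion.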
17Existing Prefetching Algorithms
- Enhanced APL
- n > 1: prefers objects with higher popularity
(emphasizes hit rate)
- n < 1: prefers objects with longer lifetime
(emphasizes network bandwidth)
18Objective-Greedy Algorithms
- Existing algorithms choose prefetching criteria
based on intuitions
- These intuitions are not aimed at any specific
performance metric
- These intuitions consider only individual
objects' characteristics, not the global impact
- None of them gives optimal performance based on
any metric
- Simple counter-examples can be shown
19Objective-Greedy Algorithms
- Objective-Greedy algorithms select criteria to
intentionally improve performance based on
various metrics.
- E.g., the Hit Rate-Greedy algorithm aims to improve
the overall hit rate and thus reduce the latency of
object requests.
20H/B-Greedy Prefetching
- Consider the H/B value of on-demand caching
- If object j is prefetched, then H/B is updated
to
21H/B-Greedy Prefetching
- We define
-
-
- as the increase factor of object j, incr(j).
- incr(j) indicates the amount by which H/B can be
increased if object j is selected.
22H/B-Greedy Prefetching
- H/B-Greedy prefetching prefetches those m objects
with greatest increase factors.
- The selection is based on the effect on the hit
rate by prefetching individual objects.
- H/B-Greedy is still not an optimal algorithm in
terms of H/B value.
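The H/B-Greedy rule can be sketched as below. The exact formula for incr(j) is missing from the transcript, so this sketch assumes it is the ratio of H/B after prefetching object j alone to the on-demand H/B. The per-object changes `dH` and `dB`, the function names, and the numbers are hypothetical.

```python
import heapq


def increase_factor(H, B, dH, dB):
    """Assumed incr(j): (H/B after prefetching j) divided by (on-demand H/B)."""
    return ((H + dH) / (B + dB)) / (H / B)


def hb_greedy(H, B, deltas, m):
    """Return the m object names with the largest increase factors.

    deltas maps a name to (dH, dB): the hit rate gained and the bandwidth
    added if that object is prefetched.
    """
    return heapq.nlargest(
        m, deltas, key=lambda j: increase_factor(H, B, *deltas[j]))


# Hypothetical on-demand values and per-object changes.
deltas = {"a": (0.05, 5.0), "b": (0.01, 20.0), "c": (0.02, 1.0)}
chosen = hb_greedy(0.4, 100.0, deltas, m=2)
```

Each factor is computed against the on-demand baseline only, so the factors ignore how selected objects interact; this is one way to see why the greedy choice need not be H/B-optimal. Note that `heapq.nlargest` runs in O(n log m); an O(n) bound would need a linear-time selection routine instead.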
24Hit Rate-Greedy Prefetching
- To maximize the overall hit rate given the number
of objects to prefetch, m, we select the m
objects with the greatest hit rate contribution
- This algorithm is optimal in terms of hit rate.
25Bandwidth-Greedy Prefetching
- To minimize the total bandwidth given m, the
number of objects to prefetch, we select the m
objects with least bandwidth contribution
- Bandwidth-Greedy Prefetching is optimal in terms
of bandwidth consumption.
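Both single-objective rules reduce to picking the m best per-object contributions. The sketch below assumes those contributions are already computed and additive (the totals are sums over objects), which is what makes the greedy choice optimal for a fixed m. The dictionaries and function names are hypothetical.

```python
import heapq


def hit_rate_greedy(hit_gain, m):
    """Pick the m objects with the largest hit rate contribution."""
    return heapq.nlargest(m, hit_gain, key=hit_gain.get)


def bandwidth_greedy(bw_cost, m):
    """Pick the m objects with the smallest bandwidth contribution."""
    return heapq.nsmallest(m, bw_cost, key=bw_cost.get)


# Hypothetical per-object contributions.
hit_gain = {"a": 0.05, "b": 0.01, "c": 0.02}
bw_cost = {"a": 5.0, "b": 20.0, "c": 1.0}

by_hit = hit_rate_greedy(hit_gain, 2)
by_bw = bandwidth_greedy(bw_cost, 2)
```

The two rules can disagree on which objects to keep, which is the motivation for a combined metric such as H/B.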
26H/B-Optimal Prefetching
- The optimal algorithm for the H/B metric is provided by a
solution to the following selection problem.
- This is equivalent to the maximum weighted average
problem with pre-selected items.
27Maximum Weighted Average
- Maximum Weighted Average Problem
- Totally n courses, with different credit hours
and scores
- select m (m < n) courses
- maximize the GPA of the m selected courses
- Solution
- If m = 1
- Then select the course with the highest score
- What if m > 1?
- A misleading intuition: select the m courses with
highest scores.
28A Course Selection Problem
Courses A B C D E F G H
Credit hours 5.0 3.0 6.0 1.0 2.0 4.0 3.0 6.0
Scores 70 90 95 85 75 60 65 80
- If m = 2
- If we select the 2 courses with highest scores,
C and B, then GPA = 93.33
- But if we select C and D, then GPA = 93.57
- Question: how to select m courses such that the
GPA is maximized?
- Answer: Eppstein and Hirschberg solved this
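The numbers on this slide can be checked with a short exhaustive search. This is only a brute-force check over all pairs, not the Eppstein and Hirschberg algorithm; the variable and function names are illustrative.

```python
from itertools import combinations

# Course table from the slide: name -> (credit hours, score).
courses = {
    "A": (5.0, 70), "B": (3.0, 90), "C": (6.0, 95), "D": (1.0, 85),
    "E": (2.0, 75), "F": (4.0, 60), "G": (3.0, 65), "H": (6.0, 80),
}


def gpa(names):
    """Credit-weighted average score of the named courses."""
    credits = sum(courses[c][0] for c in names)
    points = sum(courses[c][0] * courses[c][1] for c in names)
    return points / credits


# Misleading intuition: take the two highest scores.
by_score = sorted(courses, key=lambda c: courses[c][1], reverse=True)[:2]

# Exhaustive search over all pairs.
best_pair = max(combinations(sorted(courses), 2), key=gpa)
```

The search confirms that pairing C with the low-credit course D beats pairing it with B, because D's lower score is diluted by its single credit hour.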
29With Pre-selected items
Courses A B C D E F G H
Credit hours 5.0 3.0 6.0 1.0 2.0 4.0 3.0 6.0
Scores 70 90 95 85 75 60 65 80
- Maximum Weighted Average with pre-selected items
- Totally n courses, with different credit hours
and scores
- Courses A and E (for example) must be selected
- Select m additional courses (m is given, m < n)
such that the resulting GPA is maximized
30- Pre-selection is not trivial
- Selection domain B to I, no pre-selection, m = 2
- optimal subset {B, C}, GPA = 88.33
- Selection domain B to I, A is pre-selected, m = 2
- one candidate subset {A, D, H}, GPA = 75.61
- better than {A, B, C}, GPA = 70.625
- Conclusion: {B, C} is not contained in the optimal subset
for the pre-selected problem.
Course A B C D E F G H I
Credit 5.0 1.0 2.0 10.0 1.5 2.5 2.0 3.0 4.0
Score 60 95 85 83 63 71 80 77 65
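The effect of pre-selection in this table can be reproduced with an exhaustive search. The sketch below is a brute-force check under the slide's data, not the H/B-Optimal algorithm; `best_subset` and its parameters are hypothetical names.

```python
from itertools import combinations

# Course table from this slide: name -> (credit hours, score).
courses = {
    "A": (5.0, 60), "B": (1.0, 95), "C": (2.0, 85), "D": (10.0, 83),
    "E": (1.5, 63), "F": (2.5, 71), "G": (2.0, 80), "H": (3.0, 77),
    "I": (4.0, 65),
}


def gpa(names):
    """Credit-weighted average score of the named courses."""
    credits = sum(courses[c][0] for c in names)
    points = sum(courses[c][0] * courses[c][1] for c in names)
    return points / credits


def best_subset(candidates, m, preselected=()):
    """Try every m-subset of candidates; GPA includes the pre-selected courses."""
    return max(combinations(candidates, m),
               key=lambda extra: gpa(tuple(preselected) + extra))


free_choice = best_subset("BCDEFGHI", 2)
with_a = best_subset("BCDEFGHI", 2, preselected=("A",))
```

Without pre-selection the search returns B and C. With A pre-selected it returns B and D (GPA 76.5625), so the high-credit course D becomes valuable for pulling the average up, and {B, C} is no longer the best pair.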
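31H/B-Optimal vs. Course selection
- The problem is formulated as: choose a subset $S$ with $|S| = m$ that
maximizes $(v_0 + \sum_{i \in S} v_i) / (w_0 + \sum_{i \in S} w_i)$, where
$w_i$ is the credit hours of course i and $v_i$ is credit hours times score
- Where $v_0 = 5.0 \times 70 + 2.0 \times 75 = 500$ and $w_0 = 5.0 + 2.0 = 7.0$
in the previous example.
- Equivalent to the H/B-Optimal selection problem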
32H/B-Optimal vs. Course selection
33H/B-Optimal algorithm design
- The selection of m courses is not trivial
- For course i, we define auxiliary function
- And for a given number m, we define a Utility
function
34H/B-Optimal algorithm
- Lemma 1
- Suppose A is the maximum GPA we are computing,
then for any subset $S' \subseteq S$ with $|S'| = m$
- Lemma 1 indicates that the optimal subset
contains those courses that have the m largest $r_i(A)$
values
35H/B-Optimal algorithm design
- n = 6, m = 4
- Each line is $r_i(x)$
- Assume we know A
- The optimal subset has the 4 courses
with the largest $r_i(A)$ values.
- Dilemma: A is unknown
36H/B-Optimal algorithm design
- Lemma 2
- Lemma 2 narrows the range of A
- $(x_l, x_r)$ is the current A-range
37H/B-Optimal algorithm design
- If $F(x_l) > 0$ and $F(x_r) < 0$, then $A \in (x_l, x_r)$
- Compute the value of $F((x_l + x_r)/2)$
- - if $F((x_l + x_r)/2) > 0$, then $A > (x_l + x_r)/2$
- - if $F((x_l + x_r)/2) < 0$, then $A < (x_l + x_r)/2$
- - if $F((x_l + x_r)/2) = 0$, then $A = (x_l + x_r)/2$
(Lemma 2)
- Narrow down the range of A by half
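The halving step can be sketched as a plain bisection. The definitions of the auxiliary and utility functions are missing from the transcript, so this sketch assumes r_i(x) = v_i - x * w_i and F(x) = (v_0 - x * w_0) plus the sum of the m largest r_i(x), which is the usual parametric form for maximum weighted average problems. The function names are hypothetical, and this is a simple numeric bisection, not the randomized linear-time algorithm.

```python
import heapq


def utility(x, items, m, v0=0.0, w0=0.0):
    """Assumed F(x): pre-selected part plus the m largest r_i(x) = v_i - x * w_i."""
    top = heapq.nlargest(m, (v - x * w for v, w in items))
    return (v0 - x * w0) + sum(top)


def max_weighted_average(items, m, v0=0.0, w0=0.0, tol=1e-9):
    """Bisect on x until the range (x_l, x_r) containing A is narrower than tol."""
    ratios = [v / w for v, w in items]
    if w0 > 0:
        ratios.append(v0 / w0)
    x_l, x_r = min(ratios), max(ratios)  # A lies between the extreme scores
    while x_r - x_l > tol:
        mid = (x_l + x_r) / 2
        if utility(mid, items, m, v0, w0) > 0:
            x_l = mid  # F(mid) > 0 means A > mid
        else:
            x_r = mid  # F(mid) <= 0 means A <= mid
    return (x_l + x_r) / 2


# Courses B to I from the pre-selection example, as (credit, score) pairs.
credits_scores = [(1.0, 95), (2.0, 85), (10.0, 83), (1.5, 63),
                  (2.5, 71), (2.0, 80), (3.0, 77), (4.0, 65)]
items = [(c * s, c) for c, s in credits_scores]  # (v_i, w_i)

# Course A (5 credits, score 60) is pre-selected; choose m = 2 more.
best_gpa = max_weighted_average(items, m=2, v0=5.0 * 60, w0=5.0)
```

Each evaluation of F costs O(n log m) here, so the bisection is easy to write but does not reach the expected linear time discussed later in the slides.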
38H/B-Optimal algorithm design
- Why keep on narrowing down the range of A ?
- If the intersection of $r_j(x)$ and $r_k(x)$ falls outside the
range, then the ordering of $r_j(x)$ and $r_k(x)$ is
determined within the range, and so is that of $r_j(A)$ and
$r_k(A)$, by comparing their slopes.
- If the range is narrow enough that there are no
intersections of $r(x)$ lines within the range,
then the total ordering of all $r(A)$ values is
determined.
- Now our optimization problem is solved: just select
the m candidates with the highest $r(A)$ values.
- This is the main idea for solving the optimization problem.
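The pairwise test described on this slide can be sketched as a small classifier. It assumes each course has a line r(x) = v - x * w (the definition is missing from the transcript) and that A is known to lie in the open range (x_l, x_r). The function name and the letter labels returned are illustrative.

```python
def classify(item_i, item_k, x_l, x_r):
    """Order course k against course i, given that A lies in (x_l, x_r).

    Items are (v, w) pairs with the assumed line r(x) = v - x * w.
    Returns "X" if r_k > r_i on the whole range, "Y" if r_k < r_i,
    "E" if the two lines coincide, and "Z" if they cross inside the range.
    """
    (v_i, w_i), (v_k, w_k) = item_i, item_k
    # d(x) = r_k(x) - r_i(x) is linear, so its endpoint signs decide the order.
    d_left = (v_k - v_i) - x_l * (w_k - w_i)
    d_right = (v_k - v_i) - x_r * (w_k - w_i)
    if d_left == 0 and d_right == 0:
        return "E"
    if d_left >= 0 and d_right >= 0:
        return "X"
    if d_left <= 0 and d_right <= 0:
        return "Y"
    return "Z"


# The lines for (10, 1) and (30, 3) intersect at x = 10.
pivot, other = (10.0, 1.0), (30.0, 3.0)
outcomes = [classify(pivot, other, 0.0, 5.0),
            classify(pivot, other, 5.0, 15.0),
            classify(pivot, other, 12.0, 20.0)]
```

Once the range excludes the intersection point, the undetermined case disappears, which is why repeated halving eventually settles every comparison.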
39H/B-Optimal algorithm design
- However, the total ordering requires $O(n^2)$ time
complexity
- A randomized approach is used instead; this
randomized algorithm
- Iteratively reduces the problem domain into a
smaller one.
- Maintains 4 sets X, Y, E, Z,
initially empty
40H/B-Optimal algorithm design
- In each iteration, randomly select a course i
and compare it with each of the other courses,
k.
- There are 4 possibilities
- 1). if $r_k(A) > r_i(A)$: insert k into set X
- 2). if $r_k(A) < r_i(A)$: insert k into set Y
- 3). if $w_k = w_i$ and $v_k = v_i$: insert k into set E
- 4). if undetermined: insert k into set Z
- Now do the following loop
- loop
- narrow the range of A by half
- compare $r_i(A)$ with $r_k(A)$ for k in Z
- if appropriate, move k to X or Y, accordingly
- until Z is sufficiently small (i.e., $|Z| < |S|/32$)
41H/B-Optimal algorithm design
- The sets X or Y have enough members.
- Next, examine and compare the sizes of X, Y and
E
42H/B-Optimal algorithm design
- 1). If $|X| + |E| > m$
- At least m courses have r(A) values
greater than the r(A) value of every course in Y. All
members of Y may be removed. Then $S = S - Y$
43H/B-Optimal algorithm design
- 2). If $|Y| + |E| > |S| - m$
- All members of X are among the top m
courses, so all members of X must be in the optimal
set. Collapse X into a single course (this course
is included in the final optimal set). Then
- $|S| = |S| - |X| + 1$
- $m = m - |X| + 1$.
44H/B-Optimal algorithm design
- In either case, the resulting domain has reduced
size.
- By iteratively removing or collapsing courses,
the problem domain finally has only one course
remaining: a course formed by collapsing all
courses in the optimal set.
- Complexity
- Expected time complexity, briefly (assume $S_b$ is
the domain before an iteration and $S_a$ after.)
- 1). Each iteration takes expected time $O(|S_b|)$
- 2). Expected size $|S_a| = (207/256)\,|S_b|$
- The recurrence relation of the iteration
- $T(n) = O(n) + T((207/256)\,n)$
- Resolves to linear time complexity.
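The step from the recurrence to linear time can be written out as a geometric series. In the sketch below, c is an assumed constant standing for the hidden factor in the O(n) term, and the expectation is treated informally.

```latex
\begin{aligned}
T(n) &\le c\,n + T\!\left(\tfrac{207}{256}\,n\right) \\
     &\le c\,n \sum_{j \ge 0} \left(\tfrac{207}{256}\right)^{j} \\
     &= c\,n \cdot \frac{1}{1 - \tfrac{207}{256}}
      = \tfrac{256}{49}\,c\,n = O(n).
\end{aligned}
```

Because the domain shrinks by a constant factor each iteration, the total work is bounded by a constant multiple (about 5.2) of the first iteration's cost.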
45H/B-Greedy vs. H/B-Optimal
- H/B-greedy is an approximation to H/B-Optimal
- H/B-greedy achieves a higher H/B metric than
any existing algorithm.
- H/B-greedy is easier to implement than
H/B-Optimal.
46Simulation Results
- Evaluation of H/B Greedy Prefetching
- Figure 1: H/B, for total object number 1,000.
- Figure 2: H/B, for total object number 10,000.
- Figure 3: H/B, for total object number 100,000.
- Figure 4: H/B, for total object number 1,000,000.
- Evaluation of H-Greedy and B-Greedy algorithms
- Figure 5: H-Greedy algorithm.
- Figure 6: B-Greedy algorithm.
- Figure 7: B-Greedy algorithm, zoomed in.
47Figure 1: H/B, for total object number 1,000
48Figure 2: H/B, for total object number 10,000
49Figure 3: H/B, for total object number 100,000
50Figure 4: H/B, for total object number 1,000,000
51Figure 5: H-Greedy algorithm
52Figure 6: B-Greedy algorithm
53Figure 7: B-Greedy, bandwidth magnified
54Performance Comparison
- Table 1. Performance comparison of different
algorithms in terms of various metrics. (Lower
values represent better performance)
55Conclusions
- Proposed a family of Objective-Greedy prefetching
algorithms that are superior to Popularity, Good
Fetch, APL, and Lifetime
- Hit rate greedy (this is also optimal)
- Bandwidth greedy (this is also optimal)
- H/B greedy
- All the above are O(n) complexity
- Proposed an H/B-Optimal algorithm that is also
O(n) expected time
- Experimental evaluation shows significant gains
over existing algorithms
- H/B-greedy is almost as good as H/B-optimal
56Question?