Objective-Optimal Algorithms for Long-term Web Prefetching

Description:

Title: Web pre-fetching: Costs, Benefits, and performance Author: william Last modified by: user1 Created Date: 11/15/2003 3:20:18 PM Document presentation format – PowerPoint PPT presentation

Slides: 57

Provided by: Will1164

Learn more at: https://www.cs.uic.edu

1
Objective-Optimal Algorithms for Long-term Web Prefetching

Bin Wu, Ajay Kshemkalyani
Dept. of Computer Science,
Univ. of Illinois at Chicago
ajayk_at_cs.uic.edu

2
Outline

Problem definition and background
Web prefetching algorithms
Performance metrics
Objective-Greedy algorithms (O(n) time)
Hit rate greedy (also hit rate optimal)
Bandwidth greedy (also bandwidth optimal)
H/B greedy
H/B-Optimal algorithm (expected O(n) time)
Simulation results
Conclusions

3
Introduction

Web caching reduces user-perceived latency
Client-Server mode
Bottleneck occurs at server side
Means of improving performance
local cache, proxy server, server farm, etc.
Cache management: LRU, Greedy Dual-Size, etc.
On-demand caching vs. (long-term) prefetching
Prefetching is effective in dynamic environments.
Clients subscribe to web objects
Server pushes fresh copies into web caches
Selection of prefetched objects is based on
long-term statistical characteristics, maintained
by content distribution servers (CDS)

4
Introduction

Web prefetching
Caches web objects in advance
Updated by web server
Reduces retrieval latency and user access time
Requires more bandwidth and increases traffic.
Performance metrics
Hit rate
Bandwidth usage
Balance of the two

5
Object Selection Criteria

Popularity
(Access frequency)
Lifetime
Good Fetch
APL

6
Web Object Characteristics

Access frequency
A Zipf-like request model is used in web traffic
modeling.
It relates the access frequency p of a web object
to its popularity rank i.

7
Web Object Characteristics

The generalized Zipf-like distribution of web
requests is calculated as
$p_i = k / i^{\alpha}$
where k is a normalization constant, i is the object ID
(popularity rank), and $\alpha$ is the Zipf parameter:
0.986 (Cunha et al.),
0.75 (Nishikawa et al.) and
0.64 (Breslau et al.)
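
As a rough illustration of the Zipf-like model, the sketch below assumes the usual form p_i = k / i^alpha, with k chosen so that the probabilities sum to 1. The function name `zipf_probabilities` and the object count of 1,000 are illustrative choices, not taken from the slides.

```python
def zipf_probabilities(n, alpha):
    """Return [p_1, ..., p_n] with p_i = k / i**alpha and sum(p) == 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    k = 1.0 / sum(weights)  # normalization constant
    return [k * w for w in weights]


if __name__ == "__main__":
    # Compare the three parameter values quoted on the slide.
    for alpha in (0.986, 0.75, 0.64):
        p = zipf_probabilities(1000, alpha)
        print(f"alpha={alpha}: top object {p[0]:.4f}, top 10 objects {sum(p[:10]):.4f}")
```

A larger alpha concentrates more of the request mass on the most popular objects, which is why the choice of parameter matters for any popularity-based selection rule.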

8
Web Object Characteristics

Size of Objects
Average object size: 10-15 KB.
No strong correlation between object size and its
access frequency.
Lifetime of web objects
Average time interval between updates
Weak correlation between access frequency and
lifetime.

9
Caching Architecture

Prefetching selection algorithms take these
global statistics as input:
Estimates of object reference frequencies
Estimates of object lifetimes
Content distribution servers cooperate to
maintain these statistics
When an object is updated in the original server,
the new version will be sent to any cache that
has subscribed to it.

10
Solution space for web prefetching

Two extreme cases
Passive caches (non-prefetching)
Least network bandwidth and lowest cache hit rate
Prefetching all objects
100% cache hit rate
Huge amount of unnecessary bandwidth
Existing algorithms use different
object-selection criteria and prefetch the objects
whose criterion value exceeds a threshold.

11
Steady State Properties

Steady state hit rate for object i
is defined as freshness factor, f(i)
Overall hit rate
Especially,
(Venkataramani et al.)

12
Steady State Properties

Steady state bandwidth for object i
Total bandwidth
Especially
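
The formulas on these two slides did not survive in the transcript. The sketch below therefore rests on assumptions: it takes the freshness factor to be f(i) = a*p_i*l_i / (a*p_i*l_i + 1), where a is the aggregate request rate, p_i the access frequency, and l_i the lifetime. This is the form commonly attributed to the Venkataramani et al. model. It further assumes that a prefetched object is always a hit and costs s_i / l_i bandwidth, while a demand-fetched object costs a*p_i*(1 - f(i))*s_i. All names (`WebObject`, `freshness`, `hit_rate`, `bandwidth`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WebObject:
    p: float         # access frequency (share of all requests)
    lifetime: float  # average interval between updates
    size: float      # object size


def freshness(obj, a):
    """Assumed steady-state freshness factor f(i) under on-demand caching."""
    x = a * obj.p * obj.lifetime
    return x / (x + 1.0)


def hit_rate(objects, prefetched, a):
    """Overall hit rate: prefetched objects always hit, the rest hit with f(i)."""
    return sum(o.p if i in prefetched else o.p * freshness(o, a)
               for i, o in enumerate(objects))


def bandwidth(objects, prefetched, a):
    """Total bandwidth: one transfer per update if prefetched, one per miss otherwise."""
    total = 0.0
    for i, o in enumerate(objects):
        if i in prefetched:
            total += o.size / o.lifetime
        else:
            total += a * o.p * (1.0 - freshness(o, a)) * o.size
    return total
```

Under these assumptions, prefetching an object never lowers the hit rate and never lowers the bandwidth, which is the trade-off the later slides try to balance.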

13
Objective Metrics

Hit rate benefit
Bandwidth cost
H/B model: balance of benefit and cost
Basic H/B
Enhanced H/B
(Jiang, et al.)

14
Existing Prefetching Algorithms

Popularity (Markatos et al.)
Keep the most popular objects in the system
Update these objects immediately when they change
Criterion: object popularity
Expected to achieve high hit rate
Lifetime (Jiang et al.)
Keep objects with the longest lifetimes
Mostly considers the network resource demands
Threshold: the expected lifetime of the object
Expected to minimize bandwidth usage

15
Existing Prefetching Algorithms

Good Fetch (Venkataramani et al.)
Computes the probability that an object is
accessed before it changes.
Prefetch objects with high probability of being
accessed during their average lifetime
Prefetch object i if the probability exceeds
a threshold.
Objects with higher access frequencies and longer
update intervals are more likely to be prefetched
Balance the benefit (hit rate increase) against
the cost (bandwidth increase) of keeping an
object.
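
A minimal sketch of the Good Fetch rule, assuming the access-before-change probability has the form 1 - (1 - p_i)^(a*l_i). Here a is the aggregate request rate, so a*l_i is the number of requests expected during one lifetime. The slide does not show the formula, so this form and the names `good_fetch_probability` and `good_fetch_select` are assumptions.

```python
def good_fetch_probability(p, lifetime, a):
    """Assumed probability of at least one access before the object changes."""
    return 1.0 - (1.0 - p) ** (a * lifetime)


def good_fetch_select(objects, a, threshold):
    """objects: list of (p, lifetime). Return indices whose probability exceeds threshold."""
    return [i for i, (p, lifetime) in enumerate(objects)
            if good_fetch_probability(p, lifetime, a) > threshold]


if __name__ == "__main__":
    objects = [(0.1, 100.0), (0.001, 10.0), (0.01, 1000.0)]
    print(good_fetch_select(objects, a=1.0, threshold=0.5))
```

The third object is rarely requested but lives long enough to clear the threshold, which illustrates the slide's point that both access frequency and update interval drive the decision.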

16
Existing Prefetching Algorithms

APL (Jiang et al.)
Computes apl values of web objects.
The apl of an object represents the expected number of
accesses during its lifetime
Prefetch object i if its apl exceeds a threshold.
Tends to improve hit rate; attempts to balance
benefit (hit rate) against cost (bandwidth).
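
Read literally, "expected number of accesses during its lifetime" suggests apl_i = a * p_i * l_i, with a the aggregate request rate. The sketch below uses that reading, which is an assumption since the slide gives no formula, and ranks objects by it. The function names are hypothetical.

```python
def apl(p, lifetime, a):
    """Assumed accesses-per-lifetime value: request rate * frequency * lifetime."""
    return a * p * lifetime


def apl_select(objects, a, threshold):
    """objects: list of (p, lifetime). Return qualifying indices, highest apl first."""
    scored = [(apl(p, lifetime, a), i) for i, (p, lifetime) in enumerate(objects)]
    return [i for score, i in sorted(scored, key=lambda t: (-t[0], t[1]))
            if score > threshold]
```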

17
Existing Prefetching Algorithms

Enhanced APL
n > 1, prefers objects with higher popularity
(emphasize hit rate)
n < 1, prefers objects with longer lifetime
(emphasize network bandwidth)

18
Objective-Greedy Algorithms

Existing algorithms choose prefetching criteria
based on intuitions
These intuitions are not aimed at any specific
performance metrics
These intuitions consider only individual
objects' characteristics, not the global impact
None of them gives optimal performance under
any metric
Simple counter-examples can be shown

19
Objective-Greedy Algorithms

Objective-Greedy algorithms select criteria to
intentionally improve performance based on
various metrics.
E.g., the Hit Rate-Greedy algorithm aims to improve
the overall hit rate and thus to reduce the latency of
object requests.

20
H/B-Greedy Prefetching

Consider the H/B value of on-demand caching
If object j is prefetched, then H/B is updated
to

21
H/B-Greedy Prefetching

We define
as the increase factor of object j, incr(j).
incr(j) indicates the amount by which H/B can be
increased if object j is selected.

22
H/B-Greedy Prefetching

H/B-Greedy prefetching prefetches those m objects
with greatest increase factors.
The selection is based on the effect on the hit
rate by prefetching individual objects.
H/B-Greedy is still not an optimal algorithm in
terms of H/B value.
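
The expression for incr(j) is missing from the transcript. One plausible reconstruction, used in the sketch below, follows from assuming f(j) = a*p_j*l_j / (a*p_j*l_j + 1): prefetching j raises the hit rate by p_j*(1 - f(j)) and the bandwidth by (s_j / l_j)*(1 - f(j)), so H/B is multiplied by (1 + gain/H_demand) / (1 + cost/B_demand). These formulas and the name `hb_greedy` are assumptions. `heapq.nlargest` is used for brevity; it runs in O(n log m), not the O(n) quoted in the outline.

```python
import heapq


def freshness(p, lifetime, a):
    x = a * p * lifetime
    return x / (x + 1.0)


def hb_greedy(objects, a, m):
    """objects: list of (p, lifetime, size). Return the m indices with largest incr(j)."""
    f = [freshness(p, l, a) for p, l, _ in objects]
    # H/B ingredients for pure on-demand caching (nothing prefetched).
    h_demand = sum(p * fi for (p, _, _), fi in zip(objects, f))
    b_demand = sum(a * p * (1.0 - fi) * s for (p, _, s), fi in zip(objects, f))

    def incr(j):
        p, l, s = objects[j]
        gain = p * (1.0 - f[j])        # hit-rate increase from prefetching j
        cost = (s / l) * (1.0 - f[j])  # bandwidth increase from prefetching j
        return (1.0 + gain / h_demand) / (1.0 + cost / b_demand)

    return heapq.nlargest(m, range(len(objects)), key=incr)
```

Each incr(j) is computed against the on-demand baseline only, so the combined effect of prefetching several objects is not captured, which is consistent with the slide's remark that H/B-Greedy is not optimal.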

24
Hit Rate-Greedy Prefetching

To maximize the overall hit rate given the number
of objects to prefetch, m, we select the m
objects with the greatest hit rate contribution
This algorithm is optimal in terms of hit rate.

25
Bandwidth-Greedy Prefetching

To minimize the total bandwidth given m, the
number of objects to prefetch, we select the m
objects with least bandwidth contribution
Bandwidth-Greedy Prefetching is optimal in terms
of bandwidth consumption.
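
The two single-objective greedy rules can be sketched together. The per-object contributions are not shown in the transcript; the code assumes a hit-rate contribution of p_i*(1 - f(i)) and a bandwidth contribution of (s_i / l_i)*(1 - f(i)), with f(i) = a*p_i*l_i / (a*p_i*l_i + 1). These forms and the function names are assumptions. Because each contribution is independent of the other selections, taking the m best is optimal for its own metric.

```python
import heapq


def freshness(p, lifetime, a):
    x = a * p * lifetime
    return x / (x + 1.0)


def hit_rate_greedy(objects, a, m):
    """objects: list of (p, lifetime, size). Pick the m largest hit-rate contributions."""
    def contribution(i):
        p, l, _ = objects[i]
        return p * (1.0 - freshness(p, l, a))
    return heapq.nlargest(m, range(len(objects)), key=contribution)


def bandwidth_greedy(objects, a, m):
    """Pick the m objects whose prefetching adds the least bandwidth."""
    def contribution(i):
        p, l, s = objects[i]
        return (s / l) * (1.0 - freshness(p, l, a))
    return heapq.nsmallest(m, range(len(objects)), key=contribution)
```

The heap-based helpers run in O(n log m). A linear-time selection routine such as quickselect would be needed to match the O(n) bound stated in the outline.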

26
H/B-Optimal Prefetching

Optimal algorithm for H/B metric provided by a
solution to the following selection problem.
This is equivalent to maximum weighted average
problem with pre-selected items.

27
Maximum Weighted Average

Maximum Weighted Average Problem
A total of n courses, with different credit hours
and scores
Select m (m < n) courses
Maximize the GPA of the m selected courses
Solution
If m = 1
Then select the course with the highest score
What if m > 1?
A misleading intuition: select the m courses with
the highest scores.

28
A Course Selection Problem
Courses       A    B    C    D    E    F    G    H
Credit hours  5.0  3.0  6.0  1.0  2.0  4.0  3.0  6.0
Scores        70   90   95   85   75   60   65   80

If m = 2:
If we select the 2 courses with the highest scores,
C and B,
then GPA = 93.33
But if we select C and D, then GPA = 93.57
Question: how to select m courses such that the
GPA is maximized?
Answer: Eppstein and Hirschberg solved this problem
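
The two GPA figures on this slide can be checked by exhaustive search. The sketch below is a brute-force check only, not the Eppstein and Hirschberg algorithm; the names `COURSES`, `gpa`, and `best_subset` are illustrative.

```python
from itertools import combinations

# name -> (credit hours, score), copied from the slide's table
COURSES = {"A": (5.0, 70), "B": (3.0, 90), "C": (6.0, 95), "D": (1.0, 85),
           "E": (2.0, 75), "F": (4.0, 60), "G": (3.0, 65), "H": (6.0, 80)}


def gpa(names, courses=COURSES):
    """Credit-weighted average score of the named courses."""
    credits = sum(courses[c][0] for c in names)
    points = sum(courses[c][0] * courses[c][1] for c in names)
    return points / credits


def best_subset(m, courses=COURSES):
    """Try every m-subset; feasible only for small n."""
    return max(combinations(courses, m), key=lambda s: gpa(s, courses))


if __name__ == "__main__":
    print(round(gpa(("C", "B")), 2), round(gpa(("C", "D")), 2), best_subset(2))
```

Course D has a lower score than B, but it carries only one credit hour, so it dilutes C's 95 less than B does.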

29
With Pre-selected items
Courses       A    B    C    D    E    F    G    H
Credit hours  5.0  3.0  6.0  1.0  2.0  4.0  3.0  6.0
Scores        70   90   95   85   75   60   65   80

Maximum Weighted Average with pre-selected items
A total of n courses, with different credit hours
and scores
Courses A and E (for example) must be selected,
plus
select m additional courses (m is given, m < n),
such that
the resulting GPA is maximized

Pre-selection is not trivial
Selection domain B to I, no pre-selection, m = 2:
optimal subset {B, C}, GPA = 88.33
Selection domain B to I, A is pre-selected, m = 2:
one candidate subset {A, D, H}, GPA = 75.61,
better than {A, B, C}, GPA = 70.625
Conclusion: {B, C} is not contained in the optimal subset
for the pre-selected problem.

Course  A    B    C    D     E    F    G    H    I
Credit  5.0  1.0  2.0  10.0  1.5  2.5  2.0  3.0  4.0
Score   60   95   85   83    63   71   80   77   65
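
The effect of pre-selection can be reproduced with a small exhaustive search over the nine-course table. This is a brute-force sketch for checking the slide's numbers, not the algorithm the talk proposes; `TABLE`, `gpa`, and `best_with_preselected` are hypothetical names.

```python
from itertools import combinations

# name -> (credit, score), copied from the nine-course table
TABLE = {"A": (5.0, 60), "B": (1.0, 95), "C": (2.0, 85), "D": (10.0, 83),
         "E": (1.5, 63), "F": (2.5, 71), "G": (2.0, 80), "H": (3.0, 77),
         "I": (4.0, 65)}


def gpa(names):
    credits = sum(TABLE[c][0] for c in names)
    points = sum(TABLE[c][0] * TABLE[c][1] for c in names)
    return points / credits


def best_with_preselected(m, preselected=(), domain="BCDEFGHI"):
    """Best choice of m courses from domain, added to the pre-selected ones."""
    candidates = [c for c in domain if c not in preselected]
    return max((tuple(preselected) + combo for combo in combinations(candidates, m)),
               key=gpa)


if __name__ == "__main__":
    print(best_with_preselected(2), best_with_preselected(2, preselected=("A",)))
```

On this data the search returns {B, C} without pre-selection and {A, B, D} with A pre-selected (GPA 76.5625), which is even better than the slide's candidate {A, D, H}. Once the low-scoring, five-credit course A is fixed, the ten-credit course D becomes valuable because it outweighs A.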
31
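H/B-Optimal vs. Course selection

The problem is formulated as
maximize $\frac{v_0 + \sum_{i \in S} v_i}{w_0 + \sum_{i \in S} w_i}$ over subsets $S$ with $|S| = m$,
where $v_0 = 5.0 \times 70 + 2.0 \times 75 = 500$ and $w_0 = 5.0 + 2.0 = 7.0$
in the previous example.
Equivalent to the H/B-Optimal selection problem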

32
H/B-Optimal vs. Course selection
33
H/B-Optimal algorithm design

The selection of m courses is not trivial
For course i, we define auxiliary function
And for a given number m, we define a Utility
function

34
H/B-Optimal algorithm

Lemma 1
Suppose A is the maximum GPA we are computing,
then for any subset S' ⊆ S with |S'| = m
Lemma 1 indicates that the optimal subset
contains those courses that have the m largest r_i(A)
values

35
H/B-Optimal algorithm design

n = 6, m = 4
Each line is r_i(x)
Assume we know A.
The optimal subset has the 4 courses
with the largest r_i(A) values.
Dilemma: A is unknown

36
H/B-Optimal algorithm design

Lemma 2
Lemma 2 narrows the range of A
(x_l, x_r) is the current A-range

37
H/B-Optimal algorithm design

If F(x_l) > 0 and F(x_r) < 0, then A is in (x_l, x_r)
Compute the value of F((x_l + x_r)/2)
- if F((x_l + x_r)/2) > 0, then A > (x_l + x_r)/2
- if F((x_l + x_r)/2) < 0, then A < (x_l + x_r)/2
- if F((x_l + x_r)/2) = 0, then A = (x_l + x_r)/2
(Lemma 2)
Narrow down the range of A by half
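
The halving step can be turned into a simple, non-randomized solver. The definitions of r_i and F are missing from the transcript, so the sketch assumes the standard forms from the maximum-weighted-average literature: r_i(x) = v_i - x*w_i, and F(x) = (v_0 - x*w_0) plus the sum of the m largest r_i(x). It sorts at every step, so it costs O(n log n) per iteration and is not the expected linear-time algorithm of the talk. All function names are hypothetical.

```python
def top_m(values, weights, m, x):
    """Indices of the m largest r_i(x) = v_i - x * w_i."""
    r = [v - x * w for v, w in zip(values, weights)]
    return sorted(range(len(r)), key=lambda i: r[i], reverse=True)[:m]


def utility(values, weights, m, x, v0=0.0, w0=0.0):
    """Assumed F(x): pre-selected term plus the m largest r_i(x)."""
    chosen = top_m(values, weights, m, x)
    return (v0 - x * w0) + sum(values[i] - x * weights[i] for i in chosen)


def max_weighted_average(values, weights, m, v0=0.0, w0=0.0, iterations=100):
    """Bisect on x until the range around A is tiny, then read off the top m."""
    ratios = [v / w for v, w in zip(values, weights)]
    if w0 > 0:
        ratios.append(v0 / w0)
    xl, xr = min(ratios), max(ratios)  # any weighted average lies in this range
    for _ in range(iterations):
        mid = (xl + xr) / 2.0
        if utility(values, weights, m, mid, v0, w0) > 0:
            xl = mid  # F(mid) > 0 means A > mid
        else:
            xr = mid  # F(mid) <= 0 means A <= mid
    chosen = sorted(top_m(values, weights, m, (xl + xr) / 2.0))
    total_v = v0 + sum(values[i] for i in chosen)
    total_w = w0 + sum(weights[i] for i in chosen)
    return chosen, total_v / total_w
```

Here `values[i]` is credit times score and `weights[i]` is the credit, so for the eight-course table with m = 2 the call returns the indices of C and D.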

38
H/B-Optimal algorithm design

Why keep on narrowing down the range of A?
If the intersection of r_j(x) and r_k(x) falls outside the
range, then the ordering of r_j(x) and r_k(x) is
fixed within the range, and so is the ordering of r_j(A) and
r_k(A); it is found by comparing their slopes.
If the range is narrow enough that there are no
intersections of r(x) lines within the range,
then the total ordering of all r(A) values is
determined.
Now our optimization problem is solved: just select
the m candidates with the highest r(A) values.
This is the main idea for solving the optimization problem.

39
H/B-Optimal algorithm design

However, the total ordering requires O(n^2) time
complexity
A randomized approach is used instead. This
randomized algorithm
iteratively reduces the problem domain into a
smaller one.
The algorithm maintains 4 sets, X, Y, E, and Z,
all initially empty

40
H/B-Optimal algorithm design

In each iteration, randomly select a course i
and compare it with each of the other courses,
k.
There are 4 possibilities:
1) if r_k(A) > r_i(A): insert k into set X
2) if r_k(A) < r_i(A): insert k into set Y
3) if w_k = w_i and v_k = v_i: insert k into set E
4) if undetermined: insert k into set Z
Now do the following loop:
loop
  narrow the range of A by half
  compare r_i(A) with r_k(A) for k in Z
  if appropriate, move k to X or Y, accordingly
until Z is sufficiently small (i.e., |Z| < |S|/32)

41
H/B-Optimal algorithm design

The sets X or Y have enough members.
Next, examine and compare the sizes of X, Y and
E

42
H/B-Optimal algorithm design

1) If |X| + |E| > m:
There are at least m courses whose r(A) values are
greater than the r(A) values of all courses in Y. All
members of Y may be removed. Then S = S - Y.

43
H/B-Optimal algorithm design

2) If |Y| + |E| > |S| - m:
All members of X are among the top m
courses, so all members of X must be in the optimal
set. Collapse X into a single course (this course
is included in the final optimal set). Then
|S| = |S| - |X| + 1
m = m - |X| + 1.

44
H/B-Optimal algorithm design

In either case, the resulting domain has reduced
size.
By iteratively removing or collapsing courses,
the problem domain finally has only one course
remaining: a course formed by collapsing all
courses in the optimal set.
Complexity
Expected time complexity, briefly (assume S_b is
the domain before an iteration and S_a the domain after):
1) Each iteration takes expected time O(|S_b|)
2) The expected size of S_a is (207/256) |S_b|
The recurrence relation of the iteration:
T(n) = O(n) + T((207/256) n)
This resolves to linear time complexity.
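
To see why the recurrence is linear, the sketch below sums the per-iteration costs while the domain shrinks by the factor 207/256 each time. It treats the expected shrink factor as if it were deterministic and charges exactly n units per iteration, which are simplifying assumptions; `total_work` is a hypothetical name. The geometric series is bounded by n / (1 - 207/256) = (256/49) n, roughly 5.22 n.

```python
def total_work(n, shrink=207.0 / 256.0, cutoff=1.0):
    """Sum n + shrink*n + shrink^2*n + ... until the domain drops below cutoff."""
    work = 0.0
    while n >= cutoff:
        work += n    # one iteration costs time proportional to the current domain size
        n *= shrink  # the domain shrinks by the (expected) factor
    return work


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, total_work(n) / n)
```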

45
H/B-Greedy vs. H/B-Optimal

H/B-Greedy is an approximation to H/B-Optimal
H/B-Greedy achieves a higher H/B metric than
any existing algorithm.
H/B-Greedy is easier to implement than
H/B-Optimal.

46
Simulation Results

Evaluation of H/B-Greedy prefetching
Figure 1: H/B, for total object number = 1,000.
Figure 2: H/B, for total object number = 10,000.
Figure 3: H/B, for total object number = 100,000.
Figure 4: H/B, for total object number = 1,000,000.
Evaluation of H-Greedy and B-Greedy algorithms
Figure 5: H-Greedy algorithm.
Figure 6: B-Greedy algorithm.
Figure 7: B-Greedy algorithm, zoomed in.

47
Figure 1: H/B, for total object number = 1,000
48
Figure 2: H/B, for total object number = 10,000
49
Figure 3: H/B, for total object number = 100,000
50
Figure 4: H/B, for total object number = 1,000,000
51
Figure 5: H-Greedy algorithm
52
Figure 6: B-Greedy algorithm
53
Figure 7: B-Greedy, bandwidth magnified
54
Performance Comparison

Table 1. Performance comparison of different
algorithms in terms of various metrics. (Lower
values represent better performance)
55
Conclusions

Proposed a family of Objective-Greedy prefetching
algorithms that are superior to Popularity, Good
Fetch, APL, and Lifetime:
Hit rate greedy (this is also optimal)
Bandwidth greedy (this is also optimal)
H/B greedy
All of the above have O(n) time complexity
Proposed an H/B-Optimal algorithm that also runs in
O(n) expected time
Experimental evaluation shows significant gains
over existing algorithms
H/B-greedy is almost as good as H/B-optimal