Objective-Optimal Algorithms for Long-term Web Prefetching - PowerPoint PPT Presentation

Title: Web pre-fetching: Costs, Benefits, and performance Author: william Last modified by: user1 Created Date: 11/15/2003 3:20:18 PM Document presentation format – PowerPoint PPT presentation

Slides: 57
Provided by: Will1164
Learn more at: https://www.cs.uic.edu

1
Objective-Optimal Algorithms for Long-term Web
Prefetching
  • Bin Wu, Ajay Kshemkalyani
  • Dept. of Computer Science,
  • Univ. of Illinois at Chicago
  • ajayk_at_cs.uic.edu

2
Outline
  • Problem definition and background
  • Web prefetching algorithms
  • Performance metrics
  • Objective-Greedy algorithms (O(n) time)
  • Hit rate greedy (also hit rate optimal)
  • Bandwidth greedy (also bandwidth optimal)
  • H/B greedy
  • H/B-Optimal algorithm (expected O(n) time)
  • Simulation results
  • Conclusions

3
Introduction
  • Web caching reduces user-perceived latency
  • Client-Server mode
  • Bottleneck occurs at server side
  • Means of improving performance
  • local cache, proxy server, server farm, etc.
  • Cache management: LRU, Greedy Dual-Size, etc.
  • On-demand caching vs. (long-term) prefetching
  • Prefetching is effective in dynamic environments.
  • Clients subscribe to web objects
  • Server pushes fresh copies into web caches
  • Selection of prefetched objects based on
    long-term statistical characteristics, maintained
    by CDS

4
Introduction
  • Web prefetching
  • Caches web objects in advance
  • Updated by web server
  • Reduces retrieval latency and user access time
  • Requires more bandwidth and increases traffic.
  • Performance metrics
  • Hit rate
  • Bandwidth usage
  • Balance of the two

5
Object Selection Criteria
  • Popularity
  • (Access frequency)
  • Lifetime
  • Good Fetch
  • APL

6
Web Object Characteristics
  • Access frequency
  • Zipf-like request model is used in web traffic
    modeling.
  • The relationship between access frequency p and
    popularity rank i of web object

7
Web Object Characteristics
  • The generalized Zipf-like distribution of web
    requests is calculated as $p_i = k / i^{\alpha}$
  • k is a normalization constant, i is the object ID
    (popularity rank), and α is the Zipf parameter
  • 0.986 (Cunha et al.),
  • 0.75 (Nishikawa et al.) and
  • 0.64 (Breslau et al.)
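
A minimal Python sketch of the Zipf-like request model, assuming the standard form p_i = k / i^α with k chosen so that the frequencies sum to 1. The function name and the sample values (n = 1000, α = 0.75) are illustrative and not taken from the slides.

```python
def zipf_frequencies(n, alpha):
    """Return access frequencies p_1..p_n with p_i = k / i**alpha, summing to 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    k = 1.0 / sum(weights)  # normalization constant
    return [k * w for w in weights]


# Illustrative values only: 1000 objects, alpha = 0.75.
p = zipf_frequencies(1000, 0.75)
```

A smaller α flattens the distribution, so fewer requests are concentrated on the top-ranked objects.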

8
Web Object Characteristics
  • Size of Objects
  • Average object size: 10-15 KB.
  • No strong correlation between object size and its
    access frequency.
  • Lifetime of web objects
  • Average time interval between updates
  • Weak correlation between access frequency and
    lifetime.

9
Caching Architecture
  • Prefetching selection algorithms take these
    global statistics as input
  • Estimates of object reference frequencies
  • Estimates of object lifetimes
  • Content distribution servers cooperate to
    maintain these statistics
  • When an object is updated in the original server,
    the new version will be sent to any cache that
    has subscribed to it.

10
Solution space for web prefetching
  • Two extreme cases
  • Passive caches (non-prefetching)
  • Least network bandwidth and lowest cache hit rate
  • Prefetching all objects
  • 100% cache hit rate
  • Huge amount of unnecessary bandwidth
  • Existing algorithms use different
    object-selecting criteria and fetch objects
    exceeding the threshold.

11
Steady State Properties
  • Steady state hit rate for object i
  • is defined as freshness factor, f(i)
  • Overall hit rate
  • Especially,
  • (Venkataramani et al.)
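
The formulas on this slide did not survive in the transcript. One plausible reconstruction, stated as an assumption: requests arrive as a Poisson process with total rate a, object i receives a fraction p_i of them, and updates to object i are exponentially spaced with mean lifetime l_i. The symbols a, p_i, l_i and the prefetched set P are notation assumed here. Under that model an access is a hit exactly when the previous event for the object was an access rather than an update:

```latex
f(i) = \frac{a\,p_i\,l_i}{a\,p_i\,l_i + 1}, \qquad
H_{\text{demand}} = \sum_{i=1}^{n} p_i\, f(i), \qquad
H_{\text{pref}} = \sum_{i \in P} p_i \;+\; \sum_{i \notin P} p_i\, f(i)
```

A prefetched object is always fresh, so its hit rate is taken as 1 in the last sum.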

12
Steady State Properties
  • Steady state bandwidth for object i
  • Total bandwidth
  • Especially
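
The bandwidth formulas are also missing from the transcript. The Python sketch below is a reconstruction under stated assumptions, not the authors' exact formulas: an on-demand object costs one transfer of size s per miss, a prefetched object costs one transfer per update (s / l), and the freshness factor is a·p·l / (a·p·l + 1). All function names and sample numbers are hypothetical.

```python
def freshness(a, p, l):
    """Assumed steady-state on-demand hit rate of one object."""
    return a * p * l / (a * p * l + 1.0)


def hit_rate(a, objs, prefetched):
    """objs: list of (p, l, s); prefetched: set of indices kept always fresh."""
    return sum(p if i in prefetched else p * freshness(a, p, l)
               for i, (p, l, s) in enumerate(objs))


def bandwidth(a, objs, prefetched):
    total = 0.0
    for i, (p, l, s) in enumerate(objs):
        if i in prefetched:
            total += s / l                                    # one push per update
        else:
            total += a * p * (1.0 - freshness(a, p, l)) * s   # one fetch per miss
    return total


# Hypothetical objects: (access frequency, mean lifetime, size).
objs = [(0.5, 10.0, 12.0), (0.3, 2.0, 8.0), (0.2, 50.0, 20.0)]
print(hit_rate(1.0, objs, set()), bandwidth(1.0, objs, set()))
print(hit_rate(1.0, objs, {0, 1, 2}), bandwidth(1.0, objs, {0, 1, 2}))
```

Under these assumptions, prefetching an object always raises both its hit rate and its bandwidth, which is the trade-off the H/B metric is meant to balance.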

13
Objective Metrics
  • Hit rate benefit
  • Bandwidth cost
  • H/B model: balance of benefit and cost
  • Basic H/B
  • Enhanced H/B
  • (Jiang, et al.)

14
Existing Prefetching Algorithms
  • Popularity (Markatos et al.)
  • Keep the most popular objects in the system
  • Update these objects immediately when they change
  • Criterion: object's popularity
  • Expected to achieve high hit rate
  • Lifetime (Jiang et al.)
  • Keep objects with the longest lifetimes
  • Mostly considers the network resource demands
  • Threshold: the expected lifetime of the object
  • Expected to minimize bandwidth usage

15
Existing Prefetching Algorithms
  • Good Fetch (Venkataramani et al.)
  • Computes the probability that an object is
    accessed before it changes.
  • Prefetch objects with high probability of being
    accessed during their average lifetime


  • Prefetch object i if the probability exceeds
    threshold.
  • Objects with higher access frequencies and longer
    update intervals are more likely to be prefetched
  • Balance the benefit (hit rate increase) against
    the cost (bandwidth increase) of keeping an
    object.
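
The probability formula for Good Fetch is missing from the transcript. A minimal sketch, assuming that a·l requests arrive during an object's lifetime and each one targets the object independently with probability p, so that the chance of at least one access is 1 - (1 - p)^(a·l). The function names, the threshold and the sample objects are hypothetical.

```python
def good_fetch_probability(a, p, l):
    """Assumed probability that an object is accessed at least once before it changes."""
    return 1.0 - (1.0 - p) ** (a * l)


def good_fetch_select(a, objs, threshold):
    """objs: list of (p, l). Return indices whose probability exceeds the threshold."""
    return [i for i, (p, l) in enumerate(objs)
            if good_fetch_probability(a, p, l) > threshold]


# Hypothetical objects: (access frequency, mean lifetime).
selected = good_fetch_select(1.0, [(0.5, 10.0), (0.01, 1.0), (0.1, 20.0)], 0.5)
```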

16
Existing Prefetching Algorithms
  • APL (Jiang et al.)
  • Computes apl values of web objects.
  • apl of an object represents the expected number of
    accesses during its lifetime
  • Prefetch object i if its apl exceeds the threshold.
  • Tends to improve hit rate; attempts to balance
    benefit (hit rate) against cost (bandwidth).

17
Existing Prefetching Algorithms
  • Enhanced APL
  • n > 1: prefers objects with higher popularity
    (emphasizes hit rate)
  • n < 1: prefers objects with longer lifetime
    (emphasizes network bandwidth)
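
A minimal sketch of the APL criterion and its enhanced variant. The transcript does not show either formula, so this assumes apl = a·p·l (expected accesses per lifetime) and an enhanced form a·p^n·l, which matches the slide's description of how n shifts the preference but is not confirmed by the source. The names and sample objects are hypothetical.

```python
def apl(a, p, l, n=1.0):
    """Assumed (enhanced) APL value; n = 1 gives the plain expected accesses per lifetime."""
    return a * (p ** n) * l


def rank_by_apl(a, objs, n=1.0):
    """objs: list of (p, l). Return indices ordered from highest to lowest value."""
    return sorted(range(len(objs)), key=lambda i: apl(a, *objs[i], n=n), reverse=True)


# Hypothetical objects: index 0 is popular and short-lived, index 1 is unpopular and long-lived.
objs = [(0.2, 10.0), (0.05, 50.0)]
print(rank_by_apl(1.0, objs, n=1.0))
print(rank_by_apl(1.0, objs, n=2.0))
```

With n = 2 the popular object moves to the front, while n = 1 and n = 0.5 both favor the long-lived one.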

18
Objective-Greedy Algorithms
  • Existing algorithms choose prefetching criteria
    based on intuitions
  • These intuitions are not aimed at any specific
    performance metrics
  • These intuitions consider only individual
    objects' characteristics, not the global impact
  • None of them gives optimal performance under
    any metric
  • Simple counter-examples can be shown

19
Objective-Greedy Algorithms
  • Objective-Greedy algorithms select criteria to
    intentionally improve performance based on
    various metrics.
  • E.g., Hit Rate-Greedy algorithm aims to improve
    the overall hit rate, thus, reduce the latency of
    object requests.

20
H/B-Greedy Prefetching
  • Consider the H/B value of on-demand caching
  • If object j is prefetched, then H/B is updated
    to

21
H/B-Greedy Prefetching
  • We define
  • as the increase factor of object j, incr(j).
  • incr(j) indicates the amount by which H/B can be
    increased if object j is selected.

22
H/B-Greedy Prefetching
  • H/B-Greedy prefetching prefetches those m objects
    with greatest increase factors.
  • The selection is based on the effect on the hit
    rate by prefetching individual objects.
  • H/B-Greedy is still not an optimal algorithm in
    terms of H/B value.
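
A minimal sketch of H/B-Greedy selection. The expression for incr(j) is missing from the transcript; the version below is a reconstruction that assumes the freshness factor f = a·p·l / (a·p·l + 1), a hit-rate gain of p·(1 - f) and a bandwidth increase of (s / l)·(1 - f) when object j is prefetched. Names and sample data are hypothetical.

```python
def freshness(a, p, l):
    return a * p * l / (a * p * l + 1.0)


def hb_greedy(a, objs, m):
    """objs: list of (p, l, s). Return indices of the m objects with the largest incr(j)."""
    h_demand = sum(p * freshness(a, p, l) for p, l, s in objs)
    bw_demand = sum(a * p * (1.0 - freshness(a, p, l)) * s for p, l, s in objs)

    def incr(j):
        p, l, s = objs[j]
        miss = 1.0 - freshness(a, p, l)
        # Factor by which H/B changes if only object j is prefetched.
        return (1.0 + p * miss / h_demand) / (1.0 + (s / l) * miss / bw_demand)

    return sorted(range(len(objs)), key=incr, reverse=True)[:m]


# Hypothetical objects: (access frequency, mean lifetime, size).
objs = [(0.5, 10.0, 12.0), (0.3, 2.0, 8.0), (0.2, 50.0, 20.0)]
print(hb_greedy(1.0, objs, 2))
```

Each incr(j) is computed against the on-demand baseline only, so the combined effect of several prefetched objects is not captured. That is one way to see why the greedy choice need not be optimal for H/B.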

24
Hit Rate-Greedy Prefetching
  • To maximize the overall hit rate given the number
    of objects to prefetch, m, we select the m
    objects with the greatest hit rate contribution
  • This algorithm is optimal in terms of hit rate.

25
Bandwidth-Greedy Prefetching
  • To minimize the total bandwidth given m, the
    number of objects to prefetch, we select the m
    objects with least bandwidth contribution
  • Bandwidth-Greedy Prefetching is optimal in terms
    of bandwidth consumption.
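
Both single-objective greedy selections can be sketched as a sort by per-object contribution. The contribution formulas are not in the transcript; this sketch assumes a hit-rate gain of p·(1 - f) and a bandwidth increase of (s / l)·(1 - f) per prefetched object, with f = a·p·l / (a·p·l + 1). All names and sample data are hypothetical.

```python
def freshness(a, p, l):
    return a * p * l / (a * p * l + 1.0)


def hit_rate_greedy(a, objs, m):
    """Pick the m objects whose prefetching adds the most hit rate."""
    gain = lambda i: objs[i][0] * (1.0 - freshness(a, objs[i][0], objs[i][1]))
    return sorted(range(len(objs)), key=gain, reverse=True)[:m]


def bandwidth_greedy(a, objs, m):
    """Pick the m objects whose prefetching adds the least bandwidth."""
    def cost(i):
        p, l, s = objs[i]
        return (s / l) * (1.0 - freshness(a, p, l))
    return sorted(range(len(objs)), key=cost)[:m]


# Hypothetical objects: (access frequency, mean lifetime, size).
objs = [(0.5, 10.0, 12.0), (0.3, 2.0, 8.0), (0.2, 50.0, 20.0)]
print(hit_rate_greedy(1.0, objs, 2), bandwidth_greedy(1.0, objs, 2))
```

Because each object's contribution is independent of the others and the totals are plain sums, taking the m best contributions is optimal for either metric alone. Sorting is used here for brevity; a linear-time selection would be needed to match the O(n) bound stated on the slides.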

26
H/B-Optimal Prefetching
  • Optimal algorithm for H/B metric provided by a
    solution to the following selection problem.
  • This is equivalent to maximum weighted average
    problem with pre-selected items.

27
Maximum Weighted Average
  • Maximum Weighted Average Problem
  • Totally n courses, with different credit hours
    and scores
  • Select m (m < n) courses
  • maximize the GPA of m selected courses
  • Solution
  • If m = 1
  • Then select course with highest score
  • What if m > 1?
  • A misleading intuition: select the m courses with
    highest scores.

28
A Course Selection Problem
Courses       A    B    C    D    E    F    G    H
Credit hours  5.0  3.0  6.0  1.0  2.0  4.0  3.0  6.0
Scores        70   90   95   85   75   60   65   80
  • If m = 2
  • If we select the 2 courses with the highest scores,
    C and B,
  • then GPA = 93.33
  • But if we select C and D, then GPA = 93.57
  • Question: how to select m courses such that the
    GPA is maximized?
  • Answer: Eppstein and Hirschberg solved this
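
The numbers on this slide can be checked by brute force. The sketch below enumerates every pair of courses from the table and computes the credit-weighted GPA; it is only a check for small inputs, not the Eppstein and Hirschberg algorithm.

```python
from itertools import combinations

# Course table from the slide: name -> (credit hours, score).
courses = {"A": (5.0, 70), "B": (3.0, 90), "C": (6.0, 95), "D": (1.0, 85),
           "E": (2.0, 75), "F": (4.0, 60), "G": (3.0, 65), "H": (6.0, 80)}


def gpa(names):
    """Credit-weighted average score of the named courses."""
    credits = sum(courses[c][0] for c in names)
    return sum(courses[c][0] * courses[c][1] for c in names) / credits


best_pair = max(combinations(courses, 2), key=gpa)
print(round(gpa(("C", "B")), 2), round(gpa(("C", "D")), 2), best_pair)
```

Course D has a lower score than B but only one credit hour, so it dilutes C's score less.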

29
With Pre-selected items
Courses       A    B    C    D    E    F    G    H
Credit hours  5.0  3.0  6.0  1.0  2.0  4.0  3.0  6.0
Scores        70   90   95   85   75   60   65   80
  • Maximum Weighted Average with pre-selected items
  • Totally n courses, with different credit hours
    and scores
  • Course A and E (for example) must be selected,
    plus
  • Select additional m (m is given, m < n) courses,
    such that
  • the resulting GPA is maximized

30
  • Pre-selection is not trivial
  • Selection domain B to I, no pre-selection, m = 2
  • optimal subset {B, C}, GPA = 88.33
  • Selection domain B to I, A is pre-selected, m = 2
  • one candidate subset {A, D, H}, GPA = 75.61
  • better than {A, B, C}, GPA = 70.625
  • Conclusion: {B, C} is not contained in the optimal
    subset for the pre-selected problem.

Course  A    B    C    D     E    F    G    H    I
Credit  5.0  1.0  2.0  10.0  1.5  2.5  2.0  3.0  4.0
Score   60   95   85   83    63   71   80   77   65
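
The effect of pre-selection can be checked by brute force on this table. The sketch below fixes course A, enumerates every pair from B to I, and reports the best one; it reproduces the GPA values quoted on the slide. The helper names are hypothetical.

```python
from itertools import combinations

# Course table from the slide: name -> (credit hours, score).
courses = {"A": (5.0, 60), "B": (1.0, 95), "C": (2.0, 85), "D": (10.0, 83),
           "E": (1.5, 63), "F": (2.5, 71), "G": (2.0, 80), "H": (3.0, 77),
           "I": (4.0, 65)}


def gpa(names, fixed=()):
    """Credit-weighted average over the pre-selected courses plus the chosen ones."""
    chosen = list(fixed) + list(names)
    credits = sum(courses[c][0] for c in chosen)
    return sum(courses[c][0] * courses[c][1] for c in chosen) / credits


domain = [c for c in courses if c != "A"]
best_free = max(combinations(domain, 2), key=gpa)
best_with_a = max(combinations(domain, 2), key=lambda pair: gpa(pair, fixed=("A",)))
print(best_free, best_with_a, gpa(best_with_a, fixed=("A",)))
```

On the table as transcribed, the exhaustive search with A fixed picks B and D (GPA about 76.56), which is higher than the candidate {A, D, H} and does not contain C. This agrees with the slide's conclusion that the unconstrained optimum {B, C} does not carry over.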
31
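H/B-Optimal vs. Course selection
  • The problem is formulated as: choose S' ⊆ S with |S'| = m
    to maximize $(v_0 + \sum_{i \in S'} v_i) / (w_0 + \sum_{i \in S'} w_i)$
  • where v_0 = 5.0 × 70 + 2.0 × 75 = 500 and w_0 = 5.0 + 2.0 = 7.0
    in the previous example.
  • Equivalent to the H/B-Optimal selection problem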

32
H/B-Optimal vs. Course selection
33
H/B-Optimal algorithm design
  • The selection of m courses is not trivial
  • For course i, we define auxiliary function
  • And for a given number m, we define a Utility
    function

34
H/B-Optimal algorithm
  • Lemma 1
  • Suppose A is the maximum GPA we are computing;
    then for any subset S' ⊆ S with |S'| = m
  • Lemma 1 indicates that the optimal subset
    contains those courses that have the m largest
    r_i(A) values
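
The definitions of the auxiliary function, the utility function and the Lemma 1 inequality are missing from the transcript. A reconstruction in the usual maximum-weighted-average style, offered as an assumption rather than the slide's exact text: v_i is credit times score, w_i is credit, and (v_0, w_0) are the pre-selected totals.

```latex
r_i(x) = v_i - w_i\,x, \qquad
F(x) = r_0(x) + \max_{S' \subseteq S,\ |S'| = m} \; \sum_{i \in S'} r_i(x)

% Lemma 1 (reconstructed): for every S' \subseteq S with |S'| = m,
r_0(A) + \sum_{i \in S'} r_i(A) \le 0,
\quad \text{with equality exactly when } S' \text{ is an optimal subset}
```

The inequality is a rearrangement of (v_0 + Σ v_i) / (w_0 + Σ w_i) ≤ A, which holds for every subset because A is the maximum and all weights are positive. The sum is largest for the m courses with the largest r_i(A), which is why those courses form the optimal subset.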

35
H/B-Optimal algorithm design
  • n = 6, m = 4
  • Each line is r_i(x)
  • Assume we know A
  • The optimal subset has the 4 courses with the
    largest r_i(A) values.
  • Dilemma: A is unknown

36
H/B-Optimal algorithm design
  • Lemma 2
  • Lemma 2 narrows the range of A
  • (xl, xr) is the current A-range

37
H/B-Optimal algorithm design
  • If F(xl) > 0 and F(xr) < 0, then A is in (xl, xr)
  • Compute the value of F((xl + xr)/2)
  • - if F((xl + xr)/2) > 0, then A > (xl + xr)/2
  • - if F((xl + xr)/2) < 0, then A < (xl + xr)/2
  • - if F((xl + xr)/2) = 0, then A = (xl + xr)/2
    (Lemma 2)
  • Narrow down the range of A by half
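
The halving step above can be sketched as a plain bisection on F. This sketch assumes r_i(x) = v_i - w_i·x and F(x) = r_0(x) plus the sum of the m largest r_i(x), which the transcript does not show. It evaluates F by sorting, so it is a simplified O(n log n)-per-step illustration and not the expected linear-time randomized algorithm described on the later slides. The function name and the fixed step count are hypothetical.

```python
def max_weighted_average(items, m, v0=0.0, w0=0.0, steps=100):
    """items: list of (v_i, w_i) with w_i > 0. Pick m of them to maximize
    (v0 + sum v_i) / (w0 + sum w_i). Returns (best average, sorted indices)."""

    def top_m(x):
        # Indices of the m largest r_i(x) = v_i - w_i * x.
        order = sorted(range(len(items)),
                       key=lambda i: items[i][0] - items[i][1] * x,
                       reverse=True)
        return order[:m]

    def F(x):
        return (v0 - w0 * x) + sum(items[i][0] - items[i][1] * x for i in top_m(x))

    # A weighted average lies between the smallest and largest individual ratio.
    ratios = [v / w for v, w in items] + ([v0 / w0] if w0 > 0 else [])
    xl, xr = min(ratios), max(ratios)
    for _ in range(steps):
        mid = (xl + xr) / 2
        if F(mid) > 0:
            xl = mid   # the optimum A is above mid
        else:
            xr = mid   # the optimum A is at or below mid
    chosen = top_m(xl)
    total_v = v0 + sum(items[i][0] for i in chosen)
    total_w = w0 + sum(items[i][1] for i in chosen)
    return total_v / total_w, sorted(chosen)


# Courses B to I as (credit * score, credit), with course A (5.0 credits, score 60) pre-selected.
items = [(95.0, 1.0), (170.0, 2.0), (830.0, 10.0), (94.5, 1.5),
         (177.5, 2.5), (160.0, 2.0), (231.0, 3.0), (260.0, 4.0)]
print(max_weighted_average(items, 2, v0=300.0, w0=5.0))
```

The final average is recomputed exactly from the chosen subset, so the bisection only has to be precise enough to fix the ordering of the r_i values.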

38
H/B-Optimal algorithm design
  • Why keep on narrowing down the range of A?
  • If the intersection of r_j(x) and r_k(x) falls outside
    the range, then the ordering of r_j(x) and r_k(x) is
    fixed within the range, and so is the ordering of
    r_j(A) and r_k(A); it is found by comparing their slopes.
  • If the range is narrow enough that no two r(x) lines
    intersect within it, then the total ordering of all
    r(A) values is determined.
  • The optimization problem is then solved: just select
    the m candidates with the highest r(A) values.
  • This is the main idea for solving the optimization problem.
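
The pairwise test described above can be sketched as follows, assuming each r(x) is the line v - w·x (the transcript does not show the definition). Two lines keep a fixed order across an interval exactly when their difference does not change sign between the endpoints. The function name and return convention are hypothetical.

```python
def order_in_range(vj, wj, vk, wk, xl, xr):
    """Compare r_j(x) = vj - wj*x with r_k(x) = vk - wk*x on the open range (xl, xr).

    Returns '>' if r_j is above r_k throughout, '<' if below, '=' if the lines
    coincide, and None if they cross inside the range (order undetermined).
    """
    d_left = (vj - vk) - (wj - wk) * xl
    d_right = (vj - vk) - (wj - wk) * xr
    if d_left == 0 and d_right == 0:
        return "="
    if d_left >= 0 and d_right >= 0:
        return ">"
    if d_left <= 0 and d_right <= 0:
        return "<"
    return None


# Lines for v = 95, w = 1 and v = 170, w = 2 cross at x = 75.
print(order_in_range(95.0, 1.0, 170.0, 2.0, 76.0, 77.0))
print(order_in_range(95.0, 1.0, 170.0, 2.0, 70.0, 80.0))
```

A pair that returns None is the "undetermined" case and stays unresolved until further halving pushes the crossing point out of the range.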

39
H/B-Optimal algorithm design
  • However, the total ordering requires O(n^2) time
    complexity
  • A randomized approach is used instead. This
    randomized algorithm:
  • Iteratively reduces the problem domain to a
    smaller one.
  • Maintains 4 sets X, Y, E, Z,
    initially empty

40
H/B-Optimal algorithm design
  • In each iteration, randomly select a course i
    and compare it with each of the other courses,
    k.
  • There are 4 possibilities
  • 1). if r_k(A) > r_i(A): insert k into set X
  • 2). if r_k(A) < r_i(A): insert k into set Y
  • 3). if w_k = w_i and v_k = v_i: insert k into set E
  • 4). if undetermined: insert k into set Z
  • Now do the following loop
  • loop
  • narrow the range of A by half
  • compare r_i(A) with r_k(A) for k in Z
  • if appropriate, move k to X or Y, accordingly
  • until Z is sufficiently small (i.e., |Z| <
    |S|/32)

41
H/B-Optimal algorithm design
  • The sets X or Y have enough members.
  • Next, examine and compare the sizes of X, Y and
    E

42
H/B-Optimal algorithm design
  • 1). If |X| + |E| > m
  • At least m courses have r(A) values
    greater than the r(A) value of every course in Y. All
    members of Y may be removed. Then |S| = |S| −
    |Y|

43
H/B-Optimal algorithm design
  • 2). If |Y| + |E| > |S| − m
  • All members of X are among the top m
    courses. All members of X must be in the optimal
    set. Collapse X into a single course (this course
    is included in the final optimal set). Then
  • |S| = |S| − |X| + 1
  • m = m − |X| + 1.

44
H/B-Optimal algorithm design
  • In either case, the resulting domain has reduced
    size.
  • By iteratively removing or collapsing courses,
    the problem domain finally has only one course
    remaining a course formed by collapsing all
    courses in optimal set.
  • Complexity
  • Expected time complexity, briefly (assume S_b is
    the domain before an iteration and S_a the domain after)
  • 1). Each iteration takes expected time O(|S_b|)
  • 2). Expected size |S_a| = (207/256) |S_b|
  • The recurrence relation of the iteration
  • T(n) = O(n) + T((207/256) n)
  • Resolves to linear time complexity.
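
The step from the recurrence to linear time is a geometric series. As a simplification, the expansion below treats the expected shrink factor 207/256 as if it applied at every iteration, with c standing for the constant hidden in the O(n) term:

```latex
T(n) \le c\,n + T\!\left(\tfrac{207}{256}\,n\right)
\;\Longrightarrow\;
T(n) \le c\,n \sum_{k=0}^{\infty} \left(\tfrac{207}{256}\right)^{k}
      = c\,n \cdot \frac{1}{1 - \tfrac{207}{256}}
      = \tfrac{256}{49}\,c\,n
      = O(n)
```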

45
H/B-Greedy vs. H/B-Optimal
  • H/B-greedy is an approximation to H/B-Optimal
  • H/B-greedy achieves a higher H/B metric than
    any existing algorithm.
  • H/B-greedy is easier to implement than
    H/B-Optimal.

46
Simulation Results
  • Evaluation of H/B Greedy Prefetching
  • Figure 1: H/B, for total object number = 1,000.
  • Figure 2: H/B, for total object number = 10,000.
  • Figure 3: H/B, for total object number = 100,000.
  • Figure 4: H/B, for total object number = 1,000,000.
  • Evaluation of H-Greedy and B-Greedy algorithms
  • Figure 5: H-Greedy algorithm.
  • Figure 6: B-Greedy algorithm.
  • Figure 7: B-Greedy algorithm, zoomed in.

47
Figure 1: H/B, for total object number = 1,000
48
Figure 2: H/B, for total object number = 10,000
49
Figure 3: H/B, for total object number = 100,000
50
Figure 4: H/B, for total object number = 1,000,000
51
Figure 5: H-Greedy algorithm
52
Figure 6: B-Greedy algorithm
53
Figure 7: B-Greedy, bandwidth magnified
54
Performance Comparison

  Table 1. Performance comparison of different
algorithms in terms of various metrics. (Lower
values represent better performance)
55
Conclusions
  • Proposed a family of Objective-Greedy prefetching
    algorithms that are superior to Popularity, Good
    Fetch, APL, and Lifetime
  • Hit rate greedy (this is also optimal)
  • Bandwidth greedy (this is also optimal)
  • H/B greedy
  • All the above are O(n) complexity
  • Proposed an H/B-Optimal algorithm that also runs
    in O(n) expected time
  • Experimental evaluation shows significant gains
    over existing algorithms
  • H/B-greedy is almost as good as H/B-optimal

56

Questions?