1
A Survey of Web Caching Schemes for the Internet
  • Jia Wang

2
Agenda
  • The World Wide Web
  • Problem and solution (caching)
  • Proxy servers
  • Advantages of web caching
  • Disadvantages of web caching
  • Elements of a WWW caching system
  • Desirable properties of a WWW caching system
  • Problems in designing caching systems for the WWW
  • Caching architecture

3
The World Wide Web
  • The WWW can be considered a large distributed
    information system.
  • Exponential growth in size.
  • In May 1999 it included 600 million static web
    pages.
  • Increases 15% per month.
  • Very popular.

4
SIZE OF DISTINCT STATIC WEB PAGES
5
The World Wide Web
  • Usage is relatively inexpensive
  • Accessing information is very fast
  • Documents appeal to a wide range of interests
  • But...

6
The World Wide Web
  • Network congestion
  • Server overloading

7
Problem
  • Internet backbone capacity increases 60% per
    year.
  • Bandwidth is not growing fast enough.
  • Without a solution, the WWW will become too
    congested and lose its entire appeal.

8
Solution
  • Caching
  • Placing popular objects at locations close to
    the clients.

9
Proxy servers
  • HTTP servers run by companies for security
    reasons.
  • Located at the bottleneck of the connection
    between the clients and the Internet.
  • Shared by all clients inside the firewall.

11
Proxy servers
  • Clients belonging to the same organization share
    common interests.
  • They are likely to access the same set of documents.

12
Thus
  • On the proxy server, documents that were
    previously requested and cached are likely to
    result in future hits.

13
Proxy servers
  • Caching the most popular web pages on the proxy
    server can:
  • Save network bandwidth
  • Lower access latency for the client
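The hit/miss behavior of a shared proxy can be sketched in a few lines. This is a minimal illustration, not a real proxy. The class name `ProxyCache` and the `fetch_from_origin` callback are hypothetical stand-ins for the HTTP request a proxy would make to the origin server. Each hit is a request that never crosses the bottleneck link, and that is where the bandwidth and latency savings come from.

```python
from typing import Callable, Dict, List


class ProxyCache:
    """Shared cache in front of origin servers (hypothetical sketch)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin  # hypothetical stand-in for an HTTP fetch
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            self.hits += 1          # served locally: no traffic to the origin
            return self._store[url]
        self.misses += 1            # request must cross the bottleneck link
        body = self._fetch(url)
        self._store[url] = body     # keep a copy for later clients
        return body

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    origin_requests: List[str] = []

    def origin(url: str) -> bytes:
        origin_requests.append(url)
        return f"<html>{url}</html>".encode()

    proxy = ProxyCache(origin)
    # Requests from clients of the same organization overlap.
    for url in ["/news", "/news", "/docs", "/news"]:
        proxy.get(url)
    print(proxy.hits, proxy.misses, len(origin_requests))  # 2 2 2
    print(proxy.hit_ratio())  # 0.5
```

This sketch never evicts or revalidates anything. Real proxies must also handle replacement and coherency, which appear later in the deck as open design problems.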

14
Advantages of web caching
  • Reduces bandwidth consumption
  • Decreases network traffic
  • Lessens network congestion
  • Reduces access latency
  • frequently used documents are cached nearby
  • less traffic → shorter delays for documents that
    are not cached

15
Advantages of web caching (cont.)
  • Reduces workload of remote server
  • Data can be accessed when remote server is down
    (enhanced robustness).
  • Allows analysis of organization usage patterns
  • Cooperation between caches increases efficiency.

16
Disadvantages of web caching
  • Cached data is not updated automatically.
  • A cache miss can increase latency (extra proxy
    processing).
  • Bottleneck effect: limits the number of clients
    per proxy.
  • A single proxy is a single point of failure.
  • Information providers cannot monitor the number
    of visits to their sites.

17
Elements of a WWW caching system
  • Documents can be cached at the clients, the
    proxies and the servers.

18
Elements of a WWW caching system
19
Desirable properties of a WWW caching system
  • fast access
  • robustness
  • transparency
  • scalability
  • efficiency
  • adaptivity
  • stability
  • load balance
  • ability to deal with heterogeneity
  • simplicity

20
Fast access
  • Reduce web access latency to a minimum,
    especially compared with servers that do not use
    caching techniques.

21
Robustness
  • Robustness: availability to the user
  • Eliminate single points of failure
  • Degrade gracefully in case of failure
  • Recover easily from failure

22
Transparency
  • Transparent to the user
  • The user should only notice
  • Faster response
  • Higher availability

23
Scalability
  • Scale well as the size and density of the network
    increase.
  • All protocols should be as lightweight as
    possible.

24
Efficiency
  • Impose minimal additional burden on the network
    (in control data packets)
  • Do not adopt any scheme that leads to
    under-utilization of the network

25
Adaptivity
  • Adapt to dynamic changes in user demand and in
    the network environment
  • Achieve optimal performance

26
Stability
  • Do not introduce instabilities into the network

27
Load balancing
  • Distribute load evenly across the entire network
  • No bottlenecks or hot spots

28
Ability to deal with heterogeneity
  • Adapt to a range of network architectures
    (hardware and software)

29
Simplicity
  • The mechanism should be simple to deploy
  • Simpler schemes are easier to implement and more
    likely to be accepted as international standards

30
  • What problems do we face in designing caching
    systems for the WWW?

31
Problems in designing caching systems for the WWW
  • Caching system architecture
  • How cache proxies are organized: hierarchically,
    distributed, or hybrid.

32
Problems in designing caching systems for the WWW
  • Proxy placement
  • Where to place a cache proxy in order to optimize
    performance

33
Problems in designing caching systems for the WWW
  • Caching contents
  • What can be cached in the caching system

34
Problems in designing caching systems for the WWW
  • Proxy cooperation
  • How do proxies cooperate with each other

35
Problems in designing caching systems for the WWW
  • Data sharing
  • What kind of data/information can be shared among
    cooperative proxies

36
Problems in designing caching systems for the WWW
  • Cache resolution/routing
  • How does a proxy decide where to fetch a page
    requested by a client?

37
Problems in designing caching systems for the WWW
  • Prefetching
  • How does a proxy decide what and when to prefetch
    from web servers or other proxies to reduce access
    latency?

38
Problems in designing caching systems for the WWW
  • Cache placement/ replacement
  • How the proxy decides which pages to store in its
    cache and which pages to remove from it.
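The slides leave the replacement policy open. Least Recently Used (LRU) is one common and simple choice, and it is sketched below. The sketch assumes capacity is counted in pages rather than bytes. Real web caches usually bound the total size and may prefer size-aware policies. `LRUCache` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Fixed-capacity page cache that evicts the least recently used page."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity  # assumption: counted in pages, not bytes
        self._pages: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        if url not in self._pages:
            return None
        self._pages.move_to_end(url)  # mark as most recently used
        return self._pages[url]

    def put(self, url: str, body: bytes) -> Optional[str]:
        """Store a page; return the URL evicted to make room, if any."""
        if url in self._pages:
            self._pages.move_to_end(url)
            self._pages[url] = body
            return None
        evicted = None
        if len(self._pages) >= self.capacity:
            evicted, _ = self._pages.popitem(last=False)  # least recently used
        self._pages[url] = body
        return evicted
```

For example, with `LRUCache(2)` the sequence put `a`, put `b`, get `a`, put `c` evicts `b`, because `a` was touched more recently.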

39
Problems in designing caching systems for the WWW
  • Cache coherency
  • How does a proxy maintain data consistency?
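One widely used answer to the coherency question is expiration. A cached copy is trusted for a fixed time-to-live (TTL), similar in spirit to HTTP's `max-age`, and is refetched after that. The sketch below assumes a single global TTL. Alternatives such as validating with the origin or origin-driven invalidation give stronger consistency at the cost of more control traffic. `TTLCache` and the injectable `clock` are hypothetical; the clock exists only so the behavior can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    body: bytes
    fetched_at: float  # clock time when the copy was obtained


class TTLCache:
    """Weak consistency: a cached copy is served for `ttl` seconds."""

    def __init__(self, ttl: float, fetch: Callable[[str], bytes],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._fetch = fetch    # hypothetical origin fetcher
        self._clock = clock
        self._entries: Dict[str, Entry] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry.fetched_at < self.ttl:
            return entry.body  # still fresh: no contact with the origin
        body = self._fetch(url)  # missing or expired: refetch
        self._entries[url] = Entry(body, now)
        return body
```

The trade-off is explicit in `ttl`. A short TTL keeps copies closer to the origin's version but sends more requests upstream.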

40
Problems in designing caching systems for the WWW
  • Control information distribution
  • How is control information (e.g., URLs)
    distributed among proxies?

41
Problems in designing caching systems for the WWW
  • Dynamic data caching
  • How to deal with data that is not cacheable

42
Caching architecture
  • Hierarchical
  • Caches are placed at multiple levels of the
    network.

[Figure: cache hierarchy with national, regional, institutional and bottom levels]
43
Hierarchical architecture
  • Bottom level: client/browser caches.

[Figure: the request moves up from the bottom through the institutional and regional levels to the national level; each level below national reports "web page not found"]
44
Hierarchical architecture
  • After the web page is found

[Figure: the page is forwarded down from the national level through the regional and institutional levels to the bottom; at each step the label reads "forward page, leave copy"]
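The two figures above describe a miss that climbs the hierarchy ("web page not found") and a reply that comes back down ("forward page, leave copy"). A minimal sketch of that path follows. It assumes each level is a plain dictionary with no eviction, and that `levels[0]` is the bottom cache. `CacheLevel`, `hierarchical_get` and the `origin` callback are hypothetical names.

```python
from typing import Callable, Dict, List, Optional


class CacheLevel:
    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}


def hierarchical_get(url: str, levels: List[CacheLevel],
                     origin: Callable[[str], bytes]) -> bytes:
    """levels[0] is the bottom cache; levels[-1] is the national cache."""
    missed: List[CacheLevel] = []
    body: Optional[bytes] = None
    for level in levels:
        if url in level.store:
            body = level.store[url]
            break
        missed.append(level)  # "web page not found": forward request upward
    if body is None:
        body = origin(url)    # no level had it: ask the origin server
    # "forward page, leave copy": each level that missed keeps a copy
    for level in reversed(missed):
        level.store[url] = body
    return body
```

If the page is found at the regional level, the bottom and institutional caches end up with copies and the national cache does not. This is how the hierarchy diffuses popular pages toward the demand. It is also why it stores multiple copies at different levels.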
45
Hierarchical architecture
  • Advantages
  • Bandwidth-efficient, especially when cache
    servers are slow.
  • Efficiently diffuses popular web pages toward
    the demand.

46
Hierarchical architecture
  • Disadvantages
  • Cache servers need to be placed at key access
    points of the network → requires coordination
    among caches.
  • Each level adds a delay.
  • High levels are bottlenecks.
  • Multiple copies are stored at different cache
    levels.

47
Distributed architecture
  • Caches at the bottom level only.
  • No other intermediate caching levels.
  • Each cache server contains meta-data on the data
    stored on other servers.
  • Hierarchy used only for distributing information
    about location of the copy.
  • No copying of actual documents.
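In the distributed scheme, caches exchange only location meta-data and fetch documents directly from the sibling that holds them. The sketch below models that with a per-cache directory mapping each URL to the names of peers that advertised it. It assumes advertisements are pushed to every peer immediately and are never withdrawn. Real systems must also handle stale directory entries. All names (`DistributedCache`, `advertise`, `connect`) are hypothetical.

```python
from typing import Callable, Dict, Optional, Set


class DistributedCache:
    """Bottom-level cache that knows which peers hold which URLs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.directory: Dict[str, Set[str]] = {}  # url -> peers holding it
        self.peers: Dict[str, "DistributedCache"] = {}

    def advertise(self, url: str) -> None:
        # Share only location meta-data, never the document itself.
        for peer in self.peers.values():
            peer.directory.setdefault(url, set()).add(self.name)

    def get(self, url: str, origin: Callable[[str], bytes]) -> bytes:
        if url in self.store:
            return self.store[url]
        body: Optional[bytes] = None
        for peer_name in sorted(self.directory.get(url, set())):
            peer = self.peers.get(peer_name)
            if peer is not None and url in peer.store:
                body = peer.store[url]  # fetch directly from the sibling
                break
        if body is None:
            body = origin(url)          # nobody has it: go to the origin
        self.store[url] = body
        self.advertise(url)
        return body


def connect(*caches: DistributedCache) -> None:
    """Make every cache a peer of every other cache."""
    for cache in caches:
        for other in caches:
            if other is not cache:
                cache.peers[other.name] = other
```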

48
Distributed architecture
  • Advantages
  • Traffic flows through low network levels which
    are less congested.
  • No additional disk space required for
    intermediate network levels.
  • Better load sharing.
  • More fault tolerant.

49
Distributed architecture
  • Disadvantages
  • High connection times
  • Higher bandwidth usage
  • Administrative issues.

50
Distributed architecture
  • Examples
  • ICP: Internet Cache Protocol (Harvest group)
  • Retrieves data from neighboring caches and parent
    caches
  • CARP: Cache Array Routing Protocol
  • The URL space is divided among an array of caches.
  • Each cache stores only the documents whose URLs
    hash to it.
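CARP-style partitioning can be illustrated with highest-random-weight ("rendezvous") hashing. Each URL is scored against every member of the array, and the member with the highest score owns the URL. As we understand the CARP draft, it combines a URL hash with a member-name hash in this way, using its own hash functions and optional load factors. The sketch substitutes SHA-256 from `hashlib` and omits load factors, so it shows the idea rather than CARP's exact routing. `carp_select` is a hypothetical name.

```python
import hashlib
from typing import List


def _score(url: str, member: str) -> int:
    # Assumption: SHA-256 over "member|url" stands in for CARP's hash functions.
    digest = hashlib.sha256(f"{member}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def carp_select(url: str, members: List[str]) -> str:
    """Return the array member responsible for `url` (highest score wins)."""
    if not members:
        raise ValueError("no cache members")
    return max(members, key=lambda m: _score(url, m))
```

Every cache computes the same owner without any coordination traffic, so a document is stored in only one place. Removing a member reassigns only the URLs that member owned.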

51
Hybrid architecture
  • Caches may cooperate with other caches at the
    same level or at a higher level using distributed
    caching.
  • ICP is an example.
  • The document is fetched from the parent/neighbor
    cache with the lowest RTT.
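The selection rule above can be sketched as a function over the replies a cache receives after querying its neighbors and parents. The sketch assumes each reply records whether that cache has the document and the measured round-trip time. It also assumes that when nobody reports a hit, the request goes to a configured default parent, or to the origin if there is none. `IcpReply`, `choose_source` and `default_parent` are hypothetical names, not part of the ICP message format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IcpReply:
    peer: str      # neighbor or parent that answered the query
    hit: bool      # True if that cache reported having the document
    rtt_ms: float  # measured round-trip time for the query


def choose_source(replies: List[IcpReply],
                  default_parent: Optional[str]) -> Optional[str]:
    """Pick the cache to fetch from; None means go to the origin server."""
    hits = [r for r in replies if r.hit]
    if hits:
        return min(hits, key=lambda r: r.rtt_ms).peer  # lowest-RTT hit wins
    return default_parent  # assumption: fall back to a configured parent
```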

52
Performance of architectures
  • Hierarchical caching has shorter connection times
    than distributed caching.
  • Additional copies at intermediate levels reduce
    retrieval latency for small documents.
  • Distributed caching has shorter transmission
    times but higher bandwidth usage.
  • A well-configured hybrid scheme can reduce both
    connection time and transmission time.
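A simplified latency model makes the trade-off above concrete. The model is our own simplification, not taken from the slides. It assumes retrieving a document of size S costs a connection time plus a transfer time at an effective rate B, and it ignores queuing and per-hop processing. Subscripts h and d denote hierarchical and distributed caching. Following the slide, T_conn,h < T_conn,d and B_h < B_d.

```latex
\begin{aligned}
T(S) &\approx T_{\text{conn}} + \frac{S}{B} \\
T_h(S) - T_d(S) &= \left(T_{\text{conn},h} - T_{\text{conn},d}\right)
  + S\left(\frac{1}{B_h} - \frac{1}{B_d}\right) \\
T_h(S) < T_d(S) &\iff S < S^{*} =
  \frac{T_{\text{conn},d} - T_{\text{conn},h}}{\dfrac{1}{B_h} - \dfrac{1}{B_d}}
\end{aligned}
```

The first term of the difference is negative and the second grows with S. Under this model, hierarchical caching wins for small documents and distributed caching wins for large ones, which a hybrid scheme can try to exploit.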