Title: A Survey of Web Caching Schemes for the Internet
Agenda
- The World Wide Web
- Problem and solution (caching)
- Proxy servers
- Advantages of web caching
- Disadvantages of web caching
- Elements of a WWW caching system
- Desirable properties of a WWW caching system
- Problems in designing caching systems for the WWW
- Caching architecture
The World Wide Web
- The WWW can be considered a large distributed information system.
- Exponential growth in size:
  - In May 1999 it included 600 million static web pages.
  - It grows by 15% per month.
- Very popular.
Size of distinct static web pages
The World Wide Web
- Usage is relatively inexpensive
- Accessing information is very fast
- Documents appeal to a wide range of interests
- But...
The World Wide Web
- Network congestion
- Server overloading
Problem
- Internet backbone capacity increases by 60% per year.
- Bandwidth is not growing fast enough.
- Without a solution, the WWW will become too congested and lose its entire appeal.
Solution
- Caching
- Placing popular objects at locations close to
the clients.
Proxy servers
- HTTP servers run by companies for security reasons.
- Form the bottleneck of the connection between the clients and the Internet.
- Shared by all clients inside the firewall.
Proxy servers
- Clients belonging to the same organization share common interests.
- They probably access the same set of documents.
Thus
- Documents previously requested and cached on the proxy server are likely to result in future hits.
Proxy servers
- Caching the most popular web pages on the proxy server can:
  - Save network bandwidth
  - Lower access latency for the client
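A minimal Python sketch of why sharing pays off: the cache below is keyed only by URL, so one client's miss turns into a hit for every other client behind the same proxy. The class name `SharedProxyCache` and the `fetch_from_origin` callback (standing in for a real HTTP request) are hypothetical, and the store is unbounded for brevity.

```python
from typing import Callable, Dict


class SharedProxyCache:
    """Toy proxy cache shared by every client behind one firewall."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.fetch_from_origin = fetch_from_origin  # hypothetical stand-in for HTTP
        self.store: Dict[str, bytes] = {}           # unbounded, for brevity
        self.hits = 0
        self.misses = 0
        self.bytes_saved = 0  # bytes not re-fetched over the external link

    def get(self, client: str, url: str) -> bytes:
        # The lookup ignores `client`: entries are keyed by URL only,
        # so one client's miss becomes a hit for everyone else.
        if url in self.store:
            self.hits += 1
            self.bytes_saved += len(self.store[url])
            return self.store[url]
        self.misses += 1
        body = self.fetch_from_origin(url)
        self.store[url] = body
        return body


origin = {"http://example.com/news": b"x" * 1000}
proxy = SharedProxyCache(lambda url: origin[url])
proxy.get("alice", "http://example.com/news")  # miss: fetched from the origin
proxy.get("bob", "http://example.com/news")    # hit: served by the proxy
print(proxy.hits, proxy.misses, proxy.bytes_saved)  # 1 1 1000
```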
Advantages of web caching
- Reduces bandwidth consumption
  - Decreases network traffic
  - Lessens network congestion
- Reduces access latency
  - Frequently used documents are cached nearby
  - Less traffic means shorter delays, even for documents that are not cached
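The latency benefit can be made concrete with a simple expected-value model, assuming a hit ratio h, a cost t_cache for a hit, and, for a miss, the origin fetch time plus the proxy's extra processing on a miss. The function name and the millisecond figures are hypothetical, chosen only for illustration.

```python
def expected_latency(hit_ratio: float, t_cache: float, t_origin: float,
                     t_proxy_overhead: float = 0.0) -> float:
    """Mean latency for a client behind a caching proxy (simple model).

    Hits are served from the cache in t_cache; misses pay the origin fetch
    plus the proxy's extra processing time.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * t_cache + (1.0 - hit_ratio) * (t_origin + t_proxy_overhead)


# Hypothetical values in milliseconds, for illustration only.
print(round(expected_latency(0.4, t_cache=10, t_origin=200, t_proxy_overhead=5), 1))  # 127.0
```

With these made-up numbers a client without a proxy would see 200 ms, so the proxy pays off once the hit ratio exceeds 5/195, about 2.6%; below that, the miss overhead dominates.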
Advantages of web caching (cont.)
- Reduces the workload of the remote server
- Data can still be accessed when the remote server is down (enhanced robustness)
- Allows analysis of an organization's usage patterns
- Cooperation between caches increases efficiency
Disadvantages of web caching
- Cached data is not updated automatically, so it can become stale
- A cache miss can increase latency (extra proxy processing)
- A proxy can become a bottleneck, which limits the number of clients per proxy
- A single proxy is a single point of failure
- Information providers cannot monitor the number of visits to their sites
Elements of a WWW caching system
- Documents can be cached at the clients, the
proxies and the servers.
Elements of a WWW caching system
Desirable properties of a WWW caching system
- fast access
- robustness
- transparency
- scalability
- efficiency
- adaptivity
- stability
- load balance
- ability to deal with heterogeneity
- simplicity
Fast access
- Reduce web access latency to a minimum, especially compared with systems that do not use caching.
Robustness
- Robustness means availability to the user:
  - Eliminate single points of failure
  - Degrade gracefully in case of failure
  - Recover easily from failure
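One way to avoid a single point of failure is to let a client fall back through an ordered list of sources. The sketch below assumes each source is a callable that raises `ConnectionError` when it is unreachable; `fetch_with_failover`, `dead_proxy`, and `origin_server` are hypothetical names.

```python
from typing import Callable, Iterable, Optional


def fetch_with_failover(url: str,
                        sources: Iterable[Callable[[str], bytes]]) -> Optional[bytes]:
    """Try each source in order, e.g. primary proxy, backup proxy, origin server."""
    for source in sources:
        try:
            return source(url)
        except ConnectionError:
            continue  # degrade gracefully: fall through to the next source
    return None  # every source failed


def dead_proxy(url: str) -> bytes:
    raise ConnectionError("proxy unreachable")


def origin_server(url: str) -> bytes:
    return b"<html>hello</html>"


print(fetch_with_failover("http://example.com/", [dead_proxy, origin_server]))
# b'<html>hello</html>'
```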
Transparency
- Transparent to the user
- The only things the user should notice are:
  - Faster response
  - Higher availability
Scalability
- Scale well as the size and density of the network increase.
- All protocols should be as lightweight as possible.
Efficiency
- Impose minimal additional burden on the network (in terms of control packets).
- Avoid any scheme that leads to under-utilization of the network.
Adaptivity
- Adapt dynamically to changing user demand and network environment.
- Achieve optimal performance.
Stability
- Do not introduce instabilities into the network
Load balancing
- Distribute the load evenly across the entire network.
- Avoid bottlenecks and hot spots.
Ability to deal with heterogeneity
- Adapt to a range of network architectures (hardware and software).
Simplicity
- Mechanisms should be simple to deploy.
- Simpler schemes are easier to implement and more likely to be accepted as international standards.
What problems do we face in designing caching systems for the WWW?
Problems in designing caching systems for the WWW
- Caching system architecture
  - How are cache proxies organized (hierarchical, distributed, or hybrid)?
Problems in designing caching systems for the WWW
- Proxy placement
  - Where should a cache proxy be placed to optimize performance?
Problems in designing caching systems for the WWW
- Caching contents
  - What can be cached in the caching system?
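As an illustration of what a proxy might decide to cache, here is a deliberately simplified heuristic, assumed for illustration only: cache only successful GET responses, treat query strings and CGI paths as dynamic, and skip anything marked `no-store` or `private` in `Cache-Control`. It is far from the full HTTP caching rules and assumes header names arrive in canonical case.

```python
from typing import Mapping


def is_cacheable(method: str, url: str, status: int,
                 headers: Mapping[str, str]) -> bool:
    """Illustrative heuristic only; real HTTP caching rules are far richer."""
    if method.upper() != "GET" or status != 200:
        return False
    # Query strings and CGI scripts are treated as dynamic, uncacheable content.
    if "?" in url or "/cgi-bin/" in url:
        return False
    # Respect explicit "do not store" / "per-user" directives from the server.
    directives = {d.strip().lower()
                  for d in headers.get("Cache-Control", "").split(",")}
    return not ({"no-store", "private"} & directives)


print(is_cacheable("GET", "http://example.com/logo.gif", 200, {}))        # True
print(is_cacheable("GET", "http://example.com/search?q=cache", 200, {}))  # False
print(is_cacheable("GET", "http://example.com/account", 200,
                   {"Cache-Control": "private, max-age=60"}))             # False
```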
Problems in designing caching systems for the WWW
- Proxy cooperation
  - How do proxies cooperate with each other?
Problems in designing caching systems for the WWW
- Data sharing
  - What kind of data or information can be shared among cooperative proxies?
Problems in designing caching systems for the WWW
- Cache resolution/routing
  - How does a proxy decide where to fetch a page requested by a client?
Problems in designing caching systems for the WWW
- Prefetching
  - How does a proxy decide what to prefetch, and when, from web servers or other proxies to reduce access latency?
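Prefetching policies vary. One simple possibility, shown here as a hypothetical sketch rather than a scheme from the survey, is to learn from past sessions which page usually follows the current one and prefetch it only when that prediction is confident enough. `NextPagePredictor` and the 0.5 threshold are assumptions.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional


class NextPagePredictor:
    """Learns (current page -> next page) transitions from client sessions."""

    def __init__(self, min_confidence: float = 0.5) -> None:
        self.transitions: DefaultDict[str, Counter] = defaultdict(Counter)
        self.min_confidence = min_confidence

    def observe(self, session: List[str]) -> None:
        # Count each consecutive pair of pages requested in one session.
        for current, nxt in zip(session, session[1:]):
            self.transitions[current][nxt] += 1

    def prefetch_candidate(self, url: str) -> Optional[str]:
        counts = self.transitions.get(url)  # .get() does not create an entry
        if not counts:
            return None
        best, n = counts.most_common(1)[0]
        # Prefetch only when the guess is likely enough to justify the bandwidth.
        return best if n / sum(counts.values()) >= self.min_confidence else None


p = NextPagePredictor()
p.observe(["/index", "/news", "/sports"])
p.observe(["/index", "/news"])
p.observe(["/index", "/weather"])
print(p.prefetch_candidate("/index"))   # /news (2 of 3 observed transitions)
print(p.prefetch_candidate("/sports"))  # None (never followed by anything)
```

The confidence threshold trades latency for bandwidth: a lower value prefetches more aggressively and wastes more transfers on wrong guesses.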
Problems in designing caching systems for the WWW
- Cache placement/replacement
  - How does a proxy decide which pages to store in its cache and which to remove from it?
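Least-recently-used (LRU) replacement is one classic answer to the replacement question. The sketch below pairs it with a trivial placement rule (never store an object larger than the whole cache); the byte capacity and the class name `LRUCache` are illustrative assumptions.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Byte-capacity cache that evicts the least recently used document."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.docs: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        if url not in self.docs:
            return None
        self.docs.move_to_end(url)  # mark as most recently used
        return self.docs[url]

    def put(self, url: str, body: bytes) -> None:
        if len(body) > self.capacity:
            return  # placement rule: never store objects larger than the cache
        if url in self.docs:
            self.used -= len(self.docs.pop(url))
        # Replacement: evict from the least recently used end until it fits.
        while self.used + len(body) > self.capacity:
            _, evicted = self.docs.popitem(last=False)
            self.used -= len(evicted)
        self.docs[url] = body
        self.used += len(body)


cache = LRUCache(capacity_bytes=10)
cache.put("/a", b"aaaa")
cache.put("/b", b"bbbb")
cache.get("/a")           # /a is now the most recently used
cache.put("/c", b"cccc")  # 12 bytes > 10: evicts /b, the least recently used
print(list(cache.docs))   # ['/a', '/c']
```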
Problems in designing caching systems for the WWW
- Cache coherency
  - How does a proxy maintain data consistency?
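One well-known approach to weak consistency is an adaptive time-to-live (TTL): a copy is considered fresh for a fixed fraction of the document's age at fetch time (the time since its last modification), on the assumption that documents unchanged for a long time are unlikely to change soon. The factor 0.1 and the names below are assumptions. A stale copy would normally be revalidated with the origin server, for example with an HTTP conditional GET.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedCopy:
    body: bytes
    fetched_at: float     # when the proxy stored this copy (Unix seconds)
    last_modified: float  # Last-Modified time reported by the origin server


def adaptive_ttl(copy: CachedCopy, factor: float = 0.1) -> float:
    # Age of the document at fetch time; long-unchanged pages get long TTLs.
    age_at_fetch = max(0.0, copy.fetched_at - copy.last_modified)
    return factor * age_at_fetch


def is_fresh(copy: CachedCopy, now: Optional[float] = None, factor: float = 0.1) -> bool:
    now = time.time() if now is None else now
    return now - copy.fetched_at < adaptive_ttl(copy, factor)


page = CachedCopy(b"...", fetched_at=1_000.0, last_modified=0.0)  # TTL = 100 s
print(is_fresh(page, now=1_050.0))  # True: serve from cache
print(is_fresh(page, now=1_200.0))  # False: revalidate with the origin
```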
Problems in designing caching systems for the WWW
- Control information distribution
  - How is control information (e.g., URLs) distributed among proxies?
Problems in designing caching systems for the WWW
- Dynamic data caching
  - How should a proxy deal with data that is not cacheable?
Caching architecture
- Hierarchical
  - Caches are placed at multiple levels of the network: national, regional, institutional, and bottom.
Hierarchical architecture
- Bottom level: client/browser caches.
- [Figure: the web page is not found at the bottom, institutional, or regional level, so the request moves up the hierarchy toward the national level.]
Hierarchical architecture
- [Figure: the page is forwarded back down from the national level through the regional and institutional levels to the bottom level, leaving a copy at each level.]
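The miss-upward, copy-downward behavior of the hierarchy on these slides can be sketched as follows. Each level is modeled as a plain dict from URL to page body, with `levels[0]` the bottom cache and `levels[-1]` the national one; this representation and the function `hierarchical_get` are hypothetical.

```python
from typing import Callable, Dict, List


def hierarchical_get(url: str, levels: List[Dict[str, bytes]],
                     fetch_from_origin: Callable[[str], bytes]) -> bytes:
    """levels[0] is the bottom (client) cache; levels[-1] is the national cache."""
    # Walk up the hierarchy until some level has the page.
    hit_level = len(levels)  # sentinel meaning "not cached at any level"
    for i, cache in enumerate(levels):
        if url in cache:
            hit_level = i
            break
    if hit_level < len(levels):
        body = levels[hit_level][url]
    else:
        body = fetch_from_origin(url)
    # Forward the page back down, leaving a copy at every level below the hit.
    for cache in levels[:hit_level]:
        cache[url] = body
    return body


bottom, institutional, regional, national = {}, {}, {}, {"/p": b"page"}
levels = [bottom, institutional, regional, national]
hierarchical_get("/p", levels, lambda u: b"from origin")
print(["/p" in cache for cache in levels])  # [True, True, True, True]
```

Leaving a copy at every level is what diffuses popular pages toward the demand, and it is also the source of the multiple-copies cost of this architecture.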
Hierarchical architecture
- Advantages
  - Bandwidth-efficient, especially when cache servers are slow.
  - Efficiently diffuses popular web pages toward the demand.
Hierarchical architecture
- Disadvantages
  - Cache servers need to be placed at key access points of the network, which requires coordination among caches.
  - Each level adds a delay.
  - Caches at the higher levels become bottlenecks.
  - Multiple copies of a document are stored at different cache levels.
Distributed architecture
- Caches at the bottom level only.
- No intermediate caching levels.
- Each cache server keeps metadata about the documents stored on the other cache servers.
- A hierarchy is used only to distribute information about the location of copies.
- Actual documents are not copied at intermediate levels.
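A distributed cache can be sketched as peers that exchange only location metadata. Here each cache keeps a directory mapping URLs to peers believed to hold them, fetches directly from such a peer on a local miss, and otherwise goes to the origin. How the metadata is propagated (the hypothetical `learn` call) is an assumption standing in for whatever protocol a real system uses.

```python
from typing import Callable, Dict, Set


class DistributedCache:
    """A bottom-level cache that also keeps location metadata about its peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.directory: Dict[str, Set[str]] = {}  # URL -> peers said to hold it

    def learn(self, peer: str, url: str) -> None:
        # Only location information is shared; the document itself is not copied.
        self.directory.setdefault(url, set()).add(peer)

    def get(self, url: str, peers: Dict[str, "DistributedCache"],
            fetch_from_origin: Callable[[str], bytes]) -> bytes:
        if url in self.store:
            return self.store[url]
        for peer_name in sorted(self.directory.get(url, set())):
            peer = peers.get(peer_name)
            if peer is not None and url in peer.store:
                body = peer.store[url]  # fetched directly from a bottom-level peer
                break
        else:
            body = fetch_from_origin(url)  # no peer holds a copy
        self.store[url] = body
        return body


a, b = DistributedCache("a"), DistributedCache("b")
b.store["/doc"] = b"hello"
a.learn("b", "/doc")  # b's location is advertised; the document stays on b
peers = {"a": a, "b": b}
print(a.get("/doc", peers, lambda u: b"from origin"))  # b'hello'
```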
Distributed architecture
- Advantages
  - Traffic flows through low network levels, which are less congested.
  - No additional disk space is required at intermediate network levels.
  - Better load sharing.
  - More fault tolerant.
Distributed architecture
- Disadvantages
  - Higher connection times
  - Higher bandwidth usage
  - Administrative issues
Distributed architecture
- Examples
  - ICP: Internet Cache Protocol (Harvest group)
    - Retrieves data from neighboring caches and parent caches.
  - CARP: Cache Array Routing Protocol
    - The URL space is divided among an array of caches.
    - Each cache stores only the documents whose URLs hash to it.
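CARP's key idea, that every client or proxy can compute the same URL-to-cache mapping without asking anyone, can be illustrated with highest-random-weight (rendezvous) hashing. This is a simplification: CARP defines its own hash functions and per-member load factors, whereas the sketch below uses SHA-256 from Python's standard library over a `cache|url` string, with equal weights and hypothetical cache names.

```python
import hashlib
from typing import List


def _weight(cache: str, url: str) -> int:
    # Pseudo-random but deterministic score for a (cache, URL) pair.
    digest = hashlib.sha256(f"{cache}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(url: str, caches: List[str]) -> str:
    """The cache with the highest score for this URL is the one that stores it."""
    return max(caches, key=lambda cache: _weight(cache, url))


caches = ["cache-a", "cache-b", "cache-c"]
urls = [f"http://example.com/page{i}" for i in range(1000)]
before = {u: owner(u, caches) for u in urls}
after = {u: owner(u, caches[:2]) for u in urls}  # cache-c removed
moved = [u for u in urls if before[u] != after[u]]
# Only URLs that were owned by the removed cache change owner.
print(all(before[u] == "cache-c" for u in moved))  # True
```

Because each URL maps to exactly one cache, no document is duplicated across the array, and removing a member only reassigns that member's share of the URL space.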
Hybrid architecture
- Caches may cooperate with other caches at the same level or at a higher level using distributed caching.
- ICP is an example:
  - The document is fetched from the parent or neighbor cache that has the lowest round-trip time (RTT).
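The selection step can be sketched as follows, assuming the ICP query round has already produced one reply per neighbor with its hit/miss status and measured RTT. The `IcpReply` type and the rule of resolving a total miss through the closest parent are simplifying assumptions, not the ICP wire format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IcpReply:
    peer: str
    is_parent: bool
    hit: bool
    rtt_ms: float  # measured round-trip time of the query/reply exchange


def choose_peer(replies: List[IcpReply]) -> Optional[str]:
    """Pick where to fetch from after querying neighbors (simplified)."""
    hits = [r for r in replies if r.hit]
    if hits:
        # Some sibling or parent has the page: take the one with the lowest RTT.
        return min(hits, key=lambda r: r.rtt_ms).peer
    parents = [r for r in replies if r.is_parent]
    if parents:
        # Nobody has it: resolve the miss through the closest parent.
        return min(parents, key=lambda r: r.rtt_ms).peer
    return None  # no usable neighbor: go directly to the origin server


replies = [
    IcpReply("sibling-1", is_parent=False, hit=False, rtt_ms=12.0),
    IcpReply("parent-east", is_parent=True, hit=False, rtt_ms=40.0),
    IcpReply("parent-west", is_parent=True, hit=False, rtt_ms=25.0),
]
print(choose_peer(replies))  # parent-west (no hits; closest parent)
replies.append(IcpReply("sibling-2", is_parent=False, hit=True, rtt_ms=30.0))
print(choose_peer(replies))  # sibling-2 (the only neighbor with a hit)
```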
Performance of architectures
- Hierarchical caching has shorter connection times than distributed caching.
- Additional copies at intermediate levels reduce retrieval latency for small documents.
- Distributed caching has shorter transmission times and higher bandwidth usage.
- A well-configured hybrid scheme can reduce both connection time and transmission time.
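Why the two architectures win in different regimes can be seen with a toy model in which retrieval latency is connection time plus size divided by throughput. The parameters below are invented to mirror the qualitative claims on this slide (hierarchical: shorter connection time; distributed: higher throughput over less congested low levels); they are not measurements.

```python
def retrieval_latency(size_bytes: int, connection_s: float,
                      throughput_bytes_per_s: float) -> float:
    """Toy model: connection setup time plus transmission time."""
    return connection_s + size_bytes / throughput_bytes_per_s


# Invented parameters: hierarchical = short connection to a nearby copy but
# lower throughput through busy upper levels; distributed = longer connection
# setup but higher throughput over less congested low levels.
HIERARCHICAL = dict(connection_s=0.05, throughput_bytes_per_s=50_000)
DISTRIBUTED = dict(connection_s=0.20, throughput_bytes_per_s=200_000)

for size in (2_000, 1_000_000):
    h = retrieval_latency(size, **HIERARCHICAL)
    d = retrieval_latency(size, **DISTRIBUTED)
    print(size, "hierarchical" if h < d else "distributed")
# 2000 hierarchical
# 1000000 distributed
```

Small documents are dominated by connection time, large ones by transmission time, which is why a hybrid that combines nearby copies with low-level transfers can improve both.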