1
Web Caching
  • Dr. Yingwu Zhu

2
What is Web Caching
  • Introducing proxy servers at certain points in
    the network that serve in caching Web documents
    for faster client access.
  • Comparable to the cache memory in a computer
    system

3
Proxy Cache
[Diagram: clients send requests (Req.) to the proxy, which forwards them to the servers; replies (Reply) return along the same path.]

4
How?
  • Clients send requests to the proxy.
  • If the requested document is in its cache, the
    proxy serves the request from its cache.
  • Otherwise, the proxy forwards the request to the
    server.
  • The server replies to the request through the
    proxy, which keeps a copy of the requested
    document.
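
The request flow above can be sketched in a few lines of Python. This is a minimal illustration, not any real proxy's implementation: the class name ProxyCache and the fetch_from_server callback are hypothetical, and freshness checks are left out.

```python
from typing import Callable, Dict


class ProxyCache:
    """Toy forward proxy: serve from the cache on a hit, fetch and store on a miss."""

    def __init__(self, fetch_from_server: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_server      # hypothetical origin-fetch callback
        self._store: Dict[str, bytes] = {}   # URL -> cached document
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:               # document is cached: serve it
            self.hits += 1
            return self._store[url]
        self.misses += 1                     # otherwise forward to the server
        body = self._fetch(url)
        self._store[url] = body              # keep a copy for later requests
        return body


origin_calls = []


def origin(url: str) -> bytes:
    """Stand-in for the real web server; records every request it receives."""
    origin_calls.append(url)
    return b"<html>" + url.encode() + b"</html>"


proxy = ProxyCache(origin)
first = proxy.get("/index.html")    # miss: forwarded to the origin
second = proxy.get("/index.html")   # hit: served from the proxy's copy
```

Only the first request reaches the origin; the second is answered from the stored copy.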

5
Why Web Caching?
  • HTTP traffic has grown rapidly to form the largest
    part of Internet traffic, which causes more
    network congestion and server unavailability.
  • The number of static Web pages almost doubles
    every year
  • Some old data:
  • Number of unique pages X: 800M < X < 2.2B
  • Number of unique web sites: 8,500,000
  • Static pages: 30-40%
  • Pages revisited: 80%
  • Expected hit rate: 24-32% (the static share
    multiplied by the revisit share)

6
Why Web Caching?
  • Bandwidth
  • Latency
  • Performance: response time
  • Server load
  • Failure: redundancy

7
Expected Gains
  • Bandwidth saving
  • Improving content availability.
  • Improving web server availability.
  • Server load balancing.
  • Reducing user-perceived latency

8
What Content and Protocols
  • HTTP 1.0: basic protocol
  • Send a request based on a fixed number of verbs:
  • GET
  • HEAD
  • POST
  • Receive response, metadata, content

9
What Content and Protocols
  • HTTP Request
  • Request = Simple-Request | Full-Request
  • Simple-Request = "GET" SP Request-URI CRLF
  • Full-Request = Request-Line
    *( General-Header | Request-Header | Entity-Header )
    CRLF
    [ Entity-Body ]

10
What Content and Protocols
  • Example:
  • GET /pub/www/index.html HTTP/1.0
  • Response:
  • HTTP/1.1 200 OK
  • Server: Microsoft-IIS/5.0
  • Date: Sat, 19 Oct 2002 05:46:53 GMT
  • Expires: Sun, 20 Oct 2002 16:00:00 GMT
  • Content-Length: 2291
  • Content-Type: text/html
  • Cache-control: private

11
What Content and Protocols
  • Example: If-Modified-Since
  • GET /pub/www/index.html HTTP/1.0
  • If-Modified-Since: Sat, 19 Oct 2002 19:43:31 GMT
  • Response:
  • HTTP/1.1 200 OK
  • Server: Microsoft-IIS/5.0
  • Date: Thu, 13 Jul 2000 05:46:53 GMT
  • Expires: Sun, 20 Oct 2002 16:00:00 GMT
  • Content-Length: 2291
  • Content-Type: text/html
  • Cache-control: private

12
What Content and Protocols
  • Example: If-Modified-Since
  • GET /pub/www/index.html HTTP/1.0
  • If-Modified-Since: Sat, 19 Oct 2002 19:43:31 GMT
  • Response:
  • HTTP/1.1 304 Not Modified
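
The two If-Modified-Since examples differ only in whether the document changed after the date the client sent. A minimal sketch of that server-side decision follows; the function name conditional_status is hypothetical, and real servers apply further rules (entity tags, clock checks) that are omitted here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the document is unchanged since the client's date, else 200."""
    if if_modified_since is None:
        return 200                       # plain GET: always send the document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                       # unparsable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    return 304 if last_modified <= since else 200


header = "Sat, 19 Oct 2002 19:43:31 GMT"
unchanged = datetime(2002, 10, 19, 12, 0, 0, tzinfo=timezone.utc)
changed = datetime(2002, 10, 20, 8, 0, 0, tzinfo=timezone.utc)

status_unchanged = conditional_status(unchanged, header)   # 304 Not Modified
status_changed = conditional_status(changed, header)       # 200 OK with a body
```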

13
HTTP support for caching
  • Conditional requests (If-Modified-Since, IMS)
  • Servers can set Expires and max-age
  • Request indirection: application-level routing
  • Range requests, entity tags
  • Cache-Control header
  • Requests: min-fresh, max-stale, no-transform
  • Responses: must-revalidate, public, private,
    no-cache
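
To show how a cache might read these directives, here is a small sketch that splits a Cache-Control value into a dictionary. The helper names are hypothetical, and the parser is deliberately naive: it does not handle quoted arguments that contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split 'max-age=3600, private' into {'max-age': '3600', 'private': None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def proxy_may_store(value: str) -> bool:
    """A shared (proxy) cache must not store responses marked 'private'."""
    return "private" not in parse_cache_control(value)


parsed = parse_cache_control("max-age=3600, must-revalidate")
```

The response in the earlier example carries "Cache-control: private", so a shared proxy would leave it to the browser cache.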

14
Where
[Diagram of cache locations. Labels: Browser, cache, Intranet, Local ISP, cdn, Data Center ISP, L4 Switch, Reverse Proxy, Content Server.]

15
Cache Types
  • Proxy Caching
  • Reverse Proxy Caching
  • Transparent Caching
  • Adaptive Caching
  • Push Caching
  • Active Caching

16
Proxy Caching
  • Harvest/Squid
  • Provide web content for a fixed user base
  • Deployed at the network edges (company or
    institutional gateway or firewall hosts)
  • Standalone operation
  • Manual configuration in web browsers
  • Commodity product/technology
  • Single point of failure

17
Reverse Proxy Caching
  • Designed to offload duties from one or more
    specific servers
  • Data size is limited to the size of the static
    content on the server
  • The challenge is fast, diskless operation
  • Cache consistency is easy

18
Transparent Caching
  • Intercept HTTP requests and redirect them to web
    cache servers or cache clusters
  • No client configuration
  • Violates end-to-end paradigm
  • The client thinks it is talking directly to the
    server
  • The server thinks it is talking to the cache
  • Implemented with an L4 switch
  • A Layer 4 switch makes switching decisions based
    on the TCP or UDP port number, e.g., 80 for HTTP

19
Transparent Caching
20
Adaptive Caching
  • ISP-level caching, global data placement
    optimization
  • Multiple cooperating distributed caches
  • Operate as a cache mesh based on content demand
  • Cache Group Management Protocol:
  • How meshes are formed
  • How individual caches join/leave the meshes
  • Content Routing Protocol: sends requests to the
    appropriate cache within the meshes
  • Uses distributed cache meshes to solve the hot
    spot problem
  • Caches dynamically join and leave the groups
    based on content demand
  • Administrative boundaries must be relaxed

21
Push Caching
  • Keep data close to those clients requesting this
    information
  • Send the data out proactively
  • Assumption: we are able to launch caches that may
    cross administrative boundaries
  • Incurs cost (storage and transmission)

22
Active Caching
  • Applies caching to dynamic documents
  • 30% of client HTTP requests contain cookies
  • The server provides the cache with the objects
    and any associated cache applets
  • Use an applet inside of the cache to customize
    dynamic pages on the fly

23
Cache Placement/Deployment
  • Close to clients/content consumers
  • Proxy caching
  • Transparent proxy caching
  • Close to servers/content providers
  • Improve access to logical sets of data
  • Delay-sensitive data: video, audio
  • Reverse proxy caching
  • Push caching
  • Network choke points: strategic deployment
  • Adaptive caching
  • Problem with administrative control

24
Zipf's Law vs. Web Access
  • Zipf's Law
  • Web Access
  • Caching?

25
Zipf's Law
  • Zipf's law: the frequency of an event P as a
    function of rank i is a power-law function
  • P_i = Ω / i^α, where Ω is a constant and α is
    close to 1

26
Zipf's Law
  • Observed to be true for
  • Frequency of written words in English texts
  • Population of cities
  • Income of a company as a function of rank

27
Zipf's Law vs. Web Access
  • For a given server, page access by rank follows
    Zipf's law
  • Web requests from a fixed population of users
    follow Zipf's law with 0.64 < α < 0.83

28
Observations
  • The top 1% of all documents account for 20-35% of
    proxy requests
  • The top 10% account for 45-55% of requests
  • It takes 25% to 40% of all documents to account
    for 70% of requests
  • It takes 70% to 80% of all documents to account
    for 90% of requests
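
A quick way to explore how such skew arises from a Zipf-like popularity curve is to compute the share of requests that the most popular documents attract. The sketch below assumes an idealized model in which request probability is exactly proportional to 1 / rank^α. The function name, the document count, and the choice α = 0.75 (inside the range quoted for web requests) are illustrative assumptions, not measurements.

```python
def zipf_request_share(num_docs: int, alpha: float, top_fraction: float) -> float:
    """Share of all requests that go to the most popular `top_fraction` of documents."""
    # Unnormalized Zipf weights: the document at rank i gets weight 1 / i**alpha.
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_docs + 1)]
    top_n = max(1, int(num_docs * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


# Illustrative parameters only (assumed, not taken from the slides' data set).
share_top_1_percent = zipf_request_share(100_000, 0.75, 0.01)
share_top_10_percent = zipf_request_share(100_000, 0.75, 0.10)
```

Because a small fraction of documents draws a large fraction of requests, even a cache far smaller than the document population can absorb a substantial part of the traffic, but each additional unit of cache space buys less than the previous one.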

29
Zipf's Law and Caching
  • Discussion
  • How does this help in cache design?

30
Basic caching algorithm
  • Pages may be:
  • Fresh: up-to-date
  • Expired: current date > expiration date
  • Stale: old

31
Basic caching algorithm - 2
  • If (page is in the cache)
  •   If (page is expired or stale)
  •     Get from server with If-Modified-Since
        (soft miss)
  •     If not modified, get from cache
  •     Else, get from server
  •   Else
  •     Get from cache
  • Else
  •   Get from server

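
The lookup logic of this slide can be written as a short runnable sketch. The Entry record, the fetch callback (which returns None to stand for "304 Not Modified"), and the one-hour ttl applied after a successful revalidation are all assumptions made for illustration. The label "soft miss" is used here for the case where the cached copy is revalidated and reused.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    body: bytes
    last_modified: float   # seconds since the epoch
    expires: float         # seconds since the epoch


# Hypothetical origin interface: fetch(url, if_modified_since) returns None
# for "304 Not Modified", or a new Entry for "200 OK".
Fetch = Callable[[str, Optional[float]], Optional[Entry]]


def lookup(cache: Dict[str, Entry], url: str, now: float,
           fetch: Fetch, ttl: float = 3600.0) -> Tuple[bytes, str]:
    """Return (body, outcome); outcome is 'hit', 'soft miss', or 'miss'."""
    entry = cache.get(url)
    if entry is not None and now <= entry.expires:
        return entry.body, "hit"                    # fresh copy, no server contact
    if entry is not None:
        updated = fetch(url, entry.last_modified)   # expired: conditional GET
        if updated is None:                         # 304: reuse the cached body
            entry.expires = now + ttl               # assumed TTL, not from the slides
            return entry.body, "soft miss"
        cache[url] = updated                        # 200: the document changed
        return updated.body, "miss"
    fetched = fetch(url, None)                      # not cached: unconditional GET
    if fetched is None:
        raise RuntimeError("origin returned 304 to an unconditional request")
    cache[url] = fetched
    return fetched.body, "miss"
```
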
32
Basic caching algorithm - 3
  • If cache has space
  • Store the file
  • Else
  • Delete expired from cache
  • Delete stale from cache
  • Delete LRU from cache
  • Delete largest/smallest from cache?

33
Cache Replacement
  • Cache size is limited, so a replacement policy is
    needed
  • LRU (least recently used)
  • LFU (least frequently used)
  • Greedy-Dual-Size
  • Many others
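
As an example of one of these policies, the following is a minimal LRU cache with a byte-size limit, built on the standard library's OrderedDict. The class and method names are illustrative; LFU and Greedy-Dual-Size would need different bookkeeping (request counts, or a cost/size priority) and are not shown.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Evicts the least recently used documents once the byte budget is exceeded."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        body = self._items.get(url)
        if body is not None:
            self._items.move_to_end(url)        # mark as most recently used
        return body

    def put(self, url: str, body: bytes) -> None:
        if len(body) > self.capacity:
            return                              # larger than the whole cache: skip
        if url in self._items:
            self.used -= len(self._items.pop(url))
        while self.used + len(body) > self.capacity:
            _, evicted = self._items.popitem(last=False)   # oldest entry first
            self.used -= len(evicted)
        self._items[url] = body
        self.used += len(body)


lru = LRUCache(capacity_bytes=10)
lru.put("/a", b"aaaa")
lru.put("/b", b"bbbb")
lru.get("/a")              # touch /a so that /b becomes the eviction candidate
lru.put("/c", b"cccc")     # 12 bytes would exceed the budget: /b is evicted
```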

34
Cache Consistency
  • Multiple copies of objects created
  • How and when should the copies be renewed?
  • Goals
  • Avoid stale copies
  • Keep non-useful traffic as low as possible

35
Cache Consistency: Polling
  • Solution 1: polling every time
  • Implemented in HTTP using the optional
    "If-Modified-Since" request header field
  • Benefit: strong consistency
  • Drawback: very slow cache hit

36
Cache Consistency: Polling
  • Solution 2: polling if the TTL expires (widely
    used)
  • Associate a TTL (e.g., 12 hours or 2 days) with
    each cached object
  • Implemented in HTTP using the optional "Expires"
    header field
  • Benefit: fast cache hit
  • Drawback: weak cache consistency (5% stale),
    because the TTL is an a priori estimate of an
    object's lifetime

37
Cache Consistency
  • Solution 3: invalidation protocols
  • The server helps the proxy maintain consistency
  • Invalidation protocols
  • When the proxy makes a request:
  • Piggyback cache validation (PCV): the proxy
    lists some other potentially stale copies for
    the server to validate
  • Piggyback cache invalidation (PCI): the server
    lists copies that have been updated since the
    last access
  • Use of volumes
  • Volume lease
  • The client receives a lease from the server
  • While the lease is valid, the client can retrieve
    copies from the proxy
  • When the lease expires, the client has to renew it
  • Problems: scalability; servers need to keep cache
    state
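
Piggyback cache invalidation can be illustrated with a toy model. This is only a sketch of the idea, not the published protocol: the Origin and Proxy classes are hypothetical, a logical counter stands in for timestamps, and the proxy sends its last contact time so that the server itself keeps no per-cache state.

```python
from typing import Dict, List, Tuple


class Origin:
    """Toy server that stamps every update with a logical clock value."""

    def __init__(self) -> None:
        self.clock = 0
        self.docs: Dict[str, Tuple[bytes, int]] = {}   # url -> (body, modified_at)

    def update(self, url: str, body: bytes) -> None:
        self.clock += 1
        self.docs[url] = (body, self.clock)

    def get(self, url: str, last_contact: int) -> Tuple[bytes, List[str], int]:
        """Return the body plus, piggybacked, the URLs modified since last_contact."""
        body = self.docs[url][0]
        changed = [u for u, (_, t) in self.docs.items()
                   if t > last_contact and u != url]
        return body, changed, self.clock


class Proxy:
    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cache: Dict[str, bytes] = {}
        self.last_contact = 0

    def get(self, url: str) -> bytes:
        if url in self.cache:
            return self.cache[url]                # served without server contact
        body, invalid, now = self.origin.get(url, self.last_contact)
        for stale_url in invalid:
            self.cache.pop(stale_url, None)       # drop copies reported as updated
        self.cache[url] = body
        self.last_contact = now
        return body
```

In this model a cached copy can still be served stale until the proxy next contacts the server for some other document, which is the moment the invalidation arrives.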

38
Cache Cooperation
  • Hierarchical caching
  • Cache servers form a hierarchy (tree-like
    structures)
  • Parent servers: at the top of the hierarchy, they
    receive requests from child servers. If they do
    not have the requested object, they ask either
    their own parents or the origin web server
  • Sibling servers: if the local cache does not have
    the requested object, it asks its sibling
    caches. If the sibling caches do not have the
    object, the local cache asks the parent cache
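
The lookup order described above (local cache, then siblings, then parent, then origin server) can be sketched as follows. The CacheNode class and its peek method are hypothetical names; sibling queries here are plain in-process calls rather than network messages.

```python
from typing import Callable, Dict, List, Optional


class CacheNode:
    """Toy cache in a hierarchy: local store, then siblings, then parent or origin."""

    def __init__(self, name: str, origin: Callable[[str], bytes],
                 parent: Optional["CacheNode"] = None) -> None:
        self.name = name
        self.origin = origin            # hypothetical origin-fetch callback
        self.parent = parent
        self.siblings: List["CacheNode"] = []
        self.store: Dict[str, bytes] = {}

    def peek(self, url: str) -> Optional[bytes]:
        """Answer a sibling's query from the local store only; never forward it."""
        return self.store.get(url)

    def get(self, url: str) -> bytes:
        body = self.store.get(url)
        if body is None:
            for sibling in self.siblings:        # ask siblings before the parent
                body = sibling.peek(url)
                if body is not None:
                    break
        if body is None:                         # parent, or origin at the root
            body = self.parent.get(url) if self.parent else self.origin(url)
        self.store[url] = body
        return body


origin_requests: List[str] = []


def origin_server(url: str) -> bytes:
    origin_requests.append(url)
    return b"content of " + url.encode()


parent = CacheNode("parent", origin_server)
child1 = CacheNode("child1", origin_server, parent)
child2 = CacheNode("child2", origin_server, parent)
child1.siblings, child2.siblings = [child2], [child1]

child1.get("/page")   # miss everywhere: goes up through the parent to the origin
child2.get("/page")   # found in the sibling child1, no parent or origin contact
```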

40
Cache Hierarchies
  • Use hierarchy to scale a proxy
  • Why?
  • Larger population: higher hit rate (fewer
    compulsory misses)
  • Larger effective cache size
  • Why is the population for a single proxy limited?
  • Performance, administration, policy, etc.
  • NLANR cache hierarchy
  • Most popular
  • 9 top-level caches
  • Based on the Internet Cache Protocol (ICP)
  • Squid/Harvest proxy
  • How to locate content?

41
ICP (Internet Cache Protocol)
  • Simple protocol to query another cache for
    content
  • Uses UDP. Why?
  • ICP message contents
  • Type: query, hit, hit_obj, miss
  • Other: identifier, URL, version, sender address
  • Special message types used with the UDP echo port
  • Used to probe a server or a "dumb" cache
  • Query, then wait until time-out (2 sec)
  • Transfers between caches are still done using HTTP

42
Squid
[Diagram: Parent, three Child caches, Client. Messages shown: Web page request, ICP Query (x2).]

43
Squid
[Diagram: Parent, three Child caches, Client. Messages shown: ICP MISS (x2).]

44
Squid
[Diagram: Parent, three Child caches, Client. Messages shown: Web page request.]

45
Squid
[Diagram: Parent, three Child caches, Client. Messages shown: Web page request, ICP Query (x3).]

46
Squid
[Diagram: Parent, three Child caches, Client. Messages shown: Web page request, ICP HIT, ICP MISS, ICP HIT.]

47
Squid
[Diagram: Parent, three Child caches, Client. Messages shown: Web page request.]

48
Hierarchical caching
  • Ideally, want the cache mesh to behave as a
    single cache with equivalent capacity and
    processing capability
  • ICP: many copies of popular objects are created,
    so capacity is wasted
  • High latency: more than one hop is needed to
    search for an object
  • How to improve? Discuss!

49
Problems with caching
  • Over 50% of all HTTP objects are uncacheable.
  • Sources:
  • Dynamic data: stock prices, frequently updated
    content
  • CGI scripts: results based on passed parameters
  • SSL: encrypted data is not cacheable
  • Most web clients don't handle mixed pages well,
    so many generic objects are transferred with SSL
  • Cookies: results may be based on passed data
  • Hit metering: the owner wants to measure the
    number of hits for revenue, etc., and so resorts
    to cache busting

50
Risks of Using Proxy
  • Benefits: reduced latency, bandwidth savings, etc.
  • Risks
  • Obsolete data
  • Violation of client privacy: the proxy can keep a
    log file recording which objects the client has
    requested
  • Data integrity

51
Real Proxy Servers
  • Squid: the most widely used. It works well and is
    free.
  • http://www.squid-cache.org/
  • Microsoft ISA Server 2004: Microsoft developed
    ISA to replace Microsoft Proxy Server. It's fully
    functional with Active Directory
  • http://www.microsoft.com/isaserver/
  • Apache: the Apache web server has a module for
    reverse caching (experimental)
  • http://httpd.apache.org/docs-2.0/mod/mod_cache.html
  • Cisco Cache Engine: sits next to (mostly) Cisco
    routers and receives transparently redirected
    HTTP requests
  • http://www.cisco.com/warp/public/cc/pd/cxsr/500/index.shtml
  • CERN/W3C HTTPd: the original proxy server
  • http://www.w3.org/hypertext/WWW/Daemon/Status.html