Title: Web Caching
1Web Caching
2Why Caching?
- Faster browsing experience for users
- Cache hit rate
- Traffic Prioritization
- Reduce network bandwidth requirements significantly
- Live media stream splitting
- Control who goes where and who does what and when they can do it
- Audit Employee Use of Corporate Assets
- Increase Performance, Increase Security, Improve Productivity, and Reduce Costs!
3Take A Look At This Page.
4Web Caching
[Diagram: Internet, Server, Lisa's Desktop]
5The Cache Dilemma
[Diagram: Hit Rate vs. Freshness?]
6Why Hit Rate is Important
- Better cache hit-rate means
  - Higher effective bandwidth
  - Lower avg. latency
- Improve hit-rate with
  - Locality of access
  - More users
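A rough illustration of why hit rate matters: the sketch below treats average latency as a weighted mix of cache latency and origin latency. The formula, the function names, and the sample numbers (10 ms from cache, 200 ms from the origin server) are illustrative assumptions, not measurements from these slides.

```python
def average_latency_ms(hit_rate: float, cache_ms: float, origin_ms: float) -> float:
    """Expected latency per request when a fraction hit_rate is served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ms + (1.0 - hit_rate) * origin_ms


def upstream_traffic_fraction(hit_rate: float) -> float:
    """Fraction of requests that still cross the upstream link (the misses)."""
    return 1.0 - hit_rate


if __name__ == "__main__":
    # Hypothetical latencies: 10 ms for a cache hit, 200 ms for an origin fetch.
    for h in (0.0, 0.3, 0.5):
        print(h, average_latency_ms(h, cache_ms=10.0, origin_ms=200.0))
```

This counts requests, not bytes, so it says nothing about large objects that dominate bandwidth.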
7What Content and Protocols
- HTTP 1.0: basic protocol
  - Send request based on a fixed number of verbs
    - GET
    - HEAD
    - POST
  - Receive response: metadata, content
8What Content and Protocols
- Example
  - GET /pub/www/index.html HTTP/1.0
- Response
  - HTTP/1.1 200 OK
  - Server: Microsoft-IIS/5.0
  - Date: Sat, 19 Oct 2002 05:46:53 GMT
  - Expires: Sun, 20 Oct 2002 16:00:00 GMT
  - Content-Length: 2291
  - Content-Type: text/html
  - Cache-control: private
9What Content and Protocols
- Example: If-Modified-Since
  - GET /pub/www/index.html HTTP/1.0
  - If-Modified-Since: Sat, 19 Oct 2002 19:43:31 GMT
- Response
  - HTTP/1.1 200 OK
  - Server: Microsoft-IIS/5.0
  - Date: Thu, 13 Jul 2000 05:46:53 GMT
  - Expires: Sun, 20 Oct 2002 16:00:00 GMT
  - Content-Length: 2291
  - Content-Type: text/html
  - Cache-control: private
10What Content and Protocols
- Example: If-Modified-Since
  - GET /pub/www/index.html HTTP/1.0
  - If-Modified-Since: Sat, 19 Oct 2002 19:43:31 GMT
- Response
  - HTTP/1.1 304 Not Modified
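The two If-Modified-Since outcomes (200 with a body, or 304 with none) can be sketched from the server's side. This is a minimal sketch, assuming the server knows a last-modified time for the page; the function name and the last_modified parameter are hypothetical and do not appear on the slides.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def status_for_conditional_get(last_modified: str,
                               if_modified_since: Optional[str]) -> int:
    """Return 304 if the page has not changed since the client's copy, else 200."""
    if if_modified_since is None:
        return 200  # unconditional request: always send the full response
    try:
        client_time = parsedate_to_datetime(if_modified_since)
        page_time = parsedate_to_datetime(last_modified)
        unchanged = page_time <= client_time
    except (TypeError, ValueError):
        return 200  # unparseable or incomparable dates: ignore the condition
    return 304 if unchanged else 200


if __name__ == "__main__":
    # Dates reuse the header values from the examples; the pairing is illustrative.
    print(status_for_conditional_get("Sat, 19 Oct 2002 05:46:53 GMT",
                                     "Sat, 19 Oct 2002 19:43:31 GMT"))
```

Falling back to 200 on a bad date is a conservative choice: the client gets a full copy instead of being told to trust a cached one.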
11Basic caching algorithm
- Pages may be
  - Fresh: up-to-date
  - Expired: current date > expiration date
  - Stale: old
12Basic caching algorithm - 2
- If (page is in the cache)
  - If (page is expired or stale)
    - Get from server with If-Modified-Since
    - If not modified, get from cache
    - Else get from server
  - Else
    - Get from cache
- Else
  - Get from server
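The lookup steps above can be sketched as follows. This is one possible reading, not a definitive implementation: Entry, Fetch, and get_page are hypothetical names, "expired" and "stale" are collapsed into a single expiry check, and the fetch callback stands in for the real HTTP request.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    body: bytes
    expires_at: float     # time after which the entry must be revalidated
    last_modified: float  # value sent back to the server as If-Modified-Since


# fetch(url, if_modified_since) -> (status, body, expires_at, last_modified).
# Status 304 means "not modified"; the returned body is then ignored.
Fetch = Callable[[str, Optional[float]], Tuple[int, bytes, float, float]]


def get_page(url: str, cache: Dict[str, Entry], fetch: Fetch,
             now: Optional[float] = None) -> bytes:
    now = time.time() if now is None else now
    entry = cache.get(url)
    if entry is None:
        # Page is not in the cache: get from server.
        _, body, expires_at, last_modified = fetch(url, None)
        cache[url] = Entry(body, expires_at, last_modified)
        return body
    if now < entry.expires_at:
        return entry.body  # still fresh: get from cache
    # Expired: ask the server, sending If-Modified-Since.
    status, body, expires_at, last_modified = fetch(url, entry.last_modified)
    if status == 304:
        entry.expires_at = expires_at  # not modified: keep the cached body
        return entry.body
    cache[url] = Entry(body, expires_at, last_modified)  # modified: replace
    return body
```

Passing now explicitly keeps the function easy to exercise without waiting for real time to pass.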
13Basic caching algorithm - 3
- If cache has space
  - Store the file
- Else
  - Delete expired from cache
  - Delete stale from cache
  - Delete LRU from cache
  - Delete largest/smallest from cache?
  - Store the file
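A minimal sketch of the store-and-evict steps, assuming capacity is measured in bytes and that expired entries are dropped before least-recently-used ones. The class and method names are hypothetical, and the size-based policy (the open question on the slide) is not implemented.

```python
from collections import OrderedDict


class SizeBoundedCache:
    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        # url -> (body, expires_at); least recently used entry comes first.
        self.items = OrderedDict()

    def get(self, url: str):
        if url not in self.items:
            return None
        self.items.move_to_end(url)  # mark as most recently used
        return self.items[url][0]

    def _remove(self, url: str) -> None:
        body, _ = self.items.pop(url)
        self.used -= len(body)

    def store(self, url: str, body: bytes, expires_at: float, now: float) -> bool:
        if len(body) > self.capacity:
            return False  # the file can never fit
        if url in self.items:
            self._remove(url)  # replace an older copy of the same page
        if self.used + len(body) > self.capacity:
            # No space: delete expired entries first.
            expired = [k for k, (_, exp) in self.items.items() if exp <= now]
            for key in expired:
                self._remove(key)
        while self.used + len(body) > self.capacity:
            # Still no space: delete the least recently used entry.
            self._remove(next(iter(self.items)))
        self.items[url] = (body, expires_at)
        self.used += len(body)
        return True
```

OrderedDict keeps insertion order and move_to_end makes the LRU bookkeeping a single call per hit.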
14Proxy Details
GET / HTTP/1.1
Host: localhost:1235
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.7)
Accept: image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
15Proxy Details
GET http://star.cs.byu.edu/CS360 HTTP/1.1
Host: star.cs.byu.edu
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.7)
Accept: text/xml,application/xml,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
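The two requests differ mainly in the request line: a browser talking directly to a server sends only a path, while a browser configured to use a proxy sends the full URL. A small sketch of how a proxy might tell them apart; the function names are hypothetical and header handling is left out.

```python
from urllib.parse import urlsplit


def parse_request_line(line: str):
    """Split 'METHOD target VERSION' and flag full-URL (proxy-style) targets."""
    method, target, version = line.strip().split(" ", 2)
    parts = urlsplit(target)
    is_proxy_request = bool(parts.scheme and parts.netloc)
    return method, target, version, is_proxy_request


def origin_target(target: str) -> str:
    """Path (plus query, if any) that a proxy would request from the origin server."""
    parts = urlsplit(target)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


if __name__ == "__main__":
    print(parse_request_line("GET / HTTP/1.1"))
    print(parse_request_line("GET http://star.cs.byu.edu/CS360 HTTP/1.1"))
    print(origin_target("http://star.cs.byu.edu/CS360"))
```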
16Zipf's Law
- In a corpus of natural language utterances, the frequency of any word is roughly inversely proportional to its rank in the frequency table.
- The most frequent word will occur approximately twice as often as the second most frequent word, etc.
- Example: In the Brown Corpus, "the" is the most frequently occurring word and accounts for nearly 7% of all word occurrences (69971 of slightly over 1 million).
  - 2nd place: "of", slightly over 3.5% (36411)
  - 3rd place: "and" (28852)
- Only 135 words are needed to account for half of the Brown Corpus.
17Zipf's law
- Zipf's law: The frequency of an event P as a function of rank i is a power law function
  - $P_i = \Omega / i^{\alpha}$, where $\alpha \approx 1$
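To make the power law concrete, the sketch below assigns each rank a weight inversely proportional to rank raised to the exponent, then normalizes. Choosing the constant so that the probabilities sum to 1 over a finite number of items is an assumption made here for illustration; the function name is hypothetical.

```python
from typing import List


def zipf_probabilities(n_items: int, alpha: float = 1.0) -> List[float]:
    """Probability of each rank 1..n_items, proportional to 1 / rank**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_items + 1)]
    total = sum(weights)  # normalizing constant so the result sums to 1
    return [w / total for w in weights]


if __name__ == "__main__":
    probs = zipf_probabilities(1000)
    # With alpha = 1, rank 1 is twice as likely as rank 2 and three times rank 3.
    print(probs[0] / probs[1], probs[0] / probs[2])
```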
18Zipf's law
- Observed to be true for
- Frequency of written words in English texts
- Population of cities
- Income of a company as a function of rank
19Zipf's law and web access
- For a given server, page access by rank follows Zipf's law
- Web requests from a fixed population of users follow Zipf's law with $0.64 < \alpha < 0.83$
20Observations
- Top 1% of all documents account for 20 - 35% of proxy requests
- Top 10% account for 45 - 55% of requests
- It takes 25 to 40% of all documents to account for 70% of requests
- It takes 70 to 80% of all documents to account for 90% of requests
21Observations
22Observations
- For an infinite sized cache, the hit-ratio for a
web-proxy grows in a log-like fashion as a
function of the client population of the proxy
and the number of requests seen by the proxy.
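One way to explore this growth is a simple model in which every request is drawn independently from a Zipf-like popularity distribution. Under that assumption, a request for document i hits an infinite cache exactly when i was requested at least once before. The independence assumption, the function names, and the parameters (10000 documents, exponent 0.8) are choices made for this sketch, not taken from the slides.

```python
from typing import List


def zipf_probabilities(n_docs: int, alpha: float) -> List[float]:
    """Popularity of each document rank, proportional to 1 / rank**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_hit_ratio(n_docs: int, alpha: float, requests_seen: int) -> float:
    """Chance that the next request hits, after requests_seen earlier requests.

    Document i is in the (infinite) cache if any earlier request asked for it,
    which happens with probability 1 - (1 - p_i) ** requests_seen.
    """
    probs = zipf_probabilities(n_docs, alpha)
    return sum(p * (1.0 - (1.0 - p) ** requests_seen) for p in probs)


if __name__ == "__main__":
    # Each tenfold increase in requests seen adds a smaller and smaller gain.
    for n in (10, 100, 1000, 10000):
        print(n, round(expected_hit_ratio(10000, 0.8, n), 3))
```

Real traces also contain uncacheable and changing documents, which this model ignores.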
23Local URL Resolution Protocol
- Peer-to-Peer web-cache
- Bootstrapping Peer Discovery
  - UDP broadcast
- Content Location
  - UDP broadcast for content
- Content Delivery
  - Direct Download from single peer