Title: HTTP Caching
HTTP Caching & Cache-Busting for Content Publishers
- Michael J. Radwin
- O'Reilly Open Source Convention
- July 28, 2004
Publishers must think about caching
- Publishers have a lot of web content
  - HTML, images, Flash, movies
- Speed is an important part of the user experience
- Bandwidth is expensive
  - Use what you need, but avoid unnecessary extra transfers
- Personalization differentiates
- Show timely data (stock quotes, news stories)
- Get accurate advertising statistics
- Protect sensitive info (e-mail, account balances)
HTTP Review
(1) Client connects to www.example.com port 80
(2) Client sends HTTP GET request
[Diagram: Client, Internet, Server]
HTTP Review (cont'd)
(3) Client reads HTTP response from server
(4) Client and server close the connection
[Diagram: Client, Internet, Server]
HTTP Example
    mradwin@machshav% telnet www.example.com 80
    Trying 192.168.37.203...
    Connected to w6.example.com.
    Escape character is '^]'.
    GET /foo/index.html HTTP/1.1
    Host: www.example.com

    HTTP/1.1 200 OK
    Date: Wed, 28 Jul 2004 23:36:12 GMT
    Last-Modified: Fri, 23 Jul 2004 01:52:37 GMT
    Content-Length: 3688
    Connection: close
    Content-Type: text/html

    <html><head>
    <title>Hello World</title>
    ...
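The request typed into telnet above can also be produced by a program. The sketch below is a minimal C illustration, assuming the caller opens the TCP connection to port 80 separately; build_get_request is a hypothetical helper, not part of any library. It makes explicit two details that telnet hides: every header line ends in CRLF, and an empty line ends the request.

```c
#include <stdio.h>
#include <string.h>

/* Format the same minimal HTTP/1.1 GET request typed into telnet.
 * HTTP requires CRLF line endings and an empty line after the headers.
 * Returns the number of bytes written, or -1 if buf is too small. */
int build_get_request(char *buf, size_t size, const char *path, const char *host)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= size)
        return -1; /* output would have been truncated */
    return n;
}
```

Writing the resulting buffer to the connected socket reproduces the telnet session.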
Browsers use private caches
Client → Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Server → Client:
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul 2004 01:52:37 GMT
    Content-Length: 3688
    Content-Type: text/html
Client stores a copy of http://www.example.com/foo/index.html on its hard disk with a timestamp.
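To revalidate later, the browser has to turn its stored timestamp back into an HTTP date. A minimal C sketch, assuming a hypothetical cache_entry struct that keeps the Last-Modified value as a time_t; the struct, field and function names are illustrative only.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical private-cache entry: the URL plus the Last-Modified
 * time the browser saved alongside its copy on disk. */
struct cache_entry {
    const char *url;
    time_t last_modified; /* seconds since the Unix epoch, UTC */
};

/* Write an RFC 1123 date such as "Fri, 23 Jul 2004 01:52:37 GMT".
 * %a and %b give English day/month names in the default "C" locale. */
int format_http_date(time_t t, char *buf, size_t size)
{
    struct tm *tm = gmtime(&t);
    if (tm == NULL)
        return -1;
    return strftime(buf, size, "%a, %d %b %Y %H:%M:%S GMT", tm) == 0 ? -1 : 0;
}

/* Build the header line a browser adds when revalidating the entry. */
int build_ims_header(const struct cache_entry *e, char *buf, size_t size)
{
    char date[64];
    if (format_http_date(e->last_modified, date, sizeof date) != 0)
        return -1;
    int n = snprintf(buf, size, "If-Modified-Since: %s\r\n", date);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```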
Revalidation (Conditional GET)
Client → Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Fri, 23 Jul 2004 01:52:37 GMT
Server → Client:
    HTTP/1.1 304 Not Modified
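On the server side, the decision behind the 304 is a single date comparison. This C sketch assumes both dates have already been parsed into time_t values (parsing is omitted); conditional_get_status is a hypothetical name, and real servers such as Apache also evaluate ETag validators.

```c
#include <time.h>

/* Server side of a conditional GET.
 * has_ims is 0 when the request carried no If-Modified-Since header. */
int conditional_get_status(int has_ims, time_t if_modified_since,
                           time_t last_modified)
{
    /* Unchanged since the client's copy: headers only, no body. */
    if (has_ims && last_modified <= if_modified_since)
        return 304; /* Not Modified */
    return 200;     /* OK: send the full body */
}
```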
Non-Caching Proxy
Client → Proxy → Server (request forwarded unchanged):
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Server → Proxy → Client (response relayed, nothing stored):
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
[Diagram: Client, Proxy, Internet, Server]
Proxy Cache Miss
Client → Proxy → Server (no copy in the proxy's cache):
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Server → Proxy → Client (proxy stores a copy):
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
[Diagram: Client, Proxy, Internet, Server]
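Proxy Cache Hit
Client → Proxy:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Proxy → Client (served from the proxy's cache; the server is not contacted):
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
[Diagram: Client, Proxy, Internet, Server]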
Proxy Cache Revalidation Hit
Client → Proxy:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Proxy → Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Fri, 23 Jul ...
Server → Proxy:
    HTTP/1.1 304 Not Modified
Proxy → Client:
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
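The three caching-proxy scenarios above (miss, hit, revalidation hit) reduce to one decision per request. A simplified C sketch, assuming a hypothetical proxy_entry that stores a precomputed expiry time and whether a validator such as Last-Modified is available; real caches apply many more rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical shared-cache entry; freshness is modeled as a single
 * expiry time derived from Expires or Cache-Control: max-age. */
struct proxy_entry {
    time_t expires;
    bool   has_validator; /* e.g. a Last-Modified date was stored */
};

enum proxy_action {
    PROXY_MISS,       /* forward a plain GET, store the 200 response */
    PROXY_HIT,        /* answer from cache without contacting the server */
    PROXY_REVALIDATE  /* forward a conditional GET; on 304, serve the
                         stored copy to the client as a 200 */
};

enum proxy_action proxy_decide(const struct proxy_entry *e, time_t now)
{
    if (e == NULL)
        return PROXY_MISS;      /* nothing cached for this URL */
    if (now < e->expires)
        return PROXY_HIT;       /* still fresh */
    return e->has_validator ? PROXY_REVALIDATE : PROXY_MISS;
}
```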
Assumptions about content types
[Chart: content types (HTML, CSS, JavaScript, Images, Flash, PDF) placed along "Rate of change once published" (Frequently, Occasionally, Rarely/Never), from Dynamic Content to Static Content, and from Personalized to Same for everyone]
Top 5 techniques for publishers
- Use Cache-Control: private for personalized content
- Implement Images Never Expire policy
- Use a cookie-free TLD for static content
- Use Apache defaults for CSS and JavaScript
- Use random strings in URL for accurate hit metering or very sensitive content
1. Use Cache-Control: private for personalized content
Bad caching of personalized content
Client 1 → Proxy → Webmail Server:
    GET /msg3.html HTTP/1.1
    Host: webmail.example.com
    Cookie: user=jane
Webmail Server → Proxy → Client 1: Jane's e-mail message
Bad caching of personalized content
Client 1 → Proxy → Webmail Server:
    GET /msg3.html HTTP/1.1
    Host: webmail.example.com
    Cookie: user=jane
The proxy keeps Jane's e-mail message in its cache as msg3.html.
Bad caching of personalized content
Client 2 → Proxy:
    GET /msg3.html HTTP/1.1
    Host: webmail.example.com
    Cookie: user=mary
Proxy → Client 2: its cached msg3.html, which is Jane's e-mail message
What's cacheable?
- HTTP/1.1 allows caching anything by default
  - Unless there is an explicit Cache-Control header
- In practice, most caches avoid anything with
  - Cache-Control/Pragma header
  - Cookie/Set-Cookie headers
  - WWW-Authenticate/Authorization header
  - POST/PUT method
  - 302/307 status code
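The "in practice" list can be read as a conservative filter. This C sketch encodes it literally, assuming a hypothetical exchange struct that records which headers were present; it deliberately ignores the values of those headers, which real caches do inspect.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of one request/response pair. */
struct exchange {
    const char *method;     /* "GET", "POST", ... */
    int  status;            /* response status code */
    bool has_cache_control; /* Cache-Control or Pragma present */
    bool has_cookie;        /* Cookie or Set-Cookie present */
    bool has_auth;          /* WWW-Authenticate or Authorization present */
};

/* Literal reading of the list above: any listed feature means "don't store". */
bool typical_cache_would_store(const struct exchange *x)
{
    if (strcmp(x->method, "POST") == 0 || strcmp(x->method, "PUT") == 0)
        return false;
    if (x->status == 302 || x->status == 307)
        return false;
    return !(x->has_cache_control || x->has_cookie || x->has_auth);
}
```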
Cache-Control: private
- Shared caches are bad for personalized content
  - Mary shouldn't be able to read Jane's webmail
- Private caches are perfectly OK
  - Speed up the web browsing experience
- Avoid personalization leakage with a single line in httpd.conf or .htaccess:
    Header set Cache-Control private
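The Header line above works because a shared cache must not store a response marked private, while a browser's own cache may; no-store forbids storage in either. A simplified C sketch of that rule, assuming the Cache-Control value is available as a string; quoted directive arguments are not handled, and has_directive and may_store are hypothetical names.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive search for a directive name in a Cache-Control
 * value such as "private, max-age=600". Arguments after '=' are skipped. */
static bool has_directive(const char *cc, const char *name)
{
    size_t len = strlen(name);
    const char *p = cc;
    while (*p != '\0') {
        while (*p == ' ' || *p == ',')        /* skip separators */
            p++;
        const char *start = p;
        while (*p != '\0' && *p != ',' && *p != '=' && *p != ' ')
            p++;                              /* scan the directive name */
        if ((size_t)(p - start) == len) {
            size_t i = 0;
            while (i < len && tolower((unsigned char)start[i]) ==
                              tolower((unsigned char)name[i]))
                i++;
            if (i == len)
                return true;
        }
        while (*p != '\0' && *p != ',')       /* skip any "=value" */
            p++;
    }
    return false;
}

/* May this cache store the response? shared_cache is true for proxies. */
bool may_store(const char *cache_control, bool shared_cache)
{
    if (cache_control == NULL)
        return true;
    if (has_directive(cache_control, "no-store"))
        return false;
    return !(shared_cache && has_directive(cache_control, "private"));
}
```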
2. Images Never Expire policy
The Images Never Expire Policy
- Encourage caching of icons and logos
- "Forever" = 10 years in Internet biz
- Must change URL when you change image
  - http://us.yimg.com/i/new.gif
  - http://us.yimg.com/i/new2.gif
- Tradeoff
  - More difficult for designers
  - Bandwidth savings, faster user experience
Images Never Expire (mod_expires)
- Works with both HTTP/1.0 and HTTP/1.1:
    ExpiresActive On
    ExpiresByType image/gif A315360000
    ExpiresByType image/jpeg A315360000
    ExpiresByType image/png A315360000
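The A315360000 values mean "expire 315,360,000 seconds (10 × 365 days) after access". This C sketch computes the resulting Expires date and an equivalent Cache-Control max-age for a given access time; expires_headers is a hypothetical helper, not mod_expires source code.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* 10 years of 365 days: the number used in the ExpiresByType lines. */
#define TEN_YEARS_SECONDS (10L * 365L * 24L * 60L * 60L) /* 315360000 */

/* "A<seconds>": expire <seconds> after the access time. Formats the
 * Expires value and a matching max-age directive for HTTP/1.1 caches. */
int expires_headers(time_t access, long seconds,
                    char *expires, size_t exp_size,
                    char *cache_control, size_t cc_size)
{
    time_t when = access + (time_t)seconds;
    struct tm *tm = gmtime(&when);
    if (tm == NULL ||
        strftime(expires, exp_size, "%a, %d %b %Y %H:%M:%S GMT", tm) == 0)
        return -1;
    int n = snprintf(cache_control, cc_size, "max-age=%ld", seconds);
    return (n < 0 || (size_t)n >= cc_size) ? -1 : 0;
}
```

For an access at the Unix epoch the sketch yields Sun, 30 Dec 1979 00:00:00 GMT: because 10 × 365 days ignores leap days, the result lands two or three days before the same calendar date ten years on.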
Images Never Expire (mod_headers)
- Works with HTTP/1.1 only:
    <FilesMatch "\.(gif|jpe?g|png)$">
    Header set Cache-Control \
        "max-age=315360000"
    </FilesMatch>
- Works with both HTTP/1.0 and HTTP/1.1:
    <FilesMatch "\.(gif|jpe?g|png)$">
    Header set Expires \
        "Mon, 28 Jul 2014 23:30:00 GMT"
    </FilesMatch>
mod_images_never_expire
    /* Apache 1.3 module API. The hook must also be listed in the
     * module's URI-translation slot (registration not shown). */
    #include <string.h>
    #include <strings.h>
    #include "httpd.h"

    /* Enforce policy with module that runs at URI translation hook */
    static int translate_imgexpire(request_rec *r)
    {
        const char *ext;

        if ((ext = strrchr(r->uri, '.')) != NULL) {
            if (strcasecmp(ext, ".gif") == 0 ||
                strcasecmp(ext, ".jpg") == 0 ||
                strcasecmp(ext, ".png") == 0 ||
                strcasecmp(ext, ".jpeg") == 0) {
                if (ap_table_get(r->headers_in, "If-Modified-Since") != NULL ||
                    ap_table_get(r->headers_in, "If-None-Match") != NULL) {
                    /* Don't bother checking filesystem, just hand back a 304 */
                    return HTTP_NOT_MODIFIED;
                }
            }
        }

        return DECLINED; /* let the normal translation handlers run */
    }
3. Cookie-free TLD for static content
Cookie-free TLD for static content
- For maximum efficiency use two domains
  - www.example.com for HTML
  - static.example.net for images
- Many proxies won't cache requests with Cookie headers
  - But multimedia is never personalized
  - Cookies would be ignored by the server anyway
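Using a separate registered domain (example.net rather than a host under example.com) matters because browsers send a cookie only to hosts that domain-match it. A simplified C sketch of that tail match, assuming cookies scoped with a Domain attribute such as .example.com; it ignores host-only cookies, paths and public-suffix rules, and cookie_domain_matches is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive comparison of the first n bytes. */
static bool ieq_n(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    return true;
}

/* Would a cookie scoped to cookie_domain be sent to host?
 * The host must equal the domain or end with "." + domain. */
bool cookie_domain_matches(const char *host, const char *cookie_domain)
{
    if (cookie_domain[0] == '.')
        cookie_domain++;                      /* ".example.com" -> "example.com" */
    size_t hl = strlen(host), dl = strlen(cookie_domain);
    if (dl == 0 || dl > hl)
        return false;
    if (!ieq_n(host + (hl - dl), cookie_domain, dl))
        return false;
    return hl == dl || host[hl - dl - 1] == '.'; /* match on a label boundary */
}
```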
Typical GET request with Cookies
    GET /i/foo/bar/quux.gif HTTP/1.1
    Host: www.example.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
    Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
    Cookie: UmtvtC1tp2MhYv9RL5BlpxYRFN_P8DpMJoamllEc
        A--uxIIr.ABun42vnticvufc8v brandflash1
        Bamfco1503sgp8b2 FaNC184LcsvfX96G.JR27qSjCHu
        7bII3s. tXa44psMLliFtVoJB_m5wecWY_.7bK1It
        LYCl_v2l_lv7l_lh03m8d50c8bo
        l_s3yu2qxz5zvwquwwuzv22wrwr5t3w1zsrl_lid14rsb7
        6l_ra8l_um1_0_1_0_0 GTSessionID8359908990238
        3599089902340645635 Yv1n6eecgejj7012f
        lh03m8d50c8bo/opm012o33013000007jb1647ra
        8lgusintlusnp1 PROMOSOURCEfp5 YGCVd
        TziTu.ABiZD/AB6dPWoqXibIcTzc0BjY3TzI3NTY0MzQ-a
        YAEskDAAwRz5HlDUN2Tdc2wBT0RBekFURXdPRFV3TWpFek
        5ETS0BYQFZQUUBb2sBWlcwLQF0aXABWUhaTVBBAXp6AWlUdS5B
        QmdXQQ--afQUFBQ0FDQURCOUFIQUJBQ0FEQUtBTE
        FNSDAmdHM9MTA5MDE4NDQxOCZwcz1lOG83MUVYcTYxOVouT2Ft
        c1ZFZUhBLS0- LYSl_fh0l_vomyla
        PAp0dg13DX4Ndgk-p16L5qmg--exMv.AB
        YP.usv2maddrd1525SRobertsonBlvd01LosAng
        eles01CA0190035-42310144800134.05159001-118.3
        8434201901a0190035
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456
    Accept-Language: en-us,en;q=0.7,he;q=0.3
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive
Same request, no Cookies
    GET /i/foo/bar/quux.gif HTTP/1.1
    Host: static.example.net
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
    Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456
    Accept-Language: en-us,en;q=0.7,he;q=0.3
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive
- Added bonus: much smaller GET request
  - Dial-up MTU size 576 bytes, PPPoE 1492 bytes
  - 1450 bytes reduced to 550
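The packet savings can be estimated with a ceiling division. This C sketch assumes 20-byte IP and 20-byte TCP headers with no options, so each packet carries at most MTU − 40 bytes of request data; segments_needed is a hypothetical name.

```c
/* Rough count of TCP segments needed to carry a request of
 * request_bytes over a link with the given MTU. */
long segments_needed(long request_bytes, long mtu)
{
    long payload = mtu - 40; /* maximum segment size (MSS) */
    if (payload <= 0 || request_bytes <= 0)
        return 0;
    return (request_bytes + payload - 1) / payload; /* ceiling division */
}
```

Under those assumptions the 1450-byte request needs three packets on a 576-byte dial-up MTU and the 550-byte one needs two; at the PPPoE MTU of 1492 either request fits in a single packet.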
4. Apache defaults for static, occasionally-changing content
Revalidation works pretty well
- Revalidation is the default behavior for static content
  - Browser sends If-Modified-Since request
  - Server replies with short 304 Not Modified
  - No fancy Apache config needed
- Use if you can't predict when content will change
  - Page designers can change immediately
  - No renaming necessary
- Cost: extra HTTP transaction for 304
  - Small with Keep-Alive, but large sites disable Keep-Alive
Techniques to encourage caching
- Send explicit Cache-Control or Expires
- Generate static content headers
  - Last-Modified, ETag
  - Content-Length
- Avoid "cgi-bin", ".cgi" or "?" in URLs
  - Some proxies (e.g. Squid) won't cache them
  - Use PATH_INFO instead
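The URL rule above can be expressed as a simple string check. This C sketch is a hypothetical heuristic modeled on the slide, not any proxy's actual configuration; the example paths /cgi-bin/show.cgi and /show.php/42 are illustrative. The second path uses PATH_INFO: the script show.php receives /42 as its PATH_INFO, yet the URL contains none of the flagged patterns.

```c
#include <stdbool.h>
#include <string.h>

/* True if the URL contains a pattern that some proxies treat as
 * dynamic and refuse to cache: "cgi-bin", ".cgi" or a query string. */
bool looks_uncacheable_to_proxies(const char *url)
{
    return strstr(url, "cgi-bin") != NULL ||
           strstr(url, ".cgi") != NULL ||
           strchr(url, '?') != NULL;
}
```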
5. Random URL strings for accurate hit metering or very sensitive content
Accurate advertising statistics
- If you trust proxies
  - Send Cache-Control: must-revalidate
  - Count 304 Not Modified log entries as hits
- If you don't
  - Ask client to fetch an uncacheable image URL
  - Return 307 to a highly cacheable image file
  - Count 307s as hits
  - Don't bother to look at the cacheable server log
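The two counting strategies can be sketched as a filter over the logged status codes for one ad URL. This C version is illustrative only: counting 200 responses alongside 304s in the trusted-proxy case is an assumption (first fetches are impressions too), and the enum and function names are hypothetical.

```c
#include <stddef.h>

enum metering_mode {
    TRUST_PROXIES,  /* must-revalidate: 200s and 304s are impressions */
    REDIRECT_METER  /* uncacheable URL answered with 307: count only 307s */
};

/* Count ad impressions in a list of logged status codes. */
size_t count_hits(const int *statuses, size_t n, enum metering_mode mode)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        int s = statuses[i];
        if (mode == TRUST_PROXIES ? (s == 200 || s == 304) : (s == 307))
            hits++;
    }
    return hits;
}
```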
Hit-metering for advertisements (1)
    <script type="text/javascript">
    // Timestamp + random number make every ad URL unique, so no cache can answer it
    var r = Math.random();
    var t = new Date();
    document.write("<img width='109' height='52' " +
        "src='http://ads.example.com/ad/foo/bar.gif?t=" +
        t.getTime() + "&r=" + r + "'>");
    </script>
    <noscript>
    <img width="109" height="52" src="http://ads.example.com/ad/foo/bar.gif?js=0">
    </noscript>
Hit-metering for advertisements (2)
    GET /ad/foo/bar.gif?t=1090538707&r=0.510772917234983 HTTP/1.1
    Host: ads.example.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456
    Cookie: uid=C50DF33E-E202-4206-B1F3-946AEDF9308B

    HTTP/1.1 307 Temporary Redirect
    Date: Wed, 28 Jul 2004 23:45:06 GMT
    Cache-Control: max-age=0,no-cache,no-store
    Expires: Tue, 11 Oct 1977 01:23:45 GMT
    Pragma: no-cache
    Location: http://static.example.net/i/foo/bar.gif
    Content-Type: text/html

    <a href="http://static.example.net/i/foo/bar.gif">Moved</a>
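The redirector's job is to emit the same uncacheable 307 for every request, pointing at the long-lived image. A C sketch that formats it, assuming a hypothetical build_redirect helper; the Date header is left out for brevity and a real server would add it.

```c
#include <stdio.h>
#include <string.h>

/* Build an uncacheable 307 response that sends the browser to the
 * highly cacheable image at location. Returns bytes written or -1. */
int build_redirect(char *buf, size_t size, const char *location)
{
    int n = snprintf(buf, size,
        "HTTP/1.1 307 Temporary Redirect\r\n"
        "Cache-Control: max-age=0,no-cache,no-store\r\n"
        "Expires: Tue, 11 Oct 1977 01:23:45 GMT\r\n" /* already in the past */
        "Pragma: no-cache\r\n"                      /* for HTTP/1.0 caches */
        "Location: %s\r\n"
        "Content-Type: text/html\r\n"
        "\r\n"
        "<a href=\"%s\">Moved</a>\r\n",
        location, location);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Each logged 307 from this handler then counts as one impression.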
Hit-metering for advertisements (3)
    GET /i/foo/bar.gif HTTP/1.1
    Host: static.example.net
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456

    HTTP/1.1 200 OK
    Date: Wed, 28 Jul 2004 23:45:07 GMT
    Last-Modified: Mon, 05 Oct 1998 18:32:51 GMT
    ETag: "69079e-ad91-40212cc8"
    Cache-Control: public,max-age=315360000
    Expires: Mon, 28 Jul 2014 23:45:07 GMT
    Content-Length: 6096
    Content-Type: image/gif

    GIF89a...
Turning proxies into private caches
- Use distinct tokens in URL
  - No two users use the same token
- Defeats shared proxy caches
- Works well with private caches
  - Doesn't break the Back button
- May break visited-link highlighting
  - e.g. JavaScript timestamps/random numbers
  - Every link is blue, no purple
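One way to give each user a distinct URL is to append a token as a query parameter. This C sketch assumes a hypothetical parameter name u and a token that is already URL-safe; how tokens are generated (random numbers, timestamps, per-user IDs) is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Append "u=<token>" so no two users request the same URL: a shared
 * proxy never sees a repeat, while each browser's private cache can. */
int tokenized_url(char *buf, size_t size, const char *url, const char *token)
{
    const char *sep = strchr(url, '?') ? "&" : "?"; /* extend an existing query */
    int n = snprintf(buf, size, "%s%su=%s", url, sep, token);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A token fixed per user keeps links purple across visits; a fresh token per page view is what turns every link blue.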
Breaking the Back button
- When users click the browser's Back button
  - They expect to go back one page instantly
  - The private cache enables this behavior
- Aggressive cache-busting breaks the Back button
  - Server sends Pragma: no-cache or an Expires date in the past
  - Browser must re-visit the server to re-fetch the page
  - Hitting the network is much slower than hitting the disk
- Use very sparingly
  - Compromising user experience is A Bad Thing
Review: Top 5 techniques
- Use Cache-Control: private for personalized content
- Implement Images Never Expire policy
- Use a cookie-free TLD for static content
- Use Apache defaults for CSS and JavaScript
- Use random strings in URL for accurate hit metering or very sensitive content