Title: HTTP Caching
Slide 1: HTTP Caching & Cache-Busting for Content Publishers
Michael J. Radwin, http://public.yahoo.com/radwin/
ApacheCon 2004, November 17, 2004
Slide 2: Hi, I'm Michael J. Radwin
- Engineering Manager, Yahoo! Inc.
- Internal FAMP dev support
  - FreeBSD, Apache, MySQL, PHP
  - Also, some C/C++ libs (networking, data storage)
- Web Services infrastructure
- Developer Tools
  - CVS, Bugzilla, package mgmt, i18n workflow
- Slides online
  - http://public.yahoo.com/radwin/
Slide 3: Why you're here today
- Publishers have a lot of web content
  - HTML, images, Flash, movies
- Speed is an important part of the user experience
- Bandwidth is expensive
  - Use what you need, but avoid unnecessary extra
- Personalization differentiates
- Show timely data (stock quotes, news stories)
- Get accurate advertising statistics
- Protect sensitive info (e-mail, account balances)
Slide 4: Not covered in this talk
- Proxy deployment
  - Configuring proxy cache servers (e.g. Squid)
  - Configuring browsers to use proxy caches
  - Transparent/interception proxy caching
  - Intercache protocols (ICP, HTCP)
- HTTP acceleration (a.k.a. reverse proxies)
- Database query results caching
Slide 5: HTTP Review
[Diagram: Client, Internet, Server]
(1) Client connects to www.example.com port 80
(2) Client sends HTTP GET request
Slide 6: HTTP Review (cont'd)
[Diagram: Client, Internet, Server]
(3) Client reads HTTP response from server
(4) Client and Server close connection
Slide 7: HTTP Example
    mradwin@machshav$ telnet www.example.com 80
    Trying 192.168.37.203...
    Connected to w6.example.com.
    Escape character is '^]'.
    GET /foo/index.html HTTP/1.1
    Host: www.example.com

    HTTP/1.1 200 OK
    Date: Wed, 28 Jul 2004 23:36:12 GMT
    Last-Modified: Fri, 23 Jul 2004 01:52:37 GMT
    Content-Length: 3688
    Connection: close
    Content-Type: text/html

    <html><head>
    <title>Hello World</title>
    ...
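The response in the telnet session above has three parts: a status line, header lines, and a body that follows the first blank line. A minimal Python sketch of splitting them apart is shown here. The function name `parse_response` is hypothetical, and the sample text is the slide's response with its header colons written out.

```python
def parse_response(raw: str):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; date values contain more colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Wed, 28 Jul 2004 23:36:12 GMT\r\n"
    "Last-Modified: Fri, 23 Jul 2004 01:52:37 GMT\r\n"
    "Content-Length: 3688\r\n"
    "Connection: close\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><head>..."
)
status, headers, body = parse_response(raw)
print(status, headers["last-modified"])
```

Header names are lowercased in this sketch because HTTP header names are case-insensitive.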
Slide 8: Browsers use private caches
[Diagram: Client, Internet, Server]
Client to Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Server to Client:
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul 2004 01:52:37 GMT
    Content-Length: 3688
    Content-Type: text/html
Client stores a copy of http://www.example.com/foo/index.html on its hard disk with a timestamp.
Slide 9: Revalidation (Conditional GET)
[Diagram: Client, Internet, Server]
Client to Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Fri, 23 Jul 2004 01:52:37 GMT
Server to Client:
    HTTP/1.1 304 Not Modified
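The server-side decision behind a conditional GET can be sketched in a few lines of Python. This is an illustration only: `revalidate` is a hypothetical name, and a real server compares against the file's modification time instead of a string passed in by the caller.

```python
from email.utils import parsedate_to_datetime


def revalidate(if_modified_since: str, last_modified: str) -> int:
    """Return 304 if the client's copy is still current, else 200."""
    client_copy = parsedate_to_datetime(if_modified_since)
    server_copy = parsedate_to_datetime(last_modified)
    # Unchanged since the client's timestamp: no body needs to be sent.
    return 304 if server_copy <= client_copy else 200


unchanged = revalidate("Fri, 23 Jul 2004 01:52:37 GMT",
                       "Fri, 23 Jul 2004 01:52:37 GMT")
changed = revalidate("Fri, 23 Jul 2004 01:52:37 GMT",
                     "Wed, 28 Jul 2004 23:36:12 GMT")
print(unchanged, changed)
```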
Slide 10: Non-Caching Proxy
[Diagram: Client, Proxy, Internet, Server]
Client to Proxy, then Proxy to Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Server to Proxy, then Proxy to Client:
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
Slide 11: Proxy Cache Miss
[Diagram: Client, Proxy, Internet, Server]
Client to Proxy, then Proxy to Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Server to Proxy, then Proxy to Client:
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
Slide 12: Proxy Cache Hit
[Diagram: Client, Proxy, Internet, Server]
Client to Proxy:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Proxy to Client (served from the proxy's cache):
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
Slide 13: Proxy Cache Revalidation Hit
[Diagram: Client, Proxy, Internet, Server]
Client to Proxy:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Proxy to Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Fri, 23 Jul ...
Server to Proxy:
    HTTP/1.1 304 Not Modified
Proxy to Client:
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
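The three proxy outcomes above (miss, hit, revalidation hit) can be sketched as a toy cache in Python. Everything here is hypothetical: the class and function names, and in particular the `fresh_for` parameter, which stands in for whatever freshness rule a real proxy applies before it decides to revalidate.

```python
class CachingProxy:
    """Toy proxy: serves fresh copies, revalidates stale ones."""

    def __init__(self, origin, fresh_for):
        self.origin = origin        # callable(url, if_modified_since)
        self.fresh_for = fresh_for  # seconds a copy is served without asking
        self.cache = {}             # url -> (fetched_at, last_modified, body)

    def get(self, url, now):
        entry = self.cache.get(url)
        if entry is None:
            status, last_modified, body = self.origin(url, None)
            self.cache[url] = (now, last_modified, body)
            return "miss", body
        fetched_at, last_modified, body = entry
        if now - fetched_at < self.fresh_for:
            return "hit", body
        # Stale copy: ask the origin with If-Modified-Since.
        status, new_modified, new_body = self.origin(url, last_modified)
        if status == 304:
            self.cache[url] = (now, last_modified, body)
            return "revalidation hit", body
        self.cache[url] = (now, new_modified, new_body)
        return "miss", new_body


def origin(url, if_modified_since):
    # Hypothetical origin server whose document never changes.
    last_modified = "Fri, 23 Jul 2004 01:52:37 GMT"
    if if_modified_since == last_modified:
        return 304, last_modified, None
    return 200, last_modified, "<html>...</html>"


proxy = CachingProxy(origin, fresh_for=60)
results = [proxy.get("/foo/index.html", now)[0] for now in (0, 10, 100)]
print(results)
```

The three requests at times 0, 10 and 100 walk through the miss, hit and revalidation hit cases in that order.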
Slide 14: Assumptions about content types
[Chart: content types (HTML, CSS, JavaScript, Images, Flash, PDF) arranged by rate of change once published (Frequently, Occasionally, Rarely/Never), by Dynamic Content vs. Static Content, and by Personalized vs. Same for everyone]
Slide 15: Top 5 techniques for publishers
- Use Cache-Control: private for personalized content
- Implement "Images Never Expire" policy
- Use a cookie-free TLD for static content
- Use Apache defaults for CSS & JavaScript
- Use random strings in URL for accurate hit metering or very sensitive content
Slide 16: 1. Use Cache-Control: private for personalized content
[Chart from slide 14 repeated]
Slide 17: Shared caching gone awry (1)
[Diagram: Client 1, Proxy, Internet, Webmail Server]
Client 1 to Proxy, then Proxy to Webmail Server:
    GET /msg3.html HTTP/1.1
    Host: webmail.example.com
    Cookie: user=jane
Webmail Server to Proxy, then Proxy to Client 1:
    Jane's e-mail message
Slide 18: Shared caching gone awry (2)
[Diagram: Client 1, Proxy, Internet, Webmail Server]
Request from Client 1:
    GET /msg3.html HTTP/1.1
    Host: webmail.example.com
    Cookie: user=jane
Proxy cache: msg3.html (Jane's e-mail message)
Slide 19: Shared caching gone awry (3)
[Diagram: Client 2, Proxy, Internet, Webmail Server]
Client 2 to Proxy:
    GET /msg3.html HTTP/1.1
    Host: webmail.example.com
    Cookie: user=mary
Proxy cache: msg3.html
Proxy to Client 2:
    Jane's e-mail message
Slide 20: What's cacheable?
- HTTP/1.1 allows caching anything by default
  - Unless explicit Cache-Control header
- In practice, most caches avoid anything with
  - Cache-Control/Pragma header
  - Cookie/Set-Cookie headers
  - WWW-Authenticate/Authorization header
  - POST/PUT method
  - 302/307 status code
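The "in practice" list above can be read as a conservative checklist. The Python sketch below encodes it under one loudly stated assumption: Cache-Control and Pragma are checked on the request only (for example a forced reload), since the slide does not say which side it means. The function name is hypothetical, and real caches differ in their exact rules.

```python
def shared_cache_would_store(method, status, request_headers, response_headers):
    """Conservative heuristic mirroring the slide's 'in practice' list."""
    req = {name.lower() for name in request_headers}
    resp = {name.lower() for name in response_headers}
    if method.upper() in ("POST", "PUT"):
        return False
    if status in (302, 307):
        return False
    # Assumption: Cache-Control/Pragma are treated as request headers here.
    if req & {"cookie", "authorization", "cache-control", "pragma"}:
        return False
    if resp & {"set-cookie", "www-authenticate"}:
        return False
    return True


plain = shared_cache_would_store(
    "GET", 200, {"Host": "www.example.com"}, {"Content-Type": "text/html"})
with_cookie = shared_cache_would_store(
    "GET", 200, {"Host": "www.example.com", "Cookie": "user=jane"}, {})
print(plain, with_cookie)
```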
Slide 21: Cache-Control: private
- Shared caches are bad for personalized content
  - Mary shouldn't be able to read Jane's mail
- Private caches perfectly OK
  - Speed up web browsing experience
- Avoid personalization leakage with a single line in httpd.conf or .htaccess
    Header set Cache-Control private
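The leak from the "Shared caching gone awry" slides, and the effect of the Cache-Control: private header, can be reproduced with a toy simulation in Python. The class and function names are hypothetical, and the proxy is deliberately simplistic: it keys its cache on the URL alone and never looks at cookies.

```python
class SharedProxy:
    """Toy shared cache keyed by URL only; it ignores cookies."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, url, cookie):
        if url in self.store:
            return self.store[url]  # cache hit: the cookie is never checked
        headers, body = self.origin(url, cookie)
        # A well-behaved shared cache must not store private responses.
        if "private" not in headers.get("Cache-Control", ""):
            self.store[url] = body
        return body


def make_webmail(send_private):
    """Hypothetical origin server whose response depends on the cookie."""
    def origin(url, cookie):
        user = cookie.split("=", 1)[1]
        headers = {"Cache-Control": "private"} if send_private else {}
        return headers, f"{user}'s e-mail message"
    return origin


leaky = SharedProxy(make_webmail(send_private=False))
leaky.get("/msg3.html", "user=jane")
leaked = leaky.get("/msg3.html", "user=mary")

fixed = SharedProxy(make_webmail(send_private=True))
fixed.get("/msg3.html", "user=jane")
safe = fixed.get("/msg3.html", "user=mary")
print(leaked, "|", safe)
```

Without the header, Mary's request is answered from the copy stored for Jane. With it, the proxy stores nothing and each user gets her own message.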
Slide 22: 2. "Images Never Expire" policy
[Chart from slide 14 repeated]
Slide 23: "Images Never Expire" Policy
- Encourage caching of icons & logos
  - "Forever" is about 10 years in Internet biz
- Must change URL when you change img
  - http://us.yimg.com/i/new.gif
  - http://us.yimg.com/i/new2.gif
- Tradeoff
  - More difficult for designers
  - Bandwidth savings, faster user experience
Slide 24: Imgs Never Expire: mod_expires
- Works with both HTTP/1.0 and HTTP/1.1
    ExpiresActive On
    ExpiresByType image/gif A315360000
    ExpiresByType image/jpeg A315360000
    ExpiresByType image/png A315360000
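The `A315360000` argument means "access time plus 315360000 seconds", which is ten 365-day years. A short Python sketch of that arithmetic follows. The function name is hypothetical, and the access time used in the example is an assumed value chosen for illustration.

```python
import calendar
from email.utils import formatdate

TEN_YEARS = 10 * 365 * 24 * 60 * 60  # 315360000 seconds


def never_expire_headers(access_time: float) -> dict:
    """Build headers equivalent to 'access time plus ten years'."""
    return {
        "Expires": formatdate(access_time + TEN_YEARS, usegmt=True),
        "Cache-Control": f"max-age={TEN_YEARS}",
    }


# Assumed access time: Wed, 28 Jul 2004 23:45:07 GMT
accessed = calendar.timegm((2004, 7, 28, 23, 45, 7))
headers = never_expire_headers(accessed)
print(headers["Expires"])
print(headers["Cache-Control"])
```

Because the value counts 365-day years, the computed date lands two days before the ten-year calendar anniversary (the leap days of 2008 and 2012 are not included).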
Slide 25: Imgs Never Expire: mod_headers
- Works with HTTP/1.1 only
    <FilesMatch "\.(gif|jpe?g|png)$">
    Header set Cache-Control "max-age=315360000"
    </FilesMatch>
- Works with both HTTP/1.0 and HTTP/1.1
    <FilesMatch "\.(gif|jpe?g|png)$">
    Header set Expires "Mon, 28 Jul 2014 23:30:00 GMT"
    </FilesMatch>
Slide 26: mod_images_never_expire
    /* Enforce policy with module that runs at URI translation hook */
    static int translate_imgexpire(request_rec *r) {
        const char *ext;
        if ((ext = strrchr(r->uri, '.')) != NULL) {
            if (strcasecmp(ext, ".gif") == 0 || strcasecmp(ext, ".jpg") == 0 ||
                strcasecmp(ext, ".png") == 0 || strcasecmp(ext, ".jpeg") == 0) {
                if (ap_table_get(r->headers_in, "If-Modified-Since") != NULL ||
                    ap_table_get(r->headers_in, "If-None-Match") != NULL) {
                    /* Don't bother checking filesystem, just hand back a 304 */
                    return HTTP_NOT_MODIFIED;
                }
            }
        }
        return DECLINED;
    }
Slide 27: 3. Cookie-free static content
[Chart from slide 14 repeated]
Slide 28: Use a cookie-free Top Level Domain for static content
- For maximum efficiency use two domains
  - www.example.com for HTML
  - img.example.net for images
- Some proxies won't cache Cookie reqs
  - But multimedia is never personalized
  - Cookies irrelevant for images
Slide 29: Typical GET request w/Cookies
    GET /i/foo/bar/quux.gif HTTP/1.1
    Host: www.example.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
    Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
    Cookie: UmtvtC1tp2MhYv9RL5BlpxYRFN_P8DpMJoamllEc
        A--uxIIr.ABun42vnticvufc8v brandflash1
        Bamfco1503sgp8b2 FaNC184LcsvfX96G.JR27qSjCHu
        7bII3s. tXa44psMLliFtVoJB_m5wecWY_.7bK1It
        LYCl_v2l_lv7l_lh03m8d50c8bo
        l_s3yu2qxz5zvwquwwuzv22wrwr5t3w1zsrl_lid14rsb7
        6l_ra8l_um1_0_1_0_0 GTSessionID8359908990238
        3599089902340645635 Yv1n6eecgejj7012f
        lh03m8d50c8bo/opm012o33013000007jb1647ra
        8lgusintlusnp1 PROMOSOURCEfp5 YGCVd
        TziTu.ABiZD/AB6dPWoqXibIcTzc0BjY3TzI3NTY0MzQ-a
        YAEskDAAwRz5HlDUN2Tdc2wBT0RBekFURXdPRFV3TWpFek
        5ETS0BYQFZQUUBb2sBWlcwLQF0aXABWUhaTVBBAXp6AWlUdS5B
        QmdXQQ--afQUFBQ0FDQURCOUFIQUJBQ0FEQUtBTE
        FNSDAmdHM9MTA5MDE4NDQxOCZwcz1lOG83MUVYcTYxOVouT2Ft
        c1ZFZUhBLS0- LYSl_fh0l_vomyla
        PAp0dg13DX4Ndgk-p16L5qmg--exMv.AB
        YP.usv2maddrd1525SRobertsonBlvd01LosAng
        eles01CA0190035-42310144800134.05159001-118.3
        8434201901a0190035
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456
    Accept-Language: en-us,en;q=0.7,he;q=0.3
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive
Slide 30: Same request, no Cookies
    GET /i/foo/bar/quux.gif HTTP/1.1
    Host: img.example.net
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
    Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456
    Accept-Language: en-us,en;q=0.7,he;q=0.3
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive
- Added bonus: much smaller GET request
  - Dial-up MTU size 576 bytes, PPPoE 1492
  - 1450 bytes reduced to 550
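To see what the size reduction means on the wire, the following Python sketch estimates how many packets each request needs at the two MTU sizes quoted above. It assumes 40 bytes of IPv4 and TCP headers per packet with no options, which is an assumption of this sketch and not a figure from the slides.

```python
import math

IP_TCP_HEADERS = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header


def packets_needed(request_bytes: int, mtu: int) -> int:
    """Number of packets needed to carry a request of the given size."""
    payload_per_packet = mtu - IP_TCP_HEADERS
    return math.ceil(request_bytes / payload_per_packet)


dialup = (packets_needed(1450, 576), packets_needed(550, 576))
pppoe = (packets_needed(1450, 1492), packets_needed(550, 1492))
print("dial-up:", dialup, "PPPoE:", pppoe)
```

Under these assumptions the cookie-laden request takes three packets on dial-up and the cookie-free one takes two, while on PPPoE both fit in a single packet.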
Slide 31: 4. Apache defaults for static, occasionally-changing content
[Chart from slide 14 repeated]
Slide 32: Revalidation works pretty well
- Revalidation is the default behavior for static content
  - Browser sends If-Modified-Since request
  - Server replies with short 304 Not Modified
  - No fancy Apache config needed
- Use if you can't predict when content will change
  - Page designers can change immediately
  - No renaming necessary
- Cost: extra HTTP transaction for 304
  - Small with Keep-Alive, but large sites disable it
Slide 33: 5. Random URL strings for hit metering, sensitive content
[Chart from slide 14 repeated]
Slide 34: Accurate advertising statistics
- If you trust proxies
  - Send Cache-Control: must-revalidate
  - Count 304 Not Modified log entries as hits
- If you don't
  - Ask client to fetch uncacheable image URL
  - Return 302 to highly cacheable image file
  - Count 302s as hits
  - Don't bother to look at cacheable server log
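One way to make an image URL effectively uncacheable, as the second approach above requires, is to append a timestamp and a random number so that no two fetches share a URL. A minimal Python sketch of generating such a URL on the server side is shown here. The function name and the parameter names `t` and `r` are assumptions of this sketch.

```python
import random
import time
from urllib.parse import urlencode


def metered_ad_url(base: str) -> str:
    """Append a timestamp and a random number so each fetch has a unique URL."""
    query = urlencode({"t": int(time.time() * 1000), "r": random.random()})
    return f"{base}?{query}"


first = metered_ad_url("http://ads.example.com/ad/foo/bar.gif")
second = metered_ad_url("http://ads.example.com/ad/foo/bar.gif")
print(first)
```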
Slide 35: Hit-metering for ads (1)
    <script type="text/javascript">
    var r = Math.random();
    var t = new Date();
    document.write("<img width='109' height='52' " +
        "src='http://ads.example.com/ad/foo/bar.gif?t=" +
        t.getTime() + "&r=" + r + "'>");
    </script>
    <noscript>
    <img width="109" height="52" src="http://ads.example.com/ad/foo/bar.gif?js=0">
    </noscript>
Slide 36: Hit-metering for ads (2)
    GET /ad/foo/bar.gif?t=1090538707&r=0.510772917234983 HTTP/1.1
    Host: ads.example.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456
    Cookie: uid=C50DF33E-E202-4206-B1F3-946AEDF9308B

    HTTP/1.1 302 Moved Temporarily
    Date: Wed, 28 Jul 2004 23:45:06 GMT
    Cache-Control: max-age=0,no-cache,no-store
    Expires: Tue, 11 Oct 1977 01:23:45 GMT
    Pragma: no-cache
    Location: http://img.example.net/i/foo/bar.gif
    Content-Type: text/html

    <a href="http://img.example.net/i/foo/bar.gif">Moved</a>
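The server side of this exchange, counting the hit and then redirecting to the long-lived image, can be sketched in Python as follows. The handler name and the in-memory counter are hypothetical, and the mapping from `/ad/` paths to `/i/` paths simply mirrors the URLs shown on the slides.

```python
from collections import Counter

hits = Counter()

NO_CACHE_HEADERS = {
    "Cache-Control": "max-age=0,no-cache,no-store",
    "Expires": "Tue, 11 Oct 1977 01:23:45 GMT",
    "Pragma": "no-cache",
}


def handle_ad_request(path: str):
    """Count the hit, then redirect to the highly cacheable image URL."""
    ad = path.split("?", 1)[0]  # drop the cache-busting query string
    hits[ad] += 1
    headers = dict(NO_CACHE_HEADERS)
    headers["Location"] = "http://img.example.net" + ad.replace("/ad/", "/i/", 1)
    return 302, headers


status, headers = handle_ad_request("/ad/foo/bar.gif?t=1090538707&r=0.51")
handle_ad_request("/ad/foo/bar.gif?t=1090538999&r=0.12")
print(status, headers["Location"], hits["/ad/foo/bar.gif"])
```

Every impression reaches this handler because the redirect itself is marked uncacheable, while the image bytes are served from caches wherever possible.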
Slide 37: Hit-metering for ads (3)
    GET /i/foo/bar.gif HTTP/1.1
    Host: img.example.net
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456

    HTTP/1.1 200 OK
    Date: Wed, 28 Jul 2004 23:45:07 GMT
    Last-Modified: Mon, 05 Oct 1998 18:32:51 GMT
    ETag: "69079e-ad91-40212cc8"
    Cache-Control: public,max-age=315360000
    Expires: Mon, 28 Jul 2014 23:45:07 GMT
    Content-Length: 6096
    Content-Type: image/gif

    GIF89a...
Slide 38: Defeating proxies: turning public caches into private caches
- Use distinct tokens in URL
  - No two users use same token
  - Defeats shared proxy caches
  - Works well with private caches
- Doesn't break the back button
- May break visited-link highlighting
  - e.g. JavaScript timestamps/random numbers
  - Every link is blue, no purple
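A sketch of the distinct-token idea in Python follows. It derives a stable token from a user identifier, so two users never share a URL while one user's browser cache still sees the same URL on every visit. The function name, the `u` parameter, and the use of a truncated SHA-256 digest are all assumptions of this sketch. The token only separates cache entries and is not an access control mechanism.

```python
import hashlib


def per_user_url(url: str, user_id: str) -> str:
    """Append a token that is stable per user and distinct across users."""
    token = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}u={token}"


jane = per_user_url("http://www.example.com/foo/index.html", "jane")
mary = per_user_url("http://www.example.com/foo/index.html", "mary")
print(jane != mary)
```

A stable token, unlike a timestamp or random number, keeps each user's URLs identical between visits, which also avoids the visited-link problem noted above.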
Slide 39: Breaking the Back button
- When users click browser Back button
  - Expect to go back one page instantly
  - Private cache enables this behavior
- Aggressive cache-busting breaks Back button
  - Server sends Pragma: no-cache or Expires in past
  - Browser must re-visit server to re-fetch page
  - Hitting network much slower than hitting disk
- Use very sparingly
  - Compromising user experience is A Bad Thing
Slide 40: Review: Top 5 techniques
- Use Cache-Control: private for personalized content
- Implement "Images Never Expire" policy
- Use a cookie-free TLD for static content
- Use Apache defaults for CSS & JavaScript
- Use random strings in URL for accurate hit metering or very sensitive content
Slide 41: Review: encouraging caching
- Send explicit Cache-Control or Expires
- Generate "static content" headers
  - Last-Modified, ETag
  - Content-Length
- Avoid cgi-bin, .cgi or ? in URLs
  - Some proxies (e.g. Squid) won't cache
  - Workaround: use PATH_INFO instead
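A quick way to check whether a URL falls into the "looks dynamic" category described above is a pattern match. The Python sketch below uses a pattern modeled on the slide's list, which is an assumption and not the configuration of any particular proxy. The example URLs are hypothetical.

```python
import re

# Assumed pattern, modeled on the slide's list: cgi-bin, .cgi, or a query string.
DYNAMIC_LOOKING = re.compile(r"cgi-bin|\.cgi|\?")


def looks_dynamic(url: str) -> bool:
    """True if a cautious proxy might refuse to cache this URL."""
    return DYNAMIC_LOOKING.search(url) is not None


query_style = looks_dynamic("/cgi-bin/story.cgi?id=42")
path_info_style = looks_dynamic("/story.php/42")
print(query_style, path_info_style)
```

The second URL carries the same parameter as PATH_INFO, so it does not match the pattern.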
Slide 42: Review: discouraging caching
- Use POST instead of GET
- Use random strings and ? char in URL
- Omit Content-Length & Last-Modified
- Send explicit headers on response
  - Breaks the back button
  - Only as a last resort
    Cache-Control: max-age=0,no-cache,no-store
    Expires: Thu, 01 Jan 1970 12:34:56 GMT
    Pragma: no-cache
Slide 43: Recommended Reading
- Web Caching and Replication
  - Michael Rabinovich & Oliver Spatscheck
  - Addison-Wesley, 2001
- Web Caching
  - Duane Wessels
  - O'Reilly, 2001
Slide 44: Wrapping Up
- Please fill out Session Eval Form
  - Session: WE16
  - Title: HTTP Caching & Cache-busting
  - Speaker: Michael Radwin
- Slides online
  - http://public.yahoo.com/radwin/