HTTP Caching - PowerPoint PPT Presentation

Title: HTTP Caching


1
HTTP Caching & Cache-Busting for Content Publishers
Michael J. Radwin
http://public.yahoo.com/radwin/
ApacheCon 2004, November 17, 2004
2
Hi, I'm Michael J. Radwin
  • Engineering Manager, Yahoo! Inc.
  • Internal FAMP dev & support
  • FreeBSD, Apache, MySQL, PHP
  • Also, some C/C++ libs (networking, data storage)
  • Web Services infrastructure
  • Developer Tools
  • CVS, Bugzilla, package mgmt, i18n workflow
  • Slides online
  • http://public.yahoo.com/radwin/

3
Why you're here today
  • Publishers have a lot of web content
  • HTML, images, Flash, movies
  • Speed is an important part of user experience
  • Bandwidth is expensive
  • Use what you need, but avoid unnecessary extra
  • Personalization differentiates
  • Show timely data (stock quotes, news stories)
  • Get accurate advertising statistics
  • Protect sensitive info (e-mail, account balances)

4
Not covered in this talk
  • Proxy deployment
  • Configuring proxy cache servers (e.g. Squid)
  • Configuring browsers to use proxy caches
  • Transparent/interception proxy caching
  • Intercache protocols (ICP, HTCP)
  • HTTP acceleration (a.k.a. reverse proxies)
  • Database query results caching

5
HTTP Review
(1) Client connects to www.example.com port 80
(2) Client sends HTTP GET request
(Diagram: Client, Internet, Server)
6
HTTP Review (cont'd)
(3) Client reads HTTP response from server
(4) Client and Server close connection
(Diagram: Client, Internet, Server)
7
HTTP Example
  • mradwin@machshav$ telnet www.example.com 80
  • Trying 192.168.37.203...
  • Connected to w6.example.com.
  • Escape character is '^]'.
  • GET /foo/index.html HTTP/1.1
  • Host: www.example.com
  • HTTP/1.1 200 OK
  • Date: Wed, 28 Jul 2004 23:36:12 GMT
  • Last-Modified: Fri, 23 Jul 2004 01:52:37 GMT
  • Content-Length: 3688
  • Connection: close
  • Content-Type: text/html
  • <html><head>
  • <title>Hello World</title>
  • ...

8
Browsers use private caches
Client -> Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Server -> Client:
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul 2004 01:52:37 GMT
    Content-Length: 3688
    Content-Type: text/html
Client stores copy of http://www.example.com/foo/index.html on its hard disk with timestamp.
9
Revalidation (Conditional GET)
Client -> Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Fri, 23 Jul 2004 01:52:37 GMT
Server -> Client:
    HTTP/1.1 304 Not Modified
10
Non-Caching Proxy
Client -> Proxy, then Proxy -> Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Server -> Proxy, then Proxy -> Client:
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
11
Proxy Cache Miss
Client -> Proxy, then Proxy -> Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Server -> Proxy, then Proxy -> Client:
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
12
Proxy Cache Hit
Client -> Proxy:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Proxy -> Client (served from the proxy's cache; the Server is not contacted):
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
13
Proxy Cache Revalidation Hit
Client -> Proxy:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
Proxy -> Server:
    GET /foo/index.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Fri, 23 Jul ...
Server -> Proxy:
    HTTP/1.1 304 Not Modified
Proxy -> Client:
    HTTP/1.1 200 OK
    Last-Modified: Fri, 23 Jul ...
    Content-Length: 3688
    Content-Type: text/html
14
Assumptions about content types
Rate of change once published: Frequently | Occasionally | Rarely/Never
Content types: HTML | CSS, JavaScript | Images, Flash, PDF
Dynamic Content <-> Static Content
Personalized <-> Same for everyone
15
Top 5 techniques for publishers
  1. Use Cache-Control: private for personalized content
  2. Implement "Images Never Expire" policy
  3. Use a cookie-free TLD for static content
  4. Use Apache defaults for CSS & JavaScript
  5. Use random strings in URL for accurate hit metering or very sensitive content

16
1. Use Cache-Control: private for personalized content
(Chart: content types by rate of change, repeated from slide 14.)
17
Shared caching gone awry (1)
Client 1 -> Proxy, then Proxy -> Webmail Server:
    GET /msg3.html HTTP/1.1
    Host: webmail.example.com
    Cookie: user=jane
Webmail Server -> Proxy, then Proxy -> Client 1:
    Jane's e-mail message

18
Shared caching gone awry (2)
Client 1's request:
    GET /msg3.html HTTP/1.1
    Host: webmail.example.com
    Cookie: user=jane
Proxy now holds a cached copy of msg3.html containing Jane's e-mail message.
19
Shared caching gone awry (3)
Client 2 -> Proxy:
    GET /msg3.html HTTP/1.1
    Host: webmail.example.com
    Cookie: user=mary
Proxy -> Client 2 (cached msg3.html; the Webmail Server is not contacted):
    Jane's e-mail message
20
What's cacheable?
  • HTTP/1.1 allows caching anything by default
  • Unless explicit Cache-Control header
  • In practice, most caches avoid anything with
  • Cache-Control/Pragma header
  • Cookie/Set-Cookie headers
  • WWW-Authenticate/Authorization header
  • POST/PUT method
  • 302/307 status code
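
The "in practice" list above can be sketched as a simple predicate. This is a minimal illustration, not how any particular proxy is implemented: the function name and the header sets are hypothetical, and it deliberately treats the mere presence of a Cache-Control or Pragma header as a reason to skip caching, as the slide describes.

```python
# Hypothetical helper mirroring the slide's list of things most caches avoid.
UNCACHEABLE_METHODS = {"POST", "PUT"}
UNCACHEABLE_STATUSES = {302, 307}
REQUEST_HEADERS_TO_AVOID = {"cookie", "authorization", "cache-control", "pragma"}
RESPONSE_HEADERS_TO_AVOID = {"set-cookie", "www-authenticate", "cache-control", "pragma"}


def conservative_cache_would_store(method, status, request_headers, response_headers):
    """Return True only if nothing on the slide's avoid-list is present."""
    if method.upper() in UNCACHEABLE_METHODS:
        return False
    if status in UNCACHEABLE_STATUSES:
        return False
    # Header names are case-insensitive, so compare in lowercase.
    request_names = {name.lower() for name in request_headers}
    response_names = {name.lower() for name in response_headers}
    if request_names & REQUEST_HEADERS_TO_AVOID:
        return False
    if response_names & RESPONSE_HEADERS_TO_AVOID:
        return False
    return True
```

A plain GET with no cookies passes; the same GET carrying a Cookie header, or any POST, does not.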

21
Cache-Control: private
  • Shared caches are bad for personalized content
  • Mary shouldn't be able to read Jane's mail
  • Private caches are perfectly OK
  • Speed up web browsing experience
  • Avoid personalization leakage with a single line in httpd.conf or .htaccess
  • Header set Cache-Control private
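
To make the shared-versus-private distinction concrete, here is a rough sketch of the storage decision a cache might make when it sees the header. The function names are made up for illustration, and the parsing is simplified (it ignores quoted arguments that contain commas).

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a set of lowercase directive names."""
    directives = set()
    for part in value.split(","):
        name = part.strip().split("=", 1)[0].lower()
        if name:
            directives.add(name)
    return directives


def may_store(cache_control, is_shared_cache):
    """Decide whether a cache may keep a copy of the response."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        return False  # nobody may store it
    if "private" in directives and is_shared_cache:
        return False  # proxies must not store it; the browser still may
    return True
```

With `Cache-Control: private`, Jane's browser (`is_shared_cache=False`) may keep the page, while the proxy between her and Mary (`is_shared_cache=True`) may not.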

22
2. "Images Never Expire" policy
(Chart: content types by rate of change, repeated from slide 14.)
23
"Images Never Expire" Policy
  • Encourage caching of icons & logos
  • "Forever" means 10 years in the Internet biz
  • Must change URL when you change img
  • http://us.yimg.com/i/new.gif
  • http://us.yimg.com/i/new2.gif
  • Tradeoff
  • More difficult for designers
  • Bandwidth savings, faster user experience

24
Imgs Never Expire: mod_expires
  • Works with both HTTP/1.0 and HTTP/1.1
  • ExpiresActive On
  • ExpiresByType image/gif A315360000
  • ExpiresByType image/jpeg A315360000
  • ExpiresByType image/png A315360000
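
The value A315360000 means "access time plus 315360000 seconds", and 315360000 is ten 365-day years. As a small sketch of the arithmetic (the function name is hypothetical, and this only approximates what mod_expires emits), the equivalent headers could be computed like this:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# 10 years of 365 days each; leap days are ignored, as in the slide's constant.
TEN_YEARS_SECONDS = 10 * 365 * 24 * 60 * 60


def never_expire_headers(access_time):
    """Build Expires and Cache-Control values relative to the access time (aware UTC datetime)."""
    expires = access_time + timedelta(seconds=TEN_YEARS_SECONDS)
    return {
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"max-age={TEN_YEARS_SECONDS}",
    }
```

Because leap days are ignored, the computed expiry lands a couple of calendar days short of the ten-year anniversary, which makes no practical difference for this policy.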

25
Imgs Never Expire: mod_headers
  • Works with HTTP/1.1 only
      <FilesMatch "\.(gif|jpe?g|png)$">
          Header set Cache-Control "max-age=315360000"
      </FilesMatch>
  • Works with both HTTP/1.0 and HTTP/1.1
      <FilesMatch "\.(gif|jpe?g|png)$">
          Header set Expires "Mon, 28 Jul 2014 23:30:00 GMT"
      </FilesMatch>

26
mod_images_never_expire
  /* Enforce policy with module that runs at URI translation hook */
  #include <string.h>   /* strrchr */
  #include <strings.h>  /* strcasecmp */
  #include "httpd.h"    /* request_rec, ap_table_get, HTTP_NOT_MODIFIED, DECLINED (Apache 1.3) */

  static int translate_imgexpire(request_rec *r)
  {
      const char *ext;

      if ((ext = strrchr(r->uri, '.')) != NULL) {
          if (strcasecmp(ext, ".gif") == 0 || strcasecmp(ext, ".jpg") == 0 ||
              strcasecmp(ext, ".png") == 0 || strcasecmp(ext, ".jpeg") == 0) {
              if (ap_table_get(r->headers_in, "If-Modified-Since") != NULL ||
                  ap_table_get(r->headers_in, "If-None-Match") != NULL) {
                  /* Don't bother checking filesystem, just hand back a 304 */
                  return HTTP_NOT_MODIFIED;
              }
          }
      }
      return DECLINED;
  }

27
3. Cookie-free static content
(Chart: content types by rate of change, repeated from slide 14.)
28
Use a cookie-free Top Level Domain for static content
  • For maximum efficiency use two domains
  • www.example.com for HTML
  • img.example.net for images
  • Some proxies won't cache requests that carry a Cookie header
  • But multimedia is never personalized
  • Cookies irrelevant for images

29
Typical GET request w/Cookies
  • GET /i/foo/bar/quux.gif HTTP/1.1
  • Host: www.example.com
  • User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
  • Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
  • Cookie: UmtvtC1tp2MhYv9RL5BlpxYRFN_P8DpMJoamllEc
    A--uxIIr.ABun42vnticvufc8v brandflash1
    Bamfco1503sgp8b2 FaNC184LcsvfX96G.JR27qSjCHu
    7bII3s. tXa44psMLliFtVoJB_m5wecWY_.7bK1It
    LYCl_v2l_lv7l_lh03m8d50c8bo
    l_s3yu2qxz5zvwquwwuzv22wrwr5t3w1zsrl_lid14rsb7
    6l_ra8l_um1_0_1_0_0 GTSessionID8359908990238
    3599089902340645635 Yv1n6eecgejj7012f
    lh03m8d50c8bo/opm012o33013000007jb1647ra
    8lgusintlusnp1 PROMOSOURCEfp5 YGCVd
    TziTu.ABiZD/AB6dPWoqXibIcTzc0BjY3TzI3NTY0MzQ-a
    YAEskDAAwRz5HlDUN2Tdc2wBT0RBekFURXdPRFV3TWpFek
    5ETS0BYQFZQUUBb2sBWlcwLQF0aXABWUhaTVBBAXp6AWlUdS5B
    QmdXQQ--afQUFBQ0FDQURCOUFIQUJBQ0FEQUtBTE
    FNSDAmdHM9MTA5MDE4NDQxOCZwcz1lOG83MUVYcTYxOVouT2Ft
    c1ZFZUhBLS0- LYSl_fh0l_vomyla
    PAp0dg13DX4Ndgk-p16L5qmg--exMv.AB
    YP.usv2maddrd1525SRobertsonBlvd01LosAng
    eles01CA0190035-42310144800134.05159001-118.3
    8434201901a0190035
  • Referer: http://www.example.com/foo/bar.php?abc=123&def=456
  • Accept-Language: en-us,en;q=0.7,he;q=0.3
  • Accept-Encoding: gzip,deflate
  • Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
  • Keep-Alive: 300
  • Connection: keep-alive

30
Same request, no Cookies
  • GET /i/foo/bar/quux.gif HTTP/1.1
  • Host: img.example.net
  • User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
  • Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
  • Referer: http://www.example.com/foo/bar.php?abc=123&def=456
  • Accept-Language: en-us,en;q=0.7,he;q=0.3
  • Accept-Encoding: gzip,deflate
  • Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
  • Keep-Alive: 300
  • Connection: keep-alive
  • Added bonus: much smaller GET request
  • Dial-up MTU size 576 bytes, PPPoE 1492
  • 1450 bytes reduced to 550
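
A back-of-the-envelope sketch of why the smaller request matters on a small-MTU link. It assumes 40 bytes of IPv4 + TCP header per packet with no options; that overhead figure is an assumption added here, not something stated on the slide.

```python
import math

# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
IP_TCP_HEADER_BYTES = 40


def packets_needed(request_bytes, mtu):
    """Number of packets required to carry an HTTP request of the given size."""
    payload_per_packet = mtu - IP_TCP_HEADER_BYTES
    return math.ceil(request_bytes / payload_per_packet)
```

Under these assumptions the 1450-byte request needs three packets at the dial-up MTU of 576 and the 550-byte request needs two, while both fit in a single packet at the PPPoE MTU of 1492.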

31
4. Apache defaults for static, occasionally-changing content
(Chart: content types by rate of change, repeated from slide 14.)
32
Revalidation works pretty well
  • Revalidation is the default behavior for static content
  • Browser sends If-Modified-Since request
  • Server replies with short 304 Not Modified
  • No fancy Apache config needed
  • Use if you can't predict when content will change
  • Page designers can change immediately
  • No renaming necessary
  • Cost: extra HTTP transaction for 304
  • Small with Keep-Alive, but large sites disable it
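
The revalidation exchange above boils down to one date comparison on the server. The following is a simplified sketch of that logic, not Apache's actual implementation; the function name is hypothetical and ETag handling is left out.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, if_modified_since=None):
    """Return (status, headers) for a static file last changed at last_modified (aware UTC datetime)."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            client_copy_time = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_copy_time = None  # unparseable date: fall through to a full response
        if client_copy_time is not None:
            if client_copy_time.tzinfo is None:
                # Treat dates without a usable zone as UTC so the comparison is valid.
                client_copy_time = client_copy_time.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds first.
            if last_modified.replace(microsecond=0) <= client_copy_time:
                return 304, headers
    return 200, headers
```

A browser holding the copy from Fri, 23 Jul 2004 01:52:37 GMT gets a 304 as long as the file has not changed since; a first-time visitor, or one with an older copy, gets the full 200 response.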

33
5. Random URL strings for hit metering, sensitive
content
(Chart: content types by rate of change, repeated from slide 14.)
34
Accurate advertising statistics
  • If you trust proxies
  • Send Cache-Control: must-revalidate
  • Count 304 Not Modified log entries as hits
  • If you don't
  • Ask client to fetch uncacheable image URL
  • Return 302 to highly cacheable image file
  • Count 302s as hits
  • Don't bother to look at cacheable server log
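
Either counting strategy comes down to tallying access-log entries by status code. Here is a rough sketch, assuming Common Log Format lines; the function name and regular expression are illustrative only and would need adjusting for a real log format.

```python
import re
from collections import Counter

# Matches the quoted request and the status code of a Common Log Format line.
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_ad_hits(log_lines, hit_statuses):
    """Count log entries per URL path whose status code is in hit_statuses."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and int(match.group("status")) in hit_statuses:
            path = match.group("path").split("?", 1)[0]  # ignore cache-busting query string
            hits[path] += 1
    return hits
```

For the "trust proxies" approach one might pass `{200, 304}` as `hit_statuses`; for the redirect approach, `{302}` on the ad server's log.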

35
Hit-metering for ads (1)
  <script type="text/javascript">
  var r = Math.random();
  var t = new Date();
  document.write("<img width='109' height='52' src='http://ads.example.com/ad/foo/bar.gif?t=" + t.getTime() + "&r=" + r + "'>");
  </script>
  <noscript>
  <img width="109" height="52" src="http://ads.example.com/ad/foo/bar.gif?js=0">
  </noscript>

36
Hit-metering for ads (2)
  • GET /ad/foo/bar.gif?t=1090538707&r=0.510772917234983 HTTP/1.1
  • Host: ads.example.com
  • User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
  • Referer: http://www.example.com/foo/bar.php?abc=123&def=456
  • Cookie: uid=C50DF33E-E202-4206-B1F3-946AEDF9308B
  • HTTP/1.1 302 Moved Temporarily
  • Date: Wed, 28 Jul 2004 23:45:06 GMT
  • Cache-Control: max-age=0,no-cache,no-store
  • Expires: Tue, 11 Oct 1977 01:23:45 GMT
  • Pragma: no-cache
  • Location: http://img.example.net/i/foo/bar.gif
  • Content-Type: text/html
  • <a href="http://img.example.net/i/foo/bar.gif">Moved</a>

37
Hit-metering for ads (3)
  • GET /i/foo/bar.gif HTTP/1.1
  • Host: img.example.net
  • User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
  • Referer: http://www.example.com/foo/bar.php?abc=123&def=456
  • HTTP/1.1 200 OK
  • Date: Wed, 28 Jul 2004 23:45:07 GMT
  • Last-Modified: Mon, 05 Oct 1998 18:32:51 GMT
  • ETag: "69079e-ad91-40212cc8"
  • Cache-Control: public,max-age=315360000
  • Expires: Mon, 28 Jul 2014 23:45:07 GMT
  • Content-Length: 6096
  • Content-Type: image/gif
  • GIF89a...
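
Putting the three hit-metering slides together, the ad server's job is to count the request and send an uncacheable 302 that points at the long-lived image. Below is a minimal sketch of such a handler; the function name, the in-memory counter, and the /ad/ to /i/ path mapping are assumptions inferred from the example URLs, not a description of any real ad server.

```python
hit_counts = {}

IMAGE_HOST = "http://img.example.net"  # cookie-free image host from the slides


def handle_ad_request(path_with_query):
    """Count the hit, then redirect to the highly cacheable image URL."""
    path = path_with_query.split("?", 1)[0]      # drop the cache-busting query string
    hit_counts[path] = hit_counts.get(path, 0) + 1
    image_path = path.replace("/ad/", "/i/", 1)  # assumed mapping, based on the example URLs
    headers = {
        # The redirect itself must never be cached, or hits would go uncounted.
        "Cache-Control": "max-age=0,no-cache,no-store",
        "Expires": "Tue, 11 Oct 1977 01:23:45 GMT",
        "Pragma": "no-cache",
        "Location": IMAGE_HOST + image_path,
    }
    return 302, headers
```

The expensive part (the image bytes) is served once and cached for years, while the cheap redirect is fetched on every impression.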

38
Defeating proxies: turning public caches into private caches
  • Use distinct tokens in URL
  • No two users use same token
  • Defeats shared proxy caches
  • Works well with private caches
  • Doesn't break the back button
  • May break visited-link highlighting
  • e.g. JavaScript timestamps/random numbers
  • Every link is blue, no purple
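
One possible way to generate distinct tokens on the server side is sketched below. The `tok` parameter name and the function are hypothetical; the slides only mention JavaScript timestamps and random numbers as examples.

```python
import secrets


def tokenized_url(base_url):
    """Append a random token so that no two users share the same URL."""
    token = secrets.token_urlsafe(16)
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}tok={token}"
```

If the token is generated once per user and reused, that user's browser cache can still serve the page on Back, while a shared proxy never sees the same URL from two different users.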

39
Breaking the Back button
  • When users click browser Back button
  • Expect to go back one page instantly
  • Private cache enables this behavior
  • Aggressive cache-busting breaks Back button
  • Server sends Pragma: no-cache or Expires in past
  • Browser must re-visit server to re-fetch page
  • Hitting network much slower than hitting disk
  • Use very sparingly
  • Compromising user experience is A Bad Thing

40
Review: Top 5 techniques
  1. Use Cache-Control: private for personalized content
  2. Implement "Images Never Expire" policy
  3. Use a cookie-free TLD for static content
  4. Use Apache defaults for CSS & JavaScript
  5. Use random strings in URL for accurate hit metering or very sensitive content

41
Review: encouraging caching
  • Send explicit Cache-Control or Expires
  • Generate static content headers
  • Last-Modified, ETag
  • Content-Length
  • Avoid cgi-bin, .cgi or ? in URLs
  • Some proxies (e.g. Squid) won't cache
  • Workaround: use PATH_INFO instead

42
Review: discouraging caching
  • Use POST instead of GET
  • Use random strings and ? char in URL
  • Omit Content-Length & Last-Modified
  • Send explicit headers on response
  • Breaks the back button
  • Only as a last resort
  • Cache-Control: max-age=0,no-cache,no-store
  • Expires: Thu, 01 Jan 1970 12:34:56 GMT
  • Pragma: no-cache

43
Recommended Reading
  • Web Caching and Replication
  • Michael Rabinovich & Oliver Spatscheck
  • Addison-Wesley, 2001
  • Web Caching
  • Duane Wessels
  • O'Reilly, 2001

44
Wrapping Up
  • Please fill out Session Eval Form
  • Session: WE16
  • Title: HTTP Caching & Cache-busting
  • Speaker: Michael Radwin
  • Slides online
  • http://public.yahoo.com/radwin/
