HTTP Caching

Slides: 40
Provided by: michael341
Learn more at: https://www.radwin.org

1
HTTP Caching & Cache-Busting for Content Publishers
  • Michael J. Radwin
  • O'Reilly Open Source Convention
  • July 28, 2004

2
Publishers must think about caching
  • Publishers have a lot of web content
  • HTML, images, Flash, movies
  • Speed is an important part of user experience
  • Bandwidth is expensive
  • Use what you need, but avoid unnecessary extra
  • Personalization differentiates
  • Show timely data (stock quotes, news stories)
  • Get accurate advertising statistics
  • Protect sensitive info (e-mail, account balances)

3
HTTP Review
(1) Client connects to www.example.com port 80
(2) Client sends HTTP GET request
[Diagram: Client, Internet, Server]
4
HTTP Review (cont'd)
(3) Client reads HTTP response from server
(4) Client and Server close connection
[Diagram: Client, Internet, Server]
5
HTTP Example
  • mradwin@machshav$ telnet www.example.com 80
  • Trying 192.168.37.203...
  • Connected to w6.example.com.
  • Escape character is '^]'.
  • GET /foo/index.html HTTP/1.1
  • Host: www.example.com
  • HTTP/1.1 200 OK
  • Date: Wed, 28 Jul 2004 23:36:12 GMT
  • Last-Modified: Fri, 23 Jul 2004 01:52:37 GMT
  • Content-Length: 3688
  • Connection: close
  • Content-Type: text/html
  • <html><head>
  • <title>Hello World</title>
  • ...

6
Browsers use private caches
[Diagram: Client, Internet, Server]
Client to Server:
GET /foo/index.html HTTP/1.1
Host: www.example.com
Server to Client:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul 2004 01:52:37 GMT
Content-Length: 3688
Content-Type: text/html
Client stores a copy of http://www.example.com/foo/index.html on its hard disk with a timestamp.
7
Revalidation (Conditional GET)
[Diagram: Client, Internet, Server]
Client to Server:
GET /foo/index.html HTTP/1.1
Host: www.example.com
If-Modified-Since: Fri, 23 Jul 2004 01:52:37 GMT
Server to Client:
HTTP/1.1 304 Not Modified
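The server side of a conditional GET can be sketched roughly as follows. This is a minimal illustration in Python, not the way Apache implements it: the function name `conditional_get` and its arguments are hypothetical, and only the `If-Modified-Since` / `Last-Modified` pair from the slide is handled.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers, last_modified, body):
    """Return (status, headers, body) for a GET that may carry If-Modified-Since."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            cached_at = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            cached_at = None  # unparseable date: fall through to a full response
        if cached_at is not None:
            if cached_at.tzinfo is None:  # treat a zone-less date as GMT
                cached_at = cached_at.replace(tzinfo=timezone.utc)
            if last_modified <= cached_at:
                return 304, headers, b""  # the client's copy is still current
    headers["Content-Length"] = str(len(body))
    return 200, headers, body


modified = datetime(2004, 7, 23, 1, 52, 37, tzinfo=timezone.utc)
page = b"<html><head><title>Hello World</title></head></html>"
first = conditional_get({}, modified, page)
again = conditional_get({"If-Modified-Since": first[1]["Last-Modified"]}, modified, page)
print(first[0], again[0])  # 200 304
```

The second call carries the timestamp the client saved from the first response, so the server can answer with a short 304 and no body.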
8
Non-Caching Proxy
[Diagram: Client, Proxy, Internet, Server]
Client to Proxy, then Proxy to Server:
GET /foo/index.html HTTP/1.1
Host: www.example.com
Server to Proxy, then Proxy to Client:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html
9
Proxy Cache Miss
[Diagram: Client, Proxy, Internet, Server]
Client to Proxy, then Proxy to Server:
GET /foo/index.html HTTP/1.1
Host: www.example.com
Server to Proxy, then Proxy to Client:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html
10
Proxy Cache Hit
[Diagram: Client, Proxy, Internet, Server]
Client to Proxy:
GET /foo/index.html HTTP/1.1
Host: www.example.com
Proxy to Client (server is not contacted):
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html
11
Proxy Cache Revalidation Hit
[Diagram: Client, Proxy, Internet, Server]
Client to Proxy:
GET /foo/index.html HTTP/1.1
Host: www.example.com
Proxy to Server:
GET /foo/index.html HTTP/1.1
Host: www.example.com
If-Modified-Since: Fri, 23 Jul ...
Server to Proxy:
HTTP/1.1 304 Not Modified
Proxy to Client:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html
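The three proxy outcomes shown on the preceding slides (miss, hit, revalidation hit) can be illustrated with a toy model. This is only a sketch: the `Origin` and `Proxy` classes are hypothetical, the proxy holds a single page, and the `max_age` freshness window that decides when the proxy revalidates is an assumption that the slides do not specify.

```python
class Origin:
    """Hypothetical origin server holding one page and its Last-Modified stamp."""

    def __init__(self, body, last_modified):
        self.body, self.last_modified = body, last_modified
        self.full_responses = 0

    def get(self, if_modified_since=None):
        if if_modified_since == self.last_modified:
            return 304, None  # nothing changed, no body sent
        self.full_responses += 1
        return 200, self.body


class Proxy:
    """Shared cache that revalidates its stored copy once it is older than max_age."""

    def __init__(self, origin, max_age):
        self.origin, self.max_age = origin, max_age
        self.entry = None  # (body, last_modified, stored_at)

    def get(self, now):
        if self.entry is None:
            _, body = self.origin.get()
            self.entry = (body, self.origin.last_modified, now)
            return "miss", body
        body, last_modified, stored_at = self.entry
        if now - stored_at <= self.max_age:
            return "hit", body  # served from cache, origin not contacted
        status, new_body = self.origin.get(if_modified_since=last_modified)
        if status == 304:
            self.entry = (body, last_modified, now)
            return "revalidated", body
        self.entry = (new_body, self.origin.last_modified, now)
        return "miss", new_body


origin = Origin(b"<html>...</html>", "Fri, 23 Jul 2004 01:52:37 GMT")
proxy = Proxy(origin, max_age=60)
outcomes = [proxy.get(now)[0] for now in (0, 30, 120)]
print(outcomes, origin.full_responses)  # ['miss', 'hit', 'revalidated'] 1
```

Only the first request costs the origin a full response; the later two are answered from the proxy's copy.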
12
Assumptions about content types
Rate of change once published: Frequently | Occasionally | Rarely/Never
Content type: HTML | CSS, JavaScript | Images, Flash, PDF
Content ranges from dynamic to static, and from personalized to same for everyone
13
Top 5 techniques for publishers
  1. Use Cache-Control: private for personalized content
  2. Implement "Images Never Expire" policy
  3. Use a cookie-free TLD for static content
  4. Use Apache defaults for CSS & JavaScript
  5. Use random strings in URL for accurate hit metering or very sensitive content

14
1. Use Cache-Control: private for personalized content
15
Bad caching of personalized content
[Diagram: Client 1, Proxy, Internet, Webmail Server]
Client 1 to Proxy, then Proxy to Webmail Server:
GET /msg3.html HTTP/1.1
Host: webmail.example.com
Cookie: user=jane
Webmail Server to Proxy, then Proxy to Client 1:
Jane's e-mail message

16
Bad caching of personalized content
[Diagram: Client 1, Proxy, Internet, Webmail Server]
Request from Client 1:
GET /msg3.html HTTP/1.1
Host: webmail.example.com
Cookie: user=jane
Proxy now holds a cached copy of msg3.html containing Jane's e-mail message
17
Bad caching of personalized content
[Diagram: Client 2, Proxy, Internet, Webmail Server]
Client 2 to Proxy:
GET /msg3.html HTTP/1.1
Host: webmail.example.com
Cookie: user=mary
Proxy to Client 2 (from its cached msg3.html):
Jane's e-mail message
18
What's cacheable?
  • HTTP/1.1 allows caching anything by default
  • Unless explicit Cache-Control header
  • In practice, most caches avoid anything with:
  • Cache-Control/Pragma header
  • Cookie/Set-Cookie headers
  • WWW-Authenticate/Authorization header
  • POST/PUT method
  • 302/307 status code
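The "in practice" list above can be read as a simple checklist. The following sketch encodes it literally; it is an illustration of the slide's rules of thumb, not the logic of any particular proxy, and the function name `shared_cache_would_skip` is hypothetical.

```python
def shared_cache_would_skip(method, status, request_headers, response_headers):
    """Return the reasons a conservative shared cache might refuse to store a response."""
    req = {name.lower() for name in request_headers}
    resp = {name.lower() for name in response_headers}
    reasons = []
    if method.upper() in {"POST", "PUT"}:
        reasons.append("method")
    if status in {302, 307}:
        reasons.append("status")
    if {"cache-control", "pragma"} & resp:
        reasons.append("cache directive")
    if "cookie" in req or "set-cookie" in resp:
        reasons.append("cookie")
    if "authorization" in req or "www-authenticate" in resp:
        reasons.append("authentication")
    return reasons


plain = shared_cache_would_skip("GET", 200, {"Host": "www.example.com"}, {"Content-Type": "image/gif"})
with_cookie = shared_cache_would_skip("GET", 200, {"Cookie": "user=jane"}, {})
print(plain, with_cookie)  # [] ['cookie']
```

An empty list means none of the slide's warning signs are present, so the response is a good caching candidate.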

19
Cache-Control: private
  • Shared caches bad for personalized content
  • Mary shouldn't be able to read Jane's webmail
  • Private caches perfectly OK
  • Speed up web browsing experience
  • Avoid personalization leakage with single line in httpd.conf or .htaccess
  • Header set Cache-Control private
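When the personalized page is generated by application code rather than served through Apache's `Header` directive, the same header can be set in the response itself. A minimal WSGI sketch, assuming a hypothetical `webmail_app` and a deliberately naive single-cookie parser:

```python
def webmail_app(environ, start_response):
    """Tiny WSGI app standing in for a personalized page (hypothetical)."""
    # Naive parsing that assumes exactly one cookie of the form user=<name>.
    user = environ.get("HTTP_COOKIE", "user=anonymous").split("=", 1)[-1]
    body = f"Inbox for {user}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Browsers may keep a copy; shared proxies must not.
        ("Cache-Control", "private"),
    ])
    return [body]
```

Any WSGI server (for example `wsgiref.simple_server` from the standard library) can host this app for experimentation.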

20
2. Images Never Expire policy
21
The "Images Never Expire" Policy
  • Encourage caching of icons & logos
  • "Forever" ≈ 10 years in Internet biz
  • Must change URL when you change image
  • http://us.yimg.com/i/new.gif
  • http://us.yimg.com/i/new2.gif
  • Tradeoff:
  • More difficult for designers
  • Bandwidth savings, faster user experience
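The slide renames images by hand (new.gif becomes new2.gif). One way to ease the burden on designers is to derive the name from the file's content so the URL changes automatically whenever the image does. This is a different technique from the manual renaming shown above, sketched here with a hypothetical `versioned_name` helper:

```python
import hashlib


def versioned_name(filename, content):
    """Insert a short content hash before the extension: new.gif -> new.<hash>.gif."""
    stem, dot, ext = filename.rpartition(".")
    digest = hashlib.sha1(content).hexdigest()[:8]
    if not dot:  # no extension at all
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"


old = versioned_name("new.gif", b"GIF89a old pixels")
new = versioned_name("new.gif", b"GIF89a new pixels")
print(old != new)  # True
```

Because an unchanged image always maps to the same name, it stays cached; an edited image gets a fresh URL that no cache has seen.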

22
Images Never Expire (mod_expires)
  • Works with both HTTP/1.0 and HTTP/1.1
  • ExpiresActive On
  • ExpiresByType image/gif A315360000
  • ExpiresByType image/jpeg A315360000
  • ExpiresByType image/png A315360000
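In the directives above, `A315360000` means "access time plus 315,360,000 seconds". A short sketch of the resulting `Expires` value, assuming a response served at the hypothetical time of 28 Jul 2004 23:30:00 GMT:

```python
import calendar
from email.utils import formatdate

TEN_YEARS = 315360000  # seconds: the number after "A" in the ExpiresByType lines


def expires_header(access_time):
    """Format the Expires value for a response served at access_time (epoch seconds)."""
    return formatdate(access_time + TEN_YEARS, usegmt=True)


served = calendar.timegm((2004, 7, 28, 23, 30, 0))
print(expires_header(served))  # Sat, 26 Jul 2014 23:30:00 GMT
```

315,360,000 seconds is exactly 3,650 days, so the date lands two days short of the ten-year calendar anniversary because of the leap days in 2008 and 2012.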

23
Images Never Expire (mod_headers)
  • Works with HTTP/1.1 only
  • <FilesMatch "\.(gif|jpe?g|png)$">
  • Header set Cache-Control "max-age=315360000"
  • </FilesMatch>
  • Works with both HTTP/1.0 and HTTP/1.1
  • <FilesMatch "\.(gif|jpe?g|png)$">
  • Header set Expires "Mon, 28 Jul 2014 23:30:00 GMT"
  • </FilesMatch>

24
mod_images_never_expire
  /* Enforce policy with module that runs at URI translation hook */
  static int translate_imgexpire(request_rec *r)
  {
      const char *ext;
      if ((ext = strrchr(r->uri, '.')) != NULL) {
          if (strcasecmp(ext, ".gif") == 0 || strcasecmp(ext, ".jpg") == 0 ||
              strcasecmp(ext, ".png") == 0 || strcasecmp(ext, ".jpeg") == 0) {
              if (ap_table_get(r->headers_in, "If-Modified-Since") != NULL ||
                  ap_table_get(r->headers_in, "If-None-Match") != NULL) {
                  /* Don't bother checking filesystem, just hand back a 304 */
                  return HTTP_NOT_MODIFIED;
              }
          }
      }
      return DECLINED;
  }

25
3. Cookie-free TLD for static content
26
Cookie-free TLD for static content
  • For maximum efficiency use two domains
  • www.example.com for HTML
  • static.example.net for images
  • Many proxies won't cache requests that carry a Cookie header
  • But multimedia is never personalized
  • Cookies would be ignored by the server anyway

27
Typical GET request w/Cookies
  • GET /i/foo/bar/quux.gif HTTP/1.1
  • Host: www.example.com
  • User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
  • Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
  • Cookie: UmtvtC1tp2MhYv9RL5BlpxYRFN_P8DpMJoamllEc
    A--uxIIr.ABun42vnticvufc8v brandflash1
    Bamfco1503sgp8b2 FaNC184LcsvfX96G.JR27qSjCHu
    7bII3s. tXa44psMLliFtVoJB_m5wecWY_.7bK1It
    LYCl_v2l_lv7l_lh03m8d50c8bo
    l_s3yu2qxz5zvwquwwuzv22wrwr5t3w1zsrl_lid14rsb7
    6l_ra8l_um1_0_1_0_0 GTSessionID8359908990238
    3599089902340645635 Yv1n6eecgejj7012f
    lh03m8d50c8bo/opm012o33013000007jb1647ra
    8lgusintlusnp1 PROMOSOURCEfp5 YGCVd
    TziTu.ABiZD/AB6dPWoqXibIcTzc0BjY3TzI3NTY0MzQ-a
    YAEskDAAwRz5HlDUN2Tdc2wBT0RBekFURXdPRFV3TWpFek
    5ETS0BYQFZQUUBb2sBWlcwLQF0aXABWUhaTVBBAXp6AWlUdS5B
    QmdXQQ--afQUFBQ0FDQURCOUFIQUJBQ0FEQUtBTE
    FNSDAmdHM9MTA5MDE4NDQxOCZwcz1lOG83MUVYcTYxOVouT2Ft
    c1ZFZUhBLS0- LYSl_fh0l_vomyla
    PAp0dg13DX4Ndgk-p16L5qmg--exMv.AB
    YP.usv2maddrd1525SRobertsonBlvd01LosAng
    eles01CA0190035-42310144800134.05159001-118.3
    8434201901a0190035
  • Referer: http://www.example.com/foo/bar.php?abc=123&def=456
  • Accept-Language: en-us,en;q=0.7,he;q=0.3
  • Accept-Encoding: gzip,deflate
  • Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
  • Keep-Alive: 300
  • Connection: keep-alive

28
Same request, no Cookies
  • GET /i/foo/bar/quux.gif HTTP/1.1
  • Host: static.example.net
  • User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
  • Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
  • Referer: http://www.example.com/foo/bar.php?abc=123&def=456
  • Accept-Language: en-us,en;q=0.7,he;q=0.3
  • Accept-Encoding: gzip,deflate
  • Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
  • Keep-Alive: 300
  • Connection: keep-alive
  • Added bonus: much smaller GET request
  • Dial-up MTU size 576 bytes, PPPoE 1492
  • 1450 bytes reduced to 550
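The MTU figures matter because a request that does not fit in one packet must be split. A rough calculation, assuming 40 bytes of TCP/IP header overhead per packet (20-byte IPv4 header plus 20-byte TCP header with no options, an assumption not stated on the slide):

```python
import math

TCP_IP_HEADERS = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def packets_needed(request_bytes, mtu):
    """Number of TCP segments needed to carry a request of the given size."""
    payload_per_packet = mtu - TCP_IP_HEADERS
    return math.ceil(request_bytes / payload_per_packet)


for label, mtu in (("dial-up", 576), ("PPPoE", 1492)):
    print(label, packets_needed(1450, mtu), packets_needed(550, mtu))
# dial-up 3 2
# PPPoE 1 1
```

Under these assumptions, dropping the cookies saves a packet on every image request over dial-up, while on PPPoE both requests already fit in a single packet.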

29
4. Apache defaults for static, occasionally-changing content
30
Revalidation works pretty well
  • Revalidation is the default behavior for static content
  • Browser sends If-Modified-Since request
  • Server replies with short 304 Not Modified
  • No fancy Apache config needed
  • Use if you can't predict when content will change
  • Page designers can change immediately
  • No renaming necessary
  • Cost: extra HTTP transaction for 304
  • Small with Keep-Alive, but large sites disable it

31
Techniques to encourage caching
  • Send explicit Cache-Control or Expires
  • Generate static content headers
  • Last-Modified, ETag
  • Content-Length
  • Avoid cgi-bin, .cgi or ? in URLs
  • Some proxies (e.g. Squid) won't cache
  • Use PATH_INFO instead
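For dynamically generated pages that are really static underneath, the "static content headers" listed above can be produced from file metadata. A minimal sketch: `static_headers` is a hypothetical helper, and the ETag format (size and modification time in hex) is an illustrative assumption rather than the scheme Apache uses.

```python
import os
from email.utils import formatdate


def static_headers(path):
    """Build Last-Modified, ETag and Content-Length from a file's metadata."""
    info = os.stat(path)
    return {
        "Last-Modified": formatdate(info.st_mtime, usegmt=True),
        # Assumed ETag scheme: "<size in hex>-<mtime in hex>"
        "ETag": '"%x-%x"' % (info.st_size, int(info.st_mtime)),
        "Content-Length": str(info.st_size),
    }
```

With these three headers present, caches have both a validator to revalidate against and a length to verify the stored copy.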

32
5. Random URL strings for accurate hit metering
or very sensitive content
33
Accurate advertising statistics
  • If you trust proxies
  • Send Cache-Control: must-revalidate
  • Count 304 Not Modified log entries as hits
  • If you don't
  • Ask client to fetch uncacheable image URL
  • Return 307 to highly cacheable image file
  • Count 307s as hits
  • Don't bother to look at the cacheable server's log
34
Hit-metering for advertisements (1)
  <script type="text/javascript">
  var r = Math.random();
  var t = new Date();
  document.write("<img width='109' height='52' src='http://ads.example.com/ad/foo/bar.gif?t=" + t.getTime() + ";r=" + r + "'>");
  </script>
  <noscript>
  <img width="109" height="52" src="http://ads.example.com/ad/foo/bar.gif?js=0">
  </noscript>

35
Hit-metering for advertisements (2)
  • GET /ad/foo/bar.gif?t=1090538707;r=0.510772917234983 HTTP/1.1
  • Host: ads.example.com
  • User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
  • Referer: http://www.example.com/foo/bar.php?abc=123&def=456
  • Cookie: uid=C50DF33E-E202-4206-B1F3-946AEDF9308B
  • HTTP/1.1 307 Temporary Redirect
  • Date: Wed, 28 Jul 2004 23:45:06 GMT
  • Cache-Control: max-age=0,no-cache,no-store
  • Expires: Tue, 11 Oct 1977 01:23:45 GMT
  • Pragma: no-cache
  • Location: http://static.example.net/i/foo/bar.gif
  • Content-Type: text/html
  • <a href="http://static.example.net/i/foo/bar.gif">Moved</a>
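The ad server's side of this exchange can be sketched as a small WSGI app that counts the request and then redirects to the cacheable image. This is an illustration only: `ad_redirect_app` and the in-memory `hit_counts` dictionary are hypothetical, and the mapping from `/ad/...` to `http://static.example.net/i/...` simply mirrors the URLs on the slide.

```python
AD_PREFIX = "/ad"
STATIC_BASE = "http://static.example.net/i"
hit_counts = {}  # in-memory tally; a real server would log or persist this


def ad_redirect_app(environ, start_response):
    """Count the request, then send the browser to the cacheable copy of the image."""
    path = environ.get("PATH_INFO", "")
    hit_counts[path] = hit_counts.get(path, 0) + 1
    image_path = path[len(AD_PREFIX):] if path.startswith(AD_PREFIX + "/") else path
    start_response("307 Temporary Redirect", [
        ("Location", STATIC_BASE + image_path),
        # The redirect itself must never be cached, or hits go uncounted.
        ("Cache-Control", "max-age=0,no-cache,no-store"),
        ("Pragma", "no-cache"),
        ("Content-Length", "0"),
    ])
    return [b""]
```

The redirect is tiny and uncacheable, while the image it points to can be cached for years, so accurate counting costs only one short extra response per impression.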

36
Hit-metering for advertisements (3)
  • GET /i/foo/bar.gif HTTP/1.1
  • Host: static.example.net
  • User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
  • Referer: http://www.example.com/foo/bar.php?abc=123&def=456
  • HTTP/1.1 200 OK
  • Date: Wed, 28 Jul 2004 23:45:07 GMT
  • Last-Modified: Mon, 05 Oct 1998 18:32:51 GMT
  • ETag: "69079e-ad91-40212cc8"
  • Cache-Control: public,max-age=315360000
  • Expires: Mon, 28 Jul 2014 23:45:07 GMT
  • Content-Length: 6096
  • Content-Type: image/gif
  • GIF89a...

37
Turning proxies into private caches
  • Use distinct tokens in URL
  • No two users use same token
  • Defeats shared proxy caches
  • Works well with private caches
  • Doesn't break the back button
  • May break visited-link highlighting
  • e.g. JavaScript timestamps/random numbers
  • Every link is blue, no purple
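Generating a distinct token per user can be sketched as follows. The query parameter name `tok` and the helper `add_user_token` are hypothetical; any value that no two users share would serve the same purpose.

```python
import secrets
from urllib.parse import urlencode, urlsplit, urlunsplit


def add_user_token(url, token=None):
    """Append a per-user token to the query string so shared caches see a unique URL."""
    token = token or secrets.token_urlsafe(16)
    parts = urlsplit(url)
    extra = urlencode({"tok": token})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


print(add_user_token("http://www.example.com/inbox", "abc"))
# http://www.example.com/inbox?tok=abc
```

Because each user's URL is unique, a shared proxy never finds a stored copy to hand to someone else, while the user's own browser cache still matches the URL on a Back-button visit.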

38
Breaking the Back button
  • When users click browser Back button
  • Expect to go back one page instantly
  • Private cache enables this behavior
  • Aggressive cache-busting breaks Back button
  • Server sends Pragma: no-cache or Expires in past
  • Browser must re-visit server to re-fetch page
  • Hitting network much slower than hitting disk
  • Use very sparingly
  • Compromising user experience is A Bad Thing

39
Review Top 5 techniques
  1. Use Cache-Control: private for personalized content
  2. Implement "Images Never Expire" policy
  3. Use a cookie-free TLD for static content
  4. Use Apache defaults for CSS & JavaScript
  5. Use random strings in URL for accurate hit metering or very sensitive content