Title: CS193H: High Performance Web Sites Lecture 16: Rule 13
1 CS193H High Performance Web Sites, Lecture 16
Rule 13: Configure ETags
- Steve Souders
- Google
- souders_at_cs.stanford.edu
2 announcements
- 11/17 guest lecturer Robert Johnson (Facebook), "Fast Data at Massive Scale - lessons learned at Facebook"
3 Expires
GET /v-app/scripts/107652916-dom.common.js HTTP/1.1
Host: www.blogger.com
User-Agent: Mozilla/5.0 () Gecko/2008070208 Firefox/3.0.1
Accept-Encoding: gzip,deflate

HTTP/1.1 200 OK
Content-Type: application/x-javascript
Last-Modified: Mon, 22 Sep 2008 21:14:35 GMT
Content-Length: 2066
Content-Encoding: gzip

XmoÛHþ\ÿFÖvãwØoq...

HTTP/1.1 200 OK
Content-Type: application/x-javascript
Last-Modified: Mon, 22 Sep 2008 21:14:35 GMT
Content-Length: 2066
Content-Encoding: gzip
Expires: Fri, 26 Sep 2008 22:00:00 GMT

XmoÛHþ\ÿFÖvãwØoq...

- expiration date determines freshness
- can also use Cache-Control: max-age
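A minimal sketch of the freshness check described above, assuming a cache that stores the response headers in a plain dict along with the time the response was received. The function name `is_fresh` and its parameters are illustrative, not part of any browser or library API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers, response_time, now):
    """Return True if a cached response may be reused without asking the server."""
    # Cache-Control: max-age takes precedence over Expires when both are present.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            # max-age is counted in seconds from when the response was received.
            return (now - response_time).total_seconds() < int(value)
    expires = headers.get("Expires")
    if expires is None:
        # No explicit expiration: this sketch simply treats the entry as stale.
        return False
    return now < parsedate_to_datetime(expires)


# Usage with the Expires value from the slide.
received = datetime(2008, 9, 22, 22, 0, 0, tzinfo=timezone.utc)
headers = {"Expires": "Fri, 26 Sep 2008 22:00:00 GMT"}
print(is_fresh(headers, received, datetime(2008, 9, 24, 12, 0, 0, tzinfo=timezone.utc)))
```

A fresh entry is served straight from the cache with no HTTP request at all; only a stale entry leads to the conditional GET on the next slide.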
4 Conditional GET (IMS)
sometime after 3pm PT 9/24/08
GET /v-app/scripts/107652916-dom.common.js HTTP/1.1
Host: www.blogger.com
User-Agent: Mozilla/5.0 () Gecko/2008070208 Firefox/3.0.1
Accept-Encoding: gzip,deflate
If-Modified-Since: Mon, 22 Sep 2008 21:14:35 GMT

HTTP/1.1 304 Not Modified

- IMS determines validity: does the browser's cached version match what's on the server?
- the comparison is based on the resource's date
- a 304 response is sent instead of all the data
- IMS is used when Reload is pressed
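The date comparison can be sketched from the server's side as follows. This is a simplified illustration under the assumption that the server knows the resource's Last-Modified value as an HTTP date string; `respond_to_ims` is a hypothetical helper, not a real server API.

```python
from email.utils import parsedate_to_datetime


def respond_to_ims(request_headers, last_modified):
    """Return (status, send_body) for a GET that may carry If-Modified-Since."""
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            # Hit: the resource has not changed since the date the browser sent.
            if parsedate_to_datetime(ims) >= parsedate_to_datetime(last_modified):
                return 304, False
        except (TypeError, ValueError):
            pass  # unparsable date: fall through to a full response
    return 200, True


# Usage with the date from the slide.
stamp = "Mon, 22 Sep 2008 21:14:35 GMT"
print(respond_to_ims({"If-Modified-Since": stamp}, stamp))
```

The 304 carries headers only, so the 2066-byte body in the example is never resent.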
5 ETag Response Header
GET /v-app/scripts/107652916-dom.common.js HTTP/1.1
Host: www.blogger.com
User-Agent: Mozilla/5.0 () Gecko/2008070208 Firefox/3.0.1
Accept-Encoding: gzip,deflate

HTTP/1.1 200 OK
Content-Type: application/x-javascript
Last-Modified: Mon, 22 Sep 2008 21:14:35 GMT
Content-Length: 2066
Content-Encoding: gzip

XmoÛHþ\ÿFÖvãwØoq...

HTTP/1.1 200 OK
Content-Type: application/x-javascript
Last-Modified: Mon, 22 Sep 2008 21:14:35 GMT
Content-Length: 2066
Content-Encoding: gzip
Expires: Fri, 26 Sep 2008 22:00:00 GMT
ETag: "19f1e-7920-4525b037f0440"

XmoÛHþ\ÿFÖvãwØoq...
6 Conditional GET (INM)
sometime after 3pm PT 9/24/08
GET /v-app/scripts/107652916-dom.common.js HTTP/1.1
Host: www.blogger.com
User-Agent: Mozilla/5.0 () Gecko/2008070208 Firefox/3.0.1
Accept-Encoding: gzip,deflate
If-Modified-Since: Mon, 22 Sep 2008 21:14:35 GMT
If-None-Match: "19f1e-7920-4525b037f0440"

HTTP/1.1 304 Not Modified

- alternative way to test validity
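As the request above shows, the browser echoes back whichever validators it received. A small sketch of that step, assuming the cached response headers are kept in a dict; the helper name `conditional_headers` is made up for illustration.

```python
def conditional_headers(cached_response_headers):
    """Build the validation headers for a conditional GET from a cached response."""
    headers = {}
    if "Last-Modified" in cached_response_headers:
        # Last-Modified comes back as If-Modified-Since (IMS).
        headers["If-Modified-Since"] = cached_response_headers["Last-Modified"]
    if "ETag" in cached_response_headers:
        # ETag comes back as If-None-Match (INM), quotes included.
        headers["If-None-Match"] = cached_response_headers["ETag"]
    return headers


cached = {
    "Last-Modified": "Mon, 22 Sep 2008 21:14:35 GMT",
    "ETag": '"19f1e-7920-4525b037f0440"',
}
print(conditional_headers(cached))
```

When the response carried both validators, both headers are sent, which is what sets up the hit/miss combinations discussed next.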
7 What is an ETag
- http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.11
- added in HTTP/1.1
- used by clients and servers to validate expired resources
- more flexible than Last-Modified date
- "An entity tag consists of an opaque quoted string"
- "An entity tag MUST be unique across all versions of all entities associated with a particular resource."
8 If-None-Match (hit)
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.26
- "If any of the entity tags match the entity tag
of the entity that would have been returned in
the response to a similar GET request (without
the If-None-Match header) on that resource,
then the server MUST NOT perform the requested
method, unless required to do so because the
resource's modification date fails to match that
supplied in an If-Modified-Since header field in
the request. Instead, if the request method was
GET or HEAD, the server SHOULD respond with a 304
(Not Modified) response,"
9 INM, IMS hit miss
                         If-Modified-Since
                         hit      miss
If-None-Match    hit     304      full response
                 miss
10 If-None-Match (miss)
- "If none of the entity tags match, then the server
MAY perform the requested method as if the
If-None-Match header field did not exist, but
MUST also ignore any If-Modified-Since header
field(s) in the request. That is, if no entity
tags match, then the server MUST NOT return a 304
(Not Modified) response."
11 INM, IMS hit miss
                         If-Modified-Since
                         hit              miss
If-None-Match    hit     304              full response
                 miss    full response    full response

- if not managed properly, sending both IMS and INM lowers the chances of a simple, small 304 response
- How could it not be managed properly?!
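The table above can be expressed as a small decision function. This is a sketch of the RFC 2616 rules quoted on the previous slides, with two simplifying assumptions: dates are compared as exact strings, and weak validators and the `*` wildcard are ignored. The name `validate` is hypothetical.

```python
def validate(request_headers, current_etag, current_last_modified):
    """Return 304 or 200 following the INM/IMS hit-miss table."""
    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    ims_hit = ims is not None and ims == current_last_modified

    if inm is not None:
        tags = [tag.strip() for tag in inm.split(",")]
        if current_etag not in tags:
            # INM miss: IMS must be ignored and a 304 is not allowed.
            return 200
        if ims is not None and not ims_hit:
            # INM hit but the modification date fails to match.
            return 200
        return 304
    # No INM header: IMS alone decides.
    return 304 if ims_hit else 200


etag = '"19f1e-7920-4525b037f0440"'
stamp = "Mon, 22 Sep 2008 21:14:35 GMT"
print(validate({"If-None-Match": etag, "If-Modified-Since": stamp}, etag, stamp))
```

Only the hit/hit cell yields a 304, so an ETag that differs for no real reason turns a cheap validation into a full download even when the date still matches.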
12 Apache ETags
- "19f1e-7920-4525b037f0440"
- "inode-size-timestamp"
- inode: used by filesystems to store file type, owner, group, permissions, etc.
- inode for the same file differs across servers even if file size, timestamp, and directory are the same
- http://stevesouders.com/images/arrow-right-9x13.png
  - ETag: "21f5315-d4-5d51f0c0"
- http://1.cuzillion.com/images/arrow-right-9x13.png
  - ETag: "1ee57ec-d4-5d51f0c0"
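To see why the inode component breaks matching, here is a rough sketch that builds an "inode-size-timestamp" tag from `os.stat`, next to a variant that leaves the inode out. The hex formatting only approximates what Apache emits, and both function names are made up for this example. Two files in one temporary directory stand in for copies of the same image on two servers.

```python
import os
import tempfile


def inode_etag(path):
    """Tag in the style "inode-size-timestamp" (hex fields, approximate format)."""
    st = os.stat(path)
    return '"%x-%x-%x"' % (st.st_ino, st.st_size, st.st_mtime_ns // 1000)


def portable_etag(path):
    """Same idea without the inode, so identical copies get identical tags."""
    st = os.stat(path)
    return '"%x-%x"' % (st.st_size, st.st_mtime_ns // 1000)


def compare_copies():
    """Write the same bytes with the same mtime to two files and tag both."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, name) for name in ("server1.png", "server2.png")]
        for path in paths:
            with open(path, "wb") as f:
                f.write(b"same bytes on both servers")
            os.utime(path, (1222118075, 1222118075))  # arbitrary identical timestamp
        return [inode_etag(p) for p in paths], [portable_etag(p) for p in paths]


inode_tags, portable_tags = compare_copies()
print(inode_tags)
print(portable_tags)
```

The two inode-based tags differ only in the first field, just like the stevesouders.com and cuzillion.com examples above, while the size and timestamp fields agree.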
13 IIS ETags
- "b4f35327edac51:113f"
- "timestamp:changenumber"
- changenumber: counter to track IIS configuration changes
- changenumber rarely the same across servers
- http://hp.msn.com/global/c/hpv10/favicon.ico
  - ETag: "b4f35327edac51:113f"
  - ETag: "b4f35327edac51:e6e"
14 example ETag miss
- GET /global/c/hpv10/favicon.ico HTTP/1.1
- Host: hp.msn.com
- If-Modified-Since: Wed, 26 Oct 2005 22:39:58 GMT
- If-None-Match: "b4f35327edac51:19bc"
- HTTP/1.x 200 OK
- Content-Length: 1406
- Etag: "b4f35327edac51:d76"
- Last-Modified: Wed, 26 Oct 2005 22:39:58 GMT
- Expires: Wed, 06 Feb 2008 01:10:16 GMT
Last-Modified matches (but IMS misses)
timestamp is the same
changenumber differs, validation misses, entire body is resent
validation miss
15 the problem with ETags
- the default ETag syntax in Apache and IIS makes it unlikely that INM will match across servers, even when the resource is the same
- probability of an incorrect INM miss
  - (n-1)/n where "n" is the number of servers
  - not an issue if you just have one server
- http://www.apacheweek.com/issues/02-01-18
  - "can cause an unnecessary performance hit as resources are fetched more often than is required"
- http://support.microsoft.com/kb/922703
  - "IIS 6.0 sends a 200 response because it considers the different change numbers to mean that the resources are not the same versions"
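The (n-1)/n figure can be checked with a few lines of arithmetic. This sketch assumes every server generates a different ETag for the same resource and that each request is equally likely to land on any of the n servers; the function name is illustrative.

```python
from fractions import Fraction


def incorrect_miss_probability(n):
    """Chance that a revalidation lands on a server other than the one that issued the ETag."""
    if n < 1:
        raise ValueError("need at least one server")
    # Only 1 of the n servers would produce a matching ETag.
    return Fraction(n - 1, n)


for servers in (1, 2, 4, 10):
    print(servers, incorrect_miss_probability(servers))
```

With one server the probability is 0, and it climbs toward 1 as servers are added, so larger clusters are hit hardest.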
16 the solution for ETags
- if you're not leveraging ETags, turn them off
- reduces size of requests and responses
- reduces outbound traffic from your servers
- increases proxy cache hit rate
- Apache
  - FileETag none
- IIS
  - synchronize changenumber across servers
  - http://support.microsoft.com/kb/922703/
17 ETags in the wild
site                       server              ETags?       default syntax?
www.aol.com                AOLserver           no
www.ebay.com               IIS                 yes          yes
www.facebook.com           Apache              no
www.google.com/search      gws                 no
search.live.com/results    ASP.NET             yes          no
www.msn.com                IIS                 no
www.myspace.com            Apache              some         no
en.wikipedia.org/wiki      Apache, lighttpd    some, yes    no, ?
www.yahoo.com              YTS                 no
www.youtube.com            btfe                no
18 possible uses for ETags
19 Homework
- 11/7 11:59pm: rules 4-10 applied to your "Improving a Top Site" class project
- 11/12 3:15pm: Web 100 Double Check
  - look at your rows in Web 100 spreadsheet
  - double-check your entries for any rows in red
  - update incorrect entries
  - enter "y" in "Double Checked" column
- read HPWS Chapter 14
20 Questions
- Why were ETags introduced in HTTP/1.1?
- What do "IMS" and "INM" stand for?
- How do IMS and INM interplay during resource validation?
- What's the default syntax for ETags in Apache and IIS?
- What component in each default syntax hurts performance, and why?
- What are three performance gains you can achieve by turning off ETags?