Title: HTTP
HTTP: HyperText Transfer Protocol
Common Protocols
- In order for two remote machines to understand each other, they should
  - speak the same language
  - coordinate their talk
- The solution is to use protocols
- Examples
  - FTP: File Transfer Protocol
  - SMTP: Simple Mail Transfer Protocol
  - NNTP: Network News Transfer Protocol
  - HTTP: HyperText Transfer Protocol
Why Was HTTP Needed?
- According to Tim Berners-Lee (1991), a protocol was needed with the following features:
  - A subset of the file transfer protocol
  - The ability to request an index search
  - Automatic format negotiation
  - The ability to refer the client to another server
Proxy Server
[Figure: Proxy Server; Web Server www.cs.huji.ac.il:80; File System]
[Figure: Department Proxy Server; University Proxy Server; Israel Proxy Server; Web Server www.w3.org:80]
Terminology
- User agent: the client that initiates a request (browser, editor, Web robot, etc.)
- Origin server: the server on which a given resource resides (a Web server, a.k.a. HTTP server)
- Proxy: acts as both a server and a client
- Gateway: a server that acts as an intermediary for other servers
- Tunnel: acts as a blind relay between two applications; we can implement a custom protocol using HTTP tunneling
Resources
- A resource is a chunk of information that can be identified by a URL (Uniform Resource Locator)
- A resource can be
  - A file
  - A dynamically created page
- What we see on the browser can be a combination of several resources
Uniform Resource Locator
protocol://host:port/path?parameters#anchor
http://www.cs.huji.ac.il/dbi/index.html#info
http://www.google.com/search?hl=en&q=blabla
- There are other types of URLs
  - mailto:<account@site>
  - news:<newsgroup-name>
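As a quick sketch of the URL structure above, Python's standard urllib.parse.urlsplit splits a URL into its parts. The explicit port and the "q=1" parameter are illustrative additions, not taken from the slides.

```python
from urllib.parse import urlsplit

# Illustrative URL following protocol://host:port/path?parameters#anchor
url = "http://www.cs.huji.ac.il:80/dbi/index.html?q=1#info"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # www.cs.huji.ac.il
print(parts.port)      # 80
print(parts.path)      # /dbi/index.html
print(parts.query)     # q=1
print(parts.fragment)  # info

# A mailto: URL has no "//host" part; everything after the scheme is the path.
print(urlsplit("mailto:account@site").path)  # account@site
```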
In a URL
- Spaces in parameter values are represented by + (or %20)
- Special characters such as &, = and # are encoded in the form %xx, where xx is the character's ASCII code in hexadecimal (for example, & is encoded as %26)
- The inputs to the parameters are given as a list of pairs of a parameter and a value: var1=value1&var2=value2&var3=value3
Example: searching Google for "war&peace Tolstoy"
http://www.google.com/search?hl=en&q=war%26peace+Tolstoy
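The encoding rules above can be checked with Python's standard urllib.parse: urlencode turns the parameter/value pairs into a query string ("&" becomes %26, a space becomes "+"), and parse_qs decodes it back. A minimal sketch using the Google query from the example:

```python
from urllib.parse import urlencode, parse_qs

# The parameters of the example query, as parameter/value pairs.
params = {"hl": "en", "q": "war&peace Tolstoy"}

query = urlencode(params)  # joins pairs with "&", encodes "&" as %26 and " " as "+"
url = "http://www.google.com/search?" + query
print(url)  # http://www.google.com/search?hl=en&q=war%26peace+Tolstoy

# Decoding restores the original values.
print(parse_qs(query))  # {'hl': ['en'], 'q': ['war&peace Tolstoy']}
```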
An HTTP Session
- A basic HTTP session has four phases
- Client opens the connection (a TCP connection)
- Client makes a request
- Server sends a response
- Server closes the connection
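The four phases can be seen in a minimal sketch using raw sockets. The server here is a hypothetical toy that runs locally on 127.0.0.1 and answers a single request with a fixed "hello" body; it is not a real Web server.

```python
import socket
import threading

BODY = b"hello"

def serve_one_request(listener: socket.socket) -> None:
    """Toy server: accept one connection, read one request, reply, close."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # the request headers end with a blank line
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        # Phase 3: the server sends a response.
        head = (f"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n"
                f"Content-Length: {len(BODY)}\r\n\r\n")
        conn.sendall(head.encode("ascii") + BODY)
    # Phase 4: leaving the "with" block closes the connection.

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=serve_one_request, args=(listener,))
server.start()

# Phase 1: the client opens a TCP connection.
with socket.create_connection(("127.0.0.1", port)) as client:
    # Phase 2: the client makes a request.
    client.sendall(b"GET /index.html HTTP/1.0\r\nHost: localhost\r\n\r\n")
    # Read the response until the server closes the connection.
    response = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        response += chunk

server.join()
listener.close()
print(response.decode("ascii"))
```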
Nesting in Page
[Figure: index.html]
What we see on the browser can be a combination of several resources
Nested Objects
- Suppose a client accesses a page containing 10 inline images. How many sessions will be required to display the page completely?
- The answer is 11 HTTP sessions. Why?
- Some browsers/servers support a feature called keep-alive, which can keep the connection open until it is explicitly closed
- How can this help?
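The arithmetic behind the question can be written down as a tiny function. This is a simplified sketch that assumes all objects live on the same server, requests are sent one after another, and a keep-alive connection stays open for the whole page.

```python
def connections_needed(inline_objects: int, keep_alive: bool) -> int:
    """TCP connections needed to fetch a page and its inline objects.

    Simplifying assumptions: every object is on the same server, and with
    keep-alive the server never closes the connection in the middle.
    """
    requests = 1 + inline_objects      # the page itself plus each inline object
    return 1 if keep_alive else requests  # without keep-alive: one session per request

print(connections_needed(10, keep_alive=False))  # 11
print(connections_needed(10, keep_alive=True))   # 1
```

With keep-alive the number of requests stays 11; only the number of TCP connections (and their setup cost) drops.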
Stateless Protocol
- HTTP is a stateless protocol, which means that once a server has delivered the requested data to a client, the server retains no memory of what has just taken place (even if the connection is keep-alive)
- What are the difficulties in working with a stateless protocol?
- How would you implement a site for buying some items?
- So why don't we have states in HTTP?
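One common answer to the shopping-site question is to keep the state on the server and have the client send back an identifier with every request. The sketch below is hypothetical (the carts store and add_to_cart function are illustrative names), and it leaves out how the id travels, e.g., in a cookie or in the URL.

```python
import secrets
from typing import Dict, List, Optional

# Hypothetical server-side store: session id -> shopping cart.
carts: Dict[str, List[str]] = {}

def add_to_cart(session_id: Optional[str], item: str) -> str:
    """Handle one "add item" request and return the session id to send back.

    HTTP keeps no memory between requests, so the client must present the
    same session id on every request for the server to find its cart.
    """
    if session_id not in carts:
        session_id = secrets.token_hex(16)  # unknown visitor: start a new session
        carts[session_id] = []
    carts[session_id].append(item)
    return session_id

sid = add_to_cart(None, "book")  # first request: no session id yet
sid = add_to_cart(sid, "pen")    # later request: the client sends the id back
print(carts[sid])                # ['book', 'pen']
```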
The Format of HTTP Requests and Responses
- An initial line
- Zero or more header lines
- A blank line (i.e., a CRLF by itself), and
- An optional message body (e.g., a file, query data, or query output)
- Note: CRLF = \r\n (ASCII 13 followed by ASCII 10)
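The four parts of a message can be assembled with a small helper. A minimal sketch (build_message is a hypothetical name) that ends every line with CRLF and adds the empty line before the optional body:

```python
CRLF = "\r\n"

def build_message(initial_line: str, headers: dict, body: bytes = b"") -> bytes:
    """Initial line, zero or more header lines, a blank line, an optional body."""
    lines = [initial_line] + [f"{name}: {value}" for name, value in headers.items()]
    head = CRLF.join(lines) + CRLF + CRLF  # last header line's CRLF, then the blank line
    return head.encode("ascii") + body

msg = build_message("HTTP/1.0 200 OK",
                    {"Content-Type": "text/plain", "Content-Length": "5"},
                    b"hello")
print(msg)  # b'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello'
```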
Headers
How do we know who the host is when there is no Host header?
- HTTP 1.0 defines 16 headers
  - None are required
- HTTP 1.1 defines 46 headers
  - One header (Host) is required in HTTP 1.1 requests
  - A request that is sent to a proxy also names the host in its request line, which carries the full URL (e.g., GET http://www.w3.org/ HTTP/1.1)
  - A response does not have to include any header
HTTP Requests
The Format of a Request
method SP URL SP version CRLF
header: value CRLF
header: value CRLF
CRLF
Entity Body
(SP is a single space)
Request Example
GET /index.html HTTP/1.1 CRLF
Accept: image/gif, image/jpeg CRLF
User-Agent: Mozilla/4.0 CRLF
Host: www.cs.huji.ac.il:80 CRLF
Connection: Keep-Alive CRLF
CRLF
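Reading such a request on the server side means splitting it at the blank line and then at spaces and colons. A simplified sketch (parse_request is a hypothetical helper) that assumes a well-formed request and ignores details such as repeated or folded header lines:

```python
def parse_request(raw: bytes):
    """Split a raw request into method, URL, version, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the blank line ends the headers
    lines = head.decode("ascii").split("\r\n")
    method, url, version = lines[0].split(" ")   # method SP URL SP version
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")      # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, url, version, headers, body

raw = (b"GET /index.html HTTP/1.1\r\n"
       b"Accept: image/gif, image/jpeg\r\n"
       b"User-Agent: Mozilla/4.0\r\n"
       b"Host: www.cs.huji.ac.il:80\r\n"
       b"Connection: Keep-Alive\r\n"
       b"\r\n")
method, url, version, headers, body = parse_request(raw)
print(method, url, version)  # GET /index.html HTTP/1.1
print(headers["host"])       # www.cs.huji.ac.il:80
```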
Request Methods
Common Request Methods
- GET returns the contents of the indicated document
- HEAD returns the header information for the indicated document
  - Useful for finding out info about a resource without retrieving it
- POST treats the document as an application and sends some data to it
More Request Methods
- PUT replaces the content of the document with some data
- DELETE deletes the indicated document
- TRACE invokes a remote loop-back of the request; the final recipient SHOULD reflect the message back to the client
- Usually, servers do not allow these methods
GET Request
- A request to get a resource from the Web
- The most frequently used method
- The request has no message body, but parameters
can be sent in the request URL (i.e., the URL
without the host part)
HEAD Request
- A HEAD request asks the server to return the response headers only, and not the actual resource (i.e., no message body)
- This is useful for checking characteristics of a resource without actually downloading it, thus saving bandwidth
- Used for testing hypertext links for validity, accessibility and recent modification
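The difference between GET and HEAD can be observed with Python's standard http.server and http.client modules. In this sketch the server is a hypothetical local handler serving one small text resource; both methods get the same headers, but only GET gets a body.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Hypothetical handler serving a single small text resource."""
    BODY = b"hello, world"

    def send_head(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def do_HEAD(self):
        self.send_head()             # headers only, no message body

    def do_GET(self):
        self.send_head()
        self.wfile.write(self.BODY)  # headers followed by the body

    def log_message(self, format, *args):
        pass                         # keep the example's output quiet

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

results = {}
for method in ("HEAD", "GET"):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/")
    resp = conn.getresponse()
    results[method] = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()

server.shutdown()
server.server_close()
print(results["HEAD"])  # (200, '12', b'')
print(results["GET"])   # (200, '12', b'hello, world')
```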
POST Request
- A POST request can send data to the server
- POST is mostly used in form filling
- The data filled into the form are translated by the browser into a special format (e.g., application/x-www-form-urlencoded) and sent to a program on the server using the POST method
POST Request (cont.)
- A block of data is sent with the request, in the message body
- There are usually extra headers to describe this message body, like Content-Type and Content-Length
- The request URL is the URL of a program that handles the sent data, not a file
- The HTTP response is normally the output of a program, not a static file
POST Example
Here's a typical form submission, using POST:
POST /path/register.cgi HTTP/1.0
From: frog@cs.huji.ac.il
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 35

home=Ross+109&favorite+flavor=flies
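The body and its Content-Length in this example can be reproduced with urllib.parse.urlencode. A minimal sketch, assuming the form had a "home" field with the value "Ross 109" and a "favorite flavor" field with the value "flies":

```python
from urllib.parse import urlencode

# Assumed form fields behind the example body.
form = {"home": "Ross 109", "favorite flavor": "flies"}
body = urlencode(form).encode("ascii")
print(body)       # b'home=Ross+109&favorite+flavor=flies'
print(len(body))  # 35, the value of Content-Length

request = (b"POST /path/register.cgi HTTP/1.0\r\n"
           b"From: frog@cs.huji.ac.il\r\n"
           b"User-Agent: HTTPTool/1.0\r\n"
           b"Content-Type: application/x-www-form-urlencoded\r\n"
           b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
           b"\r\n" + body)  # blank line, then the message body
```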
Request Headers
HTTP 1.1 Request Headers
- The common request headers of HTTP 1.1 are described in the following slides:
  - Accept
  - Accept-Encoding
  - Authorization
  - Connection
  - Cookie
  - Host
  - If-Modified-Since
  - Referer
  - User-Agent
Accept Request Headers
- Accept
  - Specifies the MIME types that the client can handle (e.g., text/html, image/gif)
  - The server can send different content to different clients
- Accept-Encoding
  - Indicates the encodings (e.g., gzip) that the client can handle
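Sending different content to different clients means matching the Accept header against the types the server has. The sketch below is a simplified, hypothetical negotiation: it honours q-values and wildcards such as image/* and */*, but it does not implement the full precedence rules of the HTTP specification.

```python
def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0                                  # default quality value
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        ranges.append((fields[0], q))
    # sorted() is stable, so ranges with equal q keep the client's order
    return sorted(ranges, key=lambda r: r[1], reverse=True)

def matches(media_range, mime):
    """True if a media range such as text/html, text/* or */* covers mime."""
    if media_range in ("*/*", mime):
        return True
    main, _, sub = media_range.partition("/")
    return sub == "*" and main == mime.partition("/")[0]

def choose_type(accept, available):
    """Pick the available MIME type the client prefers, or None."""
    for media_range, q in parse_accept(accept):
        if q == 0:                               # q=0 means "not acceptable"
            continue
        for mime in available:
            if matches(media_range, mime):
                return mime
    return None

print(choose_type("image/gif, image/jpeg", ["image/png", "image/jpeg"]))        # image/jpeg
print(choose_type("text/html;q=0.5, text/plain", ["text/html", "text/plain"]))  # text/plain
```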
More Accept Request Headers
- Accept-Charset
- Accept-Language
Authorization Request Header
- Authorization
  - User identification for password-protected pages
  - Instead of HTTP authorization, you can use HTML forms to send the username/password and keep the login in server-side state (e.g., a session object)
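The slide does not say which authorization scheme is used; one common scheme is HTTP Basic authentication, where the header value is "Basic" followed by the Base64 encoding of "user:password". A minimal sketch, assuming UTF-8 for the credentials; note that Base64 is an encoding, not encryption.

```python
import base64

def basic_authorization(user: str, password: str) -> str:
    """Authorization header value for the Basic scheme (credentials assumed UTF-8)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

print(basic_authorization("Aladdin", "open sesame"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```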
Connection Request Header
- Connection
  - Connection: keep-alive means that the browser can handle a persistent connection
  - Keep-alive is the default in HTTP 1.1
  - In a persistent connection, the server can reuse the same socket for requests that arrive close together from the same client
  - Connection: close means that the connection is closed after the current request is answered
Content-Length Request Header
- In requests, this header is mainly used with POST (and PUT) requests, which carry a message body
- It specifies the size of the body (e.g., the POST data) in bytes
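Because Content-Length counts bytes, not characters, it must be computed on the encoded body. A minimal sketch, assuming a UTF-8 text body:

```python
text = "caf\u00e9"               # "café": 4 characters
body = text.encode("utf-8")      # "é" takes two bytes in UTF-8
print(len(text))                 # 4
print(len(body))                 # 5

headers = {"Content-Type": "text/plain; charset=utf-8",
           "Content-Length": str(len(body))}  # the byte count, not len(text)
```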
Cookie Request Header
- Gives back cookies previously sent to the client (by the server, in Set-Cookie response headers)
- Not part of the HTTP 1.1 specification, but widely supported (originally a Netscape extension; later specified separately, e.g., in RFC 6265)
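Python's standard http.cookies module can produce the Set-Cookie header a server sends and parse the Cookie header the browser later returns. A minimal sketch; the cookie names and values are hypothetical.

```python
from http.cookies import SimpleCookie

# 1. The server sends a cookie in a Set-Cookie response header.
outgoing = SimpleCookie()
outgoing["sessionid"] = "abc123"     # hypothetical session id
print(outgoing.output())             # Set-Cookie: sessionid=abc123

# 2. On later requests the browser returns it in the Cookie request header.
cookie_header = "sessionid=abc123; lang=en"
incoming = SimpleCookie()
incoming.load(cookie_header)
print(incoming["sessionid"].value)   # abc123
print(incoming["lang"].value)        # en
```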
Host Request Header
- Indicates the host and port as given in the original URL
- Required in HTTP 1.1
- Needed due to request forwarding and machines that have multiple hostnames
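On a machine with several hostnames, the server can use the Host header to decide which site to serve. A minimal sketch; the SITES table and the directory paths are hypothetical, and IPv6 literal hosts are not handled.

```python
# Hypothetical mapping from host names to document roots on one machine.
SITES = {
    "www.cs.huji.ac.il": "/var/www/cs",
    "www.example.org": "/var/www/example",
}

def document_root(host_header: str) -> str:
    """Choose the site to serve from a Host header of the form host[:port]."""
    host = host_header.split(":", 1)[0].lower()  # drop the optional port
    return SITES.get(host, "/var/www/default")

print(document_root("www.cs.huji.ac.il:80"))  # /var/www/cs
print(document_root("www.example.org"))       # /var/www/example
```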
If-Modified-Since Request Header
- This header indicates that the client wants the page only if it has been changed after the specified date
- If-Unmodified-Since is the reverse of If-Modified-Since
  - It is used for PUT requests ("update this document only if nobody else has changed it since I generated it")
The Format of the Date in If-Modified-Since and If-Unmodified-Since
- Greenwich Mean Time should be used, in the same format as in the Last-Modified header:
  Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT
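The standard email.utils module formats and parses dates in this GMT format. The sketch below also shows a simplified server-side check for If-Modified-Since: answer normally (200) if the resource changed after the given date, otherwise 304 (Not Modified). The status_for helper is a hypothetical name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

last_modified = datetime(1999, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
print("Last-Modified: " + format_datetime(last_modified, usegmt=True))
# Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT

def status_for(if_modified_since: str, last_modified: datetime) -> int:
    """200 if the resource changed after the given date, else 304 (Not Modified)."""
    since = parsedate_to_datetime(if_modified_since)  # "GMT" yields a UTC datetime
    return 200 if last_modified > since else 304

print(status_for("Fri, 31 Dec 1999 23:59:59 GMT", last_modified))  # 304
print(status_for("Thu, 30 Dec 1999 00:00:00 GMT", last_modified))  # 200
```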
Referer Request Header
- URL of the referring Web page
- Useful for tracking traffic
- It is logged by many servers
- Can be easily spoofed
- Note the spelling error: the correct spelling is "Referrer", but the header name is "Referer"
User-Agent Request Header
- The value of this header is a string identifying the browser making the request
- Use sparingly
- Again, can be easily spoofed