Title: CS352 Application Level Protocols
1CS352- Application Level Protocols
2Application-Level Protocols
- HTTP (web)
- FTP (file transfer)
- SMTP (mail)
- DNS (name lookup)
- Not really applications by OSI standards, but higher than level 4.
- Level 5 or 6?
3Themes
- Representation at different levels
- ASCII protocols
- Text-based
- How Messages are structured
- Request/response nature of these protocols
- Name Lookup
- Division of concerns (e.g. zones)
- Name to number mapping
- Reverse map
- Caching
4Application-Level overview
- Layer-4 provides a byte-stream
- Infinite, ordered stream of 8-bit bytes
- HTTP, SMTP, FTP use text messages built on layer-4 byte streams
- Simple ASCII protocols
- Messages are a sequence of text-based commands
- Like a Java string, but each character is in 7 or 8-bit ASCII, not 16-bit Unicode
- Control and data typically separated by a return (e.g., a carriage return/line feed pair of bytes)
5Representation by Level
Host A sends "GET index.html" to Host B. Representation at each layer:
- Layers 7-5: ASCII text strings (GET index.html)
- Layer 4: byte stream (71,69,84,32,105,110, ...)
- Layers 3-2: discrete packets (71,69,84 then 32,105,110)
- Layer 1: bit sequence (1000111, 1000101, ...) on the physical medium
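The layer-by-layer view above can be reproduced in a few lines. This is a minimal sketch, not part of the original slides; the 3-byte packet size is an assumption chosen only to mirror the figure.

```python
message = "GET index.html"

# Layers 7-5: the application sees an ASCII text string.
# Layer 4: the same text as a stream of 8-bit byte values.
byte_stream = list(message.encode("ascii"))

# Layer 3 and below: the stream travels in discrete packets.
# PACKET_SIZE = 3 is an illustrative assumption, not a real MTU.
PACKET_SIZE = 3
packets = [byte_stream[i:i + PACKET_SIZE]
           for i in range(0, len(byte_stream), PACKET_SIZE)]

# Layer 1: each byte leaves as a bit sequence (7 bits suffice for ASCII).
bits = [format(b, "07b") for b in byte_stream]

print(byte_stream[:6])  # [71, 69, 84, 32, 105, 110]
print(packets[:2])      # [[71, 69, 84], [32, 105, 110]]
print(bits[:2])         # ['1000111', '1000101']
```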
6 HTTP (Hyper Text Transfer Protocol)
7Overview
- Application Protocol for browsers, web-servers
- Simple ASCII protocol
- Additionally, HTTP has a notion of invoking methods on a named resource
- Resource can be anything named in a Uniform Resource Locator (URL)
- http://remus.rutgers.edu/newaccount.html
- Most often, an HTML file (but doesn't have to be!)
- Sometimes it's the output of a program
8URL Naming
- What does a URL refer to?
- HTML files?
- PDF documents
- Runnable programs (scripts)
- Java objects methods?
9Path of an HTTP request
Client Server Architecture
10HTTP Protocol Summary
- Client connects to server
- Client sends HTTP message request
- With GET, POST or HEAD methods
- Server sends HTTP message as a response
11HTTP Messages
- initial line
- method (request) or response code (response), plus the HTTP version
- zero or more header lines
- Information about message content
- a blank line
- optional message body
- a file, or client input, or server output
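The message layout above (initial line, header lines, blank line, optional body) can be taken apart with a short parser. This is a sketch under simple assumptions: the function name `parse_http_message` is hypothetical, lines end in CR/LF, and header continuation lines are ignored.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into (initial_line, headers, body)."""
    # The first blank line (CR LF CR LF) separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body


raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<html></html>")

initial_line, headers, body = parse_http_message(raw)
version, code, reason = initial_line.split(" ", 2)
print(version, code, reason)                # HTTP/1.0 200 OK
print(headers["content-type"], len(body))   # text/html 13
```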
12HTTP request message general format
13Common Response codes
- 2XX success codes
- 200 OK
- 3XX redirection codes
- 301 moved
- 4XX client errors
- 404 not found
- 5XX server errors
- 503 service unavailable (e.g., server overloaded)
14Example Client Message
- GET /newacct.html HTTP/1.0
- From: francis@rutgers.edu
- User-Agent: Mozilla-linux/4.7
- (blank line here)
15Example Server Response
- HTTP/1.0 404 Not Found
- (blank line here)
16Example Client Message
- GET /newaccount.html HTTP/1.0
- From: francis@rutgers.edu
- User-Agent: Mozilla-linux/4.7
- (blank line here)
17Example Server Response
Header (the first line carries the response code):
- HTTP/1.1 200 OK
- Date: Sun, 17 Sep 2000 23:12:51 GMT
- Server: Apache/1.3.3 (Unix)
- Last-Modified: Wed, 30 Aug 2000 02:12:01 GMT
- ETag: "1ac6-9c1-39ac6d71"
- Accept-Ranges: bytes
- Content-Length: 2497
- Connection: close
- Content-Type: text/html
Blank line separating header/body:
- (blank line here)
Body:
- <html>
- <head>
- <title>Building new accounts</title>
- </head>
- <body>
- <center>
- <img src="images/sample.jpg">
- ...
18MIME Headers
- Responses from servers to complete GET requests contain MIME information
- MIME: Multipurpose Internet Mail Extensions
- MIME allows media types other than simple ASCII text to be encoded into a message
- The Content-Type line in the MIME header indicates what type of data (type/subtype) is contained in the message
- Examples
- Content-Type: text/html
- Content-Type: image/gif
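As a rough illustration of how a server might pick the type/subtype for a file, Python's standard `mimetypes` module maps file names to MIME types. The helper name `content_type_header` and the fallback to `application/octet-stream` are assumptions made for this sketch.

```python
import mimetypes


def content_type_header(filename: str) -> str:
    """Return a Content-Type header line guessed from the file extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        # Assumed fallback for unknown extensions: opaque binary data.
        mime_type = "application/octet-stream"
    return f"Content-Type: {mime_type}"


print(content_type_header("newaccount.html"))  # Content-Type: text/html
print(content_type_header("sample.gif"))       # Content-Type: image/gif
```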
19POST Method
- What a browser submits when a form is sent to the server
- Stylized way of passing form data
- 2 ways to encode form data
- Fat URL via GET
- for older systems that didn't support POST
- POST method
20POST Requests
- Most commonly used by browsers to send large form responses to servers
- Forms are web pages that contain fields that the browser user can edit or change
21POST Requests (contd)
POST /index.html HTTP/1.1
language=any&message=this+is+a+message+to+the+server+being+sent+by+the+browser+with+a+POST+request
22Encoding form data with POST
- General form is
- variable1=value1&variable2=value2
- Spaces changed to +
- Other characters encoded (i.e., escaped) via % followed by the character's two-digit hexadecimal code
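The encoding rules above are what Python's standard `urllib.parse.urlencode` produces. A small sketch, with made-up field values:

```python
from urllib.parse import parse_qs, urlencode

# Illustrative form fields; the values are made up for this example.
form = {"language": "any", "message": "this is a message & more"}

body = urlencode(form)      # spaces become "+", "&" becomes "%26"
print(body)                 # language=any&message=this+is+a+message+%26+more

decoded = parse_qs(body)    # the server side reverses the encoding
print(decoded["message"])   # ['this is a message & more']
```

Note that `parse_qs` returns a list per variable, because a form may send the same variable name more than once.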
23Example Client POST request
- POST /cgi-bin/rats.cgi HTTP/1.0
- Referer: http://nes:8192/cgi-bin/rats.cgi
- Connection: Keep-Alive
- User-Agent: Mozilla/4.73 [en] (X11; U; Linux 2.2.12-20 i686)
- Host: nes:8192
- Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
- Accept-Encoding: gzip
- Accept-Language: en
- Accept-Charset: iso-8859-1,*,utf-8
- Content-type: application/x-www-form-urlencoded
- Content-length: 93
- (blank line here)
- Account=cs111fall&First=richard&Last=martin&SSN=123456789&Bday=01011980&.State=Create+Account
24HTTP in context
Server: A.B.C.D:80, Client: W.X.Y.Z. Events in time order:
- Server: ss = serverSocket(port 80)
- Client: cc = socket(A.B.C.D, 80)
- Server: sc = ss.accept()
- Client: out.print("GET /newaccount.html HTTP/1.0")
- Server: read input from socket, parse header, read data, find resource, build response header, send resource, write to socket
- Client: read header, read input, display HTML
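The exchange above can be sketched with Python's standard `socket` module. This is a toy, not a real web server: it binds to 127.0.0.1 on an OS-chosen port instead of port 80, serves a single request, and the page contents and all function names are hypothetical.

```python
import socket
import threading


def build_response(path: str) -> bytes:
    """Server side: find the resource and build response header + body."""
    pages = {"/newaccount.html": b"<html>Building new accounts</html>"}  # made-up content
    body = pages.get(path)
    if body is None:
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    header = ("HTTP/1.0 200 OK\r\n"
              "Content-Type: text/html\r\n"
              f"Content-Length: {len(body)}\r\n\r\n")
    return header.encode("ascii") + body


def serve_one(ss: socket.socket) -> None:
    """Accept one connection, answer one request, then close (HTTP/1.0 style)."""
    sc, _addr = ss.accept()
    with sc:
        request = b""
        while b"\r\n\r\n" not in request:      # read until the blank line
            chunk = sc.recv(1024)
            if not chunk:
                break
            request += chunk
        parts = request.split(b"\r\n", 1)[0].decode("ascii").split()
        path = parts[1] if len(parts) >= 2 else "/"
        sc.sendall(build_response(path))


def fetch(host: str, port: int, path: str) -> bytes:
    """Client side: connect, send a GET, read until the server closes."""
    with socket.create_connection((host, port)) as cc:
        cc.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        reply = b""
        while True:
            chunk = cc.recv(1024)
            if not chunk:
                break
            reply += chunk
        return reply


def demo() -> bytes:
    """Run the server in a thread and fetch one page from it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as ss:
        ss.bind(("127.0.0.1", 0))              # port 0: let the OS pick a free port
        ss.listen(1)
        port = ss.getsockname()[1]
        server = threading.Thread(target=serve_one, args=(ss,))
        server.start()
        reply = fetch("127.0.0.1", port, "/newaccount.html")
        server.join()
        return reply


if __name__ == "__main__":
    print(demo().decode("ascii"))
```

Closing the socket after one response is what lets the client detect the end of the reply, which matches the one-operation-per-connection behavior of HTTP 1.0.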
25Why loading pages seems slow
- Potential problems
- Client is overloaded
- DNS takes a long time
- Network overloaded
- Dropped packets => TCP windows shrink
- Large pages
- Server is overloaded
- Solutions: proxy servers, flow servers
26Caching Proxies
- Clients send GET foo.html to the proxy server
- Proxy server sends GET foo.html to the web server
- Proxy server stores foo.html and answers later requests from its copy
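The caching idea can be sketched as a lookup table in front of the origin server. This is a simplified model with hypothetical names (`CachingProxy`, `origin`); it never expires or revalidates entries, which a real proxy must do.

```python
class CachingProxy:
    """Answer repeated requests from a local store instead of the web server."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # function: url -> page contents
        self._store = {}                  # url -> cached page contents

    def get(self, url):
        if url not in self._store:        # first request: go to the web server
            self._store[url] = self._fetch(url)
        return self._store[url]           # later requests: served from the store


origin_requests = []


def origin(url):
    """Stand-in for the real web server; records every request it receives."""
    origin_requests.append(url)
    return f"<contents of {url}>"


proxy = CachingProxy(origin)
first = proxy.get("foo.html")    # forwarded to the web server
second = proxy.get("foo.html")   # answered by the proxy
print(len(origin_requests))      # 1
```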
27Flow Approach
- Re-write URLs in web pages
- Point URL to nearest server for the data
- HTML from main server
- Images, sound, animations point to closer servers
- Requires knowledge of network topology!
- Used by Akamai
28Flow Approach (cont)
- Client sends GET Index.html to the main web server
- Client sends GET Image01.gif to a closer server
29HTTP 1.0
- Simple protocol
- Client issues 1 operation per TCP connection
- connect(), GET index.html, close()
- connect(), GET image01.html, close()
- How long does it take to retrieve a whole page?
- Concurrency by using multiple connections can
speed this up, but
30HTTP 1.1
- Client keeps connection open to server
- Makes multiple requests per connection
- GET foo.html, GET image02.gif, ...
- Length of time socket stays up?
- # of open connections on server?
- 1.0 allows server to close connections faster
- Not clear if 1.1 is better from the server's perspective
31Web Server Scripting
- A URL may refer to a static web page or a server-side script
- Script is just a program that is run in response to an HTTP request
- Server-side scripts produce web page content as output
- This is what a dynamic web page is
- Standard argument passing convention between the web server and the program: Common Gateway Interface (CGI)
- CGI scripts may be written in any language (Perl, Python, sh, csh, Java, ...)
- CGI scripts are commonly used to produce responses to Web page form input from client browsers
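A minimal CGI-style script might look like the sketch below. Under CGI the web server passes GET form data in the `QUERY_STRING` environment variable and expects header lines, a blank line, then the page on standard output. The function name `render` and the form field `First` are illustrative assumptions.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def render(environ) -> str:
    """Build the CGI output: header lines, a blank line, then the page."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("First", ["stranger"])[0]
    # Escape user input before placing it in HTML.
    body = f"<html><body>Hello, {escape(name)}!</body></html>"
    return f"Content-Type: text/html\r\n\r\n{body}"


if __name__ == "__main__":
    # When run by a web server, the environment holds the request data.
    sys.stdout.write(render(os.environ))
```

For example, `render({"QUERY_STRING": "First=richard&Last=martin"})` produces a page that greets richard.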
32Client Side Embedded Web Page Scripts and
Programs
- Web pages may also contain scripts or programs within the HTML code to be run on the client
- Unlike server scripts, web page scripts and programs run on the browser machine's processor, not on the server's processor
- Examples
- Javascript
- VBScript
- Java applets
- Example non-trivial program: http://www.whereismybus.com/
- Takes Rutgers campus bus positions as input
- Client side plots different routes on a map
33HTML (Hyper Text Markup Language)
- The text is surrounded by tags which describe the formatting and layout of the text on the browser window
- Allows for data input also using FORMS
- Documentation/Tutorials
- http://www.jmarshall.com/easy/html/
- http://www.jmarshall.com/easy/cgi
- View source code of any page you visit in the browser
34SMTP (Simple Mail Transfer Protocol)
35Email
- Email is transferred from one host to another using the Simple Mail Transfer Protocol (SMTP)
- Like HTTP, SMTP has a similar ASCII command and reply set to transfer messages between machines
- Think of a set of request strings and reply strings sent over the network
- SMTP transfers occur between
- sending host and dedicated email server
- dedicated email servers
- They do not occur between receiving hosts and email servers
- Those transfers use the POP or IMAP protocols
36SMTP Protocol
220 hill.com SMTP service ready
HELO town.com
250 hill.com Hello town.com, pleased to meet you
MAIL FROM: <jack@town.com>
250 <jack@town.com> Sender ok
RCPT TO: <jill@hill.com>
250 <jill@hill.com> Recipient ok
DATA
354 Enter mail, end with "." on a line by itself
From: jack@town.com
To: jill@hill.com
Subject: Please fetch me a pail of water

Jill, I'm not feeling up to hiking today. Will you please fetch me a pail of water?
.
250 message accepted
QUIT
221 hill.com closing connection
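The client half of the dialogue above can be generated by a short function. This sketch only builds the lines the client would send; it does not open a connection or check the server's numeric replies. The name `smtp_client_lines` is hypothetical. The doubled leading "." (dot-stuffing) is the standard SMTP rule that keeps a body line from being mistaken for the end-of-message marker.

```python
def smtp_client_lines(helo_domain, sender, recipient, headers, body):
    """Return the CR/LF-terminated lines an SMTP client sends, in order."""
    lines = [
        f"HELO {helo_domain}",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{recipient}>",
        "DATA",
    ]
    lines += [f"{name}: {value}" for name, value in headers]
    lines.append("")                      # blank line between headers and body
    for line in body.splitlines():
        # Dot-stuffing: a body line starting with "." gets an extra ".".
        lines.append("." + line if line.startswith(".") else line)
    lines += [".", "QUIT"]                # "." alone ends the message
    return [line + "\r\n" for line in lines]


commands = smtp_client_lines(
    "town.com", "jack@town.com", "jill@hill.com",
    [("From", "jack@town.com"), ("To", "jill@hill.com"),
     ("Subject", "Please fetch me a pail of water")],
    "Jill, I'm not feeling up to hiking today.\nWill you please fetch me a pail of water?",
)
print("".join(commands))
```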
37SMTP Direct Mode
Direct mode: sending email from jack@town.com to jill@hill.com
- town.com sends SMTP messages to, and receives SMTP responses from, the email server for hill.com
- town.com first finds the IP address of the hill.com email server using a DNS request (type=MX)
- town.com opens a TCP connection on SMTP port 25 and initiates the SMTP protocol to transfer the email message
38SMTP Relay Mode
Relay mode: sending email from jack@town.com to jill@hill.com
- Path: town.com -> email server for town.com -> email server for hill.com
- town.com is configured to send all email messages through a local email server
- The local email server buffers email messages and forwards them to other email servers
39Retrieving Email from a desktop
- Users retrieve email from their assigned email server
- Email retrieval does NOT use the SMTP protocol
- 3 common ways to retrieve email
- Email server adds received messages to a file stored on a shared file system (e.g., /var/mail/jill)
- Email downloaded via the POP3 protocol
- Email accessed via the IMAP protocol
40FTP (File Transfer Protocol )
41FTP
- Download/upload files between a client and server
- One of the first Internet protocols
- More complex than SMTP
- ASCII control connection
- Separate data connection performs presentation functions
- E.g., formats and converts data depending on type
- Sends passwords in plain ASCII text
- Eavesdropper can recover passwords
- Fatal flaw, turned off at a lot of sites
- Replaced with scp, sftp instead
42FTP Client/Server
Diagram components: user, client file system, server program, server protocol interpreter, server data transfer function, server file system
43Sample FTP Command Set
- LIST list directory
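- GET get a file (download); a client command, sent to the server as RETR
- MGET get multiple files; a client command that retrieves the files one at a time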
- STOR store (upload) a file
- TYPE set the data transfer type
- USER set the username
- QUIT End the session
44Sample FTP Replies
- 200 Command OK
- 214 Help Message
- 331 Username OK, password required
- 425 Can't open data connection
- 452 Error writing file
- 500 Syntax error (unrecognized command)
- 502 Unimplemented MODE
45Sample FTP Session
- ftp ftp.rutgers.edu
- Connected to kublai.td.Rutgers.EDU.
- 220 ftp.rutgers.edu FTP server (Version wu-2.6.2(9) Thu Feb 7 13:31:16 EST 2002) ready.
- Name (ftp.rutgers.edu:rmartin): anonymous
- 331 Guest login ok, send your complete e-mail address as password.
- Password:
- 230 Guest login ok, access restrictions apply.
- Remote system type is UNIX.
- ftp> cd /pub/redhat/linux/9/en/os/i386/images
- ftp> get bootdisk.img
- local: bootdisk.img remote: bootdisk.img
- 227 Entering Passive Mode (165,230,246,3,149,67)
- 150 Opening BINARY mode data connection for bootdisk.img (1474560 bytes).
- 226 Transfer complete.
- 1474560 bytes received in 00:01 (767.79 KB/s)
- ftp> quit
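The 227 reply in the session tells the client where to open the separate data connection: the first four numbers are the server's IP address and the last two are the high and low bytes of the port. A small sketch of decoding it, with a hypothetical helper name `parse_pasv`:

```python
import re


def parse_pasv(reply: str):
    """Extract (ip, port) from a '227 Entering Passive Mode (...)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("no address found in reply")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    # Port is sent as two bytes: high byte first, then low byte.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


print(parse_pasv("227 Entering Passive Mode (165,230,246,3,149,67)"))
# ('165.230.246.3', 38211)
```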
46Domain Name System (DNS)
47 Domain Name System (DNS)
- Problem statement
- Average brain can easily remember 7 digits
- IP addresses have up to 12 digits
- We need an easier way to remember IP addresses
- Solution
- Use alphanumeric names to refer to hosts
- Add a distributed, hierarchical protocol (called DNS) to map between alphanumeric host names and binary IP addresses
- We call this Address Resolution
48Domain Name Hierarchy
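Hierarchy shown in the figure (root at the top):
- Generic domains: com, edu, net, gov, int, mil, org
- Country domains: ae, ..., us, ..., zw
- Second level: rutgers, yale (under edu); yahoo, cnn (under com)
- Third level: cs, eng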
49Domain Name Management
- The domain name hierarchy is divided into zones
- Zone: a separate portion of the DNS hierarchy
- No two zones should overlap
- Name servers
- In each zone, there is a primary name server and one or more secondary name servers
- Name servers contain two kinds of address mappings
- Authoritative mappings: for hosts within the zone
- Cached mappings: for previously requested mappings to hosts not in the zone
51DNS Protocol
- When a client wants to know an IP address for a host name
- Client sends a DNS query to the primary name server in its zone
- If name server contains the mapping, it returns the IP address to the client
- Otherwise, the name server forwards the request to the root name server
- The request works its way down the tree toward the host until it reaches a name server with the correct mapping
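52DNS Protocol: Example
Scenario: remus.rutgers.edu tries to resolve an IP address for venus.cs.yale.edu using a recursive query
- Step 1 (query) and step 8 (response): remus.rutgers.edu <-> ns-lcsr.rutgers.edu
- Step 2 (query) and step 7 (response): ns-lcsr.rutgers.edu <-> a.root-servers.net
- Step 3 (query) and step 6 (response): a.root-servers.net <-> yale.edu
- Step 4 (query) and step 5 (response): yale.edu <-> cs.yale.edu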
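53DNS Protocol: Another Example
Scenario: remus.rutgers.edu tries to resolve an IP address for venus.cs.yale.edu using an iterative query
- Steps 1 and 2: query to and reply from ns-lcsr.rutgers.edu
- Steps 3 and 4: query to and reply from a.root-servers.net
- Steps 5 and 6: query to and reply from yale.edu
- Steps 7 and 8: query to and reply from cs.yale.edu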
54DNS Packets
- Clients communicate with DNS servers using either
TCP or UDP on port 53
Packet layout (32 bits per row):
Bits 0-15                   | Bits 16-31
Transaction Identification  | Flags
Number of Questions         | Number of Answer RRs
Number of Authoritative RRs | Number of Additional RRs
Questions (variable length)
Answer Resource Records (variable length)
Authoritative Resource Records (variable length)
Additional Resource Records (variable length)
55DNS Packet Fields
- Transaction Identification: random number used to match client queries with name server responses
- Flags (16 bits, in this order)
- QR (1 bit): 0=query, 1=response
- opcode (4 bits): 0=standard query, 1=inverse query, 2=status request
- AA (1 bit): authoritative answer
- TC (1 bit): truncated DNS packet
- RD (1 bit): recursion desired
- RA (1 bit): recursion available
- (unused) (3 bits)
- rcode (4 bits): return code. 0=no error, 3=name error
56DNS Packet Fields (contd)
- Transaction Identification: random number used to match client queries with name server responses
- Number of Questions: number of DNS queries in the packet
- More than one query per packet is not supported in many DNS servers!
- Number of Answer RRs: number of non-authoritative DNS responses in the packet
- Number of Authoritative RRs: number of authoritative DNS responses in the packet
- Number of Additional RRs: number of other DNS responses in the packet (usually contains other DNS servers in domain)
- Questions, Answers: variable length fields to store DNS queries and DNS server responses
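The fixed 12-byte header described above can be packed with Python's standard `struct` module. This is a sketch with hypothetical function names; the transaction ID 0x1234 is an arbitrary example, where a real client would choose a random one.

```python
import struct


def build_flags(qr=0, opcode=0, aa=0, tc=0, rd=0, ra=0, rcode=0):
    """Pack the flag fields into 16 bits: QR(1) opcode(4) AA TC RD RA unused(3) rcode(4)."""
    return ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9)
            | (rd << 8) | (ra << 7) | rcode)


def build_header(transaction_id, flags, questions=1, answers=0,
                 authoritative=0, additional=0):
    """Six 16-bit fields in network (big-endian) byte order."""
    return struct.pack("!HHHHHH", transaction_id, flags, questions,
                       answers, authoritative, additional)


# A standard query with "recursion desired" set and one question.
header = build_header(0x1234, build_flags(rd=1))
print(len(header), header.hex())  # 12 123401000001000000000000
```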
57DNS Queries
The DNS packet Question field contains a sequence of queries, each laid out as:
Query Name (variable length) | Query Type | Query Class
- Query Name: contains an encoded form of the name for which we are seeking an IP address
- Query Type: 1=IP address, 2=name server, 12=pointer record, etc.
- Query Class: 1=Internet address
58Encoding Query Names
- DNS queries must be encoded in a special way
- Divide the host name into segments wherever a period appears
- For each segment, store a byte representing the length of the segment followed by the letters in the segment
- Store a zero byte at the end of the query name
59Encoding Query Names: Example
remus.rutgers.edu
[5] r e m u s [7] r u t g e r s [3] e d u [0]
NOTE: The count fields [5], [7], [3] and [0] are byte values, not the ASCII characters 5, 7, 3 and 0!!!
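The encoding rule fits in a few lines. A minimal sketch with a hypothetical function name; the 63-byte limit on a segment comes from the DNS specification rather than from these slides.

```python
def encode_query_name(host: str) -> bytes:
    """Encode a host name as length-prefixed segments ending in a zero byte."""
    encoded = b""
    for segment in host.strip(".").split("."):
        label = segment.encode("ascii")
        if not 1 <= len(label) <= 63:
            raise ValueError(f"bad segment length: {segment!r}")
        # The length is stored as a raw byte value, not as an ASCII digit.
        encoded += bytes([len(label)]) + label
    return encoded + b"\x00"


print(encode_query_name("remus.rutgers.edu"))
# b'\x05remus\x07rutgers\x03edu\x00'
```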
60DNS Responses
The DNS packet RR fields contain a sequence of resource records, each laid out as:
Domain Name (variable length) | Type | Class | Time-to-Live | Resource Data Length | Resource Data (variable length)
- Domain Name: encoded domain name for query
- Type, Class: same as for query (1=IP address, 1=Internet)
- Time-to-Live: how long this response will be useful
- Resource Data: contains the four-byte IP address
61DNS Caching
- Going to the root server and then down the tree every time we need to resolve an address is inefficient
- Introduce address caching at name servers
- Store host-to-IP-address mappings from recently requested host names at name server
- When the same address is requested later, use the cached version at the local name server instead of recursively querying other name servers again
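Caching at a name server can be sketched as a table of answers that expire after their time-to-live. Everything here is illustrative: `CachingResolver` and `slow_lookup` are hypothetical names, and 192.0.2.10 is a documentation-only address, not the real address of any host in these slides.

```python
import time


class CachingResolver:
    """Answer repeated lookups locally until the cached entry's TTL runs out."""

    def __init__(self, upstream, clock=time.monotonic):
        self._upstream = upstream   # function: name -> (ip, ttl_seconds)
        self._clock = clock
        self._cache = {}            # name -> (ip, expiry time)

    def resolve(self, name):
        entry = self._cache.get(name)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                       # still fresh: use the cache
        ip, ttl = self._upstream(name)            # otherwise query other servers
        self._cache[name] = (ip, self._clock() + ttl)
        return ip


upstream_queries = []


def slow_lookup(name):
    """Stand-in for querying the root and walking down the tree."""
    upstream_queries.append(name)
    return "192.0.2.10", 60   # made-up address and TTL


resolver = CachingResolver(slow_lookup)
resolver.resolve("venus.cs.yale.edu")   # first time: goes upstream
resolver.resolve("venus.cs.yale.edu")   # later: answered from the cache
print(len(upstream_queries))            # 1
```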
62DNS Caching: Example
First time: remus.rutgers.edu tries to resolve an IP address for venus.cs.yale.edu using a recursive query
- Steps 1 and 8: remus.rutgers.edu <-> ns-lcsr.rutgers.edu
- Steps 2 and 7: ns-lcsr.rutgers.edu <-> a.root-servers.net
- Steps 3 and 6: a.root-servers.net <-> yale.edu
- Steps 4 and 5: yale.edu <-> cs.yale.edu
Later: venus.cs.yale.edu has been cached at ns-lcsr. remus.rutgers.edu (and any other host that uses ns-lcsr) will receive the cached IP address for venus.cs.yale.edu
- Steps 1 and 2: remus.rutgers.edu <-> ns-lcsr.rutgers.edu
63Interface to DNS
- The dig and nslookup programs provide an interface to DNS
- dig remus.rutgers.edu
- Server: ns-lcsr.rutgers.edu
- Address: 128.6.4.4
- Name: remus.rutgers.edu
- Address: 128.6.13.3
64Bootstrapping DNS
- How does a host contact the name server if all it has is the name and no IP address?
- IP address of at least 1 nameserver must be given a priori
- or with another protocol (DHCP, bootp)
- File /etc/resolv.conf in Unix
- Start -> Settings -> Control Panel -> Network -> TCP/IP -> Properties in Windows
65Default Domains
- When a host issues a query to the DNS server, it can add the default domain
- Default domain added to the end of every DNS query
- E.g., default domain is rutgers.edu
- Machine eden automatically extended to eden.rutgers.edu
66Reverse DNS
- We have the IP address, but want the name
- Use DNS to perform the lookup function
- Special domain, the in-addr.arpa domain, for reverse lookups
- Internet address is reversed in the lookup
- E.g., 3.13.6.128.in-addr.arpa => remus
- Follows the least -> most specific convention (reading right to left)
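Building the reverse-lookup name is a matter of reversing the four parts of the address and appending in-addr.arpa. A short sketch with a hypothetical helper name; Python's standard `ipaddress` module offers the same result through `reverse_pointer`.

```python
import ipaddress


def reverse_lookup_name(ip: str) -> str:
    """Build the in-addr.arpa name used to look up the host name for an IPv4 address."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(reverse_lookup_name("128.6.13.3"))                    # 3.13.6.128.in-addr.arpa
print(ipaddress.ip_address("128.6.13.3").reverse_pointer)   # 3.13.6.128.in-addr.arpa
```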