Title: URLs, InetAddresses, and URLConnections
1URLs, InetAddresses, and URLConnections
- High Level Network Programming
- Elliotte Rusty Harold
- elharo_at_sunsite.unc.edu
- http://sunsite.unc.edu/javafaq/URLS.PPT
2We will learn how Java handles
- Internet Addresses
- URLs
- CGI
- URLConnection
- Content and Protocol handlers
3I assume you
- Understand basic Java syntax and I/O
- Have a user's view of the Internet
- Have no prior network programming experience
4Applet Network Security Restrictions
- Applets may
- send data to the code base
- receive data from the code base
- Applets may not
- send data to hosts other than the code base
- receive data from hosts other than the code base
5Some Background
- Hosts
- Internet Addresses
- Ports
- Protocols
6Hosts
- Devices connected to the Internet are called hosts
- Most hosts are computers, but hosts also include routers, printers, fax machines, soda machines, bat houses, etc.
7Internet addresses
- Every host on the Internet is identified by a unique, four-byte Internet Protocol (IP) address.
- This is written in dotted quad format like 199.1.32.90 where each byte is an unsigned integer between 0 and 255.
- There are about four billion unique IP addresses, but they aren't very efficiently allocated
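The dotted quad format can be sketched in a few lines of Java. This is only an illustration: the class name DottedQuad and its method are made up for this example. The one subtlety is that Java's byte type is signed, so each byte has to be masked to recover the unsigned value between 0 and 255.

```java
class DottedQuad {

    // Converts a four-byte address into dotted quad notation.
    // DottedQuad and toDottedQuad are hypothetical names used only for this sketch.
    static String toDottedQuad(byte[] address) {
        if (address.length != 4) {
            throw new IllegalArgumentException("expected exactly 4 bytes");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < address.length; i++) {
            if (i > 0) {
                sb.append('.');
            }
            // Java bytes are signed (-128 to 127); masking with 0xFF
            // recovers the unsigned value 0 to 255.
            sb.append(address[i] & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 199, 1, 32, 90};
        System.out.println(toDottedQuad(raw)); // prints 199.1.32.90
    }
}
```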
8Domain Name System (DNS)
- Numeric addresses are mapped to names like "www.blackstar.com" or "star.blackstar.com" by DNS.
- Each site runs domain name server software that translates names to IP addresses and vice versa
- DNS is a distributed system
9The InetAddress Class
- The java.net.InetAddress class represents an IP address.
- It converts numeric addresses to host names and host names to numeric addresses.
- It is used by other network classes like Socket and ServerSocket to identify hosts
10Creating InetAddresses
- There are no public InetAddress() constructors. Arbitrary addresses may not be created.
- All addresses that are created must be checked with DNS
11The getByName() factory method
- public static InetAddress getByName(String host) throws UnknownHostException
    InetAddress utopia, duke;
    try {
      utopia = InetAddress.getByName("utopia.poly.edu");
      duke = InetAddress.getByName("128.238.2.92");
    }
    catch (UnknownHostException e) {
      System.err.println(e);
    }
12Other ways to create InetAddress objects
- public static InetAddress[] getAllByName(String host) throws UnknownHostException
- public static InetAddress getLocalHost() throws UnknownHostException
13Getter Methods
- public boolean isMulticastAddress()
- public String getHostName()
- public byte[] getAddress()
- public String getHostAddress()
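A minimal sketch of the getter methods in use. The helper name literal() is invented for this example. It passes a numeric address to getByName(), which is parsed without contacting DNS; getHostName() is left out because it may trigger a reverse lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class InetAddressGetters {

    // Hypothetical helper: wraps getByName() so callers need not catch the
    // checked exception. A numeric address is parsed without a DNS lookup;
    // a host name passed here would be resolved through DNS instead.
    static InetAddress literal(String dottedQuad) {
        try {
            return InetAddress.getByName(dottedQuad);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress duke = literal("128.238.2.92");
        System.out.println(duke.getHostAddress());      // the dotted quad string
        System.out.println(duke.getAddress().length);   // number of raw bytes
        System.out.println(duke.isMulticastAddress());  // false for this address
        // equals() compares the address itself, not object identity.
        System.out.println(duke.equals(literal("128.238.2.92")));
    }
}
```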
14Utility Methods
- public int hashCode()
- public boolean equals(Object o)
- public String toString()
15Ports
- In general a host has only one Internet address
- This address is subdivided into 65,536 ports
- Ports are logical abstractions that allow one host to communicate simultaneously with many other hosts
- Many services run on well-known ports. For example, http tends to run on port 80
16Protocols
- A protocol defines how two hosts talk to each other.
- The daytime protocol, RFC 867, specifies an ASCII representation for the time that's legible to humans.
- The time protocol, RFC 868, specifies a binary representation for the time that's legible to computers.
- There are thousands of protocols, standard and non-standard
17IETF RFCs
- Requests For Comment
- Document how much of the Internet works
- Various status levels from obsolete to required to informational
- TCP/IP, telnet, SMTP, MIME, HTTP, and more
- http://ds.internic.net/rfc/
18W3C Standards
- IETF is based on rough consensus and running code
- W3C tries to run ahead of implementation
- IETF is an informal organization open to participation by anyone
- W3C is a vendor consortium open only to companies
19URLs
- A URL, short for "Uniform Resource Locator", is a
way to unambiguously identify the location of a
resource on the Internet.
20Example URLs
- http://www.javasoft.com/
- file:///Macintosh%20HD/Java/Docs/JDK%201.1.1%20docs/api/java.net.InetAddress.html#_top_
- http://www.macintouch.com:80/newsrecent.shtml
- ftp://ftp.info.apple.com/pub/
- mailto:elharo_at_sunsite.unc.edu
- telnet://utopia.poly.edu
21The Pieces of a URL
- Most URLs can be broken into about five pieces, not all of which are necessarily present in any given URL. These are
- the protocol
- the host
- the port
- the file
- the ref, section, or anchor
22The java.net.URL class
- A URL object represents a URL.
- The URL class contains methods to
- create new URLs
- parse the different parts of a URL
- get an input stream from a URL so you can read data from a server
- get content from the server as a Java object
23Content and Protocol Handlers
- Content and protocol handlers separate the data being downloaded from the protocol used to download it.
- The protocol handler negotiates with the server and parses any headers. It gives the content handler only the actual data of the requested resource.
- The content handler translates those bytes into a Java object like an InputStream or ImageProducer.
24Finding Protocol Handlers
- When you construct a URL object, the virtual machine looks for a protocol handler that understands the protocol part of the URL such as "http" or "mailto".
- If no such handler is found, the constructor throws a MalformedURLException.
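One way to probe whether a given virtual machine has a handler for a protocol is simply to try constructing the URL. The sketch below assumes nothing beyond java.net.URL; the class name ProtocolCheck and the "xyzzy" protocol are made up for illustration. Note that the constructor also throws MalformedURLException for other syntax problems, so a false result does not always mean a missing handler.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ProtocolCheck {

    // Returns true if a URL object can be constructed from the string,
    // which requires (among other things) a handler for its protocol.
    static boolean isSupported(String url) {
        try {
            new URL(url);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "xyzzy" is a made-up protocol that no handler is expected to exist for.
        String[] candidates = {"http://www.javasoft.com/", "xyzzy://www.javasoft.com/"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isSupported(candidate));
        }
    }
}
```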
25Supported Protocols
- The exact protocols that Java supports vary from implementation to implementation, though http and file are supported pretty much everywhere. Sun's JDK 1.1 understands ten:
- file
- ftp
- gopher
- http
- mailto
- appletresource
- doc
- netdoc
- systemresource
- verbatim
26URL Constructors
- There are four constructors in the java.net.URL class. All can throw MalformedURLExceptions.
- public URL(String u) throws MalformedURLException
- public URL(String protocol, String host, String file) throws MalformedURLException
- public URL(String protocol, String host, int port, String file) throws MalformedURLException
- public URL(URL context, String u) throws MalformedURLException
27Constructing URL Objects
- Construct a URL object for a complete, absolute URL like http://www.poly.edu/fall97/grad.html#cs like this:
    try {
      URL u = new URL("http://www.poly.edu/fall97/grad.html#cs");
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }
28Constructing URL Objects in Pieces
- You can also construct the URL by passing its pieces to the constructor, like this:
    URL u = null;
    try {
      u = new URL("http", "www.poly.edu", "/schedule/fall97/bgrad.html#cs");
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }
29Including the Port
    URL u = null;
    try {
      u = new URL("http", "www.poly.edu", 8000, "/fall97/grad.html#cs");
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }
30Relative URLs
- Many HTML files contain relative URLs.
- Consider the page http://sunsite.unc.edu/javafaq/index.html
- On this page a link to "books.html" refers to http://sunsite.unc.edu/javafaq/books.html.
31Constructing Relative URLs
- The fourth constructor creates URLs relative to a given URL. For example,
    try {
      URL u1 = new URL("http://sunsite.unc.edu/index.html");
      URL u2 = new URL(u1, "books.html");
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }
- This is particularly useful when parsing HTML.
32Parsing URLs
- The java.net.URL class has five methods to split a URL into its component parts. These are
- public String getProtocol()
- public String getHost()
- public int getPort()
- public String getFile()
- public String getRef()
33For example,
    try {
      URL u = new URL("http://www.poly.edu/fall97/grad.html#cs");
      System.out.println("The protocol is " + u.getProtocol());
      System.out.println("The host is " + u.getHost());
      System.out.println("The port is " + u.getPort());
      System.out.println("The file is " + u.getFile());
      System.out.println("The anchor is " + u.getRef());
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }
34Missing Pieces
- If a port is not explicitly specified in the URL it's set to -1. This means the default port is to be used.
- If the ref doesn't exist, it's just null, so watch out for NullPointerExceptions. Better yet, test to see that it's non-null before using it.
- If the file is left off completely, e.g. http://www.javasoft.com, then it's set to "/" (more recent JDKs return an empty string instead).
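The missing-piece cases can be guarded against with two small helpers. This is a sketch, not part of java.net: the class UrlPieces and its methods are invented here, and getDefaultPort() is an assumption that the JDK in use is 1.4 or later, since it was added to java.net.URL after JDK 1.1.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPieces {

    // Hypothetical helper: parses a URL string, converting the checked
    // exception into an unchecked one to keep the example short.
    static URL parse(String s) {
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The port that will actually be used: the explicit port if one was
    // given, otherwise the protocol's default (requires JDK 1.4 or later).
    static int effectivePort(URL u) {
        return u.getPort() != -1 ? u.getPort() : u.getDefaultPort();
    }

    // The ref with null replaced by an empty string, so callers can use
    // the result without a null check.
    static String refOrEmpty(URL u) {
        return u.getRef() != null ? u.getRef() : "";
    }

    public static void main(String[] args) {
        URL u = parse("http://www.poly.edu/fall97/grad.html");
        System.out.println("getPort() returns " + u.getPort());
        System.out.println("effective port is " + effectivePort(u));
        System.out.println("ref is \"" + refOrEmpty(u) + "\"");
    }
}
```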
35Reading Data from a URL
- The openStream() method connects to the server specified in the URL and returns an InputStream object fed by the data from that connection.
- public final InputStream openStream() throws IOException
- Any headers that precede the actual data are stripped off before the stream is opened.
- Network connections are less reliable and slower than files. Buffer with a BufferedInputStream or a BufferedReader.
36
    import java.net.*;
    import java.io.*;

    public class Webcat {

      public static void main(String[] args) {

        for (int i = 0; i < args.length; i++) {
          try {
            URL u = new URL(args[i]);
            InputStream in = u.openStream();
            InputStreamReader isr = new InputStreamReader(in);
            BufferedReader br = new BufferedReader(isr);
            String theLine;
            while ((theLine = br.readLine()) != null) {
              System.out.println(theLine);
            }
          }
          catch (MalformedURLException e) {
            System.err.println(e);
          }
          catch (IOException e) {
            System.err.println(e);
          }
        }
      }
    }
37CGI
- Common Gateway Interface
- A lot is written about writing server side CGI. I'm going to show you client side CGI.
- We'll need to explore HTTP a little deeper to do this
38Normal web surfing uses these two steps
- The browser requests a page
- The server sends the page
- Data flows primarily from the server to the
client.
39Forms
- There are times when the server needs to get data
from the client rather than the other way around.
The common way to do this is with a form like
this one
40CGI
- The user types the requested data into the form and hits the submit button.
- The client browser then sends the data to the server using the Common Gateway Interface, CGI for short.
- CGI uses the HTTP protocol to transmit the data, either as part of the query string or as separate data following the MIME header.
41GET and POST
- When the data is sent as a query string included with the file request, this is called CGI GET.
- When the data is sent as data attached to the request following the MIME header, this is called CGI POST
42HTTP
- Web browsers communicate with web servers through a standard protocol known as HTTP, an acronym for HyperText Transfer Protocol.
- This protocol defines
- how a browser requests a file from a web server
- how a browser sends additional data along with the request (e.g. the data formats it can accept)
- how the server sends data back to the client
- response codes
43A Typical HTTP Connection
- Client opens a socket to port 80 on the server.
- Client sends a GET request including the name and path of the file it wants and the version of the HTTP protocol it supports.
- The client sends a MIME header.
- The client sends a blank line.
- The server sends a MIME header
- The server sends the data in the file.
- The server closes the connection.
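The connection sequence above can be sketched with a plain socket. This is an illustration rather than a robust client: the class RawHttpGet and its method names are invented here, the header it sends is deliberately tiny, and the host and path in main() are only sample values. Run without arguments it just prints the request text; given a host and a path it opens a real connection on port 80.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttpGet {

    // The GET line, a small MIME header, and the blank line that ends it.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Accept: */*\r\n"
             + "\r\n";
    }

    // Opens the socket, sends the request, and reads everything the server
    // sends (its MIME header and the data) until it closes the connection.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.write(buffer, 0, n);
            }
            return response.toString("ISO-8859-1");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            System.out.println(fetch(args[0], 80, args[1]));
        } else {
            System.out.print(buildRequest("www.javasoft.com", "/index.html"));
        }
    }
}
```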
44MIME
- MIME is an acronym for "Multipurpose Internet Mail Extensions".
- an Internet standard defined in RFCs 2045 through 2049
- originally intended for use with email messages, but has been adopted for use in HTTP.
45Browser Request MIME Header
- When the browser sends a request to a web server, it also sends a MIME header. MIME headers contain name-value pairs, essentially a name followed by a colon and a space, followed by a value.
    Connection: Keep-Alive
    User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
    Host: www.digitalthink.com:80
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
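Because each header line is a name, a colon, a space, and a value, a header can be split into pairs with very little code. The parser below is a simplified sketch with invented names (MimeHeaderParser, parse); it ignores continuation lines and repeated fields. It splits at the first colon only, since values such as host:port pairs and times contain colons of their own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MimeHeaderParser {

    // Splits a block of "Name: value" lines into a map, keeping the
    // order in which the fields appeared.
    static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : header.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line marks the end of the header
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a name-value line, e.g. the request line
            }
            String name = line.substring(0, colon).trim();
            // Everything after the FIRST colon is the value.
            String value = line.substring(colon + 1).trim();
            fields.put(name, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String header = "Connection: Keep-Alive\r\n"
                      + "Host: www.digitalthink.com:80\r\n"
                      + "\r\n";
        parse(header).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```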
46Server Response MIME Header
- When a web server responds to a web browser it sends a MIME header along with the response that looks something like this:
    Server: Netscape-Enterprise/2.01
    Date: Sat, 02 Aug 1997 07:52:46 GMT
    Accept-ranges: bytes
    Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
    Content-length: 2810
    Content-type: text/html
47Query Strings
- CGI GET data is sent in URL-encoded query strings
- A query string is a set of name=value pairs separated by ampersands
- Author=Sadie, Julie&Title=Women Composers
- It is separated from the rest of the URL by a question mark
48URL Encoding
- Alphanumeric ASCII characters (a-z, A-Z, and 0-9) and the -_.!'(), punctuation symbols are left unchanged.
- The space character is converted into a plus sign (+).
- Other characters (the equals sign and the ampersand, for instance) are translated into a percent sign followed by the two hexadecimal digits corresponding to their numeric value.
49For example,
- The comma is ASCII character 44 (decimal) or 2C (hex). Therefore if the comma appears as part of a URL it is encoded as %2C.
- The query string "Author=Sadie, Julie&Title=Women Composers" is encoded as
- Author=Sadie%2C+Julie&Title=Women+Composers
50The URLEncoder class
- The java.net.URLEncoder class contains a single static method which encodes strings in x-www-form-urlencoded format
- URLEncoder.encode(String s)
51For example,
    String qs = "Author=Sadie, Julie&Title=Women Composers";
    String eqs = URLEncoder.encode(qs);
    System.out.println(eqs);
- This prints
- Author%3dSadie%2c+Julie%26Title%3dWomen+Composers
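52- Encoding the whole query string at once also encodes the = and & separators, so encode each name and value separately:
    String eqs = "Author=" + URLEncoder.encode("Sadie, Julie");
    eqs += "&";
    eqs += "Title=";
    eqs += URLEncoder.encode("Women Composers");
- This produces the properly encoded query string
- Author=Sadie%2c+Julie&Title=Women+Composers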
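The piece-by-piece approach generalizes to any number of fields. The helper below is a sketch with invented names (QueryString, build). It uses the two-argument URLEncoder.encode(String, String) overload that later JDKs added, on the assumption that a post-1.1 JDK is in use. Current JDKs emit uppercase hex digits, so the comma comes out as %2C.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryString {

    // Builds a query string from alternating names and values:
    // build("a", "1", "b", "2") gives "a=1&b=2". Names and values are
    // encoded separately so the = and & separators stay literal.
    static String build(String... namesAndValues) {
        if (namesAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < namesAndValues.length; i += 2) {
            if (i > 0) {
                sb.append('&');
            }
            sb.append(encode(namesAndValues[i]))
              .append('=')
              .append(encode(namesAndValues[i + 1]));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints Author=Sadie%2C+Julie&Title=Women+Composers
        System.out.println(build("Author", "Sadie, Julie", "Title", "Women Composers"));
    }
}
```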
53GET URLs
    String eqs = "Author=" + URLEncoder.encode("Sadie, Julie");
    eqs += "&";
    eqs += "Title=";
    eqs += URLEncoder.encode("Women Composers");
    try {
      URL u = new URL("http://www.superbooks.com/search.cgi?" + eqs);
      InputStream in = u.openStream();
      //...
    }
    catch (IOException e) { //...
    }
54URLConnections
- The java.net.URLConnection class is an abstract class that handles communication with different kinds of servers like ftp servers and web servers.
- Protocol specific subclasses of URLConnection handle different kinds of servers.
- By default, connections to HTTP URLs use the GET method.
55URLConnections vs. URLs
- Can send output as well as read input
- Can post data to CGIs
- Can read headers from a connection
56URLConnection five steps
- 1. The URL is constructed.
- 2. The URL's openConnection() method creates the URLConnection object.
- 3. The parameters for the connection and the request properties that the client sends to the server are set up.
- 4. The connect() method makes the connection to the server.
- 5. The response header information is read using getHeaderField().
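The five steps might look like this in code. To keep the sketch runnable without depending on any outside host, it starts a throwaway web server on the local machine using com.sun.net.httpserver.HttpServer, a class bundled with later JDKs. That server, the class name FiveSteps, and the demo() method are assumptions made for this example only.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class FiveSteps {

    // Steps 2 through 5, given the URL constructed in step 1.
    static String contentTypeOf(URL u) throws IOException {
        URLConnection uc = u.openConnection();           // step 2
        uc.setRequestProperty("Accept", "text/plain");   // step 3
        uc.setUseCaches(false);                          // step 3
        uc.connect();                                    // step 4
        String type = uc.getHeaderField("Content-type"); // step 5
        uc.getInputStream().close();                     // tidy up
        return type;
    }

    // Starts a local test server, runs the five steps against it,
    // and returns the Content-type header that came back.
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = "hello".getBytes(StandardCharsets.US_ASCII);
                exchange.getResponseHeaders().set("Content-type", "text/plain");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                URL u = new URL("http://127.0.0.1:" + port + "/"); // step 1
                return contentTypeOf(u);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```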
57I/O Across a URLConnection
- Data may be read from the connection in one of two ways
- raw by using the input stream returned by getInputStream()
- through a content handler with getContent().
- Data can be sent to the server using the output stream provided by getOutputStream().
58For example,
    try {
      URL u = new URL("http://www.sd98.com/");
      URLConnection uc = u.openConnection();
      uc.connect();
      InputStream in = uc.getInputStream();
      // read the data...
    }
    catch (IOException e) { //...
    }
59Reading Header Data
- The getHeaderField(String name) method returns the string value of a named header field.
- Names are case-insensitive.
- If the requested field is not present, null is returned.
    String lm = uc.getHeaderField("Last-modified");
60getHeaderFieldKey()
- The keys of the header fields are returned by the getHeaderFieldKey(int n) method.
- The first field is 1.
- If a numbered key is not found, null is returned.
- You can use this in combination with getHeaderField() to loop through the complete header
61For example
    String key = null;
    for (int i = 1; (key = uc.getHeaderFieldKey(i)) != null; i++) {
      System.out.println(key + ": " + uc.getHeaderField(key));
    }
62getHeaderFieldInt() and getHeaderFieldDate()
- These are utility methods that read a named header and convert its value into an int and a long respectively. The second argument is returned if the field is missing or malformed.
- public int getHeaderFieldInt(String name, int Default)
- public long getHeaderFieldDate(String name, long Default)
63- The long returned by getHeaderFieldDate() can be converted into a Date object using a Date() constructor like this:
    long s = uc.getHeaderFieldDate("Last-modified", 0);
    Date lm = new Date(s);
64Six Convenience Methods
- These return the values of six particularly common header fields
- public int getContentLength()
- public String getContentType()
- public String getContentEncoding()
- public long getExpiration()
- public long getDate()
- public long getLastModified()
65
    try {
      URL u = new URL("http://www.sd98.com/");
      URLConnection uc = u.openConnection();
      uc.connect();
      String key = null;
      for (int n = 1; (key = uc.getHeaderFieldKey(n)) != null; n++) {
        System.out.println(key + ": " + uc.getHeaderField(key));
      }
    }
    catch (IOException e) {
      System.err.println(e);
    }
66Writing data to a URLConnection
- Similar to reading data from a URLConnection.
- First inform the URLConnection that you plan to use it for output
- Before getting the connection's input stream, get the connection's output stream and write to it.
- Commonly used to talk to CGIs that use the POST method
67Nine Steps
- 1. Construct the URL.
- 2. Call the URL's openConnection() method to create the URLConnection object.
- 3. Pass true to the URLConnection's setDoOutput() method.
- 4. Invoke setDoInput(true) to indicate that this URLConnection will also be used for input.
- 5. Create the data you want to send, preferably as a byte array.
68- 6. Call getOutputStream() to get an output stream object.
- 7. Write the byte array calculated in step 5 onto the stream.
- 8. Close the output stream.
- 9. Call getInputStream() to get an input stream object. Read from it as usual.
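The nine steps can be sketched as follows. So that the example runs without an outside CGI, it posts to a throwaway local server built with com.sun.net.httpserver.HttpServer that simply echoes back what it received. That server, the class name PostSteps, and the use of readAllBytes() (JDK 9 or later) are assumptions of this sketch, not part of the original recipe.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class PostSteps {

    // Steps 2 through 9, given the URL (step 1) and the data (step 5).
    static String post(URL u, byte[] data) throws IOException {
        URLConnection uc = u.openConnection();           // step 2
        uc.setDoOutput(true);                            // step 3
        uc.setDoInput(true);                             // step 4
        try (OutputStream out = uc.getOutputStream()) {  // step 6
            out.write(data);                             // step 7
        }                                                // step 8 (closed here)
        try (InputStream in = uc.getInputStream()) {     // step 9
            return new String(in.readAllBytes(), StandardCharsets.US_ASCII);
        }
    }

    // Starts a local echo server, posts one encoded field to it,
    // and returns the server's reply.
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] received = exchange.getRequestBody().readAllBytes();
                String text = "got: " + new String(received, StandardCharsets.US_ASCII);
                byte[] reply = text.getBytes(StandardCharsets.US_ASCII);
                exchange.sendResponseHeaders(200, reply.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(reply);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                URL u = new URL("http://127.0.0.1:" + port + "/cgi-bin/booksearch.pl"); // step 1
                byte[] data = "Author=Sadie%2C+Julie".getBytes(StandardCharsets.US_ASCII); // step 5
                return post(u, data);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```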
69POST CGIs
- A typical POST request to a CGI looks like this:
    POST /cgi-bin/booksearch.pl HTTP/1.0
    Referer: http://www.macfaq.com/sampleform.html
    User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
    Content-length: 60
    Content-type: application/x-www-form-urlencoded
    Host: utopia.poly.edu:56435

    username=Sadie%2C+Julie&realname=Women+Composers
70A POST request includes
- the POST line
- a MIME header which must include
- content type
- content length
- a blank line that signals the end of the MIME header
- the actual data of the form, encoded in x-www-form-urlencoded format.
71- A URLConnection for an http URL will set up the request line and the MIME header for you as long as you set its doOutput field to true by invoking setDoOutput(true).
- If you also want to read from the connection, you should set doInput to true with setDoInput(true) too.
72For example,
    URLConnection uc = u.openConnection();
    uc.setDoOutput(true);
    uc.setDoInput(true);
73- The request line and MIME header are sent as
soon as the URLConnection connects. Then use
getOutputStream() to get an output stream on
which you'll write the x-www-form-urlencoded
name-value pairs.
74HttpURLConnection
- java.net.HttpURLConnection is an abstract
subclass of URLConnection that provides some
additional methods specific to the HTTP protocol.
- URL connection objects that are returned by an http URL will be instances of java.net.HttpURLConnection.
75Recall
- a typical HTTP response from a web server begins like this:
    HTTP/1.0 200 OK
    Server: Netscape-Enterprise/2.01
    Date: Sat, 02 Aug 1997 07:52:46 GMT
    Accept-ranges: bytes
    Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
    Content-length: 2810
    Content-type: text/html
76Response Codes
- The getHeaderField() and getHeaderFieldKey() methods don't return the HTTP response code.
- After you've connected, you can retrieve the numeric response code (200 in the above example) with the getResponseCode() method and the message associated with it (OK in the above example) with the getResponseMessage() method.
77- Java 1.0 only supported GET and POST requests to HTTP servers, but Java 1.1 allows the much broader range of requests specified in the HTTP/1.1 specification including GET, POST, HEAD, OPTIONS, PUT, DELETE, and TRACE.
- These are set with the void setRequestMethod(String method) method.
- This method throws a java.net.ProtocolException, a subclass of IOException, if an unsupported request method is specified.
78getRequestMethod()
- The getRequestMethod() method returns the string
form of the request method currently set for the
URLConnection. GET is the default method.
79disconnect()
- The void disconnect() method of the HttpURLConnection class allows you to close the connection to the web server.
- Needed for HTTP/1.1 Keep-alive
80For example,
    try {
      URL u = new URL("http://www.amnesty.org/");
      HttpURLConnection huc = (HttpURLConnection) u.openConnection();
      huc.setRequestMethod("PUT");
      huc.setDoOutput(true);  // required before asking for the output stream
      OutputStream os = huc.getOutputStream();
      // put the data...
      os.close();
      int code = huc.getResponseCode();
      if (code >= 200 && code < 300) {
        // the server accepted the data
      }
      huc.disconnect();
    }
    catch (IOException e) { //...
    }
81usingProxy
- The boolean usingProxy() method returns true if
web connections are being funneled through a
proxy server, false if they're not.
82- The HttpURLConnection class also has two static methods that affect how all URLConnection objects interact with web servers.
- With a true argument, the HttpURLConnection.setFollowRedirects(boolean followRedirects) method says that connections will follow redirect instructions from the web server. Untrusted applets are not allowed to set this.
- The boolean method HttpURLConnection.getFollowRedirects() returns true if redirect requests are honored, false if they're not.
83Redirect Instructions
- Most web servers can be configured to automatically redirect browsers to the new location of a page that's moved.
- To redirect browsers, a server sends a 300 level response and a Location header that specifies the new location of the requested page.
84
    GET /elharo/macfaq/index.html HTTP/1.0

    HTTP/1.1 302 Moved Temporarily
    Date: Mon, 04 Aug 1997 14:21:27 GMT
    Server: Apache/1.2b7
    Location: http://www.macfaq.com/macfaq/index.html
    Connection: close
    Content-type: text/html

    <HTML><HEAD>
    <TITLE>302 Moved Temporarily</TITLE>
    </HEAD><BODY>
    <H1>Moved Temporarily</H1>
    The document has moved <A HREF="http://www.macfaq.com/macfaq/index.html">here</A>.<P>
    </BODY></HTML>
85- HTML is returned for browsers that don't understand redirects, but most modern browsers do not display this and jump straight to the page specified in the Location header instead.
- Because redirects can change the site to which a user is connecting without their knowledge, redirects are not arbitrarily followed by URLConnections.
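To see a redirect instruction instead of following it, a program can turn redirect handling off for a single connection and read the response code and Location header itself. The sketch below relies on setInstanceFollowRedirects(), a per-connection method added to HttpURLConnection in JDKs later than 1.1, and on a throwaway local server that always answers with a 302. The class name RedirectPeek and the server are assumptions of this example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

class RedirectPeek {

    // Starts a local server that redirects every request, asks it for a
    // page without following the redirect, and reports what it said.
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                exchange.getResponseHeaders().set("Location",
                        "http://www.macfaq.com/macfaq/index.html");
                exchange.sendResponseHeaders(302, -1); // -1 means no response body
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                URL u = new URL("http://127.0.0.1:" + port + "/elharo/macfaq/index.html");
                HttpURLConnection huc = (HttpURLConnection) u.openConnection();
                huc.setInstanceFollowRedirects(false); // this connection only
                int code = huc.getResponseCode();
                String location = huc.getHeaderField("Location");
                huc.disconnect();
                return code + " -> " + location;
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```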
86To Learn More
- Java Network Programming
- O'Reilly & Associates, 1997
- ISBN 1-56592-227-1
- Web Client Programming with Java
- http://www.digitalthink.com/catalog/cs/cs308/index.html