Title: URLs, InetAddresses, and URLConnections
- High Level Network Programming
- Elliotte Rusty Harold
- elharo@metalab.unc.edu
- http://metalab.unc.edu/javafaq/slides/
We will learn how Java handles
- Internet Addresses
- URLs
- CGI
- URLConnection
- Content and Protocol handlers
I assume you
- Understand basic Java syntax and I/O
- Have a user's view of the Internet
- Have no prior network programming experience
Applet Network Security Restrictions
- Applets may
- send data to the code base
- receive data from the code base
- Applets may not
- send data to hosts other than the code base
- receive data from hosts other than the code base
Some Background
- Hosts
- Internet Addresses
- Ports
- Protocols
Hosts
- Devices connected to the Internet are called hosts.
- Most hosts are computers, but hosts also include routers, printers, fax machines, soda machines, bat houses, etc.
Internet addresses
- Every host on the Internet is identified by a unique, four-byte Internet Protocol (IP) address.
- This is written in dotted quad format like 199.1.32.90, where each byte is an unsigned integer between 0 and 255.
- There are about four billion unique IP addresses, but they aren't very efficiently allocated.
Domain Name System (DNS)
- Numeric addresses are mapped to names like "www.blackstar.com" or "star.blackstar.com" by DNS.
- Each site runs domain name server software that translates names to IP addresses and vice versa.
- DNS is a distributed system.
The InetAddress Class
- The java.net.InetAddress class represents an IP address.
- It converts numeric addresses to host names and host names to numeric addresses.
- It is used by other network classes like Socket and ServerSocket to identify hosts.
Creating InetAddresses
- There are no public InetAddress() constructors. Arbitrary addresses may not be created.
- All addresses that are created must be checked with DNS.
The getByName() factory method
- public static InetAddress getByName(String host) throws UnknownHostException

    InetAddress utopia, duke;
    try {
      utopia = InetAddress.getByName("utopia.poly.edu");
      duke = InetAddress.getByName("128.238.2.92");
    }
    catch (UnknownHostException e) {
      System.err.println(e);
    }
Other ways to create InetAddress objects
- public static InetAddress[] getAllByName(String host) throws UnknownHostException
- public static InetAddress getLocalHost() throws UnknownHostException
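A minimal sketch of the two other factory methods. By default it passes a literal IP address to getAllByName(), so no DNS lookup is needed to try it; the class AddressLister and its lookupAll() helper are hypothetical names. getLocalHost() can throw UnknownHostException on a machine whose own name doesn't resolve, so it gets its own try block.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical example class for illustration; not part of java.net.
class AddressLister {

    // Returns every address the host name (or literal) maps to.
    static InetAddress[] lookupAll(String host) throws UnknownHostException {
        return InetAddress.getAllByName(host);
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        try {
            for (InetAddress address : lookupAll(host)) {
                System.out.println(address);
            }
        } catch (UnknownHostException e) {
            System.err.println("Could not find " + host);
        }
        try {
            System.out.println("This machine: " + InetAddress.getLocalHost());
        } catch (UnknownHostException e) {
            System.err.println("Could not determine the local host: " + e);
        }
    }
}
```

Run it with a host name argument, for example `java AddressLister www.oreilly.com`, to see every address a multihomed site advertises.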
Getter Methods
- public boolean isMulticastAddress()
- public String getHostName()
- public byte[] getAddress()
- public String getHostAddress()
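getAddress() returns the raw bytes in network byte order, but Java's byte type is signed, so any byte above 127 comes back negative. A sketch of recovering the unsigned values of the dotted quad, assuming an IPv4 address; AddressBytes and toUnsigned() are hypothetical names.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical example class for illustration.
class AddressBytes {

    // Converts the signed bytes from getAddress() to unsigned ints 0-255.
    static int[] toUnsigned(InetAddress address) {
        byte[] raw = address.getAddress();
        int[] result = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            result[i] = raw[i] & 0xFF; // mask off the sign extension
        }
        return result;
    }

    public static void main(String[] args) throws UnknownHostException {
        // A literal address is parsed directly; no DNS lookup is made.
        InetAddress address = InetAddress.getByName("199.1.32.90");
        System.out.println("First raw byte: " + address.getAddress()[0]); // -57
        System.out.println("First unsigned byte: " + toUnsigned(address)[0]);
        System.out.println("Dotted quad: " + address.getHostAddress());
        System.out.println("Multicast? " + address.isMulticastAddress());
    }
}
```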
Utility Methods
- public int hashCode()
- public boolean equals(Object o)
- public String toString()
Ports
- In general a host has only one Internet address.
- This address is subdivided into 65,536 ports.
- Ports are logical abstractions that allow one host to communicate simultaneously with many other hosts.
- Many services run on well-known ports. For example, HTTP usually runs on port 80.
Protocols
- A protocol defines how two hosts talk to each other.
- The daytime protocol, RFC 867, specifies an ASCII representation for the time that's legible to humans.
- The time protocol, RFC 868, specifies a binary representation for the time that's legible to computers.
- There are thousands of protocols, standard and non-standard.
IETF RFCs
- Requests for Comments
- Document how much of the Internet works
- Various status levels, from obsolete to required to informational
- TCP/IP, telnet, SMTP, MIME, HTTP, and more
- http://www.faqs.org/rfc/
W3C Standards
- IETF is based on rough consensus and running code.
- W3C tries to run ahead of implementation.
- IETF is an informal organization open to participation by anyone.
- W3C is a vendor consortium open only to companies.
W3C Standards
- HTTP
- HTML
- XML
- RDF
- MathML
- SMIL
- P3P
URLs
- A URL, short for "Uniform Resource Locator", is a way to unambiguously identify the location of a resource on the Internet.
Example URLs
- http://java.sun.com/
- file:///Macintosh%20HD/Java/Docs/JDK%201.1.1%20docs/api/java.net.InetAddress.html#_top_
- http://www.macintouch.com:80/newsrecent.shtml
- ftp://ftp.info.apple.com/pub/
- mailto:elharo@metalab.unc.edu
- telnet://utopia.poly.edu
- ftp://mp3:mp3@138.247.121.61:21000/c3a/stuff/mp3/
- http://elharo@java.oreilly.com/
- http://metalab.unc.edu/nywc/comps.phtml?category=Choral+Works
The Pieces of a URL
- the protocol, aka scheme
- the authority
- user info
- user name
- password
- host name or address
- port
- the path, aka file
- the ref, aka section or anchor
- the query string
The java.net.URL class
- A URL object represents a URL.
- The URL class contains methods to
- create new URLs
- parse the different parts of a URL
- get an input stream from a URL so you can read data from a server
- get content from the server as a Java object
Content and Protocol Handlers
- Content and protocol handlers separate the data being downloaded from the protocol used to download it.
- The protocol handler negotiates with the server and parses any headers. It gives the content handler only the actual data of the requested resource.
- The content handler translates those bytes into a Java object like an InputStream or ImageProducer.
Finding Protocol Handlers
- When the virtual machine creates a URL object, it looks for a protocol handler that understands the protocol part of the URL, such as "http" or "mailto".
- If no such handler is found, the constructor throws a MalformedURLException.
Supported Protocols
- The exact protocols that Java supports vary from implementation to implementation, though http and file are supported pretty much everywhere. Sun's JDK 1.1 understands ten:
- file
- ftp
- gopher
- http
- mailto
- appletresource
- doc
- netdoc
- systemresource
- verbatim
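Because the URL constructor throws MalformedURLException when it can't find a handler, you can probe which protocols a particular VM supports. A sketch; the scheme list and the host www.example.com are arbitrary choices, ProtocolTester is a hypothetical name, and the output depends on which JDK runs it.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical example class for illustration.
class ProtocolTester {

    // Returns true if this VM has a protocol handler for the scheme.
    static boolean isSupported(String scheme) {
        try {
            new URL(scheme + "://www.example.com/");
            return true;
        } catch (MalformedURLException e) {
            return false; // no handler was found for this scheme
        }
    }

    public static void main(String[] args) {
        String[] schemes = {"http", "https", "ftp", "file", "mailto",
                            "gopher", "jar", "telnet", "doc", "verbatim"};
        for (String scheme : schemes) {
            System.out.println(scheme + ": "
                + (isSupported(scheme) ? "supported" : "not supported"));
        }
    }
}
```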
URL Constructors
- There are four (six in 1.2) constructors in the java.net.URL class.
- public URL(String u) throws MalformedURLException
- public URL(String protocol, String host, String file) throws MalformedURLException
- public URL(String protocol, String host, int port, String file) throws MalformedURLException
- public URL(URL context, String url) throws MalformedURLException
- public URL(String protocol, String host, int port, String file, URLStreamHandler handler) throws MalformedURLException
- public URL(URL context, String url, URLStreamHandler handler) throws MalformedURLException
Constructing URL Objects
- An absolute URL like http://www.poly.edu/fall97/grad.html#cs

    try {
      URL u = new URL("http://www.poly.edu/fall97/grad.html#cs");
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }
Constructing URL Objects in Pieces
- You can also construct the URL by passing its pieces to the constructor, like this:

    URL u = null;
    try {
      u = new URL("http", "www.poly.edu", "/schedule/fall97/bgrad.html#cs");
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }
Including the Port

    URL u = null;
    try {
      u = new URL("http", "www.poly.edu", 8000, "/fall97/grad.html#cs");
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }
Relative URLs
- Many HTML files contain relative URLs.
- Consider the page http://metalab.unc.edu/javafaq/index.html.
- On this page a link to "books.html" refers to http://metalab.unc.edu/javafaq/books.html.
Constructing Relative URLs
- The fourth constructor creates URLs relative to a given URL. For example:

    try {
      URL u1 = new URL("http://metalab.unc.edu/index.html");
      URL u2 = new URL(u1, "books.html");
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }

- This is particularly useful when parsing HTML.
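A few more cases of the context constructor, all resolved against the javafaq index page from the Relative URLs slide; RelativeResolver is a hypothetical name. An absolute link simply replaces the context, and a link starting with / keeps only the protocol and host of the context.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical example class for illustration.
class RelativeResolver {

    // Resolves a (possibly relative) link against the page it appears on.
    static URL resolve(URL base, String link) throws MalformedURLException {
        return new URL(base, link);
    }

    public static void main(String[] args) throws MalformedURLException {
        URL base = new URL("http://metalab.unc.edu/javafaq/index.html");
        String[] links = {"books.html", "/index.html", "#top",
                          "http://www.ibiblio.org/"};
        for (String link : links) {
            System.out.println(link + " -> " + resolve(base, link));
        }
    }
}
```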
Parsing URLs
- The java.net.URL class has five methods to split a URL into its component parts. These are:
- public String getProtocol()
- public String getHost()
- public int getPort()
- public String getFile()
- public String getRef()
For example,

    try {
      URL u = new URL("http://www.poly.edu/fall97/grad.html#cs");
      System.out.println("The protocol is " + u.getProtocol());
      System.out.println("The host is " + u.getHost());
      System.out.println("The port is " + u.getPort());
      System.out.println("The file is " + u.getFile());
      System.out.println("The anchor is " + u.getRef());
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }
Parsing URLs
- JDK 1.3 adds four more:
- public String getAuthority()
- public String getUserInfo()
- public String getPath()
- public String getQuery()
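A sketch that calls the JDK 1.3 getters, along with the older ones, on one made-up URL that has every piece (user info, port, path, query, and ref); UrlPieces is a hypothetical name, and the trailing comments show what each call returns.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical example class for illustration.
class UrlPieces {
    public static void main(String[] args) throws MalformedURLException {
        URL u = new URL(
            "http://elharo:secret@www.example.com:8080/search.cgi?q=java#results");
        System.out.println(u.getProtocol());  // http
        System.out.println(u.getAuthority()); // elharo:secret@www.example.com:8080
        System.out.println(u.getUserInfo());  // elharo:secret
        System.out.println(u.getHost());      // www.example.com
        System.out.println(u.getPort());      // 8080
        System.out.println(u.getPath());      // /search.cgi
        System.out.println(u.getQuery());     // q=java
        System.out.println(u.getFile());      // /search.cgi?q=java
        System.out.println(u.getRef());       // results
    }
}
```

Note that getFile() is the path plus the query string, while getPath() stops before the question mark.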
Missing Pieces
- If a port is not explicitly specified in the URL, it's set to -1. This means the default port is to be used.
- If the ref doesn't exist, it's just null, so watch out for NullPointerExceptions. Better yet, test to see that it's non-null before using it.
- If the file is left off completely, e.g. http://java.sun.com, then it's set to "/".
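A sketch of defensive handling for the missing pieces. It assumes Java 1.4 or later for getDefaultPort(), which reports the protocol's standard port; MissingPieces and effectivePort() are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical example class for illustration.
class MissingPieces {

    // Returns the explicit port if there is one, else the protocol default.
    static int effectivePort(URL u) {
        int port = u.getPort();
        return port == -1 ? u.getDefaultPort() : port;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL u = new URL("http://java.sun.com/index.html");
        System.out.println("getPort(): " + u.getPort());
        System.out.println("effective port: " + effectivePort(u));
        String ref = u.getRef();
        if (ref != null) { // test before use to avoid a NullPointerException
            System.out.println("ref: " + ref.toUpperCase());
        } else {
            System.out.println("no ref");
        }
    }
}
```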
Reading Data from a URL
- The openStream() method connects to the server specified in the URL and returns an InputStream object fed by the data from that connection.
- public final InputStream openStream() throws IOException
- Any headers that precede the actual data are stripped off before the stream is opened.
- Network connections are less reliable and slower than files. Buffer with a BufferedReader or a BufferedInputStream.
Webcat

    import java.net.*;
    import java.io.*;

    public class Webcat {

      public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
          try {
            URL u = new URL(args[i]);
            InputStream in = u.openStream();
            InputStreamReader isr = new InputStreamReader(in);
            BufferedReader br = new BufferedReader(isr);
            String theLine;
            while ((theLine = br.readLine()) != null) {
              System.out.println(theLine);
            }
            br.close(); // also closes the underlying network stream
          }
          catch (IOException e) {
            System.err.println(e);
          }
        }
      }
    }
The Bug in readLine()
- What readLine() does:
- Sees a carriage return, then waits to see if the next character is a linefeed before returning
- What readLine() should do:
- Sees a carriage return, returns immediately, and throws away the next character if it's a linefeed
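A sketch of the behavior the slide asks for: return as soon as a carriage return arrives, and remember to discard a linefeed if one follows. SafeLineReader is a hypothetical class, not part of java.io; it reads one char at a time, so wrap the underlying stream in a BufferedReader for speed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical class for illustration; not part of java.io.
class SafeLineReader {

    private final Reader in;
    private boolean skipLF = false; // true just after a carriage return

    SafeLineReader(Reader in) {
        this.in = in;
    }

    // Returns the next line without its terminator, or null at end of stream.
    // A CR returns at once; a following LF is discarded on the next call.
    String readLine() throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (skipLF) {
                skipLF = false;
                if (c == '\n') continue; // second half of a CR LF pair
            }
            if (c == '\r') {
                skipLF = true;
                return line.toString();
            }
            if (c == '\n') return line.toString();
            line.append((char) c);
        }
        return line.length() > 0 ? line.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        SafeLineReader r =
            new SafeLineReader(new StringReader("one\r\ntwo\rthree\nfour"));
        String s;
        while ((s = r.readLine()) != null) {
            System.out.println(s);
        }
    }
}
```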
Webcat

    import java.net.*;
    import java.io.*;

    public class Webcat {

      public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
          try {
            URL u = new URL(args[i]);
            InputStream in = u.openStream();
            InputStreamReader isr = new InputStreamReader(in);
            BufferedReader br = new BufferedReader(isr);
            int c; // read() returns an int so that -1 can signal end of stream
            while ((c = br.read()) != -1) {
              System.out.print((char) c);
            }
            br.close();
          }
          catch (IOException e) {
            System.err.println(e);
          }
        }
      }
    }
CGI
- Common Gateway Interface
- A lot is written about writing server-side CGI. I'm going to show you client-side CGI.
- We'll need to explore HTTP a little deeper to do this.
Normal web surfing uses these two steps
- The browser requests a page
- The server sends the page
- Data flows primarily from the server to the
client.
Forms
- There are times when the server needs to get data from the client rather than the other way around. The common way to do this is with an HTML form.
CGI
- The user types the requested data into the form and hits the submit button.
- The client browser then sends the data to the server using the Common Gateway Interface, CGI for short.
- CGI uses the HTTP protocol to transmit the data, either as part of the query string or as separate data following the MIME header.
GET and POST
- When the data is sent as a query string included with the file request, this is called CGI GET.
- When the data is sent as data attached to the request following the MIME header, this is called CGI POST.
HTTP
- Web browsers communicate with web servers through a standard protocol known as HTTP, an acronym for HyperText Transfer Protocol.
- This protocol defines
- how a browser requests a file from a web server
- how a browser sends additional data along with the request (e.g. the data formats it can accept)
- how the server sends data back to the client
- response codes
A Typical HTTP Connection
- Client opens a socket to port 80 on the server.
- Client sends a GET request including the name and path of the file it wants and the version of the HTTP protocol it supports.
- The client sends a MIME header.
- The client sends a blank line.
- The server sends a MIME header.
- The server sends the data in the file.
- The server closes the connection.
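These steps can be sketched with a plain java.net.Socket, which shows what URL and URLConnection do for you. Assumptions: the default host www.example.com and path / are placeholders, RawHttpGet and buildRequest() are hypothetical names, the User-Agent value is made up, and the request uses HTTP/1.0 so the server closes the connection when it's done. HTTP lines end in CR LF, so the request is built with explicit \r\n rather than println().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

// Hypothetical example class for illustration.
class RawHttpGet {

    // Builds the request line, a minimal MIME header, and the blank line.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "User-Agent: RawHttpGet\r\n"   // made-up agent name
             + "\r\n";                        // blank line ends the header
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "www.example.com";
        String path = args.length > 1 ? args[1] : "/";
        try (Socket s = new Socket(host, 80)) {           // open port 80
            Writer out = new OutputStreamWriter(s.getOutputStream(), "US-ASCII");
            out.write(buildRequest(host, path));          // request + header
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(s.getInputStream(), "ISO-8859-1"));
            String line;
            while ((line = in.readLine()) != null) {      // header, then data
                System.out.println(line);
            }
        }                                                 // server has closed
    }
}
```

Unlike openStream(), this prints the server's MIME header too, because nothing strips it off.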
What the client sends to the server

    GET /javafaq/images/cup.gif
    Connection: Keep-Alive
    User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
    Host: www.oreilly.com:80
    Accept: image/gif, image/x-xbitmap, image/jpeg, */*
MIME
- MIME is an acronym for "Multipurpose Internet Mail Extensions".
- It is an Internet standard defined in RFCs 2045 through 2049.
- It was originally intended for use with email messages, but has been adopted for use in HTTP.
Browser Request MIME Header
- When the browser sends a request to a web server, it also sends a MIME header.
- MIME headers contain name-value pairs, essentially a name followed by a colon and a space, followed by a value.

    Connection: Keep-Alive
    User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
    Host: www.digitalthink.com:80
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Server Response MIME Header
- When a web server responds to a web browser, it sends a MIME header along with the response that looks something like this:

    Server: Netscape-Enterprise/2.01
    Date: Sat, 02 Aug 1997 07:52:46 GMT
    Accept-ranges: bytes
    Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
    Content-length: 2810
    Content-type: text/html
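Since each header line is a name, a colon, and a value, parsing a header takes only a few lines. A sketch, assuming the header has already been read into a string with CR LF line endings; HeaderParser is a hypothetical name, and names are lowercased because header names are case-insensitive. Lines without a colon, like an HTTP status line, are skipped, and if a name repeats, the last value wins in this simple version.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical example class for illustration.
class HeaderParser {

    // Parses "Name: value" lines up to the first blank line.
    static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : header.split("\r\n")) {
            if (line.isEmpty()) break;            // blank line ends the header
            int colon = line.indexOf(':');
            if (colon < 0) continue;              // e.g. "HTTP/1.0 200 OK"
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            fields.put(name, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String header = "HTTP/1.0 200 OK\r\n"
            + "Server: Netscape-Enterprise/2.01\r\n"
            + "Date: Sat, 02 Aug 1997 07:52:46 GMT\r\n"
            + "Content-length: 2810\r\n"
            + "Content-type: text/html\r\n"
            + "\r\n";
        Map<String, String> fields = parse(header);
        System.out.println(fields.get("content-type"));
        System.out.println(Integer.parseInt(fields.get("content-length")));
    }
}
```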
Query Strings
- CGI GET data is sent in URL-encoded query strings.
- A query string is a set of name=value pairs separated by ampersands:
- Author=Sadie, Julie&Title=Women Composers
- It is separated from the rest of the URL by a question mark.
URL Encoding
- Alphanumeric ASCII characters (a-z, A-Z, and 0-9) and the punctuation symbols - _ . and * are left unchanged.
- The space character is converted into a plus sign (+).
- All other characters are translated into a percent sign followed by the two hexadecimal digits corresponding to their numeric value.
For example,
- The comma is ASCII character 44 (decimal) or 2C (hex). Therefore if the comma appears as part of a URL, it is encoded as %2C.
- The query string "Author=Sadie, Julie&Title=Women Composers" is encoded as
- Author=Sadie%2C+Julie&Title=Women+Composers
The URLEncoder class
- The java.net.URLEncoder class contains a single static method which encodes strings in x-www-form-urlencoded format:
- URLEncoder.encode(String s)
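For example,

    String qs = "Author=Sadie, Julie&Title=Women Composers";
    String eqs = URLEncoder.encode(qs);
    System.out.println(eqs);

- This prints
- Author%3dSadie%2c+Julie%26Title%3dWomen+Composers
- The = and & separators were encoded along with the data, so the server can no longer tell where one name-value pair ends and the next begins.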
    String eqs = "Author=" + URLEncoder.encode("Sadie, Julie");
    eqs += "&";
    eqs += "Title=";
    eqs += URLEncoder.encode("Women Composers");
    System.out.println(eqs);

- This prints the properly encoded query string
- Author=Sadie%2c+Julie&Title=Women+Composers
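The same piece-by-piece approach generalizes to any set of name-value pairs. This sketch uses the two-argument URLEncoder.encode(String, String) overload (Java 1.4 and later), which names the character encoding explicitly; the one-argument form used above is deprecated. Current JDKs emit uppercase hex digits such as %2C. QueryString and build() are hypothetical names, and UTF-8 is an assumed choice of encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example class for illustration.
class QueryString {

    // Encodes each name and value separately, then joins with = and &.
    static String build(Map<String, String> pairs)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), "UTF-8"));
            sb.append('=');
            sb.append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps insertion order
        pairs.put("Author", "Sadie, Julie");
        pairs.put("Title", "Women Composers");
        System.out.println(build(pairs));
    }
}
```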
The URLDecoder class
- In Java 1.2 the java.net.URLDecoder class contains a single static method which decodes strings in x-www-form-urlencoded format:
- URLDecoder.decode(String s)
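Decoding reverses the process. A sketch that splits a query string into name-value pairs, assuming Java 1.4 or later for the two-argument URLDecoder.decode(String, String) and UTF-8 data; QueryDecoder is a hypothetical name. The string is split on & and = before decoding, because an encoded %26 would otherwise turn into a separator.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example class for illustration.
class QueryDecoder {

    // Splits first, then decodes, so an encoded %26 stays part of its value.
    static Map<String, String> parse(String query)
            throws UnsupportedEncodingException {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            pairs.put(URLDecoder.decode(name, "UTF-8"),
                      URLDecoder.decode(value, "UTF-8"));
        }
        return pairs;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> pairs =
            parse("Author=Sadie%2C+Julie&Title=Women+Composers");
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```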
GET URLs

    String eqs = "Author=" + URLEncoder.encode("Sadie, Julie");
    eqs += "&";
    eqs += "Title=";
    eqs += URLEncoder.encode("Women Composers");
    try {
      URL u = new URL("http://www.superbooks.com/search.cgi?" + eqs);
      InputStream in = u.openStream();
      //...
    }
    catch (IOException e) {
      //...
    }
URLConnections
- The java.net.URLConnection class is an abstract class that handles communication with different kinds of servers like ftp servers and web servers.
- Protocol-specific subclasses of URLConnection handle different kinds of servers.
- By default, connections to HTTP URLs use the GET method.
URLConnections vs. URLs
- Can send output as well as read input
- Can post data to CGIs
- Can read headers from a connection
URLConnection five steps
- 1. The URL is constructed.
- 2. The URL's openConnection() method creates the URLConnection object.
- 3. The parameters for the connection and the request properties that the client sends to the server are set up.
- 4. The connect() method makes the connection to the server. (optional)
- 5. The response header information is read using getHeaderField().
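The five steps map onto code roughly as follows. A sketch with several assumptions: FiveSteps is a hypothetical name, the User-Agent value is made up, the content is assumed to be UTF-8 text, and main builds a file: URL from a command-line path so it runs without a network. For an http URL the same calls send the request property to the server; which header fields exist depends on the protocol handler.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical example class for illustration.
class FiveSteps {

    static String fetch(URL u) throws IOException {
        URLConnection uc = u.openConnection();                 // step 2
        uc.setRequestProperty("User-Agent", "FiveSteps/0.1");  // step 3 (made-up value)
        uc.connect();                                          // step 4 (optional)
        // step 5: read header fields; which ones exist depends on the protocol
        System.err.println("Content-type: " + uc.getContentType());
        System.err.println("Content-length: " + uc.getHeaderField("content-length"));
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(uc.getInputStream(), "UTF-8"))) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        URL u = new File(args[0]).toURI().toURL();             // step 1
        System.out.print(fetch(u));
    }
}
```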
I/O Across a URLConnection
- Data may be read from the connection in one of two ways:
- raw, by using the input stream returned by getInputStream()
- through a content handler with getContent().
- Data can be sent to the server using the output stream provided by getOutputStream().
For example,

    try {
      URL u = new URL("http://www.sd99.com/");
      URLConnection uc = u.openConnection();
      uc.connect();
      InputStream in = uc.getInputStream();
      // read the data...
    }
    catch (IOException e) {
      //...
    }
Reading Header Data
- The getHeaderField(String name) method returns the string value of a named header field.
- Names are case-insensitive.
- If the requested field is not present, null is returned.

    String lm = uc.getHeaderField("Last-modified");
getHeaderFieldKey()
- The keys of the header fields are returned by the getHeaderFieldKey(int n) method.
- The first field is 1.
- If a numbered key is not found, null is returned.
- You can use this in combination with getHeaderField() to loop through the complete header.
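For example

    String key = null;
    for (int i = 1; (key = uc.getHeaderFieldKey(i)) != null; i++) {
      // getHeaderField(i) rather than getHeaderField(key), so a key that
      // appears more than once prints each of its own values
      System.out.println(key + ": " + uc.getHeaderField(i));
    }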
getHeaderFieldInt() and getHeaderFieldDate()
- These are utility methods that read a named header and convert its value into an int and a long respectively.
- public int getHeaderFieldInt(String name, int Default)
- public long getHeaderFieldDate(String name, long Default)
- The long returned by getHeaderFieldDate() (milliseconds since midnight, January 1, 1970 GMT) can be converted into a Date object using the Date(long) constructor like this:

    long lm = uc.getHeaderFieldDate("Last-modified", 0);
    Date lastModified = new Date(lm);
Six Convenience Methods
- These return the values of six particularly common header fields:
- public int getContentLength()
- public String getContentType()
- public String getContentEncoding()
- public long getExpiration()
- public long getDate()
- public long getLastModified()
    try {
      URL u = new URL("http://www.sdexpo.com/");
      URLConnection uc = u.openConnection();
      uc.connect();
      String key = null;
      for (int n = 1; (key = uc.getHeaderFieldKey(n)) != null; n++) {
        System.out.println(key + ": " + uc.getHeaderField(n));
      }
    }
    catch (IOException e) {
      System.err.println(e);
    }
Writing data to a URLConnection
- Similar to reading data from a URLConnection.
- First inform the URLConnection that you plan to use it for output.
- Before getting the connection's input stream, get the connection's output stream and write to it.
- Commonly used to talk to CGIs that use the POST method.
Eight Steps
- 1. Construct the URL.
- 2. Call the URL's openConnection() method to create the URLConnection object.
- 3. Pass true to the URLConnection's setDoOutput() method.
- 4. Create the data you want to send, preferably as a byte array.
- 5. Call getOutputStream() to get an output stream object.
- 6. Write the byte array calculated in step 4 onto the stream.
- 7. Close the output stream.
- 8. Call getInputStream() to get an input stream object. Read from it as usual.
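The eight steps might look like this for a form POST. A sketch under assumptions: the target URL http://www.example.com/cgi-bin/booksearch.pl and the form fields are hypothetical, FormPoster is a hypothetical name, the form data is x-www-form-urlencoded with UTF-8, the response is read as UTF-8 text, and setRequestProperty() labels the body with the standard application/x-www-form-urlencoded type.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

// Hypothetical example class for illustration.
class FormPoster {

    static String post(URL u, String formData) throws IOException {
        URLConnection uc = u.openConnection();                      // step 2
        uc.setDoOutput(true);                                       // step 3
        uc.setRequestProperty("Content-Type",
                              "application/x-www-form-urlencoded");
        byte[] body = formData.getBytes("US-ASCII");                // step 4
        try (OutputStream out = uc.getOutputStream()) {             // step 5
            out.write(body);                                        // step 6
        }                                                           // step 7: closed
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(                // step 8
                new InputStreamReader(uc.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String form = "username=" + URLEncoder.encode("Sadie, Julie", "UTF-8")
                    + "&realname=" + URLEncoder.encode("Women Composers", "UTF-8");
        URL u = new URL("http://www.example.com/cgi-bin/booksearch.pl"); // step 1
        System.out.print(post(u, form));
    }
}
```

For an http URL, calling getOutputStream() after setDoOutput(true) switches the request method from GET to POST automatically.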
POST CGIs
- A typical POST request to a CGI looks like this:

    POST /cgi-bin/booksearch.pl HTTP/1.0
    Referer: http://www.macfaq.com/sampleform.html
    User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
    Content-length: 60
    Content-type: application/x-www-form-urlencoded
    Host: utopia.poly.edu:56435

    username=Sadie%2C+Julie&realname=Women+Composers
A POST request includes
- the POST line
- a MIME header, which must include
- content type
- content length
- a blank line that signals the end of the MIME header
- the actual data of the form, encoded in x-www-form-urlencoded format.
- A URLConnection for an http URL will set up the request line and the MIME header for you as long as you set its doOutput field to true by invoking setDoOutput(true).
- If you also want to read from the connection, you should set doInput to true with setDoInput(true) too.
For example,

    URLConnection uc = u.openConnection();
    uc.setDoOutput(true);
    uc.setDoInput(true);
- The request line and MIME header are sent as soon as the URLConnection connects. Then getOutputStream() returns an output stream on which you can write the x-www-form-urlencoded name-value pairs.
HttpURLConnection
- java.net.HttpURLConnection is an abstract subclass of URLConnection that provides some additional methods specific to the HTTP protocol.
- URL connection objects that are returned by an http URL will be instances of java.net.HttpURLConnection.
Recall
- A typical HTTP response from a web server begins like this:

    HTTP/1.0 200 OK
    Server: Netscape-Enterprise/2.01
    Date: Sat, 02 Aug 1997 07:52:46 GMT
    Accept-ranges: bytes
    Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
    Content-length: 2810
    Content-type: text/html
Response Codes
- The getHeaderField() and getHeaderFieldKey() methods don't return the HTTP response code.
- After you've connected, you can retrieve the numeric response code (200 in the above example) with the getResponseCode() method, and the message associated with it (OK in the above example) with the getResponseMessage() method.
HTTP Methods
- Java 1.0 only supports GET and POST requests to HTTP servers.
- Java 1.1/1.2 supports GET, POST, HEAD, OPTIONS, PUT, DELETE, and TRACE.
- The method is chosen with the setRequestMethod(String method) method.
- A java.net.ProtocolException, a subclass of IOException, is thrown if an unknown method is specified.
getRequestMethod()
- The getRequestMethod() method returns the string
form of the request method currently set for the
URLConnection. GET is the default method.
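Opening an HttpURLConnection doesn't contact the server, so you can inspect and change the request method, and watch a bad one get rejected, before any network traffic. A sketch; MethodDemo and accepts() are hypothetical names, and the HEAD request in main assumes www.example.com is reachable.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;

// Hypothetical example class for illustration.
class MethodDemo {

    // Returns true if HttpURLConnection accepts the method name.
    static boolean accepts(HttpURLConnection huc, String method) {
        try {
            huc.setRequestMethod(method);
            return true;
        } catch (ProtocolException e) { // unknown method; the old one is kept
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        URL u = new URL("http://www.example.com/"); // assumed reachable
        HttpURLConnection huc = (HttpURLConnection) u.openConnection();
        System.out.println("Default: " + huc.getRequestMethod());
        System.out.println("FROB accepted? " + accepts(huc, "FROB"));
        huc.setRequestMethod("HEAD"); // ask for the header only, no body
        System.out.println(huc.getResponseCode() + " " + huc.getResponseMessage());
        huc.disconnect();
    }
}
```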
disconnect()
- The disconnect() method of the HttpURLConnection class closes the connection to the web server.
- Needed for HTTP/1.1 Keep-alive
For example,

    try {
      URL u = new URL("http://www.amnesty.org/");
      HttpURLConnection huc = (HttpURLConnection) u.openConnection();
      huc.setRequestMethod("PUT");
      huc.setDoOutput(true);  // required before getOutputStream()
      huc.connect();
      OutputStream os = huc.getOutputStream();
      // put the data...
      os.close();
      int code = huc.getResponseCode();  // read only after the data is sent
      if (code >= 200 && code < 300) {
        // the server accepted the data
      }
      huc.disconnect();
    }
    catch (IOException e) {
      //...
    }
usingProxy()
- The boolean usingProxy() method returns true if
web connections are being funneled through a
proxy server, false if they're not.
Redirect Instructions
- Most web servers can be configured to automatically redirect browsers to the new location of a page that's moved.
- To redirect browsers, a server sends a 300-level response and a Location header that specifies the new location of the requested page.
    GET /elharo/macfaq/index.html HTTP/1.0

    HTTP/1.1 302 Moved Temporarily
    Date: Mon, 04 Aug 1997 14:21:27 GMT
    Server: Apache/1.2b7
    Location: http://www.macfaq.com/macfaq/index.html
    Connection: close
    Content-type: text/html

    <HTML><HEAD>
    <TITLE>302 Moved Temporarily</TITLE>
    </HEAD><BODY>
    <H1>Moved Temporarily</H1>
    The document has moved <A HREF="http://www.macfaq.com/macfaq/index.html">here</A>.<P>
    </BODY></HTML>
- HTML is returned for browsers that don't understand redirects, but most modern browsers do not display this and jump straight to the page specified in the Location header instead.
- Because redirects can change the site to which a user is connecting without the user's knowledge, URLConnections do not follow redirects arbitrarily.
Following Redirects
- The HttpURLConnection.setFollowRedirects(true) method says that connections will follow redirect instructions from the web server.
- Untrusted applets are not allowed to set this.
- HttpURLConnection.getFollowRedirects() returns true if redirect requests are honored, false if they're not.
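To see each hop yourself, or to cross from http to https (which HttpURLConnection won't do automatically), you can turn off automatic following for a single connection with setInstanceFollowRedirects(false), added in Java 1.3, and read the Location header by hand. A sketch; RedirectTracer is a hypothetical name, the hop limit is an arbitrary choice, and it assumes http or https URLs so the cast to HttpURLConnection succeeds. In current JDKs the static setFollowRedirects() default is true.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical example class for illustration.
class RedirectTracer {

    // Follows up to maxHops redirects by hand and returns the final URL.
    static URL trace(URL start, int maxHops) throws IOException {
        URL current = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection huc = (HttpURLConnection) current.openConnection();
            huc.setInstanceFollowRedirects(false);  // this connection only
            int code = huc.getResponseCode();
            String location = huc.getHeaderField("Location");
            huc.disconnect();
            if (code < 300 || code >= 400 || location == null) {
                return current;                     // not a redirect: done
            }
            System.out.println(code + " " + current + " -> " + location);
            current = new URL(current, location);   // Location may be relative
        }
        throw new IOException("Too many redirects from " + start);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Final: " + trace(new URL(args[0]), 5));
    }
}
```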
To Learn More
- Java Network Programming
- O'Reilly & Associates, 1997
- ISBN 1-56592-227-1
- Java I/O
- O'Reilly & Associates, 1999
- ISBN 1-56592-485-1
- Web Client Programming with Java
- http://www.digitalthink.com/catalog/cs/cs308/index.html
Questions?