URLs, InetAddresses, and URLConnections - PowerPoint PPT Presentation

Slides: 94
Provided by: ibib
Learn more at: http://www.ibiblio.org

1
URLs, InetAddresses, and URLConnections
  • High Level Network Programming
  • Elliotte Rusty Harold
  • elharo@metalab.unc.edu
  • http://metalab.unc.edu/javafaq/slides/

2
We will learn how Java handles
  • Internet Addresses
  • URLs
  • CGI
  • URLConnection
  • Content and Protocol handlers

3
I assume you
  • Understand basic Java syntax and I/O
  • Have a user's view of the Internet
  • Have no prior network programming experience

4
Applet Network Security Restrictions
  • Applets may
  • send data to the code base
  • receive data from the code base
  • Applets may not
  • send data to hosts other than the code base
  • receive data from hosts other than the code base

5
Some Background
  • Hosts
  • Internet Addresses
  • Ports
  • Protocols

6
Hosts
  • Devices connected to the Internet are called
    hosts
  • Most hosts are computers, but hosts also include
    routers, printers, fax machines, soda machines,
    bat houses, etc.

7
Internet addresses
  • Every host on the Internet is identified by a
    unique, four-byte Internet Protocol (IP) address.
  • This is written in dotted quad format like
    199.1.32.90 where each byte is an unsigned
    integer between 0 and 255.
  • There are about four billion unique IP addresses,
    but they aren't very efficiently allocated.

8
Domain Name System (DNS)
  • Numeric addresses are mapped to names like
    "www.blackstar.com" or "star.blackstar.com" by
    DNS.
  • Each site runs domain name server software that
    translates names to IP addresses and vice versa
  • DNS is a distributed system

9
The InetAddress Class
  • The java.net.InetAddress class represents an IP
    address.
  • It converts numeric addresses to host names and
    host names to numeric addresses.
  • It is used by other network classes like Socket
    and ServerSocket to identify hosts

10
Creating InetAddresses
  • There are no public InetAddress() constructors.
    Arbitrary addresses may not be created.
  • All addresses that are created must be checked
    with DNS

11
The getByName() factory method
  • public static InetAddress getByName(String host)
    throws UnknownHostException
  • InetAddress utopia, duke;
  • try {
  •   utopia = InetAddress.getByName("utopia.poly.edu");
  •   duke = InetAddress.getByName("128.238.2.92");
  • }
  • catch (UnknownHostException e) {
  •   System.err.println(e);
  • }

12
Other ways to create InetAddress objects
  • public static InetAddress[] getAllByName(String
    host) throws UnknownHostException
  • public static InetAddress getLocalHost() throws
    UnknownHostException

13
Getter Methods
  • public boolean isMulticastAddress()
  • public String getHostName()
  • public byte[] getAddress()
  • public String getHostAddress()
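
A minimal sketch of the getter methods in use. It assumes a numeric literal as input, which is parsed without a DNS query; the class name InetAddressDemo and its helper methods are hypothetical, and the dotted quad helper only makes sense for four-byte (IPv4) addresses.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class InetAddressDemo {
    // Formats the raw bytes from getAddress() as a dotted quad.
    // Java bytes are signed, so each one is masked with 0xFF to recover 0-255.
    static String dottedQuad(InetAddress address) {
        byte[] raw = address.getAddress();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length; i++) {
            if (i > 0) sb.append('.');
            sb.append(raw[i] & 0xFF);
        }
        return sb.toString();
    }

    // Looks up a host name or numeric literal; returns null if it is unknown.
    static String resolve(String host) {
        try {
            return dottedQuad(InetAddress.getByName(host));
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress duke = InetAddress.getByName("128.238.2.92");
        System.out.println(duke.getHostAddress());
        System.out.println(dottedQuad(duke));
        System.out.println(duke.isMulticastAddress());
    }
}
```

Calling getHostName() on the same object may trigger a reverse DNS lookup, so it is left out of this offline sketch.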

14
Utility Methods
  • public int hashCode()
  • public boolean equals(Object o)
  • public String toString()

15
Ports
  • In general a host has only one Internet address
  • This address is subdivided into 65,536 ports
  • Ports are logical abstractions that allow one
    host to communicate simultaneously with many
    other hosts
  • Many services run on well-known ports. For
    example, http tends to run on port 80

16
Protocols
  • A protocol defines how two hosts talk to each
    other.
  • The daytime protocol, RFC 867, specifies an ASCII
    representation for the time that's legible to
    humans.
  • The time protocol, RFC 868, specifies a binary
    representation, for the time that's legible to
    computers.
  • There are thousands of protocols, standard and
    non-standard

17
IETF RFCs
  • Requests For Comment
  • Document how much of the Internet works
  • Various status levels from obsolete to required
    to informational
  • TCP/IP, telnet, SMTP, MIME, HTTP, and more
  • http://www.faqs.org/rfc/

18
W3C Standards
  • IETF is based on rough consensus and running
    code
  • W3C tries to run ahead of implementation
  • IETF is an informal organization open to
    participation by anyone
  • W3C is a vendor consortium open only to companies

19
W3C Standards
  • HTTP
  • HTML
  • XML
  • RDF
  • MathML
  • SMIL
  • P3P

20
URLs
  • A URL, short for "Uniform Resource Locator", is a
    way to unambiguously identify the location of a
    resource on the Internet.

21
Example URLs
  • http://java.sun.com/
  • file:///Macintosh%20HD/Java/Docs/JDK%201.1.1%20docs/api/java.net.InetAddress.html#_top_
  • http://www.macintouch.com:80/newsrecent.shtml
  • ftp://ftp.info.apple.com/pub/
  • mailto:elharo@metalab.unc.edu
  • telnet://utopia.poly.edu
  • ftp://mp3:mp3@138.247.121.61:21000/c%3a/stuff/mp3/
  • http://elharo@java.oreilly.com/
  • http://metalab.unc.edu/nywc/comps.phtml?category=Choral+Works

22
The Pieces of a URL
  • the protocol, aka scheme
  • the authority
  • user info
  • user name
  • password
  • host name or address
  • port
  • the path, aka file
  • the ref, aka section or anchor
  • the query string

23
The java.net.URL class
  • A URL object represents a URL.
  • The URL class contains methods to
  • create new URLs
  • parse the different parts of a URL
  • get an input stream from a URL so you can read
    data from a server
  • get content from the server as a Java object

24
Content and Protocol Handlers
  • Content and protocol handlers separate the data
    being downloaded from the protocol used to
    download it.
  • The protocol handler negotiates with the server
    and parses any headers. It gives the content
    handler only the actual data of the requested
    resource.
  • The content handler translates those bytes into a
    Java object like an InputStream or ImageProducer.

25
Finding Protocol Handlers
  • When the virtual machine creates a URL object, it
    looks for a protocol handler that understands the
    protocol part of the URL such as "http" or
    "mailto".
  • If no such handler is found, the constructor
    throws a MalformedURLException.

26
Supported Protocols
  • The exact protocols that Java supports vary from
    implementation to implementation though http and
    file are supported pretty much everywhere. Sun's
    JDK 1.1 understands ten
  • file
  • ftp
  • gopher
  • http
  • mailto
  • appletresource
  • doc
  • netdoc
  • systemresource
  • verbatim

27
URL Constructors
  • There are four (six in 1.2) constructors in the
    java.net.URL class.
  • public URL(String u) throws MalformedURLException
  • public URL(String protocol, String host, String
    file) throws MalformedURLException
  • public URL(String protocol, String host, int
    port, String file) throws MalformedURLException
  • public URL(URL context, String url) throws
    MalformedURLException
  • public URL(String protocol, String host, int
    port, String file, URLStreamHandler handler)
    throws MalformedURLException
  • public URL(URL context, String url,
    URLStreamHandler handler) throws
    MalformedURLException

28
Constructing URL Objects
  • An absolute URL like
    http://www.poly.edu/fall97/grad.html#cs
  • try {
  •   URL u = new URL("http://www.poly.edu/fall97/grad.html#cs");
  • }
  • catch (MalformedURLException e) {}

29
Constructing URL Objects in Pieces
  • You can also construct the URL by passing its
    pieces to the constructor, like this:
  • URL u = null;
  • try {
  •   u = new URL("http", "www.poly.edu",
        "/schedule/fall97/bgrad.html#cs");
  • }
  • catch (MalformedURLException e) {}

30
Including the Port
  • URL u = null;
  • try {
  •   u = new URL("http", "www.poly.edu", 8000,
        "/fall97/grad.html#cs");
  • }
  • catch (MalformedURLException e) {}

31
Relative URLs
  • Many HTML files contain relative URLs.
  • Consider the page
    http://metalab.unc.edu/javafaq/index.html
  • On this page a link to "books.html" refers to
    http://metalab.unc.edu/javafaq/books.html.

32
Constructing Relative URLs
  • The fourth constructor creates URLs relative to a
    given URL. For example,
  • try {
  •   URL u1 = new URL("http://metalab.unc.edu/index.html");
  •   URL u2 = new URL(u1, "books.html");
  • }
  • catch (MalformedURLException e) {}
  • This is particularly useful when parsing HTML.
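
As a rough illustration of the relative-URL constructor, the sketch below resolves links against the page they appear on. The class name RelativeUrlDemo and the method resolve() are made up for this example. The URL constructors shown in these slides still work, though recent JDKs mark them as deprecated.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrlDemo {
    // Resolves a link found on a page against that page's own URL.
    // Returns null if either string cannot be parsed as a URL.
    static String resolve(String pageUrl, String link) {
        try {
            URL base = new URL(pageUrl);
            return new URL(base, link).toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String page = "http://metalab.unc.edu/javafaq/index.html";
        // A bare file name replaces the last path segment of the base URL.
        System.out.println(resolve(page, "books.html"));
        // A leading slash is resolved from the root of the same host.
        System.out.println(resolve(page, "/index.html"));
    }
}
```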

33
Parsing URLs
  • The java.net.URL class has five methods to split
    a URL into its component parts. These are
  • public String getProtocol()
  • public String getHost()
  • public int getPort()
  • public String getFile()
  • public String getRef()

34
For example,
  • try {
  •   URL u = new URL("http://www.poly.edu/fall97/grad.html#cs");
  •   System.out.println("The protocol is "
        + u.getProtocol());
  •   System.out.println("The host is "
        + u.getHost());
  •   System.out.println("The port is "
        + u.getPort());
  •   System.out.println("The file is "
        + u.getFile());
  •   System.out.println("The anchor is "
        + u.getRef());
  • }
  • catch (MalformedURLException e) {}

35
Parsing URLs
  • JDK 1.3 adds three more
  • public String getAuthority()
  • public String getUserInfo()
  • public String getQuery()

36
Missing Pieces
  • If a port is not explicitly specified in the URL
    it's set to -1. This means the default port is to
    be used.
  • If the ref doesn't exist, it's just null, so
    watch out for NullPointerExceptions. Better yet,
    test to see that it's non-null before using it.
  • If the file is left off completely, e.g.
    http://java.sun.com, then it's set to "/".
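
One way to guard against the missing port and the missing ref is sketched below. The class UrlPartsDemo and its helpers are hypothetical names, and getDefaultPort() is assumed to be available (it was added to java.net.URL after the JDK versions these slides describe).

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPartsDemo {
    // Parses a URL string, turning the checked exception into an unchecked one.
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns the explicit port if there is one; getPort() reports -1
    // otherwise, in which case the protocol's default port is used.
    static int effectivePort(URL u) {
        return u.getPort() != -1 ? u.getPort() : u.getDefaultPort();
    }

    // Returns the ref, or an empty string when the URL has none,
    // so callers never see null.
    static String refOrEmpty(URL u) {
        return u.getRef() != null ? u.getRef() : "";
    }

    public static void main(String[] args) {
        URL u = parse("http://www.poly.edu/fall97/grad.html#cs");
        System.out.println("raw port: " + u.getPort());
        System.out.println("effective port: " + effectivePort(u));
        System.out.println("ref: " + refOrEmpty(u));
        System.out.println("ref when absent: [" + refOrEmpty(parse("http://java.sun.com")) + "]");
    }
}
```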

37
Reading Data from a URL
  • The openStream() method connects to the server
    specified in the URL and returns an InputStream
    object fed by the data from that connection.
  • public final InputStream openStream() throws
    IOException
  • Any headers that precede the actual data are
    stripped off before the stream is opened.
  • Network connections are less reliable and slower
    than files. Buffer with a BufferedReader or a
    BufferedInputStream.

38
Webcat
  • import java.net.*;
  • import java.io.*;
  • public class Webcat {
  •   public static void main(String[] args) {
  •     for (int i = 0; i < args.length; i++) {
  •       try {
  •         URL u = new URL(args[i]);
  •         InputStream in = u.openStream();
  •         InputStreamReader isr = new InputStreamReader(in);
  •         BufferedReader br = new BufferedReader(isr);
  •         String theLine;
  •         while ((theLine = br.readLine()) != null) {
  •           System.out.println(theLine);
  •         }
  •       }
  •       catch (IOException e) { System.err.println(e); }
  •     }
  •   }
  • }

39
The Bug in readLine()
  • What readLine() does
  • Sees a carriage return, waits to see if next
    character is a line feed before returning
  • What readLine() should do
  • Sees a carriage return, returns immediately, and
    throws away the next character if it's a linefeed

40
Webcat
  • import java.net.*;
  • import java.io.*;
  • public class Webcat {
  •   public static void main(String[] args) {
  •     for (int i = 0; i < args.length; i++) {
  •       try {
  •         URL u = new URL(args[i]);
  •         InputStream in = u.openStream();
  •         InputStreamReader isr = new InputStreamReader(in);
  •         int c;  // read() returns an int so that -1 can signal end of stream
  •         while ((c = isr.read()) != -1) {
  •           System.out.print((char) c);
  •         }
  •       }
  •       catch (IOException e) { System.err.println(e); }
  •     }
  •   }
  • }

41
CGI
  • Common Gateway Interface
  • A lot is written about writing server side CGI.
    I'm going to show you client side CGI.
  • We'll need to explore HTTP a little deeper to do
    this

42
Normal web surfing uses these two steps
  • The browser requests a page
  • The server sends the page
  • Data flows primarily from the server to the
    client.

43
Forms
  • There are times when the server needs to get data
    from the client rather than the other way around.
    The common way to do this is with a form like
    this one

44
CGI
  • The user types the requested data into the form
    and hits the submit button.
  • The client browser then sends the data to the
    server using the Common Gateway Interface, CGI
    for short.
  • CGI uses the HTTP protocol to transmit the data,
    either as part of the query string or as separate
    data following the MIME header.

45
GET and POST
  • When the data is sent as a query string included
    with the file request, this is called CGI GET.
  • When the data is sent as data attached to the
    request following the MIME header, this is called
    CGI POST

46
HTTP
  • Web browsers communicate with web servers through
    a standard protocol known as HTTP, an acronym for
    HyperText Transfer Protocol.
  • This protocol defines
  • how a browser requests a file from a web server
  • how a browser sends additional data along with
    the request (e.g. the data formats it can
    accept),
  • how the server sends data back to the client
  • response codes

47
A Typical HTTP Connection
  • Client opens a socket to port 80 on the server.
  • Client sends a GET request including the name and
    path of the file it wants and the version of the
    HTTP protocol it supports.
  • The client sends a MIME header.
  • The client sends a blank line.
  • The server sends a MIME header
  • The server sends the data in the file.
  • The server closes the connection.

48
What the client sends to the server
  • GET /javafaq/images/cup.gif
  • Connection: Keep-Alive
  • User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
  • Host: www.oreilly.com:80
  • Accept: image/gif, image/x-xbitmap, image/jpeg,
    */*

49
MIME
  • MIME is an acronym for "Multipurpose Internet
    Mail Extensions".
  • an Internet standard defined in RFCs 2045 through
    2049
  • originally intended for use with email messages,
    but has been adopted for use in HTTP.

50
Browser Request MIME Header
  • When the browser sends a request to a web server,
    it also sends a MIME header.
  • MIME headers contain name-value pairs,
    essentially a name followed by a colon and a
    space, followed by a value.
  • Connection: Keep-Alive
  • User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
  • Host: www.digitalthink.com:80
  • Accept: image/gif, image/x-xbitmap, image/jpeg,
    image/pjpeg, */*

51
Server Response MIME Header
  • When a web server responds to a web browser it
    sends a MIME header along with the response that
    looks something like this
  • Server: Netscape-Enterprise/2.01
  • Date: Sat, 02 Aug 1997 07:52:46 GMT
  • Accept-ranges: bytes
  • Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
  • Content-length: 2810
  • Content-type: text/html

52
Query Strings
  • CGI GET data is sent in URL encoded query strings
  • a query string is a set of name=value pairs
    separated by ampersands
  • Author=Sadie, Julie&Title=Women Composers
  • separated from rest of URL by a question mark

53
URL Encoding
  • Alphanumeric ASCII characters (a-z, A-Z, and 0-9)
    and the $-_.!*'(), punctuation symbols are left
    unchanged.
  • The space character is converted into a plus sign
    (+).
  • Other characters (e.g. &, =, ^, #, %, {, and
    so on) are translated into a percent sign
    followed by the two hexadecimal digits
    corresponding to their numeric value.

54
For example,
  • The comma is ASCII character 44 (decimal) or 2C
    (hex). Therefore if the comma appears as part of
    a URL it is encoded as %2C.
  • The query string "Author=Sadie, Julie&Title=Women
    Composers" is encoded as
  • Author=Sadie%2C+Julie&Title=Women+Composers

55
The URLEncoder class
  • The java.net.URLEncoder class contains a single
    static method which encodes strings in
    x-www-form-urlencoded format
  • URLEncoder.encode(String s)

56
For example,
  • String qs = "Author=Sadie, Julie&Title=Women
    Composers";
  • String eqs = URLEncoder.encode(qs);
  • System.out.println(eqs);
  • This prints
  • Author%3dSadie%2c+Julie%26Title%3dWomen+Composers

57
  • String eqs = "Author=" + URLEncoder.encode("Sadie,
    Julie");
  • eqs += "&";
  • eqs += "Title=";
  • eqs += URLEncoder.encode("Women Composers");
  • This prints the properly encoded query string
  • Author=Sadie%2c+Julie&Title=Women+Composers
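
The point of the two examples above is that only the names and values should be encoded, never the = and & that separate them. A small sketch of that rule follows. It assumes a recent JDK, where the two-argument encode(String, Charset) overload is available and the hex digits come out in upper case; the class QueryStringDemo and the method pair() are hypothetical names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class QueryStringDemo {
    // Encodes one name=value pair. The name and the value are encoded
    // separately so the '=' between them stays literal.
    static String pair(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The '&' separator is added after encoding, so it is not escaped.
        String query = pair("Author", "Sadie, Julie")
                + "&" + pair("Title", "Women Composers");
        System.out.println(query);
    }
}
```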

58
The URLDecoder class
  • In Java 1.2 the java.net.URLDecoder class
    contains a single static method which decodes
    strings in x-www-form-urlencoded format
  • URLDecoder.decode(String s)
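
Decoding reverses the encoding: plus signs become spaces and percent escapes become the characters they stand for. A brief sketch, again assuming a recent JDK with the decode(String, Charset) overload; DecodeDemo is a hypothetical class name.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes a single x-www-form-urlencoded name or value.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("Sadie%2C+Julie"));
        System.out.println(decode("Women+Composers"));
    }
}
```

Split a query string on & and = before decoding; decoding first would make an encoded ampersand inside a value indistinguishable from a separator.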

59
GET URLs
  • String eqs =
  •   "Author=" + URLEncoder.encode("Sadie, Julie");
  • eqs += "&";
  • eqs += "Title=";
  • eqs += URLEncoder.encode("Women Composers");
  • try {
  •   URL u = new URL("http://www.superbooks.com/search.cgi?" + eqs);
  •   InputStream in = u.openStream();
  •   //...
  • }
  • catch (IOException e) { //...
  • }

60
URLConnections
  • The java.net.URLConnection class is an abstract
    class that handles communication with different
    kinds of servers like ftp servers and web
    servers.
  • Protocol specific subclasses of URLConnection
    handle different kinds of servers.
  • By default, connections to HTTP URLs use the GET
    method.

61
URLConnections vs. URLs
  • Can send output as well as read input
  • Can post data to CGIs
  • Can read headers from a connection

62
URLConnection five steps
  • 1. The URL is constructed.
  • 2. The URL's openConnection() method creates the
    URLConnection object.
  • 3. The parameters for the connection and the
    request properties that the client sends to the
    server are set up.
  • 4. The connect() method makes the connection to
    the server. (optional)
  • 5. The response header information is read using
    getHeaderField().

63
I/O Across a URLConnection
  • Data may be read from the connection in one of
    two ways
  • raw by using the input stream returned by
    getInputStream()
  • through a content handler with getContent().
  • Data can be sent to the server using the output
    stream provided by getOutputStream().

64
For example,
  • try {
  •   URL u = new URL("http://www.sd99.com/");
  •   URLConnection uc = u.openConnection();
  •   uc.connect();
  •   InputStream in = uc.getInputStream();
  •   // read the data...
  • }
  • catch (IOException e) { //...
  • }
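
The same steps can be tried without a network connection, since a file: URL also yields a URLConnection. The sketch below is only an illustration under that assumption: it writes a temporary file and reads it back through getInputStream(). The class ConnectionDemo and its methods are hypothetical, and the text is assumed to be UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ConnectionDemo {
    // Opens a connection to the URL and returns the body as text.
    static String fetch(URL u) throws IOException {
        URLConnection uc = u.openConnection();  // create the connection object
        uc.setUseCaches(false);                 // configure it before connecting
        uc.connect();                           // optional explicit connect
        try (InputStream in = uc.getInputStream();
             BufferedReader br = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }

    // Writes text to a temporary file and reads it back through a file: URL.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("urlconnection-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return fetch(tmp.toUri().toURL());
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("hello\nworld\n"));
    }
}
```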

65
Reading Header Data
  • The getHeaderField(String name) method returns
    the string value of a named header field.
  • Names are case-insensitive.
  • If the requested field is not present, null is
    returned.
  • String lm = uc.getHeaderField("Last-modified");

66
getHeaderFieldKey()
  • The keys of the header fields are returned by the
    getHeaderFieldKey(int n) method.
  • The first field is 1.
  • If a numbered key is not found, null is returned.
  • You can use this in combination with
    getHeaderField() to loop through the complete
    header

67
For example
  • String key = null;
  • for (int i = 1; (key = uc.getHeaderFieldKey(i)) != null; i++) {
  •   System.out.println(key + ": "
        + uc.getHeaderField(key));
  • }

68
getHeaderFieldInt() and getHeaderFieldDate()
  • These are utility methods that read a named
    header and convert its value into an int and a
    long respectively.
  • public int getHeaderFieldInt(String name, int
    Default)
  • public long getHeaderFieldDate(String name, long
    Default)

69
  • The long returned by getHeaderFieldDate() can be
    converted into a Date object using a Date()
    constructor like this
  • long s = uc.getHeaderFieldDate("Last-modified",
    0);
  • Date lm = new Date(s);
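
To see what that long represents, the sketch below converts an HTTP date string into milliseconds since the epoch and then into a Date. It uses java.time's RFC 1123 formatter as a stand-in for the parsing that getHeaderFieldDate() performs internally, which is an assumption made for illustration; HeaderDateDemo and toMillis() are hypothetical names.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Date;

class HeaderDateDemo {
    // Converts an HTTP date such as "Tue, 29 Jul 1997 15:06:46 GMT" into
    // milliseconds since the epoch, the unit that new Date(long) expects.
    static long toMillis(String httpDate) {
        return ZonedDateTime.parse(httpDate, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        long millis = toMillis("Tue, 29 Jul 1997 15:06:46 GMT");
        Date lm = new Date(millis);
        // Printing the Instant avoids any dependence on the local time zone.
        System.out.println(lm.toInstant());
    }
}
```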

70
Six Convenience Methods
  • These return the values of six particularly
    common header fields
  • public int getContentLength()
  • public String getContentType()
  • public String getContentEncoding()
  • public long getExpiration()
  • public long getDate()
  • public long getLastModified()

71
  • try {
  •   URL u = new URL("http://www.sdexpo.com/");
  •   URLConnection uc = u.openConnection();
  •   uc.connect();
  •   String key = null;
  •   for (int n = 1;
  •        (key = uc.getHeaderFieldKey(n)) != null;
  •        n++) {
  •     System.out.println(key + ": "
          + uc.getHeaderField(key));
  •   }
  • }
  • catch (IOException e) {
  •   System.err.println(e);
  • }

72
Writing data to a URLConnection
  • Similar to reading data from a URLConnection.
  • First inform the URLConnection that you plan to
    use it for output
  • Before getting the connection's input stream, get
    the connection's output stream and write to it.
  • Commonly used to talk to CGIs that use the POST
    method

73
Eight Steps
  • 1. Construct the URL.
  • 2. Call the URL's openConnection() method to
    create the URLConnection object.
  • 3. Pass true to the URLConnection's setDoOutput()
    method
  • 4. Create the data you want to send, preferably
    as a byte array.

74
  • 5. Call getOutputStream() to get an output stream
    object.
  • 6. Write the byte array created in step 4 onto
    the stream.
  • 7. Close the output stream.
  • 8. Call getInputStream() to get an input stream
    object. Read from it as usual.
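
The eight steps might be sketched as follows. This is an illustration rather than a definitive implementation: PostDemo, formBody() and post() are hypothetical names, a recent JDK is assumed (for readAllBytes() and the Charset overload of URLEncoder.encode()), and the program only contacts a server when a URL is given on the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PostDemo {
    // Step 4: build the form data as a byte array from name, value, name, value...
    static byte[] formBody(String... namesAndValues) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(namesAndValues[i], StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(namesAndValues[i + 1], StandardCharsets.UTF_8));
        }
        // The encoded form contains only ASCII characters.
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Steps 2, 3 and 5 through 8: send the body, then read the response.
    static byte[] post(URL u, byte[] body) throws IOException {
        URLConnection uc = u.openConnection();           // step 2
        uc.setDoOutput(true);                            // step 3
        try (OutputStream out = uc.getOutputStream()) {  // step 5
            out.write(body);                             // step 6
        }                                                // step 7: stream closed here
        try (InputStream in = uc.getInputStream()) {     // step 8
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] body = formBody("username", "Sadie, Julie", "realname", "Women Composers");
        System.out.println(new String(body, StandardCharsets.US_ASCII));
        if (args.length > 0) {
            // Step 1: construct the URL, here taken from the command line.
            System.out.write(post(new URL(args[0]), body));
            System.out.flush();
        }
    }
}
```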

75
POST CGIs
  • A typical POST request to a CGI looks like this
  • POST /cgi-bin/booksearch.pl HTTP/1.0
  • Referer: http://www.macfaq.com/sampleform.html
  • User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
  • Content-length: 60
  • Content-type: application/x-www-form-urlencoded
  • Host: utopia.poly.edu:56435
  • username=Sadie%2C+Julie&realname=Women+Composers
76
A POST request includes
  • the POST line
  • a MIME header which must include
  • content type
  • content length
  • a blank line that signals the end of the MIME
    header
  • the actual data of the form, encoded in
    x-www-form-urlencoded format.

77
  • A URLConnection for an http URL will set up the
    request line and the MIME header for you as long
    as you set its doOutput field to true by invoking
    setDoOutput(true).
  • If you also want to read from the connection, you
    should set doInput to true with setDoInput(true)
    too.

78
For example,
  • URLConnection uc = u.openConnection();
  • uc.setDoOutput(true);
  • uc.setDoInput(true);

79
  • The request line and MIME header are sent as
    soon as the URLConnection connects. Then
    getOutputStream() returns an output stream on
    which you can write the x-www-form-urlencoded
    name-value pairs.

80
HttpURLConnection
  • java.net.HttpURLConnection is an abstract
    subclass of URLConnection that provides some
    additional methods specific to the HTTP protocol.
  • URL connection objects that are returned by an
    http URL will be instances of
    java.net.HttpURLConnection.

81
Recall
  • a typical HTTP response from a web server begins
    like this
  • HTTP/1.0 200 OK
  • Server: Netscape-Enterprise/2.01
  • Date: Sat, 02 Aug 1997 07:52:46 GMT
  • Accept-ranges: bytes
  • Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
  • Content-length: 2810
  • Content-type: text/html

82
Response Codes
  • The getHeaderField() and getHeaderFieldKey()
    don't return the HTTP response code
  • After you've connected, you can retrieve the
    numeric response code--200 in the above
    example--with the getResponseCode() method and
    the message associated with it--OK in the above
    example--with the getResponseMessage() method.

83
HTTP Protocols
  • Java 1.0 only supports GET and POST requests to
    HTTP servers
  • Java 1.1/1.2 supports GET, POST, HEAD, OPTIONS,
    PUT, DELETE, and TRACE.
  • The protocol is chosen with the
    setRequestMethod(String method) method.
  • A java.net.ProtocolException, a subclass of
    IOException, is thrown if an unknown protocol is
    specified.

84
getRequestMethod()
  • The getRequestMethod() method returns the string
    form of the request method currently set for the
    URLConnection. GET is the default method.
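
The behaviour of setRequestMethod() and getRequestMethod() can be checked without contacting a server, because openConnection() only creates the connection object. A minimal sketch follows; RequestMethodDemo is a hypothetical class, and www.example.com is a placeholder host that is never actually contacted.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;

class RequestMethodDemo {
    // Opens (but does not connect) an HTTP connection to a placeholder URL.
    static HttpURLConnection open() throws IOException {
        return (HttpURLConnection) new URL("http://www.example.com/").openConnection();
    }

    // Reports whether HttpURLConnection accepts the given request method.
    static boolean accepts(String method) {
        try {
            HttpURLConnection huc = open();
            huc.setRequestMethod(method);
            return method.equals(huc.getRequestMethod());
        } catch (ProtocolException e) {
            return false;  // the method name was not recognized
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("default: " + open().getRequestMethod());
        System.out.println("PUT accepted: " + accepts("PUT"));
        System.out.println("BREW accepted: " + accepts("BREW"));
    }
}
```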

85
disconnect()
  • The disconnect() method of the HttpURLConnection
    class closes the connection to the web server.
  • Needed for HTTP/1.1 Keep-alive

86
For example,
  • try {
  •   URL u = new URL("http://www.amnesty.org/");
  •   HttpURLConnection huc = (HttpURLConnection)
        u.openConnection();
  •   huc.setRequestMethod("PUT");
  •   huc.setDoOutput(true); // required before getOutputStream()
  •   huc.connect();
  •   OutputStream os = huc.getOutputStream();
  •   int code = huc.getResponseCode();
  •   if (code >= 200 && code < 300) {
  •     // put the data...
  •   }
  •   huc.disconnect();
  • }
  • catch (IOException e) { //...
  • }

87
usingProxy
  • The boolean usingProxy() method returns true if
    web connections are being funneled through a
    proxy server, false if they're not.

88
Redirect Instructions
  • Most web servers can be configured to
    automatically redirect browsers to the new
    location of a page that's moved.
  • To redirect browsers, a server sends a 300 level
    response and a Location header that specifies the
    new location of the requested page.

89
  • GET /elharo/macfaq/index.html HTTP/1.0
  • HTTP/1.1 302 Moved Temporarily
  • Date: Mon, 04 Aug 1997 14:21:27 GMT
  • Server: Apache/1.2b7
  • Location: http://www.macfaq.com/macfaq/index.html
  • Connection: close
  • Content-type: text/html
  • <HTML><HEAD>
  • <TITLE>302 Moved Temporarily</TITLE>
  • </HEAD><BODY>
  • <H1>Moved Temporarily</H1>
  • The document has moved <A HREF="http://www.macfaq.com/macfaq/index.html">here</A>.<P>
  • </BODY></HTML>

90
  • HTML is returned for browsers that don't
    understand redirects, but most modern browsers do
    not display this and jump straight to the page
    specified in the Location header instead.
  • Because redirects can change the site to which a
    user is connecting without their knowledge,
    redirects are not arbitrarily followed by
    URLConnections.

91
Following Redirects
  • HttpURLConnection.setFollowRedirects(true) method
    says that connections will follow redirect
    instructions from the web server.
  • Untrusted applets are not allowed to set this.
  • HttpURLConnection.getFollowRedirects() returns
    true if redirect requests are honored, false if
    they're not.

92
To Learn More
  • Java Network Programming
  • O'Reilly & Associates, 1997
  • ISBN 1-56592-227-1
  • Java I/O
  • O'Reilly & Associates, 1999
  • ISBN 1-56592-485-1
  • Web Client Programming with Java
  • http://www.digitalthink.com/catalog/cs/cs308/index.html

93
Questions?