URL Programming - PowerPoint PPT Presentation

Provided by: Jam9107
1
URL Programming
2
Agenda
  • What is a URL?
  • How to use URLs in Java
  • Encoding and decoding
  • Accessing data from a URL through InputStream and
    OutputStream objects
  • Precise control with URLConnection and
    HttpURLConnection

3
The java.net.URL class
  • A URL object represents a URL.
  • The URL class contains methods to
  • create new URLs
  • parse the different parts of a URL
  • get an input stream from a URL so you can read
    data from a server
  • get content from the server as a Java object

4
Content and Protocol Handlers
  • Content and protocol handlers separate the data
    being downloaded from the protocol used to
    download it.
  • The protocol handler negotiates with the server
    and parses any headers. It gives the content
    handler only the actual data of the requested
    resource.
  • The content handler translates those bytes into a
    Java object like an InputStream or ImageProducer.

5
Finding Protocol Handlers
  • When the virtual machine creates a URL object, it
    looks for a protocol handler that understands the
    protocol part of the URL such as "http" or
    "mailto".
  • If no such handler is found, the constructor
    throws a MalformedURLException.
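
A minimal sketch of the handler lookup described above: it probes whether this JVM has a protocol handler for a scheme by attempting to construct a URL. The class name ProtocolCheck, the host example.com and the scheme "nosuchprotocol" are made up for illustration. Recent JDKs mark the URL constructors as deprecated, but they still behave this way.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ProtocolCheck {
    // Returns true if a protocol handler for the scheme can be found.
    static boolean isSupported(String protocol) {
        try {
            new URL(protocol + "://example.com/"); // hypothetical host, never contacted
            return true;
        } catch (MalformedURLException e) {
            // Thrown when no handler understands the protocol part
            return false;
        }
    }

    public static void main(String[] args) {
        for (String p : new String[] {"http", "file", "nosuchprotocol"}) {
            System.out.println(p + " supported: " + isSupported(p));
        }
    }
}
```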

6
Supported Protocols
  • The exact protocols that Java supports vary from
    implementation to implementation, though http and
    file are supported pretty much everywhere. Sun's
    JDK 1.1 understands ten:
  • file
  • ftp
  • gopher
  • http
  • mailto
  • appletresource
  • doc
  • netdoc
  • systemresource
  • verbatim

7
URL Constructors
  • There are four (six in 1.2) constructors in the
    java.net.URL class.
  • public URL(String u) throws MalformedURLException
  • public URL(String protocol, String host, String
    file) throws MalformedURLException
  • public URL(String protocol, String host, int
    port, String file) throws MalformedURLException
  • public URL(URL context, String url) throws
    MalformedURLException
  • public URL(String protocol, String host, int
    port, String file, URLStreamHandler handler)
    throws MalformedURLException
  • public URL(URL context, String url,
    URLStreamHandler handler) throws
    MalformedURLException

8
Constructing URL Objects (1)
  • An absolute URL like
    http://entry.hit.edu.tw/jameschen/aaa.html#bk1
    try {
      URL u = new URL("http://entry.hit.edu.tw/jameschen/aaa.html#bk1");
    }
    catch (MalformedURLException e) {
      // take some action
    }

9
Constructing URL Objects (2)
  • You can also construct the URL by passing its
    pieces to the constructor, like this:
    URL u = null;
    try {
      u = new URL("http", "entry.hit.edu.tw", "/jameschen/aaa.html#bk1");
    }
    catch (MalformedURLException e) {
      // take some action
    }

10
Constructing URL Objects (3) -- including the
Port
    URL u = null;
    try {
      u = new URL("http", "entry.hit.edu.tw", 8000, "/jameschen/aaa.html#bk1");
    }
    catch (MalformedURLException e) {
      // take some action
    }

11
Relative URLs
  • Many HTML files contain relative URLs.

12
Constructing Relative URLs
  • The fourth constructor creates URLs relative to a
    given URL. For example,
    try {
      URL u1 = new URL("http://metalab.unc.edu/index.html");
      URL u2 = new URL(u1, "books.html");
    }
    catch (MalformedURLException e) {
      // take some action
    }
  • This is particularly useful when parsing HTML.
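
As a rough illustration of how links found in a page resolve against the page's own URL, the sketch below applies the relative constructor to a few link forms. The helper name resolve and the /javafaq/ directory are assumptions chosen for the example, not part of the slides.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrls {
    // Resolves a link found in a page against the URL of that page.
    static String resolve(String base, String link) {
        try {
            return new URL(new URL(base), link).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The /javafaq/ directory is an assumed path used only for illustration.
        String base = "http://metalab.unc.edu/javafaq/index.html";
        String[] links = {"books.html", "/books.html", "../index.html"};
        for (String link : links) {
            System.out.println(link + " -> " + resolve(base, link));
        }
    }
}
```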

13
Parsing URLs
  • The java.net.URL class has five methods to split
    a URL into its component parts. These are
  • public String getProtocol()
  • public String getHost()
  • public int getPort()
  • public String getFile()
  • public String getRef()

14
For example,
    try {
      URL u = new URL("http://entry.hit.edu.tw/jameschen/aaa.html#bk1");
      System.out.println("The protocol is " + u.getProtocol());
      System.out.println("The host is " + u.getHost());
      System.out.println("The port is " + u.getPort());
      System.out.println("The file is " + u.getFile());
      System.out.println("The anchor is " + u.getRef());
    }
    catch (MalformedURLException e) {
      // take some action
    }

15
Parsing URLs
  • JDK 1.3 adds three more
  • public String getAuthority()
  • public String getUserInfo()
  • public String getQuery()
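
The following sketch collects the getters from the last few slides into one map so the newer methods can be compared with the older ones. The user info (user:secret), the port and the query (q=java) are invented for the example, and the class and method names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

class UrlParts {
    // Splits a URL string into its named parts, in a stable order.
    static Map<String, String> split(String spec) {
        try {
            URL u = new URL(spec);
            Map<String, String> parts = new LinkedHashMap<>();
            parts.put("protocol", u.getProtocol());
            parts.put("authority", u.getAuthority());
            parts.put("userInfo", u.getUserInfo());
            parts.put("host", u.getHost());
            parts.put("port", String.valueOf(u.getPort()));
            parts.put("file", u.getFile());   // path plus query
            parts.put("query", u.getQuery());
            parts.put("ref", u.getRef());
            return parts;
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up URL that exercises every part
        split("http://user:secret@entry.hit.edu.tw:8000/jameschen/search?q=java#bk1")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```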

16
Missing Pieces
  • If a port is not explicitly specified in the URL,
    getPort() returns -1. This means the default port.
  • If the ref doesn't exist, it's just null, so
    watch out for NullPointerExceptions. Better yet,
    test to see that it's non-null before using it.
  • If the file is left off completely, e.g.
    http://java.sun.com, then JDK 1.1 sets it to "/";
    current JDKs return an empty string instead.
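
One way to guard against these missing pieces is sketched below. The helper names effectivePort and refOrEmpty are hypothetical; getDefaultPort() is a later addition to java.net.URL (Java 1.4) than the JDK these slides target.

```java
import java.net.MalformedURLException;
import java.net.URL;

class MissingPieces {
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Falls back to the protocol's default port when none is given (-1).
    static int effectivePort(URL u) {
        return u.getPort() == -1 ? u.getDefaultPort() : u.getPort();
    }

    // Replaces a missing ref (null) with an empty string.
    static String refOrEmpty(URL u) {
        String ref = u.getRef();
        return ref == null ? "" : ref;
    }

    public static void main(String[] args) {
        URL u = parse("http://java.sun.com");
        System.out.println("raw port: " + u.getPort());
        System.out.println("effective port: " + effectivePort(u));
        System.out.println("ref: '" + refOrEmpty(u) + "'");
        System.out.println("file: '" + u.getFile() + "'");
    }
}
```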

17
Reading Data from a URL Object
  • The openStream() method connects to the server
    specified in the URL and returns an InputStream
    object fed by the data from that connection.
  • public final InputStream openStream() throws
    IOException
  • Any headers that precede the actual data are
    stripped off before the stream is opened.
  • Network connections are less reliable and slower
    than files. Buffer with a BufferedReader or a
    BufferedInputStream.

18
Example Webcat v1
    import java.net.*;
    import java.io.*;

    public class Webcat {
      public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
          try {
            URL u = new URL(args[i]);
            InputStream in = u.openStream();
            InputStreamReader isr = new InputStreamReader(in);
            BufferedReader br = new BufferedReader(isr);
            String theLine;
            while ((theLine = br.readLine()) != null) {
              System.out.println(theLine);
            }
          }
          catch (IOException e) { System.err.println(e); }
        }
      }
    }

19
The Bug in readLine()
  • What readLine() does:
  • Sees a carriage return, then waits to see whether
    the next character is a line feed before returning
  • What readLine() should do:
  • Sees a carriage return, returns immediately, and
    throws away the next character if it's a line feed

20
Example Webcat v2
    import java.net.*;
    import java.io.*;

    public class Webcat {
      public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
          try {
            URL u = new URL(args[i]);
            InputStream in = u.openStream();
            InputStreamReader isr = new InputStreamReader(in);
            BufferedReader br = new BufferedReader(isr);
            int c;
            // read() avoids the readLine() problem on lone carriage returns
            while ((c = br.read()) != -1) {
              System.out.write(c);
            }
            // write(int) only flushes on a newline, so flush the remainder
            System.out.flush();
          }
          catch (IOException e) { System.err.println(e); }
        }
      }
    }
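
The Webcat examples never close their streams and rely on the platform default character set. A possible variant, assuming the resource is UTF-8 text, is sketched below with try-with-resources. To stay runnable without a network it reads a temporary file through a file: URL; the names WebcatSketch, fetch and roundTrip are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WebcatSketch {
    // Reads everything behind the URL as UTF-8 text (assumed encoding).
    static String fetch(URL u) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(u.openStream(), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = br.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } // the reader and the underlying stream are closed here
        return text.toString();
    }

    // Demo helper: writes a temp file, then reads it back through a file: URL.
    static String roundTrip(String content) {
        try {
            Path tmp = Files.createTempFile("webcat", ".txt");
            try {
                Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
                return fetch(tmp.toUri().toURL());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("line one\nline two\n"));
    }
}
```

The same fetch() method works for http URLs; only the demo helper is tied to the local file system.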

21
URL Encoding
  • Alphanumeric ASCII characters (a-z, A-Z, and 0-9)
    and the punctuation symbols - _ . ! ~ * ' ( )
    are left unchanged.
  • The space character is converted into a plus sign
    (+).
  • Other characters are translated into a percent
    sign (%) followed by the two hexadecimal digits
    corresponding to their numeric value.

22
For example,
  • The comma (,) is ASCII character 44 (decimal) or
    2C (hex). Therefore if the comma appears as part
    of a URL it is encoded as %2C.
  • The query string
    "Author=Sadie, Julie&Title=Women Composers"
  • is encoded as
    "Author=Sadie%2C+Julie&Title=Women+Composers"
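
The arithmetic behind %2C can be checked with a few lines of Java. This is only a sketch of the rule for single ASCII characters; the class and method names are made up.

```java
class PercentHex {
    // Percent-encodes one ASCII character as '%' plus two hex digits.
    static String escape(char c) {
        if (c > 0x7F) {
            throw new IllegalArgumentException("ASCII only in this sketch");
        }
        return String.format("%%%02X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println("decimal value of ',' is " + (int) ',');
        System.out.println("',' is encoded as " + escape(','));
    }
}
```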

23
The URLEncoder class
  • The java.net.URLEncoder class contains a single
    static method which encodes strings in
    x-www-form-urlencoded format:
    URLEncoder.encode(String s)

24
For example, the wrong way
    String qs = "Author=Sadie, Julie&Title=Women Composers";
    String eqs = URLEncoder.encode(qs);
    System.out.println(eqs);
  • This prints
    Author%3dSadie%2c+Julie%26Title%3dWomen+Composers
  • Is this output what you need? The = and &
    separators have been encoded too.

25
For example, the correct way
    String eqs = "Author=" + URLEncoder.encode("Sadie, Julie");
    eqs += "&";
    eqs += "Title=";
    eqs += URLEncoder.encode("Women Composers");
  • This produces the properly encoded query string
    Author=Sadie%2c+Julie&Title=Women+Composers
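
The one-argument encode(String) used on these slides is deprecated in current JDKs in favor of an overload that names the character encoding. The sketch below encodes each name and value separately and joins them with = and &, assuming UTF-8. The class QueryStrings and its build helper are hypothetical. Note that current JDKs emit uppercase hex digits (%2C) where the JDK 1.1 output above shows lowercase.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryStrings {
    // Encodes one name or value; UTF-8 is an assumption of this sketch.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds name=value&name=value from alternating names and values.
    static String build(String... namesAndValues) {
        if (namesAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("need name/value pairs");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < namesAndValues.length; i += 2) {
            if (i > 0) {
                sb.append('&');
            }
            sb.append(encode(namesAndValues[i]))
              .append('=')
              .append(encode(namesAndValues[i + 1]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("Author", "Sadie, Julie", "Title", "Women Composers"));
        // Encoding the whole string escapes the separators as well
        System.out.println(encode("Author=Sadie, Julie&Title=Women Composers"));
    }
}
```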

26
GET URLs with Query String
    String eqs = "Author=" + URLEncoder.encode("Sadie, Julie");
    eqs += "&";
    eqs += "Title=";
    eqs += URLEncoder.encode("Women Composers");
    try {
      URL u = new URL("http://www.superbooks.com/search.cgi?" + eqs);
      InputStream in = u.openStream();
      //...
    }
    catch (IOException e) {
      //...
    }

27
The URLDecoder class
  • In Java 1.2 the java.net.URLDecoder class
    contains a single static method which decodes
    strings in x-www-form-urlencoded format:
    URLDecoder.decode(String s)
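
Decoding is the mirror image: split the query string on & and = first, then decode each piece. The sketch below assumes UTF-8 and uses the two-argument decode overload, since the one-argument form is deprecated in current JDKs. QueryDecoder and parse are made-up names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryDecoder {
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8"); // UTF-8 is an assumption
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Splits on the raw separators first, then decodes names and values.
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    public static void main(String[] args) {
        parse("Author=Sadie%2C+Julie&Title=Women+Composers")
            .forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```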

28
URLConnections
  • The java.net.URLConnection class is an abstract
    class that handles communication with different
    kinds of servers like ftp servers and web
    servers.
  • Protocol specific subclasses of URLConnection
    handle different kinds of servers.
  • By default, connections to HTTP URLs use the GET
    method.

29
URLConnections vs. URLs
  • Can send output as well as read input
  • Can post data to CGIs
  • Can read headers from a connection

30
URLConnection five steps
  • 1. The URL is constructed.
  • 2. The URL's openConnection() method creates the
    URLConnection object.
  • 3. The parameters for the connection and the
    request properties that the client sends to the
    server are set up.
  • 4. The connect() method makes the connection to
    the server. (optional)
  • 5. The response header information is read using
    getHeaderField().

31
I/O Across a URLConnection
  • Data may be read from the connection in one of
    two ways
  • raw by using the input stream returned by
    getInputStream()
  • through a content handler with getContent()
  • Data can be sent to the server using the output
    stream provided by getOutputStream()

32
Example URLConnection
    try {
      URL u = new URL("http://www.w3c.org/");
      URLConnection uc = u.openConnection();
      uc.connect();
      InputStream in = uc.getInputStream();
      // read the data...
    }
    catch (IOException e) {
      //...
    }

33
Reading Header Data
  • The getHeaderField(String name) method returns
    the string value of a named header field.
  • Names are case-insensitive.
  • If the requested field is not present, null is
    returned.
    String lm = uc.getHeaderField("Last-modified");

34
getHeaderFieldKey()
  • The keys of the header fields are returned by the
    getHeaderFieldKey(int n) method.
  • The first field is 1.
  • If a numbered key is not found, null is returned.
  • You can use this in combination with
    getHeaderField() to loop through the complete
    header

35
Example -- getHeaderField(key) and
getHeaderFieldKey(i)
    String key = null;
    for (int i = 1; (key = uc.getHeaderFieldKey(i)) != null; i++) {
      System.out.println(key + ": " + uc.getHeaderField(key));
    }
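
To try the header loop without depending on an outside web server, the sketch below starts a tiny local server with com.sun.net.httpserver.HttpServer (shipped with the JDK, but not part of the java.* API) and walks through the five URLConnection steps against it. The class name HeaderDump, the "hello" body and the Accept header are assumptions made for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class HeaderDump {
    // Collects "key: value" for every response header, starting at index 1.
    static List<String> headers(HttpURLConnection uc) {
        List<String> out = new ArrayList<>();
        String key;
        for (int i = 1; (key = uc.getHeaderFieldKey(i)) != null; i++) {
            out.add(key + ": " + uc.getHeaderField(i));
        }
        return out;
    }

    // Starts a throwaway local server, requests "/", returns status plus headers.
    static List<String> demo() {
        try {
            InetAddress loopback = InetAddress.getLoopbackAddress();
            HttpServer server = HttpServer.create(new InetSocketAddress(loopback, 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/plain");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                URL u = new URL("http", loopback.getHostAddress(), port, "/");   // step 1
                HttpURLConnection uc =
                    (HttpURLConnection) u.openConnection(Proxy.NO_PROXY);        // step 2
                uc.setRequestProperty("Accept", "text/plain");                   // step 3
                uc.connect();                                                    // step 4
                List<String> result = new ArrayList<>();
                result.add(uc.getResponseCode() + " " + uc.getResponseMessage());
                result.addAll(headers(uc));                                      // step 5
                uc.disconnect();
                return result;
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Pairing getHeaderFieldKey(i) with getHeaderField(i) rather than getHeaderField(key) keeps repeated headers apart, since a lookup by name returns only one of them.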

36
getHeaderFieldInt() and getHeaderFieldDate()
  • Utility methods that read a named header and
    convert its value into an int and a long (a date
    in milliseconds since the epoch), respectively.
  • public int getHeaderFieldInt(String name, int
    Default)
  • public long getHeaderFieldDate(String name, long
    Default)

37
More about getHeaderFieldDate()
  • The long returned by getHeaderFieldDate() can be
    converted into a Date object using a Date()
    constructor like this:
    long lm = uc.getHeaderFieldDate("Last-modified", 0);
    Date lastModified = new Date(lm);

38
Six Convenience Methods
  • These return the values of six particularly
    common header fields
  • public int getContentLength()
  • public String getContentType()
  • public String getContentEncoding()
  • public long getExpiration()
  • public long getDate()
  • public long getLastModified()

39
Example
    try {
      URL u = new URL("http://entry.hit.edu.tw/");
      URLConnection uc = u.openConnection();
      uc.connect();
      String key = null;
      for (int n = 1; (key = uc.getHeaderFieldKey(n)) != null; n++) {
        System.out.println(key + ": " + uc.getHeaderField(key));
      }
    }
    catch (IOException e) {
      System.err.println(e);
    }

40
Writing data to a URLConnection
  • Similar to reading data from a URLConnection.
  • First inform the URLConnection that you plan to
    use it for output
  • Before getting the connection's input stream, get
    the connection's output stream and write to it.
  • Commonly used to talk to CGIs that use the POST
    method
  • Must Construct both header and data parts.

41
A POST request includes
  • the POST line
  • a MIME header which must include
  • content type
  • content length
  • a blank line that signals the end of the MIME
    header
  • the actual data of the form, encoded in
    x-www-form-urlencoded format.

42
POST CGIs
  • A typical POST request to a CGI looks like this
    (header first, then a blank line, then the data):
    POST /cgi-bin/booksearch.pl HTTP/1.0
    Referer: http://www.macfaq.com/sampleform.html
    User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
    Content-length: 60
    Content-type: application/x-www-form-urlencoded
    Host: utopia.poly.edu:56435

    username=Sadie%2C+Julie&realname=Women+Composers

43
Eight Steps for Writing Data
  • 1. Construct the URL.
  • 2. Call the URL's openConnection() method to
    create the URLConnection object.
  • 3. Pass true to the URLConnection's setDoOutput()
    method.
  • 4. Create the data you want to send, preferably
    as a byte array.
  • 5. Call getOutputStream() to get an output stream
    object.
  • 6. Write the byte array created in step 4 onto
    the stream.
  • 7. Close the output stream.
  • 8. Call getInputStream() to get an input stream
    object. Read from it as usual. (optional)
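
The eight steps can be sketched end to end as follows. A small local echo server built on com.sun.net.httpserver.HttpServer (bundled with the JDK, outside the java.* API) stands in for the CGI so the example runs without a network. PostSketch, readAll and the form fields are assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PostSketch {
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Steps 2 to 8 for an already constructed URL.
    static String post(URL u, String form) throws IOException {
        HttpURLConnection uc = (HttpURLConnection) u.openConnection(Proxy.NO_PROXY); // step 2
        uc.setDoOutput(true);                                     // step 3, implies POST
        uc.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        byte[] data = form.getBytes(StandardCharsets.UTF_8);      // step 4
        try (OutputStream os = uc.getOutputStream()) {            // step 5
            os.write(data);                                       // step 6
        }                                                         // step 7
        try (InputStream in = uc.getInputStream()) {              // step 8
            return new String(readAll(in), StandardCharsets.UTF_8);
        } finally {
            uc.disconnect();
        }
    }

    // Runs post() against a local server that echoes the method and body.
    static String demo() {
        try {
            InetAddress loopback = InetAddress.getLoopbackAddress();
            HttpServer server = HttpServer.create(new InetSocketAddress(loopback, 0), 0);
            server.createContext("/", exchange -> {
                String body = new String(readAll(exchange.getRequestBody()), StandardCharsets.UTF_8);
                byte[] reply = (exchange.getRequestMethod() + " " + body)
                    .getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, reply.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(reply);
                }
            });
            server.start();
            try {
                URL u = new URL("http", loopback.getHostAddress(),
                                server.getAddress().getPort(), "/");       // step 1
                return post(u, "username=Sadie%2C+Julie&realname=Women+Composers");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```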

44
Writing data to a URLConnection --
setDoOutput(true), setDoInput(true)
  • A URLConnection for an http URL will set up the
    request line and the MIME header for you as long
    as you set its doOutput field to true by invoking
    setDoOutput(true).
  • If you also want to read from the connection, you
    should set doInput to true with setDoInput(true)
    too.

45
For example
    URLConnection uc = u.openConnection();
    uc.setDoOutput(true);
    uc.setDoInput(true);

46
Writing data to a URLConnection --
getOutputStream()
  • The request line and MIME header are sent as
    soon as the URLConnection connects. Then
    getOutputStream() returns an output stream on
    which you can write the x-www-form-urlencoded
    name-value pairs.

47
HttpURLConnection
  • java.net.HttpURLConnection is an abstract
    subclass of URLConnection that provides some
    additional methods specific to the HTTP protocol.
  • URL connection objects that are returned by an
    http URL will be instances of
    java.net.HttpURLConnection.

48
HttpURLConnection cont.
  • Client (HttpURLConnection methods)
  • setRequestMethod()
  • connect()
  • // Response Info
  • getResponseCode()
  • getResponseMessage()
  • // Redirect setting
  • setFollowRedirects(true)
  • getFollowRedirects()
  • disconnect()
  • getRequestMethod()
  • Server
  • // retrieve form data
  • // send back status info

49
Recall HTTP Response
  • A typical HTTP response from a web server begins
    like this:
    HTTP/1.0 200 OK
    Server: Netscape-Enterprise/2.01
    Date: Sat, 02 Aug 1997 07:52:46 GMT
    Accept-ranges: bytes
    Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
    Content-length: 2810
    Content-type: text/html

50
Get Response Codes and Messages
  • The getHeaderField() and getHeaderFieldKey()
    don't return the HTTP response code
  • After you've connected, you can retrieve the
    numeric response code--200 in the above
    example--with the getResponseCode() method
  • and the message associated with it--OK in the
    above example--with the getResponseMessage()
    method.
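
Once getResponseCode() has returned a number, a common next step is to sort it into a class by its first digit. A small sketch using the constants defined on HttpURLConnection follows; the class ResponseClass and its labels are made up.

```java
import java.net.HttpURLConnection;

class ResponseClass {
    // Maps a numeric HTTP response code to a coarse category.
    static String describe(int code) {
        switch (code / 100) {
            case 2: return "success";
            case 3: return "redirect";
            case 4: return "client error";
            case 5: return "server error";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        int[] codes = {
            HttpURLConnection.HTTP_OK,          // 200
            HttpURLConnection.HTTP_MOVED_TEMP,  // 302
            HttpURLConnection.HTTP_NOT_FOUND    // 404
        };
        for (int code : codes) {
            System.out.println(code + " is a " + describe(code));
        }
    }
}
```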

51
HTTP Protocols
  • Java 1.0 only supports GET and POST requests to
    HTTP servers.
  • Java 1.1/1.2 supports GET, POST, HEAD, OPTIONS,
    PUT, DELETE, and TRACE.
  • The request method is chosen with the
    setRequestMethod(String method) method.
  • A java.net.ProtocolException, a subclass of
    IOException, is thrown if an unknown method is
    specified.
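
The method check happens locally, before any connection is made, so it can be tried without network access. In the sketch below, MethodCheck and the invalid method name "FETCH" are invented; the URL is the one used on an earlier slide and is never contacted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;

class MethodCheck {
    // Returns true if HttpURLConnection accepts the given request method.
    static boolean accepts(String method) {
        try {
            // openConnection() only creates the object; nothing is sent yet.
            HttpURLConnection huc =
                (HttpURLConnection) new URL("http://www.w3c.org/").openConnection();
            huc.setRequestMethod(method);
            return method.equals(huc.getRequestMethod());
        } catch (ProtocolException e) {
            return false; // unknown request method
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the request method of a freshly opened connection.
    static String defaultMethod() {
        try {
            HttpURLConnection huc =
                (HttpURLConnection) new URL("http://www.w3c.org/").openConnection();
            return huc.getRequestMethod();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default: " + defaultMethod());
        for (String m : new String[] {"PUT", "HEAD", "FETCH"}) {
            System.out.println(m + " accepted: " + accepts(m));
        }
    }
}
```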

52
getRequestMethod()
  • The getRequestMethod() method returns the string
    form of the request method currently set for the
    URLConnection. GET is the default method.

53
disconnect()
  • The disconnect() method of the HttpURLConnection
    class closes the connection to the web server.
  • Needed for HTTP/1.1 Keep-alive

54
For example,
    try {
      URL u = new URL("http://www.metalab.unc.edu/javafaq/books.html");
      HttpURLConnection huc = (HttpURLConnection) u.openConnection();
      huc.setRequestMethod("PUT");
      huc.setDoOutput(true);  // required before getOutputStream()
      huc.connect();
      OutputStream os = huc.getOutputStream();
      // put the data...
      os.close();
      // asking for the response code sends the request
      int code = huc.getResponseCode();
      if (code >= 200 && code < 300) {
        // the PUT succeeded
      }
      huc.disconnect();
    }
    catch (IOException e) {
      //...
    }

55
Using Proxy ??
  • The boolean usingProxy() method returns true if
    web connections are being funneled through a
    proxy server, false if they're not.

56
Redirect Instructions
  • Most web servers can be configured to
    automatically redirect browsers to the new
    location of a page that's moved.
  • To redirect browsers, a server sends a 300-level
    response code and a Location header that
    specifies the new location of the requested page.
  • HTML is returned for browsers that don't
    understand redirects, but most modern browsers
    jump straight to the page specified in the
    Location header instead.
  • Because redirects can change the site a user is
    connecting to without their knowledge, plain
    URLConnections do not follow them arbitrarily.
    Redirect handling is available only in
    HttpURLConnection objects.

57
    GET /elharo/macfaq/index.html HTTP/1.0

    HTTP/1.1 302 Moved Temporarily
    Date: Mon, 04 Aug 1997 14:21:27 GMT
    Server: Apache/1.2b7
    Location: http://www.macfaq.com/macfaq/index.html
    Connection: close
    Content-type: text/html

    <HTML><HEAD>
    <TITLE>302 Moved Temporarily</TITLE>
    </HEAD><BODY>
    <H1>Moved Temporarily</H1>
    The document has moved <A HREF="http://www.macfaq.com/macfaq/index.html">here</A>.<P>
    </BODY></HTML>

58
Following Redirects
  • The static HttpURLConnection.setFollowRedirects(true)
    method says that connections will follow redirect
    instructions from the web server.
  • Untrusted applets are not allowed to set this.
  • HttpURLConnection.getFollowRedirects()
    returns true if redirect requests are honored,
    false if they're not.
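
The effect of the redirect setting can be observed against a local server that answers /old with a 302 and a Location header. The sketch uses setInstanceFollowRedirects(), the per-connection counterpart of the static method (added in Java 1.3), so it does not change the JVM-wide default. The paths /old and /new, the class name and the embedded com.sun.net.httpserver.HttpServer are assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class RedirectSketch {
    // Returns the response code seen by the client for the given setting.
    static int statusFor(URL u, boolean follow) throws IOException {
        HttpURLConnection huc = (HttpURLConnection) u.openConnection(Proxy.NO_PROXY);
        huc.setInstanceFollowRedirects(follow);
        try {
            return huc.getResponseCode();
        } finally {
            huc.disconnect();
        }
    }

    // Starts a local server where /old redirects to /new, then requests /old.
    static int demo(boolean follow) {
        try {
            InetAddress loopback = InetAddress.getLoopbackAddress();
            HttpServer server = HttpServer.create(new InetSocketAddress(loopback, 0), 0);
            server.createContext("/old", exchange -> {
                String host = exchange.getRequestHeaders().getFirst("Host");
                exchange.getResponseHeaders().set("Location", "http://" + host + "/new");
                exchange.sendResponseHeaders(302, -1); // -1 means no response body
                exchange.close();
            });
            server.createContext("/new", exchange -> {
                byte[] body = "moved here".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            try {
                URL u = new URL("http", loopback.getHostAddress(),
                                server.getAddress().getPort(), "/old");
                return statusFor(u, follow);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("not following: " + demo(false));
        System.out.println("following:     " + demo(true));
    }
}
```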