Dickson K.W. Chiu - PowerPoint PPT Presentation


Transcript and Presenter's Notes



1
CSIT600b XML Programming
XML Programming Guide: Getting Started 2
  • Dickson K.W. Chiu
  • PhD, SMIEEE
  • Reference: Sun J2EE 1.4 Tutorial

2
Assignment Overview
[Diagram components: Existing HTML sites, Jtidy Parser, XSLT1, Info Summary, XSLT2, XSLT3, HTML, WML, Browser access, WAP access, J2EE environment]
  • XSLT experiments: interactive trial and error with
    IE6
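The XSLT stages in the diagram can also be prototyped outside IE6 with the JDK's built-in javax.xml.transform API. This is a minimal sketch under stated assumptions: it assumes the Jtidy stage has already produced well-formed XHTML, and the stylesheet (page title plus h2 headings copied into a summary element), the class name XsltStage and the summary format are hypothetical illustrations, not the assignment's actual XSLT1.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltStage {
    // Hypothetical "XSLT1"-style stylesheet: copies the page title and every
    // h2 heading into a small <summary> document. local-name() lets the paths
    // match whether or not the input uses the XHTML namespace.
    static final String SUMMARY_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><summary>"
        + "<title><xsl:value-of select=\"//*[local-name()='title']\"/></title>"
        + "<xsl:for-each select=\"//*[local-name()='h2']\">"
        + "<heading><xsl:value-of select='normalize-space(.)'/></heading>"
        + "</xsl:for-each>"
        + "</summary></xsl:template></xsl:stylesheet>";

    // Applies an XSLT 1.0 stylesheet to an XML document; both are given as strings.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><head><title>Hello</title></head><body>"
                     + "<h2>My name is Duke.</h2><h2>What is yours?</h2></body></html>";
        System.out.println(transform(SUMMARY_XSL, xhtml));
    }
}
```

Chaining further stages (XSLT2 to HTML, XSLT3 to WML) would just mean calling transform() again on the summary with a different stylesheet.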

3
Parsing a Web page with Jtidy
  • Better parser than the standard Java library
  • Cleans up malformed and faulty HTML
  • More functions
  • Installation
  • Download from http://jtidy.sourceforge.net/
  • Put Tidy.jar into H:\Sun\AppServer\lib
  • Edit H:\Sun\AppServer\j2eetutorial14\examples\common\targets.xml
  • Add the Tidy.jar include line to the library classpath:
      <path id="classpath">
        <fileset dir="${j2ee.home}/lib">
          <include name="j2ee.jar"/>
          <include name="Tidy.jar"/>
        </fileset>
      </path>

4
Parsing source code
    // thanks to the TA, Mr Kawah Wong
    import javax.servlet.http.*;
    import javax.servlet.*;
    import java.io.*;
    import java.net.*;
    import org.w3c.tidy.*;
    import org.w3c.dom.*;

    public class Html2Dom extends HttpServlet {
        public Html2Dom() {
        }

        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            String urlStr = req.getParameter("url");   // get the parameter
            URL url = new URL(urlStr);                 // build a URL object
            URLConnection cn = url.openConnection();   // open the connection with that URL
            // parse the HTML page into a DOM
            Tidy tidy = new Tidy();
            tidy.setIndentContent(true);
            tidy.setXHTML(true);
            tidy.setWraplen(Integer.MAX_VALUE);
            Document doc = tidy.parseDOM(cn.getInputStream(), null);
            // print out the DOM
            tidy.pprint(doc, resp.getOutputStream());
        }
    }
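Because parseDOM returns a standard org.w3c.dom.Document, everything after the Tidy step can use the ordinary W3C DOM API. The sketch below assumes JTidy is not on the classpath and instead uses the JDK's own DocumentBuilder on input that is already well-formed XHTML (the kind of output Tidy produces with setXHTML(true)); it then collects every link target. The class DomWalk and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.helpers.DefaultHandler;

class DomWalk {
    // Parses well-formed XHTML into a DOM (a stand-in for tidy.parseDOM).
    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            // Do not download the external XHTML DTD named in a DOCTYPE line
            // (a feature of the Xerces parser bundled with the JDK).
            f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder b = f.newDocumentBuilder();
            b.setErrorHandler(new DefaultHandler()); // fatal errors throw instead of printing
            return b.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XHTML", e);
        }
    }

    // Collects the href attribute of every <a> element, in document order,
    // whether or not the elements are in the XHTML namespace.
    static List<String> links(Document doc) {
        List<String> hrefs = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagNameNS("*", "a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (a.hasAttribute("href")) {
                hrefs.add(a.getAttribute("href"));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<a href=\"http://www.ust.hk\">HKUST</a><a name=\"top\">anchor only</a>"
            + "</body></html>";
        System.out.println(links(parse(xhtml))); // [http://www.ust.hk]
    }
}
```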

5
Front Page Interface
    <html>
      <head>
        <title>CSIT600B HTML to DOM Demo</title>
      </head>
      <body>
        <h2>HTML to DOM Demo</h2>
        <form method="get" action="./result">
          <label>URL: </label>
          <input type="text" name="url" style="width:800px"/>
          <input type="submit" name="submit" value="convert"/>
        </form>
      </body>
    </html>

6
WAR Structure
  • Duplicate the directory of the hello2 example for
    testing
  • Include Tidy.jar in the WAR
  • Alias /result for the servlet
  • Context root: /html2dom
  • You don't need the front page; you can call the
    servlet directly, e.g.
      http://localhost:7999/html2dom/result?url=http://www.ust.hk
  • View the page source: the output is pretty-printed
    with indentation
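The direct URL above works for a simple target, but req.getParameter("url") only receives the whole target if it is percent-encoded; a target with its own query string (say ?a=1&b=2) would otherwise be split at '&'. A minimal sketch using java.net.URLEncoder; the class name ResultUrl is hypothetical, and the host, port 7999 and context root are simply the values from this slide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ResultUrl {
    // Host, port and context root as given on the slide; adjust for your server.
    static final String BASE = "http://localhost:7999/html2dom/result";

    // Builds a direct request to the servlet alias /result, percent-encoding the
    // target URL so characters such as ':', '/', '?' and '&' arrive as one parameter.
    static String forTarget(String target) {
        try {
            return BASE + "?url=" + URLEncoder.encode(target, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(forTarget("http://www.ust.hk"));
        // http://localhost:7999/html2dom/result?url=http%3A%2F%2Fwww.ust.hk
    }
}
```

The servlet container decodes the value again, so Html2Dom itself needs no change.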

7
Bonus: JSP Brief Overview
  • J2EE Tutorial, Chapters 12 and 13
  • A JSP page contains static data plus JSP elements,
    which construct dynamic content
  • Custom tags in tag libraries for reuse
  • JavaBeans components for interfacing
  • Properties: read/write, read-only, or write-only
  • Properties: simple, meaning it contains a single
    value, or indexed, meaning it represents an array
    of values
  • For each readable property, the bean must have a
    method of the form PropertyClass getProperty() { ... }
  • Two syntaxes: standard and XML
  • Life cycle
  • Translation and compilation
  • Execution

8
JSP Translation and Compilation
  • Directives are used to control how the Web
    container translates and executes the JSP page.
  • Scripting elements are inserted into the JSP
    page's servlet class.
  • Expression language expressions are passed as
    parameters to calls to the JSP expression
    evaluator.
  • jsp:[set|get]Property elements are converted into
    method calls to JavaBeans components.
  • jsp:[include|forward] elements are converted into
    invocations of the Java Servlet API.
  • The jsp:plugin element is converted into
    browser-specific markup for activating an applet.
  • Custom tags are converted into calls to the tag
    handler that implements the custom tag.
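To make the translation rules above concrete, here is a hand-written approximation of the Java a container might generate for the jsp:useBean and jsp:setProperty elements in the next slide's example. It is a sketch, not real container output: generated code (for example from Tomcat's Jasper compiler) works through PageContext and differs in detail. The plain maps standing in for request attributes and parameters, the class TranslatedPage, and this version of UserNameBean are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class TranslatedPage {
    // Hypothetical stand-in for hello.UserNameBean: one read/write property "name".
    static class UserNameBean {
        private String name;
        public UserNameBean() { }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Roughly what a container does for
    //   <jsp:useBean id="userNameBean" class="hello.UserNameBean" scope="request"/>
    //   <jsp:setProperty name="userNameBean" property="name" value="${param.username}"/>
    // requestScope and params are plain maps standing in for the servlet request.
    static UserNameBean service(Map<String, Object> requestScope, Map<String, String> params) {
        UserNameBean bean = (UserNameBean) requestScope.get("userNameBean");
        if (bean == null) {                    // useBean: create only if not already in scope
            bean = new UserNameBean();         // relies on the no-argument constructor
            requestScope.put("userNameBean", bean);
        }
        bean.setName(params.get("username"));  // setProperty becomes a setter call
        return bean;
    }

    public static void main(String[] args) {
        Map<String, Object> scope = new HashMap<>();
        Map<String, String> params = new HashMap<>();
        params.put("username", "Duke");
        System.out.println(service(scope, params).getName()); // Duke
    }
}
```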

9
JSP Example: Standard vs. XML Syntax
Standard syntax:
    <%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
    <%@ taglib uri="http://java.sun.com/jsp/jstl/functions" prefix="fn" %>
    <html>
    <head><title>Hello</title></head>
    <body bgcolor="white">
    <img src="duke.waving.gif">
    <h2>My name is Duke. What is yours?</h2>
    <form method="get">
      <input type="text" name="username" size="25">
      <p></p>
      <input type="submit" value="Submit">
      <input type="reset" value="Reset">
    </form>
    <jsp:useBean id="userNameBean" class="hello.UserNameBean"
      scope="request"/>
    <jsp:setProperty name="userNameBean" property="name"
      value="${param.username}" />
XML syntax:
    <html
      xmlns:jsp="http://java.sun.com/JSP/Page"
      xmlns:c="http://java.sun.com/jsp/jstl/core"
      xmlns:fn="http://java.sun.com/jsp/jstl/functions">
    <head><title>Hello</title></head>
    <body bgcolor="white">
    <img src="duke.waving.gif" />
    <h2>My name is Duke. What is yours?</h2>
    <form method="get">
      <input type="text" name="username" size="25" />
      <p></p>
      <input type="submit" value="Submit" />
      <input type="reset" value="Reset" />
    </form>
    <jsp:useBean id="userNameBean" class="hello.UserNameBean"
      scope="request"/>
    <jsp:setProperty name="userNameBean" property="name"
      value="${param.username}" />
    <c:if test="${fn:length(userNameBean.name) gt 0}" >
      <jsp:directive.include file="response.jsp" />
    </c:if>
    </body>
    </html>

10
Bonus: JavaBeans Brief Overview
  • A constructor that takes no parameters
  • Properties
  • Read/write, read-only, or write-only
  • Simple, meaning it contains a single value, or
    indexed, meaning it represents an array of values
  • For each readable property, the bean must have a
    method of the form
      PropertyClass getProperty() { ... }
  • For each writable property, the bean must have a
    method of the form
      void setProperty(PropertyClass pc) { ... }
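The conventions above are what java.beans.Introspector looks for, and they are the same getter/setter naming that jsp:getProperty and jsp:setProperty rely on. A minimal sketch: the BookBean class is hypothetical (it is not the tutorial's database.BookDB), and describe() simply reports which properties the JDK detects as readable, writable, or both.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

class BeanReport {
    // Hypothetical bean that follows the conventions listed above.
    public static class BookBean {
        private String bookId = "";
        private String title = "";

        public BookBean() { }                                  // no-argument constructor

        public String getBookId() { return bookId; }           // getter only: read-only

        public String getTitle() { return title; }             // getter and setter: read/write
        public void setTitle(String title) { this.title = title; }
    }

    // Maps each property name to "r", "w" or "rw", as discovered by java.beans.Introspector.
    static Map<String, String> describe(Class<?> beanClass) {
        try {
            // Stop at Object so the inherited getClass() is not reported as a "class" property.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            Map<String, String> result = new TreeMap<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                String access = (pd.getReadMethod() != null ? "r" : "")
                              + (pd.getWriteMethod() != null ? "w" : "");
                result.put(pd.getName(), access);
            }
            return result;
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(BookBean.class)); // {bookId=r, title=rw}
    }
}
```

Note that the property name comes from the method name: getBookId() yields the property bookId, regardless of what the private field is called.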

11
Advantages of using the XML syntax
  • You can author a JSP document using one of the
    many XML-aware tools on the market, enabling you
    to ensure that your JSP document is well-formed
    XML.
  • You can validate the JSP document against a
    document type definition (DTD).
  • You can nest and scope namespaces within a JSP
    document.
  • You can use a JSP document for data interchange
    between Web applications.
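Because a JSP document in XML syntax must be well-formed, any namespace-aware XML parser can check it before deployment, which is the first advantage listed above. A minimal sketch using the JDK's SAX parser; the class name WellFormed is hypothetical. It checks well-formedness and namespace prefixes only: it does not validate against a DTD and knows nothing about JSP semantics.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormed {
    // Returns true if the text is well-formed XML with every prefix bound.
    static boolean check(String xml) {
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true); // unbound prefixes such as c: become fatal errors
            // DefaultHandler ignores content and rethrows fatal errors.
            f.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<html xmlns:c=\"http://java.sun.com/jsp/jstl/core\">"
                  + "<body><c:if test=\"${true}\">hi</c:if></body></html>";
        String bad = "<html><body><img src=\"duke.waving.gif\"></body></html>"; // <img> never closed
        System.out.println(check(ok));  // true
        System.out.println(check(bad)); // false
    }
}
```

A page in standard syntax fails this check immediately, since a directive such as <%@ taglib ... %> is not XML.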

12
Book.jspx XML Example
    <books
      xmlns:jsp="http://java.sun.com/JSP/Page"
      xmlns:c="http://java.sun.com/jsp/jstl/core">
      <jsp:useBean id="bookDB" class="database.BookDB" scope="page">
        <jsp:setProperty name="bookDB" property="database" value="${bookDBAO}" />
      </jsp:useBean>
      <c:forEach var="book" begin="0" items="${bookDB.books}">
        <book id="${book.bookId}">
          <surname>${book.surname}</surname>
          <firstname>${book.firstName}</firstname>
          <title>${book.title}</title>
          <price>${book.price}</price>
          <year>${book.year}</year>
          <description>${book.description}</description>
          <inventory>${book.inventory}</inventory>
        </book>
      </c:forEach>
    </books>
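The JSP document above emits one book element per entry in bookDB.books. For comparison, the sketch below builds the same element structure directly with the DOM API and serializes it with an identity Transformer. The Book class, toXml() and the sample data are hypothetical; they only mirror the element names on the slide and do not reproduce the tutorial's database.BookDB.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BooksXml {
    // Hypothetical book record; its fields mirror the child elements on the slide.
    static final class Book {
        final String bookId, surname, firstName, title, description;
        final float price;
        final int year, inventory;

        Book(String bookId, String surname, String firstName, String title,
             float price, int year, String description, int inventory) {
            this.bookId = bookId; this.surname = surname; this.firstName = firstName;
            this.title = title; this.price = price; this.year = year;
            this.description = description; this.inventory = inventory;
        }
    }

    // Appends <name>value</name> to parent; the serializer escapes characters like '&'.
    private static void child(Document doc, Element parent, String name, Object value) {
        Element e = doc.createElement(name);
        e.setTextContent(String.valueOf(value));
        parent.appendChild(e);
    }

    // One <book> per entry, in list order, like the c:forEach loop in the JSP document.
    static String toXml(List<Book> books) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("books");
            doc.appendChild(root);
            for (Book b : books) {
                Element book = doc.createElement("book");
                book.setAttribute("id", b.bookId);
                child(doc, book, "surname", b.surname);
                child(doc, book, "firstname", b.firstName);
                child(doc, book, "title", b.title);
                child(doc, book, "price", b.price);
                child(doc, book, "year", b.year);
                child(doc, book, "description", b.description);
                child(doc, book, "inventory", b.inventory);
                root.appendChild(book);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
            new Book("201", "Duke", "Java", "Sample Title", 10.75f, 1995, "Sample entry", 20));
        System.out.println(toXml(books));
    }
}
```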