Title: TECH854 XML TECHNOLOGIES Programming with XML
1TECH854 XML TECHNOLOGIES Programming with XML
- Gillian Miller
- gillian_at_ics.mq.edu.au
- Room 378
2PROGRAMMING WITH XML
- Week 5 Parsing XML - SAX
- Week 6 XML to objects - DOM
- Week 7 XML Web Applications
- Week 8 XML Integration, Data Exchange and Databases
- Making XML work in real applications
- This part of the course requires programming (i.e. Java) and assumes programming expertise
- Plenty of examples and code templates
3[Diagram labels: Microsoft, Java, IBM, XML Mediator, Web Online Inventory, Warehouse (Oracle), Web Server (Apache), HTML, Web Online Shopping, XML Transformers, Web Content Management, XML Content]
XML is a format for data exchange and content management
4Why Java?
- Portability - Java Virtual Machine
- Web and Application Services - J2EE standards
- Object-Oriented Language - logical and easy to program
- Java is playing an increasing role in e-commerce, application integration and the large information systems arena
- Industrial strength capabilities
- Scalability
- Security model
- Performance (good middle ground)
- Multithreading
- Java includes an extensive library of frameworks and interfaces (the API)
- JDBC, JAXP, Servlets, JSP, JAXB
5XML and Java Alphabet soup
JavaBean
Java
RMI
DTD
JDK
XML
JDBC
EJB
XML-Schema
Servlets
DOM
Java WSDK
JavaDoc
JAXP
JSP
JDOM
SAX
JAX-RPC
XSLT
JAXM
WSDL
SOAP
JAXR
6Java Prerequisites
public class HelloWorld {
    public String getHello() {
        return "Helloo";
    }
    public static void main(String[] args) {
        int j = 2;
        System.out.println("Hello World");
        System.out.println(getHello() + j);
    }
}
What is wrong with these programs?
public class MyCar {
    Color carColor;
    public void setColor(Color color) {
        this.carColor = color;
    }
    public static void main(String[] args) {
        MyCar car;
        car.setColor(Color.green);
    }
}
7Java Prerequisites
- Classes, Objects, methods, nulls, base types
- C syntax
- Packages, imports, Javadoc, common libraries
- Inheritance, constructors
- Abstract classes, interfaces
- Event programming, callbacks, listeners
- Exception handling
- Class paths - Java compilation
- Competent use of an IDE
- Use of Java in some context - e.g. database, ticketing application, game, client/server
8Parsing XML
[Diagram labels: TextArea, HTTP Post, Servlet, DB Connection; objects Orders, OrderItem, Product, Person, Address]
<?xml version="1.0"?>
<orders>
  <orderitem personid="235">
    <product>choc</product>
    <qty>535</qty>
  </orderitem>
</orders>
We program with objects ("Objects and Arrows"), but XML is simply an ASCII file
9Class Exercise: How to read in an XML file?
<?xml version="1.0"?>
<orders>
  <orderitem personid="235">
    <product>choc</product>
    <qty>535</qty>
  </orderitem>
</orders>
Write some pseudo code to read the XML file and turn it into an order object. What are some issues?
10[Diagram labels: XML File, Lexical Tokeniser, Parser / Grammar Checker, Internal Form, SAX or DOM]
11Lexical scanning/parsing
- At base level - file is a sequence of ASCII characters
- Scanner (lexical) returns a set of tokens and deals with whitespace
- (can it throw all the whitespace away?)
- Parser - checks the XML grammar
- (how many grammars are there?)
- startDocument
- <?xml version="1.0"?> - XML declaration (written like a processing instruction)
- <orderitem> - Tag
- </orderitem> - End Tag
- 545 - characters
12Processors
- XML has well developed processors for this task
- Take advantage of this work
- Months of development time plus wide use in market place
- Issues - new standards (eg what about schema)
- Complexities - CDATA, external entities, parameter entities, namespaces
- Recall why XML has taken off as data interchange format
- Because we already have processors and agreed syntax, we do not have to reinvent the wheel of syntax, grammars and parsing of one-off formats!
13XML Parsers
- XML Processors
- Parsers determine well-formedness of document
- Optionally determine validity
- Two common models
- SAX: Simple API for XML
- DOM: the Document Object Model
14DOM versus SAX
xml.fujitsu.com/en/tech/dom/
15SAX Parser
- SAX provides an event based interface to the parser.
- User callbacks are associated with events for
- startElement
- endElement
- characters (text)
- startDocument etc
- The callbacks are passed the data associated with the tag or the text of the character data.
- Good for large XML files - especially where processing required is linear.
- For us this is lower level - good place to start
- however SAX requires much more programming than other XML processors
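To make the callback idea concrete, here is a minimal sketch that records each SAX event as it arrives. It assumes the JAXP parser bundled with a current JDK rather than a separately installed Xerces; the class name EventTrace and the inline sample document are illustrative and do not come from the course material.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the name of every callback it receives.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument()   { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters " + new String(ch, start, length));
    }

    // Parses the XML text and returns the callbacks in the order they fired.
    static List<String> trace(String xml) {
        try {
            EventTrace handler = new EventTrace();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        trace("<orders><qty>535</qty></orders>").forEach(System.out::println);
    }
}
```

Nothing is kept in memory beyond what the handler chooses to store, which is why the approach suits large files that are processed linearly.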
16Step 1 - Obtain Processor
- Apache Xerces http://xml.apache.org
- Others
- Sun Microsystems Crimson
- Microsoft's MSXML Parser
- IBM XML4J
- Oracle XML Parser
- Check support of SAX 2.0, DOM Level 2
- To use Xerces with Java
- Download and unzip - eg C:\xerces-1_4_3
- Include in CLASSPATH
- set CLASSPATH=.;c:\xerces-1_4_3\xerces.jar
17http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/sax/2a_echo.html
18Step 2 - Understand Framework
DefaultHandler
http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html
19SAX API Framework
- SAXParserFactory - creates an instance of the parser
- SAXParser - does the work, you pass it your DefaultHandler class
- DefaultHandler - implements the 4 interfaces below
- you extend this class and implement the methods you require
- ContentHandler - interface - most of the work - methods such as startElement, endElement
- ErrorHandler - handles errors - 3 methods error, warning, fatalError
- DTDHandler - DTD entities, you probably won't need
- EntityResolver - resolve external entities, you probably won't need
- XMLReader - only if you want low-level access to events, obtained with getXMLReader()
20Work Through Example
- We will work through an example from Sun
- This is a good tutorial for you to work through later, and is available online
- The code is detailed and intricate, you will need to study it again at your leisure. However the tutorial covers many detailed aspects of the SAX API and is a valuable resource.
- Files - echo2.java, slideSample01.xml
http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/sax/2a_echo.html
21Imports
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.*;
22Main Body
You extend DefaultHandler!
Get an instance of your DefaultHandler subclass
Get an instance of SAXParserFactory
Then get the SAXParser
parse() does the work - the parser calls the handler back for each event
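The steps listed on this slide can be sketched as follows. This is not the Echo program from the Sun tutorial: ElementCounter and its countElements helper are hypothetical names, the handler only counts start tags, and the document is passed as a string instead of a file so the sketch is self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: extends DefaultHandler and overrides only what it needs.
class ElementCounter extends DefaultHandler {
    private int count = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++; // called by the parser once per start tag
    }

    // Parses the XML text and returns the number of elements seen.
    static int countElements(String xml) {
        try {
            // 1. instance of your handler class
            ElementCounter handler = new ElementCounter();
            // 2. instance of SAXParserFactory
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // 3. then the SAXParser
            SAXParser saxParser = factory.newSAXParser();
            // 4. parse: the parser now calls back into the handler
            saxParser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.count;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<orders><orderitem personid=\"235\">"
                + "<product>choc</product><qty>535</qty></orderitem></orders>";
        System.out.println("elements: " + countElements(xml));
    }
}
```

To read from a file as the Echo program does, pass new File(argv[0]) to parse() in place of the InputSource.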
23Event Programming
- "An event driven program is just a bunch of
objects laying around waiting for an event to happen."
- Do initial setup (register handlers)
- Start program
- Program waits for event to happen
- When event happens, event handler springs into action
- Program then waits for next event
- Some Java examples - WindowListener, ButtonHandler, SAX
24SAX Event Handlers
- startDocument()
- endDocument()
- startElement(String uri, String localName, String qName, Attributes atts)
- endElement(String uri, String localName, String qName)
- characters(char[] ch, int start, int length)
- ignorableWhitespace(char[] ch, int start, int length)
- setDocumentLocator(Locator locator)
- uri - namespace URI
- localName - unprefixed name
- qName - with prefix eg ora:element
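The difference between the three name arguments only shows up when the parser is namespace aware. A small sketch, assuming the JDK's bundled JAXP parser; NameReporter is a hypothetical class and http://example.com/ora is a made-up namespace URI used purely for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the three names passed to startElement.
class NameReporter extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        seen.add("uri=" + uri + " localName=" + localName + " qName=" + qName);
    }

    static List<String> report(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // JAXP factories are not namespace aware by default;
            // without this call uri and localName may be empty.
            factory.setNamespaceAware(true);
            NameReporter handler = new NameReporter();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The namespace URI below is invented for this example.
        report("<ora:element xmlns:ora=\"http://example.com/ora\"/>")
                .forEach(System.out::println);
    }
}
```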
25Back to Echo Program
These methods are part of the Echo2 class defined earlier
Recall line - saxParser.parse(new File(argv[0]), handler)
Registers this object as the handler
Then when events occur, call thyself
26Echo Program
startElement
attrs.getLength()
attrs.getValue(i)
27Echo Program
endElement
characters
why StringBuffer ??
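One likely answer to the StringBuffer question: a parser is allowed to deliver the text of a single element in several characters() calls, so a handler normally appends each chunk to a buffer and only uses the text when the end tag arrives. The sketch below assumes that reading of the slide; TextCollector is a hypothetical class, and StringBuilder stands in for the StringBuffer used in the older Echo code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that accumulates element text before using it.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // start a fresh buffer for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once for one run of text, so append.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            texts.add(qName + "=" + text);
        }
        buffer.setLength(0);
    }

    static List<String> collect(String xml) {
        try {
            TextCollector handler = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.texts;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The entity reference is a typical place where text arrives in pieces.
        System.out.println(collect(
                "<orderitem><product>choc &amp; nuts</product><qty>535</qty></orderitem>"));
    }
}
```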
28Utilities
29Results of running ECHO2
30Document Locator
public void setDocumentLocator(Locator loc) {
    try {
        out.write("LOCATOR");
        out.write("SYS ID: " + loc.getSystemId());
        out.flush();
    } catch (IOException e) {
        // Ignore errors
    }
}
>> LOCATOR SYS ID: file:<path>/../samples/slideSample01.xml
- Store the locator so that when a callback occurs, you have access to file information
- Must only be used within the scope of the document parse
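Storing the locator pays off in later callbacks, for example to report where each element starts. A minimal sketch, assuming the parser supplies a locator (the SAX specification encourages but does not require it); LineReporter is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that keeps the locator and reads it in startElement.
class LineReporter extends DefaultHandler {
    private Locator locator;
    final List<String> lines = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator; // valid only while the parse is running
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (locator != null) {
            lines.add(qName + "@" + locator.getLineNumber());
        }
    }

    static List<String> report(String xml) {
        try {
            LineReporter handler = new LineReporter();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(report("<orders>\n<orderitem>\n<qty>1</qty>\n</orderitem>\n</orders>"));
    }
}
```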
31try, catch, throws, exception
- Java's inbuilt mechanisms for when things go wrong
- cannot convert a string to a number, SQL errors, file IO
- an Exception object is generated - thrown
- your job is to catch the exception
- statements that cause a problem are encased in a try statement
- SAX Errors - fatalError, error, warning
32Exceptions
try {
    // statements that could cause a problem
} catch (SAXException e) {
    // error statements
}
SAXParseException methods: getMessage(), getLineNumber(), getSystemId()
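As a hedged illustration of the try/catch pattern, the sketch below parses a document that is not well-formed and reads the line number from the exception. WellFormedCheck and firstErrorLine are hypothetical names; the sketch relies on DefaultHandler rethrowing fatal errors, which is its documented default behaviour.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns the line of the first fatal error, or -1 if the text is well-formed.
    static int firstErrorLine(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return -1;
        } catch (SAXParseException e) {
            // SAXParseException adds position information to SAXException.
            System.out.println("Line " + e.getLineNumber() + ": " + e.getMessage());
            return e.getLineNumber();
        } catch (Exception e) {
            // I/O or configuration problems are not expected in this sketch.
            throw new IllegalStateException("unexpected failure", e);
        }
    }

    public static void main(String[] args) {
        // The <orderitem> element is never closed.
        firstErrorLine("<orders>\n<orderitem>\n</orders>");
    }
}
```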
33SAX Exceptions
- warnings
- usually related to DTD, informative, eg element defined twice
- fatalError
- necessitates stopping parser - eg not well-formed
- error (Non Fatal error)
- related to DTD, validating error
- Default is to keep going - you may wish to override
public void error(SAXParseException exc) throws SAXException {
    System.out.println("Parsing Error\n"
        + "Line: " + exc.getLineNumber() + "\n"
        + "URI: " + exc.getSystemId() + "\n"
        + "Message: " + exc.getMessage());
    throw new SAXException("Error encountered");
}
34Introducing DTDs to XML
- If you introduce a DTD your parser will behave
slightly differently
extra CHARS
35DTD Processing
- Without DTD whitespace is returned
- With DTD, whitespace can be ignored when element structure is known
- BUT you may wish to preserve or track white space
- eg to preserve document indenting
- Use method ignorableWhitespace
public void ignorableWhitespace(char[] buf, int offset, int len) throws SAXException {
    nl();
    emit("IGNORABLE");
}
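The effect of a DTD on whitespace reporting can be observed by counting which callback receives it. This is a sketch under assumptions: WhitespaceCounter is a hypothetical class, the tiny DTD is invented for the example, and validation is switched on for the DTD case because a validating parser is required to report whitespace in element content through ignorableWhitespace().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that counts whitespace per callback.
class WhitespaceCounter extends DefaultHandler {
    int ignorable = 0;    // characters reported through ignorableWhitespace()
    int inCharacters = 0; // whitespace characters reported through characters()

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorable += length;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            if (Character.isWhitespace(ch[i])) {
                inCharacters++;
            }
        }
    }

    static WhitespaceCounter count(String xml, boolean validating) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating);
            WhitespaceCounter handler = new WhitespaceCounter();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String body = "<orders>\n  <qty>535</qty>\n</orders>";
        // Invented DTD: <orders> may contain only <qty> elements.
        String dtd = "<!DOCTYPE orders [<!ELEMENT orders (qty+)><!ELEMENT qty (#PCDATA)>]>\n";
        WhitespaceCounter plain = count(body, false);
        WhitespaceCounter withDtd = count(dtd + body, true);
        System.out.println("no DTD:   ignorable=" + plain.ignorable
                + " characters=" + plain.inCharacters);
        System.out.println("with DTD: ignorable=" + withDtd.ignorable
                + " characters=" + withDtd.inCharacters);
    }
}
```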
36Validating Documents
- Validating Documents
- Schema or DTD must be present
- the ignorableWhitespace method is invoked whenever possible
- Note that a DTD is processed even if it is not validated
- To turn validation on, you must use the validation feature
37Using a Validating Parser
public static void main(String[] argv) {
    if (argv.length != 1) {
        ...
    }
    // Do not use the default (non-validating) parser
    // Use the validating parser
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true);
    try {
        ...
see echo10.java
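A short sketch of what turning validation on changes. It is not echo10.java: ValidationDemo, its SAMPLE document and the DTD inside it are invented for illustration. The same document is parsed twice, and only the validating parse reports that the orderitem is missing its qty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects validity errors instead of ignoring them.
class ValidationDemo extends DefaultHandler {
    // Invented sample: the DTD requires <qty>, the document leaves it out.
    static final String SAMPLE =
            "<!DOCTYPE orders [\n"
          + "<!ELEMENT orders (orderitem+)>\n"
          + "<!ELEMENT orderitem (product, qty)>\n"
          + "<!ATTLIST orderitem personid CDATA #REQUIRED>\n"
          + "<!ELEMENT product (#PCDATA)>\n"
          + "<!ELEMENT qty (#PCDATA)>\n"
          + "]>\n"
          + "<orders><orderitem personid=\"235\"><product>choc</product></orderitem></orders>";

    final List<String> errors = new ArrayList<>();

    @Override
    public void error(SAXParseException e) {
        // Non-fatal (validity) errors arrive here; the parse continues.
        errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
    }

    static List<String> validate(String xml, boolean validating) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating);
            ValidationDemo handler = new ValidationDemo();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.errors;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("non-validating: " + validate(SAMPLE, false));
        System.out.println("validating:     " + validate(SAMPLE, true));
    }
}
```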
38Features
- A feature is a flag used by the processor to indicate whether a certain type of processing should occur, eg in JAXP use
- factory.setNamespaceAware(true)
- factory.setValidating(true)
- Note that this is not related to the DTDHandler interface
39LexicalHandler
- Sometimes you may wish to preserve the original XML document as it is
- eg entities &lt; rather than <
- comments
- CDATA
- There is an interface called LexicalHandler
- comment(char[] ch, int start, int length) - Passes comments to the application.
- startCDATA(), endCDATA() - Tells when a CDATA section is starting and ending, which tells your application what kind of characters to expect the next time characters() is called.
- startEntity(String name), endEntity(String name) - Gives the name of a parsed entity.
- startDTD(String name, String publicId, String systemId), endDTD() - Tells when a DTD is being processed, and identifies it.
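A LexicalHandler is not registered through parse(); it is set as a parser property. The sketch below assumes the JDK's bundled parser, which supports the standard lexical-handler property, and uses org.xml.sax.ext.DefaultHandler2 as a convenient base class. LexicalDemo is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: DefaultHandler2 already implements LexicalHandler.
class LexicalDemo extends DefaultHandler2 {
    final List<String> events = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        events.add("comment:" + new String(ch, start, length).trim());
    }

    @Override public void startCDATA() { events.add("startCDATA"); }
    @Override public void endCDATA()   { events.add("endCDATA"); }

    static List<String> scan(String xml) {
        try {
            LexicalDemo handler = new LexicalDemo();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Standard SAX2 property id for registering a LexicalHandler.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(scan("<doc><!-- note --><![CDATA[a < b]]></doc>"));
    }
}
```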
40JAXP
- JAXP - Java API For XML Processing
- makes it easier to process XML data
- Abstract Layer between program and SAX, DOM, XSL
- Provides namespace support
- The examples so far have used JAXP, which means imports and factory calls are easier
- Underneath, it still uses the DOM and SAX API
41Using SAX Without JAXP
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.w3c.dom.Document;

DefaultHandler handler;
String filename;
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setContentHandler(handler);
// parser.setDTDHandler(handler);
// parser.setErrorHandler(handler);
// features must be set before parse() is called
// parser.setFeature("http://xml.org/sax/features/namespaces", true);
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.parse(filename);
p57, Maruyama et al
42SAX 1 - Now Deprecated
With SAX 1 you had to implement every method of the DocumentHandler interface.
DocumentHandler, HandlerBase, AttributeList and Parser have been replaced by equivalents in SAX 2 (ContentHandler, DefaultHandler, Attributes, XMLReader).
http://developer.java.sun.com/developer/technicalArticles/xml/JavaTechandXML/
43SAX 1 - DocumentHandler
public class myXMLHandler implements DocumentHandler {
    public void setDocumentLocator(Locator loc) { }
    public void startDocument() { }
    public void endDocument() { }
    public void startElement(String tag, AttributeList atts) { }
    public void endElement(String tag) { }
    public void characters(char[] ch, int start, int len) { }
    public void ignorableWhitespace(char[] ch, int start, int length) { }
    public void processingInstruction(String target, String data) { }
}
and to call (SAX 1 style)
Parser parser = ParserFactory.makeParser(PARSER_NAME);
parser.setDocumentHandler(new myXMLHandler());
parser.parse("filename.xml");
In SAX 2, this has been replaced by ContentHandler or better still DefaultHandler (an ADAPTER class)
44Notes
- Often startElement and endElement will end up doing a lot of the processing work
- You will end up with a set of nested case statements for each element type
- You may need something extra to keep track of where you are
- Consider using a Stack or a set of state constants
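The stack idea can be sketched against the orders document used earlier in these slides. This is one possible design, not the method from the referenced books: OrderItem and OrderHandler are hypothetical classes, and the stack simply holds the names of the elements that are currently open.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical value object for one <orderitem>.
class OrderItem {
    String personId;
    String product;
    int qty;
}

// Hypothetical handler that turns <orderitem> elements into OrderItem objects.
class OrderHandler extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>(); // open elements, innermost first
    private final StringBuilder text = new StringBuilder();
    private OrderItem current;
    final List<OrderItem> items = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.push(qName);
        text.setLength(0);
        if (qName.equals("orderitem")) {
            current = new OrderItem();
            current.personId = atts.getValue("personid");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.pop();
        String parent = path.peek(); // null once the root element has closed
        if ("orderitem".equals(parent) && qName.equals("product")) {
            current.product = text.toString().trim();
        } else if ("orderitem".equals(parent) && qName.equals("qty")) {
            current.qty = Integer.parseInt(text.toString().trim());
        } else if (qName.equals("orderitem")) {
            items.add(current);
        }
        text.setLength(0);
    }

    static List<OrderItem> read(String xml) {
        try {
            OrderHandler handler = new OrderHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.items;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        List<OrderItem> items = read("<orders><orderitem personid=\"235\">"
                + "<product>choc</product><qty>535</qty></orderitem></orders>");
        for (OrderItem item : items) {
            System.out.println(item.personId + " ordered " + item.qty + " x " + item.product);
        }
    }
}
```

Checking the parent on the stack keeps a product element that appears somewhere else in the document from being mistaken for an order line.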
45Use of States
protected static final int ROOT = 0, CATALOG = 1;
// further states such as PRODUCT follow the same pattern
public void startElement(String uri, String localName, String tag, Attributes atts) {
    if (tag.equals("Catalog") && state == ROOT)
        state = CATALOG;
    else if (tag.equals("Product") && state == CATALOG) {
        state = PRODUCT;
        id = atts.getValue("id");
        text = null;
    }
    ..
}
public void endElement(String uri, String lname, String tag) {
    if (tag.equals("Catalog") && state == CATALOG)
        ..
}
Marchal, p23 - 25
46Some Java References
Bradley, Millspaugh, Programming with Java
Bruce Eckel Thinking in Java
Horstmann Core Java
High Level Introduction
Online as PDF
Course notes
http://www.comp.mq.edu.au/courses/comp833/ password is comp833a
http://www.comp.mq.edu.au/courses/comp824/ password is comp824dis
47Java and XML References
McLaughlin Java and XML, 2nd edition
Benoit Marchal Applied XML Solutions
Maruyama et al XML and Java - Second Edition
48Java Documentation Online start using it !
49Tools
- Computer, Internet, monitor (Windows/Unix)
- Text Editor (Notepad, vi)
- Command/MS-DOS window
- XML Tools
- Xerces
- Java Development Kit
- can be downloaded from Sun - also check out the Java Web Services Development Kit
- IDE
- BlueJ is in labs - it is small with a nice editor - can download from www.bluej.org for home use
- NetBeans - free and more sophisticated - includes servlets - www.netbeans.org
- JBuilder - in labs - need to register to get key
- JDeveloper - in labs - can use for compilation - we can't change the library paths
50Example - Primitive Types
public class MyBasicVars {
    public static void main(String[] args) {
        double pi = 3.14;
        int j = 1;
        boolean more = true;
        while (more) {
            int jsquare = j * j;
            System.out.println("Square is " + jsquare);
            j = j + 1;
            if (j > 10) more = false;
        }
    }
}
double varname int varname boolean varname
while (cond)
if (cond)