SAX - PowerPoint PPT Presentation

About This Presentation
Title:

SAX

Description:

used for 'one-pass' processing. do not need to build entire ... stream-based processing. SAX parsers are small and fast because they only process relevant data ...

Slides: 32
Provided by: xpad

Transcript and Presenter's Notes

Title: SAX


1
SAX
  • The Simple API for XML

2
the SAX concept
  • stream-based processing
  • application code applied to each chunk of XML as
    it is parsed
  • incremental processing
  • discard irrelevant information immediately
  • data-structure flexibility
  • flexible architecture for interoperability with
    other components
  • flexible internal data structure design

3
SAX packages
  • org.xml.sax
  • core SAX interfaces and exception classes
  • two concrete classes
  • SAX1 and SAX2 support
  • core to all SAX distributions
  • org.xml.sax.helpers
  • utility implementations of core interfaces
  • org.xml.sax.ext
  • SAX extension handlers (extra events)

4
SAX2 distributions
  • Aelfred2
  • SAX2 version of original Aelfred parser
  • part of GNU JAXP Library
  • Crimson
  • reference implementation for JAXP
  • distributed with JDK 1.4
  • part of Apache Project
  • Xerces v2
  • also part of Apache project

5
Using SAX
  • instantiate a parser (producer)
  • stream events to a ContentHandler (consumer)
  • may also implement
  • ErrorHandler
  • DTDHandler
  • EntityResolver
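The producer/consumer split above can be sketched with JAXP as follows. This is a minimal sketch under stated assumptions, not the presentation's own code: it assumes the JDK's default SAX parser, and the names ProducerConsumerDemo, ElementCounter and countElements are hypothetical. org.xml.sax.helpers.DefaultHandler implements ContentHandler, ErrorHandler, DTDHandler and EntityResolver with empty methods, so one object can be registered for more than one role.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ProducerConsumerDemo {

    // Consumer (hypothetical): receives content events and error events from the parser.
    static class ElementCounter extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }

        @Override
        public void error(SAXParseException e) throws SAXParseException {
            throw e; // ErrorHandler role: treat recoverable errors as fatal
        }
    }

    static int countElements(String xml) throws Exception {
        // Producer: an XMLReader obtained through JAXP
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ElementCounter handler = new ElementCounter();
        reader.setContentHandler(handler); // consumer of content events
        reader.setErrorHandler(handler);   // consumer of error events
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/></a>")); // prints 3
    }
}
```

Because the parser (producer) knows nothing about what the handler does with the events, the same XMLReader can be reused after a parse completes with a different consumer, simply by calling setContentHandler again.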

6
SAX Architecture
[Diagram: XML Source → SAX parser (parse) → event stream → ContentHandler Interface → Class to Handle Content]
7
Event Stream
  • <?xml version="1.0" encoding="UTF-8"?>
  • <characters series="fatherted">
  • <character>
  • Ted Crilly
  • </character>
  • <character job="curate">
  • Dougal
  • <Inline>Rollerblading</Inline>
  • McGuire
  • </character>
  • </characters>
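To see how the attributes in this stream reach a handler, here is a sketch that lists every attribute reported through startElement. It assumes a well-formed version of the sample with the root closed by </characters> and the JDK's default parser; the class AttributeDump and method attributes are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttributeDump {

    static final String DOC =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<characters series=\"fatherted\">"
      + "<character>Ted Crilly</character>"
      + "<character job=\"curate\">Dougal <Inline>Rollerblading</Inline> McGuire</character>"
      + "</characters>";

    // Records "element@attribute=value" for every attribute the parser reports.
    static List<String> attributes(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    out.add(qName + "@" + atts.getQName(i) + "=" + atts.getValue(i));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(attributes(DOC));
        // prints [characters@series=fatherted, character@job=curate]
    }
}
```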

8
ContentHandler Interface
  • Methods for handling events generated by parsing
    the XML include
  • startElement(String uri, String localName, String
    qName, Attributes atts)
  • Fires whenever an element is found
  • characters(char[] ch, int start, int length)
  • Fires when character data is found
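A small sketch of both callbacks, assuming the JDK's default parser; the class HandlerBasics and its methods are hypothetical. The uri and localName arguments of startElement are only reliably filled in when the factory is namespace-aware. characters() hands over a shared buffer, so only ch[start] to ch[start + length - 1] may be read, and a single run of text can arrive in several calls.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class HandlerBasics {

    // Returns "uri|localName|qName" for the first (root) element.
    static String rootNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // otherwise uri and localName may be empty
        StringBuilder result = new StringBuilder();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (result.length() == 0) {
                    result.append(uri).append('|').append(localName).append('|').append(qName);
                }
            }
        });
        return result.toString();
    }

    // Concatenates every characters() call, reading only the valid slice of ch.
    static String text(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootNames("<x:book xmlns:x=\"urn:example\"><x:title/></x:book>"));
        // prints urn:example|book|x:book
        System.out.println(text("<a>Hello <b>SAX</b>!</a>")); // prints Hello SAX!
    }
}
```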

9
SAX Example
  • Write a SAX program to count features in an XML
    document
  • Input
  • XML document
  • Output
  • Number of elements
  • Number of attributes
  • Number of Processing Instructions
  • Number of characters in text data

10
SAX example
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;
import java.io.IOException;
public class DocumentStatistics {
    public static void main(String[] args) {
        XMLReader parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            parser = factory.newSAXParser().getXMLReader();
        }
set up the parser
11
SAX example
        catch (FactoryConfigurationError e) {
            System.err.println(e.getMessage());
            return;
        }
        catch (ParserConfigurationException e) {
            System.err.println(e.getMessage());
            return;
        }
        catch (SAXException e) {
            System.err.println("no SAX parser");
            return;
        }
handle exceptions
12
SAX example
        parser.setContentHandler(new XMLCounter());
        for (int i = 0; i < args.length; i++) {
            try {
                parser.parse(args[i]);
            }
            catch (SAXParseException e) { // not well-formed
                System.out.println(e.getMessage());
            }
            catch (SAXException e) { // some other error
                System.out.println(e.getMessage());
            }
            catch (IOException e) { // error at lower level
                System.out.println(e.getMessage());
            }
        }
    }
}
parse and pass to content handler
13
SAX example
import org.xml.sax.*;
public class XMLCounter implements ContentHandler {
    private int numberOfElements;
    private int numberOfAttributes;
    private int numberOfProcessingInstructions;
    private int numberOfCharacters;

    public void startDocument() throws SAXException {
        numberOfElements = 0;
        numberOfAttributes = 0;
        numberOfProcessingInstructions = 0;
        numberOfCharacters = 0;
    }
define the content handler
14
SAX example
define the action when an element is detected
    // this method counts the number of elements
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) throws SAXException {
        numberOfElements++;
        numberOfAttributes += atts.getLength();
    }
15
SAX example
define the action when characters are detected
    public void characters(char[] ch, int start, int length) throws SAXException {
        numberOfCharacters += length;
    }
16
SAX example
define the action when ignorableWhitespace is detected
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        numberOfCharacters += length;
    }
17
SAX example
define the action when a processingInstruction is detected
    public void processingInstruction(String target, String data) throws SAXException {
        numberOfProcessingInstructions++;
    }
18
SAX example
define the action when the end of the document is detected
    public void endDocument() throws SAXException {
        System.out.println("Number of elements " + numberOfElements);
        System.out.println("Number of attributes " + numberOfAttributes);
        System.out.println("Number of characters " + numberOfCharacters);
        System.out.println("Number of processing instructions "
                           + numberOfProcessingInstructions);
    }
19
SAX example
implement the rest of the ContentHandler interface
    // no-op implementations of the remaining callbacks
    public void endElement(String namespaceURI, String localName, String qName)
        throws SAXException {}
    public void endPrefixMapping(String prefix) throws SAXException {}
    public void setDocumentLocator(Locator locator) {}
    public void skippedEntity(String name) throws SAXException {}
    public void startPrefixMapping(String prefix, String uri)
        throws SAXException {}
}
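The same statistics can be gathered with less boilerplate by extending org.xml.sax.helpers.DefaultHandler, which already provides empty implementations of every ContentHandler method. A sketch, assuming JAXP's default parser; CompactCounter, its fields and count are hypothetical names. As in XMLCounter, the totals are reset in startDocument, so one instance can be reused across files.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Same counts as XMLCounter; DefaultHandler supplies the no-op methods we don't need.
class CompactCounter extends DefaultHandler {
    int elements, attributes, processingInstructions, characterCount;

    @Override
    public void startDocument() {
        elements = attributes = processingInstructions = characterCount = 0;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
        attributes += atts.getLength();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        characterCount += length;
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        characterCount += length;
    }

    @Override
    public void processingInstruction(String target, String data) {
        processingInstructions++; // the XML declaration is not reported as a PI
    }

    static CompactCounter count(String xml) throws Exception {
        CompactCounter counter = new CompactCounter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        CompactCounter c = count("<?xml version=\"1.0\"?><a x=\"1\" y=\"2\"><?pi data?><b>hi</b></a>");
        System.out.println(c.elements + " " + c.attributes + " "
            + c.processingInstructions + " " + c.characterCount); // prints 2 2 1 2
    }
}
```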
20
XML Programming Models
21
different approaches
  • XML as text
  • inspection, creation, modification using text
    editors (TextPad, Emacs, Notepad, etc.)
  • search and replace using regular expressions
    (Perl, Java, etc)
  • text processors can be used in conjunction with
    other tools such as XSLT
  • NB text processors must support Unicode

22
different approaches
  • XML as a stream of events (e.g. SAX)
  • event-based parsers produce an event stream
  • use a finite state machine model to process these
    events
  • can be tricky to program
  • used for one-pass processing
  • do not need to build entire document structure
  • fast and efficient
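A sketch of the finite state machine idea, with hypothetical element names (name, given) and a hypothetical class GivenNameFsm: the handler keeps an explicit state and collects the text of <given> only when it appears inside <name>, ignoring everything else as it streams past.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class GivenNameFsm extends DefaultHandler {
    enum State { OUTSIDE, IN_NAME, IN_GIVEN }

    private State state = State.OUTSIDE;
    private final StringBuilder buffer = new StringBuilder();
    final List<String> givenNames = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (state == State.OUTSIDE && qName.equals("name")) {
            state = State.IN_NAME;
        } else if (state == State.IN_NAME && qName.equals("given")) {
            state = State.IN_GIVEN;
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (state == State.IN_GIVEN) {
            buffer.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (state == State.IN_GIVEN && qName.equals("given")) {
            givenNames.add(buffer.toString());
            state = State.IN_NAME;
        } else if (state == State.IN_NAME && qName.equals("name")) {
            state = State.OUTSIDE;
        }
    }

    static List<String> extract(String xml) throws Exception {
        GivenNameFsm fsm = new GivenNameFsm();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), fsm);
        return fsm.givenNames;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract("<people><given>ignored</given>"
            + "<name><given>Sandy</given><family>Brownlee</family></name></people>"));
        // prints [Sandy]
    }
}
```

The transitions in endElement show why this style can be tricky: every state entered in startElement has to be left again by hand.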

23
event stream from XML
<name><given>Sandy</given><family>Brownlee</family></name>
might result in the following event stream:
  startElement: name
  startElement: given
  content: Sandy
  endElement: given
  startElement: family
  content: Brownlee
  endElement: family
  endElement: name
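An event stream like this one can be reproduced with a logging handler. A sketch under assumptions: EventTrace and trace are hypothetical names, the labels are chosen to mirror the slide, and adjacent characters() chunks are merged because a parser is free to split one text run into several calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace {

    // Records each SAX event as a short label such as "startElement: name".
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler logger = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("startElement: " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String chunk = new String(ch, start, length);
                int last = events.size() - 1;
                if (last >= 0 && events.get(last).startsWith("content: ")) {
                    events.set(last, events.get(last) + chunk); // merge split text chunks
                } else {
                    events.add("content: " + chunk);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement: " + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), logger);
        return events;
    }

    public static void main(String[] args) throws Exception {
        trace("<name><given>Sandy</given><family>Brownlee</family></name>")
            .forEach(System.out::println);
    }
}
```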
24
stream-based processing
  • SAX parsers are small and fast because they only
    process relevant data
  • smaller code is more robust and more secure
  • scales well to multiple process calls e.g. on a
    web server
  • fits well with streaming data across a network
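One way to process only the relevant data is to stop the parse as soon as it has been found. A common idiom, sketched here, is to throw a SAXException subclass from a callback and catch it around parse(); StopParsing, FirstTitle and firstTitle are hypothetical names, and the <title> element is an assumed example. Because the parser reads from an InputStream, the same code applies to a stream arriving over a network.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FirstTitle {

    // Hypothetical marker exception, used only to abandon the parse early.
    static class StopParsing extends SAXException {
        StopParsing() { super("stop: relevant data found"); }
    }

    static String firstTitle(InputStream in) throws Exception {
        StringBuilder title = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("title")) inTitle = true;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) title.append(ch, start, length); // keep only relevant text
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (inTitle && qName.equals("title")) throw new StopParsing();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (StopParsing expected) {
            // normal early exit: the remainder of the document is never processed
        }
        return title.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<feed><title>News</title><entry>...</entry></feed>"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println(firstTitle(new ByteArrayInputStream(xml))); // prints News
    }
}
```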

25
data structure flexibility
  • SAX can parse XML data into suitable structures
    for other components
  • e.g. EDI format, messaging formats
  • specialised, strongly-typed data structures are
    essential for many purposes (DOM too generic)
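A sketch of parsing straight into an application-specific type instead of a generic tree. It assumes <name>/<given>/<family> markup like the earlier event-stream example; PersonReader and Person are hypothetical names, and Person stands in for whatever strongly-typed structure another component expects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PersonReader extends DefaultHandler {

    // Hypothetical strongly-typed target structure (not a DOM node).
    static final class Person {
        final String given;
        final String family;
        Person(String given, String family) { this.given = given; this.family = family; }
        @Override public String toString() { return family + ", " + given; }
    }

    final List<Person> people = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String given;
    private String family;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if (qName.equals("name")) { given = null; family = null; }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "given":  given = text.toString().trim(); break;
            case "family": family = text.toString().trim(); break;
            case "name":   people.add(new Person(given, family)); break;
            default:       break;
        }
        text.setLength(0);
    }

    static List<Person> read(String xml) throws Exception {
        PersonReader handler = new PersonReader();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.people;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(read("<people><name><given>Sandy</given>"
            + "<family>Brownlee</family></name></people>")); // prints [Brownlee, Sandy]
    }
}
```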

26
SAX drawbacks
  • no random access to XML data
  • forward-only pass in document order
  • cannot refer to downstream data
  • like referring, in client-side JavaScript, to
    page elements that have not loaded yet
  • re-scanning acceptable for small files or cached
    data
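Data that refers forward (downstream) can still be handled in one pass by remembering the references and resolving them once the document has ended. A sketch with hypothetical <ref to="..."/> and <item id="..."> elements; the class ForwardRefs is also hypothetical. The price is that everything that might be referenced has to be kept in memory, which erodes part of SAX's advantage.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ForwardRefs extends DefaultHandler {
    private final Map<String, String> textById = new HashMap<>();
    private final List<String> pendingRefs = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String currentId;                        // id of the <item> being read, if any
    final List<String> resolved = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("ref")) {
            pendingRefs.add(atts.getValue("to"));    // target may not have been seen yet
        } else if (qName.equals("item")) {
            currentId = atts.getValue("id");
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (currentId != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("item") && currentId != null) {
            textById.put(currentId, text.toString());
            currentId = null;
        }
    }

    @Override
    public void endDocument() {
        // Only now has every <item> been seen, so forward references can be resolved.
        for (String id : pendingRefs) {
            resolved.add(textById.getOrDefault(id, "(unresolved " + id + ")"));
        }
    }

    static List<String> resolve(String xml) throws Exception {
        ForwardRefs handler = new ForwardRefs();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.resolved;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolve("<doc><ref to=\"b\"/><item id=\"b\">Beta</item></doc>"));
        // prints [Beta]
    }
}
```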

27
different approaches
  • XML as a tree
  • well-formed XML has a natural tree structure
  • tree programming models provide an API to
    manipulate the tree
  • XPath, DOM, Infoset, PSVI
  • manipulable model of the entire document
  • easier to program, costly on memory
  • navigation can be cumbersome
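For contrast with SAX, a sketch of the tree model using the JDK's DOM and XPath APIs (javax.xml.parsers.DocumentBuilderFactory, javax.xml.xpath.XPathFactory). The sample document and the names TreeQuery, load and query are assumptions. Once the whole tree is in memory, it can be queried in any order, including data that appears later in the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TreeQuery {
    static final String XML =
        "<people>"
      + "<name><given>Sandy</given><family>Brownlee</family></name>"
      + "<name><given>Ted</given><family>Crilly</family></name>"
      + "</people>";

    // Builds the whole document as an in-memory DOM tree.
    static Document load(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an XPath expression against the tree and returns its string value.
    static String query(Document doc, String expression) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = load(XML);
        // Random access in any order: the last entry first, then the first one.
        System.out.println(query(doc, "/people/name[last()]/given")); // prints Ted
        System.out.println(query(doc, "/people/name[1]/family"));     // prints Brownlee
        System.out.println(query(doc, "count(//name)"));              // prints 2
    }
}
```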

28

XSLT approach
  • XSLT used for format conversions
  • May be used in conjunction with SAX and DOM for
    pre- and post-processing
  • XSLT processors available as stand-alone or
    embeddable components
  • cumbersome for complex processing
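A minimal sketch of an XSLT format conversion using the JDK's built-in javax.xml.transform API. The stylesheet, which turns each <name> into a "Family, Given" line of plain text, and the class XsltDemo are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltDemo {
    // Hypothetical stylesheet: one "Family, Given" line per <name> element.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"//name\">"
      + "<xsl:value-of select=\"family\"/>, <xsl:value-of select=\"given\"/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(transform("<people>"
            + "<name><given>Sandy</given><family>Brownlee</family></name>"
            + "<name><given>Ted</given><family>Crilly</family></name>"
            + "</people>"));
    }
}
```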

29
DOM approach
  • Powerful API for complex data manipulation
  • complex data structure processing
  • tree-walking
  • manipulation of DOM interfaces
  • memory-hungry
  • large documents create many nodes
  • multiple simultaneous access
  • tree-structure may not be relevant to processing
  • simple processing of data items
  • location irrelevant
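Tree-walking, sketched with the W3C DOM interfaces shipped with the JDK: a recursive visit using getFirstChild and getNextSibling. TreeWalk and its methods are hypothetical names. Unlike a SAX handler, the walk can only begin once the whole tree has been built in memory.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeWalk {
    // Recursively visits n and all its descendants, counting element nodes.
    static int countElements(Node n) {
        int count = (n.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    // Depth of the deepest element at or below n (the root element has depth 1).
    static int maxDepth(Node n) {
        int deepest = 0;
        for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
            deepest = Math.max(deepest, maxDepth(child));
        }
        return (n.getNodeType() == Node.ELEMENT_NODE) ? deepest + 1 : deepest;
    }

    static Document load(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = load("<a><b><c/></b><b/></a>");
        System.out.println(countElements(doc) + " elements, depth " + maxDepth(doc));
        // prints 4 elements, depth 3
    }
}
```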

30
SAX vs DOM memory consumption
  • typical DOM implementation allocates about 10 bytes of
    memory per byte of XML data to build the DOM tree
  • a 3 MB (mid-sized) data file requires about 30 MB of memory!
  • SAX only puts relevant content into data
    structures in memory

31
XML processing issues
  • parser differences
  • parsers handle content differently
  • omission of comments
  • replacement of entity references
  • non-validating parsers may not retrieve external
    DTD
  • use of comments
  • only for human-readable information
  • not for illegitimate content (cf. HTML)
  • parsers may well ignore comments
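Comments are never reported through ContentHandler. A SAX2 parser can report them through the LexicalHandler extension interface from org.xml.sax.ext, registered with the standard property http://xml.org/sax/properties/lexical-handler; a parser that does not support it throws SAXNotRecognizedException from setProperty. A sketch, assuming the JDK's default parser (which supports the property); CommentCollector and comments are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class CommentCollector {
    static List<String> comments(String xml) throws Exception {
        List<String> found = new ArrayList<>();
        // DefaultHandler2 implements LexicalHandler in addition to ContentHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                found.add(new String(ch, start, length));
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Comments are not ContentHandler events; register the lexical handler explicitly.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(comments("<a><!-- for humans only --><b/></a>"));
    }
}
```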