Title: Using XQuery and XSLT on Non-XML Data
1Using XQuery and XSLT on Non-XML Data
- XML 2007, 14:45 on 04 Dec 2007
- Tony Lavinio (alavinio(at)datadirect.com)
2Tinkertoy, TOGL, LEGO
- Everything in my childhood always connected
together nicely.
- Then I discovered life was not filled with
interchangeable parts.
- Let's do something about it.
3TSaxon Replaced XML with HTML Parser
java -jar saxon.jar -H html-doc style-doc
- How simple is that?
- Just replace the parser!
- Can we generalize that mechanism?
- It would be like a universal adapter
5GUI Tools Demo
- Open CSV file with notepad.exe
- Open CSV file in GUI tool as XML
- Open EDI file with notepad.exe
- Open EDI file in GUI tool as XML
6How does it work?
7The URIResolver Interface
8How We're Going To Do It
- Typically, catalogs allow reaching otherwise
unreachable resources.
- In this case, we are using them to reach normally
reachable resources, but also to transform them
mid-flight.
- Non-XML -> Converter -> XML -> XQuery/XSLT
- or
- XQuery/XSLT -> XML -> Converter -> Non-XML
9Why We're Going To Do It
- We Must Transform Data
- XSLT and XQuery are excellent tools for
transforming. Use the proper tool for the job.
- But they work with XML, and much of the world
isn't XML.
- Really, they work with the XML data model, and
XML is just a convenient representation, right?
So they should be able to work with anything
XML-ish: anything tree-shaped or even square.
10Use the Source, Luke
- Source
  - StreamSource
    - Simplest to code: just write text.
  - DOMSource
    - Nothing good to say about DOM.
  - SAXSource
    - Offers a push event interface.
  - StAXSource
    - Offers a pull interface, harder to implement.
- (StAX and possibly SAX can provide string
pooling, which can also help performance.)
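To make the choice of Source concrete, here is a minimal sketch that wraps the same small document in three of the JAXP Source types and serializes each one with the identity transform. The class name SourceKinds and the sample document are made up for illustration; a StAXSource would be built in the same way from an XMLStreamReader.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

// Hypothetical demo class: one small document, three kinds of Source.
class SourceKinds {
    static final String XML = "<book><title>Walden</title></book>";

    // StreamSource: hand the processor raw text and let it do the parsing.
    static Source asStream() {
        return new StreamSource(new StringReader(XML));
    }

    // DOMSource: parse into an in-memory tree first, then hand over the tree.
    static Source asDom() throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return new DOMSource(
                f.newDocumentBuilder().parse(new InputSource(new StringReader(XML))));
    }

    // SAXSource: pair an XMLReader (the event pusher) with the input it should read.
    static Source asSax() throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true);
        return new SAXSource(
                f.newSAXParser().getXMLReader(), new InputSource(new StringReader(XML)));
    }

    // Runs the JAXP identity transform over any Source and returns the serialized text.
    static String serialize(Source src) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(src, new StreamResult(out));
        return out.toString();
    }
}
```

All three calls to serialize(...) should produce the same markup, which is the point: the processor only sees a Source, not where the data came from.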
11Comma-Separated Value file sample
(What is this fascination we have with lists of
books in our demos?) (And, no, Walden Two is not
the sequel to Walden, despite what modern
movie-goers might think.)
12CSV Demo from Command Line
- java -cp bin;saxon9.jar net.sf.saxon.Query -r com.ddtek.xml2007.CSVResolver -s x-csv:file:///c:/XML_2007/books.txt -u "{<html><body>{.}</body></html>}"
- java -cp bin;saxon9.jar net.sf.saxon.Transform -r com.ddtek.xml2007.CSVResolver -s x-csv:file:///c:/XML_2007/books.txt -u table.xsl
13table.xsl
<?xml version="1.0" encoding="ASCII"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="html" encoding="ASCII"/>
  <xsl:template match="/">
    <html>
      <body>
        <xsl:copy-of select="."/>
      </body>
    </html>
  </xsl:template>
</xsl:transform>
14EDI X12 file sample
- ISA0000ZZISACUST9
0892541100600706071458U00401820
0P>'GSFAGSCUST95137624388200706071458820X
004010'ST9970001'AK1AG38'AK9A111'SE4000
1'GE1820'IEA1820'
15EDI Demo from Command Line
- java -cp bin;saxon9.jar net.sf.saxon.Query -r com.ddtek.xml2007.MultiResolver -s x-edi:file:///c:/XML_2007/997.x12 -u "{for $i in (/X12/GS/GS08, <x>-</x>, /X12/ST/ST01) return $i/text()}" !omit-xml-declaration=yes
- java -cp bin;saxon9.jar net.sf.saxon.Transform -r com.ddtek.xml2007.MultiResolver -s x-edi:file:///c:/XML_2007/997.x12 -u edi.xsl
16edi.xsl
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="X12/GS/GS08"/>
    <xsl:text>-</xsl:text>
    <xsl:value-of select="X12/ST/ST01"/>
  </xsl:template>
</xsl:stylesheet>
17How the CSV Resolver Works
- Look for a URI with the x-csv scheme
- If not found, return null, which means use default
URI handling
- If found, strip off the x-csv and take the
remainder as a URI
- And instead of just returning a stream
containing that
- Build a new stream that reads from that,
transforms it, and returns that.
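The steps above can be sketched as a small javax.xml.transform.URIResolver. This is not the CSVResolver from the attached code, only an illustration under stated assumptions: the class name CsvSchemeResolver is hypothetical, the scheme prefix is taken to be "x-csv:", the file is assumed to be UTF-8, and the actual CSV-to-XML conversion is passed in as a function so the resolver itself stays short.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of a resolver for "x-csv:" URIs.
class CsvSchemeResolver implements URIResolver {
    private static final String PREFIX = "x-csv:";

    // Wraps a reader of raw CSV text in a reader of XML text (assumed to be supplied by the caller).
    private final UnaryOperator<Reader> converter;

    CsvSchemeResolver(UnaryOperator<Reader> converter) {
        this.converter = converter;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        if (href == null || !href.startsWith(PREFIX)) {
            return null; // not ours: the processor falls back to default URI handling
        }
        // Strip the scheme; the remainder is an ordinary URI such as file:///...
        String realUri = href.substring(PREFIX.length());
        try {
            Reader raw = new InputStreamReader(
                    URI.create(realUri).toURL().openStream(), StandardCharsets.UTF_8);
            // Return the converting reader instead of the raw one.
            return new StreamSource(converter.apply(raw), href);
        } catch (IOException | IllegalArgumentException e) {
            throw new TransformerException("Cannot open " + realUri, e);
        }
    }
}
```

With JAXP, such a resolver would typically be installed through TransformerFactory.setURIResolver or Transformer.setURIResolver.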
18How the Multi Resolver Works
- Just like the CSV Resolver,
- Except looks for and dispatches multiple schemes
- x-csv for comma-separated-value files
- x-edi for EDI X12 files
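One way to picture the dispatch is a map from scheme name to a handler. The sketch below is an assumption about structure, not the MultiResolver from the attached code: the class name SchemeDispatchResolver and the register method are hypothetical, and each handler is itself a URIResolver that receives the URI with the scheme already stripped.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;

// Hypothetical sketch: route "x-csv:...", "x-edi:..." and so on to per-scheme handlers.
class SchemeDispatchResolver implements URIResolver {
    private final Map<String, URIResolver> handlers = new HashMap<>();

    // Registers a handler for one scheme, e.g. register("x-csv", csvHandler).
    SchemeDispatchResolver register(String scheme, URIResolver handler) {
        handlers.put(scheme, handler);
        return this;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        int colon = (href == null) ? -1 : href.indexOf(':');
        if (colon < 0) {
            return null; // no scheme at all: use default URI handling
        }
        URIResolver handler = handlers.get(href.substring(0, colon));
        if (handler == null) {
            return null; // a scheme we do not know (file, http, ...): use default handling
        }
        // Pass on the remainder, which is the real URI of the non-XML file.
        return handler.resolve(href.substring(colon + 1), base);
    }
}
```

Adding another format then means registering one more scheme rather than editing the resolve method.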
19A StreamSource Converter
- Subclasses java.io.InputStream or
java.io.Reader
- When the caller calls read() or read(...), pull
from the underlying stream
- ...Translate on-the-fly enough characters to
satisfy the request (at least one)
- And return that converted text instead of the
original text.
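A minimal sketch of such a pull-style converter follows. It is not the CsvToXmlStreamReader from the attached code: the class name CsvToXmlReader and the element names table, row, and column are made up, and the CSV handling is deliberately naive (split on commas, no quoted fields).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch: a Reader that hands out XML text while pulling CSV text underneath.
class CsvToXmlReader extends Reader {
    private final BufferedReader in;
    private final StringBuilder pending = new StringBuilder("<table>"); // converted, not yet read
    private boolean finished;

    CsvToXmlReader(Reader csv) {
        this.in = new BufferedReader(csv);
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        // Convert one more line at a time until at least one character is available.
        while (pending.length() == 0 && !finished) {
            String line = in.readLine();
            if (line == null) {
                pending.append("</table>");
                finished = true;
            } else if (!line.isEmpty()) {
                pending.append("<row>");
                for (String cell : line.split(",", -1)) {
                    pending.append("<column>").append(escape(cell)).append("</column>");
                }
                pending.append("</row>");
            }
        }
        if (pending.length() == 0) return -1; // everything has been handed out
        int n = Math.min(len, pending.length());
        pending.getChars(0, n, buf, off);
        pending.delete(0, n);
        return n;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // Convenience for trying it out: converts a whole CSV string through the reader.
    static String convert(String csv) {
        try (Reader r = new CsvToXmlReader(new StringReader(csv))) {
            StringBuilder out = new StringBuilder();
            char[] buf = new char[16];
            for (int n; (n = r.read(buf, 0, buf.length)) != -1; ) {
                out.append(buf, 0, n);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Wrapping the result in a StreamSource is then enough for an XSLT or XQuery processor to treat the CSV file as XML.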
20A SAXSource Converter
- SAXSource is a little easier to write, because
instead of data being pulled through, you push it
at your convenience
- Just implement an org.xml.sax.XMLReader
- When you see your parse() method get called, read
that input as fast as you please and call the
methods on the ContentHandler (and maybe
LexicalHandler, etc.) you were given
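As an illustration of the push style, here is a minimal sketch of an XMLReader that reads CSV and fires SAX events. It is not the CsvToXmlSaxReader from the attached code: the class name CsvSaxReader and the element names table, row, and column are made up, features and properties are simply accepted and ignored, and the CSV handling is naive (split on commas, no quoted fields).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: an XMLReader whose "parser" reads CSV and pushes SAX events.
class CsvSaxReader implements XMLReader {
    private ContentHandler handler = new DefaultHandler();
    private EntityResolver entityResolver;
    private DTDHandler dtdHandler;
    private ErrorHandler errorHandler;

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        if (input.getCharacterStream() == null) {
            throw new SAXException("this sketch only reads from a character stream");
        }
        BufferedReader in = new BufferedReader(input.getCharacterStream());
        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "table", "table", noAttrs);
        for (String line; (line = in.readLine()) != null; ) {
            if (line.isEmpty()) continue;
            handler.startElement("", "row", "row", noAttrs);
            for (String cell : line.split(",", -1)) {
                handler.startElement("", "column", "column", noAttrs);
                handler.characters(cell.toCharArray(), 0, cell.length());
                handler.endElement("", "column", "column");
            }
            handler.endElement("", "row", "row");
        }
        handler.endElement("", "table", "table");
        handler.endDocument();
    }

    @Override public void parse(String systemId) throws IOException, SAXException {
        parse(new InputSource(systemId));
    }
    @Override public void setContentHandler(ContentHandler h) { handler = h; }
    @Override public ContentHandler getContentHandler() { return handler; }
    @Override public void setEntityResolver(EntityResolver r) { entityResolver = r; }
    @Override public EntityResolver getEntityResolver() { return entityResolver; }
    @Override public void setDTDHandler(DTDHandler h) { dtdHandler = h; }
    @Override public DTDHandler getDTDHandler() { return dtdHandler; }
    @Override public void setErrorHandler(ErrorHandler h) { errorHandler = h; }
    @Override public ErrorHandler getErrorHandler() { return errorHandler; }
    // Features and properties are accepted and ignored in this sketch.
    @Override public boolean getFeature(String name) { return false; }
    @Override public void setFeature(String name, boolean value) { }
    @Override public Object getProperty(String name) { return null; }
    @Override public void setProperty(String name, Object value) { }

    // Convenience for trying it out: runs the identity transform over a SAXSource built on this reader.
    static String toXml(String csv) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new SAXSource(new CsvSaxReader(), new InputSource(new StringReader(csv))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that no XML text is ever produced or parsed on the input side; the processor receives the events directly.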
21Please, Don't Do This
- If driving the input via SAXSource, please don't
do this to start your XML:
- content.processingInstruction("xml", "version='1.0' encoding='utf-8'")
- because
22It's not a PI!
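The XML declaration looks like a processing instruction but is not one, so a SAX-driven converter should simply call startDocument() and leave the declaration to whatever serializes the result. One way to control it with JAXP, sketched here with a hypothetical class name DeclarationDemo, is through output properties on the Transformer:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: the serializer, not the event producer, writes the XML declaration.
class DeclarationDemo {
    static String serialize(String xml, boolean withDeclaration) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // "no" means: do not omit the declaration, i.e. write it.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, withDeclaration ? "no" : "yes");
            t.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling serialize("<a/>", true) yields output that begins with an XML declaration; with false, the output begins with the element itself.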
23References
- DataDirect XQuery blog
- http://www.xml-connection.com/
- DataDirect XML Converters
- http://www.xmlconverters.com/
- Stylus Studio
- http://www.stylusstudio.com/
- Saxonica
- http://www.saxonica.com/
- XML Catalogs
- http://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html
- The official web site for SAX
- http://www.saxproject.org/
24Explanation of the attached code
.zip file with source and data.
- Don't read this now! This is reference for after
the conference.
- CSVResolver is the simple CSV resolver. It is a
subset of the MultiResolver.
- MultiResolver is a resolver that can handle CSV
or EDI.
- CsvToXmlSaxReader reads CSV files and emits them
as a series of SAX events. See also
EdiToXmlSaxReader for the EDI equivalent. This is
a push interface, where we push the data
through. See also CsvToXmlStreamReader, which is
a pull interface, since the caller pulls, or
requests, data from us. StAX is a very effective
interface that offers event-driven access like
SAX but uses a pull paradigm like Reader. It
often results in superior performance, at the
cost of a considerably larger API and
consequently a more complicated implementation. I
didn't do a demo of one. Maybe next year.
- EdiToXmlSaxReader reads EDI files and emits them
as a series of SAX events. See also
CsvToXmlSaxReader for the CSV equivalent.
- CsvToXmlStreamReader reads CSV files and emits
them as characters through the java.io.Reader
interface. This is a pull interface, since the
caller is asking us for the data. See also
CsvToXmlSaxReader, which is a push interface,
where we do the driving.
- JaxpXsltDemo uses JAXP to drive XSLT through a
converter.
- SaxonXQueryDemo uses Saxon to drive XQuery
through a converter.
- DataDirectXQueryDemo does the same but using
DataDirect XQuery.
- DemoCSVtoXML is a little program that opens a
file and prints the contents. Then it does it
again, but inserts the CsvToXmlStreamReader, which
catches the content and converts it into XML
character text in-flight. This just proves the
CsvToXmlStreamReader works.
- XmlSaxReaderBase is the code that is common
to the CsvToXmlSaxReader and EdiToXmlSaxReader
classes.
25Questions?