XML Tools - PowerPoint PPT Presentation

Slides: 26
Provided by: lambd
Learn more at: https://lambda.uta.edu

1
XML Tools
  • Leonidas Fegaras

2
XML Processing
[Figure: XML processing pipeline. XML document → document parser (well-formedness checks & reference expansion) → XML infoset → document validator (using a DTD or XML schema) → XML infoset (annotated) → application or storage system]
3
Tools for XML Processing
  • DOM: a language-neutral interface for
    manipulating XML data
      • requires that the entire document be in memory
  • SAX: push-based stream processing
      • hard to write non-trivial applications
  • XPath: a declarative tree-navigation language
      • beautiful and easy to use
      • is part of many other languages
  • XSLT: a language for transforming XML based on
    templates
      • very ugly!
  • XQuery: a full-fledged query language
      • influenced by OQL
  • XmlPull: pull-based stream processing
      • far better than SAX, but not a standard yet

4
DOM
  • The Document Object Model (DOM) is a platform-
    and language-neutral interface that allows
    programs and scripts to dynamically access and
    update the content and structure of XML
    documents.
  • The following is part of the DOM interface:

    public interface Node {
        public String getNodeName ();
        public String getNodeValue ();
        public NodeList getChildNodes ();
        public NamedNodeMap getAttributes ();
    }
    public interface Element extends Node {
        public NodeList getElementsByTagName ( String name );
    }
    public interface Document extends Node {
        public Element getDocumentElement ();
    }
    public interface NodeList {
        public int getLength ();
        public Node item ( int index );
    }

5
DOM Example
    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    class Test {
        public static void main ( String[] args ) throws Exception {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File("depts.xml"));
            NodeList nodes = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                NodeList ndl = n.getChildNodes();
                for (int k = 0; k < ndl.getLength(); k++) {
                    Node m = ndl.item(k);
                    // compare strings with equals(); == would compare references
                    if (m.getNodeName().equals("dept")
                        && m.getFirstChild() != null
                        && "cse".equals(m.getFirstChild().getNodeValue())) {
                        NodeList ncl = ((Element) m).getElementsByTagName("tel");
                        for (int j = 0; j < ncl.getLength(); j++)
                            // the slide is cut off here; printing the text of
                            // each tel element is an assumed loop body
                            System.out.println(ncl.item(j).getFirstChild().getNodeValue());
                    }
                }
            }
        }
    }

XPath: /dept[text()="cse"]/tel/text()
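
For comparison, the same selection can be left to an XPath engine instead of three nested loops. The following is a minimal sketch, assuming the JDK's javax.xml.xpath package; the class name XPathTel, the sample document, and the exact form of the expression (a descendant search with //) are illustrative assumptions, since the slides do not show depts.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTel {
    // Hypothetical sample data; the real depts.xml is not shown on the slides.
    static final String SAMPLE =
        "<depts><school>"
      + "<dept>cse<tel>555-0100</tel><tel>555-0101</tel></dept>"
      + "<dept>math<tel>555-0200</tel></dept>"
      + "</school></depts>";

    // Returns the text of every tel child of a dept whose text is "cse".
    static List<String> telephones(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(
            "//dept[text()='cse']/tel/text()",
            new InputSource(new StringReader(xml)),
            XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++)
            result.add(hits.item(i).getNodeValue());
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(telephones(SAMPLE));
    }
}
```

The whole traversal collapses into one expression, which is the point the overview slide makes about XPath being easy to use.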
6
Better Programming
    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;
    import java.util.Vector;

    class Sequence extends Vector {
        Sequence () { super(); }

        // a one-element sequence holding the root element of the document
        Sequence ( String filename ) throws Exception {
            super();
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File(filename));
            add((Object) doc.getDocumentElement());
        }

        // the children named tagname of every node in this sequence
        Sequence child ( String tagname ) {
            Sequence result = new Sequence();
            for (int i = 0; i < size(); i++) {
                Node n = (Node) elementAt(i);
                NodeList c = n.getChildNodes();
                for (int k = 0; k < c.getLength(); k++)
                    if (c.item(k).getNodeName().equals(tagname))
                        result.add((Object) c.item(k));
            }
            return result;
        }

        void print () {
            for (int i = 0; i < size(); i++)
                System.out.println(elementAt(i).toString());
        }
    }

    class DOM {
        public static void main ( String[] args ) throws Exception {
            (new Sequence("cs.xml")).child("gradstudent").child("name").print();
        }
    }
7
SAX
  • SAX is a Simple API for XML that allows you to
    process a document as it is being read
      • in contrast to DOM, which requires the entire
        document to be read before it takes any action
  • The SAX API is event based
      • The XML parser sends events, such as the start or
        the end of an element, to an event handler, which
        processes the information

8
Parser Events
  • Receive notification of the beginning of a
    document
        void startDocument ()
  • Receive notification of the end of a document
        void endDocument ()
  • Receive notification of the beginning of an
    element
        void startElement ( String namespace, String localName,
                            String qName, Attributes atts )
  • Receive notification of the end of an element
        void endElement ( String namespace, String localName,
                          String qName )
  • Receive notification of character data
        void characters ( char[] ch, int start, int length )

9
SAX Example: a Printer
    import java.io.FileReader;
    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    class Printer extends DefaultHandler {
        public Printer () { super(); }
        public void startDocument () {}
        public void endDocument () { System.out.println(); }
        public void startElement ( String uri, String name,
                                   String tag, Attributes atts ) {
            System.out.print("<" + tag + ">");
        }
        public void endElement ( String uri, String name, String tag ) {
            System.out.print("</" + tag + ">");
        }
        public void characters ( char[] text, int start, int length ) {
            System.out.print(new String(text,start,length));
        }
    }

10
The Child Handler
    class Child extends DefaultHandler {
        DefaultHandler next;   // the next handler in the pipeline
        String ptag;           // the tagname of the child
        boolean keep;          // are we keeping or skipping events?
        short level;           // the depth level of the current element

        public Child ( String s, DefaultHandler n ) {
            super();
            next = n; ptag = s;
            keep = false; level = 0;
        }
        public void startDocument () throws SAXException {
            next.startDocument();
        }
        public void endDocument () throws SAXException {
            next.endDocument();
        }

11
The Child Handler (cont.)
        public void startElement ( String nm, String ln, String qn,
                                   Attributes a ) throws SAXException {
            // a child of the outermost element starts at level 1
            if (level++ == 1)
                keep = ptag.equals(qn);
            if (keep)
                next.startElement(nm,ln,qn,a);
        }
        public void endElement ( String nm, String ln, String qn )
                throws SAXException {
            if (keep)
                next.endElement(nm,ln,qn);
            // back at level 1: the child has been closed
            if (--level == 1)
                keep = false;
        }
        public void characters ( char[] text, int start, int length )
                throws SAXException {
            if (keep)
                next.characters(text,start,length);
        }
    }

12
Forming the Pipeline
    class SAX {
        public static void main ( String[] args ) throws Exception {
            SAXParserFactory pf = SAXParserFactory.newInstance();
            SAXParser parser = pf.newSAXParser();
            DefaultHandler handler
                = new Child("gradstudent",
                            new Child("name",
                                      new Printer()));
            parser.parse(new InputSource(new FileReader("cs.xml")),
                         handler);
        }
    }

[Figure: SAX parser → Child:gradstudent → Child:name → Printer]
13
Example
  • Input Stream
    <department>
      <deptname>
        Computer Science
      </deptname>
      <gradstudent>
        <name>
          <lastname>
            Smith
          </lastname>
          <firstname>
            John
          </firstname>
        </name>
      </gradstudent>
      ...
    </department>

  • SAX Events (SD/ED = start/end document, SE/EE = start/end element, C = characters)
    SD, SE department, SE deptname, C Computer Science, EE deptname,
    SE gradstudent, SE name, SE lastname, C Smith, EE lastname,
    SE firstname, C John, EE firstname, EE name, EE gradstudent, ...,
    EE department, ED

[Figure: the event stream passing through Child:gradstudent → Child:name → Printer]
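
The event sequence above can be reproduced with a small handler that records each callback. This is a minimal sketch, assuming the JDK's built-in SAX parser; the class name EventTrace and the compact one-line sample document are illustrative and not from the slides. Whitespace-only character data is dropped, as in the listing above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace extends DefaultHandler {
    // Hypothetical sample matching the structure of the example input.
    static final String SAMPLE =
        "<department><deptname>Computer Science</deptname>"
      + "<gradstudent><name><lastname>Smith</lastname>"
      + "<firstname>John</firstname></name></gradstudent></department>";

    private final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder();

    // characters() may deliver one text run in several chunks, so the text
    // is buffered and emitted as a single C event at the next tag.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) out.append("C ").append(s).append(' ');
        text.setLength(0);
    }

    @Override public void startDocument() { out.append("SD "); }
    @Override public void endDocument() { out.append("ED"); }
    @Override public void startElement(String uri, String local,
                                       String qName, Attributes atts) {
        flushText();
        out.append("SE ").append(qName).append(' ');
    }
    @Override public void endElement(String uri, String local, String qName) {
        flushText();
        out.append("EE ").append(qName).append(' ');
    }
    @Override public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Parses the XML text and returns the recorded event trace.
    static String trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trace(SAMPLE));
    }
}
```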
14
XmlPull
  • Unlike SAX, you pull events from the document
  • Create a pull parser
        XmlPullParser xpp;
        xpp = factory.newPullParser();
  • Read the current event with xpp.getEventType() and
    advance to the next one with xpp.next()
  • Types of events
      • START_TAG
      • END_TAG
      • TEXT
      • START_DOCUMENT
      • END_DOCUMENT
  • More information at
    http://www.xmlpull.org/
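
The pull style can be tried without extra libraries by swapping in StAX (javax.xml.stream), the pull parser that ships with the JDK. Note that this is a different API from XmlPull: the event constants are START_ELEMENT, END_ELEMENT, and CHARACTERS rather than START_TAG, END_TAG, and TEXT. The following is a minimal sketch; the class name PullLoop and the sample document are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class PullLoop {
    // Hypothetical sample data with two graduate students.
    static final String SAMPLE =
        "<department>"
      + "<gradstudent><name><lastname>Smith</lastname></name></gradstudent>"
      + "<gradstudent><name><lastname>Jones</lastname></name></gradstudent>"
      + "</department>";

    // Collects the text of every lastname element by pulling events
    // one at a time; the application, not the parser, drives the loop.
    static List<String> lastNames(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // deliver adjacent character data as a single CHARACTERS event
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader =
            factory.createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        boolean inLastName = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT)
                inLastName = reader.getLocalName().equals("lastname");
            else if (event == XMLStreamConstants.END_ELEMENT)
                inLastName = false;
            else if (event == XMLStreamConstants.CHARACTERS && inLastName)
                result.add(reader.getText());
        }
        reader.close();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lastNames(SAMPLE));
    }
}
```

Because the loop asks for each event, state such as inLastName lives in ordinary local variables instead of handler fields, which is the main convenience of pulling over SAX callbacks.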

15
Better XmlPull Events
    class Attributes {
        public String[] names;
        public String[] values;
    }
    abstract class Event {}
    class StartTag extends Event {
        public String tag;
        public Attributes attributes;
    }
    class EndTag extends Event {
        public String tag;
    }
    class CData extends Event {
        public String text;
    }
    class EOS extends Event {}

16
Iterators
    import org.xmlpull.v1.XmlPullParser;
    import org.xmlpull.v1.XmlPullParserFactory;

    abstract class Iterator {
        abstract public void open ();    // open the stream iterator
        abstract public void close ();   // close the stream iterator
        abstract public Event next ();   // get the next tuple from stream
    }
    abstract class Filter extends Iterator {
        Iterator input;
    }

17
Document Reader
    class Document extends Iterator {
        String path;
        int state;
        FileReader reader;
        XmlPullParser xpp;
        static XmlPullParserFactory factory;

        Event getEvent () {
            int eventType = xpp.getEventType();
            if (eventType == XmlPullParser.START_TAG) {
                int len = xpp.getAttributeCount();
                String[] names = new String[len];
                String[] values = new String[len];
                for (int i = 0; i < len; i++) {
                    names[i] = xpp.getAttributeName(i);
                    values[i] = xpp.getAttributeValue(i);
                }
                return new StartTag(xpp.getName(),new Attributes(names,values));
            } else if (eventType == XmlPullParser.END_TAG)
                return new EndTag(xpp.getName());
            ...

18
Document Reader (cont.)
        public void open () {
            reader = new FileReader(path);
            xpp = factory.newPullParser();
            xpp.setInput(reader);
            state = 0;
        }
        public void close () {
            reader.close();
        }
        public Event next () {
            if (state > 0)
                state++;
            if (state == 2)
                return new EOS();
            Event e = getEvent();
            if (xpp.getEventType() != XmlPullParser.END_DOCUMENT)
                xpp.next();
            return e;
        }
    }

19
The Child Iterator
    class Child extends Filter {
        String tag;
        short nest;     // the nesting level of the event
        boolean keep;   // are we in keeping mode?

        public void open () { keep = false; nest = 0; input.open(); }

        public Event next () {
            while (true) {
                Event t = input.next();
                if (t instanceof EOS)
                    return t;
                else if (t instanceof StartTag) {
                    if (nest++ == 1)
                        keep = tag.equals(((StartTag) t).tag);
                    if (!keep)
                        continue;
                } else if (t instanceof EndTag) {
                    if (--nest == 1 && keep)
                        keep = false;
                    ...

20
XSL Transformation
  • A stylesheet specification language for
    converting XML documents into various forms (XML,
    HTML, plain text, etc).
  • Can transform each XML element into another
    element, add new elements into the output file,
    or remove elements.
  • Can rearrange and sort elements, test and make
    decisions about which elements to display, and
    much more.
  • Based on XPath
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- literal result elements must appear inside a template -->
      <xsl:template match="/">
        <students>
          <xsl:copy-of select="//student/name"/>
        </students>
      </xsl:template>
    </xsl:stylesheet>

21
XSLT Templates
  • XSLT uses XPath to select the parts of the source
    document that match one or more predefined
    templates.
  • When a match is found, XSLT transforms the
    matching part of the source document into the
    result document.
  • The parts of the source document that match no
    template are handled by the default templates, so
    their text ends up unmodified in the result
    document.
  • Form
        <xsl:template match="XPath expression">
        </xsl:template>
  • The default (implicit) templates visit all nodes
    and strip out all tags
        <xsl:template match="*|/">
          <xsl:apply-templates/>
        </xsl:template>
        <xsl:template match="text()|@*">
          <xsl:value-of select="."/>
        </xsl:template>
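
The effect of the default templates can be observed by running a stylesheet that defines no templates of its own. The following is a minimal sketch, assuming the JDK's built-in javax.xml.transform processor; the class name DefaultRules and the sample input are illustrative, and the xsl:output element is added only so that the result is plain text without an XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DefaultRules {
    // No xsl:template elements: only the default templates apply.
    static final String EMPTY_STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "</xsl:stylesheet>";

    // Transforms the XML text with the empty stylesheet.
    static String apply(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(EMPTY_STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input; the tags are dropped and the text is kept.
        System.out.println(apply(
            "<name><lastname>Smith</lastname><firstname>John</firstname></name>"));
    }
}
```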

22
Other XSLT Elements
  • <xsl:value-of select="XPath expression"/>
      • select the value of an XML element and add it to
        the output stream of the transformation, e.g.
        <xsl:value-of select="//books/book/author"/>.
  • <xsl:copy-of select="XPath expression"/>
      • copy the entire XML element to the output stream
        of the transformation.
  • <xsl:apply-templates select="XPath expression"/>
      • apply the template rules to the elements
        selected by the XPath expression.
  • <xsl:element name="{XPath expression}">
    </xsl:element>
      • add an element to the output with a tag name
        computed from the XPath expression.
  • Example
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="employee">
            <b> <xsl:apply-templates select="node()"/> </b>
          </xsl:template>
          <xsl:template match="surname">
            <i> <xsl:value-of select="."/> </i>
          </xsl:template>
        </xsl:stylesheet>

23
Copy the Entire Document
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <xsl:apply-templates/>
      </xsl:template>
      <xsl:template match="text()">
        <xsl:value-of select="."/>
      </xsl:template>
      <xsl:template match="*">
        <xsl:element name="{name(.)}">
          <xsl:apply-templates/>
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>

24
More on XSLT
  • Conflict resolution: more specific templates
    override more general templates. Templates are
    assigned default priorities, but these can be
    overridden using priority="n" in a template.
  • Modes can be used to group together templates. No
    mode is an empty mode.
        <xsl:template match="..." mode="A">
          <xsl:apply-templates mode="B"/>
        </xsl:template>
  • Conditional and loop statements
        <xsl:if test="XPath predicate"> body </xsl:if>
        <xsl:for-each select="XPath"> body </xsl:for-each>
  • Variables can be used to name data
        <xsl:variable name="x"> value </xsl:variable>
  • Variables are used as $x in XPaths.

25
Using XSLT
    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.w3c.dom.*;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.*;
    import javax.xml.transform.stream.*;
    import java.io.*;

    class XSLT {
        public static void main ( String[] argv ) throws Exception {
            File stylesheet = new File("x.xsl");
            File xmlfile = new File("a.xml");
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document document = db.parse(xmlfile);
            StreamSource stylesource = new StreamSource(stylesheet);
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer(stylesource);
            DOMSource source = new DOMSource(document);
            ...
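
The listing above stops before the transformation is actually run. A minimal sketch of a complete run follows, assuming the usual JAXP pattern of calling transform with a StreamResult; the class name RunXslt, the in-memory stylesheet (modeled on the students example from the XSL Transformation slide), the sample input, and the choice to omit the XML declaration are illustrative assumptions rather than the rest of the original slide.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class RunXslt {
    // Copies every student name into a new students element.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'>"
      + "<students><xsl:copy-of select='//student/name'/></students>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical input document.
    static final String SAMPLE =
        "<class>"
      + "<student><name>Ann</name><gpa>3.9</gpa></student>"
      + "<student><name>Bob</name><gpa>3.5</gpa></student>"
      + "</class>";

    static String transform(String xml) throws Exception {
        // parse the input into a DOM tree
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document document = db.parse(new InputSource(new StringReader(xml)));

        // compile the stylesheet and apply it to the DOM tree
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SAMPLE));
    }
}
```

To write to the console as a command-line tool would, new StreamResult(System.out) can replace the StringWriter.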