TECH854 XML TECHNOLOGIES Programming with XML REVISION Week 13

1
TECH854 XML TECHNOLOGIES
Programming with XML
REVISION Week 13
  • Gillian Miller

2
PROGRAMMING WITH XML
  • Week 5: Parsing XML - SAX
  • Week 6: XML to objects - DOM
  • Week 7: XML Web Applications
  • Week 8: XML Integration, Data Exchange and
    Databases
  • Making XML work in real applications
  • This part of the course required programming (i.e.
    Java) and assumes programming expertise
  • Plenty of examples and code templates
  • Application in Assignment 2

3
[Diagram. Labels: Microsoft, Java, IBM; XML Mediator; Web Online
Inventory; Warehouse (Oracle); Web Server (Apache), HTML; Web Online
Shopping; XML Transformers; Web Content Management, XML Content]
XML is the format for data exchange and content
management
4
Class Exercise: How to read in an XML file?
    <?xml version="1.0"?>
    <orders>
      <orderitem personid="235">
        <product>choc</product>
        <qty>535</qty>
      </orderitem>
    </orders>
Write some pseudo code to read the XML file and
turn it into an order object. What are some
issues?
5
Processors
  • XML has well developed processors for this task
  • Take advantage of this work
  • Months of development time plus wide use in
    market place
  • Issues - new standards (e.g. what about schema?)
  • Complexities - CDATA, external entities,
    parameter entities, namespaces
  • Recall why XML has taken off as a data interchange
    format
  • Because we already have processors and an agreed
    syntax, we do not have to reinvent the wheel of
    syntax, grammars and parsing for one-off formats!

6
DOM versus SAX
xml.fujitsu.com/en/tech/dom/
7
SAX Parser
  • SAX provides an event based interface to the
    parser.
  • User callbacks are associated with events for
  • startElement
  • endElement
  • characters (text)
  • startDocument etc
  • The callbacks are passed the data associated with
    the tag or the text of the character data.
  • Good for large XML files - especially where
    processing required is linear.
  • For us this is the lower-level API and a good place
    to start; however, SAX requires much more
    programming than other XML processors
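The callback style described above can be sketched as follows. This is a minimal illustration, not the course's Echo program: the class name SaxOrderDemo, the OrderHandler class and the "personid:product:qty" string format are assumptions made for the example, and the input is the small orders document from the class exercise. Text is collected in a buffer because a parser is allowed to deliver one run of character data in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxOrderDemo {
    // Hypothetical handler: turns each <orderitem> into "personid:product:qty".
    static class OrderHandler extends DefaultHandler {
        final List<String> orders = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String personId;
        private String product;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for this element
            if (qName.equals("orderitem")) {
                personId = atts.getValue("personid");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called more than once per text run, so append rather than assign.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("product")) {
                product = text.toString().trim();
            } else if (qName.equals("qty")) {
                orders.add(personId + ":" + product + ":" + text.toString().trim());
            }
        }
    }

    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            OrderHandler handler = new OrderHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.orders;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse orders", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version='1.0'?><orders><orderitem personid='235'>"
                   + "<product>choc</product><qty>535</qty></orderitem></orders>";
        System.out.println(parse(xml)); // prints [235:choc:535]
    }
}
```

Note how much state the handler has to carry between callbacks; this is the extra programming effort that SAX trades for its low memory use.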

8
Step 2 - Understand Framework
DefaultHandler
http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html
9
SAX Event Handlers
  • startDocument()
  • endDocument()
  • startElement(String uri, String localName,
    String qName, Attributes atts)
  • endElement(String uri, String localName, String qName)
  • characters(char[] ch, int start, int length)
  • ignorableWhitespace(char[] ch, int start, int
    length)
  • setDocumentLocator(Locator locator)
  • uri - namespace URI
  • localName - unprefixed name
  • qName - name with prefix, e.g. ora:element
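To make the three name parameters concrete, here is a small sketch that records what startElement receives for a prefixed root element. The class name SaxNamesDemo, the method describeRoot and the namespace URI http://example.com/ora are made up for the illustration. The factory is switched to namespace-aware mode; without that, uri and localName are typically empty.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxNamesDemo {
    // Returns "uri|localName|qName" as reported for the root element.
    static String describeRoot(String xml) {
        StringBuilder seen = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (seen.length() == 0) { // only record the first (root) element
                    seen.append(uri).append('|').append(localName).append('|').append(qName);
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for uri and localName to be filled in
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse", e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        // The namespace URI below is a placeholder chosen for this example.
        System.out.println(describeRoot("<ora:element xmlns:ora='http://example.com/ora'/>"));
    }
}
```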

10
Back to Echo Program
These methods are part of the Echo2 class
defined earlier.
Recall the line:
    saxParser.parse(new File(argv[0]), handler);
This registers this object as the handler. Then, when
events occur, "call thyself".
11
Echo Program
endElement
characters
why StringBuffer ??
12
Exceptions
    try {
        // statements that could cause a problem
    } catch (SAXException e) {
        // error statements
    }
Exception methods: getMessage(), getLineNumber(),
getSystemId() (the last two are defined on
SAXParseException)
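A short sketch of the exception pattern above, assuming a hypothetical helper errorLine that returns the line number of the first well-formedness error, or -1 if the input parses cleanly. DefaultHandler rethrows fatal errors, so a malformed document surfaces as a SAXParseException, which is caught before the more general exceptions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxErrorDemo {
    // Hypothetical helper: line of the first parse error, or -1 if well-formed.
    static int errorLine(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return -1;
        } catch (SAXParseException e) {
            // The more specific exception carries the location of the problem.
            System.out.println("Parse error at line " + e.getLineNumber()
                + " (systemId=" + e.getSystemId() + "): " + e.getMessage());
            return e.getLineNumber();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException("parser could not run", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(errorLine("<orders><qty>5</qty></orders>"));
        System.out.println(errorLine("<orders>\n<qty>5</orders>")); // mismatched end tag
    }
}
```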
13
JAXP
  • JAXP - Java API For XML Processing
  • makes it easier to process XML data
  • Abstract Layer between program and SAX, DOM, XSL
  • Provides namespace support
  • The examples so far have used JAXP, which means
    imports and factory calls are easier
  • Underneath, it still uses the DOM and SAX API

14
DOM - Document Object Model
  • Origins in W3C
  • DOM is specification to represent content and
    model of XML documents across all programming
    languages and tools
  • Based on tree model
  • A set of bindings and classes that use the DOM
    itself
  • When to use
  • If changing XML file - inserting or deleting
    elements or changing structure
  • Navigating to parts of XML file
  • Complex hierarchies
  • Memory intensive

15
Class Exercise: How to process trees
Design a simple class and methods for trees. Now
write a pseudo program to pretty print the tree.
16
DOM Framework
17
DOM API Framework
  • org.w3c.dom - W3C DOM package
  • javax.xml.parsers - Java (JAXP) API for obtaining parsers
  • DocumentBuilderFactory - creates instance of
    factory
  • DocumentBuilder - does the work
  • parse() - you call parse to do the work
  • Document - the top level document
  • Node - any sort of node in tree
  • NodeList - a list of nodes (children)
  • NamedNodeMap - use to get attributes
  • SAXParseException, SAXException,
    ParserConfigurationException
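The classes listed above fit together roughly as in this sketch. It is an illustration only: the class name DomReadDemo, the method summarize and the "ordered" output wording are assumptions, and the input is the orders document from the earlier class exercise.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomReadDemo {
    // Builds one line per <orderitem>: "<personid> ordered <product>".
    static String summarize(String xml) {
        try {
            // Factory -> builder -> parse() gives the in-memory Document tree.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            StringBuilder out = new StringBuilder();
            NodeList items = doc.getElementsByTagName("orderitem");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Attributes come back as a NamedNodeMap of attribute nodes.
                NamedNodeMap atts = item.getAttributes();
                String person = atts.getNamedItem("personid").getNodeValue();
                String product = item.getElementsByTagName("product").item(0).getTextContent();
                out.append(person).append(" ordered ").append(product).append('\n');
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not read orders", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(summarize("<orders><orderitem personid='235'>"
            + "<product>choc</product><qty>535</qty></orderitem></orders>"));
    }
}
```

Compared with the SAX approach, no state has to be tracked between callbacks, at the cost of holding the whole tree in memory.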

18

19
Node Methods
  • short getNodeType()
  • Node.DOCUMENT_NODE, Node.ELEMENT_NODE,
    Node.TEXT_NODE etc
  • String getNodeName()
  • String getNodeValue()
  • NodeList getChildNodes()
  • boolean hasChildNodes()
  • NamedNodeMap getAttributes()

20
nodeName and nodeValue
21
Start the recursion
Call from main:
    theDocument = builder.parse(new File(argv[0]));
    // start your processing here
    doit.printTree(theDocument, argv[0]);

printTree():
    public void printTree(Document doc, String fname) {
        System.out.println("Pretty Printing Tree " + fname);
        // Start the ball rolling from the top
        treeRecursivePrint(doc, "");    // doc becomes Node!
    }

treeRecursivePrint(node, level):
    public void treeRecursivePrint(Node node, String level) {
        System.out.println(level + prettyString(node));
        // ...
    }
22
Recurse down tree
    public void treeRecursivePrint(Node node, String level) {
        System.out.println(level + prettyString(node));

        // now recurse through attributes
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            NamedNodeMap attributes = node.getAttributes();
            if (attributes != null) {
                for (int i = 0; i < attributes.getLength(); i++) {
                    treeRecursivePrint(attributes.item(i),
                                       level + indent + ">");
                }
            }
        }

        // process (recurse through) children
        if (node.hasChildNodes()) {
            NodeList nodes = node.getChildNodes();
            if (nodes != null) {
                for (int i = 0; i < nodes.getLength(); i++) {
                    treeRecursivePrint(nodes.item(i), level + indent);
                }
            }
        }
    } // end treeRecursivePrint
23
Simple Creation
SimpleDomCreate.java
    public static void main(String[] argv)
            throws IOException, DOMException,
                   ParserConfigurationException {
        DomPPrint myDomPPrint = new DomPPrint();
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.newDocument();
        Element root = doc.createElement("root");
        Attr tmp;
        doc.appendChild(root);
        System.out.println("Created root");
        Element header = doc.createElement("header");
        header.appendChild(doc.createTextNode("This is header"));
        root.appendChild(header);
        root.appendChild(doc.createTextNode(
                "\nTextual contents of root element\n "));
        root.appendChild(doc.createElement("footer"));
        // ...
Key calls: doc.createElement(tag), doc.createTextNode(txt),
node.appendChild(childnode)
24
Deleting
        // Now delete some children to show how it is done
        // Using for loops can get you into trouble
        Node delNode = root;
        while (delNode.hasChildNodes()) {
            delNode.removeChild(delNode.getFirstChild());
        }
        System.out.println("After Deleting children of root DEL>");
        myDomPPrint.treeRecursivePrint(doc, "DEL>");
  • Must use the construct above to delete children
  • Note that for loops fail to remove other children,
    because NodeList is live
  • Note that attributes must be deleted separately
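The warning about for loops can be reproduced with a small experiment. In this sketch (class and method names are made up for the illustration) a root with four children is emptied twice: once with an index-based for loop over the live NodeList, and once with the while loop shown above. Because the list shrinks as nodes are removed, the for loop skips every second child.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomDeleteDemo {
    // Creates <root> with the given number of empty child elements.
    static Element newRoot(int children) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            for (int i = 0; i < children; i++) {
                root.appendChild(doc.createElement("child" + i));
            }
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Buggy on purpose: the live list shrinks while the index grows.
    static int deleteWithForLoop(Element root) {
        NodeList nodes = root.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            root.removeChild(nodes.item(i));
        }
        return root.getChildNodes().getLength(); // children left behind
    }

    // The construct from the slide: always remove the current first child.
    static int deleteWithWhileLoop(Element root) {
        while (root.hasChildNodes()) {
            root.removeChild(root.getFirstChild());
        }
        return root.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println("for loop leaves:   " + deleteWithForLoop(newRoot(4)));   // 2
        System.out.println("while loop leaves: " + deleteWithWhileLoop(newRoot(4))); // 0
    }
}
```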
25

[Diagram: Clients, HTTP and Servers. Labels: Browsers (IE, Netscape);
Internet (HTTP, TCP/IP); Web Server; Servlet Engine; Servlet;
DB Server, XML files, XSLT files]
26
[Diagram. Labels: Web Browser, http, Web Server, Servlet Engine, html,
XML, XSLT, DataBase]
HTML: <html><body> <h1>My Mall</h1> <table><tr><td>Product</td> ...
XML: <?xml version="1.0"?> <catalog name="My Mall"> <product> ...
  • Servlets encapsulate HTTP
  • Servlets read Client Data
  • (Forms, request headers)
  • Generate Results
  • Send Results back to client
  • (Headers, Status Results, HTML)

27
Servlets - Hello World 1
    import java.io.*;                       // imports
    import javax.servlet.*;
    import javax.servlet.http.*;

    public class HelloWorld1 extends HttpServlet {      // extend HttpServlet
        public void doGet(HttpServletRequest request,   // doGet: request, response
                          HttpServletResponse response)
                throws ServletException, IOException {
            PrintWriter out = response.getWriter();
            out.println("Hello World 1");
        }
    }
http://localhost:8080/servlet/HelloWorld1
28
Servlets - Hello World 2
    public class HelloWorld2 extends HttpServlet {
        public void doGet(HttpServletRequest request,
                          HttpServletResponse response)
                throws IOException, ServletException {
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            out.println("<html>");
            out.println("<head>");
            String title = "Hello World";
            out.println("<title>" + title + "</title>");
            out.println("</head>");
            out.println("<body bgcolor=white>");
            out.println("<h1><font color=\"green\">" + title + "</font></h1>");
            String param = request.getParameter("param");
            if (param != null)
                out.println("Thanks for the lovely param '<b>" + param + "</b>'");
            out.println("</body>");
            out.println("</html>");
        }
    }
Key calls: setContentType, getWriter,
req.getParameter(parmname)
29
Web Interactivity via Forms
  • FORMS
  • Enhanced HTML documents to collect information
  • Form inputs
  • text, password, radio, checkboxes, textareas,
    selection lists, option lists
  • Submit buttons
  • method POST (usual)
  • method GET
  • Web Server processes the form inputs then
    composes the reply
  • Servlets
  • Generate the Form HTML dynamically
  • Process the result using the doPost method

30
Issue
  • HTTP is stateless
  • Each doGet or doPost call handles a single request
  • HTTP does not keep track of clients
  • How to keep track of user state?
  • e.g. shopping cart, accumulating information,
    saving re-entry of passwords etc
  • In CGI you had to use cookies, fancy URLs or hidden
    fields in forms
  • Servlets have high level functionality to deal
    with session information

31
HttpSession
  • Session Object
  • unique for each client
  • Under the hood, maintained by the servlet engine
    using cookies or URL rewriting
  • Accessing the session object from HttpServletRequest
  • HttpSession session = req.getSession(boolean
    create)
  • Testing the session object
  • if (session.isNew())
  • if (session == null)
  • We can add and get objects from the session
    object
  • session.setAttribute(objName, myObject)
    (putValue in older servlet APIs, now deprecated)
  • later
  • myObject = session.getAttribute(objName)
    (getValue in older servlet APIs, now deprecated)

32
Framework Design Issues
  • Three-Tier Architecture: a common architecture
    for servlet based applications.
  • The application logic is implemented in a set of
    helper classes.
  • Methods on objects of these classes are invoked
    by the service methods in the servlets.
  • http://www.subrahmanyam.com/articles/servlets/ServletIssues.html

33
The Presentation Nightmare
  • Java Servlets contain embedded HTML statements
    with presentation code everywhere
  • What happens if you want to redesign the look and
    feel of the site?
  • What happens if you want to be browser specific ?
    Work with PDAs?
  • A maintenance nightmare !!!!!
  • JSP only helps marginally

34
The case for XSLT
  • XSLT can be used to separate data, program logic
    and presentation
  • XML and XSLT can be developed independently of
    servlet code
  • Modularity of presentation improves maintenance
  • Can target multiple client devices via different
    stylesheets
  • Weakness
  • adds a layer of abstraction and an extra step, so
    slower runtime performance
  • Java and XSLT, Burke, O'Reilly, chap 4, 6
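As a rough sketch of the separation argued for above, the presentation can live entirely in a stylesheet that the Java code merely applies. The example uses the standard javax.xml.transform API; the class name XsltViewDemo, the inline stylesheet and the sample catalog data are assumptions for illustration (in a real application the stylesheet would normally be a separate file, and the output would be written to the servlet response).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltViewDemo {
    // Illustrative stylesheet: renders a <catalog> as a heading plus a table of products.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/catalog'>"
      + "<html><body><h1><xsl:value-of select='@name'/></h1><table>"
      + "<xsl:for-each select='product'>"
      + "<tr><td><xsl:value-of select='.'/></td></tr>"
      + "</xsl:for-each>"
      + "</table></body></html>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet (view) to the XML data (model) and returns the HTML.
    static String render(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(html));
            return html.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("<catalog name='My Mall'><product>choc</product></catalog>"));
    }
}
```

Changing the look of the page, or targeting another device, then means swapping the stylesheet rather than editing println calls in the servlet.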

35
XSLT Conceptual Model
[Diagram. Labels: request, Servlet (controller), XML (Model),
XSLT Stylesheets (view), XSLT Processor, HTML (view), response]
36
XML - Query and Databases
  • SemiStructured Information
  • XML Query
  • XML Persistence
  • XML Data design
  • Native XML Databases
  • XML enabling and relational databases
  • Future directions

37
Convergence Of Disparate Data Frameworks
[Diagram. Labels: Object Model, Relational Model, Document Model,
Semi-Structured Data, XML]
XML allows representation of information from
previously disparate worlds: database centric
(everything is a relation), object centric (everything
is an object) and document centric. XML allows convergence.
38
The Semistructured Data Model
[Figure: Object Exchange Model (OEM) graph for a bibliography. The root
Bib (o1) is a complex object with paper, paper and book children (o12,
o24, o29), some linked by references edges. Their edges are labelled
author, title, year, page, publisher and http, and lead to atomic
objects such as Serge, Abiteboul, Victor, Vianu and 1997.]
Object Exchange Model (OEM)
39
XQuery
  • New full powered query language for XML with
    both document-centric and data-centric
    capabilities
  • Expressive power
  • Relational joins
  • Navigation and hierarchy structure
  • Compositionality (node sets)
  • Reconstruction of new node sets
  • Combining documents
  • Filtering, sorting, functions
  • see Maier, Database desiderata for query
    languages

40
Example
Make an alphabetic list of publishers. Within
each publisher, make a list of books, each
containing a title and a price, in descending
order by price.
    <publisher_list>
      {
        FOR $p IN distinct(document("bib.xml")//publisher)
        RETURN
          <publisher>
            <name> { $p/text() } </name>
            {
              FOR $b IN document("bib.xml")//book[publisher = $p]
              RETURN
                <book>
                  { $b/title }
                  { $b/price }
                </book> SORTBY (price DESCENDING)
            }
          </publisher> SORTBY (name)
      }
    </publisher_list>
41
XQuery FLWR Expressions
  • A FLWR expression binds some expressions, applies
    a predicate, and constructs a new result.
  • expr can contain FLWR expressions
  • nested building blocks

FOR and LET clauses generate a list of tuples of
bound expressions, preserving document order.
WHERE clause applies a predicate, eliminating
some of the tuples
RETURN clause is executed for each surviving
tuple, generating an ordered list of outputs
42
List the titles of books published by Morgan
Kaufmann in 1998.
    FOR $b IN document("bib.xml")//book
    WHERE $b/publisher = "Morgan Kaufmann"
      AND $b/year = "1998"
    RETURN $b/title
List each publisher and the average price of its
books.
    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
    RETURN
      <publisher>
        <name> { $p/text() } </name>
        <avgprice> { $a } </avgprice>
      </publisher>
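The Java standard library has no XQuery engine, so the sketch below swaps in XPath 1.0 (javax.xml.xpath) to run the equivalent of the first query above: the FOR/WHERE part becomes a path with a predicate, and the RETURN part becomes the final /title step. The class name XPathQueryDemo and the sample bibliography (with placeholder titles "Book A" and "Book B") are made up for the illustration. The second query has no such direct translation, since XPath 1.0 cannot construct new elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathQueryDemo {
    // XPath stand-in for:
    //   FOR $b IN //book WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998" RETURN $b/title
    static List<String> titles(String bibXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                "//book[publisher = 'Morgan Kaufmann' and year = '1998']/title",
                new InputSource(new StringReader(bibXml)),
                XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(hits.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder data, not taken from any real bibliography.
        String bib = "<bib>"
            + "<book><title>Book A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>"
            + "<book><title>Book B</title><publisher>Morgan Kaufmann</publisher><year>1997</year></book>"
            + "</bib>";
        System.out.println(titles(bib)); // prints [Book A]
    }
}
```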
43
Constructing Elements
Input:
    <book isbn="isbn-0060229357">
      <title>Harold and the Purple Crayon</title>
      <author>
        <first>Crockett</first>
        <last>Johnson</last>
      </author>
    </book>
Query:
    FOR $i IN //book
    RETURN
      <example>
        <p> Here is a query. </p>
        <eg> $i//title </eg>
        <p> Here is the result of the above query. </p>
        <eg>{ $i//title }</eg>
        <surname>{ $i/author/last/text() }</surname>
      </example>
Result:
    <example>
      <p> Here is a query. </p>
      <eg> $i//title </eg>
      <p> Here is the result of the above query. </p>
      <eg><title>Harold and the Purple Crayon</title></eg>
      <surname>Johnson</surname>
    </example>
44
FOR versus LET
  • FOR
  • Binds node variables -> iteration
  • LET
  • Binds collection variables -> one value

    FOR $x IN document("bib.xml")/bib/book
    RETURN <result> { $x } </result>
Returns:
    <result> <book>...</book> </result>
    <result> <book>...</book> </result>
    <result> <book>...</book> </result>
    ...

    LET $x := document("bib.xml")/bib/book
    RETURN <result> { $x } </result>
Returns:
    <result> <book>...</book>
             <book>...</book>
             <book>...</book> ... </result>
45
SQL - Expressive Power
  • XQuery uses a for .. let .. where .. return syntax
  • for <-> SQL from
  • where <-> SQL where
  • return <-> SQL select
  • let allows temporary variables, and has no
    equivalent in SQL - let binds to a set of nodes
  • group by has no equivalent yet in XQuery

46
SQL Joins
1.4.4.3 Q3: Find cases where a user with a rating
worse (alphabetically, greater) than "C" is
offering an item with a reserve price of more
than 1000.
    <result>
      {
        for $u in document("users.xml")//user_tuple
        for $i in document("items.xml")//item_tuple
        where $u/rating > "C"
          and $i/reserve_price > 1000
          and $i/offered_by = $u/userid
        return
          <warning>
            { $u/name }
            { $u/rating }
            { $i/description }
            { $i/reserve_price }
          </warning>
      }
    </result>
47
Is XML a Database ?
  • XML is a collection of data
  • XML is self-describing, portable and has rich
    expressiveness to represent any tree or graph
    structure
  • but verbose, needs parsing
  • XML provides
  • elementary storage (XML documents)
  • schemas (DTDs, XML Schema)
  • query languages (XQuery, XPath)
  • programming interfaces (SAX, DOM, JDOM)

48
Storing XML
  • Flat Files
  • lightweight data storage
  • OK for some applications (configuration files,
    small in-house systems)
  • Relational Systems
  • Universal Systems (relational + true XML
    extensions)
  • evolving to truer XML model
  • Native XML Database
  • Other categories
  • middleware, XML Servers, XML Application Servers
    (Zope, Cocoon), Content management (Documentum,
    Vignette), caching systems

http://www.rpbourret.com/xml/XMLDatabaseProds.htm
49
Pure Relational Databases
  • Convert your XML file to relational design
  • For each complex element, create a table CE and
    primary key
  • For each element with mixed content, create a
    separate table to store PCDATA, with a link back to
    the parent element using a foreign key
  • For each single element and attribute, create a
    column in table CE
  • Repeat for each complex child element, using
    foreign keys to link back to parent

or do proper analysis using ERA !!
50
Containment or Pointers
containment
pointers
51
Storing XML in Relational Databases - CLOB
  • Store as string (Character Large Object)
  • E.g. store each top level element as a string
    field of a tuple in a database
  • Use a separate relation for each top-level
    element type
  • E.g. account, customer, depositor
  • Indexing
  • Store values of subelements/attributes as extra
    indexes
  • Benefits
  • Can store any XML data even without DTD
  • As long as there are many top-level elements in a
    document, strings are small compared to full
    document, allowing faster access to individual
    elements.
  • Drawback: need to parse strings to access values
    inside the elements; parsing is slow.
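A minimal sketch of the CLOB approach, under stated assumptions: an in-memory map stands in for the database (one list of strings per top-level element type, playing the role of one relation each), and the class name ClobSplitDemo, the bank wrapper element and the sample values are made up for the illustration. Each top-level element is serialized back to a string with an identity Transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ClobSplitDemo {
    // Maps each top-level element type (the "relation") to its elements stored as strings.
    static Map<String, List<String>> split(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            Map<String, List<String>> relations = new LinkedHashMap<>();
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and comments between elements
                }
                StringWriter clob = new StringWriter();
                serializer.transform(new DOMSource(n), new StreamResult(clob));
                relations.computeIfAbsent(n.getNodeName(), k -> new ArrayList<>()).add(clob.toString());
            }
            return relations;
        } catch (Exception e) {
            throw new IllegalStateException("could not split document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bank>"
            + "<account><balance>10</balance></account>"
            + "<customer><name>Ann</name></customer>"
            + "<account><balance>20</balance></account>"
            + "</bank>";
        split(xml).forEach((type, rows) -> System.out.println(type + " -> " + rows));
    }
}
```

Reading a balance back out still means parsing the stored string again, which is the drawback named in the last bullet.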

52
Storing XML as Relations
  • Tree representation: model XML data as a general
    tree and store it using relations
  • nodes(id, type, label, value)
  • child(child-id, parent-id)
  • Each element/attribute is given a unique
    identifier
  • type = element or attribute, label = tag,
    value = content
  • child records the parent-child relationships in
    the tree
  • Can add an extra attribute to child to record
    ordering of children
  • Benefit: can store any XML data, even without DTD
  • Drawbacks
  • Data is broken up into too many pieces,
    increasing space overheads
  • Even simple queries require a large number of
    joins, which can be slow
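The nodes/child scheme can be sketched by walking a DOM tree and emitting one row per element or attribute. This is an illustration under assumptions: rows are kept as comma-separated strings in two in-memory lists instead of real tables, and the class name XmlShredDemo and its helper methods are hypothetical. Ids are handed out in document order, and an element's value is taken to be its directly contained text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XmlShredDemo {
    final List<String> nodes = new ArrayList<>(); // rows of nodes(id, type, label, value)
    final List<String> child = new ArrayList<>(); // rows of child(child-id, parent-id)
    private int nextId = 1;

    // Stores one element or attribute, then recurses; parentId 0 means "no parent".
    void store(Node node, int parentId) {
        int id = nextId++;
        boolean isElement = node.getNodeType() == Node.ELEMENT_NODE;
        String value = isElement ? directText(node) : node.getNodeValue();
        nodes.add(id + "," + (isElement ? "element" : "attribute") + "," + node.getNodeName() + "," + value);
        if (parentId != 0) {
            child.add(id + "," + parentId);
        }
        if (isElement) {
            NamedNodeMap atts = node.getAttributes();
            for (int i = 0; i < atts.getLength(); i++) {
                store(atts.item(i), id);
            }
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE) {
                    store(c, id);
                }
            }
        }
    }

    // Concatenates the text nodes that are direct children of the element.
    static String directText(Node element) {
        StringBuilder text = new StringBuilder();
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                text.append(c.getNodeValue());
            }
        }
        return text.toString().trim();
    }

    static XmlShredDemo shred(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XmlShredDemo result = new XmlShredDemo();
            result.store(doc.getDocumentElement(), 0);
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not shred document", e);
        }
    }

    public static void main(String[] args) {
        XmlShredDemo r = shred("<orders><orderitem personid='235'>"
            + "<product>choc</product><qty>535</qty></orderitem></orders>");
        r.nodes.forEach(System.out::println);
        r.child.forEach(System.out::println);
    }
}
```

Even this five-node document produces nine rows, and rebuilding an orderitem needs a join per level, which illustrates both drawbacks listed above.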

53
The Semistructured Data Model
[Figure: Object Exchange Model (OEM) graph for a bibliography, repeated
from slide 38. The root Bib (o1) is a complex object with paper, paper
and book children (o12, o24, o29); edges labelled author, title, year,
page, publisher and http lead to atomic objects such as Serge,
Abiteboul, Victor, Vianu and 1997.]
Object Exchange Model (OEM)
54
Mismatches between XML/RDBMS
XML                                        RDBMS (normalized)
Data in single hierarchy                   Data in multiple tables
Nodes have elements and/or                 Cells have single value
  attribute values
Elements are nested                        Atomic cell values
Elements are ordered                       Row/column order not defined
Schema optional                            Schema required
Direct storage/retrieval of simple docs    Joins necessary to retrieve simple docs
Query with XML standards                   Query with SQL retrofitted for XML

Michael Champion, Storing XML in Databases
But XML hierarchy does not address issues
of redundant information or different nestings -
the whole rationale for RDBMS !!
55
XML Universal Databases
  • Future trends
  • Relational databases are being XML enabled to
    become truer XML repositories
  • Moving beyond simple XML wrappers around
    relations
  • and/or storing XML as LOB (binary, character)
  • Trend towards unification of XML content and
    relational data
  • New XML architectures and features (dynamic XML
    Views, automatic mappings)
  • see for example - Seybold report, Oracle XML DB:
    Uniting XML Content and Data, 2002
  • XTables - Bridging Relational Technology and XML
    - IBM 2002