Title: XML
1. XML
XML is based on SGML, the Standard Generalised
Markup Language, and like SGML is concerned with
content rather than presentation. XML is fast
becoming the basis for data storage and
communication on the web, and a number of
specialised markup languages have been
developed based on the XML standard, such as -
XHTML - HTML in XML format with a stricter definition.
WML - Wireless Markup Language, for use in mobile devices using WAP.
SMIL - Synchronized Multimedia Integration Language.
MathML - Mathematics Markup Language.
CML - Chemical Markup Language.
P. Martin - 2001 - XML - 1
2. XML - Validation
Unlike HTML, XML demands that -
Tags and attributes are in the correct case.
Tags are properly terminated.
Tags are properly nested.
Attribute values are enclosed in quotes.
An XML document is said to be -
Well formed - if it follows the above rules.
Valid - if it is well formed and also conforms to a
document type definition (DTD).
A DTD is declared in the DOCTYPE entry of an XML
document, using either the SYSTEM or the PUBLIC keyword.
SYSTEM refers to a local definition, PUBLIC to a
DTD published on the web, in which case the URL
is preceded by a 'well known URI' (the public identifier).
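As a minimal sketch of these rules, the following uses Python's standard library parsers (an assumption of this example; the slides themselves refer to msxml and JAXP). The sample documents and the helper name is_well_formed are hypothetical. These parsers check well-formedness only and read the DOCTYPE entry without validating against the DTD.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested, terminated and quoted: well formed.
GOOD = '<note lang="en"><to>Ann</to></note>'
# Badly nested tags.
BAD_NESTING = "<note><to>Ann</note></to>"
# Attribute value not enclosed in quotes.
BAD_QUOTES = "<note lang=en/>"
# Tag names are case sensitive, so <To> is not closed by </to>.
BAD_CASE = "<note><To>Ann</to></note>"

# A DOCTYPE entry using PUBLIC: a public identifier followed by the URL.
XHTML = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    "<html><head><title>t</title></head><body/></html>"
)
doctype = minidom.parseString(XHTML).doctype
public_id = doctype.publicId
system_id = doctype.systemId
```

The DTD itself is not fetched here; a validating parser would be needed to check the document against it.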
3. XML - Content and Namespaces
The special characters <, >, &, ' and " can be
embedded in XML documents by inclusion in a CDATA
(character data) section as follows -
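A minimal sketch of a CDATA section, assuming Python's xml.dom.minidom as the parser; the sample content is hypothetical. The section carries <, > and & without escaping, and the parser hands the text back unchanged.

```python
from xml.dom import minidom

# Hypothetical document: the CDATA section carries <, > and & unescaped.
SOURCE = "<code><![CDATA[if (a < b && b > c) { swap(); }]]></code>"

doc = minidom.parseString(SOURCE)
section = doc.documentElement.firstChild

# minidom represents the section as its own node type.
is_cdata = section.nodeType == section.CDATA_SECTION_NODE
text = section.data

# Serialising the element writes the section back out as CDATA.
round_trip = doc.documentElement.toxml()
```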
As users can define their own tags, there is the
possibility of naming clashes when using xml from
different sources. To prevent this, one or more
namespaces are defined in the top-level tag as
follows -
A namespace definition without a prefix defines
the default namespace for the document.
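A minimal sketch of namespace definitions in the top-level tag, assuming Python's ElementTree; the namespace URIs and element names are hypothetical. The un-prefixed definition is the default namespace, and the pub prefix keeps the second title element distinct from the first.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; they only need to be unique names.
SOURCE = """
<order xmlns="http://example.org/shop"
       xmlns:pub="http://example.org/publisher">
  <title>Order 17</title>
  <pub:title>XML in Practice</pub:title>
</order>
"""

root = ET.fromstring(SOURCE)

# ElementTree expands each tag to {namespace-uri}local-name.
tags = [child.tag for child in root]

# Prefixes used in searches are local to this mapping.
ns = {"shop": "http://example.org/shop", "pub": "http://example.org/publisher"}
shop_title = root.find("shop:title", ns).text
book_title = root.find("pub:title", ns).text
```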
4. Document Type Definition
DTDs are essentially based on EBNF (Extended
Backus-Naur Form). They can be defined in-line in
an XML document -
or, in an external document, e.g. for XHTML -
They consist essentially of ELEMENT and ATTLIST
(attribute list) definitions, which define the
allowed structure of the document in terms of the
ordering of elements (tags), the content of element
bodies and the allowed attributes within tags.
5. Element Definition
<!ELEMENT name (composition)>
With element names a, b, c and d -
An empty element a - <!ELEMENT a EMPTY>
Element a containing the sequence b, c, d - <!ELEMENT a (b,c,d)>
Element a containing either b, c or d - <!ELEMENT a (b|c|d)>
Element a containing a repetition of one or more b's
followed by c or d - <!ELEMENT a (b+,(c|d))>
Element a containing parsed character data (text in which
the parser still recognises markup and entity
references) - <!ELEMENT a (#PCDATA)>
Character data that is to include <, >, &, ' or " literally
is placed in a CDATA section inside a #PCDATA element;
CDATA is an attribute type and cannot be used as an
element content model.
Nested elements are described by bracketed expressions
and repetition is indicated by * - zero or more,
+ - one or more, ? - zero or one.
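Because a content model uses the same operators as a regular expression, it can be checked with one. The following is a rough sketch under that assumption, not how a validating parser is implemented; the helper names model_to_regex and children_match are hypothetical, and EMPTY and #PCDATA are not handled.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Translate a content model such as "(b+,(c|d))" into a regex
    over a string of child names, each followed by one space."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching "name ".
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group()) + " )",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a
    # regex; | ? * + and brackets already have the same meaning.
    return re.compile(pattern.replace(",", ""))


def children_match(xml_text, model):
    """Check the root element's children against the content model."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return model_to_regex(model).fullmatch(names) is not None


sequence_ok = children_match("<a><b/><c/><d/></a>", "(b,c,d)")
sequence_bad = children_match("<a><c/><b/><d/></a>", "(b,c,d)")
choice_ok = children_match("<a><c/></a>", "(b|c|d)")
repeat_ok = children_match("<a><b/><b/><d/></a>", "(b+,(c|d))")
repeat_bad = children_match("<a><c/></a>", "(b+,(c|d))")
```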
6. Attribute Definitions
<!ATTLIST element-name attribute-name att-type att-qualifiers>
Attribute types (att-type) can be one of -
CDATA - general character data excluding special characters.
ID - a unique element identifier.
IDREF - a reference to an ID defined elsewhere in the document.
ENTITY - the name of an entity defined using an ENTITY tag (see below).
ENTITIES - a space separated list of entity names.
NMTOKEN - a character string consisting of alphanumerics,
colons, periods, hyphens and underscores only.
enumerated - a list of values, e.g. (v1|v2|v3), followed by a default.
Attribute qualifiers (att-qualifiers) -
#REQUIRED - the attribute must be supplied with the tag.
#IMPLIED - the attribute can be omitted; its value may be deduced.
#FIXED - followed by the constant value.
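A minimal sketch of attribute defaults, assuming Python's ElementTree (built on the expat parser) and a hypothetical in-line DTD. The parser supplies the enumerated default and the #FIXED value even though the document omits them; being non-validating, it does not enforce #REQUIRED.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-line DTD: isbn is #REQUIRED, format is enumerated
# with a default, and lang is #FIXED.
SOURCE = """<!DOCTYPE book [
  <!ELEMENT book (#PCDATA)>
  <!ATTLIST book
      isbn   CDATA                 #REQUIRED
      format (paperback|hardback)  "paperback"
      lang   CDATA                 #FIXED "en">
]>
<book isbn="0-00-000000-0">XML Basics</book>"""

book = ET.fromstring(SOURCE)

# Only isbn appears in the document; the other two come from the DTD.
isbn = book.get("isbn")
book_format = book.get("format")
lang = book.get("lang")
```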
7. Entity Definitions
<!ENTITY entity-name "entity value"> or
<!ENTITY entity-name SYSTEM "file-name">
allows for the definition of XML entities.
These are similar to html entities and can be
referenced within a document by referring to them
as - &entity-name;
Parameter entities, which allow DTD content to be
parameterised, are defined using -
<!ENTITY % entity-name "entity-value">
and referenced as %entity-name;. They are useful in
conditional sections within DTDs, e.g. -
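Returning to general entities, the following is a minimal sketch assuming Python's ElementTree and a hypothetical in-line DTD. The entity names and values are illustrative only. Entities declared in the DTD are expanded where they are referenced, alongside the predefined lt and gt entities.

```python
import xml.etree.ElementTree as ET

# Hypothetical entities declared in an in-line DTD.
SOURCE = """<!DOCTYPE memo [
  <!ENTITY writer "A. N. Author">
  <!ENTITY course "XML and XSL">
]>
<memo>Notes on &course; by &writer;, &lt;draft&gt;</memo>"""

memo = ET.fromstring(SOURCE)

# The parser replaces each reference with the entity's value.
text = memo.text
```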
8. XML/DTD example
In-line DTD
XML document
9. XML Schemas
DTDs do not allow accurate definition of xml
content, the main content types being CDATA and
PCDATA. Differentiating between text and the
different numeric types is not possible. A more
recent development is to use XML Schemas to
provide document type definitions. These provide
a rich set of data types and are themselves XML
documents. Two major schema models have been
developed -
W3C XML Schema - still under development.
Microsoft XML Schema (XML-Data Reduced, XDR) - more
advanced at this time.
Whilst XML Schema definitions are at an
early stage of evolution, we will take a brief
look at Microsoft's Schema model.
10. Microsoft XML Schema
The principal tags in the Microsoft model are
Schema and ElementType, e.g. -
myMessage contains (optionally) a greeting,
consisting of text and/or messages any number of
times, followed by a number of messages.
11. ElementType tag
ElementType tag attributes -
content - values can be empty, eltOnly, textOnly or mixed (default).
dt:type - defines the data type - many different types supported.
name - name of the element.
model - closed (only elements defined in the schema are
allowed) or open (default).
order - order of child elements - one, seq (default for
eltOnly) or many (default for mixed).
ElementType allowed child elements -
description - documentation.
datatype - specifies a data type for the ElementType element.
element - child element name.
group - defines a child element group, order and frequency.
AttributeType - defines an attribute.
attribute - specifies an attribute for an element.
12. element and group tags
element tag attributes -
type - name of the child element.
minOccurs - minimum number of occurrences - 0 or 1 - default 1.
maxOccurs - maximum number of occurrences - 1 or * - default 1
unless ElementType content is mixed.
group tag attributes -
order - one, seq or many.
minOccurs - 0 or 1 - default 1.
maxOccurs - 1 or * - default 1 unless ElementType content is mixed.
13. AttributeType and attribute tags
AttributeType tag attributes -
default - the attribute's default value.
dt:type - the attribute's data type.
dt:values - values for an enumeration data type.
name - attribute name.
required - yes or no - default is no.
attribute tag attributes -
default - default value, which overrides any defined in AttributeType.
type - name of the AttributeType for the attribute.
required - yes or no - default is no.
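To show how these tags fit together, here is a small sketch that reads a schema written in the Microsoft model and summarises its ElementType definitions. It assumes Python's ElementTree, which only parses the schema as an XML document and does not validate against it. The bookSchema content and the helper name summarise are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the Microsoft schema model.
XDR = "urn:schemas-microsoft-com:xml-data"
DT = "urn:schemas-microsoft-com:datatypes"

# Hypothetical schema for a book element.
SCHEMA = f"""
<Schema name="bookSchema" xmlns="{XDR}" xmlns:dt="{DT}">
  <AttributeType name="isbn" dt:type="string" required="yes"/>
  <ElementType name="title" content="textOnly" dt:type="string"/>
  <ElementType name="price" content="textOnly" dt:type="float"/>
  <ElementType name="book" content="eltOnly" order="seq" model="closed">
    <attribute type="isbn"/>
    <element type="title"/>
    <element type="price" minOccurs="0" maxOccurs="1"/>
  </ElementType>
</Schema>
"""


def summarise(schema_text):
    """Map each ElementType name to its content, data type and children."""
    root = ET.fromstring(schema_text)
    summary = {}
    for element_type in root.findall(f"{{{XDR}}}ElementType"):
        # Simplification: minOccurs and maxOccurs are defaulted to 1,
        # ignoring the different default for mixed content.
        children = [
            (e.get("type"), e.get("minOccurs", "1"), e.get("maxOccurs", "1"))
            for e in element_type.findall(f"{{{XDR}}}element")
        ]
        summary[element_type.get("name")] = {
            "content": element_type.get("content", "mixed"),
            "dt:type": element_type.get(f"{{{DT}}}type"),
            "children": children,
        }
    return summary


summary = summarise(SCHEMA)
```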
14. Schema data types
The following is a sample of a few of the Schema
data types defined by Microsoft -
boolean - value 0 (false) or 1 (true)
char - single character
string - series of characters
float - a real number
int - an integer value
date - date in format YYYY-MM-DD
time - time in format HH:MM:SS
id - text uniquely defining an element or attribute
idref - a reference to an id
enumeration - a series of values defining a type.
15. Bookstore DTD as an XML Schema
16. Parsing XML documents
Two techniques are used for parsing XML
documents -
DOM (document object model) based parsers - these
build a tree structure in memory as the document is
parsed and provide an API to manipulate the tree.
Two examples of such parsers are Microsoft's msxml
and Sun's JAXP.
SAX (simple API for xml) parsers - these generate
events as tags are encountered but do not build a
tree. The document is processed as it is read rather
than afterwards. They are more suitable when the
document is not to be modified.
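The two techniques can be compared side by side. This is a minimal sketch assuming Python's xml.dom.minidom and xml.sax in place of msxml and JAXP; the sample document and the class name TitleHandler are hypothetical. Both halves extract the same titles, one from a finished tree and one from events.

```python
import xml.sax
from xml.dom import minidom

SOURCE = (
    "<library><book><title>A</title></book>"
    "<book><title>B</title></book></library>"
)

# DOM: the whole tree is built first, then queried or modified.
doc = minidom.parseString(SOURCE)
dom_titles = [t.firstChild.data for t in doc.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX: react to events as tags are encountered; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self.titles.append("")

    def characters(self, content):
        # Text may arrive in several pieces, so it is accumulated.
        if self._in_title:
            self.titles[-1] += content

    def endElement(self, name):
        if name == "title":
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(SOURCE.encode("utf-8"), handler)
sax_titles = handler.titles
```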
17. Using a DOM XML Parser
Using Javascript within a web page and making use
of the Microsoft xml parser provides a simple way
of processing XML content. The following example
illustrates a program to extract structural
information from the DOM tree.
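A program of this kind can be sketched as a recursive walk over the tree. The version below assumes Python's xml.dom.minidom rather than Javascript with the Microsoft parser; the function name outline and the sample document are hypothetical.

```python
from xml.dom import minidom


def outline(node, depth=0):
    """Return one indented line per element, listing its attribute names."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        names = sorted(node.attributes.keys())
        suffix = " [" + ", ".join(names) + "]" if names else ""
        lines.append("  " * depth + node.tagName + suffix)
        depth += 1
    # Text nodes have no children, so the recursion stops at the leaves.
    for child in node.childNodes:
        lines.extend(outline(child, depth))
    return lines


SOURCE = (
    '<library><book id="b1"><title>A</title>'
    "<author>X</author></book></library>"
)
structure = outline(minidom.parseString(SOURCE))
```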
18. Javascript DOM API - Node Types
DOM parsers construct a tree containing nodes.
The node types are as shown in this list of
defined constants -
Some of the document methods return a NodeList
object which has the methods and properties shown
here -
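As an illustration, Python's xml.dom package (assumed here in place of the Javascript API) defines the same DOM node type constants and a NodeList with a length property and an item method. The sample document is hypothetical and the dictionary lists only some of the constants.

```python
from xml.dom import Node, minidom

# A few of the node type constants defined by the DOM.
NODE_TYPES = {
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.CDATA_SECTION_NODE: "cdata section",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
    Node.DOCUMENT_NODE: "document",
}

doc = minidom.parseString("<cds><!-- stock --><cd>One</cd><cd>Two</cd></cds>")

# childNodes and getElementsByTagName both return a NodeList.
kinds = [NODE_TYPES[n.nodeType] for n in doc.documentElement.childNodes]
cds = doc.getElementsByTagName("cd")
count = cds.length
second = cds.item(1).firstChild.data
```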
19. Javascript DOM API - Nodes
Node methods and properties are as follows -
20. Javascript DOM API - Document
At the top-level, the document object has the
following properties and methods -
21. Javascript DOM API - Element
Element objects have the following properties and
methods -
22. Javascript DOM API - Attribute
Finally, attributes have the following properties
and text has a split method -
This API allows xml documents to be loaded from
the server, parsed, transformed (e.g. into HTML),
modified and possibly returned to the server.
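The document, element, attribute and text levels of the API can be sketched together. This example assumes Python's xml.dom.minidom, whose method names follow the DOM but are not necessarily identical to those of the Microsoft parser; the catalog content is hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString('<catalog><cd year="1999">First Album</cd></catalog>')

# Document level: the root element and factory methods.
root = doc.documentElement
new_cd = doc.createElement("cd")
new_cd.appendChild(doc.createTextNode("Second Album"))

# Element level: attributes are read and written by name.
new_cd.setAttribute("year", "2000")
root.appendChild(new_cd)

# Attribute level: an attribute node exposes its name and value.
attr = new_cd.getAttributeNode("year")
attr_pair = (attr.name, attr.value)

# Text level: splitText divides one text node into two siblings.
first_text = root.firstChild.firstChild
tail = first_text.splitText(len("First"))
pieces = (first_text.data, tail.data)

# The modified tree can be serialised again, e.g. to return to a server.
result = root.toxml()
```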
23. Processing a DOM tree in JAXP - 1
24. Processing a DOM tree in JAXP - 2
25. XPath trees
To refer to nodes within an xml DOM tree, the
simple path expression language XPath is
used. This provides notation and functions to
refer to subsets of the DOM tree. Although in
the DOM API, attributes of element nodes and
values of attributes or tag body content (e.g.
text) are themselves represented by nodes, this
is not true of the XPath view of the tree, e.g. -
[Figure: the same document shown as an XPath tree and as a DOM tree]
26. XPath expressions
All node path expressions are relative to a
context or current node. It is possible to refer
backwards or forwards from the context node to
other nodes, if they exist. Hence for the xml
given here, with the reference element as the
context node -
Selector              Nodes selected
self                  reference
parent                library
child                 book
ancestor              library
ancestor-or-self      library, reference
descendant            book, author, title
descendant-or-self    reference, book, author, title
following             loan
following-sibling     loan
preceding             none
preceding-sibling     none
attribute             type
namespace             none
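The selections listed above can be reproduced with a short sketch. It assumes Python's ElementTree, which has no direct support for these selectors, so they are computed by hand. The library document is a hypothetical reconstruction that fits the listed results, and the helper names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical document consistent with the selections listed above.
SOURCE = """
<library>
  <reference type="technical">
    <book><author>A. Writer</author><title>XML</title></book>
  </reference>
  <loan/>
</library>
"""
library = ET.fromstring(SOURCE)

# ElementTree nodes do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in library.iter() for child in parent}
context = library.find("reference")


def ancestors(node):
    """Walk upwards from the node to the root."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found


def following_siblings(node):
    """Nodes after this one under the same parent."""
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]


axes = {
    "self": [context.tag],
    "parent": [parent_of[context].tag],
    "child": [c.tag for c in context],
    "ancestor": [a.tag for a in ancestors(context)],
    "descendant": [d.tag for d in context.iter()][1:],
    "descendant-or-self": [d.tag for d in context.iter()],
    "following-sibling": [s.tag for s in following_siblings(context)],
    "attribute": list(context.attrib),
}
```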
27. XPath expressions
Nodes are of type element, attribute or namespace
and the path selector just given is followed by a
node test with the following meanings -
* - select all nodes of the same type
node() - select all nodes
text() - select all text nodes
comment() - select all comment nodes
processing-instruction() - select all processing instruction nodes
<node name> - select all nodes with <node name>
These node tests are appended to the selector by a
:: symbol, e.g. child::*, so that child::*/child::text()
selects all text node grandchildren of the current node.
Some common abbreviations are -
child:: - assumed if no selector given
attribute:: - @
/descendant-or-self::node()/ - //
self::node() - .
parent::node() - ..
28. Node Sets
The path expressions seen so far produce sets of
nodes, which can be manipulated by functions and
operators as follows -
Operators -
| - union of two nodesets
/ - specifies the root of the xml document
Functions -
last() - returns the number of nodes in the set
(the position of the last node)
position() - returns the position number of the current node in the nodeset
count(nodeset) - returns the number of nodes in the nodeset
id(string) - returns the element node with the matching ID
etc.
String functions -
concat() - concatenates two or more string arguments
starts-with() - returns true if the first (string)
argument starts with the second one
Predicates can be included in path expressions,
enclosed in square brackets [ ].
29. Example use of XPath
With the following xml -
These path expressions, when used as an
argument to - doc.selectNodes(".....")
produce -
Expression                        Result
/catalog/cd/price                 all price nodes
/catalog/cd                       all cd nodes
/catalog/cd[0]                    first cd node (the Microsoft parser's default
                                  pattern syntax counts from 0; standard XPath
                                  counts from 1)
/catalog/cd/price/text()          text from price nodes
/catalog/cd[price > 10.80]        cd's with price > 10.80
/catalog/cd[price > 10.80]/price  price nodes for cd's with price > 10.80
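Similar selections can be sketched with Python's ElementTree, assumed here in place of selectNodes. It supports only a subset of XPath, so the numeric predicate is applied in Python. The catalog content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog.
SOURCE = """
<catalog>
  <cd><title>First</title><price>9.90</price></cd>
  <cd><title>Second</title><price>10.90</price></cd>
  <cd><title>Third</title><price>12.50</price></cd>
</catalog>
"""
catalog = ET.fromstring(SOURCE)

# /catalog/cd/price: paths are written relative to the root element.
prices = [p.text for p in catalog.findall("cd/price")]

# First cd node: ElementTree, like standard XPath, counts from 1.
first_title = catalog.find("cd[1]/title").text

# /catalog/cd[price > 10.80]: ElementTree has no numeric comparison,
# so the predicate is applied in Python instead.
dear = [
    cd for cd in catalog.findall("cd")
    if float(cd.findtext("price")) > 10.80
]
dear_titles = [cd.findtext("title") for cd in dear]
```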
30. XSL(T) - XML Stylesheets
XSL and XSLT are xml standards which allow for
the creation of stylesheets, which behave very
much like cascading stylesheets (CSS) with html,
for transforming and formatting xml
documents. XSLT (XML transformation) stylesheets
allow, for example, an xml document to be
presented as an html document, while at the same
time performing some processing on the document
such as selecting only certain nodes and sorting
them into a required order. The following
example takes a cd collection database held as an
XML file and generates an html table containing
certain selected items in sorted order, with a
specific one of interest highlighted -
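To make the processing steps concrete, the sketch below performs the same select, sort and highlight steps directly in Python with ElementTree. It is not XSLT and does not use a stylesheet; the catalog data, the selection rule (a price limit) and the function name cds_to_html are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical cd collection.
SOURCE = """
<catalog>
  <cd><artist>Baker</artist><title>Blue</title><price>9.50</price></cd>
  <cd><artist>Cole</artist><title>Red</title><price>15.00</price></cd>
  <cd><artist>Abel</artist><title>Green</title><price>8.00</price></cd>
</catalog>
"""


def cds_to_html(xml_text, max_price, highlight_artist):
    """Select cds up to max_price, sort by artist, emit an html table."""
    catalog = ET.fromstring(xml_text)
    # Select only certain nodes.
    selected = [
        cd for cd in catalog.findall("cd")
        if float(cd.findtext("price")) <= max_price
    ]
    # Sort them into the required order.
    selected.sort(key=lambda cd: cd.findtext("artist"))
    table = ET.Element("table")
    for cd in selected:
        row = ET.SubElement(table, "tr")
        # Highlight the one of interest.
        if cd.findtext("artist") == highlight_artist:
            row.set("bgcolor", "yellow")
        for field in ("artist", "title", "price"):
            ET.SubElement(row, "td").text = cd.findtext(field)
    return ET.tostring(table, encoding="unicode")


html = cds_to_html(SOURCE, max_price=10.0, highlight_artist="Baker")
```

In a stylesheet the same steps are expressed declaratively, which is what the example on the following slides does.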
31. CD database example - 1
First, an extract from the database -
32. CD database example - 2
Then the stylesheet -
33. CD database example - 3
Finally, using ActiveX objects to transform the
xml using the stylesheet -
34. Sports XML transform - 1
XSLT can also be used to alter an XML document as
the following example illustrates -
35. Sports XML transform - 2
The following stylesheet transforms the
structure of the XML document -
36. Sports XML transform - 3
The resulting XML document then becomes -
37. Importing and including stylesheets
Stylesheets cascade rather like CSS ones. If an
XML document references a stylesheet,
the stylesheet controls the transformation of the
xml. If the stylesheet itself imports or
includes other stylesheets, the templates defined
in each complement (or override) each
other. Importing and including differ as follows -
import - imported templates have a lower
precedence than locally defined ones, so a local
template overrides an imported one that matches
the same node.
include - included templates have the same level
of precedence as local ones; where two conflict,
the one defined latest in sequence is the one used.
38. Example of importing stylesheet
The following example illustrates the use of
import -
First the stylesheet intro.xsl -
Then the xml document -
Finally the resulting html -
39. Stylesheet elements