Title: XML Language Family Detailed Examples
Slide 1
- Most information contained in these slides comes from http://www.w3.org and http://www.zvon.org/
- These slides are intended to be used as a tutorial on XML and related technologies
- Slide author: Jürgen Mangler (juergen.mangler_at_univie.ac.at)
- This section contains examples on
  - XPath
  - XPointer
Slide 2
XPath is the result of an effort to provide a common syntax and semantics for functionality shared between XSL Transformations (XSLT) and XPointer. The primary purpose of XPath is to address parts of an XML document.
- XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values.
- XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax.
- XPath gets its name from its use of a path notation, as in URLs, for navigating through the hierarchical structure of an XML document.
Slide 3
- In addition to its use for addressing, XPath is also designed to feature a natural subset that can be used for matching (testing whether or not a node matches a pattern); this use of XPath is described in XSLT (next chapter).
- XPath models an XML document as a tree of nodes. There are different types of nodes, including element nodes, attribute nodes and text nodes.
Slide 4
- The basic XPath syntax is similar to filesystem addressing. If the path starts with a slash (/), then it represents an absolute path to the required element.

/AAA/CCC
Select all elements CCC which are children of the root element AAA

     <AAA>
          <BBB/>
          <CCC/>
          <BBB/>
          <BBB/>
          <DDD>
               <BBB/>
          </DDD>
          <CCC/>
     </AAA>

/AAA
Select the root element AAA

     <AAA>
          <BBB/>
          <CCC/>
          <BBB/>
          <BBB/>
          <DDD>
               <BBB/>
          </DDD>
          <CCC/>
     </AAA>
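The two selections above can be tried out with Python's standard library. This is only a minimal sketch: xml.etree.ElementTree implements a limited subset of XPath and evaluates paths relative to an element, so the absolute path /AAA/CCC is written here as "CCC" on the root element. The variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# The example document from the slide.
DOC = """<AAA>
  <BBB/>
  <CCC/>
  <BBB/>
  <BBB/>
  <DDD>
    <BBB/>
  </DDD>
  <CCC/>
</AAA>"""

root = ET.fromstring(DOC)

# /AAA selects the root element; fromstring() returns exactly that element.
print(root.tag)

# /AAA/CCC selects the CCC children of the root. ElementTree paths are
# relative, so the same selection is "CCC" evaluated on the root.
ccc_children = root.findall("CCC")
print([e.tag for e in ccc_children])
```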
Slide 5
- If the path starts with //, then all elements in the document that fulfill the criteria following // are selected.

//DDD/BBB
Select all elements BBB which are children of DDD

     <AAA>
          <BBB/>
          <DDD>
               <BBB/>
          </DDD>
          <CCC>
               <DDD>
                    <BBB/>
                    <BBB/>
               </DDD>
          </CCC>
     </AAA>

//BBB
Select all elements BBB

     <AAA>
          <BBB/>
          <DDD>
               <BBB/>
          </DDD>
          <CCC>
               <DDD>
                    <BBB/>
                    <BBB/>
               </DDD>
          </CCC>
     </AAA>
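A rough way to check these two selections is again Python's xml.etree.ElementTree, assuming its limited XPath subset is sufficient. In that library a document-wide search is written with a leading dot (".//"), which searches all descendants of the element it is called on.

```python
import xml.etree.ElementTree as ET

DOC = """<AAA>
  <BBB/>
  <DDD>
    <BBB/>
  </DDD>
  <CCC>
    <DDD>
      <BBB/>
      <BBB/>
    </DDD>
  </CCC>
</AAA>"""

root = ET.fromstring(DOC)

# //BBB: every BBB element, regardless of depth.
all_bbb = root.findall(".//BBB")

# //DDD/BBB: only BBB elements whose parent is a DDD element.
ddd_bbb = root.findall(".//DDD/BBB")

print(len(all_bbb), len(ddd_bbb))
```

The top-level BBB is counted by //BBB but not by //DDD/BBB, because its parent is AAA.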
Slide 6
- The star (*) selects all elements located by the preceding path.

/*/*/*/BBB
Select all elements BBB which have 3 ancestors

     <AAA>
          <CCC>
               <DDD>
                    <BBB/>
               </DDD>
          </CCC>
          <CCC>
               <DDD>
                    <BBB/>
               </DDD>
          </CCC>
     </AAA>

/AAA/CCC/DDD/*
Select all elements enclosed by elements /AAA/CCC/DDD

     <AAA>
          <BBB/>
          <DDD>
               <BBB/>
          </DDD>
          <CCC>
               <DDD>
                    <BBB/>
                    <BBB/>
               </DDD>
          </CCC>
     </AAA>
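The wildcard can be sketched with xml.etree.ElementTree as well. The sketch below uses the second example document for both expressions. Because ElementTree paths start at the root element, the first step of /*/*/*/BBB (the root itself) is dropped and the path becomes "*/*/BBB".

```python
import xml.etree.ElementTree as ET

DOC = """<AAA>
  <BBB/>
  <DDD>
    <BBB/>
  </DDD>
  <CCC>
    <DDD>
      <BBB/>
      <BBB/>
    </DDD>
  </CCC>
</AAA>"""

root = ET.fromstring(DOC)

# /AAA/CCC/DDD/*: every element directly inside /AAA/CCC/DDD.
inside_ddd = root.findall("CCC/DDD/*")

# /*/*/*/BBB: BBB elements with exactly three ancestors.
deep_bbb = root.findall("*/*/BBB")

# /AAA/*: all children of the root, whatever their name.
root_children = [e.tag for e in root.findall("*")]

print(len(inside_ddd), len(deep_bbb), root_children)
```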
Slide 7
- The expression in square brackets can further specify an element. A number in the brackets gives the position of the element in the selected set. The function last() selects the last element in the selection.

/papers/paper[last()]
Select the last paper child of element papers

     <papers>
          <paper author="motschnig"/>
          <paper author="derntl"/>
          <paper author="motschnig"/>
          <paper author="mangler"/>
     </papers>

/papers/paper[1]
Select the first paper child of element papers

     <papers>
          <paper author="motschnig"/>
          <paper author="derntl"/>
          <paper author="motschnig"/>
          <paper author="mangler"/>
     </papers>
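Position predicates are part of the XPath subset that xml.etree.ElementTree supports, so the two selections can be sketched directly. As before, the path is written relative to the root element papers.

```python
import xml.etree.ElementTree as ET

DOC = """<papers>
  <paper author="motschnig"/>
  <paper author="derntl"/>
  <paper author="motschnig"/>
  <paper author="mangler"/>
</papers>"""

papers = ET.fromstring(DOC)

# /papers/paper[1]: positions are counted from 1, not from 0.
first = papers.findall("paper[1]")

# /papers/paper[last()]: the last paper child.
last = papers.findall("paper[last()]")

print(first[0].get("author"), last[0].get("author"))
```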
Slide 8
- Attributes are specified by the @ prefix.

//student[@matnr]
Select student elements which have an attribute matnr

     <students>
          <student matnr="9506264"/>
          <student matnr="0002843"/>
          <student name="Hauer"/>
          <student/>
     </students>

//@matnr
Select all attributes @matnr

     <students>
          <student matnr="9506264"/>
          <student matnr="0002843"/>
          <student name="Hauer"/>
          <student/>
     </students>

//student[not(@*)]
Select student elements without an attribute

     <students>
          <student id="9506264"/>
          <student id="0002843"/>
          <student name="Koegler"/>
          <student/>
     </students>

//student[@*]
Select student elements which have any attribute

     <students>
          <student id="9506264"/>
          <student id="0002843"/>
          <student name="Hauer"/>
          <student/>
     </students>
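A possible Python counterpart to these attribute tests is sketched below, using the first example document. xml.etree.ElementTree understands [@matnr], but it has no support for @* or not(), so those two cases are approximated with plain list comprehensions over the attrib dictionary. That substitution is an assumption of this sketch, not an XPath feature.

```python
import xml.etree.ElementTree as ET

DOC = """<students>
  <student matnr="9506264"/>
  <student matnr="0002843"/>
  <student name="Hauer"/>
  <student/>
</students>"""

root = ET.fromstring(DOC)

# //student[@matnr]: student elements that carry a matnr attribute.
with_matnr = root.findall(".//student[@matnr]")

# //@matnr: the attribute nodes themselves, represented here by their values.
matnr_values = [e.get("matnr") for e in root.iter() if "matnr" in e.attrib]

# //student[@*]: any attribute at all (emulated, not native ElementTree XPath).
any_attr = [e for e in root.iter("student") if e.attrib]

# //student[not(@*)]: no attribute at all (emulated as well).
no_attr = [e for e in root.iter("student") if not e.attrib]

print(len(with_matnr), matnr_values, len(any_attr), len(no_attr))
```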
Slide 9
- Values of attributes can be used as selection criteria. The function normalize-space removes leading and trailing spaces and replaces sequences of whitespace characters by a single space.

//student[normalize-space(@name)='hauer']
Select student elements which have an attribute name with value hauer; leading and trailing spaces are removed before comparison

     <students>
          <student matnr="9506264"/>
          <student name=" hauer "/>
          <student name="hauer"/>
     </students>

//student[@id='b1']
Select student elements which have an attribute id with value b1
(The document shown on the slide contains only BBB elements without an id attribute, so the expression selects nothing in it.)

     <students>
          <BBB matnr="9506264"/>
          <BBB name=" hauer "/>
          <BBB name="hauer"/>
     </students>
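The difference between an exact attribute comparison and a comparison after normalize-space can be sketched as follows. xml.etree.ElementTree supports [@name='hauer'] but has no normalize-space function, so a small helper (normalize_space, a hypothetical name) re-implements it for the four XML whitespace characters.

```python
import re
import xml.etree.ElementTree as ET

DOC = """<students>
  <student matnr="9506264"/>
  <student name=" hauer "/>
  <student name="hauer"/>
</students>"""

root = ET.fromstring(DOC)

def normalize_space(text):
    """Strip leading/trailing XML whitespace and collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", text.strip(" \t\r\n"))

# //student[@name='hauer']: exact comparison, " hauer " does not match.
exact = root.findall(".//student[@name='hauer']")

# //student[normalize-space(@name)='hauer']: a missing attribute counts as "".
normalized = [e for e in root.iter("student")
              if normalize_space(e.get("name", "")) == "hauer"]

print(len(exact), len(normalized))
```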
Slide 10
- The function count() counts the number of selected elements.

//*[count(*)=3]
Select elements which have 3 children

     <AAA>
          <CCC>
               <BBB/>
               <BBB/>
               <BBB/>
          </CCC>
          <DDD>
               <BBB/>
          </DDD>
          <EEE>
               <CCC/>
          </EEE>
     </AAA>

//*[count(BBB)=2]
Select elements which have two children BBB

     <AAA>
          <CCC>
               <BBB/>
          </CCC>
          <DDD>
               <BBB/>
               <BBB/>
          </DDD>
          <EEE>
               <CCC/>
               <DDD/>
          </EEE>
     </AAA>
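xml.etree.ElementTree has no count() function, so the sketch below emulates the two predicates in plain Python: len(element) gives the number of child elements, and findall("BBB") lists the BBB children. This is an illustration of what the expressions compute, not an XPath evaluation.

```python
import xml.etree.ElementTree as ET

DOC_A = """<AAA>
  <CCC><BBB/><BBB/><BBB/></CCC>
  <DDD><BBB/></DDD>
  <EEE><CCC/></EEE>
</AAA>"""

DOC_B = """<AAA>
  <CCC><BBB/></CCC>
  <DDD><BBB/><BBB/></DDD>
  <EEE><CCC/><DDD/></EEE>
</AAA>"""

root_a = ET.fromstring(DOC_A)
root_b = ET.fromstring(DOC_B)

# //*[count(*)=3]: iter() walks the root and all descendants in document order.
three_children = [e.tag for e in root_a.iter() if len(e) == 3]

# //*[count(BBB)=2]: elements with exactly two BBB children.
two_bbb = [e.tag for e in root_b.iter() if len(e.findall("BBB")) == 2]

print(three_children, two_bbb)
```

In the first document both the root AAA and the first CCC have three children, so both are selected.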
Slide 11
- Several paths can be combined with the | separator ("|" stands for "or", similar to the or operators in C). The result contains every node selected by at least one of the paths.

/AAA/EEE | //DDD/CCC | /AAA | //BBB
Number of combinations is not restricted

     <AAA>
          <BBB/>
          <CCC/>
          <DDD>
               <CCC/>
          </DDD>
          <EEE/>
     </AAA>

/AAA/EEE | //BBB
Select all elements BBB and elements EEE which are children of root element AAA

     <AAA>
          <BBB/>
          <CCC/>
          <DDD>
               <CCC/>
          </DDD>
          <EEE/>
     </AAA>
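The | operator is not available in xml.etree.ElementTree, but its effect can be sketched by evaluating each path separately and merging the results. The helper union (a hypothetical name) removes duplicates and returns the nodes in document order, which is how an XPath node-set is usually presented.

```python
import xml.etree.ElementTree as ET

DOC = """<AAA>
  <BBB/>
  <CCC/>
  <DDD>
    <CCC/>
  </DDD>
  <EEE/>
</AAA>"""

root = ET.fromstring(DOC)

def union(root, *node_lists):
    """Merge several result lists, without duplicates, in document order."""
    wanted = {id(node) for nodes in node_lists for node in nodes}
    return [node for node in root.iter() if id(node) in wanted]

# /AAA/EEE | //BBB
two_paths = union(root, root.findall("EEE"), root.findall(".//BBB"))

# /AAA/EEE | //DDD/CCC | /AAA | //BBB
four_paths = union(root, root.findall("EEE"), root.findall(".//DDD/CCC"),
                   [root], root.findall(".//BBB"))

print([e.tag for e in two_paths])
print([e.tag for e in four_paths])
```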
Slide 12
Axes are a sophisticated concept in XPath to find out which nodes relate to each other and how.

     <parent>
          <preceding-sibling/>
          <preceding-sibling/>
          <node>
               <descendant/>
               <descendant/>
          </node>
          <following-sibling/>
          <following-sibling/>
     </parent>

[Figure: tree diagram of the document above, with the nodes labelled preceding-sibling, following-sibling and descendant]

The above example illustrates how axes work. Starting at the element node, each axis selects the elements that are named after it. This example is also the base for the next two pages.
Slide 13
- The following main axes are available:
  - the child axis contains the children of the context node
  - the descendant axis contains the descendants of the context node; a descendant is a child or a child of a child and so on; thus the descendant axis never contains attribute or namespace nodes
  - the parent axis contains the parent of the context node, if there is one
  - the following-sibling axis contains all the following siblings of the context node; if the context node is an attribute node or namespace node, the following-sibling axis is empty
  - the preceding-sibling axis contains all the preceding siblings of the context node; if the context node is an attribute node or namespace node, the preceding-sibling axis is empty
- (http://www.w3.org/TR/xpath#axes)
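The five axes listed above can be imitated in Python on a small document whose element names match the axes. xml.etree.ElementTree keeps no parent pointers, so the sketch builds a child-to-parent dictionary first; the names parent_of and context are assumptions of this sketch. Note that XPath itself numbers the preceding-sibling axis in reverse document order, while the list here is in document order.

```python
import xml.etree.ElementTree as ET

DOC = """<parent>
  <preceding-sibling/>
  <preceding-sibling/>
  <node>
    <descendant/>
    <descendant/>
  </node>
  <following-sibling/>
  <following-sibling/>
</parent>"""

root = ET.fromstring(DOC)

# Map every element to its parent (ElementTree does not store this itself).
parent_of = {child: parent for parent in root.iter() for child in parent}

context = root.find("node")                      # the context node

child_axis = list(context)                       # child::*
descendant_axis = [e for e in context.iter() if e is not context]
parent_axis = parent_of.get(context)             # parent::*

siblings = list(parent_axis)
position = siblings.index(context)
preceding_sibling_axis = siblings[:position]     # preceding-sibling::*
following_sibling_axis = siblings[position + 1:] # following-sibling::*

print(parent_axis.tag)
print([e.tag for e in preceding_sibling_axis])
print([e.tag for e in following_sibling_axis])
```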
Slide 14
- The child axis contains the children of the context node. The child axis is the default axis and it can be omitted.
- The descendant axis contains the descendants of the context node; a descendant is a child or a child of a child and so on; thus the descendant axis never contains attribute or namespace nodes.

//CCC/descendant::DDD
Select elements DDD which have CCC among their ancestors

     <CCC>
          <DDD>
               <EEE>
                    <DDD/>
               </EEE>
          </DDD>
     </CCC>

/AAA
Equivalent of /child::AAA

     <AAA>
          <BBB/>
          <CCC/>
     </AAA>
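The contrast between the default child axis and the descendant axis can be sketched with xml.etree.ElementTree, which has no axis syntax (child::, descendant::) but offers equivalents: a bare name selects children, and ".//" selects descendants. The document below, with one DDD nested inside another, is an assumption chosen for this sketch.

```python
import xml.etree.ElementTree as ET

DOC = """<CCC>
  <DDD>
    <EEE>
      <DDD/>
    </EEE>
  </DDD>
</CCC>"""

ccc = ET.fromstring(DOC)

# child::DDD (the default axis): only DDD elements directly inside CCC.
children = ccc.findall("DDD")

# descendant::DDD: DDD elements at any depth below CCC.
descendants = ccc.findall(".//DDD")

print(len(children), len(descendants))
```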
Slide 15
- XPointer is intended to be the basis of fragment identifiers only for the text/xml and application/xml media types (they can point only to documents of these types).
- Pointing to fragments of remote documents is analogous to the use of anchors in HTML. Roughly: document#xpointer(...)

     <link xmlns:xlink="http://www.w3.org/1999/xlink"
           xlink:type="simple"
           xlink:href="mydocument.xml#xpointer(//AAA/BBB[1])">
     </link>
Slide 16
- If there are forbidden characters in your expression, you must deal with them somehow.
- When XPointer appears in an XML document, special characters must be escaped according to directions in XML.
  - The characters < or & must be escaped using &lt; and &amp;.
- Any unbalanced parenthesis must be escaped using a circumflex (^).

     <link xmlns:xlink="http://www.w3.org/1999/xlink"
           xlink:type="simple"
           xlink:href="test.xml#xpointer(//AAA[position() &lt; 2])">
     </link>

or, alternatively:

           xlink:href="test.xml#xpointer(string-range(//*,'^(text in'))"
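Both escaping levels can be sketched in Python. quoteattr from xml.sax.saxutils handles the XML level (< becomes &lt;, & becomes &amp;), and escape_xpointer_literal is a hypothetical helper that circumflex-escapes every parenthesis and circumflex inside a string literal. Escaping all parentheses rather than only the unbalanced ones is a simplification assumed here; the file name and expressions are taken from the slide.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

XLINK = "http://www.w3.org/1999/xlink"

def escape_xpointer_literal(text):
    """Circumflex-escape ^, ( and ) inside a string literal (hypothetical helper)."""
    return text.replace("^", "^^").replace("(", "^(").replace(")", "^)")

# XML level: the raw expression contains "<", which must not appear
# unescaped inside an attribute value.
href = "test.xml#xpointer(//AAA[position() < 2])"
attr = quoteattr(href)          # adds the quotes and escapes < and &
link = f'<link xmlns:xlink="{XLINK}" xlink:type="simple" xlink:href={attr}/>'

# An XML parser undoes the escaping, so the application sees the raw href again.
parsed_href = ET.fromstring(link).get(f"{{{XLINK}}}href")

# XPointer level: the unbalanced "(" inside the search string gets a circumflex.
search = "xpointer(string-range(//*,'%s'))" % escape_xpointer_literal("(text in")

print(attr)
print(search)
```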
Slide 17
- If your elements have an ID-type attribute, you can address them directly using the value of the ID-type attribute. (Don't forget you must have an attribute defined as an ID type in your DTD!)
- Using ID-type attributes, you can easily include or jump to parts of documents.
- The example below selects the node with id("b1").

xpointer(id("b1"))

     <book>
          <book id="b1" name="XML">Bad book.</book>
          <book id="b2" name="JAVA"> Good book.
               <additional>Makes me sleep like a baby.</additional>
          </book>
          <book id="123" name="42">All answers on only one page.</book>
     </book>
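A lookup by ID can be approximated in Python as shown below. This is only a sketch under a strong assumption: xml.etree.ElementTree does not read DTDs and therefore does not know which attributes are of type ID, so the hypothetical helper by_id simply treats the attribute literally named id as the ID-type attribute.

```python
import xml.etree.ElementTree as ET

DOC = """<book>
  <book id="b1" name="XML">Bad book.</book>
  <book id="b2" name="JAVA"> Good book.
    <additional>Makes me sleep like a baby.</additional>
  </book>
  <book id="123" name="42">All answers on only one page.</book>
</book>"""

root = ET.fromstring(DOC)

def by_id(root, value):
    """Return the first element whose 'id' attribute equals value, else None."""
    for element in root.iter():
        if element.get("id") == value:
            return element
    return None

target = by_id(root, "b1")      # stands in for xpointer(id("b1"))
print(target.get("name"), target.text)
```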
Slide 18
- The specification defines one full form and one shorthand form (which is an abbreviation of the full one).
  - Short Form: /1/2/3
  - Full Form: xpointer(/*[1]/*[2]/*[3])

     <AAA>
          <BBB myid="b1" bbb="111">Text in the first element BBB.</BBB>
          <BBB myid="b2" bbb="222"> Text in another element BBB.
               <DDD ddd="999">Text in more nested element.</DDD>
               <DDD ddd="888">Text in more nested element.</DDD>
               <DDD ddd="777">Text in more nested element.</DDD>
          </BBB>
          <CCC ccc="123" xxx="321">Again some text in some element.</CCC>
     </AAA>
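How the short form /1/2/3 walks the tree can be sketched in Python: each number picks the n-th child element of the node reached so far, counting from 1, and the first number refers to the document element. The function resolve_child_sequence is a hypothetical helper written for this illustration, not part of any XPointer library.

```python
import xml.etree.ElementTree as ET

DOC = """<AAA>
  <BBB myid="b1" bbb="111">Text in the first element BBB.</BBB>
  <BBB myid="b2" bbb="222"> Text in another element BBB.
    <DDD ddd="999">Text in more nested element.</DDD>
    <DDD ddd="888">Text in more nested element.</DDD>
    <DDD ddd="777">Text in more nested element.</DDD>
  </BBB>
  <CCC ccc="123" xxx="321">Again some text in some element.</CCC>
</AAA>"""

root = ET.fromstring(DOC)

def resolve_child_sequence(root, sequence):
    """Follow a child sequence such as '/1/2/3'; return None if a step fails."""
    steps = [int(step) for step in sequence.strip("/").split("/")]
    if steps[0] != 1:            # a document has exactly one root element
        return None
    node = root
    for step in steps[1:]:
        children = list(node)    # child elements only, in document order
        if not 1 <= step <= len(children):
            return None
        node = children[step - 1]
    return node

target = resolve_child_sequence(root, "/1/2/3")
print(target.tag, target.get("ddd"))
```

In this document /1/2/3 reaches the third DDD inside the second BBB.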
Slide 19
- A location of type point is defined by a node, called the container node (the node that contains the point), and a non-negative integer, called the index.
- (//AAA and //AAA/BBB are the container nodes; [1] and [2] are used if more than one container node of the same name exists.)

In the documents below, [point] stands for the marker that shows the selected point on the slide.

xpointer(start-point(//AAA))
xpointer(start-point(range(//AAA/BBB[1])))

     <AAA>[point]
          <BBB bbb="111"></BBB>
          <BBB bbb="222">
               <DDD ddd="999"></DDD>
          </BBB>
          <CCC ccc="123" xxx="321"/>
     </AAA>

xpointer(end-point(range(//AAA/BBB[2])))
xpointer(start-point(range(//AAA/CCC)))

     <AAA>
          <BBB bbb="111"></BBB>
          <BBB bbb="222">
               <DDD ddd="999"></DDD>
          </BBB>[point]
          <CCC ccc="123" xxx="321"/>
     </AAA>
Slide 20
- When the container node of a point is of a node type that cannot have child nodes (such as text nodes, comments, and processing instructions), then the index is an index into the characters of the string-value of the node; such a point is called a character-point.
- You can use this to write a link that behaves like a search function. It always jumps to the first appearance of a string, e.g. the word "another".

xpointer(start-point(string-range(//*,'another', 2, 0)))

In the document below, [point] stands for the marker that shows the selected point on the slide.

     <AAA>
          <BBB bbb="111">Text in the first element BBB.</BBB>
          <BBB bbb="222"> Text in a[point]nother element BBB.
               <DDD ddd="999">Text in more nested element.</DDD>
          </BBB>
          <CCC ccc="123" xxx="321">Again some text in some element.</CCC>
     </AAA>
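The index arithmetic behind such a character-point can be sketched on a plain Python string. This only imitates the counting, it does not evaluate XPointer: the hypothetical helper character_point finds the first match and returns the index of the point that lies just before the given character position of the match (positions start at 1, as in string-range).

```python
def character_point(text, needle, position=1):
    """Index of the point just before character `position` of the first match."""
    start = text.find(needle)
    if start < 0:
        return None              # no match, so there is no point to return
    return start + (position - 1)

text = "Text in another element BBB."

# string-range(..., 'another', 2, 0): start before the 2nd character, length 0.
point = character_point(text, "another", 2)

# Show the point with a visible marker.
marked = text[:point] + "|" + text[point:]
print(point, marked)
```

With position 2 the point falls between the "a" and the "n" of "another", which matches the marker in the slide's document.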
Slide 21
- The range function returns ranges covering the locations in the argument location-set. For each location x in the argument location-set, a range location representing the covering range of x is added to the result location-set.
- The range-inside function returns ranges covering the contents of the locations in the argument location-set.

xpointer(range(//AAA/BBB[2]))

     <AAA>
          <BBB bbb="111"/>
          <BBB bbb="222">  Text in another element BBB.  </BBB>
          <CCC ccc="123" xxx="321"/>
     </AAA>

xpointer(range-inside(//AAA/BBB[2]))

     <AAA>
          <BBB bbb="111"/>
          <BBB bbb="222">  Text in another element BBB.  </BBB>
          <CCC ccc="123" xxx="321"/>
     </AAA>
Slide 22
- For each location x in the argument location-set, end-point adds a location of type point to the result location-set. That point represents the end point of location x.

xpointer(end-point(string-range(//AAA/BBB,'another')))

In the document below, [point] stands for the marker that shows the selected point on the slide.

     <AAA>
          <BBB bbb="111">Text in the first element BBB.</BBB>
          <BBB bbb="222"> Text in another[point] element BBB.
               <DDD ddd="999">Text in more nested element.</DDD>
          </BBB>
          <CCC ccc="123" xxx="321">Again some text in some element.</CCC>
     </AAA>