Title: Querying XML: XPath, XQuery, and XSLT
- Zachary G. Ives
- University of Pennsylvania
- CIS 550 Database Information Systems
- October 27, 2005
Some slide content courtesy of Susan Davidson,
Dan Suciu, Raghu Ramakrishnan
2. Reminders
- Homework 4 due 11/3
- XQuery
- Project plan due 11/3
- Milestones
- Division of responsibilities
- For non-RSS projects: proposal including scope, milestones, and what you plan to demonstrate
- Recitation Friday 11:30-12:30, Levine 512
3. Talks of Interest
- Today:
  - Anastassia Ailamaki, CMU, FATES: Automatically-tuned Database Storage Management, IRCS (3401 Walnut) Room 470 at 3PM
- Tomorrow, as part of DB & Information Retrieval Day:
  - Anastassia Ailamaki, CMU, StagedDB: Designing Database Servers for New Hardware Trends, Wu and Chen at 11AM
  - Sam Madden, MIT, Data Management for Next Generation Wireless Sensor Networks, Wu and Chen at 12:30PM
  - Andrei Broder, Yahoo!, The next stage in Web IR: From query based Information Retrieval to context driven Information Supply, Wu and Chen at 2:30PM
4. Querying XML
- How do you query a directed graph? A tree?
- The standard approach used by many XML, semistructured-data, and object query languages:
  - Define some sort of a template describing traversals from the root of the directed graph
  - In XML, the basis of this template is called an XPath
5. XPaths
- In its simplest form, an XPath is like a path in a file system:
  - /mypath/subpath//morepath
- The XPath returns a node set representing the XML nodes (and their subtrees) at the end of the path
- XPaths can have node tests at the end, returning only particular node types, e.g., text(), processing-instruction(), comment(), element(), attribute()
- XPath is fundamentally an ordered language: it can query in order-aware fashion, and it returns nodes in order
6. Sample XML
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <dblp>
      <mastersthesis mdate="2002-01-03" key="ms/Brown92">
        <author>Kurt P. Brown</author>
        <title>PRPL: A Database Workload Specification Language</title>
        <year>1992</year>
        <school>Univ. of Wisconsin-Madison</school>
      </mastersthesis>
      <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
        <editor>Paul R. McJones</editor>
        <title>The 1995 SQL Reunion</title>
        <journal>Digital System Research Center Report</journal>
        <volume>SRC1997-018</volume>
        <year>1997</year>
        <ee>db/labs/dec/SRC1997-018.html</ee>
        <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
      </article>
    </dblp>
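7. XML Data Model Visualized
[Figure: the sample document drawn as a tree. Node kinds: root, processing instruction (?xml), element, attribute, text. Under the root are the ?xml processing instruction and the dblp element; dblp has mastersthesis and article children, each with mdate and key attributes and with subelements (author, title, year, school for the thesis; editor, title, year, journal, volume, ee, ee for the article) whose text values appear as leaves.]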
8. Some Example XPath Queries
- /dblp/mastersthesis/title
- /dblp//editor
- //title
- //title/text()
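A minimal sketch of these four queries using Python's standard xml.etree.ElementTree, which supports only a limited XPath subset. Assumptions: the document is a trimmed copy of the sample above, and because findall is called on the dblp element itself, the paths are written relative to it. ElementTree has no text() step, so text is read from each element's .text attribute instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (school, journal, etc. omitted).
DBLP = """<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <year>1997</year>
  </article>
</dblp>"""

root = ET.fromstring(DBLP)  # root is the <dblp> element

# /dblp/mastersthesis/title  (relative to <dblp>)
thesis_titles = root.findall("mastersthesis/title")

# /dblp//editor  (".//" searches all descendants)
editors = root.findall(".//editor")

# //title  (every title element, in document order)
all_titles = root.findall(".//title")

# //title/text()  (no text() step in ElementTree, so read .text)
title_texts = [t.text for t in all_titles]

print(title_texts)
```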
9. Context Nodes and Relative Paths
- XPath has a notion of a context node: it's analogous to a current directory
  - . represents this context node
  - .. represents the parent node
- We can express relative paths:
  - subpath/sub-subpath/../.. gets us back to the context node
- By default, the document root is the context node
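ElementTree also understands . and .. in relative paths, which makes the context-node idea concrete. A small sketch, assuming a made-up one-article document with the dblp element as the context node:

```python
import xml.etree.ElementTree as ET

DOC = "<dblp><article><title>The 1995 SQL Reunion</title></article></dblp>"
context = ET.fromstring(DOC)  # use <dblp> as the context node

# "." is the context node itself
self_node = context.find(".")

# "article/title/.." goes down two steps and back up one: the <article>
one_up = context.findall("article/title/..")

# "article/title/../.." climbs back to the context node
back_home = context.findall("article/title/../..")

print(self_node.tag, [e.tag for e in one_up], [e.tag for e in back_home])
```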
10. Predicates: Selection Operations
- A predicate allows us to filter the node set based on selection-like conditions over sub-XPaths:
  - /dblp/article[title = "Paper1"]
- which is equivalent to
  - /dblp/article[./title/text() = "Paper1"]
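A sketch of both predicate styles with ElementTree (the [.='...'] form needs Python 3.7 or later). The two-article document and its key values are invented for illustration; the second query reaches the same articles by filtering title nodes and stepping back up with ..:

```python
import xml.etree.ElementTree as ET

DOC = """<dblp>
  <article key="a1"><title>Paper1</title></article>
  <article key="a2"><title>Paper2</title></article>
</dblp>"""
root = ET.fromstring(DOC)

# /dblp/article[title = "Paper1"]
matches = root.findall("article[title='Paper1']")

# Same filter, written as a predicate on the context node "." of each title
matches_explicit = root.findall("article/title[.='Paper1']/..")

print([a.get("key") for a in matches], [a.get("key") for a in matches_explicit])
```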
11. Axes: More Complex Traversals
- Thus far, we've seen XPath expressions that go down the tree (and up one step)
- But we might want to go up, left, right, etc.
- These are expressed with so-called axes:
  - self::path-step
  - child::path-step, parent::path-step
  - descendant::path-step, ancestor::path-step
  - descendant-or-self::path-step, ancestor-or-self::path-step
  - preceding-sibling::path-step, following-sibling::path-step
  - preceding::path-step, following::path-step
- The previous XPaths we saw were in "abbreviated form"
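ElementTree does not implement these axes, so this sketch builds a child-to-parent map and defines a few axes by hand. The helper names (parent_axis, ancestor_axis, and so on) are hypothetical, chosen only to mirror the XPath axis names; the tiny document is made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<dblp><a/><b/><c><d/></c></dblp>")

# ElementTree elements have no parent pointers, so build child -> parent.
parent_of = {child: parent for parent in root.iter() for child in parent}

def parent_axis(node):
    p = parent_of.get(node)
    return [p] if p is not None else []

def ancestor_axis(node):
    # Nearest ancestor first, matching how XPath numbers reverse axes.
    out = []
    while node in parent_of:
        node = parent_of[node]
        out.append(node)
    return out

def following_sibling_axis(node):
    p = parent_of.get(node)
    if p is None:
        return []
    kids = list(p)
    return kids[kids.index(node) + 1:]

def preceding_sibling_axis(node):
    p = parent_of.get(node)
    if p is None:
        return []
    kids = list(p)
    return kids[:kids.index(node)][::-1]  # nearest sibling first

b, c = root[1], root[2]
print([e.tag for e in following_sibling_axis(b)], [e.tag for e in ancestor_axis(c[0])])
```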
12. Querying Order
- We saw in the previous slide that we could query for preceding or following siblings or nodes
- We can also query a node for its position according to some index:
  - fn:last() returns the index of the last element matching the last step; positions start at 1, so the first element is at position 1
  - fn:position() gives the relative count of the current node
  - child::article[fn:position() = fn:last()]
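ElementTree supports positional predicates, which gives a quick way to check the 1-based numbering. A sketch over an illustrative three-article document:

```python
import xml.etree.ElementTree as ET

DOC = """<dblp>
  <article><title>A</title></article>
  <article><title>B</title></article>
  <article><title>C</title></article>
</dblp>"""
root = ET.fromstring(DOC)

# child::article[fn:position() = 1]  (positions start at 1, not 0)
first = root.find("article[1]")

# child::article[fn:position() = fn:last()]
last = root.find("article[last()]")

# the article just before the last one
second_last = root.find("article[last()-1]")

print(first.findtext("title"), last.findtext("title"), second_last.findtext("title"))
```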
13. Users of XPath
- XML Schema uses simple XPaths in defining keys and uniqueness constraints
- XQuery
- XSLT
- XLink and XPointer, hyperlinks for XML
14. XQuery
- A strongly-typed, Turing-complete XML manipulation language
  - Attempts to do static typechecking against XML Schema
  - Based on an object model derived from Schema
- Unlike SQL, fully compositional, highly orthogonal:
  - Inputs & outputs: collections (sequences or bags) of XML nodes
  - Anywhere a particular type of object may be used, we may use the results of a query of the same type
- Designed mostly by DB and functional-language people
- Attempts to satisfy the needs of both data management and document management
  - The database-style core is mostly complete (it even has support for NULLs in XML!!)
  - The document keyword-querying features are still in the works; this shows in the order-preserving default model
15. XQuery's Basic Form
- Has an analogous form to SQL's SELECT..FROM..WHERE..GROUP BY..ORDER BY
- The model: bind nodes (or node sets) to variables; operate over each legal combination of bindings; produce a set of nodes
- "FLWOR" statement (note case sensitivity!):
  - for: iterators that bind variables
  - let: collections
  - where: conditions
  - order by: order-conditions (the older version was SORTBY)
  - return: output constructor
16. Iterations in XQuery
- A series of (possibly nested) FOR statements assigning the results of XPaths to variables:
    for $root in document("http://my.org/my.xml")
    for $sub in $root/rootElement,
        $sub2 in $sub/subElement,
- Something like a template that pattern-matches, producing a "binding tuple"
- For each of these, we evaluate the WHERE and possibly output the RETURN template
- The document() or doc() function specifies an input file as a URI
  - The old version was document; now it's doc, but it depends on your XQuery implementation
17. Two XQuery Examples
    <root-tag> {
      for $p in document("dblp.xml")/dblp/proceedings,
          $yr in $p/yr
      where $yr = "1999"
      return <proc> { $p } </proc>
    } </root-tag>

    for $i in document("dblp.xml")/dblp/inproceedings[author/text() = "John Smith"]
    return <smith-paper>
      <title>{ $i/title/text() }</title>
      <key>{ $i/@key }</key>
      { $i/crossref }
    </smith-paper>
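To make the "binding tuple" model concrete, here is a rough Python emulation of the first query's for/where/return steps. It is a sketch of the semantics, not of how an XQuery engine actually evaluates queries; the document is made up, and the return step builds each proc element from the title text rather than copying all of $p.

```python
import xml.etree.ElementTree as ET

DOC = """<dblp>
  <proceedings><title>VLDB</title><yr>1999</yr></proceedings>
  <proceedings><title>ICDE</title><yr>2000</yr></proceedings>
  <proceedings><title>SIGMOD</title><yr>1999</yr></proceedings>
</dblp>"""
root = ET.fromstring(DOC)

# for $p in .../dblp/proceedings, $yr in $p/yr
# Each legal combination of bindings becomes one binding tuple (a dict here).
tuples = [{"p": p, "yr": yr}
          for p in root.findall("proceedings")
          for yr in p.findall("yr")]

# where $yr = "1999"
kept = [t for t in tuples if t["yr"].text == "1999"]

# return <proc>...</proc>, collected under <root-tag>
result = ET.Element("root-tag")
for t in kept:
    ET.SubElement(result, "proc").text = t["p"].findtext("title")

print(ET.tostring(result, encoding="unicode"))
```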
18. Nesting in XQuery
- Nesting XML trees is perhaps the most common operation
- In XQuery, it's easy: put a subquery in the return clause where you want things to repeat!
    for $u in document("dblp.xml")/universities
    where $u/country = "USA"
    return <ms-theses-99>
      { $u/title }
      { for $mt in $u/../mastersthesis
        where $mt/year/text() = "1999" and ____________
        return $mt/title }
    </ms-theses-99>
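The same "subquery in the return clause" pattern, sketched in Python. This deliberately groups articles by year rather than reproducing the query above (whose join condition is left blank as an exercise); the data is invented.

```python
import xml.etree.ElementTree as ET

DOC = """<dblp>
  <article><title>T1</title><year>1999</year></article>
  <article><title>T2</title><year>2000</year></article>
  <article><title>T3</title><year>1999</year></article>
</dblp>"""
root = ET.fromstring(DOC)

out = ET.Element("by-year")
# Outer loop: one group per distinct year, in first-seen order
for yr in dict.fromkeys(a.findtext("year") for a in root.findall("article")):
    group = ET.SubElement(out, "year", value=yr)
    # Inner "subquery": re-scan the articles and keep those from this year
    for a in root.findall("article"):
        if a.findtext("year") == yr:
            group.append(a.find("title"))

print(ET.tostring(out, encoding="unicode"))
```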
19. Collections & Aggregation in XQuery
- In XQuery, many operations return collections:
  - XPaths, sub-XQueries, functions over these, ...
- The let clause assigns the results to a variable
- Aggregation simply applies a function over a collection, where the function returns a value (very elegant!)
    let $allpapers := document("dblp.xml")/dblp/article
    return <article-authors>
      <count> { fn:count(fn:distinct-values($allpapers/author)) } </count>
      { for $paper in doc("dblp.xml")/dblp/article
        let $pauth := $paper/author
        return <paper> { $paper/title }
          <count> { fn:count($pauth) } </count>
        </paper> }
    </article-authors>
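A rough Python analogue of the two aggregates, assuming a tiny made-up document: a count of distinct author values over all papers, and a per-paper author count computed inside the loop.

```python
import xml.etree.ElementTree as ET

DOC = """<dblp>
  <article><title>P1</title><author>Smith</author><author>Gray</author></article>
  <article><title>P2</title><author>Smith</author></article>
</dblp>"""
root = ET.fromstring(DOC)

# let $allpapers := .../dblp/article
allpapers = root.findall("article")

# fn:count(fn:distinct-values($allpapers/author))
distinct_authors = len({a.text for p in allpapers for a in p.findall("author")})

# for $paper ... let $pauth := $paper/author return (title, fn:count($pauth))
per_paper = [(p.findtext("title"), len(p.findall("author"))) for p in allpapers]

print(distinct_authors, per_paper)
```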
20. Collections, Ctd.
- Unlike in SQL, we can compose aggregations and create new collections from old:
    <result> {
      let $avgItemsSold := fn:avg(
        for $order in document("my.xml")/orders/order
        let $totalSold := fn:sum($order/item/quantity)
        return $totalSold)
      return $avgItemsSold
    } </result>
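The same composition in Python, as a sketch over a hypothetical orders document: an inner comprehension builds a new collection of per-order totals, and an outer aggregate averages it.

```python
import xml.etree.ElementTree as ET

DOC = """<orders>
  <order><item><quantity>2</quantity></item><item><quantity>3</quantity></item></order>
  <order><item><quantity>7</quantity></item></order>
</orders>"""
root = ET.fromstring(DOC)

# Inner FLWOR: fn:sum($order/item/quantity) for each order -> a new collection
totals = [sum(int(q.text) for q in order.findall("item/quantity"))
          for order in root.findall("order")]

# Outer aggregate over the collection we just built: fn:avg(...)
avg_items_sold = sum(totals) / len(totals)

print(totals, avg_items_sold)
```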
21. Distinct-ness
- In XQuery, DISTINCT-ness happens as a function over a collection
  - But since we have nodes, we can do duplicate removal according to value or node
  - Can do fn:distinct-values(collection) to remove duplicate values, or fn:distinct-nodes(collection) to remove duplicate nodes (fn:distinct-nodes is not part of the final XQuery 1.0 specification; path expressions already remove duplicate nodes)
    for $years in fn:distinct-values(doc("dblp.xml")//year/text())
    return $years
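Value-based versus node-based duplicate removal, sketched in Python over a hypothetical document. Keeping the first occurrence is Python's choice here; XQuery leaves the order of fn:distinct-values implementation-dependent.

```python
import xml.etree.ElementTree as ET

DOC = """<dblp>
  <article><year>1999</year></article>
  <article><year>2000</year></article>
  <article><year>1999</year></article>
</dblp>"""
root = ET.fromstring(DOC)
years = root.findall(".//year")

# Like fn:distinct-values: duplicates are judged by value
distinct_values = list(dict.fromkeys(y.text for y in years))

# Node-based removal: duplicates are judged by identity, so two different
# <year> nodes that both contain "1999" are both kept
doubled = years + years  # the same three nodes, each appearing twice
distinct_nodes = list(dict.fromkeys(doubled))

print(distinct_values, len(distinct_nodes))
```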
22. Sorting in XQuery
- SQL actually allows you to sort its output, with a special ORDER BY clause (which we haven't discussed, but which specifies a sort key list)
- XQuery borrows this idea
- In XQuery, what we order is the sequence of "result tuples" output by the return clause:
    for $x in document("dblp.xml")/proceedings
    order by $x/title/text()
    return $x
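In Python the same idea is a key-based sort over the bound nodes. A sketch with invented proceedings titles (strings compare by code point here):

```python
import xml.etree.ElementTree as ET

DOC = """<dblp>
  <proceedings><title>VLDB 1999</title></proceedings>
  <proceedings><title>ICDE 2000</title></proceedings>
  <proceedings><title>SIGMOD 1999</title></proceedings>
</dblp>"""
root = ET.fromstring(DOC)

# for $x in .../proceedings order by $x/title/text() return $x
ordered = sorted(root.findall("proceedings"), key=lambda x: x.findtext("title"))

print([x.findtext("title") for x in ordered])
```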
23. What If Order Doesn't Matter?
- By default:
  - SQL is unordered
  - XQuery is ordered everywhere!
- But unordered queries can be much faster to answer
- XQuery has a way of telling the query engine to avoid preserving order:
    unordered { for $x in (mypath) ... }
24. Querying & Defining Metadata: Can't Do This in SQL
- Can get a node's name by querying node-name():
    for $x in document("dblp.xml")/dblp/*
    return node-name($x)
- Can construct elements and attributes using computed names:
    for $x in document("dblp.xml")/dblp/*,
        $year in $x/year,
        $title in $x/title/text()
    return element { node-name($x) } {
      attribute { fn:concat("year-", $year) } { $title }
    }
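A Python sketch of both ideas, assuming a small made-up document: the element's tag plays the role of node-name(), and each new element and attribute name is computed from the data at run time.

```python
import xml.etree.ElementTree as ET

DOC = """<dblp>
  <mastersthesis><title>PRPL</title><year>1992</year></mastersthesis>
  <article><title>The 1995 SQL Reunion</title><year>1997</year></article>
</dblp>"""
root = ET.fromstring(DOC)

# for $x in .../dblp/* return node-name($x)
names = [x.tag for x in root]

# Computed constructors: reuse $x's name for the new element, and build
# the attribute name from data ("year-" followed by the year value)
out = [ET.Element(x.tag, {"year-" + x.findtext("year"): x.findtext("title")})
       for x in root]

print(names, [ET.tostring(e, encoding="unicode") for e in out])
```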
25. XQuery Summary
- Very flexible and powerful language for XML
  - Clean and orthogonal: can always replace a collection with an expression that creates collections
  - DB and document-oriented (we hope)
  - The core is relatively clean and easy to understand
- Turing complete: we'll talk more about XQuery functions soon
26. XSL(T): The Bridge Back to HTML
- XSL (Extensible Stylesheet Language) is actually divided into two parts:
  - XSL-FO: formatting for XML
  - XSLT: a special transformation language
- We'll leave XSL-FO for you to read about at www.w3.org, if you're interested
- XSLT is actually able to convert from XML → HTML, which is how many people do their formatting today
  - Products like Apache Cocoon generally translate XML → HTML on the server side
27. A Different Style of Language
- XSLT is based on a series of templates that match different parts of an XML document
  - There's a policy for what rule or template is applied if more than one matches (it's not what you'd think!)
  - XSLT templates can invoke other templates
  - XSLT templates can be nonterminating (beware!)
- XSLT templates are based on XPath "match"es, and we can also apply other templates (potentially to "select"ed XPaths)
  - Within each template, we describe what should be output
  - (Matches to text default to outputting it)
28. An XSLT Stylesheet
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/dblp">
        <html><head>This is DBLP</head>
          <body>
            <xsl:apply-templates />
          </body>
        </html>
      </xsl:template>
      <xsl:template match="inproceedings">
        <h2><xsl:apply-templates select="title" /></h2>
        <p><xsl:apply-templates select="author" /></p>
      </xsl:template>
    </xsl:stylesheet>
29. Results of XSLT Stylesheet
Input:
    <dblp>
      <inproceedings>
        <title>Paper1</title>
        <author>Smith</author>
      </inproceedings>
      <inproceedings>
        <author>Chakrabarti</author>
        <author>Gray</author>
        <title>Paper2</title>
      </inproceedings>
    </dblp>
Output:
    <html><head>This is DBLP</head>
      <body>
        <h2>Paper1</h2>
        <p>Smith</p>
        <h2>Paper2</h2>
        <p>Chakrabarti</p>
        <p>Gray</p>
      </body>
    </html>
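XSLT's match-and-apply style can be emulated with a dispatch table keyed by element name. This is a hypothetical mini engine (apply_templates, TEMPLATES, and the rule functions are invented names), and it handles only tag-name matches. It also assumes an author rule that wraps each author in its own p element, which is what the output above shows; the stylesheet on the previous slide, taken literally, would place all of a paper's authors inside one p.

```python
import xml.etree.ElementTree as ET

DOC = """<dblp>
  <inproceedings><title>Paper1</title><author>Smith</author></inproceedings>
  <inproceedings><author>Chakrabarti</author><author>Gray</author><title>Paper2</title></inproceedings>
</dblp>"""

def apply_templates(nodes, out):
    """Run the rule whose 'match' is each node's tag, else the built-in rule."""
    for node in nodes:
        TEMPLATES.get(node.tag, builtin_rule)(node, out)

def builtin_rule(node, out):
    # Rough stand-in for XSLT's built-in rules: copy the element's own text,
    # then recurse into its children (tail text is ignored for brevity)
    out.append(node.text or "")
    apply_templates(list(node), out)

def dblp_rule(node, out):  # match="/dblp"
    out.append("<html><head>This is DBLP</head><body>")
    apply_templates(list(node), out)
    out.append("</body></html>")

def inproceedings_rule(node, out):  # match="inproceedings"
    out.append("<h2>")
    apply_templates(node.findall("title"), out)   # select="title"
    out.append("</h2>")
    apply_templates(node.findall("author"), out)  # select="author"

def author_rule(node, out):  # assumed rule: one <p> per author
    out.append("<p>" + node.text + "</p>")

TEMPLATES = {"dblp": dblp_rule, "inproceedings": inproceedings_rule, "author": author_rule}

out = []
apply_templates([ET.fromstring(DOC)], out)
html = "".join(out)
print(html)
```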
30. What XSLT Can and Can't Do
- XSLT is great at converting XML to other formats:
  - XML → diagrams in SVG; HTML; LaTeX
- XSLT doesn't do joins (well), it only works on one XML file at a time, and it's limited in certain respects
  - It's not a query language, really
  - But it's a very good formatting language
- Most web browsers (post Netscape 4.7x) support XSLT and XSL formatting objects
  - But most real implementations use XSLT with something like Apache Cocoon
- You may want to use XSL/XSLT for your projects; see www.w3.org/TR/xslt for the spec
31. Querying XML
- We've seen three XML manipulation formalisms today:
  - XPath: the basic language for "projecting and selecting" (evaluating path expressions and predicates) over XML
  - XQuery: a statically typed, Turing-complete XML processing language
  - XSLT: a template-based language for transforming XML documents
- Each is extremely useful for certain applications!
32. Views in SQL and XQuery
- A view is a named query
- We use the name of the view to invoke the query (treating it as if it were the relation it returns)
- SQL:
    CREATE VIEW V(A,B,C) AS
      SELECT A,B,C FROM R WHERE R.A = 123
- XQuery:
    declare function local:V() as element(content)* {
      for $r in doc("R")/root/tree,
          $a in $r/a, $b in $r/b, $c in $r/c
      where $a = 123
      return <content>{ $a, $b, $c }</content>
    };
- Using the views:
    SELECT * FROM V, R WHERE V.B = 5 AND V.C = R.C

    for $v in local:V(), $r in doc("R")/root/tree
    where $v/b = 5 and $v/c = $r/c
    return $v
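Operationally, a virtual view behaves like a function that re-runs its defining query on each call. A Python sketch, assuming a made-up relation R stored as a list of (A, B, C) tuples:

```python
# Hypothetical base relation R(A, B, C)
R = [(123, 5, "x"), (123, 6, "y"), (7, 5, "x")]

def V():
    """CREATE VIEW V(A,B,C) AS SELECT A,B,C FROM R WHERE R.A = 123.
    Re-evaluated against R every time it is called."""
    return [(a, b, c) for (a, b, c) in R if a == 123]

# SELECT * FROM V, R WHERE V.B = 5 AND V.C = R.C
result = [v + r for v in V() for r in R if v[1] == 5 and v[2] == r[2]]
print(result)
```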
33. What's Useful about Views
- Providing security/access control
  - We can assign users permissions on different views
  - Can select or project so we only reveal what we want!
- Can be used as relations in other queries
  - Allows the user to query things that make more sense
- Describe transformations from one schema (the base relations) to another (the output of the view)
  - The basis of converting from XML to relations or vice versa
  - This will be incredibly useful in data integration, discussed soon
- Allow us to define recursive queries
34. Materialized vs. Virtual Views
- A virtual view is a named query that is re-computed every time it is used: its definition is merged into the referencing query
    CREATE VIEW V(A,B,C) AS
      SELECT A,B,C FROM R WHERE R.A = 123

    SELECT * FROM V, R WHERE V.B = 5 AND V.C = R.C
- A materialized view is one that is computed once and whose results are stored as a table
  - Think of this as a cached answer
  - These are incredibly useful!
  - Techniques exist for using materialized views to answer other queries
  - Materialized views are the basis of relating tables in different schemas
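The virtual/materialized difference in miniature: a sketch with a hypothetical two-column R, where the "materialized" view is simply a stored list that nobody refreshes.

```python
R = [(123, 5), (7, 5)]

def virtual_v():
    # Virtual: re-run the defining query every time it is referenced
    return [row for row in R if row[0] == 123]

materialized_v = virtual_v()  # Materialized: computed once and stored

R.append((123, 9))     # the base relation changes...
print(virtual_v())     # ...the virtual view sees the new tuple
print(materialized_v)  # ...the stored copy is now stale
```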
35. Views Should Stay Fresh
- Views (sometimes called intensional relations) behave, from the perspective of a query language, exactly like base relations (extensional relations)
- But there's an association that should be maintained:
  - If tuples change in the base relation, they should change in the view (whether it's materialized or not)
  - If tuples change in the view, that should be reflected in the base relation(s)
36. View Maintenance and the View Update Problem
- There exist algorithms to incrementally recompute a materialized view when the base relations change
- We can try to propagate view changes to the base relations
  - However, there are lots of views that aren't easily updatable
  - We can ensure views are updatable by enforcing certain constraints (e.g., no aggregation), but this limits the kinds of views we can have!

    R          S          R ⋈ S
    A  B       B  C       A  B  C
    1  2       2  4       1  2  4
    2  2       2  3       1  2  3
                          2  2  4
                          2  2  3

- Delete a single tuple, e.g., (1, 2, 4), from R ⋈ S? Removing (1, 2) from R or (2, 4) from S also deletes another view tuple.
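A small Python check of why deleting one view tuple is ambiguous, using the relations in the figure: each candidate base-table deletion removes more than the target tuple.

```python
R = {(1, 2), (2, 2)}  # R(A, B)
S = {(2, 4), (2, 3)}  # S(B, C)

def join(r, s):
    """Natural join R ⋈ S on the shared attribute B."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

view = join(R, S)
target = (1, 2, 4)  # the single view tuple we would like to delete

# Option 1: delete (1, 2) from R
lost_via_r = view - join(R - {(1, 2)}, S)
# Option 2: delete (2, 4) from S
lost_via_s = view - join(R, S - {(2, 4)})

print(sorted(lost_via_r), sorted(lost_via_s))
```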
37. Next Time
- Can we have views in XML over relational tables?
- Or vice versa?
- What other things can we use views for?