Querying XML: XPath, XQuery, and XSLT

1
Querying XML: XPath, XQuery, and XSLT
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 550: Database and Information Systems
  • October 27, 2005

Some slide content courtesy of Susan Davidson,
Dan Suciu, Raghu Ramakrishnan
2
Reminders
  • Homework 4 due 11/3
  • XQuery
  • Project plan due 11/3
  • Milestones
  • Division of responsibilities
  • For non-RSS projects: proposal including scope,
    milestones, and what you plan to demonstrate
  • Recitation Friday 11:30-12:30, Levine 512

3
Talks of Interest
  • Today:
  • Anastassia Ailamaki, CMU, "FATES:
    Automatically-tuned Database Storage Management",
    IRCS (3401 Walnut) Room 470 @ 3PM
  • Tomorrow, as part of DB & Information Retrieval
    Day:
  • Anastassia Ailamaki, CMU, "StagedDB: Designing
    Database Servers for New Hardware Trends", Wu and
    Chen @ 11AM
  • Sam Madden, MIT, "Data Management for Next
    Generation Wireless Sensor Networks", Wu and Chen
    @ 12:30PM
  • Andrei Broder, Yahoo!, "The next stage in Web IR:
    From query based Information Retrieval to context
    driven Information Supply", Wu and Chen @ 2:30PM

4
Querying XML
  • How do you query a directed graph? A tree?
  • The standard approach used by many XML,
    semistructured-data, and object query languages:
  • Define some sort of a template describing
    traversals from the root of the directed graph
  • In XML, the basis of this template is called an
    XPath

5
XPaths
  • In its simplest form, an XPath is like a path in
    a file system
  • /mypath/subpath//morepath
  • The XPath returns a node set representing the XML
    nodes (and their subtrees) at the end of the path
  • XPaths can have node tests at the end, returning
    only particular node types, e.g., text(),
    processing-instruction(), comment(), element(),
    attribute()
  • XPath is fundamentally an ordered language: it
    can query in an order-aware fashion, and it returns
    nodes in document order

6
Sample XML
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <dblp>
      <mastersthesis mdate="2002-01-03" key="ms/Brown92">
        <author>Kurt P. Brown</author>
        <title>PRPL: A Database Workload Specification Language</title>
        <year>1992</year>
        <school>Univ. of Wisconsin-Madison</school>
      </mastersthesis>
      <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
        <editor>Paul R. McJones</editor>
        <title>The 1995 SQL Reunion</title>
        <journal>Digital System Research Center Report</journal>
        <volume>SRC1997-018</volume>
        <year>1997</year>
        <ee>db/labs/dec/SRC1997-018.html</ee>
        <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
      </article>
7
XML Data Model Visualized
[Figure: the sample XML drawn as a tree. Legend: root, p-i
(processing instruction), element, attribute, text. The Root node
has the ?xml processing instruction and the dblp element as
children; dblp contains mastersthesis and article; each of those
carries mdate and key attributes plus its child elements, with
text nodes at the leaves.]
8
Some Example XPath Queries
  • /dblp/mastersthesis/title
  • /dblp//editor
  • //title
  • //title/text()

9
Context Nodes and Relative Paths
  • XPath has a notion of a context node: it's
    analogous to a current directory
  • . represents this context node
  • .. represents the parent node
  • We can express relative paths
  • subpath/sub-subpath/../.. gets us back to the
    context node
  • By default, the document root is the context node
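The directory analogy can be sketched in Python, assuming xml.etree.ElementTree, where the element a path is evaluated on plays the role of the context node. The one-article document is made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<dblp><article><title>Paper1</title><year>1997</year></article></dblp>"
)

# Use <article> as the context node, much like cd-ing into a directory.
article = root.find("article")

same_node = article.find(".")      # "." is the context node itself
year = article.find("year")        # a relative path from the context node

# Down two steps, then up two steps: back at the starting node.
back_again = root.find("article/title/../..")

print(same_node is article, year.text, back_again is root)
```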

10
Predicates: Selection Operations
  • A predicate allows us to filter the node set
    based on selection-like conditions over
    sub-XPaths
  • /dblp/article[title = "Paper1"]
  • which is equivalent to
  • /dblp/article[./title/text() = "Paper1"]
11
Axes: More Complex Traversals
  • Thus far, we've seen XPath expressions that go
    down the tree (and up one step)
  • But we might want to go up, left, right, etc.
  • These are expressed with so-called axes:
  • self::path-step
  • child::path-step, parent::path-step
  • descendant::path-step, ancestor::path-step
  • descendant-or-self::path-step,
    ancestor-or-self::path-step
  • preceding-sibling::path-step,
    following-sibling::path-step
  • preceding::path-step, following::path-step
  • The previous XPaths we saw were in "abbreviated
    form"
12
Querying Order
  • We saw in the previous slide that we could query
    for preceding or following siblings or nodes
  • We can also query a node for its position
    according to some index
  • fn:last() returns the index of the last element
    matching the last step; indices start at 1, so
    the first match is at position 1
  • fn:position() gives the position of the current
    node within the set matched by the last step
  • child::article[fn:position() = fn:last()]
13
Users of XPath
  • XML Schema uses simple XPaths in defining keys
    and uniqueness constraints
  • XQuery
  • XSLT
  • XLink and XPointer, hyperlinks for XML

14
XQuery
  • A strongly-typed, Turing-complete XML
    manipulation language
  • Attempts to do static typechecking against XML
    Schema
  • Based on an object model derived from Schema
  • Unlike SQL, fully compositional, highly
    orthogonal
  • Inputs & outputs: collections (sequences or bags)
    of XML nodes
  • Anywhere a particular type of object may be used,
    may use the results of a query of the same type
  • Designed mostly by DB and functional language
    people
  • Attempts to satisfy the needs of data management
    and document management
  • The database-style core is mostly complete (even
    has support for NULLs in XML!!)
  • The document keyword querying features are still
    in the works; the document-management side shows
    in the order-preserving default model

15
XQuery's Basic Form
  • Has an analogous form to SQL's SELECT..FROM..WHERE
    ..GROUP BY..ORDER BY
  • The model: bind nodes (or node sets) to
    variables; operate over each legal combination of
    bindings; produce a set of nodes
  • "FLWOR" statement (note case sensitivity!):
  • for {iterators that bind variables}
  • let {collections}
  • where {conditions}
  • order by {order-conditions} (older version was
    "SORTBY")
  • return {output constructor}

16
Iterations in XQuery
  • A series of (possibly nested) FOR statements
    assigning the results of XPaths to variables

    for $root in document("http://my.org/my.xml")
    for $sub in $root/rootElement,
        $sub2 in $sub/subElement, ...

  • Something like a template that pattern-matches,
    produces a "binding tuple"
  • For each of these, we evaluate the WHERE and
    possibly output the RETURN template
  • document() or doc() function specifies an input
    file as a URI
  • The old version was document(); it is now doc(),
    but which one works depends on your XQuery
    implementation
17
Two XQuery Examples
    <root-tag> {
      for $p in document("dblp.xml")/dblp/proceedings,
          $yr in $p/yr
      where $yr = "1999"
      return <proc> {$p} </proc>
    } </root-tag>

    for $i in document("dblp.xml")/dblp/inproceedings[author/text() = "John Smith"]
    return <smith-paper>
             <title>{ $i/title/text() }</title>
             <key>{ $i/@key }</key>
             { $i/crossref }
           </smith-paper>

18
Nesting in XQuery
  • Nesting XML trees is perhaps the most common
    operation
  • In XQuery, it's easy: put a subquery in the
    return clause where you want things to repeat!

    for $u in document("dblp.xml")/universities
    where $u/country = "USA"
    return <ms-theses-99>
             { $u/title } {
               for $mt in $u/../mastersthesis
               where $mt/year/text() = "1999" and ____________
               return $mt/title }
           </ms-theses-99>
19
Collections & Aggregation in XQuery
  • In XQuery, many operations return collections
  • XPaths, sub-XQueries, functions over these, ...
  • The let clause assigns the results to a variable
  • Aggregation simply applies a function over a
    collection, where the function returns a value
    (very elegant!)

    let $allpapers := document("dblp.xml")/dblp/article
    return <article-authors>
      <count> { fn:count(fn:distinct-values($allpapers/author)) } </count>
      { for $paper in doc("dblp.xml")/dblp/article
        let $pauth := $paper/author
        return <paper> {$paper/title}
                 <count> { fn:count($pauth) } </count>
               </paper>
      } </article-authors>
20
Collections, Ctd.
  • Unlike in SQL, we can compose aggregations and
    create new collections from old
    <result> {
      let $avgItemsSold := fn:avg(
        for $order in document("my.xml")/orders/order
        let $totalSold := fn:sum($order/item/quantity)
        return $totalSold)
      return $avgItemsSold
    } </result>
21
Distinct-ness
  • In XQuery, DISTINCT-ness happens as a function
    over a collection
  • But since we have nodes, we can do duplicate
    removal according to value or node
  • Can do fn:distinct-values(collection) to remove
    duplicate values, or fn:distinct-nodes(collection)
    to remove duplicate nodes

    for $years in fn:distinct-values(doc("dblp.xml")//year/text())
    return $years
22
Sorting in XQuery
  • SQL actually allows you to sort its output, with
    a special ORDER BY clause (which we haven't
    discussed, but which specifies a sort key list)
  • XQuery borrows this idea
  • In XQuery, what we order is the sequence of
    "result tuples" output by the return clause

    for $x in document("dblp.xml")/proceedings
    order by $x/title/text()
    return $x
23
What If Order Doesn't Matter?
  • By default:
  • SQL is unordered
  • XQuery is ordered everywhere!
  • But unordered queries are much faster to answer
  • XQuery has a way of telling the query engine to
    avoid preserving order:
  • unordered { for $x in (mypath) ... }
24
Querying & Defining Metadata: Can't Do This in SQL
  • Can get a node's name by querying node-name():

    for $x in document("dblp.xml")/dblp/*
    return node-name($x)

  • Can construct elements and attributes using
    computed names:

    for $x in document("dblp.xml")/dblp/*,
        $year in $x/year,
        $title in $x/title/text()
    return
      element { node-name($x) } {
        attribute { concat("year-", $year) } { $title }
      }
25
XQuery Summary
  • Very flexible and powerful language for XML
  • Clean and orthogonal: can always replace a
    collection with an expression that creates
    collections
  • DB and document-oriented (we hope)
  • The core is relatively clean and easy to
    understand
  • Turing-complete: we'll talk more about XQuery
    functions soon

26
XSL(T): The Bridge Back to HTML
  • XSL (Extensible Stylesheet Language) is actually
    divided into two parts:
  • XSL-FO: formatting for XML
  • XSLT: a special transformation language
  • We'll leave XSL-FO for you to read off
    www.w3.org, if you're interested
  • XSLT is actually able to convert from XML → HTML,
    which is how many people do their formatting
    today
  • Products like Apache Cocoon generally translate
    XML → HTML on the server side

27
A Different Style of Language
  • XSLT is based on a series of templates that match
    different parts of an XML document
  • There's a policy for what rule or template is
    applied if more than one matches (it's not what
    you'd think!)
  • XSLT templates can invoke other templates
  • XSLT templates can be nonterminating (beware!)
  • XSLT templates are based on XPath matches, and
    we can also apply other templates (potentially to
    selected XPaths)
  • Within each template, we describe what should be
    output
  • (Matches to text default to outputting it)

28
An XSLT Stylesheet
    <xsl:stylesheet version="1.1"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/dblp">
        <html><head>This is DBLP</head>
        <body>
          <xsl:apply-templates />
        </body>
        </html>
      </xsl:template>
      <xsl:template match="inproceedings">
        <h2><xsl:apply-templates select="title" /></h2>
        <p><xsl:apply-templates select="author"/></p>
      </xsl:template>
    </xsl:stylesheet>

29
Results of XSLT Stylesheet
    Input:

    <dblp>
      <inproceedings>
        <title>Paper1</title>
        <author>Smith</author>
      </inproceedings>
      <inproceedings>
        <author>Chakrabarti</author>
        <author>Gray</author>
        <title>Paper2</title>
      </inproceedings>
    </dblp>

    Output (both authors of Paper2 land in the single <p>
    that the inproceedings template emits):

    <html><head>This is DBLP</head>
    <body>
      <h2>Paper1</h2>
      <p>Smith</p>
      <h2>Paper2</h2>
      <p>ChakrabartiGray</p>
    </body>
    </html>
30
What XSLT Can and Can't Do
  • XSLT is great at converting XML to other formats
  • XML → diagrams in SVG, HTML, LaTeX
  • XSLT doesn't do joins (well), it only works on
    one XML file at a time, and it's limited in
    certain respects
  • It's not a query language, really
  • But it's a very good formatting language
  • Most web browsers (post Netscape 4.7x) support
    XSLT and XSL formatting objects
  • But most real implementations use XSLT with
    something like Apache Cocoon
  • You may want to use XSL/XSLT for your projects;
    see www.w3.org/TR/xslt for the spec

31
Querying XML
  • We've seen three XML manipulation formalisms
    today:
  • XPath: the basic language for projecting and
    selecting (evaluating path expressions and
    predicates) over XML
  • XQuery: a statically typed, Turing-complete XML
    processing language
  • XSLT: a template-based language for transforming
    XML documents
  • Each is extremely useful for certain applications!

32
Views in SQL and XQuery
  • A view is a named query
  • We use the name of the view to invoke the query
    (treating it as if it were the relation it
    returns)
  • SQL:

    CREATE VIEW V(A,B,C) AS
      SELECT A,B,C FROM R WHERE R.A = '123'

  • XQuery:

    declare function V() as element(content)* {
      for $r in doc("R")/root/tree,
          $a in $r/a, $b in $r/b, $c in $r/c
      where $a = "123"
      return <content>{$a, $b, $c}</content>
    }

Using the views:

    SELECT * FROM V, R WHERE V.B = 5 AND V.C = R.C

    for $v in V()/content, $r in doc("r")/root/tree
    where $v/b = $r/b
    return $v
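The SQL half can be run as a minimal sketch with Python's built-in sqlite3 module. The rows inserted into R are invented for the example; the view definition and the query over it follow the SQL on this slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A TEXT, B INTEGER, C INTEGER);
    INSERT INTO R VALUES ('123', 5, 7), ('123', 6, 8), ('999', 5, 7);
    CREATE VIEW V(A, B, C) AS
        SELECT A, B, C FROM R WHERE R.A = '123';
""")

# The view is used by name, exactly as if it were a stored relation.
view_rows = conn.execute("SELECT A, B, C FROM V ORDER BY B").fetchall()
rows = conn.execute(
    "SELECT V.A, V.B, V.C FROM V, R WHERE V.B = 5 AND V.C = R.C"
).fetchall()

print(view_rows, rows)
```

The join returns the view tuple once per matching tuple of R, so with this data it appears twice.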
33
What's Useful about Views
  • Providing security/access control
  • We can assign users permissions on different
    views
  • Can select or project so we only reveal what we
    want!
  • Can be used as relations in other queries
  • Allows the user to query things that make more
    sense
  • Describe transformations from one schema (the
    base relations) to another (the output of the
    view)
  • The basis of converting from XML to relations or
    vice versa
  • This will be incredibly useful in data
    integration, discussed soon
  • Allow us to define recursive queries

34
Materialized vs. Virtual Views
  • A virtual view is a named query that is actually
    re-computed every time it is merged with the
    referencing query
    CREATE VIEW V(A,B,C) AS
      SELECT A,B,C FROM R WHERE R.A = '123'
  • A materialized view is one that is computed once
    and its results are stored as a table
  • Think of this as a cached answer
  • These are incredibly useful!
  • Techniques exist for using materialized views to
    answer other queries
  • Materialized views are the basis of relating
    tables in different schemas

    SELECT * FROM V, R WHERE V.B = 5 AND V.C = R.C
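The difference shows up as soon as the base relation changes. The sketch below uses Python's sqlite3 module; SQLite has no materialized views, so a snapshot table created with CREATE TABLE ... AS stands in for one (an assumption made for illustration, with invented data and the hypothetical names V_virtual and V_materialized).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A TEXT, B INTEGER, C INTEGER);
    INSERT INTO R VALUES ('123', 5, 7);
    CREATE VIEW V_virtual AS
        SELECT A, B, C FROM R WHERE A = '123';
    -- Stand-in for a materialized view: computed once, stored as a table.
    CREATE TABLE V_materialized AS
        SELECT A, B, C FROM R WHERE A = '123';
""")

conn.execute("INSERT INTO R VALUES ('123', 6, 8)")  # the base relation changes

# The virtual view is re-evaluated on use; the stored copy is now stale.
virtual_count = conn.execute("SELECT COUNT(*) FROM V_virtual").fetchone()[0]
stale_count = conn.execute("SELECT COUNT(*) FROM V_materialized").fetchone()[0]

print(virtual_count, stale_count)
```

Keeping the stored copy in step with R is the view maintenance problem.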
35
Views Should Stay Fresh
  • Views (sometimes called intensional relations)
    behave, from the perspective of a query language,
    exactly like base relations (extensional
    relations)
  • But there's an association that should be
    maintained:
  • If tuples change in the base relation, they
    should change in the view (whether it's
    materialized or not)
  • If tuples change in the view, that should reflect
    in the base relation(s)

36
View Maintenance and the View Update Problem
  • There exist algorithms to incrementally recompute
    a materialized view when the base relations
    change
  • We can try to propagate view changes to the base
    relations
  • However, there are lots of views that aren't
    easily updatable
  • We can ensure views are updatable by enforcing
    certain constraints (e.g., no aggregation), but
    this limits the kinds of views we can have!

    R        S        R ⋈ S   (delete?)
    A B      B C      A B C
    1 2      2 4      1 2 4
    2 2      2 3      1 2 3
                      2 2 4
                      2 2 3
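Why deleting from a join view is hard can be checked with a small Python sketch. It assumes the example relations are R(A, B) and S(B, C) with the view defined as their natural join, and it picks (1, 2, 4) as the view tuple to delete (an arbitrary choice for illustration). It then tries every single-tuple deletion from a base relation and records which other view tuples disappear as a side effect.

```python
R = {(1, 2), (2, 2)}            # R(A, B)
S = {(2, 4), (2, 3)}            # S(B, C)

def join(r, s):
    """Natural join of R(A, B) and S(B, C) on B."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

view = join(R, S)
target = (1, 2, 4)              # the view tuple we would like to delete

# For each base-relation deletion that removes the target, record the
# other view tuples that vanish along with it.
side_effects = {}
for name, rel in (("R", R), ("S", S)):
    for t in rel:
        new_view = join(R - {t}, S) if name == "R" else join(R, S - {t})
        if target not in new_view:
            side_effects[(name, t)] = view - new_view - {target}

print(side_effects)
```

With this data, every base deletion that removes the target also removes a second view tuple, so no update to R or S deletes exactly one tuple from the view.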
37
Next Time
  • Can we have views in XML over tables in
    relations?
  • Or vice versa?
  • What other things can we use views for?