Title: Introduction to XML
1Introduction to XML
2Acknowledgements
- The slides in this XML introduction are built from slides by many people: Peter Buneman, Susan Davidson, Zack Ives, Wang-Chiew Tan, Mary Fernandez, Michael Benedikt, Juliana Freire, Arnaud Sahuguet, Daniela Florescu, Donald Kossmann
3Why XML
- HTML: a standard format for displaying unstructured web data for human reading
- XML is the confluence of two factors
  - The Web needed a more declarative format for data, one that tries to describe the meaning of the data using extensible tags
  - Database people needed a more flexible data interchange format
- XML covers the continuous spectrum from unstructured documents to structured data
- XML: a universal format for representing (semi-)structured web data for machine processing
  - interoperability
  - self-describing
  - extensibility
4Where does XML data come from?
- Published by relational databases
- Produced by information extraction/text mining over unstructured data
- Obtained by integrating data from different sources (which can have different schemas)
- Data whose schema can change over time
- Obtained by calling web services (Amazon, Google, weather.com, ...)
- XML is the format of the web data that is processed by programs
5XML: Extensible Markup Language
- http://www.w3.org/XML/
- XML 1.0, W3C Recommendation, Feb. 1998
- XML is just a syntax,
  - but a standardized, extensible syntax
- Compared with HTML, XML allows the specification of new dialects by inventing tags.
6An XML Document Example
[Figure: an XML document describing two shows: "Fugitive, The" (a review by Roger Ebert, "two thumbs up! A fun action movie, Harrison Ford at his best.", a second review, "The standard hollywood summer movie strikes back.", and a box office of 183,752,965) and "X Files, The" (4 seasons); callouts mark an attribute and a region of mixed content]
- Two basic components
  - Tags / structure / metadata / markup
  - Text / values / data
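The tags of this example did not survive, so here is a minimal sketch that rebuilds a plausible version of the document and walks it with Python's standard xml.etree.ElementTree module. The element names (imdb, show, title, review, suntimes, nyt, reviewer, rating, box_office, seasons) and the year values are assumptions pieced together from the SAX trace and tree figures later in these slides, not the original listing.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the sample document; element names follow the
# SAX event trace and the XPath tree figures later in these slides.
IMDB = """<imdb>
  <show year="1993">
    <title>Fugitive, The</title>
    <review>
      <suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>! A fun action movie, Harrison Ford at his best.</suntimes>
    </review>
    <review>
      <nyt>The standard hollywood summer movie strikes back.</nyt>
    </review>
    <box_office>183,752,965</box_office>
  </show>
  <show year="1994">
    <title>X Files, The</title>
    <seasons>4</seasons>
  </show>
</imdb>"""


def walk(elem, depth=0):
    """Yield (depth, tag, attributes, leading text) for each element in document order."""
    yield depth, elem.tag, dict(elem.attrib), (elem.text or "").strip()
    for child in elem:
        yield from walk(child, depth + 1)


root = ET.fromstring(IMDB)
for depth, tag, attrs, text in walk(root):
    # Markup (tag, attributes) and data (text) are the two basic components.
    print("  " * depth + tag, attrs if attrs else "", text)
```

Note that the text after a child element (for example " gives " after the reviewer) is not shown by this walk; ElementTree keeps it in the child's tail.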
7Tags
- Tags describe the meaning of the text.
- Tags come in pairs: start tags and end tags.
  - E.g. <title> ... </title>
- They must be properly nested
  - <a> ... <b> ... </b> ... </a>: good
  - <a> ... <b> ... </a> ... </b>: bad
- The region between a start tag and its matching end tag defines an element.
8Structure of XML Data
- Nesting tags can be used to express various structures, i.e. elements are nested.
- E.g. <show> <title>Fugitive, The</title> ... </show>
9Structure of XML Data (cont.)
- We can represent a list by using the same tag repeatedly
- Order matters!
[Example: a run of sibling elements that share one tag]
10Attributes
- We can have attributes, given as name-value pairs, within a start tag.
  - E.g. <show year="1993">
- Alternatively, we can represent the information as a nested element (instead of an attribute)
  - E.g. <year>1993</year>
- Differences between elements and attributes
  - Elements are ordered; attributes are not
  - Within an element, each attribute has a distinct name
- Which one to choose?
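A small sketch of the element-versus-attribute choice with Python's xml.etree.ElementTree. Both fragments are illustrative, assuming a show with a year as in the rest of these slides.

```python
import xml.etree.ElementTree as ET

# The same fact ("this show is from 1993") encoded two ways; both fragments
# are illustrative, not taken verbatim from the original slides.
as_attribute = ET.fromstring('<show year="1993"><title>Fugitive, The</title></show>')
as_element = ET.fromstring('<show><year>1993</year><title>Fugitive, The</title></show>')

# An attribute is looked up by name on its element...
year_from_attribute = as_attribute.get("year")
# ...while a subelement is found among the element's children.
year_from_element = as_element.findtext("year")

# Children keep their document order, so position is meaningful for elements.
child_order = [child.tag for child in as_element]

print(year_from_attribute, year_from_element, child_order)
```

One practical consideration when choosing: an attribute value is a plain string, so anything that needs its own substructure has to be an element.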
11Text and Mixed Content
- XML has only one basic type: text.
- It is bounded by tags, e.g.
  - <title>The Big Sleep</title>
- XML text is called PCDATA (for parsed character data).
- It can be mixed with other subelements: mixed content
  - E.g. a review in which "Roger Ebert" is marked up as a reviewer and "two thumbs up" as a rating: Roger Ebert gives two thumbs up! A fun action movie, Harrison Ford at his best.
- Mixed content is very useful for document data
  - People speak in sentences. XML can preserve the structure of natural language while adding semantic markup that machines can interpret.
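To see how a parser keeps mixed content, here is a sketch with Python's xml.etree.ElementTree; the suntimes, reviewer, and rating tags are assumed from the SAX example later in these slides. ElementTree stores text that follows a child element in that child's tail attribute, which is how the sentence survives alongside the markup.

```python
import xml.etree.ElementTree as ET

# Mixed content from the review (tag names are assumptions).
review = ET.fromstring(
    "<suntimes><reviewer>Roger Ebert</reviewer> gives "
    "<rating>two thumbs up</rating>! A fun action movie, "
    "Harrison Ford at his best.</suntimes>"
)

# Machines can pick out the marked-up pieces...
print(review.findtext("reviewer"), "|", review.findtext("rating"))

# ...while the natural-language sentence is still recoverable, in order.
# itertext() yields each text node and each tail in document order.
sentence = "".join(review.itertext())
print(sentence)
```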
12Continuous spectrum between text, semi-structured
data, and structured data
- Unstructured text: Roger Ebert gives two thumbs up! The Fugitive is a fun action movie, Harrison Ford at his best.
- Semi-structured (text with some markup): two thumbs up! The Fugitive is a fun action movie, Harrison Ford at his best.
- Structured data, one element per value:
  - Roger Ebert
  - two thumbs up
  - The Fugitive
  - action
  - Harrison Ford
13Representing XML Data as Trees
[Figure: the sample document from slide 6 drawn as a tree, with element nodes (imdb, show, title, review, ...), the year attribute, and text leaves such as "Fugitive, The", the Roger Ebert review, and "The standard hollywood summer movie strikes back."]
14XML and Relational Data (I)
- XML easily encodes relations
[Figure: a relational row (Fugitive, The; 1993; Andrew Davis) next to an XML element holding the same three values]
Are they equivalent?
What about order?
15XML and Relational Data (II)
Are there other XML formats to encode these two
relations?
16XML and Relational Data (III)
What about this encoding?
[Figure: the show Fugitive, The (1993, Andrew Davis) with its actors, Harrison Ford and Tommy Lee Jones, nested inside it]
17XML and Relational Data (IV)
- Relational data
  - Killer application: the banking industry
  - Invented as a mathematically clean abstract data model
  - Philosophy: schema first, then data
  - Strict rules for data normalization; flat tables
  - Order is irrelevant; textual data is supported but not a primary goal
- XML
  - First killer application: the publishing industry
  - Invented as a syntax for data; only later given an abstract data model
  - Philosophy: data can exist with or without a schema, or with multiple schemas
  - No data normalization; flexibility is a must
  - Order may be very important; textual data support is a primary goal
18Sources of XML data
- Inter-application communication data (e.g. Web Services)
- Mobile device communication data
- Logs
- Web syndication (RSS)
- Metadata (e.g. Schema, WSDL, XMP)
- Presentation data (e.g. XHTML)
- Documents (e.g. Word)
- Views of other sources of data
  - e.g. relational, LDAP (Lightweight Directory Access Protocol), CSV (comma-separated values), Excel, etc.
- Sensor data
19XML Dialects in Vertical Application Domains
Basically everywhere!
- Health care: Health Level Seven (HL7)
- Geography Markup Language (GML)
- Systems Biology Markup Language (SBML)
- Digital photography metadata (XMP)
- Extensible Financial Reporting Markup Language (XFRML)
- MusicXML
- Spacecraft Markup Language (SML)
- Bank Internet Payment System (BIPS)
- Bioinformatic Sequence Markup Language (BSML)
- Chemical Markup Language (CML)
- Electronic Business XML Initiative (ebXML)
- FinXML, Financial Information eXchange protocol (FIX)
- Scalable Vector Graphics (SVG)
- Real Estate Listing Markup Language (RELML), ...
- More at http://xml.coverpages.org/gen-apps.html
- http://www.xml.org/xml/industry_industrysectors.jsp
20RSS
- RSS 2.0 (Really Simple Syndication): a web feed format used to publish frequently updated digital content, such as blogs and news feeds.
- RSS delivers its information as an XML file called an "RSS feed", "webfeed", "RSS stream", or "RSS channel".
- Users subscribe to a feed by supplying their reader with a link to the feed.
- Feed readers (or aggregators): client software that regularly checks a list of feeds on behalf of a user, pulls any updated content it finds, and displays it.
  - RSS readers / feed readers / feed aggregators / news readers / search aggregators
  - Google Reader, My Yahoo!, Bloglines, web browsers
21XForms: The next generation of web forms
- http://www.w3.org/TR/xforms/
- Benefits
  - device-neutral
  - platform-independent
  - excellent XML integration: forms can create, and be created from, XML documents
  - common features (e.g. validation using XML Schema and query languages) without scripting
22Microsoft Office in XML
- Office 2003 was able to import/export documents as XML
- Office 2007 models documents NATIVELY in XML (Microsoft Office Open XML, i.e. OOXML)
- Examples of vocabularies and schemas
  - WordprocessingML (the XML file format for Word 2003), SpreadsheetML (Excel 2003), and DataDiagramingML (Visio 2003)
23Web Services
- Web service: a software system designed to support interoperable machine-to-machine interaction over a network.
- Web service protocol stack
  - Service transport: HTTP, SMTP, ...
  - XML messaging: SOAP (Simple Object Access Protocol)
  - Service description: WSDL (Web Services Description Language), an XML-based language that provides a model for describing web services
  - Service discovery: UDDI (Universal Description, Discovery and Integration), an XML-based registry for businesses to list themselves on the Internet
24XML Isn't Enough on Its Own
- It is just a data format, but we care about data!
- How do we design an XML format for a given domain?
- Sometimes more structure about the data is helpful.
- How can we tell when we are getting garbage?
- How can we query the data?
- How can we understand what we got?
25XML Standards Landscape
- Schema languages
  - DTDs: http://www.w3schools.com/dtd/default.asp
  - XML Schemas: http://www.w3.org/XML/Schema
- Programming APIs
  - DOM (Document Object Model): http://www.w3.org/DOM/
  - SAX (Simple API for XML): http://www.saxproject.org/
  - JAXP (Java API for XML Processing): http://java.sun.com/webservices/jaxp/
- Query languages
  - XPath: http://www.w3.org/TR/xpath
  - XQuery: http://www.w3.org/TR/xquery/
  - XSLT: http://www.w3.org/TR/xslt
- Standards organizations
  - W3C (the World Wide Web Consortium)
  - OASIS (Organization for the Advancement of Structured Information Standards)
26Comparison with RDBMS
- XML documents
  - DTD, XML Schema (optional)
  - DOM and SAX APIs
  - XPath, XQuery, XSLT
- Relational databases
  - Relational schema (required)
  - JDBC / ODBC
  - SQL
27XML Schema
- Schema: a model of the data
  - Structural definitions
  - Type definitions
  - Defaults
- Useful for
  - validating data
  - facilitating the writing of applications that process data
  - facilitating the writing of queries
  - designing storage and query evaluation strategies
  - defining prior agreements between parties for data exchange
  - mapping to programming languages (e.g. Java, C)
28DTD: Document Type Descriptors
- Part of the original XML 1.0 specification
- Describe the grammar of the XML file
  - Element declarations: rules for how elements are allowed to nest within each other
  - Attribute lists: which attributes are allowed on which element
  - Some constraints on the values of elements and attributes
  - The root element of the XML tree
29Sample XML Data
[Figure: the sample imdb document from slide 6, annotated to show what a show element must contain]
- Exactly one title
- As many reviews as needed after the title
- Box office or seasons info
30Specifying the Structure
- title: specifies a title element
- director?: specifies an optional (0 or 1) director element
- review*: specifies 0 or more review elements
- title, review+: specifies a title followed by 1 or more review elements
- box_office | seasons: specifies a box_office or a seasons element
31Specifying the Structure (cont)
- So the whole structure of a movie element is specified by
  - (title, review*, (seasons | box_office))
- This is known as a regular expression
- #PCDATA: only textual content allowed
32Regular Expressions
- Each regular expression determines a corresponding finite state automaton
- Let's start with a simpler example
  - title, review*
- This suggests a simple parsing program to validate XML data
[Figure: the automaton for this expression, with transitions labeled title and review]
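The parsing idea above can be sketched in Python by turning an element's child tags into a token string and matching it against the content model (title, review*, (box_office | seasons)) with re.fullmatch. This is a toy checker under stated simplifications (element children only, no attributes or #PCDATA), and Python's re is a backtracking engine rather than an explicit automaton, though it accepts the same regular language.

```python
import re
import xml.etree.ElementTree as ET

# Content model (title, review*, (box_office | seasons)) written over tokens,
# where each child tag is followed by a single space.
SHOW_MODEL = re.compile(r"title (review )*(box_office |seasons )")


def children_match(elem, model):
    """Return True if the sequence of child tag names of elem matches model."""
    tokens = "".join(child.tag + " " for child in elem)
    return model.fullmatch(tokens) is not None


ok = ET.fromstring("<show><title>X</title><review/><review/><box_office>1</box_office></show>")
no_title = ET.fromstring("<show><review/><seasons>4</seasons></show>")
both = ET.fromstring("<show><title>X</title><box_office>1</box_office><seasons>4</seasons></show>")

print(children_match(ok, SHOW_MODEL))        # title required and present
print(children_match(no_title, SHOW_MODEL))  # title missing
print(children_match(both, SHOW_MODEL))      # box_office OR seasons, not both
```

The space after each token keeps tag names that are prefixes of one another (such as review and reviews) from matching by accident.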
33A More Complicated Example
- title, review, (box_office | seasons), actor
34Defining the attribute lists
- <!ATTLIST element-name attribute-name attribute-type default-value>
- attribute-type: CDATA, ...
- default-value
  - value: the default value of the attribute
  - #REQUIRED: the attribute value must be included
  - #IMPLIED: the attribute is optional
  - #FIXED value: the attribute value is fixed
- E.g.
35The DTD for the Sample XML Data
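- DTD: element and attribute declarations for the sample data, including a show content model ending in (... | seasons))
- Indicating the DTD in an XML file: a <!DOCTYPE ...> declaration at the top of the document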
36DTDs Aren't Enough Sometimes
- DTDs capture grammatical structure, but have some drawbacks
  - Not themselves written in XML, which makes it inconvenient to build tools for them
  - No built-in data types and domains
  - Limited ability to specify constraints on values
  - No way of defining OO-like inheritance
37XML Schema of the Data
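XML Schema for the sample data (recoverable fragments):
- xmlns:xs="http://www.w3.org/2001/XMLSchema" (the XML namespace, specified by a URI)
- ... type="xs:string"/>
- ... minOccurs="1" maxOccurs="unbounded"/>
- <xs:element name="box_office" type="xs:integer"/>
- <xs:element name="seasons" type="xs:integer"/>
- ... use="optional"/>
Notes on the schema:
- A user-defined (unnamed) complex data type
- By default, minOccurs="1", maxOccurs="1", and the type is xs:anyType
- The value of use can be optional or required. An attribute can also have a fixed or default value.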
38Using XML Schema for Type Inheritance
- Suppose that we want to differentiate two types of shows: movie and tv_show. Both have title and review subelements and a year attribute. Movies have box_office; tv_shows have seasons. An XML document records both movies and TV shows.
- To do this, we need to
  - name the show type explicitly (since this type is used more than once in the schema)
  - create two new data types, derived from show by extension.
- Note that XML Schema also allows derivation by restriction
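39
XML Schema with a named show type and two types derived from it (recoverable fragments):
- the XML Schema namespace declaration (... XMLSchema")
- ... type="xs:string"/>
- ... mixed="true" minOccurs="1" maxOccurs="unbounded"/>
- ... type="xs:integer" use="optional"/>
- ... name="box_office" type="xs:integer"/>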
We can also write the schema without naming
Movie and TV_show types
40DTD vs XML Schema
- XML Schema is more expressive
  - It defines data type information
    - Simple types and complex types
    - Built-in types and user-defined types
  - It can specify the cardinality of an element within its parent element
  - It can specify expressive value constraints, such as keys and foreign keys (more on this later!)
- An XML Schema is itself an XML file!
41Well-Formed XML
- Well-formedness applies to any XML document (with or without a DTD)
  - All open tags have matching close tags, or use a special case: <a/> is a shortcut for <a></a>
  - Attributes (which are unordered, in contrast to elements) appear only once within an element
  - There's a single root element
  - XML is case-sensitive
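A quick way to see these rules enforced is to hand a few snippets to a non-validating parser. This Python sketch uses xml.etree.ElementTree, which checks well-formedness only (no DTD or schema); the snippets are made-up examples, one per rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if a non-validating parser accepts text; no DTD or schema is checked."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Made-up snippets, one per rule on the slide.
cases = {
    "matched tags": "<show><title>X</title></show>",
    "empty-element shortcut": "<show><title/></show>",
    "bad nesting": "<show><title>X</show></title>",
    "two root elements": "<show/><show/>",
    "case mismatch": "<Show></show>",
    "repeated attribute": '<show year="1993" year="1994"/>',
}

for name, text in cases.items():
    print(f"{name:24} {is_well_formed(text)}")
```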
42Valid XML
- Valid specifies that the document conforms to the DTD or XML Schema:
  - it conforms to the grammar
  - the types of attributes are correct
  - constraints on references are satisfied
43Summary
- As a data format, the main virtues of XML are its widespread acceptance and the (important) ability to handle semi-structured data (data without a schema)
- Problems remain
  - How to store large XML documents?
  - How to query them?
  - How to map XML from/to other data representation formats?
44XML Standard Landscape
- Schema languages
- DTDs
- XML Schemas
- Programming APIs
- DOM
- SAX
- Query languages
- XPath
- XQuery
- XSLT
45Programming APIs
[Figure: an XML document goes into an XML parser (DOM, SAX), which hands applications either a DOM instance or a stream of SAX events]
- DOM
  - http://www.w3.org/DOM/
  - http://www.w3schools.com/dom/
- SAX
  - http://www.saxproject.org/
46DOM: Document Object Model
- Builds a tree data structure (in memory)
- Provides access to the nodes of the tree
- Level 1: functionality for XML document navigation and manipulation
- Level 2: stylesheets and namespaces
- Level 3: document loading and saving; DTDs and schemas
47DOM Interfaces
- Interface Document
  - Methods: createElement, getElementsByTagName, ...
- Interface Node
  - Attributes: parentNode, childNodes, firstChild, nextSibling, attributes, ...
  - Methods: appendChild, removeChild, ...
- Interface Element
  - Methods: getAttributeNode, removeAttributeNode, ...
48DOM an Example
49Navigating DOM trees
- What does this code fragment do?
  - var x = getElementsByTagName('title');
  - for (i = 0; i < x.length; i++) { ... x[i].childNodes[0].nodeValue ... }
What if we are only interested in titles of shows?
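The same navigation in Python's xml.dom.minidom, which implements the DOM interfaces listed earlier (getElementsByTagName, childNodes, nodeValue, parentNode, createElement, appendChild). The document is hypothetical: it adds a title inside a review so that not every title is a show title, and checking parentNode is one way to answer the question above.

```python
from xml.dom import minidom

# Hypothetical document: a <title> also appears inside a review.
DOC = """<imdb>
<show year="1993"><title>Fugitive, The</title>
<review><title>A summer hit</title></review></show>
<show year="1994"><title>X Files, The</title></show>
</imdb>"""

dom = minidom.parseString(DOC)

# Like the slide's loop: the first child (a text node) of every <title>.
all_titles = [t.childNodes[0].nodeValue for t in dom.getElementsByTagName("title")]

# Only titles of shows: keep a <title> whose parentNode is a <show>.
show_titles = [t.childNodes[0].nodeValue
               for t in dom.getElementsByTagName("title")
               if t.parentNode.nodeName == "show"]

# The DOM can also be updated in place (Document.createElement, Node.appendChild).
dom.documentElement.appendChild(dom.createElement("show"))

print(all_titles)
print(show_titles)
print(len(dom.getElementsByTagName("show")))
```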
50SAX: Simple API for XML
- http://www.saxproject.org/
- Event based
  - Instead of reading the entire file into memory and building a tree, SAX reads a stream of tokens and triggers events, e.g.,
    - startDocument
    - startElement
    - endElement
    - endDocument
    - characters
  - Applications write handlers for events.
- Supports document-order access to data
- Read-only access: no update in place
51SAX: an example
- startElement(imdb, null)
- startElement(show, (year, 1993))
- startElement(title, null)
- characters(Fugitive, The)
- endElement(title)
- startElement(review, null)
- startElement(suntimes, null)
- startElement(reviewer, null)
- characters(Roger Ebert)
- endElement(reviewer)
- characters( gives )
- startElement(rating, null)
- characters(two thumbs up)
- endElement(rating)
- characters(! A fun movie)
- endElement(suntimes)
- endElement(review)
- ...
[Figure: the corresponding part of the sample document: "Fugitive, The" and the review "Roger Ebert gives two thumbs up! A fun action movie, Harrison Ford at his best."]
How can you find all show titles using a SAX
parser?
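One way to answer the question with a SAX parser, sketched with Python's xml.sax: the handler keeps a stack of open element names and buffers characters only while inside a title whose parent is show. The input document is hypothetical (it includes a title inside a review to show the filter working).

```python
import xml.sax

# Hypothetical document: a <title> also appears inside a review.
DOC = """<imdb>
<show year="1993"><title>Fugitive, The</title>
<review><title>A summer hit</title></review></show>
<show year="1994"><title>X Files, The</title></show>
</imdb>"""


class ShowTitleHandler(xml.sax.ContentHandler):
    """Collect the text of every <title> whose parent element is <show>."""

    def __init__(self):
        super().__init__()
        self.stack = []      # names of the currently open elements
        self.buffer = None   # text pieces of the show title being read, if any
        self.titles = []

    def startElement(self, name, attrs):
        if name == "title" and self.stack and self.stack[-1] == "show":
            self.buffer = []
        self.stack.append(name)

    def characters(self, content):
        # A single text node may arrive in several characters() calls.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        self.stack.pop()
        if name == "title" and self.buffer is not None:
            self.titles.append("".join(self.buffer))
            self.buffer = None


handler = ShowTitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(handler.titles)
```

Because SAX never builds a tree, the handler itself must remember whatever context it needs; here that is just the stack of open elements.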
52DOM vs SAX
- DOM
  - XML represents a tree model, and DOM is very natural to understand.
  - Supports navigation of the document
  - Enables dynamic updates, additions, and deletions of document content
- SAX
  - Lightweight
  - Good for applications that read large XML documents once
    - E.g. filtering stock quotes or network alerts, loading XML documents into storage systems
53Links to XML parser download sites
- http://xml.apache.org/
- The Xerces projects implement DOM and SAX parsers in Java, C++, and Perl.
  - http://xml.apache.org/xerces2-j/
  - http://xml.apache.org/xerces-c/
  - http://xml.apache.org/xerces-p/
54XML Standards Landscape
- Schema languages
- DTDs
- XML Schemas
- Programming APIs
- DOM
- SAX
- Query languages
- XPath
- XQuery
- XSLT
55Common Querying Tasks
- Filter and select XML values
  - Navigation, selection, extraction
- Merge and integrate values from multiple XML sources
  - Joins, aggregation
- Transform XML values from one schema to another
  - XML construction
56XML Query Languages
- XPath
  - http://www.w3.org/TR/xpath
  - Language for navigation, selection, and extraction
  - Used in XSLT, XQuery, XPointer, XLink, XML Schema, et al.
- XQuery
  - http://www.w3.org/TR/xquery/
  - Strongly-typed query language
  - Additional join, transformation, and construction abilities
- XSLT
  - http://www.w3.org/TR/xslt
  - Transforms XML to XML, HTML, or text
57A Simple XPath Query
- In its simplest form, an XPath is like a path in a file system: /imdb/show/review
[Figure: the XPath query /imdb/show/review matched step by step against the sample data tree (imdb, show, title "Fugitive, The", @year 1993, review, suntimes, nyt, reviewer "Roger Ebert", rating)]
58XPath Basics
- An XPath returns a set of XML nodes (and their subtrees) selected by the path
- XPaths can have node tests at the end, returning only particular node types
  - e.g. text(), which returns the PCDATA associated with a node, and
  - element(), attribute(), etc.
- XPath is fundamentally an ordered language: it can query in an order-aware fashion, and it returns nodes in order
59Wildcards
- In fact, besides tag/attribute names, we may use * as a wildcard in queries.
  - E.g. /imdb/*/review
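Python's xml.etree.ElementTree supports a limited subset of XPath that is enough to try the child, descendant, and wildcard steps above. This sketch uses the reconstructed sample document (element names assumed). ElementTree paths are relative to an element, so /imdb/show/review is written ./show/review starting from the imdb element, and axes such as following-sibling are not available in this subset.

```python
import xml.etree.ElementTree as ET

# Reconstructed sample document (element names are assumptions).
IMDB = """<imdb>
<show year="1993"><title>Fugitive, The</title>
<review><suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>! A fun action movie, Harrison Ford at his best.</suntimes></review>
<review><nyt>The standard hollywood summer movie strikes back.</nyt></review>
<box_office>183,752,965</box_office></show>
<show year="1994"><title>X Files, The</title><seasons>4</seasons></show>
</imdb>"""

root = ET.fromstring(IMDB)  # root is the <imdb> element

child_steps = root.findall("./show/review")       # XPath /imdb/show/review
descendants = root.findall(".//rating")           # XPath /imdb//rating
wildcard = root.findall("./*/review")             # XPath /imdb/*/review
titles = [t.text for t in root.findall("./*/title")]
years = [s.get("year") for s in root.findall("./show[@year]")]

print(len(child_steps), len(descendants), len(wildcard))
print(titles, years)
```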
60XPath Axes
- In the previous XPath query, / matches a child edge in the XML tree, i.e. it goes down the tree one step
- XPath has axes to specify more expressive navigation, so we can go up, left, or right, for multiple steps:
  - self
  - child (/), parent
  - descendant (//), ancestor
  - descendant-or-self, ancestor-or-self
  - preceding-sibling, following-sibling
  - preceding, following
61Another Sample XPath Query
[Figure: the query below evaluated step by step over the sample data tree (imdb, show, title "Fugitive, The", @year 1993, reviews with suntimes/nyt, reviewer "Roger Ebert", rating)]
XPath Query: /child::imdb/descendant::title/following-sibling::review
62Context Nodes and Relative Paths
- XPath has a notion of a context node
  - analogous to the current directory in a file system
- . represents this context node
- .. represents the parent node, so we can express relative paths
  - E.g. ./*/*/../..
- By default, the document root is the context node at the start of an XPath evaluation
63Predicates
- A predicate allows us to filter the node set based on selection-like conditions over sub-XPaths
- A predicate tests the existence of a path.
- A predicate can be a boolean expression (and, or) and can include XPath functions
- An example XPath query
  - //show[./review//rating = "two thumbs up"]/title
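ElementTree's XPath subset only tests immediate children inside predicates, so this sketch writes the existential test of //show[./review//rating = "two thumbs up"]/title out in Python over the reconstructed sample document (element names assumed). It also shows the one predicate form ElementTree does support directly, [child='text'].

```python
import xml.etree.ElementTree as ET

# Reconstructed sample document (element names are assumptions).
IMDB = """<imdb>
<show year="1993"><title>Fugitive, The</title>
<review><suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>! A fun action movie, Harrison Ford at his best.</suntimes></review>
<review><nyt>The standard hollywood summer movie strikes back.</nyt></review>
<box_office>183,752,965</box_office></show>
<show year="1994"><title>X Files, The</title><seasons>4</seasons></show>
</imdb>"""

root = ET.fromstring(IMDB)


def titles_with_rating(root, wanted):
    """//show[./review//rating = wanted]/title, with the predicate written out."""
    result = []
    for show in root.iter("show"):                      # //show
        ratings = show.findall("./review//rating")      # ./review//rating
        if any(r.text == wanted for r in ratings):      # true if SOME rating matches
            result.append(show.findtext("title"))       # /title
    return result


print(titles_with_rating(root, "two thumbs up"))

# ElementTree does support [child='text'] predicates on immediate children.
x_files = root.findall("./show[title='X Files, The']")
print([s.get("year") for s in x_files])
```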
64An XPath Query with Predicates
[Figure: the query below evaluated over the sample data tree; the show whose review contains the rating "two thumbs up" contributes its title, "Fugitive, The"]
XPath Query: //show[./review//rating = "two thumbs up"]/title
65XQuery
- A concrete syntax
  - http://www.w3.org/TR/xquery
- A formal semantics and algebra
  - http://www.w3.org/TR/query-semantics/
- Some use cases
  - http://www.w3.org/TR/xquery-use-cases/
66XQuery
- A strongly-typed XML manipulation language
- Designed mostly by DB and functional-language people
- Attempts to satisfy the needs of both data management and document management
  - The database-style core is mostly complete
  - The document keyword-querying (full-text) features are in progress
  - The document side shows in the order-preserving default model
  - http://www.w3.org/TR/xquery-full-text-requirements/
  - http://www.w3.org/TR/xmlquery-full-text-use-cases/
67XQuery's Basic Form
- Has a form analogous to SQL's SELECT..FROM..WHERE..GROUP BY..ORDER BY
- The model:
  - binds nodes (or node sets) to variables
  - operates over each legal combination of bindings
  - produces a set of nodes
- FLWOR statement
  - for: iterators that bind variables
  - let: collections
  - where: conditions
  - order by: order conditions
  - return: output constructor
68Iterations in XQuery: For
- A series of (possibly nested) FOR statements assigning the results of XPaths to variables
  - The FOR clause iterates over the items in the binding sequence, binding the variable to each item in turn
- for $root in document("http://my.org/my.xml")
- for $v1 in $root//show,
- $v2 in $v1/reviews, ...
- The document()/doc() function specifies an input file as a URI
- Something like a template that pattern-matches and produces binding tuples
- For each of these, we evaluate the WHERE and possibly output according to the RETURN template
69XQuery Example Q1
- for $s in document("imdb.xml")/imdb/show,
- $yr in $s/@year
- where $yr = 1994
- return $s
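The FOR/WHERE/RETURN pipeline of Q1 maps naturally onto nested Python loops. This sketch runs it over the reconstructed sample document (element names assumed) and compares the year numerically, the way an XQuery general comparison treats an untyped attribute against the number 1994.

```python
import xml.etree.ElementTree as ET

# Reconstructed sample document (element names are assumptions).
IMDB = """<imdb>
<show year="1993"><title>Fugitive, The</title>
<review><suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>! A fun action movie, Harrison Ford at his best.</suntimes></review>
<review><nyt>The standard hollywood summer movie strikes back.</nyt></review>
<box_office>183,752,965</box_office></show>
<show year="1994"><title>X Files, The</title><seasons>4</seasons></show>
</imdb>"""

root = ET.fromstring(IMDB)


def q1(root):
    """for $s in /imdb/show, $yr in $s/@year where $yr = 1994 return $s"""
    for s in root.findall("./show"):                       # for $s in /imdb/show
        # A show without a year has no $yr binding, so it yields no tuple at all.
        yr_bindings = [s.get("year")] if "year" in s.attrib else []
        for yr in yr_bindings:                             # $yr in $s/@year
            if float(yr) == 1994:                          # where $yr = 1994
                yield s                                    # return $s


print([s.findtext("title") for s in q1(root)])
```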
70Output of Q1
[Figure: the sample data next to the output of Q1: the show X Files, The (year 1994, 4 seasons)]
71Output in XQuery: Return
- Building nested XML trees is perhaps the most common operation
- In XQuery, it's easy: put a subquery in the return clause where you want things to repeat!
- Curly braces {} delimit enclosed expressions from literal text
- Q2
- for $s in document("imdb.xml")/imdb/show
- where $s/@year = 1993
- return
- { $s/title }
- { for $r in $s/reviews
- where $r//reviewer/text() = "Roger Ebert"
- return $r }
72Output of Q2
[Figure: the sample data next to the output of Q2]
73Output of Q2 for Revised XML Data
Will this show appear in the output?
[Figure: revised data in which the show Fugitive, The keeps only the review "The standard hollywood summer movie strikes back." and its box office, 183,752,965]
- Q2
- for $s in document("imdb.xml")/shows
- where $s/@year = 1993
- return
- { $s/title }
- { for $r in $s/reviews
- where $r//reviewer/text() = "Roger Ebert"
- return $r }
How to revise Q2 such that this show will not be
returned?
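A Python sketch of Q2's nested construction over the reconstructed sample data and over the revised data from this slide. It uses review (the element name in the reconstructed sample) where the slide's query says reviews, and the show wrapper element is an assumption since the original constructor was lost. With require_match=False a show whose only content is its title is still produced, which answers the first question; require_match=True models the revision.

```python
import copy
import xml.etree.ElementTree as ET

# Reconstructed sample (element names assumed) and the revised data from this slide.
IMDB = """<imdb>
<show year="1993"><title>Fugitive, The</title>
<review><suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>! A fun action movie, Harrison Ford at his best.</suntimes></review>
<review><nyt>The standard hollywood summer movie strikes back.</nyt></review>
<box_office>183,752,965</box_office></show>
<show year="1994"><title>X Files, The</title><seasons>4</seasons></show>
</imdb>"""
REVISED = """<imdb>
<show year="1993"><title>Fugitive, The</title>
<review><nyt>The standard hollywood summer movie strikes back.</nyt></review>
<box_office>183,752,965</box_office></show>
</imdb>"""


def q2(root, require_match=False):
    """Q2 with nested construction; require_match models the revised query."""
    results = ET.Element("results")
    for s in root.findall("./show"):                        # for $s in /imdb/show
        yr = s.get("year")
        if yr is None or float(yr) != 1993:                 # where $s/@year = 1993
            continue
        matching = [r for r in s.findall("./review")        # inner for $r ... where
                    if any(x.text == "Roger Ebert" for x in r.iter("reviewer"))]
        if require_match and not matching:
            continue                                        # skip shows with no match
        out = ET.SubElement(results, "show")                # assumed wrapper element
        out.append(copy.deepcopy(s.find("title")))          # { $s/title }
        out.extend([copy.deepcopy(r) for r in matching])    # { ... return $r }
    return results


for data in (IMDB, REVISED):
    print(ET.tostring(q2(ET.fromstring(data)), encoding="unicode"))
print(len(q2(ET.fromstring(REVISED), require_match=True)))
```

In XQuery, one way to express the revision is an extra existential condition in the outer where clause, e.g. where $s/@year = 1993 and $s/reviews//reviewer = "Roger Ebert", using the slide's path names.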
74Collections and Aggregation in XQuery: Let
- In XQuery, many operations return collections; the LET clause binds each variable to the result of its associated expression, without iteration
- Aggregation simply applies a function over a collection, where the function returns a value
- Q3
- let $s := document("imdb.xml")/imdb/show
- return
- fn:count(fn:distinct-values($s/reviewers))
75Distinct-ness
- In XQuery, DISTINCT-ness is expressed as a function over a collection
  - But since we have nodes, we can remove duplicates according to value or to node identity
  - fn:distinct-values(collection): removes duplicate atomic values
    - http://www.xqueryfunctions.com/xq/fn_distinct-values.html
  - fn:distinct-nodes(collection): removes duplicate nodes
  - E.g.
    - <reviewer>Roger Ebert</reviewer>
    - <reviewer>Roger Ebert</reviewer>
    - Same value, different nodes
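The difference between value-based and identity-based duplicate removal, sketched in Python with xml.etree.ElementTree: the two reviewer elements are separate objects (different nodes) with the same text (same value). dict.fromkeys keeps the first occurrence of each value, whereas XQuery leaves the order of distinct-values implementation-dependent.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<reviews><reviewer>Roger Ebert</reviewer><reviewer>Roger Ebert</reviewer></reviews>"
)
nodes = doc.findall("reviewer")

# Value-based: keep one copy of each distinct text value (like fn:distinct-values).
distinct_values = list(dict.fromkeys(n.text for n in nodes))

# Identity-based: each parsed element is its own object, so both survive.
distinct_nodes = list({id(n): n for n in nodes}.values())

print(distinct_values, len(distinct_nodes))
```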
76Iterations vs Collections
- Different uses of the FOR and LET clauses.
- Example 1
  - let $s := ( , , ) return $s
  - Output: only one item is generated, containing the binding of $s
- Example 2
  - for $s in ( , , ) return $s
  - Output: one tuple is generated for each of these bindings, and the return clause is invoked for each tuple
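A Python sketch of the two examples, assuming the three lost items are empty elements named one, two, and three and that each result is wrapped in a hypothetical out element: let binds the whole sequence once, while for binds each item in turn.

```python
import xml.etree.ElementTree as ET


def make_items():
    """The assumed input sequence (<one/>, <two/>, <three/>)."""
    return [ET.Element("one"), ET.Element("two"), ET.Element("three")]


def let_clause(seq):
    """let $s := seq return <out>{$s}</out>: a single binding, so a single result."""
    out = ET.Element("out")
    out.extend(seq)
    return [out]


def for_clause(seq):
    """for $s in seq return <out>{$s}</out>: one binding, and one result, per item."""
    results = []
    for item in seq:
        out = ET.Element("out")
        out.append(item)
        results.append(out)
    return results


def serialize(elems):
    return "".join(ET.tostring(e, encoding="unicode") for e in elems)


print(serialize(let_clause(make_items())))
print(serialize(for_clause(make_items())))
```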
77Collections and Aggregation in XQuery: Let (cont)
- Q4: Suppose each reviewer has a first name and a last name; list all distinct reviewer names.
- let $r := doc("imdb.xml")//reviewer
- for $last in distinct-values($r/last),
- $first in distinct-values($r[last = $last]/first)
- order by $last, $first
- return
- { $last }
- { $first }
78Collections and Aggregation in XQuery: Let (cont)
- We can compose aggregations and create new collections from old ones
- Q5
- let $avgItemsSold := fn:avg(for $order in document("my.xml")/orders/order let $totalSold := fn:sum($order/item/quantity) return $totalSold)
- return $avgItemsSold
- What does this query do?
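To answer the question: Q5 computes, for each order, the total quantity over its items, and then averages those totals across orders. Here is the same computation as a Python sketch over a hypothetical my.xml with made-up quantities.

```python
import xml.etree.ElementTree as ET

# Hypothetical orders document (my.xml in the slide); quantities are made up.
ORDERS = """<orders>
<order><item><quantity>2</quantity></item><item><quantity>3</quantity></item></order>
<order><item><quantity>7</quantity></item></order>
</orders>"""
root = ET.fromstring(ORDERS)

# Inner FLWOR: for each order, let $totalSold := fn:sum($order/item/quantity).
totals = [sum(float(q.text) for q in order.findall("./item/quantity"))
          for order in root.findall("./order")]

# Outer let: fn:avg over the sequence of per-order totals (empty input gives no value).
avg_items_sold = sum(totals) / len(totals) if totals else None
print(totals, avg_items_sold)
```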
79Joins in XQuery
- We can use variable bindings to represent joins
- Q6
- for $r in distinct-values(doc("imdb.xml")//reviewer)
- return
- { $r }
- { for $s in doc("imdb.xml")/show
- where some $sr in $s//reviewer satisfies $sr = $r
- return $s/title }
According to the distinct-values function, each $r is bound to a string value, not an element node.
In a direct element constructor, curly braces delimit enclosed expressions, distinguishing them from literal text. Enclosed expressions are evaluated and replaced by their value.
=, !=, <, <=, >, >= compare the values (of sequences); atomization is applied to each operand. (eq, ne, lt, le, gt, ge compare single values.)
is, <<, >> compare node identity or document order: http://www.w3.org/TR/xquery/#id-comparisons
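Q6 is a value join between distinct reviewer names and the shows they reviewed. This Python sketch mirrors it over a hypothetical document; the wrapper elements (result, and reviewer with a name attribute) are assumptions, since the original constructors were lost.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical data: two shows sharing one reviewer.
DOC = """<imdb>
<show><title>Fugitive, The</title>
  <review><reviewer>Roger Ebert</reviewer></review>
  <review><reviewer>Reviewer B</reviewer></review></show>
<show><title>X Files, The</title>
  <review><reviewer>Reviewer B</reviewer></review></show>
</imdb>"""

root = ET.fromstring(DOC)
shows = root.findall("./show")

# distinct-values(//reviewer): strings, not nodes (first-seen order kept here).
names = list(dict.fromkeys(r.text for r in root.iter("reviewer")))

result = ET.Element("result")
for name in names:                                             # for $r in distinct-values(...)
    out = ET.SubElement(result, "reviewer", name=name)         # assumed wrapper element
    for s in shows:                                            # for $s in .../show
        if any(sr.text == name for sr in s.iter("reviewer")):  # some $sr ... satisfies $sr = $r
            out.append(copy.deepcopy(s.find("title")))         # return $s/title

print(ET.tostring(result, encoding="unicode"))
```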
80Sorting in XQuery
- SQL actually allows you to sort its output, with a special ORDER BY clause
- XQuery borrows this idea
- In XQuery, what we order is the sequence of result tuples output by the return clause
- Q7
- for $x in document("imdb.xml")/imdb/show
- order by $x/year
- return $x
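Sorting in a Python sketch, over a hypothetical document where year is an attribute. The sort key casts the year to an integer; in XQuery, untyped values in an order by clause are compared as strings unless cast explicitly, e.g. order by xs:integer($x/@year).

```python
import xml.etree.ElementTree as ET

# Hypothetical input whose shows are not already in year order.
DOC = """<imdb>
<show year="1994"><title>X Files, The</title></show>
<show year="1993"><title>Fugitive, The</title></show>
</imdb>"""
root = ET.fromstring(DOC)

# for $x in /imdb/show order by xs:integer($x/@year) return $x
# sorted() is stable, so shows with equal years keep their document order.
ordered = sorted(root.findall("./show"), key=lambda x: int(x.get("year")))
print([x.findtext("title") for x in ordered])
```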
81What If Order Doesn't Matter?
- By default
  - SQL is unordered
  - XQuery is ordered, since XML is ordered!
- What if we want to use XML to represent unordered data, e.g. relations?
- XQuery has a way of telling the query engine to avoid preserving order
  - unordered { for $x in (mypath) ... }
82XQuery Beyond FLWOR
- XQuery has many built-in functions and predicates, such as
  - count(), sum(), min(), max(), position(), first(), last(), which work over sequences
  - distinct-values(), distinct-nodes(), which remove duplicates
  - Set operations: union, intersect
  - If-then-else expressions and function definitions (define function name(params) returns result; declare function in the final XQuery 1.0 syntax) are also included
83Querying Defining Metadata
- XML is a model that mixes data and metadata; XQuery can query both seamlessly
- Obtain a node's name using the function node-name()
- for $x in document("imdb.xml")/imdb/*
- return node-name($x)
- Construct elements and attributes
- for $x in document("imdb.xml")/imdb/*,
- $year in $x/@year,
- $title in $x/title/text()
- return element { node-name($x) } { attribute year { $year }, $title }
- Can't do this in SQL!
84XSL(T): The Bridge Back to HTML
- XSL (Extensible Stylesheet Language)
  - A transformation language
  - XSLT is designed primarily for the transformations used in XSL. Besides XSLT, XSL includes an XML vocabulary for specifying formatting.
- E.g. converting XML to HTML, which is how a lot of people do their formatting
- Products like Apache Cocoon generally translate XML to HTML on the server side
- http://www.w3.org/TR/xslt
85
- We have discussed the schema (partially), APIs, and query languages for XML.
- Comparing XPath with XQuery (FLWOR)
  - XPath expresses a limited form of FWR (for-where-return)
- Most of the components in an RDBMS have their counterparts in XML standards.
  - What else?
86Keys
- An essential part of database design
  - provides a way to identify a tuple in a relation
  - important for updates
- More philosophically,
  - a key of a tuple is the invariant connection between the tuple and the real-world entity it represents, e.g. the SSN in a Persons relation:
    - SSN: 1234, Name: H. Simpson
87DOM Node Addresses
[Figure: a db tree with two composer elements (G.F Handel and J.S Bach) and their works ("Art Thou Troubled?", "Ich habe genug"; work numbers 19, 82, 552; born 1685; period baroque), where every node is labeled with its positional DOM address (1, 2, 3, ...)]
Need a value-based mechanism for identifying
nodes
88Value-based Constraints for XML
- ID/IDREF in DTD
- KEY/KEYREF in XML Schema
- Functional dependencies for XML
89Specifying ID and IDREF attributes in DTD
Used to uniquely identify elements in an XML document
- <!ATTLIST person
-   id ID #REQUIRED
-   mother IDREF #IMPLIED
-   father IDREF #IMPLIED
-   children IDREFS #IMPLIED>
90Navigate IDREF edges in XML Data
- Using ID/IDREF, XML can be used to represent a directed graph data model.
- To navigate this graph model, we can use
  - the function id()
    - selects elements by their unique ID
  - Examples
    - id("foo") selects the element with unique ID foo
    - id(//person/mother)
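A rough Python analogue of id() over a hypothetical family document. ElementTree does not read DTD attribute types, so this sketch simply assumes the attribute named id is the ID attribute and that mother, father, and children hold IDREF(S) values; it builds an index from ID values to elements and splits IDREFS values on whitespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; a DTD would declare id as ID, mother/father as IDREF,
# and children as IDREFS. ElementTree ignores such declarations, so the roles
# of these attributes are simply assumed here.
FAMILY = """<people>
<person id="p1" children="p3"><name>Ann</name></person>
<person id="p2" children="p3"><name>Bob</name></person>
<person id="p3" mother="p1" father="p2"><name>Cal</name></person>
</people>"""

root = ET.fromstring(FAMILY)

# Index every element by its ID value; IDs must be unique in the whole document.
index = {}
for elem in root.iter():
    key = elem.get("id")
    if key is not None:
        if key in index:
            raise ValueError(f"duplicate ID {key!r}")
        index[key] = elem


def id_lookup(refs):
    """Rough analogue of id(): whitespace-separated ID tokens -> elements (unknown IDs skipped)."""
    return [index[token] for token in refs.split() if token in index]


# Follow the mother IDREF of every person that has one.
mothers = [m for p in root.iter("person") if p.get("mother")
           for m in id_lookup(p.get("mother"))]
print([m.findtext("name") for m in mothers])

# IDREFS: one attribute value can point to several elements.
print([c.findtext("name") for c in id_lookup(index["p1"].get("children"))])
```

Going the other way (from Ann to the people who name her as mother) needs a scan over all persons, which mirrors the point that id() offers no reverse traversal.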
91
- Is the function id() sufficient to express navigation in a graph?
- Restrictions of using id() to navigate through IDREFs
  - We need to invoke id() explicitly in the query
  - IDREF edges in the graph are navigated differently from other edges
  - No reverse traversal of id() can be performed
92Why Is ID/IDREF Not Sufficient for Value-based Constraints?
- It is not categorized
  - Person names and book titles share the same ID type
- Unary keys only
  - We cannot specify that a person is keyed by first name and last name
- It is impossible to express two alternative keys for a node
- It is always global
  - i.e. ID attributes are unique within the entire document
93Example 1: Absolute/Global Keys
[Figure: the composer db tree (G.F Handel and J.S Bach, born 1685, period baroque; works "Art Thou Troubled?" and "Ich habe genug" with work numbers 19, 82, 552), used to illustrate keys that hold across the whole document]
94Example 2: Relative/Local Keys
[Figure: a db tree of books (titles Biology and Chemistry), each with chapters and sections identified by num; the same numbers (e.g. 1) recur under different parents, so they identify a chapter or section only relative to its book or chapter]
95Expressing Keys in XML Schema (I)
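[XML Schema fragment: <xs:element name="composer" type="r:composerType"/> together with a key named NumKey whose selector is xpath="//work", shown next to XML data for the composer J.S Bach (born 1685) with the work "Ich habe genug" and work numbers 82 and 552]
Note:
1. All XPaths start at the element currently being defined.
2. Each field must identify a single node for each node selected by the selector.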
96An Example Key in XML Schema (II)
[The same composer data (J.S Bach, "Ich habe genug", 82, 552, 1685) with an XML Schema key whose selector is xpath="//composer"]
97Two Flavors of Keys in XML Schema
XML Schema
[Two declarations of the same shape, one xs:key and one xs:unique, each with xpath="p", xpath="p2", . . . , xpath="pk"]
98Keys in XML Schema
- unique: guarantees uniqueness
- key: guarantees uniqueness and existence
- All XPath expressions are restricted
  - /a/b | /a/c: OK for a selector
  - //a/b//c: OK for a field
  - This helps the implementation
- Note: better and more expressive than the DTD ID mechanism
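A Python sketch of the difference between unique and key over a hypothetical composer document. Field paths are restricted to a child tag or an @attribute here, which is a simplification of the XML Schema rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the last <work> has no <num>.
DB = """<db>
<composer><work><num>82</num></work><work><num>552</num></work></composer>
<composer><work><num>19</num></work><work/></composer>
</db>"""


def field_value(node, field):
    """Evaluate a simple field path: '@name' for an attribute, else a child tag."""
    if field.startswith("@"):
        return node.get(field[1:])
    return node.findtext(field)


def violations(root, selector, field, kind):
    """Check an xs:unique-like ('unique') or xs:key-like ('key') constraint."""
    problems, seen = [], set()
    for node in root.iterfind(selector):
        value = field_value(node, field)
        if value is None:
            if kind == "key":               # key: every selected node needs the field
                problems.append("missing field")
            continue                        # unique: nodes without the field are skipped
        if value in seen:
            problems.append(f"duplicate {value!r}")
        seen.add(value)
    return problems


root = ET.fromstring(DB)
print(violations(root, ".//work", "num", "unique"))
print(violations(root, ".//work", "num", "key"))
```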
99How to Use XML Keys to Specify an ID?
- Why do we use unique instead of key?
  - Often XML does not require every node to have an ID, so the existence of an ID is not guaranteed.
100How to Specify a Relative Key?
[Figure: the books tree from slide 94 (Biology and Chemistry, with chapters and sections identified by num)]
101How to Specify a Relative Key (cont)
XML Schema
- Key bookKey (declared at the db level): selector book, field title
- Inside the declaration of the element named book, key chapterKey: selector chapter, field num
- Define keys in the right context!
  - A chapter can be uniquely identified by num within a book node.
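A Python sketch of a relative key over a hypothetical books document: titles are checked once across the document (bookKey), while chapter numbers are checked separately inside each book (chapterKey). Treating chapter num as a global key instead would reject this perfectly reasonable document.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: chapter numbers restart in every book.
BOOKS = """<db>
<book><title>Biology</title><chapter><num>1</num></chapter><chapter><num>2</num></chapter></book>
<book><title>Chemistry</title><chapter><num>1</num></chapter><chapter><num>2</num></chapter></book>
</db>"""


def duplicates(values):
    """Return values that occur more than once, in first-seen order."""
    seen, dups = set(), []
    for v in values:
        if v in seen and v not in dups:
            dups.append(v)
        seen.add(v)
    return dups


root = ET.fromstring(BOOKS)
books = root.findall("./book")

# bookKey (declared at db level): titles must be unique across the document.
title_dups = duplicates(b.findtext("title") for b in books)

# chapterKey (declared inside book): num must be unique within each book only.
per_book = {b.findtext("title"): duplicates(c.findtext("num") for c in b.findall("./chapter"))
            for b in books}

# Wrong context: checking num across all chapters flags legitimate data.
global_dups = duplicates(c.findtext("num") for c in root.iter("chapter"))

print(title_dups, per_book, global_dups)
```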
102Keyrefs in XML Schema
[Figure: a db tree with XML data, next to an XML Schema that declares key bookKey (selector book, field title) and a keyref bookRef that refers to bookKey]
For keyref: sometimes it is not clear which paths should be the selector and which the field
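A keyref check sketched in Python: collect the bookKey values (book titles), then report any reference whose value is not among them. The cite element and its book attribute are hypothetical names for the referring nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; <cite book="..."/> plays the role of the referring nodes.
DB = """<db>
<book><title>Biology</title></book>
<book><title>Chemistry</title></book>
<cite book="Biology"/>
<cite book="Physics"/>
</db>"""

root = ET.fromstring(DB)

# Key bookKey: selector book, field title.
book_keys = {b.findtext("title") for b in root.findall("./book")}

# Keyref (e.g. bookRef) refer=bookKey: selector cite, field @book.
dangling = [c.get("book") for c in root.findall("./cite") if c.get("book") not in book_keys]
print(dangling)
```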
103Reference to Global/Relative Keys
- We can use a keyref to refer to a global key.
  - E.g. we can refer to a book by its title anywhere in the document.
- To refer to a relative key, the keyref must be in the same context (element) as the key,
  - since we have no mechanism for specifying the context of a key in a keyref.
  - E.g. we cannot refer to a chapter of another book!
104Another Representation for Relative Keys
- Compute an absolute key from several relative keys in a transitive way
- Such an (absolute) key can be referenced anywhere in the document
[XML Schema fragment: within the element named book, a key chapterKey with selector chapter and field num]
105Combining Keys and Schemas
- "On XML Integrity Constraints in the Presence of DTDs", Fan and Libkin, PODS 2001
- Keys combined with DTDs sometimes have unexpected effects
106Combining Keys and Schemas
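[Figure: a sample document containing teacher and subject elements (subjects DB, Graphics, AI, OS) whose name and expert attributes take values such as Jim]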
107Combining Keys/Keyrefs and Schemas
DTD (fragment): ... subject) ... #REQUIRED ... <!ATTLIST subject expert CDATA #REQUIRED>
- Keys
  - Any teacher node is keyed by @name
  - Any subject node is keyed by @expert
- Keyrefs
  - All the values of //subject/@expert appear in //teacher/@name
- But it is impossible for an XML document to satisfy all of them!
  - In general, it is undecidable to check whether such an XML document exists
108Functional dependencies for XML
- for $x in //student: $x/@sno -> $x/name
- Why would this be useful?
- Not in XML Schema yet!
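An XML functional dependency can be checked much like a relational one. This Python sketch tests $x/@sno -> $x/name over hypothetical enrollment data in which the same student is stored under several courses.

```python
import xml.etree.ElementTree as ET

# Hypothetical enrollment data: students are repeated under each course.
DB = """<db>
<course><student sno="s1"><name>Ann</name></student><student sno="s2"><name>Bob</name></student></course>
<course><student sno="s1"><name>Ann</name></student><student sno="s2"><name>Rob</name></student></course>
</db>"""


def fd_violations(root):
    """Check the FD: for $x in //student, $x/@sno -> $x/name."""
    first_name_for = {}
    bad = []
    for x in root.iter("student"):
        sno, name = x.get("sno"), x.findtext("name")
        if sno in first_name_for and first_name_for[sno] != name and sno not in bad:
            bad.append(sno)
        first_name_for.setdefault(sno, name)
    return bad


print(fd_violations(ET.fromstring(DB)))
```

The repeated student data is exactly the kind of redundancy an FD exposes, and it is what normal forms for XML aim to remove.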
109References for XML Constraints
- DTD: http://www.w3.org/TR/REC-xml/
- XML Schema: http://www.w3.org/TR/xmlschema-0/
- Keys for XML, by Buneman, Davidson, Fan, Hara, Tan, in WWW10, 2001.
- A Normal Form for XML Documents, by Marcelo Arenas and Leonid Libkin, in PODS 2002.
- Data on the Web, by Abiteboul, Buneman, Suciu, section 7.7
110Useful links
- http://www.w3.org/XML/
- http://www.w3.org/TR/xpath
- http://www.w3.org/TR/xquery-use-cases/
- http://www.w3.org/TR/xquery/