Title: COMP3311: DTD, Schema, XQuery
1COMP3311 DTD, Schema, XQuery
Modified from Prof. Dan Sucius notes.
2XMLDocument Type Definitions
- part of the original XML specification
- an XML document may have a DTD
- terminology for XML
- well-formed if tags are correctly closed
- valid if it has a DTD and conforms to it
- validation is useful in data exchange
3Very Simple DTD
lt!DOCTYPE company lt!ELEMENT company
((personproduct))gt lt!ELEMENT person (ssn,
name, office, phone?)gt lt!ELEMENT ssn
(PCDATA)gt lt!ELEMENT name (PCDATA)gt
lt!ELEMENT office (PCDATA)gt lt!ELEMENT phone
(PCDATA)gt lt!ELEMENT product (pid, name,
description?)gt lt!ELEMENT pid (PCDATA)gt
lt!ELEMENT description (PCDATA)gt gt
4Very Simple DTD
Example of valid XML document
ltcompanygt ltpersongt ltssngt 123456789 lt/ssngt
ltnamegt John lt/namegt
ltofficegt B432 lt/officegt
ltphonegt 1234 lt/phonegt lt/persongt
ltpersongt ltssngt 987654321 lt/ssngt
ltnamegt Jim lt/namegt
ltofficegt B123 lt/officegt lt/persongt
ltproductgt ... lt/productgt ... lt/companygt
5Content Model
- Element content what we can put in an element
(aka content model) - Content model
- Complex a regular expression over other
elements - Text-only PCDATA
- Empty EMPTY
- Any ANY
- Mixed content (PCDATA A B C)
- (i.e. very restrictied)
6Attributes in DTDs
lt!ELEMENT person (ssn, name, office,
phone?)gt lt!ATTLIS person age CDATA
REQUIREDgt
ltperson age25gt ltnamegt ....lt/namegt
... lt/persongt
7Attributes in DTDs
lt!ELEMENT person (ssn, name, office,
phone?)gt lt!ATTLIS person age CDATA
REQUIRED id
ID REQUIRED
manager IDREF REQUIRED
manages IDREFS
REQUIRED gt
ltperson age25 idp29432
managerp48293 managesp34982
p423234gt ltnamegt ....lt/namegt
... lt/persongt
8Attributes in DTDs
- Types
- CDATA string
- ID key
- IDREF foreign key
- IDREFS foreign keys
separated by space - (Monday Wednesday Friday) enumeration
- NMTOKEN, NMTOKENS, ENTITY
- you
dont want to know this
9Attributes in DTDs
- Kind
- REQUIRED
- IMPLIED optional
- value default value
- value FIXED the only value allowed
10Using DTDs
- Must include in the XML document
- Either include the entire DTD
- lt!DOCTYPE rootElement ....... gt
- Or include a reference to it
- lt!DOCTYPE rootElement SYSTEM http//www.mydtd.org
gt - Or mix the two... (e.g. to override the external
definition)
11DTDs as Grammars
lt!DOCTYPE paper lt!ELEMENT paper
(section)gt lt!ELEMENT section ((title,section)
text)gt lt!ELEMENT title (PCDATA)gt
lt!ELEMENT text (PCDATA)gt gt
ltpapergt ltsectiongt lttextgt lt/textgt lt/sectiongt
ltsectiongt lttitlegt lt/titlegt ltsectiongt
lt/sectiongt
ltsectiongt lt/sectiongt
lt/sectiongt lt/papergt
12DTDs as Grammars
- A DTD a grammar
- A valid XML document a parse tree for that
grammar
13DTDs as Schemas
- Not so well suited
- impose unwanted constraints on order
lt!ELEMENT person (name,phone)gt - references cannot be constrained
- can be too vague
- lt!ELEMENT person ((namephoneemail))gt
14XML Schemas
- http//www.w3.org/TR/xmlschema-1/10/2000
- generalizes DTDs
- uses XML syntax
- two documents structure and datatypes
- http//www.w3.org/TR/xmlschema-1
- http//www.w3.org/TR/xmlschema-2
- XML-Schema is very complex
- often criticized
- some alternative proposals
15XML Schemas
- ltxsdelement namepaper typepapertype/gt
- ltxsdcomplexType namepapertypegt
- ltxsdsequencegt
- ltxsdelement nametitle
typexsdstring/gt - ltxsdelement nameauthor
minOccurs0/gt - ltxsdelement nameyear/gt
- ltxsd choicegt lt xsdelement
namejournal/gt - ltxsdelement
nameconference/gt - lt/xsdchoicegt
- lt/xsdsequencegt
- lt/xsdelementgt
DTD lt!ELEMENT paper (title,author?,year,
(journalconference))gt
16Elements v.s. Types in XML Schema
ltxsdelement namepersongt ltxsdcomplexTypegt
ltxsdsequencegt ltxsdelement namename
typexsdstring/gt
ltxsdelement nameaddress
typexsdstring/gt lt/xsdsequencegt
lt/xsdcomplexTypegtlt/xsdelementgt
ltxsdelement nameperson
typetttgtltxsdcomplexType nametttgt
ltxsdsequencegt ltxsdelement namename
typexsdstring/gt
ltxsdelement nameaddress
typexsdstring/gt lt/xsdsequencegtlt/xsdco
mplexTypegt
DTD lt!ELEMENT person (name,address)gt
17- Types
- Simple types (integers, strings, ...)
- Complex types (regular expressions, like in DTDs)
- Element-type-element alternation
- Root element has a complex type
- That type is a regular expression of elements
- Those elements have their complex types...
- ...
- On the leaves we have simple types
18Local and Global Types in XML Schema
- Local type
- ltxsdelement namepersongt
define locally the persons type
lt/xsdelementgt - Global type ltxsdelement nameperson
typettt/gt ltxsdcomplexType nametttgt
define here the type ttt
lt/xsdcomplexTypegt
Global types can be reused in other elements
19Local v.s. Global Elements inXML Schema
- Local element
- ltxsdcomplexType nametttgt
ltxsdsequencegt ltxsdelement
nameaddress type.../gt...
lt/xsdsequencegt lt/xsdcomplexTypegt - Global element ltxsdelement nameaddress
type.../gt ltxsdcomplexType nametttgt
ltxsdsequencegt ltxsdelement
refaddress/gt ... lt/xsdsequencegt
lt/xsdcomplexTypegt
Global elements like in DTDs
20Regular Expressions in XML Schema
- Recall the element-type-element alternation
- ltxsdcomplexType name....gt
regular expression on
elements lt/xsdcomplexTypegt - Regular expressions
- ltxsdsequencegt A B C lt/...gt
A B C - ltxsdchoicegt A B C lt/...gt
A B C - ltxsdgroupgt A B C lt/...gt
(A B C) - ltxsd... minOccurs0 maxOccursunboundedgt
..lt/...gt (...) - ltxsd... minOccurs0 maxOccurs1gt ..lt/...gt
(...)?
21Local Names in XML-Schema
ltxsdelement namepersongt ltxsdcomplexTypegt
. . . . . ltxsdelement
namenamegt ltxsdcomplexTypegt
ltxsdsequencegt
ltxsdelement namefirstname
typexsdstring/gt
ltxsdelement namelastname typexsdstring/gt
lt/xsdsequencegt
lt/xsdelementgt . . . .
lt/xsdcomplexTypegtlt/xsdelementgt ltxsdelement
nameproductgt ltxsdcomplexTypegt . .
. . . ltxsdelement namename
typexsdstring/gt lt/xsdcomplexTypegtlt/xsdel
ementgt
name has different meanings in person and in
product
22Subtle Use of Local Names
ltxsdcomplexType nameoneBgt ltxsdchoicegt
ltxsdelement nameB typexsdstring/gt
ltxsdsequencegt ltxsdelement nameA
typeonlyAs/gt ltxsdelement nameA
typeoneB/gt lt/xsdsequencegt
ltxsdsequencegt ltxsdelement nameA
typeoneB/gt ltxsdelement nameA
typeonlyAs/gt lt/xsdsequencegt
lt/xsdchoicegtlt/xsdcomplexTypegt
ltxsdelement nameA typeoneB/gt ltxsdcomplex
Type nameonlyAsgt ltxsdchoicegt
ltxsdsequencegt ltxsdelement nameA
typeonlyAs/gt ltxsdelement nameA
typeonlyAs/gt lt/xsdsequencegt
ltxsdelement nameA typexsdstring/gt
lt/xsdchoicegtlt/xsdcomplexTypegt
Arbitrary deep binary tree with A elements, and a
single B element
23Summary of XML Schema
- Formal Expressive Power
- Can express precisely the regular tree languages
- Lots of other stuff
- Some form of inheritance
- A null value
- Large collection of data types
24XQuery
- Based on Quilt (which is based on XML-QL)
- http//www.w3.org/TR/xquery/
- XML Query data model
- Ordered !
25FLWR (Flower) Expressions
- FOR ... LET... FOR... LET...
- WHERE...
- RETURN...
26XQuery
- Find all book titles published after 1995
FOR x IN document("bib.xml")/bib/book WHERE
x/year gt 1995 RETURN x/title
Result lttitlegt abc lt/titlegt lttitlegt def
lt/titlegt lttitlegt ghi lt/titlegt
27XQuery
- For each author of a book by Morgan Kaufmann,
list all books she published
FOR a IN distinct(document("bib.xml")
/bib/bookpublisherMorgan
Kaufmann/author) RETURN ltresultgt
a, FOR t IN
/bib/bookauthora/title
RETURN t lt/resultgt
distinct a function that eliminates duplicates
28XQuery
- Result
- ltresultgt
- ltauthorgtJoneslt/authorgt
- lttitlegt abc lt/titlegt
- lttitlegt def lt/titlegt
- lt/resultgt
- ltresultgt
- ltauthorgt Smith lt/authorgt
- lttitlegt ghi lt/titlegt
- lt/resultgt
29XQuery
- FOR x in expr -- binds x to each value in the
list expr - LET x expr -- binds x to the entire list
expr - Useful for common subexpressions and for
aggregations
30XQuery
ltbig_publishersgt FOR p IN
distinct(document("bib.xml")//publisher)
LET b document("bib.xml")/bookpublisher
p WHERE count(b) gt 100 RETURN
p lt/big_publishersgt
count a (aggregate) function that returns the
number of elms
31XQuery
- Find books whose price is larger than average
LET aavg(document("bib.xml")/bib/book/price) FOR
b in document("bib.xml")/bib/book WHERE
b/price gt a RETURN b
32XQuery
- Summary
- FOR-LET-WHERE-RETURN FLWR
FOR/LET Clauses
List of tuples
WHERE Clause
List of tuples
RETURN Clause
Instance of Xquery data model
33FOR v.s. LET
- FOR
- Binds node variables ? iteration
- LET
- Binds collection variables ? one value
34FOR v.s. LET
Returns ltresultgt ltbookgt...lt/bookgtlt/resultgt
ltresultgt ltbookgt...lt/bookgtlt/resultgt ltresultgt
ltbookgt...lt/bookgtlt/resultgt ...
FOR x IN document("bib.xml")/bib/book RETURN
ltresultgt x lt/resultgt
LET x IN document("bib.xml")/bib/book RETURN
ltresultgt x lt/resultgt
Returns ltresultgt ltbookgt...lt/bookgt
ltbookgt...lt/bookgt
ltbookgt...lt/bookgt ... lt/resultgt
35Collections in XQuery
- Ordered and unordered collections
- /bib/book/author an ordered collection
- Distinct(/bib/book/author) an unordered
collection - LET a /bib/book ? a is a collection
- b/author ? a collection (several authors...)
Returns ltresultgt ltauthorgt...lt/authorgt
ltauthorgt...lt/authorgt
ltauthorgt...lt/authorgt
... lt/resultgt
RETURN ltresultgt b/author lt/resultgt
36Collections in XQuery
- What about collections in expressions ?
- b/price ? list of n
prices - b/price 0.7 ? list of n numbers
- b/price b/quantity ? list of n x m numbers ??
37Sorting in XQuery
ltpublisher_listgt FOR p IN distinct(document("
bib.xml")//publisher) RETURN ltpublishergt
ltnamegt p/text() lt/namegt ,
FOR b IN document("bib.xml")//bookpublisher
p RETURN ltbookgt
b/title ,
b/price
lt/bookgt SORTBY(price DESCENDING)
lt/publishergt SORTBY(name)
lt/publisher_listgt
38Sorting in XQuery
- Sorting arugments refer to the name space of the
RETURN clause, not the FOR clause
39If-Then-Else
FOR h IN //holding RETURN ltholdinggt
h/title, IF
h/_at_type "Journal"
THEN h/editor ELSE
h/author lt/holdinggt SORTBY
(title)
40Existential Quantifiers
FOR b IN //book WHERE SOME p IN b//para
SATISFIES contains(p, "sailing") AND
contains(p, "windsurfing") RETURN b/title
41Universal Quantifiers
FOR b IN //book WHERE EVERY p IN b//para
SATISFIES contains(p, "sailing") RETURN
b/title