Title: More XML: semantics, DTDs, XPATH
2 XML Document
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
    <married/>
  </person>
</data>
3 XML Terminology
- Elements
  - enclosed within tags
    - <person> </person>
  - nested within other elements
    - <person> <address> </address> </person>
  - can be empty
    - <married></married> abbreviated as <married/>
  - can have attributes
    - <person id="0005"> </person>
- An XML document has a single ROOT element
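The terms above can be checked against a real parser. What follows is a minimal sketch that assumes Python's standard-library xml.etree.ElementTree; the slides do not prescribe any parser. It loads the slide-2 document with its attribute quotes restored and shows several features: the single root, nested elements, an attribute, an empty element in both spellings, and the rejection of a second top-level element.

```python
import xml.etree.ElementTree as ET

# The slide-2 document with its attribute quotes restored.
DOC = """<data>
  <person id="o555">
    <name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person>
    <name>John</name>
    <address>Thailand</address>
    <phone>23456</phone>
    <married/>
  </person>
</data>"""

root = ET.fromstring(DOC)              # the single root element
print(root.tag)                        # data

mary, john = root.findall("person")    # elements nested inside the root
print(mary.get("id"), john.get("id"))  # o555 None  (only Mary's person has an id attribute)

married = john.find("married")         # <married/> is an empty element
print(len(married), married.text)      # 0 None  (no children, no text)

# <married></married> is the long form of the same empty element.
long_form = ET.fromstring("<married></married>")
print(len(long_form), long_form.text)  # 0 None

# A second top-level element violates the single-root rule.
try:
    ET.fromstring("<person/><person/>")
except ET.ParseError as err:
    print("rejected:", err)
```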
4 Buzzwords
- What is XML?
- W3C data exchange format
- Hierarchical data model
- Self-describing
- Semi-structured
5 XML as a Tree !!
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>
[Figure: the same document drawn as a tree rooted at data]
Minor detail: order matters !!!
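The tree view can be produced mechanically. The sketch below assumes Python's xml.etree.ElementTree and uses a hypothetical helper, outline, that walks the elements recursively. The helper prints one indented line per element and visits children in document order. The last two lines show that swapping two children produces a different child sequence, which is why order matters.

```python
import xml.etree.ElementTree as ET

def outline(elem, depth=0):
    """Return one line per element, indented by depth, with any leaf text after a colon."""
    text = (elem.text or "").strip()
    lines = ["  " * depth + elem.tag + (": " + text if text else "")]
    for child in elem:                 # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines

# The slide-5 document (whitespace trimmed).
DOC = """<data>
  <person id="o555"><name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person><name>John</name><address>Thailand</address><phone>23456</phone></person>
</data>"""

print("\n".join(outline(ET.fromstring(DOC))))

# Order matters: the same children in a different order form a different tree.
a = ET.fromstring("<person><name>John</name><phone>23456</phone></person>")
b = ET.fromstring("<person><phone>23456</phone><name>John</name></person>")
print([c.tag for c in a], [c.tag for c in b])   # ['name', 'phone'] ['phone', 'name']
```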
6 XML is self-describing
- Schema elements become part of the data
  - In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times
  - The relational schema persons(name, phone) is defined separately from the data and is fixed
- Consequence: XML is much more flexible
7 Relational Data as XML
person
name   phone
John   3634
Sue    6343
Dick   6363
<persons>
  <person> <name>John</name>
           <phone> 3634</phone>
  </person>
  <person> <name>Sue</name>
           <phone> 6343</phone>
  </person>
  <person> <name>Dick</name>
           <phone> 6363</phone>
  </person>
</persons>
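The mapping from table to XML is mechanical: each row becomes a person element and each column becomes a child element. This is a minimal sketch assuming Python's xml.etree.ElementTree. The helper table_to_xml and its parameter names are hypothetical and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# The person table from the slide, one (name, phone) tuple per row.
rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]

def table_to_xml(rows, table="persons", row_tag="person", columns=("name", "phone")):
    """Hypothetical helper: one element per row, one child element per column."""
    root = ET.Element(table)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = value
    return root

doc = table_to_xml(rows)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)

# And back again: read the columns of each <person> in a fixed order.
back = [(p.findtext("name"), p.findtext("phone")) for p in doc.findall("person")]
print(back == rows)   # True
```

The column names are written into every record. That repetition is the self-describing property from the previous slide.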
8 XML is semi-structured
- Missing elements
- Could represent in a table with nulls
<person> <name> John</name>
         <phone>1234</phone> </person>
<person> <name>Joe</name> </person>
← no phone!
name   phone
John   1234
Joe    -
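A missing element maps naturally to a null when the data is flattened into rows. The following minimal sketch assumes Python's xml.etree.ElementTree, in which findtext returns None when the requested child is absent. A persons root element is added here as an assumption, because a document needs a single root.

```python
import xml.etree.ElementTree as ET

# A wrapping <persons> root is added so the two <person> elements form one document.
DOC = """<persons>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
</persons>"""

# findtext returns None when the child is missing, which plays the role of NULL.
rows = [(p.findtext("name"), p.findtext("phone"))
        for p in ET.fromstring(DOC).findall("person")]

print("name  phone")
for name, phone in rows:
    print(f"{name:<6}{phone if phone is not None else '-'}")
```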
9 XML is semi-structured
- Repeated elements
- Impossible in a flat table (one value per cell)
<person> <name> Mary</name>
         <phone>2345</phone>
         <phone>3456</phone> </person>
← two phones!
name   phone
Mary   2345  3456   ???
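Repeated elements do not fit in one cell, but they are easy to collect from the XML side. This is a minimal sketch assuming Python's xml.etree.ElementTree. Note that findtext silently returns only the first phone. The person_phone list illustrates a common relational workaround, a second table with one row per phone; it is not something the slides propose, and the name is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = "<person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>"
person = ET.fromstring(DOC)

phones = [p.text for p in person.findall("phone")]   # every <phone>, in document order
print(phones)                                        # ['2345', '3456']
print(person.findtext("phone"))                      # 2345  (only the first match)

# Relational workaround: a separate table with one (name, phone) row per phone.
name = person.findtext("name")
person_phone = [(name, ph) for ph in phones]
print(person_phone)   # [('Mary', '2345'), ('Mary', '3456')]
```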
10 XML is semi-structured
- Elements with different types in different objects
- Heterogeneous collections
  - <persons> can contain both <person>s and <customer>s
<person> <name> <first> John </first>
                <last> Smith </last>
         </name>
         <phone>1234</phone> </person>
← structured name!
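A program reading heterogeneous data has to handle each shape it might meet. This is a minimal sketch assuming Python's xml.etree.ElementTree. The hypothetical helper full_name accepts a name element that is either plain text or has first and last children. The sample collection combines the plain-text Mary record from the previous slide with the structured John Smith record, purely for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<persons>
  <person><name>Mary</name><phone>2345</phone></person>
  <person><name><first>John</first><last>Smith</last></name><phone>1234</phone></person>
</persons>"""

def full_name(name_elem):
    """Hypothetical helper: accept either a plain-text <name> or one with <first>/<last> children."""
    if len(name_elem) == 0:                      # text-only name
        return (name_elem.text or "").strip()
    parts = [name_elem.findtext("first", ""), name_elem.findtext("last", "")]
    return " ".join(p.strip() for p in parts if p.strip())

names = [full_name(p.find("name")) for p in ET.fromstring(DOC).findall("person")]
print(names)   # ['Mary', 'John Smith']
```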
11 Document Type Definition (DTD)
- An XML document may have a DTD
  - rules about the contents of elements
  - like a schema for an XML document
- An XML document is
  - well-formed if its tags are correctly closed and nested (plus a few other syntax rules)
  - valid if it has a DTD and conforms to it
- Validation is useful in data exchange
- DTDs are part of the original XML specification
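Well-formedness and validity are separate checks. This minimal sketch assumes Python's xml.etree.ElementTree, which is built on Expat, a non-validating parser. The hypothetical helper is_well_formed therefore tests well-formedness only. The last document violates its own internal DTD because ssn is missing, but it still parses. Checking validity would require a validating parser.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the parser accepts the text; this checks well-formedness only."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<person><name>Mary</name></person>"))   # True
print(is_well_formed("<person><name>Mary</person></name>"))   # False: tags not closed in order

# The DTD requires <ssn> inside <person>, but the document omits it.
# Expat does not validate, so this document is still accepted.
invalid = """<!DOCTYPE person [
  <!ELEMENT person (ssn, name)>
  <!ELEMENT ssn (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
]>
<person><name>Mary</name></person>"""
print(is_well_formed(invalid))   # True: well-formed, though not valid
```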
12 Very Simple DTD
<!DOCTYPE company [
  <!ELEMENT company ((person|product)*)>
  <!ELEMENT person (ssn, name, office, phone?)>
  <!ELEMENT ssn (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT office (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT product (pid, name, description?)>
  <!ELEMENT pid (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
]>
13 DTD: The Content Model
<!ELEMENT tag (CONTENT)>   ← CONTENT is the content model
- Content model
  - Complex: a regular expression over other elements
  - Text-only: #PCDATA
  - Empty: EMPTY
  - Any: ANY
  - Mixed content: (#PCDATA | A | B | C)*
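To make these categories concrete, the sketch below classifies the content an element actually has, as opposed to what a DTD declares. It assumes Python's xml.etree.ElementTree, and content_kind is a hypothetical helper. ANY is left out because it is a permission in a declaration, not a shape that can be observed. Whitespace-only text is treated as layout, not content.

```python
import xml.etree.ElementTree as ET

def content_kind(elem):
    """Hypothetical helper: classify an element's actual content with the slide's categories."""
    has_children = len(elem) > 0
    texts = [elem.text] + [child.tail for child in elem]   # text before, between, after children
    has_text = any(t and t.strip() for t in texts)          # ignore whitespace-only text
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "complex"
    if has_text:
        return "#PCDATA"
    return "EMPTY"

for snippet in ["<married/>",
                "<name>Mary</name>",
                "<person><name>Mary</name><phone>2345</phone></person>",
                "<p>Call <b>Mary</b> at <i>2345</i></p>"]:
    print(content_kind(ET.fromstring(snippet)), snippet)
```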
14 Very Simple DTD
Example of a valid XML document:
<company>
  <person> <ssn> 123456789 </ssn>
           <name> John </name>
           <office> B432 </office>
           <phone> 1234 </phone>
  </person>
  <person> <ssn> 987654321 </ssn>
           <name> Jim </name>
           <office> B123 </office>
  </person>
  <product> ... </product>
  ...
</company>
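One rule of the DTD can be checked by hand: the children of each person must match the sequence (ssn, name, office, phone?). This is a minimal sketch assuming Python's xml.etree.ElementTree and re. The elided product elements are omitted, and the "Ann" record is invented test data. Only the order of each person's children is checked, not the #PCDATA inside them.

```python
import re
import xml.etree.ElementTree as ET

# The valid document from the slide; the elided <product> elements are left out.
DOC = """<company>
  <person><ssn>123456789</ssn><name>John</name><office>B432</office><phone>1234</phone></person>
  <person><ssn>987654321</ssn><name>Jim</name><office>B123</office></person>
</company>"""

# <!ELEMENT person (ssn, name, office, phone?)> written by hand as a regex
# over the space-separated child tag names.
PERSON = re.compile(r"ssn name office( phone)?")

def person_ok(person):
    return PERSON.fullmatch(" ".join(child.tag for child in person)) is not None

company = ET.fromstring(DOC)
print([person_ok(p) for p in company.findall("person")])   # [True, True]

# Hypothetical bad record: swapping two children breaks the sequence rule.
bad = ET.fromstring("<person><name>Ann</name><ssn>1</ssn><office>C1</office></person>")
print(person_ok(bad))   # False
```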
15 DTD Regular Expressions
sequence
  DTD: <!ELEMENT name (firstName, lastName)>
  XML: <name> <firstName> . . . </firstName> <lastName> . . . </lastName> </name>
optional
  DTD: <!ELEMENT name (firstName?, lastName)>
Kleene star
  DTD: <!ELEMENT person (name, phone*)>
  XML: <person> <name> . . . </name> <phone> . . . </phone> <phone> . . . </phone> <phone> . . . </phone> . . . </person>
alternation
  DTD: <!ELEMENT person (name, (phone|email))>
lots of other features
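Because a complex content model is a regular expression over element names, it can be translated directly into an ordinary regex. This minimal sketch uses Python's re module. The helpers content_model_to_regex and matches are hypothetical. The translation handles only names, ',', '|', '?', '*', '+' and parentheses, so #PCDATA, EMPTY and ANY are not covered. A comma becomes plain concatenation, and each name matches that tag followed by a space.

```python
import re

def content_model_to_regex(model):
    """Hypothetical helper: translate a DTD element-content model such as
    "(name, (phone|email))" into a regex over space-terminated child tag names."""
    out = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if token == ",":
            continue                            # sequence = concatenation
        elif token == "(":
            out.append("(?:")                   # non-capturing group
        elif token in ")|?*+":
            out.append(token)                   # same meaning in both notations
        else:
            out.append("(?:" + re.escape(token) + " )")   # one child element
    return re.compile("".join(out))

def matches(model, children):
    """True if the list of child tag names conforms to the content model."""
    return content_model_to_regex(model).fullmatch("".join(c + " " for c in children)) is not None

print(matches("(name, phone*)", ["name", "phone", "phone"]))          # True
print(matches("(name, (phone|email))", ["name", "phone", "email"]))  # False
```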
16 Querying XML Data
- XPath: simple navigation through the tree
- XQuery: the SQL of XML
- XSLT: recursive traversal
  - will not discuss in class
17 Sample Data for Queries
<bib>
  <book> <publisher> Addison-Wesley </publisher>
         <author> Serge Abiteboul </author>
         <author> <first-name> Rick </first-name>
                  <last-name> Hull </last-name>
         </author>
         <author> Victor Vianu </author>
         <title> Foundations of Databases </title>
         <year> 1995 </year>
  </book>
  <book price="55">
         <publisher> Freeman </publisher>
         <author> Jeffrey D. Ullman </author>
         <title> Principles of Database and Knowledge Base Systems </title>
         <year> 1998 </year>
  </book>
</bib>
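18 Data Model for XPath
[Figure: the sample data as a tree. The root sits above the root element (bib), which has two book children; a book has publisher, author, . . . children, with text leaves such as Addison-Wesley and Serge Abiteboul.]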
19 XPath: Simple Expressions
/bib/book/year
- Result: <year> 1995 </year>
          <year> 1998 </year>
/bib/paper/year
- Result: empty (there were no papers)
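These paths can be tried with Python's xml.etree.ElementTree, which supports only a limited subset of XPath; full XPath 1.0 needs a library such as lxml. This is a minimal sketch with the slide-17 data embedded and its whitespace trimmed. ElementTree paths are relative to an element, so /bib/book/year is written as "book/year" and evaluated from the bib element.

```python
import xml.etree.ElementTree as ET

# Compact copy of the slide-17 sample data (whitespace trimmed).
BIB = """<bib>
<book><publisher>Addison-Wesley</publisher>
<author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author>
<author>Victor Vianu</author>
<title>Foundations of Databases</title><year>1995</year></book>
<book price="55"><publisher>Freeman</publisher>
<author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>"""
bib = ET.fromstring(BIB)   # the bib root element

# /bib/book/year, evaluated from the bib element
print([y.text for y in bib.findall("book/year")])   # ['1995', '1998']
# /bib/paper/year
print(bib.findall("paper/year"))                    # []  (no papers)
```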
20 XPath: Restricted Kleene Closure
//author
- Result: <author> Serge Abiteboul </author>
          <author> <first-name> Rick </first-name>
                   <last-name> Hull </last-name>
          </author>
          <author> Victor Vianu </author>
          <author> Jeffrey D. Ullman </author>
/bib//first-name
- Result: <first-name> Rick </first-name>
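The same descendant queries can be run in Python's xml.etree.ElementTree, assuming its limited XPath subset. In ElementTree a path cannot start with "/", so //author is written ".//author". The sample data is embedded again with its whitespace trimmed. This is a sketch, not a full XPath engine.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
<book><publisher>Addison-Wesley</publisher>
<author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author>
<author>Victor Vianu</author>
<title>Foundations of Databases</title><year>1995</year></book>
<book price="55"><publisher>Freeman</publisher>
<author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>"""
bib = ET.fromstring(BIB)

# //author: every author element at any depth, in document order.
authors = bib.findall(".//author")
print([a.text for a in authors])
# ['Serge Abiteboul', None, 'Victor Vianu', 'Jeffrey D. Ullman']  (Rick Hull's author has no direct text)

# /bib//first-name: first-name elements anywhere below bib.
print([f.text for f in bib.findall(".//first-name")])   # ['Rick']
```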
21 XPath: Text Nodes
/bib/book/author/text()
- Result: Serge Abiteboul
          Victor Vianu
          Jeffrey D. Ullman
- Rick Hull doesn't appear because his author element has only first-name and last-name children, no text of its own
- Functions in XPath:
  - text() matches the text value
  - node() matches any node (element, attribute, or text)
  - name() returns the name of the current tag
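ElementTree has no text() step, so the sketch below emulates it with a hypothetical helper, text_nodes. The helper collects an element's direct text: the text before its first child plus the tail after each child. Whitespace-only pieces are dropped, as the slide does implicitly. The name() function corresponds to an element's .tag. This assumes Python's xml.etree.ElementTree with the sample data embedded.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
<book><publisher>Addison-Wesley</publisher>
<author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author>
<author>Victor Vianu</author>
<title>Foundations of Databases</title><year>1995</year></book>
<book price="55"><publisher>Freeman</publisher>
<author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>"""
bib = ET.fromstring(BIB)

def text_nodes(elem):
    """Hypothetical helper: direct text children of elem, whitespace-only pieces dropped."""
    pieces = [elem.text] + [child.tail for child in elem]
    return [p.strip() for p in pieces if p and p.strip()]

# /bib/book/author/text()
result = [t for a in bib.findall("book/author") for t in text_nodes(a)]
print(result)   # ['Serge Abiteboul', 'Victor Vianu', 'Jeffrey D. Ullman']

# name() of the first child element of the first book
print(bib.find("book").find("*").tag)   # publisher
```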
22 XPath: Wildcard
//author/*
- Result: <first-name> Rick </first-name>
          <last-name> Hull </last-name>
- * matches any element
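ElementTree accepts * as a wildcard step, so this query carries over almost unchanged. The following minimal sketch assumes Python's xml.etree.ElementTree and embeds the sample data. The only change is the leading "." that ElementTree requires before "//".

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
<book><publisher>Addison-Wesley</publisher>
<author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author>
<author>Victor Vianu</author>
<title>Foundations of Databases</title><year>1995</year></book>
<book price="55"><publisher>Freeman</publisher>
<author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>"""
bib = ET.fromstring(BIB)

# //author/*: every element child of every author.
stars = bib.findall(".//author/*")
print([(e.tag, e.text) for e in stars])   # [('first-name', 'Rick'), ('last-name', 'Hull')]
```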
23 XPath: Attribute Nodes
/bib/book/@price
- Result: 55
- @price means that price is an attribute
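ElementTree paths cannot return attribute nodes. In the sketch below, /bib/book/@price is therefore split in two: a [@price] predicate selects the books that have the attribute, and .get() reads its value. This assumes Python's xml.etree.ElementTree with the sample data embedded. Attribute values come back as strings.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
<book><publisher>Addison-Wesley</publisher>
<author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author>
<author>Victor Vianu</author>
<title>Foundations of Databases</title><year>1995</year></book>
<book price="55"><publisher>Freeman</publisher>
<author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>"""
bib = ET.fromstring(BIB)

# /bib/book/@price: books that carry a price attribute, then the attribute value.
prices = [b.get("price") for b in bib.findall("book[@price]")]
print(prices)   # ['55']
```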
24 XPath: Predicates
/bib/book/author[first-name]
- Result: <author> <first-name> Rick </first-name>
                   <last-name> Hull </last-name>
          </author>
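ElementTree supports this form of predicate directly: [tag] keeps the elements that have a child with that tag. Only immediate children are supported. This is a minimal sketch assuming Python's xml.etree.ElementTree, with the sample data embedded.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
<book><publisher>Addison-Wesley</publisher>
<author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author>
<author>Victor Vianu</author>
<title>Foundations of Databases</title><year>1995</year></book>
<book price="55"><publisher>Freeman</publisher>
<author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>"""
bib = ET.fromstring(BIB)

# /bib/book/author[first-name]: authors that have a <first-name> child.
hits = bib.findall("book/author[first-name]")
print([(a.findtext("first-name"), a.findtext("last-name")) for a in hits])   # [('Rick', 'Hull')]
```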
25 XPath: More Predicates
/bib/book/author[first-name][address[.//zip][city]]/last-name
- Result: <last-name> </last-name>
          <last-name> </last-name>
26 XPath: More Predicates
/bib/book[@price < 60]
/bib/book[author/@age < 25]
/bib/book[author/text()]
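ElementTree predicates test only for presence or equality. In the sketch below, the comparisons from these three queries are therefore done in Python after selecting the books. As in XPath, a book without a price attribute does not satisfy @price < 60. The sample data has no age attributes, so the second query returns nothing. For author/text(), whitespace-only text is ignored here. This assumes Python's xml.etree.ElementTree with the sample data embedded.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
<book><publisher>Addison-Wesley</publisher>
<author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author>
<author>Victor Vianu</author>
<title>Foundations of Databases</title><year>1995</year></book>
<book price="55"><publisher>Freeman</publisher>
<author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>"""
bib = ET.fromstring(BIB)

def below(value, limit):
    """A missing attribute never satisfies the comparison."""
    return value is not None and float(value) < limit

# /bib/book[@price < 60]
cheap = [b.findtext("title") for b in bib.findall("book") if below(b.get("price"), 60)]
print(cheap)   # ['Principles of Database and Knowledge Base Systems']

# /bib/book[author/@age < 25]
young = [b for b in bib.findall("book")
         if any(below(a.get("age"), 25) for a in b.findall("author"))]
print(young)   # []  (no author in the sample has an age attribute)

# /bib/book[author/text()]: books with at least one author that has direct text.
with_text = [b.findtext("year") for b in bib.findall("book")
             if any((a.text or "").strip() for a in b.findall("author"))]
print(with_text)   # ['1995', '1998']
```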
27 XPath Summary
- bib matches a bib element
- * matches any element
- / matches the root (the node above the root element)
- /bib matches a bib element under the root
- bib/paper matches a paper in bib
- bib//paper matches a paper in bib, at any depth
- //paper matches a paper at any depth
- paper|book matches a paper or a book
- @price matches a price attribute
- bib/book/@price matches the price attribute in book, in bib
- bib/book[@price<55]/author/last-name matches