Title: A1258690283CaZpc
14/1
about XML/Xquery/RDF
2. Why XML
- XML is the confluence of several factors
  - The Web needed a more declarative format for data, one that tries to describe the meaning of the data
  - Documents needed a mechanism for extended tags
  - Database people needed a more flexible interchange format
- Original expectation
  - The whole web would go to XML instead of HTML
- Today's reality
  - Not so. But XML is used all over under the covers
4. An XML Document Example
<imdb>
  <show year="1993">
    <title>Fugitive, The</title>
    <review>
      <suntimes>
        <reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>! A fun action movie, Harrison Ford at his best.
      </suntimes>
    </review>
    <review>
      <nyt>The standard hollywood summer movie strikes back.</nyt>
    </review>
    <box_office>183,752,965</box_office>
  </show>
  <show year="1994">
    <title>X Files,The</title>
    <seasons>4</seasons>
  </show>
</imdb>
Mixed content: the text interleaved with elements inside <suntimes>
Attribute: year on <show>
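As a rough illustration of the two callouts on this slide, the sketch below parses an abbreviated copy of the example with Python's standard xml.etree.ElementTree module. The variable names are illustrative, not part of the slides. The attribute is read from the element's attribute dictionary, and the mixed content shows up as "tail" text hanging off the child elements.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the slide's example document.
IMDB_XML = """
<imdb>
  <show year="1993">
    <title>Fugitive, The</title>
    <review>
      <suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>! A fun action movie, Harrison Ford at his best.</suntimes>
    </review>
    <box_office>183,752,965</box_office>
  </show>
  <show year="1994">
    <title>X Files,The</title>
    <seasons>4</seasons>
  </show>
</imdb>
"""

root = ET.fromstring(IMDB_XML)

# Attribute: a name/value pair attached to the start tag of an element.
years = [show.get("year") for show in root.findall("show")]

# Mixed content: text that follows a child element is kept in that child's .tail.
suntimes = root.find("show/review/suntimes")
reviewer = suntimes.find("reviewer")
rating = suntimes.find("rating")

# itertext() walks the element text and tails in document order.
full_review = "".join(suntimes.itertext())

print(years)
print(repr(reviewer.tail))
print(full_review)
```

Note that the order of text and elements inside <suntimes> matters here, which is one reason XML is treated as an ordered data model.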
5. XML Terminology
- tags: book, title, author, ...
- start tag: <book>, end tag: </book>
- elements: <book>...</book>, <author>...</author>
- elements are nested
- empty element: <red></red>, abbreviated <red/>
- an XML document has a single root element
A well-formed XML document is one with matching tags
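A minimal sketch of a well-formedness check, assuming it is acceptable to lean on Python's standard-library parser: if the parser accepts the text, the tags match and there is a single root element. The helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML (matching tags, single root)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, multiple roots, and other syntax errors.
        return False


print(is_well_formed("<book><title>Foundations</title></book>"))   # properly nested
print(is_well_formed("<book><title>Foundations</book></title>"))   # tags cross
print(is_well_formed("<red/>"))                                    # empty element
print(is_well_formed("<a/><b/>"))                                  # two root elements
```

This only checks syntax. It says nothing about whether the document follows a DTD or schema.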
6. More XML: Attributes
<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  ...
  <year> 1995 </year>
</book>
Attributes are single-valued. There is no guidance on when to use them.
7. More XML: Oids and References
Object identifiers
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name>
  <children idref="o123 o555"/>
</person>
<person id="o123" mother="o456"><name>John</name>
</person>
Oids and references in XML are just syntax
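Because oids and references are just syntax, the application has to follow them itself. The sketch below shows one way to do that in Python. The enclosing <people> element is an assumption added so that the fragment has a single root, and the helper names are illustrative.

```python
import xml.etree.ElementTree as ET

# The <people> wrapper is an assumption; the slide shows only the <person> elements.
PEOPLE_XML = """
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name>
    <children idref="o123 o555"/>
  </person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
"""

root = ET.fromstring(PEOPLE_XML)

# Build the oid -> element index by hand; a non-validating parser does not do this.
by_id = {person.get("id"): person for person in root.iter("person")}


def name_of(oid):
    """Look up the <name> text of the person with the given oid."""
    return by_id[oid].findtext("name")


def children_of(oid):
    """Dereference the space-separated idref list on <children>, if present."""
    refs = by_id[oid].find("children")
    if refs is None:
        return []
    return [name_of(ref) for ref in refs.get("idref", "").split()]


print(children_of("o456"))
print(name_of(by_id["o123"].get("mother")))
```

Nothing in the parser stops an idref from pointing at an oid that does not exist; in this sketch that would surface as a KeyError.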
8. HTML vs. XML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
  Abiteboul, Hull, Vianu
  <br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
  Abiteboul, Buneman, Suciu
  <br> Morgan Kaufmann, 1999

<bibliography>
  <book> <title> Foundations </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  ...
</bibliography>
Self-describing
- Schema info is part of the data
- Good for data exchange (albeit baroque for storage)
9. HTML vs. XML (continued)
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
  <br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
  <br> Morgan Kaufmann, 1999
HTML describes presentation

<bibliography>
  <book> <title> Foundations </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>
XML describes content
10. Why are Database folks so excited about XML?
- XML is just a syntax for (self-describing) data
- This is still exciting because
  - There was no standard syntax for relational data
- With XML, we can
  - Translate any legacy data to XML
  - Exchange data in XML format
  - Ship it over the web, as input to any application
11. XML ≠ machine-accessible meaning
Jim Hendler
This is what a web page in natural language looks like to a machine
12. XML ≠ machine-accessible meaning
Jim Hendler
XML allows meaningful tags to be added to parts of the text
13. XML ≠ machine-accessible meaning
Jim Hendler
But to your machine, the tags look like this.
14. XML ≠ machine-accessible meaning
Jim Hendler
Schemas help, by relating common terms between documents.
[Figure labels: <CV>, private]
15. But other people use other schemas
Jim Hendler
Someone else has one like this.
16. But other people use other schemas
Jim Hendler
...which don't fit in.
[Figure labels: <CV>, private]
Moral: there is still a need for ontology mapping.
17. The X-standards
- XML: an on-the-wire representation for data
- XQuery: a query language for XML
- XML Schema: a schema description language for XML data
- RDF: a language for meta-data description
- WSDL/SOAP/UDDI: languages for describing services
19. XML Dialect Potpourri
- Extensible Financial Reporting Markup Language (XFRML),
- eXtensible Business Reporting Language (XBRL),
- MusicXML,
- Spacecraft Markup Language (SML),
- Bank Internet Payment System (BIPS),
- Bioinformatic Sequence Markup Language (BSML),
- Biopolymer Markup Language (BIOML),
- Open Catalog Format (OCF),
- Chemical Markup Language (CML),
- Electronic Business XML Initiative (ebXML),
- Open Trading Protocol (OTP),
- FinXML, Financial Information eXchange protocol (FIX),
- RecipeML, CVML,
- XML Bookmark Exchange Language (XBEL),
- Scalable Vector Graphics (SVG),
- NewsML,
- DocBook,
- Real Estate Listing Markup Language (RELML), . . .
20. XML vs. Relational Data
- XML is meant as a language that supports both text and structured data
  - Conflicting demands...
- XML supports semi-structured data
  - In essence, the schema can be a union of multiple schemas
  - Easy to represent books with or without prices, books with any number of authors, etc.
- XML supports free mixing of text and data
  - using the #PCDATA type
- XML is ordered (while relational data is unordered)
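To make the semi-structured point concrete, here is a small sketch in Python. The two book records reuse titles, authors, and the price of 55 from earlier slides, but the <books> wrapper and the choice to leave the second book without a price are assumptions made for illustration. The reading code has to tolerate the missing price and the variable number of authors, where a relational table would need NULLs or a separate author table.

```python
import xml.etree.ElementTree as ET

# Illustrative data: one book with a price, one without.
BOOKS_XML = """
<books>
  <book><title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
    <price>55</price></book>
  <book><title>Data on the Web</title>
    <author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
</books>
"""


def summarize(book):
    """Flatten one <book> into a dict, tolerating optional and repeated children."""
    price = book.findtext("price")  # None when the element is absent
    return {
        "title": book.findtext("title"),
        "authors": [a.text for a in book.findall("author")],  # any number of authors
        "price": float(price) if price is not None else None,
    }


# findall() returns children in document order, so the order of books is preserved.
rows = [summarize(b) for b in ET.fromstring(BOOKS_XML).findall("book")]
for row in rows:
    print(row)
```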
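21. XML Data Model
[Figure: the example document drawn as a tree. The root imdb has a child show; show has an attribute @year (1993) and children title ("Fugitive, The") and two review nodes. One review contains suntimes, with reviewer ("Roger Ebert"), the text "gives", and rating ("two..."); the other contains nyt.]
- Check http://www.w3.org/XML/ for more details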
22. DTDs
Notice that a DTD is not in XML syntax
<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
Semi-structured
<paper>
  <section> <text> </text> </section>
  <section> <title> </title>
    <section> </section>
    <section> </section>
  </section>
</paper>
23. XML Schemas
- More recent proposal (with XML syntax)
- unifies previous schema proposals
- generalizes DTDs
- uses XML syntax
- two documents: structure and datatypes
  - http://www.w3.org/TR/xmlschema-1
  - http://www.w3.org/TR/xmlschema-2
24. XML Schema
25. RDF: Meta-data Standard for the Web
<rdf:Description about="www.mypage.com">
  <about> birds, butterflies, snakes </about>
  <author>
    <rdf:Description>
      <firstname> John </firstname>
      <lastname> Smith </lastname>
    </rdf:Description>
  </author>
</rdf:Description>
Good ol' semantic networks..?
26. XQuery Resources
- XQuery 1.0: An XML Query Language
  - W3C Working Draft, 20 December 2001
- XML Query Use Cases
  - W3C Working Draft, 20 December 2001
- Microsoft .Net XQuery Language Demo
  - http://131.107.228.20/
- http://support.x-hive.com/xquery/index.html
  - Supports querying on the documents described in the W3C Use Cases
- XQuery Tutorial by Fankhauser and Wadler
  - www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf
27. 10/24
Today: XQuery discussion; Semantic Web standards
- Exam 1 returned (both versions)
- Project 2 due on Wednesday
- Homework 3 started (will be closed shortly)
- Approximate schedule of topics put up
28. Exam 1 Stats
- In-class
  - Avg 44, Max 62, Min 32, Stdev 12.7
  - Grads (avg/max/min/stdev): 49/62/33/9.8
  - UG (avg/max/min/stdev): 34/53/16/12.6
- At-home
  - Avg 53, Max 63, Min 32.5, Stdev 8.18
  - Grads (avg/max/min/stdev): 56.8/63/49/4.75
  - UG (avg/max/min/stdev): 48.4/59/32.5/9.69
All happy families are happy alike, each unhappy family is unhappy in its own way
All correct answers are correct alike, each incorrect answer is incorrect in its own way
29. Querying XML
- Requirements
  - Need to handle lack of schema.
    - We may not know much about the data, so we need to navigate the XML.
  - Need to support both information retrieval and SQL-style queries.
  - Ordered vs. un-ordered XML
  - Human readable
    - like SQL?
- Candidates
  - Many, based on conflicting requirements
  - XSL: makes IR folks happy
  - XML-QL: makes DB folks happy
  - XQuery: W3C's attempt to make everybody (un)happy
30. http://support.x-hive.com/xquery/index.html
You will be asked to play with it in homework 3, qn 4
31. FLoWeR Expressions
- XQuery queries are made up of FLWR expressions that work on paths
  - For: binds variables to nodes
  - Let: computes aggregates
  - Where: applies a formula to find matching elements
  - Return: constructs the output elements
- Path expressions are of the form
  - element//element/element[attrib=value]
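For readers who know Python better than XQuery, the sketch below mimics the four FLWR clauses with an ordinary loop over a parsed document, and shows a path expression using the XPath subset supported by xml.etree.ElementTree. It is an analogy, not XQuery itself. The sample data and the <results> wrapper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative sample data, loosely following the bibliography used in these slides.
BIB_XML = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last></author></book>
  <book year="1999"><title>Data on the Web</title>
    <author><last>Abiteboul</last></author>
    <author><last>Buneman</last></author>
    <author><last>Suciu</last></author></book>
</bib>
"""

bib = ET.fromstring(BIB_XML)

# Path expression of the form element//element/element[attrib=value]:
titles_1994 = [t.text for t in bib.findall(".//book[@year='1994']/title")]

# FLWR analogue: list books with more than one author.
results = ET.Element("results")
for b in bib.findall("book"):                  # For: bind b to each book node
    n_authors = len(b.findall("author"))       # Let: compute an aggregate
    if n_authors > 1:                          # Where: keep matching elements
        out = ET.SubElement(results, "book", authors=str(n_authors))
        ET.SubElement(out, "title").text = b.findtext("title")   # Return: build output

print(titles_1994)
print(ET.tostring(results, encoding="unicode"))
```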
32. Comparison to SQL
- Look at the use case descriptions in the XQuery manual
- Supports all (?) SQL-style queries (with different syntax, of course): see the default queries in the demo
- Has support for
  - construction: outputting the answers in arbitrary XML formats (use case XMP)
  - path expressions: navigating the XML tree (use case SEQ)
  - simple text queries (use case TEXT)
- Allows queries on tag elements
  - Removes the data/meta-data barrier in queries
- Example: for each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors. (XMP use case 6)
33. DTD for http://www.bn.com/bib.xml
<!ELEMENT bib (book* )>
<!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
<!ATTLIST book year CDATA #REQUIRED >
<!ELEMENT author (last, first )>
<!ELEMENT editor (last, first, affiliation )>
<!ELEMENT title (#PCDATA )>
<!ELEMENT last (#PCDATA )>
<!ELEMENT first (#PCDATA )>
<!ELEMENT affiliation (#PCDATA )>
<!ELEMENT publisher (#PCDATA )>
<!ELEMENT price (#PCDATA )>
34. Example Query
Query
<bib> {
  for $b in /bib/book
  where $b/publisher = "Addison-Wesley"
    and $b/@year > 1991
  return <book year="{$b/@year}">
           {$b/title}
         </book>
} </bib>
- For all books after 1991, return with Year changed from a tag to an attribute
Result
<bib>
  <book year="1994"> <title>TCP/IP Illustrated</title> </book>
  <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
</bib>
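The same selection can be sketched in Python to show what the query computes. This is a hand-written equivalent under stated assumptions, not how an XQuery engine evaluates it. The input document below is an illustrative sample shaped like the bib DTD (prices and authors omitted for brevity), and the function name books_after is made up.

```python
import xml.etree.ElementTree as ET

# Illustrative input; only the fields the query touches are included.
BIB_XML = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="1999"><title>Data on the Web</title>
    <publisher>Morgan Kaufmann</publisher></book>
</bib>
"""


def books_after(bib, publisher, year):
    """Build a new <bib> holding year and title of the matching books."""
    out = ET.Element("bib")
    for b in bib.findall("book"):
        # Where clause: publisher equality and a numeric comparison on @year.
        if b.findtext("publisher") == publisher and int(b.get("year")) > year:
            book = ET.SubElement(out, "book", year=b.get("year"))
            ET.SubElement(book, "title").text = b.findtext("title")
    return out


result = books_after(ET.fromstring(BIB_XML), "Addison-Wesley", 1991)
print(ET.tostring(result, encoding="unicode"))
```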
35. Example Query (2)
- Return the books that cost more at amazon than at fatbrain
Let $amazon := document("http://www.amazon.com/books.xml"),
    $fatbrain := document("http://www.fatbrain.com/books.xml")
For $am in $amazon/books/book,
    $fat in $fatbrain/books/book
Where $am/isbn = $fat/isbn
  and $am/price > $fat/price
Return <book> {$am/title, $am/price, $fat/price} </book>
Join: the isbn equality in the Where clause joins the two documents
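A minimal sketch of the same join in Python, assuming both catalogs fit in memory. The two small documents, their ISBNs, titles, and prices are invented for illustration, and pricier_at_first is a hypothetical helper. Instead of the nested loop implied by the two For bindings, the sketch indexes one side by isbn, which is the usual hash-join idea.

```python
import xml.etree.ElementTree as ET

# Invented sample catalogs; real data would come from the two book stores.
AMAZON_XML = """<books>
  <book><isbn>111</isbn><title>Book A</title><price>65.95</price></book>
  <book><isbn>222</isbn><title>Book B</title><price>30.00</price></book>
</books>"""

FATBRAIN_XML = """<books>
  <book><isbn>111</isbn><title>Book A</title><price>59.95</price></book>
  <book><isbn>222</isbn><title>Book B</title><price>35.00</price></book>
  <book><isbn>333</isbn><title>Book C</title><price>10.00</price></book>
</books>"""


def pricier_at_first(first_xml, second_xml):
    """Return (title, first_price, second_price) for books costlier in the first catalog."""
    first = ET.fromstring(first_xml)
    second = ET.fromstring(second_xml)
    # Index the second catalog by isbn so each lookup is a dictionary access.
    second_by_isbn = {b.findtext("isbn"): b for b in second.findall("book")}
    rows = []
    for am in first.findall("book"):
        fat = second_by_isbn.get(am.findtext("isbn"))
        if fat is None:
            continue  # no join partner
        am_price = float(am.findtext("price"))
        fat_price = float(fat.findtext("price"))
        if am_price > fat_price:
            rows.append((am.findtext("title"), am_price, fat_price))
    return rows


print(pricier_at_first(AMAZON_XML, FATBRAIN_XML))
```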
36. XML frenzy in the DB Community
- Now that XML is there, what can we do with it?
  - Convert all databases from relational to XML?
  - Or provide XML views of relational databases?
- Develop a theory of native XML databases?
  - Or assume that XML data will be stored in relational databases..
- Issues: What sort of storage mechanisms? What sort of indices?
37. XML middleware for Databases
RDBMS
On the internet, nobody needs to know that you are a dog
- XML adapters (middleware) received significant attention in the DB community
  - SilkRoute (AT&T)
  - Xperanto (IBM)
- Issues
  - Need to convert relational data into XML
    - Tagging (easy)
  - Need to convert XQuery queries into equivalent SQL queries
    - Trickier, as XQuery supports schema querying
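The "tagging" step can be sketched in a few lines of Python with the standard sqlite3 module: each row becomes an element and each column becomes a child element named after the column. This is only an illustration of why tagging is the easy part. It is not how SilkRoute or Xperanto work, and the table, row tag, and function name are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table, row_tag):
    """Wrap every row of `table` in <row_tag>, one child element per column."""
    # The table name is interpolated directly; acceptable only for a trusted sketch.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for row in cur:
        elem = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            ET.SubElement(elem, col).text = str(value)
    return root


# Hypothetical relational source with a single row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bib (title TEXT, publisher TEXT, year INTEGER)")
conn.execute("INSERT INTO bib VALUES ('Foundations of Databases', 'Addison Wesley', 1995)")

xml_text = ET.tostring(table_to_xml(conn, "bib", "book"), encoding="unicode")
print(xml_text)
```

Going the other way, from an XQuery query to SQL over the base tables, is the harder problem named on the slide, because the query may range over tag names that are column or table names on the relational side.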