More XML: semantics, DTDs, XPath

Slides: 28
Provided by: JayantM7

1
More XML: semantics, DTDs, XPath
  • February 18, 2004

2
XML Document
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
    <married/>
  </person>
</data>
3
XML Terminology
  • Elements
  • enclosed within tags
  • <person> ... </person>
  • nested within other elements
  • <person> <address> ... </address> </person>
  • can be empty
  • <married></married> abbreviated as <married/>
  • can have Attributes
  • <person id="0005"> ... </person>
  • An XML document has a single ROOT element
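
The terminology above can be tried out with a few lines of code. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (the slides do not name a parser); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """
<data>
  <person id="o555">
    <name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person>
    <name>John</name>
    <address>Thailand</address>
    <phone>23456</phone>
    <married/>
  </person>
</data>
"""

root = ET.fromstring(DOC)             # the single root element
persons = root.findall("person")      # elements nested directly under the root

root_tag = root.tag                   # name of the root element
first_id = persons[0].get("id")       # attribute lookup on the first person
second_id = persons[1].get("id")      # None: the second person has no id

# An empty element has neither text nor child elements.
married = persons[1].find("married")
is_empty = married is not None and married.text is None and len(married) == 0

print(root_tag, first_id, second_id, is_empty)
```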

4
Buzzwords
  • What is XML?
  • W3C data exchange format
  • Hierarchical data model
  • Self-describing
  • Semi-structured

5
XML as a Tree !!
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>
(Figure: the document drawn as a tree with root node data)
Minor detail: order matters!
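
One way to see the tree is to walk it recursively. The sketch below assumes Python's xml.etree.ElementTree and a shortened version of the document; the helper name outline is made up for this illustration. Children are visited in document order, which is the sense in which order matters.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<data>"
    "<person id='o555'><name>Mary</name></person>"
    "<person><name>John</name><phone>23456</phone></person>"
    "</data>"
)

def outline(elem, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + elem.tag]
    for child in elem:                # iteration follows document order
        lines.extend(outline(child, depth + 1))
    return lines

lines = outline(ET.fromstring(DOC))
print("\n".join(lines))
```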
6
XML is self-describing
  • Schema elements become part of the data
  • In XML, <persons>, <name>, <phone> are part of the
    data, and are repeated many times
  • Relational schema persons(name,phone) is defined
    separately from the data and is fixed
  • Consequence: XML is much more flexible

7
Relational Data as XML
person
name  phone
John  3634
Sue   6343
Dick  6363

<persons>
  <person> <name>John</name>
           <phone> 3634</phone>
  </person>
  <person> <name>Sue</name>
           <phone> 6343</phone>
  </person>
  <person> <name>Dick</name>
           <phone> 6363</phone>
  </person>
</persons>
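
The mapping from rows to elements is mechanical. A minimal sketch, assuming Python's xml.etree.ElementTree and the three rows of the person table held in a plain list (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# Rows of the relational table person(name, phone).
rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]

persons = ET.Element("persons")
for name, phone in rows:
    person = ET.SubElement(persons, "person")    # one element per row
    ET.SubElement(person, "name").text = name    # one child per column
    ET.SubElement(person, "phone").text = phone

xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```

Note how the column names are repeated for every row: the schema travels with the data.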

8
XML is semi-structured
  • Missing elements
  • Could represent in a table with nulls

<person> <name> John</name>
         <phone>1234</phone> </person>
<person> <name>Joe</name> </person>     ← no phone!

name  phone
John  1234
Joe   -
9
XML is semi-structured
  • Repeated elements
  • Impossible in tables

<person> <name> Mary</name>
         <phone>2345</phone>
         <phone>3456</phone> </person>   ← two phones!

name  phone
Mary  2345  3456   ???
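
Missing and repeated elements can both be seen when XML is loaded back into rows. The following is only a sketch, assuming Python's xml.etree.ElementTree and a small <persons> document combining John, Joe and Mary: a missing phone becomes a null (None), and a repeated phone forces either a list-valued cell or several rows.

```python
import xml.etree.ElementTree as ET

DOC = """
<persons>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</persons>
"""

# Nested view: each person keeps a list of zero, one or many phones.
nested = []
for person in ET.fromstring(DOC).findall("person"):
    phones = [p.text for p in person.findall("phone")]
    nested.append((person.findtext("name"), phones))

# Flat relational view: one (name, phone) row per phone, None if there is none.
flat = [(name, phone) for name, phones in nested for phone in (phones or [None])]

print(nested)
print(flat)
```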
10
XML is semi-structured
  • Elements with different types in different
    objects
  • Heterogeneous collections
  • <persons> can contain both <person>s and
    <customer>s

<person> <name> <first> John </first>
                <last> Smith </last>
         </name>
         <phone>1234</phone> </person>   ← structured name!
11
Document Type Definition (DTD)
  • an XML document may have a DTD
  • rules about the contents of elements
  • like a schema for an XML document
  • An XML document is:
  • well-formed if tags are correctly closed and nested
  • valid if it has a DTD and conforms to it
  • validation is useful in data exchange
  • DTDs are part of the original XML specification
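
Well-formedness can be checked by any XML parser. A minimal sketch, assuming Python's xml.etree.ElementTree; that module is a non-validating parser, so it only tests well-formedness and does not check a document against a DTD. The function name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

ok = is_well_formed("<person><name>Joe</name></person>")
bad = is_well_formed("<person><name>Joe</person>")   # <name> is never closed

print(ok, bad)
```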

12
Very Simple DTD
<!DOCTYPE company [
  <!ELEMENT company ((person|product)*)>
  <!ELEMENT person (ssn, name, office, phone?)>
  <!ELEMENT ssn (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT office (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT product (pid, name, description?)>
  <!ELEMENT pid (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
]>
13
DTD: The Content Model
<!ELEMENT tag (CONTENT)>
(CONTENT is the content model)

  • Content model
  • Complex: a regular expression over other
    elements
  • Text-only: #PCDATA
  • Empty: EMPTY
  • Any: ANY
  • Mixed content: (#PCDATA | A | B | C)*
14
Very Simple DTD
Example of valid XML document
<company>
  <person> <ssn> 123456789 </ssn>
           <name> John </name>
           <office> B432 </office>
           <phone> 1234 </phone>
  </person>
  <person> <ssn> 987654321 </ssn>
           <name> Jim </name>
           <office> B123 </office>
  </person>
  <product> ... </product>
  ...
</company>
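
Because a content model is a regular expression over child elements, validity can be illustrated with an ordinary regex engine. The sketch below is not a real DTD validator: it assumes Python's re and xml.etree.ElementTree, and the content models of the Very Simple DTD have been translated by hand into regular expressions over a comma-terminated list of child tags. RULES, TEXT_ONLY and conforms are names invented for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models (an assumption of this sketch).
RULES = {
    "company": r"((person|product),)*",
    "person": r"ssn,name,office,(phone,)?",
    "product": r"pid,name,(description,)?",
}
# Elements declared as (#PCDATA): text only, no child elements.
TEXT_ONLY = {"ssn", "name", "office", "phone", "pid", "description"}

def conforms(elem):
    """Check elem and all its descendants against the rules above."""
    if elem.tag in TEXT_ONLY:
        return len(elem) == 0
    rule = RULES.get(elem.tag)
    if rule is None:                       # undeclared element
        return False
    children = "".join(child.tag + "," for child in elem)
    return (re.fullmatch(rule, children) is not None
            and all(conforms(child) for child in elem))

VALID = ("<company>"
         "<person><ssn>123456789</ssn><name>John</name>"
         "<office>B432</office><phone>1234</phone></person>"
         "<person><ssn>987654321</ssn><name>Jim</name>"
         "<office>B123</office></person>"
         "</company>")
# Same elements, but <name> comes before <ssn>: the sequence is violated.
INVALID = ("<company><person><name>John</name><ssn>123456789</ssn>"
           "<office>B432</office></person></company>")

valid_ok = conforms(ET.fromstring(VALID))
invalid_ok = conforms(ET.fromstring(INVALID))
print(valid_ok, invalid_ok)
```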
15
DTD: Regular Expressions
sequence
  DTD: <!ELEMENT name (firstName, lastName)>
  XML: <name> <firstName> . . . . . </firstName>
              <lastName> . . . . . </lastName> </name>
optional
  DTD: <!ELEMENT name (firstName?, lastName)>
Kleene star
  DTD: <!ELEMENT person (name, phone*)>
  XML: <person> <name> . . . . . </name>
                <phone> . . . . . </phone>
                <phone> . . . . . </phone>
                <phone> . . . . . </phone>
                . . . . . .
       </person>
alternation
  DTD: <!ELEMENT person (name, (phone|email))>
lots of other features
16
Querying XML Data
  • XPath: simple navigation through the tree
  • XQuery: the SQL of XML
  • XSLT: recursive traversal
  • will not discuss in class

17
Sample Data for Queries
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name>
             <last-name> Hull </last-name>
    </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>

18
Data Model for XPath
(Figure: the sample data drawn as a tree)
The root
  The root element
    book
      publisher
        Addison-Wesley
      author
        Serge Abiteboul
      . . . .
    book
19
XPath: Simple Expressions
/bib/book/year
  • Result: <year> 1995 </year>
  • <year> 1998 </year>

/bib/paper/year
  • Result: empty (there were no papers)
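
These two paths can be tried against the sample data. A minimal sketch, assuming Python's xml.etree.ElementTree, which implements only a subset of XPath and evaluates paths relative to an element; the absolute path /bib/book/year is therefore written ./book/year starting from the <bib> element.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)                                # the root element <bib>
years = [y.text for y in bib.findall("./book/year")]    # /bib/book/year
paper_years = bib.findall("./paper/year")               # /bib/paper/year

print(years, paper_years)
```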
20
XPath: Restricted Kleene Closure
//author
  • Result: <author> Serge Abiteboul </author>
  • <author> <first-name> Rick </first-name>
             <last-name> Hull </last-name>
    </author>
  • <author> Victor Vianu </author>
  • <author> Jeffrey D. Ullman </author>

/bib//first-name
  • Result: <first-name> Rick </first-name>
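
The descendant step // can be sketched the same way, again assuming Python's xml.etree.ElementTree, where .// means "at any depth below this element":

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)
authors = bib.findall(".//author")                             # //author
first_names = [e.text for e in bib.findall(".//first-name")]   # /bib//first-name

print(len(authors), first_names)
```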
21
XPath: Text Nodes
/bib/book/author/text()
  • Result: Serge Abiteboul
  • Victor Vianu
  • Jeffrey D. Ullman
  • Rick Hull doesn't appear because he has
    first-name and last-name children, not text
  • Functions in XPath
  • text() matches the text value
  • node() matches any node (* or @* or text())
  • name() returns the name of the current tag
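
Python's xml.etree.ElementTree has no text() step in its path subset, so the sketch below emulates /bib/book/author/text() by reading each author's own text. This is an approximation under that assumption, not a full XPath evaluation; elem.tag plays the role of name().

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

# Keep only authors that carry text of their own (whitespace does not count).
texts = [a.text.strip()
         for a in bib.findall("./book/author")
         if a.text and a.text.strip()]

# Tag names of the children of the structured author.
tags = [child.tag for child in bib.findall("./book/author")[1]]

print(texts, tags)
```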

22
XPath: Wildcard
//author/*
  • Result: <first-name> Rick </first-name>
  • <last-name> Hull </last-name>
  • * matches any element
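
A short sketch of the wildcard, assuming Python's xml.etree.ElementTree (which supports * as a step): only the one author with element children contributes to the result.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)
children = bib.findall(".//author/*")                  # //author/*
pairs = [(c.tag, c.text) for c in children]

print(pairs)
```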
23
XPath: Attribute Nodes
/bib/book/@price
  • Result: "55"
  • @price means that price is an attribute
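
Python's xml.etree.ElementTree cannot return attribute nodes from a path, so this sketch (an emulation under that assumption) selects the books that have a price attribute and then reads the value with get():

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

# [@price] keeps only books that carry the attribute; get() reads its value.
prices = [b.get("price") for b in bib.findall("./book[@price]")]

print(prices)
```

Attribute values come back as strings, so the result is "55", not the number 55.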

24
XPath: Predicates
/bib/book/author[first-name]
  • Result: <author> <first-name> Rick </first-name>
                     <last-name> Hull </last-name>
            </author>
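
A predicate in square brackets filters the elements selected so far. A minimal sketch, assuming Python's xml.etree.ElementTree, whose [tag] predicate keeps elements that have a child with that name:

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

# Authors that have a <first-name> child.
structured = bib.findall("./book/author[first-name]")
last_names = [a.findtext("last-name") for a in structured]

print(len(structured), last_names)
```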

25
XPath: More Predicates
/bib/book/author[firstname][address[//zip][city]]/lastname
  • Result: <lastname> ... </lastname>
  • <lastname> ... </lastname>
26
XPath: More Predicates
/bib/book[@price < "60"]
/bib/book[author/@age < "25"]
/bib/book[author/text()]
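
Comparison predicates such as < are not part of the path subset in Python's xml.etree.ElementTree, so the sketch below applies the comparison in Python after selecting candidates. It emulates the first and third expressions under that assumption; the second is omitted because the sample data has no age attribute.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

# /bib/book[@price < "60"]: books with a price attribute below 60.
cheap = [b for b in bib.findall("./book[@price]")
         if float(b.get("price")) < 60]
cheap_titles = [b.findtext("title") for b in cheap]

# /bib/book[author/text()]: books with at least one author that has text.
with_text_author = [
    b for b in bib.findall("./book")
    if any(a.text and a.text.strip() for a in b.findall("author"))
]

print(cheap_titles, len(with_text_author))
```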
27
XPath: Summary
  • bib matches a bib element
  • * matches any element
  • / matches the root element
  • /bib matches a bib element under root
  • bib/paper matches a paper in bib
  • bib//paper matches a paper in bib, at any depth
  • //paper matches a paper at any depth
  • paper|book matches a paper or a book
  • @price matches a price attribute
  • bib/book/@price matches price attribute in book,
    in bib
  • bib/book[@price<"55"]/author/lastname matches ...