Title: XML: Extensible Markup Language
1. XML: Extensible Markup Language
2. How the Web is Today
- HTML documents
- all intended for human consumption
- many generated automatically by applications
Easy to fetch any Web page, from any server, on any platform
3. Limits of the Web Today
- Applications cannot consume HTML
- HTML wrapper technology is brittle
- screen scraping
- OO technology (CORBA) requires a controlled environment
- Companies merge and form partnerships; they need interoperability fast
4. Paradigm Shift on the Web
- new Web standard: XML
- XML generated by applications
- XML consumed by applications
- data exchange
- across platforms: enterprise interoperability
- across enterprises
The Web moves from a collection of documents to data and documents
5. XML
- a W3C standard to complement HTML
- origins: structured text, SGML
- motivation
- HTML describes presentation
- XML describes content
- http://www.w3.org/TR/REC-xml (2/98)
6. From HTML to XML
HTML describes the presentation
7. HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
8. XML
<bibliography>
  <book>
    <title> Foundations </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>
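Because the point of the shift is that applications both produce and consume XML, a minimal Python sketch of each direction may help. It uses the standard-library `xml.etree.ElementTree` module; the function names `build_bibliography` and `read_titles` are illustrative and do not come from the slides.

```python
import xml.etree.ElementTree as ET

def build_bibliography(books):
    """Generate a <bibliography> document from (title, authors, publisher, year) tuples."""
    root = ET.Element("bibliography")
    for title, authors, publisher, year in books:
        book = ET.SubElement(root, "book")
        ET.SubElement(book, "title").text = title
        for name in authors:
            ET.SubElement(book, "author").text = name
        ET.SubElement(book, "publisher").text = publisher
        ET.SubElement(book, "year").text = str(year)
    return ET.tostring(root, encoding="unicode")

def read_titles(xml_text):
    """Consume the document again: return the text of every <title>."""
    return [t.text for t in ET.fromstring(xml_text).iter("title")]

xml_text = build_bibliography(
    [("Foundations of Databases", ["Abiteboul", "Hull", "Vianu"], "Addison Wesley", 1995)]
)
titles = read_titles(xml_text)
```

The consumer never inspects layout or fonts; it asks for `title` elements by name, which is what makes the data usable by a program rather than only by a reader.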
9. XML Terminology
- tags: book, title, author, ...
- start tag: <book>, end tag: </book>
- elements: <book>...</book>, <author>...</author>
- elements are nested
- empty element: <red></red>, abbreviated <red/>
- an XML document has a single root element
A well-formed XML document is one whose tags match
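Well-formedness can be checked mechanically by any XML parser. The following is a small sketch using Python's standard-library parser; `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML: matching, properly nested tags and one root."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

ok_nested = is_well_formed("<book><title>T</title></book>")
ok_empty = is_well_formed("<red/>")
bad_overlap = is_well_formed("<book><title>T</book></title>")  # tags cross instead of nesting
bad_two_roots = is_well_formed("<a/><b/>")                     # more than one root element
```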
10. More XML: Attributes
<book price = "55" currency = "USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  <year> 1995 </year>
</book>
attributes are alternative ways to represent data
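To a consuming program, the two representations differ only in how the value is fetched. A short sketch with Python's `xml.etree.ElementTree`, using the book element from this slide:

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title>Foundations of Databases</title>"
    "<author>Abiteboul</author><year>1995</year></book>"
)

# Data stored as attributes is read from the element itself.
price = int(book.get("price"))
currency = book.get("currency")

# Data stored as child elements is read by looking up the child.
year = int(book.findtext("year"))
```

Either form arrives as a string, so converting to a number is the application's job in both cases.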
11. Query Languages: Motivation
- granularity of the HTML Web: one file
- granularity of Web data varies
- single data item: get John's salary
- entire database: get all salaries
- aggregates: get average salary
- need a query language to define granularity
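The three granularities above can be sketched as three functions over one document. The `<staff>` document and its salary figures are made up for illustration, and plain Python with `xml.etree.ElementTree` stands in for a real query language.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the slides.
STAFF = ET.fromstring(
    "<staff>"
    "<employee><name>John</name><salary>50000</salary></employee>"
    "<employee><name>Mary</name><salary>70000</salary></employee>"
    "</staff>"
)

def salary_of(root, name):
    """Single data item: one employee's salary, or None if absent."""
    for employee in root.iter("employee"):
        if employee.findtext("name") == name:
            return int(employee.findtext("salary"))
    return None

def all_salaries(root):
    """Entire database: every salary in document order."""
    return [int(s.text) for s in root.iter("salary")]

def average_salary(root):
    """Aggregate: the mean of all salaries."""
    salaries = all_salaries(root)
    return sum(salaries) / len(salaries)
```

With HTML, all three requests would fetch the same whole file; here each returns only the data asked for.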
12. XML-QL: A Query Language for XML
- http://www.w3.org/TR/NOTE-xml-ql (8/98)
- features
- regular path expressions
- patterns, templates
- Skolem functions
- based on the OEM data model
13. Pattern Matching in XML-QL
where <book language="french">
        <publisher> <name> Morgan Kaufmann </name> </publisher>
        <author> $a </author>
      </book> in "www.a.b.c/bib.xml"
construct $a
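As a rough analogue only (this is Python, not XML-QL), the pattern can be read as: for every book whose language is french and whose publisher name is Morgan Kaufmann, bind each author and return it. The sample document and the function name below are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Made-up sample data standing in for www.a.b.c/bib.xml.
BIB = """<bib>
  <book language="french">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>Abiteboul</author>
  </book>
  <book language="english">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>Suciu</author>
  </book>
  <book language="french">
    <publisher><name>Addison Wesley</name></publisher>
    <author>Hull</author>
  </book>
</bib>"""

def french_mk_authors(xml_text):
    """Return the authors of books that match the whole pattern."""
    matches = []
    for book in ET.fromstring(xml_text).iter("book"):
        if book.get("language") != "french":
            continue  # attribute part of the pattern fails
        names = [n.text.strip() for n in book.findall("publisher/name")]
        if "Morgan Kaufmann" in names:
            matches.extend(a.text.strip() for a in book.findall("author"))
    return matches
```

A book contributes bindings only when every part of the pattern matches, which is why the second and third books yield nothing.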
14. Simple Constructors in XML-QL
where <book language=$l>
        <author> $a </>
      </> in "www.a.b.c/bib.xml"
construct <result> <author> $a </> <lang> $l </> </>
Note: </> abbreviates </book> or </result> or ...
<result> <author>Smith</author><lang>English</lang></result>
<result> <author>Smith</author><lang>Mandarin</lang></result>
<result> <author>Doe</author><lang>English</lang></result>
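The constructor emits one result element per (author, language) binding. A Python sketch of that behavior follows; it is an analogue rather than XML-QL, and the input document is invented so that it yields the three results listed above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input chosen to reproduce the three results on the slide.
SAMPLE = (
    "<bib>"
    '<book language="English"><author>Smith</author></book>'
    '<book language="Mandarin"><author>Smith</author></book>'
    '<book language="English"><author>Doe</author></book>'
    "</bib>"
)

def construct_results(xml_text):
    """For every (book, author) binding, build <result><author/><lang/></result>."""
    results = []
    for book in ET.fromstring(xml_text).iter("book"):
        lang = book.get("language")
        if lang is None:
            continue  # the pattern requires a language attribute
        for author in book.findall("author"):
            result = ET.Element("result")
            ET.SubElement(result, "author").text = author.text
            ET.SubElement(result, "lang").text = lang
            results.append(ET.tostring(result, encoding="unicode"))
    return results
```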
15. Schemas in XML
- Document Type Definition (DTD)
- XML Schema
- RDF Schema
16. Document Type Definition: DTD
- part of the original XML specification
- an XML document may have a DTD
- terminology for XML
- well-formed: if tags are correctly closed
- valid: if it has a DTD and conforms to it
- validation is useful in data exchange
17. DTDs as Grammars
<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<paper>
  <section> <text> </text> </section>
  <section> <title> </title>
    <section> ... </section>
    <section> ... </section>
  </section>
</paper>
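Treating a DTD as a grammar means each content model is a production that a document either satisfies or does not. Python's standard library does not validate against DTDs, so the sketch below hand-codes the rules, reading them as: a paper is zero or more sections, and a section is either a title followed by zero or more sections, or a single text. The function names are hypothetical, and stray character data between child elements is ignored for brevity.

```python
import xml.etree.ElementTree as ET

def section_ok(section):
    """Check one <section> against ((title, section*) | text)."""
    kids = list(section)
    if len(kids) == 1 and kids[0].tag == "text":
        return len(kids[0]) == 0  # text holds character data only
    if kids and kids[0].tag == "title" and len(kids[0]) == 0:
        return all(k.tag == "section" and section_ok(k) for k in kids[1:])
    return False

def paper_ok(xml_text):
    """Check the root against paper -> section*."""
    root = ET.fromstring(xml_text)
    return root.tag == "paper" and all(
        k.tag == "section" and section_ok(k) for k in root
    )

VALID = (
    "<paper><section><text>intro</text></section>"
    "<section><title>T</title>"
    "<section><text>a</text></section><section><text>b</text></section>"
    "</section></paper>"
)
INVALID = "<paper><section><title>T</title><text>x</text></section></paper>"
```

The second document is well-formed but not valid: a title may be followed only by sections, never by text.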
18. DTDs as Schemas
- Not so well suited
- impose unwanted constraints on order
  <!ELEMENT person (name,phone)>
- references cannot be constrained
- can be too vague
  <!ELEMENT person ((name|phone|email)*)>
19. XML Storage
- text file (XML)
- store in ternary relation
- use DTD to derive schema
- mine data to derive schema
- build special purpose repository (Lore)
20. XML Storage: Text File
- advantages
- simple
- less space than one thinks
- reasonable clustering
- disadvantages
- no updates
- requires a special-purpose query processor
21. Store XML in Ternary Relation
[Figure: tree of object nodes o1 to o6 with edge labels paper, year, title, author, author; the one leaf value shown is 1986]
Florescu, Kossmann 1999
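One plausible reading of the ternary relation is a table of (source, label, target) rows, one per edge of the document tree, with leaf text kept separately. The sketch below is an illustration under that assumption, not the scheme from the cited paper; `to_edges` and the object-id format are invented here.

```python
import itertools
import xml.etree.ElementTree as ET

def to_edges(xml_text):
    """Shred a document into (source, label, target) rows plus a leaf-value map."""
    ids = (f"o{i}" for i in itertools.count(1))
    rows, values = [], {}

    def visit(element, parent_id):
        node_id = next(ids)
        rows.append((parent_id, element.tag, node_id))
        children = list(element)
        for child in children:
            visit(child, node_id)
        if not children and element.text and element.text.strip():
            values[node_id] = element.text.strip()  # leaf: remember its text

    visit(ET.fromstring(xml_text), next(ids))  # o1 is the root object
    return rows, values

rows, values = to_edges(
    "<paper><year>1986</year><title>T</title>"
    "<author>A</author><author>B</author></paper>"
)
```

Every row has the same three columns whatever the document looks like, so a single relational table can hold any XML document without knowing its schema in advance.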
22. Use DTD to Derive Schema
- DTD
- ODMG classes
- Christophides et al. 1994, Shanmugasundaram et al. 1999
<!ELEMENT employee (name, address, project*)>
<!ELEMENT address (street, city, state, zip)>
class Employee public type tuple (name:string, address:Address, project:List(Project))
class Address public type tuple (street:string, ...)
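The same derivation can be sketched with Python dataclasses in place of ODMG classes: each element with children becomes a class, each character-data child becomes a string field, and a repeated child becomes a list. The slides do not define Project, so projects are modeled as plain strings here; that choice, `load_employee`, and the field name `zip_code` are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET

@dataclass
class Address:
    street: str
    city: str
    state: str
    zip_code: str  # the <zip> element; renamed to avoid shadowing zip()

@dataclass
class Employee:
    name: str
    address: Address
    projects: List[str] = field(default_factory=list)  # repeated <project> children

def load_employee(xml_text):
    """Build an Employee object from an <employee> element."""
    e = ET.fromstring(xml_text)
    a = e.find("address")
    address = Address(
        *(a.findtext(tag, "").strip() for tag in ("street", "city", "state", "zip"))
    )
    projects = [(p.text or "").strip() for p in e.findall("project")]
    return Employee(e.findtext("name", "").strip(), address, projects)

employee = load_employee(
    "<employee><name>Ann</name>"
    "<address><street>1 Main St</street><city>Springfield</city>"
    "<state>IL</state><zip>62701</zip></address>"
    "<project>P1</project><project>P2</project></employee>"
)
```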
23. Mine Data to Derive Schema
Deutsch et al. 1999
24. XML and Databases (1)
- Is XML a database?
- In a strict sense, no.
- In a more liberal sense, yes, but ...
- XML has
- Storage (the XML document)
- A schema (DTD)
- Query languages (XQL, XML-QL, ...)
- Programming interfaces (SAX, DOM)
- XML lacks
- Efficient storage, indexes, security, transactions, multi-user access, triggers, queries across multiple documents
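The two programming interfaces named above differ in style: SAX pushes parse events to a handler as the document streams past, while DOM builds the whole tree in memory for random access. A minimal sketch of both using Python's standard `xml.sax` and `xml.dom.minidom` modules follows; the sample document and helper names are illustrative.

```python
import xml.sax
from xml.dom import minidom

DOC = (
    "<bibliography><book>"
    "<author>Abiteboul</author><author>Hull</author>"
    "</book></bibliography>"
)

class AuthorHandler(xml.sax.ContentHandler):
    """SAX: collect author names from start/characters/end events."""
    def __init__(self):
        super().__init__()
        self.authors = []
        self._in_author = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "author":
            self._in_author = True
            self._buffer = []

    def characters(self, content):
        if self._in_author:
            self._buffer.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "author":
            self.authors.append("".join(self._buffer).strip())
            self._in_author = False

def authors_sax(text):
    handler = AuthorHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.authors

def authors_dom(text):
    """DOM: parse the whole tree, then navigate it."""
    dom = minidom.parseString(text)
    return [n.firstChild.data.strip() for n in dom.getElementsByTagName("author")]
```

SAX keeps memory use flat on large documents; DOM is simpler when the program needs to revisit or modify nodes.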
25. XML and Databases (2)
- Data versus Documents
- There are two ways to use XML in a database environment
- Use XML as a data transport, i.e., to get data in and out of the database
- Data is stored in a relational or object-oriented database
- Middleware converts between the database and XML
- Use a native XML database, i.e., store data in document form
- Use a content management system
26. XML and Databases (3)
- Data-centric documents
- Fairly regular structure
- Fine-grained data
- Little or no mixed content
- Order of sibling elements often not significant
- Document-centric documents
- Irregular structure
- Larger-grained data
- Lots of mixed content
- Order of sibling elements is significant
27. XML and Databases (4)
- Data-centric storage and retrieval systems
- Use a database
- Add middleware to convert to/from XML
- Use an XML server (specialized product for e-commerce)
- Use an XML-enabled web server with a database backend
- Document-centric storage and retrieval systems
- Content management system
- Persistent DOM implementation
28. XML and Databases (5)
- Mapping document structure to database structure
- Template-driven
- No predefined mapping
- Embedded commands process (retrieve) data
- Currently only available from RDBMS to XML
<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>
29. XML and Databases (6)
- Template-driven: example result
<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <Flights>
    <Row>
      <Airline>ACME</Airline>
      <FltNumber>123</FltNumber>
      <Depart>Dec 12, 2000, 13:43</Depart>
      <Arrive>Dec 13, 2000, 01:21</Arrive>
    </Row>
  </Flights>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>
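The template-driven step can be sketched as middleware that finds each embedded SelectStmt, runs it, and swaps in the rows it returns. The version below uses Python's `sqlite3` and `xml.etree.ElementTree`; the in-memory table, the hardcoded `Flights` wrapper name, and `expand_template` are assumptions of this sketch rather than the behavior of any particular product.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>"""

def expand_template(template, conn):
    """Replace each top-level <SelectStmt> with a <Flights> element of <Row>s."""
    root = ET.fromstring(template)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        cursor = conn.execute(child.text)  # runs the embedded SQL: trusted templates only
        columns = [d[0] for d in cursor.description]
        flights = ET.Element("Flights")    # wrapper name hardcoded for this sketch
        for record in cursor:
            row = ET.SubElement(flights, "Row")
            for column, value in zip(columns, record):
                ET.SubElement(row, column).text = str(value)
        root[index] = flights              # swap the command for its result
    return root

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)"
)
conn.execute(
    "INSERT INTO Flights VALUES ('ACME', 123, 'Dec 12, 2000, 13:43', 'Dec 13, 2000, 01:21')"
)
result = expand_template(TEMPLATE, conn)
```

The element names inside each Row come from the query's column names, so the template author controls the output vocabulary through the SELECT list.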
30. XML and Databases (7)
- Mapping document structure to database structure
- Model-driven
- A data model is imposed on the structure of the XML document
- This model is mapped to the structures in the database
- There are two common models
- Model the XML document as a single table or a set of tables
- Model the XML document as a tree of data-specific objects (good for OODBMS mapping)
31. XML and Databases (8)
- Single table or set of tables
<?xml version="1.0"?>
<database>
  <table>
    <row>
      <column1>...</column1>
      <column2>...</column2>
      ...
    </row>
  </table>
</database>
- Tree organization
             Orders
               |
           SalesOrder
          /    |    \
   Customer   Item   Item
               |      |
              Part   Part
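Under the table model, loading a document into a relational database is nearly mechanical: each row element becomes a tuple and each child element a column. A sketch with Python's `sqlite3` follows; the sample values, the table name `t`, and `load_table` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<database>
  <table>
    <row><column1>ACME</column1><column2>123</column2></row>
    <row><column1>Zenith</column1><column2>456</column2></row>
  </table>
</database>"""

def load_table(xml_text, conn, table_name="t"):
    """Create a table from the first <table> element and insert its rows."""
    table = ET.fromstring(xml_text).find("table")
    rows = [{cell.tag: cell.text for cell in row} for row in table.findall("row")]
    columns = list(rows[0])  # assumes every row has the same columns
    # Names are interpolated into SQL, so this is for trusted documents only.
    conn.execute(f"CREATE TABLE {table_name} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table_name} VALUES ({placeholders})",
        [[row[c] for c in columns] for row in rows],
    )
    return columns

conn = sqlite3.connect(":memory:")
columns = load_table(DOC, conn)
stored = conn.execute("SELECT column1, column2 FROM t ORDER BY rowid").fetchall()
```

The tree model has no such direct counterpart in a relational system, which is why the slide pairs it with object-oriented databases.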
32. XML and Databases (9)
- Generating DTDs from a database schema and vice versa
- Often an application's DTD rarely changes and does not need to be generated automatically.
- Some simple conversions are possible
- Example: DTD from relational schema
- For each table, create an ELEMENT.
- For each column in a table, create an attribute or a PCDATA-only child ELEMENT.
- For each primary key/foreign key relationship in which a column of the table contributes the primary key, create a child ELEMENT.
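The three rules above can be sketched as a small generator. It takes the PCDATA-only child option for columns, and it marks each child created from a key relationship with `*` on the assumption that the relationship is one-to-many; that cardinality, the input format, and the sample schema are choices made for this sketch.

```python
def dtd_from_schema(tables, foreign_keys=()):
    """tables: {table: [non-key columns]}; foreign_keys: [(child_table, parent_table)]."""
    declared = set()
    lines = []
    for table, columns in tables.items():
        # Rule 2: one child per column. Rule 3: one child per referencing table.
        children = list(columns)
        children += [child + "*" for child, parent in foreign_keys if parent == table]
        content = "(" + ", ".join(children) + ")" if children else "EMPTY"
        lines.append(f"<!ELEMENT {table} {content}>")  # Rule 1: one ELEMENT per table
        for column in columns:
            if column not in declared:  # a DTD may declare each element only once
                declared.add(column)
                lines.append(f"<!ELEMENT {column} (#PCDATA)>")
    return lines

# Hypothetical schema: Item rows reference a SalesOrder by foreign key.
SCHEMA = {"SalesOrder": ["Number", "Date"], "Item": ["Quantity"]}
dtd = dtd_from_schema(SCHEMA, [("Item", "SalesOrder")])
```

Because element names are global in a DTD, two tables that share a column name end up sharing one declaration, which is one reason such conversions stay simple rather than exact.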
33. XML and Databases (10)
- Document-centric storage and retrieval systems
- Content management system
- Allows the storage of discrete content fragments, such as examples, procedures, chapters, as well as metadata such as author names, revision dates, etc.
- Many content management systems are built on top of relational or object-oriented database systems.
- Examples
- BladeRunner (Interleaf), SigmaLink (STEP), Parlance Content Manager (XyEnterprise), Target 2000 (Progressive Information Technology)
- Persistent DOM implementation
34. Further Readings
www.w3.org/XML
www-db.stanford.edu/widom
www-rocq.inria.fr/abiteboul
db.cis.upenn.edu
www.research.att.com/suciu
Abiteboul, Buneman, Suciu: Data on the Web: From Relational to Semistructured to XML. Morgan Kaufmann, 1999 (appears in October)