Title: Introduction to XML Database
1Introduction to XML Database
- National Cheng Kung University
- Department of Electrical Engineering
- Shang-Rong Tsai
2Outline
- Background
- XML-based databases and information system
- An XML-based Information Server
3Background
- What is XML?
- What is a database?
- Is XML a Database?
- What is an XML Database?
- What is the goal of XML Database?
- What is the difference between RDB and XDB?
- XML in the Web
4What is XML?
- XML stands for eXtensible Markup Language
- XML is a textual encoding system for describing
structured documents
- HTML documents are SGML documents which conform
to the HTML DTDs
- DTDs (Document Type Definitions) are the syntax
defined in SGML to describe the tag structure for
a particular type of document.
5What is XML? (cont.)
- XML is a subset of Standard Generalized Markup
Language (SGML) defined by the World Wide Web
Consortium (it uses only 10% of SGML to express 90%
of SGML's power)
- HTML is for presentation only
- XML allows developers to define their own markup
languages to express their information more
meaningfully
- XML lets developers describe, deliver and
exchange structured data between applications,
including Web servers and browsers.
6The features of XML
- Extensible
- Self-described
- Separate data from presentation
- Text based, platform neutral
- Unified if documents conform to the schema of a specific domain
- Integration
7XML Technologies
- XML/DTD
- XML Namespaces
- XSL/XSLT
- XLink/XPointer/XPath
- XML Schema
- XML data query
- XHTML
8Definition of Database
- A database is a collection of related data
- A database represents some aspect of the real world.
- A database is a logically coherent collection of
data with some inherent meaning.
- A database is designed, built, and populated with
data for a specific purpose.
9XML and Database
- XML is basically a data format; we still need a
persistent store
- Much of the information on the Web comes from
databases
- Data model of XML and RDBMS / OODBMS
- XML mismatches with relational databases
10XML and Database (cont.)
- Schema mapping between XML documents and RDBMS
- data unit as XML document/element/attribute
- keys for relational tables
- data type mapping
- relationship between the stored tables
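The mapping issues listed on this slide can be made concrete with a small sketch that "shreds" one XML document into two relational tables. It uses only Python's standard `xml.etree.ElementTree` and `sqlite3` modules. The order document, the table names, and the chosen column types are all invented for illustration and do not come from the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the slides).
ORDER_XML = """
<order id="1001" customer="ACME">
  <item sku="fx102" qty="2" price="19.50"/>
  <item sku="fx205" qty="1" price="5.00"/>
</order>
"""

def shred(xml_text, conn):
    """Map one <order> document onto two related tables."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS orders (
            order_id INTEGER PRIMARY KEY,   -- key taken from the id attribute
            customer TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS items (
            order_id INTEGER REFERENCES orders(order_id),  -- parent/child link
            sku      TEXT,
            qty      INTEGER,               -- type mapping: XML string to INTEGER
            price    REAL                   -- type mapping: XML string to REAL
        );
    """)
    order = ET.fromstring(xml_text)
    order_id = int(order.get("id"))
    conn.execute("INSERT INTO orders VALUES (?, ?)",
                 (order_id, order.get("customer")))
    # Each <item> element becomes one row; each attribute becomes one column.
    for item in order.findall("item"):
        conn.execute(
            "INSERT INTO items VALUES (?, ?, ?, ?)",
            (order_id, item.get("sku"),
             int(item.get("qty")), float(item.get("price"))),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
shred(ORDER_XML, conn)
total = conn.execute(
    "SELECT SUM(qty * price) FROM items WHERE order_id = ?", (1001,)
).fetchone()[0]
print(total)
```

Note what the mapping loses: the order of the `<item>` elements and the nesting itself survive only through the `order_id` column, which is one form of the XML/relational mismatch mentioned earlier.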
11XML and Database (cont.)
- Query/update languages
- Indexing and search
- A new database system for XML ?
- XML-enabled database.
- native XML database (the data is actually stored
as XML internally)
12Is XML a Database?
- Something similar
- data storage (XML documents)
- DTD/Schema
- Query languages (XQuery, XPath, XQL, XML-QL,
QUILT, etc.)
- Programming interface (DOM/SAX)
13Is XML a Database? (cont.)
- Something it lacks
- transaction
- security
- indexing
- concurrent access
- query from multiple data objects
- data integrity
14XML as platform independent data format
15Data integration with XML
16What is an XML Database?
- Databases that store XML documents and provide a
view of operational data, generally either as
indexed text or as some variant of the DOM mapped
to an underlying data store.
17The Goal of XML Database
- Solve the problem of mismatches between
XML-structured data and the data model that RDB
products support
- Provide a complete solution for storing,
accessing and manipulating XML documents
- Make data integration and exchange easier
- Support the original goal of the Web
- Human communication thru shared knowledge
- The Universe of network-accessible information
18Difference between RDB and XDB
- Data
- Table vs. XML
- Modeling
- Logical Model
- ER vs. XML
- Physical Model
- Interface
- SQL vs. XQuery
- Application
- Transaction-based vs. Document-based
19Storing and Retrieving XML Documents
- File System
- BLOB (Binary Large OBject)
- Native XML Databases
- Persistent DOMs (PDOMs)
- Content Management Systems
- Systems for managing fragments of human-readable
documents, including support for editing,
version control, and building new documents from
existing fragments.
20Data oriented vs. Document oriented
- Data oriented
- Documents that use XML as a data transport
- Designed for machine consumption
- Regular structure, fine-grained data, little or
no mixed content
- Document oriented
- Designed for human consumption
- Irregular structure, larger-grained data, lots
of mixed content
21Two typical examples of XML instances
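As a rough illustration of the distinction, the sketch below holds one data-oriented and one document-oriented instance and checks each for mixed content (text interleaved with child elements). Both XML instances and the helper name `has_mixed_content` are invented for this example; mixed content is only one of the criteria listed on the previous slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical instances, invented for illustration.
DATA_ORIENTED = """
<flight>
  <number>CI101</number>
  <origin>TPE</origin>
  <destination>NRT</destination>
</flight>
"""

DOCUMENT_ORIENTED = """
<para>The <b>Xindice</b> server stores <i>collections</i> of documents.</para>
"""

def has_mixed_content(elem):
    """Return True if any element holds both child elements and real text."""
    for node in elem.iter():
        children = list(node)
        if not children:
            continue  # leaf elements cannot have mixed content
        own_text = (node.text or "").strip()
        # Text that follows a child element is stored in that child's tail.
        tail_text = "".join((child.tail or "").strip() for child in children)
        if own_text or tail_text:
            return True
    return False

data_is_mixed = has_mixed_content(ET.fromstring(DATA_ORIENTED))
document_is_mixed = has_mixed_content(ET.fromstring(DOCUMENT_ORIENTED))
print(data_is_mixed, document_is_mixed)
```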
22Taxonomy of XML Database
- Native XML Database (NXD)
- A database fundamentally designed to store and
manipulate XML data.
- Defines a (logical) model for an XML document and
stores and retrieves documents according to that
model.
- Has an XML document as its fundamental unit of
(logical) storage, just as a relational database
has a row in a table as its fundamental unit of
(logical) storage.
- It is NOT required to have any particular
underlying physical storage model.
23Taxonomy of XML Database
- XML Enabled Database (XEDB)
- A database that has an added XML mapping layer
provided either by the database vendor or a third
party.
24Applications of XML Database
- Corporate information portals
- Membership databases
- Product catalogs
- Parts databases
- Patient information tracking
- Business to business document exchange
25Some related standards
- W3C
- XML Schema
- XPath
- XQuery
- XMLDB ORG
- XMLDB API
- XUpdate
26XML Schema
- The purpose of a schema is to define a class of
XML documents, and so the term "instance
document" is often used to describe an XML
document that conforms to a particular schema.
27XML Schema
- XML Schema defines and describes a class of
XML documents by using schema constructs to
constrain and document the meaning, usage and
relationships of their constituent parts.
- Structure
- Data type
28An example of XML Schema
29XPath?
- The primary purpose of XPath is to address parts
of an XML document.
- XPath is also designed so that it has a natural
subset that can be used for matching.
- XPath models an XML document as a tree of nodes.
- Element nodes
- Attribute nodes
- Text nodes
30Examples of XPath
- Collections: element and .
- ./first-name
- Selecting children and descendants: / and //
- author/first-name
- bookstore//title
- Collecting element children: *
- author/*
- book/*/last-name
- Finding an attribute: @
- @style
- price/@exchange
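The path expressions on this slide can be tried out with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath. The bookstore document below is invented for illustration. Because ElementTree cannot return attribute nodes, the attribute examples are written as a predicate followed by `.get()`, which is an adaptation rather than a literal translation.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore document, invented for illustration.
BOOKSTORE_XML = """
<bookstore>
  <book style="novel">
    <title>First Title</title>
    <author><first-name>Ann</first-name><last-name>Lee</last-name></author>
    <price exchange="0.7">12</price>
  </book>
  <book>
    <title>Second Title</title>
    <author><first-name>Bob</first-name><last-name>Wu</last-name></author>
    <price>20</price>
  </book>
</bookstore>
"""

bookstore = ET.fromstring(BOOKSTORE_XML)
first_book = bookstore.find("book")  # context node for the relative paths

# author/first-name: a child reached through a named child.
first_names = [e.text for e in first_book.findall("author/first-name")]

# .//title evaluated at <bookstore>: descendants at any depth.
titles = [e.text for e in bookstore.findall(".//title")]

# author/*: every element child of <author>.
author_children = [e.tag for e in first_book.findall("author/*")]

# */last-name: <last-name> grandchildren of the current <book>.
last_names = [e.text for e in first_book.findall("*/last-name")]

# Attribute selection, adapted: select elements that carry the
# attribute, then read its value.
styles = [b.get("style") for b in bookstore.findall("book[@style]")]
exchange = first_book.find("price[@exchange]").get("exchange")

print(first_names, titles, author_children, last_names, styles, exchange)
```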
31XQuery
- A query language that uses the structure of XML
intelligently can express queries across all
these kinds of data, whether physically stored in
XML or viewed as XML via middleware.
32An example of XQuery
List each publisher whose books have an average
price greater than 100, together with that average price
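Python's standard library has no XQuery engine, so the sketch below computes the same result procedurally with `xml.etree.ElementTree` instead of running XQuery. The bibliography document and the function name are invented for illustration, and the XQuery shown in the comment is only one plausible formulation, not the query from the original slide.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# One plausible XQuery formulation (an assumption, not taken from the slide):
#   for $p in distinct-values(doc("bib.xml")//publisher)
#   let $a := avg(doc("bib.xml")//book[publisher = $p]/price)
#   where $a > 100
#   return <publisher><name>{$p}</name><avgprice>{$a}</avgprice></publisher>

# Hypothetical bibliography, invented for illustration.
BIB_XML = """
<bib>
  <book><publisher>Alpha Press</publisher><price>150</price></book>
  <book><publisher>Alpha Press</publisher><price>90</price></book>
  <book><publisher>Beta Books</publisher><price>60</price></book>
  <book><publisher>Beta Books</publisher><price>80</price></book>
</bib>
"""

def publishers_with_high_average(xml_text, threshold=100.0):
    """Group book prices by publisher and keep averages above threshold."""
    prices = defaultdict(list)
    for book in ET.fromstring(xml_text).findall("book"):
        prices[book.findtext("publisher")].append(float(book.findtext("price")))
    averages = {name: sum(vals) / len(vals) for name, vals in prices.items()}
    return {name: avg for name, avg in averages.items() if avg > threshold}

result = publishers_with_high_average(BIB_XML)
print(result)
```

The grouping, averaging, and filtering steps correspond to the `for`, `let`, and `where` clauses of the query in the comment.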
33XMLDB API
- XMLDB API is being developed by the XMLDB
Initiative to facilitate the development of
applications that function with minimal change on
more than one XML database.
- This is roughly equivalent to the functionality
provided by JDBC or ODBC for providing access to
relational databases.
34An example of XMLDB API
35XUpdate
- XUpdate is a specification under development by
the XMLDB Initiative to enable simpler updating
of XML documents.
- XUpdate gives you a declarative method to insert
nodes, remove nodes, and change nodes within an
XML document.
36An example of XUpdate
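The three kinds of modification that XUpdate covers (insert, remove, change) can be sketched imperatively with Python's standard `xml.etree.ElementTree`. This is not XUpdate syntax, which is itself a declarative XML vocabulary; it only shows the effect such an update would have on a document. The address document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for illustration.
ADDRESS_XML = """
<addresses>
  <address id="1"><name>Ann</name><town>Tainan</town></address>
  <address id="2"><name>Bob</name><town>Taipei</town></address>
</addresses>
"""

root = ET.fromstring(ADDRESS_XML)

# Insert: append a new <address> element with a <name> child.
new_address = ET.SubElement(root, "address", {"id": "3"})
ET.SubElement(new_address, "name").text = "Cat"

# Remove: delete the address whose id attribute is "2".
root.remove(root.find("address[@id='2']"))

# Change: replace the text content of one existing node.
root.find("address[@id='1']/town").text = "Kaohsiung"

ids = [a.get("id") for a in root.findall("address")]
town = root.findtext("address[@id='1']/town")
print(ids, town)
```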
37Some XML database products
- Commercial
- Tamino
- X-Hive
- Excelon
- Open Source (All Java based)
- Xindice (dbXML Core)
- eXist
- Ozone
38Apache Xindice (dbXML Core)
- Apache Xindice is a database designed from the
ground up to store XML data, or what is more
commonly referred to as a native XML database.
- At the present time Xindice uses XPath for its
query language and XMLDB XUpdate for its update
language.
39Feature Summary
- Document Collections
- XPath Query Engine
- XML Indexing
- XMLDB XUpdate Implementation
- Java XMLDB API Implementation
- XMLObjects
- Command Line Management Tools
- CORBA Network API
- Modular Architecture
40Collections and XML Object
- The Xindice server is designed to store
collections of XML documents. Collections can be
arranged in a hierarchy similar to that of a
typical UNIX or Windows file system.
- A collection is a container for documents and
other collections.
- XMLObjects are how dbXML provides server side
dynamic logic. They are roughly equivalent to
stored procedures in a traditional relational
database.
41An Example of Collection Path
- If you had a collection created under 'db' called
my-collection and a collection under that called
my-child-collection, the path used when accessing
the my-child-collection collection would be
- /db/my-collection/my-child-collection
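Since collection paths follow the same rules as UNIX file paths, they can be assembled with ordinary path handling. The helper below is a hypothetical convenience function written for this example, not part of Xindice or dbXML.

```python
from pathlib import PurePosixPath

def collection_path(*names, root="db"):
    """Build an absolute collection path from nested collection names."""
    return str(PurePosixPath("/", root, *names))

child = collection_path("my-collection", "my-child-collection")
parent = str(PurePosixPath(child).parent)  # the enclosing collection
print(child)
print(parent)
```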
42Managing Documents The format of Command line
tool
43Managing Documents Examples of Command line tool
- Adding a Document With a Given Key
- dbxml add_document -c /db/data/products -f
fx102.xml -n fx102
- Adding a Document Without a Key
- dbxml add_document -c /db/data/products -f
fx102.xml
- Retrieving a Document Using an ID
- dbxml retrieve_document -c /db/data/products -n
fx102 -f result.xml
- Deleting a Document Using an ID
- dbxml delete_document -c /db/data/products -n
fx102
44Querying the Database
- Original Document
<?xml version="1.0"?>
<product product_id="120320">
<description>Glazed Ham</description>
</product>
- Query
- dbxml xpath_query -c /db/data/products -q
/product[@product_id="120320"]
- Result
<product product_id="120320" xmlns:src="http://www.dbxml.org/NodeSource"
src:col="/db/data/products" src:key="120320">
<description>Glazed Ham</description>
</product>
45XML Object URI
- The XMLObject, associated document and method to
execute are specified as part of the URI.
- http://localhost:8080/local/test/document.xml/MyXMLObject/method?param1=value
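To make the parts of such a URI explicit, the sketch below splits the example with Python's standard `urllib.parse`. It assumes that the last path segment is the method, the one before it is the XMLObject, and everything earlier is the document path. How the server itself decomposes the path is not stated in the slides, so treat this split as an assumption.

```python
from urllib.parse import urlsplit, parse_qs

uri = ("http://localhost:8080/local/test/document.xml"
       "/MyXMLObject/method?param1=value")

parts = urlsplit(uri)
segments = parts.path.strip("/").split("/")

# Assumed layout: <document path>/<XMLObject>/<method>
method = segments[-1]
xml_object = segments[-2]
document = "/" + "/".join(segments[:-2])
params = parse_qs(parts.query)  # each query value is returned as a list

print(document, xml_object, method, params)
```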
46XML Object URI cont.
47Auto Linking
- dbXML provides a facility for automating
relational links between managed documents called
AutoLinking.
- db:href attribute
- Define the location of resource
- db:type attribute
- Define the type of link: replace, append, insert
48Example of Auto Linking
49Present Web System
50The Original Goal of Web
- Human communication thru shared knowledge.
Working together
- Social efficiency, understanding and scaling
- The Universe of network-accessible information
51The problems of Current Web
- HTML is for presentation only
- Not agent and search engine friendly
- Web Automation is difficult
- Enter, search and click
- Integration is difficult
- Data format is not unified and extensible
52The Web System in the future
53Motivation
- There is too much useless information on the
Internet
- Solve the problems of search engines on the Web
- Find useful information for users
- Information sharing
54XML-based Information Center
55Four types of users
56The Architecture of XML Storage
57The Architecture of XML Input Tools
58The Data Capture Template Editor
59The Schema Editor for the DB Designer
60The GUI for the Information Provider
61The Input form generated by the Data Capture
Template Processor
62Data Presentation generated by XSL and XSL
Processor
63Epilogue
- XML makes the Web easier to automate
- More and more Internet applications use XML
technology
- XML can describe some data more naturally
than the relational model
- XML plays an important role in the database area
- Information sharing using XML would be more
efficient than the HTML approach
64Reference
- XML Database Overview
- Oasis XML and Databases, http://www.oasis-open.org/cover/xmlAndDatabases.html
- XML and Database, http://www.rpbourret.com/xml/XMLAndDatabases.htm
- Programming
- Java XML Tutorial, http://java.sun.com/xml/tutorial_intro.html
- Java World, http://www.javaworld.com
- http://xml.apache.org
- http://jakarta.apache.org