Title: Use of Native XML Database In TAPoR Project
1Use of Native XML Database In TAPoR Project
- Eric Zhang
- Dec 02, 2003
- TAPoR Project - Alberta
2Data Vs. Document
- Data-centric XML documents focus only on the data
contained inside the document, such as an address
book. The order of the elements is not important.
- Document-centric XML documents focus not only on
the data contained inside, but also on the
structure of the document, such as the element
ordering information.
3Storing XML using Relational DB
- Mainly for data-centric documents, where the data
is inserted into different tables according to
some mapping strategy
<student>
  <name> Eric </name>
  <age> 27 </age>
</student>
Table Student
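One possible mapping strategy for the student record above can be sketched in Python with the standard `sqlite3` and `xml.etree.ElementTree` modules. The column layout (one column per child element) and the helper name `shred_student` are assumptions made for illustration; the slides do not say which mapping or database was actually used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample data-centric document: the <student> record from the slide.
STUDENT_XML = "<student><name> Eric </name><age> 27 </age></student>"


def shred_student(conn: sqlite3.Connection, xml_text: str) -> None:
    """Map one <student> element onto one row of a Student table.

    Hypothetical helper: each child element becomes one column value.
    """
    root = ET.fromstring(xml_text)
    name = root.findtext("name", default="").strip()
    age = int(root.findtext("age", default="0"))
    conn.execute("INSERT INTO Student (name, age) VALUES (?, ?)", (name, age))


conn = sqlite3.connect(":memory:")
# Assumed schema: one column per child element of <student>.
conn.execute("CREATE TABLE Student (name TEXT, age INTEGER)")
shred_student(conn, STUDENT_XML)

rows = conn.execute("SELECT name, age FROM Student").fetchall()
print(rows)
```

This works well only because every student has the same flat set of fields, which is the data-centric case.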
4Problem with Relational DB
- The mapping strategy is hard to define when the
document has a very complex structure.
- When the document is semi-structured, the result is either a
lot of tables or a table with lots of null
columns.
- Information about the ordering of elements,
processing instructions, comments, etc. is lost.
- Retrieving documents requires many multi-table
join queries, which makes the
queries very slow.
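The null-column problem can be illustrated with a small Python sketch. The `<people>` records below are invented sample data, and the single wide `Person` table is just one assumed mapping strategy, not the one used in the project.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical semi-structured records: each person has different fields.
RECORDS = """
<people>
  <person><name>Eric</name><age>27</age></person>
  <person><name>Ann</name><email>ann@example.org</email></person>
  <person><nickname>Bo</nickname></person>
</people>
"""

root = ET.fromstring(RECORDS)

# One column for every element name that appears anywhere in the data.
columns = sorted({child.tag for person in root for child in person})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (%s)" % ", ".join(f"{c} TEXT" for c in columns))

placeholders = ", ".join("?" * len(columns))
for person in root:
    # findtext returns None for a missing element, which is stored as NULL.
    values = [person.findtext(c) for c in columns]
    conn.execute(f"INSERT INTO Person VALUES ({placeholders})", values)

total_cells = len(columns) * len(root)
null_cells = sum(
    value is None
    for row in conn.execute("SELECT * FROM Person")
    for value in row
)
print(columns)
print(null_cells, "of", total_cells, "cells are NULL")
```

The alternative of splitting the optional fields into separate tables avoids the NULLs but leads to the multi-table joins mentioned above.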
5Native XML Database
- Stores the XML document as it is
In the native database:
<Play>
  <Scene> This is a sweet story <Pb grade="ab"/> happened long
  long <Td name="terrific"/> </Scene>
</Play>
An XML file:
<Play>
  <Scene> This is a sweet story <Pb grade="ab"/> happened long
  long <Td name="terrific"/> </Scene>
</Play>
6Advantage of using Native XML DB
- A document completes the round trip as an XML
document.
- Keeps all information, such as the data contained
inside, the ordering of elements, the processing
instructions, comments, etc.
- Indexes can be created on elements, attributes,
etc. to speed up the query process.
- Most products support standard XPath queries.
- Since most native XML DBs create an index
for elements and attributes, performance is usually
better than querying the files directly in a file
system.
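What "keeps all information" means for a round trip can be sketched with Python's standard `xml.dom.minidom`. This is not a native XML database, only an in-memory DOM standing in for one; the stylesheet processing instruction and the comment are added here purely for illustration.

```python
from xml.dom import minidom

# The <Play> fragment from the earlier slide, plus a hypothetical
# processing instruction and comment to show that they survive.
SOURCE = (
    '<?xml-stylesheet href="play.css"?>'
    "<Play><!-- draft -->"
    '<Scene>This is a sweet story <Pb grade="ab"/> happened long long '
    '<Td name="terrific"/></Scene></Play>'
)

doc = minidom.parseString(SOURCE)
round_tripped = doc.toxml()

# Mixed content keeps its order: text, element, text, element.
scene = doc.getElementsByTagName("Scene")[0]
child_order = [node.nodeName for node in scene.childNodes]

print(child_order)
print(round_tripped)
```

A store that shreds the same fragment into table rows would have no natural place for the comment, the processing instruction, or the position of `<Pb>` inside the running text.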
7General Query Process in Native XML DB
- Run an XPath query against an XML document or a set
of documents.
Example request: give me the document that has "Basic" as the
value of the attribute start in the element <content>,
which is a child element of <chapter>, which is
a child element of <doc>.
Query the db with /doc/chapter/content[@start="Basic"]
Then, with the returned document or document fragment,
you can perform further processing using the API
provided, or convert the result to a string or a DOM
tree.
<doc name="a.xml">
  <chapter>
    <title> Oracle </title>
    <abstract count="50"> Sth about oracle </abstract>
    <content start="Basic" end="Appendix"> Table, API,
    PL/SQL, .. </content>
  </chapter>
  ..
</doc>
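The query process above can be tried out without a database, using the limited XPath support in Python's standard `xml.etree.ElementTree`. This is only a sketch of the idea: a real native XML DB would run the query through its own API and indexes, and the elided parts of the sample document (shown as `..` on the slide) are simply left out here.

```python
import xml.etree.ElementTree as ET

# Sample document from the slide, without the elided parts.
DOC = """
<doc name="a.xml">
  <chapter>
    <title> Oracle </title>
    <abstract count="50"> Sth about oracle </abstract>
    <content start="Basic" end="Appendix"> Table, API, PL/SQL </content>
  </chapter>
</doc>
"""

root = ET.fromstring(DOC)

# The root element is <doc>, so /doc/chapter/content[@start="Basic"]
# becomes a path relative to it.
hits = root.findall("./chapter/content[@start='Basic']")

for hit in hits:
    # Further processing: read an attribute, or convert the fragment to a string.
    as_string = ET.tostring(hit, encoding="unicode")
    print(hit.get("end"), as_string.strip())
```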
8Discussion of 4 Systems
- Oracle 9i XML DB (Commercial)
- dbXML (Open Source)
- eXist (Open Source)
- Xindice
9Storage
10Schema/DTD support
11Query
12Update
13Index
14Basic DB Function
15Programming API
16Other Tools
17Support
18Comparison Chart
Rating scale: Very Good = 3, Not Bad = 2, Uncompetitive = 1
19Oracle Vs. eXist
eXist's advantages
- Supports DTDs through a catalog file,
similar to Cocoon
- Returns each match as a separate result, even when several occur
in a single file
- Indexes automatically, so we don't need to worry about indexes
- Supports the XML:DB API, which supports
SAX
- Integrates with Cocoon
- Free to use, easy to install and
manage
eXist's disadvantages
- Can only store XML files, not other formats
- Provides only very basic db functions
- Developed by a
single person, with little support and documentation
- Its expandability and stability cannot be
foreseen
Oracle's advantages
- Can store files in other formats, as well as
relational data
- Can save
storage if the document is schema-based
- Specific indexes can be created as needed
- Provides all of Oracle's db functions
- Provides a large set of XML tools
- Huge collection of documentation, support, and
articles, some directly from Oracle
- Its expandability and stability are expected
to be good
Oracle's disadvantages
- Doesn't support DTDs
- Returns only a single hit even when
multiple occurrences are found in one file
- Queries need to be foreseen so that indexes can
be created; queries without an index are very
slow (???)
- The JDBC API doesn't
support as many functions as the XML:DB API
- Doesn't support SAX
- Expensive, hard to install and manage
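The difference between returning each match separately and returning one hit per file can be sketched in plain Python. This does not use either product's API; the two-file collection and the function names `node_hits` and `document_hits` are hypothetical and exist only to show the two result shapes.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: a.xml contains two matching chapters, b.xml one.
COLLECTION = {
    "a.xml": "<doc><chapter><title>Oracle</title></chapter>"
             "<chapter><title>eXist</title></chapter></doc>",
    "b.xml": "<doc><chapter><title>dbXML</title></chapter></doc>",
}


def node_hits(path):
    """One result per matching node, even within the same file."""
    return [
        (name, node.text)
        for name, text in COLLECTION.items()
        for node in ET.fromstring(text).findall(path)
    ]


def document_hits(path):
    """One result per file that contains at least one match."""
    return [
        name
        for name, text in COLLECTION.items()
        if ET.fromstring(text).find(path) is not None
    ]


per_node = node_hits("./chapter/title")
per_document = document_hits("./chapter/title")
print(per_node)
print(per_document)
```

With node-level results, the two matches in a.xml can be shown or counted individually; with document-level results, the application has to reopen the file to find them.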
20Conclusion
- All 4 database systems can basically do what we
want: store XML documents and query against
XML data.
- dbXML and Xindice compare worst
against the other two, while Oracle 9i and eXist
each have different advantages. However, not all
circumstances can be foreseen right now, so it is
hard to say that either Oracle or eXist is definitely
better.
- Oracle 9i and eXist are worth further research
into their features, performance, etc. We can
use them in parallel for different data collections.
21Performance Comparison