Title: XML In An RDBMS World
1XML In An RDBMS World
- Michael D. Thomas
- mdthomas_at_ibiblio.org
2Data, data, data
- In general, computing is never far from data
- XML, RDBMSes deal with data play in the same
very large playground - Extensive overlap between RDBMSes and XML
- Goal Learn how to choose between RDBMS vs. XML,
how to integrate the two technologies
3First, some pragmatism
- When all you have is a hammer, everything looks
like a nail. - What if all you have is a hammer?
- When all you have is a hammer, everything must
look like a nail. - Best to have XML and RDBMS in your toolset
- Architectural purity is often impossible
- But we should know when what we are
compromising
4Tonights Agenda
- Compare XML and SQL RDBMS Technologies
- Understand Their Sweet Spots
- Examine Their Use Through Studies of
Anti-Patterns and Patterns Of Usage - Discuss XQuery and XML Storage In The RDBMS
5Topics
- Defining The Playground XML vs. RDBMS
- Anti-Pattern XML As Transactional Data Store
- Data Stores The ACID Test
- Anti-Pattern Document Storage The RDBMS
- XQuery RDBMS Doc Storage
- Anti-Pattern XML Declarative Languages
Gratuitous Encapsulation Of SQL - Understanding Declarative Languages, Such As SQL
- Anti-Pattern XML Meta-data The Overly Abstract
Database - Managing The Split Between XML-based
RDBMS-based Data Models In The Same Application
6XML As Database Anti-Pattern
- Ill use an XML file as a database
- Advantages Simple, Cheap
- An XML Schema defines the metadata of the
database - Data is right on the filesystem, not hidden in
a complicated SQL RDBMS
7XML As Database Problems
- Updates are hard
- Have to re-invent concurrency handling multiple
users changing data at the same time - No constraint checking
- No optimization, such as indexes, caching, etc.
8Organization of Data
- Been formalizing the storage information for a
long time - Age of computing called for new approaches
- Relational model highly successful
- XML is relatively new
- Good for loosely structured data documents --
and data transmission
9Transmission Interoperability
- Electronic transmission is newer than electronic
storage - With the growth of the Internet, transmission of
data is exploding - No Interoperability problems in 1950 lots of
interoperability problems now - XML is an important standard for transmission and
interoperability
10Vocabulary
- Datastore Anything that stores data
- SQL Relational Database A database that
organizes info in tables and adheres to
relational theory - XML 1) eXtensible Markup Language 2) The XML
standard 3) All or part of an XML document
11Vocabulary
- XML Database A database specialized for the
storage of XML documents - Object Database A database that stores objects
- Object-Relational Database A relational
database with an extensible type system
12Data Centric vs. Document Centric
- Data Centric approach the datastore is focused
on handling highly structured, fine grained data.
Favors the relational model - Document centric approach the datastore is
focused on handling semi-structured data, such as
web pages, books, etc. Favors XML
13Data Centric vs. Document Centric
- Best queries are largely a result of structuring
the data well (Messy desk vs. organized file
cabinet) - Documents are semi-structured data with ad hoc
structures - The overhead of defining rigorous structure for
each type of document increases the overall cost
of management - (Would still give you the best queries)
14Storage vs. Transmission
- Data Storage and Data Transmission are two
different concepts - XML is very strong for interoperable transmission
- Can store by writing XML to a file
- You can transmit relational data by exporting a
few tables and ftping, but isnt a strong solution
15Datastore Application Stack
16XML/Relational Comparison
Relational/SQL XML
Metadata Definition Create Table Define XML Schema (Optional)
Data Persistence Insert Update Create XML Document
Data Query Select XPath, XQuery
17Application With XML Transmissions
18Three Data Models
- Persistent Model how data is stored
- Active Model (Object Model) how data is
arranged when it is being manipulated by a
program, usually written in an imperative
language - Presentation/Transmission Model how data is
transmitted, usually as XML
19Datastore Basics
- Any Datastore must tackle the following issues
- Concurrency
- Transactions the ACID test
- Locking
- Joins
- Normalization
- Administration Issues
20Concurrency
- Datastores support concurrency if multiple users
can access the same datastore at the same time - Datastores must not allow the same data to be
modified at the same time
21Transactions
- Atomicity No matter how complex, a transaction
is atomic and indivisible. Transactions are all
or nothing. - Consistency Transactions must leave the
database in a consistent state, i.e., consistent
with the rules - Isolation Transaction is isolated from other
transactions - Durability The effects of a transaction persist
22Isolation levels
- TRANSACTION_READ_UNCOMMITTED Dirty reads,
non-repeatable reads, phantom reads - TRANSACTION_READ_COMMITTED
- Non-repeatable reads, phantom reads
- TRANSACTION_REPEATABLE_READ
- phantom reads
- TRANSACTION_SERIALIZABLE
23Locking
- Data must be locked for transactions to be
isolated - Locking is both a datastore and an application
concern - How much extraneous data is locked? (Page level
locking, document level locking) - Pessimistic locking prevents reading of locked
data - Optimistic locking generates an error when
inappropriate data updates are attempted
24Joins
- A Join joins the data between two different data
entities - E.g., SELECT from emp, dept WHERE emp.deptno
dept.deptno - Joins are the cornerstone of SQL
- XPath doesnt do joins between XML documents!
- XQuery, others can
25Normalization
- Normalization organizing data to minimize
redundancy - Normalized data is easier to maintain and easier
to understand conceptually - In relational db design, normalized DBs need to
be denormalized for performance reasons - XML docs can also be normalized, but not as much
support for tying elements together - Normalization is important when designing, less
crucial at implementation
26Administrative Issues
- Includes backup recovery, installation,
upgrades, optimization, maintenance, etc. - In general
- Bigger is better (economies of scale, 24x7
support) - More popular, more standard is better (law of
increasing returns) - Whatever is already working in your organization
is better (no need to hire, re-train
administrators) - Existing DB vendors have a huge advantage
27Datastores
- Different types
- Relational Database highly structured data such
as account balances, inventory quantities, etc.) - Document Database used to store documents,
probably in XML format - The same DB can serve as both, e.g., Oracle
28Overview Of Relational Databases
- The Relational Model
- SQL
- Entity Relationship Diagrams
29Relational Model -- structure
- Data is grouped in tables
- Tables have columns and rows
- Columns are fairly fixed the set of columns
shouldnt change much (if at all) over the life
of a table - A table can have any number of rows
- Rows change constantly
30Relational model primary key
- In general, a table should have one or more
columns defined as the primary key - The primary key is unique and non-null
- Usually only one column
- Can consist of more than one column (composite
primary key)
31Relational model foreign key
- A foreign key describes a relationship between
two tables - The foreign key column of tableA points to a
column in tableB - TableB is said to be the parent of tableA
- Often, the foreign key points at a primary key
32SQL
- Three types
- Query SELECT FROM emp WHERE deptno10 ORDER BY
ename - Data Modification Language (DML) Update, Insert
- Data Definition Language (DDL) Create Table
33Joins
- A Join is a Cartesian Cross-Product
- SELECT count() FROM emp
- SELECT count() FROM emp
- SELECT count() FROM emp, dept
- SELECT ename, emp.deptno, dept.deptno, dname FROM
emp, dept - SELECT ename, emp.deptno, dept.deptno, dname FROM
emp, dept WHERE emp.deptno dept.deptno
34Joins
- SELECT ename, dname FROM emp, dept WHERE
emp.deptno dept.deptno
35Entity Relationship Diagrams
- Used to create a map of your data
- Describes tables, attributes of tables and
relationships between tables - Come at it from two directions lay out the
entities, assign the attributes group the
attributes into entities
36XML vs. SQL Tables
- XML order matters! (Rows in a relational table
are unordered) - XML is a tree structure
- XML documents tend to be semi-structured, SQL
tables are highly structured - SQL tables arent as flexible or inter-changable
- XML joins arent straightforward
- An XML document/element doesnt serve multiple
purposes as well as DB schema/table
37Anti-Pattern XML Document Storage As A SQL BLOB
- Need to store XML in a database
- Better than storing in a filesystem
- Make a BLOB (Binary Large Object) and store a
document as an element in a row of a table - Problem cant query contents of document without
extracting it from the db! - For simple queries, might have to extract all of
the documents into the application very
inefficient
38Anti-Pattern One-off SQL Schema For A Particular
XML Document
- BLOB storage is bad, so why not shred the
document across multiple DB tables? - I.e., for XML elements named Dept make a Dept
table, Emp elements make a Emp table,
describe the hierarchy with foreign key
constraints - Query performance is much, much better probably
better than XPath against XML as a file
39Problems
- A lot of work! Have to do this for every XML
schema - Hard! XML schemas are inherently more flexible
than SQL schemas. Some mappings can be difficult - Negates flexibility of XML
- Not as learnable Programmers have to learn to
query your SQL schema, not just XPath and XML
Schema
40Anti-Pattern Developing A General Shredding
Solution For All Of XML
- Your Application has several XML schemas, and
its time-consuming to develop shredding for
all of them - So, you try to do a more generalized shredding
- Is possible, but is a very horizontal problem.
You probably wont get the time to solve it
completely. - Still might present learnability problems
41XML Datastore Architectures
- XML Views Of Relational Data
- Relational Wrappers Of XML
- Independent Storage of XML Documents (Native XML
Database) - Text Storage Of XML In RDBMS
- XML Shredding Across Relational Tables
- Storing XML as Objects in Object- Relational DBs
(Oracle XMLType) - Everything is XML (XQuery approach)
42Vocabulary XML Collections
- XML Collection is a collection of XML documents
- A row is to a table as an XML document is to an
XML collection.
43XML Derived From RDBMS
- Data exists naturally as relational data
- Needs to be represented as an XML document for
some reason - The derived XML is usually used for transmission
44Independent XML Store
- A specialized database is used to store XML
documents - Typically, an application will either have two
datastores relational and XML or relational
data will be stored as XML - In the dual datastore case, the application code
has to join the different data sets
45Text Storage Of XML In Databases
- XML is stored in a column of a relational table
- Allows you to mix structured and semi-structured
approaches - However, hard to query against the XML directly
46XML Shredding Across Tables
- An XML document is stored across many tables
- Is possible to use SQL queries against the
documents parts - Vendors can implement an XPath-to-SQL translator
- Can structure the tables based on a schema
- A pain to handle yourself
47Exercise
- Shred an XML schema across relational tables
48XMLType
- Object-Relational databases allow you to define
your own types - You could define an Address object, define
functions for the object, and store instances of
the object in a column in the database - Oracle defines an object-relational type,
XMLType, for the storage of XML documents in the
database - SELECT e.poDoc.getClobval() AS poXML FROM
po_xml_tab e WHERE e.poDoc.existsNode('/POPNAME
"po_2"') 1 - Shredding is managed encapsulated for you
49Oracle 9i XMLType, Text
50Everything Is XML (XQuery)
- With XMLType, Oracle says that everything fits in
to the Object-Relational realm - XQuery says that everything can be represented
and queried as XML - Relational data is derived from RDBMS using
SQL/XML
51XQuery
52SQL/XML
- select xmlelement("emp",
- 'Employee ' ,
- xmlelement( "name", e.job '
' e.ename), - ' was hired on ',
- xmlelement("hiredate",
e.hiredate)) as result - from emp e
- --------------------------------------------------
---------------------------- - ltempgtEmployee
- ltnamegtCLERK SMITHlt/namegt
- was hired on
- lthiredategt17-DEC-80lt/hiredategt
- lt/empgt
53Query Soup
- SQL queries relational data
- XPath queries a particular XML document
- SQL/XML a standard for deriving XML from
relational data - XQuery queries a collection of XML documents
and uses XPath to query particular XML documents
54Query Soup
- There are other XML query languages, such as
XML-QL and Quilt - XSLT is a transformation, not a query, language
- XSQL MS XML/SQL are wrapper languages
55Anti-Pattern Encapsulating SQL With XML
- General Rule Encapsulation is good, but
encapsulating good things is bad. - SQL is good
- (Also, HTML is good)
- Why hide SQL with XML?
56XML encapsulating SQL
- SELECT emp.ename, dept.dname FROM emp, dept WHERE
emp.deptno dept.deptno - ltsql-querygt
- ltfieldsgt
- ltfieldgt ename lt/fieldgt
- ltfieldgt dname lt/fieldgt
- lt/fieldsgt
- lttablesgt
- lttablegt emp lt/tablegt
- lttablegt dept lt/tablegt
- lt/tablesgt
- ltjoinsgt
- ltjoingtemp.deptno dept.deptno lt/joingt
- lt/joinsgt
- lt/sql-querygt
57Same concept in XQueryX
- XQueryX is an XML representation of an XQuery
The result is not particularly convenient for
humans to read and write, but it is easy for
programs to parse
58XQuery Example
- XQuery
- for b in doc("http//www.bn.com/bib.xml")/bib/b
ook where - b/publisher "Addison-Wesley" and
- b/_at_year gt 1991 return ltbook year
- " b/_at_year "gt b/title lt/bookgt
59XQueryX Version
- lt?xml version"1.0" encoding"UTF-8"?gt
lt?xml-stylesheet type"text/xsl"
href"xqueryx.xsl"?gt ltxqxmodule
xmlnsxsi"http//www.w3.org/2001/XMLSchema-instan
ce" xmlnsxqx"http//www.w3.org/2003/12/XQueryX"
xsischemaLocation"http//www.w3.org/2003/12/XQue
ryX xqueryx.xsd"gt ltxqxmainModulegt
ltxqxqueryBodygt ltxqxexpr xsitype"xqxelementCon
structor"gt ltxqxtagNamegtbiblt/xqxtagNamegt
ltxqxelementContentgt ltxqxexpr xsitype"xqxflwrE
xpr"gt ltxqxforClausegt ltxqxforClauseItemgt
ltxqxtypedVariableBindinggt ltxqxvarNamegtblt/xqxvar
Namegt lt/xqxtypedVariableBindinggt ltxqxforExprgt
ltxqxexpr xsitype"xqxpathExpr"gt ltxqxexpr
xsitype"xqxfunctionCallExpr"gt
ltxqxfunctionNamegtdocumentlt/xqxfunctionNamegt
ltxqxparametersgt ltxqxexpr xsitype"xqxstringCon
stantExpr"gt ltxqxvaluegtbib.xmllt/xqxvaluegt
lt/xqxexprgt lt/xqxparametersgt lt/xqxexprgt
ltxqxstepExprgt ltxqxxpathAxisgtchildlt/xqxxpathAxis
gt ltxqxelementTestgt ltxqxnodeNamegt
ltxqxQNamegtbiblt/xqxQNamegt lt/xqxnodeNamegt
lt/xqxelementTestgt lt/xqxstepExprgt ltxqxstepExprgt
ltxqxxpathAxisgtchildlt/xqxxpathAxisgt
ltxqxelementTestgt ltxqxnodeNamegt
ltxqxQNamegtbooklt/xqxQNamegt lt/xqxnodeNamegt
lt/xqxelementTestgt lt/xqxstepExprgt lt/xqxexprgt
lt/xqxforExprgt lt/xqxforClauseItemgt
lt/xqxforClausegt ltxqxwhereClausegt ltxqxexpr
xsitype"xqxoperatorExpr" xqxinfix"true"gt
ltxqxopTypegtANDlt/xqxopTypegt ltxqxparametersgt
ltxqxexpr xsitype"xqxoperatorExpr"
xqxinfix"true"gt ltxqxopTypegtlt/xqxopTypegt
ltxqxparametersgt ltxqxexpr xsitype"xqxpathExpr"
gt ltxqxexpr xsitype"xqxvariable"gt
ltxqxnamegtblt/xqxnamegt lt/xqxexprgt ltxqxstepExprgt
ltxqxxpathAxisgtchildlt/xqxxpathAxisgt
ltxqxelementTestgt ltxqxnodeNamegt
ltxqxQNamegtpublisherlt/xqxQNamegt lt/xqxnodeNamegt
lt/xqxelementTestgt lt/xqxstepExprgt lt/xqxexprgt
ltxqxexpr xsitype"xqxstringConstantExpr"gt
ltxqxvaluegtAddison-Wesleylt/xqxvaluegt \
60XQueryX Version (cont.)
- ltxqxQNamegtpublisherlt/xqxQNamegt lt/xqxnodeNamegt
lt/xqxelementTestgt lt/xqxstepExprgt lt/xqxexprgt
ltxqxexpr xsitype"xqxstringConstantExpr"gt
ltxqxvaluegtAddison-Wesleylt/xqxvaluegt lt/xqxexprgt
lt/xqxparametersgt lt/xqxexprgt ltxqxexpr
xsitype"xqxoperatorExpr" xqxinfix"true"gt
ltxqxopTypegtgtlt/xqxopTypegt ltxqxparametersgt
ltxqxexpr xsitype"xqxpathExpr"gt ltxqxexpr
xsitype"xqxvariable"gt ltxqxnamegtblt/xqxnamegt
lt/xqxexprgt ltxqxstepExprgt ltxqxxpathAxisgtchildlt/x
qxxpathAxisgt ltxqxattributeTestgt ltxqxnodeNamegt
ltxqxQNamegtyearlt/xqxQNamegt lt/xqxnodeNamegt
lt/xqxattributeTestgt lt/xqxstepExprgt lt/xqxexprgt
ltxqxexpr xsitype"xqxintegerConstantExpr"gt
ltxqxvaluegt1991lt/xqxvaluegt lt/xqxexprgt
lt/xqxparametersgt lt/xqxexprgt lt/xqxparametersgt
lt/xqxexprgt lt/xqxwhereClausegt ltxqxreturnClausegt
ltxqxexpr xsitype"xqxelementConstructor"gt
ltxqxtagNamegtbooklt/xqxtagNamegt
ltxqxattributeListgt ltxqxexpr xsitype"xqxattrib
uteConstructor"gt ltxqxattributeNamegtyearlt/xqxattr
ibuteNamegt ltxqxattributeValuegt ltxqxexpr
xsitype"xqxpathExpr"gt ltxqxexpr
xsitype"xqxvariable"gt ltxqxnamegtblt/xqxnamegt
lt/xqxexprgt ltxqxstepExprgt ltxqxxpathAxisgtchildlt/x
qxxpathAxisgt ltxqxattributeTestgt ltxqxnodeNamegt
ltxqxQNamegtyearlt/xqxQNamegt lt/xqxnodeNamegt
lt/xqxattributeTestgt lt/xqxstepExprgt lt/xqxexprgt
lt/xqxattributeValuegt lt/xqxexprgt
lt/xqxattributeListgt ltxqxelementContentgt
ltxqxexpr xsitype"xqxpathExpr"gt ltxqxexpr
xsitype"xqxpathExpr"gt ltxqxexpr
xsitype"xqxvariable"gt ltxqxnamegtblt/xqxnamegt
lt/xqxexprgt ltxqxstepExprgt ltxqxxpathAxisgtchildlt/x
qxxpathAxisgt ltxqxelementTestgt ltxqxnodeNamegt
ltxqxQNamegttitlelt/xqxQNamegt lt/xqxnodeNamegt
lt/xqxelementTestgt lt/xqxstepExprgt lt/xqxexprgt
lt/xqxexprgt lt/xqxelementContentgt lt/xqxexprgt
lt/xqxreturnClausegt lt/xqxexprgt
lt/xqxelementContentgt lt/xqxexprgt
lt/xqxqueryBodygt lt/xqxmainModulegt lt/xqxmodulegt
61When To Use Declarative XML
62XML As User Interface
- Development teams often forget to write use cases
for administrators, downstream developers - XML declarative languages must be considered as
ways to provide user interfaces - Documentation, examples important
- Usability, Learnability issues important
63Declarative XML In App. Lifecycle
- Compile-Time Best Performance, Least
Flexibility - Start-Time XML Parsed Once, Good Performance,
Good Flexibility. App must be restarted or
manually refreshed to get flexibility - Run-Time Can Impact Performance, But Great
Flexibility. Web Services Approach
64Start-Time Configuration
65Runtime Configuration
66Anti-Pattern The Overly Abstract Database
- By using XML, you can achieve runtime
configurability - Instead of hard coding SQL table names, do the
same job in XML configuration files - Problem database is so abstract that it is very
hard or impossible to do SQL queries against it
directly - Database is closed just a persistence extension
to the application - The application (and thus, the app dev team)
assumes all responsibility for any work with the
data
67The Split Persistence Data Model
- Applications Have Persistence Data Models
- Persistence Data Model any data stored on disk
- Everything might be in the DB
- Everything might be in XML docs
- Usually, there is some split i.e., 95 in a SQL
DB, but 5 in a properties file - Managing the split between DB-based data and
XML-based data is accidental complexity and can
be a headache
68Traditional Architecture
69Typical Architecture
70Joins between data models
- Can use XQuery to join the two data models
- Could use XPath enhanced SQL to join the two
models - Often, the two models are joined at the object
level
71Databases Have Meta Data
72Databases Have Metadata
- Column names, table names constraints are all
metadata - Defined at DB design time, which is usually app
design time - App is often hard coded against the database meta
data
73XML as DB Metadata
74Pros vs. Cons
- More flexible XML can change at runtime
- More complex at the application level
- Database must be more abstract
- Can make the application harder to extend
- Dont forget a DB table can also serve the same
purpose as an XML meta data doc
75Alternative Tables As Meta Data