Title: Views and Storing XML in Relational Databases
1Views and Storing XML in Relational Databases
- Susan B. Davidson
- University of Pennsylvania
- CIS330 Database Management Systems
- November 11, 2008
Most slide content courtesy of Zack Ives.
2What are views?
- We frequently want to reference data in a way
that differs from the way its stored - XML data ? HTML, text, etc.
- Relational data ? XML data
- Relational data ? Different relational
representation - XML data ? Different XML representation
- Generally, these can all be thought of as
different views over the data - A view is a named query
- It is the DB analog of methods with arguments in
PL - We will start with views in the same model
(relational, XML), and then address moving XML
data to a relational format.
3Views in SQL
- A view is first created (CREATE VIEW).
- We then use the name of the view to invoke the
query (treating it as if it were the relation it
returns)
Using the view SELECT FROM V, R WHERE V.B5
AND V.CR.C
Creating a view CREATE VIEW V(A,B,C) AS SELECT
A,B,C FROM R WHERE R.A 123
This expands in the query processor to SELECT
FROM (SELECT A,B,C FROM R WHERE
R.A 123) AS V, R WHERE V.B 5 AND V.C
R.C
4Views in XQuery
- In XQuery, views are defined as functions.
Creating a view declare function V() as
element(content) for r in
doc(R)/root/tree, a in r/a, b in r/b,
c in r/c where a 123 return
a, b, c
Using a view for v in V()/content, r in
doc(R)/root/treewhere v/b r/breturn v
5Why are views useful?
- In addition to describing transformations from
one schema to another, views have several common
uses - Providing security/access control
- We can assign users permissions on different
views - Can select rows or project out columns so we only
reveal what we want! - Can be used as relations in other queries
- Allows the user to query things that make more
sense
6Virtual vs. Materialized Views
- A virtual view is a named query that is actually
re-computed every time as shown on the previous
slide - CREATE VIEW V(A,B,C) AS
- SELECT A,B,C FROM R WHERE R.A 123
- A materialized view is one that is computed once
and its results are stored as a table - Think of this as a cached answer
- These are incredibly useful!
- Techniques exist for using materialized views to
answer other queries
SELECT FROM V, RWHERE V.B 5 AND V.C R.C
7Example
- Suppose V is defined as select from R where
R.A 123. If R is initially - Then the result of evaluating select from V
is - Suppose we delete (123, 8, World) from R. Then
the result of select from V should be
A B C 123 5
Atlas 456 10 Guide 123 8
World
A B C 123 5
Atlas 123 8 World
A B C 123 5
Atlas
8Views Should Stay Fresh
- Views behave, from the perspective of a query
language, exactly like base relations - But theres an association that should be
maintained - If tuples change in the base relation, they
should change in the view (whether its
materialized or not) - If tuples change in the view, that should reflect
in the base relation(s)
9View Maintenance and the View Update Problem
- There exist algorithms to incrementally recompute
a materialized view when the base relations
change - We can try to propagate view changes to the base
relations - This is not hard for the previous example.
- However, there are lots of views that arent
easily updatable - We can ensure views are updatable by enforcing
certain constraints (e.g., no aggregation),but
this limits the kinds of views we can have
R
S
R?S
10Publishing XML Views of Relational Data
- It can be done with SQL/XML, an extension of the
SQL standard (see
http//sqlxml.org) - (Dont confuse with old and lame SQLXML for SQL
Server.) - select xmlelement(name Customer,
- xmlelement(name CustID, c.CustId),
- xmlelement(name CustName, c.CustName),
- xmlelement(name City, c.City) )
- from customers c
- where c.Status preferred
- This is a very valuable tool for B2B
(business-to-business) data exchange. Available
in Oracle, DB2, SQL Server.
11Embedding XML in a Relational Database
- Straightforward solution add attributes of type
XML. Promoted by the same SQL/XML standard. - create table clients(
- id int primary key not null,
- name varchar(50),
- status varchar(10),
- contactinfo xml )
- Available in Oracle, DB2, SQL Server. Syntax may
vary. Above syntax is from DB2.
12Querying Relationally Embedded XMLwith SQL/XML
- SQL/XML standard specifies xmlexists and xmlquery
for embedding XPath and XQuery into SQL. DB2
syntax. - select name from clients
- where xmlexists('c/Client/Addresszip"95116"'
- passing clients.contactinfo as
"c") - select name,
- xmlquery('for e in c/Client/email return
e' - passing contactinfo as "c")
- from clients
- where status 'Gold
13SQL Extensions for XQuery (DB2)
- for y in
- db2-fnxmlcolumn('CLIENTS.CONTACTINFO')/Client/A
ddress - return y
- for y in db2-fnsqlquery('select contactinfo
- from clients
- where status''Gold''
- )/Client
- where y/Address/city"San Jose"
- return (
- if (y/email) then y/email
-
- else y/Address )
14More Uniform Ways for Storing XML in
an RDBMS Mapping Relational ? XML
- We know the following
- XML data is tree-like
- XML is SEMI-structured
- Theres some structured stuff, especially if it
follows a DTD - There is some unstructured stuff, eg. text
- Issues relate to describing XML structure,
particularly parent/child in a relational
encoding - Relations are flat
- Tuples can be connected via foreign-key/primary-
key links
15The Simplest Way to Encode a Tree
- Suppose we had
XYZ
14
- If we have no IDs, we CREATE values
- BinaryLikeEdge(key, label, type, value, parent)
What are shortcomings here?
16Improved Edge Approach (Florescu/Kossmann)
- Consider order, typing separate the values
- Vint(vid, value)
- Vstring(vid, value)
- Edge(parent, ordinal, label, flag, target)
17Inlining Techniques
- But can we do better if we know the schema?
- Folks at Wisconsin noted we can exploit the
structured aspects of semi-structured XML - Often the DTD has a lot of required (and often
singleton) child elements - Book(title, author, publisher)
- Recall how normalization worked
- Decompose until we have everything in a relation
determined by the keys - But dont decompose any further than that
- So we should try not to decompose XML beyond the
point of singleton children
18Details (Shanmugasundaram et al. )
- Start with DTD, build a graph representing
structure
tree
?
_at_id
content
_at_id
i-content
sub-content
- The edges are annotated with ?, indicating
optional, repetition of children (respectively) - They simplify the DTD to figure this out
19Building Schemas
- Now, they tried several alternatives that differ
in how they handle elements w/multiple ancestors - Can create a separate relation for each path
- Can create a single relation for each element
- Can try to inline these
- For tree examples, these are basically the same
- Combine non-set-valued things with parent
- Add separate relation for set-valued child
elements - Create new keys as needed
author
book
name
20Schemas for Our Example
- TheRoot(rootID)
- Content(parentID, id, _at_id)
- Sub-content(parentID, varchar)
- I-content(parentID, int)
- If we suddenly changed DTD to content(sub-content, i-content?) what would
happen?
21XQuery to SQL
- Inlining method needs external knowledge about
the schema - Needs to supply the tags and info not stored in
the tables - We can actually directly translate simple XQuery
into SQL over the relations instead of
reconstructing the XML then querying it.
22An Example
- for X in document(mydoc)/tree/contentwhere
X/sub-content XYZreturn X - The steps of the path expression are generally
joins - Except that some steps are eliminated by the
fact weve inlined subelements - Lets try it over the schema
- TheRoot(rootID)
- Content(parentID, id, _at_id)
- Sub-content(parentID, varchar)
- I-content(parentID, int)
23Native XML Storage
- Storing XML in relations incurs inefficiencies
- Storage techniques optimized for path retrieval
may be better appropriate indexes may play a
role - Techniques are already making it into commercial
systems - Oracle 11g XML DB provides a high-performance,
native XML storage and retrieval technology. - DB2 9 pureXML is a native XML database for your
most demanding XML data server needs.
24Summary
- Weve seen that views are useful things, and are
heavily used within database applications as well
as for data exchange. - Views allow us to store and refer to the results
of a query (materialized vs. virtual views) - Weve seen an example of a view that changes from
XML to relations and weve even seen how such a
view can be posed in XQuery and unfolded into
SQL - We have also seen how to define XML views of
relational data