Title: Management of XML Documents in Object-Relational Databases
1Management of XML Documents in Object-Relational
Databases
- Thomas Kudrass Matthias Conrad
- HTWK Leipzig
EDBT-WorkshopXML-Based Data ManagementPrague,
24 March 2002
2Overview
- Motivation
- Object-Relational Database Concepts
- Parsing XML Documents
- XML-to-ORDB Mapping
- Meta-Data
- Special Issues
- Conclusions
3Motivation
- Storing of XML documents in DBMS
- Use existing database technology
- Dealing with complex objects
- XML documents complex objects
- avoid any decomposition
- object-relational database technology good choice
to represent complex objects
4User-Defined Types in ORDB
- Complex Data Types
- Object Type
- Collection Type
- Object References
- Object Views
5Example Object Types
CREATE TYPE Type_Professor AS OBJECT
( PName VARCHAR(80), Subject VARCHAR(120)
) object-valued object table
attribute CREATE TYPE Type_Course AS OBJECT (
CREATE TABLE TabProfessor OF Name
VARCHAR(100), Type_Professor Professor Type_Pr
ofessor )
6Example Collection Types
CREATE TYPE Type_Professor AS OBJECT
( PName VARCHAR(80), Subject VARCHAR(120)
) Array Nested
Table CREATE TYPE TypeVa_ Professor AS
CREATE TYPE Type_TabProfessor AS
VARRAY(5) OF Type_Professor
TABLE OF Type_Professor  CREATE TABLE
TabDept ( DName VARCHAR(80), Professor
Type_TabProfessor ) NESTED TABLE
Professor STORE AS TabProfessor_List
7Example Object References
CREATE TYPE Type_Professor AS OBJECT
( PName VARCHAR(80), Dept VARCHAR(120)
) Â CREATE TABLE TabProfessor OF
Type_Professor  CREATE TYPE Type_Course AS
OBJECT ( Name VARCHAR(200), Prof_Ref REF
Type_Professor ) Â CREATE TABLE TabCourse OF
Type_Course
Reference to objects of object table TabProfessor
8Parsing DTD and XML
XML Document
DTD
Well-Formedness Validity Check
Syntax Check
XML V2 Parser
DTD Parser
XML DOM Tree
DTD DOM Tree
Schema Definition
XML2 Oracle
JDBC / ODBC
DBMS Oracle
91 lt!ELEMENT University
(StudyCourse,Student)gt 2 lt!ELEMENT
Student (LName,FName,Course)gt 3
lt!ATTLIST Student StudNr CDATA REQUIREDgt 4
lt!ELEMENT Course (Name,Professor,CreditPts?)
gt 5 lt!ELEMENT Professor
(PName,Subject,Dept)gt 6 lt!ENTITY cs
Computer Sciencegt 7 lt!ELEMENT LName
(PCDATA)gt 8 lt!ELEMENT FName
(PCDATA)gt 9 lt!ELEMENT Name
(PCDATA)gt 10 lt!ELEMENT CreditPts
(PCDATA)gt 11 lt!ELEMENT PName
(PCDATA)gt 12 lt!ELEMENT Subject
(PCDATA)gt 13 lt!ELEMENT Dept
(PCDATA)gt 14 lt!ELEMENT StudyCourse
(PCDATA)gt
10ObjectBasedMapping
- Modification of the Mapping Algorithm Bourret
- ? No class definitions
- ? Use objects of the DTD tree
111 lt!ELEMENT University
(StudyCourse,Student)gt 2 lt!ELEMENT
Student (LName,FName,Course)gt 3
lt!ATTLIST Student StudNr CDATA REQUIREDgt 4
lt!ELEMENT Course (Name,Professor,CreditPts?
)gt 5 lt!ELEMENT Professor
(PName,Subject,Dept)gt 6 lt!ENTITY cs
Computer Sciencegt 7 lt!ELEMENT LName
(PCDATA)gt 8 lt!ELEMENT FName
(PCDATA)gt 9 lt!ELEMENT Name
(PCDATA)gt 10 lt!ELEMENT CreditPts
(PCDATA)gt 11 lt!ELEMENT PName
(PCDATA)gt 12 lt!ELEMENT Subject
(PCDATA)gt 13 lt!ELEMENT Dept
(PCDATA)gt 14 lt!ELEMENT StudyCourse
(PCDATA)gt
Step 1
- Each Complex Element ? Table
- Each Set-Valued Element ? Table
- Primary Key in each Table
1 lt!ELEMENT University (StudyCourse,Student)gt CR
EATE TABLE TabUniversity ( IDUniversity  2
lt!ELEMENT Student (LName,FName,Course)gt CREATE
TABLE TabStudent ( IDStudent  4
lt!ELEMENT Course (Name,Professor,CreditPts?)gt
CREATE TABLE TabCourse ( IDCourse  5
lt!ELEMENT Professor (PName,Subject,Dept)gt
CREATE TABLE TabProfessor (
IDProfessor CREATE TABLE TabSubject
( IDSubject
121 lt!ELEMENT University
(StudyCourse,Student)gt 2 lt!ELEMENT
Student (LName,FName,Course)gt 3
lt!ATTLIST Student StudNr CDATA REQUIREDgt 4
lt!ELEMENT Course (Name,Professor,CreditPts?)
gt 5 lt!ELEMENT Professor
(PName,Subject,Dept)gt 6 lt!ENTITY cs
Computer Sciencegt 7 lt!ELEMENT LName
(PCDATA)gt 8 lt!ELEMENT FName
(PCDATA)gt 9 lt!ELEMENT Name
(PCDATA)gt 10 lt!ELEMENT CreditPts
(PCDATA)gt 11 lt!ELEMENT PName
(PCDATA)gt 12 lt!ELEMENT Subject
(PCDATA)gt 13 lt!ELEMENT Dept
(PCDATA)gt 14 lt!ELEMENT StudyCourse
(PCDATA)gt
Step 2
Other Elements Attributes ? Table Columns
CREATE TABLE TabCourse ( IDCourse, attrName, at
trCreditPts, CREATE TABLE TabProfessor
( IDProfessor, attrPName, attrDept, CREATE
TABLE TabSubject ( IDSubject, attrSubject,
CREATE TABLE TabUniversity ( IDUniversity, attrS
tudyCourse, Â CREATE TABLE TabStudent
( IDStudent, attrStudNr, attrLName, attrFName,
 CREATE TABLE TblMatrikelNr ( IDMatrikelNr, at
trMNummer,
13Step 3
Relationships between Elements ? Foreign Keys
CREATE TABLE TabUniversity ( IDUniversity
INTEGER NOT NULL, attrStudyCourse
VARCHAR(4000) NOT NULL, PRIMARY KEY
(IDUniversity)) Â CREATE TABLE TabStudent
( IDStudent INTEGER NOT
NULL, IDUniversity INTEGER NOT
NULL, attrStudNr VARCHAR(4000) NOT
NULL, attrLName VARCHAR(4000) NOT
NULL, attrFName VARCHAR(4000) NOT
NULL, PRIMARY KEY (IDStudent), CONSTRAINT
conMatrikel FOREIGN KEY (IDUniversity)
REFERENCES TabUniversity (IDUniversity)) Â ...
14ORDBS Oracle and XML
- Basic Idea
- Generate an object-relational schema from the DTD
- Natural representation of an XML document by
combining user-defined types - Different Mapping Rules
- Simple elements
- Complex elements
- Set-valued elements
- Complex set-valued elements
15XML Attributes Simple Elements
- Elements of PCDATA type and XML attributes
- ? Attributes of the object type
- Domain of Simple Elements
- No type information in the DTD
- numeric vs. alphanumeric?
- length?
- Restrictions of the DBMS (e.g. VARCHAR Oracle
4000 characters) - Mapping of an XML attribute of a simple element
- ? Definition of an object type for both attribute
and element
16XML Attributes Simple Elements
lt!ELEMENT Professor (PName,Subject,Dept)gt lt!ATTLIS
T Professor PAddress CDATA REQUIREDgt lt!ELEMENT
PName (PCDATA)gt lt!ELEMENT Subject
(PCDATA)gt lt!ELEMENT Dept (PCDATA)gt lt!ATTLIST
Dept DAddress CDATA REQUIREDgt
CREATE TABLE TabProfessor OF
Type_Professor CREATE TYPE Type_Professor AS
OBJECT ( attr PAddress VARCHAR(4000),
attrPName VARCHAR(4000), attrSubject
VARCHAR(4000), attrDept Type_Dept) CREA
TE TYPE Type_Dept AS OBJECT ( attrDept
VARCHAR(4000), attrDAddress VARCHAR(4000))
Â
17Complex Elements
Nesting of elements by composite DB object types
CREATE TABLE TabUniversity ( attrStudyCourse
VARCHAR(4000), attrStudent Type_Matrikel
) Â CREATE TYPE Type_Student AS OBJECT
( attrStudNr VARCHAR(4000), attrLName VARCHAR(40
00), attrFName VARCHAR(4000), attrCourse
Type_Vorlesung ) Â CREATE TYPE Type_Course AS
OBJECT ( attrName VARCHAR(4000), attrProfessor T
ype_Professor, attrCreditPts VARCHAR(4000)) Â C
REATE TYPE Type_Professor AS OBJECT
( attrPName VARCHAR(4000), attrSubject
VARCHAR(4000), attrDept VARCHAR(4000))
INSERT INTO TabUniversity VALUES ( Computer
Science' , Type_Student('23374','Conrad','Matthi
as', Type_Course(Databases II,
Type_Professor(Kudrass ,
Database Systems',
Computer Science), '4')))
SELECT u.attrStudent.attrLname FROM
TabUniversity u WHERE u.attrStudent.attrCourse.att
rProfessor.attrPName Kudrass'
18Set-Valued Elements
- Multiple Occurrence (in DTD) marked by or
- DBMS Restrictions
- collection type applicable to set-valued elements
with text-valued subelements, e.g. ARRAY OF
VARCHAR - collection type not applicable to set-valued
elements with complex subelements - subelements may be set-valued again
- Solutions
- use newer DBMS releases (e.g. Oracle 9i)
- model relationships with object references
19Set-Valued Elements
lt!ELEMENT University (StudyCourse,Student)gt
Reference to University Objects
CREATE TYPE Type_Student AS OBJECT (
attrJahrgang VARCHAR(4000) ,
attrUniversity REF Type_University
) Â CREATE TABLE TabStudent OF
Type_Student  CREATE TYPE Type_University AS
OBJECT( attrStudyCourse
VARCHAR(4000)) Â CREATE TABLE TabUniversity OF
Type_University
- Set-valued element Student
- Modeling in object type Type_Student with a
reference - to objects of the table TabUniversity
20Set-Valued Elements
CREATE TYPE TypeVA_Course AS VARRAY(100) OF
Type_Course CREATE TYPE TypeVA_Professor AS
VARRAY(100) OF Type_Professor CREATE TYPE
TypeVA_Subject AS VARRAY(100) OF
VARCHAR(4000) CREATE TABLE TabUniversity
( attrStudyCourse VARCHAR(4000), attrStudent
Type_Matrikel ) Â CREATE TYPE Type_Student AS
OBJECT ( attrStudNr VARCHAR(4000), attrLName VAR
CHAR(4000), attrFName VARCHAR(4000), attrCourse
Type_Vorlesung ) Â CREATE TYPE Type_Course AS
OBJECT ( attrName VARCHAR(4000), attrProfessor T
ype_Professor, attrCreditPts VARCHAR(4000)) Â C
REATE TYPE Type_Professor AS OBJECT
( attrPName VARCHAR(4000), attrSubject
VARCHAR(4000), attrDept VARCHAR(4000))
21Set-Valued ElementsExample
INSERT INTO TabUniversity VALUES ( Computer
Science' , TypeVA_Student (
Type_Student('23374','Conrad','Matthias',
TypeVA_Course ( Type_Course(Databas
es II, TypeVA_Professor (
Type_Professor(Kudrass ,
TypeVA_Subject ( Database
Systems,Operating Systems), Computer
Science)),4), Type_Course(CAD
Intro, TypeVA_Professor (
Type_Professor(Jaeger ,
TypeVA_Subject ( CAD,CAE),
Computer Science)),4),
...)), Type_Student(00011',Meier',Ralf', )
) ...)
22Dealing with Null Values
- Restrictions with NOT NULL constraints in
object-relational DB schema - NOT NULL constraints in table - not in object
type! - NOT NULL constraints not applicable to collection
types - Object-valued attributes
- use CHECK constraints for NOT NULL
- Loss of DTD semantics DTD in the database
23Dealing with CHECK Constraints
lt!ELEMENT Course (CName, Address?)gt lt!ELEMENT
Addresse (Street, City?)gt    CREATE TYPE
Type_Address AS OBJECT ( attrStreet VARCHAR(4000)
, attrCity VARCHAR(4000)) Â CREATE TYPE
Type_Course AS OBJECT ( attrName VARCHAR(4000),
attrAddress Type_Address) Â CREATE TABLE
TabCourse OF Type_Course ( attrName NOT
NULL, CHECK (attrAdresse.attrStrasse IS NOT
NULL))
 // ORA-02290 Desired error message 1.
INSERT INTO TabCourse ( VALUES (CAD
Intro,Type_Address
(NULL,Leipzig) Â // ORA-02290
Undesired error message 2. INSERT INTO
TabCourse ( VALUES ('RN', NULL)
24Meta-Data about XML Documents
- Unique DocumentID for each Document
- Prolog Information
- Document Location (URL)
- Name Space
- Element vs. Attribute
25Naming Conventions for DB Objects
- Rules
- TabElementname ? Table Name
- Type_Elementname ? Object Type Name
- TypeVa_Elementname ? Array Name
- No Conflicts with Keywords
- Introduction of a Schema ID
- Naming Rule
- SchemaID Naming Convention Name
CREATE TYPE DTD01_Type_University
CREATE TYPE DTD02_Type_University AS OBJECT (
AS OBJECT (
attrStudyCourse VARCHAR(4000) )
attrRegister VARCHAR(4000) )
26Conclusions Advantages
- Non-atomic domains possible
- Natural representation of XML Documents
- Nesting of any complexity possible
- Simple queries by using dot notation
- Using object references to represent
relationships (OIDs)
27Conclusions Drawbacks
- Mapping Deficiencies
- Possible restrictions of element types in
collections - No adequate mapping of NOT NULL constraints
- Loss of Information
- Prolog, Comments, Processing Instructions, Prolog
- Entity References
- Attribute vs. Element ?
- Schema Evolution
- Modification of DTD ? Modification of DB
- Type Information
- Target type VARCHAR - not sufficient!
28Outlook
- Graph-based creation of a schema
- Source XML Schema
- Use CLOB datatype
- Enhance Meta-Schema
- Comments, Processing Instructions and their
position in document - Entity references and their substitution text