Title: XML to Relational Mapping
 1XML to Relational Mapping 
 2XML-to-Relational Mapping
XML Translation Layer
Tuples
Relational Database System 
 3Where/How to Store XML Data ?
- File system 
- OODBMS 
- Semistructured DBMS Lore, etc. 
- XML DBMS eXcelon, Tamino, etc. 
- RDBMS/ORDBMS 
- CLOB(Character Large Object) column 
- Decomposing (Shredding) into tuples 
- XML Type
4Approaches to XML Data Storage Using an RDBMS
- Predefined set of table schema 
- Automatic generation of table schema for given 
 DTD
- Data mining approach
5Predefined Schema
- simple ad-hoc schemes 
- requires no input by the user or by the system 
 administrator
- works in the absence of type (e.g., DTD) 
- does not involve any analysis of the XML data 
- many variations possible 
- Edge-inlining 
- XRel 
- XParent
6Predefined SchemaFlor99 
 7XML Data Model 
- Ordered  labeled graph 
- Each XML element is represented by a node in the 
 graph the node is labeled with the oid of the
 XML object.
- Element-subelement relationships are represented 
 by edges in the graph and labeled by the name of
 the subelement.
- In order to represent the order of subelements of 
 an XML object, the outgoing edges of a node in
 the graph are also ordered.
- Values (e.g., strings) of an XML document are 
 represented as leaves in the graph.
8Example XML Data 
 9XML Data Model 
 10Mapping XML Data into Relational Table
- Mapping Edges 
- Edge Approach 
- Binary Approach 
- Universal Table Approach 
- Mapping Values 
- Separate Value Tables 
- Inlining 
- 3 x 2  6 possible mapping schemes 
11Edge Approach
- Store all edges of the XML document in a single 
 table Edge Table
Edge (source, ordinal, name, flag, target) 
 12Edge Approach (Contd)
- The index on the source column is useful for 
 forward traversals needed to reconstruct a
 specific object given its oid.
- The index on name, target is useful for 
 backward traversals
-  e.g.) find all objects that have a child 
 named John.
13Binary Approach
- Group all edges with the same label into one 
 table
- This approach corresponds to a horizontal 
 partitioning of the Edge Table, using name as the
 partitioning attribute.
14Universal Table Approach
- Generate a single Universal Table to store all 
 the edges.
- The Universal Table corresponds to the result of 
 a full outer join of all Binary Tables.
- The Universal Table has many fields which are set 
 to NULL.
- This Universal Table is denormalized.
15Separate Value Tables
- Store values in the separate Value Tables. 
- A separate Value Table for each conceivable data 
 type
- e.g., integer, date, ref, string, etc.
16Inlining 
- Store values and attributes in the same table 
- e.g., binary  inlining approach 
17Modified Edge-Inlining Approach
- XML Data Model ordered, labelled tree 
- Node id (i.e., source)  assigned in preorder 
 traversal of XML tree
- 1 Table 
- Node id (source) 
- Tag name (name) 
- Text (PCDATA) 
- Parent Node id (tagging) 
- Path (optional) 
- Type (element or attribute) 
- Level 
- Doc id
18Automatic Schema Generation out of DTDShan 99
- Assumption 
- XML document conforms to a scheme (DTD) 
- Transformation Approach 
- Three Techniques 
- Basic, Shared, Hybrid Inlining 
- DTD ? DTD Graph ? Element Graph ? Table Schema
19DTD to Relational Schema
- Naïve Approach 
- Each Element gt Relation 
- Each Attribute of Element gt Column of Relation 
- Connect elements using foreign keys 
- Fragmentation problem 
- Too many relations 
- Requires many joins in query processing
20Naïve Approach  Example
lt!ELEMENT author (name, address)gt lt!ATTLIST 
author id ID REQUIREDgt lt!ELEMENT name 
(firstname?, lastname)gt lt!ELEMENT firstname 
(PCDATA)gt lt!ELEMENT lastname (PCDATA)gt lt!ELEME
NT address ANYgt
author (authorID integer, id string) name 
(nameID integer, authorID integer) firstname 
(firstnameID integer, nameID integer, value 
string) lastname (lastnameID integer, nameID 
integer, value string) address (addressID 
integer, authorID integer, value string) 
 21Inlining
- Inlining 
- solves fragmentation problem 
- Basic inlining as many descendents of an element 
 as possible into a single relation
- Shared inlining the nodes with an in-degree of 1 
 (DTD graph)
- Hybrid 
- Basic  Shared 
- inlining the nodes with an in-degree gt 1 (DTD 
 graph)
- Issues 
- set-valued attribute 
- recursion
22DTD
? 
 23DTD Graph 
 24Element Graph for editor 
 25Basic Inlining Technique 
 26Shared Inlining Technique 
 27Hybrid Inlining Technique
- Same as Shared Inlining except 
- inlines some elements that are not inlined in 
 Shared
- inlines elements with in-degree greater than one 
- that are not recursive or 
- reach through a  node
28Data Mining Approach
- Data Pattern Analysis 
- Wang  Lis Semistructured Data Mining Algorithm 
- STORED(Semistructured To Relational Data) 
- a declarative language for description of 
 XML-to-Relational Mapping
- Overflow Mapping 
- definition for the objects not mapped to 
 relational data
- overflow graph 
- stored in the object repository of semistructured 
 data
29Data Pattern Analysis
STORED Mapping
XML Data
Algo
Overflow Mapping