A New Inlining Algorithm for Mapping XML DTDs to Relational Schema

1
A New Inlining Algorithm for Mapping XML DTDs to
Relational Schema
  • Speaker: Shiyong Lu
  • Email: shiyong_at_cs.wayne.edu
  • Wayne State University
  • Joint work with Yezhou, Mustafa and Farshad

2
Introduction
  • XML is rapidly emerging on the World Wide Web as
    a standard for representing and exchanging data.
  • The number of XML documents is increasing each
    day.
  • It is critical to store and query XML documents
    efficiently and effectively.

3
Current approaches to storing and querying XML
documents
  • Native XML repositories, e.g., Software AG's
    Tamino [2] and eXcelon's XIS [1].
  • XML support enabled by commercial database
    systems such as SQL Server, Oracle, and DB2, in
    which XMLType is introduced.
  • Using an RDBMS/ODBMS to store and query XML
    documents [8, 10, 16, 11].

4
Issues of the relational approach
  • XML data model needs to be mapped into the
    relational model
  • XML queries need to be translated into SQL
    queries
  • Query results need to be tagged to XML format.

5
Our contributions
  • We proposed a new inlining algorithm to map DTDs
    to relational schemas.
  • Improvements over the shared-inlining algorithm
    [16]:
    • Completeness
    • Redundancy elimination for shared nodes
    • Optimizations
    • Efficiency

6
Outline of the talk
  • Introduction of XML DTDs
  • Mapping DTDs to relational schemas
    • Simplifying DTDs
    • Creating and inlining DTD graphs
    • Generating relational schemas
  • An example
  • Conclusions and future work

7
An overview of DTDs: A DTD example
  • <!DOCTYPE memo
  • <!ELEMENT memo (to, from, date, subject?, body)>
  • <!ATTLIST memo security CDATA>
  • <!ATTLIST memo lang CDATA>
  • <!ELEMENT to (#PCDATA)>
  • <!ELEMENT from (#PCDATA)>
  • <!ELEMENT date (#PCDATA)>
  • <!ELEMENT subject (#PCDATA)>
  • <!ELEMENT body (para)>
  • <!ELEMENT para (#PCDATA)>

8
DTD: Document Type Definition
  • <!DOCTYPE root-element [ doctype-declaration... ]>
  • <!ELEMENT element-name content-model>, content
    model operators: |, ,, *, +, ?
  • <!ATTLIST element-name attr-name attr-type
    attr-default ...>

9
DTD: Document Type Definition (cont.)
  • <!ATTLIST element-name attr-name attr-type
    attr-default ...> declares which attributes are
    allowed or required in which elements.
  • Attribute types:
    • CDATA: any value is allowed (the default)
    • (value|...): enumeration of allowed values
    • ID, IDREF, IDREFS: ID attribute values must be
      unique (contain "element identity"); IDREF
      attribute values must match some ID (reference
      to an element)
    • ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION:
      just forget these... (consider them deprecated)
  • Attribute defaults:
    • #REQUIRED: the attribute must be explicitly
      provided
    • #IMPLIED: the attribute is optional, no default
      provided
    • "value": if not explicitly provided, this value
      is inserted by default
    • #FIXED "value": as above, but only this value
      is allowed

10
Mapping DTDs to relational schemas
  • Simplifying DTDs
  • Creating and inlining DTD graphs
  • Generating relational schemas

11
Simplifying DTDs
  • A DTD might be very complex due to nesting, e.g.,
    <!ELEMENT a ((b, c, d?)?, (e?, f, (g, h?))?)>
  • An XML query language is concerned with:
    • The parent-child relationships between XML
      elements
    • The relative order relationships between
      siblings (add an ordinal attribute to each
      relation)

12
DTD simplification rules
  • 1. e+ → e*
  • 2. e? → e
  • 3. (e1 | … | en) → (e1, …, en)
  • 4. (a) (e1, …, en)* → (e1*, …, en*)
  •    (b) e** → e*
  • 5. (a) …, e, …, e, … → …, e*, …, …
  •    (b) …, e, …, e*, … → …, e*, …, …
  •    (c) …, e*, …, e, … → …, e*, …, …
  •    (d) …, e*, …, e*, … → …, e*, …, …

13
Example of simplifying a DTD
  • <!ELEMENT a ((b, c, d?)?, (e?, f, (g, h?))?)>
  • simplified to
  • <!ELEMENT a (b, c, d, e, f, g, h)>
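The example above can be reproduced with a small sketch. It assumes one reading of the simplification rules: e+ becomes e*, optional markers and choices lose their operators, a star distributes over a sequence, and a repeated element collapses into a single starred one. Some operator symbols are lost in this transcript, so that reading is an assumption. The names Elem, Seq, Choice, Rep, simplify and render are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Elem:
    name: str


@dataclass(frozen=True)
class Seq:
    items: Tuple["Node", ...]


@dataclass(frozen=True)
class Choice:
    items: Tuple["Node", ...]


@dataclass(frozen=True)
class Rep:
    item: "Node"
    op: str  # one of "?", "*", "+"


Node = Union[Elem, Seq, Choice, Rep]


def simplify(node: Node, starred: bool = False,
             out: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Flatten a content model into {element name: is_starred} (assumed rules)."""
    if out is None:
        out = {}
    if isinstance(node, Elem):
        # A name seen before collapses into one starred occurrence.
        out[node.name] = starred or node.name in out
    elif isinstance(node, Rep):
        # "+" and "*" both yield a star; "?" is dropped.
        simplify(node.item, starred or node.op in ("*", "+"), out)
    else:
        # Sequences and choices are both flattened into one sequence.
        for child in node.items:
            simplify(child, starred, out)
    return out


def render(model: Dict[str, bool]) -> str:
    return "(" + ", ".join(n + ("*" if s else "") for n, s in model.items()) + ")"


E = Elem
slide_example = Seq((
    Rep(Seq((E("b"), E("c"), Rep(E("d"), "?"))), "?"),
    Rep(Seq((Rep(E("e"), "?"), E("f"), Seq((E("g"), Rep(E("h"), "?"))))), "?"),
))
print(render(simplify(slide_example)))  # (b, c, d, e, f, g, h)
```

A dictionary keeps the position of the first occurrence of each name, which is why a repeated element ends up starred at its first position.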

14
Creating and inlining DTD graphs
  • We create a DTD graph based on the simplified
    DTD. In the graph, nodes represent XML elements,
    and edges represent operators.
  • Idea: inline a child c into its parent p if p can
    contain at most one occurrence of c.
  • Rationale: inlined elements will produce a
    relation.

15
Inlining DTD graphs

16
Inlining
  • Case 1: Element a is connected to b by a ,-edge
    and b has no other incoming edges: inline b into
    a.
  • Case 2: Element a is connected to b by a ,-edge
    but b has other incoming edges: b is a shared
    node, so no inlining.
  • Case 3: Element a is connected to b by a *-edge:
    no inlining.

17
Inlinable node
  • Definition 2: Given a DTD graph, a node is
    inlinable if and only if it has exactly one
    incoming edge and that edge is a ,-edge.
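As a minimal sketch of this definition, the DTD graph can be held as a list of labeled edges. The representation (parent, child, label), the function name inlinable_nodes, and the sample graph are all hypothetical; the sketch also assumes that edges other than ,-edges carry the label "*".

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (parent, child, label), label is "," or "*"


def inlinable_nodes(edges: List[Edge]) -> Set[str]:
    """Nodes with exactly one incoming edge, where that edge is a ,-edge."""
    incoming: Dict[str, List[str]] = defaultdict(list)
    for _parent, child, label in edges:
        incoming[child].append(label)
    return {node for node, labels in incoming.items() if labels == [","]}


edges = [
    ("a", "b", ","),  # b has a single incoming ,-edge
    ("a", "c", ","),  # c is shared: it is also a child of d
    ("d", "c", ","),
    ("a", "d", "*"),  # d is reached by a *-edge
]
print(sorted(inlinable_nodes(edges)))  # ['b']
```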

18
Inlinable tree
  • Given a DTD graph and a node e in the graph,
    node e and all other inlinable nodes that are
    reachable from e by ,-edges constitute a tree.
    This tree is called the inlinable tree for node
    e (rooted at e).
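The inlinable tree can be collected with a simple traversal, sketched below. The sketch assumes that a path only counts if every node after the root is itself inlinable, and that edges other than ,-edges are labeled "*". The edge-list representation, the function name inlinable_tree, and the book/author graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (parent, child, label), label is "," or "*"


def inlinable_tree(edges: List[Edge], root: str) -> Set[str]:
    """Root plus the inlinable nodes reachable from it through ,-edges."""
    children: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    in_labels: Dict[str, List[str]] = defaultdict(list)
    for parent, child, label in edges:
        children[parent].append((child, label))
        in_labels[child].append(label)

    tree = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child, label in children[node]:
            # Follow only ,-edges into nodes whose sole incoming edge is this one.
            if label == "," and in_labels[child] == [","] and child not in tree:
                tree.add(child)
                stack.append(child)
    return tree


edges = [
    ("book", "title", ","),
    ("book", "author", "*"),
    ("author", "name", ","),
    ("author", "address", ","),
    ("address", "city", ","),
    ("book", "publisher", ","),
    ("journal", "publisher", ","),  # publisher is shared
]
print(sorted(inlinable_tree(edges, "book")))    # ['book', 'title']
print(sorted(inlinable_tree(edges, "author")))  # ['address', 'author', 'city', 'name']
```

Each node is pushed onto the stack at most once, so one call does work proportional to the part of the graph it visits.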

19
Complexity of inlining
  • Theorem 2 (Complexity): Our inlining algorithm
    can be performed in O(n) time, where n is the
    number of elements in the input DTD.

20
The inlining procedure

21
The inlining procedure (cont)

22
Generating relational schema
  • For each node e, a relation e is generated with
    the following attributes:
    • ID, which is the primary key; and, for each XML
      attribute A of e, a corresponding relational
      attribute A with the same name.
    • If |e.inlinedSet| > 2, an attribute nodetype
      that indicates the type of the XML element.
    • The names of all the terminal XML elements in
      e.inlinedSet.
    • If there is a ,-edge from e to node c, c.ID as
      a foreign key of e referencing relation c.
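The attribute list for one node can be sketched as follows. The function relation_columns and its parameters are hypothetical. The sketch assumes that e.inlinedSet contains e itself, that the nodetype threshold is "more than two" as transcribed, and that the memo DTD from the earlier example is read literally, so every child of memo is inlined.

```python
from typing import Iterable, List, Sequence, Set


def relation_columns(xml_attributes: Sequence[str],
                     inlined_set: Sequence[str],
                     terminals: Set[str],
                     comma_targets: Iterable[str]) -> List[str]:
    """Column names of the relation generated for one node (assumed layout)."""
    columns = ["ID"]                      # primary key
    columns += list(xml_attributes)       # one column per XML attribute
    if len(inlined_set) > 2:              # threshold as given in the transcript
        columns.append("nodetype")
    # One column per terminal element that was inlined into this node.
    columns += [name for name in inlined_set if name in terminals]
    # One foreign key per remaining ,-edge to another relation.
    columns += [f"{child}.ID" for child in comma_targets]
    return columns


memo = relation_columns(
    xml_attributes=["security", "lang"],
    inlined_set=["memo", "to", "from", "date", "subject", "body", "para"],
    terminals={"to", "from", "date", "subject", "para"},
    comma_targets=[],
)
print(memo)
# ['ID', 'security', 'lang', 'nodetype', 'to', 'from', 'date', 'subject', 'para']

# Hypothetical node with a ,-edge to a shared node "publisher".
book = relation_columns([], ["book", "title"], {"title"}, ["publisher"])
print(book)  # ['ID', 'title', 'publisher.ID']
```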

23
Generating relational schema (cont)
  • If there are at least two relations t_1(ID) and
    t_2(ID) generated by step 1, then we combine all
    the relations of the form t(ID) into one single
    relation table1(ID, nodetype)
  • If there are at least two relations t_1(ID, t_1)
    and t_2(ID, t_2) generated by step 1, then we
    combine all the relations of the form t(ID, t)
    into one single relation table2(ID, nodetype,
    pcdata)
  • If there is at least one edge in the inlined
    DTD graph, then we introduce relation
    edge(parentID, childID, parentType, childType).
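These combination steps can be sketched over a dictionary that maps relation names to column lists. The function combine, the has_edges flag, and the sample relations are hypothetical. The sketch assumes that "of the form t(ID, t)" means a relation whose only columns are ID and a column named after the relation itself.

```python
from typing import Dict, List


def combine(relations: Dict[str, List[str]], has_edges: bool) -> Dict[str, List[str]]:
    """Merge ID-only and (ID, own-name) relations, then add the edge relation."""
    id_only = [name for name, cols in relations.items() if cols == ["ID"]]
    id_text = [name for name, cols in relations.items() if cols == ["ID", name]]

    result = dict(relations)
    if len(id_only) >= 2:
        for name in id_only:
            del result[name]
        result["table1"] = ["ID", "nodetype"]
    if len(id_text) >= 2:
        for name in id_text:
            del result[name]
        result["table2"] = ["ID", "nodetype", "pcdata"]
    if has_edges:
        result["edge"] = ["parentID", "childID", "parentType", "childType"]
    return result


relations = {
    "book": ["ID", "title"],
    "br": ["ID"],
    "hr": ["ID"],
    "name": ["ID", "name"],
    "email": ["ID", "email"],
}
print(sorted(combine(relations, has_edges=True)))
# ['book', 'edge', 'table1', 'table2']
```

The nodetype column in table1 and table2 records which original element each row came from, so the merged relations stay distinguishable.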

24
Improvement over the shared-inlining algorithm
  • Completeness
  • Redundancy elimination for shared nodes
  • Optimizations
  • Efficiency
  • See the next slide for examples.

25
Examples

26
A complete example

27
DTD graph

28
Inlined DTD graph

29
Generated relational schema

30
Conclusions
  • We have developed a new inlining algorithm that
    maps a given input DTD to a relational schema.
  • We made several improvements over the
    shared-inlining algorithm. Experimental results
    will be presented in an upcoming paper.

31
Future work
  • Lossless schema mapping: how can the sibling
    order relationship be maintained as well, so
    that the original XML document can be
    reconstructed?
  • Maintain the ID/IDREF/IDREFS in terms of key and
    foreign key constraints.