Relational Databases for Querying XML Documents: Limitations and Opportunities presentation

Title: Relational Databases for Querying XML Documents: Limitations and Opportunities

1
Relational Databases for Querying XML
Documents: Limitations and Opportunities

Presented by
Yi Lu

2
Introduction

XML is fast emerging as the dominant standard for
representing data on the World Wide Web.
The initial purpose of XML is to improve the
ability to exchange data over the Internet.
This raises the problem of how to query the
contents of XML documents.

3
Approaches for querying XML documents

Use semi-structured query languages and query
evaluation techniques.
Use a relational database to store and query XML
documents.
Use native XML repositories, e.g., Software AG's
Tamino, eXcelon's XIS (summarized in Lu's paper).

4
Processes used in the relational approach

Process an XML DTD to generate a relational schema
Parse XML documents conforming to DTDs and load
them into tuples of relational tables in a
standard commercial DBMS (DB2, Oracle)
Translate semi-structured queries over XML
documents into SQL queries over the relational
database
Convert the results back to XML format.
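
The four steps above can be sketched end to end. This is only a minimal illustration using Python's standard `sqlite3` and `xml.etree.ElementTree` modules instead of a commercial DBMS; the document, the `book` table, and the function name `run_pipeline` are hypothetical, and the schema is written by hand rather than derived from a DTD.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input document; the element names are illustrative only.
DOC = """<books>
  <book><title>XML Basics</title><author>Ann</author></book>
  <book><title>SQL Basics</title><author>Bob</author></book>
</books>"""


def run_pipeline(doc, wanted_author):
    conn = sqlite3.connect(":memory:")
    # Step 1: a relational schema (hand-written here, not generated from a DTD).
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
    )
    # Step 2: parse the document and load one tuple per <book> element.
    for book in ET.fromstring(doc).findall("book"):
        conn.execute(
            "INSERT INTO book (title, author) VALUES (?, ?)",
            (book.findtext("title"), book.findtext("author")),
        )
    # Step 3: the SQL that a query for the titles of one author's books
    # might be translated to.
    rows = conn.execute(
        "SELECT title FROM book WHERE author = ? ORDER BY id", (wanted_author,)
    ).fetchall()
    conn.close()
    # Step 4: wrap the result tuples back into XML.
    result = ET.Element("result")
    for (title,) in rows:
        ET.SubElement(result, "title").text = title
    return ET.tostring(result, encoding="unicode")
```

Calling `run_pipeline(DOC, "Ann")` returns a small `<result>` document holding the matching title.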

5
Outline of the talk

XML background
Mapping XML DTD to relational schema
Basic inlining techniques
Shared inlining techniques
Hybrid inlining techniques
Some experiments
Translating semi-structured queries into SQL
queries
Converting results to XML format.
Conclusions and future work

6
XML background

Extensible Markup Language
DTDs and other XML schemas
DCD (Document Content Descriptor)
XML Schemas
Semi-structured query languages
XML-QL, Lorel, UnQL, XQL
Notion of path expressions for navigating the nested
structure of XML

7
XML Query Languages

XML-QL: uses a nested XML-like structure

8
XML Query Languages

Lorel: more like SQL
In this paper, a combination of XML-QL and Lorel
is used for demonstration

9
Outline of the talk

XML background
Mapping XML DTD to relational schema
Basic inlining techniques
Shared inlining techniques
Hybrid inlining techniques
Some experiments
Translating semi-structured queries into SQL
queries
Converting results to XML format.
Conclusions and future work

10
Mapping DTDs to relational schemas

Simplifying DTDs
Creating and inlining DTD graphs
Generating relational schemas

11
Simplifying DTDs

Why?
DTDs can be complex
Generating relational schemas directly from them would be unwieldy
A DTD can be simplified and still yield a relational
schema that can store and query documents
conforming to the original DTD.
The transformations preserve the semantics of
one versus many
null versus not null
They lose some information about the relative order of
elements
This can be captured when a specific XML document is loaded
into the relational schema

12
Simplifying DTDs (Cont.)

Flattening transformations
Simplification transformations

13
Simplifying DTDs (Cont.)

Grouping transformations
All + operators are transformed to * operators
A DTD simplification example
<!ELEMENT a ((b|c|e)?,(e?|(f?,(b,b)*))*)>
is transformed to <!ELEMENT a (b*, c?, e*, f*)>
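
The combined effect of the flattening, simplification and grouping transformations can be approximated in a few lines. The sketch below is not the paper's algorithm as written: it assumes the example content model reads `((b|c|e)?,(e?|(f?,(b,b)*))*)`, uses a hypothetical tuple encoding (`seq`, `alt`, `opt`, `star`, `plus`), and treats `+` like `*`. An element ends up as `*` if it sits under a repetition or occurs more than once, as `?` if it is only optional, and as required otherwise.

```python
def simplify(model):
    """Map each element name to "", "?" or "*" for a content model."""
    seen = {}  # name -> [occurrences, is_optional, is_repeated]

    def walk(node, optional, repeated):
        if isinstance(node, str):
            entry = seen.setdefault(node, [0, False, False])
            entry[0] += 1
            entry[1] = entry[1] or optional
            entry[2] = entry[2] or repeated
            return
        op, arg = node
        if op == "seq":
            for child in arg:
                walk(child, optional, repeated)
        elif op == "alt":
            # Each branch of a choice may be absent.
            for child in arg:
                walk(child, True, repeated)
        elif op == "opt":
            walk(arg, True, repeated)
        elif op in ("star", "plus"):
            # Assumption: "+" is treated exactly like "*".
            walk(arg, True, True)
        else:
            raise ValueError(f"unknown operator: {op}")

    walk(model, False, False)
    result = {}
    for name, (count, optional, repeated) in seen.items():
        if repeated or count > 1:
            result[name] = "*"
        elif optional:
            result[name] = "?"
        else:
            result[name] = ""
    return result


# Assumed encoding of: a ((b|c|e)?,(e?|(f?,(b,b)*))*)
EXAMPLE = ("seq", [
    ("opt", ("alt", ["b", "c", "e"])),
    ("star", ("alt", [
        ("opt", "e"),
        ("seq", [("opt", "f"), ("star", ("seq", ["b", "b"]))]),
    ])),
])
```

Under these assumptions `simplify(EXAMPLE)` maps `b`, `e` and `f` to `*` and `c` to `?`.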

14
Motivation for special schema conversion

Relational schema
derived from data model such as ER-model
clear separation between Entity and Attribute
Try mapping a DTD's elements and attributes to ER
entities and attributes
there is no such correspondence
a direct mapping leads to excessive fragmentation of the document

15
DTD graph

Represents the structure of DTD
Nodes
Elements appear exactly once
Attributes appear as many times as they appear in the DTD
Operators appear as many times as they appear in the DTD
Cycles in the DTD graph indicate the presence of
recursion
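
A minimal sketch of the recursion check, assuming a simplified DTD stored as a dictionary from each element to its `(child, quantifier)` pairs. Operator and attribute nodes are folded into the quantifier here, and the `DTD` content and function names are hypothetical.

```python
# Hypothetical simplified DTD: element -> list of (child, quantifier).
DTD = {
    "book": [("booktitle", ""), ("author", "")],
    "monograph": [("title", ""), ("author", ""), ("editor", "")],
    "editor": [("monograph", "*")],
    "author": [("name", "")],
    "name": [("firstname", "?"), ("lastname", "")],
}


def reachable(dtd, start):
    """Return every element reachable from `start` by following child edges."""
    seen = set()
    stack = [child for child, _ in dtd.get(start, [])]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(child for child, _ in dtd.get(node, []))
    return seen


def recursive_elements(dtd):
    """Elements that can reach themselves, i.e. lie on a cycle of the graph."""
    return {elem for elem in dtd if elem in reachable(dtd, elem)}
```

For this DTD, `recursive_elements(DTD)` reports `monograph` and `editor`, the two elements on the cycle.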

16
An example used in the paper
17
DTD graph
18
The Basic inlining technique

Create a relation for an element from the DTD graph
All of the element's descendants are inlined into that
relation
Exceptions
children directly below a * node are made into
separate relations
each node with a backpointer edge pointing to
it is made into a separate relation

19
The Basic inlining technique

Attributes are named by the path from the root
Each relation has an ID field
key of the relation
All relations corresponding to element nodes
that have a parent have a parentID field
foreign key
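
The inlining rule can be sketched as a traversal that collects path-named columns for one root element. This is a simplified reading, not the paper's exact procedure: the DTD encoding and the name `basic_inline` are hypothetical, only leaf elements become columns, and children under `*` or elements already on the current path are reported as needing their own relation.

```python
# Hypothetical simplified DTD: element -> list of (child, quantifier).
DTD = {
    "book": [("booktitle", ""), ("author", "")],
    "article": [("title", ""), ("author", "*")],
    "author": [("name", ""), ("address", "")],
    "name": [("firstname", "?"), ("lastname", "")],
}


def basic_inline(dtd, root):
    """Return (columns, separate) for the relation rooted at `root`."""
    columns = ["ID"]   # every relation gets an ID key
    separate = []      # elements that need their own relation

    def visit(elem, path, on_path):
        for child, quant in dtd.get(elem, []):
            if quant == "*" or child in on_path:
                # Set-valued or recursive child: not inlined.
                separate.append(child)
                continue
            child_path = f"{path}.{child}"
            if dtd.get(child):
                visit(child, child_path, on_path | {child})
            else:
                # Leaf element: one column named by its path from the root.
                columns.append(child_path)

    visit(root, root, frozenset([root]))
    return columns, separate
```

With this DTD, `basic_inline(DTD, "book")` inlines the whole author subtree into path-named columns, while `basic_inline(DTD, "article")` keeps only `article.title` and marks `author` as a separate relation.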

20
The Basic inlining technique
21
The Shared inlining technique

Attempts to avoid the drawbacks of Basic
Principal idea
identify element nodes that are represented in
multiple relations
share them by creating separate relations

22
The Shared inlining technique

Create a relation for every element node in the DTD
graph with in-degree greater than
one
Nodes with in-degree 1 are inlined
Nodes with in-degree 0 get a separate relation
Elements below a * node are made into separate
relations
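
The in-degree rule lends itself to a short sketch. It assumes the same hypothetical dictionary encoding of a simplified DTD, counts one incoming edge per parent reference, and leaves out any special handling of recursion; `shared_relations` is an illustrative name.

```python
# Hypothetical simplified DTD: element -> list of (child, quantifier).
DTD = {
    "book": [("booktitle", ""), ("author", "")],
    "article": [("title", ""), ("author", "*")],
    "author": [("name", "")],
    "name": [("firstname", "?"), ("lastname", "")],
}


def shared_relations(dtd):
    """Return the elements that get their own relation under the in-degree rule."""
    indegree = {elem: 0 for elem in dtd}
    starred = set()
    for children in dtd.values():
        for child, quant in children:
            indegree[child] = indegree.get(child, 0) + 1
            if quant == "*":
                starred.add(child)
    return {
        elem
        for elem, degree in indegree.items()
        # In-degree 0 (roots), in-degree > 1 (shared), or below a "*".
        if degree == 0 or degree > 1 or elem in starred
    }
```

Here `book` and `article` qualify because nothing points to them, and `author` qualifies because two parents share it; every other element has in-degree 1 and would be inlined.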

23
The Shared inlining technique
24
The Hybrid inlining technique

Inlines some elements that are not inlined in
Shared
Inlines elements with in-degree greater than one
that are neither recursive
nor reached through a * node

25
The Hybrid inlining technique
26
A qualitative evaluation of the Basic, Shared and
Hybrid techniques

Evaluation Metric
Major concern: efficiency of query processing
Average number of SQL joins required to process
path expressions of a certain length N
Measurements
The average number of SQL queries generated for
path expressions of length N
The average number of joins in each SQL query for
path expressions of length N
The total average number of joins in order to
process path expressions of length N
The evaluation concentrates on comparing Shared and
Hybrid

27
Evaluation results

Hybrid eliminates a large number of joins for some
DTDs
Hybrid requires more SQL queries than
Shared for some DTDs
Shared always produces at least as many joins
per SQL query as Hybrid
Hybrid always produces at least as many SQL
queries as Shared

28
Outline of the talk

XML background
Mapping XML DTD to relational schema
Basic inlining techniques
Shared inlining techniques
Hybrid inlining techniques
Some experiments
Translating semi-structured queries into SQL
queries
Converting results to XML format.
Conclusions and future work

29
Translating semi-structured queries to SQL queries

Semi-structured query languages are more flexible than
SQL: they allow path expressions with various operators
and wildcards
Converting queries with simple path expressions to
SQL
Converting simple recursive path expressions to
SQL
Converting arbitrary path expressions to simple
recursive path expressions

30
Converting queries with simple path expressions to
SQL

The relations corresponding to the start of the root path
expressions are identified
They are added to the FROM clause of the SQL query
Path expressions are translated to joins among
relations
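
A small sketch of this translation, under loud assumptions: relations use `ID` and `parentID` fields, inlined columns are named by their dotted path from the relation's element, and the caller supplies the set of elements that have their own relation. The function `path_to_sql`, the tables, and the data are hypothetical; the demo runs the generated SQL on SQLite.

```python
import sqlite3


def path_to_sql(path, relations):
    """Translate a dotted path into a SELECT with one join per relation hop."""
    steps = path.split(".")
    current = steps[0]            # relation at the start of the path
    tables = [current]
    joins = []
    inlined = [current]
    for step in steps[1:]:
        if step in relations:
            # Crossing into another relation: join child to parent.
            tables.append(step)
            joins.append(f"{step}.parentID = {current}.ID")
            current = step
            inlined = [step]
        else:
            # Still inside the current relation: extend the column path.
            inlined.append(step)
    column = ".".join(inlined)
    sql = f'SELECT {current}."{column}" FROM {", ".join(tables)}'
    if joins:
        sql += " WHERE " + " AND ".join(joins)
    return sql


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript('''
        CREATE TABLE article (ID INTEGER PRIMARY KEY, "article.title" TEXT);
        CREATE TABLE author (ID INTEGER PRIMARY KEY, parentID INTEGER,
                             "author.name.lastname" TEXT);
        INSERT INTO article VALUES (1, 'Querying XML');
        INSERT INTO author VALUES (1, 1, 'Lee'), (2, 1, 'Kim');
    ''')
    sql = path_to_sql("article.author.name.lastname", {"article", "author"})
    rows = conn.execute(sql + " ORDER BY 1").fetchall()
    conn.close()
    return [row[0] for row in rows]
```

A path that stays inside one relation produces no join at all, which is why inlining more elements reduces the number of joins per query.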

31
Converting simple recursive path expressions to
SQL

Determine the initialization of the recursion and
the actual recursive path expression
Example: ask for the names of all editors reachable
directly or indirectly from the monograph with
title "Subclass Cirripedia"
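
The split into an initialization and a recursive part maps naturally onto a recursive common table expression. The sketch below uses SQLite's `WITH RECURSIVE` rather than the DBMS used in the paper; the two tables, their `parentID` links, and the sample rows are hypothetical.

```python
import sqlite3


def editors_reachable_from(title):
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: editor.parentID points to a monograph,
    # monograph.parentID points to the editor it is nested under (or NULL).
    conn.executescript("""
        CREATE TABLE monograph (ID INTEGER PRIMARY KEY, parentID INTEGER, title TEXT);
        CREATE TABLE editor (ID INTEGER PRIMARY KEY, parentID INTEGER, name TEXT);
        INSERT INTO monograph VALUES
            (1, NULL, 'Subclass Cirripedia'),
            (2, 1, 'Nested monograph'),
            (3, NULL, 'Unrelated monograph');
        INSERT INTO editor VALUES
            (1, 1, 'Editor A'), (2, 2, 'Editor B'), (3, 3, 'Editor C');
    """)
    rows = conn.execute("""
        WITH RECURSIVE reach(monoID) AS (
            -- Initialization: the monograph with the given title.
            SELECT ID FROM monograph WHERE title = ?
            UNION
            -- Recursive step: monographs nested under editors of reached ones.
            SELECT m.ID
            FROM reach r
            JOIN editor e ON e.parentID = r.monoID
            JOIN monograph m ON m.parentID = e.ID
        )
        SELECT e.name
        FROM reach r JOIN editor e ON e.parentID = r.monoID
        ORDER BY e.name
    """, (title,)).fetchall()
    conn.close()
    return [name for (name,) in rows]
```

With the sample rows, the call for "Subclass Cirripedia" returns the direct editor and the editor of the nested monograph, but not the editor of the unrelated one.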

32
Converting arbitrary path expressions to simple
recursive path expressions

Path expressions can be of arbitrary complexity
Example: ask for all the name elements reachable directly
or indirectly through monograph
Method
take the path expressions appearing in such a query
translate them into possibly many simple path
expressions

33
Outline of the talk

XML background
Mapping XML DTD to relational schema
Basic inlining techniques
Shared inlining techniques
Hybrid inlining techniques
Some experiments
Translating semi-structured queries into SQL
queries
Converting results to XML format.
Conclusions and future work

34
Converting relational results to XML

Explore how results of SQL queries can be
converted to XML documents
Constructing arbitrary XML results is difficult,
and this is the main drawback of the
current relational approach
XML-QL is used as the illustrative query language because it
provides XML structuring constructs

35
Simple structuring
The first and last name of all the authors of
books.
36
Tag variables

Generate a relational query that contains the tag
value as an element of the result tuple.
Example: ask for the names of authors of all
publications, nested under a tag specifying the
type of publication

37
Grouping

Example: ask for all the publications of an author to be
grouped together and, within this structure,
for the titles of publications to be grouped
by the type of publication.
Two approaches can be used
The relational database orders the result tuples
by last name and then by publication type, and
the result is scanned to construct the XML document.
This is shown in the figure.
Get an unordered set of tuples and do the grouping
operation outside the database engine, by last
name and by type.
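
The first approach (let the database order the tuples, then build the XML in a single scan) can be sketched as follows. The `pub` table, the element names, and the function `grouped_xml` are hypothetical; the publication type is carried as an ordinary column and used as the tag name, in the spirit of a tag variable.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import groupby


def grouped_xml(rows):
    """rows: iterable of (lastname, pubtype, title) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pub (lastname TEXT, pubtype TEXT, title TEXT)")
    conn.executemany("INSERT INTO pub VALUES (?, ?, ?)", rows)
    # The database does the ordering; grouping then needs only one scan.
    ordered = conn.execute(
        "SELECT lastname, pubtype, title FROM pub "
        "ORDER BY lastname, pubtype, title"
    ).fetchall()
    conn.close()

    root = ET.Element("result")
    for lastname, by_author in groupby(ordered, key=lambda r: r[0]):
        author = ET.SubElement(root, "author")
        ET.SubElement(author, "lastname").text = lastname
        for pubtype, by_type in groupby(by_author, key=lambda r: r[1]):
            # The tag value arrives as data and becomes the element name.
            group = ET.SubElement(author, pubtype)
            for _, _, title in by_type:
                ET.SubElement(group, "title").text = title
    return ET.tostring(root, encoding="unicode")
```

The second approach would drop the `ORDER BY` and group the unordered tuples in application code, for example with a dictionary keyed by last name and type.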

38
Grouping

Treating tag variables as attributes in the result
relation provides a uniform way of treating
the contents of the result XML document
Some relational database functionality is either
not fully exploited or is duplicated outside the database.

39
Converting other element types

Complex element construction
The main concern is set values
Heterogeneous results
Different queries can be handled in different
ways, and the results can then be merged
Nested queries
Use an outer join to construct the association
between a query and a sub-query
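
For the nested-query case, a left outer join keeps outer results that have no matching inner results. The sketch below is only illustrative: the `author` and `book` tables, their columns, and the function `authors_with_titles` are hypothetical, and SQLite stands in for the DBMS.

```python
import sqlite3


def authors_with_titles():
    """Return {lastname: [titles]} with authors kept even if they have no book."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (ID INTEGER PRIMARY KEY, lastname TEXT);
        CREATE TABLE book (ID INTEGER PRIMARY KEY, authorID INTEGER, title TEXT);
        INSERT INTO author VALUES (1, 'Kim'), (2, 'Lee');
        INSERT INTO book VALUES (1, 2, 'B0'), (2, 2, 'B1');
    """)
    rows = conn.execute("""
        SELECT a.lastname, b.title
        FROM author a LEFT OUTER JOIN book b ON b.authorID = a.ID
        ORDER BY a.lastname, b.title
    """).fetchall()
    conn.close()

    nested = {}
    for lastname, title in rows:
        titles = nested.setdefault(lastname, [])
        if title is not None:   # NULL marks an author without books
            titles.append(title)
    return nested
```

An inner join would silently drop the author without books, so the outer element of the nested result would be lost.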

40
Conclusions

Studied the virtues and limitations of the
relational model for processing queries over XML
The advantage is the reuse of relational database
technology, which has high performance. The paper
shows that it is possible to handle most queries on
XML documents using a relational DB
Limitations
Awkward: constructing complex XML structures in query results
Inefficient: fragmentation causes too many joins
in the evaluation of simple queries

41
Future work

Support for sets
Untyped/variable-type references
Information retrieval style indices
Flexible comparison operators
Multiple-query optimization and execution
More powerful recursion

42
Questions?
