Relational Databases for Querying XML Documents: Limitations and Opportunities



1
Relational Databases for Querying XML
Documents: Limitations and Opportunities
  • Presented by
  • Yi Lu

2
Introduction
  • XML is fast emerging as the dominant standard for
    representing data on the World Wide Web.
  • XML was initially designed to make it easier to
    exchange data over the Internet.
  • This raises the problem of how to query the
    contents of XML documents.

3
Approaches for querying XML documents
  • Use semi-structured query languages and query
    evaluation techniques.
  • Use a relational database to store and query XML
    documents.
  • Use native XML repositories, e.g., Software AG's
    Tamino and eXcelon's XIS (summarized in Lu's paper).

4
Steps in the relational approach
  • Process an XML DTD to generate a relational schema
  • Parse XML documents conforming to DTDs and load
    them into tuples of relational tables in a
    standard commercial DBMS (DB2, Oracle)
  • Translate semi-structured queries over XML
    documents into SQL queries over the relational
    database
  • Convert the results back to XML format.

5
Outline of the talk
  • XML background
  • Mapping XML DTD to relational schema
  • Basic inlining techniques
  • Shared inlining techniques
  • Hybrid inlining techniques
  • Some experiments
  • Translating semi-structured queries into SQL
    queries
  • Converting results to XML format.
  • Conclusions and future work

6
XML background
  • Extensible Markup Language
  • DTDs and other XML schemas
  • DCD (Document Content Descriptor)
  • XML Schemas
  • Semi-structured query languages
  • XML-QL, Lorel, UnQL, XQL
  • notion of path expressions for navigating nested
    structure of XML

7
XML Query Languages
  • XML-QL: uses a nested XML-like structure

8
XML Query Languages
  • Lorel: more like SQL
  • In this paper, a combination of XML-QL and Lorel
    is used for the examples

9
Outline of the talk
  • XML background
  • Mapping XML DTD to relational schema
  • Basic inlining techniques
  • Shared inlining techniques
  • Hybrid inlining techniques
  • Some experiments
  • Translating semi-structured queries into SQL
    queries
  • Converting results to XML format.
  • Conclusions and future work

10
Mapping DTDs to relational schemas
  • Simplifying DTDs
  • Creating and inlining DTD graphs
  • Generating relational schemas

11
Simplifying DTDs
  • Why?
  • DTDs can be complex
  • Generating relational schemas directly from them
    would be unwieldy
  • A DTD can be simplified and still yield a relational
    schema that can store and query documents
    conforming to the original DTD.
  • The transformations preserve the semantics of
  • one or many
  • null or not null
  • They lose some information about the relative order
    of elements
  • which can be captured when a specific XML document is
    loaded into the relational schema

12
Simplifying DTDs (cont.)
  • Flattening transformations
  • Simplification transformations

13
Simplifying DTDs (cont.)
  • Grouping transformations
  • All "+" operators are transformed to "*"
  • A DTD simplification example
  • <!ELEMENT a ((b|c|e)?,(e?|(f?,(b,b)*))*)>
  • Transformed to <!ELEMENT a (b*, c?, e*, f*)>
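
A minimal sketch of how the flattening, simplification, and grouping transformations combine, assuming the slide's example reads <!ELEMENT a ((b|c|e)?,(e?|(f?,(b,b)*))*)> once its operators are restored. The tuple encoding of the content model and the function name simplify are illustrative assumptions, not the paper's notation.

```python
def simplify(model, optional=False, many=False, out=None):
    """Collapse a content model to one quantifier per child element.

    model is an element name (str) or a tuple (op, child, ...), where op is
    'seq' (a,b), 'alt' (a|b), 'opt' (x?), 'star' (x*) or 'plus' (x+).
    Returns a dict: element name -> '' (exactly one), '?' or '*'.
    """
    if out is None:
        out = {}
    if isinstance(model, str):
        if model in out or many:
            out[model] = "*"      # grouping: a repeated name becomes a*
        else:
            out[model] = "?" if optional else ""
        return out
    op, *children = model
    if op in ("star", "plus"):    # "+" is treated like "*"
        many = True
    elif op in ("opt", "alt"):    # each branch of a choice may be absent
        optional = True
    for child in children:
        simplify(child, optional, many, out)
    return out


# (b|c|e)?, (e? | (f?, (b,b)*))*
model = ("seq",
         ("opt", ("alt", "b", "c", "e")),
         ("star", ("alt", ("opt", "e"),
                   ("seq", ("opt", "f"), ("star", ("seq", "b", "b"))))))
simplified = simplify(model)
print(", ".join(name + quant for name, quant in simplified.items()))
# prints: b*, c?, e*, f*
```

The sketch keeps only "one or many" and "null or not null" per element, which is exactly the information the transformations are meant to preserve; relative order is dropped.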

14
Motivation for special schema conversion
  • Relational schemas
  • are derived from a data model such as the ER model
  • have a clear separation between entities and attributes
  • Mapping a DTD's elements and attributes to ER
    entities and attributes
  • has no clear correspondence
  • leads to excessive fragmentation of the document

15
DTD graph
  • Represents the structure of a DTD
  • Nodes
  • Elements appear exactly once
  • Attributes appear as many times as they appear in the DTD
  • Operators appear as many times as they appear in the DTD
  • Cycles in the DTD graph indicate the presence of
    recursion
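
A minimal sketch of a DTD graph over element names only, assuming a small hypothetical DTD (the DTD dictionary and all function names are made up for illustration; attribute and operator nodes are left out). It computes each element's in-degree and finds the elements that lie on a cycle, i.e. the recursive ones.

```python
# Hypothetical simplified DTD: element -> child elements.
DTD = {
    "book": ["booktitle", "author"],
    "monograph": ["title", "author", "editor"],
    "editor": ["monograph"],
    "author": ["name"],
    "name": ["firstname", "lastname"],
}


def build_graph(dtd):
    """Adjacency sets in which every element appears exactly once."""
    graph = {}
    for parent, children in dtd.items():
        graph.setdefault(parent, set()).update(children)
        for child in children:
            graph.setdefault(child, set())
    return graph


def in_degrees(graph):
    """Number of incoming edges per element node."""
    degree = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            degree[child] += 1
    return degree


def recursive_elements(graph):
    """Elements that can reach themselves, i.e. that lie on a cycle."""
    def reachable(start):
        seen, stack = set(), list(graph[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node])
        return seen
    return {node for node in graph if node in reachable(node)}


graph = build_graph(DTD)
print(sorted(recursive_elements(graph)))   # ['editor', 'monograph']
print(in_degrees(graph)["author"])         # 2
```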

16
An example used in the paper
17
DTD graph
18
The Basic inlining technique
  • Create a relation for each element node in the DTD graph
  • All of the element's descendants are inlined into that
    relation
  • Exceptions
  • children directly below a "*" node are made into
    separate relations
  • each node that has a backpointer edge pointing to
    it is made into a separate relation

19
The Basic inlining technique
  • Attributes are named by the path from the root
  • Each relation has an ID field
  • key of the relation
  • All relations corresponding to element nodes
    that have a parent have a parentID field
  • foreign key
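
A minimal sketch of how the columns of one Basic relation could be derived, assuming a hypothetical simplified DTD in which each child carries a quantifier ("" for exactly one, "?" or "*"). The DTD dictionary and the function basic_relation are illustrative names; only leaf elements become columns here, and details such as type or root flags are omitted.

```python
# Hypothetical simplified DTD: element -> list of (child, quantifier).
DTD = {
    "book": [("booktitle", ""), ("author", "")],
    "article": [("title", ""), ("author", "*")],
    "author": [("name", "")],
    "name": [("firstname", "?"), ("lastname", "")],
}


def basic_relation(dtd, root):
    """Column names of the relation created for `root`."""
    has_parent = any(root == child
                     for children in dtd.values() for child, _ in children)
    columns = [f"{root}ID"]                    # key of the relation
    if has_parent:
        columns.append(f"{root}.parentID")     # foreign key to the parent

    def inline(element, path, ancestors):
        for child, quant in dtd.get(element, []):
            # Children below "*" and recursive back edges are not inlined;
            # they are stored in separate relations.
            if quant == "*" or child in ancestors:
                continue
            child_path = f"{path}.{child}"     # named by the path from the root
            if child in dtd:
                inline(child, child_path, ancestors | {child})
            else:
                columns.append(child_path)

    inline(root, root, {root})
    return columns


for element in ("book", "article", "author"):
    print(element, basic_relation(DTD, element))
```

In this sketch the author fields show up both inside the book relation and in the author relation, which illustrates the duplication that comes with creating a relation per element.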

20
The Basic inlining technique
21
The Shared inlining technique
  • Attempts to avoid drawbacks of Basic
  • Principal idea
  • identify element nodes that are represented in
    multiple relations
  • share them by creating separate relations

22
The Shared inlining technique
  • Create a relation for every element node in the DTD
    graph with in-degree greater than
    one
  • in-degree of one: the element is inlined
  • in-degree of zero: a separate relation is created
  • Elements directly below a "*" node are made into separate
    relations
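
A minimal sketch of the Shared decision rule, assuming a hypothetical simplified DTD with a quantifier per child. The DTD dictionary and the function shared_relations are illustrative names, and any extra handling of mutually recursive elements is left out.

```python
from collections import Counter

# Hypothetical simplified DTD: element -> list of (child, quantifier).
DTD = {
    "book": [("booktitle", ""), ("author", "")],
    "article": [("title", ""), ("author", "*")],
    "monograph": [("title", ""), ("author", ""), ("editor", "")],
    "editor": [("monograph", "*")],
    "author": [("name", "")],
    "name": [("firstname", "?"), ("lastname", "")],
}


def shared_relations(dtd):
    """Elements that get their own relation under this reading of Shared."""
    in_degree = Counter()
    starred = set()            # elements directly below a "*" node
    elements = set(dtd)
    for children in dtd.values():
        for child, quant in children:
            elements.add(child)
            in_degree[child] += 1
            if quant == "*":
                starred.add(child)
    return {e for e in elements
            if in_degree[e] == 0      # roots: separate relation
            or in_degree[e] > 1       # shared nodes: separate relation
            or e in starred}          # set-valued: separate relation


print(sorted(shared_relations(DTD)))
# ['article', 'author', 'book', 'monograph', 'title']
```

Every other element (in-degree of one and not below "*") is inlined into its parent's relation.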

23
The Shared inlining technique
24
The Hybrid inlining technique
  • Inlines some elements that are not inlined in
    Shared
  • Additionally inlines elements with in-degree greater than one
  • that are not recursive
  • and are not reached through a "*" node
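
A minimal sketch of the Hybrid rule under the same kind of hypothetical simplified DTD (the DTD dictionary and the function hybrid_relations are illustrative names). Compared with Shared, an element with in-degree greater than one keeps its own relation only if it is recursive or sits below a "*" node.

```python
# Hypothetical simplified DTD: element -> list of (child, quantifier).
DTD = {
    "book": [("booktitle", ""), ("author", "")],
    "article": [("title", ""), ("author", "*")],
    "monograph": [("title", ""), ("author", ""), ("editor", "")],
    "editor": [("monograph", "*")],
    "author": [("name", "")],
    "name": [("firstname", "?"), ("lastname", "")],
}


def hybrid_relations(dtd):
    """Elements that get their own relation under this reading of Hybrid."""
    parents, starred, elements = {}, set(), set(dtd)
    for parent, children in dtd.items():
        for child, quant in children:
            elements.add(child)
            parents.setdefault(child, set()).add(parent)
            if quant == "*":
                starred.add(child)

    def is_recursive(element):
        # True if the element can reach itself in the DTD graph.
        seen, stack = set(), [c for c, _ in dtd.get(element, [])]
        while stack:
            node = stack.pop()
            if node == element:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(c for c, _ in dtd.get(node, []))
        return False

    separate = set()
    for element in elements:
        degree = len(parents.get(element, ()))
        if degree == 0 or element in starred:
            separate.add(element)
        elif degree > 1 and is_recursive(element):
            separate.add(element)
    return separate


print(sorted(hybrid_relations(DTD)))
# ['article', 'author', 'book', 'monograph']
```

With this DTD, title (in-degree two, not recursive, not below "*") is inlined into its parents, whereas Shared would give it a separate relation.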

25
The Hybrid inlining technique
26
A qualitative evaluation of the Basic, Shared and
Hybrid techniques
  • Evaluation Metric
  • Major concern: efficiency of query processing
  • Average number of SQL joins required to process
    path expressions of a certain length N
  • Measurements
  • The average number of SQL queries generated for
    path expressions of length N
  • The average number of joins in each SQL query for
    path expressions of length N
  • The total average number of joins in order to
    process path expressions of length N
  • Concentrate on comparisons between Shared and
    Hybrid

27
Evaluation results
  • Hybrid eliminates a large number of joins for some
    DTDs
  • Hybrid requires more SQL queries than
    Shared for some DTDs
  • Shared always produces at least as many joins
    per SQL query as Hybrid
  • Hybrid always produces at least as many SQL
    queries as Shared

28
Outline of the talk
  • XML background
  • Mapping XML DTD to relational schema
  • Basic inlining techniques
  • Shared inlining techniques
  • Hybrid inlining techniques
  • Some experiments
  • Translating semi-structured queries into SQL
    queries
  • Converting results to XML format.
  • Conclusions and future work

29
Translating semi-structured queries to SQL queries
  • Semi-structured query languages are more flexible
    than SQL: they allow path expressions with various
    operators and wildcards
  • Converting queries with simple path expressions to
    SQL
  • Converting simple recursive path expressions to
    SQL
  • Converting arbitrary path expressions to simple
    recursive path expressions

30
Converting queries with simple path expressions to
SQL
  • The relations corresponding to the start of the
    root path expressions are identified
  • They are added to the FROM clause of the SQL query
  • Path expressions are translated to joins among
    relations
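
A minimal sketch of this translation using SQLite, assuming a made-up two-table schema (book and author, linked by parentID) and made-up rows. The path book.author with a condition on the book title becomes a join between the relation at the start of the path and the relation for author.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (bookID INTEGER PRIMARY KEY, booktitle TEXT);
CREATE TABLE author (authorID INTEGER PRIMARY KEY, parentID INTEGER,
                     firstname TEXT, lastname TEXT);
INSERT INTO book VALUES (1, 'Databases'), (2, 'Networks');
INSERT INTO author VALUES (10, 1, 'Ann', 'Lee'), (11, 2, 'Bob', 'Ray');
""")

# "First and last name of the authors of the book titled 'Databases'":
# book starts the path, so it goes into FROM; the step book -> author
# becomes a join on the parentID foreign key.
rows = conn.execute("""
SELECT a.firstname, a.lastname
FROM book b JOIN author a ON a.parentID = b.bookID
WHERE b.booktitle = ?
""", ("Databases",)).fetchall()
print(rows)   # [('Ann', 'Lee')]
```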

31
Converting simple recursive path expressions to
SQL
  • Determine the initialization of the recursion and
    the actual recursive path expression
  • Example: ask for the names of all editors reachable
    directly or indirectly from the monograph with
    the title "Subclass Cirripedia"
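
A minimal sketch of such a query as a recursive SQL query in SQLite, assuming a made-up monograph table in which the editor name is inlined and parentID points to the monograph whose editor contains the row. The editor names are placeholder data. The first SELECT is the initialization of the recursion and the second is the recursive step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monograph (monographID INTEGER PRIMARY KEY, parentID INTEGER,
                        title TEXT, editor_name TEXT);
INSERT INTO monograph VALUES
  (1, NULL, 'Subclass Cirripedia', 'E1'),
  (2, 1,    'Volume A',            'E2'),
  (3, 2,    'Volume B',            'E3'),
  (4, NULL, 'Unrelated',           'E4');
""")

rows = conn.execute("""
WITH RECURSIVE reach(monographID, editor_name) AS (
    -- initialization: the monograph with the given title
    SELECT monographID, editor_name FROM monograph WHERE title = ?
  UNION ALL
    -- recursion: monographs nested under an already reached one
    SELECT m.monographID, m.editor_name
    FROM reach r JOIN monograph m ON m.parentID = r.monographID
)
SELECT editor_name FROM reach ORDER BY monographID
""", ("Subclass Cirripedia",)).fetchall()
print(rows)   # [('E1',), ('E2',), ('E3',)]
```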

32
Converting arbitrary path expressions to simple
recursive path expressions
  • Path expressions can be of arbitrary complexity
  • Example: ask for all the name elements reachable
    directly or indirectly through monograph
  • Method
  • take the path expressions appearing in such a query
  • translate them into possibly many simple path
    expressions

33
Outline of the talk
  • XML background
  • Mapping XML DTD to relational schema
  • Basic inlining techniques
  • Shared inlining techniques
  • Hybrid inlining techniques
  • Some experiments
  • Translating semi-structured queries into SQL
    queries
  • Converting results to XML format.
  • Conclusions and future work

34
Converting relational results to XML
  • Explore how the results of SQL queries can be
    converted to XML documents
  • Constructing arbitrary XML results is difficult,
    and this is the main drawback of the
    current relational approach
  • XML-QL is used as the illustrative query language
    because it provides XML structuring constructs

35
Simple structuring
The first and last name of all the authors of
books.
36
Tag variables
  • Generate a relational query that contains the tag
    value as an element of the result tuple.
  • Example: ask for the names of the authors of all
    publications, nested under a tag specifying the
    type of publication
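
A minimal sketch of the tag-variable idea, assuming two made-up tables of author last names (one per publication type). The SQL query returns the tag value as an ordinary column, and a small Python loop turns each tuple into an element nested under that tag.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book_author (lastname TEXT);
CREATE TABLE article_author (lastname TEXT);
INSERT INTO book_author VALUES ('Lee');
INSERT INTO article_author VALUES ('Ray');
""")

# The publication type travels with each tuple as the column "tag".
rows = conn.execute("""
SELECT 'book' AS tag, lastname FROM book_author
UNION ALL
SELECT 'article' AS tag, lastname FROM article_author
ORDER BY tag
""").fetchall()

root = ET.Element("result")
for tag, lastname in rows:
    publication = ET.SubElement(root, tag)     # element named by the tag value
    ET.SubElement(publication, "lastname").text = lastname

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```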

37
Grouping
  • Example: ask for all the publications of an author
    to be grouped together and, within each group,
    for the titles of publications to be grouped
    by the type of publication.
  • Two approaches can be used
  • The relational database orders the result tuples
    by last name and then by publication type, and
    the result is scanned to construct the XML document.
    This is shown in the figure.
  • Get an unordered set of tuples and do the grouping
    outside of the database engine, by last
    name and by type.
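
A minimal sketch of the first approach, assuming a made-up pub table with one row per publication. The database sorts by last name and publication type, and a single scan over the sorted tuples builds the nested XML; the element and attribute names in the output are illustrative.

```python
import sqlite3
from itertools import groupby
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pub (lastname TEXT, type TEXT, title TEXT);
INSERT INTO pub VALUES
  ('Ray', 'book', 'T3'), ('Lee', 'book', 'T1'),
  ('Lee', 'article', 'T2'), ('Lee', 'book', 'T4');
""")

# Let the database do the ordering.
rows = conn.execute(
    "SELECT lastname, type, title FROM pub ORDER BY lastname, type, title"
).fetchall()

# One pass over the sorted tuples: a new group starts when the key changes.
root = ET.Element("result")
for lastname, by_author in groupby(rows, key=lambda r: r[0]):
    author = ET.SubElement(root, "author", name=lastname)
    for pub_type, by_type in groupby(by_author, key=lambda r: r[1]):
        group = ET.SubElement(author, pub_type)
        for _, _, title in by_type:
            ET.SubElement(group, "title").text = title

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The second approach would drop the ORDER BY and group the tuples in application code instead, for example with a dictionary keyed by last name and type.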

38
Grouping
  • Treating tag variables as attributes in the result
    relation provides a uniform way of treating
    the contents of the result XML document
  • Some relational database functionality is either
    not fully exploited or is duplicated outside the
    database.

39
Converting other element types
  • Complex element construction
  • The main concern is set values
  • Heterogeneous results
  • The different queries can be handled in different
    ways, and then the results can be merged together
  • Nested queries
  • Use an outer join to construct the association
    between a query and a sub-query
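
A minimal sketch of the outer-join idea for nested queries, assuming made-up author and book tables. The outer query ranges over authors and the sub-query over their books; a left outer join keeps authors that have no books, so their element can still be emitted with an empty nested part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (authorID INTEGER PRIMARY KEY, lastname TEXT);
CREATE TABLE book (bookID INTEGER PRIMARY KEY, authorID INTEGER, title TEXT);
INSERT INTO author VALUES (1, 'Lee'), (2, 'Ray');
INSERT INTO book VALUES (10, 1, 'T1'), (11, 1, 'T2');
""")

# Each author row is paired with its books; an author without books
# still appears once, with NULL in place of the sub-query result.
rows = conn.execute("""
SELECT a.lastname, b.title
FROM author a LEFT OUTER JOIN book b ON b.authorID = a.authorID
ORDER BY a.lastname, b.title
""").fetchall()
print(rows)   # [('Lee', 'T1'), ('Lee', 'T2'), ('Ray', None)]
```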

40
Conclusions
  • Studied the virtues and limitations of the
    relational model for processing queries over XML
  • The advantage is the reuse of high-performance
    relational database technology. The paper
    shows that most queries on XML documents can be
    handled using a relational database
  • Limitations
  • Awkward: complex XML constructs in query results
  • Inefficient: fragmentation causes too many joins
    in the evaluation of simple queries

41
Future work
  • Support for sets
  • Untyped/variable-typed references
  • Information retrieval style indices
  • Flexible comparison operators
  • Multiple-query optimization/execution
  • More powerful recursion

42
Questions?