LegoDB: Cost-based XML to Relational - PowerPoint PPT Presentation

About This Presentation
Title:

LegoDB: Cost-based XML to Relational

Description:

LegoDB: Cost-based XML to Relational Shredding Jerome Simeon Bell Labs Lucent Technologies joint work with: Juliana Freire (Bell Labs, OGI) – PowerPoint PPT presentation

Number of Views:33
Avg rating:3.0/5.0
Slides: 19
Provided by: homepage91
Category:
Tags: legodb | xml | based | cost | rdbms | relational

less

Transcript and Presenter's Notes

Title: LegoDB: Cost-based XML to Relational


1
LegoDBCost-based XML to Relational Shredding
  • Jerome Simeon
  • Bell Labs Lucent Technologies
  • joint work with
  • Juliana Freire (Bell Labs, OGI)
  • Jayant Haritsa (IISc, Bangalore)
  • Maya Ramanath (IISc, Bangalore)
  • Prasan Roy (Bell Labs, ITT Bombay)

2
(No Transcript)
3
Why Storing XML in a RDBMS?
  • For some applications it makes sense to use a
    relational database backend
  • Leverage many years of development of relational
    technology
  • concurrency control/transaction support
  • Scalability
  • Safety (crash recovery, duplication)
  • Integrate with existing data stored in an RDBMS
  • Performance matters!!
  • But storing and querying XML data in an
  • RDBMS is a non-trivial task

4
XML and Relational Databases
  • Mismatch between the relational model and XML.
  • How to store XML data into relational tables?
  • Mapping (Shredding) XML data into flat and
    regular tables
  • How to evaluate XML queries over relational
    tables?
  • Mapping XQuery into SQL (or SQL-XML?)...
  • Litterature filled with various mapping
    proposals
  • Many variations over binary tables (Florescu et
    al, Grust et al).
  • Shanmugasundaram et. All try to inline as much
    nested elements as possible in the same table.

There are many alternative mappings!
5
(No Transcript)
6
...have various performances
Q1 Simple selection query Q2 Join involving a
fragment of the reviews Q3 publishing query
No given mapping is best
7
The LegoDB Storage "Shredding" Engine
  • An optimization approach
  • automatically explores a space of possible
    mappings
  • selects the mapping which has the lowest cost for
    a given application
  • Important features
  • Application-driven takes into account schema,
    data statistics and query workload
  • Logical/physical independence interface is
    XML-based (XML Schema, XQuery, XML data
    statistics)
  • Leverage existing technology XML standards
    XML-specific operations for generating space of
    mappings relational optimizer for evaluating
    configurations

8
(No Transcript)
9
A word about mapping queries
XQuery for x in //show/review where
contains(x/review, Potter) SQL select
review from Show_Table, Review_Table where
Parent_show Show_id // join here! and review
contains Potter
10
Transforming Schemas
  • Key idea A given document can be validated by
    different XML Schemas
  • Different but equivalent regular expressions can
    be used to define an element
  • The presence or absence of a type name does not
    change the semantics of an XML Schema
  • Applying transformations that manipulate the
    types (but preserve the element structure of
    schema) leads to a space of distinct relational
    configurations
  • Define XML Schema transformations that
  • Exploit the structure of the schema, and
  • lead to useful relational configurations

11
(No Transcript)
12
Regular expression rewritings
(group Agroup B),group C gt group Dgroup E
horizontal table partition define group
D(group A,group C) define group E(group B,group
C) group A gt put
first item in a table, put the group A, group A
rest in a different
table element type string extracts
certain element gt
in separate table element a type string
element (a) type string
...........
13
(No Transcript)
14
Searching for a good configuration
  • Cost is key use a relational optimizer as a
    black box
  • Support different cost-models
  • Quality of selected configuration depends on the
    accuracy of the optimizer!
  • Set of possible configurations that result from
    applying the rewritings is very large - possibly
    infinite!
  • How to search for the optimal solution?
  • LegoDB use a greedy search
  • Importance of statistics for cost evaluation
  • Large collections vs. small collections (e.g.,
    many new york time review or not?)
  • Selectivity of predicates
  • XML structure distribution (distribution of
    children / parent relation)

15
(No Transcript)
16
Conclusion
  • Cost-based approach for generating relational
    storage for XML
  • takes application characteristics into account
  • schema data 'statistics' queries
  • Performance shows that storage significant
    performance improvements
  • The same is likely to be true for other indexing
    techniques
  • Full text index on XML (best for text queries?)
  • Native XML indexes (best for Xpath queries
    without schema?)
  • Files!!! (Best when a lot of small documents an
    no need for concurrency control)
  • There is no one best way to store XML...
  • We should try to hide that from the user!

17
A few other things I've done...
  • Mapping XQuery to SQL/XML
  • Work with Jonathan Robie
  • SQL/XML is an extension of SQL to XML
  • Standard mapping from table to XML document
  • Standard mapping from relational schema to XML
    Schema
  • Extension of SQL query language to builds XML
    elements
  • Goal is to evaluation XQueries on top of an ODBC
    driver supporting XQuery
  • Approach identify a fragment of XQuery which has
    a direct syntactic mapping into SQL / XML
  • Surprise the syntactic approach worked really
    well, because of the way XQuery is designed.
    (i.e., FLWOR close to SQL statement).
  • Galax XQuery 1.0 implementation

18
My take on the Data Binding problem
  • The main problem will be about the mismatch
    between type systems
  • SQL vs. XML Schema vs. Object-Oriented
  • I've never seen a good proposal that brings any
    two type systems together (then builds a language
    one top of it)
  • Where I see the best chance of success
  • Use XML has the data model (can represent pretty
    much anything).
  • XML Schema can represent relational schema (see
    SQL/XML standard)
  • Can it represent Object-Oriented?
Write a Comment
User Comments (0)
About PowerShow.com