LegoDB - PowerPoint PPT Presentation

About This Presentation
Title:

LegoDB

Description:

imdb show type='movie' title Fugitive, The /title year 1993 /year ... imdb DTD and XML Schema. Question ? Can you find more storage mapping relations? ... – PowerPoint PPT presentation

Number of Views:46
Avg rating:3.0/5.0
Slides: 32
Provided by: timsuth
Learn more at: http://web.cs.wpi.edu
Category:
Tags: legodb | imdb

less

Transcript and Presenter's Notes

Title: LegoDB


1
LegoDB
Customizing Relational Storage for XML
Documents Timothy Sutherland Sachin Patidar
2
Managing XML Data
  • XML has become widely used for exchange of data
    over the Web
  • XML is extensible and flexible it can be used in
    applications with widely different requirements
  • There is no one-size-fits-all solution for all
    applications
  • What are procedures to store, query and publish
    XML data?
  • Need adaptable and flexible solutions
  • LegoDB is a component-based XML data management
    system
  • The database-anywhere paradigm
  • portable and adaptable to any data and any
    environment

3
Motivation challenges
  • Challenges
  • mismatch between nested tree structure and flat
    tuples of the relational model
  • Inducing the flexibility to handle the wide
    domain of application
  • But storing and querying XML data in an RDBMS is
    a non-trivial task
  • Motivation
  • reuse of well developed features
  • concurrency control
  • crash recovery
  • query processors
  • wide variety of XML applications
  • Integrate with existing data stored in an RDBMS

4
Mapping XML schema to relations
  • Question ?
  • Can different XML schemas validate the exact same
    set of documents?
  • Yes
  • Different but equivalent regular expression can
    describe the contents of a given element.
  • (a(bc)) ((a,b) (a,c))
  • Sub-elements of an element can be referred to
    directly, or can be referred to by a type name

5
Sample XML DatasetInternet Movie Database
  • Fugitive, The
  • 1993
  • Roger Ebert
  • Two thumbs up!
  • This is a fun action movie,
  • Harrison Ford at his best.
  • The standard Hollywood summer
  • movie strikes back.

X
Files,The 1994 4
Fallen
Angel Larry
Shaw
6
DTD and XML Schema
7
Question ?
  • Can you find more storage mapping relations?
  • By performing a sequence of transformations (i.e.
    rewritings) which preserve the semantics of the
    schema.

8
Mapping XML Schema into tables
Inline as many elements as possible
Partition reviews table one for NYTimes, and one
for rest
Split show table Into TV and Movies
9
Querying XML
  • Presence of schema for XML documents
  • For applications to interpret data
  • For issuing queries
  • Find the title, year and box office proceeds
  • For all 2001 movies
  • For v in document (imbdata)/imbd/show
  • Where v/year2001
  • Return v/title, v/year, v/box_office

10
XML and Relational Databases
  • There is a mismatch between the relational model
    and
  • that of XML
  • Relational Normalized, flat and fragmented
  • XML Un-normalized, nested and monolithic
  • How to store XML data into relational tables?
  • Need to map the nested and irregular XML data
    into flat and
  • regular tables
  • How to evaluate XML queries over relational
    tables?
  • Need to map XQuery into SQL

11
Problem Storing XML in RDMS
Taken from Juliana Freires presentation
12
Queries
13
Mapping an XML Schema into Tables
Different applications requires specific mappings
for best performance
Publish W1Q1 0.4, Q20.4, Q30.1,
Q40.1 Lookup W2Q10.1, Q20.1, Q30.4, Q40.4
Taken from Juliana Freires presentation
14
The LegoDB Storage Mapping Engine
  • An optimization approach
  • automatically explores a space of possible
    mappings
  • selects the mapping which has the lowest cost for
    a given application
  • Basic Principles
  • Application-driven takes into account schema,
    data statistics and query workload
  • Logical/physical independence interface is
    XML-based ( XML
  • Schema, XQuery, XML data statistics)
  • Leverage existing technology XML standards
    XML-specific
  • operations for generating space of mappings
    relational optimizer for evaluating configurations

15
LegoDB
  • Create a p-schema for input XML schema
  • Obtain cost estimates with input of data
    statistics and XQuery workload.
  • Search space of alternative storage
    configurations to achieved an efficient mapping
    for a given application.

16
Architecture of the Mapping Engine
Cost (SQi)
RSi Relational Schema/Queries/Stats
PSi Physical Schema
17
XML Schema to Relations
  • How to transform a XML schema to a relation?
  • P-Schema
  • Type Show
  • show _at_type String,
  • year Integer,
  • title String
  • Review
  • Type Review
  • review String

18
XML Schema to Relations
  • For a type T and relation R
  • R1- Create one relation R for each T.
  • R2- Create a key for each T
  • R3- Create a foreign key for all parents of T
  • R4- Create columns for R for every physical type
    in T
  • R5- Allow null values in R for every optional
    type in T

19
XML Schema to Relations
  • Type Show
  • show _at_type String,
  • year Integer,
  • title String
  • Review
  • Type Review
  • review String

20
Types of XML Transformations
  • Inlining/Outlining
  • Union Factorization/Distribution
  • Repetition Merge/Split
  • Wildcard Rewritings

21
Inlining/Outlining
  • Attributes can be outlined by removing them
    from a relation and using a foreign key to relate
    them to a table.
  • Inlining is the exact opposite.

Type TV seasonsInteger, Description, Episo
de Type Description descriptionString
Type TV seasonsInteger descriptionString
Episode
22
Union Factorization/Distribution
  • ((a,(bc)) (a,ba,c))
  • (at1t2 at1at2)

Type Show show _at_type String, titleString
, yearInteger, (MovieTV) Type Movie
box_officeInteger, video_salesInt, Type
TV seasonsInteger, descriptionString, Ep
isode
Type Show show (_at_typeString, titleStrin
g, yearInteger, box_officeInteger, video_s
alesInteger) (_at_typeString, titleString
, yearInteger, seasonsInteger, description
String, Episode)
23
Repetition Merge/Split
  • (a a,a a,a,a) etc

Type Show show _at_type String, titleString
, yearInteger, Aka, Aka0,
Type Show show _at_type String, titleString
, yearInteger, Aka1,
24
Wildcard Rewritings
  • We might want to access specific elements in a
    wildcard, such as NYTReview

Type Reviews review (NYTReview
OtherReview) Type NYTReview
nytString Type OtherReview (!nyt)String
Type Review reviewString
25
Finding the best pSchema
  • Use a Greedy Search
  • Search until a good result is found
  • 1. Get Initial/Current Schema
  • 2. Get schema cost
  • 3. Apply transformations to the schema
  • 4. Select the best schema cost from the
    transformations
  • 5. If the cost is better than the current schema,
    continue the search, mark this schema as the
    current schema. Otherwise stop searching.

26
Example Search
27
Problem?
  • With the way that this algorithm is set up can
    you find a major oversight?
  • Remember how the relational data is created
  • Sample XML Data
  • Sample XML Queries

28
Problem
  • A problem can be that the relative number of each
    query type is not taken into consideration.
  • For example, what will happen if 90 of queries
    are to gather a review for a website, while that
    is only 1 of 25 queries in the system. Query
    distribution is not uniform!

29
Problem...SolvedKind Of...
  • If we take into account the frequency of a query

30
Related Work
  • STORED- Storing Semistructured Data
  • SilkRoute- Converting Relational Data to XML
  • StatiX- XML Schema statistics framework

31
Conclusions
  • LegoDB is an excellent way to take Cost of a
    query into account when transforming an XML
    document to the relational model
  • Although LegoDB does an excellent job of
    transforming XML compared to static models, more
    work can be done on how to analyze how the
    frequency of queries affect the cost of the
    relational model.
Write a Comment
User Comments (0)
About PowerShow.com