On Wrapping Query Languages and Efficient XML Integration

1
On Wrapping Query Languages and Efficient XML
Integration
  • A paper by Vassilis Christophides and Sophie
    Cluet
  • Speaker: Yu Wang

2
Outline
  • Introduction
  • YAT System
  • YAT XML Algebra
  • Wrapping source query languages
  • Optimization techniques
  • Conclusion

3
Introduction
  • Applications require integrated access to various
    information sources, fast deployment, and low
    maintenance cost
  • XML
      • Enables easy wrapping of external sources
      • Enables declarative integration

4
Advantages of using XML
  • Has a flexible format that can represent
    structured and semistructured information
  • Data can be converted into XML easily
  • Many languages exist for declarative
    integration of XML data (e.g. MSL, YATL)
  • As a standard, it facilitates interoperability

5
Hard issues
  • Wrapping type information
      • XML's current form of typing is not sufficient to
        capture rich type systems (e.g. an object database
        schema)
      • Recent proposals (e.g. XML Schema, DCD) do not
        provide a definitive standard yet
  • Wrapping source query capabilities
      • In the TSIMMIS system, query templates are used to
        describe source capabilities
      • Templates do not allow an exhaustive description of
        a source's capabilities
  • Processing XML queries efficiently
      • XML does not have a well-understood algebra

6
Solution
  • The paper proposes an algebraic framework and
    optimization techniques to address the last two
    issues
  • An algebra for XML
      • Introduces an operational model based on a
        general-purpose algebra for XML
  • A source description language
      • Uses the algebra to wrap full-text queries and
        structured query languages (e.g. OQL, SQL)
  • Query processing techniques
      • Shows that the algebra is appropriate for optimizing
        integration applications

7
Example
  • An example illustrates the improvements the
    paper proposes
  • Goal: integrate two sources
      • Highly structured: an object database
      • Partially structured: a document repository,
        full-text indexed with Wais

8
Sample XML Data
    <work>
      <artist> Claude Monet </artist>
      <title> Nympheas </title>
      <style> Impressionist </style>
      <size> 21 x 61 </size>
      <cplace>Giverny</cplace>
    </work>
    ....
    <work>
      <artist> Claude Monet </artist>
      <title> Waterloo Bridge </title>
      <style> Impressionist </style>
      <size> 29.2 x 46.4 </size>
      <history>Painted with <technique> Oil on canvas
      </technique> in ...
    </work>

    <object id="a1" class="artifact">
      <tuple>
        <title> Nympheas </title>
        <year> 1897 </year>
        <creator> Claude Monet </creator>
        <price> 10.000.000 </price>
        <owners refs="p1 p2 p3"/>
      </tuple>
    </object>
    .....
    <object id="p3" class="person">
      <tuple>
        <name> Doctor X </name>
        <auction> 10.1500.000</auction>
      </tuple>
    </object>
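
A minimal Python sketch of reading the two kinds of sample fragments above with the standard library. The enclosing `<works>` and `<objects>` roots and the function names `load_works` and `load_artifacts` are assumptions added so that each fragment parses on its own; they are not part of the YAT wrappers.

```python
import xml.etree.ElementTree as ET

# Assumed wrapper roots (<works>, <objects>) around data copied from the slide.
WORKS_XML = """
<works>
  <work>
    <artist> Claude Monet </artist>
    <title> Nympheas </title>
    <style> Impressionist </style>
    <size> 21 x 61 </size>
    <cplace>Giverny</cplace>
  </work>
</works>
"""

O2_XML = """
<objects>
  <object id="a1" class="artifact">
    <tuple>
      <title> Nympheas </title>
      <year> 1897 </year>
      <creator> Claude Monet </creator>
      <price> 10.000.000 </price>
      <owners refs="p1 p2 p3"/>
    </tuple>
  </object>
</objects>
"""


def load_works(xml_text):
    """Return one dict per <work>, mapping child tag to stripped text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in work}
        for work in root.iter("work")
    ]


def load_artifacts(xml_text):
    """Return one dict per artifact object; owner references become a list of ids."""
    root = ET.fromstring(xml_text)
    records = []
    for obj in root.iter("object"):
        if obj.get("class") != "artifact":
            continue
        record = {"id": obj.get("id")}
        for child in obj.find("tuple"):
            if child.tag == "owners":
                record["owners"] = child.get("refs", "").split()
            else:
                record[child.tag] = (child.text or "").strip()
        records.append(record)
    return records


works = load_works(WORKS_XML)
artifacts = load_artifacts(O2_XML)
```

The document-style source keeps everything as element text, while the object-style source carries identity and references in attributes (`id`, `class`, `refs`), which is why the two loaders differ.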

9
YAT System
  • A semistructured data conversion system
  • Relies on a library of generic wrappers and a
    declarative integration language, YATL
  • Setting up the example application with YAT takes
    three steps
  • Structural information is exported by the two
    wrappers, o2 and xmlwais

10
Installing Wrappers and Mediators
    logos{simeon}: o2-wrapper -server gringos.inria.fr -system cultural -base art -port 6066
    o2-wrapper is running at logos.inria.fr:6066
    logos{simeon}:
    ------------------------------------------------------------------------------
    sappho{christop}: xmlwais-wrapper -directory christop/wais-sources/museum.src -port 6060
    xmlwais-wrapper is running at sappho.ics.forth.gr:6060
    sappho{christop}:
    ------------------------------------------------------------------------------
    cosmos{cluet}: yat-mediator -port 6666
    yat-mediator is running at cosmos.inria.fr:6666
    yat> connect o2artifact logos.inria.fr:6066
    yat> connect xmlartwork sappho.ics.forth.gr:6060
    yat> import o2artifact
    yat> import xmlartwork
    yat> load "/u/cluet/YAT/view1.yat"

11
YAT Type System
  • Represents information at various levels
    of genericity (model, schema, data)
  • Captures the connections existing between these
    levels
  • This feature is used to wrap query languages
  • A graphical representation of the YAT data model
  • O2 data model
      • Described as an atomic type, a tuple, a collection,
        or a reference
      • A tuple type is represented as a collection of
        linear subtrees

12
YAT Type System (Cont.)
  • Representation of the documents exported by
    the xmlwais wrapper
      • Described as a sequence of mandatory elements
      • Captures partially structured information
  • YAT meta-model
      • Captures any tree
      • The other models are instances of this meta-model

13
O2, XML-Wais and YAT mediator structural metadata

14
Integration Programs
  • Composed of
      • A sequence of rules
      • A sequence of queries, each with 3 clauses
  • MATCH
      • Performs pattern matching
      • Filters are used to navigate the source data
        and bind variables
  • WHERE
  • MAKE
      • Constructs the result by creating a new tree

15
Integrating information about the works of Art
  • artworks()
  • MAKE doc artwork(t,c) work title t,
  • artist a,
  • year y,
  • price p,
  • style s,
  • size si,
  • owners o,
  • more fields
  • MATCH artifacts WITH set class artifact
    tuple
  • title t,
  • year y,
  • creator c,
  • price p,
  • Owners list class person tuple
  • name o,
  • auction au,
  • works WITH works work artist a,
  • title t',

16
YAT XML Algebra
  • Overview
  • Characteristics
      • Main tool for both the generic description of
        source query capabilities and XML query
        optimization
      • Provides a fixed set of predefined operations
  • Satisfies the requirements
      • Expressive power: captures the evaluation of query
        and integration languages
      • Support for flexible typing
      • Support for optimization
  • An extension of an object algebra
  • Independent of any underlying physical access
    structure

17
YAT XML Algebra (Cont.)
  • Operators
  • Bind operator
      • Extracts data from the input tree according to the
        filter
      • Produces a tabular representation of the variable
        bindings
  • Tree operator
      • Returns a collection of trees conforming to the
        input pattern
      • Equivalent to a grouping operation
  • Skolem functions
      • Create new identifiers
      • Perform value assignment
18
YAT XML Algebra (Cont.)
  • Object algebra
  • Select / Project / Join / Union / Intersection
  • Group / Sort / Map / D-Join
  • Applied at the top level of a Tab structure,
    except Map

19
YAT XML Algebra (Cont.)

20
A Bind operation and resulting Tab structure

21
The Tree operation

22
YATL Algebraic Translation
  • Steps
      • Named documents are the input
      • The MATCH clause is translated into a Bind operation
      • The connection between the various inputs is
        materialized using a Join operation
      • WHERE clauses are translated into a Select
        operation
      • The MAKE clause is translated using the Tree
        operation
  • The example shows the algebraic translation of
    the view definition of Figure 2 and of the example query
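
The translation steps can be sketched in Python as a pipeline over lists of dicts. This is an illustration only: the two inputs stand for the Tab structures that the Bind operations would produce, the join condition (same title, creator equal to artist) is an assumption because the slide's view is cut off before its WHERE clause, and the operator functions are hypothetical names.

```python
def join(left, right, pred):
    """Join: combine every pair of rows that satisfies the predicate."""
    return [{**l, **r} for l in left for r in right if pred(l, r)]


def select(tab, pred):
    """Select: keep the rows that satisfy the predicate (the WHERE clause)."""
    return [row for row in tab if pred(row)]


def tree(tab):
    """Tree: build one output tree per row (the MAKE clause, without grouping)."""
    return [
        {"work": {"title": r["t"], "artist": r["a"], "year": r["y"], "style": r["s"]}}
        for r in tab
    ]


# Rows as a Bind on each source might produce them (values from the sample data).
artifacts_tab = [{"t": "Nympheas", "y": "1897", "c": "Claude Monet"}]
works_tab = [
    {"a": "Claude Monet", "t2": "Nympheas", "s": "Impressionist"},
    {"a": "Claude Monet", "t2": "Waterloo Bridge", "s": "Impressionist"},
]

# Assumed join condition connecting the two inputs.
joined = join(artifacts_tab, works_tab,
              lambda l, r: l["t"] == r["t2"] and l["c"] == r["a"])
answer = tree(select(joined, lambda row: row["s"] == "Impressionist"))
```

Only "Nympheas" survives the join, because the sample object source has no artifact for "Waterloo Bridge".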

23
Query Example 1
  • Q1: What are the artifacts created at "Giverny"?
  • MAKE t
  • MATCH artworks WITH doc.work. title.t,
    more.cplace.cl
  • WHERE cl = "Giverny"

24
Algebraization of YATL queries

25
Wrapping source query languages
  • Wrapping source operations in YAT is performed in
    two steps
  • Signature
  • Essential
  • Manual
  • Semantics
  • Example: a function imported by the O2 wrapper

    <operation name="external">
      <operation name="current_price">
        <input>
          <value model="Artifact_Schema" pattern="Artifact"/></input>
        <output>
          <leaf label="Float"/></output>
      </operation>
    </operation>

26
Describing OQL capabilities
  • The YAT operational model borrows a large part of the
    OQL algebra
  • OQL binding capabilities are more restricted
  • This restriction is taken into account by restricting
    the Bind operation
  • Bind is always the first operation in a query

27
Describing OQL capabilities (Cont.)
  • Capturing binding capabilities
  • The Bind operation has 2 parameters: a filter and the
    data to be filtered/bound
  • Need to specify which filters are acceptable
    for OQL
  • Valid filters are described by filter patterns,
    called Fpatterns
  • The integration programmer does not need to see them
  • Coded by YAT developers
  • Embedded within the O2 wrapper

28
O2 Filter patterns exported in XML
    <interface name="o2artifact">
      <operat>
        <fmodel name="o2fmodel">
          <fpattern name="Fclass">
            <node label="class" bind="tree">
              <node label="Symbol" bind="none" inst="ground">
                <value pattern="Ftype"/></node></node>
          </fpattern>

          <fpattern name="Ftype">
            <union>
              <leaf label="Bool"/>
              <leaf label="Char"/>
              <leaf label="Int"/>
              <leaf label="Float"/>
              <leaf label="String"/>
              <node label="tuple" col="set" bind="tree">
                <star inst="ground">
                  <node label="Symbol" bind="none">
                  ...
              <node label="set" col="set" bind="tree">
                <star inst="none"><value label="Ftype"/></star></node>
              <node label="bag" col="bag" bind="tree">
                <star inst="none"><value label="Ftype"/></star></node>
              <node label="list" bind="tree">
                <star inst="none"><value label="Ftype"/></star></node>
              <node label="array" bind="tree">
                <star inst="none"><value label="Ftype"/></star></node>
              <ref pattern="Fclass"/>
            </union>
          </fpattern>
        </fmodel>
      </operat>
    </interface>
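
One way to read the Ftype pattern is as a recursive grammar of acceptable filters. The Python sketch below checks only the structural shape of a filter against that grammar. It is a simplification under stated assumptions: filters are encoded as tuples (a hypothetical encoding), the `bind` and `inst` attributes are ignored, and tuple fields are assumed to be Ftype filters themselves since the corresponding lines are missing from the slide.

```python
ATOMIC = {"Bool", "Char", "Int", "Float", "String"}
COLLECTIONS = {"set", "bag", "list", "array"}


def conforms_ftype(f):
    """Return True if the tuple-encoded filter f has a shape allowed by Ftype."""
    if not isinstance(f, tuple) or not f:
        return False
    kind = f[0]
    if kind in ATOMIC:
        # Atomic leaf, e.g. ("String",)
        return len(f) == 1
    if kind == "tuple":
        # ("tuple", {field_name: sub_filter}); field typing is an assumption.
        return (len(f) == 2 and isinstance(f[1], dict)
                and all(conforms_ftype(v) for v in f[1].values()))
    if kind in COLLECTIONS:
        # ("set", sub_filter), ("list", sub_filter), ...
        return len(f) == 2 and conforms_ftype(f[1])
    if kind == "class":
        # Fclass: ("class", class_name, sub_filter)
        return len(f) == 3 and isinstance(f[1], str) and conforms_ftype(f[2])
    return False


ok_filter = ("set", ("class", "artifact",
                     ("tuple", {"title": ("String",), "year": ("Int",)})))
bad_filter = ("set", ("tree", ("String",)))  # "tree" is not an O2 type constructor
```

A mediator could run such a check before pushing a Bind to the O2 source and keep the Bind local when the check fails.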

29
Description for OQL: a subset of the operational
interface of the O2 wrapper

    <omodel name="o2omodel">
      <operation name="algebraic">
        <union>
          <operation name="bind">
            <input>
              <value model="o2model" pattern="Type"/>
              <filter model="o2fmodel" pattern="Ftype"/>
            </input>
            <output>
              <value model="yatstruc" pattern="Tab"/>
            </output>
          </operation>
          <operation name="select"></operation>
          <operation name="map"></operation>
          ...
        </union>
      </operation>

      <operation name="boolean">

30
Describing Wais capabilities
  • Three steps to wrap the query capabilities of the
    XML-Wais source
      • Specify the source Fpatterns
      • Declare that the source supports Bind and Select
      • Describe the full-text predicate contains
        supplied by Wais

31
Interface to the XML-Wais wrapper

32
Interface to the XML-Wais wrapper (Cont.)

33
Optimization techniques
  • The algebra has two parts
      • Object algebra
      • Two operations to manipulate XML data:
        Bind and Tree
  • Optimization is divided into two parts
      • Optimization techniques proposed for the
        relational or object models are directly
        applicable
      • Rewriting techniques for the Bind/Tree operations
  • User queries over views are optimized locally or by
    pushing queries to the external sources

34
Bind Rewriting
  • Reasons
      • A simpler Bind has a better chance of being pushed
        to a source
      • Bind entails navigation that can be costly and
        should be transformed into more traditional
        associative access as much as possible
  • Two cases
      • Vertical navigation
      • Horizontal navigation and type filtering

35
Bind and vertical navigation
  • Two ways
      • Split a complex Bind into elementary Binds
        connected together through D-Joins
      • Split a complex Bind into a linear sequence of
        elementary ones, each one navigating down the
        result of the previous one
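
The second splitting strategy can be illustrated with a small Python sketch. It assumes trees are nested dicts and lists, and the function names are hypothetical. The owner "Doctor Y" is invented for the example so that the nested collection has more than one element.

```python
# One artifact with a nested list of owners ("Doctor Y" is a made-up second owner).
artifacts = [
    {"title": "Nympheas", "owners": [{"name": "Doctor X"}, {"name": "Doctor Y"}]},
]


def complex_bind(arts):
    """One Bind that navigates two levels: binds title t and each owner name o."""
    return [{"t": a["title"], "o": owner["name"]}
            for a in arts for owner in a["owners"]]


def bind_top(arts):
    """Elementary Bind 1: binds t and keeps the owners subtree in variable x."""
    return [{"t": a["title"], "x": a["owners"]} for a in arts]


def bind_down(tab):
    """Elementary Bind 2: navigates down x from each row of the previous result."""
    rows = []
    for row in tab:
        for owner in row["x"]:
            rows.append({"t": row["t"], "o": owner["name"]})
    return rows


direct = complex_bind(artifacts)
split = bind_down(bind_top(artifacts))
```

Both forms yield the same rows. The benefit of the split form is that the first, simpler Bind may match a source's declared capabilities even when the complex one does not.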

36
From Bind to Join

37
Splitting Binds

38
Bind, horizontal navigation and type filtering
  • In the absence of type information
      • In purely semistructured systems, the strategy is
        to navigate through the whole data graph
  • Using type information about the data or the
    filter is useful for XML queries mixing
    structured and semistructured data

39
Bind, horizontal navigation and type filtering
(Cont.)
  • Semistructured queries over structured data
      • Queries access both structure and content
      • E.g. with precise type information, we
        can simplify the filter
  • Structured queries over semistructured data
      • By using a projection to rewrite the Bind
        operation, we can simplify the query
      • Care is needed not to change the type-filtering
        semantics of the Bind

40
Bind and Map or Project

41
Tree-Bind Rewriting
  • Tree captures the restructuring semantics of a
    query or view definition
  • Tree can be rewritten as a sequence of Group, Sort
    and nested Map operations
  • It is important to eliminate the intermediate
    Tree operations resulting from the composition of
    queries and view definitions

42
Tree-Bind Rewriting (Cont.)
  • Optimization process
      • Get rid of the Bind-Tree sequence that appears
        at the frontier between view definition and query
      • Transform the Bind-Tree sequence into a simple
        projection with renaming
      • Eliminate the branch corresponding to the O2
        source and simplify the Bind on the XML source
      • Merge the remaining Bind filters to obtain the
        final expression
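
The step that turns a Bind-Tree sequence into a projection with renaming can be checked on a toy case. The Python sketch below assumes a view whose Tree builds one tree per row with no grouping, and a query whose Bind reads back two of the fields. The function and variable names are illustrative only.

```python
def view_tree(tab):
    """Tree of the view: one output tree per row (no grouping in this sketch)."""
    return [
        {"work": {"title": r["t"], "artist": r["a"], "more": {"cplace": r["cl"]}}}
        for r in tab
    ]


def query_bind(trees):
    """Bind of the query: reads title and more/cplace back out of the view trees."""
    return [{"t2": d["work"]["title"], "cl2": d["work"]["more"]["cplace"]}
            for d in trees]


def project_rename(tab, mapping):
    """Projection with renaming: keep the columns in mapping under their new names."""
    return [{new: row[old] for old, new in mapping.items()} for row in tab]


tab = [
    {"t": "Nympheas", "a": "Claude Monet", "cl": "Giverny"},
]

naive = query_bind(view_tree(tab))                       # materializes the view trees
rewritten = project_rename(tab, {"t": "t2", "cl": "cl2"})  # never builds a tree
```

Both expressions return the same Tab, but the rewritten one skips the intermediate trees, which is the point of eliminating Tree operations introduced by composing queries with views.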

43
Optimization of Q1

44
Source Capability-based Rewriting
  • Exploiting source capabilities during query
    processing is the most important technique in a
    distributed context
  • Pushing query evaluation to an external source
    makes it possible to
      • Reduce the processing time
      • Minimize the communication costs
      • Limit the use of system resources
      • Benefit from possible parallelism

45
Source Capability-based Rewriting (Cont.)
  • Optimization steps
  • Apply the Bind-Tree simplification
  • Use the projection to simplify the Bind on
    each source, and push selections
  • Push as much evaluation as possible to the source
      • On the O2 side, little work is required since
        both Bind and selection can be transformed into
        an OQL query
      • On the XML-Wais side, it is possible to push
        a simple Bind on XML documents along with a
        contains predicate
          • Introduce a select with contains
          • Split the Bind to match the Wais capabilities
            description
  • Determine possible information passing between
    sources based on standard rewriting between Joins
    and D-Joins
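
A minimal Python sketch of the "push as much as possible" step. It assumes a plan is a linear list of (operation, argument) pairs and that each source declares which operations and predicates it accepts. The capability tables below are simplified stand-ins for the XML interface descriptions, and all names are hypothetical.

```python
# Simplified capability tables: None means any argument is accepted.
WAIS_CAPS = {"bind": None, "select": {"contains"}}
O2_CAPS = {"bind": None, "select": {"=", "<"}, "map": None}


def supported(op, capabilities):
    """True if the source accepts this operation with this argument."""
    name, arg = op
    if name not in capabilities:
        return False
    allowed = capabilities[name]
    return allowed is None or arg in allowed


def split_plan(plan, capabilities):
    """Split a linear plan into the prefix pushed to the source and the local rest."""
    for i, op in enumerate(plan):
        if not supported(op, capabilities):
            return plan[:i], plan[i:]
    return plan, []


# XML-Wais side: a contains select is introduced before the exact comparison.
wais_plan = [("bind", "work"), ("select", "contains"), ("select", "=")]
wais_pushed, wais_local = split_plan(wais_plan, WAIS_CAPS)

# O2 side: Bind and selection can both go to the source.
o2_plan = [("bind", "artifact"), ("select", "<")]
o2_pushed, o2_local = split_plan(o2_plan, O2_CAPS)
```

Only a prefix of the plan is pushed because each operation consumes the result of the previous one. The exact comparison stays in the mediator on the Wais side, where it refines the coarser full-text match.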

46
Query Example 2
  • Q2: Which impressionist artworks are sold for
    less than 200,000.00?
  • MAKE answer title t, artist a, price p
  • MATCH works WITH doc work title t,
    artist a,
    price p,
    style s
  • WHERE p < 200000 AND s = "Impressionist"

47
Algebraic translation and optimization of Q2

48
Conclusion
  • Presents an algebraic framework to support
    efficient query evaluation in XML integration
    systems
  • Relies on a general-purpose algebra
  • Wraps, with appropriate type information, more
    structured query languages such as OQL and SQL
  • Equips the algebra with a number of equivalences
    offering optimization opportunities