Introduction To XML Algebra - PowerPoint PPT Presentation

Slides: 52
Provided by: webC
Learn more at: http://web.cs.wpi.edu
1
Introduction To XML Algebra
  • Wan Liu
  • Bintou Kane
  • Advanced Database
  • Instructor Elka
  • 2/11/2002

2
Outline
  • Reasons for XML algebra
  • Niagara algebra
  • AT&T Algebra

3
Data Model and Design
  • We need a clear framework in which to design a
    database.
  • A data model plays the role that data structures
    play in programming: it is an abstract type
    system for the data.
  • Relational databases are implemented with tables;
    the XML format is a newer method for information
    integration.

4
Why XML Algebra?
  • It is common to translate a query language into
    an algebra.
  • First, the algebra is used to give a semantics
    for the query language.
  • Second, the algebra is used to support query
    optimization.

5
XML Algebra History
  • Lore Algebra (August 1999)
  • -- Stanford University
  • IBM Algebra (September 1999)
  • -- Oracle, IBM, Microsoft Corp.
  • YAT Algebra (May 2000)
  • AT&T Algebra (June 2000)
  • -- AT&T, Bell Labs
  • Niagara Algebra (2001)
  • -- University of Wisconsin-Madison

6
NIAGARA
  • Title: "Following the Paths of XML Data: An
    Algebraic Framework for XML Query Evaluation"
  • By Leonidas Galanis, Efstratios Viglas, David
    J. DeWitt, Jeffrey F. Naughton, and David Maier

7
Outline
  • Concepts of Niagara Algebra
  • Operations
  • Optimization

8
Goals of Niagara Algebra
  • Be independent of schema information
  • Query on both structure and content
  • Generate simple, flexible, yet powerful algebraic
    expressions
  • Allow re-use of traditional optimization
    techniques

9
Example XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice No="1">
    <account_number>2</account_number>
    <carrier>ATT</carrier>
    <total>0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>1.20</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>ATT</carrier>
    <total>0.75</total>
  </invoice>
</Invoice_Document>

Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>

10
XML Data Model and Tree Graph
  • Example

<Invoice_Document>
  <invoice>
    <number>2</number>
    <carrier>ATT</carrier>
    <total>0.25</total>
  </invoice>
  <invoice>
    <number>1</number>
    <carrier>Sprint</carrier>
    <total>1.20</total>
  </invoice>
</Invoice_Document>

Tree graph of the document:
Invoice_Document
  invoice
    number: 2
    carrier: ATT
    total: 0.25
  invoice
    number: 1
    carrier: Sprint
    total: 1.20
Ordered tree graph, semi-structured data
11
XML Data Model [GVDNM01]
  • A collection of bags of vertices.
  • Vertices in a bag have no order.
  • Example

Bag: [Root invoice.xml, invoice, invoice.account_number]
  invoice: <invoice> invoice-element-content </invoice>
  invoice.account_number: <account_number> element-content </account_number>
12
Data Model
  • Bag elements are reachable by path expressions.
  • A path expression consists of two parts:
  • an entry point
  • a relative forward part
  • Example: account_numberinvoice

13
Operators
  • Source S, Follow Φ, Select σ, Join ⋈, Rename ρ,
    Expose, Vertex, Group, Union ∪, Intersection ∩,
    Difference −, Cartesian Product ×.

14
Source Operator S
  • Input: a list of documents
  • Output: a collection of singleton bags
  • Examples
  • S (): all known XML documents
  • S (invoice.xml): all XML documents whose
    filename matches invoice.xml
  • S (,schema.dtd): all known XML documents that
    conform to schema.dtd

15
Follow operator Φ
  • Input: a path expression in entry point notation
  • Functionality: extracts the vertices reachable by
    the path expression
  • Output: a new bag that consists of the extracted
    vertex plus all the contents of the original bag
    (in the case of unnesting follow)
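
The behaviour described above can be sketched in Python. This is only an illustration, not the Niagara implementation: representing a bag as a dict from vertex names to XML elements, the function name `follow`, and its `new_name` parameter are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the invoice document used in these slides.
INVOICE_XML = """<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>ATT</carrier></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier></invoice>
</Invoice_Document>"""


def follow(bags, entry_point, tag, new_name):
    """Unnesting follow: emit one bag per vertex reached from entry_point.

    Each output bag copies the input bag and adds the reached vertex
    under new_name, so nothing from the input bag is lost.
    """
    result = []
    for bag in bags:
        for vertex in bag[entry_point].findall(tag):
            extended = dict(bag)
            extended[new_name] = vertex
            result.append(extended)
    return result


# A singleton bag holding only the document root (hypothetical source output).
root_bags = [{"Root invoice.xml": ET.fromstring(INVOICE_XML)}]
invoice_bags = follow(root_bags, "Root invoice.xml", "invoice", "invoice")
carrier_bags = follow(invoice_bags, "invoice", "carrier", "invoice.carrier")

for bag in carrier_bags:
    print(list(bag), bag["invoice.carrier"].text)
```

Each call widens every bag by one vertex, which mirrors the output bag growing by one entry in the example slide that follows.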

16
Follow operator (Example)
Input bag: [Root invoice.xml, invoice]
  invoice: <invoice> invoice-element-content </invoice>
Operator: unnesting Follow Φ(carrierinvoice)
Output bag: [Root invoice.xml, invoice, invoice.carrier]
  invoice: <invoice> invoice-element-content </invoice>
  invoice.carrier: <carrier> carrier-element-content </carrier>
17
Select operator σ
  • Input: a set of bags
  • Functionality: filters the bags of a collection
    using a predicate
  • Output: the set of bags that satisfy the
    predicate
  • Predicate: logical operators (∧, ∨, ¬) or simple
    qualifications (>, <, ≥, ≤, =, ≠)
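
A minimal Python sketch of this filtering step, assuming bags are dicts from vertex names to XML elements. The helper names `select`, `carrier_is`, and `cheaper_than` are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

INVOICE_XML = """<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>ATT</carrier><total>0.25</total></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier><total>1.20</total></invoice>
  <invoice><account_number>1</account_number><carrier>ATT</carrier><total>0.75</total></invoice>
</Invoice_Document>"""


def select(bags, predicate):
    """Keep only the bags for which predicate(bag) is true."""
    return [bag for bag in bags if predicate(bag)]


def carrier_is(name):
    """Simple qualification: invoice.carrier = name."""
    return lambda bag: bag["invoice"].findtext("carrier") == name


def cheaper_than(limit):
    """Simple qualification: invoice.total < limit."""
    return lambda bag: float(bag["invoice"].findtext("total")) < limit


# One bag per invoice vertex, as an earlier follow step would have produced.
bags = [{"invoice": inv} for inv in ET.fromstring(INVOICE_XML).findall("invoice")]

sprint = select(bags, carrier_is("Sprint"))
# Qualifications combine with ordinary logical operators (here: AND).
cheap_att = select(bags, lambda b: carrier_is("ATT")(b) and cheaper_than(0.5)(b))
```

The input and output have the same bag shape; only the number of bags changes.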

18
Select operator (Example)
Input bags: [Root invoice.xml, invoice, ...] and [Root invoice.xml, invoice, ...]
  invoice (in each bag): <invoice> invoice-element-content </invoice>
Operator: Select σ (invoice.carrier = Sprint)
Output bag: [Root invoice.xml, invoice, ...]
  invoice: <invoice> invoice-element-content </invoice>
19
Join operator ⋈
  • Input: two collections of bags
  • Functionality: joins the two collections based on
    a predicate
  • Output: the concatenation of the pairs of bags
    that satisfy the predicate
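
As a rough sketch under simplifying assumptions, the join can be written as a nested loop over the two collections. Bags are reduced here to dicts of plain string values, and the key names are chosen for illustration only.

```python
def join(left_bags, right_bags, predicate):
    """Nested-loop join: concatenate every (left, right) pair that satisfies predicate."""
    return [
        {**left, **right}  # concatenation of the two bags
        for left in left_bags
        for right in right_bags
        if predicate(left, right)
    ]


# Hypothetical, already-flattened bags based on the sample documents.
invoice_bags = [
    {"invoice.account_number": "2", "invoice.carrier": "ATT"},
    {"invoice.account_number": "1", "invoice.carrier": "Sprint"},
]
customer_bags = [
    {"customer.account": "1", "customer.name": "Tom"},
    {"customer.account": "2", "customer.name": "George"},
]

joined = join(
    invoice_bags,
    customer_bags,
    lambda inv, cust: inv["invoice.account_number"] == cust["customer.account"],
)
for bag in joined:
    print(bag)
```

A nested loop is the simplest strategy; the slides do not say which physical join algorithm Niagara uses.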

20
Join operator (Example)
Input bags: [Root invoice.xml, invoice] and [Root customer.xml, customer]
  invoice: <invoice> invoice-element-content </invoice>
  customer: <customer> customer-element-content </customer>
Operator: Join ⋈ (account_numberinvoice = numbercustomer)
Output bag: [Root invoice.xml, invoice, Root customer.xml, customer]
  invoice: <invoice> invoice-element-content </invoice>
  customer: <customer> customer-element-content </customer>
21
Expose operator
  • Input: a list of path expressions of the vertices
    to be exposed
  • Output: a set of bags that contain the vertices
    in the parameter list, in the same order
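
A small Python sketch of this projection, assuming each bag is a dict (Python dicts preserve insertion order). The function name `expose` and all values in the sample bag are placeholders invented for this illustration.

```python
def expose(bags, names):
    """Keep only the listed vertices, in the order given by names."""
    return [{name: bag[name] for name in names} for bag in bags]


# Hypothetical bag; the values stand in for XML vertices.
bags = [
    {
        "Root invoice.xml": "document-root",
        "invoice": "invoice-element-content",
        "invoice.carrier": "carrier-element-content",
        "invoice.bill_period": "bill_period-element-content",
    }
]

exposed = expose(bags, ["invoice.bill_period", "invoice.carrier"])
print(list(exposed[0]))
```

Note that the output order follows the parameter list, not the order in which the vertices were added to the bag.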

22
Expose operator (Example)
Input bag: [Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]
  invoice: <invoice> invoice-element-content </invoice>
  invoice.carrier: <carrier> carrier-element-content </carrier>
  invoice.bill_period: <bill_period> bill_period-element-content </bill_period>
Operator: Expose (bill_period, carrier)
Output bag: [Root invoice.xml, invoice.bill_period, invoice.carrier]
23
Vertex operator
  • Creates the actual XML vertex that will encompass
    everything created by an expose operator
  • Example

Vertex(Customer_invoice) Expose(Vertex(account) invoice.account_number,
Vertex(inv_total) invoice.total)
24
Other operators
  • Group is used for arbitrary grouping of
    elements based on their values
  • Aggregate functions (e.g., average) can be used
    with the group operator
  • Rename ρ changes the entry point annotation of
    the elements of a bag.
  • Example: ρ(invoice.bill_period, date)

25
Example XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice>
    <account_number>2</account_number>
    <carrier>ATT</carrier>
    <total>0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>1.20</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <total>0.75</total>
  </invoice>
  <auditor>maria</auditor>
</Invoice_Document>

Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>
26
Xquery Example
  • List the account number, customer name, and
    invoice total for all invoices whose carrier is
    Sprint.

FOR $i in (invoices.xml)//invoice,
    $c in (customers.xml)//customer
WHERE $i/carrier = "Sprint" and
      $i/account_number = $c/account
RETURN
  <Sprint_invoices>
    $i/account_number,
    $c/name,
    $i/total
  </Sprint_invoices>

27
Example Xquery output
<Sprint_invoices>
  <account_number>1</account_number>
  <name>Tom</name>
  <total>1.20</total>
</Sprint_invoices>

28
Algebra Tree Execution
Query plan, read bottom-up (indentation shows each operator's inputs; "->" shows its output):
Expose (.account_number, .name, .total)  ->  account_number, name, total
  Join (.invoice.account_number = .customer.account)  ->  invoice (2), customer (1)
    Select (carrier = Sprint)  ->  invoice (2)
      Follow (.invoice)  ->  invoice (1), invoice (2), invoice (3)
        Source (Invoices.xml)
    Follow (.customer)  ->  customer (1), customer (2)
      Source (customers.xml)
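
The plan above can be traced end to end with a small Python sketch. It is an illustration under stated assumptions rather than Niagara's engine: bags are dicts from vertex names to XML elements, every function name is hypothetical, and `expose` here returns plain text tuples instead of constructing XML.

```python
import xml.etree.ElementTree as ET

INVOICES = """<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>ATT</carrier><total>0.25</total></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier><total>1.20</total></invoice>
  <invoice><account_number>1</account_number><carrier>ATT</carrier><total>0.75</total></invoice>
</Invoice_Document>"""

CUSTOMERS = """<Customer_Document>
  <customer><account>1</account><name>Tom</name></customer>
  <customer><account>2</account><name>George</name></customer>
</Customer_Document>"""


def source(name, xml_text):
    """One singleton bag holding the document root."""
    return [{f"Root {name}": ET.fromstring(xml_text)}]


def follow(bags, entry_point, tag):
    """Unnesting follow: one output bag per child vertex named tag."""
    return [{**bag, tag: v} for bag in bags for v in bag[entry_point].findall(tag)]


def select(bags, predicate):
    return [bag for bag in bags if predicate(bag)]


def join(left, right, predicate):
    return [{**l, **r} for l in left for r in right if predicate(l, r)]


def expose(bags, paths):
    """paths is a list of (vertex name, child tag) pairs; returns text tuples."""
    return [tuple(bag[v].findtext(t).strip() for v, t in paths) for bag in bags]


invoices = follow(source("invoices.xml", INVOICES), "Root invoices.xml", "invoice")
customers = follow(source("customers.xml", CUSTOMERS), "Root customers.xml", "customer")
sprint = select(invoices, lambda b: b["invoice"].findtext("carrier") == "Sprint")
joined = join(
    sprint,
    customers,
    lambda i, c: i["invoice"].findtext("account_number") == c["customer"].findtext("account"),
)
result = expose(
    joined,
    [("invoice", "account_number"), ("customer", "name"), ("invoice", "total")],
)
print(result)
```

The intermediate sizes match the plan: three invoice bags after the follow, one after the select, and one joined bag for account 1.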
29
Optimization with Niagara
  • The optimizer is based on the Niagara algebra
  • It uses the operations more efficiently
  • It produces simpler expressions by combining
    operations

30
Language Convention
  • A and B are path expressions
  • A < B means path expression A is a prefix of B
  • A ∩ B denotes the greatest common prefix of
    paths A and B
  • - denotes the null path expression

31
Use of Rule 8.5
  • Take advantage of rule 8.5
  • It allows optimization based on path selectivity
  • It applies to the unnesting follow operation Φµ

32
  • Φµ(A) Φµ(B) = Φµ(B) Φµ(A)
  • True when
  • there exists C such that C < A and C < B,
  • with C = A ∩ B
  • or A ∩ B = -
  • Interchangeability of Follow operations
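
The condition for swapping two follow operations can be sketched as a small check in Python. Several things here are assumptions made for illustration: a path expression is modelled as a tuple of steps starting at the entry point, "is a prefix of" is read as a proper prefix, and the function names are hypothetical.

```python
def greatest_common_prefix(a, b):
    """Longest sequence of leading steps shared by paths a and b."""
    prefix = []
    for step_a, step_b in zip(a, b):
        if step_a != step_b:
            break
        prefix.append(step_a)
    return tuple(prefix)


def is_proper_prefix(c, path):
    """True when c is a strictly shorter leading part of path."""
    return len(c) < len(path) and path[: len(c)] == c


def follows_commute(a, b):
    """Check the slide's condition for interchanging follow(a) and follow(b)."""
    c = greatest_common_prefix(a, b)
    if not c:
        # The paths share nothing: the null-path case.
        return True
    # Otherwise the shared part must be a proper prefix of both paths.
    return is_proper_prefix(c, a) and is_proper_prefix(c, b)


print(follows_commute(("invoice", "acc_Num"), ("invoice", "carrier")))
print(follows_commute(("invoice", "acc_Num"), ("customer", "acc_Num")))
print(follows_commute(("invoice",), ("invoice", "carrier")))
```

The third call returns False because the second path extends the first, so that follow depends on the result of the other and the two cannot be reordered.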

33
Application of 8.5 With Invoice
  • Φµ(acc_Numinvoice) Φµ(carrierinvoice)
  • is equivalent to
  • Φµ(carrierinvoice) Φµ(acc_Numinvoice)
  • Both share the common prefix invoice
  • Case: A ∩ B = invoice

34
Benefit of Rule Application
  • Note: if
  • acc_Num is required for each invoice element
  • carrier is not required for an invoice element
  • then using
  • Φµ(acc_Numinvoice) Φµ(acc_Numcustomer)
  • makes more sense than the alternative order. Why?

35
  • Reduction of the input size in the first
  • sub-operation
  • Φµ(carrierinvoice)
  • Can we, and should we, apply rule 8.5 below?
  • Φµ(acc_Numinvoice) Φµ(acc_NumCustomer)
  • Why?

36
  • acc_Numinvoice and
  • acc_NumCustomer are entirely different paths
  • This is the case A ∩ B = -, so yes

37
Rules 8.7, 8.9, and 8.11 are interesting: they help identify
  • when and where to use selection σ
  • to decrease the size of the input to subsequent
    operations
  • Example: the algebra tree on slide 28
  • Select is applied before Join.
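
The benefit of selecting before joining can be made concrete by counting how many bag pairs the join examines. The sketch below is illustrative only: it uses a naive nested-loop join over flattened dicts, and the `pairs_examined` counter is a stand-in for cost, not a Niagara cost model.

```python
invoices = [
    {"account_number": "2", "carrier": "ATT", "total": "0.25"},
    {"account_number": "1", "carrier": "Sprint", "total": "1.20"},
    {"account_number": "1", "carrier": "ATT", "total": "0.75"},
]
customers = [
    {"account": "1", "name": "Tom"},
    {"account": "2", "name": "George"},
]


def join_on_account(left, right):
    """Nested-loop join that also reports how many pairs it examined."""
    pairs_examined = 0
    out = []
    for inv in left:
        for cust in right:
            pairs_examined += 1
            if inv["account_number"] == cust["account"]:
                out.append({**inv, **cust})
    return out, pairs_examined


def is_sprint(bag):
    return bag["carrier"] == "Sprint"


# Plan 1: join everything, then select.
joined_all, late_cost = join_on_account(invoices, customers)
late = [bag for bag in joined_all if is_sprint(bag)]

# Plan 2: select first, then join the smaller input.
early, early_cost = join_on_account([i for i in invoices if is_sprint(i)], customers)

print(late == early, late_cost, early_cost)
```

Both plans return the same single bag, but the second examines fewer pairs because the select has already discarded the two non-Sprint invoices.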

38
A useful addition would be
  • a procedure that automatically detects when a
    rule can be applied in a given case and then
    applies it.

39
  • AT&T Algebra

41
AT&T Algebra Introduction
  • The algebra is derived from the nested relational
    algebra.
  • The AT&T algebra makes heavy use of list
    comprehensions, a standard notation in the
    functional programming community.
  • The AT&T algebra uses the functional programming
    language Haskell as a notation for presenting
    the algebra.

42
AT&T data model
  • The data model merges attribute and element
    nodes, and eliminates comments.
  • Declare the basic type Node.
  • text :: String -> Node
  • elem :: Tag -> [Node] -> Node
  • ref :: Node -> Node

elem "bib" [ elem "book" [ elem "@year" [ text "1999" ],
                           elem "title" [ text "Data on the Web" ] ] ]

<bib>
  <book year="1999">
    <title>Data on the Web</title>
    <year>1999</year>
  </book>
</bib>

43
Basic Type Declarations
  • To find the type of a node:
  • isText :: Node -> Bool
  • isElem :: Node -> Bool
  • isRef :: Node -> Bool
  • For a text node: string :: Node -> String
  • For an element node:
  • 1) tag :: Node -> Tag
  • 2) children :: Node -> [Node]
  • For a reference node:
  • dereference :: Node -> Node

44
Nested relational algebra
  • In the nested relational approach, data is
    composed of tuples and lists.
  • Tuple values and tuple types are written in round
    brackets.
  • (1999, "Data on the Web", "Abiteboul")
    :: (Int, String, String)
  • Decompose values
  • year :: (Int, String, String) -> Int
  • year (x, y, l) = x

45
Nested relational algebra
  • Comprehensions: list comprehensions can be used
    to express fundamental query operations:
    navigation, Cartesian product, nesting, and joins.
  • Example: [ value x
  •   | x <- children book0, is "author" x ]
  •   ==> [ "Abiteboul" ]
  • General form: [ exp | qual1, ..., qualn ]
  • where each qualifier is either a filter, bool-exp,
  • or a generator, pat <- list-exp

46
Nested relational algebra
  • Using comprehensions to write queries.
  • Navigate
  • follow :: Tag -> Node -> [Node]
  • follow t x = [ y | y <- children x, is t y ]
  • Cartesian product
  • [ (value y, value z)
  •   | x <- follow "book" bib0,
  •     y <- follow "title" x,
  •     z <- follow "author" x ]
  •   ==> [ ("Data on the Web", "Abiteboul") ]
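
For readers more familiar with Python, the navigation and Cartesian product above translate almost directly into Python list comprehensions. This is a loose analogue, not the AT&T algebra itself: the `Node` class, the "#text" tag convention, and the `author` element added to the sample data are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str                      # "#text" marks a text node (assumed convention)
    children: list = field(default_factory=list)
    string: str = ""


def text(s):
    return Node("#text", string=s)


def elem(tag, kids):
    return Node(tag, children=kids)


def value(node):
    """Text content of an element that has a single text child."""
    return node.children[0].string


def follow(tag, node):
    """All children of node carrying the given tag."""
    return [y for y in node.children if y.tag == tag]


bib0 = elem("bib", [
    elem("book", [
        elem("@year", [text("1999")]),
        elem("title", [text("Data on the Web")]),
        elem("author", [text("Abiteboul")]),
    ]),
])

# Cartesian product of titles and authors within each book.
pairs = [
    (value(y), value(z))
    for x in follow("book", bib0)
    for y in follow("title", x)
    for z in follow("author", x)
]
print(pairs)
```

Each `for` clause plays the role of a generator qualifier; adding an `if` clause would correspond to a filter qualifier.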

47
Nested relational algebra
  • Joins.
  • reviews0 =
  • elem "reviews" [
  •   elem "book" [
  •     elem "title" [ text "Data on the Web" ],
  •     elem "review" [ text "This is great!" ] ] ]

[ (value y, int (value z), value w)
  | x <- follow "book" bib0, y <- follow "title" x, z <- follow "@year" x,
    u <- follow "book" reviews0, v <- follow "title" u, w <- follow "review" u,
    y == v ]
  ==> [ ("Data on the Web", 1999, "This is great!") ]

bib0 =
elem "bib" [ elem "book" [ elem "@year" [ text "1999" ],
                           elem "title" [ text "Data on the Web" ] ] ]
48
Nested relational algebra
  • Regular expression matching

[ (x, y, u) | x <- item "@year", y <- item "title", u <- rep (item "author") ]
  :: Reg (Node, Node, [Node])
match :: Reg a -> Node -> a
Result
match reg0 book0 ==> ( elem "@year" [ text "1999" ],
                       elem "title" [ text "Data on the Web" ],
                       [ elem "author" [ text "Abiteboul" ],
                         elem "author" [ text "Buneman" ],
                         elem "author" [ text "Suciu" ] ] )
49
Nested relational algebra
  • Sorting.
  • sortBy :: (a -> a -> Bool) -> [a] -> [a]
  • sortBy (<) [3,1,2,1] ==> [1,1,2,3]
  • Grouping
  • groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
  • groupBy (==) [3,1,2,1] ==> [[2],[1,1],[3]]
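
The two functions above take a Boolean predicate rather than a key. A hedged Python analogue is sketched below; the names `sort_by` and `group_by` are hypothetical, and `group_by` returns groups in first-seen order, so it yields the same groups as the slide but not necessarily in the same order.

```python
import operator
from functools import cmp_to_key


def sort_by(comes_before, items):
    """Sort using a Boolean 'a comes before b' predicate."""
    def compare(a, b):
        if comes_before(a, b):
            return -1
        if comes_before(b, a):
            return 1
        return 0
    return sorted(items, key=cmp_to_key(compare))


def group_by(same_group, items):
    """Partition items into lists whose members satisfy same_group pairwise.

    Assumes same_group is an equivalence relation; each item is compared
    with the first member of every existing group.
    """
    groups = []
    for item in items:
        for group in groups:
            if same_group(group[0], item):
                group.append(item)
                break
        else:
            groups.append([item])
    return groups


print(sort_by(operator.lt, [3, 1, 2, 1]))
print(group_by(operator.eq, [3, 1, 2, 1]))
```

Unlike Haskell's standard `Data.List.groupBy`, which only merges adjacent elements, this sketch groups equal elements wherever they occur, which matches the result shown on the slide.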

50
Cross Comparisons of Algebra
  • Niagara and AT&T are both standalone XML algebras
  • Niagara was proposed after the W3C had selected
    its proposed standard,
  • and its operators operate on sets of bags
  • The AT&T algebra was chosen as the proposed
    standard by the W3C
  • -- its expressions resemble a high-level query
    language
  • -- the latest version of the document is referred
    to as
  • "Semantics of XML Query Language XQuery"

51
Future Work
  • More evaluation strategies are needed to allow
    for flexible query plans
  • Develop physical operators that take advantage of
    physical storage structures, and generate a
    mapping from the query tree to a physical query
    plan