XML Algebra - PowerPoint PPT Presentation

About This Presentation
Title:

XML Algebra

Description:

NIAGARA. XML Query Optimization. XML Algebra. Data Model. Operator. Query Plan. Equivalent Rules ... account_number 555 777-3158 573 234 3 ... NIAGARA ... – PowerPoint PPT presentation

Number of Views:16
Avg rating:3.0/5.0
Slides: 48
Provided by: davi119
Learn more at: https://davis.wpi.edu
Category:
Tags: xml | algebra | niagara

less

Transcript and Presenter's Notes

Title: XML Algebra


1
XML Algebra
  • Comparison between
  • XPERANTO
  • NIAGARA

2
Part I
  • NIAGARA
  • XML Query Optimization
  • XML Algebra
  • Data Model
  • Operator
  • Query Plan
  • Equivalent Rules
  • XPERANTO
  • XML Query to SQL
  • XML Algebra
  • Data Model
  • Operator
  • Query Plan
  • Composition Rules
  • Translation Example

3
Example of Telephone Bill
lt?xml version1.0 encodingUS-ASCII ?gt
lt!DOCTYPE invoice lt!ELEMENT invoice
(account_number, bill_period, carrier,
itemized_call, total)gt lt!ELEMENT
account_number (PCDATA)gt lt!ELEMENT bill_period
(PCDATA)gt lt!ELEMENT carrier (PCDATA)gt lt!ELEMENT
itemized_call EMPTYgt lt!ATTLIST itemized_call no
ID REQUIRED date CDATA REQUIRED number_called
CDATA REQUIRED time CDATA REQUIRED rate
(NIGHTDAY) REQUIRED min CDATA REQUIRED amount
CDATA REQUIREDgt lt!ELEMENT total (PCDATA)gt gt
ltinvoicegt ltaccount_numbergt555 777-3158 573 234
3lt/account_numbergt ltbill_periodgtJun 9 - Jul 8,
2000lt/bill_periodgt ltcarriergtSprintlt/carriergt
ltitemized_call no1 dateJUN 10
number_called973 555-8888 time1017pm
rateNIGHT min1 amount0.05 /gt
ltitemized_call no2 dateJUN 13
number_called973 650-2222 time1019pm
rateDAY min1 amount0.15 /gt
ltitemized_call no3 dateJUN 15
number_called206 365-9999 time1025pm
rateNIGHT min3 amount0.15 /gt
lttotalgt0.35lt/totalgt lt/invoicegt
4
Example XQuery
  • User XQuery
  • ltsummarygt
  • FOR
  • rate IN distinct(document(invoice)/invoice/item
    ized_call_at_rate)
  • LET
  • itemized_call document(invoice)/invoice/item
    ized_call_at_raterate
  • WHERE
  • itemized_call/_at_number_called LIKE 973
  • RETURN
  • ltrategtratelt/rategt
  • ltnumber_of_callsgtcount(itemized_call)lt/number_of_
    callsgt
  • lt/summarygt

Count number of itemized_calls in calling area
973 grouped by the calling rate.
5
NIAGARA
  • Title Following the paths of XML Data An
    algebraic framework for XML query evaluation
  • By Leonidas Galanis, Efstratios Viglas, David
    J. DeWitt, Jeffrey. F. Naughton, and David Maier.

6
Goals
  • Be independent of schema information
  • Query on both structure and content
  • Generate simple, flexible, yet powerful algebraic
    expressions
  • Allow re-use of traditional optimization
    techniques

7
Data Model
  • A collection of bags of vertices.
  • The vertices in the bag have no order.
  • Example

Root invoice.xml invoice
invoice.account_number
lt account_number gt carrier -element-content lt/
account_number gt
ltinvoicegt Invoice-element-content lt/invoicegt
Rootinvoice.xml, invoice, invoice.
account_number
8
Data Model
  • Bag elements are reachable by path expressions.
  • The path expression consists of two parts
  • An entry point
  • A relative forward part
  • Example account_numberinvoice

9
Operators
  • Source S , Follow ?, Select ?, Join , Rename
    ?, Expose ?, Vertex ?, Group ?, Union ?,
    Intersection ?, Difference - , Cartesian Product
    ?.

10
Source Operator S
  • Input a list of documents
  • Output a collection of singleton bags
  • Examples
  • S () All Known XML documents
  • S (invoice.xml) All XML documents whose
    filename matches
  • invoice.xml
  • S (,schema.dtd) All known XML documents that
    conform to schema.dtd

11
Follow operator ?
  • Input a path expression in entry point notation
  • Functionality extracts vertices reachable by
    path expression
  • Output a new bag that consist of the extracted
    vertex all the contents of the original bag (in
    care of unnesting follow)

12
Follow operator (Example)
Root invoice.xml , invoice, invoice.carrier
Root invoice.xml invoice
invoice.carrier
ltcarriergt carrier -element-content lt/carrier gt
ltinvoicegt Invoice-element-content lt/invoicegt
Unnesting Follow
?(carrierinvoice)
Root invoice.xml invoice
ltinvoicegt Invoice-element-content lt/invoicegt
Root invoice.xml , invoice
13
Select operator ?
  • Input a set of bags
  • Functionality filters the bags of a collection
    using a predicate
  • Output a set of bags that conform to the
    predicate
  • Predicate Logical operator (?,?,?), or simple
    qualifications (?,?,?,?,?,?)

14
Select operator (Example)
Root invoice.xml , invoice,
Root invoice.xml invoice
ltinvoicegt Invoice-element-content lt/invoicegt
? invoice.carrier Sprint
Root invoice.xml invoice
Root invoice.xml invoice
ltinvoicegt Invoice-element-content lt/invoicegt
ltinvoicegt Invoice-element-content lt/invoicegt
Root invoice.xml , invoice, Root invoice.xml
, invoice,
15
Join operator
  • Input two collections of bags
  • Functionality Joins the two collections based on
    a predicate
  • Output the concatenation of pairs of pages that
    satisfy the predicate

16
Join operator (Example)
Root invoice.xml , invoice, Root customer.xml ,
customer
Root invoice.xml invoice
Root customer.xml customer
ltinvoicegt Invoice-element-content lt/invoicegt
ltcustomergt customer-element-content lt/customergt
account_number invoice numbercustomer
Root invoice.xml invoice
Root customer.xml customer
ltinvoicegt Invoice-element-content lt/invoicegt
ltcustomergt customer-element-content lt/customergt
Root invoice.xml , invoice
Root customer.xml , customer
17
Expose operator ?
  • Input a list of path expressions of vertices to
    be exposed
  • Output a set of bags that contains vertices in
    the parameter list with the same order

18
Expose operator (Example)
Root invoice.xml , invoice.bill_period,
invoice.carrier
Root invoice.xml invoice.
bill_period invoice.carrier
ltcarriergt bill_period -element-content lt/carrier gt
ltinvoicegt carrier-element-content lt/invoicegt
?(bill_period,carrier)
Root invoice.xml invoice
invoice.carrier invoice.bill_period
ltcarriergt bill_period -element-content lt/carrier gt
ltinvoicegt Invoice-element-content lt/invoicegt
ltinvoicegt carrier-element-content lt/invoicegt
Root invoice.xml , invoice, invoice.carrier,
invoice.bill_period
19
Vertex operator ?
  • Creates the actual XML vertex that will encompass
    everything created by an expose operator
  • Example

? (Customer_invoice)?(?(account)invoice.account_
number, ?(inv_total)invoice.total)
20
Other operators
  • Group ? is used for arbitrary grouping of
    elements based on their values
  • Aggregate functions can be used with the group
    operator (i.e. average)
  • Rename ? Changes the entry point annotation of
    the elements of a bag.
  • Example ?(invoice.bill_period,date)

21
Example XQuery
  • User XQuery
  • ltsummarygt
  • FOR
  • rate IN distinct(document(invoice)/invoice/item
    ized_call_at_rate)
  • LET
  • itemized_call document(invoice)/invoice/item
    ized_call_at_raterate
  • WHERE
  • itemized_call/_at_number_called LIKE 973
  • RETURN
  • ltrategtratelt/rategt
  • ltnumber_of_callsgtcount(itemized_call)lt/number_of_
    callsgt
  • lt/summarygt

Count number of itemized_calls in calling area
973 grouped by the calling rate.
22
Query Plan Algebra
  • ?(summary)
  • e(?(rate)rate
  • ?(number_of_calls)number)
  • ?(rateinvoice.itemized_call, rate),
  • ?(count(invoice.itemized_call), number)
  • ?(rateinvoice.itemized_call,
    count(invoice.itemized_call))
  • s number calledinvoice.itemized_call ?973
  • Fµ(invoice.itemized_call)
  • s(invoice.xml)

23
Equivalent Rules
  • 14 equivalent rules so far.
  • Definition of Auxiliary Operators for Equiv.
  • A gt B Path expression A is a prefix of B
  • - The null path expression
  • A?B The greatest common prefix of path
    expressions A and B
  • A?B The common prefix of path expressions A and
    B

.
24
Equivalent Rules Examples
  • Rule applications
  • Follow ordering
  • Fµ(A) Fµ(B) Fµ (B)Fµ (A)
  • iff C lt A, C lt B C A?B, or A?B -

.
C
X
X
A
B
B
A
...
25
Equivalent Rules Examples
  • Rule applications
  • Join commutability and associability
  • (A B) C (C B) A

26
Equivalent Rules Examples
  • Rule applications
  • Selection distribution and interchangeability
  • scA B sc1A sc2B
  • where c is a conjoin of the conditions c1 and c2,
    each of which only refers to one of the join
    inputs

27
Equivalent Rules Examples
  • Rule applications
  • Elimination of unused bag elements
  • e(P)(JA) J(e(PA))
  • iff J uses only elements exposed by P

28
XPERANTO
  • Goal
  • XQuery ? SQL
  • References
  • J. Shanmugasundaram, et. Al. Querying XML Views
    of Relational Data, VLDB 2001.
  • J. Shanmugasundaram, et. Al. Efficiently
    Publishing Relational Data as XML Documents, VLDB
    2000.
  • J. Shanmugasundaram, Ph.D. Dissertation. July,
    2001.

29
Query Processing Architecture
User
XQuery
User XML View
XQuery
RDB
30
Data Model
  • Tables of A List of XML Fragments

ltcarriergt carrier lt/carrier ltcarriergt
carrier lt/carriergt .
carriers
carriers


Groupby carrier aggXMLFrags(carrier_entry)
carrier_entry
ltcarriergt carrier lt/carrier
carrier_entry


Project carrier_entry ltcarriergtcarrierlt/carri
ergt
carrier
carrier


Select invoice_id id
invoice_id carrier


invoice_id
carrier
Table Carrier
31
Operators
  • Table, Project, Select, Join, Groupby, Orderby,
    Union, Unnest, View, Function
  • Select, Project, join, groupby, orderby and union
    have the same semantics as their relational
    counterparts.
  • Project to invoke various function defined
  • Table/View to refer to relational table or XML
    view
  • Unnest to unnest XML list
  • Function to invoke XQuery valued functions
  • Groupby to create XML Fragments

32
XML Functions Operators
XML Function Description Operators
1 cr8Elem(Tag, Atts, Clist) Creates an element with tag name Tag, attribute list Atts, and contents Clist Project
2 cr8AttList(A1,..An) Creates a list of attributes from the attributes passed as parameters Project
3 cr8Att(Name, Val) Creates an attribute with name Name and value Val Project
4 cr8XMLFragList(C1,Cn) Creates an XML fragment list from the content parameters Project
5 aggXMLFrags Aggregate XML function that creates an XML fragment list Groupby
6 getTagName(Elem) Returns the element name of the Elem Project, Select
7 getAttributes(Elem) Returns the list of attributes of Elem Project, Select
8 getContents(Elem) Returns the XML fragment list of contents of Elem Project, Select
9 getAttName(Att) Returns the name of attribute Att Project, Select
10 getAttValue Returns the value of the attribute Att Project, Select
11 isElement(E) Returns true if E is an element, returns false otherwise Select
12 isText(T) Returns true if T is text, returns false otherwise Select
13 Unnest(List) Superscalar function that unnests a list Unnest
33
Operators - Examples
elems
ltaccount_numbergt508-753-2352lt/account_numbergt ltbill_periodgt24 july 23 august, 2001lt/bill_periodgt .. .. ..

count
3
elems
Project elems getContents(invoice)
count
Groupby count count(itemized_call)
invoice
ltinvoicegt ltaccount_numbergt508-753-2352lt/account_numbergt ltbill_periodgt24 july 23 august, 2001lt/bill_periodgt .. .. lt/invoicegt
itemized_call
ltitemized_call gt lt/itemized_callgt
ltitemized_call gt lt/itemized_callgt
ltitemized_call gt lt/itemized_callgt
34
Operators - Examples
entries
ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt
result
ltsummarygt ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt lt/summarygt
entries
result
Groupby entries aggXMLFrags(entry)
Project result cr8Elem(summary, Att, entries)
entry
ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt
ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt

entries
ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt
35
Operator - Examples
elem
ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt
elem
Unnest elem unnest(elems)
elems
ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt
36
XML Query
result
Project result ltsummarygt entries lt/summarygt
entries
Groupby entries aggXMLFrags(entry)
entry
  • User XQuery
  • ltsummarygt
  • FOR
  • rate IN distinct(document(invoice)/invoice/item
    ized_call_at_rate)
  • LET
  • itemized_call document(invoice)/invoice/item
    ized_call_at_raterate
  • WHERE
  • itemized_call/_at_number_called LIKE 973
  • RETURN
  • ltrategtratelt/rategt
  • ltnumber_of_callsgtcount(itemized_call)lt/number_of_
    callsgt
  • lt/summarygt

XQGM
Project entry ltrategt rate lt/rategt
ltnumber_of_callsgt count lt/number_of_callsgt
rate
count
count
Join (Correlated)
Groupby count count(itemized_call)
itemized_call
Selection number LIKE 973
rate
itemized_call
Select distinct(rate)
Select rate irate
rate
itemized_call
number
irate
Navigate doc/invoice/itemized_call_at_rate
Navigate irate doc/invoice/itemized_call_at_rate
number doc/invoice/itemized_call_at_number_calle
d
doc
View document(invoice.xml)
37
Navigation in XQGM
XQGM
account_number
account_number
Select getTagName(elem)account_number
Navigate invoice/account_number
elem
invoice
Unnest elem unnest(elems)
elems
Project elems getContents(invoice)
invoice
38
Default XML View
  • ltinvoicegt
  • ltrowgt
  • ltidgt 1 lt/idgt
  • ltaccount_numbergt555 777-3158 573 234
    3lt/account_numbergt
  • ltbill_periodgt Jun 9 Jun 8, 2000 lt/bill_periodgt
  • lttotalgt0.35lt/totalgt
  • lt/rowgt
  • lt/invoicegt
  • ltcarriergt
  • ltrowgt
  • ltinvoice_idgt 1 lt/invoice_idgt
  • ltcarriergtSprintlt/carriergt
  • lt/rowgt
  • lt/carriergt
  • ...

invoice
id account_number bill_period total
1 555 777-3158 573 234 3 Jun 9 Jun 8, 2000 0.35
carrier
invoice_id carrier
1 Sprint
itemized_call
invoice_id no date number_called time rate min amount
1 1 JUN 10 973 555-8888 1017pm NIGHT 1 0.05
1 2 JUN 13 973 650-2222 1019am DAY 1 0.15
1 3 JUN 15 206 365-9999 1025pm NIGHT 3 0.15
39
User Defined XML View
ltinvoicegt ltaccount_numbergt555 777-3158 573 234
3lt/account_numbergt ltbill_periodgtJun 9 - Jul 8,
2000lt/bill_periodgt ltcarriergtSprintlt/carriergt
ltitemized_call no1 dateJUN 10
number_called973 555-8888 time1017pm
rateNIGHT min1 amount0.05 /gt
ltitemized_call no2 dateJUN 13
number_called973 650-2222 time1019pm
rateDAY min1 amount0.15 /gt
ltitemized_call no3 dateJUN 15
number_called206 365-9999 time1025pm
rateNIGHT min3 amount0.15 /gt
lttotalgt0.35lt/totalgt lt/invoicegt
Invoice
Id account_number bill_period total
1 555 777-3158 573 234 3 Jun 9 Jun 8, 2000 0.35
Carrier
Invoice_id Carrier
1 Sprint
Itemized_call
Invoice_id No Date Number_called Time Rate Min Amount
1 1 JUN 10 973 555-8888 1017pm NIGHT 1 0.05
1 2 JUN 13 973 650-2222 1019am DAY 1 0.15
1 3 JUN 15 206 365-9999 1025pm NIGHT 3 0.15
40
User Defined XML View Cont.
  • Create view invoice as (
  • FOR
  • invoice IN view(default)/invoice/row
  • RETURN
  • ltinvoicegt
  • ltaccount_numbergtinvoice/account_numberlt/account_n
    umbergt
  • ltbill_periodgtinvoice/bill_periodlt/bill_periodgt
  • FOR
  • carrier in view(default)/carrier/row
  • WHERE
  • carrier/invoice_id invoice/id
  • RETURN
  • ltcarriergtcarrierlt/carriergt
  • FOR
  • itemized_call in view(default)/itemized_call/ro
    w
  • WHERE
  • itemized_call/invoice_id invoice/id
  • RETURN
  • ltitemized_call noitemized_call/no
    dateitemized_call/date number_calleditemized_c
    all/number_called timeitemized_call/time
    rateitemized_call/rate minitemized_call/min
    amountitemized_call/amount /gt

41
XML View XQGM
doc
Project doc ltinvoicegt ltaccount_numbergt
account_number lt/account_numbergt
ltbill_periodgtbill_periodlt/bill_periodgt carriers
itemized_calls lttotalgttotallt/totalgt lt/invoicegt
  • Create view invoice as (
  • FOR
  • invoice IN view(default)/invoice/row
  • RETURN
  • ltinvoicegt
  • ltaccount_numbergtinvoice/account_numberlt/account_n
    umbergt
  • ltbill_periodgtinvoice/bill_periodlt/bill_periodgt
  • FOR
  • carrier in view(default)/carrier/row
  • WHERE
  • carrier/invoice_id invoice/id
  • RETURN
  • ltcarriergtcarrierlt/carriergt
  • FOR
  • itemized_call in view(default)/itemized_call/ro
    w
  • WHERE
  • itemized_call/invoice_id invoice/id
  • RETURN
  • ltitemized_call noitemized_call/no
    dateitemized_call/date number_calleditemized_c
    all/number_called timeitemized_call/time
    rateitemized_call/rate minitemized_call/min
    amountitemized_call/amount /gt

account_number
bill_period
total
items
carriers
Join (Correlated)
carriers
Groupby carrier aggXMLFrags(carrier_entry)
items
Subquery.
carrier_entry
Project carrier_entry ltcarriergtcarrierlt/carri
ergt
invoice_id
carrier
Table Carrier
carrier
Select invoice_id id
invoice_id
carrier
Table Carrier
id
account_number
bill_period
total
Table Invoice
42
View Composition
  • User Query XQGM User View XQGM
  • To cancel out the Navigation operators
  • By using the composition rules

cr8Elem(invoice, cr8AttList(), cr8XMLFragList(
cr8Elem(account_number, cr8AttList(), cr8XMLFra
gList(account_number)), cr8Elem(bill_period,
cr8AttList(), cr8XMLFragList(bill_period)),
carriers, items, cr8Elem(total,
cr8AttList(), cr8XMLFragList(total)) ) )
account_number
Select getTagName(elem)account_number
elem
Unnest elem unnest(elems)
elems
Project elems getContents(invoice)
invoice
43
12 Composition Rules
Function COMPOSES WITH REDUCTION
1 getTagName cr8Elem(Tag, Atts, Clist) Tag
2 getAttributes Cr8Elem(Tag, Atts, Clist) Atts
3 getContents cr8Element(Tag, Atts, Clist) Clist
4 getAttName cr8Att(Name, Val) Name
5 getAttValue cr8Att(Name, Val) Val
6 isElement cr8Element(Tag, Atts, Clist) True
7 isElement Other than cr8Eleme False
8 isText PCDATA True
9 isText Other than PCDATA False
10 Unnest aggXMLFrags(C) C
11 Unnest cr8XMLFragList(C1, ..., Cn) C1 U ... U Cn
12 Unnest cr8AttList(A1, ..., An) A1 U ... U An
44
View Composition Example
account_number
Select getTagName(elem)account_number
elem
account_number
Unnest elem unnest(elems)
Join (Correlated)
elems
Project elems getContents(invoice)
invoice
Project invoice ltinvoicegt ltaccount_numbergt
account_number lt/account_numbergt ltbill_periodgt
bill_period lt/bill_periodgt carriers
itemized_calls lttotalgt total lt/totalgt lt/invoicegt
account_number
bill_period
total
items
carriers
Join (Correlated)
45
Computation Pushdown
  • Goal XQGM ? SQLs Tagger Graph
  • Step1 Query Decorrelation
  • Correlated Join ? Out Unions
  • Reference P. Seshadri, et. Al. Complex Query
    Decorrelation, ICDE 1996.
  • Step2 Tagger Pull-Up
  • XQGM ? Tagger Run-Time Graph
  • Use Sorted Outer Union
  • Reference J. Shanmugasundaram, et. Al.
    Efficiently Publishing Relational Data as XML
    Documents.
  • Separation of SQL and Tagger Operations
  • Semantically equivalent fragment by pattern.

46
Comparison
XPERANTO NIAGARA
Goal XQuery ? SQL XQuery ? Algebra
Algebra XQGM and Tagger Graph XML Algebra
Data Model Tables of a list of XML Fragments A collection of bags of vertices
Operators 10 operators with 13 functions 12 operators
Variable Binding Lot of temporary variables No variables.
Order Sensitive Semi-sensitive (missing orderby)
Regular Expression No Support at operator level Support at operator level
Text-in-context No Support Support
Level of abstraction Function level (lower) Logical level (higher)
Transition rules Composition rules (ad-hoc) 1 Semantically equivalent pattern (ad-hoc) Equivalent rules
Operation History Not maintained Maintained
47
Conclusions and Future Work
  • WE NEED OUR OWN ALGEBRA.
  • More Reading
  • David Beech, et. Al. A Formal Data Model and
    Algebra for XML.
  • Mary Fernandez, et. Al. An Algebra for XML Query.
Write a Comment
User Comments (0)
About PowerShow.com