Title: XML Algebra
1XML Algebra
- Comparison between
- XPERANTO
- NIAGARA
2Part I
- NIAGARA
- XML Query Optimization
- XML Algebra
- Data Model
- Operator
- Query Plan
- Equivalent Rules
- XPERANTO
- XML Query to SQL
- XML Algebra
- Data Model
- Operator
- Query Plan
- Composition Rules
- Translation Example
3Example of Telephone Bill
lt?xml version1.0 encodingUS-ASCII ?gt
lt!DOCTYPE invoice lt!ELEMENT invoice
(account_number, bill_period, carrier,
itemized_call, total)gt lt!ELEMENT
account_number (PCDATA)gt lt!ELEMENT bill_period
(PCDATA)gt lt!ELEMENT carrier (PCDATA)gt lt!ELEMENT
itemized_call EMPTYgt lt!ATTLIST itemized_call no
ID REQUIRED date CDATA REQUIRED number_called
CDATA REQUIRED time CDATA REQUIRED rate
(NIGHTDAY) REQUIRED min CDATA REQUIRED amount
CDATA REQUIREDgt lt!ELEMENT total (PCDATA)gt gt
ltinvoicegt ltaccount_numbergt555 777-3158 573 234
3lt/account_numbergt ltbill_periodgtJun 9 - Jul 8,
2000lt/bill_periodgt ltcarriergtSprintlt/carriergt
ltitemized_call no1 dateJUN 10
number_called973 555-8888 time1017pm
rateNIGHT min1 amount0.05 /gt
ltitemized_call no2 dateJUN 13
number_called973 650-2222 time1019pm
rateDAY min1 amount0.15 /gt
ltitemized_call no3 dateJUN 15
number_called206 365-9999 time1025pm
rateNIGHT min3 amount0.15 /gt
lttotalgt0.35lt/totalgt lt/invoicegt
4Example XQuery
- User XQuery
- ltsummarygt
-
- FOR
- rate IN distinct(document(invoice)/invoice/item
ized_call_at_rate) - LET
- itemized_call document(invoice)/invoice/item
ized_call_at_raterate - WHERE
- itemized_call/_at_number_called LIKE 973
- RETURN
- ltrategtratelt/rategt
- ltnumber_of_callsgtcount(itemized_call)lt/number_of_
callsgt -
- lt/summarygt
Count number of itemized_calls in calling area
973 grouped by the calling rate.
5NIAGARA
- Title Following the paths of XML Data An
algebraic framework for XML query evaluation - By Leonidas Galanis, Efstratios Viglas, David
J. DeWitt, Jeffrey. F. Naughton, and David Maier.
6Goals
- Be independent of schema information
- Query on both structure and content
- Generate simple, flexible, yet powerful algebraic
expressions - Allow re-use of traditional optimization
techniques
7Data Model
- A collection of bags of vertices.
- The vertices in the bag have no order.
- Example
-
Root invoice.xml invoice
invoice.account_number
lt account_number gt carrier -element-content lt/
account_number gt
ltinvoicegt Invoice-element-content lt/invoicegt
Rootinvoice.xml, invoice, invoice.
account_number
8Data Model
- Bag elements are reachable by path expressions.
- The path expression consists of two parts
- An entry point
- A relative forward part
- Example account_numberinvoice
9Operators
- Source S , Follow ?, Select ?, Join , Rename
?, Expose ?, Vertex ?, Group ?, Union ?,
Intersection ?, Difference - , Cartesian Product
?.
10 Source Operator S
- Input a list of documents
- Output a collection of singleton bags
- Examples
- S () All Known XML documents
- S (invoice.xml) All XML documents whose
filename matches - invoice.xml
- S (,schema.dtd) All known XML documents that
conform to schema.dtd
11Follow operator ?
- Input a path expression in entry point notation
- Functionality extracts vertices reachable by
path expression - Output a new bag that consist of the extracted
vertex all the contents of the original bag (in
care of unnesting follow)
12Follow operator (Example)
Root invoice.xml , invoice, invoice.carrier
Root invoice.xml invoice
invoice.carrier
ltcarriergt carrier -element-content lt/carrier gt
ltinvoicegt Invoice-element-content lt/invoicegt
Unnesting Follow
?(carrierinvoice)
Root invoice.xml invoice
ltinvoicegt Invoice-element-content lt/invoicegt
Root invoice.xml , invoice
13Select operator ?
- Input a set of bags
- Functionality filters the bags of a collection
using a predicate - Output a set of bags that conform to the
predicate - Predicate Logical operator (?,?,?), or simple
qualifications (?,?,?,?,?,?)
14Select operator (Example)
Root invoice.xml , invoice,
Root invoice.xml invoice
ltinvoicegt Invoice-element-content lt/invoicegt
? invoice.carrier Sprint
Root invoice.xml invoice
Root invoice.xml invoice
ltinvoicegt Invoice-element-content lt/invoicegt
ltinvoicegt Invoice-element-content lt/invoicegt
Root invoice.xml , invoice, Root invoice.xml
, invoice,
15Join operator
- Input two collections of bags
- Functionality Joins the two collections based on
a predicate - Output the concatenation of pairs of pages that
satisfy the predicate
16Join operator (Example)
Root invoice.xml , invoice, Root customer.xml ,
customer
Root invoice.xml invoice
Root customer.xml customer
ltinvoicegt Invoice-element-content lt/invoicegt
ltcustomergt customer-element-content lt/customergt
account_number invoice numbercustomer
Root invoice.xml invoice
Root customer.xml customer
ltinvoicegt Invoice-element-content lt/invoicegt
ltcustomergt customer-element-content lt/customergt
Root invoice.xml , invoice
Root customer.xml , customer
17Expose operator ?
- Input a list of path expressions of vertices to
be exposed - Output a set of bags that contains vertices in
the parameter list with the same order
18Expose operator (Example)
Root invoice.xml , invoice.bill_period,
invoice.carrier
Root invoice.xml invoice.
bill_period invoice.carrier
ltcarriergt bill_period -element-content lt/carrier gt
ltinvoicegt carrier-element-content lt/invoicegt
?(bill_period,carrier)
Root invoice.xml invoice
invoice.carrier invoice.bill_period
ltcarriergt bill_period -element-content lt/carrier gt
ltinvoicegt Invoice-element-content lt/invoicegt
ltinvoicegt carrier-element-content lt/invoicegt
Root invoice.xml , invoice, invoice.carrier,
invoice.bill_period
19Vertex operator ?
- Creates the actual XML vertex that will encompass
everything created by an expose operator - Example
? (Customer_invoice)?(?(account)invoice.account_
number, ?(inv_total)invoice.total)
20Other operators
- Group ? is used for arbitrary grouping of
elements based on their values - Aggregate functions can be used with the group
operator (i.e. average) - Rename ? Changes the entry point annotation of
the elements of a bag. - Example ?(invoice.bill_period,date)
21Example XQuery
- User XQuery
- ltsummarygt
-
- FOR
- rate IN distinct(document(invoice)/invoice/item
ized_call_at_rate) - LET
- itemized_call document(invoice)/invoice/item
ized_call_at_raterate - WHERE
- itemized_call/_at_number_called LIKE 973
- RETURN
- ltrategtratelt/rategt
- ltnumber_of_callsgtcount(itemized_call)lt/number_of_
callsgt -
- lt/summarygt
Count number of itemized_calls in calling area
973 grouped by the calling rate.
22Query Plan Algebra
- ?(summary)
- e(?(rate)rate
- ?(number_of_calls)number)
-
- ?(rateinvoice.itemized_call, rate),
- ?(count(invoice.itemized_call), number)
- ?(rateinvoice.itemized_call,
count(invoice.itemized_call)) - s number calledinvoice.itemized_call ?973
- Fµ(invoice.itemized_call)
- s(invoice.xml)
23Equivalent Rules
- 14 equivalent rules so far.
- Definition of Auxiliary Operators for Equiv.
- A gt B Path expression A is a prefix of B
- - The null path expression
- A?B The greatest common prefix of path
expressions A and B - A?B The common prefix of path expressions A and
B
.
24Equivalent Rules Examples
- Rule applications
- Follow ordering
- Fµ(A) Fµ(B) Fµ (B)Fµ (A)
- iff C lt A, C lt B C A?B, or A?B -
.
C
X
X
A
B
B
A
...
25Equivalent Rules Examples
- Rule applications
- Join commutability and associability
- (A B) C (C B) A
26Equivalent Rules Examples
- Rule applications
- Selection distribution and interchangeability
- scA B sc1A sc2B
- where c is a conjoin of the conditions c1 and c2,
each of which only refers to one of the join
inputs
27Equivalent Rules Examples
- Rule applications
- Elimination of unused bag elements
- e(P)(JA) J(e(PA))
- iff J uses only elements exposed by P
28XPERANTO
- Goal
- XQuery ? SQL
- References
- J. Shanmugasundaram, et. Al. Querying XML Views
of Relational Data, VLDB 2001. - J. Shanmugasundaram, et. Al. Efficiently
Publishing Relational Data as XML Documents, VLDB
2000. - J. Shanmugasundaram, Ph.D. Dissertation. July,
2001.
29Query Processing Architecture
User
XQuery
User XML View
XQuery
RDB
30Data Model
- Tables of A List of XML Fragments
ltcarriergt carrier lt/carrier ltcarriergt
carrier lt/carriergt .
carriers
carriers
Groupby carrier aggXMLFrags(carrier_entry)
carrier_entry
ltcarriergt carrier lt/carrier
carrier_entry
Project carrier_entry ltcarriergtcarrierlt/carri
ergt
carrier
carrier
Select invoice_id id
invoice_id carrier
invoice_id
carrier
Table Carrier
31Operators
- Table, Project, Select, Join, Groupby, Orderby,
Union, Unnest, View, Function - Select, Project, join, groupby, orderby and union
have the same semantics as their relational
counterparts. - Project to invoke various function defined
- Table/View to refer to relational table or XML
view - Unnest to unnest XML list
- Function to invoke XQuery valued functions
- Groupby to create XML Fragments
32XML Functions Operators
XML Function Description Operators
1 cr8Elem(Tag, Atts, Clist) Creates an element with tag name Tag, attribute list Atts, and contents Clist Project
2 cr8AttList(A1,..An) Creates a list of attributes from the attributes passed as parameters Project
3 cr8Att(Name, Val) Creates an attribute with name Name and value Val Project
4 cr8XMLFragList(C1,Cn) Creates an XML fragment list from the content parameters Project
5 aggXMLFrags Aggregate XML function that creates an XML fragment list Groupby
6 getTagName(Elem) Returns the element name of the Elem Project, Select
7 getAttributes(Elem) Returns the list of attributes of Elem Project, Select
8 getContents(Elem) Returns the XML fragment list of contents of Elem Project, Select
9 getAttName(Att) Returns the name of attribute Att Project, Select
10 getAttValue Returns the value of the attribute Att Project, Select
11 isElement(E) Returns true if E is an element, returns false otherwise Select
12 isText(T) Returns true if T is text, returns false otherwise Select
13 Unnest(List) Superscalar function that unnests a list Unnest
33Operators - Examples
elems
ltaccount_numbergt508-753-2352lt/account_numbergt ltbill_periodgt24 july 23 august, 2001lt/bill_periodgt .. .. ..
count
3
elems
Project elems getContents(invoice)
count
Groupby count count(itemized_call)
invoice
ltinvoicegt ltaccount_numbergt508-753-2352lt/account_numbergt ltbill_periodgt24 july 23 august, 2001lt/bill_periodgt .. .. lt/invoicegt
itemized_call
ltitemized_call gt lt/itemized_callgt
ltitemized_call gt lt/itemized_callgt
ltitemized_call gt lt/itemized_callgt
34Operators - Examples
entries
ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt
result
ltsummarygt ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt lt/summarygt
entries
result
Groupby entries aggXMLFrags(entry)
Project result cr8Elem(summary, Att, entries)
entry
ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt
ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt
entries
ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt
35Operator - Examples
elem
ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt
elem
Unnest elem unnest(elems)
elems
ltrategt DAY lt/rategt ltnumber_of_callsgt 20 lt/number_of_callsgt ltrategt NIGHT lt/rategt ltnumber_of_callsgt 23 lt/number_of_callsgt
36XML Query
result
Project result ltsummarygt entries lt/summarygt
entries
Groupby entries aggXMLFrags(entry)
entry
- User XQuery
- ltsummarygt
-
- FOR
- rate IN distinct(document(invoice)/invoice/item
ized_call_at_rate) - LET
- itemized_call document(invoice)/invoice/item
ized_call_at_raterate - WHERE
- itemized_call/_at_number_called LIKE 973
- RETURN
- ltrategtratelt/rategt
- ltnumber_of_callsgtcount(itemized_call)lt/number_of_
callsgt -
- lt/summarygt
XQGM
Project entry ltrategt rate lt/rategt
ltnumber_of_callsgt count lt/number_of_callsgt
rate
count
count
Join (Correlated)
Groupby count count(itemized_call)
itemized_call
Selection number LIKE 973
rate
itemized_call
Select distinct(rate)
Select rate irate
rate
itemized_call
number
irate
Navigate doc/invoice/itemized_call_at_rate
Navigate irate doc/invoice/itemized_call_at_rate
number doc/invoice/itemized_call_at_number_calle
d
doc
View document(invoice.xml)
37Navigation in XQGM
XQGM
account_number
account_number
Select getTagName(elem)account_number
Navigate invoice/account_number
elem
invoice
Unnest elem unnest(elems)
elems
Project elems getContents(invoice)
invoice
38Default XML View
- ltinvoicegt
- ltrowgt
- ltidgt 1 lt/idgt
- ltaccount_numbergt555 777-3158 573 234
3lt/account_numbergt - ltbill_periodgt Jun 9 Jun 8, 2000 lt/bill_periodgt
- lttotalgt0.35lt/totalgt
- lt/rowgt
- lt/invoicegt
- ltcarriergt
- ltrowgt
- ltinvoice_idgt 1 lt/invoice_idgt
- ltcarriergtSprintlt/carriergt
- lt/rowgt
- lt/carriergt
- ...
invoice
id account_number bill_period total
1 555 777-3158 573 234 3 Jun 9 Jun 8, 2000 0.35
carrier
invoice_id carrier
1 Sprint
itemized_call
invoice_id no date number_called time rate min amount
1 1 JUN 10 973 555-8888 1017pm NIGHT 1 0.05
1 2 JUN 13 973 650-2222 1019am DAY 1 0.15
1 3 JUN 15 206 365-9999 1025pm NIGHT 3 0.15
39User Defined XML View
ltinvoicegt ltaccount_numbergt555 777-3158 573 234
3lt/account_numbergt ltbill_periodgtJun 9 - Jul 8,
2000lt/bill_periodgt ltcarriergtSprintlt/carriergt
ltitemized_call no1 dateJUN 10
number_called973 555-8888 time1017pm
rateNIGHT min1 amount0.05 /gt
ltitemized_call no2 dateJUN 13
number_called973 650-2222 time1019pm
rateDAY min1 amount0.15 /gt
ltitemized_call no3 dateJUN 15
number_called206 365-9999 time1025pm
rateNIGHT min3 amount0.15 /gt
lttotalgt0.35lt/totalgt lt/invoicegt
Invoice
Id account_number bill_period total
1 555 777-3158 573 234 3 Jun 9 Jun 8, 2000 0.35
Carrier
Invoice_id Carrier
1 Sprint
Itemized_call
Invoice_id No Date Number_called Time Rate Min Amount
1 1 JUN 10 973 555-8888 1017pm NIGHT 1 0.05
1 2 JUN 13 973 650-2222 1019am DAY 1 0.15
1 3 JUN 15 206 365-9999 1025pm NIGHT 3 0.15
40User Defined XML View Cont.
- Create view invoice as (
- FOR
- invoice IN view(default)/invoice/row
- RETURN
- ltinvoicegt
- ltaccount_numbergtinvoice/account_numberlt/account_n
umbergt - ltbill_periodgtinvoice/bill_periodlt/bill_periodgt
- FOR
- carrier in view(default)/carrier/row
- WHERE
- carrier/invoice_id invoice/id
- RETURN
- ltcarriergtcarrierlt/carriergt
- FOR
- itemized_call in view(default)/itemized_call/ro
w - WHERE
- itemized_call/invoice_id invoice/id
- RETURN
- ltitemized_call noitemized_call/no
dateitemized_call/date number_calleditemized_c
all/number_called timeitemized_call/time
rateitemized_call/rate minitemized_call/min
amountitemized_call/amount /gt
41XML View XQGM
doc
Project doc ltinvoicegt ltaccount_numbergt
account_number lt/account_numbergt
ltbill_periodgtbill_periodlt/bill_periodgt carriers
itemized_calls lttotalgttotallt/totalgt lt/invoicegt
- Create view invoice as (
- FOR
- invoice IN view(default)/invoice/row
- RETURN
- ltinvoicegt
- ltaccount_numbergtinvoice/account_numberlt/account_n
umbergt - ltbill_periodgtinvoice/bill_periodlt/bill_periodgt
- FOR
- carrier in view(default)/carrier/row
- WHERE
- carrier/invoice_id invoice/id
- RETURN
- ltcarriergtcarrierlt/carriergt
- FOR
- itemized_call in view(default)/itemized_call/ro
w - WHERE
- itemized_call/invoice_id invoice/id
- RETURN
- ltitemized_call noitemized_call/no
dateitemized_call/date number_calleditemized_c
all/number_called timeitemized_call/time
rateitemized_call/rate minitemized_call/min
amountitemized_call/amount /gt
account_number
bill_period
total
items
carriers
Join (Correlated)
carriers
Groupby carrier aggXMLFrags(carrier_entry)
items
Subquery.
carrier_entry
Project carrier_entry ltcarriergtcarrierlt/carri
ergt
invoice_id
carrier
Table Carrier
carrier
Select invoice_id id
invoice_id
carrier
Table Carrier
id
account_number
bill_period
total
Table Invoice
42View Composition
- User Query XQGM User View XQGM
- To cancel out the Navigation operators
- By using the composition rules
cr8Elem(invoice, cr8AttList(), cr8XMLFragList(
cr8Elem(account_number, cr8AttList(), cr8XMLFra
gList(account_number)), cr8Elem(bill_period,
cr8AttList(), cr8XMLFragList(bill_period)),
carriers, items, cr8Elem(total,
cr8AttList(), cr8XMLFragList(total)) ) )
account_number
Select getTagName(elem)account_number
elem
Unnest elem unnest(elems)
elems
Project elems getContents(invoice)
invoice
4312 Composition Rules
Function COMPOSES WITH REDUCTION
1 getTagName cr8Elem(Tag, Atts, Clist) Tag
2 getAttributes Cr8Elem(Tag, Atts, Clist) Atts
3 getContents cr8Element(Tag, Atts, Clist) Clist
4 getAttName cr8Att(Name, Val) Name
5 getAttValue cr8Att(Name, Val) Val
6 isElement cr8Element(Tag, Atts, Clist) True
7 isElement Other than cr8Eleme False
8 isText PCDATA True
9 isText Other than PCDATA False
10 Unnest aggXMLFrags(C) C
11 Unnest cr8XMLFragList(C1, ..., Cn) C1 U ... U Cn
12 Unnest cr8AttList(A1, ..., An) A1 U ... U An
44View Composition Example
account_number
Select getTagName(elem)account_number
elem
account_number
Unnest elem unnest(elems)
Join (Correlated)
elems
Project elems getContents(invoice)
invoice
Project invoice ltinvoicegt ltaccount_numbergt
account_number lt/account_numbergt ltbill_periodgt
bill_period lt/bill_periodgt carriers
itemized_calls lttotalgt total lt/totalgt lt/invoicegt
account_number
bill_period
total
items
carriers
Join (Correlated)
45Computation Pushdown
- Goal XQGM ? SQLs Tagger Graph
- Step1 Query Decorrelation
- Correlated Join ? Out Unions
- Reference P. Seshadri, et. Al. Complex Query
Decorrelation, ICDE 1996. - Step2 Tagger Pull-Up
- XQGM ? Tagger Run-Time Graph
- Use Sorted Outer Union
- Reference J. Shanmugasundaram, et. Al.
Efficiently Publishing Relational Data as XML
Documents. - Separation of SQL and Tagger Operations
- Semantically equivalent fragment by pattern.
46Comparison
XPERANTO NIAGARA
Goal XQuery ? SQL XQuery ? Algebra
Algebra XQGM and Tagger Graph XML Algebra
Data Model Tables of a list of XML Fragments A collection of bags of vertices
Operators 10 operators with 13 functions 12 operators
Variable Binding Lot of temporary variables No variables.
Order Sensitive Semi-sensitive (missing orderby)
Regular Expression No Support at operator level Support at operator level
Text-in-context No Support Support
Level of abstraction Function level (lower) Logical level (higher)
Transition rules Composition rules (ad-hoc) 1 Semantically equivalent pattern (ad-hoc) Equivalent rules
Operation History Not maintained Maintained
47Conclusions and Future Work
- WE NEED OUR OWN ALGEBRA.
- More Reading
- David Beech, et. Al. A Formal Data Model and
Algebra for XML. - Mary Fernandez, et. Al. An Algebra for XML Query.