Title: XQuery to SQL by XAT
1XQuery to SQL by XAT
- Xin Zhang
- Thanks
- Brian, Mukesh, Maged, Lily, Elke
2Outline
- Merged algebra proposed based on
- Niagara
- XPERANTO
- One thorough example of
- XQuery ? SQL
3Data Model
- An Ordered Table in two dimensions
- Tuple order
- Column order.
- Every cell has its own domain
- Every column binds to one variable.
- The domain can be
- SQL domains.
- XML Fragment.
- Can be a list of XML elements.
- Comparison are done by values
4Data Model Examples
- Table of XML Fragments.
- Explicit Naming
- E.g. variable bindings
- Implicit Naming
- E.g. XPath notations.
- Reduce complexity of many internal variables.
invoice_id carrier
ltcarriergt carrier lt/carrier
carrier_entry
ltcarriergt carrier lt/carrier ltcarriergt
carrier lt/carriergt .
carriers
rate
/ /invoice /invoice/account_number
5Naming of Columns
- Implicit
- SQL operators
- Navigate
- Explicit ( name)
- Variable binding
- Holding a set of values.
- Variable name (name) is name of a column
- Rename
- Distinguish in one operator where,
- same names from different sources
- Abbreviate a very long name.
- Create a new name for creation operators
- Need to used with those operators.
- E.g. Tagger
6Operators
- SQL like (9)
- Project, Select, Join (Theta, Outer, Semi),
Groupby, Orderby, Union (Node, Outer), COp. - XML like (4)
- Tagger, Navigate, is(Element, Text), Aggregate.
- Special
- SQL, Function, Source, Name, FOR
7SQL like Operators (9)
Niagara XPERANTO
Project Expose Project
Select Select Select
Theta Join Join Theta Join
Outer Join N/A Outer Join
Semi Join N/A N/A
Groupby Group Groupby
Orderby N/A Orderby
Union Union Union
Outer Union Union Outer Union
COp N/A Correlated Join
8XML like Operators
Niagara XPERANTO
Tagger (pattern) Vertex Project cr8(Elem, AttList, Att, XMLFragList),
Navigate (from, path) Follow Project get(TagName, Attributes, Contents, AttName, AttValue), Unnest
Is N/A Select is(Element, Text),
Aggregate Group AggXMLFrags
9Special Operators
Niagara XPERANTO Description
SQL N/A Input Denote a SQL query.
Function N/A Function Used to represent recursive query
Source Source Table, View Identify a data source.
Name Rename N/A Naming of columns.
FOR N/A N/A FOR iteration.
10Operator Specification
- Description
- Input Specification.
- Output Specification.
- Logic description.
- Illustrative Example
11Naming Operator
- Syntax
- Name(from_name, to_name)
- Simplified Syntax
- to_name from_name
12Steps in Translation
- XQuery ? XML Algebra Tree
- User View ? XML Algebra Tree
- View Composition
- Computation Pushdown
- Optimization
13Example of Telephone Bill
lt?xml version1.0 encodingUS-ASCII ?gt
lt!DOCTYPE invoice lt!ELEMENT invoice
(account_number, bill_period, carrier,
itemized_call, total)gt lt!ELEMENT
account_number (PCDATA)gt lt!ELEMENT bill_period
(PCDATA)gt lt!ELEMENT carrier (PCDATA)gt lt!ELEMENT
itemized_call EMPTYgt lt!ATTLIST itemized_call no
ID REQUIRED date CDATA REQUIRED number_called
CDATA REQUIRED time CDATA REQUIRED rate
(NIGHTDAY) REQUIRED min CDATA REQUIRED amount
CDATA REQUIREDgt lt!ELEMENT total (PCDATA)gt gt
ltinvoicegt ltaccount_numbergt555 777-3158 573 234
3lt/account_numbergt ltbill_periodgtJun 9 - Jul 8,
2000lt/bill_periodgt ltcarriergtSprintlt/carriergt
ltitemized_call no1 dateJUN 10
number_called973 555-8888 time1017pm
rateNIGHT min1 amount0.05 /gt
ltitemized_call no2 dateJUN 13
number_called973 650-2222 time1019pm
rateDAY min1 amount0.15 /gt
ltitemized_call no3 dateJUN 15
number_called206 365-9999 time1025pm
rateNIGHT min3 amount0.15 /gt
lttotalgt0.35lt/totalgt lt/invoicegt
14Example XQuery
- User XQuery
- ltsummarygt
-
- FOR
- rate IN distinct(document(invoice)/invoice/item
ized_call_at_rate) - LET
- itemized_call document(invoice)/invoice/item
ized_call_at_raterate - WHERE
- itemized_call/_at_number_called LIKE 973
- RETURN
- ltrategtratelt/rategt
- ltnumber_of_callsgtcount(itemized_call)lt/number_of_
callsgt -
- lt/summarygt
Count number of itemized_calls in calling area
973 grouped by the calling rate.
15XQuery ? XML Algebra Tree
- Divide into query blocks
- Convert each query block into XML Algebra Tree
(XAT). - Identify Correlated Operators
- Combine into one XML Algebra Tree.
- Query decorrelation
16Query Blocks
The block identification is arbitrary (wrong).
B1
- User XQuery
- ltsummarygt
-
- FOR
- rate IN distinct(document(invoice)/invoice/item
ized_call_at_rate) - LET
- itemized_call document(invoice)/invoice/item
ized_call_at_raterate - WHERE
- itemized_call/_at_number_called LIKE 973
- RETURN
- ltrategtratelt/rategt
- ltnumber_of_callsgtcount(itemized_call)lt/number_of_
callsgt -
- lt/summarygt
B2
B3
B1 Construct summary from the result from B2 B2
Get all the distinct rate and iterate through it.
B3 Count itemized call for a given rate.
17XAT of B1
XAT
B1
Name(Tagger(ltsummarygtV1lt/summarygt), V2)
ltsummarygt lt/summarygt
B2
Tagger(ltsummarygt V1 lt/summarygt)
B2
V2 it is a name instead of a part of pattern.
18XAT of B2
Aggregate
XAT
- B3
-
- FOR
- rate IN distinct(document(invoice)/invoice/item
ized_call_at_rate)
FOR(rate)
B3
Name(distinct(invoice/itemized_call/_at_rate/),
rate)
B3
Select(distinct(invoice/itemized_call/_at_rate/))
Navigate(/, invoice/itemized_call/_at_rate)
Source(invoice.xml)
19XAT of B3
XAT
- B4
- LET
- itemized_call document(invoice)
/invoice/itemized_call _at_raterate - WHERE
- itemized_call /_at_number_called LIKE 973
- RETURN
- ltrategtratelt/rategt
- ltnumber_of_callsgt count(itemized_call)
lt/number_of_callsgt
Select(_at_rateitemized_call rate)
B2
Navigate(itemized_call, _at_rate)
Name(invoice/itemized_call/, itemized_call)
Navigate(/, invoice/itemized_call)
Source(invoice.xml)
20XAT of B3 (Cont.)
XAT
- B4
- LET
- itemized_call document(invoice)
/invoice/itemized_call _at_raterate - WHERE
- itemized_call /_at_number_called LIKE 973
- RETURN
- ltrategtratelt/rategt
- ltnumber_of_callsgt count(itemized_call)
lt/number_of_callsgt
Select(_at_number_calleditemized_call like
973)
Navigate(itemized_call, _at_number_called)
21XAT of B3 (Cont.)
XAT
- B4
- LET
- itemized_call document(invoice)
/invoice/itemized_call _at_raterate - WHERE
- itemized_call /_at_number_called LIKE 973
- RETURN
- ltrategtratelt/rategt
- ltnumber_of_callsgt count(itemized_call)
lt/number_of_callsgt
Name(Tagger(ltrategtratelt/rategt
ltnumber_of_callsgt count(itemized_call)lt/number_
of_callsgt), V1)
Tagger(ltrategtratelt/rategt ltnumber_of_callsgt c
ount(itemized_call) lt/number_of_callsgt)
B2
Select(count(itemized_call))
22Put it Together
B1
B3
Name(Tagger(ltsummarygtV1lt/summarygt), V2)
Tagger(ltsummarygtV1 lt/summarygt)
Name(Tagger(ltrategtratelt/rategt
ltnumber_of_callsgt count(itemized_call)lt/number_
of_callsgt), V1)
Tagger(ltrategtratelt/rategtltnumber_of_callsgt coun
t(itemized_call)lt/number_of_callsgt)
Aggregate()
Select(count(itemized_call))
FOR(rate)
Select(_at_number_calleditemized_call like
973)
Name(distinct(invoice/itemized_call/_at_rate/),
rate)
Navigate(itemized_call, _at_number_called)
Select(distinct(invoice/itemized_call/_at_rate/))
Select(_at_rateitemized_call rate)
Navigate(itemized_call, _at_rate)
Navigate(/, invoice/itemized_call/_at_rate)
Name(invoice/itemized_call/, itemized_call)
Source(invoice.xml)
Navigate(/, invoice/itemized_call)
B2
Source(invoice.xml)
23Syntax Suger
B1
B3
V1Tagger(ltrategtratelt/rategtltnumber_of_callsgt
count(itemized_call)lt/number_of_callsgt)
V2 Tagger(ltsummarygtV1 lt/summarygt)
Select(count(itemized_call))
Aggregate()
Select(_at_number_calleditemized_call like
973)
FOR(rate)
Navigate(itemized_call, _at_number_called)
rate Select(distinct(invoice/itemized_call/_at_r
ate/))
Select(_at_rateitemized_call rate)
Navigate(/, invoice/itemized_call/_at_rate)
Navigate(itemized_call, _at_rate)
Source(invoice.xml)
itemized_call Navigate(/,
invoice/itemized_call)
Source(invoice.xml)
B2
24Query Decorrelation for COp
- Top-down approach over XAT Tree.
- Approach
- Correlated Binding (CB)
- Op1COp(CB, Op2)Op3Correlated OperatorA,B
? Op1ROJ(CB)Op2Groupby(CB, Op3)
OperatorCartesianA,B, B - For example
- Correlated Join ? Outer Join with Groupby with
Cartesian
25Query Decorrelation for FOR
- Top-down approach over XAT Tree.
- Approach
- Correlated Binding (CB)
- Op1FOR(CB)Op2Correlated OperatorA,B ?
Op1Groupby(CB, Op2) OperatorCartesianA,B
- Differences
- SQL Decorrelation Return Outer Query
- XQuery Decorrelation Return Inner Query
- CO Return both Outer/Inner Query
26FOR Decorrelation Example
B2
B1
B1
B2
Aggregate
Aggregate
FOR(rate)
1
Groupby(ratel, )
2
Source(invoice.xml)
1
2
B3
Select(_at_rateitemized_call rate)
Source(invoice.xml)
Select(_at_rateitemized_call rate)
Cartesian
3
3
B3
Source(invoice.xml)
Source(invoice.xml)
27Default XML View
- ltinvoicegt
- ltrowgt
- ltidgt 1 lt/idgt
- ltaccount_numbergt555 777-3158 573 234
3lt/account_numbergt - ltbill_periodgt Jun 9 Jun 8, 2000 lt/bill_periodgt
- lttotalgt0.35lt/totalgt
- lt/rowgt
- lt/invoicegt
- ltcarriergt
- ltrowgt
- ltinvoice_idgt 1 lt/invoice_idgt
- ltcarriergtSprintlt/carriergt
- lt/rowgt
- lt/carriergt
- ...
invoice
id account_number bill_period total
1 555 777-3158 573 234 3 Jun 9 Jun 8, 2000 0.35
carrier
invoice_id carrier
1 Sprint
itemized_call
invoice_id no date number_called time rate min amount
1 1 JUN 10 973 555-8888 1017pm NIGHT 1 0.05
1 2 JUN 13 973 650-2222 1019am DAY 1 0.15
1 3 JUN 15 206 365-9999 1025pm NIGHT 3 0.15
28User Defined XML View
- ltinvoicegt
- ltrowgt
- ltidgt 1 lt/idgt
- ltaccount_numbergt555 777-3158 573 234
3lt/account_numbergt - ltbill_periodgt Jun 9 Jun 8, 2000 lt/bill_periodgt
- lttotalgt0.35lt/totalgt
- lt/rowgt
- lt/invoicegt
- ltcarriergt
- ltrowgt
- ltinvoice_idgt 1 lt/invoice_idgt
- ltcarriergtSprintlt/carriergt
- lt/rowgt
- lt/carriergt
- ltitemized_callgt
- ltrowgt
- ltinvoice_idgt1lt/invoice_idgt
- ltnogt1lt/nogt
-
ltinvoicegt ltaccount_numbergt555 777-3158 573 234
3lt/account_numbergt ltbill_periodgtJun 9 - Jul 8,
2000lt/bill_periodgt ltcarriergtSprintlt/carriergt
ltitemized_call no1 dateJUN 10
number_called973 555-8888 time1017pm
rateNIGHT min1 amount0.05 /gt
ltitemized_call no2 dateJUN 13
number_called973 650-2222 time1019pm
rateDAY min1 amount0.15 /gt
ltitemized_call no3 dateJUN 15
number_called206 365-9999 time1025pm
rateNIGHT min3 amount0.15 /gt
lttotalgt0.35lt/totalgt lt/invoicegt
29User Defined XML View Cont.
- Create view invoice as (
- FOR
- invoice IN view(default)/invoice/row
- RETURN
- ltinvoicegt
- ltaccount_numbergtinvoice/account_number/text()lt/ac
count_numbergt - ltbill_periodgtinvoice/bill_period/text()lt/bill_per
iodgt - FOR
- carrier in view(default)/carrier/row
- WHERE
- carrier/invoice_id invoice/id
- RETURN
- ltcarriergtcarrier/carrier/text()lt/carriergt
- FOR
- itemized_call in view(default)/itemized_call/ro
w - WHERE
- itemized_call/invoice_id invoice/id
- RETURN
- ltitemized_call noitemized_call/no/text()
dateitemized_call/date/text()
number_calleditemized_call/number_called/text()
timeitemized_call/time/text()
rateitemized_call/rate/text()
minitemized_call/min/text() amountitemized_cal
l/amount /text()gt
30User Defined XML View Block
- Create view invoice as (
- FOR
- invoice IN view(default)/invoice/row
- RETURN
- ltinvoicegt
- ltaccount_numbergtinvoice/account_number/text()lt/ac
count_numbergt - ltbill_periodgtinvoice/bill_period/text()lt/bill_per
iodgt - FOR
- carrier in view(default)/carrier/row
- WHERE
- carrier/invoice_id invoice/id
- RETURN
- ltcarriergtcarrier/carrier/text()lt/carriergt
- FOR
- itemized_call in view(default)/itemized_call/ro
w - WHERE
- itemized_call/invoice_id invoice/id
- RETURN
- ltitemized_call noitemized_call/no/text()
dateitemized_call/date/text()
number_calleditemized_call/number_called/text()
timeitemized_call/time/text()
rateitemized_call/rate/text()
minitemized_call/min/text() amountitemized_cal
l/amount /text()gt
B4
B5
B6
31XML View XAT
Aggregate()
Aggregate()
V3 Tagger(ltitemized_call no
datenumber_called time rate min
amount/gt
V4 Tagger(ltinvoicegt ltaccount_numbergtinvoice/a
ccount_number/text()lt/account_numbergt ltbill_perio
dgtinvoice/bill_period/text()lt/bill_periodgt V3
lttotalgtinvoice/total/text()lt/totalgtlt/invoicegt)
Navigate(itemized_call, no/text())
Navigate(itemized_call, amount/text())
FOR(invoice/id)
Select(itemized_call/invoice_idinvoice/id)
Navigate(invoice, id)
Navigate(itemized_call, invoice_id)
invoice Navigate(/,invoice/row )
B5
itemized_call Navigate(/, itemized_call/row)
Source(default..xml)
Source(default.xml)
323-Way Correlation
B4
1
FOR(invoice/id)
2
Source(invoice.xml)
B5
B6
333-Way Decorrelation
B4
1
JOIN(invoice/id)
GB(invoice/id, )
GB(invoice/id, )
B5 with Cartesian
B6 with Cartesian
2
2
Source(default.xml)
Source(default.xml)
34View XAT After Decorrelation
Groupby(invoice/id, Aggregate())
Aggregate()
V3 Tagger(ltitemized_call no
datenumber_called time rate min
amount/gt
V4 Tagger(ltinvoicegt ltaccount_numbergtinvoice/a
ccount_number/text()lt/account_numbergt ltbill_perio
dgtinvoice/bill_period/text()lt/bill_periodgt V3
lttotalgtinvoice/total/text()lt/totalgtlt/invoicegt)
Navigate(itemized_call, no/text())
Join(invoice/id)
Navigate(itemized_call, amount/text())
Join(itemized_call/invoice_idinvoice/id)
Groupby(invoice/id)
Navigate(invoice, id)
B5
Navigate(itemized_call, invoice_id)
Navigate(invoice, id)
itemized_call Navigate(/, itemized_call/row)
invoice Navigate(/,invoice/row )
invoice Navigate(/,invoice/row )
Source(default.xml)
Source(default..xml)
Source(default..xml)
35View Composition
- Input
- User Query XAT User View XAT
- Output
- Simplified composite XAT
- Approach
- XAT Cutting Remove un-referenced columns and
operators. - Pushdown Navigation
- By using the commutative rules
- Cancel out the navigation operators
- By using the composition rules
36XAT Cutting
- Cut Query Blocks
- User query only require itemized_call.
- B5 is cut,
- Invoice is cut
- B4 is simplified.
- B6 is simplified.
- Cut Columns
- User query only used itemized_call_at_rate.
37View XAT After B5 is Cut.
Groupby(invoice/id, Aggregate())
Aggregate()
V3 Tagger(ltitemized_call no
datenumber_called time rate min
amount/gt
V4 Tagger(ltinvoicegt ltaccount_numbergtinvoice/a
ccount_number/text()lt/account_numbergt ltbill_perio
dgtinvoice/bill_period/text()lt/bill_periodV3 ltt
otalgtinvoice/total/text()lt/totalgtlt/invoicegt)
Navigate(itemized_call, no/text())
Navigate(itemized_call, amount/text())
Join(itemized_call/invoice_idinvoice/id)
Navigate(itemized_call, invoice_id)
Navigate(invoice, id)
itemized_call Navigate(/, itemized_call/row)
invoice Navigate(/,invoice/row )
Source(default.xml)
Source(default..xml)
38View After Columns are Cut.
Aggregate()
V4 Tagger(ltinvoicegtV3 lt/invoicegt)
Groupby(invoice/id, Aggregate())
V3 Tagger(ltitemized_call number_called
rate /gt
Navigate(itemized_call, number_called/text())
Navigate(itemized_call, rate/text())
Join(itemized_call/invoice_idinvoice/id)
Navigate(itemized_call, invoice_id)
Navigate(invoice, id)
invoice Navigate(/,invoice/row )
itemized_call Navigate(/, itemized_call/row)
Source(default..xml)
Source(default.xml)
39Navigation Cancel Out
- Navigation Pushdown
- Based on some transformation rules.
- E.g. commutative of navigation and other
operators. - Navigation Tagger Cancel Out
- Composition Rules.
- The cancellation result is renaming
40Query XAT Navi. Pushdown
B3
V1Tagger(ltrategtratelt/rategtltnumber_of_callsgt
count(itemized_call)lt/number_of_callsgt)
V1Tagger(ltrategtratelt/rategtltnumber_of_callsgt
count(itemized_call)lt/number_of_callsgt)
Select(count(itemized_call))
Select(count(itemized_call))
Select(_at_number_calleditemized_call like
973)
Select(_at_number_calleditemized_call like
973)
Navigate(itemized_call, _at_number_called)
Select(_at_rateitemized_call rate)
Select(_at_rateitemized_call rate)
Navigate(itemized_call, _at_number_called)
Navigate(itemized_call, _at_rate)
Navigate(itemized_call, _at_rate)
itemized_call Navigate(/,
invoice/itemized_call)
itemized_call Navigate(/,
invoice/itemized_call)
Source(invoice.xml)
Source(invoice.xml)
41Navi. Tagger Cancel Out
B3
1
Aggregate()
Navigate(itemized_call, _at_number_called)
V4 Tagger(ltinvoicegtV3 lt/invoicegt)
Navigate(itemized_call, _at_rate)
Groupby(invoice/id, Aggregate())
itemized_call Navigate(/,
invoice/itemized_call)
V3 Tagger(ltitemized_call number_called
rate /gt
Source(invoice.xml)
Navigate(itemized_call, number_called/text())
Navigate(itemized_call, rate/text())
2
42The Result of Cancel Out
1
Itemized_call_at_number_called
Navigate(itemized_call, number_called/text())
itemized_call_at_rate Navigate(itemized_call,
rate/text())
2
43Computation Pushdown
- Goal XAT ? SQL operators XML operators
- Step 0 Navigation Pushdown.
- Step 1 XML Default View ? SQL Operators
- Renaming columns
- Step 2 SQL Computation Pushdown.
- By commutative and composition rules.
- E.g predicates pushdown.
44Navigation Pushdown.
Itemized_call_at_number_called
Navigate(itemized_call, number_called/text())
Join(itemized_call/invoice_idinvoice/id)
itemized_call_at_rate Navigate(itemized_call,
rate/text())
Itemized_call_at_number_called
Navigate(itemized_call, number_called/text())
Join(itemized_call/invoice_idinvoice/id)
itemized_call_at_rate Navigate(itemized_call,
rate/text())
Navigate(itemized_call, invoice_id)
itemized_call Navigate(/, itemized_call/row)
itemized_call Navigate(/, itemized_call/row)
Navigate(invoice, id)
Source(default.xml)
invoice Navigate(/,invoice/row )
Source(default.xml)
Source(default..xml)
45XML Default View ? SQL
Itemized_call_at_number_called
Navigate(itemized_call, number_called/text())
itemized_call_at_rate Navigate(itemized_call,
rate/text())
itemized_call Navigate(/, itemized_call/row)
Itemized_call_at_number_called number_called
Source(default.xml)
itemized_call_at_rate rate
Project(rate, number_called)
Source(itemized_call)
46Computation Pushdown
B3
V1Tagger(ltrategtratelt/rategtltnumber_of_callsgt
count(itemized_call)lt/number_of_callsgt)
V1Tagger(ltrategtratelt/rategtltnumber_of_callsgt
count(itemized_call)lt/number_of_callsgt)
Select(count(itemized_call))
Select(count(itemized_call))
Select(_at_number_calleditemized_call like
973)
Select(_at_rateitemized_call rate)
Select(_at_rateitemized_call rate)
Select(_at_number_calleditemized_call like
973)
A SQL Block
A SQL Block
47Result of the Transformation
Tagger(ltsummarygtV1lt/summarygt)
V1 Aggregate
Tagger(ltrategtratelt/rategt ltnumber_of_callsgtcount
()lt/number_of_callsgt)
SQL SELECT rate, count() FROM itemized_call,
invoice WHERE number_called LIKE 973 AND
invoice.id itemized_call.invoice_id GROUPBY rate
48Optimization
- Efficient Publishing XML Views
- Sorted Outer Union.
- Special Tagger implementation
- A lot More!
49Summary
- XQuery ? XAT
- Query Block Identification
- Query Decorrelation
- View Composition
- XAT Cutting
- Navigation Pushdown
- Navigation Cancel Out
- Computation Pushdown
- Navigation Pushdown
- XML Default View ? SQL Operators
- Computation Pushdown
- Optimization