Speeding up Nested Iteration Exploiting Parameter Sort Orders PowerPoint PPT Presentation

presentation player overlay
1 / 39
About This Presentation
Transcript and Presenter's Notes

Title: Speeding up Nested Iteration Exploiting Parameter Sort Orders


1
Speeding up Nested Iteration Exploiting Parameter
Sort Orders
  • Ravindra N. Guravannavar
  • Indian Institute of Technology Bombay
  • Advisor
  • Prof. S. Sudarshan

2
Motivation
  • Complex nested queries are commonly used
  • Queries using expensive user-defined functions
    (UDFs) lead to nested iteration
  • Queries using nested constructs and UDFs are
    sometimes more natural to formulate and easy to
    read than equivalent flat queries

3
Example 1 A Simple Nested Query
  • List the orders none of whose line items have the
    shipping address same as the
  • default shipping address of the order
  • SELECT PO.order_id
  • FROM PURCHASEORDER PO
  • WHERE default_ship_to NOT IN (
  • SELECT ship_to
  • FROM ORDERITEM OI
  • WHERE OI.order_id PO.order_id)
  • - Nested Iteration

ORDERITEM
PURCHASEORDER
4
Example 2 A Nested Aggregate Query
  • List the days on which the sales exceeded the
  • maximum daily sales seen in the past
  • SELECT day, sales
  • FROM DAILYSALES DS1
  • WHERE sales gt (SELECT MAX(sales)
  • FROM DAILYSALES DS2
  • WHERE DS2.day lt DS1.day)

5
Example 3 A Query Invoking a UDF
  • Find the turn-around time for high priority
    orders
  • SELECT orderid, TurnaroundTime(orderid,
    totalprice, orderdate)
  • FROM ORDERS WHERE order_priorityHIGH
  • DEFINE TurnaroundTime(_at_orderid, _at_totalprice,
    _at_orderdate)
  • IF (_at_totalprice lt 2000)
  • SELECT max(L.shipdate _at_orderdate) FROM
    LINEITEM L
  • WHERE L.orderid_at_orderid
  • ELSE
  • SELECT MAX(L.commitdate _at_orderdate)
  • FROM LINEITEM L WHERE
    L.orderid_at_orderid
  • END

6
Representing Nested Queries
A The Apply Operator Operation between the
outer tuple and result of the inner block
7
Evaluation based on Tuple Iteration
  • Inner subquery is evaluated for each outer tuple
  • Cost Cost(OuterBlock) nCost(InnerBlock)
    Where n tuples in the result of outer block
  • Optimizations Proposed in System R
  • Cache the inner subquery result for each distinct
    correlation binding
  • Sort the outer tuples so as to be able to cache a
    single result at any given time

8
Decorrelated Evaluation
  • Rewrite nested query as an equivalent flat query
  • Allows the choice of set-oriented evaluation
    plans such as hash and merge-join

9
Decorrelation Example
  • Original Query
  • SELECT PO.order_id, PO.order_date
  • FROM PURCHASEORDER PO
  • WHERE default_ship_to IN ( SELECT ship_to
  • FROM ORDERITEM OI
  • WHERE OI.order_id PO.order_id)
  • Decorrelated Query
  • SELECT PO.order_id, PO.order_date
  • FROM PURCHASEORDER PO, ORDERITEM OI
  • WHERE PO.order_idOI.order_id AND
  • PO.default_ship_toOI.ship_to
  • Queries are not equivalent when duplicates are
    present

10
Problems with Decorrelation
  • Not always possible
  • E.g., NOT IN predicate
  • Many cases need duplicate elimination, an extra
    outer-join or materialization
  • IN predicate
  • COUNT aggregate in the inner sub-query
  • Non-equality correlation predicates
  • May not be applicable to UDFs unless their
    structure is very simple

11
Our Approach
  • Optimize nested queries keeping their structure
    intact
  • Exploit properties of parameters (such as sort
    order, prior predicates applied etc.) to
    efficiently evaluate the inner sub-query
  • Produces better plans than decorrelation for
    certain types of queries
  • More generic and can be applied to a wider class
    of queries (e.g., Queries invoking complex UDFs)

12
Benefits of Sorting Outer Tuples
  • For ti ? t1, t2, t3, tn do
  • inRes Ø
  • For ui ? u1, u2, u3, um do
  • if (pred1(ti ,ui))
  • Add to ui to inRes
  • done
  • process(ti ,inRes)
  • done
  • Sorting allows caching of a single inner result
    (System R)
  • Advantageous buffer effects (Graefe)
  • A clustered index scan in the inner block will
    access each block at most once irrespective of
    the buffer replacement policy
  • State retaining operators

13
Benefits of Sorting for a Clustered Index
  • Case-1
  • Keys 50, 500,400,80,600,200
  • Potential data block fetches6
  • Assume a single data block
  • can be held in memory
  • Random I/O
  • Case-2
  • Keys 50,80,200,400,500,600
  • Data block fetches3
  • Sequential I/O

400
400
500
600
50
80
200
Data Block-1
Data Block-2
Data Block-3
14
Restartable Table Scan
  • SELECT PO.order_id
  • FROM PURCHASEORDER PO
  • WHERE default_ship_to NOT IN (
  • SELECT ship_to
  • FROM ORDERITEM OI
  • WHERE OI.order_id PO.order_id)

PURCHASEORDER
ORDERITEM
15
Incremental Computation of Aggregates
  • SELECT day, sales
  • FROM DAILYSALES DS1
  • WHERE sales gt (SELECT MAX(sales)
  • FROM DAILYSALES DS2
  • WHERE DS2.day lt DS1.day)
  • Aggregates that can be computed incrementally
  • SUM, COUNT, MAX, MIN, AVG

16
Extensions to the Volcano Optimizer
  • We propose the following extensions to the
  • Volcano optimizer
  • Consider interesting parameter sort orders for
    queries containing external parameters
  • Consider the total cost (cost for n evaluations)
    for queries executed in a loop

17
Extensions to the Volcano Optimizer
  • Contract of the original algorithm for
    optimization
  • Plan FindBestPlan(Expr e, PhysProp rpp, Cost cl)
  • Contract of the modified algorithm for
    optimization
  • Plan FindBestPlan(Expr e, PhysProp rpp, Cost c,
  • Order pso, int callCount)
  • Plans generated and cached for lte, rpp, pso,
    callCountgt
  • Not all possible orderings of the parameters are
    valid
  • Parameter Sort Order (a1, a2, an) is valid iff
    level(ai) lt level(aj) for all i, j s.t. i lt j.
  • Not all valid orders may be interesting (we
    consider only valid, interesting parameter sort
    orders)

18
The Basic Idea
A
Required Output Sort Order
Interesting Parameter Sort Order
A
Query Block-1
Binds a, b
Query Block-2
Query Block-3
Binds c Uses a,b
Uses a, b, c
19
Sort Order Propagation for a Multi-Level
Multi-Branch Expression
20
Search for the Best Plan
  • The best plan is computed recursively by adding
    the cost of each physical operator to the cost of
    the best plans for its inputs and retaining the
    cheapest combination.
  • At the Apply operator, the cost of the inner
    (right) sub-expression is obtained for n
    iterations where n is the estimated number of
    distinct correlation values in the left (bind)
    sub-expression
  • Memorization of best plans is done for
  • ltexpr, o/p phy prop, call count, i/p param sort
    ordergt
  • Operator cost function will take call count and
    parameter sort order as inputs

21
UDFs with Procedural Constructs
  • DEFINE fn(p1, p2, pn) AS
  • BEGIN
  • fnQ1 ltp1, p2gt
  • fnQ2 ltp1, p2, p3gt
  • if (condition)
  • fnQ3ltp2gt
  • else
  • fnQ4ltp3gt
  • OPEN CURSOR ON fnQ5ltp2, p3gt // Binds v1, v2
  • LOOP
  • fnQ6ltp1, p2, v1, v2gt
  • END LOOP
  • END

Exactly-Once Query
At-Most-Once Query
Query Inside Cursor Loop
22
Query with UDF Modeled as Nested Query
A
A
fnQ1
Qi
fnQ2
fnQ3
fnQ4
fnQ5
fnQ6
23
Experiments
Query 1 A NOT IN query with no outer
predicates SELECT PO.order_id FROM
PURCHASEORDER PO WHERE default_ship_to NOT IN
(SELECT ship_to FROM ORDERITEM OI WHERE
OI.order_id PO.order_id)
24
Experiments (Contd.)
Query 2 A Nested Aggregate Query with
Non-Equality Corrl. Predicate SELECT day, sales
FROM DAILYSALES DS1 WHERE sales gt (SELECT
MAX(sales) FROM DAILYSALES DS2 WHERE DS2.day lt
DS1.day)
25
Experiments (Contd.)
Query 3 TPC-H MIN COST Supplier Query
26
Conclusion
  • Nested iteration cannot be completely avoided and
    may be quite efficient if implemented properly
  • State-retention techniques can perform better
    than decorrelation when the selectivity of the
    outer predicates is low
  • We showed how a cost-based optimizer can be
    extended to consider interesting parameter sort
    orders

27
Future Work
  • Analyzing function body to find expected
    execution count for each query
  • Study of parameter properties other than sort
    order that would be interesting to nested queries
    and functions
  • Implementing the proposed Optimizer extensions
    and studying its performance

28
References
  • Won Kim. On Optimizing an SQL-like Nested Query.
    In ACM Trans on Database Systems, 1982
  • Ganski and Wong. Optimization of Nested SQL
    Queries Revisited. In ACM SIGMOD, 1987
  • U. Dayal. Of Nests and Trees A Unified Approach
    to Processing Queries that Contain Nested
    Subqueries, Aggregates and Quantifiers, In VLDB,
    1987
  • P. Seshadri et al. Complex Query Decorrelation.
    In ICDE, 1996
  • G. Graefe. Executing Nested Queries. In DBTW,
    2003
  • G. Graefe and McKenna. The Volcano Optimizer
    Generator Extensibility and Efficient Search. In
    ICDE 1993
  • Selinger et al. Access Path Selection in a
    Relational Database Management System. In ACM
    SIGMOD, 1979
  • P. Roy, Multi-Query Optimization and
    Applications, Ph.D. Thesis, IIT Bombay, 2000.

29
Questions?
30
Extra Slides
31
Physical Plan Space Generation
  • PhysEqNode PhysDAGGen(LogEQNode e, PhyProp p,
    ParamSortOrder s)
  • If a physical equivalence node np exists for e,
    p, s
  • return np
  • Create an equivalence node np for e, p, s
  • For each logical operation node o below e
  • If(o is an instance of ApplyOp)
  • ProcApplyNode(o, s, np)
  • else
  • ProcLogOpNode(o, p, s, np)
  • For each enforcer f that generates property p
  • Create an enforcer node of under np
  • Set the input of of PhysDAGGen(e, null, s)
  • return np
  • End

32
Processing a Non-Apply Node
  • void ProcLogOpNode(LogOpNode o, PhysProp
    p,ParamSortOrder s,
  • PhysEqNode np)
  • For each algorithm a for o that guarantees p and
  • requires no stronger sort order than s
  • Create an algorithm node oa under np
  • For each input i of oa
  • Let oi be the i th input of oa
  • Let pi be the physical property required
  • from input i by algorithm a
  • Set input i of oa PhysDAGGen(oi, pi, s)
  • End

33
Processing the Apply Node
  • void ProcApplyNode(LogOpNode o, ParamSortOrder s,
    PhysEqNode np)
  • Initialize i_ords to be an empty set or sort
    orders
  • For each use expression u under o
  • uOrds GetInterestingOrders(u)
  • i_ords i_ords Union uOrds
  • l_ords GetLocalOrders(i ords, o.bindInput)
  • For each order ord in l_ords and null
  • leq PhysDAGGen(lop.bindInput, ord, s)
  • Let newOrd concat(s, ord)
  • applyOp create new applyPhysOp(o.TYPE)
  • applyOp.lchild leq
  • For each use expression u of o
  • ueq PhysDAGGen(u, null, newOrd)
  • Add ueq as a child node of applyOp
  • np.addChild(applyOp)
  • End

34
Generating Interesting Parameter Orders
  • SetltOrdergt GetInterestingOrders(LogEqNode e)
  • if the set of interesting orders i_ords for e
    is already found
  • return i_ords
  • Create an empty set result of sort orders
  • for each logical operation node o under e
  • for each algorithm a for o
  • Let sa be the sort order of interest
  • to a on the unbound parameters in e
  • if sa is a valid order and sa is not
    in result
  • Add sa to result
  • for each input logical equivalence
    node ei of a
  • childOrd GetInterestingOrders(ei
    )
  • if (o is an Apply operator AND ei
    is a use input)
  • childOrd GetAncestorOrders(c
    hildOrd, o.bindInput)
  • result result Union childOrd
  • return result
  • End

35
Extracting Ancestor Orders
  • SetltOrdergt GetAncestorOrders(SetltOrdergt i_ords,
    LogEqNode e)
  • Initialize l_ords to be an empty set of
    sort orders
  • for each order ord in i_ords
  • newOrd Empty vector
  • for (i 1 i ltlength(ord) i i
    1)
  • if ordi is NOT bound by e
  • append(ordi, newOrd)
  • else
  • break
  • add newOrd to l_ords
  • return l_ords
  • End

36
Extracting Local Orders
  • SetltOrdergt GetLocalOrders(SetltOrdergt i_ords,
    LogEqNode e)
  • Initialize l_ords to be an empty set or sort
    orders
  • For each ord in i_ords
  • newOrd Empty vector
  • For (i length(ord) i gt 0 i i 1 )
  • If ordi is bound by e
  • prepend(ordi, newOrd)
  • Else
  • break
  • add newOrd to l_ords
  • return l_ords
  • End

37
Decorrelation Example
  • Original
  • SELECT ename, sal FROM EMP E1 WHERE age lt 40 AND
  • sal (SELECT MAX(sal) FROM EMP E2 WHERE
  • E2.depidE1.depid)
  • Decorrelated
  • Temp(C1, C2) SELECT depid as C1, max(sal) as C2
    FROM EMP
  • GROUP BY EMP.depid
  • SELECT ename, sal FROM EMP E, Temp T WHERE E.age
    lt 40
  • AND E.depidT.C1 AND E.salT.C2

38
Optimization Steps
  • Logical Plan Space LQDAG
  • Physical Plan Space PQDAG

39
An Example LQDAG
  • Expression
  • s(R3) Apply pf()(s(R4))
  • The function f() uses
  • parameters a, b, c
Write a Comment
User Comments (0)
About PowerShow.com