Title: Multi-Query Optimization
1Multi-Query Optimization
- Prasan Roy
- Indian Institute of Technology - Bombay
2Overview
- Multi-Query Optimization What?
- Problem statement
- Multi-Query Optimization Why?
- Application scenarios
- Multi-Query Optimization How?
- A cost-based practical approach
- Prototyping Multi-Query Optimization
- On MS SQL-Server at Microsoft
- Research prototype at IIT-Bombay
3Multi-Query Optimization What?
- Exploit common subexpressions (CSEs) in query
optimization - Consider DAG execution plans in addition to tree
execution plans
4Example
B
A
C
D
B
C
Best Plan for A JOIN B JOIN C
Best Plan for B JOIN C JOIN D
5Example (contd)
D
A
B
C
Common Subexpression
6Multi-Query Optimization Why?
- Queries on views, nested queries,
- Overlapping query batches generated by
applications - Update expressions for materialized views
- Query invocations with different parameters
- . . .
Practical solutions needed!
7Multi-Query Optimization How?
- Set up the search space
- Identify the common subexpressions
- Explore the search space efficiently
- Find the best way to exploit the common
subexpressions
8Problems
- Materializing and sharing a CSE not necessarily
cheaper - Mutually exclusive alternatives
- (A JOIN B JOIN C)
- (B JOIN C JOIN D)
- (C JOIN D JOIN E)
- What to share (B JOIN C) or (C JOIN D) ?
Huge search space!
9Earlier Work Practical Solutions
- As early as 1976
- Preprocess query before optimization Hall,
IBM-JRD76 - As late as 1998
- Postprocess optimized plans Subramanium and
Venkataraman, SIGMOD98
Query optimizer is not aware!
10Earlier Work Theoretical Studies
- Sellis, TODS88, Cosar et al., CIKM93, Shim
et al., DKE94,... - Set of queries Q1, Q2, , Qn
- For each query Qi, set of execution plans
Pi1, Pi2, , Pim - Pij is a set of tasks from a common pool
- Pick a plan for each query such that the cost of
tasks in the union is minimized
Not integrated with existing optimizers, no
practical study
11Microsoft Experience
- with Paul Larson,
- Microsoft Research
12Prototyping MQO on SQL-Server
- Add multi-query optimization capability to
SQL-Server - Well integrated with the existing optimization
framework - another optimization level
- minimal changes, minimal extra lines of code
- First cut exhaustive
- How slow can it be?
- A working prototype by the summer-end
13What (almost) already exists in the SQL-Server
Optimizer
- AND/OR Query-DAG representation of plan space
Group (OR node)
14What actually exists in the SQL-Server Optimizer
- Relations cloned for each use
B1
A
C1
D
B2
C2
15Preprocessing Step Query-DAG
Unification
- Performed in a bottom-up traversal
?
?
?
?
?
?
B1
A
C1
D
B2
C2
16Common Subexpression Identification
Common Subexpression
A
B
C
D
17Exploring the Search Space A Naïve Algorithm
- For each set S of common subexpressions
- materialize each node in S
- MatCost(S) sum of materialization costs of the
nodes in S - invoke optimizer to find the best plan for the
root and for each node S - CompCost(S) sum of costs of above plans
- Cost(S) MatCost(S) CompCost(S)
- Pick S with the minimum Cost
18Doing Better Incremental
Reoptimization
- Goal best plan for Si ? best plan for Sj
- Observation
- Best plans change for only the ancestors of
nodes in Si XOR Sj - Algorithm
- Propagate changed costs in bottom-up
topological order from nodes in Si XOR Sj - Update min-cost plan at each node visited
- Do not propagate further up if min-cost plan
remains unchanged at a node
Work done at IIT-Bombay
19Incremental Optimization Example
min-cost
A
B
C
D
20Incremental Optimization Example
?
?
Now materialized
?
Previous min-cost
New min-cost
A
B
C
D
21Current Status
- A first-cut implementation working
- Lines of C code added 1500 approx.
22Future Work
- Performance tuning and smarter data structures
needed - Ways to restrict enumeration taking DAG structure
into account
23Research at IIT-Bombay Heuristics for
MQO
- with S. Sudarshan, S. Seshadri
24A Greedy Heuristic
- Pick nodes for materialization one at a time, in
benefit order - Benefit(n) reduction in cost on materialization
of n
Benefit computation is expensive
25Monotonicity Assumption
- Benefit of a node does not increase due to
materialization of other nodes - Exploited to avoid some benefit computations
Optimization costs decrease by 90
26A Postpass Heuristic Volcano-SH
- No change in Volcano best plan computation
- Cost-based materialization of nodes in best
Volcano plan - Implementation easy
- Low overhead
Optimizer is not aware
27A Volcano Variant Volcano-RU
- Volcano best plan search aware of best plans for
earlier queries - Cost based materialization of best plan nodes
that are used by later queries - Implementation easy
- Low overhead
Local decisions, plan quality sensitive to query
sequence
28Experimental Conclusion
- Greedy
- Expensive, but practical
- Overheads typically offset by plan quality
- especially for expensive canned queries
- Almost linear scaleup with query batch size
- typically, only the width of the Query DAG
affected - Volcano-RU
- Mostly better than Volcano-SH, same overhead
- Negligible overhead over Volcano
- recommended for cheap but complex queries
29Conclusion
- Multi-query optimization is needed
- Multi-query optimization is practical!
- Multi-query optimization is an easy next step for
DAG-based optimizers