Title: Distributed Query Optimization
1Distributed Query Optimization
- By
- Shrinivas Lakshmikant
- CSE, IIT Bombay
2Overview
- Motivation for query optimization
- Basic techniques for query optimization
- Issues in distributed query processing
- Query processing techniques in distributed DBs
- Semi Join Technique
3Motivation
- Distributed data processing is becoming common
- Modern distributed systems are large complex
- Automated analysis decision support systems
require complex queries
4Basic Query Optimization
- Classical Dynamic Programming algorithm
- Performs join order optimization
- Input Join query on n relations
- Output Best join order
5The Algorithm
for i 1 to n do optPlan(Ri)
accessPlans(Ri) prunePlans(optPlan(Ri)) for
i 2 to n do for all S ? R1, R2 Rn
such that S i do optPlan(S) ?
for all O ? S do optPlan(S) optPlan(S) ?
joinPlans(optPlan(O), optPlan(S
O)) prunePlans(optPlan(S)) return
optPlan(R1, R2, Rn)
6Issues in Distributed Databases
- Plan enumeration
- The time and space complexity of traditional
dynamic programming algorithm is very large - Iterative Dynamic Programming
- Cost Models
- Classic Cost Model
- Response Time Model
- Economic Models
7Query Execution Techniques for Distributed
Databases
- Row Blocking
- Multi-cast Optimization
- Multi-threaded execution
- Joins with horizontal partitioning
- Semi Joins
- Top n queries
8Special techniques for Client-Server Architectures
- Shipping techniques
- Query shipping
- Data shipping
- Hybrid shipping
- Query Optimization
- Site Selection
- Where to optimize
- Two Phase Optimization
9Special techniques for Federated Database Systems
- Wrapper architecture
- Query optimization
- Query capabilities
- Cost estimation
- Calibration Approach
- Wrapper Cost Model
- Learning Curve Approach
- Parameter Binding
10Semi Joins for Distributed Query Processing
- Motivation
- Reducing communication costs
- Exploiting machine resources (parallelism)
- Generating plans with Semi Joins
- AccessRoot
- JoinRoot
11AccessRoot Algorithm
for i 1 to n do optPlan(Ri)
accessPlans(Ri) prunePlans(optPlan(Ri)) for
i 2 to n do for all S ? R1, R2 Rn
such that S i do optPlan(S) ?
for all O ? S do optPlan(S) optPlan(S) ?
semiJoinPlans(optPlan(O), optPlan(S
O)) prunePlans(optPlan(S))
traverseAndMovePlans(optPlan)
12AccessRoot (cont)
for i 1 to n do prunePlans(optPlan(Ri) fo
r i 2 to n do for all S ? R1, R2 Rn
such that S i do optPlan(S) ?
for all O ? S do optPlan(S) optPlan(S) ?
joinPlans(optPlan(O), optPlan(S
O)) prunePlans(optPlan(S)) return
optPlan(R1, R2, Rn)
13Discussion
- AccessRoot only reduces base tables
- Non disjoint subsets are not generated
- Need for a more complete algorithm
14JoinRoot Algorithm
for i 1 to n do optPlan(Ri)
accessPlans(Ri) prunePlans(optPlan(Ri)) for
i 2 to n do for all S ? R1, R2 Rn
such that S i do optPlan(S) ?
for all O ? S do for all P ? O do
optPlan(S) optPlan(S) ?
joinPlans(optPlan(O), optPlan((S O) ?
P)) optPlan(S) optPlan(S) ?
semiJoinPlans(optPlan(O), optPlan((S O) ?
P)) prunePlans(optPlan(S))
fixPointIteration(S) return optPlan(R1,
R2, Rn)
15Discussion
- Enumerates complete search space
- Enumerates non disjoint subsets (to exploit
parallelism) - Fix Point Iteration (for plan completeness)
- Vertical Pruning
16Some Results
- Optimization time of AccessRoot is not much more
than Classical Dynamic Programming - Optimization time of JoinRoot is significantly
higher than Classical DP, for queries on about
9-10 relations - AccessRoot produces reasonable plans with
reasonable running time for most network
topologies
17Summary
- Basic Query Optimization
- Issues specific to distributed databases
- One technique useful in Distributed Databases
- Volcano based optimizers Semi joins (ongoing
work)
18References
- The state of the art in distributed query
processing D. Kossman (ACM Computing Surveys,
2000) - Access path selection in a relational database
management system, Selinger, Astrahan,
Chamberlin et. al. (ACM SIGMOD 1979) - Integrating semi join reducers into state of the
art query processors, Stocker, Kossman,
Braumandl, Kemper (ICDE 2001)