Distributed Query Optimization - PowerPoint PPT Presentation

1 / 18
About This Presentation
Title:

Distributed Query Optimization

Description:

... the art in distributed query processing' D. Kossman (ACM Computing Surveys, 2000) ... management system', Selinger, Astrahan, Chamberlin et. al. (ACM SIGMOD 1979) ... – PowerPoint PPT presentation

Number of Views:35
Avg rating:3.0/5.0
Slides: 19
Provided by: Pac99
Category:

less

Transcript and Presenter's Notes

Title: Distributed Query Optimization


1
Distributed Query Optimization
  • By
  • Shrinivas Lakshmikant
  • CSE, IIT Bombay

2
Overview
  • Motivation for query optimization
  • Basic techniques for query optimization
  • Issues in distributed query processing
  • Query processing techniques in distributed DBs
  • Semi Join Technique

3
Motivation
  • Distributed data processing is becoming common
  • Modern distributed systems are large complex
  • Automated analysis decision support systems
    require complex queries

4
Basic Query Optimization
  • Classical Dynamic Programming algorithm
  • Performs join order optimization
  • Input Join query on n relations
  • Output Best join order

5
The Algorithm
for i 1 to n do optPlan(Ri)
accessPlans(Ri) prunePlans(optPlan(Ri)) for
i 2 to n do for all S ? R1, R2 Rn
such that S i do optPlan(S) ?
for all O ? S do optPlan(S) optPlan(S) ?
joinPlans(optPlan(O), optPlan(S
O)) prunePlans(optPlan(S)) return
optPlan(R1, R2, Rn)
6
Issues in Distributed Databases
  • Plan enumeration
  • The time and space complexity of traditional
    dynamic programming algorithm is very large
  • Iterative Dynamic Programming
  • Cost Models
  • Classic Cost Model
  • Response Time Model
  • Economic Models

7
Query Execution Techniques for Distributed
Databases
  • Row Blocking
  • Multi-cast Optimization
  • Multi-threaded execution
  • Joins with horizontal partitioning
  • Semi Joins
  • Top n queries

8
Special techniques for Client-Server Architectures
  • Shipping techniques
  • Query shipping
  • Data shipping
  • Hybrid shipping
  • Query Optimization
  • Site Selection
  • Where to optimize
  • Two Phase Optimization

9
Special techniques for Federated Database Systems
  • Wrapper architecture
  • Query optimization
  • Query capabilities
  • Cost estimation
  • Calibration Approach
  • Wrapper Cost Model
  • Learning Curve Approach
  • Parameter Binding

10
Semi Joins for Distributed Query Processing
  • Motivation
  • Reducing communication costs
  • Exploiting machine resources (parallelism)
  • Generating plans with Semi Joins
  • AccessRoot
  • JoinRoot

11
AccessRoot Algorithm
for i 1 to n do optPlan(Ri)
accessPlans(Ri) prunePlans(optPlan(Ri)) for
i 2 to n do for all S ? R1, R2 Rn
such that S i do optPlan(S) ?
for all O ? S do optPlan(S) optPlan(S) ?
semiJoinPlans(optPlan(O), optPlan(S
O)) prunePlans(optPlan(S))
traverseAndMovePlans(optPlan)
12
AccessRoot (cont)
for i 1 to n do prunePlans(optPlan(Ri) fo
r i 2 to n do for all S ? R1, R2 Rn
such that S i do optPlan(S) ?
for all O ? S do optPlan(S) optPlan(S) ?
joinPlans(optPlan(O), optPlan(S
O)) prunePlans(optPlan(S)) return
optPlan(R1, R2, Rn)
13
Discussion
  • AccessRoot only reduces base tables
  • Non disjoint subsets are not generated
  • Need for a more complete algorithm

14
JoinRoot Algorithm
for i 1 to n do optPlan(Ri)
accessPlans(Ri) prunePlans(optPlan(Ri)) for
i 2 to n do for all S ? R1, R2 Rn
such that S i do optPlan(S) ?
for all O ? S do for all P ? O do
optPlan(S) optPlan(S) ?
joinPlans(optPlan(O), optPlan((S O) ?
P)) optPlan(S) optPlan(S) ?
semiJoinPlans(optPlan(O), optPlan((S O) ?
P)) prunePlans(optPlan(S))
fixPointIteration(S) return optPlan(R1,
R2, Rn)
15
Discussion
  • Enumerates complete search space
  • Enumerates non disjoint subsets (to exploit
    parallelism)
  • Fix Point Iteration (for plan completeness)
  • Vertical Pruning

16
Some Results
  • Optimization time of AccessRoot is not much more
    than Classical Dynamic Programming
  • Optimization time of JoinRoot is significantly
    higher than Classical DP, for queries on about
    9-10 relations
  • AccessRoot produces reasonable plans with
    reasonable running time for most network
    topologies

17
Summary
  • Basic Query Optimization
  • Issues specific to distributed databases
  • One technique useful in Distributed Databases
  • Volcano based optimizers Semi joins (ongoing
    work)

18
References
  • The state of the art in distributed query
    processing D. Kossman (ACM Computing Surveys,
    2000)
  • Access path selection in a relational database
    management system, Selinger, Astrahan,
    Chamberlin et. al. (ACM SIGMOD 1979)
  • Integrating semi join reducers into state of the
    art query processors, Stocker, Kossman,
    Braumandl, Kemper (ICDE 2001)
Write a Comment
User Comments (0)
About PowerShow.com