ICOM 6005 Database Management Systems Design - PowerPoint PPT Presentation

About This Presentation
Title:

ICOM 6005 Database Management Systems Design

Description:

ICOM 6005 Database Management Systems Design. Dr. Manuel ... Alterative plans are built and compared. Cheapest one is kept. Two major algorithms exist ... – PowerPoint PPT presentation

Number of Views:46
Avg rating:3.0/5.0
Slides: 29
Provided by: ValuedGate2254
Learn more at: http://www.ece.uprm.edu
Category:

less

Transcript and Presenter's Notes

Title: ICOM 6005 Database Management Systems Design


1
ICOM 6005 Database Management Systems Design
  • Dr. Manuel Rodríguez-Martínez
  • Electrical and Computer Engineering Department
  • Lecture 15 Query Optimization

2
Query Optimization
  • Read
  • Chapter 12, sec 12.4
  • Chapter 15
  • SAC79 Pages
  • Purpose
  • Study different algorithms to optimize queries
    submitted to the DBMS

3
Introduction
  • SQL query gets translated into relational algebra
    expression
  • Relational algebra expression is represented as
    tree
  • This is what DBMS understands how to process
  • Expression becomes a plan once we identify access
    methods for each operator
  • Relational algebra expression might have an
    equivalent expression
  • Example R(A,B,C) , S(A, D, F)
  • But, each expression might have different cost
  • How do we find the cheapest expression?

4
Relational DBMS Architecture
Client API
Client
Query Parser
Query Optimizer
Relational Operators
Execution Engine
File and Access Methods
Concurrency and Recovery
Buffer Management
Disk Space Management
DB
5
Query Optimizer
  • Module in DBMS in charge of finding cheapest
    available plan to execute a query
  • Building one is not easy!
  • Optimizer searches for plans and compares then
    based on cost
  • Cost can be
  • Resource usage
  • Response time
  • Power consumption
  • Number of I/Os
  • Network transmission cost

6
Query Plans
  • Query plan specifies the operations to be
    executed
  • Tree of operators
  • Each operator corresponds to a relational
    operator
  • Leaf nodes usually represent base tables

?A,B
??
??
?A,B
??
?Agt2
T
R
S
R
S
S
S
7
Executing Query Plans
  • Plans generated by the optimizer are fed to the
    execution engine
  • Plans support iterator interface
  • Open initialize the operator
  • Next get next tuple from operator
  • Close de-allocate resources from operator
  • Execution engine invokes each method
  • Invocation triggers cascade of calls
  • Each operator call the corresponding methods on
    child nodes
  • Example open on join, causes call to open on
    outer table and call to open on inner table. Same
    for next and close.

8
Pipelined vs Materialized Execution
  • Pipelined
  • The output tuple from one operator immediately
    becomes input tuple to its parent operator in the
    tree
  • Materialized
  • The output tuples from one operator must be
    stored to disk first (into a temporary table)
  • Once the operator finishes, its parent operator
    can access the materialed tuples
  • Most execution engines use pipelined
  • Saving in I/O can be substantial!
  • Some operator cannot be pipelined
  • Sorting, projections with duplicate elimination
  • Query optimizer must be aware of this issue!

9
Generation of Query Plans
  • Optimizer generates query plan after a search
    finds the optimal one
  • According to some criteria
  • Search is a search by construction
  • Alterative plans are built and compared
  • Cheapest one is kept
  • Two major algorithms exist
  • Dynamic programming (SAC79)
  • Exhaustive search of plan space
  • Finds the optimal
  • Randomized Algorithm
  • Random search of plan space
  • Quickly finds sub-optimal but good plan
  • Optimization philosophy find a good plan quick,
    avoid bad ones!

10
Left-Deep Join vs Bushy Plans
  • Query Optimizer generate two major types of plans
  • Left-deep plans
  • Bushy plans
  • Left-deep plans
  • Every join has a base table as the inner join
    table
  • Use in commercial systems (first in System R)
  • Good for dynamic programming
  • Good for optimizing resource usage
  • Bushy plans
  • Joins might have intermediate tables as input to
    the join
  • Good for randomized search
  • Use in research prototypes for distributed
    databases
  • Good for optimizing response time

11
Left-deep plans
??
??
??
??
T
U
R
??
S
R
S
??
T
Each join always has base table as inner table
R
S
12
Bushy Plans
??
??
??
??
R
S
R
??
S
T
??
U
??
V
??
R
S
T
U
13
Left-Deep vs Bushy Plans
  • Bushy plans
  • Enable parallelism in operator evaluation
  • Operator can execute at different rates
  • Good in distributed environments
  • More complicate to build (harder optimizer)
  • Left-deep plans
  • Joins are run in sequence
  • Susceptible to bottleneck at some operator
  • Simpler to build (easier optimizer)
  • Good for single site systems
  • Everything runs on the same machine
  • Commercial DBMS systems use Left-deep plans

14
Cost of a plan
  • The cost of plan depends on the metric you wish
    to optimize
  • Resource usage (CPU I/O Network)
  • Cost is the sum of the resources used by each
    operator
  • Response time
  • Cost of the slowest path in the tree
  • Number of I/Os
  • Cost is the sum of the I/Os generated by each
    operator
  • Network cost
  • Cost is sum of cost in moving data between
    operators

15
Organization of an optimizer
Query Parser
SQL Query
Parse Tree
Catalog Manager
Plan Generator
Cost Estimator
Query Optimizer
Query Plan
Catalog
Query Execution Engine
16
Generating Alternatives
  • Relational equivalences are used by the optimizer
    to generate different operators that do the same
  • Selection equivalences
  • Cascading selections
  • Commutative selections

17
Equivalence Rules
  • Projections
  • Cascading projections
  • Joins
  • Commutative rule
  • Associative rule

18
Equivalence Rules(2)
  • Commute selections and projections
  • Pushing selections
  • Decomposing selections

19
System R Optimizer
  • Based on left-deep plans and dynamic programming
  • Most commercial systems use a System R type of
    optimizer
  • Cost is based on resource usage
  • Cost CPU Cost I/O Cost
  • Given a plan P, cost of P is computed as
  • Cost(P) operatorCost(P.root)
    Cost(P.root.leftChild) Cost(P.root.rightChild)

20
Estimating Cost of Operators
  • Key feature for this is selectivity factors,
    selectivity, and join costs
  • Example
  • R has no index
  • R 100,000, R 5000
  • S has un-clustered Index on
  • Join attribute
  • S 70,000, S 2500
  • What algorithm shall be use for join?
  • Chose between BNLJ, INLJ, e GHJ with 20 B
  • What is the cost?

??
R
S
21
System R Search Algorithm
  • Idea
  • Build every possible plan and keep track of
  • Cheapest plan (overall)
  • Cheapest plans that bring data in sorted order
    (called interesting orders)
  • Dynamic Programming (divide-and-conquer)
  • To find plan for n-way join you
  • First find singe table plans
  • Then find plans for all (n-1)-way join and find a
    plan to join missing table with an (n-1)-way join
  • Plan for smaller joins are saved on a table

22
System R Search Algorithm (2)
  • Process
  • For an n-way join between tables R1, R2, , Rn
  • Find the access path to access each table
  • Plans access to get R1, R2, , Rn
  • This includes application of selection and
    projections for each table
  • Find the access path to compute 2-way joins
  • 2-way joins for all possible pairs of tables
  • Find the access path to compute 3-way joins
  • Add a table to a 2-way join (forms all possible
    3-way joins)
  • Find the access path to compute 4-way joins
  • Add a table to a 3-way join (forms all possible
    4-way joins)
  • Find the access path to compute the n-way join
  • Add a table to a n-1 way join (forms all possible
    n-way joins)

23
System R Search Algorithm (2)
  • Plan SystemROptimizer(R1, R2, , Rn)
  • for (int i 0 i lt n i) // single table
    access paths
  • optPlan(Ri) selectPlan(Ri)
  • for (int i2 I lt n i) // join access
    paths, start with 2 tables, then 3, ,
  • for all S ? R1, R2, R3, Rn s.t. S i
    // S is the next set to join
  • bestPlan dummy plan with infinite cost
  • for all Rj, Sj s.t. S Sj ?Rj // Sj Rj
    are pieces of S
  • P joinPlan(optPlan(Sj), optPlan(Rj))
  • if (cost(P) lt cost(bestPlan))
  • bestPlan P
  • optPlan(Sj) bestPlan
  • S R1, R2, , Rn)
  • return optPlan(S)

24
Illustration
  • How does the algorithm works for this case
  • Tables
  • R - R 100,000, R 8,000
  • S - S 90,000, S 6,000
  • T - T 120,000 T 10,000
  • U - U 80,000 U 4,000
  • All tables are stored in heap files
  • DBMS has Blocked nested loops join, Hash Join
    and 25 free buffers
  • What is the best plan for query
  • R??S ??T ??U

25
Illustration (2)
  • How does the algorithm works for this case
  • Tables
  • R - R 100,000, R 8,000
  • R is stored on a clustered Btree matching join
    attribute with T
  • S - S 90,000, S 6,000
  • R is stored on
  • T - T 120,000 T 10,000
  • DBMS has Blocked nested loops join,
    Indexed-nested loops join, and Hash Join and 3
    free buffers
  • What is the best plan for query
  • R??S ??T

26
Illustration (3)
  • How does the algorithm works for this case
  • Tables
  • R - R 100,000, R 8,000
  • R is stored on a clustered Btree matching join
    attribute with T
  • S - S 90,000, S 6,000
  • R is stored on
  • T - T 120,000 T 10,000
  • DBMS has Blocked nested loops join,
    Indexed-nested loops join, and Hash Join and 25
    free buffers
  • What is the best plan for query
  • ?Agt3 ?B NY (R)??S ??T
  • If SFAgt3 .10 and SF B NY 0.05 and Agt3
    matches index on R.

27
Issues with SystemR optimizer
  • Algorithm performs exhaustive search of left-deep
    plans
  • Dynamic Programming is ill-suited for
    optimization of response time
  • Principle of optimality is not observed
  • Difficult (but not impossible) to modify for
    bushy plans
  • Search space is huge
  • Need pruning techniques to cut on the number of
    plans stored
  • Do we need exhaustive search?
  • Optimal plan vs sub-optimal that is good and
    quick to find
  • Disaster avoidance More important to avoid bad
    plans!!!

28
Alternative approaches
  • Randomized Query Optimization
  • Use randomized algorithms to build and search
    plans
  • Good for bushy plans
  • Rule-based Query Optimization
  • Use rules to guide the search and better prune
    space
  • Good to apply special cases and pruning
  • Parametric Query Optimization
  • Add run-time parameters to really capture the
    reality of the system
  • Multiple-query Optimization
  • Optimizer takes 2 or more queries at a time for
    optimization
Write a Comment
User Comments (0)
About PowerShow.com