Title: Query Processing and Optimization
1Lecture 29 (12/01/2005)
- Query Processing and Optimization
2Query Representation
- Steps in query processing?
- Query trees
- Tree data structure that corresponds to a
relational algebra expression - Input relations of the query as leaf nodes of the
tree - The relational algebra operations as internal
nodes - An execution of the query tree consists of
- executing an internal node operation whenever its
operands are available - replacing that internal node by the relation that
results from executing the operation - There are many trees for the same query
- Trees always have a strict order among their
operations - Query optimization must find best order
3Sample Query
- Example For every project located in Stafford,
retrieve the project number, the controlling
department number and the department managers
last name, address and birthdate - SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS,
E.BDATEFROM PROJECT AS P,DEPARTMENT AS D,
EMPLOYEE AS EWHERE P.DNUMD.DNUMBER AND
D.MGRSSNE.SSN AND
P.PLOCATIONSTAFFORD - Relational algebra
- ?PNUMBER, DNUM, LNAME, ADDRESS, BDATE
(((?PLOCATIONSTAFFORD(PROJECT)) - DNUMDNUMBER (DEPARTMENT)) MGRSSNSSN
(EMPLOYEE)) -
4- Internal nodes are
- Executed when inputs are ready
- - Replaced by results
5Heuristics in Query Optimization
- Example For every project located in Stafford,
retrieve the project number, the controlling
department number and the department managers
last name, address and birthdate - SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS,
E.BDATEFROM PROJECT AS P,DEPARTMENT AS D,
EMPLOYEE AS EWHERE P.DNUMD.DNUMBER AND
D.MGRSSNE.SSN AND
P.PLOCATIONSTAFFORD - Relational algebra
- ?PNUMBER, DNUM, LNAME, ADDRESS, BDATE
(((?PLOCATIONSTAFFORD(PROJECT)) - DNUMDNUMBER (DEPARTMENT)) MGRSSNSSN
(EMPLOYEE)) -
6Different representation for the same algebra
expression assumed to be the initial form
There are many trees for the same query - strict
order among their operations - Query
optimization must find best order
7Heuristics in Query Optimization
- The main heuristic is to first apply the
operations that reduce the size of intermediate
results - E.g., Apply SELECT and PROJECT operations before
applying the JOIN or other binary operations - General heuristic optimization Algorithm
- 1- Push selections down
- 2- Apply more restrictive selections first
- Selectivity estimated by DBMS
- 3- Combine cross products and selections to
become joins - 4- Push projections down
8Steps in converting a query tree during heuristic
optimization
- Select names of employees working on the
Aquarius Project and born after 1957 - Select LnameFrom Employee, Works_On,
ProjectWhere Pname Acquarius and bithdategt
12/31/1957 and SSNESSN and PNumber PNO - Usually, we start with Cross Products, followed
by selections, followed by Projects
9Steps in converting a query tree during heuristic
optimization
(a) Initial query tree for the SQL query made by
parser
1-Push selections down 2-Apply more restrictive
selections first -e.g. equalities before
range queries 3-Combine cross products and
selections to become joins 4-Push projections down
10(b) Moving SELECT operations down the query tree
11(c) Applying the more restrictive SELECT
operation first
12(d) Replacing CARTESIAN PRODUCT and SELECT with
JOIN operations
13(e) Moving PROJECT operations down the query tree.
14Transformation Rules
- Transformation rules transform one relational
algebra expression to AN EQUIVALENT ONE - Used by the query optimizer to optimize query
tree - Any rule, if applied, makes sure that the
resulting tree is equivalent ? resulting
execution plan is equivalent - General Transformation Rules
- (1) Cascade of s A conjunctive selection
condition can be broken up into a cascade
(sequence) of individual s operations - s c1 AND c2 ANDAND cn(R) sc1 (sc2
(...(scn(R))...) ) - Push selections down
- (2) Commutativity of s The s operation is
commutative - sc1 (sc2(R)) sc2 (sc1(R))
- More selective selections first
15Transformation Rules
- (3) Cascade of p In a cascade (sequence) of p
operations, all but the last one can be ignored - pList1 (pList2 (...(pListn(R))...) ) pList1(R)
- (5) Commutativity of ( and x ) The
operation is commutative as is the x operation - R C S S C R
- R x S S x R
- (6.a) Commuting s with (or x ) If all the
attributes in the selection condition c involve
only the attributes of one of the relations being
joinedsay Rthe two operations can be commuted
as follows - sc ( R S ) (sc (R)) S
- Combinesselections and cross products to become
joins - etc
- Read rest in the book p. 518-520
16Heuristics in Query Optimization
- Outline of a Heuristic Algebraic Optimization
Algorithm - Break up any select operations with conjunctive
conditions into a cascade of select operations - Move each select operation as far down the query
tree as is permitted by the attributes involved
in the selection condition - Rearrange the leaf nodes of the tree so that the
leaf node relations with the most restrictive
select operations are executed first in the query
tree representation.
17Heuristics in Query Optimization
- Combine a cross product operation with a
subsequent select operation in the tree into a
join operation - Break down and move lists of projection
attributes down the tree as far as possible by
creating new project operations as needed - (Importantgiven a query ? optimize using this
algorithm) - SQL
- For every query block
- Relational algebra
- Initial query tree
- Start with Cross Products, then selections, then
Projects - Apply algorithm
18Selectivity and Cost Estimates in Query
Optimization
- A query optimizer does not depend completely on
heuristics - Not always right
- Cost-based query optimization
- Estimate and compare the costs of executing a
query using different execution strategies and
choose the one with the lowest cost estimate - Issues
- Cost function
- Number of execution strategies to be considered
- Limit the number
- Much better for compiled queries where
optimization is done once at compile time and the
query is executed many times - PreparedStatements VS Statements
19Selectivity and Cost Estimates in Query
Optimization
- Cost Components for Query Execution
- Access cost to secondary storage
- Searching, reading, writing, updating, etc
- Memory usage cost
- Number of memory buffers needed for the query
- Storage cost
- Storing any intermediate files that are generated
by an execution strategy for the query (NEXT
THREE SLIDES) - Communication cost
- Shipping the results from the database site to
the users site - Computation cost
- Of performing in-memory operations on the data
buffers during the execution plan (searching,
sorting, joining, arithmetic)
20Combining Operations using Pipelining
- Weve selected the best representation so far
- A query is mapped into a sequence of operations
embedded in an execution plan - Evaluating the internal representation can be
done in two ways - Select Lname and SSN of all employees in the
Design Department born after 1950 - Materialized evaluation the result of an
operation is stored as a temporary relation - Each execution produces a temporary result (file)
for each operation (one operation at a time) - Generating and saving temporary files on disk is
time consuming and expensive
21Combining Operations using Pipelining
- Pipelined evaluation as the result of an
operator is produced, it is forwarded to the
next operator in sequence - Avoid constructing temporary results as much as
possible - Pipeline the data through multiple operations -
pass the result of a previous operator to the
next without waiting to complete the previous
operation - Also known as stream-based processing
- ? (DepartmentNameAccounting') (Employee
DNODepartmentNumber Department)
22Combining Operations using Pipelining
- Example For a join on two tables with a
selection on each and then a projection - P A, B (? R.A ? (? S.B ? (R S)))
- 4 operations ? 4 temporary results
- Result of the Join is fed in a "Pipeline" into
the 1st selection, the 2nd selection and finally
the projection - When is it good to materialize the result?
- Previous results being reused ? e.g. Nested
uncorrelated queries
23Exercise Heuristic Optimization
- Query in SQL
- For every query block
- Query in Relational algebra
- Initial query tree
- Start with Cross Products, then selections, then
Projects - Apply algorithm
- Find Lname and SSN of all employees in the
Design Department working on project 5 who earn
more than the highest paid employee working on
the Project X Project