Query Processing and Optimization - PowerPoint PPT Presentation

1 / 23
About This Presentation
Title:

Query Processing and Optimization

Description:

The relational algebra operations as internal nodes. An execution of the query tree consists of ... Communication cost. Shipping the results from the database ... – PowerPoint PPT presentation

Number of Views:128
Avg rating:3.0/5.0
Slides: 24
Provided by: imadr
Category:

less

Transcript and Presenter's Notes

Title: Query Processing and Optimization


1
Lecture 29 (12/01/2005)
  • Query Processing and Optimization

2
Query Representation
  • Steps in query processing?
  • Query trees
  • Tree data structure that corresponds to a
    relational algebra expression
  • Input relations of the query as leaf nodes of the
    tree
  • The relational algebra operations as internal
    nodes
  • An execution of the query tree consists of
  • executing an internal node operation whenever its
    operands are available
  • replacing that internal node by the relation that
    results from executing the operation
  • There are many trees for the same query
  • Trees always have a strict order among their
    operations
  • Query optimization must find best order

3
Sample Query
  • Example For every project located in Stafford,
    retrieve the project number, the controlling
    department number and the department managers
    last name, address and birthdate
  • SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS,
    E.BDATEFROM PROJECT AS P,DEPARTMENT AS D,
    EMPLOYEE AS EWHERE P.DNUMD.DNUMBER AND
    D.MGRSSNE.SSN AND
    P.PLOCATIONSTAFFORD
  • Relational algebra
  • ?PNUMBER, DNUM, LNAME, ADDRESS, BDATE
    (((?PLOCATIONSTAFFORD(PROJECT))
  • DNUMDNUMBER (DEPARTMENT)) MGRSSNSSN
    (EMPLOYEE))

4
  • Internal nodes are
  • Executed when inputs are ready
  • - Replaced by results

5
Heuristics in Query Optimization
  • Example For every project located in Stafford,
    retrieve the project number, the controlling
    department number and the department managers
    last name, address and birthdate
  • SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS,
    E.BDATEFROM PROJECT AS P,DEPARTMENT AS D,
    EMPLOYEE AS EWHERE P.DNUMD.DNUMBER AND
    D.MGRSSNE.SSN AND
    P.PLOCATIONSTAFFORD
  • Relational algebra
  • ?PNUMBER, DNUM, LNAME, ADDRESS, BDATE
    (((?PLOCATIONSTAFFORD(PROJECT))
  • DNUMDNUMBER (DEPARTMENT)) MGRSSNSSN
    (EMPLOYEE))

6
Different representation for the same algebra
expression assumed to be the initial form
There are many trees for the same query - strict
order among their operations - Query
optimization must find best order
7
Heuristics in Query Optimization
  • The main heuristic is to first apply the
    operations that reduce the size of intermediate
    results
  • E.g., Apply SELECT and PROJECT operations before
    applying the JOIN or other binary operations
  • General heuristic optimization Algorithm
  • 1- Push selections down
  • 2- Apply more restrictive selections first
  • Selectivity estimated by DBMS
  • 3- Combine cross products and selections to
    become joins
  • 4- Push projections down

8
Steps in converting a query tree during heuristic
optimization
  • Select names of employees working on the
    Aquarius Project and born after 1957
  • Select LnameFrom Employee, Works_On,
    ProjectWhere Pname Acquarius and bithdategt
    12/31/1957 and SSNESSN and PNumber PNO
  • Usually, we start with Cross Products, followed
    by selections, followed by Projects

9
Steps in converting a query tree during heuristic
optimization
(a) Initial query tree for the SQL query made by
parser
1-Push selections down 2-Apply more restrictive
selections first -e.g. equalities before
range queries 3-Combine cross products and
selections to become joins 4-Push projections down
10
(b) Moving SELECT operations down the query tree
11
(c) Applying the more restrictive SELECT
operation first
12
(d) Replacing CARTESIAN PRODUCT and SELECT with
JOIN operations
13
(e) Moving PROJECT operations down the query tree.
14
Transformation Rules
  • Transformation rules transform one relational
    algebra expression to AN EQUIVALENT ONE
  • Used by the query optimizer to optimize query
    tree
  • Any rule, if applied, makes sure that the
    resulting tree is equivalent ? resulting
    execution plan is equivalent
  • General Transformation Rules
  • (1) Cascade of s A conjunctive selection
    condition can be broken up into a cascade
    (sequence) of individual s operations
  • s c1 AND c2 ANDAND cn(R) sc1 (sc2
    (...(scn(R))...) )
  • Push selections down
  • (2) Commutativity of s The s operation is
    commutative
  • sc1 (sc2(R)) sc2 (sc1(R))
  • More selective selections first

15
Transformation Rules
  • (3) Cascade of p In a cascade (sequence) of p
    operations, all but the last one can be ignored
  • pList1 (pList2 (...(pListn(R))...) ) pList1(R)
  • (5) Commutativity of ( and x ) The
    operation is commutative as is the x operation
  • R C S S C R
  • R x S S x R
  • (6.a) Commuting s with (or x ) If all the
    attributes in the selection condition c involve
    only the attributes of one of the relations being
    joinedsay Rthe two operations can be commuted
    as follows
  • sc ( R S ) (sc (R)) S
  • Combinesselections and cross products to become
    joins
  • etc
  • Read rest in the book p. 518-520

16
Heuristics in Query Optimization
  • Outline of a Heuristic Algebraic Optimization
    Algorithm
  • Break up any select operations with conjunctive
    conditions into a cascade of select operations
  • Move each select operation as far down the query
    tree as is permitted by the attributes involved
    in the selection condition
  • Rearrange the leaf nodes of the tree so that the
    leaf node relations with the most restrictive
    select operations are executed first in the query
    tree representation.

17
Heuristics in Query Optimization
  • Combine a cross product operation with a
    subsequent select operation in the tree into a
    join operation
  • Break down and move lists of projection
    attributes down the tree as far as possible by
    creating new project operations as needed
  • (Importantgiven a query ? optimize using this
    algorithm)
  • SQL
  • For every query block
  • Relational algebra
  • Initial query tree
  • Start with Cross Products, then selections, then
    Projects
  • Apply algorithm

18
Selectivity and Cost Estimates in Query
Optimization
  • A query optimizer does not depend completely on
    heuristics
  • Not always right
  • Cost-based query optimization
  • Estimate and compare the costs of executing a
    query using different execution strategies and
    choose the one with the lowest cost estimate
  • Issues
  • Cost function
  • Number of execution strategies to be considered
  • Limit the number
  • Much better for compiled queries where
    optimization is done once at compile time and the
    query is executed many times
  • PreparedStatements VS Statements

19
Selectivity and Cost Estimates in Query
Optimization
  • Cost Components for Query Execution
  • Access cost to secondary storage
  • Searching, reading, writing, updating, etc
  • Memory usage cost
  • Number of memory buffers needed for the query
  • Storage cost
  • Storing any intermediate files that are generated
    by an execution strategy for the query (NEXT
    THREE SLIDES)
  • Communication cost
  • Shipping the results from the database site to
    the users site
  • Computation cost
  • Of performing in-memory operations on the data
    buffers during the execution plan (searching,
    sorting, joining, arithmetic)

20
Combining Operations using Pipelining
  • Weve selected the best representation so far
  • A query is mapped into a sequence of operations
    embedded in an execution plan
  • Evaluating the internal representation can be
    done in two ways
  • Select Lname and SSN of all employees in the
    Design Department born after 1950
  • Materialized evaluation the result of an
    operation is stored as a temporary relation
  • Each execution produces a temporary result (file)
    for each operation (one operation at a time)
  • Generating and saving temporary files on disk is
    time consuming and expensive

21
Combining Operations using Pipelining
  • Pipelined evaluation as the result of an
    operator is produced, it is forwarded to the
    next operator in sequence
  • Avoid constructing temporary results as much as
    possible
  • Pipeline the data through multiple operations -
    pass the result of a previous operator to the
    next without waiting to complete the previous
    operation
  • Also known as stream-based processing
  • ? (DepartmentNameAccounting') (Employee
    DNODepartmentNumber Department)

22
Combining Operations using Pipelining
  • Example For a join on two tables with a
    selection on each and then a projection
  • P A, B (? R.A ? (? S.B ? (R S)))
  • 4 operations ? 4 temporary results
  • Result of the Join is fed in a "Pipeline" into
    the 1st selection, the 2nd selection and finally
    the projection
  • When is it good to materialize the result?
  • Previous results being reused ? e.g. Nested
    uncorrelated queries

23
Exercise Heuristic Optimization
  • Query in SQL
  • For every query block
  • Query in Relational algebra
  • Initial query tree
  • Start with Cross Products, then selections, then
    Projects
  • Apply algorithm
  • Find Lname and SSN of all employees in the
    Design Department working on project 5 who earn
    more than the highest paid employee working on
    the Project X Project
Write a Comment
User Comments (0)
About PowerShow.com