Query Processing and Optimization Dr. Muhammad Shafique - PowerPoint PPT Presentation

About This Presentation
Title:

Query Processing and Optimization Dr. Muhammad Shafique

Description:

Title: Semantic Consistency in Information Exchange Author: John C Mitchell Last modified by: Muhammad Shafique Created Date: 9/7/1997 8:51:32 PM Document ... – PowerPoint PPT presentation

Number of Views:851
Avg rating:3.0/5.0
Slides: 62
Provided by: johnc313
Category:

less

Transcript and Presenter's Notes

Title: Query Processing and Optimization Dr. Muhammad Shafique


1
Query Processing and Optimization Dr. Muhammad
Shafique
2
Outline
  • Background review
  • Processing a query
  • SQL queries and relational algebra
  • Implementing basic query operations
  • Heuristics-based query optimization
  • Cost-function-based query optimization
  • Semantic-based query optimization
  • Query optimization in Oracle DBMS

3
Processing a Query
  • Query processing and query optimization
  • Problem formulation
  • Given a query q, a space of execution plans, E,
    and a cost function, cost(p) that assigns a
    numeric cost to an execution plan p ? E, find the
    minimum cost execution plan that computes q.

4
Processing a Query
  • Typical steps in processing a high-level query
  • Query in a high-level query language like SQL
  • Scanning, parsing, and validation
  • Intermediate-form of query like query tree
  • Query optimizer
  • Execution plan
  • Query code generator
  • Object-code for the query
  • Run-time database processor
  • Results of query

5
SQL Queries and Relational Algebra
  • SQL query is translated into an equivalent
    extended relational algebra expression ---
    represented as a query tree
  • In order to transform a given query into a query
    tree, the query is decomposed into query blocks
  • A query block contains a single
    SELECT-FROM-WHERE expression along with GROUP-BY
    and HAVING clauses.
  • The query optimizer chooses an execution plan for
    each block

6
(No Transcript)
7
(No Transcript)
8
SQL Queries and Relational Algebra
  • Example
  • SELECT Lname, Fname
  • FROM EMPLOYEE
  • WHERE Salary gt ( SELECT MAX(Salary)
  • FROM EMPLOYEE
  • WHERE Dno 5 )

9
Translating SQL Queries into Relational Algebra
(1)
  • SELECT LNAME, FNAME
  • FROM EMPLOYEE
  • WHERE SALARY gt ( SELECT MAX (SALARY)
  • FROM EMPLOYEE
  • WHERE DNO 5)

SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO 5
SELECT LNAME, FNAME FROM EMPLOYEE WHERE
SALARY gt C
pLNAME, FNAME (sSALARYgtC(EMPLOYEE))
FMAX SALARY (sDNO5 (EMPLOYEE))
10
SQL Queries and Relational Algebra (2)
  • Uncorrelated nested queries Vs Correlated nested
    queries
  • Example
  • Retrieve the name of each employee who works on
    all the projects controlled by department number
    5.
  • SELECT FNAME, LNAME FROM EMPLOYEE WHERE
    ( (SELECT PNO FROM WORKS_ON
    WHERE SSNESSN) CONTAINS
    (SELECT PNUMBER FROM PROJECT
    WHERE DNUM5) )

11
SQL Queries and Relational Algebra (3)
  • Example
  • For every project located in Stafford,
    retrieve the project number, the controlling
    department number and the department managers
    last name, address and birthdate.
  • SQL query
  • SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS,
    E.BDATE
  • FROM PROJECT AS P,DEPARTMENT AS D, EMPLOYEE AS E
  • WHERE P.DNUMD.DNUMBER AND D.MGRSSNE.SSN
    AND P.PLOCATIONSTAFFORD
  • Relation algebra
  • ?PNUMBER, DNUM, LNAME, ADDRESS, BDATE
    (((?PLOCATIONSTAFFORD(PROJECT)) DNUMDNUMBER
    (DEPARTMENT)) MGRSSNSSN (EMPLOYEE))

12
SQL Queries and Relational Algebra (4)
13
Implementing Basic Query Operations
  • An RDBMS must provide implementation(s) for all
    the required operations including relational
    operators and more
  • External sorting
  • Sort-merge strategy
  • Sorting phase
  • Number of file blocks (b)
  • Number of available buffers (nB)
  • Runs --- (b / nB)
  • Merging phase --- passes
  • Degree of merging --- the number of runs that are
    merged together in each pass

14
Algorithms for External Sorting (1)
  • External sorting
  • Refers to sorting algorithms that are suitable
    for large files of records stored on disk that do
    not fit entirely in main memory, such as most
    database files.
  • Sort-Merge strategy
  • Starts by sorting small subfiles (runs) of the
    main file and then merges the sorted runs,
    creating larger sorted subfiles that are merged
    in turn.

15
Algorithms for External Sorting (2)
16
Algorithms for External Sorting (3)
  • Analysis
  • Number of file blocks b
  • Number of initial runs nR
  • Available buffer space nB
  • Sorting phase nR ?(b/nB)?
  • Degree of merging dM Min (nB-1, nR)
  • Number of passes nP ?(logdM(nR))?
  • Number of block accesses (2 b) (2 b
    (logdM(nR)))
  • Example done in the class

17
Heuristic-Based Query Optimization
  • Query tree and query transformations
  • General transformation rules for relational
    algebra operations

18
General Transformation Rules for Relational
Algebra Operations
  • Cascade of ? A conjunctive selection condition
    can be broken up into a cascade (that is, a
    sequence) of individual ? operations ? C1 AND
    C2 AND .AND Cn (R) ? C1 (?C2( (?Cn(R)))
  • Commutativity of ? The ? operation is
    commutative ?C1(?C2(R)) ?C2(?C1(R))
  • Cascade of ? In a cascade (sequence) of ?
    operations, all but the last one can be ignored
  • Commuting ? with ? If the selection condition
    c involves only those attributes A1, ..., An in
    the projection list, the two operations can be
    commuted
  • And more

19
Heuristic-Based Query Optimization
  • Outline of heuristic algebraic optimization
    algorithm
  • Break up SELECT operations with conjunctive
    conditions into a cascade of SELECT operations
  • Using the commutativity of SELECT with other
    operations, move each SELECT operation as far
    down the query tree as is permitted by the
    attributes involved in the select condition
  • Using commutativity and associativity of binary
    operations, rearrange the leaf nodes of the tree
  • Combine a CARTESIAN PRODUCT operation with a
    subsequent SELECT operation in the tree into a
    JOIN operation, if the condition represents a
    join condition
  • Using the cascading of PROJECT and the commuting
    of PROJECT with other operations, break down and
    move lists of projection attributes down the tree
    as far as possible by creating new PROJECT
    operations as needed
  • Identify sub-trees that represent groups of
    operations that can be executed by a single
    algorithm

20
Heuristic-Based Query Optimization Example
  • Query
  • "Find the last names of employees born after
    1957 who work on a project named Aquarius."
  • SQL
  • SELECT LNAME   
  • FROM EMPLOYEE, WORKS_ON, PROJECT   
  • WHERE PNAMEAquarius AND PNUMBERPNO AND
    ESSNSSN AND BDATE.1957-12-31

21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
Implementing Basic Query Operations
  • Combining operations using pipelining
  • Temporary files based processing
  • Pipelining or stream-based processing
  • Example consider the execution of the following
    query
  • ?list of attributes( (? c1(R) ? (? c2 (S))

27
Implementing Basic Query Operations
  • Estimates of selectivity
  • Selectivity is the ratio of the number of tuples
    that satisfy the condition to the total number of
    tuples in the relation.
  • SELECT ( ? ) operator implementation
  • Linear search
  • Binary search
  • Using a primary index (or hash key)
  • Using primary index to retrieve multiple records
  • Using clustering index to retrieve multiple
    records
  • Using a secondary index on an equality comparison
  • Conjunctive selection using an individual index
  • Conjunctive selection using a composite index
  • Conjunctive selection by intersection of record
    pointers

28
Algorithms for JOIN Operations
  • Implementing the JOIN Operation
  • Join (EQUIJOIN, NATURAL JOIN)
  • twoway join a join on two files
  • e.g. R AB S
  • multi-way joins joins involving more than two
    files.
  • e.g. R AB S CD T
  • Examples
  • (OP6) EMPLOYEE DNODNUMBER DEPARTMENT
  • (OP7) DEPARTMENT MGRSSNSSN EMPLOYEE

29
Algorithms for JOIN Operations
  • Implementing the JOIN Operation (contd.)
  • Methods for implementing joins
  • J1 Nested-loop join (brute force)
  • For each record t in R (outer loop), retrieve
    every record s from S (inner loop) and test
    whether the two records satisfy the join
    condition tA sB.
  • J2 Single-loop join (Using an access structure to
    retrieve the matching records)
  • If an index (or hash key) exists for one of the
    two join attributes say, B of S retrieve each
    record t in R, one at a time, and then use the
    access structure to retrieve directly all
    matching records s from S that satisfy sB
    tA.

30
Algorithms for JOIN Operations
  • Implementing the JOIN Operation (contd.)
  • Methods for implementing joins
  • J3 Sort-merge join
  • If the records of R and S are physically sorted
    (ordered) by value of the join attributes A and
    B, respectively, we can implement the join in the
    most efficient way possible.
  • Both files are scanned in order of the join
    attributes, matching the records that have the
    same values for A and B.
  • In this method, the records of each file are
    scanned only once each for matching with the
    other fileunless both A and B are non-key
    attributes, in which case the method needs to be
    modified slightly.

31
Algorithms for JOIN Operations
  • Implementing the JOIN Operation (contd.)
  • Methods for implementing joins
  • J4 Hash-join
  • The records of files R and S are both hashed to
    the same hash file, using the same hashing
    function on the join attributes A of R and B of S
    as hash keys.
  • A single pass through the file with fewer records
    (say, R) hashes its records to the hash file
    buckets.
  • A single pass through the other file (S) then
    hashes each of its records to the appropriate
    bucket, where the record is combined with all
    matching records from R.

32
Algorithms for JOIN Operations
  • Implementing the JOIN Operation (contd.)
  • Factors affecting JOIN performance
  • Available buffer space
  • Join selection factor
  • Choice of inner VS outer relation

33
Buffer Space and Join performance
  • In the nested-loop join, it makes a difference
    which file is chosen for the outer loop and which
    for the inner loop. If EMPLOYEE is used for the
    outer loop, each block of EMPLOYEE is read once,
    and the entire DEPARTMENT file (each of its
    blocks) is read once for each time we read in (nB
    - 2) blocks of the EMPLOYEE file. We get the
    following
  • Total number of blocks accessed for outer file
    bE
  • Number of times ( nB - 2) blocks of outer file
    are loaded ? bE / nB 2 ?
  • Total number of blocks accessed for inner file
    bD ?bE / nB 2 ?
  • Hence, we get the following total number of
    block accesses
  • bE (? bE / nB 2 ? bD) 2000 (? (2000/5)
    ? 10) 6000 blocks
  • On the other hand, if we use the DEPARTMENT
    records in the outer loop, by symmetry we get the
    following total number of block accesses
  • bD (? bD / nB 2 ? bE) 10 (?(10/5) ?
    2000) 4010 blocks

34
Algorithms for JOIN Operations
  • Implementing the JOIN Operation (contd.)
  • Other types of JOIN algorithms
  • Partition hash join
  • Partitioning phase
  • Each file (R and S) is first partitioned into M
    partitions using a partitioning hash function on
    the join attributes 
  • R1 , R2 , R3 , ...... RM and S1 , S2 , S3 ,
    ...... SM
  • Minimum number of in-memory buffers needed for
    the partitioning phase M1.
  • A disk sub-file is created per partition to store
    the tuples for that partition.  
  • Joining or probing phase
  • Involves M iterations, one per partitioned file.
  • Iteration i involves joining partitions Ri and
    Si.

35
Algorithms for JOIN Operations
  • Implementing the JOIN Operation (contd.)
  • Partitioned Hash Join Procedure
  • Assume Ri is smaller than Si.
  • Copy records from Ri into memory buffers.
  • Read all blocks from Si, one at a time and each
    record from Si is used to probe for a matching
    record(s) from partition Si.
  • Write matching record from Ri after joining to
    the record from Si into the result file.

36
Algorithms for JOIN Operations
  • Implementing the JOIN Operation (contd.)
  • Cost analysis of partition hash join
  • Reading and writing each record from R and S
    during the partitioning phase (bR bS),
    (bR bS)
  • Reading each record during the joining
    phase (bR bS)
  • Writing the result of join bRES
  • Total Cost
  • 3 (bR bS) bRES

37
Query Optimization Using Selectivity and Cost
Function
  • Cost-based query optimization
  • Estimate and compare the costs of executing a
    query using different execution strategies and
    choose the strategy with the lowest cost
    estimate.
  • Issues
  • Cost function
  • Number of execution strategies to be considered
  • Compare to heuristic query optimization

38
Query Optimization Using Selectivity and Cost
Function
  • Cost Components for Query Execution
  • Access cost to secondary storage
  • Storage cost
  • Computation cost
  • Memory usage cost
  • Communication cost
  • Note Different database systems may focus on
    different cost components.

39
Query Optimization Using Selectivity and Cost
Function
  • Catalog Information Used in Cost Functions
  • Information about the size of a file
  • number of records (tuples) (r),
  • record size (R),
  • number of blocks (b)
  • blocking factor (bfr)
  • Information about indexes and indexing attributes
    of a file
  • Number of levels (x) of each multilevel index
  • Number of first-level index blocks (bI1)
  • Number of distinct values (d) of an attribute
  • Selectivity (sl) of an attribute
  • Selection cardinality (s) of an attribute. (s
    sl r)

40
Query Optimization Using Selectivity and Cost
Function
  • Examples of Cost Functions for SELECT
  • S1. Linear search (brute force) approach
  • CS1a b
  • For an equality condition on a key, CS1a (b/2)
    if the record is found otherwise CS1a b.
  • S2. Binary search
  • CS2 log2b (s/bfr)? 1
  • For an equality condition on a unique (key)
    attribute, CS2 log2b
  • S3. Using a primary index (S3a) or hash key (S3b)
    to retrieve a single record
  • CS3a x 1 CS3b 1 for static or linear
    hashing
  • CS3b 2 for extendible hashing

41
Query Optimization Using Selectivity and Cost
Function
  • Examples of Cost Functions for SELECT (contd.)
  • S4. Using an ordering index to retrieve multiple
    records
  • For the comparison condition on a key field with
    an ordering index, CS4 x (b/2)
  • S5. Using a clustering index to retrieve multiple
    records
  • CS5 x (s/bfr)
  • S6. Using a secondary (B-tree) index
  • For an equality comparison, CS6a x s
  • For a comparison condition such as gt, lt, gt, or
    lt,
  • CS6a x (bI1/2) (r/2)

42
Query Optimization Using Selectivity and Cost
Function
  • Examples of Cost Functions for SELECT (contd.)
  • S7. Conjunctive selection
  • Use either S1 or one of the methods S2 to S6 to
    solve.
  • For the latter case, use one condition to
    retrieve the records and then check in the memory
    buffer whether each retrieved record satisfies
    the remaining conditions in the conjunction.
  • S8. Conjunctive selection using a composite
    index
  • Same as S3a, S5 or S6a, depending on the type of
    index.
  • Examples of using the cost functions.

43
Query Optimization Using Selectivity and Cost
Function
  • Examples of Cost Functions for JOIN
  • Join selectivity (js)
  • js (R C S) / R x S (R C S)
    / (R S )
  • If condition C does not exist, js 1
  • If no tuples from the relations satisfy condition
    C, js 0
  • Usually, 0 lt js lt 1
  • Size of the result file after join operation
  • (R C S) js R S

44
Query Optimization Using Selectivity and Cost
Function
  • Examples of Cost Functions for JOIN (contd.)
  • J1. Nested-loop join
  • CJ1 bR (bRbS) ((js R S)/bfrRS)
  • (Use R for outer loop)
  • J2. Single-loop join (using an access structure
    to retrieve the matching record(s))
  • If an index exists for the join attribute B of S
    with index levels xB, we can retrieve each record
    s in R and then use the index to retrieve all the
    matching records t from S that satisfy tB
    sA.
  • The cost depends on the type of index.

45
Query Optimization Using Selectivity and Cost
Function
  • Examples of Cost Functions for JOIN (contd.)
  • J2. Single-loop join (contd.)
  • For a secondary index,
  • CJ2a bR (R (xB sB)) ((js R
    S)/bfrRS)
  • For a clustering index,
  • CJ2b bR (R (xB (sB/bfrB))) ((js
    R S)/bfrRS)
  • For a primary index,
  • CJ2c bR (R (xB 1)) ((js R
    S)/bfrRS)
  • If a hash key exists for one of the two join
    attributes B of S
  • CJ2d bR (R h) ((js R S)/bfrRS)
  • J3. Sort-merge join
  • CJ3a CS bR bS ((js R S)/bfrRS)
  • (CS Cost for sorting files)

46
Query Optimization Using Selectivity and Cost
Function
  • Multiple Relation Queries and Join Ordering
  • A query joining n relations will have n-1 join
    operations, and hence can have a large number of
    different join orders when we apply the algebraic
    transformation rules.
  • Current query optimizers typically limit the
    structure of a (join) query tree to that of
    left-deep (or right-deep) trees.
  • Left-deep tree
  • A binary tree where the right child of each
    non-leaf node is always a base relation.
  • Amenable to pipelining
  • Could utilize any access paths on the base
    relation (the right child) when executing the
    join.

47
Query Optimization Using Selectivity and Cost
Function
  • Example

48
Cost-Function Based Query Optimization
  • Example
  • SELECT pnumber, dnum, lname, address, bdate
  • FROM Project, Department, Employee
  • WHERE dnumdnumber AND mgrssnssn AND plocation
    Dhahran
  • Join order selection
  • Project Department Employee
  • Department Project Employee
  • Department Employee Project
  • Employee Department Project

49
Cost-Function Based Query Optimization
50
Traditional Query optimizers
  • Three design components of commercial query
    optimizers
  • Execution space
  • Normally represented as an annotated query tree
  • All trees that compute the query are considered
    legal plans for the query
  • A finite subset of the infinite space of legal
    plans is the search space
  • Cost model
  • It assigns an integer cost to a plan based on
    some assumptions about the statistical
    distribution of data and the abstract machine
  • Search algorithm
  • Algorithm used to search the search space for the
    plan with minimum cost. For example, search
    algorithm for System R is dynamic programming

51
A Model for Parallel Execution
  • Two possible sources of parallelism
  • Inter-operator parallelism
  • Data independent like sub-trees and data
    dependent like pipelined
  • Intra-operator parallelism
  • Like sort
  • Two deterrents of parallelism
  • Data dependencies between operators
  • Resource contention
  • Several operators running in parallel and
    competing for the same resources
  • Operator tree
  • Operator tree has nodes that are atomic, i.e.,
    run-time scheduler cannot subdivide the operation
    further

52
Dynamic Programming Algorithm
  • DP Algorithm for left-Deep join trees
  • Input An SPJ query q on relations R1, R2, , Rn
  • Output A query plan Popt for q with optimal work
  • for i 1 to n do
  • optPlan(Ri) accessPlan(Ri)
  • for i 2 to n do
  • for all S ? R1, R2, , Rn s.t. S i do
  • bestPlan dummy plan with infinite cost
  • for all Rj, Sj s.t. S Rj ? Sj do
  • p joinPlan(optPlan(Sj) , Rj)
  • if p ? work bestPlan then bestPlan p
  • optPlan(S) bestPlan
  • Popt optPlan( R1, R2, ., Rn )
  • Principle of optimality An optimal plan for a
    set of relations is an extension of an optimal
    plan for some subset of the set

53
Failure of Dynamic Programming
  • Fundamental assumptions in DP
  • Principle of optimality
  • p1 cost p2 ? (?i) joinplan(p1, Ri) cost
    joinplan(p2, Ri)
  • Total order
  • (?p1p2) not(p1 cost p2) ? (p2 cost p1)
  • DP and response time
  • Any reasonable cost model for response time will
    violate the principle of optimality
  • Generalization of DP for partial orders

54
Overview of Query Optimization in Oracle
  • Rule-based query optimization the optimizer
    chooses execution plans based on heuristically
    ranked operations.
  • May be phased out
  • Cost-based query optimization the optimizer
    examines alternative access paths and operator
    algorithms and chooses the execution plan with
    lowest estimate cost.
  • The query cost is calculated based on the
    estimated usage of resources such as I/O, CPU and
    memory needed.
  • Application developers could specify hints to the
    ORACLE query optimizer.
  • application developer might know more information
    about the data.
  • SELECT / ...hint... / rest of query
  • SELECT / index(t1 t1_abc) index(t2 t2_abc) /
    COUNT()FROM t1, t2WHERE t1.col1 t2.col1

55
Query Processing and Optimization in Oracle RDB
  • Applications using
  • Embedded SQL
  • SQL module language
  • Dynamic SQL
  • Interactive SQL
  • Distributed and heterogeneous query processing
  • DEC DB Integrator (also known as Access Works)
  • Sub-queries can be submitted to other DBMSs (DB2,
    Sybase
  • DEC DB Integrator has its own query optimizer

56
Query Processing and Optimization in Oracle RDB
  • Oracle query processor designed for wide range of
    queries from simple to very complex
  • Queries with up to 128 joins can be evaluated
  • Query processor consists of two parts
  • Query compiler
  • Query executor
  • Each part has its own optimizer
  • Query transformation
  • Takes place in the SQL front-end
  • SQL text to polish notation to linked structure
    of internal blocks
  • Other transformations take place during the other
    stages like code generation

57
Query Processing and Optimization in Oracle RDB
  • Execution plan selection
  • To determine a join order
  • Exhaustive search
  • Left-deep tree of two-way joins for each
    sub-query
  • The least expensive join order for each sub-query
    starting from innermost
  • Execution plan selection based on the cost model
  • Join strategies
  • Supports nested-loop join
  • Two-way merge join works on two sorted streams

58
Query Processing and Optimization in Oracle RDB
  • Single table scan
  • Sequential scan
  • Scan of one or more indexes
  • Direct access of a row by the row identifier
    (RID)
  • Dynamic optimization
  • Advanced features
  • BLOBs
  • Triggers
  • NOT NULL and UNIQUE constraints are not evaluated
    when a row is deleted from a table

59
Semantic Query Optimization
  • Semantic query optimization uses constraints
    specified in the database schema
  • The technique may be used in combination with
    other techniques
  • Example
  • SELECT E.LNAME, M.LNAME
  • FROM EMPLOYEE AS E, EMPLOYEE AS M
  • WHERE E.SUPERSSNM.SSN AND E.SALARY gtM.SALARY
  • Constraint --- no employee can earn more than his
    manager
  • No need to execute the query
  • Searching through the constraints to find out
    which one is applicable !!!

60
References
  • Required
  • Textbook Chapter 15
  • Execution Strategies for SQL Subqueries
    Mostafa Elhamali et al SIGMOD07 June 12-14,
    2007, Beijing China pp 993-1004
  • Recommended
  • Adaptive Query Processing Why, How, When, What
    next Amol Deshpande et al, SIGMOD 2006, pp
    806-807.
  • Efficient and Extensible Algorithms for Multi
    Query OptimizationsPrasan Rop, S. Seshadri, S.
    Sudarshan, and S. bhobeACM SIGMOD Record volume
    29, Number 2, June 2000, pp 249 260
  • Query Processing and Optimization in Oracle
    RDBGennady Antoshenkov and Mohamed ZiauddinThe
    VLDB Journal Springer-Verlag, VLDB(1996) 5 229
    237

61
Summary
  • Background review
  • Processing a query
  • SQL queries and relational algebra
  • Implementing basic query operations
  • Heuristics-based query optimization
  • Cost-function-based query optimization
  • Query Processing and Optimization in Oracle RDB
Write a Comment
User Comments (0)
About PowerShow.com