Query Processing and Optimization Dr. Muhammad Shafique

Description:

Title: Semantic Consistency in Information Exchange Author: John C Mitchell Last modified by: Muhammad Shafique Created Date: 9/7/1997 8:51:32 PM Document ... – PowerPoint PPT presentation

Slides: 62

Transcript and Presenter's Notes

1
Query Processing and Optimization Dr. Muhammad
Shafique
2
Outline

Background review
Processing a query
SQL queries and relational algebra
Implementing basic query operations
Heuristics-based query optimization
Cost-function-based query optimization
Semantic-based query optimization
Query optimization in Oracle DBMS

3
Processing a Query

Query processing and query optimization
Problem formulation
Given a query q, a space of execution plans E,
and a cost function cost(p) that assigns a
numeric cost to each execution plan $p \in E$, find the
minimum-cost execution plan that computes q.

4
Processing a Query

Typical steps in processing a high-level query
Query in a high-level query language like SQL
Scanning, parsing, and validation
Intermediate-form of query like query tree
Query optimizer
Execution plan
Query code generator
Object-code for the query
Run-time database processor
Results of query

5
SQL Queries and Relational Algebra

An SQL query is translated into an equivalent
extended relational algebra expression,
represented as a query tree
In order to transform a given query into a query
tree, the query is decomposed into query blocks
A query block contains a single
SELECT-FROM-WHERE expression along with GROUP-BY
and HAVING clauses.
The query optimizer chooses an execution plan for
each block

8
SQL Queries and Relational Algebra

Example
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX(Salary)
FROM EMPLOYEE
WHERE Dno = 5 )

9
Translating SQL Queries into Relational Algebra
(1)

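SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5 )

Inner block: SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO = 5
Outer block: SELECT LNAME, FNAME FROM EMPLOYEE WHERE
SALARY > C (C is the result of the inner block)
Outer block in relational algebra: $\pi_{LNAME, FNAME}(\sigma_{SALARY > C}(EMPLOYEE))$
Inner block in relational algebra: $\mathcal{F}_{\text{MAX SALARY}}(\sigma_{DNO = 5}(EMPLOYEE))$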
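
The decomposition into query blocks can be tried out with a minimal sketch, assuming Python's built-in sqlite3 module and a tiny made-up EMPLOYEE table (the rows are hypothetical; only the column names follow the slides). The uncorrelated inner block is evaluated once to produce the constant C, which the outer block then uses.

```python
import sqlite3

# Hypothetical sample rows: (Lname, Fname, Salary, Dno).
ROWS = [
    ("Smith", "John", 30000, 5),
    ("Wong", "Franklin", 40000, 5),
    ("Wallace", "Jennifer", 43000, 4),
    ("Borg", "James", 55000, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Lname TEXT, Fname TEXT, Salary INTEGER, Dno INTEGER)"
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", ROWS)

# Inner block: uncorrelated, so it needs to be evaluated only once.
(c,) = conn.execute("SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5").fetchone()

# Outer block: the inner result is treated as the constant C.
outer = conn.execute(
    "SELECT Lname, Fname FROM EMPLOYEE WHERE Salary > ? ORDER BY Lname", (c,)
).fetchall()

# The original nested query, for comparison.
nested = conn.execute(
    "SELECT Lname, Fname FROM EMPLOYEE "
    "WHERE Salary > (SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5) "
    "ORDER BY Lname"
).fetchall()
```

Both result lists contain the same employees, which is what allows the optimizer to plan each block separately.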
10
SQL Queries and Relational Algebra (2)

Uncorrelated nested queries vs. correlated nested
queries
Example
Retrieve the name of each employee who works on
all the projects controlled by department number
5.
SELECT FNAME, LNAME FROM EMPLOYEE WHERE
( (SELECT PNO FROM WORKS_ON
WHERE SSN = ESSN) CONTAINS
(SELECT PNUMBER FROM PROJECT
WHERE DNUM = 5) )

11
SQL Queries and Relational Algebra (3)

Example
For every project located in Stafford,
retrieve the project number, the controlling
department number, and the department manager's
last name, address, and birthdate.
SQL query
SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS,
E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN
AND P.PLOCATION = 'STAFFORD'
Relational algebra
$\pi_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((\sigma_{PLOCATION = \text{'STAFFORD'}}(PROJECT)) \bowtie_{DNUM = DNUMBER} (DEPARTMENT)) \bowtie_{MGRSSN = SSN} (EMPLOYEE))$

12
SQL Queries and Relational Algebra (4)
13
Implementing Basic Query Operations

An RDBMS must provide implementation(s) for all
the required operations including relational
operators and more
External sorting
Sort-merge strategy
Sorting phase
Number of file blocks ($b$)
Number of available buffers ($n_B$)
Number of runs: $\lceil b / n_B \rceil$
Merging phase: one or more passes
Degree of merging: the number of runs that are
merged together in each pass

14
Algorithms for External Sorting (1)

External sorting
Refers to sorting algorithms that are suitable
for large files of records stored on disk that do
not fit entirely in main memory, such as most
database files.
Sort-Merge strategy
Starts by sorting small subfiles (runs) of the
main file and then merges the sorted runs,
creating larger sorted subfiles that are merged
in turn.

15
Algorithms for External Sorting (2)
16
Algorithms for External Sorting (3)

Analysis
Number of file blocks: $b$
Number of initial runs: $n_R$
Available buffer space: $n_B$
Sorting phase: $n_R = \lceil b / n_B \rceil$
Degree of merging: $d_M = \min(n_B - 1, n_R)$
Number of passes: $n_P = \lceil \log_{d_M}(n_R) \rceil$
Number of block accesses: $(2 \times b) + (2 \times b \times \log_{d_M}(n_R))$
Example done in the class
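
The analysis can be checked numerically with a minimal sketch. The function name and the sample values (1024 blocks, 5 buffers) are illustrative assumptions, not the example worked in class. The sketch counts whole merge passes, so its block-access figure is an upper bound on the formula whenever the logarithm is not an integer.

```python
def external_sort_cost(b, n_b):
    """Return (n_r, d_m, n_p, block_accesses) for a sort-merge external sort.

    b   -- number of file blocks
    n_b -- number of available buffers (at least 3 are needed to merge)
    """
    if b < 1 or n_b < 3:
        raise ValueError("need b >= 1 and n_b >= 3")
    n_r = -(-b // n_b)            # initial runs: ceil(b / n_b)
    d_m = min(n_b - 1, n_r)       # degree of merging
    # Count merge passes by repeatedly merging d_m runs into one;
    # this equals ceil(log_{d_m}(n_r)) without floating-point rounding.
    n_p, runs = 0, n_r
    while runs > 1:
        runs = -(-runs // d_m)
        n_p += 1
    # Each phase reads and writes every block once: 2b for sorting,
    # then 2b per merge pass.
    block_accesses = 2 * b + 2 * b * n_p
    return n_r, d_m, n_p, block_accesses


n_r, d_m, n_p, accesses = external_sort_cost(b=1024, n_b=5)
```

With these assumed values the sort creates 205 initial runs and merges them four at a time: 205 runs become 52, then 13, then 4, then 1.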

17
Heuristic-Based Query Optimization

Query tree and query transformations
General transformation rules for relational
algebra operations

18
General Transformation Rules for Relational
Algebra Operations

Cascade of $\sigma$: A conjunctive selection condition
can be broken up into a cascade (that is, a
sequence) of individual $\sigma$ operations:
$\sigma_{C_1 \text{ AND } C_2 \text{ AND } \dots \text{ AND } C_n}(R) \equiv \sigma_{C_1}(\sigma_{C_2}(\dots(\sigma_{C_n}(R))\dots))$
Commutativity of $\sigma$: The $\sigma$ operation is
commutative: $\sigma_{C_1}(\sigma_{C_2}(R)) \equiv \sigma_{C_2}(\sigma_{C_1}(R))$
Cascade of $\pi$: In a cascade (sequence) of $\pi$
operations, all but the last one can be ignored
Commuting $\sigma$ with $\pi$: If the selection condition
c involves only those attributes A1, ..., An in
the projection list, the two operations can be
commuted
And more
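
The four rules listed on this slide can be checked on a toy relation. This is a minimal sketch, assuming relations are modeled as Python lists of dictionaries; the helper names `select` and `project` and the sample rows are hypothetical.

```python
# Hypothetical sample relation.
EMPLOYEE = [
    {"LNAME": "Smith", "DNO": 5, "SALARY": 30000},
    {"LNAME": "Wong", "DNO": 5, "SALARY": 40000},
    {"LNAME": "Borg", "DNO": 1, "SALARY": 55000},
]


def select(cond, rel):
    """Sigma: keep the tuples that satisfy cond."""
    return [row for row in rel if cond(row)]


def project(attrs, rel):
    """Pi: keep only attrs, eliminating duplicate tuples."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def c1(row):
    return row["DNO"] == 5


def c2(row):
    return row["SALARY"] > 35000


# Cascade of sigma: one conjunctive selection equals nested selections.
cascade_sigma = select(lambda r: c1(r) and c2(r), EMPLOYEE) == select(
    c1, select(c2, EMPLOYEE)
)
# Commutativity of sigma.
commute_sigma = select(c1, select(c2, EMPLOYEE)) == select(c2, select(c1, EMPLOYEE))
# Cascade of pi: only the outermost projection list matters.
cascade_pi = project(["LNAME"], project(["LNAME", "DNO"], EMPLOYEE)) == project(
    ["LNAME"], EMPLOYEE
)
# Commuting sigma with pi: c1 uses only DNO, which is in the projection list.
commute_sigma_pi = project(["LNAME", "DNO"], select(c1, EMPLOYEE)) == select(
    c1, project(["LNAME", "DNO"], EMPLOYEE)
)
```

All four flags evaluate to True on this data. Agreement on one example does not prove the rules, but it shows what each equivalence claims.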

19
Heuristic-Based Query Optimization

Outline of heuristic algebraic optimization
algorithm
Break up SELECT operations with conjunctive
conditions into a cascade of SELECT operations
Using the commutativity of SELECT with other
operations, move each SELECT operation as far
down the query tree as is permitted by the
attributes involved in the select condition
Using commutativity and associativity of binary
operations, rearrange the leaf nodes of the tree
Combine a CARTESIAN PRODUCT operation with a
subsequent SELECT operation in the tree into a
JOIN operation, if the condition represents a
join condition
Using the cascading of PROJECT and the commuting
of PROJECT with other operations, break down and
move lists of projection attributes down the tree
as far as possible by creating new PROJECT
operations as needed
Identify sub-trees that represent groups of
operations that can be executed by a single
algorithm

20
Heuristic-Based Query Optimization Example

Query
"Find the last names of employees born after
1957 who work on a project named Aquarius."
SQL
SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'Aquarius' AND PNUMBER = PNO AND
ESSN = SSN AND BDATE > '1957-12-31'

26
Implementing Basic Query Operations

Combining operations using pipelining
Temporary files based processing
Pipelining or stream-based processing
Example: consider the execution of the following
query
$\pi_{\text{list of attributes}}((\sigma_{c_1}(R)) \bowtie (\sigma_{c_2}(S)))$
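
Stream-based processing of such a query can be illustrated with Python generators. This is a minimal sketch under stated assumptions: the operator functions, the sample relations R and S, and the join condition are hypothetical. Each operator pulls tuples from its input on demand instead of writing a temporary file; only the inner join input is buffered.

```python
def scan(rel):
    """Leaf operator: stream the stored tuples one at a time."""
    for row in rel:
        yield row


def select(cond, rows):
    """Pass on each incoming tuple as soon as it qualifies."""
    for row in rows:
        if cond(row):
            yield row


def join(outer_rows, inner_rows, cond):
    """Nested-loop join: the inner input is buffered, the outer one streamed."""
    inner = list(inner_rows)
    for o in outer_rows:
        for i in inner:
            if cond(o, i):
                yield {**o, **i}


def project(attrs, rows):
    """Streaming projection (no duplicate elimination in this sketch)."""
    for row in rows:
        yield {a: row[a] for a in attrs}


# Hypothetical sample relations.
R = [
    {"LNAME": "Smith", "DNO": 5, "SALARY": 30000},
    {"LNAME": "Wong", "DNO": 5, "SALARY": 40000},
    {"LNAME": "Borg", "DNO": 1, "SALARY": 55000},
]
S = [
    {"DNUMBER": 5, "DNAME": "Research"},
    {"DNUMBER": 1, "DNAME": "Headquarters"},
]

pipeline = project(
    ["LNAME", "DNAME"],
    join(
        select(lambda r: r["SALARY"] > 35000, scan(R)),
        select(lambda s: s["DNUMBER"] == 5, scan(S)),
        lambda r, s: r["DNO"] == s["DNUMBER"],
    ),
)
result = list(pipeline)  # nothing runs until the result is consumed
```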

27
Implementing Basic Query Operations

Estimates of selectivity
Selectivity is the ratio of the number of tuples
that satisfy the condition to the total number of
tuples in the relation.
SELECT ($\sigma$) operator implementation
Linear search
Binary search
Using a primary index (or hash key)
Using primary index to retrieve multiple records
Using clustering index to retrieve multiple
records
Using a secondary index on an equality comparison
Conjunctive selection using an individual index
Conjunctive selection using a composite index
Conjunctive selection by intersection of record
pointers

28
Algorithms for JOIN Operations

Implementing the JOIN Operation
Join (EQUIJOIN, NATURAL JOIN)
Two-way join: a join on two files
e.g., $R \bowtie_{A=B} S$
Multi-way joins: joins involving more than two
files
e.g., $R \bowtie_{A=B} S \bowtie_{C=D} T$
Examples
(OP6) $EMPLOYEE \bowtie_{DNO=DNUMBER} DEPARTMENT$
(OP7) $DEPARTMENT \bowtie_{MGRSSN=SSN} EMPLOYEE$

29
Algorithms for JOIN Operations

Implementing the JOIN Operation (contd.)
Methods for implementing joins
J1: Nested-loop join (brute force)
For each record t in R (outer loop), retrieve
every record s from S (inner loop) and test
whether the two records satisfy the join
condition t[A] = s[B].
J2: Single-loop join (using an access structure to
retrieve the matching records)
If an index (or hash key) exists for one of the
two join attributes, say B of S, retrieve each
record t in R, one at a time, and then use the
access structure to retrieve directly all
matching records s from S that satisfy s[B] =
t[A].
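
Methods J1 and J2 can be contrasted with a minimal in-memory sketch. The function names and sample records are hypothetical, and a Python dictionary stands in for the index or hash key on attribute B of S, which is assumed to exist already in a real system.

```python
from collections import defaultdict


def nested_loop_join(r, s, a, b):
    """J1: compare every record t of R with every record u of S."""
    return [(t, u) for t in r for u in s if t[a] == u[b]]


def single_loop_join(r, s, a, b):
    """J2: one pass over R, looking up matches through an index on S.B."""
    index = defaultdict(list)      # stand-in for an existing index on S.B
    for u in s:
        index[u[b]].append(u)
    return [(t, u) for t in r for u in index.get(t[a], [])]


# Hypothetical sample files.
EMPLOYEE = [
    {"LNAME": "Smith", "DNO": 5},
    {"LNAME": "Wallace", "DNO": 4},
    {"LNAME": "Wong", "DNO": 5},
]
DEPARTMENT = [
    {"DNUMBER": 4, "DNAME": "Administration"},
    {"DNUMBER": 5, "DNAME": "Research"},
]

j1 = nested_loop_join(EMPLOYEE, DEPARTMENT, "DNO", "DNUMBER")
j2 = single_loop_join(EMPLOYEE, DEPARTMENT, "DNO", "DNUMBER")
```

Both calls return the same pairs. The difference is the work done: J1 makes |R| × |S| comparisons, while J2 does one lookup per record of R.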

30
Algorithms for JOIN Operations

Implementing the JOIN Operation (contd.)
Methods for implementing joins
J3 Sort-merge join
If the records of R and S are physically sorted
(ordered) by value of the join attributes A and
B, respectively, we can implement the join in the
most efficient way possible.
Both files are scanned in order of the join
attributes, matching the records that have the
same values for A and B.
In this method, the records of each file are
scanned only once each for matching with the
other file, unless both A and B are non-key
attributes, in which case the method needs to be
modified slightly.
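
A minimal in-memory sketch of the sort-merge join follows, including one possible version of the "slight modification" for non-key attributes: when a value repeats on both sides, the two groups of equal values are combined before the scan moves on. The function name and sample records are hypothetical, and the sketch sorts its inputs itself rather than assuming physically sorted files.

```python
def sort_merge_join(r, s, a, b):
    """Join r and s on r[a] == s[b] by merging two sorted inputs."""
    r = sorted(r, key=lambda t: t[a])
    s = sorted(s, key=lambda u: u[b])
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the end of the group of equal values on each side.
            i_end = i
            while i_end < len(r) and r[i_end][a] == value:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Non-key case: every record in one group matches every
            # record in the other group.
            for t in r[i:i_end]:
                for u in s[j:j_end]:
                    out.append((t, u))
            i, j = i_end, j_end
    return out


# Hypothetical sample files with duplicate join values on both sides.
R = [{"A": 2, "x": "r1"}, {"A": 1, "x": "r2"}, {"A": 2, "x": "r3"}]
S = [{"B": 2, "y": "s1"}, {"B": 3, "y": "s2"}, {"B": 2, "y": "s3"}]

pairs = [(t["x"], u["y"]) for t, u in sort_merge_join(R, S, "A", "B")]
```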

31
Algorithms for JOIN Operations

Implementing the JOIN Operation (contd.)
Methods for implementing joins
J4 Hash-join
The records of files R and S are both hashed to
the same hash file, using the same hashing
function on the join attributes A of R and B of S
as hash keys.
A single pass through the file with fewer records
(say, R) hashes its records to the hash file
buckets.
A single pass through the other file (S) then
hashes each of its records to the appropriate
bucket, where the record is combined with all
matching records from R.
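
The two passes of the hash join can be sketched as follows, assuming the smaller file R fits in memory. The function name, the bucket count, and the sample records are hypothetical.

```python
def hash_join(r, s, a, b, n_buckets=8):
    """J4: build buckets from R, then probe them with each record of S."""
    buckets = [[] for _ in range(n_buckets)]
    # Pass 1: hash the records of the smaller file R into buckets.
    for t in r:
        buckets[hash(t[a]) % n_buckets].append(t)
    # Pass 2: hash each record of S with the same function and combine it
    # with the matching records found in that bucket.
    out = []
    for u in s:
        for t in buckets[hash(u[b]) % n_buckets]:
            if t[a] == u[b]:       # different values can share a bucket
                out.append((t, u))
    return out


# Hypothetical sample files; DEPARTMENT plays the role of the smaller file R.
DEPARTMENT = [
    {"DNUMBER": 4, "DNAME": "Administration"},
    {"DNUMBER": 5, "DNAME": "Research"},
]
EMPLOYEE = [
    {"LNAME": "Smith", "DNO": 5},
    {"LNAME": "Wallace", "DNO": 4},
    {"LNAME": "Borg", "DNO": 1},
]

joined = [
    (t["DNAME"], u["LNAME"])
    for t, u in hash_join(DEPARTMENT, EMPLOYEE, "DNUMBER", "DNO")
]
```

The equality test inside the probe loop is needed because two different join values may hash to the same bucket.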

32
Algorithms for JOIN Operations

Implementing the JOIN Operation (contd.)
Factors affecting JOIN performance
Available buffer space
Join selection factor
Choice of inner vs. outer relation

33
Buffer Space and Join performance

In the nested-loop join, it makes a difference
which file is chosen for the outer loop and which
for the inner loop. If EMPLOYEE is used for the
outer loop, each block of EMPLOYEE is read once,
and the entire DEPARTMENT file (each of its
blocks) is read once for each time we read in (nB
- 2) blocks of the EMPLOYEE file. We get the
following
Total number of blocks accessed for outer file:
$b_E$
Number of times ($n_B - 2$) blocks of outer file
are loaded: $\lceil b_E / (n_B - 2) \rceil$
Total number of blocks accessed for inner file:
$b_D \times \lceil b_E / (n_B - 2) \rceil$
Hence, we get the following total number of
block accesses:
$b_E + (\lceil b_E / (n_B - 2) \rceil \times b_D) = 2000 + (\lceil 2000/5 \rceil \times 10) = 6000$ blocks
On the other hand, if we use the DEPARTMENT
records in the outer loop, by symmetry we get the
following total number of block accesses:
$b_D + (\lceil b_D / (n_B - 2) \rceil \times b_E) = 10 + (\lceil 10/5 \rceil \times 2000) = 4010$ blocks
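
The two totals can be reproduced with a short sketch. The function name is hypothetical. The slide gives the block counts (2000 and 10) and a divisor of 5 but does not state the number of buffers, so nB = 7 is an assumption inferred from nB - 2 = 5.

```python
def nested_loop_block_accesses(b_outer, b_inner, n_b):
    """Block accesses for a nested-loop join with n_b buffers.

    n_b - 2 buffers hold a chunk of the outer file; one buffer reads the
    inner file and one collects the result.
    """
    if n_b < 3:
        raise ValueError("need at least 3 buffers")
    chunks = -(-b_outer // (n_b - 2))   # ceil(b_outer / (n_b - 2))
    return b_outer + chunks * b_inner


B_EMPLOYEE, B_DEPARTMENT, N_B = 2000, 10, 7   # N_B = 7 is assumed

employee_outer = nested_loop_block_accesses(B_EMPLOYEE, B_DEPARTMENT, N_B)
department_outer = nested_loop_block_accesses(B_DEPARTMENT, B_EMPLOYEE, N_B)
```

Using the smaller file in the outer loop is cheaper here because the inner file is reread once per outer chunk, and the smaller file needs fewer chunks.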

34
Algorithms for JOIN Operations

Implementing the JOIN Operation (contd.)
Other types of JOIN algorithms
Partition hash join
Partitioning phase
Each file (R and S) is first partitioned into M
partitions using a partitioning hash function on
the join attributes
R1 , R2 , R3 , ...... RM and S1 , S2 , S3 ,
...... SM
Minimum number of in-memory buffers needed for
the partitioning phase: $M + 1$.
A disk sub-file is created per partition to store
the tuples for that partition.
Joining or probing phase
Involves M iterations, one per partitioned file.
Iteration i involves joining partitions Ri and
Si.

35
Algorithms for JOIN Operations

Implementing the JOIN Operation (contd.)
Partitioned Hash Join Procedure
Assume Ri is smaller than Si.
Copy records from Ri into memory buffers.
Read all blocks from Si, one at a time, and use each
record from Si to probe for matching
record(s) in partition Ri.
Write matching record from Ri after joining to
the record from Si into the result file.
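
The partitioning and probing phases can be sketched in memory as follows. Python lists stand in for the per-partition disk sub-files, and the function names, the partition count M = 4, and the sample records are hypothetical.

```python
def partition(rel, attr, m):
    """Partitioning phase: split rel into m partitions by hashing attr."""
    parts = [[] for _ in range(m)]
    for rec in rel:
        parts[hash(rec[attr]) % m].append(rec)
    return parts


def partition_hash_join(r, s, a, b, m=4):
    """Join r and s on r[a] == s[b], one pair of partitions at a time."""
    r_parts = partition(r, a, m)
    s_parts = partition(s, b, m)
    out = []
    # Probing phase: iteration i joins Ri with Si only, because matching
    # records always hash to the same partition number.
    for r_i, s_i in zip(r_parts, s_parts):
        table = {}                       # Ri copied into memory
        for t in r_i:
            table.setdefault(t[a], []).append(t)
        for u in s_i:                    # each record of Si probes Ri
            for t in table.get(u[b], []):
                out.append((t, u))
    return out


# Hypothetical sample files.
R = [{"A": k, "x": f"r{k}"} for k in range(10)]
S = [{"B": k % 5, "y": f"s{k}"} for k in range(12)]

result = sorted((t["x"], u["y"]) for t, u in partition_hash_join(R, S, "A", "B"))
expected = sorted((t["x"], u["y"]) for t in R for u in S if t["A"] == u["B"])
```

Comparing against the brute-force `expected` list confirms that no matches are lost by joining only partitions with the same number.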

36
Algorithms for JOIN Operations

Implementing the JOIN Operation (contd.)
Cost analysis of partition hash join
Reading and writing each record from R and S
during the partitioning phase: $(b_R + b_S)$ to read,
$(b_R + b_S)$ to write
Reading each record during the joining
phase: $(b_R + b_S)$
Writing the result of the join: $b_{RES}$
Total cost:
$3 \times (b_R + b_S) + b_{RES}$

37
Query Optimization Using Selectivity and Cost
Function

Cost-based query optimization
Estimate and compare the costs of executing a
query using different execution strategies and
choose the strategy with the lowest cost
estimate.
Issues
Cost function
Number of execution strategies to be considered
Compare to heuristic query optimization

38
Query Optimization Using Selectivity and Cost
Function

Cost Components for Query Execution
Access cost to secondary storage
Storage cost
Computation cost
Memory usage cost
Communication cost
Note: Different database systems may focus on
different cost components.

39
Query Optimization Using Selectivity and Cost
Function

Catalog Information Used in Cost Functions
Information about the size of a file
number of records (tuples) (r),
record size (R),
number of blocks (b)
blocking factor (bfr)
Information about indexes and indexing attributes
of a file
Number of levels (x) of each multilevel index
Number of first-level index blocks (bI1)
Number of distinct values (d) of an attribute
Selectivity (sl) of an attribute
Selection cardinality (s) of an attribute
($s = sl \times r$)

40
Query Optimization Using Selectivity and Cost
Function

Examples of Cost Functions for SELECT
S1. Linear search (brute force) approach
$C_{S1a} = b$
For an equality condition on a key, $C_{S1a} = b/2$
if the record is found; otherwise $C_{S1a} = b$.
S2. Binary search
$C_{S2} = \log_2 b + \lceil s / bfr \rceil - 1$
For an equality condition on a unique (key)
attribute, $C_{S2} = \log_2 b$
S3. Using a primary index (S3a) or hash key (S3b)
to retrieve a single record
$C_{S3a} = x + 1$; $C_{S3b} = 1$ for static or linear
hashing
$C_{S3b} = 2$ for extendible hashing

41
Query Optimization Using Selectivity and Cost
Function

Examples of Cost Functions for SELECT (contd.)
S4. Using an ordering index to retrieve multiple
records
For the comparison condition on a key field with
an ordering index, $C_{S4} = x + (b/2)$
S5. Using a clustering index to retrieve multiple
records
$C_{S5} = x + \lceil s / bfr \rceil$
S6. Using a secondary (B-tree) index
For an equality comparison, $C_{S6a} = x + s$
For a comparison condition such as >, <, >=, or
<=,
$C_{S6b} = x + (b_{I1}/2) + (r/2)$

42
Query Optimization Using Selectivity and Cost
Function

Examples of Cost Functions for SELECT (contd.)
S7. Conjunctive selection
Use either S1 or one of the methods S2 to S6 to
solve.
For the latter case, use one condition to
retrieve the records and then check in the memory
buffer whether each retrieved record satisfies
the remaining conditions in the conjunction.
S8. Conjunctive selection using a composite
index
Same as S3a, S5 or S6a, depending on the type of
index.
Examples of using the cost functions.
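
As one illustration of how the SELECT cost functions are used, the sketch below codes several of them and compares two access paths for an equality condition. The function names, the catalog values (r = 10000, b = 2000, bfr = 5, d = 125), and the index depth are hypothetical assumptions, and the selection cardinality assumes uniformly distributed values.

```python
import math


def selection_cardinality(r, d):
    """s = sl * r with sl = 1/d, assuming a uniform value distribution."""
    return r / d


def cost_linear(b, key_found=False):
    """S1: scan all b blocks; b/2 on average if a key match is found."""
    return b / 2 if key_found else b


def cost_binary(b, s, bfr):
    """S2: binary search on an ordered file (s = 1 for a key attribute)."""
    return math.log2(b) + math.ceil(s / bfr) - 1


def cost_primary_index(x):
    """S3a: traverse x index levels, then read one data block."""
    return x + 1


def cost_ordering_index_range(x, b):
    """S4: range condition on the ordering key; about half the file."""
    return x + b / 2


def cost_clustering_index(x, s, bfr):
    """S5: the s matching records are stored in consecutive blocks."""
    return x + math.ceil(s / bfr)


def cost_secondary_equality(x, s):
    """S6a: each of the s matching records may be in a different block."""
    return x + s


def cost_secondary_range(x, b_i1, r):
    """S6b: half the first-level index blocks plus half the records."""
    return x + b_i1 / 2 + r / 2


# Hypothetical catalog values for an EMPLOYEE file.
r, b, bfr = 10000, 2000, 5
s_dno = selection_cardinality(r, d=125)    # records per DNO value

# Candidate access paths for the condition DNO = 5.
candidates = {
    "linear search": cost_linear(b),
    "secondary index on DNO (x = 2)": cost_secondary_equality(2, s_dno),
}
best = min(candidates, key=candidates.get)
```

Under these assumptions the secondary index costs 82 block accesses against 2000 for the linear search, so the optimizer would pick the index.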

43
Query Optimization Using Selectivity and Cost
Function

Examples of Cost Functions for JOIN
Join selectivity (js)
$js = |R \bowtie_C S| / |R \times S| = |R \bowtie_C S| / (|R| \times |S|)$
If condition C does not exist, $js = 1$
If no tuples from the relations satisfy condition
C, $js = 0$
Usually, $0 \le js \le 1$
Size of the result file after join operation:
$|R \bowtie_C S| = js \times |R| \times |S|$

44
Query Optimization Using Selectivity and Cost
Function

Examples of Cost Functions for JOIN (contd.)
J1. Nested-loop join
$C_{J1} = b_R + (b_R \times b_S) + ((js \times |R| \times |S|) / bfr_{RS})$
(Use R for outer loop)
J2. Single-loop join (using an access structure
to retrieve the matching record(s))
If an index exists for the join attribute B of S
with index levels $x_B$, we can retrieve each record
s in R and then use the index to retrieve all the
matching records t from S that satisfy t[B] =
s[A].
The cost depends on the type of index.

45
Query Optimization Using Selectivity and Cost
Function

Examples of Cost Functions for JOIN (contd.)
J2. Single-loop join (contd.)
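For a secondary index,
$C_{J2a} = b_R + (|R| \times (x_B + s_B)) + ((js \times |R| \times |S|) / bfr_{RS})$
For a clustering index,
$C_{J2b} = b_R + (|R| \times (x_B + (s_B / bfr_B))) + ((js \times |R| \times |S|) / bfr_{RS})$
For a primary index,
$C_{J2c} = b_R + (|R| \times (x_B + 1)) + ((js \times |R| \times |S|) / bfr_{RS})$
If a hash key exists for one of the two join
attributes (B of S),
$C_{J2d} = b_R + (|R| \times h) + ((js \times |R| \times |S|) / bfr_{RS})$
J3. Sort-merge join
$C_{J3a} = C_S + b_R + b_S + ((js \times |R| \times |S|) / bfr_{RS})$
($C_S$ = cost for sorting the files)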
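
The JOIN cost functions can be compared on a small example. This is a minimal sketch: the function names and all catalog values (file sizes, join selectivity 1/125, result blocking factor 4, a one-level primary index on DNUMBER) are hypothetical assumptions. `Fraction` keeps the selectivity arithmetic exact.

```python
from fractions import Fraction


def result_blocks(js, card_r, card_s, bfr_rs):
    """Blocks needed to write the result: (js * |R| * |S|) / bfrRS."""
    return (js * card_r * card_s) / bfr_rs


def cost_nested_loop(b_r, b_s, write):
    """CJ1, with R as the outer file."""
    return b_r + b_r * b_s + write


def cost_single_loop(b_r, card_r, per_probe, write):
    """CJ2: per_probe is xB + sB (secondary index), xB + sB/bfrB
    (clustering index), xB + 1 (primary index), or h (hash key)."""
    return b_r + card_r * per_probe + write


def cost_sort_merge(c_sort, b_r, b_s, write):
    """CJ3a: sorting cost plus one scan of each file."""
    return c_sort + b_r + b_s + write


# Hypothetical catalog values.
E_CARD, E_BLOCKS = 10000, 2000      # EMPLOYEE
D_CARD, D_BLOCKS = 125, 13          # DEPARTMENT
JS = Fraction(1, 125)               # each employee matches one department
BFR_ED = 4                          # blocking factor of the result file

write = result_blocks(JS, E_CARD, D_CARD, BFR_ED)

plans = {
    "nested loop, EMPLOYEE outer": cost_nested_loop(E_BLOCKS, D_BLOCKS, write),
    "nested loop, DEPARTMENT outer": cost_nested_loop(D_BLOCKS, E_BLOCKS, write),
    "single loop, primary index on DNUMBER (xB = 1)": cost_single_loop(
        E_BLOCKS, E_CARD, 1 + 1, write
    ),
}
cheapest = min(plans, key=plans.get)
```

The result-writing term is the same for every plan, so the ranking depends only on how the two input files are read.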

46
Query Optimization Using Selectivity and Cost
Function

Multiple Relation Queries and Join Ordering
A query joining n relations will have n-1 join
operations, and hence can have a large number of
different join orders when we apply the algebraic
transformation rules.
Current query optimizers typically limit the
structure of a (join) query tree to that of
left-deep (or right-deep) trees.
Left-deep tree
A binary tree where the right child of each
non-leaf node is always a base relation.
Amenable to pipelining
Could utilize any access paths on the base
relation (the right child) when executing the
join.

47
Query Optimization Using Selectivity and Cost
Function

Example

48
Cost-Function Based Query Optimization

Example
SELECT pnumber, dnum, lname, address, bdate
FROM Project, Department, Employee
WHERE dnum = dnumber AND mgrssn = ssn AND plocation
= 'Dhahran'
Join order selection
$Project \bowtie Department \bowtie Employee$
$Department \bowtie Project \bowtie Employee$
$Department \bowtie Employee \bowtie Project$
$Employee \bowtie Department \bowtie Project$

49
Cost-Function Based Query Optimization
50
Traditional Query optimizers

Three design components of commercial query
optimizers
Execution space
Normally represented as an annotated query tree
All trees that compute the query are considered
legal plans for the query
A finite subset of the infinite space of legal
plans is the search space
Cost model
It assigns an integer cost to a plan based on
some assumptions about the statistical
distribution of data and the abstract machine
Search algorithm
Algorithm used to search the search space for the
plan with minimum cost. For example, search
algorithm for System R is dynamic programming

51
A Model for Parallel Execution

Two possible sources of parallelism
Inter-operator parallelism
Data independent like sub-trees and data
dependent like pipelined
Intra-operator parallelism
Like sort
Two deterrents of parallelism
Data dependencies between operators
Resource contention
Several operators running in parallel and
competing for the same resources
Operator tree
Operator tree has nodes that are atomic, i.e.,
run-time scheduler cannot subdivide the operation
further

52
Dynamic Programming Algorithm

DP Algorithm for left-Deep join trees
Input: An SPJ query q on relations $R_1, R_2, \dots, R_n$
Output: A query plan $P_{opt}$ for q with optimal work
for i := 1 to n do
    optPlan($\{R_i\}$) := accessPlan($R_i$)
for i := 2 to n do
    for all $S \subseteq \{R_1, R_2, \dots, R_n\}$ s.t. $|S| = i$ do
        bestPlan := dummy plan with infinite cost
        for all $R_j$, $S_j$ s.t. $S = \{R_j\} \cup S_j$ do
            p := joinPlan(optPlan($S_j$), $R_j$)
            if $p \le_{work}$ bestPlan then bestPlan := p
        optPlan(S) := bestPlan
$P_{opt}$ := optPlan($\{R_1, R_2, \dots, R_n\}$)
Principle of optimality An optimal plan for a
set of relations is an extension of an optimal
plan for some subset of the set
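
The algorithm can be made concrete with a minimal sketch under a toy cost model, which is an assumption and not the cost model of any real optimizer: the work of a plan is the total number of tuples in its intermediate results, access plans cost nothing, and a pair of relations without a join condition has selectivity 1 (a cross product). The function name, cardinalities, and selectivities are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def best_left_deep_plan(cards, sel):
    """Dynamic programming over subsets of relations, left-deep plans only.

    cards -- dict: relation name -> cardinality
    sel   -- dict: frozenset({a, b}) -> join selectivity (default 1)
    Returns (work, result_cardinality, join_order).
    """
    names = sorted(cards)
    # optPlan({Ri}) = accessPlan(Ri); access work is taken as 0 here.
    opt = {frozenset([r]): (0, cards[r], (r,)) for r in names}
    for i in range(2, len(names) + 1):
        for subset in combinations(names, i):
            s = frozenset(subset)
            best = (float("inf"), 0, ())      # dummy plan, infinite cost
            for rj in subset:                 # S = {Rj} union Sj
                sj = s - {rj}
                work, card, order = opt[sj]
                factor = Fraction(1)
                for other in sj:
                    factor *= sel.get(frozenset((other, rj)), 1)
                out_card = card * cards[rj] * factor
                p = (work + out_card, out_card, order + (rj,))
                if p[0] < best[0]:
                    best = p
            opt[s] = best
    return opt[frozenset(names)]


# Hypothetical statistics: PROJECT is already restricted by plocation.
cards = {"PROJECT": 20, "DEPARTMENT": 50, "EMPLOYEE": 5000}
sel = {
    frozenset(("PROJECT", "DEPARTMENT")): Fraction(1, 50),     # dnum = dnumber
    frozenset(("DEPARTMENT", "EMPLOYEE")): Fraction(1, 5000),  # mgrssn = ssn
}

work, card, order = best_left_deep_plan(cards, sel)
```

With these numbers the cheapest plans join PROJECT and DEPARTMENT first and EMPLOYEE last. The table `opt` holds one best plan per subset, which is where the principle of optimality is used.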

53
Failure of Dynamic Programming

Fundamental assumptions in DP
Principle of optimality
$p_1 \le_{cost} p_2 \Rightarrow (\forall i)\ joinplan(p_1, R_i) \le_{cost} joinplan(p_2, R_i)$
Total order
$(\forall p_1, p_2)\ \neg(p_1 \le_{cost} p_2) \Rightarrow (p_2 \le_{cost} p_1)$
DP and response time
Any reasonable cost model for response time will
violate the principle of optimality
Generalization of DP for partial orders

54
Overview of Query Optimization in Oracle

Rule-based query optimization: the optimizer
chooses execution plans based on heuristically
ranked operations.
May be phased out
Cost-based query optimization: the optimizer
examines alternative access paths and operator
algorithms and chooses the execution plan with
the lowest estimated cost.
The query cost is calculated based on the
estimated usage of resources such as I/O, CPU, and
memory needed.
Application developers can specify hints to the
ORACLE query optimizer.
The application developer might know more
about the data than the optimizer does.
SELECT /*+ ...hint... */ rest of query
SELECT /*+ index(t1 t1_abc) index(t2 t2_abc) */
COUNT(*) FROM t1, t2 WHERE t1.col1 = t2.col1

55
Query Processing and Optimization in Oracle RDB

Applications using
Embedded SQL
SQL module language
Dynamic SQL
Interactive SQL
Distributed and heterogeneous query processing
DEC DB Integrator (also known as Access Works)
Sub-queries can be submitted to other DBMSs (DB2,
Sybase)
DEC DB Integrator has its own query optimizer

56
Query Processing and Optimization in Oracle RDB

Oracle query processor designed for wide range of
queries from simple to very complex
Queries with up to 128 joins can be evaluated
Query processor consists of two parts
Query compiler
Query executor
Each part has its own optimizer
Query transformation
Takes place in the SQL front-end
SQL text to polish notation to linked structure
of internal blocks
Other transformations take place during the other
stages like code generation

57
Query Processing and Optimization in Oracle RDB

Execution plan selection
To determine a join order
Exhaustive search
Left-deep tree of two-way joins for each
sub-query
The least expensive join order for each sub-query
starting from innermost
Execution plan selection based on the cost model
Join strategies
Supports nested-loop join
Two-way merge join works on two sorted streams

58
Query Processing and Optimization in Oracle RDB

Single table scan
Sequential scan
Scan of one or more indexes
Direct access of a row by the row identifier
(RID)
Dynamic optimization
Advanced features
BLOBs
Triggers
NOT NULL and UNIQUE constraints are not evaluated
when a row is deleted from a table

59
Semantic Query Optimization

Semantic query optimization uses constraints
specified in the database schema
The technique may be used in combination with
other techniques
Example
SELECT E.LNAME, M.LNAME
FROM EMPLOYEE AS E, EMPLOYEE AS M
WHERE E.SUPERSSN = M.SSN AND E.SALARY > M.SALARY
Constraint --- no employee can earn more than his
manager
No need to execute the query
Searching through the constraints to find out
which one is applicable !!!

60
References

Required
Textbook, Chapter 15
Mostafa Elhemali et al., "Execution Strategies for
SQL Subqueries," SIGMOD '07, June 12-14, 2007,
Beijing, China, pp. 993-1004
Recommended
Amol Deshpande et al., "Adaptive Query Processing:
Why, How, When, What Next," SIGMOD 2006, pp.
806-807
Prasan Roy, S. Seshadri, S. Sudarshan, and S.
Bhobe, "Efficient and Extensible Algorithms for
Multi Query Optimization," ACM SIGMOD Record,
Volume 29, Number 2, June 2000, pp. 249-260
Gennady Antoshenkov and Mohamed Ziauddin, "Query
Processing and Optimization in Oracle RDB," The
VLDB Journal, Springer-Verlag, (1996) 5: 229-237

61
Summary

Background review
Processing a query
SQL queries and relational algebra
Implementing basic query operations
Heuristics-based query optimization
Cost-function-based query optimization
Query Processing and Optimization in Oracle RDB
