Title: CS263 Lecture 19 Query Optimisation
1CS263Lecture 19 Query Optimisation
2LECTURE PLAN
- Motivation for Query Optimisation
- Phases of Query Processing
- Query Trees
- RA Transformation Rules
- Heuristic Processing Strategies
- Cost Estimation for RA Operations
3Motivation for Query Optimisation
- List all the managers that work in the sales
department. - SELECT
- FROM emp, dept
- WHERE emp.deptno dept.deptno
- AND emp.job Manager
- AND dept.name Sales
There are at least three alternative ways of
representing this query as a Relational Algebra
expression.
4Motivation for Query Optimisation
Metrics 1000 tuples in the EMP relation 50
tuples in the DEPT relation 50 employees are
Managers (one per department) 5 separate Sales
departments (across the country)
Cost of processing the following query alternate
5Motivation for Query Optimisation
Metrics 1000 tuples in the EMP relation 50
tuples in the DEPT relation 50 employees are
Managers (one per department) 5 separate Sales
departments (across the country)
Cost of processing the following query alternate
6Motivation for Query Optimisation
Cost of processing the following query
7Phases of Query Processing
8Query Processing Stage - 1
- Cast the query into internal form
- This involves the conversion of the original
(SQL) query into some internal representation
more suitable for machine manipulation. - The internal representation typically chosen is
either some kind of abstract syntax tree, or a
relational algebra query tree.
9Relational Algebra Query Trees
- A Relational Algebra query can be represented as
a query tree. For example the query to list all
the managers that work in the sales department
could be described as one of the following
Root
Intermediate operations
Leaves
10Relational Algebra Query Trees
- A Relational Algebra query can be represented as
a query tree. For example the query to list all
the managers that work in the sales department
could be described as one of the following
Root
Intermediate operations
Leaves
11Relational Algebra Query Trees
Alternativequery tree for the query to list all
the managers that work in the sales department
?(job Manager) Ù (nameSales) (EMP
emp.deptno dept.deptno DEPT)
12Relational Algebra Query Trees
- Alternativequery tree for the query to list all
the managers that work in the sales department
13Query Processing Stage - 2
- Convert to canonical form
- Find a more efficient representation of the
query by converting the internal representation
into some equivalent (canonical) form through the
application of a set of well-defined
transformation rules. - The set of transformation rules to apply will
generally be the result of the application of
specific heuristic processing strategies
associated with particular DBMSs.
14Transformation Rules for RA Operations
- Conjunctive selection operations can cascade into
individual selection operations (and vice versa). - Sometimes referred to as cascade of selection.
- spÙqÙr(R) sp(sq(sr(R)))
-
- Example
- sdeptno10 Ù salgt1000(Emp)
sdeptno10(ssalgt1000(Emp))
15Transformation Rules for RA Operations
- Commutativity of selection
-
- sp(sq(R)) sq(sp(R))
-
- Example
- ssalgt1000(sdeptno10(Emp))
sdeptno10(ssalgt1000(Emp))
16Transformation Rules for RA Operations
- In a sequence of projection operations, only the
last in the sequence is required. - PLPM PN(R) PL (R)
-
- Example
- PdeptnoPname(Dept) Pdeptno (Dept))
17Transformation Rules for RA Operations
- Commutativity of selection and projection.
- PAi, , Am(sp(R)) sp(PAi, , Am(R))
- where p ÎA1, A2, , Am
-
- Example
- Pname, job(snameSmith(Emp))
snameSmith'(Pname, job(Staff))
Selection predicate (p) is only made up of
projected attributes
18Transformation Rules for RA Operations
- Commutativity of theta-join (and Cartesian
product). - R p S S p R
NOTE Theta-join is a generalisation of both the
equi-join and natural-join
R X S S X R
Example
EMP emp.deptno dept.deptno DEPT
DEPT emp.deptno dept.deptno EMP
19Transformation Rules for RA Operations
- Commutativity of selection and theta-join
- (or Cartesian product).
Selection predicate (p) is only made up of join
attributes
Example
(semp.deptno10 (EMP)) emp.deptno
dept.deptno DEPT
semp.deptno10 (EMP emp.deptno
dept.deptno DEPT)
20Transformation Rules for RA Operations
- Commutativity of projection and theta-join
- (or Cartesian product).
PL(R r S) (PL1(R)) r (PL2(S))
Project attributes L L1 ? L2, where L1 are
attributes of R, and L2 are attributes of S. L
will also contain the join attributes
Example
P job, location, deptno (EMP emp.deptno
dept.deptno DEPT)
(P job, deptno (EMP)) emp.deptno
dept.deptno (P location, deptno (DEPT))
21Transformation Rules for RA Operations
- Commutativity of union and intersection
- (but not set difference).
- R È S S È R
- R Ç S S Ç R
-
22Transformation Rules for RA Operations
- Commutativity of selection and set operations
(union, intersection, and set difference). -
- Union
- sp(R È S) sp(S) È sp(R)
- Intersection
- sp(R Ç S) sp(S) Ç sp(R)
-
- Set Difference
- sp(R - S) sp(S) - sp(R)
-
23Transformation Rules for RA Operations
- 10 Commutativity of projection and union
-
- PL(R È S) PL(S) È PL(R)
-
24Transformation Rules for RA Operations
- 11 Associativity of natural join (and Cartesian
product) -
- Natural Join
- (R S) T R (S T)
- Cartesian Product
- (R X S) X T R X (S X T)
-
-
25Transformation Rules for RA Operations
- 12 Associativity of union and intersection (but
not set difference) -
- Union
- (R È S) È T S È (R È T)
- Intersection
- (R Ç S) Ç T S Ç (R Ç T)
-
-
26Heuristic Processing Strategies
- Perform selection operations as early as possible
- Translate a Cartesian product and subsequent
selection (whose predicate represents a join
condition) into a join operation. - Use associativity of binary operations to ensure
that the most restrictive selection operations
are executed first - Perform projections as early as possible.
- Compute common expressions once
27Heuristic Processing - Example
28Query Processing Stage - 3
- Choose candidate low-level procedures
- Consider the (optimised canonical) query as a
series of low-level operations (join, restrict,
etc). - For each of these operations generate alternative
execution strategies and calculate the cost of
such strategies on the basis of statistical
information held about the database tables
(files).
29Query Processing Stage - 4
- Generate query plans and choose the cheapest
- Construct a set of candidate Query Execution
Plans (QEPs). - Each QEP is constructed by selecting a candidate
implementation procedure for each operation in
the canonical query and then combining them to
form a string of associated operations. - Each QEP will have an (estimated) cost associated
with it the sum of the cost of each of its
operations. - Choose the QEP with the least cost.
30Cost Based Optimisation
- Cost Based Optimisation (stages 3 4)
- A good declarative query optimiser does not rely
solely on heuristic processing strategies. - It chooses the QEP with the lowest estimated
cost. - After heuristic rules are applied to a query,
there still remains a number of alternative ways
to execute it . - The Query Optimiser estimates the cost of
executing each one (or at least a number) of
these alternatives, and selects the cheapest one.
31Costs associated with query execution
- Secondary storage access costs
- Searching for data blocks on disk,
- Reading data blocks from disk
- Writing data block to disk
- Storage costs
- Cost of storing intermediate (temp) files
- Computation costs
- Cost of CPU usage
- Main memory usage costs
- Cost of buffering data
- Communication costs
- Cost of moving data across
32Database statistics used in cost estimation
- Information held on each relation
- number of tuples
- number of blocks
- blocking factor
- primary access method
- primary access attributes
- secondary indexes
- secondary indexing attributes
- number of levels for each index
- number of distinct values of each attribute
33Physical Data Structures File Types
- Heap (Sequential, Unordered)
- no key columns
- queries, other than appends, scan every page
- rows are appended at the end
- duplicate rows are allowed
- Ordered
- physically sorted data file with no index
- Hash (Random, Direct)
- data is located based on the (calculated) value
of a hash field (key) - Indexed Sequential (ISAM)
- sorted data file with a primary index
- BTree
- dynamic multilevel index
- reuses deleted space on associated data pages
34Strategies for implementing the RESTRICT operation
- Different access strategies dependant upon the
structure of the file in which the relation is
stored, and whether the predicate attribute(s)
have been indexed/hashed Each uses a different
cost algorithm (which refers to specific database
statistics). - Linear Search (Heap)
- Binary Search (Ordered)
- Equality on Hash Key
- Equality condition on primary key
- Inequality condition on primary key
- Equality condition on secondary index
- Inequality condition on secondary BTree index
- If the selection predicate is a composite (AND
OR) then there are additional cost
considerations!
35Strategies for implementing the JOIN operation
- Different access strategies dependant upon the
structure of the files in which the relations to
be joined are stored, and whether the join
attributes have been indexed/hashed Each uses
its own cost algorithm (which refers to specific
database statistics). - Block nested loop join
- Indexed nested loop join
- Sort-merge join
- Hash join
36Query Optimisation Summary
- The aims of query processing are to transform a
query written in a high-level language (SQL),
into a correct and efficient execution strategy
expressed in a low-level language (Relational
Algebra), and to execute the strategy to retrieve
the required data. - There are many equivalent transformations of the
same high-level query, the DBMS has to choose the
one that minimises resource usage. - There are two main techniques for query
optimisation. The first uses heuristic rules that
order the operations in a query. The second
compares different execution strategies for those
operations, based on their relative costs, and
selects the least resource intensive (cheapest)
ones.