CS263 Lecture 19 Query Optimisation - PowerPoint PPT Presentation

1 / 36

About This Presentation

Title:

CS263 Lecture 19 Query Optimisation

Description:

CS263. Lecture 19. Query Optimisation. Motivation for Query ... sdeptno=10 sal 1000(Emp) = sdeptno=10(ssal 1000(Emp)) Transformation Rules for RA Operations ... – PowerPoint PPT presentation

Number of Views:93

Avg rating:3.0/5.0

Slides: 37

Provided by: RaghuRama78

Category:

more less

Transcript and Presenter's Notes

Title: CS263 Lecture 19 Query Optimisation

1
CS263Lecture 19 Query Optimisation
2
LECTURE PLAN

Motivation for Query Optimisation
Phases of Query Processing
Query Trees
RA Transformation Rules
Heuristic Processing Strategies
Cost Estimation for RA Operations

3
Motivation for Query Optimisation

List all the managers that work in the sales
department.
SELECT
FROM emp, dept
WHERE emp.deptno dept.deptno
AND emp.job Manager
AND dept.name Sales

There are at least three alternative ways of
representing this query as a Relational Algebra
expression.
4
Motivation for Query Optimisation
Metrics 1000 tuples in the EMP relation 50
tuples in the DEPT relation 50 employees are
Managers (one per department) 5 separate Sales
departments (across the country)
Cost of processing the following query alternate
5
Motivation for Query Optimisation
Metrics 1000 tuples in the EMP relation 50
tuples in the DEPT relation 50 employees are
Managers (one per department) 5 separate Sales
departments (across the country)
Cost of processing the following query alternate
6
Motivation for Query Optimisation
Cost of processing the following query
7
Phases of Query Processing
8
Query Processing Stage - 1

Cast the query into internal form
This involves the conversion of the original
(SQL) query into some internal representation
more suitable for machine manipulation.
The internal representation typically chosen is
either some kind of abstract syntax tree, or a
relational algebra query tree.

9
Relational Algebra Query Trees

A Relational Algebra query can be represented as
a query tree. For example the query to list all
the managers that work in the sales department
could be described as one of the following

Root
Intermediate operations
Leaves
10
Relational Algebra Query Trees

A Relational Algebra query can be represented as
a query tree. For example the query to list all
the managers that work in the sales department
could be described as one of the following

Root
Intermediate operations
Leaves
11
Relational Algebra Query Trees
Alternativequery tree for the query to list all
the managers that work in the sales department
?(job Manager) Ù (nameSales) (EMP
emp.deptno dept.deptno DEPT)
12
Relational Algebra Query Trees

Alternativequery tree for the query to list all
the managers that work in the sales department

13
Query Processing Stage - 2

Convert to canonical form
Find a more efficient representation of the
query by converting the internal representation
into some equivalent (canonical) form through the
application of a set of well-defined
transformation rules.
The set of transformation rules to apply will
generally be the result of the application of
specific heuristic processing strategies
associated with particular DBMSs.

14
Transformation Rules for RA Operations

Conjunctive selection operations can cascade into
individual selection operations (and vice versa).
Sometimes referred to as cascade of selection.
spÙqÙr(R) sp(sq(sr(R)))
Example
sdeptno10 Ù salgt1000(Emp)
sdeptno10(ssalgt1000(Emp))

15
Transformation Rules for RA Operations

Commutativity of selection
sp(sq(R)) sq(sp(R))
Example
ssalgt1000(sdeptno10(Emp))
sdeptno10(ssalgt1000(Emp))

16
Transformation Rules for RA Operations

In a sequence of projection operations, only the
last in the sequence is required.
PLPM PN(R) PL (R)
Example
PdeptnoPname(Dept) Pdeptno (Dept))

17
Transformation Rules for RA Operations

Commutativity of selection and projection.
PAi, , Am(sp(R)) sp(PAi, , Am(R))
where p ÎA1, A2, , Am
Example
Pname, job(snameSmith(Emp))
snameSmith'(Pname, job(Staff))

Selection predicate (p) is only made up of
projected attributes
18
Transformation Rules for RA Operations

Commutativity of theta-join (and Cartesian
product).
R p S S p R

NOTE Theta-join is a generalisation of both the
equi-join and natural-join
R X S S X R
Example
EMP emp.deptno dept.deptno DEPT
DEPT emp.deptno dept.deptno EMP
19
Transformation Rules for RA Operations

Commutativity of selection and theta-join
(or Cartesian product).

Selection predicate (p) is only made up of join
attributes
Example
(semp.deptno10 (EMP)) emp.deptno
dept.deptno DEPT
semp.deptno10 (EMP emp.deptno
dept.deptno DEPT)
20
Transformation Rules for RA Operations

Commutativity of projection and theta-join
(or Cartesian product).

PL(R r S) (PL1(R)) r (PL2(S))
Project attributes L L1 ? L2, where L1 are
attributes of R, and L2 are attributes of S. L
will also contain the join attributes
Example
P job, location, deptno (EMP emp.deptno
dept.deptno DEPT)
(P job, deptno (EMP)) emp.deptno
dept.deptno (P location, deptno (DEPT))
21
Transformation Rules for RA Operations

Commutativity of union and intersection
(but not set difference).
R È S S È R
R Ç S S Ç R

22
Transformation Rules for RA Operations

Commutativity of selection and set operations
(union, intersection, and set difference).
Union
sp(R È S) sp(S) È sp(R)
Intersection
sp(R Ç S) sp(S) Ç sp(R)
Set Difference
sp(R - S) sp(S) - sp(R)

23
Transformation Rules for RA Operations

10 Commutativity of projection and union
PL(R È S) PL(S) È PL(R)

24
Transformation Rules for RA Operations

11 Associativity of natural join (and Cartesian
product)
Natural Join
(R S) T R (S T)
Cartesian Product
(R X S) X T R X (S X T)

25
Transformation Rules for RA Operations

12 Associativity of union and intersection (but
not set difference)
Union
(R È S) È T S È (R È T)
Intersection
(R Ç S) Ç T S Ç (R Ç T)

26
Heuristic Processing Strategies

Perform selection operations as early as possible
Translate a Cartesian product and subsequent
selection (whose predicate represents a join
condition) into a join operation.
Use associativity of binary operations to ensure
that the most restrictive selection operations
are executed first
Perform projections as early as possible.
Compute common expressions once

27
Heuristic Processing - Example
28
Query Processing Stage - 3

Choose candidate low-level procedures
Consider the (optimised canonical) query as a
series of low-level operations (join, restrict,
etc).
For each of these operations generate alternative
execution strategies and calculate the cost of
such strategies on the basis of statistical
information held about the database tables
(files).

29
Query Processing Stage - 4

Generate query plans and choose the cheapest
Construct a set of candidate Query Execution
Plans (QEPs).
Each QEP is constructed by selecting a candidate
implementation procedure for each operation in
the canonical query and then combining them to
form a string of associated operations.
Each QEP will have an (estimated) cost associated
with it the sum of the cost of each of its
operations.
Choose the QEP with the least cost.

30
Cost Based Optimisation

Cost Based Optimisation (stages 3 4)
A good declarative query optimiser does not rely
solely on heuristic processing strategies.
It chooses the QEP with the lowest estimated
cost.
After heuristic rules are applied to a query,
there still remains a number of alternative ways
to execute it .
The Query Optimiser estimates the cost of
executing each one (or at least a number) of
these alternatives, and selects the cheapest one.

31
Costs associated with query execution

Secondary storage access costs
Searching for data blocks on disk,
Reading data blocks from disk
Writing data block to disk
Storage costs
Cost of storing intermediate (temp) files
Computation costs
Cost of CPU usage
Main memory usage costs
Cost of buffering data
Communication costs
Cost of moving data across

32
Database statistics used in cost estimation

Information held on each relation
number of tuples
number of blocks
blocking factor
primary access method
primary access attributes
secondary indexes
secondary indexing attributes
number of levels for each index
number of distinct values of each attribute

33
Physical Data Structures File Types

Heap (Sequential, Unordered)
no key columns
queries, other than appends, scan every page
rows are appended at the end
duplicate rows are allowed
Ordered
physically sorted data file with no index
Hash (Random, Direct)
data is located based on the (calculated) value
of a hash field (key)
Indexed Sequential (ISAM)
sorted data file with a primary index
BTree
dynamic multilevel index
reuses deleted space on associated data pages

34
Strategies for implementing the RESTRICT operation

Different access strategies dependant upon the
structure of the file in which the relation is
stored, and whether the predicate attribute(s)
have been indexed/hashed Each uses a different
cost algorithm (which refers to specific database
statistics).
Linear Search (Heap)
Binary Search (Ordered)
Equality on Hash Key
Equality condition on primary key
Inequality condition on primary key
Equality condition on secondary index
Inequality condition on secondary BTree index
If the selection predicate is a composite (AND
OR) then there are additional cost
considerations!

35
Strategies for implementing the JOIN operation

Different access strategies dependant upon the
structure of the files in which the relations to
be joined are stored, and whether the join
attributes have been indexed/hashed Each uses
its own cost algorithm (which refers to specific
database statistics).
Block nested loop join
Indexed nested loop join
Sort-merge join
Hash join

36
Query Optimisation Summary

The aims of query processing are to transform a
query written in a high-level language (SQL),
into a correct and efficient execution strategy
expressed in a low-level language (Relational
Algebra), and to execute the strategy to retrieve
the required data.
There are many equivalent transformations of the
same high-level query, the DBMS has to choose the
one that minimises resource usage.
There are two main techniques for query
optimisation. The first uses heuristic rules that
order the operations in a query. The second
compares different execution strategies for those
operations, based on their relative costs, and
selects the least resource intensive (cheapest)
ones.