CS263 Lecture 19 Query Optimisation - PowerPoint PPT Presentation

1 / 36
About This Presentation
Title:

CS263 Lecture 19 Query Optimisation

Description:

CS263. Lecture 19. Query Optimisation. Motivation for Query ... sdeptno=10 sal 1000(Emp) = sdeptno=10(ssal 1000(Emp)) Transformation Rules for RA Operations ... – PowerPoint PPT presentation

Number of Views:93
Avg rating:3.0/5.0
Slides: 37
Provided by: RaghuRama78
Category:

less

Transcript and Presenter's Notes

Title: CS263 Lecture 19 Query Optimisation


1
CS263Lecture 19 Query Optimisation
2
LECTURE PLAN
  • Motivation for Query Optimisation
  • Phases of Query Processing
  • Query Trees
  • RA Transformation Rules
  • Heuristic Processing Strategies
  • Cost Estimation for RA Operations

3
Motivation for Query Optimisation
  • List all the managers that work in the sales
    department.
  • SELECT
  • FROM emp, dept
  • WHERE emp.deptno dept.deptno
  • AND emp.job Manager
  • AND dept.name Sales

There are at least three alternative ways of
representing this query as a Relational Algebra
expression.
4
Motivation for Query Optimisation
Metrics 1000 tuples in the EMP relation 50
tuples in the DEPT relation 50 employees are
Managers (one per department) 5 separate Sales
departments (across the country)
Cost of processing the following query alternate
5
Motivation for Query Optimisation
Metrics 1000 tuples in the EMP relation 50
tuples in the DEPT relation 50 employees are
Managers (one per department) 5 separate Sales
departments (across the country)
Cost of processing the following query alternate
6
Motivation for Query Optimisation
Cost of processing the following query
7
Phases of Query Processing
8
Query Processing Stage - 1
  • Cast the query into internal form
  • This involves the conversion of the original
    (SQL) query into some internal representation
    more suitable for machine manipulation.
  • The internal representation typically chosen is
    either some kind of abstract syntax tree, or a
    relational algebra query tree.

9
Relational Algebra Query Trees
  • A Relational Algebra query can be represented as
    a query tree. For example the query to list all
    the managers that work in the sales department
    could be described as one of the following

Root
Intermediate operations
Leaves
10
Relational Algebra Query Trees
  • A Relational Algebra query can be represented as
    a query tree. For example the query to list all
    the managers that work in the sales department
    could be described as one of the following

Root
Intermediate operations
Leaves
11
Relational Algebra Query Trees
Alternativequery tree for the query to list all
the managers that work in the sales department
?(job Manager) Ù (nameSales) (EMP
emp.deptno dept.deptno DEPT)
12
Relational Algebra Query Trees
  • Alternativequery tree for the query to list all
    the managers that work in the sales department

13
Query Processing Stage - 2
  • Convert to canonical form
  • Find a more efficient representation of the
    query by converting the internal representation
    into some equivalent (canonical) form through the
    application of a set of well-defined
    transformation rules.
  • The set of transformation rules to apply will
    generally be the result of the application of
    specific heuristic processing strategies
    associated with particular DBMSs.

14
Transformation Rules for RA Operations
  • Conjunctive selection operations can cascade into
    individual selection operations (and vice versa).
  • Sometimes referred to as cascade of selection.
  • spÙqÙr(R) sp(sq(sr(R)))
  • Example
  • sdeptno10 Ù salgt1000(Emp)
    sdeptno10(ssalgt1000(Emp))

15
Transformation Rules for RA Operations
  • Commutativity of selection
  • sp(sq(R)) sq(sp(R))
  • Example
  • ssalgt1000(sdeptno10(Emp))
    sdeptno10(ssalgt1000(Emp))

16
Transformation Rules for RA Operations
  • In a sequence of projection operations, only the
    last in the sequence is required.
  • PLPM PN(R) PL (R)
  • Example
  • PdeptnoPname(Dept) Pdeptno (Dept))

17
Transformation Rules for RA Operations
  • Commutativity of selection and projection.
  • PAi, , Am(sp(R)) sp(PAi, , Am(R))
  • where p ÎA1, A2, , Am
  • Example
  • Pname, job(snameSmith(Emp))
    snameSmith'(Pname, job(Staff))

Selection predicate (p) is only made up of
projected attributes
18
Transformation Rules for RA Operations
  • Commutativity of theta-join (and Cartesian
    product).
  • R p S S p R

NOTE Theta-join is a generalisation of both the
equi-join and natural-join
R X S S X R
Example
EMP emp.deptno dept.deptno DEPT
DEPT emp.deptno dept.deptno EMP
19
Transformation Rules for RA Operations
  • Commutativity of selection and theta-join
  • (or Cartesian product).

Selection predicate (p) is only made up of join
attributes
Example
(semp.deptno10 (EMP)) emp.deptno
dept.deptno DEPT
semp.deptno10 (EMP emp.deptno
dept.deptno DEPT)
20
Transformation Rules for RA Operations
  • Commutativity of projection and theta-join
  • (or Cartesian product).

PL(R r S) (PL1(R)) r (PL2(S))
Project attributes L L1 ? L2, where L1 are
attributes of R, and L2 are attributes of S. L
will also contain the join attributes
Example
P job, location, deptno (EMP emp.deptno
dept.deptno DEPT)
(P job, deptno (EMP)) emp.deptno
dept.deptno (P location, deptno (DEPT))
21
Transformation Rules for RA Operations
  • Commutativity of union and intersection
  • (but not set difference).
  • R È S S È R
  • R Ç S S Ç R

22
Transformation Rules for RA Operations
  • Commutativity of selection and set operations
    (union, intersection, and set difference).
  • Union
  • sp(R È S) sp(S) È sp(R)
  • Intersection
  • sp(R Ç S) sp(S) Ç sp(R)
  • Set Difference
  • sp(R - S) sp(S) - sp(R)

23
Transformation Rules for RA Operations
  • 10 Commutativity of projection and union
  • PL(R È S) PL(S) È PL(R)

24
Transformation Rules for RA Operations
  • 11 Associativity of natural join (and Cartesian
    product)
  • Natural Join
  • (R S) T R (S T)
  • Cartesian Product
  • (R X S) X T R X (S X T)

25
Transformation Rules for RA Operations
  • 12 Associativity of union and intersection (but
    not set difference)
  • Union
  • (R È S) È T S È (R È T)
  • Intersection
  • (R Ç S) Ç T S Ç (R Ç T)

26
Heuristic Processing Strategies
  • Perform selection operations as early as possible
  • Translate a Cartesian product and subsequent
    selection (whose predicate represents a join
    condition) into a join operation.
  • Use associativity of binary operations to ensure
    that the most restrictive selection operations
    are executed first
  • Perform projections as early as possible.
  • Compute common expressions once

27
Heuristic Processing - Example
28
Query Processing Stage - 3
  • Choose candidate low-level procedures
  • Consider the (optimised canonical) query as a
    series of low-level operations (join, restrict,
    etc).
  • For each of these operations generate alternative
    execution strategies and calculate the cost of
    such strategies on the basis of statistical
    information held about the database tables
    (files).

29
Query Processing Stage - 4
  • Generate query plans and choose the cheapest
  • Construct a set of candidate Query Execution
    Plans (QEPs).
  • Each QEP is constructed by selecting a candidate
    implementation procedure for each operation in
    the canonical query and then combining them to
    form a string of associated operations.
  • Each QEP will have an (estimated) cost associated
    with it the sum of the cost of each of its
    operations.
  • Choose the QEP with the least cost.

30
Cost Based Optimisation
  • Cost Based Optimisation (stages 3 4)
  • A good declarative query optimiser does not rely
    solely on heuristic processing strategies.
  • It chooses the QEP with the lowest estimated
    cost.
  • After heuristic rules are applied to a query,
    there still remains a number of alternative ways
    to execute it .
  • The Query Optimiser estimates the cost of
    executing each one (or at least a number) of
    these alternatives, and selects the cheapest one.

31
Costs associated with query execution
  • Secondary storage access costs
  • Searching for data blocks on disk,
  • Reading data blocks from disk
  • Writing data block to disk
  • Storage costs
  • Cost of storing intermediate (temp) files
  • Computation costs
  • Cost of CPU usage
  • Main memory usage costs
  • Cost of buffering data
  • Communication costs
  • Cost of moving data across

32
Database statistics used in cost estimation
  • Information held on each relation
  • number of tuples
  • number of blocks
  • blocking factor
  • primary access method
  • primary access attributes
  • secondary indexes
  • secondary indexing attributes
  • number of levels for each index
  • number of distinct values of each attribute

33
Physical Data Structures File Types
  • Heap (Sequential, Unordered)
  • no key columns
  • queries, other than appends, scan every page
  • rows are appended at the end
  • duplicate rows are allowed
  • Ordered
  • physically sorted data file with no index
  • Hash (Random, Direct)
  • data is located based on the (calculated) value
    of a hash field (key)
  • Indexed Sequential (ISAM)
  • sorted data file with a primary index
  • BTree
  • dynamic multilevel index
  • reuses deleted space on associated data pages

34
Strategies for implementing the RESTRICT operation
  • Different access strategies dependant upon the
    structure of the file in which the relation is
    stored, and whether the predicate attribute(s)
    have been indexed/hashed Each uses a different
    cost algorithm (which refers to specific database
    statistics).
  • Linear Search (Heap)
  • Binary Search (Ordered)
  • Equality on Hash Key
  • Equality condition on primary key
  • Inequality condition on primary key
  • Equality condition on secondary index
  • Inequality condition on secondary BTree index
  • If the selection predicate is a composite (AND
    OR) then there are additional cost
    considerations!

35
Strategies for implementing the JOIN operation
  • Different access strategies dependant upon the
    structure of the files in which the relations to
    be joined are stored, and whether the join
    attributes have been indexed/hashed Each uses
    its own cost algorithm (which refers to specific
    database statistics).
  • Block nested loop join
  • Indexed nested loop join
  • Sort-merge join
  • Hash join

36
Query Optimisation Summary
  • The aims of query processing are to transform a
    query written in a high-level language (SQL),
    into a correct and efficient execution strategy
    expressed in a low-level language (Relational
    Algebra), and to execute the strategy to retrieve
    the required data.
  • There are many equivalent transformations of the
    same high-level query, the DBMS has to choose the
    one that minimises resource usage.
  • There are two main techniques for query
    optimisation. The first uses heuristic rules that
    order the operations in a query. The second
    compares different execution strategies for those
    operations, based on their relative costs, and
    selects the least resource intensive (cheapest)
    ones.
Write a Comment
User Comments (0)
About PowerShow.com