Title: Query Optimization
1. Query Optimization

2. General Overview
- Relational model, SQL
- Formal and commercial query languages
- Functional Dependencies
- Normalization
- Physical Design
- Indexing
- Query Processing and Optimization
3. Review: Query Processing and Optimization
SQL Query → Parser → Algebraic Expression → Query Optimizer → Execution Plan → Evaluator → Data (result of the query)
(The Parser, Query Optimizer, and Evaluator together form the Query Processor.)
4. Review: Query Processing and Optimization
Inside the Query Optimizer:
Algebraic Representation → Query Rewriter → Algebraic Representation → Plan Generator → Query Execution Plan
(The Plan Generator also consults data statistics.)
5. Review: Plan Generation
- Metadata: the DBMS maintains statistics about each relation, attribute, and index.
- Plan generation:
  - Generate many alternative plans
  - We saw many for selections and joins
  - Estimate the cost of each and choose the best
- Plans examined:
  - Selection: linear scan, binary search, primary index (PI), secondary index (SI)
  - Range: PI, SI
  - Joins: NLJ, BNLJ, INLJ, SMJ, HJ
6. Review: Plan Generation
- Depends upon a cost model
- For any query, we must know:
  - its estimated cardinality
  - its estimated cost (in number of I/Os)
- E.g. σ_{A=K}(R):
  - cardinality: SC(A, R)
  - cost depends on the plan and the attribute (key vs. non-key):
    - Linear scan: bR/2 (key), bR (non-key)
    - Binary search: ⌈log2(bR)⌉ (key), ⌈log2(bR)⌉ + ⌈SC(A,R)/fR⌉ − 1 (non-key)
    - PI scan: HTi + 1 (key), HTi + ⌈SC(A,R)/fR⌉ (non-key)
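The cost formulas above can be sketched as a small calculator. This is illustrative only; the function name and parameters are assumptions, though the variables follow the slide's notation (bR, SC(A,R), fR, HTi):

```python
from math import ceil, log2

def selection_costs(b_r, sc, f_r, ht):
    """Estimated I/O cost of sigma_{A=K}(R) under each access plan.

    b_r : number of blocks of R
    sc  : SC(A, R), number of tuples of R satisfying A = K
    f_r : blocking factor of R (tuples per block)
    ht  : HTi, height of the index on A
    """
    return {
        # average case when A is a key: stop at the first match
        "linear scan (key)": b_r / 2,
        "linear scan (non-key)": b_r,
        # file sorted on A: locate the first match, then read the
        # remaining blocks holding qualifying tuples
        "binary search (key)": ceil(log2(b_r)),
        "binary search (non-key)": ceil(log2(b_r)) + ceil(sc / f_r) - 1,
        # primary index: traverse the index, then fetch contiguous blocks
        "primary index (key)": ht + 1,
        "primary index (non-key)": ht + ceil(sc / f_r),
    }

costs = selection_costs(b_r=400, sc=40, f_r=20, ht=3)
```

For instance, with 400 blocks, 40 matching tuples, 20 tuples per block, and an index of height 3, a non-key binary search costs 9 + 2 − 1 = 10 I/Os.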
7. Hybrid Merge Join
- Hybrid merge join: applicable if one relation is sorted and the other has a secondary B-tree index on the join attribute.
- Merge the sorted relation with the leaf entries of the B-tree.
- Sort the result on the addresses of the unsorted relation's tuples.
- Scan the unsorted relation in physical-address order and merge it with the previous result, replacing addresses by the actual tuples.
- A sequential scan is more efficient than random lookups.
8. Hash Joins
- Applicable only to natural joins and equijoins.
- Depends upon a hash function h, used to partition both relations.
- h must map values of the join attributes to {0, ..., n−1}, yielding n partitions.
9. Hash-Join Algorithm
Algorithm: Hash Join
- 1. Partition the relation S using hash function h so that each partition si fits in memory. Use one block of memory as the output buffer for each partition (requires at least n blocks).
- 2. Partition R using h.
- 3. For each partition i in {0, ..., n−1}: use BNLJ to compute the join between Ri and Si, i.e. Ri ⋈ Si (optimal, since si fits in memory as the inner relation).
S is called the build input and R is called the probe input.
Note: CPU cost can be reduced by building an in-memory hash index for each si, using a different hash function than h.
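The three steps above can be sketched in Python. This is an in-memory stand-in (real partitions live in disk blocks); the parameter names and n = 8 are illustrative choices:

```python
from collections import defaultdict

def hash_join(r, s, key_r, key_s, n=8):
    """Sketch of the hash-join algorithm over lists of dicts.
    S is the build input, R is the probe input."""
    h = lambda v: hash(v) % n                 # partitioning hash function h

    # 1. Partition the build input S into s_0 ... s_{n-1}
    s_parts = defaultdict(list)
    for tup in s:
        s_parts[h(tup[key_s])].append(tup)

    # 2. Partition the probe input R with the same h
    r_parts = defaultdict(list)
    for tup in r:
        r_parts[h(tup[key_r])].append(tup)

    # 3. Join each partition pair; a dict keyed on the join value plays
    #    the role of the in-memory index on s_i (a "different hash function")
    out = []
    for i in range(n):
        index = defaultdict(list)
        for tup in s_parts[i]:
            index[tup[key_s]].append(tup)
        for tup in r_parts[i]:
            for match in index[tup[key_r]]:
                out.append({**match, **tup})
    return out

r = [{"k": 1, "x": "r1"}, {"k": 2, "x": "r2"}]
s = [{"k": 1, "y": "s1"}, {"k": 3, "y": "s3"}]
result = hash_join(r, s, key_r="k", key_s="k")
```

Because both relations are partitioned with the same h, matching tuples always land in partitions with the same index i, so each partition pair can be joined independently.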
10. Hash Join
- Partitioning: must choose
  - the number of partitions, n
  - the hash function, h (mapping each tuple to {0, ..., n−1})
- Goals (in order of importance):
  - 1. Each partition of the build relation should fit in memory (⇒ h should be uniform and n large).
  - 2. For the partitioning step, one output block per partition should fit in memory (⇒ n should be small, < M−1).
- Strategy:
  - Ensure goal 1.
  - Deal with violations of goal 2 when needed.
11. Hash Join
- Goal 1: partitions of the build relation should fit in memory.
(Diagram: the n partitions of S, numbered 1 ... n, laid out against a memory of M blocks.)
- How large should n be? Each partition occupies about bS/n blocks, and 2 blocks are reserved (one for the current R partition, one as BNLJ output), so we need bS/n ≤ M − 2, i.e. n ≥ ⌈bS / (M−2)⌉.
- In practice, n is chosen a little larger (fudge factor ≈ 1.2), since not all of memory is available for the partition joins.
12. Hash Join
- Goal 2: keep n < M.
- What if that is not possible? Recursive partitioning!
- Idea:
  - Iteration 1: partition S into M−1 partitions using h1.
  - Iteration 2: partition each partition of S into M−1 partitions using a different hash function h2.
  - ...
  - Repeat until each partition of S fits in memory.
13. Cost of Hash-Join
- Case 1: no recursive partitioning
  - 1. Partition S: bS reads and bS + n writes.
    - Why the extra n? The last block of each partition may be only partially filled.
  - 2. Partition R: bR reads and bR + n writes.
  - 3. n partition joins: bR + bS + 2n reads.
  - Total: 3(bR + bS) + 4n.
- Typically n is small enough (roughly bS / (M−2)) that the 4n term can be ignored.
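The case-1 accounting can be written out directly. A minimal sketch, with illustrative argument names:

```python
def hash_join_cost(b_r, b_s, n):
    """Case-1 (no recursive partitioning) I/O cost of hash join.
    b_r, b_s: blocks of the probe and build inputs; n: number of partitions."""
    partition_s = b_s + (b_s + n)    # b_s reads, b_s + n writes
    partition_r = b_r + (b_r + n)    # b_r reads, b_r + n writes
    joins = b_r + b_s + 2 * n        # read all partitions of R and S back
    return partition_s + partition_r + joins   # = 3(b_r + b_s) + 4n

cost = hash_join_cost(b_r=400, b_s=100, n=5)   # → 1520 I/Os
```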
14. Cost of Hash-Join
- Case 2: recursive partitioning
- Recall: the build relation is partitioned M−1 ways each time.
- So the total number of iterations is
  ⌈log_{M−1}(n)⌉ = ⌈log_{M−1}(bS / (M−2))⌉ ≈ ⌈log_{M−1}(bS / (M−1))⌉ = ⌈log_{M−1}(bS) − 1⌉
- Cost:
  - 1. Partition S: 2 bS ⌈log_{M−1}(bS) − 1⌉
  - 2. Partition R: 2 bR ⌈log_{M−1}(bS) − 1⌉
  - 3. n partition joins: bR + bS
- Total cost estimate:
  2(bR + bS) ⌈log_{M−1}(bS) − 1⌉ + bR + bS
15. Example of Cost of Hash-Join
customer ⋈ depositor
- Assume that memory size is M = 3 blocks,
- bdepositor = 100 and bcustomer = 400,
- depositor is used as the build input.
- Do the partitions of depositor fit in memory without recursive partitioning? No, so recursive partitioning is needed:
  2(bcust + bdep) ⌈log_{M−1}(bdep) − 1⌉ + bdep + bcust = 1000 (6) + 500 = 6500 I/Os!
- Why ever use SMJ then?
  1) Both input relations may already be sorted.
  2) Skew-free hash functions are sometimes hard to find.
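The slide's figure of 6500 I/Os follows from the case-2 formula. A small sketch, with illustrative names:

```python
from math import ceil, log

def hash_join_cost_recursive(b_r, b_s, m):
    """Case-2 estimate: 2(bR + bS)⌈log_{M-1}(bS) - 1⌉ + bR + bS."""
    passes = ceil(log(b_s, m - 1) - 1)   # number of partitioning iterations
    return 2 * (b_r + b_s) * passes + b_r + b_s

# customer ⋈ depositor with M = 3 and depositor (100 blocks) as build input
cost = hash_join_cost_recursive(b_r=400, b_s=100, m=3)   # → 6500 I/Os
```

With M = 3 the build relation is split only 2 ways per pass, so ⌈log2(100) − 1⌉ = 6 partitioning passes over both relations dominate the cost.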
16. Hybrid Hash Join
- Useful when memory sizes are relatively large and the build input is bigger than memory.
- Main feature of hybrid hash join: keep the smaller partition (s0) of the build relation in memory.
(Diagram: S is partitioned into s0, ..., sn−1 and R into R0, ..., Rn−1; s0 stays resident in memory, so R0 tuples are joined with it directly during partitioning instead of being written to disk.)
17. Complex Joins
- Join with a conjunctive condition:
  r ⋈_{θ1 ∧ θ2 ∧ ... ∧ θn} s
  - Either use nested loops / block nested loops, or
  - Compute the result of one of the simpler joins r ⋈_{θi} s; the final result comprises those tuples in the intermediate result that satisfy the remaining conditions θ1 ∧ ... ∧ θ_{i−1} ∧ θ_{i+1} ∧ ... ∧ θn.
- Join with a disjunctive condition:
  r ⋈_{θ1 ∨ θ2 ∨ ... ∨ θn} s
  - Either use nested loops / block nested loops, or
  - Compute it as the union of the individual joins r ⋈_{θi} s:
    (r ⋈_{θ1} s) ∪ (r ⋈_{θ2} s) ∪ ... ∪ (r ⋈_{θn} s)
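Both strategies can be sketched over lists of dicts. This is a toy in-memory illustration; all names and the sample predicates are assumptions:

```python
def theta_join(r, s, pred):
    """Naive nested-loops theta join, returning pairs of tuples."""
    return [(t, u) for t in r for u in s if pred(t, u)]

def conjunctive_join(r, s, preds):
    """Evaluate one condition as a join, then filter the intermediate
    result with the remaining conditions."""
    first, *rest = preds
    intermediate = theta_join(r, s, first)
    return [(t, u) for (t, u) in intermediate
            if all(p(t, u) for p in rest)]

def disjunctive_join(r, s, preds):
    """Union of the individual joins (set semantics: deduplicate)."""
    seen, out = set(), []
    for p in preds:
        for t, u in theta_join(r, s, p):
            key = (tuple(sorted(t.items())), tuple(sorted(u.items())))
            if key not in seen:
                seen.add(key)
                out.append((t, u))
    return out

r = [{"a": 1}, {"a": 2}]
s = [{"b": 1}, {"b": 2}]
eq = lambda t, u: t["a"] == u["b"]
conj = conjunctive_join(r, s, [eq, lambda t, u: t["a"] > 1])
disj = disjunctive_join(r, s, [eq, lambda t, u: t["a"] == 1])
```

In practice the optimizer would pick the most selective θi for the initial join, so the intermediate result stays small.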
18. Other Operations
- Duplicate elimination can be implemented via hashing or sorting.
  - Optimization: with sorting, duplicates can be deleted during run generation as well as at intermediate merge steps of external sort-merge.
  - Hashing is similar: duplicates end up in the same bucket.
- Projection is implemented by performing projection on each tuple, followed by duplicate elimination.
19. Other Operations: Aggregation
- Aggregation can be implemented in a manner similar to duplicate elimination: hashing or sorting.
- Optimization: combine tuples of the same group during run generation and intermediate merges, by computing partial aggregate values.
  - For count, min, max, sum: keep the aggregate value of the tuples found so far in the group. When combining partial aggregates for count, add up the counts.
  - For avg, keep sum and count, and divide sum by count at the end.
- This works for all distributive aggregates, i.e. those satisfying
  aggr(S) = f(aggr(S1), aggr(S2)) whenever S = S1 ∪ S2.
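The partial-aggregate optimization can be sketched as two functions: one that aggregates a run, and one that merges two partial results. The data and names are illustrative:

```python
def partial_agg(tuples, key, val):
    """Combine tuples of the same group during one pass, keeping the
    partial aggregates count/sum/min/max (avg is derived at the end)."""
    groups = {}
    for t in tuples:
        g = groups.setdefault(t[key], {"count": 0, "sum": 0,
                                       "min": None, "max": None})
        v = t[val]
        g["count"] += 1
        g["sum"] += v
        g["min"] = v if g["min"] is None else min(g["min"], v)
        g["max"] = v if g["max"] is None else max(g["max"], v)
    return groups

def merge_partials(g1, g2):
    """Distributive merge: aggr(S) = f(aggr(S1), aggr(S2)) for S = S1 ∪ S2.
    Counts and sums add; min/max combine; avg = merged sum / merged count."""
    out = {k: dict(v) for k, v in g1.items()}
    for k, g in g2.items():
        if k not in out:
            out[k] = dict(g)
        else:
            a = out[k]
            a["count"] += g["count"]
            a["sum"] += g["sum"]
            a["min"] = min(a["min"], g["min"])
            a["max"] = max(a["max"], g["max"])
    return out

run1 = partial_agg([{"b": "x", "v": 1}, {"b": "x", "v": 3}], "b", "v")
run2 = partial_agg([{"b": "x", "v": 2}, {"b": "y", "v": 7}], "b", "v")
merged = merge_partials(run1, run2)
```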
20. Evaluation of Expressions
- So far we have seen algorithms for individual operations.
- Alternatives for evaluating an entire expression (operator) tree:
  - Materialization: generate the result of an expression whose inputs are relations or are already computed, materialize (store) it on disk, and repeat.
  - Pipelining: pass tuples on to parent operations even as an operation is being executed.
21. Materialization
- Materialized evaluation: evaluate the expression bottom-up, storing intermediate results on disk.
- E.g., in the accompanying figure (not shown): compute and store the result of the selection, then compute and store its join with customer, and finally compute the projection on customer-name.
22. Materialization (Cont.)
- Materialized evaluation is always applicable.
- But the cost of writing results to disk and reading them back can be quite high.
- Overall cost = sum of the costs of the individual operations + the cost of writing intermediate results to disk.
- Double buffering: use two output buffers for each operation; when one is full, write it to disk while the other is being filled.
  - Allows overlap of disk writes with computation and reduces execution time.
23. Pipelining
- Pipelined evaluation: evaluate several operations simultaneously, passing the results of one operation on to the next.
- E.g., in the previous expression tree, don't store the result of the selection; instead, pass its tuples directly to the join. Similarly, don't store the result of the join; pass its tuples directly to the projection.
- For pipelining to be effective, use evaluation algorithms that generate output tuples even as input tuples are still being received.
- Pipelines can be executed in two ways: demand-driven and producer-driven.
24. Pipelining (Cont.)
- In demand-driven (lazy) evaluation, each operation is implemented as an iterator providing the following operations:
  - init()
    - E.g., for a file scan: initialize the scan, storing a pointer to the beginning of the file as state.
  - next()
    - E.g., for a file scan: output the next tuple, then advance and store the file pointer.
  - close()
25. Pipelining (Cont.)
- In producer-driven (eager) pipelining:
  - Operators produce tuples eagerly and pass them up to their parents.
  - A buffer is maintained between operators: the child puts tuples into the buffer, and the parent removes them.
  - If the buffer is full, the child waits until there is space in the buffer, then generates more tuples.
  - The system schedules operations that have space in their output buffer and can process more input tuples.
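The buffer discipline above can be sketched with a bounded queue between a child thread and its parent. This is a minimal illustration, not a real scheduler; the function name and buffer size are assumptions:

```python
import queue
import threading

def producer_pipeline(tuples, pred, buf_size=2):
    """Producer-driven (eager) pipeline: the child pushes tuples into a
    bounded buffer; the parent (a selection) consumes them."""
    buf = queue.Queue(maxsize=buf_size)
    DONE = object()                       # sentinel marking end of stream

    def child():                          # eager scan operator
        for t in tuples:
            buf.put(t)                    # blocks while the buffer is full
        buf.put(DONE)

    threading.Thread(target=child, daemon=True).start()

    out = []                              # parent operator
    while (t := buf.get()) is not DONE:
        if pred(t):
            out.append(t)
    return out

result = producer_pipeline(range(10), lambda x: x % 2 == 0)
```

queue.Queue gives exactly the behavior the slide describes: put() blocks the child when the buffer is full, and get() blocks the parent when it is empty.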
26. Query Optimization
- Query rewriting: given a relational algebra expression, produce an equivalent expression that can be evaluated more efficiently.
- Plan generation: choose the best algorithm for each operator, given statistics about the database, main-memory constraints, and available indices.
27. Transformation of Relational Expressions
- Two relational algebra expressions are equivalent if they produce the same results on the same inputs.
- In SQL, inputs and outputs are multisets of tuples: two expressions in the multiset version of the relational algebra are equivalent if, on every legal database instance, they generate the same multiset of tuples.
- An equivalence rule says that expressions of two forms are equivalent: we can replace an expression of the first form by the second, or vice versa.
28. Equivalence Rules
- 1. Conjunctive selection operations can be deconstructed into a sequence of individual selections.
- 2. Selection operations are commutative.
- 3. Only the last in a sequence of projection operations is needed; the others can be omitted.
- 4. Selections can be combined with Cartesian products and theta joins:
  - σθ(R1 × R2) = R1 ⋈θ R2
  - σθ1(R1 ⋈θ2 R2) = R1 ⋈θ1∧θ2 R2
29. (Diagram: tree representations of the equivalence rules over relations R1, R2, R3, depicting commuting selections/projections and reordering joins.)
30. Equivalence Rules (Cont.)
- 5. Theta-join operations (and natural joins) are commutative: R1 ⋈θ R2 = R2 ⋈θ R1.
- 6. (a) Natural join operations are associative: (R1 ⋈ R2) ⋈ R3 = R1 ⋈ (R2 ⋈ R3).
  (b) Theta joins are associative in the following manner:
      (R1 ⋈θ1 R2) ⋈θ2∧θ3 R3 = R1 ⋈θ1∧θ3 (R2 ⋈θ2 R3)
      where θ2 involves attributes from only R2 and R3.
31. Equivalence Rules (Cont.)
- 7. The selection operation distributes over the theta-join operation under the following two conditions:
  (a) When all the attributes in θ0 involve only the attributes of one of the expressions (R1) being joined:
      σθ0(R1 ⋈θ R2) = (σθ0(R1)) ⋈θ R2
  (b) When θ1 involves only the attributes of R1 and θ2 involves only the attributes of R2:
      σθ1∧θ2(R1 ⋈θ R2) = (σθ1(R1)) ⋈θ (σθ2(R2))
32. Equivalence Rules (Cont.)
- 8. The projection operation distributes over the theta-join operation as follows:
  (a) If θ involves only attributes from L1 ∪ L2:
      π_{L1∪L2}(E1 ⋈θ E2) = (π_{L1}(E1)) ⋈θ (π_{L2}(E2))
  (b) More generally, consider a join E1 ⋈θ E2.
    - Let L1 and L2 be sets of attributes from E1 and E2, respectively.
    - Let L3 be attributes of E1 that are involved in the join condition θ but are not in L1 ∪ L2, and
    - let L4 be attributes of E2 that are involved in θ but are not in L1 ∪ L2. Then
      π_{L1∪L2}(E1 ⋈θ E2) = π_{L1∪L2}((π_{L1∪L3}(E1)) ⋈θ (π_{L2∪L4}(E2)))
33. Transformation Example
- Query: find the names of all customers who have an account at some branch located in Brooklyn.
  π_{customer-name}(σ_{branch-city = "Brooklyn"}(branch ⋈ (account ⋈ depositor)))
- Transformation using rule 7a:
  π_{customer-name}((σ_{branch-city = "Brooklyn"}(branch)) ⋈ (account ⋈ depositor))
- Performing the selection as early as possible reduces the size of the relation to be joined.
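The payoff of pushing the selection down can be seen on a toy instance. The data, sizes, and helper names below are invented for illustration only:

```python
def select(rel, pred):
    return [t for t in rel if pred(t)]

def natural_join(r, s, attr):
    # naive nested-loops natural join on one shared attribute
    return [{**t, **u} for t in r for u in s if t[attr] == u[attr]]

# hypothetical toy instance: 10 branches (2 in Brooklyn), 100 accounts
branch = [{"branch": f"b{i}",
           "city": "Brooklyn" if i < 2 else "Queens"} for i in range(10)]
account = [{"branch": f"b{i % 10}", "acct": i} for i in range(100)]

is_brooklyn = lambda t: t["city"] == "Brooklyn"

# plan 1: join first, select afterwards (large intermediate result)
inter1 = natural_join(branch, account, "branch")
plan1 = select(inter1, is_brooklyn)

# plan 2 (rule 7a): push the selection below the join
inter2 = select(branch, is_brooklyn)
plan2 = natural_join(inter2, account, "branch")
```

Both plans return the same 20 tuples, but plan 1 builds a 100-tuple intermediate result while plan 2 joins only the 2 Brooklyn branches.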
34. Multiple Transformations (Cont.)
35. Enumeration of Equivalent Expressions
- Query optimizers use equivalence rules to generate equivalent expressions.
- 1st approach: generate all equivalent expressions.
  - But this is very expensive.
- 2nd approach: exploit common sub-expressions.
  - When E1 is generated from E2 by an equivalence rule, usually only the top levels of the two differ; the subtrees below are the same and can be shared (e.g., when applying join associativity).
  - Time requirements are reduced by not generating all expressions.
36. Evaluation Plan Example