Title: ICS 214A: Database Management Systems Winter 2004
1ICS 214A Database Management Systems Winter 2004
- Lecture 15 Query Optimization (cont)
- Professor Chen Li
2Some transformation rules work for sets, not bags
- Example sp1vp2 (R) sp1(R) ? sp2(R) is true for
sets, but not for bags. - For instance Ra,a,b,b,b,c, P1 satisfied by
a,b P2 satisfied by b,c - sp1(R) a,a,b,b,b
- sp2(R) b,b,b,c
- sp1vp2 (R) a,a,b,b,b,c
- sp1(R) ? sp2(R) a,a,b,b,b,b,b,b,c ltgtsp1vp2
(R) !
3Rules s combined
- Let p predicate with only R attributes
- q predicate with only S attributes
- m predicate with only R, S attributes
- sp (R S)
- sq (R S)
- Example
- R(dept, id, salary), S(id, city, tel)
- p depttoy, q city irvine, m id gt 500
sp (R) S R sq (S)
4Rules s combined (continued)
- Some rules can be derived
sp?q (R S) sp sq (R S) sp R
sq (S) sp (R) sq (S)
5Derive the following by yourself
- sp?q?m (R S) sm (sp R) (sq S)
- spvq (R S) (sp R) S?R (sq S)
6Rules p, s combined
- Let x subset of R attributes
- z attributes in predicate p (subset of R
attributes) - pxsp (R)
- Example
- R(dept, id, salary)
- x salary
- p depttoy and id gt 500 ? z dept, id
sp px (R)
7Rules p, combined
Let x subset of R attributes y subset
of S attributes z intersection of R,S
attributes pxy (R S)pxypxz (R) pyz
(S)
Example R(dept, id, salary), S(id, city,
tel) x dept, y city zid
8Example R(dept, id, salary), S(id, city,
tel) x dept, y city, zid p (id gt
500 city irvine)
9Rules s, ? combined
- sp(R ? S) sp(R) ? sp(S)
- sp(R - S) sp(R) - S sp(R) - sp(S)
10Some basic transformations
- sp1?p2 (R) ? sp1sp2 (R)
- sp (R S) ? sp (R) S
- R S ? S R
- px sp(R) ? pxsp pxz (R)
11Conventional wisdom do projects early
- Example R(A,B,C,D,E) xE P
(A3) ? (Bcat) - px sp (R) versus pE sppABE(R)
12Query Optimizer
- Find the best physical plan
- Comprises the following
- Plan space (search space)
- huge number of alternative, semantically
equivalent plans - computationally expensive to examine all
- Conventional wisdom avoid bad plans
- need to include plans that have low cost
- Cost model
- facilitate comparisons of alternative plans
- has to be accurate
- Enumeration algorithm
- search strategy (optimization algorithm) that
searches through the plan space - has to be efficient (low optimization overhead)
13Plan Space
- Left-deep trees right child has to be a base
table - Right-deep trees left child has to be a base
table - Deep trees one of the two children is a base
table - Bushy tree unrestricted
Deep tree
Bushy tree
Left-deep tree
14Search Algorithms
- Exhaustive enumerate each possible plan, and
pick the best - Greedy Techniques
- Randomized/Transformation Techniques
- System-R approach
- Dynamic programming with pruning
151. Greedy Algorithm (Example)
- Smallest relation next
- Suppose Ri lt Rk for i lt k
R1
All plans must begin with R1
R2
R1
R3
R2
R4
R5
R3
R4
R5
All plans beginning with R2-R5 have been pruned!
162. Randomized Techniques
- Employ randomized/transformation techniques for
query optimization - Hill climbing
- State space -- space of plans, State -- plan
- Each state has a cost associated with it
- determined by some cost model
- A move is a perturbation applied to a state to
get to another state - a move set is the set of moves available to go
from one state to another - any one move is chosen from this move set
randomly - each move set has a probability associated to
indicate the probability of selecting the move
17Randomized Algorithm (Example)
- We have covered many equivalence-preservation
rules.
R4
R4
R3
R3
R1
R2
R2
R1
R4
R4
R3
R3
R1
R2
R2
R1
18Search space
19Randomized Techniques (cont)
- Two states are neighboring states if one move
suffices to go from one state to the other - A local minimum in the state space is a state
such that its cost is lower than that of all
neighboring states - A global minimum is a state which has the lowest
cost among all local minima - at most one global minimum value (could have
multiple states) - A move that takes one state to another with a
lower cost is called a downward move otherwise
it is an upward move - in a local/global minimum, all moves are upward
moves
203. Optimization in System-R
- A.k.a. Selinger-style query optimization
- The classic paper on query optimization (Selinger
et al. 1979) - Originally used in the IBM System-R project
- Now widely used
- Basic ideas
- Left-deep trees only
- Bottom-up generation of plans
- Use dynamic-programming paradigm
- Considering interesting orders
21Bottom-up plan generation
- Assumption 1 Once we have joined k relations,
the method of joining this result further with
another relation is independent of the previous
join methods (true?) - Assumption 2 Any subplan of an optimal plan must
also be optimal (otherwise we could replace the
subplan to get a better overall plan). (ture?) - Bottom-up generation of optimal plans
- Compute the optimal plan for joining k relations
- Suboptimal plans are pruned
- From these plans, derive the optimal plans for
joining k1 relations
22Basic version dynamic programming
- The algorithm proceeds by considering
increasingly larger subsets of the set of all
relations. - Plans for a set of cardinality i are constructed
as extensions of the best plan for a set of
cardinality i-1 - Search space can be pruned based on the principal
of optimality - if two plans differ only in a subplan, then the
plan with the better subplan is also the better
plan
23Dynamic Programming (Cont)
1 2 3
4
1 2 1 3 1 4 2 3
2 4 3 4
1 2 3 1 2 4 2 3
4 1 3 4
1 2 3 4
24Example
- Consider joining relations R,S,T,U. Each with
1000 tuples.
S
R
T
U
Cost taken to be size of intermediate results.
Actually should take I/O cost or some other cost
metric.
25Example (cont)
S
R
T
U
Only left-deep trees considered.
26Dynamic Programming (cont)
- Pass 1 find the best single-relation plans
- Pass 2 find the best two-relation plans by
considering each single-relation plan (from pass
1) as the outer relation and every other relation
as the innner relation -
- Pass k Find the best k-relation plans by
considering each (k-1)-relation plan (from Pass
k-1) as the outer relation and every other
relation as the inner relation - Until k of relations in the query
27Improvement interesting order
- Is it always true that locally optimal is
globally optimal? NO! - Example R(A,B) JOIN S(A,C) JOIN T(A,D)
- Best plan for R JOIN S hash join (beats
sort-merge join) - Best overall plan sort-merge join R and S, and
then sort-merge join with T - Subplan of the optimal plan is not optimal
- Why?
- The result of the sort-merge join of R and S is
sorted on A - This is an interesting order that can be
exploited by later processing (e.g., join,
duplicate elimination, GROUP BY, ORDER BY, etc.)!
28Dealing with interesting orders
- When picking the optimal plan
- Comparing their costs is not enough
- Plans are not totally ordered by cost anymore
- Comparing interesting orders is also needed
- Plans are now partially ordered
- Plan X is better than plan Y if
- Cost of X is lower than Y
- Interesting orders produced by X subsume those
produced by Y - Need to keep a set of optimal plans for joining
every combination of k relations - Typically one for each interesting order
29Summary
- Query optimization is hard.
- Instead of finding the best, the objective is to
avoid the bad plans - Many different optimization strategies have been
proposed - greedy heuristics are fast but may generate plans
that are far from optimal - dynamic programming is effective at the expense
of high optimization overhead
30Summary of this course
SQL statements
Access plan
Query Processor
optimizer
Read write records, scan relations
Record-oriented file system
Get page containing tuples
Buffer manager
Basic file system
Read/write file pages
Hardware
Other related DB courses ICS214B, ICS215