Title: Proactive Re-Optimization
1. Proactive Re-Optimization
- by Shivnath Babu, Pedro Bizarro, and David DeWitt
2. General Idea
- Problem
  - Traditional query optimizers (TRADs; plan-first, execute-next) often choose suboptimal plans due to errors in estimates, caused by
    - skewed and correlated data distributions
    - wrong or outdated statistics
3. General Idea
- Solutions
- better statistics
- new algorithms for optimization
- adaptive architectures for execution
- Re-optimization
- Proactive Re-Optimization
4. General Idea
- Current re-optimizers (reactive)
  - use traditional optimizers to pick a plan, and
  - if deviations from the estimates are detected at run-time, they stop execution in order to re-optimize using the new knowledge
  - limiting factor: the traditional optimizer itself
5. General Idea
- Shortcomings of using TRADs in reactive re-optimizers
  - Pr1: they may pick plans that depend heavily on uncertain statistics, making re-optimization very likely
  - Pr2: if the TRAD chooses a new plan, the work done until the estimation error was found is lost
  - Pr3: limited statistics-collection ability, leading to new mistakes
6. General Idea
- Construction of a proactive re-optimizer (Rio) to confront the above problems, with
  - bounding boxes (areas of uncertainty around estimates: intervals instead of single points)
  - use of bounding boxes to generate robust and switchable plans, minimizing re-optimization
  - buffering of results, so that they can be reused in case of a plan change
  - quicker and more accurate statistics collection at run-time using random-sample processing
7. Limitations of Single-point Estimates (Pr1)
- Example (1)
  - select * from R, S
  - where R.a = S.a and R.b > K1 and R.c > K2
- Assume
  - DB-cache = 200MB, R = 500MB, S = 160MB, and σ(R) = 300MB
  - But the optimizer estimates σ(R) = 150MB
8. Limitations of Single-point Estimates (Pr1)
- Two possible plans
  - (left input of the hash join = build relation,
  - right input of the hash join = probe relation)
9. Limitations of Single-point Estimates (Pr1)
10. Limitations of Single-point Estimates (Pr1)
- TRAD will choose P1a, resulting in two passes over R and S (P1b needs only one pass over R and S)
- A reactive re-optimizer (VRO) will choose P1a, computing a validity range for P1a: 100KB ≤ σ(R) ≤ 160MB
  - If σ(R) < 100KB, it is preferable to use an index nested-loops join
  - If σ(R) > 160MB, then P1b is optimal
- In this example, the check σ(R) ≤ 160MB will fail during execution, invoking re-optimization
11. Limitations of Single-point Estimates (Pr1)
- What if we chose P1b from the beginning, although at the estimated point σ(R) = 150MB it is not the optimal plan?
  - P1b's cost is close to optimal at all times
12. Limited Information (Pr3)
- Example (3)
  - select * from R, S, T
  - where R.a = S.a and S.b = T.b and R.c > K1 and R.d = K2
- Assume the sizes of the tables are known accurately to be R = 200MB, S = 50MB, and T = 60MB
- Further assume σ(R) = 80MB
  - But the optimizer underestimates it as 40KB
13-14. Limited Information (Pr3)
- Based on these statistics, the optimizer chooses plan P3a
- Due to the limited information about the actual size of σ(R), a reactive re-optimizer will trigger re-optimization three times, moving step by step from P3a to P3d (the optimal plan) and wasting much execution time (called thrashing)
15. Losing Work in a Pipeline (Pr2)
- In addition, work done in pipelines (e.g. in P3c) may be lost
  - possibly, if the partial work was completed (e.g. pipeline PPL1 in P3c)
  - for certain, if the partial work was not completed (e.g. pipeline PPL2 in P3c)
16. Proactive Re-opt Architecture
17. Components added in Rio
- A validity-ranges optimizer (VRO)
- Rio, the proactive re-optimizer
  - uncertainty buckets and rules on them
  - histograms for statistical estimation
  - operators, such as
    - a hybrid hash-join operator (two possible input subtrees: one base relation + one deep subtree, or two base relations)
18. Components added in Rio
- More operators
  - a switch operator, implementing switchable plans
  - buffer operators
  - read-random-samples operators
- Execution-engine additions
  - the ability to stop execution, re-optimize, and continue
  - an in-memory catalog for run-time statistics
  - an inter-operator communication mechanism based on punctuations
19. Computing Bounding Boxes
- Instead of single-point estimates, use intervals around those estimates
- The size of the box varies with the uncertainty of the estimate (how did we compute the estimate?)
20. Computing Bounding Boxes
- Compute a single-point estimate E and assign to it a corresponding uncertainty bucket U
- U is an integer from 0 (no uncertainty) to 6 (very high uncertainty), depending on the way E was derived
- The bounding box B is an interval [lo, hi] depending on (E, U), computed as follows
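A minimal sketch of this mapping, assuming the box widens multiplicatively around E; the per-bucket widening fractions below are illustrative assumptions, not the paper's actual values:

```python
# Map an uncertainty bucket U (0 = none, ..., 6 = very high) to a
# widening fraction; the concrete fractions here are assumed for
# illustration only.
WIDEN = {0: 0.0, 1: 0.1, 2: 0.25, 3: 0.5, 4: 1.0, 5: 2.0, 6: 4.0}

def bounding_box(E, U):
    """Return the interval [lo, hi] around estimate E for bucket U."""
    d = WIDEN[U]
    lo = E / (1.0 + d)   # shrink multiplicatively on the low side
    hi = E * (1.0 + d)   # grow multiplicatively on the high side
    return lo, hi
```

With U = 0 the box collapses to the single-point estimate; higher buckets produce progressively wider intervals.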
21. Computing Bounding Boxes
22. Plans for Bounding Boxes
- But
  - a single-point estimate yields one optimal plan
  - what about bounding boxes?
- Four cases with a bounding box B
  - Single optimal plan at all points in B
  - Single robust plan (a plan with cost very close to optimal at all points in B)
  - Switchable plan: a set S of plans p, such that
    - at each point pt in B there is a plan p in S whose cost at pt is close to the optimal cost at pt
    - we can decide which p to use once an accurate estimate exists
    - we can pick any p, if the estimate falls in B, without much execution work done so far
  - None of the above
23. Optimizing with Bounding Boxes
- Rio computes bounding boxes for all input sizes used to cost plans
- It then tries to compute a switchable plan, which may also be a single robust or even a single optimal plan (OP > RP > SP)
- If that fails, it picks the optimal plan based on the single-point estimates and computes validity ranges, like VRO
24. Bounding Boxes
- Bounding box of Example 1
25. Bounding Boxes
- No optimal plan exists in B, but there is a robust plan (P1b)
26. Computation of a switchable plan
- For each bounding box we find three plans, considering three costs: CLow, CEst, and CHigh
  - CEst is computed traditionally; attached to it is the plan with the minimum cost in case the real statistics match our single-point estimate
  - CLow (CHigh) is the cost of the plan with the minimum cost at the lower-left (upper-right) corner of the bounding box
27. Computation of a switchable plan
- So we end up with three plans (called seed plans)
  - BestPlanLow, the plan with minimum cost CLow
  - BestPlanEst, the plan with minimum cost CEst
  - BestPlanHigh, the plan with minimum cost CHigh
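The seed-plan selection can be sketched as three calls to a conventional single-point optimizer, one per point of the box; `optimize` is a hypothetical stand-in, not an API from the paper:

```python
# Pick the three seed plans by optimizing at the low corner, the
# single-point estimate, and the high corner of the bounding box.
# `optimize(point)` is assumed to return the minimum-cost plan for
# the given input-size point.
def seed_plans(optimize, low, est, high):
    return {
        "BestPlanLow": optimize(low),    # minimum cost CLow
        "BestPlanEst": optimize(est),    # minimum cost CEst
        "BestPlanHigh": optimize(high),  # minimum cost CHigh
    }
```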
28. Computation of a switchable plan
- We now have the following cases
  - (i) the seeds are all the same plan
  - (ii) the seeds are not all the same, but one of them is a robust plan
  - (iii) the seeds are not all the same and none of them is robust, but a switchable plan can be created from the seeds
  - (iv) no single optimal, robust, or switchable plan can be found
29. Computation of a switchable plan
- In case (i) we use the single optimal plan as the switchable plan (a singleton set)
- In case (ii), to check whether e.g. BestPlanLow is a robust plan, we do the following
  - if CEst of BestPlanLow is close to (e.g. within 20% of) CEst of BestPlanEst, and
  - CHigh of BestPlanLow is close to CHigh of BestPlanHigh,
  - then BestPlanLow is a robust plan
30. Example 1 again
31. Computation of a switchable plan
- In case (iii), a switchable plan is a set of plans S where
  - all plans in S have a different join operator as root (reversed build and probe inputs count as different)
  - all plans in S have the same subplan for the deep-subtree input to the root operator
  - all plans in S have the same base table, but not necessarily the same access path
32. Computation of a switchable plan
33. Computation of a switchable plan
- If, according to the previous rules, the three seeds constitute a switchable set, then we are done
- Otherwise the optimizer builds a set SW_Low of switchable plans with BestPlanLow among them, prunes it so that the three minimum-cost plans (at the Low, Est, and High points) are left, and checks how close these are to CEst and CHigh of BestPlanEst and BestPlanHigh respectively. If they are close enough, SW_Low is a switchable plan
- Otherwise we do the same for the other two seeds
34. Computation of a switchable plan
- If this also fails (case (iv)), then Rio picks BestPlanEst and adds validity ranges (like VRO)
35. Implementing switchable plans
- If we have a switchable set, we can defer choosing among its plans until we have an accurate estimate of the input size, as follows
  - the buffer operator buffers a random f% sample of the input tuples (the output of the deep subplan), followed by an end-of-sample punctuation eos(f)
  - if n tuples are buffered, the total input size is estimated fairly accurately as EstN = 100·n/f
36. Implementing switchable plans
- If EstN falls inside the bounding box, the suitable plan from the switchable set is chosen and executed
- Otherwise, if EstN falls outside the bounding box, re-optimization is triggered
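The two slides above can be sketched together: scale the buffered sample up to an input-size estimate, then either pick a plan from the switchable set or fall back to re-optimization; `pick_plan` is a hypothetical stand-in for the switch operator's choice:

```python
def estimate_input_size(n_buffered, f_percent):
    """EstN = 100 * n / f, where f% of the input was sampled."""
    return 100.0 * n_buffered / f_percent

def choose_or_reoptimize(n_buffered, f_percent, lo, hi, pick_plan):
    est_n = estimate_input_size(n_buffered, f_percent)
    if lo <= est_n <= hi:         # estimate inside the bounding box
        return pick_plan(est_n)   # switch to the suitable plan in the set
    return "RE-OPTIMIZE"          # outside the box: trigger re-optimization
```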
37. Randomization in Table-Scan
- Randomization in table-scan operators is fairly easy
  - if the tuples in T are kept in random order, we scan sequentially until f% of T has been read and then produce eos(f)
  - otherwise, we add an extra bit to each tuple in T, recording whether it belongs to the sample or not
- After computing EstN, we continue with the remaining tuples (those not in the sample)
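An illustrative sketch of the sampling scan described above, using a per-tuple flag as the extra sample bit; names and structure are assumptions for this sketch:

```python
import random

def sampling_scan(tuples, f_percent, rng=None):
    """Emit a random f% sample first, then eos(f), then the rest."""
    rng = rng or random.Random(0)
    # Mark each tuple with the extra 'in sample' bit.
    marked = [rng.random() < f_percent / 100.0 for _ in tuples]
    yield from (t for t, m in zip(tuples, marked) if m)      # the sample
    yield ("eos", f_percent)                                 # end-of-sample punctuation
    yield from (t for t, m in zip(tuples, marked) if not m)  # the rest
```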
38. Randomization in Join
- Read tuples from the probe input into memory until eos(f) is reached
- Join the in-memory tuples with the complete build relation, then send an eos(f) punctuation downstream and discard the sample
- To resume (after computing EstN), join the rest of the probe input with the already-built build relation
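A rough sketch of this sample-first probing, with the punctuation modeled as a sentinel object forwarded downstream; the buffering and discarding of the sample is elided, so this is a simplification of the operator described above:

```python
EOS = object()  # stand-in for the eos(f) punctuation

def hash_join_with_sample(build, probe_stream, key=lambda t: t[0]):
    """Hash join whose probe input arrives as sample, EOS, then the rest."""
    table = {}
    for b in build:                        # build phase: hash the build relation
        table.setdefault(key(b), []).append(b)
    for item in probe_stream:              # probe phase
        if item is EOS:
            yield EOS                      # forward the punctuation downstream
            continue
        for b in table.get(key(item), []):
            yield (b, item)                # matching (build, probe) pairs
```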
39. Experiments
40. Experiments
41. Experiments
42. Experiments
- Increasing Query Complexity
43. Comments on paper
- Many parameters are left to be tuned (?1, ?2, f, U, E, ...); or isn't this possible?
- What if a slight change of the bounding box leads to a single robust plan, or even a single optimal plan, when the whole box doesn't even have a switchable plan?
- Switchable plans are evaluated based on three costs (three seeds). Is this optimal? What about points in between?
44. Comments on paper
- How is randomization (collecting a random sample of tuples) really done?
- Can we reorder the tuples after sampling, to get them back in order?
  - Is this efficient?
  - Do we need them to be in order, or might only intermediate operators need them so?
45. Comments on paper
- The test for choosing switchable sets, and consequently switchable plans, is limited (slide 31)
- More switchable sets possibly exist, but we cannot detect them
- But until we can
  - does it make any sense to use proactive re-optimization in real DBs? (complexity >> profit)
46. Comments on paper
- How efficient is proactive re-optimization in general?
- What is its time complexity when we have various queries?
- It may be applied effectively in distributed (integrated) DBs, where we possibly have limited access to statistical data (but is randomization possible there?)