Proactive ReOptimization - PowerPoint PPT Presentation

1 / 46
About This Presentation
Title:

Proactive ReOptimization

Description:

1. Proactive Re-Optimization. by Shivath Babu, Pedro Bizarro and David DeWitt. 2. General Idea ... Traditional (TRADs) query optimizers (plan-first execute-next) ... – PowerPoint PPT presentation

Number of Views:57
Avg rating:3.0/5.0
Slides: 47
Provided by: steliospa
Category:

less

Transcript and Presenter's Notes

Title: Proactive ReOptimization


1
Proactive Re-Optimization
  • by Shivath Babu, Pedro Bizarro and David DeWitt

2
General Idea
  • Problem
  • Traditional (TRADs) query optimizers (plan-first
    execute-next) often choose suboptimal plans, due
    to errors in estimates, using,
  • skewed and correlated data distributions
  • wrong/old statistics

3
General Idea
  • Solutions
  • better statistics
  • new algorithms for optimization
  • adaptive architectures for execution
  • Re-optimization
  • Proactive Re-Optimization

4
General Idea
  • Current Re-Optimizers (reactive)
  • use traditional optimizers to pick a plan and
  • if deviation to estimations are detected at
    run-time, they stop execution, in order to
    re-optimize using the new knowledge

limitary factor
5
General Idea
  • Shortcomings of the use of TRADs in Reactive
    Re-Optimizers
  • Pr1 may pick plans heavily depending on
    uncertain statistics, making re-optimization very
    likely
  • Pr2 if TRAD chooses a new plan, work until
    estimation error is found will be lost
  • Pr3 limited statistics collection ability
    leading to new mistakes

6
General Idea
  • Construction of proactive re-optimizer (Rio) to
    confront the above problems, with
  • bounding boxes (areas of uncertainty in
    estimates, intervals instead of single-point),
    use of bounding boxes to generate robust and
    switchable plans, minimizing re-optimization
  • buffer results, in order to use them in case of a
    plan change
  • more quick and accurate statistic collection at
    run-time using Random Sample processing

7
Limitations of Single-point Estimates (Pr1)
  • Example (1)
  • select from R, S
  • where R.aS.a and R.bgtK1 and R.cgtK2
  • Assume
  • DB-cache200MB, R500MB, S160MB, and
    s(R)300MB
  • But optimizer estimates s(R)150MB

8
Limitations of Single-point Estimates (Pr1)
  • Two possible plans
  • (Left input of Hash-Join build relation
  • Right input of Hash-Join probe relation)

9
Limitations of Single-point Estimates (Pr1)
10
Limitations of Single-point Estimates (Pr1)
  • TRAD will choose P1a resulting in making two
    passes over R and S (P1b needs only one pass over
    R and S)
  • Reactive re-optimizer (VRO) will choose P1a
    computing a validity range for P1a
    100KBs(R)160MB
  • If s(R)lt100KB, then it is preferable to use an
    index nested-loops join
  • If s(R)gt160MB then P1b is optimal
  • In this example, the check s(R)160MB will fail
    during execution, invoking re-optimization

11
Limitations of Single-point Estimates (Pr1)
  • What if we chose P1b from the beginning although
    at estimation point s(R)150MB it is not the
    optimal plan?
  • P1b cost is close to optimal at all times

12
Limited Information (Pr3)
  • Example (3)
  • select from R, S, T
  • where R.aS.a and S.bT.b and R.cgtK1 and R.dK2
  • Assume sizes of the tables are known accurately
    to be R200MB, S50MB, and T60MB
  • Further assume s(R)80MB
  • But optimizer underestimates as 40KB

13

14
Limited Information (Pr3)
  • Based on these statistics, the optimizer chooses
    Plan P3a
  • Due to limited information of the actual size of
    s(R), a reactive re-optimizer will trigger
    re-optimization step-by-step three times moving
    from P3a to P3d (optimal plan), leading to much
    wasted execution time (called thrashing)

15
Losing Work in a Pipeline (Pr2)
  • In addition work done in pipelines (e.g. in P3c)
    may be lost
  • possibly if partial work completed (e.g. Pipeline
    PPL1 in P3c)
  • for certain if partial work not completed (e.g.
    Pipeline PPL2 in P3c)

16
Proactive Re-opt Architecture
17
Components added in Rio
  • A Validity-Ranges Optimizer (VRO)
  • Rio, the proactive re-optimizer
  • uncertainty buckets and rules on them
  • histograms for statistical estimations
  • operators, like
  • Hybrid hash join operator (2 possible input
    subtrees 1 base relation1 deep subtree or 2
    base relations)

18
Components added in Rio
  • more operators
  • switch operator implementing switchable plans
  • buffer operators
  • read-random-samples operators
  • execution engine additions
  • ability to stop execution, re-optimize and
    continue
  • in-memory catalog for run-time statistics
  • inter-operator communication mechanism based on
    punctuations

19
Computing Bounding Boxes
  • Instead of single-point estimation use intervals
    around those estimates
  • Depending on uncertainty of estimation (how did
    we compute the estimation?) varies the size of
    the box

20
Computing Bounding Boxes
  • compute a single-point estimation E and assign to
    it a corresponding uncertainty bucket U
  • U is valuated as integer from 0(no uncertainty)
    to 6(very high uncertainty) depending on the way
    E is derived
  • bounding box B is an interval lo, hi depending
    on (E, U) computed as follows

21
Computing Bounding Boxes
22
Plans Bounding Boxes
  • But
  • Single-point estimation one optimal plan
  • What about bounding boxes?
  • Four cases with bounding box B
  • Single optimal plan at all points in B
  • Single robust plan (plan with cost very close to
    optimal at all points in B)
  • Switchable plan. Set S of plans p, so that
  • at each point pt in B there is plan p in S, with
    cost at pt close to optimal at pt
  • can decide which p after accurate estimation
    exists
  • can pick any p, if estimation in B without much
    execution work done so far
  • None of the above

23
Optimizing with Bounding Boxes
  • Rio computes bounding boxes for all input sizes
    used to cost plans
  • then it tries to compute a switchable plan which
    may also be a single robust or a single optimal
    plan (OPgtRPgtSP)
  • if that fails it picks the optimal plan based on
    the single-point estimates and considers validity
    ranges like VRO

24
Bounding Boxes
  • Bounding Box of example 1

25
Bounding Boxes
  • No optimal plan in B but there is a robust plan
    (P1b)

26
Computation of switchable plan
  • For each bounding box we find three plans
    considering three costs CLow, CEst and CHigh
  • CEst is computed traditionally, attaching to it
    the plan with the minimum cost in case the real
    statistics match our single-point estimation
  • CLow (CHigh) is the cost of the plan with the
    minimum cost at lower-left (upper-right) corner
    of the bounding box

27
Computation of switchable plan
  • So we end up with three plans (called seed plans)
  • BestPlanLow, plan with minimum cost CLow
  • BestPlanEst, plan with minimum cost CEst
  • BestPlanHigh, plan with minimum cost CHigh

28
Computation of switchable plan
  • We now have the following cases
  • seeds are all the same plan
  • seeds are not all the same, but one of them is a
    robust plan
  • seeds are not all the same, and none of them is
    robust but a switchable plan can be created from
    the seeds
  • No single optimal or robust or switchable plan
    can be found

29
Computation of switchable plan
  • in case (i) we a single optimal plan as the
    switchable plan (singleton set)
  • in case (ii), to check if e.g. BestPlanLow is a
    robust plan we do the following
  • if CEst of BestPlanLow is close to (e.g. within
    20 of) CEst of BestPlanEst and
  • CHigh of BestPlanLow is close to CHigh of
    BestPlanHigh
  • then BestPlanLow is a robust plan

30
Example 1 again
31
Computation of switchable plan
  • in case (iii), a switchable plan is a set of
    plans S where
  • all plans in S have a different join operator as
    a root (reversed build and probe are different)
  • all plans in S have the same subplan for the deep
    subtree input to the root operator
  • all plans in S have the same base table, but not
    necessarily the same access path

32
Computation of switchable plan
  • a switchable set

33
Computation of switchable plan
  • if, according to the previous, the three seeds
    constitute a switchable set, then we are done
  • else the optimizer builds a set SW_Low with
    switchable plans with BestPlanLow among them,
    prunes them so that the three minimum cost (at
    Low, Est and High points) plans are left and
    check how close they are to CEst and CHigh of
    BestPlanEst and BestPlanHigh respectively. If
    they are close enough SW_Low is a switchable plan
  • else we do the same for the two other seeds

34
Computation of switchable plan
  • if this also fails (case (iv)) then Rio picks
    BestPlanEst and adds validity ranges (like VRO)

35
Implementing switchable plans
  • if we have a switchable set we can defer choosing
    among them until we have an accurate estimation
    of the input size, as follows
  • the buffer operator buffers random f tuples of
    the input (output of the deep subplan), followed
    by an end-of-sample punctuation eos(f)
  • if the tuples buffered are n, the total input
    size is fairly accurately estimated, as
    EstN100n/f

36
Implementing switchable plans
  • if EstN is in the bounding box, then the suitable
    plan among the switching plan set is chosen and
    executed
  • else if EstN is out of the bounding box
    re-optimization is triggered

37
Randomization in Table-Scan
  • Randomization in Table-Scan Operators is fairly
    easy
  • if tuples in T are kept in random order, then we
    scan sequentially until f of T is read and then
    we produce eos(f)
  • else, we put an extra bit in each tuple in T, to
    know if it is used in the sample or not
  • After computing EstN we continue with the rest
    tuples (that are not in the sample)

38
Randomization in Join
  • Read tuples to memory from probe until eos(f)
    reached
  • Join the in-memory tuples with complete build
    relation, then send an eos(f) punctuation and
    delete all sample
  • To resume (after computing EstN) join the rest of
    probe with the already built build relation

39
Experiments
  • Using Robust Plans

40
Experiments
  • Using Switchable Plans

41
Experiments
  • Using three-way Join

42
Experiments
  • Increasing Query Complexity

43
Comments on paper
  • Many parameters are left to be optimized (?1, ?2,
    f, U, E,), or isnt this possible?
  • What if a slight change of the bounding box leads
    to a Single Robust Plan, or even a Single Optimal
    Plan, when the whole box doesn't even have a
    Switchable Plan?
  • Switching Plans are estimated based on three
    costs (three seeds). Is this optimal? What about
    points in-between?

44
Comments on paper
  • How is Randomization (collecting a random sample
    of tuples) really done?
  • Can we reorder tuples, after sampling, to get
    them in order?
  • Is this efficient?
  • Do we need them to be in order or only
    mid-operators may need them so?

45
Comments on paper
  • The test of choosing Switchable Sets and
    consequently Switchable Plans is limited (s31)
  • More Switchable Sets possibly exist but we cant
    detect them
  • But until we can
  • Does it have any sense to use Proactive
    Re-Optimization in real DBs? (complexitygtgtprofit)

46
Comments on paper
  • How efficient is Proactive Re-Optimization in
    general?
  • What is its time complexity when we have various
    queries?
  • May be applied effectually in distributed
    (integrated) DBs, where we have possibly limited
    access to statistical data (but is Randomization
    possible?)
Write a Comment
User Comments (0)
About PowerShow.com