Adaptive Query Processing in Looking Glass

Provided by: wimA3

1
Adaptive Query Processing in Looking Glass
2
Inside the paper
  • Detailed qualitative comparison of Adaptive Query
    Processing systems.
  • Proposal for two new approaches for adaptive
    query processing.

3
Plan-first execute-next
  • The optimizer picks an efficient plan for a
    query.
  • This plan is used to execute the query.
  • Effect: database systems become critically
    dependent on the optimizer.
  • Problem: optimizers can make mistakes because of
    • Outdated statistics
    • Invalid assumptions
    • Lack of statistics
    • An unpredictable and changing environment

4
Adaptive Query Processing
  • An approach to avoid the performance penalty of
    optimizer mistakes
  • The optimization and execution stages are
    interleaved
  • Query processing becomes more robust by
    • Correcting optimizer mistakes
    • Coping with unknown statistics
    • Reacting to changes in input characteristics
      and system conditions

5
Plan-first execute-next approach
6
Adaptive Query Processing Systems
  • Plan-based
  • Routing-based
  • CQ-based (continuous-query-based)

7
Plan-based Systems
8
Routing-based Systems
9
CQ-based Systems
10
Pros and Cons of using an Optimizer
  • Effect on plan quality
    • Using an optimizer
      • + Can consider large, complex plan spaces
      • - Susceptible to estimation errors (optimizer
        mistakes)
    • No optimizer
      • + Routing is based on observed, accurate
        statistics
      • - Routing algorithms are usually greedy and
        designed for smaller, simpler plan spaces
  • Complexity
    • - Optimizers are complex modules
    • + Routing-based systems are simple
  • Run-time overhead
    • Using an optimizer
      • - Context switching between optimizer and
        executor
      • - Plan enumeration and costing may be applied
        several times
    • No optimizer
      • - Overhead of routing
      • - Must continuously ensure that routes
        correspond to valid plans

11
Tracking Statistics
  • To verify that the values observed during
    execution match the optimizer's estimates
  • To obtain run-time values for use in future
    optimization steps
  • Statistics can be observed
    • Directly from the execution of a plan (in
      plan-based and CQ-based systems)
    • From the flow of tuples along the current
      routing path (in routing-based systems)

12
Techniques used for gathering statistics
  • Observation
    • Used in plan-based systems
    • Statistics are collected on tuples that pass
      through selected points in the query plan
  • Exploration
    • Used in routing-based systems
    • A fraction of the input tuples is routed along
      routes different from the current best route
  • Competition
    • Similar to exploration
    • Routes the same set of tuples along multiple
      competing routes
  • Profiling
    • A fraction of the tuples is routed along all
      operators
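
A minimal Python sketch of exploration as described on this slide: most tuples follow the route with the lowest observed average cost, and a fraction is sent along other routes so that their statistics stay current. The function name, the `explore_fraction` parameter, and the cost-function interface are assumptions made for illustration and do not come from the paper.

```python
import random
from collections import defaultdict


def route_with_exploration(tuples, routes, explore_fraction=0.1, seed=0):
    """Route each tuple along one route and track observed average cost.

    `routes` maps a route name to a function that returns the observed
    cost of processing one tuple on that route (hypothetical interface).
    """
    rng = random.Random(seed)
    total_cost = defaultdict(float)
    count = defaultdict(int)
    assignments = []
    names = list(routes)

    def average(name):
        return total_cost[name] / count[name]

    for t in tuples:
        untried = [n for n in names if count[n] == 0]
        if untried:
            # Try every route once so each has at least one observation.
            choice = untried[0]
        else:
            best = min(names, key=average)
            others = [n for n in names if n != best]
            if others and rng.random() < explore_fraction:
                # Exploration: deliberately use a route that is not the best.
                choice = rng.choice(others)
            else:
                choice = best
        total_cost[choice] += routes[choice](t)
        count[choice] += 1
        assignments.append(choice)

    return assignments, {n: average(n) for n in names if count[n]}
```

Raising `explore_fraction` keeps statistics on the alternative routes fresher but sends more tuples along routes currently believed to be worse, which is the overhead trade-off the next slide discusses.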

13
Analysis of tracking statistics techniques
  • Computational overhead
    • Observation: depends on what statistics are
      collected
    • Exploration: depends on the fraction of tuples
      sent along less efficient paths
    • Competition: redundant processing of data
    • Profiling: extra work on a random sample of
      data
  • Accuracy of estimation
    • Observation: high, statistics are observations
    • Exploration: susceptible to bias in routing
      decisions and to correlations
    • Competition: high, statistics are observations
    • Profiling: depends on the sampling fraction
  • Coverage of statistics
    • Observation: restricted to what can be observed
      from the plan
    • Exploration: limited by the large number of
      alternative routes
    • Competition: low, only a limited number of
      competing plans can be run
    • Profiling: broad coverage of statistics

14
Re-optimization: When and how to re-optimize
  • In plan-based and CQ-based systems
    re-optimization is invoked explicitly
    • During execution, the statistics that the
      optimizer estimated are tracked
    • The optimizer is invoked whenever an observed
      value is found to be significantly different
      from, or out of range of, the value estimated
      by the optimizer
  • In a routing-based system re-optimization is
    invoked implicitly
    • The scheme used to route tuples to operators is
      based on the current statistics
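
The explicit trigger can be sketched in a few lines of Python. The slide does not say how "significantly different" is measured, so the multiplicative `tolerance` band and both function names below are assumptions for illustration only.

```python
def out_of_range(estimated, observed, tolerance=2.0):
    """True if `observed` lies outside [estimated/tolerance, estimated*tolerance].

    The multiplicative band is an assumed definition of "significantly
    different"; a real system may use validity ranges computed by the optimizer.
    """
    if estimated <= 0:
        return observed > 0
    return not (estimated / tolerance <= observed <= estimated * tolerance)


def statistics_to_recheck(estimates, observations, tolerance=2.0):
    """Names of tracked statistics whose observed value violates the estimate."""
    return [
        name
        for name, est in estimates.items()
        if name in observations
        and out_of_range(est, observations[name], tolerance)
    ]
```

A non-empty result from `statistics_to_recheck` would be the point at which a plan-based or CQ-based system calls the optimizer again.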

15
Re-optimization: Plan switching
  • Re-optimization in plan-based or CQ-based systems
    may decide that a new plan is better than the
    current plan
  • Important issues when switching between plans
    • Correctness
      • The new plan must not output result tuples
        that have already been output by previous
        plans, and must not miss any result tuples
    • Reuse of work
      • The new plan should consider reusing the parts
        of the query that were processed by previous
        plans
      • Attention must be paid to the efficiency of
        reusing work
    • Plan state
      • The techniques used for changing plans should
        account for the state of the plan
16
Techniques used by AQP for plan switching
  • Avoiding duplicate results
    • Plan-based
      • No result is output until processing is
        complete
      • Keep track of tuples output so far to
        eliminate duplicates in future
    • CQ-based
      • Access methods over streams are one-pass
        scans, so duplicates are never generated
    • Routing-based
      • Routing constraints are enforced in order to
        eliminate duplicate results
  • Reusing work done so far
    • Plan-based: materialized subexpressions are
      reused on a cost basis
    • CQ-based: migrate state in temporary structures
    • Routing-based: migrate state in temporary
      structures
  • Reducing switching time
    • Plan-based
      • The new plan is started on new input
      • Data partitions processed by different plans
        are combined only after all sources are
        exhausted
    • CQ-based
      • Caches that can be dropped quickly
      • Techniques to migrate state in parallel with
        processing
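
As an illustration of the second plan-based technique for avoiding duplicates (keeping track of tuples output so far), here is a small Python sketch. The class and function names are hypothetical, and the sketch assumes that result rows are hashable and unique (for example because they carry a key); a result with legitimate duplicates would need per-row counts instead of a set.

```python
class DuplicateFilter:
    """Remembers every row already output so a later plan cannot repeat it."""

    def __init__(self):
        self._emitted = set()
        self.output = []

    def emit(self, row):
        """Output `row` unless an earlier plan already produced it."""
        if row in self._emitted:
            return False
        self._emitted.add(row)
        self.output.append(row)
        return True


def run_with_switch(plan_a_rows, plan_b_rows, switch_after):
    """Run plan A for `switch_after` rows, then restart the query with plan B."""
    out = DuplicateFilter()
    for i, row in enumerate(plan_a_rows):
        if i >= switch_after:
            break  # re-optimization decided to abandon plan A here
        out.emit(row)
    for row in plan_b_rows:
        out.emit(row)  # rows plan A already produced are silently dropped
    return out.output
```

The set of emitted rows is part of the plan state that must survive the switch, which is one reason the previous slide lists plan state as a separate issue.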

17
Run-time overhead
  • To ensure adaptability, an AQP system incurs
    run-time overhead
  • Low rate of change
    • Plan-based
      • Very low overhead
    • CQ-based
      • - Overhead is dominated by tracking statistics
    • Routing-based
      • Low overhead of exploring alternative paths
  • High rate of change
    • Plan-based
      • - May use inefficient plans because of
        limited re-optimization opportunities
    • CQ-based
      • More resilient because profiling enables
        faster, more complete statistics tracking
    • Routing-based
      • - Inefficient routes may always be in use
        because exploration takes time to converge

18
Thrashing
  • Thrashing happens when an AQP system spends most
    of its resources on adaptivity-related overhead.
  • Safeguards for minimizing thrashing
    • Limiting re-optimization points, e.g.
      re-optimizing only at blocking operators in
      plans
    • Limiting the number of times re-optimization
      can be invoked
    • Setting a minimum number of tuples processed,
      or a minimum time interval, between any two
      invocations of re-optimization
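
The last two safeguards can be combined into a small gate that the executor consults before calling the optimizer. This Python sketch is only an illustration: the class name, parameter names, and default limits are assumptions, and it counts tuples rather than time.

```python
class ReoptimizationGuard:
    """Allows re-optimization only a bounded number of times and only after
    enough tuples have been processed since the previous invocation."""

    def __init__(self, max_invocations=3, min_tuples_between=1000):
        self.max_invocations = max_invocations
        self.min_tuples_between = min_tuples_between
        self.invocations = 0
        self._last_at = None  # tuple count at the previous invocation

    def allow(self, tuples_processed):
        """Return True (and record the invocation) if re-optimizing is allowed."""
        if self.invocations >= self.max_invocations:
            return False
        if (
            self._last_at is not None
            and tuples_processed - self._last_at < self.min_tuples_between
        ):
            return False
        self.invocations += 1
        self._last_at = tuples_processed
        return True
```

With `ReoptimizationGuard(max_invocations=2, min_tuples_between=100)`, a request at 50 tuples is allowed, a request at 100 is refused (only 50 tuples later), a request at 150 is allowed, and every later request is refused because the budget of two invocations is spent.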

19
New Approach to AQP: Proactive optimization
  • Current plan-based systems use an optimizer to
    generate the plan, then detect and respond to
    suboptimalities (reactive optimization)
  • Drawback: when the plan is chosen, a conventional
    optimizer does not consider issues affecting
    re-optimization
  • Proactive optimization: query plans are chosen
    with re-optimization in mind.

20
Issues considered during proactive re-optimization
  • The potential overhead of tracking statistics
    during execution, the possibility of a plan
    change, and plan switching
  • The potential for reuse of work in case a plan
    change is required
  • The ability to identify quickly whether the
    current execution plan is suboptimal
  • The ability to decrease uncertainty in
    statistics

21
Potential run-time overhead for adaptability
  • Consider a join of relations R and S, where
    • R is small
    • S is large and has a clustered index on the
      join attribute
  • An indexed nested-loop join (INLJ) with R as the
    outer will outperform a hybrid hash join (HHJ)
  • If the size of R increases, the HHJ starts to
    outperform the INLJ
  • If the size of R is uncertain, either
    • Use the safe HHJ instead of the risky INLJ,
      because the INLJ might require a change of plan
    • Or, if the size of R is known with a certain
      confidence to be small, prefer the INLJ over
      the HHJ
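
The reasoning on this slide can be made concrete with a toy cost model in Python. Everything numeric here is an assumption chosen only to show the crossover: the per-probe and per-row cost constants, the linear cost formulas, and the function names are hypothetical and are not taken from the paper or from any real optimizer.

```python
def inlj_cost(r_rows, probe_cost=3.0):
    """Toy cost of an indexed nested-loop join: one index probe per row of R."""
    return r_rows * probe_cost


def hhj_cost(r_rows, s_rows, per_row_cost=0.01):
    """Toy cost of a hybrid hash join: every row of R and S is read once."""
    return (r_rows + s_rows) * per_row_cost


def choose_join(r_upper_bound, s_rows):
    """Pick the risky INLJ only if it still wins at the upper bound on |R|.

    `r_upper_bound` is None when nothing is known about the size of R,
    in which case the safe HHJ is chosen.
    """
    if r_upper_bound is None:
        return "HHJ"
    if inlj_cost(r_upper_bound) <= hhj_cost(r_upper_bound, s_rows):
        return "INLJ"
    return "HHJ"
```

With these made-up constants and one million rows in S, an upper bound of 100 rows for R selects the INLJ (cost 300 against roughly 10,001), while an upper bound of 100,000 rows selects the HHJ (cost 300,000 against 11,000).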

22
Potential reuse of work
  • A plan P can also be considered risky if the
    system may not be able to reuse any of the work
    done by P if re-optimization is required.
  • One approach is to generate plans with shorter
    pipelined segments and more materialization points

23
Identifying Plan Suboptimality Faster
  • Some plans allow more assertions to be checked
    concurrently.
  • The system can discover suboptimalities of
    downstream operators in the plan long before the
    upstream operators have finished execution

24
Detecting Uncertainty in Statistics
  • Consider the join of relations R and S (as in
    the previous example)
  • Suppose that uncertainty in the size of R comes
    from the presence of selective predicates p1 and
    p2 on R
  • The optimizer can choose to estimate the combined
    selectivity of p1 and p2 from a sample of R
    before choosing the join algorithm
  • The optimizer can choose pipelined plans for some
    queries and use profiling to estimate the
    required statistics
  • The optimizer can explore multiple selected
    subplans to collect statistics
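
Estimating the combined selectivity of p1 and p2 from a sample of R can be sketched as follows in Python. The function name and the representation of predicates as Python callables are assumptions for illustration; the slide does not prescribe a sampling method, so uniform sampling without replacement is used here.

```python
import random


def estimate_selectivity(rows, predicates, sample_size, seed=0):
    """Fraction of a uniform random sample of `rows` satisfying all predicates."""
    rows = list(rows)
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    if not sample:
        return 0.0
    hits = sum(1 for row in sample if all(p(row) for p in predicates))
    return hits / len(sample)
```

The point of sampling the conjunction directly is that it captures correlation between the predicates. If p1 and p2 each select 10% of R but always hold on the same rows, the true combined selectivity is 10%, whereas multiplying the individual selectivities under an independence assumption would give 1%.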

25
New Approach for AQP: Plan Logging
  • With plan logging for a continuous query Q, the
    statistics tracker logs the statistics relevant
    to Q
  • The optimizer logs the plan it picked based on
    those statistics.
  • The information in the log for Q can be used as
    follows
    • Log entries that contain the same plan P for
      query Q can be grouped together
    • The AQP system can identify the statistics
      whose changes contribute most to significant
      changes in plan performance
    • The captured history can be used for online
      what-if analysis.
    • The system can identify statistics that are
      prone to transient changes
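
A plan log and its first use (grouping entries by plan) might look like the Python sketch below. The log format, a list of (statistics, plan) pairs with statistics as a dictionary and the plan as a label, is an assumption made for illustration, as are the function names.

```python
from collections import defaultdict


def group_by_plan(log):
    """Group logged statistics snapshots by the plan the optimizer picked."""
    groups = defaultdict(list)
    for stats, plan in log:
        groups[plan].append(stats)
    return dict(groups)


def stat_ranges(entries):
    """For one plan, the (min, max) of each statistic seen while it was chosen."""
    ranges = {}
    for stats in entries:
        for name, value in stats.items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges
```

Comparing the ranges returned by `stat_ranges` across plans gives a rough picture of which statistics the plan choice is sensitive to: a statistic whose ranges barely overlap between two plans is a likely driver of the switch between them.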

26
Conclusions
  • The main contribution of the paper is the
    identification of the three families of AQP
    systems and the detailed comparison of these
    systems.
  • Gives an idea of the optimization aspects that
    one needs to consider carefully in AQP.
  • Underlines the boundary between performance gain
    and loss when a plan is changed.