Adaptive Query Processing in Looking Glass - PowerPoint PPT Presentation

Slides: 27

Provided by: wimA3

Transcript and Presenter's Notes

1
Adaptive Query Processing in Looking Glass
2
Inside the paper

Detailed qualitative comparison of Adaptive Query
Processing systems.
Proposal for two new approaches for adaptive
query processing.

3
Plan-first execute-next

The optimizer picks an efficient plan for a
query.
This plan is used to execute the query.
Effect: Database systems become critically
dependent on the optimizer.
Problem: Optimizers can make mistakes because of
Outdated statistics
Invalid assumptions
Lack of statistics
Unpredictable and changeable environment

4
Adaptive Query Processing

Approach to avoid the performance penalty
The optimization and execution stages are
interleaved
Query processing becomes more robust by
Correcting the optimizer's mistakes
Coping with Unknown Statistics
Reacting to Changes in Input Characteristics and
System Conditions

5
Plan-first execute-next approach
6
Adaptive Query Processing Systems

Plan-based
Routing-based
CQ-based

7
Plan-based Systems
8
Routing-based Systems
9
CQ-based Systems
10
Pros and Cons of using an Optimizer

Effect on plan quality
Using an optimizer
+ Can consider large, complex plan spaces
- Susceptible to estimation errors (the optimizer's
mistakes)
No optimizer
+ Routing is based on observed, accurate
statistics
- Routing algorithms are usually greedy and
designed for smaller, simpler plan spaces
Complexity
- Optimizers are complex modules
+ Routing-based systems are simple
Run-time overhead
Using an optimizer
- Context switching between optimizer and
executor
- Plan enumeration and costing may be applied
several times
No optimizer
- Overhead of routing
- Have to continuously ensure that routes
correspond to valid plans

11
Tracking Statistics

For verifying that the values observed during
execution match the optimizer's estimates.
For obtaining run-time values for use in future
optimization steps
Statistics can be observed
Directly from the execution of a plan (in
plan-based and CQ-based systems)
From the flow of tuples along current routing
path (in routing-based systems)

12
Techniques used for gathering statistics

Observation
Used in plan-based systems
Statistics are collected on tuples that pass
through selected points in the query
Exploration
Used in routing-based systems
A fraction of the input tuples are routed along
routes different from the current best route
Competition
Similar to exploration
Routes the same set of tuples along multiple
competing routes
Profiling
A fraction of tuples are routed along all
operators
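
The exploration technique above can be sketched in a few lines. This is only an illustration under assumptions: the function name `choose_route`, the `explore_fraction` parameter, and the uniform choice among alternatives are invented for the example and are not taken from any particular routing-based system.

```python
import random

def choose_route(routes, best, explore_fraction, rng):
    """Pick the route for one tuple (hypothetical helper).

    Most tuples follow `best`; roughly `explore_fraction` of them are sent
    along a different route so that statistics for that route get refreshed.
    """
    alternatives = [r for r in routes if r != best]
    # Explore only if there is something to explore and the coin flip says so.
    if alternatives and rng.random() < explore_fraction:
        return rng.choice(alternatives)
    return best

# Usage: route 10 tuples, exploring about 20% of the time (assumed value).
rng = random.Random(42)
picks = [choose_route(["r1", "r2", "r3"], "r1", 0.2, rng) for _ in range(10)]
```

A larger `explore_fraction` refreshes statistics faster but sends more tuples along less efficient paths, which is the overhead trade-off listed for exploration.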

13
Analysis of tracking statistics techniques

Computational overhead
Observation: Depends on what statistics are
collected
Exploration: Depends on the fraction of tuples
sent along less efficient paths
Competition: Redundant processing of data
Profiling: Extra work on a random sample of data
Accuracy of Estimation
Observation: High, statistics are observations
Exploration: Susceptible to bias in routing
decisions and correlations
Competition: High, statistics are observations
Profiling: Depends on sampling fraction
Coverage of Statistics
Observation: Restricted to what we can observe
from the plan
Exploration: Limited by the large number of
alternative routes
Competition: Low, a limited number of competing
plans can be run
Profiling: Broad coverage of statistics

14
Re-optimization: When and how to re-optimize

In plan-based and CQ-based systems
re-optimization is invoked explicitly
during execution, the statistics that the
optimizer estimated will be tracked
the optimizer is invoked whenever an observed
value is found to be significantly different or
out of range from the value estimated by the
optimizer
In a routing-based system re-optimization is
invoked implicitly
the scheme used to route tuples to operators is
based on the current statistics
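
The explicit trigger used by plan-based and CQ-based systems can be illustrated with a small check. This is a sketch under assumptions: the name `needs_reoptimization` and the `tolerance` factor are hypothetical, and real systems define "significantly different" in their own ways.

```python
def needs_reoptimization(estimated, observed, tolerance=2.0):
    """Return True when observed and estimated values differ by more than
    a factor of `tolerance` (an assumed threshold, not a standard value)."""
    if estimated <= 0 or observed <= 0:
        # Ratios are meaningless here; flag any disagreement.
        return estimated != observed
    ratio = max(observed / estimated, estimated / observed)
    return ratio > tolerance

# Usage: the optimizer estimated 1000 rows, the executor saw 5000.
trigger = needs_reoptimization(estimated=1000, observed=5000)
```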

15
Re-optimization: Plan switching

Re-optimization in plan-based or CQ-based systems
may decide a new plan is better than the current
plan
Important issues when switching between plans
Correctness
The new plan must not output result tuples that
have been output by previous plans or miss some
of the result tuples
Reuse of work
The new plan should consider reusing the parts of
the query that were processed by previous plans
Attention to the efficiency of reusing work
Plan state
The techniques used for changing plans should
account for the state of the plan

16
Techniques used by AQP for plan switching

Avoiding duplicate results
Plan-based
No result is output until processing is complete
Keep track of tuples output so far to eliminate
duplicates in future
CQ-based
Access methods over streams are one-pass scan, so
duplicates are never generated
Routing-based
Routing constraints are enforced in order to
eliminate duplicate results
Reusing work done so far
Plan-based: materialized subexpressions are
reused on a cost basis
CQ-based: migrate state in temporary structures
Routing-based: migrate state in temporary
structures
Reducing switching time
Plan-based
New plan is started on new input
Combine data partitions processed by different
plans only after all sources are exhausted
CQ-based
Caches that can be dropped fast
Techniques to migrate state in parallel to
processing
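
One of the plan-based techniques listed above, keeping track of tuples output so far, can be sketched as follows. The function name is hypothetical, and the sketch assumes each result tuple is hashable and carries a unique identity (for example, source row ids); without that assumption, legitimate duplicate results would be dropped as well.

```python
def run_plans_without_duplicates(plan_outputs):
    """Yield each result tuple once, even if a later plan produces it again.

    `plan_outputs` is an iterable of per-plan result iterables, in the
    order the plans were executed (an assumed interface for illustration).
    """
    seen = set()
    for output in plan_outputs:
        for row in output:
            if row not in seen:
                seen.add(row)
                yield row

# Usage: the second plan re-derives (2, "b") after a plan switch.
old_plan = [(1, "a"), (2, "b")]
new_plan = [(2, "b"), (3, "c")]
results = list(run_plans_without_duplicates([old_plan, new_plan]))
```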

17
Run-time overhead

In order to ensure adaptability an AQP system
incurs run-time overhead
Low rate of change
Plan-based
+ Very low overhead
CQ-based
- Overhead is dominated by tracking statistics
Routing-based
+ Low overhead of exploring alternative paths
High rate of change
Plan-based
- May use inefficient plans because of limited
re-optimization opportunities
CQ-based
+ More resilient because profiling enables
faster, more complete statistics tracking
Routing-based
- Inefficient routes may stay in use because
exploration takes time to converge

18
Thrashing

It happens when an AQP system is spending most of
its resources in adaptivity-related overhead.
Safeguards for minimizing thrashing
Limiting re-optimization points, only
re-optimizing at blocking operators in plans
Limiting the number of times re-optimization can
be invoked
Setting the minimum number of tuples processed or
time interval between any two invocations of
re-optimization
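
Two of the safeguards above (a cap on invocations and a minimum number of tuples between invocations) can be combined in one small guard. This is a sketch with hypothetical names (`ReoptimizationGuard`, `max_invocations`, `min_tuples_between`), not the mechanism of any specific system.

```python
class ReoptimizationGuard:
    """Decides whether the optimizer may be re-invoked (illustrative only)."""

    def __init__(self, max_invocations, min_tuples_between):
        self.max_invocations = max_invocations
        self.min_tuples_between = min_tuples_between
        self.invocations = 0
        self.tuples_since_last = 0

    def record_tuples(self, n):
        """Called by the executor as it processes tuples."""
        self.tuples_since_last += n

    def try_reoptimize(self):
        """Return True, and consume one invocation, only if both limits allow it."""
        if self.invocations >= self.max_invocations:
            return False
        if self.tuples_since_last < self.min_tuples_between:
            return False
        self.invocations += 1
        self.tuples_since_last = 0
        return True

# Usage: at most 2 re-optimizations, at least 100 tuples apart (assumed values).
guard = ReoptimizationGuard(max_invocations=2, min_tuples_between=100)
guard.record_tuples(100)
allowed = guard.try_reoptimize()
```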

19
New Approach to AQP: Proactive optimization

Current plan-based systems use an optimizer to
generate the plan, then detect and respond to
suboptimalities (reactive optimization).
Drawback: when the plan is chosen, a conventional
optimizer does not consider issues affecting
re-optimization.
Proactive optimization: query plans are chosen
with re-optimization in mind.

20
Issues considered during proactive optimization

The potential overhead of tracking statistics
during execution, possibility of change and plan
switching
The potential for reuse of work in case a plan
change is required
The ability to identify quickly whether the
current execution plan is suboptimal
The ability to decrease uncertainty in
statistics

21
Potential run-time overhead for adaptability

Consider join of relations R and S
Consider
R small
S large and has a clustered index on the join
attribute
An indexed nested loop join (INLJ) with R as the
outer will outperform a hybrid hash join (HHJ)
If the size of R increases, the HHJ starts to
outperform the INLJ
If the size of R is unknown
Use the safe HHJ instead of the risky INLJ, because
the INLJ might require a change of plan
If the size of R is known with a certain
confidence to be small, prefer the INLJ over
the HHJ
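
The reasoning in this example can be made concrete with a toy cost model. Everything numeric here is an assumption made for illustration: the per-probe cost, the per-row cost, and the function names do not come from the presentation, and real optimizers use far more detailed cost formulas.

```python
def inlj_cost(r_rows, probe_cost=3.0):
    """Toy INLJ cost: one index probe into S per row of R (assumed model)."""
    return r_rows * probe_cost

def hhj_cost(r_rows, s_rows, row_cost=0.01):
    """Toy HHJ cost: every row of R and S is read and hashed once (assumed model)."""
    return (r_rows + s_rows) * row_cost

def choose_join(r_rows_upper, s_rows):
    """Pick the INLJ only if it still wins at the upper bound on |R|.

    With probe_cost > row_cost the INLJ's advantage shrinks as |R| grows,
    so winning at the upper bound means winning for every smaller |R|.
    """
    if inlj_cost(r_rows_upper) <= hhj_cost(r_rows_upper, s_rows):
        return "INLJ"
    return "HHJ"

# Usage: S has one million rows; R is believed to have at most 100 rows.
plan = choose_join(r_rows_upper=100, s_rows=1_000_000)
```

If the bound on the size of R is loose, the same function falls back to the HHJ, which matches the "safe plan" choice described above.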

22
Potential reuse of work

A plan P can also be considered risky if the
system may not be able to reuse any of the work
done by P if re-optimization is required.
One approach is to generate plans with shorter
pipelined segments and more materialization points

23
Identifying Plan Suboptimality Faster

Some plans allow more assertions to be checked
concurrently.
The system can discover suboptimalities of
downstream operators in the plan long before the
upstream operators have finished execution

24
Detecting Uncertainty in Statistics

Consider the join of relation R and S (like in
the previous example)
Suppose that uncertainty in the size of R comes
from the presence of selective predicates p1 and
p2 on R
The optimizer can choose to estimate the combined
selectivity of p1 and p2 from a sample of R
before choosing the join algorithm
The optimizer can choose pipelined plans for some
queries and profiling to estimate required
statistics
The optimizer can explore multiple selected
subplans to collect statistics
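
Estimating the combined selectivity of p1 and p2 from a sample of R could look like the sketch below. The function name, the row representation (dictionaries), and the uniform random sampling are assumptions for illustration only.

```python
import random

def estimate_selectivity(rows, predicates, sample_size, rng):
    """Estimate the fraction of rows that satisfy all predicates together.

    Evaluating the predicates jointly on sampled rows captures correlation
    between them, which multiplying separate selectivities would miss.
    """
    sample = rng.sample(rows, min(sample_size, len(rows)))
    if not sample:
        return 0.0
    matches = sum(1 for row in sample if all(p(row) for p in predicates))
    return matches / len(sample)

# Usage: hypothetical relation R with two correlated predicates.
R = [{"a": i % 2, "b": i % 4} for i in range(1000)]
p1 = lambda row: row["a"] == 0
p2 = lambda row: row["b"] == 0
estimate = estimate_selectivity(R, [p1, p2], sample_size=200, rng=random.Random(7))
```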

25
New Approach for AQP: Plan Logging

With plan-logging for a continuous query Q, the
statistics tracker logs the statistics relevant
to Q
The optimizer logs the plan it picked based on
those statistics.
The information in the log for Q can be used as
follows
Grouping together log entries that contain the
same plan P for query Q
The AQP can identify those statistics whose
changes most contribute to significant changes in
plan performance
The history captured can be used for online
what-if analysis.
It can identify statistics that are prone to
transient changes
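
The grouping step described above can be sketched as follows. The log format (pairs of a statistics dictionary and a plan name) and both function names are assumptions for illustration; the presentation does not specify a log layout.

```python
from collections import defaultdict

def group_log_by_plan(log):
    """Group logged statistics by the plan the optimizer picked for them."""
    groups = defaultdict(list)
    for stats, plan in log:
        groups[plan].append(stats)
    return dict(groups)

def stat_ranges(groups):
    """For each plan, report the (min, max) observed for every statistic.

    Assumes every log entry for a plan records the same statistic names.
    """
    ranges = {}
    for plan, entries in groups.items():
        names = entries[0].keys()
        ranges[plan] = {
            n: (min(e[n] for e in entries), max(e[n] for e in entries))
            for n in names
        }
    return ranges

# Usage: hypothetical log entries for one continuous query Q.
log = [
    ({"sel": 0.1, "rate": 100}, "P1"),
    ({"sel": 0.2, "rate": 120}, "P1"),
    ({"sel": 0.8, "rate": 110}, "P2"),
]
ranges = stat_ranges(group_log_by_plan(log))
```

In this made-up log the `rate` ranges of the two plans overlap while the `sel` ranges do not, which hints that `sel` is the statistic whose change drives the plan choice.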

26
Conclusions

The main contribution of the paper is the
identification of the three families of AQP
systems and the detailed comparison of these systems.
It gives an idea of the optimization aspects that
one needs to consider carefully in AQP.
It underlines the boundary between performance gain
and loss when changing plans.
