Title: LEO - DB2's Learning Optimizer
1 LEO - DB2's Learning Optimizer
- Authors: Michael Stillger, Guy Lohman, Volker Markl, Mokhtar Kandil
- For: Self-Managing Computing Systems
- Course Instructor: Dr. Ying Lu
- Presented by: Vishwanath Gurav, Department of Computer Science, University of Nebraska-Lincoln
- Date: 2/10/2006
2 DBMS Query Optimization
- Most queries involve complex joins
- The same query may have several semantically equivalent alternative execution plans
- Execution cost (running time) varies between plans, so different plans can deliver very different performance
- The query optimizer generates the input for the execution engine: it takes a parsed representation of an SQL query and produces an efficient/optimal execution plan from the space of possible execution plans
3 Query Optimization: Execution Cost
- Execution cost depends on cardinality and on statistics of database characteristics
- Typical query optimizers rely on several assumptions:
- Currency of information: statistics reflect the current DBMS state
- Uniformity: no skew in the values referenced by local selection predicates
- Independence of predicates
- Principle of inclusion: the smaller domain is contained in the larger domain
- What if these assumptions are wrong? Errors in cardinality and cost estimates, and hence in plans
- Cardinality: the number of rows processed at a point in the QEP after application of one or more predicates
- Selectivity of predicate p: the probability that any row from the database satisfies p
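To make these definitions concrete, here is a minimal sketch (the table, values, and predicates are hypothetical, not from the paper) that computes selectivities and shows how the independence assumption can go wrong when columns are correlated:

```python
# Hypothetical base table: each row is (price, city). Invented for illustration.
rows = [(120, "Denver"), (80, "Denver"), (150, "Omaha"), (200, "Denver")]

def selectivity(rows, pred):
    """Selectivity of a predicate: fraction of rows that satisfy it."""
    return sum(1 for r in rows if pred(r)) / len(rows)

s_price = selectivity(rows, lambda r: r[0] > 100)        # 3/4
s_city = selectivity(rows, lambda r: r[1] == "Denver")   # 3/4

# Under the independence assumption, the optimizer multiplies selectivities:
est_combined = s_price * s_city                          # 9/16
# The true combined selectivity differs because the columns are correlated:
true_combined = selectivity(rows, lambda r: r[0] > 100 and r[1] == "Denver")  # 1/2

# Estimated cardinality = selectivity * table cardinality
est_card = est_combined * len(rows)
true_card = true_combined * len(rows)
```

Here the independence assumption estimates 2.25 qualifying rows while the actual count is 2; with strongly correlated columns the gap can be far larger.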
4 Problems solved by LEO
- Limitations of previous work: inefficient cardinality and statistics adjustments, high cost of scanning the whole base table, no plans for aggregation and complex joins, no dynamic adjustment across queries
- Implementation overview: learn from experiential information and adjust database statistics
- Learn from any modeling error at any point in a QEP
- Monitor previously executed queries
- Compare the optimizer's estimates with actuals at each step of the QEP
- Determine where cardinality estimation errors were encountered
- Compute adjustments to the model (cost estimates, statistics) dynamically
- Use them for future query executions to overcome past mistakes
5 Problems solved by LEO
- LEO can also adjust parameters such as buffer utilization, I/O, actual running time, and sort heap consumption
- But this includes the cost of collecting actuals for these estimates
6 Learning Optimizer
7 Learning Optimizer
- SELECT FROM X, Y, Z
- WHERE X.Price > 100 AND Z.City = 'Denver'
- AND Y.Month = 'Dec' AND X.ID = Y.ID
- AND Y.NR = Z.NR
- GROUP BY A
- Sample SQL query
8 Practical Considerations
- Modifying statistics vs. adjusting selectivities
- Two-layered approach: no direct updates to the original catalog statistics
- A second set of statistics holds the adjustments to the optimizer's estimates
- Advantage: a wrong adjustment can simply be ignored
- A specific adjustment and the plan that uses it can be stored together, avoiding wrong adjustments
- Easily accessible mechanism for tuning selectivities
9 Practical Considerations
- Consistency between statistics
- DB2 collects statistics for base tables, columns, indexes, table spaces, and functions, which may be interdependent
- DB2 generates statistics incrementally and checks for inconsistency
- LEO maintains consistency of interdependent classes (e.g., the number of rows in a table vs. the number of occupied disk pages)
10 Practical Considerations
- Currency and accuracy
- Creating statistics is costly and still not accurate enough
- Fabricated statistics, e.g., derived from file size
- Updating statistics: the adjustments must be readjusted properly so that no information is lost
11 Practical Considerations
- LEO vs. database statistics
- Improvement for repetitive queries
- Gathers information on derived tables, and more detailed information than RUNSTATS
- The RUNSTATS command is still necessary, even in the presence of LEO
- Improves estimates for the regions of the database that are queried most
12 LEO Feedback loop
- Retaining the plan and its estimates
- A section can be executed or stored
- The QEP is not retained in the section; it is provided at runtime of the section
- Threads: sequences of operators interpreted at runtime
- Plan skeleton: cumulative cardinality estimates for each operator
13 LEO Feedback loop
- Monitoring query execution
- Capture the actual number of rows processed by each operator (counters are used)
- Monitor every query in the workload all the time, with low overhead on query execution
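A rough sketch of the idea of per-operator row counters (the class and the operator names are illustrative, not DB2's actual runtime structures):

```python
class MonitoredOperator:
    """Wraps a row-producing iterator and counts the actual rows flowing
    through it, analogous to the runtime counters LEO reads after execution."""
    def __init__(self, name, source):
        self.name = name
        self.source = source
        self.actual_rows = 0

    def __iter__(self):
        for row in self.source:
            self.actual_rows += 1  # a cheap increment: low monitoring overhead
            yield row

# Usage: a table scan feeding a filter, both monitored.
scan = MonitoredOperator("TBSCAN(X)", iter([{"price": p} for p in (120, 80, 150)]))
flt = MonitoredOperator("FILTER(price>100)", (r for r in scan if r["price"] > 100))
list(flt)  # drain the plan; scan.actual_rows is now 3, flt.actual_rows is 2
```

After execution, the per-operator counts (3 rows scanned, 2 rows surviving the filter) are exactly the "actuals" that LEO later compares against the optimizer's estimates.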
14 LEO Feedback loop
- Analyzing actuals vs. estimates
- Can run as a batch process, a separate process, or incrementally and online (the best choice)
- Runs as a background process in spare cycles, with the help of a scheduler
- Analyzes and provides feedback for each query
- Working:
- The analysis component of LEO finds the plan skeleton for the query (hashed in memory)
- For each entry in the monitor dump file, it finds the matching skeleton plan (by probing the hash table)
- LEO maps the monitor counters for each section back to the proper QEP operator
15 LEO Feedback loop
- Adjustment calculations
- Operators (TBSCAN, NLJOIN) may have multiple predicates
- LEO finds the actual monitor data and analyzes each predicate according to the order of predicates
- Compares actual and estimated selectivities
- LEO deduces and stores an adjustment factor:
- adj. factor × est. selectivity = correct selectivity
- est_actual = stats × adj
- Storing adjustments: LEO_TABLE, LEO_COLUMNS, LEO_JOINS
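The adjustment rule above can be sketched as follows (the in-memory dict standing in for the LEO_COLUMNS catalog and the function names are invented for illustration):

```python
# Illustrative in-memory stand-in for one of LEO's adjustment catalogs.
LEO_COLUMNS = {}

def learn_adjustment(pred_key, est_selectivity, actual_selectivity):
    """Store adj so that adj * estimated selectivity = actual selectivity."""
    adj = actual_selectivity / est_selectivity
    LEO_COLUMNS[pred_key] = adj
    return adj

def adjusted_estimate(pred_key, est_selectivity):
    """Future estimate: est_actual = stats-based estimate * adj
    (an unseen predicate defaults to adj = 1.0, i.e., no correction)."""
    return est_selectivity * LEO_COLUMNS.get(pred_key, 1.0)

# The optimizer estimated selectivity 0.10, but monitoring showed that
# 0.25 of the rows actually qualified, so LEO learns adj = 2.5:
learn_adjustment(("X", "Price>100"), 0.10, 0.25)
# The next optimization of this predicate estimates 0.10 * 2.5 = 0.25.
corrected = adjusted_estimate(("X", "Price>100"), 0.10)
```

Because the adjustment lives in a second table rather than in the catalog statistics, discarding a bad adjustment is as simple as deleting its entry.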
16 LEO Feedback loop
- Using learned knowledge
- The DB2 optimizer retrieves schema statistics for each base table referred to by the query
- Computes selectivity and cardinality for each predicate
- LEARNING is enabled by a control flag
- Searches the catalog for relevant base-table cardinalities
- Adjusts predicate and base-table statistics
- The base-table cardinality is multiplied by the adjustment factor
- E.g., the number of pages from RUNSTATS should be reflected in LEO's NPAGES
- Consistency between index and table statistics is maintained
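Putting these steps together, here is a hedged sketch of how learned adjustments might feed a cumulative cardinality estimate, gated by a learning flag (the function and parameter names are invented for illustration):

```python
def estimate_cardinality(base_rows, selectivities, adjustments, learning_enabled=True):
    """Cumulative cardinality estimate: base-table cardinality multiplied by
    each predicate's selectivity, where each selectivity is corrected by its
    learned adjustment factor only when the LEARNING flag is on."""
    card = base_rows
    for sel, adj in zip(selectivities, adjustments):
        card *= sel * (adj if learning_enabled else 1.0)
    return card

# A 10,000-row base table with two predicates; the first predicate has a
# learned adjustment of 2.5, the second has no correction (adj = 1.0).
with_learning = estimate_cardinality(10_000, [0.10, 0.5], [2.5, 1.0])
without_learning = estimate_cardinality(10_000, [0.10, 0.5], [2.5, 1.0],
                                        learning_enabled=False)
```

With learning on, the estimate is 1,250 rows instead of the uncorrected 500, reflecting the previously observed underestimation of the first predicate.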
17 LEO Feedback loop
- For X.Price > 100: the adjusted selectivity follows from the cardinality adjustment
18 LEO Feedback loop
- Join predicates; GROUP BY, DISTINCT, UNION, EXCEPT
- Actuals and estimates for each operator repair previous errors in the join elements
- adj = act × old_adj / old_est (set operations)
- Correlation between predicates:
- Assuming independence of columns may cause errors in the selectivity estimate when columns are actually correlated
- The multidimensional histogram approach assumes prior knowledge of the correlation between columns
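The repair rule adj = act × old_adj / old_est above can be illustrated with a small sketch (all numbers are invented):

```python
def repair_adjustment(actual, old_estimate, old_adj):
    """Per the slide's rule for join/set operators:
    new adj = actual * old_adj / old_estimate, so that applying the new
    adjustment to the raw (unadjusted) estimate reproduces the observed actual."""
    return actual * old_adj / old_estimate

# The raw model estimate was 400 rows; an earlier adjustment of 1.5 made the
# optimizer's working estimate 600, but monitoring observed 900 actual rows.
raw_est = 400
old_adj = 1.5
old_estimate = raw_est * old_adj              # 600
new_adj = repair_adjustment(900, old_estimate, old_adj)  # 2.25
repaired = raw_est * new_adj                  # 900.0
```

Dividing by the old estimate while multiplying by the old adjustment keeps the corrections composable: each repair replaces the previous adjustment rather than stacking on top of it.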
19 Performance
- Monitoring overhead: on a 10 GB TPC-H database, about 5% of total execution time
- Enabling and disabling monitoring is permitted, which is useful for time-critical applications
- Benefit of learning:
- Adjusts outdated and incorrect information
- When the old and new QEPs differ, LEO can speed up the query
- The QEP changes in response to dramatic changes in the database
20 Advanced Topics
- When to re-optimize: for static and dynamic queries, adjustments may provide better alternative plans
- Currently LEO does not re-optimize static queries or flush the statement cache in response to newly learned knowledge
- Trade-off: cost of re-optimization vs. improved runtime
- Learning other information: self-tuning of DBMS parameters, such as buffer pools, sort heap size, network rate, disk access time, and disk transfer rate
- Common parameters for workload adjustment, as per user requirements
21 Conclusions
- Quality services from LEO:
- Learns and improves
- Can handle various errors: assumptions, modeling, literals
- Can adjust cardinality at any point in the plan
- Far more general than the multidimensional-histogram mechanism
- Low overhead from monitoring
- Improves query optimization
- Tunes problem queries, reducing cost of ownership
- Future work:
- Adjustments for all types of predicates
- Testing on realistic user queries
- Correlation among predicates
- Dynamic change during query execution