Title: LEO - DB2's Learning Optimizer
1 LEO - DB2's Learning Optimizer
- Authors: Michael Stillger, Guy Lohman, Volker Markl, Mokhtar Kandil
- For: Self-Managing Computing Systems
- Course Instructor: Dr. Ying Lu
- Presented by: Vishwanath Gurav, Department of Computer Science, University of Nebraska-Lincoln
- Date: 2/10/2006
2 DBMS Query Optimization
- Most queries involve complex joins
- The same query may have several semantically equivalent alternative execution plans
- Execution cost (running time) varies between plans, so different plans can deliver very different performance
- The query optimizer generates the input for the execution engine: it takes a parsed representation of an SQL query and produces an efficient/optimal execution plan from the space of possible execution plans
3 Query Optimization: Execution Cost
- Execution cost depends on cardinality and on statistics of database characteristics
- Typical query optimizers rely on several assumptions:
- Currency of information: statistics reflect the current DBMS state
- Uniformity: no skew in the values referenced by local selection predicates
- Independence of predicates
- Principle of inclusion: the smaller domain is contained in the larger domain
- What if these assumptions are wrong? Errors in cardinality and cost estimates, and hence in plans
- Cardinality: the number of rows processed at a point in the QEP after application of one or more predicates
- Selectivity of predicate p: the probability that any row from the database satisfies p
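To make these definitions concrete, here is a minimal sketch (the table, values, and predicates are hypothetical, not from the paper) that computes selectivities and shows how the independence assumption can go wrong when columns are correlated:

```python
# Hypothetical base table: each row is (price, city). Invented for illustration.
rows = [(120, "Denver"), (80, "Denver"), (150, "Omaha"), (200, "Denver")]

def selectivity(rows, pred):
    """Selectivity of a predicate: fraction of rows that satisfy it."""
    return sum(1 for r in rows if pred(r)) / len(rows)

s_price = selectivity(rows, lambda r: r[0] > 100)        # 3/4
s_city = selectivity(rows, lambda r: r[1] == "Denver")   # 3/4

# Under the independence assumption, the optimizer multiplies selectivities:
est_combined = s_price * s_city                          # 9/16
# The true combined selectivity differs because the columns are correlated:
true_combined = selectivity(rows, lambda r: r[0] > 100 and r[1] == "Denver")  # 1/2

# Estimated cardinality = selectivity * table cardinality
est_card = est_combined * len(rows)
true_card = true_combined * len(rows)
```

Here the independence assumption estimates 2.25 qualifying rows while the actual count is 2; with strongly correlated columns the gap can be far larger.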
4 Problems solved by LEO
- Limitations of previous work: inefficient cardinality and statistics adjustments, high cost of scanning the whole base table, no plans for aggregation and complex joins, no dynamic adjustment across queries
- Implementation overview: learn from experiential information and adjust database statistics
- Learn from any modeling error at any point in a QEP
- Monitor previously executed queries
- Compare the optimizer's estimates with actuals at each step of the QEP
- Determine where cardinality estimation errors were encountered
- Compute adjustments to the model (cost estimates, statistics) dynamically
- Use them for future query executions to overcome past mistakes
5 Problems solved by LEO
- LEO can also adjust parameters such as buffer utilization, I/O, actual running time, and sort heap consumption
- But this includes the cost of collecting actuals for these estimates
6 Learning Optimizer
7 Learning Optimizer
- SELECT FROM X, Y, Z
- WHERE X.Price > 100 AND Z.City = 'Denver'
- AND Y.Month = 'Dec' AND X.ID = Y.ID
- AND Y.NR = Z.NR
- GROUP BY A
- Sample SQL query
8 Practical Considerations
- Modifying statistics vs. adjusting selectivities
- Two-layered approach: no direct updates to the original catalog statistics
- A second set of statistics holds the adjustments to the optimizer's estimates
- Advantage: a wrong adjustment can simply be ignored
- A specific adjustment and the plan that uses it can be stored together, avoiding wrong adjustments
- Easily accessible mechanism for tuning selectivities
9 Practical Considerations
- Consistency between statistics
- DB2 collects statistics for base tables, columns, indexes, table spaces, and functions, which may be interdependent
- DB2 generates statistics incrementally and checks for inconsistency
- LEO maintains consistency of interdependent classes (e.g., the number of rows in a table vs. the number of occupied disk pages)
10 Practical Considerations
- Currency and accuracy
- Creating statistics is costly and still not accurate enough
- Fabricated statistics, e.g., derived from file size
- Updating statistics: the adjustments must be readjusted properly so that no information is lost
11 Practical Considerations
- LEO vs. database statistics
- Improvement for repetitive queries
- Gathers information on derived tables, and more detailed information than RUNSTATS
- The RUNSTATS command is still necessary, even in the presence of LEO
- Improves estimates for the regions of the database that are queried most
12 LEO Feedback loop
- Retaining the plan and its estimates
- A section can be executed or stored
- The QEP is not retained in the section; it is provided at runtime of the section
- Threads: sequences of operators interpreted at runtime
- Plan skeleton: cumulative cardinality estimates for each operator
13 LEO Feedback loop
- Monitoring query execution
- Capture the actual number of rows processed by each operator (counters are used)
- Monitor every query in the workload all the time, with low overhead on query execution
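A rough sketch of the idea of per-operator row counters (the class and the operator names are illustrative, not DB2's actual runtime structures):

```python
class MonitoredOperator:
    """Wraps a row-producing iterator and counts the actual rows flowing
    through it, analogous to the runtime counters LEO reads after execution."""
    def __init__(self, name, source):
        self.name = name
        self.source = source
        self.actual_rows = 0

    def __iter__(self):
        for row in self.source:
            self.actual_rows += 1  # a cheap increment: low monitoring overhead
            yield row

# Usage: a table scan feeding a filter, both monitored.
scan = MonitoredOperator("TBSCAN(X)", iter([{"price": p} for p in (120, 80, 150)]))
flt = MonitoredOperator("FILTER(price>100)", (r for r in scan if r["price"] > 100))
list(flt)  # drain the plan; scan.actual_rows is now 3, flt.actual_rows is 2
```

After execution, the per-operator counts (3 rows scanned, 2 rows surviving the filter) are exactly the "actuals" that LEO later compares against the optimizer's estimates.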
14 LEO Feedback loop
- Analyzing actuals vs. estimates
- Can run as a batch process, a separate process, or incrementally and online (the best choice)
- Runs as a background process in spare cycles, with the help of a scheduler
- Analyzes and provides feedback for each query
- Working:
- The analysis component of LEO finds the plan skeleton for the query (hashed in memory)
- For each entry in the monitor dump file, it finds the matching skeleton plan (by probing the hash table)
- LEO maps the monitor counters for each section back to the proper QEP operator
15 LEO Feedback loop
- Adjustment calculations
- Operators (TBSCAN, NLJOIN) may have multiple predicates
- LEO finds the actual monitor data and analyzes each predicate according to the order of predicates
- Compares actual and estimated selectivities
- LEO deduces and stores an adjustment factor:
- adj. factor × est. selectivity = correct selectivity
- est_actual = stats × adj
- Storing adjustments: LEO_TABLE, LEO_COLUMNS, LEO_JOINS
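The adjustment rule above can be sketched as follows (the in-memory dict standing in for the LEO_COLUMNS catalog and the function names are invented for illustration):

```python
# Illustrative in-memory stand-in for one of LEO's adjustment catalogs.
LEO_COLUMNS = {}

def learn_adjustment(pred_key, est_selectivity, actual_selectivity):
    """Store adj so that adj * estimated selectivity = actual selectivity."""
    adj = actual_selectivity / est_selectivity
    LEO_COLUMNS[pred_key] = adj
    return adj

def adjusted_estimate(pred_key, est_selectivity):
    """Future estimate: est_actual = stats-based estimate * adj
    (an unseen predicate defaults to adj = 1.0, i.e., no correction)."""
    return est_selectivity * LEO_COLUMNS.get(pred_key, 1.0)

# The optimizer estimated selectivity 0.10, but monitoring showed that
# 0.25 of the rows actually qualified, so LEO learns adj = 2.5:
learn_adjustment(("X", "Price>100"), 0.10, 0.25)
# The next optimization of this predicate estimates 0.10 * 2.5 = 0.25.
corrected = adjusted_estimate(("X", "Price>100"), 0.10)
```

Because the adjustment lives in a second table rather than in the catalog statistics, discarding a bad adjustment is as simple as deleting its entry.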
16 LEO Feedback loop
- Using learned knowledge
- The DB2 optimizer retrieves schema statistics for each base table referred to by the query
- Computes selectivity and cardinality for each predicate
- LEARNING is enabled by a control flag
- Searches the catalog for relevant base-table cardinalities
- Adjusts predicate and base-table statistics
- The base-table cardinality is multiplied by the adjustment factor
- E.g., the number of pages from RUNSTATS should be reflected in LEO's NPAGES
- Consistency between index and table statistics is maintained
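Putting these steps together, here is a hedged sketch of how learned adjustments might feed a cumulative cardinality estimate, gated by a learning flag (the function and parameter names are invented for illustration):

```python
def estimate_cardinality(base_rows, selectivities, adjustments, learning_enabled=True):
    """Cumulative cardinality estimate: base-table cardinality multiplied by
    each predicate's selectivity, where each selectivity is corrected by its
    learned adjustment factor only when the LEARNING flag is on."""
    card = base_rows
    for sel, adj in zip(selectivities, adjustments):
        card *= sel * (adj if learning_enabled else 1.0)
    return card

# A 10,000-row base table with two predicates; the first predicate has a
# learned adjustment of 2.5, the second has no correction (adj = 1.0).
with_learning = estimate_cardinality(10_000, [0.10, 0.5], [2.5, 1.0])
without_learning = estimate_cardinality(10_000, [0.10, 0.5], [2.5, 1.0],
                                        learning_enabled=False)
```

With learning on, the estimate is 1,250 rows instead of the uncorrected 500, reflecting the previously observed underestimation of the first predicate.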
17 LEO Feedback loop
- For X.Price > 100: the adjusted selectivity follows from the cardinality adjustment
18 LEO Feedback loop
- Join predicates; GROUP BY, DISTINCT, UNION, EXCEPT
- Actuals and estimates for each operator repair previous errors in the join elements
- adj = act × old_adj / old_est (set operations)
- Correlation between predicates:
- Assuming independence of columns may cause errors in the selectivity estimate when columns are actually correlated
- The multidimensional histogram approach assumes prior knowledge of the correlation between columns
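The repair rule adj = act × old_adj / old_est above can be illustrated with a small sketch (all numbers are invented):

```python
def repair_adjustment(actual, old_estimate, old_adj):
    """Per the slide's rule for join/set operators:
    new adj = actual * old_adj / old_estimate, so that applying the new
    adjustment to the raw (unadjusted) estimate reproduces the observed actual."""
    return actual * old_adj / old_estimate

# The raw model estimate was 400 rows; an earlier adjustment of 1.5 made the
# optimizer's working estimate 600, but monitoring observed 900 actual rows.
raw_est = 400
old_adj = 1.5
old_estimate = raw_est * old_adj              # 600
new_adj = repair_adjustment(900, old_estimate, old_adj)  # 2.25
repaired = raw_est * new_adj                  # 900.0
```

Dividing by the old estimate while multiplying by the old adjustment keeps the corrections composable: each repair replaces the previous adjustment rather than stacking on top of it.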
19 Performance
- Monitoring overhead: on a 10 GB TPC-H database, about 5% of total execution time
- Enabling and disabling monitoring is permitted, which is useful for time-critical applications
- Benefit of learning:
- Adjusts outdated and incorrect information
- When the old and new QEPs differ, LEO can speed up the query
- The QEP changes in response to dramatic changes in the database
20 Advanced Topics
- When to re-optimize: for static and dynamic queries, adjustments may provide better alternative plans
- Currently LEO does not re-optimize static queries or flush the statement cache in response to newly learned knowledge
- Trade-off: cost of re-optimization vs. improved runtime
- Learning other information: self-tuning of DBMS parameters, such as buffer pools, sort heap size, network rate, disk access time, and disk transfer rate
- Common parameters for workload adjustment, as per user requirements
21 Conclusions
- Quality services from LEO:
- Learns and improves
- Can handle various errors: assumptions, modeling, literals
- Can adjust cardinality at any point in the plan
- Far more general than the multidimensional-histogram mechanism
- Low overhead from monitoring
- Improves query optimization
- Tunes problem queries, reducing cost of ownership
- Future work:
- Adjustments for all types of predicates
- Testing on realistic user queries
- Correlation among predicates
- Dynamic change during query execution