1
LEO - DB2's Learning Optimizer
  • Authors:
  • Michael Stillger, Guy Lohman, Volker Markl,
    Mokhtar Kandil

For Self-Managing Computing Systems
Course Instructor: Dr. Ying Lu
Presented by: Vishwanath Gurav, Department of Computer
Science, University of Nebraska-Lincoln
Date: 2/10/2006
2
DBMS Query Optimization
  • Most queries involve complex joins
  • The same query may have several semantically
    equivalent alternative execution plans
  • The execution cost (running time) of different plans
    may vary, so they may deliver very different performance
  • The query optimizer is responsible for generating
    the input for the execution engine. It takes a
    parsed representation of a SQL query as input and
    is responsible for generating an
    efficient/optimal execution plan for the given
    query from the space of possible execution plans.

3
Query Optimization Execution Cost
  • Execution cost depends on:
  • Cardinality
  • Statistics of database characteristics
  • Typical query optimizers make several assumptions:
  • Currency of information (statistics and DBMS
    state)
  • Uniformity (no skew in local selections)
  • Independence of predicates
  • Principle of inclusion (smaller domain contained in larger
    domain)
  • What if these assumptions are wrong?
  • Errors in cardinality and cost estimates, and hence in plans
  • Cardinality: the number of rows processed at a point in the QEP
    after application of one or more predicates.
  • Selectivity of a predicate p: the probability
    that any row from the database satisfies
    p.
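The uniformity and independence assumptions above can be made concrete with a small sketch (illustrative Python, not DB2 code; all names are hypothetical):

```python
# Illustrative sketch of the uniformity and independence assumptions.
# All names here are hypothetical, not DB2's API.

def selectivity_uniform(distinct_values):
    # Uniformity: an equality predicate on a column with
    # `distinct_values` distinct values matches 1/distinct_values of rows.
    return 1.0 / distinct_values

def estimated_cardinality(table_rows, selectivities):
    # Independence: selectivities of conjoined predicates multiply.
    est = table_rows
    for s in selectivities:
        est *= s
    return est

# A 1,000,000-row table with equality predicates on columns having
# 100 and 12 distinct values:
est = estimated_cardinality(1_000_000,
                            [selectivity_uniform(100), selectivity_uniform(12)])
# If the two columns are correlated, the true cardinality can differ wildly,
# which is exactly the kind of error LEO is built to detect.
```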

4
Problems solved by LEO
  • Limitations of previous work:
  • Inefficient cardinality and statistics
    adjustments, high cost to scan the whole base table,
    no plans for aggregation and complex joins, no
    dynamic adjustment for varied queries.
  • Implementation overview: learn from
    experiential information and adjust database statistics,
    i.e. learn from any modeling error at any point in
    a QEP:
  • - Monitor previously executed queries
  • - Compare the optimizer's estimates with actuals
    at each step of the QEP
  • - Determine where cardinality estimation errors
    were encountered
  • - Compute adjustments to the model (cost estimates,
    statistics) dynamically
  • - Use them in future query executions to
    overcome past mistakes
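The estimate-vs-actual comparison at the heart of these steps can be sketched as follows (illustrative structure only; the operator names and dictionary layout are assumptions, not DB2 internals):

```python
# Illustrative sketch of LEO's core comparison: estimated vs. actual
# cardinality per QEP operator. Dictionary layout and operator names
# are assumptions, not DB2 internals.

def compute_adjustments(plan_skeleton, monitor_counters):
    """plan_skeleton: {operator: estimated cardinality};
    monitor_counters: {operator: actual rows processed}."""
    adjustments = {}
    for op, est in plan_skeleton.items():
        actual = monitor_counters.get(op)
        if actual is None or est == 0:
            continue  # no actuals monitored, or nothing to adjust
        adjustments[op] = actual / est  # >1 underestimate, <1 overestimate
    return adjustments

adj = compute_adjustments({"TBSCAN(X)": 1000, "NLJOIN": 50},
                          {"TBSCAN(X)": 4000, "NLJOIN": 45})
# TBSCAN(X) was underestimated 4x; future estimates on that operator's
# predicates would be multiplied by the stored factor.
```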

5
Problems solved by LEO
  • LEO can also adjust parameters such as buffer
    utilization, I/O, actual running time, and sort heap
    consumption.
  • But this includes the cost of collecting actuals for these
    estimates.

6
Learning Optimizer
7
Learning Optimizer
  • SELECT ... FROM X, Y, Z
  • WHERE X.Price > 100 AND Z.City = 'Denver'
  • AND Y.Month = 'Dec' AND X.ID = Y.ID
  • AND Y.NR = Z.NR
  • GROUP BY A
  • Sample SQL query

8
Practical Considerations
  • Modifying statistics vs. adjusting selectivities:
    a two-layered approach
  • No direct updates to the original catalog
    statistics.
  • A second set of statistics holds adjustments to
    the optimizer's estimates.
  • Advantage: a wrong adjustment can simply be ignored.
  • A specific adjustment and the plan that uses it can be
    stored together, to avoid applying a wrong adjustment.
  • Easily accessible mechanism for tuning
    selectivities.
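A minimal sketch of the two-layered idea, assuming a hypothetical adjustment table kept apart from the catalog:

```python
# Sketch of the two-layered approach: learned adjustments live beside,
# not inside, the catalog statistics. Both dictionaries here are
# hypothetical stand-ins for DB2's catalog and LEO's adjustment tables.

catalog_stats = {("X", "Price"): 0.01}   # original selectivity, never modified
leo_adjustments = {("X", "Price"): 3.5}  # learned multiplier, can be discarded

def effective_selectivity(table, column, learning_enabled=True):
    base = catalog_stats[(table, column)]
    if learning_enabled:
        base *= leo_adjustments.get((table, column), 1.0)
    return base
# A wrong adjustment is simply deleted from leo_adjustments; the
# catalog statistics themselves are never overwritten.
```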

9
Practical Considerations
  • Consistency between statistics:
  • DB2 collects statistics for base tables,
    columns, indexes, table spaces, and functions,
    which may be interdependent.
  • DB2 generates statistics incrementally and
    checks for inconsistency.
  • LEO maintains consistency between interdependent classes
    (e.g. the number of rows in a table vs. its number of
    disk pages).

10
Practical Considerations
  • Currency and accuracy:
  • Creating statistics is costly and may not be accurate
    enough.
  • Fabricated statistics (e.g. derived from file size).
  • When statistics are updated, the adjustments must be
    readjusted properly so that no learned information is lost.

11
Practical Considerations
  • LEO vs. database statistics:
  • Improvement for repetitive queries
  • Gathers information on derived tables, and more
    detailed information than RUNSTATS
  • The RUNSTATS command is still necessary, even in the
    presence of LEO
  • Improves estimates for the regions of the database
    that are queried most.

12
LEO Feedback loop
  • Retaining the plan and its estimates:
  • - A section can be executed or stored.
  • - The QEP is not retained in the section; it is provided
    at runtime of the section.
  • - Threads: sequences of operators interpreted at
    runtime.
  • - Plan skeleton: cumulative cardinality
    estimates for each operator.

13
LEO Feedback loop
  • Monitoring query execution:
  • - Capture the actual number of rows processed by each
    operator (counters are used)
  • - Monitor every query in the workload the whole time,
    with little overhead on query execution

14
LEO Feedback loop
  • Analyzing actuals vs. estimates:
  • - Could be a batch process, a separate process, or
    incremental and online (the best choice)
  • - Runs as a background process in spare cycles,
    with the help of a scheduler
  • - Analyzes and provides feedback for each query
  • Working:
  • - The analysis component of LEO finds the plan
    skeleton for the query (hashed in memory)
  • - For each entry in the monitor dump file it finds
    the matching skeleton plan (probing the hash table)
  • - LEO maps the monitor counters for each
    section back to the proper QEP operator

15
LEO Feedback loop
  • Adjustment calculations:
  • - Operators (TBSCAN, NLJOIN) with multiple predicates
  • - LEO finds the actual monitor data and analyzes each
    predicate according to the order of the predicates
  • Compares actual and estimated selectivities
  • LEO deduces and stores an adjustment factor:
  • adjustment factor × estimated selectivity = correct selectivity
  • adjusted estimate = statistics-based estimate × adjustment
  • Storing adjustments:
  • LEO_TABLE, LEO_COLUMNS, LEO_JOINS
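The per-predicate analysis can be sketched as follows: given monitored row counts in predicate application order, each predicate's actual selectivity falls out as a ratio (illustrative only; function and variable names are assumptions):

```python
# Sketch: recovering each predicate's actual selectivity from monitored
# row counts, in the order the predicates were applied. Names are
# illustrative, not DB2 internals.

def per_predicate_selectivities(rows_in, rows_after_each):
    """rows_in: rows entering the scan; rows_after_each: actual row
    counts surviving each predicate, in application order."""
    sels = []
    prev = rows_in
    for rows in rows_after_each:
        sels.append(rows / prev)  # actual selectivity of this predicate
        prev = rows
    return sels

# 10,000 rows scanned; 2,000 survive predicate 1; 500 survive predicate 2:
sels = per_predicate_selectivities(10_000, [2_000, 500])
# Each actual selectivity can then be compared with the optimizer's
# estimate to form an adjustment factor.
```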

16
LEO Feedback loop
  • Using learned knowledge:
  • The DB2 optimizer retrieves schema statistics for
    each base table referenced by the query
  • Computes the selectivity and cardinality of each
    predicate
  • LEARNING is enabled by a control flag
  • Searches the catalog for relevant base table
    cardinalities
  • Adjusts predicate and base table statistics
  • The base table cardinality is multiplied by the
    adjustment factor
  • E.g. the number of pages from RUNSTATS should be
    reflected in LEO's NPAGES.
  • Consistency between index and table statistics.

17
LEO Feedback loop
  • Single Table Predicates

For X.Price < 100: adjusted selectivity =
cardinality × adjustment
18
LEO Feedback loop
  • Join predicates: GROUP BY, DISTINCT, UNION,
    EXCEPT
  • Actuals and estimates for each operator repair
    previous errors in the join elements
  • adj = actual × old_adj / old_est (set operations)
  • Correlation between predicates:
  • Independence of columns is assumed
  • Correlations between columns may cause errors in the
    selectivity estimate
  • Multidimensional histogram approach (assumes
    prior knowledge of the correlation between columns)
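The update rule adj = actual × old_adj / old_est can be sketched as follows (hypothetical names; the point is that the old adjustment must be factored back out of the old estimate):

```python
# Sketch of the update rule adj = actual * old_adj / old_est.
# Names are hypothetical; the old estimate already contained the old
# adjustment, which must be divided back out.

def updated_adjustment(actual, old_estimate, old_adjustment):
    # old_estimate = raw_estimate * old_adjustment, so dividing it out
    # recovers the multiplier to apply to the raw (statistics-only) estimate.
    return actual * old_adjustment / old_estimate

# Raw estimate 100 rows with old adjustment 2.0 gives old estimate 200;
# the operator actually produced 500 rows:
new_adj = updated_adjustment(500, 200, 2.0)
```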
19
Performance
Monitoring overhead: on a 10 GB TPC-H database, about
5% overhead of total execution time. Enabling and
disabling monitoring is permitted, which is useful for
time-critical applications.
  • Benefits of learning:
  • Adjusts outdated and incorrect
    information
  • Where the old and new QEPs differ, LEO can
    speed up the query

The QEP may change due to dramatic changes in the database
20
Advanced Topics
When to re-optimize: for static and dynamic
queries, adjustments may reveal better
alternative plans. Currently LEO does not
re-optimize static queries or flush the statement cache
based on learned knowledge. There is a trade-off between
re-optimization and improved runtime.
Learning other information: self-tuning of DBMS
parameters, e.g. changes to the following:
buffer pools, sort heap size, network
rate, disk access time, disk transfer rate. These are
common parameters for workload adjustment,
as per user requirements.
21
Conclusions
  • Quality of service from LEO:
  • Learns and improves
  • Can handle various errors: assumptions,
    modeling, literals
  • Can change cardinality estimates at any point
  • Far more general than the multidimensional-histogram
    mechanism
  • Low overhead from monitoring
  • Improvement in query optimization
  • Tuning of problem queries; reduced cost of ownership
  • Future work:
  • Adjustments for all types of predicates
  • Testing on realistic user queries
  • Correlation among predicates
  • Dynamic change during query execution