Query Optimization Strategies - PowerPoint PPT Presentation

About This Presentation
Title:

Query Optimization Strategies

Description:

Some initial suggestions for the project proposal ... since they didn't pay off immediately had to hack them into sort-merge join ... – PowerPoint PPT presentation

Number of Views:45
Avg rating:3.0/5.0
Slides: 30
Provided by: zack4
Category:

less

Transcript and Presenter's Notes

Title: Query Optimization Strategies


1
Query Optimization Strategies
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 650 Implementing Data Management Systems
  • January 31, 2005

2
Administrivia
  • Wednesday
  • Some initial suggestions for the project proposal
  • Scheduling of the deadline for your midterm
    report
  • The next assignment
  • Read Gray et al. (granularity of locks) and Kung
    and Robinson (optimistic concurrency control)
    papers
  • Review Kung and Robinson
  • Consider how optimistic CC is or isnt useful in
    Web-scale data management

3
Todays Trivia Question
4
Query Optimization
  • Challenge pick the query execution plan that
    has minimum cost
  • Sources of cost
  • Interactions with other work
  • Size of intermediate results
  • Choices of algorithms, access methods
  • Mismatch between I/O, CPU rates
  • Data properties skew, order, placement

5
The General Model of Optimization
  • Given an AST of a query
  • Build a logical query plan
  • (Tree of query algebraic operations)
  • Transform into better logical plan
  • Convert into a physical query plan
  • (Includes strategies for executing operations)

6
Strategies
  • Basically, search, over the space of possible
    plans
  • At least of exponential complexity in the number
    of operators
  • Hence, exhaustive search is generally not
    feasible
  • What can you do?
  • Heuristics only INGRES, Oracle until the
    mid-90s
  • Randomized, simulated annealing, many efforts
    in the mid-90s
  • Heuristics plus cost-based join enumeration
    System-R
  • Stratified search heuristics plus cost-based
    enumeration of joins and a few other operators
    Starburst optimizers
  • Unified search full cost-based search EXODUS,
    Volcano, Cascades optimizers

7
What Are the Cost Estimating Factors?
  • Some notion of CPU, disk speeds, page sizes,
    buffer sizes,
  • Cost model for every operator
  • Some information about tables and data
  • Sizes
  • Cardinalities
  • Number of unique values (from index)
  • Histograms, sketches,

8
The System-R Approach Heuristics with
Cost-Based Join Enumeration
  • Make the following assumptions
  • All predicates are evaluated as early as possible
  • All data is projected away as early as possible
  • Separately consider operations that produce
    intermediate state or are blocking
  • Joins
  • Aggregation
  • Sorting
  • Correlation with a subquery (join, exists, )
  • By choosing a join ordering, were automatically
    choosing where selections and projections are
    pushed why is this so?

9
System-R Architecture
  • Breaks a query into its blocks, separately
    optimizes them
  • Nested loops join between them, if necessary
  • Within a block focuses on joins (only a few
    kinds) in dynamic programming enumeration
  • Principle of optimality best k-way join includes
    best (k-1)-way join
  • Use simple table statistics when available, based
    on indices magic numbers where unavailable
  • Heuristics
  • Push sargable selects, projects as low as
    possible
  • Cartesian products only after joins
  • Left-linear trees only n2n-1 cost-estimation
    operations
  • Grouping last
  • Extra interesting orders dimension
  • Grouping, ordering, join attributes

10
Example
  • Schema R(a,b), S(b,c), T(a,c), U(c,d)
  • SELECT d, AVG(c)FROM R,S,T,UWHERE R.aS.b AND
    S.cT.c AND R.aT.a AND T.cU.cGROUP BY dORDER
    BY d
  • In relational algebra ?d/AVG(c)(?d(sa lt 2 (R ?
    S ? T ? U)))

11
Why We Need More than System-R
  • Cross-query-block optimizations
  • e.g., push a selection predicate from one block
    to another
  • Better statistics
  • More general kinds of optimizations
  • Optimization of aggregation operations with joins
  • Different cost and data models, e.g., OO, XML
  • Additional joins, e.g., containment joins
  • Can we build an extensible architecture for this?
  • Logical, physical, and logical-to-physical
    transformations
  • Enforcers
  • Alternative search strategies
  • Left-deep plans arent always optimal
  • Perhaps we can prune more efficiently

12
Optimizer Generators
  • Idea behind EXODUS and StarburstBuild a
    programming language for writing custom
    optimizers!
  • Rule-based or transformational language
  • Describes, using declarative template and
    conditions, an equivalent expression
  • EXODUS compile an optimizer based on a set of
    rules
  • Starburst run a set of rules in the
    interpreter a client could customize the rules

13
Starburst Query Optimizer
  • Corona query processor vs. Core engine follows
    RDS and RSS separation
  • Part of a very ambitious project
  • Hydrogen language highly orthogonal (unlike SQL),
    and supported many fancy OO concepts (e.g.,
    inheritance, methods), recursion special
    constraints, etc. much more powerful than SQL
    at the time
  • Some portions of Hydrogen and Corona made their
    way into DB2, later SQL standards
  • Two stages (stratified search)
  • Query rewrite/query graph model a
    SQL-block-level, relational calculus-like
    representation of queries
  • Plan optimization a System-R-style dynamic
    programming phase once query rewrite has completed

14
Starburst QGM
  • Tries to encode relational calculus-like
    concepts
  • Predicates between variables within each SQL
    block body
  • Variables can be
  • Distinguished (set-builder / for-each / F)
  • Existential (?)
  • Universal (?)
  • Returned values from each block (head)
  • Predicates across blocks

15
Starburst QGM Example
  • SELECT partno, price, order_qty
  • FROM quotations Q1
  • WHERE Q1.partno IN
  • (SELECT partnoFROM inventory Q3WHERE
    Q3.onhand_qty lt Q1.order_qtyAND Q3.type cpu

16
Starburst Query Rewrite
  • Focus inter-block optimizations
  • Pushing predicates across views, pushing
    projections down
  • Magic sets rewritings
  • Simplification, transitivity
  • Implemented through production rules
  • Condition action rules selected by
  • Sequence
  • Priority
  • Probability distribution
  • Search may stop after a budget
  • End product logical plan(s), chosen via above
    constraints
  • Normally set is singleton
  • Can also CHOOSE among set by invoking the
    cost-based plan optimizer

17
Query Rewrite Example
  • Convert subquery to join
  • IF OP1.type Select Æ Q2.type 9 Æ (at each
    evaluation of the existential predicate at most
    one tuple of T2 satisfies the predicate)
  • THEN Q2.type F
  • Merge operations
  • IF OP1.type Select Æ OP2.type Select Æ
    Q2.type F Æ (NOT (T1.distinct false Æ
    OP2.eliminate_duplicate true))
  • THEN
  • merge OP2 into OP1
  • IF OP2.eliminate_duplicateTHEN
  • OP1.eliminate_duplicate true

18
Query Rewrite Example
  • Convert subquery to join
  • IF OP1.type Select Æ Q2.type 9 Æ (at each
    evaluation of the existential predicate at most
    one tuple of T2 satisfies the predicate)
  • THEN Q2.type F
  • Merge operations
  • IF OP1.type Select Æ OP2.type Select Æ
    Q2.type F Æ (NOT (T1.distinct false Æ
    OP2.eliminate_duplicate true))
  • THEN
  • merge OP2 into OP1
  • IF OP2.eliminate_duplicateTHEN
  • OP1.eliminate_duplicate true

19
Starburst Plan Optimization
  • Separately optimizes each QGM operation (box)
  • Grammar of STrategy AlteRnatives (STARs)
  • Take high-level operations and turn them into
    LOw-LEvel Plan OPerations (LOLEPOPs)
  • JOIN, UNION, SCAN, SORT, SHIP,
  • Tables and plans have properties
  • Relational (tables joined, columns accessed,
    predicates applied)
  • Operational (ordering, site)
  • Estimated (cost, cardinality)
  • GLUE operators SORT, SHIP
  • Join enumerator tries alternative join sequences
    a la System-R
  • Can produce bushy trees
  • Can have rank/priority with each STAR

20
Starburst Pros and Cons
  • Pro
  • Stratified search generally works well in
    practice DB2 UDB has perhaps the best query
    optimizer out there
  • Interesting model of separating calculus-level
    and algebra-level optimizations
  • Generally provides fast performance
  • Con
  • Interpreted rules were too slow and no database
    user ever customized the engine!
  • Difficult to assign priorities to transformations
  • Some QGM transformations that were tried were
    difficult to assess without running many
    cost-based optimizations
  • Rules got out of control

21
The EXODUS and Volcano Optimizer Generators
  • Part of a database toolkit approach to building
    systems
  • A set of libraries and languages for building
    databases with custom data models and
    functionalities

(rules in E)
(EXODUS)
(gcc)
(MyDB plan)
(MyQL)
22
EXODUS/Volcano Model
  • Try to unify the notion of logical-logical
    transformations and logical-physical
    transformations
  • No stratification as in Starburst everything is
    transformations
  • Challenge efficient search need a lot of
    pruning
  • EXODUS used many heuristics, something called a
    MESH
  • Volcano branch-and-bound pruning, recursion
    memoization

23
Example Rules
  • Physical operators
  • operator 2 join
  • method 2 hash_join loops_join cartesian_product
  • Logical-logical transformations
  • join (1,2) -gt join(2,1)
  • Logical-physical transformations
  • join (1,2) by hash_join (1,2)
  • Can get quite hairy
  • join 7 (join 8 (1,2), 3) lt-gt join 8(1, join 7
    (2,3))ifdef FORWARDif (NOT cover_predicate
    (OPERATOR_7 oper_argument, INPUT_2 oper_property,
    INPUT_3 oper_property)) REJECT

24
So How Does the Optimizer Work?(EXODUS version)
  • Needs to enumerate all possible transformations
    without repeating
  • Every expression is stored in a MESH
  • Basically, an associative lookup for each
    expression, which can link to other entries in
    the same MESH

25
Search in EXODUS
  • Apply a transformation, see if it produces a new
    node
  • If so
  • Find cheapest implementation rule
  • Also apply all relevant transformation rules, add
    results to OPEN set
  • Propagate revised cost to parents (reanalyze)
  • Check parents for new transformation
    possibilities (rematch)
  • Heuristics to guide the search in the OPEN set
  • Promise an expected cost factor for each
    transformation rule, based on analysis of
    averages of the optimizers cost model results
  • Favor items with high expected payoff over the
    current cost
  • Problem often need to apply 2 rules to get a
    benefit use heuristics
  • Once a full plan is found, optimizer does hill
    climbing, only applying a limited set of rules

26
Pros and Cons of EXODUS
  • Pros
  • Unified model of optimization is powerful,
    elegant
  • Very extensible architecture
  • Cons
  • Combined logical and physical expressions in the
    same MESH
  • equivalent logical plans with different physical
    operators (e.g., merge vs. hash joins) were kept
    twice
  • Physical properties werent handled well
  • sort enforcers were seldom applied since they
    didnt pay off immediately had to hack them
    into sort-merge join
  • Hard-coded transformation, then algorithm
    selection, cost analysis
  • always applied even if not part of the most
    promising expression
  • applied based on perceived benefit biased
    towards larger expressions, which meant repeated
    re-optimization
  • Cost wasnt very generic a concept

27
Volcano, Successor to EXODUS(Shouldnt it be
LEVITICUS?)
  • Re-architected into a top-down, memoized engine
  • Depth-first search allows branch-and-bound
    pruning
  • FindBestPlan takes logical expression, physical
    properties, cost bound
  • If already computed, return
  • Else compute set of possible moves
  • Logical-logical rule
  • Compliant logical-physical rule
  • Enforcer
  • Insert logical expression into lookup table
  • Insert physical op, plan into separate lookup
    table
  • Return best plan and cost
  • More generic notions of properties and enforcers
    (e.g., location, SHIP), cost (an ADT)

28
EXODUS, Revision 3 Cascades
  • Basically, a cleaner, more object-oriented
    version of the Volcano engine
  • Rumor has it that MS SQL Server is currently
    based on a (simplified and streamlined) version
    of the Volcano/Cascades optimizer generator

29
Optimization Evaluation
  • So, which is best?
  • Heuristics plus join-based enumeration (System-R)
  • Stratified, calculus-then-algebraic (Starburst)
  • Con QGM transformations are almost always
    heuristics-based
  • Pro very succinct transformations at QGM level
  • Unified algebraic (Volcano/Cascades)
  • Con many more rules need to be applied to get
    effect of QGM rewrites
  • Pro unified, fully cost-based model
Write a Comment
User Comments (0)
About PowerShow.com