P1259051408OIsXb - PowerPoint PPT Presentation

1 / 44
About This Presentation
Title:

P1259051408OIsXb

Description:

Data Row measures the clustering of index leaf rows in relation to data rows ... around optimizer issues with upgrade or bugs while fixes are made - In most ... – PowerPoint PPT presentation

Number of Views:23
Avg rating:3.0/5.0
Slides: 45
Provided by: sybas
Category:

less

Transcript and Presenter's Notes

Title: P1259051408OIsXb


1
DM213 - The Adaptive Optimizer - Evolution of
the Optimizer from 11.9.2 to 12.0 and Beyond
Eric Miner Development Engineer II ESD Optimizer
Group eric.miner_at_sybase.com
2
Objectives
  • Review the evolution of the optimizer from ASE
    11.9.2 to ASE 12.0, and to introduce planned
    changes in the Galaxy optimizer
  • Provide an opportunity for you to give input on
    features and functionality youd like to see
    included in the optimizer

3
Assumptions
  • General knowledge of what the optimizer is and
    how it works
  • Experience in tuning the optimizer and/or
    analyzing optimizer issues
  • Curiosity about future features and functionality
    of the optimizer
  • Input on changes and features youd like to see
    included in future versions of the optimizer

4
Why Should The Optimizer Evolve?
  • The optimizer changes in order to perform new and
    different task
  • The optimizer will always be evolving
  • It must be flexible enough to deal with different
    types of work loads
  • Features and functionality important to Customers
    need to be implemented as quickly at possible
  • Changes in optimization theory in the academic
    world need to be examined and incorporated if
    found useful

5
The11.9.2 Adaptive Optimizer
  • Major changes for the optimizer
  • Major changes in the statistics
  • New statistics used in costing
  • New costing methods and enhancement of existing
    costing
  • Tool added to allow reading,writing and
    simulating of the statistics

6
The11.9.2 Adaptive Optimizer
  • The Statistics Project
  • Major changes to the accuracy, storage and use of
    the optimizer statistics
  • Optimizer statistics stored in two system tables
  • sysstatistics - Column level statistics
  • systabstats - table/index level statistics
  • Column Level (distribution) statistics are no
    longer associated with an index - now associated
    with a column
  • Table/Index level statistics are centralized in
    systabstats

7
The11.9.2 Adaptive Optimizer The Statistics
Project cont.
  • sysstatistics - The column level statistics
  • The histogram cell boundary values and associated
    weights.These are the distribution values
  • The weight for each cell makes them more accurate
    than in previous versions
  • Two cell types - range and frequency count cells
  • The range cell and total density values
  • The date/time of last modification to the column
    statistics
  • The default selectivity values (formerly the
    magic numbers)
  • Column level statistics can be written directly

8
The11.9.2 Adaptive Optimizer The Statistics
Project cont.
  • systabstats - The table and Index level
    statistics
  • Row and page counts of the table and leaf of
    indexes
  • The number of deleted and forwarded rows
  • The number of empty datapages and index leaf
    pages
  • height of an index.
  • The Datapage, Index page and Data row cluster
    ratios
  • Dynamically updated - in memory structure flushed
    to systabstats by a number of actions - cant be
    written

9
Changes to Update Statistics
  • Update statistics has been extended to allow for
    placement of statistics on columns
  • update statistics table_name (col_name)
  • update index statistics table_name ind_name
  • update all statistics table_name
  • Specify the requested number of steps (cells) to
    use when building the columns histogram
  • update statistics table_name (col_name) using 200
    values

10
Optdiag
  • Utility allows you read, write and simulate
    statistics
  • Displays all statistics used by the optimizer
    (sysstatistics and systabstats)
  • Output files can be edited and read in to write
    the column level statistics or output files can
    be used to
  • Outputs can be used to go back to a previous
    set of statistics
  • Only column level statistics can be written with
    optdiag - exception is simulate mode

11
Optdiag cont.
  • Output files very useful when you need to examine
    a dataset, table or its indexes
  • Optdiag simulate mode can be used to simulate the
    effect of changes to any statistical value and/or
    a number of configuration values on the optimizer
  • Simulate writes the column level statistics but
    not the table level - table level stats for
    simulate are stored in special area of
    sysstatistics and used to optimize query when set
    statistics simulate is on
  • Optdiag outputs may be requested by TS

12
Cluster Ratios and Large I/O Costing
  • Cluster ratios are used by the optimizer to cost
    large I/O and NC access
  • Three cluster ratios - Data Page, Index Page and
    Data Row - Cluster ratio of 1 perfect clusting
  • Data Page and Index Page measure the degree of
    clustering of their respective page type within
    extents
  • Used in costing large I/O access
  • Data Row measures the clustering of index leaf
    rows in relation to data rows
  • Used to cost NC/PI access

13
Other Costing - Directional Index Scan
  • Used by the optimizer to avoid sorts for ORDER BY
  • If index order matches ORDER BY sort can be
    avoided.
  • The optimizer will do a sort avert costing if
    possible
  • If the sort avert plan is found cheapest it will
    be used

14
Other Costing - Updateable Cursors and Isolation
Level 0
  • For DOL tables unique index no longer required
  • It is no longer necessary for the optimizer to
    use only a unique index
  • More indexes available, more likely an efficient
    plan will be chosen
  • Uses fixed row ids as identifiers

15
Other Costing - Forwarded Row Costing
  • Forwarded rows exist only on DOL tables
  • A read of a forwarded row requires two reads -
  • One to read the pointer on the home page
  • One to read the actual row
  • The optimizer takes the presence of forwarded
    rows in to account when costing.
  • Check optdiag for a count of forwarded rows in
    the table
  • Use reorg to move forwarded rows back to their
    home pages - remove fragmentation

16
Traceon 302 Rewritten
  • Traceon 302 rewritten for ease of use and to
    incorporate new statistics
  • Five distinct blocks of information
  • Each block describes different areas of costing
  • Obscure wording rewritten, plain English
  • All new statistics are displayed when used in
    costing

17
The ASE 12.0 Adaptive Optimizer
  • ASE 12.0 introduces some enhancements and
    expanded functionality
  • Many involve both optimization and query
    processing
  • Sort Merge Joins
  • Join Transitive Closure
  • Predicate Factoring and Transformation
  • Like optimization enhancements
  • 50 Table Limit
  • Abstract Query Plans

18
The ASE 12.0 Adaptive OptimizerSort Merge Joins
(SMJ)
  • New join method designed primarily for DSS
    queries
  • SMJ can be more efficient in many DSS/DW queries
    than nested loop. May be less efficient for OLTP
  • Ordered joins provide clustered access to joining
    rows result in less logical and physical I/Os.
  • Can exploit indexes that pre-order rows on
    joining columns.
  • All aspects of performing an SMJ are costed by
    the optimizer
  • SMJ can be done in parallel

19
Example
select from part, partsupp, lineitem where
p_partkey ps_partkey and ps_partkey
l_partkey and ps_orderkey l_orderkey and
p_type CD
Unsorted Accessto innermost table
Part Clustered on p_partkey
Partsupp Clustered on ps_partkey
Lineitem Clustered on l_orderkey
Part Clustered on p_partkey
Partsupp Clustered on ps_partkey
Sorted Accessto innermost table
Lineitem Sorted on l_partkey
20
Example
Table T1
Table T2
where T1.pk T2.pk
R E A D N E X T
R E A D N E X T
77
77
78
79
79
81
80
84
81
87
82
90
83
91
84
94
21
The ASE 12.0 Adaptive OptimizerSort Merge Joins
(SMJ) cont
  • By default SMJ is off in ASE 12
  • SMJ may not be efficient for all queries, to set
    it on server wide - sp_configure enable
    sort-merge join and JTC,1
  • Or, boot with trace 384
  • At the session level - set sort_merge on

22
The ASE 12.0 Adaptive Optimizer Join Transitive
Closure (JTC)
  • Provides the optimizer with more possible join
    orders
  • More join orders will result in more join plans
    costed and may result in more efficient plan
    being chosen
  • select A.a from A,B,C where A.AB.b and B.bC.c
  • Adds join and A.aC.c and four extra orders
  • JTC makes this possible for equi-joins
  • JTC off by default to set it on server wide-
  • sp_configure enable sort-merge join and JTC,1
  • Or, boot with trace 334
  • At the session level - set JTC on

23
The ASE 12.0 Adaptive OptimizerPredicate
Factoring Transformation
  • PFT can add easily optimized AND clauses to
    queries containing OR clauses
  • PFT aids optimization by extracting optimizable
    clauses from predicates linked with ORs and
    substitutes them with clauses linked with ANDs
  • PFT makes more access methods available for
    optimization - the more possible methods to cost
    the more likely an efficient plan will be chosen
  • Occurs before entering the optimizer
  • Will not alter the original query semantics

24
The ASE 12.0 Adaptive OptimizerPFT continued
  • Cant be turned off
  • Traceon 302 will print clauses in the clause info
    block that dont appear in the query
  • Example query
  • select from lineitem, part where (p_partkey
    l_partkey and l_quantity gt 10)or (p_partkey
    l_partkey and l_quantity lt 20)
  • Above query is transformed to the following
  • select from lineitem, partwhere ((p_partkey
    l_partkey and l_quantity gt 10)or (p_partkey
    l_partkey and l_quantity lt 20) )and (p_partkey
    l_partkey)and (l_quantity gt 10 or
    l_quantity lt 20)

25
The ASE 12.0 Adaptive OptimizerLIKE Optimization
Enhancement
  • Costing of LIKE predicates with a leading
    wildcard has been improved
  • Previously the optimizer could not cost such a
    query selectivity of 1.0 was all that was
    possible
  • A pattern match search of histogram boundary
    values is done.
  • If a match is found the weight(s) of the cells
    are added to get selectivity
  • If no match is found selectivity is 1/of steps -
    If 20 steps selectivity would be 0.05

26
The ASE 12.0 Adaptive Optimizer 50 Table Limit
  • Number of user tables in a query has increased
  • 50 user tables and 14 worktables
  • Designed to allow for a queries with a large
    number of of non-flattened subqueries
  • Requires more scan descriptors be configured
  • Short circuiting employed - as the number of
    tables increases the number of tables costed at a
    time by join processing decreases
  • 2-25 4, 26-37 3, 38-50 2 - Plans may be
    overlooked

27
The ASE 12.0 Adaptive Optimizer 50 Table Limit
cont.
  • Not designed for 50 tables in a join- however.
  • If you use a large number of tables in a join
    expect optimization to be slower due the the
    plans that must be costed.
  • If possible set tablecount to 4 if short
    circuiting likely
  • Consider using an Abstract Plan if possible

28
The ASE 12.0 Adaptive Optimizer Abstract Query
Plans
  • A persistent readable, editable and reusable
    description of a query plan
  • Associated with a query, not syntactically part
    of the query - no special syntax, can be included
    in query if desired
  • Can be captured and stored when a query is run
  • Capture mode
  • Can be created manually - new T-SQL extensions
  • Can be used by a query when it is re-running
  • Can be edited

29
The ASE 12.0 Adaptive Optimizer Abstract Query
Plans cont.
  • Designed as a method to temporarily work around
    optimizer issues with upgrade or bugs while fixes
    are made - In most cases wont be needed
  • Stored in sysquerplans
  • When captured will contain the entire query plan
  • Full plan - bypasses the optimizer
  • Possible to specify part of the plan and let the
    optimizer complete the plan
  • Partial plan - Allows fine control of the
    optimizer

30
The ASE 12.0 Adaptive Optimizer Abstract Query
Plans cont.
  • Far more granular and flexible control than the
    force options offer
  • Can be used as a super force option

31
The Future
  • Caveats
  • Future features and functionality may change
    before the release of a new ASE
  • In depth details are not presented at this time -
    this is a high level discussion
  • Your input and suggestions are welcome and
    important to us

32
The Future Adaptive Optimizer
  • Project code name - Galaxy - DSS/DW oriented
  • The optimizer will be rewritten - some of the
    planned new features and functionality include -
  • New search engine - more accesses possible, star
    join support
  • Parallel Optimization - better optimization of
    PLL queries
  • New join costing method - more accurate join
    optimization
  • On-Line Statistics - a method to allow column
    level statistics to be gathered quickly

33
The Future Adaptive Optimizer -New Search Engine
  • New search engine - flexible and configurable
  • The search engine needs to be flexible in order
    to allow different levels of optimization for
    different needs - DSS vs. OLTP
  • The search engines levels of optimization will
    be configurable
  • Youll be able to set the amount of searching
    (plan costing) you want done
  • Useful when ASE is doing OLTP vs. DSS
  • Exhaustive vs. restrictive search degree of
    pruning of plans

34
The Future Adaptive Optimizer -New Search Engine
cont.
  • Index selection for joins is currently done in
    the prep phase- indexes costed before join
    costing is done.
  • Predicates on columns of a composite index are
    costed separately to use the index either for the
    join or the SARG
  • Index selection for joins will now be done while
    join plans are being costed (permuted)
  • Predicates (joins and SARGs) on the columns of a
    composite index will be costed together, making
    join index costing more granular

35
The Future Adaptive Optimizer -New Search Engine
cont.
  • More join plans possible
  • Number of plans costed will be controllable
  • Star joins supported -
  • optimizer will look for and cost star joins when
    found
  • May want to use a high level of optimization to
    examine all star join plans

36
The Future Adaptive Optimizer - Parallel
Optimization
  • New optimization for costing parallel queries
  • Parallel optimization is a critical factor in a
    DSS/DW system
  • Parallel optimization will be better able to
    balance response time and resource utilization
  • Will find an efficient PLL plan based on resource
    utilization
  • Will take into account that I/O can be done in
    parallel and cost it accordingly - cost resource
    usage rather than simple I/O
  • Resources include - disks, CPUs, memory and
    network

37
The Future Adaptive Optimizer - Parallel
Optimization cont.
  • Parallel optimization is configurable
  • Activate or inhibit parallel optimization phases
    based on operating requirements
  • Levels of parallel optimization can be set
  • Configurable unit costs of each resource type
  • Includes disk speed, CPU, network, memory

38
The Future Adaptive Optimizer - New Join Costing
Method
  • New method will result in more accurate costing
    of joins
  • By merging histograms on the fly rows that will
    not participate in the join will not effect
    costing
  • More accurate estimate of the number of rows that
    will join
  • SARG values will be used to restrict the portion
    of original and join histograms to be used in
    costing

39
The Future Adaptive Optimizer - New Join Costing
Method cont.
  • Boundary values are merged, weights are more
    granular , those values not qualified by SARG
    values will be eliminated
  • Data skew (many values with few rows - a few
    values with many rows) has a direct effect on the
    total density value and thus on join costing
  • New costing method does not use the Total Density
    value
  • Unless it participates in the join - a highly
    duplicated value will not effect the join costing
  • Must have statistics on the joining columns

40
The Future Adaptive Optimizer - On-Line Statistics
  • Designed to gather column level statistics
    quicker and when you want
  • Sampling is at the heart of on-line statistics
  • A full read of all rows of the column not
    required - fewer reads, fewer I/O
  • Can set pages or extents to be read for sampling
  • Reading pages/extents is more efficient than
    sampling rows
  • Can specify percentage (degree) of sampling to do
    - 10 - 0.1 expected range - percentage will be
    persistent for a column

41
The Future Adaptive Optimizer - On-Line
Statistics cont.
  • Manual or automatic?
  • Sampling can be specified at create index or
    update statistics
  • Degree of sampling will not change unless you
    specify another value
  • Set on-line statistics to gather stats on its own
  • Based on the DML stream?
  • Based on the number of rows in a column that
    change?

42
Conclusion
  • The optimizer will always be evolving
  • The optimizer will be flexible enough to handle
    new demands on it
  • It will better handle DSS/DW tasks
  • Customer input on new features and functionality
    must be weighed and implemented
  • New theories in optimization must be examined and
    incorporated if found useful
  • The optimizer can not stand still

43
Portal-Ready

Integration
Availability
  • Access Analysis Corporate Information
  • Plugn Play with Standard
  • Internet Technologies
  • Continuous Database Availability
  • Realtime Database Maintenance Tuning

Database Requirements
Productivity
  • Portal Productivity Technologies
  • Web Transaction Management

44
Questions/Answers and Input
  • Come up and get a business card
  • or
  • eminer_at_sybase.com
Write a Comment
User Comments (0)
About PowerShow.com