Title: P1259051408OIsXb
1DM213 - The Adaptive Optimizer - Evolution of
the Optimizer from 11.9.2 to 12.0 and Beyond
Eric Miner Development Engineer II ESD Optimizer
Group eric.miner_at_sybase.com
2Objectives
- Review the evolution of the optimizer from ASE
11.9.2 to ASE 12.0, and to introduce planned
changes in the Galaxy optimizer - Provide an opportunity for you to give input on
features and functionality youd like to see
included in the optimizer
3Assumptions
- General knowledge of what the optimizer is and
how it works - Experience in tuning the optimizer and/or
analyzing optimizer issues - Curiosity about future features and functionality
of the optimizer - Input on changes and features youd like to see
included in future versions of the optimizer
4Why Should The Optimizer Evolve?
- The optimizer changes in order to perform new and
different task - The optimizer will always be evolving
- It must be flexible enough to deal with different
types of work loads - Features and functionality important to Customers
need to be implemented as quickly at possible - Changes in optimization theory in the academic
world need to be examined and incorporated if
found useful
5The11.9.2 Adaptive Optimizer
- Major changes for the optimizer
- Major changes in the statistics
- New statistics used in costing
- New costing methods and enhancement of existing
costing - Tool added to allow reading,writing and
simulating of the statistics
6The11.9.2 Adaptive Optimizer
- The Statistics Project
- Major changes to the accuracy, storage and use of
the optimizer statistics - Optimizer statistics stored in two system tables
- sysstatistics - Column level statistics
- systabstats - table/index level statistics
- Column Level (distribution) statistics are no
longer associated with an index - now associated
with a column - Table/Index level statistics are centralized in
systabstats
7The11.9.2 Adaptive Optimizer The Statistics
Project cont.
- sysstatistics - The column level statistics
- The histogram cell boundary values and associated
weights.These are the distribution values - The weight for each cell makes them more accurate
than in previous versions - Two cell types - range and frequency count cells
- The range cell and total density values
- The date/time of last modification to the column
statistics - The default selectivity values (formerly the
magic numbers) - Column level statistics can be written directly
8The11.9.2 Adaptive Optimizer The Statistics
Project cont.
- systabstats - The table and Index level
statistics - Row and page counts of the table and leaf of
indexes - The number of deleted and forwarded rows
- The number of empty datapages and index leaf
pages - height of an index.
- The Datapage, Index page and Data row cluster
ratios - Dynamically updated - in memory structure flushed
to systabstats by a number of actions - cant be
written
9Changes to Update Statistics
- Update statistics has been extended to allow for
placement of statistics on columns - update statistics table_name (col_name)
- update index statistics table_name ind_name
- update all statistics table_name
- Specify the requested number of steps (cells) to
use when building the columns histogram - update statistics table_name (col_name) using 200
values
10Optdiag
- Utility allows you read, write and simulate
statistics - Displays all statistics used by the optimizer
(sysstatistics and systabstats) - Output files can be edited and read in to write
the column level statistics or output files can
be used to - Outputs can be used to go back to a previous
set of statistics - Only column level statistics can be written with
optdiag - exception is simulate mode
11Optdiag cont.
- Output files very useful when you need to examine
a dataset, table or its indexes - Optdiag simulate mode can be used to simulate the
effect of changes to any statistical value and/or
a number of configuration values on the optimizer - Simulate writes the column level statistics but
not the table level - table level stats for
simulate are stored in special area of
sysstatistics and used to optimize query when set
statistics simulate is on - Optdiag outputs may be requested by TS
12Cluster Ratios and Large I/O Costing
- Cluster ratios are used by the optimizer to cost
large I/O and NC access - Three cluster ratios - Data Page, Index Page and
Data Row - Cluster ratio of 1 perfect clusting - Data Page and Index Page measure the degree of
clustering of their respective page type within
extents - Used in costing large I/O access
- Data Row measures the clustering of index leaf
rows in relation to data rows - Used to cost NC/PI access
13Other Costing - Directional Index Scan
- Used by the optimizer to avoid sorts for ORDER BY
- If index order matches ORDER BY sort can be
avoided. - The optimizer will do a sort avert costing if
possible - If the sort avert plan is found cheapest it will
be used
14Other Costing - Updateable Cursors and Isolation
Level 0
- For DOL tables unique index no longer required
- It is no longer necessary for the optimizer to
use only a unique index - More indexes available, more likely an efficient
plan will be chosen - Uses fixed row ids as identifiers
15Other Costing - Forwarded Row Costing
- Forwarded rows exist only on DOL tables
- A read of a forwarded row requires two reads -
- One to read the pointer on the home page
- One to read the actual row
- The optimizer takes the presence of forwarded
rows in to account when costing. - Check optdiag for a count of forwarded rows in
the table - Use reorg to move forwarded rows back to their
home pages - remove fragmentation
16Traceon 302 Rewritten
- Traceon 302 rewritten for ease of use and to
incorporate new statistics - Five distinct blocks of information
- Each block describes different areas of costing
- Obscure wording rewritten, plain English
- All new statistics are displayed when used in
costing
17The ASE 12.0 Adaptive Optimizer
- ASE 12.0 introduces some enhancements and
expanded functionality - Many involve both optimization and query
processing - Sort Merge Joins
- Join Transitive Closure
- Predicate Factoring and Transformation
- Like optimization enhancements
- 50 Table Limit
- Abstract Query Plans
18The ASE 12.0 Adaptive OptimizerSort Merge Joins
(SMJ)
- New join method designed primarily for DSS
queries - SMJ can be more efficient in many DSS/DW queries
than nested loop. May be less efficient for OLTP - Ordered joins provide clustered access to joining
rows result in less logical and physical I/Os. - Can exploit indexes that pre-order rows on
joining columns. - All aspects of performing an SMJ are costed by
the optimizer - SMJ can be done in parallel
19Example
select from part, partsupp, lineitem where
p_partkey ps_partkey and ps_partkey
l_partkey and ps_orderkey l_orderkey and
p_type CD
Unsorted Accessto innermost table
Part Clustered on p_partkey
Partsupp Clustered on ps_partkey
Lineitem Clustered on l_orderkey
Part Clustered on p_partkey
Partsupp Clustered on ps_partkey
Sorted Accessto innermost table
Lineitem Sorted on l_partkey
20Example
Table T1
Table T2
where T1.pk T2.pk
R E A D N E X T
R E A D N E X T
77
77
78
79
79
81
80
84
81
87
82
90
83
91
84
94
21The ASE 12.0 Adaptive OptimizerSort Merge Joins
(SMJ) cont
- By default SMJ is off in ASE 12
- SMJ may not be efficient for all queries, to set
it on server wide - sp_configure enable
sort-merge join and JTC,1 - Or, boot with trace 384
- At the session level - set sort_merge on
22The ASE 12.0 Adaptive Optimizer Join Transitive
Closure (JTC)
- Provides the optimizer with more possible join
orders - More join orders will result in more join plans
costed and may result in more efficient plan
being chosen - select A.a from A,B,C where A.AB.b and B.bC.c
- Adds join and A.aC.c and four extra orders
- JTC makes this possible for equi-joins
- JTC off by default to set it on server wide-
- sp_configure enable sort-merge join and JTC,1
- Or, boot with trace 334
- At the session level - set JTC on
23The ASE 12.0 Adaptive OptimizerPredicate
Factoring Transformation
- PFT can add easily optimized AND clauses to
queries containing OR clauses - PFT aids optimization by extracting optimizable
clauses from predicates linked with ORs and
substitutes them with clauses linked with ANDs - PFT makes more access methods available for
optimization - the more possible methods to cost
the more likely an efficient plan will be chosen - Occurs before entering the optimizer
- Will not alter the original query semantics
24The ASE 12.0 Adaptive OptimizerPFT continued
- Cant be turned off
- Traceon 302 will print clauses in the clause info
block that dont appear in the query - Example query
- select from lineitem, part where (p_partkey
l_partkey and l_quantity gt 10)or (p_partkey
l_partkey and l_quantity lt 20) - Above query is transformed to the following
- select from lineitem, partwhere ((p_partkey
l_partkey and l_quantity gt 10)or (p_partkey
l_partkey and l_quantity lt 20) )and (p_partkey
l_partkey)and (l_quantity gt 10 or
l_quantity lt 20)
25The ASE 12.0 Adaptive OptimizerLIKE Optimization
Enhancement
- Costing of LIKE predicates with a leading
wildcard has been improved - Previously the optimizer could not cost such a
query selectivity of 1.0 was all that was
possible - A pattern match search of histogram boundary
values is done. - If a match is found the weight(s) of the cells
are added to get selectivity - If no match is found selectivity is 1/of steps -
If 20 steps selectivity would be 0.05
26The ASE 12.0 Adaptive Optimizer 50 Table Limit
- Number of user tables in a query has increased
- 50 user tables and 14 worktables
- Designed to allow for a queries with a large
number of of non-flattened subqueries - Requires more scan descriptors be configured
- Short circuiting employed - as the number of
tables increases the number of tables costed at a
time by join processing decreases - 2-25 4, 26-37 3, 38-50 2 - Plans may be
overlooked
27The ASE 12.0 Adaptive Optimizer 50 Table Limit
cont.
- Not designed for 50 tables in a join- however.
- If you use a large number of tables in a join
expect optimization to be slower due the the
plans that must be costed. - If possible set tablecount to 4 if short
circuiting likely - Consider using an Abstract Plan if possible
28The ASE 12.0 Adaptive Optimizer Abstract Query
Plans
- A persistent readable, editable and reusable
description of a query plan - Associated with a query, not syntactically part
of the query - no special syntax, can be included
in query if desired - Can be captured and stored when a query is run
- Capture mode
- Can be created manually - new T-SQL extensions
- Can be used by a query when it is re-running
- Can be edited
29The ASE 12.0 Adaptive Optimizer Abstract Query
Plans cont.
- Designed as a method to temporarily work around
optimizer issues with upgrade or bugs while fixes
are made - In most cases wont be needed - Stored in sysquerplans
- When captured will contain the entire query plan
- Full plan - bypasses the optimizer
- Possible to specify part of the plan and let the
optimizer complete the plan - Partial plan - Allows fine control of the
optimizer
30The ASE 12.0 Adaptive Optimizer Abstract Query
Plans cont.
- Far more granular and flexible control than the
force options offer - Can be used as a super force option
31The Future
- Caveats
- Future features and functionality may change
before the release of a new ASE - In depth details are not presented at this time -
this is a high level discussion - Your input and suggestions are welcome and
important to us
32The Future Adaptive Optimizer
- Project code name - Galaxy - DSS/DW oriented
- The optimizer will be rewritten - some of the
planned new features and functionality include - - New search engine - more accesses possible, star
join support - Parallel Optimization - better optimization of
PLL queries - New join costing method - more accurate join
optimization - On-Line Statistics - a method to allow column
level statistics to be gathered quickly
33The Future Adaptive Optimizer -New Search Engine
- New search engine - flexible and configurable
- The search engine needs to be flexible in order
to allow different levels of optimization for
different needs - DSS vs. OLTP - The search engines levels of optimization will
be configurable - Youll be able to set the amount of searching
(plan costing) you want done - Useful when ASE is doing OLTP vs. DSS
- Exhaustive vs. restrictive search degree of
pruning of plans
34The Future Adaptive Optimizer -New Search Engine
cont.
- Index selection for joins is currently done in
the prep phase- indexes costed before join
costing is done. - Predicates on columns of a composite index are
costed separately to use the index either for the
join or the SARG - Index selection for joins will now be done while
join plans are being costed (permuted) - Predicates (joins and SARGs) on the columns of a
composite index will be costed together, making
join index costing more granular
35The Future Adaptive Optimizer -New Search Engine
cont.
- More join plans possible
- Number of plans costed will be controllable
- Star joins supported -
- optimizer will look for and cost star joins when
found - May want to use a high level of optimization to
examine all star join plans
36The Future Adaptive Optimizer - Parallel
Optimization
- New optimization for costing parallel queries
- Parallel optimization is a critical factor in a
DSS/DW system - Parallel optimization will be better able to
balance response time and resource utilization - Will find an efficient PLL plan based on resource
utilization - Will take into account that I/O can be done in
parallel and cost it accordingly - cost resource
usage rather than simple I/O - Resources include - disks, CPUs, memory and
network
37The Future Adaptive Optimizer - Parallel
Optimization cont.
- Parallel optimization is configurable
- Activate or inhibit parallel optimization phases
based on operating requirements - Levels of parallel optimization can be set
- Configurable unit costs of each resource type
- Includes disk speed, CPU, network, memory
38The Future Adaptive Optimizer - New Join Costing
Method
- New method will result in more accurate costing
of joins - By merging histograms on the fly rows that will
not participate in the join will not effect
costing - More accurate estimate of the number of rows that
will join - SARG values will be used to restrict the portion
of original and join histograms to be used in
costing
39The Future Adaptive Optimizer - New Join Costing
Method cont.
- Boundary values are merged, weights are more
granular , those values not qualified by SARG
values will be eliminated - Data skew (many values with few rows - a few
values with many rows) has a direct effect on the
total density value and thus on join costing - New costing method does not use the Total Density
value - Unless it participates in the join - a highly
duplicated value will not effect the join costing
- Must have statistics on the joining columns
40The Future Adaptive Optimizer - On-Line Statistics
- Designed to gather column level statistics
quicker and when you want - Sampling is at the heart of on-line statistics
- A full read of all rows of the column not
required - fewer reads, fewer I/O - Can set pages or extents to be read for sampling
- Reading pages/extents is more efficient than
sampling rows - Can specify percentage (degree) of sampling to do
- 10 - 0.1 expected range - percentage will be
persistent for a column
41The Future Adaptive Optimizer - On-Line
Statistics cont.
- Manual or automatic?
- Sampling can be specified at create index or
update statistics - Degree of sampling will not change unless you
specify another value - Set on-line statistics to gather stats on its own
- Based on the DML stream?
- Based on the number of rows in a column that
change?
42Conclusion
- The optimizer will always be evolving
- The optimizer will be flexible enough to handle
new demands on it - It will better handle DSS/DW tasks
- Customer input on new features and functionality
must be weighed and implemented - New theories in optimization must be examined and
incorporated if found useful - The optimizer can not stand still
43 Portal-Ready
Integration
Availability
- Access Analysis Corporate Information
- Plugn Play with Standard
- Internet Technologies
- Continuous Database Availability
- Realtime Database Maintenance Tuning
Database Requirements
Productivity
- Portal Productivity Technologies
- Web Transaction Management
44Questions/Answers and Input
- Come up and get a business card
- or
- eminer_at_sybase.com