P1259051408OIsXb

About This Presentation

Title:

P1259051408OIsXb

Description:

Data Row measures the clustering of index leaf rows in relation to data rows ... around optimizer issues with upgrade or bugs while fixes are made - In most ... – PowerPoint PPT presentation

Number of Views:23

Avg rating:3.0/5.0

Slides: 45

Provided by: sybas

Category:

more less

Transcript and Presenter's Notes

Title: P1259051408OIsXb

1
DM213 - The Adaptive Optimizer - Evolution of
the Optimizer from 11.9.2 to 12.0 and Beyond
Eric Miner Development Engineer II ESD Optimizer
Group eric.miner_at_sybase.com
2
Objectives

Review the evolution of the optimizer from ASE
11.9.2 to ASE 12.0, and to introduce planned
changes in the Galaxy optimizer
Provide an opportunity for you to give input on
features and functionality youd like to see
included in the optimizer

3
Assumptions

General knowledge of what the optimizer is and
how it works
Experience in tuning the optimizer and/or
analyzing optimizer issues
Curiosity about future features and functionality
of the optimizer
Input on changes and features youd like to see
included in future versions of the optimizer

4
Why Should The Optimizer Evolve?

The optimizer changes in order to perform new and
different task
The optimizer will always be evolving
It must be flexible enough to deal with different
types of work loads
Features and functionality important to Customers
need to be implemented as quickly at possible
Changes in optimization theory in the academic
world need to be examined and incorporated if
found useful

5
The11.9.2 Adaptive Optimizer

Major changes for the optimizer
Major changes in the statistics
New statistics used in costing
New costing methods and enhancement of existing
costing
Tool added to allow reading,writing and
simulating of the statistics

6
The11.9.2 Adaptive Optimizer

The Statistics Project
Major changes to the accuracy, storage and use of
the optimizer statistics
Optimizer statistics stored in two system tables
sysstatistics - Column level statistics
systabstats - table/index level statistics
Column Level (distribution) statistics are no
longer associated with an index - now associated
with a column
Table/Index level statistics are centralized in
systabstats

7
The11.9.2 Adaptive Optimizer The Statistics
Project cont.

sysstatistics - The column level statistics
The histogram cell boundary values and associated
weights.These are the distribution values
The weight for each cell makes them more accurate
than in previous versions
Two cell types - range and frequency count cells
The range cell and total density values
The date/time of last modification to the column
statistics
The default selectivity values (formerly the
magic numbers)
Column level statistics can be written directly

8
The11.9.2 Adaptive Optimizer The Statistics
Project cont.

systabstats - The table and Index level
statistics
Row and page counts of the table and leaf of
indexes
The number of deleted and forwarded rows
The number of empty datapages and index leaf
pages
height of an index.
The Datapage, Index page and Data row cluster
ratios
Dynamically updated - in memory structure flushed
to systabstats by a number of actions - cant be
written

9
Changes to Update Statistics

Update statistics has been extended to allow for
placement of statistics on columns
update statistics table_name (col_name)
update index statistics table_name ind_name
update all statistics table_name
Specify the requested number of steps (cells) to
use when building the columns histogram
update statistics table_name (col_name) using 200
values

10
Optdiag

Utility allows you read, write and simulate
statistics
Displays all statistics used by the optimizer
(sysstatistics and systabstats)
Output files can be edited and read in to write
the column level statistics or output files can
be used to
Outputs can be used to go back to a previous
set of statistics
Only column level statistics can be written with
optdiag - exception is simulate mode

11
Optdiag cont.

Output files very useful when you need to examine
a dataset, table or its indexes
Optdiag simulate mode can be used to simulate the
effect of changes to any statistical value and/or
a number of configuration values on the optimizer
Simulate writes the column level statistics but
not the table level - table level stats for
simulate are stored in special area of
sysstatistics and used to optimize query when set
statistics simulate is on
Optdiag outputs may be requested by TS

12
Cluster Ratios and Large I/O Costing

Cluster ratios are used by the optimizer to cost
large I/O and NC access
Three cluster ratios - Data Page, Index Page and
Data Row - Cluster ratio of 1 perfect clusting
Data Page and Index Page measure the degree of
clustering of their respective page type within
extents
Used in costing large I/O access
Data Row measures the clustering of index leaf
rows in relation to data rows
Used to cost NC/PI access

13
Other Costing - Directional Index Scan

Used by the optimizer to avoid sorts for ORDER BY
If index order matches ORDER BY sort can be
avoided.
The optimizer will do a sort avert costing if
possible
If the sort avert plan is found cheapest it will
be used

14
Other Costing - Updateable Cursors and Isolation
Level 0

For DOL tables unique index no longer required
It is no longer necessary for the optimizer to
use only a unique index
More indexes available, more likely an efficient
plan will be chosen
Uses fixed row ids as identifiers

15
Other Costing - Forwarded Row Costing

Forwarded rows exist only on DOL tables
A read of a forwarded row requires two reads -
One to read the pointer on the home page
One to read the actual row
The optimizer takes the presence of forwarded
rows in to account when costing.
Check optdiag for a count of forwarded rows in
the table
Use reorg to move forwarded rows back to their
home pages - remove fragmentation

16
Traceon 302 Rewritten

Traceon 302 rewritten for ease of use and to
incorporate new statistics
Five distinct blocks of information
Each block describes different areas of costing
Obscure wording rewritten, plain English
All new statistics are displayed when used in
costing

17
The ASE 12.0 Adaptive Optimizer

ASE 12.0 introduces some enhancements and
expanded functionality
Many involve both optimization and query
processing
Sort Merge Joins
Join Transitive Closure
Predicate Factoring and Transformation
Like optimization enhancements
50 Table Limit
Abstract Query Plans

18
The ASE 12.0 Adaptive OptimizerSort Merge Joins
(SMJ)

New join method designed primarily for DSS
queries
SMJ can be more efficient in many DSS/DW queries
than nested loop. May be less efficient for OLTP
Ordered joins provide clustered access to joining
rows result in less logical and physical I/Os.
Can exploit indexes that pre-order rows on
joining columns.
All aspects of performing an SMJ are costed by
the optimizer
SMJ can be done in parallel

19
Example
select from part, partsupp, lineitem where
p_partkey ps_partkey and ps_partkey
l_partkey and ps_orderkey l_orderkey and
p_type CD
Unsorted Accessto innermost table
Part Clustered on p_partkey
Partsupp Clustered on ps_partkey
Lineitem Clustered on l_orderkey
Part Clustered on p_partkey
Partsupp Clustered on ps_partkey
Sorted Accessto innermost table
Lineitem Sorted on l_partkey
20
Example
Table T1
Table T2
where T1.pk T2.pk
R E A D N E X T
R E A D N E X T
77
77
78
79
79
81
80
84
81
87
82
90
83
91
84
94
21
The ASE 12.0 Adaptive OptimizerSort Merge Joins
(SMJ) cont

By default SMJ is off in ASE 12
SMJ may not be efficient for all queries, to set
it on server wide - sp_configure enable
sort-merge join and JTC,1
Or, boot with trace 384
At the session level - set sort_merge on

22
The ASE 12.0 Adaptive Optimizer Join Transitive
Closure (JTC)

Provides the optimizer with more possible join
orders
More join orders will result in more join plans
costed and may result in more efficient plan
being chosen
select A.a from A,B,C where A.AB.b and B.bC.c
Adds join and A.aC.c and four extra orders
JTC makes this possible for equi-joins
JTC off by default to set it on server wide-
sp_configure enable sort-merge join and JTC,1
Or, boot with trace 334
At the session level - set JTC on

23
The ASE 12.0 Adaptive OptimizerPredicate
Factoring Transformation

PFT can add easily optimized AND clauses to
queries containing OR clauses
PFT aids optimization by extracting optimizable
clauses from predicates linked with ORs and
substitutes them with clauses linked with ANDs
PFT makes more access methods available for
optimization - the more possible methods to cost
the more likely an efficient plan will be chosen
Occurs before entering the optimizer
Will not alter the original query semantics

24
The ASE 12.0 Adaptive OptimizerPFT continued

Cant be turned off
Traceon 302 will print clauses in the clause info
block that dont appear in the query
Example query
select from lineitem, part where (p_partkey
l_partkey and l_quantity gt 10)or (p_partkey
l_partkey and l_quantity lt 20)
Above query is transformed to the following
select from lineitem, partwhere ((p_partkey
l_partkey and l_quantity gt 10)or (p_partkey
l_partkey and l_quantity lt 20) )and (p_partkey
l_partkey)and (l_quantity gt 10 or
l_quantity lt 20)

25
The ASE 12.0 Adaptive OptimizerLIKE Optimization
Enhancement

Costing of LIKE predicates with a leading
wildcard has been improved
Previously the optimizer could not cost such a
query selectivity of 1.0 was all that was
possible
A pattern match search of histogram boundary
values is done.
If a match is found the weight(s) of the cells
are added to get selectivity
If no match is found selectivity is 1/of steps -
If 20 steps selectivity would be 0.05

26
The ASE 12.0 Adaptive Optimizer 50 Table Limit

Number of user tables in a query has increased
50 user tables and 14 worktables
Designed to allow for a queries with a large
number of of non-flattened subqueries
Requires more scan descriptors be configured
Short circuiting employed - as the number of
tables increases the number of tables costed at a
time by join processing decreases
2-25 4, 26-37 3, 38-50 2 - Plans may be
overlooked

27
The ASE 12.0 Adaptive Optimizer 50 Table Limit
cont.

Not designed for 50 tables in a join- however.
If you use a large number of tables in a join
expect optimization to be slower due the the
plans that must be costed.
If possible set tablecount to 4 if short
circuiting likely
Consider using an Abstract Plan if possible

28
The ASE 12.0 Adaptive Optimizer Abstract Query
Plans

A persistent readable, editable and reusable
description of a query plan
Associated with a query, not syntactically part
of the query - no special syntax, can be included
in query if desired
Can be captured and stored when a query is run
Capture mode
Can be created manually - new T-SQL extensions
Can be used by a query when it is re-running
Can be edited

29
The ASE 12.0 Adaptive Optimizer Abstract Query
Plans cont.

Designed as a method to temporarily work around
optimizer issues with upgrade or bugs while fixes
are made - In most cases wont be needed
Stored in sysquerplans
When captured will contain the entire query plan
Full plan - bypasses the optimizer
Possible to specify part of the plan and let the
optimizer complete the plan
Partial plan - Allows fine control of the
optimizer

30
The ASE 12.0 Adaptive Optimizer Abstract Query
Plans cont.

Far more granular and flexible control than the
force options offer
Can be used as a super force option

31
The Future

Caveats
Future features and functionality may change
before the release of a new ASE
In depth details are not presented at this time -
this is a high level discussion
Your input and suggestions are welcome and
important to us

32
The Future Adaptive Optimizer

Project code name - Galaxy - DSS/DW oriented
The optimizer will be rewritten - some of the
planned new features and functionality include -
New search engine - more accesses possible, star
join support
Parallel Optimization - better optimization of
PLL queries
New join costing method - more accurate join
optimization
On-Line Statistics - a method to allow column
level statistics to be gathered quickly

33
The Future Adaptive Optimizer -New Search Engine

New search engine - flexible and configurable
The search engine needs to be flexible in order
to allow different levels of optimization for
different needs - DSS vs. OLTP
The search engines levels of optimization will
be configurable
Youll be able to set the amount of searching
(plan costing) you want done
Useful when ASE is doing OLTP vs. DSS
Exhaustive vs. restrictive search degree of
pruning of plans

34
The Future Adaptive Optimizer -New Search Engine
cont.

Index selection for joins is currently done in
the prep phase- indexes costed before join
costing is done.
Predicates on columns of a composite index are
costed separately to use the index either for the
join or the SARG
Index selection for joins will now be done while
join plans are being costed (permuted)
Predicates (joins and SARGs) on the columns of a
composite index will be costed together, making
join index costing more granular

35
The Future Adaptive Optimizer -New Search Engine
cont.

More join plans possible
Number of plans costed will be controllable
Star joins supported -
optimizer will look for and cost star joins when
found
May want to use a high level of optimization to
examine all star join plans

36
The Future Adaptive Optimizer - Parallel
Optimization

New optimization for costing parallel queries
Parallel optimization is a critical factor in a
DSS/DW system
Parallel optimization will be better able to
balance response time and resource utilization
Will find an efficient PLL plan based on resource
utilization
Will take into account that I/O can be done in
parallel and cost it accordingly - cost resource
usage rather than simple I/O
Resources include - disks, CPUs, memory and
network

37
The Future Adaptive Optimizer - Parallel
Optimization cont.

Parallel optimization is configurable
Activate or inhibit parallel optimization phases
based on operating requirements
Levels of parallel optimization can be set
Configurable unit costs of each resource type
Includes disk speed, CPU, network, memory

38
The Future Adaptive Optimizer - New Join Costing
Method

New method will result in more accurate costing
of joins
By merging histograms on the fly rows that will
not participate in the join will not effect
costing
More accurate estimate of the number of rows that
will join
SARG values will be used to restrict the portion
of original and join histograms to be used in
costing

39
The Future Adaptive Optimizer - New Join Costing
Method cont.

Boundary values are merged, weights are more
granular , those values not qualified by SARG
values will be eliminated
Data skew (many values with few rows - a few
values with many rows) has a direct effect on the
total density value and thus on join costing
New costing method does not use the Total Density
value
Unless it participates in the join - a highly
duplicated value will not effect the join costing
Must have statistics on the joining columns

40
The Future Adaptive Optimizer - On-Line Statistics

Designed to gather column level statistics
quicker and when you want
Sampling is at the heart of on-line statistics
A full read of all rows of the column not
required - fewer reads, fewer I/O
Can set pages or extents to be read for sampling
Reading pages/extents is more efficient than
sampling rows
Can specify percentage (degree) of sampling to do
- 10 - 0.1 expected range - percentage will be
persistent for a column

41
The Future Adaptive Optimizer - On-Line
Statistics cont.

Manual or automatic?
Sampling can be specified at create index or
update statistics
Degree of sampling will not change unless you
specify another value
Set on-line statistics to gather stats on its own
Based on the DML stream?
Based on the number of rows in a column that
change?

42
Conclusion

The optimizer will always be evolving
The optimizer will be flexible enough to handle
new demands on it
It will better handle DSS/DW tasks
Customer input on new features and functionality
must be weighed and implemented
New theories in optimization must be examined and
incorporated if found useful
The optimizer can not stand still

43
Portal-Ready

Integration
Availability