BID207 Understanding Sybase IQ Optimizer (or What is that Query Plan Telling Me?)

About This Presentation

Title:

BID207 Understanding Sybase IQ Optimizer (or What is that Query Plan Telling Me?)

Description:

Subquery Optimization. Index Access Selection. Run-Time Engine. Prefetch Manager ... Query Predicate(s) - WHERE clause search arguments ... – PowerPoint PPT presentation

Number of Views:478

Avg rating:3.0/5.0

Slides: 82

Provided by: fellenm

Category:

more less

Transcript and Presenter's Notes

Title: BID207 Understanding Sybase IQ Optimizer (or What is that Query Plan Telling Me?)

1
BID207 Understanding Sybase IQ Optimizer(or What
is that Query Plan Telling Me?)
Lou StantonPrincipal System Consultantstanton_at_sy
base.com / (301) 896-1733August 2003
2
Query Processing and Query Plans

In this presentation -
Show you how Sybase IQ Processes Queries
Show methods for viewing Query Plans
Understanding and Interpreting Query Plans
How to influence Query Execution
Discuss general Performance and Tuning tips
This presentation is Sybase IQ 12.5 specific
Query plans were enhanced to provide more detail
New features in v12.5 will be addressed

3
By the End of this Presentation You Will Know

How to obtain and interpret Query Plans
How much (or little) the optimizer knows about
your data
How important indexes are to Query Performance
The importance of table and database design

4
Why are Query Plans Important?

You suspect (or know) a query runs poorly
Query plans can help you investigate the problem
Youre not sure you have the best indexes for a
query
Query plans tell you what indexes are used in a
query

5
Query Execution Phases in Sybase

Upon submission of a query
Syntax and Permissions are Checked
Performed by the ASA front-end
Query is Parsed
Broken down into object codes
Query is Optimized
Most efficient execution method is determined
Query Plan is created
Query is executed
Resources are cleaned up

6
Sybase IQ Query Execution
Server Front End Shared with ASAnywhere
(ASA) Handles Connections Parses Incoming
Statements Cross-DB Decomposition (CIS) Security
Checking Java Support Stored Procedures
Optimizer Predicate Inference Predicate
Selectivity Estimation Join Optimization Grouping
Algorithm Selection Subquery Optimization Index
Access Selection
Run-Time Engine Prefetch Manager Predicate
Execution Tuple (Row) Projection Join
Execution Grouping Execution Sorting Subquery
Execution
7
How do I See a Query Plan?

Query Plans are Generated -
In the IQ Message File
As HTML Pages (if requested) in a separate file
Set Option Commands used with Query Plans
Usually set as Temporary Options
Example
Set Temporary Option Query_Detail On
v12.5 Query_Plan option is ON by default
Query Plan appears in the IQ Message File
You may turn this option off, if desired

8
Database Options for Query Plans

Query_Plan On (default On)
Provides a Basic Query Execution plan in the IQ
Message File
May not provide enough detail
Query_Detail On (default Off)
More detailed information in a Query Plan
(Recommended)
Must be used with Query_Plan or
Query_Plan_As_HTML
Query_Plan_As_HTML On (default Off)
Creates Query Plan as an external HTML file

9
More Database Options for Query Plans

Query_Plan_After_Run On (default Off)
Delays creating the query plan until query
completes
Query_Timing On (default Off)
Includes execution times at each stage of the
query
Must be used with Query_Plan_After_Run
Query_Name query_name (default )
Prints name provided in the Query Plan or as part
of the file name (for HTML Query Plans)
NoExec On (default Off)
Creates Query Plan but does not execute the query

10
HTML Query Plans

Since HTML plans are easier to read these will be
used in this presentation
Query_Detail will also be set On for all
examples
When you specify Query_Plan_As_HTML On
File will be written in the directory with the
Database File (.db)
The File name that is produced -
Prefaced with User Name, Date and Time
Also contains the Query Name if specified
has a .html file extension and is opened with a
browser

11
Simple Example of HTML Query Plan File

Set temporary option Query_Name stanton
Select count() from Central_Fact_Table
HTML file generated in Database File Directory

12
Query Plan (HTML) Output Two Parts
1) Query Tree
2) Query Detail
13
HTML Query Plan The Query Tree
The Node Number (underlined) is a hot linkto the
Nodes below in the Query Detail
14
Query Tree and Nodes

Query Tree is a representation of the Query Plan
Consists of Nodes representing execution plan
steps
Nodes are numbered sequentially and identified by
type
Some housekeeping nodes are not displayed so
there may be gaps in the Node number sequence
Query Tree also displays the estimated number of
rows flowing up the branches between nodes
The Query Tree is displayed inverted
As you examine a Query Plan you will see the last
node at the top of the Query Tree usually the
Root Node
Leaf Nodes are typically at the bottom of the
tree (but will appear throughout the tree if
there are many tables)

15
A More Complex Query Tree
16
Nodes - General

Header Shows the Node Number and Node Type
Child Node(s) (if any)
Node(s) that feed this node
Hot link to the Child Node
(Leaf Nodes rarely have children)
Estimated Result Rows
Number of rows the optimizer thinkswill come out
of this Node
Actual Row Counts are shown withQuery_Plan_After_
Run On

17
Some Node Types (there are others)

Root
Scrolling Cursor Store
Filter
Group By
Sort
Hash
Indexes
Order By
Store
Semi-join Filter Filler
Leaf

Join
Hash (HJ)
Hash Pushdown (HJPD)
Nested Loop (NL)
Nested Loop Pushdown (NLPD)
Sort-Merge (SM)
Sort-Merge Pushdown (SMPD)
Cartesian
Subquery
Union All

18
Node Details

Some Nodes Contain a Wealth of Information
Detail in other nodes is not as important
Nodes to Concentrate On -
Root
Leaf
Join
Group By
Filter

19
Root Node (only with Query_Detail ON)

Node Detail
Child Node
Query Name (if any)
User Name (login)
Temp Space Used
Num of Users in Server
Num of CPUs
Database Options Set
Output from this node

20
Examining the Root Node

Number of Active users
Many Active users will have an effect on query
response
Optimizer may choose a different Query Plan
depending upon CPU, IQ Cache Size and Number of
Active Users
Temp Space Usage
Is the space required greater than the Temp Cache
Size?
It may indicate Temp Cache is paging to disk
Database Options may affect the Query Plan
A user or dba may have set options influencing
joins, aggregation or server behavior

21
Leaf Nodes

A Leaf Node Represents an IQ Table
Several Types of Leaf Nodes
(Regular) Leaf
Aggregation Leaf
Grouping Leaf
Order Leaf
Leaf Nodes can have a wealth of information
Table Row Count
Query Predicate(s) - WHERE clause search
arguments
Indexes available on columns used in predicates
Estimated Selectivity of predicates
Usefulness of the predicate
Index used by the predicate
Output Columns from the table

22
Leaf Node Discussion

Portion of a table used in a query is a found
set
The optimizer knows that Table A has X rows
If there are no predicates on columns in the
table we are done
We just need the column(s) from the table for the
query
Rows are passed up the Query Tree to the next
Node
If predicates exist Optimizer must determine the
best order to execute them
If only one predicate then there is no decision
With multiple predicates each one is evaluated
and assigned a USEFULNESS Score
Usefulness Score is a scale of 0 to 10 (10
Most Useful)

23
IQ Optimizer - Usefulness

Usefulness is used to rank the predicates for
execution order
Predicate with the highest value (score) is
executed first
Remaining predicates executed in descending order
of Usefulness
What determines Usefulness Score?
How well a Query Predicate will reduce the found
set (Table)
Factors influencing Usefulness Score -
Type of Query Predicate
Index(es) available for Column
Optimized FP indexes on Column
How fast the predicate can be executed

24
Leaf Node Discussion Example Query

Single table (147 MM rows) - 5 Predicates
What is the best order to execute these ??
What do you need to know to decide ??
select paid_amt
from central_fact_table
Where elig_beg_date gt '2000-01-01' and
MEMBER_GENDER 'M' and
MEMBER_DOB lt '1950-01-01' and
complication 'OUTPATIENT' and
service_rollup 'Neoplasms'

25
Optimizer Search Criteria NO indexes

Rules the optimizer uses (IQ 12.5) when NO
indexes exist for a query predicate (partial
list) Percent of table Query
Operator Returned (Estimated)
Equality ()
20
Open Range (gt) 40
Between
40
Like ()
20
Inter-column equality (t.a t.b) 30
Inter-column comparison (t.a lt t.b) 50
(You will see these estimates in Query Plans)

26
Leaf Node DiscussionIndexes on Predicates

If a useable index exists on a predicate it can
provide more information about a column
Depends on the type of predicate and the index
type(s)
Beware of some functions used on a column as they
will negate the use of the index!
Substring( t.a, 5, 5 ) fghijk
May result in a Column Scan
The optimizer will use the default rule for rows
returned (20 of the rows for an Equality search)

27
Query Predicate Operators

Equality and Inequality ( , ! )
Includes IN and NOT IN lists
Ranges ( lt, gt, lt, gt, Between)
Including NOT (!)
Like ( ) or Not Like
Contains
IS NOT NULL

28
Query Predicate Operators and IQ Indexes

Equality and Inequality ( , ! )
Includes IN and NOT IN lists
Ranges ( lt, gt, lt, gt, Between)
Including NOT (!)
Like ( ) or Not Like
Contains
IS NOT NULL

LF and HG
DATE, HNG, LF and HG
FP and WD
Null Bit
29
The Enumerated Indexes HG and LF

These indexes provide exact data distribution
counts to the optimizer -
The number of distinct values for a column and
The number of rows for each value
When columns with these indexes are used in
certain query predicates they provide the
optimizer with exact row counts that will be
found
Equality and Inequality operators (also IN List,
NOT IN List)
LF/HG indexes also help Range Searches by
providing counts of distinct values within the
Range

30
Other Indexes

The other indexes used in query predicates locate
the rows to satisfy the query
They do not provide counts of rows to the
optimizer
You will see them referenced in Leaf Nodes
If Enumerated indexes or Optimized FP indexes are
not available then Optimizer can only guess the
number of rows that satisfy the predicate
Optimizer will use the defaults (discussed
earlier)
Later you will see how to help the optimizer
avoid bad guesses

31
Query Example Leaf Node

Back to our Query
select paid_amt
from central_fact_table
Where
elig_beg_date gt '2000-01-01' and
MEMBER_GENDER 'M' and
MEMBER_DOB lt '1950-01-01' and
complication 'OUTPATIENT' and
service_rollup 'Neoplasms'

32
The Leaf Node for This Query
This is a largeLeaf Node ! (about 2 pages) We
will examine it in parts
33
Leaf Node (Part 1) Row Information
(Information here is hidden for clarity)
34
Leaf Node (Part 2) Predicate Evaluation

Each predicate for the table is evaluated
If an HG/LF index exists it provides Selectivity
Statistics (number rows returned as a Percentage
of the Table)
Usefulness Score is assigned based on found set
size
The smaller the set, the more useful the predicate

35
Each Predicate is Evaluated and Ranked
Selectivity
Usefulness
Index Used
36
Usefulness For Each Predicate Assigned

Predicates are ranked by Usefulness
Query Predicate
service_rollup 'Neoplasms' .001 9.99
MEMBER_GENDER 'M' .389 9.61
elig_beg_date gt '2000-01-01' .764 6.23
MEMBER_DOB lt '1950-01-01' .389 5.61
complication 'OUTPATIENT' .200 2.80

Selectivity
Usefulness
This looks like a Guess 20 for an Equality
37
Est. Row Count from the Table is Derived from
Selectivity Estimates Applied to all the
Predicates

Estimates are applied to each subsequent
predicate to determine the tables Found Set
Condition 1 149,178,536 0.001995 297,587
rows
Condition 2 297,587 0.389099
115,791 rows
Condition 3 115,791 0.764858
88,564 rows
Condition 4 88,564 0.389793
34,521 rows
Condition 5 34,521 .2000000
6,904 rows

38
Optimized FP Indexes (1 and 2 byte FPs)

Play important roles in 12.5 Engine
Provide distinct counts in the absence of LF/HG
index
Help with Range Searches, LIKE predicates,
push-down join conditions, and predicates
containing an expression on a column such as -
SUBSTR(t.x, 1, 2) 'TE'
Move data up the query tree faster (fewer bytes
to move)
12.5 Query Plans display usage of these FPs
12.5 has new Database Option to auto-create
Optimized FP indexes for all columns
MINIMIZE_STORAGE (default OFF)

39
Minimize_Storage Option

Late addition to 12.5 Release
Option description did not make first printing of
any IQ Docs
Look in the Release Bulletin for 12.5 for details
When set ON has the affect of IQ Unique(255)
for all columns in tables you create
As data is loaded column FP index will roll
over to 2 byte and then Flat FP index as
necessary
Highly recommended for all tables lt 1000 columns
After Upgrading to IQ 12.5
Set Option ON and create a new table
Load from old table into new table using INSERT
Drop old table and rename new table

40
Optimized FP Indexes Changes in 12.5

Optimizer no longer considers IQ Unique() value
in Create Table for approx distinct counts of a
column
12.4.x optimizer looked at the value in absence
of HG/LF index
That value could have changed over time and been
way wrong
There is no method to update the IQ Unique()
value
In 12.5 - Uses Optimized FP indexes to
Improve Query Performance
Reduce Data Storage in IQ Store
Many cases of dramatic storage reduction
Fully Indexed IQ data 50 of input data size

41
Leaf Node Predicate Information

There is more detail on predicates in the Leaf
Node
This shows how much IQ knows about the data

elig_beg_date gt '2000-01-01'

HG and Date Index on column
8578 Distinct values
1096 Values satisfy query
Column stored as FP(2)

42
Back to Our Query Example

Was the Optimizer Estimate correct?
Use Query_Plan_After_Run option
Delays the printing of the Query Plan until Query
Completes
New details appear in the Query Plan After Run
Estimated and Actual Row Counts at each Node
Estimated and Actual Temp Space Used

43
Sample Query - Query Plan After Run

Row Estimate
Off by a factor or 72.6
Temp Space Usage
Just slightly below actual
Questions
Why is row estimate wrong?
Is this a problem?

44
Row Estimates

Several Reasons for Wrong Estimates
Method used to derive estimate is not perfect
Assumes even distribution of data and
Predicates are not correlated
One Predicate was a Guess (no useable index for
column)
Importance of HG/LF indexes cannot be stressed
enough
Wrong Estimates May NOT be a Problem
Nodes using HASH tables are more sensitive to row
counts
There are row limitations and performance
implications
Sort Nodes are not sensitive to wrong estimates

45
Corrective Action and Results

Created LF Index on 149MM row table
Took 110 Seconds (Sun 64 w/ 900 MHZ cpu)
New index is more Selective (and Useful)
Estimates are much closer to reality
This is about as close as we can get for now

46
Query Plan After Index Added
47
User Supplied Estimates for Queries

For cases where no index can be used for a query
You can advise the optimizer the percentage of
rows that will be returned by the predicate
Syntax Add percentage to predicate after a
comma within parentheses()
Select count() from central_fact_table
Where (service_rollup like Poison , .003)
This example - .00003 percent of rows match
Value provided is a percentage

48
Query Plan - No Estimate Provided
49
Query Plan with User Supplied Estimate
50
Leaf Nodes Wrap Up

Show Indexes Available and Indexes Used
Order of Predicate Execution
Selectivity and Usefulness of Indexes
Estimated and Actual Row Counts from Table
Other
Output Columns from Table
Next Node Type Join Nodes

51
Join Nodes Seven Type of Joins in IQ

Hash (HJ)
Nested Loop (NL)
Sort-Merge (SM)

Hash Pushdown (HJPD)
Nested Loop Pushdown (NLPD)
Sort-Merge Pushdown (SMPD)

Cartesian Nested Loop Joins
You will see the Full Join Type Names and
Abbreviations in Query Plans
(Short description of these join types follows)

52
Nested Loop Joins

Nested Loop (NL)
Takes a join key from the larger side of the join
and compare it to each row of the smaller side
Nested Loop Pushdown (NLPD)
Take the join keys of the smaller side of the
join into an IN list on the large table

53
Hash Joins

Use Hashing Algorithms for Joins
This Join will be handled in Memory
Hash Operations are Restricted by the DB Option
MAX_HASH_ROWS default 2,500,000 (configurable)
Hash Joins are Sensitive to Row Estimates
If Estimated Row Counts are too low Thrashing can
occur
Thrashing involves Disk I/O
Server will abort a query (rollback) if thrashing
becomes excessive
Hash_Thrashing_Percent DB Option controls this
behavior
Default 10 (percent) and is configurable

54
Sort-Merge

Classic Join Algorithm
Sort both tables and match up the join keys
Used for joining two large tables
Uses Temp Cache/Store for sorting
Accompanied with Order By Nodes for both Tables
Where sorting is performing
Sort-Merge Push Down
Creates an IN List of Keys on smaller table to
probe and filter rows from the other table
Unique to Sybase IQ

55
Joins in a Simple Query
56
Join Node - General

Has Two Child Nodes
Shows Estimated Result Rows after joining
Actual Row counts with Query_Plan_After_Run
Lists the Types of Joins the Optimizer considered
Could have picked any one in the list
Shows Join Condition(s)
But NOT the indexes used!
Other information specific to the type of join

57
Sort-Merge Join Node (Query Tree)

Sort-Merge Joins are usually preceded by an
Order By Node

58
Sort-Merge Join Node (partial)

Child Nodes
Estimated Result Rows
Join Types Considered
Optimization Note
Join Constraint

59
Push Down Joins

When the Optimizer uses a Push Type join you will
see different notations in Leaf Nodes that are
probed
The Leaf for the larger side of the join is
filtered by an IN list to reduce the number of
keys that flow from that Node

60
Join Nodes - Performance Considerations

Create Primary Keys on tables
At the very least HG or Unique HG indexes on all
join keys
Use Query_Plan_After_Run to check row counts
If estimates are are way off then check for
missing indexes in Leaf Nodes and/or create
Primary Keys
Hash Join types are most sensitive to row
estimates
Sort-Merge, Nested Loop joins are not

61
Complex Query No Primary Keys Defined
Some BIG Numbers!
62
Complex Query

Query involved many join columns on large tables
Indexes were correct for all joins and predicates
The optimizer made very conservative estimates
since it could not accurately predicate final
result set from join
Actual results were much smaller than the
estimate
Creating Primary keys on tables changed the plan
Result set estimates while still on the high side
were decidedly smaller

63
Row Estimates - Before and After Primary Key
Result Set Estimate 7,355,725,745,151,829
New Estimate 24,396,774,363 300,000 xsmaller!
64
Tweaking Joins

You can influence Join Type selection
Database Option - Join_Preference
You can also influence Join Order
Database Option - Join_Optimization

65
Join_Preference Database Option
Default 0 (Optimizer Decides)

1 Prefer Sort/Merge
2 Prefer Nested Loop
3 Prefer Nested Loop PD
4 Prefer Hash
5 Prefer Hash PD
6 Prefer PreJoin
7 Prefer Sort Merge PD

-1 Avoid Sort-Merge
-2 Avoid Nested Loop
-3 Avoid Nested Loop PD
-4 Avoid Hash
-5 Avoid Hash PD
-6 Avoid PreJoin
-7 Avoid Sort-Merge PD

You can only influence join type from the list of
joins the serverconsidered in the Query Plan,
otherwise it will ignore the option (and it may
just ignore it anyway!!)
66
Join_Preference Option

Set Temporary Option Join_Preference -2
Telling Optimizer to Avoid Nested Loop Joins
Option value must be quoted as shown
For problem solving and performance testing
Use as a Temporary Option with a Query

67
Join_Optimization Option

Default On Optimizer Decides Join Order
If set Off will join tables as listed in the
FROM clause
Use this only to diagnose obscure join
performance issues
And use it as a Temporary Option as well

68
Group By Nodes

Three types of Group By Nodes
Group By (Hash) for smaller result sets
All done in memory
Group By (Sort) larger sets
Will use IQ Temp Memory/dbspace for sorts
Grouped Leaf Grouping with indexes in a Leaf
Node
For single tables when all Group By columns are
indexed
Group By (Hash)
Same database options apply as for joins
2,500,000 row max (default)

69
Group By Nodes - General

Have a Child Node
Show Estimated Result Rows
Actual rowcount if Query_Plan_After_Run On
Grouping Expressions Used
But no information on indexes used
Output columns

70
Group By Performance

Index All Grouping Expressions
LF or HG indexes needed
They will provide the best estimates for results
Group By (Hash) generally faster than Group By
(Sort)
Sort method will be used when optimizer is in
doubt
If estimates in child node are way under the
wrong method may be used
Check Estimated vs Actual counts

71
Tweaking Group By Performance

Like Joins, You can influence the optimizer
Database Option Aggregation_Preference
Default 0 Optimizer decides
1 Prefer Sort -1 Avoid Sort
2 Prefer Using Indexes -2 Avoid Indexes
3 Prefer Hash -3 Avoid Hash
Use this option with care and always as a
Temporary Option

72
Filter Nodes

Used with some expressions and Having clauses
that cannot by filtered with an index
Example
Select diag_catgy_cd, sum(paid_amt)
From inpatient
Group By diag_catgy_cd
Having sum(paid_amt) gt 10000
The Having expression must be evaluated and
filtered after the groups are formed

73
Query Plan with Filter Node
74
Node Detail Leaf Node
No HG/LF Index !
75
Node Detail Group By (Hash)
76
Node detail Filter Node
77
Index on Group By Column May Help