BID207 Understanding Sybase IQ Optimizer (or What is that Query Plan Telling Me?) - PowerPoint PPT Presentation

1 / 81
About This Presentation
Title:

BID207 Understanding Sybase IQ Optimizer (or What is that Query Plan Telling Me?)

Description:

Subquery Optimization. Index Access Selection. Run-Time Engine. Prefetch Manager ... Query Predicate(s) - WHERE clause search arguments ... – PowerPoint PPT presentation

Number of Views:478
Avg rating:3.0/5.0
Slides: 82
Provided by: fellenm
Category:

less

Transcript and Presenter's Notes

Title: BID207 Understanding Sybase IQ Optimizer (or What is that Query Plan Telling Me?)


1
BID207 Understanding Sybase IQ Optimizer(or What
is that Query Plan Telling Me?)
Lou StantonPrincipal System Consultantstanton_at_sy
base.com / (301) 896-1733August 2003
2
Query Processing and Query Plans
  • In this presentation -
  • Show you how Sybase IQ Processes Queries
  • Show methods for viewing Query Plans
  • Understanding and Interpreting Query Plans
  • How to influence Query Execution
  • Discuss general Performance and Tuning tips
  • This presentation is Sybase IQ 12.5 specific
  • Query plans were enhanced to provide more detail
  • New features in v12.5 will be addressed

3
By the End of this Presentation You Will Know
  • How to obtain and interpret Query Plans
  • How much (or little) the optimizer knows about
    your data
  • How important indexes are to Query Performance
  • The importance of table and database design

4
Why are Query Plans Important?
  • You suspect (or know) a query runs poorly
  • Query plans can help you investigate the problem
  • Youre not sure you have the best indexes for a
    query
  • Query plans tell you what indexes are used in a
    query

5
Query Execution Phases in Sybase
  • Upon submission of a query
  • Syntax and Permissions are Checked
  • Performed by the ASA front-end
  • Query is Parsed
  • Broken down into object codes
  • Query is Optimized
  • Most efficient execution method is determined
  • Query Plan is created
  • Query is executed
  • Resources are cleaned up

6
Sybase IQ Query Execution
Server Front End Shared with ASAnywhere
(ASA) Handles Connections Parses Incoming
Statements Cross-DB Decomposition (CIS) Security
Checking Java Support Stored Procedures
Optimizer Predicate Inference Predicate
Selectivity Estimation Join Optimization Grouping
Algorithm Selection Subquery Optimization Index
Access Selection
Run-Time Engine Prefetch Manager Predicate
Execution Tuple (Row) Projection Join
Execution Grouping Execution Sorting Subquery
Execution
7
How do I See a Query Plan?
  • Query Plans are Generated -
  • In the IQ Message File
  • As HTML Pages (if requested) in a separate file
  • Set Option Commands used with Query Plans
  • Usually set as Temporary Options
  • Example
  • Set Temporary Option Query_Detail On
  • v12.5 Query_Plan option is ON by default
  • Query Plan appears in the IQ Message File
  • You may turn this option off, if desired

8
Database Options for Query Plans
  • Query_Plan On (default On)
  • Provides a Basic Query Execution plan in the IQ
    Message File
  • May not provide enough detail
  • Query_Detail On (default Off)
  • More detailed information in a Query Plan
    (Recommended)
  • Must be used with Query_Plan or
    Query_Plan_As_HTML
  • Query_Plan_As_HTML On (default Off)
  • Creates Query Plan as an external HTML file

9
More Database Options for Query Plans
  • Query_Plan_After_Run On (default Off)
  • Delays creating the query plan until query
    completes
  • Query_Timing On (default Off)
  • Includes execution times at each stage of the
    query
  • Must be used with Query_Plan_After_Run
  • Query_Name query_name (default )
  • Prints name provided in the Query Plan or as part
    of the file name (for HTML Query Plans)
  • NoExec On (default Off)
  • Creates Query Plan but does not execute the query

10
HTML Query Plans
  • Since HTML plans are easier to read these will be
    used in this presentation
  • Query_Detail will also be set On for all
    examples
  • When you specify Query_Plan_As_HTML On
  • File will be written in the directory with the
    Database File (.db)
  • The File name that is produced -
  • Prefaced with User Name, Date and Time
  • Also contains the Query Name if specified
  • has a .html file extension and is opened with a
    browser

11
Simple Example of HTML Query Plan File
  • Set temporary option Query_Name stanton
  • Select count() from Central_Fact_Table
  • HTML file generated in Database File Directory

12
Query Plan (HTML) Output Two Parts
1) Query Tree
2) Query Detail
13
HTML Query Plan The Query Tree
The Node Number (underlined) is a hot linkto the
Nodes below in the Query Detail
14
Query Tree and Nodes
  • Query Tree is a representation of the Query Plan
  • Consists of Nodes representing execution plan
    steps
  • Nodes are numbered sequentially and identified by
    type
  • Some housekeeping nodes are not displayed so
    there may be gaps in the Node number sequence
  • Query Tree also displays the estimated number of
    rows flowing up the branches between nodes
  • The Query Tree is displayed inverted
  • As you examine a Query Plan you will see the last
    node at the top of the Query Tree usually the
    Root Node
  • Leaf Nodes are typically at the bottom of the
    tree (but will appear throughout the tree if
    there are many tables)

15
A More Complex Query Tree
16
Nodes - General
  • Header Shows the Node Number and Node Type
  • Child Node(s) (if any)
  • Node(s) that feed this node
  • Hot link to the Child Node
  • (Leaf Nodes rarely have children)
  • Estimated Result Rows
  • Number of rows the optimizer thinkswill come out
    of this Node
  • Actual Row Counts are shown withQuery_Plan_After_
    Run On

17
Some Node Types (there are others)
  • Root
  • Scrolling Cursor Store
  • Filter
  • Group By
  • Sort
  • Hash
  • Indexes
  • Order By
  • Store
  • Semi-join Filter Filler
  • Leaf
  • Join
  • Hash (HJ)
  • Hash Pushdown (HJPD)
  • Nested Loop (NL)
  • Nested Loop Pushdown (NLPD)
  • Sort-Merge (SM)
  • Sort-Merge Pushdown (SMPD)
  • Cartesian
  • Subquery
  • Union All

18
Node Details
  • Some Nodes Contain a Wealth of Information
  • Detail in other nodes is not as important
  • Nodes to Concentrate On -
  • Root
  • Leaf
  • Join
  • Group By
  • Filter

19
Root Node (only with Query_Detail ON)
  • Node Detail
  • Child Node
  • Query Name (if any)
  • User Name (login)
  • Temp Space Used
  • Num of Users in Server
  • Num of CPUs
  • Database Options Set
  • Output from this node

20
Examining the Root Node
  • Number of Active users
  • Many Active users will have an effect on query
    response
  • Optimizer may choose a different Query Plan
    depending upon CPU, IQ Cache Size and Number of
    Active Users
  • Temp Space Usage
  • Is the space required greater than the Temp Cache
    Size?
  • It may indicate Temp Cache is paging to disk
  • Database Options may affect the Query Plan
  • A user or dba may have set options influencing
    joins, aggregation or server behavior

21
Leaf Nodes
  • A Leaf Node Represents an IQ Table
  • Several Types of Leaf Nodes
  • (Regular) Leaf
  • Aggregation Leaf
  • Grouping Leaf
  • Order Leaf
  • Leaf Nodes can have a wealth of information
  • Table Row Count
  • Query Predicate(s) - WHERE clause search
    arguments
  • Indexes available on columns used in predicates
  • Estimated Selectivity of predicates
  • Usefulness of the predicate
  • Index used by the predicate
  • Output Columns from the table

22
Leaf Node Discussion
  • Portion of a table used in a query is a found
    set
  • The optimizer knows that Table A has X rows
  • If there are no predicates on columns in the
    table we are done
  • We just need the column(s) from the table for the
    query
  • Rows are passed up the Query Tree to the next
    Node
  • If predicates exist Optimizer must determine the
    best order to execute them
  • If only one predicate then there is no decision
  • With multiple predicates each one is evaluated
    and assigned a USEFULNESS Score
  • Usefulness Score is a scale of 0 to 10 (10
    Most Useful)

23
IQ Optimizer - Usefulness
  • Usefulness is used to rank the predicates for
    execution order
  • Predicate with the highest value (score) is
    executed first
  • Remaining predicates executed in descending order
    of Usefulness
  • What determines Usefulness Score?
  • How well a Query Predicate will reduce the found
    set (Table)
  • Factors influencing Usefulness Score -
  • Type of Query Predicate
  • Index(es) available for Column
  • Optimized FP indexes on Column
  • How fast the predicate can be executed

24
Leaf Node Discussion Example Query
  • Single table (147 MM rows) - 5 Predicates
  • What is the best order to execute these ??
  • What do you need to know to decide ??
  • select paid_amt
  • from central_fact_table
  • Where elig_beg_date gt '2000-01-01' and
  • MEMBER_GENDER 'M' and
  • MEMBER_DOB lt '1950-01-01' and
  • complication 'OUTPATIENT' and
  • service_rollup 'Neoplasms'

25
Optimizer Search Criteria NO indexes
  • Rules the optimizer uses (IQ 12.5) when NO
    indexes exist for a query predicate (partial
    list) Percent of table Query
    Operator Returned (Estimated)
  • Equality ()
    20
  • Open Range (gt) 40
  • Between
    40
  • Like ()
    20
  • Inter-column equality (t.a t.b) 30
  • Inter-column comparison (t.a lt t.b) 50
  • (You will see these estimates in Query Plans)

26
Leaf Node DiscussionIndexes on Predicates
  • If a useable index exists on a predicate it can
    provide more information about a column
  • Depends on the type of predicate and the index
    type(s)
  • Beware of some functions used on a column as they
    will negate the use of the index!
  • Substring( t.a, 5, 5 ) fghijk
  • May result in a Column Scan
  • The optimizer will use the default rule for rows
    returned (20 of the rows for an Equality search)

27
Query Predicate Operators
  • Equality and Inequality ( , ! )
  • Includes IN and NOT IN lists
  • Ranges ( lt, gt, lt, gt, Between)
  • Including NOT (!)
  • Like ( ) or Not Like
  • Contains
  • IS NOT NULL

28
Query Predicate Operators and IQ Indexes
  • Equality and Inequality ( , ! )
  • Includes IN and NOT IN lists
  • Ranges ( lt, gt, lt, gt, Between)
  • Including NOT (!)
  • Like ( ) or Not Like
  • Contains
  • IS NOT NULL

LF and HG
DATE, HNG, LF and HG
FP and WD
Null Bit
29
The Enumerated Indexes HG and LF
  • These indexes provide exact data distribution
    counts to the optimizer -
  • The number of distinct values for a column and
  • The number of rows for each value
  • When columns with these indexes are used in
    certain query predicates they provide the
    optimizer with exact row counts that will be
    found
  • Equality and Inequality operators (also IN List,
    NOT IN List)
  • LF/HG indexes also help Range Searches by
    providing counts of distinct values within the
    Range

30
Other Indexes
  • The other indexes used in query predicates locate
    the rows to satisfy the query
  • They do not provide counts of rows to the
    optimizer
  • You will see them referenced in Leaf Nodes
  • If Enumerated indexes or Optimized FP indexes are
    not available then Optimizer can only guess the
    number of rows that satisfy the predicate
  • Optimizer will use the defaults (discussed
    earlier)
  • Later you will see how to help the optimizer
    avoid bad guesses

31
Query Example Leaf Node
  • Back to our Query
  • select paid_amt
  • from central_fact_table
  • Where
  • elig_beg_date gt '2000-01-01' and
  • MEMBER_GENDER 'M' and
  • MEMBER_DOB lt '1950-01-01' and
  • complication 'OUTPATIENT' and
  • service_rollup 'Neoplasms'

32
The Leaf Node for This Query
This is a largeLeaf Node ! (about 2 pages) We
will examine it in parts
33
Leaf Node (Part 1) Row Information
(Information here is hidden for clarity)
34
Leaf Node (Part 2) Predicate Evaluation
  • Each predicate for the table is evaluated
  • If an HG/LF index exists it provides Selectivity
    Statistics (number rows returned as a Percentage
    of the Table)
  • Usefulness Score is assigned based on found set
    size
  • The smaller the set, the more useful the predicate

35
Each Predicate is Evaluated and Ranked
Selectivity
Usefulness
Index Used
36
Usefulness For Each Predicate Assigned
  • Predicates are ranked by Usefulness
  • Query Predicate
  • service_rollup 'Neoplasms' .001 9.99
  • MEMBER_GENDER 'M' .389 9.61
  • elig_beg_date gt '2000-01-01' .764 6.23
  • MEMBER_DOB lt '1950-01-01' .389 5.61
  • complication 'OUTPATIENT' .200 2.80

Selectivity
Usefulness
This looks like a Guess 20 for an Equality
37
Est. Row Count from the Table is Derived from
Selectivity Estimates Applied to all the
Predicates
  • Estimates are applied to each subsequent
    predicate to determine the tables Found Set
  • Condition 1 149,178,536 0.001995 297,587
    rows
  • Condition 2 297,587 0.389099
    115,791 rows
  • Condition 3 115,791 0.764858
    88,564 rows
  • Condition 4 88,564 0.389793
    34,521 rows
  • Condition 5 34,521 .2000000
    6,904 rows

38
Optimized FP Indexes (1 and 2 byte FPs)
  • Play important roles in 12.5 Engine
  • Provide distinct counts in the absence of LF/HG
    index
  • Help with Range Searches, LIKE predicates,
    push-down join conditions, and predicates
    containing an expression on a column such as -
    SUBSTR(t.x, 1, 2) 'TE'
  • Move data up the query tree faster (fewer bytes
    to move)
  • 12.5 Query Plans display usage of these FPs
  • 12.5 has new Database Option to auto-create
    Optimized FP indexes for all columns
  • MINIMIZE_STORAGE (default OFF)

39
Minimize_Storage Option
  • Late addition to 12.5 Release
  • Option description did not make first printing of
    any IQ Docs
  • Look in the Release Bulletin for 12.5 for details
  • When set ON has the affect of IQ Unique(255)
    for all columns in tables you create
  • As data is loaded column FP index will roll
    over to 2 byte and then Flat FP index as
    necessary
  • Highly recommended for all tables lt 1000 columns
  • After Upgrading to IQ 12.5
  • Set Option ON and create a new table
  • Load from old table into new table using INSERT
  • Drop old table and rename new table

40
Optimized FP Indexes Changes in 12.5
  • Optimizer no longer considers IQ Unique() value
    in Create Table for approx distinct counts of a
    column
  • 12.4.x optimizer looked at the value in absence
    of HG/LF index
  • That value could have changed over time and been
    way wrong
  • There is no method to update the IQ Unique()
    value
  • In 12.5 - Uses Optimized FP indexes to
  • Improve Query Performance
  • Reduce Data Storage in IQ Store
  • Many cases of dramatic storage reduction
  • Fully Indexed IQ data 50 of input data size

41
Leaf Node Predicate Information
  • There is more detail on predicates in the Leaf
    Node
  • This shows how much IQ knows about the data

elig_beg_date gt '2000-01-01'
  • HG and Date Index on column
  • 8578 Distinct values
  • 1096 Values satisfy query
  • Column stored as FP(2)

42
Back to Our Query Example
  • Was the Optimizer Estimate correct?
  • Use Query_Plan_After_Run option
  • Delays the printing of the Query Plan until Query
    Completes
  • New details appear in the Query Plan After Run
  • Estimated and Actual Row Counts at each Node
  • Estimated and Actual Temp Space Used

43
Sample Query - Query Plan After Run
  • Row Estimate
  • Off by a factor or 72.6
  • Temp Space Usage
  • Just slightly below actual
  • Questions
  • Why is row estimate wrong?
  • Is this a problem?

44
Row Estimates
  • Several Reasons for Wrong Estimates
  • Method used to derive estimate is not perfect
  • Assumes even distribution of data and
  • Predicates are not correlated
  • One Predicate was a Guess (no useable index for
    column)
  • Importance of HG/LF indexes cannot be stressed
    enough
  • Wrong Estimates May NOT be a Problem
  • Nodes using HASH tables are more sensitive to row
    counts
  • There are row limitations and performance
    implications
  • Sort Nodes are not sensitive to wrong estimates

45
Corrective Action and Results
  • Created LF Index on 149MM row table
  • Took 110 Seconds (Sun 64 w/ 900 MHZ cpu)
  • New index is more Selective (and Useful)
  • Estimates are much closer to reality
  • This is about as close as we can get for now

46
Query Plan After Index Added
47
User Supplied Estimates for Queries
  • For cases where no index can be used for a query
  • You can advise the optimizer the percentage of
    rows that will be returned by the predicate
  • Syntax Add percentage to predicate after a
    comma within parentheses()
  • Select count() from central_fact_table
  • Where (service_rollup like Poison , .003)
  • This example - .00003 percent of rows match
  • Value provided is a percentage

48
Query Plan - No Estimate Provided
49
Query Plan with User Supplied Estimate
50
Leaf Nodes Wrap Up
  • Show Indexes Available and Indexes Used
  • Order of Predicate Execution
  • Selectivity and Usefulness of Indexes
  • Estimated and Actual Row Counts from Table
  • Other
  • Output Columns from Table
  • Next Node Type Join Nodes

51
Join Nodes Seven Type of Joins in IQ
  • Hash (HJ)
  • Nested Loop (NL)
  • Sort-Merge (SM)
  • Hash Pushdown (HJPD)
  • Nested Loop Pushdown (NLPD)
  • Sort-Merge Pushdown (SMPD)
  • Cartesian Nested Loop Joins
  • You will see the Full Join Type Names and
    Abbreviations in Query Plans
  • (Short description of these join types follows)

52
Nested Loop Joins
  • Nested Loop (NL)
  • Takes a join key from the larger side of the join
    and compare it to each row of the smaller side
  • Nested Loop Pushdown (NLPD)
  • Take the join keys of the smaller side of the
    join into an IN list on the large table

53
Hash Joins
  • Use Hashing Algorithms for Joins
  • This Join will be handled in Memory
  • Hash Operations are Restricted by the DB Option
  • MAX_HASH_ROWS default 2,500,000 (configurable)
  • Hash Joins are Sensitive to Row Estimates
  • If Estimated Row Counts are too low Thrashing can
    occur
  • Thrashing involves Disk I/O
  • Server will abort a query (rollback) if thrashing
    becomes excessive
  • Hash_Thrashing_Percent DB Option controls this
    behavior
  • Default 10 (percent) and is configurable

54
Sort-Merge
  • Classic Join Algorithm
  • Sort both tables and match up the join keys
  • Used for joining two large tables
  • Uses Temp Cache/Store for sorting
  • Accompanied with Order By Nodes for both Tables
  • Where sorting is performing
  • Sort-Merge Push Down
  • Creates an IN List of Keys on smaller table to
    probe and filter rows from the other table
  • Unique to Sybase IQ

55
Joins in a Simple Query
56
Join Node - General
  • Has Two Child Nodes
  • Shows Estimated Result Rows after joining
  • Actual Row counts with Query_Plan_After_Run
  • Lists the Types of Joins the Optimizer considered
  • Could have picked any one in the list
  • Shows Join Condition(s)
  • But NOT the indexes used!
  • Other information specific to the type of join

57
Sort-Merge Join Node (Query Tree)
  • Sort-Merge Joins are usually preceded by an
    Order By Node

58
Sort-Merge Join Node (partial)
  • Child Nodes
  • Estimated Result Rows
  • Join Types Considered
  • Optimization Note
  • Join Constraint

59
Push Down Joins
  • When the Optimizer uses a Push Type join you will
    see different notations in Leaf Nodes that are
    probed
  • The Leaf for the larger side of the join is
    filtered by an IN list to reduce the number of
    keys that flow from that Node

60
Join Nodes - Performance Considerations
  • Create Primary Keys on tables
  • At the very least HG or Unique HG indexes on all
    join keys
  • Use Query_Plan_After_Run to check row counts
  • If estimates are are way off then check for
    missing indexes in Leaf Nodes and/or create
    Primary Keys
  • Hash Join types are most sensitive to row
    estimates
  • Sort-Merge, Nested Loop joins are not

61
Complex Query No Primary Keys Defined
Some BIG Numbers!
62
Complex Query
  • Query involved many join columns on large tables
  • Indexes were correct for all joins and predicates
  • The optimizer made very conservative estimates
    since it could not accurately predicate final
    result set from join
  • Actual results were much smaller than the
    estimate
  • Creating Primary keys on tables changed the plan
  • Result set estimates while still on the high side
    were decidedly smaller

63
Row Estimates - Before and After Primary Key
Result Set Estimate 7,355,725,745,151,829
New Estimate 24,396,774,363 300,000 xsmaller!
64
Tweaking Joins
  • You can influence Join Type selection
  • Database Option - Join_Preference
  • You can also influence Join Order
  • Database Option - Join_Optimization

65
Join_Preference Database Option
Default 0 (Optimizer Decides)
  • 1 Prefer Sort/Merge
  • 2 Prefer Nested Loop
  • 3 Prefer Nested Loop PD
  • 4 Prefer Hash
  • 5 Prefer Hash PD
  • 6 Prefer PreJoin
  • 7 Prefer Sort Merge PD
  • -1 Avoid Sort-Merge
  • -2 Avoid Nested Loop
  • -3 Avoid Nested Loop PD
  • -4 Avoid Hash
  • -5 Avoid Hash PD
  • -6 Avoid PreJoin
  • -7 Avoid Sort-Merge PD

You can only influence join type from the list of
joins the serverconsidered in the Query Plan,
otherwise it will ignore the option (and it may
just ignore it anyway!!)
66
Join_Preference Option
  • Set Temporary Option Join_Preference -2
  • Telling Optimizer to Avoid Nested Loop Joins
  • Option value must be quoted as shown
  • For problem solving and performance testing
  • Use as a Temporary Option with a Query

67
Join_Optimization Option
  • Default On Optimizer Decides Join Order
  • If set Off will join tables as listed in the
    FROM clause
  • Use this only to diagnose obscure join
    performance issues
  • And use it as a Temporary Option as well

68
Group By Nodes
  • Three types of Group By Nodes
  • Group By (Hash) for smaller result sets
  • All done in memory
  • Group By (Sort) larger sets
  • Will use IQ Temp Memory/dbspace for sorts
  • Grouped Leaf Grouping with indexes in a Leaf
    Node
  • For single tables when all Group By columns are
    indexed
  • Group By (Hash)
  • Same database options apply as for joins
  • 2,500,000 row max (default)

69
Group By Nodes - General
  • Have a Child Node
  • Show Estimated Result Rows
  • Actual rowcount if Query_Plan_After_Run On
  • Grouping Expressions Used
  • But no information on indexes used
  • Output columns

70
Group By Performance
  • Index All Grouping Expressions
  • LF or HG indexes needed
  • They will provide the best estimates for results
  • Group By (Hash) generally faster than Group By
    (Sort)
  • Sort method will be used when optimizer is in
    doubt
  • If estimates in child node are way under the
    wrong method may be used
  • Check Estimated vs Actual counts

71
Tweaking Group By Performance
  • Like Joins, You can influence the optimizer
  • Database Option Aggregation_Preference
  • Default 0 Optimizer decides
  • 1 Prefer Sort -1 Avoid Sort
  • 2 Prefer Using Indexes -2 Avoid Indexes
  • 3 Prefer Hash -3 Avoid Hash
  • Use this option with care and always as a
    Temporary Option

72
Filter Nodes
  • Used with some expressions and Having clauses
    that cannot by filtered with an index
  • Example
  • Select diag_catgy_cd, sum(paid_amt)
  • From inpatient
  • Group By diag_catgy_cd
  • Having sum(paid_amt) gt 10000
  • The Having expression must be evaluated and
    filtered after the groups are formed

73
Query Plan with Filter Node
74
Node Detail Leaf Node
No HG/LF Index !
75
Node Detail Group By (Hash)
76
Node detail Filter Node
77
Index on Group By Column May Help
  • Build on index on the Grouping Expression
  • HG Index is appropriate
  • Rerun The Query

78
New Query Plan (new HG Index)
  • Grouped Leaf instead of Leaf Node and Group
    By(hash)

79
Grouped Leaf Node and Filter Node
80
You Should be Dangerous Now
  • We have covered Query Plans 101 for Sybase IQ
  • Now you know
  • Indexes are important (always have been)
  • Primary keys are important (more than ever)
  • Optimized FP indexes are important (even more
    than ever)
  • The optimizer is conservative and can be
    influenced
  • And there is more, but we just dont have the
    time

81
Questions
  • Visit the IQ Technical Team on the Boardwalk
  • Lou Stanton
  • stanton_at_sybase.com
Write a Comment
User Comments (0)
About PowerShow.com