Title: BI515: Query Processing in Adaptive Server IQ
1BI515Query Processing in Adaptive Server IQ
Steven Kirk Architect ASIQ Query Engine
Group stevek_at_sybase.com
2IQ Query Processing Topics
- IQ Query Engine
- Differences from traditional database engines
- Differences from traditional query engines
- Query engine architecture
- Query optimizer
- IQ Query Plans
- How to get query plans, and how to read them
- IQ Query Performance
3IQ Query Engine
- Different Goals than Traditional Database Engines
- Designed from the ground up to maximize the
performance of many users running ad-hoc queries - Prioritized IQ Design Goals
- Query Performance - very high priority
- Bulk Update Speed - high priority
- Small Update Speed - very low priority
4IQ Query Engine
- Different Goals Yeild a Different Design than
Traditional Databases - Column-oriented storage instead of row-oriented
- IQ reads only the data for columns referenced
- Enables efficient data compression
- Most data pages compressed when written
- In practice, fully indexed databases are usually
60 to 120 of the raw ASCII data size - Single row inserts and deletes are painfully slow
5IQ Query Engine
- Different Design than Traditional Databases
(cont.) - Column-oriented storage instead of row-oriented
- Unique row identifiers for the life of a row
- Bit-map representation for sets of row
identifiers - Multiple patented bit-mapped index types
- Assumes majority of columns will have multiple
indexes defined
6IQ Query Engine
- Differences from Traditional Query Engines
- Effects of most predicates local to a table can
be combined with each other before any tuples are
actually projected from the table. - Numerous query operators are available directly
within the indexes, including - Grouping and distincting
- Inter-column comparisons
- Hash containment
7IQ Query Engine
- Similarities with Traditional Query Engines
- Once IQ has completed all the work it can in the
indexes for each table, tuples are projected and
then joined or grouped in the same ways as in
other query engines - Whenever multiple access paths (or operator
algorithms) are available, then a cost-based
decision must be made, often based on predicate
selectivity estimates
8IQ Query Engine Architecture
IQ Server Front End Shared with
ASAnywhere Handles TDS and ODBC Parses Incoming
Statements Text Substitution Transformations Cross
-DB Decomposition (CIS) Security Checking Java
Support Stored Procedures
IQ Optimizer Predicate Inference Predicate
Selectivity Estimation Join Optimization Grouping
Algorithm Selection Subquery Optimization Index
Access Selection
IQ Run-Time Engine Prefetch Manager Predicate
Execution Tuple Projection Join
Execution Grouping Execution Sorting Subquery
Execution
9IQ Query Optimizer
- Predicate Inference
- Inference of additional predicates via transitive
closure - Inference of additional local predicates from
complex multi-table predicates - Elimination of any redundant predicates after
join order has been decided
10IQ Query Optimizer
- Predicate Selectivity Estimations
- Because so many columns are indexed, more
predicates have accurate selectivity estimates - If no complex indexes are available, or if the
predicate expression is too complicated, then we
use guesses based on the type of predicate - Users can supply their own selectivity estimates
thereby overriding the optimizers estimates
11IQ Query Optimizer
- Join Optimization
- Applicable join indexes are considered
- All possible bushy join plans that avoid
cartesian product joins are considered - Evaluated join sub-plans are stored in a hash
table - Alpha-pruning is used to limit the search space
12IQ Query Optimizer
- Join Optimization (cont.)
- Join Algorithm Selection from among
- Classic nested-loop
- Sort-merge
- Classic hash
- Nested-loop push-down
- Hash push-down semi-join
- Sort-merge push-down semi-join (new
in 12.4.2)
13IQ Query Optimizer
- Grouping Algorithm Selection
- Three different algorithms are compared for each
GROUP BY or DISTINCT operator - Sort-based
- Classic hash-based
- Index-based
14IQ Query Optimizer
- Subquery Optimization
- Some EXISTS subqueries are flattened into
distinct joins - Results of uncorrelated subqueries are used
directly in the indexes of tables in the parent
query blocks - Correlated subqueries
- Are migrated to minimize subquery re-executions
- Have a 1-value result cache, and outer references
are ordered to further minimize re-executions - Are optimized to maximize invariant operators
15IQ Query Optimizer
- Index Access Selection
- The optimizer does a cost-based selection of what
order to execute predicates and which index to
use for each predicate, considering - Predicate type
- Indexes available on that column (or columns)
- Estimated selectivity of this predicate
- Estimated number of rows passing earlier
predicates - Whether invariant predicate results can be cached
16IQ Query Plans
- IQ uses a Volcano style data-flow query engine
- Queries are transformed into an upside-down tree
of operator nodes, each of which can accept
tuples from any nodes immediately below it, and
can pass tuples up to the node above it - Each node transforms the tuple stream in some
way, such as filtering some tuples out, unioning
or joining multiple tuple streams together, etc. - A Query Plan is the textual display of a query
tree
17IQ Query Plans
The Query Tree
The Query
select s_suppkey, count(ps_partkey) fr
om partsupp, supplier where ps_suppkey
s_suppkey and ps_availqty gt 100 group by
s_suppkey
18IQ Query Plans
The Query Tree
The Query Plan
Query Plan 1 04 Group By (Hash)
Child Node 1 03 Query_Name
techwave_example_1 Output Vector 2
entries (12 data bytes) 2 . 03 Join (Hash)
Left Child Node 01 Right
Child Node 02 Valid Join Algorithms
NLJ, NLPDJ, SMJ, SMPDJ, HJ, HPDJ Left
Input Table 1 partsupp Right Input
Table 1 supplier Join Result
Constraint Many to 1 Condition 1
(partsupp.ps_suppkey supplier.s_suppkey)
Output Vector 2 entries (8 data bytes) 3
. . 01 Vertical Cursor Table Name
partsupp Table Row Count 8000
Condition 1 (Invariant)
(partsupp.ps_availqty gt 100)
Condition 1 Selectivity 0.98951089
Output Vector 2 entries (8 data bytes) 3 . .
02 Vertical Cursor Table Name
supplier Table Row Count 100
Output Vector 1 entries (4 data bytes)
19IQ Query Plans
The Query Tree
The Query Plan
Query Plan 1 04 Group By (Hash)
Child Node 1 03 Query_Name
techwave_example_1 Output Vector 2
entries (12 data bytes) 2 . 03 Join (Hash)
Left Child Node 01 Right
Child Node 02 Valid Join Algorithms
NLJ, NLPDJ, SMJ, SMPDJ, HJ, HPDJ Left
Input Table 1 partsupp Right Input
Table 1 supplier Join Result
Constraint Many to 1 Condition 1
(partsupp.ps_suppkey supplier.s_suppkey)
Output Vector 2 entries (8 data bytes) 3
. . 01 Vertical Cursor Table Name
partsupp Table Row Count 8000
Condition 1 (Invariant)
(partsupp.ps_availqty gt 100)
Condition 1 Selectivity 0.98951089
Output Vector 2 entries (8 data bytes) 3 . .
02 Vertical Cursor Table Name
supplier Table Row Count 100
Output Vector 1 entries (4 data bytes)
20IQ Query Plans
- Query plans are enabled by setting an option
- Query plans appear in the server log (.iqmsg)
file - All users query plans plus all other server
messages are interleaved in the server log file - All plan lines in the server log are prefixed
like - 2000-06-20 204523 0001 20535
- Prefix is composed of date, time, connection
id, message number for query plan lines 20535
set temporary option Query_Plan on
21IQ Query Plans
- Other useful options with query plans
- Query_Name - adds one line to the query plan with
the name string supplied makes it much easier to
find your query plan in the midst of the server
log file - Query_Detail - adds lots of extra info to the
query plan that is occasionally useful, like the
data types, distinct counts, and indexes
available on each column used - NoExec - generates a plan without actually
executing - Query_Plan_After_Run - delays query plan printing
so that actual rows projected, etc. can be
included in plans
22IQ Query Plan Elements
- Leaf Node one table or one join index
3 . . 01 Vertical Cursor Table
Name partsupp Table Row Count
8000 Condition 1 (Invariant)
(partsupp.ps_availqty gt 100)
Condition 1 Selectivity 0.98951089
Output Vector 2 entries (8 data bytes)
23IQ Query Plan Elements
- Leaf Node with Query_Plan_After_Run
3 . . 01 Vertical Cursor
Generated Result Rows 7916 Table
Name partsupp Table Row Count
8000 Condition 1 (Invariant)
(partsupp.ps_availqty gt 100)
Condition 1 Selectivity 0.98951089
Output Vector 2 entries (8 data bytes)
24IQ Query Plan Elements
3 . . 01 Vertical Cursor Table
Name partsupp Table Row Count
8000 Condition 1 (Invariant)
(partsupp.ps_availqty gt 100)
Condition 1 Selectivity 0.98951089
Condition 1 Index HNG ps_availqty_hng
Output Vector 2 entries (8 data bytes)
Output 1 partsupp.ps_partkey
Output 1 Data Type INTEGER (10, 0)
Output 1 Base Distincts 2000
Output 1 Indexes FP, HG
Output 2 partsupp.ps_suppkey
Output 2 Data Type INTEGER (10, 0)
Output 2 Base Distincts 100
Output 2 Indexes FP, HG
Column 1 partsupp.ps_availqty
Column 1 Data Type INTEGER (10, 0)
Column 1 Base Distincts 5497
Column 1 Indexes FP, HNG, HG
Index Output 1 (Projected) partsupp.ps_partkey
Index Output 2 (Projected)
partsupp.ps_suppkey
- Leaf Node with Query_Detail
25IQ Query Plan Elements
2 . 03 Join (Hash) Left Child
Node 01 Right Child Node 02
Valid Join Algorithms NLJ, NLPDJ, SMJ,
SMPDJ, HJ, HPDJ Left Input Table 1
partsupp Right Input Table 1
supplier Join Result Constraint Many
to 1 Condition 1 (partsupp.ps_suppkey
supplier.s_suppkey) Output Vector
2 entries (8 data bytes)
26IQ Query Plan Elements
- Join Node with Query_Plan_After_Run
2 . 03 Join (Hash) Left Child
Node 01 Right Child Node 02
Generated Result Rows 7916
Valid Join Algorithms NLJ, NLPDJ, SMJ, SMPDJ,
HJ, HPDJ Left Input Table 1
partsupp Right Input Table 1
supplier Join Result Constraint Many
to 1 Condition 1 (partsupp.ps_suppkey
supplier.s_suppkey) Hash actual
number of keys 100 Output Vector 2
entries (8 data bytes)
27IQ Query Plan Elements
- Join Node with Query_Detail
2 . 03 Join (Hash) Left Child
Node 01 Right Child Node 02
Valid Join Algorithms NLJ, NLPDJ, SMJ,
SMPDJ, HJ, HPDJ Left Input Table 1
partsupp Right Input Table 1
supplier Join Result Constraint Many
to 1 Condition 1 (partsupp.ps_suppkey
supplier.s_suppkey) Output Vector
2 entries (8 data bytes) Output 1
supplier.s_suppkey Output 2
partsupp.ps_partkey
28IQ Query Plan Elements
2 . 03 Join (Hash PushDown) Left
Child Node 01 Right Child Node
02 Valid Join Algorithms NLJ,
NLPDJ, SMJ, SMPDJ, HJ, HPDJ Left Input
Table 1 partsupp Right Input Table
1 supplier Join Result Constraint
Many to 1 Condition 1
(partsupp.ps_suppkey supplier.s_suppkey)
Condition 2 (Pushed) (partsupp.ps_suppkey
IN BV(0)) Output Vector 2 entries (8
data bytes)
29IQ Query Plan Elements
- Join Node Sort-Merge Push-Down
2 . 03 Join (Sort-Merge PushDown)
Left Child Node 05 Right Child
Node 06 Valid Join Algorithms
NLJ, NLPDJ, SMJ, SMPDJ, HJ, HPDJ Left
Input Table 1 partsupp Right Input
Table 1 supplier Join Result
Constraint Many to 1 Condition 1
(partsupp.ps_suppkey supplier.s_suppkey)
Condition 2 (Pushed) (partsupp.ps_suppkey
PROBABLY_IN BV(0)) Output Vector 2
entries (8 data bytes)
30IQ Query Plans
- Other query plan node types
- Group By - grouping operator seen earlier
- Order By - sort operator used by sort-merge
joins, subquery outer reference ordering, and
sort-based grouping, as well as the ORDER BY
clause - Vertical Grouping Cursor - leaf node where
grouping is being done in the indexes - Root - only appears when Query_Detail is on site
of internal data type to external data type
conversions
31IQ Query Plans
- Other query plan node types (cont.)
- Scrolling Cursor Store - results cache that
appears at the top of a query tree to support
scrolling cursor access (see the
Force_No_Scroll_Cursors option description) - Store - results cache for invariant portions of
correlated subqueries - Subquery - executes correlated subqueries, or
uncorrelated subqueries that could not be used
directly with the parent query blocks indexes - Filter - site of non-pushable non-join predicates
32IQ Query Performance Hints
- Use a sensible schema design
- Follow the recommendations in the manual for
where to create indexes, and what types to
create, including HG indexes on all join columns - Define the IQ UNIQUE attribute on all columns
with modest expected distinct value counts (lt
10000) - Do not use tiny page sizes. For big benchmarks
on 1Gb memory machines we often use 128K pages - RTFM and RTFRN
33IQ Query Performance Experimentation
- The IQ optimizer usually does a far better job of
creating a good query plan than we can, but there
will be times when you have a specific query that
just runs too slow (IYHO). Things you can try - If there is any local predicate where the
estimated selectivity is way off, supply the
actual selectivity. - You can often influence the choice of algorithms
by experimenting with the options
Join_Preference, Aggregation_Preference, and
Index_Preference - You can force the optimizer to accept your
specified join order exactly by setting
Join_Optimization off.