Title: Title: Arial 28pt.
1ASE135 Query processing for Mixed Workload
Environments using ASE (Galaxy Release)
Joydeep Das Staff Software Engineer Joydeep.Das_at_sy
base.com August 15-19, 2004
2The Enterprise. Unwired.
3The Enterprise. Unwired.
Industry and Cross Platform Solutions
Unwire People
Unwire Information
Manage Information
- Adaptive Server Enterprise
- Adaptive Server Anywhere
- Sybase IQ
- Dynamic Archive
- Dynamic ODS
- Replication Server
- OpenSwitch
- Mirror Activator
- PowerDesigner
- Connectivity Options
- EAServer
- Industry Warehouse Studio
- Unwired Accelerator
- Unwired Orchestrator
- Unwired Toolkit
- Enterprise Portal
- Real Time Data Services
- SQL Anywhere Studio
- M-Business Anywhere
- Pylon Family (Mobile Email)
- Mobile Sales
- XcelleNet Frontline Solutions
- PocketBuilder
- PowerBuilder Family
- AvantGo
Sybase Workspace
4Agenda
- Query Processing (QP) for Mixed Workload
Environments - Motivation
- Enhanced Query Processing solutions in Galaxy
- Feedback and QA
5Agenda
- Query Processing for Mixed Workload Environments
- Motivation
- Enhanced Query Processing solutions in Galaxy
Release - Feedback and QA
6Motivation ASEs Current QP Capabilities
- Extremely well-suited for demanding OLTP
environments - Good plans for all types of transactional queries
- Meets some demands of Operational DSS
- Good plans for several types of complex queries
- Meets some demands of mixed workload environments
- (OLTP Operational DSS)
- Good plans for mixed workloads if system
appropriately configured for - Cache characteristics Large I/Os, LRU/MRU,
bindings - Parallel operations table scans, index scans,
joins
7Motivation - Your Requests
- Need better mixed workload capabilities from ASE
- Need better performance for specific
transactional queries - Need better performance for highly complex
queries (reporting apps) - Need efficient query processing for large data
sets (VLDB) - Need advanced database administration
capabilities - More intuitive diagnostics to track problem
queries - Useful tools to monitor query performance
characteristics
8Motivation - Our Response
- We have reviewed your inputs and requests
- To understand and appreciate your challenges
- To provide quick solutions wherever possible
- We have analyzed current and future industry
trends - Based on your inputs and our analysis, we now
offer - A rich and comprehensive set of query processing
enhancements in the ASE Galaxy Release
9Agenda
- Query Processing for Mixed Workload Environments
- Motivation
- Enhanced Query Processing solutions in Galaxy
- Feedback and QA
10Enhanced Query Processing in Galaxy
- Enhancements for specific requests
- Tackles Complex Queries
- Handles Large Data Sets
- Offers Goal Oriented Optimization
- Offers Auto Update Statistics
- Offers Advanced Monitoring Diagnosis
- And much more
11Enhanced Query Processing in Galaxy
- Enhancements for specific requests
- Tackles Complex Queries
- Handles Large Data Sets
- Offers Goal Oriented Optimization
- Offers Auto Update Statistics
- Offers Advanced Monitoring Diagnosis
- And much more
12Enhancements For Specific Requests
- Index Usage with mismatched data types
- select from Parts, Lineitem where
Parts.p_partkey Lineitem.l_partkey - - Parts.p_partkey is float
- - Lineitem.l_partkey is int and has an index on
it - - Parts is small (10,000 rows), Lineitem is
larger (300,000 rows) - Galaxy is 3.7 times faster
- Galaxy QP ensures that
- - an index scan on Lineitem.l_partkey is selected
- - the index scan is positioned based on values
from Parts.p_partkey
13Enhancements For Specific Requests
- Accurate Join costing
- select count() from Lineitem, Orders
- where Lineitem.l_orderkey Orders.o_orderkey
- and Lineitem.l_orderkey 3
- - 60 of Lineitem.l_orderkey 3
- - Lineitem has 30,000 rows Orders has 200 rows
- - Highly duplicated value from l_orderkey results
in data skew - Galaxy is 1.2 times faster
- Galaxy QP estimates the join selectivity very
accurately - join selectivity is based on dynamically computed
join histograms - translates to optimal join order Lineitem, Orders
14Enhancements For Specific Requests
- Fewer Materializations avoiding work tables for
sorting - select count(), l_orderkey from lineitem
- group by l_orderkey
- order by l_orderkey
- - Lineitem has 300,000 rows
- Galaxy is 3.7 times faster
- Galaxy QP avoids sorting for ORDER BY by
remembering the ordering from GROUP BY - Similar materializations also avoided for
union-in-view queries
15Enhancements For Specific Requests
- Unlimited terms in IN-list
- select from tab
- where col1 in (1,2,3,..16000)
- Galaxy QP will successfully process and execute
this query - Only limited by size of procedure cache
16Enhancements For Specific Requests
- Removed column limits with grouped aggregation
- select col1, . col50, MAX(col52) from tab
- GROUP BY col1, . Col50
- Galaxy QP will process and execute it
successfully if widths of col1 col50 plus the
result of the MAX is less than 16K - Max width increased to 16K, independent of page
size - Limit of 31 columns in a GROUP BY clause removed
17Enhanced Query Processing in Galaxy
- Enhancements for specific requests
- Tackles Complex Queries
- Handles Large Data Sets
- Offers Goal Oriented Optimization
- Offers Auto Update Statistics
- Offers Advanced Monitoring Diagnosis
- And much more
18Tackles Complex Queries
- Improved Union operator processing
- select l_orderkey, l_shipdate, l_commitdate,
l_receiptdate - from Lineitem where l_orderkey lt 10000
- union
- select l_orderkey, l_shipdate, l_commitdate,
l_receiptdate - from Lineitem where l_orderkey gt 50
- - Lineitem has 300,000 rows
- - No index on l_orderkey
- Galaxy is 1.5 times faster
- Galaxy QP uses new feature hash based union
distinct
19Tackles Complex Queries
- Improved Aggregations
- select sum(l_quantity), max(l_discount),
l_suppkey, l_partkey - from Lineitem group by l_suppkey, l_partkey
- - Lineitem has 300,000 rows
- - Index on l_suppkey, l_partkey
- Galaxy is 2.8 times faster
- Galaxy QP uses on-the-fly grouping (hash based)
- - no work tables created
20Tackles Complex Queries
- Index Union
- select from lineitem
- where l_orderkey between 10 and 20 OR
- l_suppkey between 40 and 50
- - Lineitem has 300,000 rows
- - Indices available on l_orderkey and l_suppkey
- Galaxy is 1.1 times faster
- Uses new operator hash based union distinct
21Tackles Complex Queries
- Index Intersection
- select Lineitem.l_orderkey, Lineitem.l_suppkey
- from Lineitem where Lineitem.l_orderkey gt 1000
and Lineitem.l_ lt 10000 - and Lineitem.l_suppkey gt 200 and
Lineitem.l_suppkey lt 2000 - - Lineitem has 300,000 rows
- - Indexes available on Lineitem.l_orderkey and
Lineitem.l_suppkey - Galaxy is at least 1.1 times faster
- Galaxy QP ensures that both indexes scanned and
hash-intersected - - may be selected with Index Union if AND/OR
present in query - - may be selected in STAR queries if FACT table
has indexes for each DIMENSION table joining
column
22Tackles Complex Queries
- Large number of tables
- select from T1, T2, .. T50
- where T1.a T2.a AND T2.a T3.a AND
.. T49.a T50.a - Early greedy strategy combined with accurate
cost-based plan pruning helps in obtaining good
plans quickly - Built-in timer in Galaxy stops long running
optimization - - the best plan chosen thus far is selected
- - extra optimization may not yield a
significantly better plan - - timer limit is user configurable at
server/session/query level - - default 10 of internal query processing
time given to optimization
23Tackles Complex Queries
- Superior plans for Star/Snowflake schema joins
- STAR, SNOWFLAKE schemas have
- - Large FACT table, smaller (relative to FACT)
DIMENSION tables - - Joins only between FACT and DIMENSION tables
- - Index available on joining columns
Star schema
Snowflake schema
24Tackles Complex Queries
- Superior plans for Star/Snowflake schema joins
- select from lineitem, orders, parts, supplier
- where lineitem.l_orderkey orders.o_orderkey
- AND lineitem.l_partkey parts.p_partkey
- AND lineitem.l_suppkey supplier.s_suppkey
- - FACT table lineitem has 300,000 rows
- - DIMENSION tables orders, parts, suppliers have
10 rows each - Galaxy is at least 2 times faster
- recognizes Star/Snowfalke patterns, places FACT
in the end - May use Index intersection Lineitem joining
columns have indices
25Enhanced Query Processing in Galaxy
- Enhancements for specific requests
- Tackles Complex Queries
- Handles Large Data Sets
- Offers Goal Oriented Optimization
- Offers Auto Update Statistics
- Offers Advanced Monitoring Diagnosis
- And much more
26Handles Large Data Sets
- Improved join algorithms
- select count(), count(c.ProcessID),
count(l.ProcessID) - from Clients c, Clients l
- where (isnull(l.ClientName,'x')
rtrim(lClientID) - (isnull(c.ClientName,'x') rtrim(c.ClientID))
-
- - No indices, expression join Client is a large
table ( gt 300,000 rows) - Galaxy is more than 100,000 times faster!
- Galaxy QP selects Hash join method
- Galaxy may select merge join when proper ordering
available
27Handles Large Data Sets
- Support for data partitioning strategies
- Round robin partitioning (available in
pre-Galaxy) - - Data set distribution in round robin fashion
- - No semantics associated, data row can end up in
any partition - Hash based partitioning (NEW)
- - Data set distributed according to a hash
function - - Semantics associated data rows belong to
specific partitions - Range partitioning (NEW)
- - Data set distributed according to an attributes
value range - - Semantics associated data rows belong to
specific partitions
28Handles Large Data Sets
- Intelligent Partition elimination
- create table customer(c_custkey int, c_name
varchar(25), - c_region varchar(4), c_acctbal double) partition
by range (c_custkey) - (PC1 values lt (100), PC2 values lt (200), PC3
values lt (300)) - - data partitioned by range customer has
10,000 rows - select from customer where c_custkey
- between 50 and 150 or c_custkey lt 10
- Galaxy is 1.25 times faster
- Galaxy QP Partition Elimination disqualifies
partitions (PC3 here) - can handle complex AND/OR expression
- can handle variables unknown at compile time
29Handles Large Data Sets
- Directed joins with partitioned data
- create table orders(o_orderkey int, o_custkey
int, - o_orderstatus char(1), o_price double,
o_orderdate smalldatetime) - partition by range (o_custkey)
- (PO1 values lt (100), PO2 values lt (200), PO3
values lt (300)) - - Orders, Customers have 10,000 rows both
partitioned by range - select from customers, orders where c_custkey
o_custkey and c_custkey gt 110 - Galaxy is 2.5 times faster
- Galaxy QP eliminates partitions PC1 and PO1 from
customers and orders. Only corresponding
partitions PC2/PO2, PC3/PO3 are joined
30Handles Large Data Sets
- Parallelism
- select sum(o_price), o_custkey from orders
- where o_orderdate gt 01/31/2003
- group by o_custkey
- - orders has 25,000 rows partitioned by
o_custkey - Almost all SQL operations can be done in parallel
- Galaxy is 1.2 times faster
- Galaxy QP groups each partition separately, but
merges results concurrently
31Handles Large Data Sets
- Parallelism
- SQL operations can be done in parallel and
pipelined to work in tandem - select sum(o_price), c_region from orders,
customers - where o_orderdate gt 01/31/2003 and
- c_custkey o_custkey group by c_region
- Query has 2 partition join of customers and
orders followed by Grouping, all of which are
done in tandem
Pipe 1
Pipe 2
32Enhanced Query Processing in Galaxy
- Enhancements for specific requests
- Tackles Complex Queries
- Handles Large Data Sets
- Offers Goal Oriented Optimization
- Offers Auto Update Statistics
- Offers Advanced Monitoring Diagnosis
- And much more
33Offers Goal Oriented Optimization
- Applications often have varying query processing
needs - Fast response time for first few rows of the
result set - Fast response time for queries in OLTP systems
- - Low to medium complexity transactional queries
- Fast response time for queries in Mixed workload
systems - - Low to medium complexity transactional queries
- - High complexity operational decision support
queries - Galaxy QP is powerful and versatile
- Can be tailored to deliver good performance for
the above demands
34Offers Goal Oriented Optimization
- 3 optimization goals - can be set at
query/session/server level - fastfirstrow goal returns first ltngt rows of
result set quickly - set plan optgoal fastfirstrow
- select from Lineitem where Lineitem.l_suppkey
gt 2900 - order by Lineitem.l_orderkey
- - May use set rowcount 10 to limit to first 10
rows - - Lineitem has 300,000 rows
- Galaxy is 814 times faster with fastfirstrow
goal ! - Galaxy QP selects index based on the orderby
column
35Offers Goal Oriented Optimization
- 3 optimization goals - can be set at
query/session/server level - allrows_oltp generates plans useful for OLTP
queries - - optimizes for complete result set (allrows)
- - considers techniques relevant for OLTP e.g.
Nested Loop Join - allrows goal (default) generates plans for
mixed workload - - optimizes for complete result set (allrows)
- - considers all available optimization
techniques - - balances the needs of OLTP and operational
decision support
36Enhanced Query Processing in Galaxy
- Enhancements for specific requests
- Tackles Complex Queries
- Handles Large Data Sets
- Offers Goal Oriented Optimization
- Offers Auto Update Statistics
- Offers Advanced Monitoring Diagnosis
- And much more
37Offers Auto Update Statistics
- Identifies when update statistics would need to
be run - New built-in datachange() indicates dml changes
on a table - Utilizes idle cycles via housekeeper to schedule
update statistics automatically - Effectively utilizing system resources
- Shrinking maintenance windows
- Utilizes sampling by default
- Speeds up the process of update statistics
38Enhanced Query Processing in Galaxy
- Enhancements for specific requests
- Tackles Complex Queries
- Handles Large Data Sets
- Offers Goal Oriented Optimization
- Offers Auto Update Statistics
- Offers Advanced Monitoring Diagnosis
- And much more
39Offers Advanced Monitoring and Diagnosis
- QP Metrics
- Captures per query execution metrics
- - Min/max/avg PIO/LIO/CPU and number of times
query is executed - - SQL query text is captured as well
- - Stores in a system catalog - sysquerymetrics
- Easy to identify the problem queries
- - Simple ORDER BY selects from the catalog
- - Integrated in front end tools
- Easy to use
- - Metrics gathering can be enabled thru a
configurable option - - Stored metrics can be queried from
sysquerymetrics
40Offers Advanced Monitoring and Diagnosis
- Text format trace output
- No more 302, 310, 317 to diagnose the optimizer
- Set commands to tailor optimizer tracing
- - faster isolation of problems
- - can generate different levels of detail in Text
format - - set option show ltbrief/normal/longgt
- - can generate trace for specific area of concern
- set option show_histograms ltbrief/normal/longgt
- set option show search_engine ltbrief/normal/longgt
41Offers Advanced Monitoring and Diagnosis
- XML format trace output
- Can be queried using ASE built-in XML querying
capabilities - Ease of plan tracking for testing, diagnosis,
tool development - Generate the XML format trace
- Query the XML format trace
42Offers Advanced Monitoring and Diagnosis
- XML format trace output
-
- Generate the XML format trace
- - set plan for show_exec_xml to message on
- - select from A, B where a1
b1 - - select showplan_in_xml(-1)
- XML format trace output
- lt?xml version1.0"?gt
- ltEmitgt
- ltMergeJoingt
- ltDetailsgt
- ltJoinTypegt Inner Join lt/JoinTypegt
- lt/Detailsgt
- ...............
- lt/MergeJoingt
- lt/Emitgt
43Offers Advanced Monitoring and Diagnosis
- XML format trace output
- Query the XML format trace
- - set plan for show_exec_xml to message
on - - select from A, B where a1 b1
- - select "Found Plan" where
- '/Emit/MergeJoin' xmltest showplan_in_xml(-1)
44Offers Advanced Monitoring and Diagnosis
- GUI based plan viewer
- SHOWPLANs difficult to interpret as query
complexity increases - A graphical view of query execution plan more
intuitive - Galaxy provides a new tool planviewer
- Displays query execution plans in graphical
format from SHOWPLANs - Operator drill down capability
45Offers Advanced Monitoring and Diagnosis
- Sample graphical plan generated by planviewer
46Enhanced Query Processing in Galaxy
- Enhancements for specific requests
- Tackles Complex Queries
- Handles Large Data Sets
- Offers Goal Oriented Optimization
- Offers Auto Update Statistics
- Offers Advanced Monitoring Diagnosis
- And much more
47And Much More
- Improved DISTINCT processing
- Sophisticated parallel resource scheduler
- Bushy tree based plans for pipelined parallelism,
outer joins, etc - Complex predicates - search arguments with
expressions - Min/Max scalar aggregate processing with useful
indices - Accurate placement of correlated sub-queries,
outer joins in join orders - Accurate estimation of physical I/Os based on
caching - EXISTS query processing with Semi-Joins
- There is still more
48Current Status
- ASE (Galaxy) is being put through rigorous
quality control - New functional, stress, and regression testing
- In-house customer and ISV application testing
- A recent test with a mixed workload customer
application (using 21 key queries) reveals that
Galaxy server performs - significantly faster in 14 queries (from 1.22 to
18705 times) - equally well in 7 queries
- Standard benchmarking (TPC-C, TPC-H)
49The Ask
- No substitute for your feedback
- - Please join the Galaxy beta program to try out
the enhanced Engine - Your participation is invaluable
- - If you are looking for better query performance
- If you want to migrate applications from the
competition to ASE - With your help we can
50Enhanced Query Processing in Galaxy
unlock the true value of Galaxy query
processing
HIGH
Query Performance
LOW
LOW
HIGH
Query Complexity / Data Volume
Galaxy QP
51Enhanced Query Processing in Galaxy