Title: Arial 28pt.

1 / 51
About This Presentation
Title:

Title: Arial 28pt.

Description:

ASE135: Query processing for Mixed Workload Environments using ASE (Galaxy Release) ... (isnull(c.ClientName,'x') rtrim(c.ClientID) ... – PowerPoint PPT presentation

Number of Views:38
Avg rating:3.0/5.0

less

Transcript and Presenter's Notes

Title: Title: Arial 28pt.


1
ASE135 Query processing for Mixed Workload
Environments using ASE (Galaxy Release)
Joydeep Das Staff Software Engineer Joydeep.Das_at_sy
base.com August 15-19, 2004
2
The Enterprise. Unwired.
3
The Enterprise. Unwired.
Industry and Cross Platform Solutions
Unwire People
Unwire Information
Manage Information
  • Adaptive Server Enterprise
  • Adaptive Server Anywhere
  • Sybase IQ
  • Dynamic Archive
  • Dynamic ODS
  • Replication Server
  • OpenSwitch
  • Mirror Activator
  • PowerDesigner
  • Connectivity Options
  • EAServer
  • Industry Warehouse Studio
  • Unwired Accelerator
  • Unwired Orchestrator
  • Unwired Toolkit
  • Enterprise Portal
  • Real Time Data Services
  • SQL Anywhere Studio
  • M-Business Anywhere
  • Pylon Family (Mobile Email)
  • Mobile Sales
  • XcelleNet Frontline Solutions
  • PocketBuilder
  • PowerBuilder Family
  • AvantGo

Sybase Workspace
4
Agenda
  • Query Processing (QP) for Mixed Workload
    Environments
  • Motivation
  • Enhanced Query Processing solutions in Galaxy
  • Feedback and QA

5
Agenda
  • Query Processing for Mixed Workload Environments
  • Motivation
  • Enhanced Query Processing solutions in Galaxy
    Release
  • Feedback and QA

6
Motivation ASEs Current QP Capabilities
  • Extremely well-suited for demanding OLTP
    environments
  • Good plans for all types of transactional queries
  • Meets some demands of Operational DSS
  • Good plans for several types of complex queries
  • Meets some demands of mixed workload environments
  • (OLTP Operational DSS)
  • Good plans for mixed workloads if system
    appropriately configured for
  • Cache characteristics Large I/Os, LRU/MRU,
    bindings
  • Parallel operations table scans, index scans,
    joins

7
Motivation - Your Requests
  • Need better mixed workload capabilities from ASE
  • Need better performance for specific
    transactional queries
  • Need better performance for highly complex
    queries (reporting apps)
  • Need efficient query processing for large data
    sets (VLDB)
  • Need advanced database administration
    capabilities
  • More intuitive diagnostics to track problem
    queries
  • Useful tools to monitor query performance
    characteristics

8
Motivation - Our Response
  • We have reviewed your inputs and requests
  • To understand and appreciate your challenges
  • To provide quick solutions wherever possible
  • We have analyzed current and future industry
    trends
  • Based on your inputs and our analysis, we now
    offer
  • A rich and comprehensive set of query processing
    enhancements in the ASE Galaxy Release

9
Agenda
  • Query Processing for Mixed Workload Environments
  • Motivation
  • Enhanced Query Processing solutions in Galaxy
  • Feedback and QA

10
Enhanced Query Processing in Galaxy
  • Enhancements for specific requests
  • Tackles Complex Queries
  • Handles Large Data Sets
  • Offers Goal Oriented Optimization
  • Offers Auto Update Statistics
  • Offers Advanced Monitoring Diagnosis
  • And much more

11
Enhanced Query Processing in Galaxy
  • Enhancements for specific requests
  • Tackles Complex Queries
  • Handles Large Data Sets
  • Offers Goal Oriented Optimization
  • Offers Auto Update Statistics
  • Offers Advanced Monitoring Diagnosis
  • And much more

12
Enhancements For Specific Requests
  • Index Usage with mismatched data types
  • select from Parts, Lineitem where
    Parts.p_partkey Lineitem.l_partkey
  • - Parts.p_partkey is float
  • - Lineitem.l_partkey is int and has an index on
    it
  • - Parts is small (10,000 rows), Lineitem is
    larger (300,000 rows)
  • Galaxy is 3.7 times faster
  • Galaxy QP ensures that
  • - an index scan on Lineitem.l_partkey is selected
  • - the index scan is positioned based on values
    from Parts.p_partkey

13
Enhancements For Specific Requests
  • Accurate Join costing
  • select count() from Lineitem, Orders
  • where Lineitem.l_orderkey Orders.o_orderkey
  • and Lineitem.l_orderkey 3
  • - 60 of Lineitem.l_orderkey 3
  • - Lineitem has 30,000 rows Orders has 200 rows
  • - Highly duplicated value from l_orderkey results
    in data skew
  • Galaxy is 1.2 times faster
  • Galaxy QP estimates the join selectivity very
    accurately
  • join selectivity is based on dynamically computed
    join histograms
  • translates to optimal join order Lineitem, Orders

14
Enhancements For Specific Requests
  • Fewer Materializations avoiding work tables for
    sorting
  • select count(), l_orderkey from lineitem
  • group by l_orderkey
  • order by l_orderkey
  • - Lineitem has 300,000 rows
  • Galaxy is 3.7 times faster
  • Galaxy QP avoids sorting for ORDER BY by
    remembering the ordering from GROUP BY
  • Similar materializations also avoided for
    union-in-view queries

15
Enhancements For Specific Requests
  • Unlimited terms in IN-list
  • select from tab
  • where col1 in (1,2,3,..16000)
  • Galaxy QP will successfully process and execute
    this query
  • Only limited by size of procedure cache

16
Enhancements For Specific Requests
  • Removed column limits with grouped aggregation
  • select col1, . col50, MAX(col52) from tab
  • GROUP BY col1, . Col50
  • Galaxy QP will process and execute it
    successfully if widths of col1 col50 plus the
    result of the MAX is less than 16K
  • Max width increased to 16K, independent of page
    size
  • Limit of 31 columns in a GROUP BY clause removed

17
Enhanced Query Processing in Galaxy
  • Enhancements for specific requests
  • Tackles Complex Queries
  • Handles Large Data Sets
  • Offers Goal Oriented Optimization
  • Offers Auto Update Statistics
  • Offers Advanced Monitoring Diagnosis
  • And much more

18
Tackles Complex Queries
  • Improved Union operator processing
  • select l_orderkey, l_shipdate, l_commitdate,
    l_receiptdate    
  • from Lineitem where l_orderkey lt 10000     
  • union    
  • select l_orderkey, l_shipdate, l_commitdate,
    l_receiptdate     
  • from Lineitem where l_orderkey gt 50
  • - Lineitem has 300,000 rows
  • - No index on l_orderkey
  • Galaxy is 1.5 times faster
  • Galaxy QP uses new feature hash based union
    distinct

19
Tackles Complex Queries
  • Improved Aggregations
  • select sum(l_quantity), max(l_discount),
    l_suppkey, l_partkey    
  • from Lineitem group by l_suppkey, l_partkey
  • - Lineitem has 300,000 rows
  • - Index on l_suppkey, l_partkey
  • Galaxy is 2.8 times faster
  • Galaxy QP uses on-the-fly grouping (hash based)
  • - no work tables created

20
Tackles Complex Queries
  • Index Union
  • select from lineitem
  • where l_orderkey  between 10 and 20 OR
  • l_suppkey between 40 and 50
  • - Lineitem has 300,000 rows
  • - Indices available on l_orderkey and l_suppkey
  • Galaxy is 1.1 times faster
  • Uses new operator hash based union distinct

21
Tackles Complex Queries
  • Index Intersection
  • select Lineitem.l_orderkey, Lineitem.l_suppkey
  • from Lineitem where Lineitem.l_orderkey gt 1000
    and Lineitem.l_ lt 10000
  • and Lineitem.l_suppkey gt 200 and
    Lineitem.l_suppkey lt 2000
  • - Lineitem has 300,000 rows
  • - Indexes available on Lineitem.l_orderkey and
    Lineitem.l_suppkey
  • Galaxy is at least 1.1 times faster
  • Galaxy QP ensures that both indexes scanned and
    hash-intersected
  • - may be selected with Index Union if AND/OR
    present in query
  • - may be selected in STAR queries if FACT table
    has indexes for each DIMENSION table joining
    column

22
Tackles Complex Queries
  • Large number of tables
  • select from T1, T2, .. T50
  • where T1.a T2.a AND T2.a T3.a AND
    .. T49.a T50.a
  • Early greedy strategy combined with accurate
    cost-based plan pruning helps in obtaining good
    plans quickly
  • Built-in timer in Galaxy stops long running
    optimization
  • - the best plan chosen thus far is selected
  • - extra optimization may not yield a
    significantly better plan
  • - timer limit is user configurable at
    server/session/query level
  • - default 10 of internal query processing
    time given to optimization

23
Tackles Complex Queries
  • Superior plans for Star/Snowflake schema joins
  • STAR, SNOWFLAKE schemas have
  • - Large FACT table, smaller (relative to FACT)
    DIMENSION tables
  • - Joins only between FACT and DIMENSION tables
  • - Index available on joining columns

Star schema
Snowflake schema
24
Tackles Complex Queries
  • Superior plans for Star/Snowflake schema joins
  • select from lineitem, orders, parts, supplier
  • where lineitem.l_orderkey orders.o_orderkey
  • AND lineitem.l_partkey parts.p_partkey
  • AND lineitem.l_suppkey supplier.s_suppkey
  • - FACT table lineitem has 300,000 rows
  • - DIMENSION tables orders, parts, suppliers have
    10 rows each
  • Galaxy is at least 2 times faster
  • recognizes Star/Snowfalke patterns, places FACT
    in the end
  • May use Index intersection Lineitem joining
    columns have indices

25
Enhanced Query Processing in Galaxy
  • Enhancements for specific requests
  • Tackles Complex Queries
  • Handles Large Data Sets
  • Offers Goal Oriented Optimization
  • Offers Auto Update Statistics
  • Offers Advanced Monitoring Diagnosis
  • And much more

26
Handles Large Data Sets
  • Improved join algorithms
  • select count(), count(c.ProcessID),
    count(l.ProcessID)
  • from Clients c, Clients l
  • where (isnull(l.ClientName,'x')
    rtrim(lClientID)
  • (isnull(c.ClientName,'x') rtrim(c.ClientID))
  • - No indices, expression join Client is a large
    table ( gt 300,000 rows)
  • Galaxy is more than 100,000 times faster!
  • Galaxy QP selects Hash join method
  • Galaxy may select merge join when proper ordering
    available

27
Handles Large Data Sets
  • Support for data partitioning strategies
  • Round robin partitioning (available in
    pre-Galaxy)
  • - Data set distribution in round robin fashion
  • - No semantics associated, data row can end up in
    any partition
  • Hash based partitioning (NEW)
  • - Data set distributed according to a hash
    function
  • - Semantics associated data rows belong to
    specific partitions
  • Range partitioning (NEW)
  • - Data set distributed according to an attributes
    value range
  • - Semantics associated data rows belong to
    specific partitions

28
Handles Large Data Sets
  • Intelligent Partition elimination
  • create table customer(c_custkey int, c_name
    varchar(25),
  • c_region varchar(4), c_acctbal double) partition
    by range (c_custkey)
  • (PC1 values lt (100), PC2 values lt (200), PC3
    values lt (300))
  • - data partitioned by range customer has
    10,000 rows
  • select from customer where c_custkey
  • between 50 and 150 or c_custkey lt 10
  • Galaxy is 1.25 times faster
  • Galaxy QP Partition Elimination disqualifies
    partitions (PC3 here)
  • can handle complex AND/OR expression
  • can handle variables unknown at compile time

29
Handles Large Data Sets
  • Directed joins with partitioned data
  • create table orders(o_orderkey int, o_custkey
    int,
  • o_orderstatus char(1), o_price double,
    o_orderdate smalldatetime)
  • partition by range (o_custkey)
  • (PO1 values lt (100), PO2 values lt (200), PO3
    values lt (300))
  • - Orders, Customers have 10,000 rows both
    partitioned by range
  • select from customers, orders where c_custkey
    o_custkey and c_custkey gt 110
  • Galaxy is 2.5 times faster
  • Galaxy QP eliminates partitions PC1 and PO1 from
    customers and orders. Only corresponding
    partitions PC2/PO2, PC3/PO3 are joined

30
Handles Large Data Sets
  • Parallelism
  • select sum(o_price), o_custkey from orders
  • where o_orderdate gt 01/31/2003
  • group by o_custkey
  • - orders has 25,000 rows partitioned by
    o_custkey
  • Almost all SQL operations can be done in parallel
  • Galaxy is 1.2 times faster
  • Galaxy QP groups each partition separately, but
    merges results concurrently

31
Handles Large Data Sets
  • Parallelism
  • SQL operations can be done in parallel and
    pipelined to work in tandem
  • select sum(o_price), c_region from orders,
    customers
  • where o_orderdate gt 01/31/2003 and
  • c_custkey o_custkey group by c_region
  • Query has 2 partition join of customers and
    orders followed by Grouping, all of which are
    done in tandem

Pipe 1
Pipe 2
32
Enhanced Query Processing in Galaxy
  • Enhancements for specific requests
  • Tackles Complex Queries
  • Handles Large Data Sets
  • Offers Goal Oriented Optimization
  • Offers Auto Update Statistics
  • Offers Advanced Monitoring Diagnosis
  • And much more

33
Offers Goal Oriented Optimization
  • Applications often have varying query processing
    needs
  • Fast response time for first few rows of the
    result set
  • Fast response time for queries in OLTP systems
  • - Low to medium complexity transactional queries
  • Fast response time for queries in Mixed workload
    systems
  • - Low to medium complexity transactional queries
  • - High complexity operational decision support
    queries
  • Galaxy QP is powerful and versatile
  • Can be tailored to deliver good performance for
    the above demands

34
Offers Goal Oriented Optimization
  • 3 optimization goals - can be set at
    query/session/server level
  • fastfirstrow goal returns first ltngt rows of
    result set quickly
  • set plan optgoal fastfirstrow
  • select from Lineitem where Lineitem.l_suppkey
    gt 2900
  • order by Lineitem.l_orderkey
  • - May use set rowcount 10 to limit to first 10
    rows
  • - Lineitem has 300,000 rows
  • Galaxy is 814 times faster with fastfirstrow
    goal !
  • Galaxy QP selects index based on the orderby
    column

35
Offers Goal Oriented Optimization
  • 3 optimization goals - can be set at
    query/session/server level
  • allrows_oltp generates plans useful for OLTP
    queries
  • - optimizes for complete result set (allrows)
  • - considers techniques relevant for OLTP e.g.
    Nested Loop Join
  • allrows goal (default) generates plans for
    mixed workload
  • - optimizes for complete result set (allrows)
  • - considers all available optimization
    techniques
  • - balances the needs of OLTP and operational
    decision support

36
Enhanced Query Processing in Galaxy
  • Enhancements for specific requests
  • Tackles Complex Queries
  • Handles Large Data Sets
  • Offers Goal Oriented Optimization
  • Offers Auto Update Statistics
  • Offers Advanced Monitoring Diagnosis
  • And much more

37
Offers Auto Update Statistics
  • Identifies when update statistics would need to
    be run
  • New built-in datachange() indicates dml changes
    on a table
  • Utilizes idle cycles via housekeeper to schedule
    update statistics automatically
  • Effectively utilizing system resources
  • Shrinking maintenance windows
  • Utilizes sampling by default
  • Speeds up the process of update statistics

38
Enhanced Query Processing in Galaxy
  • Enhancements for specific requests
  • Tackles Complex Queries
  • Handles Large Data Sets
  • Offers Goal Oriented Optimization
  • Offers Auto Update Statistics
  • Offers Advanced Monitoring Diagnosis
  • And much more

39
Offers Advanced Monitoring and Diagnosis
  • QP Metrics
  • Captures per query execution metrics
  • - Min/max/avg PIO/LIO/CPU and number of times
    query is executed
  • - SQL query text is captured as well
  • - Stores in a system catalog - sysquerymetrics
  • Easy to identify the problem queries
  • - Simple ORDER BY selects from the catalog
  • - Integrated in front end tools
  • Easy to use
  • - Metrics gathering can be enabled thru a
    configurable option
  • - Stored metrics can be queried from
    sysquerymetrics

40
Offers Advanced Monitoring and Diagnosis
  • Text format trace output
  • No more 302, 310, 317 to diagnose the optimizer
  • Set commands to tailor optimizer tracing
  • - faster isolation of problems
  • - can generate different levels of detail in Text
    format
  • - set option show ltbrief/normal/longgt
  • - can generate trace for specific area of concern
  • set option show_histograms ltbrief/normal/longgt
  • set option show search_engine ltbrief/normal/longgt

41
Offers Advanced Monitoring and Diagnosis
  • XML format trace output
  • Can be queried using ASE built-in XML querying
    capabilities
  • Ease of plan tracking for testing, diagnosis,
    tool development   
  • Generate the XML format trace
  • Query the XML format trace

42
Offers Advanced Monitoring and Diagnosis
  • XML format trace output
  •   
  • Generate the XML format trace
  •   - set plan for show_exec_xml to message on
  • - select from A, B where a1
    b1               
  • - select showplan_in_xml(-1)
  • XML format trace output
  • lt?xml version1.0"?gt     
  • ltEmitgt                   
  • ltMergeJoingt                             
  • ltDetailsgt                                     
                                 
  • ltJoinTypegt Inner Join lt/JoinTypegt      
                          
  • lt/Detailsgt                            
  • ...............                     
  • lt/MergeJoingt 
  • lt/Emitgt

43
Offers Advanced Monitoring and Diagnosis
  • XML format trace output  
  • Query the XML format trace
  • - set plan for show_exec_xml to message
    on            
  • - select from A, B where a1 b1            
  • - select "Found Plan" where
  • '/Emit/MergeJoin' xmltest showplan_in_xml(-1) 

44
Offers Advanced Monitoring and Diagnosis
  • GUI based plan viewer
  • SHOWPLANs difficult to interpret as query
    complexity increases
  • A graphical view of query execution plan more
    intuitive
  • Galaxy provides a new tool planviewer
  • Displays query execution plans in graphical
    format from SHOWPLANs
  • Operator drill down capability

45
Offers Advanced Monitoring and Diagnosis
  • Sample graphical plan generated by planviewer

46
Enhanced Query Processing in Galaxy
  • Enhancements for specific requests
  • Tackles Complex Queries
  • Handles Large Data Sets
  • Offers Goal Oriented Optimization
  • Offers Auto Update Statistics
  • Offers Advanced Monitoring Diagnosis
  • And much more

47
And Much More
  • Improved DISTINCT processing
  • Sophisticated parallel resource scheduler
  • Bushy tree based plans for pipelined parallelism,
    outer joins, etc
  • Complex predicates - search arguments with
    expressions
  • Min/Max scalar aggregate processing with useful
    indices
  • Accurate placement of correlated sub-queries,
    outer joins in join orders
  • Accurate estimation of physical I/Os based on
    caching
  • EXISTS query processing with Semi-Joins
  • There is still more

48
Current Status
  • ASE (Galaxy) is being put through rigorous
    quality control
  • New functional, stress, and regression testing
  • In-house customer and ISV application testing
  • A recent test with a mixed workload customer
    application (using 21 key queries) reveals that
    Galaxy server performs
  • significantly faster in 14 queries (from 1.22 to
    18705 times)
  • equally well in 7 queries
  • Standard benchmarking (TPC-C, TPC-H)

49
The Ask
  • No substitute for your feedback
  • - Please join the Galaxy beta program to try out
    the enhanced Engine
  • Your participation is invaluable
  • - If you are looking for better query performance
  • If you want to migrate applications from the
    competition to ASE
  • With your help we can

50
Enhanced Query Processing in Galaxy
unlock the true value of Galaxy query
processing
HIGH
Query Performance
LOW
LOW
HIGH
Query Complexity / Data Volume
Galaxy QP
51
Enhanced Query Processing in Galaxy
  • Feedback Q A
Write a Comment
User Comments (0)