Varsity SQL for the Busy DBA - PowerPoint PPT Presentation

1 / 33
About This Presentation
Title:

Varsity SQL for the Busy DBA

Description:

Varsity SQL for the Busy DBA Fred Sobotka FRS Consulting, Inc. fred_at_frsconsulting.com Stick around for How to spot the SQL shortcut for a seemingly tedious task ... – PowerPoint PPT presentation

Number of Views:112
Avg rating:3.0/5.0
Slides: 34
Provided by: Help268
Learn more at: http://mwdug.org
Category:
Tags: dba | sql | busy | complex | data | olap | varsity

less

Transcript and Presenter's Notes

Title: Varsity SQL for the Busy DBA


1
Varsity SQL for the Busy DBA
  • Fred Sobotka
  • FRS Consulting, Inc.
  • fred_at_frsconsulting.com

2
Stick around for
  • How to spot the SQL shortcut for a seemingly
    tedious task
  • Common table expressions, the foundation for
    versatile SQL that is structured, testable, and
    easily debugged
  • Replacing unreliable, hand-coded XML generators
    with ANSI-standard SQL/XML
  • Identifying the most useful parts of the system
    catalog
  • SQL functions and patterns that provide the most
    bang for the buck

3
What does subsistence SQL smell like?
  • Application joins, high ratio of statements per
    transaction
  • Always with the temp tables (or perma-temp
    tables, even)
  • Hand-rolled programs that exist solely to
    generate XML
  • App anti-patterns that cleverly avoid SQLs
    strength in sets
  • Poor filtering of rows, fetching a row only to
    discard it

4
Which is worse - being asked these questions by
developers
  • How can I get this report to run faster?
  • Whats a more flexible alternative to EXPORT?
  • Can DB2 escape special characters for HTML XML?
  • Is it possible to audit changes to a few specific
    tables?
  • Why arent I getting the same access plans in
    production?

or not being asked at all?
5
Why might you as the DBA care about scruffy SQL?
  • Are long-running, inefficiently written report
    jobs crowding your batch window and cutting into
    your maint. time?
  • Have cumbersome ETL tools become a crutch that
    programmers rely on instead of coding smarter
    SQL?
  • Do developers swear up and down that they need
    you to build a replicated ODS just to handle some
    basic reports?
  • Do the problematic programs use cursors to
    iterate through data that is changing underneath
    it?
  • Is your manager hearing complaints that database
    performance is slow or unreliable when you
    know the primary cause is poor application
    design?

6
One Possible Solution Write Better SQL Queries
  • Strive to minimize the number of SQL roundtrips
    from app
  • Make user-defined temp tables the exception, not
    the rule
  • Keep joins inside the database, where they belong
  • Exploit DB2s mature, built-in XML generator
  • Use EXPLAIN to identify which parts drive up the
    cost

7
Common Table ExpressionsNot just for recursive
SQL youll rarely use
  • At the very least, CTEs can add meaning and
    structure
  • Often easier to read and follow than a single
    monster join

WITH mdctables AS ( SELECT DISTINCT tabschema,
tabname FROM syscat.indexesWHERE indextype
'DIM' )SELECT m.tabschema, m.tabname, t.npages,
t.fpagesFROM mdctables m INNER JOIN
syscat.tables t ON t.tabschema m.tabschema
AND t.tabname m.tabname ORDER BY t.fpages
DESC
CTE
Final SELECT
8
More about CTEs
  • Introduced in SQL1999 (a.k.a. SQL3)
  • Multiple CTEs can be declared used in a single
    stmt
  • CTEs can reference each other, or themselves
    (recursion)
  • Some CTE support offered in Oracle, MS SQL, all
    DB2
  • CTEs dont necessarily result in a worse access
    plan (EXPLAIN reveals that the statement is
    heavily re-written)

9
Why not just create views instead?
  • If its a vendor app, you may not be allowed to
    add them
  • Views create hard dependencies that could
    complicate schema evolution or even block a DB2
    upgrade
  • The query may be a one-off or an infrequent
    report/extract
  • Will the benefits be worth the team code review
    that new database objects may (should) require?

10
Ever make stir-fry?
  • The real work is in the prep, not the little bit
    of stirring
  • Do you ever sample ingredients before throwing
    them in?
  • Each CTE is a like a prep bowl with an ingredient
    of data
  • Sample (SELECT), sniff (EXPLAIN) until youre
    confident
  • Then uncomment the final SELECT at the bottom
    cook

11
Reasons to add more CTEs to your SQL diet
  • Testing/debugging a big stmt without disruptive
    changes
  • Grouping/aggregation can be combined w/ detailed
    rows
  • Targeted, precise row numbering ranking
    capabilities
  • Separate the business logic and filtering from
    final output
  • Allows you to query the logical data model you
    really want, not the physical model youre stuck
    with

12
Test any CTE at just about any time
WITH filteredorders AS (SELECT orderID, custID,
saleDT, itemID, salesregion FROM
sales.ordersWHERE referralsource IN
('WWW','EMAIL','FACEBOOK')) --Orig. final
SELECT is temporarily commented out SELECT
COUNT(), MIN(fo.saleDT), MAX(fo.saleDT),
COUNT(DISTINCT fo.itemID), COUNT(DISTINCT
fo.salesregion) FROM filteredorders fo
  • Are we grabbing the orders we really want?
    Theres no need to wait until the query is
    completely coded to find out.

13
Easy switching between literals and parameters
WITH parms( custID, saleDT ) AS ( SELECT--
While testing interactively, comment-- out
parameter markers -- CAST( ? AS INTEGER ), CAST(
? AS DATE ) 999999, '2010-05-01' FROM
sysibm.sysdummy1,
The rest of your query still references
parms.custID and parms.saleDT, and is not
impacted when you switch back and forth between
hardcoded literals and parameter markers.
14
Make ROW_NUMBER() RANK() work your way
WITH ordertimeline AS ( SELECT custID, saleDT,
ROWNUMBER() OVER ( PARTITION BT custID ORDER BY
saleDT ASC ) orderseq FROM sales.ordersWHERE
saleDT gt '2010-01-01 000000' ) SELECT
o1.custID, DATE( o1.saleDT ) saleDT1, DATE(
o3.saleDT ) saleDT3, TIMESTAMPDIFF( 16, CHAR(
o3.saleDT o1.saleDT )) elapsedDaysFROM
ordertimeline o1 INNER JOIN ordertimeline o3ON
o3.custID o1.custIDWHERE o1.orderseq 1 AND
o3.orderseq 3ORDER BY elapsedDays ASC
Which customers have already racked up three or
more orders this year? How many days elapsed
between each customers 1st and 3rd order?
15
Generating one-liners from SQL
  • Use CONCAT or to intersperse text literals in
    output
  • Numerics will need to be cast as CHAR() to mix
    with text
  • Always RTRIM() trailing spaces off of schema
    names
  • Escape any apostrophe (') with a double
    apostrophe ('')
  • Thats two apostrophe characters, not a quote

SELECT 'GRANT SELECT ON 'RTRIM( tabschema
)'.'tabname' TO ROLE READONLY'FROM
syscat.tables WHERE create_time gt CURRENT
TIMESTAMP 2 DAYS
and another
SELECT 'ALTER TABLESPACE 'tbspace'
REBALANCE' FROM syscat.tablespaces
16
Selecting and sorting are two different things
  • Your ORDER BY doesnt need to reference SELECT
    cols
  • ORDER BY can be very sophisticated if necessary
  • Re-order the alphabet with a CASE expression in
    a CTE
  • When something absolutely, positively needs to
    follow an exact ordering sequence, DB2 SQL has
    got you covered

17
Generating more complex code from SQL data
  • Build a query or CTE for each distinct section of
    the code
  • Initial preamble/heading, then repeating list of
    items, then a closer
  • Include a numeric literal in each result set for
    easy sorting
  • Define a second sorting column just for items in
    a list
  • Usually produced by a ROWNUMBER() expression in
    that query
  • Typical column layout for each query/CTE is
  • Key/name, code fragment, major sort seq, minor
    sort seq
  • Glue them all together via UNION ALL order the
    result by the PK/name, major sort seq, and the
    minor sort seq
  • But select just the column that contains the code
    fragments

18
How can SQL generate sophisticated, valid code?
  • obj_name code_fragment outer_sort inner_sort

prod.sales CREATE VIEW 0 0
UNION ALL
prod.sales some_colname 1 0 prod.sales ,
other_colname 1 1 prod.sales ,
yet_another_col 1 2
UNION ALL
prod.sales FROM x.tbl.. 2 0
Just be sure to ORDER BY obj_name, outer_sort
ASC, inner_sort ASCbut dont include them in
your SELECT list!
19
Generating complete CREATE VIEW statements
WITH codelines AS (SELECT tabschema, tabname,
'CREATE VIEW NEWSCHEMA.' tabname' AS SELECT
' AS code, 0 AS outersort, 0 AS innersort FROM
syscat.tables -- THE HEADER UNION ALL SELECT
tabschema, tabname, colname AS code, 1 AS
outersort, ROWNUMBER() OVER (PARTITION BY
tabschema, tabname ORDER BY colno) AS
innersortFROM syscat.columns WHERE remarks ltgt '
' --THE COLUMNS UNION ALL SELECT tabschema,
tabname, 'FROM 'RTRIM(tabschema)'.'tabname
' ' AS code, 2 AS outersort, 0 AS innersort
FROM syscat.tables -- THE FOOTER) -- NOW FOR
THE FINAL SELECT SELECT CASE WHEN ( innersort gt
1 ) THEN ', ' codeELSE code END AS code FROM
codelinesWHERE tabschema in () AND tabname IN
()ORDER BY tabschema, tabname, outersort,
innersort
20
How to handle commas when generating lists
  • You have 2 choices comma at end of line or at
    beginning
  • Its a lot easier to find the first item than the
    last item
  • Dont trust sequence numbers you didnt generate
  • Their sequence number 1 could be duplicated, or
    missing entirely
  • So, generate your own sequence number for your
    list, and make sure you put a comma before items
    2 and up

SELECT CASE WHEN ( innersort gt 1 ) THEN ', '
codeELSE code END AS code FROM
21
So what kind of code should we generate?
  • RUNSTATS/REORGCHK/REORG/REBIND
  • Recreate inoperative views
  • Grants and Revokes
  • Export Import/Load scripts with
    business-specific twists
  • Extract stored procedures to separate, named
    files
  • DDL for change capture tables
  • Mass ATTACH/DETACH scripts for partitioned tables
  • One-at-a-time program calls for each primary key
    of user data in need of some sort of ETL or other
    processing

22
Picking the least worst data for a report
WITH allchoices ( itemid, dataitem, weightscore
)AS ( SELECT itemkey, goodattr, 1 FROM
best.optionsUNION ALLSELECT itemid, altval, 2
FROM notasgood.infoUNION ALLSELECT item_id,
someguess, 3 FROM lastresort.hunch)SELECT
a.itemid,b,c,( SELECT x.dataitem FROM allchoices
x WHERE x.itemid a.itemid ORDER BY
weightscoreFETCH FIRST ROW ONLY ) AS
itemstatus,
Think of how often youve seen people write
entire programs full of subsistence SQL just to
handle this.
23
SQL/XML publishing functions
  • XML may not be in your database, but lots of
    people probably want XML out of your database
  • XML/SQL is a mature solution, introduced in
    SQL2003
  • Creates sophisticated, well-formed XML, including
    namespaces, from your tabular data, entirely from
    SQL
  • Does not require pureXML structures, XPath, or
    XSL
  • DB2 optimizer continues to improve at predicate
    pushdown for SQL/XML, yielding better access plans

24
XMLSERIALIZE and XMLELEMENT
  • XMLSERIALIZE specifies the output data type
    returned
  • Next is typically the outermost (top-level)
    XMLELEMENT
  • Everything else is embedded within that element

SELECT XMLSERIALIZE( XMLELEMENT(NAME
"table", XMLELEMENT(NAME "schemaname",
tabschema ), XMLELEMENT(NAME "tablename",
tabname ) )AS VARCHAR( 255 ) )FROM
syscat.tables
This returns the same number of rows as a
traditional SELECT that doesnt use SQL/XML.
25
Suppressing empty XML tags with XMLFOREST
  • XMLELEMENT will always generate an XML tag, even
    when the payload is NULL ltexcusesgtlt/excusesgt
  • If that isnt what you want, use XMLFOREST, which
    omits the tag whenever the payload happens to be
    NULL
  • XMLFOREST is also handy for generating a series
    of sibling elements for a list of values

SELECT XMLSERIALIZE( XMLELEMENT( NAME
"table", XMLFOREST( tabschema AS
"schemaname", tabname AS
tablename, remarks AS "comment" ) )
AS VARCHAR( 400 ) )FROM syscat.tables
26
Collapsing repeating items into a child element
  • XMLAGG and XMLGROUP are used to squash
    potentially multiple rows into sibling elements
  • XMLGROUP is the more sophisticated of the two
  • Offers more features with less typing
  • You can do one XMLAGG in an expression, but not
    more
  • Cartesian products will occur if you dont
    separate them
  • One XMLAGG containing five rows and another
    XMLAGG for three rows in the same SELECT will
    produce 15 elements inside each XMLAGG, which is
    probably not what you want.

27
Example of unwanted Cartesian product
  • WITH purchases( p ) AS ( VALUES 0 ),
  • coupons( p, c ) AS ( VALUES( 0, 1 ), ( 0, 2 ),(
    0, 3 ) ),
  • purchaseditems( p, s ) AS ( VALUES( 0, 'A' ), (
    0, 'B' ), ( 0,'C' ), ( 0,'D' ), ( 0,'E' )
  • )
  • SELECT XMLSERIALIZE( XMLELEMENT( NAME "Purchase",
  • XMLAGG( XMLELEMENT( NAME "CouponUsed", c ) ) ,
  • XMLAGG( XMLELEMENT( NAME "ItemBought", s ) )
  • )
  • AS VARCHAR( 2048 )
  • )
  • FROM
  • purchases p
  • LEFT OUTER JOIN coupons c ON p.p c.p
  • LEFT OUTER JOIN purchaseditems s ON p.p s.p
  • GROUP BY p.p

Although the GROUP BY collapses the final output
to one row, the number of items produced by each
XMLAGG is incorrect.
28
Correct results when grouping is pushed to CTE
  • WITH purchases( p ) AS ( VALUES 0 ),
  • coupons( p, c ) AS ( VALUES( 0, 1 ), ( 0, 2 ),(
    0, 3 ) ),
  • purchaseditems( p, i ) AS ( VALUES( 0, 'A' ), (
    0, 'B' ), ( 0,'C' ), ( 0,'D'), ( 0,'E' )
  • ),
  • cx( p, cxml ) AS (SELECT p, XMLAGG( XMLELEMENT(
    NAME "CouponUsed", c ) ) FROM coupons GROUP BY p
    ),
  • ix(p, ixml ) AS (SELECT p, XMLAGG( XMLELEMENT(
    NAME "ItemBought", i ) ) FROM purchaseditems
    GROUP BY p )
  • SELECT XMLSERIALIZE( XMLELEMENT( NAME "Purchase",
    cxml , ixml )
  • AS VARCHAR( 2048 )
  • )
  • FROM purchases p
  • LEFT OUTER JOIN cx ON cx.p p.p
  • LEFT OUTER JOIN ix ON ix.p p.p

The CTEs will return either one row or nothing,
which eliminates the risk of a cross-product.
29
Building the Perfect (XML) Beast
  • Build and test CTEs for major components/elements
  • The CTEs should handle any filtering and
    formatting
  • A second layer of CTEs collapses repeating
    children
  • Its easier if the CTEs referenced by the final
    SELECT all return at most one row each
  • The final SELECT lays out the XML tags for all of
    the CTEs, which are connected via LEFT JOIN to
    the main table

30
Varsity SQL isnt necessarily complex SQL
  • Describe the ideal collection of datasets youd
    like to see
  • Make the query do as much heavy lifting as
    possible
  • Assume the client app cant or wont refine the
    results
  • Favor a single moderately expensive query over
    hundreds or thousands of primitive SQL statements
    sent by an app
  • Think in stages finding, filtering, grouping,
    and presenting

31
Noteworthy SQL resources
  • DB2 SQL Cookbook by Graeme Birchall (free e-book)
  • Anything written by Joe Celko
  • DB2 XML Cookbook by Matthias Nicola and Pav
    Kumar-Chatterjee
  • Questions and answers posted to StackOverflow.com
  • DB2-L and IDUG Solutions Journal

32
Just a few presenters to read and watch
  • DB2 z/OS optimizer Terry Purcell
  • DB2 z/OS SQL Sheryl Larsen
  • DB2 LUW optimizer Calisto Zuzarte
  • DB2 LUW SQL Serge Rielau

33
Fred SobotkaIBM Gold Consultant IBM Data
ChampionFRS Consulting, Inc.fred_at_frsconsulting.c
om503-484-5032Skype/Twitter/AIM/Yahoo! db2fred
Write a Comment
User Comments (0)
About PowerShow.com